[2512.03262] Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks
A study on the security of AI-generated code.
Disturbingly, all agents perform poorly in terms of software security. Although 61% of the solutions from SWE-Agent with Claude 4 Sonnet are functionally correct, only 10.5% are secure.
Right, but how many solutions are secure without AI involvement? Or I suppose they’re only testing first-response generation?
Quote Citation: Songwen Zhao, Danqing Wang, Kexun Zhang, Jiaxuan Luo, Zhuo Li, Lei Li, “[2512.03262] Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks”, 2 Dec 2025, https://arxiv.org/abs/2512.03262
