[2512.03262] Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks
Study on the security of AI-generated code

Disturbingly, all agents perform poorly in terms of software security. Although 61% of the solutions from SWE-Agent with Claude 4 Sonnet are functionally correct, only 10.5% are secure.

Right, but how many solutions are secure without AI involvement? Or I suppose they’re only testing the first response the agent generates?


Quote Citation: Songwen Zhao, Danqing Wang, Kexun Zhang, Jiaxuan Luo, Zhuo Li, Lei Li, “[2512.03262] Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks”, 2 Dec 2025, https://arxiv.org/abs/2512.03262