Bad Vibes: Security Flaws Plague Popular AI Coding Agents
The research methodology tasked each agent with building identical applications from the same prompts and technology stack, replicating a typical iterative development process. Approximately 45 of the vulnerabilities found were rated low or medium severity; the remainder were rated high, and roughly six were rated critical. Claude Code generated four of the critical vulnerabilities, while Devin and Codex each produced one.
On the positive side, all of the tools avoided common flaws that have long plagued human-written applications. Researchers found no exploitable SQL injection or cross-site scripting vulnerabilities in any of the applications, suggesting the agents have internalized defenses against well-documented attack patterns that have generic, mechanical fixes.
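SQL injection is a good illustration of a flaw with a generic fix: parameterized queries eliminate the entire bug class regardless of the application's business logic, which is plausibly why agents reproduce the safe pattern so reliably. A minimal sketch (the `users` table and `find_user` function are hypothetical, not from the study):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

def find_user(name: str):
    # Vulnerable pattern: string interpolation lets attacker input
    # rewrite the query structure, e.g.
    #   conn.execute(f"SELECT * FROM users WHERE name = '{name}'")
    # Safe pattern: a parameterized query treats the input purely as data.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()

# A classic injection payload matches nothing instead of dumping the table.
print(find_user("' OR '1'='1"))  # []
print(find_user("alice"))        # [(1, 'alice')]
```

Because the safe pattern requires no understanding of the surrounding workflow, a scanner or a model can apply it mechanically.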
However, the research found that AI-generated code is particularly prone to business logic vulnerabilities. Human developers bring an intuitive understanding of how workflows should operate; agents lack this common sense and depend largely on explicit instructions. Authorization flaws, improper access controls, and logic errors appeared consistently across every tested platform.
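A common shape for such an authorization flaw is an insecure direct object reference: the code correctly fetches a record but never checks that the requester owns it, because no prompt spelled that rule out. A hedged sketch (the `INVOICES` store and function names are illustrative, not taken from the study's applications):

```python
# Hypothetical in-memory store of invoices, keyed by invoice ID.
INVOICES = {
    1: {"owner": "alice", "amount": 120},
    2: {"owner": "bob", "amount": 75},
}

def get_invoice_vulnerable(user: str, invoice_id: int):
    # Business logic flaw: the record exists, so it is returned --
    # nothing verifies that `user` actually owns invoice_id.
    return INVOICES.get(invoice_id)

def get_invoice_fixed(user: str, invoice_id: int):
    invoice = INVOICES.get(invoice_id)
    # Explicit authorization check: only the owner may read the invoice.
    if invoice is None or invoice["owner"] != user:
        return None
    return invoice

# alice can read bob's invoice through the vulnerable path,
# but the fixed path denies the cross-user request.
print(get_invoice_vulnerable("alice", 2))  # {'owner': 'bob', 'amount': 75}
print(get_invoice_fixed("alice", 2))       # None
```

Both versions are syntactically correct and pass a naive functional test, which is exactly why this bug class evades pattern-based scanners and requires review of the intended workflow.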
Tenzai concludes that agentic security must become a native companion to AI coding assistants, embedded directly inside AI-first development environments rather than bolted on downstream. Organizations using AI coding tools should implement security review processes specifically designed to catch business logic flaws that automated scanning tools may miss.