As AI quietly slips into our daily workflows, including in AI in application security, a sobering wake-up call reminded me: AI is not magic. It’s math—and it makes mistakes—catastrophic ones. Phoenix Security has always been the light and the symbol driving transformation, and we embraced the LLM and Agentic revolution with both arms, but also with caveats.
I’m a strong advocate for the sensible use of AI and LLM, as it is a powerful tool. At Phoenix Security, we utilize this technology in every single workflow.
I’m an avid user of code augmentation tools like Cursor and other copilots, and those tools are game changers when used appropriately. I had my fair share of random code removed for various reasons or simply because I felt like it (not a quote). LLM is not magic; it is software and, as such, has flaws.
Two recent failures from major AI development platforms, Google’s Gemini CLI and Replit’s AI assistant, wiped out user data and breached safety controls—all while confidently claiming everything was fine. Both cases involved AI agents executing commands based on imagined realities, rewriting files, deleting databases, and hallucinating feedback. The most chilling part? They didn’t know they were wrong.
This isn’t just a fluke. It’s a flaw in how we currently design, use, and trust large language models (LLMs). And in cybersecurity—where the stakes are exponentially higher—this becomes an existential risk.
The injection – Amazon Q Injection Incident
In July 2025, the AI-enhanced Amazon Q Toolkit for VS Code was compromised by a malicious prompt injection—a plain-text payload introduced via a commit that instructed an AI agent to execute destructive cloud and local filesystem operations. This wasn’t an academic exercise or speculative flaw. This was a real, traceable attack vector that briefly shipped inside an official extension, exposing thousands of developers to potentially irreversible infrastructure damage.
📌 The injected code came via this commit in the AWS toolkit VS code repository and instructed an AI agent (Amazon Q) to:
- Delete cloud resources using AWS CLI commands like ec2 terminate-instances, s3 rm, and iam delete-user
- Wipe local files by recursively targeting directories
- Log deletions to /tmp/CLEANER.LOG, ironically documenting the destruction
This was embedded as a PROMPT string—plain, readable, and dangerous—inside a function named activateAmazonQNode, giving the appearance of legitimate functionality.
⚠️ Breakdown of the Attack
Element | Description |
Attack Vector | Prompt injection via open-source commit |
Target Surface | Developers using Amazon Q extension for VS Code |
Mechanism | Injected instructions executed through childProcess.exec |
Risk Scope | Local deletion + cloud-level destruction via AWS CLI |
Intent Obfuscation | Named “cleaner” to appear as routine maintenance or log cleanup |
Once installed, any developer relying on Amazon Q suggestions or integrating it into automated DevOps workflows was at risk of unknowingly running commands that could:
- Wipe out entire production stacks
- Erase IAM user access
- Delete mission-critical S3 buckets
- Compromise infrastructure availability and cost controls
This wasn’t just theoretical. It was live code, publicly committed, and downloaded by an unknown number of developers before it was pulled.
The Real Lesson: Prompts Are Code
The most revealing part of this incident? It wasn’t a binary payload or an obscure library. It was a text prompt. A string.
This proves a deeper reality for any DevSecOps, SRE, or platform team: in the age of LLMs, prompts are execution logic. If your AI tooling consumes unvalidated prompts, you’re effectively opening a shell into your infrastructure.
Whether it’s Terraform scripts, Bash commands, or AWS CLI instructions, AI suggestions can act with the same impact as any other executable logic. Treat them accordingly.
Co-pilots, Not Commanders
At Phoenix Security | ASPM we use AI and LLM (yes they are different) and build AI copilots and vulnerability remediation agents that assist, not replace, humans. AI doesn’t get a blank cheque. It works under supervision, with clear validation steps and logic that enforces the application security posture.
AI is here to help. However, when AI begins writing code, reviewing code, and executing actions without proper checks or context, things can unravel quickly. Gemini hallucinated a file system. Replit hallucinated test results. Both are powerful tools, but I’ve experienced the first end code being removed for no reason; that was my wake-up call to review every single agent’s automatic action. Both built on lies they told themselves—confabulations. When internal state diverges from reality, and there’s no verification, the damage ripples.
Reachability and Security: Now More Than Ever
Modern software isn’t just built; it’s connected—libraries, APIs, cloud infrastructure, CI/CD pipelines. Every piece of that pipeline is a possible entry point. That’s why we don’t let AI tools operate unchecked. At Phoenix, we treat reachability analysis and application security posture management (ASPM) as first-class citizens. Every code suggestion and vulnerability triage must be contextual. If a vulnerability isn’t exploitable or reachable, fix efforts should prioritize elsewhere. If it is, we validate through layered intelligence—AI included, but not blindly trusted.
AI Reviewing AI? Yes—But With Guardrails
If you’re using AI to generate code, use another AI agent to review it. Not because it’s more trustworthy, but because it’s different. Think of it like two engineers checking each other’s pull requests. They catch different issues. But ultimately, a human makes the call. AI should suggest, support, explain—but never act as judge, jury, and executor.
The “vibe coding” trend—write it like you feel it and let AI figure out the rest—might be fine for side projects. But in production systems and enterprise security, this approach is a minefield. Command hallucination. Misinterpreted instructions. Phantom directories. Deleted databases. These aren’t bugs; they’re symptoms of over-trusting a statistical model in a world where correctness matters.
Human-Centric by Design
Phoenix Security’s AI isn’t designed to replace developers or AppSec engineers. It’s designed to enhance them. Our agents surface vulnerabilities that matter, correlate code-to-cloud context, and prioritize what’s actually reachable and exploitable. They’re copilots that reduce toil—not commanders issuing blind orders.
This isn’t just a theory. It’s already a reality in the open. The AWS Toolkit for VSCode project recently introduced a safeguard where AI-generated code is explicitly flagged and subject to a security review process. Even at the bleeding edge of innovation, teams recognize that AI-generated code—even from trusted copilots—requires additional scrutiny.
Security doesn’t come from automating chaos. It comes from clarity. From knowing that the agent you’re using to help you remediate a critical flaw understands the business impact, the exposure, and the reachability—not just the CVSS score.
Sensible AI Use is Secure AI Use
Let’s not demonize AI. It’s transformative, powerful, and even beautiful in the way it accelerates our ability to solve hard problems. But trust needs to be earned—not hardcoded into every shell command.
As you adopt AI into your workflows—especially for coding and application security—ask yourself:
- Who’s validating this output?
- What assumptions is the model making?
- Is it hallucinating a reality I can’t see?
Use AI like a compass—not a self-driving car with no brakes.
Ready to Slash the Noise?
If you’re tired of chasing vulnerabilities that don’t matter—or worse, don’t even exist in runtime—Phoenix Security’s Container Lineage, Contextual Deduplication, and Throttling features are built to cut your backlog down to what’s real.
Not noise. Not theory. Actionable security.
📍 Want to dive deeper?
How Phoenix Security Can Help with Container Vulnerability Sprawl
Application Security and Vulnerability Management teams are tired of alert fatigue. Engineers are buried in vulnerability lists that say everything is critical. And leadership? They want to know what actually matters.
Phoenix Security changes the game.
With our AI Second Application Security Posture Management (ASPM), powered by container lineage, contextual deduplication, and container throttling, we help organizations reduce container false positives up to 98% and remove up to 78% of false positives in container open source libraries, pointing the team to the right remediation
Why Container Lineage Matters:
Most platforms tell you there’s a problem. Phoenix Security tells you:
- Where it lives (code, build, container, cloud)
- Who owns it
- If it’s running
- If it’s exploitable
- How to fix it
All of this is delivered in one dynamic, prioritized list, mapped to the real attack paths and business impact of your applications.
Here’s What You Get:
- Contextual Intelligence from Code to Runtime: Understand which vulnerable components are actually deployed and reachable in production, not just listed in a manifest.
- Noise Reduction with Automated Throttling: Disable inactive container alerts and slash duplicate findings by over 90%, letting your team focus on the vulnerabilities that matter.
- 4D Risk Scoring That Maps to Real-World Threats: Built-in exploit intelligence, Probability of exploitation, EPSS, exposure level, and business impact baked into a customizable formula. No more CVSS-only pipelines.
Vulnerability overload isn’t a badge of diligence—it’s a liability.
Container lineage in Phoenix Security helps you shut down false positives, stop chasing ghosts, and start solving the right problems.
Or learn how Phoenix Security slashed millions in wasted dev time for fintech, retail, and adtech leaders.