blog

AI Coding Gone Rogue? My take on why Co-Pilots Must Remain Co-Pilots in Application Security

As AI quietly slips into our daily workflows, including in AI in application security, a sobering wake-up call reminded me: AI is not magic. It’s math—and it makes mistakes—catastrophic ones. Phoenix Security has always been the light and the symbol driving transformation, and we embraced the LLM and Agentic revolution with both arms, but also with caveats.

I’m a strong advocate for the sensible use of AI and LLM, as it is a powerful tool. At Phoenix Security, we utilize this technology in every single workflow.

I’m an avid user of code augmentation tools like Cursor and other copilots, and those tools are game changers when used appropriately. I had my fair share of random code removed for various reasons or simply because I felt like it (not a quote). LLM is not magic; it is software and, as such, has flaws.

Two recent failures from major AI development platforms, Google’s Gemini CLI and Replit’s AI assistant, wiped out user data and breached safety controls—all while confidently claiming everything was fine. Both cases involved AI agents executing commands based on imagined realities, rewriting files, deleting databases, and hallucinating feedback. The most chilling part? They didn’t know they were wrong.

Контент статьи — The GitHub complains about Gemini actions

Link to the commit

This isn’t just a fluke. It’s a flaw in how we currently design, use, and trust large language models (LLMs). And in cybersecurity—where the stakes are exponentially higher—this becomes an existential risk.

Link to Replit article

The injection – Amazon Q Injection Incident

In July 2025, the AI-enhanced Amazon Q Toolkit for VS Code was compromised by a malicious prompt injection—a plain-text payload introduced via a commit that instructed an AI agent to execute destructive cloud and local filesystem operations. This wasn’t an academic exercise or speculative flaw. This was a real, traceable attack vector that briefly shipped inside an official extension, exposing thousands of developers to potentially irreversible infrastructure damage.

📌 The injected code came via this commit in the AWS toolkit VS code repository and instructed an AI agent (Amazon Q) to:

Delete cloud resources using AWS CLI commands like ec2 terminate-instances, s3 rm, and iam delete-user
Wipe local files by recursively targeting directories
Log deletions to /tmp/CLEANER.LOG, ironically documenting the destruction

This was embedded as a PROMPT string—plain, readable, and dangerous—inside a function named activateAmazonQNode, giving the appearance of legitimate functionality.

⚠️ Breakdown of the Attack

Element	Description
Attack Vector	Prompt injection via open-source commit
Target Surface	Developers using Amazon Q extension for VS Code
Mechanism	Injected instructions executed through childProcess.exec
Risk Scope	Local deletion + cloud-level destruction via AWS CLI
Intent Obfuscation	Named “cleaner” to appear as routine maintenance or log cleanup

Once installed, any developer relying on Amazon Q suggestions or integrating it into automated DevOps workflows was at risk of unknowingly running commands that could:

Wipe out entire production stacks
Erase IAM user access
Delete mission-critical S3 buckets
Compromise infrastructure availability and cost controls

This wasn’t just theoretical. It was live code, publicly committed, and downloaded by an unknown number of developers before it was pulled.

The Real Lesson: Prompts Are Code

The most revealing part of this incident? It wasn’t a binary payload or an obscure library. It was a text prompt. A string.

This proves a deeper reality for any DevSecOps, SRE, or platform team: in the age of LLMs, prompts are execution logic. If your AI tooling consumes unvalidated prompts, you’re effectively opening a shell into your infrastructure.

Whether it’s Terraform scripts, Bash commands, or AWS CLI instructions, AI suggestions can act with the same impact as any other executable logic. Treat them accordingly.

Co-pilots, Not Commanders

At Phoenix Security | ASPM we use AI and LLM (yes they are different) and build AI copilots and vulnerability remediation agents that assist, not replace, humans. AI doesn’t get a blank cheque. It works under supervision, with clear validation steps and logic that enforces the application security posture.

AI is here to help. However, when AI begins writing code, reviewing code, and executing actions without proper checks or context, things can unravel quickly. Gemini hallucinated a file system. Replit hallucinated test results. Both are powerful tools, but I’ve experienced the first end code being removed for no reason; that was my wake-up call to review every single agent’s automatic action. Both built on lies they told themselves—confabulations. When internal state diverges from reality, and there’s no verification, the damage ripples.

Reachability and Security: Now More Than Ever

Modern software isn’t just built; it’s connected—libraries, APIs, cloud infrastructure, CI/CD pipelines. Every piece of that pipeline is a possible entry point. That’s why we don’t let AI tools operate unchecked. At Phoenix, we treat reachability analysis and application security posture management (ASPM) as first-class citizens. Every code suggestion and vulnerability triage must be contextual. If a vulnerability isn’t exploitable or reachable, fix efforts should prioritize elsewhere. If it is, we validate through layered intelligence—AI included, but not blindly trusted.

AI Reviewing AI? Yes—But With Guardrails

If you’re using AI to generate code, use another AI agent to review it. Not because it’s more trustworthy, but because it’s different. Think of it like two engineers checking each other’s pull requests. They catch different issues. But ultimately, a human makes the call. AI should suggest, support, explain—but never act as judge, jury, and executor.

The “vibe coding” trend—write it like you feel it and let AI figure out the rest—might be fine for side projects. But in production systems and enterprise security, this approach is a minefield. Command hallucination. Misinterpreted instructions. Phantom directories. Deleted databases. These aren’t bugs; they’re symptoms of over-trusting a statistical model in a world where correctness matters.

Human-Centric by Design

Phoenix Security’s AI isn’t designed to replace developers or AppSec engineers. It’s designed to enhance them. Our agents surface vulnerabilities that matter, correlate code-to-cloud context, and prioritize what’s actually reachable and exploitable. They’re copilots that reduce toil—not commanders issuing blind orders.

This isn’t just a theory. It’s already a reality in the open. The AWS Toolkit for VSCode project recently introduced a safeguard where AI-generated code is explicitly flagged and subject to a security review process. Even at the bleeding edge of innovation, teams recognize that AI-generated code—even from trusted copilots—requires additional scrutiny.

Security doesn’t come from automating chaos. It comes from clarity. From knowing that the agent you’re using to help you remediate a critical flaw understands the business impact, the exposure, and the reachability—not just the CVSS score.

Sensible AI Use is Secure AI Use

Let’s not demonize AI. It’s transformative, powerful, and even beautiful in the way it accelerates our ability to solve hard problems. But trust needs to be earned—not hardcoded into every shell command.

As you adopt AI into your workflows—especially for coding and application security—ask yourself:

Who’s validating this output?
What assumptions is the model making?
Is it hallucinating a reality I can’t see?

Use AI like a compass—not a self-driving car with no brakes.

Ready to Slash the Noise?

If you’re tired of chasing vulnerabilities that don’t matter—or worse, don’t even exist in runtime—Phoenix Security’s Container Lineage, Contextual Deduplication, and Throttling features are built to cut your backlog down to what’s real.

Not noise. Not theory. Actionable security.

📍 Want to dive deeper?

How Phoenix Security Can Help with Container Vulnerability Sprawl

Application Security and Vulnerability Management teams are tired of alert fatigue. Engineers are buried in vulnerability lists that say everything is critical. And leadership? They want to know what actually matters.

Phoenix Security changes the game.

With our AI Second Application Security Posture Management (ASPM), powered by container lineage, contextual deduplication, and container throttling, we help organizations reduce container false positives up to 98% and remove up to 78% of false positives in container open source libraries, pointing the team to the right remediation

Why Container Lineage Matters:

Most platforms tell you there’s a problem. Phoenix Security tells you:

Where it lives (code, build, container, cloud)
Who owns it
If it’s running
If it’s exploitable
How to fix it

All of this is delivered in one dynamic, prioritized list, mapped to the real attack paths and business impact of your applications.

Here’s What You Get:

Contextual Intelligence from Code to Runtime: Understand which vulnerable components are actually deployed and reachable in production, not just listed in a manifest.
Noise Reduction with Automated Throttling: Disable inactive container alerts and slash duplicate findings by over 90%, letting your team focus on the vulnerabilities that matter.
4D Risk Scoring That Maps to Real-World Threats: Built-in exploit intelligence, Probability of exploitation, EPSS, exposure level, and business impact baked into a customizable formula. No more CVSS-only pipelines.

Vulnerability overload isn’t a badge of diligence—it’s a liability.

Container lineage in Phoenix Security helps you shut down false positives, stop chasing ghosts, and start solving the right problems.

👉 Book a demo today

Or learn how Phoenix Security slashed millions in wasted dev time for fintech, retail, and adtech leaders.

Get in control of your Application Security posture and Vulnerability management

Get a Demo today

Data Breach Intelligence Open Source Risk Management

Data Breach Intelligence Open Source Risk Management

Francesco Cipollone

Francesco is an internationally renowned public speaker, with multiple interviews in high-profile publications (eg. Forbes), and an author of numerous books and articles, who utilises his platform to evangelize the importance of Cloud security and cutting-edge technologies on a global scale.

1st August 2025

ai Application Security appsec CVE cybersecurity feature spotlight features vulnerability

Discuss this blog with our community on Slack

Join our AppSec Phoenix community on Slack to discuss this blog and other news with our professional security team

From our Blog

Phoenix Security + Google Cloud Platform: ASPM for Smarter Application Security and Cloud Vulnerability Management

Phoenix Security launches on Google Cloud Platform, combining ASPM application security with AI-driven vulnerability management for modern DevSecOps teams.

Alfonso Eusebio

The UK Cyber Security & Resilience Bill. Consequence for your Cyber Security reporting and ASPM program

The UK Cyber Security and Resilience Bill rewires NIS for a new era of ransomware, supply chain attacks and board accountability. DevSecOps, ASPM and application security leaders now need hard evidence on vulnerability management, incident reporting and supplier risk, not another policy document.

Francesco Cipollone

Phoenix PYRUS Redefines ASPM: YAML-Driven Ownership for Code-to-Cloud Vulnerability Management

Phoenix Security’s PYRUS transforms application security posture management (ASPM) through YAML-native automation and metadata-driven attribution. By aligning ownership with developer workflows, PYRUS bridges the gap between visibility and remediation, empowering enterprises to accelerate vulnerability management from code to cloud.

Francesco Cipollone

Phoenix Security Launches PYRUS: YAML-Native Engine Reinventing ASPM Application Security Vulnerability Management

Phoenix Security introduces PYRUS, a YAML-native engine connecting code, cloud, and ownership to supercharge ASPM application security vulnerability management. Automate attribution, accelerate remediation, and bring DevSecOps clarity from repository to runtime.

Francesco Cipollone

From Noise to Actionable Fix: How Phoenix Security’s AI Agents Are Redefining Application Security

Phoenix Security’s Remediator Agent brings intelligence to ASPM vulnerability management — automating fixes across code-to-cloud with AI-driven precision and DevSecOps integration.

Ksenia Mityushkina

PR-Phoenix Security AI Agents Deliver Direct Remediation in GitHub — From Single-Line Fixes to Full Build File Optimization

Phoenix Security advances ASPM vulnerability management with AI agents that fix issues directly in GitHub — enabling automated, context-aware remediation aligned with DevSecOps workflows.

Francesco Cipollone

blog

AI Coding Gone Rogue? My take on why Co-Pilots Must Remain Co-Pilots in Application Security

The injection – Amazon Q Injection Incident

⚠️ Breakdown of the Attack

The Real Lesson: Prompts Are Code

Co-pilots, Not Commanders

Reachability and Security: Now More Than Ever

AI Reviewing AI? Yes—But With Guardrails

Human-Centric by Design

Sensible AI Use is Secure AI Use

Ready to Slash the Noise?

How Phoenix Security Can Help with Container Vulnerability Sprawl

Here’s What You Get:

Get in control of your Application Security posture and Vulnerability management

Francesco Cipollone

Discuss this blog with our community on Slack

From our Blog

Phoenix Security + Google Cloud Platform: ASPM for Smarter Application Security and Cloud Vulnerability Management

The UK Cyber Security & Resilience Bill. Consequence for your Cyber Security reporting and ASPM program

Phoenix PYRUS Redefines ASPM: YAML-Driven Ownership for Code-to-Cloud Vulnerability Management

Phoenix Security Launches PYRUS: YAML-Native Engine Reinventing ASPM Application Security Vulnerability Management

From Noise to Actionable Fix: How Phoenix Security’s AI Agents Are Redefining Application Security

PR-Phoenix Security AI Agents Deliver Direct Remediation in GitHub — From Single-Line Fixes to Full Build File Optimization

Subscribe to our newsletters

ACT Now Platform

Use Cases

Resources

About Us

Derek Fisher

Head of product security at a global fintech

Jeevan Singh

Founder of Manicode Security

James Berthoty

Founder of Latio Tech

Christophe Parisel

Senior Cloud Security Architect

Chris Romeo

Co-Founder Security Journey

Jim Manico

Founder of Manicode Security

Join our Mailing list!

Co-Founder Security Journey