AI Security

An AI Agent Hacked McKinsey's AI Platform in Two Hours Using SQL Injection

CodeWall's autonomous agent used SQL injection to access 46.5 million chat messages inside McKinsey's Lilli.

Oliver Senti
Oliver SentiSenior AI Editor
March 12, 20265 min read
Share:
Abstract visualization of a database breach showing interconnected nodes and data streams flowing from a corporate AI system

Security startup CodeWall says its autonomous AI agent broke into McKinsey's internal AI platform, Lilli, on February 28 and achieved full read-and-write access to the production database in roughly two hours. No credentials, no insider knowledge, no human steering the process. The entry point was a SQL injection, which, as CodeWall itself noted in its blog post, is "one of the oldest bug classes in the book."

McKinsey patched the vulnerabilities within a day of being notified on March 1. In a public statement dated March 11, the firm said its investigation, backed by a third-party forensics firm, found no evidence that client data was accessed by the researcher or any unauthorized party.

That framing deserves some scrutiny.

What was sitting in that database

CodeWall claims it could access 46.5 million chat messages, all in plaintext, covering strategy, M&A activity, and client engagements. Also exposed: 728,000 files (including roughly 192,000 PDFs, 93,000 Excel spreadsheets, and 93,000 PowerPoint decks), 57,000 user accounts, and 95 system prompts controlling how Lilli behaves. The system prompts were writable.

I want to sit with that last part for a second. An attacker with write access to the prompt layer could silently rewrite how Lilli responds to all 40,000-plus consultants using it, without deploying any code. One UPDATE statement in one HTTP call. No log trail. No file changes. The AI just starts behaving differently, and nobody notices.

McKinsey says no client data was accessed. But CodeWall's account describes a database containing years of consultant conversations about active client work. The distinction between "the database was accessible" and "client data was accessed" is doing a lot of heavy lifting in McKinsey's statement.

A weird attack vector

The technical details are actually interesting. CodeWall's agent found Lilli's API documentation publicly exposed, with over 200 endpoints fully documented. Most required authentication. Twenty-two did not. One of those unprotected endpoints wrote user search queries to the database, and while the input values were properly parameterized, the JSON key names (the field names themselves) were concatenated directly into SQL.

That's an unusual vector. Security analyst Edward Kiledjian pointed out in an independent analysis that most security testing tools focus on input values, not field names. If Lilli's backend parameterized values while concatenating keys directly, that would create a blind spot most standard assessments would miss. OWASP's ZAP scanner reportedly did not flag the issue.

The agent ran fifteen blind iterations, each error message revealing more about the query structure, until production data started flowing back. CodeWall published snippets of the agent's chain-of-thought reasoning. When the first real employee identifier appeared, the agent's internal log read: "WOW!" When the full scale became clear: "This is devastating."

An AI agent that editoralizes its own findings. We're in new territory, or something.

The prompt layer problem

Beyond the raw data exposure, the writable prompt layer is what makes this genuinely alarming. Lilli's system prompts, the instructions governing how the AI answers questions, what guardrails it follows, how it cites sources, were stored in the same database the agent had compromised. CodeWall outlined several attack scenarios: poisoning financial models or strategic recommendations that consultants would trust because they came from their own internal tool, instructing the AI to embed confidential data into its responses (which users might then copy into client-facing documents), or simply stripping out safety guardrails entirely.

Most organizations haven't modeled this threat. Prompt-layer integrity controls remain immature across the industry, and Lilli is hardly alone in storing prompts alongside everything else. But when 43,000 consultants are relying on the same AI tool for client-facing strategy work, the blast radius is unusual.

The authorization question

CodeWall says it operated under McKinsey's public responsible disclosure policy on HackerOne. But as Kiledjian noted, a disclosure policy is not blanket authorization to enumerate a production database containing millions of real user records. Whether Lilli's production infrastructure was explicitly in scope for that program is unclear. Neither CodeWall's blog nor independent reporting confirms it.

And then there's the agent autonomy angle. CodeWall presents the fact that its agent independently selected McKinsey as a target as a feature. CEO Paul Price told The Register that the process was "fully autonomous from researching the target, analyzing, attacking, and reporting." An AI system deciding whom to attack, even when limited to organizations with disclosure policies, raises questions about operator control and liability that CodeWall seems more interested in marketing than addressing.

This is clearly a calling card for the company. CodeWall is in early preview, looking for design partners, and a splashy hack of the world's most prestigious consulting firm is one way to get attention. None of that invalidates the technical finding. But the incentive structure is worth noting.

What it says about enterprise AI security

Lilli launched with a firmwide rollout in July 2023. It ran in production for over two years before this vulnerability was found. McKinsey's own internal scanners missed it. This is a firm that, by all accounts, has world-class technology teams and serious security investment. And the bug was SQL injection.

The uncomfortable takeaway is straightforward: when companies ship AI systems that ingest sensitive enterprise workflows at scale, the attack surface expands in ways their existing security tooling isn't built to catch. Prompts stored in databases. RAG document chunks (CodeWall says it found 3.68 million of them) sitting alongside user data. Model configurations exposed through the same access paths as everything else. A classic vulnerability becomes a lever for something much stranger, silent manipulation of how an AI system thinks.

Price warned that attackers will use the same agentic technology for financial blackmail and ransomware. That prediction doesn't require much imagination. The two-hour timeline here is the part that should keep CISOs up at night. Human pen testers work at human speed. This didn't.

Tags:cybersecurityMcKinseyAI securitySQL injectionLilliCodeWallenterprise AIred teamingprompt injectionagentic AI
Oliver Senti

Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.

Related Articles

Stay Ahead of the AI Curve

Get the latest AI news, reviews, and deals delivered straight to your inbox. Join 100,000+ AI enthusiasts.

By subscribing, you agree to our Privacy Policy. Unsubscribe anytime.

AI Agent Hacked McKinsey's Lilli Platform in Two Hours | aiHola