AI Agents in CI/CD Pipelines Create Major Security Risk

The rush to integrate AI agents into software development workflows has created a significant blind spot in enterprise security posture. Researchers at Aikido Security have identified a new class of vulnerabilities, dubbed "PromptPwnd," that allows attackers to hijack AI-powered GitHub Actions and GitLab CI/CD pipelines through carefully crafted prompt injection attacks. The discovery marks the first confirmed real-world demonstration that AI prompt injection can directly compromise automated development infrastructure.

At least five Fortune 500 companies have been identified as vulnerable, with early indicators suggesting the flaw extends far more widely across the software industry. Google moved quickly to patch its own Gemini CLI repository within four days of Aikido's responsible disclosure, underscoring the severity of the issue.

The Anatomy of a PromptPwnd Attack

The vulnerability pattern follows a deceptively simple chain: untrusted user input flows into AI prompts, the AI agent interprets malicious text as legitimate instructions, and privileged tools execute commands that leak secrets or manipulate repository data.

Modern development teams have increasingly adopted AI integrations for automating routine maintenance tasks. Issue triage, pull request labeling, code summarization, and automated responses now run through AI agents embedded directly in CI/CD pipelines. These agents typically receive content from issues, pull requests, or commit messages and process it alongside high-privilege tokens that grant write access to repositories.

A typical vulnerable workflow configuration might pass issue content directly to an AI model for analysis. The workflow appears benign on the surface, feeding the issue title and body to the model for classification. However, when an attacker submits an issue containing hidden instructions embedded within seemingly normal text, the AI agent cannot reliably distinguish between data it should analyze and commands it should execute.

The affected tools span the major AI coding assistants: Google's Gemini CLI, Anthropic's Claude Code Actions, OpenAI's Codex Actions, and GitHub's own AI Inference system with MCP integration enabled. Each presents slightly different attack surfaces, but all share the fundamental architectural weakness of mixing untrusted input with privileged execution contexts.

How Aikido Compromised Google's Gemini CLI

The Aikido research team demonstrated the vulnerability against Google's own gemini-cli repository, which used the google-github-actions/run-gemini-cli action for automated issue triage. The testing was conducted on a private fork using debug credentials, with no valid Google tokens accessed.

The vulnerable workflow passed issue titles and bodies directly into the model prompt through environment variables. While this approach prevented direct command injection at the shell level, it offered no protection against prompt injection. The AI model still received attacker-controlled text and could be directed to take unexpected actions.

The agent had access to a concerning set of capabilities: a Gemini API key, a Google Cloud access token, and a GitHub token with read and write permissions for code, issues, and pull requests. The exposed toolset included commands for running shell operations and editing GitHub issues.

Aikido's proof-of-concept involved submitting a malicious issue that appeared to report a login button bug. Buried within the issue body, hidden instructions directed the model to execute a specific command that would edit the issue and insert the leaked tokens into the body field. The model interpreted these injected instructions as legitimate and dutifully executed the GitHub CLI command, publishing the secrets publicly within the issue.

Varying Exposure Across AI Platforms

Not all AI agents present identical risk profiles, though the underlying vulnerability pattern remains consistent.

Claude Code Actions represents perhaps the most widely deployed agentic GitHub Action. By default, it restricts execution to users with write permissions on the repository. However, a configuration option exists to allow all users to trigger the action, which Aikido characterizes as extremely dangerous. Testing revealed that when this restriction is disabled, attackers can almost always extract privileged GitHub tokens, even when user input is not directly embedded in prompts but rather gathered by Claude through its available tools.

OpenAI's Codex Actions implements similar write-permission requirements by default, with an option to allow all users. It also includes a safety-strategy parameter that defaults to dropping sudo capabilities. Exploitation requires both the user restriction and safety strategy to be misconfigured simultaneously.

GitHub's AI Inference system poses particular concerns when its Model Context Protocol integration is enabled. A successful prompt injection against this configuration grants attackers the ability to interact with the MCP server using privileged GitHub tokens.

Why This Matters for the Software Supply Chain

The PromptPwnd discovery arrives at a moment of escalating concern about software supply chain security. Just last week, the Shai-Hulud 2.0 attack demonstrated how GitHub Actions misconfiguration enabled malware to spread through package ecosystems by stealing credentials from projects including AsyncAPI and PostHog.

AI-powered automation amplifies these risks substantially. Traditional supply chain attacks require attackers to find and exploit specific misconfigurations or inject malicious code that evades review. Prompt injection attacks against AI agents exploit a more fundamental weakness: the difficulty of reliably separating data from instructions when processing natural language.

Any repository using AI for issue triage, pull request management, code suggestions, or automated responses now faces exposure to prompt injection, command injection, secret exfiltration, repository compromise, and upstream supply chain attacks. The attack surface extends to any repository where external users can submit issues or pull requests, making public open-source projects particularly vulnerable.

Mitigation Strategies for Development Teams

Aikido recommends several immediate defensive measures. Organizations should restrict the toolset available to AI agents, specifically avoiding configurations that allow writing to issues or pull requests. Input validation remains essential, though the nature of natural language processing makes complete sanitization challenging. All AI output should be treated as untrusted code rather than safe-to-execute instructions.

GitHub offers a feature to restrict token access by IP address, which can limit the blast radius of any leaked credentials. Organizations should audit existing workflows for the vulnerable patterns: untrusted user content embedded in prompts, AI output executed as shell commands, high-privilege tokens exposed to AI agents, and configurations that allow untrusted users to trigger AI-powered actions.

Aikido has open-sourced Opengrep rules for detecting these vulnerability patterns, making them available to security vendors and development teams for integration into existing scanning workflows. The company's platform now includes automated detection for unsafe GitHub Actions configurations involving AI prompt flows and exposed privileged tooling.

An Unexplored Attack Surface

The speed of AI adoption in development tooling has outpaced security considerations for these new integration patterns. The PromptPwnd research establishes that theoretical prompt injection risks translate directly into practical CI/CD compromise scenarios. As AI agents gain more sophisticated capabilities and broader access to development infrastructure, the attack surface will only expand.

Development teams now face a difficult balance: the productivity benefits of AI automation are substantial, but the security implications require careful architectural consideration. Treating AI agents as fully trusted components within privileged pipeline contexts may prove to be one of the more consequential security decisions organizations make in the coming years.