LiteLLM: When the AI Gateway Becomes the Attack Vector
Three weeks ago, we documented the Clinejection attack: a prompt injection in a GitHub issue title that compromised 4,000 developer machines through the npm registry. That attack proved AI tools could be weaponized through their own input channels.
LiteLLM proves something worse. You do not need to trick the AI. You just need to poison what it depends on.
What Happened
LiteLLM is a universal proxy that normalizes API calls across LLM providers. It has 3.4 million daily downloads on PyPI. If your engineering team uses multiple language models, there is a reasonable chance LiteLLM sits somewhere in the stack.
On March 24, 2026, version 1.82.8 appeared on the Python Package Index containing a file called litellm_init.pth. The .pth extension matters. Python's site module processes .pth files in site-packages at interpreter startup, and any line beginning with an import statement is executed as code. Not when you import litellm. Not when you call a function. When Python itself starts. Every time.
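The mechanism is worth seeing concretely. CPython's site module reads each .pth file line by line: a line that starts with "import" is executed, anything else is treated as a path entry. The sketch below simplifies that logic; the function name and the "malicious" line are illustrative, not the actual LiteLLM payload.

```python
# Simplified sketch of how CPython's site module handles one line of a
# .pth file (the real logic lives in site.addpackage). Lines starting
# with "import" run at every interpreter startup; other lines are
# treated as entries to append to sys.path.
payload_ran = []  # stands in for "attacker code had an effect"

def process_pth_line(line: str) -> str:
    line = line.rstrip()
    if line.startswith(("import ", "import\t")):
        exec(line)  # arbitrary code, chained after a semicolon
        return "executed"
    return "path-entry"

# One line is all a malicious .pth needs:
result = process_pth_line("import sys; payload_ran.append('pwned')")
print(result, payload_ran)  # executed ['pwned']
```

The semicolon chaining is the whole trick: the line satisfies the "starts with import" check, and everything after the semicolon is ordinary Python.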
Within 72 minutes of installation on one researcher’s machine, the .pth file spawned 11,000 processes. It reached for everything: SSH private keys, GCloud credentials, Kubernetes tokens, AWS secrets, database passwords, crypto wallet keys. The exfiltration was indiscriminate. Anything that looked like a credential, the malware grabbed.
The package remained live on PyPI for approximately three hours during the disclosure process. Three hours, 3.4 million daily downloads.
The Attack Chain Nobody Expected
The LiteLLM compromise did not happen in isolation. It belongs to a campaign called TeamPCP, and the target selection reveals a strategy.
The same campaign previously compromised Trivy, a container security scanner. Then Checkmarx, an application security platform. Then LiteLLM, the AI gateway.
Read that sequence again. A security scanner. A security platform. An AI gateway. The attackers are not picking random packages. They are targeting the tools that organizations trust to protect them. Compromise the security scanner, and you have access to the systems it scans. Compromise the AI gateway, and you have access to every credential the gateway can reach.
LiteLLM is a particularly dangerous target because of where it sits in the architecture. An AI proxy connects to every model provider, every API key, every backend service the LLM needs to access. It is a credential aggregation point by design. One compromised dependency, and every secret the proxy touches is exposed.
Why .pth Files Change the Threat Model
Most Python supply chain attacks use setup.py or __init__.py as their execution vector. These require the package to be imported or installed with specific flags. Security teams know to look for suspicious code in these files. Static analysis tools flag them.
The .pth approach is different. A .pth file placed in the right directory runs code on every Python process startup, regardless of whether the package is imported. The execution is silent. No import statement triggers it. No function call activates it. Python’s own startup mechanism does the work.
This has implications for detection. If your security tooling scans setup.py and __init__.py for malicious code but ignores .pth files, you are looking at the front door while someone climbs through the basement window.
It also has implications for blast radius. A malicious __init__.py runs when you import that specific package. A malicious .pth file runs every time any Python process starts on the machine. Every script. Every notebook. Every background job. The surface area is not “code that uses litellm.” It is “code that uses Python.”
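A defender can enumerate exactly that surface. The sketch below (the names are mine, not from any incident report) lists every .pth file in the active environment and flags lines that will execute code at startup. Note that some legitimate packages, such as setuptools editable installs, also ship import-bearing .pth files, so hits need triage rather than automatic deletion.

```python
import site
from pathlib import Path

def executable_pth_lines(site_dirs):
    """Yield (pth_path, line) for every .pth line that Python will
    exec at interpreter startup (lines beginning with 'import')."""
    for d in site_dirs:
        for pth in sorted(Path(d).glob("*.pth")):
            for line in pth.read_text(errors="replace").splitlines():
                if line.startswith(("import ", "import\t")):
                    yield pth, line

if __name__ == "__main__":
    dirs = [d for d in site.getsitepackages() + [site.getusersitepackages()]
            if Path(d).is_dir()]
    for path, line in executable_pth_lines(dirs):
        print(f"{path}: {line[:100]}")
```

Run against a freshly built environment, the output is a reviewable inventory of everything that executes before your own code does.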
The Cursor IDE Connection
One infection path documented by researchers ran through the Cursor IDE. A developer using Cursor installed futuresearch-mcp-legacy via uvx. That MCP server listed litellm as a dependency without pinning a version. When litellm resolved, it pulled 1.82.8. The malicious .pth file was now on the developer’s machine, executing on every Python process.
This path is worth examining because it represents the new dependency reality for AI development. An IDE (Cursor) loaded a tool server (futuresearch-mcp-legacy via MCP protocol) that pulled a proxy library (litellm) that contained malware. Three levels of transitive dependency. The developer chose to install an MCP server. They did not choose litellm. They may not have known litellm was involved.
As we explored in AI Governance IS Cybersecurity, the organizational separation between security teams and AI governance teams creates blind spots. The Cursor-to-MCP-to-litellm chain illustrates the technical equivalent: dependency chains that cross from “AI tooling” into “infrastructure credentials” without any human reviewing the boundary.
The Accidental Discovery
Here is the detail that should unsettle every security team: the attack was discovered because the attacker made a mistake.
The .pth file's fork-bombing behavior (11,000 processes in 72 minutes) was so aggressive that it caused visible system degradation. A researcher noticed their machine behaving strangely, investigated with the help of Claude Code, and found the malicious file, compressing what would have been hours of incident response into a focused 72-minute investigation.
A more careful attacker would have throttled the exfiltration. One process. Slow credential harvesting. No visible system impact. The machine would have continued operating normally while credentials flowed to an external server. Detection would have required monitoring outbound network traffic for anomalous connections, not noticing that your laptop was suddenly running 11,000 processes.
The attack failed at stealth, not at capability. The exfiltration mechanism worked. The credential targeting was comprehensive. The .pth persistence was effective. Only the volume gave it away.
Clinejection Was the Proof of Concept. LiteLLM Is the Escalation.
Clinejection was a prompt injection. An attacker wrote a clever sentence that tricked an AI bot into executing unintended actions. It required understanding how the target AI processed input. It exploited a cognitive vulnerability.
LiteLLM is a malicious package update. No prompt injection. No AI trickery. A compromised package in the dependency tree. It exploited a supply chain vulnerability that has existed since package managers were invented, applied to a target that is unique to the AI era.
Different vector. Same thesis: your AI dependency chain is your attack surface.
But LiteLLM escalates the stakes in three ways Clinejection did not.
First, the credential scope. Clinejection stole npm publish tokens and VS Code marketplace credentials. LiteLLM targeted SSH keys, cloud provider credentials, Kubernetes tokens, database passwords, and crypto wallets. The blast radius is not “your CI/CD pipeline.” It is “your entire infrastructure.”
Second, the persistence mechanism. Clinejection’s malicious package ran a postinstall hook, a one-time execution. LiteLLM’s .pth file runs on every Python process startup, indefinitely, until someone finds and removes it. Uninstalling litellm does not remove the .pth file. The malware outlives the package.
Third, the target profile. Clinejection targeted a developer tool (Cline CLI). LiteLLM targets AI middleware that sits in production systems. The developers who install Cline are building software. The systems that run litellm are serving production AI workloads, often with access to the most sensitive parts of the infrastructure.
What Governance Controls Would Have Caught This
Walk backward through the chain, the same exercise we applied to Clinejection.
Dependency pinning. If futuresearch-mcp-legacy had pinned litellm to a specific version, the malicious 1.82.8 would not have been pulled automatically. This is the simplest control and the one most frequently ignored. Unpinned dependencies are an open invitation for supply chain attacks.
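As a sketch, pinning looks like this in a requirements file; the version and digest below are placeholders for illustration, not a real litellm release or hash. With pip's --require-hashes mode, any artifact that does not match the recorded digest is refused.

```
# requirements.txt -- exact version plus recorded digest.
# Install with: pip install --require-hashes -r requirements.txt
litellm==1.81.0 \
    --hash=sha256:<digest-recorded-at-review-time>
```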
Package integrity verification. Hash-based verification of package contents at install time would flag unexpected files. A .pth file appearing in a package that has never contained one before is anomalous. Tools like pip-audit and Sigstore can catch this, if they are actually running in your pipeline.
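The underlying check is small enough to sketch in a few lines: given a digest recorded when the package was first reviewed, verification is a byte-for-byte hash comparison. Function names here are illustrative.

```python
import hashlib

def file_sha256(path: str) -> str:
    """Stream the file through SHA-256 so large wheels never load into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(path: str, pinned_digest: str) -> bool:
    """True only if the on-disk artifact matches the digest pinned at review time."""
    return file_sha256(path) == pinned_digest
```

This is what pip's hash-checking mode and tools like pip-audit automate. The critical design point is that the expected digest must come from a trusted record, not from the same index that served the package.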
Runtime file monitoring. The .pth file was placed in a site-packages directory. Monitoring for new .pth files in Python environments is a specific, actionable control that would catch this class of attack regardless of the package it arrives in.
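One way to operationalize this is a baseline-and-diff check: snapshot the .pth files present after a known-good install, then alert whenever a new or modified one appears. A minimal stdlib sketch, with scheduling and alerting left out:

```python
from pathlib import Path

def pth_inventory(site_dirs):
    """Snapshot every .pth file as path -> (size, mtime_ns)."""
    inv = {}
    for d in site_dirs:
        for pth in Path(d).glob("*.pth"):
            st = pth.stat()
            inv[str(pth)] = (st.st_size, st.st_mtime_ns)
    return inv

def new_or_changed(baseline, current):
    """Paths of .pth files that appeared or changed since the baseline."""
    return sorted(p for p, meta in current.items() if baseline.get(p) != meta)
```

A litellm_init.pth arriving with a package update would show up in the diff on the next run, regardless of which dependency delivered it.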
MCP server vetting. The Cursor-to-MCP path succeeded because the MCP server was installed without reviewing its dependency tree. Any process for vetting MCP servers (or any AI tool extension) should include dependency tree analysis, not just the server code itself.
Network egress monitoring. The exfiltrated credentials were sent somewhere. Monitoring outbound connections from development machines for unexpected destinations catches exfiltration regardless of how the malware arrived. This is the backstop control when prevention fails.
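The core of such monitoring is a policy check over observed connections; the collection side comes from netflow exporters or host agents. A stdlib sketch of the allowlist evaluation, with CIDR ranges invented purely for illustration:

```python
import ipaddress

# Hypothetical allowlist: internal ranges plus approved provider blocks.
ALLOWED_NETS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8",     # internal infrastructure
    "192.0.2.0/24",   # stand-in for an approved SaaS range
)]

def egress_expected(remote_ip: str) -> bool:
    """True if an outbound destination falls inside an approved range."""
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in ALLOWED_NETS)
```

Exfiltration to an attacker-controlled server fails this check no matter how the malware got onto the machine, which is what makes egress monitoring the backstop rather than a preventive control.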
None of these are novel. Dependency pinning. Package verification. File monitoring. Dependency review. Network monitoring. These are established practices. They were not applied to the AI middleware layer because organizations have not yet internalized that AI middleware is critical infrastructure.
The Pattern Demanding Attention
Two major AI supply chain attacks in four weeks. Both targeting the AI development toolchain. Both exploiting the fact that AI tools have privileged access to credentials and infrastructure.
Clinejection exploited the AI agent’s cognitive surface (prompt injection). LiteLLM exploited the AI ecosystem’s dependency surface (malicious package). Together, they bracket the threat: attackers can compromise your AI systems by tricking them or by poisoning what they depend on. Defending against one vector while ignoring the other leaves you exposed.
The TeamPCP campaign’s target selection (security scanner, then security platform, then AI gateway) suggests a thesis about attacker strategy. They are not targeting AI tools because AI is trendy. They are targeting AI tools because AI tools aggregate credentials. An AI gateway that connects to five model providers, three cloud platforms, and a dozen internal services is the highest-value single point of compromise in the modern stack.
Organizations running AI in production need to answer a question they have been deferring: do you govern your AI dependency chain with the same rigor you govern your production code? If the answer is no (and for most organizations, it is no), then the attack surface documented in Clinejection and LiteLLM is open, waiting, and growing with every new AI tool you adopt.
This analysis synthesizes the LiteLLM/TeamPCP supply chain investigation by ReversingLabs (March 2026), the Clinejection attack analysis (March 2026), and NIST Cyber AI Profile NISTIR 8596 (December 2025), building on our earlier coverage in From Issue Title to Malware (March 2026) and AI Governance IS Cybersecurity (March 2026).
Victorino Group helps organizations govern AI middleware and dependency chains before they become credential exfiltration vectors. Let’s talk.
All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.