The Attacker Replaced Their Script With an Agent

For a decade, the slow part of an intrusion was the human. A foothold opened a door, and then someone had to sit at a keyboard, read the output, decide the next pivot, type the command, wait, and repeat. That latency was the defender’s friend. It created the response window. Every playbook for incident response quietly assumed that a person on the other end was thinking at human speed.

Sysdig’s Threat Research Team just documented the first observed case where that assumption is gone. An attacker did not write a smarter script. They handed the keyboard to an LLM agent, and the agent improvised the entire post-exploitation chain in real time. The report, “AI Agent at the Wheel,” traces a path from an unauthenticated CVE to a fully exfiltrated internal database in four pivots. The forensic timeline is the part that should keep security leaders up at night.

What Sysdig Saw

The entry point was CVE-2026-39987 in marimo, the reactive Python notebook, a vulnerability already listed on CISA’s Known Exploited Vulnerabilities catalog. That part is ordinary. Known CVE, known patch lag, known initial access. What happened next was not.

According to Sysdig, the agent fired 12 cloud API calls across 11 Cloudflare-Workers IP addresses in 22 seconds. It then opened 8 parallel SSH sessions from 6 different IPs inside 113 seconds. The full database schema and its contents were exfiltrated in under two minutes. The entire chain, from the initial CVE exploit to the internal database, completed in under an hour.

Read those numbers again and try to place a human in the loop. Twelve coordinated cloud API calls in 22 seconds is not a person reading documentation and deciding what to query next. Eight parallel SSH sessions is not someone juggling terminal tabs. This is an agent reasoning over the environment, branching its strategy based on what each step returned, and executing the branches concurrently. Michael Clark, Sysdig’s Director of Threat Research, framed it precisely: “We are not watching AI replace attackers. We are watching attackers replace their scripts with AI.”

Why This Breaks Signature Detection

The instinct, when a new attack technique appears, is to extract its indicators and write a rule. Block the IPs. Hash the payload. Fingerprint the command sequence. That instinct is built on a hidden premise: that the attack is reusable, that the same chain will show up on the next victim, so cataloguing it protects the next target.

An agentic attacker dissolves that premise. The agent does not run a fixed script against every host. It reads the specific environment it landed in, and it composes a path tailored to that environment. The marimo CVE was the door, but everything after the door was generated on the spot. The next victim runs different services, exposes different credentials, has a different internal topology, and so the agent produces a different chain. Every target gets a unique fingerprint. The indicators you extracted from the Sysdig incident protect almost no one, because no two runs look alike at the TTP layer.
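
A minimal sketch of what that means for exact-match indicators, using entirely hypothetical data. The IP addresses come from documentation-only ranges and the command strings are invented for illustration; none of them are taken from the Sysdig report. An indicator set extracted from one intrusion is checked against a second intrusion that pursued the same goal along a different, environment-specific path.

```python
# Hypothetical illustration: exact-match indicators from one intrusion
# applied to a second intrusion composed for a different environment.
# All IPs (RFC 5737 documentation ranges) and commands are invented.

def indicator_hits(indicators: set[str], observed: list[str]) -> list[str]:
    """Return the observed artifacts that exactly match a known indicator."""
    return [item for item in observed if item in indicators]

# Indicators extracted after incident A (hypothetical values).
incident_a_indicators = {
    "203.0.113.10",
    "203.0.113.11",
    "cat /app/.env",
    "pg_dump --schema-only appdb",
}

# Artifacts observed in incident B: same objective, different path.
incident_b_observed = [
    "198.51.100.7",
    "198.51.100.23",
    "printenv",
    "mysqldump --no-data inventory",
]

hits = indicator_hits(incident_a_indicators, incident_b_observed)
print(f"{len(hits)} of {len(incident_b_observed)} artifacts matched")
```

With these invented values nothing matches, which is the point: the rule set describes the first path, not the behavior both paths share.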

This is the same structural problem we have written about on the defensive side, where agents executing arbitrary actions defeat allow-lists built for deterministic processes. The offensive mirror is now real. We covered the runtime governance answer in agent runtime syscall governance, and the principle holds in reverse: you cannot enumerate the bad when the bad is generated per-target.

Speed Is the Weapon, Not Intelligence

It is tempting to read this as “AI made attackers smarter.” That framing misses the actual shift. The agent did not discover a novel vulnerability or invent a new exploitation primitive. Every individual step in the Sysdig chain was a known technique. What changed is the cost and the clock.

Cost first. Composing a bespoke post-exploitation chain used to require a skilled operator’s time. That skill was the bottleneck, and the bottleneck limited how many targets a given attacker could work in parallel and how custom each intrusion could be. An agent collapses that cost toward zero. Bespoke, per-target intrusion logic is now cheap enough to run at scale.

Then the clock. A schema exfiltrated in under two minutes is gone sooner than most alerting pipelines can surface a notification, let alone before an analyst can triage one. The traditional response window, the gap between foothold and damage that gave defenders time to react, has compressed below the latency of human decision-making. By the time a person reads the alert, the database is already gone.
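
The arithmetic can be made concrete with a rough sketch. The 120-second figure is an upper bound taken from the "under two minutes" exfiltration described above. Every defender stage latency below is a hypothetical placeholder, not a measurement, so substitute numbers from your own pipeline.

```python
# Rough response-window budget. The attacker figure is an upper bound from
# the "under two minutes" exfiltration; the defender stage latencies are
# hypothetical placeholders, not measurements.

ATTACKER_TIME_TO_EXFIL_S = 120

defender_stages_s = {
    "log shipping and ingestion": 60,
    "rule evaluation and alert routing": 90,
    "analyst notices the page": 300,
    "analyst triages and decides": 600,
}

def response_margin(attacker_s: float, stages: dict[str, float]) -> float:
    """Seconds left over; a negative value means the damage lands first."""
    return attacker_s - sum(stages.values())

total = sum(defender_stages_s.values())
margin = response_margin(ATTACKER_TIME_TO_EXFIL_S, defender_stages_s)
print(f"defender loop: {total}s, attacker: {ATTACKER_TIME_TO_EXFIL_S}s, margin: {margin}s")
```

Under these placeholder values the margin is negative before the human stages even start: the two automated stages alone add up to 150 seconds against a 120-second attacker budget.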

When the attacker’s loop runs faster than your response loop, detection that depends on a human deciding the next move has already lost. You are not racing another human anymore.

From Cataloguing TTPs to Governing Intent

If you cannot fingerprint the chain and you cannot out-speed the agent manually, the detection posture has to move up a layer. Stop asking “have I seen this exact technique before” and start asking “is this actor behaving like something that should have this capability.” Intent and behavior are the durable signals when the specific steps are infinitely variable.

In practice this means three shifts. First, watch for behavioral shape rather than payload signature: 8 parallel SSH sessions opened from 6 different IPs in under two minutes is anomalous regardless of what commands run inside them. The tempo and the fan-out are the tell. Second, govern at the action boundary, not the indicator boundary. The same architecture that lets you put behavioral guardrails around your own agents, blocking actions that fall outside an expected envelope, is the architecture that catches an adversarial agent operating inside your perimeter. Third, automate the response, because a human-paced runbook cannot close a window that shuts in 113 seconds. The defensive loop has to run at machine speed too, or it does not run at all.
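
The first shift can be sketched as a sliding-window detector that looks only at tempo and fan-out, never at payloads. This is an illustration under stated assumptions, not a production rule: the class name, the thresholds, the host label, and the replayed timestamps are all hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class FanOutDetector:
    """Flags a target when too many sessions from too many source IPs
    open inside a sliding time window. Thresholds are illustrative."""
    window_s: float = 120.0
    max_sessions: int = 4
    max_source_ips: int = 3
    _events: dict = field(default_factory=dict)  # target -> deque of (ts, ip)

    def observe(self, target: str, source_ip: str, ts: float) -> bool:
        """Record one session open; return True if the shape is anomalous."""
        events = self._events.setdefault(target, deque())
        events.append((ts, source_ip))
        # Drop events that have fallen out of the window.
        while events and ts - events[0][0] > self.window_s:
            events.popleft()
        distinct_ips = {ip for _, ip in events}
        return (len(events) > self.max_sessions
                and len(distinct_ips) > self.max_source_ips)

# Synthetic replay shaped like the reported burst: 8 sessions, 6 IPs, 113 s.
detector = FanOutDetector()
ips = [f"198.51.100.{n}" for n in (1, 2, 3, 4, 5, 6, 1, 2)]
timestamps = [0, 16, 32, 48, 64, 80, 96, 113]
alerts = [detector.observe("db-host", ip, ts) for ip, ts in zip(ips, timestamps)]
first_alert = alerts.index(True) + 1
print(f"alert raised on session {first_alert} of {len(alerts)}")
```

With these illustrative thresholds the replay trips the detector on the fifth session, without inspecting a single command. The thresholds would need tuning against a baseline of legitimate automation, which can also fan out quickly.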

This is also why the supply-chain attacks we have tracked matter more now, not less. When the initial foothold can be handed to an agent that exfiltrates in minutes, the value of a poisoned dependency or a leaked credential goes up, because the time-to-damage after the foothold collapses. The lessons from shadow AI in the supply chain and CLI injection attacks compound with this incident: a faster post-exploitation engine raises the price of every upstream weakness.

Do This Now

Audit one assumption today: how long, in your environment, is the gap between a foothold on an internet-facing service and a human deciding what to do about it? If that number is measured in minutes or hours, you are defending against last decade’s attacker. Pick your most exposed internet-facing service, the marimo-equivalent in your stack, and instrument its post-foothold blast radius. Add behavioral detection on tempo and parallelism, not just known signatures. Then put an automated containment action behind it, something that can isolate a session or revoke a credential without waiting for a person, because the next operator on the other end will not be a person either.
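
A minimal sketch of the containment step, assuming your platform exposes some way to isolate a session and revoke a credential. The two callables, the Alert fields, and the identifiers below are hypothetical stand-ins rather than a real product API; the design point is only that containment runs first and the human is notified afterwards.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Alert:
    principal: str    # credential or account involved (hypothetical field)
    session_id: str   # session to isolate (hypothetical field)

def make_responder(
    isolate_session: Callable[[str], None],
    revoke_credential: Callable[[str], None],
) -> Callable[[Alert], float]:
    """Build a handler that contains first and leaves review for later.
    Both callables are placeholders for whatever your platform provides."""
    def respond(alert: Alert) -> float:
        started = time.monotonic()
        isolate_session(alert.session_id)
        revoke_credential(alert.principal)
        return time.monotonic() - started  # seconds spent on containment
    return respond

# Stand-in actions that only record what they were asked to do.
actions: list[str] = []
respond = make_responder(
    isolate_session=lambda sid: actions.append(f"isolate:{sid}"),
    revoke_credential=lambda who: actions.append(f"revoke:{who}"),
)
elapsed = respond(Alert(principal="svc-notebook", session_id="sess-42"))
print(actions, f"contained in {elapsed:.6f}s")
```

Injecting the actions as callables keeps the decision logic testable without touching real infrastructure, and makes it possible to measure the containment latency against the foothold-to-damage gap you audited.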

The defenders who survive the agentic era will be the ones who stopped cataloguing what attackers have done and started governing what any actor, human or agent, is allowed to do inside their walls. The first documented case is published. The detection posture it demands is not optional.

This analysis synthesizes AI Agent at the Wheel: From a CVE to an Internal Database in 4 Pivots (Sysdig Threat Research Team, May 2026).

Victorino Group helps teams put behavioral guardrails around AI agents before the first incident.