Governed Implementation

Skills Are Not Replacing Agents. They Are Making Agents Governable.

Thiago Victorino
7 min read

In June 2025, Gartner published a forecast that should have sobered the industry: more than 40% of agentic AI projects will be scaled back or canceled by end of 2027. The reasons were not technical capability shortfalls. They were escalating costs, unclear business value, and inadequate risk controls. The agents worked. The architecture around them did not.

That same press release noted that of the thousands of vendors claiming “agentic AI,” roughly 130 were building real agent systems. The rest were relabeling existing automation. Gartner called it what it was: agent washing.

Against that backdrop, Anthropic released the Agent Skills open standard in December 2025. It was a quiet release. A specification. A folder structure. A set of conventions for how AI agents should load, scope, and execute capabilities. No new model. No product launch. Just infrastructure.

Six months later, 26 platforms have adopted it. Claude, OpenAI Codex, Gemini CLI, GitHub Copilot, VS Code, Cursor, Atlassian Rovo. Simon Willison, one of the sharpest observers in the AI tooling space, wrote on December 19, 2025 that skills were potentially “a bigger deal than MCPs.”

He may be right. But not for the reason most people think.

The False Dichotomy

A narrative has emerged: skills are replacing agents. This is content marketing, not engineering analysis.

Skills are folders with files. A SKILL.md file containing instructions. A references/ directory with documentation. Optional scripts/ for executable logic. They follow a naming convention and a loading protocol. That is the entire specification.
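As a sketch, that description maps to a layout like the one below. The name and description frontmatter fields follow the convention in published examples; the specific file names and contents here are illustrative, not taken from the specification.

```
code-review/                  # a hypothetical skill
├── SKILL.md                  # instructions + metadata
├── references/
│   └── style-guide.md        # documentation, loaded only when needed
└── scripts/
    └── run_linter.py         # optional executable logic

# SKILL.md opens with YAML frontmatter naming and describing the skill:
---
name: code-review
description: Review diffs for style and security issues.
---
Followed by the markdown instructions the agent loads on demand.
```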

Skills do not replace agents. They run inside agents. The Anthropic documentation shows skills loaded by subagents, executed within agent runtimes, constrained by agent-level permissions. Declaring agents dead because skills exist is like declaring operating systems dead because someone invented package managers.

The honest framing: skills modularize agent capabilities. They make those capabilities auditable, version-controlled, and governable. That is enough. It is more than enough. You do not need the hyperbole.

Context Engineering, Not Context Stuffing

The most underappreciated aspect of the skills specification is its loading architecture.

When an agent encounters a skill, it does not load the entire skill into its context window. It loads in tiers. First, metadata: the skill name, description, and trigger conditions. This costs roughly 100 tokens. If the skill is relevant to the current task, the agent loads the full instructions from SKILL.md, capped at 5,000 tokens. Only when execution requires it does the agent load scripts and reference materials.

The specification enforces a context budget: skills should consume no more than 2% of the available context window. This is not a suggestion buried in documentation. It is a design constraint that shapes how skills are written.
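The tiered protocol and the budget can be sketched together. This is an illustrative simulation of the loading discipline, not the reference implementation; the window size is hypothetical, and word count stands in for a real tokenizer.

```python
# Illustrative sketch of tiered skill loading under a context budget.
# Tier sizes and the 2% rule follow the figures in the text; nothing
# here is the actual runtime implementation.

CONTEXT_WINDOW = 200_000                     # hypothetical window size
SKILL_BUDGET = int(CONTEXT_WINDOW * 0.02)    # 2% rule -> 4,000 tokens here

class Skill:
    def __init__(self, name, description, instructions, references):
        self.metadata = f"{name}: {description}"  # tier 1: ~100 tokens
        self.instructions = instructions          # tier 2: capped at 5,000
        self.references = references              # tier 3: loaded on demand

def tokens(text):
    return len(text.split())  # crude stand-in for a real tokenizer

def load(skill, relevant, needs_execution):
    """Load only the tiers the task requires, respecting the budget."""
    loaded = [skill.metadata]                 # tier 1: always loaded
    if relevant:
        loaded.append(skill.instructions)     # tier 2: only if relevant
        if needs_execution:
            loaded.append(skill.references)   # tier 3: only on execution
    used = sum(tokens(t) for t in loaded)
    if used > SKILL_BUDGET:
        raise ValueError(f"skill exceeds budget: {used} > {SKILL_BUDGET}")
    return loaded

skill = Skill("pdf-report", "Builds PDF reports", "Step 1 ...", "API docs ...")
assert len(load(skill, relevant=False, needs_execution=False)) == 1
assert len(load(skill, relevant=True, needs_execution=True)) == 3
```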

Why does this matter? Because context rot is real and measurable. Hong et al. (2025) demonstrated that language model performance degrades as context length grows. Information in the middle of a long context window is processed less reliably than information at the boundaries. Factory.ai, a company building enterprise agent infrastructure, published a principle that captures the implication: “effective agentic systems must treat context the way operating systems treat memory.”

Monolithic agents load everything at once. Every tool description, every instruction set, every piece of reference material competes for the same context window. By task five of a complex workflow, the constraints you defined at the start are literally less accessible to the model than they were at task one.

Progressive skill loading solves this structurally. Each skill loads only what it needs, when it needs it, and releases context when execution completes. The agent’s context window stays lean. Governance instructions remain at the boundaries where the model processes them most reliably.

Call it what it is: an architecture decision with direct governance implications.

Governance That Happens Automatically

Here is where the skills specification becomes genuinely interesting for enterprise AI.

Consider the allowed-tools field. A skill can declare which tools it is permitted to use. A code review skill can access file reading and diff tools but not deployment tools. A data analysis skill can query databases but not modify them. The boundary is explicit, declared in the skill definition, and enforced by the runtime.
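A runtime enforcing such a declaration might look like the following sketch. Only the allowed-tools field name comes from the specification as described above; the enforcement logic and tool names are assumptions for illustration.

```python
# Sketch of runtime-side enforcement of a skill's allowed-tools declaration.
# Illustrative only; real runtimes enforce this inside the agent loop.

class ToolDeniedError(Exception):
    pass

class SkillRuntime:
    def __init__(self, allowed_tools):
        self.allowed_tools = set(allowed_tools)  # from the skill definition

    def call_tool(self, name, *args):
        if name not in self.allowed_tools:
            # The boundary is declared in the skill and enforced here.
            raise ToolDeniedError(f"skill may not use tool: {name}")
        return f"ran {name}"

review = SkillRuntime(allowed_tools=["read_file", "diff"])
review.call_tool("read_file", "main.py")   # permitted
try:
    review.call_tool("deploy", "prod")     # denied: never declared
except ToolDeniedError:
    pass
```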

Consider context: fork. When a skill runs with a forked context, it operates in isolation. It cannot read the parent agent’s context. It cannot modify the parent agent’s state. If the skill fails or produces dangerous output, the blast radius is contained to its own execution environment.
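The isolation property can be sketched in a few lines: the skill receives a deep copy of the parent's state, so mutations and crashes never propagate back. This is a toy model of the semantics, not how any particular runtime implements forking.

```python
# Sketch of forked-context isolation: the skill works on a copy,
# so failures or mutations never reach the parent agent's state.
import copy

def run_forked(skill_fn, parent_context):
    child = copy.deepcopy(parent_context)  # isolated view of the context
    try:
        return skill_fn(child)             # may mutate or fail freely
    except Exception:
        return None                        # blast radius: the fork only

parent = {"task": "review PR", "secrets_loaded": False}

def bad_skill(ctx):
    ctx["secrets_loaded"] = True           # mutates only its own copy
    raise RuntimeError("skill crashed")

run_forked(bad_skill, parent)
assert parent == {"task": "review PR", "secrets_loaded": False}
```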

Consider the priority hierarchy: Enterprise overrides Personal, which overrides Project. An organization can define skill-level restrictions that no individual developer or project can circumvent. The IT security team sets the ceiling. Everyone else works within it.
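A minimal sketch of that resolution order, assuming a simple first-match lookup; the precedence order is taken from the text, while the layer contents and setting names are hypothetical.

```python
# Sketch of the override hierarchy: Enterprise beats Personal beats Project.
# Precedence order follows the text; the lookup logic is illustrative.

PRECEDENCE = ["enterprise", "personal", "project"]  # highest first

def resolve(setting, layers):
    """Return the value from the highest-precedence layer that defines it."""
    for level in PRECEDENCE:
        if setting in layers.get(level, {}):
            return layers[level][setting]
    return None

layers = {
    "enterprise": {"allow_network": False},  # IT security sets the ceiling
    "personal":   {"allow_network": True},
    "project":    {"allow_network": True, "max_tokens": 5000},
}
assert resolve("allow_network", layers) is False  # cannot be circumvented
assert resolve("max_tokens", layers) == 5000      # projects fill the gaps
```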

Consider version control. Skills are folders in a git repository. Every change to a skill’s instructions, permissions, or scripts creates a commit. Every commit creates an audit trail. When a regulatory body asks “what could this AI system do on March 15th, and who authorized that capability?”, the answer is in the git log.

Now consider all four together. Explicit tool boundaries. Isolated execution. Organizational override authority. Complete audit history.

Governance is not bolted on after deployment. It happens automatically when you follow the specification. The developer writing a skill does not need to think about compliance. The structure produces compliance as a side effect of normal development.

The Open Standard Is the Real Story

Individual tools come and go. Standards persist.

When Microsoft, OpenAI, GitHub, Cursor, and Atlassian adopt the same capability format, something structural has happened. A skill written for Claude Code works in GitHub Copilot, works in Cursor, works in Gemini CLI. Write once, run anywhere for AI capabilities. This has not existed before.

The interoperability matters because it solves the vendor lock-in problem that has plagued every previous generation of AI tooling. An organization that invests in building a library of governed, tested skills is not making a bet on Anthropic or OpenAI or Google. It is building portable intellectual property.

But honesty requires acknowledging the risk. The standard is Anthropic-led. The specification lives at agentskills.io. If Anthropic makes governance decisions about the standard that conflict with other adopters’ interests, fragmentation is possible. Platform-specific extensions could erode interoperability. The history of web standards (and the browser wars that preceded their stabilization) suggests this risk is not theoretical.

The counterargument: the specification is simple enough that fragmentation has limited surface area. A folder, a markdown file, optional scripts. There is not much to disagree about. The simplicity may be its best defense against the politics that complicated more ambitious standards.

What Skills Do Not Solve

Skills are not a complete answer to enterprise AI governance. Acknowledging this honestly is more useful than pretending otherwise.

Orchestration remains unsolved. Skills define individual capabilities. They do not define how capabilities coordinate in multi-agent systems. When three skills need to execute in sequence with shared state, the orchestration layer sits above the skills specification. That layer has no standard.

Maintenance at scale is an open question. An enterprise with 50 skills needs a strategy for testing, updating, and deprecating them. The specification does not address lifecycle management. Git provides versioning, but versioning is not the same as maintenance.

Testing frameworks do not exist yet. How do you test that a skill behaves correctly across different models, context lengths, and edge cases? Unit testing for deterministic code is well understood. Behavioral testing for non-deterministic AI capabilities is an emerging discipline without mature tooling.
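No standard framework exists, but the shape of the problem can be sketched: invoke the same skill many times and assert properties of the outputs rather than exact strings. Everything below is hypothetical, with a deterministic stand-in replacing a real model call.

```python
# Hypothetical sketch of property-based behavioral testing for a skill.
# No standard framework exists; this only illustrates the shape of the
# problem, with a seeded stand-in replacing a real model call.
import random

def fake_skill(prompt, seed):
    """Stand-in for a non-deterministic skill invocation."""
    rng = random.Random(seed)
    return {"summary": prompt[: rng.randint(10, 40)],
            "tools_used": ["read_file"]}

def behavioral_test(skill, prompt, runs=20):
    """Assert properties that must hold on every run, not exact outputs."""
    for seed in range(runs):
        out = skill(prompt, seed)
        assert out["summary"], "output must be non-empty"
        assert set(out["tools_used"]) <= {"read_file", "diff"}, "tool boundary"
    return True

assert behavioral_test(fake_skill, "Review this diff for security issues.")
```

The point of the sketch is the discipline, not the code: behavioral tests check invariants (non-empty output, tool boundaries respected) across repeated runs, because exact-output assertions are meaningless for non-deterministic systems.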

Skills are a feature of agent systems. They improve the governability of agents. They do not replace the need for agent architecture, orchestration patterns, or operational monitoring. Organizations that adopt skills still need to solve the full stack of agent governance.

What This Means

The 40% of agentic AI projects that Gartner predicts will be canceled share a common architecture: monolithic agents with broad permissions, unlimited context, and no audit trail. They fail not because AI is incapable but because ungoverned capability produces ungovernable risk.

Skills offer a structural alternative. Modular capabilities with explicit boundaries. Progressive context loading that preserves governance fidelity. Version-controlled audit trails that satisfy compliance requirements. An open standard that prevents vendor lock-in.

None of this is revolutionary. Package managers modularized software dependencies. Role-based access control modularized permissions. Containerization modularized deployment. Skills modularize AI capabilities. The pattern is familiar. The application to AI governance is new.

The organizations that will succeed with agentic AI are not the ones deploying the most powerful models. They are the ones building the most governable architectures. Skills do not guarantee success. They make governance possible at a level of granularity that monolithic agents never could.

That is a quieter claim than “skills replace agents.” It is also a true one.


Sources

  • Gartner. “Gartner Predicts 40% of Agentic AI Projects Will Be Scaled Back or Canceled by End of 2027.” Press Release. June 2025.
  • Anthropic. “Agent Skills.” Open standard specification. December 18, 2025. https://agentskills.io
  • Willison, Simon. “Agent Skills.” December 19, 2025. https://simonwillison.net/2025/Dec/19/agent-skills/
  • Hong et al. “Context Length and Model Performance Degradation.” 2025.
  • Factory.ai. “Context as Memory: Principles for Agentic Architecture.” 2025.

Victorino Group helps organizations build governed AI architectures that scale. If your agentic AI projects are expanding faster than your controls, the answer is modular governance built into the system, not bolted on after. Reach out at contact@victorinollc.com or visit www.victorinollc.com.

If this resonates, let's talk

We help companies implement AI without losing control.

Schedule a Conversation