The Handoff Problem

How to Build Self-Improving Coding Agents

Thiago Victorino
12 min read

“If I have to repeat the same preference every session, I am not developing with AI — I am babysitting a very fast intern.”

This quote from Eric Ma, Principal Data Scientist at Moderna, captures a problem most developers face but few discuss: AI agents are treated as disposable tools. Each session starts from scratch, with no memory of what worked before.

The result? You repeat the same instructions, correct the same errors, and your productivity plateaus instead of compounding.

The Central Thesis: Operational Improvement, Not Model Improvement

True agent improvement happens through environmental changes, not model updates.

Think about it: model weights do not change mid-week. If you want your agent to improve between Monday and Friday, the improvement must come from somewhere else — from the environment you build around it.

When an agent makes a mistake or takes an undesired path, the feedback needs to persist. This requires deliberate infrastructure.

The central insight: treat agent improvement like operational runbooks. Document repeatable steps and post-incident learnings, and convert that natural-language knowledge into tool calls the agent can actually execute.

Two Main Mechanisms

Eric Ma’s framework proposes two complementary mechanisms:

AGENTS.md: Repository Memory

AGENTS.md is an instruction file for AI agents, versioned alongside the code. Think of it as a README for agents: while README.md is for humans, AGENTS.md contains the detailed context agents need — build commands, tests, conventions.

More than 20,000 repositories on GitHub already use AGENTS.md. The GitHub Blog analyzed 2,500+ repositories to identify success patterns.

The file serves two main functions:

Navigation: A code map that helps agents locate relevant files without exhaustive searches. In one example, a code map cut search time from over 40 seconds to about 2 seconds by eliminating 5-6 exploratory search operations.

Local Norms: Repository-specific rules that prevent repeated errors, such as “Execute Python through pixi context: pixi run python ...” and “Never modify tests to make them pass artificially.”
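
To make this concrete, a minimal AGENTS.md excerpt covering both functions might look like the sketch below. File paths and commands are placeholders; adapt them to your own repository.

```
# AGENTS.md

## Code map
- src/pipeline/       core data-processing logic; start here for ETL questions
- src/api/routes.py   HTTP entry points
- tests/              pytest suite, mirrors the src/ layout

## Local norms
- Execute Python through pixi context: `pixi run python ...`
- Never modify tests to make them pass artificially.
- Keep changes inside the existing module layout; do not add new top-level folders.
```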

Skills: Reusable Playbooks

Skills are organized folders of instructions, scripts, and resources that agents can discover and load dynamically. They function as “prompt compression” — reusable playbooks that eliminate repetitive workflow explanations.

Basic structure: a folder containing a SKILL.md file (the prompt) and related assets/scripts (the tool layer).

A good skill makes three elements explicit:

  1. When to use it: Clear trigger conditions
  2. What steps to follow: Specific workflow procedures
  3. What good output looks like: Expected results and quality standards
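
A minimal SKILL.md sketch that makes those three elements explicit could look like this. The frontmatter fields and folder layout vary across agent tools, so treat it as a template rather than a spec.

```
---
name: example-skill
description: One-line summary the agent uses to decide when to load this skill.
---

## When to use
Trigger phrases or situations that should activate this skill.

## Steps
1. First concrete action.
2. Second concrete action, including any scripts in this folder to run.
3. Final check before handing the result back.

## Expected output
The format, location, and quality bar for a good result.
```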

Best Practices for AGENTS.md

GitHub’s analysis of 2,500+ repositories revealed consistent patterns in the most effective files:

Keep It Concise: Target 150 lines or fewer. Long files overload the agent’s context and bury the useful information.

Be Specific: “You are a helpful assistant” does not work. “You write tests for React components following these examples” works.

Commands First: Put executable commands such as npm test or pytest -v near the top, in backticks, so they can be copied and run directly.

Use Modular Files: Put AGENTS.md in each subpackage. Agents read the closest file in the directory tree.

Iterate Based on Errors: Add a rule the second time you see the same error. Do not try to predict everything upfront.

Define Clear Boundaries: Spell out what the agent must never do: commit secrets, modify vendored code, edit production configs.
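
For the modular-files pattern, a hypothetical monorepo layout might place one AGENTS.md per package, so the agent picks up whichever file sits closest to the code it is editing:

```
repo/
  AGENTS.md             # repo-wide commands, norms, boundaries
  packages/
    api/
      AGENTS.md         # API conventions, schema rules, test commands
    web/
      AGENTS.md         # frontend conventions, component patterns
```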

Three-Level Permission Model

Clearly define what the agent can do, should ask about, and must never do.

Always Do (green): Write to specific directories, run tests, follow code patterns.

Ask First (yellow): Schema changes, adding dependencies, structural changes.

Never Do (red): Modify secrets, touch production configs, remove failing tests.
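
In practice this can live directly in AGENTS.md as three short lists. A sketch, with the specifics obviously depending on your repository:

```
## Always do
- Write new code under src/ and tests under tests/
- Run the test suite before proposing changes
- Follow the existing code patterns and naming

## Ask first
- Database schema changes
- Adding or upgrading dependencies
- Structural or cross-cutting refactors

## Never do
- Commit secrets or credentials
- Touch production configuration
- Remove or weaken failing tests
```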

GitHub’s analysis showed that “never commit secrets” was the most useful restriction found in successful repositories. OpenAI has 88 AGENTS.md files in their main repository — one for each specific context.

Practical Skill Examples

CI Debugging: Standardized sequence for CI failures — identify failing jobs, pull logs, inspect diffs, reproduce locally, apply patch. Trigger: “CI is failing” or “pipeline broke”.

Release Announcements: Reduced composition time from 30 minutes per release to seconds. Produces formatted announcements for Teams/Slack with standardized emojis. Trigger: “Create release announcement”.

ML Reports: Post-training documentation using IMRAD format. Extracts data from stdout, metrics, code, configs, and git diffs automatically. Trigger: “Document experiment”.

Domain Knowledge: A specialist documented their chromatography debugging know-how as a skill, making tacit expertise explicit and reusable. Trigger: “Analyze chromatogram”.
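
As a rough illustration of how the first of these playbooks could be written down (the details here are hypothetical, not the actual skill):

```
---
name: ci-debugging
description: Standard sequence for diagnosing and fixing CI failures.
---

## When to use
The user says “CI is failing” or “pipeline broke”, or a pull request check is red.

## Steps
1. Identify which jobs are failing.
2. Pull the logs for those jobs.
3. Inspect the diff that triggered the run.
4. Reproduce the failure locally.
5. Apply a minimal patch and re-run the failing tests.

## Expected output
A short root-cause summary plus the patch, ready for review.
```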

The Maturity Model

Eric Ma proposes four evolutionary stages of agent usage:

Stage 0: Ad Hoc Prompting

Repeatedly explaining concepts without systematic knowledge accumulation. Each session starts from zero.

Stage 1: Repo-Local Memory

Implementation of AGENTS.md as repository-specific documentation that provides guardrails and code mapping.

Stage 2: Global Personal Skills

Workflows that repeat across repositories are elevated to reusable skills across the developer’s machine.

Stage 3: Shared Skills

Team workflows are formalized in shared locations — start local and promote after feeling the pain twice.

Decision Framework

When to use AGENTS.md vs. create a skill?

Update AGENTS.md for:

  • Code navigation guidance
  • Local norms and conventions
  • Repository-specific guardrails
  • Build and test commands
  • Security restrictions

Create skills for:

  • Multi-step reusable procedures
  • Cross-repository workflows
  • Tasks with strict output contracts
  • Formalized domain knowledge
  • Automations that save significant time

The golden rule is “feel the pain twice”: only promote something to a skill after you have hit the same need more than once.

Security Considerations

Agent instruction files are attack surfaces.

Security research has demonstrated that malicious repositories can contain hidden instructions that make agents execute arbitrary commands. An “AI virus” embedded in AGENTS.md can even instruct the agent to copy itself into other repositories the developer works on, spreading from project to project.

Questions that need answers:

  • Who can modify AGENTS.md?
  • How to audit changes in agent instructions?
  • What rollback procedures exist?
  • How to review skills before deployment?
  • What are the compliance implications?
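
One concrete answer to the first two questions is to treat agent instruction files like any other sensitive configuration and route every change through review. On GitHub, for instance, a CODEOWNERS entry can automatically request review from a designated team whenever AGENTS.md or skill folders change; the paths and team name below are placeholders:

```
# .github/CODEOWNERS
AGENTS.md    @your-org/agent-instruction-reviewers
skills/      @your-org/agent-instruction-reviewers
```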

According to IBM and KPMG, 80% of business leaders cite cybersecurity as the biggest barrier to AI strategy.

Clarifying “Self-Improvement”

The mechanism described is human-mediated improvement, not autonomous modification.

The human identifies the pattern, writes the instruction, and commits. The agent does not improve itself — it operates within an environment that humans deliberately improve.

The advantage: explicit over implicit. AGENTS.md is curated, auditable, and controllable, unlike chat memory that evolves opaquely.

The trade-off: environmental changes create hidden state. A new team member inherits instructions they did not write and may not understand, so documentation and onboarding need to keep up.

The Most Valuable Skill

“The most valuable skill is metacognition — the deliberate practice of observing what work repeats and systematizing it.”

The cycle is simple:

  1. Observe: Pay attention to agent behavior patterns
  2. Identify: Complicated paths, missing files, unnecessary refactors
  3. Systematize: Codify the learning in AGENTS.md or a skill

How to Get Started

Step 1: Create Your First AGENTS.md

Start with build/test commands and a basic map of the main folders. Keep it under 100 lines. Include system entry points, naming conventions, and where tests live.
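
A starter skeleton, with every value below standing in for your own project's details, can be this small:

```
# AGENTS.md

## Commands
- Build: `make build`
- Test: `pytest -v`

## Entry points
- src/main.py   application entry point
- src/cli.py    command-line interface

## Conventions
- snake_case for modules, PascalCase for classes
- Tests live in tests/, one file per source module
```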

Step 2: Observe and Iterate

Use the agent normally and pay attention to problematic patterns — overly complicated paths, relevant files not found, global refactors when surgical ones would suffice.

Step 3: Add Rules on Second Occurrence

When you see the same error twice, add it to AGENTS.md. Be specific about the problem, include an example of expected behavior, keep it concise.
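
For example, after twice watching the agent reach for a sweeping refactor, a hypothetical rule might read:

```
## Refactoring
- Prefer surgical changes scoped to the task at hand.
- If a wider refactor seems necessary, ask before touching files outside that scope.
```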

Step 4: Promote to Skill When Mature

Workflows that repeat across projects become skills. Document when to use, steps to follow, and what good output looks like.

Conclusion

Your agent is not getting smarter. You are just repeating yourself.

The difference between using AI and developing with AI lies in the learning infrastructure you build around it. The hard problem is not the model — it is the system.

AGENTS.md for local context, Skills for reusable workflows, and metacognition to identify what to systematize. This is the maturity path that separates teams that use AI from teams that develop with AI.


At Victorino Group, we help teams implement AI systems with governance and real results. If you want to accelerate your development without losing control, let’s talk.
