Governance as Advantage

How to Write Specs for AI Agents

Thiago Victorino
12 min read

AI agents are powerful, but without a good spec, they are like brilliant interns without direction. The problem is not the model’s capability. It is the quality of the specification you provide.

“The specification becomes the source of truth. Code becomes just its expression in a specific language.” — GitHub Spec Kit

The Central Challenge

Three common problems degrade output quality:

Vague specs: “Build something cool” provides no anchoring for the agent to work with.

Excessive context: Dumping documentation without hierarchy causes focus loss and performance degradation.

Lack of boundaries: Without saying what NOT to do, the agent may add authentication where it is not needed.

The Curse of Instructions

Research shows that LLM performance degrades significantly when a model receives too many requirements at once.

  • Cognitive overload: too many directives make the model follow none of them well
  • Lost in the middle: roughly 20% degradation when key information sits in the middle of a 70-80% full context window

The solution is modularity: teams maintain separate specs per domain (SPEC_backend.md, SPEC_frontend.md, SPEC_database.md). Each agent references only the spec relevant to its task.

This mimics what developers do: mentally compartmentalize a large spec into relevant chunks.
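
A minimal sketch of that loading step, assuming a Node.js orchestration script and the SPEC_*.md naming convention above (buildPrompt is a hypothetical helper, not part of any framework):

```typescript
import { readFile } from "node:fs/promises";

type Domain = "backend" | "frontend" | "database";

// Hypothetical helper: prepend only the spec relevant to the task's domain,
// so the agent's context stays small and focused.
async function buildPrompt(domain: Domain, task: string): Promise<string> {
  const spec = await readFile(`SPEC_${domain}.md`, "utf8");
  return `${spec}\n\n## Task\n${task}`;
}
```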

The 5 Principles for Effective Specs

Principle 1: Start High Level, Let AI Expand

Instead of exhaustive upfront engineering, start with a clear objective statement and let the agent elaborate.

The recommended flow:

  1. Provide high-level project description
  2. Agent elaborates structured spec (objectives, features, constraints)
  3. Review and refine before execution
  4. Save as persistent document (SPEC.md)

Use “Plan Mode” (read-only operations) so the agent analyzes the existing code before generating anything.

Principle 2: Structure Like Professional PRD Documents

A GitHub analysis of 2,500+ agent configuration files identified 6 critical areas:

  • Commands: Complete executable commands with flags
  • Tests: Framework, file location, coverage expectations
  • Structure: Explicit directory mapping
  • Code style: Real snippets demonstrating conventions
  • Git workflow: Branch naming, commit format, PR requirements
  • Boundaries: What the agent should NEVER touch

3-Tier Permission System

ALWAYS DO (actions without approval):

  • Run automated tests
  • Format code with linter
  • Create feature/* branches
  • Add debug logs
  • Update inline documentation

ASK FIRST (high-impact changes):

  • Modify database schemas
  • Change public APIs
  • Add dependencies
  • Change CI/CD configurations
  • Refactor shared code

NEVER DO (categorical prohibitions):

  • Commit secrets/keys
  • Push directly to main
  • Delete production data
  • Expose endpoints without auth
  • Ignore failing tests

AI cannot infer intent from omission. If you do not explicitly say “do not implement authentication in this phase”, the agent may add it, because most applications need it.

Principle 3: Divide into Modular Prompts

The “curse of instructions” shows that piling many requirements into a single prompt degrades performance.

Modular strategies:

  • Split specs into focused components
  • Use sub-agents for different domains
  • Run parallel agents on non-overlapping work
  • Refresh context for each major task

Example of parallel agents: Agent 1 receives SPEC_backend.md and implements the API. Agent 2 receives SPEC_frontend.md and creates the components. Agent 3 receives SPEC_testing.md and writes the E2E tests. They can run in parallel because their work does not overlap.
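
As a sketch of the orchestration, with runAgent as a hypothetical, stubbed wrapper around whatever agent CLI or API you use:

```typescript
// Hypothetical wrapper around your agent runner; stubbed for illustration.
// In practice it would spawn your agent CLI or call your provider's API.
async function runAgent(specFile: string, task: string): Promise<string> {
  return `result of "${task}" guided by ${specFile}`;
}

// The three workstreams touch disjoint parts of the codebase,
// so they can run concurrently without conflicts.
const results = await Promise.all([
  runAgent("SPEC_backend.md", "Implement the REST API"),
  runAgent("SPEC_frontend.md", "Create the UI components"),
  runAgent("SPEC_testing.md", "Write the E2E tests"),
]);
```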

Principle 4: Include Self-Checks and Expertise

Specs should function as coaching guides AND quality guardians.

Self-verification: Instruct the AI to compare its output against the spec’s requirements. Add checklists: do all endpoints have auth? Do errors return valid JSON? Are logs free of PII?

LLM-as-Judge: Use a second agent to evaluate subjective criteria like code readability, pattern adherence, edge case coverage.
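
A sketch of that judge step, with callModel as a hypothetical, stubbed stand-in for your provider’s SDK (the rubric and JSON shape are illustrative):

```typescript
// Hypothetical model call; stubbed for illustration. Swap in your provider's SDK.
async function callModel(prompt: string): Promise<string> {
  return `{"readability": 4, "patterns": 5, "edgeCases": 3, "notes": "stub"}`;
}

// A second model grades subjective criteria the test suite cannot catch.
async function judgeImplementation(spec: string, code: string): Promise<string> {
  return callModel(
    [
      "You are a strict code reviewer. Evaluate the code against the spec.",
      "Score 1-5 for: readability, pattern adherence, edge case coverage.",
      'Return JSON: {"readability": n, "patterns": n, "edgeCases": n, "notes": "..."}.',
      `## Spec\n${spec}`,
      `## Code\n${code}`,
    ].join("\n\n")
  );
}
```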

Conformance tests: Create test suites derived from specs. Language-independent tests serving as implementation contracts.
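
For instance, a contract test that exercises the running service over HTTP works against any implementation of the spec. A minimal sketch using Node’s built-in test runner (the BASE_URL variable and the error-shape requirement are illustrative assumptions):

```typescript
import test from "node:test";
import assert from "node:assert/strict";

// Point the same suite at any implementation of the spec.
const BASE_URL = process.env.BASE_URL ?? "http://localhost:3000";

test("unknown routes return valid JSON errors", async () => {
  const res = await fetch(`${BASE_URL}/does-not-exist`);
  assert.equal(res.status, 404);
  const body = await res.json();
  assert.equal(typeof body.message, "string");
});
```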

Knowledge injection: Include warnings about library quirks and edge cases. Example: “Prisma has issues with serverless connections; use connection pooling.”
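
A warning like that can carry its mitigation inline. One widely used pattern for the Prisma case (a sketch, not the only option) is reusing a single client instance so serverless invocations and hot reloads do not exhaust the connection pool:

```typescript
import { PrismaClient } from "@prisma/client";

// Reuse one PrismaClient across invocations and hot reloads instead of
// opening a fresh connection pool on every request.
const globalForPrisma = globalThis as unknown as { prisma?: PrismaClient };

export const prisma = globalForPrisma.prisma ?? new PrismaClient();

if (process.env.NODE_ENV !== "production") {
  globalForPrisma.prisma = prisma;
}
```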

Principle 5: Test, Iterate, and Evolve

Specs are living documents requiring continuous refinement based on execution feedback.

Iteration practices:

  • Run tests after each milestone, not just at the end
  • Update specs when gaps emerge
  • Use version control to track evolution
  • Maintain logs of agent decisions

Refinement flow:

  1. Execute the agent with current spec
  2. Observe where the agent erred
  3. Analyze if the spec was ambiguous or incomplete
  4. Refine the spec and re-execute

GitHub Spec Kit: The Open-Source Framework

The Spec Kit defines a 4-phase workflow:

Phase 1 - SPECIFY: Describe objectives and user journeys. What, why, user experience.

Phase 2 - PLAN: Declare architecture, stack, and constraints. Technical decisions, data models, API contracts.

Phase 3 - TASKS: Decompose into testable units. Small tasks, [P] markers for parallelizable ones, a generated tasks.md.

Phase 4 - IMPLEMENT: Execution with validation checkpoints. Agent executes, you verify, iterate as needed.

Workflow impact: A traditional approach takes ~12 hours of sequential documentation. With spec-driven development (SDD), ~15 minutes produce complete specs, plans, and task lists, with version control integration.

Minimum Spec Template

# Project Spec: [Name]

## Objective
[Clear objective statement]

## Tech Stack
- Runtime: Node.js 20
- Framework: Next.js 14
- Database: PostgreSQL 15
- ORM: Prisma

## Commands
- Build: `npm run build`
- Test: `npm test`
- Lint: `npm run lint`

## Project Structure
src/ - Source code
tests/ - Tests
docs/ - Documentation

## Boundaries

### Always do
- Run tests before commit
- Use TypeScript strict mode
- Follow ESLint rules

### Ask first
- Add dependencies
- Modify DB schema
- Change public APIs

### Never do
- Commit .env
- Push directly to main
- Ignore type errors

## Non-Goals
- Auth (future phase)
- i18n (not needed)
- Mobile (web only)

Anti-patterns to Avoid

  • Vague prompts: “Build something cool” provides no anchoring
  • Context without hierarchy: Dumping excessive documentation causes focus loss
  • Skipping human review: Passing tests do not guarantee correctness or security
  • Confusing prototype with production: “Vibe coding” requires discipline for production use
  • Specs without the 6 critical areas: Omitting commands, tests, structure, style, git, or boundaries leaves the agent without guidance
  • Not declaring non-goals: AI does not infer by omission

Advanced Tips

Cost optimization: Use cheap models for drafts; reserve expensive models for critical decisions. The Plan-and-Execute pattern can reduce costs by up to 90%.
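
A sketch of that split, with callModel as a hypothetical, stubbed wrapper and “cheap”/“expensive” as placeholder model tiers:

```typescript
type Tier = "cheap" | "expensive";

// Hypothetical model call; stubbed for illustration. Swap in real model IDs.
async function callModel(tier: Tier, prompt: string): Promise<string> {
  return `[${tier}] response`;
}

async function planAndExecute(spec: string): Promise<string> {
  // Cheap tier handles the plan and the bulk of the drafting.
  const plan = await callModel("cheap", `Break this spec into steps:\n\n${spec}`);
  const draft = await callModel("cheap", `Implement this plan:\n\n${plan}`);

  // Expensive tier is reserved for the critical review pass.
  return callModel("expensive", `Review this draft against the spec:\n\n${spec}\n\n${draft}`);
}
```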

Parallel agents: Run 2-3 agents simultaneously on independent tasks.

Conformance suites: Language-independent tests serving as implementation contracts.

Spec personas: For specialized domains, create agents.md defining specific areas of expertise.

RAG and MCP: Use retrieval-augmented generation or Model Context Protocol for dynamic context management.

Adoption Metrics

  • 84% of devs use or plan to use AI tools
  • 90% of Fortune 100 have adopted agentic coding
  • 40% of enterprise apps will have task-specific agents by 2026

The Takeaway

“Effective specs for AI agents require balance between comprehensiveness and cognitive load. Success emerges from clear structure, modular organization, embedded quality checks, and continuous refinement.” — Addy Osmani

Treat specs as executable artifacts that evolve with projects — not as static documentation discarded after coding starts.


At Victorino Group, we help companies create effective specs for their AI agents. If you need agentic systems that work in production, let’s talk.
