Governance as Advantage

How to Write Specs for AI Agents

Thiago Victorino
12 min read

AI agents are powerful, but without a good spec, they are like brilliant interns without direction. The problem is not the model’s capability. It is the quality of the specification you provide.

“The specification becomes the source of truth. Code becomes just its expression in a specific language.” — GitHub Spec Kit

The Central Challenge

Three common problems degrade output quality:

Vague specs: “Build something cool” provides no anchoring for the agent to work with.

Excessive context: Dumping documentation without hierarchy causes focus loss and performance degradation.

Lack of boundaries: Without saying what NOT to do, the agent may add authentication where it is not needed.

The Curse of Instructions

Research shows that LLM performance degrades significantly when a model receives too many requirements at once.

  • Cognitive overload: too many directives make the model follow none of them well
  • Lost in the middle: roughly 20% degradation when key information sits in the middle of a 70-80% full context window

The solution is modularity: teams maintain separate specs per domain (SPEC_backend.md, SPEC_frontend.md, SPEC_database.md). Each agent references only the spec relevant to its task.

This mimics what developers do: mentally compartmentalize a large spec into relevant chunks.
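
A minimal sketch of that loading step, assuming a Node.js orchestration script and the SPEC_*.md naming convention above (buildPrompt is a hypothetical helper, not part of any framework):

```typescript
import { readFile } from "node:fs/promises";

type Domain = "backend" | "frontend" | "database";

// Hypothetical helper: prepend only the spec relevant to the task's domain,
// so the agent's context stays small and focused.
async function buildPrompt(domain: Domain, task: string): Promise<string> {
  const spec = await readFile(`SPEC_${domain}.md`, "utf8");
  return `${spec}\n\n## Task\n${task}`;
}
```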

The 5 Principles for Effective Specs

Principle 1: Start High Level, Let AI Expand

Instead of exhaustive upfront engineering, start with a clear objective statement and let the agent elaborate.

The recommended flow:

  1. Provide high-level project description
  2. Agent elaborates structured spec (objectives, features, constraints)
  3. Review and refine before execution
  4. Save as persistent document (SPEC.md)

Use “Plan Mode” (read-only operations) so the agent analyzes the existing code before generating anything.

Principle 2: Structure Like Professional PRD Documents

A GitHub analysis of 2,500+ agent configuration files identified 6 critical areas:

  • Commands: Complete executable commands with flags
  • Tests: Framework, file location, coverage expectations
  • Structure: Explicit directory mapping
  • Code style: Real snippets demonstrating conventions
  • Git workflow: Branch naming, commit format, PR requirements
  • Boundaries: What the agent should NEVER touch

3-Tier Permission System

ALWAYS DO (actions without approval):

  • Run automated tests
  • Format code with linter
  • Create feature/* branches
  • Add debug logs
  • Update inline documentation

ASK FIRST (high-impact changes):

  • Modify database schemas
  • Change public APIs
  • Add dependencies
  • Change CI/CD configurations
  • Refactor shared code

NEVER DO (categorical prohibitions):

  • Commit secrets/keys
  • Push directly to main
  • Delete production data
  • Expose endpoints without auth
  • Ignore failing tests

AI cannot infer intent from omission. If you do not explicitly say “do not implement authentication in this phase”, the agent may add it, because most applications need it.

Principle 3: Divide into Modular Prompts

The “curse of instructions” shows that piling many requirements into a single prompt degrades performance.

Modular strategies:

  • Split specs into focused components
  • Use sub-agents for different domains
  • Run parallel agents on non-overlapping work
  • Refresh context for each major task

Example of parallel agents: Agent 1 receives SPEC_backend.md and implements the API. Agent 2 receives SPEC_frontend.md and creates the components. Agent 3 receives SPEC_testing.md and writes the E2E tests. They can run in parallel because their work does not overlap.
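
As a sketch of the orchestration, with runAgent as a hypothetical, stubbed wrapper around whatever agent CLI or API you use:

```typescript
// Hypothetical wrapper around your agent runner; stubbed for illustration.
// In practice it would spawn your agent CLI or call your provider's API.
async function runAgent(specFile: string, task: string): Promise<string> {
  return `result of "${task}" guided by ${specFile}`;
}

// The three workstreams touch disjoint parts of the codebase,
// so they can run concurrently without conflicts.
const results = await Promise.all([
  runAgent("SPEC_backend.md", "Implement the REST API"),
  runAgent("SPEC_frontend.md", "Create the UI components"),
  runAgent("SPEC_testing.md", "Write the E2E tests"),
]);
```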

Principle 4: Include Self-Checks and Expertise

Specs should function as coaching guides AND quality guardians.

Self-verification: Instruct the AI to compare its output against the spec’s requirements. Add checklists: do all endpoints have auth? Do errors return valid JSON? Are logs free of PII?

LLM-as-Judge: Use a second agent to evaluate subjective criteria like code readability, pattern adherence, edge case coverage.
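
A sketch of that judge step, with callModel as a hypothetical, stubbed stand-in for your provider’s SDK (the rubric and JSON shape are illustrative):

```typescript
// Hypothetical model call; stubbed for illustration. Swap in your provider's SDK.
async function callModel(prompt: string): Promise<string> {
  return `{"readability": 4, "patterns": 5, "edgeCases": 3, "notes": "stub"}`;
}

// A second model grades subjective criteria the test suite cannot catch.
async function judgeImplementation(spec: string, code: string): Promise<string> {
  return callModel(
    [
      "You are a strict code reviewer. Evaluate the code against the spec.",
      "Score 1-5 for: readability, pattern adherence, edge case coverage.",
      'Return JSON: {"readability": n, "patterns": n, "edgeCases": n, "notes": "..."}.',
      `## Spec\n${spec}`,
      `## Code\n${code}`,
    ].join("\n\n")
  );
}
```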

Conformance tests: Create test suites derived from specs. Language-independent tests serving as implementation contracts.
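
For instance, a contract test that exercises the running service over HTTP works against any implementation of the spec. A minimal sketch using Node’s built-in test runner (the BASE_URL variable and the error-shape requirement are illustrative assumptions):

```typescript
import test from "node:test";
import assert from "node:assert/strict";

// Point the same suite at any implementation of the spec.
const BASE_URL = process.env.BASE_URL ?? "http://localhost:3000";

test("unknown routes return valid JSON errors", async () => {
  const res = await fetch(`${BASE_URL}/does-not-exist`);
  assert.equal(res.status, 404);
  const body = await res.json();
  assert.equal(typeof body.message, "string");
});
```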

Knowledge injection: Include warnings about library quirks and edge cases. Example: “Prisma has issues with serverless connections; use connection pooling.”
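
A warning like that can carry its mitigation inline. One widely used pattern for the Prisma case (a sketch, not the only option) is reusing a single client instance so serverless invocations and hot reloads do not exhaust the connection pool:

```typescript
import { PrismaClient } from "@prisma/client";

// Reuse one PrismaClient across invocations and hot reloads instead of
// opening a fresh connection pool on every request.
const globalForPrisma = globalThis as unknown as { prisma?: PrismaClient };

export const prisma = globalForPrisma.prisma ?? new PrismaClient();

if (process.env.NODE_ENV !== "production") {
  globalForPrisma.prisma = prisma;
}
```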

Principle 5: Test, Iterate, and Evolve

Specs are living documents requiring continuous refinement based on execution feedback.

Iteration practices:

  • Run tests after each milestone, not just at the end
  • Update specs when gaps emerge
  • Use version control to track evolution
  • Maintain logs of agent decisions

Refinement flow:

  1. Execute the agent with current spec
  2. Observe where the agent erred
  3. Analyze if the spec was ambiguous or incomplete
  4. Refine the spec and re-execute

GitHub Spec Kit: The Open-Source Framework

The Spec Kit defines a 4-phase workflow:

Phase 1 - SPECIFY: Describe objectives and user journeys. What, why, user experience.

Phase 2 - PLAN: Declare architecture, stack, and constraints. Technical decisions, data models, API contracts.

Phase 3 - TASKS: Decompose into testable units. Small tasks, [P] markers for parallelizable ones, a generated tasks.md.

Phase 4 - IMPLEMENT: Execution with validation checkpoints. Agent executes, you verify, iterate as needed.

Workflow impact: A traditional approach takes ~12 hours of sequential documentation. With spec-driven development (SDD), ~15 minutes produce complete specs, plans, and task lists, with version control integration.

Minimum Spec Template

# Project Spec: [Name]

## Objective
[Clear objective statement]

## Tech Stack
- Runtime: Node.js 20
- Framework: Next.js 14
- Database: PostgreSQL 15
- ORM: Prisma

## Commands
- Build: `npm run build`
- Test: `npm test`
- Lint: `npm run lint`

## Project Structure
src/ - Source code
tests/ - Tests
docs/ - Documentation

## Boundaries

### Always do
- Run tests before commit
- Use TypeScript strict mode
- Follow ESLint rules

### Ask first
- Add dependencies
- Modify DB schema
- Change public APIs

### Never do
- Commit .env
- Push directly to main
- Ignore type errors

## Non-Goals
- Auth (future phase)
- i18n (not needed)
- Mobile (web only)

Anti-patterns to Avoid

  • Vague prompts: “Build something cool” provides no anchoring
  • Context without hierarchy: Dumping excessive documentation causes focus loss
  • Skipping human review: Passing tests do not guarantee correctness or security
  • Confusing prototype with production: “Vibe coding” requires discipline for production use
  • Specs without the 6 critical areas: Omitting commands, tests, structure, style, git, or boundaries leaves the agent without guidance
  • Not declaring non-goals: AI does not infer by omission

Advanced Tips

Cost optimization: Use cheap models for drafts; reserve expensive models for critical decisions. The Plan-and-Execute pattern can reduce costs by up to 90%.
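
A sketch of that split, with callModel as a hypothetical, stubbed wrapper and “cheap”/“expensive” as placeholder model tiers:

```typescript
type Tier = "cheap" | "expensive";

// Hypothetical model call; stubbed for illustration. Swap in real model IDs.
async function callModel(tier: Tier, prompt: string): Promise<string> {
  return `[${tier}] response`;
}

async function planAndExecute(spec: string): Promise<string> {
  // Cheap tier handles the plan and the bulk of the drafting.
  const plan = await callModel("cheap", `Break this spec into steps:\n\n${spec}`);
  const draft = await callModel("cheap", `Implement this plan:\n\n${plan}`);

  // Expensive tier is reserved for the critical review pass.
  return callModel("expensive", `Review this draft against the spec:\n\n${spec}\n\n${draft}`);
}
```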

Parallel agents: Run 2-3 agents simultaneously on independent tasks.

Conformance suites: Language-independent tests serving as implementation contracts.

Spec personas: For specialized domains, create agents.md defining specific areas of expertise.

RAG and MCP: Use retrieval-augmented generation or Model Context Protocol for dynamic context management.

Adoption Metrics

  • 84% of devs use or plan to use AI tools
  • 90% of Fortune 100 have adopted agentic coding
  • 40% of enterprise apps will have task-specific agents by 2026

The Takeaway

“Effective specs for AI agents require balance between comprehensiveness and cognitive load. Success emerges from clear structure, modular organization, embedded quality checks, and continuous refinement.” — Addy Osmani

Treat specs as executable artifacts that evolve with projects — not as static documentation discarded after coding starts.


At Victorino Group, we help companies create effective specs for their AI agents. If you need agentic systems that work in production, let’s talk.
