Governance as Advantage

Context Engineering for AI Agents: Lessons from Azure and Manus

Thiago Victorino

The era of prompt engineering has evolved. Building with LLMs is no longer about finding the right words — it is about answering: “What context configuration is most likely to generate the desired model behavior?”

The Azure SRE Agent team discovered that context improvements outperform model upgrades and prompt optimization combined.

Definition

Context Engineering is the discipline of designing the architecture that feeds the LLM with the right information at the right time.

“Give the model fewer, cleaner choices, and spend your effort making context small, structured, and easy to operate on.” — Azure SRE Agent Team

Context as RAM

Andrej Karpathy proposed treating context windows like memory management:

  • Load: What enters the context
  • Compress: Reduce without losing essence
  • Page: Move to external storage
  • Compute: Process externally

Context Rot is a real phenomenon: quality degrades non-linearly as tokens fill the window. Effects appear well before advertised limits. Advertised windows of 200k+ tokens often have effective capacity below 128k. Recommendation: keep utilization below 40%.
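
A minimal sketch of these four operations plus the utilization threshold, assuming a simple list-of-strings context; the 4-characters-per-token estimate, the thresholds, and the helpers are illustrative, not Azure's or Manus's actual code:

# Context-as-RAM sketch; thresholds and helpers are assumptions.
from pathlib import Path

EFFECTIVE_WINDOW = 128_000        # effective capacity, not the advertised 200k+
TARGET_UTILIZATION = 0.40         # keep the window below ~40% full

def token_count(text: str) -> int:
    return len(text) // 4         # crude estimate; swap in a real tokenizer

def load(context: list[str], item: str) -> None:
    """Load: decide what enters the context."""
    context.append(item)

def compress(context: list[str], summarize) -> list[str]:
    """Compress: summarize older turns, keep the most recent ones verbatim."""
    head, tail = context[:-4], context[-4:]
    return ([summarize("\n".join(head))] + tail) if head else context

def page_out(context: list[str], index: int, store: Path) -> None:
    """Page: move bulky content to external storage, keep only a pointer."""
    store.mkdir(exist_ok=True)
    path = store / f"page_{index}.txt"
    path.write_text(context[index])
    context[index] = f"[paged out to {path}]"

def utilization(context: list[str]) -> float:
    return sum(token_count(c) for c in context) / EFFECTIVE_WINDOW

# Compute is Lesson 5 below: push calculation out to executed code.
# Compact proactively, before quality degrades:
#   if utilization(context) > TARGET_UTILIZATION:
#       context = compress(context, summarize=llm_summarize)   # hypothetical summarizer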

Seven Lessons from Azure SRE Agent

Lesson 1: Trust Enables Reasoning

Systems with 100+ narrow tools created fragility. The change came from trusting the model to reason within broad guardrails.

Before: Encode each scenario into specific tools.
After: Broad tools + guardrails + trust in reasoning.
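
As an illustration only (these schemas and guardrail keys are hypothetical, not the team's real definitions), the shift looks roughly like this:

# Before: one narrow tool per anticipated scenario, each with rigid parameters.
narrow_tools = [
    {"name": "restart_app_service", "params": ["resource_group", "app_name"]},
    {"name": "scale_aks_nodepool", "params": ["cluster", "nodepool", "count"]},
    # ...dozens more, one per scenario someone thought to encode
]

# After: a handful of broad tools; safety lives in guardrails, not in tool shape.
broad_tools = [
    {"name": "run_cli", "params": ["command"]},      # az, kubectl, git, ...
    {"name": "read_file", "params": ["path"]},
    {"name": "message_user", "params": ["text"]},
]

guardrails = {
    "mutating_commands_require_approval": True,
    "allowed_scopes": ["the agent's own subscription"],   # scope limits, not scenario tools
}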

Lesson 2: Leverage Existing Knowledge

Instead of creating abstractions for Azure CLI and Kubernetes, the team exposed commands directly. LLMs already know these CLIs from training data.

Fighting against the model’s pre-existing knowledge through abstraction layers is counterproductive. The model already knows how to use kubectl, az, and git. Let it use them.
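
A minimal sketch of that idea, assuming one passthrough tool with an allowlist and a crude mutation check (none of this is the actual Azure SRE Agent implementation):

# Expose real CLIs directly behind one broad tool; guardrails replace wrappers.
import shlex
import subprocess

ALLOWED_BINARIES = {"kubectl", "az", "git"}        # CLIs the model already knows
MUTATING_VERBS = {"delete", "apply", "scale", "push", "set"}

def run_cli(command: str, approved: bool = False) -> str:
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_BINARIES:
        return f"error: only {sorted(ALLOWED_BINARIES)} are exposed"
    if not approved and any(verb in argv for verb in MUTATING_VERBS):
        return "error: mutating command requires human approval"
    result = subprocess.run(argv, capture_output=True, text=True, timeout=120)
    return result.stdout or result.stderr

# The model writes the command itself, no abstraction layer needed:
#   run_cli("kubectl get pods -n payments --field-selector=status.phase!=Running")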

Lesson 3: Multi-Agent Coordination Is Complex

Scaling from 10 to 50+ specialized agents created predictable failures:

  • Discovery problems: Agents did not know about distant capabilities
  • Prompt fragility: One poorly tuned agent corrupted the entire chain
  • Infinite loops: Agents delegating work circularly
  • Tunnel vision: Rigid boundaries prevented cross-domain reasoning

The solution was to collapse dozens of specialists into a few generalists with broad tools and on-demand knowledge files.

Lesson 4: Invest in Capabilities, Not Restrictions

Inspired by Anthropic’s “agent skills” concept: on-demand knowledge instead of rigid specialization.

The pattern that works: a few generalist agents, broad and flexible tools, knowledge files loaded when needed, and guardrails instead of hard-coded restrictions.
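
One way to sketch the knowledge-files part, assuming a local skills/ directory of markdown notes (the layout and names are illustrative, not Anthropic's skills format):

# On-demand knowledge: the agent sees only a cheap index until a task needs more.
from pathlib import Path

SKILLS_DIR = Path("skills")

def list_skills() -> list[str]:
    """Stays in context permanently: names only, not contents."""
    return sorted(p.stem for p in SKILLS_DIR.glob("*.md"))

def load_skill(name: str) -> str:
    """The full knowledge file enters context only when the task calls for it."""
    path = SKILLS_DIR / f"{name}.md"
    return path.read_text() if path.exists() else f"unknown skill: {name}"

# e.g. the agent sees ["aks-troubleshooting", "cosmosdb-throttling", ...] and loads
# "aks-troubleshooting" only when a cluster incident actually shows up.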

Lesson 5: LLMs Orchestrate, They Do Not Calculate

Dumping 50k tokens of raw metrics into context was the wrong approach.

Wrong: Raw metrics in context for analysis.
Right: Model writes code (pandas/numpy), executes, returns results.

Result: Eliminated token overhead and extended analysis windows by 10x.
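
A sketch of that split, assuming a sandboxed code-execution tool; the file name, columns, and helper are illustrative:

# The model writes a short analysis script; only its printed summary enters context.
import contextlib
import io

ANALYSIS_SNIPPET = """
import pandas as pd
df = pd.read_csv("metrics/cpu_usage.csv")           # tens of thousands of rows stay on disk
p99 = df.groupby("node")["cpu_pct"].quantile(0.99)
print(p99.sort_values(ascending=False).head(5))     # a few hundred tokens come back
"""

def run_python(snippet: str) -> str:
    """Execute model-written code and return captured stdout to the context."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(snippet, {})                            # use a real sandbox in production
    return buffer.getvalue()

# summary = run_python(ANALYSIS_SNIPPET)   # ~5 lines of output instead of 50k tokens of metrics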

Lesson 6: Externalize Plans and Compact History

Use explicit checklists (todo-style planners) kept outside the model's context: plans live in external files (todo.md), history is compacted into summaries, and structured state is preserved.

Manus calls this “recitation”: it constantly rewrites todo.md and re-appends it so the objectives stay in the most recent part of the context.
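
A minimal sketch of that recitation loop; todo.md comes from the article, the helper itself is hypothetical:

# Rewrite the plan file after every step and re-append it, so objectives sit in the
# most recent, best-attended part of the context.
from pathlib import Path

TODO = Path("todo.md")

def recite_plan(steps: list[tuple[str, bool]]) -> str:
    lines = [f"- [{'x' if done else ' '}] {step}" for step, done in steps]
    TODO.write_text("\n".join(lines) + "\n")
    return TODO.read_text()                  # appended to the context tail each turn

plan = [
    ("Check error logs for the failing deployment", True),
    ("Identify the root cause", False),
    ("Open a support ticket if it is a quota issue", False),
]
context_tail = recite_plan(plan)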

Lesson 7: Treat Large Outputs as Data Sources

When tools return massive payloads (200k+ tokens from database queries):

  1. Intercept into session-based files
  2. Model inspects via additional tools
  3. Filters and analyzes incrementally
  4. Never dump raw data into context

Treat the file system as unlimited extended context.
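
A sketch of the interception step, with an assumed size threshold and paths (not the team's actual values):

# Large tool results become files plus a pointer; small ones stay inline.
from pathlib import Path

SESSION_DIR = Path("session")
INLINE_TOKEN_LIMIT = 2_000                         # assumed cutoff

def handle_tool_output(tool_name: str, payload: str) -> str:
    if len(payload) // 4 <= INLINE_TOKEN_LIMIT:    # rough token estimate
        return payload
    SESSION_DIR.mkdir(exist_ok=True)
    path = SESSION_DIR / f"{tool_name}_output.json"
    path.write_text(payload)
    return (f"[{tool_name}: {len(payload)} bytes saved to {path}; "
            "inspect it incrementally with file and search tools instead of loading it all]")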

Tool Call Chaining

This emerging practice reduces token overhead by 60-70%.

Traditional: Model calls Tool A, returns to model, calls Tool B, returns to model, calls Tool C…

Improved: Model generates script that executes A, B, C in sequence, then returns to model.

When the sequence is predictable, execute it in a block. Reserve the model for decisions requiring reasoning.
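
A sketch of the improved pattern, assuming the agent has a code-execution tool; the specific commands are illustrative, not a prescribed sequence:

# One model turn emits this whole script; only its printed output re-enters context.
import subprocess

def sh(cmd: str) -> str:
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

pods   = sh("kubectl get pods -n payments -o name | head -20")                      # Tool A
events = sh("kubectl get events -n payments --sort-by=.lastTimestamp | tail -30")   # Tool B
quota  = sh("az vm list-usage --location eastus -o table")                          # Tool C

print("PODS:\n", pods, "\nEVENTS:\n", events, "\nQUOTA:\n", quota)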

Manus Techniques: KV-Cache

Manus maintains a 100:1 ratio between input and output tokens. The key is cache optimization.

  • Cached input (cache read) on Claude Sonnet: $0.30/MTok
  • Uncached input: $3.00/MTok

10x savings when you keep prompt prefixes stable.

Practices for High Cache Rate

  • Keep prefixes stable: Timestamps destroy cache
  • Append-only architecture: Add to context, do not reorder
  • Deterministic serialization: JSON with sorted keys
  • Explicit breakpoints: Mark where cache can be invalidated
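
A minimal sketch of cache-friendly context construction; the structure is an assumption, not Manus's code:

# Stable prefix + append-only history + deterministic serialization = high cache hit rate.
import json

SYSTEM_PROMPT = "You are an SRE agent. ..."       # static: no timestamps, no request IDs

def serialize(turn: dict) -> str:
    """Sorted keys and fixed separators keep repeated prompts byte-identical."""
    return json.dumps(turn, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

def build_prompt(history: list[dict], new_turn: dict) -> str:
    history.append(new_turn)                      # append-only: never reorder past turns
    return SYSTEM_PROMPT + "\n" + "\n".join(serialize(t) for t in history)

# Anti-example: putting f"Current time: {datetime.now()}" at the top of the system prompt
# changes the first tokens of every request and invalidates the whole KV-cache.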

Hierarchical Action Space

100+ tools cause “Context Confusion”. The solution is hierarchy.

Level 1 - Atomic: ~20 core tools always visible (file_write, browser_navigate, bash, message_user)

Level 2 - Sandbox Utilities: CLI commands via bash for more complex operations

Level 3 - Code and Packages: Complex logic in code chains, Python libraries

With 100+ visible tools, models hallucinate parameters or call wrong tools. Hierarchy keeps the visible set small while preserving full power.
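
Illustratively (the Level 1 tool names come from the article; the rest is an assumed sketch), the hierarchy keeps the schema the model sees tiny:

# Level 1: a small atomic set is the only tool schema the model ever sees
# (the article cites ~20 core tools; four of them are shown here).
ATOMIC_TOOLS = ["file_write", "browser_navigate", "bash", "message_user"]

# Level 2: richer operations are reached through bash, not through new tool definitions,
#   e.g. bash("jq '.items[].status' session/query_output.json | sort | uniq -c")

# Level 3: complex logic is written as code and executed,
#   e.g. bash("python analyze_metrics.py") for a pandas/numpy analysis

def visible_tools() -> list[str]:
    """Everything beyond the atomic set is reached through these, not added to the schema."""
    return ATOMIC_TOOLS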

Anti-patterns to Avoid

  • Adding/removing tools dynamically (invalidates cache)
  • Aggressive context compression (irreversible loss)
  • Hiding errors from the model (prevents learning)
  • Timestamps in system prompts (destroys cache)
  • Unstable JSON serialization (cache miss)
  • Overly uniform context (causes pattern collapse)
  • Dynamic RAG for tool definitions (generates hallucinations)

Manus Principle: Introduce controlled variation in serialization, phrasing, and ordering to break repetitive patterns.

Three Context Problems

Context Rot

Performance degrades as window fills. Solution: define pre-rot thresholds (~128k), keep utilization below 40%, compact history proactively.

Context Pollution

Irrelevant information distracts the model. Solution: compaction to remove redundancy, preserve paths instead of content, selective summarization.

Context Confusion

Model cannot distinguish instructions, data, and markers. Solution: clearly separate sections, use explicit delimiters, avoid conflicting instructions.

Patterns That Survived Production

  • Broad Tools: Few powerful tools outperform many narrow ones
  • Code Interpretation: For deterministic analysis, model writes code
  • Context Compaction: Continuous history summarization
  • Progressive Disclosure: Session-based file system
  • Tool Chaining: Predictable sequences execute in blocks
  • Preserve Failures: Errors remain visible for learning

Real Case: Azure SRE Solves the Unexpected

The team’s own Azure OpenAI deployment started failing. There was no pre-defined workflow. The agent:

  1. Checked error logs
  2. Identified quota error
  3. Queried subscription limits
  4. Found the correct support category
  5. Automatically opened a ticket
  6. Next day, quota increased

Why it worked: Generalist agents, broad tools (real CLI), trust in reasoning, clean and structured context.

Emergent behavior arises from well-designed context, not explicit scenario programming.

Design Framework

Four Context Engineering operations:

  • Write: Persist information into context. What enters and in what format.
  • Select: Choose which information to include at each step.
  • Compress: Reduce without losing essence. Prioritize reversibility.
  • Isolate: Separate contexts. Sub-agents with minimum necessary context.

The Philosophy That Worked

“Performance gains came from removing things, not adding complexity.” — Manus AI Principle

As models improve, systems should reduce scaffolding, not increase it. Context engineering focuses on finding the minimum effective context needed per step.


At Victorino Group, we apply governed context engineering for companies that need reliable AI agents in production. If you need to optimize your agentic systems, let’s talk.

If this resonates, let's talk

We help companies implement AI without losing control.

Schedule a Conversation