Four Months, One Redis Array Type, and What It Tells You About AI on Production Code
Salvatore Sanfilippo spent four months building a Redis Array type with AI assistance. The discipline he describes is the harness thesis in practice.
The tools change fast. The engineering principles that make AI governable do not.
51 articles
Tim Kellogg's taxonomy of agent memory, with one operational number worth governing around. Memory blocks above 500 characters confuse the agent.
At 50+ MCP tools, schemas eat 5–7% of an agent's context window before the user types anything. Code Mode is one answer. Tool-surface governability is the question.
Claude Code's architecture is a loop. The interesting work, and the governance surface, lives at the perimeter: tools, prompts, checkpoints.
Simon Willison's 4.6-to-4.7 diff turns a vendor system prompt into an auditable reference for teams building their own AI guardrails.
Anthropic's MCP co-creator just made the 2026 agent stack official. Here is what procurement should demand before Q2 contracts close.
Callstack cut agent token usage by 50% without changing models. They just stopped sending lazy context. Cost governance disguised as an engineering note.
Cursor shipped 235 optimized CUDA kernels with a 38% average speedup. The real story isn't the number. It's the measurement discipline behind it.
BugBot hits 78% resolution with 44K autonomous rules. Claude PR review needs human-written config. Same question, two answers.
A researcher used Claude to find a 23-year-old Linux vulnerability. The governance question is not whether AI can find bugs. It is who else is already looking.
Stanford's Meta-Harness automates harness optimization with an agentic search loop. If harnesses are governance, governance is now optimizable by machines.
New data: 60% of ChatGPT citations don't appear in search results at all. Organizations can't govern AI visibility like SEO.
Agent labs are training their own models. Whether to train vs. harness depends on four dimensions that double as a governance framework.
A type-constrained harness turned Qwen's 6.75% function calling success into 99.8%. The harness tax is negative. It pays for itself on the first call.
An agentic mesh connects isolated AI agents into a governed network. Most organizations need one. Few know it exists.
Anthropic's interpretability research shows Claude uses different computation than it describes. Chain-of-thought is post-hoc rationalization.
An agent harness is the infrastructure wrapping your AI model. Same model, different harness, 85% better results. Here is what that means.
Anthropic's long-running app harness reveals generator-evaluator loops, sprint contracts, and context decay. The data is thin. The patterns are real.
USC research: expert persona prompts drop accuracy 3.6pp while boosting safety 17.7pp. The fix is not removing personas. It is routing them.
Ramp's spend data shows Anthropic grew from 4% to 24.4% in one year, winning 70% of head-to-head matchups. Governance posture drives commercial outcomes.
One team burned 72% of their context on tool definitions. Three independent solutions converge on the same principle: less is more.
An Anthropic insider reveals the operational patterns from hundreds of production skills. The lessons challenge how most teams think about agent capabilities.
CLIs save tokens for individuals. MCP wins for enterprises. The real question is not which is better but who is asking.
Same model, same benchmark. Claude Opus 4.5 scored 42% and 78% depending on the harness. The model is not the product.
Anthropic's Claude Code team shares 4 tool design lessons. First-party evidence confirms: tools shape agent behavior more reliably than instructions.
Separating verified signal from marketing noise in the emerging field of Generative Engine Optimization.
Erlang solved agent orchestration in 1986. Four converging signals show the industry is rediscovering supervision, isolation, and context discipline.
Training data goes stale fast. An ICML 2026 paper shows how composing existing problems extends its life and why smaller models can win.
Chrome's new WebMCP standard lets websites expose structured tools to AI agents. What this means for the web.
Anthropic retired a take-home exam after Claude matched top human performance. The real lesson isn't about AI speed. It's about governance.
A technical breakdown of the /insights command architecture and what its 6-stage pipeline teaches about observability in agentic workflows.
An engineering teardown of OpenAI's open-source agent architecture and what it means for teams building their own.
Vercel study shows 100% pass rate with AGENTS.md vs 53% for skills. The problem: agents never invoked the skills they needed.
AI conversations generate insights that remain trapped in chat logs. FigJam integration creates shareable team artifacts.
How BugBot evolved from static pipeline to autonomous agent and what engineering teams can learn about AI-powered DevEx.
How to manage context in production agentic systems. Seven lessons from Azure SRE Agent and advanced Manus techniques.
FastMCP 3.0 rebuilds the framework around three fundamental primitives. What Components, Providers, and Transforms mean for enterprise teams.
97 million monthly downloads. Also blamed for hallucinating agents. The problem is not the protocol. It is your server design.
If you repeat the same instructions every session, you are not developing with AI. How to build systems that learn from each interaction.
Practical guide to creating an effective CLAUDE.md. Best practices, structure, examples, and common mistakes when setting up Claude Code.
Anthropic releases Claude Cowork: agentic automation for everyone. Technical analysis, real use cases, and what actually works.
Live demo from Cloud Next 2026: same question, two answers. Metadata alone got it wrong. A context graph got it right. Notes from the room.
Reco.ai spent $400 and 7 hours on AI code generation. Verification took a full week. Three sources confirm: judgment is the last constraint.
Stripe's 1,300 weekly agent PRs prove the point: the architecture around the AI matters more than the AI itself.
OpenAI calls it harness engineering. Anthropic calls it effective harnesses. The discipline is old. The recognition is overdue.
The agent-vs-tool debate masks a deeper architectural question. The real skill is knowing when each pattern applies.
GitHub adds AI agents to Actions. The real shift is not markdown over YAML. It is making non-deterministic work systematic.
Both OpenAI and Anthropic released frontier models the same day. The real story is not which won. It is what their convergent bets tell practitioners.
Stanford data shows codebase quality predicts AI productivity. The governance infrastructure was always there: linters, tests, type safety.
Tool descriptions steer agents. Context rot erodes control. The real governance challenge is not protocol choice. It is what the agent sees.
DeepSeek's mHC innovation uses doubly stochastic matrices to stabilize deep networks. Technical analysis and strategic impact for leaders.
Technical deep-dives into code, MCP, agents, and engineering patterns.