Four Months, One Redis Array Type, and What It Tells You About AI on Production Code
Salvatore Sanfilippo spent four months building a Redis Array type with AI assistance. The discipline he describes is the harness thesis in practice.
The tools change fast. The engineering principles that make AI governable do not.
51 articles
Tim Kellogg's taxonomy of agent memory, with one operational number worth governing around. Memory blocks above 500 characters confuse the agent.
At 50+ MCP tools, schemas eat 5–7% of an agent's context window before the user types anything. Code Mode is one answer. Tool-surface governability is the question.
Claude Code's architecture is a loop. The interesting work, and the governance surface, lives at the perimeter: tools, prompts, checkpoints.
Simon Willison's 4.6-to-4.7 diff turns a vendor system prompt into an auditable reference for teams building their own AI guardrails.
Anthropic's MCP co-creator just made the 2026 agent stack official. Here is what procurement should demand before Q2 contracts close.
Callstack cut agent token usage by 50% without changing models. They just stopped sending lazy context. Cost governance disguised as an engineering note.
Cursor shipped 235 optimized CUDA kernels with a 38% average speedup. The real story isn't the number. It's the measurement discipline behind it.
BugBot hits 78% resolution with 44K autonomous rules. Claude PR review needs human-written config. Same question, two answers.
A researcher used Claude to find a 23-year-old Linux vulnerability. The governance question is not whether AI can find bugs. It is who else is already looking.
Stanford's Meta-Harness automates harness optimization with an agentic search loop. If harnesses are governance, governance is now optimizable by machines.
New data: 60% of ChatGPT citations don't appear in search results at all. Organizations can't govern AI visibility like SEO.
Agent labs are training their own models. Whether to train vs. harness depends on four dimensions that double as a governance framework.
A type-constrained harness turned Qwen's 6.75% function calling success into 99.8%. The harness tax is negative. It pays for itself on the first call.
An agentic mesh connects isolated AI agents into a governed network. Most organizations need one. Few know it exists.
Anthropic's interpretability research shows Claude uses different computation than it describes. Chain-of-thought is post-hoc rationalization.
An agent harness is the infrastructure wrapping your AI model. Same model, different harness, 85% better results. Here is what that means.
Anthropic's long-running app harness reveals generator-evaluator loops, sprint contracts, and context decay. The data is thin. The patterns are real.
USC research: expert persona prompts drop accuracy 3.6pp while boosting safety 17.7pp. The fix is not removing personas. It is routing them.
Ramp's spend data shows Anthropic grew from 4% to 24.4% in one year, winning 70% of head-to-head matchups. Governance posture drives commercial outcomes.
One team burned 72% of their context on tool definitions. Three independent solutions converge on the same principle: less is more.
An Anthropic insider reveals the operational patterns from hundreds of production skills. The lessons challenge how most teams think about agent capabilities.
CLIs save tokens for individuals. MCP wins for enterprises. The real question is not which is better but who is asking.
Same model, same benchmark. Claude Opus 4.5 scored 42% and 78% depending on the harness. The model is not the product.
Anthropic's Claude Code team shares 4 tool design lessons. First-party evidence confirms: tools shape agent behavior more reliably than instructions.
Separating verified signal from marketing noise in the emerging field of Generative Engine Optimization.
Erlang solved agent orchestration in 1986. Four converging signals show the industry is rediscovering supervision, isolation, and context discipline.
Training data goes stale fast. An ICML 2026 paper shows how composing existing problems extends its life and why smaller models can win.
Chrome's new WebMCP standard lets websites expose structured tools to AI agents. What this means for the web.
Anthropic retired a take-home exam after Claude matched top human performance. The real lesson isn't about AI speed. It's about governance.
A technical breakdown of the /insights command architecture and what its 6-stage pipeline teaches about observability in agentic workflows.
An engineering teardown of OpenAI's open-source agent architecture and what it means for teams building their own.
Vercel study shows 100% pass rate with AGENTS.md vs 53% for skills. The problem: agents never invoked the skills they needed.
AI conversations generate insights that remain trapped in chat logs. FigJam integration creates shareable team artifacts.
How BugBot evolved from static pipeline to autonomous agent and what engineering teams can learn about AI-powered DevEx.
How to manage context in production agentic systems. Seven lessons from Azure SRE Agent and advanced Manus techniques.
FastMCP 3.0 rebuilds the framework around three fundamental primitives. What Components, Providers, and Transforms mean for enterprise teams.
97 million monthly downloads. Also blamed for hallucinating agents. The problem is not the protocol. It is your server design.
If you repeat the same instructions every session, you are not developing with AI. How to build systems that learn from each interaction.
Practical guide to creating an effective CLAUDE.md. Best practices, structure, examples, and common mistakes when setting up Claude Code.
Anthropic releases Claude Cowork: agentic automation for everyone. Technical analysis, real use cases, and what actually works.
Live demo from Cloud Next 2026: same question, two answers. Metadata alone got it wrong. A context graph got it right. Notes from the room.
Reco.ai spent $400 and 7 hours on AI code generation. Verification took a full week. Three sources confirm: judgment is the last constraint.
Stripe's 1,300 weekly agent PRs prove the point: the architecture around the AI matters more than the AI itself.
OpenAI calls it harness engineering. Anthropic calls it effective harnesses. The discipline is old. The recognition is overdue.
The agent-vs-tool debate masks a deeper architectural question. The real skill is knowing when each pattern applies.
GitHub adds AI agents to Actions. The real shift is not markdown over YAML. It is making non-deterministic work systematic.
Both OpenAI and Anthropic released frontier models the same day. The real story is not which won. It is what their convergent bets tell practitioners.
Stanford data shows codebase quality predicts AI productivity. The governance infrastructure was always there: linters, tests, type safety.
Tool descriptions steer agents. Context rot erodes control. The real governance challenge is not protocol choice. It is what the agent sees.
DeepSeek's mHC innovation uses doubly stochastic matrices to stabilize deep networks. Technical analysis and strategic impact for leaders.
Technical deep-dives into code, MCP, agents, and engineering patterns.