Operating AI

Your Agent Remembers Everything. Who Governs That?

TV
Thiago Victorino
10 min read
Your Agent Remembers Everything. Who Governs That?

In February 2026, a startup called Mastra published a technical paper on what they call “observational memory.” The idea is elegant: two background agents — an Observer and a Reflector — silently compress an agent’s conversation history into dated observation logs. No vector database required. The approach scored 84.23% on LongMemEval using GPT-4o, outperforming standard retrieval-augmented generation at 80.05%. It compresses text by three to six times, which enables prompt caching and meaningful cost reduction.

The paper focuses entirely on accuracy and efficiency. It does not mention governance once.

This is not a criticism of Mastra. It is a description of the entire field. Every major agent memory project — Mem0, Letta, Zep, A-MEM, Hindsight — optimizes for the same two variables: how accurately can the agent recall information, and how cheaply can it do so. These are important questions. They are also incomplete.

The question nobody is asking: once an agent remembers, who decides what it is allowed to remember, how long it keeps that memory, and what happens when it remembers something it should not?

Memory Is Not Storage

The instinct is to treat agent memory like a database. Data goes in, data comes out, access controls apply. This mental model is wrong, and the error has consequences.

Agent memory is not storage. It is cognition infrastructure. When an agent remembers a user preference, a prior instruction, or a previous decision, that memory shapes every subsequent action the agent takes. Memory does not sit passively in a table. It actively influences behavior.

This distinction matters because it determines what governance looks like. If memory were storage, existing data governance frameworks — access controls, retention policies, encryption at rest — would largely suffice. But memory-as-cognition means that the content of memory changes what the agent does. Governing memory therefore requires governing behavior, not just data.

Harrison Chase, CEO of LangChain, recently defined context engineering as “building dynamic systems to provide the right information and tools in the right format such that the LLM can plausibly accomplish the task.” That definition captures the technical intent. What it leaves out is that the “right information” is a judgment call with material consequences — and persistent memory makes that judgment call permanent.

The Four Paradigms and Their Governance Profiles

Not all agent memory works the same way. The architecture determines the risk profile. Four paradigms have emerged, each with distinct governance implications that the industry has not yet addressed.

Observational memory stores compressed natural-language logs. Mastra’s approach is the clearest example: the Observer watches conversations, the Reflector synthesizes patterns, and the result is a timestamped text document. From a governance perspective, this is the most auditable paradigm. The memory is human-readable. You can inspect it, search it, understand what the agent remembers and why.

The risk is lossy compression. When two agents compress hours of conversation into a paragraph of observations, details are lost. Some of those details may be precisely the ones governance requires. A user’s withdrawal of consent, a correction to sensitive data, a boundary condition on how information should be used — any of these can disappear in compression. You cannot audit what was discarded.

Graph-based memory structures knowledge as nodes and relationships. Zep builds temporal knowledge graphs; Mem0 uses a combination of vector embeddings and graph structures. This paradigm is auditable through structured queries. You can ask the graph what the agent knows about a specific entity, trace relationship chains, and identify when knowledge was added or modified.

The risk is schema rigidity. Graph-based memory captures what it was designed to capture. Governance-relevant information that does not fit the schema — nuance, context, ambiguity — is either forced into categories that distort it or dropped entirely. The audit trail is precise but potentially incomplete in ways that are difficult to detect.

Self-editing memory gives the agent authority to manage its own memory store. Letta (formerly MemGPT) pioneered this approach: the agent decides what to remember, what to forget, and how to reorganize its knowledge. The advantage is adaptability. The agent tailors its memory to its tasks.

The risk is that self-editing memory allows the agent to overwrite its own constraints. If governance rules are encoded as memory — “do not share this user’s medical data with third parties” — a self-editing agent can, in principle, modify or delete that instruction. This is not a theoretical concern. It is a direct consequence of the architecture. Any system where the governed entity controls its own governance rules has an obvious structural flaw.

Agentic memory takes self-organization further. Systems like A-MEM (inspired by the Zettelkasten method) and Hindsight (which separates epistemic states) allow memory to autonomously restructure itself. Hindsight achieves 91.4% on LongMemEval by maintaining separate representations of what the agent knows versus what the user knows. A-MEM creates interconnected notes that evolve over time.

The risk is emergent structure. When memory self-organizes, the resulting architecture is not designed — it emerges. Emergent structures are inherently difficult to audit because their logic is not explicit. You can inspect the state of the memory at any point, but understanding why it organized itself that way requires reconstructing a process that was not documented.

Why This Matters Now

Three developments have moved agent memory from a research topic to a governance priority.

Agents are becoming employees, not tools. Jerry Liu, CEO of LlamaIndex, published a detailed analysis in February 2026 arguing that agents are transitioning from “workflows” to “employees.” His framework identifies three requirements for this shift: triggers beyond chat (agents that activate themselves), persistent task backlogs (agents that track their own work), and inbox-style interfaces (agents that receive and manage assignments). Sequoia Capital’s January 2026 report takes this further, describing long-horizon agents as functional AGI, with METR data showing agent capabilities doubling approximately every seven months.

When an agent is a tool, its memory is session-scoped. It forgets when you close the window. When an agent is an employee, its memory is career-scoped. It accumulates knowledge across weeks, months, and years of continuous operation. The governance requirements for these two scenarios are categorically different.

Memory makes prompt injection persistent. The security community has understood prompt injection — tricking an AI into following unauthorized instructions — for years. But in a stateless agent, prompt injection is temporary. The attack works for one session and then disappears. Persistent memory changes this equation entirely. A successful prompt injection can embed itself in the agent’s long-term memory, influencing every future interaction. One-time attacks become permanent exploits. OWASP recognized this in their 2026 Top 10 for Agentic Applications, listing Memory and Context Poisoning as ASI06.

Privacy law was not designed for this. GDPR enshrines the right to erasure — the right to have your personal data deleted upon request. This right assumes that data is stored in identifiable, deletable units. Observational memory violates this assumption. When conversations are compressed into synthesized observations, individual data points may be irreversibly merged. Deleting a specific person’s data from a compressed observation log may be technically impossible without destroying the entire log. MIT Technology Review flagged this in January 2026, calling what AI remembers about you “privacy’s next frontier.” The regulatory implications are significant and unresolved.

The Audit Problem

Kiteworks’ 2026 report found that 63% of organizations cannot enforce AI purpose limitations — they cannot ensure their AI systems use data only for approved purposes. This statistic predates the widespread deployment of persistent agent memory. It describes the governance gap for stateless AI. Adding persistent memory to systems that organizations already cannot govern does not create a new problem. It amplifies an existing one.

The core difficulty is that agent memory introduces a new category of organizational knowledge that exists outside traditional governance structures. It is not a document in a content management system. It is not a record in a database. It is not an email in an archive. It is a compressed, synthesized, potentially self-modifying representation of everything the agent has observed, stored in formats that vary by architecture and may not be human-readable.

Current governance frameworks assume that knowledge is created by humans, stored in known formats, and managed through established systems. Agent memory fits none of these assumptions. It is created by machines, stored in paradigm-specific formats, and managed — if managed at all — by the agents themselves.

What Governance Actually Requires

The path forward is not to slow agent memory development. The technology is useful, and it is advancing for good reasons. The path forward is to build governance infrastructure that matches the technology’s sophistication.

Four capabilities are necessary.

Memory auditing by paradigm. Different memory architectures require different audit approaches. Observational memory can be audited through text inspection. Graph memory can be audited through structured queries. Self-editing and agentic memory require behavioral auditing — testing what the agent does in response to governance-relevant scenarios, not just inspecting what it remembers. Organizations deploying agent memory need audit procedures specific to their memory paradigm, not generic AI governance checklists.

Retention and erasure mechanics. Every memory paradigm needs a defined answer to the question: how do we delete a specific individual’s data? For graph memory, this is technically straightforward — delete the relevant nodes and edges. For observational memory, it may require re-compressing the observation log without the target data. For agentic memory, it requires verification that the deleted information has not propagated into emergent structures. These are engineering problems with governance implications, and they need to be solved before deployment, not after a regulator asks.

Constraint immutability. In any paradigm where the agent can modify its own memory, governance constraints must be stored outside the agent’s writable memory space. This is an architectural requirement, not a policy requirement. If the agent can write to the location where its governance rules are stored, those rules are suggestions, not constraints. This principle — that the governed must not govern its own constraints — is elementary in every other domain. It has not yet been applied to agent memory.

Memory provenance tracking. Organizations need to know where each piece of agent memory came from: which conversation, which user, which data source, which timestamp. Without provenance, memory cannot be audited, cannot be selectively deleted, and cannot be traced when something goes wrong. This is metadata infrastructure, and it needs to be built into the memory system from the beginning. Retrofitting provenance onto an existing memory architecture is like retrofitting a foundation onto a finished building.

The Governance Advantage

The industry will eventually recognize that agent memory requires governance. The question is whether your organization figures this out before a regulatory action, a data breach, or a memory-poisoning incident forces the issue.

The organizations that build memory governance infrastructure now will have two advantages. First, they will avoid the incidents that force their competitors to build the same infrastructure under duress. Second, they will be able to deploy more capable agent memory systems earlier, because they will have the governance infrastructure to do so responsibly. Governance is not the brake on agent capability. It is the condition that makes full capability deployable.

Agent memory is advancing because agents that remember are more useful than agents that forget. Nobody disputes this. The missing piece is ensuring that what agents remember, how they remember it, and what they do with those memories remains under organizational control.

Your agents are building memories right now. The question is whether anyone is governing them.


Sources

  • Mastra. “Observational Memory: Rethinking How Agents Remember.” February 10, 2026.
  • Jerry Liu. “Long Horizon Document Agents.” LlamaIndex, February 12, 2026.
  • Sequoia Capital. “2026: This Is AGI.” January 2026.
  • METR. “Agent Capability Doubling Times.” 2025-2026.
  • OWASP. “Top 10 for Agentic Applications: ASI06 — Memory and Context Poisoning.” 2026.
  • MIT Technology Review. “What AI Remembers About You Is Privacy’s Next Frontier.” January 2026.
  • Kiteworks. “2026 AI Governance Report: 63% Cannot Enforce Purpose Limitations.” 2026.
  • Harrison Chase. “Context Engineering.” LangChain / Sequoia Capital podcast. 2026.
  • Letta (MemGPT). “Self-Editing Memory Architecture.” 2025-2026.
  • Zep. “Temporal Knowledge Graphs for Agent Memory.” 2025-2026.
  • A-MEM. “Agentic Memory with Autonomous Zettelkasten.” 2025.
  • Hindsight. “Epistemic Separation in Agent Memory.” 2025-2026. 91.4% LongMemEval.

At Victorino Group, we help organizations build governance infrastructure for AI systems that remember — from memory auditing frameworks to constraint architecture. If your agents are accumulating memory without governance, let’s talk.

If this resonates, let's talk

We help companies implement AI without losing control.

Schedule a Conversation