The Redux Maintainer Just Documented the Most Honest Agent Workflow of 2026

On May 7, 2026, Mark Erikson published Part 2 of his AI workflow series. Erikson maintains Redux. His day job is at Replay. He is the person who answers when a senior frontend engineer files a state-management bug at three in the morning, and he has answered enough of them across enough years to have opinions worth reading.

The headline of the post is not the tool stack. The headline is what he refused to do, and what he openly admitted he still cannot solve.

The stack itself is interesting. OpenCode with the CodeNomad UI. Claude Opus 4.5 and 4.6 over the API. Custom MCPs named grepika, tilth, and cachebro. A custom Bun script called devplans.ts that handles session handoffs. Most of that is replaceable. Tools change every six weeks. The discipline does not.

The discipline reads like a checklist of things that a lot of practitioners pretend they have already moved past. Erikson has not moved past them. He runs a parent orchestrator session that spawns interactive child subtask sessions, and he limits himself to one concurrent workstream. His own words: “I am intentionally choosing to limit the workflow to what I can manage in my own head.” He refuses YOLO permission modes. He uses regex-based command filtering rather than agent-call-based safety. He commits to Git by hand.

If you read that and felt the urge to argue with him, the next two sections are for you.

What Erikson Refused, and Why It Lands Differently in 2026

Three refusals stand out. Each of them puts pressure on something a vendor or a thought leader has been selling for the last twelve months.

The first refusal is YOLO permission modes. Most agent runners ship with a “go” switch that turns off the prompt for individual tool calls. Erikson does not flip it. The argument for flipping it is throughput. The argument against, which Erikson makes by simply not using it, is that an unprompted agent run is a run you cannot reconstruct. You trade a slower loop for a faster loop with no record of which decisions the model made on your behalf. When something breaks, you have no idea where to start reading.
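Reconstruction needs a record. What follows is a minimal sketch of an append-only log of tool-call decisions, written under stated assumptions. The type names, field names, and the `DecisionLog` class are hypothetical and do not come from Erikson's post or from any agent runner's API.

```typescript
// Hypothetical record shape; every name here is an assumption for illustration.
type Decision = "approved" | "denied";
type Decider = "human-prompt" | "regex-filter";

interface ToolCallRecord {
  readonly seq: number;
  readonly tool: string;
  readonly input: string;
  readonly decision: Decision;
  readonly decidedBy: Decider;
}

class DecisionLog {
  private readonly records: ToolCallRecord[] = [];

  // Append one decision. Earlier entries are never edited or removed.
  append(tool: string, input: string, decision: Decision, decidedBy: Decider): ToolCallRecord {
    const record: ToolCallRecord = Object.freeze({
      seq: this.records.length + 1,
      tool,
      input,
      decision,
      decidedBy,
    });
    this.records.push(record);
    return record;
  }

  // One line per decision, in the order the decisions were made.
  replay(): string[] {
    return this.records.map(
      (r) => `#${r.seq} ${r.decidedBy} ${r.decision}: ${r.tool} ${r.input}`,
    );
  }
}
```

The point of the sketch is the shape, not the storage. A prompted run produces one of these entries per tool call for free, because a human had to answer. A YOLO run produces none unless somebody builds the log on purpose.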

The second refusal is agent-call-based safety. Many recent safety architectures route every tool call through a guardian agent that decides whether it is allowed. The pitch is that an LLM understands intent and can block dangerous calls that a regex cannot. Erikson chose the regex. A regex is deterministic and auditable, and it cannot hallucinate. Two engineers reading the same regex see the same set of allowed commands. Two engineers reading a guardian agent’s recent log do not.
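A deterministic filter of this kind can be sketched in a few lines. This is not Erikson's configuration. The rule names, the patterns, and the `checkCommand` function are all assumptions chosen for illustration, and a real filter would need patterns reviewed against the commands your agents actually run.

```typescript
// Hypothetical rules: names and patterns are illustrative assumptions,
// not taken from Erikson's setup.
type Verdict = { allowed: boolean; rule: string };

// Deny rules are checked first, so a dangerous fragment wins over any allow rule.
const DENY: ReadonlyArray<[string, RegExp]> = [
  ["no-recursive-delete", /\brm\s+-[a-zA-Z]*[rR][a-zA-Z]*\b/],
  ["no-git-push", /^git\s+push\b/],
  ["no-shell-chaining", /[;&|`]|\$\(/],
];

const ALLOW: ReadonlyArray<[string, RegExp]> = [
  ["git-read-only", /^git\s+(status|diff|log|show)\b/],
  ["run-tests", /^(npm|bun)\s+(test|run\s+test)\b/],
  ["list-files", /^ls(\s|$)/],
];

function checkCommand(command: string): Verdict {
  const cmd = command.trim();
  for (const [rule, pattern] of DENY) {
    if (pattern.test(cmd)) return { allowed: false, rule };
  }
  for (const [rule, pattern] of ALLOW) {
    if (pattern.test(cmd)) return { allowed: true, rule };
  }
  // Anything no rule recognizes is refused.
  return { allowed: false, rule: "default-deny" };
}
```

With these illustrative rules, `git status` passes under `git-read-only`, and `git status && rm -rf .` is refused. The default-deny fallthrough matters more than any single pattern: a command nobody anticipated is blocked, and the verdict always names the rule that produced it.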

The third refusal is concurrent subtasks. The frontier of agentic workflows is many parallel sub-agents, hierarchies, swarms. Erikson runs one at a time. His reason is not that the technology cannot do more. His reason is that he cannot mentally model more than one in flight, and he refuses to operate a system whose state he cannot hold in his head. Apply that test to your own production agents. How many of them produce output that any single engineer on your team can fully reconstruct after the fact? If the answer is “none,” that is a finding, not an achievement.

None of these three positions is radical in isolation. What is striking is that a maintainer of Erikson’s caliber publishes them together and is not embarrassed to say “I limit myself.” The implicit message is that the people shipping the loudest, fastest, most parallel agent stacks may be doing so because they have not yet had to live with the consequences.

The Two Open Problems He Was Honest About

The more important contribution of the post is not the refusals. It is the two surfaces Erikson openly named as unsolved.

The first is long-term memory and context. Erikson is explicit. When he needs to reconstruct what he and the agent decided two sessions ago, he digs through prior sessions by hand. There is no working long-term memory. The session is the memory. Cross-session continuity is a manual archaeology problem, and the workaround is his devplans.ts script that hand-rolls handoffs between sessions.
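A hand-rolled handoff can be as plain as a structured note written at the end of one session and read at the start of the next. The sketch below is not a reconstruction of devplans.ts, whose internals this analysis does not describe. The `Handoff` shape and the `renderHandoff` function are hypothetical, and the fields are a guess at what a human would want to find later.

```typescript
// Hypothetical handoff shape; field names are assumptions for illustration.
interface Handoff {
  sessionId: string;
  goal: string;
  decisions: string[];
  openQuestions: string[];
  nextSteps: string[];
}

// Render the handoff as plain text that a human or a fresh session can read.
// Empty sections are omitted entirely.
function renderHandoff(h: Handoff): string {
  const section = (title: string, items: string[]): string[] =>
    items.length === 0 ? [] : [title + ":", ...items.map((item) => "- " + item)];

  return [
    "Session: " + h.sessionId,
    "Goal: " + h.goal,
    ...section("Decisions", h.decisions),
    ...section("Open questions", h.openQuestions),
    ...section("Next steps", h.nextSteps),
  ].join("\n");
}
```

A note like this does not solve long-term memory. It only shrinks the archaeology, because the decisions are written down once at the moment they are made instead of being dug out of a transcript later.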

The second is code review and intent verification. His exact framing: “code review and ensuring intent are still hard.” This is the part that engineering leaders are most likely to misread. He is not saying that agents cannot write code. He is saying that nobody, including him, has a reliable way to confirm that the code the agent produced reflects the intent the human had at the start. The verification surface is still human.

Both of these are surfaces that vendors are racing to fill. The race is real, and somebody will eventually ship something useful in each lane. Today, in May 2026, the most respected practitioner publishing on this topic says neither lane is closed. Your operating assumption should match his.

There is a connection between his three refusals and his two open problems that is worth naming. The refusals exist because the open problems exist. If long-term memory worked, the case for stateless YOLO runs would be much stronger because you could reconstruct what happened. If reliable AI code review existed, the case for high-throughput parallel subtasks would be much stronger because each output would be independently verifiable. The discipline he practices is not arbitrary. It is exactly the discipline a senior practitioner adopts when the two load-bearing primitives are still missing.

Why This Matters for Your Stack

We have written before about why the harness is your memory and why subtraction beats addition in harness design. Erikson’s post is the field validation for those positions, written by somebody who is not building a Victorino service.

Read his post against your own production agent setup. Three questions are worth asking.

Are your agents running with permission models that an outside reviewer could reconstruct? If your team uses YOLO modes in production, you have implicitly accepted that you will not be able to explain individual decisions after the fact. Erikson chose not to make that trade. The question is whether your team made the choice deliberately or by default.

Is your safety layer deterministic or model-based? A guardian agent is a useful complement to a deterministic filter. It is a dangerous replacement for one. The regex is boring and that is the point. Boring is auditable.

Do you have a written rule for how many concurrent subtasks any one operator manages? If not, you have one in practice and you are not measuring it. The number does not have to be one. Erikson chose one for himself. A team operating at scale will choose more. But the number should be a decision, not an emergent property of whatever the tooling defaults to.
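Making the number a decision can be as small as a named constant and a guard that enforces it. The following is a sketch under stated assumptions. `MAX_CONCURRENT_SUBTASKS` and `WorkstreamCap` are hypothetical names, not part of OpenCode or any other runner, and the value of one mirrors Erikson's personal choice rather than a recommendation.

```typescript
// Hypothetical names throughout. The cap is written down in one place
// so that somebody owns it.
const MAX_CONCURRENT_SUBTASKS = 1; // Erikson's choice for himself; a team may pick more

class WorkstreamCap {
  private readonly limit: number;
  private readonly active = new Set<string>();

  constructor(limit: number) {
    if (!Number.isInteger(limit) || limit < 1) {
      throw new RangeError("limit must be a positive integer");
    }
    this.limit = limit;
  }

  // Returns true if the subtask may run, false if the cap is already reached.
  tryStart(taskId: string): boolean {
    if (this.active.has(taskId)) return true; // already counted
    if (this.active.size >= this.limit) return false;
    this.active.add(taskId);
    return true;
  }

  // Frees the slot held by a finished subtask.
  finish(taskId: string): void {
    this.active.delete(taskId);
  }

  get inFlight(): number {
    return this.active.size;
  }
}

const cap = new WorkstreamCap(MAX_CONCURRENT_SUBTASKS);
```

The guard refuses the second subtask instead of queueing it. That is a deliberate choice in this sketch: a refusal forces the operator to notice the cap, where a silent queue would let the backlog grow out of sight.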

Where the Frontier Actually Is

The post draws a sharper line than most leaders are willing to draw publicly between solved and unsolved. The solved part is hands-on engineering with an agent that obeys a deterministic filter, commits when a human says commit, and works on one thing at a time. That part works today and it works well. The unsolved part is cross-session continuity and verification of intent.

The vendor pitches around memory products and AI-driven code review are not wrong to exist. They are pointed at real surfaces. But they are pointed at surfaces that one of the most respected practicing maintainers in the industry says are not yet closed. If you are budgeting agent investment for the next two quarters, weight the budget toward the parts Erikson confirms are solved, and treat the memory and review parts as research bets, not as production primitives.

Do This Now

Open the operating doc for one production agent on your team. Find three things.

Find the permission model. Write down whether your operators are running with prompted approval, deterministic filters, or YOLO mode. If it is YOLO, schedule a review with the owning engineer this week. The question is not whether you trust the agent. The question is whether you can reconstruct its decisions if a customer asks.

Find the concurrency cap. Write down the maximum number of concurrent subtasks any single operator runs. If the number is not written down, write it down today. Pick a number you can defend. Erikson picked one. Your team may pick three. The number itself is less important than the fact that somebody owns it.

Find the memory story. Write down how an operator reconstructs what was decided three sessions ago. If the answer is “they grep through prior logs,” you are running the same workaround Erikson is, and that is acceptable. If the answer is “our memory product handles it,” verify that claim against a real case before you bet a release on it.

The bar Erikson set in this post is not a high bar. It is an honest one. Match it.

This analysis synthesizes “My Thoughts on AI, Part 2: Agent Setup, Workflow, and Tools” by Mark Erikson (May 2026).

Victorino Group helps engineering leaders codify the agent discipline that top-tier maintainers practice, with explicit guardrails for the gaps vendors have not yet closed.