Explore, Plan, Code, Commit: The Cheapest Place to Fix an Agent's Work Is Before It Writes Code

Most teams using a coding agent paste a prompt and let the agent type. The model writes code. The engineer reacts to the diff. Every correction at that stage rewrites what was already written. The expensive habit hides in plain sight: nobody planned the work before the agent started spending tokens to produce it.

Anthropic’s canonical workflow for Claude Code names that habit’s opposite: Explore, Plan, Code, Commit. The structure is simple, and the discipline is unfashionable: the agent is not allowed to edit anything until the plan is approved. Plan mode is read-only. The human reviews the plan, not the code. Once the plan is good, the work proceeds.

That single inversion changes the economics of agentic development. The cost of fixing a bad design in a plan is a few sentences of text. The cost of fixing the same bad design in 500 lines of diff is the diff plus the test rerun plus the review loop plus the commit history cleanup. Teams skip plan mode because their organizational muscle still rewards visible typing. They pay for the skip in rework.

The Four Phases, In Order

The workflow has four phases. Each one corresponds to a different posture the agent and the human take toward the work.

Explore. The agent reads files, runs searches, and forms a mental map of where the change belongs. It does not propose actions yet. It is figuring out what it does not know.

Plan. Plan mode, reached in Claude Code by pressing Shift+Tab to cycle through permission modes, locks the agent into a read-only posture. The agent can still read, search, and reason. It cannot edit files, create files, or run shell commands that mutate state. It produces a numbered list of actions it intends to take. The human reads the list and approves, edits, or rejects it.

Code. With the plan approved, the agent works through the proposed actions and executes them one by one. The plan is now a checklist, not a free-form session. Drift is visible because the plan is visible.

Commit. Before the change is committed, a sub-agent code reviewer inspects the diff. Then the agent generates a commit message in the team’s style. The human approves the commit.

The order matters. Each phase is cheaper than the next to correct. Explore corrections cost a search. Plan corrections cost a sentence. Code corrections cost a diff. Commit corrections cost the diff plus the audit trail. Teams that skip directly from prompt to code are choosing the most expensive correction surface as their first line of defense.

The Canonical Example

The Anthropic tutorial uses a concrete prompt to demonstrate the shape: “I need to add WebP conversion to our image upload pipeline. Figure out where in the pipeline it should happen, whether we need new dependencies, and how to approach it.”

Notice what the prompt does not say. It does not say “write the code.” It does not say “open the file and start.” It says “figure out and propose.” That framing puts the agent in explore-then-plan posture by default. The agent reads the pipeline files, runs a web search to check current best practices, and returns a plan. The human reads the plan and decides whether the proposed dependency, the proposed insertion point, and the proposed handling of edge cases are right. The human is reviewing six lines of plan, not 200 lines of diff.

If the plan is wrong, the conversation continues in plan mode. If the plan is right, the human approves and the agent proceeds. The first line of code the agent writes has already survived design review.

Three Verification Surfaces

The workflow assumes verification, and Anthropic recommends three surfaces the agent should learn to use.

The first is the test suite as source of truth. The agent runs tests continuously while coding and treats the test result as the authoritative signal of whether the work is done. Passing tests do not prove correctness, but they remove the class of “I think it works” claims that polluted the first year of agentic development.

The second is browser control for UI work. Claude can drive a Chrome tab through MCP, open the running app, and verify that the change behaves as intended before claiming success. The agent does not just compile the change. It checks that the change does what was asked at the surface a user would touch.

The third is the CLAUDE.md file. Recurring fixes, repository conventions, and decisions that the team has already made get written into CLAUDE.md so the agent stops rediscovering them. Treat CLAUDE.md as the agent’s institutional memory. Once a code reviewer has pasted the same correction twice, that correction belongs in CLAUDE.md before it is needed a third time.

Why Plan Mode Is Architecturally Important

Plan mode is not a UX flourish. It is a containment boundary at the perimeter of the agent’s core loop. We have written before about the while-loop architecture at the heart of Claude Code: the agent is a loop that decides on a tool call, executes it, observes the result, and decides on the next tool call. The loop is fast and capable. The loop is also expensive when it produces work that has to be discarded.

Plan mode wraps the loop. Inside plan mode, the agent’s tool catalog is restricted to read operations. The reasoning is unchanged. The output is a proposal, not a side effect. The human inspects the proposal and either approves it or sends the agent back to think again. The expensive loop only runs against work the human has already endorsed.
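The containment idea can be sketched in a few lines. This is an illustration under stated assumptions, not Claude Code’s implementation: the tool names, the ToolCall type, and run_loop are hypothetical, and the real loop chooses each call from the previous observation rather than from a fixed list.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical tool names for illustration; not Claude Code's actual tool catalog.
READ_ONLY_TOOLS = {"read_file", "search"}


@dataclass
class ToolCall:
    name: str
    argument: str


class PlanModeViolation(Exception):
    """Raised when a mutating tool is requested while plan mode is active."""


def run_loop(
    calls: Iterable[ToolCall],
    tools: dict[str, Callable[[str], str]],
    plan_mode: bool,
) -> list[str]:
    """Execute tool calls in order, refusing non-read-only tools in plan mode."""
    observations = []
    for call in calls:
        if plan_mode and call.name not in READ_ONLY_TOOLS:
            raise PlanModeViolation(f"{call.name} is not allowed in plan mode")
        observations.append(tools[call.name](call.argument))
    return observations


if __name__ == "__main__":
    tools = {
        "read_file": lambda path: f"contents of {path}",
        "edit_file": lambda path: f"edited {path}",
    }
    print(run_loop([ToolCall("read_file", "upload.py")], tools, plan_mode=True))
```

The gate sits outside the reasoning: the same loop runs in both modes, and only the set of permitted side effects changes.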

This is the same containment instinct that drives the agent harness and the broader harness primitives we have argued for: trust is moved from per-action to per-environment, and the environment now includes a phase where the agent reasons without consequences. The savings are structural. You are not catching bad work after it is written. You are catching it before it is written.

Where Teams Stall

The most common failure is not technical. It is organizational. Engineers feel productive when they see code being typed. Plan mode does not produce typing. It produces deliberation. To a culture that rewards visible motion, deliberation looks like the agent is stuck.

The fix is to measure rework instead of throughput. Count the number of times a change was committed, reverted, and re-committed in the same week. Count the number of PRs that required a second round of substantive changes after the first review. Both numbers should fall when plan mode is enforced. Both numbers tend to stay high when teams skip plan mode and react to diffs.
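Both counts are easy to approximate. The sketch below assumes commit subjects have already been collected (for example from git log --since="1 week ago" --format=%s) and that reverts use Git’s default subject, which begins with Revert ". The function names and the per-PR review-round counts are hypothetical inputs; how a team exports them from its review tool is left open.

```python
def count_reverts(subjects: list[str]) -> int:
    """Count commits whose subject uses Git's default revert prefix."""
    return sum(1 for subject in subjects if subject.startswith('Revert "'))


def second_round_rate(review_rounds: list[int]) -> float:
    """Fraction of PRs that needed more than one round of substantive changes."""
    if not review_rounds:
        return 0.0
    return sum(1 for rounds in review_rounds if rounds > 1) / len(review_rounds)


if __name__ == "__main__":
    # Illustrative data only; not measurements from any real repository.
    subjects = [
        "Add WebP conversion",
        'Revert "Add WebP conversion"',
        "Add WebP conversion after resize step",
    ]
    print(count_reverts(subjects))
    print(second_round_rate([1, 2, 1, 3]))
```

Reverts written with a custom subject line will be missed, so treat the count as a lower bound.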

The second failure is treating plan mode as optional friction. It is optional in the same way wearing a seatbelt is optional. The cost is small. The expected loss in the small fraction of cases where the plan was wrong is enormous. Teams that learn this learn it after the first time an agent confidently refactors the wrong file at production scale.

Do This Now

Pick one repository this week. Establish the rule: every change made with a coding agent must go through plan mode. The human approves the plan before any file is edited. The plan lives in the PR description so the reviewer can see what was proposed and what shipped.

Add a CLAUDE.md to the repo if it does not have one. Put three things in it: the test command, the lint command, and the three corrections your team has had to make twice in the last month. Update it every Friday.

Spawn a sub-agent code reviewer for the commit step. Before each commit, the reviewer reads the diff against the plan and flags drift. The human still owns the merge. The reviewer is a cheap second pair of eyes that runs every time, not only when someone remembers to ask.
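The crudest form of the drift check needs no model at all. A minimal sketch, assuming the approved plan names the files it intends to touch and that the changed paths come from something like git diff --name-only. The find_drift function and the file names are hypothetical.

```python
def find_drift(planned_files: list[str], changed_files: list[str]) -> dict[str, list[str]]:
    """Compare the files the plan named with the files the diff actually touched."""
    planned, changed = set(planned_files), set(changed_files)
    return {
        "unplanned": sorted(changed - planned),  # touched but never proposed
        "untouched": sorted(planned - changed),  # proposed but never touched
    }


if __name__ == "__main__":
    # Illustrative file names only.
    plan = ["pipeline/upload.py", "pipeline/convert.py"]
    diff = ["pipeline/upload.py", "pipeline/auth.py"]
    print(find_drift(plan, diff))
```

A file-level comparison will not catch drift inside a planned file; it only shows the reviewer where to look first.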

Two weeks in, count the rework. Compare to the prior month. The number that goes down is the number that decides whether your team can scale agent-assisted development without scaling the cost of fixing what the agent already wrote.

This analysis synthesizes The Explore → Plan → Code → Commit workflow in Claude Code (Anthropic, May 2026), the Claude Code overview, and prior Victorino analysis of the while-loop architecture.
