Codex Just Shipped Agents Into Finance and Sales. Where Are the Boundaries?

For a year the warning was abstract: engineering built AI governance, and every other function would eventually need its own. On June 2, that abstraction shipped as a product. OpenAI released six role-specific Codex plugins that move agents out of the IDE and into sales, finance, design, and investing. The agent that used to write code now updates Salesforce records, pulls FactSet data, and judges whether an investment thesis is getting stronger or weaker.

The capability is impressive. The boundary question is the part nobody packaged.

What actually shipped

According to OpenAI’s announcement, more than 5 million people now use Codex every week, and non-developers (analysts, marketers, operators, designers, researchers, investors, bankers) make up about 20% of users while growing more than 3x faster than developers. Treat those as vendor figures, not independently verified counts. The direction is the signal regardless: the fastest-growing Codex population does not write software.

The six plugins target that population directly: data analytics, creative production, sales, product design, public equity investing, and investment banking. Together, per the announcement, they include 62 popular apps and 110 skills. More are coming: Corporate Finance, Private Equity Investing, Marketing Strategy, Strategy Consulting, and Legal.

Read the connector lists and the autonomy becomes concrete. The sales plugin wires into Salesforce, HubSpot, Slack, Outreach, Clay, Rox, and Actively, and OpenAI says it can update customer records, build close plans, and review deals at risk. The public equity investing plugin connects Moody’s, Daloopa, Datasite, FactSet, LSEG, S&P, PitchBook, and Hebbia, and can assess whether an investment thesis is strengthening or weakening. The investment banking plugin helps bankers prepare pitch materials, analyze comparable companies and transactions, and turn diligence into recommendations.

These are not read-only research toys. They write to systems of record and they produce client-facing judgments.

The boundary question changes shape per domain

Engineering spent the last year answering four questions about its agents: who can act, what they can reach, how they consume resources, how they coordinate. Those answers hold up because engineering's failure modes are contained. A bad code suggestion gets caught at review. A runaway tool call hits a cost ceiling. The blast radius has walls.

Now move the same autonomy into these new domains and the walls are not there.

A sales agent updating CRM records is editing the company’s source of truth for revenue. Pipeline data feeds forecasts, forecasts feed board reporting, board reporting feeds guidance. When the agent marks a deal at risk or rewrites a close plan, who reconciles that against what the rep actually believes? The CRM was already the system most prone to garbage-in. An autonomous writer accelerates both the cleanup and the contamination.

An investing agent assessing whether a thesis is strengthening or weakening is doing something categorically heavier. That is not a record edit. That is analytical judgment feeding a capital allocation decision. If the agent leans on FactSet and S&P data and concludes the thesis holds, and a human signs off without reconstructing the reasoning, the firm has outsourced part of its investment process to a system whose inputs it did not audit. The source caveat matters here too: OpenAI describes the capability, not its accuracy.

A banker agent producing pitch materials and diligence recommendations operates in the most regulated room of the three. Client-ready output in investment banking carries fiduciary and disclosure obligations. A comparable-company analysis is not a draft; it is the basis someone uses to advise a client on a transaction. The agent does not know which obligations attach to which deliverable, because nothing told it.

Three domains, three different boundaries, one shared problem: the systems these agents now write to were built assuming a human was the last actor before anything left the building.

The single control surface, and what it does not cover

OpenAI states one governance lever. For Business and Enterprise workspaces, admins can control underlying app permissions in workspace settings. That is real and it is useful. It is also app-level on-off plumbing, which is far below what these domains actually require.

App permissions answer one question: can this agent touch Salesforce at all? They do not answer the questions a regulated function has to answer. Which records can it write versus only read? What change triggers mandatory human sign-off before it commits? Where is the immutable log of what the agent did, when, and on whose authority? How does an auditor reconstruct the reasoning behind a thesis call six months later? What stops a banking deliverable from reaching a client without the disclosure review that compliance requires?
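The audit-log question is the easiest of these to make concrete. What follows is a minimal sketch, not anything OpenAI ships: a tamper-evident, append-only log in which each entry's hash covers the previous entry's hash, so a later edit breaks the chain. The class name `AuditLog`, the field names, and the example actor and targets are all hypothetical, and a real deployment would also need durable, access-controlled storage.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only log: what the agent did, when, on whose authority."""
    entries: list = field(default_factory=list)

    def append(self, actor: str, action: str, target: str,
               authority: str, inputs: list) -> dict:
        body = {
            "seq": len(self.entries),
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,          # which agent acted
            "action": action,        # what it did
            "target": target,        # which record or deliverable
            "authority": authority,  # the human who authorized it
            "inputs": inputs,        # data sources the agent relied on
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        body["hash"] = _digest(body)  # digest is computed before "hash" is stored
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry fails the check."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("sales-agent", "update_stage", "opportunity/123",
           "rep@example.com", ["crm_activity_history"])
log.append("equity-agent", "thesis_assessment", "thesis/acme",
           "pm@example.com", ["vendor_fundamentals_feed"])
print(log.verify())
```

Recording `inputs` alongside each action is what lets a reviewer see, months later, which data a thesis call leaned on. The hash chain only detects tampering; it does not prevent it.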

None of that is a permission toggle. All of it is the oversight scaffolding engineering built over a year and these functions have not started building. We argued the broad version of this absence in “Engineering Has Cloudflare. Marketing Has Nothing.” and the finance-specific version in “Financial Services Adopted AI. It Forgot to Watch It.” The new fact is that the warning now has a ship date and a connector list.

There is a second surface worth flagging. OpenAI’s Codex “Sites” preview lets agents create and share interactive hosted apps via a workspace URL, and annotations now extend from code into documents, spreadsheets, and slides. That widens the output channel. An agent that can publish a hosted page is an agent that can put a client-facing artifact into the world without passing through any of the review steps a marketing or compliance team would normally impose.

What an operator wraps around this before turning it on

The mistake to avoid is treating “admins can control app permissions” as the governance story and flipping the plugin on. The pattern from teams that adopted agents well is that the control plane gets built first, around the tool, not assumed inside it.

Concretely, before a sales, investing, or banking plugin goes live, an operator should be able to answer these in writing:

Write boundaries. Which systems can the agent modify versus only read? Default to read-only on every system of record until a specific write path earns an exception.
Sign-off gates. Name the actions that cannot commit without a human approval: any CRM stage change that moves a forecast, any thesis conclusion, any client-facing deliverable. The gate is the point, not the notification.
Audit trail. An immutable, queryable log of agent actions and the data they relied on. If you cannot reconstruct a decision later, you cannot defend it to a regulator or a client.
Domain ownership. A named human who owns the constraint, not just the Slack alert. Engineering learned that a permission without an owner is a permission nobody enforces.
Regulatory exposure mapping. Treat the banking and investing plugins as touching obligations, even before you cite a specific rule. The class of risk is fiduciary, disclosure, and recordkeeping. Map which deliverables fall inside it.
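The first two items, plus ownership, can be written down as policy rather than prose. Here is a minimal sketch, assuming the operator places its own policy check in front of agent writes; nothing here is a Codex or plugin API. The system names, action kinds, owner address, and the `evaluate` function are hypothetical, and reads are assumed to be governed separately by the app-level permission toggle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    system: str     # e.g. "salesforce" (illustrative)
    operation: str  # "read" or "write"
    kind: str       # e.g. "crm_stage_change" (illustrative)


# Write boundaries: every write is denied unless listed here as an exception.
WRITE_EXCEPTIONS = {
    ("salesforce", "crm_note_update"),
    ("salesforce", "crm_stage_change"),
    ("research_notes", "thesis_conclusion"),
}

# Sign-off gates: these kinds never commit without a named human approver.
SIGN_OFF_REQUIRED = {"crm_stage_change", "thesis_conclusion", "client_deliverable"}

# Domain ownership: a system with no named owner gets no write path at all.
OWNERS = {
    "salesforce": "revops-owner@example.com",
    "research_notes": "research-owner@example.com",
}


def evaluate(action: Action, approved_by: Optional[str] = None) -> Decision:
    """Return the policy decision for one proposed agent action."""
    if action.operation == "read":
        return Decision.ALLOW  # assumption: reads are governed by app permissions
    if (action.system, action.kind) not in WRITE_EXCEPTIONS:
        return Decision.DENY   # read-only by default
    if action.system not in OWNERS:
        return Decision.DENY   # no owner, no exception
    if action.kind in SIGN_OFF_REQUIRED and approved_by is None:
        return Decision.NEEDS_APPROVAL  # block the commit, do not just notify
    return Decision.ALLOW


stage_change = Action("salesforce", "write", "crm_stage_change")
print(evaluate(stage_change))                                  # Decision.NEEDS_APPROVAL
print(evaluate(stage_change, approved_by="rep@example.com"))   # Decision.ALLOW
print(evaluate(Action("factset", "write", "data_edit")))       # Decision.DENY
```

The design choice worth copying is the order of the checks: the default is denial, the exception list is explicit, and approval is a precondition of the write rather than an alert sent after it.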

Codex made the agent available to every role this week. The oversight those roles legally require did not ship in the same box. The organizations that adopt these plugins fastest will be the ones that accept agent write-access to Salesforce, FactSet, and investment theses before they build the controls those systems demand. The ones that win will invert that order: build the boundary, then turn on the agent.

This analysis synthesizes “Codex for every role, tool, and workflow” (OpenAI, June 2026).

Victorino Group helps operators design the boundary, audit, and sign-off controls before agents touch regulated and revenue-bearing systems. Let’s talk.
