40 Engineers, 1 PM: What OpenAI's Codex Team Reveals About AI-Native Organizations
Gregor Ojstersek published an interview with Thibault Sottiaux, engineering lead on OpenAI’s Codex team. The numbers that got attention: 40 engineers, 1 product manager, 2 designers. That ratio — 40:1 — would be organizational malpractice in any traditional engineering organization.
It works. But not for the reason most people assume.
The Ratio Is Not the Story
The instinctive reaction to “40 engineers, 1 PM” is that OpenAI hires exceptional engineers who don’t need coordination. That is partially true — their compensation and brand attract extraordinary talent. But talent density alone does not explain how a team functions with a 40:1 ratio.
The explanation lies in what replaces the coordination layer. During a recent bug bash, the single PM triaged over 100 issues in one hour using Codex. Most fixes shipped within 24 hours. The PM didn’t type faster or work harder. The AI compressed the triage-prioritize-communicate cycle from days to minutes.
“Watching him work is unreal, it’s on another level,” says Sottiaux about the PM’s Codex-augmented workflow.
When you remove the bottleneck of information processing from the PM function, the ratio changes. The PM doesn’t coordinate 40 engineers manually. The PM defines priorities, and the tooling propagates those priorities through automated triage, automated review, and encoded skills.
The Hundreds of Skills Are the Infrastructure
The most operationally significant detail in the interview is not the team size or the shipping speed. It is the “hundreds of custom skills” the team has built.
Skills, in Codex terminology, are reusable instruction sets that encode best practices for specific tasks. The Codex team has built skills for running QA, checking features, verifying builds, enforcing code review standards, respecting module boundaries, and maintaining semantic correctness.
This is organizational knowledge made executable. Every time a senior engineer figures out the right way to do something — review a PR, validate a deployment, check a refactor — they encode it as a skill. That skill then applies to every subsequent instance of that task, performed by any team member or by Codex itself.
The traditional equivalent is documentation. The difference is that documentation requires someone to read it and choose to follow it. A skill executes automatically. The gap between “we have a best practice” and “we enforce a best practice” closes to zero.
This is what operating AI actually looks like. Not a model that writes code. An infrastructure of encoded decisions that compounds over time.
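To make "executable knowledge" concrete, here is a minimal sketch of a skill modeled as an instruction set paired with an automated check. The interview does not describe the actual Codex skill format; the `Skill` structure, the module-boundary rule, and the `billing.internal` path below are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    """A reusable instruction set paired with a check that enforces it."""
    name: str
    instructions: str                   # the encoded best practice, in prose
    check: Callable[[str], list[str]]   # runs against a diff, returns findings

def no_cross_module_imports(diff: str) -> list[str]:
    """Toy check: flag added lines that reach into another module's internals."""
    findings = []
    for line in diff.splitlines():
        if line.startswith("+") and "from billing.internal" in line:
            findings.append(f"module-boundary violation: {line[1:].strip()}")
    return findings

module_boundaries = Skill(
    name="respect-module-boundaries",
    instructions="Modules may only depend on each other's public APIs.",
    check=no_cross_module_imports,
)

diff = "+from billing.internal import ledger\n+x = 1"
print(module_boundaries.check(diff))
```

The point of the sketch is the pairing: the prose instruction documents the practice, while the check enforces it on every diff, which is the gap between "we have a best practice" and "we enforce a best practice" closing to zero.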
Day-One Shipping and the Onboarding Inversion
The Codex team has a cultural norm of shipping on the first day. Not the first week: the first day. New hires arrive with no prior context, and the aspiration is that they ship a meaningful feature by the end of that day.
This works because of an onboarding inversion. Traditional onboarding assumes the new hire must absorb context before being productive. The Codex model assumes the new hire can be productive immediately because the context is available through the tool.
Codex sets up the engineer’s environment. It explains the codebase, existing projects, and feature structure. It acts as what Sottiaux calls “a highly skilled engineering mentor” — not a document to read, but an interactive system that answers questions in real time.
The hundreds of custom skills amplify this. A new engineer doesn’t start with a blank Codex. They start with an “upgraded” version pre-loaded with the team’s accumulated knowledge. The onboarding document is dead. The onboarding agent is alive.
The question for other organizations is not whether they can replicate day-one shipping. It is whether they can identify what organizational knowledge they would need to encode to make it possible.
Sub-Agents as Quality Infrastructure
The team uses multiple sub-agents to review pull requests before any human sees them. Engineers run a local “review” command, and multiple automated agents examine the code from different angles — structure, semantics, module boundaries, test coverage.
This is the same pattern we see in Factory’s Signals system: automated quality infrastructure that operates continuously and at scale. The difference is that Factory monitors agent sessions for user friction, while OpenAI monitors code for engineering standards. The architectural pattern is identical: multiple automated validators examining work product before human judgment applies.
Sottiaux notes that this process “regularly surfaces small issues and improvements they would have otherwise missed.” The key word is “otherwise.” Without automated review at scale, those issues would accumulate as technical debt. The sub-agents compress the feedback loop from “discovered during incident” to “caught before merge.”
The Bell Labs Comparison and Its Limits
Sottiaux describes the team structure as “closer to a modern version of Bell Labs.” The comparison is instructive in what it gets right and what it gets wrong.
What it gets right: small teams of 2-3 people with end-to-end ownership. Bottom-up ideation where the best ideas come from individual contributors, not product committees. High autonomy with minimal meetings. Leadership that is “extremely available” so decisions happen in minutes.
What it gets wrong: Bell Labs operated on multi-year research horizons. The Codex team ships daily, sometimes multiple times per day. Bell Labs researchers explored fundamental science. The Codex team builds products for immediate use. The autonomy is similar; the time horizon is opposite.
The resolution is that AI compresses the experimentation cycle. “The cost of making mistakes is much lower because Codex is always available,” Sottiaux explains. “They can try things, observe what happens, and change course fast if needed.” When experimentation is cheap, you can have Bell Labs-style autonomy with startup-speed iteration.
This is a cultural shift most organizations have not internalized. Traditional risk management assumes mistakes are expensive and should be prevented. When AI makes mistakes cheap to detect and reverse, the optimal strategy shifts from prevention to rapid experimentation.
What Cannot Be Replicated
Before drawing conclusions for other organizations, four caveats are in order.
First, OpenAI is the world's best-funded AI company using its own flagship product. The Codex team building Codex with Codex is inherently circular. Their productivity numbers are not transferable to teams using Codex on different problems.
Second, the team explicitly withheld their code-generation percentage; Sottiaux says they "cannot share the precise number currently." Without that data point, we cannot assess how much of the productivity claim is attributable to AI versus exceptional talent.
Third, the interview contains zero failure cases. No mention of what doesn’t work, where sub-agents produce poor reviews, or what types of tasks still require entirely manual work. The narrative is promotional — OpenAI’s team describing their experience with OpenAI’s product.
Fourth, the 40:1 PM ratio may reflect a coordination gap, not an optimization. In many organizations, engineers doing PM work on top of engineering would be considered a structural problem, not a feature. Whether this works long-term is unproven.
The Transferable Pattern
Strip away the OpenAI-specific advantages and three patterns remain:
Encoded knowledge compounds. Custom skills — reusable instruction sets encoding best practices — create a knowledge asset that grows with every problem solved. This works at any scale and with any AI tooling.
Automated quality infrastructure replaces coordination. When code review, QA, and triage are partially automated, you need fewer people coordinating and more people building. The PM ratio changes because the PM function is partially automated, not eliminated.
Context availability replaces documentation. When an AI system can explain the codebase, navigate projects, and apply team-specific practices, the barrier to productivity drops. Onboarding becomes a conversation, not a reading assignment.
These patterns are not exclusive to OpenAI. They are available to any organization willing to invest in building the infrastructure. The model matters far less than the skills library, the automated review pipeline, and the cultural willingness to trust encoded knowledge over manual processes.
The question for your organization is not “how do we become OpenAI?” It is “how many of our best practices exist only in people’s heads, and what would happen if we made them executable?”
If this resonates, let's talk
We help companies implement AI without losing control.
Schedule a Conversation