Healthy AI-Native IER Is 5:1. Most AI Features Ship Without Anyone Calculating It.

Thiago Victorino

Ben Murray, who writes The SaaS CFO, published the cleanest piece of finance governance for AI features I have read this year. He calls the metric the Inference Efficiency Ratio, and he proposes it should be a launch gate. Not a dashboard tile. Not a quarterly review item. A gate. If the ratio is below threshold, the feature does not ship.

Most AI features in 2026 ship without anyone calculating it.

The formula is intentionally boring.

Inference Efficiency Ratio (IER) = AI Product Revenue / Inference Cost

Revenue attributable to the AI feature on top. Inference spend (model API calls, GPU time, vector queries, the entire serving stack) on the bottom. One number. The ratio answers a question CFOs already know how to ask in every other line of business: for every dollar I spend serving this thing, how many dollars does it bring in?
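To make the arithmetic concrete, here is a minimal sketch of the calculation. The function name and the dollar figures are my own illustrative assumptions, not something taken from Murray's piece.

```python
def inference_efficiency_ratio(ai_revenue: float, inference_cost: float) -> float:
    """Return AI product revenue divided by inference cost for the same period."""
    if inference_cost <= 0:
        raise ValueError("inference_cost must be positive to form a ratio")
    return ai_revenue / inference_cost


# Hypothetical month: $50,000 of AI-attributed revenue, $10,000 of inference spend.
ier = inference_efficiency_ratio(50_000, 10_000)
print(f"IER = {ier:.1f}:1")  # prints "IER = 5.0:1"
```

Both inputs need to cover the same period and the same feature, or the ratio means nothing.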

The number that surprised me is not the formula. It is how few teams produce it before launch.

Why Margin Analysis Was Not Enough

Murray’s frame is direct: “The issue isn’t that AI is expensive. The issue is that most SaaS companies have no framework other than margin analysis for measuring whether their AI spend is efficient.”

Margin analysis assumes you know unit cost. For traditional SaaS, you do. Hosting, support, and licensing per seat are stable. You set price, you know cost, gross margin falls out. The 75 to 85 percent gross margin band that defined SaaS for a decade was built on cost predictability, not cost minimization.

AI products break this. Inference cost moves with usage shape, prompt length, model choice, and caching behavior. Two customers paying the same subscription can produce 50x different inference bills. A power user on a reasoning model can swing margin negative on a single prompt. Average margin tells you nothing about which features are bleeding and which are healthy.
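A small sketch shows how a blended number hides the spread. The two customers and every figure below are hypothetical; the only thing borrowed from the text is the 50x gap in inference bills on identical subscriptions.

```python
# Hypothetical customers on the same $100/month plan (all figures are made up).
customers = {
    "light_user": {"revenue": 100.0, "inference_cost": 2.0},
    "power_user": {"revenue": 100.0, "inference_cost": 100.0},  # 50x the light user's bill
}


def margin_after_inference(revenue: float, inference_cost: float) -> float:
    """Share of revenue left after paying for inference (ignores all other costs)."""
    return (revenue - inference_cost) / revenue


total_revenue = sum(c["revenue"] for c in customers.values())
total_cost = sum(c["inference_cost"] for c in customers.values())

blended = margin_after_inference(total_revenue, total_cost)
per_customer = {
    name: margin_after_inference(c["revenue"], c["inference_cost"])
    for name, c in customers.items()
}

print(f"blended: {blended:.0%}")
for name, margin in per_customer.items():
    print(f"{name}: {margin:.0%}")
```

The blended figure comes out at 49 percent, while the individual customers sit at 98 percent and 0 percent. Neither customer looks like the average.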

Murray cites the data point that made the financial press: AI-native SaaS companies are running at roughly 52 percent gross margin in 2026, against the 75 to 85 percent traditional SaaS band. The fastest-scaling AI startups (those that hit $100M ARR in 18 months) averaged closer to 25 percent. Inference now consumes 20 to 23 percent of total AI product cost at scaling stage. The cost line has moved from rounding error to the second-largest variable expense, and most CFOs do not have a per-feature view of it.

The Benchmarks Exist Now

What Murray adds beyond the formula is the benchmark grid. Industry comparables now exist for both architectures. Two product types, three thresholds each.

| Product type | Warning | Target | Healthy |
| --- | --- | --- | --- |
| AI-infused (AI feature inside traditional SaaS) | < 5:1 | 8:1 | 10:1 or higher |
| AI-native (entire product is the AI) | < 3:1 | 4:1 | 5:1 or higher |

The asymmetry between the two is the point. An AI-infused product, where the model is one feature inside a software business with normal margin economics, can absorb a higher inference cost only if the surrounding revenue dilutes it. The healthy bar is 10:1 because the surrounding business needs the AI feature to earn its keep against everything else funded by the same dollar.

An AI-native product is the inverse. The entire revenue line is exposed to inference. There is no surrounding business to subsidize. A 5:1 ratio is healthy here because the architecture has nothing else to amortize against. Below 3:1 and the unit economics are upside down before customer acquisition cost enters the picture.
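The benchmark grid translates into a lookup fairly directly. This is a sketch under one assumption of mine: the grid names three bars per product type but does not label the space between warning and target, so the "below target" label here is my own placeholder.

```python
# Thresholds from the benchmark grid, expressed as revenue-to-inference-cost ratios.
BENCHMARKS = {
    "ai_infused": {"warning": 5.0, "target": 8.0, "healthy": 10.0},
    "ai_native": {"warning": 3.0, "target": 4.0, "healthy": 5.0},
}


def classify_ier(ier: float, product_type: str) -> str:
    """Place a ratio in a band for the given product type."""
    bars = BENCHMARKS[product_type]
    if ier < bars["warning"]:
        return "warning"
    if ier >= bars["healthy"]:
        return "healthy"
    if ier >= bars["target"]:
        return "at target"
    # Assumption: the grid does not name the gap between warning and target.
    return "below target"


# The same 6:1 ratio reads very differently depending on the architecture.
print(classify_ier(6.0, "ai_native"))   # healthy
print(classify_ier(6.0, "ai_infused"))  # below target
```

The asymmetry shows up immediately: a ratio that clears the healthy bar for an AI-native product has not reached target for an AI-infused one.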

The “fastest-scaling AI startups at 25 percent gross margin” are most likely sitting in the warning band on AI-native math: if inference accounts for most of their cost of revenue, the implied ratio is under 3:1. That is not a growth story. It is a runway story dressed as a growth story.

Why This Is Finance Governance, Not Engineering Hygiene

We have written about the engineering side of this twice already. Engineers’ AI cost reality across 900 respondents showed how individual contributors are absorbing usage shocks they cannot model. Cursor’s negative gross margin showed the SaaS inversion symptom at the company level. Both pieces traced consequences. Neither offered the lever.

IER is the lever. It is a CFO control point that sits before launch, not after. The discipline it forces:

  • Every AI feature carries a revenue attribution model. If you cannot say which dollars the feature earned, you cannot compute the ratio.
  • Every AI feature carries a per-call cost model. Provider invoices alone are not enough; you need allocation per feature, per cohort, ideally per customer.
  • The ratio is computed before launch using projected usage, then recomputed monthly with real usage.
  • A feature whose ratio sits below warning gets a pricing change, a model swap, or a caching layer, or it is sunset. It does not coast.

This is the discipline finance applied to every other variable cost line in the business. Sales commissions get attribution. Marketing spend gets ROAS. Cloud infrastructure gets unit cost per request. Inference gets… an end-of-month invoice and a Slack message that says “this is fine.”
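The per-feature allocation is the part teams tend to skip, so here is one possible sketch. It assumes you log tokens per feature and splits a single provider invoice in proportion to those tokens. The log format, feature names, and all dollar figures are hypothetical, and token-proportional allocation is only one scheme; it ignores price differences between models and between input and output tokens.

```python
from collections import defaultdict

# Hypothetical usage log: (feature, tokens processed). Names are illustrative.
usage_log = [
    ("summarize", 6_000_000),
    ("chat_assistant", 3_000_000),
    ("summarize", 1_000_000),
]
invoice_total = 20_000.0  # hypothetical monthly provider invoice, in dollars
feature_revenue = {"summarize": 42_000.0, "chat_assistant": 60_000.0}  # hypothetical attribution


def allocate_invoice(log, total):
    """Split one invoice across features in proportion to logged tokens."""
    tokens = defaultdict(int)
    for feature, count in log:
        tokens[feature] += count
    all_tokens = sum(tokens.values())
    return {feature: total * count / all_tokens for feature, count in tokens.items()}


cost_by_feature = allocate_invoice(usage_log, invoice_total)
ier_by_feature = {
    feature: feature_revenue[feature] / cost for feature, cost in cost_by_feature.items()
}

for feature, ratio in ier_by_feature.items():
    print(f"{feature}: cost ${cost_by_feature[feature]:,.0f}, IER {ratio:.1f}:1")
```

With these made-up inputs, one invoice resolves into a 3.0:1 feature and a 10.0:1 feature. The invoice alone would never have told you which was which.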

The “Governance Beyond Engineering” Read

The recurring theme in 2026 is that AI governance is no longer something the CTO owns alone. Marketing has Agent Guidance. Legal has model-use policies. Design has system-level constraints. Finance now has IER.

Each of these is the same pattern: a function that traditionally measured outcomes after the fact gets a pre-execution control point because AI moves cost and risk too fast for retrospective measurement to matter. The CFOs who install IER as a launch gate this quarter will look prescient in eighteen months. The CFOs who first see the inference line on a board deck because it doubled will spend the following quarter explaining what changed.

Murray’s piece is doing for finance what a good runbook does for SRE. It names the metric, it sets the thresholds, and it puts the metric in front of the action.

Install IER This Week

Three steps. None of them require new tooling.

Step 1: Pick one AI feature and compute its IER. Use the last full month of revenue attributable to it (subscription contribution, usage-based revenue, or a defensible allocation if it is bundled). Use the last full month of inference cost from your provider invoices, allocated to that feature. Divide. Write the number down. Compare against the table above.

Step 2: Decide where the gate sits. AI-infused or AI-native? Your warning, target, and healthy bars are different. Be honest about which architecture the feature actually is, not which architecture the marketing deck says it is. If the feature requires the model to do the work, it is AI-native math even if it lives inside an AI-infused product.

Step 3: Make IER a launch criterion for the next AI feature in the roadmap. Not a metric to monitor. A gate to clear. The product team brings projected revenue and projected inference cost. Finance computes the ratio. Below warning, the feature does not ship. The conversation moves to pricing, caching, model selection, or scope reduction.

If the team cannot project revenue and cost at this granularity, that is the finding. The feature is shipping blind, and the cloud bill is the only feedback loop.
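As a sketch of what the gate in Step 3 could look like in practice, the check below blocks a launch when the projected ratio falls under the warning bar, and also when the projections are missing. The `LaunchProposal` type, the field names, and the example numbers are hypothetical; only the warning thresholds come from the benchmark table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Warning bars from the benchmark table (revenue : inference cost).
WARNING_BAR = {"ai_infused": 5.0, "ai_native": 3.0}


@dataclass
class LaunchProposal:
    feature: str
    product_type: str  # "ai_infused" or "ai_native"
    projected_monthly_revenue: Optional[float]
    projected_monthly_inference_cost: Optional[float]


def launch_gate(proposal: LaunchProposal) -> Tuple[bool, str]:
    """Return (cleared, reason) for a proposed AI feature launch."""
    revenue = proposal.projected_monthly_revenue
    cost = proposal.projected_monthly_inference_cost
    if revenue is None or cost is None:
        # No projection is itself the finding: the feature would ship blind.
        return False, "blocked: revenue or inference cost projection missing"
    if cost <= 0:
        return False, "blocked: inference cost projection must be positive"
    ier = revenue / cost
    bar = WARNING_BAR[proposal.product_type]
    if ier < bar:
        return False, f"blocked: projected IER {ier:.1f}:1 is below the {bar:.0f}:1 warning bar"
    return True, f"cleared: projected IER {ier:.1f}:1"


print(launch_gate(LaunchProposal("doc_drafting", "ai_native", 30_000.0, 12_000.0)))
print(launch_gate(LaunchProposal("doc_drafting", "ai_native", 30_000.0, 6_000.0)))
print(launch_gate(LaunchProposal("doc_drafting", "ai_native", None, 6_000.0)))
```

The first proposal is blocked at 2.5:1, the second clears at 5.0:1, and the third is blocked for having no revenue projection at all. A stricter gate could require the target or healthy bar instead of merely clearing warning; that is a policy choice, not something the sketch decides.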

The CFOs who win the next two years of AI product economics are not the ones who block AI spend. They are the ones who installed a metric the rest of the company actually had to clear before the spend happened.


This analysis synthesizes “How to Calculate the Inference Efficiency Ratio” (The SaaS CFO, May 2026).


All articles on The Thinking Wire are written with the assistance of Anthropic's Opus LLM. Each piece goes through multi-agent research to verify facts and surface contradictions, followed by human review and approval before publication. If you find any inaccurate information or wish to contact our editorial team, please reach out at editorial@victorinollc.com.
