AI Agent Economics: Token Tax Locks Gross Margins 30 Points Below SaaS Baseline

Microsoft canceled most Claude Code licenses last week as Uber burned its entire AI budget in four months

This photograph, taken on February 11, 2025 in Toulouse, southwestern France, shows a tablet screen displaying the logo of DeepSeek, a Chinese artificial intelligence company that develops open-source large language models, and the logo of OpenAI's artificial intelligence chatbot ChatGPT. Getty Images/Lionel BONAVENTURE

Last week, Microsoft began canceling most of its internal Claude Code licenses — telling developers in its Experiences and Devices division their access would end by June 30 after compute costs exceeded the cost of the human employees the tools were supposed to augment. Days later, Fortune published an analysis warning that runaway token consumption by agentic systems could crash the AI economy. The same week, Uber confirmed it had burned through its entire 2026 AI budget in just four months. These are not billing accidents. They are the visible surface of a structural problem that now sits underneath every company building AI-powered products on top of someone else's model.

The problem has a name: the token tax. It is the structural levy any company pays for building on inference it does not own. And as the enterprise AI cost panic of late May 2026 has made clear, the gap between what it costs to run an AI agent product and what the old software playbook expected gross margins to look like is now impossible to ignore.

Two Kinds of Economics, One Wide Gap

To understand why the token tax exists, start with who pays what. When a model maker — Anthropic, OpenAI, Google — serves its own subscribers, the marginal cost of inference is an internal number: electricity, hardware depreciation, and operations inside data centers it owns and optimizes. It can smooth heavy users against light ones, impose rate limits, and treat the whole thing as a customer-acquisition cost.

When that same company sells API access to a third party, the price is a published list rate that already includes the model maker's profit margin. The third party never buys at true cost. It buys at retail, with the supplier's markup baked into every token. This is why, as Anthropic itself disclosed in July 2025, one user consuming "tens of thousands" in model usage on a $200-a-month plan exposed a cost-structure disparity that no outside startup can match: the model maker pools and subsidizes; the startup pays per token at retail.

The width of that gap is visible in an estimate from Cursor, the AI coding tool built by Anysphere. According to Cursor's own analysis, a $200-a-month Claude Code subscription can translate into roughly $5,000 in underlying compute costs. Cursor's figures are the company's own estimates, not Anthropic's audited disclosures, and should be read as rough directional indicators rather than precise accounting. But the direction is the story: first-party subscriptions can absorb a cost subsidy that third-party API customers cannot access at any price.
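As a rough illustration of the size of that gap, the short sketch below divides Cursor's estimated compute cost by the subscription price. The function name is hypothetical, and the inputs are the company's own directional estimates quoted above, not audited figures.

```python
def subsidy_ratio(compute_cost_usd: float, subscription_price_usd: float) -> float:
    """Underlying compute cost per dollar of subscription price."""
    if subscription_price_usd <= 0:
        raise ValueError("subscription price must be positive")
    return compute_cost_usd / subscription_price_usd


# Cursor's estimate: roughly $5,000 of compute behind a $200-a-month plan.
ratio = subsidy_ratio(5_000, 200)
print(f"compute consumed per subscription dollar: {ratio:.0f}x")  # 25x
```

On those numbers, a third party paying list rates would need to charge about 25 times the subscription price just to cover the same usage.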

Why AI Agent Gross Margins Fall Short of SaaS Norms

For two decades, software was a spectacular business because its marginal cost per user approached zero. Once the code was written, serving the next customer cost almost nothing. Gross margins of 75 to 85 percent followed automatically, and those margins funded the growth that defined the cloud era.

AI inference breaks that premise at the foundation. Every user interaction triggers a model call. For agentic products — where a single request fans out into a chain of reasoning steps, tool calls, and retries — the cost of one session can multiply many times over. The marginal cost is no longer trivial; it is the single largest variable line in cost of goods sold, and it scales with usage rather than declining at scale.
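The fan-out effect can be sketched with a simple cost model. Every number below (step counts, tokens per step, retry rate, and the blended price per million tokens) is a hypothetical placeholder chosen for illustration, not a figure from any vendor's price list.

```python
from dataclasses import dataclass


@dataclass
class Session:
    steps: int                      # model calls triggered by one user request
    tokens_per_step: int            # input plus output tokens per call
    retry_rate: float               # fraction of steps that are run a second time
    usd_per_million_tokens: float   # blended API price (hypothetical)

    def total_tokens(self) -> float:
        # Retries add a proportional share of extra calls.
        return self.steps * self.tokens_per_step * (1 + self.retry_rate)

    def cost_usd(self) -> float:
        return self.total_tokens() / 1_000_000 * self.usd_per_million_tokens


# Hypothetical single-turn completion versus a hypothetical agentic session.
single_turn = Session(steps=1, tokens_per_step=2_000, retry_rate=0.0,
                      usd_per_million_tokens=10.0)
agentic = Session(steps=25, tokens_per_step=8_000, retry_rate=0.2,
                  usd_per_million_tokens=10.0)

multiplier = agentic.cost_usd() / single_turn.cost_usd()
print(f"single turn: ${single_turn.cost_usd():.2f}")
print(f"agentic:     ${agentic.cost_usd():.2f}")
print(f"multiplier:  {multiplier:.0f}x")
```

The per-step token count is set higher for the agentic session on the assumption that context accumulates as the chain grows. The point is the shape of the formula: cost scales with steps, context size, and retries all at once.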

The numbers confirm the gap. ICONIQ Capital's January 2026 State of AI survey of roughly 300 software executives found AI-native product gross margins projected at 52 percent in 2026, up from 41 percent in 2024 and 45 percent in 2025. That improving trajectory is real — the best operators are learning to manage inference costs — but even at 52 percent, AI-native products still run 23 to 33 percentage points below the 75 to 85 percent that mature SaaS businesses routinely achieve. Bessemer Venture Partners data shows early-stage AI companies that have not yet optimized their model stack can operate at margins as low as 25 percent, while those with efficient inference architectures converge toward 60 percent.

Inference alone consumes roughly 23 percent of revenue at scaling-stage AI companies, according to ICONIQ's data. For every $1 million in AI product revenue, roughly $230,000 goes out the door as inference cost before a single engineer, salesperson, or marketer gets paid. The gap runs against roughly two decades of accumulated SaaS margin expectations: call it the cost of building on rented intelligence.
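The margin arithmetic in this section can be reproduced in a few lines. The sketch below uses only the figures cited above (52 percent AI-native gross margin, a 75 to 85 percent SaaS range, and a 23 percent inference share of revenue); the function names are illustrative.

```python
def margin_gap_points(ai_margin_pct: float,
                      saas_low_pct: float = 75.0,
                      saas_high_pct: float = 85.0) -> tuple[float, float]:
    """Percentage-point gap between an AI-native margin and the SaaS range."""
    return saas_low_pct - ai_margin_pct, saas_high_pct - ai_margin_pct


def inference_spend(revenue_usd: float, inference_share: float = 0.23) -> float:
    """Dollars of revenue consumed by inference at a given share of revenue."""
    return revenue_usd * inference_share


low, high = margin_gap_points(52.0)
print(f"gap to SaaS norms: {low:.0f} to {high:.0f} points")          # 23 to 33 points
print(f"inference on $1M revenue: ${inference_spend(1_000_000):,.0f}")  # $230,000
```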

Gartner has added a structural caution to the optimistic trajectory: even a 90 percent drop in inference costs will not automatically produce cheaper enterprise AI, because agentic systems consume far more tokens per task than single-turn completions. Goldman Sachs Research projects a 24-fold increase in token consumption between now and 2030 as agentic workflows replace single-turn completions. The arithmetic is unforgiving. Falling unit prices and rising consumption have been running roughly in parallel, and because total spend is price multiplied by volume, enterprise AI budgets will keep rising for as long as consumption grows faster than each token gets cheaper. Microsoft and Uber confirmed that dynamic empirically this past week.
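To see why cheaper tokens do not guarantee a smaller bill, the sketch below combines the two figures above as a thought experiment. It assumes total spend is simply unit price times token volume, and that Gartner's 90 percent price drop and Goldman Sachs's 24-fold consumption increase apply over the same period, which neither source states.

```python
def spend_multiplier(price_drop: float, volume_growth: float) -> float:
    """Change in total spend when unit price falls and volume grows.

    price_drop is a fraction (0.90 means prices fall 90 percent);
    volume_growth is a multiple (24 means 24 times as many tokens).
    """
    return (1.0 - price_drop) * volume_growth


def breakeven_price_drop(volume_growth: float) -> float:
    """Price drop needed to hold total spend flat at a given volume growth."""
    return 1.0 - 1.0 / volume_growth


print(f"spend multiple: {spend_multiplier(0.90, 24):.1f}x")          # 2.4x
print(f"drop needed to stay flat: {breakeven_price_drop(24):.1%}")   # 95.8%
```

Under those assumptions, a 90 percent price cut still leaves the total bill about 2.4 times higher, and prices would have to fall by nearly 96 percent just to keep spending level.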

Cursor's Case Study: How Inference Economics Nearly Broke a $50 Billion Company

No company illustrates the trap — or the escape — better than Cursor, built by Anysphere. The AI code editor became the fastest business-to-business software product ever to reach $1 billion in annualized revenue, surpassing Slack, Zoom, and Snowflake, before crossing $2 billion by February 2026.

The cost side, however, was punishing. For most of 2024 and into 2025, Cursor's cost of goods sold was dominated by inference fees paid to Anthropic and OpenAI. Every heavy user represented a direct pass-through cost that exceeded what Cursor was charging. TechCrunch reported in April 2026 that the company had operated at negative gross margins until recently, meaning it cost more to run the product than the company could collect in subscription revenue.

Cursor's escape is the instructive part. In November 2025, Anysphere shipped Composer, its first in-house inference model optimized for code generation. Before Composer, every query routed to a third-party model — primarily Claude and GPT — and every token cost money that flowed out of Anysphere's gross margin. The company later expanded its inference stack, and in March 2026, TechCrunch reported that Cursor admitted Composer 2 was built on top of Moonshot AI's Kimi as an open-source base, with roughly three-quarters of the compute budget going to Cursor's own continued training. Moonshot AI is a Beijing-based company whose operations fall under China's National Intelligence Law, which requires Chinese companies to cooperate with government intelligence requests on demand; that obligation applies regardless of where Cursor's own servers are located or what open-source license governs the model weights.

The result, per TechCrunch's April 2026 reporting, was that Cursor's proprietary model family helped push the company to "slight gross margin profitability" on large enterprise sales — even as individual developer subscriptions remain unprofitable. Exact figures remain estimates from sources familiar with the financials rather than audited disclosures. But the structural lesson is unambiguous: for a vertical AI agent company, the real margin inflection does not come from pricing adjustments or feature tiering. It comes from moving inference — the heaviest cost in the business — from something you buy to something you produce.
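The effect of shifting traffic from rented to owned inference can be sketched as a blended cost. All per-request figures below are hypothetical placeholders, not Cursor's actual numbers, and the owned-model cost ignores training spend for simplicity.

```python
def blended_cost(owned_share: float, owned_cost_usd: float,
                 rented_cost_usd: float) -> float:
    """Average inference cost per request for a mix of owned and rented models."""
    if not 0.0 <= owned_share <= 1.0:
        raise ValueError("owned_share must be between 0 and 1")
    return owned_share * owned_cost_usd + (1.0 - owned_share) * rented_cost_usd


def gross_margin(revenue_usd: float, cost_usd: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue_usd - cost_usd) / revenue_usd


# Hypothetical per-request economics.
REVENUE, OWNED, RENTED = 0.05, 0.04, 0.06

before = gross_margin(REVENUE, blended_cost(0.0, OWNED, RENTED))
after = gross_margin(REVENUE, blended_cost(0.75, OWNED, RENTED))
print(f"all rented:       {before:.0%}")  # -20%
print(f"75 percent owned: {after:.0%}")   # 10%
```

With these illustrative inputs, the product moves from a negative margin to a slightly positive one without any change in price, which is the pattern the reporting describes.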

That move also addressed a second danger. Anthropic's Claude Code reached approximately $2.5 billion in annualized revenue and more than 300,000 business customers by early 2026, competing directly with Cursor for the same engineering teams. When the firm selling you tokens also sells a polished vertical product in your category at a cost structure you cannot match, owning your own model stops being a margin optimization and becomes a strategic necessity.

How Does Outcome-Based Pricing Work for AI Agents?

There is a second escape route, and it reframes the problem rather than attacking the cost directly. If the unpredictable cost of the process makes customers nervous and the vendor's margins thin, stop selling the process and sell the result instead.

Customer service has become the proving ground. Intercom rebuilt its product around its Fin agent and prices it at $0.99 per resolved ticket — the customer pays only when the AI actually closes a support conversation, not for tokens consumed in getting there. Sierra, co-founded by former Salesforce co-CEO Bret Taylor, built outcome pricing in from day one. Taylor has described it as the logically correct way to sell software, drawing the analogy to sales commissions: you pay when a result is delivered. Sierra reached $100 million in annualized revenue in roughly 21 months and crossed $150 million by early 2026. HubSpot dropped its Customer Agent price to $0.50 per resolved conversation in April 2026 — down from $1.00 — in a direct move toward the outcome model.

It is worth being precise about what outcome pricing does and does not do. It changes what the customer pays for (a verifiable result rather than raw token volume) and in doing so protects the customer from the wild cost swings of a long, runaway agentic session. But it does not make inference cost disappear. It shifts the usage risk onto the vendor, who must engineer the agent to be efficient enough that each verified resolution is profitable at the published rate. If a ticket resolution costs $1.50 in inference to produce and sells for $0.99, the vendor loses $0.51 on every resolution; the runaway cost has simply moved from the customer's bill to the vendor's margin. Outcome pricing is a discipline, not a free lunch: it pays off only when the underlying inference engineering has also been done.
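The vendor-side arithmetic can be sketched as follows. The $0.99 price and $1.50 cost come from the example above. The 52 percent target margin borrows the ICONIQ figure and treats inference as the only cost of goods sold, and the per-attempt cost and resolution rate are hypothetical.

```python
def margin_per_resolution(price_usd: float, inference_cost_usd: float) -> float:
    """Dollars kept (or lost) on each paid resolution."""
    return price_usd - inference_cost_usd


def max_inference_cost(price_usd: float, target_margin: float) -> float:
    """Highest inference cost per resolution that still hits a target margin."""
    return price_usd * (1.0 - target_margin)


def cost_per_resolution(cost_per_attempt_usd: float, resolution_rate: float) -> float:
    """Failed attempts still burn tokens, so cost is spread over successes only."""
    if not 0.0 < resolution_rate <= 1.0:
        raise ValueError("resolution_rate must be in (0, 1]")
    return cost_per_attempt_usd / resolution_rate


print(f"margin at $1.50 cost: {margin_per_resolution(0.99, 1.50):+.2f}")        # -0.51
print(f"cost ceiling for 52% margin: ${max_inference_cost(0.99, 0.52):.2f}")   # $0.48
# Hypothetical: $0.60 per attempt, half of attempts end in a resolution.
print(f"effective cost per resolution: ${cost_per_resolution(0.60, 0.5):.2f}")  # $1.20
```

The last line shows why resolution rate matters as much as token efficiency: the vendor pays for every attempt but is paid only for the ones that succeed.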

Outcome pricing also exposes an asymmetry that legacy incumbents find uncomfortable. Salesforce, Zendesk, and HubSpot built their core revenue on per-seat pricing. The better their AI becomes — the more tickets it resolves, the fewer human agents customers need — the more their own AI cannibalizes their seat count. Sierra and Intercom Fin carry none of that structural conflict: they have no seats to protect and every incentive to automate as aggressively as possible.

The model has a ceiling, however: it works only where the result is mechanically verifiable. A ticket resolved, a test passed, a sale confirmed — these have clean binary triggers. Where the deliverable is a matter of judgment, outcome pricing breaks down. How good was this essay? Was this research deep enough? These questions have no tamper-proof answer, and any product in that territory is forced back into charging for process.

What the Token Tax Means for Every AI Business Built on Rented Inference

The agent boom is real, the revenue is real, and the growth rates are unlike anything software has produced in a generation. But the underlying economics are still being worked out, and the comfortable assumptions of the SaaS era do not transfer intact. The defining question for any company building on a model maker's tokens is no longer "how do we price this?" It is "how do we survive above a cost floor we don't control?"

The two answers emerging from the market's most successful players point in the same direction. Own your inference, so the heaviest cost in the business is one you produce rather than rent. Or sell verifiable outcomes rather than raw compute, so your revenue tracks the value you deliver instead of the tokens you burn. Both routes aim to keep the supplier's cost floor from permanently becoming the builder's margin ceiling. Durable advantage in AI does not come from arbitraging the spread on someone else's tokens. That spread, as the past week's enterprise AI cost panic has confirmed, runs in the supplier's favor, and it always has.


Frequently Asked Questions

Why are AI agent gross margins lower than SaaS?

AI agent products carry real, variable inference costs for every user interaction: each query, agent step, and tool call runs the model and consumes compute. Traditional SaaS had near-zero marginal cost per user once the software was written. ICONIQ Capital's January 2026 survey shows AI-native product gross margins projected at 52 percent for 2026, compared with 75 to 85 percent for mature software-as-a-service businesses. That gap of 23 to 33 percentage points is driven almost entirely by inference spend.

What is the token tax in AI?

The token tax is the structural cost premium paid by any company building AI products on top of a model it does not own. Model makers serve their own subscribers at internal cost and can subsidize heavy usage; third parties pay published API list rates that already include the supplier's profit margin. The difference — which can represent a cost-subsidy ratio of roughly 10 to 25 times between first-party and third-party serving — is the effective "tax" on building with rented intelligence rather than owned infrastructure.

How does outcome-based pricing work for AI agents?

Outcome-based pricing charges only when the AI agent delivers a defined result — a resolved support ticket, a qualified lead, a completed task — rather than for the tokens consumed reaching that result. Intercom's Fin charges $0.99 per resolution; HubSpot's Customer Agent dropped to $0.50 per resolved conversation in April 2026. The model transfers the usage-cost risk from the customer to the vendor, which must engineer the agent to be profitable at the published per-outcome rate.

How did Cursor improve its AI SaaS gross margins?

Cursor moved its inference from rented to owned by launching Composer, its in-house code-generation model, in November 2025, following it with Composer 2 in March 2026. Routing a large share of its completions through Composer instead of third-party APIs shifted a major variable cost from retail API pricing to internally produced compute. By April 2026, TechCrunch reported that Cursor had reached "slight gross margin profitability" on large enterprise sales, though individual developer subscriptions remain unprofitable.

ⓒ 2026 TECHTIMES.com All rights reserved. Do not reproduce without permission.
