9 Comments
Keith Amaral's avatar

The market spent two years celebrating token growth. Now the question is whether token consumption represents revenue creation or cost creation. As CFOs shift their focus from how many tokens are being used to what economic value those tokens actually generate, the debate will no longer be about whether AI works but whether the return on AI exceeds the cost of AI. Assuming token consumption continues growing faster than economic value creation, where do you think the market will first see the reckoning: enterprise budgets, AI valuations, or infrastructure spending?

Marginal Gains's avatar

The 82% waste statistic is striking, but I do not think it is surprising. I am seeing that most users struggle to use the product effectively, which I come back to below. The statistic reflects a structural mismatch between how AI is being built, sold, measured, and actually used.

The issue is not simply “too many tokens.” It is too many tokens per unit of useful work. High token usage can be completely defensible if it leads to faster product development, resolved tickets, better decisions, lower support costs, or measurable productivity gains. The problem begins when token growth reflects re-prompting, hallucination correction, bloated context windows, unnecessary agent loops, or users struggling to ask the right question.

One under-discussed issue is incentive alignment. Frontier model companies and GPU providers benefit from more compute demand, more token consumption, larger models, longer context windows, and greater infrastructure buildout. Customers, however, benefit from reliable task completion at the lowest reasonable cost. In that sense, much of AI pricing behaves more like a cost-plus contract than a firm fixed-price contract. Vendors monetize the compute used to reach an answer, while customers care about the answer or the workflow being completed. That shifts the risk of inefficiency, retries, hallucination correction, bloated prompts, excessive tool calls, and agent loops onto the customer. Until pricing and product design shift from token volume toward task success, cost per workflow, or outcome-based value, it will be easy to confuse consumption with productivity.

Another factor is that the early subsidy phase may be coming to an end. A lot of AI adoption was helped by strategic underpricing, venture funding, cloud credits, promotional pricing, and vendors absorbing infrastructure costs to drive usage. As those subsidies shrink, the true cost of inefficient token consumption becomes harder to hide. At the same time, newer and more capable models do not always mean cheaper outcomes. They may use more tokens through longer answers, larger context windows, hidden reasoning, tool calls, or agentic workflows. So even if the cost per token falls, the cost per completed task can still rise.
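To make that last point concrete, here is a minimal arithmetic sketch. The prices and token counts are hypothetical, chosen only to illustrate the effect, and `cost_per_task` is an illustrative helper, not any vendor's billing formula.

```python
def cost_per_task(price_per_million_tokens: float, tokens_per_task: int) -> float:
    """Dollar cost of one completed task at a given per-token price."""
    return price_per_million_tokens * tokens_per_task / 1_000_000


# Hypothetical numbers, for illustration only:
# the newer model is half the price per token but uses six times the tokens
# (longer answers, hidden reasoning, tool calls, retries).
older = cost_per_task(price_per_million_tokens=10.0, tokens_per_task=2_000)
newer = cost_per_task(price_per_million_tokens=5.0, tokens_per_task=12_000)

print(f"older model: ${older:.3f} per task")
print(f"newer model: ${newer:.3f} per task")
```

Under these assumed numbers, the per-token price halves while the cost per completed task triples.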

There is also a lot of architectural overkill. Too many simple tasks, such as classification, extraction, summarization, formatting, and lookup, are still being pushed through large general-purpose models when smaller specialized models or deterministic systems could handle them more cheaply and reliably. Long context windows also encourage “context stuffing” instead of careful retrieval and compression. Agents can silently burn tokens through planning, tool calls, retries, and sub-agent loops. Many companies are also repeatedly paying for the same work because they lack caching, reuse, and workflow memory.
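As a rough sketch of the caching point, the snippet below wraps a model call in a simple in-memory cache so that an identical request is paid for only once. `CachedModel` and `fake_model` are hypothetical names, the model is a stand-in function rather than a real API, and the normalization (collapsing whitespace and lowercasing) is an assumption that only suits tasks where case does not matter.

```python
import hashlib


class CachedModel:
    """Wraps a model-calling function and reuses answers for repeated prompts."""

    def __init__(self, call_model):
        self._call_model = call_model  # hypothetical: any function str -> str
        self._cache = {}
        self.model_calls = 0  # how many requests actually reached the model

    def ask(self, prompt: str) -> str:
        # Assumed normalization: collapse whitespace and lowercase the prompt.
        normalized = " ".join(prompt.split()).lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.model_calls += 1
            self._cache[key] = self._call_model(prompt)
        return self._cache[key]


def fake_model(prompt: str) -> str:
    """Stand-in for a paid model call."""
    return f"answer to: {prompt.strip()}"


model = CachedModel(fake_model)
first = model.ask("Classify this ticket: login fails")
second = model.ask("classify this ticket:   login fails")
print(model.model_calls)
```

The second request is served from the cache, so only one call reaches the (fake) model. A production version would also need expiry and invalidation, which this sketch leaves out.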

Then there is the human side. For expert users, prompting can be productive because they know how to frame the problem, provide context, constrain the answer, and evaluate the result. For average users, it can become a costly trial-and-error loop: ask a vague question, get a generic or partially wrong answer, clarify, regenerate, verify, correct, and repeat. Every one of those steps burns tokens and time. Even when hallucinations are reduced, the need to review and validate outputs remains a hidden cost.

This is why raw AI adoption is the wrong metric. The more useful metric is cost-adjusted productivity: tokens per verified outcome, cost per resolved ticket, cost per shipped feature, cost per completed workflow, and human time saved after review and correction.
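A minimal sketch of how those metrics could be computed from per-attempt logs follows. The `Attempt` fields, the `summarize` helper, and the sample numbers are all assumptions made for illustration. Failed attempts still count toward tokens, cost, and review time, which is the point of measuring per verified outcome.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    tokens: int              # tokens consumed by this attempt
    cost_usd: float          # what the attempt cost
    verified: bool           # did a human accept the result?
    review_minutes: float    # human time spent reviewing or correcting
    baseline_minutes: float  # time the task would take without AI


def summarize(attempts):
    """Aggregate cost-adjusted productivity over all attempts."""
    verified = [a for a in attempts if a.verified]
    n = len(verified)
    total_tokens = sum(a.tokens for a in attempts)
    total_cost = sum(a.cost_usd for a in attempts)
    # Time saved counts only verified work, minus review time on every attempt.
    saved = sum(a.baseline_minutes for a in verified) - sum(
        a.review_minutes for a in attempts
    )
    return {
        "verified_outcomes": n,
        "tokens_per_verified_outcome": total_tokens / n if n else None,
        "cost_per_verified_outcome": total_cost / n if n else None,
        "net_minutes_saved": saved,
    }


# Hypothetical log: two accepted results and one rejected one.
log = [
    Attempt(4_000, 0.04, True, 5, 30),
    Attempt(9_000, 0.09, False, 10, 30),
    Attempt(3_000, 0.03, True, 5, 30),
]
report = summarize(log)
print(report)
```

In this made-up log, the rejected attempt more than doubles the tokens per verified outcome compared with counting only the successful runs.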

The next phase of AI should be less about maximizing usage and more about maximizing useful outcomes per dollar spent.

Cathie Campbell's avatar

Rationally reasoned, MG.

Marginal Gains's avatar

And someone has to pay for this subsidy, which comes to about $33/year. From the NY Times: Everywhere as part of the California State University system’s broader A.I. Initiative, introduced in February 2025. Anchored by a $16.9 million deal with OpenAI, the initiative provides a total of 500,000 ChatGPT.edu licenses to be issued to all students, faculty, and administrators.

Oliver Meiklejohn's avatar

Worth flagging that the 82-cents-not-reaching-production figure comes from EntelligenceAI, a startup with an obvious interest in making AI spend look wasteful — treat it with some skepticism. The broader ROI concern is real, but the more interesting economic angle is who this problem actually hurts: Anthropic and OpenAI get paid on all tokens regardless of whether they ship to production, so unproductive consumption is still revenue for them. The productivity gap is fundamentally a buyer-side problem, not a supplier-side one — which helps explain why frontier model pricing has stayed sticky even as competition intensifies.

Alec Pritzos's avatar

The split the piece draws between tokenmaxxing and unproductive use is the useful part, because the two have opposite fixes. Leaderboard-driven waste is a governance problem you can meter and cap. Waste from spend that never finds a real job is a demand problem, and no budget alarm solves that one. The Uber comment landing as hard as it did suggests the market has started pricing the second question, not the first.

Stuart Miller's avatar

Marty/Alex, the token meter running in the model is one thing, and clearly visible. What’s more hidden (deliberately) is the token consumption in the rest of your agentic stack. As you engineer the total tool stack to execute on your agentic stack (harness), the vendors you choose will all have consumption metering buried in there.

In this future, every road is a toll road, but different vendors will own the toll booths depending on the architecture you chose. Worse still, as you scale from pilot to production and take more trips down that toll road, the E-ZPass meter will spin faster, and power users will become a liability that makes the meter spin dizzyingly out of control.

All to say, good old business-case rigor will be visibly back in demand, and the abundant usage budgets for creativity will be reined in by the bean counters.

We do not live in an AI world of boundless resources. The costs of energy, water, data center construction, and silicon are all racing well ahead of inflation due to supply and demand. And that means one thing: those costs pass on to the consumer, wherever they sit, whether commercial, prosumer, or end consumer.

DIDIER BOREL's avatar

I enjoy your podcast. I enjoyed the recent interviews with Costello (ex-Twitter) and Brian Chermy (Anthropic). Concerning agents, I would be very interested to know what security precautions you take, or Brian Chermy takes, when you use an agent to, for example, book a flight or a hotel room. I suppose the agent needs your credit card info or some ID info. What precautions do you, or your guests, take to protect against hacks, identity theft, and so on? Aside from that, I am a fan of your podcast. I appreciate, as you say, the 'nuanced and no hype' conversations, without a 'sales' imperative. Keep it up.

Josh Bersin's avatar

Is the ServiceNow video sponsored? Or is it actual research?