Discussion about this post

Keith Amaral

The market spent two years celebrating token growth. Now the question is whether token consumption represents revenue creation or cost creation. As CFOs shift their focus from how many tokens are being used to what economic value those tokens actually generate, the debate will no longer be about whether AI works but whether the return on AI exceeds the cost of AI. Assuming token consumption continues growing faster than economic value creation, where do you think the market will first see the reckoning: enterprise budgets, AI valuations, or infrastructure spending?

Marginal Gains

The 82% waste statistic is striking, but I do not find it surprising. Most users I see struggle to use the product effectively, a point I return to below. The statistic reflects a structural mismatch between how AI is being built, sold, measured, and actually used.

The issue is not simply “too many tokens.” It is too many tokens per unit of useful work. High token usage can be completely defensible if it leads to faster product development, resolved tickets, better decisions, lower support costs, or measurable productivity gains. The problem begins when token growth reflects re-prompting, hallucination correction, bloated context windows, unnecessary agent loops, or users struggling to ask the right question.

One under-discussed issue is incentive alignment. Frontier model companies and GPU providers benefit from more compute demand, more token consumption, larger models, longer context windows, and greater infrastructure buildout. Customers, however, benefit from reliable task completion at the lowest reasonable cost. In that sense, much of AI pricing behaves more like a cost-plus contract than a firm fixed-price contract. Vendors monetize the compute used to reach an answer, while customers care about the answer or the workflow being completed. That shifts the risk of inefficiency, retries, hallucination correction, bloated prompts, excessive tool calls, and agent loops onto the customer. Until pricing and product design shift from token volume toward task success, cost per workflow, or outcome-based value, it will be easy to confuse consumption with productivity.

Another factor is that the early subsidy phase may be coming to an end. A lot of AI adoption was helped by strategic underpricing, venture funding, cloud credits, promotional pricing, and vendors absorbing infrastructure costs to drive usage. As those subsidies shrink, the true cost of inefficient token consumption becomes harder to hide. At the same time, newer and more capable models do not always mean cheaper outcomes. They may use more tokens through longer answers, larger context windows, hidden reasoning, tool calls, or agentic workflows. So even if the cost per token falls, the cost per completed task can still rise.
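To make the last point concrete, here is a minimal arithmetic sketch. Every number in it is hypothetical and chosen only for illustration: an older model priced at $10 per million tokens that uses 2,000 tokens per task, and a newer model at half the price per token that uses 10,000 tokens per task once longer answers, reasoning, and tool calls are counted.

```python
def cost_per_task(price_per_million_tokens: float, tokens_per_task: int) -> float:
    """Dollar cost of one completed task at a given token price and token usage."""
    return price_per_million_tokens * tokens_per_task / 1_000_000


# Hypothetical figures, not taken from any vendor's price list.
older = cost_per_task(price_per_million_tokens=10.0, tokens_per_task=2_000)
newer = cost_per_task(price_per_million_tokens=5.0, tokens_per_task=10_000)

print(f"older model: ${older:.3f} per task")
print(f"newer model: ${newer:.3f} per task")
print(f"price per token fell, cost per task rose: {newer > older}")
```

Under these assumed figures the price per token halves while the cost per task rises by 2.5x, because token usage per task grew fivefold.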

There is also a lot of architectural overkill. Too many simple tasks, such as classification, extraction, summarization, formatting, and lookup, are still being pushed through large general-purpose models when smaller specialized models or deterministic systems could handle them more cheaply and reliably. Long context windows also encourage “context stuffing” instead of careful retrieval and compression. Agents can silently burn tokens through planning, tool calls, retries, and sub-agent loops. Many companies are also repeatedly paying for the same work because they lack caching, reuse, and workflow memory.
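As a rough illustration of the caching point, the sketch below wraps a model call in an exact-match cache so that a repeated request is paid for only once. The names `CachedModel`, `call_model`, and `paid_calls` are hypothetical, and `call_model` stands in for whatever client function a team actually uses. Lowercasing and collapsing whitespace is an assumed normalization that would not be safe for case-sensitive prompts.

```python
import hashlib
from typing import Callable, Dict


class CachedModel:
    """Exact-match response cache around a model call (illustrative only)."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model      # hypothetical stand-in for a real client
        self._cache: Dict[str, str] = {}
        self.paid_calls = 0                # calls that actually reached the model

    @staticmethod
    def _key(prompt: str) -> str:
        # Assumed normalization: lowercase and collapse runs of whitespace.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key not in self._cache:
            self._cache[key] = self._call_model(prompt)
            self.paid_calls += 1
        return self._cache[key]


# Usage with a fake model so the example runs without any API.
model = CachedModel(lambda prompt: f"label for: {prompt}")
first = model.ask("Classify this ticket")
second = model.ask("classify   this ticket")  # same request after normalization
print(model.paid_calls)
```

This only catches identical requests after normalization. Reusing work across merely similar requests would need something more elaborate, which is outside this sketch.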

Then there is the human side. For expert users, prompting can be productive because they know how to frame the problem, provide context, constrain the answer, and evaluate the result. For average users, it can become a costly trial-and-error loop: ask a vague question, get a generic or partially wrong answer, clarify, regenerate, verify, correct, and repeat. Every one of those steps burns tokens and time. Even when hallucinations are reduced, the need to review and validate outputs remains a hidden cost.

This is why raw AI adoption is the wrong metric. The more useful metric is cost-adjusted productivity: tokens per verified outcome, cost per resolved ticket, cost per shipped feature, cost per completed workflow, and human time saved after review and correction.
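A minimal sketch of how such metrics might be computed from usage logs follows. The `Attempt` record, its field names, and the sample numbers are all assumptions made for illustration. The key design choice is that tokens, cost, and review time are summed over every attempt, including failed ones, while the denominator counts only verified outcomes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Attempt:
    tokens: int              # tokens consumed by this attempt
    cost_usd: float          # what this attempt cost
    verified: bool           # did a human accept the result?
    baseline_minutes: float  # assumed time the task takes without AI
    review_minutes: float    # human time spent reviewing or correcting


def summarize(attempts: List[Attempt]) -> Optional[Dict[str, float]]:
    """Cost-adjusted productivity over a batch of attempts; None if nothing was verified."""
    verified = [a for a in attempts if a.verified]
    if not verified:
        return None
    total_tokens = sum(a.tokens for a in attempts)
    total_cost = sum(a.cost_usd for a in attempts)
    # Time saved counts only accepted work; review time counts for every attempt.
    net_minutes = sum(a.baseline_minutes for a in verified) - sum(
        a.review_minutes for a in attempts
    )
    return {
        "tokens_per_verified_outcome": total_tokens / len(verified),
        "cost_per_verified_outcome": total_cost / len(verified),
        "net_human_minutes_saved": net_minutes,
    }


# Hypothetical sample: two accepted results and one rejected one.
sample = [
    Attempt(4_000, 0.04, True, 30.0, 5.0),
    Attempt(6_000, 0.06, False, 30.0, 10.0),
    Attempt(2_000, 0.02, True, 20.0, 5.0),
]
report = summarize(sample)
print(report)
```

In this made-up sample the rejected attempt still adds its tokens, cost, and review time to the totals, which is exactly the waste that raw usage counts hide.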

The next phase of AI should be less about maximizing usage and more about maximizing useful outcomes per dollar spent.
