The Token Reckoning is Here and It’s Not What You Think
Token waste in service of productive use, not waste for the sake of waste, could be the biggest problem in AI right now.
In June 2025, OpenAI CEO Sam Altman wrote that “Intelligence too cheap to meter is well within grasp,” predicting that as “datacenter production gets automated, the cost of intelligence should eventually converge to near the cost of electricity.”
Exactly a year later, AI costs are starting to look less like cheap intelligence and more like an AC bill in the middle of a heat wave. Stories of companies blowing through their annual token budgets in months abound, and at least some companies are rethinking (or cancelling) major AI initiatives.
But while recent headlines about ‘tokenmaxxing’ and runaway budgets have caused concerns that AI’s momentum has been inflated, the panic might be misplaced.
Token waste in service of productive use, not waste for the sake of waste, could be the biggest problem in AI right now. For every $1 in AI spend, 82 cents never make it to production, according to the startup EntelligenceAI, which estimates that only 18 cents turn into shipped product, with the rest going to bug fixes, rewriting or reworking code, and review processes.
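To put that estimate in per-dollar terms, here is a minimal arithmetic sketch. It takes the 18-cents-shipped figure at face value and asks how much total spend sits behind each dollar of shipped work. The function name is hypothetical, and the result is only as reliable as the estimate itself.

```python
def spend_per_shipped_dollar(shipped_share: float) -> float:
    """Total AI spend needed to produce $1 of shipped work.

    shipped_share is the fraction of spend that reaches production
    (0.18 in the EntelligenceAI estimate quoted above).
    """
    if not 0 < shipped_share <= 1:
        raise ValueError("shipped_share must be in (0, 1]")
    return 1.0 / shipped_share


shipped_share = 0.18  # taken from the estimate above, not independently verified
multiplier = spend_per_shipped_dollar(shipped_share)
print(f"Spend per $1 shipped: ${multiplier:.2f}")
print(f"Of which rework and review: ${multiplier - 1:.2f}")
```

Under that assumption, each dollar of shipped product carries roughly $5.56 of total spend, about $4.56 of it on rework and review.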
If even directionally accurate, EntelligenceAI’s numbers are a major warning sign for the AI boom. Tokenmaxxing can be fixed by removing leaderboards or monitoring budgets closely. Unproductive uses of the technology can only be solved by finding productive uses.
This is why Uber COO Andrew Macdonald’s comment last week, that it was tough to justify all the AI spending because it wasn’t leading directly to productivity, went viral. If Uber’s experience is representative, that would signal an era of slowing growth ahead.
To be sure, negative AI stories seem to travel further than positive ones, and many companies are finding productive uses of the technology. So some caution is warranted. But ultimately, the industry will have to sort out the ROI question or risk a significant comedown.
The Right Way To Build AI Agents — With NVIDIA’s Adel El Hallak and ServiceNow’s Joe Davis (Sponsor)
Joe Davis is the EVP of AI Engineering and Delivery at ServiceNow. Adel El Hallak is the VP of Product Management, Agentic AI at NVIDIA. Both join for an episode on the inner workings of building AI agents. Tune in to hear how the two companies are working together to make autonomous AI agents safer and governable, how ServiceNow’s L1-AI-IT specialist has automated 90% of internal support tickets, and what the next few years hold for agents. Hit play for a behind-the-scenes look at what it actually takes to deploy AI agents in the enterprise.
The Intelligence Report
Microsoft Build is this week on June 2nd and 3rd and will likely include a range of new AI-related news and other updates for hardware and software. Expected news includes new homegrown AI models and updates for making Copilot a superapp that could help it compete with ChatGPT, Claude, and Gemini.
In a speech today at a WAN-IFRA conference, New York Times publisher A.G. Sulzberger escalated the media industry’s fight with AI companies, warning that OpenAI, Google, Meta and Perplexity are undermining journalism by repackaging publishers’ work, siphoning off traffic and revenue, and making it harder for news organizations to fund original reporting.
Also today, Florida’s attorney general filed a lawsuit against OpenAI and Sam Altman over alleged chatbot harms, claiming ChatGPT is too dangerous and addictive for minors, especially without adequate safeguards. As Politico notes, it’s the first time OpenAI has been sued in this way by a specific state.
Yesterday, NVIDIA announced a new superchip called the RTX Spark, which will power Windows PCs in the era of personal AI agents. It also puts NVIDIA in more direct competition with Intel, Qualcomm, AMD, and Apple, which also power laptops, desktops and tablets across the PC market.
Last week, OpenAI published a Frontier Governance Framework about its safety and security practices and how they align with regulations like California’s new frontier AI law and the EU AI Act’s GPAI Code of Practice. The OpenAI Foundation also committed $250 million to help workers, communities and economies navigate AI-driven labor disruption.
Join The Big Technology AI Summit! June 18 at The Commonwealth Club
Learn more at: summit.bigtechnology.com
Anthropic’s funding, Opus 4.8 and a post-Vatican halo effect
Even if AI never builds a Tower of Babel, Anthropic’s tower of cash might still reach Heaven. Days after co-founder Chris Olah was at the Vatican with Pope Leo XIV, the company made several big financial and product announcements:
Anthropic raised a $65 billion Series H round at a $965 billion post-money valuation, which the company expects will help expand compute, advance safety and interpretability research and scale products and partnerships. It also said annualized run-rate revenue passed $48 billion earlier in May.
Anthropic also launched Claude Opus 4.8, featuring improvements in coding, agentic tasks, financial analysis, writing and knowledge work. It claims Opus 4.8 is now three times cheaper and 2.5x faster in fast mode.
It also claims Opus 4.8 has lower rates of misaligned behaviors, with rates now similar to the Claude Mythos preview and much lower than Opus 4.7 and Sonnet 4.6.
Early testimonials include clients like Shopify, Cursor, Harvey, BrowserBase, Bridgewater, Thomson Reuters and Databricks. Early reception has been positive, with Every’s Dan Shipper joking that Opus 4.8 “should’ve rounded it up to Opus 5,” but he also noted OpenAI’s Codex remains a stronger daily coding harness.
Anthropic also announced new dynamic workflows in Claude Code that allow Claude to write orchestration scripts that run tens to hundreds of parallel sub-agents in a single session. Example uses include bug hunts, migrations and stress testing.
Anthropic also filed to go public on Monday.
Big Technology Podcast Friday Edition: Warning Signs For The AI Boom, Anthropic Passes OpenAI, Robinhood’s AI Trading
Ranjan Roy from Margins is back for our weekly discussion of the latest tech news. We cover: 1) Companies are reconsidering their AI spend after token consumption explodes 2) Is this a widespread issue or a big deal made out of a few companies? 3) The bigger problem: only 18% of tokens are spent on things that ship. 4) Are investment decisions being made due to unrestrained tokenmaxxing? 5) The circular investment problem is real 6) A look at the memory chip boom 7) Anthropic passes OpenAI as the world’s most valuable startup 8) Robinhood lets your favorite chatbot trade for you 9) Should you connect your Gmail to ChatGPT? 10) Would you get your house cleaned for free if the cleaner videotaped it for training data?
You can listen on Apple Podcasts, Spotify, or your podcast app of choice







The market spent two years celebrating token growth. Now the question is whether token consumption represents revenue creation or cost creation. As CFOs shift their focus from how many tokens are being used to what economic value those tokens actually generate, the debate will no longer be about whether AI works but whether the return on AI exceeds the cost of AI. Assuming token consumption continues growing faster than economic value creation, where do you think the market will first see the reckoning: enterprise budgets, AI valuations, or infrastructure spending?
The 82% waste statistic is striking, but I do not think it is surprising. I see most users struggling to use the product effectively, a point I return to below. The statistic reflects a structural mismatch between how AI is built, sold, and measured and how it is actually used.
The issue is not simply “too many tokens.” It is too many tokens per unit of useful work. High token usage can be completely defensible if it leads to faster product development, resolved tickets, better decisions, lower support costs, or measurable productivity gains. The problem begins when token growth reflects re-prompting, hallucination correction, bloated context windows, unnecessary agent loops, or users struggling to ask the right question.
One under-discussed issue is incentive alignment. Frontier model companies and GPU providers benefit from more compute demand, more token consumption, larger models, longer context windows, and greater infrastructure buildout. Customers, however, benefit from reliable task completion at the lowest reasonable cost. In that sense, much of AI pricing behaves more like a cost-plus contract than a firm fixed-price contract. Vendors monetize the compute used to reach an answer, while customers care about the answer or the workflow being completed. That shifts the risk of inefficiency, retries, hallucination correction, bloated prompts, excessive tool calls, and agent loops onto the customer. Until pricing and product design shift from token volume toward task success, cost per workflow, or outcome-based value, it will be easy to confuse consumption with productivity.
Another factor is that the early subsidy phase may be coming to an end. A lot of AI adoption was helped by strategic underpricing, venture funding, cloud credits, promotional pricing, and vendors absorbing infrastructure costs to drive usage. As those subsidies shrink, the true cost of inefficient token consumption becomes harder to hide. At the same time, newer and more capable models do not always mean cheaper outcomes. They may use more tokens through longer answers, larger context windows, hidden reasoning, tool calls, or agentic workflows. So even if the cost per token falls, the cost per completed task can still rise.
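The last point can be checked with simple arithmetic. The sketch below compares two made-up model generations. Every price, token count, and retry rate in it is a hypothetical number chosen for illustration, not a measurement of any real model.

```python
def cost_per_task(price_per_million_tokens: float,
                  tokens_per_attempt: int,
                  attempts_per_task: float) -> float:
    """Dollar cost to complete one task, including retries."""
    price_per_token = price_per_million_tokens / 1_000_000
    return price_per_token * tokens_per_attempt * attempts_per_task


# Hypothetical older model: pricier tokens, short answers, no hidden reasoning.
older = cost_per_task(price_per_million_tokens=10.0,
                      tokens_per_attempt=4_000,
                      attempts_per_task=1.5)

# Hypothetical newer model: half the token price, but five times the tokens
# per attempt (reasoning, tool calls, larger context) and slightly fewer retries.
newer = cost_per_task(price_per_million_tokens=5.0,
                      tokens_per_attempt=20_000,
                      attempts_per_task=1.2)

print(f"older model: ${older:.2f} per task")
print(f"newer model: ${newer:.2f} per task")
```

With these illustrative inputs the token price halves while the cost per completed task doubles, from about $0.06 to about $0.12. The direction depends entirely on how fast tokens per task grow relative to the price decline.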
There is also a lot of architectural overkill. Too many simple tasks, such as classification, extraction, summarization, formatting, and lookup, are still being pushed through large general-purpose models when smaller specialized models or deterministic systems could handle them more cheaply and reliably. Long context windows also encourage “context stuffing” instead of careful retrieval and compression. Agents can silently burn tokens through planning, tool calls, retries, and sub-agent loops. Many companies are also repeatedly paying for the same work because they lack caching, reuse, and workflow memory.
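Two of the remedies implied above, routing simple tasks to a cheaper model and caching repeated work, can be sketched in a few lines. This is a minimal illustration, not a production design: `CachedRouter`, `SIMPLE_TASKS`, and the two model callables are hypothetical stand-ins, and a real system would need cache expiry and a more careful notion of when two requests count as "the same work."

```python
import hashlib
from typing import Callable, Dict

# Task types the paragraph above lists as candidates for smaller models.
SIMPLE_TASKS = {"classification", "extraction", "summarization",
                "formatting", "lookup"}


class CachedRouter:
    """Routes requests by task type and reuses answers for repeated requests."""

    def __init__(self,
                 small_model: Callable[[str], str],
                 large_model: Callable[[str], str]) -> None:
        self.small_model = small_model
        self.large_model = large_model
        self._cache: Dict[str, str] = {}
        self.model_calls = 0  # counts only requests that reached a model

    def run(self, task_type: str, prompt: str) -> str:
        # Key on task type plus whitespace-trimmed prompt text.
        raw_key = f"{task_type}\n{prompt.strip()}"
        key = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]  # no tokens spent on a repeat

        model = self.small_model if task_type in SIMPLE_TASKS else self.large_model
        self.model_calls += 1
        result = model(prompt)
        self._cache[key] = result
        return result


# Stub models so the sketch runs without any external service.
router = CachedRouter(small_model=lambda p: f"small:{p}",
                      large_model=lambda p: f"large:{p}")
print(router.run("lookup", "capital of France"))
print(router.run("lookup", "capital of France"))  # served from cache
print(router.run("planning", "design a migration"))
print("model calls:", router.model_calls)
```

Three requests result in two model calls, one of them to the cheaper model. Exact-match caching like this only helps with literal repeats, so the savings in practice would depend on how repetitive the workload is.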
Then there is the human side. For expert users, prompting can be productive because they know how to frame the problem, provide context, constrain the answer, and evaluate the result. For average users, it can become a costly trial-and-error loop: ask a vague question, get a generic or partially wrong answer, clarify, regenerate, verify, correct, and repeat. Every one of those steps burns tokens and time. Even when hallucinations are reduced, the need to review and validate outputs remains a hidden cost.
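The gap between expert and average users can be made concrete with a rough expected-cost model. The sketch assumes each attempt is accepted independently with a fixed probability, so the expected number of attempts is one over that probability. That is a simplification, and the dollar figures and acceptance rates below are hypothetical.

```python
def expected_cost_per_accepted_answer(token_cost_per_attempt: float,
                                      review_cost_per_attempt: float,
                                      acceptance_rate: float) -> float:
    """Expected dollars spent until one answer passes review.

    Assumes independent attempts with a constant acceptance rate,
    so the expected number of attempts is 1 / acceptance_rate.
    """
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    per_attempt = token_cost_per_attempt + review_cost_per_attempt
    return per_attempt / acceptance_rate


# Hypothetical inputs: same model and same reviewer cost, different prompting skill.
expert = expected_cost_per_accepted_answer(0.05, 0.50, acceptance_rate=0.80)
average = expected_cost_per_accepted_answer(0.05, 0.50, acceptance_rate=0.25)

print(f"expert user:  ${expert:.2f} per accepted answer")
print(f"average user: ${average:.2f} per accepted answer")
```

In this illustration the human review time, not the tokens, dominates the per-attempt cost, which is why the cost of validation stays hidden when only token spend is tracked.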
This is why raw AI adoption is the wrong metric. The more useful metric is cost-adjusted productivity: tokens per verified outcome, cost per resolved ticket, cost per shipped feature, cost per completed workflow, and human time saved after review and correction.
The next phase of AI should be less about maximizing usage and more about maximizing useful outcomes per dollar spent.
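As a rough sketch of what tracking cost-adjusted productivity might look like, the snippet below computes three of the metrics named above from a handful of workflow totals. The `WorkflowStats` fields and the sample numbers are hypothetical; what counts as a "verified outcome" would have to be defined per team.

```python
from dataclasses import dataclass


@dataclass
class WorkflowStats:
    tokens_used: int          # total tokens consumed by the workflow
    token_cost_usd: float     # total spend on those tokens
    verified_outcomes: int    # e.g. resolved tickets or shipped features
    baseline_minutes: float   # human time the work took before AI
    assisted_minutes: float   # human time spent working with AI
    review_minutes: float     # human time reviewing and correcting output


def tokens_per_verified_outcome(s: WorkflowStats) -> float:
    # No verified outcomes means unbounded cost per outcome.
    if s.verified_outcomes == 0:
        return float("inf")
    return s.tokens_used / s.verified_outcomes


def cost_per_verified_outcome(s: WorkflowStats) -> float:
    if s.verified_outcomes == 0:
        return float("inf")
    return s.token_cost_usd / s.verified_outcomes


def net_minutes_saved(s: WorkflowStats) -> float:
    # Time saved only counts after review and correction are subtracted.
    return s.baseline_minutes - s.assisted_minutes - s.review_minutes


# Illustrative month of a hypothetical support workflow.
stats = WorkflowStats(tokens_used=2_000_000, token_cost_usd=30.0,
                      verified_outcomes=40, baseline_minutes=1200.0,
                      assisted_minutes=500.0, review_minutes=300.0)

print("tokens per verified outcome:", tokens_per_verified_outcome(stats))
print("cost per verified outcome:  ", cost_per_verified_outcome(stats))
print("net minutes saved:          ", net_minutes_saved(stats))
```

The point of the design is the denominator: every metric divides by verified outcomes or subtracts review time, so a rise in raw usage alone cannot make the numbers look better.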