Opinion

Fans of the creative arts often find out where creators gather to talk among themselves, then sneak in to eavesdrop on what those masters of the art talk about. Golden insights, daring concepts, cutting-edge thinking? Not a bit. Gossip, if you're lucky. Travel miseries, if you're not. Mostly, they talk about money.
It's like that with LLMs, but instead of speaking about money directly, people are talking about tokens. AI coder news has settled into a daily cadence – another feature has been AI'd, another "bug" fixed in subscription accounting, another change in behavior – all with one thing in common: TIBS, or token incremental burn syndrome. We may be at the start of TIBS but, to continue the metaphor of AI as pandemic, there is so very much more to come.
Tokens are the billable metric for LLM usage simply because they are easy to count – even if this seems to challenge those who count them. Throw a prompt into an LLM, and it recognizes its lexemes, a linguistic term dating from the 1930s for the units of meaning and modification. These get turned into their representations, tokens, and fed to the giant guess-what's-next machinery of the LLM. The result is a string of output tokens, converted for your pleasure to words or computer code or whatever. Count them going in, count them going out. It's not quite...
if ((ntokens_left -= strlen(prompt) + strlen(slop)) <= 0) {
    printf("Cough up, sunshine\n");
}
... But close enough. On such a concept, the entire commercial edifice of AI hangs.
Basing costs on token consumption, whether it's for code suggestion, generation, or AI debugging, makes as much sense – less, even – than paying programmers per keystroke in and character out. That's even dafter than the lines-of-code-per-month metric for coder goodness, a concept so dumb it makes Juicero's backers look like Warren Buffett. There is no concept of usable work actually done, no acknowledgment that inefficiency is actively rewarded, and no easy way to relate the price paid to the actual cost of production. But it's simple to understand and looks like any other prepaid, limited-use subscription model. Oddly, nobody seems minded to improve on it.
There are virtually no other metrics. You can measure tokens per second for a benchmark test case. You can measure the ratio of tokens out to tokens in, although it's not clear why you would. At least with arguably comparable service models like cloud computing, you know what you're getting when you buy so much compute, memory, storage, and connectivity. You still have to watch out for runaway automation or mismanagement, and Bill Shock still works at AWS, but you have a chance of linking results to costs. Good luck doing that with LLM-based services, let alone AI agents.
Add this lack of value metrics to the ridiculous returns on investment the AI industry needs to show to make good on its promises, and we have a recipe for mounting TIBS inflation.
Vendors have an addiction to making everything a subscription, then frog-boiling subscribers, especially when they can incorporate an effective monopoly. Imagine the lock-in where an org has deskilled its code production humans and become reliant on a particular AI code gen chain.
Migration is the hardest word to say, even when the case for it is backed by a telephone book of metrics. You can look at cost per instance or cost per terabyte, work out what you'll need to keep the business model in good shape, and perhaps you won't be entirely wrong. How that'll work with AI-heavy CI/CD is a great question, which you might care to let someone else answer first.
The vendor and infrastructure side of the industry has always seen cycles of lock-in leading to feudalism, leading to revolution, leading to the revolutionaries becoming rentier landlords themselves. Fancy 70 years of enterprise tech in 60 words? Hold on tight.
Rented mainframes to on-prem minicomputers, minis to autonomous on-desk micros, at least until Ethernet started building virtual minis again. Proprietary to open source; open source and ubiquitous, effectively unlimited compute building hyperscalers running closed services; hyperscale architectures powering AI models with per-user quotas that take us back to off-prem mainframes – closed services and user quotas again.
These cycles have been powered by Moore's Law constantly changing the economics of IT to discourage inertia in an industry that desperately craves it.
Moore's Law is over – really, it is. Scaling that was once areal is now volumetric, so instead of getting cheaper, smaller, lower power, and more democratic, silicon technology is bloating in price, size, power greed, and feudalism. AI is the only animal left that can drive the market, and to stay that way it needs to feed on you. TIBS is where it's at. Work your master's land, peasants.
If AI does result in deskilling the tech workforce and recapturing the engine of IT creation, it will be as if the mainframe era came at the end of semiconductor evolution, rather than the beginning. All that can be said about the evolutionary driver that will move things on is that it has yet to be invented, despite fifty years of looking.
The AI industry builds out in gigawatts and charges in tokens. It sets the cost and scents a future where profound lock-in lets it set the rules forever. The rest of us should remember the words of the original sentient serial-ported mainframe, the WOPR from WarGames: "The only winning move is not to play." It should know. ®