
Your AI Token Bill Is About To Explode: The Leadership Playbook To Stop The 10,000-Agent Wall

Tony Wood

Most organisations are treating AI usage like a feature decision, when it has quietly become a capacity and budget decision.

Teams build a proof of concept (POC). Everyone gets excited. The demo works. Then the organisation tries to scale it, and suddenly it is not a software problem anymore. It is a supply and budgeting problem.

Here’s the thing. We are moving from “a handful of chat features” to fleets of agentic workers.

An agentic worker is simply an AI system that can take actions on your behalf across tools and workflows. It can look things up, draft, decide, route, update systems, and keep going until the job is done.

That is powerful. It is also where costs and capacity can run away from you if you do not put governance in place early.

This post is my attempt to keep it simple and leadership-ready:

  • Use the smallest viable model for each job
  • Treat tokens like an operating budget, not a developer metric
  • Put Finance in the loop now, not after the first surprise invoice

Minimum Viable Model: Stop Overbuying Intelligence

The first question I want leaders to normalise is not “can we do it with AI?”

It is:

  • Do we need the biggest model for this task?
  • Can we use a smaller, cheaper model?
  • Can we run part of it locally, or with simpler automation, before calling a large model?

This is not about being stingy. It is about being deliberate.

If you buy “massive intelligence” by default, you also buy:

  • Massive context windows
  • Massive token burn
  • Massive variability in spend
  • Massive operational risk when usage spikes

A simple leadership move is to require a “minimum viable model” decision in every POC. Not after. At the start.

Compute Is Finite, Tokens Are The Meter

Leaders often assume AI scale is like software scale: add more users, add more cloud spend, and it mostly works.

Agentic scale is different because the constraint is not only money. It is also capacity.

If your organisation becomes dependent on AI workflows, and your usage is constrained by provider limits, congestion, pricing changes, or internal caps, your operations can stall.

Tokens give you a practical way to manage this because tokens are measurable. They are the closest thing we have to a universal unit across model usage.

Dion Wiggins put it bluntly:

"Tokens are the unit that matters. They collapse a messy stack (model choice, context length, prompt design, infra latency, GPU utilisation) into a measurable financial signal. If you cannot measure tokens, you cannot govern AI."

Once you accept that, a lot of decisions get easier.

You stop asking “which model is coolest?” and start asking:

  • What is the token cost per workflow?
  • What is the token cost per outcome?
  • What is the token cost per department?
  • What is the acceptable variance, and where is the kill switch?
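
Part of what makes those questions answerable is that tokens really are countable. As a rough illustration, here is a minimal sketch using the open-source tiktoken library; it assumes tiktoken is installed and uses the cl100k_base encoding as a proxy, so treat your provider's own usage reporting as the authoritative number.

  # Rough token count for a prompt, using tiktoken (pip install tiktoken).
  # cl100k_base is a proxy encoding; actual counts vary by model and provider.
  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")

  prompt = (
      "Summarise the attached incident report in three bullet points "
      "and flag anything that needs escalation."
  )

  print(f"Prompt tokens: {len(enc.encode(prompt))}")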

Every Proof Of Concept Needs A Token Budget

Most POCs fail in a predictable way.

They prove the workflow can work. They do not prove it can run sustainably.

If you want to avoid the “great demo, impossible rollout” trap, your POC must answer three questions:

  • What is the token budget per task?
  • What is the token budget per day, week, and month at expected volume?
  • What happens when usage doubles, or when the agent loops?

This is not about perfect accuracy. It is about giving leaders a handle on unit economics before you scale.

Gaurav Chauhan captures the real-world use case here:

"This is one of the most common questions I’m asked — especially when teams are preparing budgets or sizing infrastructure for LLM use. Here's a practical method I use that may help others too: ... This method is great for early-stage planning and stakeholder discussions."

That is the standard you want.

Early-stage planning. Stakeholder discussions. Enough clarity to make a decision, and enough measurement to learn.
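
To show what that kind of early-stage estimate can look like, here is a minimal back-of-the-envelope sketch. Every number in it (tokens per task, daily volume, blended price) is a placeholder assumption, not Gaurav's method; swap in your own figures.

  # Back-of-the-envelope monthly token burn for one workflow.
  # All figures are illustrative assumptions; replace them with your own.
  tokens_per_task = 6_000            # prompt + context + output, averaged
  tasks_per_day = 400                # expected volume at medium adoption
  working_days_per_month = 22
  price_per_million_tokens = 5.00    # assumed blended input/output price, USD

  monthly_tokens = tokens_per_task * tasks_per_day * working_days_per_month
  monthly_cost = monthly_tokens / 1_000_000 * price_per_million_tokens

  print(f"Estimated monthly tokens: {monthly_tokens:,}")
  print(f"Estimated monthly cost:   ${monthly_cost:,.2f}")

  # Stress the estimate: what if usage doubles, or an agent loops?
  for multiplier in (2, 5):
      print(f"At {multiplier}x volume: ${monthly_cost * multiplier:,.2f}")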

The Practical POC Upgrade (Leadership Version)

Add one page to your POC template:

  • Token budget per transaction (target and ceiling)
  • Token budget per workflow step (where the burn happens)
  • Expected monthly token use at adoption levels (low, medium, high)
  • Showback plan (who pays, who approves increases)
  • Failure modes (what happens when you hit limits)

If a team cannot fill this in, the POC is not ready to graduate.
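
One way to make that page hard to skip is to capture it as a small structured record the team fills in before graduation. This is only a sketch; the field names and numbers are illustrative, not a standard template.

  # Illustrative POC token budget page, captured as data so it can be
  # reviewed and versioned. Every value below is a placeholder.
  poc_token_budget = {
      "workflow": "invoice-triage-agent",
      "per_transaction_tokens": {"target": 4_000, "ceiling": 12_000},
      "per_step_tokens": {"retrieval": 1_500, "reasoning": 2_000, "response": 500},
      "monthly_tokens_by_adoption": {
          "low": 20_000_000,
          "medium": 50_000_000,
          "high": 120_000_000,
      },
      "showback": {"cost_centre": "AP-Operations", "budget_owner": "A named human"},
      "failure_modes": [
          "Throttle to a queue when the daily cap is reached",
          "Downgrade to a smaller model at 80% of the monthly budget",
          "Hard stop and alert the budget owner at 100%",
      ],
  }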

The Scaling Wall: 10,000 Agentic Workers And Then Nothing

This is the part most leaders underestimate.

A few internal copilots are manageable, even with sloppy governance.

Thousands of agents are not.

The risk is not only spend. It is operational continuity.

At scale, small inefficiencies compound:

  • Long prompts become a permanent tax
  • Overuse of large models becomes normal
  • Retrieval that is not optimised becomes a constant token drain
  • Agents that retry or loop become runaway processes
  • Teams copy patterns they do not understand, and multiply the waste

And then you hit the wall.

Not because the AI “stops working”.

Because the organisation hits a token cap, a budget cap, or a provider constraint, and the business realises too late that it built a critical workflow on a finite meter.

Brijesh Akbari’s phrasing is uncomfortably accurate:

"Your AI token bill doesn’t grow slowly. It explodes quietly… and one day you notice the invoice. The quickest way to reduce it isn’t “switch models”. It’s fixing the waste... If you want AI in production, token economics is part of architecture."

That last line is the leadership takeaway.

Token economics is part of architecture.

Not a nice-to-have.

Not a Finance clean-up exercise later.

Architecture.

Finance Owns This Next

IT and engineering will always be central to delivery. But ownership of the operating limit should sit with Finance, supported by FinOps (financial operations) practices.

Why?

Because token governance looks like every other scarce resource governance problem:

  • Allocation
  • Forecasting
  • Showback and chargeback
  • Variance management
  • Controls and auditability
  • Decision rights when demand exceeds supply

Finance is structurally designed for this.

If you keep it purely in IT, it will be managed like an infrastructure line item.

If you move it into Finance, it will be managed like an operating system for the business.

The best organisations will treat tokens like a utility budget with business rules:

  • Which workflows are mission critical?
  • Which teams get priority during peak periods?
  • Which outcomes justify premium model usage?
  • Which experiments get a sandbox with a hard cap?

That is not bureaucracy. That is how you keep production stable while still moving fast.

A Practical Governance Checklist Leaders Can Use This Week

If you want something you can apply immediately, use this.

1) Set An Organisation Token Limit Per Month

  • Define a hard monthly cap at org level
  • Keep a central reserve for incident response and critical ops
  • Decide what triggers a reforecast

2) Define A Token Budget Per Workflow

  • Assign a target token budget per workflow
  • Set a ceiling that triggers throttling or fallback behaviour
  • Require a “budget owner” per workflow (a named human)

3) Add A Token Cost Section To Every POC

  • Token budget per transaction
  • Estimated monthly burn at realistic volume
  • Assumptions written in plain English

4) Add Showback Dashboards

  • Department usage
  • Workflow usage
  • Top 10 token consumers
  • Trend lines and anomaly alerts
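
If your provider or gateway exports per-request usage logs, a first showback view can be a simple aggregation over that export. Here is a minimal sketch with pandas; the column names (department, workflow, total_tokens) are assumptions, so map them to whatever your tooling actually records.

  # Minimal showback aggregation over a per-request usage export (CSV).
  # Column names are assumptions; adjust to your provider's log schema.
  import pandas as pd

  usage = pd.read_csv("token_usage_export.csv")

  by_department = usage.groupby("department")["total_tokens"].sum()
  by_workflow = usage.groupby("workflow")["total_tokens"].sum()

  print("Tokens by department:")
  print(by_department.sort_values(ascending=False))

  print("Top 10 token consumers:")
  print(by_workflow.sort_values(ascending=False).head(10))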

5) Create Kill-Switch Rules For Runaway Agents

  • Maximum retries
  • Maximum tool calls per job
  • Maximum tokens per job
  • Automatic downgrade to smaller model when limits are near
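
Here is a minimal sketch of how those rules could be enforced inside an agent loop. The model tiers, thresholds, and the call_model and run_tool callables are all hypothetical placeholders for whatever your agent framework provides; the point is that every limit becomes an explicit, testable number.

  # Illustrative guard rails for a single agent job. All names and limits
  # are placeholders, not a specific framework's API.
  MAX_RETRIES = 3
  MAX_TOOL_CALLS = 20
  MAX_TOKENS_PER_JOB = 50_000
  DOWNGRADE_AT = 0.8  # switch to a smaller model at 80% of the token cap

  def run_job(task, call_model, run_tool):
      tokens_used, tool_calls, retries = 0, 0, 0
      model = "large-model"  # placeholder tier name

      while True:
          if tokens_used >= MAX_TOKENS_PER_JOB:
              raise RuntimeError("Token cap hit: stop and alert the budget owner")
          if tokens_used >= DOWNGRADE_AT * MAX_TOKENS_PER_JOB:
              model = "small-model"  # automatic downgrade near the limit

          step, step_tokens = call_model(model, task)
          tokens_used += step_tokens

          if step.needs_tool:
              if tool_calls >= MAX_TOOL_CALLS:
                  raise RuntimeError("Tool-call cap hit: likely a runaway loop")
              tool_calls += 1
              run_tool(step)
          elif step.failed:
              retries += 1
              if retries > MAX_RETRIES:
                  raise RuntimeError("Retry cap hit: hand the job back to a human")
          else:
              return step.result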

6) Put Finance In The Loop

  • Finance co-owns the operating limit and allocation
  • Procurement and Finance align on vendor terms and discount structures
  • Leadership reviews token performance like any other operating metric

What I Would Do In The Next 24 Hours

If you are a leader reading this, and you have even one agentic workflow in flight, here is a simple 24-hour plan.

  • Ask for a list of your current AI use cases in build, pilot, and production
  • For each one, ask: do we have a token budget per task and per month?
  • If not, pause scale-up and add measurement before expansion
  • Nominate a Finance owner for token governance (not as an observer, as an owner)
  • Require “minimum viable model” decisions in every roadmap review

No drama. No panic. Just basic operational discipline.

Call To Action

If you do one thing this week, make tokens visible.

Make them measurable. Make them budgeted. Make them owned.

Then ask yourself two questions:

  • What could you do in the next 24 hours to put a token budget into your top two AI workflows?
  • What could you do over the next few weeks to turn that into a repeatable governance pattern, with Finance properly in the loop?

You do not need perfection. You need a model you can run, measure, and improve.

