Most organisations are treating AI usage like a feature decision, when it has quietly become a capacity and budget decision.
Teams build a proof of concept (POC). Everyone gets excited. The demo works. Then the organisation tries to scale it, and suddenly it is not a software problem anymore. It is a supply and budgeting problem.
Here’s the thing. We are moving from “a handful of chat features” to fleets of agentic workers.
An agentic worker is simply an AI system that can take actions on your behalf across tools and workflows. It can look things up, draft, decide, route, update systems, and keep going until the job is done.
That is powerful. It is also where costs and capacity can run away from you if you do not put governance in place early.
This post is my attempt to keep it simple and leadership-ready:
The first question I want leaders to normalise is not “can we do it with AI?”
It is:
This is not about being stingy. It is about being deliberate.
If you buy “massive intelligence” by default, you also buy:
A simple leadership move is to require a “minimum viable model” decision in every POC. Not after. At the start.
Leaders often assume AI scale is like software scale: add more users, add more cloud spend, and it mostly works.
Agentic scale is different because the constraint is not only money. It is also capacity.
If your organisation becomes dependent on AI workflows, and your usage is constrained by provider limits, congestion, pricing changes, or internal caps, your operations can stall.
Tokens give you a practical way to manage this because tokens are measurable. They are the closest thing we have to a universal unit across model usage.
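To make "measurable" concrete, here is a minimal sketch of a token usage ledger in Python. It assumes you can capture input and output token counts for each model call (providers typically report these alongside each response). The `UsageRecord` type, the `tokens_by_workflow` helper, the workflow names, and the numbers are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One model call. All field names here are illustrative assumptions."""
    workflow: str        # hypothetical workflow label, e.g. "invoice-triage"
    input_tokens: int    # tokens sent to the model
    output_tokens: int   # tokens the model generated


def tokens_by_workflow(records):
    """Sum input + output tokens for each workflow."""
    totals = defaultdict(int)
    for record in records:
        totals[record.workflow] += record.input_tokens + record.output_tokens
    return dict(totals)


# Made-up sample data, not real measurements.
records = [
    UsageRecord("invoice-triage", 1_200, 300),
    UsageRecord("invoice-triage", 900, 250),
    UsageRecord("contract-review", 5_000, 800),
]
totals = tokens_by_workflow(records)
```

Even a ledger this simple lets you see which workflow is consuming the meter, which is the starting point for any budget conversation.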
Dion Wiggins put it bluntly:
"Tokens are the unit that matters. They collapse a messy stack (model choice, context length, prompt design, infra latency, GPU utilisation) into a measurable financial signal. If you cannot measure tokens, you cannot govern AI."
Once you accept that, a lot of decisions get easier.
You stop asking “which model is coolest?” and start asking:
Most POCs fail in a predictable way.
They prove the workflow can work. They do not prove it can run sustainably.
If you want to avoid the “great demo, impossible rollout” trap, your POC must answer three questions:
This is not about perfect accuracy. It is about giving leaders a handle on unit economics before you scale.
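One way to get that handle is a back-of-envelope calculation. The Python sketch below assumes an agentic workflow makes several model calls per run and that your provider charges separately for input and output tokens, priced per million tokens. Every number, along with the function names `cost_per_call`, `monthly_cost`, and `monthly_tokens`, is a placeholder assumption, not a real price or a measured workload.

```python
def cost_per_call(input_tokens, output_tokens,
                  price_in_per_million, price_out_per_million):
    """Cost of one model call, in whatever currency the prices use."""
    return (input_tokens * price_in_per_million
            + output_tokens * price_out_per_million) / 1_000_000


def monthly_cost(calls_per_run, runs_per_day, days_per_month, per_call_cost):
    """Scale a per-call cost up to a monthly figure."""
    return calls_per_run * runs_per_day * days_per_month * per_call_cost


def monthly_tokens(calls_per_run, runs_per_day, days_per_month,
                   tokens_per_call):
    """Total tokens per month: the capacity side of the same estimate."""
    return calls_per_run * runs_per_day * days_per_month * tokens_per_call


# Placeholder assumptions: replace with numbers measured in your POC.
per_call = cost_per_call(
    input_tokens=4_000, output_tokens=1_000,
    price_in_per_million=2.0, price_out_per_million=8.0,
)
per_month = monthly_cost(
    calls_per_run=6, runs_per_day=500, days_per_month=22,
    per_call_cost=per_call,
)
tokens_month = monthly_tokens(
    calls_per_run=6, runs_per_day=500, days_per_month=22,
    tokens_per_call=4_000 + 1_000,
)
```

With these placeholder inputs, one call costs about 0.016, a month of runs costs about 1,056, and the workflow consumes 330 million tokens a month. The point is not the specific figures. It is that a POC can produce the inputs, so leaders can see both the spend and the capacity the workflow will need before it scales.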
Gaurav Chauhan captures the real-world use case here:
"This is one of the most common questions I’m asked — especially when teams are preparing budgets or sizing infrastructure for LLM use. Here's a practical method I use that may help others too: ... This method is great for early-stage planning and stakeholder discussions."
That is the standard you want.
Early-stage planning. Stakeholder discussions. Enough clarity to make a decision, and enough measurement to learn.
Add one page to your POC template:
If a team cannot fill this in, the POC is not ready to graduate.
This is the part most leaders underestimate.
A few internal copilots are manageable, even with sloppy governance.
Thousands of agents are not.
The risk is not only spend. It is operational continuity.
At scale, small inefficiencies compound:
And then you hit the wall.
Not because the AI “stops working”.
Because the organisation hits a token cap, a budget cap, or a provider constraint, and the business realises too late that it built a critical workflow on a finite meter.
Brijesh Akbari’s phrasing is uncomfortably accurate:
"Your AI token bill doesn’t grow slowly. It explodes quietly… and one day you notice the invoice. The quickest way to reduce it isn’t “switch models”. It’s fixing the waste... If you want AI in production, token economics is part of architecture."
That last line is the leadership takeaway.
Token economics is part of architecture.
Not a nice-to-have.
Not a Finance clean-up exercise later.
Architecture.
IT and engineering will always be central to delivery. But ownership of the operating limit should sit with Finance, supported by FinOps (financial operations) practices.
Why?
Because token governance looks like every other scarce resource governance problem:
Finance is structurally designed for this.
If you keep it purely in IT, it will be managed like an infrastructure line item.
If you move it into Finance, it will be managed like an operating system for the business.
The best organisations will treat tokens like a utility budget with business rules:
That is not bureaucracy. That is how you keep production stable while still moving fast.
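As an illustration of what "a utility budget with business rules" could look like in code, here is a minimal Python sketch. The specific rules are assumptions made for the example: a monthly token limit per workflow, an alert once usage reaches a set share of that limit, and a hard block on any request that would exceed it. The `TokenBudget` class and its fields are hypothetical, not part of any real library.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical per-workflow token budget with two simple rules."""
    monthly_limit: int        # hard cap on tokens for the month
    alert_ratio: float = 0.8  # assumed alert threshold (share of the cap)
    used: int = 0             # tokens consumed so far this month

    def request(self, tokens: int) -> str:
        """Try to spend `tokens`; return "ok", "alert", or "blocked"."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        # Rule 1: never exceed the hard cap. Blocked requests spend nothing.
        if self.used + tokens > self.monthly_limit:
            return "blocked"
        self.used += tokens
        # Rule 2: warn the budget owner once usage reaches the threshold.
        if self.used >= self.alert_ratio * self.monthly_limit:
            return "alert"
        return "ok"


budget = TokenBudget(monthly_limit=1_000)
first = budget.request(500)   # well under the threshold
second = budget.request(400)  # takes usage to 900, past the 80% threshold
third = budget.request(300)   # would take usage to 1,200, over the cap
```

Here the three requests return "ok", "alert", and "blocked" in turn, and the blocked request leaves usage unchanged at 900. A real implementation would also need resets at the start of each period, priority rules for critical workflows, and an owner who receives the alert, but the basic shape stays the same: a measured meter, a threshold, and a clear rule for what happens at the limit.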
If you want something you can apply immediately, use this.
If you are a leader reading this, and you have even one agentic workflow in flight, here is a simple 24-hour plan.
No drama. No panic. Just basic operational discipline.
If you do one thing this week, make tokens visible.
Make them measurable. Make them budgeted. Make them owned.
Then ask yourself two questions:
You do not need perfection. You need a model you can run, measure, and improve.