Model Routing Cuts AI Bills. It Also Moves Your Data., practitioner guidance from TheAICommand
AI Strategy

Model Routing Cuts AI Bills. It Also Moves Your Data.

The enterprise AI story has shifted from which model is best to which one you can afford to keep using, and buyers are moving from tokenmaxxing to model routing. The instinct is right. But for Australian regulated work a cheaper model is usually a different provider in a different place, so every routing rule is also a Privacy Act and APRA data-flow decision.

TheAICommand

Quick answer

Enterprises are shifting from tokenmaxxing, defaulting every task to the most powerful model, to model routing, matching the model to the task, as boards crack down on AI bills. For Australian regulated work a cheaper model is usually a different provider in a different place, so every routing rule is also a Privacy Act and APRA data-flow decision.

Most companies still send every AI task to their most expensive model. That habit is ending, and how it ends matters more than the savings.

This week the enterprise AI story stopped being which model is best and became which model you can afford to keep using. CNBC reported on 26 June that buyers are moving away from what the market has been calling "tokenmaxxing", using the most powerful model for everything regardless of cost, toward what it now calls "value-maxxing", matching the spend to the job (OpenAI and Anthropic face new AI reality as users shift from 'tokenmaxxing' to efficiency, CNBC, 26 June 2026). The instinct is right. The way most organisations will act on it is not, because the cheaper model is usually a different model in a different place, and almost nobody is treating that as the decision it is.

What actually happened

The figure framing the shift comes from inside the industry, not from CNBC's own measurement. Glean chief executive Arvind Jain told CNBC that roughly 95 per cent of enterprise AI usage still runs on frontier models, even for simple tasks a cheaper model would handle just as well. Treat it as directional rather than audited, but nobody in the coverage disputes the direction: most organisations are paying frontier rates for work that does not need frontier capability.

Chief financial officers and boards have noticed. In an earlier piece, CNBC described "a new spending discipline" taking hold "as chief financial officers and boards start cracking down on inefficient artificial intelligence spending", and named the fix: model routing, "a tool that matches the job to the model, sending hard problems to the expensive frontier models and easy ones to cheaper, faster alternatives" (Model routing is a fix for AI overspending, CNBC, 5 June 2026).

The discontent is on the record. Palantir's chief executive told CNBC that businesses are "unhappy" with the frontier labs and feel the labs only care about tokenmaxxing (CNBC, 10 June 2026). The expense-management firm Ramp reached a 44 billion dollar valuation in early June on the back of companies trying to rein in AI spend, and an AI price war is now seen as a real threat to the public listings both OpenAI and Anthropic are preparing (CNBC, 4 and 23 June 2026). On CNBC's read, pricing power is shifting from the companies selling premium AI to the companies buying it.

What it actually means

For two years the default was simple: point everything at the most capable model and move on. That was never an engineering decision. It was a habit, formed when AI was new, budgets were loose, and nobody wanted to be caught using the second-best model. The correction now under way is the market admitting that most work does not need the frontier. Summarising an email and running a multi-step legal analysis are not the same job. They should not cost the same or run on the same model.

Picture a claims or compliance team. Classifying an incoming document by type, pulling out a date, drafting a routine acknowledgement: a small, cheap model clears all of it. Weighing conflicting evidence, reasoning through an exclusion, drafting a decision that has to hold up under review: that is frontier work. Routing recognises the difference and stops paying premium rates for the clerical half of the day.

Tokenmaxxing sends everything to one frontier model. Routing matches the model to the task.

That is healthy. But almost all of the coverage frames it purely as a finance story, and that is exactly where it goes wrong for regulated work.

The Australian angle

When you route a task to a cheaper model, you are usually not routing it to a cheaper version of the same model. You are routing it to a different model, from a different provider, in a different place, with a different data-handling posture. CNBC's reporting says it plainly: companies are steering easy, high-volume work to cheaper open-source models out of China or elsewhere.

For an Australian regulated business, that makes every routing rule a data-flow decision. Which model handles a task determines where the data in that task goes, who can see it, whether it is retained, and whether it trains a future model. Under the Privacy Act that is a use and disclosure question under APP 6 and, the moment the cheaper model sits offshore, a cross-border disclosure question under APP 8. For APRA-regulated entities, a material model provider also falls within the information security obligations of CPS 234 and the service provider obligations of CPS 230. A routing layer tuned only for cost will happily send a customer's personal information to whatever model is cheapest this week, and it will never ask whether that model is allowed to see it.

So right-sizing is the right instinct. Doing it blind is the new risk. The fix is a small amount of process applied before the routing decision, not after the incident.

The order matters: attach the data class before you attach the model, then log the pairing.
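
As a rough illustration of that order, here is a minimal Python sketch of a router that filters by data class first, picks on cost second, and logs the pairing. The model names, providers, regions, prices and the allow-list are all invented for the example, and the sketch ignores task difficulty entirely, so treat it as a shape to adapt rather than a design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    region: str
    cost_per_1k_tokens: float  # illustrative number, not a real price


# Hypothetical catalogue: every name, region and price here is made up.
CATALOGUE = [
    Model("frontier-a", "provider-x", "au", 15.0),
    Model("small-a", "provider-x", "au", 1.0),
    Model("small-b", "provider-y", "offshore", 0.2),
]

# Hypothetical allow-list: which models each data class may reach.
ALLOWED = {
    "public": {"frontier-a", "small-a", "small-b"},
    "internal": {"frontier-a", "small-a"},
    "personal": {"frontier-a", "small-a"},
    "sensitive": {"frontier-a"},
}

routing_log = []  # one entry per routing decision


def route(task: str, data_class: str) -> Model:
    """Pick the cheapest model the task's data class is allowed to reach."""
    if data_class not in ALLOWED:
        # No recognised data class means no routing decision at all.
        raise ValueError(f"task {task!r} has no recognised data class")
    candidates = [m for m in CATALOGUE if m.name in ALLOWED[data_class]]
    if not candidates:
        raise LookupError(f"no approved model for data class {data_class!r}")
    chosen = min(candidates, key=lambda m: m.cost_per_1k_tokens)
    # Log the pairing so a change of provider or region is visible later.
    routing_log.append({
        "task": task,
        "data_class": data_class,
        "model": chosen.name,
        "provider": chosen.provider,
        "region": chosen.region,
    })
    return chosen
```

With these made-up values, a public task lands on the cheapest offshore model, while a task carrying personal information lands on the cheapest approved onshore one, and a task with no data class is refused rather than routed.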

Two prompts that do the triage

You do not need a routing product to start. You need your task list and twenty minutes with ChatGPT, Claude or equivalent. The first prompt turns a raw task list into a draft routing table with the data question built in.

Prompt
You are helping me right-size the AI models my team uses. I will paste a
list of our recurring AI tasks. For each task:

1. Classify the data it touches as public, internal, personal or sensitive.
   If the description does not tell you, write UNKNOWN and list the question
   I need to answer.
2. Say whether the task is clerical (classification, extraction, formatting,
   routine drafting) or judgement-heavy (weighing evidence, multi-step
   reasoning, outputs that face review).
3. Recommend a model tier: smallest viable, mid tier, or frontier. Never
   recommend a cheaper tier if the data class is personal or sensitive and
   the cheaper option is not on my approved list.
4. Flag every task where moving to a cheaper model would mean a new provider
   or a new region.

Return one line per task: task, data class, task type, recommended tier,
flags. Do not invent tasks I did not list, and do not soften the flags.

My team: [TEAM_AND_SECTOR]
Our approved models and regions: [APPROVED_MODEL_LIST]
Our tasks: [PASTE_TASK_LIST]

The second prompt is for teams that already have a routing layer or a model allow-list, whether home-grown or built into a platform. It reviews the rules the way an auditor would.

Prompt
Act as a reviewer of my AI model routing rules. I will paste our routing
rules or model allow-list. Test them and report failures only:

1. Does every rule name the data classes it applies to, or does it route on
   cost and complexity alone?
2. Can any rule send personal or sensitive information to a provider or
   region outside this approved set: [APPROVED_PROVIDERS_AND_REGIONS]?
3. Is there a logged record of which model handled which request, and would
   anyone notice if the router changed providers?
4. What happens when the preferred model is unavailable? Flag any automatic
   failover to a model outside the approved set.

For each failure, give me the rule, the risk in one sentence, and the
smallest change that fixes it.

Our rules: [PASTE_ROUTING_RULES_OR_ALLOW_LIST]

Both prompts draft. Neither decides. The data classifications and the final routing calls stay with a person who can be asked to justify them.
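
If your routing rules already live in a structured form, three of the second prompt's checks can also be run mechanically. The sketch below assumes a hypothetical rule format (a dict with `name`, optional `data_classes`, a `target` and an optional `failover`, each destination written as a provider and region pair) and a hypothetical approved set. It does not cover the logging question, which cannot be answered from the rules alone.

```python
# Hypothetical approved provider/region pairs; replace with your own.
APPROVED = {("provider-x", "au")}
PROTECTED = {"personal", "sensitive"}


def audit(rules):
    """Return (rule name, failure) pairs; an empty list means none found."""
    failures = []
    for rule in rules:
        name = rule["name"]
        classes = set(rule.get("data_classes", []))
        if not classes:
            failures.append((name, "routes without naming a data class"))
        # A rule that names no data class is assumed able to carry anything.
        may_carry_protected = not classes or bool(classes & PROTECTED)
        for key in ("target", "failover"):
            dest = rule.get(key)
            if dest is None or not may_carry_protected:
                continue
            if tuple(dest) not in APPROVED:
                failures.append((name, f"{key} {tuple(dest)} is outside the approved set"))
    return failures


# Illustrative rules only; the names and destinations are invented.
EXAMPLE_RULES = [
    {"name": "summaries", "data_classes": ["personal"],
     "target": ("provider-x", "au"), "failover": ("provider-y", "offshore")},
    {"name": "cheap-default", "target": ("provider-y", "offshore")},
    {"name": "public-faq", "data_classes": ["public"],
     "target": ("provider-y", "offshore")},
]
```

On the example rules, `audit(EXAMPLE_RULES)` reports the offshore failover on `summaries` and both problems with `cheap-default`, and leaves `public-faq` alone because it only carries public data. As with the prompts, the output is a draft list of findings for a person to confirm.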

A worked example

Situation: the compliance team lead at [ORGANISATION] pulls last month's AI usage and finds all twelve of the team's recurring AI tasks running on the frontier tier, from classifying incoming correspondence to drafting breach-assessment reasoning. Finance has asked every team to justify its AI spend line by line.

Prompt used: the lead pastes the twelve tasks into the first prompt above, with the team described as [TEAM_AND_SECTOR] and the approved model list taken from the organisation's AI policy.

What came back: a draft table classing eight tasks as clerical and routable to a smaller tier, and four as judgement-heavy frontier work. Two tasks came back marked UNKNOWN on data class because the task descriptions were vague. One line was wrong in a useful way: the model marked summarising incident report extracts as routable to the cheapest available tier, which in this organisation's stack meant an offshore-hosted model.

What the human verified and decided: the lead confirmed the eight clerical calls but overrode the incident-report line, because those extracts contain personal information about identifiable staff, which makes the offshore route a cross-border disclosure question under APP 8 that the team is not set up to clear. That task stayed on the approved onshore tier. The two UNKNOWNs went back to their task owners for answers before any routing change. Ten register lines changed, two did not, and the projected spend on those tasks fell without a single item of personal information moving to a new provider.

The override is the point of the example. The model did the sorting. The human caught the one line where cheap was wrong.

The register line

One line per task in your AI register carries your cost rationale and your governance evidence at once. Each line records:

  • Task: what the AI does for you, in one line
  • Data class: public, internal, personal or sensitive
  • Model and tier: the named model, or the approved tier if a router chooses within it
  • Provider and region: where the task actually runs, not where the contract was signed
  • Why acceptable: one sentence connecting the data class to the model choice
  • Routing: fixed model or automated router, and who approved the allowed routes
  • Review date: when this line gets re-checked, because prices and routes move

If you cannot fill a line, that gap is the finding. A task with no known data class or no known region is exactly the task a cost-only router will move somewhere you would not have chosen.
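
For teams that keep the register in code or a spreadsheet export, the template above maps onto a small record type. This is a sketch under assumptions: the field names are one possible rendering of the seven bullets, and treating an empty value or the word UNKNOWN as a gap is a choice made for the example.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RegisterLine:
    # One field per bullet in the template; the names are illustrative.
    task: Optional[str] = None
    data_class: Optional[str] = None
    model_and_tier: Optional[str] = None
    provider_and_region: Optional[str] = None
    why_acceptable: Optional[str] = None
    routing: Optional[str] = None
    review_date: Optional[str] = None


def gaps(line: RegisterLine) -> list:
    """Return the names of fields that are empty or marked UNKNOWN."""
    missing = []
    for f in fields(line):
        value = getattr(line, f.name)
        if value is None or not value.strip() or value.strip().upper() == "UNKNOWN":
            missing.append(f.name)
    return missing


# A partly filled line, using the worked example's task as sample data.
line = RegisterLine(
    task="Summarise incident report extracts",
    data_class="personal",
    model_and_tier="approved onshore tier",
    routing="fixed model",
)
print(gaps(line))  # ['provider_and_region', 'why_acceptable', 'review_date']
```

Each name that `gaps` returns is a finding in the sense described above: a question someone has to answer before that task is routed anywhere new.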

The hype check

Two things are being oversold. First, "tokenmaxxing is dead" is premature. On the industry estimate CNBC reported, about 95 per cent of enterprise usage still runs on frontier models, so this is a turn in sentiment, not a finished migration. Second, "route everything cheap" is its own trap. A routing layer optimised only for price will quietly degrade quality on the tasks that matter, and quietly move data to wherever is cheapest. The discipline is not to spend less. It is to spend deliberately, task by task, with the data question attached to every routing rule.

One industry estimate reported by CNBC: about 95 per cent of enterprise AI usage still runs on frontier models.

What to do on Monday

  1. Open your AI register, or a blank page if you do not have one, and list every recurring AI task your team ran last month. Pull the list from usage reports or chat histories rather than memory, because the forgotten tasks are usually the ungoverned ones.
  2. Paste the list into the first prompt above in ChatGPT, Claude or equivalent, with your team description and approved model list filled in.
  3. Verify every data class yourself. The model drafts, you decide. Anything marked UNKNOWN goes back to the task owner before any routing change.
  4. Set an allow-list of acceptable models for each data class. Personal and sensitive data does not move to an unvetted or offshore model to save a few cents, no matter what the triage suggests.
  5. If a platform routes models for you, and Microsoft, OpenAI and others now build routing in, ask your vendor or platform team two questions: which providers and regions can it route to, and where is the log of which model handled which request. Then run the second prompt over whatever rules they show you.
  6. Write one register line per task using the template above, including the review date.
  7. Book a re-check for the first Monday of next month. Model prices and routing defaults are moving monthly, and a routing table with no review date is a snapshot, not a control.
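
The re-check in step 7 is easy to automate. The sketch below uses only Python's standard `datetime` module to find the first Monday of the following month and to list register lines whose review date has passed. The shape of the `review_dates` mapping (task name to date) is an assumption made for the example.

```python
from datetime import date, timedelta


def first_monday_of_next_month(today: date) -> date:
    """Return the first Monday of the month after `today`."""
    # Roll over to January of the next year when today is in December.
    year = today.year + (1 if today.month == 12 else 0)
    month = 1 if today.month == 12 else today.month + 1
    first = date(year, month, 1)
    # date.weekday() is 0 for Monday; step forward only if the 1st is not one.
    return first + timedelta(days=(7 - first.weekday()) % 7)


def overdue(review_dates: dict, today: date) -> list:
    """Return the tasks whose review date is earlier than `today`, sorted."""
    return sorted(task for task, due in review_dates.items() if due < today)


print(first_monday_of_next_month(date(2026, 6, 26)))  # 2026-07-06
```

Running `overdue` over the register each month turns the review date from a note into a check: any task it returns is a line nobody has looked at since prices or routes last moved.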

The market is right that you are probably overspending. It is just not telling you the other half. The cheapest model is not free if it is the wrong place for your data.

TheAICommand. Intelligence, At Your Command.

Frequently asked questions

What is model routing, and why is it in the news?
Model routing sends each task to the model that fits it, hard problems to expensive frontier models and easy, high-volume work to cheaper, faster alternatives. CNBC reported in June 2026 that chief financial officers and boards are cracking down on AI bills, so buyers are moving away from defaulting every task to the most powerful model, a habit the market nicknamed tokenmaxxing.
Is it true that 95 per cent of enterprise AI still runs on frontier models?
That figure is an industry estimate reported by CNBC and attributed to Glean chief executive Arvind Jain, not a number CNBC measured itself. Treat it as directional. Most enterprise usage still defaults to frontier models even for simple tasks, which is why finance teams see room to cut, and why the shift to routing is a turn in sentiment rather than a finished migration.
Why is model routing a governance issue and not just a cost one?
Because a cheaper model is usually a different model, from a different provider, in a different place, with different data handling. CNBC's reporting notes companies steering high-volume work to cheaper open-source models out of China or elsewhere. The model behind a task decides where the data goes, who can see it, and whether it is retained or used for training.
Which Australian rules apply to where a task is routed?
Under the Privacy Act, which model handles personal information is a use and disclosure question under APP 6, and once the model sits offshore it becomes a cross-border disclosure question under APP 8. For APRA-regulated entities, a material model provider also sits inside CPS 234 information security and CPS 230 service provider obligations. A routing layer tuned only for price satisfies none of that.
What should a team do first?
List your recurring AI tasks and attach a data class to each before you attach a model. Set an allow-list of acceptable models per class, use the smallest, cheapest model that clears each task reliably, and govern any automated routing layer like a control. Keep one register line per task recording the task, model, data class and why that pairing is acceptable.

Tags

AI strategy, Model routing, Enterprise AI, AI cost, OpenAI, Anthropic, AI governance, Privacy Act