The model doing your work changed. You probably missed it.
On 30 June, Anthropic released Claude Sonnet 5 and made it the default. Within a day it was the default model in Claude Code (version 2.1.197, shipped 1 July) and the default for Free and Pro users on Claude.ai. Both moves are documented in Anthropic's announcement and the Claude Code changelog. If you use either surface and did not go out of your way to pin a specific model, you are now running Sonnet 5. No prompt from you, no migration, no sign-off. The engine was swapped while the car was running.
What Anthropic actually shipped
The announcement calls Sonnet 5 "the most agentic Sonnet model yet": it can make plans, drive tools like browsers and terminals, and run multi-step tasks on its own. On the SWE-bench Pro coding benchmark it scores 63.2 per cent, against 69.2 per cent for the far pricier Opus 4.8, so the company positions it as close to Opus at a fraction of the cost. Anthropic also reports an overall lower rate of undesirable behaviours than its predecessor, Sonnet 4.6.
Keep the framing precise, because precision is what makes the claim useful. Sonnet 5 is the new default on two surfaces, Claude Code and Claude.ai Free and Pro. It is not the default on the API, and Anthropic's own model documentation still points enterprise and agentic API work at Opus 4.8, with Fable 5 remaining the top of the range. New default, not new frontier.
Three practical facts matter more than the leaderboard. Sonnet 5 carries a native one million token context window on the Claude API, Amazon Bedrock, Google Cloud and Microsoft Foundry, with that window on by default and billed at standard rates. Launch pricing is US$2 per million input tokens and US$10 per million output tokens through 31 August, after which it settles at US$3 and US$15. And it is available almost everywhere at once: Claude.ai, Claude Code, the API as claude-sonnet-5, Max, Team and Enterprise plans, and the major clouds, with Google Vertex listed as coming soon.
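The pricing arithmetic is worth running on your own volumes. The following is a minimal Python sketch, assuming only the per-million-token prices quoted above. The token volumes are hypothetical, and the function ignores any discounts, caching or tiering a real bill may include.

```python
from decimal import Decimal

# US$ per million tokens, as quoted in the announcement.
LAUNCH = {"input": Decimal("2"), "output": Decimal("10")}    # through 31 August
STANDARD = {"input": Decimal("3"), "output": Decimal("15")}  # after 31 August

MILLION = Decimal(1_000_000)


def monthly_cost(input_tokens: int, output_tokens: int, prices: dict) -> Decimal:
    """Return the US$ cost of a month's token usage at the given prices."""
    return (Decimal(input_tokens) / MILLION * prices["input"]
            + Decimal(output_tokens) / MILLION * prices["output"])


# Hypothetical monthly volumes: 40 million input tokens, 5 million output tokens.
launch = monthly_cost(40_000_000, 5_000_000, LAUNCH)
standard = monthly_cost(40_000_000, 5_000_000, STANDARD)
print(f"launch: US${launch}  standard: US${standard}  increase: US${standard - launch}")
```

Because both rates rise by the same proportion, any mix of input and output tokens costs 50 per cent more once launch pricing ends.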

Why the word "default" is the story
Three things moved at once, and the third is the one to watch. Sonnet 5 is more capable than Sonnet 4.6. It is cheaper, at least until the end of August. And its context window is now large enough to hold a small book. Any one of those is a headline. Together, and applied silently to the default, they change the system that a lot of people quietly rely on to draft, summarise and decide.
The word that matters is default. Almost nobody chooses a model deliberately. They open the tool and use whatever it opens with. So when the default changes, the behaviour of a workflow you validated last month changes with it, even though nothing in your own setup did. A benchmark score is not the point. The point is that a component of a process you trusted was replaced without a release note landing on your desk. This is the second time in a fortnight an Anthropic model has changed under its users, after Fable 5 returned from a suspension on 1 July, and it will not be the last. Model change is becoming routine, which is exactly why noticing it cannot be.
For regulated Australian work, this is not pedantry
If an AI model sits inside a workflow that supports a customer outcome, a claim, a compliance report or a determination, then swapping that model is exactly the kind of change your frameworks already tell you to manage. APRA's CPS 230 treats material changes to critical operations and their service providers as change management, not a background update. The Voluntary AI Safety Standard and APRA CPS 234 point the same way for third-party AI: know what you are running, and notice when it moves.
Picture the ordinary cases. A claims summary a case manager leans on before a decision. A monitoring narrative a compliance team files. A first-draft determination that a delegate reviews and signs. If a model quietly changed how any of those are produced, the control you approved is now running on something you have not tested, and the person relying on the output has no way to know. That is not a hypothetical risk to raise at the next committee. It is a live gap that opened on 30 June, in tools your people already had open.
The one million token window deserves its own line. A window that big is not a feature to celebrate, it is a bigger opening. It means far more can be pasted, uploaded or ingested into a single request by default, including things that should never be there. What enters a context window is a decision under the Australian Privacy Principles and, for anything that could become evidence, a records decision. The bigger window is capacity, not permission. A model that will happily read a million tokens will happily read the wrong million.

Map your exposure before you tune anything
The first move is not technical, it is an inventory. You cannot manage a default you have not located. Paste this into Claude, ChatGPT or an equivalent assistant and work through it with your own facts.
The output is not the deliverable. The list of unanswered questions it produces is, because each one marks a gap between what your organisation runs and what it knows it runs.
A worked example: the eval re-run
Here is what the discipline looks like end to end, with every identifying detail replaced by a placeholder.
The situation. A risk analyst at [ORGANISATION] uses Claude.ai Pro to produce first-draft monthly control-monitoring summaries from de-identified exception reports. The workflow was reviewed and accepted in May, on the previous default. On 1 July the default became Sonnet 5, so the analyst treats the workflow as unvalidated until re-tested.
The prompt. The analyst gathers six past exception reports, already de-identified, along with the accepted summaries the earlier model produced, and runs this comparison prompt:
What came back. Four of the six fresh outputs matched the accepted versions in substance. Two did not. In one, the new model compressed the exception commentary and dropped a caveat about incomplete data. In the other, it added a confident sentence describing a continuing downward trend in control failures that the input did not support.
What the human verified and decided. The analyst checked both flags against the source register. The dropped caveat mattered, and the trend claim was unsupported, so the summary template gained an explicit instruction to retain caveats and to make no trend claims beyond the reporting period. The team pinned the model version for that workflow, logged the change with the re-test evidence, and only then let the new default carry the work. The model drafted throughout. The analyst decided throughout. That boundary did not move.
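One way to pre-screen a re-run like this, before reading every output, is a plain sentence-level diff. The sketch below is an illustration under stated assumptions, not the comparison prompt the analyst used: it relies on Python's standard difflib module, a naive sentence splitter, and invented sample text. It can surface a dropped caveat or an added claim, but it cannot judge whether either matters. That stays with the human.

```python
import difflib
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def flag_differences(accepted: str, fresh: str) -> dict[str, list[str]]:
    """List sentences dropped from, or added to, the fresh output."""
    old, new = split_sentences(accepted), split_sentences(fresh)
    dropped, added = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            dropped.extend(old[i1:i2])   # present before, missing now
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])     # new claims to check against source
    return {"dropped": dropped, "added": added}


# Invented sample text standing in for an accepted and a fresh summary.
accepted = ("Twelve exceptions were logged. Data for week three is incomplete. "
            "No limits were breached.")
fresh = ("Twelve exceptions were logged. No limits were breached. "
         "Control failures continue to trend down.")

flags = flag_differences(accepted, fresh)
for kind, sentences in flags.items():
    for sentence in sentences:
        print(f"{kind}: {sentence}")
```

An exact-match diff will also flag harmless rewording, so treat its output as a reading list rather than a verdict.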
Do this Monday
- List the surfaces. Open a blank page and write down every place your team touches Claude Code or Claude.ai on Free or Pro. That is where the new default is already live. Add any API, Bedrock, Vertex or Foundry deployment as well, so you can confirm which model each one actually names.
- Run the exposure prompt. Paste the first prompt above into your assistant, fill in your tools, and keep the list of unanswered questions it surfaces.
- Pin where it matters. For any validated workflow, ask whoever owns the configuration to pin an explicit model version rather than floating on the default alias. If it cannot be pinned, record that as a limitation.
- Assemble your eval set. Pick five to ten real, de-identified tasks that represent the work you actually rely on, with previously accepted outputs where you have them.
- Run the comparison. Use the second prompt above, read every flag, and verify each one against source material yourself. Anthropic's benchmark is not your benchmark.
- Write the data-class rule. One line per workflow: what may and may not enter a context window that now accepts a million tokens.
- Log the change. Complete the register entry below, then diarise a monthly check, because this default will move again.
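The data-class rule in the list above can be as small as a lookup table. Here is a minimal sketch, assuming a deny-by-default policy. The workflow names and data classes are hypothetical placeholders, not a recommended taxonomy.

```python
# Hypothetical workflows and data classes; substitute your own.
# One entry per workflow: the only data classes allowed into its context window.
ALLOWED = {
    "control-monitoring-summary": {"de-identified exception report"},
    "code-review": {"internal source code"},
}


def may_enter(workflow: str, data_class: str) -> bool:
    """Deny by default: unlisted workflows and unlisted classes are refused."""
    return data_class in ALLOWED.get(workflow, set())


print(may_enter("control-monitoring-summary", "de-identified exception report"))
print(may_enter("control-monitoring-summary", "customer personal information"))
print(may_enter("unregistered-workflow", "internal source code"))
```

Deny-by-default is the design choice that matters here: a workflow nobody has written a rule for gets no data, rather than everything the window will hold.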

The register entry, ready to copy
A model swap inside a material workflow deserves a written trace. This is the minimum entry for your AI register or change log:
- Date the default changed, and the date your team noticed it
- Surface affected: Claude Code, Claude.ai plan, or cloud deployment
- Previous model and new model, with versions
- Pinned or floating after this entry, and who owns the setting
- Workflows affected, and the data class of each
- Eval re-run: tasks used, date, result, and any template changes made
- Residual gaps and who accepted them
- Name and role of the person signing the entry
If you cannot complete the first line, that is the finding. Your organisation cannot currently say when its defaults change.
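For teams that keep their register in code or a spreadsheet export, the entry above maps onto a simple record. This is a sketch only: the field names are my own shorthand for the list items, and the sample values are illustrative.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RegisterEntry:
    """One model-change entry; field names are illustrative shorthand."""
    date_default_changed: Optional[str] = None
    date_team_noticed: Optional[str] = None
    surface_affected: Optional[str] = None
    previous_model: Optional[str] = None
    new_model: Optional[str] = None
    pinned_or_floating: Optional[str] = None
    setting_owner: Optional[str] = None
    workflows_and_data_classes: Optional[str] = None
    eval_rerun: Optional[str] = None
    residual_gaps: Optional[str] = None
    signed_by: Optional[str] = None


def missing_fields(entry: RegisterEntry) -> list[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Illustrative partial entry: the team knows what changed, but little else yet.
entry = RegisterEntry(
    date_default_changed="30 June",
    surface_affected="Claude.ai Pro",
    previous_model="Sonnet 4.6",
    new_model="Sonnet 5",
)
print(missing_fields(entry))
```

The list of missing fields is the useful output: each name is a question someone in the organisation has not yet answered.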
Hype check
Be clear about what is proven and what is marketing. "Close to Opus" is a benchmark claim, 63.2 versus 69.2 per cent on one coding test, not a promise about your task. The promotional price is real and genuinely lowers the cost of capable work, but it is temporary and reverts on 1 September. "Most agentic Sonnet yet" means it will attempt more on its own, which is useful, and is also precisely why an unmanaged default swap deserves a second look rather than a shrug. This is a good model, and for most work it is an upgrade. The story is not the leaderboard. The story is that it became the default under you.
The point to keep
None of this is a reason to avoid Sonnet 5. The discipline is not suspicion, it is noticing. The quiet swap of a default is still a change, and in regulated work changes get managed, not absorbed. The teams that treat 30 June as a change event will spend an hour this week and know exactly what they are running. Everyone else will find out later, usually at an inconvenient time.
TheAICommand. Intelligence, At Your Command.



