The models under Copilot just changed. Nobody signed anything.
At Build 2026 in San Francisco on 2 and 3 June, Microsoft launched seven in-house models under the MAI family name and confirmed they are being substituted into GitHub Copilot and the Microsoft 365 stack (Microsoft AI). For the people who govern, budget for and depend on that stack, the interesting part is not the benchmark slide. It is that the vendor supplying most Australian enterprises' productivity layer is changing what sits underneath it, and answering procurement questions nobody had formally asked yet.
What happened
The Microsoft AI Superintelligence Team, led by Mustafa Suleyman, shipped seven models trained entirely in-house: MAI-Thinking-1, MAI-Code-1-Flash, MAI-Image-2.5, MAI-Image-2.5-Flash, MAI-Transcribe-1.5, MAI-Voice-2 and MAI-Voice-2-Flash (Microsoft AI). The launch post stresses that none of the seven were distilled from third-party models: "we don't distill from other labs and we don't rely on opaque data". The training corpus is described as commercially licensed, with "clean, enterprise-grade data lineage" as the headline selling point.
The pair that matters most for daily work is the reasoning and coding duo. MAI-Thinking-1 is a medium-weight reasoning model with 35 billion active parameters and a 256,000-token context window, per Build coverage (Enterprise DNA). Microsoft says it matches leading models on key software engineering benchmarks and was "preferred to Sonnet 4.6 in our blind human side-by-side evaluations". MAI-Code-1-Flash has 5 billion active parameters, scores a reported 51 per cent on SWE-Bench Pro, is positioned as comparable to Anthropic's Haiku but cheaper, and is already wired into GitHub Copilot and VS Code (Windows Forum).
The rest of the family lands across the productivity stack. MAI-Image-2.5 handles text-to-image and image editing, is live in PowerPoint, rolling out on OneDrive and landing on Azure AI Foundry, with Microsoft claiming it surpasses Nano Banana Pro's Arena score (Microsoft AI). MAI-Transcribe-1.5 claims state-of-the-art transcription accuracy with domain-specific terminology across 43 languages at five times the speed of competing models. MAI-Voice-2 generates natural speech across 15 languages and can adapt to a voice from a short sample (Microsoft).
Distribution is unusually open for Microsoft. The models ship on Azure AI Foundry plus the third-party platforms OpenRouter, Fireworks AI and Baseten, and for the first time developers can tune the model weights themselves (Microsoft AI). The enterprise version of that capability is Frontier Tuning, which applies reinforcement learning within compliance boundaries so an organisation can shape model behaviour on proprietary workflows without data leaving its environment. Microsoft's worked example: an MAI model tuned for Excel "matches GPT 5.4 while being up to 10x more efficient".
Three more facts complete the picture. GitHub Copilot moved to usage-based token billing, called AI Credits, on 1 June, the same week as the model swap (Windows Forum). The strategic frame is explicit: CNBC reports that the launch is about lessening Microsoft's reliance on OpenAI and lowering costs for developers. It came two weeks after Google's own model push at I/O. And Microsoft published safety and technical reports alongside the launch (Microsoft AI).
What it actually means
Strip the launch theatre and this is a supply-chain event. Most organisations never chose Microsoft as a model vendor. They chose it as a productivity vendor a decade ago, and the models arrived later inside Copilot. When Microsoft substitutes its own MAI models for OpenAI's underneath that surface, the product name on the invoice does not change, the contract does not change, and in most tenants nobody is asked to approve anything. The engine changes. The badge stays.
That is why the "clean, enterprise-grade data lineage" framing deserves close reading. It is procurement language, not developer language. The claims answer three questions well: where the training data came from (commercially licensed), whether the models inherit behaviour from other labs (zero distillation), and whether documentation exists (published safety and technical reports). They do not answer the questions that matter operationally: which model is serving your tenant on a given day, how substitutions are notified, what evaluation evidence supports parity on your workloads, and what rollback looks like if a swapped model degrades a workflow you depend on.
When a vendor starts answering provenance questions before customers ask them, that tells you where enterprise deals are now being won and lost. Take the volunteered paperwork. Then ask about the gaps.
Frontier Tuning is the second genuine shift. Until now the model under Copilot was a fixed service: you consumed it, you did not shape it. Letting enterprises tune weights on proprietary workflows, with data staying in their environment, converts the model into a configurable asset. That cuts both ways. It unlocks workflow-specific performance, and it moves a slice of model risk onto the customer, because a tuned model's behaviour is now partly your organisation's doing. Governance frameworks built on the assumption that the vendor owns the model need a new row.
The AI Credits change completes the set: a new model family, a new tuning capability and a new pricing model in the same week. Usage-based billing turns Copilot cost into a function of behaviour rather than headcount. Organisations that never instrumented their usage will discover that at invoice time.
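A rough way to see the shift from headcount to behaviour is to price the same users both ways. The sketch below is illustrative only: the seat price and per-million-token rates are placeholder arguments, not Microsoft's published AI Credits pricing, and the `UserUsage` record is a hypothetical shape for whatever usage export your tenant provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserUsage:
    """One user's token consumption for a billing period (hypothetical shape)."""
    user: str
    input_tokens: int
    output_tokens: int


def usage_cost(usage, input_rate_per_million, output_rate_per_million):
    """Cost of one user's tokens under usage-based pricing.

    Both rates are placeholders supplied by the caller, expressed as
    currency units per one million tokens.
    """
    return (
        usage.input_tokens / 1_000_000 * input_rate_per_million
        + usage.output_tokens / 1_000_000 * output_rate_per_million
    )


def compare_billing(usages, seat_price, input_rate_per_million, output_rate_per_million):
    """Return (per_seat_total, usage_based_total) for the same set of users."""
    per_seat_total = seat_price * len(usages)
    usage_total = sum(
        usage_cost(u, input_rate_per_million, output_rate_per_million)
        for u in usages
    )
    return per_seat_total, usage_total
```

With two users, one light and one heavy, the per-seat total depends only on the count of two, while the usage-based total is dominated by the heavy user. That skew is the number worth finding before the invoice does.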
Who should care and why
Governance, risk and compliance (GRC) teams own the sharpest end. A model substitution beneath the Microsoft 365 stack is a third-party risk change that arrives without a procurement trigger. If your AI register entry for Copilot names the underlying model, it is now stale. If it does not, the register is not specific enough to detect this class of change at all.
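The register test described above reduces to a three-way check. This is a minimal sketch, assuming a hypothetical `RegisterEntry` record and that the currently served model name has been confirmed with the vendor by some other means; neither reflects a real Microsoft API or any particular GRC tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegisterEntry:
    """One AI register row (hypothetical shape)."""
    service: str
    recorded_model: Optional[str]  # None means the register never named a model


def check_entry(entry: RegisterEntry, observed_model: str) -> str:
    """Classify a register entry against the model the vendor says is serving.

    Returns "unspecific" if no model was recorded, "stale" if the recorded
    model differs from the observed one, and "current" otherwise.
    """
    if entry.recorded_model is None:
        return "unspecific"
    if entry.recorded_model != observed_model:
        return "stale"
    return "current"
```

An "unspecific" result is arguably the worse finding: a stale entry can be corrected, while an entry with no model named gives nothing to compare against the next time the model changes.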
Managers and budget owners have a date. AI Credits started on 1 June 2026. Per-seat cost assumptions for Copilot no longer hold, and any FY2026-27 budget built on them is wrong on day one.
Human resources (HR) and fraud teams need to look hard at MAI-Voice-2. Voice adaptation from a short sample is now a built-in capability of the mainstream enterprise stack, not a fringe tool. Voice-based identity checks, manager callback conventions and payroll change controls all rest on the assumption that a familiar voice is hard to fake. Revisit that assumption.
Workers compensation (WC) teams will meet MAI-Transcribe-1.5 quickly, because transcription with domain-specific terminology is precisely what case conferences and medical evidence reviews call for. The accuracy figures are vendor claims. Validate them on representative recordings before any transcript informs a liability or incapacity decision, and keep de-identification in front of every workflow that touches claimant material.
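One common way to validate a transcription claim on your own recordings is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the machine transcript into a human-checked reference, divided by the reference length. The sketch below is a plain implementation of that standard metric, not Microsoft's evaluation method. It assumes simple lowercase, whitespace-split tokenisation, which will overcount errors around punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

An aggregate score can hide the errors that matter most. A single misheard clinical term in a five-word phrase is a WER of 0.2, yet it may be the one word a liability decision turns on, so review the term-level errors as well as the rate.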
Engineering leads can assume Copilot defaults will keep moving toward MAI models. Re-run acceptance evaluations on your own repositories rather than assuming parity with whatever you tested last quarter.
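Re-running an acceptance evaluation comes down to comparing per-task outcomes under the old and new model, not just headline pass rates. The sketch below assumes you already have a harness that produces a pass/fail result per task on your own repositories; the dictionaries, function names and the zero-regression threshold are illustrative choices, not part of any Copilot tooling.

```python
def pass_rate(results: dict) -> float:
    """Fraction of tasks that passed; 0.0 for an empty result set."""
    return sum(results.values()) / len(results) if results else 0.0


def find_regressions(baseline: dict, candidate: dict) -> list:
    """Tasks that passed under the baseline model but not under the candidate.

    A task missing from the candidate run is treated as a failure.
    """
    return sorted(
        task
        for task, passed in baseline.items()
        if passed and not candidate.get(task, False)
    )


def accept(baseline: dict, candidate: dict, max_regressions: int = 0) -> bool:
    """Accept the candidate only if regressions stay within the allowed budget."""
    return len(find_regressions(baseline, candidate)) <= max_regressions
```

The per-task comparison matters because two models can post the same pass rate while failing different tasks. Equal scores are not parity if the task that newly fails is one your team depends on.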
The Australian angle
Microsoft 365 and GitHub Copilot sit inside most large Australian enterprises and a large slice of government. That footprint is what makes a quiet substitution underneath them a regulatory event rather than a technology story.
For APRA-regulated entities, CPS 230 requires the management of material service provider arrangements, and entities have maintained registers of those providers since October 2025. A change to the models powering a material productivity service is a provider-side change the standard expects entities to notice, assess and document, even though no new agreement was signed. The assessment does not need to be heavy. It needs to exist.
The Voluntary AI Safety Standard asks procurement teams to interrogate exactly what Microsoft is now volunteering: training data provenance, supply chain transparency and supporting documentation. The zero-distillation and commercially-licensed-data claims slot neatly into those guardrails, which is unlikely to be a coincidence. Treat the alignment as a starting position for due diligence, not a completed one.
Frontier Tuning's keep-data-in-environment design speaks directly to regulated entities that cannot ship workflow data offshore. Before relying on it, confirm where the tuning compute physically runs, where tuned checkpoints are stored, and who can access them. "Data stays in your environment" is a design intent. Residency evidence is a contract schedule.
And the AI Credits switch is a concrete FY2026-27 line item for every Australian IT shop, arriving conveniently at budget-setting season.
Hype check
The benchmark claims are Microsoft's own. "Preferred to Sonnet 4.6 in blind human side-by-side evaluations" is a vendor-run study with no published methodology in the launch post. The 51 per cent SWE-Bench Pro figure comes via Build coverage rather than an independent leaderboard run. The Excel example that "matches GPT 5.4 while being up to 10x more efficient" is one tuned model on one workload, presented as the best case because it is the best case. None of this means the models are weak. It means the evidence is promotional until third parties reproduce it, and the open distribution on OpenRouter, Fireworks AI and Baseten makes reproduction feasible within weeks.
"Clean data lineage" is also narrower than it sounds. It is a claim about training inputs. It says nothing about output accuracy on your documents, your code or your claims correspondence. Lineage reduces legal and provenance risk. It does not reduce the need for evaluation.
The genuinely undersold part is the openness. Shipping the family on third-party platforms with tunable weights gives buyers something they rarely get from Microsoft: an independent venue to test the exact models before, or after, the Copilot wrapper changes around them.
What to do this week
- Ask your Microsoft account team, or check the Microsoft 365 message centre, which Copilot surfaces in your tenant now run MAI models and how future substitutions will be notified. Record the answer in your AI register.
- Pull last quarter's Copilot usage and model it against AI Credits pricing. Put the result in the FY2026-27 budget paper now, while it is still an estimate rather than an overrun.
- Add a model substitution clause to your AI vendor assessment template: notification of model changes, evaluation evidence on representative tasks, and a pin-or-rollback option. Get it in before the next renewal cycle, not after.
Microsoft has made the model beneath the productivity layer a moving part. Well-governed organisations will treat each substitution as a change event with evidence attached. Poorly governed ones will discover the change during an incident review. Update the vendor file first. Then enjoy the new models.
References
- Microsoft AI, Building a hill-climbing machine: Launching seven new MAI models, 2 June 2026
- Microsoft, Microsoft Build 2026: Be yourself at work, 2 June 2026
- CNBC, Microsoft unveils new AI models to lessen reliance on OpenAI and lower costs for developers, 2 June 2026
- Enterprise DNA, Microsoft Launches 7 Homegrown AI Models at Build 2026
- Windows Forum, Microsoft Build 2026: Homegrown AI Models to Power GitHub Copilot
- APRA, Prudential Standard CPS 230 Operational Risk Management
- Australian Government, Voluntary AI Safety Standard
TheAICommand. Intelligence, At Your Command.





