What are Microsoft's seven MAI models and where are they being used?

Microsoft launched MAI-Thinking-1, MAI-Code-1-Flash, MAI-Image-2.5, MAI-Image-2.5-Flash, MAI-Transcribe-1.5, MAI-Voice-2 and MAI-Voice-2-Flash at Build 2026. They are being substituted into GitHub Copilot and the Microsoft 365 stack, landing across coding, productivity, image, transcription and voice surfaces.

Why does the MAI model swap matter for GRC and vendor risk teams?

The substitution is a third-party risk change that arrives without a procurement trigger. The invoice, product name and contract do not change, but the model underneath does. AI register entries naming the old model are now stale, and registers must be specific enough to detect this class of change.

What do Microsoft's clean data lineage claims actually answer?

They answer where training data came from (commercially licensed), whether models inherit behaviour from other labs (zero distillation), and whether documentation exists. They do not say which model serves your tenant on a given day, how substitutions are notified, what parity evidence supports your workloads, or what rollback looks like.

How does the AI Credits billing change affect Copilot costs?

GitHub Copilot moved to usage-based token billing called AI Credits on 1 June 2026, the same week as the model swap. Cost becomes a function of behaviour rather than headcount, so per-seat assumptions no longer hold and any FY2026-27 budget built on them is wrong on day one.

What should organisations do this week about the MAI substitution?

Ask the Microsoft account team or message centre which Copilot surfaces now run MAI models and how future swaps are notified, then record it in the AI register. Model last quarter's usage against AI Credits pricing for the budget. Add a model substitution clause with notification, evaluation evidence and rollback before renewal.

Microsoft's MAI Models Under Copilot

Microsoft launched seven in-house MAI models at Build 2026 and began substituting them under GitHub Copilot and the Microsoft 365 stack. For practitioners, the story is procurement, not benchmarks. The models under Copilot just changed. Nobody signed anything.

At Build 2026 in San Francisco on 2 and 3 June, Microsoft launched seven in-house models under the MAI family name and confirmed they are being substituted into GitHub Copilot and the Microsoft 365 stack (Microsoft AI). For the people who govern, budget for and depend on that stack, the interesting part is not the benchmark slide. It is that the vendor supplying most Australian enterprises' productivity layer is changing what sits underneath it, and answering procurement questions nobody had formally asked yet.

What did Microsoft actually launch?

The Microsoft AI Superintelligence Team, led by Mustafa Suleyman, shipped seven models trained entirely in-house: MAI-Thinking-1, MAI-Code-1-Flash, MAI-Image-2.5, MAI-Image-2.5-Flash, MAI-Transcribe-1.5, MAI-Voice-2 and MAI-Voice-2-Flash (Microsoft AI). The launch post stresses that none of the seven were distilled from third-party models: "we don't distill from other labs and we don't rely on opaque data". The training corpus is described as commercially licensed, with "clean, enterprise-grade data lineage" as the headline selling point.

For daily work, the reasoning and coding pair matters most. MAI-Thinking-1 is a medium-weight reasoning model with 35 billion active parameters and a 256,000-token context window, per Build coverage (Enterprise DNA). Microsoft says it matches leading models on key software engineering benchmarks and was "preferred to Sonnet 4.6 in our blind human side-by-side evaluations". MAI-Code-1-Flash runs 5 billion active parameters and scores a reported 51 per cent on SWE-Bench Pro. It is positioned as comparable to Haiku but cheaper, and it is already wired into GitHub Copilot and VS Code (Windows Forum).

The rest of the family lands across the productivity stack. MAI-Image-2.5 handles text-to-image and image editing, is live in PowerPoint, rolling out on OneDrive and landing on Azure AI Foundry, with Microsoft claiming it surpasses Nano Banana Pro's Arena score (Microsoft AI). MAI-Transcribe-1.5 claims state-of-the-art transcription accuracy with domain-specific terminology across 43 languages at five times the speed of competing models. MAI-Voice-2 generates natural speech across 15 languages and can adapt to a voice from a short sample (Microsoft).

Distribution is unusually open for Microsoft. The models ship on Azure AI Foundry plus the third-party platforms OpenRouter, Fireworks AI and Baseten, and for the first time developers can tune the model weights themselves (Microsoft AI). The enterprise version of that capability is Frontier Tuning, which applies reinforcement learning within compliance boundaries so an organisation can shape model behaviour on proprietary workflows without data leaving its environment. Microsoft's worked example: an MAI model tuned for Excel "matches GPT 5.4 while being up to 10x more efficient".

Two more facts complete the picture. GitHub Copilot moved to usage-based token billing, called AI Credits, on 1 June, the same week as the model swap (Windows Forum). And the strategic frame is explicit. CNBC reports that the launch, which came two weeks after Google's own model push at I/O, is about lessening Microsoft's reliance on OpenAI and lowering costs for developers. Microsoft also published safety and technical reports alongside the launch (Microsoft AI).

What it actually means

Strip the launch theatre and this is a supply-chain event. Most organisations never chose Microsoft as a model vendor. They chose it as a productivity vendor a decade ago, and the models arrived later inside Copilot. When Microsoft substitutes its own MAI models for OpenAI's underneath that surface, the product name on the invoice does not change, the contract does not change, and in most tenants nobody is asked to approve anything. The engine changes. The badge stays.

Figure: The badge stays. The engine changes. (Diagram: the Copilot product layer stays constant while the model layer beneath it is swapped from OpenAI to MAI.)

That is why the "clean, enterprise-grade data lineage" framing deserves close reading. It is procurement language, not developer language. The claims answer three questions well: where the training data came from (commercially licensed), whether the models inherit behaviour from other labs (zero distillation), and whether documentation exists (published safety and technical reports). They do not answer the questions that matter operationally: which model is serving your tenant on a given day, how substitutions are notified, what evaluation evidence supports parity on your workloads, and what rollback looks like if a swapped model degrades a workflow you depend on.

When a vendor starts answering provenance questions before customers ask them, that tells you where enterprise deals are now being won and lost. Take the volunteered paperwork. Then ask about the gaps.

Frontier Tuning is the second genuine shift. Until now the model under Copilot was a fixed service: you consumed it, you did not shape it. Letting enterprises tune weights on proprietary workflows, with data staying in their environment, converts the model into a configurable asset. That cuts both ways. It unlocks workflow-specific performance, and it moves a slice of model risk onto the customer, because a tuned model's behaviour is now partly your organisation's doing. Governance frameworks built on the assumption that the vendor owns the model need a new row.

The AI Credits change completes the set: a new model family, a new tuning capability and a new pricing model in the same week. Usage-based billing turns Copilot cost into a function of behaviour rather than headcount. Organisations that never instrumented their usage will discover that at invoice time.

Who does the MAI swap affect, and why?

Governance, risk and compliance (GRC) teams own the sharpest end. A model substitution beneath the Microsoft 365 stack is a third-party risk change that arrives without a procurement trigger. If your AI register entry for Copilot names the underlying model, it is now stale. If it does not, the register is not specific enough to detect this class of change at all.
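Here is a minimal sketch of a register row that records the underlying model, plus a check against whatever model the account team or message centre says is now serving that surface. The field names, the `check_entry` helper and the placeholder string "previous-model" are hypothetical. No standard register schema is implied.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RegisterEntry:
    """One AI register row. Field names are illustrative, not a standard schema."""
    service: str                     # product name on the invoice, e.g. "GitHub Copilot"
    surface: str                     # where it is used, e.g. "VS Code completions"
    underlying_model: Optional[str]  # None means the register never recorded the model
    last_verified: date              # when someone last confirmed the model with the vendor


def check_entry(entry: RegisterEntry, observed_model: str) -> str:
    """Compare a register row with the model the vendor says now serves that surface."""
    if entry.underlying_model is None:
        return "not specific enough: a model substitution cannot be detected"
    if entry.underlying_model != observed_model:
        return f"stale: recorded {entry.underlying_model}, now {observed_model}"
    return "current"


# Hypothetical row; "previous-model" stands in for whatever the register recorded.
row = RegisterEntry("GitHub Copilot", "VS Code completions", "previous-model", date(2026, 3, 31))
print(check_entry(row, "MAI-Code-1-Flash"))
```

The third outcome matters as much as the second. A row with no model recorded never reports "stale", so it cannot be used to detect this class of change.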

Managers and budget owners have a date. AI Credits started on 1 June 2026. Per-seat cost assumptions for Copilot no longer hold, and any FY2026-27 budget built on them is wrong on day one.
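The arithmetic behind "a function of behaviour rather than headcount" fits in a few lines. This sketch assumes a single flat rate per million tokens as a stand-in. The real AI Credits rate card is not reproduced here, so `rate_per_million`, the seat price and the usage figures are all made-up placeholders.

```python
def per_seat_cost(seats: int, price_per_seat_month: float, months: int) -> float:
    """The old mental model: cost scales with headcount only."""
    return seats * price_per_seat_month * months


def usage_costs(tokens_by_user: dict[str, int], rate_per_million: float) -> dict[str, float]:
    """Usage-based model: each user's cost scales with the tokens they consume.
    rate_per_million is a placeholder, not a published AI Credits price."""
    return {user: tokens / 1_000_000 * rate_per_million for user, tokens in tokens_by_user.items()}


def top_share(costs: dict[str, float], fraction: float = 0.1) -> float:
    """Share of total spend driven by the heaviest `fraction` of users."""
    ranked = sorted(costs.values(), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical quarter: one heavy user and nine light users.
quarter = {"heavy-user": 50_000_000, **{f"user-{i}": 2_000_000 for i in range(9)}}
costs = usage_costs(quarter, rate_per_million=10.0)
print(round(sum(costs.values()), 2), round(top_share(costs), 3))
```

With these invented numbers, one user drives about 74 per cent of the quarter's spend. A per-seat line such as `per_seat_cost(10, 20.0, 3)` cannot show that distribution.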

Human resources (HR) and fraud teams need to look hard at MAI-Voice-2. Voice adaptation from a short sample is now a built-in capability of the mainstream enterprise stack, not a fringe tool. Voice-based identity checks, manager callback conventions and payroll change controls all rest on the assumption that a familiar voice is hard to fake. Revisit that assumption.
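One way to revisit the assumption is to write the control so that voice recognition never counts as evidence. The sketch below is a hypothetical payroll rule for bank-detail changes. It requires a callback to the number already on file and sign-off from a second, independent approver. The record fields and the rule itself are illustrative, not a prescribed standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BankDetailChange:
    employee_id: str
    requested_by: str
    voice_recognised: bool            # recorded for audit only; never sufficient on its own
    callback_number_dialled: str      # number the officer actually called to confirm
    second_approver: Optional[str]


def can_approve(req: BankDetailChange, number_on_file: str) -> bool:
    """Hypothetical payroll control that does not treat a familiar voice as identity proof."""
    # Confirmation must go to the number already on file, never one supplied in the request.
    called_number_on_file = req.callback_number_dialled == number_on_file
    # A second person, independent of the requester, must sign off.
    independent_approval = req.second_approver is not None and req.second_approver != req.requested_by
    # req.voice_recognised is deliberately left out of the decision.
    return called_number_on_file and independent_approval
```

A convincing voice on a call-in number fails this rule. A request confirmed on the number on file, with a separate approver, passes it.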

Workers compensation (WC) teams will meet MAI-Transcribe-1.5 quickly, because transcription with domain-specific terminology is precisely what case conferences and medical evidence reviews call for. The accuracy figures are vendor claims. Validate them on representative recordings before any transcript informs a liability or incapacity decision, and keep de-identification in front of every workflow that touches claimant material.
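A common way to validate transcription on representative recordings is word error rate (WER). WER is the word-level edit distance between a human reference transcript and the machine output, divided by the reference length. The sketch below pairs it with a crude check for missed domain terms. The sample sentence is synthetic, and the term list is a hypothetical input a reviewer would supply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # d[i][j] is the minimum number of word edits turning ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,                  # deletion
                          d[i][j - 1] + 1,                  # insertion
                          d[i - 1][j - 1] + substitution)   # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


def missed_terms(hypothesis: str, terms: list[str]) -> list[str]:
    """Domain terms that do not appear in the machine transcript (substring match; crude)."""
    text = hypothesis.lower()
    return [t for t in terms if t.lower() not in text]


# Synthetic, de-identified example sentence.
ref = "patient reports lumbar radiculopathy after lifting"
hyp = "patient reports lumbar radical opathy after lifting"
print(word_error_rate(ref, hyp), missed_terms(hyp, ["lumbar", "radiculopathy"]))
```

Run it across a batch of real, de-identified recordings rather than a single sentence. Pay particular attention to the missed-terms list, because one wrong clinical term can matter more than the overall rate.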

Engineering leads can assume Copilot defaults will keep moving toward MAI models. Re-run acceptance evaluations on your own repositories rather than assuming parity with whatever you tested last quarter.
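The sketch below shows a minimal version of a parity check. It assumes each acceptance task from your own repositories yields a simple pass or fail. The `TaskResult` record, the `parity_report` helper and the 2-point tolerance are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str   # one task drawn from your own repositories
    passed: bool   # e.g. the generated change made the task's tests pass


def pass_rate(results: list[TaskResult]) -> float:
    return sum(r.passed for r in results) / len(results)


def parity_report(baseline: list[TaskResult], candidate: list[TaskResult], max_drop: float = 0.02) -> dict:
    """Compare last quarter's run with a run against the model Copilot now uses."""
    if {r.task_id for r in baseline} != {r.task_id for r in candidate}:
        raise ValueError("both runs must cover the same task set")
    base_rate, cand_rate = pass_rate(baseline), pass_rate(candidate)
    previously_passing = {r.task_id for r in baseline if r.passed}
    # Aggregate parity can hide specific tasks that used to pass and now fail.
    regressions = sorted(r.task_id for r in candidate if not r.passed and r.task_id in previously_passing)
    return {
        "baseline": base_rate,
        "candidate": cand_rate,
        "within_tolerance": base_rate - cand_rate <= max_drop,
        "regressions": regressions,
    }
```

Reporting regressions alongside the headline rate matters. Two runs can show identical pass rates while a workflow you depend on has quietly broken.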

The Australian angle

Microsoft 365 and GitHub Copilot sit inside most large Australian enterprises and a large slice of government. That footprint is what makes a quiet substitution underneath them a regulatory event rather than a technology story.

For APRA-regulated entities, CPS 230 requires the management of material service provider arrangements, and entities have maintained registers of those providers since October 2025. A change to the models powering a material productivity service is a provider-side change the standard expects entities to notice, assess and document, even though no new agreement was signed. The assessment does not need to be heavy. It needs to exist.

The Voluntary AI Safety Standard asks procurement teams to interrogate exactly what Microsoft is now volunteering: training data provenance, supply chain transparency and supporting documentation. The zero-distillation and commercially-licensed-data claims slot neatly into those guardrails, which is unlikely to be a coincidence. Treat the alignment as a starting position for due diligence, not a completed one.

Frontier Tuning's keep-data-in-environment design speaks directly to regulated entities that cannot ship workflow data offshore. Before relying on it, confirm where the tuning compute physically runs, where tuned checkpoints are stored, and who can access them. "Data stays in your environment" is a design intent. Residency evidence belongs in a contract schedule.
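The three residency questions can be tracked as a simple gap list. In this sketch the deployment record and the helper are hypothetical, and the region strings are example Azure-style identifiers. The actual answers have to come from Microsoft in writing.

```python
from dataclasses import dataclass, field


@dataclass
class TuningDeployment:
    compute_region: str                 # where the tuning jobs actually run
    checkpoint_region: str              # where tuned checkpoints are stored
    checkpoint_readers: set[str] = field(default_factory=set)  # identities with read access


def residency_gaps(dep: TuningDeployment, allowed_regions: set[str], approved_readers: set[str]) -> list[str]:
    """List every answer that does not yet match the residency position in the contract."""
    gaps = []
    if dep.compute_region not in allowed_regions:
        gaps.append(f"tuning compute runs in {dep.compute_region}")
    if dep.checkpoint_region not in allowed_regions:
        gaps.append(f"tuned checkpoints stored in {dep.checkpoint_region}")
    unapproved = dep.checkpoint_readers - approved_readers
    if unapproved:
        gaps.append("unapproved checkpoint access: " + ", ".join(sorted(unapproved)))
    return gaps
```

An empty list means only that the recorded answers match the intended position. It is not a substitute for the contract schedule.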

And the AI Credits switch is a concrete FY2026-27 line item for every Australian IT shop, arriving, conveniently, in budget-setting season.

Figure: Provenance claims are a starting position, not a completed assessment.

Hype check

The benchmark claims are Microsoft's own. "Preferred to Sonnet 4.6 in blind human side-by-side evaluations" is a vendor-run study with no published methodology in the launch post. The 51 per cent SWE-Bench Pro figure comes via Build coverage rather than an independent leaderboard run. The Excel example that "matches GPT 5.4 while being up to 10x more efficient" is one tuned model on one workload, presented as the best case because it is the best case. None of this means the models are weak. It means the evidence is promotional until third parties reproduce it, and the open distribution on OpenRouter, Fireworks AI and Baseten makes reproduction feasible within weeks.

"Clean data lineage" is also narrower than it sounds. It is a claim about training inputs. It says nothing about output accuracy on your documents, your code or your claims correspondence. Lineage reduces legal and provenance risk. It does not reduce the need for evaluation.

The genuinely undersold part is the openness. Shipping the family on third-party platforms with tunable weights gives buyers something they rarely get from Microsoft: an independent venue to test the exact models before, or after, the Copilot wrapper changes around them.

What should you do this week?

Ask your Microsoft account team, or check the Microsoft 365 message centre, which Copilot surfaces in your tenant now run MAI models and how future substitutions will be notified. Record the answer in your AI register.
Pull last quarter's Copilot usage and model it against AI Credits pricing. Put the result in the FY2026-27 budget paper now, while it is still an estimate rather than an overrun.
Add a model substitution clause to your AI vendor assessment template: notification of model changes, evaluation evidence on representative tasks, and a pin-or-rollback option. Get it in before the next renewal cycle, not after.
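The clause in the last item reduces to three elements. A template can check for them mechanically. This is a minimal sketch, and the keys and the `clause_gaps` helper are hypothetical names.

```python
REQUIRED_CLAUSE_ELEMENTS = {
    "notification": "notification of model changes",
    "evaluation_evidence": "evaluation evidence on representative tasks",
    "pin_or_rollback": "a pin-or-rollback option",
}


def clause_gaps(assessment: dict[str, bool]) -> list[str]:
    """Return the substitution-clause elements a vendor assessment has not yet secured."""
    return [label for key, label in REQUIRED_CLAUSE_ELEMENTS.items() if not assessment.get(key, False)]


print(clause_gaps({"notification": True, "evaluation_evidence": False}))
```

Missing keys count as gaps, so an assessment that never asked the question is flagged rather than passed.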

Microsoft has made the model beneath the productivity layer a moving part. Well-governed organisations will treat each substitution as a change event with evidence attached. Poorly governed ones will discover the change during an incident review. Update the vendor file first. Then enjoy the new models.

Bottom line

Microsoft has turned the model beneath Copilot and the Microsoft 365 stack into a moving part, substituting its own MAI models without a new contract, a new invoice line or an approval step. The clean-data-lineage claims answer where the training data came from, not which model serves your tenant, how swaps are notified or what rollback looks like. Treat each substitution as a change event with evidence attached, not a technology story.

Do this Monday:

Ask your Microsoft account team or check the message centre which Copilot surfaces now run MAI models, and record the answer in your AI register.
Model last quarter's Copilot usage against AI Credits pricing before it lands in the FY2026-27 budget as an overrun.
Add a model substitution clause covering notification, evaluation evidence and rollback to your AI vendor assessment template.
Re-run acceptance evaluations on your own repositories rather than assuming parity with last quarter's tests.
Validate MAI-Transcribe-1.5 on representative recordings, and revisit the voice-based identity checks exposed by MAI-Voice-2, before either touches a decision.

References

TheAICommand. Intelligence, At Your Command.
