
The AI Pilot-to-Scale Gap Is an Operating Model Problem

Most organisations can run AI pilots. Far fewer can scale them safely, consistently and usefully across real work.

TheAICommand

AI pilots are easy to start and hard to scale. A small team can test a chatbot, summarise documents, draft emails or automate a reporting step in days. Scaling that use across business units, risk settings, data environments and quality expectations is a different challenge. The limiting factor is rarely access to a model. It is the organisation's operating model.

McKinsey's State of AI research reports that 88 percent of survey respondents say their organisations use AI in at least one business function, yet about two-thirds have not begun scaling AI across the enterprise. Deloitte's State of AI in the Enterprise 2026 similarly points to rapidly expanding worker access to AI, but notes that only one in five companies has a mature governance model for autonomous AI agents and that only 34 percent are truly reimagining the business rather than optimising existing processes.

Those findings suggest a practical conclusion: pilots prove possibility, but operating models prove repeatability.

[Figure: Diagram contrasting a single AI pilot with a scaled AI operating model across five elements. Caption: Pilots prove possibility, operating models prove repeatability.]

Why pilots feel successful

AI pilots often feel successful because the starting bar is low. A model can turn a blank page into a usable first draft. It can summarise a long document. It can generate ideas, compare options and format outputs. These improvements are visible immediately, especially in knowledge work where time is spent reading, writing and synthesising.

The problem is that pilot success can be misleading. A pilot may rely on enthusiastic users, carefully selected examples, manual quality checks, non-sensitive data and informal workarounds. Once the same use case enters normal operations, it must handle messy inputs, inconsistent user behaviour, privacy constraints, vendor limits, audit questions, incident management and accountability.

Pilot question | Scale question
Can the tool produce a useful output? | Can the organisation define when the output is good enough to use?
Do users like it? | Can managers supervise use consistently across teams?
Does it save time? | Does it improve quality, judgement or service outcomes?
Does it work on sample data? | Does it work safely with real data and edge cases?
Can one team manage it? | Can risk, technology, legal, privacy and operations support it at scale?

This distinction is important because AI adoption is not just tool adoption. It changes how work is initiated, reviewed, approved and recorded. A pilot can skip those design questions. A scaled system cannot.

The operating model has five parts

A useful AI operating model answers five questions. First, who is accountable? Second, which use cases are prioritised? Third, what controls apply at each risk level? Fourth, how do people learn and change work practices? Fifth, how is value measured after deployment?

The NIST AI Risk Management Framework organises AI risk management into four core functions: Govern, Map, Measure and Manage. That structure is helpful because it forces organisations to connect context, testing, oversight and action. It also prevents AI from becoming a collection of disconnected experiments.

Operating model element | What it needs to define
Accountability | Owners, decision rights, escalation paths and executive oversight
Portfolio | Prioritised use cases, expected value and risk classification
Controls | Data rules, testing, human review, vendor checks and incident response
Adoption | Role-specific training, manager habits and change support
Measurement | Quality, time, risk, employee experience and stakeholder outcomes

The weakest element is often measurement. Time saved is useful, but it is not enough. A draft produced faster may still be inaccurate. A summary may be short and polished but omit important caveats. A recommendation may look structured while hiding bias. Organisations need measures of value and measures of trust.
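To make the pairing of value and trust measures concrete, here is a minimal sketch in Python. It assumes a reviewer samples AI-assisted outputs and records both the time saved and any defects found. The names `ReviewedOutput` and `summarise`, the field names and the 10 percent defect threshold are all illustrative assumptions, not part of any cited framework.

```python
from dataclasses import dataclass


@dataclass
class ReviewedOutput:
    """One AI-assisted output checked by a human reviewer (hypothetical record)."""
    minutes_saved: float   # value measure: reviewer's estimate against the manual baseline
    factual_errors: int    # trust measure: inaccuracies found at review
    missing_caveats: int   # trust measure: important caveats the output omitted


def summarise(samples, max_defect_rate=0.10):
    """Report value and trust side by side so neither is read alone.

    max_defect_rate is an assumed threshold; each organisation would set its own.
    """
    if not samples:
        raise ValueError("need at least one reviewed output")
    n = len(samples)
    # An output counts as defective if the reviewer found any error or omission.
    defective = sum(1 for s in samples if s.factual_errors or s.missing_caveats)
    avg_saved = sum(s.minutes_saved for s in samples) / n
    defect_rate = defective / n
    return {
        "avg_minutes_saved": avg_saved,
        "defect_rate": defect_rate,
        "value_ok": avg_saved > 0,
        "trust_ok": defect_rate <= max_defect_rate,
    }


samples = [
    ReviewedOutput(minutes_saved=10, factual_errors=0, missing_caveats=0),
    ReviewedOutput(minutes_saved=20, factual_errors=1, missing_caveats=0),
    ReviewedOutput(minutes_saved=30, factual_errors=0, missing_caveats=0),
    ReviewedOutput(minutes_saved=20, factual_errors=0, missing_caveats=1),
]
report = summarise(samples)
```

In this example the use case saves time on average, yet half of the sampled outputs carry a defect, so it passes the value check and fails the trust check. That is the situation a time-saved metric alone would miss.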

Culture and management matter more than tool enthusiasm

Microsoft's 2026 Work Trend Index reports that organisational factors such as culture, manager support and talent practices account for twice the reported AI impact of individual effort alone. This should challenge the common assumption that AI transformation is driven mainly by power users. Power users matter, but they cannot compensate for unclear expectations, poor data access, weak quality standards or manager scepticism.

Manager behaviour is particularly important. If managers reward output volume without checking quality, AI use will drift toward quantity. If managers punish disclosure of AI use, employees will hide experimentation. If managers do not understand how outputs were produced, they cannot coach people effectively. The scaled organisation needs managers who ask better questions: What sources did you check? What did the model get wrong? Where did you apply judgement? What risk did you consider? What should we change in the workflow?

Weak adoption pattern | Strong adoption pattern
Employees self-teach without shared standards | Role-specific learning paths and examples
AI use is hidden or inconsistent | Transparent norms about acceptable use
Managers focus only on time saved | Managers review quality, judgement and risk
Every team buys its own tool | Approved tool catalogue and use-case intake
Success stories replace evidence | Benefits tracked through defined metrics

The human element is not soft. It is operational. AI changes the mechanics of knowledge work, so the organisation must teach people how to work differently.

[Figure: Tiered governance diagram matching low, medium and high-risk AI use cases to control depth. Caption: Match governance depth to risk.]

Governance should enable, not smother

Some organisations respond to AI risk by slowing everything down. Others respond to opportunity by letting every team experiment freely. Neither extreme scales well. Governance should enable safe speed. That means low-risk uses should be simple to approve, while high-risk uses should receive deeper review.

A tiered approach works best. Low-risk drafting and ideation can be covered by approved tools, training and data rules. Medium-risk workflow support may need use-case registration, testing and manager review. High-risk decision support, sensitive data processing or external stakeholder impact should trigger legal, privacy, risk and assurance review.

Risk tier | Example | Governance response
Low | Drafting an internal meeting agenda | Approved tool, user training and no sensitive data
Medium | Summarising internal policy documents | Use-case registration, source checking and owner review
High | Supporting employment, claims, credit or customer decisions | Formal risk assessment, testing, human oversight and audit evidence

Australia's voluntary AI Safety Standard supports this approach by focusing on accountability, risk management, data governance, testing, transparency and human oversight. The point is not to turn every AI idea into a compliance project. The point is to match governance depth to risk.
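The tiers described above can be expressed as a small intake rule. The following Python sketch is one possible encoding, not a prescribed method: the class and field names (`UseCase`, `supports_consequential_decisions` and so on) are hypothetical, and the classification rule is a simplified reading of the three tiers in this article.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Governance response per tier, taken from the three tiers described in the article.
CONTROLS = {
    RiskTier.LOW: ["approved tool", "user training", "no sensitive data"],
    RiskTier.MEDIUM: ["use-case registration", "source checking", "owner review"],
    RiskTier.HIGH: ["formal risk assessment", "testing", "human oversight", "audit evidence"],
}


@dataclass(frozen=True)
class UseCase:
    """Hypothetical intake record; the flags are assumptions for illustration."""
    name: str
    supports_consequential_decisions: bool = False  # e.g. employment, claims, credit
    uses_sensitive_data: bool = False
    affects_external_stakeholders: bool = False
    supports_a_workflow: bool = False  # embedded in a team process, beyond ad hoc drafting


def classify(use_case):
    """Return the highest tier that any flag triggers."""
    if (use_case.supports_consequential_decisions
            or use_case.uses_sensitive_data
            or use_case.affects_external_stakeholders):
        return RiskTier.HIGH
    if use_case.supports_a_workflow:
        return RiskTier.MEDIUM
    return RiskTier.LOW


def required_controls(use_case):
    """Look up the governance response for a use case's tier."""
    return CONTROLS[classify(use_case)]


agenda = UseCase("Draft internal meeting agenda")
policy = UseCase("Summarise internal policy documents", supports_a_workflow=True)
credit = UseCase("Credit decision support", supports_consequential_decisions=True)
```

Checking the high-risk flags first means a use case can never be downgraded by a weaker flag, so the simple approval path stays reserved for uses that trigger nothing at all.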

Scale begins before the pilot starts

The best time to plan for scale is before a pilot begins. Every pilot should have a scale hypothesis: what would need to be true for this use case to work across teams? That hypothesis should include data access, system integration, user behaviour, quality standards, controls, training, support and value measures.

A pilot should also have exit criteria. It should not drift indefinitely because users like it. At the end of a pilot, the organisation should decide whether to scale, redesign, pause or stop. That decision should be based on evidence, not excitement.
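A scale hypothesis and exit criteria can be written down as a checklist with an explicit decision rule. The Python sketch below is an assumption-laden illustration: the condition names come from the list above, but the function `review_pilot`, the `blocked_externally` flag and the mapping from evidence to the four outcomes are hypothetical choices an organisation would need to define for itself.

```python
from enum import Enum


class Decision(Enum):
    SCALE = "scale"
    REDESIGN = "redesign"
    PAUSE = "pause"
    STOP = "stop"


# Conditions a scale hypothesis should cover, as listed in the article.
SCALE_CONDITIONS = (
    "data_access", "system_integration", "user_behaviour", "quality_standards",
    "controls", "training", "support", "value_measures",
)


def review_pilot(evidence, value_demonstrated, blocked_externally=False):
    """Turn end-of-pilot evidence into a decision and the list of unmet conditions.

    evidence maps each condition name to True (met) or False (unmet).
    The decision rule here is an assumed example, not an established standard.
    """
    missing = [c for c in SCALE_CONDITIONS if c not in evidence]
    if missing:
        # Force every condition to be assessed rather than silently skipped.
        raise ValueError(f"no evidence recorded for: {', '.join(missing)}")
    unmet = [c for c in SCALE_CONDITIONS if not evidence[c]]
    if not value_demonstrated:
        return Decision.STOP, unmet        # users liking it is not enough
    if not unmet:
        return Decision.SCALE, unmet
    if blocked_externally:
        return Decision.PAUSE, unmet       # gaps depend on something outside the team
    return Decision.REDESIGN, unmet        # gaps the team can address itself


all_met = {condition: True for condition in SCALE_CONDITIONS}
training_gap = dict(all_met, training=False)
decision, unmet = review_pilot(training_gap, value_demonstrated=True)
```

Raising an error when a condition has no recorded evidence is a deliberate design choice: it keeps the review from passing a pilot simply because nobody looked at, say, system integration.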

The bottom line

The AI pilot-to-scale gap is not caused by lack of imagination. It is caused by lack of operating discipline. Pilots show what a model can do. Scaling shows what an organisation can govern, support and improve.

The winners in enterprise AI will not be the organisations with the most pilots. They will be the organisations that turn the right pilots into repeatable, trusted ways of working.

References

  1. McKinsey, The State of AI
  2. Deloitte, State of AI in the Enterprise 2026
  3. NIST AI Risk Management Framework
  4. Microsoft Work Trend Index 2026
  5. Australian Government Voluntary AI Safety Standard

TheAICommand. Intelligence, At Your Command.

Tags: AI Adoption, Operating Model, Enterprise AI, Productivity, Governance