AI Agents Just Went From Minutes to Hours. The Control Point Is Where They Run., practitioner guidance from TheAICommand
Analysis

OpenAI bought Ona and published research showing AI agents now run for hours, not seconds. The unit you have to govern moved from the prompt to the environment the agent runs in.

TheAICommand

Quick answer

AI agents now run unattended for hours, not seconds. OpenAI acquired Ona and published research confirming the shift. When an agent works for hours you cannot supervise by reading the answer, so the control point moves to the environment it runs in: where it runs, what it can access, how credentials are scoped, how activity is logged, and how work is reviewed.

For two years the mental model for using AI at work was simple. You typed a request, the model answered in seconds, you read the answer and decided what to do with it. The thing you governed was the prompt and the response. That model is now out of date, and the clearest evidence arrived this month from the company that has done the most to make it so.

On 11 June 2026 OpenAI announced it would acquire Ona, the cloud company formerly known as Gitpod, to fold its execution technology into Codex. Two weeks later, on 25 June, OpenAI published economic research measuring what its own agents are actually doing. Read together, the two posts describe a shift that matters more than any single model release. The unit of AI work has moved from a quick interaction to a delegated task that runs on its own for hours, and the thing you now have to govern is not the prompt but the place the agent runs.

Figure: The unit of AI work has moved from minutes to hours.

What changed: two OpenAI moves, read together

Start with the research, because it quantifies the shift. In "How agents are transforming work" (OpenAI Economic Research, 25 June 2026), OpenAI states plainly that "agentic AI changes the unit of knowledge work from single interactions to delegated, long-horizon tasks", and that agents "can operate independently for minutes or hours while orchestrating tool calls, interacting with environments, and iterating towards solutions". The numbers behind that sentence are worth pausing on. By May 2026, 80.6 per cent of sampled individual Codex users had made at least one request estimated to exceed 30 minutes of human work, 70.2 per cent had made one exceeding an hour, and 25.6 per cent had made at least one estimated to exceed eight hours. Nearly a quarter of all Codex requests are now for tasks that would take a person more than an hour. At the heavy end, OpenAI reports that by June 2026 its 99th-percentile users "regularly generated more than 60 hours of Codex agent turns per day, distributed across multiple, parallel agents".

This is not only a developer story. OpenAI notes that since August 2025 non-developer adoption of Codex has grown 137-fold among individual users and 189-fold among organisational users, and that inside OpenAI itself, Legal, Finance and Recruiting crossed over to Codex as their primary AI tool around April 2026. The point for a professional audience is that the long-running agent is arriving in legal, finance, HR and operations functions, not just engineering teams, and it is arriving as the default way work gets done, not a side experiment.

The acquisition is the other half of the story. OpenAI says more than 5 million people now use Codex each week, up 400 per cent from earlier this year, and that "its most valuable work is unfolding over hours or days, rather than minutes". An agent that works for hours cannot stay tied to the laptop it started on. So OpenAI is buying Ona to give Codex "secure, persistent environments where agents can access the tools, systems, and context they need to make progress over time", enabling agents "to continue working inside a customer's cloud environment even when laptops are closed".

That is the development. Not a smarter model, a longer leash. The wider market read it the same way. InfoWorld's coverage on 12 June was headlined "OpenAI buys Ona to help rein in AI agents", and opened with the two questions every executive asks about an agent that runs unattended: "Will the agent start to delete critical files? Will the agent go off on a mission tangent and generate a massive token bill?" The story here is control, not capability.

What it means: the unit you govern moved from the prompt to the environment

When an agent answers in seconds, you supervise it by reading the answer. When an agent works for eight hours, opening files, calling tools, writing to systems and making hundreds of small decisions while nobody watches, reading the final answer is not supervision. The control has to move upstream, to the environment the agent runs in and the boundaries you set before it starts.

This is a different question from the one most AI governance has been answering. The standard advice, including ours, has been to put an approval gate in front of any consequential action. The agent proposes, a human approves, then it acts. That advice still holds, but it was built for an agent that takes one consequential step at a time. It does not scale to an agent that runs for hours and takes a thousand small steps, most of them individually unremarkable and collectively significant. You cannot approve every file write across an eight-hour run. What you can do is decide, in advance, where the agent is allowed to run, what it is allowed to touch, and what record it leaves behind.

Figure: Govern the environment, not the prompt: the five-part control surface (where it runs, what it reaches, credentials scoped, activity logged, work reviewed).

OpenAI, to its credit, named that control surface in its own announcement. Giving organisations confidence to deploy persistent agents, it wrote, "means having control over where they run, what they can access, how credentials are scoped, how activity is logged, and how work moves through review". That is not marketing copy. It is a governance checklist, written by the vendor, for the exact problem a long-running agent creates. The architectural choice OpenAI is selling alongside it is just as important. Ona's "customer-controlled execution model will allow agents to operate inside an organization's own cloud environment while OpenAI provides the intelligence and orchestration that power the experience". In plain terms, the agent can run inside your own cloud account, inside your security boundary, while the model sits outside it. The execution environment, not the model, becomes the place you exercise control.
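As a minimal sketch of how that five-part checklist could be written down for each agent, the record below carries one field per item in OpenAI's sentence and reports which items are still unanswered. The class name, field names and example values are hypothetical illustrations, not part of any OpenAI or Ona API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ControlSurface:
    """One field per item in the five-part checklist (all names hypothetical)."""
    runs_in: str            # where the agent runs
    can_access: tuple       # what it can access
    credential_scope: str   # how credentials are scoped
    activity_log: str       # how activity is logged
    review_path: str        # how work moves through review


def unanswered(surface: ControlSurface) -> list:
    """Return the names of checklist items that were left empty."""
    return [f.name for f in fields(surface) if not getattr(surface, f.name)]


draft = ControlSurface(
    runs_in="our own cloud account",
    can_access=("docs-repo (read)",),
    credential_scope="",              # not decided yet
    activity_log="append-only audit log",
    review_path="",                   # not decided yet
)
print(unanswered(draft))  # ['credential_scope', 'review_path']
```

The point of the structure is only that an empty field is visible before the run starts, rather than discovered after it.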

And it is rarely one agent. OpenAI's own heaviest users are already running many agents in parallel across a day, which is why it can report agent-turn hours well above the hours in a day. Parallelism multiplies the governance question rather than changing it. Ten agents running unattended is ten execution environments to scope, ten credential sets to time-box and ten activity logs to keep. The discipline does not get harder per agent, but it stops being something a person can hold in their head, which is the real argument for setting the boundaries in the environment rather than in each prompt.
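The arithmetic behind that observation is short. One agent cannot log more than 24 hours of turns in a day, so a daily total above that implies a minimum number of agents running at once. The function name below is illustrative, and the result is a lower bound that assumes every agent runs around the clock.

```python
import math

HOURS_IN_DAY = 24


def min_parallel_agents(agent_turn_hours_per_day: float) -> int:
    """Lower bound on concurrent agents needed to log the given hours in one day."""
    return math.ceil(agent_turn_hours_per_day / HOURS_IN_DAY)


print(min_parallel_agents(60))  # 3
```

In practice agents do not run continuously, so a user logging 60 agent-turn hours a day is likely supervising more than three environments.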

The shift matters even if you are not regulated by APRA, the Australian Prudential Regulation Authority. An agent that works unattended across systems is accessing and acting on information at a scale and pace no person reviews in real time. Where that information includes personal information, it is a use-and-disclosure question under the Privacy Act, spread across a far wider surface than a single chat. The record the agent leaves is the only thing that lets you answer, after the fact, what it did and why. So the environment is not just a security boundary, it is your evidence trail.
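A record only works as evidence if it cannot be quietly edited afterwards. One common way to make a log tamper-evident is to chain entries by hash, so that changing an earlier record invalidates every record after it. The sketch below is a hypothetical illustration using Python's standard library; it is not how any particular vendor logs agent activity, and the action strings are invented.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(action: str, prev: str) -> str:
    """Hash an action together with the hash of the entry before it."""
    body = json.dumps({"action": action, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: list, action: str) -> None:
    """Append an action whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"action": action, "prev": prev, "hash": _digest(action, prev)})


def verify(log: list) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(entry["action"], prev):
            return False
        prev = entry["hash"]
    return True


log = []
append_entry(log, "read ledger-a")
append_entry(log, "write draft/summary.md")
print(verify(log))                  # True
log[0]["action"] = "read payroll"   # simulate after-the-fact tampering
print(verify(log))                  # False
```

A chain like this detects edits but does not prevent them. It still relies on the latest hash being stored somewhere the agent cannot write to.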

What to do: the Australian practitioner play

For an Australian professional, the move is to stop treating a long-running agent as a productivity feature and start treating it as a change to how work is done and where it runs. That reframing is not optional for the regulated. APRA's letter to industry on artificial intelligence, issued on 30 April 2026, set AI-focused expectations across four observation areas: cyber and information security, governance, supplier risk, and change management and assurance. A persistent agent running unattended in production touches three of those four at once. APRA found that governance, risk and operational practices were "failing to keep pace with the scale, speed, and complexity of AI adoption", and CPS 230 Operational Risk Management, with targeted amendments effective from 1 July 2026, requires regulated entities to stay resilient to disruption, maintain their critical operations, and manage the risks arising from service providers. Many AI providers now meet the material service provider threshold. None of that is exotic. It is the existing operational-risk discipline, applied to an agent that works while you sleep.

Here is the play, whether or not you sit inside APRA's perimeter.

  1. Treat a long-running agent as a change, not a feature. Anything that runs beyond a single session, or runs unattended, goes on a register with an accountable owner and a risk rating before it goes live. The question is not "is the model good", it is "what is this allowed to do without a person in the room".
  2. Pin the execution boundary. Prefer customer-controlled execution, where the agent runs inside your own cloud account, so you control where the code executes, what it can reach and the audit trail it leaves. Do not let unattended work happen on infrastructure you cannot inspect. The execution environment is the single highest-leverage control you have, which is precisely why OpenAI is buying a company to provide it.
  3. Scope least privilege and time-box the credentials. The agent gets the narrowest set of tools, data and credentials the task needs, and no more, with access that expires and can be revoked. This maps directly to OpenAI's own "what they can access, how credentials are scoped" line, and to Info-Tech Research Group's caution that access must be "properly credentialed and controlled effectively to prevent the model doing what it shouldn't be doing".
  4. Log every action and review the trail. A long run produces an evidence trail or it produces nothing you can defend. Capture an immutable log of every action the agent took, retain it, and make "how work moves through review" a real step rather than an afterthought. For regulated work, that log is also your assurance evidence.
  5. Bound the blast radius and keep a kill switch. Set spend and token caps, restrict the systems and files the agent can write to, and keep a way to stop it mid-run. The two failure modes InfoWorld led with, the deleted files and the runaway bill, are both blast-radius problems with blast-radius answers.
  6. Keep human checkpoints on the consequential steps. Long-running is not the same as unsupervised. Put approval gates on the irreversible or high-consequence actions inside the run, and require a human review of the finished result before it lands anywhere that matters. Autonomy in duration does not mean autonomy in judgement.

Figure: Design the space before you delegate the work.
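The six steps in the list above can be sketched as a go-live gate: a deployment record either shows each control as in place or it is flagged. The dictionary keys, the example deployment and the function name are hypothetical, and a real register would hold evidence rather than a simple true or false.

```python
# Keys are hypothetical; labels paraphrase the six steps in the play.
REQUIRED_CONTROLS = {
    "registered_with_owner": "1. On a register with an accountable owner and risk rating",
    "execution_boundary_pinned": "2. Runs in an environment we can inspect",
    "least_privilege_timeboxed": "3. Narrow, expiring, revocable credentials",
    "actions_logged_and_reviewed": "4. Every action logged and the trail reviewed",
    "blast_radius_bounded": "5. Spend caps, write restrictions and a kill switch",
    "human_checkpoints": "6. Approval gates on consequential steps",
}


def preflight_gaps(deployment: dict) -> list:
    """Return the label of every control that is not explicitly marked True."""
    return [label for key, label in REQUIRED_CONTROLS.items()
            if deployment.get(key) is not True]


candidate = {
    "registered_with_owner": True,
    "execution_boundary_pinned": True,
    "least_privilege_timeboxed": False,
    "actions_logged_and_reviewed": True,
    "blast_radius_bounded": True,
    # "human_checkpoints" is absent, so it is treated as a gap
}

for gap in preflight_gaps(candidate):
    print("Not ready:", gap)
```

Treating a missing key the same as a failed control is a deliberate choice here: an unanswered question should block go-live rather than pass silently.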

A short worked example

Picture a finance team that points a Codex-style agent at an overnight task: reconcile a month of transactions across two systems, flag the breaks, and draft a summary for the morning. The capability is real and the time saved is real. The governance is in the setup, not the prompt. The agent runs inside the organisation's own cloud account, with read access to the two ledgers and write access only to a single draft workspace, on credentials that expire at 7am. Every action it takes is logged. It is allowed to flag breaks but not to post adjustments; that step waits for a person. A spend cap stops it if it loops. When the team arrives, a human reviews the breaks and the draft before anything is actioned. Same agent, same task. The difference between a useful tool and an incident is the environment it was given.
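To make the setup concrete, here is a rough sketch of that boundary as a single default-deny check applied to each step the agent attempts. The ledger and workspace names, the date and the spend cap figure are all invented for illustration, and a real deployment would enforce these limits in the cloud account's own access controls rather than in application code.

```python
from datetime import datetime

# Hypothetical identifiers and limits for the overnight reconciliation run.
READABLE = {"ledger-a", "ledger-b"}
WRITABLE = {"draft-workspace"}
FORBIDDEN_ACTIONS = {"post_adjustment"}            # this step waits for a person
CREDENTIALS_EXPIRE = datetime(2026, 7, 1, 7, 0)    # 7am; the date is illustrative
SPEND_CAP = 50.0                                   # illustrative figure


def allowed(action: str, target: str, now: datetime, spent: float) -> bool:
    """Decide whether one agent step falls inside the environment's boundary."""
    if now >= CREDENTIALS_EXPIRE or spent >= SPEND_CAP:
        return False
    if action in FORBIDDEN_ACTIONS:
        return False
    if action == "read":
        return target in READABLE
    if action == "write":
        return target in WRITABLE
    return False  # default deny for anything not listed


night = datetime(2026, 7, 1, 2, 30)
print(allowed("read", "ledger-a", night, spent=3.0))             # True
print(allowed("write", "draft-workspace", night, spent=3.0))     # True
print(allowed("write", "ledger-a", night, spent=3.0))            # False
print(allowed("post_adjustment", "ledger-a", night, spent=3.0))  # False
print(allowed("read", "ledger-a", night, spent=50.0))            # False
```

Nothing in the check depends on what the prompt said, which is the point of the example: the boundary holds even if the agent misreads its instructions.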

The bottom line

The productivity is not in doubt, and neither is the direction of travel. Agents that work for hours, in parallel, on tasks that used to take a person a day are already mainstream inside the company building them, and the rest of the market is weeks behind, not years. What changed this month is that the unit you have to govern moved from a prompt you can read to a standing environment you have to design. The useful part is that the vendors have told you the control surface in their own words: where it runs, what it can reach, how credentials are scoped, how it is logged, and how the work comes back for review. The work is to own those five things before you hand an agent the keys and close the laptop.

Try this: a deployment reviewer prompt

Paste the details of a planned or live agent deployment into ChatGPT or Claude and get a control assessment against the same surface (where it runs, what it can reach, how it is logged and how the work is reviewed), with a stated human-review boundary.

Prompt
You are an operational-risk reviewer for AI agent deployments. Your job is to assess a proposed long-running or unattended AI agent against a control checklist and surface the gaps, so an accountable human owner can decide whether to approve it.

Assess the deployment I paste below against these seven controls:
1. Change and ownership: is it on a register, with a named accountable owner and a risk rating, before go-live?
2. Execution boundary: where does the agent actually run, and can we inspect that environment? Prefer execution inside our own cloud account.
3. Least privilege and credentials: does it have only the tools, data and credentials the task needs, with access that is scoped, time-boxed and revocable?
4. Logging and review: is every action logged to an immutable, retained record, and is there a defined human review of the result before it lands?
5. Blast radius and kill switch: are there spend and token caps, write restrictions, and a way to stop it mid-run?
6. Human checkpoints: are the irreversible or high-consequence actions gated for human approval inside the run?
7. Regulatory and privacy: does it support a critical operation or touch personal information, and if so is it captured as a supplier-risk or operational-resilience item (for example under APRA CPS 230) and a Privacy Act use-and-disclosure consideration?

Paste here: what the agent does; how long it runs and whether it is attended; where it executes; the systems, data and credentials it can reach; what it can write to or action; logging in place; caps and kill switch; human checkpoints; and whether it touches a critical operation or personal information.

[PASTE DEPLOYMENT DETAILS]

Output, in this order:
- A control table: each of the seven controls marked Present, Partial or Missing, with the specific gap in one line and the specific fix in one line.
- A short risk summary, three to five sentences, in plain English.
- Open questions a human must resolve before go-live.

Boundary: this is a draft assessment to inform an accountable human owner. The go-live decision, the acceptance of any residual risk, and any regulated determination remain with a person. Do not state that the deployment is approved or compliant. Recommend, do not decide.

How to run it: create a ChatGPT Project or a Claude Project called "Agent deployment reviews", paste the reviewer prompt into the project instructions, and add your own thresholds for what counts as a critical operation, your spend caps and your logging standard so it assesses against your bar. Feed it one deployment at a time, then run a self-refine pass before you trust it by replying "Now critique your own assessment as a sceptical auditor. What did you wave through, which control did you mark Present without evidence, and what would you downgrade", then "Rewrite the assessment incorporating that critique". Only then take the result to the accountable owner, who makes the go-live decision.
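If you run these reviews often, the three-message sequence can be kept in one place so every deployment gets the same passes in the same order. This is a small sketch that only assembles the text to send; it assumes the reviewer prompt already sits in the project instructions, calls no API, and uses a hypothetical function name.

```python
# Follow-up wording taken from the "How to run it" steps.
SELF_CRITIQUE = (
    "Now critique your own assessment as a sceptical auditor. "
    "What did you wave through, which control did you mark Present "
    "without evidence, and what would you downgrade"
)
REWRITE = "Rewrite the assessment incorporating that critique"


def review_turns(deployment_details: str) -> list:
    """Return the three user messages for one deployment, in sending order."""
    details = deployment_details.strip()
    if not details:
        raise ValueError("Paste the deployment details before running a review")
    return [details, SELF_CRITIQUE, REWRITE]


for number, message in enumerate(review_turns("Overnight reconciliation agent ..."), 1):
    print(f"Turn {number}: {message[:60]}")
```

The output of the third turn is still a draft for the accountable owner, not a decision.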

References

  1. OpenAI, "OpenAI to acquire Ona", 11 June 2026. https://openai.com/index/openai-to-acquire-ona/
  2. OpenAI Economic Research, "How agents are transforming work" (paper: "The Shift to Agentic AI: Evidence from Codex"), 25 June 2026. https://openai.com/index/how-agents-are-transforming-work/
  3. InfoWorld, "OpenAI buys Ona to help rein in AI agents", 12 June 2026. https://www.infoworld.com/article/4184648/openai-buys-ona-to-help-rein-in-ai-agents.html
  4. APRA, "Letter to Industry on Artificial Intelligence (AI)", 30 April 2026. https://www.apra.gov.au/apra-letter-to-industry-on-artificial-intelligence-ai
  5. APRA, "Prudential Standard CPS 230 Operational Risk Management", effective 1 July 2025; targeted amendments effective 1 July 2026. https://www.apra.gov.au/standards/cps-230

General information only. Not legal, compliance, financial, or professional advice.

TheAICommand. Intelligence, At Your Command.

Frequently asked questions

What did OpenAI announce in June 2026 about AI agents?
On 11 June 2026 OpenAI announced it would acquire Ona, the cloud company formerly known as Gitpod, to give its Codex agents secure, persistent places to run. On 25 June it published economic research showing the unit of AI work has moved from quick interactions to delegated tasks that run for hours.
How long do AI agents now run?
By May 2026, 80.6 per cent of sampled individual Codex users had set a task estimated to exceed 30 minutes of human work, 70.2 per cent one exceeding an hour, and 25.6 per cent one exceeding eight hours. OpenAI reports its heaviest users run many agents in parallel, generating more than 60 hours of agent turns in a single day.
Why does the AI control point move from the prompt to the environment?
When an agent answers in seconds you supervise by reading the answer. When it works unattended for hours, taking hundreds of small actions, reading the final output is not supervision. Control has to move upstream to the environment: where the agent runs, what it can reach, how credentials are scoped, what it logs, and how the work is reviewed.
What does this mean for APRA-regulated Australian work?
A long-running agent in production is a change-management and supplier-risk question, not an IT convenience. APRA's 30 April 2026 letter set expectations across cyber and information security, governance, supplier risk, and change management. CPS 230, with amendments effective 1 July 2026, requires resilience, critical-operation continuity and service-provider risk management.
How should you govern a long-running AI agent?
Treat it as a change, not a feature. Pin the execution boundary and prefer running inside your own cloud account. Scope least privilege and time-box credentials. Log every action and review the trail. Bound the blast radius with spend caps and a kill switch. Keep human checkpoints on the consequential steps.

Tags

AI Agents, Agentic AI, OpenAI, Codex, AI Governance, CPS 230