Hand the browser back to the human

Pause and takeover during AI test runs, the part most cloud runners skip

The agent failed on step 7. There is a video. There is a stack trace. What you actually need is the browser the agent was just in, still logged in, still on the page that broke. Assrt keeps that browser alive on purpose and lets you click into it.

Matthew Diakonov, Written with AI

Published May 22, 2026, 8 min read

Direct answer (verified 2026-05-22)

You cannot pause an Assrt run mid-step. You can take over after it finishes. The agent runs the scenario to a natural finish (pass, fail, or you press Stop). At that point the VM is deliberately left alive. A Take Over overlay appears on the browser view. Click it and the same noVNC canvas flips from view-only to interactive. You drive the exact same Chromium process the agent was just driving, with cookies, localStorage, scroll position, and open tabs intact.

The mechanism is a single React conditional (<VncViewer url={vncUrl} viewOnly={!userControlActive} />) plus an input WebSocket that the client opens only while you are in control. The comment that controls VM lifetime is one line in the open source repo: "Does NOT destroy the VM (it stays alive for user takeover)."

Why the model is take-over-after, not pause-mid-step

A lot of the people who land on this topic want true mid-step pause: freeze the agent on step 3, click into the browser, do a thing, then let the agent continue from step 4. It is the obvious feature. It is also the feature that breaks if you build it badly.

The reason it breaks is that the AI agent is holding the CDP session. Every click, type, and assertion the agent issues is a message on the same Chrome DevTools Protocol channel that a human taking over would use. Splitting that channel between two controllers turns into a racy mess: half-typed inputs, double clicks, a screenshot the agent believes shows the current page even though the human moved the page before the response landed. The honest version of mid-step pause requires a transactional handoff protocol, and no open source runner I have seen ships one that actually works under load.

Assrt makes the smaller, more defensible choice. The agent runs to completion or to a stop signal. When it lets go, the keyboard and mouse are yours. That is take-over-after. It covers the most common reason people ask for pause-and-takeover (the agent failed and I want to finish the flow), without inventing a handoff protocol that loses inputs.
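The ownership rule behind take-over-after can be sketched as a tiny state model. This is an illustration only, not Assrt's code: the `Session` shape and the `finishRun` and `takeOver` names are hypothetical. The point it shows is that a human is never granted input while the agent still holds the session.

```typescript
// Hypothetical model of take-over-after: at most one controller owns input.
type Controller = "agent" | "human";
type RunState = "running" | "done";

interface Session {
  runState: RunState;
  controller: Controller | null; // null means nobody is driving
}

// The agent lets go only when the run reaches a terminal state
// (pass, fail, or an explicit stop).
function finishRun(s: Session): Session {
  return { ...s, runState: "done", controller: null };
}

// A human may take over only after the agent has released the session.
// Rejecting mid-run takeover is what avoids two writers on one channel.
function takeOver(s: Session): Session {
  if (s.runState !== "done" || s.controller !== null) {
    throw new Error("agent still holds the session; take over after the run");
  }
  return { ...s, controller: "human" };
}
```

A mid-step pause would need a third state in which ownership is in transit, and that transit state is exactly where inputs get lost.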

The one comment that makes this possible

The whole feature hangs off a single design decision in the agent loop: when a run finalises, do not destroy the VM. In src/core/agent.ts the comment on finalizeVideo() says it out loud:

/** Encode and download the screencast recording from the VM.
* Does NOT destroy the VM (it stays alive for user takeover). */

Most cloud E2E products release the VM immediately. The reasoning is economic: every running VM costs money, and the user already has the video. The Assrt design rejects that trade. The VM is the most expensive single artifact of the run, and the second the run ends is exactly when a developer wants to debug it, which makes it the worst possible moment to throw the VM away.

Three things release the VM instead: an explicit Stop click from the user, a pagehide beacon when the user closes the tab, or a server-side reaper that kills idle sessions after a timeout. Until one of those fires, you can come back to the tab an hour later and the browser is still sitting on the failure page.
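A minimal sketch of that lifecycle, assuming a simple in-memory session record. Everything here is hypothetical (the `VmSession` shape, the function names, and the two-hour timeout, which the article does not state); only the three triggers and the "finalize does not release" rule come from the text above.

```typescript
// The three triggers named in the article.
type ReleaseReason = "stop-click" | "tab-closed" | "idle-timeout";

interface VmSession {
  id: string;
  lastActivityMs: number;
  released?: ReleaseReason;
}

// Assumed value for illustration; the real idle timeout is not documented here.
const IDLE_TIMEOUT_MS = 2 * 60 * 60 * 1000;

// First trigger wins; later triggers are no-ops.
function release(session: VmSession, reason: ReleaseReason): VmSession {
  return session.released ? session : { ...session, released: reason };
}

// Finalizing a run encodes the video but deliberately leaves the VM alone.
function finalizeRun(session: VmSession): VmSession {
  return session;
}

// One pass of a server-side reaper: release sessions idle for too long.
function reapIdle(
  sessions: VmSession[],
  nowMs: number,
  timeoutMs: number = IDLE_TIMEOUT_MS,
): VmSession[] {
  return sessions.map((s) =>
    nowMs - s.lastActivityMs >= timeoutMs ? release(s, "idle-timeout") : s,
  );
}
```

In a browser, the tab-close trigger would typically be a `pagehide` listener that calls `navigator.sendBeacon`, since a beacon is queued even while the page is unloading.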

The handoff, step by step

Between the moment the agent reaches appState === "done" and the moment your mouse moves the remote cursor, five small things happen in sequence. Each one is a few lines of code; none of them is magic.

1. Run finishes. The agent reaches done, the video encodes, and the VM stays up.
2. Overlay shows. The Take Over button appears as a hover overlay on the browser view.
3. You click. toggleUserControl() flips userControlActive to true.
4. Input opens. connectInputWS dials the VM input WebSocket, separate from the screencast.
5. Canvas live. VncViewer flips viewOnly to false on the existing RFB connection.
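Steps 3 to 5 can be sketched as a pure state transition. This is a guess at the shape, not the code in the repo: the `ExecutionState` fields other than `userControlActive`, the `InputSocket` stand-in, and the way `connectInputWS` is passed in as a parameter are all assumptions made so the example runs outside a browser.

```typescript
// Minimal stand-in for the browser WebSocket, so the sketch is self-contained.
interface InputSocket {
  send(data: string): void;
  close(): void;
}

// Hypothetical store shape; only userControlActive is named in the article.
interface ExecutionState {
  appState: "idle" | "running" | "done";
  userControlActive: boolean;
  inputSocket: InputSocket | null;
}

type ConnectInputWS = () => InputSocket;

function toggleUserControl(
  state: ExecutionState,
  connectInputWS: ConnectInputWS,
): ExecutionState {
  // Takeover is only offered after the run has finished.
  if (state.appState !== "done") return state;
  if (state.userControlActive) {
    // Release: close the input channel and fall back to view-only.
    state.inputSocket?.close();
    return { ...state, userControlActive: false, inputSocket: null };
  }
  // Take over: the input channel exists only while the human is in control.
  return { ...state, userControlActive: true, inputSocket: connectInputWS() };
}

// The prop the viewer receives; the screencast connection is untouched.
function viewerProps(state: ExecutionState): { viewOnly: boolean } {
  return { viewOnly: !state.userControlActive };
}
```

Keeping the input socket out of the watch-only phase means a passive viewer cannot send a stray click to the VM even by accident.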

Two channels, two phases

Drawn as a sequence diagram, the takeover handoff is a small dance between three actors: your browser tab, the input WebSocket, and the noVNC canvas. The thing to notice is that the screencast channel never disconnects. It is the same pixel stream throughout, which is why the picture does not flicker when you take over.

From watch-only to interactive on one canvas

The input WebSocket carries one extra job. While you are in control, an Inspect button is visible. Click it and the same WebSocket carries a single JSON message ( { type: "highlight", action: "inspect", enabled: true }) that switches CDP into element-picker mode. Hover the canvas, watch the element under your cursor highlight, click to capture its selector. That selector flows back to the React app as an inspectNode event, which dispatches a custom DOM event the rest of the UI listens for.
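The client side of that exchange might look like the following. The outgoing message matches the JSON quoted above; the shape of the incoming inspectNode message (a `selector` string field) is an assumption, as are the function names.

```typescript
// Outgoing message, as quoted in the article.
interface HighlightMessage {
  type: "highlight";
  action: "inspect";
  enabled: boolean;
}

function inspectMessage(enabled: boolean): string {
  const msg: HighlightMessage = { type: "highlight", action: "inspect", enabled };
  return JSON.stringify(msg);
}

// Assumed shape of the reply; the real payload may carry more fields.
interface InspectNodeEvent {
  type: "inspectNode";
  selector: string;
}

// Parse defensively: the same socket also carries other message types.
function parseInspectNode(raw: string): InspectNodeEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const rec = data as Record<string, unknown>;
  const selector = rec["selector"];
  if (rec["type"] !== "inspectNode" || typeof selector !== "string") return null;
  return { type: "inspectNode", selector };
}
```

In the browser, a parsed event would then be re-dispatched with `window.dispatchEvent(new CustomEvent(...))` so other components can subscribe without knowing about the socket.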

What changes between watch-only and interactive

The same JSX element renders the browser in both phases. The flip is one prop, no reconnect.

The flip

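// src/app/app/test/page.tsx
<VncViewer
  url={vncUrl}
  viewOnly={!userControlActive}
/>
// Watch-only (userControlActive is false): rfb.viewOnly = true; rfb.showDotCursor = false
// Interactive (userControlActive is true): rfb.viewOnly = false; rfb.showDotCursor = true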

The six-line useEffect inside the viewer that actually does the flip:

useEffect(() => {
  if (rfbRef.current) {
    rfbRef.current.viewOnly = viewOnly;
    rfbRef.current.showDotCursor = !viewOnly;
  }
}, [viewOnly]);

That is it. The RFB instance is alive, the WebSocket to the VNC server is open, the pixel stream is still flowing. Toggling rfb.viewOnly on the existing instance is what unblocks input forwarding from noVNC down to the VM. The cursor dot appears so you can see where your pointer is on the remote display.

What this gets you that a recording does not

Every E2E test product gives you a video and a screenshot. The mental model people pick up from that is "watch the failure, then go reproduce it on my laptop." That mental model breaks on three flows that come up constantly.

Debugging a failed AI test run

The agent failed on step 7. You watch the recording. You see a modal you did not predict. You open your laptop, start the dev server, reach the same screen, sign in by hand, navigate to the right state, then look for the modal. The modal is gated by a feature flag that depends on the session ID the test created, which you do not have. You spend twenty minutes trying to reproduce the conditions the agent was in.

Watch the failure, do not touch it
Reproduce the conditions from scratch locally
Auth state, session, feature flags all start over
Selector for the surprise element is in a screenshot, not the DOM

How this compares to other AI test runners

Most of the AI test products you can buy run scenarios on a managed fleet of cloud VMs, then destroy the VM as soon as the run finishes. That makes sense from a unit-economics angle and is fine if the only thing you ever do with a run is look at the report. The moment you want to touch what the agent touched, it falls apart.

Feature	Typical cloud E2E AI runner	Assrt
VM lifecycle after a run	Destroyed when the run finalizes, to reclaim memory and compute.	Stays alive on purpose. Released only on explicit Stop, tab close, or idle reaper.
User input into the test browser	Not exposed. Replay or re-run is the workflow.	noVNC canvas flips from view-only to interactive on a button click.
Session state on takeover	N/A, the session is gone.	Same Chromium process. Cookies, localStorage, open tabs, scroll position all intact.
Element inspection from the running browser	Screenshots only. DOM is gone.	Inspect button toggles CDP element picker over the live page.
Output format the human keeps	Proprietary YAML or vendor-specific format.	Plain Markdown scenarios and standard Playwright code, MIT licensed.

The comparison is structural, not against a specific named product. Some closed-source vendors offer pieces of this. None of the open source ones I have read ship the whole pattern.

When this is the wrong feature for you

Honesty section. Take-over-after is not the right shape for every team. Three cases where it does the wrong thing:

You need true frame-accurate pause inside step execution so a human can patch the agent's next action and let it continue. That requires a handoff protocol Assrt does not ship today, and anyone telling you their tool does ship it should be asked to demo it under load.
Your CI does not run on a VM that can host a VNC server, so a live canvas is not reachable from a developer's laptop. In that environment the same scenario should run locally first and only flow to CI as a smoke job.
Your test data is so sensitive that letting a developer drive the test browser after a run would violate the same access policies that gate production access. The takeover is exactly as powerful as the test account that ran the scenario; if that is too powerful, lock it down before opening the canvas.

Try it on your own app

The whole pattern is open source and runs from one command. Point it at a URL, watch the agent crawl and write scenarios, run a scenario, then click Take Over on the result.

npx @m13v/assrt discover https://your-app.com

The CLI is the same surface used by the Model Context Protocol integration, so if you are running this from inside Claude Code or Cursor, the same tools (assrt_test, assrt_plan, assrt_diagnose) drive the same VM lifecycle. The Take Over overlay shows up on the same browser view the agent was just on.

Pause and takeover, the questions that come up

Can I pause an Assrt run mid-step and click in myself?

Not mid-step. The model is take-over-after, not pause-mid-run. The AI agent runs through the scenario to its natural finish (passed, failed, or you click Stop). When it stops, the VM is left alive on purpose. A Take Over button appears as a hover overlay on the browser view, and clicking it flips the same noVNC canvas from view-only to interactive in place. The reason it is not mid-step is that the agent is holding the CDP session for clicks, types, and assertions; sharing that session with a human partway through a step would invite racy double-clicks and lost inputs. After the run, the agent has released its grip and the input channel is yours alone.

What state is preserved when I take over?

Everything. The cookies the agent set during sign-in. The localStorage the agent wrote during the cart flow. The currently open tab. The route the page is on. The scroll position. The form fields it half-filled before it failed. Because the VM is the same VM and the browser process is the same process, you are not resuming from a snapshot or replaying a script, you are literally standing in the seat the agent vacated.

How does Assrt do this when most cloud test tools destroy the VM at the end of a run?

One comment in src/core/agent.ts, on the finalizeVideo() method around line 449, says it explicitly: "Does NOT destroy the VM (it stays alive for user takeover)." The video is encoded and downloaded, but the underlying Freestyle VM, the Chromium process inside it, and the VNC server stay running. Destruction happens on three deliberate triggers instead: the user clicks Stop, the user closes the browser tab (a pagehide sendBeacon to /api/acp-chat/release fires), or the server-side reaper times out an idle session. Until one of those fires, the browser is sitting there waiting for you.

What is happening under the hood when I click Take Over?

Three things in sequence. First, the React execution store calls toggleUserControl() in src/lib/execution-store.ts, which sets userControlActive: true. Second, the store opens an input WebSocket (connectInputWS) to the VM, separate from the screencast channel. This is the input channel for clicks, keystrokes, and CDP highlight commands. Third, the same VncViewer component re-renders with viewOnly={!userControlActive}, which is now false. A six-line useEffect inside the viewer mutates rfb.viewOnly and rfb.showDotCursor on the existing RFB connection without reconnecting. The pixel canvas does not flicker.

What is the Inspect button that appears alongside Release?

Inspect toggles a CDP overlay so that hovering an element in the noVNC canvas highlights it and surfaces its selector. The button only appears while you are in control. Internally toggleInspectMode() sends a single JSON message over the same input WebSocket: { type: "highlight", action: "inspect", enabled: true }. The CDP-side bridge translates that into Overlay.setInspectMode, and the resulting node-picked events come back through the same WebSocket as inspectNode messages, which the React app dispatches as a custom DOM event so the rest of the page can react.
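On the VM side, the translation step could be as small as the function below. Overlay.setInspectMode and its `mode` and `highlightConfig` parameters are part of the Chrome DevTools Protocol; the `toCdpCommand` name, the `CdpCommand` shape, and the highlight colour are assumptions, and the actual bridge may be structured differently.

```typescript
// Generic CDP command envelope (hypothetical helper type).
interface CdpCommand {
  method: string;
  params: Record<string, unknown>;
}

interface BridgeMessage {
  type: string;
  action: string;
  enabled: boolean;
}

// Map the WebSocket message onto a CDP call; ignore anything else.
function toCdpCommand(msg: BridgeMessage): CdpCommand | null {
  if (msg.type !== "highlight" || msg.action !== "inspect") return null;
  return {
    method: "Overlay.setInspectMode",
    params: msg.enabled
      ? {
          // "searchForNode" turns on the element picker.
          mode: "searchForNode",
          highlightConfig: {
            showInfo: true,
            contentColor: { r: 111, g: 168, b: 220, a: 0.66 }, // illustrative colour
          },
        }
      : { mode: "none" }, // highlightConfig may be omitted when disabling
  };
}
```

When the user clicks an element in picker mode, CDP emits Overlay.inspectNodeRequested with a backend node id, which a bridge like this would resolve to a selector before sending the inspectNode message back.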

Can I take over a passing test, or only a failed one?

Either. The conditional in src/app/app/test/page.tsx that renders the Take Over overlay checks for appState === "done" and the presence of a vncUrl. That is all. Pass, fail, or aborted, if the test reached the done state and the VM hasn't been released, the overlay is there. In practice most takeovers happen after a failure (you want to finish the flow the agent could not complete), but takeovers after a pass are common too, usually to grab one more selector for the next iteration or to verify a non-obvious downstream state the agent did not assert on.
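Written as a standalone predicate, that check is roughly the following. The `AppState` values other than "done" and the function name are assumptions; the real code is an inline JSX conditional.

```typescript
// Only "done" is confirmed by the article; the other states are illustrative.
type AppState = "idle" | "running" | "done";

// The overlay depends on the run being finished and the VM still being
// reachable, never on whether the scenario passed or failed.
function showTakeOverOverlay(
  appState: AppState,
  vncUrl: string | null | undefined,
): boolean {
  return appState === "done" && Boolean(vncUrl);
}
```

Note what is absent from the signature: there is no pass or fail argument, which is why a green run is just as open to takeover as a red one.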

Does this work with my logged-in test account?

Yes, and that is the whole point. The agent ran the sign-in steps already. The session cookie or JWT is in the browser. When you take over, you are inside the authenticated session. You do not retype the email, you do not solve the captcha again, you do not request a fresh magic link. If the agent paid for something, you are looking at the post-payment page logged in as that test user.

Why does this matter when there is already a video recording of the run?

Video tells you what the agent saw. It does not let you do anything. If the agent failed on step 7 because a modal opened that the scenario did not predict, a video shows you pixels of the modal. Taking over shows you the modal in a live browser, with the DOM still attached, so you can use the Inspect button to pick the element, grab its selector, dismiss the modal, finish the flow, and use what you learned to edit the scenario in place. Video is for the past tense, takeover is for the present tense.

Does the agent automatically pick back up if I release control?

Not yet. Release ends your interactive session and goes back to view-only, but it does not re-arm the same scenario to continue from your last action. The current flow assumes a takeover is terminal for that run: you finished what you needed by hand, possibly edited /tmp/assrt/scenario.md to encode what you learned, then start a fresh run. Continuing an agent run mid-flight after a manual interlude is a harder problem (it has to reconcile your edits against the agent's working memory) and Assrt does not pretend to solve it today.