Playwright e2e testing, and the video layer nobody writes about
Every guide on this topic stops once the first page.goto lights up green. None of them talk about what you actually get when you turn on video: a screencast that is fine for a human-authored test and nearly useless on an agent-driven one. This is a guide to the recording layer: what Playwright captures by default, why the default is mute on a long agent run, and the four DOM ids Assrt injects into every page to turn a ten-minute WebM into something you can watch at 5x speed and still follow.
What a Playwright e2e test actually is
Strip away the framework tutorials and a Playwright end to end test is three things. A URL. A sequence of interactions with real pages rendered by real Chromium, Firefox or WebKit. A list of assertions that either match or do not. The default setup emits a .webm recording in test-results/ when you pass use: { video: 'on' } in your config, plus a JSON run summary and a handful of trace files you can open in the Playwright Trace Viewer. That is genuinely all of it. Everything else (page objects, custom fixtures, retry logic, sharded CI) is implementation choice on top.
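In config terms, that is a single setting. A minimal playwright.config.js sketch (everything here except `video: 'on'` is boilerplate; the `test-results/` output directory is Playwright's default):

```javascript
// playwright.config.js (sketch): only use.video is the setting discussed above.
const config = {
  use: {
    video: 'on', // write a .webm per test into test-results/
  },
};

module.exports = config;
```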
The reason that stack is attractive is the layer underneath: the Chrome DevTools Protocol gives you a real browser you can drive from a process outside it. Both Playwright the library and @playwright/mcp sit on CDP. Once you have that, the question is not "can the test click the button"; it is "what do you keep after the test ran, and can you still read it when something goes wrong six weeks from now."
The native video is a screencast of the page, not of the test
Playwright's video feature records the page, not the interaction. There is no cursor in the recording. There is no indicator for a click. There is no hint of what was typed into a field. This is fine when a human wrote the spec, because the spec file itself already tells you the story. It is a problem when the test was driven by an agent that made 40 tool calls in eight minutes, because the only sequential record of intent you have is the video, and the video is mute about who did what.
The fix lives one abstraction below Playwright. The CDP recorder captures the composited page. If you add a cursor to the DOM, the cursor gets recorded. If you add a keystroke indicator, the keystrokes get recorded. The overlay can be rendered at the highest z-index and set to pointer-events: none so it never interferes with the test itself. That is the entire idea behind Assrt's recording layer.
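The whole idea fits in a few lines. A Node-testable sketch of the idempotent injection described above: the element ids and the `window.__pias_cursor_injected` guard are from the text; the function shape is hypothetical (the real CURSOR_INJECT_SCRIPT is a string evaluated inside the page via browser_evaluate):

```javascript
// Hypothetical sketch of an idempotent overlay injector over a window-like object.
const OVERLAY_IDS = ['__pias_cursor', '__pias_ripple', '__pias_toast', '__pias_heartbeat'];

function injectOverlay(win) {
  if (win.__pias_cursor_injected) return false; // already present: no-op
  for (const id of OVERLAY_IDS) {
    const el = win.document.createElement('div');
    el.id = id;
    el.style.position = 'fixed';
    el.style.pointerEvents = 'none'; // never intercept the test's clicks
    el.style.zIndex = '2147483647';  // above everything on the page
    win.document.body.appendChild(el);
  }
  win.__pias_cursor_injected = true;
  return true;
}
```

Because the guard makes a second call a no-op, the caller can fire it before every single action without worrying about duplicate elements.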
Same agent run, two videos
A page recording at 1280x720. Pages change, elements update, but there is no cursor, no keystroke feedback, no click indicator. If a button fires a network request, the page visibly shifts state; if it does not, you cannot tell from the video whether the agent clicked it or skipped it.
- No visible cursor
- No keystroke display
- No click indicator
- Idle pages produce frame gaps
The four ids injected before every action
All four elements live in a single 66-line script called CURSOR_INJECT_SCRIPT in browser.ts:33-98. The script is called via browser_evaluate before every tool call that visually affects the page, from injectOverlay() at line 429. The guard clause on window.__pias_cursor_injected makes it safe to call on every step: if the elements already exist, the call is a no-op.
__pias_cursor
A 20 px red circle with a white border and a soft glow. position: fixed, transition: left 0.3s, top 0.3s. Glides to the center of every clicked element. Position is tracked server-side as (cursorX, cursorY) and restored without animation on each page load.
__pias_ripple
A 40 px ring on z-index 2147483646 (one below the cursor). On every click it snaps to (x, y), sets scale to 0.5, then 50 ms later scales to 2 and fades out. The frame between the two states is what the video actually captures as a click.
__pias_toast
Black rounded pill at bottom center, green monospace text, 14 px, max 80% width. Clears itself 2500 ms after the last update. Every type_text tool call appends the typed characters space-separated ('a b c'), so the viewer can read keystrokes off the video without opening the event log.
__pias_heartbeat
A 6 px green dot at the bottom right. Its only job is to run a CSS keyframe forever so the Chromium compositor keeps producing frames. Without it, idle pages produce gaps in the Playwright recording that look like the agent froze.
All four, invisibly
Every element is injected inside an idempotent guard and set to pointer-events: none, so the tests behave identically whether the overlay is there or not. The overlay is a viewing aid, not part of the test.
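The toast's text handling is the only one of the four with real logic in it. A sketch of that logic as pure functions (only the 'a b c' spacing and the 2500 ms clear window are from the text; the function names and timer wiring are illustrative):

```javascript
// Hypothetical sketch of the __pias_toast update path.
const TOAST_CLEAR_MS = 2500;

function formatToast(typed) {
  return typed.split('').join(' '); // 'abc' -> 'a b c'
}

function updateToast(toastEl, typed, setTimer, clearTimer) {
  toastEl.textContent = formatToast(typed);
  clearTimer(toastEl.timer); // restart the countdown on every keystroke batch
  toastEl.timer = setTimer(() => { toastEl.textContent = ''; }, TOAST_CLEAR_MS);
}
```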
The heartbeat is the least obvious element and the one worth the whole page
A 6 px green dot at bottom: 8px; right: 8px with a Web Animations API keyframe. Two states (opacity 0.2, scale 0.8 to opacity 0.8, scale 1.2), 800 ms per iteration, infinite iterations, alternating direction, ease-in-out. That is the whole thing. It exists because Chromium's compositor is allowed to skip frames when nothing on the page has changed, and an agent that is waiting for a slow signup email produces a dashboard that is literally unchanged for ten seconds at a time. The Playwright CDP recorder sees no updates, drops frames, and your video timeline shows a nine-second gap that reads like a freeze. The heartbeat forces a repaint every 400 ms forever. When you watch the WebM back, the dot is a tiny blinking pixel you might not even notice; the recording stays smooth, which is all that matters.
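The two keyframes and the timing options above, written out as the plain objects you would hand to Element.animate() (Web Animations API). The values are the ones the text lists; the variable names are illustrative:

```javascript
// Heartbeat animation data as described in the text.
const heartbeatKeyframes = [
  { opacity: 0.2, transform: 'scale(0.8)' },
  { opacity: 0.8, transform: 'scale(1.2)' },
];
const heartbeatTiming = {
  duration: 800,          // ms per iteration
  iterations: Infinity,   // never stop, so the compositor never idles
  direction: 'alternate', // ping-pong between the two states
  easing: 'ease-in-out',
};
// In the page this becomes:
//   document.getElementById('__pias_heartbeat')
//     .animate(heartbeatKeyframes, heartbeatTiming);
```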
The agent loop, end to end
The overlay is one step inside a longer loop. Every action the agent takes flows through the same six stages. Nothing runs in parallel; the video is a strict serialization of what happened.
One action, start to finish:
1. Agent picks action (click, type, navigate)
2. injectOverlay() injects the four ids behind the idempotent guard
3. showClickAt() animates the cursor and ripple
4. The Playwright action runs via @playwright/mcp
5. The CDP recorder encodes the 1600x900 WebM
6. Range-seekable playback, 5x by default
Inputs into the runner, artifacts out the other side
Playback: a 5x player with Range seek, not a downloaded file
A .webm on disk is fine. A .webm you can open instantly at 5x, seek by five-second arrow-key presses, and swap between five playback speeds with the number row is materially different. The Assrt runner writes a self-contained HTML player to disk and starts a persistent Node HTTP server bound to a random localhost port (ensureVideoServer() at server.ts:118-215). That server implements the HTTP Range header with status 206 Partial Content, so the browser seeks to any timestamp without redownloading the whole file. On a 15 MB recording this is invisible. On a long background suite with 300 MB of WebMs, it is the difference between a debuggable artifact and one you do not bother to open.
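The byte-range arithmetic behind a 206 response is small enough to sketch. The real ensureVideoServer() in server.ts presumably does something similar; this parseRange() is an assumed shape, not its actual code:

```javascript
// Sketch of Range-header parsing for a 206 Partial Content handler.
function parseRange(header, fileSize) {
  const m = /^bytes=(\d*)-(\d*)$/.exec(header || '');
  if (!m || (m[1] === '' && m[2] === '')) return null; // no usable range: serve 200 + full body
  const start = m[1] === '' ? fileSize - Number(m[2])  // suffix form: bytes=-500
                            : Number(m[1]);
  const end = m[2] === '' || m[1] === '' ? fileSize - 1
                                         : Math.min(Number(m[2]), fileSize - 1);
  if (start < 0 || start > end) return null;           // unsatisfiable
  return {
    start,
    end,
    status: 206,
    headers: {
      'Content-Range': `bytes ${start}-${end}/${fileSize}`,
      'Content-Length': String(end - start + 1),
      'Accept-Ranges': 'bytes',
    },
  };
}
```

The handler then streams only `end - start + 1` bytes of the .webm, which is why arrow-key seeks land instantly even on a 300 MB recording.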
“The default speed is 5x because an agent run is often four to eight minutes, and that is too long to watch at real time when you are looking for one bug.”
server.ts generateVideoPlayerHtml, lines 35-111
Numbers that describe the recording layer
Four numbers that bound the design. If you are building your own overlay or comparing recording layers, these are the knobs worth looking at.
- 1600x900: viewport and WebM resolution, hard-coded in browser.ts
- 800 ms: heartbeat keyframe duration, alternating direction, infinite iterations
- 2500 ms: toast visible duration after the last keystroke
- 5x: default playback speed in the auto-opened HTML player
Flows worth recording at all
Not every test wants video. A pure data-mapping check is better read as a passing green dot and a small JSON assertion file. The shape of flow that pays for the overlay is the one that crosses a browser, a third-party service, and an async signal.
Writing the scenario, not the spec
The input to the runner is not a .spec.ts file. It is a plain markdown file with one-line case headers and bullet steps. The runner reads it, the agent picks elements from a fresh accessibility snapshot on every action, and the overlay turns the whole run into a video you will actually sit through.
The case headers are the only required syntax. Everything under them is read as intent, not as a literal command. The agent is free to reach each assertion in whatever sequence the page actually requires, and the video records whatever path it chose.
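A minimal scenario file in that shape might look like this. The case name and steps are invented for illustration; only the #Case header syntax and the bullet-step convention are from the text:

```markdown
#Case new user can sign up
- open the signup page
- fill in a fresh email and a password
- submit the form
- expect a confirmation message on screen
```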
Plain Playwright video versus Assrt overlay, line by line
| Feature | Playwright default video | Assrt (overlay on top of @playwright/mcp) |
|---|---|---|
| What the .webm actually shows | A page recording. No cursor, no keystrokes, no click indicators. You can see pages change but not which element was clicked. | A 20 px red cursor that glides to each element, a click ripple that snaps on every interaction, a black keystroke toast, a silent compositor heartbeat. |
| What happens on an idle page between steps | Chromium's compositor skips frames. The recording has one-second gaps that look like freezes. Timestamps drift from the real run. | __pias_heartbeat pulses every 400 ms via Web Animations API, Infinity iterations. The recorder always has something to encode. |
| How you watch the recording back | Open the .webm in any video player at 1x. Scrub frame by frame to find the moment a test failed. | Auto-opened HTML player at 5x by default, keyboard shortcuts 1/2/3/5/0 for speed, arrow keys seek 5 s. Served from a persistent Range-capable HTTP server so seek is instant. |
| How a test is written | A .spec.ts file with page.click, page.fill, page.waitForSelector and expect assertions. You maintain selectors by hand. | A markdown file with #Case headers and bullet steps. No selectors. The agent picks elements from a fresh accessibility tree on every action. |
| What you own at the end of a run | A .webm in test-results/, a test-results.json per spec, no built-in viewer. The artifacts are fine; the legibility is up to you. | The overlay-enhanced .webm, a JSON event log with assertions and evidence, one screenshot per visual step, and a self-contained player. MIT licensed. Everything on disk. |
Both paths assume a working dev server. A wedged preview URL stays wedged; the overlay does not fix infrastructure.
Where to start
If you already have a Playwright setup you like, keep it. The overlay is orthogonal to the spec style you are using. Open Assrt the first time you find yourself staring at a five-minute video trying to understand what the agent or the script actually did. A plain run is a single command.
The player opens on its own at 5x. The cursor glides to every element. The keystroke toast shows up at the bottom. The heartbeat pulses in the corner. The Playwright primitives run exactly as they did before; the only difference is that the recording is legible now.
Want to see the overlay on one of your current flows?
Thirty minutes with the founder. Bring one flaky Playwright run; we'll replay it with the overlay on and leave you the .webm and the JSON log.
Frequently asked questions
What is end to end testing in Playwright, as concretely as possible?
An end to end test in Playwright is a script that opens a real Chromium, Firefox, or WebKit instance, walks through a user scenario by calling page.goto, page.click, page.fill and page.waitForSelector, and fails if any expect assertion does not match. The defining features are (1) a real rendering engine rather than a mocked DOM, (2) a single-file scenario that starts from a URL and ends on a verified outcome, and (3) a context object that persists cookies, localStorage and permissions across steps. Assrt runs on top of the official @playwright/mcp server, so every action is still a Playwright primitive. What changes is who writes the scenario and how you watch it back.
Playwright already has video recording. What is broken about it on an agent-driven run?
Playwright's video feature is a screencast of the page. It does not show a cursor, it does not show click coordinates, and it does not show keystrokes. On a human-written test you already know what the script did, so you are using the video mostly to confirm a visual state, and that is fine. On an agent-driven run the video is the only artifact that preserves ordering, intent and timing across 40-plus tool calls. Without a visible cursor and keystroke indicator you are reduced to scrubbing through still frames to figure out which button the agent actually clicked. Assrt fixes this by injecting a DOM overlay layer before every action, so the recording shows a red cursor gliding to each element, a green keystroke toast at the bottom of the page, and a click ripple that expands on every interaction. Source: browser.ts lines 33 to 98.
What exactly does Assrt inject into the page?
Four absolutely positioned DOM elements, all on z-index 2147483647 (the highest 32-bit integer, so nothing covers them): a 20px red cursor with a white border and a soft red shadow (id __pias_cursor), a 40px ring ripple that scales from 0.5 to 2 on every click (id __pias_ripple), a black rounded toast at the bottom center of the screen that shows typed characters in green monospace (id __pias_toast), and a 6px green dot in the bottom right corner that pulses forever (id __pias_heartbeat). The injection is idempotent: a window.__pias_cursor_injected flag prevents re-injection across re-renders. The exact script is CURSOR_INJECT_SCRIPT in src/core/browser.ts at lines 33 to 98. It is called via browser_evaluate before every action through injectOverlay() at line 429.
Why is the heartbeat dot necessary? It seems like a decorative detail.
It is the least obvious element and the most load-bearing. Chromium's compositor only produces video frames when something on the page has actually changed. On an idle dashboard between an agent's tool calls, nothing changes, so the Playwright CDP recorder skips frames or drops chunks entirely, and the video timeline ends up with one-second gaps that look like the agent froze. The heartbeat is a 6px dot with a CSS Web Animations API keyframe that runs 800 milliseconds per iteration, iterations: Infinity, direction: alternate. It forces a compositor repaint every 400 ms forever, so the Playwright recorder always has something to encode. When you watch the WebM at 5x, the dot is a tiny blinking green pixel in the bottom right. The video stays smooth.
How is the cursor position kept correct when the page navigates?
McpBrowserManager tracks the last cursor position server-side as two numbers, cursorX and cursorY (initialized to 640, 400 which is roughly screen center). Before every action the agent calls injectOverlay(), which reinjects the four DOM elements if they are not already there, then restores __pias_cursor to the stored (x, y) with a one-frame transition: none so it does not animate in from off-screen, and then re-enables the smooth transition 50 ms later. The result: every new page load starts with the cursor already where the last page left it, and animates smoothly to the next element.
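The restore step described above is a two-phase style toggle: place the cursor with transitions off so it does not glide in from off-screen, then turn the glide back on 50 ms later. A sketch under those assumptions (the real injectOverlay() presumably does this inside the page; function and parameter names here are illustrative):

```javascript
// Hypothetical sketch of cursor restoration after a page load.
function restoreCursor(cursorEl, x, y, schedule) {
  cursorEl.style.transition = 'none'; // jump instantly, no glide-in
  cursorEl.style.left = `${x}px`;
  cursorEl.style.top = `${y}px`;
  schedule(() => {
    cursorEl.style.transition = 'left 0.3s, top 0.3s'; // smooth glide resumes
  }, 50);
}
```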
How do you play the resulting video back? Is there a viewer?
Assrt ships a self-contained HTML player that is written to disk and auto-opened when a run finishes. It is a plain <video> tag with custom keyboard controls: Space toggles play and pause, the number keys 1, 2, 3, 5 and 0 set playback rate to 1x, 2x, 3x, 5x and 10x, left and right arrows seek by five seconds. The default speed is 5x because an agent run is often four to eight minutes, and that is too long to watch at real time when you are looking for one bug. Behind the scenes, the .webm is served from a persistent Node HTTP server on a random port that implements HTTP Range requests (status 206 Partial Content). That matters because it lets the browser seek to any frame instantly instead of redownloading the whole file. Source: server.ts lines 35 to 215.
Can I run a Playwright e2e test without any of this overlay stuff and still use Assrt?
Yes. The overlay is applied by the Assrt agent loop, not by Playwright itself. If you invoke @playwright/mcp directly (for example from your own agent or a Playwright spec file), you get the stock Playwright behavior and a stock WebM. If you call Assrt's CLI or the assrt_test MCP tool, you get the overlay plus the persistent video server plus the 5x player. Nothing about the overlay requires Assrt to own your test plans. The agent loop layer is optional, though the overlay is where most of the value lives on day one.
How is a test actually written? Do I author Playwright code or does Assrt generate it?
Neither in the usual sense. You write a plain-English plan file at /tmp/assrt/scenario.md with #Case blocks: a one-line header and three to five bullet steps under each header. The runner reads that file, builds a Playwright-backed agent loop on top of @playwright/mcp, and decides what to click and type at each step from a fresh accessibility tree snapshot. You never see a locator string. The scenario file is watched via fs.watch() with a one-second debounce (scenario-files.ts:97-103), so you can edit it while a run is in progress and the next case picks up the new version. If you want a Playwright .spec.ts file at the end, Assrt can export one, but the primary artifact is the scenario + the video + a JSON event log.
What resolution and format is the recording? Can I change it?
1600 by 900 WebM with the VP9 codec, hard-coded in browser.ts at lines 296 and 628 as --viewport-size 1600x900 and size: { width: 1600, height: 900 }. The resolution is chosen because it is wide enough to see sidebar and main content together on most modern SaaS apps, and because the compression on a mostly-static page at that size keeps a ten-minute run under about 15 MB. You can override the viewport by passing the viewport option into assrt_test, but there is no current knob for the output codec. WebM is what @playwright/mcp writes; Assrt does not re-encode.
Does the overlay affect the test itself? Could the cursor block a click?
No. Every overlay element has pointerEvents: 'none' in its inline style, so clicks pass through to the real page. The cursor is a pure visual, not an actual system cursor; the agent clicks elements via Playwright's element reference, not by moving a real mouse. The ripple, the toast and the heartbeat also do not receive events. A test that uses the overlay and a test that does not are equivalent from the page's point of view; the only difference is what appears in the recorded frames.
Is this open source, and where does my data go?
The Assrt agent and the @assrt-ai/assrt CLI are MIT licensed. When you run a test locally, the URL, the accessibility snapshot and any screenshots go directly from your machine to the LLM endpoint you configured (Anthropic by default, configurable via ANTHROPIC_BASE_URL). The video file stays on disk at /tmp/assrt/ and is served only over localhost by the persistent video server. Nothing leaves the machine unless you explicitly opt in to cloud sync, which uploads the plan and artifacts to app.assrt.ai under a scenario UUID. Compare against the $7.5K per seat per month closed runners that route every DOM through their backend by design.