E2E testing, beginner edition

E2E testing for beginners: the first test you can actually watch.

Every beginner guide on this keyword says the same things: the testing pyramid, pick Cypress or Selenium, set up a test environment, write a script, add it to CI. None of them address the first real beginner panic, which is that you hit npm test and sit there for 30 seconds with no idea if the browser did anything. This page fixes that.

Matthew Diakonov, Written with AI

Published April 18, 2026 · 9 min read

4.8 from Assrt MCP users

Tests are written as Markdown #Case blocks in plain English

Red cursor + ripple + keystroke toast injected into the page

WebM video of every run, auto-opens in a playback tab

One sentence, for beginners

The browser runs your test. Assrt paints a cursor onto the page so you can see it working.

No DSL to learn. No selectors to memorize. The test is English, the output is a video, and the cursor on that video is 20 pixels wide and red.

Install npx assrt-mcp

E2E testing for beginners

Your first test, recorded, with a cursor you can actually see.

Write a plan in English, in a .md file

Agent runs it using real Playwright

20px red cursor glides between targets

Every click leaves a ripple ring

Keystrokes show up as a toast banner


Why every "beginner" E2E guide leaves you stuck

Read the top ten results for this keyword: Bunnyshell, IBM, BrowserStack, Microsoft's playbook, Testim, Testmu. They all teach the same abstract shape: E2E sits at the top of the testing pyramid, covers whole user journeys, is slower than unit and integration tests, and you should pick a tool (Cypress, Selenium, Playwright, Appium) based on whether you are testing web or mobile. Good context, but the moment you try to actually write and run a test, none of that helps.

The unwritten assumption is that a beginner will happily read their runner's stdout, open an HTML report, click into the failing spec, parse a selector timeout, and cross-reference the stack trace with the test file they wrote ten minutes ago. A beginner will not do any of that. They will hit run, see PASS or FAIL, and either feel good or feel lost. If the runner gives them no way to see what happened, the guide has failed them.

The first test: English in a .md file

Here is what your first test looks like. Two scenarios, 4 lines of English each, no imports, no config. Save it as scenario.md anywhere. That file IS the test.

scenario.md

Notice what is missing. No describe() or it(). No await page.locator(...). No selector strings. No fixture files. The agent reads your sentence, looks at the page, and picks the element that matches. If the page changes and the button moves, the plan still says "Click Get started" and still works.

The beginner panic: headless runs tell you nothing

This is the real difference between a typical beginner E2E run and an Assrt run. Same Playwright underneath. Different thing to look at when it finishes. Here is the typical run.

You kick off the test. Your terminal sits there. 27 seconds later you see either PASS or FAIL with a line number. If it failed, you get a WebM where the browser ran but you have no idea which element the runner thought was 'the submit button'. You click into the HTML report and start parsing selector timeouts.

No visible cursor
No indication of click location
No keystroke feedback
Frames drop when the page is idle
Failures debugged via stack trace

The anchor fact: what Assrt actually injects into your page

Every beginner guide stops at "record a video." Assrt goes one level deeper. Before the first action on every page, a script runs inside the page under test and adds four DOM elements. They have pointer-events: none so they cannot interfere with your app, and they sit at z-index: 2147483647 so they always render on top. Here is the actual source.

assrt-mcp/src/core/browser.ts:33-98

Four elements, one purpose: make the video you get back look like a screen recording of a person using your app. The cursor uses a 0.3-second CSS transition on left and top, so when the agent moves from one button to another, the cursor glides instead of teleporting. The ripple fires on every click with a 50ms expand-and-fade. The toast catches every typed key so you can see letters land as they land. The heartbeat, tucked at bottom: 8px; right: 8px, is the weirdest one, and it has the most specific job.
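As a rough illustration of the four overlays described above, here is a minimal sketch that turns the numbers quoted on this page into inline CSS. It is not the contents of browser.ts: the `OverlaySpec` type, the `overlayCss` helper, and the element ids are hypothetical names chosen for this example, and only the sizes, colors, and shared rules stated in the text are used.

```typescript
// Hypothetical sketch. The real overlay script lives in
// assrt-mcp/src/core/browser.ts and may be organized very differently.
const OVERLAY_Z_INDEX = 2147483647; // maximum signed 32-bit integer

interface OverlaySpec {
  id: string;            // assumed element id, for illustration only
  sizePx: number | null; // null when the size follows the content (the toast)
  extraCss: string;      // element-specific rules taken from the description
}

const overlays: OverlaySpec[] = [
  // 20px red cursor that glides over 0.3s via a transition on left and top
  { id: "cursor", sizePx: 20, extraCss: "border-radius:50%;background:rgba(239,68,68,0.85);transition:left 0.3s,top 0.3s" },
  // 40px ring shown on every click
  { id: "ripple", sizePx: 40, extraCss: "border-radius:50%;opacity:0" },
  // monospace banner for typed keys
  { id: "toast", sizePx: null, extraCss: "font-family:monospace" },
  // 6px dot pinned to the bottom-right corner
  { id: "heartbeat", sizePx: 6, extraCss: "bottom:8px;right:8px;border-radius:50%" },
];

// Every overlay shares the same three safety rules: fixed positioning,
// no pointer events, and the highest z-index.
function overlayCss(spec: OverlaySpec): string {
  const size = spec.sizePx === null ? "" : `width:${spec.sizePx}px;height:${spec.sizePx}px;`;
  return `position:fixed;pointer-events:none;z-index:${OVERLAY_Z_INDEX};${size}${spec.extraCss}`;
}
```

Keeping the shared rules in one helper makes the safety property easy to audit: no overlay can be given pointer events or a lower z-index by accident.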

Why the heartbeat exists

Chromium's compositor skips frames when nothing on the page is animating. That is an efficiency win in production. In a test recording it means your WebM literally drops frames while your app is waiting for a network response, and when you replay the video it looks like the runner froze. The heartbeat is a 6-pixel dot with an opacity animation that runs every 800ms forever. It is small enough that no reviewer notices it. It is constant enough that the compositor never goes idle. Your replay stays smooth even across 10-second waits.

How the overlay gets in, and how it survives navigation

The script is not a browser extension. It is a plain browser_evaluate call against the Playwright MCP server, guarded by a flag on window so it only runs once per page. Each time the agent navigates to a new URL, the new DOM wipes the overlay and navigate() re-injects it. The cursor's last known position is tracked server-side so the cursor doesn't teleport from off-screen on every page load.
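The inject-once guard and the server-side cursor memory described above can be sketched roughly as follows. This is an assumption-laden illustration, not Assrt's code: `OverlayInjector`, `EvaluateFn`, `rememberCursor`, and `onNavigate` are hypothetical names, and `EvaluateFn` stands in for whatever actually sends a script to the page. Only the `window.__pias_cursor_injected` flag name comes from this page.

```typescript
// Hypothetical sketch of "inject once per page, re-inject after navigation".
// EvaluateFn is a stand-in for the call that runs a script inside the page.
type EvaluateFn = (script: string) => Promise<void>;

class OverlayInjector {
  private evaluate: EvaluateFn;
  private lastX = 0; // last known cursor position, kept outside the page
  private lastY = 0;

  constructor(evaluate: EvaluateFn) {
    this.evaluate = evaluate;
  }

  // Called after each click so the position survives a page load.
  rememberCursor(x: number, y: number): void {
    this.lastX = x;
    this.lastY = y;
  }

  // The flag on window makes a second injection on the same page a no-op.
  buildScript(): string {
    return `if (!window.__pias_cursor_injected) {
  window.__pias_cursor_injected = true;
  /* create the overlay elements here, cursor starting at ${this.lastX},${this.lastY} */
}`;
  }

  // A new document has no flag and no overlay, so the script runs again.
  async onNavigate(): Promise<void> {
    await this.evaluate(this.buildScript());
  }
}
```

Because the position lives on the injector rather than in the page, the freshly injected cursor can start where the previous one stopped.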

How the cursor reaches your page

The exact numbers, from the source

Round numbers you can verify by opening assrt-mcp/src/core/browser.ts and reading lines 33 to 98 yourself.

20px Red cursor diameter

300ms Cursor glide between targets

40px Click ripple diameter

2500ms Keystroke toast fade

6px Heartbeat dot diameter

800ms Heartbeat pulse period

2147483647 z-index on every overlay

0kChars max per snapshot before truncation

A real first run, line by line

This is what a first-time beginner sees in the terminal after writing the two-case plan above. Each line maps to a tool the agent called. The final line names the WebM file that auto-opens in your default browser with a custom player at 1x, 2x, 3x, 5x, and 10x playback speeds.

npx assrt-mcp run --url http://localhost:3000

Beginner fears, and what the runner does about each one

Every item below is a concrete thing Assrt ships, not a promise. If something stops working for you, you can open the source and read the function that powers it.

I cannot tell if the test is doing anything

Headless runs look like a frozen terminal. Assrt records a video with a visible cursor that glides between targets over 0.3s, so every click is something you can point at.

I do not know which selector to write

You do not write selectors. The plan is English, and the agent picks a click target by matching the text you wrote against the accessibility tree at runtime.

I do not know how long to wait

You do not pick a number. wait_for_stable injects a MutationObserver and returns when the DOM stops mutating for 2 seconds. Adapts to the page, not the clock.

I cannot handle the email verification code

create_temp_email spins up a disposable inbox, and wait_for_verification_code polls it for up to 120s. Your first test can include signup without a Mailosaur account.

I cannot read the failure output

The failure is a WebM video plus the tool-call transcript. You watch the cursor hit the wrong button instead of parsing a selector timeout stack trace.

I will break production

Assrt launches a local Chromium by default and points it at a URL you pass in. Nothing in the runner writes to your prod database unless your plan says to.
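The "wait until the DOM has been quiet for 2 seconds" idea behind wait_for_stable, mentioned above, can be sketched as a small quiet-period tracker. This is a hypothetical illustration of the technique, not Assrt's implementation: `QuietPeriodTracker` and its methods are invented names, and timestamps are passed in explicitly so the logic can be followed without a browser.

```typescript
// Hypothetical sketch of quiet-period detection; not Assrt's wait_for_stable.
class QuietPeriodTracker {
  private quietMs: number;
  private lastMutationAt: number;

  constructor(quietMs: number, startedAt: number) {
    this.quietMs = quietMs;
    this.lastMutationAt = startedAt;
  }

  // Call this whenever the page changes. In a real page, a MutationObserver
  // callback would call recordMutation(Date.now()).
  recordMutation(at: number): void {
    this.lastMutationAt = at;
  }

  // Stable means: no mutation has been seen for at least quietMs.
  isStable(now: number): boolean {
    return now - this.lastMutationAt >= this.quietMs;
  }
}

// Example with a 2000ms quiet window, timestamps in milliseconds.
const tracker = new QuietPeriodTracker(2000, 0);
tracker.recordMutation(1500);           // page still changing at t=1500
const stillBusy = tracker.isStable(3000); // false: only 1500ms of quiet
const settled = tracker.isStable(3500);   // true: 2000ms of quiet
```

The wait ends relative to the last change rather than at a fixed time, which is why a slow page simply extends the wait instead of failing it.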

Your first E2E test, step by step

Five steps, about ten minutes

Install the MCP server

In any project: npx assrt-mcp. No daemon, no port, no dashboard. It registers itself over stdio with your coding agent the first time you use it.

Write one #Case in a text file

Open scenario.md. Type a single #Case header and 3 lines of English. Example: Navigate to /. Verify the main heading contains your app name.

Run it against localhost

npx assrt-mcp run --url http://localhost:3000 --plan-file scenario.md. Watch the terminal log each tool call as the agent interprets your English.

Watch the replay

A video player auto-opens in your browser when the run finishes. You see the red cursor move between elements. The scrubber lets you step through frame by frame.

Add case two when case one passes

Signup is usually the next thing to test. The create_temp_email and wait_for_verification_code tools handle the inbox so you never leave the plan file.
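For readers curious how an inbox wait with a time limit generally works, here is a minimal sketch of deadline-bounded polling, using the 120s limit mentioned for wait_for_verification_code. Everything else is an assumption for illustration: `pollForCode`, `extractCode`, the `Clock` type, the 2-second interval, and the 4-to-8-digit pattern are hypothetical and are not taken from Assrt's source.

```typescript
// Hypothetical sketch of polling an inbox until a code arrives or time runs out.
type Clock = { now: () => number; sleep: (ms: number) => Promise<void> };

// Pull the first standalone run of 4 to 8 digits out of an email body.
function extractCode(body: string): string | null {
  const match = body.match(/\b\d{4,8}\b/);
  return match ? match[0] : null;
}

async function pollForCode(
  fetchLatestBody: () => Promise<string | null>, // null means "no email yet"
  clock: Clock,
  timeoutMs: number = 120_000, // the 120s limit described above
  intervalMs: number = 2_000   // assumed polling interval
): Promise<string> {
  const deadline = clock.now() + timeoutMs;
  while (true) {
    const body = await fetchLatestBody();
    const code = body === null ? null : extractCode(body);
    if (code !== null) return code;
    // Give up if the next check would land after the deadline.
    if (clock.now() + intervalMs > deadline) {
      throw new Error(`No verification code within ${timeoutMs}ms`);
    }
    await clock.sleep(intervalMs);
  }
}
```

Passing the clock in as a parameter is a design choice for the sketch: it lets the loop be exercised with a fake clock instead of real two-second waits.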

Good signs in your first run

If you see these in the replay, the runner is working. If one is missing, something is off and the cause is usually specific.

What a healthy first run looks like

Red cursor appears in the top-left corner of the recorded video within 0.5s of the first action
Cursor glides smoothly between click targets instead of teleporting
Every click produces a visible 40px ripple that expands and fades
Every typed character shows up as a green monospace toast at the bottom of the page
The small green dot in the bottom-right corner is pulsing continuously
The terminal prints a video path and an auto-opened playback tab appears
If something fails, the last thing the cursor did is obvious from the video

Write your first E2E test with a cursor you can see

One npx command. Plan is English in a .md file. Video auto-opens with 1x to 10x playback. You will know what the runner did because you watched it happen.

Install npx assrt-mcp →

E2E testing for beginners: specific answers

I have never written an automated test. Is E2E testing actually a reasonable first step, or should I start with unit tests?

For a beginner building a small web app, starting with one good E2E test is often more motivating than writing ten unit tests. E2E asks a single concrete question: when a real user clicks through my app, does the thing they care about still work. Unit tests are worth it later, but they require you to have an opinion about the internal shape of the code, which a beginner often does not yet have. With Assrt you can write your first E2E test in plain English as a Markdown #Case, hit run, and watch the browser do it. That loop teaches you more about your own app in 10 minutes than reading a testing-pyramid article.

What is the difference between E2E testing and integration testing in practice?

Integration tests call your code across module boundaries and assert on the return value. E2E tests drive the actual UI your users see: they click buttons, type in inputs, wait for real network responses, and assert on what is on the screen. If your app is a web form that submits to an API, an integration test might call the API handler directly with a fake request body; an E2E test opens the page, types into the form, presses Submit, and verifies the success message appears. For beginners, E2E is usually the easier one to start with because the test reads like a description of what you would do manually.

Do I need to learn a programming language before I can write an E2E test?

Not with Assrt. The test format is Markdown: a header line like #Case 1: Sign up with email followed by 3-5 imperative English lines. Example: Navigate to /signup. Use a disposable email. Enter the OTP. Verify the dashboard heading appears. A coding agent interprets that, picks browser actions at runtime, and runs them through real Playwright. You never import a test framework. You never wire up a page-object pattern. If you can write a commit message or a Jira ticket, you can write an Assrt #Case. Traditional tools (Cypress, Selenium, Playwright directly) do require JavaScript or Python, and that is still a fine path, just not the only one.
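To make the plan format concrete, here is a minimal sketch that splits a plan into cases and steps, using the #Case header shape and the example lines quoted above. It is only an illustration of the file's structure: `parsePlan` and `PlanCase` are hypothetical names, and Assrt hands the plan to an agent rather than necessarily parsing it this way.

```typescript
// Hypothetical sketch: shows the shape of a plan file, not Assrt's own parser.
interface PlanCase {
  title: string;
  steps: string[];
}

function parsePlan(markdown: string): PlanCase[] {
  const cases: PlanCase[] = [];
  let current: PlanCase | null = null;
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue; // blank lines only separate cases
    const header = /^#Case\s*\d*\s*:?\s*(.*)$/.exec(line);
    if (header) {
      current = { title: header[1], steps: [] };
      cases.push(current);
    } else if (current) {
      current.steps.push(line); // every other line is one English step
    }
  }
  return cases;
}

const examplePlan = `#Case 1: Sign up with email
Navigate to /signup.
Use a disposable email.
Enter the OTP.
Verify the dashboard heading appears.

#Case 2: Landing page loads
Navigate to /.
Verify the main heading contains your app name.`;

const parsed = parsePlan(examplePlan);
```

Here `parsed` holds two cases, the first with four steps and the second with two. The point is that the whole file is a header line plus sentences, with nothing else to learn.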

What does it mean when a test runs headless, and why is that bad for beginners?

Headless means the browser does not open a window: it runs in the background with no visible UI. For production CI that is great, because it is faster. For a beginner running a test for the first time it is disorienting, because you get a terminal that sits there for 30 seconds and then prints PASS or FAIL with no sense of what actually happened. Assrt records a real video of every run, and that video shows the injected red cursor moving between elements, ripple animations on each click, and keystroke toasts for every typed character. When a test fails, a beginner can watch the replay and immediately tell whether the agent clicked the wrong button, typed the wrong text, or did not wait long enough.

Exactly what gets injected into the page under test when Assrt runs?

Four elements, all at z-index 2147483647 (the maximum signed 32-bit integer) so they always float above your app's own DOM. First, a 20px red cursor (rgba(239,68,68,0.85), white border, soft red glow) that glides between click targets over 0.3 seconds using a CSS transition. Second, a 40px ripple ring that scales from 0.5x to 2x on each click and fades to zero opacity in 50ms. Third, a monospace green-on-black keystroke toast that displays the last key pressed and fades after 2500ms. Fourth, a 6px green heartbeat in the bottom-right that animates opacity between 0.2 and 0.8 every 800ms, whose only job is to force continuous compositor frames so the CDP screencast video never stalls. The script that creates them lives at assrt-mcp/src/core/browser.ts, lines 33-98, and is re-injected on every navigate() call.

Why is the heartbeat dot there? It does not look functional.

Because browsers optimize away frame rendering when the page is idle. If the test is waiting for a slow network response and nothing on the page is moving, the CDP (Chrome DevTools Protocol) screencast that captures the video will start dropping frames. When you replay, it looks like the test froze, which is confusing for a beginner who is already nervous about whether anything is working. The heartbeat is a 6-pixel dot in the bottom-right that animates every 800ms forever. It is small enough that it does not distract, and its continuous animation forces the compositor to keep producing frames, so your video stays smooth even during long waits.

If Assrt injects DOM elements into my page, does that change how my app behaves?

No, and you can verify this by reading the inject script. Every overlay element has pointer-events: none so it cannot intercept clicks or hover events. Every overlay is position: fixed with z-index 2147483647, which takes it out of your normal layout flow. The injection is guarded with if (!window.__pias_cursor_injected) so it only runs once per page. When the page navigates, the new DOM wipes the overlay and navigate() re-injects it. Nothing the overlay does can affect your app's click handlers, form state, animations, or layout. The only thing it changes is what the video looks like when you replay it.

I installed the tool. What is a reasonable first test to try?

Point Assrt at your app's landing page and ask it to verify the main heading. That is it. A one-line #Case like 'Navigate to /. Verify the main heading contains Your App Name' runs in under 2 seconds and confirms your test pipeline works end to end: browser launch, navigation, snapshot, assertion, video recording, pass/fail exit code. Once that passes, add a second case for whatever matters most in your app, usually signup or sign-in. If signup involves an email verification code, the built-in create_temp_email and wait_for_verification_code tools handle the inbox round-trip without you wiring Mailosaur.

What happens when my first test fails? How do I debug it without knowing what I am doing?

Three things land on disk: a WebM video of the full run, a text transcript of every tool call the agent made (click what, type what, assert what), and a screenshot at the failure point. Open the video first. The injected cursor shows you exactly where the agent tried to click; the ripples show you exactly when a click fired; the toasts show you each keystroke. If the cursor landed on the wrong element, your plan line was ambiguous and you can rephrase it. If the cursor landed on the right element but the click did not do anything, the page was still loading and you add wait_for_stable. You are not guessing at stack traces. You are watching a video of a bot using your app.

Is this approach production-safe, or is it just a learning tool for beginners?

It is the same code path either way. The overlay injection is purely cosmetic and runs in local and CI mode the same. In CI you would typically set --no-auto-open so the video player does not try to launch a browser on the CI runner, but the video itself still gets written to /tmp/assrt/videos and can be uploaded as a build artifact. Teams that adopted it as a beginner onboarding tool have kept it for regression runs because engineers who inherit a failing test still benefit from a replay with a visible cursor. The visibility of the runner is a feature, not training wheels.

No more invisible tests

The test is English. The cursor is red. The video is yours.

4 DOM overlays, one heartbeat, zero selector headaches.

Try Assrt free