Learn e2e testing by learning eighteen tools, not a framework.
Every other tutorial for this keyword starts with npm install and a selector worksheet. This one starts with a plan.md file and a closed vocabulary of eighteen browser tools an LLM agent picks from. You write intent in English; the runner picks the ref at runtime from the live accessibility tree. The anchor for everything in this page is the TOOLS array at assrt-mcp/src/core/agent.ts:16.
What this tutorial is not
It is not a Playwright tutorial, a Cypress tutorial, a Selenium tutorial, or a how-to-install-a-runner tutorial. The first three chapters of those are exactly the same (install the framework, set up a test runner, copy a selector into a locator call) and the first three chapters of this one do not exist. If your mental model of "writing an e2e test" ends at the moment you type page.locator(, this page is going to feel strange.
What you are going to learn instead is the set of verbs a testing agent actually needs. There are eighteen of them. They are not a DSL you design; they are the schema the agent gets in its Anthropic tools array. You will see the file. You will count the entries. You will be able to write a complete e2e test in a markdown file once you know what each verb does.
The whole vocabulary, at a glance
The anchor: eighteen tools, declared in one file
This is the non-prose part. Open assrt-mcp/src/core/agent.ts. Line 16 starts the TOOLS array. Line 196 ends it. In between you will find exactly eighteen tool schemas, each with a name, a description, and a JSON-schema input. That is the runtime for every #Case in every plan anyone ever writes against this system. There is no config that adds a nineteenth verb; the set is closed by design.
“The framework is eighteen verbs and a markdown file. We ported the entire e2e surface in a day.”
assrt-mcp/src/core/agent.ts:16
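To make the shape concrete, here is a sketch of what one entry in that array plausibly looks like, assuming the standard Anthropic tools format (name, description, input_schema). The field values are illustrative, not copied from agent.ts.

```typescript
// Sketch of one TOOLS entry in the Anthropic tools format.
// The description and schema wording are assumptions for illustration.
const clickTool = {
  name: "click",
  description: "Click an element identified by its accessibility-tree ref.",
  input_schema: {
    type: "object",
    properties: {
      ref: {
        type: "string",
        description: "Per-run ref from the latest snapshot, e.g. 'e17'.",
      },
    },
    required: ["ref"],
  },
} as const;
```

Every tool follows this shape. Note what is absent: no CSS selector field anywhere; the only way to point at an element is a ref minted by the most recent snapshot.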
What each tool actually does
Grouped by the order you will reach for them in a real test, not the order they are declared in. The email, stability, and side-effect tools at the end are the ones a selector-centric tutorial never teaches you: how you test signup with a real code, how you wait for async DOM without hand-rolling a waitForSelector chain, and how you verify that your app actually hit the external system it said it did.
The navigation quartet
navigate, snapshot, click, type_text. Ninety percent of every scenario lives here. snapshot returns the accessibility tree with a per-run ref like e5; click and type_text take that ref. No CSS, no XPath, no page objects.
select_option
Picks an option from a native <select>. Takes an array so you can select multiple in a multi-select in one call.
scroll and press_key
scroll takes a y delta (positive for down). press_key sends Enter, Tab, Escape, or any other key name. Use scroll to bring off-screen elements into the snapshot.
wait and wait_for_stable
wait blocks for visible text OR a fixed ms (capped at 10000). wait_for_stable runs a MutationObserver loop and returns when the DOM has been quiet for N seconds. Use wait_for_stable after submitting forms or triggering streaming responses; it adapts to actual load speed.
screenshot and evaluate
screenshot captures the current frame. evaluate runs a JS expression and returns the stringified result. evaluate is the escape hatch for anything the other seventeen tools cannot express, including the OTP paste trick.
create_temp_email
Spins up a real disposable inbox. Use it BEFORE filling any signup form. The address is remembered for the rest of the run so later tools can poll the inbox.
wait_for_verification_code
Polls the disposable inbox for up to 60s (configurable up to 120s). Returns the extracted numeric code, the from address, and the subject. This is the step every locator-centric tutorial skips.
check_email_inbox
Dumps the latest email from the disposable inbox as plain text. Use when the verification text is not a simple numeric code, or when you need to follow a magic link embedded in HTML.
assert
Records a named assertion with a passed boolean and a free-text evidence string. Every #Case accumulates a list of these and surfaces them in the final report; a scenario fails if any assert.passed is false.
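The accumulation rule can be sketched in a few lines. The type and function names here are illustrative, not the ones in agent.ts:

```typescript
// A case's assertions roll up into one verdict:
// the case fails if any single assert.passed is false.
// (An empty list counts as passing in this sketch.)
type Assertion = { name: string; passed: boolean; evidence: string };

function casePassed(asserts: Assertion[]): boolean {
  return asserts.every((a) => a.passed);
}
```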
complete_scenario
Ends the current case. Takes a one-line summary and an overall passed boolean. The agent is instructed to call this exactly once per #Case; the runner uses it to separate per-case reports from per-step telemetry.
suggest_improvement
Emits a structured UX bug report with title, severity, description, and suggested fix. It is not an assertion; it lets the agent flag obvious problems it noticed in passing without failing the test.
http_request
GET/POST/PUT/DELETE with 30s timeout and 4000-char response truncation. Use to verify that an in-app action produced the expected external effect: a webhook fired, a Telegram bot received the message, a Slack channel got the post.
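In hand-written code the same check looks roughly like this; in an Assrt plan it collapses into one http_request call plus an assert. The injected fetchJson parameter and the response shape are assumptions for illustration (the real Telegram getUpdates response has more fields):

```typescript
// Verify an external side-effect: did the Telegram bot receive the message?
// fetchJson is injected so the check can be exercised without a network.
type Update = { message?: { text?: string } };

async function telegramReceived(
  expectedText: string,
  botToken: string,
  fetchJson: (url: string) => Promise<{ result: Update[] }>,
): Promise<boolean> {
  const body = await fetchJson(
    `https://api.telegram.org/bot${botToken}/getUpdates`,
  );
  return body.result.some((u) => u.message?.text === expectedText);
}
```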
Your first scenario, in markdown
Three cases. No imports. No selectors. The parser splits on #Case N: and feeds each block to the agent. State carries between cases because the browser session is shared. That markdown file is the entire test suite.
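A minimal plan in that shape might look like the following. The app-specific wording is illustrative; only the #Case N: headings are structural.

```markdown
#Case 1: Sign up with a real email
Create a temp email. Go to /signup, fill the form with that address,
submit, and wait for the verification code. Enter the code and assert
the dashboard greets the new user.

#Case 2: Create a project
Click "New project", type "Demo", and save. Assert "Demo" appears in
the project list.

#Case 3: Webhook fired
Make an http_request to the webhook receiver and assert the payload
mentions "Demo".
```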
Run it end to end
Three commands. The second one is just cat, so really two. The MCP server registration at step one only happens once per machine.
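Spelled out, using the same commands and flags described in the step-by-step section below:

```shell
# One-time, per machine: register the MCP server and CLI.
npx @assrt-ai/assrt setup

# "Command" two is just cat: the plan is plain markdown.
cat plan.md

# Run the plan against your app, recording a video.
npx assrt run --url <your-url> --plan-file plan.md --video
```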
The uncopyable part: a real signup test against production
This is the section other tutorials skip. The standard advice for testing signup is "stub the SMTP server" or "add a test-mode flag that bypasses the OTP." Both of those only test your stub, not your actual auth flow. Assrt tests the real thing by pairing a disposable inbox with a specific paste expression.
The expression matters. OTP widgets usually render as six single-character <input maxlength="1"> inputs and autoadvance focus on each keystroke. Typing one digit per input races with the autoadvance handler and drops digits. Firing a single ClipboardEvent('paste') on the container runs the widget's canonical fill path. The exact expression the agent is instructed to use is hard-coded into the system prompt at lines 234-236.
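The repo's exact expression lives at agent.ts:234-236. As a hedged sketch of the pattern, the agent can hand evaluate something like this; the selector and the code value are illustrative assumptions, not the repo's wording:

```typescript
// The JS source the agent would pass to the `evaluate` tool: one
// ClipboardEvent('paste') carrying the whole code, instead of six
// keystrokes racing the widget's autoadvance handler.
const code = "123456"; // e.g. returned by wait_for_verification_code
const pasteExpression = `
  (() => {
    const input = document.querySelector(
      "input[autocomplete='one-time-code'], [data-otp] input"
    );
    if (!input) throw new Error("OTP input not found");
    const dt = new DataTransfer();
    dt.setData("text/plain", "${code}");
    input.focus();
    input.dispatchEvent(new ClipboardEvent("paste", {
      clipboardData: dt, bubbles: true, cancelable: true,
    }));
  })();
`;
```

The widget's onPaste handler then runs its canonical fill path once, instead of six racy input events.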
Why this is the hardest thing to copy
Wrapping a disposable-email SDK takes an afternoon. Wiring the DataTransfer paste into a locator-based framework is where most teams give up: there is no clean place to put it. It is not a locator, it is not a step, it is a side-channel browser command that is specific to a widget pattern. Assrt has a tool called evaluate that takes exactly this kind of expression, and a system prompt that tells the agent to use it on OTP forms. That pairing is the thing no competitor page for this keyword teaches.
How the agent picks a tool for each sentence
A #Case step is prose. A tool call is JSON. The bridge between them is an LLM given the TOOLS schema plus the accessibility tree. "Click the Sign in button" plus a tree with [role=button, name="Sign in", ref=e17] collapses into click({ ref: "e17" }). The diagram below is the actual message loop.
[Diagram: the per-run agent loop]
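Stripped of transport details, that loop can be sketched as follows. callModel and execTool are stand-ins for the Anthropic call and the MCP dispatch; the real loop in agent.ts carries richer message types.

```typescript
// One #Case: feed the steps to the model, execute each tool it picks,
// append the result, and repeat until it calls complete_scenario.
type ToolUse = { name: string; input: Record<string, unknown> } | null;

async function runCase(
  steps: string,
  callModel: (history: string[]) => Promise<ToolUse>,
  execTool: (call: NonNullable<ToolUse>) => Promise<string>,
): Promise<string[]> {
  const history = [steps];
  for (;;) {
    const call = await callModel(history); // model picks one of the 18 tools
    if (!call || call.name === "complete_scenario") break;
    history.push(await execTool(call)); // tool result feeds the next turn
  }
  return history;
}
```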
The invisible tool: wait_for_stable
Most flakes in most suites are not selector flakes; they are timing flakes. A framework tutorial tells you to add a waitForSelector or a network idle wait. Assrt has a single tool that covers both cases and a lot more: wait_for_stable. It injects a MutationObserver, polls mutation counts every 500ms, and returns when the DOM has been quiet for a configurable window. No selector, no URL pattern, no per-page wait list.
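The agent-side half of that mechanism is a quiet-window poll. This sketch assumes a getMutationCount() probe standing in for reading the page's mutation counter; the real implementation lives in agent.ts and may differ in detail.

```typescript
// Return true once the mutation counter has been unchanged for
// stableSeconds; return false if timeoutSeconds elapses first.
async function waitForStable(
  getMutationCount: () => number,
  stableSeconds = 2,
  timeoutSeconds = 30,
  pollMs = 500,
): Promise<boolean> {
  const deadline = Date.now() + timeoutSeconds * 1000;
  let last = getMutationCount();
  let quietSince = Date.now();
  while (Date.now() < deadline) {
    await new Promise((r) => setTimeout(r, pollMs));
    const now = getMutationCount();
    if (now !== last) {
      last = now;
      quietSince = Date.now(); // DOM still moving: restart the window
    } else if (Date.now() - quietSince >= stableSeconds * 1000) {
      return true; // quiet long enough
    }
  }
  return false; // timed out before the DOM went quiet
}
```

Because the condition is "no mutations for N seconds" rather than "selector X appeared", the same call works after form submits, route changes, and streaming responses alike.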
Six steps from zero to a running test
If you are reading this as a step-by-step tutorial, follow these in order. The first is a one-time install. The rest are per-scenario.
Install
Run npx @assrt-ai/assrt setup once. This registers the MCP server in Claude Code (or Cursor), drops a QA reminder hook into your CLAUDE.md, and configures the CLI. Nothing is added to your app's package.json.
Write the plan in markdown
Create plan.md. Each #Case N: heading starts a scenario. Below the heading, write 2-5 English sentences describing what to click, type, and verify. The parser at agent.ts:620-631 splits on a regex that tolerates #Case, #Scenario, and #Test with or without a leading #.
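A regex matching that description might look like the following; the real pattern at agent.ts:620-631 may differ in detail.

```typescript
// Matches "#Case 1:", "Case 1:", "#Scenario 2.", "#Test 3:", etc.
// The leading "#" is optional, per the parser's described tolerance.
const caseHeading = /^#?\s*(?:Case|Scenario|Test)\s+(\d+)\s*[:.]/;

function splitPlan(plan: string): string[] {
  // Each heading starts a new scenario block; drop any preamble
  // before the first heading.
  return plan
    .split(/\n(?=#?\s*(?:Case|Scenario|Test)\s+\d+\s*[:.])/)
    .filter((block) => caseHeading.test(block.trim()));
}
```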
Run the plan
npx assrt run --url <your-url> --plan-file plan.md --video. The runner preflights the URL with fetch to fail fast on a wedged dev server (agent.ts:518-543), launches Playwright MCP over stdio, and streams each tool call to the terminal.
The agent picks tools
For each #Case the agent calls snapshot first, reads the ARIA tree, and then picks from the eighteen tools to execute the English sentences. It records assertions as it goes. Between cases, cookies and auth state carry over so you can chain sign-up then dashboard tests without re-logging in.
Watch the video, not the logs
When a case fails, open the auto-saved webm at /tmp/assrt/<run>/video.webm. The agent also dumps the accessibility tree at the moment of failure into the report, which is usually more useful than a stack trace.
Ask assrt_diagnose
If a whole scenario fails, run assrt_diagnose. It reads the failed case plus the live page and emits a corrected #Case in prose, not a selector patch. You review a sentence change, not a fuzzy locator diff.
What flows into, through, and out of the agent
[Diagram: one assrt run, end to end]
What a tutorial teaches
Framework-centric tutorials teach you an API surface. This one teaches a tool schema.
| Feature | Framework tutorial | Assrt |
|---|---|---|
| What you install | A framework (Playwright, Cypress, WebdriverIO) + selectors lib + assertion lib + a reporter. | One CLI and an MCP server. Nothing in your app dependencies. |
| What the test file looks like | A .spec.ts or .cy.js with imports, describe/it nesting, and page.locator() strings. | A plan.md with #Case N: headings and English sentences. |
| Size of the vocabulary you teach | A framework API surface: dozens of locator methods, matchers, fixtures, hooks, config keys. | 18 named tools at agent.ts:16. No methods to chain. |
| Signup with a real OTP | Stub the SMTP server, bypass auth with a test flag, or mock the OTP endpoint. | create_temp_email + wait_for_verification_code + the DataTransfer paste expression at agent.ts:235. |
| Waiting for async DOM | auto-wait per locator, plus hand-rolled page.waitForSelector/waitForResponse calls. | wait_for_stable, a MutationObserver quiet-window loop. No per-selector wait. |
| Verifying external side-effects | A separate suite, a mock server, or postconditions in a different framework. | http_request: poll Telegram/Slack/GitHub APIs in the same run. |
| What ages between releases | CSS/XPath strings, data-test-id coverage, visual baselines, fixture bodies. | Nothing. No stored locator, no fixture snapshot. Refs are per-run. |
| Price to start | Free frameworks, but closed SaaS runners quoted at $7,500/month. | $0, open source, self-hosted. You pay LLM tokens only. |
What to do next
- Read agent.ts lines 16 to 196 once, in a text editor. It is shorter than any Playwright config you have ever seen.
- Write a plan.md with one #Case that navigates to your homepage and asserts your hero heading is visible. Run it with npx assrt run --plan-file plan.md --url <your-url>.
- Add a #Case 2 that signs up using create_temp_email and wait_for_verification_code. Do not stub anything.
- Add a #Case 3 that runs http_request against your webhook receiver and asserts the payload. Now your e2e test covers the external side-effect too.
- When something fails, open the video first, then run assrt_diagnose; reach for the accessibility-tree dump only when those two do not explain the failure. That is enough for ninety percent of flakes.
Eighteen tools. One call.
Fifteen minutes to see Assrt run a real plan.md against your app, including the OTP flow.
Book a call →

E2E testing tutorial FAQ
What is actually different about this e2e testing tutorial?
Every top result for 'e2e testing tutorial' teaches you to install a framework (Playwright, Cypress, Selenium, WebdriverIO) and then write selectors against the DOM. This tutorial has zero framework imports. The entire vocabulary you learn is eighteen named tools in a TypeScript array at assrt-mcp/src/core/agent.ts line 16. The plan you write is a markdown file with #Case N: headings and English sentences. A Playwright MCP server is still under the hood, but you never call it directly and you never write a selector; the agent resolves each sentence against a fresh accessibility tree at runtime.
What are the eighteen tools, in order?
navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable. They are declared in assrt-mcp/src/core/agent.ts between lines 16 and 196. That file is the canonical reference; add a nineteenth tool there and every existing #Case can use it without a plan change.
How does the OTP flow work without stubs?
Three pieces. First, create_temp_email allocates a disposable inbox and saves the address for the run. Second, wait_for_verification_code polls that inbox every three seconds for up to 60 seconds (configurable up to 120), extracts the numeric code with a regex, and returns it. Third, if the verification UI uses the common split-digit layout (six single-character <input maxlength=1> fields), the agent is instructed to paste the code in a single DataTransfer ClipboardEvent rather than typing one digit per field. The exact expression lives in the system prompt at agent.ts lines 234 to 236 and is worth reading even if you never use Assrt.
Why DataTransfer instead of typing each digit?
Typing one digit per input races with the widget's autoadvance handler. Most OTP widgets move focus on input; type '1' then '2' and the second digit often lands in the wrong field, or nowhere, because focus jumped mid-keystroke. Pasting via ClipboardEvent on the container fires one onPaste on the React or Vue owner, which runs the widget's single canonical fill path. This works for the large majority of OTP components shipping today. It is the one thing in this entire tutorial you will copy into non-Assrt code, which is why it is linked from the anchor above.
What does 'wait for the page to stabilize' actually do?
It injects a MutationObserver into document.body that increments window.__assrt_mutations on every childList, subtree, and characterData change. The agent then polls that counter every 500ms. When the counter has not changed for stable_seconds in a row (default two), the wait returns. There is a timeout_seconds ceiling (default thirty). The implementation is in agent.ts lines 962-994. The point is that you never need to write waitForSelector('.loading-done') again; any DOM quietude satisfies it, which is what 'the page is done' actually means across diverse UIs.
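The injected half of that mechanism, sketched as the expression a runner could evaluate in the page; the actual code at agent.ts lines 962-994 may differ.

```typescript
// Browser-side half of wait_for_stable: count mutations into a
// well-known global, which the agent then polls every 500ms.
const observerSnippet = `
  window.__assrt_mutations = 0;
  new MutationObserver(() => { window.__assrt_mutations++; })
    .observe(document.body, {
      childList: true,
      subtree: true,
      characterData: true,
    });
`;
```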
Can I verify webhooks and external APIs in the same test?
Yes. http_request is a first-class tool at agent.ts:171. It supports GET, POST, PUT, DELETE, custom headers, a JSON body, and a 30s timeout. Responses over 4000 characters are truncated. The canonical example: after your webapp claims to have sent a Telegram message, the same #Case calls http_request with https://api.telegram.org/bot<token>/getUpdates and asserts the message is in the list. This collapses the usual two-suite setup (e2e for the UI + integration tests for the webhook) into one scenario.
How are scenarios separated, and does state carry over?
The parser at agent.ts lines 620-631 splits the plan on a regex that matches #Case, #Scenario, or #Test followed by a number and a colon (or period). Each block becomes a scenario with the rest of the block as its steps. Inside one run, all scenarios share the same browser session: cookies, local storage, auth, and URL state carry over. That is deliberate. It lets you write #Case 1 sign up, then #Case 2 create a project, then #Case 3 invite a teammate, with each case building on the last, instead of logging in three times.
What if the test fails? Do I have to debug like a Playwright test?
No. Two mechanisms take the place of the usual console-log-and-rerun loop. First, every run can record a video with --video, saved to /tmp/assrt/<run>/video.webm; the default is to auto-open the player when the run finishes. Second, assrt_diagnose takes the failure report plus the live page and returns a corrected #Case in prose, with Root Cause, Analysis, and Recommended Fix sections. The key line is that the diagnosis output is markdown you paste back into plan.md, not a selector patch you paste into a .spec.ts.
Does this replace Playwright, or sit on top of it?
It sits on top of Playwright MCP. @assrt-ai/assrt spawns a Playwright MCP server over stdio and calls browser_navigate, browser_click, browser_type, browser_snapshot, and friends underneath the eighteen tool names you see. The difference is what you write. You write #Case blocks, not .spec.ts files; the agent writes the Playwright calls. If you are already invested in Playwright, nothing is thrown away; you are adding a prose layer above the existing one.
How do I verify the claims in this tutorial without running anything?
Open one file: assrt-mcp/src/core/agent.ts. Line 16 for the TOOLS array and line 196 for where it ends; lines 234-236 for the OTP paste expression; lines 620-631 for the scenario regex; lines 962-994 for the wait_for_stable MutationObserver loop. Every non-trivial claim in this tutorial has a file-and-line anchor in that one source file so you can diff it against this page and flag anything that has drifted.