Readable Playwright test generator: when the plan IS the test
Most readable Playwright test generators give you a .spec.ts file. Assrt gives you a markdown file. The agent reads the markdown, calls the official @playwright/mcp@0.0.70 runtime, and runs your test. No codegen. No selectors. No vendor DSL. No runtime rented from a dashboard at $7,500 a month.
The angle the top SERP results miss
Search for "readable playwright test generator" and the first page agrees on one thing: the output should be a .spec.ts file. Playwright Codegen records your clicks and emits code. Record and replay SaaS tools emit code. ChatGPT prompts emit code. Every conversation about readability is really a conversation about how long it takes to scan the generated code before you have to fix it.
Assrt starts from the opposite end. The most readable Playwright test is the one that was never compiled to code. You write the test in the same voice you would write a bug ticket, save it to a markdown file, and the agent translates each English line into a Playwright MCP call at runtime. The codegen step is gone. The locator file is gone. The thing a junior reviews on Monday morning is the same thing the CI runner executes on Tuesday night.
Side by side: the markdown vs the codegen output
The left file is the complete Assrt test for three scenarios including a signup with real email OTP verification. The right file is what a typical "AI readable Playwright generator" emits for the first scenario only, with a TODO where a real mail provider still has to be plumbed by hand.
scenario.md (runs as-is) vs signup.spec.ts (still needs work)
```markdown
# scenario.md — this file IS the Playwright test

#Case 1: New user can sign up and reach the dashboard
Navigate to /signup
Call create_temp_email and type that address into the Email field
Type a strong password into the Password field
Click the Create account button
Wait for the "Check your email" message
Call wait_for_verification_code and type the code into the OTP field
Click Verify
Assert the dashboard heading "Welcome" is visible

#Case 2: Existing user can log in
Navigate to /login
Type support@example.com into the Email field
Type the staging password into the Password field
Click Sign in
Assert the URL contains /app

#Case 3: Signed-out users are redirected from /app
Navigate to /app
Assert the URL becomes /login within 3 seconds
```

What the pipeline actually looks like
Three inputs. One agent plus one real Playwright MCP runtime. Four outputs. Nothing else between scenario.md and a recorded browser session.
scenario.md → Chromium, zero codegen in the middle
Zero .spec.ts files are emitted, because the plan is the test. The only timing in the middle is the exact setTimeout debounce on line 100 of scenario-files.ts.
The anchor fact: where the readable test physically lives
This is the part no other page covers, because no other tool has it. Assrt's readable test plan is a file on your disk at a fixed path, and the path is hard-coded in three constants at the top of /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts, lines 16 to 20. The file is monitored by fs.watch with a 1-second debounce. Any edit you make, by hand or by agent, pushes to the cloud without a save button.
You can open scenario.md in Cursor, vim, VS Code, or your terminal. You can commit it, diff it, and review it in a pull request. You can delete it and the next assrt_test call recreates it from the saved plan ID. There is no "export my tests" button because the tests never left.
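The watch-and-debounce behavior described above (fs.watch plus a 1-second setTimeout before the cloud push) can be sketched in a few lines. `watchScenario` and the `push` callback are illustrative stand-ins, not the actual scenario-files.ts code:

```typescript
import { watch, readFileSync } from "node:fs";

// Collapse a burst of saves into one trailing call after `ms` of quiet.
function debounce(fn: () => void, ms: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer) clearTimeout(timer);
    timer = setTimeout(fn, ms);
  };
}

// Watch the plan file; any save re-arms a 1-second timer, and on expiry
// the current contents are handed to the cloud-push callback.
function watchScenario(path: string, push: (text: string) => void): void {
  const flush = debounce(() => push(readFileSync(path, "utf8")), 1000);
  watch(path, flush);
}
```

The trailing debounce is what lets an agent rewrite the file line by line without triggering one cloud push per keystroke.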
How a readable plan becomes a real browser run
Four steps, nothing else. Every step cites the file and line number in the Assrt source that controls it, so you can verify rather than take my word.
From markdown to /tmp/assrt/<runId>/
Write the plan, not the code
Open scenario.md in any editor. Each test is a `#Case N: short name` header plus 3 to 5 English steps. The agent parses headers with the regex in agent.ts line 621. No imports, no fixtures, no locator strings.
Run it against @playwright/mcp
`npx assrt run --url <target> --plan-file scenario.md`. The agent spawns @playwright/mcp@0.0.70 (see freestyle.ts line 586), navigates real Chromium, and interprets each line against the live accessibility tree returned by the snapshot tool.
Edit; the cloud catches up
fs.watch is registered on scenario.md in scenario-files.ts line 97. Any save triggers a 1-second debounce at line 99, then updateScenario pushes the new plan to Firestore. No explicit save button, no separate UI.
Read the evidence, not a stack trace
Every run writes /tmp/assrt/<runId>/ with execution.log, events.json, screenshots, and a .webm video. The self-contained player.html opens at 5x so a 12-second flow is a 2.5-second review.
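A run directory like that is easy to post-process. This sketch tallies tool calls from events.json; the event shape (`{ tool: string }`) is an assumption, since the page only says events.json records every tool call:

```typescript
import { readFileSync } from "node:fs";

// Count how many times each tool was invoked during a run.
function toolCounts(events: { tool: string }[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) counts[e.tool] = (counts[e.tool] ?? 0) + 1;
  return counts;
}

// Load a run's events.json (e.g. /tmp/assrt/<runId>/events.json) and summarize it.
function summarizeRun(runDir: string): Record<string, number> {
  const events = JSON.parse(readFileSync(`${runDir}/events.json`, "utf8")) as { tool: string }[];
  return toolCounts(events);
}
```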
Watch a run
The execution log for the signup + OTP case from the markdown above streams line by line as the run happens; every entry comes from the same emit() callback at line 443 of assrt-mcp/src/mcp/server.ts.
What you get that a .spec.ts generator does not
Six properties that fall out of not emitting code, not one of which is "it reads nicely." Reading nicely was table stakes.
No .spec.ts to maintain
The plan IS the test. Assrt does not emit a TypeScript file that you later have to refactor, version-pin, or sync with your UI changes. Deleting a line in scenario.md deletes the step.
Real @playwright/mcp
Pinned to v0.0.70 in freestyle.ts line 586. The same Microsoft-maintained MCP server that powers Playwright's own AI tooling. No proprietary runtime.
Self-hosted, no cloud lock-in
Every file is on your disk under /tmp/assrt or your repo. Firestore sync is optional, not required. Cancel the vendor, keep the tests.
18 Playwright MCP tools
navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable. Defined in agent.ts lines 16 to 196.
Accessibility-tree first
The agent reads a live snapshot before each interaction. "Click Sign In" resolves to the element whose accessible name is "Sign In", not to a fragile CSS path.
Video at 5x by default
Every run produces a 15fps webm plus a self-contained player.html with hotkeys. Auto-opens on macOS. 12-second tests review in 2.5 seconds.
Line-for-line vs a $7.5K/mo vendor
Most commercial "AI Playwright generators" in this price band bundle three things: a recorder, a codegen layer, and a cloud runner. Assrt bundles zero of those. Here is the material difference, row by row.
| Feature | Typical AI codegen vendor | Assrt (scenario.md) |
|---|---|---|
| Output artifact | .spec.ts file | scenario.md (markdown) |
| Runtime | Proprietary YAML or closed DSL | @playwright/mcp@0.0.70 (official) |
| Selectors | CSS / XPath / data-testid | Accessibility tree only |
| Mail / OTP handling | Bring your own SMTP stub | create_temp_email built-in |
| Vendor lock-in | Tests live in their cloud | None, MIT-licensed CLI |
| Cost | ~$7,500 / month enterprise tier | Free, open source |
| Evidence | Dashboard screenshot + trace viewer | execution.log + events.json + video |
The Reddit-thread version of this page
If you landed here from a thread about flaky Playwright tests, unreadable codegen output, or a vendor charging $7,500 a month for a proprietary YAML flavor: the thing I would install tonight is npx @assrt-ai/assrt setup. It registers the MCP tools, drops a scenario.md template in your project, and the rest of this page explains what happens next. No account. No dashboard to learn. If you hate it, delete /tmp/assrt and nothing else was touched.
Want help migrating a brittle Playwright suite to scenario.md?
30-minute call with the maintainer. Bring one flaky test; leave with a readable plan and a passing run.
Book a call →

Readable Playwright test generator: the questions people actually ask
What exactly does a readable Playwright test generator produce, and how is Assrt different?
Most tools in this category (Playwright Codegen, record-and-replay products, AI codegen services that charge $7,500 a month) produce a .spec.ts file. That file is still code. It still has locators, imports, expect chains, beforeEach blocks, and a CI runner that has to install all of them. The readability question is only whether a reviewer can scan the code quickly. Assrt takes a different position: the most readable Playwright test is the one that was never compiled to code. You write a plan in the same voice you would write a bug ticket, save it to /tmp/assrt/scenario.md, and the official @playwright/mcp server (pinned to v0.0.70, see freestyle.ts line 586) executes each step on a real Chromium page. There is no .spec.ts to read because the plan is the test, and the test is the plan.
Does Assrt actually run on real Playwright, or is it a proprietary runtime pretending to be Playwright?
Real Playwright. Every interaction goes through @playwright/mcp, the official Microsoft-maintained Playwright MCP server. You can see this pinned at version 0.0.70 in /Users/matthewdi/assrt/src/core/freestyle.ts, inside the baseImageSetup string at line 586: `npm install -g @playwright/mcp@0.0.70`. The Assrt agent calls navigate, snapshot, click, type, evaluate, select_option, scroll, press_key and the rest of the Playwright MCP toolset. No custom locator language, no proprietary DSL wrapped around Playwright, no tag syntax that only works in a vendor dashboard. If Playwright can do it, Assrt can drive it through the same channel.
Where does the readable test plan physically live, and what happens when I edit it?
Three files. The plan itself at /tmp/assrt/scenario.md (markdown, editable in any text editor, checkable into git). The scenario metadata at /tmp/assrt/scenario.json ({ id, name, url, updatedAt }). The latest run results at /tmp/assrt/results/latest.json. These paths are the constants ASSRT_DIR, SCENARIO_FILE, SCENARIO_META, RESULTS_DIR, LATEST_RESULTS at lines 16 to 20 of /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts. When you edit scenario.md, fs.watch fires (line 97), a 1-second debounce timer starts (line 99), and on timer expiry the file is read and pushed to Firestore via updateScenario. The debounce value is 1000 milliseconds, hard-coded in setTimeout on line 100. If you are a Cursor or Claude Code user, this means your agent can edit the plan file between runs and the cloud copy stays in sync without any explicit save command.
What does the #Case format look like, and what are the rules?
A #Case block is a short header plus 3 to 5 lines of plain English. The header is `#Case N: short action-oriented name`. The body is the steps. The agent parses it with the regex /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi at agent.ts line 621 (inside assrt, not assrt-mcp). Rules come from PLAN_SYSTEM_PROMPT in /Users/matthewdi/assrt-mcp/src/mcp/server.ts lines 222 to 236: each case self-contained, selectors described in English ("Click the Login button" not "div[data-testid=login]"), verify observable things (visible text, titles, URLs, element presence) not CSS or performance, keep to 3 to 5 actions max, generate 5 to 8 cases max. The agent is Claude Haiku 4.5 by default (model ID claude-haiku-4-5-20251001), overridable with the --model flag.
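A header-plus-steps parser in that spirit can be sketched as follows. The header pattern mirrors the regex cited above from agent.ts line 621, but this is an illustrative reimplementation, not the Assrt source:

```typescript
interface TestCase {
  number: number;
  name: string;
  steps: string[];
}

// Mirrors /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi, anchored per line.
const HEADER = /^#?\s*(?:Scenario|Test|Case)\s*(\d*)\s*[:.]\s*(.*)$/i;

function parseCases(plan: string): TestCase[] {
  const cases: TestCase[] = [];
  for (const raw of plan.split("\n")) {
    const line = raw.trim();
    if (!line) continue;
    const m = line.match(HEADER);
    if (m) {
      // New case: numbered header, or fall back to position in the file.
      cases.push({
        number: m[1] ? Number(m[1]) : cases.length + 1,
        name: (m[2] ?? "").trim(),
        steps: [],
      });
    } else if (cases.length > 0) {
      // Any non-header line belongs to the most recent case.
      cases[cases.length - 1].steps.push(line);
    }
  }
  return cases;
}
```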
If there is no generated .spec.ts, how do I run the test in CI?
Two ways, no code. First option, invoke the CLI as `npx assrt run --url $STAGING_URL --plan-file tests/scenario.md --json` from your GitHub Actions workflow. Exit code is 0 on pass, non-zero on fail. The --json flag prints a structured report to stdout that you can pipe into a GitHub step summary. Second option, call the assrt_test MCP tool from a Claude Code or Cursor CI runner, passing plan text or a scenarioId (UUID of a saved plan). In both cases the artifacts land in /tmp/assrt/<runId>/ on the worker: execution.log, events.json, screenshots/, and video/recording.webm. Upload that directory as a workflow artifact and a failing CI run is inspectable in the browser. No test framework to install, no Playwright version to keep in lockstep with your repo.
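A minimal CI wrapper around the first option might look like this. The report shape (`{ passed, passedCount, failedCount }`) follows the results.json example later on this page; the function and field names here are assumptions, not the Assrt API:

```typescript
import { execFileSync } from "node:child_process";

interface RunReport {
  passed: boolean;
  passedCount: number;
  failedCount: number;
}

// Turn a --json report into a one-line summary for a step log.
function summarize(report: RunReport): string {
  const verdict = report.passed ? "OK" : "FAIL";
  return `${report.passedCount} passed, ${report.failedCount} failed -> ${verdict}`;
}

// Spawn the CLI; execFileSync throws on a non-zero exit, which fails the CI step.
function runInCi(url: string, planFile: string): string {
  const out = execFileSync(
    "npx",
    ["assrt", "run", "--url", url, "--plan-file", planFile, "--json"],
    { encoding: "utf8" },
  );
  return summarize(JSON.parse(out) as RunReport);
}
```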
How does this compare to the $7,500 per month commercial AI test generators?
Four specific differences that matter for readability. One: cost. Assrt is free and open source; the MCP server, the agent, and the schema for scenario files are all on GitHub. Two: portability. The readable test is a markdown file on your disk, not a row in a vendor database. If you stop paying, you still have the test. Three: the runtime. Assrt uses @playwright/mcp@0.0.70, the same Playwright MCP that Microsoft ships; commercial tools typically use a proprietary YAML or a closed dialect that runs on their cloud. Four: observability. Every run writes a /tmp/assrt/<runId>/events.json with all 18 possible tool calls (defined as TOOLS in /Users/matthewdi/assrt-mcp/src/core/agent.ts lines 16 to 196) plus a .webm video with a self-contained player.html that opens at 5x speed. You can audit exactly what the agent did, not just whether it passed. With a black-box vendor you trust the dashboard.
Can the plan handle a login flow that requires a verification code in email?
Yes, and without any mail server setup. The agent exposes three email-shaped tools: create_temp_email (defined in /Users/matthewdi/assrt-mcp/src/core/agent.ts line 115), wait_for_verification_code (line 120), and check_email_inbox (line 128). The first hits temp-mail.io, the second polls the inbox every 3 seconds for up to 120 seconds, the third returns all messages. You invoke them from scenario.md by name, as English: "Call create_temp_email, type that address into the Email field, submit, then call wait_for_verification_code and type the code." The regex patterns that extract the actual digits live in /Users/matthewdi/assrt-mcp/src/core/email.ts lines 101 to 109, priority-ordered from "code: 482197" down to a raw 6-digit fallback. No fixture accounts, no test@example.com pools, no Mailtrap paywall.
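Priority-ordered extraction of that kind can be sketched like this; the actual patterns live in email.ts (lines 101 to 109 per the text) and these three are illustrative stand-ins, not the real list:

```typescript
// Ordered from most specific ("code: 482197") to the raw 6-digit fallback.
const OTP_PATTERNS: RegExp[] = [
  /code[:\s]+(\d{4,8})/i,
  /verification[:\s]+(\d{4,8})/i,
  /\b(\d{6})\b/,
];

// Return the first capture group that matches, or null if no pattern hits.
function extractOtp(body: string): string | null {
  for (const re of OTP_PATTERNS) {
    const m = body.match(re);
    if (m && m[1]) return m[1];
  }
  return null;
}
```

The ordering matters: a labelled code wins even when the message also contains other 6-digit numbers, such as a ZIP code in a footer.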
What happens to selectors? I thought readable tests still needed locators.
The agent never sees a selector. Before every click or type, it calls the snapshot tool, which returns the accessibility tree with [ref=eN] IDs for every interactive element. So when you write "Click the Sign In button", the agent reads the tree, finds the element whose role is button and accessible name contains "Sign In", and clicks ref=e5 (or whichever ID matches). If the ref goes stale after a DOM re-render, the agent calls snapshot again. This is why a readable plain-English step survives a UI change that would break a brittle CSS selector: nothing in scenario.md is tied to a class name, a data-testid, or an XPath. The coupling is between English and accessible roles, which is what the UI owes users anyway.
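The role-and-name resolution step can be sketched as a tree walk. The node shape (`{ ref, role, name, children }`) is an assumed simplification of the snapshot format for illustration; the real @playwright/mcp output differs in detail:

```typescript
interface AxNode {
  ref: string;
  role: string;
  name: string;
  children?: AxNode[];
}

// Depth-first search for the first node whose role matches and whose
// accessible name contains the requested text (case-insensitive).
function findByRoleAndName(node: AxNode, role: string, name: string): AxNode | null {
  if (node.role === role && node.name.toLowerCase().includes(name.toLowerCase())) {
    return node;
  }
  for (const child of node.children ?? []) {
    const hit = findByRoleAndName(child, role, name);
    if (hit) return hit;
  }
  return null;
}
```

"Click the Sign In button" then reduces to `findByRoleAndName(tree, "button", "sign in")` followed by a click on the returned ref, which is why renaming a CSS class changes nothing.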
Is the video recording a gimmick, or is it actually useful for debugging?
Useful. The agent records with CDP Page.startScreencast at 15fps, 1600x900, JPEG quality 60, everyNthFrame 2 (see freestyle.ts line 359 for the exact params). After the run finishes, ffmpeg encodes /tmp/video/frames/*.jpg into /tmp/video/recording.webm (line 258, ffmpeg command `-framerate 15 -c:v libvpx -b:v 1M`), and Assrt writes a self-contained HTML player alongside with hotkeys: Space to pause, 1/2/3/5/0 for playback speed, arrow keys to seek 5 seconds. On macOS the player auto-opens via `open <url>` unless you pass autoOpenPlayer: false. For a 12-second signup + OTP flow you end up with a 3MB webm, viewed at 5x by default so a review takes 3 seconds. A failed assertion becomes a clip you link in the Slack thread, not a stack trace.
What is the smallest possible readable test, end to end?
Two lines. `#Case 1: Homepage loads` as the header, and `Navigate to the URL and verify the page title contains the product name` as the body. Save as scenario.md, run `npx assrt run --url https://example.com --plan-file scenario.md`, wait for the single ✓. The agent will navigate, call snapshot, find the document title, assert it contains the product name, and call complete_scenario. Run directory is at /tmp/assrt/<uuid>/ with a screenshot, a 2-second video, and results.json showing { passed: true, passedCount: 1, failedCount: 0 }. No framework install, no Playwright version pin in your project, no CI config to write. That is the floor; most real tests sit between 8 and 15 seconds with 3 to 10 tool calls.
Who is this actually for, and who should keep writing .spec.ts files?
For: small teams that want regression coverage without maintaining a parallel test codebase, solo founders protecting signup and checkout funnels, staff engineers who want a smoke layer on top of existing Playwright suites, agencies that ship a client site and do not want to hand off a test framework, and anyone who has tried to teach a junior to use Playwright Codegen and given up. Keep writing .spec.ts if: you have hundreds of tests already and a dedicated QA engineer who owns them, you need tight per-test assertions on network payloads or trace objects, you are doing visual regression at a pixel level, or your CI budget is measured in thousands of runs per day where the LLM cost of interpretation matters. The two approaches compose; assrt plans live next to .spec.ts files without either noticing.