Playwright test scenario generator: what one URL actually becomes
A Playwright test scenario generator analyzes a live web app and emits a runnable plan, either as compiled TypeScript or as a Markdown file the runner reads at execution time. Playwright shipped its own Planner, Generator, and Healer agents in v1.56 in October 2025. This page is about a different shape: Assrt's open-source assrt_plan tool, where the plan stays the durable artifact; the exact pipeline budget is laid out below.
Direct answer, verified 2026-05-12
A Playwright test scenario generator turns a URL into a draft test plan by crawling the live app and proposing user flows. Two shipping options today: Playwright's built-in Planner and Generator agents (released in v1.56, October 2025, see the official docs), which compile to .spec.ts files; and Assrt's assrt_plan, an open-source MCP tool that emits Markdown the runner re-interprets each run. Both walk the page, screenshot it, and ask an LLM what to test. The difference is what you commit afterward.
The four-stage pipeline, with numbers
A scenario generator is a small pipeline with five tunable knobs. The defaults are the most interesting part. Below is what Assrt's open-source generator does to one URL, in order. The line numbers point at the actual source so you can verify each step yourself.
What assrt_plan does to a URL, in order
Launch local Chromium
Boots @playwright/mcp with a persistent profile so logged-in flows carry over.
Navigate and capture x3
Snapshot and screenshot at scroll 0, 800, and 1600 pixels (server.ts:794-805).
Slice and send to Haiku
Three JPEGs plus the joined accessibility text capped at 8000 chars, model claude-haiku-4-5-20251001, max_tokens 4096 (server.ts:809, 830-831).
Write scenario.md to disk
Output is a single Markdown file at /tmp/assrt/scenario.md, 5-8 #Case blocks of 3-5 actions each.
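To make the capture step concrete, here is a minimal sketch of the three-offset loop in plain Playwright. The real tool drives @playwright/mcp instead, and captureForPlan and the profile path are illustrative names, not the server.ts implementation.

import { chromium } from 'playwright';

// Sketch: three JPEG screenshots plus three accessibility dumps at scroll
// offsets 0, 800, 1600, then one 8000-character slice over the joined text.
async function captureForPlan(url: string) {
  const context = await chromium.launchPersistentContext('/tmp/assrt-profile');
  const page = await context.newPage();
  await page.goto(url);
  const screenshots: Buffer[] = [];
  const trees: string[] = [];
  for (const offset of [0, 800, 1600]) {
    await page.evaluate((y) => window.scrollTo(0, y), offset);
    await page.waitForTimeout(500); // let lazy-loaded content settle
    screenshots.push(await page.screenshot({ type: 'jpeg' }));
    trees.push(await page.locator('body').ariaSnapshot()); // accessibility text
  }
  const accessibilityText = trees.join('\n\n').slice(0, 8000);
  await context.close();
  return { screenshots, accessibilityText };
}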
The anchor fact (verifiable, open source)
Most write-ups of a scenario generator stop at the marketing diagram. The interesting question is what the budget actually is. Here are the five numbers that define Assrt's generator, all traceable to ~/assrt-mcp/src/mcp/server.ts:
1. Three screenshots at scroll offsets 0, 800, 1600 pixels (lines 794-805). Each captures both a JPEG and the accessibility tree.
2. An 8000-character slice on the concatenated accessibility text (line 809). The three trees are joined with a blank line, then truncated.
3. Model is claude-haiku-4-5-20251001 by default, overridable via the model parameter (line 830).
4. max_tokens 4096 on the output (line 831). Enough for 5-8 #Case blocks with 3-5 actions each, not enough to ramble.
5. The system prompt caps output at 5-8 cases of 3-5 actions each (lines 234, 236). The cap is policy, not a parser, but it holds in practice.
The repo is at github.com/assrt-ai/assrt-mcp. Clone it, open src/mcp/server.ts, jump to line 766, and the numbers above are right there.
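To see how those five numbers meet in one request, here is a hedged sketch of the Anthropic call. The shape follows the description above; PLAN_SYSTEM_PROMPT is a stand-in for the real prompt at server.ts:219-236, and generatePlan is an illustrative name, not the server.ts code.

import Anthropic from '@anthropic-ai/sdk';

const PLAN_SYSTEM_PROMPT = '...'; // stand-in for the real prompt text
const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the env

// Sketch: three base64 JPEGs plus the sliced accessibility text in one
// user message, capped at 4096 output tokens.
async function generatePlan(screenshots: Buffer[], accessibilityText: string) {
  const images = screenshots.map((buf) => ({
    type: 'image' as const,
    source: {
      type: 'base64' as const,
      media_type: 'image/jpeg' as const,
      data: buf.toString('base64'),
    },
  }));
  const response = await anthropic.messages.create({
    model: 'claude-haiku-4-5-20251001', // overridable via the model parameter
    max_tokens: 4096,
    system: PLAN_SYSTEM_PROMPT,
    messages: [
      { role: 'user', content: [...images, { type: 'text', text: accessibilityText }] },
    ],
  });
  const first = response.content[0];
  return first.type === 'text' ? first.text : '';
}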
Inside the generator, frame by frame
Plan as artifact, or plan as a draft of code?
The single biggest decision in this category is what the artifact looks like after the generator runs. Playwright's Generator agent and Codegen both compile to a TypeScript spec; that spec is what your repo holds and what CI runs. Assrt stops at Markdown; the same file you commit is the one the runner reads at execution time. The trade is concrete: a spec file lets you eyeball the locator strategy at PR time; a plan file lets the runner re-decide on every run.
Two shapes for the same generator output
Most Playwright scenario generators, including the built-in Generator agent and Codegen, compile the plan into a TypeScript spec file. The locators are baked in at generation time. The spec is what you commit to your repo. When the design system changes, the locators go stale and the suite goes red until someone bumps every selector. Self-healing is a patch on top of this: the spec keeps the stale locator, the runtime guesses a replacement, and the file diverges from what your tree thinks it does. A sketch of this compiled shape follows the list below.
- Plan becomes a .spec.ts at generation time
- Locator strings frozen in the test file
- Design system rename breaks the file
- What CI runs is the compiled file, not the plan
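For contrast, here is what the compiled shape looks like in practice: a hypothetical .spec.ts in the vein of Case 2 from the sample scenario.md further down, not output from any particular generator.

import { test, expect } from '@playwright/test';

// Hypothetical compiled output: the locator strings are frozen at
// generation time, so a design-system rename breaks this file.
test('visitor opens the pricing page from the navbar', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('link', { name: 'Pricing' }).click();
  await expect(page).toHaveURL(/\/pricing/);
  await expect(page.getByRole('heading', { name: 'Plans' })).toBeVisible();
});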
“Generate 5-8 cases max, focused on the MOST IMPORTANT user flows visible on the page.”
PLAN_SYSTEM_PROMPT, ~/assrt-mcp/src/mcp/server.ts:236
The constraints the prompt actually enforces
The cap of 5-8 cases is not the only thing the prompt does. Six rules sit at the heart of the generator, and the model treats them as load-bearing. Every rule maps to a behavior that, if you remove it, makes the suite worse on a real app. The full prompt is at server.ts:219-236.
What the system prompt actually enforces
- Output format is exactly: #Case 1: <name>, then steps; #Case 2: <name>, then steps; up to 5-8 cases (server.ts:222-228, 236).
- Each case is self-contained: do not assume a previous case ran; include the login steps if a case needs auth (rule 1, line 231).
- Be specific about selectors in plain English: 'click the Login button', not 'navigate to login' (rule 2, line 232).
- Verify only observable things: visible text, page title, URL, element presence; never CSS, colors, or responsive layout (rule 3, line 233).
- Keep each case to 3-5 actions; a focused case that passes beats a complex one that fails (rule 4, line 234).
- Do not generate cases behind auth unless a visible signup or login form is on the page (rule 5, line 235).
- Generate at most 5-8 cases, focused on the most important user flows the screenshots show (rule 6, line 236).
What you get back: scenario.md
The output is one file. If you point the generator at a generic SaaS landing page, what comes back looks roughly like this. Every #Case is a separate execution unit; each numbered line is one tool call away in the runner.
# /tmp/assrt/scenario.md

#Case 1: Visitor lands on the homepage and sees the hero
1. Navigate to /
2. Assert: the page title contains "Acme"
3. Assert: a heading reading "Ship faster" is visible
4. Assert: a button reading "Start free" is visible

#Case 2: Visitor opens the pricing page from the navbar
1. Navigate to /
2. Click the "Pricing" link in the top navigation
3. Assert: the URL path is /pricing
4. Assert: a heading reading "Plans" is visible

#Case 3: Visitor submits the contact form
1. Navigate to /contact
2. Type "test@assrt.dev" into the email field
3. Type "Just curious" into the message field
4. Click the "Send" button
5. Assert: a confirmation reading "Thanks" is visible

#Case 4: Visitor signs up with a disposable email
1. Navigate to /signup
2. Create a temporary email
3. Type the temporary email into the email field
4. Type "Hunter2-Hunter2" into the password field
5. Click the "Create account" button
6. Wait for the verification code
7. Assert: the URL path is /onboarding
The file is what you commit. To re-run later, pass scenarioId (the UUID written to scenario.json alongside) to assrt_test and the same plan runs against the same URL.
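Since the 5-8 / 3-5 caps are policy rather than parsing, nothing rejects an oversized plan mechanically. If you wanted a hard check before committing, a small lint over the #Case format is a few lines. This is a hypothetical helper, not part of assrt-mcp:

// Hypothetical lint: parse #Case blocks out of scenario.md and flag plans
// that exceed the 5-8 case / 3-5 action budget the prompt asks for.
function lintScenario(markdown: string): string[] {
  const warnings: string[] = [];
  const blocks = markdown.split(/^#Case /m).slice(1); // drop any preamble
  if (blocks.length < 5 || blocks.length > 8) {
    warnings.push(`expected 5-8 cases, found ${blocks.length}`);
  }
  for (const block of blocks) {
    const name = block.split('\n')[0];
    const actions = block.split('\n').filter((line) => /^\d+\.\s/.test(line));
    if (actions.length < 3 || actions.length > 5) {
      warnings.push(`case "${name}": ${actions.length} actions, expected 3-5`);
    }
  }
  return warnings;
}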
How this stacks up against Playwright's v1.56 agents
Playwright shipped its Planner, Generator, and Healer agents in version 1.56 in October 2025. They are well-built and well-documented. The Planner writes a test-plan.md, the Generator compiles it to a .spec.ts, the Healer repairs the .spec.ts when a locator drifts. The pipeline assumes you proceed from plan to spec and live with the spec.
Assrt fits the same niche from the other direction. The plan stays the artifact; there is no Generator step to compile, and no Healer step is needed because the locator is re-derived from a fresh accessibility snapshot on every run. The trade-off is real: a spec file lets you statically analyze the locator strategy and run it with plain npx playwright test with no agent in the loop. A plan file requires the Assrt runner at execution time, in exchange for never having to re-bind locators after a redesign.
One specific gap worth noting: pages all over the web describe what the v1.56 agents do, but very few publish the budget inside them. The official docs are intentionally open-ended so the agents can be configured per project. That makes them flexible, but it also makes the user's mental model fuzzier. Assrt's generator is the opposite: a 100-line tool with five tunable knobs, all in one file. If you want to understand the shape of a scenario generator without reverse-engineering it, the open-source path is shorter.
When a scenario generator helps, and when it does not
A scenario generator earns its keep on a page where the testable affordances are visible and the user's job-to-be-done is obvious from the layout. Landing pages, marketing sites, pricing pages, public app dashboards, and most onboarding flows fit this shape. The generator looks at the hero, sees the CTA, sees the secondary nav, and writes the cases a careful human would.
It does worse on three shapes. The first is anything gated by auth: the generator sees the login form and writes a case that signs in, but it cannot guess the post-login flow without exploring it. The second is highly state-dependent apps where the visible UI depends on which row of data you happen to be viewing; the generator picks one path and misses the rest. The third is anything that depends on timing or animation; the model sees a static snapshot and does not know whether a button was about to slide in. For these, the right workflow is to generate a draft, then hand-edit scenario.md for the cases that need it.
Try the generator on your URL
The whole thing is one command, no API key signup, no cloud step. The MCP server runs in your terminal or inside Claude Desktop. You point it at any URL, get back a draft scenario.md, and decide what to keep.
# install and run the CLI in one step
npx @m13v/assrt discover https://your-app.com
# or use it from inside Claude as an MCP tool:
# assrt_plan({ url: "https://your-app.com" })
# the plan lands at /tmp/assrt/scenario.md.
# then run the cases:
# assrt_test({ url: "https://your-app.com" })
Want a walkthrough of the generator against your own app?
Bring a URL, we will run assrt_plan live and talk through which cases to keep, which to throw away, and what to wire into CI.
Frequently asked questions
What is a Playwright test scenario generator?
A scenario generator analyzes a running web app, identifies the user flows worth covering, and emits a test plan. Two output shapes dominate today. The first is compiled code: the tool produces a .spec.ts file with hardcoded locators and assertions, ready to commit. Playwright's built-in Generator agent (shipped in v1.56, October 2025) and the older Playwright Codegen both work this way. The second shape is a plan-as-artifact: the tool emits Markdown or text the runner will re-interpret each run. Playwright's Planner agent writes test-plan.md; Assrt's assrt_plan tool writes /tmp/assrt/scenario.md. The same generator step exists in both shapes. What differs is what you commit to your repo afterward.
How does Assrt's scenario generator differ from Playwright's Planner agent?
The Planner agent (one of three agents Playwright shipped in v1.56) and Assrt's assrt_plan both crawl a live URL and emit Markdown. The differences are in how strict the budget is and what comes next. Assrt is built around a runtime: the generated scenario.md is the test, not a stepping stone to a .spec.ts. Playwright's pipeline assumes you will follow the Planner with the Generator agent to compile the plan into TypeScript and then ship the .spec.ts. Assrt also pins concrete numbers in the prompt: 5-8 cases, 3-5 actions per case, observable assertions only. The system prompt is at ~/assrt-mcp/src/mcp/server.ts lines 219-236 and you can grep it. Either approach gets you a draft; what you commit decides whether you maintain a plan or maintain a spec file.
Exactly what does assrt_plan capture from a URL before it generates anything?
Three screenshots and three accessibility tree dumps, taken at three scroll offsets, then concatenated and truncated. The flow is at server.ts lines 794 through 805. The browser navigates to the URL, calls screenshot() and snapshot() at scroll position 0, then scrolls down in 800-pixel steps, waiting 500 ms and capturing again at offsets 800 and 1600. The three accessibility texts are joined with a blank line between them and sliced to 8000 characters total (server.ts:809). The three screenshots go in as base64 JPEGs alongside the text in a single Anthropic message. That whole input gets sent to claude-haiku-4-5-20251001 with max_tokens 4096 (server.ts:830-831). The model output is the plan.
Why three screenshots and not the whole page?
A bigger snapshot does not buy a better plan. The constraints are roughly token budget, model attention, and time-to-plan. Three viewports at 0, 800, and 1600 pixels cover the hero, the next fold, and one more, which is where almost all of a marketing or product page's testable affordances live. The 8000-character slice on the accessibility text is a deliberate cap so the prompt does not exceed Haiku's context budget when three JPEGs are also attached. Pages longer than 1600 pixels usually have repeating sections that do not add new flows. If you do need coverage deeper into a long page, run the generator a second time after navigating into the section that matters; the runner picks up scenario.md by mtime and you can edit before running.
Why does the prompt cap output at 5-8 cases of 3-5 actions each?
Because each #Case is a separate runner invocation, each invocation does a fresh snapshot, and each action inside a case spends a model round trip. The dispatch loop is at agent.ts lines 693-747; the cost of a 20-action mega-case is 20 round trips plus 20 snapshots, and a failure in it is much harder to localize. Cases of 3-5 actions fit in a clear pass-or-fail unit, and a cap of 5-8 cases keeps the whole suite runnable in a few minutes of wall clock. The cap is enforced by PLAN_SYSTEM_PROMPT at server.ts lines 234-236, not by a parser, so the model can stretch slightly when the page warrants it, but it generally does not.
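The round-trip structure is easy to sketch. This is a schematic of the loop described above, with hypothetical helper names, not the actual agent.ts code:

// Schematic: each action costs one fresh snapshot plus one model round
// trip, so cost scales linearly with actions per case.
declare function takeFreshSnapshot(): Promise<string>;
declare function askModel(action: string, snapshot: string): Promise<unknown>;
declare function executeToolCall(call: unknown): Promise<void>;

async function runCase(actions: string[]) {
  for (const action of actions) {
    const snapshot = await takeFreshSnapshot();        // new accessibility dump
    const toolCall = await askModel(action, snapshot); // one LLM round trip
    await executeToolCall(toolCall);                   // click, type, assert...
  }
}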
What does the generator output look like on disk?
One Markdown file at /tmp/assrt/scenario.md (the plan), one JSON file at /tmp/assrt/scenario.json (id, name, url, updatedAt), and a results directory at /tmp/assrt/results/ that fills up once you run tests. The layout is fixed in ~/assrt-mcp/src/core/scenario-files.ts lines 16 through 20. When you edit scenario.md by hand, a 1-second debounced watcher (line 100) syncs the change back to central storage. You can grep the plan, diff it on a pull request, render it on GitHub, paste half of it into a different project. There is no parallel .spec.ts that needs to stay in sync.
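The debounce itself is a standard pattern. Here is a minimal sketch with Node's fs.watch, where syncToCentralStorage is a hypothetical stand-in for whatever the real watcher calls:

import { watch } from 'node:fs';

declare function syncToCentralStorage(path: string): void; // hypothetical

// Sketch: rapid saves within one second collapse into a single sync.
let timer: NodeJS.Timeout | undefined;
watch('/tmp/assrt/scenario.md', () => {
  clearTimeout(timer);
  timer = setTimeout(() => syncToCentralStorage('/tmp/assrt/scenario.md'), 1000);
});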
How is this different from Playwright Codegen?
Codegen is the opposite shape from a scenario generator. You drive a browser by hand; Codegen records your clicks and emits TypeScript with locators inferred from your actions. The input is your recording, the output is a compiled .spec.ts. A scenario generator goes the other direction: you point at a URL, the tool decides what to test, and you get a plan. Codegen is great when you can demonstrate a flow but cannot articulate it. A generator is better when you want coverage of a page you have not finished exploring, or when you want the plan itself to remain the durable artifact.
Can I edit the generated scenario before running it?
Yes, that is the expected workflow. The plan is plain Markdown with #Case headers, and the MCP server's own instructions spell it out: read /tmp/assrt/scenario.md to see the current plan, edit it to modify test cases, and changes auto-sync. The common path is to generate a draft from the URL, delete the cases you do not care about, rewrite a sentence the model phrased awkwardly, add a case behind auth that the URL-only generator could not see, and run. Each saved run gets a UUID in scenario.json so you can re-run the same scenario later by passing scenarioId.
Does the generator work on Vercel previews, localhost, or production?
Any URL that a local Chromium can navigate to. The generator launches a @playwright/mcp instance, which is real Chromium with a real network stack. It does not need source access. The same command works against localhost:3000 during development, a Vercel preview deployment in CI, or production over HTTPS. The accessibility tree and screenshots are taken from whatever the browser actually loads, so the generator will see auth walls the same way a real user would. If a flow lives behind auth, set the auth cookies in the persistent profile before generating and the session will carry into the snapshot.
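Seeding that session is a few lines of plain Playwright. A sketch under the assumption that your app uses a cookie named session and that the profile lives at /tmp/assrt-profile; swap both for your real values:

import { chromium } from 'playwright';

// Sketch: write an auth cookie into the persistent profile so the
// generator's snapshots see the logged-in UI. All values are assumptions.
const context = await chromium.launchPersistentContext('/tmp/assrt-profile');
await context.addCookies([
  {
    name: 'session',                        // your app's real cookie name
    value: process.env.SESSION_TOKEN ?? '', // your real token
    domain: 'your-app.com',
    path: '/',
    httpOnly: true,
    secure: true,
  },
]);
await context.close(); // the profile on disk now carries the session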
How much does it cost to run the generator on one URL?
One call to claude-haiku-4-5-20251001 with three small JPEGs and roughly 8000 characters of text in, plus up to 4096 tokens out. At Haiku-class pricing this is fractions of a cent per page. Browser time is local Chromium launching, navigating, three scrolls of 500 ms each, and three screenshots, which adds up to a few seconds of wall clock. Nothing about this is metered by Assrt. The whole tool runs against your own Anthropic credential and your own machine; there is no cloud step in between.