A Playwright test generator that reads your user flows off the live page, no recording required
Every other tool in this category wants you to bring the flow: record a session, paste a trace, write Gherkin, fill out a test-case form. Assrt does the opposite. You hand it a URL. It scrolls the page itself, captures three screenshots and eight thousand characters of accessibility text, and asks Claude Haiku one specific thing: surface the most important user flows visible on the page. The result is five to eight short test cases written as plain Markdown, with not a single locator string in sight.
How do I generate Playwright tests from user flows?
Run `npx @m13v/assrt discover https://your-app.com`. Assrt captures three screenshots and the accessibility tree of your page, asks Claude Haiku to identify the most important user flows, and writes five to eight intent-based test cases as plain Markdown to /tmp/assrt/scenario.md. You can run that file through the Assrt agent, hand-edit it, or commit it to Git. Source: the assrt-mcp repo at github.com/assrt-ai/assrt-mcp.
Where the user flows actually live
The pages that currently come up for this topic share one assumption: the user flow is something you give to the generator. You record it, you transcribe it, you describe it in a structured format, you author it as a Page Object. The generator's job is to take that input and emit executable code.
That assumption is backwards for the most common case. If you have a marketing site, a SaaS landing page, a docs portal, or any logged-out surface, the user flows are already encoded in the page itself. The hero CTA implies a sign-up flow. The pricing tiles imply a comparison flow. The nav implies a navigation flow. The contact form implies a lead-capture flow. A human visitor figures all of that out in three seconds without ever being told what the flows are. So can a vision-capable model.
Assrt's assrt_plan tool is built on that observation. It does not ask you to record anything. It opens the page, scrolls through it the way a visitor would, and asks the model the question a human would ask: what are the most important things a user could try to do here? The output is the answer to that question, expressed as runnable test cases.
The single line in the prompt that makes flows out of pixels
The whole behaviour rests on one constant in the source. PLAN_SYSTEM_PROMPT lives in src/mcp/server.ts, lines 219 through 236. It is eighteen lines long. Five of those lines are output-format rules (the #Case shape, the self-contained constraint, observable verifications, short cases, no auth-gated features). One line is the load-bearing one for this entire angle:
"Generate 5-8 cases max — focused on the MOST IMPORTANT user flows visible on the page."
Without that line, the model writes assertions instead of flows. We have seen the early drafts. You get cases like "verify the H1 reads 'Welcome'" or "check that the email input has the correct placeholder." Useful as a smoke check. Useless as a behaviour test. The phrase "user flows visible on the page" pins the model on sequences with intent: a flow is a multi-step journey a real visitor would attempt, not a single attribute check on an element. The other rules shape it; this rule chooses the subject.
The payload that goes alongside the prompt is small enough to read in your head. Three base64 JPEG screenshots taken at scroll offsets 0, 800, and 1600 pixels. Eight thousand characters of concatenated accessibility-tree text from the same three captures. One short user-message framing string. That is sent to claude-haiku-4-5-20251001 with max_tokens 4096 in a single anthropic.messages.create call. There is no retry loop, no orchestration layer, no chained planning step. The whole call site is twenty-two lines, server.ts:813 to 834.
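For orientation, here is a minimal sketch of a call with that shape against the Anthropic TypeScript SDK. It is a reconstruction from the description above, not the actual server.ts code; the function name, variable names, and framing text are invented.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Hypothetical helper mirroring the single planning call described above.
async function plan(screenshots: string[], a11yText: string, systemPrompt: string) {
  const response = await anthropic.messages.create({
    model: "claude-haiku-4-5-20251001",
    max_tokens: 4096,
    system: systemPrompt, // the 18-line PLAN_SYSTEM_PROMPT
    messages: [
      {
        role: "user",
        content: [
          // Three base64 JPEGs taken at scroll offsets 0, 800, 1600.
          ...screenshots.map((data) => ({
            type: "image" as const,
            source: { type: "base64" as const, media_type: "image/jpeg" as const, data },
          })),
          // Up to 8000 characters of concatenated accessibility-tree text.
          { type: "text" as const, text: `Accessibility snapshot:\n${a11yText.slice(0, 8000)}` },
        ],
      },
    ],
  });
  // One call, no retries: the first text block is the Markdown plan.
  const block = response.content[0];
  return block.type === "text" ? block.text : "";
}
```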
The whole pipeline, end to end
[Sequence diagram: From URL to scenario.md. Four actors, ten messages, no hidden steps; every arrow corresponds to a specific line in the assrt-mcp source.]
Recording a flow vs. inferring one
The contrast lives in what the human has to do. With a record-and-replay tool, the human is the source of truth for the flow. With Assrt, the page is the source of truth and the human just runs a command. The recording path looks like this; the Assrt path is the five-step pipeline in the next section.
The same goal, two flows
1. Open the recorder.
2. Sign in to your app under a test account.
3. Click your way through the signup flow, the pricing flow, the contact-sales flow, the docs CTA flow, the demo-video flow.
4. Stop the recording. Inspect the .spec.ts the tool generated.
5. Rename data-testid="cta-button-v3" because the recorder picked a brittle locator. Add waits because the page lazy-loads cards.
6. Commit, push, hope CI does not flake.
7. Two weeks later: a design tweak ships, a locator breaks, eight tests go red. Open the recorder again.
- Human drives the browser through every flow
- Tool emits .spec.ts with locator strings
- Locator changes break the test, not the behaviour
- Re-record after every meaningful UI tweak
What actually happens when you run it
Five distinct steps. The whole sequence finishes in under ten seconds on a typical marketing site, most of which is the model call.
Generation pipeline
1. Hand it the URL
Either through MCP (just say 'use assrt_plan on http://localhost:3000' to Claude Code) or through the CLI: `npx @m13v/assrt discover https://your-app.com`. No recording session. No Gherkin to write. No browser extension to install. The URL is the entire input.
2. It scrolls and snapshots three times
A real Chromium launches via @playwright/mcp, navigates, captures a screenshot and accessibility tree, scrolls 800 pixels, captures again, scrolls another 800 pixels, and captures a third time (code at server.ts:794-805; a standalone sketch of this loop follows the step list). The whole pass takes under five seconds on a typical landing page.
3. It asks Claude Haiku to find the user flows
Three base64 JPEGs, 8000 characters of concatenated accessibility text, and the PLAN_SYSTEM_PROMPT (server.ts:219-236; rule #6 pins the model on user flows visible on the page) are packed into one anthropic.messages.create call with max_tokens 4096. No retry loop, no chained calls, no prompt-engineering middleware.
4. Out comes Markdown, not code
The model returns a plan string of five to eight #Case blocks, each three to five actions long. No locator strings. No expect chains. No imports. The string is written to /tmp/assrt/scenario.md; the layout is defined in src/core/scenario-files.ts.
5. Run it whenever, on whatever
Call assrt_test against the same URL and the agent reads scenario.md, re-binds each step to the live accessibility tree, and executes on real Chromium. No vendor runtime. No locked dashboard. The plan file is yours; if you ever leave Assrt, the worst case is hand-translating Markdown into .spec.ts.
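Here is the promised standalone sketch of the step-2 capture pass, assuming plain Playwright instead of @playwright/mcp; the wait timing and the JSON serialization of the accessibility tree are our own choices, not the repo's.

```typescript
import { chromium } from "playwright";

// Approximates Assrt's three-capture pass: screenshot + accessibility
// snapshot at scroll offsets 0, 800, and 1600 pixels.
async function capture(url: string) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle" });

  const screenshots: string[] = [];
  const a11yParts: string[] = [];

  for (const offset of [0, 800, 1600]) {
    await page.evaluate((y) => window.scrollTo(0, y), offset);
    await page.waitForTimeout(300); // let lazy-loaded content settle
    screenshots.push((await page.screenshot({ type: "jpeg" })).toString("base64"));
    // Grab the accessibility tree; Assrt reads it through the MCP snapshot instead.
    a11yParts.push(JSON.stringify(await page.accessibility.snapshot()));
  }

  await browser.close();
  // Mirror the 8000-character budget the article describes.
  return { screenshots, a11yText: a11yParts.join("\n").slice(0, 8000) };
}
```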
Where inference loses to recording
Three honest losses. The first is multi-page authenticated flows with no visible affordance. If your most important flow is "log in, navigate to settings, change a notification preference, log out, log back in, verify the preference persisted," nothing about that flow is visible on a single rendered URL. A recorder catches it because you walked through it. Inference does not, unless you split the flow into two URLs and let the generator hit both.
The second is implicit business rules. A recorder captures your specific test data, your specific account, the specific path you took through a configurator. Inference looks at the rendered page and writes the flow a generic new visitor would attempt. If the value of the test is in the specific data, you have to add the data yourself.
The third is below-the-fold density. Three screenshots at scroll offsets 0, 800, and 1600 pixels cover about 2400 pixels of vertical content. If a key flow only appears beyond that, the generator misses it. The constants are tunable in the source (server.ts:797 and 802) but not yet configurable per call, so very long pages need either a source tweak or a follow-up generation against an anchor URL.
For a logged-out marketing site, a docs portal, a public dashboard, or any surface where the affordances are visible on the page, inference is faster, cheaper, and more maintainable than recording. For an authenticated multi-page state-mutation flow, you still want a recorder, or you want to write the four-line auth scaffold yourself and let inference handle everything after the login.
What you actually own at the end
A flat Markdown file. That is the whole artifact. It is eighteen hundred bytes for a typical landing page, maybe three thousand for a denser surface. It contains five to eight cases of three to five steps each, in a format you can grep, diff, render on GitHub, paste into a gist, copy half of it into another scenario file, or hand-edit before the next run. The runner reads it, the agent re-binds every English action to the live accessibility tree, and the tests execute on real @playwright/mcp. There is no vendor backend, no SaaS dashboard, no proprietary YAML, no locator strings to maintain. If you ever leave Assrt, the migration cost is the bytes in the file.
That is the whole pitch for inferring user flows instead of recording them. The tool reads what is already on your page. The artifact is yours from the moment it lands on your disk.
Want to see this run on your real app?
Twenty-minute call. We point Assrt at your URL, pull up the generated scenario.md, and walk through what the model did and did not catch on your specific surface.
Common questions about generating Playwright tests from user flows
What does it actually mean to generate a Playwright test from a user flow if I never record one?
It means the generator looks at the page itself instead of looking at you. When you run `npx @m13v/assrt discover https://your-app.com`, the assrt_plan tool launches a local Chromium via @playwright/mcp, navigates to that URL, takes a screenshot and accessibility snapshot, scrolls 800 pixels, captures again, scrolls another 800 pixels, and captures a third time. Those three screenshots plus 8000 characters of concatenated accessibility text get handed to claude-haiku-4-5-20251001 with one specific instruction: write 5-8 short cases focused on the most important user flows visible on the page. Implementation is at src/mcp/server.ts, lines 793-834. The model sees the same things a human would scrolling your homepage, so the cases it produces are the flows a real visitor would attempt: sign up, click pricing, check the demo, read the docs.
Why does the prompt have to say 'user flows visible on the page' specifically?
Because without that phrase the model writes assertions, not flows. Early drafts of the prompt produced cases like 'verify the H1 says "Welcome"' or 'check that the email input has the correct placeholder.' Useful as a smoke check, useless as a behaviour test. Rule #6 in PLAN_SYSTEM_PROMPT at server.ts:236 anchors the model on flows: a flow is a sequence with intent, not a single attribute check. The other five rules constrain the shape (self-contained, specific selectors named in English, observable verifications, 3-5 actions max, skip features behind auth unless visible). The combination is what makes the output runnable without further editing 90% of the time.
Three screenshots at 0, 800, 1600 pixels. Why those numbers and what about pages longer than that?
Three captures is a deliberate trade-off between coverage and token cost. A single full-page screenshot gets so compressed by the model's vision encoder that text becomes unreadable. Feeding the raw DOM drowns the model in inline styles and divs that do not help it reason about flows. Three captures at scroll offsets 0, 800, and 1600 give the model a fair shot at the hero, the mid-page content, and the first set of features below the fold without exceeding Haiku's context budget. For pages longer than 2400 pixels, the discoverable flows usually come from above-the-fold actions anyway: nav links, hero CTAs, pricing tiles, sign-up forms. If a flow only starts past the third screenshot, it will not be in the generated plan; that is a known limitation. The constants are at server.ts:797 and 802. They are tunable in the source, not in a vendor dashboard.
How is this different from Playwright's own codegen, where I record the flow and it writes the test?
Playwright codegen is a record-then-tweak tool. You drive the browser, it watches your clicks and types, and it emits a .spec.ts file with locator strings and expect calls. The flow is yours, the recording is yours, the resulting code is yours to maintain. Assrt's assrt_plan tool inverts every part of that. Nobody drives the browser, the tool drives itself by scrolling and snapshotting. The flow is inferred from the page, not from a human session. The output is intent-based Markdown saved to /tmp/assrt/scenario.md, not a TypeScript file with locators. And the runner re-discovers every element from the live accessibility tree at execution time, so the test does not bind to a selector that can rot. Both approaches sit on top of @playwright/mcp under the hood; they hand you genuinely different artifacts.
What does a generated test actually look like and where does it live?
It is a plain Markdown file at /tmp/assrt/scenario.md, with the layout defined in src/core/scenario-files.ts. The shape is `#Case 1: short action-oriented name` followed by 3-5 numbered steps. Steps are written in English: 'Navigate to /signup', 'Type test@example.com into the email field', 'Click the Sign up button', 'Assert the heading says Dashboard'. There are no locator strings in the file. There is no proprietary YAML schema, no vendor JSON, no DSL. You can grep the file, diff it, commit it to your repo, render it on GitHub, paste half of it into another project, or hand-edit it before re-running. When you call assrt_test against the same URL, the runner reads scenario.md and executes each step, re-discovering elements per step from the live accessibility tree.
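For concreteness, here is what a generated file might look like. This is a hand-written illustration of the shape, not real output; every case name, URL, and step below is invented:

```markdown
#Case 1: Sign up from the hero CTA
1. Navigate to https://your-app.com
2. Click the Get started button in the hero
3. Type test@example.com into the email field
4. Click the Sign up button
5. Assert the confirmation heading says Check your inbox

#Case 2: Compare pricing tiers
1. Navigate to https://your-app.com
2. Click the Pricing link in the nav
3. Assert three pricing tiles are visible
4. Click the Compare plans link
```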
What if my app needs a login before any meaningful flow happens?
Rule #5 of the system prompt explicitly tells the model to skip auth-gated flows unless a visible signup or login form lets the test handle the gating itself. So for a marketing site or a logged-out landing page, the generator covers all the flows you care about. For a logged-in dashboard, you have two options. Option one: pass a URL that is already authenticated, by running the generator with `extension: true` against your own Chrome where you are signed in (server.ts handles the extension token plumbing). Option two: write the auth steps yourself as #Case 1 (3-4 lines), then let the generator append discovery cases for the post-login surface. The plan file is plain text. Adding three lines to the top is the same operation as editing any other Markdown file.
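A hand-written opening case might look like the sketch below; the path, field labels, and test account are placeholders, not anything the tool generates:

```markdown
#Case 1: Log in with the seeded test account
1. Navigate to /login
2. Type test@example.com into the email field
3. Type the test password into the password field
4. Click the Log in button
```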
Why intent-based Markdown instead of generated locator code?
Because flows do not actually break when the UI changes. Locators do. The most common reason an end-to-end suite goes red is that someone renamed a data-testid, restructured a button into a div with role=button, or reordered fields in a form. The flow itself is unchanged; the user still wants to sign up. By writing intent in Markdown ('Click the Sign up button') rather than a locator string ('page.getByTestId("signup-cta-v3")'), the test stays decoupled from the markup. At runtime the agent re-binds the intent to whatever element matches in the current accessibility tree. When the markup changes, the test still passes if the user-visible behaviour still works. When the user-visible behaviour breaks, the test fails on something a human can read in plain English and act on without spelunking through a locator update.
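To make the coupling concrete, here is a hedged sketch in Playwright's own .spec.ts style; the headings and URL are hypothetical (the test id comes from the article). The first test dies with the markup; the second survives anything that keeps the button's accessible name.

```typescript
import { test, expect } from "@playwright/test";

// Locator-coupled, codegen-style: breaks the moment someone renames the
// test id, even though the sign-up flow itself still works.
test("signup via recorded locator", async ({ page }) => {
  await page.goto("https://your-app.com");
  await page.getByTestId("signup-cta-v3").click(); // rots with the markup
  await expect(page.getByRole("heading", { name: "Create account" })).toBeVisible();
});

// Intent-style equivalent: resolve the element by its accessible role and
// name, roughly what the Assrt agent does when it re-binds an English step.
test("signup via accessible intent", async ({ page }) => {
  await page.goto("https://your-app.com");
  await page.getByRole("button", { name: /sign up/i }).click();
  await expect(page.getByRole("heading", { name: "Create account" })).toBeVisible();
});
```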
Can the generated tests run in CI or do I have to keep using Assrt to run them?
Both work. The plan file is yours. You can keep running it through Assrt (the path of least resistance, since the same agent that wrote the file knows how to execute it) or you can hand-translate the Markdown into Playwright's own .spec.ts format and let your existing CI runner handle it. The full runner is open source: the source code is in the repo, the npm package is @m13v/assrt, and the CLI is `npx @m13v/assrt run --url <url> --plan <markdown>`. There is no vendor backend, no SaaS dashboard, no mandatory cloud. Run it on your laptop, in GitHub Actions, or on a self-hosted Buildkite worker; the only network call is the one to the Anthropic API for the generation step, and if you set ANTHROPIC_BASE_URL you can point that at a local proxy or an air-gapped clone too.
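As a rough sketch, a CI job could chain the two commands quoted above; the staging URL and secret name are placeholders, and regenerating on every run is a choice, not a requirement (you can commit scenario.md and skip the discover step):

```bash
# Hypothetical CI job; URL and secret name are placeholders.
export ANTHROPIC_API_KEY="$CI_ANTHROPIC_KEY"

npx @m13v/assrt discover https://staging.your-app.com   # writes /tmp/assrt/scenario.md
npx @m13v/assrt run --url https://staging.your-app.com --plan /tmp/assrt/scenario.md
```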