Generator internals

A Playwright test generator that keeps generating while the user flow is still running

Every other Playwright test generator built on user flows shares one shape: launch a browser, record one session, emit a static spec file, stop. Assrt has the same starting point and one extra branch that nobody else ships. Every navigation event during the flow under test pushes the new URL onto a discovery queue, and a parallel pipeline generates 1 to 2 fresh test cases for each page in real time, while the parent flow keeps executing. This page is about that branch: the file it lives in, the constants that bound it, and the reason it exists.

Matthew Diakonov · 8 min read

Direct answer (verified 2026-05-05)

A Playwright test generator from user flows works one of two ways. The recorder family (Playwright codegen, BrowserStack AI Test Generator, TestDino) launches a Chromium, captures one user session, and emits a static .spec.ts. Assrt runs the flow itself with an AI agent driving Playwright MCP, and queues every URL the agent visits for parallel test discovery, capped at 20 pages and 3 concurrent passes. New #Case blocks stream into scenario.md while the parent flow is still executing. Source: playwright.dev/docs/codegen for the recorder model; github.com/assrt-ai/assrt-mcp for the runtime-discovery model.

The recorder model, and what it leaves on the floor

Playwright codegen is a beautiful tool. You run npx playwright codegen https://your-app.com, a Chromium window opens with the Playwright Inspector beside it, and every action you take is transcribed as TypeScript with role-based locators. Click a Sign in button and the inspector writes await page.getByRole('button', { name: 'Sign in' }).click(); type into an email field and you get a getByLabel plus a fill. Stop interacting, and codegen stops. The artifact is exactly the flow you ran. Every recorder-style generator on the market today, and the AI variants that wrap the same idea (BrowserStack AI Test Generator, TestDino, AegisRunner), shares that shape: one user session in, one static file out.
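For concreteness, here is roughly what that artifact looks like on disk. The selectors follow the examples above; the URL, field values, and final assertion are illustrative, not a real recording:

// Illustrative codegen-style output for the session described above.
import { test, expect } from '@playwright/test';

test('sign in', async ({ page }) => {
  await page.goto('https://your-app.com/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByRole('button', { name: 'Sign in' }).click();
  // assertion added by hand; the recorder captures actions, not intent
  await expect(page).toHaveURL(/dashboard/);
});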

The thing this shape leaves on the floor is everything you did not click. If your flow was sign up then dashboard then settings, the settings page contains a billing link, an avatar uploader, an export button, and probably 8 other interactive elements that you walked past. The recorder has no opinion about any of them. They are not in your spec because you did not touch them. The next time someone on your team breaks the avatar uploader, no test fails, because no test was ever written for it.

You could solve this by recording 20 different flows. Most teams don't, because manual recording does not scale. The teams that try end up with a brittle library of long sessions whose locators collectively rot every sprint. There is a better answer: notice that the agent is already on the settings page, and have something else generate the tests for it while the agent moves on.

Two test generators looking at the same user flow

First, the recorder family. Launch a browser. The user clicks through one path. Every action is transcribed to a static .spec.ts. When the user stops, the generator stops. The output covers exactly the path the user walked, with locator strings (page.getByRole, page.getByText) baked in.

  • Static output: one .spec.ts per recording session
  • Coverage equal to the path you walked
  • Locator strings are committed and start rotting on first UI change
  • Adjacent pages on the same flow get zero test cases

The runtime branch, in five frames

What follows is what actually happens in the source when you call assrt_test with a flow against a URL. Every line traces back to a span in /Users/matthewdi/assrt/src/core/agent.ts; line numbers are real and grep-checkable on the open-source repo.

Discovery during execution


Frame 1. The agent navigates

Your scenario starts. The agent loads the URL. After every navigate tool call, the first thing the agent does is invoke this.queueDiscoverPage(url), at agent.ts:402 and again at :691 inside the navigate tool handler. The flow is not paused; it moves to the next step in the same tick.
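A minimal sketch of that shape, assuming hypothetical names for everything except queueDiscoverPage (the real handler lives at agent.ts:691 and will differ in detail):

// Sketch only: class and method names other than queueDiscoverPage are assumed.
class AgentSketch {
  constructor(private browser: { navigate(url: string): Promise<void> }) {}

  queueDiscoverPage(url: string): void {
    // push onto the discovery queue; gated on caps, never awaited
  }

  async onNavigateTool(url: string): Promise<void> {
    await this.browser.navigate(url); // the navigation the flow asked for
    this.queueDiscoverPage(url);      // fire-and-forget: no await, no pause
    // control returns to the flow loop in the same tick
  }
}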

Discovery loop, in messages

  • Agent → queue: queueDiscoverPage('/settings')
  • queue → flush: push, gate on caps
  • flush → Claude: snapshot + screenshot + 1024 tokens
  • Claude → scenario.md: stream #Case 4: ...
  • Agent → queue: queueDiscoverPage('/billing') (parent flow keeps running)
  • queue → flush: push, gate on caps
  • flush → Claude: snapshot + screenshot + 1024 tokens
  • Claude → scenario.md: stream #Case 5: ...

Three concurrent passes are allowed (MAX_CONCURRENT_DISCOVERIES = 3); the diagram shows two for legibility. flushDiscovery is the single gate, which lets you tune both the concurrency ceiling and the page cap without touching anything else.
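A sketch of what that gate plausibly looks like. The constant is real (agent.ts:269); the state shape and the discoverPage callback are assumptions:

// Sketch only: DiscoveryState and discoverPage are assumed shapes.
const MAX_CONCURRENT_DISCOVERIES = 3;

interface DiscoveryState {
  queue: string[];      // URLs waiting for a generator pass
  inFlight: number;     // passes currently running
  browserBusy: boolean; // true while the parent flow is mid-action
}

function flushDiscovery(
  state: DiscoveryState,
  discoverPage: (url: string) => Promise<void>,
): void {
  // Both dials live behind this one gate: the concurrency ceiling and browser idle.
  while (
    state.queue.length > 0 &&
    state.inFlight < MAX_CONCURRENT_DISCOVERIES &&
    !state.browserBusy
  ) {
    const url = state.queue.shift()!;
    state.inFlight++;
    void discoverPage(url).finally(() => {
      state.inFlight--;
      flushDiscovery(state, discoverPage); // refill the slot that just freed
    });
  }
}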

The constants, in three lines

The full configuration surface for runtime discovery is three lines at agent.ts:269-271. Every other tool in this space hides the equivalent dials behind a SaaS dashboard or a paid plan. Read them, fork them, ship.

// /Users/matthewdi/assrt/src/core/agent.ts
const MAX_CONCURRENT_DISCOVERIES = 3;
const MAX_DISCOVERED_PAGES = 20;
const SKIP_URL_PATTERNS = [/\/logout/i, /\/api\//i, /^javascript:/i, /^about:blank/i, /^data:/i, /^chrome/i];

  • 3 concurrent generator passes. Every additional pass overlaps with the parent flow execution; bump it for big apps, drop it on a token budget.
  • 20 page ceiling per run. After 20 unique URLs, queueing becomes a no-op. Most user flows touch fewer than 8, so this rarely binds.
  • 4000 characters of accessibility tree per generator call. Plus a JPEG, plus a 1024-token output cap. Each pass settles in under two seconds on Haiku.

The SKIP_URL_PATTERNS list is short and opinionated. /logout is excluded because navigating there ends the session and breaks the parent run. /api/ paths return JSON, not pages worth UI-testing. The four scheme-based patterns (javascript:, about:blank, data:, chrome:) catch internal browser destinations that flicker through during navigation. Adding /admin or a feature-flag path is a one-line edit.
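Put together, the queueing gate is small enough to sketch in full. The constants are the real ones above; the seen-set and queue shapes are assumptions:

// Sketch only: the seen-set and queue are assumed shapes around the real constants.
const MAX_DISCOVERED_PAGES = 20;
const SKIP_URL_PATTERNS = [
  /\/logout/i, /\/api\//i, /^javascript:/i,
  /^about:blank/i, /^data:/i, /^chrome/i,
  // /\/admin/i,  // the one-line edit mentioned above, if you want it
];

const seen = new Set<string>();
const queue: string[] = [];

function queueDiscoverPage(url: string): void {
  if (SKIP_URL_PATTERNS.some((p) => p.test(url))) return; // not a testable page
  if (seen.has(url)) return;                              // already queued this run
  if (seen.size >= MAX_DISCOVERED_PAGES) return;          // ceiling hit: no-op
  seen.add(url);
  queue.push(url);
}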

What the discovery prompt actually says

The full DISCOVERY_SYSTEM_PROMPT, copied verbatim from agent.ts:256-267. Eleven lines, no proprietary YAML, no opaque scoring rubric.

You are a QA engineer generating quick test cases for an AI browser
agent that just landed on a new page. The agent can click, type,
scroll, and verify visible text.

## Output Format
#Case 1: [short name]
[1-2 lines: what to click/type and what to verify]

## Rules
- Generate only 1-2 cases
- Each case must be completable in 3-4 actions max
- Reference ACTUAL buttons/links/inputs visible on the page
- Do NOT generate login/signup cases
- Do NOT generate cases about CSS, responsive layout, or performance

This is intentionally tight. The discovery branch is not the place to write a 12-step regression scenario; that is what the manual assrt_plan tool exists for. Discovery is the cheap pass that gives you 1 to 2 smoke cases per page, fast enough to run inside the same timebox as the user flow that triggered it. If you want richer ideas, you let the parent flow finish, then call assrt_plan on the URLs the discovery branch surfaced.
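For the curious, here is the shape of the pass itself, sketched against a generic streaming client rather than any specific SDK. The 4000-character slice, the 1024-token cap, and the emit('discovered_cases_chunk', ...) channel are from the source as described on this page; every other name is an assumption:

// Sketch only: StreamingLlm is a stand-in, not a real SDK interface.
declare const DISCOVERY_SYSTEM_PROMPT: string; // the prompt above

interface StreamingLlm {
  stream(req: {
    system: string;
    maxTokens: number;
    text: string;
    imageJpegBase64: string;
  }): AsyncIterable<string>;
}

async function generateDiscoveryCases(
  llm: StreamingLlm,
  emit: (event: string, chunk: string) => void,
  snapshot: string,   // accessibility tree of the discovered page
  screenshot: string, // JPEG, base64-encoded
): Promise<void> {
  const chunks = llm.stream({
    system: DISCOVERY_SYSTEM_PROMPT,
    maxTokens: 1024,               // output cap per pass
    text: snapshot.slice(0, 4000), // accessibility-tree slice
    imageJpegBase64: screenshot,
  });
  for await (const chunk of chunks) {
    emit('discovered_cases_chunk', chunk); // streams straight into scenario.md
  }
}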


What the runtime branch actually buys you

The flow you ran and the surface area you touched are different things. A signup-to-dashboard run touches the marketing home, the signup form, an email-verification page, an onboarding wizard, and the dashboard itself. The flow tests one path through them. The discovery branch writes 1 to 2 cases for each one without you enumerating them.

Coverage halo from a single signup-to-dashboard flow

  • Marketing home: 'Click Pricing nav, assert /pricing renders'
  • Signup page: 'Click Continue with Google, assert OAuth dialog appears'
  • Email verify: 'Click Resend, assert success toast'
  • Onboarding step 1: 'Choose team size, assert step 2 renders'
  • Dashboard: 'Click Settings, assert tab list visible'
  • Settings: 'Click Billing, assert plan name renders'

None of these were in the original #Case 1 you wrote. The agent generated them in the background while running your scenario, gated by browser idle so they never slowed you down. The first time someone on your team renames the Resend button, the discovery-generated case fails on the next run. You did nothing to earn that test, and you do nothing to maintain it.

What it does not do, on purpose

The discovery branch is deliberately narrow. It does not write login or signup cases (the prompt forbids it; those need temp emails and live verification, which the heavyweight assrt_plan tool handles). It does not test CSS, responsive layout, or performance. It does not chain across pages: each discovered case is self-contained, 3 to 4 actions max, completable in isolation. And it does not bind to locator strings, because the runner re-discovers every element from a fresh accessibility-tree snapshot per step.

That last point matters most when the discovered cases hit your CI tomorrow. A locator-bound generator would have written page.locator('[data-testid="billing"]').click() and burned the next time someone renamed the test id. Discovery writes "Click the Billing tab" in plain Markdown; the runner finds it from the live accessibility tree on each run. There is nothing for a future UI refactor to break.
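The contrast, side by side (the Markdown case below is illustrative):

// What a locator-bound generator commits, and what rots on rename:
await page.locator('[data-testid="billing"]').click();

// What the discovery branch writes instead; the runner resolves "the Billing
// tab" from a fresh accessibility-tree snapshot on every run:
// #Case: Open billing
//   1. Click the Billing tab
//   2. Assert: the plan name renders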

Try it on your own flow

The simplest possible reproduction: install the package, give it a URL and a one-line plan, and watch the discovered cases stream into scenario.md.

# install once
npm i -g @m13v/assrt

# run a flow against your app and watch the discovery branch
npx @m13v/assrt run \
  --url http://localhost:3000 \
  --plan "#Case: Sign up and reach the dashboard
- Navigate to /signup
- Create a temp email and use it
- Submit the form
- Assert the dashboard heading is visible" \
  --json

# tail the scenario file in another terminal
tail -f /tmp/assrt/scenario.md

The first #Case is yours. #Case 2 through whatever-the-cap-allows are the discovery branch writing tests for every URL the agent passed through. Edit them, prune them, commit the file alongside your Playwright project, re-run by scenario UUID tomorrow.
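Re-running later does not require the plan again; as the FAQ below notes, a saved scenario replays by UUID:

// Replay a saved scenario against the same app:
assrt_test({
  url: "http://localhost:3000",
  scenarioId: "<uuid printed on the first run>",
});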

Want this branch generating against your real flows?

30 minutes. Bring a localhost URL and a flow you wish you had tests for. We will run it together and watch what the discovery loop writes.

Frequently asked questions

How does a Playwright test generator built on user flows actually work?

Two camps exist, and they answer this question very differently. The Playwright codegen camp launches a Chromium window, records every click, type, and navigation you perform, and emits a TypeScript spec file with locator strings (page.getByRole, page.getByText, and friends). You stop interacting, codegen stops generating. Assrt is in the second camp. You hand it a URL and an optional plan, the agent drives Playwright MCP through the flow itself, and every navigation event during that run is pushed onto a discovery queue at agent.ts:402 and agent.ts:691. A separate discovery pipeline, capped at 3 concurrent passes (MAX_CONCURRENT_DISCOVERIES) and 20 unique pages (MAX_DISCOVERED_PAGES), takes a snapshot plus screenshot of each newly-visited URL and asks Claude or Gemini to write 1 to 2 fresh #Case blocks for it. Those blocks stream back into scenario.md while the parent run keeps executing.

Where in the Assrt source can I read the generator that fires on every navigation?

Three blocks at /Users/matthewdi/assrt/src/core/agent.ts. Lines 256-267 hold DISCOVERY_SYSTEM_PROMPT, the 11-line prompt that constrains discovery output to '1 to 2 cases, each completable in 3 to 4 actions max, no login or signup, no CSS or responsive layout claims'. Lines 269-271 declare the constants: MAX_CONCURRENT_DISCOVERIES = 3, MAX_DISCOVERED_PAGES = 20, and SKIP_URL_PATTERNS = [/\/logout/i, /\/api\//i, /^javascript:/i, /^about:blank/i, /^data:/i, /^chrome/i]. Lines 478-541 are the moving parts: queueDiscoverPage normalises the URL and gates on the cap, flushDiscovery picks one URL when the browser is idle and starts the pass, generateDiscoveryCases streams the model output back through the emit('discovered_cases_chunk', ...) channel into the running scenario file. Forking any of these is a single-file change.

Why generate during the flow instead of just recording the flow and stopping?

Recording captures the flow you ran. The flow you ran is rarely the only flow worth testing on the pages you touched. If your scenario was 'sign up, get to dashboard, click into Settings', the codegen output covers exactly that path. The Settings page itself contains a tab bar, a billing link, an avatar uploader, an export button, and probably 8 other interactive elements. None of them get a test, because you did not click them during recording. Assrt's discovery loop fires on every URL change, so the moment your flow lands on /settings, a parallel pass writes test cases for the visible-but-unclicked things on /settings without you doing anything. By the time the parent flow finishes, you have your original cases plus a coverage halo around every page the flow touched.

Does the discovery branch slow down the main flow it is generating from?

No, because the discovery work is gated on browser idle. flushDiscovery checks browserBusy before starting any pass; if the agent is mid-action (clicking, typing, navigating), discovery waits. The accessibility-tree slice fed to the model is capped at 4000 characters and max_tokens is set to 1024 for the discovery call, so each pass is small and fast (typically under 2 seconds on Haiku). With 3 concurrent passes and a 20-page ceiling, the worst-case discovery overhead on a long flow is bounded and overlaps fully with execution. The main flow keeps running on its own scenario; discovered cases are streamed into a separate region of the scenario file and shown in the UI under 'Test Cases Auto Discovered'.

What does the generated test actually look like and where is it stored?

Markdown #Case blocks at /tmp/assrt/scenario.md, the same format the manual generator and the assrt_diagnose tool use. A discovered case after the agent visits /pricing might be: '#Case 4: Toggle annual billing\n1. Click the Annual toggle\n2. Assert: Save 20% badge appears'. No locator strings, no waits, no XPath. The runner re-discovers each element from the live accessibility tree per step (snapshot-first behavior enforced by agent.ts:206-218), so the cases never bind to a selector that can rot. Files live under /tmp/assrt by default; you check scenario.md and scenario.json into your repo if you want them version-controlled, or pull them by scenario UUID for re-runs (assrt_test({ url, scenarioId })).

How is this different from Playwright codegen, BrowserStack AI Test Generator, or TestDino?

All three of those are recorder-shaped: launch a browser, record one user session, emit a static spec. Codegen ships the locator strings raw (page.getByRole...). BrowserStack and TestDino take that output and ask an LLM to refactor it for stability. Output is a .spec.ts file you commit. Assrt does not record a one-off; it executes a flow and runs a generator pass on every page it lands on while the flow is in progress. The artifact is intent in Markdown, not code in TypeScript. Where codegen says 'here is the test for what you did', Assrt says 'here is the test for what you did, plus 1 to 2 tests per page you passed through, generated while you were still passing through them'. The two approaches are complementary: codegen is fine when you want a one-shot script and never want it to learn anything new.

Can I see which URLs the discovery branch actually picked up during my run?

Yes. The agent emits a page_discovered event (agent.ts:497) the moment a URL is normalised and queued, with the URL, an optional title, and a JPEG screenshot of the page at first contact. The web app surfaces these under the 'Test Cases Auto Discovered' panel at /Users/matthewdi/assrt/src/app/app/test/page.tsx:992-1040, with editable steps and a delete control per case so you can prune anything irrelevant before it lands in the scenario. The events also flow through the streaming JSON if you run the CLI with --json, so a CI pipeline can record every discovered URL alongside the parent run's pass/fail status.

What does it skip, and why those patterns specifically?

SKIP_URL_PATTERNS at agent.ts:271 is a short, literal blocklist: /logout, /api/, javascript:, about:blank, data:, and chrome:// URLs are never queued. Logout is excluded because navigating there ends the session and breaks the parent flow. /api/ paths are excluded because they return JSON, not a page worth generating UI cases for. The four scheme-based patterns (javascript:, about:blank, data:, chrome:) catch internal browser URLs that flicker through during navigation but are not real destinations. URLs are also normalised to origin+pathname (trailing slash trimmed) before deduplication, so /pricing and /pricing/ count as one page, and a query-string tracker like ?utm_campaign=foo does not trigger a duplicate pass.
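That normalisation is a few lines of standard URL handling; a sketch, with the helper name assumed:

// Sketch only: the helper name is assumed, the behavior is as described above.
function normalizeDiscoveryUrl(raw: string): string {
  const u = new URL(raw);
  let path = u.pathname;
  if (path.length > 1 && path.endsWith('/')) path = path.slice(0, -1); // trim trailing slash
  return u.origin + path; // query string and fragment never reach the dedupe set
}

// normalizeDiscoveryUrl('https://app.dev/pricing/?utm_campaign=foo')
//   === normalizeDiscoveryUrl('https://app.dev/pricing')  -> one page, one pass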

Can I run Assrt against a flow on a localhost dev server?

Yes, that is the default mode. The MCP runner spawns a local Playwright Chromium against any URL you pass to assrt_test, so http://localhost:3000 or a private preview URL works the same as a public one. The discovery loop respects whatever URL the agent ends up on, so a flow that goes from /signup to /onboarding/step-1 to /onboarding/step-2 generates discovered cases for each onboarding step without you naming them. For headless CI, set ASSRT_HEADED=0 and the runner stays invisible. For local debugging with a visible browser, pass headed: true to assrt_test.

How do I prevent the discovery branch from generating tests for pages I don't care about?

Three controls. First, the constants at agent.ts:269-271 are right there in the source; lower MAX_DISCOVERED_PAGES if 20 is too many for your app, or extend SKIP_URL_PATTERNS to exclude an admin path or a feature-flagged area. Second, the parent flow's scope determines what gets discovered: if your scenario only navigates through the checkout flow, only checkout-adjacent pages enter the queue. Third, the discovered cases panel in the web app is editable; you can prune individual cases before they merge into scenario.md. Most teams leave the defaults alone and prune at the per-case level, because 20 pages is rarely a problem on a single user flow.
