
The AI Playwright test generator where the prompt is 18 lines and the output is Markdown you own

Every AI Playwright test generator on page one of Google is a black box. You paste a URL, you get tests, you hope. Assrt publishes its entire generation path: the 18-line system prompt that shapes the output, the exact 3-screenshot + accessibility snapshot payload it hands to the model, and the plain Markdown file it writes to your disk. No proprietary YAML, no vendor dashboard, no locator strings that will rot next sprint.

Assrt Engineering
11 min read
4.9 from Assrt MCP users
  • PLAN_SYSTEM_PROMPT is 18 lines at server.ts:219-236
  • 3 screenshots at scroll 0/800/1600 + 8000 chars accessibility text
  • Output is plain Markdown at /tmp/assrt/scenario.md
  • Open source, self-hosted, $0 beyond LLM tokens

What every top search result for this keyword leaves out

Search "ai playwright test generator" and the first page is Playwright's own Planner, Generator, and Healer agents (new in v1.56), BrowserStack's Playwright AI generator, ZeroStep, TestDino's tool roundups. All of them describe the generator the same way: point at a URL, receive tests. Nobody ships the prompt. Nobody publishes the payload. Nobody tells you whether the output is yours to keep or stuck in their dashboard.

That gap is the whole angle of this page. Assrt publishes all three: the prompt, the payload, and the output file. You can read them in ten minutes and then decide whether this tool belongs in your stack.

Closed generator
Prompt hidden, output locked to a dashboard

You paste a URL, a vendor prompt runs against a vendor model, and the generated test lives in a vendor UI. Leaving means re-authoring everything.

Assrt generator
Prompt in the repo, output on your disk

Three screenshots and 8000 chars of accessibility text go to Claude Haiku with an 18-line prompt. The output is #Case Markdown at /tmp/assrt/scenario.md.

The generation pipeline, with no parts hidden

Three inputs on the left. One generator in the middle. Three outputs on the right. Everything in this diagram maps to a specific line in the assrt-mcp repo.

URL → 3 snapshots → Claude Haiku → #Case Markdown

Inputs: Target URL · 3 screenshots · Accessibility text
Generator: assrt_plan
Outputs: scenario.md · scenario.json · Runnable suite

The 18 lines that write your tests

This is the entire system prompt that shapes every #Case Assrt generates. It is not a simplified version. It is what lives at server.ts:219-236. Read it, disagree with it, copy it into your own fork, or ship it as is.

assrt-mcp/src/mcp/server.ts

Two constraints do most of the heavy lifting. The self-contained rule kills the 40-step test that assumes three previous cases set up state. The observable-things rule kills tests that assert on CSS or pixel positions (the most common flake source in AI-generated Playwright suites). Combine that with the 5-8 cases cap and the 3-5 actions cap, and the model physically cannot return the sprawling brittle suites other generators default to.
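Those two caps are mechanical enough to lint for. A minimal sketch of such a check, assuming a parsed-plan shape (`PlanCase` and `withinPromptCaps` are hypothetical names, not from the assrt-mcp repo):

```typescript
// Hypothetical shape for a parsed #Case block; not the repo's actual type.
interface PlanCase {
  name: string;
  actions: string[];
}

// Enforce the two caps the prompt states: 5-8 cases, 3-5 actions each.
function withinPromptCaps(cases: PlanCase[]): boolean {
  if (cases.length < 5 || cases.length > 8) return false;
  return cases.every((c) => c.actions.length >= 3 && c.actions.length <= 5);
}
```

A plan that fails this check is exactly the sprawling suite the prompt is designed to prevent.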

18 lines

The entire generator prompt is 18 lines of plain text in an open-source repo. No reverse engineering required. No trade secret. Fork, tune, ship.

PLAN_SYSTEM_PROMPT, assrt-mcp/src/mcp/server.ts:219-236

The call site: 3 screenshots, 8000 chars, one Haiku call

The prompt is half the story. The other half is the payload the prompt actually runs against. Here is the real code path: launch Chromium, navigate, snapshot, scroll 800, snapshot, scroll 800, snapshot, concat, slice, send. No retries, no hidden ReAct loop, no pre-processing step you cannot see.

assrt-mcp/src/mcp/server.ts
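The shape of that loop can be sketched against a stand-in interface. Everything below is illustrative, not the repo's code: `PageLike` is a hypothetical stand-in for the Playwright MCP session, and `captureThree` is a paraphrase of the path described for server.ts:793-809.

```typescript
// Illustrative stand-in for the Playwright MCP session; not the real API.
interface PageLike {
  screenshot(): Promise<string>;       // base64 JPEG
  snapshot(): Promise<string>;         // accessibility snapshot text
  scrollBy(px: number): Promise<void>;
}

// Capture at scroll offsets 0 / 800 / 1600, then concat and slice,
// mirroring the sequence the article describes.
async function captureThree(page: PageLike) {
  const shots: string[] = [];
  const a11y: string[] = [];
  for (let i = 0; i < 3; i++) {
    if (i > 0) await page.scrollBy(800);
    shots.push(await page.screenshot());
    a11y.push(await page.snapshot());
  }
  return { shots, accessibilityText: a11y.join("\n").slice(0, 8000) };
}
```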

Same signup test, two output formats

A closed generator emits something like the left. Assrt emits the right. Notice that the Assrt output has zero locator strings, zero timeouts, and fits in the width of a terminal.

Closed generator (YAML in a dashboard) vs. Assrt (Markdown on disk)

Proprietary YAML. Every locator is a hardcoded string that the generator guessed from the DOM. Lives in a vendor dashboard with access controls. Migrating away means re-authoring every case.

  • data-testid strings (email-input-v3, dashboard-heading-v2)
  • Hardcoded timeout_ms: 5000 on every assertion
  • on_failure: heal_with_model is a paid vendor feature
  • Cannot grep, cannot render on GitHub, cannot diff cleanly

What a generator run actually prints

This is the sequence you see when you run assrt_plan on a live URL. The three snapshots, the scroll offsets, the slice step, the model call, the parsed cases, the file write. Each line corresponds to a block in server.ts:793-855.

npx assrt-mcp call assrt_plan --url https://example.com

Numbers from the source, not a vendor benchmark

Every figure below is a constant or default pulled directly from the assrt-mcp repo. Each cites its file and line so you can verify.

18
Lines in the generator prompt
PLAN_SYSTEM_PROMPT, server.ts:219-236.
3
Screenshots per URL
Scroll 0 / 800 / 1600, server.ts:793-805.
8000
Accessibility chars sliced
.slice(0, 8000), server.ts:809.
4096
max_tokens on the Haiku call
server.ts:831.

How the generator works, end to end

Five steps. Each one is a block of code you can open in the repo.

1

You hand the tool a URL

Either via MCP ('Use assrt_plan on http://localhost:3000' to Claude Code) or via CLI (npx assrt-mcp --plan --url ...). No form to fill, no dashboard to click through, no test template to pick from. The URL is the only required input.

2

Assrt spawns a real Chromium and captures three snapshots

The runner starts @playwright/mcp locally, navigates, takes a screenshot and accessibility snapshot, scrolls by 800 pixels, captures again, scrolls by 800 pixels, captures a third time. See server.ts lines 793-805. The scroll constant is hardcoded but trivially tunable in the source.

3

It builds one 4-part payload and calls Claude Haiku

Three base64 JPEGs, plus the concatenated accessibility text sliced to 8000 characters, plus a short user framing message, plus the 18-line PLAN_SYSTEM_PROMPT as system. One anthropic.messages.create call with max_tokens 4096. That is the whole model call; there is no orchestration layer hidden behind it.
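That four-part payload is plain data, so it can be sketched as such. The field names below follow the Anthropic Messages API's content-block format, but `buildPlanRequest` is a hypothetical helper and the framing text is invented for illustration; the repo's actual wiring may differ.

```typescript
// Hypothetical helper assembling the request body; a sketch, not the repo's code.
function buildPlanRequest(
  screenshots: string[],      // 3 base64 JPEGs
  accessibilityText: string,  // already sliced to 8000 chars
  systemPrompt: string,       // the 18-line PLAN_SYSTEM_PROMPT
) {
  const imageParts = screenshots.map((data) => ({
    type: "image" as const,
    source: { type: "base64" as const, media_type: "image/jpeg" as const, data },
  }));
  return {
    model: "claude-haiku-4-5-20251001",
    max_tokens: 4096,
    system: systemPrompt,
    messages: [{
      role: "user" as const,
      content: [
        ...imageParts,
        // Short user framing message; the exact wording here is invented.
        { type: "text" as const, text: `Plan test cases for this page.\n\n${accessibilityText}` },
      ],
    }],
  };
}
```

The returned object is what a single `anthropic.messages.create` call would consume; there is nothing else to it.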

4

The model returns 5 to 8 #Case blocks

Because the prompt caps the output at 5-8 cases of 3-5 actions each, the result is a small, focused plan. No sprawling 40-step Page Object dance, no generated locator strings. Every step is English: navigate, click, type, assert visible text or URL or title.
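The #Case format is regular enough that a parser fits in a dozen lines. A minimal sketch (`parseCases` and `ParsedCase` are hypothetical; the repo's actual parser may differ):

```typescript
interface ParsedCase {
  name: string;
  steps: string[];
}

// Split a plan string into #Case blocks and their numbered steps.
function parseCases(plan: string): ParsedCase[] {
  const cases: ParsedCase[] = [];
  for (const line of plan.split("\n")) {
    const heading = line.match(/^#Case \d+:\s*(.+)$/);
    const step = line.match(/^\d+\.\s+(.+)$/);
    if (heading) cases.push({ name: heading[1], steps: [] });
    else if (step && cases.length > 0) cases[cases.length - 1].steps.push(step[1]);
  }
  return cases;
}
```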

5

The plan lands on your disk as Markdown you own

The plan string is written to /tmp/assrt/scenario.md (layout defined in assrt-mcp/src/core/scenario-files.ts). You can edit it, delete cases, add new ones, or paste it into your repo. When you run the suite, the Assrt runner reads this file, re-discovers every element from the live accessibility tree per step, and executes on real Chromium through @playwright/mcp.

What the generator hands you, in both worlds

Left: what a closed AI Playwright generator produces, stuck in a vendor format. Right: what Assrt writes to your disk. Both describe the same three-case signup flow.

Closed generator (YAML) · 24 lines
# A closed AI Playwright generator's output.
# Proprietary shape. Lives in a vendor dashboard.
# You cannot grep it, GitHub cannot render it,
# and leaving the vendor means re-authoring everything.

scenario:
  id: sc_8f2e91c4ab
  name: user_signup_flow
  runtime_config:
    browser: chromium
    headless: true
    viewport: { width: 1280, height: 720 }
    retry: 2
    wait_ms: 5000
  steps:
    - action: navigate
      target: "/signup"
    - action: fill
      locator_type: "data-testid"
      locator_value: "email-input-v3"
      value: "${VAR.email}"
    - action: click
      locator_type: "role"
      locator_value: "button"
      locator_name: "Sign up"
    - action: assert
      type: "element_visible"
      locator_type: "data-testid"
      locator_value: "dashboard-heading-v2"
      timeout_ms: 5000
  on_failure: heal_with_model
Assrt (scenario.md) · 21 lines
# Assrt generator's output: plain Markdown.
# Saved to /tmp/assrt/scenario.md. Grep works. Diff works.
# Check it into your repo, it is a flat text file.

#Case 1: A new user signs up and lands on the dashboard
1. Navigate to /signup
2. Type a disposable email into the email field
3. Click the Sign up button
4. Wait for the page to stabilize
5. Assert: the heading on the page says "Dashboard"

#Case 2: A visitor explores the pricing page from the nav
1. Navigate to /
2. Click the Pricing link in the header
3. Assert: the URL path is /pricing
4. Assert: the heading "Plans" is visible

#Case 3: A logged-out user hits a protected route
1. Navigate to /dashboard
2. Assert: the URL path is /signin
3. Assert: the heading "Sign in" is visible

# There is no locator string in this file. The runner re-discovers
# every element from the live accessibility tree at run time.

Assrt vs. closed AI Playwright test generators

Treat this as a lens, not a leaderboard. Any closed generator that hides its prompt and stores output in a dashboard reads like the left column.

| Feature | Closed AI Playwright generators | Assrt (open prompt + Markdown output) |
| --- | --- | --- |
| Is the generator prompt published and auditable? | No. The prompt is in the vendor's backend. | Yes. 18 lines at server.ts:219-236. Fork it and ship. |
| What you hand the generator | A URL plus sometimes credentials, behind a form | A URL, either from Claude Code (MCP) or npx assrt-mcp |
| What the model actually sees | Black box. You cannot log the payload. | 3 JPEG screenshots + 8000 chars accessibility text + system prompt |
| Output format | Proprietary YAML / JSON / SaaS-internal records | Plain Markdown #Case blocks at /tmp/assrt/scenario.md |
| Where the tests live | Vendor dashboard, behind an account | Your disk. Optionally your Git repo. |
| Locator strings in the generated test | Yes, then sold as a feature to be healed later | Zero. Intent only, re-discovered at run time. |
| Runtime engine | Proprietary runner or wrapped Playwright | Real @playwright/mcp driving real Chromium |
| Cost at comparable scale | $7.5K/month per seat for closed AI QA platforms | $0 + Anthropic tokens (open source, self-hosted) |
| Data boundary | Your DOM and screenshots leave your network | Set ANTHROPIC_BASE_URL; everything stays in-house |

Generate tests for your URL on the call

15 minutes. Bring a URL. We run assrt_plan live, read the generated #Case blocks together, and show you what it looks like to never maintain a locator string again.

Book a call

AI Playwright test generator, answered

What actually goes into the model when Assrt generates a Playwright test from a URL?

Three JPEG screenshots and up to 8000 characters of accessibility-snapshot text, plus one prompt. The implementation is at assrt-mcp/src/mcp/server.ts lines 793-834. The assrt_plan tool launches a local Chromium via Playwright MCP, navigates to your URL, calls screenshot() and snapshot(), scrolls by 800 pixels, captures again, scrolls by 800 pixels, captures a third time. It then slices the concatenated accessibility text to 8000 chars, wraps the three screenshots as base64 image parts, and hands everything to claude-haiku-4-5-20251001 with max_tokens 4096. The model returns a plan string containing 5 to 8 #Case blocks, which is the test file the runner will later execute. Nothing about the payload is hidden; you can run the flow in your terminal and inspect every byte.

Can I actually read the prompt that generates my Playwright tests?

Yes, and you do not have to scrape it from a request log. It lives in the repo at assrt-mcp/src/mcp/server.ts lines 219-236 as a constant named PLAN_SYSTEM_PROMPT. The prompt is 18 lines long and covers four things: (1) what the browser agent can and cannot do, (2) the exact output format (#Case N: name followed by steps), (3) six CRITICAL Rules (self-contained cases, specific selectors, observable verifications, short 3-5 action cases, no features behind auth unless visible, 5-8 cases max), and (4) the constraint that it is generating for an AI agent that cannot inspect CSS or run arbitrary JavaScript. Closed-source generators like BrowserStack Playwright AI, ZeroStep, and Playwright's own Generator agent do not publish this. If you want to change behavior, you fork the file and ship. No waiting on a vendor.

How is this different from Playwright's own built-in Generator agent in v1.56?

Playwright's Generator agent writes TypeScript spec files that contain locator strings and call expect(). That is lovely if you already have a Playwright project and want AI-written .spec.ts code committed to it. Assrt writes something different: plain Markdown with intent ('Click the Sign up button', 'Assert the heading says Dashboard'), stored at /tmp/assrt/scenario.md. When the Assrt runner later executes those cases, it re-discovers every element from the live accessibility tree per step, so the tests never contain locator strings that can rot. Both approaches use @playwright/mcp under the hood; the difference is the artifact they hand you. If you want an AI to write you TypeScript that you will later maintain, use Playwright Generator. If you want an AI to write you an intent description that it will execute forever without ever binding to a selector, use Assrt.

Does Assrt emit proprietary YAML or a locked SaaS format?

No. The generator output is literally a string containing #Case blocks in Markdown. You can paste it into a gist, check it into Git, render it on GitHub, grep it, diff it, run it through a linter, or copy half of it into another project. The scenario layout is defined at assrt-mcp/src/core/scenario-files.ts: scenario.md is the plan, scenario.json is metadata (ID and name), results/latest.json is the last run. Closed AI Playwright generators like Testim, Mabl, and Momentic store the generated test cases in their own backend and render them in their own dashboard; moving off their platform means re-authoring everything from scratch. Assrt's format is a flat file on your disk, so migration cost in either direction is the text of the file.

Why three screenshots at scroll offset 0, 800, 1600, instead of one full-page capture or the whole DOM?

Three screenshots with scroll gives the model a fair shot at seeing the hero, the mid-page content, and below-the-fold features without blowing past context limits on a long landing page. A single full-page screenshot gets compressed and text becomes unreadable for the model. Feeding the raw DOM drowns the model in divs and inline styles that do not help it reason about user-visible flows. The 8000-char accessibility snapshot fits under Haiku's token budget, leaves room for the screenshots, and keeps the roles, labels, and text content that matter for generating test cases. The three-screenshot choice is a deliberate trade between coverage and token cost, and the constant 800 (pixels per scroll) is right there at server.ts:797 and 802 if you want to tune it.

Which Claude model generates the tests and can I swap it for something else?

claude-haiku-4-5-20251001, called with max_tokens 4096 (server.ts:830-831). Haiku is the right default: generating a test plan from a URL is a bounded task with a clear output shape, and Haiku is fast and cheap for it. If you want a bigger model for richer test ideas, the assrt_plan MCP tool accepts a model parameter, and the whole pipeline honors the ANTHROPIC_BASE_URL environment variable, so you can point at a local proxy, a hosted clone, or an air-gapped endpoint. For teams with strict data policies, this matters because the generator payload (your URL, your screenshots, your accessibility tree) only ever leaves your machine if you let it.

I came here from a Reddit thread about flaky AI-generated Playwright tests. Does Assrt fix that?

Most flakiness in AI-generated Playwright tests comes from two sources: brittle locator strings that the model guessed and hardcoded sleeps that do not match real page timing. Assrt sidesteps both. The generator writes intent in Markdown, not locator strings, so there is nothing for a locator change to break. The runner uses a wait_for_stable primitive backed by a MutationObserver (defined at agent.ts:872-925 in the assrt repo), which waits until 2 consecutive seconds pass with zero DOM mutations or 30 seconds elapse, so fast pages return fast and streaming pages wait as long as they actually need. Between those two decisions, the kind of flakiness you see on Reddit threads for AI codegen tools mostly does not happen. What does happen: a test can fail because the app genuinely changed in a way your intent no longer covers, in which case you run assrt_diagnose and get a corrected #Case to paste back in.
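The idea behind that stability wait can be sketched independently of the repo. The block below is a simplification for illustration: the real implementation (agent.ts:872-925) uses a MutationObserver inside the page, while this sketch polls an injected mutation counter; `waitForStable` and its parameter names are invented here.

```typescript
// Illustrative sketch of a stability wait: resolve true once the mutation
// count stops changing for `quietMs`, or false after `timeoutMs`.
async function waitForStable(
  getMutationCount: () => number,
  quietMs = 2000,
  timeoutMs = 30000,
  pollMs = 100,
): Promise<boolean> {
  const start = Date.now();
  let last = getMutationCount();
  let quietSince = Date.now();
  while (Date.now() - start < timeoutMs) {
    await new Promise((r) => setTimeout(r, pollMs));
    const now = getMutationCount();
    if (now !== last) {
      last = now;            // the page is still changing
      quietSince = Date.now();
    } else if (Date.now() - quietSince >= quietMs) {
      return true;           // quiet long enough: stable
    }
  }
  return false;              // timed out while still mutating
}
```

The payoff of this shape is exactly what the answer above claims: a static page returns after the quiet window, and a streaming page waits as long as it genuinely mutates, up to the ceiling.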

Can I generate a test and then edit it by hand?

Yes, and this is the workflow most people settle into. The generator's job is to do the first 80 percent: recognize the visible flows on a URL and draft self-contained cases. Then you open /tmp/assrt/scenario.md in your editor, tweak the wording, delete the ones you do not care about, add a case the generator missed (a pattern it could not see because it is behind auth), save. The runner auto-syncs edits to cloud storage via a file watcher, but the authoritative copy is still the file on your disk. You can also check scenario.md into your repo next to your Playwright project as your living test spec; the whole point of the Markdown format is that it survives tooling changes.

How do I actually run the generator on my own app?

Two paths. Through an MCP-capable client like Claude Code: connect the assrt-mcp server (one line in your MCP config), then ask 'Use assrt_plan on http://localhost:3000'. The tool launches a local browser, takes the three screenshots, sends them to Haiku, and writes the result. Or from the CLI: npx assrt-mcp is the entry point (see assrt-mcp/src/cli.ts), and you can call it with --plan to generate, --run to execute, or combined. Both paths require an Anthropic credential; the getCredential helper (server.ts:777) supports both API keys and OAuth tokens, so you can use a Claude Pro subscription or a raw key depending on your setup.

What keeps the generator from hallucinating fake form fields or buttons that do not exist?

The accessibility snapshot. Everything the model writes about selectors, buttons, and labels has to trace back to nodes that actually appeared in the snapshot text it was given, because the runner will later try to find those nodes for real. A hallucinated 'Click the Checkout button' on a page with no Checkout button fails the moment the test runs, and assrt_diagnose will call it out. In practice, the stricter constraint is the PLAN_SYSTEM_PROMPT rule that each case must be SELF-CONTAINED and VERIFY OBSERVABLE THINGS (visible text, page titles, URLs, element presence). That rule alone kills the two biggest hallucination patterns for AI test generation: tests that assume invisible state and tests that assert on CSS. Everything left is grounded in what the model literally saw in the three snapshots.
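A naive version of that grounding constraint is checkable in a few lines: pull the quoted labels out of a step and confirm each appears in the snapshot text the model saw. This is a deliberate simplification (`stepIsGrounded` is a hypothetical helper; the real enforcement is the runner failing to find the element at run time):

```typescript
// Naive grounding check: every double-quoted label in a step must appear
// verbatim in the accessibility snapshot. Illustrative only.
function stepIsGrounded(step: string, snapshotText: string): boolean {
  const labels: string[] = [];
  const re = /"([^"]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(step)) !== null) labels.push(m[1]);
  return labels.every((label) => snapshotText.includes(label));
}
```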
