The cheapest AI Playwright test maintenance is no maintenance, because there is no locator to break
Every mainstream AI Playwright self-healing product (BrowserStack, LambdaTest, TestMu, the new Healer agent) assumes the same thing you do: your tests contain locator strings, those locators go stale, and a model should try to patch them at failure time. It works often enough to sell. The better idea is not to store locators at all. Assrt's agent takes a snapshot of the live accessibility tree before every action, uses ephemeral ref IDs that live only for that one step, and then throws them away. There is nothing to go stale, because nothing was saved in the first place.
What every top search result for this keyword gets wrong
Search "ai playwright test maintenance" and the results converge on a single story: locators go stale, AI patches them, your suite gets shorter outages. BrowserStack pitches Self-Heal. LambdaTest calls it Auto Heal. TestMu frames it as auto-healing. The Ministry of Testing essay walks you through an LLM that rewrites your locator. Playwright's own new Healer agent runs a failing test in debug mode and uses an accessibility-tree snapshot to propose a fix. All useful. All locked to the same premise: the test file contains a locator, the locator breaks, a model fixes it after the fact.
The premise is where the cost is. If your test never stores a locator, there is nothing for a healer to heal. That is the angle this guide is about, and it is the one thing the SERP uniformly skips.
Locator-first: your spec has getByTestId("submit"). Someone deletes the test-id. Test fails. AI proposes a new selector. You review and commit. Again next sprint.
Snapshot-first: your test says "Click the Sign up button" in plain Markdown. The agent reads the live accessibility tree per step and finds the button. Refactors never touch the test.
The rule is a system prompt you can read and audit
Self-healing in closed products is an ML black box. Assrt's snapshot-first discipline is 20 lines of text in the agent's system prompt. It is open source. It is boring. Boring is what you want for a thing that runs on every CI build.
Two things to notice. The CRITICAL Rules block hard-wires the snapshot-per-step loop at the prompt layer, which means a fine tune or a model swap cannot drift away from it without editing the repo. The Error Recovery block handles the case where a ref you just read becomes stale because the page moved under you (a modal opened, a nav happened): re-snapshot, pick a different ref, scroll and retry, then surface evidence. No retry loop in the test author's file. No fragile Page Object to keep in sync.
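Because the prompt is plain text in the repo, you can read it directly. The excerpt below is a paraphrase assembled from the lines this guide quotes elsewhere (agent.ts:206-226), not a verbatim copy; check the source for the exact wording:

```text
CRITICAL Rules (excerpt, paraphrased):
- ALWAYS call snapshot FIRST before any click or type.
- If a ref is stale (action fails), call snapshot again to get fresh refs.

Error Recovery (excerpt, paraphrased):
- Action failed? The page may have shifted (modal opened, navigation happened).
- Re-snapshot and try a different ref or approach.
- Scroll and retry; after three failures, mark the step failed with evidence.
```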
“Assrt stores zero selectors across your test suite. Every ref is re-discovered per step and discarded after. Maintenance is whatever your Markdown intent files need, which is usually nothing.”
SYSTEM_PROMPT, agent.ts:206-218
Same signup test, two mental models
The traditional Playwright spec stores three locators. Each is a future PR waiting to be written. The Assrt version stores none.
Traditional locator-first spec vs. Assrt #Case Markdown
Every click and assertion binds to a selector string at authoring time. When the UI shifts, the test breaks. AI self-heal can patch, but only after a failed run, and only with human review.
- input[name="email"] breaks if the field is renamed
- getByRole("button", { name: "Sign up" }) breaks on copy change
- [data-testid="dashboard-heading"] breaks if someone drops the test-id
- Hardcoded 5000ms timeouts on CI = flakes you fight forever
The two specs side by side
```ts
// The traditional Playwright approach you maintain forever.
// Every selector here is a liability waiting for a UI change.
import { test, expect } from "@playwright/test";

test("signup then see dashboard", async ({ page }) => {
  await page.goto("/signup");
  // Locator 1. Breaks if the email field gets renamed or re-wrapped.
  await page.locator('input[name="email"]').fill("user@example.com");
  // Locator 2. Breaks if the button copy changes or role nesting shifts.
  await page.getByRole("button", { name: "Sign up" }).click();
  // Locator 3. Breaks if the test-id gets dropped during a refactor.
  await expect(page.locator('[data-testid="dashboard-heading"]'))
    .toBeVisible({ timeout: 5000 });
  // Hope 5000ms is enough on CI. Add retries when it isn't. Go numb.
});
```

How a #Case becomes a real Playwright run
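The primary input is the #Case intent file. Reconstructed from the signup example used elsewhere on this page (the step wording is illustrative), the Assrt version of the same flow reads:

```markdown
#Case 1: Signup then see dashboard
1. Navigate to /signup
2. Type a temp email into the email field
3. Click the Sign up button
4. Assert: dashboard heading is visible
```

No selector strings, no timeouts: the agent re-discovers every element at run time.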
You provide three inputs. The agent, running Claude Sonnet, translates intent to action through the Playwright MCP bridge. The outputs are what you would see from any Playwright test run, because they are a Playwright test run.
Inputs to Playwright, via snapshot-first agent
What the numbers look like
Not a vendor benchmark. Numbers taken from Assrt's source (prompts, defaults, code paths). Each one is verifiable by opening the files cited at the bottom of the page.
The wait strategy that replaces your hardcoded timeouts
A big chunk of Playwright test maintenance comes from tuning waits. The 5000ms that works locally is too short on CI. The 10000ms that works on CI makes the suite crawl. Retries paper over the problem but never fix the root cause. Assrt ships a primitive that adapts to the page instead of guessing at it. The whole thing is a MutationObserver plus a polling loop.
A fast SPA returns from wait_for_stable in under 500ms. A page streaming tokens from an LLM returns whenever the streaming naturally stops. A broken page that churns forever times out at 30s with a count of how many mutations fired, which is a diagnostic hint you would not get from a setTimeout. You get a correct answer without encoding any assumptions about the page's speed in your test file.
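The loop below is a minimal TypeScript sketch of that strategy, assuming the parameters this guide cites (500ms poll, 2-second quiet window, 30-second ceiling). It is not Assrt's implementation: the `getMutationCount` callback stands in for a real MutationObserver tally so the control flow is visible outside a browser.

```typescript
type StableResult = { stable: boolean; totalMutations: number };

// Sketch of DOM-silence detection. getMutationCount stands in for a
// MutationObserver tally of childList/characterData mutations on body.
async function waitForStable(
  getMutationCount: () => number,
  pollMs = 500,      // poll interval the guide cites
  quietMs = 2000,    // "2 seconds pass with zero new mutations"
  ceilingMs = 30000  // hard ceiling for pages that churn forever
): Promise<StableResult> {
  const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));
  const start = Date.now();
  let last = getMutationCount();
  let quietSince = Date.now();
  while (Date.now() - start < ceilingMs) {
    await sleep(pollMs);
    const count = getMutationCount();
    if (count !== last) {
      last = count;            // page still mutating: reset the quiet window
      quietSince = Date.now();
    } else if (Date.now() - quietSince >= quietMs) {
      return { stable: true, totalMutations: count };
    }
  }
  // Timed out: report how many mutations fired as a diagnostic hint.
  return { stable: false, totalMutations: last };
}
```

A static page returns as soon as the quiet window elapses; a page that never stops mutating exits at the ceiling with the mutation count as evidence.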
What maintenance looks like, step by step
Author the test as intent, not mechanics
Write a #Case block in /tmp/assrt/scenario.md. Each step is a plain English instruction: navigate, click, type, assert. No locator strings. No timeouts. No page objects. The file is Markdown; grep and diff work. Check it into your repo alongside your Playwright project if you want.
Agent snapshots the live accessibility tree per step
Before every click or type, the agent calls snapshot() on the Playwright MCP bridge. It gets back the accessibility tree with ephemeral ref IDs like ref="e5" that only exist for that one snapshot. The agent picks the ref that matches the intent from your Markdown, then fires click/type with that ref.
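In code terms, the per-step loop looks roughly like this. The `Bridge` interface below is an illustrative stand-in for the Playwright MCP tools, not Assrt's actual API; only the tool names follow the guide's description.

```typescript
// Illustrative sketch of the snapshot-first step loop.
interface Bridge {
  snapshot(): Promise<string>;        // live accessibility tree with ephemeral refs
  click(ref: string): Promise<void>;  // act on a ref from THIS snapshot only
}

async function runStep(
  bridge: Bridge,
  pickRef: (tree: string, intent: string) => string, // the agent's matching step
  intent: string
): Promise<void> {
  const tree = await bridge.snapshot(); // always snapshot first
  const ref = pickRef(tree, intent);    // e.g. "e5", valid for this snapshot only
  await bridge.click(ref);              // ref is discarded after the action
}
```

Nothing is retained between steps; the next step starts with a fresh snapshot.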
UI refactor lands, next snapshot just returns a new tree
Designer renames Sign up to Create account. Frontend engineer wraps it in a new parent div. Someone deletes the data-testid. Next time the test runs, snapshot returns a fresh tree. The agent reads Create account from your #Case intent, finds the button by role and label, and clicks it. The test never knew there was a refactor.
Wait for DOM silence, not a hardcoded timeout
When the page has streaming content, loading spinners, or async submits, the agent calls wait_for_stable. That injects a MutationObserver and exits once 2 seconds pass with zero new childList or characterData mutations (or 30s ceiling). Fast pages return fast; streaming pages wait exactly as long as they need to.
If a test genuinely should fail, diagnose rewrites it
When the test fails because the app actually changed in a way your intent no longer covers, you call assrt_diagnose. It distinguishes between an app bug, a flawed test, and an environment issue, and if the test needs to be rewritten it emits a corrected #Case block that you paste back into scenario.md. That is the only time you touch the test file.
The same UI refactor, two very different outputs
A designer renames data-testid="cta-signup" to data-testid="cta-register" in one component. Here is what each mental model does with that change on the next CI run.
Locator-first self-heal vs. snapshot-first Assrt
This is not a feature-by-feature comparison of competitors. It is a comparison of mental models. Every closed AI QA platform is a variation on the left column.
| Feature | Locator-first AI self-heal (BrowserStack, LambdaTest, TestMu, Healer) | Assrt (snapshot-first) |
|---|---|---|
| Where locators live at authoring time | In your spec files as page.locator / getByRole strings | Nowhere. Authoring is intent only. |
| Where locators live at runtime | Evaluated against the live DOM; cached for healing | Ephemeral ref IDs from a per-step accessibility snapshot |
| What a UI refactor costs | Failed run + AI-proposed patch + review + commit | Nothing. Next snapshot is a different tree; same intent still matches. |
| Where tests live | Proprietary YAML, vendor dashboard, or .spec.ts code | Plain Markdown at /tmp/assrt/scenario.md or in your repo |
| Wait strategy | Hardcoded timeouts plus retry plus expect(...).toBeVisible | MutationObserver-based DOM-silence detection (wait_for_stable) |
| Self-heal when a test really does need updating | Proprietary ML service patches a locator; you approve | assrt_diagnose emits a corrected #Case block in Markdown you paste in |
| Hosting | SaaS cloud; your DOM and videos live in their backend | Self-hosted; everything stays on your machine (ANTHROPIC_BASE_URL to fully localize) |
| Pricing at comparable scale | $7.5K / month per seat bundle (closed AI QA platforms) | $0 + Anthropic tokens (open source, self-hosted) |
Walk through your flaky suite with the maintainer
15 minutes. Bring one test that keeps breaking. We'll show you what it looks like as a #Case with zero locators, on your actual URL.
Book a call →

AI Playwright test maintenance, answered
Why is AI Playwright test maintenance such a constant problem in the first place?
Every mainstream approach stores locators. You write page.getByRole('button', { name: 'Sign in' }) or page.locator('[data-testid="submit"]') at authoring time, and the moment a designer renames the button, swaps the test-id, or wraps it in a new parent, the locator stops matching. AI self-healing products detect that failure and try to guess a replacement locator at runtime using historical context, visual hashes, or LLM inference. It works often enough to ship, but you are still maintaining a stale cache of selectors you hope the model can repair. The cheaper move is to not store locators at all. Assrt's agent calls a snapshot of the accessibility tree before every single action, uses ephemeral ref IDs that only live inside that one step, and discards them afterward. There is nothing to go stale. The rule is enforced by the CRITICAL Rules section of SYSTEM_PROMPT at /Users/matthewdi/assrt/src/core/agent.ts lines 206-218: 'ALWAYS call snapshot FIRST' and 'If a ref is stale (action fails), call snapshot again to get fresh refs.'
How is this different from BrowserStack's AI Self-Heal, LambdaTest Auto Heal, or TestMu AI?
Those products sit on top of your existing Playwright test code. Your spec file still contains page.locator calls. When a run fails, their service intercepts the failure, consults a model plus historical runs, and patches the locator in-flight. You stay on the locator-first mental model; they add an ML repair step. Assrt replaces the mental model. Your test is a #Case block in plain Markdown that says 'Click the Sign In button', and the agent discovers the Sign In button from the live accessibility tree every time the test runs. There is no step where a locator breaks, because there is no locator. There is also no cloud dashboard, no seat-based pricing, no proprietary YAML to lock you in. The cost side matters: AI-driven closed platforms price around $7.5K/month per seat bundle. Assrt is open-source, self-hosted, $0 beyond the Anthropic tokens your runs consume.
Does Assrt actually use Playwright under the hood, or is it a new engine?
It uses Playwright. The runner spawns @playwright/mcp (the official Playwright MCP server) and drives a real Chromium process. You get the full Playwright surface: real network traffic, real rendering, real video capture written as .webm files to /tmp/assrt/<runId>/. What Assrt changes is the authoring layer above Playwright, not Playwright itself. You write intent in Markdown, the agent translates intent to snapshot+ref+click sequences in real time, and Playwright executes. When someone on your team already knows Playwright, the videos, traces, and failures look exactly like Playwright output because they are Playwright output. You are not buying into a parallel universe; you are changing how tests get authored and maintained.
What does a test file actually look like and where is it stored?
It is a markdown file at /tmp/assrt/scenario.md with #Case blocks. Example: '#Case 1: Email signup\n1. Navigate to /signup\n2. Type a temp email into the email field\n3. Click the Sign up button\n4. Assert: dashboard heading is visible'. The layout is defined at /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts (scenario.md for plan text, scenario.json for metadata, results/latest.json for the last run). You can check these into your repo next to a Playwright project, version-control them, diff them, review them. Because the format is plain Markdown, grep works, GitHub renders them, and there is no migration pain if you decide to move to a different runner later. The Corrected Test Scenario that assrt_diagnose emits when a test genuinely needs rewriting is the same format, so you paste it straight into scenario.md.
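Under the paths that answer names, the on-disk layout looks roughly like this (a sketch; the exact contents are defined in scenario-files.ts):

```text
/tmp/assrt/
├── scenario.md          # Case plan text (the Markdown you author)
├── scenario.json        # plan metadata
└── results/
    └── latest.json      # result of the most recent run
```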
How does Assrt handle flaky tests and dynamic content without hardcoded waits?
It ships a wait_for_stable tool backed by a real MutationObserver. The implementation is at /Users/matthewdi/assrt/src/core/agent.ts lines 872-925. When the agent calls wait_for_stable, it injects a MutationObserver that counts childList, subtree, and characterData mutations on document.body, then polls every 500ms until either (a) 2 seconds pass with zero new mutations, or (b) a 30-second ceiling is hit. This adapts to the page instead of guessing. A fast SPA settles in 400ms; a chat streaming LLM tokens might churn for 4 seconds. Both are handled by the same primitive. Flaky tests in locator-first Playwright are usually either stale-locator flakes (handled by snapshot-first re-discovery) or timing flakes (handled by wait_for_stable). Between the two, most of the flakiness that drives maintenance overhead in traditional suites never manifests.
What model runs the agent, and can I swap it out?
Execution runs on Claude Sonnet through the Anthropic SDK. Diagnosis (the assrt_diagnose tool that rewrites a failing #Case after a real failure) runs on Claude Haiku 4.5 with a 4096-token output cap, because diagnosis is bounded and cheap. The full pipeline respects the ANTHROPIC_BASE_URL environment variable, so you can point the runner at a local proxy, a hosted model compatible with the Anthropic API shape, or an air-gapped endpoint. For teams with compliance constraints, this matters: your app under test, your scenarios, and your test videos never leave your network unless you want them to. That is the 'self-hosted, no cloud dependency' property. Contrast with locator-healing SaaS products where your DOM snapshots, screenshots, and selector history all live in their backend.
Can I migrate an existing Playwright spec into an Assrt #Case?
Yes, and the translation is almost mechanical. await page.goto('/signup') becomes 'Navigate to /signup'. await page.getByRole('button', { name: 'Sign in' }).click() becomes 'Click the Sign in button'. await expect(page.getByRole('heading')).toHaveText('Dashboard') becomes 'Assert heading text is Dashboard'. The locator is gone; the intent remains. If your existing spec is doing something esoteric (say, a hand-written XPath expression or a race-condition workaround), translate the underlying behavior rather than the mechanics. The full conversion usually takes five minutes per test and the resulting #Case is shorter than the TypeScript it replaced. Keep both during a migration if you want a safety net; nothing about Assrt prevents a Playwright project from sitting next to it.
What happens during a test run when a UI element changes mid-flow?
Nothing surprising. The agent already calls snapshot between actions, so it sees the new state on the next iteration and uses a fresh ref. The Error Recovery block of SYSTEM_PROMPT (agent.ts lines 220-226) spells out the sequence: action fails, call snapshot, page may have shifted (modal, navigation), try a different ref or approach, scroll and retry after three failures, mark as failed with evidence if truly stuck. In practice this means an A/B test that swaps a button variant mid-run does not crash your test; the agent finds the new button by role and label and proceeds. A traditional Playwright locator would have hardcoded '[data-testid="cta-v2"]' and required a rerun after someone renamed it to 'cta-v3'.
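That recovery sequence can be sketched as a small loop. Everything below is illustrative (the interface and helper names are not Assrt's actual API); it just encodes the quoted order: re-snapshot on failure, retry with a fresh ref, give up with evidence after a bounded number of attempts.

```typescript
// Hypothetical sketch of the Error Recovery sequence described above.
interface Bridge {
  snapshot(): Promise<string>;
  click(ref: string): Promise<void>;
}

async function clickWithRecovery(
  bridge: Bridge,
  pickRef: (tree: string) => string,
  maxAttempts = 3
): Promise<{ ok: boolean; attempts: number }> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const tree = await bridge.snapshot(); // fresh refs on every attempt
    try {
      await bridge.click(pickRef(tree));
      return { ok: true, attempts: attempt };
    } catch {
      // Page may have shifted (modal, navigation): loop re-snapshots.
    }
  }
  return { ok: false, attempts: maxAttempts }; // surface as failed with evidence
}
```

Note the retry logic lives in the agent, not in the test author's Markdown.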
Does the accessibility-tree-first approach work on apps that have bad accessibility?
It degrades gracefully. The accessibility tree that @playwright/mcp snapshots is the computed tree, which includes non-ARIA elements with their text content and roles inferred from HTML semantics. Buttons without aria-labels still show up with their visible text. Divs with click handlers show up as generic interactive nodes. The agent can fall back to evaluate for arbitrary JS if a page is truly opaque, and the SYSTEM_PROMPT explicitly documents an evaluate-based workaround for OTP inputs split into single-character fields (agent.ts line 235). For apps that have zero accessibility story at all, you get the same coverage you would have had with role-based Playwright locators plus an escape hatch. The fix is usually to improve the app, which is also good for users.
I saw this link from Reddit. Is it safe to paste my app's URL into a diagnose tool?
Safer than the alternatives. Assrt MCP runs as a local Node process on your laptop (npx assrt-mcp), so the target URL can be localhost, a private preview URL, or anything reachable from your machine. Outbound network goes to the LLM provider and the target URL you chose; no third-party SaaS middleman sees your DOM, your test plan, or your results. Test artifacts land under /tmp/assrt on your own disk. If your security team requires strict LLM routing, set ANTHROPIC_BASE_URL to a proxy you control and every inference call respects it. The scenario format is Markdown you own, the runner is open source, the videos are .webm on your filesystem. There is no vendor who can lose, leak, or gate your test data.