Architecture guide · Accessibility tree · Playwright MCP · No vendor YAML

Self-healing tests guide: there is nothing to heal when the test is prose

Every guide on self-healing tests explains the same thing: how an AI engine detects a broken CSS or XPath locator, picks a fallback, and patches the stored string. That is useful if your tests store locators. This guide is about the architecture above that, where the test artifact is plain English and the locator does not exist until the moment the run starts. I wrote it because the top ten search results cover selector patching and almost nothing else, and in the Assrt source the whole concept collapses into one try/catch that just re-reads the page.

Matthew Diakonov, Written with AI

Published April 20, 2026 · 11 min read

Nothing to heal

Tests as prose, resolved per run from the accessibility tree

/tmp/assrt/scenario.md holds English, not selectors

snapshot() returns role, name, and a per-run ref=eN

The LLM binds intent to ref every run, fresh each time

On failure, agent.ts:1012-1019 re-snapshots and retries

No stored locator, no heal step, no vendor YAML



Plan file is /tmp/assrt/scenario.md (scenario-files.ts:17), prose only

Every step resolved from a live ARIA tree via ref=eN (agent.ts:27-28)

On any failure, the agent re-snapshots and retries (agent.ts:1012-1019)

Open-source, self-hosted, no vendor YAML, no $7,500/mo license

#nav > div.user-block > button.cta-primary
div[data-qa="signup-email"] input
xpath=//section[2]//button[contains(@class,'submit')]
css=form >> nth=1 >> input.email
//*[@id='react-select-3-option-2']
tr:nth-child(4) td:nth-child(2) a
.ant-modal-body .ant-btn-primary
[data-testid='login-submit-v3-new']

Every string above is a future bug. Self-healing tools patch them after they break. The architecture in this guide never stores them in the first place.

What every top result misses

Search for "self-healing tests" and the first page of results is uniform: testRigor, Momentic, AccelQ, Testsigma, Katalon, BrowserStack, TestComplete, ideyaLabs, TestGrid, Cypress. Every one defines self-healing the same way. The framework detects that a stored locator stopped matching, an AI step tries fallbacks (sibling, parent, text, visual, ML-ranked), and the healed string is written back to the test. That is useful in a world where tests store locators. None of the top results address the world where they do not.

Industry default

Patch the broken locator

Record a selector at authoring time. Detect failure at runtime. Pick a fallback. Rewrite the file. Log "healed." Repeat next refactor.

The gap this guide fills

Do not store the locator at all

Plan is prose. Resolution happens each run against the live accessibility tree. On failure, re-snapshot, do not re-write. Nothing ages between releases.

The plan file is a markdown document

This is the entire persistent artifact for an Assrt test run. No page object, no locator map, no YAML. It lives at /tmp/assrt/scenario.md (defined in scenario-files.ts:17) and it is the only file that survives between runs. When someone renames the primary CTA, this file does not change. There is nothing in it that could drift.

/tmp/assrt/scenario.md (0 locators)

“Nothing persistent is matched against the DOM. The plan is prose; the locator is regenerated per run from the accessibility tree.”

scenario-files.ts:17, agent.ts:27-28
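A minimal TypeScript sketch makes the format concrete. It reads #Case blocks out of a scenario.md string. It assumes each block opens with a header line of the form "#Case N: name" and that every following non-empty line is one English step. parseCases and the Case type are hypothetical names, not Assrt's code.

```typescript
// Hypothetical in-memory shape for one #Case block (not Assrt's own type).
interface Case {
  id: number;
  name: string;
  steps: string[];
}

// Split scenario.md text into #Case blocks. Assumes a header line such as
// "#Case 1: Sign in" and treats each following non-empty line as one step.
function parseCases(markdown: string): Case[] {
  const cases: Case[] = [];
  let current: Case | null = null;
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    const header = /^#\s*Case\s+(\d+):\s*(.+)$/.exec(line);
    if (header) {
      current = { id: Number(header[1]), name: header[2].trim(), steps: [] };
      cases.push(current);
    } else if (current !== null && line.length > 0) {
      // Drop a leading list marker ("-", "*", or "1.") if the author used one.
      current.steps.push(line.replace(/^(?:[-*]|\d+\.)\s+/, ""));
    }
  }
  return cases;
}

const scenarioText = `#Case 1: Sign in
1. Click the Sign in link
2. Type a@b.com into the email field
3. Verify the dashboard loads

#Case 2: Start a trial
- Click the Get started button`;

for (const c of parseCases(scenarioText)) {
  console.log(`#Case ${c.id} (${c.name}): ${c.steps.length} step(s)`);
}
```

Nothing the parser returns is matched against the DOM. The output is only sentences, and binding them to elements happens later, per run.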

How a step resolves at runtime

Every click or type starts with a snapshot call. That call returns the accessibility tree, which lists interactive elements by role, accessible name, and a per-run ref=eN token. The LLM reads the sentence in the #Case block, finds the matching element in the tree, and passes the ref to the click or type tool. The ref is valid only for that snapshot. A new snapshot means new refs. There is nothing here that can outlive the run.

assrt-mcp/src/core/agent.ts
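The sketch below is not agent.ts. It illustrates the binding step under two assumptions. First, the snapshot text looks like Playwright's YAML-style ARIA snapshot with [ref=eN] markers. Second, exact role-plus-name matching is an acceptable stand-in for the LLM's judgment. parseSnapshot and candidateRefs are hypothetical helpers.

```typescript
// One element as read from a snapshot. Field names are illustrative.
interface SnapshotNode {
  depth: number; // indentation, i.e. nesting level in the tree
  role: string;
  name: string; // accessible name ("" when the line has none)
  ref: string; // per-snapshot token such as "e17"
}

// Parse lines like `  - button "Get started" [ref=e17]`. This format is an
// assumption modeled on Playwright's YAML-style ARIA snapshots.
function parseSnapshot(text: string): SnapshotNode[] {
  const pattern = /^(\s*)-\s+([a-z]+)(?:\s+"([^"]*)")?.*?\[ref=(e\d+)\]/;
  const nodes: SnapshotNode[] = [];
  for (const line of text.split("\n")) {
    const m = pattern.exec(line);
    if (m) {
      nodes.push({ depth: m[1].length, role: m[2], name: m[3] ?? "", ref: m[4] });
    }
  }
  return nodes;
}

// Stand-in for the LLM binding step: every ref whose role matches and whose
// accessible name matches case-insensitively. The real agent asks a model.
function candidateRefs(nodes: SnapshotNode[], role: string, accessibleName: string): string[] {
  const wanted = accessibleName.toLowerCase();
  return nodes
    .filter((n) => n.role === role && n.name.toLowerCase() === wanted)
    .map((n) => n.ref);
}

const liveTree = [
  "- banner [ref=e2]:",
  '  - link "Sign in" [ref=e5]',
  "- main [ref=e10]:",
  '  - button "Get started" [ref=e17]',
].join("\n");

// "Click the Sign in link" binds to whatever ref this snapshot assigned.
console.log(candidateRefs(parseSnapshot(liveTree), "link", "Sign in")); // [ 'e5' ]
```

The returned ref is only meaningful for liveTree. A later snapshot can assign different tokens, so the sketch never writes it anywhere.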

What one run looks like, end to end

Six steps from plan file to scenario complete. Notice that no step writes a locator anywhere; the nearest thing to a locator is a sentence in a markdown file and a per-run token that disappears when the process exits.

Plan as prose

You (or assrt_plan) write #Case blocks in /tmp/assrt/scenario.md. Each step is an English sentence naming an intent: click the Sign in link, type into the email field, verify the dashboard loads.

Snapshot at runtime

On scenario start, the agent opens the URL and calls the Playwright MCP snapshot tool. That returns the full accessibility tree with per-run ref=eN tokens on every interactive element.

LLM binds intent to ref

For each #Case step, the LLM reads the live tree and picks the ref that matches the sentence. 'The Sign in link' maps to the element with role=link and accessible name Sign in. The ref is valid only for this run.

Act, then re-snapshot

Every click or type is followed by a fresh snapshot. Refs from the previous snapshot are discarded. There is nothing to cache, nothing to patch, nothing to version.

On failure, re-read the page

If a ref is stale or the element is not found, agent.ts:1012-1019 catches the throw, calls snapshot again, and hands the new tree back to the LLM with 'try a different approach.' The scenario.md is not touched.

Diagnose rewrites the plan

If a whole scenario fails for a real reason (a flow changed, a field was removed), assrt_diagnose returns a corrected #Case in the same prose format. You review one sentence in a markdown file, not a healed selector.
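A toy simulation of this loop helps show why nothing needs patching. FakeBrowser, UiNode, bind, and runCase are hypothetical. The fake page hands out fresh refs on every snapshot and rejects refs from older snapshots, which mimics the per-snapshot validity described in the steps above. The keyword matcher in bind stands in for the LLM.

```typescript
// Hypothetical minimal browser surface. Assrt itself calls Playwright MCP
// tools such as browser_snapshot and browser_click instead.
interface UiNode {
  role: string;
  name: string;
  ref: string;
}

interface Browser {
  snapshot(): UiNode[];
  click(ref: string): void;
}

// Toy page: every snapshot issues fresh refs, and refs from an older
// snapshot are rejected, mirroring "valid only for that snapshot".
class FakeBrowser implements Browser {
  readonly clicked: string[] = [];
  private generation = 0;
  private live = new Map<string, string>(); // ref -> accessible name
  private readonly page: { role: string; name: string }[];

  constructor(page: { role: string; name: string }[]) {
    this.page = page;
  }

  snapshot(): UiNode[] {
    this.generation += 1;
    this.live.clear();
    return this.page.map((el, i) => {
      const ref = `e${this.generation * 100 + i}`;
      this.live.set(ref, el.name);
      return { role: el.role, name: el.name, ref };
    });
  }

  click(ref: string): void {
    const target = this.live.get(ref);
    if (target === undefined) throw new Error(`stale or unknown ref ${ref}`);
    this.clicked.push(target);
  }
}

// Stand-in for the LLM: pick the element whose role and name both appear
// in the English step. A real model handles paraphrase; this does not.
function bind(step: string, tree: UiNode[]): string | undefined {
  const s = step.toLowerCase();
  return tree.find((n) => s.includes(n.role) && s.includes(n.name.toLowerCase()))?.ref;
}

// Snapshot, bind, act for each step. No ref outlives its iteration.
function runCase(steps: string[], browser: Browser): void {
  for (const step of steps) {
    const ref = bind(step, browser.snapshot());
    if (ref === undefined) throw new Error(`no element matches: ${step}`);
    browser.click(ref);
  }
}

const demo = new FakeBrowser([
  { role: "link", name: "Sign in" },
  { role: "button", name: "Get started" },
]);
runCase(["Click the Sign in link", "Click the Get started button"], demo);
console.log(demo.clicked); // [ 'Sign in', 'Get started' ]

// A ref kept across snapshots is useless, which is why none is stored.
const firstRef = demo.snapshot()[0].ref;
demo.snapshot();
try {
  demo.click(firstRef);
} catch (err) {
  console.log((err as Error).message); // stale or unknown ref e300
}
```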

Watch a happy-path run in the log

Every [mcp] browser_snapshot line is the locator regeneration. Every ref=eN is a per-snapshot token. Restart the run tomorrow against a drifted UI and the ref numbers change but the plan file does not.

npx assrt run — happy path

The line that makes healing unnecessary

The core claim of this guide reduces to one try/catch in one file. When any browser tool throws, the agent does not call a healer, does not fall back to a secondary locator, and does not rewrite anything on disk. It re-reads the page and asks the model to try again. If the model can still match intent to an element in the new tree, the scenario continues. If it cannot, the failure surfaces to the human with the accessibility tree attached so you can see what the model saw.

assrt-mcp/src/core/agent.ts:1012-1019
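Here is a minimal sketch of that recovery path, written from this guide's description rather than copied from agent.ts. On any thrown tool call, it re-reads the page and keeps the first 2,000 characters of the tree. It then returns the error message, the truncated tree, and the retry nudge to the model. ToolResult, SnapshotSource, and dispatch are illustrative names, and the exact layout of the content string is an assumption.

```typescript
// Illustrative shape of the tool_result handed back to the model.
interface ToolResult {
  toolUseId: string;
  content: string;
  isError: boolean;
}

interface SnapshotSource {
  snapshot(): Promise<string>;
}

// The guide describes agent.ts feeding back the first 2000 characters.
const TREE_CHARS = 2000;

// Hypothetical dispatcher: run one tool call; on any throw, re-read the page
// and return the error plus the fresh tree. No fallback locators, and
// nothing on disk is rewritten.
async function dispatch(
  toolUseId: string,
  run: () => Promise<string>,
  browser: SnapshotSource,
): Promise<ToolResult> {
  try {
    return { toolUseId, content: await run(), isError: false };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const tree = await browser.snapshot();
    return {
      toolUseId,
      isError: true,
      content:
        `${message}\n\n` +
        tree.slice(0, TREE_CHARS) +
        "\n\nPlease call snapshot and try a different approach.",
    };
  }
}

// Usage: a click on a ref that no longer exists after the UI drifted.
const drifted = { snapshot: async () => '- button "Try it free" [ref=e17]' };
dispatch("call-1", async () => {
  throw new Error("ref e17 not found");
}, drifted).then((result) => console.log(result.content));
```

The catch branch carries the design choice. It only changes what the model sees on its next turn, so a wrong guess costs one retry rather than a rewritten file.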

What drift looks like in the logs

A classic self-healing scenario: the button label moved from "Get started" to "Try it free." The plan file is not touched. The agent notices the mismatch, re-snapshots, and binds the sentence to the new label in the same turn. The run passes, the log carries enough breadcrumbs to audit the interpretation, and you never commit a healed locator.

UI drifted overnight; plan did not

What the industry heals vs what lives on disk here

Side by side. On the left: a classic Playwright script with stored locators, the kind of code every self-healing engine exists to patch. On the right: an Assrt plan for the same flow, which contains zero locators and therefore never needs patching.

Locators on disk, then no locators on disk

// What every self-healing vendor actually heals.
// A stored locator written months ago, against a DOM that has since drifted.
// The vendor's job: detect the failure, try a fallback, rewrite this string.
import { test } from '@playwright/test';

test('signup with stored locators', async ({ page }) => {
  await page.goto('/'); // assumes baseURL is set in playwright.config
  await page.locator('#nav > div.user-block > button.cta-primary').click();
  await page.locator('div[data-qa="signup-email"] input').fill('a@b.com');
  await page
    .locator('xpath=//section[2]//button[contains(@class,"submit")]')
    .click();
});

// When the dev renames "cta-primary" to "cta-accent", this breaks.
// Self-healing tools fuzzy-match a sibling, patch the locator, and log
// "healed". That patch is now a new brittle string that will rot on the
// next refactor. You are not escaping the treadmill; you are outsourcing
// who maintains it.

100% fewer strings that can drift

Failure flow, classic self-healing vs snapshot-and-retry

Two sequences against the same drifted UI. First: how a locator-patching healer handles it. Second: how the snapshot-and-retry architecture handles it.

Classic self-healing: patch the stored selector

Now the same drift in Assrt. The plan file is prose, so there is nothing to rewrite.

Assrt: re-snapshot and retry

Six properties of a no-locator runner

Each card names a specific design choice and the file that implements it. Nothing here is aspirational. It is all in assrt-mcp today.

The test is the prose

Each #Case in scenario.md is 3-5 English sentences, not code. 'Click the Get started button' is the locator, the intent, and the documentation. There is no CSS string anywhere in the file.

snapshot() is the selector

Playwright MCP returns an accessibility tree where every interactive element has a stable role, an accessible name, and a per-run ref=eN. The LLM picks the ref each time from what the tree says, not what was saved.

Failure recovery is built in

agent.ts:1012-1019: any thrown tool call triggers a fresh snapshot and a 'call snapshot and try a different approach' nudge back to the LLM. No separate heal step, no bolted-on fallback chain.

assrt_diagnose rewrites the prose, not the locator

When a whole scenario fails, assrt_diagnose returns a corrected #Case in the same plain-English format. You review a sentence, not a fuzzy locator patch.

No vendor YAML

The artifact is a .md file you can diff, grep, and commit. If you swap runners, the plan still works with any agent that can drive a Playwright MCP server.

Self-hosted, open source

Both the web app and the MCP server are code you can read. The runtime resolution loop lives in one TypeScript file. You can port it.

0 CSS or XPath strings stored per test

1 markdown file, the only persistent artifact

2,000 chars of live ARIA tree fed back after any failure (agent.ts:1017)

$0 license fee; self-hosted, open source (competitors up to $7,500/mo)

No-locator runner vs classic self-healing

Same test, same drift, two very different on-disk artifacts. If you pick a self-healing vendor, you are picking who owns the treadmill. If you pick a prose-plus-snapshot architecture, you step off it.

Feature	Classic self-healing platform	Assrt (prose plan + snapshot)
What lives on disk between runs	CSS/XPath/attribute locators stored inside a test script or vendor DB	Natural-language #Case blocks at /tmp/assrt/scenario.md
What happens when the UI drifts	Stored locator fails, vendor tries fallbacks, patches the file, logs 'healed'	Next run reads a fresh ARIA tree; the LLM re-matches intent to element
Whose locator you run in production	The AI-rewritten locator, which then ages until the next heal	No persistent locator; resolved per run from accessibility tree
How a failure propagates	Healer runs only when a known brittle selector throws	agent.ts:1012-1019 catches, re-snapshots, hands the tree to the LLM, retries
Risk of silently masking bugs	Heal-on-fail can paper over intentional regressions that should have drawn attention	The agent can call suggest_improvement to flag a flow that changed unexpectedly
Artifact you own	Proprietary locator format; evaporates on vendor switch	Prose plan + real Playwright MCP calls; no vendor YAML
Price to start	Closed platforms up to $7,500/month	$0, open-source, self-hosted (you pay LLM tokens only)

Seven rules for a no-heal test suite

You do not have to run Assrt to adopt these. Each rule follows from one property of the no-locator model and applies to any runner that reads the accessibility tree per step.

No-heal suite, seven rules

Stop storing CSS and XPath in your tests. If the locator lives in the file, it will drift, and you will either self-heal it or hand-patch it every quarter.
Prefer tools that read the accessibility tree every run (snapshot-and-resolve) over tools that record locators and patch them later.
Write test cases as intent, not implementation. 'Click the Continue button' survives visual refactors; 'click div.btn-primary' does not.
Treat failure as information, not as a trigger to auto-rewrite. A good runner re-reads the page and asks a model what to do, but does not silently heal intentional regressions.
Diff your plan file in code review. Prose #Case blocks are human-readable; vendor-healed locators are not.
Measure maintenance cost per release, not per locator. The real win of the no-locator model is that nothing ages between releases.
Own the artifact. If the plan file only runs inside one vendor, you have not escaped the treadmill.

Bring a brittle selector suite, walk away with a plan file

Thirty minutes. You share one flaky Playwright test with a stored CSS or XPath locator. We rewrite it as three sentences of prose, run it live against Assrt, and hand you the diff so you can see what 'nothing to heal' actually looks like on disk.

FAQ on self-healing tests and the no-locator model

How is this different from self-healing selectors in Testsigma, Momentic, or Katalon?

Those tools store a locator (CSS, XPath, attribute chain, or a proprietary equivalent) in the test. When the locator stops matching, a healer tries fallbacks (sibling, parent, text, visual) and patches the stored string. The healed string then ages until the next refactor. Assrt never stores a locator in the first place. The plan at /tmp/assrt/scenario.md is prose, and every element reference is regenerated per run from Playwright MCP's accessibility tree (see agent.ts lines 27-28 and 207-218). There is no locator to patch, which means there is nothing to heal.

Where is 'the selector' actually stored in Assrt?

It is not stored anywhere. The plan file is /tmp/assrt/scenario.md (written by scenario-files.ts line 17). It contains only natural-language #Case N: name blocks with English steps. At runtime, the agent calls snapshot(), gets back an accessibility tree where each element has a ref like e5 or e42, and the LLM picks the ref for each step by matching intent to the element's role and accessible name. The ref is a per-snapshot token, not a persistent identifier. A new run means a new snapshot, new refs, and a new binding. The only 'selector' is the English sentence.

What does agent.ts:1012-1019 actually do when an action fails?

It catches the thrown error in the try/catch around the tool dispatch, calls this.browser.snapshot() to pull a fresh accessibility tree, and assembles a tool_result that starts with the error message, concatenates the first 2000 characters of the live tree, and ends with 'Please call snapshot and try a different approach.' That result is fed back to the LLM on the next turn. There is no fallback-locator library, no selector-rewrite step, no heal log. The model re-reads the page and decides what to do next. This is why the architecture has no 'healed vs. original' concept; every step is always against what is actually on screen right now.

Is this just what Playwright's built-in role locators already do?

Partially, but the execution model is different. getByRole('button', { name: 'Submit' }) is still a stored locator: you write it once, it lives in your .spec.ts, and if someone renames the button to 'Send' the test breaks and you update the string. Assrt's plan is not a locator; it is an instruction. 'Click the Submit button' in a #Case block can be re-interpreted by the LLM as 'Click the Send button' without any file edit, because the sentence encodes intent and the ARIA tree is re-read every run. Role locators get you closer to semantic stability inside your test code; the no-locator model moves that stability out of the code entirely.

Does the per-run LLM interpretation make tests flaky in other ways?

It trades one failure mode for another. You lose 'my locator went stale' and you gain 'the LLM picked the wrong element because two buttons had similar accessible names.' In practice the second category is smaller because the accessibility tree exposes role plus accessible name plus containing landmark, which is usually unambiguous. When it is ambiguous, the fix is to make the #Case sentence more specific ('Click the Continue button in the signup form', not 'Click Continue'). That is a documentation improvement you would want anyway. The failure-recovery loop in agent.ts:1012-1019 also retries with a fresh snapshot before giving up, which absorbs most transient mismatches. Flakes that remain are almost always real application bugs.
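The disambiguation advice can be sketched as a two-stage match. The first stage matches role plus accessible name. The second stage narrows by a landmark hint taken from the sentence, and it runs only when the first stage returns more than one element. matchStep and the landmark field are hypothetical. In a real tree the landmark would come from the nesting of the snapshot, and the model rather than string matching would do the narrowing.

```typescript
// Illustrative element plus the landmark that contains it. In a real
// snapshot the landmark would come from the tree's nesting.
interface Candidate {
  role: string;
  name: string;
  ref: string;
  landmark: string;
}

// Hypothetical two-stage match: role + accessible name first, then narrow by
// a landmark hint from the sentence only if that first stage is ambiguous.
function matchStep(
  tree: Candidate[],
  role: string,
  accessibleName: string,
  landmarkHint?: string,
): Candidate[] {
  const wanted = accessibleName.toLowerCase();
  const byName = tree.filter((c) => c.role === role && c.name.toLowerCase() === wanted);
  if (byName.length <= 1 || landmarkHint === undefined) return byName;
  const hint = landmarkHint.toLowerCase();
  return byName.filter((c) => c.landmark.toLowerCase().includes(hint));
}

const signupPage: Candidate[] = [
  { role: "button", name: "Continue", ref: "e12", landmark: "form Newsletter" },
  { role: "button", name: "Continue", ref: "e31", landmark: "form Signup" },
];

// "Click Continue" is ambiguous: two buttons share the accessible name.
console.log(matchStep(signupPage, "button", "Continue").length); // 2
// "Click the Continue button in the signup form" narrows to one ref.
console.log(matchStep(signupPage, "button", "Continue", "signup").map((c) => c.ref)); // [ 'e31' ]
```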

How is assrt_diagnose different from a self-healing engine?

Both run after a failure, but they operate on different artifacts. A self-healing engine takes a failed selector and emits a patched selector (and often writes it back to the file). assrt_diagnose takes the failed #Case report plus the live page and emits a corrected #Case in prose, following the DIAGNOSE_SYSTEM_PROMPT in src/mcp/server.ts (lines 240-268) of the assrt repository. The output is a short human-readable block with Root Cause, Analysis, Recommended Fix, and a rewritten scenario. You review a sentence change, not a selector diff. It also explicitly distinguishes 'the test was wrong' from 'the app has a bug,' which selector healers cannot do because they conflate the two.

What about visual regressions, timing, data state, or anything else that a locator-patcher cannot fix?

Those are not 'self-healing' problems; they are different failure classes. Assrt handles them with separate tools inside the same agent loop: wait_for_stable for async DOM settling, http_request for verifying external API effects, suggest_improvement for flagging UX regressions, and create_temp_email plus wait_for_verification_code for end-to-end signup flows. The locator-patching vendors still leave those failures unaddressed, which is why benchmarks of self-healing in the wild consistently show that only a minority of test failures are selector-class in the first place. Moving selectors out of the picture entirely lets the runner spend its budget on the other categories.

Can I keep using Playwright and just add accessibility-tree resolution on top?

Yes, and that is roughly the architecture Assrt runs on. The MCP server spawns a Playwright process and exposes browser_snapshot, browser_click, browser_type, browser_wait_for, and friends over stdio. The agent calls those MCP tools instead of writing .spec.ts files. If you already have a Playwright suite, you can move the intent layer (your test cases) into prose #Case blocks, keep Playwright as the browser driver, and stop writing and maintaining selectors in test code. The code change on your side is small; the mental shift is that your committed artifact is a plan, not a program.

How do I verify this myself without running Assrt?

Read two files. First, assrt-mcp/src/core/agent.ts lines 14-196 define the TOOLS array. There is no 'locator' parameter on click or type_text, only a human element description plus an optional ref ID that comes from snapshot. Second, assrt-mcp/src/core/scenario-files.ts line 17 defines the on-disk plan format, which is a plain .md file. No other file in the repo persists a CSS or XPath locator for a test step. If you find one, open an issue; it would be a regression against this architecture.
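You can run the same check against your own runner. A sketch of the idea: describe each tool with a JSON Schema input, then assert that no input property is a CSS or XPath locator. The two entries below are a hypothetical reconstruction of the click and type_text shapes this answer describes (an element description and an optional ref). They are not a copy of the TOOLS array, and the LOCATOR_KEYS list is an assumption about which property names count as locators.

```typescript
// Simplified shape of an LLM tool definition: a name, a description, and a
// JSON Schema for the input.
interface ToolDef {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: "string"; description: string }>;
    required: string[];
  };
}

// Hypothetical reconstruction of the two tools described above; not a copy
// of agent.ts. Note the inputs: a human description and an optional ref.
const TOOLS: ToolDef[] = [
  {
    name: "click",
    description: "Click an element found in the most recent snapshot.",
    input_schema: {
      type: "object",
      properties: {
        element: { type: "string", description: "Human description, e.g. 'Sign in link'" },
        ref: { type: "string", description: "Optional ref such as e5 from the latest snapshot" },
      },
      required: ["element"],
    },
  },
  {
    name: "type_text",
    description: "Type text into an element found in the most recent snapshot.",
    input_schema: {
      type: "object",
      properties: {
        element: { type: "string", description: "Human description, e.g. 'email field'" },
        text: { type: "string", description: "Text to type" },
        ref: { type: "string", description: "Optional ref from the latest snapshot" },
      },
      required: ["element", "text"],
    },
  },
];

// The check this answer invites: no tool input accepts a stored locator.
const LOCATOR_KEYS = ["selector", "locator", "css", "xpath"];

function locatorParams(tools: ToolDef[]): string[] {
  const found: string[] = [];
  for (const t of tools) {
    for (const key of Object.keys(t.input_schema.properties)) {
      if (LOCATOR_KEYS.indexOf(key.toLowerCase()) !== -1) found.push(`${t.name}.${key}`);
    }
  }
  return found;
}

console.log(locatorParams(TOOLS)); // []
```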

What does a failure recovery actually look like in logs?

You see three entries. First, the failed tool call: [agent] click element="Get started button" ref=e17 followed by an Error line. Second, an automatic [mcp] browser_snapshot line indicating the agent pulled a fresh tree. Third, a re-planned action: [agent] click element="Try it free" ref=e17. The re-plan is the LLM reading the prose sentence 'Click the Get started button' and deciding that in the current tree the element with role=button and accessible name 'Try it free' is the closest match for the intent. scenario.md is untouched. Compare this to self-healing logs, which typically say 'healed selector from X to Y' and write that new string back into the test file.

Adjacent guides on how the locator problem actually behaves

Keep reading

How-to

How to write self-healing tests

Practical patterns for writing tests that survive UI change without per-locator babysitting.


Playwright

Self-healing playwright tests guide

What Playwright's role locators get you, and what still breaks when the DOM drifts.