Automated self-healing tests that have nothing to heal.
Every self-healing test product on the market does the same thing: it stores a selector, waits for the selector to break, then repairs the selector with a mix of retry logic and an LLM. Assrt does something different. It never stores the selector. Every action in a scenario starts from a fresh accessibility tree, and when an action fails, the live tree gets inlined straight into the agent's next turn. There is no locator to go stale and no AI rewrite to audit.
One sentence, the whole mechanism
You can't heal what you never stored. Assrt re-fetches the page before every click, so "stale selector" is not a category of failure.
Your test file is Markdown. The runner generates real Playwright calls on the fly. When the UI changes, nothing breaks because nothing was ever wired to the old shape of the page.
What every other self-healing guide gets half-right
Read the top ten results for this keyword: Ministry of Testing, BrowserStack's Self-Heal doc, LambdaTest and TestMu Auto Heal, DZone's AI-driven Playwright article, Nitor Infotech, and the Medium Gherkin-to-Playwright piece. They all describe the same loop: a stored selector, a failure signal, a replacement proposal, a confirmation step. Some use historical metadata; some use a large language model to regenerate the locator string. All of them accept the premise that a test should own a selector and that self-healing is the repair of that owned selector.
That premise is the bug. As soon as the test owns a locator that is expected to keep resolving, you accept a whole class of failure modes: locator drift, locator ambiguity, false positive heals (the replacement targeted the wrong element), and the operational cost of a selector-review queue. None of those exist if the test never owns a locator in the first place.
The anchor fact: what the agent does when a step fails
One block of code makes Assrt's self-healing work: the catch block at lines 962 to 969 of assrt-mcp/src/core/agent.ts. It sits around every tool invocation in the agent loop. When a click, type, or scroll throws, the string this block returns is what the agent sees as the result of its tool call.
Notice what is NOT in that block. There is no findReplacementLocator() call. No historical-run metadata lookup. No retry on the same broken selector. The catch block takes the failure, fetches a new accessibility tree, and hands both to the model as the tool result. The model's next move is made against a clean view of the page as it is right now, not against a locator the system is trying to rescue.
The failure recovery lifecycle, line by line
Line 962: the catch block fires
Every tool dispatch in the agent loop is wrapped in try/catch. A click/type/scroll that throws lands here. There is no separate heal pipeline; failure is handled inline.
Line 963: the error is stringified
err.message or the raw string becomes the first line of the tool_result. The model sees the literal reason Playwright rejected the action.
Line 966: snapshot() pulls a fresh accessibility tree
A brand new browser_snapshot call, not a cached tree. If the page moved on, the new tree reflects that. If the snapshot itself fails, the call is wrapped in its own try block and returns an empty string safely.
Line 967: the first 2,000 chars inline into tool_result
The tool_result string concatenates the error, a short instruction, and snapshotText.slice(0, 2000). The model receives all of this as the response to its failed tool call.
Next model turn: pick a new ref and retry
The model typically calls snapshot again for the full tree, then re-issues the action with a ref taken from the new tree. The scenario keeps running. No state is reset, nothing replays from the top.
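The lifecycle above can be sketched roughly as follows. This is a reconstruction from the description, not the contents of agent.ts: the `Browser` interface, `dispatch`, `runTool`, and `formatFailure` are hypothetical names introduced for illustration. Only the message template and the 2,000-character slice are taken from the article.

```typescript
// Hypothetical minimal types; the real agent.ts may be shaped differently.
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

interface Browser {
  snapshot(): Promise<string>; // fresh accessibility tree as text
  dispatch(call: ToolCall): Promise<string>; // assumed name for "run this tool"
}

// On-failure slice size quoted in the article.
const FAILURE_SLICE_CHARS = 2000;

// Builds the tool_result string: error, short instruction, first 2,000 chars of tree.
function formatFailure(err: unknown, toolName: string, snapshotText: string): string {
  const msg = err instanceof Error ? err.message : String(err);
  return (
    `Error: ${msg}\n\n` +
    `The action "${toolName}" failed. Current page accessibility tree:\n` +
    `${snapshotText.slice(0, FAILURE_SLICE_CHARS)}\n\n` +
    `Please call snapshot and try a different approach.`
  );
}

// Wraps one tool dispatch. There is no separate heal pipeline: failure is handled inline.
async function runTool(browser: Browser, call: ToolCall): Promise<string> {
  try {
    return await browser.dispatch(call);
  } catch (err) {
    let snapshotText = "";
    try {
      // Always a new snapshot, never a cached tree.
      snapshotText = await browser.snapshot();
    } catch {
      // If the snapshot itself fails, fall back to an empty string.
    }
    return formatFailure(err, call.name, snapshotText);
  }
}
```

The string returned from the catch path becomes the tool_result for the failed call, so the model reads the failure reason and the current page state in the same message.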
What "self-healing" usually looks like, and what it looks like here
Two files. Left is a standard Playwright test that you would then run through a self-heal tool. Right is the equivalent Assrt plan. The plan is all you commit.
Selector-based self-healing vs. selector-free self-healing
```typescript
// test.spec.ts
import { test, expect } from "@playwright/test";

// Placeholder address; the original snippet used tempEmail without defining it.
const tempEmail = "buyer@example.com";

test("checkout flow", async ({ page }) => {
  await page.goto("/cart");
  await page.locator('button[data-test="checkout-btn"]').click();
  await page.locator('input[name="email"]').fill(tempEmail);
  await page.locator('.shipping-form__submit').click();
  await page.locator('#pay-now-button').click();
  await expect(page.locator('.order-confirmation')).toBeVisible();
});

// When the UI team renames .shipping-form__submit to
// .checkout__continue-button next sprint, the test breaks.
// Your "self-healing" tool notices, asks an LLM for a
// replacement selector, picks .checkout__continue-button,
// and you log into a dashboard to approve the heal.
```
How a single action actually resolves
A `Click "Proceed to checkout"` line in your scenario.md goes through three hops before a click fires in Chromium. The middle hop is the reason self-healing is free instead of engineered.
Every click is resolved against a fresh tree
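A minimal sketch of the middle hop, under loud assumptions. It assumes the snapshot is text with lines shaped like `- button "Proceed to checkout" [ref=e5]`, and it uses a deterministic name match as a stand-in for the model, which is what actually chooses the ref in Assrt. `SnapshotNode`, `parseSnapshot`, `findRef`, and both sample trees are hypothetical.

```typescript
// One accessibility node as it appears in a snapshot line (assumed format).
interface SnapshotNode {
  role: string;
  name: string;
  ref: string;
}

// Parse lines such as:  - button "Proceed to checkout" [ref=e5]
function parseSnapshot(snapshot: string): SnapshotNode[] {
  const pattern = /^\s*-\s+(\w+)\s+"([^"]*)".*\[ref=(\w+)\]/;
  const nodes: SnapshotNode[] = [];
  for (const line of snapshot.split("\n")) {
    const m = pattern.exec(line);
    if (m) nodes.push({ role: m[1], name: m[2], ref: m[3] });
  }
  return nodes;
}

// Resolve an accessible name to whatever ref the CURRENT tree assigns it.
function findRef(snapshot: string, accessibleName: string): string | undefined {
  const wanted = accessibleName.trim().toLowerCase();
  return parseSnapshot(snapshot).find((n) => n.name.toLowerCase() === wanted)?.ref;
}

// Before a refactor: the button lives in a modal dialog.
const modalTree = [
  '- dialog "Cart" [ref=e2]:',
  '  - button "Proceed to checkout" [ref=e5]',
].join("\n");

// After a refactor: the modal became a drawer and the refs shifted.
const drawerTree = [
  '- complementary "Cart drawer" [ref=e3]:',
  '  - link "Keep shopping" [ref=e6]',
  '  - button "Proceed to checkout" [ref=e9]',
].join("\n");

console.log(findRef(modalTree, "Proceed to checkout"));
console.log(findRef(drawerTree, "Proceed to checkout"));
```

The plan line never changes between the two trees. Only the ref it resolves to changes, and that ref is discarded once the click has fired.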
The numbers that make the approach cheap enough to run every action
Pulling a fresh accessibility tree per action sounds expensive. In practice the snapshot has a hard character cap, the on-failure slice is tiny, and the whole pipeline reuses the same @playwright/mcp session. Open assrt-mcp/src/core/browser.ts and assrt-mcp/src/core/agent.ts to verify all of these.
What Assrt deliberately does not store
Most of the engineering effort in a commercial self-heal product goes into the storage layer: the historical-run metadata, the fingerprint graph, the heal-approval queue, the vendor-hosted grid that all of it depends on. Here is what Assrt chooses to not build, and why that is the point.
No selector strings in your codebase
scenario.md is English. There is no .spec.ts alongside it with page.locator() calls, no page-object pattern, no selectors map. The first time a selector shows up is at runtime, in a single accessibility snapshot that is thrown away after the action.
No historical locator metadata
Cloud self-heal products store fingerprints of every element they ever interacted with so they can propose replacements. Assrt stores none of that. The fresh snapshot IS the source of truth, every time.
No LLM-generated selector to audit
When BrowserStack or LambdaTest heal, the model proposes a new locator string. A human has to confirm the replacement actually represents the intended element. Assrt never produces a replacement; it picks a ref from the current tree.
No vendor-hosted run grid
$7.5k/mo self-healing platforms require running on their grid so the historical metadata stays intact. Assrt is npx assrt-mcp, local Chromium, and your machine. No lock-in, no dashboard login, no seat count.
No selector-repair approval queue
Every mainstream self-heal product has a human-review flow for ambiguous heals. Assrt has zero, because there is nothing to approve. If the agent clicked the wrong element, your assertion catches it the same way it catches any other test bug.
No proprietary DSL, just Markdown
The test file is scenario.md, and the runner generates real Playwright calls via @playwright/mcp at run time. You own the Markdown. If you stopped using Assrt tomorrow, the Markdown would still describe what the tests do.
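To make the shape of a plan concrete, here is a small sketch that splits a scenario.md into cases and steps. The sample scenario text is illustrative, and `parseScenario` is a hypothetical helper: the article only says the agent reads the Markdown, so this is not a claim about how Assrt processes the file internally.

```typescript
interface ScenarioCase {
  title: string;
  steps: string[];
}

// Split a plan into "#Case N: title" blocks, each followed by plain-English step lines.
function parseScenario(markdown: string): ScenarioCase[] {
  const cases: ScenarioCase[] = [];
  let current: ScenarioCase | undefined;
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    const header = /^#Case\s+\d+:\s*(.+)$/.exec(line);
    if (header) {
      current = { title: header[1], steps: [] };
      cases.push(current);
    } else if (current) {
      current.steps.push(line);
    }
  }
  return cases;
}

// Illustrative plan: note that it contains no selectors, only intent.
const sampleScenario = `
#Case 1: Checkout flow
Click "Proceed to checkout"
Type a test email into the email field
Click Continue
Verify the text Order confirmed appears
`;

const parsedCases = parseScenario(sampleScenario);
console.log(parsedCases[0].title, parsedCases[0].steps.length);
```

Nothing in the parsed structure refers to the DOM, which is why the same file can be re-run unchanged after a UI refactor.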
A concrete first run: UI refactor, zero test edits
Walk through a real scenario where a selector-based test would break and a self-heal product would log a heal event. The scenario.md is unchanged across both runs.
The run log says 0 heal events not because nothing adapted, but because there is no category called "heal event." Adaptation happens inside the normal agent loop, inside the snapshot calls, invisible to the report. That is a feature. Heal counts are a lagging indicator of selector-based design.
Side by side, with the approaches people search for
Matching the top SERP results against the snapshot-first approach. Every row is verifiable: click into the tool's docs and it will tell you how it stores and repairs locators.
| Feature | Selector-repair tools | Assrt (snapshot-first) |
|---|---|---|
| Stores selectors between runs | Yes, that is the source of truth | No, snapshot is the source |
| Rewrites selectors on failure | LLM proposes a replacement | Never, refs are ephemeral |
| Human heal-approval queue | Required to avoid false positives | Does not exist |
| Test file format | .spec.ts with locator() calls | scenario.md in plain English |
| Generates real Playwright calls at runtime | Yes | Yes, via @playwright/mcp |
| Vendor-hosted run grid | Required for metadata storage | None, local Chromium |
| Open source and free to self-host | No, SaaS only | Yes, npx assrt-mcp |
| Works on a never-seen page with zero history | Degrades without past runs | Identical to run #100 |
| Heals by repairing stored selectors | Yes | No |
| Heals by re-reading the page per action | No | Yes, by design |
Why the on-failure snapshot is bounded to 2,000 chars
On a normal agent turn, the model calls browser_snapshot explicitly and receives up to 120,000 characters of accessibility tree. That is the full reference. On a FAILURE turn, the error tool_result inlines only the first 2,000 characters of a fresh tree. The cap is deliberate: the failure context exists so the model can see that the page has moved on, not so it can re-plan the whole scenario from scratch. The agent's next call is almost always another snapshot, which delivers the full tree. The 2k slice is a nudge, not the source of truth.
2,000 chars of live DOM
Inlined into every failed tool_result, straight from agent.ts:967.
120,000 SNAPSHOT_MAX_CHARS
Hard cap on a full explicit snapshot, see browser.ts:500.
0 selectors stored
No .spec.ts, no locator map, no page objects. scenario.md is the whole file.
Five steps to replace a selector-repair pipeline with Assrt
If you are currently running a selector-based suite with a self-heal layer on top, this is the minimal path to the snapshot-first approach. You do not have to rewrite every test at once: Assrt runs alongside your existing Playwright tests, so you can migrate the flakiest ones first.
Selector-repair to selector-free, in five steps
Pick the flakiest selector-based test
Sort your self-heal dashboard by heal events per run. The test with the most heals is the test with the most brittle selectors. That is your first candidate.
Translate it into a scenario.md
Rewrite the test as English #Case blocks. If the original test has ten steps, you get ten lines. Keep the assertions explicit, for example `Verify the URL contains /app`.
Run it locally with npx assrt-mcp
Run `npx assrt-mcp run --url <your-url> --plan-file scenario.md`. A local Chromium launches, the agent interprets your English, and you get a WebM video back.
Delete the old spec
Once the Assrt scenario passes consistently across three UI refactors, remove the selector-based spec and its self-heal configuration. There is no more locator to monitor.
Repeat for the next flakiest test
Migration is opt-in and per-test. Your stable selector-based tests can stay as they are; the unstable ones graduate to selector-free scenarios.
The argument for selector-free as the default
Selectors survived this long in automated testing because they were the most machine-friendly handle a page could expose. Accessibility trees are now machine-friendly in a stronger sense: they describe the page the way a user (or a screen reader, or a language model) understands it, not the way a framework happens to lay out the DOM. Once you have a reliable accessibility tree and a model that can read it, the job of "finding the button labeled Pay now" is not selector matching. It is semantic lookup.
The consequence is that self-healing stops being a feature you opt into and a dashboard you monitor. It becomes the default behavior of the runner, with no operational surface. You did not opt into garbage collection in JavaScript; you expect it. Selector-free, snapshot-first test runners are on the same trajectory.
Try selector-free self-healing on one test
Install assrt-mcp and replace your flakiest selector-based test with a scenario.md. The Markdown plan survives UI refactors because nothing in it points at the old DOM.
Install on npm →
Adjacent pieces on the snapshot-first runner, the agent toolkit, and the Playwright code it actually drives.
Keep reading
Playwright automated testing, without the locator file
Why the runner calls real @playwright/mcp tools at run time instead of compiling a .spec.ts you maintain by hand.
AI-powered agentic test execution with tool calls
The 15-tool primitive set the Haiku agent uses, including snapshot, click, type_text, assert, and complete_scenario.
Auto-discovers test scenarios by crawling
How the same snapshot-first primitive also drives scenario discovery across up to 20 pages per run.
Questions about automated self-healing tests
How are Assrt's automated self-healing tests different from Playwright auto-heal or LambdaTest Auto Heal?
Playwright auto-heal, LambdaTest Auto Heal, BrowserStack Self-Heal, and the AI-driven Gherkin-to-Playwright approaches all follow the same loop: you write a test with a stored selector, the selector eventually breaks, the tool detects the break, and it either retries with alternative locators or asks an LLM to propose a replacement. The stored selector is the source of truth, and the healing is a repair on top of it. Assrt never stores the selector. The test is written in English inside a scenario.md file. Before every single action, the agent calls browser_snapshot to fetch the current accessibility tree and acts on ephemeral ref IDs like e5 that regenerate on each snapshot. There is no locator to go stale, and no rewrite to audit.
What happens inside the agent loop when a click fails?
Open assrt-mcp/src/core/agent.ts and read lines 962 to 969. The catch block around every tool invocation does three things: it stringifies the error, it calls this.browser.snapshot() to pull a fresh accessibility tree, and it returns the literal string `Error: ${msg}\n\nThe action "${toolCall.name}" failed. Current page accessibility tree:\n${snapshotText.slice(0, 2000)}\n\nPlease call snapshot and try a different approach.` as the tool_result. That string goes straight back into the LLM's next turn as the response to its tool call. The model sees the failure reason AND the current DOM state in the same message, and decides what to do next. It never has to ask for a replacement selector because it is already looking at the page.
Does that mean the test re-runs the whole scenario on a failure?
No. The scenario keeps running; only the failed step gets retried against the fresh tree. The message history preserves everything the agent already did (navigate, snapshot, assert, etc.), and the failed tool call is followed by a tool_result containing the error plus the 2,000-character slice of the current accessibility tree. The next model turn typically calls snapshot again to see a full tree, picks a new ref, and re-issues the click. There is no backtracking, no replay-from-start, no stored locator lookup.
Why 2,000 characters specifically, and what if the page is huge?
2,000 chars is the on-failure quick-context slice that inlines directly into the error tool_result. It is intentionally small so the model gets a fast nudge without bloating the turn. On a normal turn the agent can still call snapshot explicitly, which returns the full tree up to SNAPSHOT_MAX_CHARS = 120,000 characters (see assrt-mcp/src/core/browser.ts line 500). If the raw tree exceeds that, Assrt truncates to the last clean line break and appends a note telling the model to keep using the refs it already saw. This hard cap keeps token usage under control on giant pages like Wikipedia without the agent losing orientation.
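The truncation rule described here can be sketched in a few lines. `capSnapshot` and the wording of `TRUNCATION_NOTE` are hypothetical. Only the 120,000-character cap, the cut at the last clean line break, and the presence of a note come from the article.

```typescript
// Hard cap on a full explicit snapshot, as quoted in the article.
const SNAPSHOT_MAX_CHARS = 120000;

// Hypothetical wording; the article only says a note tells the model to keep using known refs.
const TRUNCATION_NOTE = "\n[Snapshot truncated. Keep using the refs already shown above.]";

function capSnapshot(tree: string, maxChars: number = SNAPSHOT_MAX_CHARS): string {
  if (tree.length <= maxChars) return tree;
  const head = tree.slice(0, maxChars);
  // Cut at the last line break so no accessibility node is left half-written.
  const lastBreak = head.lastIndexOf("\n");
  const clean = lastBreak > 0 ? head.slice(0, lastBreak) : head;
  return clean + TRUNCATION_NOTE;
}

console.log(capSnapshot("aaaa\nbbbb\ncccc", 11));
```

Cutting at a line boundary matters because a partial line could expose a ref with no role or name attached to it.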
Ref IDs like e5 sound like selectors. How are they different?
They are not selectors, they are one-time pointers into a snapshot. A CSS selector like `button.primary[data-test=submit]` is expected to keep resolving to the same DOM node across runs, across page updates, and across framework re-renders. An Assrt ref like e5 only has meaning inside the accessibility tree that the snapshot just returned. If the agent snapshots again, the same element may be e7. If the page re-renders mid-scenario, the old ref errors out, the catch block injects the fresh tree, and the next turn uses the new ref. No ref ever survives to be stored, saved to a file, or checked into git.
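A toy model of that lifetime rule, assuming nothing about the real implementation: `SnapshotScope` and its ref numbering scheme are invented for illustration. The only behavior it is meant to show is that a ref resolves against the most recent snapshot and nothing else.

```typescript
// Toy model: refs are valid only for the snapshot that issued them.
class SnapshotScope {
  private generation = 0;
  private refs = new Map<string, string>(); // ref -> accessible name

  // Taking a snapshot discards every earlier ref and issues new ones.
  snapshot(accessibleNames: string[]): string[] {
    this.generation += 1;
    this.refs = new Map();
    accessibleNames.forEach((label, i) => {
      this.refs.set(`e${this.generation * 10 + i}`, label);
    });
    return Array.from(this.refs.keys());
  }

  // Clicking with a ref from an older snapshot throws, which is what sends
  // the real agent into its catch block and on to a fresh tree.
  click(ref: string): string {
    const label = this.refs.get(ref);
    if (label === undefined) {
      throw new Error(`Ref ${ref} not found in the current snapshot`);
    }
    return label;
  }
}

const refScope = new SnapshotScope();
const firstRefs = refScope.snapshot(["Pay now"]);
const secondRefs = refScope.snapshot(["Promo banner", "Pay now"]);
console.log(firstRefs, secondRefs);
```

Because the ref map is rebuilt on every snapshot, there is nothing durable that could be written to a file or checked into git.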
So what IS checked into git as the test?
Your scenario.md file. Plain Markdown. Example: `#Case 1: Checkout flow` followed by four lines of English describing what to click and what to verify. That file is all the user ever commits. No page objects, no fixtures, no selector maps, no .spec.ts. Under the hood, when you run npx assrt-mcp, the agent reads that Markdown and dynamically drives Playwright via the official @playwright/mcp server. The generated tool_calls are not persisted. Next week, when your CSS class names change, you re-run the same scenario.md and the agent adapts, because every action starts from a fresh snapshot.
Does this approach cost more in LLM tokens than selector-based self-healing?
Per-run, yes, because every action involves a snapshot call that returns a few thousand characters of a11y tree. Per-maintenance, much less, because there is no selector-repair prompt, no historical-run metadata store, and no false-positive review workflow. Teams using LambdaTest or BrowserStack self-heal typically pay a per-run subscription AND a per-seat maintenance tax because every heal event needs a human review to confirm the heal targeted the right element. Assrt's approach skips both: the model never proposes a rewritten selector, it just picks a ref from the fresh tree, so there is nothing to human-audit after the run.
Is this really more reliable than a well-written locator with Playwright's retry timeout?
For minor DOM churn, a good Playwright role-based locator with the default timeout handles a lot. Where selector-based tests crack is when the structural shape of the page changes: a form splits into two steps, a modal gets replaced by a drawer, a list becomes a tabbed interface. No amount of locator auto-retry fixes those because the element in question does not have a consistent replacement. Assrt's snapshot-first approach handles those because the English plan says `Click Continue`, the snapshot reports a new accessibility node with that accessible name, and the agent clicks it. The test does not know or care that the modal became a drawer.
How do I know the agent is not just passing a test by clicking the wrong thing?
Every run produces a WebM video with an injected red cursor, a click ripple, a keystroke toast, and a heartbeat pulse that forces continuous compositor frames. You watch the cursor glide to the element the agent clicked. If it targeted the wrong button, you see it. If you want a tighter guardrail, the `assert` tool in the agent's toolkit lets your plan demand specific state: `assert that the URL contains /app` or `assert that the text Order confirmed appears`. Assertions are explicit, named, and stored in the TestReport at the end of the run.
Does Assrt work locally and in CI, and does it need a cloud dashboard?
Local and CI, no cloud required. `npx assrt-mcp run --url <your-url> --plan-file scenario.md` launches a local Chromium via @playwright/mcp stdio. There is no Assrt-hosted dashboard you log into, no account, no run quota. The runner prints a TestReport JSON to stdout, writes a WebM video under /tmp/assrt/videos, and exits 0 or 1. In CI, pipe the report wherever you want. This is the difference between Assrt (open source, self-hosted) and the $7.5k/mo enterprise self-healing platforms, which require running your tests on their grid so they can store the historical metadata they need for selector healing.