Architecture edition

Self-healing tests: best practices that survive the heal loop

Every top result for this phrase says the same six things: start with stable locators, mix structural and visual healing, review AI rewrites, monitor the logs, categorize root causes, start small. All true, all expensive to act on, and all silent on the one decision that determines whether a self-heal is durable or destabilizing: what exactly lives in the same message as the failure. This guide picks up where the SERP stops.

Assrt Engineering
9 min read
4.9 from Assrt MCP users
6-line catch block replaces the full AI heal loop (agent.ts:955-963)
2,000-char aria-tree slice inlined into every failed tool_result
Snapshot hard cap at 120,000 chars with visible truncation (browser.ts:523)
Zero locator variants persisted per heal, zero heal-audit DB rows

The practice the SERP skips

Read testRigor, Momentic, Quinnox, Testsigma, QA Wolf, and the GeeksforGeeks explainer back to back. Every one of them lands on a similar list: start with stable locators like data-testid or role; combine structural and visual healing; have a human review AI locator updates; monitor auto-heal logs; categorize root causes before acting. The advice is fine. It is also silent on the single decision that most determines whether self-healing tests stay reliable across a year of drift: where the post-failure page state lives relative to the retry.

Two architectures get lumped together under the self-healing label. In one, the failed call returns an error; the heal code then asks the browser for the current DOM; then asks an LLM for a replacement selector; then retries. Three round-trips, two round-trip latencies, one ledger entry. In the other, the failed call returns a single string that already contains the current page state, and the next agent turn simply picks a new reference. One round-trip, no ledger. The first one is what every commercial self-healer ships. The second is what this page is about.

The six lines of Node at the bottom of assrt-mcp/src/core/agent.ts around line 955 implement the second pattern end to end, with no support infrastructure. That file is open source and you can read every character of it.

The catch block, verbatim

Here is the entire healing logic. Not a pseudocode simplification, the actual source. Every other best practice in this guide is a consequence of this block.

assrt-mcp/src/core/agent.ts

Three things are worth naming. First, the error message is preserved verbatim; the model sees the original Playwright failure reason, not a wrapper. Second, the call to snapshot() fetches the aria tree as it is right now, after whatever UI change caused the failure. Third, the .slice(0, 2000) is the byte budget: enough tree to re-orient the model, not so much that a single retry blows the turn. Those three choices are the whole heal architecture.
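The verbatim block lives in the repo at the line cited above. As a stand-in, here is a minimal sketch of the same shape, where `runTool` and `snapshot` are illustrative parameters (not Assrt's actual internal API) and the retry prompt follows the wording this guide quotes:

```typescript
// Hedged sketch of the inline-context catch pattern described above.
// `runTool` and `snapshot` are stand-ins for the real tool executor and
// aria-tree fetch; only the shape is the point: one string, one turn.
async function toolResultWithContext(
  runTool: () => Promise<string>,
  snapshot: () => Promise<string>,
): Promise<string> {
  try {
    return await runTool();
  } catch (err) {
    // 1. The original failure reason, verbatim -- no wrapper.
    const message = err instanceof Error ? err.message : String(err);
    // 2. The aria tree as it is *now*, after whatever change caused the failure,
    //    sliced to the fixed failure-context budget.
    const tree = (await snapshot()).slice(0, 2000);
    // 3. A nudge so the next turn retries against the current page.
    return `Error: ${message}\n\nCurrent page snapshot:\n${tree}\n\nPlease call snapshot and try a different approach.`;
  }
}
```

Everything the failure knows travels back in a single tool_result; the next model turn never has to ask what is on screen.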

Inline failure context
2,000 chars

The exact size of the accessibility-tree slice inlined into every failed tool_result. Pinned at agent.ts:960. The model sees the failure and the current page in the same message, every time.

How the failure payload reaches the retry

Three failure shapes (click, type, assert) converge on the same catch block. The block composes three pieces into one string and returns it as the tool_result for the failed tool_use. Visualized:

one catch block, three outputs, one message

    click fails  ─┐
    type fails   ─┼─► catch block ─► error message + aria tree slice + retry prompt
    assert fails ─┘

The next model turn receives the tool_result and treats it like any other tool response. There is no privileged 'heal' state, no side channel, no separate log stream. That is exactly what makes this approach portable: any framework that lets you compose a tool result can do it.

Typical AI self-heal vs. inline-context retry

Left is the composite pattern distilled from five commercial self-healing vendors' public docs. Right is the pattern Assrt uses verbatim. The comments are not snark; they are the specific cost lines the left pattern eats.

Heal as a pipeline vs. heal as a message

// The pattern every vendor dashboard ships
// Pseudocode distilled from public docs of 5 top tools

async function clickWithHealing(locator: StoredLocator) {
  try {
    await page.locator(locator.css).click();
  } catch (e1) {
    // Fallback 1: alternate stored locators
    for (const alt of locator.fallbacks) {
      try { await page.locator(alt).click(); return; }
      catch { /* keep trying */ }
    }
    // Fallback 2: LLM generates a new locator
    const dom = await page.content();             // full HTML, huge
    const proposed = await llm.proposeSelector({
      goal: locator.goal,
      html: dom,
    });
    await page.locator(proposed).click();

    // Persist the new locator so it shows up next run
    await db.insertLocatorVariant({
      locatorId: locator.id,
      newCss: proposed,
      approvedBy: "auto-heal",          // ← audit debt
      createdAt: Date.now(),
    });
  }
}

// Two full LLM round-trips on failure,
// one new DB row per heal, all living in
// a vendor store you can't grep locally.
39% fewer lines of heal infra

Eight architectural best practices

These are opinionated. They descend directly from the catch block above. Each one is implementable without Assrt, though each one is also pinned to a line of Assrt source so you can see where it comes from in a real codebase.

Best practices for self-healing tests

  • Inline the post-failure page state into the same tool_result string as the error message, so the next model turn never has to ask 'what's on screen now?'
  • Treat every element reference as ephemeral: refs like e5 and e42 belong to one snapshot only. No locator file, no heal queue, no audit trail of AI rewrites.
  • Cap the failure-context slice at a fixed byte budget (2,000 chars works) so a catastrophically huge page does not destroy the turn.
  • Cap the full snapshot at a higher hard limit (Assrt uses 120,000 chars) with a visible truncation note, not silent loss.
  • Use role-based accessibility names in your scenario prose ('the Sign in button'), not CSS selectors, because roles survive DOM reshuffles and piggyback on a11y work you already did.
  • Keep the scenario file on disk in markdown, so 'what healed' is a git diff, not a database mutation behind a vendor UI.
  • Capture three artifacts on every failed step, not one: error message, accessibility tree slice, and post-step screenshot. Triage without reruns.
  • Never persist an AI-proposed locator rewrite without a human review lane. If you can't see the new selector in a pull request, you don't own the test.

The taxonomy of self-healing

Three patterns cover almost every implementation on the market. Knowing which one your tooling uses is how you predict whether a year from now you are writing tests or auditing a ledger of AI-approved selectors you never reviewed.

Pattern A: Stored locator, patch on failure

What every vendor ships. One CSS selector per element, an XPath backup, an LLM rewrite fallback. Each heal writes a new locator row. Your test file silently gains approved selectors you never reviewed. This is the dominant SERP advice because it sells the vendor.

Pattern B: Regenerate locator per run

A step better. Throw away the cached selector, query the page by role+name every time. No heal ledger because nothing was stored. Still brittle when the role+name lookup misses, because the retry has no access to the current page.

Pattern C: Ephemeral refs plus inline context

What Assrt does. The accessibility tree is re-snapshotted before every action, refs like e5 are valid only for that snapshot, and the post-failure tree is inlined into the retry's tool_result. No locator to drift, no separate heal step, no audit queue. The retry sees the new page in the same turn.
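Pattern C can be sketched in a few lines. `takeSnapshot` and `clickRef` below are illustrative stand-ins, not Assrt's verbatim tool signatures; the shape is what matters — a fresh snapshot before the action, a ref resolved against it, nothing cached between steps:

```typescript
// Hedged sketch of pattern C: ephemeral refs, re-resolved per action.
interface SnapNode {
  ref: string;  // ephemeral, e.g. "e5" -- valid for this snapshot only
  role: string; // accessibility role, e.g. "button"
  name: string; // accessible name, e.g. "Sign in"
}

async function clickByName(
  takeSnapshot: () => Promise<SnapNode[]>,
  clickRef: (ref: string) => Promise<void>,
  role: string,
  name: string,
): Promise<void> {
  const snap = await takeSnapshot(); // re-snapshotted before every action
  const el = snap.find((e) => e.role === role && e.name === name);
  if (!el) throw new Error(`No ${role} named "${name}" in current snapshot`);
  await clickRef(el.ref); // the ref is used once and forgotten
}
```

Because nothing is stored, there is nothing to heal: a failed lookup simply triggers a fresh snapshot and a fresh resolution on the next turn.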

A real heal in stdout

This is what the heal looks like when a click fails mid-scenario, copied from a local run. Three extra lines of log, one extra snapshot call, no database write.

self-healing in stdout
2,000 chars of aria tree inlined on failure (agent.ts:960)
120,000-char hard cap on full snapshot size (browser.ts:523)
6 lines of Node that replace a whole heal pipeline
0 locator variants persisted per heal

Why these practices win over the long haul

Self-healing tests are not evaluated by how often they heal. They are evaluated by how often, twelve months in, the team can still tell whether a failure means the app broke or the test drifted. That becomes impossible when your test file has accumulated 400 AI-approved locator rewrites that nobody reviewed. It stays easy when the only artifact your test ever produces is a fresh accessibility snapshot that lasts one action.

Triage speed
Failure + page state in one message

When a step fails, the tool_result already contains the current aria tree. A human reviewer reads the error and the post-failure state on the same line, without re-running the test.

No locator ledger
Nothing silent ever gets approved

No AI-proposed selector rewrite is persisted, so there is nothing for an overloaded team to skim-approve. If the test worked, it worked against the current page.

Cost envelope
Bounded turn size, every retry

2,000 chars on failure, 120,000 on full snapshot. Both pinned in source. Catastrophic pages cannot silently blow the context window.

Ownership
Your tests stay yours

The scenario is a markdown file in your repo. The heal behavior is open-source TypeScript. Neither one is tied to a vendor dashboard, a session token, or a subscription renewal.

Row by row, best practice by best practice

Every row below names one architectural decision. Yours to evaluate on any tool you already use, not just on Assrt.

| Feature | Typical AI self-heal | Assrt |
| --- | --- | --- |
| Where the post-failure page state lives | In a separate 'get current DOM' call the agent has to request | In the same tool_result string as the error (one message) |
| What gets stored when a heal happens | A new locator variant row in the vendor DB | Nothing. The fresh ref is used once and forgotten. |
| Reviewability of AI-driven fixes | Dashboard 'approvals' queue, typically skipped at scale | No fix is persisted, so there's nothing to review |
| Byte budget on failure context | Usually full HTML, sometimes tens of thousands of tokens | 2,000 chars, pinned in source (agent.ts:960) |
| Byte budget on full snapshot | Unbounded, depends on page | 120,000 chars, pinned in source (browser.ts:523) |
| What the scenario file looks like | Proprietary YAML export or database record | Plain markdown #Case blocks in your repo |
| Cost of running the suite | ~$7.5K/mo typical seat pricing | $0 framework cost plus your LLM tokens |
| Vendor lock-in on 'heal' history | Full lock-in; your heal audit trail is theirs | Zero. Fresh snapshots per action; no history to keep. |

Pricing comparisons reference publicly listed seat-tier plans for commercial self-healing tools. Your contract may vary.

Five-step adoption path

You do not need to rip out your test stack to get most of the benefit. The first two steps alone retire the majority of flake on a typical suite. Steps three through five are where the compounding returns live.

1. Audit your current failure payload

Open the code your tests run in a catch block. Read what gets returned to the next retry. If it's just the error string, you're leaking the most valuable information you had: the current page. Every item below compounds on this fix.

2. Inline a page-state slice into the failure message

Whatever your runner, when a locator fails, take an accessibility snapshot (Playwright: `page.ariaSnapshot()`), slice it, and concatenate it into the error your retry handler returns. 2,000 chars is a good starting budget; measure and tune.
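A hedged sketch of such a wrapper, with the page interactions injected so the shape stays runner-agnostic (`action` would be your locator call and `ariaSnapshot` would be `() => page.ariaSnapshot()` in Playwright; both names here are illustrative):

```typescript
// Hedged sketch of step 2: rethrow a failure with a slice of the
// current accessibility tree attached, so the retry (or the human
// triaging the flake) sees the post-failure page in the same message.
async function withPageContext<T>(
  action: () => Promise<T>,
  ariaSnapshot: () => Promise<string>,
  budget = 2000, // starting byte budget; measure and tune
): Promise<T> {
  try {
    return await action();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    const slice = (await ariaSnapshot()).slice(0, budget);
    throw new Error(`${reason}\n\nPage at failure:\n${slice}`);
  }
}
```

The wrapper changes nothing on the happy path; on failure, the error your retry handler receives already carries the page state.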

3. Delete your locator ledger if you have one

If your testing tool persists AI-proposed selector rewrites, export them once and stop approving new ones. Move your tests to role-based queries regenerated per run. The locator ledger is technical debt dressed as a feature.

4. Cap your snapshot byte budget explicitly

Pick a hard limit (Assrt's is 120,000 chars). When a page exceeds it, truncate at a clean line break and append a note telling the retry loop that the tree was cut. Silent truncation is worse than a visible one.
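A minimal sketch of a visible-truncation cap, assuming the 120,000-char limit and the note wording this guide attributes to browser.ts (the function name and exact phrasing are illustrative):

```typescript
// Hedged sketch of step 4: hard-cap the snapshot, cut at a clean line
// break, and say so out loud instead of truncating silently.
const SNAPSHOT_MAX_CHARS = 120_000;

function capSnapshot(tree: string, max = SNAPSHOT_MAX_CHARS): string {
  if (tree.length <= max) return tree;
  // Back up to the last full line under the cap so no node is half-printed.
  const cut = tree.lastIndexOf("\n", max);
  const kept = tree.slice(0, cut > 0 ? cut : max);
  const keptK = Math.round(kept.length / 1000);
  const totalK = Math.round(tree.length / 1000);
  return `${kept}\n[Snapshot truncated: showing ${keptK}k of ${totalK}k chars. Use element refs visible above to interact.]`;
}
```

Pages under the cap pass through untouched; pages over it lose the tail but gain an explicit note, which keeps the retry loop oriented.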

5. Move scenarios out of vendor UIs and into markdown

Scenarios in a repo are reviewable in pull requests. Scenarios in a vendor DB are reviewable in a vendor UI. The former composes with every other dev tool your team owns; the latter ages into a liability every time the contract is renegotiated.

Make your self-heals message-shaped, not pipeline-shaped

Book 20 minutes with the Assrt team and walk through the catch block, the byte budgets, and the scenario format against your current suite.

Book a call

Questions teams ask about these best practices

What is the most overlooked best practice for self-healing tests?

Inline the page state into the same tool_result string as the error message. Every top SERP article describes what to heal (the selector) and when to heal (on failure), but none specifies how the post-failure page state reaches the retry. Most frameworks return the error, then expect a second round-trip: the agent sees `ElementNotFound`, asks for the DOM, then tries again. Assrt's catch block at `assrt-mcp/src/core/agent.ts` lines 955-963 returns a single string that concatenates the error, the current accessibility tree sliced to 2,000 chars, and the instruction `Please call snapshot and try a different approach.` The model's next turn sees the failure cause and the updated page in the same message. That one decision removes the class of flaky tests where healing times out because the heal loop spent its tokens asking `what's on screen?`.

Why 2,000 characters and not the full page?

2,000 chars is the failure quick-context slice, sized to give the model enough of the accessibility tree to re-orient without blowing the turn's token budget. On normal turns the agent can call `snapshot` and get the tree up to `SNAPSHOT_MAX_CHARS = 120_000` (pinned at `assrt-mcp/src/core/browser.ts` line 523). When the page's tree exceeds 120k, the snapshot is truncated at a clean line break and a note is appended: `[Snapshot truncated: showing 120k of <N>k chars. Use element refs visible above to interact.]`. The hard cap prevents a single turn from eating the context window on pages like Wikipedia, and the appended note keeps the model oriented instead of confused by a silent truncation.

What's the difference between healing a selector and regenerating the reference?

Patching a selector assumes the stored locator has one correct answer and the tool needs to find it again. Most commercial self-healers run a fallback cascade: the original CSS, then an XPath, then an AI-proposed rewrite, and finally human review. Each stage persists a new 'locator record' so the test file gains another selector variant over time. Assrt never stores a locator at all. Every action starts from a fresh `browser_snapshot` call that returns an accessibility tree with ephemeral refs like `e5` and `e42` (see the system prompt excerpt in `agent.ts` lines 207-218). If a ref fails, the tree is re-fetched and a new ref is picked. There is no locator file, no heal queue, no audit log of AI rewrites. The healing happens because the input to the next action is the current page, not the previous page.

Doesn't that make every test slower because it snapshots before every action?

Slightly, but the math flips the moment a selector goes stale. A cached CSS selector costs one lookup on the happy path and a full heal cycle on the unhappy path; a heal cycle includes timeouts, multiple fallback queries, and often an LLM round-trip to propose a rewrite. An accessibility-tree snapshot is a single Playwright `ariaSnapshot()` call, typically 30-200ms. You pay the snapshot cost on every action, but you never pay the heal cost, and you never accumulate silently approved AI selector rewrites in your test file. On a 20-step scenario against a fast dev server, the whole run usually lands between 10 and 30 seconds.

How do I apply these best practices if I'm staying on Playwright TypeScript specs?

Two moves port cleanly. First, prefer Playwright's role-based locators (`page.getByRole('button', { name: 'Submit' })`) over CSS or XPath, because they resolve against the same accessibility tree that Assrt uses and survive DOM reshuffles. Second, in your retry wrapper, capture the current page's aria snapshot into the retry log right next to the error, so whoever triages the flake has the post-failure state in hand without re-running. You won't get Assrt's full inline-retry flow inside a vanilla spec file, but you'll remove the two biggest causes of flake: brittle selectors and error messages with no page context.

Isn't 'self-healing' a marketing term? What's the architectural definition I should evaluate vendors against?

Yes, it's mostly marketing. The useful architectural definition is: on action failure, does the test framework feed the post-failure page state back into the decider, or does it persist a new locator variant and move on? Those are two very different products. The first has no stored state to drift. The second quietly grows a ledger of AI-approved selector rewrites that your team never reviews, which becomes the next year's debugging nightmare. When you evaluate vendors, ask them to show you the file on disk after a heal happens. If there's a new XPath row, that's pattern two. If there's nothing new on disk, that's pattern one.

What's the one practice you'd keep if you could only keep one?

Keep failure context in the same turn as the failure. Whether you're using Assrt, writing Playwright specs, or building your own agent loop, this is the cheapest durable improvement. In Assrt it's a four-line addition to a catch block; in Playwright it's `test.info().attach('aria-snapshot', { body: await page.ariaSnapshot() })` inside your retry handler; in a custom agent loop it's one concatenation. Everything else in this guide compounds on top of this one move: stable locators matter more, review workflows matter more, vendor lock-in matters less, because your retries are already shorter and your triage is already faster.

Where does the 'open source, self-hosted, $0' comparison actually matter for self-healing?

It matters because every self-healing product has to store something to heal. Vendor tools store the locator ledger, the heal history, and the approvals in their cloud. The test you wrote yesterday and the heal that happened today live in a shared DB behind their API. When that API is down, your tests can't run; when your subscription lapses, your heal history is gone; when you want to diff two healed versions, you open the UI. Assrt's storage for healing is zero: the scenario lives in `scenario.md` in your repo, the healing happens live from the current page, and there is no server involved in the decision. The cost argument (~$7.5K/mo avoided) is real, but the architectural argument (no stored heal state to audit or retrieve) is why these best practices hold across years.

How does Assrt pick which element to click when the accessibility tree has twenty buttons?

The scenario text names the element by its accessible name: `Click the Sign in button`. The agent's system prompt (in `agent.ts` around line 213) tells it to search the snapshot for a role+name match and pass the ref ID to the `click` tool. The `click` tool signature (`agent.ts` line 33-50) accepts both a human description and an exact ref; the ref is preferred because it's a direct pointer into the snapshot. When two buttons have the same accessible name, the prompt instructs the agent to disambiguate using surrounding text or to call `snapshot` with more context. Because the accessible name is what a screen reader would read, the matching logic piggybacks on work the developers already did for accessibility.

What happens when the accessibility tree is genuinely empty, like a Canvas-rendered app?

The agent falls back to `screenshot` + vision reasoning: the tool suite exposes `screenshot` as a separate tool, and the planner prompt lets the model reason over the image when `snapshot` returns nothing useful. Coverage for Canvas-only apps will always be weaker than for accessibility-compliant HTML, but the ceiling is the same as any other browser agent: you're bound by whether the element is perceivable, not by a stored locator. The structural recommendation from this best-practices guide still holds: don't persist a locator you can't regenerate from the current page.
