Tool Comparison, Built From Source

Best self-healing tests tools in 2026: the third approach nobody writes about is 25 lines of MIT code.

Every roundup I read split the market into two buckets: locator fallback (Testim-style, stored alternatives) and intent-based resolution (Mabl-style, ML over past failures). The fastest-healing tool in my tests uses neither. It stores no locators at all. On every failure it re-reads the live accessibility tree, hands it to an LLM with a JPEG of the page, and lets the model pick a new ref. You can read the entire implementation in one file.

Matthew Diakonov
11 min read
4.9 rating from teams running self-healing tests on their own laptops
Healing mechanism is 25 lines of MIT TypeScript you can read in one sitting.
No stored locators, no ML model, no cloud service. Fresh accessibility tree on every failure.
Emits real Playwright code. Not a vendor YAML schema you have to migrate off.

The two healing approaches every roundup describes

Read any of the ranked lists you get when you search “best self-healing tests tools” and you will see the same two buckets. Locator fallback tools (Testim, Reflect, some Mabl modes) store a primary selector plus a handful of alternatives and walk the list when the primary fails. Intent-based tools (Functionize, KaneAI, Shiplight) store a semantic description of the element and run an ML pass to map intent to the current DOM at run time.

Both approaches carry state from the last successful run into the next one. Both also sit at the same industry price ceiling, roughly $7.5K per month for enterprise seats. The reason is the same in each case: stored state has to live somewhere the vendor can update it, which means a server, a dashboard, a login.

The three approaches, side by side

What each approach stores, where it stores it, and what it costs you when the app changes.

| Feature | Fallback / Intent (Testim, Mabl, KaneAI) | Live-snapshot (Assrt) |
| --- | --- | --- |
| What is stored per element | A primary selector plus N alternatives, or a trained embedding. | Nothing. The ref is picked fresh each call. |
| Where healed state lives | A vendor database behind a login. | Nowhere. The LLM sees the tree, decides, moves on. |
| What happens when the DOM is rewritten | Fallbacks may exhaust; the ML model may retrain off-cycle. | The LLM reads the new tree and picks by accessible name. |
| Healing code, in lines | Opaque. Lives server-side, not published. | ~25 lines of MIT TypeScript, readable in one file. |
| Scenario format | Proprietary JSON/YAML on vendor servers. | #Case N: plain text in your repo. |
| Monthly cost (enterprise) | ~$7,500/mo, the commonly quoted enterprise tier. | $0 agent + LLM tokens (~$0.01-0.05/scenario). |
| What happens if the vendor disappears | Scenarios and healed state disappear with the account. | Nothing. The scenarios and healing code are in your repo. |

The live-snapshot approach, in one file

This is the full healing implementation in the open source assrt repo. One try/catch, one snapshot call, one slice, one screenshot attached to the tool_result. That is the whole thing. There is no separate healing service, no retry table, no stored fallback selectors on disk. When a click fails, the page's accessibility tree is re-read, the first 2000 characters are handed to Claude along with a base64 JPEG, and the system prompt tells the model to pick a new ref.

assrt/src/core/agent.ts (lines 928-955, MIT)

The two details that make it work are the 2000-character slice, which keeps the tree inside a reasonable context budget without dropping ref IDs, and the JPEG. Sending a picture of the page as a second content block in the tool_result lets the LLM resolve cases the accessibility tree alone cannot, like modals that appeared mid-action or a button that visually exists but has no accessible name yet.
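As a rough sketch of that flow (the `BrowserLike` interface, the `healClick` name, and the content-block shapes are illustrative assumptions, not assrt's actual API):

```typescript
// Sketch of the live-snapshot heal described above. BrowserLike, healClick,
// and the content-block shapes are illustrative, not assrt's real API.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: { type: "base64"; media_type: "image/jpeg"; data: string } };

interface BrowserLike {
  click(ref: string): Promise<void>;
  snapshot(): Promise<string>;   // live accessibility tree, as text
  screenshot(): Promise<string>; // base64 JPEG of the page
}

const TREE_SLICE = 2000; // keep the tree inside the LLM context budget

async function healClick(browser: BrowserLike, ref: string): Promise<ContentBlock[]> {
  try {
    await browser.click(ref);
    return [{ type: "text", text: "ok" }];
  } catch (err) {
    // The heal: a fresh tree plus a screenshot handed back as the
    // tool_result content, so the model's next tool call can pick a new ref.
    const tree = (await browser.snapshot()).slice(0, TREE_SLICE);
    const jpeg = await browser.screenshot();
    return [
      { type: "text", text: `Error: ${(err as Error).message}\n\nLive accessibility tree:\n${tree}` },
      { type: "image", source: { type: "base64", media_type: "image/jpeg", data: jpeg } },
    ];
  }
}
```

Note that the catch path stores nothing: the two blocks go into the conversation, not onto disk.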

How the failure round-trip looks to the model

Every failed action becomes a structured conversation turn. The agent emits a tool call, the browser attempts it, the browser fails, the agent re-reads the tree, and the model gets the tree plus a screenshot back as context. The next tool call is the model's new attempt. Nothing in the test file changes. Nothing on disk gets written. The healing is a single conversation step, not a persistent database row.

One failed click, start to finish

1. Scenario file → Assrt agent: Click 'Sign up' (#Case 1, step 2)
2. Assrt agent → Playwright MCP: click(ref='e17')
3. Playwright MCP → Assrt agent: Error: ref e17 is stale
4. Assrt agent → Playwright MCP: snapshot() → fresh accessibility tree (2000 chars)
5. Assrt agent → Playwright MCP: screenshot() → page screenshot (jpeg base64)
6. Assrt agent → Anthropic LLM: tool_result: error + tree + screenshot
7. Anthropic LLM → Assrt agent: click(ref='e24')  # new sign-up button
8. Assrt agent → Playwright MCP: click(ref='e24') → ok
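Written out as data, the round-trip follows the shape of the Anthropic Messages API tool-use protocol. The tool name, IDs, and refs below are invented for illustration:

```typescript
// The failure round-trip as message turns. Shapes follow the Anthropic
// Messages API tool-use convention; tool name, ids, and refs are invented.
const turns: Array<{ role: string; content: Array<Record<string, unknown>> }> = [
  // 1. The model asks for a click on what turns out to be a stale ref.
  { role: "assistant", content: [
      { type: "tool_use", id: "tu_1", name: "browser_click", input: { ref: "e17" } },
  ]},
  // 2. The failure comes back with a fresh tree and a screenshot attached.
  { role: "user", content: [
      { type: "tool_result", tool_use_id: "tu_1", is_error: true, content: [
          { type: "text", text: "Error: ref e17 is stale\n\n<fresh accessibility tree, first 2000 chars>" },
          { type: "image", source: { type: "base64", media_type: "image/jpeg", data: "<jpeg>" } },
      ]},
  ]},
  // 3. The heal is simply the next tool call: same tool, new ref, nothing persisted.
  { role: "assistant", content: [
      { type: "tool_use", id: "tu_2", name: "browser_click", input: { ref: "e24" } },
  ]},
];
```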

Healing selectors is only half the problem

Public data from QA Wolf's 2026 failure taxonomy shows that DOM changes and brittle selectors account for only about 28% of test failures. The other 72% split between async timing, runtime errors, test data, visual assertions, and interaction changes. A self-healing suite that only heals selectors will still flake on everything else.

Assrt's second half is a DOM mutation observer exposed as a wait_for_stable tool. It injects an observer into the page, counts mutations, and blocks until the page has been quiet for N seconds. That one tool removes most of the async-timing flakiness that a selector heal cannot touch.
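Reduced to its core loop, the idea looks something like this. In the real tool a MutationObserver injected into the page feeds the counter; here `readCount` stands in for reading it, and the function name, polling interval, and timeout are assumptions, not assrt's actual API:

```typescript
// Sketch of the wait_for_stable idea: block until a mutation counter has
// stopped changing for quietMs. readCount stands in for the injected
// MutationObserver's counter; names and intervals are illustrative.
async function waitForQuiet(
  readCount: () => number,
  quietMs: number,
  pollMs = 25,
  timeoutMs = 10_000,
): Promise<void> {
  const start = Date.now();
  let last = readCount();
  let quietSince = Date.now();
  while (Date.now() - start < timeoutMs) {
    await new Promise((r) => setTimeout(r, pollMs));
    const now = readCount();
    if (now !== last) {
      last = now;               // page mutated: reset the quiet window
      quietSince = Date.now();
    } else if (Date.now() - quietSince >= quietMs) {
      return;                   // quiet long enough: treat the page as stable
    }
  }
  throw new Error(`page never went quiet for ${quietMs}ms`);
}
```

Inside the page, the counter would be fed by something along the lines of `new MutationObserver(() => count++).observe(document.body, { childList: true, subtree: true, attributes: true })`, evaluated via Playwright.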

assrt/src/core/agent.ts (lines 872-924, MIT)

What moves through the agent on a run

The plumbing is simpler than it sounds: plain scenario text in, live accessibility tree and screenshots streaming through the agent, Playwright code and a disk report out. If any of those inputs or outputs lives in a vendor service instead of on your machine, healing is the wrong thing to evaluate. Lock-in is.

Healing on a laptop, nothing in the cloud

Inputs: tests/smoke.txt, your app under test, your Anthropic API key.
Runs locally: the Assrt agent (MIT).
Outputs: Playwright code, report.json, recording.webm.

The scenario file that feeds the heal

A scenario file for a live-snapshot tool has no locators in it. The scenarios describe what a user does in their own words. The agent decides which ref to click by reading the tree on the spot. This is why the scenarios survive major UI rewrites: there is nothing about the old DOM stored in them in the first place.

tests/smoke.txt
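The repo's actual smoke file is not reproduced here; a hypothetical scenario file in the same #Case N: format would look something like this (the app and steps are invented for illustration):

```text
#Case 1: New visitor can sign up
Open the home page
Click 'Sign up'
Fill in the email and password fields with test data
Submit the form
Expect a welcome message containing the new user's email

#Case 2: Signed-in user can log out
Log in with the test account
Open the account menu
Click 'Log out'
Expect the home page with a 'Sign up' button visible
```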

What one run looks like when a heal triggers

Nothing special, which is the point. A failed click shows up in the log as a re-snapshot and a retry with a different ref. The scenario passes. The scenario file on disk is unchanged. No “healed selectors” dashboard row is written. The next run does the same fresh read all over again.

npx @assrt-ai/assrt run --plan-file tests/smoke.txt
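Since the report lands on disk, gating CI on it is a couple of lines of shell with jq (which the stack list below the fold names for exactly this). The `.cases[].status` field names are an assumption, not a documented schema, and a sample report is written inline so the snippet runs standalone:

```shell
# Gate a CI job on the run report. Field names (.cases[].status) are assumed,
# and a sample report is written to a temp file so this runs without a real run.
report=$(mktemp)
cat > "$report" <<'EOF'
{"cases":[{"name":"Case 1","status":"passed"},{"name":"Case 2","status":"failed"}]}
EOF

failed=$(jq '[.cases[] | select(.status != "passed")] | length' "$report")
echo "failed scenarios: $failed"
if [ "$failed" -gt 0 ]; then
  echo "gate: FAIL"
else
  echo "gate: OK"
fi
```

In a real pipeline the path would be the report the article mentions, /tmp/assrt/results/latest.json.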

How to evaluate any self-healing tests tool

Treat the marketing page as noise and go straight to six questions. If a tool is unwilling to answer them, that is the answer. These are the ones I asked when building my own shortlist.

Where does healed state live?

If it lives on the vendor's servers, you are renting your test maintenance. If it lives in your repo (or nowhere, as with live-snapshot), you own it.

Can I read the healing code?

If the answer is a whitepaper instead of a file path, the tool is a black box. Assrt's healing is 25 lines of MIT TypeScript at agent.ts:928-955.

What's the scenario format?

Proprietary JSON/YAML is lock-in dressed as a feature. Plain text with #Case N: headers is portable to any runner in any language.

Does it emit real Playwright code?

If the only artifact is a vendor dashboard, the tool is a product, not a runtime. Real .spec.ts files on disk let you fall back to plain Playwright the day the vendor breaks.

What runs on my laptop vs the vendor cloud?

Local browser + local scenario file + direct LLM call is zero lock-in. A hosted runner is fine as an option, a forced dependency is not.

What's covered beyond selectors?

72% of real test failures are not selector staleness. A DOM mutation observer (like Assrt's wait_for_stable) addresses async timing, which selector-only healing ignores.

The four checks, in the order I ran them

Evaluating a self-healing tests tool in 2026 is a one-hour job if you know where to look. This is the path that cut through the SERP noise quickest for me.

1

Open the healing source file

If it's closed source, the tool is opaque by design. With Assrt, the file is /Users/.../assrt/src/core/agent.ts and the catch block is 25 lines. You can paste it into ChatGPT and audit it before you even install the CLI.

2

Look at the scenario format

Open one scenario in a text editor. If it is human-readable English with #Case headers, the agent re-reads the page fresh and the scenario is portable. If it is a JSON schema of stored locators, the tool is running locator fallback, and you are renting that state.

3

Trigger a breaking DOM change and watch the log

Rename a CSS class or restructure a button group in your staging build. The good tools surface healing as a line in the run log (ref stale, re-snapshot, new ref picked). The opaque tools surface a green checkmark and never tell you how.

4

Export and run the Playwright code without the tool

Ask the tool to hand you the Playwright spec for a passing scenario. Run it outside the vendor. If it runs, the output is real code. If the tool refuses or hands you a runner-locked file, the artifact is not yours.

The mental model

The best self-healing tests tools in 2026 are the ones that store the least state outside your repo, not the ones with the most complicated ML behind a login.

A healing mechanism you can read in one file is not a missing feature, it is the feature. No healed selectors database, no vendor dashboard, no proprietary scenario schema. Just a catch block, a fresh tree, and a screenshot handed back to the LLM every time something breaks.

The numbers that make the tradeoff explicit

The contrast between live-snapshot and the paid alternatives comes down to how little the open source option is made of: less code, zero monthly cost, and two artifacts on your disk per run.

25
lines in the MIT healing mechanism
2000
character slice of the a11y tree per heal
28%
of real test failures are selector staleness (QA Wolf)
$0
per month for the agent itself

The stack a live-snapshot tool uses, licenses included

“Open source self-healing” only means something if every layer is actually OSI licensed. The live-snapshot approach rides on four OSI-licensed pieces and a commodity LLM API. No proprietary healing database anywhere in the stack.

assrt test agent, MIT
healing catch block, MIT (25 lines)
wait_for_stable observer, MIT
TestReport types, MIT
Playwright, Apache-2.0
@playwright/mcp, Apache-2.0
Model Context Protocol SDK, MIT
jq for CI gating, MIT
your LLM key, commodity

When locator fallback and intent-based resolution are still the right pick

Live-snapshot healing is not a free lunch. It requires an LLM round trip on every failure, which is fine for a nightly suite and expensive for a 10,000-scenario CI job that runs on every commit. If your suite is enormous and mostly stable, locator fallback (cached alternatives, zero LLM calls on the happy path) is faster and cheaper per run.

Intent-based resolution still wins when you need human-level judgment about visually similar buttons (two “Save” buttons in different contexts) and you have a big enough corpus of past failures to train the resolver on. The tradeoff is that you are renting that corpus from the vendor. Live-snapshot sidesteps the corpus by letting the LLM read the whole page every time, which costs tokens but avoids training-data decay.

Want to see the 25-line healing mechanism on your own app?

20 minutes, a screenshare, and we will watch your suite heal a real DOM change live. Bring a staging URL and a scenario that used to flake.

Book a call

Frequently asked questions

What is the actual difference between locator fallback, intent-based resolution, and the live-snapshot approach?

Locator fallback stores a primary selector plus N alternatives; on failure, it walks the list. Intent-based resolution stores a semantic intent (e.g. 'the primary sign-up button') and a model that maps intent to the current DOM. The live-snapshot approach Assrt uses stores neither. Every action reads a fresh accessibility tree from Playwright MCP, picks a ref by name/role, and on failure the tree is re-read and the LLM picks again. The difference is that locator fallback and intent-based both carry state from the last successful run; live-snapshot carries nothing except the scenario text.

Which self-healing tests tool is actually the best one in 2026?

If you care about lock-in, the best is whichever one leaves your tests on your disk in a format you can run without the vendor. Most paid tools (Testim, Mabl, KaneAI, Functionize) store scenarios and healed selectors server-side, so when the account ends the tests end. The MIT-licensed option is Assrt, which runs on your machine, writes a #Case N: text file to your repo, and emits real Playwright code. Assrt's healing is the live-snapshot approach documented in agent.ts:928-955 of the open source assrt repo.

How many lines of code does the healing mechanism actually take?

In Assrt, about 25 lines in a single try/catch block at /Users/.../assrt/src/core/agent.ts:928-955. The catch captures the error message, calls this.browser.snapshot() to fetch the live accessibility tree, slices it to 2000 chars to stay inside the LLM's context budget, and appends a base64 JPEG screenshot as a second block in the tool_result. No retry loop, no stored fallback list, no separate healing service. Testim, Mabl, and KaneAI do not publish the line count for their healing because the logic lives server-side behind a login.

Why is the accessibility tree better than CSS selectors for healing?

Because it is the browser's semantic view of the page, not the developer's markup choice. A button has role='button' and an accessible name no matter what className Tailwind generated for it this deploy. When a designer rearranges the DOM or a build pipeline changes hashed class names, the accessibility tree is still recognizable to a human or an LLM reading it. CSS selectors encode how the page is built; accessibility refs encode what the page is for.
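As an illustration of the difference (the markup, class names, and refs here are invented; the tree loosely follows Playwright's snapshot style):

```text
CSS selector, this deploy:   button.css-1x9f2kq.btn-primary
CSS selector, next deploy:   button.css-8q0z3jm.btn-primary   <- stale

Accessibility tree, both deploys:
- main:
  - heading "Create your account" [ref=e12]
  - textbox "Email" [ref=e15]
  - button "Sign up" [ref=e24]
```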

Does live-snapshot healing slow the test down?

Yes, slightly. Each failed action spends roughly 200-600ms grabbing a new accessibility tree and screenshot, then the LLM takes another 500-1500ms to decide what to try next. For a suite with 0 failures that extra round trip never happens. For a suite with 3 healing events in a 20-scenario run, the total wall-clock cost is about 2-4 seconds, far less than the 5-30 minutes a human spends re-recording a broken Testim or Mabl test.

What about timing, not selector, failures? Is a fresh snapshot enough?

No. Assrt pairs the snapshot healing with a wait_for_stable tool (agent.ts:872-924) that injects a MutationObserver into the page, counts DOM mutations, and waits for N seconds of quiet before continuing. A suite that only heals selectors but not async timing will still flake. The DOM mutation observer approach is what turns most real-world flakiness (async content, streaming AI responses, lazy-loaded components) into stable assertions.

Can I run self-healing tests without a cloud vendor?

Yes. Self-healing is not a cloud property. Assrt runs entirely on your machine: chromium via Playwright MCP locally, the accessibility tree read locally, the LLM call goes straight to Anthropic from your laptop, and the test report writes to /tmp/assrt/results/latest.json. No scenario data, no screenshots, and no healed locators are uploaded anywhere. The closest thing to a vendor dependency is the LLM API key, which you own.

How much do the paid self-healing tools cost compared to the open source option?

The commonly quoted enterprise ceiling for platforms like Functionize and Testim is around $90K/year ($7.5K/mo) before add-ons. Mabl and AccelQ sit in the same order of magnitude. Assrt is MIT licensed and costs $0 for the test agent, plus LLM tokens (roughly $0.01-$0.05 per scenario at Sonnet 4.6 prices). The reason the paid tools cost that much is that healing state lives on their servers, not because healing itself is expensive to compute.

Which tool emits real Playwright code vs a proprietary YAML or JSON scenario format?

Assrt emits a plain .txt file of #Case N: scenarios and generates real Playwright code on demand through its MCP server. Testim, Mabl, and most enterprise tools keep scenarios in a proprietary schema that only their runner can read; exporting to Playwright is usually lossy or unsupported. If the tool you are evaluating cannot hand you a Playwright file that runs anywhere Node runs, it is a vendor format dressed up as a feature.

What happens to a test if the app's DOM changes completely between runs?

With locator fallback, the test fails because all N stored selectors are stale. With intent-based resolution, the test probably survives if the ML model was trained on that app's UI patterns. With live-snapshot, the test survives because the LLM sees both the new accessibility tree and a screenshot of what the page looks like right now, and picks a ref by the user-visible name ('Sign up'), not by the old markup. The scenario text describes intent; the tree describes reality; the LLM connects them.