E2E test tools, sorted by the one thing every roundup skips

Every comparison of e2e test tools ranks Playwright, Cypress, and Selenium on speed, browser support, and language bindings. Those are interchangeable. The choice that actually sets your maintenance bill is simpler and almost never discussed: how does the tool find the Login button on the page?

Matthew Diakonov, Written with AI

Published June 16, 2026 · 8 min read

Direct answer · verified 2026-06-16

The e2e test tools in 2026 fall into three families: coded frameworks (Playwright, Cypress, Selenium, WebdriverIO, Puppeteer), where you write and maintain the selectors; managed services (QA Wolf, Mabl, Testim, testRigor, Momentic), where a vendor stores them for a price that scales with test count; and AI agent layers (Assrt), which store no selector at all and write out standard Playwright you keep. Pick by element-location strategy, not by the feature matrix.

The axis nobody graphs: where the selector lives

A test fails for a short list of reasons. In day-to-day work the most common one is not a real bug; it is that the element the test was pinned to moved, got renamed, or picked up an extra wrapper. So the property that predicts your maintenance load is not how fast a tool runs. It is what the tool remembers between runs.

A coded framework remembers a selector you typed. A managed service remembers a selector it recorded. Both inherit the same failure mode: when the page shifts, the remembered thing stops matching and someone has to repair it. A tool that re-reads the page on every run remembers nothing about the DOM, so it has nothing to repair. That is the whole ballgame, and it is the line the three families fall along.
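The difference between a remembered selector and a per-run lookup can be sketched with a toy page model. This is a minimal illustration, not any tool's real implementation: `UiNode`, `findByStoredClass`, and `findByRoleAndName` are hypothetical names invented here, and the tree stands in for a real DOM or accessibility tree.

```typescript
// Toy page model (hypothetical): just enough structure to show the failure mode.
interface UiNode {
  role: string;      // e.g. "button"
  name: string;      // accessible name, e.g. "Login"
  className: string; // styling hook a designer may rename
  children: UiNode[];
}

// Depth-first walk that returns every node in the tree.
function walk(root: UiNode): UiNode[] {
  const out: UiNode[] = [root];
  for (const child of root.children) out.push(...walk(child));
  return out;
}

// "Remembered" strategy: a class name captured when the test was written.
function findByStoredClass(root: UiNode, className: string): UiNode | undefined {
  return walk(root).find((n) => n.className === className);
}

// "Re-read" strategy: match role and name against whatever tree exists right now.
function findByRoleAndName(root: UiNode, role: string, name: string): UiNode | undefined {
  return walk(root).find((n) => n.role === role && n.name === name);
}

const before: UiNode = {
  role: "form", name: "Sign in", className: "auth-form",
  children: [{ role: "button", name: "Login", className: "btn-login", children: [] }],
};

// After a redesign: the class is renamed and the button gains one more wrapper.
const after: UiNode = {
  role: "form", name: "Sign in", className: "auth-form",
  children: [{
    role: "group", name: "", className: "actions",
    children: [{ role: "button", name: "Login", className: "btn-primary", children: [] }],
  }],
};

const storedHitBefore = findByStoredClass(before, "btn-login") !== undefined;
const storedHitAfter = findByStoredClass(after, "btn-login") !== undefined;
const rereadHitAfter = findByRoleAndName(after, "button", "Login") !== undefined;
console.log({ storedHitBefore, storedHitAfter, rereadHitAfter });
```

The stored class matches before the redesign and misses after it, while the role-and-name lookup still finds the button. The sketch also shows the limit of the idea: if the button's visible label changed, the second lookup would miss too.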

1. Coded frameworks

Playwright, Cypress, Selenium, WebdriverIO, Puppeteer. You write the test in code and pin every interaction to a locator you typed, typically a CSS or XPath selector. Free, open source, total control. The catch: the selector is yours forever. When a designer renames a class or wraps a button in one more div, the locator stops matching and the test fails until a human rewrites it.

2. Managed and record-and-replay services

QA Wolf, Mabl, Testim, testRigor, Momentic. You record a flow or describe it, and the tool stores a locator (or a proprietary scenario format) in its own database. Maintenance moves off your desk, but so does ownership: the tests live in a vendor account, and the bill scales with test count. QA Wolf is reported to start around $8,000/month.

3. AI agents that generate framework code

Assrt. An LLM agent reads the live page, acts on it, and writes out standard Playwright. No selector is stored anywhere, so there is nothing to rename when the DOM moves. The output is a file you commit, run in your own CI, and keep if you ever walk away.

3 families, set by where the selector lives

~$8,000/mo reported start price for the managed family (QA Wolf)

10 tools the Assrt agent uses, none of them a stored selector

What the third family actually does, step by step

Here is the loop Assrt's agent runs, taken from the Selector Strategy and Error Recovery sections of assrt-mcp/src/core/agent.ts (lines 215 to 227). The thing to watch is what it does on failure: there is no fallback locator, only another look at the page.

The locate-act-recheck loop

Step 1 of 5: snapshot

The agent calls snapshot first and gets the accessibility tree, with a fresh ref on every actionable element (for example ref="e5"). No CSS, no XPath.
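As a rough sketch of that loop, the code below models a page whose refs expire whenever the page changes, and a step that recovers from a stale ref by taking another snapshot. Everything here is an assumption made for illustration: `FakePage`, `clickStep`, the ref format, and the retry limit are hypothetical and are not taken from agent.ts, whose actual tool signatures this article does not show.

```typescript
// Hypothetical shapes; the real snapshot output is not reproduced in this article.
interface SnapshotEntry { ref: string; role: string; name: string; }
interface ElementSpec { role: string; name: string; }

// Stand-in page: refs are only valid for the version of the page that issued them.
class FakePage {
  private version = 1;
  private elements: ElementSpec[];
  private surpriseRerenders: number;
  clicked: string[] = [];

  constructor(elements: ElementSpec[], surpriseRerenders: number) {
    this.elements = elements;
    this.surpriseRerenders = surpriseRerenders;
  }

  // Issue a fresh ref for every element, tied to the current page version.
  snapshot(): SnapshotEntry[] {
    return this.elements.map((el, i) => ({
      ref: `v${this.version}e${i}`, role: el.role, name: el.name,
    }));
  }

  click(ref: string): "ok" | "stale" {
    // Simulate the page re-rendering between the snapshot and the click.
    if (this.surpriseRerenders > 0) {
      this.surpriseRerenders -= 1;
      this.version += 1;
    }
    const hit = this.snapshot().find((e) => e.ref === ref);
    if (hit === undefined) return "stale";
    this.clicked.push(hit.name);
    this.version += 1; // any action may change the page, so old refs expire
    return "ok";
  }
}

// Locate, act, and on a stale ref simply look at the page again.
// Returns how many snapshots the step needed.
function clickStep(page: FakePage, role: string, name: string, maxSnapshots = 3): number {
  for (let attempt = 1; attempt <= maxSnapshots; attempt++) {
    const target = page.snapshot().find((e) => e.role === role && e.name === name);
    if (target === undefined) throw new Error(`no ${role} named "${name}" on the page`);
    if (page.click(target.ref) === "ok") return attempt;
    // Stale ref: no fallback locator is consulted, the next iteration re-snapshots.
  }
  throw new Error(`gave up after ${maxSnapshots} snapshots`);
}

const page = new FakePage(
  [{ role: "button", name: "Sign in" }, { role: "textbox", name: "Email" }],
  1, // one surprise re-render, so the first ref goes stale
);
const snapshotsUsed = clickStep(page, "button", "Sign in");
console.log(snapshotsUsed, page.clicked);
```

The point of the design is that recovery and first attempt are the same code path. Nothing is cached between iterations, so there is no second locator to keep in sync with the page.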

What you write, and what comes back

You do not write selectors. You write the flow in plain steps, in the #Case format:

#Case 1: Sign in and reach the dashboard
- Click the Sign in button
- Type test@example.com into the email field
- Type password123 into the password field
- Click Continue
- Verify the text "Dashboard" is visible

The agent resolves each step against the live page, runs it, asserts the result, and records a pass or fail with a video of the run. The artifact you keep is a standard Playwright file plus the run output, all on your own disk. Run it with npx @m13v/assrt discover <your-app-url>.
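To make the shape of the #Case format concrete, here is a small parser sketch. The grammar is inferred only from the single example above (a `#Case N: title` header followed by `- ` steps), so treat it as an assumption; `parseCases` and `TestCase` are hypothetical names, not part of Assrt.

```typescript
// Hypothetical in-memory shape for one parsed case.
interface TestCase {
  id: number;
  title: string;
  steps: string[];
}

// Parse "#Case N: title" headers and the "- step" lines that follow each one.
function parseCases(text: string): TestCase[] {
  const cases: TestCase[] = [];
  let current: TestCase | undefined;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    const header = /^#Case\s+(\d+):\s*(.+)$/.exec(line);
    if (header !== null) {
      current = { id: Number(header[1]), title: header[2] ?? "", steps: [] };
      cases.push(current);
    } else if (line.startsWith("- ") && current !== undefined) {
      current.steps.push(line.slice(2).trim());
    }
    // Blank lines and anything else are ignored in this sketch.
  }
  return cases;
}

const sample = [
  "#Case 1: Sign in and reach the dashboard",
  "- Click the Sign in button",
  "- Type test@example.com into the email field",
  "- Type password123 into the password field",
  "- Click Continue",
  '- Verify the text "Dashboard" is visible',
].join("\n");

const parsed = parseCases(sample);
console.log(parsed.length, parsed[0]?.title, parsed[0]?.steps.length);
```

Each step stays a plain sentence after parsing. Nothing in the structure holds a selector, which is the property the rest of this article is about.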

The part most e2e test tools quietly skip: real login and OTP

Most suites never test the actual sign-up, because driving a real email and a one-time code is annoying, so they stub auth or inject a session. Assrt's agent has two tools for it: create_temp_email gets a disposable inbox and wait_for_verification_code reads the OTP back. The split-digit OTP input that breaks naive type-into-each-box scripts is handled with one fixed clipboard paste, hard-coded verbatim in the agent prompt at agent.ts:237:

// assrt-mcp/src/core/agent.ts:237 (verbatim, agent prompt)
() => {
  const inp = document.querySelector('input[maxlength="1"]');
  if (!inp) return 'no otp input found';
  const c = inp.parentElement;
  const dt = new DataTransfer();
  dt.setData('text/plain', 'CODE_HERE');
  c.dispatchEvent(new ClipboardEvent('paste', {
    clipboardData: dt, bubbles: true, cancelable: true
  }));
  return 'pasted ' + document.querySelectorAll('input[maxlength="1"]').length + ' fields';
}

This is the kind of detail that separates a tool you can point at a real app from one that only works on pages with no gate in front of them. It is not on anyone's feature matrix, and it is the reason a flow can be tested end to end instead of stubbed.
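The snippet carries a literal `CODE_HERE` placeholder, so something has to swap in the real code before it runs. How Assrt does that is not shown in this article; the sketch below is one plausible way, under the assumption that the code is short and alphanumeric. `fillOtpPlaceholder` is a hypothetical helper, and the length limits are arbitrary.

```typescript
// Hypothetical helper: substitute a received OTP into a script template.
// Validating first keeps a malformed "code" from breaking out of the quoted string.
function fillOtpPlaceholder(snippet: string, code: string): string {
  if (!/^[A-Za-z0-9]{4,10}$/.test(code)) {
    throw new Error("code must be 4 to 10 letters or digits");
  }
  if (!snippet.includes("CODE_HERE")) {
    throw new Error("snippet has no CODE_HERE placeholder");
  }
  // split/join replaces every occurrence without regex replacement patterns.
  return snippet.split("CODE_HERE").join(code);
}

// One line of the snippet above, used as the template for this example.
const template = "dt.setData('text/plain', 'CODE_HERE');";
const filled = fillOtpPlaceholder(template, "482913");
console.log(filled);
```

Rejecting anything outside a narrow character set matters because the result is evaluated as JavaScript in the page; a value containing a quote would change the script rather than fill it.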

Selector-bound tools vs. an agent that stores none

The same shortlist, graded on the axis that predicts maintenance: what the tool remembers between runs.

Feature	Selector-bound tools	Assrt
How it finds an element	A CSS/XPath selector you write, or one a vendor records	A ref resolved from the live accessibility tree, every run
Who fixes the test when the DOM changes	You (coded) or the vendor's team (managed)	Nobody: there is no stored locator to repair
What you can read after a run	A dashboard, or a stack trace pointing at a dead selector	A standard Playwright file, a video, screenshots, events.json
Where the tests live	Your repo (coded) or a vendor account (managed)	Your repo: standard Playwright you commit and run in your CI
Cost	Free (coded) up to a reported ~$8,000/mo (managed)	Free and open source
What happens if you leave	Coded tests stay; managed tests stay in the vendor account	You keep the Playwright code: zero vendor lock-in

Coded frameworks like Playwright are the right call when you want full control and have the engineering time to maintain selectors. Managed services are right when you have budget and want maintenance off your team entirely. Assrt fits when you want the tests written for you but still want to own and run standard Playwright in your own CI.

When each family is the right answer

None of the three is wrong. A small team with one critical flow and plenty of engineering taste should probably just write Playwright by hand; the maintenance is real but bounded. Playwright best practices will take you a long way. A larger org with a six-figure QA budget and hundreds of flows may genuinely prefer to pay a managed service to own the maintenance, lock-in and all.

The agent family earns its place in the middle: you have more flows than you have time to hand-write, you cannot or will not spend thousands a month, and you are not willing to trade ownership of your tests for that convenience. You want the first draft written and the re-resolution handled, but you want the output to be code you keep. If that is you, read more on how the no-stored-selector approach heals.

Not sure which family fits your app?

Walk through your current e2e setup and where the maintenance is actually going. We will tell you honestly whether an agent layer helps or whether hand-written Playwright is fine.

E2E test tools: common questions

What are the main e2e test tools in 2026?

Three families. Open-source coded frameworks: Playwright, Cypress, Selenium, WebdriverIO, and Puppeteer. Managed and record-and-replay services: QA Wolf, Mabl, Testim, testRigor, and Momentic. And AI agent layers that generate framework code, where Assrt sits. The first family is free and gives you total control at the price of writing and maintaining every selector. The second outsources maintenance but stores your tests in a vendor account and charges by test count. The third reads the live page and writes standard Playwright you keep.

Which e2e test tool should I pick?

Pick by how the tool locates elements, because that decides who pays the maintenance bill. If you want full control and have the engineering time to fix selectors, a coded framework like Playwright is the default. If you have budget and want maintenance off your plate, a managed service does that for a price that scales with test count. If you want the tests written for you but still want to own and run standard Playwright code in your own CI, an AI agent layer like Assrt fits: it generates the framework code rather than a proprietary format.

Why does 'how a tool finds elements' matter more than the feature matrix?

Because the feature matrix is interchangeable. Every serious tool already does parallel browsers, headless mode, screenshots, retries, and CI integration. What is not interchangeable is what breaks your suite. A test fails for one of a few reasons, and the most common one in practice is that the element it was pinned to moved or got renamed. A tool that stores a selector inherits that failure mode. A tool that re-resolves the page on every run does not have a stored selector to break.

How does Assrt locate elements without a stored selector?

Its agent calls a snapshot tool first to get the accessibility tree, where each actionable element carries a ref like ref="e5". It acts on that ref, then snapshots again after every action to see the updated page. If a ref goes stale, the recovery is another snapshot, not a fallback locator from a database. This is spelled out in the Selector Strategy and Error Recovery sections of assrt-mcp/src/core/agent.ts (lines 215 to 227). The agent exposes only ten tools: navigate, snapshot, click, type_text, scroll, evaluate, create_temp_email, wait_for_verification_code, assert, and complete_scenario.

Do AI test tools lock me in like managed services do?

It depends on the output. Managed record-and-replay services and tools built on a proprietary YAML or scenario format keep the test inside their system, so leaving means rebuilding. Assrt's output is a standard Playwright file. You commit it, run it in any CI, and if you stop using Assrt the tests keep running because they are plain Playwright. That is the practical meaning of zero vendor lock-in: your exit cost is whatever it costs to keep running code you already own.

Can an e2e test tool get through login and OTP flows?

Most coded suites stub auth or inject a session because driving a real login with a one-time code is fiddly. Assrt's agent has two tools for exactly this: create_temp_email gets a disposable inbox, and wait_for_verification_code reads the OTP back. For the common split-digit OTP input it pastes all digits at once with a fixed clipboard-event snippet hard-coded at agent.ts:237, rather than typing into each single-character field, which often misfires. So flows behind a sign-up gate are testable end to end, not skipped.

Is Assrt a replacement for Playwright?

No, it is a layer on top of it. Assrt wraps @playwright/mcp and runs each test case through an LLM agent (Claude Haiku by default) that drives the browser with Playwright's own tools. The output is Playwright. Think of it as the part that writes and maintains the tests, while Playwright is still the engine that runs them. If you already love writing Playwright by hand, Assrt is the thing that handles the tedious first draft and the re-resolution when the UI shifts.

