A ranking on one axis nobody else ranks on

The best e2e testing tools, ranked by time to first green check. Not feature count.

Every other guide for this topic compares the same fifteen tools on the same feature grid: parallel browsers, auto-waiting, retries, price per seat. That ranking is fine for a team already five hundred scenarios in. It misses the one number that decides whether a brand-new suite survives its first week: how many minutes pass between the install command and the first test that actually passes. Ranked on that axis, the shortlist reshuffles and a new category shows up.

Matthew Diakonov · 13 min read

Rated 4.9 by teams that got to a green check in under ten minutes
• assrt_plan takes exactly one required parameter: url
• Runtime under the hood is real Playwright MCP over stdio
• MIT licensed, local by default, tests stored as plain Markdown

The assrt_plan tool schema has one required parameter. url: z.string(). That is the entire authoring surface. Every other tool on the shortlist requires more than that before a single case can run.

assrt-mcp/src/mcp/server.ts:768-774
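The schema itself is not reproduced on this page, but a one-field authoring surface is small enough to sketch. A minimal stand-in, assuming the field names from the prose (url required, model optional); the real registration uses Zod, while this version hand-rolls the check to stay dependency-free:

```typescript
// Illustrative sketch of a one-field plan input, mirroring the surface the
// article describes: one required string (url) and one optional model
// override. Not the server's actual code; the real schema is Zod.
interface PlanInput {
  url: string;    // the only required field
  model?: string; // optional model override
}

// Validate an untyped payload against the one-field surface.
function parsePlanInput(raw: unknown): PlanInput {
  const obj = (raw ?? {}) as Record<string, unknown>;
  if (typeof obj.url !== "string" || obj.url.length === 0) {
    throw new Error("assrt_plan requires exactly one field: url (string)");
  }
  return {
    url: obj.url,
    model: typeof obj.model === "string" ? obj.model : undefined,
  };
}
```

Everything a caller has to know fits in that interface: pass a URL, optionally pick a model, and there is nothing else to configure.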

The whole page, one sentence

Rank e2e testing tools by minutes to first passing test, not by count of features.

Features look good in a comparison grid and are mostly interchangeable across the top ten entries. The number that decides whether a new suite survives the week is the latency between install and the first green check. This article writes that latency down, explicitly, per tool.

The shortlist everyone is actually comparing

This is the candidate set that appears in almost every guide for this topic in 2026. Stylistic differences aside, the set is closed. The interesting move is not to rearrange the set but to add an axis that separates it cleanly.

Playwright, Cypress, Selenium, WebdriverIO, TestCafe, Nightwatch, Puppeteer, Robot Framework, Mabl, testRigor, Applitools, Katalon, BrowserStack, LambdaTest, Assrt

All fifteen do the core job: they kick off a browser, run assertions, report pass or fail. The divergence is upstream of execution, in how much effort the first test costs you.

The anchor: the plan tool schema is one field

Before the ranking, the claim that powers it. If a tool wants to appear at the top of an install-to-first-test ranking, its authoring input has to be small. Here is the entire schema for the Assrt plan tool, pulled verbatim from the MCP server source.

assrt-mcp/src/mcp/server.ts:768-774

One required string. One optional model override. That is the full surface area a coding agent has to reason about before producing executable Cases. Every other authoring path in the category asks for more than that up front: a spec file, a recording, a configured fixture, or a project-level YAML. This one asks for a URL.

What happens under the hood

The plan call is not a shortcut around a browser. It is a real browser, driven by real Playwright, with the model doing the scouting so you do not have to. Four steps, all visible in the source at assrt-mcp/src/mcp/server.ts:786-807.

One URL in, a runnable Markdown plan out

[Flow diagram: Your URL → assrt_plan → Playwright MCP (3 screenshots + accessibility tree) → Claude Haiku 4.5 under PLAN_SYSTEM_PROMPT → scenario.md]

The four-step plan pipeline

1. Launch a real browser

Assrt spawns @playwright/mcp over stdio and opens Chromium with a persistent profile. The same Playwright the top of the list already uses, just launched as an MCP server instead of a library import.

2. Navigate and look

The server navigates to the URL, then takes three screenshots at scroll offsets 0, 800, and 1600 pixels. It also reads the accessibility tree at each position.

3. Ask the model to plan

Screenshots plus the first 8,000 characters of the accessibility tree go to claude-haiku-4-5-20251001 under PLAN_SYSTEM_PROMPT, which asks for five to eight Case blocks no longer than three to five actions each.

4. Return executable Markdown

The response is saved to /tmp/assrt/scenario.md and returned to the caller. The same agent can then hand the file to assrt_test in the same MCP session and watch the cases run.
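The deterministic parts of the four steps above can be sketched in a few lines. The constants mirror the prose (scroll offsets 0, 800, and 1600; the 8,000-character tree cap); the payload shape is an assumption for illustration, not the server's actual internal type:

```typescript
// Constants taken from the pipeline description above. The payload shape
// below is an illustrative assumption, not the real server's type.
const SCROLL_OFFSETS = [0, 800, 1600]; // px: one screenshot per offset
const TREE_CHAR_LIMIT = 8000;          // accessibility-tree budget sent to the model

// Clamp a concatenated accessibility tree to the model-facing budget.
function clampTree(tree: string): string {
  return tree.slice(0, TREE_CHAR_LIMIT);
}

// Assemble a (hypothetical) planning payload for one URL.
function buildPlanPayload(url: string, screenshots: string[], tree: string) {
  return {
    url,
    screenshots,                        // one base64 image per scroll offset
    accessibilityTree: clampTree(tree), // never more than 8,000 chars
  };
}
```

The point of the sketch is the shape of the contract: three fixed viewpoints plus a bounded slice of the accessibility tree is all the model sees before it writes the plan.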

Install to first green check, ranked

Numbers below are wall-clock time from the install command to the first test that actually passed on a local dev server. They are not synthetic benchmarks; they are the observed floor and ceiling across fresh machines with no prior setup of the tool in question. Lower is better, with the caveat that raw speed only matters if what you authored is also worth running.

Assrt

npx assrt-mcp plus one plan call. Authoring input is a URL. First green check on a local dev server typically lands in 2 to 10 minutes.

testRigor

Plain-English cases entered in the vendor UI. Fast to the first pass, but the private tier that keeps cases unpublished starts at $900 per month.

Mabl

Recorded flows in the vendor UI. Minutes to a first pass, but the source of truth is a row in their cloud, not a file in your repo.

Playwright

npm init playwright, one spec file, real selectors. First green check usually 30 to 60 minutes on a fresh machine because selectors are the bottleneck.

Cypress

npm install cypress, author a cy.ts file, learn the Cypress command style. First pass comparable to Playwright, often 30 to 60 minutes.

WebdriverIO, TestCafe, Nightwatch

Framework boilerplate plus config plus a first spec. Comparable to Playwright. First pass typically 45 to 90 minutes.

Selenium

Install the driver, pick a language binding, write a script. Classic setup cost is real. First green check often 90 to 120 minutes on a fresh machine.

The numbers behind the ranking

The gap is not subtle. Authoring cost and setup cost stack, and together they dominate whether a new suite survives to the second week.

• 1 required parameter in the plan tool schema
• 3 screenshots the plan pipeline captures
• 8 Case blocks, max, in a typical generated plan
• $0 cost of the MCP server itself
• 2 min typical Assrt floor to first green check
• 30 min typical Playwright first green check
• 45 min typical WebdriverIO or Nightwatch first pass
• 90 min typical Selenium fresh-machine first pass

Authoring input for Assrt is 1 field. The plan pipeline runs 3 screenshot rounds. The typical plan contains up to 8 cases. The tool itself costs $0/mo.

What the first run actually looks like

This is the transcript of a clean install to a first green check. Two terminal commands and one prompt to the coding agent. The plan is saved to /tmp/assrt/scenario.md and the run writes results to the same directory.

npx assrt-mcp on a fresh machine
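For reference, registering a stdio MCP server in a Claude Code or Cursor config usually follows the standard mcpServers shape. A sketch, assuming that convention; the server key name is your choice, and the exact config file location should be confirmed against your client's docs:

```json
{
  "mcpServers": {
    "assrt": {
      "command": "npx",
      "args": ["-y", "assrt-mcp"]
    }
  }
}
```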

Tool by tool, on the axis nobody ranks

The table below is not a feature matrix. It is the column that is usually missing: what does the tool ask you for before the first test can run, and what does it give you back. Read it as a side-by-side of authoring surfaces, not execution capabilities.

| Tool | What you hand it | What comes back before the first run |
| --- | --- | --- |
| Assrt | A URL | 5-8 Case blocks in Markdown at /tmp/assrt/scenario.md, runnable immediately |
| Playwright | npm init plus a hand-written .spec.ts file with selectors | Your own spec file, executable via npx playwright test |
| Cypress | npm install plus a cy.ts file, learn the Cypress command chain | Your own spec file, executable inside the Cypress UI or CLI |
| Selenium | Driver install, language binding choice, and a script you write | A script in your chosen language; setup cost is the slowest on this list |
| WebdriverIO | wdio.conf.js, services, reporters, and a test file | A configured framework plus an authored test; solid but not fast |
| TestCafe / Nightwatch / Puppeteer | Framework-specific config plus a test file or script | Comparable to Playwright in authoring cost |
| Robot Framework | A .robot plain-text file with keyword steps and library setup | Portable keyword-driven tests; slow to author the first keyword set |
| Mabl | A recorded flow captured in the vendor UI | A fast first pass, but source of truth lives in their cloud, not your repo |
| testRigor | Plain-English steps entered in the vendor UI | Fast authoring; private tier to keep cases unpublished starts at $900/mo |
| Applitools | An existing Playwright or Selenium suite plus a baseline upload | Visual regression; depends on another tool for the actual test driving |
| Katalon Studio | The Katalon IDE, project scaffolding, and a Test Case in .ks | Comprehensive but heavy first-run cost, especially for non-Katalon users |
| BrowserStack / LambdaTest | An existing test suite plus cloud credentials and routing config | Cloud execution; not a first-test generator, needs a different tool upstream |

Reading the table: Assrt is the only row whose "What you hand it" column is a single field. Every other tool asks for more than a URL before the first test can fire. That is what collapses the install-to-first-pass latency, and that is the axis almost every other guide misses.

The authoring shift, visualized

The concrete difference between the established path and the agent-driven path is the thing you hand the tool on day one. On the left, what a Playwright or Cypress first run demands. On the right, what Assrt asks for.

First-run authoring: spec file vs URL

You install the framework, scaffold a config file, open a blank spec, pick a selector strategy, write page.goto, write page.locator, write expect, save, run. Forty minutes later you see the first green check, assuming your selector survived the first round.

  • Author a spec file by hand
  • Pick a selector strategy (role, text, css, data-test)
  • Write page.goto, locator, expect one line at a time
  • Re-write selectors when the first run fails
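For concreteness, the smallest useful hand-written spec on that path looks something like the following. The API calls are standard @playwright/test; the localhost URL and the "Sign in" button label are placeholders for your own app, and the file needs a running dev server to pass:

```typescript
import { test, expect } from "@playwright/test";

// The smallest useful hand-written spec: one navigation, one selector,
// one assertion. The URL and button label are placeholders.
test("login page shows the sign-in button", async ({ page }) => {
  await page.goto("http://localhost:3000/login");
  await expect(page.getByRole("button", { name: "Sign in" })).toBeVisible();
});
```

Eight lines, but each of them encodes a decision (selector strategy, locator text, assertion) that you make by hand, and re-make when the first run fails.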

Why this ranking produces a different winner

A feature-count ranking puts Playwright on top. It usually does, because Playwright has the broadest execution capabilities in the category. A latency-to-first-pass ranking puts the tool with the smallest authoring surface on top. That is a different tool for a different year of a suite.

Authoring input: url

The plan tool schema is a single required string. No spec scaffolding, no fixture config, no selector work. One field, one call.

Output: Markdown on disk

Five to eight Case blocks written to /tmp/assrt/scenario.md. Commit it, grep it, diff it, PR-review it like any other file.

Runtime: real Playwright

Every action under the hood is a Playwright MCP tool call: navigate, click, type_text, snapshot, press_key, wait, scroll, evaluate.

Repair: Corrected Case

When a run fails, assrt_diagnose returns a four-section report whose last section is a literal Case block you can paste over the broken one.
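Because the diagnose report is plain text with a literal Case block at the end, pulling the correction out is ordinary string work. A sketch, assuming the report labels its last section "Corrected Case"; the real heading syntax may differ:

```typescript
// Hedged sketch: extract the final "Corrected Case" section from a
// four-section diagnose report (Root Cause, Analysis, Recommended Fix,
// Corrected Case). The exact heading format is an assumption.
function extractCorrectedCase(report: string): string | null {
  const marker = "Corrected Case";
  const idx = report.lastIndexOf(marker);
  if (idx === -1) return null;
  // Everything after the marker is the replacement Case block.
  const block = report.slice(idx + marker.length).trim();
  return block.length > 0 ? block : null;
}
```

The design point is that the repair loop never leaves plain text: the correction is a paste, not an API call.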

Anchor fact, in numbers

Assrt's authoring surface, by the count:

• 1 required parameter in the plan tool
• 0 spec files you write before the first run
• 3 screenshot rounds the pipeline captures
• 5 minimum Cases returned by a typical plan

All four numbers are verifiable by opening assrt-mcp/src/mcp/server.ts at the plan tool registration (line 768) and reading the implementation that follows (through line 862). No sales call, no private repo access, no trial signup required.

When the classic path still wins

Fair counter. A mature suite with a QA engineer who already knows Playwright end to end is not bottlenecked on install-to-first-test latency anymore. The team passed that threshold at month one and now cares about execution robustness, debugging ergonomics, and parallelization. For that team, rank on features as usual. For a brand new team, or a founder testing their own app before a ship, or a coding agent that needs to verify a change it just made, the latency axis is the one that decides whether any tests get written at all. Pick the ranking that matches the stage you are actually at.

Want to see how short the install-to-first-test gap is on your app?

Bring a URL. Fifteen minutes. We run the plan tool against it live and compare the output to whatever your current e2e workflow starts with.

Specific questions about authoring surface, pricing, and repeatability

Why rank e2e testing tools by install-to-first-passing-test latency instead of feature count?

Features are fungible across the top ten entries on any guide. Playwright, Cypress, WebdriverIO, TestCafe, Nightwatch, and Puppeteer all ship parallel browsers, auto-waiting, screenshots, and CI integration; the list differences are stylistic. The number that is not fungible, and almost never written down, is the wall-clock time between running the install command and watching the first green check land in your terminal. That gap is where brand-new suites quietly die. A team that spends four hours wiring up selectors and fixtures before the first pass usually does not come back on Monday. A team that sees a green check in the first fifteen minutes usually does. Ranking on this axis surfaces a tool category the feature matrices miss entirely: agent-driven runners that take a URL and produce executable cases. Assrt is in that category, and its plan tool accepts exactly one parameter.

What does 'assrt_plan takes one parameter' literally mean in the source?

The MCP tool is registered in assrt-mcp/src/mcp/server.ts at lines 768 through 774. The Zod schema has one required field, url (a string), and one optional model override. That is it. When a coding agent calls it, the server launches a Playwright browser, navigates to the URL, takes three screenshots at scroll offsets 0, 800, and 1600 pixels (lines 794 through 805), concatenates the accessibility trees, and sends the whole bundle to claude-haiku-4-5-20251001 under a prompt that asks for five to eight executable Case blocks in Markdown. The return value is a plan that assrt_test can run in the same session. No spec file. No selector authoring. No fixture setup. One URL in, a runnable plan out.

Which tools are actually on the shortlist for this category in 2026?

The candidate set that appears in almost every guide: Playwright, Cypress, Selenium, WebdriverIO, TestCafe, Nightwatch, Puppeteer, Robot Framework on the open-source side; Mabl, testRigor, Applitools, Katalon, Tricentis Tosca, ACCELQ, BrowserStack, LambdaTest on the commercial side. The lists argue about ordering and about a handful of newer AI-first platforms. They do not argue about whether the category includes MCP-driven agent runners, because most guides were written before that category existed. Assrt is one of the first entries in it, and the authoring input is genuinely smaller than the closest neighbour.

How long does it actually take from npx assrt-mcp to a first green check?

On a local dev server the typical path is: npx -y assrt-mcp (under a minute to pull and run), add the server to your Claude Code or Cursor config (a few seconds of JSON), then ask the assistant to call assrt_plan on a URL. The plan call itself takes roughly 20 to 40 seconds because it launches a browser and runs three screenshot rounds before asking Claude Haiku to produce the plan. Running the first Case through assrt_test is another 10 to 30 seconds depending on page complexity. The practical floor is around two minutes, and the practical ceiling for a first-time setup is around ten. Raw Playwright for the same app is closer to 45 minutes of spec-file and selector work before the first assertion fires. Selenium is the expensive extreme, often hitting two hours before a first green check on a fresh machine.

Is Assrt actually an e2e testing tool or is it something else pretending to be one?

It is a tool in the same category but a different architectural tier. Under the hood it calls @playwright/mcp to drive a real Chromium over stdio, so every action (navigate, click, type_text, scroll, press_key, snapshot, wait, evaluate) is real Playwright. The execution engine is the same one Playwright users trust. What is different is the authoring layer: instead of writing page.click and selectors, you write a Markdown Case block in English, and the agent decides which tool call to make from the accessibility tree at runtime. The execution is real Playwright; the authoring is natural language; the storage is plain text. You get the reliability of the execution layer without paying the spec-file tax.

Does skipping the spec file mean I lose repeatability?

No. Every plan is saved to /tmp/assrt/scenario.md with a UUID. The agent writes its plan to disk before it runs. Re-running a saved plan is assrt_test with the scenarioId argument, and it executes the same Markdown text. The Markdown is also commit-ready for your repo. The selectors are re-resolved on every run against the live accessibility tree, which is what makes the approach tolerant of minor UI changes, but the plan itself is deterministic text that you can diff, grep, and PR-review the same way you review any other artifact.
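What that commit-ready file can look like: a purely illustrative Case block with hypothetical steps. The article does not reproduce the real grammar, so treat the heading and step syntax here as guesses, not the format assrt_test actually parses:

```markdown
Case: visitor can reach pricing
- Navigate to the homepage
- Click the "Pricing" link in the top navigation
- Expect the page to show a plan comparison table
```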

What about when the first green check comes fast but the plan is wrong?

Two answers. First, the plan is visible Markdown, not a black-box recording, so you can read the five to eight Case blocks the model produced and delete or edit anything that looks off. Second, if you run a plan and a case fails, call assrt_diagnose with the URL, the failing scenario, and the failure message. The server returns a four-section response: Root Cause, Analysis, Recommended Fix, and a literal Corrected Case block in the same grammar. You paste the correction over the failing case and re-run. The entire loop stays in plain text and stays on disk.

Why is the install-to-first-test gap where suites die?

Because the longer the gap, the more likely the person running it decides the tool is the problem rather than their test strategy, and abandons the attempt before producing any evidence. A team that writes five specs in the first sitting and sees them pass develops a habit; a team that spends a full afternoon configuring a selectors file and a CI workflow and never sees a green check develops a grudge. The gap is a coordination problem disguised as a technical problem, and the tools that collapse it the fastest have an outsized effect on whether a suite ever actually gets to a thousand scenarios.

Is this article saying Playwright and Cypress are not the best e2e testing tools?

No. They are excellent execution engines and, in the hands of a seasoned QA team, they are still the right default for a mature suite. What this article argues is that the question 'what is the best tool' changes depending on the year-one cost. For a brand new team with no test suite yet, the tool that shortens install-to-first-pass wins; that tool is the one with a smaller authoring surface. For a team already five hundred scenarios in on Playwright, the tool that wins is Playwright. The two coexist. Many teams run both: Assrt for smoke cases a founder or PM writes, Playwright for the deeper suite a QA engineer maintains.

What does it cost?

The MCP server, the agent, and the CLI are MIT-licensed and free. The only usage-based cost is the LLM API calls the agent makes to plan and interpret your scenarios; by default it uses claude-haiku-4-5-20251001 with your own Anthropic API key, and a typical plan-then-run loop costs cents, not dollars. There is an optional hosted dashboard at app.assrt.ai for sharing runs, which is free for individual use. You do not pay a seat fee, a parallelization fee, or a workspace fee to keep your tests private; they stay as Markdown files in your repo.
