Framework taxonomy, re-tiered for 2026

E2E testing frameworks split into four tiers, not two.

Every "best e2e testing frameworks in 2026" list I read lumps Playwright, Cypress, and Selenium into one bucket and calls Mabl or Autonoma the "AI alternative." That collapses a real axis: the one between an AI tool that compiles your scenario into its own cloud format and one that interprets plain English at runtime through a standard Playwright MCP tool vocabulary. This page sorts the tiers out.

Matthew Diakonov
11 min read
4.8 from Assrt MCP users
  • 8-tool Playwright MCP vocabulary exposed to the agent
  • Scenarios stored as plain-text .md on your disk
  • No compile step, no proprietary DSL, no vendor lock-in

The tier every top-10 list misses

Your test is English in a .md file. An agent interprets it against 8 Playwright MCP tools at runtime.

No compile step. No YAML. No proprietary selector graph. When you cancel the subscription, the scenarios are still on your disk, still grep-able, still runnable by any Playwright MCP agent.

The axis nobody is drawing: compile vs interpret

Every list you find for this keyword splits the field into scripted and AI, and calls it a day. That misses what happens the second you cancel. A scripted framework leaves you with .spec.ts files you wrote by hand. An AI-compiled framework leaves you with nothing you can read. They are not the same outcome.

The real axis is whether your scenario is compiled to a proprietary format and executed against it, or interpreted fresh on every run against a public tool vocabulary. The first case hides your test from you. The second case keeps it in a file you wrote.

The four tiers, ranked by distance from your code

Tier 1 — Scripted

Playwright, Cypress, Selenium, Appium. You import a library, write page.click, maintain selectors, run in CI. Maximum control, maximum maintenance. Best when a human engineer will own the suite long-term.

Tier 2 — Low-code record

BrowserStack Low-Code, Ghost Inspector, older Testim. You click through a UI and the tool captures selectors. Fast to author, brittle to maintain, selectors bound to the tool vendor.

Tier 3 — AI-compiled

Mabl, new Testim, Functionize, Autonoma. You describe a flow in natural language and the tool compiles it down to its internal format: a YAML DSL, a locator graph, or a stored scenario ID. Execution is theirs. Data is theirs. Lock-in is the business model.

Tier 4 — Agent-interpreted

Assrt. Your test is a plain-text #Case in a .md file. A coding agent reads it on each run and picks from 8 Playwright MCP tools in real time. No compile step. No proprietary format. If you cancel tomorrow, you keep the .md files and the videos.

Which tier fits a vibe-coded app?

Tier 4. You are shipping features faster than any human can babysit a selector, and your AI coding agent already understands the product. Pointing it at an English plan it can execute via Playwright MCP is the shortest path.

Which tier fits a 10-year-old regression suite?

Tier 1, still. A thousand Playwright specs that someone already wrote and tuned is a fine asset. Tier 4 is complementary, not a replacement. Run both against the same app and fail the build if either fails.

The anchor fact: Assrt exposes exactly 8 Playwright MCP tools

This is the uncopyable part. The agent Assrt drives has a fixed action surface of 8 tools, defined in a single file you can read in the repo in under a minute. Every click, every key press, every assertion in every run goes through one of these. Nothing else is reachable.

assrt-mcp/src/core/agent.ts:14-100

Compare that to a tier-3 AI tool, where your scenario compiles to a proprietary locator graph you cannot inspect, and the list of primitives is internal. Compare it to a tier-1 framework, where the primitives are public (Playwright has 40+) but the caller has to write them by hand. Tier 4 sits in between: the primitives are public and small and called for you.

The full tool vocabulary, no hidden ninth

navigate · snapshot · click · type_text · select_option · scroll · press_key · wait
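The vocabulary is small enough to write down as a type. Here is a hedged sketch in TypeScript: the tool names and parameters follow the signatures listed in the FAQ on this page, while the canonical definitions live in assrt-mcp/src/core/agent.ts.

```typescript
// Sketch only: the 8-tool action surface as a discriminated union.
// Names and parameters follow the article's FAQ, not the actual source.
type ToolCall =
  | { tool: "navigate"; url: string }
  | { tool: "snapshot" } // accessibility tree with [ref=eN] markers
  | { tool: "click"; element: string; ref?: string }
  | { tool: "type_text"; element: string; text: string; ref?: string }
  | { tool: "select_option"; element: string; values: string[] }
  | { tool: "scroll"; x?: number; y: number }
  | { tool: "press_key"; key: string }
  | { tool: "wait"; text?: string; ms?: number };

const TOOL_NAMES: ToolCall["tool"][] = [
  "navigate", "snapshot", "click", "type_text",
  "select_option", "scroll", "press_key", "wait",
];

console.log(TOOL_NAMES.length); // 8 — no hidden ninth
```

Everything the agent does in a run is a sequence of values of this one union type, which is what makes the transcript auditable.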

Your test, on disk, as a file you can open

Here is a two-case scenario from the path the runner actually reads. The file lives at /tmp/assrt/scenario.md (documented at assrt-mcp/src/core/scenario-files.ts:5-20). It is parsed by the regex #?\s*(?:Scenario|Test|Case). No YAML frontmatter. No JSON schema. No DSL keywords. Plain English, line by line.

/tmp/assrt/scenario.md

Two things about this file you cannot get from a tier-3 tool. First, you can edit it while a run is in progress; an fs.watch picks up the change and the next run uses it. Second, you can commit it to git and diff it. A new line in the plan is a new line in the PR. Reviewers read the English, not a selector graph.
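As a concrete illustration of the format, a two-case file might read as follows. The flows here are hypothetical, not copied from any repo; only the header convention comes from the article.

```md
# Case: sign up with a fresh email
Open the app at http://localhost:3000.
Click "Sign up".
Type a new email address and a password, then submit.
Assert the dashboard greets the new user.

# Case: log out
Click the avatar in the top-right corner.
Choose "Sign out".
Assert the login page is visible again.
```

Each header line matches the `#?\s*(?:Scenario|Test|Case)` regex; everything until the next header is the plan for that case.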

How the plan becomes browser actions

On every run, the same pipeline. You kick it off with one npx command, the agent reads the .md file, and the 8 tools do the rest. No build cache, no compiled artifact, no vendor-side scenario ID.

From English to Playwright, in real time

/tmp/assrt/scenario.md → agent → accessibility tree → 8 Playwright MCP tools → real Chromium → WebM video · results JSON · browser profile

Assrt vs a typical tier-3 AI testing platform

This is the comparison the top-10 lists conflate. Same marketing category ("AI-native E2E") but architecturally different. The table below is the honest breakdown, from the source.

| Feature | Typical tier-3 AI platform | Assrt (tier 4) |
| --- | --- | --- |
| Scenario storage | Proprietary YAML or graph in vendor cloud | Plain-text .md file on your disk |
| Action surface | Internal, unversioned, closed | 8 Playwright MCP tools, public, in agent.ts |
| Execution engine | Vendor-hosted worker fleet | Real Playwright on your machine or CI runner |
| Cost model | Per-seat + per-run, $7.5k+/mo at scale | Open-source CLI, $0 to run self-hosted |
| Data residency | Scenarios + videos live in vendor cloud | Everything on /tmp/assrt and ~/.assrt |
| What you keep when you cancel | Nothing readable | All .md files, all JSON results, all WebM videos |
| Debug loop on a failed test | Click into vendor UI, parse their report | Watch the WebM, read the tool-call transcript |
| Lock-in | High: scenarios, selectors, history | None: the plan is English, the tools are Playwright |

Tests live as plain-text #Case blocks in /tmp/assrt/scenario.md, parsed by the regex #?\s*(?:Scenario|Test|Case). No YAML, no DSL, no proprietary locator graph. You can grep them, diff them, commit them.
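A minimal sketch of what splitting a file against that regex could look like. This is illustrative TypeScript, not the assrt-mcp source; the only thing taken from the article is the header pattern.

```typescript
// Split a plain-text scenario file into cases on header lines that
// match the article's regex. Illustrative only.
const HEADER = /^#?\s*(?:Scenario|Test|Case).*$/gm;

function splitCases(md: string): string[] {
  const headers = [...md.matchAll(HEADER)];
  return headers.map((m, i) => {
    const start = m.index ?? 0;
    const end = i + 1 < headers.length ? headers[i + 1].index ?? md.length : md.length;
    return md.slice(start, end).trim();
  });
}

const sample = [
  "# Case: login",
  "Open the login page and sign in.",
  "",
  "# Case: logout",
  'Click the avatar and choose "Sign out".',
].join("\n");

console.log(splitCases(sample).length); // 2
```

Because the boundaries are plain lines of text, a diff of the file is a diff of the test plan, which is what makes the PR-review story work.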

assrt-mcp/src/core/scenario-files.ts

What a run actually prints

Each line below corresponds to one of the 8 tools firing. The transcript is deliberately readable so reviewers can understand a failed run without digging into a stack trace.

npx assrt-mcp run --url http://localhost:3000

The concrete numbers, from the source

Every value below is in the repo. You can verify them by opening assrt-mcp/src/core/agent.ts and reading around the line numbers called out in the FAQ.

  • Playwright MCP tools exposed: 8
  • Email wait default
  • Hard step timeout
  • Wait-for-stable default
  • Desktop viewport width and height
  • Mobile viewport width and height

How to pick the right tier for the job

This is the question the top-10 lists do not help with. They give you ten tools and no decision rule. The decision rule is the tier, not the tool. Here is how I pick.

Decision tree, five steps

1. Is the suite owned by a human QA engineer for the next 5 years?

If yes, tier 1. Playwright or Cypress. Nothing below is worth the velocity trade-off once you have a human writing selectors daily.

2. Is the suite owned by someone who reads code but doesn't write tests?

Tier 4. An agent-interpreted framework lets a PM, designer, or support engineer describe a flow in English and have it run. You do not need to hire a QA engineer to start.

3. Is your product being vibe-coded by an AI coding agent?

Tier 4. The AI that wrote the feature can write the #Case that tests it. Tool-for-tool: if your dev agent uses Playwright MCP, your test runner should too.

4. Is your company's policy no-vendor-cloud-for-app-data?

Tier 1 or tier 4. Tier 3 stores scenarios in the vendor cloud by default. Tier 1 keeps them in your repo. Tier 4 keeps them in /tmp/assrt on your disk. Tier 2 depends on the vendor.

5. Do you already have 500+ Playwright specs that work?

Keep them. Run them. Layer tier 4 on top for the flows no one has scripted yet. Tier 4 is additive, not a migration.

What you keep the day you leave

The lock-in test. Run it mentally against any E2E testing framework you are evaluating: if you cancel the contract tomorrow, what is left on your disk, in a format you can read, that you could run again without the vendor?

The full artifact list

Everything Assrt writes to your disk, per run

  • Every scenario as a plain-text .md file you can grep, diff, and commit
  • A structured JSON result at /tmp/assrt/results/<runId>.json (schema in types.ts)
  • A WebM video of every run with the injected cursor overlay
  • PNG screenshots per step saved to disk
  • The persisted browser profile at ~/.assrt/browser-profile with saved logins
  • Zero proprietary selectors, zero YAML DSL, zero vendor-specific locator graphs
  • The ability to port the same .md plan to any other Playwright MCP agent and run it unchanged

Try tier 4 against your own app

One npx command. Scenario is plain English in a .md file. Runs real Playwright under the hood. Video auto-opens in a playback tab when the run finishes. No account required, no cloud to connect.

Install npx assrt-mcp

E2E testing frameworks: specific answers

Is Assrt an E2E testing framework in the same sense as Playwright, Cypress, or Selenium?

It sits one layer above them, not beside them. Playwright, Cypress, and Selenium are frameworks you program against: you import a library, call page.click, write selectors, and compile a test file. Assrt ships as an MCP server (npx assrt-mcp) that drives a real Playwright instance on your behalf. You do not call page.click. You write a #Case in English, and an agent picks navigate, snapshot, click, type_text, select_option, scroll, press_key, or wait at runtime based on what it reads on the page. So you get Playwright's execution reliability without the scripting tax. The source lives at assrt-mcp/src/core/agent.ts lines 14-100.

What exactly are the 8 Playwright MCP tools the Assrt agent uses?

navigate(url), snapshot() (accessibility tree with [ref=eN] markers), click(element, ref?), type_text(element, text, ref?), select_option(element, values[]), scroll(x?, y), press_key(key), and wait(text?, ms?). Every action the agent can take on the browser goes through one of these eight. There is no ninth hidden tool, and there are no selectors you write by hand. The list is not a wrapper around a proprietary DSL, it is the canonical Playwright MCP vocabulary, which means anyone reading assrt-mcp/src/core/agent.ts can audit the full action surface in under five minutes.

What's the difference between a tool like Mabl or Testim and an agent-interpreted framework like Assrt?

Mabl, Testim, Functionize, and similar AI-native platforms record your browser session and compile it into their proprietary format: a YAML DSL, a locator graph, a stored scenario ID in their cloud. The execution engine is theirs. The data is theirs. If you cancel, the tests are useless. Assrt is agent-interpreted: the scenario is a plain-text .md file on your disk with #Case headers, interpreted by a coding agent at runtime, driving real Playwright MCP tools. There is no compile step, no YAML, no proprietary selector graph. If you cancel Assrt tomorrow, you keep the .md files. You can grep them. You can git-blame them. You can port them to Claude or Cursor to run.

Where are test scenarios stored on disk, and what is the file format?

Scenarios live at /tmp/assrt/scenario.md, with metadata at /tmp/assrt/scenario.json, and results at /tmp/assrt/results/latest.json plus one file per historical run at /tmp/assrt/results/<runId>.json. The file layout is documented at assrt-mcp/src/core/scenario-files.ts lines 5-20. The .md format is prose: a header line matching the regex `#?\s*(?:Scenario|Test|Case)` followed by plain-English imperative sentences. No YAML frontmatter, no JSON schema, no DSL. You can edit scenario.md in any text editor while a test is running, save, and the next run picks up your changes automatically via an fs.watch on the file.

What does the 4-tier taxonomy look like, practically?

Tier 1 is scripted frameworks: Playwright, Cypress, Selenium, Appium. You write code. Tier 2 is low-code record-and-replay: BrowserStack Low-Code, Ghost Inspector, older Testim. You click through a UI and the tool captures selectors. Tier 3 is AI-compiled: Mabl, Testim (new), Functionize, Autonoma. You describe a flow and the tool compiles it to its internal format. Tier 4 is agent-interpreted: Assrt. You write English, and an agent picks Playwright MCP tools at runtime on each run. The difference between tiers 3 and 4 is whether your scenario is compiled and stored, or interpreted and ephemeral. Tier 4 is what every 'best E2E testing frameworks in 2026' list misses.

If the agent picks tools at runtime, isn't that flaky? What happens when it picks the wrong element?

It can pick wrong, and when it does you see it. Every run produces a WebM video with a 20px red cursor overlay, a transcript of every tool call the agent made, and a screenshot at the failure point. If the agent clicks the wrong thing, the video shows the cursor landing in the wrong place and you rephrase the plan sentence. This is usually faster than debugging a selector timeout in a compiled framework, because the feedback is visual. The snapshot() tool also returns the accessibility tree with stable [ref=eN] identifiers, so the agent has structured anchors to work against, not just fuzzy CSS guesses.

Can I run Assrt in CI, or is it only for local dev?

Both. The same CLI works locally and in CI. In CI you pass --headed=false, --no-auto-open, and --json so the runner does not try to launch a video player on the CI runner. The WebM video still writes to /tmp/assrt/videos and can be uploaded as a GitHub Actions artifact or a GitLab CI artifact. The JSON report at /tmp/assrt/results/latest.json has the full pass/fail structure documented at assrt-mcp/src/core/types.ts lines 28-35, so a CI script can parse it and fail the build. There is no cloud dashboard you have to log into, no concurrency quota, and no per-test billing.
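The wiring that answer describes can be sketched as a CI fragment. The flags and paths below are the ones documented in this answer; verify flag names against the current CLI help before relying on them. `actions/upload-artifact` is the standard GitHub Actions step.

```yaml
# Illustrative GitHub Actions fragment; flags and paths as described above.
- name: Run Assrt scenarios
  run: npx assrt-mcp run --url http://localhost:3000 --headed=false --no-auto-open --json

- name: Keep the video and JSON report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: assrt-run
    path: |
      /tmp/assrt/videos/
      /tmp/assrt/results/latest.json
```

The `if: always()` matters: the WebM of a failed run is the artifact you most want to keep.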

How does Assrt compare to running Playwright directly with a coding agent (Claude, Cursor) plus the Playwright MCP extension?

Assrt is that, plus opinionated defaults for the parts everyone redoes by hand. The base layer is real Playwright MCP. On top of it, Assrt adds: persistent browser profile at ~/.assrt/browser-profile (so saved logins survive between runs), disposable email with create_temp_email and wait_for_verification_code for signup flows, the cursor and ripple overlay for video debugging, a /tmp/assrt/scenario.md file watcher that auto-syncs edits, a custom video playback tab that opens after each run with 1x/2x/5x/10x speed keys, and a structured pass/fail JSON report. You could build all of that yourself on top of raw Playwright MCP. You would end up with Assrt.

What is the default model, and can I swap it out?

The default is Claude Haiku 4.5 (claude-haiku-4-5-20251001), set at assrt-mcp/src/core/agent.ts line 9. You can override it with the --model flag. Gemini 3.1 Pro Preview is also supported as an alternate provider via the Provider type ("anthropic" | "gemini") defined in the same file. Both providers drive the same 8 Playwright MCP tools, so switching providers does not change the scenario language, only the interpreter.

What is the full list of test artifacts I get back from one run, and where do they live?

From a single run you get: (1) /tmp/assrt/results/<runId>.json, the structured pass/fail report with scenarios[], totalDuration, passedCount, failedCount (schema at assrt-mcp/src/core/types.ts lines 28-35); (2) /tmp/assrt/results/latest.json, a copy of the same report for convenience; (3) /tmp/assrt/videos/run-<timestamp>.webm, the full WebM Matroska video with the injected cursor overlay; (4) a sequence of PNG screenshots per step, written by the server at assrt-mcp/src/mcp/server.ts lines 468-471; (5) ~/.assrt/browser-profile, the persisted cookies and localStorage so the next run starts logged in. All of these are on your disk. None are in anyone else's cloud unless you explicitly opt into Firestore sync.
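Because the report in (1) and (2) is plain JSON, a CI gate stays tiny. Here is a hedged sketch of the shape using only the field names called out above; the per-scenario fields are assumptions for illustration, and the authoritative schema is in types.ts.

```typescript
// Top-level field names from the article (scenarios[], totalDuration,
// passedCount, failedCount); per-scenario fields are assumed.
interface RunResult {
  scenarios: { name: string; passed: boolean }[]; // assumed per-scenario shape
  totalDuration: number; // assumed to be milliseconds
  passedCount: number;
  failedCount: number;
}

// What a CI gate on /tmp/assrt/results/latest.json boils down to.
function buildPassed(r: RunResult): boolean {
  return r.failedCount === 0;
}

const example: RunResult = {
  scenarios: [{ name: "signup", passed: true }],
  totalDuration: 4200,
  passedCount: 1,
  failedCount: 0,
};

console.log(buildPassed(example)); // true
```

A script that reads the file, parses it, and calls something like `buildPassed` is the entire vendor-free debug-and-gate loop.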

Tier 4, on your disk

The scenario is English. The tools are Playwright. The file is yours.

8 MCP tools, 0 proprietary DSLs, 4 tiers for a field the lists treat as two.

Try Assrt free
