Framework taxonomy, re-tiered for 2026

E2E testing frameworks split into four tiers, not two.

Every "best e2e testing frameworks in 2026" list I read lumps Playwright, Cypress, and Selenium into one bucket and calls Mabl or Autonoma the "AI alternative." That collapses a real axis: the one between an AI tool that compiles your scenario into its own cloud format and one that interprets plain English at runtime through a standard Playwright MCP tool vocabulary. This page sorts the tiers out.

Matthew Diakonov
11 min read
4.8 from Assrt MCP users
  • 8-tool Playwright MCP vocabulary exposed to the agent
  • Scenarios stored as plain-text .md on your disk
  • No compile step, no proprietary DSL, no vendor lock-in

The tier every top-10 list misses

Your test is English in a .md file. An agent interprets it against 8 Playwright MCP tools at runtime.

No compile step. No YAML. No proprietary selector graph. When you cancel the subscription, the scenarios are still on your disk, still grep-able, still runnable by any Playwright MCP agent.

The axis nobody is drawing: compile vs interpret

Every list you find for this keyword splits the field into scripted and AI, and calls it a day. That misses what happens the second you cancel. A scripted framework leaves you with .spec.ts files you wrote by hand. An AI-compiled framework leaves you with nothing you can read. They are not the same outcome.

The real axis is whether your scenario is compiled to a proprietary format and executed against it, or interpreted fresh on every run against a public tool vocabulary. The first case hides your test from you. The second case keeps it in a file you wrote.

The four tiers, ranked by distance from your code

Tier 1 — Scripted

Playwright, Cypress, Selenium, Appium. You import a library, write page.click, maintain selectors, run in CI. Maximum control, maximum maintenance. Best when a human engineer will own the suite long-term.

Tier 2 — Low-code record

BrowserStack Low-Code, Ghost Inspector, older Testim. You click through a UI and the tool captures selectors. Fast to author, brittle to maintain, selectors bound to the tool vendor.

Tier 3 — AI-compiled

Mabl, new Testim, Functionize, Autonoma. You describe a flow in natural language and the tool compiles it down to its internal format: a YAML DSL, a locator graph, or a stored scenario ID. Execution is theirs. Data is theirs. Lock-in is the business model.

Tier 4 — Agent-interpreted

Assrt. Your test is a plain-text #Case in a .md file. A coding agent reads it on each run and picks from 8 Playwright MCP tools in real time. No compile step. No proprietary format. If you cancel tomorrow, you keep the .md files and the videos.

Which tier fits a vibe-coded app?

Tier 4. You are shipping features faster than any human can babysit a selector, and your AI coding agent already understands the product. Pointing it at an English plan it can execute via Playwright MCP is the shortest path.

Which tier fits a 10-year-old regression suite?

Tier 1, still. A thousand Playwright specs that someone already wrote and tuned is a fine asset. Tier 4 is complementary, not a replacement. Run both against the same app and fail the build if either fails.

The anchor fact: Assrt exposes exactly 8 Playwright MCP tools

This is the uncopyable part. The agent Assrt drives has a fixed action surface of 8 tools, defined in a single file you can read in the repo in under a minute. Every click, every key press, every assertion in every run goes through one of these. Nothing else is reachable.

assrt-mcp/src/core/agent.ts:14-100

Compare that to a tier-3 AI tool, where your scenario compiles to a proprietary locator graph you cannot inspect, and the list of primitives is internal. Compare it to a tier-1 framework, where the primitives are public (Playwright has 40+) but the caller has to write them by hand. Tier 4 sits in between: the primitives are public and small and called for you.

The full tool vocabulary, no hidden ninth

navigate · snapshot · click · type_text · select_option · scroll · press_key · wait
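The vocabulary is small enough to write down as a type. Here is a hedged sketch in TypeScript: the tool names and parameters follow the signatures listed in the FAQ on this page, while the canonical definitions live in assrt-mcp/src/core/agent.ts.

```typescript
// Sketch only: the 8-tool action surface as a discriminated union.
// Names and parameters follow the article's FAQ, not the actual source.
type ToolCall =
  | { tool: "navigate"; url: string }
  | { tool: "snapshot" } // accessibility tree with [ref=eN] markers
  | { tool: "click"; element: string; ref?: string }
  | { tool: "type_text"; element: string; text: string; ref?: string }
  | { tool: "select_option"; element: string; values: string[] }
  | { tool: "scroll"; x?: number; y: number }
  | { tool: "press_key"; key: string }
  | { tool: "wait"; text?: string; ms?: number };

const TOOL_NAMES: ToolCall["tool"][] = [
  "navigate", "snapshot", "click", "type_text",
  "select_option", "scroll", "press_key", "wait",
];

console.log(TOOL_NAMES.length); // 8 — no hidden ninth
```

Everything the agent does in a run is a sequence of values of this one union type, which is what makes the transcript auditable.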

Your test, on disk, as a file you can open

Here is a two-case scenario from the path the runner actually reads. The file lives at /tmp/assrt/scenario.md (documented at assrt-mcp/src/core/scenario-files.ts:5-20). It is parsed by the regex #?\s*(?:Scenario|Test|Case). No YAML frontmatter. No JSON schema. No DSL keywords. Plain English, line by line.

/tmp/assrt/scenario.md

Two things about this file you cannot get from a tier-3 tool. First, you can edit it while a run is in progress; an fs.watch picks up the change and the next run uses it. Second, you can commit it to git and diff it. A new line in the plan is a new line in the PR. Reviewers read the English, not a selector graph.
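As a concrete illustration of the format, a two-case file might read as follows. The flows here are hypothetical, not copied from any repo; only the header convention comes from the article.

```md
# Case: sign up with a fresh email
Open the app at http://localhost:3000.
Click "Sign up".
Type a new email address and a password, then submit.
Assert the dashboard greets the new user.

# Case: log out
Click the avatar in the top-right corner.
Choose "Sign out".
Assert the login page is visible again.
```

Each header line matches the `#?\s*(?:Scenario|Test|Case)` regex; everything until the next header is the plan for that case.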

How the plan becomes browser actions

On every run, the same pipeline. You kick it off with one npx command, the agent reads the .md file, and the 8 tools do the rest. No build cache, no compiled artifact, no vendor-side scenario ID.

From English to Playwright, in real time

/tmp/assrt/scenario.md → agent → accessibility tree → 8 Playwright MCP tools → real Chromium → WebM video · results JSON · browser profile

Assrt vs a typical tier-3 AI testing platform

This is the comparison the top-10 lists conflate. Same marketing category ("AI-native E2E") but architecturally different. The table below is the honest breakdown, from the source.

| Feature | Typical tier-3 AI platform | Assrt (tier 4) |
| --- | --- | --- |
| Scenario storage | Proprietary YAML or graph in vendor cloud | Plain-text .md file on your disk |
| Action surface | Internal, unversioned, closed | 8 Playwright MCP tools, public, in agent.ts |
| Execution engine | Vendor-hosted worker fleet | Real Playwright on your machine or CI runner |
| Cost model | Per-seat + per-run, $7.5k+/mo at scale | Open-source CLI, $0 to run self-hosted |
| Data residency | Scenarios + videos live in vendor cloud | Everything on /tmp/assrt and ~/.assrt |
| What you keep when you cancel | Nothing readable | All .md files, all JSON results, all WebM videos |
| Debug loop on a failed test | Click into vendor UI, parse their report | Watch the WebM, read the tool-call transcript |
| Lock-in | High: scenarios, selectors, history | None: the plan is English, the tools are Playwright |

Tests live as plain-text #Case blocks in /tmp/assrt/scenario.md, parsed by the regex #?\s*(?:Scenario|Test|Case). No YAML, no DSL, no proprietary locator graph. You can grep them, diff them, commit them.
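A minimal sketch of what splitting a file against that regex could look like. This is illustrative TypeScript, not the assrt-mcp source; the only thing taken from the article is the header pattern.

```typescript
// Split a plain-text scenario file into cases on header lines that
// match the article's regex. Illustrative only.
const HEADER = /^#?\s*(?:Scenario|Test|Case).*$/gm;

function splitCases(md: string): string[] {
  const headers = [...md.matchAll(HEADER)];
  return headers.map((m, i) => {
    const start = m.index ?? 0;
    const end = i + 1 < headers.length ? headers[i + 1].index ?? md.length : md.length;
    return md.slice(start, end).trim();
  });
}

const sample = [
  "# Case: login",
  "Open the login page and sign in.",
  "",
  "# Case: logout",
  'Click the avatar and choose "Sign out".',
].join("\n");

console.log(splitCases(sample).length); // 2
```

Because the boundaries are plain lines of text, a diff of the file is a diff of the test plan, which is what makes the PR-review story work.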

assrt-mcp/src/core/scenario-files.ts

What a run actually prints

Each line below corresponds to one of the 8 tools firing. The transcript is deliberately readable so reviewers can understand a failed run without digging into a stack trace.

npx assrt-mcp run --url http://localhost:3000

The concrete numbers, from the source

Every value below is in the repo. You can verify them by opening assrt-mcp/src/core/agent.ts and reading around the line numbers called out in the FAQ.

  • Playwright MCP tools exposed: 8
  • Email wait default
  • Hard step timeout
  • Wait-for-stable default
  • Desktop viewport width and height
  • Mobile viewport width and height

How to pick the right tier for the job

This is the question the top-10 lists do not help with. They give you ten tools and no decision rule. The decision rule is the tier, not the tool. Here is how I pick.

Decision tree, five steps

1. Is the suite owned by a human QA engineer for the next 5 years?

If yes, tier 1. Playwright or Cypress. Nothing below is worth the velocity trade-off once you have a human writing selectors daily.

2. Is the suite owned by someone who reads code but doesn't write tests?

Tier 4. An agent-interpreted framework lets a PM, designer, or support engineer describe a flow in English and have it run. You do not need to hire a QA engineer to start.

3. Is your product being vibe-coded by an AI coding agent?

Tier 4. The AI that wrote the feature can write the #Case that tests it. Tool-for-tool: if your dev agent uses Playwright MCP, your test runner should too.

4. Is your company's policy no-vendor-cloud-for-app-data?

Tier 1 or tier 4. Tier 3 stores scenarios in the vendor cloud by default. Tier 1 keeps them in your repo. Tier 4 keeps them in /tmp/assrt on your disk. Tier 2 depends on the vendor.

5. Do you already have 500+ Playwright specs that work?

Keep them. Run them. Layer tier 4 on top for the flows no one has scripted yet. Tier 4 is additive, not a migration.

What you keep the day you leave

The lock-in test. Run it mentally against any E2E testing framework you are evaluating: if you cancel the contract tomorrow, what is left on your disk, in a format you can read, that you could run again without the vendor?

The full artifact list

Everything Assrt writes to your disk, per run

  • Every scenario as a plain-text .md file you can grep, diff, and commit
  • A structured JSON result at /tmp/assrt/results/<runId>.json (schema in types.ts)
  • A WebM video of every run with the injected cursor overlay
  • PNG screenshots per step saved to disk
  • The persisted browser profile at ~/.assrt/browser-profile with saved logins
  • Zero proprietary selectors, zero YAML DSL, zero vendor-specific locator graphs
  • The ability to port the same .md plan to any other Playwright MCP agent and run it unchanged

Try tier 4 against your own app

One npx command. Scenario is plain English in a .md file. Runs real Playwright under the hood. Video auto-opens in a playback tab when the run finishes. No account required, no cloud to connect.

Install npx assrt-mcp

E2E testing frameworks: specific answers

Is Assrt an E2E testing framework in the same sense as Playwright, Cypress, or Selenium?

It sits one layer above them, not beside them. Playwright, Cypress, and Selenium are frameworks you program against: you import a library, call page.click, write selectors, and compile a test file. Assrt ships as an MCP server (npx assrt-mcp) that drives a real Playwright instance on your behalf. You do not call page.click. You write a #Case in English, and an agent picks navigate, snapshot, click, type_text, select_option, scroll, press_key, or wait at runtime based on what it reads on the page. So you get Playwright's execution reliability without the scripting tax. The source lives at assrt-mcp/src/core/agent.ts lines 14-100.

What exactly are the 8 Playwright MCP tools the Assrt agent uses?

navigate(url), snapshot() (accessibility tree with [ref=eN] markers), click(element, ref?), type_text(element, text, ref?), select_option(element, values[]), scroll(x?, y), press_key(key), and wait(text?, ms?). Every action the agent can take on the browser goes through one of these eight. There is no ninth hidden tool, and there are no selectors you write by hand. The list is not a wrapper around a proprietary DSL, it is the canonical Playwright MCP vocabulary, which means anyone reading assrt-mcp/src/core/agent.ts can audit the full action surface in under five minutes.

What's the difference between a tool like Mabl or Testim and an agent-interpreted framework like Assrt?

Mabl, Testim, Functionize, and similar AI-native platforms record your browser session and compile it into their proprietary format: a YAML DSL, a locator graph, a stored scenario ID in their cloud. The execution engine is theirs. The data is theirs. If you cancel, the tests are useless. Assrt is agent-interpreted: the scenario is a plain-text .md file on your disk with #Case headers, interpreted by a coding agent at runtime, driving real Playwright MCP tools. There is no compile step, no YAML, no proprietary selector graph. If you cancel Assrt tomorrow, you keep the .md files. You can grep them. You can git-blame them. You can port them to Claude or Cursor to run.

Where are test scenarios stored on disk, and what is the file format?

Scenarios live at /tmp/assrt/scenario.md, with metadata at /tmp/assrt/scenario.json, and results at /tmp/assrt/results/latest.json plus one file per historical run at /tmp/assrt/results/<runId>.json. The file layout is documented at assrt-mcp/src/core/scenario-files.ts lines 5-20. The .md format is prose: a header line matching the regex `#?\s*(?:Scenario|Test|Case)` followed by plain-English imperative sentences. No YAML frontmatter, no JSON schema, no DSL. You can edit scenario.md in any text editor while a test is running, save, and the next run picks up your changes automatically via an fs.watch on the file.

What does the 4-tier taxonomy look like, practically?

Tier 1 is scripted frameworks: Playwright, Cypress, Selenium, Appium. You write code. Tier 2 is low-code record-and-replay: BrowserStack Low-Code, Ghost Inspector, older Testim. You click through a UI and the tool captures selectors. Tier 3 is AI-compiled: Mabl, Testim (new), Functionize, Autonoma. You describe a flow and the tool compiles it to its internal format. Tier 4 is agent-interpreted: Assrt. You write English, and an agent picks Playwright MCP tools at runtime on each run. The difference between tiers 3 and 4 is whether your scenario is compiled and stored, or interpreted and ephemeral. Tier 4 is what every 'best E2E testing frameworks in 2026' list misses.

If the agent picks tools at runtime, isn't that flaky? What happens when it picks the wrong element?

It can pick wrong, and when it does you see it. Every run produces a WebM video with a 20px red cursor overlay, a transcript of every tool call the agent made, and a screenshot at the failure point. If the agent clicks the wrong thing, the video shows the cursor landing in the wrong place and you rephrase the plan sentence. This is usually faster than debugging a selector timeout in a compiled framework, because the feedback is visual. The snapshot() tool also returns the accessibility tree with stable [ref=eN] identifiers, so the agent has structured anchors to work against, not just fuzzy CSS guesses.

Can I run Assrt in CI, or is it only for local dev?

Both. The same CLI works locally and in CI. In CI you pass --headed=false, --no-auto-open, and --json so the runner does not try to launch a video player on the CI runner. The WebM video still writes to /tmp/assrt/videos and can be uploaded as a GitHub Actions artifact or a GitLab CI artifact. The JSON report at /tmp/assrt/results/latest.json has the full pass/fail structure documented at assrt-mcp/src/core/types.ts lines 28-35, so a CI script can parse it and fail the build. There is no cloud dashboard you have to log into, no concurrency quota, and no per-test billing.
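The wiring that answer describes can be sketched as a CI fragment. The flags and paths below are the ones documented in this answer; verify flag names against the current CLI help before relying on them. `actions/upload-artifact` is the standard GitHub Actions step.

```yaml
# Illustrative GitHub Actions fragment; flags and paths as described above.
- name: Run Assrt scenarios
  run: npx assrt-mcp run --url http://localhost:3000 --headed=false --no-auto-open --json

- name: Keep the video and JSON report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: assrt-run
    path: |
      /tmp/assrt/videos/
      /tmp/assrt/results/latest.json
```

The `if: always()` matters: the WebM of a failed run is the artifact you most want to keep.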

How does Assrt compare to running Playwright directly with a coding agent (Claude, Cursor) plus the Playwright MCP extension?

Assrt is that, plus opinionated defaults for the parts everyone redoes by hand. The base layer is real Playwright MCP. On top of it, Assrt adds: persistent browser profile at ~/.assrt/browser-profile (so saved logins survive between runs), disposable email with create_temp_email and wait_for_verification_code for signup flows, the cursor and ripple overlay for video debugging, a /tmp/assrt/scenario.md file watcher that auto-syncs edits, a custom video playback tab that opens after each run with 1x/2x/5x/10x speed keys, and a structured pass/fail JSON report. You could build all of that yourself on top of raw Playwright MCP. You would end up with Assrt.

What is the default model, and can I swap it out?

The default is Claude Haiku 4.5 (claude-haiku-4-5-20251001), set at assrt-mcp/src/core/agent.ts line 9. You can override it with the --model flag. Gemini 3.1 Pro Preview is also supported as an alternate provider via the Provider type ("anthropic" | "gemini") defined in the same file. Both providers drive the same 8 Playwright MCP tools, so switching providers does not change the scenario language, only the interpreter.

What is the full list of test artifacts I get back from one run, and where do they live?

From a single run you get: (1) /tmp/assrt/results/<runId>.json, the structured pass/fail report with scenarios[], totalDuration, passedCount, failedCount (schema at assrt-mcp/src/core/types.ts lines 28-35); (2) /tmp/assrt/results/latest.json, a copy of the same report for convenience; (3) /tmp/assrt/videos/run-<timestamp>.webm, the full WebM Matroska video with the injected cursor overlay; (4) a sequence of PNG screenshots per step, written by the server at assrt-mcp/src/mcp/server.ts lines 468-471; (5) ~/.assrt/browser-profile, the persisted cookies and localStorage so the next run starts logged in. All of these are on your disk. None are in anyone else's cloud unless you explicitly opt into Firestore sync.
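Because the report in (1) and (2) is plain JSON, a CI gate stays tiny. Here is a hedged sketch of the shape using only the field names called out above; the per-scenario fields are assumptions for illustration, and the authoritative schema is in types.ts.

```typescript
// Top-level field names from the article (scenarios[], totalDuration,
// passedCount, failedCount); per-scenario fields are assumed.
interface RunResult {
  scenarios: { name: string; passed: boolean }[]; // assumed per-scenario shape
  totalDuration: number; // assumed to be milliseconds
  passedCount: number;
  failedCount: number;
}

// What a CI gate on /tmp/assrt/results/latest.json boils down to.
function buildPassed(r: RunResult): boolean {
  return r.failedCount === 0;
}

const example: RunResult = {
  scenarios: [{ name: "signup", passed: true }],
  totalDuration: 4200,
  passedCount: 1,
  failedCount: 0,
};

console.log(buildPassed(example)); // true
```

A script that reads the file, parses it, and calls something like `buildPassed` is the entire vendor-free debug-and-gate loop.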

Tier 4, on your disk

The scenario is English. The tools are Playwright. The file is yours.

8 MCP tools, 0 proprietary DSLs, 4 tiers for a field the lists treat as two.

Try Assrt free
