Playwright vs AI browser automation is a false binary

The debate usually sounds like this. Playwright is deterministic and slow to author. AI browser tools are fast to author and brittle in CI. Pick your poison. That framing misses the stack that has existed for about a year: an LLM agent that drives Playwright itself through Playwright MCP, so every click, type, and snapshot is a real Playwright call, and the plan you write is plain English in a Markdown file you can commit to git.

Assrt is that stack. It wraps @playwright/mcp, a Claude or Gemini agent sits on top, your scenarios live in Markdown, and the agent runtime is 1,087 lines of MIT TypeScript you can read in an afternoon. This page walks the wiring, line by line, so you can decide whether to keep defending the binary or take the third option.

Matthew Diakonov
11 min read
- Wraps @playwright/mcp, spawned over stdio (browser.ts line 155)
- 10 Playwright MCP primitives in the agent's tool palette (agent.ts lines 16-113)
- Scenarios are plain Markdown at /tmp/assrt/scenario.md, not proprietary YAML

> The agent is not trying to be smarter than Playwright. It is trying to spare you from writing selectors. The selector strategy is still Playwright's job; the plan is English because English is what you already have.
>
> — Assrt design note, agent.ts SYSTEM_PROMPT

The three stacks, not two

The Reddit thread version of this debate lists two options. There are three. The third is the one most commercial copy glosses over because it cannibalises a paid SKU.

Stack 1: Hand-written Playwright. You author .spec.ts with getByRole, expect, locators. Fast CI. High authoring and maintenance cost. Deterministic failures. No AI in the loop.

Stack 2: Closed AI browser tool. A vendor cloud runs agents that click through your app. Fast authoring, hidden runtime. Vision-model failures, proprietary YAML plans, $8K to $90K per year.

Stack 3: LLM agent on Playwright MCP. Your plan is Markdown. The agent calls Playwright MCP primitives. Determinism of stack 1, authoring speed of stack 2, price of zero. This is what Assrt is.

Inputs, runtime, and the thing that clicks the button

Three inputs come in (Markdown plan, an app URL, a model key). The runtime launches Playwright MCP as a child process over stdio. The agent consults the plan, calls primitives, and artifacts land on disk. Nothing in this chain is proprietary.

Markdown + Playwright MCP + LLM = real Playwright, English plan

scenario.md + app URL + model key → Assrt agent → Playwright MCP → real Chromium → results/*.json, screenshots/, recording.webm

The anchor fact: Assrt requires @playwright/mcp at runtime

The proof that this is Playwright, not a Playwright imitation, is in /Users/matthewdi/assrt/src/core/browser.ts around line 155. The runtime resolves @playwright/mcp from your node_modules and spawns it as a local child process. If the package is not installed, Assrt does not run. Every click is a Playwright click.

src/core/browser.ts
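The resolve-then-spawn pattern can be sketched roughly as below. This is a sketch of the pattern, not a copy of browser.ts: cli.js as the entry file name and the exact spawn options are assumptions.

```typescript
import * as path from "node:path";
import { spawn, type ChildProcess } from "node:child_process";
import { createRequire } from "node:module";

// Derive the MCP server entry point from the resolved package.json path.
// (cli.js as the entry file name is an illustrative assumption.)
export function mcpCliPath(pkgJsonPath: string): string {
  return path.join(path.dirname(pkgJsonPath), "cli.js");
}

// Resolve @playwright/mcp from node_modules and spawn it as a local child
// process; MCP traffic flows over the child's stdin/stdout.
export function spawnMcp(): ChildProcess {
  const req = createRequire(path.join(process.cwd(), "noop.js"));
  const pkgJson = req.resolve("@playwright/mcp/package.json"); // throws if not installed
  return spawn(process.execPath, [mcpCliPath(pkgJson)], {
    stdio: ["pipe", "pipe", "inherit"],
  });
}
```

The throw-on-missing behavior is the whole point: if @playwright/mcp is absent, there is no fallback driver to degrade to.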

The 10 primitives the agent uses

The agent is an LLM with a fixed tool palette. The 10 entries below are 1:1 with Playwright MCP's own tools, so the LLM's plan is a sequence of real Playwright calls. The shape and names are defined in agent.ts lines 16 through 113.

src/core/agent.ts
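As a shape reference, not a copy of agent.ts, the palette can be pictured as a fixed array of tool definitions. The names come from the article; the type and the descriptions are paraphrased assumptions.

```typescript
// The agent's fixed tool palette: 1:1 with Playwright MCP's own tools.
// Descriptions are paraphrases, not the actual agent.ts strings.
type ToolDef = { name: string; description: string };

const PRIMITIVES: ToolDef[] = [
  { name: "navigate",      description: "Go to a URL" },
  { name: "snapshot",      description: "Dump the accessibility tree with [ref=eN] ids" },
  { name: "click",         description: "Click the element with a given ref" },
  { name: "type_text",     description: "Type into the element with a given ref" },
  { name: "select_option", description: "Pick an option in a select" },
  { name: "scroll",        description: "Scroll the page or an element" },
  { name: "press_key",     description: "Send a keyboard key" },
  { name: "wait",          description: "Wait for text or a timeout" },
  { name: "screenshot",    description: "Capture a PNG of the page" },
  { name: "evaluate",      description: "Run JavaScript in the page" },
];
```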

What the agent does on every step

A typical click follows the same three-call cycle you would use by hand with Playwright MCP: snapshot, pick a ref, act. The difference is the agent reads your English and picks the ref itself.

A click, step by step

1. The agent calls snapshot; the Assrt runtime forwards it to Playwright MCP, which dumps the a11y tree from the Chromium page and returns { refs: [...] } to the agent.
2. The agent calls click ref=e14; MCP runs the real Playwright click; navigation fires in the page.
3. The click result comes back and the agent resumes the plan.
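The ref-picking half of that cycle is plain bookkeeping. A minimal sketch with the MCP transport stubbed out; planClick, the node shape, and matching by exact label are illustrative assumptions, not Assrt's API:

```typescript
// One step of the cycle: given a fresh snapshot, pick a ref by its visible
// label and emit the next tool call. Names and shapes are illustrative.
type ToolCall = { name: string; args: Record<string, unknown> };
type A11yNode = { ref: string; label: string };

export function planClick(snapshot: A11yNode[], targetLabel: string): ToolCall {
  const hit = snapshot.find((n) => n.label === targetLabel);
  if (!hit) throw new Error(`no element labeled "${targetLabel}"`);
  // The ref, not a pixel coordinate, is what gets sent to Playwright MCP.
  return { name: "click", args: { ref: hit.ref } };
}
```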

The same flow in two stacks

A sign-up scenario that hits email OTP is where hand-written Playwright shows its cost. Mailosaur or Mailhog or Ethereal. API key. Polling. Retry. Per-input OTP fill. A helper in a utils file. Every time the OTP inputs change, you edit two places. In Assrt, the same flow is a seven-line Markdown block.

Assrt plan (scenario.md)

tests/checkout.md
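The plan itself is not reproduced on this page, so here is a hypothetical one in the format the article describes (a #Case N: header followed by numbered English steps; the exact step wording is an assumption):

```markdown
#Case 1: Sign up with email OTP
1. Navigate to the app URL and click "Sign up"
2. Create a temp email and type it into the email field
3. Click "Send code"
4. Wait for the verification code in the inbox
5. Type the code into the OTP inputs
6. Assert the dashboard greeting is visible
```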

Hand-written Playwright equivalent

tests/signup.spec.ts
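The spec body is also not reproduced here; the helper below sketches the utils-file plumbing such a spec leans on. fetchInbox, the option names, and the 6-digit regex are illustrative assumptions, not a real mail provider's API.

```typescript
// A sketch of the OTP plumbing a hand-written suite keeps in a utils file.
// fetchInbox is injected so the provider (Mailosaur, Mailhog, Ethereal)
// can be swapped; it returns message bodies.
export async function pollForOtp(
  fetchInbox: () => Promise<string[]>,
  { attempts = 10, delayMs = 1000 } = {},
): Promise<string> {
  for (let i = 0; i < attempts; i++) {
    for (const body of await fetchInbox()) {
      const m = body.match(/\b(\d{6})\b/); // 6-digit code is an assumption
      if (m) return m[1];
    }
    await new Promise((r) => setTimeout(r, delayMs));
  }
  throw new Error("no OTP arrived");
}

// Per-input fill: the second place you edit when the OTP markup changes.
export function otpDigits(code: string): string[] {
  return code.split("");
}
```

Every change to the OTP markup means touching both the polling side and the fill side; that double edit is the maintenance cost the Markdown plan avoids.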

What a run actually looks like on stdout

No magic. The log line for a click names the ref the agent hit. The log line for a wait names the text it waited for. The log line for an assertion names the evidence. If something breaks, you read the log the same way you read a failing Playwright test: by the thing that happened right before.

assrt run

How Assrt stacks up against the two common stacks

The comparison below folds stack 1 (hand-written Playwright) and stack 2 (closed AI QA platforms) into a single "competitor" column, because on each row, one of them is Assrt's real competition. The Assrt column is the third stack.

| Feature | Raw Playwright or a closed AI QA vendor | Assrt |
|---|---|---|
| How does the tool talk to the browser? | A hand-written .spec.ts file calls Playwright directly. You author every selector and wait. | An LLM agent calls Playwright MCP. Each English step becomes a real Playwright primitive. |
| What is the plan file format? | TypeScript .spec.ts you maintain by hand. Or, on closed AI platforms, a proprietary YAML DSL. | Plain Markdown with #Case N: blocks at /tmp/assrt/scenario.md. Commit to git, grep from the CLI. |
| What happens when the UI shifts? | Selectors break. You edit .spec.ts, rerun, edit again. Or you pay a managed-service QA team. | The agent re-snapshots on each run and picks the new ref. Your Markdown plan does not change. |
| How does the agent pick elements? | Playwright locators you wrote. Or a vision model guessing at pixel coordinates. | Accessibility tree refs from Playwright MCP's snapshot tool. Deterministic, inspectable, stable. |
| Who owns the scenario source of truth? | Your repo (raw Playwright) or the vendor's cloud (closed AI QA platforms). | Your disk. /tmp/assrt/scenario.md and ~/.assrt/scenarios/<uuid>.json. Cloud is a cache, not a vault. |
| What does it cost? | Raw Playwright is free but maintenance is expensive. Closed AI QA platforms are $8K to $90K per year. | MIT licensed, free. You pay only for the model calls, typically a few cents per scenario. |
| Can I keep hand-written Playwright tests too? | Yes (if that is what you were already doing). Or no, if you picked a closed AI vendor. | Yes. Assrt and .spec.ts coexist in the same repo. Use each where its maintenance cost is lowest. |
| What survives if the vendor disappears? | Your .spec.ts survives (raw Playwright). On closed AI platforms, usually only a degraded export. | Every artifact. Markdown plan, JSON report, PNG screenshots, WebM video, 1,087 LOC you can fork. |

Playwright under the hood

Assrt spawns @playwright/mcp as a child process. Every click, type, and snapshot is a real Playwright call executed on real Chromium. The agent is a client, not a replacement.

Markdown plans in git

Plans live at /tmp/assrt/scenario.md in plain Markdown. Commit them. Grep them. Edit them in vim. No YAML DSL, no proprietary recorder format, no cloud editor.

10 primitives, 1:1 with Playwright MCP

navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate. Defined in agent.ts lines 16 through 113.

Accessibility refs beat coordinates

The agent always snapshots first and uses [ref=eN] IDs from the a11y tree. No vision-model clicks at (x, y). Failures show the ref it hit, not a pixel guess.

1,087 lines of MIT TypeScript

The agent runtime is /Users/matthewdi/assrt-mcp/src/core/agent.ts, 1,087 lines under MIT. Fork it, read it, modify it. No obfuscated runner, no seat license.

End-to-end, in five moves

There is no install-time ritual. The steps below are the complete lifecycle of a run, from Markdown plan to disk artifacts.

What actually happens

1

You write Markdown, not selectors

Put your plan in tests/scenario.md. Use #Case N: headers and numbered English steps. Reference buttons by their visible label; the agent resolves them against the live accessibility tree on each run.

2

Assrt spawns Playwright MCP over stdio

On run start, src/core/browser.ts finds @playwright/mcp with require.resolve and spawns it as a child process. That is the real Chromium driver. Nothing else touches the browser.

3

The agent reads the plan and calls primitives

Claude Haiku 4.5 (default) or Gemini 3.1 Pro reads your Markdown, calls snapshot to get the a11y tree, and issues click/type_text/scroll against [ref=eN] IDs. The LLM never clicks at pixels.

4

Assertions return structured evidence

Each assert call records description, passed, and evidence. complete_scenario finalizes the pass/fail per #Case. suggest_improvement flags UX bugs the agent noticed in passing.
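That rollup can be sketched with only the three fields the article names treated as given; the type names, the function, and the per-case shape are assumptions:

```typescript
// Shape sketch of the structured evidence: each assert records these three
// fields; a case passes only if every assertion in it passed.
interface Assertion { description: string; passed: boolean; evidence: string; }
interface CaseResult { case: string; passed: boolean; assertions: Assertion[]; }

// complete_scenario-style finalization for one #Case block.
export function finalizeCase(name: string, assertions: Assertion[]): CaseResult {
  return { case: name, passed: assertions.every((a) => a.passed), assertions };
}
```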

5

Artifacts land on your disk

scenario.md stays where you put it. The report writes to /tmp/assrt/results/latest.json. Screenshots go to /tmp/assrt/<runId>/screenshots/NN_stepN_action.png. Video writes to <runId>/video/recording.webm.

10 Playwright MCP primitives the agent uses
1,087 lines of MIT-licensed TypeScript in the agent runtime
0 proprietary DSLs to learn
0 seat licenses required

Price compared

Momentic and QA Wolf land between $8K and $90K per year depending on seat count and managed-service tier. Assrt is MIT licensed. The only cost is the model bill, which runs to a few cents per scenario with Claude Haiku 4.5 as the default. A year of daily runs for a 20-scenario suite still fits inside a monthly coffee budget.

The CLI is the interface

Every motion you would expect to own from a test tool is a shell command against files on your disk.

npx @assrt-ai/assrt run --plan-file tests/checkout.md
grep '#Case' tests/*.md
git add tests/checkout.md
cat /tmp/assrt/results/latest.json | jq '.cases'
open /tmp/assrt/<runId>/video/recording.webm
ls /tmp/assrt/<runId>/screenshots
tar cvzf history.tgz ~/.assrt/scenarios
diff tests/checkout.md main:tests/checkout.md
assrt run --url http://localhost:3000 --plan '...'
node -e "require('@playwright/mcp/package.json')"

When each stack is the right answer

Assrt is not a religion. Hand-written Playwright is the right answer for hot paths and cross-browser regression (Firefox and WebKit coverage beyond what Playwright MCP hits by default). Closed AI QA vendors are the right answer if your team has more budget than engineering time and wants a managed service. Assrt is the right answer for everything in between: signup, auth, checkout, onboarding, and the broad surface where brittle selectors steal most of your maintenance hours.

You do not have to pick one. All three can live in the same CI pipeline. Assrt runs in Node, spawns a Chromium, and finishes. Your .spec.ts files run right next to it.

Want to see the third stack on your app?

Thirty minutes. Bring a real scenario; we will show you the Markdown plan, the run log, and the artifacts on disk.

Book a call

Frequently asked questions

Is 'Playwright vs AI browser automation' the right frame?

For most teams, no. The real choice is between three stacks: (1) hand-written Playwright .spec.ts files, (2) a closed AI browser tool that uses vision and a proprietary YAML DSL, and (3) an LLM agent that drives Playwright itself through the Model Context Protocol. Option 3 gives you the determinism of option 1 (every action is a real Playwright primitive) and the ergonomics of option 2 (plain English scenarios, no selector plumbing). Assrt is option 3. The debate makes sense only if you ignore that option exists.

How does Assrt drive 'real Playwright' from natural language?

At startup, Assrt spawns @playwright/mcp as a child process over stdio. Code in /Users/matthewdi/assrt/src/core/browser.ts line 155 calls require.resolve('@playwright/mcp/package.json') to find the installed package, then runs it locally. The LLM agent (Claude Haiku 4.5 by default, or Gemini 3.1 Pro) sees a tool palette in agent.ts lines 16 through 113 with entries named navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, and evaluate. Those names are 1:1 with Playwright MCP's own tools. When the agent calls click, Assrt forwards the call to the Playwright MCP server, which runs the real Playwright click on a real Chromium page. No vision model guessing at pixels, no brittle xpath inventions.

Why is driving Playwright via MCP better than 'pure AI' browser automation?

Pure vision-based browser automation (agent sees pixels, agent clicks at coordinates) has two failure modes that break CI reliability. First, the agent can misread a lookalike element and click the wrong thing. Second, the trail it leaves is a sequence of (x, y) clicks that nobody can debug six months later. Driving real Playwright through MCP solves both. The agent calls snapshot to get the accessibility tree with [ref=eN] markers and uses those refs as selectors. When a step fails, the failure log shows 'clicked [ref=e14] labeled Submit, got no navigation' — the same kind of signal a human Playwright author would read. You get LLM flexibility on top of Playwright determinism.

Does Assrt generate .spec.ts files I can commit, like Playwright codegen?

No, and this is the intentional inversion. Playwright codegen produces TypeScript you then maintain by hand. When the UI shifts, those generated selectors break and you edit them. Assrt keeps the plan in /tmp/assrt/scenario.md as plain Markdown using #Case N: blocks. The agent interprets the Markdown against a fresh accessibility snapshot every run, so a button that moved from one container to another still works. What you commit to git is the intent (Markdown), not the fragile implementation (selectors). If you want actual Playwright code, you can write it by hand and use Assrt for the flows where the maintenance cost of that code is higher than the cost of a model call per run.

What is in /tmp/assrt/scenario.md and why does it matter for vendor lock-in?

The file is a Markdown document with one or more #Case N: scene headers, each followed by numbered steps in English. The runtime writes it from scenario-files.ts line 42 (writeScenarioFile), then starts an fs.watch on it at line 97. Edit the file in any editor and a 1-second debounce fires, which PATCHes the cloud copy. The file on your disk is the source of truth; the cloud is a UUID-keyed cache. Contrast that with closed AI QA vendors that keep the test DSL inside a proprietary cloud editor. With Assrt, git add tests/ is enough. If the company shut down tomorrow, you still have the plan file, the JSON run report, the PNG screenshots, and the WebM video.
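The watch-and-debounce half of that answer can be sketched as below; this is the generic pattern, not the actual scenario-files.ts code, and watchScenario is a hypothetical name:

```typescript
import { watch } from "node:fs";

// Collapse a burst of fs events into one callback after `ms` of quiet.
export function debounce<T extends unknown[]>(
  fn: (...args: T) => void,
  ms: number,
): (...args: T) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

// One editor save burst → one cloud PATCH, 1 second after the last write.
export function watchScenario(path: string, onChange: () => void): void {
  watch(path, debounce(onChange, 1000));
}
```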

How many Playwright primitives does the agent use, and do I lose any?

Out of the box, the agent knows 10 Playwright MCP primitives (navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate) plus 8 higher-level testing tools (assert, complete_scenario, create_temp_email, wait_for_verification_code, check_email_inbox, suggest_improvement, http_request, wait_for_stable). You can see the full TOOLS array in agent.ts lines 16 through 195. That covers the normal E2E surface. For niche Playwright features (tracing, video coverage beyond WebM, exotic network mocks), you would drop into a hand-written Playwright test and run it alongside Assrt. The two coexist; Assrt does not try to replace your existing Playwright suite, it replaces the part you keep rewriting because the UI keeps drifting.

How much does this cost compared to closed AI browser automation?

Assrt is MIT licensed. The npm package is @assrt-ai/assrt. There is no per-seat fee and no cloud subscription. You pay for model inference on the scenarios you run: with claude-haiku-4-5-20251001 as the default, that is typically a few cents per scenario, not dollars. Closed AI QA platforms in the same space (Momentic, QA Wolf, Rainforest) run from $8,000 to $90,000 per year. The cost difference is not rhetorical; it is the reason indie teams show up in the Reddit threads asking whether Playwright is the only option. Option 3 (LLM on top of real Playwright) exists and it is free.

Can I run Assrt alongside my existing Playwright tests in CI?

Yes. Assrt runs locally in Node and exposes three MCP tools (assrt_test, assrt_plan, assrt_diagnose) plus a CLI (assrt run --url ... --plan ...). In CI, spawn it as a subprocess in your existing Playwright job. The browser is a normal Chromium spawned by Playwright MCP, not a separate fleet, so you are not doubling browser infrastructure. A typical setup: hand-written Playwright for performance-critical flows and cross-browser regression (Firefox, WebKit); Assrt for signup, auth, checkout, and the broad surface where brittle selectors steal most of your maintenance time.

What happens when the AI disagrees with the accessibility tree?

The agent is told to always call snapshot before interacting, and to use the ref IDs from the snapshot (e.g. ref='e5') as the preferred selector. When a click fails, the agent re-snapshots and retries with a fresh ref. After three failed attempts the scenario is marked failed with the snapshot attached as evidence, and the run continues to the next #Case. This is the strategy written into the SYSTEM_PROMPT in agent.ts. It is the same failure mode Playwright authors already handle with expect(...).toBeVisible() retries, just expressed in natural language plus a Playwright MCP call.
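A sketch of that retry policy in code, with the snapshot, click, and ref-picking steps injected as callbacks; the function name and shapes are illustrative, not the SYSTEM_PROMPT's wording:

```typescript
// Re-snapshot before every attempt; after `attempts` failures, return the
// last snapshot as evidence so the case can be marked failed and the run
// can continue to the next #Case.
export async function clickWithRetry(
  snapshot: () => Promise<string[]>,        // fresh a11y refs each attempt
  click: (ref: string) => Promise<boolean>, // true on success
  pick: (refs: string[]) => string,         // the agent's ref choice
  attempts = 3,
): Promise<{ passed: boolean; evidence?: string[] }> {
  let refs: string[] = [];
  for (let i = 0; i < attempts; i++) {
    refs = await snapshot();
    if (await click(pick(refs))) return { passed: true };
  }
  return { passed: false, evidence: refs };
}
```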

Is 'self-hosted, no cloud dependency' true if Assrt syncs scenarios to Firestore?

The cloud sync is opt-in and cache-shaped, not source-of-truth-shaped. When you run via the CLI without a scenario ID, plans live at /tmp/assrt/scenario.md and ~/.assrt/scenarios/<uuid>.json on your machine. The Firestore sync exists so you can share a shareable URL with a teammate; it is a UUID-keyed read-through cache. scenario-files.ts line 94 explicitly skips sync for scenarios prefixed 'local-'. You can unplug your network, run the CLI, and every artifact still lands on disk. The model call (Anthropic or Gemini) is the only external dependency, and that is a commodity you choose.