Playwright Automation Testing Without Writing Selectors

Most Playwright guides teach you to write page.locator('.btn-primary') and then maintain that selector forever. There is a different path. The Playwright team now ships an MCP server (@playwright/mcp) that exposes the browser as an accessibility tree with short element refs that are valid for the snapshot they came from. An agent on top of that server can run a test described in one paragraph of English, pick refs out of the snapshot, and never see a CSS selector. This is how Assrt works, and this guide walks through the moving parts so you can either build the same loop yourself or use Assrt directly.

By the Assrt team | April 12, 2026 | 9 min read

“ALWAYS call snapshot FIRST to get the accessibility tree with element refs. Use the ref IDs (e.g. ref='e5') when clicking or typing. This is faster and more reliable than text matching.”

Assrt agent system prompt, src/core/agent.ts:207

1. The Two Shapes of Playwright Automation Testing

When people search "playwright automation testing" they usually want one of two things. The first is a tutorial for writing test('login works', async ({ page }) => {...}) in TypeScript. That path is well documented; the official Playwright docs cover it, BrowserStack and LambdaTest republish variations, and every framework comparison post ranks Playwright against Selenium and Cypress for it.

The second shape is the one most posts skip: driving Playwright as a tool that something else uses, not as a script you author. Since the Playwright team shipped @playwright/mcp, a Model Context Protocol server that exposes browser actions as MCP tools, you can point an LLM agent at a URL with a paragraph of intent and let the agent do the clicking. No page.locator, no await expect(...).toBeVisible(). This guide is about that second shape.

2. What @playwright/mcp Actually Gives You

@playwright/mcp is an npm package that runs a Playwright-controlled browser and speaks MCP over stdio or SSE. Its tools are not "run this script". They are primitives: navigate, snapshot, click, type, evaluate, screenshot, wait_for_stable.
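To make "primitives" concrete: on the wire, each tool invocation is an MCP `tools/call` JSON-RPC request. The sketch below only builds such requests. The tool names follow the short names used in this guide (the names a given server actually registers may carry a prefix such as `browser_`), and the argument keys `url` and `ref` are assumptions for illustration, not a documented schema.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build the request for one tool primitive.
function toolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Argument keys below are illustrative assumptions.
const requests: ToolCallRequest[] = [
  toolCall(1, "navigate", { url: "https://your-app.com" }),
  toolCall(2, "snapshot"),
  toolCall(3, "click", { ref: "e5" }),
];

for (const request of requests) console.log(JSON.stringify(request));
```

Nothing in these payloads describes a script; the caller decides the next request only after reading the previous response.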

The interesting tool is snapshot. It returns the page as an accessibility tree, where each interactive node has a role, a human-readable name, and a short ref like ref=e5. The ref is not a DOM attribute. It does not appear in your HTML. It is an internal handle the MCP server keeps for the duration of the snapshot, mapped to a Playwright locator under the hood. To click, you pass the ref back. The server resolves it to the right element using the accessibility-derived locator, not a CSS string.

That single design choice is what makes this style of testing viable. You never write a selector. You read the snapshot, find the element by what it is for the user ("button: Sign in"), and reference it by an opaque token. When the page redesigns, the ref values change, but the way you find the button does not.
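A minimal sketch of that lookup, assuming snapshot lines shaped like `- button "Sign in" [ref=e5]`. The exact snapshot format can vary by server version, and the two sample snapshots are invented for illustration.

```typescript
// One named node parsed out of a snapshot line.
interface SnapshotNode {
  role: string;
  name: string;
  ref: string;
}

// Assumed line shape: `- button "Sign in" [ref=e5]`, with optional indentation
// and optional extra attributes between the name and the ref.
const NODE_LINE = /^\s*-\s+(\w+)\s+"([^"]*)".*\[ref=(\w+)\]/;

function parseSnapshot(snapshot: string): SnapshotNode[] {
  const nodes: SnapshotNode[] = [];
  for (const line of snapshot.split("\n")) {
    const match = NODE_LINE.exec(line);
    if (!match) continue;
    const [, role = "", name = "", ref = ""] = match;
    nodes.push({ role, name, ref });
  }
  return nodes;
}

// Find an element by what it is for the user, and return its opaque ref.
function findRef(snapshot: string, role: string, name: string): string | undefined {
  const wanted = name.toLowerCase();
  return parseSnapshot(snapshot).find(
    (node) => node.role === role && node.name.toLowerCase() === wanted,
  )?.ref;
}

const loginSnapshot = `
- heading "Welcome" [level=1] [ref=e2]
- textbox "Email" [ref=e4]
- button "Sign in" [ref=e5]`;

const redesignedSnapshot = `
- banner [ref=e1]
  - button "Sign in" [ref=e12]
- textbox "Email" [ref=e20]`;

console.log(findRef(loginSnapshot, "button", "Sign in"));      // e5
console.log(findRef(redesignedSnapshot, "button", "Sign in")); // e12
```

The ref changes from e5 to e12 after the redesign, but the query ("button", "Sign in") is identical in both calls.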

3. The Snapshot-Act-Snapshot Agent Loop

Here is the rule the Assrt agent operates under, copied verbatim from its system prompt at src/core/agent.ts:207:

## CRITICAL Rules
- ALWAYS call snapshot FIRST to get the accessibility tree with element refs
- Use the ref IDs from snapshots (e.g. ref="e5") when clicking or typing.
  This is faster and more reliable than text matching.
- After each action, call snapshot again to see the updated page state
- Make assertions to verify expected behavior (use the assert tool)
- Call complete_scenario when done

Every step is two MCP calls: snapshot, then act. After the act, the agent re-snapshots because the act may have triggered navigation, opened a modal, or revealed an inline error. Refs from the previous snapshot are now stale; using them would fail. The re-snapshot is not optional.
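The loop can be sketched in a few lines. This is not the Assrt implementation: `BrowserTools`, `Step`, and `fakeBrowser` are hypothetical names, a fixed step list stands in for the model's decisions, and the snapshot line format is the same assumption as above.

```typescript
// Hypothetical tool surface; real calls would go through an MCP client.
interface BrowserTools {
  snapshot(): Promise<string>;
  click(ref: string): Promise<void>;
}

// A step names the element the way a user would: role plus accessible name.
interface Step {
  role: string;
  name: string;
}

// Look up the ref for a role/name pair, assuming lines like
// `- button "Sign in" [ref=e5]`.
function refFor(snapshot: string, step: Step): string | undefined {
  for (const line of snapshot.split("\n")) {
    const m = /^\s*-\s+(\w+)\s+"([^"]*)".*\[ref=(\w+)\]/.exec(line);
    if (m && m[1] === step.role && m[2] === step.name) return m[3];
  }
  return undefined;
}

// Snapshot, act, repeat: refs are resolved fresh before every action.
async function runSteps(tools: BrowserTools, steps: Step[]): Promise<string[]> {
  const usedRefs: string[] = [];
  for (const step of steps) {
    const snapshot = await tools.snapshot(); // always snapshot first
    const ref = refFor(snapshot, step);
    if (!ref) throw new Error(`No ${step.role} named "${step.name}" in snapshot`);
    await tools.click(ref); // act with a ref from the current snapshot only
    usedRefs.push(ref);
  }
  return usedRefs;
}

// In-memory stand-in for a browser: each click advances to the next page,
// and every page hands out different refs.
function fakeBrowser(pages: string[]): BrowserTools {
  let index = 0;
  return {
    snapshot: async () => pages[Math.min(index, pages.length - 1)] ?? "",
    click: async () => {
      index += 1;
    },
  };
}

runSteps(
  fakeBrowser(['- button "Sign in" [ref=e5]', '- button "Continue" [ref=e9]']),
  [{ role: "button", name: "Sign in" }, { role: "button", name: "Continue" }],
).then((refs) => console.log(refs.join(", "))); // e5, e9
```

Because `runSteps` never carries a ref across iterations, a stale ref cannot be used by construction.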

Error recovery follows from the same loop. From the same file at line 220: "When an action fails: 1. Call snapshot to see what is currently on the page. 2. The page may have changed (modal appeared, navigation happened). 3. Try using a different ref or approach. 4. If stuck after 3 attempts, scroll and retry." There is no separate self-healing system. The retry is the self-healing.
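One way to read that rule as code, as a sketch rather than the actual Assrt logic: try up to three times with a fresh snapshot each time, then scroll and try once more before giving up. `RecoveryTools`, `Action`, and `actWithRecovery` are hypothetical names.

```typescript
// Hypothetical tool surface for the recovery path.
interface RecoveryTools {
  snapshot(): Promise<string>;
  scroll(): Promise<void>;
}

// An action receives the freshest snapshot and throws if it cannot complete.
type Action = (freshSnapshot: string) => Promise<void>;

// Returns the number of attempts it took; throws if every attempt fails.
async function actWithRecovery(
  tools: RecoveryTools,
  action: Action,
  maxAttempts = 3,
): Promise<number> {
  let attempts = 0;
  // First pass: up to maxAttempts tries. Second pass: scroll, then one more try.
  for (const scrollFirst of [false, true]) {
    if (scrollFirst) await tools.scroll();
    const tries = scrollFirst ? 1 : maxAttempts;
    for (let i = 0; i < tries; i++) {
      attempts += 1;
      const snapshot = await tools.snapshot(); // never reuse refs from a failed attempt
      try {
        await action(snapshot);
        return attempts;
      } catch {
        // The page may have changed (modal, navigation); loop to re-snapshot.
      }
    }
  }
  throw new Error(`Step failed after ${attempts} attempts`);
}
```

There is no state beyond the attempt counter, which is the point of the section: recovery is just the normal loop run again.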

Try this loop on your own app in 60 seconds

Run npx @m13v/assrt test https://your-app.com. Assrt spawns a local @playwright/mcp browser, generates cases, and runs them with the snapshot/act loop described above. No selectors to write.

4. Tests as Markdown, Not TypeScript

Because the agent resolves refs at runtime, the test itself does not need to encode them. The Assrt scenario format is plain markdown saved to /tmp/assrt/scenario.md:

#Case 1: Email signup happy path
Click Sign up. Fill the email field with a fresh disposable
email. Submit. Wait for the verification screen and confirm
"check your email" appears.

#Case 2: Wrong password shows inline error
Click Sign in. Enter test@example.com and the password "nope".
Submit. Verify the form shows an error containing "incorrect"
and that we are still on the sign-in page.

Edit this file in any editor, save, and the next assrt_test run uses your edits. The scenario carries a UUID stored in /tmp/assrt/scenario.json; re-run later with assrt_test({ url, scenarioId: '<uuid>' }) and the same cases run against any URL. Results, screenshots, and the assertion trace land in /tmp/assrt/results/latest.json.
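Because the format is only `#Case N: name` headers followed by free text, splitting a scenario file into cases takes very little code. The parser below is a sketch under that assumption; `ScenarioCase` and `parseScenario` are illustrative names, not part of Assrt.

```typescript
interface ScenarioCase {
  id: number;
  name: string;
  steps: string; // the plain-English body, joined into one paragraph
}

// Split a scenario file into cases on `#Case N: name` header lines.
function parseScenario(markdown: string): ScenarioCase[] {
  const cases: ScenarioCase[] = [];
  let current: ScenarioCase | undefined;
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    const header = /^#Case\s+(\d+):\s*(.+)$/.exec(line);
    if (header) {
      current = { id: Number(header[1]), name: header[2] ?? "", steps: "" };
      cases.push(current);
    } else if (current && line !== "") {
      current.steps += (current.steps ? " " : "") + line;
    }
  }
  return cases;
}

const scenario = `#Case 1: Email signup happy path
Click Sign up. Fill the email field with a fresh disposable
email. Submit. Wait for the verification screen and confirm
"check your email" appears.

#Case 2: Wrong password shows inline error
Click Sign in. Enter test@example.com and the password "nope".
Submit. Verify the form shows an error containing "incorrect"
and that we are still on the sign-in page.`;

for (const c of parseScenario(scenario)) console.log(`${c.id}: ${c.name}`);
// 1: Email signup happy path
// 2: Wrong password shows inline error
```

Each case body reaches the agent as intent, not as instructions tied to any locator.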

What you do not get is a TypeScript file in your repo. Playwright purists will dislike this. The trade is real: you give up static checks on locators and IDE autocomplete on the page object in exchange for never editing a locator after a redesign.

5. Local stdio vs. Remote VM, Same Agent

One agent, two transports. Looking at src/core/browser.ts:

launchLocal() resolves @playwright/mcp/cli.js from node_modules, spawns it under process.execPath with --headless --viewport-size 1600x900 and an isolated user-data-dir at /tmp/assrt-browser-<ts>. The MCP client connects over stdio. This is what runs when you use the CLI on your laptop or in CI.
launch() creates an ephemeral Freestyle VM with Chromium and @playwright/mcp pre-baked, then connects to https://<vmId>.vm.freestyle.sh/sse over SSE. Same MCP protocol, different transport, same scenario file.
launchExisting(sseUrl) attaches to a Playwright MCP server you already provisioned (useful when something else owns the VM lifecycle).
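The three entry points reduce to one decision: which transport the MCP client connects over. The sketch below models that as a discriminated union. The type and function names are hypothetical, the `--user-data-dir` flag spelling is an assumption based on the description above, and provisioning the VM itself is out of scope.

```typescript
// What the MCP client ultimately needs in order to connect.
type Transport =
  | { kind: "stdio"; command: string; args: string[] }
  | { kind: "sse"; url: string };

// The three launch modes described above.
type Target =
  | { mode: "local"; nodePath: string; cliPath: string; userDataDir: string }
  | { mode: "vm"; vmId: string }
  | { mode: "existing"; sseUrl: string };

function resolveTransport(target: Target): Transport {
  switch (target.mode) {
    case "local":
      // Spawn the MCP CLI under the given Node binary and talk over stdio.
      return {
        kind: "stdio",
        command: target.nodePath,
        args: [
          target.cliPath,
          "--headless",
          "--viewport-size", "1600x900",
          "--user-data-dir", target.userDataDir,
        ],
      };
    case "vm":
      // Freshly provisioned VM: the SSE endpoint is derived from its id.
      return { kind: "sse", url: `https://${target.vmId}.vm.freestyle.sh/sse` };
    case "existing":
      // Someone else owns the server lifecycle; connect to the URL as given.
      return { kind: "sse", url: target.sseUrl };
  }
}

console.log(resolveTransport({ mode: "vm", vmId: "abc123" }));
// { kind: 'sse', url: 'https://abc123.vm.freestyle.sh/sse' }
```

Everything above the transport (agent, scenario file, tool calls) stays the same in all three modes.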

Video recording is wired through Playwright's normal recordVideo context option: launchLocal(videoDir) writes a temp Playwright config with contextOptions.recordVideo pointing at your dir, then passes --config to the MCP CLI. You get a full WebM per scenario, no extra plumbing.
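A rough sketch of the two pieces involved: the config content and the extra CLI argument. Nesting `contextOptions` under a `browser` key is an assumption about the MCP config layout; `recordVideo.dir` is the standard Playwright context option. Writing the JSON to a temp file is left out, and the function names are illustrative.

```typescript
// Assumed config layout; only recordVideo.dir is taken from Playwright itself.
interface VideoConfig {
  browser: {
    contextOptions: {
      recordVideo: { dir: string };
    };
  };
}

// Config content that asks Playwright to record video into videoDir.
function buildVideoConfig(videoDir: string): VideoConfig {
  return { browser: { contextOptions: { recordVideo: { dir: videoDir } } } };
}

// Append --config to an existing argument list without mutating it.
function withConfig(args: string[], configPath: string): string[] {
  return [...args, "--config", configPath];
}

const configJson = JSON.stringify(buildVideoConfig("/tmp/assrt/videos"), null, 2);
const cliArgs = withConfig(["--headless"], "/tmp/assrt/playwright-mcp.config.json");

console.log(configJson);
console.log(cliArgs.join(" ")); // --headless --config /tmp/assrt/playwright-mcp.config.json
```

The paths are placeholders; any writable directory works for the videos.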

6. Failure Modes This Approach Actually Has

Not every problem disappears when you delete the selectors. Three real ones:

OTP and split-character inputs

Six-digit verification codes that render as six input[maxlength="1"] fields break the snapshot/click loop because each cell has the same role and similar name. The Assrt agent special-cases this with a hardcoded evaluate call that fires a paste event into the parent container; see src/core/agent.ts:234. Pure ref-based interaction would type the full code into the first cell only.
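The actual snippet at that line is not reproduced here. As an illustration of the technique only, the function below builds a browser-side expression that dispatches a synthetic paste event on the parent of the first single-character input. The container lookup is an assumption, and whether an app reacts to a synthetic paste depends on how its OTP component is written.

```typescript
// Build a JavaScript expression (as a string) to hand to an evaluate tool.
// Digits-only validation keeps the interpolated value trivially safe.
function buildOtpPasteScript(code: string): string {
  if (!/^\d+$/.test(code)) throw new Error("OTP must contain digits only");
  return `(() => {
  // Assumption: all OTP cells share one parent element.
  const first = document.querySelector('input[maxlength="1"]');
  const container = first && first.parentElement;
  if (!container) return false;
  const data = new DataTransfer();
  data.setData("text/plain", ${JSON.stringify(code)});
  container.dispatchEvent(
    new ClipboardEvent("paste", { clipboardData: data, bubbles: true, cancelable: true })
  );
  return true;
})()`;
}

console.log(buildOtpPasteScript("123456"));
```

The expression returns false when no OTP cells are found, so the caller can fall back to another approach instead of silently continuing.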

Canvas, video, and WebGL surfaces

The accessibility tree does not contain the contents of a canvas. If your test needs to click a specific spot inside a chart or game canvas, ref-based targeting will not help. You fall back to coordinate clicks or scripted DOM interactions, the same way you would in hand-written Playwright.

LLM cost and latency

Every step calls a model. A ten-action scenario is roughly ten LLM round trips plus the snapshots they consume as context. This is more expensive per run than executing a compiled Playwright script. The break-even is maintenance: if you would otherwise spend a developer hour per week patching locators, the LLM bill is small.
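The break-even can be put into back-of-the-envelope arithmetic. Every number in the example call is a made-up placeholder, not a measured figure or a real price, and the model ignores context that accumulates across steps.

```typescript
interface RunCostInputs {
  steps: number;               // actions in the scenario
  inputTokensPerStep: number;  // snapshot plus prompt sent per step
  outputTokensPerStep: number; // model reply per step
  usdPerMillionInput: number;
  usdPerMillionOutput: number;
}

// Cost of one scenario run under a flat per-step token model.
function costPerRunUsd(i: RunCostInputs): number {
  const perStepUsd =
    (i.inputTokensPerStep * i.usdPerMillionInput +
      i.outputTokensPerStep * i.usdPerMillionOutput) / 1e6;
  return i.steps * perStepUsd;
}

// How many runs per week the saved developer time would pay for.
function breakEvenRunsPerWeek(
  runCostUsd: number,
  devHourUsd: number,
  hoursSavedPerWeek: number,
): number {
  return Math.floor((devHourUsd * hoursSavedPerWeek) / runCostUsd);
}

// Placeholder inputs, chosen only to make the arithmetic easy to follow.
const runCost = costPerRunUsd({
  steps: 10,
  inputTokensPerStep: 4000,
  outputTokensPerStep: 200,
  usdPerMillionInput: 3,
  usdPerMillionOutput: 15,
});

console.log(runCost.toFixed(2));                    // 0.15
console.log(breakEvenRunsPerWeek(runCost, 100, 1)); // 666
```

Plug in your own token counts and rates; the structure of the comparison matters more than the sample values.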

7. When to Use This and When to Write Playwright by Hand

Use the agent loop when:

The app is changing weekly and locator maintenance is the bottleneck.
You want non-engineers to author and edit cases (markdown is a lower bar than TypeScript).
You are testing a flow against multiple deploys (preview URL, staging, prod) and want one scenario re-run by UUID against each.
You are an AI agent (Claude, Cursor) that needs to verify its own UI changes end-to-end without writing tests first.

Stick with hand-written Playwright when:

You need millisecond-precise timing assertions, or deterministic load testing across millions of runs.
The test logic involves heavy data setup, fixtures, or parallel sharding controlled per worker.
Your CI budget rules out per-step LLM calls.
Your app is behind a strict outbound network policy that cannot reach a model API.

The two approaches compose. Many teams keep their highest-stakes flows as hand-written Playwright in CI and let an agent loop cover the long tail of regression cases that would otherwise rot.

Frequently Asked Questions

Does Assrt generate Playwright code I can check into a repo?

No, and that is the point. Assrt drives the official @playwright/mcp server at runtime using an LLM agent loop. Your test artifact is a plain markdown file (`/tmp/assrt/scenario.md`) with `#Case N: name` blocks plus a saved scenario UUID. You re-run by UUID against any URL; the agent re-derives clicks from a fresh accessibility snapshot every time.

What is a Playwright MCP ref and why use it instead of a CSS selector?

Playwright MCP exposes the page as an accessibility tree where every interactive node has a short ref like `ref=e5`, valid for the snapshot it came from. The Assrt agent calls `snapshot` first, finds the element by role and name in the tree, then passes the ref to click or type. The ref values themselves change between snapshots; what survives class-name churn, build-tool hashing, and most layout refactors is the lookup by role and name, because it is derived from semantics, not the DOM string.

What happens when a ref goes stale mid-scenario?

The agent's system prompt explicitly handles this: when an action fails, it calls `snapshot` again to get fresh refs and retries. After three failed attempts on a step, it scrolls and retries; if still stuck, it marks the case failed with screenshot evidence. There is no separate self-healing layer; the snapshot loop is the self-healing.

How does this run locally vs. in CI?

`launchLocal()` in `src/core/browser.ts` spawns `@playwright/mcp/cli.js` over stdio with a per-run isolated user-data-dir (`/tmp/assrt-browser-<ts>`) at 1600x900 headless. `launch()` provisions an ephemeral Freestyle VM with Chromium and @playwright/mcp baked in and connects over SSE. Same agent, same scenarios, no rewrite for CI.

Can I edit the generated tests by hand?

Yes. Open `/tmp/assrt/scenario.md`, add or rewrite a `#Case` block in plain English, save. The file auto-syncs to cloud storage keyed by the scenario UUID, so the next `assrt_test` run picks up your edits. There is no compile step and no DSL.

How does this compare to writing Playwright tests with `getByRole`?

`getByRole` is the right primitive but you still author each call in TypeScript and maintain it through redesigns. Assrt collapses authoring and maintenance into one step: describe the case in markdown, the agent resolves roles and refs from the live a11y tree on every run. You give up static type-checking of locators in exchange for never updating them by hand.

Where do I see what actually happened during a run?

Results land in `/tmp/assrt/results/latest.json` with per-case pass/fail, screenshots, and the assertion trace. Each run also records a video via Playwright's `recordVideo` context option (configured at `browser.ts:194-211`) and, by default, opens a player in the browser when the run finishes.

Run Playwright Without Writing Selectors

Assrt drives @playwright/mcp with an LLM agent that targets accessibility refs. Tests live as plain markdown you can re-run by UUID. Open-source, self-hosted, no vendor lock-in.