The part of Playwright that most end-to-end guides skip

Playwright end-to-end testing without a single spec file, through the official @playwright/mcp server, pinned at 0.0.70.

Every other piece on this topic teaches you the same thing. Install @playwright/test. Learn getByRole. Wrap things in describe and test. Use expect. Run npx playwright codegen when you are stuck. There is a second, less-covered door into the same engine: @playwright/mcp, the official sibling package that exposes a running Playwright browser as a Model Context Protocol server. Assrt pins version 0.0.70 of that package, spawns it over stdio, and compiles plain-English scenarios into browser_click, browser_type, browser_snapshot, and browser_evaluate calls against it. Real Playwright underneath. No .spec.ts above.

Matthew Diakonov
11 min read
Based on the actual assrt-mcp source:
@playwright/mcp@0.0.70 pinned in freestyle.ts line 586 and installed globally in the disposable VM.
McpBrowserManager spawns the CLI over stdio with --output-mode file --caps devtools (browser.ts line 296).
Eleven agent tools map one-to-one onto browser_* MCP tools at browser.ts lines 560-670.

The piece most guides leave out: @playwright/mcp

If you open any current piece on Playwright end-to-end testing, the shape is the same. Install @playwright/test. Learn the locator ladder (getByRole, getByLabel, getByTestId). Write a spec file. Use expect(locator).toBeVisible(). Configure parallelism in playwright.config.ts. If you get stuck, npx playwright codegen will record a brittle first draft.

That stack works. It is also the whole public story, and the public story quietly omits a newer official Playwright package called @playwright/mcp. That package does not produce spec files. It boots a browser and exposes it as a Model Context Protocol server: tools like browser_click, browser_type, browser_snapshot, browser_evaluate, and a small set of others. An MCP client can drive the browser without ever touching a Playwright test harness.

The Assrt runner is an MCP client of that exact server. It pins @playwright/mcp@0.0.70, spawns it as a stdio child process, reads a plain-English scenario file, and translates the intent into browser_* tool calls. The actual browser doing the work is the same chromium binary any @playwright/test suite would launch. The difference is everything sitting on top of it.

Scenario in, browser_* out

[Diagram: scenario.md, agent tool call, accessibility ref → @playwright/mcp@0.0.70 → browser_click, browser_type, browser_snapshot, browser_evaluate]

The spawn, in the real file

Open assrt-mcp/src/core/browser.ts and jump to line 275. What follows is the call site that turns an Assrt run into a running Playwright browser controlled over MCP. It resolves @playwright/mcp/package.json through Node's require resolution, appends cli.js, and passes a short set of flags. The flags will look familiar if you have ever read someone else's Playwright config: viewport size, output mode, output directory, and the devtools capability (which is what unlocks video recording on the MCP side).

assrt-mcp/src/core/browser.ts lines 275-320

Nothing in that block is unusual if you have used the MCP SDK before. It is unusual only because most Playwright end-to-end writing pretends @playwright/mcp does not exist. The stdio transport is the default MCP wiring. The only line that does anything interesting is the flag set on the arg array, and even there, every flag has a straightforward job: the viewport matches what the scenario expects, the output mode keeps the accessibility tree off the wire, the output directory is under the user's home so it survives the MCP restart.
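To make the flag set concrete, here is a minimal sketch of how such an arg array could be assembled. It is not the contents of browser.ts: the names buildMcpLaunch and McpLaunch are hypothetical, launching through a node command is an assumption, and the cli.js path and home directory are passed in as plain strings so the snippet runs on its own. The flag names and values are the ones quoted in this piece.

```typescript
// Hypothetical shape for a stdio launch: a command plus its argument list.
interface McpLaunch {
  command: string;
  args: string[];
}

// Builds a launch description for the @playwright/mcp CLI.
// cliPath stands in for "<resolved package dir>/cli.js".
// homeDir stands in for the user's home directory.
function buildMcpLaunch(cliPath: string, homeDir: string): McpLaunch {
  return {
    command: "node", // assumption: the CLI is started through node
    args: [
      cliPath,
      "--viewport-size", "1600x900",
      "--output-mode", "file",
      "--output-dir", `${homeDir}/.assrt/playwright-output`,
      "--caps", "devtools",
    ],
  };
}

const launch = buildMcpLaunch("/path/to/@playwright/mcp/cli.js", "/home/me");
console.log(launch.command, launch.args.join(" "));
```

An object of this shape is what a stdio transport such as the MCP SDK's StdioClientTransport expects as its command and args.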

What lives inside the runner

Six pieces make the whole mechanism work. None of them are decorative.

The stdio child process

browser.ts line 284 resolves @playwright/mcp/package.json, appends cli.js, and spawns it with StdioClientTransport from the MCP SDK. No sockets, no tokens, no cloud hop. The agent talks to the same process tree it launches.

Pinned to 0.0.70

freestyle.ts line 586 runs npm install -g @playwright/mcp@0.0.70 inside the disposable VM image. The same version is the runtime dependency for the local path, so cloud and local runs call identical tool schemas.

File-mode snapshots

CLI flags at browser.ts line 296: --output-mode file --output-dir ~/.assrt/playwright-output. Every browser_snapshot returns a .yml path instead of stuffing a giant tree into the MCP response body.

120k-char truncation

SNAPSHOT_MAX_CHARS at browser.ts line 523. Even after the file hop, oversized trees are trimmed at a line break so the agent's next turn stays inside the model's input budget. The ref you clicked is almost always in the first chunk.
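A minimal sketch of line-break truncation, assuming the simplest possible rule: cut at the budget, then back up to the last newline so no line is split. The function name truncateAtLineBreak and the sample tree lines are hypothetical; only the constant name and its value come from the source described here.

```typescript
// Ceiling quoted in this article for a single snapshot, in characters.
const SNAPSHOT_MAX_CHARS = 120_000;

// Trims text to at most maxChars, backing up to the last newline inside the
// budget so no line (and no [ref=eN] token on it) is cut in half.
function truncateAtLineBreak(text: string, maxChars: number = SNAPSHOT_MAX_CHARS): string {
  if (text.length <= maxChars) return text;
  const head = text.slice(0, maxChars);
  const lastBreak = head.lastIndexOf("\n");
  // With no usable newline inside the budget, fall back to a hard cut.
  return lastBreak > 0 ? head.slice(0, lastBreak) : head;
}

// Illustrative snapshot lines; the real tree format may differ.
const tree = [
  '- heading "Checkout" [ref=e1]',
  '- button "Pay" [ref=e2]',
  '- link "Help" [ref=e3]',
].join("\n");

// A 40-character budget ends partway through the second line,
// so only the first complete line survives.
console.log(truncateAtLineBreak(tree, 40));
```

Cutting on a newline matters more than it looks: a ref token split in half would hand the agent an id that does not exist in the page.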

Devtools capability

--caps devtools is appended to the spawn args. That turns on browser_start_video and browser_stop_video on the MCP server, which the runner uses to grab a webm of each scenario run without touching page.video().

Same engine as spec files

The MCP server's browser_click calls locator.click() on a real Playwright Page. Every auto-wait, every actionability check, every default timeout is the stock Playwright behavior you would get from @playwright/test.

The one-to-one map from agent tool to browser_* call

The agent knows eleven high-level tools (defined as an Anthropic tool schema at agent.ts line 16). Each one has a single counterpart on the Playwright MCP server. Reading this map is the fastest way to see what the runner actually does when it decides to click something.

assrt-mcp/src/core/browser.ts lines 560-670

The reason the bridge is this thin is that both sides are already doing the hard part. @playwright/mcp already translates MCP tool calls into Playwright API calls; the agent already knows how to map scenario text to tool names. McpBrowserManager just has to carry arguments across the boundary, resolve file references when --output-mode file is active, and truncate oversized snapshots. There is no clever glue inside it.
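A bridge this thin can be sketched as a single switch. What follows is an illustration under stated assumptions, not the code in browser.ts: the AgentCall and McpCall types and the toMcpCall function are hypothetical, and only four of the tools are shown. The tool names and argument shapes are the ones quoted in this piece.

```typescript
// High-level calls the agent can emit (shapes assumed for illustration).
type AgentCall =
  | { tool: "snapshot" }
  | { tool: "click"; element: string; ref: string }
  | { tool: "type_text"; element: string; ref: string; text: string }
  | { tool: "evaluate"; expression: string };

// What gets forwarded to the MCP server: a tool name plus its arguments.
interface McpCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Carries arguments across the boundary; no logic beyond renaming.
function toMcpCall(call: AgentCall): McpCall {
  switch (call.tool) {
    case "snapshot":
      return { name: "browser_snapshot", arguments: {} };
    case "click":
      return { name: "browser_click", arguments: { element: call.element, ref: call.ref } };
    case "type_text":
      return {
        name: "browser_type",
        arguments: { element: call.element, text: call.text, ref: call.ref },
      };
    case "evaluate":
      return { name: "browser_evaluate", arguments: { function: call.expression } };
  }
}

console.log(toMcpCall({ tool: "click", element: "Pay button", ref: "e12" }));
```

The discriminated union means the compiler flags any agent tool that lacks a counterpart, which is one way to keep a one-to-one map honest as tools get added.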

Following one end-to-end interaction down the stack

Take a single #Case step ("Click the Pay button") and trace it across the four actors: the scenario file, the agent, the Playwright MCP server, and the Chromium process itself. What you see is that the end-to-end test is still a plain Playwright flow. The layers on top only decide which locator.click() to call.

From scenario.md line to locator.click()

Actors: scenario.md, Assrt agent, @playwright/mcp, Chromium. Messages, in order:
1. plain-English #Case
2. browser_snapshot (accessibility tree)
3. page.accessibility.snapshot()
4. a11y tree
5. .yml file ref + refs
6. browser_click({ element, ref })
7. locator.click() + auto-wait
8. click resolved
9. browser_type({ element, text, ref })
10. locator.fill(text)
11. browser_snapshot (verify post-action state)
12. assert + complete_scenario

The full run at a terminal

The log below is what a single scenario looks like when the runner works end to end. Note the one-line mapping next to each agent tool (for example click(Pay button) -> browser_click) and the two browser_snapshot hops: one before interacting, one after, each producing a .yml file in the output directory.

npx assrt-mcp run
0.0.70: @playwright/mcp version pinned
120,000: snapshot char budget per turn
11: agent tools mapped to browser_*
1: stdio child process per run

Same engine, two radically different top layers

Tab through both. The left tab is the kind of @playwright/test spec that the rest of the web shows you for this sort of checkout flow. The right tab is the scenario.md Assrt would use. The interesting thing is that the underlying browser calls are almost identical, because the MCP server calls the same locator methods @playwright/test would, but the top layer went from TypeScript you maintain to text you describe.

Plain @playwright/test spec vs Assrt scenario.md

```typescript
// ordinary Playwright end-to-end spec
import { test, expect } from "@playwright/test";

test("user can complete checkout", async ({ page }) => {
  await page.goto("https://staging.myapp.com/checkout");

  await page
    .getByLabel("Card number")
    .fill("4242424242424242");
  await page.getByLabel("Expiry").fill("12/34");
  await page.getByLabel("CVC").fill("123");
  await page
    .getByRole("button", { name: /pay/i })
    .click();

  await expect(
    page.getByRole("heading", { name: /order complete/i })
  ).toBeVisible({ timeout: 15_000 });
});
```
The scenario.md version of this flow: 64% fewer lines.

The six things that actually happen on a run

Each of these is something you would do manually in a hand-written spec. Here they are done once, inside the runner, for every scenario.

1. You write one #Case

In /tmp/assrt/scenario.md, a few plain-English lines. "Navigate to /checkout. Fill the card fields with a Stripe test token. Assert the confirmation heading reads Order complete." No locator strings. No imports. No page object classes.

2. Runner spawns @playwright/mcp on stdio

browser.ts line 284 resolves the package, line 296 builds the arg list (--viewport-size 1600x900, --output-mode file, --caps devtools). StdioClientTransport brings the child process up. The agent now has a full MCP client talking to a real Playwright browser.

3. Agent asks for a snapshot first

Every interaction starts with browser_snapshot. The MCP server writes the accessibility tree to a .yml file in the output dir and returns the reference. McpBrowserManager resolves it, trims over-length trees at 120,000 characters (line 523), and hands the agent a stable set of [ref=eN] ids.

4. Each intent becomes a browser_* call

click → browser_click({ element, ref }). type_text → browser_type({ element, text, ref }). scroll → browser_scroll. evaluate → browser_evaluate. select_option → browser_select_option. The mapping lives at browser.ts lines 560-670 and is one function per tool.

5. Assertions run through assert, not expect()

The agent emits an assert tool call (agent.ts line 133) with a description, a boolean, and evidence. The evidence is usually a snippet of the accessibility tree or an evaluate return. Passing and failing assertions get logged into the scenario result the same way, which is what the final pass/fail is computed from.

6. Scenario completes, artifacts land

complete_scenario writes /tmp/assrt/results/latest.json with the full tool trace, screenshots, and the webm recording (browser.ts lines 625-645). The video player auto-opens in your browser unless you passed autoOpenPlayer: false. No CI integration required to view the run.
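The assertion step above says the final pass/fail is computed from the logged assertions. A minimal sketch of that aggregation, assuming each record carries the three fields named there (a description, a boolean, and evidence). The type and function names are hypothetical, and the rule that a scenario with zero assertions does not pass is an assumption of this sketch, not something the runner is known to enforce.

```typescript
// One assert tool call, with the three fields named in the steps above.
interface AssertionRecord {
  description: string;
  passed: boolean;
  evidence: string;
}

// Assumed rule: a scenario passes only if it asserted something and nothing failed.
function scenarioPassed(assertions: AssertionRecord[]): boolean {
  return assertions.length > 0 && assertions.every((a) => a.passed);
}

// Returns the descriptions of failed assertions, for a short failure summary.
function failedDescriptions(assertions: AssertionRecord[]): string[] {
  return assertions.filter((a) => !a.passed).map((a) => a.description);
}

const run: AssertionRecord[] = [
  { description: "Card form is visible", passed: true, evidence: 'textbox "Card number" [ref=e4]' },
  { description: "Heading reads Order complete", passed: false, evidence: 'heading "Payment failed" [ref=e9]' },
];

console.log(scenarioPassed(run), failedDescriptions(run));
```

Keeping the evidence string next to each verdict is what makes a failed run readable without replaying it.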

Side-by-side: @playwright/test vs the MCP path

Both columns run Chromium. Both have auto-waiting. Both can record video. The differences are in where the test lives, how selectors get discovered, and how a scenario survives a UI refactor. The right column is what happens when you delegate those three things to the MCP server and the agent instead of maintaining them in TypeScript yourself.

| Feature | Hand-written @playwright/test | Assrt through @playwright/mcp |
| --- | --- | --- |
| Where the test lives | foo.spec.ts with describe / test / expect blocks | scenario.md with plain-English #Case blocks |
| Selectors | page.getByRole / getByTestId / getByLabel strings you maintain | Accessibility refs from the live snapshot; agent rediscovers on each run |
| Engine | playwright chromium/firefox/webkit | playwright chromium/firefox/webkit (same engine, driven through MCP) |
| Scenario portability | TypeScript, tied to your repo and its @playwright/test version | Plain text; any @playwright/mcp consumer can run it |
| Codegen story | npx playwright codegen, record once, maintain forever | Write intent once; agent re-snapshots every run |
| Trace and video | trace.zip + video via playwright.config.ts | webm via browser_start_video (browser.ts:625-645) |
| Parallelism | workers in playwright.config.ts | Multiple scenario IDs, one MCP server per run |
| Vendor lock-in | None; plain Playwright | None; plain text + official Playwright MCP |

There is no rivalry between the two. @playwright/test is still the right tool when you want a pinned spec file you commit, review in PRs, and run as a gate in CI. The MCP path is the right tool when the end-to-end flow changes faster than you can maintain selectors, or when the scenario description is the artifact you actually care about.

The uncopyable anchor

Open assrt-mcp/src/core/browser.ts. Jump to line 284. That line is const pkgDir = dirname(require_.resolve("@playwright/mcp/package.json")), the one-liner that resolves the official Playwright MCP package from the local node_modules. Line 296 is the arg array (--viewport-size 1600x900, --output-mode file, --caps devtools). Line 523 holds the constant SNAPSHOT_MAX_CHARS = 120_000, the ceiling on a single a11y tree before truncation. Lines 560 through 670 are the one-to-one bridge from agent tools to browser_* calls. And if you want the pinned version, open assrt/src/core/freestyle.ts line 586: npm install -g @playwright/mcp@0.0.70. That is the full mechanism, and no other piece on this topic names those lines because they do not ship the runner.

What you stop maintaining

Each chip below is a line item from a traditional @playwright/test end-to-end suite. When the scenario is plain text and the browser is driven through the Playwright MCP server, most of this infrastructure stops being load-bearing.

page.getByTestId strings scattered across specs
playwright.config.ts per-env matrices
Locator drift after refactors
npx playwright codegen rerecording
Brittle data-testid bookkeeping
Trace.zip download, unzip, show-trace
Page object class hierarchies
Jest-style describe / test / expect imports
Vendor YAML dialects
Proprietary record-and-playback dashboards
7,500 dollar monthly SaaS seats

When a spec file still wins

Two cases. First, a hard CI gate where a specific sequence of steps absolutely must pass byte-for-byte on every commit. A pinned foo.spec.ts reviewed in a PR is easier to reason about than an agent rerunning the same intent through a live snapshot; the agent is the right tool for flows that evolve with the UI, not for contract-style gates that should not evolve at all. Second, cross-browser matrix runs where you want one execution on chromium, firefox, and webkit in lockstep. The MCP server runs one browser at a time by default, so a traditional projects config in playwright.config.ts is still the cleaner way to stamp out that matrix.
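For the second case, the projects config in question is short. A minimal sketch of a playwright.config.ts that stamps out the three-browser matrix, using the standard defineConfig and devices exports from @playwright/test; the testDir value is an assumption about where your specs live.

```typescript
// playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./tests", // assumed location of your spec files
  projects: [
    // Each project runs the same specs against a different browser engine.
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
  ],
});
```

With this in place, npx playwright test runs every spec once per project, and --project=firefox narrows a run to a single engine.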

For the flows that actually change week to week (checkout, signup, OTP, magic-link, onboarding, payment redirect), the MCP path is the one that holds up. The scenario is a plain description, the selectors regenerate every run, and the engine underneath is still the exact Playwright binary you would have been running anyway.

Questions this topic usually raises

What does Assrt actually do differently from writing @playwright/test spec files?

Assrt does not generate spec files at all. Under the hood it spawns @playwright/mcp, which is the official Playwright sibling package that turns a running browser into an MCP server. The Assrt agent then calls MCP tools like browser_click, browser_type, browser_snapshot, and browser_evaluate against that server to drive the same Playwright engine that @playwright/test would. The input is a plain-English scenario in /tmp/assrt/scenario.md. The output is screenshots, a webm recording, and a results.json. There is no intermediate .spec.ts artifact to maintain. The full spawn is in assrt-mcp/src/core/browser.ts at line 284, where require_.resolve('@playwright/mcp/package.json') pulls the CLI path and a child process is launched with --viewport-size 1600x900 --output-mode file --caps devtools.

Which version of @playwright/mcp is pinned?

0.0.70. It is hard-coded in the disposable-VM setup at assrt/src/core/freestyle.ts line 586: npm install -g @playwright/mcp@0.0.70 ws @agentclientprotocol/claude-agent-acp@0.25.0. The same version resolves when the runner is used locally because the package.json dependency matches. Pinning matters because @playwright/mcp is a young package and the MCP tool schemas (browser_click, browser_type, and so on) are still evolving between patch releases. If you want to reproduce a run months from now, the pinned version is the one whose tool schemas the agent's calls were written against.
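If you want to guard a pin like this in your own setup script, a check is a few lines. This is a hedged sketch, not something taken from the Assrt source: parseSpec and isExactPin are hypothetical helpers, and "exact" here means a plain major.minor.patch version with no range operator or dist-tag.

```typescript
interface PackageSpec {
  name: string;
  version: string | null;
}

// Splits an npm install spec such as "@playwright/mcp@0.0.70" into name and version.
// The leading "@" of a scoped name is not a separator, so the search starts at index 1.
function parseSpec(spec: string): PackageSpec {
  const at = spec.indexOf("@", 1);
  return at === -1
    ? { name: spec, version: null }
    : { name: spec.slice(0, at), version: spec.slice(at + 1) };
}

// True only for a bare x.y.z version: no ^, ~, ranges, or tags like "latest".
function isExactPin(spec: string): boolean {
  const { version } = parseSpec(spec);
  return version !== null && /^\d+\.\d+\.\d+$/.test(version);
}

for (const spec of ["@playwright/mcp@0.0.70", "@playwright/mcp@^0.0.70", "ws"]) {
  console.log(spec, isExactPin(spec));
}
```

In the install line quoted above, ws has no version at all, so a check like this would flag it as unpinned.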

How does a plain-English line turn into a Playwright call?

The agent holds a fixed set of Anthropic tool definitions at agent.ts line 16 onward: navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, assert. When it decides to click something, it emits a tool_use with the high-level name. McpBrowserManager in browser.ts lines 560-670 catches that call and forwards it to the Playwright MCP server. click becomes browser_click({ element, ref }), type_text becomes browser_type({ element, text, ref }), evaluate becomes browser_evaluate({ function: expression }). The ref comes from the accessibility snapshot the agent reads before interacting, so the call is tree-grounded, not coordinate-based.
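To show what tree-grounded means in practice, here is a minimal sketch that pulls the ref ids out of snapshot text. The function name extractRefs and the sample snapshot lines are hypothetical. The only thing assumed about the snapshot format is that each addressable element carries a [ref=eN] token on its line.

```typescript
// Collects every [ref=eN] id from snapshot text, keyed to the line it appeared on.
function extractRefs(snapshot: string): Record<string, string> {
  const refs: Record<string, string> = {};
  for (const line of snapshot.split("\n")) {
    const id = /\[ref=(e\d+)\]/.exec(line)?.[1];
    if (id !== undefined) refs[id] = line.trim();
  }
  return refs;
}

// Illustrative snapshot text; real trees are much larger.
const snapshotText = [
  '- heading "Checkout" [ref=e1]',
  "- text: Enter your card details",
  '  - button "Pay" [ref=e12]',
].join("\n");

const refs = extractRefs(snapshotText);
console.log(Object.keys(refs)); // the ids the agent could pass as "ref"
```

A click then names one of those ids, which is why the call survives layout changes that would break a coordinate-based click.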

Where are the page snapshots actually stored?

The CLI launches the MCP server with --output-mode file --output-dir ~/.assrt/playwright-output (assrt-mcp/src/core/browser.ts line 296). Every browser_snapshot returns a reference to a .yml file in that directory. The manager reads the file on the next tool hop and truncates anything over 120,000 characters (browser.ts line 523) before passing it to the model. That matters for end-to-end runs against large apps: a raw accessibility tree for a real site can blow past a context window, so the file-mode output plus one truncation pass is what keeps the run stable across dozens of interactions.

Is this actually the same Playwright engine, or a re-implementation?

Same engine. @playwright/mcp is an official Playwright project; it imports chromium, firefox, and webkit from playwright and drives them through the regular Playwright API. The MCP layer only translates tool calls into Playwright API calls. When Assrt's agent calls browser_click, inside the MCP server that becomes a locator.click() on a real Playwright Page, same as in an ordinary test.spec.ts. The auto-waiting, retrying, tracing, and screenshotting are all stock Playwright behavior; nothing gets re-implemented at the MCP level.

What about tracing and video recording, the things most end-to-end guides spend pages on?

Video is enabled via the devtools capability at browser.ts line 296 (--caps devtools) and controlled per-run through browser_start_video and browser_stop_video calls (browser.ts lines 625-645). The file is saved as a webm alongside the results.json. Tracing is not exposed as a separate tool yet; for failing runs, the agent's own tool-call transcript plus the screenshots serves the same diagnostic role. If you need a Playwright trace.zip for a specific scenario, you can point the MCP server at --output-dir of your choice and use the standard playwright show-trace viewer on whatever it writes.

How does this compare to Playwright codegen?

Playwright codegen is a one-shot UI recorder: you click through your app and it emits a test.spec.ts that you then maintain. The Assrt flow is the opposite direction. You describe the flow in a scenario.md, and the agent drives a real browser through @playwright/mcp using the same underlying Playwright engine codegen would. There is no generated .spec.ts file to edit; the scenario is the canonical artifact. If an element moves, the agent re-snapshots, sees the new tree, and calls browser_click against the new ref. Codegen, by contrast, would emit a brittle selector that now fails until you re-record.

Does it run locally or does it need a cloud?

Local by default. npx assrt-mcp spins up @playwright/mcp on stdio on your own machine. There is also a cloud mode (assrt/src/core/freestyle.ts sets up a disposable VM with chromium, Xvfb, VNC, websockify, and the MCP server), but the cloud path is optional: every scenario that works in the cloud also works with the local runner because it is the same @playwright/mcp package behind both. No vendor-specific DSL, no lock-in; the scenarios are portable text.

Is this open source?

Yes. assrt-mcp is MIT-licensed at github.com/m13v/assrt-mcp and @playwright/mcp itself is the official Microsoft-maintained Playwright package. Competing agent-testing platforms that ship proprietary YAML and cloud-only execution charge up to 7,500 dollars per month; the Assrt stack costs your own LLM token usage plus zero infrastructure when you run it locally. The scenario.md files are plain text you can grep, diff, and carry to any other Playwright MCP consumer.

What files should I open first if I want to read the actual mechanism?

Three, in this order. assrt-mcp/src/core/browser.ts lines 275-375 is where the Playwright MCP process is spawned over stdio and wired to the agent. browser.ts lines 560-670 show the one-to-one mapping from agent tools (click, type_text, snapshot, evaluate) to MCP tools (browser_click, browser_type, browser_snapshot, browser_evaluate). assrt/src/core/freestyle.ts line 586 pins the exact @playwright/mcp version for reproducible runs. Reading those three spots takes about fifteen minutes and gives you the full path from scenario text to Playwright engine.