The in-between path

Playwright MCP to a committed regression suite, the path most teams miss

Playwright MCP is great for one-off exploration inside an agent. Every article tells you to hand-port that exploration into a .spec.ts file for CI. There is a third option, and it is the one that actually scales: commit the natural-language plan, re-run it via the same MCP loop, and treat the assertions, not the keystrokes, as the contract.

Matthew Diakonov, Written with AI

Published May 22, 2026 · 9 min read

Direct answer (verified 2026-05-22)

You commit Playwright MCP scenarios to CI by saving the plan, not the keystrokes. Assrt writes every run to /tmp/assrt/scenario.md as a plain-text plan in #Case N: format, plus a sibling scenario.json with a stable UUID. Copy the .md file into your repo, commit it, and re-run it in CI with one command:

npx assrt run \
  --plan-file tests/assrt/checkout.md \
  --url $STAGING_URL \
  --json > results.json

The underlying tool is the official @playwright/mcp package. The plan is yours. There is no proprietary format and no cloud dependency for the run.

The standard advice has a hole in it

If you read the published thinking on this topic you will get the same three-step recipe everywhere. Use Playwright MCP for exploration. Watch what the agent does. Sit down and write a deterministic tests/checkout.spec.ts that does the same flow but with hard-coded selectors. Commit the .spec.ts, throw away the exploration.

The recipe is correct for one layer of testing. It is wrong for the layer most teams actually struggle with. The .spec.ts that you finish writing on a Tuesday afternoon will start failing on Wednesday morning when someone renames a button. The agent that watched the original flow would have shrugged and re-found the button by its visible text. The hand-written test breaks because you locked the path in concrete.

So you end up in a bad equilibrium. The flows that change often (signup, onboarding, checkout, the parts with three providers and two A/B tests on top) are exactly the ones where a brittle .spec.ts file costs the most to maintain, and exactly the ones where the maintenance-cost gap between hand-written code and AI-driven execution is widest. Most teams resolve that by not having tests there at all. Then they ship a bug, get burned, and write a test; the test starts failing for the wrong reasons, somebody marks it .skip(), and the cycle repeats.

The hole in the advice is that it treats "exploration" and "regression" as two different things that need two different artifacts. They do not. They can share an artifact, if that artifact is the plan, not the path.

The artifact that crosses the gap

What Assrt commits, instead of generated Playwright code, is the plan that drove the agent. The plan is plain-text Markdown. The format is documented inside the MCP server itself, so you can read it in the open source repo. Here is the actual shape:

#Case 1: Checkout with valid card
- Navigate to /pricing
- Click the "Get Pro" button
- Type "test+pw@acme.com" into the email field
- Type "4242 4242 4242 4242" into the card field
- Click "Subscribe"
- Verify the page shows "Welcome to Pro"
- Verify the URL contains /app/dashboard

#Case 2: Checkout with declined card
- Navigate to /pricing
- Click the "Get Pro" button
- Type "test+decline@acme.com" into the email field
- Type "4000 0000 0000 0002" into the card field
- Click "Subscribe"
- Verify an error message containing "declined" is visible
- Confirm the URL still ends in /checkout

That file is small, diff-friendly, reviewable in a pull request, and survives a redesign. The bullets that start with Verify and Confirm are not narration. They are the contract. The system prompt for the agent (in src/core/agent.ts) says it bluntly: every line that starts with Verify, Check, Assert, Confirm, or Ensure is a MANDATORY assertion, exactly one assert tool call per such line, no merging, no skipping, no extras.
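
To make the split between intent and contract concrete, here is a minimal sketch of a parser that separates the two kinds of bullet. It is not Assrt's own parser: `parsePlan` and `PlanCase` are hypothetical names, and the only things taken from the article are the #Case N: header shape and the five assertion keywords.

```typescript
// The five keywords the article says mark a mandatory assertion line.
const ASSERTION_LINE = /^(Verify|Check|Assert|Confirm|Ensure)\b/;

interface PlanCase {
  title: string;        // text after "#Case N:"
  steps: string[];      // bullets the agent may satisfy in its own way
  assertions: string[]; // bullets that form the contract
}

// Hypothetical helper: splits a plan into cases and separates
// action bullets from assertion bullets.
function parsePlan(plan: string): PlanCase[] {
  const cases: PlanCase[] = [];
  let current: PlanCase | undefined;
  for (const raw of plan.split(/\r?\n/)) {
    const line = raw.trim();
    const header = /^#Case\s+\d+:\s*(.*)$/.exec(line);
    if (header) {
      current = { title: header[1] ?? "", steps: [], assertions: [] };
      cases.push(current);
      continue;
    }
    const bullet = /^-\s+(.*)$/.exec(line);
    if (!current || !bullet) continue; // ignore blank lines and stray text
    const text = bullet[1] ?? "";
    if (ASSERTION_LINE.test(text)) {
      current.assertions.push(text);
    } else {
      current.steps.push(text);
    }
  }
  return cases;
}
```

Run against the two cases above, a parser like this would report five steps and two assertions per case, which is a quick way to review how much contract a plan actually carries before committing it.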

The result is a file where the assertions are the deterministic contract and the steps above them are intent the agent is allowed to satisfy in its own way. The signup page can be redesigned. The email field can move from the top of the page to a modal. The contract is still "Verify the page shows Welcome to Pro", and the agent will still go and verify it.

What Assrt writes to disk, and where

This is the part that lets the plan cross from exploration to a committed suite. Every time the MCP tool assrt_test loads or creates a scenario, the server writes it to a fixed path tree. The source for this is in src/core/scenario-files.ts:

On-disk layout

/tmp/assrt/
  scenario.md            <- the #Case plan, editable
  scenario.json          <- { id, name, url, updatedAt }
  results/
    latest.json          <- most recent run results
    <runId>.json         <- per-run results, kept by runId

The interesting part is what happens after the write. The same file is then watched with fs.watch, using a 1000 ms debounce. If you open scenario.md in an editor and change a line, Assrt sees the change, waits one second for the debounce to settle, and calls updateScenario(scenarioId, { plan, name, url }) in scenario-store.ts. The Firestore row updates. The next time any agent (yours, your CI's, your teammate's) calls assrt_test({ url, scenarioId }) with that UUID, it gets the edited plan, not the original. The scenario is, in effect, a small versioned document with the AI agent as a stable interpreter.
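
The debounce is the piece that keeps a burst of editor saves from triggering a burst of syncs. The following is a generic sketch of the technique, not the code in scenario-files.ts; `debounce` is a hypothetical helper built only on the standard setTimeout and clearTimeout.

```typescript
// Returns a function that runs `action` only after `delayMs` have passed
// with no further calls. Every new call restarts the wait.
function debounce(action: () => void, delayMs: number): () => void {
  let pending: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (pending !== undefined) clearTimeout(pending);
    pending = setTimeout(() => {
      pending = undefined;
      action();
    }, delayMs);
  };
}

// Three rapid "file changed" events collapse into one sync.
let syncCount = 0;
const onPlanChanged = debounce(() => {
  syncCount += 1; // stand-in for the real sync call
}, 1000);
onPlanChanged();
onPlanChanged();
onPlanChanged();
```

In a Node process, a function like `onPlanChanged` could be passed as the listener to fs.watch on the plan file, so that the sync fires once, one second after the last write.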

To commit it, copy /tmp/assrt/scenario.md into your repo. That file is the regression test. In CI it is read back by readFileSync(args.planFile, "utf-8") on line 478 of src/cli.ts, passed straight into the same TestAgent.run(url, plan) method the MCP tool uses, and run against your staging URL. There is no second code path. The thing your agent did in exploration is the same thing CI does on every commit.


The honest comparison

A committed plan is not a free win over hand-written .spec.ts files. It has a different cost shape, and which one you pick should depend on the layer you are testing. Here is the actual trade spread out:

Feature	Hand-written Playwright spec (.spec.ts)	Committed Assrt scenario (.md)
Cost per CI run	Sub-cent, browser time only	Cents to dimes of model inference per case
Latency per case	Hundreds of milliseconds to a few seconds	Tens of seconds (LLM step + browser action)
Survives a redesign	Breaks until selectors are rewritten	Usually, agent re-finds elements by visible text
Survives a copy change	Whole locator may need updating	Plan needs a one-word edit
Determinism guarantees	Actions and assertions both deterministic if selectors are stable	Assertions are deterministic, action path is not
Best layer of the pyramid	Short, frequent smokes (login, healthcheck, ping)	Long flows with UI churn (signup, checkout, settings)
Reviewable in a PR	Code, also diff-friendly, more verbose	Plain text, English bullets, diff-friendly
Vendor lock-in	None, Playwright is open source	None, plan format is a plain .md file

The two approaches are complementary, not competing. Most teams that adopt Assrt keep their existing .spec.ts smokes and only move the brittle, high-churn flows over.

Before: an exploration that evaporates. After: an exploration that compounds.

The thing you actually want when you give an agent a Playwright MCP loop is for the work to compound. A debugging session at 11pm should become a regression check by next morning, not vanish with the conversation buffer.

What survives a Playwright MCP session?

The agent runs through the flow once, states its assertions in chat, and returns. When the conversation ends, the browser context dies. Nothing is on disk. If you want the same check next week, you ask again and hope the agent makes the same choices.

No scenario file, no UUID, no replay
Cookies/session vanish on session close
Cannot be added to CI as-is
Other engineers cannot review what was tested

One counter-argument worth taking seriously

There is a real objection to all of this, and it shows up in the one Medium post that takes the position properly: AI-driven runs are risky for regression because two runs of the same English instruction can produce different low-level actions, and if your tests are nominally green but for slightly different reasons each time, you have lost the regression signal you wanted in the first place.

The honest reply is that this argument fails the moment you read the assertions as the contract, not the actions. A standard Playwright spec, run twice, also produces different low-level activity: different millisecond timings, different network jitter, different element appearance order. We do not call that non-deterministic because we have agreed that await expect(page.getByText("Welcome to Pro")).toBeVisible() is the thing that has to be true. The committed-plan approach extends that contract one level up: the action path is allowed to vary, the assertion outcomes are not.
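
One way to picture "the path may vary, the outcomes may not" is to compare two runs on their assertion verdicts alone. The sketch below does that under an assumed data shape: `RunRecord` is hypothetical and is not Assrt's real results schema.

```typescript
// Hypothetical shape for one run of one case.
interface RunRecord {
  actions: string[];                   // low-level steps, allowed to vary
  assertions: Record<string, boolean>; // assertion line -> passed?
}

// Two runs honour the same contract when they evaluated the same
// assertion lines and reached the same verdict on each one.
function sameContractOutcome(a: RunRecord, b: RunRecord): boolean {
  const keysA = Object.keys(a.assertions).sort();
  const keysB = Object.keys(b.assertions).sort();
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key, i) => key === keysB[i] && a.assertions[key] === b.assertions[key],
  );
}

const firstRun: RunRecord = {
  actions: ['click button "Get Pro"', "type email", "click Subscribe"],
  assertions: { 'Verify the page shows "Welcome to Pro"': true },
};
const secondRun: RunRecord = {
  actions: ["open pricing modal", 'click link "Get Pro"', "type email", "press Enter"],
  assertions: { 'Verify the page shows "Welcome to Pro"': true },
};
console.log(sameContractOutcome(firstRun, secondRun)); // true
```

The two action lists differ in length and content, yet the comparison passes, because only the assertion map is part of the contract.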

Where the objection bites is when you write assertions that are really implementation checks in disguise ("Verify the URL contains a query string starting with ?utm="), or when you rely on the agent to discover the assertion list rather than authoring it. The fix for both is editorial: keep the Verify lines tight, write them yourself before you commit, treat them like the test contract they are. The agent is good at finding paths through a UI; it is not the right author of what the UI is supposed to do.

The shape of the suite you end up with

A real production setup that takes this approach is not 100% .md scenarios. The honest layout looks like this:

tests/
  unit/                  # vitest, jest, fast
  e2e/
    smoke.spec.ts        # hand-written, milliseconds, on every commit
    health.spec.ts       # hand-written, deterministic
  assrt/
    signup.md            # AI-driven, runs on every PR
    checkout.md          # AI-driven, runs on every PR
    settings.md          # AI-driven, runs nightly
    onboarding.md        # AI-driven, runs nightly

The .spec.ts files are layer two. They cover the things you cannot afford to be slow on (login is broken, the homepage 500s, healthcheck returns a non-200). They have brittle selectors that are cheap to maintain because the underlying UI does not change often.

The .md scenarios are layer three. They cover the things that change a lot and are expensive to keep in deterministic code. Signup with three providers and a phone fallback. Checkout with two payment gateways. The settings panel that gets reskinned every quarter. These are the cases where the agent re-finding the right button by its visible text saves you the half-day of selector archaeology after every redesign. The cost per run is higher, the maintenance per redesign is dramatically lower, and on the layer where you actually had no coverage before, this is the trade you want.
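
As a rough sketch of how the PR and nightly split in that tree could be wired, the snippet below maps a CI trigger to the plan files it should run and builds the arguments for each run. `PLAN_SCHEDULE`, `buildRunArgs`, and `commandsFor` are hypothetical names; the only CLI flags used are the ones shown earlier in this article (--plan-file, --url, --json).

```typescript
type Trigger = "pull_request" | "nightly";

// Mirrors the example tests/assrt/ tree: which plans run on which trigger.
const PLAN_SCHEDULE: Record<Trigger, string[]> = {
  pull_request: ["tests/assrt/signup.md", "tests/assrt/checkout.md"],
  nightly: ["tests/assrt/settings.md", "tests/assrt/onboarding.md"],
};

// Builds the argument list for one `npx assrt run` invocation.
function buildRunArgs(planFile: string, stagingUrl: string): string[] {
  return ["assrt", "run", "--plan-file", planFile, "--url", stagingUrl, "--json"];
}

// One argument list per plan scheduled for the given trigger.
function commandsFor(trigger: Trigger, stagingUrl: string): string[][] {
  return PLAN_SCHEDULE[trigger].map((plan) => buildRunArgs(plan, stagingUrl));
}
```

A CI step could pass each argument list to npx in turn and fail the job if any run exits non-zero. Keeping the schedule in one table makes it a one-line change to promote a nightly plan to every PR.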


Frequently asked questions

Why are raw Playwright MCP runs not a regression suite?

Three reasons people repeat, and one nobody mentions. The repeated three: non-determinism (the LLM can pick a different click sequence on the same English sentence), latency (each step is seconds, not milliseconds), and token cost (every CI run charges per scenario). The fourth one is that Playwright MCP sessions are ephemeral. The server is started fresh, the browser context is wiped at exit, and nothing about that conversation lives in your repo. There is no artifact to commit, no file to diff in a pull request, no scenarioId to re-run a week later. That is the part most articles skip, and it is the part that actually keeps teams from going from one good exploration to a suite.

What does Assrt actually persist when an agent does a Playwright MCP run?

Two files on disk and one row in Firestore. The two local files: /tmp/assrt/scenario.md (the plan text in #Case format) and /tmp/assrt/scenario.json (metadata with id, name, url, updatedAt). The Firestore row: the same plan keyed by a stable UUID, so `assrt_test({ url, scenarioId: "<uuid>" })` resolves to the same exact scenario next week. The MCP server starts a fs.watch on scenario.md with a 1000 ms debounce. If you edit the file, Assrt sees the change, calls updateScenario(scenarioId, { plan, name, url }) in src/core/scenario-store.ts, and the cloud row updates. Local-only scenarios (id prefix "local-") skip the watch because they cannot sync.
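
If you read scenario.json from your own tooling, a small type guard keeps the assumptions explicit. This is a sketch, not Assrt's code: the four field names and the "local-" prefix come from the description above, while the field types (strings, with updatedAt allowed to be a string or a number) are an assumption.

```typescript
// Field names from the article; the types are assumed.
interface ScenarioMeta {
  id: string;
  name: string;
  url: string;
  updatedAt: string | number;
}

function isScenarioMeta(value: unknown): value is ScenarioMeta {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const stamp = typeof record["updatedAt"];
  return (
    typeof record["id"] === "string" &&
    typeof record["name"] === "string" &&
    typeof record["url"] === "string" &&
    (stamp === "string" || stamp === "number")
  );
}

// Parses the file's text and rejects anything missing a field.
function parseScenarioMeta(jsonText: string): ScenarioMeta {
  const parsed: unknown = JSON.parse(jsonText);
  if (!isScenarioMeta(parsed)) {
    throw new Error("scenario.json is missing id, name, url, or updatedAt");
  }
  return parsed;
}

// Local-only scenarios carry the "local-" id prefix and do not sync.
function isLocalOnly(meta: ScenarioMeta): boolean {
  return meta.id.indexOf("local-") === 0;
}
```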

How do I commit one of these scenarios to my repo?

Copy /tmp/assrt/scenario.md into your repo, somewhere like tests/assrt/checkout.md. Commit it. From then on the file is the source of truth. In CI you re-run it with `npx assrt run --plan-file tests/assrt/checkout.md --url $STAGING_URL --json > results.json`. The --plan-file flag is read at src/cli.ts line 478 (`plan = readFileSync(args.planFile, "utf-8").trim()`); the file content is passed straight through to the same TestAgent.run() that powers the MCP tool. There is no separate "CI mode" that picks a different code path. You are just running the agent loop with a plan that happens to live in version control.

Won't the agent still be non-deterministic in CI?

The action sequence is non-deterministic. The assertions are not. The plan format makes a strict distinction: every bullet line that starts with "Verify", "Check", "Assert", "Confirm", or "Ensure" is a MANDATORY assertion, and the agent prompt in src/core/agent.ts forces exactly one assert tool call per such bullet. So the agent is free to find a different way to reach the page (typing in a search box, clicking a different link), but it cannot finish the case without producing the same set of pass/fail assertions you wrote down. That is the inversion the standard advice misses: you do not need deterministic actions, you need deterministic checks.
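
The "exactly one assert call per assertion bullet" rule can also be checked after the fact. Here is a minimal sketch of such a check, assuming you can get a list of the assert calls a run made; `AssertCall` and `checkAssertionCoverage` are hypothetical and do not describe Assrt's internals.

```typescript
const ASSERTION_PREFIX = /^(Verify|Check|Assert|Confirm|Ensure)\b/;

// Hypothetical record of one assert tool call made by the agent.
interface AssertCall {
  planLine: string; // the bullet this call claims to satisfy
  passed: boolean;
}

// Counts how many times each string occurs.
function tally(items: string[]): Record<string, number> {
  const counts: Record<string, number> = Object.create(null);
  for (const item of items) counts[item] = (counts[item] ?? 0) + 1;
  return counts;
}

// Returns the problems found; an empty array means one assert call per
// mandatory bullet, with nothing skipped, repeated, or added.
function checkAssertionCoverage(planBullets: string[], calls: AssertCall[]): string[] {
  const required = tally(planBullets.filter((b) => ASSERTION_PREFIX.test(b)));
  const made = tally(calls.map((c) => c.planLine));
  const problems: string[] = [];
  for (const line of Object.keys(required)) {
    const want = required[line] ?? 0;
    const got = made[line] ?? 0;
    if (got < want) problems.push(`skipped: ${line}`);
    if (got > want) problems.push(`repeated: ${line}`);
  }
  for (const line of Object.keys(made)) {
    if (!(line in required)) problems.push(`extra: ${line}`);
  }
  return problems;
}
```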

What about cost and latency in CI?

Honest answer: this is worse than a pure Playwright .spec.ts file. Each scenario in Assrt costs a few cents to a few dimes of model inference and runs in tens of seconds, not milliseconds. The trade is real and worth being explicit about. Where this approach wins is the part of the suite that breaks every other week because the UI changed: signup, onboarding, the checkout that has six providers, the dashboard whose CSS class names rotate. There, the maintenance cost of hand-written .spec.ts dominates the cost-per-run gap. Where it loses is high-volume smoke (you want milliseconds, not seconds), so most teams will end up running both.
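
To decide where the trade lands for your own suite, it helps to do the arithmetic with your own numbers. The sketch below is only a calculator: the function names are hypothetical, and the example inputs are illustrative, not measured prices.

```typescript
// Illustrative inputs only; the article gives ranges, not measured prices.
interface SuiteCostInputs {
  cases: number;          // #Case blocks across the committed plans
  runsPerMonth: number;   // CI runs that execute them
  costPerCaseUsd: number; // model inference per case
}

function monthlyInferenceCostUsd(inputs: SuiteCostInputs): number {
  return inputs.cases * inputs.runsPerMonth * inputs.costPerCaseUsd;
}

// Hours of selector maintenance per month that the inference bill equals
// at a given engineering rate.
function breakEvenMaintenanceHours(inputs: SuiteCostInputs, hourlyRateUsd: number): number {
  return monthlyInferenceCostUsd(inputs) / hourlyRateUsd;
}

// Example with made-up numbers: 20 cases, 200 runs a month, $0.10 per case.
const exampleInputs: SuiteCostInputs = { cases: 20, runsPerMonth: 200, costPerCaseUsd: 0.1 };
console.log(monthlyInferenceCostUsd(exampleInputs).toFixed(2));        // "400.00"
console.log(breakEvenMaintenanceHours(exampleInputs, 100).toFixed(1)); // "4.0"
```

With those made-up inputs, the committed plans pay for themselves if they save more than about four hours of selector repair a month. For a high-volume smoke suite the runs-per-month term grows quickly, which is the arithmetic behind keeping smokes in .spec.ts.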

Can I migrate to plain Playwright code later if I change my mind?

Yes. The plan file is plain text. It is your file. There is no proprietary YAML, no opaque DSL, no SaaS-only format that becomes a hostage. The underlying tool is the official @playwright/mcp package (the dep is pinned in assrt-mcp/package.json), and the agent uses the standard Playwright MCP verbs (navigate, snapshot, click, type_text, press_key, etc., defined in src/core/agent.ts). If you decide to move the entire suite into a hand-written tests/*.spec.ts shape, your scenario.md files are an honest scaffold for that work, not a dead end. Zero vendor lock-in is a literal property of the file format.
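
As an illustration of the "scaffold" idea, a plan case can be turned mechanically into an empty Playwright test with each bullet as a TODO. This is a sketch of one possible migration aid, not a feature of Assrt: `scaffoldSpec` is a hypothetical helper, and it deliberately does not guess selectors.

```typescript
// Turns one #Case block into the text of a Playwright test skeleton.
// Each bullet becomes a TODO comment for a human to implement.
function scaffoldSpec(caseTitle: string, bullets: string[]): string {
  const body = bullets.map((bullet) => `  // TODO: ${bullet}`).join("\n");
  return [
    'import { test, expect } from "@playwright/test";',
    "",
    `test(${JSON.stringify(caseTitle)}, async ({ page }) => {`,
    body,
    "});",
  ].join("\n");
}

console.log(
  scaffoldSpec("Checkout with valid card", [
    "Navigate to /pricing",
    'Verify the page shows "Welcome to Pro"',
  ]),
);
```

The output is a .spec.ts file that already names the test and lists, in order, every step and assertion the hand-written version has to cover.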

Where does this fit relative to a real Playwright .spec.ts file?

Think in layers. Layer one is unit and contract tests in code, run on every commit. Layer two is the deterministic UI smokes you hand-write in tests/*.spec.ts, the ones that have to run in a few seconds. Layer three is the long, flow-shaped things that change a lot: signup, checkout, settings, the things that today most teams either skip or pay a QA contractor to update. Assrt's committed scenarios are layer three. They live in the repo, they re-run in CI, and the assertions are the contract; the path the agent takes to get there is allowed to wiggle. Most articles compress these layers into one, which is why the standard advice ("port everything to .spec.ts") sounds reasonable but doesn't survive contact with the projects that actually need it.

Does this work with the local Chrome the developer is already logged into?

Yes. Pass extension: true to assrt_test (or --extension on the CLI) and Assrt connects to your existing Chrome via CDP using Playwright's --extension mode. The cookies, the saved session, the OTP that arrived in your dev inbox, all reachable in the run. On first use you paste a one-time extension token; it is then saved to your keychain. The same scenario.md works in both modes; the only thing that changes is whether the run launches its own Chromium or attaches to the one you are already using.