Honest answer to a Twitter question

Reading the generated Playwright code, would you actually ship it to main?

The honest answer is shorter than you expect. With Assrt there is no .spec.ts to ship. The diff that lands on main is one dependency line in package.json and a Markdown scenario file. Below is the literal diff, with file paths from the open-source repo so you can verify it yourself.

Matthew Diakonov, Written with AI

Published May 8, 2026 · 6 min read

Direct answer (verified 2026-05-08)

Yes, you would ship it, because the part people fear shipping (a 200-line .spec.ts with brittle locators) is not in the diff. What lands on main is one new line in package.json (Microsoft's official @playwright/mcp pinned at 0.0.70) plus a scenario.md file with your test plan in English. Verifiable in the public source at github.com/assrt-ai/assrt-mcp.

The diff, two ways

Here is what most AI test generators commit. Assrt's side of the diff, a package.json line and a scenario.md file, is walked through file by file further down. The line counts are the headline; the absence of selectors is the substance.

What lands in your repo

```typescript
// what most AI test generators put in your repo
// 200+ lines of .spec.ts that lock in selectors at write time

import { test, expect } from "@playwright/test";
import { LoginPage } from "./page-objects/login";
import { DashboardPage } from "./page-objects/dashboard";

test.describe("Signup flow", () => {
  test.beforeEach(async ({ page }) => {
    await page.goto("/");
  });

  test("user can sign up with email", async ({ page }) => {
    await page.locator("button.signup-cta").click();
    await page.locator('input[name="email"]').fill("test@example.com");
    await page.locator('button[type="submit"]').click();
    await page.waitForSelector(".otp-input-1");
    // ... 40 more lines of selector pinning ...
    await expect(page.locator("h1.dashboard-title")).toBeVisible();
  });

  // ... 6 more tests, ~30 lines each ...
});
```

The Assrt side of the same diff: 38% fewer lines, zero selectors.

What you commit, file by file

A reviewer looks at four entries. Two are files most Node projects already have (package.json and a CI workflow). One is the scenario file, which is the only thing Assrt asks you to write. The fourth is everything that is not there.

1. package.json: one new dependency line, Microsoft's official @playwright/mcp pinned at 0.0.70.
2. tests/scenario.md: plain Markdown, #Case blocks, steps in English. No imports, no fixtures, no locators.
3. .github/workflows/test.yml: one step that runs `npx @m13v/assrt run --url $PREVIEW_URL`. CI uploads videos on failure.
4. (nothing else): no .spec.ts files, no page-objects/, no playwright.config.ts you did not already have.

File 1: the one new dependency line

This is the entire Playwright surface in your repo: one line in package.json. It points to a package published on npm by Microsoft's Playwright organization. The runtime image installs the same version: src/core/freestyle.ts line 586, in the main assrt repo, runs npm install -g @playwright/mcp@0.0.70.

```json
{
  "dependencies": {
    "@playwright/mcp": "^0.0.70"
  }
}
```

The Assrt repo at github.com/assrt-ai/assrt-mcp consumes that package directly. The caret does not loosen the pin. For a 0.0.x version, npm's ^ range admits only that exact patch, so ^0.0.70 means 0.0.70 and nothing newer.
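npm's caret rule is what makes `^0.0.70` a patch-level pin. The sketch below re-implements just that rule so you can see why 0.0.71 is excluded. It handles plain MAJOR.MINOR.PATCH versions only, with no prerelease tags. `caretAllows` is a hypothetical helper written for this post; in a real project you would rely on the `semver` package instead.

```typescript
// Minimal sketch of npm's caret (^) rule for plain MAJOR.MINOR.PATCH versions.
// Hypothetical helper for illustration only; real tooling should use `semver`.
type Version = [number, number, number];

function parse(v: string): Version {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]] as Version;
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// ^base allows versions >= base that keep the left-most non-zero component fixed:
// ^1.2.3 is <2.0.0, ^0.2.3 is <0.3.0, and ^0.0.3 is <0.0.4.
function caretAllows(range: string, candidate: string): boolean {
  if (!range.startsWith("^")) throw new Error("expected a caret range");
  const base = parse(range.slice(1));
  const v = parse(candidate);
  let upper: Version;
  if (base[0] > 0) upper = [base[0] + 1, 0, 0];
  else if (base[1] > 0) upper = [0, base[1] + 1, 0];
  else upper = [0, 0, base[2] + 1];
  return compare(v, base) >= 0 && compare(v, upper) < 0;
}

console.log(caretAllows("^0.0.70", "0.0.70")); // true
console.log(caretAllows("^0.0.70", "0.0.71")); // false
console.log(caretAllows("^0.0.70", "0.1.0")); // false
```

The same caret on a 1.x package would float across minor releases. That is why the 0.0.x version number, not the caret, is doing the pinning here.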

File 2: the scenario.md plan

This is the part you write. The whole test suite for a small SaaS fits on a phone screen. The path on disk is /tmp/assrt/scenario.md during runs (defined at src/core/scenario-files.ts line 17). Most teams copy the file into tests/scenario.md and commit that path so it lives next to the rest of the codebase.

```markdown
#Case 1: Sign up with email
Use a disposable email
Enter the OTP from the inbox
Verify the dashboard heading appears

#Case 2: Subscribe to the paid plan
Click upgrade
Pay with the Stripe test card
Verify the plan badge says Pro

#Case 3: Cancel and confirm refund posted
Open billing
Cancel the subscription
Verify the refund email arrived in the disposable inbox
```

Three cases. Twelve content lines. Reviewable in under a minute.
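The format is simple enough to read with a few lines of TypeScript, for example to lint the plan in CI or count cases. This is a minimal sketch, not Assrt's own parser. It assumes a case starts at a `#Case <n>: <title>` line and that every non-blank line after it, up to the next header, is one step. `parseScenario` and `ScenarioCase` are hypothetical names.

```typescript
// Minimal sketch of reading the #Case format shown above. NOT Assrt's parser:
// it assumes a case starts at a "#Case <n>: <title>" line and that every
// non-blank line after it, up to the next header, is one step.
interface ScenarioCase {
  number: number;
  title: string;
  steps: string[];
}

const CASE_HEADER = /^#Case\s+(\d+):\s*(.+)$/;

function parseScenario(markdown: string): ScenarioCase[] {
  const cases: ScenarioCase[] = [];
  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // blank lines only separate cases
    const header = CASE_HEADER.exec(line);
    if (header) {
      cases.push({ number: Number(header[1]), title: (header[2] ?? "").trim(), steps: [] });
      continue;
    }
    // Any other line is a step of the most recent case; text before the
    // first header has no case to belong to and is skipped.
    const current = cases[cases.length - 1];
    if (current) current.steps.push(line);
  }
  return cases;
}

const plan = `#Case 1: Sign up with email
Use a disposable email
Enter the OTP from the inbox
Verify the dashboard heading appears`;

const parsed = parseScenario(plan);
console.log(parsed.length, parsed[0]?.steps.length); // 1 3
```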

What a reviewer used to see, and what they see now

The mental-model shift is bigger than the line-count shift. Here is the before picture:

A pull request that touches the test suite

The PR adds tests/signup.spec.ts (160 lines) and tests/page-objects/login.ts (80 lines). The reviewer skims, sees plausible-looking calls to page.locator(...).click() and waitForSelector(...), and approves on the assumption the test passes locally. Six weeks later half the selectors are stale, the test silently passes by clicking a different button, and the bug ships.

- 240 lines of TypeScript to read
- Selectors pinned at write time
- Approval is on faith; nobody re-reads the file
- Decay starts on day one

The five questions a reviewer actually asks

The Twitter version of this question has a five-part subtext. Here is each part with a one-paragraph answer.

1. Will this test invent a Playwright API that does not exist?

No. The agent does not emit Playwright code. It calls 18 fixed tools defined as a TypeScript array at src/core/agent.ts lines 14 to 196 (navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable). If the model tries to invent a method, the MCP server rejects the call. Hallucinated APIs are structurally impossible; the surface is a tool schema.
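The difference between emitting code and calling a fixed tool list fits in a few lines. The sketch below declares the 18 tool names listed above and rejects any call whose name is not among them. The tool names come from the text. `rejectUnknownTool` and the `ToolCall` shape are hypothetical, and the real check in agent.ts or the MCP server may look different.

```typescript
// Sketch of why a fixed tool schema rules out invented APIs. Tool names are the
// 18 listed in the text; the validation logic is a hypothetical illustration.
const TOOL_NAMES = [
  "navigate", "snapshot", "click", "type_text", "select_option", "scroll",
  "press_key", "wait", "screenshot", "evaluate", "create_temp_email",
  "wait_for_verification_code", "check_email_inbox", "assert",
  "complete_scenario", "suggest_improvement", "http_request", "wait_for_stable",
] as const;

type ToolName = (typeof TOOL_NAMES)[number];

interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

function isKnownTool(name: string): name is ToolName {
  return (TOOL_NAMES as readonly string[]).includes(name);
}

// Returns null when the call targets a declared tool, or an error message otherwise.
function rejectUnknownTool(call: ToolCall): string | null {
  return isKnownTool(call.name) ? null : `unknown tool: ${call.name}`;
}

console.log(rejectUnknownTool({ name: "click", input: { element: "Sign up button", ref: "e5" } })); // null
console.log(rejectUnknownTool({ name: "page.clickByText", input: {} })); // unknown tool: page.clickByText
```

A model can still choose the wrong tool or the wrong arguments. What it cannot do is produce a call shape that does not exist, which is the failure mode a code generator has.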

2. Does the test pin selectors that will rot?

No. There are no selectors in the plan. The click tool's input schema (agent.ts lines 32 to 42) accepts only element (a human description) and ref (an opaque id from the live accessibility tree). The ref is generated fresh by a snapshot call before every interaction. Nothing about the page's DOM structure is committed.

3. Is the underlying browser automation a real Playwright build?

Yes. The runtime is @playwright/mcp@0.0.70, published by Microsoft's Playwright org on npm. The pin is at package.json line 19 in the open-source assrt-mcp repo and at src/core/freestyle.ts line 586 in the main assrt repo. Every click ends up as Playwright's own click inside a real Chromium process.

4. What gets uploaded to CI on failure?

A WebM video, a PNG per step (named with a zero-padded sequence number), and an execution log with the agent's reasoning at each turn. All of it goes to /tmp/assrt/<runId>/ on the runner, and CI uploads the directory as a workflow artifact. None of it is a source of truth that needs to be committed.

5. If I leave Assrt later, do I keep the work?

Yes. The Markdown plan is yours; the runtime is Microsoft's npm package. There is no proprietary YAML, no closed-source binary, no SaaS data lock. Exporting a static .spec.ts is a one-shot operation when you want to freeze the test, and the agent itself is MIT-licensed at github.com/assrt-ai/assrt-mcp. Worst case you fork it.

The honest caveat

The runtime cost is not zero. Every test run drives an LLM, which is slower and more expensive per execution than a hand-written .spec.ts on a stable selector. At MVP scale (a few flows, a few runs per day) the math is not close: the maintenance you avoid is worth far more than the tokens. At thousand-runs-per-day scale on a frozen UI, the calculus shifts; that is when most teams export the plan to standard Playwright and pin the selectors. The framework supports both shapes because every #Case in the plan is already a list of imperative steps a Playwright export can codify mechanically.

The other caveat: you still review the plan. The cost is bounded because the plan is short, but it is not zero. A reviewer should read the four lines of each case the same way they would read an acceptance criterion in a ticket: does this describe the feature correctly? If yes, ship.

Want to see the diff against your repo?

Bring a small flow on a real branch. We will run Assrt against it on a call, walk through the literal git diff that would land on main, and hand back the scenario.md so you keep it whether you adopt the framework or not.

Common questions

What does Assrt actually put in my repo if I run it on a feature branch?

Two things, plus one CI step if you want it. The first is a single dependency line in package.json: `"@playwright/mcp": "^0.0.70"`. The second is a plain Markdown file at `tests/scenario.md`, or wherever you choose to commit it. During a run the file lives at `/tmp/assrt/scenario.md`, which is defined at src/core/scenario-files.ts line 17 in the assrt-mcp repo. The Markdown file is your test plan in English: a series of `#Case` blocks with one step per line. There is no .spec.ts file, no helpers folder, no Page Object class, no fixtures, and no playwright.config.ts that you did not already have. The total surface a reviewer reads is the package.json line plus the scenario file; the optional CI step is the one-line `npx @m13v/assrt run` call in your workflow.

So if I read the generated code, what am I actually reading?

Markdown. The scenario file looks like `#Case 1: Sign up with email`, then `Use a disposable email`, then `Enter the OTP from the inbox`, then `Verify the dashboard heading appears`. That case is four lines. A reviewer reads it the way they would read a bug report or an acceptance criterion. It is reviewable because there is no executable code mixed in, so the reviewer does not have to mentally compile anything while also checking intent. They just check intent. The executable code lives upstream, in `@playwright/mcp@0.0.70`, which is Microsoft's own package on npm.

Where are the selectors? Every Playwright test I have ever read has selectors.

Nowhere on disk. Selectors get resolved at run time from the live accessibility tree. The agent calls a `snapshot` tool, defined at src/core/agent.ts lines 27 to 30 in the assrt-mcp repo. It gets back the page's a11y tree with elements tagged `[ref=e5]`, `[ref=e12]`, and so on. It then passes the matching ref to the `click` tool (agent.ts lines 32 to 42). The click tool's input schema accepts `element` (a human description) and `ref` (the live id). It does not accept a CSS string, an XPath, or a data-testid, and there is no field in the schema where you could put one. No selector ever lands in your repo, because none ever leaves run time.
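Here is a sketch of that snapshot-then-click contract. It assumes snapshot lines shaped like `- button "Sign up" [ref=e5]`, modeled on the `[ref=e5]` tags described above; the exact snapshot format of @playwright/mcp may differ. It checks two things: the click input has only `element` and `ref`, and the ref appears in the snapshot taken just before the click. `refsInSnapshot` and `validateClick` are hypothetical helpers, not Assrt or @playwright/mcp code.

```typescript
// Sketch of the snapshot-then-click contract. The snapshot line format is an
// assumption modeled on the [ref=e5] tags described above; these helpers are
// hypothetical, not Assrt or @playwright/mcp code.
interface ClickInput {
  element: string; // human description, e.g. "Sign up button"
  ref: string; // opaque id copied from the most recent snapshot
}

// Map each ref found in a snapshot to the line that declared it.
function refsInSnapshot(snapshot: string): Map<string, string> {
  const refs = new Map<string, string>();
  for (const line of snapshot.split("\n")) {
    const ref = /\[ref=(e\d+)\]/.exec(line)?.[1];
    if (ref) refs.set(ref, line.trim());
  }
  return refs;
}

// Accept a click only if it carries exactly { element, ref } and the ref is
// present in the snapshot taken just before the interaction.
function validateClick(input: Record<string, unknown>, snapshot: string): ClickInput {
  const keys = Object.keys(input).sort();
  if (keys.join(",") !== "element,ref") {
    throw new Error(`click accepts only element and ref, got: ${keys.join(", ")}`);
  }
  const { element, ref } = input;
  if (typeof element !== "string" || typeof ref !== "string") {
    throw new Error("element and ref must be strings");
  }
  if (!refsInSnapshot(snapshot).has(ref)) {
    throw new Error(`ref ${ref} is not in the current snapshot`);
  }
  return { element, ref };
}

const snapshotText = [
  '- heading "Welcome" [level=1] [ref=e3]',
  '- button "Sign up" [ref=e5]',
].join("\n");

console.log(validateClick({ element: "Sign up button", ref: "e5" }, snapshotText));
// { element: 'Sign up button', ref: 'e5' }

try {
  validateClick({ selector: "button.signup-cta" }, snapshotText);
} catch (err) {
  console.log((err as Error).message); // click accepts only element and ref, got: selector
}
```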

Is the @playwright/mcp dependency from Microsoft, or some fork?

Microsoft's official package, pinned at the patch level. Look at package.json line 19 in the assrt-mcp repo: `"@playwright/mcp": "^0.0.70"`. The base-image setup at src/core/freestyle.ts line 586, in the main assrt repo, runs `npm install -g @playwright/mcp@0.0.70 ws @agentclientprotocol/claude-agent-acp@0.25.0`, frozen at the exact patch version. That package is published by the Microsoft Playwright organization on npm. There is no fork, no in-house wrapper, and no proprietary automation library. If Microsoft ships a Playwright fix, you bump the pin in your package.json and you get it.

OK, but if there is no .spec.ts in the repo, what does CI run?

CI runs `npx @m13v/assrt run --url <preview-url>` (or the equivalent local dev URL). The CLI loads `scenario.md`, spins up `@playwright/mcp@0.0.70` against a real Chromium, and the agent works through each `#Case`. The artifacts CI uploads on failure are a WebM video, a PNG per step (named with a zero-padded sequence number), and an execution log with the agent's reasoning. The framework is MIT-licensed; the source is at https://github.com/assrt-ai/assrt-mcp. There is no SaaS gate, no API key beyond your LLM token, and no closed-source binary in the path. CI runs it, uploads the artifacts, and is done.

Would I actually ship a scenario.md to main, given that LLMs hallucinate?

Yes, with one caveat. The hallucination class that bites code-generating tools (invented method names, made-up imports, wrong API shapes) cannot occur here, because the agent does not emit code. It calls 18 fixed tools defined as a TypeScript array at agent.ts lines 14 to 196. If the model tries to invent a method, the MCP server rejects the call. So the residual hallucination risk is limited to the plan itself: the model proposing a step the product cannot actually do. That risk is bounded by the fact that the plan is plain English you read before merging. A 4-line `#Case 1: Sign up` is roughly as easy to review as a Linear ticket. The mistake-spotting load is low because the surface is small.

What about the run output? Do I commit videos or screenshots?

No. Run output goes to `/tmp/assrt/<runId>/` and is regenerated on every run. The video is at `video/recording.webm`, the screenshots are under `screenshots/NN_stepN_action.png`, and the log is at `execution.log`. That directory sits outside the repository, so nothing lands in git by default. The convention is to gitignore any `tmp/` copy of it and rely on CI to upload the artifacts on failure. Nothing in the run output is a source of truth that needs to be checked in. The source of truth is the Markdown plan and the dependency pin. Everything else is recreated.
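Here is a sketch of that layout in TypeScript, mainly to show why the screenshot names are zero-padded: it makes lexical sorting match execution order. The directory and file names come from the paths above. Reading `NN` as a two-digit sequence number and `N` as the step number is an assumption, and `runArtifacts` and `screenshotPath` are hypothetical helpers.

```typescript
// Sketch of the run-artifact layout described above. The directory names come
// from the text; how NN, N and the action are filled in is an assumption.
const ARTIFACT_ROOT = "/tmp/assrt";

interface RunArtifacts {
  dir: string;
  video: string;
  log: string;
}

function runArtifacts(runId: string): RunArtifacts {
  const dir = `${ARTIFACT_ROOT}/${runId}`;
  return {
    dir,
    video: `${dir}/video/recording.webm`,
    log: `${dir}/execution.log`,
  };
}

// Zero-padding the sequence number keeps screenshots in execution order when
// sorted by name: "02_..." sorts before "10_...", while "2_..." would not.
function screenshotPath(runId: string, seq: number, step: number, action: string): string {
  const nn = String(seq).padStart(2, "0");
  return `${runArtifacts(runId).dir}/screenshots/${nn}_step${step}_${action}.png`;
}

console.log(screenshotPath("run-123", 3, 3, "click"));
// /tmp/assrt/run-123/screenshots/03_step3_click.png
```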

What is in the standard Playwright export, if I want one?

If you eventually want a portable .spec.ts (for example, your team graduates past MVP stage and wants to lock down the test in a stable file), Assrt's plan format converts cleanly because every `#Case` is already a list of imperative steps. The export is a one-shot operation, not a continuous sync. The reason most teams do not do this on day one is that the Markdown plan is cheaper to maintain through UI churn (the agent re-reads the a11y tree on every run; a static .spec.ts pins selectors at write time and breaks at run time). If the UI is stable and the test is locked in, exporting and committing the .spec.ts is the right move. If the UI is still moving, leave it as Markdown.
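To make "converts cleanly" concrete, here is a hypothetical one-shot exporter. Each `#Case` becomes a Playwright `test()`, and each English step becomes a `test.step()` whose body is a TODO. It illustrates the idea and is not Assrt's export. The TODOs are exactly where a real export would pin the locators that the Markdown plan leaves to run time.

```typescript
// Sketch of a one-shot export from #Case plans to a Playwright spec skeleton.
// Hypothetical illustration, not Assrt's exporter: every step body is a TODO
// standing in for the locator a real export would pin at export time.
interface ExportCase {
  title: string;
  steps: string[];
}

// Quote a string as a TypeScript string literal; JSON escaping is valid TS.
const lit = (s: string): string => JSON.stringify(s);

function exportSpec(cases: ExportCase[]): string {
  const lines: string[] = ['import { test } from "@playwright/test";', ""];
  for (const c of cases) {
    lines.push(`test(${lit(c.title)}, async ({ page }) => {`);
    for (const step of c.steps) {
      lines.push(`  await test.step(${lit(step)}, async () => {`);
      lines.push("    // TODO: codify this step with a locator pinned at export time");
      lines.push("  });");
    }
    lines.push("});", "");
  }
  return lines.join("\n");
}

console.log(exportSpec([{ title: "Sign up with email", steps: ["Use a disposable email"] }]));
```

The output is a skeleton, not a passing test. Freezing it means filling each step with locators, which is the maintenance cost the Markdown plan defers.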

Why does this matter on a Twitter thread? People ask the question because they have been burned.

Right. The pattern that burns people is: an AI tool drops 200 lines of .spec.ts into the repo, the lines compile, the test passes locally, and three sprints later half the selectors are stale, the test was getting around the bug all along by clicking a different button, and nobody can review the file because it is too long and too brittle to read line by line. The reframe is to stop generating the part that decays. Assrt's plan is short enough to read on a phone (5 to 12 lines per case is typical), the executable part is upstream Microsoft code, and the selector layer is regenerated every run. The thing you ship to main is the part you would have written by hand anyway, but in English instead of TypeScript.

Where do I look in the source if I want to verify all of this myself?

Four files:
- package.json line 19 in the assrt-mcp repo, for the @playwright/mcp pin.
- src/core/agent.ts lines 14 to 196, for the 18-tool TOOLS array (none of which accept a CSS selector).
- src/core/scenario-files.ts lines 16 to 20, for where scenario.md and the run artifacts get written.
- src/core/freestyle.ts line 586, in the main assrt repo rather than assrt-mcp, for the base-image install line that pins @playwright/mcp@0.0.70 on the runtime.

The assrt-mcp repo is at https://github.com/assrt-ai/assrt-mcp. If you clone it and grep for `"selector":` in the source, you get zero hits in the agent surface. That is the structural answer to the headline question.
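You can also script the check instead of running grep. This is a sketch: it counts occurrences of the literal `"selector":` in `.ts` files under a directory you pass in, for example `src/core` in a clone of assrt-mcp. `countIn` and `scanTree` are hypothetical names. `fs.readdirSync` with `withFileTypes` and `fs.readFileSync` are standard Node APIs.

```typescript
// Sketch of the `grep '"selector":'` check as a Node script.
// countIn and scanTree are hypothetical helpers written for this post.
import * as fs from "node:fs";
import * as path from "node:path";

// Count non-overlapping occurrences of `needle` in one file's contents.
function countIn(contents: string, needle: string): number {
  if (needle === "") throw new Error("needle must not be empty");
  let count = 0;
  for (let i = contents.indexOf(needle); i !== -1; i = contents.indexOf(needle, i + needle.length)) {
    count++;
  }
  return count;
}

// Walk a directory tree and report every .ts file that contains the needle.
function scanTree(dir: string, needle: string): Map<string, number> {
  const hits = new Map<string, number>();
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      scanTree(full, needle).forEach((n, file) => hits.set(file, n));
    } else if (entry.isFile() && full.endsWith(".ts")) {
      const n = countIn(fs.readFileSync(full, "utf8"), needle);
      if (n > 0) hits.set(full, n);
    }
  }
  return hits;
}

// From a clone of the repo, an empty map is the "zero hits" result:
// console.log(scanTree("src/core", '"selector":'));
```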

Keep reading

Review workflow

AI-generated Playwright tests review: watch the run, not the .spec.ts

If reading the .spec.ts is not the right review surface, what is? The plan, the per-step PNG, the WebM, the agent's a11y refs.


Authoring

Readable Playwright test code: delete the selector line

Why selectors are the unreadable part of every Playwright test, and how Assrt's plan format avoids them by resolving against the a11y tree at run time.


MVP playbook

E2E tests for an MVP: the three-test minimum that survives daily ship

Three flows cover an MVP. Signup, the paid action, billing. The OTP one is the hard one. Source-line walkthrough.

