Test Automation, from First Principles
How to test automation when the test is a paragraph of markdown
Every guide on this keyword tells you the same thing: plan the suite, pick a framework, write the code, run it, report the results. That flow assumes a human authors selectors and wait conditions. This page is about the other way: you describe the scenario in plain English in a file under tests/*.md, an AI agent drives a real Playwright browser against your app, and you get back a six-key JSON report plus a recording you can scrub through at up to 10x.
The shape every other guide teaches
Search the keyword and the top five results read like one article. They tell you to list the manual cases you already run, pick a framework (usually Selenium, occasionally Playwright or Cypress), install a runner, wire it into CI, and start writing code. Then they show a snippet in Java or Python with driver.findElement(By.id(...)) and an explicit wait. The sample test is a login form.
This pipeline works, but every step below the "pick a framework" line is a tax. You maintain selectors. You author waits. You translate what the user does into method calls. When the UI shifts slightly, the test breaks even though the behavior did not. The authoring loop gets long enough that tests become a separate engineering workstream from the feature work, which is exactly the point where most teams decide automation is not worth it.
The thing missing from those guides is a shorter path: keep the real browser, drop the code. The rest of this page is what that path looks like in practice.
Framework-code flow vs. #Case flow
Pick Selenium or Cypress, install the SDK, bootstrap a project, import a page-object framework, write Java or TypeScript, author selectors, add explicit waits, maintain the suite across refactors.
- Test is a .java or .ts file of selectors
- Human translates intent into code
- UI refactor breaks tests even when behavior is unchanged
- Requires a skill set parallel to feature work; testing becomes a separate workstream
The scenario format the agent executes
The entire test language is one rule: a line that starts with #Case N: begins a new scenario. Everything under it, until the next #Case line, is the instruction set for that scenario in plain English. The parser is a regex in src/core/agent.ts of the open-source assrt-mcp repo (lines 259 and 568). There is no other grammar.
Here is a real three-case file a reader could drop into any Next.js or React app with a signup route.
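A sketch of such a file follows. The route, field names, and assertions are assumptions about a generic signup flow, not copied from a real app; each line under a #Case header is just English the agent interprets.

```markdown
#Case 1: Signup happy path
Go to /signup. Enter a fresh email address and a valid password, submit
the form, and assert the browser lands on /dashboard with a welcome
message visible.

#Case 2: Duplicate email is rejected
Go to /signup and submit the form with an email that already has an
account. Assert an inline error appears and the URL stays on /signup.

#Case 3: Password too short
Go to /signup, enter a one-character password, and submit. Assert the
password field shows a validation message and the form does not submit.
```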
What the runner actually does
When the agent calls assrt_test, it spawns a Playwright browser under the hood, loads your URL, and for each #Case it takes an accessibility-tree snapshot, then calls into a small set of internal tools (navigate, click, type, snapshot, wait, evaluate, assert, create_temp_email, wait_for_verification_code, complete_scenario). Elements are targeted by [ref=eN] ids pulled off the accessibility tree, not CSS selectors, so the tests are resilient to class-name and DOM-structure changes.
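For intuition, an accessibility-tree snapshot reduces the page to roles, labels, and stable refs. The lines below are a hypothetical rendering of a signup form, not the tool's literal output:

```
- heading "Create your account" [ref=e1]
- textbox "Email" [ref=e2]
- textbox "Password" [ref=e3]
- button "Sign up" [ref=e4]
```

A click step targets [ref=e4] by its ref, so renaming a CSS class or re-nesting the surrounding divs leaves the step valid as long as the button is still a button labeled "Sign up".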
Inputs, runner, outputs
The three MCP tools your coding agent calls
Assrt is not a CLI you remember commands for; it is a Model Context Protocol server that exposes exactly three tools to whichever coding agent you already use (Claude Code, Cursor, any MCP-aware client). The three names are defined in src/mcp/server.ts of the assrt-mcp repo and they cover the whole loop.
assrt_plan
Point it at a URL. The agent crawls a few pages, inspects the accessibility tree, and drafts #Case scenarios for you. Good for the first pass against a new route.
assrt_test
The core verb. Takes a URL and a plan file, executes every #Case in a real Playwright browser, and returns the six-key TestReport. This is the tool the hook fires on git commit.
assrt_diagnose
When a case fails, hand the runId to this tool. It reads the video, the trace, and the failure context and proposes a patch to either the scenario or the application. The agent applies it in the same turn.
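Under MCP, each of these is a standard tools/call request from the agent to the server. The argument names below are assumptions for illustration, not the server's published schema:

```json
{
  "method": "tools/call",
  "params": {
    "name": "assrt_test",
    "arguments": {
      "url": "http://localhost:3000",
      "plan": "tests/signup.md"
    }
  }
}
```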
Checked-in scenarios
Your #Case files live in the repo next to the feature. They survive machine switches, reviews, and vendor migrations. There is no proprietary format to export.
The contract you program against
Whatever you chain onto the runner, the output is the same shape. Six top-level keys. Per-scenario children with a boolean pass, a steps array, and a summary. Every integration (CI gate, Slack bot, dashboard, PR comment) reads the same JSON.
The whole schema file is under 110 lines and lives in the open-source assrt-mcp repo (src/core/types.ts). The full CI-gate expression is jq '.failedCount' run.json. There is no second schema you have to learn to consume this.
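For pipelines without jq, the same gate is a few lines of Python — a minimal sketch that assumes only the six documented keys:

```python
def gate(report: dict) -> int:
    """Return a CI exit code from a six-key TestReport dict."""
    failed = report["failedCount"]
    if failed:
        print(f"FAIL: {failed} scenario(s) failed")
        return 1
    print(f"PASS: {report['passedCount']} scenario(s) in {report['totalDuration']}s")
    return 0

# In CI you would json.load /tmp/assrt/results/latest.json; a sample
# dict stands in here so the sketch runs anywhere.
sample = {"url": "http://localhost:3000", "scenarios": [],
          "totalDuration": 12.4, "passedCount": 3, "failedCount": 0,
          "generatedAt": "2026-01-01T00:00:00Z"}
exit_code = gate(sample)  # 0 when failedCount is 0
```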
A run, end to end, from the terminal the agent sees
Here is what the loop looks like from the agent's point of view. The agent commits a feature. A PostToolUse hook reminds it to verify. It calls assrt_test. If a case fails, the agent reads the JSON, watches the recording at 5x, calls assrt_diagnose, patches, and re-runs. The whole thing happens inside the same conversation that produced the change.
Where every file lands
The runner is deliberate about disk layout. Nothing is scattered across temp directories; everything lives under two roots so your editor, your hook scripts, and your eyes always know where to look.
/tmp/assrt/scenario.md
The #Case file the agent is currently running. Human-editable. Auto-synced to Firestore on save with a one-second debounce (unless the scenario id is prefixed "local-", in which case it stays local only).
/tmp/assrt/scenario.json
Metadata for the current run: scenario id, display name, target url. The agent reads this to know where to navigate before the first case fires.
/tmp/assrt/results/latest.json
The most recent TestReport. Overwritten every run. This is the file your CI gate, your PR comment bot, or your Slack notifier should read.
/tmp/assrt/results/<runId>.json
Timestamped history, one per run. Used by assrt_diagnose to reconstruct what happened when a case regressed.
/tmp/assrt/recording.webm + player.html
The video of the run plus a generated HTML player that streams it with Range requests. Keyboard shortcuts: Space to toggle play, arrow keys to seek. Speed buttons: 1x, 2x, 3x, 5x, 10x.
~/.assrt/browser-profile
Persistent Chromium profile for cookies, localStorage, and session persistence between runs. Pass --isolated on the command line to skip the profile and launch a fresh context.
~/.assrt/playwright-output
Playwright trace files (.yml) for every run. Open one in Playwright's trace viewer for a frame-by-frame debugger if the video is not enough.
Stop describing tests as code
Install Assrt once, keep the loop forever. The three MCP tools plug into Claude Code, Cursor, or any MCP-aware agent. Your tests become markdown files in your repo, and a real browser verifies them in the same turn the agent wrote the code.
Read the install steps →

How this compares to the traditional automation playbook
The row that matters most is "who writes the test." Every other difference follows from that choice.
Framework code vs. #Case markdown
| Feature | Selenium or Cypress | Assrt |
|---|---|---|
| Test source format | Java, Python, or TypeScript file | Plain-text markdown, #Case N: [name] |
| Who authors it | Human, after the feature ships | Same agent that wrote the feature, in the same turn |
| Element targeting | CSS, XPath, or data-testid selectors | Accessibility tree [ref=eN] ids |
| Runner | Your CI container | Real Playwright, local, 1600x900 viewport |
| Output format | JUnit XML and a framework-specific HTML report | Six-key TestReport JSON, one schema to learn |
| Failure diagnosis | Read the stack trace, open the trace viewer | assrt_diagnose proposes a patch; video auto-opens |
| Vendor lock-in | None for Selenium, proprietary DSL for SaaS tools | None; open source, tests are markdown you own |
| Cost | Free (open source) to thousands per month for SaaS alternatives | Free and self-hosted; the agent runs on your own API key |
Frequently asked questions
What does a #Case scenario actually look like?
Plain text. A typical file is three to five cases, each starting with "#Case N: [short name]" on its own line, followed by one or two sentences of what the user does and what to assert. The agent in assrt-mcp parses this with a regex at src/core/agent.ts:259 and again at line 568. There is no DSL to learn; if you can describe the behavior in English, you can author the test.
Why use markdown instead of writing Playwright code directly?
Two reasons. First, the agent that wrote the feature can also write the scenario in the same turn; there is no second translation layer from intent to selectors. Second, when the UI changes but the behavior does not, the scenario rarely changes because it describes what the user does, not how the DOM looks. Assrt still wraps Playwright (via @playwright/mcp) under the hood, so you keep deterministic browser semantics.
Where do test results actually land on disk?
Every run writes /tmp/assrt/results/latest.json and a timestamped /tmp/assrt/results/<runId>.json. The scenario itself lives at /tmp/assrt/scenario.md and is mirrored to Firestore if it has a non-local ID. The video recording lands at recording.webm with an auto-generated player.html next to it that streams the video with HTTP Range support.
What is in the TestReport JSON?
Six keys at the top level: url, scenarios, totalDuration, passedCount, failedCount, generatedAt. Each scenario has a name, a passed boolean, a steps array, an assertions array, a summary, and a duration. The full type is in src/core/types.ts of the assrt-mcp repo. A CI gate is one line: jq '.failedCount' — if it returns anything other than 0, fail the build.
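A passing one-case report could look like the following; only the key names come from the type, every value is illustrative:

```json
{
  "url": "http://localhost:3000",
  "scenarios": [
    {
      "name": "Signup happy path",
      "passed": true,
      "steps": ["navigate /signup", "type email", "click Sign up"],
      "assertions": ["landed on /dashboard"],
      "summary": "User signed up and reached the dashboard.",
      "duration": 8.2
    }
  ],
  "totalDuration": 8.2,
  "passedCount": 1,
  "failedCount": 0,
  "generatedAt": "2026-01-01T00:00:00Z"
}
```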
Is this a cloud service? Can I run it without sending my app to a vendor?
Yes, everything is local. The MCP server and the CLI are open source and run on the developer's machine. The browser profile lives in ~/.assrt/browser-profile for cookies and localStorage persistence, unless you pass --isolated. Artifact upload is opt-in; by default, nothing leaves your machine.
What are the three MCP tools I can call from a coding agent?
assrt_plan navigates to a URL and auto-generates test cases from what it sees. assrt_test executes the #Case scenarios and returns the TestReport. assrt_diagnose takes a failed run and proposes a fix for the scenario or the application code. These three tool names are defined in src/mcp/server.ts:5-8 of the assrt-mcp repo.
How does the video player work if the test was headless?
Playwright still records video even in headless mode when you enable it, and Assrt always does. After a run, a player.html file is generated next to recording.webm that streams the webm with Range requests. It supports 1x, 2x, 3x, 5x, and 10x playback speeds, Space to play or pause, and arrow keys to seek. The reminder hook will auto-open this page in the default browser so you can watch a failing run without leaving your terminal.
What viewport size does the browser launch at?
1600x900, fixed. The value is passed as --viewport-size 1600x900 to the Playwright MCP subprocess in src/core/browser.ts:273. This is deliberate. A single consistent viewport keeps snapshots deterministic; if you need mobile or tablet emulation, author a separate scenario and call the resize tool inside it.
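A hypothetical single-case file for that situation — the viewport numbers and assertions are illustrative:

```markdown
#Case 1: Mobile nav collapses to a hamburger
Resize the browser to 390x844. Go to the home page. Assert the top
navigation is replaced by a hamburger button, open it, and assert the
menu lists the same links as the desktop nav.
```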
Open source. Self-hosted. No account required. Tests stay in your repo.
Three MCP tools, six JSON keys, zero vendor lock-in.