Test Automation, from First Principles

How to test automation when the test is a paragraph of markdown

Every guide on this keyword tells you the same thing: plan the suite, pick a framework, write the code, run it, report the results. That flow assumes you author selectors and wait conditions. This is the other way. You describe the scenario in plain English in a file called tests/*.md, an AI agent drives a real Playwright browser against your app, and you get back a six-key JSON report plus a recording you can scrub through at 10x.

Matthew Diakonov
10 min read
Rated 4.9 by developers running Assrt locally
Tests are plain text; no framework to learn and no YAML DSL
Real Playwright under the hood, deterministic browser semantics
Open source, self-hosted, zero vendor lock-in

The shape every other guide teaches

Search the keyword and the top five results read like one article. They tell you to list the manual cases you already run, pick a framework (usually Selenium, occasionally Playwright or Cypress), install a runner, wire it into CI, and start writing code. Then they show a snippet in Java or Python with driver.findElement(By.id(...)) and an explicit wait. The sample test is a login form.

This pipeline works, but every step below the "pick a framework" line is a tax. You maintain selectors. You author waits. You translate what the user does into method calls. When the UI shifts slightly, the test breaks even though the behavior did not. The authoring loop gets long enough that tests become a separate engineering workstream from the feature work, which is exactly the point where most teams decide automation is not worth it.

The thing missing from those guides is a shorter path: keep the real browser, drop the code. The rest of this page is what that path looks like in practice.

Framework-code flow vs. #Case flow

Pick Selenium or Cypress, install the SDK, bootstrap a project, import a page-object framework, write Java or TypeScript, author selectors, add explicit waits, maintain the suite across refactors.

  • Test is a .java or .ts file of selectors
  • Human translates intent into code
  • UI refactor breaks tests even when behavior is unchanged
  • Testing demands a skill set parallel to feature development; it becomes a separate workstream

The scenario format the agent executes

The entire test language is one rule: a line that starts with #Case N: begins a new scenario. Everything under it, until the next #Case line, is the instruction set for that scenario in plain English. The parser is a regex in src/core/agent.ts of the open-source assrt-mcp repo (lines 259 and 568). There is no other grammar.
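The rule is simple enough to sketch. A minimal parser in the same spirit might look like the following; this is an illustration, not the exact regex from src/core/agent.ts, and the Scenario shape is an assumption.

```typescript
// Hypothetical sketch of the #Case splitter; the real regex lives in
// src/core/agent.ts of assrt-mcp and may differ in detail.
interface Scenario {
  id: number;
  name: string;
  instructions: string; // plain-English body until the next #Case line
}

function parseScenarios(markdown: string): Scenario[] {
  const caseLine = /^#Case\s+(\d+):\s*(.*)$/;
  const scenarios: Scenario[] = [];
  let current: Scenario | null = null;

  for (const line of markdown.split("\n")) {
    const m = line.match(caseLine);
    if (m) {
      // A "#Case N:" line starts a new scenario.
      current = { id: Number(m[1]), name: m[2].trim(), instructions: "" };
      scenarios.push(current);
    } else if (current) {
      // Everything else belongs to the scenario above it.
      current.instructions += line + "\n";
    }
  }
  return scenarios;
}
```

Everything above the first #Case line falls through, which leaves room for a human-readable preamble in the file.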

Here is a real three-case file you could drop into any Next.js or React app that has a signup route.

tests/signup.md
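The original sample is not reproduced here, but a file in this format could look like the following; the route names and copy are illustrative, not taken from the Assrt repo.

```markdown
#Case 1: Signup happy path
Go to /signup. Fill the email and password fields with valid values,
submit the form, and assert that the page confirms the account was created.

#Case 2: Duplicate email is rejected
Sign up with an email that already has an account. Assert that an error
message about the email being taken appears and no redirect happens.

#Case 3: Email verification
Sign up with a temporary email address, wait for the verification code,
enter it, and assert the user lands on the dashboard signed in.
```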

What the runner actually does

When the agent calls assrt_test, it spawns a Playwright browser under the hood, loads your URL, and for each #Case it takes an accessibility-tree snapshot, then calls into a small set of internal tools (navigate, click, type, snapshot, wait, evaluate, assert, create_temp_email, wait_for_verification_code, complete_scenario). Elements are targeted by [ref=eN] ids pulled off the accessibility tree, not CSS selectors, so the tests are resilient to class-name and DOM-structure changes.

Inputs, runner, outputs

Inputs: tests/signup.md · http://localhost:3000 · Claude Haiku 4.5 · ~/.assrt/browser-profile
Runner: assrt_test
Outputs: TestReport JSON · recording.webm · Playwright trace

The three MCP tools your coding agent calls

Assrt is not a CLI you remember commands for; it is a Model Context Protocol server that exposes exactly three tools to whichever coding agent you already use (Claude Code, Cursor, any MCP-aware client). The three names are defined in src/mcp/server.ts of the assrt-mcp repo and they cover the whole loop.

assrt_plan

Point it at a URL. The agent crawls a few pages, inspects the accessibility tree, and drafts #Case scenarios for you. Good for the first pass against a new route.

assrt_test

The core verb. Takes a URL and a plan file, executes every #Case in a real Playwright browser, and returns the six-key TestReport. This is what the hook fires on git commit.

assrt_diagnose

When a case fails, hand the runId to this tool. It reads the video, the trace, and the failure context and proposes a patch to either the scenario or the application. The agent applies it in the same turn.
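Wiring the server into a client follows the generic MCP configuration shape. The entry below is a hypothetical example: the mcpServers structure is standard, but the command and args are placeholders, not the documented install values.

```json
{
  "mcpServers": {
    "assrt": {
      "command": "npx",
      "args": ["assrt-mcp"]
    }
  }
}
```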

Checked-in scenarios

Your #Case files live in the repo next to the feature. They survive machine switches, reviews, and vendor migrations. There is no proprietary format to export.

The contract you program against

Whatever you chain onto the runner, the output is the same shape. Six top-level keys. Per-scenario children with a boolean pass, a steps array, and a summary. Every integration (CI gate, Slack bot, dashboard, PR comment) reads the same JSON.

src/core/types.ts
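A sketch of the type, consistent with the keys this article names, would look like this; the real definitions live in src/core/types.ts of assrt-mcp, and the field types marked below are assumptions.

```typescript
// Hedged reconstruction of the TestReport shape from the fields the
// article names; consult src/core/types.ts in assrt-mcp for the real type.
interface ScenarioResult {
  name: string;
  passed: boolean;
  steps: string[];        // element types here are assumptions
  assertions: string[];
  summary: string;
  duration: number;       // milliseconds, assumed
}

interface TestReport {
  url: string;
  scenarios: ScenarioResult[];
  totalDuration: number;
  passedCount: number;
  failedCount: number;
  generatedAt: string;    // ISO timestamp, assumed
}

// A minimal report a CI gate might consume:
const sample: TestReport = {
  url: "http://localhost:3000",
  scenarios: [],
  totalDuration: 0,
  passedCount: 0,
  failedCount: 0,
  generatedAt: new Date().toISOString(),
};
```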

The whole file is under 110 lines and lives in the open-source assrt-mcp repo. The full CI-gate expression is jq '.failedCount' run.json. There is no second schema you have to learn to consume this.
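Expanded into a full gate, that one-liner might look like the script below. The report path comes from the article; the surrounding scaffolding is a sketch, and the inline report here is demo data so the snippet runs standalone.

```shell
# Sketch of a CI gate over the six-key TestReport; requires jq.
# Demo report written inline; in CI, point REPORT at
# /tmp/assrt/results/latest.json instead.
REPORT=$(mktemp)
cat > "$REPORT" <<'EOF'
{"url":"http://localhost:3000","scenarios":[],"totalDuration":0,
 "passedCount":3,"failedCount":0,"generatedAt":"2024-01-01T00:00:00Z"}
EOF

failed=$(jq '.failedCount' "$REPORT")
if [ "$failed" != "0" ]; then
  echo "Assrt: $failed scenario(s) failed" >&2
  exit 1
fi
echo "Assrt: all scenarios passed"
```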

3 MCP tools (plan, test, diagnose)
6 keys in the TestReport JSON
1600-pixel viewport width (fixed)
10x max recording playback speed

A run, end to end, from the terminal the agent sees

Here is what the loop looks like from the agent's point of view. The agent commits a feature. A PostToolUse hook reminds it to verify. It calls assrt_test. If a case fails, the agent reads the JSON, watches the recording at 5x, calls assrt_diagnose, patches, and re-runs. The whole thing happens inside the same conversation that produced the change.

agent session — post-commit verification

Where every file lands

The runner is deliberate about disk layout. Nothing is scattered across temp directories; everything lives under two roots so your editor, your hook scripts, and your eyes always know where to look.

1. /tmp/assrt/scenario.md
   The #Case file the agent is currently running. Human-editable. Auto-synced to Firestore on save with a one-second debounce (unless the scenario id is prefixed "local-", in which case it stays local only).

2. /tmp/assrt/scenario.json
   Metadata for the current run: scenario id, display name, target url. The agent reads this to know where to navigate before the first case fires.

3. /tmp/assrt/results/latest.json
   The most recent TestReport. Overwritten every run. This is the file your CI gate, your PR comment bot, or your Slack notifier should read.

4. /tmp/assrt/results/<runId>.json
   Timestamped history, one per run. Used by assrt_diagnose to reconstruct what happened when a case regressed.

5. /tmp/assrt/recording.webm + player.html
   The video of the run plus a generated HTML player that streams it with Range requests. Keyboard shortcuts: Space to toggle play, arrow keys to seek. Speed buttons: 1x, 2x, 3x, 5x, 10x.

6. ~/.assrt/browser-profile
   Persistent Chromium profile for cookies, localStorage, and session persistence between runs. Pass --isolated on the command line to skip the profile and launch a fresh context.

7. ~/.assrt/playwright-output
   Playwright trace files (.yml) for every run. Open one in Playwright's trace viewer for a frame-by-frame debugger if the video is not enough.

Stop describing tests as code

Install Assrt once, keep the loop forever. The three MCP tools plug into Claude Code, Cursor, or any MCP-aware agent. Your tests become markdown files in your repo, and a real browser verifies them in the same turn the agent wrote the code.

Read the install steps

How this compares to the traditional automation playbook

The row that matters most is "who writes the test." Every other difference follows from that choice.

Framework code vs. #Case markdown

| Feature | Selenium or Cypress | Assrt |
| --- | --- | --- |
| Test source format | Java, Python, or TypeScript file | Plain-text markdown, #Case N: [name] |
| Who authors it | Human, after the feature ships | Same agent that wrote the feature, in the same turn |
| Element targeting | CSS, XPath, or data-testid selectors | Accessibility-tree [ref=eN] ids |
| Runner | Your CI container | Real Playwright, local, 1600x900 viewport |
| Output format | JUnit XML and a framework-specific HTML report | Six-key TestReport JSON, one schema to learn |
| Failure diagnosis | Read the stack trace, open the trace viewer | assrt_diagnose proposes a patch; video auto-opens |
| Vendor lock-in | None for Selenium; proprietary DSL for SaaS tools | None; open source, tests are markdown you own |
| Cost | Free to thousands/mo for SaaS alternatives | Free and self-hosted; agent runs on your API key |

Frequently asked questions

What does a #Case scenario actually look like?

Plain text. A typical file is three to five cases, each starting with "#Case N: [short name]" on its own line, followed by one or two sentences of what the user does and what to assert. The agent in assrt-mcp parses this with a regex at src/core/agent.ts:259 and again at line 568. There is no DSL to learn; if you can describe the behavior in English, you can author the test.

Why use markdown instead of writing Playwright code directly?

Two reasons. First, the agent that wrote the feature can also write the scenario in the same turn; there is no second translation layer from intent to selectors. Second, when the UI changes but the behavior does not, the scenario rarely changes because it describes what the user does, not how the DOM looks. Assrt still wraps Playwright (via @playwright/mcp) under the hood, so you keep deterministic browser semantics.

Where do test results actually land on disk?

Every run writes /tmp/assrt/results/latest.json and a timestamped /tmp/assrt/results/<runId>.json. The scenario itself lives at /tmp/assrt/scenario.md and is mirrored to Firestore if it has a non-local ID. The video recording lands at recording.webm with an auto-generated player.html next to it that streams the video with HTTP Range support.

What is in the TestReport JSON?

Six keys at the top level: url, scenarios, totalDuration, passedCount, failedCount, generatedAt. Each scenario has a name, a passed boolean, a steps array, an assertions array, a summary, and a duration. The full type is in src/core/types.ts of the assrt-mcp repo. A CI gate is one line: jq '.failedCount' — if it returns anything other than 0, fail the build.

Is this a cloud service? Can I run it without sending my app to a vendor?

Yes, everything is local. The MCP server and the CLI are open source and run on the developer's machine. The browser profile lives in ~/.assrt/browser-profile for cookies and localStorage persistence, unless you pass --isolated. Artifact upload is opt-in; by default, nothing leaves your machine.

What are the three MCP tools I can call from a coding agent?

assrt_plan navigates to a URL and auto-generates test cases from what it sees. assrt_test executes the #Case scenarios and returns the TestReport. assrt_diagnose takes a failed run and proposes a fix for the scenario or the application code. These three tool names are defined in src/mcp/server.ts:5-8 of the assrt-mcp repo.

How does the video player work if the test was headless?

Playwright still records video even in headless mode when you enable it, and Assrt always does. After a run, a player.html file is generated next to recording.webm that streams the webm with Range requests. It supports 1x, 2x, 3x, 5x, and 10x playback speeds, Space to play or pause, and arrow keys to seek. The reminder hook will auto-open this page in the default browser so you can watch a failing run without leaving your terminal.

What viewport size does the browser launch at?

1600x900, fixed. The value is passed as --viewport-size 1600x900 to the Playwright MCP subprocess in src/core/browser.ts:273. This is deliberate. A single consistent viewport keeps snapshots deterministic; if you need mobile or tablet emulation, author a separate scenario and call the resize tool inside it.

Install Assrt locally

Open source. Self-hosted. No account required. Tests stay in your repo.

3 MCP tools, 6 JSON keys, zero vendor lock-in.