Guide · Test automation tools

A test automation tool where the test file is a Markdown document, not code

Every other guide to test automation tools compares programming languages, parallel shards, and pricing tiers. None of them answers the one question an AI-era team should be asking: what file do your tests live in, and who is allowed to edit it? Assrt's answer is one path, one format, and one watcher. That is the whole design, and it is the part a buyer's guide will not print.

Matthew Diakonov
9 min read

Open source on GitHub
Self hosted
No YAML DSL
Plain Markdown plans
Playwright MCP under the hood

The gap

Comparison articles sort by language. The actual question is about the file.

Type "test automation tool" into any search bar and you will get a wall of listicles: Selenium 4, Cypress, Playwright, Mabl, Testim, QA Wolf, BrowserStack. They sort by supported languages, parallel run count, and starter plan price. They almost never describe the single most consequential thing about the tool you are about to adopt, which is the format your tests live in and who can edit that format once the product is live.

Two historical answers have dominated. A script-based tool (Selenium, Cypress, Playwright) stores tests as code in a language you picked. A recording-based tool (Mabl, Testim, QA Wolf) stores tests as a proprietary YAML or binary tree inside its cloud. Both lock you in, just differently. Scripts demand rewrites every time the UI shifts. Recordings demand that their vendor stay alive.

Assrt does neither. Your plan is a plain English file. The path is not a secret. The format is Markdown. And the file is watched, so edits propagate without a build step. If you want to understand whether this matters for your team, keep reading. If you want to skip to the anchor, it starts three sections down at /tmp/assrt/scenario.md.

Playwright MCP tools wired in

18 callable by the agent

agent.ts lines 16–196

Debounce before cloud sync

1000ms, fs.watch callback

scenario-files.ts line 102

License cost per seat

$0, pay your model bill only

vs $7,500+/mo enterprise AI testing SaaS

The anchor

One path, one format: /tmp/assrt/scenario.md

When you run assrt_test via the MCP server or the CLI, the first thing the tool does is write your plan to disk. Not a temp buffer, not an in-memory object: an actual Markdown file at a predictable path. Here is what ends up in that file:

/tmp/assrt/scenario.md

That is the entire specification. There is no hidden step list, no generated Playwright .spec.ts sitting behind it, no XPath cache. At the next run the file is read back in, handed to an LLM agent, and the agent interprets each #Case block against a fresh accessibility tree. That is the whole design.
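To make that concrete, a plan might look like the sketch below. This is illustrative only: the `#Case` heading is the one piece of structure the article describes, and everything else is free text the agent interprets at runtime.

```markdown
# Case: Sign up with a disposable email

Open the landing page and click the primary call-to-action.
Create a temp email, fill the signup form with it, and submit.
Wait for the verification code, enter it, and assert that the
dashboard greets the user by the address we just registered.
```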

And the file is watched

assrt-mcp/src/core/scenario-files.ts

Line 97 registers a Node fs.watch callback on the scenario file. Every edit, whether you made it in your editor or a coding agent made it mid-session, resets a 1000ms debounce timer. When the timer fires, syncToFirestore() pushes the new content to shared storage so the next run, on your machine or a teammate's, sees the latest plan. No build pipeline, no artifact upload.
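The watch-and-debounce loop is small enough to sketch in full. The constant names and the shape of the sync callback below are assumptions for illustration; only the path, the 1000ms window, and the use of `fs.watch` come from the description above.

```typescript
import fs from "node:fs";

// Sketch of the watcher: every change event resets a 1000ms timer,
// and only when the file has been quiet that long does the latest
// content get pushed to shared storage.
const SCENARIO_FILE = "/tmp/assrt/scenario.md";
const DEBOUNCE_MS = 1000;

function startWatching(sync: (content: string) => void): fs.FSWatcher {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return fs.watch(SCENARIO_FILE, { persistent: false }, () => {
    if (timer) clearTimeout(timer); // any edit resets the window
    timer = setTimeout(() => {
      // after 1000ms of quiet, read the plan back and sync it
      sync(fs.readFileSync(SCENARIO_FILE, "utf8"));
    }, DEBOUNCE_MS);
  });
}
```

Because the debounce collapses bursts of edits, an agent rewriting the plan line by line produces one sync, not dozens.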

How the file moves

Three editors, one source of truth, no compile step

scenario.md is the only artifact that matters

You, in VS Code
Claude Code agent
assrt_plan tool
/tmp/assrt/scenario.md
Cloud store
LLM agent
Playwright MCP

Anatomy of a run

What actually happens when you press run

1. The plan gets written to disk. writeScenarioFile() at scenario-files.ts line 42 dumps your #Case text verbatim to /tmp/assrt/scenario.md. An accompanying scenario.json stores id, name, url, and updatedAt.

2. fs.watch starts watching the file. startWatching() on line 90 installs the Node watcher with a 1000ms debounce. Any future edit triggers syncToFirestore() so teammates see the change.

3. The preflight checks the target URL. Before burning time on Chrome, Assrt does a HEAD request to your URL with an 8000ms timeout. A wedged dev server fails fast with an actionable error instead of manifesting as an opaque MCP disconnect.

4. Playwright MCP launches the browser. Local by default. --headed for a visible window, --isolated for an in-memory profile, --extension to attach to your running Chrome via CDP. The browser boots once and is reused across scenarios.

5. The LLM rereads scenario.md and drives the browser. claude-haiku-4-5-20251001 by default. It receives the accessibility tree, picks one of eighteen tools (navigate, click, type_text, wait_for_stable, http_request, and so on), and iterates until it can call complete_scenario.

6. Assertions, a video, and the results file land on disk. Every assert() call is logged. A .webm recording is auto-opened in a player at 5x speed. Results sit at /tmp/assrt/results/latest.json so the coding agent can Read them without a network call.
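Step 1 is small enough to sketch. The function name, path, and the four metadata fields come from the description above; everything else, including the exact JSON formatting, is an assumption about what the real scenario-files.ts does.

```typescript
import fs from "node:fs";
import path from "node:path";

const ASSRT_DIR = "/tmp/assrt";

// Illustrative version of step 1: write the plan verbatim, plus a small
// metadata sidecar with id, name, url, and updatedAt.
function writeScenarioFile(
  plan: string,
  meta: { id: string; name: string; url: string }
): string {
  fs.mkdirSync(ASSRT_DIR, { recursive: true });
  const scenarioPath = path.join(ASSRT_DIR, "scenario.md");
  fs.writeFileSync(scenarioPath, plan); // the #Case text, verbatim
  fs.writeFileSync(
    path.join(ASSRT_DIR, "scenario.json"),
    JSON.stringify({ ...meta, updatedAt: new Date().toISOString() }, null, 2)
  );
  return scenarioPath;
}
```

The sidecar means tooling can answer "which scenario is this, and how stale is it?" without parsing the Markdown.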

What it looks like

A single run, from the shell

Put the plan inline or in a file. No config, no describe/it boilerplate.

assrt CLI
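A minimal invocation might look like the following, using the npx entry point quoted in the FAQ below. The URL and plan text are placeholders, not a transcript of a real run.

```shell
# Illustrative run; the plan is ordinary Markdown passed inline.
npx @assrt-ai/assrt run \
  --url http://localhost:3000 \
  --plan "# Case: smoke test
Open the home page and assert the signup button is visible."
```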

The coding agent loop

The first test automation tool your coding agent can actually fix

Because the plan is a Markdown file at a known path, the same agent that wrote your feature can fix the broken test. There is no specialised test DSL to learn. The agent reads /tmp/assrt/scenario.md with its ordinary Read tool, edits one line, and the file watcher picks up the change for the next run. That is the whole loop:

Claude Code session

How this differs from what you already have

Assrt vs a typical script-based or recorded automation tool

| Feature | Script / recording tool | Assrt |
| --- | --- | --- |
| Where your test plan lives | Compiled .spec.ts files, or a proprietary YAML DSL inside a SaaS UI | Plain Markdown at /tmp/assrt/scenario.md, synced to cloud via fs.watch |
| What your coding agent can do with it | Read only, or regenerate from scratch and overwrite your diff | Read and Edit the file with normal tools; changes picked up on next run |
| What happens when a selector changes | Script breaks; CI red until someone rewrites getByRole | LLM rereads the plan, looks at the accessibility tree, finds the new label |
| Source of the test driver | Closed binary, vendor-hosted cloud | Open source TypeScript. Self-host it or run npx @assrt-ai/assrt |
| Typical team license | $7,500/mo and up for enterprise AI testing SaaS | Free. You pay the Anthropic or Google bill for the model, nothing else |
| What you keep if you walk away | Nothing. The YAML does not run anywhere else | A Markdown file and a Playwright MCP config. Both open formats |

Assrt is not a drop-in replacement for high-scale shard/parallel-heavy CI setups. It is optimised for the dev-loop case where a coding agent writes the feature, writes the test, and fixes the test.

What the agent can do

18 tools, defined in one file, readable in ten minutes

Page interaction

navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, wait_for_stable, screenshot, evaluate. The agent gets an accessibility tree, picks a ref like e5, and acts on it.

Disposable email and OTP

create_temp_email spins up a disposable address. wait_for_verification_code polls the inbox. The system prompt teaches the agent to paste OTPs into split single-character inputs via a single ClipboardEvent, rather than typing one field at a time.

External API verification

http_request lets the agent poll Telegram, Slack, GitHub, or any webhook endpoint to verify that an action in the web app produced the expected external effect.

Assertions and suggestions

assert logs a pass/fail with evidence. suggest_improvement flags UX bugs the agent spotted while running the plan, so every test run doubles as a light product review.

Continuous page discovery

Every URL the agent visits is queued for auto-discovery. A secondary model generates extra test ideas for that page in the background, up to MAX_DISCOVERED_PAGES = 20.

Variables and pass criteria

Pass variables as {{KEY}} placeholders and they get interpolated into the plan. Pass passCriteria as free text and the agent must verify every condition or mark the scenario failed.
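The `{{KEY}}` interpolation described above can be pictured as a one-liner. This is a hypothetical re-implementation for illustration, not Assrt's actual code; the choice to leave unknown placeholders intact is an assumption.

```typescript
// Replace {{KEY}} placeholders with supplied variables. Unknown keys are
// left as-is so a typo surfaces in the plan instead of vanishing silently.
function interpolate(plan: string, vars: Record<string, string>): string {
  return plan.replace(/\{\{(\w+)\}\}/g, (match, key: string) =>
    key in vars ? vars[key] : match
  );
}
```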

Want to see scenario.md on your own app?

Bring one user flow that keeps breaking. We will turn it into plain English in a file you own, run it live, and show you the video.

Book a call

Frequently asked questions

What is a test automation tool in 2026, and why are plain English plans relevant?

A test automation tool runs browser interactions against a real application and reports pass/fail without a human clicking through. Until recently that meant a compiled script (Selenium, Cypress, Playwright) or a point-and-click recorder that exports a proprietary YAML DSL (Mabl, Testim, QA Wolf). Both assume the test is static code. In an agent-native tool like Assrt, the plan is plain text that an LLM interprets at runtime. The practical payoff is that minor UI edits (a relabelled button, a reordered form) do not break the test, and a coding agent can open the same file and patch it the way a human would.

Where does Assrt actually store my test plan?

On disk at `/tmp/assrt/scenario.md`. The file is created by `writeScenarioFile()` in `src/core/scenario-files.ts` (line 42 in the assrt-mcp source). Results for the last run land at `/tmp/assrt/results/latest.json`, and per-run artifacts live at `/tmp/assrt/<runId>/`. Every assrt_test call rewrites `scenario.md` with the current plan, which means Claude Code, Cursor, or any other file-tool-equipped agent can Read and Edit it without a custom API.

How does Assrt detect when I or the agent edit the Markdown file during a session?

`startWatching()` calls Node's `fs.watch(SCENARIO_FILE, { persistent: false })` (line 97 in scenario-files.ts). The change event is debounced for 1000ms, then `syncToFirestore()` pushes the new content to the shared store. The next assrt_test run reads the updated plan on disk before handing it to the agent. One caveat: scenarios with IDs prefixed `local-` are skipped, so offline-only plans stay local.

If the plan is plain English, what actually drives the browser?

Playwright MCP. Assrt wraps `@playwright/mcp` and exposes a fixed tool list (eighteen tools in `agent.ts` lines 16 through 196) to the LLM. The model gets an accessibility snapshot of the current page, decides which tool to call (`click`, `type_text`, `press_key`, `wait_for_stable`, etc.), and Assrt proxies the call into Playwright. The agent also owns scenario orchestration, disposable email creation, and OTP pasting for split code inputs.
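Stripped of detail, the loop that answer describes looks like the sketch below. Every name here, the types and the `llm` and `tools` signatures included, is hypothetical; it is not the agent.ts API, just the shape of snapshot, pick a tool, proxy, repeat.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };

// One scenario run: snapshot the page, let the model pick a tool, proxy
// the call into the browser layer, repeat until the model decides it is
// done and calls complete_scenario. Returns the number of actions taken.
async function runScenario(
  plan: string,
  llm: (plan: string, snapshot: string) => Promise<ToolCall>,
  tools: Record<string, (args: Record<string, unknown>) => Promise<string>>
): Promise<number> {
  let snapshot = await tools.snapshot({});
  let steps = 0;
  for (;;) {
    const call = await llm(plan, snapshot);
    if (call.name === "complete_scenario") return steps;
    snapshot = await tools[call.name](call.args); // fresh tree each step
    steps += 1;
  }
}
```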

Which model runs my tests, and can I change it without rewriting anything?

Default driver is `claude-haiku-4-5-20251001`, set on line 9 of `assrt-mcp/src/core/agent.ts`. You can pass `--model` to the CLI or `model` to the MCP tool to swap in another Anthropic model, or `--provider gemini` for Google's `gemini-3.1-pro-preview`. The plan file does not change. You pay the model provider directly; Assrt itself is free and open source.

How do I run it without leaving data in someone else's cloud?

Run the CLI locally with `npx @assrt-ai/assrt run --url <your-url> --plan <...>`. Launch is local by default. Pass `--isolated` to keep the browser profile in memory only, no disk persistence. Skip scenario sync by setting `ASSRT_NO_SAVE=1` or by using a `local-` scenario ID. The test plan stays as a file you own, the video recording lands in `/tmp/assrt/<runId>/video/recording.webm`, and no cloud account is required to run a full end-to-end test.

Does 'the LLM interprets the plan every run' mean my tests are flaky?

Less flaky than you might expect, and the failure mode is different. The agent grounds every step in the accessibility tree it just pulled from the page, so a one-off DOM change usually does not break anything. What fails are genuinely ambiguous steps ("click the button" on a page with ten buttons). The fix is to make the plan more specific, the same way you would make a bug report more specific, not to rewrite selector code. Pass `passCriteria` to enforce explicit conditions the agent must verify.

How does this differ from recording-based tools like Mabl, Testim, or QA Wolf?

Recording tools capture selectors at record time and store them in a proprietary format. You get locked into their cloud and their DSL. If their service goes down or raises prices, your tests are stuck. Assrt stores the plan in a Markdown file you can commit to git, runs on open-source Playwright MCP, and is driven by a commodity LLM API. "Export to plain code" is not a feature you have to ask for; an open format is the default storage.
