Guide, April 2026

Automated test automation is a loop, not a button. Here is where the file lives.

Most pages for this keyword describe a capability in the abstract: the AI writes tests. That tells you nothing you can verify. This page describes the artifact. One Markdown file at /tmp/assrt/scenario.md, three MCP tools that read and write it, one fs.watch() that keeps them in sync, and a Playwright subprocess that actually drives the browser. If you can see the file, the loop is real.

$ npx assrt-mcp setup && ls /tmp/assrt

Matthew Diakonov
  • /tmp/assrt/scenario.md is the single source of truth
  • fs.watch() at scenario-files.ts:97 syncs edits with a 1000 ms debounce
  • A Playwright MCP subprocess handles the browser, not a custom driver

Assrt's entire scenario parser is eleven lines of TypeScript and one regex in agent.ts:568-579. There is no DSL compiler, no cloud transformation, no proprietary record format between the Markdown you read and the test that runs.

assrt-mcp source, April 2026

The whole page, one sentence

Automated test automation is write · run · repair, on one Markdown file, with one watcher, over an open-source browser subprocess.

Three MCP tools you call, one file you can edit, zero vendor cloud required between them. Everything else on the SERP wraps this shape in a dashboard and sells you the dashboard.

The keyword is a trick. There are two automations in it.

The first automation in automated test automation is the old one: a runner drives a browser through a test the human wrote. Selenium did this in 2004. Playwright and Cypress do it better. Everyone in the category clears this bar. The second automation is the new one and the one the keyword actually asks about: automating the work upstream of execution, drafting the test, maintaining it when the UI drifts, fixing it when it fails without a human rewriting it by hand. That second layer is where the category splits into honest and hand-wavy.

Honest answer: ship an agent loop that writes, runs, and repairs, with the artifact on the user's disk in a format they can read. Hand-wavy answer: put a generate button in a dashboard, call it "AI-powered automation," and keep the format proprietary so the exit path is a migration project. Assrt is the first flavor. The page below describes the shape of that flavor concretely enough that you could re-implement it yourself.

The architecture, drawn once

Three tools on the left. One file in the middle. One runtime on the right. Every arrow in this diagram is a function call you can grep for in the open-source repo.

[Diagram: write · run · repair, over one file. assrt_plan, assrt_test, and assrt_diagnose on the left; /tmp/assrt/scenario.md in the middle; the Playwright MCP runtime (model: Claude Haiku / Gemini) on the right; each run lands in /tmp/assrt/results/latest.json.]

The anchor: the fifteen lines that close the loop

The file-watch logic is the smallest piece of the system and it is what makes the loop feel automatic. When the agent saves a corrected #Case into /tmp/assrt/scenario.md, the watcher fires, the 1000 ms debounce coalesces the write burst, and the next assrt_test run picks up the change with zero re-upload step. Everything else in the loop is plumbing around these fifteen lines.

assrt-mcp/src/core/scenario-files.ts:96-110
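The listing itself is not reproduced here, but it is small enough to sketch from the description above. This is a minimal TypeScript reconstruction of the shape, not the verbatim scenario-files.ts code: only the 1000 ms debounce, the reset-on-new-save behavior, and the skip-own-writes check come from the source, and `debounce` / `watchScenario` / `onSync` are hypothetical names.

```typescript
import * as fs from "node:fs";

const DEBOUNCE_MS = 1000;

// Coalesce a burst of calls into one trailing call: each new call
// resets the window, so only the last save in a burst fires the sync.
export function debounce(ms: number, fn: () => void): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer) clearTimeout(timer); // a new save resets the window
    timer = setTimeout(fn, ms);     // only the last write triggers the sync
  };
}

// Watch the scenario file, ignore echoes of our own writes, and
// schedule the (hypothetical) sync callback behind the debounce.
export function watchScenario(path: string, onSync: () => void) {
  let lastSeen = fs.readFileSync(path, "utf8");
  const schedule = debounce(DEBOUNCE_MS, onSync);
  fs.watch(path, () => {
    const current = fs.readFileSync(path, "utf8");
    if (current === lastSeen) return; // unchanged content: skip the echo
    lastSeen = current;
    schedule();
  });
}
```

The debounce is the whole trick: an agent pasting a multi-line block produces several file events in under a second, and without the trailing window each one would trigger a sync.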

The parser is even smaller: an entire scenario grammar in eleven lines. There is no schema, no bytecode, no compile step. If you can match this regex, you can run an Assrt plan.

assrt-mcp/src/core/agent.ts:568-579
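A reconstruction of that shape, using the regex quoted later on this page. The real parseScenarios lives at agent.ts:568-579; treat this as a sketch of the documented behavior, not the verbatim source.

```typescript
// The one regex: matches headers like "#Case 1:", "#Scenario 2.", "#Test 3:".
const CASE_HEADER = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

// Split the Markdown plan on case headers and keep the non-empty bodies.
export function parseScenarios(plan: string): string[] {
  return plan
    .split(CASE_HEADER)                    // one piece per #Case/#Scenario/#Test block
    .map((block) => block.trim())
    .filter((block) => block.length > 0);  // drop any preamble before the first case
}
```

That is the entire format: a header the regex can find, followed by plain English. Anything that splits on the same pattern is a compatible runner.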

Anchor facts, in numbers

The loop, measured against itself:

  • 3 MCP tools in the entire surface
  • 1 Markdown file on disk as source of truth
  • 1000 ms debounce between save and cloud sync
  • 11 lines of parser that back the grammar

Verifiable from assrt-mcp/src/mcp/server.ts, assrt-mcp/src/core/scenario-files.ts, and assrt-mcp/src/core/agent.ts. Not a deck number.

One full loop, without interruption

A write, a run that fails, a diagnosis, and a second run that passes, in order. The sequence below is the same flow you get whether you drive the three tools from Claude Code, from Cursor, or from the assrt CLI in a terminal.

Actors: Agent (you or Claude Code) · MCP tool · scenario.md · Playwright runtime

  1. url → assrt_plan: writes 5-8 #Case blocks to /tmp/assrt/scenario.md.
  2. assrt_test(url): reads the file, drives the browser via Playwright MCP.
  3. An assertion fails → assrt_diagnose(case, error).
  4. assrt_diagnose returns a corrected #Case block; paste it over the failing block and save.
  5. assrt_test(url) again: pass, WebM video, JSON report.

The only point where a human is strictly required is the paste step after assrt_diagnose returns its Corrected #Case block, and even that can be skipped if the agent driving the MCP is allowed to write files directly. In Claude Code, for example, Claude edits the file itself and you move on.

What the session actually looks like

The output stream from one end-to-end run, lightly edited. Nothing mocked, nothing simplified; this is what the terminal prints when you run the three tools against a local dev server.

assrt-mcp · automated test automation in one session

Three tools, three jobs, one file

The MCP surface is small on purpose. There is no fourth tool to learn, no dashboard-only features, no settings page that changes runtime behavior. Every moving part in the loop is one of the six cards below.

assrt_plan

Point it at any URL. The agent crawls the page, reads the accessibility tree, and writes a plan of 5-8 #Case blocks in plain English. Output lands at /tmp/assrt/scenario.md. You can edit it before the first run or just ship it.

assrt_test

Reads the plan, launches a Playwright MCP subprocess, drives a headless or extension-mode Chromium through each #Case, and writes the run report to /tmp/assrt/results/latest.json plus a scrubbable WebM video. Exit code and a structured pass/fail JSON are the same whether you invoke it from an IDE agent, CI, or a shell.

assrt_diagnose

Takes a failing case and its evidence. Returns four Markdown sections: Root Cause, Analysis, Recommended Fix, and a Corrected Test Scenario in the same #Case grammar. You paste the corrected block over the failing one in scenario.md, save, re-run assrt_test. Repair is a text edit, not a silent cloud rewrite.
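For shape, a diagnosis response looks roughly like this. The four section names come from the description above; the heading level and body text are illustrative assumptions, not captured output.

```markdown
## Root Cause
The submit button's accessible name changed from "Sign in" to "Log in".

## Analysis
The click step targeted the old label; the page snapshot shows no element
with that name, so the assertion downstream never ran.

## Recommended Fix
Target the button by its current accessible name.

## Corrected Test Scenario
#Case 3: Login works
Navigate to /login, type the demo credentials, click the "Log in" button,
and verify the dashboard heading is visible.
```

The last section is the only one you need to act on: it is a paste-ready #Case block in the same grammar the parser already accepts.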

The glue: /tmp/assrt/scenario.md

One Markdown file. Written by assrt_plan, read by assrt_test, patched by assrt_diagnose. An fs.watch() at scenario-files.ts:97 debounces saves by 1000 ms and mirrors the file to the optional hosted dashboard. Turn the dashboard off and the file still works. The file is the source of truth.

The runtime: Playwright MCP

Not a proprietary driver. Assrt spawns @playwright/mcp over stdio (McpBrowserManager in browser.ts) and forwards every navigate, click, type_text into an equivalent browser_* call. Portability is not a promise in the docs, it is a subprocess architecture you can verify.
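A sketch of what that forwarding amounts to. The left-hand names are the tool names the page lists for agent.ts; the browser_* names on the right and the helper toPlaywrightTool are assumptions for illustration, not identifiers from browser.ts.

```typescript
// Hypothetical mapping table: Assrt-side tool name -> Playwright MCP tool.
// The browser_* names here are assumed, not a verified list of the
// Playwright MCP surface.
const TOOL_MAP: Record<string, string> = {
  navigate: "browser_navigate",
  click: "browser_click",
  type_text: "browser_type",
  snapshot: "browser_snapshot",
};

// Resolve an Assrt tool call to its Playwright MCP equivalent,
// failing loudly for anything outside the known surface.
export function toPlaywrightTool(assrtTool: string): string {
  const mapped = TOOL_MAP[assrtTool];
  if (!mapped) throw new Error(`no Playwright MCP equivalent for ${assrtTool}`);
  return mapped;
}
```

The point of the design is that the right-hand column is someone else's open-source tool surface: if the left-hand side disappeared, the calls it was forwarding would still mean something.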

The model: your key

Anthropic claude-haiku-4-5-20251001 by default, Gemini if you prefer, via ANTHROPIC_API_KEY or GOOGLE_API_KEY in your shell. No hidden meter. A typical #Case costs cents per run.

Two flavors of self-healing

"Automated test automation" pages often promise a self-healing suite. There are two concrete flavors of that promise and they diverge on review. The left column is the flavor most platforms ship. The right column is the one Assrt ships.

Closed cloud rewrite vs. Markdown diff

Closed cloud rewrite. You click run. The platform's closed model detects the failure, rewrites the test in its own format, and marks the run green. You never see what changed unless you dig into an audit log the vendor optionally exposes. Review is a dashboard. Rollback is a support ticket.

  • Test is rewritten silently in the vendor's cloud
  • Diff lives in a proprietary audit trail, not your git
  • Review and rollback require the vendor's UI to be online

Markdown diff. You call assrt_diagnose. It returns the fix as a Corrected #Case block in the same grammar as your plan. You paste it over the failing block in scenario.md, save, and re-run. Review is a file diff. Rollback is git.

  • Fix is a plain-Markdown block you apply yourself
  • Diff lives in scenario.md and your git history, reviewable before commit
  • Review and rollback keep working with the vendor's cloud offline

How to start the loop in four commands

If you stop reading here, the practical on-ramp is four steps. Each lands a concrete artifact on your disk.

Four steps to a closed write/run/repair loop

  1. Install the MCP. npx assrt-mcp setup registers the server in Claude Code or Cursor and writes a PostToolUse reminder hook.

  2. Call assrt_plan. Point it at http://localhost:3000 (or any URL). It writes /tmp/assrt/scenario.md with 5-8 cases.

  3. Call assrt_test. Same URL. It reads the file, runs the cases, drops results to /tmp/assrt/results/latest.json.

  4. Call assrt_diagnose on failures. Paste the Corrected Test Scenario into scenario.md. Re-run assrt_test. The loop closed in four commands.

The numbers from the running system

Everything below can be read from the source tree or a running MCP. No marketing tempo charts, no synthetic benchmark plots.

  • 3 MCP tools: plan, test, diagnose
  • 1 scenario file on disk
  • 1000 ms watcher debounce
  • 11 lines in the scenario parser
  • 4 sections in an assrt_diagnose response
  • $0 runtime license cost (MIT)
  • 5-8 typical #Case blocks in an auto-plan
  • 1 Playwright MCP subprocess per run

The 11-line parser and the 1000 ms watcher debounce are the two numbers that make the loop feel like one thing instead of three disconnected tools.

When the vendor-cloud flavor is still the right call

Worth saying out loud. If your QA org is 40 non-engineers inside a regulated enterprise with a keyword library that already exists in a vendor's DSL, a closed-cloud platform buys you real onboarding value that the file-on-disk flavor cannot replicate. If your visual regression strategy is built around a vendor's baseline infrastructure, you pay for that infrastructure. Assrt is not a universal replacement, it is the flavor that wins when the loop is owned by engineers and needs to survive the vendor disappearing. Pick the flavor that matches the team that will maintain it for the next three years.

Run the loop once, end to end, in five minutes.

npx assrt-mcp setup, call assrt_plan at your dev server URL, run the resulting Markdown with assrt_test, paste the Corrected #Case from assrt_diagnose when something fails. The whole loop runs locally against your own LLM key.

Install: npx assrt-mcp setup

Specific questions about the file, the watcher, and the loop

What does 'automated test automation' actually mean in 2026?

The phrase is a tell. The first 'automation' refers to automating test execution, which every tool in the category has done since Selenium in 2004. The second 'automation' refers to automating the work upstream of execution, writing the tests, maintaining them when the UI drifts, and repairing them when they fail. That upstream layer is the one most platforms have not shipped honestly. They generate a first draft with an LLM, lock it behind a cloud editor, and call the result 'automated.' A complete loop needs three tools (write, run, repair) and an artifact you can read. Assrt ships that loop as three MCP tools and a Markdown file at /tmp/assrt/scenario.md. That is specific. That is the shape of the answer.

Where do the tests live and who writes them?

The tests live at /tmp/assrt/scenario.md on your local filesystem, written by assrt-mcp/src/core/scenario-files.ts line 42. The agent that calls assrt_plan writes the initial draft; you or any other agent (Claude Code, Cursor, the MCP server running in the background) can open that file and edit it like any other Markdown file. When the file changes, the fs.watch() on line 97 fires, a 1000 ms debounce coalesces bursts of edits, and syncToFirestore() pushes the new plan to the optional cloud for sharing. If you disable the cloud sync, the file is still the source of truth and assrt_test still runs it. The Firestore step is optional. The file is not.

How is the repair loop closed when a test fails?

You call assrt_diagnose with the failed scenario. It is defined in assrt-mcp/src/mcp/server.ts around line 240 and returns a four-section Markdown response: Root Cause, Analysis, Recommended Fix, and Corrected Test Scenario. The last section is a literal #Case block written in the same grammar as the one in your scenario.md. You paste that block over the failing one in /tmp/assrt/scenario.md, save, and re-run assrt_test. The loop closes in three commands. No vendor rewrites your test silently, no closed AI decides in a dashboard what the fix should be. The diagnosis is a file diff you can review before committing.

What is the actual grammar Assrt uses to parse a plan?

One regex: /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi, defined in assrt-mcp/src/core/agent.ts at line 569 inside an 11-line parseScenarios function. The plan is Markdown. A case starts with #Case N:, #Scenario N:, or #Test N:, followed by 2-5 sentences in plain English describing what to click, type, and verify. There is no schema file, no compile step, no AST. If Assrt disappeared tomorrow you could write a replacement runner in a weekend because the format is the regex. Compare that to the reverse-engineering bill for a closed vendor's test record format.

How does Assrt differ from testRigor, Mabl, or an AI platform with a 'generate tests' button?

Platforms of that shape run the write/run/repair loop entirely inside their cloud. You log in, click a button, a proprietary LLM drafts a test in a proprietary format, the runner executes it, and if it fails, the platform self-heals the test silently. The artifact never touches your disk. The exit path is a manual export wizard and the audit trail is whatever the vendor chooses to surface. Assrt runs the same three stages as three MCP tools against a file you can cat, grep, and git diff. The runner under the hood is the open-source Playwright MCP subprocess. The LLM is your own Anthropic or Gemini key. The repair is a paste-over. If Assrt's hosted dashboard shuts off, the file on your disk keeps working.

Does 'automated test automation' require a cloud dashboard to be useful?

No. Cloud dashboards are useful for sharing runs with teammates, but they are not the loop. The loop is: agent writes a plan, runner executes it, agent reads the results and edits the plan when something failed. That loop runs entirely on a laptop with no network other than the LLM API call. Assrt's optional cloud at app.assrt.ai exists only to share runs; toggling it off does not change the runtime. assrt_test still runs, /tmp/assrt/scenario.md still parses, assrt_diagnose still returns a #Case block. The closed competitors start failing the moment their cloud is down.

What actually runs inside the browser when assrt_test executes a plan?

A subprocess of the official Playwright MCP server, spawned by McpBrowserManager in assrt-mcp/src/core/browser.ts. When the agent calls the navigate tool in Assrt's agent.ts, that call is forwarded over stdio as a browser_navigate call to the Playwright MCP, which drives a real Playwright-controlled Chromium (or your own Chrome in --extension mode). The tool names in agent.ts (navigate, snapshot, click, type_text, select_option, scroll, press_key, wait) map one-to-one onto Playwright MCP's official tool surface. There is no proprietary browser driver. This is why tests that pass in Assrt will behave the same when reproduced with raw Playwright, and why an Assrt scenario is portable in practice, not only on paper.

What does the 1000 ms debounce in scenario-files.ts actually do?

It prevents a burst of saves from triggering a pile of Firestore writes. When the agent edits /tmp/assrt/scenario.md the fs.watch() callback fires synchronously, a setTimeout schedules syncToFirestore() 1000 ms later, and any further changes in that window reset the timer. Only the last write hits the cloud. This matters because an agent pasting a multi-line corrected #Case block can produce several fsync events in sub-second intervals, and the naive implementation would spam the API. The skipping logic at line 136 compares the current content against lastWrittenContent so the watcher ignores echoes of its own writes. You can disable the sync entirely by passing a scenarioId that starts with local- (line 94).

Can I run this without an LLM API key at all?

Not the authoring and repair parts. assrt_plan and assrt_diagnose both call an LLM (Anthropic Claude Haiku by default, Gemini if you prefer) because those are the tools that write words. assrt_test calls an LLM too, because the agent that drives the browser during execution needs to reason about the accessibility tree. The cheapest model Assrt supports, claude-haiku-4-5-20251001, runs a typical #Case for cents. You supply your own key. There is no Assrt-side billing for the LLM. If you want a fully offline run you can record a scenario to traditional Playwright .spec.ts and run that without any model in the loop, trading the self-repair ability for zero per-run cost.

What is the smallest plan.md that exercises the whole loop?

Three lines. '#Case 1: Homepage loads' followed by 'Navigate to https://example.com and verify the heading is visible.' Save it to /tmp/assrt/scenario.md. Run assrt_test with url http://localhost:3000 and scenarioId derived from the file's metadata (or just pass the plan text inline on the first run). If it fails, run assrt_diagnose with the same plan and the failure, paste the Corrected Test Scenario section over line 1, save, run assrt_test again. That is the minimum reproducer for the full write-run-repair cycle and it fits on a Post-it note.
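Written out, that minimal file is:

```markdown
#Case 1: Homepage loads
Navigate to https://example.com and verify the heading is visible.
```

Save it to /tmp/assrt/scenario.md and the same regex that parses an auto-generated eight-case plan parses this one.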

The loop, on disk

One Markdown file. Three MCP tools. Zero dashboards between them.

Three tools, one file, eleven lines of parser, and a loop that closes without anyone logging into a vendor cloud.

Try Assrt free
