Automation tools in QA: rank them by what lands on disk
Every "top 10 automation tools in QA" listicle ranks by feature checklists. I want to propose a harder test: after one real run, what files are in the folder? If the answer is a vendor dashboard URL and nothing else, you do not own your tests. This page walks through the exact artifact folder Assrt writes, with file paths, sizes, and the source lines that produce each file.
What Assrt writes, every single run
Run one assrt_test call. Here is the exact tree on disk afterwards. No dashboard, no API calls, no server-side rendering required to see what happened. All paths verified in assrt-mcp/src/mcp/server.ts and assrt-mcp/src/core/browser.ts.
One MCP call, six artifacts
The agent takes three inputs (an MCP call, a markdown plan, a live page) and emits six files you keep forever. Nothing in the fan-out below depends on assrt.ai being online.
assrt_test fan-out
The six artifacts, one at a time
Each card below maps to a real file on disk after a run. The descriptions list the exact format, the exact location, and a tool you can use to open it with zero Assrt knowledge.
scenario.md
The test plan, in #Case N: name markdown. A one-character shell edit is a valid test rewrite. fs.watch reloads it with a 1-second debounce.
results/latest.json
Structured run report. Documented fields: runId, passed, passedCount, failedCount, duration, scenarios[], screenshots[], videoPlayerUrl, cloudUrls.
video/recording.webm
1600x900 VP9. Red cursor dot, click ripples, keystroke toasts, green heartbeat pulse in the corner. Plays in VLC; no vendor player required.
video/player.html
Single self-contained HTML5 page. Keyboard shortcuts 1, 2, 3, 5, 0 set playback to 1x, 2x, 3x, 5x, 10x. Space toggles play. Arrows seek +/-5s.
screenshots/
Numbered PNGs: 00_step0_init.png, 01_step1_screenshot.png, and so on. Step index zero-padded; any diff tool can compare two runs.
execution.log + events.json
Plaintext timestamped log plus full structured event trace. Feed events.json into any observability stack without an Assrt-specific parser.
The test plan, in full
This is a real example of what Assrt writes to /tmp/assrt/scenario.md. Two test cases, bullet steps, no selectors, no YAML structure tags. The parser is a single regex in assrt-mcp/src/core/agent.ts that splits on /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi. You can write this file by hand in any text editor and the agent will run it.
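To see how forgiving that parser is, here is a Python approximation of the quoted splitter regex applied to a hand-written plan (the plan text below is illustrative, not a file Assrt shipped; the real splitter lives in TypeScript in assrt-mcp/src/core/agent.ts):

```python
import re

# A hand-written plan in the #Case N: name format described above.
plan = """# Case 1: signup happy path
- open /signup
- type a valid email
- expect the success toast

# Case 2: signup rejects bad email
- open /signup
- type "not-an-email"
- expect an inline validation error
"""

# Python rendering of the documented regex:
# /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi
splitter = re.compile(r"(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*",
                      re.IGNORECASE)
cases = [chunk.strip() for chunk in splitter.split(plan) if chunk.strip()]

for case in cases:
    title, _, steps = case.partition("\n")
    print(title, "->", steps.count("- "), "steps")
```

Because the delimiter also accepts "Scenario" and "Test", with or without a leading #, a plan pasted from a ticket or typed from memory still splits into runnable cases.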
The result, structured and readable
And the matching run report, written to /tmp/assrt/results/latest.json and a historical copy at /tmp/assrt/results/{runId}.json. Every field documented. No XML, no vendor-specific enums, no compile step between you and the data.
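As a sketch of the report's shape: the field names below come from the documented schema on this page, while every value is made up for illustration (including the runId format).

```python
import json

# Field names from the documented schema; all values are illustrative.
report = {
    "runId": "run_20260412_093000",  # hypothetical run id
    "passed": False,
    "passedCount": 1,
    "failedCount": 1,
    "duration": 48213,  # ms
    "scenarios": [
        {"name": "signup happy path", "passed": True},
        {"name": "signup rejects bad email", "passed": False},
    ],
    "screenshots": [
        "screenshots/00_step0_init.png",
        "screenshots/01_step1_screenshot.png",
    ],
    "videoPlayerUrl": "video/player.html",
    "cloudUrls": [],
}

print(json.dumps(report, indent=2))
```

Note that videoPlayerUrl and the screenshot paths are relative, which is what keeps the run folder self-contained.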
Vendor YAML vs Assrt markdown
Same test, two very different files. The left one only compiles through the vendor's recorder. The right one is the entire test.
The same 'signup validation' test, in two tools
# Proprietary vendor test, compiled from a visual recorder.
# You cannot edit this file by hand without breaking the recorder.
id: 7f2c-4a8e-b
testName: signup_validation
steps:
  - $action: click
    $target: { $xpath: "//header//button[contains(@class,'primary')]" }
    $retryPolicy: { $maxAttempts: 3, $wait: 2000 }
  - $action: type
    $target: { $css: "form#signup input[name='email']" }
    $value: { $secret: "ref:env.TEST_EMAIL" }
  - $action: assertVisible
    $target: { $css: "div.toast.toast-success" }
meta:
  createdBy: recorder@vendor.io
  recorderVersion: 18.2.1
  proprietaryFlags: [ "internal_only", "compiled" ]

Real numbers from the implementation
Not marketing figures. These are constants and behaviors verified in the Assrt source tree. Every number below has a file-and-line reference available on request.
Video overlays
4 injected before recording
Red cursor dot, click ripples, keystroke toasts, and a green heartbeat pulse in the bottom-right corner that forces macOS compositor redraws. Implemented in assrt-mcp/src/core/browser.ts.
Player speeds
1x, 2x, 3x, 5x, 10x
Bound to keyboard keys 1, 2, 3, 5, 0 in the standalone player.html. Default on open is 5x. Arrow keys seek +/-5s. Space toggles play.
Retry backoff
5s / 10s / 15s
Linear, not exponential. The formula is literally (attempt + 1) * 5000. Fatal API errors skip retry entirely.
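The quoted formula, as plain arithmetic (assuming attempt is zero-indexed, which is what makes the delays come out to 5s, 10s, 15s):

```python
# Linear backoff from the formula above: delay_ms = (attempt + 1) * 5000.
def backoff_ms(attempt: int) -> int:
    return (attempt + 1) * 5000

# Delays before the first, second, and third retry.
delays = [backoff_ms(a) for a in range(3)]
print(delays)  # [5000, 10000, 15000]
```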
The tax of tools that fail the portability test
If you have shipped with one of these tools, you have paid at least one of the following costs. Most teams pay several, every quarter.
A five-step portability test for any QA tool
Run this checklist on whatever automation tool you are evaluating next. If it fails step two or three, the artifacts belong to the vendor, not to you. If it passes all five, your tests survive acquisitions, price hikes, and migrations.
Open. Play. Parse. Zip. Survive.
Open the scenario in a plain text editor
Can you read and edit the test in vim, VS Code, or GitHub's web UI without launching the vendor tool? Assrt stores scenarios as /tmp/assrt/scenario.md in #Case N: format. If your tool fails this step, every future edit pays the recorder tax.
Play the recording outside the vendor's player
Assrt writes recording.webm as raw VP9. It opens in VLC, in ffmpeg, in any browser. The included player.html is a convenience, not a dependency. If your tool's replay only works inside a logged-in dashboard, you do not own that evidence.
Parse the run result with any JSON reader
results/latest.json has a documented schema (runId, passed, passedCount, failedCount, duration, scenarios[], screenshots[], videoPlayerUrl, cloudUrls). jq handles it, Python's json module handles it, Go's encoding/json handles it.
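Since the page names Python's json module as one of those readers, here is a minimal CI gate built on it; the field names are from the documented schema, and the gate function itself is an illustration, not part of Assrt:

```python
import json

def gate(report: dict) -> int:
    """Map a parsed results/latest.json dict to a process exit code."""
    print(f"{report['passedCount']} passed, {report['failedCount']} failed "
          f"in {report['duration']} ms")
    return 0 if report["passed"] else 1

# Typical CI use:
#   raise SystemExit(gate(json.load(open("/tmp/assrt/results/latest.json"))))
```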
Zip the folder, mail it, replay on another machine
The entire /tmp/assrt/{runId}/ directory is self-contained. No absolute URLs, no cloud references, no vendor SDK. A teammate on a plane with no internet can still watch the exact failure at 5x speed.
Survive a 10x price increase from the vendor
Assrt is MIT licensed and self-hosted. The scenario, the runner, the MCP server, and the browser profile all live on your laptop. If the project gets acquired and someone raises the price, nothing breaks. Test portability IS pricing insulation.
How Assrt passes each check
The same five tests, scored against what Assrt actually does. Every check reflects behavior you can verify by running npx assrt-mcp once.
Assrt portability scorecard
- Test plan opens in vim without the vendor app
- Video replay plays in VLC
- Result file has a documented JSON schema
- Run folder replays offline from a zip
- License lets you fork if pricing breaks
- Runtime sends zero network calls beyond Playwright and the LLM
Assrt vs paid SaaS automation tools in QA
The features that matter when you open the folder.
| Feature | Paid proprietary recorder | Assrt |
|---|---|---|
| Test plan format | Proprietary YAML / JSON emitted by a recorder | Plain markdown: #Case N: name followed by bullet steps |
| Run video format | Cloud-hosted replay behind auth | recording.webm (1600x900 VP9) + standalone player.html |
| Result access | Dashboard only; scrape an API to export | results/latest.json with documented schema |
| Retry policy | Selector retry table configured per step | Fresh accessibility snapshot on failure; 4-attempt LLM retry with 5s/10s/15s backoff |
| Offline replay | Requires vendor cloud | Zip the run folder, open on any machine |
| Pricing model | $5K to $7.5K per month at enterprise scale | MIT licensed; self-hosted; pay only your LLM bill |
| CI integration | Install the vendor's CLI + inject a token | Pass the markdown as the plan arg; read the JSON result |
Prices cited are public enterprise quotes from platforms in the same category, April 2026.
The one scenario the others cannot run
Offline replay of a failed test. Copy the run folder to a USB stick, hand it to a teammate without internet, open the player, and see the failure at 5x speed. The folder contains its own recording, its own player, its own plan, and its own structured result. Nothing in it phones home.
Anchor fact
Per-run folder at /tmp/assrt/{runId}/ ships a 1600x900 VP9 recording.webm next to a standalone player.html whose keyboard shortcuts 1, 2, 3, 5, 0 bind to 1x, 2x, 3x, 5x, 10x playback. No asset paths are absolute. No script tag points at an external origin. Zip the folder, unzip on another machine, the replay still works.
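The zip-and-replay check is two standard-library calls; the helper names below are hypothetical, and the only assumption is the one the anchor fact states, that every path inside the folder is relative:

```python
import pathlib
import shutil

def archive_run(run_dir: str, dest_dir: str) -> str:
    """Zip a self-contained run folder for hand-off to another machine."""
    run = pathlib.Path(run_dir)
    return shutil.make_archive(str(pathlib.Path(dest_dir) / run.name),
                               "zip", run)

def unpack_run(zip_path: str, dest_dir: str) -> pathlib.Path:
    """Unzip on the other machine; then open video/player.html inside."""
    out = pathlib.Path(dest_dir) / pathlib.Path(zip_path).stem
    shutil.unpack_archive(zip_path, out)
    return out
```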
Quickstart: ship one portable test in 3 minutes
One shell line installs the MCP server, one MCP call runs the test, and the run folder is written to /tmp/assrt/ on your machine. That is it.
Want to pressure-test your current QA tool against the portability checklist?
Bring your vendor's artifact folder. We will open every file live on the call and map it to the five-step test above.
Book a call →

Automation tools in QA: common questions
Why should I pick automation tools in QA by the artifacts they leave on disk, not by feature lists?
Feature lists rot. The artifact folder is what actually stays in your repo. If your automation tool stores scenarios inside a SaaS dashboard, videos behind an auth URL, and results inside a proprietary format, your QA suite dies the day the vendor raises the price or shuts down. The only way to know whether you own your tests is to open the folder the tool writes to after a run. For Assrt the folder is /tmp/assrt/{runId}/ and every file in it is either plain text, JSON, PNG, WebM, or standalone HTML. Nothing depends on the assrt.ai website to keep working.
What exact files does Assrt write on every test run?
The MCP server writes /tmp/assrt/scenario.md (the test plan in markdown, #Case N: name format), /tmp/assrt/scenario.json (UUID, name, URL), /tmp/assrt/results/latest.json (structured result), and a timestamped per-run folder /tmp/assrt/{runId}/ containing screenshots/00_step0_init.png, 01_step1_screenshot.png ..., video/recording.webm (1600x900 VP9), video/player.html (standalone HTML5 player with keyboard shortcuts 1, 2, 3, 5, 0 for speeds 1x, 2x, 3x, 5x, 10x and Space / Arrow keys for seek), execution.log, and events.json. See assrt-mcp/src/core/browser.ts for the video writer and assrt-mcp/src/mcp/server.ts for the scenario and result writers.
What does the recording actually look like?
Playwright MCP records the raw window at 1600x900 with the VP9 codec. Before recording starts, Assrt injects four visual overlays via JavaScript: a red cursor dot that follows the pointer, click ripples that expand out from clicks, keystroke toasts in the top-right corner that show each key press as it happens, and a green heartbeat pulse in the bottom-right corner used to force the macOS compositor to redraw reliably. All four are implemented in assrt-mcp/src/core/browser.ts around lines 29 to 98. The file is a plain .webm you can open in VLC or ffmpeg without Assrt at all.
How do retries work when the model or the browser flakes?
The agent retries up to 4 times total (1 initial + 3 retries) on LLM 429, 503, or an error matching /overloaded/i. Backoff is (attempt + 1) * 5000ms, so the delays are 5s, 10s, 15s. Fatal API errors (anything matching /tool_use|tool_result|invalid_request/i) skip retry and fail immediately. Browser tool failures are handled differently: instead of a sleep-and-retry, the catch block calls browser.snapshot() again, attaches the fresh accessibility tree to the tool result, and lets the model re-reason. That replaces the classic retry-with-the-same-selector loop you get from most automation tools in QA.
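The LLM half of that policy can be modeled in a few lines. This is a simplified Python sketch, not the TypeScript in agent.ts; it keeps the described shape (1 initial attempt + 3 retries, message-matching to classify errors, fatal errors raised immediately) and uses a generic RuntimeError where the real code sees API error objects:

```python
import re
import time

# Error classification by message, as described above.
FATAL = re.compile(r"tool_use|tool_result|invalid_request", re.IGNORECASE)

def call_llm_with_retry(call, max_attempts=4, sleep=time.sleep):
    """One initial attempt plus three retries with linear backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError as err:
            if FATAL.search(str(err)):
                raise                       # fatal API errors skip retry
            if attempt == max_attempts - 1:
                raise                       # attempts exhausted
            sleep((attempt + 1) * 5.0)      # 5s, then 10s, then 15s
```

The browser-failure path is deliberately absent here: as the answer above says, a failed tool call does not sleep and retry, it re-snapshots the accessibility tree and lets the model re-reason.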
Is the video player really standalone?
Yes. player.html is a single self-contained HTML5 file that loads recording.webm from the same directory. It has no external script tags, no vendor SDK, no analytics pixels. Keyboard shortcuts: 1, 2, 3, 5, 0 set playback speed to 1x, 2x, 3x, 5x, 10x; Space toggles play; Left/Right seek by 5 seconds. The default speed on first open is 5x, and auto-opening the player can be disabled with autoOpenPlayer: false on the MCP call. You can zip the run folder, mail it to a teammate on a plane, and the replay still works.
Where are scenarios stored, and can I version them in git?
Runtime source of truth is /tmp/assrt/scenario.md, which is plain markdown. Editing the file triggers fs.watch with a 1000ms debounce that syncs edits back to Firestore for cross-device sharing (assrt-mcp/src/core/scenario-files.ts, debounce timer). But the markdown is the canonical artifact, and you can commit the same file into your repo. In CI you pass the markdown as the 'plan' parameter on assrt_test, bypass the watcher, and log only the structured result. git diff shows real prose changes, so PR reviews work.
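The reload behavior is a trailing-edge debounce: a burst of file events collapses into one sync, fired once the file has been quiet for the full window. A generic Python sketch of the same pattern (the real implementation uses Node's fs.watch with a 1000ms timer):

```python
import threading

class Debouncer:
    """Trailing-edge debounce: run fn once, delay_s after the last trigger."""
    def __init__(self, delay_s: float, fn):
        self.delay_s, self.fn = delay_s, fn
        self._timer = None

    def trigger(self):
        if self._timer is not None:
            self._timer.cancel()   # each new event restarts the window
        self._timer = threading.Timer(self.delay_s, self.fn)
        self._timer.start()
```

Five rapid saves from an editor therefore produce one sync, not five, which is why hand-editing scenario.md in vim does not hammer Firestore.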
How does Assrt compare to paid SaaS automation tools in QA on pricing and lock-in?
Paid enterprise QA automation platforms land in the $5K to $7.5K per month range once you add parallel execution and test data features. Tests live in their web UI, video replays require their cloud, and exporting the suite typically means scraping a JSON API. Assrt is MIT licensed, the test agent and MCP server run entirely on your machine, and the per-run artifact folder is enough to replay, debug, and attach to a Jira ticket without an account anywhere. The only optional cloud call is the scenario sync to Firestore, which you can disable with ASSRT_NO_SAVE=1.
Does Assrt generate real Playwright code like competitors promise?
Not as .spec.ts files, and that is deliberate. Generated Playwright code with explicit selectors becomes maintenance debt the moment the DOM moves. Assrt instead stores the scenario as prose (#Case N: name followed by bullet steps) and resolves each step against the live accessibility snapshot through Playwright MCP at runtime. The tool calls themselves are Playwright calls (click, type_text, select_option, navigate, press_key, evaluate, assert), defined in assrt-mcp/src/core/agent.ts lines 16-196, so the browser driver is genuinely Playwright. What you do not get is a brittle .spec.ts sitting in your repo waiting to break.
What does a good 'portability test' look like for any QA automation tool?
Five checks. One, can you open the test file in a plain text editor without running the tool? Two, does a run produce a video you can play in VLC, or only a replay viewer that requires a login? Three, is the result a machine-readable file with a documented schema, or an HTML page you have to scrape? Four, can you zip the run folder and mail it to someone who has never heard of the tool, and can they still see what failed? Five, does the pricing model still let you keep running the tool if the vendor 10x's the price tomorrow? Assrt passes all five by design; most proprietary automation tools in QA fail at least three.
How do I try Assrt end to end without committing?
One shell line: npx assrt-mcp. That registers the MCP server with your coding agent (Claude Code, Cursor, or anything supporting MCP). Then from your editor call the assrt_test tool with a URL and a markdown plan. The first run creates ~/.assrt/browser-profile (persistent Playwright profile), writes /tmp/assrt/scenario.md, produces /tmp/assrt/results/latest.json, and auto-opens the standalone video player at playback speed 5x. Nothing is sent anywhere you did not explicitly configure.