QA automation, judged by the files it leaves on disk
Every guide ranking for this keyword defines QA automation in abstractions: the pyramid, the benefits, the frameworks list. None of them names a single file a real test run should produce. This one starts from the opposite end. If you and I agree on which artifacts a run has to leave behind, most of the theory answers itself.
$ ls /tmp/assrt/<runId>/
The frame: what QA automation actually is
A QA automation run is a function whose inputs are a plan, a URL, and some variables, and whose outputs are files on your disk. Everything in this guide flows from that definition. You cannot debug a red build from a dashboard alone; you need the video, the screenshots, and the events timeline, sitting in a directory you can tar up, attach to a bug, or commit to a draft PR. The first deliverable of any QA automation tool, before features and dashboards, is a well-shaped artifact tree.
The pipeline looks like this. On the left, three inputs. In the middle, one process (a Playwright-driven agent reading your plan). On the right, the four artifacts a reviewer will actually open when something breaks.
Plan → Agent → Artifacts
The anchor fact
Every Assrt run writes its artifacts to /tmp/assrt/<runId>/. Screenshots land in screenshots/ with filenames generated by the literal formatter on line 468 of assrt-mcp/src/mcp/server.ts: `${String(screenshotIndex).padStart(2, '0')}_step${currentStep}_${currentAction || 'init'}.png`. The video (recording.webm) sits next to a self-contained player.html that rebinds keyboard keys 1 2 3 5 0 to 1x, 2x, 3x, 5x, and 10x playback. The plan lives at /tmp/assrt/scenario.md, with `#Case N: name` headers parsed by the regex in src/core/scenario-files.ts. Verify it yourself: run `npx assrt-mcp`, trigger a test, then `ls -R` the directory.
The artifact tree, seen through ls -R
Before opening the code, it is worth looking at the filesystem. This is the output of listing one real run directory. Notice that the step numbers and action names are already embedded in the filenames; you can reconstruct the narrative of the test without opening a single file.
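A representative tree, assembled from the filenames quoted elsewhere in this guide. The run ID and the exact set of steps are placeholders; your run will differ:

```
/tmp/assrt/a1b2c3/
├── screenshots/
│   ├── 00_step1_navigate.png
│   ├── 01_step2_click.png
│   └── 02_step3_type_text.png
└── video/
    ├── recording.webm
    └── player.html
```

Plus the results payload described below. Read the screenshots/ names top to bottom and you have the test narrative: navigate, click, type.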
What each artifact is for
Six artifacts, each doing a job that would take a paragraph of configuration in a traditional QA automation stack. The goal is that a new engineer joining your team can understand the QA posture from one `ls`.
scenario.md
The plan, in English, grouped into `#Case N: name` blocks. Versioned. Diffable. Reviewable by a PM. Re-runnable by any Playwright MCP agent after Assrt is gone.
events.json
Structured timeline. Every navigate, click, assertion, reasoning trace, and improvement suggestion with a timestamp. Grep it in CI.
recording.webm
A WebM video of the entire run. Sits next to a self-contained player.html that maps 1/2/3/5/0 to 1x/2x/3x/5x/10x playback.
screenshots/
One PNG per visual action. Filenames encode the step index and action so `ls` is already the whole outline.
results/<runId>.json
The scoreboard. A flat pass/fail per scenario with the full assertion list. One file your CI step can parse without pulling a client library.
~/.assrt/browser-profile/
Persistent cookies and localStorage. Sign in once; every future run starts in a logged-in state. Delete the folder to reset.
The plan file, in plaintext
This is the entire contract between the person authoring the test and the agent running it. No decorators, no fixtures, no page-object model. Plain prose steps, grouped by #Case N: headers. A PM can read it and open a PR that modifies a step. An agent can read the same file and drive the browser.
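A minimal plan in that shape. The first case is adapted from the example quoted in the FAQ below; the second is a hypothetical flow added here for illustration:

```markdown
#Case 1: Sign in with the demo account
1. Open /login
2. Type demo@example.com into the email field
3. Click Continue
4. Verify the dashboard greeting appears

#Case 2: Sign out cleanly
1. Open the account menu
2. Click Sign out
3. Verify the /login page is shown again
```

That is the whole file. No imports, no selectors, no waits.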
The code that writes the artifacts
This is the single block of TypeScript responsible for giving every run a portable, auditable artifact tree. If you have wondered where a claim like “screenshots are named by step” actually lives in the source, this is it.
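The original block is not reproduced here; the sketch below reconstructs only the naming logic, using the formatter string quoted verbatim later in this guide (assrt-mcp/src/mcp/server.ts, line 468). Wrapping the state (`screenshotIndex`, `currentStep`, `currentAction`) as plain parameters, and the `runDir` helper, are assumptions for illustration, not the literal source.

```typescript
import * as path from "path";

// Sketch of the screenshot-naming logic. Only the template string is
// taken from the quoted source; the function shape is an assumption.
function screenshotName(
  screenshotIndex: number,
  currentStep: number,
  currentAction: string | null
): string {
  // Zero-padded index so a plain `ls` sorts chronologically.
  return `${String(screenshotIndex).padStart(2, "0")}_step${currentStep}_${
    currentAction || "init"
  }.png`;
}

// Every run gets its own directory keyed by run ID.
function runDir(runId: string): string {
  return path.join("/tmp/assrt", runId);
}

console.log(
  path.join(runDir("a1b2c3"), "screenshots", screenshotName(0, 1, "navigate"))
);
// /tmp/assrt/a1b2c3/screenshots/00_step1_navigate.png
```

The padding is what makes the filenames the outline: lexicographic order and chronological order coincide.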
“If your QA automation run does not leave five files I can grep, it is not automation, it is a green dashboard.”
Field note, Assrt engineering
What it looks like to run one
Here is a real session. The command spawns the local MCP server, starts the video server on a random port, and walks through the two-scenario plan from earlier in this guide.
Proprietary DSL vs plaintext plan
Every commercial QA automation tool at the top end of the market (Testim, mabl, Katalon Studio) stores your tests in a vendor format that only their runtime understands. Migrating off is a rewrite. The plain-md plan is a different trade.
What your tests look like under the hood
Tests live in a vendor YAML or graphical DSL, stored in their cloud. There is no local copy you can commit. Leaving the vendor costs a rewrite. Artifacts live behind an API you rent.
- Proprietary YAML / DSL
- Tests stored in vendor cloud
- No portable export on cancel
- Artifacts behind a rented API
Side by side: artifact portability
Cost matters, but the axis this guide cares about is whether you own the artifacts. The table below is deliberately narrow: it is only about what lives on your disk at the end of a run.
| Feature | SaaS QA automation ($7.5K/mo tier) | Assrt (artifact-first) |
|---|---|---|
| Test plan format | Proprietary YAML / graphical DSL | Plaintext .md with `#Case N:` blocks |
| Plan portability after cancel | Unexportable vendor format | Plan file stays in your repo |
| Video recording per run | Cloud-only, requires subscription | recording.webm on disk |
| Screenshots named by step | Opaque IDs in a vendor UI | NN_stepN_<action>.png |
| Structured events log | Behind an API you rent | events.json on disk |
| Runs without a cloud dependency | Requires vendor cloud | 100% local or BYO-LLM |
| Software cost | $7.5K/mo at team scale | $0 (open source) |
| Browser engine | Proprietary runner | Real Playwright via @playwright/mcp 0.0.70 |
Five steps to an artifact-first setup
If you are starting from nothing, or migrating off a vendor YAML, this is the shortest honest path. Each step produces a real file or a real run; there is no "planning phase."
Install the runner
`npm i -g assrt-mcp` (or run ad-hoc via `npx assrt-mcp`). No account. No API key of ours. You bring an Anthropic or Gemini key for the planning model.
Write the plan
Create scenario.md with a `#Case 1: ...` block. A dozen English lines beat a hundred lines of Playwright. The regex that parses the headers is in src/core/scenario-files.ts.
Run it
`npx assrt-mcp --url https://yourapp.local --plan scenario.md`. The runner spawns @playwright/mcp 0.0.70 under the hood with `--output-mode file --output-dir ~/.assrt/playwright-output`.
Read the artifacts
Open /tmp/assrt/<runId>/video/player.html. Press 5 to go 5x. ArrowRight to jump 5 seconds. The exact moment a step fails is visually obvious within the first minute of watching.
Wire it into CI
In GitHub Actions (or any runner), call the CLI with `--headed=false`, upload /tmp/assrt/<runId>/ as a workflow artifact, and parse results/latest.json for the exit status. The artifact bundle is portable across CI vendors.
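The parsing step can be a dozen lines. This sketch assumes the results file is a JSON array of per-scenario pass/fail records with their assertion lists, which matches this guide's description of the file but is not the verbatim schema; every field name here is an assumption.

```typescript
// Hypothetical shape of the per-run results JSON: a flat pass/fail per
// scenario with the full assertion list, as described above.
interface AssertionResult {
  description: string;
  passed: boolean;
  evidence: string;
}
interface ScenarioResult {
  name: string;
  passed: boolean;
  assertions: AssertionResult[];
}

// Reduce a run to a CI exit code: 0 only if every scenario passed.
function exitCodeFor(scenarios: ScenarioResult[]): number {
  const failed = scenarios.filter((s) => !s.passed);
  for (const s of failed) {
    console.error(`FAIL ${s.name}`);
  }
  return failed.length === 0 ? 0 : 1;
}

// In the workflow step, point it at the results file, e.g.:
// process.exitCode = exitCodeFor(
//   JSON.parse(require("fs").readFileSync("results/latest.json", "utf8"))
// );
```

Because the input is one flat JSON file on disk, the same script works unchanged in GitHub Actions, GitLab CI, or a cron job.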
The numbers that matter
Not the ones typical guides cite (pyramid percentages, coverage targets), but the ones quoted throughout this guide and checkable against the source: the filename formatter at server.ts line 468, the pinned @playwright/mcp 0.0.70, the `/tmp/assrt` directory constant in scenario-files.ts, and the $7.5K/mo SaaS tier they replace.
A QA automation health checklist
Run it against whichever tool you have today. If fewer than seven of these are true, your test runs are not auditable, which means your green dashboard is a lie you have agreed with.
Every check is a file on disk, not a feature in a brochure
- Plan file lives as plaintext in a reviewable location (not a cloud-only DSL)
- Every run gets its own directory keyed by run ID, not overwritten in place
- Screenshots encode step number and action in the filename, not opaque IDs
- A WebM video lives next to the screenshots for that run
- A self-contained player.html opens the video without network access
- An events.json timeline records every action, assertion, and reasoning trace
- A results file captures pass/fail at scenario granularity, not just overall
- Browser profile persists between runs so auth state survives
- CI uploads the whole run directory as a single artifact bundle
Run one test the artifact-first way
Pull the open-source MCP server, write ten lines of scenario.md, and watch your first /tmp/assrt/<runId>/ directory fill up in under three minutes. No account, no credit card, no cloud.
Get started →
Questions people actually ask about QA automation
What is QA automation, really, stripped of the buzzwords?
QA automation is a program that drives your application the way a user would and writes down, to disk, what happened. That definition is deliberately boring: it excludes dashboards, pyramids, and vendor pitches, and it puts the emphasis on the only thing that matters at 2am when a release breaks: whether you can read what the test did. A QA automation setup that does not produce a scenario file, a video, a sequence of screenshots, a structured events log, and a pass/fail result JSON is not automation. It is a screensaver that occasionally returns green.
Why organize a guide around artifacts instead of the test pyramid?
Because the pyramid is a prioritization model, not a guide. Every top-ranked 'QA automation guide' on Google reuses the same three levels (unit, integration, end-to-end), lists benefits, lists frameworks, and ends. None of them answers 'what should be on my disk after a run.' Artifacts are concrete, portable, reviewable in a PR, replayable by a teammate. The pyramid is advice. The artifact tree is evidence.
What artifacts should a real QA automation run leave behind?
Minimum five. One, the plan: the English or pseudo-English description of what was tested (in Assrt this is /tmp/assrt/scenario.md, a plaintext file with `#Case N: name` sections). Two, the video: a WebM recording of the whole run (Assrt writes it to /tmp/assrt/<runId>/video/recording.webm). Three, the screenshots: one PNG per visual step, deterministically named (Assrt uses `${String(screenshotIndex).padStart(2, '0')}_step${currentStep}_${currentAction || 'init'}.png`). Four, the events log: a JSON timeline of every action, assertion, and reasoning trace. Five, the result file: the structured pass/fail summary that your CI job can grep. If any of those five is missing, debugging a red build will require re-running the test.
Where exactly does Assrt write these artifacts, and can I verify it myself?
Yes. After running `npx assrt-mcp` and executing one test, look at /tmp/assrt/<runId>/ on your disk. You will find: screenshots/ populated with files like 00_step1_navigate.png, 01_step2_click.png, 02_step3_type_text.png; video/recording.webm plus video/player.html; plus a results payload. The filename format is hardcoded at assrt-mcp/src/mcp/server.ts line 468: `${String(screenshotIndex).padStart(2, '0')}_step${currentStep}_${currentAction || 'init'}.png`. The run directory creation is at lines 427-432 of the same file. The plan sync is at src/core/scenario-files.ts line 16 where ASSRT_DIR is pinned to `/tmp/assrt`.
How does the plan file differ from a Cypress or Playwright test spec?
A Playwright spec is JavaScript or TypeScript: `await page.goto(url); await page.getByRole('button').click();`. A Cypress spec is similar. Both are code: executable, versioned, but not human-first. An Assrt plan is a plain .md file shaped like this: `#Case 1: Sign in with the demo account\n1. Open /login\n2. Type demo@example.com into the email field\n3. Click Continue\n4. Verify the dashboard greeting appears`. The agent reads that, calls real browser tools, and writes artifacts. The PM reviews the plan. The engineer reviews the results JSON. The plan is the contract; neither side is forced into the other's domain.
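The header parsing this guide keeps pointing at (src/core/scenario-files.ts) can be approximated in a few lines. The regex below is a guess at the behavior described, matched against the examples in this guide, not the literal source:

```typescript
// Approximate parser for `#Case N: name` plan headers. The real regex
// lives in src/core/scenario-files.ts; this one is an assumption.
interface PlanCase {
  index: number;
  name: string;
  steps: string[];
}

const CASE_HEADER = /^#Case (\d+):\s*(.+)$/;

function parsePlan(md: string): PlanCase[] {
  const cases: PlanCase[] = [];
  for (const line of md.split("\n")) {
    const m = line.match(CASE_HEADER);
    if (m) {
      // A header opens a new case block.
      cases.push({ index: Number(m[1]), name: m[2].trim(), steps: [] });
    } else if (cases.length > 0 && line.trim() !== "") {
      // Every non-blank line until the next header is a step.
      cases[cases.length - 1].steps.push(line.trim());
    }
  }
  return cases;
}
```

The point of the sketch is how little machinery the format needs: a one-line regex and a loop, versus a vendor runtime for a proprietary DSL.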
The player.html that ships with every run has keyboard shortcuts. What are they?
Space toggles play/pause. ArrowLeft seeks back 5 seconds, ArrowRight seeks forward 5. The digit keys rebind speed: 1 is 1x, 2 is 2x, 3 is 3x, 5 is 5x, 0 is 10x. Default playback speed is 5x because nobody wants to watch a real-time browser video. The logic is inline in the generated HTML, defined in assrt-mcp/src/mcp/server.ts around lines 96-107. You can open /tmp/assrt/<runId>/video/player.html directly in any browser even after Assrt has quit; the file is self-contained.
How does this approach compare to SaaS QA automation tools like Testim or mabl at $7.5K/mo?
The expensive SaaS tools author tests in a proprietary YAML or graphical DSL, execute them in a vendor cloud, and keep the artifacts in that cloud. You rent access to your own test results. If you cancel, the specs are unexportable and the history is inaccessible. Assrt is the opposite: open-source (`npx assrt-mcp`), self-hosted, and every artifact is a file on your disk. The scenario.md will still run next week under any Playwright MCP agent, because there is no proprietary format between you and the browser. The trade is that you run and pay for your own LLM calls; at Claude Haiku 4.5 pricing (the pinned default at agent.ts line 9) that is tiny.
Is this actually Playwright or some wrapper pretending to be Playwright?
The browser is driven by `@playwright/mcp` version 0.0.70 (pinned in assrt-mcp/package.json), spawned over stdio. The tool names the agent calls (navigate, snapshot, click, type_text, scroll, press_key, select_option, wait, evaluate) are the literal Playwright MCP tool names. The difference from writing a Playwright spec by hand is not what runs the browser, it is what decides which calls to make: in Assrt the decider is an LLM reading your plan file, not a hand-coded scenario script. That means the same browser automation engine, a different planning layer.
What happens when a test fails — how do I audit it?
Open the run directory. First glance is the player.html video at 5x — in thirty seconds you see which click produced the wrong state. Then look at the numbered screenshots folder for the exact step (the name encodes the step number and action, e.g. 04_step5_click.png). Then open events.json for the structured timeline: each assertion has {description, passed, evidence} fields, where evidence is the English sentence the model wrote about what it saw (type defined in src/core/types.ts lines 13-17). This is meaningfully different from reading a Playwright CI log, which gives you a stack trace but no narrative.
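Pulling the failures out of events.json can be a one-liner in CI. This sketch assumes the timeline is a JSON array whose assertion entries carry the {description, passed, evidence} fields described above; the surrounding event envelope is a guess:

```typescript
// Hypothetical envelope for an events.json entry. Only the assertion
// payload fields come from this guide; the rest is assumed.
interface RunEvent {
  timestamp: string;
  type: string; // "navigate" | "click" | "assertion" | "reasoning" | ...
  assertion?: { description: string; passed: boolean; evidence: string };
}

// Collect one human-readable line per failed assertion.
function failedAssertions(events: RunEvent[]): string[] {
  return events
    .filter((e) => e.type === "assertion" && e.assertion !== undefined && !e.assertion.passed)
    .map((e) => `${e.assertion!.description}: ${e.assertion!.evidence}`);
}
```

Because the evidence field is an English sentence rather than a stack trace, the output of this filter reads like a bug report, not a log dump.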
Can the same plan file run against different environments (local, staging, prod)?
Yes. The plan references URLs and text, not implementation. `#Case 3: Checkout with a test card\n1. Navigate to {{baseUrl}}/cart` works against any base URL passed at run time. Assrt's assrt_test tool accepts both `url` (the target) and `variables` (a key/value map for parameterized tests). One plan file runs unchanged against localhost:3000, staging.yourapp.com, and yourapp.com, and writes per-run artifact directories you can diff between environments.
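The `{{variable}}` substitution can be sketched as a single template pass. This implementation is an assumption about the behavior, not Assrt's actual code:

```typescript
// Replace {{key}} placeholders in a plan with values from the
// `variables` map the assrt_test tool accepts. Unknown keys are left
// intact so a missing variable is visible in the rendered plan.
function renderPlan(plan: string, variables: Record<string, string>): string {
  return plan.replace(/\{\{(\w+)\}\}/g, (match, key: string) =>
    key in variables ? variables[key] : match
  );
}
```

One plan, three environments: call `renderPlan(plan, { baseUrl: "http://localhost:3000" })` locally and swap the map in CI.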
Does 'QA automation' still include manual QA, or does this replace it?
Manual exploratory testing does not die. A human finds a class of bug that a plan-based agent never thinks to check: visual subtlety, emotional friction, unexpected user paths. The right frame is that artifact-first automation handles the regressive cases (the flows you have already decided matter) and frees the humans to chase the unknown. A QA team running artifact-first automation on its top 40 flows can redirect 60% of the hours they used to spend re-running the same checkout test to actually breaking the product in creative ways.
How long does it take to add a new test to an artifact-first QA automation setup?
Writing the plan: one to three minutes, because it is English. Running it the first time: a few minutes while the agent finds the right selectors and stabilizes. From there, every future run is headless, in CI, in the background, and produces the same artifact tree. Contrast with writing a Playwright spec: you open the inspector, find stable selectors, write the code, debug the flake, add waits, add masks, parameterize env vars. The plan skips every step that is not about deciding what to test.
Guides adjacent to this one
Keep reading
E2E Testing Frameworks, Compared
A side-by-side of the actual runners powering QA automation in 2026.
Visual Regression Tutorial
How page-level visual regression works without a pixel-diff baseline.
Automated Test Automation
Plan-to-artifact automation pipelines, without the YAML cage.