A guide organized by artifacts, not theory

QA automation, judged by the files it leaves on disk

Every guide ranking for this keyword defines QA automation in abstractions: the pyramid, the benefits, the frameworks list. None of them names a single file a real test run should produce. This one starts from the opposite end. If you and I agree on which artifacts a run has to leave behind, most of the theory answers itself.

$ ls /tmp/assrt/<runId>/

Assrt Engineering
11 min read
  • 5 artifacts per run: plan, video, screenshots, events, result
  • Real @playwright/mcp 0.0.70 under the hood, not a proprietary runner
  • Zero proprietary DSL — every test is plaintext on your disk
  • Claude Haiku 4.5 is the default planning model (agent.ts line 9)

The frame: what QA automation actually is

A QA automation run is a function whose inputs are a plan, a URL, and some variables, and whose outputs are files on your disk. Everything in this guide flows from that definition. You cannot debug a red build from a dashboard alone; you need the video, the screenshots, and the events timeline, sitting in a directory you can tar up, attach to a bug, or commit to a draft PR. The first deliverable of any QA automation tool, before features and dashboards, is a well-shaped artifact tree.

The pipeline looks like this. On the left, three inputs. In the middle, one process (a Playwright-driven agent reading your plan). On the right, the four artifacts a reviewer will actually open when something breaks.

Plan → Agent → Artifacts

scenario.md
variables
target URL
Assrt agent
recording.webm
screenshots/*.png
events.json
results/<runId>.json

The anchor fact

Every Assrt run writes its artifacts to /tmp/assrt/<runId>/. Screenshots land in screenshots/ with filenames generated by the literal formatter on line 468 of assrt-mcp/src/mcp/server.ts: ${index.padStart(2,'0')}_step${stepN}_${action}.png. The video (recording.webm) sits next to a self-contained player.html that rebinds keyboard keys 1 2 3 5 0 to 1x, 2x, 3x, 5x, 10x playback. The plan lives at /tmp/assrt/scenario.md with `#Case N: name` headers parsed by the regex at src/core/scenario-files.ts. Verify it yourself: run npx assrt-mcp, trigger a test, then ls -R the directory.
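As a sketch of what that formatter does — the template string is quoted from server.ts, but the wrapper function around it is illustrative, not the actual source:

```typescript
// Sketch of the screenshot-naming logic described above. The template
// literal matches the one quoted from server.ts line 468; the function
// name and signature are illustrative.
function screenshotName(
  screenshotIndex: number,
  currentStep: number,
  currentAction: string | null
): string {
  return `${String(screenshotIndex).padStart(2, '0')}_step${currentStep}_${currentAction || 'init'}.png`;
}

// An ls of screenshots/ reconstructs the test narrative:
console.log(screenshotName(0, 1, 'navigate')); // 00_step1_navigate.png
console.log(screenshotName(1, 2, 'click'));    // 01_step2_click.png
```

The two-digit zero-padded prefix is what keeps the files in execution order under a plain `ls`.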

The artifact tree, seen through ls -R

Before opening the code, it is worth looking at the filesystem. This is the output of listing one real run directory. Notice that the step numbers and action names are already embedded in the filenames; you can reconstruct the narrative of the test without opening a single file.

/tmp/assrt/
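The listing itself did not survive extraction; assembled from the filenames documented throughout this guide, a representative tree looks roughly like this (`<runId>` is a placeholder, and the exact placement of results/ may vary by version):

```
/tmp/assrt/
├── scenario.md
├── results/
│   └── <runId>.json
└── <runId>/
    ├── events.json
    ├── screenshots/
    │   ├── 00_step1_navigate.png
    │   ├── 01_step2_click.png
    │   └── 02_step3_type_text.png
    └── video/
        ├── recording.webm
        └── player.html
```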

What each artifact is for

Six files, each doing a job that would take a paragraph of configuration in a traditional QA automation stack. The goal is that a new engineer joining your team can understand the QA posture from one ls.

scenario.md

The plan, in English, grouped into `#Case N: name` blocks. Versioned. Diffable. Reviewable by a PM. Re-runnable by any Playwright MCP agent after Assrt is gone.

#Case 1: Sign in with the demo account
#Case 2: Update a profile field
#Case 3: Cancel a subscription

events.json

Structured timeline. Every navigate, click, assertion, reasoning trace, and improvement suggestion with a timestamp. Grep it in CI.
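A minimal CI-side consumer might look like this. The event shape here is an assumption based on the {description, passed, evidence} assertion fields described later in this guide; the real types live in src/core/types.ts:

```typescript
// Sketch: pull failed assertions out of events.json in CI.
// The field names below are assumptions, not the verified schema.
interface AssrtEvent {
  type: string;          // e.g. 'assertion', 'navigate', 'click'
  timestamp: string;
  description?: string;
  passed?: boolean;
  evidence?: string;     // the model's English note about what it saw
}

function failedAssertions(events: AssrtEvent[]): AssrtEvent[] {
  return events.filter((e) => e.type === 'assertion' && e.passed === false);
}

// In CI you would JSON.parse the events.json from the run directory;
// inline sample data stands in for it here.
const sample: AssrtEvent[] = [
  { type: 'navigate', timestamp: '2024-01-01T00:00:00Z' },
  { type: 'assertion', timestamp: '2024-01-01T00:00:05Z',
    description: 'dashboard greeting appears', passed: false,
    evidence: 'The page still shows the login form.' },
];
console.log(failedAssertions(sample).length);
```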

recording.webm

A WebM video of the entire run. Sits next to a self-contained player.html that maps 1/2/3/5/0 to 1x/2x/3x/5x/10x playback.

screenshots/

One PNG per visual action. Filenames encode the step index and action so `ls` is already the whole outline.

00_step1_navigate.png
01_step2_click.png
02_step3_type_text.png

results/<runId>.json

The scoreboard. A flat pass/fail per scenario with the full assertion list. One file your CI step can parse without pulling a client library.
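A CI gate over that file can be a few lines. The field names below are assumptions — the guide only promises a flat pass/fail per scenario with the full assertion list — so treat this as a sketch, not the real schema:

```typescript
// Sketch of a CI gate over results/<runId>.json.
// `scenarios` and `passed` are assumed field names.
interface ScenarioResult {
  name: string;
  passed: boolean;
}

function exitCode(results: { scenarios: ScenarioResult[] }): number {
  return results.scenarios.every((s) => s.passed) ? 0 : 1;
}

const run = {
  scenarios: [
    { name: 'Sign in with the demo account', passed: true },
    { name: 'Cancel a subscription', passed: false },
  ],
};
console.log(exitCode(run)); // 1: at least one scenario failed
```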

~/.assrt/browser-profile/

Persistent cookies and localStorage. Sign in once; every future run starts in a logged-in state. Delete the folder to reset.

The plan file, in plaintext

This is the entire contract between the person authoring the test and the agent running it. No decorators, no fixtures, no page-object model. Plain prose steps, grouped by #Case N: headers. A PM can read it and open a PR that modifies a step. An agent can read the same file and drive the browser.

/tmp/assrt/scenario.md
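The file contents were elided above. Here is a representative scenario.md — Case 1 is taken from the worked example later in this guide; Case 2's steps are invented for illustration:

```markdown
#Case 1: Sign in with the demo account
1. Open /login
2. Type demo@example.com into the email field
3. Click Continue
4. Verify the dashboard greeting appears

#Case 2: Update a profile field
1. Open the profile settings page
2. Change the display name to "QA Bot"
3. Click Save
4. Verify a success confirmation appears
```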

The code that writes the artifacts

This is the single block of TypeScript responsible for giving every run a portable, auditable artifact tree. If you have wondered where a claim like “screenshots are named by step” actually lives in the source, this is it.

assrt-mcp/src/mcp/server.ts
5 artifacts
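The block itself was elided in extraction. As a sketch of the behavior it implements — a per-run directory with the subdirectories this guide documents — using Node's fs API rather than the actual Assrt source (the real logic sits around lines 427-432 of server.ts):

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Illustrative per-run artifact-tree creation; helper name and the
// exact subdirectory set are assumptions based on this guide.
function createRunDir(baseDir: string, runId: string): string {
  const runDir = path.join(baseDir, runId);
  for (const sub of ['screenshots', 'video', 'results']) {
    fs.mkdirSync(path.join(runDir, sub), { recursive: true });
  }
  return runDir;
}

// Demo against a temp dir instead of /tmp/assrt:
const base = fs.mkdtempSync(path.join(os.tmpdir(), 'assrt-demo-'));
const runDir = createRunDir(base, 'run-001');
console.log(fs.readdirSync(runDir).sort()); // [ 'results', 'screenshots', 'video' ]
```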

If your QA automation run does not leave five files I can grep, it is not automation, it is a green dashboard.

Field note, Assrt engineering

What it looks like to run one

Here is a real session. The command spawns the local MCP server, starts the video server on a random port, and walks through the two-scenario plan from earlier in this guide.

$ run

Proprietary DSL vs plaintext plan

Every commercial QA automation tool at the top end of the market (Testim, mabl, Katalon Studio) stores your tests in a vendor format that only their runtime understands. Migrating off is a rewrite. The plain-md plan is a different trade.

What your tests look like under the hood

Tests live in a vendor YAML or graphical DSL, stored in their cloud. There is no local copy you can commit. Leaving the vendor costs a rewrite. Artifacts live behind an API you rent.

  • Proprietary YAML / DSL
  • Tests stored in vendor cloud
  • No portable export on cancel
  • Artifacts behind a rented API

Side by side: artifact portability

Cost matters, but the axis this guide cares about is whether you own the artifacts. The table below is deliberately narrow: it is only about what lives on your disk at the end of a run.

| Feature | SaaS QA automation ($7.5K/mo tier) | Assrt (artifact-first) |
| --- | --- | --- |
| Test plan format | Proprietary YAML / graphical DSL | Plaintext .md with `#Case N:` blocks |
| Plan portability after cancel | Unexportable vendor format | Plan file stays in your repo |
| Video recording per run | Cloud-only, requires subscription | recording.webm on disk |
| Screenshots named by step | Opaque IDs in a vendor UI | NN_stepN_<action>.png |
| Structured events log | Behind an API you rent | events.json on disk |
| Runs without a cloud dependency | Requires vendor cloud | 100% local or BYO-LLM |
| Software cost | $7.5K/mo at team scale | $0 (open source) |
| Browser engine | Proprietary runner | Real Playwright via @playwright/mcp 0.0.70 |

Five steps to an artifact-first setup

If you are starting from nothing, or migrating off a vendor YAML, this is the shortest honest path. Each step produces a real file or a real run; there is no "planning phase."

1

Install the runner

`npm i -g assrt-mcp` (or run ad-hoc via `npx assrt-mcp`). No account. No API key of ours. You bring an Anthropic or Gemini key for the planning model.

2

Write the plan

Create scenario.md with a `#Case 1: ...` block. A dozen English lines beat a hundred lines of Playwright. The regex that parses the headers is in src/core/scenario-files.ts.
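The actual regex is not reproduced in this guide, so the parser below is a plausible equivalent, not the real src/core/scenario-files.ts code:

```typescript
// Hypothetical equivalent of the `#Case N: name` header parser.
// The real regex lives in src/core/scenario-files.ts and may differ.
const CASE_HEADER = /^#Case (\d+): (.+)$/;

function splitPlan(plan: string): { n: number; name: string; steps: string[] }[] {
  const cases: { n: number; name: string; steps: string[] }[] = [];
  for (const line of plan.split('\n')) {
    const m = CASE_HEADER.exec(line.trim());
    if (m) {
      cases.push({ n: Number(m[1]), name: m[2], steps: [] });
    } else if (line.trim() && cases.length > 0) {
      cases[cases.length - 1].steps.push(line.trim());
    }
  }
  return cases;
}

const plan = `#Case 1: Sign in with the demo account
1. Open /login
2. Click Continue

#Case 2: Update a profile field
1. Open the profile settings page`;
console.log(splitPlan(plan).map((c) => c.name));
```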

3

Run it

`npx assrt-mcp --url https://yourapp.local --plan scenario.md`. The runner spawns @playwright/mcp 0.0.70 under the hood with `--output-mode file --output-dir ~/.assrt/playwright-output`.

4

Read the artifacts

Open /tmp/assrt/<runId>/video/player.html. Press 5 to go 5x. ArrowRight to jump 5 seconds. The exact moment a step fails is visually obvious within the first minute of watching.

5

Wire it into CI

In GitHub Actions (or any runner), call the CLI with `--headed=false`, upload /tmp/assrt/<runId>/ as a workflow artifact, and parse results/latest.json for the exit status. The artifact bundle is portable across CI vendors.
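As a sketch, that step might look like this in a GitHub Actions job. The action versions, the target URL, and the shape of results/latest.json are assumptions; adjust them to your repo:

```yaml
# Sketch of the CI wiring described above; not a verified workflow.
- name: Run Assrt tests
  run: npx assrt-mcp --url http://localhost:3000 --plan scenario.md --headed=false

- name: Upload run artifacts
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: assrt-run
    path: /tmp/assrt/

- name: Gate on results
  # `passed` is an assumed field name in the results JSON.
  run: node -e "const r=require('/tmp/assrt/results/latest.json'); process.exit(r.passed ? 0 : 1)"
```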

The numbers that matter

Not the ones typical guides cite (pyramid percentages, coverage targets). The ones you can verify against the source.

5 — artifacts per run (plan, video, shots, events, result)
5 — playback speeds bound to keyboard digits
9 — Playwright MCP tool primitives (navigate, click, type_text, scroll...)
0 — lines of proprietary DSL the plan file depends on
5x — default video playback speed
2 — digits in the step index prefix
3 — MCP tools on the server (test, plan, diagnose)

A QA automation health checklist

Run it against whichever tool you have today. If fewer than seven of these are true, your test runs are not auditable, which means your green dashboard is a lie you have agreed with.

Every check is a file on disk, not a feature in a brochure

  • Plan file lives as plaintext in a reviewable location (not a cloud-only DSL)
  • Every run gets its own directory keyed by run ID, not overwritten in place
  • Screenshots encode step number and action in the filename, not opaque IDs
  • A WebM video lives next to the screenshots for that run
  • A self-contained player.html opens the video without network access
  • An events.json timeline records every action, assertion, and reasoning trace
  • A results file captures pass/fail at scenario granularity, not just overall
  • Browser profile persists between runs so auth state survives
  • CI uploads the whole run directory as a single artifact bundle
  • /tmp/assrt/scenario.md
  • scenario.json
  • results/latest.json
  • results/<runId>.json
  • <runId>/video/recording.webm
  • <runId>/video/player.html
  • <runId>/screenshots/00_step1_navigate.png
  • <runId>/screenshots/01_step2_click.png
  • <runId>/screenshots/02_step3_type_text.png
  • <runId>/events.json
  • ~/.assrt/browser-profile/
  • ~/.assrt/scenarios/<uuid>.json

Run one test the artifact-first way

Pull the open-source MCP server, write ten lines of scenario.md, and watch your first /tmp/assrt/<runId>/ directory fill up in under three minutes. No account, no credit card, no cloud.

Get started

Questions people actually ask about QA automation

What is QA automation, really, stripped of the buzzwords?

QA automation is a program that drives your application the way a user would and writes down, to disk, what happened. That definition is deliberately boring: it excludes dashboards, pyramids, and vendor pitches, and it puts the emphasis on the only thing that matters at 2am when a release breaks — can you read what the test did. A QA automation setup that does not produce a scenario file, a video, a sequence of screenshots, a structured events log, and a pass/fail result JSON is not automation. It is a screensaver that occasionally returns green.

Why organize a guide around artifacts instead of the test pyramid?

Because the pyramid is a prioritization model, not a guide. Every top-ranked 'QA automation guide' on Google reuses the same three levels (unit, integration, end-to-end), lists benefits, lists frameworks, and ends. None of them answers 'what should be on my disk after a run.' Artifacts are concrete, portable, reviewable in a PR, replayable by a teammate. The pyramid is advice. The artifact tree is evidence.

What artifacts should a real QA automation run leave behind?

Minimum five. One, the plan: the English or pseudo-English description of what was tested (in Assrt this is /tmp/assrt/scenario.md, a plaintext file with `#Case N: name` sections). Two, the video: a WebM recording of the whole run (assrt writes it to /tmp/assrt/<runId>/video/recording.webm). Three, the screenshots: one PNG per visual step, deterministically named (assrt uses `${index.padStart(2,'0')}_step${stepN}_${action}.png`). Four, the events log: a JSON timeline of every action, assertion, and reasoning trace. Five, the result file: the structured pass/fail summary that your CI job can grep. If any of those five is missing, debugging a red build will require re-running the test.

Where exactly does Assrt write these artifacts, and can I verify it myself?

Yes. After running `npx assrt-mcp` and executing one test, look at /tmp/assrt/<runId>/ on your disk. You will find: screenshots/ populated with files like 00_step1_navigate.png, 01_step2_click.png, 02_step3_type_text.png; video/recording.webm plus video/player.html; plus a results payload. The filename format is hardcoded at assrt-mcp/src/mcp/server.ts line 468: `${String(screenshotIndex).padStart(2, '0')}_step${currentStep}_${currentAction || 'init'}.png`. The run directory creation is at lines 427-432 of the same file. The plan sync is at src/core/scenario-files.ts line 16 where ASSRT_DIR is pinned to `/tmp/assrt`.

How does the plan file differ from a Cypress or Playwright test spec?

A Playwright spec is JavaScript or TypeScript: `await page.goto(url); await page.getByRole('button').click();`. A Cypress spec is similar. Both are code: executable, versioned, but not human-first. An Assrt plan is a plain .md file shaped like this: `#Case 1: Sign in with the demo account\n1. Open /login\n2. Type demo@example.com into the email field\n3. Click Continue\n4. Verify the dashboard greeting appears`. The agent reads that, calls real browser tools, and writes artifacts. The PM reviews the plan. The engineer reviews the results JSON. The plan is the contract; neither side is forced into the other's domain.

The player.html that ships with every run has keyboard shortcuts. What are they?

Space toggles play/pause. ArrowLeft seeks back 5 seconds, ArrowRight seeks forward 5. The digit keys rebind speed: 1 is 1x, 2 is 2x, 3 is 3x, 5 is 5x, 0 is 10x. Default playback speed is 5x because nobody wants to watch a real-time browser video. The logic is inline in the generated HTML, defined in assrt-mcp/src/mcp/server.ts around lines 96-107. You can open /tmp/assrt/<runId>/video/player.html directly in any browser even after Assrt has quit; the file is self-contained.
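The digit-to-speed binding described above can be sketched in a few lines; the real logic is inline in the generated HTML, so the names here are illustrative:

```typescript
// Sketch of the player.html speed bindings (real logic is inline in the
// generated HTML around server.ts lines 96-107).
const SPEED_KEYS: Record<string, number> = {
  '1': 1, '2': 2, '3': 3, '5': 5, '0': 10,
};
const DEFAULT_SPEED = 5; // nobody wants real-time playback

function playbackRateFor(key: string, current: number): number {
  return SPEED_KEYS[key] ?? current; // unmapped keys leave the rate alone
}

let rate = DEFAULT_SPEED;
rate = playbackRateFor('0', rate);
console.log(rate); // 10
```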

How does this approach compare to SaaS QA automation tools like Testim or mabl at $7.5K/mo?

The expensive SaaS tools author tests in a proprietary YAML or graphical DSL, execute them in a vendor cloud, and keep the artifacts in that cloud. You rent access to your own test results. If you cancel, the specs are unexportable and the history is inaccessible. Assrt is the opposite: open-source (`npx assrt-mcp`), self-hosted, and every artifact is a file on your disk. The scenario.md will still run next week under any Playwright MCP agent, because there is no proprietary format between you and the browser. The trade is that you run and pay for your own LLM calls; at Claude Haiku 4.5 pricing (the pinned default at agent.ts line 9) that is tiny.

Is this actually Playwright or some wrapper pretending to be Playwright?

The browser is driven by `@playwright/mcp` version 0.0.70 (pinned in assrt-mcp/package.json), spawned over stdio. The tool names the agent calls (navigate, snapshot, click, type_text, scroll, press_key, select_option, wait, evaluate) are the literal Playwright MCP tool names. The difference from writing a Playwright spec by hand is not what runs the browser, it is what decides which calls to make: in Assrt the decider is an LLM reading your plan file, not a hand-coded scenario script. That means the same browser automation engine, a different planning layer.

What happens when a test fails — how do I audit it?

Open the run directory. First glance is the player.html video at 5x — in thirty seconds you see which click produced the wrong state. Then look at the numbered screenshots folder for the exact step (the name encodes the step number and action, e.g. 04_step5_click.png). Then open events.json for the structured timeline: each assertion has {description, passed, evidence} fields, where evidence is the English sentence the model wrote about what it saw (type defined in src/core/types.ts lines 13-17). This is meaningfully different from reading a Playwright CI log, which gives you a stack trace but no narrative.

Can the same plan file run against different environments (local, staging, prod)?

Yes. The plan references URLs and text, not implementation. `#Case 3: Checkout with a test card\n1. Navigate to {{baseUrl}}/cart` works against any base URL passed at run time. Assrt's assrt_test tool accepts both `url` (the target) and `variables` (a key/value map for parameterized tests). One plan file runs unchanged against localhost:3000, staging.yourapp.com, and yourapp.com, and writes per-run artifact directories you can diff between environments.
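The substitution itself is straightforward; the helper below is a hypothetical expansion of {{variable}} placeholders, not Assrt's actual implementation of the `variables` map:

```typescript
// Hypothetical {{variable}} expansion for a plan string. The assrt_test
// tool takes a `variables` map, but this exact helper is illustrative.
function expandPlan(plan: string, variables: Record<string, string>): string {
  // Unknown placeholders are left intact rather than erased.
  return plan.replace(/\{\{(\w+)\}\}/g, (match, name) => variables[name] ?? match);
}

const step = 'Navigate to {{baseUrl}}/cart';
console.log(expandPlan(step, { baseUrl: 'http://localhost:3000' }));
// Navigate to http://localhost:3000/cart
```

The same plan string expands against localhost, staging, or production simply by swapping the map.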

Does 'QA automation' still include manual QA, or does this replace it?

Manual exploratory testing does not die. A human finds a class of bug that a plan-based agent never thinks to check: visual subtlety, emotional friction, unexpected user paths. The right frame is that artifact-first automation handles the regressive cases (the flows you have already decided matter) and frees the humans to chase the unknown. A QA team running artifact-first automation on its top 40 flows can redirect 60% of the hours they used to spend re-running the same checkout test to actually breaking the product in creative ways.

How long does it take to add a new test to an artifact-first QA automation setup?

Writing the plan: one to three minutes, because it is English. Running it the first time: a few minutes while the agent finds the right selectors and stabilizes. From there, every future run is headless, in CI, in the background, and produces the same artifact tree. Contrast with writing a Playwright spec: you open the inspector, find stable selectors, write the code, debug the flake, add waits, add masks, parameterize env vars. The plan skips every step that is not about deciding what to test.
