AI generated regression tests, but show me the file
The top five results for this phrase describe what AI regression generators do. None of them show the artifact: the actual file the AI produces, where it lands on your disk, whether you can grep it, diff it, or check it into git. That is the only question that matters when you are going to maintain the thing for a year. This page answers it with a file path, a line number, and a watcher loop.
The question the SERP dodges
Open the first five Google results for “ai generated regression tests” and count how many show you the file the AI produces. The number is zero. You get marketing about self-healing selectors, anecdotes about a 70% reduction in manual work, lists of vendors. Every page stops before the interesting line: what is the on-disk format, and where does it land?
That omission is deliberate. If the answer is “a record in our SaaS”, the rest of your regression tooling is coupled to a subscription. If the answer is “a proprietary YAML export blob”, you can technically download it, but nothing outside the vendor UI can read it. Either way, your regression tests are not text, which means they cannot live in the same pull request as the feature they cover, and your team loses the only review workflow that actually catches regressions early: a readable diff.
Assrt is built on the opposite bet. The AI generator writes a markdown file. The file sits at a path you can `ls`. The watcher loop under the hood makes hand-edits first-class. The rest of this page is evidence for that claim, starting with the two files side by side.
What the AI generation looks like, two ways
```yaml
# exported from acme-test-cloud
# proprietary format, only runs inside the vendor SaaS
test_id: 0f8b3a2d-92e1-4c0a
name: Sign in with demo account
version: 7
author: ai-synthesizer@system
region: us-east-2
last_run: 2026-04-18T09:12:44Z
session_key: a9f0d1...REDACTED
actions:
  - kind: NAV
    target_selector_id: 148301
    payload_ref: scfg://bucket/14/22/8ff.json
  - kind: TYPE
    target_selector_id: 148312
    payload_ref: svar://demo_email_v3
  - kind: CLICK
    target_selector_id: 148317
    policy: auto_heal
  - kind: ASSERT_TEXT
    target_selector_id: 148322
    payload_ref: svar://welcome_text_v2
# note: this file is an export artifact,
# not a primary source. changes must be made
# in the web editor at app.acme-test.com
```
- `payload_ref` values point at a private bucket, unreadable outside the SaaS
- `target_selector_id` is an opaque integer mapped inside the vendor DB
- `session_key` ties the file to a single authenticated account
- The export is an artifact, not a source; primary edits happen in the vendor's web UI
A real AI generation, in full
This is what lands at /tmp/assrt/scenario.md after one assrt_plan call against a typical SaaS app with marketing pages plus auth. Three cases, each under ten lines, each independent, so case two does not depend on case one's state. The prompt caps the output at eight cases ("Generate 5-8 cases max" at server.ts line 236), so the first run is narrow and reliable, and you grow the file by hand as you cover more flows.
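The exact cases depend on the app under test; here is a representative file in the shape this page describes. The case names and steps below are illustrative, not real generator output:

```markdown
#Case 1: Demo account can sign in
1. Navigate to the login page
2. Type the demo email and password into the sign-in form
3. Click the Sign in button
4. Assert the welcome text appears

#Case 2: Wrong password shows an error
1. Navigate to the login page
2. Type the demo email and a wrong password, then click Sign in
3. Assert an error message appears

#Case 3: Pricing page shows all plan tiers
1. Navigate to the pricing page
2. Assert each plan tier card is visible
```

Plain prose, one header per case, no selectors. That is the whole format.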
The 14 lines that keep the file editable
Most AI test generators treat the output as one-shot: generate, display, save to their DB, never look at the file again. Assrt keeps a live fs.watch on the scenario file so human edits and AI generations both flow through the same path. Under the hood it is about fourteen lines of Node in the open source package.
Two details make this work. First, the debounce: a tight 1000ms window, short enough that the edit-to-sync loop feels instant, long enough that a multi-line save in vim fires only one sync. Second, the echo guard at line 136. When Assrt writes the plan itself, it stamps lastWrittenContent; on the next tick, the watcher reads the file, sees the content matches, and does nothing. Only edits the runtime cannot explain (in other words, your edits) propagate.
How the round-trip flows
The AI generation is the first write into the file. Everything after is a round-trip between your disk and the cloud plan record, mediated by that single fs.watch.
scenario.md round-trip
A full edit-and-sync in the terminal
This is the loop from the outside: one generation, one hand-edit, and the sync event, all visible in stdout. No browser tab, no vendor login. The resulting diff is what a reviewer sees in a pull request.
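The stdout itself varies by app; the reviewer-facing half is just a unified diff of the markdown, along these lines (the case text here is hypothetical):

```diff
 #Case 2: Wrong password shows an error
 1. Navigate to the login page
 2. Type the demo email and a wrong password, then click Sign in
-3. Assert the text "Wrong password" appears
+3. Assert the text "Incorrect email or password" appears
```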
Why this format wins for regression, specifically
A regression test is a test you expect to re-run dozens of times against a moving codebase. The hard part is not writing it the first time, it is deciding whether today's failure is your code breaking or your test drifting. That decision is much easier when the test is a markdown file in git: you look at the last commit that touched it, you read the prose, you know.
- **Readable diffs.** A two-line change in prose is a two-line diff. No screenshot of a dashboard, no vendor timeline, no "compare versions" modal. Your code reviewer reads the change the same way they read any other diff in the pull request.
- **Plan history per run.** `saveScenarioRun` stores the exact plan text as `planSnapshot` at scenario-store.ts line 169. If today's run fails and last week's passed, you can diff the two plan snapshots and see whether anyone edited the test in between.
- **Resilience to selector drift.** Because the plan says "click the Sign in button" instead of a CSS selector, the agent calls `snapshot()` on every run and picks a fresh `[ref=eN]` from the accessibility tree. A CSS rename does not break the test; a removed button does.
- **Portable failure artifacts.** Every run writes `/tmp/assrt/<runId>/` with a WebM video, numbered screenshots, events.json, and the pass/fail JSON. Attach the tarball to the bug; anyone opens it without an account.
What the on-disk format buys you
Every row below is about the file itself, not a feature the dashboard wraps around it.
| Feature | Typical SaaS AI tester | Assrt |
|---|---|---|
| On-disk format | Proprietary YAML export or DB row | Plaintext markdown with `#Case` headers |
| Where the test file lives | Vendor cloud, opaque session | /tmp/assrt/scenario.md (move into repo) |
| Diff in a pull request | Dashboard toggle between versions | `git diff` reads as prose |
| Round-trip human edits | Edit in web UI only | fs.watch + 1s debounce syncs back |
| Plan history per run | Hidden in the platform timeline | planSnapshot field per run record |
| Hand-edit without regenerating | Usually requires a new AI generation | Change the markdown, save |
| Export and take elsewhere | Locked-in format, unreadable outside | Copy the file, nothing else needed |
| Cost at team scale | ~$7.5K/mo typical seat pricing | $0 plus LLM tokens |
Prices are list-tier seats and exports as documented by each vendor's public pricing; your contract may vary.
Adopting AI generated regression tests without lock-in
Five steps; each is a file move or a command. None of them require signing into a platform.
Generate the first pass with `assrt_plan`
Point the plan tool at your URL. The generator (constrained by the prompt at server.ts line 236) writes 5-8 `#Case` blocks to /tmp/assrt/scenario.md. This is the only step that burns LLM tokens for the regression file itself; everything after it is text editing.
Move the file into your repo
`mv /tmp/assrt/scenario.md tests/regression/signup.md` and commit. You have now captured the regression plan as source. Next run just points `--plan` at the repo path. If you change nothing else, every rerun is a true regression: same plan text, same agent, same pass criteria.
Edit the file when the app genuinely changes
UI drift does not usually require an edit; the agent calls `snapshot()` at run time and picks a new ref. But when intent changes (a step is added, a field is renamed in the spec, a new assertion is needed), open the file and edit the prose. One-line diff, one PR, reviewed like any source change.
Re-run by scenarioId for stable history
After the first save, Assrt assigns a scenario UUID. `assrt_test({ url, scenarioId })` fetches the latest stored plan and runs it. Each run record persists `planSnapshot` (scenario-store.ts line 169) so you can always tell whether today's failure is because the test changed or the app did.
Tar the run directory as a CI artifact
Every run writes `/tmp/assrt/<runId>/` with the video, numbered screenshots, events.json timeline, and results JSON. In CI, upload that directory as a workflow artifact. No dashboard to sign into, no paid seat to review a failure; your teammates open a zip.
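A minimal GitHub Actions sketch of that step, assuming a prior step already has the app listening on localhost:3000; the artifact name is arbitrary:

```yaml
- name: Run regression plan
  run: npx assrt-mcp --url http://localhost:3000 --plan tests/regression/signup.md --headed=false

# Upload the whole run directory even when the run fails,
# so reviewers get the video, screenshots, and events.json.
- name: Upload Assrt run artifacts
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: assrt-regression-run
    path: /tmp/assrt/
```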
Generate the first scenario.md
Run `npx assrt-mcp` against your local dev server, let the planner write 5-8 `#Case` blocks to /tmp/assrt/scenario.md, then move the file into your repo. The watcher takes care of the rest.
Get started with Assrt →

Questions about AI generated regression tests
What format do most tools store AI generated regression tests in?
A vendor database row behind a dashboard. Momentic, Testim, mabl, Virtuoso, Functionize, Tricentis Tosca all persist the AI generated test as a record in their cloud, usually exposed to you as a visual editor and a proprietary YAML export you can't parse anywhere else. That means you can't grep across 400 tests, you can't diff a regression test in a pull request, you can't reconcile two copies after a merge, and you can't fork one to parameterize it without opening the vendor UI. The on-disk format matters more than any feature bullet on those pages, which is why none of the top-ranking guides mention it.
Where does Assrt put the generated test file?
At `/tmp/assrt/scenario.md`. The path is pinned in `assrt-mcp/src/core/scenario-files.ts` at line 16 as the `ASSRT_DIR` constant, then joined with `scenario.md`. Every call to `assrt_plan` or `assrt_test` writes the current plan text to that exact file. The metadata (scenario UUID, name, origin URL) lives next to it as `scenario.json`. Run artifacts land under `/tmp/assrt/<runId>/` with video, screenshots, events, and the pass/fail result. You can `tar czf regression.tgz /tmp/assrt/` and hand the whole thing to a teammate; everything is a boring filesystem path, nothing is locked to a session.
What does the AI actually generate inside scenario.md?
Five to eight `#Case N: name` blocks of plain English, max. That ceiling is explicit in `assrt-mcp/src/mcp/server.ts` inside the `PLAN_SYSTEM_PROMPT` at line 236: `Generate 5-8 cases max — focused on the MOST IMPORTANT user flows visible on the page.` Each block has a header like `#Case 1: Demo account can sign in`, followed by numbered steps in regular prose. At run time, the regex at `agent.ts` line 569 (`/(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi`) splits the file into scenarios and hands each block to the planner model. Because the file is markdown, GitHub renders it natively in PRs.
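The split is easy to reproduce. The regex below is the one quoted above; the sample plan text is hypothetical:

```typescript
// The header regex from agent.ts, applied to a hypothetical two-case plan.
const CASE_HEADER = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

const plan = `#Case 1: Demo account can sign in
1. Navigate to the login page
2. Enter the demo credentials and submit
3. Assert the welcome banner appears

#Case 2: Wrong password shows an error
1. Enter a bad password and submit
2. Assert an error message appears`;

// Split on headers, drop the empty leading chunk, trim trailing blank lines.
const blocks = plan
  .split(CASE_HEADER)
  .map((b) => b.trim())
  .filter((b) => b.length > 0);

// blocks.length === 2; each element is one case body handed to the planner.
```

Note that the numbered steps inside a case never match, because the regex requires a Scenario, Test, or Case keyword before the digits.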
If I edit the generated file by hand, does my edit get lost on the next generation?
No. The file is watched by an `fs.watch` listener in `scenario-files.ts` at line 97. Every change kicks a `setTimeout` debounce set to 1000 milliseconds at line 102, and when that timer fires, `syncToFirestore` reads the current contents and calls `updateScenario` against the central API. So your hand-edit is the new source of truth one second after you save. If you regenerate instead, the newer plan overwrites the file, but your previous version is preserved inside the run record as `planSnapshot` (saved via `saveScenarioRun` in `scenario-store.ts`, line 169). You have a full plan history per run, keyed by run UUID.
How does this compare to a Playwright test file in my repo?
A Playwright spec is code: TypeScript that imports `test`, chains Playwright method calls, and gets compiled against your `playwright.config.ts`. An Assrt scenario file is intent: a markdown document a person wrote. Both drive the same engine (Assrt spawns `@playwright/mcp` under the hood), but the scenario file says what to verify and the spec says how to verify. The advantage for regression suites is that the scenario file does not break when a CSS selector changes. The regex parses headers, the agent calls `snapshot()` at run time to get a fresh accessibility tree, and the planner picks a new ref. The spec, by contrast, has to be republished whenever the DOM drifts.
Can I check the AI generated file into git alongside the app it tests?
Yes, and this is the point. Copy `/tmp/assrt/scenario.md` into your repo at something like `tests/regression/signup.md`, commit, push, and you have a regression test that sits in the same pull request as the code change that created the feature. Every diff is human-readable: a colleague reads `- Click the Confirm button` / `+ Click the Submit button` and understands the test change in one second. No vendor plugin, no visual editor, no separate account. Running the scenario is `npx assrt-mcp --url <url> --plan tests/regression/signup.md`.
What stops the fs.watch loop from syncing a generation back over itself?
A single-variable echo guard. When Assrt writes the plan to disk, `writeScenarioFile` sets `lastWrittenContent = plan` first (scenario-files.ts line 44). When the watcher fires, `syncToFirestore` reads the current contents and compares against `lastWrittenContent` (line 136); if they match, it skips the sync. The only writes that propagate to cloud storage are the ones the watcher cannot explain, i.e., human edits. This is the reason you can open the file in your editor, make a change, save, and see the central API receive a PATCH one second later without getting into a loop with the agent's own writes.
Does this only work for locally generated regression tests, or can the AI re-run the same plan later?
Each generation gets a UUID. Pre-run, the MCP server calls `saveScenario` and gets back a scenario ID (server.ts around line 410). Re-running later is `assrt_test({ url, scenarioId })`; the server calls `fetchScenario(scenarioId)`, pulls the saved plan text, and runs it against the current URL. This is how regression works in Assrt: the scenario ID is stable, the plan text is versioned, and each run produces a `planSnapshot` record so you can tell whether today's failure is because the test changed or because the app changed. The test IS the regression test, there is no separate compiled artifact.
Which model writes the AI generated regression tests?
By default, Claude Haiku 4.5 (model id `claude-haiku-4-5-20251001`, pinned at `assrt-mcp/src/core/agent.ts` line 9 as `DEFAULT_ANTHROPIC_MODEL`). Haiku is the smallest current Claude model and is used because plan generation is a cheap vision + text task: screenshot + accessibility tree in, 5-8 markdown case blocks out. You can override with `--model` to Sonnet 4.6 or to Gemini 3.1 Pro (default Gemini model is set at line 10). None of this is hidden; every inference call is made from your machine with your own API key, stored in macOS Keychain via `assrt-mcp/src/core/keychain.ts`.
Does the plan file work in CI, or is it tied to my laptop?
The plan file is portable text. In CI, check it into the repo, install the npm package, and run `npx assrt-mcp --url http://localhost:3000 --plan tests/regression/signup.md --headed=false`. The agent launches Chromium headless through `@playwright/mcp`, runs the scenario, and writes results under `/tmp/assrt/<runId>/`. Upload that directory as a workflow artifact and the plan, video, screenshots, and events.json show up in the Actions UI. Because the generator output is a text file, it does not depend on any dashboard or session that expires.
How many regression cases should one scenario.md hold?
Five to eight when auto-generated; as many as you want by hand. The 5-8 ceiling only constrains the AI generator (server.ts line 236) because narrow focused plans are more reliable for the first pass. When you edit the file yourself, add as many `#Case` blocks as you need; the regex at agent.ts line 569 does not care. A sensible pattern is one scenario.md per user flow (signup.md, checkout.md, settings.md), each holding 5-10 cases. The per-file cap keeps any single run under a few minutes and keeps failures easy to diff.
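One way that pattern looks on disk; file names and case counts here are illustrative:

```
tests/regression/
  signup.md      # 6 cases: account creation and sign-in
  checkout.md    # 8 cases: cart, payment, confirmation
  settings.md    # 5 cases: profile and preferences
```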
What if I want to diff two AI generations to see what changed?
That is just `git diff scenario.md` once the file is checked in. Because the format is unindented markdown with one header per case, a diff reads cleanly even for a non-tester. This is the specific capability vendor dashboards cannot match. When Momentic or Virtuoso regenerates a test, the change is a database mutation; to see it you open the UI, toggle two versions, and read a rendered timeline. With a markdown file under git, the regeneration is a diff in the PR, and your code reviewer can approve or reject it the same way they approve a source change.
1000 ms: the only number you need to remember about the watcher loop. It is pinned at line 102 of scenario-files.ts. Short enough to feel instant, long enough that a multi-line save fires only one PATCH.