An automation test framework is the files on your disk, not the vendor's brand
Every "top 10 automation test framework" list ranks vendors by logo. That is a brand rodeo, not a definition. A framework is three concrete things: a file format for the plan, a runtime contract that executes it, and a result artifact format. If you cannot point at those three files on your own disk, you do not have a framework. You have a SaaS subscription.
This guide walks through Assrt as a framework under that strict definition. Every claim points at a file on disk and a line number in the source. You can fork every line.
“If you cannot point at the plan file on your disk, the tool contract in a versioned schema, and the result artifact in plain text, you do not have a framework. You have a SaaS subscription.”
Assrt, framework design principle
The framework as inputs, a hub, and outputs
Before the file-by-file tour, here is the whole system in one diagram. Your inputs are an English plan, an app URL, and a model key. The hub is the runtime: an 18-tool JSON contract plus an fs-watched Markdown plan file. The outputs are a run report on disk, a cloud URL, a video, and optional improvement suggestions filed against your product.
Inputs → Framework → Outputs
Artifact 1: the plan file at /tmp/assrt/scenario.md
Every framework has a file format. Cypress has .cy.ts. Playwright has .spec.ts. Commercial AI QA tools have a proprietary YAML or a cloud-only editor. Assrt's file format is plain Markdown with #Case N: blocks. You read it in GitHub, grep for a string, paste it into Slack, and a human understands what the test does without loading an IDE plugin.
Here is exactly what the framework writes to /tmp/assrt/scenario.md when you load a scenario. Variables use {{VAR_NAME}} syntax and live next to the plan in scenario.json.
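A minimal plan might look like this. The case titles, steps, and variable names below are invented for illustration; only the `#Case N:` block structure and `{{VAR_NAME}}` syntax come from the format described above:

```markdown
#Case 1: Sign in with a magic link
Open {{APP_URL}} and click "Sign in".
Enter {{TEST_EMAIL}} and request a magic link.
Assert the confirmation banner says "Check your inbox".

#Case 2: Empty state after first login
Complete the magic-link flow.
Assert the dashboard shows zero projects and a "Create your first project" button.
```

Nothing here needs a compiler or an IDE plugin; a reviewer can diff it like any other Markdown file.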
Artifact 2: the fs.watch sync loop (the anchor fact)
The moment a scenario is loaded, the framework starts watching its own file. Save an edit in VS Code; the watcher fires. A 1-second debounce timer collects the last change, and the new content is PATCHed to app.assrt.ai via PATCH /api/public/scenarios/{id}. A running agent does not pause. The watcher is a framework-level feature, not a plugin.
The entire implementation is 22 lines in scenario-files.ts. If the scenario ID starts with local-, the watcher short-circuits and never syncs, which is how offline runs stay offline without a config flag.
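The shape of that loop is small enough to sketch. The function and variable names below are illustrative, not the actual identifiers in scenario-files.ts, but the three behaviors match the description above: the `local-` short-circuit, the skip-on-echo check, and the 1-second debounce before the PATCH:

```typescript
import { watch } from "node:fs";
import { readFile } from "node:fs/promises";

let lastWrittenContent = ""; // updated whenever the runtime itself writes the plan
const DEBOUNCE_MS = 1000;

// Local-only scenarios and echoes of the runtime's own writes never sync.
function shouldSync(scenarioId: string, content: string): boolean {
  if (scenarioId.startsWith("local-")) return false;
  if (content === lastWrittenContent) return false;
  return true;
}

function startWatching(scenarioId: string, planPath: string): void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  watch(planPath, { persistent: false }, () => {
    clearTimeout(timer); // collapse a burst of change events into one sync
    timer = setTimeout(async () => {
      const content = await readFile(planPath, "utf8");
      if (!shouldSync(scenarioId, content)) return;
      await fetch(`https://app.assrt.ai/api/public/scenarios/${scenarioId}`, {
        method: "PATCH",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ content }),
      });
    }, DEBOUNCE_MS);
  });
}
```

Note that `persistent: false` means the watcher never keeps the process alive on its own; when the run ends, the watcher dies with it.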
What the framework actually gives you
One Markdown plan file
/tmp/assrt/scenario.md holds every '#Case N:' block you have ever written. It is your editable source of truth. Commit it, diff it, grep it, paste it into Slack. No '.spec.ts' build step.
fs.watch with 1s debounce
The runtime watches the file the instant a scenario is loaded. Save in your editor, the framework PATCHes the cloud copy in the next second, without pausing a running agent.
UUID as access token
No API key. The scenario UUID is the capability. Anyone holding the URL can read, update, or post runs. Lose connectivity and the framework falls back to a 'local-' prefixed ID on disk.
18-tool JSON contract
The framework's public surface is an Anthropic.Tool[] array at agent.ts lines 16-196. Add a tool, both providers pick it up on the next boot. This is the framework API, not a wrapper.
Injected cursor overlay
CURSOR_INJECT_SCRIPT paints a red dot, click ripple, keystroke toast, and a compositor heartbeat into your app at run time. Test videos are legible. No separate trace viewer.
Artifact 3: the 18-tool JSON contract
The framework's public surface is an Anthropic.Tool[] array. One file, one source of truth, 18 entries. Adding a 19th tool (say, a read-only dom_query) is one edit here; both providers pick it up on the next boot. That shape is the API. There is no SDK, there is no YAML layer, and there is no hidden service translating between them.
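One entry in that array follows Anthropic's tool shape: a name, a description, and a JSON Schema for the input. Here is what the hypothetical 19th tool mentioned above could look like; this `dom_query` is an invented example, not one of the 18 shipped tools:

```typescript
// The shape of a single Anthropic-style tool definition: the same
// object the SDK sends as part of the tools array on each request.
const domQueryTool = {
  name: "dom_query",
  description:
    "Read-only: return the text content of the first element matching a CSS selector.",
  input_schema: {
    type: "object" as const,
    properties: {
      selector: { type: "string", description: "CSS selector to query" },
    },
    required: ["selector"],
  },
};
```

Appending that object to the TOOLS array is the entire integration step; there is no registration call and no codegen.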
Artifact 4: one schema, two providers
A framework that locks you to a single model vendor is a framework that will age badly. Assrt solves that inside the framework itself. A 24-line comprehension reshapes the Anthropic schema into Google Gemini's FunctionDeclaration format. The tool-use loop calls whichever provider has an API key present at start-up. No fork, no wrapper, no parallel type definitions. You can prove the bridge works by switching a single env var.
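A bridge of that kind can be sketched as follows. The real comprehension lives at agent.ts 277-301; the type names and helper below are illustrative, but the idea is the same: walk each tool, translate the JSON Schema primitive types into the names Gemini's FunctionDeclaration format expects, and emit a parallel declaration list:

```typescript
// JSON Schema primitive names → Gemini Type enum value names.
const TYPE_MAP: Record<string, string> = {
  string: "STRING",
  number: "NUMBER",
  boolean: "BOOLEAN",
  array: "ARRAY",
};

interface AnthropicishTool {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

// Reshape one Anthropic-style tool into a Gemini-style declaration.
function toGeminiDeclaration(tool: AnthropicishTool) {
  const properties: Record<string, { type: string; description?: string }> = {};
  for (const [key, prop] of Object.entries(tool.input_schema.properties)) {
    properties[key] = { type: TYPE_MAP[prop.type] ?? "STRING", description: prop.description };
  }
  return {
    name: tool.name,
    description: tool.description,
    parameters: { type: "OBJECT", properties, required: tool.input_schema.required ?? [] },
  };
}
```

Because the mapping is mechanical, any tool added to the Anthropic-shaped array is picked up by the Gemini side for free.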
Four numbers that describe the framework
Each value is in the source, not a benchmark. Count the entries in the TOOLS array yourself: 18 tool definitions between lines 16 and 196 of agent.ts.
Artifact 5: the cursor overlay injected into your app
Every other framework records a test video, then hands it to a separate trace viewer so you can figure out where the invisible agent clicked. Assrt does the opposite: the framework paints a visible red cursor, click ripples, a keystroke toast, and a 6-pixel heartbeat pulse directly into your application's DOM under test. The compositor keeps producing frames, the CDP video recorder captures motion even when the page is still, and the video plays back as a real demo.
The overlay is injected by CURSOR_INJECT_SCRIPT in browser.ts lines 33-98. It is idempotent (the __pias_cursor_injected flag prevents double-mounts) and survives navigations because the runtime re-injects on every snapshot and restores the cursor to its last known pixel position.
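The idempotence guard is simple enough to show against a stub document. This is a sketch of the guard pattern only, not the real injection script; the element IDs match the ones named above, but the stub types are invented so the logic is checkable outside a browser:

```typescript
// Minimal stand-in for the parts of Document the guard touches.
interface StubDoc {
  flags: Record<string, boolean>;
  mounted: string[];
}

// Returns true only on the first injection; repeat calls are no-ops.
// That is what lets the runtime re-inject on every snapshot safely.
function injectCursorOverlay(doc: StubDoc): boolean {
  if (doc.flags["__pias_cursor_injected"]) return false;
  doc.flags["__pias_cursor_injected"] = true;
  doc.mounted.push("#__pias_cursor", "#__pias_ripple", "#__pias_toast", "#__pias_heartbeat");
  return true;
}
```

Re-injecting on every snapshot instead of once per page load is what makes the overlay survive navigations: the guard makes the common case free, and a fresh document after navigation simply has no flag set.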
Framework vs. vendor dashboard, line by line
| Feature | Closed AI QA platforms | Assrt |
|---|---|---|
| Where is my test plan? | Inside the vendor's SaaS dashboard or a binary '.spec' file with framework imports | /tmp/assrt/scenario.md, plain Markdown, editable in any editor |
| Can I grep my tests? | Only if you have IDE integration; YAML DSLs hide fields behind keys | Yes, they are Markdown: grep '#Case' -R anywhere on disk |
| What format is the runner's tool contract? | Closed Python/TypeScript SDK with no versioned JSON schema | Anthropic.Tool[] array at agent.ts lines 16-196, typed, readable, one file |
| Switching model providers | Fork the runner, or wait for the vendor to ship a bridge | 24-line comprehension at agent.ts 277-301 maps the same schema to Gemini |
| Cloud sync of the plan | Commit-and-push workflow or opaque SaaS storage | fs.watch on the plan file, 1-second debounce, PATCH /api/public/scenarios/{id} |
| Auth to share a test plan | OAuth, API keys, seat licenses | UUID v4 is the access token (scenario-store.ts line 8) |
| Test video readability | Separate trace viewer, scrub through frames, no cursor visible | CURSOR_INJECT_SCRIPT paints a red cursor, click ripple, and keystroke toast into the app |
| Cost at seat parity | $7,500/month for commercial AI QA platforms | Open TypeScript, MIT licensed, runs against your Anthropic or Gemini key |
A full run, every file the framework writes
Here is what you see in your terminal when you kick off a single scenario. Every bracketed prefix maps to a file or line of code you can open.
File lifecycle inside a single run
The framework moves through these stages in order. Every stage produces or mutates a specific file. None of them are in a vendor database; all of them are on your disk.
plan → watch → edit → run → result → sync
Write the plan
writeScenarioFile(plan, meta) writes /tmp/assrt/scenario.md and /tmp/assrt/scenario.json. The plan is plain Markdown, the meta is {id, name, url, updatedAt}.
Start watching
startWatching(scenarioId) registers an fs.watch listener on the plan file, persistent=false, debounced at 1000ms. Local-only scenarios short-circuit and skip the watcher.
Edit in place
Any tool (editor, agent, shell script) that saves scenario.md fires the watcher. If the new content equals lastWrittenContent, sync is skipped to avoid echoing the runtime's own writes.
Run the agent
The 18-tool loop runs against whichever provider has an API key. Every tool call is logged; every assert accumulates evidence; complete_scenario is the only exit.
Write the result
writeResultsFile(runId, results) writes both /tmp/assrt/results/latest.json and /tmp/assrt/results/<runId>.json. The plan file is not overwritten; your source stays authoritative.
Sync to cloud
If the scenario has a real UUID (not 'local-'), the watcher PATCHes scenario.md content to /api/public/scenarios/{id}. Results can also be POSTed to /runs along with artifact uploads.
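The "Write the result" stage above can be sketched as a dual write: the same payload lands in a mutable `latest.json` and an immutable per-run file. The directory parameter is added here so the sketch is testable anywhere, not only under /tmp/assrt; the function name mirrors the one in the text but the body is illustrative:

```typescript
import { mkdirSync, writeFileSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Write the run report twice: latest.json always points at the most
// recent run, while <runId>.json is never overwritten, so history
// accumulates on disk rather than in a vendor database.
function writeResultsFile(dir: string, runId: string, results: object): void {
  mkdirSync(dir, { recursive: true });
  const payload = JSON.stringify(results, null, 2);
  writeFileSync(join(dir, "latest.json"), payload);   // most recent run
  writeFileSync(join(dir, `${runId}.json`), payload); // immutable history
}
```

Because the plan file is never touched by this stage, your Markdown source stays authoritative no matter how many runs pile up in the results directory.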
Every file path the framework touches
A scrolling list of the exact paths and line citations this page refers to. Copy any one into a shell.
Why the SERP answer is wrong
Top search results for "automation test framework" rank vendors or taxonomies: Linear, Modular, Data-Driven, Keyword-Driven, BDD, Hybrid. Those categories describe the shape of the test code inside each framework, not what makes a framework a framework. The interesting question is whether the files it produces are yours and whether the JSON tool schema it exposes is versioned. On both counts, most commercial AI-QA tools fail, while Assrt answers with open files and a single Anthropic.Tool[] array.
Use the brand-ranking pages to narrow a shortlist, then check each contender against the file test: where is my plan, who owns my results, and can I read the tool contract? A contender that fails those three checks is not a framework. It is a service.
Walk through the file layout on your own app
30 minutes, live. We open /tmp/assrt/scenario.md on your stack, run one scenario, and show you every file the framework writes.
Book a call →
Frequently asked questions
What is the actual definition of an automation test framework, stripped of marketing?
A framework is three concrete things: a file format for the test plan, a runtime contract that executes the plan, and a result artifact format. Everything else is branding. Selenium's framework is a WebDriver JSON protocol plus your language's test runner. Cypress's framework is a .cy.ts file plus a command queue. Playwright's framework is .spec.ts files plus the test runner and a fixtures API. Assrt's framework is a Markdown plan at /tmp/assrt/scenario.md, an 18-tool JSON schema at agent.ts lines 16-196 executed by a Claude or Gemini loop, and a JSON run report at /tmp/assrt/results/latest.json. Judging frameworks by which 'type' they fall into (Linear, Modular, BDD, Hybrid) is a taxonomy argument that hides the only question worth asking: who owns the file.
Where does Assrt put the test file, and can I edit it in my own editor while a run is going?
The plan file is at /tmp/assrt/scenario.md. The metadata file is at /tmp/assrt/scenario.json. Results are at /tmp/assrt/results/latest.json and /tmp/assrt/results/<runId>.json. The plan file is plain Markdown with '#Case N:' blocks, so you open it in VS Code, vim, or any editor. The runtime starts an fs.watch on that file the moment a scenario is loaded (scenario-files.ts line 97). When you save, a 1-second debounce timer fires, the new content is PATCHed to app.assrt.ai via /api/public/scenarios/<id>, and the cloud copy updates without the agent pausing. The debounce prevents the watcher from echoing the tool's own writes: if the content equals lastWrittenContent, the sync is skipped (line 136).
Why is the UUID the auth token instead of a separate API key?
Because the sharing model is capability-URL, not role-based. scenario-store.ts line 8 states it explicitly: 'No auth required: the UUID v4 IS the access token.' The framework POSTs a new plan to /api/public/scenarios, the server returns a UUID, and anyone holding that UUID can GET, PATCH, and POST runs against it. This design buys two things. First, sharing a scenario is pasting a URL, which matches how developers actually collaborate on tests. Second, if central API is unreachable, the framework generates a local-only ID prefixed with 'local-' (scenario-store.ts line 126), caches the scenario at ~/.assrt/scenarios/<id>.json, and the rest of the runtime works offline. Capability URLs are a framework-level decision, not a cloud feature bolted on.
How does one JSON tool schema run on both Anthropic and Gemini without branching the framework?
There is exactly one source of truth: the TOOLS array at agent.ts lines 16-196. That array is typed as Anthropic.Tool, so Anthropic's SDK takes it directly. For Gemini, a 24-line comprehension at agent.ts lines 277-301 walks each tool, maps the JSON Schema types to Google's Type enum (string, number, boolean, array), and produces GEMINI_FUNCTION_DECLARATIONS. The runtime picks a provider at start-up based on which API key is present, and the tool-use loop (run()) dispatches identically in both branches. Adding a 19th tool is one edit in TOOLS; both providers pick it up on the next boot. That is a framework-level abstraction, not a wrapper library.
What does the 'cursor overlay' the framework injects into my app actually do?
browser.ts lines 33-98 defines CURSOR_INJECT_SCRIPT, which appends four DOM elements to document.body of your application under test: a red cursor dot (#__pias_cursor), a click ripple (#__pias_ripple), a keystroke toast (#__pias_toast), and a 6px heartbeat pulse (#__pias_heartbeat) that keeps the compositor producing frames so the CDP video recorder captures motion even during a quiet moment. Every click or type the agent issues emits a call to window.__pias_moveCursor and window.__pias_showClick so the overlay tracks agent activity. Result: your test video is legible, not just a silent recording of a page changing. Most frameworks punt this to a separate 'trace viewer.' Assrt bakes it into the runtime.
What happens in the framework when the model fails a step?
MAX_STEPS_PER_SCENARIO is set to Infinity at agent.ts line 7. That looks reckless; it is the most conservative choice a framework can make. The system prompt (agent.ts lines 220-226) states the recovery contract: on failure, call snapshot to see current state, pick a different accessibility ref, try again, scroll after 3 stuck attempts, and eventually call complete_scenario with passed=false and evidence. The scenario can only end through complete_scenario, which writes a pass/fail record. A flaky middle step does not kill a long integration run, because the loop has room to re-observe and re-plan. Classical frameworks cap at a step count that is usually too low for AI-rendered apps where one click triggers seconds of DOM churn.
If the framework is files on disk, what about secrets and per-run variables?
Secrets live in the macOS keychain, read by /Users/matthewdi/assrt-mcp/src/core/keychain.ts. ANTHROPIC_API_KEY or a Claude Code OAuth token can come from the env or the keychain; you never paste a secret into a scenario file. Per-run variables live inside scenario metadata (scenario-store.ts line 22: variables: Record<string, string>), shipped next to the plan. Reference them in Markdown with {{VAR_NAME}} syntax. The framework's sync loop preserves variables on every PATCH. Your test plan stays committed to git; your secrets do not.
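Substitution of that kind is a one-liner worth seeing. The `{{VAR_NAME}}` syntax is from the plan format above; the function name and the leave-unknowns-intact behavior are illustrative choices, not confirmed details of the runtime:

```typescript
// Replace every {{VAR_NAME}} occurrence in a plan with its value from
// the scenario metadata. Unknown variables are left intact, so a typo
// shows up verbatim in the run instead of being silently blanked.
function substituteVariables(plan: string, variables: Record<string, string>): string {
  return plan.replace(/\{\{([A-Z0-9_]+)\}\}/g, (match, name: string) =>
    name in variables ? variables[name] : match,
  );
}
```

Because secrets never enter the variables map (they stay in the keychain), a plan with its substitutions applied is still safe to paste into a bug report.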
Is this actually open source, or is there a server-side closed piece?
The runtime is open TypeScript, MIT licensed, installable via npm: the agent at /Users/matthewdi/assrt-mcp/src/core/agent.ts is 1,087 lines, browser.ts is 735, email.ts is 130, cli.ts is 608. All of those ship in the assrt-mcp package. The optional cloud component at app.assrt.ai holds shared scenarios and run artifacts; you can skip it entirely by running locally with --isolated or by using the scenario-files.ts cache at ~/.assrt/scenarios/. 'Local-first, cloud-optional' is a framework architectural choice, not a licensing caveat.
Which tools are in the 18-tool schema, and what does each one do?
In the order they appear in agent.ts: navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable. navigate/click/type_text/etc. are DOM primitives via @playwright/mcp. snapshot returns an accessibility tree with [ref=eN] IDs that the agent uses for targeting instead of CSS selectors. create_temp_email hits temp-mail.io for a disposable inbox. wait_for_stable injects a MutationObserver and waits for DOM quiet. http_request is for verifying webhooks to Telegram, Slack, Stripe. assert and complete_scenario are the test-runner primitives that accumulate evidence and emit pass/fail to results/latest.json. That schema is the framework's public surface.
Related guides on the framework layer
AI in automation testing: the 5 primitives that do the work
The OTP paste one-liner, the MutationObserver stabilization loop, disposable inboxes, http_request, and infinite recovery. Every one is in 200 lines of open TypeScript.
Natural-language test case descriptions, automated
How '#Case N:' Markdown blocks turn into a working agent plan, without the YAML middle layer every commercial tool makes you learn.
Best Momentic alternative (open source)
Side-by-side: Momentic's YAML DSL and closed runtime vs. Assrt's Markdown plan and open 1,087-line agent.