Automation Tools for QA, Scored on an Axis Most Lists Skip

The question the comparison posts never ask: can you still read your own tests after you stop paying the vendor?

Most guides on this topic score tools on cloud vs on-prem, free vs paid, language support, and mobile coverage. None of them ask where the scenario file lives on your disk, or what opens it when the vendor dashboard is gone. That is the axis this page is built around. The scenario file is at /tmp/assrt/scenario.md, the results are at /tmp/assrt/results/latest.json, and the loop that keeps them in sync is a 1-second fs.watch debounce in 170 lines of MIT code.

Matthew Diakonov · 13 min read
Every scenario lives at /tmp/assrt/scenario.md as plain #Case text, editable from vim, VS Code, or cat.
fs.watch with a 1000ms debounce (scenario-files.ts:99-102) syncs edits on :wq, no vendor UI round trip.
Results are /tmp/assrt/results/latest.json. One jq -e '.passed' expression gates a full CI pipeline.

What portability actually means for a QA suite

Portability is a boring word that hides an expensive question. If you built 400 automated tests over three years on a vendor platform, and the renewal quote lands at six figures, can you leave? Can you take the tests with you, open them in a text editor, feed them to a different runner, and keep shipping? For most automation platforms the answer is a conditional yes that involves an export tool, a conversion script, and a quarter of engineering time. The exported artifact usually loses something: the recorded selectors, the step ordering, the wait semantics. A ported suite is rarely a working suite.

The reason so few comparison posts test for this is that portability is invisible until you need it. It reads like a procurement concern, not a feature. But it is the one axis that decides whether your test suite is an asset you own or a liability you rent. The rest of the feature matrix (reporters, integrations, IDE plugins, mobile agent pools) all matter less the moment you try to leave.

Yours on disk (teal in the original figure): scenario.md, latest.json, recording.webm, execution.log, events.json, #Case N: headers, plain-text diffs, grep / jq / ripgrep friendly. Your disk, your repo.

Requires the vendor to open (gray in the original figure): .ks (Katalon Groovy), .pjs (TestComplete), TestNG suite XML, proprietary test objects, cloud-only recorder state, vendor dashboard assets, binary artifact blobs, SaaS workspace row ids.

The five questions I run every automation tool for QA through

Every time someone asks me to evaluate a new automation tool, I run it through the same five checks. Not features. Not price. Not the marketing page. Just these five. If a tool fails any one of them, I lose interest, because the failure compounds every year you use the thing.

1. Can you open the scenario without the vendor installed?

The scenario file is /tmp/assrt/scenario.md, 2 KB of plain text, readable by any editor. Not a .ks script that needs Katalon Studio, not a .pjs that needs TestComplete, not a database row in a SaaS workspace. cat works, head works, diff works, git log -p works.

2. Can you diff two revisions of the same test?

Because the file is a flat list of #Case N: headers plus imperative lines, a PR diff reads like prose. No serialized object IDs, no auto-generated XML that reshuffles, no line numbers coupled to a recorder tool. Code review on a test behaves like code review on a README.
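As an illustration, a rename of a CTA button in a hypothetical scenario file shows up in review as an ordinary two-line diff (file path and case names invented for the example):

```diff
--- a/tests/scenarios/pricing.md
+++ b/tests/scenarios/pricing.md
 #Case 2: Pricing page CTA
 Navigate to /pricing.
-Click the Start trial button.
+Click the Start free trial button.
 Assert the signup form is visible.
```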

3. Can a human write a scenario faster than a recorder?

Three to eight lines of plain English beats a 40-step recorder trace every time, and it survives a CSS class rename without any healing pass. The agent reads a fresh accessibility tree on every step (agent.ts:28), so intent stays stable even when the DOM moves.

4. Can the pipeline grep the result file?

Results write to /tmp/assrt/results/latest.json with fields passed, passedCount, failedCount, duration, scenarios[], and screenshots[]. One jq expression gates a CI job. No polling the vendor API, no Selenium Grid reporter HTML, no proprietary artifact binary.
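If a runner image happens to lack jq, the same gate is a few lines of Node. The top-level field names (passed, passedCount, failedCount, scenarios[]) come from the result file described above; the per-scenario shape and the helper itself are illustrative, not the project's actual code:

```typescript
// Illustrative CI gate over the result file. Reads the JSON report,
// prints the failing cases, and returns the exit code for the job.
interface RunReport {
  passed: boolean;
  passedCount: number;
  failedCount: number;
  scenarios: { name: string; passed: boolean }[];
}

export function gateExitCode(json: string): number {
  const report: RunReport = JSON.parse(json);
  if (report.passed) return 0;
  for (const s of report.scenarios) {
    if (!s.passed) console.error(`FAIL: ${s.name}`);
  }
  return 1; // non-zero exit fails the CI job, same as jq -e '.passed'
}
```

Wired up in a CI step, that is `process.exit(gateExitCode(fs.readFileSync("/tmp/assrt/results/latest.json", "utf8")))`.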

5. Can the whole thing live on a laptop without a cloud account?

The video player runs on 127.0.0.1 on an ephemeral port (server.ts:118-215), scenarios live in /tmp/assrt, the test agent talks to Playwright MCP over stdio. Cloud sync is optional, off by default when you pass --no-save. The network dependency is the Anthropic API call for the test agent, and even that is swappable via --model.

What the scenario file actually looks like on disk

The scenario file is not a rendering of something stored elsewhere. It is the source. assrt_test writes it at the start of every run (server.ts:397-425), fs.watch observes it for the whole lifetime of the MCP process, your editor rewrites it, the watcher picks up the change and syncs it. Here is the 2 KB of plain text that drove three signup and pricing cases on my localhost this morning.

/tmp/assrt/scenario.md
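The author's exact file is not reproduced here; a representative sketch in the same #Case format (hypothetical app, illustrative steps, following the 3-to-5-imperative-lines-per-case shape described later on this page) reads like this:

```markdown
#Case 1: Signup happy path
Navigate to /signup.
Type test@example.com into the Email field.
Type a valid password into the Password field.
Click Create account.
Assert the dashboard heading is visible.

#Case 2: Signup rejects a bad email
Navigate to /signup.
Type not-an-email into the Email field.
Click Create account.
Assert an inline validation error is visible.

#Case 3: Pricing page CTA
Navigate to /pricing.
Click the Start free trial button.
Assert the signup form is visible.
```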

170 lines of watcher code: scenario-files.ts, MIT

1000 ms fs.watch debounce: the only wait between :wq and sync

0 vendor formats: no YAML, XML, or page objects

The one-second loop that makes the file the source of truth

Treating a file as the source of truth is not automatic. If the product was writing scenarios to a database and rendering a local mirror for convenience, the file would lie the moment you edited it. The opposite is true here: the file is authoritative, and the cloud is the cache that syncs to it. Here is how that loop is built in scenario-files.ts. 35 lines of code, no dependencies, no magic.

assrt-mcp/src/core/scenario-files.ts

What happens between :wq and the next test run

your editor -> scenario.md: write (:wq)
scenario.md -> fs.watch: FSEvent 'change'
fs.watch -> debounce: clearTimeout + setTimeout(1000ms)
debounce -> readFileSync('/tmp/assrt/scenario.md'): current content
skip if currentContent === lastWrittenContent
otherwise -> cloud sync: updateScenario(id, { plan, name, url }) -> success: true

The echo guard matters. Without the if (currentContent === lastWrittenContent) return at scenario-files.ts:136, every call to writeScenarioFile would trigger a watch event, which would sync the same content to cloud, which would be observed as a change, which would trigger another sync. The one-line guard turns a potential infinite loop into a stable one-way stream from editor to cloud, with server writes treated as no-ops.

The whole flow from a terminal, start to finish

Here is the pattern I use when I'm iterating on a scenario locally. Edit the plan in vim, save, re-run. The sync happens silently in the background, the results land in the same directory, and the next CI run reads the same file. No vendor IDE, no browser click path, no export-import dance.

edit -> save -> sync -> run

The pipes the file plugs into

Because the scenario is just text and the result is just JSON, the surface area for integration is trivially small. Everything on the left here is something you already have. Everything on the right is a primitive every Unix system has had for 40 years. The hub is a 2 KB Markdown file.

scenario.md is the hub, your Unix tools are the endpoints: your editor, your CI runner, and your coding agent all write the scenario; Playwright MCP executes it; latest.json and recording.webm come out the other side for git, jq, and grep to consume.

The six files that make up the whole local surface

After any assrt_test run, /tmp/assrt holds everything the test produced. No hidden database, no cloud-only artifact, no vendor-specific format. Six files, six purposes. Point any of them at a ticket, a Slack thread, or a PR description.

/tmp/assrt/scenario.md

The plan itself. Plain text with #Case N: headers. Rewritten on every assrt_test call at server.ts:397 so it's always the exact plan you ran.

/tmp/assrt/scenario.json

Metadata: { id, name, url, updatedAt }. Read by the watcher to know which cloud record to sync back to.

/tmp/assrt/results/latest.json

The last run's structured report. passed, passedCount, failedCount, duration, scenarios[], screenshots[], videoFile, improvements[]. One jq expression is all CI needs.

/tmp/assrt/<runId>/video/recording.webm

The screencast of the whole run. Served by a local HTTP server on 127.0.0.1 with Range support for seeking. Opens in a browser tab, not a vendor player.

/tmp/assrt/<runId>/execution.log

Line-delimited log of every status, step, reasoning, and assertion event, with ISO-8601 timestamps. Pipe it into less, grep for [FAIL], save it as the failure attachment on a PR.

/tmp/assrt/<runId>/events.json

Full event trace (status, step, reasoning, assertion, screenshot, improvement_suggestion). Good for building your own replay UI or just answering 'what did the agent see at step 7'.

What the result file actually contains

The result file is the other half of the portability story. A scenario that is portable in theory is worthless if the runs produce artifacts only the vendor dashboard can read. latest.json is intentionally flat and intentionally small. It can sit in a GitHub Actions artifact, a Slack upload, or a Notion page, and still be useful a year from now.

/tmp/assrt/results/latest.json
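The field names below are the ones listed earlier on this page (passed, passedCount, failedCount, duration, scenarios[], screenshots[], videoFile, improvements[]); the values and the nested per-scenario shape are illustrative, not output copied from a real run:

```json
{
  "passed": false,
  "passedCount": 2,
  "failedCount": 1,
  "duration": 84213,
  "scenarios": [
    { "name": "Signup happy path", "passed": true },
    { "name": "Signup rejects a bad email", "passed": true },
    { "name": "Pricing page CTA", "passed": false }
  ],
  "screenshots": ["/tmp/assrt/run-123/screenshots/step-7.png"],
  "videoFile": "/tmp/assrt/run-123/video/recording.webm",
  "improvements": []
}
```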

Scored against a typical closed platform

Same axes, different answers. The closed-platform column is generic on purpose: it stands in for TestComplete, Tricentis, Katalon Studio, Ranorex, and the long tail of seat-licensed recorder tools that ship a proprietary IDE as the primary scenario editor.

Where the scenario lives
Typical closed platform: SaaS workspace row or a binary recorder file
Assrt: /tmp/assrt/scenario.md, 2 KB of #Case text

How to edit it
Typical closed platform: open the vendor IDE, navigate to the suite
Assrt: vim, VS Code, nano, or any editor that opens .md

How to diff two versions
Typical closed platform: vendor UI, side-by-side panel, or an export step
Assrt: git diff

How to run in CI
Typical closed platform: reporter plugin, vendor runner, cloud runtime minutes
Assrt: npx assrt run --plan-file scenario.md --json | jq -e '.passed'

Where results go
Typical closed platform: cloud dashboard, vendor-hosted HTML
Assrt: /tmp/assrt/results/latest.json plus screenshots and .webm

What happens if you leave
Typical closed platform: export tool (maybe), proprietary format, read-only tier
Assrt: the file is already yours; nothing to export

License
Typical closed platform: commercial per-seat, often $7.5K/mo or higher at scale
Assrt: MIT, full source at assrt-mcp on GitHub

Runtime
Typical closed platform: vendor cloud, locked to their browser build
Assrt: local Chromium via Playwright MCP, your laptop or your runner

The one-line argument if you only have 30 seconds

When you pick an automation tool for QA, you are not picking a feature matrix, you are picking whose disk your test suite lives on in three years. Pick the one that puts the scenario file on yours.

scenario-files.ts:16-170, server.ts:397-425

Want to port your existing suite onto a file-on-disk workflow?

Bring a sample scenario from any current automation tool and I'll walk through what it would look like as a #Case file, what the fs.watch loop saves you, and where the closed-platform features still win.

Book a call

Frequently asked questions

What actually makes a QA automation tool 'portable'?

Three things, and only three. (1) The source of truth for a test is a file on your disk, readable without the vendor's software. (2) The result is a structured artifact (JSON, plain text, or standard video formats), not a cloud-only dashboard. (3) The runner is invokable from a shell, so a CI machine that has nothing but curl, jq, and node can execute a scenario. Assrt puts the scenario at /tmp/assrt/scenario.md, the result at /tmp/assrt/results/latest.json, and the runner at npx assrt run. That's the whole portability surface.

Why do most comparison lists of QA automation tools skip portability?

Because the buyers writing those lists haven't had to leave a vendor yet. Portability only matters when you're mid-migration, mid-acquisition, or mid-renegotiation. Until then, it reads like an abstract concern. But when it matters, it matters catastrophically: an org can spend 18 months and six figures porting TestComplete Keyword Tests into something open, only to discover that the recorded object identifiers don't map cleanly. Listicles comparing on 'mobile support' or 'integrations' miss the one axis that decides whether your six-year-old regression suite survives the next CFO.

How do I actually edit the scenario file and have the change sync?

Run assrt_test once. It writes the plan to /tmp/assrt/scenario.md and starts fs.watch on that path. Open the file in any editor, make a change, save. Within one second, the debounce fires (scenario-files.ts lines 99 to 102), the watcher reads the file, checks it against the last content it wrote, and calls updateScenario with the new plan. Your next assrt_test run sees the edited version. If you skip the cloud sync (--no-save), the file is still the source of truth locally, there's just no cloud copy.

What does the #Case format look like and do I need to learn a DSL?

No DSL. The format is a Markdown-style list of #Case N: <name> headers, followed by 3 to 5 imperative steps per case. 'Navigate to /signup. Click Sign in. Type test@example.com into the Email field. Assert the dashboard heading is visible.' That is a full scenario. The test agent reads each case, pulls a fresh accessibility tree from Playwright MCP before interacting, and reports pass or fail with evidence. There are no selectors, no waits, no page-object classes, no language bindings to install.

Is the cloud sync optional or am I forced to trust a vendor endpoint?

Fully optional. Pass ASSRT_NO_SAVE=1 (server.ts:404) and the local files are the only copy. Or pass --no-save to the CLI. The scenario file, the results, the video, and the log all live under /tmp/assrt with no network call. Cloud sync is a convenience for sharing runs and scenarios across a team; it is not a dependency for running a test. Same for the MCP path: the MCP server exists on your machine, it talks to Playwright MCP over stdio, it doesn't need to phone home to function.

How does this stack up against $7.5K-per-month closed platforms?

On portability and on the source-of-truth question, Assrt wins trivially because the scenario is already on your disk. On features like large-scale test orchestration, distributed runners, and enterprise reporting dashboards, commercial platforms still have more polish. The real question is what you actually need. If the need is 'run 30 test cases against every PR in CI, fail the build on a red result, and keep my scenarios version-controlled,' Assrt covers it at $0 of tooling cost (only the LLM API calls for the agent, which are a few cents per scenario at Haiku prices). If the need is a procurement-friendly SaaS with an enterprise sales motion, that's a different market.

What happens if I commit scenario.md to git?

That is the recommended workflow. Copy /tmp/assrt/scenario.md into your repo (conventionally at tests/scenarios/<name>.md) and point assrt_test at it with plan-file. PR reviews become plain-text diffs. Code review comments work out of the box. git blame tells you who last touched a case. You can store scenarios per project, per feature, or per user flow; the runner doesn't care about the path, only the #Case format. There's no import step, no sync step, no conversion.

Can I use this alongside my existing Playwright or Selenium suite?

Yes. Assrt is additive. It runs an agent on top of Playwright MCP, which is itself a thin wrapper over Playwright core. Your existing Playwright tests still work, your existing CI config still runs them, your existing reporters still report. Teams typically use Assrt for the scenarios that keep breaking because selectors change (signups, checkouts, onboarding flows, long forms) and keep hand-written Playwright for the hot paths where deterministic behavior matters more than robustness to redesigns. Both suites can share a single local Chromium profile via Playwright MCP, so you don't double your CI wall time.

Is the test agent actually deterministic enough to trust in CI?

It's deterministic enough that the failure mode is 'the agent noticed a real bug you didn't plan for,' not 'the agent hallucinated a click.' The reason: every step reads a fresh accessibility tree, every assertion requires explicit evidence from the DOM, and wait_for_stable (agent.ts:956-1009) blocks on a MutationObserver quiet period before the agent is allowed to assert. For extra paranoia, pass passCriteria to assrt_test (server.ts:343) and the agent will verify every listed condition explicitly before marking a scenario passed. A scenario with good passCriteria fails the same way twice when the app is broken the same way twice.

What do I lose compared to a record-and-replay tool?

You lose the recorder UI. If your QA process relies on a non-technical teammate clicking through the app in a special mode to generate a test, the closest analog in Assrt is assrt_plan: navigate to a URL, it analyzes three scrolled screenshots, and emits 5 to 8 #Case scenarios you can edit. That replaces the recorder for most cases. What you don't get back is the recorder's illusion that the resulting test doesn't need human judgment. Record-and-replay suites break on the first redesign because the recorded selectors are tied to the old DOM. A #Case file written in intent-first English survives redesigns for the same reason a bug report does.
