Manual Testing, Filesystem-First

Every manual testing tool tracks what the human clicked. This one runs the clicks for you and writes the result to /tmp/assrt/results/latest.json.

Ask which tools to use for manual testing and the usual answer is a list of dashboards: TestRail, Zephyr, Xray, TestLink, PractiTest, qTest, SpiraTest, Katalon, Jira plugins, Marker.io. They all share one shape: a human tester clicks through a flow, marks pass or fail, and the tool stores the result. Assrt starts from a different assumption: if the test plan is just a sequence of English steps, an AI browser agent can read them and do the clicking, and the only tool the human needs is a text editor.

Matthew Diakonov · 11 min read

From scenario files edited in a text editor, not a dashboard:
Scenario file is /tmp/assrt/scenario.md (hard-coded in scenario-files.ts:16-17).
Result file is /tmp/assrt/results/latest.json, same shape every run, no API polling.
Video player defaults to 5x playback (server.ts:99); a 5-minute scenario is a 1-minute review.

What “manual testing tools” usually means, and what it misses

Open any of the top guides on this question and the pattern is the same. Fifteen to twenty tools, grouped into four buckets. Case-management dashboards (TestRail, TestLink, SpiraTest, PractiTest, qTest, Zephyr, Xray, TestCollab, TestLodge). Bug trackers (Bugzilla, Mantis, Jira). Cross-browser device clouds (BrowserStack, Browserling, Browsershots). API testing add-ons (Postman, SoapUI, Citrus). Sometimes a security scanner (ZAP) or a hybrid authoring tool (Katalon, Selenium IDE) for variety.

Every one of them assumes the same thing: a human is going to click through the application, and the tool exists to track, organise, and report on what the human did. The deliverable is a row in a database. The workflow is human clicks, then tool records, then manager reviews. That shape has not changed meaningfully since the mid-2000s, and the per-seat price has gone up roughly every year.

What is missing from that shape is the option of letting something else do the clicking. A manual test scenario is, at its core, a short English script that describes user intent. That is exactly the kind of thing an LLM-driven browser agent reads well. You do not need a new dashboard; you need a runner that reads the script, drives the browser, and hands the result back in a format you can grep.

What goes in, what comes out

scenario.md (URL under test, pass criteria) → Assrt MCP → results/latest.json + recording.webm + screenshots/*.png + improvements[]

The scenario file, in plain English

The manual test scenario is a markdown file at a fixed path. The agent reads the file, resolves each step against a live accessibility snapshot of the page, and runs it. The format is intentionally readable by a non-technical QA engineer; if you can write a Google Doc test plan, you can write a scenario.

/tmp/assrt/scenario.md
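A minimal scenario file, assembled from the steps quoted elsewhere in this guide; the case name and exact wording are illustrative, not copied from the repository:

```markdown
#Case 1: Login happy path
Navigate to /login
Type test@example.com into the email field
Click Sign in
Assert the page shows Welcome back
```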

A few things worth noticing. There are no selectors, no XPath, no waits, no page-object classes. The phrase “the email field” is resolved by the agent at runtime against whatever the page actually rendered, so a CSS class rename or a component refactor does not break the scenario. The file is a flat list of #Case N: headers with free-text steps underneath, which means a human with no automation background can open it, add a case, save, and re-run.

The run, end to end, at a terminal

From the command line, the whole thing looks like this. One command, no vendor login, no tenant setup, no MCP configuration outside the repo.

npx assrt run

Three output files, all local. The JSON result, the webm video, the screenshot directory. The MCP server exposes the same thing as structured output you can read from inside a coding agent conversation, so a Claude Code or Cursor session can call assrt_test, get the report back, and decide whether to open the PR. This is the step that breaks the manual-testing-tool assumption: the tester and the runner are not the same person anymore, and they do not have to share a dashboard.

5x default video playback speed
17 browser tools the agent can call
3 MCP entrypoints (test, plan, diagnose)
1s fs.watch debounce before auto-sync

What “5x by default” actually buys you

The video player generated after each run autoplays the recording at five times real-time speed. The choice sits in one line of source at assrt-mcp/src/mcp/server.ts line 99: v.addEventListener('loadeddata', () => setSpeed(5));. The UI exposes 1x, 2x, 3x, 5x, 10x buttons plus keyboard scrubbing (space for play/pause, arrow keys for five-second jumps, number keys to switch speeds).

The reason that default matters: in traditional manual testing, you watch a tester perform the flow in real time, then you watch them explain what happened, then you mark pass or fail in a dashboard. A five-minute scenario consumes about ten minutes of manager attention by the time review and bookkeeping are done. When the agent runs the scenario and the player is 5x by default, the five-minute scenario is a one-minute skim with scrubbing, and the result row is already written to JSON by the time the video opens.

Over a suite of twenty scenarios, the difference between real-time review and 5x review is roughly an hour and a half of attention every time someone wants to understand what the suite did. That is the kind of thing most dashboards do not show as a line item; it is simply not their model.
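The arithmetic behind that estimate, under the article's own assumptions (twenty scenarios of five minutes each, reviewed either in real time or at 5x):

```shell
# Per-review-pass attention, per the article's assumptions:
real_time=$((20 * 5))         # 20 scenarios x 5 min each at 1x = 100 min
at_5x=$((real_time / 5))      # the same footage at 5x = 20 min
saved=$((real_time - at_5x))  # 80 min of reviewer attention saved
echo "$saved minutes saved per review pass"
```

The gap between these 80 minutes and the article's "roughly an hour and a half" presumably covers the pass/fail bookkeeping that the JSON result file absorbs.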

The result file, line by line

Every run writes the same JSON shape to /tmp/assrt/results/latest.json, plus a stable copy at /tmp/assrt/results/<runId>.json for history. The shape is small, flat, and designed to be parsed by anything that reads JSON.

/tmp/assrt/results/latest.json
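A sketch of the shape, using the field names this guide lists in its FAQ (passed, passedCount, failedCount, scenarios[], assertions[], screenshots[], videoFile, videoPlayerUrl, improvements[]); the exact nesting and all sample values are assumptions, not copied from a real run:

```json
{
  "passed": true,
  "passedCount": 1,
  "failedCount": 0,
  "scenarios": [{ "name": "Case 1: Login happy path", "passed": true }],
  "assertions": [{
    "assertion": "the page shows Welcome back",
    "passed": true,
    "evidence": "After Sign in, the heading 'Welcome back' was visible in the snapshot."
  }],
  "screenshots": ["/tmp/assrt/<runId>/screenshots/step-3.png"],
  "videoFile": "/tmp/assrt/<runId>/video/recording.webm",
  "videoPlayerUrl": "http://127.0.0.1:<port>/player.html",
  "improvements": [{
    "severity": "low",
    "title": "No focus ring on email field",
    "description": "Tab focus is invisible on the login form.",
    "suggestion": "Add a :focus-visible outline to form inputs."
  }]
}
```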

Three fields are worth calling out. assertions[].evidence is free-text the agent produces to justify its pass/fail call, which gives a human reviewer enough context to challenge the result without re-running. improvements[] captures UX bugs the agent stumbled across while running an unrelated scenario; those four fields (severity, title, description, suggestion) map cleanly into a Jira or Linear ticket. videoPlayerUrl is the local 127.0.0.1 URL; nothing leaves your machine unless you opt into cloud sync.

The one uncopyable fact about this workflow

Grep the repository for ASSRT_DIR. The result is one hit, in src/core/scenario-files.ts line 16, and the value is the string "/tmp/assrt". Every other path in the runner is derived from that constant: scenario.md, scenario.json, results/latest.json, <runId>/video/recording.webm. That is the part that no competing listicle can replicate: they do not have a file layout, because they do not write files to your disk at all.
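You can mimic that layout to see what a single-constant design buys. The file below is a toy stand-in; only the ASSRT_DIR value is quoted from the text, everything else is illustrative:

```shell
# Toy stand-in for src/core/scenario-files.ts with the quoted constant:
mkdir -p /tmp/demo-repo/src/core
printf 'export const ASSRT_DIR = "/tmp/assrt";\n' \
  > /tmp/demo-repo/src/core/scenario-files.ts

# One definition, one grep hit; every artifact path derives from it:
grep -rn 'ASSRT_DIR' /tmp/demo-repo/src/
#   $ASSRT_DIR/scenario.md
#   $ASSRT_DIR/results/latest.json
#   $ASSRT_DIR/<runId>/video/recording.webm
```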

Six differences from a test-case management dashboard

Below is a side-by-side of the same manual testing workflow in a traditional tool (pick your favourite case-management dashboard; the answers are roughly interchangeable) and in this filesystem-first model.

| Feature | Traditional case-management tool | Assrt (filesystem-first) |
| --- | --- | --- |
| Where the test plan lives | Vendor database, proprietary format | /tmp/assrt/scenario.md on your disk |
| How the tester records a case | Clicks in a dashboard, or imports CSV | Writes plain English in any editor |
| Who does the clicking | A human QA engineer, every run | An AI agent in a real Chromium |
| How results are stored | Cloud tenant, vendor-owned | /tmp/assrt/results/latest.json on disk |
| Video of the run | Usually absent, or screenshot-only | Scrubbable webm at 5x default playback |
| Scenario maintenance when CSS renames | Recorder script breaks, re-record | Accessibility-tree resolution, no selectors |
| Cost model | $40-$7,500/month per seat or team | MIT license, pay per LLM call (~$0.01-$0.05) |
| Where you host it | Vendor cloud only | Local (npx assrt-mcp) or your own infra |

For rented-device labs (iOS Safari on a specific iPhone model), a physical-device cloud is still a separate tool. Assrt's viewport switch covers desktop- and mobile-sized runs, but it does not replicate physical hardware.

What the workflow actually looks like, step by step

Six steps, most of which are “save the file and walk away”. The point is that the number of surfaces a human touches is small: a text editor, a terminal, a player window. No tenant admin panel.

1. Write the scenario in plain English. Open /tmp/assrt/scenario.md in any editor. Write #Case 1: <short name> and three to five steps. "Navigate", "Click", "Type", "Assert". No selectors.

2. Call assrt_test (or npx assrt run). The MCP tool (or CLI) reads the file, spins up a headless Chromium at 1600x900, and hands the scenario to an AI agent that can call 17 browser primitives.

3. Watch the agent work, or do something else. Runs are non-blocking when launched via the CLI with run_in_background. Locally, a player window opens when the agent finishes, with the video scrubbable at 5x.

4. Read /tmp/assrt/results/latest.json. A structured report with per-scenario pass/fail, assertions with evidence text, screenshot file paths, and the video file path. No dashboard login required.

5. Edit the scenario file; it auto-syncs. fs.watch notices the save, waits one second for you to stop typing, and pushes the new plan to cloud storage if you opted into sync. The next run picks up the edit.

6. Pipe the result wherever you want. jq .passed gates CI. .improvements[] becomes Jira tickets. .screenshots[] goes into a Slack thread. The file is yours; the integration is a one-liner.
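Concretely, with fabricated sample data in the result shape this guide describes (field names from the result file; all values invented for illustration):

```shell
# Fabricated stand-in for /tmp/assrt/results/latest.json:
cat > /tmp/demo-latest.json <<'EOF'
{"passed": false, "passedCount": 1, "failedCount": 1,
 "improvements": [{"severity": "low",
                   "title": "No focus ring on email field",
                   "description": "Tab focus is invisible on the login form.",
                   "suggestion": "Add a :focus-visible outline."}]}
EOF

# Gate a build: jq -e exits non-zero when .passed is false.
jq -e '.passed' /tmp/demo-latest.json || echo "gate: build would fail"

# Turn improvements[] into one-line ticket stubs:
jq -r '.improvements[] | "[\(.severity)] \(.title): \(.suggestion)"' /tmp/demo-latest.json
```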

Six properties of a manual testing tool that chose the file route

The scenario is a file, not a record in a database

Every run writes /tmp/assrt/scenario.md. Edit it in VS Code, Vim, or a notes app. fs.watch auto-syncs your edit one second after you stop typing.

The result is a file, not a dashboard row

/tmp/assrt/results/latest.json is the same shape every run. cat it, jq it, pipe it into any tool. No vendor API to poll.

The video plays at 5x by default

assrt-mcp/src/mcp/server.ts line 99 hard-codes 5x as the initial playback rate on the HTML player. A 5-minute scenario is a 1-minute review.

The agent uses the accessibility tree, not selectors

A CSS class rename does not break a scenario. The agent resolves "the email field" against a live Playwright snapshot on every step.

UX bugs fall out as a side product

The suggest_improvement tool lets the agent file an issue with severity, title, description, and suggestion. Those four fields are your next Jira ticket.

You bring your own LLM bill

Roughly 1-5 cents per scenario at Haiku prices. No per-seat license, no enterprise tier, no usage-based vendor surprise on the 1st.

When the traditional dashboards still win

This guide is not a claim that TestRail or Zephyr will disappear. There are three cases where a full case-management dashboard still earns its seat price, and it is worth being honest about them.

First, regulated industries with signed-off test evidence (medical devices, avionics, financial settlement rails). When the auditor asks for a tamper-evident log of who marked a case passed on a specific build, a dashboard with user accounts and signed records is the shorter path. A results/<runId>.json file is reproducible, but the compliance story is about the tooling the auditor recognises.

Second, massive legacy suites. If the team already has 15,000 manual cases in TestRail and a five-year audit trail of executions, a migration is expensive; adding Assrt alongside for the fast-moving 200 cases is the usual shape.

Third, physical-device rental labs (iPhone 15 Pro Max on iOS 17.2.1 with a specific carrier profile). BrowserStack and Sauce Labs provide hardware Assrt does not. For those cases, the division of labour is simple: the plan is still a scenario.md on disk, but the “runner” is a human or a third-party device cloud. The result format can still be the same JSON shape.

The shortest “try it” path

One npm package, one ANTHROPIC_API_KEY, one file. The MCP server registers three tools (assrt_test, assrt_plan, assrt_diagnose) and writes its artifacts under /tmp/assrt. Point it at any URL, local or remote, and you have a manual testing rig that does not live in a vendor database.

first-run.sh
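A plausible first-run.sh, assembled from commands quoted elsewhere in this guide; the staging URL is a placeholder and the exact CLI flags should be checked against the package's README:

```shell
#!/usr/bin/env sh
# first-run.sh (sketch): one package, one API key, one file.
export ANTHROPIC_API_KEY=sk-ant-...   # your own key

# 1. Write the plan in plain English, no selectors.
mkdir -p /tmp/assrt
cat > /tmp/assrt/scenario.md <<'EOF'
#Case 1: Login happy path
Navigate to /login
Type test@example.com into the email field
Click Sign in
Assert the page shows Welcome back
EOF

# 2. Run it against any URL; artifacts land under /tmp/assrt.
npx assrt run --url https://staging.example.com --json

# 3. Read the verdict from disk.
cat /tmp/assrt/results/latest.json
```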

The other tools in this space, for comparison

TestRail, Zephyr, Xray, TestLink, PractiTest, qTest, TestCollab, TestLodge, SpiraTest, Katalon, Selenium IDE, BrowserStack, Browsershots, Browserling, Postman, SoapUI, Citrus, ZAP, Bugzilla, Mantis, Jira, Marker.io

Want to see this run against your staging URL?

Twenty minutes, a live screenshare, a scenario file against your real app, and a recording.webm you can take away.

Book a call

Questions this guide gets asked

What does "software testing tools for manual testing" usually mean, and why is Assrt answering it differently?

The phrase is usually a shorthand for test case management dashboards (TestRail, Zephyr, Xray, TestLink, PractiTest, qTest) and the bug trackers that sit next to them (Jira, Bugzilla, Mantis). The unspoken assumption is that a human tester clicks through the application, and the tool records what the human did. Assrt starts from a different place: if a manual test case is just a sequence of steps in plain English, then an AI browser agent can read the steps and run them in a real Chromium, and the only "tool" the human needs is a text editor. That changes the shape of every recommendation in this space.

Where do the manual test scenarios live?

In a single markdown file on your own laptop: /tmp/assrt/scenario.md. The path is hard-coded in the open-source assrt-mcp source at src/core/scenario-files.ts, lines 16-17. Every assrt_test call writes the plan there before the browser launches, watches it with fs.watch for edits you make in your editor, and auto-syncs changes back to cloud storage one second after you stop typing. The plan is not hidden inside a vendor database; it is a file you can cat, diff, git-commit, or pipe into another tool.

What does the result of a manual test run look like, specifically?

Three artifacts on disk, all in /tmp/assrt. The plan is /tmp/assrt/scenario.md (what you wrote). The structured pass/fail report is /tmp/assrt/results/latest.json with a small shape: passed boolean, passedCount, failedCount, scenarios[], assertions[], screenshots[] as file paths, videoFile, videoPlayerUrl. The video is /tmp/assrt/<runId>/video/recording.webm, generated by Playwright while the agent drives the browser. A self-contained player at /tmp/assrt/<runId>/video/player.html autoplays it at 5x by default (1x, 2x, 3x, 5x, 10x buttons, plus keyboard scrubbing with arrow keys). No dashboard required.

How does this compare to the usual "best manual testing tools" pick list (TestRail, Katalon, TestLink)?

Those tools live one layer away from the actual browser. TestRail is a case database; you import cases, run them yourself, and mark pass or fail by hand. Katalon is closer because it has a scripting layer, but it still expects a human or a proprietary recorder to produce the script. Assrt is not a case database; it is the runner that reads the case, drives a real Chromium with the same browser-extension mode a QA engineer would use, and writes the result to a file. There is no seat license, no per-user fee, no cloud tenancy to sign up for; the repository is MIT-licensed and npx assrt-mcp runs it locally.

What about bug reports? Do I still need Jira or Marker.io?

The test agent has a suggest_improvement tool defined in assrt-mcp/src/core/agent.ts. When it sees a UX issue during a scenario (a button that does not respond, a form that silently fails validation, a broken link), it files it as an improvement in the JSON report with severity, title, description, and suggestion fields. Those fields map one-to-one to the fields you would fill in a Jira ticket. You can read /tmp/assrt/results/latest.json and pipe the improvements[] array straight into whatever tracker your team uses, including a plain text file. Many teams do not need a separate bug tracker for this layer of testing at all; the JSON report is the bug report.

Is this really manual testing if an AI agent is driving the browser?

It is the same cognitive work the manual tester does, expressed differently. The hard part of manual testing is deciding what to click, what to type, what to verify, and whether a screen looks wrong. That part still belongs to the human; it is the text in /tmp/assrt/scenario.md, written as #Case N: <short name> followed by steps. The easy-but-tedious part is the actual clicking, typing, waiting, and screenshotting, and that part is what the agent absorbs. A good analogy: a builder still designs the kitchen; a power drill replaces the screwdriver. The design is the scenario file; the drill is the agent.

Can a non-technical QA engineer really edit the scenario file?

Yes, because the format is plain English. Each #Case N: block is a short name and three to five steps: "Navigate to /login", "Type test@example.com into the email field", "Click Sign in", "Assert the page shows Welcome back". No selectors, no XPath, no code. The agent resolves "the email field" against a live accessibility snapshot of the rendered page, so a CSS class rename on the login form does not break the scenario. The limit is imagination and specificity, not technical ability; if a QA engineer can write a Google Doc test plan, they can write a scenario.md.

What about cross-browser and mobile? Do I need BrowserStack?

For pure manual testing of a single laptop flow, no. assrt_test launches headless Chromium at 1600x900 by default, and you can pass viewport: "mobile" (375x812) or viewport: "desktop" (1440x900) or an explicit {width, height} to change it. For a real rented-device-lab case (iOS Safari 17.2 on a specific iPhone model, or a Samsung Galaxy S24), BrowserStack and Sauce Labs still apply; Assrt does not replicate physical-device rental. The overlap is the 80% of manual cases that are "does this flow work in a desktop browser at desktop size on the staging URL", and that part does not need a rented lab.

How fast is the video player, and why does that matter?

The player at /tmp/assrt/<runId>/video/player.html defaults to 5x playback speed (set by the initial 'loadeddata' handler at line 99 in assrt-mcp/src/mcp/server.ts), with buttons for 1x, 2x, 3x, 5x, 10x and keyboard shortcuts for scrubbing. This matters because manual testers historically lose time re-watching long screen recordings in real time. At 5x, a five-minute scenario collapses to a one-minute review, and a ten-minute regression-suite review becomes a two-minute skim. Reviewing what the agent did is faster than doing the clicks yourself.

What is the actual licensing and cost?

The MCP server, agent, and CLI are MIT-licensed (see assrt-mcp/LICENSE). Running it locally with npx assrt-mcp is free; you bring your own Anthropic API key or Claude Code subscription for the agent's LLM calls, which come out to roughly one to five cents per scenario at Haiku prices. There is an optional sync-to-cloud for sharing runs at app.assrt.ai, but the source of truth is always the file on your disk. Compare to per-seat SaaS tiers that start at $40/user/month and climb into $7,500/month enterprise plans for comparable agent-driven testing.

Can my team run this as part of CI, or is it a local-only tool?

Both. Local runs write to /tmp/assrt and auto-open the video player when the agent finishes. In CI, you invoke npx assrt run --url <staging-url> --plan <scenario.md> --json and pipe the stdout to jq -e '.passed' to gate the build; the resultsFile path in the JSON points to the same structured report you would read locally. There is no daemon to install on the runner, no vendor API to authenticate against, no scheduling window to pay for; the cost model is the same Anthropic API usage you already have.
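As a CI step, that pipeline might be written like this (the flags and the resultsFile field are those quoted above; the surrounding script is illustrative boilerplate):

```shell
#!/usr/bin/env sh
set -eu

# Run the scenario file against the freshly deployed staging URL.
npx assrt run --url "$STAGING_URL" --plan scenario.md --json > report.json

# jq -e exits non-zero if .passed is false or missing, failing the build.
jq -e '.passed' report.json

# The full structured report is at the path in .resultsFile.
echo "report: $(jq -r '.resultsFile' report.json)"
```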

What file should I open first if I want to see the real implementation?

Three of them. (1) /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts is where the /tmp/assrt paths are defined and the fs.watch + auto-sync loop lives. (2) /Users/matthewdi/assrt-mcp/src/mcp/server.ts is the MCP entrypoint; the assrt_test tool definition starts around line 335, and the video player HTML generator (with the 5x default) is at lines 35-111. (3) /Users/matthewdi/assrt-mcp/src/core/agent.ts holds the TOOLS array (the 17 browser primitives the agent calls) plus the scenario executor. Reading those three files end to end takes about twenty minutes and gives you the whole picture.