Tools landscape, reviewed by artifact
Web application testing tools, graded by what they leave on your disk
Every shortlist in this category ranks by features: parallel browsers, codeless recorders, AI assists, real-device clouds. None of those features are differentiating in 2026. The dimension that actually decides whether you keep using a tool is what it writes to your local filesystem after a run, because that file set is what your AI coding agent has to work with.
Direct answer (verified 2026-05-09)
There is no single best web application testing tool. The grading dimension that matters in 2026 is what each tool writes to your local disk after a run, because that artifact set decides whether your tests survive the vendor and whether your AI coding agent can read your results. By that lens the shortlist is: Playwright (trace.zip, video, HTML report; you own the spec files), Cypress (videos, screenshots, spec files; the Cloud product changes ownership), Selenium (whatever you wire up; full freedom, full responsibility), closed-source SaaS like QA Wolf, Mabl, and testRigor (the canonical test lives in their database; export is a snapshot), and Assrt, which pins a known directory layout to /tmp/assrt/ plus a per-run artifactsDir and returns every path in the tool response so a coding agent can Read them directly. Source for Assrt's layout: assrt-mcp on GitHub.
The argument: features are fungible, artifacts are not
Pick any three guides covering the best web application testing tools for 2026 and you will see the same matrix: parallel execution, AI test generation, codeless recorder, real-device cloud, CI integration, free tier. Every serious tool has every one of those, or can ship the missing one in a sprint. The matrix ranks how the tool sells itself, not how it actually behaves on your machine after a run finishes.
The behavior that matters is what files exist on your filesystem when the run is done. Specifically: where they live, whether you can hand the path to a different process, and whether the format is something a coding agent can read without a vendor SDK. That is the substrate for everything downstream of a run, which is where the value lives. A failed test is only useful if you can turn it into a fix, and you can only turn it into a fix if your tools can read the failure.
Most reviews avoid this dimension because it makes the table uglier. Half the tools store the canonical test in a vendor database. Some write artifacts only to the dashboard. Some write to disk but in a binary format you need a viewer to open. The tools that win on this axis are the ones that treat the filesystem as a first-class output: every run writes a known set of files to known paths, and the tool tells you the paths.
Anchor: the directory Assrt writes after every run
Assrt is a useful reference because the layout is small, documented, and grep-able. There are two roots. The first is stable across runs and holds the editable plan plus the latest result. The second is per-run, named by a UUID, and holds the recording, the events, and the screenshots.
Where Assrt writes things
// assrt-mcp/src/core/scenario-files.ts:16-20
import { join } from "node:path"; // not part of the quoted lines; join is Node's path.join

const ASSRT_DIR = "/tmp/assrt";
const SCENARIO_FILE = join(ASSRT_DIR, "scenario.md");
const SCENARIO_META = join(ASSRT_DIR, "scenario.json");
const RESULTS_DIR = join(ASSRT_DIR, "results");
const LATEST_RESULTS = join(RESULTS_DIR, "latest.json");

A real run from a recent test session on this site produced run id 06778d6e-574b-430c-8449-2b2ffc1157df. The artifacts directory under $TMPDIR/assrt/ contained:
- events.json, full structured event trace from the run
- execution.log, human-readable log of every step, assertion, and reasoning line
- screenshots/, 29 PNGs named 00_step0_init.png, 01_step2_click.png, on through 28_step29_assert.png
- video/recording.webm, the Playwright recording of the whole session
- video/player.html, a self-contained HTML5 player with 1x / 2x / 4x speed buttons, served on a local port so the player URL works as soon as the run finishes
The reason every file name is predictable is so a downstream process (your IDE agent, a CI script, a flake-tracking job) can find it without a callback URL or an SDK. The MCP tool response carries the absolute paths in named keys (artifactsDir, videoFile, videoPlayerFile, videoPlayerUrl, resultsFile, scenarioFile) so an agent does not have to guess.
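To make that concrete, here is a minimal sketch of a downstream script that consumes those keys. The key names are the ones listed above; the internal shape of events.json is an assumption made for the example.

```typescript
import { readFile } from "node:fs/promises";
import { join } from "node:path";

// The path keys Assrt's MCP response names, per the list above.
interface AssrtRunPaths {
  artifactsDir: string;
  videoFile: string;
  videoPlayerFile: string;
  videoPlayerUrl: string;
  resultsFile: string;
  scenarioFile: string;
}

// A CI step or flake-tracking job can read the run straight off disk.
// NOTE: the internal shape of events.json is an assumption for this sketch.
async function summarizeRun(paths: AssrtRunPaths): Promise<void> {
  const events = JSON.parse(
    await readFile(join(paths.artifactsDir, "events.json"), "utf8"),
  );
  const log = await readFile(join(paths.artifactsDir, "execution.log"), "utf8");

  console.log(`events recorded: ${Array.isArray(events) ? events.length : "unknown"}`);
  console.log(`log lines: ${log.split("\n").length}`);
  console.log(`scrub the video at: ${paths.videoPlayerUrl}`);
}
```

The same reads work from a cron job or an IDE agent, because nothing here needs an SDK or a dashboard session.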
The reviewer's table: artifact set per tool
The table below grades tools on the dimension that matters: what appears on your disk after a run, and whether a coding agent working from outside the test runner can read it.
| Tool | Plan format | Run artifacts on disk | Agent-readable result |
|---|---|---|---|
| Playwright | TypeScript / JS spec files in your repo | trace.zip, video.webm, screenshots, playwright-report/index.html | Partial, needs trace viewer to read |
| Cypress | JS spec files in your repo | cypress/screenshots, cypress/videos (when configured) | Partial, Cloud moves results off-disk |
| Selenium / WebdriverIO | Code in any language you want | Whatever you wire up; no canonical layout | Depends on your reporter |
| Puppeteer | JS code in your repo | screenshots, PDFs, traces (you write the call) | Depends on your reporter |
| QA Wolf / Mabl / testRigor | Stored in vendor database | In the dashboard; export is a snapshot | No, agent cannot reach the dashboard |
| BrowserStack / Sauce Labs | Your code (they rent the browsers) | Cloud video and logs; downloadable | Indirect, agent must call API |
| Assrt | Markdown plan at /tmp/assrt/scenario.md | artifactsDir/{events.json, execution.log, screenshots/, video/recording.webm, video/player.html} | Yes, paths returned in MCP response |
What the agent loop looks like when artifacts cooperate
With a known on-disk layout, the loop between your IDE agent and the test runner is just file reads. No webhooks, no SDK, no dashboard scraping. The sequence below is what happens when you tell Claude Code to test a local dev server and then explain a failure.
MCP test run, agent post-mortem
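A hedged sketch of that sequence: the MCP call is shown as a comment (its shape matches the assrt_test example later on this page) and the post-mortem is written as the plain file reads an agent performs. The field names inside the JSON files are assumptions for illustration; only the paths are documented.

```typescript
import { readFile, readdir } from "node:fs/promises";
import { join } from "node:path";

// 1. The IDE agent calls the MCP tool, roughly:
//      assrt_test({ url: "http://localhost:3000", plan: "...markdown cases..." })
//    The response names resultsFile and artifactsDir, among other paths.
// 2. The post-mortem is then plain file reads; the field names inside the JSON
//    files are assumptions for this sketch, only the paths are documented.
async function postMortem(resultsFile: string, artifactsDir: string) {
  const result = JSON.parse(await readFile(resultsFile, "utf8"));

  // 3. Grep the structured trace for the step that failed.
  const events: Array<{ type?: string; message?: string }> = JSON.parse(
    await readFile(join(artifactsDir, "events.json"), "utf8"),
  );
  const failures = events.filter((e) => /fail|error/i.test(`${e.type} ${e.message}`));

  // 4. Pull the last screenshot so the agent can look at the final frame.
  const shots = (await readdir(join(artifactsDir, "screenshots"))).sort();
  const lastShot = join(artifactsDir, "screenshots", shots[shots.length - 1]);

  // 5. Everything needed to propose a fix is now in local variables.
  return { result, failures, lastShot };
}
```

Every step after the tool call is a Read of a path the tool response already named.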
The loop closes because every link is a file read. Compare to a tool whose canonical artifact lives in a hosted dashboard: the agent cannot Read the dashboard, so the post-mortem step collapses into "open this URL in your browser and tell the AI what you saw", which puts the human back in the slow path.
Owning the files vs. renting the dashboard
The split between local-files-first and dashboard-first is the old open vs. closed argument with a new wrinkle. The new wrinkle is that AI coding agents make the dashboard friction much worse: every dashboard becomes a place your agent cannot reach.
The same failure, two artifact strategies
Test fails. Vendor sends a webhook to your CI; the result lives at vendor.com/runs/abc123. To read it, you log in, click into the run, watch the embedded video, copy the failed assertion text, paste it into your IDE chat, and ask the agent to fix it. The agent has no access to the underlying frames, the DOM at failure, or the network log; it has the text you pasted. The local-files side of the same failure is sketched after the list below.
- Run artifact lives behind auth
- Agent cannot Read the result without an SDK
- You become the courier between dashboard and IDE
- Tests are not portable, leaving the vendor breaks the runs
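For contrast, a minimal sketch of the local-files side of the same failure, assuming Assrt's documented layout: the run directory is already on disk, so pulling the failed assertion is one grep over execution.log. The file name and location come from the layout above; the log's exact line format is an assumption here.

```typescript
import { readFile } from "node:fs/promises";
import { join } from "node:path";

// Same failure, local-files strategy: the log is already on disk under the
// run's artifactsDir, so the "copy the failed assertion" step is one grep.
async function failedAssertionLines(artifactsDir: string): Promise<string[]> {
  const log = await readFile(join(artifactsDir, "execution.log"), "utf8");
  // Line format is an assumption; the point is that it is plain, grep-able text.
  return log.split("\n").filter((line) => /assert/i.test(line) && /fail/i.test(line));
}
```

The screenshots and the video sit in the same directory, so nobody has to play courier between a dashboard and the IDE.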
Where each tool fits if you grade by what they leave on disk
A short take, given the lens of this page. None of these are wrong choices in absolute terms; they fit different priorities.
- Playwright is the best raw on-disk tool for a team that writes tests by hand. The trace.zip is unbeatable for human debugging; the friction for an AI coding agent is that trace.zip is opaque without a parser and the HTML report is a single bundle. A config sketch showing where those files come from follows this list.
- Cypress is fine until you adopt Cypress Cloud. Once results live in the dashboard the agent loop frays.
- Selenium and WebdriverIO give you total freedom over the artifact set and total responsibility for designing it. If your team has the time to specify a layout and ship a custom reporter, you will end up with something better than any tool's default. Most teams do not have that time.
- QA Wolf, Mabl, and testRigor are real choices for organizations that want to outsource the testing problem entirely. The trade is that the canonical test, the canonical run, and the canonical artifact all live in the vendor.
- BrowserStack and Sauce Labs are browser grids; you bring your own runner. Their on-disk footprint is whatever your runner produces, plus a hosted video and log per session you can download.
- Assrt is the local-files-first option built for the agent loop: the plan is a markdown file, the result is JSON at a stable path, the artifacts are in a known per-run directory, and every path is returned in the MCP tool response. The generated test code is standard Playwright you can keep and run anywhere.
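For the Playwright point above, the artifact set is declared in playwright.config.ts; a typical configuration that produces trace.zip, video.webm, screenshots, and the HTML report looks roughly like this (the option values shown are illustrative choices, not defaults).

```typescript
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Per-test artifacts (trace.zip, video.webm, screenshots) land under this directory.
  outputDir: "test-results",
  use: {
    trace: "retain-on-failure",
    video: "retain-on-failure",
    screenshot: "only-on-failure",
  },
  // The human-facing bundle at playwright-report/index.html.
  reporter: [["html", { outputFolder: "playwright-report" }]],
});
```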
Try the lens on your current shortlist
Pick whatever web application testing tool you are evaluating right now and answer four questions. Do not consult marketing pages; run a real test and look at your filesystem.
- After a run, what files exist on disk, and at what paths? Are the paths stable across runs?
- Could you point a coding agent at one of those paths and expect a useful answer about the run?
- If you uninstalled the tool tomorrow, would your tests still run on something else without rewriting them?
- Where does the canonical version of the test live: in your repo, on your filesystem, or in the vendor's database?
Tools that answer "yes, yes, yes, my repo" stay in the shortlist. Tools that answer "in the dashboard" go on a different shortlist, the one for teams that want to outsource the QA problem.
The honest shortlist
Tools mentioned on this page
- Playwright: Microsoft's E2E framework; best raw on-disk artifacts
- Cypress: JS-first; great DX; Cloud changes ownership
- Selenium: cross-language standard; bring your own reporter
- WebdriverIO: Selenium / Playwright / Puppeteer wrapper with rich plugins
- Puppeteer: Chromium-only; great for screenshots and PDFs
- QA Wolf: managed end-to-end testing as a service
- Mabl: AI-driven low-code testing platform
- testRigor: plain-English test specs; hosted execution
- BrowserStack: real-device cloud; browser grid
- Sauce Labs: cross-browser cloud; CI integrations
- Assrt: open-source MCP test agent; writes a known on-disk layout
Want to walk through the artifact layout on a real run?
We'll point Assrt at your local dev server, capture a real run, and read the events.json and player.html together. 20 minutes.
Frequently asked questions
What counts as a web application testing tool?
Anything that drives a real browser engine (Chromium, Firefox, WebKit) against a web app and reports back whether it behaved correctly. The shortlist most teams compare in 2026: Playwright, Cypress, Selenium, WebdriverIO, Puppeteer, Mabl, QA Wolf, BrowserStack, Sauce Labs, and the agent-driven layer that wraps them. Assrt sits in that last group, generating real Playwright code rather than a proprietary scenario format. The feature matrix tells you very little; what tells you a lot is what file paths each tool returns when a run finishes.
Why grade by 'what's on disk after a run' instead of by features?
Because every tool on the shortlist already supports parallel browsers, headless mode, screenshots, retries, and a CI integration. Those features are interchangeable. What is not interchangeable is the artifact you and your IDE can pick up after the run: the plan you can edit, the video you can scrub, the events you can grep, the screenshots you can diff. That set of files is the surface area your AI coding agent has to work with. Tools that write a known layout to a known directory cooperate with agents. Tools that store everything in a vendor dashboard do not.
What does Assrt actually write to disk after assrt_test?
Two locations. First, a stable plan at /tmp/assrt/scenario.md, plan metadata at /tmp/assrt/scenario.json, and the latest result at /tmp/assrt/results/latest.json (plus one file per historical run under /tmp/assrt/results/<runId>.json). Second, a per-run artifacts directory at <tmpdir>/assrt/<runId>/ containing events.json (full structured event trace), execution.log (the human-readable log), screenshots/NN_stepM_action.png (one file per recorded step), video/recording.webm (the Playwright recording), and video/player.html (a self-contained HTML5 player with keyboard speed controls served on a local port). Source: assrt-mcp/src/mcp/server.ts lines 430-435 and 605-625.
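If you want to verify that layout on your own machine, a small check along these lines confirms the per-run artifacts exist; the paths come from the answer above, and the exact file list you assert is up to you.

```typescript
import { access } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Checks that the documented per-run artifacts exist for a given runId.
// File names come from the layout described in the answer above.
async function assertRunArtifacts(runId: string): Promise<void> {
  const runDir = join(tmpdir(), "assrt", runId);
  const expected = [
    "events.json",
    "execution.log",
    join("video", "recording.webm"),
    join("video", "player.html"),
  ];
  await Promise.all(expected.map((f) => access(join(runDir, f))));
  // The stable root also keeps one result file per historical run.
  await access(join("/tmp/assrt", "results", `${runId}.json`));
}
```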
Why should I care if my AI coding agent can read those files?
Because the value of a test run is mostly downstream: a failure becomes a fix, a screenshot becomes a regression report, a flaky case becomes a hardened case. If your agent can Read /tmp/assrt/results/latest.json, it can summarize the failure and propose a fix in the same conversation. If your agent has to log into a vendor dashboard to find the run, that loop snaps. Assrt's MCP server tells the agent the exact paths in the response (scenarioFile, scenarioMetaFile, resultsFile, runResultsFile, artifactsDir, videoFile, videoPlayerFile, videoPlayerUrl) so the agent does not have to guess.
How does Playwright compare on this dimension?
Playwright is the standard for raw on-disk artifacts: a trace.zip with a full UI timeline, a video.webm, screenshots, and an HTML report at playwright-report/index.html. The trace viewer is great for a human at a desk. The gap is that the trace.zip is opaque to a coding agent without a parser, and the HTML report is one big bundle that reads as a single page. Assrt borrows Playwright's video and screenshot output and adds a structured events.json plus a markdown plan that the agent can edit between runs.
Cypress, Selenium, WebdriverIO, Puppeteer, what about those?
Cypress writes screenshots to cypress/screenshots/ and videos to cypress/videos/ when configured; the spec files are yours. Cypress Cloud changes the equation: results live in the dashboard. Selenium, WebdriverIO, and Puppeteer leave whatever you wire them to leave; there is no canonical layout. That is the freedom and the cost. You own the files because you wrote the code that produced them.
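The "when configured" caveat for Cypress amounts to a few lines in cypress.config.ts; the folder options below spell out Cypress's defaults, and the values are illustrative.

```typescript
import { defineConfig } from "cypress";

export default defineConfig({
  video: true,                          // write a video per spec to disk
  videosFolder: "cypress/videos",       // default location, spelled out
  screenshotsFolder: "cypress/screenshots",
  e2e: {
    baseUrl: "http://localhost:3000",   // illustrative; point at your dev server
  },
});
```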
And the closed-source tools, QA Wolf, Mabl, testRigor, BrowserStack?
These store the canonical test in their cloud. Some let you export Playwright code; the export is a snapshot, not a live working copy. Run results live in their dashboard with a hosted player. You can take a screenshot of the dashboard, but you cannot point your IDE agent at a directory of artifacts that updates after every run. The trade is real: managed services reduce your ops surface; they take ownership of the file layout in return.
Where in Assrt's source can I confirm this layout?
scenario-files.ts lines 16-20 declare the constants ASSRT_DIR='/tmp/assrt', SCENARIO_FILE='/tmp/assrt/scenario.md', SCENARIO_META='/tmp/assrt/scenario.json', RESULTS_DIR='/tmp/assrt/results', LATEST_RESULTS='/tmp/assrt/results/latest.json'. server.ts line 430 builds runDir as join(tmpdir(), 'assrt', runId). server.ts line 605 writes events.json. server.ts line 619 writes player.html via generateVideoPlayerHtml. server.ts line 629 advertises the local videoPlayerUrl. The whole thing is MIT-licensed and lives at https://github.com/assrt-ai/assrt-mcp.
Is Assrt actually free, or is there a paid tier you push toward?
The npm package @m13v/assrt and the MCP server are MIT-licensed and run end-to-end on your machine. The video, screenshots, events.json, player.html, and the generated Playwright code all land on your local filesystem. The optional cloud at app.assrt.ai gives every scenario a shareable URL and persists the artifacts off-disk; the local install works without it. The model API call (Anthropic by default) is the one paid dependency. You supply ANTHROPIC_API_KEY and pay Anthropic directly.
What's the smallest possible Assrt run that produces all those files?
One MCP call: assrt_test({ url: 'http://localhost:3000', plan: '#Case 1: home loads\n- Navigate to /\n- Verify the page heading is visible' }). The MCP server pre-creates a runId, runs the agent, writes events as they arrive, finalizes the .webm, generates player.html, and returns a JSON summary with every path embedded. Two seconds of typing in your IDE, one passing case in your repo, six artifact files on disk.
Adjacent angles on testing tools
Keep reading
- Web browser testing software, graded by what it draws inside the tab. A different lens on the same tools: who paints inside the page, who acts on it from outside.
- Best e2e testing tools, ranked by install-to-first-passing-test latency. Forget feature counts; rank by minutes from install to your first green check.
- Test automation tools comparison: 2026 guide. Playwright vs. Cypress vs. Selenium vs. QA Wolf vs. Mabl vs. Assrt across cost, format, and lock-in.