A test automation tools comparison matrix, scored on the rows that actually cost you later

Almost every comparison matrix online ranks tools by features you can read off a marketing page: supported languages, parallel runners, assertion syntax. Those columns tell you whether you can start. They say nothing about the bill you pay in month twelve. This matrix scores the four rows that do.

Matthew Diakonov, Written with AI

Published June 16, 2026 · 9 min read

The short answer (verified 2026-06-16)

No single tool wins every row, so do not look for the “best” cell. Pick by constraint, scoring your shortlist on four axes that predict 12-month cost: how you author tests, where they run, who fixes them when the UI moves, and what leaving costs. Free, open-source tools keep tests in your repo (Playwright, Selenium, Cypress, Assrt). A managed service trades money for the labor of writing and maintaining them (QA Wolf). And if you want AI to draft scenarios but still keep and edit the result, Assrt authors them as plain-English #Case blocks you own.

The matrix, filled in

Eight tools people actually shortlist, scored on cost, browser reach, where tests run, how you author them, and what you keep if you leave. The last column, “what you keep,” is the one most published matrices leave out.

Tool	Cost	Where tests run	Browsers	How you author tests	What you keep if you leave
Selenium	Free (Apache 2.0)	Your machine / your CI	Chrome, Firefox, Safari, Edge	Code (Java, Python, C#, Ruby, JS)	You keep the code
Cypress	Free runner (MIT) + paid Cloud	Your CI; paid Cloud adds smart parallelization	Chrome-family (Chrome, Edge, Electron), Firefox, WebKit (experimental)	Code (JavaScript / TypeScript)	You keep the code
Playwright	Free (Apache 2.0)	Your machine / your CI	Chromium, Firefox, WebKit	Code (TS/JS .spec files; also Python, .NET, Java)	You keep the code
Testim	Commercial	Vendor cloud	Chrome-focused	Recorder + low-code editor	Locked to the platform
Mabl	Commercial	Vendor cloud	Multi-browser	Low-code recorder + AI	Locked to the platform
QA Wolf	Managed service (no public pricing)	Vendor-run on your behalf	Multi-browser (Playwright underneath)	Their team writes Playwright for you	You receive Playwright, but coverage is service-dependent
Momentic	Commercial	Vendor CLI + platform	Chromium / Chrome only (Safari, Firefox on roadmap)	Proprietary YAML (momentic.config.yaml)	YAML is not portable to other runners
Assrt	Free (open source)	Your machine / your CI	Chromium, Firefox, WebKit	Plain-English #Case Markdown, run by an AI agent	You keep the scenarios and run them yourself

Sources: each tool’s own documentation (Playwright, Selenium, Cypress, Momentic, QA Wolf) and the Assrt repository. QA Wolf pricing is not officially published; figures are from third-party reports. Verified 2026-06-16.

The four rows that actually predict cost

A matrix is only as good as its criteria. Drop the ones every marketing page already answers and keep the ones that decide whether you are still using the tool, and still sane, a year from now.

Score these, not feature counts

How you author tests. Code needs an engineer to maintain it; a recorder needs re-recording on every UI change; proprietary YAML needs the vendor's runner; natural language can be edited by anyone on the team.
Where tests run. Your own CI versus a vendor cloud decides whether a pricing change or an outage can take your suite down.
Who fixes a test when the UI moves. This is where most of the real cost lives, and it is almost never a column in published matrices.
What leaving costs. If the artifact only runs on the vendor's platform, your tests are a lease, not an asset.

The row most matrices put first, and the row that should replace it

The typical first row looks like this. Swap it for a row that predicts your actual bill and the same tools lead to a very different buying decision.

Supported languages and assertion style. Every tool gets a green check, the grid looks complete, and you learn nothing about the next twelve months.

Counts features you can read off a landing page
All cells trend green, so it cannot break a tie
Silent on maintenance, ownership, and exit cost

The one authoring row no other tool has

Selenium, Cypress, and Playwright author tests as code. Testim and Mabl record clicks. Momentic writes YAML. Assrt is the only row in the matrix whose test is a plain-English Markdown block. Here is exactly what one looks like, taken from the scenario file the tool writes to /tmp/assrt/scenario.md:

#Case 1: Log in with valid credentials
Click the "Sign in" button in the header.
Type a valid email into the Email field.
Type the matching password into the Password field.
Click "Log in".
Verify the dashboard heading is visible.

#Case 2: Reject an empty password
Click "Sign in".
Type a valid email, leave Password empty.
Click "Log in".
Verify an inline "Password is required" error appears.

That file is editable in place and auto-syncs as you change it. Each #Case is self-contained, and verification steps start with words like Verify, Check, or Confirm. The assrt_test tool runs the cases in a real Playwright-driven browser and writes structured results to /tmp/assrt/results/latest.json. You can verify the format yourself in the open-source repository.
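To show how little structure the format needs, here is a minimal TypeScript sketch that splits a scenario file into cases and separates actions from checks. It is not Assrt's parser: the `ScenarioCase` type, the `parseCases` function, and the rule that a step is a check when it begins with Verify, Check, or Confirm are assumptions drawn from the description above.

```typescript
// Hypothetical shape of one parsed scenario; not Assrt's internal type.
interface ScenarioCase {
  id: number;
  title: string;
  actions: string[];       // steps the agent performs
  verifications: string[]; // steps the agent checks
}

// Assumption: verification steps start with one of these words.
const VERIFY_PREFIX = /^(verify|check|confirm)\b/i;

// Matches a header such as "#Case 1: Log in with valid credentials".
const CASE_HEADER = /^#Case\s+(\d+):\s*(.+)$/;

function parseCases(markdown: string): ScenarioCase[] {
  const cases: ScenarioCase[] = [];
  let current: ScenarioCase | undefined;

  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // blank lines only separate cases

    const header = CASE_HEADER.exec(line);
    if (header) {
      current = { id: Number(header[1]), title: header[2].trim(), actions: [], verifications: [] };
      cases.push(current);
      continue;
    }
    if (!current) continue; // ignore any text before the first #Case

    // Each remaining line is one step, classified by its leading verb.
    if (VERIFY_PREFIX.test(line)) current.verifications.push(line);
    else current.actions.push(line);
  }
  return cases;
}
```

Run against the two cases above, it returns two entries: Case 1 with four actions and one verification, Case 2 with three actions and one verification.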

From a sentence to a verdict

Write a #Case

Plain-English steps in scenario.md, no selectors

Agent reads the page

Resolves elements from the accessibility tree

Runs in a real browser

Chromium, Firefox, or WebKit via Playwright

Writes a verdict

Pass / fail plus assertions to latest.json
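The second step, resolving elements from the accessibility tree, is what replaces selectors. As a rough illustration only, here is a TypeScript sketch that takes a step such as Click "Log in" and looks up a clickable node by role and accessible name in a simplified tree. The `AXNode` shape, the `resolveClickTarget` helper, and the sample tree are hypothetical; an AI agent interprets steps far more flexibly than this single regex does.

```typescript
// Simplified accessibility-tree node (hypothetical; real trees carry more fields).
interface AXNode {
  role: string; // e.g. "button", "textbox", "heading"
  name: string; // accessible name, e.g. "Log in"
  children?: AXNode[];
}

// Depth-first search for the first node with a matching role and accessible name.
function findByRoleAndName(node: AXNode, roles: string[], name: string): AXNode | undefined {
  if (roles.includes(node.role) && node.name.toLowerCase() === name.toLowerCase()) {
    return node;
  }
  for (const child of node.children ?? []) {
    const hit = findByRoleAndName(child, roles, name);
    if (hit) return hit;
  }
  return undefined;
}

// Toy interpreter: pull the quoted label out of a "Click ..." step and search
// clickable roles for it. Returns undefined when the step cannot be resolved.
function resolveClickTarget(step: string, tree: AXNode): AXNode | undefined {
  const match = /^Click\s+(?:the\s+)?"([^"]+)"/i.exec(step.trim());
  if (!match) return undefined;
  return findByRoleAndName(tree, ["button", "link"], match[1]);
}

// Hypothetical tree for the login page used in the #Case example.
const loginPage: AXNode = {
  role: "RootWebArea",
  name: "Example app",
  children: [
    { role: "banner", name: "", children: [{ role: "button", name: "Sign in" }] },
    {
      role: "main",
      name: "",
      children: [
        { role: "textbox", name: "Email" },
        { role: "textbox", name: "Password" },
        { role: "button", name: "Log in" },
      ],
    },
  ],
};
```

Code-first users get the same role-plus-name idea from Playwright's `page.getByRole('button', { name: 'Log in' })` locator, which is what a hand-written spec would use for that step.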

Build your own matrix in four steps

A generic matrix ranks tools for an average team that does not exist. Yours should rank them for your constraints. Here is the fastest way to make one that actually decides.

Write your constraints as the first column

Budget ceiling, must-run-in-our-CI, who maintains tests, compliance needs. These become weighted rows. If a row does not change your decision, delete it.

Shortlist three tools, not ten

A ten-tool grid is a reading exercise. Pick the three that survive your hardest constraint (usually budget or ownership) and compare those seriously.

Score the cost-predicting rows, then weight them

Authoring format, run location, who fixes broken tests, exit cost. Give the row that hurts your team most, often maintenance, double weight.

Run a one-hour spike on the top two

Point each at one real flow in your app. A free tool like Assrt or Playwright can be trialed in an afternoon; the spike beats any cell in the grid.
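The first three steps fit in a small TypeScript sketch. The criteria mirror the four cost-predicting rows; "Tool A", "Tool B", and "Tool C" and their 1-to-5 scores are placeholders you would replace with your own spike results, not ratings of any real product. `rowChangesDecision` applies the delete-the-row test from step one: zero out a criterion and see whether the winner changes.

```typescript
// The four cost-predicting rows, used as matrix criteria.
type Criterion = "authoring" | "runLocation" | "whoFixes" | "exitCost";
type Weights = Record<Criterion, number>;

interface Candidate {
  name: string;
  meetsHardConstraints: boolean;     // e.g. under the budget ceiling and runs in your CI
  scores: Record<Criterion, number>; // 1-5 from your own spike; higher is better
}

// Step 2: drop anything that fails a hard constraint, keep at most three.
function shortlist(candidates: Candidate[], limit = 3): Candidate[] {
  return candidates.filter((c) => c.meetsHardConstraints).slice(0, limit);
}

// Step 3: weighted sum across all criteria.
function weightedTotal(c: Candidate, weights: Weights): number {
  return (Object.keys(weights) as Criterion[]).reduce(
    (sum, criterion) => sum + c.scores[criterion] * weights[criterion],
    0,
  );
}

// Highest weighted total wins; ties go to the earlier candidate.
function winner(candidates: Candidate[], weights: Weights): string {
  if (candidates.length === 0) throw new Error("no candidates left to score");
  let best = candidates[0];
  for (const c of candidates.slice(1)) {
    if (weightedTotal(c, weights) > weightedTotal(best, weights)) best = c;
  }
  return best.name;
}

// Step 1's test: does zeroing out one row change the winner?
function rowChangesDecision(candidates: Candidate[], weights: Weights, row: Criterion): boolean {
  return winner(candidates, weights) !== winner(candidates, { ...weights, [row]: 0 });
}

// Placeholder data for hypothetical tools.
const candidates: Candidate[] = [
  { name: "Tool A", meetsHardConstraints: true, scores: { authoring: 4, runLocation: 5, whoFixes: 2, exitCost: 5 } },
  { name: "Tool B", meetsHardConstraints: true, scores: { authoring: 3, runLocation: 3, whoFixes: 5, exitCost: 3 } },
  { name: "Tool C", meetsHardConstraints: false, scores: { authoring: 5, runLocation: 1, whoFixes: 5, exitCost: 1 } },
];

const equal: Weights = { authoring: 1, runLocation: 1, whoFixes: 1, exitCost: 1 };
const maintenanceDoubled: Weights = { ...equal, whoFixes: 2 };

const picks = shortlist(candidates); // Tool C fails a hard constraint
console.log(winner(picks, equal));              // Tool A (16 vs 14)
console.log(winner(picks, maintenanceDoubled)); // Tool B (19 vs 18)
```

With these placeholder numbers, doubling the maintenance row flips the decision, and zeroing the authoring row does not, so under that weighting authoring is not earning its place.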

Where Assrt loses, honestly

If your real constraint is that nobody on the team will ever write or maintain tests, a managed service like QA Wolf sells you labor, not software. That is a legitimate trade if the budget exists; an open-source tool cannot staff itself.

If you are a large organization whose gating requirement is compliance paperwork, SSO, role-based access, and a vendor security review, a commercial platform with that machinery will clear procurement faster than a self-hosted open-source tool will.

And if you have a mature, hand-tuned Playwright or Cypress suite that your team already maintains comfortably, there is no urgency to change authoring formats. The matrix is for teams deciding, not teams already settled. For the deeper artifact-by-artifact view, see our Playwright tools comparison and open-source testing tools comparison.


Questions teams ask before they commit

Frequently asked questions

What is a test automation tools comparison matrix?

It is a grid that puts candidate tools in rows and decision criteria in columns so you can compare them at a glance. The useful version scores criteria that predict long-term cost (how you author tests, where they run, who maintains them when the UI changes, and what it costs to leave) rather than counting surface features like supported languages or assertion styles.

Which test automation tool should I pick from the matrix?

No single tool wins every row, so pick by constraint. If your bottleneck is budget and ownership, the free open-source options (Playwright, Selenium, Cypress, Assrt) keep tests in your repo and run in your CI. If your bottleneck is having nobody to write or maintain tests, a managed service like QA Wolf trades money for that labor. If you want AI to draft scenarios but still want to keep and edit the result, Assrt authors plain-English #Case blocks you own.

Why score authoring format instead of supported languages?

Supported languages tell you whether you can start. Authoring format tells you who can maintain the suite a year later. A tool that writes code needs an engineer to edit it; a recorder needs you to re-record on every change; proprietary YAML needs the vendor's CLI to run at all. Assrt's authoring format is a natural-language #Case Markdown block, which means a non-engineer can read and edit a scenario and an AI agent executes it.

What does Assrt generate, and where does it live?

Assrt scenarios are Markdown saved to /tmp/assrt/scenario.md, structured as #Case blocks (for example, '#Case 1: Log in with valid credentials' followed by step-by-step instructions). The file is editable in place and auto-syncs. The assrt_test tool runs those cases in a real Playwright-driven browser and writes results to /tmp/assrt/results/latest.json. Because the scenarios are plain text you can copy into your project and version like any other file, leaving Assrt costs you nothing in lost artifacts.

How much does QA Wolf cost compared to open-source tools?

QA Wolf does not publish official pricing. Third-party reports put it at roughly $8,000 per month for around 200 tests, scaling with test volume, with median annual contracts well into five or six figures. Playwright, Selenium, Cypress's core runner, and Assrt are free; their cost is the engineering time to write and maintain tests, which is exactly the labor a managed service takes off your hands.
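To see what that trade means in hours, here is a back-of-the-envelope TypeScript sketch using the third-party figure quoted above. The $100 loaded hourly rate is a hypothetical input, not a number from any vendor; swap in your own.

```typescript
// Third-party estimate quoted above; QA Wolf does not publish pricing.
const managedMonthlyUsd = 8_000;
const testsCovered = 200;

// Hypothetical: your fully loaded cost per engineering hour. Replace with your own.
const loadedHourlyRateUsd = 100;

const perTestPerMonthUsd = managedMonthlyUsd / testsCovered; // 40
const perYearUsd = managedMonthlyUsd * 12; // 96,000
// In-house hours per month (writing plus fixing tests) that would cost the same.
const breakEvenHoursPerMonth = managedMonthlyUsd / loadedHourlyRateUsd; // 80

console.log({ perTestPerMonthUsd, perYearUsd, breakEvenHoursPerMonth });
```

At that rate, the managed service costs about the same as 80 engineer-hours a month spent on tests in-house; below that, the open-source route is cheaper on paper. It ignores ramp-up and review time, so treat it as a first cut.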

Is proprietary YAML a problem in a comparison matrix?

It is the exit-cost row most matrices omit. Momentic stores tests as momentic.config.yaml in your git repo, which looks portable, but the YAML only runs through Momentic's CLI and platform. If you leave, the files do not execute anywhere else. Code-based tools (Playwright, Selenium, Cypress) and plain-English scenarios (Assrt) do not have that lock-in.

Can one matrix cover both AI tools and classic frameworks?

Yes, if the columns are framed around outcomes rather than mechanisms. 'Where does it run' and 'what does leaving cost' apply equally to Selenium and to an AI agent. The mistake is adding AI-only rows (model used, prompt format) that classic tools can't fill, which fragments the grid into two half-empty tables.

Keep comparing

Playwright tools comparison. Sort Playwright-adjacent tools by the test artifact each one leaves behind: code, YAML, cloud DSL, or Markdown.

Open-source testing tools comparison. The portability column most matrices skip, and what it means when you want to leave.

Test automation ROI calculator guide. Put a number on maintenance cost so the matrix rows stop being abstract.
