Software regression testing tools, compared by what your test file actually is.
The list of names is closed. Selenium, Playwright, Cypress, WebdriverIO, Katalon, Tricentis Testim, testRigor, Mabl, Ghost Inspector, Momentic, Applitools, Percy, Assrt. Almost every guide picks the same shortlist and re-ranks it by features. This page picks a different axis: the on-disk shape of the file you end up storing after you author a test. Four shapes come out of that cut, and the shape you pick is the thing you have to live with for the next five years.
Direct answer, verified 2026-05-10
The regression testing tools that recur across every 2026 shortlist are Selenium, Playwright, Cypress, WebdriverIO, Katalon, Tricentis Testim, testRigor, Mabl, Ghost Inspector, Momentic, Applitools, Percy, and Assrt. They fall into four buckets defined by the shape of the artifact you store: spec code, recorder JSON or DSL, cloud row, or natural-language Markdown. Pick the shape first, then pick the name inside that shape.
Why feature checklists stopped being useful for this category
Every regression tool above ships parallel browsers, retries, video and screenshot output, CI integration, and some flavour of self-healing selectors. The checklist that used to separate them has collapsed into a row of green ticks. What it does not capture, and what the lists you find when you search this topic never address, is the format of the file you actually authored. The file is the artifact. It outlives the tool. It is the part you have to migrate, review, hand off, and read out loud in a standup. Compare on that and the shortlist regroups itself, cleanly.
A team that picks the right framework but the wrong artifact shape spends year two of the suite fighting the format instead of fighting bugs. A team that picks a less popular framework with the right artifact shape spends year two adding cases. The decision that compounds is the shape.
The same regression test, in four artifact shapes
One scenario: a new user clicks the signup button on the landing page, enters an email and password, submits, and lands on a dashboard with a welcome message. Below is what that scenario looks like in each of the four shapes the category produces, side by side. The shapes are not equivalent and they are not interchangeable, but the standard guides treat them as if they were.
Spec code: Selenium versus Playwright
Two flavours of the same shape. A real programming-language file that lives in your repo. Selenium is verbose with explicit waits, Playwright compresses the same intent into a denser API with built-in auto-waiting. Both are spec code; both portable; both leak selectors into the file.
Same scenario, two spec-code flavours
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def test_signup_flow():
    driver = webdriver.Chrome()
    try:
        driver.get("http://localhost:3000")
        wait = WebDriverWait(driver, 10)
        signup_btn = wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "button.signup-cta"))
        )
        signup_btn.click()
        email_field = wait.until(
            EC.presence_of_element_located((By.NAME, "email"))
        )
        email_field.send_keys("test@example.com")
        password_field = driver.find_element(By.NAME, "password")
        password_field.send_keys("hunter2-test")
        submit = driver.find_element(By.CSS_SELECTOR, "button[type=submit]")
        submit.click()
        wait.until(EC.url_contains("/dashboard"))
        assert "Welcome" in driver.page_source
    finally:
        driver.quit()
```
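For contrast, the same scenario as a Playwright .spec.ts file. A minimal sketch, assuming the same selectors and welcome copy as the Selenium version above; Playwright's auto-waiting replaces the explicit WebDriverWait blocks.

```typescript
import { test, expect } from '@playwright/test';

test('signup flow lands on dashboard', async ({ page }) => {
  await page.goto('http://localhost:3000');
  // Auto-waiting: each action waits for the element to be actionable.
  await page.locator('button.signup-cta').click();
  await page.locator('[name="email"]').fill('test@example.com');
  await page.locator('[name="password"]').fill('hunter2-test');
  await page.locator('button[type=submit]').click();
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByText('Welcome')).toBeVisible();
});
```

Same shape, same leaked selectors, roughly half the surface area.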
Recorder JSON versus Markdown case
The two shapes that are not programming-language code. First, a recorder JSON export, of the kind Katalon, Testim, or another recorder-based platform writes when you ask for the file behind a UI-authored test. After it, the Assrt Markdown #Case format. Both describe the same flow. One is a tree the vendor designed; the other is the same paragraph you would write in a Linear ticket.
The two non-code shapes, head to head
```json
{
  "id": "tc-9f3a-signup",
  "name": "signup flow lands on dashboard",
  "steps": [
    { "type": "navigate", "url": "http://localhost:3000" },
    { "type": "click", "locator": { "strategy": "smart", "candidates": [
      { "kind": "css", "value": "button.signup-cta", "score": 0.92 },
      { "kind": "text", "value": "Sign up", "score": 0.88 }
    ]}},
    { "type": "fill", "locator": { "name": "email" }, "value": "test@example.com" },
    { "type": "fill", "locator": { "name": "password" }, "value": "hunter2-test" },
    { "type": "click", "locator": { "text": "Submit" }},
    { "type": "assert", "kind": "url-contains", "value": "/dashboard" },
    { "type": "assert", "kind": "text-visible", "value": "Welcome" }
  ],
  "metadata": { "createdBy": "recorder@v3.4", "selfHeal": true }
}
```
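And the Markdown side of the head-to-head. An illustrative sketch of the #Case shape as this guide describes it, a short action-oriented name followed by three to five plain-English steps; the wording is ours, not output from the Assrt planner.

```markdown
#Case 1: signup flow lands on dashboard
Open http://localhost:3000 and click the "Sign up" button
Type test@example.com into the email field
Type hunter2-test into the password field and submit the form
Verify the URL contains /dashboard and a "Welcome" message is visible
```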
Four buckets, with the tool names that live inside each
The same shortlist that the standard lists treat as a flat scoreboard regroups into four shapes, each with sharply different switching cost, review cost, and self-healing semantics.
Spec code
Selenium scripts, Playwright .spec.ts files, Cypress cy.ts, WebdriverIO suites. The test is a real program in your repo. Maximum flexibility, maximum authoring cost, full zero-vendor portability. Self-healing is whatever your locator strategy gives you.
Recorder JSON or DSL
Katalon test cases, Tricentis Testim trees, recorder exports from various vendors. The test is a structured file in a format the vendor defines. Author through a UI, store in git. Read it without the vendor at your own risk.
Cloud row
Mabl flows, Ghost Inspector tests, Momentic cases, Applitools sessions. The canonical test lives in the vendor's database. You author through their UI, you run through their cloud, and the export, if any, is a courtesy.
Markdown case
Assrt #Case blocks. The test is a paragraph of English in a .md file in your repo. An agent re-reads the page each run and resolves the selectors against a live accessibility tree. No stored locators, nothing to heal.
The anchor fact: where the Markdown shape actually comes from
The Markdown-case bucket is the youngest and least documented of the four, so it earns the closest look. The format is defined in one place. In the Assrt MCP server source at src/mcp/server.ts line 219, a constant called PLAN_SYSTEM_PROMPT instructs Claude Haiku to produce test cases in this exact grammar: #Case N: short name, followed by three to five plain-English steps describing what to click, what to type, and what to verify. The same prompt hard-caps a file at five to eight cases. The output is written to /tmp/assrt/scenario.md by the helper at src/core/scenario-files.ts, which also watches the file for edits and syncs them back to the scenario store.
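A rough sketch of what a watch-and-sync helper of that kind can look like — not the actual scenario-files.ts code; syncScenario here is a hypothetical stand-in for the store update:

```typescript
import { watch, promises as fs } from 'node:fs';

const SCENARIO_PATH = '/tmp/assrt/scenario.md';

// Hypothetical stand-in for whatever pushes edits back to the scenario store.
async function syncScenario(markdown: string): Promise<void> {
  console.log(`syncing ${markdown.length} bytes back to the scenario store`);
}

// Watch the on-disk Markdown for manual edits and sync each change back.
watch(SCENARIO_PATH, async (event) => {
  if (event !== 'change') return;
  const markdown = await fs.readFile(SCENARIO_PATH, 'utf8');
  await syncScenario(markdown);
});
```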
At run time, an agent reads the Markdown, opens a real Chromium through @playwright/mcp over stdio, and for each line of each case calls the matching Playwright tool: navigate, click, type_text, snapshot, press_key, wait, scroll, evaluate, assert. The execution layer is Playwright; the authoring layer is English. That separation is the whole reason the artifact looks the way it does.
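From the client side, the wiring is small. A minimal sketch using the MCP TypeScript SDK to spawn @playwright/mcp over stdio and issue one tool call; the tool name browser_navigate is an assumption for illustration, so list the server's tools to confirm exact names before relying on it:

```typescript
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

async function main() {
  // Spawn @playwright/mcp as a child process speaking MCP over stdio.
  const transport = new StdioClientTransport({
    command: 'npx',
    args: ['@playwright/mcp@latest'],
  });
  const client = new Client(
    { name: 'assrt-sketch', version: '0.0.1' },
    { capabilities: {} },
  );
  await client.connect(transport);

  // One Markdown line ("Open http://localhost:3000") becomes one tool call.
  // Tool name assumed here; inspect the server's tool list for the real ones.
  await client.callTool({
    name: 'browser_navigate',
    arguments: { url: 'http://localhost:3000' },
  });

  await client.close();
}

main();
```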
“The PLAN_SYSTEM_PROMPT caps each Markdown case at 3 to 5 actions and the whole file at 5 to 8 cases. The agent then executes those Markdown lines through real Playwright MCP tool calls.”
assrt-mcp/src/mcp/server.ts:219-236
What changes when you stop storing selectors
The selector is the most volatile line in a regression test. In the spec-code shape it is the line that gets edited most often. In the recorder shape it is the line the vendor silently swaps under you. In the Markdown shape there is no line at all, because the description of the element is the description, not a pointer.
A signup button when the CSS class is renamed
A spec file changes when a class name churns: await page.locator(".signup-cta__btn--primary").click(). The selector is brittle. The first CSS rename from marketing breaks the test, the CI run goes red, and someone on QA spends fifteen minutes fixing one line. Multiply by every flow that touched the renamed class. (A tolerant alternative is sketched just after the list below.)
- CSS class .signup-cta__btn--primary lives inside the test file
- First marketing-driven rename breaks the suite
- QA spends 15 minutes per flow chasing the rename
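A minimal sketch of that tolerant alternative, using Playwright's role-based locator instead of the CSS class; the accessible button name is an assumption:

```typescript
import { test } from '@playwright/test';

test('signup button survives a class rename', async ({ page }) => {
  await page.goto('http://localhost:3000');

  // Brittle: breaks the moment marketing renames the class.
  // await page.locator('.signup-cta__btn--primary').click();

  // Tolerant: bound to the accessible role and name, not the CSS class.
  await page.getByRole('button', { name: 'Sign up' }).click();
});
```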
The trade-off is honest. Stripping the selector means resolution happens at run time, and run time costs LLM tokens. For a suite that runs ten times a day on five flows, the cost is in cents. For a suite that runs a thousand parallel branches a day across hundreds of flows, the cost starts to matter, and a hybrid approach, Markdown for the casual cases a product manager files and spec code for the deep regression suite a QA engineer owns, becomes the honest answer.
What you keep when you leave
Switching cost is the part of the tool decision that is hardest to see when you are picking, and hardest to ignore three years in. Each artifact shape has a different end-state when the contract ends or the framework falls out of favour.
The honest switching story for each shape
- Spec code is portable. You can run it on any machine with the right runtime, and the migration from Selenium to Playwright is mechanical.
- Recorder JSON travels poorly. The export schema is the vendor's, and replaying it elsewhere is a rewrite, not a port.
- Cloud-row tests need an export endpoint that may not exist. Some vendors offer one, some do not, and some offer one that round-trips lossily.
- Markdown cases survive any move. The file is English; pasting it into another agent or rewriting it as Playwright by hand is a copy-edit.
- The hidden cost in every shape is the test that nobody reads anymore. The agent on the Markdown side and a code reviewer on the spec-code side both make this less likely.
Assrt against the closest competitor shape
The closest competitor to Markdown #Case is the cloud-row shape: both let a non-developer author a test, both rely on a runtime layer to resolve intent. The difference is where the canonical file lives.
Same authoring promise, different ownership of the artifact
| Feature | Cloud-row platforms | Assrt |
|---|---|---|
| Where the test actually lives | Vendor cloud row, optional flat export | /tmp/assrt/scenario.md, commit it to your repo |
| What a single test looks like | Step tree in a UI, or a JSON export of one | 5 to 8 #Case blocks, each 3 to 5 plain-English lines |
| How selectors survive UI changes | Vendor scores candidate selectors and swaps silently | No stored selector. Agent resolves against the live accessibility tree on every run. |
| Engine doing the actual browser work | Proprietary runtime, sometimes Playwright under the hood | @playwright/mcp over stdio, every action is a real Playwright call |
| Cost model | Per-seat or per-parallel-runner subscription | MIT-licensed, you pay only the LLM tokens the agent burns at runtime |
| Leaving the tool | Export to a vendor schema, often partial, sometimes nothing | Your tests are already plain Markdown. You leave by deleting the npm package. |
| Who can author a new test | A QA engineer who knows the recorder, or a developer in spec-code | Anyone who can describe the flow in English, including a product manager |
Cloud-row platforms differ from one another on every row. The column reflects the most common posture across Mabl, Ghost Inspector, Momentic, and recorder-driven entries; specific vendors may improve on individual rows.
A useful smell test before you pick
Open the vendor docs to the page that shows the export format of one test. If that page does not exist, the tool is cloud-row and your future migration is a rewrite. If it exists and the file is a real programming-language artifact, you are in spec-code and you have full portability. If it is a JSON tree authored by a recorder, your future migration is a structured rewrite. If the answer to the question is "your test is already plain Markdown in your repo", you are looking at the Markdown-case shape, and the question of leaving is moot, because there is nothing to take with you that you do not already have.
“We kept the deep Playwright suite and added an Assrt #Case file the PM owns. Two months in, the PM has shipped more regression coverage than the previous quarter and nothing has churned in the deep suite.”
How to actually decide
The mental model that survives contact with year two:
- Pick the artifact shape first. Spec code if a QA engineer or developer owns the suite. Recorder if you need a UI for less technical authors and the vendor relationship is long-term. Cloud-row if the vendor is doing genuine multi-tenant work you cannot replicate (rare). Markdown if you want the artifact to read like English and the runtime to be Playwright underneath.
- Then pick the name inside that shape. Inside spec code: Playwright is the modern default; Selenium is still correct for legacy stacks; Cypress and WebdriverIO are stylistic. Inside Markdown: Assrt is the open-source path that keeps you on real Playwright underneath.
- Run two shapes if you can. Most working setups in 2026 are a deep spec-code suite plus a casual Markdown layer the rest of the team owns. The cost of the second shape is small; the coverage gain is large.
Compare your current setup against the four shapes
Thirty minutes. Bring your existing suite, walk out with a clear picture of which shape you are already on and whether adding a second one would compound.
Common questions about regression testing tools
What are the most-cited software regression testing tools in 2026?
Across the lists that appear at the top of public search for this topic, the same names recur: Selenium and Playwright for open-source spec-code suites; Cypress and WebdriverIO as developer-oriented alternatives in the same lane; Katalon and Tricentis Testim as commercial spec-or-recorder hybrids; testRigor, Mabl, Ghost Inspector, and Momentic as recorder or natural-language platforms; Applitools and Percy as visual regression specialists; and a newer agent-driven category that includes Assrt. The shortlist is closed enough that the interesting decision is not which name you pick but which shape of test artifact you commit to keeping for the next five years.
Why compare regression testing tools by artifact shape instead of feature checklist?
Feature checklists are fungible at the top of the market. Every serious tool ships parallel browsers, retries, headed and headless modes, CI integration, video and screenshot output, and some flavour of self-healing. What is not fungible is the format your authored tests live in once you write them. That format decides three things: whether a non-QA can read a test, whether you can grep and refactor a thousand cases without the vendor, and how long the migration takes the day you decide to leave. The feature grid does not surface any of that. The artifact shape does.
What are the four artifact shapes regression testing tools actually use?
Spec code (Selenium scripts, Playwright .spec.ts files, Cypress cy files, WebdriverIO suites) where the test is a real programming-language artifact in your repo. Recorder JSON or proprietary DSL (Katalon Studio test cases, Tricentis Testim trees, parts of testRigor) where the test is a structured but vendor-defined file. Cloud row (Mabl flows, Ghost Inspector tests, Momentic cases, Applitools sessions) where the canonical test lives in the vendor's database and you author it through their UI. Natural-language Markdown (Assrt) where the test is plain English in #Case blocks that an agent interprets at runtime. Each shape has a clean answer to the question 'what do I have if the vendor disappears tomorrow'.
Is Assrt the same as Selenium or Playwright?
It is in the same outcome category, but a different tier of the stack. Under the hood Assrt drives @playwright/mcp over stdio, so the browser actions you end up running are real Playwright. What differs is the authoring layer and the storage layer. Instead of writing a TypeScript spec with page.click and locators, you write a paragraph that begins with #Case 1: and describes the user flow in English. The plan is saved to /tmp/assrt/scenario.md and then executed by an agent that resolves selectors against the live accessibility tree each run. The execution reliability is Playwright's; the authoring effort is closer to writing a bug report.
How does self-healing actually work across these tools and is it real?
It depends on the artifact shape. In a spec-code suite, self-healing is whatever your locator strategy buys you: getByRole and getByText are tolerant; CSS selectors usually are not. In recorder or DSL suites, self-healing is the vendor scoring candidate locators and silently swapping them when one fails; the quality varies and you mostly cannot inspect it. In cloud-row platforms, self-healing is a feature you toggle in a settings page, with a confidence threshold and a per-step audit log. In the Markdown-and-agent shape Assrt uses, there is no stored selector to heal: the agent re-reads the accessibility tree on every run and picks an element matching the natural-language description, so a button labelled Submit will keep working even if the class names underneath churn. The trade-off is that runs cost LLM tokens because the resolution happens at runtime.
If I already have a thousand Playwright specs, should I switch?
No. The right move is to keep what works and add the shape that solves a different problem. Most teams in this position run two suites: a maintained Playwright spec suite for the deep regression flows a QA engineer owns, and a thin Markdown #Case file an agent can spin up against a feature branch in two minutes when a product manager files a bug report. The cost of running both is small because the agent layer reuses the same Playwright primitives; the benefit is that the heavy suite stops being the bottleneck for casual coverage.
What does a single regression test actually look like in plain Markdown?
A single case is roughly seven lines. The first line is #Case 1: followed by a short action-oriented name. The next three to five lines describe what to click, what to type, and what to verify, in the order the user does it. There are no selectors, no imports, no setup blocks. The PLAN_SYSTEM_PROMPT in the Assrt MCP server constrains a generated file to 5 to 8 cases of 3 to 5 actions each, which is the same coverage a typical short Playwright spec produces, in roughly a third of the surface area.
What about visual regression specifically?
Visual regression is a separate sub-category, served well by Applitools, Percy, and the snapshot features in Playwright and Cypress. The artifact there is a baseline image plus a diff threshold, and the choice between vendors mostly comes down to how their AI handles anti-aliasing and dynamic content. For a team starting from scratch, pairing one functional regression tool from the four shapes above with a dedicated visual regression layer is a more honest setup than buying one tool that claims to do both at the same depth.
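For a sense of what the baseline-plus-threshold artifact looks like in spec code, a minimal Playwright snapshot sketch; the baseline file name and the 1% ratio are placeholder choices:

```typescript
import { test, expect } from '@playwright/test';

test('landing page matches baseline', async ({ page }) => {
  await page.goto('http://localhost:3000');
  // First run writes landing.png as the baseline; later runs diff against it,
  // failing if more than 1% of pixels differ.
  await expect(page).toHaveScreenshot('landing.png', { maxDiffPixelRatio: 0.01 });
});
```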
What does Assrt cost compared to the commercial entries?
The Assrt MCP server, the agent, and the CLI are MIT-licensed and free; the cost on top is the LLM API calls the agent makes during planning and execution, which on Claude Haiku land at cents per run for a typical 5 to 8 case file. The closest commercial competitors in 2026 publish list prices that range from a few hundred dollars a month for a small team plan to multi-thousand-dollar yearly contracts for the larger recorder-and-cloud platforms. The honest comparison is that you trade a flat seat fee for a per-run usage cost; for low-volume suites the usage cost is smaller, for high-volume CI matrices the cost crosses over at some scale.
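That crossover is easy to estimate. A back-of-envelope sketch, with both figures assumed for illustration rather than quoted from any vendor:

```typescript
// Assumed figures for illustration only.
const flatFeePerMonth = 300;   // hypothetical commercial plan, USD
const tokenCostPerRun = 0.03;  // hypothetical LLM spend per 5-8 case run, USD

// Usage pricing stays cheaper below this many runs per month.
const crossoverRuns = flatFeePerMonth / tokenCostPerRun;
console.log(`crossover at ${crossoverRuns} runs/month`); // 10000
```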
Where does the literal #Case format live in the Assrt source code?
It is defined in the assrt-mcp source at src/mcp/server.ts, line 219, inside the constant PLAN_SYSTEM_PROMPT. The exact prompt instructs the model to produce blocks of the form '#Case N: short name' followed by step-by-step instructions, with hard caps of 3 to 5 actions per case and 5 to 8 cases per file. The on-disk storage path is /tmp/assrt/scenario.md, written by the helper in src/core/scenario-files.ts, which also watches the file for edits and auto-syncs changes back to the cloud scenario store.
Adjacent angles on the same set of tools
Keep reading on the same shortlist
Best e2e testing tools, ranked by time to first green check
An adjacent shortlist, ranked on a different axis. Useful if you are choosing for a brand new suite and the year-one setup cost is the bottleneck.
AI generated regression tests
Where the model fits in: planning, executing, and repairing. How that maps to the four artifact shapes in this guide.
Visual regression testing guide
The companion sub-category to functional regression. Snapshot strategy, threshold tuning, and where visual regression overlaps with the four artifact shapes.