Open source testing tools comparison: the one column every matrix skips.
Every article on this keyword lines up the same eight tools in the same eight columns: Selenium, Cypress, Playwright, Puppeteer, WebdriverIO, plus a few flavors of BDD. Language support, browser matrix, CI providers, price. The comparisons are fine. They are also interchangeable, because none of them includes the one row that actually separates a 2025-era testing tool from a 2014-era one. That row is at the bottom of this page.
The anchor fact
The full Markdown-to-test-suite parser is one regex, 39 characters long, at agent.ts:621.
No DSL. No YAML schema. No describe or it blocks. Any Markdown heading that starts with #Case, #Scenario, or #Test opens a new scenario. Everything after it is English prose, passed verbatim to a Claude Haiku agent that drives real Playwright. That is the entire compile step for an open source testing tool you can install with one npx command.
The column you have never seen on this matrix
Open every open source testing tools comparison on the first page of Google. The columns you will see are some subset of: language support, browser support, headless mode, test runner style, CI integration, parallelization, licence, community size, price. Occasionally there is a column for auto-wait behavior or network interception.
There is never a column for the most user-visible property of a testing tool: what does the test source file, the literal thing you check into git, look like on disk? The answer to that question determines who on the team can read the test, who can edit it, and how a pull request that changes test coverage looks to a reviewer. It is the column that would matter most to a product manager or a designer, and it is missing from every chart because the answer was boring for twenty years. Every open source testing tool produced a .js, .ts, .py, .java, or .feature file. The column would have said the same thing in every row.
That changed in 2025. Add the column and the matrix splits into three.
| Tool | Year | Authored in | Test file on disk | Readable by non-coder |
|---|---|---|---|---|
| Selenium | 2004 | Java, Python, JS, C#, Ruby | .java / .py / .js with WebDriver imports | No |
| WebdriverIO | 2011 | JS, TS | .js / .ts mocha or jasmine suite | No |
| Nightwatch | 2014 | JS, TS | .js page object + assert tree | No |
| TestCafe | 2016 | JS, TS | .testcafe.ts fixture and test blocks | No |
| Cypress | 2017 | JS, TS | .cy.ts with describe / it / cy.get | No |
| Puppeteer | 2017 | JS, TS | .js script, your own runner | No |
| Playwright | 2020 | JS, TS, Python, .NET, Java | .spec.ts with test / expect | No |
| Cypress + Cucumber | 2017 plus plugin | Gherkin plus JS/TS step defs | .feature file backed by .ts step-defs | Partial |
| Playwright-BDD | 2022 plus plugin | Gherkin plus JS/TS step defs | .feature file backed by .ts step-defs | Partial |
| Assrt | 2025 | English prose in Markdown | .md file with #Case headings | Yes |
Eight traditional runners cluster at one end, where the test file is a program. Gherkin plus step-defs sits in the middle, where the .feature file reads as English but is backed by a code registry. Assrt sits at the far end alone, where the test file is prose and there is nothing underneath it.
Same test, two files
Consider a three-case regression suite: homepage loads, signup with a disposable email and an OTP code, log out. Below is the Assrt version (Markdown) followed by an honest Playwright version of the first two cases. Both work. The Playwright version is faster per run and more precise. The Assrt version is shorter, has no dependencies beyond one npm install, and can be reviewed by someone who has never touched a test framework.
Assrt: tests/regression.md
```markdown
#Case 1: Homepage loads
Navigate to /.
Verify the main heading is visible within 3 seconds.

#Case 2: Sign up with a disposable email
Click "Get started".
Use a disposable email.
Enter the OTP that lands in the inbox.
Verify the URL contains /app.

#Case 3: Log out leaves the landing page reachable
Click the user menu in the top right.
Click "Sign out".
Verify the page URL equals / and the heading "Get started" is visible.
```
14 lines. No imports. No selectors. The agent reads the accessibility tree at runtime and picks the right element.
Playwright: tests/signup.spec.ts
```typescript
// tests/signup.spec.ts
import { test, expect } from "@playwright/test";
import { MailSlurp } from "mailslurp-client";

test.describe("Sign up", () => {
  test("homepage loads", async ({ page }) => {
    await page.goto("/");
    await expect(page.getByRole("heading").first()).toBeVisible({ timeout: 3000 });
  });

  test("sign up with disposable email", async ({ page }) => {
    const mailslurp = new MailSlurp({ apiKey: process.env.MAILSLURP });
    const inbox = await mailslurp.createInbox();
    await page.goto("/");
    await page.getByRole("button", { name: /get started/i }).click();
    await page.getByLabel(/email/i).fill(inbox.emailAddress);
    await page.getByRole("button", { name: /continue/i }).click();
    const email = await mailslurp.waitForLatestEmail(inbox.id, 60_000);
    const code = email.body?.match(/\d{6}/)?.[0];
    expect(code).toBeTruthy();
    await page.getByLabel(/code/i).fill(code!);
    await expect(page).toHaveURL(/\/app/);
  });
});
```

30+ lines. A MailSlurp client. Role locators. Regexes and timeouts. Faster per run, but not something a non-engineer edits.
The parser, in full
If the matrix above strains credibility (“a tool cannot really use a .md file as its test format without some hidden schema somewhere”), the grounding is below. This is the entire function in the open source runner that turns a Markdown plan into a scenario list. It lives at assrt-mcp/src/core/agent.ts, lines 620 to 631. Twelve lines of code, one of which is a regex.
The regex is /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi. Thirty-nine characters, case insensitive, global. The leading #? means the hash is optional (so “Case 1:” also works, not only “#Case 1:”). The digit block captures optional case numbers. The trailing separator is a colon or a period. parseScenarios splits the input on that regex and pairs each chunk of text with the heading that preceded it. The steps field is the original English, unmodified. That is the entire DSL.
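To make that shape concrete, here is a minimal re-implementation of the split-and-pair behavior described above. This is a sketch for illustration, not the actual agent.ts source; the demo plan and helper structure are assumptions.

```typescript
// Sketch of parseScenarios: split on the scenario regex, pair each
// chunk's first line (the name) with the prose underneath (the steps).
const scenarioRegex = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

interface Scenario {
  name: string;  // the text on the heading line, after the marker
  steps: string; // the English prose underneath, unmodified
}

function parseScenarios(plan: string): Scenario[] {
  return plan
    .split(scenarioRegex)              // split() drops the matched delimiters
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0)
    .map((chunk) => {
      const [first, ...rest] = chunk.split("\n");
      return { name: first.trim(), steps: rest.join("\n").trim() };
    });
}

const demo = `#Case 1: Homepage loads
Navigate to /. Verify the main heading is visible.

#Case 2: Log out
Click "Sign out". Verify the page URL equals /.`;

console.log(parseScenarios(demo).map((s) => s.name));
// → [ 'Homepage loads', 'Log out' ]
```

Note that the case number is consumed by the delimiter, so "Case 1:" and "#Case 17." both yield a clean scenario name.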
How a prose file becomes a browser session
The pipeline is short enough to draw in one diagram. A Markdown file goes in. The parser (one regex) splits it into scenarios. For each scenario an LLM agent reads the page’s accessibility tree through Microsoft’s @playwright/mcp server and emits tool calls. The browser driver is real Playwright, unchanged. A WebM video and a JSON report come out.
Assrt runtime pipeline
Five steps from .md file to pass/fail
1. You edit a .md file
The test file lives at tests/checkout.md (or anywhere you like). It has #Case headings and free-form English underneath each one. Commit it to git. Diff it in a PR.
2. The CLI reads the file verbatim
assrt run --plan-file tests/checkout.md --url http://localhost:3000 reads the file and passes the raw text to parseScenarios in agent.ts line 620.
3. The regex splits on #Case, #Scenario, or #Test
Each split yields { name, steps }. The steps field is the original English sentences, untouched. There is no compile step, no AST, no step-definition registry.
4. A Haiku agent drives Playwright MCP
For each scenario the agent reads the live accessibility tree, picks the next action (click, type, navigate, wait, http_request), and repeats until it calls complete_scenario. The browser driver is Microsoft's @playwright/mcp.
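The loop is easier to see as code. Below is a schematic toy version of that per-scenario loop: nextAction stands in for the LLM call (a real agent sends the accessibility tree to Claude here), and its scripted responses are made up for illustration.

```typescript
// Toy model of the per-scenario agent loop. The tool names come from
// the pipeline description above; everything else is illustrative.
type ToolCall = { tool: string; args?: Record<string, unknown> };

// Stand-in for the model: returns the next tool call given the
// transcript so far. A real agent would call the LLM with the live
// accessibility tree at this point.
function nextAction(transcript: ToolCall[]): ToolCall {
  const scripted: ToolCall[] = [
    { tool: "navigate", args: { url: "/" } },
    { tool: "click", args: { role: "button", name: "Get started" } },
    { tool: "complete_scenario", args: { status: "pass" } },
  ];
  return scripted[transcript.length];
}

function runScenario(): ToolCall[] {
  const transcript: ToolCall[] = [];
  while (true) {
    const call = nextAction(transcript);
    transcript.push(call);
    if (call.tool === "complete_scenario") break; // the agent decides when it is done
  }
  return transcript;
}

console.log(runScenario().map((c) => c.tool));
// → [ 'navigate', 'click', 'complete_scenario' ]
```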
5. A video + structured report land on disk
Every run writes a WebM video to /tmp/assrt/videos and a JSON report with per-scenario pass/fail, assertions, and tool-call transcript. In extension mode the video also opens in your real Chrome tab.
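As a sketch, a consumer of that report might look like the following. The field names here are assumptions for illustration, not the exact Assrt JSON schema.

```typescript
// Illustrative shape of the per-run report: video path plus
// per-scenario pass/fail, assertions, and tool-call transcript.
interface ScenarioResult {
  name: string;
  status: "pass" | "fail";
  assertions: { description: string; passed: boolean }[];
  toolCalls: { tool: string; args: Record<string, unknown> }[];
}

interface RunReport {
  videoPath: string; // e.g. a WebM under /tmp/assrt/videos
  scenarios: ScenarioResult[];
}

// Typical CI consumer: list the scenarios that failed.
function failedScenarios(report: RunReport): string[] {
  return report.scenarios
    .filter((s) => s.status === "fail")
    .map((s) => s.name);
}

const report: RunReport = {
  videoPath: "/tmp/assrt/videos/run.webm",
  scenarios: [
    { name: "Homepage loads", status: "pass", assertions: [], toolCalls: [] },
    { name: "Log out", status: "fail", assertions: [], toolCalls: [] },
  ],
};
console.log(failedScenarios(report));
// → [ 'Log out' ]
```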
What the missing column changes
Once you add the readability column, the rest of the matrix does not become wrong, it becomes incomplete. Speed, CI integration, and browser matrix still matter. But for teams choosing tools in 2026, the column below is the one whose answer you remember after you close the tab.
The columns every comparison already has
Language support. Browser matrix (Chromium, Firefox, WebKit). Headless vs headed. CI provider integrations. Parallelization model. Licence. Community size on GitHub.
The column nobody includes
What does the test source file look like on disk, and can a non-coder read it end-to-end and understand what it tests? This is the column that 2025 AI-era tools change.
Why the column was omitted
Until recently, the answer was boring and uniform. For every OSS tool the test file was a .js, .ts, .py, .java, or .feature file. The column would show the same thing for every row.
What changes when you add it
The traditional tools cluster at one end (test file is a program). Gherkin BDD tools sit in the middle (test file is prose, but backed by code). Assrt sits alone at the far end (test file is prose, full stop).
What the column unlocks
Product managers can read the tests. Designers can review what a flow should do. Claude Code can author a test during a feature PR without you writing selectors.
Where the column still does not help
Inner dev loop where you rerun a 3-second test 100 times in an hour. Hard-spec flows that care about exact locators. Use scripted Playwright for those and reserve Assrt for regression.
What a run looks like on the terminal
One last piece of evidence that the pipeline above is not a marketing story. Below is an annotated terminal session that reads a .md plan, runs the single Markdown case through parseScenarios, drives Playwright, and writes a video.
Honest tradeoffs
Assrt is not a drop-in replacement for Playwright or Cypress, and anyone selling you on that is oversimplifying. The comparison above does not pretend otherwise.
- Per-run latency is higher. An LLM round trip per step adds up to 15-40s for a multi-step flow that Playwright runs in 3 seconds. Fine for regression. Slow for a tight inner loop.
- Agent variance exists. Two runs can pick slightly different paths through the page when the plan is ambiguous. Write plans that are specific enough to remove ambiguity and pin the model version.
- LLM cost is real. The default is claude-haiku-4-5-20251001. Cheap per run, non-zero. Scripted Playwright costs a few milliseconds of CPU.
- Sub-pixel visual regression is out of scope. If your job is to assert that a button moved by 2 pixels, use Playwright’s toHaveScreenshot. Assrt can assert “a button labelled X is visible.”
What Assrt does change is the shape of the test artifact and who on the team can author, read, and review it. That is the column no other comparison includes, and it is the only column that meaningfully separates open source testing tools in 2026.
Want to see a real regression suite ported to a .md file?
Book a 20-minute walkthrough. Bring the flakiest test in your repo, watch it run as English prose.
Book a call →

FAQ
When you say 'the parser is one regex', what specifically does that mean in code?
It means that parsing a test plan from Markdown into a list of scenarios happens on one line of source. In assrt-mcp/src/core/agent.ts at line 621, the full parser is const scenarioRegex = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;. The parseScenarios function uses that regex to split the input text and returns an array of { name, steps } pairs where steps is the original English prose, untouched. No AST, no schema, no visitor pattern. Anything an agent could figure out from a plain-English test description lives in the LLM, not in the parser. This is the opposite of Cypress or Playwright, where the test file is a program that runs in a dedicated runner process with describe/it/beforeEach/afterEach semantics.
Is Assrt actually comparable to Selenium, Cypress, and Playwright, or is it in a different category?
It is in a different category, and that is the honest answer. Selenium, Cypress, Playwright, Puppeteer, and WebdriverIO are test runners: you write code in a supported language, the runner executes that code against a browser, the return value is pass or fail. Assrt is a test agent: you write English in a Markdown file, an LLM agent interprets each sentence at runtime and emits Playwright MCP tool calls through the @playwright/mcp server Microsoft maintains. The underlying browser driver is the same Playwright you would use directly; what differs is who writes the click-and-type sequence. In a comparison table for 2026, leaving Assrt off the list because it is 'different' is the same mistake a 2014 Selenium comparison made by leaving off Cypress because Cypress was 'just a JS test framework.'
Which open source tools does this comparison actually cover, and on what basis did you pick them?
Selenium (webdriver-based, 2004), Cypress (in-browser runner, 2017), Playwright (multi-browser, MIT-licensed by Microsoft, 2020), Puppeteer (Chrome-focused, Google, 2017), WebdriverIO (WebDriver Protocol wrapper, 2011), Nightwatch (Node-based runner, 2014), TestCafe (no WebDriver, 2016), and Assrt (Markdown-driven agent over Playwright MCP, 2025). The picks are the ones that keep showing up on every list of 'open source end-to-end testing tools' plus Assrt. Anything that is a paid-first product (QA Wolf, Momentic, Testim, Mabl) is excluded because the keyword is specifically open source.
Cypress and Playwright already support natural-language-ish test plans through Cucumber/Gherkin. Why is Assrt different?
Gherkin Given/When/Then steps are still a DSL. Every Given clause in a .feature file maps to a step definition function in JavaScript or TypeScript that you have to implement and maintain. Your BDD engineer writes 'Given the user is signed in' once, and then you write a step def that calls cy.login(user) or page.goto('/login'). The English is a fiction layered on top of code. In Assrt there is no step-definition registry. 'Sign in with a disposable email' gets passed to a Haiku agent, which calls create_temp_email, then type_text, then click, then wait_for_verification_code, all based on reading the page's accessibility tree at runtime. The maintenance cost of a step-definition library is zero, because there is no library.
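To make the contrast concrete, here is a toy model of what any step-definition registry does under the hood. It is illustrative only and does not reproduce the API of any specific BDD library.

```typescript
// Toy Gherkin step-definition registry: each English clause in a
// .feature file is a lookup key into code you write and maintain.
type StepFn = () => string;
const registry = new Map<string, StepFn>();

function Given(clause: string, fn: StepFn): void {
  registry.set(clause, fn);
}

// The "English" resolves to code; when the flow changes, this must too.
Given("the user is signed in", () => "goto /login, fill credentials, submit");

function runStep(clause: string): string {
  const fn = registry.get(clause);
  if (!fn) throw new Error(`Undefined step: ${clause}`); // the classic BDD failure mode
  return fn();
}

console.log(runStep("the user is signed in"));
// → goto /login, fill credentials, submit
```

In Assrt there is no equivalent of the registry: the clause itself goes to the agent.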
If there is no compiled DSL, what does the test 'source' actually look like in version control?
A .md file. For example, tests/checkout-regression.md starts with '#Case 1: Add a product to the cart' on one line, then on the next line 'Navigate to /products/widget. Click Add to cart. Verify a toast containing Added appears within 2 seconds.' That is the full test. Commit it to git. Diff it in a pull request. Let a product manager read it and suggest an edit. Because the file is Markdown and the runner only recognizes #Case, #Scenario, or #Test as a scenario delimiter, you can interleave any other Markdown content (headings like ## Context, paragraphs of explanation, commented-out scenarios) and the parser will ignore it. The file is half documentation, half executable.
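Laid out as a file, that example reads like this (the ## Context block is ignored by the parser, as described above):

```markdown
## Context
Checkout regression for the storefront. Only #Case headings are executable.

#Case 1: Add a product to the cart
Navigate to /products/widget. Click Add to cart. Verify a toast containing Added appears within 2 seconds.
```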
How does a one-regex parser handle numbered cases, nested cases, or parameterized scenarios?
Numbered cases: covered directly, because the regex matches a digit block (\s*\d*). #Case 1 and #Case 17 are both valid; the digit is consumed by the delimiter and stripped from the scenario name. Nested cases: not a concept in Assrt. There is no describe block. If you want grouping, use filesystem folders, e.g. tests/auth/signup.md and tests/auth/signin.md. Parameterized scenarios: covered by the Test Variables feature, which interpolates {{email}} and similar placeholders into the plan text before the agent reads it. The plan remains a single Markdown file; variables travel alongside it as a separate JSON object. The absence of a test-runner tree is a feature, not a gap.
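A sketch of what that interpolation step could look like. The function below is an assumption for illustration, not the Assrt source.

```typescript
// Substitute {{name}} placeholders in a plan from a variables object.
// Unknown placeholders are left untouched rather than erased.
function interpolate(plan: string, vars: Record<string, string>): string {
  return plan.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    key in vars ? vars[key] : match
  );
}

const plan = "Sign up with {{email}} and password {{password}}.";
console.log(interpolate(plan, { email: "qa@example.com", password: "hunter2" }));
// → Sign up with qa@example.com and password hunter2.
```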
How does Assrt's runtime cost compare to Playwright or Cypress for an equivalent test?
Slower per run, much lower per maintenance. A scripted Playwright test that takes 3 seconds might take 15 to 40 seconds in Assrt, because each plan step is an LLM tool-calling round trip. The default model is claude-haiku-4-5-20251001, which is cheap per call but not free. The trade is that when the signup button moves from the header to a drawer, a scripted test breaks and needs a selector update, while the Assrt plan ('Click Get started') keeps passing because the agent reads the accessibility tree at runtime. For a long-lived regression suite where selectors drift, that trade is usually worth it. For a tight inner dev loop where you rerun the same click 100 times in an hour, scripted Playwright is still faster and cheaper.
Is the comparison fair when only Assrt uses an LLM? That feels like an apples-to-oranges chart.
It is apples-to-oranges if you think the column 'uses an LLM' is the interesting one. It is apples-to-apples if the column you care about is 'does a human read the test file and understand what it tests without knowing the implementation.' Every open source testing tool on the list can answer 'yes' to 'can I write reliable tests with this', so that column does not separate them. The column that does separate them is readability of the artifact. Under that column, Gherkin BDD tools come closest (English-ish Given/When/Then), but they still require a step-definition layer. Assrt sits alone at the end of that axis.