Open source testing where the test file is plain Markdown and the parser is one regex
Every open source testing guide on page one of Google lists the same seven frameworks. Playwright, Cypress, Selenium, Appium, Kiwi TCMS, TestLink, Robot Framework. Pick a runner, pick a spec format, commit the locator strings, watch them rot. This guide skips the listicle and shows you a different shape entirely: a Markdown file at /tmp/assrt/scenario.md, one regex that splits it into cases, an AI agent that re-discovers every element at run time. The runner is MIT, the browser driver is Apache 2.0, and the only test artifact you own is a text file.
What you are reading, by the numbers
Open source testing posts drown you in framework counts. This one is about the lines of code and bytes of text that make a testing stack actually portable. Four numbers you can verify by opening the files linked below.
What every top open source testing guide leaves out
Search "open source testing guide" and the first page is BugBug, Momentic, Aqua, BrowserStack, TestMuAI, TheCTOClub. They all list the same seven frameworks and run through feature matrices. Every one of them assumes the test format is a TypeScript spec, a YAML file, or a row in a SQL-backed test manager.
You leave with a list of tools to evaluate, every tool wants a different spec file, and the question of what a test actually looks like on disk is never answered. None of them describe the format an AI agent actually wants: a Markdown file, one regex, zero locators. That is the gap this page fills. The rest of this guide is the concrete implementation: one file path, one RegExp, one wait primitive, one sync mechanism. All four are in open source repos you can clone today.
Tests are Markdown at /tmp/assrt/scenario.md. The parser is at agent.ts:621. The watcher is at scenario-files.ts:90-111. You could build the same thing yourself from this page.
Inputs, runner, outputs, all named
The open source test stack in one diagram. Three inputs on the left become one scenario.md file. The runner takes that plus a live browser and an accessibility snapshot. Three outputs come out the right.
Markdown scenario → agent runner → pass/fail + video + cloud sync
The same test, in two formats
A user-signup flow is about the most common test case in the world. On the left, the kind of YAML a proprietary AI-testing platform stores in its dashboard. On the right, what scenario.md actually contains for the same flow. Line counts matter less than the question: when this page changes next sprint, which file breaks and which does not?
Proprietary YAML vs open source Markdown
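First, the Markdown side of the comparison: a hedged sketch of what scenario.md might hold for the same signup flow. The prose wording here is illustrative; only the #Case header format is prescribed by the grammar.

```markdown
#Case 1: user signup flow
Go to /signup. Fill the email field with a fresh address.
Click the Sign up button.
Assert the dashboard heading is visible.

#Case 2: duplicate email is rejected
Go to /signup. Fill the email field with an address that already exists.
Click the Sign up button.
Assert an error message about the email appears.
```

No locators, no schema, no IDs: when the Sign up button moves, this file does not change.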
```yaml
# A typical proprietary AI-testing tool scenario.
# Vendor schema. Lives in the vendor's dashboard.
# You cannot grep it, GitHub cannot render it.
scenario:
  id: sc_8f2e91c4ab
  name: user_signup_flow
  runtime_config:
    browser: chromium
    headless: true
    viewport: { width: 1280, height: 720 }
    retry: 2
    wait_ms: 5000
  steps:
    - action: navigate
      target: "/signup"
    - action: fill
      locator_type: "data-testid"
      locator_value: "email-input-v3"
      value: "${VAR.email}"
    - action: click
      locator_type: "role"
      locator_value: "button"
      locator_name: "Sign up"
    - action: assert
      type: "element_visible"
      locator_type: "data-testid"
      locator_value: "dashboard-heading-v2"
      timeout_ms: 5000
      on_failure: heal_with_model
```

The entire test-file grammar, in one regex
Most open source test runners define a spec grammar in a TypeScript AST, a YAML schema, or a generated parser. Assrt does not. The grammar is one RegExp, the parser is split(), and everything that matches becomes a case boundary. If you want to add a new top-level keyword like "Flow" or "Journey", you edit this line and ship.
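That claim is small enough to demonstrate. Below is a hypothetical re-implementation of the one-regex grammar: the pattern is the one quoted verbatim elsewhere on this page (agent.ts:621), but `splitCases` and the sample document are illustrative, not the repo's code.

```typescript
// The boundary pattern: optional '#', a keyword, an optional number, ':' or '.'.
const CASE_BOUNDARY = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

function splitCases(markdown: string): string[] {
  return markdown
    .split(CASE_BOUNDARY)                 // each boundary match starts a new case
    .map((body) => body.trim())
    .filter((body) => body.length > 0);   // drop any preamble before the first #Case
}

const doc = [
  "#Case 1: sign up",
  "Go to /signup, fill the email field, submit the form.",
  "",
  "#Case 2: log out",
  "Open the account menu and choose Log out.",
].join("\n");
```

Because the pattern has no capture groups, `split()` returns only the case bodies; everything before the first boundary is removed by the filter.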
fs.watch plus 1-second debounce is the whole sync engine
No sync daemon. No queue. No broker. When you open scenario.md in an editor and save, the runner's file watcher fires the callback below, debounces for a second, and sends one PATCH to the hosted API. The UUID in the URL is the only credential the endpoint needs, which is why scenario-store.ts line 8 comments exactly this: the UUID v4 IS the access token.
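The whole loop is small enough to sketch in full. This is a minimal reconstruction of the behavior described above, not the repo's code; `debounce` and `syncOnSave` are illustrative names, and the endpoint path follows the URL this page quotes.

```typescript
import { watch } from "node:fs";
import { readFile } from "node:fs/promises";

// Trailing-edge debounce: the wrapped function runs once, `ms` after
// the last call in a burst.
function debounce(fn: () => void, ms: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(fn, ms);
  };
}

function syncOnSave(path: string, uuid: string): void {
  const push = debounce(() => {
    void (async () => {
      const plan = await readFile(path, "utf8");
      // The UUID in the URL is the only credential the endpoint checks.
      await fetch(`https://app.assrt.ai/api/public/scenarios/${uuid}`, {
        method: "PATCH",
        headers: { "content-type": "text/markdown" },
        body: plan,
      });
    })();
  }, 1_000);
  // Every editor save fires a change event; the debounce collapses a
  // burst of writes into a single PATCH once the file settles.
  watch(path, push);
}
```

No daemon, no queue: a file watcher and a one-second timer are the entire sync engine.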
A full open source run, from install to video
Three commands. The first registers the MCP server, the second writes a test as plain Markdown, the third runs it against localhost and records a video. No dashboard to log into, no locator strings to maintain, no YAML to validate. The entire loop lives on your machine.
How it runs, end to end
Five steps between "paste a Markdown block" and "pass/fail result with a video". Everything below corresponds to a real file in the repo, not a hand-waved architecture diagram.
Write tests as plain English #Case blocks
Open /tmp/assrt/scenario.md in any editor, or pipe a string into the CLI. The only grammar is #Case N: name followed by steps in prose. Any text that matches the regex at agent.ts:621 is a valid test. There is no YAML schema to learn, no TypeScript types to satisfy, no Page Object base class to inherit from.
An AI agent reads each case and drives a real browser
The runner starts @playwright/mcp over stdio, spawns Chromium, and hands the case text plus the live accessibility snapshot to Claude Haiku. For each step, the agent picks a ref from the snapshot, calls click/type/snapshot, checks assertions, and moves on. No locator strings exist in the spec so none can rot.
wait_for_stable waits on real DOM signals, not sleep(5000)
Before asserting, the agent can inject a MutationObserver via a small primitive (agent.ts:956-1009) and block until the DOM goes 2 seconds without mutations, capped at 30 seconds. Fast pages return fast; streaming pages wait exactly as long as they need. For flakiness, this is the single highest-impact decision in the runner.
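The idea fits in one function. A hedged sketch in the spirit of wait_for_stable, not the repo's implementation: resolve once the page goes `quietMs` with no mutations, or at `maxMs`, whichever comes first. The ambient declarations exist only so the sketch type-checks outside a browser.

```typescript
// Ambient browser globals, declared so this compiles without a DOM lib.
declare const document: { documentElement: unknown };
declare class MutationObserver {
  constructor(onMutation: () => void);
  observe(target: unknown, options: Record<string, boolean>): void;
  disconnect(): void;
}

function waitForStable(quietMs = 2_000, maxMs = 30_000): Promise<void> {
  return new Promise((resolve) => {
    let quietTimer: ReturnType<typeof setTimeout>;
    const finish = (): void => {
      observer.disconnect();
      clearTimeout(quietTimer);
      clearTimeout(maxTimer);
      resolve();
    };
    const observer = new MutationObserver(() => {
      // Every mutation restarts the quiet window.
      clearTimeout(quietTimer);
      quietTimer = setTimeout(finish, quietMs);
    });
    observer.observe(document.documentElement, {
      childList: true,
      subtree: true,
      attributes: true,
      characterData: true,
    });
    quietTimer = setTimeout(finish, quietMs); // a silent page resolves fast
    const maxTimer = setTimeout(finish, maxMs); // hard ceiling for streaming pages
  });
}
```

A static page resolves after the quiet window; a page streaming tokens keeps resetting the timer until it stops, up to the ceiling.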
Cloud sync is opt-in and gated by a UUID, not a login
If the scenario has a UUID, the runner writes the plan to /tmp/assrt/scenario.md and starts an fs.watch on it. Edits trigger a 1-second debounced PATCH to app.assrt.ai. scenario-store.ts line 8 states explicitly: the UUID v4 IS the access token. No user accounts, no bearer tokens, no enumeration because the ID space is 2^122.
Everything survives cancellation
The tests are a Markdown file on your disk. The runner is a Node CLI. The browser driver is @playwright/mcp. None of those three depend on app.assrt.ai being up, on the hosted tier existing, or on your account being active. When a commercial AI testing platform goes dark, your suite disappears with it. When this one does, you still have scenario.md.
What "open source" actually covers
A lot of open source testing tools have a community edition that is genuinely open, and an enterprise edition that quietly owns the features you want. This stack does not split. The list below is every moving part, every license, and the fact that there is no "premium tier" hidden behind a login wall.
What open source means in this stack
- Runner: Assrt MCP server under MIT at assrt-mcp/LICENSE
- Browser driver: @playwright/mcp under Apache 2.0
- Browser: Playwright-managed Chromium, Apache 2.0
- LLM SDK: @anthropic-ai/sdk under MIT
- Test format: Markdown file on your disk, no license required
- No CLA. No community vs enterprise edition. One codebase.
Proprietary AI test tool vs open source Markdown runner
A row-by-row look at where the formats diverge. None of this is hypothetical; each right-hand cell is a file path or a repo line you can open yourself.
| Feature | Closed AI QA tool | Assrt (open source) |
|---|---|---|
| Test file format | Proprietary YAML, JSON, or TypeScript spec with locators | Plain Markdown #Case blocks at /tmp/assrt/scenario.md |
| Grammar definition | Schema file, JSON schema validator, or strict TS types | One regex at agent.ts:621 plus a split() call |
| Auth model for shared tests | User accounts, bearer tokens, org-level workspaces | UUIDv4 as the access token (scenario-store.ts:8) |
| What decays when the page changes | Locator strings, selectors, data-testid references | Nothing. Elements are re-discovered every step. |
| Wait strategy | Fixed sleeps or tool-specific waitFor helpers | MutationObserver at agent.ts:956-1009 (2s quiet / 30s max) |
| Runtime engine | Proprietary runner or wrapped Playwright | @playwright/mcp driving real Chromium, Apache 2.0 |
| Licensing | Closed source, commercial license, per-seat pricing | MIT. One version. No community vs enterprise split. |
| Self-hosting | Optional paid tier; core runs on vendor cloud | CLI on Node. No external calls beyond your LLM of choice. |
| Cost at comparable scale | $7.5K / month per seat for closed AI QA platforms | $0 + Anthropic tokens (self-hosted, BYO key) |
Want to see your app tested in plain Markdown today?
20 minutes. You bring a URL, we run a #Case scenario end-to-end in front of you and hand you the .md file to keep.
Book a call →

Open source testing, answered
What counts as open source testing in 2026, and where does an AI agent fit into that?
Open source testing has meant two things for most of the last decade: an MIT-licensed runner like Playwright, Cypress, or Selenium, or an MIT-licensed test manager like Kiwi TCMS or TestLink. The runner is where you spend your time and where the lock-in usually hides, because even open source runners disagree on the spec format. Playwright wants a TypeScript file, Cypress wants a TypeScript file, Selenium will take any language but wants a compiled project. An AI agent changes this because the agent does not need a spec in any programming language at all. Assrt is an example of the newer pattern: the spec is a Markdown file, the agent is the runner, and the browser underneath is still Playwright. The whole stack is open source, but the format the tests live in is human prose. That is the piece the listicles at bugbug.io, momentic.ai, and the BrowserStack guide leave out.
Where is this mythical plain Markdown test file actually stored and who decides its shape?
/tmp/assrt/scenario.md, pinned by the constant ASSRT_DIR in /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts line 16. The shape is decided by a single regex in /Users/matthewdi/assrt-mcp/src/core/agent.ts line 621: /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi. Anything that matches that pattern becomes a scenario boundary, and everything between two boundaries is the body of a test. That is why you can open scenario.md in any text editor, paste plain English under each #Case, and have the runner execute it. There is no schema file, no code generation pass, no compile step. The parser is a split() call.
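A quick check of what that pattern accepts, using the regex verbatim; the sample headers are ours.

```typescript
// The boundary regex, copied verbatim from the answer above.
const CASE_BOUNDARY = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

// All valid boundaries: the '#' is optional, the keyword is
// case-insensitive, the number is optional, ':' or '.' ends the header.
const accepted = ["#Case 1: signup", "Scenario 2. checkout", "test: logout"];

// Plain prose with no keyword-plus-delimiter never starts a new case.
const rejected = ["Click the Sign up button", "Fill the email field"];

// String.prototype.match ignores a global regex's lastIndex, so each
// call scans the line from the start.
const isBoundary = (line: string): boolean => line.match(CASE_BOUNDARY) !== null;
```

The looseness is deliberate: "Scenario", "Test", and "Case" headers all work, with or without the leading '#'.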
How does the file get synced to the cloud without any auth plumbing?
A bare UUIDv4, and nothing else. scenario-store.ts line 8 comments exactly this: the UUID v4 IS the access token. When assrt_test or the CLI opens a scenario, it writes the plan to disk and calls fs.watch on it. The callback in scenario-files.ts line 97 has a 1-second debounce; when the file settles, a PATCH goes to https://app.assrt.ai/api/public/scenarios/<uuid>. Nothing checks a bearer token, nothing checks a session cookie, nothing checks a user ID. If you have the UUID, you can read and write the scenario. If you do not, there is no enumeration to walk because the ID space is 2^122. This is deliberate and unusual. Most open source testing tools either require a login (Kiwi TCMS) or keep the test entirely on disk (bare Playwright). Assrt is the middle ground: shareable by URL, private by entropy.
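The entropy claim is easy to sanity-check in Node; `randomUUID` is standard library, and the URL shape follows the endpoint quoted above.

```typescript
import { randomUUID } from "node:crypto";

// A v4 UUID fixes 6 bits (version and variant), leaving 122 random bits.
const ID_SPACE = 1n << 122n; // ≈ 5.3 × 10^36 possible scenario IDs

// The UUID is the entire credential: knowing it grants read and write.
const uuid = randomUUID();
const url = `https://app.assrt.ai/api/public/scenarios/${uuid}`;
```

At that ID-space size, guessing a live scenario by brute force is not a practical attack; the secrecy of the link is the whole access model.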
Why not just commit a Playwright spec and be done with it?
You can and many teams do. The case for Markdown over a .spec.ts file is specifically about what rots. A Playwright spec contains selectors and locators that are valid the moment you write them and start decaying the moment the page changes. A Markdown case says 'Click the Sign up button' and leaves the element discovery to the agent at run time. When the button moves from the navbar into a dropdown, the Playwright spec breaks and the Markdown case still passes. The tradeoff is speed and determinism: a Playwright spec runs in about 1 second per action against a pre-bound locator, and a Markdown case runs in about 4-6 seconds per action because the agent has to fetch an accessibility snapshot and pick the ref. For a smoke suite of 20 cases, that difference is real. For an auditing suite that runs once a day, it does not matter. The open source testing guide nobody writes is the one that tells you which format fits which job.
Can I run this entirely on my own infrastructure, with zero calls to any cloud?
Yes. The runner is a Node CLI at /Users/matthewdi/assrt-mcp/src/cli.ts that spawns @playwright/mcp locally over stdio. The only network call that leaves your box is the one to the Anthropic API to drive the agent, and even that respects ANTHROPIC_BASE_URL, so you can point at a local proxy, a self-hosted Claude gateway, or an air-gapped endpoint. The cloud sync to app.assrt.ai is opt-in and happens only if you passed a UUID into the CLI or MCP tool; run with --plan inline and the scenario never leaves /tmp. Compared to commercial AI testing platforms that bill $7.5K/month and require sending your DOM to their servers, the bytes-out budget here is zero plus your Anthropic token spend.
What does the actual open source license cover?
The Assrt MCP server is under the license in /Users/matthewdi/assrt-mcp/LICENSE (MIT). @playwright/mcp is Apache 2.0. Playwright itself is Apache 2.0. The Anthropic SDK is MIT. There is no CLA to sign, no contributor shield, no feature held back for a paid tier. The web UI at app.assrt.ai is the hosted convenience layer on top; you can run the full open source stack without ever creating an account. When an open source testing guide tells you 'download the community edition', check whether the commercial version has features that are silently better than the community one. Assrt has only one version, and it is the one in the repo.
How do I migrate an existing Playwright suite into this Markdown format?
Open the spec file, delete everything that is locator logic or imports, leave a one-line comment per test describing what it verifies in plain English, then prefix each with #Case N:. A 40-line test.describe + test.each becomes 6 lines of Markdown. The reverse migration is also mechanical: run assrt_diagnose on a failing case to get the observed element refs, then rewrite as a Playwright spec if you need the determinism. Neither direction is a tooling cliff. That is the point of an open format.
How does Assrt avoid the flakiness that burns people on AI-written tests?
Two decisions in the runner. First, the Markdown spec has no selectors, so there is nothing for a selector change to break. Second, wait_for_stable (agent.ts lines 956-1009) injects a MutationObserver and waits until the page goes 2 seconds with zero DOM mutations or hits a 30-second ceiling, whichever comes first. Fast pages return fast, streaming AI responses wait as long as they actually need, and the sleep(5000) anti-pattern is gone. Between intent-only specs and a real stability signal, the kind of flakiness you see on Reddit threads for AI codegen tools mostly does not happen.
What is missing from Assrt that a full test management tool like Kiwi TCMS has?
User roles, permissions, multi-tenant workspaces, issue tracker integrations, pre-built report templates, an approval workflow for test cases. Kiwi TCMS is a test management system with a full web app and a SQL schema. Assrt is a runner with a hosted scenario store that supports sharing but not governance. If your team needs to assign test cases to QA analysts, tag them against a release, gate PRs on manual sign-off, and attach bug reports, Kiwi TCMS is a better match. If your team needs tests that live next to code, run in a few seconds from an MCP client, and do not require anyone to open a dashboard, Assrt is the lighter fit. They are complementary more than competitive; a large org could reasonably run both.
Is there a benchmark for how many cases fit in one scenario.md before it gets unwieldy?
In practice, 5 to 8 #Case blocks per file is where the generator (see the PLAN_SYSTEM_PROMPT cap in assrt-mcp/src/mcp/server.ts) targets, and running more than about 15 in one pass risks the agent's context window on a long run. The right pattern for larger suites is many scenario.md files, one per feature area, each with its own UUID. Cloud sync handles that naturally because the UUID is the unit of scope, not the file count. The single-file ceiling is a feature budget, not a hard limit.