Test readability

Readable AI-generated tests mean a PM can sign off without opening an IDE

Every "AI readable test" article on the SERP today assumes the artifact a human reviews is Playwright TypeScript. It is not. The artifact a human reviews should be a spec they can read aloud in standup. Here is the grammar that makes that possible, the one regex that is the entire DSL, and why skipping the codegen step is the point.

Matthew Diakonov
9 min read
4.9 from developers running Assrt locally
Scenarios are plain English #Case blocks, not .spec.ts files
Whole DSL is one regex at src/core/agent.ts:621
PASS criteria are user-visible truths, not selector chains

The word "readable" has been quietly redefined

Walk through the top search results for "readable AI generated tests" and a pattern emerges. Every vendor uses the word, and every vendor means the same thing: pretty Playwright TypeScript. Consistent locators. Named roles instead of CSS. An LLM that refactors Codegen output into something you would not be embarrassed to merge.

That is a fine definition for engineers. It is a useless definition for the person whose opinion the test exists to encode: a product manager, a designer, a compliance reviewer, the human who knows whether the app is actually doing the right thing. If they need to read getByRole to review a test, the test is not readable, no matter how tidy the indentation is.

The readability that matters is the sign-off test: can a non-engineer read the scenario aloud, understand every line, and tell you whether the outcomes match the product they thought they shipped? Everything else is marketing for engineers.

The heading "Sign in" is visible
The description updates to include "Check your email"
No password field is present
A submit button reading "Send sign-in link"
No error message in the red area
The form is no longer visible

Actual PASS criteria lines from a real Assrt scenario. No selectors, no chaining, no framework.

The three-rule grammar

The Assrt scenario format has three parts and nothing else. A scenario is readable if (and only if) the three parts stay in their lanes. Here is what each one is allowed to contain.

#Case N: <outcome>

Starts with a hash, the word Case (or Scenario or Test), a number, and a colon. The name describes a user-visible outcome, not an implementation. 'Login overlay shows new passwordless UI' is right. 'Tests the LoginOverlay component' is wrong.

Bulleted steps

Each step is one action in English. Navigate to a URL. Wait for something. Type text. Click a button. No selectors, no assertions, no sleeps. If a step reads like a sentence a PM would say, it is correct.

PASS criteria

A bulleted list of user-visible truths that must hold. 'The heading X is visible.' 'No password field.' 'The URL contains /onboarding.' PASS criteria are the sign-off surface. runScenario() treats them as MANDATORY: miss one and the scenario fails.

No fourth thing

There is no DSL for assertions, no YAML block for data, no page object layer. If a scenario wants more structure, you split it into more #Cases.

No selectors

The agent resolves intent to accessibility refs at runtime through Playwright MCP's snapshot tool. Scenario files never contain CSS, XPath, or data-testid references.
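Put together, a scenario that obeys all three rules is short. An illustrative example (the URL, field names, and copy are invented for this post, not taken from a real app):

```
#Case 1: Signup lands the user on onboarding
- Navigate to https://app.example.com/signup
- Type "reader@example.com" into the email field
- Click the "Create account" button
- Wait for the page to settle

PASS criteria:
- The URL contains /onboarding
- The heading "Welcome" is visible
- No error message is visible
```

Name describes an outcome, steps are single English actions, PASS criteria are observable truths. Nothing else is allowed in.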

The entire parser is a 49-character regex

You do not have to take the "whole DSL fits on one screen" claim on faith. The parser that splits a scenario file into #Case blocks is open source at src/core/agent.ts:620-631 in the @assrt-ai/assrt repo. Here it is in full. The regex on line 621 is literally the whole grammar.

assrt-mcp/src/core/agent.ts
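The regex below is quoted verbatim from the repo; the surrounding split is a sketch (the function and constant names are mine, not necessarily the ones in agent.ts):

```typescript
// Verbatim: the boundary regex from src/core/agent.ts:621.
const CASE_BOUNDARY = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

// Sketch of the split: everything between two boundaries is one case
// block; the steps and PASS criteria inside it stay free-form English.
function splitCases(scenario: string): string[] {
  return scenario
    .split(CASE_BOUNDARY)
    .map((block) => block.trim())
    .filter((block) => block.length > 0);
}
```

Because every group in the pattern is non-capturing, `split` drops the boundaries themselves, so each returned block starts with the case name.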

That is the entire specification of what makes a line a scenario boundary. Anything else in the file, every step, every PASS criterion, is a free-form English line the LLM agent consumes at run time. You could port the parser to Python, awk, or bash in fifteen minutes. That portability is not marketing; it is a property of keeping the format this thin.

1 regex defines the entire DSL
49 chars of grammar in total
0 selectors in your scenario file
3 parts per #Case, no fourth

Side by side: what your PM reviews

Same test, same browser behavior. Left column is what most AI test tools hand a human to review. Right column is what Assrt hands a human to review. Look at them through the eyes of somebody who does not know what toHaveCount(0) means.

What a non-engineer is asked to sign off on

import { test, expect } from '@playwright/test';

test('login overlay shows new passwordless UI', async ({ page }) => {
  await page.goto('https://app.example.com/');
  await page.waitForSelector('[data-test="login-card"]');

  await expect(
    page.getByRole('heading', { name: 'Sign in' })
  ).toBeVisible();

  await expect(
    page.getByText(
      "Enter your email and we'll send you a sign-in link."
    )
  ).toBeVisible();

  await expect(page.locator('input[type="email"]')).toBeVisible();
  await expect(
    page.getByRole('button', { name: 'Send sign-in link' })
  ).toBeVisible();

  await expect(
    page.locator('input[type="password"]')
  ).toHaveCount(0);
});

test('submitting an email shows confirmation', async ({ page }) => {
  await page.goto('https://app.example.com/');
  await page.getByRole('textbox').fill('reader@example.com');
  await page.getByRole('button', { name: 'Send sign-in link' }).click();
  await page.waitForTimeout(10_000);

  await expect(
    page.getByRole('button', { name: 'Send sign-in link' })
  ).toHaveCount(0);
  await expect(
    page.getByText('Check your email for a sign-in link')
  ).toBeVisible();
  await expect(page.locator('[data-test="error"]')).toHaveCount(0);
});
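The Assrt side of the comparison is the same flow in plain English. Reconstructed here from the PASS criteria quoted above, so the step wording is approximate:

```
#Case 1: Login overlay shows new passwordless UI
- Navigate to https://app.example.com/
- Wait for the login card to appear

PASS criteria:
- The heading "Sign in" is visible
- An email field is present
- A submit button reading "Send sign-in link"
- No password field is present

#Case 2: Submitting an email shows confirmation
- Type "reader@example.com" into the email field
- Click the "Send sign-in link" button

PASS criteria:
- The description updates to include "Check your email"
- No error message in the red area
- The form is no longer visible
```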
54% fewer lines

The right column has zero functions, zero imports, zero awaits, and zero mystery method names. The left column has roughly thirty of each. Both describe the exact same test in the exact same browser. Only one of them is readable by the person whose opinion the test is trying to encode.

What flows into one readable scenario

A scenario does not generate itself from nothing. Inputs come from the agent that just edited the code, your existing product specs, and the current accessibility tree of the page. The runner fuses them into English.

Sources → readable scenario

Coding agent + product spec + page accessibility tree + prior run history
→ #Case + PASS criteria
→ PM reads aloud
→ LLM agent drives browser
→ Scenario committed

What the file looks like after a real run

Here is a scenario lifted straight from a live /tmp/assrt/scenario.md on my machine. Two cases, eight steps, twelve PASS criteria; every line is something you could read aloud in standup without the word "selector" appearing once.

/tmp/assrt/scenario.md

The sign-off loop

Here is the loop that uses the readable format for what it is actually good for: getting a product truth approved by the person who knows the product, before the agent burns a minute running a test against something the PM does not want.

1

Agent drafts the scenario in plain English

assrt_plan navigates to the URL, reads the accessibility tree, and writes a #Case block with bulleted steps and PASS criteria. No .spec.ts files, no page objects. The draft lands at /tmp/assrt/scenario.md.

2

You read it. Or your PM does.

Open the markdown file. Read each #Case name aloud. Read the PASS criteria aloud. If a sentence sounds wrong, you already found the bug in the intent, before the browser ever opened.

3

Edit in place

Rewrite a step in Vim, Cursor, or TextEdit. The fs.watch in scenario-files.ts (startWatching) picks up the change, debounces for one second, and syncs to central storage. The next assrt_test run picks up your edits.
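The debounce matters when you save rapidly: a burst of writes inside the window produces one sync, fired after the burst ends. A sketch of that coalescing behavior (this models the semantics described above; it is not the code in scenario-files.ts):

```typescript
// Model of trailing-edge debounce: given sorted change timestamps (ms),
// a sync fires only when no further change arrives within `windowMs`
// of the previous one, so one burst of saves yields one sync.
function syncCount(changes: number[], windowMs: number): number {
  let syncs = 0;
  for (let i = 0; i < changes.length; i++) {
    const next = changes[i + 1];
    // Last change in a burst: the whole burst collapses to one sync.
    if (next === undefined || next - changes[i] > windowMs) syncs++;
  }
  return syncs;
}
```

Three saves 200 ms apart sync once; two saves two seconds apart sync twice.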

4

Run against the browser

assrt_test executes each step through Playwright MCP, resolves intent to accessibility refs, checks every PASS criterion, and writes results to /tmp/assrt/results/latest.json. Failed cases report the specific criterion that missed, in the same English you wrote.
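Because failed cases report the missed criterion in plain English, the results file is easy to consume programmatically. A sketch of summarizing it (the interface below is hypothetical; check the actual shape of your latest.json before relying on field names):

```typescript
// Hypothetical result shape -- the real latest.json schema is not
// documented here; adjust the fields to match what your run emits.
interface CaseResult {
  name: string;
  pass: boolean;
  failedCriteria: string[]; // the English PASS lines that missed
}

// Turn run results into standup-ready failure lines.
function failures(results: CaseResult[]): string[] {
  return results
    .filter((r) => !r.pass)
    .flatMap((r) => r.failedCriteria.map((c) => `${r.name}: ${c}`));
}
```

Each output line reads like the criterion you wrote, prefixed by the case that missed it.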

5

Commit the scenario, not the code

Copy the markdown into tests/scenarios/ in your repo. Diff it in PRs. Six months later, when the DOM has shifted three times, the scenario still runs because the agent resolves intent fresh. The readable format is the artifact with the longest half-life.

Read the scenario aloud in standup

The real acceptance test for "readable" is whether the scenario survives being spoken to another human without translation. Here is what reading the sample scenario aloud sounds like. There is no developer jargon. There is no framework. There is no selector.

standup — 9:03am

The trade-off, stated honestly

There is a cost to this format. An LLM runs in the loop at execution time, resolving intent to accessibility refs on every step. That is slower than a precompiled Playwright test with hardcoded getByRole calls. On a tight inner-loop CI suite with hundreds of tests, that matters.

The trade is that the scenario survives DOM drift. A traditional spec file that uses data-testid or CSS classes breaks the moment a designer restructures the page, even though the product intent is unchanged. A readable scenario breaks only when the intent itself moves. That is why it belongs in high-turnover areas: new features, AI-generated UI, anything a coding agent is rewriting multiple times a week.

The rule of thumb: readable #Case files for intent-sensitive flows, generated Playwright spec files for performance-critical smoke tests. You can run both with the same agent. Assrt does not ask you to pick one tool for everything.

The anchor fact

The entire scenario grammar is 49 characters of regex. Open src/core/agent.ts at line 621, confirm the regex, and you have read the whole DSL. Any other AI test tool requires a longer read.

/(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi

Try it on a real scenario in under a minute

npx @assrt-ai/assrt setup installs the MCP server, the CLI, and the QA reminder hook. Point it at a local URL, let assrt_plan draft a #Case, then open /tmp/assrt/scenario.md and read it out loud. That is the whole acceptance test for whether AI generated tests are actually readable.

Install Assrt

Want a readable-test audit of your repo?

Hop on a 20-minute call. We will walk through your existing E2E suite, pick three flows where intent-level scenarios belong, and leave you with the concrete #Case files to commit.

Book a call

Frequently asked questions

What makes an AI generated test 'readable' in practice?

Two things. One, the artifact a human reviews is not code. It is a sentence-per-step scenario, with PASS criteria written as observable user-visible truths (a heading is visible, an error message is shown, no password field is present). Two, the grammar is small enough to hold in your head. In Assrt's case the whole parser is a single regex at src/core/agent.ts:621, `/(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi`. There is no second file to learn.

Isn't well-formatted Playwright TypeScript already readable?

Only to engineers who know the framework. A PM reading `await expect(page.getByRole('heading', { name: 'Sign in' })).toBeVisible()` needs you to translate 'getByRole' and 'toBeVisible' before they can sign off. The same intent in the Assrt format is `The heading "Sign in" is visible`. Same meaning, no translation, no onboarding. That is the gap 'readable' is supposed to close.

Where does the scenario actually live, and can I edit it by hand?

Every running scenario is written to /tmp/assrt/scenario.md by the MCP server (src/core/scenario-files.ts, function writeScenarioFile). A file watcher keeps it in sync with central storage on a 1-second debounce. Open it in any editor, rewrite a step, save; the next run picks up the change. No dashboard round trip.

What are PASS criteria and why are they separate from steps?

Steps are what the agent does. PASS criteria are what must be true for the test to pass. Keeping them separate is the reason non-engineers can review a scenario at all. A step like 'click the Send sign-in link button' is an action; a PASS criterion like 'A description reading "Check your email for a sign-in link" is visible' is a product truth. A PM can tell you whether the product truth is right without reading a single line of the action block. The runtime at runScenario() treats PASS criteria as MANDATORY: every item must be verified or the scenario fails.

So the scenario is English. What actually runs in the browser?

An LLM agent (Claude Haiku 4.5 by default) reads each step, calls Playwright MCP tools (navigate, snapshot, click, type_text, press_key, wait_for_stable, etc.) against a real Chromium, and checks each PASS criterion against the current accessibility tree. There are no selectors in your scenario file. The agent resolves intent to refs at runtime. That is why a DOM refactor does not break the test unless the intent actually changed.

Why doesn't Assrt emit .spec.ts files like other AI test tools?

Because the selectors are the problem. A generated Playwright file with hardcoded locators is more familiar to read but inherits every selector-drift bug that hand-written E2E suites already have. Keeping the scenario as intent (natural English + PASS criteria) is what lets you hand the file to a coding agent six months later and have it still run. Selectors rot. Intent does not.

How does this compare to tools like Octomind, LambdaTest KaneAI, or Functionize?

Those tools store scenarios in a cloud database and expose them through a dashboard. Export to text is a compliance checkbox. In Assrt the markdown file is the primary artifact; cloud sync is the secondary one. Everything is open source (@assrt-ai/assrt on npm), the CLI self-hosts, and a scenario you wrote today will still open in a plain editor if the vendor vanishes tomorrow. Competitor plans run $7.5K/mo and up; Assrt is free to self-host.

Can I check the scenario into my repo?

Yes. Copy /tmp/assrt/scenario.md into tests/scenarios/ (or anywhere), commit it, diff it in PRs. Because the format is plain text, it merges like prose, blames like code, and greps with ripgrep. No proprietary YAML, no binary fixtures, no vendor export step.