Automation in QA, without CSS selectors

Every roundup post lists Selenium, Cypress, Playwright, and Katalon. None explains how modern AI-driven automation actually works. This page walks through the real mechanism, straight from the Assrt source: an accessibility-tree snapshot with stable ref=e5 IDs, a #Case markdown plan on disk, and a 1-second fs.watch sync.

Matthew Diakonov
9 min read
4.9 · from Playwright MCP + Claude Haiku 4.5 in production
Plans are plain .md files, diffable in git
Self-heal via fresh a11y snapshot on every tool error
MIT license, no cloud required, no vendor lock-in

The thing that separates AI QA from the old record-and-replay

A classic automation stack starts from a selector. You open DevTools, copy an XPath, paste it into a step. The step lives or dies on that string. Move the button into a new wrapper, the selector misses, the test fails, and you go hunt. Every team with more than 200 tests has the same dashboard of flaky runs.

An AI QA agent does not start from a selector. It asks the browser for a snapshot. Playwright MCP returns the accessibility tree with one stable reference per interactive element. The model decides which element it wants, passes that reference back, and the driver resolves it against the current DOM. The selector was never written down. It cannot decay.

What a real snapshot looks like

This is the shape the snapshot tool returns (tool definition at assrt/src/core/agent.ts:28). Every element carries a role, a name, and a ref. The model sees this tree, not HTML.

snapshot output
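The actual output block is not reproduced here, but the shape is an indented role/name tree where every interactive element carries a ref. An illustrative sketch (element names and ref numbers are ours, not from the repo):

```yaml
- banner [ref=e1]:
  - link "Home" [ref=e2]
  - button "Account" [ref=e3]
- main [ref=e4]:
  - textbox "Email" [ref=e5]
  - button "Sign up" [ref=e6]
- contentinfo [ref=e7]:
  - link "Privacy policy" [ref=e8]
```

The model targets "the Email textbox, ref=e5" — roles and accessible names, never a DOM path.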

Same test, two models of 'automation'

// Traditional QA automation: selectors are yours to babysit.
await page.locator('header.navbar div.user-menu > button.primary').click();
await page.locator('form#signup input[name="email"]').fill('test@x.io');
await page.locator('form#signup button[type="submit"]').click();
await expect(page.locator('div.toast.toast-success >> nth=0'))
  .toContainText('Check your inbox');
// One DOM refactor and every selector above needs surgery.
86% fewer selector strings
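The Assrt side of the same comparison is not code at all. A plan covering the identical flow (wording ours, format as described in the next section) is just:

```markdown
#Case 1: Signup confirmation
- Open the user menu in the header
- Enter test@x.io into the email field on the signup form
- Submit the form
- Verify a success toast saying "Check your inbox" appears
```

No selector appears anywhere; the model resolves each step against the current accessibility snapshot at run time.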

The test itself is a markdown file

No YAML, no proprietary DSL, no GUI scenario-builder output pretending to be JSON. A test in Assrt is a plain text file. The MCP server writes it to /tmp/assrt/scenario.md at the start of every run and parses it with this regex:

agent.ts
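The pattern itself is quoted verbatim elsewhere on this page (agent.ts:621). A self-contained sketch of how splitting on it yields scenarios — the regex is from the source, the surrounding variable names are ours:

```typescript
// Pattern from agent.ts:621; the code around it is an illustrative sketch.
const caseRe = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

const plan = [
  "#Case 1: Search flow",
  "- Type coffee into the search box",
  "#Case 2: Login",
  "- Open the login page",
].join("\n");

// Each non-empty chunk after the split is one scenario body.
const sections = plan.split(caseRe).filter((s) => s.length > 0);
console.log(sections.length); // 2 scenarios
```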

and treats each matched section as one scenario. That is the entire contract. Here is a valid plan:

/tmp/assrt/scenario.md
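One valid plan, matching the search-flow example the repo itself uses (quoted later on this page):

```markdown
#Case 1: Search flow
- Type coffee into the search box
- Press Enter
- Verify at least one result appears
```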

The fs.watch loop nobody documents

Your coding agent opens /tmp/assrt/scenario.md, edits a step, hits save. Assrt has an fs.watch listener registered on that exact path (see assrt-mcp/src/core/scenario-files.ts:97-103). On change it debounces for 1000 ms, reads the new content, and calls updateScenario to push it to cloud storage. The next test run picks up the edit without a copy-paste. This is how a plan written in your editor stays in sync with the saved scenario UUID.

assrt-mcp/src/core/scenario-files.ts:97-103
activeWatcher = watch(SCENARIO_FILE, { persistent: false }, (_event) => {
  if (syncDebounceTimer) clearTimeout(syncDebounceTimer);
  syncDebounceTimer = setTimeout(() => {
    syncToFirestore(scenarioId);
  }, 1000);
});

What happens when you run assrt_test

One MCP call from your coding agent kicks off this sequence. No CI job, no selector map, no page object file.

Inputs, agent, browser, outputs

Your plan, the target URL, and the pass criteria go in. The Assrt agent drives Playwright MCP, a disposable email inbox, and raw HTTP calls. Results plus a video come out.

One scenario, step by step

Participants: your agent · Assrt MCP · disk (/tmp/assrt) · Playwright MCP · the LLM loop.

Your agent → Assrt MCP: assrt_test({ url, plan })
Assrt MCP → disk: write /tmp/assrt/scenario.md
disk → Assrt MCP: fs.watch: edited
LLM loop → Playwright MCP: snapshot()
Playwright MCP → LLM loop: accessibility tree + refs
LLM loop → Playwright MCP: tool: click(ref=e5)
Playwright MCP: Playwright click, updated DOM
LLM loop → Assrt MCP: step status
Assrt MCP → disk: write results/latest.json

Numbers from the source

These are not marketing stats. They are counts of things in the repo.

17 semantic tools the agent can call
621: line of the #Case regex in agent.ts
1-second debounce on scenario sync
Automatic retries on API rate-limit errors

Sources: TOOLS array in assrt/src/core/agent.ts:16-196, parseScenarios at agent.ts:620-631, watcher at assrt-mcp/src/core/scenario-files.ts:97-103, retry loop at agent.ts:696-745.

The self-healing move people keep describing wrong

"Self-healing" in AI QA marketing usually means "we trained a model to guess a new selector when the old one breaks." That is not what Assrt does. The mechanism is simpler and smaller. On any tool failure the catch block at assrt/src/core/agent.ts:1012-1020 grabs a fresh snapshot, prepends it to the tool result, and appends the literal string "Please call snapshot and try a different approach." The model gets the new page state inline and decides on its own whether to retry, scroll, or abort. There is no brittle mapping table. The recovery logic is: re-read the room.
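A minimal sketch of that catch path, with our own names and stub types — only the mechanism (fresh snapshot plus the fixed instruction string) comes from the source; the real code lives at assrt/src/core/agent.ts:1012-1020:

```typescript
// Illustrative reconstruction of the self-heal path described above.
interface BrowserLike {
  execute(tool: string, args: Record<string, unknown>): Promise<string>;
  snapshot(): Promise<string>;
}

async function runToolWithSelfHeal(
  browser: BrowserLike,
  tool: string,
  args: Record<string, unknown>,
): Promise<string> {
  try {
    return await browser.execute(tool, args);
  } catch (err) {
    // On any failure: re-read the room, then hand the model the new state.
    const freshTree = await browser.snapshot();
    return [
      `Error executing ${tool}: ${(err as Error).message}`,
      `Current page snapshot:\n${freshTree}`,
      "Please call snapshot and try a different approach.",
    ].join("\n\n");
  }
}
```

Whatever comes back — retry with a new ref, scroll, abort — is the model's decision, not a retry table's.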

what a recovery turn looks like in the event log
100% yours

Tests are yours to keep: they live in /tmp/assrt/scenario.md, sync to your cloud workspace, and you can commit the same markdown into git.

assrt-mcp readme

0 runtime dependencies on a vendor cloud. Literally zero.

What the agent actually exposes

Seventeen tool definitions, each semantic, each matching one thing a human would do. No CSS, no XPath, no private selectors.

snapshot

Returns the Playwright MCP accessibility tree. Every interactive element gets a stable ref like e5. Called before every interaction.

click / type_text

Take an element description plus a ref. The ref wins when present. If it is stale, the model is told to snapshot again and retry.

assert

The only way a scenario produces a pass or fail. Carries description, passed boolean, and an evidence string.

create_temp_email + wait_for_verification_code

Disposable inbox so auth flows work without your test touching the user's mailbox. OTP codes are pasted in one synthetic ClipboardEvent.

http_request

Hit any URL to verify external side effects: Telegram bot polls, Slack messages, webhook receivers, backend APIs.

wait_for_stable

Injects a MutationObserver, waits until the DOM stops changing. Replaces arbitrary sleep() calls on async pages.

complete_scenario

Required terminal call. Closes the scenario with a summary and the overall pass flag. Without this, the run never ends.
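The page does not reproduce the TOOLS array itself, but an Anthropic-style tool definition for click would plausibly look like this. Every field value below is our guess; the point is the shape — semantic inputs only:

```typescript
// Hypothetical entry in the TOOLS array (agent.ts:16-196): description + ref,
// no selector field anywhere in the schema.
const clickTool = {
  name: "click",
  description:
    "Click an element. Prefer the ref from the most recent snapshot; " +
    "if the ref is stale, snapshot again and retry.",
  input_schema: {
    type: "object",
    properties: {
      element: { type: "string", description: "Human-readable element description" },
      ref: { type: "string", description: "Stable snapshot ref, e.g. e5" },
    },
    required: ["element"],
  },
} as const;

// The schema admits no CSS or XPath: the only ways to target are name and ref.
console.log(Object.keys(clickTool.input_schema.properties)); // element, ref
```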

Where this fits in the wider QA automation landscape

Feature | Traditional platform | Assrt
Test format | Proprietary YAML / recorded JSON | #Case markdown on disk
Element targeting | CSS / XPath selectors you author | Accessibility refs the browser provides
Self-healing | Selector guessing model | Fresh a11y snapshot returned with every tool error
Where tests live | Vendor cloud UI | /tmp/assrt/scenario.md, optionally in git
Licensing | Commercial, often $5K-$7.5K/mo enterprise tiers | MIT, run locally, no cloud required
LLM choice | Locked to vendor's model | Claude Haiku 4.5 default, Gemini supported, swap via flag
Output | Dashboard screenshots | WebM video + JSON report + screenshot folder

The setup, end to end

1. Install the MCP server
npx assrt-mcp registers the server globally with your coding agent. One shell line, no config.

2. Call assrt_test
From Claude Code, Cursor, or any MCP-speaking client, pass a URL and a plain-text plan. The server writes /tmp/assrt/scenario.md.

3. The agent snapshots the page
Playwright MCP is launched, the target URL loads, and the accessibility tree is returned with a ref per element.

4. The LLM picks tools step by step
Each turn: read snapshot, choose click/type/assert, execute via Playwright MCP, attach the updated screenshot as an image, loop.

5. Failures trigger a snapshot, not a retry table
On any error the agent gets a fresh accessibility tree inline and decides what to do next. The recovery is model reasoning, not a hardcoded script.

6. Results land on disk
JSON report in /tmp/assrt/results/latest.json, WebM recording in the run directory, screenshots by step. The MCP response returns the file paths.

7. Edit the markdown, the watcher syncs
Open /tmp/assrt/scenario.md in any editor. On save, fs.watch fires, debounces 1 s, and updates your cloud scenario UUID.
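The page never shows the report schema, so the following shape for /tmp/assrt/results/latest.json is a hypothetical sketch, purely to fix ideas — only the path and the video/screenshot outputs are confirmed by the source:

```json
{
  "scenario": "Search flow",
  "passed": true,
  "steps": [
    { "action": "type_text", "element": "search box", "status": "ok" },
    { "action": "press_key", "key": "Enter", "status": "ok" },
    { "action": "assert", "description": "at least one result appears", "passed": true }
  ],
  "video": "recording.webm",
  "screenshots": "screenshots/"
}
```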

What you get that a tool-list post will not tell you

  • Zero CSS or XPath selectors in the plan, ever
  • Tests diff cleanly in git, because they are plain text
  • Self-heal path is one snapshot + one retry decision by the model
  • Disposable email + http_request cover auth and webhook flows
  • A WebM recording is saved for every run, with a built-in 5x player
  • Swap the LLM provider per run (Anthropic or Gemini)
  • MIT licensed end to end; no cloud dependency required

A 90-second smoke test you can run right now

from your coding agent
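A prompt in the spirit of this section, assuming the assrt_test arguments described earlier — the URL and steps are placeholders, not from the repo:

```markdown
Call assrt_test with:
  url: https://your-staging-app.example
  plan: |
    #Case 1: Smoke
    - Open the home page
    - Verify the main heading renders
    - Click the primary navigation link
    - Verify the new page loads without an error state
```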

Want to see your own app run through this loop?

15 minutes. Bring a staging URL and we will run a plan against it live, then show you the markdown, the snapshot, and the WebM recording on the other side.

Book a call

Questions people actually ask about automation in QA

What does 'automation in QA' really mean when an AI agent is driving the browser?

It means three concrete things, all verifiable in the Assrt source. One: the agent does not own selectors. It calls a snapshot tool that returns the Playwright MCP accessibility tree, each element tagged with a stable ref like ref=e5 (assrt/src/core/agent.ts:28 and :213-218). Two: the test plan is a plain text file in #Case N: format parsed by the regex /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi at agent.ts:621, not a YAML step list. Three: every scenario and result is kept on disk at /tmp/assrt/scenario.md and /tmp/assrt/results/latest.json, watched with fs.watch, and an edit triggers a 1-second debounced sync back to Firestore (assrt-mcp/src/core/scenario-files.ts:97-103). That combination, accessibility tree plus markdown plan plus file watch, is what replaces traditional record-and-replay.

How is this different from Selenium, Cypress, or vanilla Playwright automation?

Selenium and Cypress bind each test step to a selector you author, usually a CSS path or XPath. When a div wraps the button, the selector breaks. Assrt never writes a selector in the plan. The model sees the accessibility snapshot, picks the element it wants, and passes the ref string back. The browser driver resolves that ref against the live tree. If the DOM reshuffles but the role, name, and value are still there, the same plan keeps working. The plan itself stays human-readable, more like a spec than a script.

Show me a real Assrt #Case file and what happens when it runs.

A file literally looks like '#Case 1: Search flow\n- Type coffee into the search box\n- Press Enter\n- Verify at least one result appears'. The MCP server writes it to /tmp/assrt/scenario.md via writeScenarioFile in assrt-mcp/src/core/scenario-files.ts:42. The agent loop in assrt/src/core/agent.ts then snapshots, matches each step to a tool call (click, type_text, assert, complete_scenario), and at the end writes the result JSON to /tmp/assrt/results/latest.json. Edit the markdown, save, and the watcher pushes the new plan back to the cloud within a second (scenario-files.ts:97-103 debounce timer).

What is the self-healing mechanism, exactly?

On any tool failure the catch block in agent.ts:1012-1020 immediately calls browser.snapshot() again and attaches the fresh accessibility tree to the tool result returned to the model. The error message ends with 'Please call snapshot and try a different approach.' That is the entire contract. The LLM gets new context that encodes the current page state, then decides whether to retry with a different ref, scroll, or give up and call complete_scenario. There is no brittle retry table. The browser recovery path is just 'snapshot, summarize, keep reasoning'.

Which model runs the loop, and can I swap it?

Default is claude-haiku-4-5-20251001, defined as DEFAULT_ANTHROPIC_MODEL at assrt/src/core/agent.ts:9. Gemini is supported via DEFAULT_GEMINI_MODEL = 'gemini-3.1-pro-preview' at :10, and the provider is chosen per-run via a constructor flag. You can pass a model override on the MCP call (see the 'model' parameter on assrt_test in assrt-mcp/src/mcp/server.ts:351). The conversation uses a sliding window so long scenarios do not blow the context (agent.ts:1064-1080).

Where is the part that actually sends keystrokes to the browser?

The TOOLS array at agent.ts:16-196 defines semantic actions (navigate, click, type_text, select_option, scroll, press_key, evaluate, assert, complete_scenario). Each tool call routes through the browser manager in assrt-mcp/src/core/browser.ts, which speaks to the Playwright MCP server running locally. Clicks use a human-readable element description plus the ref from the last snapshot, so the same tool description works whether the target is picked by accessibility name or by ref ID.

How do external effects like OTP codes or webhooks get verified?

Two tools: create_temp_email spins up a disposable inbox via temp-mail.io, and http_request lets the agent hit any URL to poll a Telegram bot, Slack, a webhook receiver, or a backend API (agent.ts:115-131 and :172-184). A concrete example: after submitting a signup form, the agent calls wait_for_verification_code to poll the disposable inbox, extracts the code, and for a split-input OTP widget it pastes the code in one shot with a synthetic ClipboardEvent (agent.ts:228-236). QA automation on real auth flows works without your test touching the user's email at all.

Can I version scenarios in git, or do I have to use Firestore?

The file at /tmp/assrt/scenario.md is the source of truth at runtime, and the watcher pushes edits to Firestore for cross-device sync. But nothing stops you from committing the same .md file into your repo. In CI you would pass the markdown as the 'plan' argument to assrt_test (or pipe it to the CLI), skip the watcher, and log only the structured result. Because the plan is plain text, git diffs are readable and PR reviews actually work.

Is the whole stack open source, or is there a cloud gate?

The test agent (assrt) and the MCP server (assrt-mcp) are MIT licensed. The cloud scenario store is optional. You can run everything locally, skip saveScenario by setting ASSRT_NO_SAVE=1, and never make a network call beyond Playwright driving the browser and the LLM API. Competitors that charge enterprise-scale subscriptions (often cited in the $5K-to-$7.5K/mo range) bind you to their cloud. Assrt does not.

What is the smallest viable setup to try this?

One shell line: npx assrt-mcp connects the MCP server to your coding agent. Then from inside Claude Code or Cursor call the assrt_test MCP tool with a URL and a plan string. No config file, no selectors, no CI template. The first run writes ~/.assrt/browser-profile, the plan to /tmp/assrt/scenario.md, the result to /tmp/assrt/results/latest.json, and a WebM video recording of the session you can replay at 5x in the auto-opened player.
