Automation in QA, without CSS selectors

Every roundup post lists Selenium, Cypress, Playwright, and Katalon. None explains how modern AI-driven automation actually works. This page walks through the real mechanism, straight from the Assrt source: an accessibility-tree snapshot with stable ref=e5 IDs, a #Case markdown plan on disk, and a 1-second fs.watch sync.

Matthew Diakonov
9 min read
4.9 · from Playwright MCP + Claude Haiku 4.5 in production
Plans are plain .md files, diffable in git
Self-heal via fresh a11y snapshot on every tool error
MIT license, no cloud required, no vendor lock-in

The thing that separates AI QA from the old record-and-replay

A classic automation stack starts from a selector. You open DevTools, copy an XPath, paste it into a step. The step lives or dies on that string. Move the button into a new wrapper, the selector misses, the test fails, and you go hunt. Every team with more than 200 tests has the same dashboard of flaky runs.

An AI QA agent does not start from a selector. It asks the browser for a snapshot. Playwright MCP returns the accessibility tree with one stable reference per interactive element. The model decides which element it wants, passes that reference back, and the driver resolves it against the current DOM. The selector was never written down. It cannot decay.

What a real snapshot looks like

This is the shape the snapshot tool returns (tool definition at assrt/src/core/agent.ts:28). Every element carries a role, a name, and a ref. The model sees this tree, not HTML.

snapshot output
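The actual output block is not reproduced here, but the shape is an indented role/name tree where every interactive element carries a ref. An illustrative sketch (element names and ref numbers are ours, not from the repo):

```yaml
- banner [ref=e1]:
  - link "Home" [ref=e2]
  - button "Account" [ref=e3]
- main [ref=e4]:
  - textbox "Email" [ref=e5]
  - button "Sign up" [ref=e6]
- contentinfo [ref=e7]:
  - link "Privacy policy" [ref=e8]
```

The model targets "the Email textbox, ref=e5" — roles and accessible names, never a DOM path.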

Same test, two models of 'automation'

// Traditional QA automation: selectors are yours to babysit.
await page.locator('header.navbar div.user-menu > button.primary').click();
await page.locator('form#signup input[name="email"]').fill('test@x.io');
await page.locator('form#signup button[type="submit"]').click();
await expect(page.locator('div.toast.toast-success >> nth=0'))
  .toContainText('Check your inbox');
// One DOM refactor and every selector above needs surgery.
86% fewer selector strings
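The Assrt side of the same comparison is not code at all. A plan covering the identical flow (wording ours, format as described in the next section) is just:

```markdown
#Case 1: Signup confirmation
- Open the user menu in the header
- Enter test@x.io into the email field on the signup form
- Submit the form
- Verify a success toast saying "Check your inbox" appears
```

No selector appears anywhere; the model resolves each step against the current accessibility snapshot at run time.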

The test itself is a markdown file

No YAML, no proprietary DSL, no GUI scenario-builder output pretending to be JSON. A test in Assrt is a plain text file. The MCP server writes it to /tmp/assrt/scenario.md at the start of every run and parses it with this regex:

agent.ts
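The pattern itself is quoted verbatim elsewhere on this page (agent.ts:621). A self-contained sketch of how splitting on it yields scenarios — the regex is from the source, the surrounding variable names are ours:

```typescript
// Pattern from agent.ts:621; the code around it is an illustrative sketch.
const caseRe = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

const plan = [
  "#Case 1: Search flow",
  "- Type coffee into the search box",
  "#Case 2: Login",
  "- Open the login page",
].join("\n");

// Each non-empty chunk after the split is one scenario body.
const sections = plan.split(caseRe).filter((s) => s.length > 0);
console.log(sections.length); // 2 scenarios
```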

and treats each matched section as one scenario. That is the entire contract. Here is a valid plan:

/tmp/assrt/scenario.md
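One valid plan, matching the search-flow example the repo itself uses (quoted later on this page):

```markdown
#Case 1: Search flow
- Type coffee into the search box
- Press Enter
- Verify at least one result appears
```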

The fs.watch loop nobody documents

Your coding agent opens /tmp/assrt/scenario.md, edits a step, hits save. Assrt has an fs.watch listener registered on that exact path (see assrt-mcp/src/core/scenario-files.ts:97-103). On change it debounces for 1000 ms, reads the new content, and calls updateScenario to push it to cloud storage. The next test run picks up the edit without a copy-paste. This is how a plan written in your editor stays in sync with the saved scenario UUID.

assrt-mcp/src/core/scenario-files.ts:97-103
activeWatcher = watch(SCENARIO_FILE, { persistent: false }, (_event) => {
  if (syncDebounceTimer) clearTimeout(syncDebounceTimer);
  syncDebounceTimer = setTimeout(() => {
    syncToFirestore(scenarioId);
  }, 1000);
});

What happens when you run assrt_test

One MCP call from your coding agent kicks off this sequence. No CI job, no selector map, no page object file.

Inputs, agent, browser, outputs

Your plan, the target URL, and the pass criteria go in. The Assrt agent drives Playwright MCP, a disposable email inbox, and raw HTTP calls. Results plus a video come out.

One scenario, step by step

Participants: your agent · Assrt MCP · disk (/tmp/assrt) · Playwright MCP · the LLM loop.

Your agent → Assrt MCP: assrt_test({ url, plan })
Assrt MCP → disk: write /tmp/assrt/scenario.md
disk → Assrt MCP: fs.watch: edited
LLM loop → Playwright MCP: snapshot()
Playwright MCP → LLM loop: accessibility tree + refs
LLM loop → Playwright MCP: tool: click(ref=e5)
Playwright MCP: Playwright click, updated DOM
LLM loop → Assrt MCP: step status
Assrt MCP → disk: write results/latest.json

Numbers from the source

These are not marketing stats. They are counts of things in the repo.

17 semantic tools the agent can call
621: line of the #Case regex in agent.ts
1-second debounce on scenario sync
Automatic retries on API rate-limit errors

Sources: TOOLS array in assrt/src/core/agent.ts:16-196, parseScenarios at agent.ts:620-631, watcher at assrt-mcp/src/core/scenario-files.ts:97-103, retry loop at agent.ts:696-745.

The self-healing move people keep describing wrong

"Self-healing" in AI QA marketing usually means "we trained a model to guess a new selector when the old one breaks." That is not what Assrt does. The mechanism is simpler and smaller. On any tool failure the catch block at assrt/src/core/agent.ts:1012-1020 grabs a fresh snapshot, prepends it to the tool result, and appends the literal string "Please call snapshot and try a different approach." The model gets the new page state inline and decides on its own whether to retry, scroll, or abort. There is no brittle mapping table. The recovery logic is: re-read the room.
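A minimal sketch of that catch path, with our own names and stub types — only the mechanism (fresh snapshot plus the fixed instruction string) comes from the source; the real code lives at assrt/src/core/agent.ts:1012-1020:

```typescript
// Illustrative reconstruction of the self-heal path described above.
interface BrowserLike {
  execute(tool: string, args: Record<string, unknown>): Promise<string>;
  snapshot(): Promise<string>;
}

async function runToolWithSelfHeal(
  browser: BrowserLike,
  tool: string,
  args: Record<string, unknown>,
): Promise<string> {
  try {
    return await browser.execute(tool, args);
  } catch (err) {
    // On any failure: re-read the room, then hand the model the new state.
    const freshTree = await browser.snapshot();
    return [
      `Error executing ${tool}: ${(err as Error).message}`,
      `Current page snapshot:\n${freshTree}`,
      "Please call snapshot and try a different approach.",
    ].join("\n\n");
  }
}
```

Whatever comes back — retry with a new ref, scroll, abort — is the model's decision, not a retry table's.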

what a recovery turn looks like in the event log
100% yours

Tests are yours to keep: they live in /tmp/assrt/scenario.md, sync to your cloud workspace, and you can commit the same markdown into git.

assrt-mcp readme

0 runtime dependencies on a vendor cloud. Literally zero.

What the agent actually exposes

Seventeen tool definitions, each semantic, each matching one thing a human would do. No CSS, no XPath, no private selectors.

snapshot

Returns the Playwright MCP accessibility tree. Every interactive element gets a stable ref like e5. Called before every interaction.

click / type_text

Take an element description plus a ref. The ref wins when present. If it is stale, the model is told to snapshot again and retry.

assert

The only way a scenario produces a pass or fail. Carries description, passed boolean, and an evidence string.

create_temp_email + wait_for_verification_code

Disposable inbox so auth flows work without your test touching the user's mailbox. OTP codes are pasted in one synthetic ClipboardEvent.

http_request

Hit any URL to verify external side effects: Telegram bot polls, Slack messages, webhook receivers, backend APIs.

wait_for_stable

Injects a MutationObserver, waits until the DOM stops changing. Replaces arbitrary sleep() calls on async pages.

complete_scenario

Required terminal call. Closes the scenario with a summary and the overall pass flag. Without this, the run never ends.
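The page does not reproduce the TOOLS array itself, but an Anthropic-style tool definition for click would plausibly look like this. Every field value below is our guess; the point is the shape — semantic inputs only:

```typescript
// Hypothetical entry in the TOOLS array (agent.ts:16-196): description + ref,
// no selector field anywhere in the schema.
const clickTool = {
  name: "click",
  description:
    "Click an element. Prefer the ref from the most recent snapshot; " +
    "if the ref is stale, snapshot again and retry.",
  input_schema: {
    type: "object",
    properties: {
      element: { type: "string", description: "Human-readable element description" },
      ref: { type: "string", description: "Stable snapshot ref, e.g. e5" },
    },
    required: ["element"],
  },
} as const;

// The schema admits no CSS or XPath: the only ways to target are name and ref.
console.log(Object.keys(clickTool.input_schema.properties)); // element, ref
```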

Where this fits in the wider QA automation landscape

Feature | Traditional platform | Assrt
Test format | Proprietary YAML / recorded JSON | #Case markdown on disk
Element targeting | CSS / XPath selectors you author | Accessibility refs the browser provides
Self-healing | Selector guessing model | Fresh a11y snapshot returned with every tool error
Where tests live | Vendor cloud UI | /tmp/assrt/scenario.md, optionally in git
Licensing | Commercial, often $5K-$7.5K/mo enterprise tiers | MIT, run locally, no cloud required
LLM choice | Locked to vendor's model | Claude Haiku 4.5 default, Gemini supported, swap via flag
Output | Dashboard screenshots | WebM video + JSON report + screenshot folder

The setup, end to end

1. Install the MCP server
npx assrt-mcp registers the server globally with your coding agent. One shell line, no config.

2. Call assrt_test
From Claude Code, Cursor, or any MCP-speaking client, pass a URL and a plain-text plan. The server writes /tmp/assrt/scenario.md.

3. The agent snapshots the page
Playwright MCP is launched, the target URL loads, and the accessibility tree is returned with a ref per element.

4. The LLM picks tools step by step
Each turn: read snapshot, choose click/type/assert, execute via Playwright MCP, attach the updated screenshot as an image, loop.

5. Failures trigger a snapshot, not a retry table
On any error the agent gets a fresh accessibility tree inline and decides what to do next. The recovery is model reasoning, not a hardcoded script.

6. Results land on disk
JSON report in /tmp/assrt/results/latest.json, WebM recording in the run directory, screenshots by step. The MCP response returns the file paths.

7. Edit the markdown, the watcher syncs
Open /tmp/assrt/scenario.md in any editor. On save, fs.watch fires, debounces 1 s, and updates your cloud scenario UUID.
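The page never shows the report schema, so the following shape for /tmp/assrt/results/latest.json is a hypothetical sketch, purely to fix ideas — only the path and the video/screenshot outputs are confirmed by the source:

```json
{
  "scenario": "Search flow",
  "passed": true,
  "steps": [
    { "action": "type_text", "element": "search box", "status": "ok" },
    { "action": "press_key", "key": "Enter", "status": "ok" },
    { "action": "assert", "description": "at least one result appears", "passed": true }
  ],
  "video": "recording.webm",
  "screenshots": "screenshots/"
}
```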

What you get that a tool-list post will not tell you

  • Zero CSS or XPath selectors in the plan, ever
  • Tests diff cleanly in git, because they are plain text
  • Self-heal path is one snapshot + one retry decision by the model
  • Disposable email + http_request cover auth and webhook flows
  • A WebM recording is saved for every run, with a built-in 5x player
  • Swap the LLM provider per run (Anthropic or Gemini)
  • MIT licensed end to end; no cloud dependency required

A 90-second smoke test you can run right now

from your coding agent
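A prompt in the spirit of this section, assuming the assrt_test arguments described earlier — the URL and steps are placeholders, not from the repo:

```markdown
Call assrt_test with:
  url: https://your-staging-app.example
  plan: |
    #Case 1: Smoke
    - Open the home page
    - Verify the main heading renders
    - Click the primary navigation link
    - Verify the new page loads without an error state
```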

Want to see your own app run through this loop?

15 minutes. Bring a staging URL and we will run a plan against it live, then show you the markdown, the snapshot, and the WebM recording on the other side.

Book a call

Questions people actually ask about automation in QA

What does 'automation in QA' really mean when an AI agent is driving the browser?

It means three concrete things, all verifiable in the Assrt source. One: the agent does not own selectors. It calls a snapshot tool that returns the Playwright MCP accessibility tree, each element tagged with a stable ref like ref=e5 (assrt/src/core/agent.ts:28 and :213-218). Two: the test plan is a plain text file in #Case N: format parsed by the regex /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi at agent.ts:621, not a YAML step list. Three: every scenario and result is kept on disk at /tmp/assrt/scenario.md and /tmp/assrt/results/latest.json, watched with fs.watch, and an edit triggers a 1-second debounced sync back to Firestore (assrt-mcp/src/core/scenario-files.ts:97-103). That combination, accessibility tree plus markdown plan plus file watch, is what replaces traditional record-and-replay.

How is this different from Selenium, Cypress, or vanilla Playwright automation?

Selenium and Cypress bind each test step to a selector you author, usually a CSS path or XPath. When a div wraps the button, the selector breaks. Assrt never writes a selector in the plan. The model sees the accessibility snapshot, picks the element it wants, and passes the ref string back. The browser driver resolves that ref against the live tree. If the DOM reshuffles but the role, name, and value are still there, the same plan keeps working. The plan itself stays human-readable, more like a spec than a script.

Show me a real Assrt #Case file and what happens when it runs.

A file literally looks like '#Case 1: Search flow\n- Type coffee into the search box\n- Press Enter\n- Verify at least one result appears'. The MCP server writes it to /tmp/assrt/scenario.md via writeScenarioFile in assrt-mcp/src/core/scenario-files.ts:42. The agent loop in assrt/src/core/agent.ts then snapshots, matches each step to a tool call (click, type_text, assert, complete_scenario), and at the end writes the result JSON to /tmp/assrt/results/latest.json. Edit the markdown, save, and the watcher pushes the new plan back to the cloud within a second (scenario-files.ts:97-103 debounce timer).

What is the self-healing mechanism, exactly?

On any tool failure the catch block in agent.ts:1012-1020 immediately calls browser.snapshot() again and attaches the fresh accessibility tree to the tool result returned to the model. The error message ends with 'Please call snapshot and try a different approach.' That is the entire contract. The LLM gets new context that encodes the current page state, then decides whether to retry with a different ref, scroll, or give up and call complete_scenario. There is no brittle retry table. The browser recovery path is just 'snapshot, summarize, keep reasoning'.

Which model runs the loop, and can I swap it?

Default is claude-haiku-4-5-20251001, defined as DEFAULT_ANTHROPIC_MODEL at assrt/src/core/agent.ts:9. Gemini is supported via DEFAULT_GEMINI_MODEL = 'gemini-3.1-pro-preview' at :10, and the provider is chosen per-run via a constructor flag. You can pass a model override on the MCP call (see the 'model' parameter on assrt_test in assrt-mcp/src/mcp/server.ts:351). The conversation uses a sliding window so long scenarios do not blow the context (agent.ts:1064-1080).

Where is the part that actually sends keystrokes to the browser?

The TOOLS array at agent.ts:16-196 defines semantic actions (navigate, click, type_text, select_option, scroll, press_key, evaluate, assert, complete_scenario). Each tool call routes through the browser manager in assrt-mcp/src/core/browser.ts, which speaks to the Playwright MCP server running locally. Clicks use a human-readable element description plus the ref from the last snapshot, so the same tool description works whether the target is picked by accessibility name or by ref ID.

How do external effects like OTP codes or webhooks get verified?

Two tools: create_temp_email spins up a disposable inbox via temp-mail.io, and http_request lets the agent hit any URL to poll a Telegram bot, Slack, a webhook receiver, or a backend API (agent.ts:115-131 and :172-184). A concrete example: after submitting a signup form, the agent calls wait_for_verification_code to poll the disposable inbox, extracts the code, and for a split-input OTP widget it pastes the code in one shot with a synthetic ClipboardEvent (agent.ts:228-236). QA automation on real auth flows works without your test touching the user's email at all.

Can I version scenarios in git, or do I have to use Firestore?

The file at /tmp/assrt/scenario.md is the source of truth at runtime, and the watcher pushes edits to Firestore for cross-device sync. But nothing stops you from committing the same .md file into your repo. In CI you would pass the markdown as the 'plan' argument to assrt_test (or pipe it to the CLI), skip the watcher, and log only the structured result. Because the plan is plain text, git diffs are readable and PR reviews actually work.

Is the whole stack open source, or is there a cloud gate?

The test agent (assrt) and the MCP server (assrt-mcp) are MIT licensed. The cloud scenario store is optional. You can run everything locally, skip saveScenario by setting ASSRT_NO_SAVE=1, and never make a network call beyond Playwright driving the browser and the LLM API. Competitors that charge enterprise-scale subscriptions (often cited in the $5K-to-$7.5K/mo range) bind you to their cloud. Assrt does not.

What is the smallest viable setup to try this?

One shell line: npx assrt-mcp connects the MCP server to your coding agent. Then from inside Claude Code or Cursor call the assrt_test MCP tool with a URL and a plan string. No config file, no selectors, no CI template. The first run writes ~/.assrt/browser-profile, the plan to /tmp/assrt/scenario.md, the result to /tmp/assrt/results/latest.json, and a WebM video recording of the session you can replay at 5x in the auto-opened player.
