QA automation engineering: the defensive code, not the test

Most flakiness is not in your assertions. It is in what the test assumes: that the server is up, that the browser will return, that the DOM is stable, that one OTP field equals one input, that the tab survived the last scenario. This page walks through five defensive patterns from the Assrt source with exact line references, and shows what happens when a real QA automation engineer adds each one to the loop.

Matthew Diakonov, Written with AI

Published April 20, 2026 · 11 min read

Preflight HEAD probe at agent.ts:518 (8s abort)

Promise.race navigate at agent.ts:441 (30s cap)

MutationObserver stability at agent.ts:962 (no sleeps)

Synthetic ClipboardEvent OTP paste at agent.ts:234

Shared browser session at agent.ts:489 (keep alive)

Five defensive patterns, one agent loop

What separates QA automation engineering from a Playwright script

Pattern 1: 8s HEAD probe before Chrome even boots

Pattern 2: Promise.race caps navigate at 30s

Pattern 3: MutationObserver replaces sleep(2000)

Pattern 4: Synthetic ClipboardEvent for split-field OTPs

Pattern 5: Browser stays alive across scenarios

The actual job

Every career guide for this keyword says the same things. Write tests, plug into CI/CD, learn Python or Java, adopt AI-assisted self-healing. None of that is wrong. It is just the outer two percent of the work.

The inner ninety-eight percent is defensive code that handles a specific enumerated list of failure modes. The test code is the easy part. The interesting engineering lives in the surface between the test runner and the real world: DNS, TCP, DOM mutations, browser lifecycle, auth state, OTP widgets. If your automation cannot distinguish a wedged dev server from a slow one, you do not have QA automation. You have a flaky CI job.

What follows is five defensive patterns. Each one fixes a specific failure mode. Each one lives at a line in assrt-mcp/src/core/agent.ts you can open and read.

~180s → 8s

“A wedged Next.js dev server used to cascade into 'MCP client not connected' after three minutes. The preflight probe catches it in eight seconds with a typed error message.”

agent.ts:518-543

Pattern 1 · Preflight HEAD probe, 8 second abort

Before we spend 4 seconds booting Chromium and another second wiring the Playwright MCP stdio bridge, we spend up to 8 seconds checking that the target server is actually there. A HEAD request is cheap. If the server returns 405 or 501 (some servers refuse HEAD), we fall back to GET. Any HTTP response counts as reachable, even 404 or 500. The only failure modes are DNS, connection refused, or abort after 8 seconds.

The win here is not the probe itself. It is that a class of opaque late-stage errors ("MCP client not connected") is replaced with an early, typed, actionable one ("Target URL did not respond within 8000ms"). An engineer debugging a CI failure now has enough information in the first log line to fix it.
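
The probe described above can be sketched roughly as follows. This is a minimal illustration, not the code at agent.ts:518: the `FetchLike` type, the `PREFLIGHT_TIMEOUT_MS` name, the injectable `fetchImpl` parameter, and the wording of the connection-failure message are all assumptions made for the example. It assumes a runtime with global `fetch` and `AbortController` (Node 18 or later).

```typescript
// Hypothetical sketch of a preflight reachability probe; not the Assrt source.
type FetchLike = (
  url: string,
  init: { method: string; signal: AbortSignal },
) => Promise<{ status: number }>;

const PREFLIGHT_TIMEOUT_MS = 8_000; // the 8 second budget described above

async function preflightUrl(
  url: string,
  timeoutMs: number = PREFLIGHT_TIMEOUT_MS,
  fetchImpl: FetchLike = (u, init) => fetch(u, init), // injectable for tests
): Promise<number> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    let res = await fetchImpl(url, { method: "HEAD", signal: controller.signal });
    // Some servers refuse HEAD; retry once with GET under the same deadline.
    if (res.status === 405 || res.status === 501) {
      res = await fetchImpl(url, { method: "GET", signal: controller.signal });
    }
    // Any HTTP status, including 404 or 500, means the server answered.
    return res.status;
  } catch (err) {
    if (controller.signal.aborted) {
      throw new Error(`Target URL did not respond within ${timeoutMs}ms: ${url}`);
    }
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`Target URL is unreachable (${reason}): ${url}`);
  } finally {
    clearTimeout(timer); // never leave the abort timer running
  }
}
```

A caller would run `await preflightUrl(targetUrl)` before launching the browser and let the thrown message become the first log line of the failed run.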

Pattern 2 · Promise.race navigate, 30 second cap

Playwright has its own page-level timeouts, but they rely on the driver being responsive. When the browser process itself is stuck, a high-level wrapper never gets its callback. The defensive pattern: race every navigate against a hand-rolled timer and log a structured event on failure. If the real nav never comes back, the timer wins in bounded time and surfaces a clean error.

Navigate: naive vs engineered

// The naive version: assume the browser will come back.
await browser.navigate(url);
// If navigate hangs, the whole process hangs.
// Eventually the Playwright MCP stdio pipe gets killed and the error is
// "MCP client not connected" with no hint of which URL stalled.
await runScenarios();
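
For contrast, here is a hedged sketch of the engineered side. `NAV_TIMEOUT_MS` and the `agent.navigate.fail` event with `durationMs` are named on this page; the `raceWithTimeout` and `navigateBounded` helpers, the `NavigableBrowser` interface, and the log shape are hypothetical stand-ins, not the code at agent.ts:441.

```typescript
// Hypothetical sketch of a bounded navigate; not the Assrt source.
const NAV_TIMEOUT_MS = 30_000;

interface NavigableBrowser {
  navigate(url: string): Promise<void>;
}

// Rejects if `work` has not settled within `timeoutMs`.
async function raceWithTimeout<T>(work: Promise<T>, timeoutMs: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // stop the timer whichever side won
  }
}

async function navigateBounded(
  browser: NavigableBrowser,
  url: string,
  timeoutMs: number = NAV_TIMEOUT_MS,
  log: (event: Record<string, unknown>) => void = (e) => console.error(JSON.stringify(e)),
): Promise<void> {
  const startedAt = Date.now();
  try {
    await raceWithTimeout(browser.navigate(url), timeoutMs, `navigate(${url})`);
  } catch (err) {
    // Structured event so the stalled URL and elapsed time are in the log.
    log({
      event: "agent.navigate.fail",
      url,
      durationMs: Date.now() - startedAt,
      error: err instanceof Error ? err.message : String(err),
    });
    throw err;
  }
}
```

Promise.race does not cancel the losing promise. The hung navigate stays pending in the background; the point is only that the caller gets control back in bounded time.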

Pattern 3 · MutationObserver stability, not sleep()

Every flaky test suite in the world has an await sleep(2000) somewhere. It is there because an engineer got tired of asking "is the page done yet" and picked a number that worked once. That number is now too short on a slow CI runner and too long on a fast laptop. The defensive replacement is a MutationObserver injected into the page that counts DOM mutations, polled every 500ms from the Node side, with a configurable quiet window (default 2 seconds, capped at 10).

Three details matter. The observer is attached to document.body with subtree: true, so every descendant mutation counts. The polling approach is deliberate, not a workaround: the observer lives in the page context while the agent runs in Node, and evaluate() is the cheap bridge. The cleanup block runs unconditionally, so even if the wait throws we do not leak an observer into a tab that will get reused by the next scenario.
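
A minimal sketch of that wait, under stated assumptions: the `PageLike` interface stands in for whatever exposes evaluate(), the `window.__assrt_observer` name and the `maxWaitMs` overall deadline are invented for the example, and only `window.__assrt_mutations`, the 500ms poll, and the 2 second default with a 10 second cap come from this page. It is not the code at agent.ts:962.

```typescript
// Hypothetical sketch of a MutationObserver-based stability wait; not the Assrt source.
interface PageLike {
  evaluate(expression: string): Promise<unknown>;
}

const MAX_QUIET_MS = 10_000; // quiet window is capped at 10 seconds

// These strings run in the page context, where the observer lives.
const INSTALL_OBSERVER = `(() => {
  window.__assrt_mutations = 0;
  window.__assrt_observer = new MutationObserver((records) => {
    window.__assrt_mutations += records.length;
  });
  window.__assrt_observer.observe(document.body, {
    childList: true, subtree: true, attributes: true, characterData: true,
  });
})()`;
const READ_COUNTER = "window.__assrt_mutations";
const REMOVE_OBSERVER = `(() => {
  if (window.__assrt_observer) window.__assrt_observer.disconnect();
  delete window.__assrt_observer;
  delete window.__assrt_mutations;
})()`;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves true once the counter has not moved for `quietMs`, false on deadline.
async function waitForDomStable(
  page: PageLike,
  quietMs = 2_000,
  maxWaitMs = 30_000, // assumed overall deadline, not from the source
  pollMs = 500,
): Promise<boolean> {
  const quiet = Math.min(Math.max(quietMs, 0), MAX_QUIET_MS);
  await page.evaluate(INSTALL_OBSERVER);
  try {
    const deadline = Date.now() + maxWaitMs;
    let lastCount = -1;
    let lastChangeAt = Date.now();
    while (Date.now() < deadline) {
      const count = Number(await page.evaluate(READ_COUNTER));
      const now = Date.now();
      if (count !== lastCount) {
        lastCount = count; // the DOM moved; restart the quiet window
        lastChangeAt = now;
      } else if (now - lastChangeAt >= quiet) {
        return true;
      }
      await sleep(pollMs);
    }
    return false;
  } finally {
    // Always disconnect so a reused tab does not inherit the observer.
    await page.evaluate(REMOVE_OBSERVER).catch(() => undefined);
  }
}
```

The short sleep between polls is not the sleep(2000) hack in disguise: it only paces the reads, while the exit condition is the observed quiet period.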

Pattern 4 · Synthetic ClipboardEvent for OTP widgets

Login flows that use a six-character OTP rendered as six single-character inputs defeat every naive automation approach. Setting each input's value programmatically does not trigger React state updates. Typing into each one with the keyboard focuses the next field mid-stroke and drops characters. The defensive pattern in Assrt is a single synthetic ClipboardEvent that the widget's own paste handler distributes across the six inputs.

The model is instructed to call evaluate() with one exact expression, substituting only the code digits. That expression is literally in the system prompt at agent.ts:234-236. This is how QA automation engineering handles the class of "widgets the web platform did not design for automation" without shipping a widget-specific shim.
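
The shape of such an expression can be sketched like this. It is an approximation built from the description on this page (a DataTransfer carrying the code, a synthetic paste ClipboardEvent dispatched on the parent of the first input[maxlength="1"]), not a copy of the system prompt text. The `buildOtpPasteExpression` helper is hypothetical.

```typescript
// Hypothetical helper that builds a page-context expression for OTP paste;
// the exact expression in the Assrt system prompt may differ.
function buildOtpPasteExpression(code: string): string {
  if (code.length === 0) {
    throw new Error("OTP code must not be empty");
  }
  // JSON.stringify yields a safely quoted string literal for embedding.
  const literal = JSON.stringify(code);
  return `(() => {
  const first = document.querySelector('input[maxlength="1"]');
  if (!first || !first.parentElement) return false;
  const data = new DataTransfer();
  data.setData('text/plain', ${literal});
  const event = new ClipboardEvent('paste', {
    clipboardData: data,
    bubbles: true,
    cancelable: true,
  });
  first.parentElement.dispatchEvent(event);
  return true;
})()`;
}
```

The result would be passed to evaluate(), for example `evaluate(buildOtpPasteExpression("123456"))`. It only works when the widget implements its own paste handler on that parent or an ancestor, which is the case the pattern targets.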

Pattern 5 · Keep the browser alive across scenarios

The run loop iterates through all scenarios inside a try / finally. Most automation frameworks close the browser in the finally. Assrt explicitly does not. The inline comment in the code says so verbatim: Don't close the browser here — keep it alive so the user can take over and interact after the test finishes.

That single choice unlocks five properties that carry across every scenario in the run: auth state, disposable email context, wizard progress, browser devtools history, and, most importantly, the ability for a human to take over the live tab. When a scenario fails, the tab is still there. You attach with noVNC, click around, figure out the fix, write it into the plan, and rerun.

Cookies and session tokens

Log in during Case 1 and your Supabase or NextAuth cookie is still set when Case 2 starts. You are not re-authenticating on every scenario.

The disposable email address

tempEmail is assigned to the agent instance, not the scenario. Case 1 creates it; Case 5 can still poll it for a reset-password link.

Draft state, wizard progress, cart

Anything the app keeps in IndexedDB, localStorage, or server-side session cookies rides along between #Case blocks for free.

Your VNC takeover slot

When a test fails mid-flow, the same Chromium stays open so you can attach with noVNC, click around, and feed a fix back into the plan.

Console and network breadcrumbs

The tab's devtools timeline survives across cases, so post-mortem analysis has continuous console messages and XHR traffic.
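
The run loop behind this pattern might look like the sketch below. The `Scenario`, `SharedBrowser`, and `RunEvent` types and the `runAll` name are hypothetical; the `scenario_complete` event with `passed: false` and the finally block that skips close follow the behaviour this page attributes to agent.ts:478-492.

```typescript
// Hypothetical sketch of a run loop that keeps one browser alive; not the Assrt source.
interface Scenario {
  name: string;
  steps: string[];
}

interface SharedBrowser {
  close(): Promise<void>; // exists, but the run loop never calls it
}

type RunEvent = { type: "scenario_complete"; name: string; passed: boolean; error?: string };

async function runAll(
  browser: SharedBrowser,
  scenarios: Scenario[],
  runScenario: (browser: SharedBrowser, scenario: Scenario) => Promise<boolean>,
  emit: (event: RunEvent) => void,
): Promise<number> {
  let passedCount = 0;
  try {
    for (const scenario of scenarios) {
      try {
        // Every scenario reuses the same browser, so cookies and tabs carry over.
        const passed = await runScenario(browser, scenario);
        if (passed) passedCount += 1;
        emit({ type: "scenario_complete", name: scenario.name, passed });
      } catch (err) {
        // A crashed scenario is reported as failed and the loop moves on.
        const error = err instanceof Error ? err.message : String(err);
        emit({ type: "scenario_complete", name: scenario.name, passed: false, error });
      }
    }
  } finally {
    // Deliberately no browser.close() here: the session stays open so a
    // human can take over the live tab after the run.
  }
  return passedCount;
}
```

Whoever owns the session (in Assrt's case, the end of the MCP call or the start of the next assrt_test) is responsible for closing it later.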

Pattern bonus · Snapshot on every tool failure

A sixth pattern is worth calling out because it replaces brittle retry logic with something that actually transfers to new codebases. Every tool dispatch is wrapped in a try / catch. On failure the catch block immediately calls browser.snapshot() for a fresh accessibility tree and attaches it to the error message returned to the LLM. The next turn of the conversation has current context, not stale plan text.

Call snapshot() immediately on failure

agent.ts:1012-1020. The catch block around every tool dispatch first fires browser.snapshot() to get a fresh accessibility tree, then composes an error string that ends with 'Please call snapshot and try a different approach.'

Append the tree to the tool result the model sees

The updated a11y state is concatenated to the error message and returned as the tool-call result. The next model turn has live context, not a stale plan.

No retry tables, no heuristics, no fixed backoff

The recovery path is just 'snapshot, summarize, let the LLM decide.' Fewer code paths, fewer edge cases, more transfer to new sites.

Scenario crashes do not poison the run

agent.ts:478-487. If a whole scenario throws, the run loop catches it, emits scenario_complete with passed: false, and moves on. One busted case does not cancel the other nine.
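
A hedged sketch of the snapshot-on-failure wrapper follows. The `SnapshotBrowser` interface, the `ToolHandler` type, the `dispatchTool` name, and the exact layout of the message are assumptions; only the call to snapshot() in the catch block and the closing sentence are taken from this page.

```typescript
// Hypothetical sketch of a tool dispatcher with snapshot-based recovery; not the Assrt source.
interface SnapshotBrowser {
  snapshot(): Promise<string>; // returns the current accessibility tree as text
}

type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

async function dispatchTool(
  browser: SnapshotBrowser,
  tools: Record<string, ToolHandler>,
  name: string,
  args: Record<string, unknown>,
): Promise<string> {
  try {
    const handler = tools[name];
    if (!handler) throw new Error(`Unknown tool: ${name}`);
    return await handler(args);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    let tree: string;
    try {
      // Fetch fresh page state so the next model turn is not working from stale context.
      tree = await browser.snapshot();
    } catch {
      tree = "(snapshot unavailable)";
    }
    // The error becomes the tool result; nothing is retried here.
    return [
      `Error: ${reason}`,
      "",
      "Current accessibility tree:",
      tree,
      "",
      "Please call snapshot and try a different approach.",
    ].join("\n");
  }
}
```

Returning the failure as an ordinary tool result, rather than throwing, is what lets the model decide the next step instead of a retry table.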

The numbers, in one table

Four constants control the five patterns. They are all small, bounded, and in source. Anyone auditing the QA runtime can read them in under a minute.

8s preflight HEAD budget

30s navigate Promise.race cap

2s default DOM quiet window

10s max DOM quiet window

Cases in one run

With one shared browser, one auth session, and one disposable email, all surviving between #Case blocks.

Zero vendor lock-in

Plans are plain markdown, the agent is MIT-licensed, the runtime is Playwright. You own all of it.

No proprietary YAML

Every tool call maps to real Playwright MCP. Nothing is translated into a vendor DSL you cannot take with you.

What changes when the defensive layer is first-class

Against a naive Playwright wrapper and against the $5K to $7.5K per month enterprise QA platforms.

Feature	Typical approach	Assrt (MIT, self-hosted)
Detects a wedged dev server	Fails after ~3 min with opaque 'MCP client not connected'	8s HEAD probe with GET fallback at agent.ts:518
Caps navigation time	Inherits driver default, often unbounded	Promise.race with NAV_TIMEOUT_MS = 30_000 at agent.ts:441
Waits for async content	sleep(2000) or selector-based timeout	MutationObserver with debounced quiet window at agent.ts:962
Handles split-field OTPs	Per-input type(), often drops digits	Synthetic ClipboardEvent + DataTransfer at agent.ts:234
Browser lifecycle across cases	Fresh browser per test (slow, no auth carryover)	One session; finally block skips close at agent.ts:489
Failure recovery strategy	Retry tables and custom waits per selector	Fresh a11y snapshot injected into the next model turn
Test artifact format	Proprietary YAML, tied to vendor runtime	Plain #Case markdown at /tmp/assrt/scenario.md
License and hosting	$5K to $7.5K per month, cloud-gated	MIT, self-hosted, zero vendor lock-in

Prices reference publicly-listed enterprise tiers; self-serve plans vary.

Frequently asked questions

What does QA automation engineering really mean, past the job-description version?

It means owning the code around the test, not just the test itself. An assertion like 'the login form submits' is trivial. What is not trivial is handling the failure modes that happen before the assertion: the dev server is wedged, the page is mid-render, the OTP widget is a row of single-character inputs, the browser died on the previous scenario, the nav call hung with no timeout. Assrt's agent.ts carries five specific defensive patterns for exactly those cases, and you can read them at assrt-mcp/src/core/agent.ts lines 441, 518, 962, 234, and 489. That defensive layer is where QA automation engineering actually lives.

Why bother with an 8-second HTTP HEAD probe before Chrome launches?

Because a wedged Next.js dev server, a half-booted Docker container, or a dropped VPN all present the same way: the TCP handshake completes but the HTTP response never does. Without a preflight, browser.navigate(url) hangs until the Playwright MCP stdio connection gets killed by the kernel, and you see 'MCP client not connected' after roughly three minutes with no hint of the root cause. The preflightUrl function at agent.ts:518 fires a HEAD, falls back to GET on 405 or 501, accepts any HTTP status as 'reachable,' and throws a typed error on abort or connection-refused. You get the real failure in 8 seconds instead of 3 minutes.

Why Promise.race on the navigate call when Playwright already has timeouts?

Because Playwright's own timeouts depend on the driver being responsive, and when the browser process itself is stuck, your wrapper never hears back. The code at agent.ts:441-454 wraps browser.navigate(url) in Promise.race with a hard 30-second ceiling and emits a structured agent.navigate.fail log event with durationMs. That guarantees a bounded failure no matter what layer breaks, which is the one property a CI step actually needs.

How does the DOM stability wait avoid the usual 'sleep 2 seconds' hack?

It injects a MutationObserver into the target page, counts mutations over time, and only returns when the count plateaus for the configured quiet window (default 2 seconds, capped at 10). See agent.ts:962-988. Implementation detail worth noting: it polls the counter every 500ms rather than subscribing, because the observer lives in the page context while the agent lives in Node, and the cheapest bridge is evaluate('window.__assrt_mutations'). The observer is torn down in a finally block so a single run cannot leak an observer into a long-lived tab.

What is the synthetic ClipboardEvent for OTPs about?

Split-field OTP widgets (six single-character inputs) do not accept a plain value assignment, and typing into them one by one triggers focus-change logic that skips characters or submits early. The system prompt at agent.ts:234-236 gives the model an exact evaluate expression that builds a DataTransfer with the code and dispatches a synthetic paste ClipboardEvent against the parent of the first input[maxlength='1']. The widget's own paste handler then distributes the digits. That one instruction turns a test case that used to be impossible without a dedicated library into a routine #Case step.

Why keep the browser open after a test run instead of closing it?

Two reasons, both load-bearing. One: subsequent scenarios in the same assrt_test call inherit cookies and auth state, so you log in once and the next ten cases run authenticated. Two: when a test fails, an engineer can take over the live tab, poke the app, and copy-paste the fix into the plan. The finally block in the run() loop at agent.ts:489-492 explicitly does not call browser.close() and has the inline comment 'Don't close the browser here — keep it alive so the user can take over.' Close happens only when the MCP call ends or the next assrt_test starts.

Is this open source or is there a pricing gate?

Both Assrt (the agent) and assrt-mcp (the MCP server) are open source under MIT. The plans you write are plain #Case markdown; the Playwright code the agent runs is real Playwright, not a proprietary YAML or custom runtime. You can self-host the whole thing, skip the cloud scenario store, and never send anything to a third party besides Playwright driving the browser and the LLM API of your choice. Competitors that sit at $5K to $7.5K per month typically gate the runtime behind their cloud and ship tests in a proprietary format you cannot take with you. That is the opposite of what QA automation engineering should feel like.

How do I try this in under five minutes?

Run npx assrt-mcp to register the MCP server with your coding agent, then in Claude Code or Cursor call assrt_test with a URL and a plan like '#Case 1: home loads\n- Verify the hero heading is visible'. The agent writes the plan to /tmp/assrt/scenario.md, runs it in a real Chromium via Playwright MCP, records a webm video, and opens a 1x/5x/10x player when it finishes. The five defensive patterns on this page are all already in the runtime. You get them by default.

