
Playwright web-first assertions retry: what the 5-second budget actually covers.

Every web-first matcher in Playwright shares the same retry engine: poll the same locator with the same matcher about every 100 ms for up to 5000 ms, then pass or fail. It is a precision tool for one specific problem, and it is weirdly silent about the problem it was not built to solve. This page walks through what the retry actually does under the hood, the three escalation points when the default is not enough, and a second retry tier layered on top: a MutationObserver that waits for DOM quiet and a re-snapshotted accessibility tree that lets the next retry use a different ref instead of the same stale one.


Matthew Diakonov, Written with AI

Published April 23, 2026 · 12 min read


Real Playwright under the hood via @playwright/mcp@0.0.70, not a proprietary runtime

Open source, self-hosted, free. No cloud-only lock-in.

Scenarios are markdown on disk, not rows in a vendor database

Two retry tiers, not one

What Playwright covers, what Assrt adds on top

Playwright: matcher-level retry, 5000 ms, same locator, ~100 ms poll.

Assrt: MutationObserver DOM-quiet wait, 60 s ceiling.

Assrt: fresh a11y snapshot auto-injected on any tool failure.

Evidence field names the cause, not a 5-second timeout.


The 5-second budget, by the numbers

Before talking about what to do when the web-first retry is not enough, it helps to name exactly what it does. The numbers below come from the Playwright test runner defaults and from /Users/matthewdi/assrt-mcp/src/core/agent.ts for the Assrt side. The Assrt numbers are enforced by explicit Math.min clamps at lines 957 to 958, which means you cannot accidentally write a scenario that hangs for five minutes on one step.

5000 ms: default expect.timeout in Playwright

~100 ms: matcher poll interval (approx.)

500 ms: wait_for_stable poll tick

60 s: wait_for_stable hard ceiling

Matcher retries

Up to 50 polls on the same locator inside the default budget. Great for React hydration. Useless for a renamed button.

Budget failure mode

The retry engine does not know what the 5000 ms was spent on. You get Timed out 5000ms waiting for expect(locator).toBeVisible() and no hint about whether the element is late or gone.
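To make the budget arithmetic concrete, here is a minimal sketch of a fixed-budget, fixed-interval retry loop. It assumes the 5000 ms budget and ~100 ms interval quoted above and is not Playwright's implementation: pollSameCheck and the injectable Clock are illustrative names. It models two points. The check is the same closure on every tick, and the failure message can only report the budget, never the cause.

```typescript
// Illustrative model of a web-first style retry loop (not Playwright source).
// The same check runs every tick until it passes or the budget runs out.

type Clock = {
  now: () => number;
  sleep: (ms: number) => Promise<void>;
};

const realClock: Clock = {
  now: () => Date.now(),
  sleep: (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
};

interface PollResult {
  passed: boolean;
  attempts: number;
  message: string;
}

async function pollSameCheck(
  check: () => Promise<boolean>, // the SAME query + matcher on every tick
  timeoutMs = 5000,
  intervalMs = 100,
  clock: Clock = realClock,
): Promise<PollResult> {
  const start = clock.now();
  let attempts = 0;
  while (true) {
    attempts++;
    if (await check()) {
      return { passed: true, attempts, message: "passed" };
    }
    // Give up when the next sleep would reach the end of the budget.
    if (clock.now() - start + intervalMs >= timeoutMs) break;
    await clock.sleep(intervalMs);
  }
  // All the loop knows is how long it waited, not why the check kept failing.
  return {
    passed: false,
    attempts,
    message: `Timed out ${timeoutMs}ms waiting for check`,
  };
}
```

With a check that never passes, the loop makes 50 attempts (t = 0, 100, ..., 4900 ms) and then fails. With a check that passes on the third attempt, it returns after 200 ms of simulated time. Neither outcome tells a late element apart from a renamed one.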

Five retry tiers, in the order you should reach for them

Most guides on this topic stop at tier two. That is fine for a stable .spec.ts where the selectors are hand-picked and the authors know the DOM. Once an AI is writing the tests, or the page has a long async settle, or the UI ships label changes weekly, you need the tiers below. The first two are stock Playwright; the last three live inside the Assrt agent runtime.

Matcher-level retry

expect(locator).toBeVisible({ timeout: 5000 }) polls the DOM for the locator, checks the matcher, and loops on the same pair for up to 5 seconds. Best fit for hydration races.

Locator-level retry (manual)

expect.poll or expect.toPass lets you retry across multiple actions. Still uses the selectors and matchers you wrote. Does not re-read the page to pick a different ref.

DOM-quiet wait (Assrt tier 1)

wait_for_stable injects a MutationObserver and waits for the whole page to stop mutating for a configurable stable window. Better than polling one locator for async settle.

Accessibility snapshot (Assrt tier 2)

After DOM is quiet, a fresh a11y tree is taken. The agent reads role plus accessible name, not a CSS selector, so a renamed class or moved button does not change the lookup.

Replan on failure (Assrt tier 3)

If any tool call throws, the error handler at agent.ts:1014-1017 prepends the first 2000 characters of a fresh a11y snapshot to the agent's next message. The agent retries with an updated view, not the stale one.

Matcher-level retry versus agent-level retry, side by side

Same intent, very different retry surface. The left side is the canonical Playwright pattern: a web-first matcher doing tier-one retry for up to five seconds. The right side is an Assrt scenario expressing the same goal in English, driving tiers three through five under the hood.

Same intent, two retry tiers

import { test, expect } from "@playwright/test";

test("welcome heading appears", async ({ page }) => {
  // Relative URL: assumes baseURL is set in playwright.config.ts.
  await page.goto("/app/dashboard");

  // expect.timeout default is 5000 ms.
  // Polls the SAME locator with the SAME matcher every ~100 ms.
  // If the heading is renamed to "Welcome!" the retry budget
  // still burns all 5 seconds, then fails with "not found".
  await expect(
    page.getByRole("heading", { name: "Welcome back" })
  ).toBeVisible();
});


The actual MutationObserver, not a description of one

The existing guides on this topic describe DOM-quiet waits in the abstract. Here is the real implementation, copied from the Assrt agent source. Note the two Math.min clamps, the 500 ms poll, and the fact that stableSince resets on every new mutation so the timer only counts uninterrupted quiet.

assrt-mcp/src/core/agent.ts

This is the piece a standard web-first assertion cannot do. It watches the whole document subtree rather than polling one locator, which means heavy pages (streaming chat, virtualised tables, delayed GraphQL) get a realistic settle without you having to raise expect.timeout globally and hide real timing bugs in the process.
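The following is a hypothetical TypeScript sketch of the same idea, not the Assrt code. Only the clamps, the 500 ms tick and the rule that the quiet timer resets on every mutation come from the description above. In a real run, the mutation counter would live in the page (for example, a MutationObserver registered on document with subtree: true, installed through Playwright's page.evaluate) and be read back on each tick. That wiring is an assumption and is left out so the loop can run against a fake clock.

```typescript
// Hypothetical DOM-quiet wait. Names are illustrative, not the Assrt source.

interface QuietOptions {
  timeoutSeconds?: number; // overall ceiling: default 30, clamped to 60
  stableSeconds?: number; // required quiet window: default 2, clamped to 10
  tickMs?: number; // how often the mutation counter is read
}

interface QuietResult {
  stable: boolean;
  elapsedMs: number;
  totalMutations: number;
}

async function waitForDomQuiet(
  // Any async source of a cumulative mutation count. In a browser this would
  // read a counter bumped by an injected MutationObserver.
  readMutationCount: () => Promise<number>,
  opts: QuietOptions = {},
  now: () => number = () => Date.now(),
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<QuietResult> {
  const timeoutMs = Math.min(opts.timeoutSeconds || 30, 60) * 1000;
  const stableMs = Math.min(opts.stableSeconds || 2, 10) * 1000;
  const tickMs = opts.tickMs ?? 500;

  const start = now();
  let lastCount = await readMutationCount();
  let stableSince = start;

  while (now() - start < timeoutMs) {
    await sleep(tickMs);
    const count = await readMutationCount();
    if (count !== lastCount) {
      // New mutations restart the quiet timer, so only uninterrupted
      // quiet counts toward the stable window.
      lastCount = count;
      stableSince = now();
    } else if (now() - stableSince >= stableMs) {
      return { stable: true, elapsedMs: now() - start, totalMutations: count };
    }
  }
  return { stable: false, elapsedMs: now() - start, totalMutations: lastCount };
}
```

Given 4 seconds of mutations followed by silence, with the default 2-second window, the loop reports stable at 6000 ms of simulated time. Given a request for a 120-second timeout on a page that never settles, it still gives up at 60 s because of the clamp.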

The retry tier almost nobody mentions: fresh-snapshot-on-failure

Where web-first retry runs out of moves is where tier five kicks in. When any tool call throws (a stale ref, a click that missed, a matcher that did not pass), the agent loop does not just retry with the same input. It catches the error, calls this.browser.snapshot() to get a fresh accessibility tree, and prepends the first 2000 characters of that tree to the agent's next message. The retry starts from the current page, not the page as it was before the failure.
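A hypothetical sketch of that error path follows, assuming a tool call and a snapshot function are both plain async functions. The names runToolWithFreshSnapshot and takeSnapshot, and the message layout, are illustrative and not the Assrt source. The only details taken from the description are the catch, the fresh snapshot and the 2000-character prefix.

```typescript
// Hypothetical wrapper: on tool failure, the next agent message starts with a
// freshly taken accessibility snapshot (first 2000 chars), then the error.

const SNAPSHOT_PREFIX_CHARS = 2000;

interface ToolOutcome {
  ok: boolean;
  nextMessage: string;
}

async function runToolWithFreshSnapshot(
  tool: () => Promise<string>,
  takeSnapshot: () => Promise<string>,
): Promise<ToolOutcome> {
  try {
    return { ok: true, nextMessage: await tool() };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    let snapshot: string;
    try {
      // Take the snapshot AFTER the failure so it reflects the current page.
      snapshot = (await takeSnapshot()).slice(0, SNAPSHOT_PREFIX_CHARS);
    } catch {
      snapshot = "(snapshot unavailable)";
    }
    // Current page state first, so the retry plans against it rather than
    // against the view that existed before the failure.
    return {
      ok: false,
      nextMessage: `Current page snapshot:\n${snapshot}\n\nError: ${reason}`,
    };
  }
}
```

In an agent loop, nextMessage would be fed back to the model as its next input. A failed click therefore arrives with the current page already attached, instead of the model retrying the same stale ref.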

assrt-mcp/src/core/agent.ts

What the retry looks like in flight

Real output from an agent run. The wait_for_stable step logs every 500 ms tick, showing mutations climbing while React hydrates, then settling once the dashboard finishes rendering. Because the agent waited for DOM quiet rather than polling a single locator, the next snapshot reliably has the new heading.

assrt run — DOM-quiet retry

Same scenario, but the button label drifted. Matcher-level retry would burn the full 5000 ms and fail with "not visible". The fresh-snapshot tier catches the actual cause: a label rename.

assrt run — selector drift caught by fresh snapshot

What feeds the retry engine, and what comes out

The retry playbook as a single page

One retry budget, one locator

The default expect.timeout of 5000 ms polls a single locator with a single matcher. A renamed class or a moved button cannot be recovered by the retry engine. The 5 seconds is spent, then the test fails with "not found" or "not visible", with no hint of the actual cause.

100ms poll cadence

Playwright polls the matcher roughly every 100 ms inside the 5000 ms window. That gives up to 50 tries on the same locator, which is plenty for hydration races. It does nothing for selector drift.

500ms MutationObserver tick

Assrt's wait_for_stable ticks the MutationObserver every 500 ms and only breaks when no new mutations have accumulated for the configured stable window. Quiet DOM is a better signal than polling one locator.

Auto fresh snapshot on failure

When a tool call throws (click, type, assert), the first 2000 characters of a newly taken a11y tree get prepended to the agent's next message (agent.ts:1014-1017). The retry runs against the current page, not a stale view.

60s hard ceiling

Math.min((timeout_seconds || 30), 60). You cannot accidentally set a 5-minute hang. The stable window is similarly capped at 10 seconds via Math.min((stable_seconds || 2), 10). These ceilings live at agent.ts:957-958.

Retry tiers in order of escalation

Matcher retry: 5 s, same locator

Poll / toPass: block retry, same selectors

wait_for_stable: MutationObserver, 60 s cap

snapshot: fresh a11y tree

assert: description + passed + evidence

The retry-adjacent API surface you might reach for

Before you decide you need a second tier, know what the standard library already covers. Every name in this strip ships with the Playwright test runner today. Each is backed by the same retry engine under the hood.

toBeVisible, toBeHidden, toBeEnabled, toBeDisabled, toBeChecked, toBeAttached, toBeEmpty, toBeEditable, toBeFocused, toBeInViewport, toHaveText, toContainText, toHaveValue, toHaveCount, toHaveClass, toHaveCSS, toHaveAttribute, toHaveJSProperty, toHaveURL, toHaveTitle, toHaveScreenshot, toHaveAccessibleName, toBeOK, expect.poll, expect.toPass

Where to leave Playwright alone, and where to add a second tier

The web-first retry engine is tight and well-designed for what it is: a timing-based stabiliser for single-locator matchers. Leave it alone for login forms with predictable async settle, for pages where your team handpicks the selectors, and for pixel-diff screenshot assertions that have no equivalent in any agent runtime. Add the agent-level tiers when the pain is selector drift, long-async settle, or AI-authored tests where the locator is the weak link. They compose cleanly: Assrt drives the same @playwright/mcp runtime, so toHaveScreenshot lives next to scenario.md without interfering. Two tiers, one browser, pick per test.

Want to see the retry tiers wired into your own flaky test?

30 minutes with the team, we run one of your failing .spec.ts files through the Assrt agent and show the evidence trail.

Frequently asked questions

What exactly does Playwright retry when I call await expect(locator).toBeVisible()?

Two things, and it is important to separate them. One: the locator query itself. Playwright re-queries the DOM for elements matching the selector on every poll, so if a hydration step renders a new button, the retry engine picks it up. Two: the matcher check. toBeVisible checks that the element has a non-empty bounding box and no computed visibility: hidden style, then returns pass or fail. Both happen on the same timer, defaulting to 5000 ms, polling roughly every 100 ms. What it does NOT do: synthesize a different selector, pick a different element if the current one is wrong, or retry at the scenario level. If the locator string was wrong at millisecond 1, it is still wrong at millisecond 4999. The retry engine is a loop that re-runs the same query, not a planner that can change strategy.

The 5-second default feels arbitrary. Where does it come from, and should I change it?

The default sits in the Playwright test runner config under expect.timeout (5000 ms). It is separate from the action timeout (no limit by default, configurable via use.actionTimeout, so click, fill and waitFor are bounded only by the test timeout) and from the test timeout (default 30 seconds for the whole test). The 5-second number was chosen as a compromise: long enough to absorb typical React hydration and network-bound UI changes, short enough that a truly broken selector fails fast in CI. If you are testing flows with a genuine wait (chat streaming, big GraphQL queries, server-side rendering), bump it per-assertion with .toBeVisible({ timeout: 15_000 }) rather than globally; global increases tend to hide real regressions. If you find yourself raising it to 30 seconds, the problem is usually not timing; it is locator fragility.
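A small sketch of the two budgets follows, written as a plain object so it runs without Playwright installed. In a real project, the same timeout and expect.timeout keys go inside defineConfig({ ... }) in playwright.config.ts. The effectiveExpectTimeout helper is hypothetical, not a Playwright API. It only models the precedence for one web-first assertion: a per-call { timeout } beats expect.timeout, which beats the 5000 ms default.

```typescript
// The two budgets discussed above, as plain data (same keys as the config).
const timeoutsConfig = {
  timeout: 30_000, // test timeout: the whole test, Playwright's default
  expect: { timeout: 5_000 }, // web-first assertion budget, the default
};

// Hypothetical helper (not a Playwright API) modelling which budget
// applies to a single web-first assertion.
const DEFAULT_EXPECT_TIMEOUT_MS = 5_000;

function effectiveExpectTimeout(
  perAssertionMs?: number, // e.g. .toBeVisible({ timeout: 15_000 })
  configExpectMs?: number, // expect.timeout from the config
): number {
  return perAssertionMs ?? configExpectMs ?? DEFAULT_EXPECT_TIMEOUT_MS;
}

// A slow chat-streaming check raises only its own budget...
const slowCheckMs = effectiveExpectTimeout(15_000, timeoutsConfig.expect.timeout);
// ...while every other assertion keeps failing fast at the project default.
const normalCheckMs = effectiveExpectTimeout(undefined, timeoutsConfig.expect.timeout);
```

This is the reason per-assertion overrides are the safer lever. They widen the budget only where the wait is genuine, and every other broken selector keeps failing at 5 seconds.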

What is the difference between expect(locator).toBeVisible() retry and expect.poll() and expect.toPass()?

Three different tools for three different layers. toBeVisible (and the other web-first matchers) retries inside a single matcher on a single locator. expect.poll wraps any async function and retries it until the returned value passes a matcher, so you can write expect.poll(() => fetchCount()).toBeGreaterThan(0) with a custom interval and timeout. expect(...).toPass() wraps an entire block of code and retries the whole block (several actions plus assertions) until it stops throwing or the budget expires. Note that toPass does not inherit expect.timeout and has no timeout of its own by default, so pass { timeout } explicitly. The escalation ladder is: web-first matcher for visibility-class checks, expect.poll for single-value polling with custom intervals, toPass for compound sequences. All three re-run exactly what you handed them, and all three have the same limitation: they cannot rewrite the selector or the request body you passed in.
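The difference between the last two layers is easier to see as code. Below are hypothetical sketches of the two semantics, not Playwright's implementation: pollValue re-runs one function and tests its value, and retryBlock re-runs a whole block until it stops throwing. The escalating interval list mirrors the schedule Playwright documents as the expect.poll default. Treat it as illustrative. The budget here counts only sleep time, which is a simplification.

```typescript
// Illustrative models of expect.poll-style and toPass-style retries.
const INTERVALS = [100, 250, 500, 1000]; // escalating pauses, last one repeats

type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// expect.poll-like: re-run ONE function, accept when its value passes.
async function pollValue<T>(
  fn: () => Promise<T>,
  accept: (value: T) => boolean,
  timeoutMs: number,
  sleep: Sleep = realSleep,
): Promise<T> {
  let waited = 0; // simplification: only sleep time counts toward the budget
  for (let i = 0; ; i++) {
    const value = await fn();
    if (accept(value)) return value;
    const pause = INTERVALS[Math.min(i, INTERVALS.length - 1)];
    if (waited + pause > timeoutMs) {
      throw new Error(`poll timed out after ${timeoutMs}ms, last value: ${String(value)}`);
    }
    await sleep(pause);
    waited += pause;
  }
}

// toPass-like: re-run a whole BLOCK until it completes without throwing.
async function retryBlock(
  block: () => Promise<void>,
  timeoutMs: number,
  sleep: Sleep = realSleep,
): Promise<number> {
  let waited = 0;
  for (let attempt = 1; ; attempt++) {
    try {
      await block();
      return attempt; // number of attempts it took
    } catch (err) {
      const pause = INTERVALS[Math.min(attempt - 1, INTERVALS.length - 1)];
      if (waited + pause > timeoutMs) throw err; // surface the last failure
      await sleep(pause);
      waited += pause;
    }
  }
}
```

Neither function can change what it is given. If fn queries the wrong endpoint, or block uses a stale selector, every attempt repeats the same mistake.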

Why does my web-first assertion hang for the full 5 seconds before failing? I thought it retried fast.

It does retry fast internally, roughly every 100 ms, but it does not fail fast. Each poll either returns pass (exit early and succeed) or continues silently. There is no halfway signal. If the matcher is checking a locator that does not exist in the current DOM, you will burn the full 5000 ms before seeing the error. This is one of the most surprising behaviors for engineers coming from Cypress, where assertions have a shorter default and retry is more visible. Mitigations: set a shorter per-assertion timeout when you expect immediate pass/fail; use .first() to narrow strict-mode ambiguity; prefer role-based selectors over CSS so a class rename does not invalidate the locator.

What is Assrt's version of the retry, and why do you need a second tier on top of Playwright's?

Two complementary tiers. Tier one is Playwright's web-first retry, which Assrt keeps because the agent drives a real @playwright/mcp@0.0.70 browser under the hood. Tier two is wait_for_stable, defined in /Users/matthewdi/assrt-mcp/src/core/agent.ts at lines 956-1008. It does something Playwright cannot: it injects a MutationObserver into the page, counts DOM mutations per 500 ms tick, and only proceeds when no mutations have occurred for a configurable stable window (default 2 seconds, clamped to 10 seconds max). Timeout is clamped to 60 seconds max via Math.min((toolInput.timeout_seconds as number) || 30, 60). So instead of retrying a single matcher against a single locator for 5 seconds, the agent can wait for the whole page to go quiet, then re-snapshot the accessibility tree to get fresh element refs. When that fails, the tool error handler at agent.ts lines 1014-1017 automatically injects the first 2000 characters of a fresh a11y snapshot into the agent's next message, so the retry uses an updated view of the page. That last step is the one web-first retry cannot do: it changes strategy, not just timing.

How does that second tier look in a real test run? Show me the agent call pattern.

A typical post-action sequence: the agent clicks Submit and calls wait_for_stable with timeout_seconds 30 and stable_seconds 2. The MutationObserver ticks every 500 ms through 4 seconds of DOM activity, then 2 seconds of quiet (6 seconds elapsed). The agent then calls snapshot to get a fresh a11y tree, finds the new heading by role and accessible name, and calls the assert tool with description, passed, and evidence. If the selector drifted (the button was renamed from "Sign in" to "Log in"), the assertion fails with evidence pointing at the rename, not at a timeout. Contrast that with web-first retry: you would see Error: Timed out 5000ms waiting for expect(locator).toBeVisible() and have to open the trace viewer to learn that the button was renamed. The evidence string saves you that five-minute diagnostic loop.
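Here is a hypothetical sketch of how an assert result can name the cause. It assumes the fresh snapshot can be read as a flat list of role and accessible-name pairs. The description, passed and evidence fields come from the text. The A11yNode shape, the assertByRoleAndName helper and the evidence wording are illustrative and not Assrt's actual assert tool.

```typescript
// Hypothetical: build an assert-style result from a fresh a11y snapshot,
// so a miss reports what IS on the page instead of only a timeout.

interface A11yNode {
  role: string;
  name: string; // accessible name
}

interface AssertResult {
  description: string;
  passed: boolean;
  evidence: string;
}

function assertByRoleAndName(role: string, name: string, snapshot: A11yNode[]): AssertResult {
  const description = `${role} "${name}" is present`;
  if (snapshot.some((n) => n.role === role && n.name === name)) {
    return { description, passed: true, evidence: `found ${role} "${name}"` };
  }
  // Same role, different name: the likeliest explanation is a label rename.
  const sameRole = snapshot.filter((n) => n.role === role).map((n) => `"${n.name}"`);
  const evidence = sameRole.length
    ? `no ${role} named "${name}"; snapshot has ${role}(s) ${sameRole.join(", ")}, possible label rename`
    : `no ${role} elements in snapshot taken after DOM quiet`;
  return { description, passed: false, evidence };
}
```

For the example above, asserting a button named "Sign in" against a snapshot that holds a button named "Log in" fails with evidence that lists "Log in" and flags a possible rename.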

Can I still use my existing .spec.ts files alongside Assrt, or is it all-or-nothing?

Yes, all of them, unchanged. Assrt does not monopolise the browser or touch your .spec.ts files. It runs its own Chromium instance via @playwright/mcp. You can keep toHaveScreenshot pixel diffs, keep expect.poll loops for waiting on backend work, and keep .spec.ts files for tests where the selector is stable and the retry budget is fine. Add scenario.md files for the tests that are flaking because the AI-written selector drifts, or because the page has a long async settle and you want DOM-quiet semantics instead of a fixed 5-second poll. The two coexist; they do not compete.

Is this open source? What is the vendor lock-in story?

The agent runtime, the MCP server, and the CLI are all open source under /Users/matthewdi/assrt and /Users/matthewdi/assrt-mcp. Tests are plain markdown (scenario.md with #Case N headers and English steps). The execution log is plain JSON on disk at /tmp/assrt/<runId>/events.json. No proprietary YAML, no cloud-only database. If you cancel, your scenarios and evidence remain on disk. A Playwright-literate engineer can translate a scenario.md back into a .spec.ts by hand in 15 minutes. That matters when the alternative is a QA SaaS vendor that stores your tests in its cloud for $7,500 per month, where the tests are often not exportable at all.


Keep reading

Assertions

Playwright web-first assertions: the assert primitive for agent-driven tests

Same matcher family, but a single assert tool with description, passed, evidence, read from a live a11y tree.


AI tests

AI-generated Playwright tests: a review loop that actually catches selector drift

When an LLM writes the .spec.ts, the locator is where you get burned. Here is how to review and keep them honest.


Reliability

Self-healing tests: why pattern-matching selectors are not enough

Most self-healing engines retry the same DOM. This one re-snapshots and replans.
