The four time budgets most automated testing tools ignore

Automated testing tools ship one timeout dial. Assrt ships four nested ones, and the last one returns a synthetic TestReport.

Open a top list of automated testing tools and you will read about recorders, reporters, parallelism, and pricing. You will read very little about what those tools do when a test wedges. Assrt defines four nested ceilings in source: an 8s preflight HEAD, a 30s navigate race, a MutationObserver quiet window, and a whole-run budget. When the last one fires, the catch block at server.ts:553-572 hands your pipeline back the same JSON shape it hands back on success, with a single synthetic scenario named "Timeout" and marked failed.

Matthew Diakonov
11 min read
Four source-level time budgets, one JSON shape
Synthetic TestReport on whole-run expiry (server.ts:553-572)
Open source, self-hosted, Playwright under the hood

The one-line version

Four timeouts (preflight, navigate, DOM stability, whole-run) and the whole-run one throws a TestReport, not an exception.

That last detail is what makes it safe to add a budget to an existing pipeline. Your dashboard, your failedCount alert, your artifact uploader all keep working because the output contract does not change on a timeout; only a field inside it does.

The numbers, verbatim from source

Four ceilings, four constants

8s: preflight HEAD abort (agent.ts:518)
30s: navigate Promise.race (agent.ts:441)
2s: default stable_seconds (agent.ts:958)
×1: synthetic TestReport on run expiry

Every automated testing tool has an output contract

Assrt's contract is one TypeScript type: TestReport. A run that passes and a run that times out both produce the same shape; only the field values differ. That is the whole point of this diagram.

[Diagram] Three inputs, one runtime with four ceilings, four output variants with the same type: your URL, your plan, and --timeout N feed assrt_test, which emits TestReport (pass), TestReport (fail), or TestReport (timeout), plus video and events.json.

What it looks like when the whole-run ceiling fires

A checkout flow in CI, the Stripe iframe never stops mutating, the run exceeds its 120 second budget. Here is the exact sequence.

Wedged Stripe iframe, caught by whole-run timeout

The synthetic TestReport, in source

This is the block that makes the contract hold. When agent.run() loses the Promise.race, the catch assembles a TestReport with the fields your pipeline already knows how to read. Grep for name: "Timeout" to find it; it is the only synthetic scenario the code ever emits.

assrt-mcp/src/mcp/server.ts
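The page embeds the source here. The snippet did not survive extraction, so the following is a minimal sketch reconstructed from the field names this article cites (url, scenarios, passedCount, failedCount, totalDuration, generatedAt, and the 'Completed within Ns' assertion); the types and exact code are assumptions, and the real server.ts:553-572 may differ in detail.

```typescript
// Hedged sketch of the synthetic-report builder; shapes are assumptions
// reconstructed from the field names named in this article.
interface AssertionResult { description: string; passed: boolean; evidence: string }
interface Scenario { name: string; passed: boolean; assertions: AssertionResult[] }
interface TestReport {
  url: string;
  scenarios: Scenario[];
  passedCount: number;
  failedCount: number;
  totalDuration: number;
  generatedAt: string;
}

function buildTimeoutReport(url: string, timeoutSec: number, err: Error): TestReport {
  // Same shape as a passing run's report; only the values differ.
  return {
    url,
    scenarios: [{
      name: "Timeout",                     // the one synthetic scenario
      passed: false,
      assertions: [{
        description: `Completed within ${timeoutSec}s`,
        passed: false,
        evidence: err.message,             // the original timeout error
      }],
    }],
    passedCount: 0,
    failedCount: 1,
    totalDuration: timeoutSec * 1000,
    generatedAt: new Date().toISOString(),
  };
}

const demo = buildTimeoutReport(
  "https://example.test/checkout",
  120,
  new Error("Test run exceeded timeout of 120s"),
);
console.log(demo.scenarios[0].name, demo.failedCount); // Timeout 1
```

A dashboard that already sums failedCount across runs reads this report with no special casing.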

Four ceilings, laid out

Each budget lives in a different file, targets a different failure mode, and throws a different typed error. The source references are real.

1

Preflight HEAD, 8 seconds

Before Chrome even launches, assrt fetches the URL with method HEAD and an AbortController set to 8000ms. A wedged dev server fails here with the message 'Target URL X did not respond within 8000ms' instead of cascading into an opaque MCP connection drop three minutes later. A 405 or 501 response triggers a fallback to GET. Source: agent.ts:518-543.

2

Navigate, 30 seconds

Once Chrome is up, every navigate() call is wrapped in Promise.race against a 30 second timer. If the page doesn't load in time, the timer wins the race and throws a typed error with the URL and the elapsed time. Not a runaway. Source: agent.ts:440-454 with NAV_TIMEOUT_MS = 30_000.

3

DOM stability, quiet period

The wait_for_stable tool injects a MutationObserver on document.body, polls window.__assrt_mutations every 500ms, and returns either when the counter is flat for stable_seconds (default 2, max 10) or when timeout_seconds (default 30, max 60) expires. The test waits for actual quiet, not a cargo-culted setTimeout. Source: agent.ts:956-1009.

4

Whole run, your budget

Pass timeout: 120 to assrt_test and the entire agent.run() gets wrapped in another Promise.race. On expiry, instead of throwing, the catch block builds a fully-formed TestReport with a single synthetic 'Timeout' scenario. Your pipeline branches on passed, not on try/catch. Source: server.ts:553-572.

The preflight probe, in source

HEAD request, 8 second abort, 405 or 501 falls back to GET, typed error names the URL and the boundary. The shortest of the four ceilings, and the one most competitor tools skip entirely because they assume your dev server is healthy at Chromium boot.

assrt-mcp/src/core/agent.ts
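The page embeds the probe's source here. As a stand-in, here is a hedged reconstruction from the behavior this article describes (HEAD, redirect: 'manual', 8000ms abort, GET fallback on 405/501); the function name and error handling details are assumptions, not the real agent.ts identifiers.

```typescript
// Hedged sketch of the preflight probe; names are illustrative.
async function preflightUrl(url: string, timeoutMs = 8000): Promise<void> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    let res = await fetch(url, {
      method: "HEAD",
      redirect: "manual",
      signal: controller.signal,
    });
    // Some servers reject HEAD outright; fall back to GET for those two codes.
    if (res.status === 405 || res.status === 501) {
      res = await fetch(url, { redirect: "manual", signal: controller.signal });
    }
    // Any response at all (even 4xx/5xx) counts as reachable.
  } catch (err) {
    if ((err as Error).name === "AbortError") {
      throw new Error(`Target URL ${url} did not respond within ${timeoutMs}ms`);
    }
    throw err; // connection refused, DNS failure, etc.
  } finally {
    clearTimeout(timer);
  }
}
```

The point, as the article notes, is network reachability rather than correctness: only a hang, a refused connection, or a DNS failure throws.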

The navigate wrap, in source

Chromium launched, preflight passed. Now every navigate call races a 30 second timer. If the page hangs, the race resolves with a typed error that includes the URL and the exact millisecond count. Without this, a single misbehaving redirect eats the whole run.

assrt-mcp/src/core/agent.ts
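The navigate wrapper's source is embedded here on the original page. A hedged, self-contained sketch of the same pattern follows; NAV_TIMEOUT_MS and the error text mirror the article, but the wrapper function name is invented.

```typescript
// Illustrative sketch of the navigate race described above.
const NAV_TIMEOUT_MS = 30_000;

function raceNavigate<T>(
  nav: Promise<T>,
  url: string,
  ms: number = NAV_TIMEOUT_MS,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Navigate to ${url} timed out after ${ms}ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is cleared either way so a
  // fast navigation does not leave a pending 30s handle behind.
  return Promise.race([nav, deadline]).finally(() => clearTimeout(timer));
}
```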

DOM stability, the MutationObserver loop

The agent injects a MutationObserver into the page and reads a counter every 500ms; when the counter stays flat for the configured quiet window, the call returns. Default quiet is two seconds. Default ceiling is thirty. Both capped. This is the wait primitive most competitor tools simulate with a fixed sleep.

assrt-mcp/src/core/agent.ts
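The loop's source is embedded here on the original page. As a stand-in, here is a hedged sketch: the browser-side counter (kept as an injectable string, since MutationObserver only exists in a page) uses the window.__assrt_mutations name from the article, while the polling function and its shape are illustrative, not the real agent.ts:956-1009 code.

```typescript
// Browser-side snippet to inject: count every DOM mutation under body.
const INJECTED_OBSERVER = `
  window.__assrt_mutations = 0;
  new MutationObserver(() => { window.__assrt_mutations++; })
    .observe(document.body, {
      subtree: true, childList: true, attributes: true, characterData: true,
    });
`;

// Poll a counter until it is flat for stableMs, or give up after timeoutMs.
async function waitForStable(
  readCounter: () => Promise<number>,
  stableMs = 2000,     // article default: 2s of quiet
  timeoutMs = 30_000,  // article default: 30s ceiling
  pollMs = 500,
): Promise<"stable" | "timeout"> {
  const start = Date.now();
  let last = await readCounter();
  let quietSince = Date.now();
  while (Date.now() - start < timeoutMs) {
    await new Promise((r) => setTimeout(r, pollMs));
    const now = await readCounter();
    if (now !== last) {
      last = now;             // still churning: reset the quiet window
      quietSince = Date.now();
    } else if (Date.now() - quietSince >= stableMs) {
      return "stable";        // flat for the whole quiet window
    }
  }
  return "timeout";           // ceiling expired before the DOM settled
}
```

In the real tool, readCounter would evaluate window.__assrt_mutations in the page; here it is any async counter, which is what makes the settle logic testable on its own.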

What a wedged test looks like in the rest of the category

Eight sentences from real postmortems. None of them describe a tool that emits a structured failure. All of them describe a tool that goes silent and hands the job to CI.

Playwright spec that never resolves
Selenium session stuck on driver.get
Cypress waiting for a 30s assertion
Browser puppet sitting on about:blank
CI killing the job at the 30 minute cap
Jenkins log ending at: "Still running..."
Reporter HTML that never got written
Postmortem that starts: "we think it hung"

Why each budget earns its layer

One dial cannot distinguish these failures. Each budget lives at a layer where the others are blind.

Typed errors, not timeouts

Each of the four ceilings throws a message that names both the layer and the boundary: 'Target URL X did not respond within 8000ms', 'Navigate to X timed out after 30000ms', 'Test run exceeded timeout of Ns'. Grepping a failure log tells you which layer bit you.

Same JSON shape on timeout

The whole-run catch builds a TestReport with the same fields you would get from a passing run: scenarios, passedCount, failedCount, totalDuration, generatedAt. The only difference is one synthetic scenario with name='Timeout'. Your pipeline never sees a 500 or a bare exception.

Budgets stack, they don't overlap

Preflight fires BEFORE launchLocal, so a wedged dev server never burns eight minutes of Chromium boot. Navigate fires per page, so a slow third-party redirect doesn't eat your whole budget. wait_for_stable is per-call; the run-level timeout covers the rest. Each layer kills what the layer above can't see.

Default stable_seconds = 2

Two seconds of zero DOM mutations is the quiet threshold. Short enough that a finished page feels instant. Long enough that a streaming chat widget or a skeleton-to-real swap does not prematurely resolve. Capped at 10 so the agent cannot ask for minute-long quiet.

Optional, not baked in

timeout on assrt_test is optional (server.ts:345). Leave it out and the run has no upper bound; set it to 120 and you get an enforced ceiling plus the synthetic TestReport on expiry. Both modes produce the same output type, which is what makes this safe to add mid-project.

How this compares to the rest of the category

The rows below are about failure behavior, not feature checklist. Every tool can click a button. The interesting question is what happens when the click hangs.

Feature | Most automated testing tools | Assrt
Failure mode when the target is wedged | Browser launch hangs; CI kill at 30 minutes | Preflight HEAD, 8s, typed error names the target URL
Failure mode when a page refuses to load | Framework default (often 30s), inconsistent message | Promise.race against NAV_TIMEOUT_MS = 30_000, typed error names the URL and elapsed ms
Waiting for async DOM | Fixed sleep, manual wait_for_selector, or 'just retry' | MutationObserver counter, quiet for N seconds means done
Whole-run budget | Outside the tool; depends on CI, retries, wrappers | timeout parameter wraps agent.run() in Promise.race
Output shape on timeout | Exception, killed process, or empty HTML report | Same TestReport shape as success, with name='Timeout' scenario
Error message discoverability | Stack trace or silent CI log gap | Each layer throws a unique typed string; grep the layer
License and source | Mixed: open frameworks with closed recorders and SaaS runners | Open source, self-hosted, Playwright under the hood

Spot the anchor

Grep any Assrt repo for name: "Timeout" and you will find exactly one hit: the synthetic scenario at server.ts:563.

That single occurrence is the whole contract. Your dashboard code does not need special handling for the timeout case because the only thing that changed is a string inside a scenario. The shape, the types, the field names, the semantics of passedCount and failedCount, all identical. Open source, self-hosted, and you can verify it with grep -rn in thirty seconds.

Using this from your own code

From a coding agent (Claude Code, Cursor, Windsurf) it is a single tool call. From a CI pipeline it is a single CLI flag. Both paths produce the same TestReport; both respect the same four ceilings.

ci.sh
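The page embeds ci.sh here. A hedged, self-contained sketch follows: the assrt invocation is hypothetical (check your install for the real binary name and flags), so the demo fakes the TestReport a timed-out run would write and then branches on failedCount exactly as the article describes.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical invocation (binary name and flags are assumptions):
#   assrt test --url "$TARGET_URL" --timeout 120 > report.json
# For a self-contained demo, fake the TestReport a timed-out run produces:
cat > report.json <<'EOF'
{"url":"https://example.test","passedCount":0,"failedCount":1,
 "scenarios":[{"name":"Timeout","passed":false}]}
EOF

# Branch on failedCount, not on exceptions or exit codes; a timeout is
# just one more failed scenario in the same JSON shape.
check_report() {
  local failed
  failed=$(python3 -c 'import json; print(json.load(open("report.json"))["failedCount"])')
  if [ "$failed" -gt 0 ]; then
    echo "run failed: $failed scenario(s)"
    return 1
  fi
  echo "all scenarios passed"
}

check_report || echo "CI step would exit nonzero here"
```

The same check_report works unchanged on a passing run, which is the contract stability the article is selling.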

Thirty minutes on the four ceilings in your own pipeline

We bring the Assrt source up on screen, walk the exact four budgets in this guide, and help you map them onto an existing suite you already run. You leave with a working --timeout and a synthetic TestReport contract your dashboard can read.

Book a call

Frequently asked questions

Why four timeouts instead of one?

Because a single timer can't tell the difference between a wedged dev server, a slow page, an async DOM update, and an agent that is stuck in a tool loop. Each of those failures happens at a different layer of the runtime and emits a different signal. Assrt gives each layer its own dial. The preflight at agent.ts:518-543 catches unreachable servers before Chromium boots. The navigate wrap at agent.ts:440-454 catches hung page loads. The wait_for_stable tool at agent.ts:956-1009 catches DOM churn. The run-level timeout parameter at server.ts:553-572 catches everything else. Four specific failures, four specific error messages, four specific bounded behaviors.

What exactly is a 'synthetic TestReport' and why does it matter?

When the run-level timeout fires, the code at server.ts:555-570 catches the thrown error and builds a TestReport object that matches the exact shape of a passing run's report: url, scenarios, passedCount, failedCount, totalDuration, generatedAt. It inserts a single scenario with name='Timeout', passed=false, and one assertion whose description is 'Completed within Ns' and whose evidence is the original error message. The point is contract stability. Your downstream pipeline that sums passedCount across runs, or alerts on failedCount > 0, or writes the JSON to a dashboard, does not need special handling for the timeout case. The timeout IS a failure in the same JSON shape.

Does this replace per-action waits like waitForSelector or waitForLoadState?

No. The four time budgets sit ABOVE the per-action waits that Playwright still does internally. Inside a single tool call the agent is still using Playwright's own wait machinery. The Assrt-specific budgets are the ones Playwright doesn't give you: an 8s reachability probe BEFORE launch, a 30s ceiling on every navigate, a MutationObserver-based quiet window for DOM settle, and a budget on the whole run. Think of Playwright as the inner clock and Assrt as the four outer ones.

What happens when the timeout parameter is zero or omitted?

server.ts:554 checks if (timeout && timeout > 0). If the parameter is missing, zero, or negative, the run uses await agent.run(...) directly with no outer race. The other three ceilings (preflight 8s, navigate 30s, wait_for_stable default 30s) are always on, because they are baked into the runtime at the file level. Skipping the run-level timeout is safe for interactive development; setting it is how you stop a runaway from eating a full CI slot.
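That guard-then-race shape can be sketched as follows; runFn stands in for agent.run() and the wrapper name is invented, so treat this as an illustration of the check, not the real server.ts code.

```typescript
// Hedged sketch of the optional whole-run budget: no timeout means no
// outer race; a positive timeout wraps the run in Promise.race.
async function runWithOptionalBudget<T>(
  runFn: () => Promise<T>,
  timeoutSec?: number,
): Promise<T> {
  if (!(timeoutSec && timeoutSec > 0)) {
    return runFn(); // missing, zero, or negative: run directly, no ceiling
  }
  let timer: ReturnType<typeof setTimeout> | undefined;
  const budget = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Test run exceeded timeout of ${timeoutSec}s`)),
      timeoutSec * 1000,
    );
  });
  return Promise.race([runFn(), budget]).finally(() => clearTimeout(timer));
}
```

In the real server the rejection is then caught and turned into the synthetic TestReport rather than propagating to the caller.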

What does the preflight probe actually check, and what does it skip?

It does a HEAD on the URL with method: 'HEAD', redirect: 'manual', and an AbortController set to 8000ms. If the server answers with 405 Method Not Allowed or 501 Not Implemented (the two statuses that some servers use to reject HEAD), the probe falls back to a GET with the same abort signal. Any other response, including 4xx and 5xx, is treated as 'reachable' because the point is network reachability, not correctness. The probe only throws on connection refused, DNS failure, or the 8s timeout. Source: agent.ts:518-543.

Can I change the defaults (8s preflight, 30s navigate, 2s stable)?

Preflight is a method-level default (timeoutMs = 8000) on preflightUrl; the caller in run() passes no override, so it's 8s every time. Navigate is a const NAV_TIMEOUT_MS = 30_000 at agent.ts:441. Both of those are source constants; to change them you edit the file. The wait_for_stable defaults come from the TOOL arguments timeout_seconds (default 30, max 60) and stable_seconds (default 2, max 10); the agent can override them in its own tool calls. The run-level timeout comes from the assrt_test parameter, defined with z.number().optional() at server.ts:345. Different layers, different configuration surfaces, all deliberate.

How does this compare to Cypress retry-ability or Playwright's expect polling?

Cypress retry-ability and Playwright's auto-waiting are per-assertion: they retry the locator until it matches or a 4-10s default expires. They're great for what they cover. They do not cover 'server is wedged', 'whole run is hung', or 'the DOM is still mutating because a streaming AI response hasn't finished'. Assrt's four budgets explicitly target those cases, and the per-assertion waits inside individual tool calls still happen via Playwright (because Assrt uses the Playwright MCP server under the hood). The two layers compose, they do not compete.

Where does the 'automated testing tools' category end and Assrt begin?

Most tools in this category give you a way to drive a browser, assert a thing, and produce a report. Where the abstraction leaks is on the axes that don't fit in a spec file: what to do when the server is down, when the page refuses to load, when the DOM never settles, when the whole run exceeds your patience. Assrt's opinion is that those four failures deserve their own time budgets, their own typed error messages, and their own matching output contract. The spec-file surface (the #Case markdown) stays small; the bounded-behavior surface is the product.
