Automated testing tools ship one timeout dial. Assrt ships four nested ones, and the last one returns a synthetic TestReport.
Open a top list of automated testing tools and you will read about recorders, reporters, parallelism, and pricing. You will read very little about what those tools do when a test wedges. Assrt defines four nested ceilings in source: an 8s preflight HEAD, a 30s navigate race, a MutationObserver quiet window, and a whole-run budget. When the last one fires, the catch block at server.ts:553-572 hands your pipeline back the same JSON shape it hands back on success, with a single synthetic scenario named "Timeout" marked failed.
The one-line version
Four timeouts (preflight, navigate, DOM stability, whole-run) and the whole-run one throws a TestReport, not an exception.
That last detail is what makes it safe to add a budget to an existing pipeline. Your dashboard, your failedCount alert, your artifact uploader all keep working because the output contract does not change on a timeout; only a field inside it does.
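To make that concrete, here is a minimal sketch of a downstream consumer. The field names (scenarios, passedCount, failedCount, totalDuration, generatedAt) come from this guide; the handler function itself is hypothetical, not Assrt code. The point is that the timeout branch needs no try/catch of its own:

```typescript
// Hypothetical pipeline consumer. A timed-out run arrives as a TestReport,
// not an exception, so one branch handles pass, fail, and timeout alike.
interface Scenario { name: string; passed: boolean }
interface TestReport {
  url: string;
  scenarios: Scenario[];
  passedCount: number;
  failedCount: number;
  totalDuration: number;
  generatedAt: string;
}

function handleReport(report: TestReport): string {
  if (report.failedCount > 0) {
    // A run that hit its whole-run budget lands here too, as the one
    // synthetic scenario named "Timeout".
    const timedOut = report.scenarios.some((s) => s.name === "Timeout");
    return timedOut ? "alert: run exceeded its budget" : "alert: scenarios failed";
  }
  return "ok";
}
```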
The numbers, verbatim from source
Four ceilings, four constants
Every automated testing tool has an output contract
Assrt's contract is one TypeScript type: TestReport. A run that passes and a run that times out both produce the same shape; only the field values differ. That is the whole point of this diagram.
Three inputs, one runtime with four ceilings, four output variants with the same type
What it looks like when the whole-run ceiling fires
A checkout flow in CI, the Stripe iframe never stops mutating, the run exceeds its 120 second budget. Here is the exact sequence.
The synthetic TestReport, in source
This is the block that makes the contract hold. When the Promise.race loses, the catch assembles a TestReport with the fields your pipeline already knows how to read. Grep for name: "Timeout" to find it; it is the only synthetic scenario the code ever emits.
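A sketch of what that catch plausibly assembles, reconstructed from the fields this guide names (url, scenarios, passedCount, failedCount, totalDuration, generatedAt, and the 'Completed within Ns' assertion). This is not the verbatim server.ts source; treat the helper name as illustrative:

```typescript
// Sketch, not verbatim source: the synthetic TestReport built when the
// whole-run Promise.race loses (server.ts:553-572 region).
interface Assertion { description: string; passed: boolean; evidence: string }
interface Scenario { name: string; passed: boolean; assertions: Assertion[] }
interface TestReport {
  url: string; scenarios: Scenario[]; passedCount: number;
  failedCount: number; totalDuration: number; generatedAt: string;
}

function syntheticTimeoutReport(
  url: string, timeoutSeconds: number, err: Error, startedAt: number,
): TestReport {
  return {
    url,
    scenarios: [{
      name: "Timeout",            // the only synthetic scenario the code emits
      passed: false,
      assertions: [{
        description: `Completed within ${timeoutSeconds}s`,
        passed: false,
        evidence: err.message,    // the original timeout error message
      }],
    }],
    passedCount: 0,
    failedCount: 1,
    totalDuration: Date.now() - startedAt,
    generatedAt: new Date().toISOString(),
  };
}
```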
Four ceilings, laid out
Each budget lives in a different file, targets a different failure mode, and throws a different typed error. The source references are real.
Preflight HEAD, 8 seconds
Before Chrome even launches, assrt fetches the URL with method HEAD and an AbortController set to 8000ms. A wedged dev server fails here with the message 'Target URL X did not respond within 8000ms' instead of cascading into an opaque MCP connection drop three minutes later. 405 or 501 falls back to GET. Source: agent.ts:518-543.
Navigate, 30 seconds
Once Chrome is up, every navigate() call is wrapped in Promise.race against a 30 second timer. If the page doesn't load, the navigate loses the race and the timer wins, throwing a typed error with the URL and the elapsed time. Not a runaway. Source: agent.ts:440-454 with NAV_TIMEOUT_MS = 30_000.
DOM stability, quiet period
The wait_for_stable tool injects a MutationObserver on document.body, polls window.__assrt_mutations every 500ms, and returns either when the counter is flat for stable_seconds (default 2, max 10) or when timeout_seconds (default 30, max 60) expires. The test waits for actual quiet, not a cargo-culted setTimeout. Source: agent.ts:956-1009.
Whole run, your budget
Pass timeout: 120 to assrt_test and the entire agent.run() gets wrapped in another Promise.race. On expiry, instead of throwing, the catch block builds a fully-formed TestReport with a single synthetic 'Timeout' scenario. Your pipeline branches on passed, not on try/catch. Source: server.ts:553-572.
The preflight probe, in source
HEAD request, 8 second abort, 405 or 501 falls back to GET, typed error names the URL and the boundary. The shortest of the four ceilings, and the one most competitor tools skip entirely because they assume your dev server is healthy at Chromium boot.
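A sketch of the probe as described (agent.ts:518-543). The function name preflightUrl matches the one the FAQ mentions; the body below is a reconstruction from this guide's description, not copied source:

```typescript
// Sketch of the preflight probe: HEAD with an 8s abort, falling back to GET
// on 405/501. Any response at all, including 4xx/5xx, counts as reachable;
// only connection refusal, DNS failure, or the abort throws.
async function preflightUrl(url: string, timeoutMs = 8000): Promise<void> {
  const probe = async (method: "HEAD" | "GET") => {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await fetch(url, { method, redirect: "manual", signal: controller.signal });
    } finally {
      clearTimeout(timer);
    }
  };
  try {
    const res = await probe("HEAD");
    // 405 / 501: the server rejects HEAD specifically, so retry with GET.
    if (res.status === 405 || res.status === 501) await probe("GET");
  } catch {
    throw new Error(`Target URL ${url} did not respond within ${timeoutMs}ms`);
  }
}
```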
The navigate wrap, in source
Chromium launched, preflight passed. Now every navigate call races a 30 second timer. If the page hangs, the race resolves with a typed error that includes the URL and the exact millisecond count. Without this, a single misbehaving redirect eats the whole run.
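The wrapping pattern, sketched under the assumption that navigate() returns a promise. NAV_TIMEOUT_MS = 30_000 and the error wording are from this guide; the wrapper name is illustrative:

```typescript
// Sketch of the per-navigate race (agent.ts:440-454 region): whichever of
// the navigation and the 30s timer settles first wins the race.
const NAV_TIMEOUT_MS = 30_000;

async function navigateWithTimeout<T>(
  url: string,
  navigate: () => Promise<T>,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Navigate to ${url} timed out after ${NAV_TIMEOUT_MS}ms`)),
      NAV_TIMEOUT_MS,
    );
  });
  try {
    return await Promise.race([navigate(), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer); // don't leak the timer on success
  }
}
```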
DOM stability, the MutationObserver loop
A MutationObserver is injected into the page, every 500ms the agent reads a counter, and when the counter has been flat for the configured quiet window the call returns. Default quiet is two seconds. Default ceiling is thirty. Both capped. This is the wait primitive most competitor tools approximate with a fixed sleep.
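The agent-side polling loop, sketched. The page-side half (a MutationObserver on document.body incrementing window.__assrt_mutations) is stubbed here as a readMutationCount callback, since it runs inside the browser; the loop shape and the 500ms / stable_seconds / timeout_seconds parameters are from this guide:

```typescript
// Sketch of the wait_for_stable loop (agent.ts:956-1009 region). Returns
// true when the mutation counter has been flat for stableSeconds, false
// when timeoutSeconds expires while the DOM is still churning.
async function waitForStable(
  readMutationCount: () => Promise<number>, // stands in for reading window.__assrt_mutations
  stableSeconds = 2,   // default quiet window (capped at 10 in the tool schema)
  timeoutSeconds = 30, // default ceiling (capped at 60)
): Promise<boolean> {
  const pollMs = 500;
  const deadline = Date.now() + timeoutSeconds * 1000;
  let last = await readMutationCount();
  let quietSince = Date.now();
  while (Date.now() < deadline) {
    await new Promise((r) => setTimeout(r, pollMs));
    const now = await readMutationCount();
    if (now !== last) {
      last = now;
      quietSince = Date.now(); // counter moved: restart the quiet window
    } else if (Date.now() - quietSince >= stableSeconds * 1000) {
      return true;             // flat for stable_seconds: the DOM has settled
    }
  }
  return false;
}
```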
What a wedged test looks like in the rest of the category
Eight sentences from real postmortems. None of them describe a tool that emits a structured failure. All of them describe a tool that goes silent and hands the job to CI.
Why each budget earns its layer
One dial cannot distinguish these failures. Each budget lives at a layer where the others are blind.
Typed errors, not timeouts
Each of the four ceilings throws a message that names both the layer and the boundary: 'Target URL X did not respond within 8000ms', 'Navigate to X timed out after 30000ms', 'Test run exceeded timeout of Ns'. Grepping a failure log tells you which layer bit you.
Same JSON shape on timeout
The whole-run catch builds a TestReport with the same fields you would get from a passing run: scenarios, passedCount, failedCount, totalDuration, generatedAt. The only difference is one synthetic scenario with name='Timeout'. Your pipeline never sees a 500 or a bare exception.
Budgets stack, they don't overlap
Preflight fires BEFORE launchLocal, so a wedged dev server never burns eight minutes of Chromium boot. Navigate fires per page, so a slow third-party redirect doesn't eat your whole budget. wait_for_stable is per-call; the run-level timeout covers the rest. Each layer kills what the layer above can't see.
Default stable_seconds = 2
Two seconds of zero DOM mutations is the quiet threshold. Short enough that a finished page feels instant. Long enough that a streaming chat widget or a skeleton-to-real swap does not prematurely resolve. Capped at 10 so the agent cannot ask for minute-long quiet.
Optional, not baked in
timeout on assrt_test is optional (server.ts:345). Leave it out and the run has no upper bound; set it to 120 and you get an enforced ceiling plus the synthetic TestReport on expiry. Both modes produce the same output type, which is what makes this safe to add mid-project.
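The branch, sketched. The guard if (timeout && timeout > 0) and the 'Test run exceeded timeout of Ns' message are from this guide; the wrapper and its buildTimeoutReport parameter are illustrative names standing in for the server.ts code path:

```typescript
// Sketch of the optional whole-run budget (server.ts:553-572 region).
// No timeout: run directly, unbounded. With a timeout: race against a
// timer, and on expiry return a report instead of throwing.
async function runWithOptionalBudget<T>(
  run: () => Promise<T>,
  buildTimeoutReport: (err: Error) => T, // stands in for the synthetic TestReport builder
  timeout?: number,                      // seconds; optional per server.ts:345
): Promise<T> {
  if (!(timeout && timeout > 0)) return run(); // omitted, zero, or negative: no outer race
  let timer: ReturnType<typeof setTimeout> | undefined;
  const ceiling = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Test run exceeded timeout of ${timeout}s`)),
      timeout * 1000,
    );
  });
  try {
    return await Promise.race([run(), ceiling]);
  } catch (err) {
    return buildTimeoutReport(err as Error); // same output type as success
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```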
How this compares to the rest of the category
The rows below are about failure behavior, not feature checklist. Every tool can click a button. The interesting question is what happens when the click hangs.
| Feature | Most automated testing tools | Assrt |
|---|---|---|
| Failure mode when the target is wedged | Browser launch hangs; CI kill at 30 minutes | Preflight HEAD, 8s, typed error names the target URL |
| Failure mode when a page refuses to load | Framework default (often 30s), inconsistent message | Promise.race against NAV_TIMEOUT_MS = 30_000, typed error names the URL and elapsed ms |
| Waiting for async DOM | Fixed sleep, manual wait_for_selector, or 'just retry' | MutationObserver counter, quiet for N seconds means done |
| Whole-run budget | Outside the tool; depends on CI, retries, wrappers | timeout parameter wraps agent.run() in Promise.race |
| Output shape on timeout | Exception, killed process, or empty HTML report | Same TestReport shape as success, with name='Timeout' scenario |
| Error message discoverability | Stack trace or silent CI log gap | Each layer throws a unique typed string; grep the layer |
| License and source | Mixed: open frameworks with closed recorders and SaaS runners | Open source, self-hosted, Playwright under the hood |
Spot the anchor
Grep any Assrt repo for name: "Timeout" and you will find exactly one hit: the synthetic scenario at server.ts:563.
That single occurrence is the whole contract. Your dashboard code does not need special handling for the timeout case because the only thing that changed is a string inside a scenario. The shape, the types, the field names, the semantics of passedCount and failedCount, all identical. Open source, self-hosted, and you can verify it with grep -rn in thirty seconds.
Using this from your own code
From a coding agent (Claude Code, Cursor, Windsurf) it is a single tool call. From a CI pipeline it is a single CLI flag. Both paths produce the same TestReport; both respect the same four ceilings.
Thirty minutes on the four ceilings in your own pipeline
We bring the Assrt source up on screen, walk the exact four budgets in this guide, and help you map them onto an existing suite you already run. You leave with a working --timeout and a synthetic TestReport contract your dashboard can read.
Book a call →

Frequently asked questions
Why four timeouts instead of one?
Because a single timer can't tell the difference between a wedged dev server, a slow page, an async DOM update, and an agent that is stuck in a tool loop. Each of those failures happens at a different layer of the runtime and emits a different signal. Assrt gives each layer its own dial. The preflight at agent.ts:518-543 catches unreachable servers before Chromium boots. The navigate wrap at agent.ts:440-454 catches hung page loads. The wait_for_stable tool at agent.ts:956-1009 catches DOM churn. The run-level timeout parameter at server.ts:553-572 catches everything else. Four specific failures, four specific error messages, four specific bounded behaviors.
What exactly is a 'synthetic TestReport' and why does it matter?
When the run-level timeout fires, the code at server.ts:555-570 catches the thrown error and builds a TestReport object that matches the exact shape of a passing run's report: url, scenarios, passedCount, failedCount, totalDuration, generatedAt. It inserts a single scenario with name='Timeout', passed=false, and one assertion whose description is 'Completed within Ns' and whose evidence is the original error message. The point is contract stability. Your downstream pipeline that sums passedCount across runs, or alerts on failedCount > 0, or writes the JSON to a dashboard, does not need special handling for the timeout case. The timeout IS a failure in the same JSON shape.
Does this replace per-action waits like waitForSelector or waitForLoadState?
No. The four time budgets sit ABOVE the per-action waits that Playwright still does internally. Inside a single tool call the agent is still using Playwright's own wait machinery. The Assrt-specific budgets are the ones Playwright doesn't give you: an 8s reachability probe BEFORE launch, a 30s ceiling on every navigate, a MutationObserver-based quiet window for DOM settle, and a budget on the whole run. Think of Playwright as the inner clock and Assrt as the four outer ones.
What happens when the timeout parameter is zero or omitted?
The guard at server.ts:554 checks if (timeout && timeout > 0). If the parameter is missing, zero, or negative, the run uses await agent.run(...) directly with no outer race. The other three ceilings (preflight 8s, navigate 30s, wait_for_stable default 30s) are always on, because they are baked into the runtime at the file level. Skipping the run-level timeout is safe for interactive development; setting it is how you stop a runaway from eating a full CI slot.
What does the preflight probe actually check, and what does it skip?
It does a HEAD on the URL with method: 'HEAD', redirect: 'manual', and an AbortController set to 8000ms. If the server answers with 405 Method Not Allowed or 501 Not Implemented (the two statuses that some servers use to reject HEAD), the probe falls back to a GET with the same abort signal. Any other response, including 4xx and 5xx, is treated as 'reachable' because the point is network reachability, not correctness. The probe only throws on connection refused, DNS failure, or the 8s timeout. Source: agent.ts:518-543.
Can I change the defaults (8s preflight, 30s navigate, 2s stable)?
Preflight is a method-level default (timeoutMs = 8000) on preflightUrl; the caller in run() passes no override, so it's 8s every time. Navigate is a const NAV_TIMEOUT_MS = 30_000 at agent.ts:441. Both of those are source constants; to change them you edit the file. The wait_for_stable defaults come from the TOOL arguments timeout_seconds (default 30, max 60) and stable_seconds (default 2, max 10); the agent can override them in its own tool calls. The run-level timeout comes from the assrt_test parameter, defined with z.number().optional() at server.ts:345. Different layers, different configuration surfaces, all deliberate.
How does this compare to Cypress retry-ability or Playwright's expect polling?
Cypress retry-ability and Playwright's auto-waiting are per-assertion: they retry the locator until it matches or a 4-10s default expires. They're great for what they cover. They do not cover 'server is wedged', 'whole run is hung', or 'the DOM is still mutating because a streaming AI response hasn't finished'. Assrt's four budgets explicitly target those cases, and the per-assertion waits inside individual tool calls still happen via Playwright (because Assrt uses the Playwright MCP server under the hood). The two layers compose, they do not compete.
Where does the 'automated testing tools' category end and Assrt begin?
Most tools in this category give you a way to drive a browser, assert a thing, and produce a report. Where the abstraction leaks is on the axes that don't fit in a spec file: what to do when the server is down, when the page refuses to load, when the DOM never settles, when the whole run exceeds your patience. Assrt's opinion is that those four failures deserve their own time budgets, their own typed error messages, and their own matching output contract. The spec-file surface (the #Case markdown) stays small; the bounded-behavior surface is the product.