The best practice every list misses: the plan should grow itself as it runs.
Every top-ten guide on this keyword treats the test plan as a static document you author up front: prioritize by risk, cover the critical paths, review with the PM, ship. All correct. All incomplete. The practice none of them mention is the one that matters when your app actually moves: while your tests are running, the suite should be proposing the next cases. This page is about the exact code that does that in Assrt, and the four numbers that keep it bounded.
The practice behind the practices
A test run that ends without proposing any new cases is a run that already knew the answer.
Classical best practices assume your test plan is a product of authoring: you wrote it, it is done, you run it. Assrt treats the plan as a product of both authoring and exploration. Authoring owns the revenue-critical head of the distribution: checkout, signup, upgrade. Exploration owns the long tail: admin pages, settings screens, the route a product manager added last Thursday that nobody wrote a case for. The exploration mechanism is a bounded discovery pipeline that runs in parallel with your tests. That is the rest of this page.
Four numbers, one pipeline
Before the story, the constants. Everything below is derivable from these four. Three is the maximum concurrent discovery calls. Twenty is the per-run page cap. Six is the number of URL patterns that never become a discovery candidate. Four thousand is the character budget of the accessibility tree passed into each discovery call.
The anchor fact: the constants and the prompt
Here is the exact source that makes this work. First, the discovery system prompt itself. Read the five rules carefully. They are the guardrails that keep the discovery LLM proposing real cases instead of drifting toward hypothetical ones.
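A paraphrased sketch of the shape such a prompt takes, assembled from the five rules this page describes elsewhere (1-2 cases, 3-4 actions, visible elements only, no login/signup, no CSS/responsive/performance); the literal string in agent.ts will differ:

```typescript
// Paraphrased sketch of DISCOVERY_SYSTEM_PROMPT, reconstructed from the
// rules described on this page. Not the literal wording from agent.ts.
const DISCOVERY_SYSTEM_PROMPT = `
You are given a page's accessibility tree and a screenshot.
Propose 1-2 test cases for this page as #Case blocks.
Rules:
1. Each case must be completable in 3-4 actions.
2. Reference only buttons, links, and inputs visible in the accessibility tree.
3. Do not propose login or signup cases.
4. Do not propose CSS or responsive-layout cases.
5. Do not propose performance cases.
`.trim();
```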
Below is the constants block right underneath it. The three values fix the shape of the pipeline: concurrency cap, per-run page cap, and the list of URLs that never enter the queue.
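A sketch of that block's shape, carrying the four documented values; the identifiers DISCOVERY_SKIP_PATTERNS and DISCOVERY_TREE_CHAR_BUDGET are assumed names, not necessarily the literal ones in agent.ts:

```typescript
// Sketch of the discovery constants block. The values 3, 20, 4000 and the
// six regexes are documented on this page; the identifier names for the
// skip list and char budget are assumptions.
const MAX_CONCURRENT_DISCOVERIES = 3; // parallel discovery LLM calls
const MAX_DISCOVERED_PAGES = 20; // per-run cap on discovered pages
const DISCOVERY_TREE_CHAR_BUDGET = 4000; // accessibility-tree trim per call

// URLs matching any of these never enter the discovery queue.
const DISCOVERY_SKIP_PATTERNS: RegExp[] = [
  /\/logout/i, // ends the session
  /\/api\//i, // JSON, no DOM to click
  /^javascript:/i, // snippet, not a page
  /^about:blank/i, // empty tab state
  /^data:/i, // inline resource, not a UI
  /^chrome/i, // browser-internal pages
];
```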
Six patterns the queue ignores, and what each one costs if you include it
The skip list reads like a trivial detail. It is not. Each pattern below, if removed, produces a specific failure mode in the discovered output. The cards below pair the pattern with what you would get if you forgot to skip it.
/logout
Landing on a logout URL terminates the session and wastes a scenario. The test would discover a login screen next, which is fine; discovering a signed-out cold boot is not the intent.
/api/
API endpoints return JSON, not an accessibility tree. There is nothing the agent can click. Including them as candidate pages would burn a discovery call with no useful output.
javascript:
A javascript: URL is a snippet, not a page. The agent has an evaluate tool for that.
about:blank
Empty document, and often the starting state before the first navigate. If queued and passed through, it would produce an empty #Case.
data:
Inline data URIs (base64 images, svg blobs) render as the resource itself, not a UI. No assertions worth running.
chrome
Internal Chrome URLs (chrome://new-tab, devtools). Out of scope for the app you are testing and full of browser UI the agent cannot affect.
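The cards above can be made concrete with one example URL per pattern. shouldSkipUrl is a hypothetical helper for illustration, not a function in the codebase; the six regexes are the ones quoted verbatim in the FAQ below:

```typescript
// Hypothetical helper: returns the first skip pattern a URL matches, or null.
const SKIP: RegExp[] = [
  /\/logout/i, /\/api\//i, /^javascript:/i,
  /^about:blank/i, /^data:/i, /^chrome/i,
];

function shouldSkipUrl(url: string): RegExp | null {
  return SKIP.find((p) => p.test(url)) ?? null;
}

// One URL per card, each refused by the queue:
const refused = [
  "https://app.example.com/logout", // session killer
  "https://app.example.com/api/users", // JSON endpoint
  "javascript:void(0)", // snippet, not a page
  "about:blank", // empty tab state
  "data:image/png;base64,iVBORw0KGgo", // inline resource
  "chrome://new-tab-page", // browser-internal UI
].every((u) => shouldSkipUrl(u) !== null);
```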
The five-step life of a navigate
Below is what happens, in order, when the main agent loop calls navigate on a fresh URL. Note steps 2 and 3 both run inline with the main loop; step 4 is the one that happens in parallel. None of this is scheduled, queued for later, or scanned after the run ends.
Navigate handler, end to end
Agent calls navigate
A step in the current #Case triggers navigate(url). The browser loads, a snapshot is returned.
queueDiscoverPage runs
The same navigate handler calls queueDiscoverPage(url). The URL is normalized to origin+pathname (trailing slash stripped), deduped against discoveredUrls, counted against the 20-page cap, and tested against the skip patterns. Survivors get pushed to pendingDiscoveryUrls.
flushDiscovery fires after the tool call
After each tool result resolves, flushDiscovery runs. It only starts a new discovery when the browser is not busy and fewer than 3 are in flight. It pulls the next pending URL, snapshots once, emits page_discovered with the screenshot, and launches generateDiscoveryCases.
generateDiscoveryCases streams a plan
Runs the smaller discovery LLM with DISCOVERY_SYSTEM_PROMPT and the page's accessibility tree (trimmed to 4000 chars) plus the JPEG. Streams tokens as discovered_cases_chunk events. Emits discovered_cases_complete when done.
Main agent loop keeps going
None of the above blocks the main test. The agent has already moved on to its next snapshot, next click, next assertion. Discovery concurrency is bounded at 3, so even a 20-page run never makes more than 3 parallel LLM calls at a time.
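Step 4 is the only one that touches an LLM. Below is a runnable sketch of its shape: the 4000-character trim and the two event names come from the description above, while streamModel is a stand-in for the real streaming client (a plain iterable here, so the sketch stays synchronous):

```typescript
type Emit = (event: string, payload: string) => void;

const TREE_CHAR_BUDGET = 4000; // the documented accessibility-tree trim

// Sketch of step 4. In the real agent the stream is an async LLM token
// stream and a JPEG screenshot rides along with the tree; here the model
// is a plain iterable so the shape is runnable without an API key.
function generateDiscoveryCases(
  tree: string,
  streamModel: (input: string) => Iterable<string>, // stub for the LLM
  emit: Emit,
): string {
  const trimmed = tree.slice(0, TREE_CHAR_BUDGET); // enforce the char budget
  let cases = "";
  for (const token of streamModel(trimmed)) {
    cases += token;
    emit("discovered_cases_chunk", token); // tokens stream out as they arrive
  }
  emit("discovered_cases_complete", cases);
  return cases;
}
```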
The queue, the flush, the parallel call
Two functions do the work. queueDiscoverPage gates entry to the discovery list. flushDiscovery decides when to actually burn an LLM call.
Three checks, in order: dedupe, cap, skip. A URL only lands in pendingDiscoveryUrls if it is new, the cap is not hit, and no skip pattern matches. The normalization is worth reading twice: a trailing slash is stripped so /pricing and /pricing/ are the same page, and query strings are dropped because pathname does not include search. That is intentional for this product: /products?id=42 and /products?id=99 are one discoverable surface.
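A sketch of queueDiscoverPage under those rules. The body is paraphrased from this description, not copied from agent.ts; the skip patterns are matched against the raw URL here so the scheme-based ones (javascript:, data:, about:blank) see the original string:

```typescript
// Paraphrased sketch of queueDiscoverPage; names mirror the ones this page
// uses (discoveredUrls, pendingDiscoveryUrls, MAX_DISCOVERED_PAGES).
const MAX_DISCOVERED_PAGES = 20;
const SKIP_PATTERNS: RegExp[] = [
  /\/logout/i, /\/api\//i, /^javascript:/i,
  /^about:blank/i, /^data:/i, /^chrome/i,
];
const discoveredUrls = new Set<string>();
const pendingDiscoveryUrls: string[] = [];

function queueDiscoverPage(rawUrl: string): void {
  // Normalize to origin + pathname: query and hash are dropped, and a
  // trailing slash is stripped so /pricing and /pricing/ dedupe together.
  let normalized = rawUrl;
  try {
    const u = new URL(rawUrl);
    normalized = (u.origin + u.pathname).replace(/\/$/, "");
  } catch {
    // non-parseable strings pass through; the skip list catches them below
  }
  if (discoveredUrls.has(normalized)) return; // check 1: dedupe
  if (discoveredUrls.size >= MAX_DISCOVERED_PAGES) return; // check 2: cap
  if (SKIP_PATTERNS.some((p) => p.test(rawUrl))) return; // check 3: skip
  discoveredUrls.add(normalized);
  pendingDiscoveryUrls.push(normalized);
}
```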
flushDiscovery is called after every tool result resolves. The three-level gate (browser idle, nothing pending, cap on concurrency) ensures a discovery call never competes with the main loop for the browser and never stacks more than three deep on the LLM side. The break at the end is intentional: one discovery launched per flush call, so the main loop can interleave its own work.
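The gate logic, sketched. browserBusy, activeDiscoveries, and discoverPage are assumed names standing in for the real module state; only the three-level gate and the single-launch break are taken from the description above:

```typescript
// Sketch of flushDiscovery's three-level gate and one-per-flush break.
let browserBusy = false;
let activeDiscoveries = 0;
const MAX_CONCURRENT_DISCOVERIES = 3;
const pendingDiscoveryUrls: string[] = [];

async function discoverPage(_url: string): Promise<void> {
  // stand-in for: snapshot once, emit page_discovered, run the discovery LLM
}

function flushDiscovery(): void {
  while (pendingDiscoveryUrls.length > 0) {
    if (browserBusy) return; // gate 1: never compete for the browser
    if (activeDiscoveries >= MAX_CONCURRENT_DISCOVERIES) return; // gate 2: cap
    const url = pendingDiscoveryUrls.shift()!;
    activeDiscoveries++;
    discoverPage(url).finally(() => { activeDiscoveries--; });
    break; // one discovery per flush, so the main loop can interleave
  }
}
```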
The point, compressed
Coverage is not a static inventory you maintain. It is a gradient: the distance between pages your plan visited and pages it never wrote a case for. Discovery measures that gradient once per run and proposes the diff as text you can review.
With this framing, most classical 'coverage best practices' turn into something you sign off on in a PR instead of something you audit in a quarterly review. The day the app adds a /billing route, the next test run surfaces it. The day someone deletes /legacy, the discovery for that path simply stops appearing. No coverage database, no spreadsheet.
What a run actually looks like
The trace below is a two-scenario plan executed against a local dev server. Watch the order of events. page_discovered fires on navigate. The next [discovery] line shows the pipeline starting work on a page one or two steps behind the main loop. The cases-proposed line lands while the main agent is clicking something else. At the end, the suggested cases are flushed to disk.
Main loop in black, discovery in teal
Below is the communication shape between the two pipelines during a single navigate. The main agent never waits on discovery. Discovery never blocks the main browser. The only shared state is the discoveredUrls Set and the queue itself.
One navigate, two pipelines
Static authoring vs. live-growing plan
Same Playwright underneath either way. The right column is what the discovery pipeline adds on top of a standard run.
| Feature | Classical static plan | Assrt (plan grows itself) |
|---|---|---|
| Who authors new test cases | A person, up front, before the run | The discovery LLM, during the run, bounded at 3 concurrent |
| When coverage grows | On the next sprint; test debt reviewed quarterly | On every navigate the agent makes; bounded to 20 unique pages |
| Drift signal | Flaky test count in CI | page_discovered events for URLs the plan never visited |
| Output of a single run | Pass / fail report | Pass / fail report plus 1-2 #Case candidates per discovered page |
| Cost of adding a new page to cover | Engineer writes a new spec file + page object | Merge one proposed #Case block into plan.md |
| How the plan reacts to a new route | Silent; the new route is uncovered until someone notices | page_discovered fires the moment the agent navigates there; cases appear |
| What bounds infinite exploration | A human PM with a priority list | MAX_DISCOVERED_PAGES = 20 and 6 skip patterns (logout, /api/, javascript:, about:blank, data:, chrome) |
Why the top-ten guides leave this out
Three reasons, in descending order of charity. First, classical frameworks (Playwright, Cypress, Selenium) have no notion of an LLM that can read an accessibility tree, so pipelines like this were not buildable until recently. Fair enough. Second, SaaS test platforms are billed per scenario and per execution minute, so a discovery pipeline that proposes new cases for free is a direct conflict of interest. Third, best-practices content is written to flatter the reader's existing workflow, which is manual authoring. An article that started with "your plan should write itself" is a harder sell than one titled "10 tips to improve your tests".
Assrt is open source, self-hosted, and charges nothing per scenario, so the economics happen to line up with just building the pipeline and shipping it. The differentiator is not cleverness. It is incentives.
Watch the discovery pipeline run against your app
Twenty minutes. Give us your URL and one existing user flow. We run it through npx assrt-mcp, you watch page_discovered events land live, and we hand you the plan.suggested.md at the end.
Book a call →
Questions the top-ten test automation guides leave unanswered
What exactly is happening when Assrt says 'the plan grows itself'?
Every time the main agent loop calls navigate, the handler also calls queueDiscoverPage(url). That function normalizes the URL to origin+pathname (trailing slash stripped), deduplicates against a Set of already-seen pages, enforces MAX_DISCOVERED_PAGES=20, and applies six skip patterns. Survivors are pushed onto pendingDiscoveryUrls. After each tool call, flushDiscovery runs: it checks the browser is idle, checks fewer than MAX_CONCURRENT_DISCOVERIES=3 are in flight, then launches generateDiscoveryCases in parallel. That function runs the discovery LLM against the page's accessibility tree plus screenshot and streams 1-2 proposed #Case blocks per page. All of this lives in assrt-mcp/src/core/agent.ts between lines 547 and 618 and runs alongside, not after, your test.
Why only three concurrent discovery calls?
Two reasons. First, the discovery call is small (the defaults use the same model as the main loop but ask for max_tokens=1024 and a smaller system prompt), so there is no shortage of budget; the bottleneck is the browser state. A fourth concurrent call would have to wait behind the browser anyway, because every discovery snapshots before generating. Second, if every navigate kicked off an uncapped discovery call, a single-page app with client-side routing could burn hundreds of LLM calls in a minute. Three is high enough that a typical run (5-15 pages) keeps the discovery pipeline full, and low enough that a runaway router cannot drown the agent.
How is this different from just running the tests and then reviewing coverage?
Classical coverage review happens on a human timescale: days or weeks, with a separate CI run and a separate meeting. The suggestion here runs on a browser timescale: a candidate #Case is written to a buffer before your main test has finished the next scenario. The practical upshot is that when a PR introduces a new route (say, /settings/billing), you do not need a separate pass to notice the route is uncovered. Your existing plan navigates through /settings, the route resolves, page_discovered fires, the discovery LLM reads the billing page and proposes a case, and the merge reviewer sees it the same day.
Which six URL patterns are skipped and why those specifically?
They are exactly these, in regex form: /\/logout/i, /\/api\//i, /^javascript:/i, /^about:blank/i, /^data:/i, /^chrome/i. Logout is skipped because visiting it ends the session and the next discovery would see a cold login screen instead of authenticated app surface. /api/ is skipped because endpoint responses return JSON with no DOM for the agent to click. javascript: and data: are resource URIs, not pages. about:blank is the empty tab state Playwright starts from. chrome:// is internal Chrome UI. Together these six cover every URL that would produce a false-positive 'page' event without any real app surface to test.
What does a discovered case look like, and who turns it into a real test?
The discovery prompt is narrow on purpose. Each proposed case is 1-2 lines, no more than 3-4 actions, references only buttons/links/inputs that were actually visible in the accessibility tree, and skips login, signup, CSS, responsive layout, and performance topics. A real output on a pricing page looks like '#Case: Pricing tier selection. Click Pro plan button. Verify Pro plan is highlighted.' No person is required to 'turn it into' anything. It is already a valid #Case that npx assrt run can execute. The operator decision is whether to merge it into plan.md, and that is a one-line diff review.
Does this make test authoring obsolete?
No. It inverts the default. Up-front authoring is still the right approach for revenue-critical flows where you want a deterministic, reviewed case (checkout, signup, payment, upgrade). Those are legal, financial, or brand risks and you want a human to have looked at the plan. What grows itself is the long tail: side pages, admin screens, account settings, edge routes that nobody thinks to write cases for because they do not hit the 'top 10 flows' dashboard. Discovery fills the tail. Authoring owns the head.
How do you stop the discovery pipeline from drifting toward nonsense?
Four guardrails, all in code. (1) DISCOVERY_SYSTEM_PROMPT forbids login/signup, CSS, responsive, and performance cases, which are the classical 'junk case' categories. (2) Each case must be completable in 3-4 actions, which prevents the LLM from proposing open-ended exploration. (3) Cases must reference actual visible elements, which the accessibility tree enforces: if a label is not in the snapshot, referencing it will fail the first time. (4) The 20-page cap keeps runaway exploration bounded. Put together, a run cannot produce more than 40 candidate cases and every one of them is grounded in a real DOM element the agent saw.