Auto-Discovering Test Scenarios by Crawling: The Two-Tier Prompt Architecture That Makes It Automatable
Crawl-based test discovery is not new. What is new is making it fast enough to run on every commit. Most tools treat crawling as a heavyweight pre-step that takes minutes or hours before a single test executes. Assrt takes a different approach: it uses two completely different AI prompts for initial plan generation versus discovered pages, keeping discovery lightweight enough to run as a side effect of testing. This guide explains the engineering behind that architecture and why it matters for automation.
“Generates real Playwright code, not proprietary YAML. Open-source and free vs $7.5K/mo competitors.”
1. Why Crawl-Based Discovery Fails in CI/CD
The idea behind auto-discovering test scenarios by crawling is straightforward: visit pages, analyze what is on each one, and generate test cases automatically. The problem is not the idea. The problem is execution speed.
Traditional tools like Testaify and Applitools Autonomous run crawling as a separate phase. They visit your root URL, follow every link, build a complete application map, and then generate test scenarios from that map. For a large application, this crawl phase can take 30 minutes to several hours. That works for a weekly regression suite. It does not work for a CI pipeline that needs feedback on every pull request.
The core tension is between coverage and speed. A thorough crawl finds more pages and generates more scenarios, but takes longer. A shallow crawl finishes fast but misses pages behind authentication, multi-step flows, or dynamic routing. Most tools resolve this tension by offering configuration knobs (crawl depth, session timeout, page limit) and leaving the tradeoff to the user.
Assrt resolves it architecturally by using a different AI prompt for discovered pages than for the initial URL. The initial analysis is thorough. Discovery is deliberately minimal. That asymmetry is what makes the whole system automatable.
2. The Two-Tier Prompt System
When you point Assrt at a URL, two different AI system prompts govern test generation. Understanding the difference between them is key to understanding why crawl discovery can run fast enough for automation.
Tier 1: PLAN_SYSTEM_PROMPT (initial URL)
The initial plan generation uses a thorough prompt. It instructs the AI model to generate 5 to 8 test cases, each with 3 to 5 actions. The prompt enforces six rules: cases must be self-contained, use specific selectors, verify observable things only, avoid CSS or layout testing, avoid network simulation, and stay short. The agent takes three screenshots at different scroll positions (0px, 800px, and 1600px), captures the accessibility tree at each position, and feeds up to 8,000 characters of combined content to the model.
This is the expensive step. It sends multiple screenshots and a large text payload to the AI model, and it asks for a substantial response. But it only runs once per URL you explicitly target.
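The context-assembly step above can be sketched in a few lines. The scroll positions (0px, 800px, 1600px) and the 8,000-character cap come from the article; the function names and data shapes here are illustrative, not Assrt's actual API:

```typescript
// Sketch of tier-1 context assembly: combine accessibility-tree
// captures from three scroll positions, then enforce the budget.

interface PageSnapshot {
  scrollY: number;           // scroll offset the capture was taken at
  accessibilityTree: string; // serialized accessibility tree text
}

const SCROLL_POSITIONS = [0, 800, 1600];
const PLAN_CONTEXT_BUDGET = 8_000; // max characters sent to the model

function buildPlanContext(snapshots: PageSnapshot[]): string {
  const combined = snapshots
    .map((s) => `--- scrollY=${s.scrollY} ---\n${s.accessibilityTree}`)
    .join("\n");
  return combined.slice(0, PLAN_CONTEXT_BUDGET);
}

// Example: three captures, one per scroll position.
const context = buildPlanContext(
  SCROLL_POSITIONS.map((scrollY) => ({
    scrollY,
    accessibilityTree: "button 'Sign up'; link 'Docs'",
  }))
);
console.log(context.length <= PLAN_CONTEXT_BUDGET); // true
```

In the real agent the accessibility tree would come from Playwright per scroll position; the point here is only that the payload is hard-capped before it reaches the model.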
Tier 2: DISCOVERY_SYSTEM_PROMPT (crawled pages)
When the agent navigates to a new page during test execution, that page gets analyzed by a completely different prompt. The DISCOVERY_SYSTEM_PROMPT (defined in agent.ts, lines 256 to 267) is deliberately constrained:
- Generate only 1 to 2 cases (not 5 to 8)
- Each case limited to 3 to 4 actions (not 3 to 5)
- Must reference actual visible buttons, links, and inputs
- Explicitly banned: login/signup cases, CSS or responsive layout tests, performance tests
The discovery prompt also receives less context: a single snapshot (truncated to 4,000 characters instead of 8,000) and one screenshot instead of three. This means each discovered page costs roughly a quarter of the AI tokens that the initial plan costs.
Prompt Budget Comparison
| | Initial Plan | Discovery |
|---|---|---|
| Cases generated | 5-8 | 1-2 |
| Actions per case | 3-5 | 3-4 |
| Screenshots sent | 3 | 1 |
| Text context | 8,000 chars | 4,000 chars |
| Login/signup tests | Allowed | Banned |
This is the engineering that makes crawl-based discovery automatable. If every discovered page used the full plan prompt, discovering 15 pages would add 15 thorough AI analyses to the run. With the discovery prompt, those 15 pages add 15 lightweight analyses that complete in seconds each.
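The comparison table can also be read as data. The values below mirror the table; the type and helper function are illustrative, not code from Assrt:

```typescript
// Per-tier prompt budgets, expressed as a config object.

type Tier = "plan" | "discovery";

interface PromptBudget {
  maxCases: number;
  maxActionsPerCase: number;
  screenshots: number;
  contextChars: number;
  allowAuthCases: boolean; // login/signup scenarios
}

const BUDGETS: Record<Tier, PromptBudget> = {
  plan:      { maxCases: 8, maxActionsPerCase: 5, screenshots: 3, contextChars: 8_000, allowAuthCases: true },
  discovery: { maxCases: 2, maxActionsPerCase: 4, screenshots: 1, contextChars: 4_000, allowAuthCases: false },
};

// Discovery sends half the text and a third of the screenshots,
// which is where the "roughly a quarter of the tokens" estimate
// in the article comes from.
function contextRatio(a: Tier, b: Tier): number {
  return BUDGETS[a].contextChars / BUDGETS[b].contextChars;
}

console.log(contextRatio("discovery", "plan")); // 0.5
```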
See two-tier discovery in action
Point Assrt at your staging URL. Watch the thorough plan generate first, then micro-cases appear as the agent discovers pages. No setup required.
Get Started →
3. How Discovery Is Scheduled Between Scenarios
The second piece of the automation puzzle is scheduling. Discovery must not interfere with active test execution, or it would introduce flakiness and unpredictable timing.
Assrt solves this with a browserBusy flag in the test agent. When a scenario is actively running, the flag is set to true, and the discovery queue will not flush. Discovery processing only happens in the gap between scenarios. The function flushDiscovery checks three conditions before processing any URL:
- browserBusy must be false (no active scenario)
- pendingDiscoveryUrls must have at least one entry
- activeDiscoveries must be below the concurrency cap of 3
This means discovery is cooperative, not preemptive. The agent finishes executing a test scenario, checks if any URLs are pending, processes what it can within the concurrency limit, and then moves to the next scenario. If a scenario navigates to five new pages, all five URLs accumulate in the queue and get processed in the gap before the next scenario starts.
The concurrency cap of 3 is worth explaining. Discovery involves calling an AI model, which means network latency and token processing time. Running three analyses in parallel keeps throughput high without overwhelming the model provider. Running more than three would not meaningfully speed things up because the bottleneck is model inference, not I/O.
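A minimal sketch of this cooperative scheduling, under stated assumptions: the names browserBusy, pendingDiscoveryUrls, activeDiscoveries, and flushDiscovery come from the article, but the class shape and return value are invented for illustration:

```typescript
// Discovery only flushes in the gap between scenarios, never during
// one, and at most three analyses are in flight at a time.

const MAX_CONCURRENT_DISCOVERIES = 3;

class DiscoveryScheduler {
  browserBusy = false;                 // true while a scenario runs
  pendingDiscoveryUrls: string[] = [];
  activeDiscoveries = 0;

  enqueue(url: string) {
    this.pendingDiscoveryUrls.push(url);
  }

  // Returns the URLs handed off for analysis on this flush.
  flushDiscovery(): string[] {
    const started: string[] = [];
    while (
      !this.browserBusy &&                       // no active scenario
      this.pendingDiscoveryUrls.length > 0 &&    // something is queued
      this.activeDiscoveries < MAX_CONCURRENT_DISCOVERIES
    ) {
      const url = this.pendingDiscoveryUrls.shift()!;
      this.activeDiscoveries++;
      started.push(url); // real code would start an async AI analysis here
    }
    return started;
  }

  finishDiscovery() {
    this.activeDiscoveries--;
  }
}

const sched = new DiscoveryScheduler();
sched.browserBusy = true; // scenario running: flush is a no-op
["/a", "/b", "/c", "/d", "/e"].forEach((u) => sched.enqueue(u));
console.log(sched.flushDiscovery()); // []
sched.browserBusy = false; // gap between scenarios
console.log(sched.flushDiscovery()); // ["/a", "/b", "/c"] — capped at 3
```

Note that flushDiscovery never preempts anything: it only drains what it can when called from the gap between scenarios, exactly as the text describes.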
4. The Event-Driven Discovery Pipeline
The discovery engine emits three structured events that make it possible to build automation on top of crawl discovery:
- page_discovered: Fired when a new URL passes all filters (deduplication, skip patterns, page cap). Payload includes the URL and a screenshot of the page. This is the earliest signal that the agent found something new.
- discovered_cases_chunk: Fired as the AI model streams partial test case text for a discovered page. Useful for real-time UIs that want to show test generation progress.
- discovered_cases_complete: Fired when all test cases for a discovered page are fully generated. Payload includes the URL and the complete case text in #Case N: name format.
These events are the interface between the discovery engine and whatever automation you build around it. In the web app, they drive the live UI that shows discovered pages appearing in real time. In the MCP server, they get collected into the final test results JSON. In a CI pipeline, you could consume them to build a coverage map that grows across runs.
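A consumer of these events might look like the sketch below. The event names match the article; the payload fields and the emitter wiring are assumptions for illustration:

```typescript
// Build a coverage map from discovery events: record each page when
// it is first seen, then attach its generated cases once complete.

import { EventEmitter } from "node:events";

interface CoverageEntry {
  screenshot?: string; // e.g. a file path or base64 blob
  cases?: string;      // full "#Case N: name" text once complete
}

const coverage = new Map<string, CoverageEntry>();
const discovery = new EventEmitter();

discovery.on("page_discovered", ({ url, screenshot }: { url: string; screenshot: string }) => {
  coverage.set(url, { screenshot });
});

discovery.on("discovered_cases_complete", ({ url, cases }: { url: string; cases: string }) => {
  coverage.set(url, { ...coverage.get(url), cases });
});

// Simulated run:
discovery.emit("page_discovered", { url: "/pricing", screenshot: "pricing.png" });
discovery.emit("discovered_cases_complete", { url: "/pricing", cases: "#Case 1: open pricing FAQ" });
console.log(coverage.get("/pricing")?.cases); // "#Case 1: open pricing FAQ"
```

The same pattern works for discovered_cases_chunk if you want to stream partial case text to a dashboard instead of waiting for completion.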
After every test run, the results (including all discovered pages and their generated cases) are saved to /tmp/assrt/results/latest.json and the full test plan to /tmp/assrt/scenario.md. These files persist between runs, so you can diff them to see what the crawl discovered over time.
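Diffing those persisted results is a one-function job. The file path comes from the article, but its exact JSON schema is not documented there, so the RunResults shape below is an assumption:

```typescript
// Pages present in the current run but absent from the previous one.

interface RunResults {
  discoveredPages: { url: string }[]; // assumed schema of latest.json
}

function newlyDiscovered(prev: RunResults, curr: RunResults): string[] {
  const seen = new Set(prev.discoveredPages.map((p) => p.url));
  return curr.discoveredPages
    .map((p) => p.url)
    .filter((url) => !seen.has(url));
}

const yesterday: RunResults = { discoveredPages: [{ url: "/" }, { url: "/docs" }] };
const today: RunResults = { discoveredPages: [{ url: "/" }, { url: "/docs" }, { url: "/pricing" }] };
console.log(newlyDiscovered(yesterday, today)); // ["/pricing"]
```

In practice each side of the diff would come from JSON.parse(fs.readFileSync("/tmp/assrt/results/latest.json", "utf8")), archived per run by your CI job.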
Test cases you own, from pages you never listed
Every case Assrt discovers is real Playwright code. Export it, version it, run it outside Assrt. Zero vendor lock-in.
Get Started →
5. Wiring Crawl Discovery into CI/CD
Because the discovery engine is bounded (20 pages max, 3 concurrent analyses, 1-2 cases per page), test runs complete in a predictable window regardless of application size. This makes it viable to run on every pull request, not just nightly.
Assrt runs as an MCP server via npx assrt-mcp. In a CI environment, you invoke it with a target URL (typically your staging or preview deployment) and either a stored scenario ID (to re-run a known plan) or no plan at all (to let it generate a fresh plan and discover from scratch).
Two workflows emerge from this:
Pinned scenario with discovery expansion
You create a baseline scenario once using assrt_plan, save the scenario ID, and re-run it on every PR. The pinned scenarios provide stable regression coverage. As the agent executes them, it discovers new pages and generates micro-cases for each one. You get stable tests plus opportunistic expansion on every run.
Fresh crawl on every deploy
You skip the pinned scenario and let Assrt generate a fresh plan for each deployment. This catches structural changes (new pages, removed routes, changed navigation) that a pinned scenario would miss. The tradeoff is less stability: because the plan is regenerated each time, test names and steps may vary between runs.
In either workflow, the hard caps guarantee that the discovery phase finishes. There is no risk of the crawler running indefinitely on a large application. The worst case is 20 discovered pages with 2 cases each, which adds 40 micro-tests to the run.
Frequently Asked Questions
Why does Assrt use two different AI prompts for test generation?
The initial plan prompt (PLAN_SYSTEM_PROMPT) generates 5 to 8 thorough test cases because you only run it once per URL. The discovery prompt (DISCOVERY_SYSTEM_PROMPT) generates 1 to 2 micro-cases because it runs on every page the agent visits during execution. If both used the thorough prompt, discovering 15 pages would add 15 heavyweight AI analyses to the run, making automation impractical in CI/CD pipelines.
How does crawl-based discovery avoid slowing down test execution?
Discovery only runs between scenario executions, not during them. A browserBusy flag prevents the discovery queue from being flushed while a test is actively running. Discovery also caps at 3 concurrent background analysis jobs and 20 total discovered pages per run.
Can I use crawl discovery in a CI/CD pipeline?
Yes. Assrt runs as an MCP server via npx assrt-mcp and can be invoked from any CI environment that supports Node.js. The hard caps on discovery (20 pages, 3 concurrent jobs, 1 to 2 cases per discovered page) ensure predictable execution time regardless of application size.
What events does the discovery engine emit?
Three events: page_discovered (when a new URL is found), discovered_cases_chunk (streaming partial test case text), and discovered_cases_complete (final generated cases for a page). These events can be consumed by automation scripts to build coverage dashboards or accumulate test cases across runs.
Does discovery work on single-page applications with client-side routing?
Yes. Because discovery intercepts the test agent's navigate() calls rather than parsing static HTML links, it captures client-side route changes, dynamically loaded pages, and content behind authentication. The agent resolves relative URLs against the base URL, so SPA routes like /dashboard/settings are discovered correctly.
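That resolution step uses the standard WHATWG URL API shared by Node and browsers; only the base URL below is a made-up example:

```typescript
// Resolve an intercepted client-side route against the run's base URL.

const base = "https://staging.example.com";

function resolveRoute(href: string, baseUrl: string): string {
  return new URL(href, baseUrl).toString();
}

console.log(resolveRoute("/dashboard/settings", base));
// "https://staging.example.com/dashboard/settings"
```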
How is this different from Testaify or Applitools Autonomous?
Those tools crawl your entire application as a separate phase before generating tests, which can take 30 minutes to several hours on large apps. Assrt discovers pages during test execution, caps at 20 pages, and generates real Playwright code (not proprietary formats). It is also open-source, free, and self-hosted with zero vendor lock-in.
Crawl-based discovery that fits in your CI pipeline.
Point Assrt at any URL. It generates test scenarios, discovers pages as it navigates, and produces real Playwright code you own. Free and open-source.