Tool evaluation · Disposable inbox · wait_for_stable · Scenario continuity

A natural language test case descriptions tool is defined by the three capabilities it ships, not by the marketing copy around the English input.

Most of the SERP treats "natural language test case descriptions tool" as a compiler question: can we turn an English sentence into a Playwright or Selenium script? Assrt answers a different question: once the English sentence reaches a live browser, what capabilities does the agent actually hold? Three of them decide whether the tool survives a real app. I'll show the source for each.

Assrt Engineering · 11 min read
- Disposable inbox via temp-mail.io (email.ts:43-77)
- wait_for_stable injects a MutationObserver (agent.ts:956-1009)
- One browser per plan; scenarios inherit state (agent.ts:462-493)
- Open source, MIT, bring your own Anthropic or Gemini key

Why the "tool" question is different from the "language" question

The sibling guides on this site cover the language side: the grammar is one regex, the runtime binds English to an ARIA snapshot. That's the how-to view and the automation view. This page is about the tool the English reaches. What does it actually do once your sentence hits the runtime? Demo a few of these tools against any real app with auth, streaming UI, and more than one user flow, and the field of competitors shrinks sharply.

The shortlist below is not about feature count. It's about capabilities that take real engineering to ship, that almost no vendor page in the SERP surfaces, and that you can verify by reading a single file in the open-source repo.

The 500ms anchor fact

wait_for_stable is not sugar on setTimeout. The agent injects `window.__assrt_observer = new MutationObserver(...)` into the live page, polls the mutation counter every 500ms, and only unblocks once the DOM has been quiet for stable_seconds. Then it calls observer.disconnect().

Source: agent.ts:956-1009
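The agent-side half of that loop can be sketched as a small polling function. This is a minimal sketch, not the shipped agent.ts code: `readCounter` stands in for the `browser.evaluate()` call that reads `window.__assrt_mutations`, and the 500ms default matches the poll interval described above.

```typescript
// Sketch of the agent-side stability poll (hypothetical helper, not the shipped code).
// readCounter stands in for browser.evaluate(() => window.__assrt_mutations).
async function waitForStable(
  readCounter: () => Promise<number>,
  stableMs: number,   // how long the DOM must stay quiet (stable_seconds * 1000)
  pollMs = 500,       // the article's anchor-fact poll interval
  timeoutMs = 30_000,
): Promise<boolean> {
  const started = Date.now();
  let last = await readCounter();
  let stableSince = Date.now();
  while (Date.now() - started < timeoutMs) {
    await new Promise((r) => setTimeout(r, pollMs));
    const now = await readCounter();
    if (now !== last) {
      last = now;
      stableSince = Date.now(); // mutations happened: reset the quiet window
    } else if (Date.now() - stableSince >= stableMs) {
      return true;              // DOM quiet long enough
    }
  }
  return false;                 // gave up: the page never settled
}
```

Note how this fails in neither direction a fixed sleep does: a fast page returns as soon as it settles, and a slow streaming response keeps resetting `stableSince` until it actually finishes.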

The three capabilities

Everything else in this page drills into one of these. Each card names the file, the line range, and the one English sentence that activates it.

Capability 1: Disposable inbox

create_temp_email, wait_for_verification_code, and check_email_inbox hit temp-mail.io directly. Your signup case works without a hard-coded test email.

email.ts:9 · BASE = temp-mail.io/api/v3
email.ts:43 · DisposableEmail.create()
email.ts:82 · waitForVerificationCode()
Capability 2: wait_for_stable

The agent injects a MutationObserver into document.body and returns only after the DOM has been quiet for stable_seconds. A real fix for chat, streaming AI, and lazy lists.

agent.ts:186-196 · tool schema
agent.ts:956-1009 · evaluate() injection
window.__assrt_mutations counter
Capability 3: Scenario continuity

One browser process per plan. #Case 2 inherits cookies and localStorage from #Case 1. Previous summaries flow into the next case's prompt.

agent.ts:462-493 · plan loop
agent.ts:661-665 · previousSummaries
isFirstScenario routing

Capability 1. A disposable inbox, not a configured test address

Every real onboarding flow eventually needs a verification email. The tools that market themselves on "English test cases" almost always punt: either you set a fixed test email in their dashboard, or you stub the mail service, or you skip the verification step entirely and hope nothing else depends on it. Assrt ships a disposable inbox as a first-class tool on the agent. The sentence "Get a disposable email" activates real HTTP calls to temp-mail.io.

assrt-mcp/src/core/email.ts:1-77

Inside a #Case block, you never think about the HTTP calls. You write English. The agent translates "Get a disposable email" into a create_temp_email tool call, "Wait for the verification code" into wait_for_verification_code, and "Paste the code" into an evaluate call that uses a ClipboardEvent to fill any 6-digit OTP layout (including the common maxlength=1 one-box-per-digit pattern).
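The one-box-per-digit case reduces to a small mapping decision before any paste event is dispatched. A hypothetical helper (`otpFillPlan` is an illustration, not a real Assrt function) that sketches the decision; in the real page the values would then be delivered via a ClipboardEvent, per the paragraph above:

```typescript
// Hypothetical helper: decide what to type into each OTP input.
// One input gets the whole code; N maxlength=1 boxes get one digit each.
function otpFillPlan(code: string, inputCount: number): string[] {
  if (inputCount <= 1) return [code];             // single field: paste whole code
  if (inputCount === code.length) {
    return code.split("");                        // one-box-per-digit layout
  }
  throw new Error(
    `cannot map ${code.length}-digit code onto ${inputCount} inputs`,
  );
}
```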

npx assrt run — signup flow, real inbox, real OTP

Capability 2. wait_for_stable injects a real MutationObserver

This is the anchor fact for the page. When your app does anything async that you can't wait on with a specific piece of text, the usual workaround is a fixed sleep(ms). That workaround fails in both directions: it either truncates a slow streaming response or it wastes seconds on a fast one. Assrt ships wait_for_stable as an agent tool that binds to DOM reality.

assrt-mcp/src/core/agent.ts:956-1009

There are two details worth pausing on. First, characterData: true in the observer config: this counts text token streams, not just DOM node additions. For a chat UI that streams a reply one token at a time inside one node, pure childList wouldn't fire. Second, the cleanup: the observer is disconnected and both window globals are deleted before the tool returns. No background counter leaks into the next scenario.
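A minimal sketch of what the injected page-side source could look like, written as the strings an agent might pass to `browser.evaluate`. The global names match the ones cited on this page; the exact shipped code in agent.ts may differ:

```typescript
// Source an agent could inject via browser.evaluate() (sketch, not agent.ts verbatim).
const INJECTED_OBSERVER_SRC = `
  window.__assrt_mutations = 0;
  window.__assrt_observer = new MutationObserver((records) => {
    window.__assrt_mutations += records.length;  // count every batch of changes
  });
  window.__assrt_observer.observe(document.body, {
    childList: true,      // nodes added or removed
    subtree: true,        // anywhere under body
    characterData: true,  // text token streams, per the detail above
  });
`;

// Cleanup injected once the DOM has been quiet long enough:
const CLEANUP_SRC = `
  window.__assrt_observer.disconnect();
  delete window.__assrt_observer;
  delete window.__assrt_mutations;
`;
```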

How wait_for_stable round-trips with the live page

1. #Case sentence "Wait until the page is stable" reaches the Assrt agent.
2. Agent, via browser.evaluate(): inject new MutationObserver(...) and call observe(document.body, {childList, subtree, characterData}).
3. Page: each mutation event bumps window.__assrt_mutations += N.
4. Agent polls window.__assrt_mutations every 500ms, reading the count (and bumping stableSince when it is unchanged).
5. Once quiet for stable_seconds: disconnect() and delete __assrt_observer.
6. Agent resumes the next English step in the #Case.
A #Case that waits on a streaming AI reply

Capability 3. A plan is a session, not a script

Most real user flows have a prerequisite. To test posting a comment, the user needs to already be signed in. Tools that run each English test case in a fresh browser either force you to duplicate the login sentences at the top of every case, or they extract a shared setup hook into a config file — at which point the "natural language" promise has leaked back into scaffolding. Assrt keeps one browser process open for the whole plan and lets each #Case inherit whatever the previous one did.

assrt-mcp/src/core/agent.ts:462-493 + 660-665
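The loop shape described above can be sketched with stand-in types. This is a minimal sketch of the one-browser-per-plan structure, not the shipped agent.ts:

```typescript
// Sketch of the plan loop (stand-in interfaces, not agent.ts verbatim).
interface Browser { close(): Promise<void>; }
interface Scenario { name: string; steps: string[]; }
type RunOne = (
  s: Scenario,
  isFirstScenario: boolean,
  previousSummaries: string[],
) => Promise<string>; // returns the scenario summary

async function runPlan(
  scenarios: Scenario[],
  launch: () => Promise<Browser>,
  runScenario: RunOne,
  keepBrowserOpen = false,
): Promise<string[]> {
  const browser = await launch();          // one browser for the whole plan
  const results: string[] = [];
  try {
    for (let i = 0; i < scenarios.length; i++) {
      // Scenario i sees every earlier summary; cookies and localStorage
      // carry over because the browser is never closed between cases.
      results.push(await runScenario(scenarios[i], i === 0, [...results]));
    }
  } finally {
    if (!keepBrowserOpen) await browser.close(); // only at end of plan
  }
  return results;
}
```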

One plan, N scenarios, one browser

1. Parse. parseScenarios on agent.ts:621 splits the plan on the #Case / Scenario / Test header into N scenarios. The browser is launched once for the whole plan, before any scenario runs.

2. Run scenario 1 (isFirstScenario=true). The agent navigates to the base URL, snapshots, and executes the English steps against the 18-tool surface. When complete_scenario fires, the browser stays open; the summary is pushed into `results[]`.

3. Feed summaries into the next prompt. On scenario 2, contextInfo switches from 'Navigated to: base URL' to 'Previous Scenarios (browser state carries over): 1. <name>: PASSED — <summary>'. The model sees that the cookies and localStorage from scenario 1 are live.

4. Reuse the session. Scenario 2 does not navigate to the base URL. It starts from wherever scenario 1 left off. A test like '#Case 2: Create a post' can assume the user from '#Case 1: Sign up' is already authenticated, because the agent never closed the tab.

5. Terminate at end of plan. The browser is closed only when the whole plan finishes, or left open if keepBrowserOpen is set. Either way, every scenario in a single plan shares one browser process.
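The parse step can be sketched as one regex pass over the plan text. The regex below is illustrative; the shipped one lives at agent.ts:621 and may differ:

```typescript
// Sketch of the header split (illustrative regex, not the one at agent.ts:621).
interface ParsedScenario { name: string; body: string; }

function parseScenarios(plan: string): ParsedScenario[] {
  // A header line starts with '#' followed by Case, Scenario, or Test.
  const header = /^#\s*(?:Case|Scenario|Test)\b[^\n]*/gim;
  const matches = [...plan.matchAll(header)];
  return matches.map((m, i) => {
    const start = m.index! + m[0].length;
    const end = i + 1 < matches.length ? matches[i + 1].index! : plan.length;
    return {
      name: m[0].replace(/^#\s*/, "").trim(), // e.g. "Case 1: Sign up"
      body: plan.slice(start, end).trim(),    // the English steps under the header
    };
  });
}
```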

Practical consequence

A three-case plan like #Case 1: Sign up, #Case 2: Create a post, #Case 3: Log out and sign back in runs as one coherent user journey. Case 2 never has to re-enter credentials because the auth cookie from Case 1 is still live. Case 3's sign-back-in assertion is meaningful because it actually exercises the log-out codepath from Case 2.
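Written out as a plan file, that journey might look like this. An illustrative scenario.md using the #Case grammar described on this page; the individual sentences are examples, not required phrasing:

```markdown
#Case 1: Sign up
Get a disposable email
Type the email into the Email field
Wait for the verification code
Paste the code

#Case 2: Create a post
Type "Hello from Assrt" into the post box
Click Publish
Wait until the page is stable

#Case 3: Log out and sign back in
Click Log out
Sign back in with the same email
```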

How the three capabilities route through the agent

The English sentence is the input. The agent is the hub. The capabilities are the tools it can pick. None of them are pre-compiled; the model resolves the sentence against the live page snapshot on every turn and chooses the right tool.

#Case prose → TestAgent → toolbelt

Inputs (the English sentence, the ARIA snapshot, the previous summary) feed the TestAgent loop, which picks tool calls such as create_temp_email, wait_for_stable, or complete_scenario.
3 · Runtime capabilities built in: disposable inbox, wait_for_stable, scenario continuity
18 · Agent tools an English sentence can compile to (agent.ts:16-196)
500ms · Poll interval wait_for_stable uses on window.__assrt_mutations
1 · Browser process per plan, shared across every #Case (agent.ts:462-493)
create_temp_email · wait_for_verification_code · check_email_inbox · wait_for_stable · MutationObserver injection · Scenario continuity · isFirstScenario routing · previousSummaries → prompt · Playwright MCP refs · ARIA snapshot selectors · http_request for webhooks · evaluate() escape hatch · Anthropic + Gemini · npx assrt run · assrt_test / assrt_plan / assrt_diagnose · Open-source (MIT)

Every pill above is a real tool name, file, or concept you can grep for in assrt-mcp/src. No mystery surface.

Assrt vs the English-to-script compiler category

Every row below is comparable. "Signup flows" is a real capability that real apps need. "Streaming UI" either works or doesn't. "Multi-case plans" is either a session or a bunch of fresh tabs. The on-disk artifact is something you either commit or can't.

| Feature | English-to-script NLP compilers | Assrt |
| --- | --- | --- |
| What the tool actually ships | A hosted English-to-script compiler. You send prose, they run it in their cloud, you pay per seat. | A Node agent + Playwright MCP + an 18-tool surface. You import it or run npx assrt-mcp and it runs on your box. |
| Signup / login flows | Usually requires a fixed @example.com address in a settings page, or mocking the mail service. Verification codes fall out of scope. | Built-in disposable inbox. `create_temp_email` hits temp-mail.io/api/v3 and the agent polls waitForVerificationCode until the OTP arrives (email.ts:43-77). |
| Streaming / async UI (chat, AI responses, lazy lists) | Fixed `wait(ms)` or an AI-flavored retry that re-runs the assertion. Neither actually observes DOM stability. | wait_for_stable injects a MutationObserver on document.body, polls window.__assrt_mutations every 500ms, returns when the DOM is quiet for `stable_seconds`. |
| Multi-case plans | Each case runs in a fresh session. Login has to be inlined into every case or handled by a preamble hook. | One browser process per plan. #Case 2 inherits cookies and localStorage from #Case 1. Previous summaries are fed to the next model turn as context (agent.ts:661-665). |
| On-disk artifact | Vendor database row or a proprietary JSON/YAML bundle. Grammar is a keyword list enforced by their parser. | One markdown file at /tmp/assrt/scenario.md. Commit it, diff it, grep it. The grammar is a regex on agent.ts:621. |
| Where the LLM lives | Their model, their account, their rate limits. No choice of provider. | Your Anthropic or Gemini key, your runtime. Two model backends share one function-declaration adapter (agent.ts:275-301). |
| Scenario recording | Cloud-hosted video replay, usually gated behind a higher-tier plan. | Optional Playwright video per plan + an auto-served HTML player with per-frame scrubbing (assrt-mcp/src/mcp/server.ts:35-110). |
| License cost | Up to $7,500 / month with seat limits and watermark caveats. | $0. Open source, MIT license, you own the code. |

No hidden cloud. Every moving part is an npm dependency or an open protocol.

What's actually under the hood

Playwright MCP

Every English sentence ultimately calls the Playwright MCP server over stdio. The 18 tools (navigate, snapshot, click, type, scroll, press_key, wait, evaluate...) are wrappers around that surface.

temp-mail.io

Disposable inboxes powering `create_temp_email`, `wait_for_verification_code`, and `check_email_inbox`. Token-authenticated; 10-character mailboxes by default.

Anthropic (claude-haiku-4-5-20251001)

Default model for the agent loop. Low-latency tool calling with the exact 18-tool schema defined in agent.ts:16-196.

Google Gemini (gemini-3.1-pro-preview)

Alternate backend. A matching function-declaration adapter at agent.ts:275-301 normalizes types so the same #Case works across providers.

Node + TypeScript

Everything is a TS package. `npx assrt run ...` from any repo. No binary to install, no container to pull.

MCP (Model Context Protocol)

The whole agent is exposed as an MCP server with three tools: assrt_test, assrt_plan, assrt_diagnose. Wire it to Claude Code, Cursor, or any MCP client.

3 · Runtime capabilities competitors usually skip
18 · Agent tools the English sentence can bind to
500 · ms between mutation counter polls in wait_for_stable
0 · Vendor cloud dependencies (MIT, self-host, bring your own LLM)

Hand us an onboarding flow that needs a real inbox and a streaming UI

Thirty minutes. Send the URL of your dev or staging app and one user flow. We'll write a plan that uses all three capabilities (disposable inbox, wait_for_stable, scenario continuity), run it live, and hand you the scenario.md plus the log so you can see every English sentence map onto a real tool call.

Book a call

FAQ on the natural language test case descriptions tool

What exactly is a "natural language test case descriptions tool"?

A tool that accepts English descriptions of test cases and turns them into something runnable. In the top SERP results this almost always means an English-to-test-script compiler: you write a sentence, the tool emits Selenium/Playwright code, and that code runs later. Assrt is in a different subcategory: the English is the program. The agent reads your #Case block and the live ARIA tree on every turn and picks one of 18 tool calls. Nothing is compiled in advance.

What makes Assrt's toolbelt different from testRigor, ACCELQ, TestSigma, Functionize, TestCraft, and Qase?

Three runtime capabilities that those tools either skip or paywall. First, a built-in disposable inbox: `create_temp_email` + `wait_for_verification_code` + `check_email_inbox` call temp-mail.io directly, so your signup #Case works end-to-end without a hard-coded test address. Second, `wait_for_stable`: the agent injects a MutationObserver into the live page and only returns once the DOM has been quiet for N seconds. This is critical for chat, streaming AI responses, and lazy lists. Third, scenario continuity: one browser process per plan. `#Case 2` inherits cookies and localStorage from `#Case 1`, and the model sees the previous case's summary in its context.

What is the anchor fact behind wait_for_stable?

Look at assrt-mcp/src/core/agent.ts, lines 956-1009. When the agent executes `wait_for_stable`, it calls `browser.evaluate(...)` to install `window.__assrt_observer = new MutationObserver(...)` and `observer.observe(document.body, {childList, subtree, characterData})`. It then polls `window.__assrt_mutations` every 500ms. When the counter stops incrementing for `stable_seconds`, it returns. Then it calls `observer.disconnect()` and deletes both globals. That is the real primitive: an observer injected into the target page, not a fixed sleep.

How does the disposable inbox work under the hood?

`DisposableEmail.create()` in email.ts:43-55 POSTs to `https://api.internal.temp-mail.io/api/v3/email/new` with `{min_name_length: 10, max_name_length: 10}` and stores the returned email + token. `waitForVerificationCode()` polls `/email/{address}/messages` every 3 seconds (default) and extracts the numeric code using a cascading regex. In your #Case you just write 'Get a disposable email' as an English sentence; the agent turns it into a `create_temp_email` tool call and the address is injected into the agent's prompt so the next `type_text` call uses it.
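The extraction step of that cascade might look as follows. The patterns are illustrative, ordered most-specific first; the actual patterns in email.ts may differ:

```typescript
// Sketch of a cascading OTP extractor (illustrative patterns, not the shipped email.ts).
function extractVerificationCode(body: string): string | null {
  const patterns = [
    /(?:code|otp|pin)\D{0,20}?(\d{4,8})/i, // "your code is 482913"
    /\b(\d{6})\b/,                          // bare 6-digit token
    /\b(\d{4,8})\b/,                        // any 4-8 digit run as a last resort
  ];
  for (const p of patterns) {
    const m = body.match(p);
    if (m) return m[1];
  }
  return null; // no plausible code in this message
}
```

The cascade matters because real verification emails vary wildly: falling through from labeled codes to bare digit runs keeps the common cases precise without failing the odd ones.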

Why does scenario continuity matter for a natural language testing tool?

Because most real flows have a prerequisite. If you want to test 'post a comment', you need to be signed in first. In tools that run each test case in a fresh browser, you end up either (a) duplicating the login sentences at the top of every case or (b) extracting a shared setup hook in a config file — at which point the 'natural language' promise has leaked back into scaffolding. Assrt keeps the whole plan in one browser: `#Case 1: Sign up` leaves `#Case 2: Post a comment` already authenticated. The outer loop in agent.ts:462-493 never closes the browser between cases, and `previousSummaries` in runScenario feeds every prior result back into the next model turn.

What if my #Case needs something the 18-tool surface cannot do?

There's an escape hatch: the `evaluate` tool runs arbitrary JavaScript in the page. That covers anything you would otherwise reach for with devtools — reading localStorage, triggering a custom event, posting to a postMessage channel. But the happier pattern is to write the sentence so it binds to one of the observation-grade tools (assert against visible text, use http_request to hit a webhook endpoint). If a sentence can't bind to any tool, the agent will usually call `suggest_improvement` with a rewrite hint instead of confabulating a pass.

Where do I put my API keys / auth tokens so the English case can use them?

Two options. Per run, pass a `variables` map with your plan: any `{{VAR_NAME}}` in the English text gets substituted before parsing (agent.ts:377-381). Long term, keep secrets in macOS keychain and let the MCP server read them via the keychain helpers in assrt-mcp/src/core/keychain.ts. Either way the sentence reads naturally: 'Type {{TEST_USER}} into the Email field.' The agent sees the substituted value; your markdown stays safe to commit.
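The substitution step is small enough to sketch in full. This is illustrative; agent.ts:377-381 is the shipped version:

```typescript
// Sketch of {{VAR_NAME}} substitution before parsing (illustrative, not agent.ts verbatim).
function substituteVars(text: string, vars: Record<string, string>): string {
  return text.replace(/\{\{(\w+)\}\}/g, (whole, name: string) =>
    name in vars ? vars[name] : whole, // unknown vars left intact for easy diagnosis
  );
}
```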

Is the tool open source and self-hosted?

Yes. Both assrt (the web app) and assrt-mcp (the MCP server / CLI) are Node packages with MIT licenses. `npx assrt-mcp` runs the MCP server; `npx assrt run --url <u> --plan "$(cat scenario.md)"` runs a plan from the CLI. Bring your own Anthropic or Gemini key. Artifacts land in /tmp/assrt/ by default: scenario.md, results/latest.json, and per-plan video recordings if you pass --video.
