AI tools for testing is a list of 18 function calls in one TypeScript file.
Every top-ranked roundup for this keyword walks you through a dozen vendors with cloud DSLs, per-seat pricing, and polite descriptions of “self-healing AI.” None of them open the agent. This page does. The Assrt MCP server exposes exactly 18 tools to a Claude Haiku 4.5 instance, declared as a TOOLS const at assrt-mcp/src/core/agent.ts:16. Two of those tools plus a cascade of 7 regexes at email.ts:101 handle the verification-code wall that every other “AI testing tool” quietly asks you to work around.
The anchor fact
Install @assrt-ai/assrt, run `grep -c 'name: "' node_modules/assrt-mcp/dist/core/agent.js`, and the terminal prints 18.
That is the complete surface the AI is allowed to touch. Every “AI tools for testing” comparison page you will find on Google treats the AI as a black box. This one treats it as a file you can read, licensed MIT, with line numbers you can verify.
The whole palette, scrollable
This is the full TOOLS array, in declaration order. Anything your Markdown plan asks for has to bottom out in one of these names, or the agent has to refuse.
Count the pills: 18. That count is the ceiling on the agent's power and also its floor. Narrow palettes mean the Haiku call picks the right tool on almost every step and almost never hallucinates a non-existent one.
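What the model actually sees is a declared-order array of function-call schemas. Here is a sketch of the shape, not a copy: the field layout below is an assumption, and only two illustrative entries are shown; the real const with all 18 lives at agent.ts:16.

```typescript
// Hypothetical sketch of the TOOLS const shape. The real declaration is
// at assrt-mcp/src/core/agent.ts:16 and may differ in field names.
type Tool = {
  name: string;                           // what the LLM emits in a tool call
  description: string;                    // one line the model reads at inference
  input_schema: Record<string, unknown>;  // JSON Schema for the arguments
};

const TOOLS: Tool[] = [
  {
    name: "navigate",
    description: "Open a URL in the current tab",
    input_schema: { type: "object", properties: { url: { type: "string" } }, required: ["url"] },
  },
  {
    name: "snapshot",
    description: "Return the page's accessibility tree with stable ref IDs",
    input_schema: { type: "object", properties: {} },
  },
  // …16 more entries, one per pill above…
];

console.log(TOOLS.length); // 2 in this sketch; 18 in the real file
```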
How the 18 tools bundle into six jobs
Perception
Reads what's on the screen before acting. snapshot returns the page's accessibility tree with stable ref IDs; screenshot captures a frame that gets attached to the next LLM turn.
snapshot, screenshot, evaluate

Interaction
The verbs every browser test needs. Each call is passed through to the official @playwright/mcp server, so the wire protocol is identical to what any Playwright user's code would produce.
navigate, click, type_text, select_option, scroll, press_key

Timing
wait accepts a text pattern or a fixed ms (capped at 10s). wait_for_stable watches a MutationObserver and resolves as soon as the DOM is quiet; this is the one you reach for on streaming AI apps.
wait, wait_for_stable

Verification gates
Disposable inbox from temp-mail.io, plus a 7-regex cascade that extracts the code out of whatever marketing-heavy HTML the sender wrapped it in. Makes OTP-gated signups testable without a human mailbox.
create_temp_email, wait_for_verification_code, check_email_inbox

Reporting
assert records a boolean with evidence; complete_scenario closes out the current #Case; suggest_improvement is the escape hatch the agent uses when it spots an obvious bug unrelated to the plan.
assert, complete_scenario, suggest_improvement

Out-of-band
http_request lets the agent poll an external API (Telegram getUpdates, Slack conversations.history, GitHub issues) to verify that a web action produced the side effect you expected.
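A plan step like "verify the bot received the message" bottoms out in a single http_request call. A hypothetical payload follows; the argument names (`method`, `url`) are an assumption for illustration, not copied from the real schema in agent.ts.

```typescript
// Hypothetical http_request tool-call arguments. Argument names are
// illustrative; the real schema is in the TOOLS const in agent.ts.
const call = {
  name: "http_request",
  arguments: {
    method: "GET",
    url: "https://api.telegram.org/bot<TOKEN>/getUpdates",
    // the agent then asserts on the JSON body it gets back
  },
};

console.log(call.name);
```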
http_request

Your English goes in. 18 tools come out. Chromium drives.
The MCP server sits between the LLM and the browser. The LLM never speaks Playwright; it emits tool calls named in the TOOLS const. The server translates each call into the equivalent @playwright/mcp request, and the browser acts.
AI tools for testing, wired up
The hardest part of AI testing is the verification code. Here is how we pass it.
Ranked-list guides for AI testing tools rarely mention OTP. The honest reason is that most products either stub the email API in test, or ask a human to paste the code. Neither is acceptable once the AI is supposed to replace the human.
The agent creates a disposable address through temp-mail.io, types it into the signup form, clicks through, and waits up to 60 seconds polling the inbox every 3 seconds. When mail arrives, seven regexes take turns, most specific first:
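The cascade can be sketched like this. The pattern categories (code, verification, OTP, pin, then bare 6/4/8-digit runs) come from the description above; the exact regex characters are a reconstruction, not the verbatim contents of email.ts:101.

```typescript
// Sketch of the 7-regex cascade described above, most specific first.
// The real patterns live at email.ts:101-109 and may differ character-for-character.
const patterns: RegExp[] = [
  /code[:\s]*([0-9]{4,8})/i,          // "Your code: 482913"
  /verification[:\s]*([0-9]{4,8})/i,  // "verification 482913"
  /otp[:\s]*([0-9]{4,8})/i,
  /pin[:\s]*([0-9]{4,8})/i,
  /\b([0-9]{6})\b/,                   // bare 6-digit run
  /\b([0-9]{4})\b/,                   // bare 4-digit run
  /\b([0-9]{8})\b/,                   // bare 8-digit run
];

function extractCode(body: string): string | null {
  for (const p of patterns) {
    const m = body.match(p);
    if (m) return m[1]; // first (most specific) hit wins
  }
  return null;
}

console.log(extractCode("Your verification code: 482913"));
```

Ordering is the whole trick: a marketing email often contains dates, prices, and tracking numbers, so the bare-digit-run patterns only fire after every keyword-anchored pattern has missed.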
Split-input OTP widgets would normally break a tool-by-tool approach: typing into six separate one-character inputs confuses most React OTP components. The SYSTEM_PROMPT tells the model to bypass this with one evaluate call that dispatches a synthetic ClipboardEvent carrying the full code. One tool. One paste. Six boxes populate together.
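The evaluate tool receives a JavaScript expression as a string. Here is a reconstruction of what that paste expression could look like; the verbatim version lives in the SYSTEM_PROMPT at agent.ts:235, and the helper name `otpPasteExpr` is hypothetical.

```typescript
// Build the JS expression the agent would pass to the evaluate tool.
// Reconstructed for illustration; the repo's verbatim expression may differ.
const otpPasteExpr = (code: string) => `() => {
  const inp = document.querySelector('input[maxlength="1"]');
  if (!inp) return false;
  const dt = new DataTransfer();
  dt.setData('text/plain', '${code}');
  // A bubbling paste event lets React's delegated listener treat the
  // six single-character inputs as one atomic fill.
  const evt = new ClipboardEvent('paste', { clipboardData: dt, bubbles: true });
  (inp.parentElement ?? inp).dispatchEvent(evt);
  return true;
}`;

console.log(otpPasteExpr("482913").includes("ClipboardEvent"));
```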
An end-to-end signup, live in the terminal
Twelve tool calls from “new tab” to “account exists.” No human ever sees the mailbox, and the OTP arrives through the 6-digit regex branch of the cascade above.
Sequence view: one scenario, exactly the calls the agent makes
Scenario: sign up with OTP
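The plan side of that scenario is plain Markdown. The block below is illustrative only: the exact #Case grammar lives in the scenario parser, so treat the step wording here as a sketch, not the canonical syntax.

```markdown
#Case Sign up with OTP
1. Navigate to the signup page
2. Create a disposable email and enter it in the email field
3. Submit the form
4. Wait for the verification code from the inbox
5. Paste the code into the OTP inputs and click Verify
6. Assert the account dashboard is visible
```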
Why this matters for streaming UIs
wait_for_stable is the tool every “AI tools for testing” list forgot
Modern AI apps stream tokens for anywhere from 500ms to 30 seconds per turn. A fixed sleep(10000) is either flaky or wasteful. The wait_for_stable tool installs a MutationObserver, counts changes into window.__assrt_mutations, and returns as soon as the counter has been unchanged for a configurable quiet period (default 2s).
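The settle logic above reduces to a small polling loop. This sketch keeps the numbers from the text (500ms poll, 2s quiet period); `readCount` stands in for the evaluate call that reads `window.__assrt_mutations`, and the 10s timeout is an assumption borrowed from the wait tool's cap.

```typescript
// Sketch of wait_for_stable's settle loop. The real implementation is at
// agent.ts:956-1009; this reconstruction only shows the polling strategy.
async function waitForStable(
  readCount: () => Promise<number>, // stands in for evaluate(window.__assrt_mutations)
  quietMs = 2000,                   // default quiet period from the text
  pollMs = 500,                     // poll interval from the text
  timeoutMs = 10_000,               // assumed cap, borrowed from the wait tool
): Promise<boolean> {
  let last = await readCount();
  let quietSince = Date.now();
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    await new Promise((r) => setTimeout(r, pollMs));
    const now = await readCount();
    if (now !== last) {
      last = now;
      quietSince = Date.now();      // DOM still changing: reset the clock
    } else if (Date.now() - quietSince >= quietMs) {
      return true;                  // counter flat long enough: stable
    }
  }
  return false;                     // never settled before the timeout
}
```

The payoff is exactly what the paragraph claims: a 14-second stream costs 14 seconds plus one quiet period, while a 500ms render costs barely more than the quiet period itself.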
The numbers that actually matter when you grep the repo
Numbers from listicles are usually made up. These are line counts and constants you can verify in one shell command.
18 tools in TOOLS at agent.ts:16. 7 regex patterns in patterns at email.ts:101. 500ms poll interval and 3 concurrent discovery agents from MAX_CONCURRENT_DISCOVERIES at agent.ts:269.
“The tool palette the agent actually calls, not a marketing diagram.”
agent.ts:16 (TOOLS const)
Versus the typical 'AI tools for testing' listicle entry
Generalised from the top Google results for this keyword.
| Feature | Typical AI testing tool | Assrt |
|---|---|---|
| What drives the browser | Proprietary runtime, cloud-hosted, inaccessible | Official @playwright/mcp dispatched by Claude Haiku |
| Tool palette exposed to the AI | Opaque; vendor-controlled, not documented | 18 named tools, declared as a const at agent.ts line 16 |
| OTP verification flows | Manual code grab or API stub needed | Disposable inbox plus 7-regex cascade, fully hands-off |
| Waiting for streaming UI to settle | Fixed sleep or vendor 'smart wait' | MutationObserver polled every 500ms, settles in 2s of quiet |
| License & hosting | Commercial SaaS, $150 to $7,500 / month | MIT on npm as @assrt-ai/assrt; 100% self-hostable |
| Where your tests live | Vendor DB, inaccessible without export credits | Markdown on your disk, synced to /tmp/assrt/ locally |
What the palette unlocks on day one
Flows most AI testing tools treat as edge cases
- Sign up a fresh account end-to-end on a site you do not own
- Pass any email OTP gate without touching an inbox yourself
- Test a streaming AI chat app without guessing the right sleep()
- Verify that a form submission actually reached a Telegram bot or webhook
- Hand a non-engineer a Markdown file they can edit and re-run
- Run the entire loop on a laptop with no vendor account
Read it yourself, or keep reading vendor brochures
The entire agent fits in one file. TOOLS at line 16. SYSTEM_PROMPT at line 198. The OTP paste rule at line 235. wait_for_stable at line 956. email.ts adds 131 lines for the disposable inbox and the regex cascade. Clone the repo, open the file, verify the count.
Talk to the maintainer about wiring Assrt into your CI
30 minutes on a call to map the 18-tool palette onto your real signup, OTP, and streaming-UI flows. Bring a URL.
Book a call →

Questions the listicle posts never answer
What does 'AI tools for testing' actually mean once you open the agent?
At the level of the Assrt MCP server it means a fixed list of 18 function-call schemas that a Claude Haiku 4.5 instance is allowed to invoke. They are declared as a const named TOOLS at assrt-mcp/src/core/agent.ts line 16, and they are the only surface the agent is exposed to. navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable. Clone the repo, grep for 'name: "' in that file, and you will count exactly 18. Nothing else happens server-side. No hidden cloud DSL, no proprietary selector engine.
How does the AI deal with signup flows that require a verification code?
Two tools do the work together. create_temp_email creates a real disposable address through temp-mail.io (the agent calls POST https://api.internal.temp-mail.io/api/v3/email/new at email.ts line 44), then wait_for_verification_code polls the inbox every 3 seconds for up to 60 seconds by default (120 seconds max). When mail arrives, a cascade of 7 regexes tries to extract the code, most specific first: code/Code/CODE, verification/Verification, OTP/otp, pin/PIN/Pin, then plain 6-digit, 4-digit, 8-digit runs. That cascade is at email.ts lines 101 to 109. You never touch the email yourself.
Most OTP login forms split the code across six one-character inputs. Does that break it?
No, and the workaround is hard-coded into the agent system prompt at agent.ts line 235 so the LLM does not have to reinvent it per site. Instead of typing into each input, the agent calls the evaluate tool with a specific JavaScript expression that dispatches a synthetic ClipboardEvent carrying the full code to the parent of the first input. React and most OTP widgets treat a paste as one atomic fill, so all six boxes populate at once and the Verify button becomes clickable. The expression literally starts with `() => { const inp = document.querySelector('input[maxlength="1"]');`. You can read it verbatim in the repo.
What is wait_for_stable and why does it matter more than a regular wait?
Most AI test tools paper over streaming UIs with a fixed sleep. Assrt installs a MutationObserver on document.body via evaluate, stores the running mutation count at window.__assrt_mutations, then polls every 500ms and resolves as soon as the count stops increasing for a configurable quiet period (2 seconds by default). Implementation lives at agent.ts lines 956 to 1009. In practice this means an AI chat app that streams tokens for 14 seconds becomes testable: the agent waits exactly as long as the stream takes and no longer.
Can I really read all of this in one file, or are you rounding off?
One TypeScript file. assrt-mcp/src/core/agent.ts is where the TOOLS array, the SYSTEM_PROMPT, the agent loop, wait_for_stable, and the OTP instructions live. email.ts is 131 lines and holds the disposable inbox plus the 7-regex cascade. browser.ts wraps @playwright/mcp. The published npm package @assrt-ai/assrt ships the same files under dist/, so `grep -n scenarioRegex node_modules/assrt-mcp/dist/core/agent.js` confirms the code you ran is the code you read. MIT license. Everything self-hostable.
Why 18 tools? Competitors list hundreds of actions in their documentation.
Competitor action lists tend to be feature checkboxes across a vendor's entire product: reporting, dashboards, parallelisation, analytics. The 18 here are only the function-call schemas the LLM sees at inference time. Fewer, sharper tools mean the Haiku agent picks the right one on almost every step and rarely fabricates a non-existent one. Adding more would bloat the prompt, lower the pick accuracy, and push us toward tool-selection errors. The whole point of the abstraction is that your English is the grammar and a small, orthogonal palette is the verbs.
How is this different from the AI tools that show up in every 'best AI testing tools 2026' list?
Most top-ranked options (Testim, Mabl, Functionize, Applitools, testRigor, Virtuoso, Checksum, ACCELQ) are cloud-hosted platforms with a proprietary scripting format, a recorder that captures DOM paths into an object repository, and monthly per-seat pricing in the $150 to $7,500 range. Your tests live on their servers. Assrt is an MIT-licensed Node package that dispatches to the official @playwright/mcp server. Tests are a Markdown file on your disk. The browser is the Chromium binary on your machine. There is no cloud dependency; app.assrt.ai is opt-in sync only.
Which LLM is the agent, and can I swap it?
The default is claude-haiku-4-5-20251001 (see DEFAULT_ANTHROPIC_MODEL at agent.ts line 9). The TestAgent constructor also accepts a Gemini provider and a default of gemini-3.1-pro-preview, plus an authType of 'oauth' for Claude Code sessions and 'apiKey' for regular API keys. Override with --model on the CLI or ANTHROPIC_MODEL / GEMINI_MODEL environment variables. Bring-your-own key; Assrt never proxies your traffic.
What happens when the agent cannot find the element I described?
It falls back in the order prescribed by the SYSTEM_PROMPT at agent.ts line 220. First it calls snapshot again to get a fresh accessibility tree with ref IDs (the rule on line 213: 'ALWAYS call snapshot FIRST'). Then it tries a different ref or a scroll. If three attempts fail it calls assert with passed:false and an evidence string, captures a screenshot (emitted to the run report), then complete_scenario. You do not get a silent hang; you get a red result with a frame at the moment of failure saved under /tmp/assrt/<runId>/.
Is there a way to use the agent for exploratory testing without writing any #Case blocks?
Yes. assrt_plan navigates to a URL, queues up to 20 pages (MAX_DISCOVERED_PAGES at agent.ts), and fans out up to 3 concurrent Haiku calls (MAX_CONCURRENT_DISCOVERIES) that each generate 1 to 2 #Case blocks per page from the live accessibility tree plus a screenshot. Feed that output straight into assrt_test. The tooling is intentionally small enough that the discovery pass and the execution pass share the same browser session and the same TOOLS palette.
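The bounded fan-out reduces to a small concurrency limiter. The sketch below is an assumption about shape, not a copy of assrt_plan; only the `MAX_CONCURRENT_DISCOVERIES = 3` constant comes from the text, and `worker` stands in for the per-page Haiku discovery call.

```typescript
// Sketch of a bounded fan-out: drain a queue of discovered pages with
// at most MAX_CONCURRENT_DISCOVERIES workers in flight at once.
const MAX_CONCURRENT_DISCOVERIES = 3; // constant named in the text

async function fanOut<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,    // stands in for the per-page Haiku call
  limit = MAX_CONCURRENT_DISCOVERIES,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;               // claim the next queue slot (single-threaded, safe)
      results[i] = await worker(items[i]);
    }
  }
  // spin up at most `limit` lanes; each drains the shared queue
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane));
  return results;
}
```

The design choice mirrors the claim in the answer above: a shared queue with three lanes keeps exactly three discovery calls in flight until the 20-page queue drains, without any external concurrency library.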
Adjacent guides
Keep reading
QA automation for beginners: the login problem no other guide solves
How persistent ~/.assrt/browser-profile keeps Case 2 through N logged in after Case 1 authenticates. Complements the OTP coverage on this page.
Playwright for beginners: the one regex that replaces the entire API
Companion deep dive into the scenario parser at agent.ts:621 and why #Case Markdown is the only grammar you need to learn.
Self-healing tests: how the agent retries when an element disappears
The snapshot-refs-retry loop baked into the SYSTEM_PROMPT at agent.ts:220. Why the agent almost never needs a human to re-record a broken selector.