AI tools for testing, opened up

AI tools for testing is a list of 18 function calls in one TypeScript file.

Every top-ranked roundup for this keyword walks you through a dozen vendors with cloud DSLs, per-seat pricing, and polite descriptions of “self-healing AI.” None of them open the agent. This page does. The Assrt MCP server exposes exactly 18 tools to a Claude Haiku 4.5 instance, declared as a TOOLS const at assrt-mcp/src/core/agent.ts:16. Two of those tools plus a cascade of 7 regexes at email.ts:101 handle the verification-code wall that every other “AI testing tool” quietly asks you to work around.

Matthew Diakonov
11 min read
4.9 from Assrt MCP users
18-tool agent palette you can grep in one file
7-regex OTP cascade that passes real verification gates
MIT-licensed, self-hostable, no cloud runtime required

The anchor fact

Clone @assrt-ai/assrt, run `grep -c 'name: "' node_modules/assrt-mcp/dist/core/agent.js`, and the terminal prints 18.

That is the complete surface the AI is allowed to touch. Every “AI tools for testing” comparison page you will find on Google treats the AI as a black box. This one treats it as a file you can read, licensed MIT, with line numbers you can verify.

The whole palette, scrollable

This is the full TOOLS array, in declaration order. Anything your Markdown plan asks for has to bottom out in one of these names, or the agent has to refuse.

navigate · snapshot · click · type_text · select_option · scroll · press_key · wait · screenshot · evaluate · create_temp_email · wait_for_verification_code · check_email_inbox · assert · complete_scenario · suggest_improvement · http_request · wait_for_stable

Count the names: 18. That count is the ceiling on the agent's power and also its floor. Narrow palettes mean the Haiku call picks the right tool on almost every step and almost never hallucinates a non-existent one.
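The declaration is small enough to sketch. Here is a minimal, hypothetical reconstruction of the shape of the TOOLS const, assuming Anthropic-style tool schemas; the 18 names are the real ones from the palette above, but the description strings and empty input schemas are placeholders, not copied from agent.ts:

```typescript
// Illustrative sketch of a small, fixed tool palette. Only the 18 names are
// taken from the repo; descriptions and schemas here are placeholders.
type ToolSchema = {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, unknown> };
};

const TOOLS: ToolSchema[] = [
  "navigate", "snapshot", "click", "type_text", "select_option", "scroll",
  "press_key", "wait", "screenshot", "evaluate", "create_temp_email",
  "wait_for_verification_code", "check_email_inbox", "assert",
  "complete_scenario", "suggest_improvement", "http_request", "wait_for_stable",
].map((name) => ({
  name,
  description: `Tool: ${name}`, // the real descriptions live in agent.ts
  input_schema: { type: "object" as const, properties: {} },
}));
```

A palette this size fits comfortably into a single prompt, which is exactly the property the paragraph above is claiming.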

How the 18 tools bundle into six jobs

Perception

Reads what's on the screen before acting. snapshot returns the page's accessibility tree with stable ref IDs; screenshot captures a frame that gets attached to the next LLM turn.

snapshot · screenshot · evaluate

Interaction

The verbs every browser test needs. Each call is passed through to the official @playwright/mcp server, so the wire protocol is identical to what any Playwright user's code would produce.

navigate · click · type_text · select_option · scroll · press_key

Timing

wait accepts a text pattern or a fixed ms (capped at 10s). wait_for_stable watches a MutationObserver and resolves as soon as the DOM is quiet; this is the one you reach for on streaming AI apps.

wait · wait_for_stable

Verification gates

Disposable inbox from temp-mail.io, plus a 7-regex cascade that extracts the code out of whatever marketing-heavy HTML the sender wrapped it in. Makes OTP-gated signups testable without a human mailbox.

create_temp_email · wait_for_verification_code · check_email_inbox

Reporting

assert records a boolean with evidence; complete_scenario closes out the current #Case; suggest_improvement is the escape hatch the agent uses when it spots an obvious bug unrelated to the plan.

assert · complete_scenario · suggest_improvement

Out-of-band

http_request lets the agent poll an external API (Telegram getUpdates, Slack conversations.history, GitHub issues) to verify that a web action produced the side effect you expected.

http_request
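As a concrete illustration of the out-of-band pattern: after a web action, the agent can fetch Telegram's getUpdates and scan the JSON for the expected side effect. getUpdates is the real Bot API method; the `sideEffectArrived` helper and the trimmed payload shape below are illustrative assumptions, not code from the repo:

```typescript
// Hypothetical helper: given a Telegram getUpdates response body, check
// whether any message contains the text the web action should have produced.
// In a real run the agent would issue an http_request tool call against
// https://api.telegram.org/bot<TOKEN>/getUpdates and feed the JSON here.
type TgUpdate = { message?: { text?: string } };

function sideEffectArrived(
  payload: { result: TgUpdate[] },
  expected: string,
): boolean {
  return payload.result.some((u) => u.message?.text?.includes(expected) ?? false);
}
```

The same shape works for Slack's conversations.history or a GitHub issues list: one http_request, one scan for the expected string.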

Your English goes in. 18 tools come out. Chromium drives.

The MCP server sits between the LLM and the browser. The LLM never speaks Playwright; it emits tool calls named in the TOOLS const. The server translates each call into the equivalent @playwright/mcp request, and the browser acts.

AI tools for testing, wired up

Your #Case Markdown (the URL under test plus optional pass criteria) → Assrt agent → 18-tool palette → @playwright/mcp → Run report

The hardest part of AI testing is the verification code. Here is how we pass it.

Ranked-list guides for AI testing tools rarely mention OTP. The honest reason is that most products either stub the email API in test, or ask a human to paste the code. Neither is acceptable once the AI is supposed to replace the human.

The agent creates a disposable address through temp-mail.io, types it into the signup form, clicks through, and waits up to 60 seconds polling the inbox every 3 seconds. When mail arrives, seven regexes take turns, most specific first:

assrt-mcp/src/core/email.ts
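The embedded source is not reproduced on this page, so here is a hedged reconstruction of what a most-specific-first cascade can look like. The pattern families (keyword-anchored matches for code, verification, OTP, and pin, then bare 6-, 4-, and 8-digit runs) come from the repo's own description; the exact regexes at email.ts:101 may differ:

```typescript
// Hedged reconstruction of a most-specific-first OTP extraction cascade.
// Ordering mirrors the documented families; exact patterns are assumptions.
const OTP_PATTERNS: RegExp[] = [
  /code[^0-9]{0,20}(\d{4,8})/i,         // "Your code is 123456"
  /verification[^0-9]{0,20}(\d{4,8})/i, // "verification number: 1234"
  /otp[^0-9]{0,20}(\d{4,8})/i,
  /pin[^0-9]{0,20}(\d{4,8})/i,
  /\b(\d{6})\b/,                        // bare 6-digit run
  /\b(\d{4})\b/,
  /\b(\d{8})\b/,
];

function extractVerificationCode(body: string): string | null {
  for (const re of OTP_PATTERNS) {
    const m = re.exec(body);
    if (m?.[1]) return m[1];
  }
  return null;
}
```

The point of the ordering is that a keyword-anchored match always wins over a bare digit run, so a marketing footer containing "2024" never beats the actual code.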

Split-input OTP widgets would normally break a tool-by-tool approach: typing into six separate one-character inputs confuses most React OTP components. The SYSTEM_PROMPT tells the model to bypass this with one evaluate call that dispatches a synthetic ClipboardEvent carrying the full code. One tool. One paste. Six boxes populate together.
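A sketch of what that evaluate payload can look like. The repo documents that the real expression begins with `() => { const inp = document.querySelector('input[maxlength="1"]');`; everything after that in this hypothetical `buildOtpPasteExpression` helper is an illustrative assumption:

```typescript
// Builds a JS expression for the evaluate tool that pastes the whole OTP in
// one synthetic ClipboardEvent. Only the querySelector line matches the
// documented prefix; the dispatch details are assumptions, not agent.ts code.
function buildOtpPasteExpression(code: string): string {
  return `() => {
  const inp = document.querySelector('input[maxlength="1"]');
  const dt = new DataTransfer();
  dt.setData('text/plain', '${code}');
  const ev = new ClipboardEvent('paste', { clipboardData: dt, bubbles: true, cancelable: true });
  (inp.parentElement ?? inp).dispatchEvent(ev);
}`;
}
```

Because the event bubbles from the first input's parent, a React OTP widget listening for paste sees one atomic fill rather than six keystrokes.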

An end-to-end signup, live in the terminal

Twelve tool calls from “new tab” to “account exists.” No human ever sees the mailbox, and the OTP arrives through the 6-digit regex branch of the cascade above.

assrt run (signup + OTP)

Sequence view: one scenario, exactly the calls the agent makes

Scenario: sign up with OTP

Participants: #Case plan · Agent (Haiku) · MCP server · Chromium · temp-mail.io

1. Plan → agent: '#Case 1: Create account'
2. navigate(url) → Playwright navigate → page loaded
3. create_temp_email → POST /email/new → address + token
4. type_text(email)
5. click('Create')
6. wait_for_verification_code(60) → poll /messages every 3s → email {body, subject} → 7-regex cascade → 6-digit match
7. evaluate(ClipboardEvent paste) → Playwright evaluate
8. click('Verify')
9. wait_for_stable → 2s quiet
10. assert(url contains /dashboard)
11. complete_scenario(passed=true)

Why this matters for streaming UIs

wait_for_stable is the tool every “AI tools for testing” list forgot

Modern AI apps stream tokens for anywhere from 500ms to 30 seconds per turn. A fixed sleep(10000) is either flaky or wasteful. The wait_for_stable tool installs a MutationObserver, counts changes into window.__assrt_mutations, and returns as soon as the counter has been unchanged for a configurable quiet period (default 2s).

assrt-mcp/src/core/agent.ts
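The excerpt itself is not embedded here, so below is a hedged, Node-testable sketch of the quiet-period loop, with the MutationObserver abstracted behind a counter callback. In the browser the counter would be window.__assrt_mutations; the 30-second overall timeout is an assumption, and the real implementation at agent.ts:956 will differ:

```typescript
// Hedged sketch of the quiet-period wait: poll a mutation counter every
// pollMs and resolve once it has been unchanged for quietMs. The counter
// abstraction stands in for window.__assrt_mutations so the logic runs
// anywhere; the timeout default is an assumption.
async function waitForStable(
  getMutationCount: () => number,
  { pollMs = 500, quietMs = 2000, timeoutMs = 30000 } = {},
): Promise<boolean> {
  const start = Date.now();
  let last = getMutationCount();
  let quietSince = Date.now();
  while (Date.now() - start < timeoutMs) {
    await new Promise((r) => setTimeout(r, pollMs));
    const now = getMutationCount();
    if (now !== last) {
      last = now;            // DOM changed: reset the quiet window
      quietSince = Date.now();
    } else if (Date.now() - quietSince >= quietMs) {
      return true;           // quiet for quietMs: the stream has settled
    }
  }
  return false;              // timed out while still mutating
}
```

The key property: the wait ends quietMs after the last mutation, however long the stream ran, so a 14-second token stream costs 16 seconds, not a guessed sleep.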

The numbers that actually matter when you grep the repo

Numbers from listicles are usually made up. These are line counts and constants you can verify in one shell command.

18 tools in the agent palette
7 OTP regexes in the cascade
500 ms MutationObserver poll interval
3 concurrent discovery agents

18 tools in TOOLS at agent.ts:16. 7 regex patterns in patterns at email.ts:101. 500ms poll interval and 3 concurrent discovery agents from MAX_CONCURRENT_DISCOVERIES at agent.ts:269.

18

The tool palette the agent actually calls, not a marketing diagram.

agent.ts:16 (TOOLS const)

Versus the typical 'AI tools for testing' listicle entry

Generalised from the top Google results for this keyword.

| Feature | Typical AI testing tool | Assrt |
| --- | --- | --- |
| What drives the browser | Proprietary runtime, cloud-hosted, inaccessible | Official @playwright/mcp dispatched by Claude Haiku |
| Tool palette exposed to the AI | Opaque; vendor-controlled, not documented | 18 named tools, declared as a const at agent.ts line 16 |
| OTP verification flows | Manual code grab or API stub needed | Disposable inbox plus 7-regex cascade, fully hands-off |
| Waiting for streaming UI to settle | Fixed sleep or vendor 'smart wait' | MutationObserver polled every 500ms, settles in 2s of quiet |
| License & hosting | Commercial SaaS, $150 to $7,500 / month | MIT on npm as @assrt-ai/assrt; 100% self-hostable |
| Where your tests live | Vendor DB, inaccessible without export credits | Markdown on your disk, synced to /tmp/assrt/ locally |

What the palette unlocks on day one

Flows most AI testing tools treat as edge cases

  • Sign up a fresh account end-to-end on a site you do not own
  • Pass any email OTP gate without touching an inbox yourself
  • Test a streaming AI chat app without guessing the right sleep()
  • Verify that a form submission actually reached a Telegram bot or webhook
  • Hand a non-engineer a Markdown file they can edit and re-run
  • Run the entire loop on a laptop with no vendor account

Read it yourself, or keep reading vendor brochures

The entire agent fits in one file. TOOLS at line 16. SYSTEM_PROMPT at line 198. The OTP paste rule at line 235. wait_for_stable at line 956. email.ts adds 131 lines for the disposable inbox and the regex cascade. Clone the repo, open the file, verify the count.

Talk to the maintainer about wiring Assrt into your CI

30 minutes on a call to map the 18-tool palette onto your real signup, OTP, and streaming-UI flows. Bring a URL.

Book a call

Questions the listicle posts never answer

What does 'AI tools for testing' actually mean once you open the agent?

At the level of the Assrt MCP server it means a fixed list of 18 function-call schemas that a Claude Haiku 4.5 instance is allowed to invoke. They are declared as a const named TOOLS at assrt-mcp/src/core/agent.ts line 16, and they are the only surface the agent is exposed to. navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable. Clone the repo, grep for 'name: "' in that file, and you will count exactly 18. Nothing else happens server-side. No hidden cloud DSL, no proprietary selector engine.

How does the AI deal with signup flows that require a verification code?

Two tools do the work together. create_temp_email creates a real disposable address through temp-mail.io (the agent calls POST https://api.internal.temp-mail.io/api/v3/email/new at email.ts line 44), then wait_for_verification_code polls the inbox every 3 seconds, for up to 60 seconds by default and 120 seconds at most. When mail arrives, a cascade of 7 regexes tries to extract the code, most specific first: code/Code/CODE, verification/Verification, OTP/otp, pin/PIN/Pin, then plain 6-digit, 4-digit, and 8-digit runs. That cascade is at email.ts lines 101 to 109. You never touch the email yourself.

Most OTP login forms split the code across six one-character inputs. Does that break it?

No, and the workaround is hard-coded into the agent system prompt at agent.ts line 235 so the LLM does not have to reinvent it per site. Instead of typing into each input, the agent calls the evaluate tool with a specific JavaScript expression that dispatches a synthetic ClipboardEvent carrying the full code to the parent of the first input. React and most OTP widgets treat a paste as one atomic fill, so all six boxes populate at once and the Verify button becomes clickable. The expression literally starts with `() => { const inp = document.querySelector('input[maxlength="1"]');`. You can read it verbatim in the repo.

What is wait_for_stable and why does it matter more than a regular wait?

Most AI test tools paper over streaming UIs with a fixed sleep. Assrt installs a MutationObserver on document.body via evaluate, stores the running mutation count at window.__assrt_mutations, then polls every 500ms and resolves as soon as the count stops increasing for a configurable quiet period (2 seconds by default). Implementation lives at agent.ts lines 956 to 1009. In practice this means an AI chat app that streams tokens for 14 seconds becomes testable: the agent waits exactly as long as the stream takes and no longer.

Can I really read all of this in one file, or are you rounding off?

One TypeScript file. assrt-mcp/src/core/agent.ts is where the TOOLS array, the SYSTEM_PROMPT, the agent loop, wait_for_stable, and the OTP instructions live. email.ts is 131 lines and holds the disposable inbox plus the 7-regex cascade. browser.ts wraps @playwright/mcp. The published npm package @assrt-ai/assrt ships the same files under dist/, so `grep -n scenarioRegex node_modules/assrt-mcp/dist/core/agent.js` confirms the code you ran is the code you read. MIT license. Everything self-hostable.

Why 18 tools? Competitors list hundreds of actions in their documentation.

Competitor action lists tend to be feature checkboxes across a vendor's entire product: reporting, dashboards, parallelisation, analytics. The 18 here are only the function-call schemas the LLM sees at inference time. Fewer, sharper tools mean the Haiku agent picks the right one on almost every step and rarely fabricates a non-existent one. Adding more would bloat the prompt, lower the pick accuracy, and push us toward tool-selection errors. The whole point of the abstraction is that your English is the grammar and a small, orthogonal palette is the verbs.

How is this different from the AI tools that show up in every 'best AI testing tools 2026' list?

Most top-ranked options (Testim, Mabl, Functionize, Applitools, testRigor, Virtuoso, Checksum, ACCELQ) are cloud-hosted platforms with a proprietary scripting format, a recorder that captures DOM paths into an object repository, and monthly per-seat pricing in the $150 to $7,500 range. Your tests live on their servers. Assrt is an MIT-licensed Node package that dispatches to the official @playwright/mcp server. Tests are a Markdown file on your disk. The browser is the Chromium binary on your machine. There is no cloud dependency; app.assrt.ai is opt-in sync only.

Which LLM is the agent, and can I swap it?

The default is claude-haiku-4-5-20251001 (see DEFAULT_ANTHROPIC_MODEL at agent.ts line 9). The TestAgent constructor also accepts a Gemini provider, which defaults to gemini-3.1-pro-preview, plus an authType of 'oauth' for Claude Code sessions and 'apiKey' for regular API keys. Override with --model on the CLI or the ANTHROPIC_MODEL / GEMINI_MODEL environment variables. Bring your own key; Assrt never proxies your traffic.

What happens when the agent cannot find the element I described?

It falls back in the order prescribed by the SYSTEM_PROMPT at agent.ts line 220. First it calls snapshot again to get a fresh accessibility tree with ref IDs (the rule on line 213: 'ALWAYS call snapshot FIRST'). Then it tries a different ref or a scroll. If three attempts fail it calls assert with passed:false and an evidence string, captures a screenshot (emitted to the run report), then complete_scenario. You do not get a silent hang; you get a red result with a frame at the moment of failure saved under /tmp/assrt/<runId>/.

Is there a way to use the agent for exploratory testing without writing any #Case blocks?

Yes. assrt_plan navigates to a URL, queues up to 20 pages (MAX_DISCOVERED_PAGES at agent.ts), and fans out up to 3 concurrent Haiku calls (MAX_CONCURRENT_DISCOVERIES) that each generate 1 to 2 #Case blocks per page from the live accessibility tree plus a screenshot. Feed that output straight into assrt_test. The tooling is intentionally small enough that the discovery pass and the execution pass share the same browser session and the same TOOLS palette.
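The fan-out itself is a standard bounded-concurrency map. A hedged sketch of the pattern, assuming nothing about the repo beyond the limit of 3 from MAX_CONCURRENT_DISCOVERIES; the `mapWithConcurrency` helper below is hypothetical:

```typescript
// Hypothetical bounded fan-out: run `fn` over `items` with at most `limit`
// promises in flight, mirroring a MAX_CONCURRENT_DISCOVERIES of 3. Results
// keep the input order regardless of which worker finishes first.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;          // claim the next index synchronously
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```

With limit set to 3, a 20-page discovery queue keeps exactly three Haiku calls in flight until the queue drains.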
