AI tools for testing is a list of 18 function calls in one TypeScript file.
Every top-ranked roundup for this keyword walks you through a dozen vendors with cloud DSLs, per-seat pricing, and polite descriptions of “self-healing AI.” None of them open the agent. This page does. The Assrt MCP server exposes exactly 18 tools to a Claude Haiku 4.5 instance, declared as a TOOLS const at assrt-mcp/src/core/agent.ts:16. Two of those tools plus a cascade of 7 regexes at email.ts:101 handle the verification-code wall that every other “AI testing tool” quietly asks you to work around.
The anchor fact
Install @assrt-ai/assrt, run `grep -c 'name: "' node_modules/assrt-mcp/dist/core/agent.js`, and the terminal prints 18.
That is the complete surface the AI is allowed to touch. Every “AI tools for testing” comparison page you will find on Google treats the AI as a black box. This one treats it as a file you can read, licensed MIT, with line numbers you can verify.
The whole palette, scrollable
This is the full TOOLS array, in declaration order. Anything your Markdown plan asks for has to bottom out in one of these names, or the agent has to refuse.
Count the pills: 18. That count is the ceiling on the agent's power and also its floor. Narrow palettes mean the Haiku call picks the right tool on almost every step and almost never hallucinates a non-existent one.
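What the model actually sees is a declared-order array of function-call schemas. Here is a sketch of the shape, not a copy: the field layout below is an assumption, and only two illustrative entries are shown; the real const with all 18 lives at agent.ts:16.

```typescript
// Hypothetical sketch of the TOOLS const shape. The real declaration is
// at assrt-mcp/src/core/agent.ts:16 and may differ in field names.
type Tool = {
  name: string;                           // what the LLM emits in a tool call
  description: string;                    // one line the model reads at inference
  input_schema: Record<string, unknown>;  // JSON Schema for the arguments
};

const TOOLS: Tool[] = [
  {
    name: "navigate",
    description: "Open a URL in the current tab",
    input_schema: { type: "object", properties: { url: { type: "string" } }, required: ["url"] },
  },
  {
    name: "snapshot",
    description: "Return the page's accessibility tree with stable ref IDs",
    input_schema: { type: "object", properties: {} },
  },
  // …16 more entries, one per pill above…
];

console.log(TOOLS.length); // 2 in this sketch; 18 in the real file
```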
How the 18 tools bundle into six jobs
Perception
Reads what's on the screen before acting. snapshot returns the page's accessibility tree with stable ref IDs; screenshot captures a frame that gets attached to the next LLM turn.
snapshot, screenshot, evaluate

Interaction
The verbs every browser test needs. Each call is passed through to the official @playwright/mcp server, so the wire protocol is identical to what any Playwright user's code would produce.
navigate, click, type_text, select_option, scroll, press_key

Timing
wait accepts a text pattern or a fixed ms (capped at 10s). wait_for_stable watches a MutationObserver and resolves as soon as the DOM is quiet; this is the one you reach for on streaming AI apps.
wait, wait_for_stable

Verification gates
Disposable inbox from temp-mail.io, plus a 7-regex cascade that extracts the code out of whatever marketing-heavy HTML the sender wrapped it in. Makes OTP-gated signups testable without a human mailbox.
create_temp_email, wait_for_verification_code, check_email_inbox

Reporting
assert records a boolean with evidence; complete_scenario closes out the current #Case; suggest_improvement is the escape hatch the agent uses when it spots an obvious bug unrelated to the plan.
assert, complete_scenario, suggest_improvement

Out-of-band
http_request lets the agent poll an external API (Telegram getUpdates, Slack conversations.history, GitHub issues) to verify that a web action produced the side effect you expected.
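A plan step like "verify the bot received the message" bottoms out in a single http_request call. A hypothetical payload follows; the argument names (`method`, `url`) are an assumption for illustration, not copied from the real schema in agent.ts.

```typescript
// Hypothetical http_request tool-call arguments. Argument names are
// illustrative; the real schema is in the TOOLS const in agent.ts.
const call = {
  name: "http_request",
  arguments: {
    method: "GET",
    url: "https://api.telegram.org/bot<TOKEN>/getUpdates",
    // the agent then asserts on the JSON body it gets back
  },
};

console.log(call.name);
```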
http_request

Your English goes in. 18 tools come out. Chromium drives.
The MCP server sits between the LLM and the browser. The LLM never speaks Playwright; it emits tool calls named in the TOOLS const. The server translates each call into the equivalent @playwright/mcp request, and the browser acts.
AI tools for testing, wired up
The hardest part of AI testing is the verification code. Here is how we pass it.
Ranked-list guides for AI testing tools rarely mention OTP. The honest reason is that most products either stub the email API in test, or ask a human to paste the code. Neither is acceptable once the AI is supposed to replace the human.
The agent creates a disposable address through temp-mail.io, types it into the signup form, clicks through, and waits up to 60 seconds polling the inbox every 3 seconds. When mail arrives, seven regexes take turns, most specific first:
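The cascade can be sketched like this. The pattern categories (code, verification, OTP, pin, then bare 6/4/8-digit runs) come from the description above; the exact regex characters are a reconstruction, not the verbatim contents of email.ts:101.

```typescript
// Sketch of the 7-regex cascade described above, most specific first.
// The real patterns live at email.ts:101-109 and may differ character-for-character.
const patterns: RegExp[] = [
  /code[:\s]*([0-9]{4,8})/i,          // "Your code: 482913"
  /verification[:\s]*([0-9]{4,8})/i,  // "verification 482913"
  /otp[:\s]*([0-9]{4,8})/i,
  /pin[:\s]*([0-9]{4,8})/i,
  /\b([0-9]{6})\b/,                   // bare 6-digit run
  /\b([0-9]{4})\b/,                   // bare 4-digit run
  /\b([0-9]{8})\b/,                   // bare 8-digit run
];

function extractCode(body: string): string | null {
  for (const p of patterns) {
    const m = body.match(p);
    if (m) return m[1]; // first (most specific) hit wins
  }
  return null;
}

console.log(extractCode("Your verification code: 482913"));
```

Ordering is the whole trick: a marketing email often contains dates, prices, and tracking numbers, so the bare-digit-run patterns only fire after every keyword-anchored pattern has missed.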
Split-input OTP widgets would normally break a tool-by-tool approach: typing into six separate one-character inputs confuses most React OTP components. The SYSTEM_PROMPT tells the model to bypass this with one evaluate call that dispatches a synthetic ClipboardEvent carrying the full code. One tool. One paste. Six boxes populate together.
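The evaluate tool receives a JavaScript expression as a string. Here is a reconstruction of what that paste expression could look like; the verbatim version lives in the SYSTEM_PROMPT at agent.ts:235, and the helper name `otpPasteExpr` is hypothetical.

```typescript
// Build the JS expression the agent would pass to the evaluate tool.
// Reconstructed for illustration; the repo's verbatim expression may differ.
const otpPasteExpr = (code: string) => `() => {
  const inp = document.querySelector('input[maxlength="1"]');
  if (!inp) return false;
  const dt = new DataTransfer();
  dt.setData('text/plain', '${code}');
  // A bubbling paste event lets React's delegated listener treat the
  // six single-character inputs as one atomic fill.
  const evt = new ClipboardEvent('paste', { clipboardData: dt, bubbles: true });
  (inp.parentElement ?? inp).dispatchEvent(evt);
  return true;
}`;

console.log(otpPasteExpr("482913").includes("ClipboardEvent"));
```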
An end-to-end signup, live in the terminal
Twelve tool calls from “new tab” to “account exists.” No human ever sees the mailbox, and the OTP arrives through the 6-digit regex branch of the cascade above.
Sequence view: one scenario, exactly the calls the agent makes
Scenario: sign up with OTP
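The plan side of that scenario is plain Markdown. The block below is illustrative only: the exact #Case grammar lives in the scenario parser, so treat the step wording here as a sketch, not the canonical syntax.

```markdown
#Case Sign up with OTP
1. Navigate to the signup page
2. Create a disposable email and enter it in the email field
3. Submit the form
4. Wait for the verification code from the inbox
5. Paste the code into the OTP inputs and click Verify
6. Assert the account dashboard is visible
```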
Why this matters for streaming UIs
wait_for_stable is the tool every “AI tools for testing” list forgot
Modern AI apps stream tokens for anywhere from 500ms to 30 seconds per turn. A fixed sleep(10000) is either flaky or wasteful. The wait_for_stable tool installs a MutationObserver, counts changes into window.__assrt_mutations, and returns as soon as the counter has been unchanged for a configurable quiet period (default 2s).
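The settle logic above reduces to a small polling loop. This sketch keeps the numbers from the text (500ms poll, 2s quiet period); `readCount` stands in for the evaluate call that reads `window.__assrt_mutations`, and the 10s timeout is an assumption borrowed from the wait tool's cap.

```typescript
// Sketch of wait_for_stable's settle loop. The real implementation is at
// agent.ts:956-1009; this reconstruction only shows the polling strategy.
async function waitForStable(
  readCount: () => Promise<number>, // stands in for evaluate(window.__assrt_mutations)
  quietMs = 2000,                   // default quiet period from the text
  pollMs = 500,                     // poll interval from the text
  timeoutMs = 10_000,               // assumed cap, borrowed from the wait tool
): Promise<boolean> {
  let last = await readCount();
  let quietSince = Date.now();
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    await new Promise((r) => setTimeout(r, pollMs));
    const now = await readCount();
    if (now !== last) {
      last = now;
      quietSince = Date.now();      // DOM still changing: reset the clock
    } else if (Date.now() - quietSince >= quietMs) {
      return true;                  // counter flat long enough: stable
    }
  }
  return false;                     // never settled before the timeout
}
```

The payoff is exactly what the paragraph claims: a 14-second stream costs 14 seconds plus one quiet period, while a 500ms render costs barely more than the quiet period itself.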
The numbers that actually matter when you grep the repo
Numbers from listicles are usually made up. These are line counts and constants you can verify in one shell command.
18 tools in TOOLS at agent.ts:16. 7 regex patterns in patterns at email.ts:101. 500ms poll interval and 3 concurrent discovery agents from MAX_CONCURRENT_DISCOVERIES at agent.ts:269.
“The tool palette the agent actually calls, not a marketing diagram.”
agent.ts:16 (TOOLS const)
Versus the typical 'AI tools for testing' listicle entry
Generalised from the top Google results for this keyword.
| Feature | Typical AI testing tool | Assrt |
|---|---|---|
| What drives the browser | Proprietary runtime, cloud-hosted, inaccessible | Official @playwright/mcp dispatched by Claude Haiku |
| Tool palette exposed to the AI | Opaque; vendor-controlled, not documented | 18 named tools, declared as a const at agent.ts line 16 |
| OTP verification flows | Manual code grab or API stub needed | Disposable inbox plus 7-regex cascade, fully hands-off |
| Waiting for streaming UI to settle | Fixed sleep or vendor 'smart wait' | MutationObserver polled every 500ms, settles in 2s of quiet |
| License & hosting | Commercial SaaS, $150 to $7,500 / month | MIT on npm as @assrt-ai/assrt; 100% self-hostable |
| Where your tests live | Vendor DB, inaccessible without export credits | Markdown on your disk, synced to /tmp/assrt/ locally |
What the palette unlocks on day one
Flows most AI testing tools treat as edge cases
- Sign up a fresh account end-to-end on a site you do not own
- Pass any email OTP gate without touching an inbox yourself
- Test a streaming AI chat app without guessing the right sleep()
- Verify that a form submission actually reached a Telegram bot or webhook
- Hand a non-engineer a Markdown file they can edit and re-run
- Run the entire loop on a laptop with no vendor account
Read it yourself, or keep reading vendor brochures
The entire agent fits in one file. TOOLS at line 16. SYSTEM_PROMPT at line 198. The OTP paste rule at line 235. wait_for_stable at line 956. email.ts adds 131 lines for the disposable inbox and the regex cascade. Clone the repo, open the file, verify the count.
Talk to the maintainer about wiring Assrt into your CI
30 minutes on a call to map the 18-tool palette onto your real signup, OTP, and streaming-UI flows. Bring a URL.
Book a call →

Questions the listicle posts never answer
What does 'AI tools for testing' actually mean once you open the agent?
At the level of the Assrt MCP server it means a fixed list of 18 function-call schemas that a Claude Haiku 4.5 instance is allowed to invoke. They are declared as a const named TOOLS at assrt-mcp/src/core/agent.ts line 16, and they are the only surface the agent is exposed to. navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable. Clone the repo, grep for 'name: "' in that file, and you will count exactly 18. Nothing else happens server-side. No hidden cloud DSL, no proprietary selector engine.
How does the AI deal with signup flows that require a verification code?
Two tools do the work together. create_temp_email creates a real disposable address through temp-mail.io (the agent calls POST https://api.internal.temp-mail.io/api/v3/email/new at email.ts line 44), then wait_for_verification_code polls the inbox every 3 seconds for up to 60 seconds by default (120 seconds max). When mail arrives, a cascade of 7 regexes tries to extract the code, most specific first: code/Code/CODE, verification/Verification, OTP/otp, pin/PIN/Pin, then plain 6-digit, 4-digit, 8-digit runs. That cascade is at email.ts lines 101 to 109. You never touch the email yourself.
Most OTP login forms split the code across six one-character inputs. Does that break it?
No, and the workaround is hard-coded into the agent system prompt at agent.ts line 235 so the LLM does not have to reinvent it per site. Instead of typing into each input, the agent calls the evaluate tool with a specific JavaScript expression that dispatches a synthetic ClipboardEvent carrying the full code to the parent of the first input. React and most OTP widgets treat a paste as one atomic fill, so all six boxes populate at once and the Verify button becomes clickable. The expression literally starts with `() => { const inp = document.querySelector('input[maxlength="1"]');`. You can read it verbatim in the repo.
What is wait_for_stable and why does it matter more than a regular wait?
Most AI test tools paper over streaming UIs with a fixed sleep. Assrt installs a MutationObserver on document.body via evaluate, stores the running mutation count at window.__assrt_mutations, then polls every 500ms and resolves as soon as the count stops increasing for a configurable quiet period (2 seconds by default). Implementation lives at agent.ts lines 956 to 1009. In practice this means an AI chat app that streams tokens for 14 seconds becomes testable: the agent waits exactly as long as the stream takes and no longer.
Can I really read all of this in one file, or are you rounding off?
One TypeScript file. assrt-mcp/src/core/agent.ts is where the TOOLS array, the SYSTEM_PROMPT, the agent loop, wait_for_stable, and the OTP instructions live. email.ts is 131 lines and holds the disposable inbox plus the 7-regex cascade. browser.ts wraps @playwright/mcp. The published npm package @assrt-ai/assrt ships the same files under dist/, so `grep -n scenarioRegex node_modules/assrt-mcp/dist/core/agent.js` confirms the code you ran is the code you read. MIT license. Everything self-hostable.
Why 18 tools? Competitors list hundreds of actions in their documentation.
Competitor action lists tend to be feature checkboxes across a vendor's entire product: reporting, dashboards, parallelisation, analytics. The 18 here are only the function-call schemas the LLM sees at inference time. Fewer, sharper tools mean the Haiku agent picks the right one on almost every step and rarely fabricates a non-existent one. Adding more would bloat the prompt, lower the pick accuracy, and push us toward tool-selection errors. The whole point of the abstraction is that your English is the grammar and a small, orthogonal palette is the verbs.
How is this different from the AI tools that show up in every 'best AI testing tools 2026' list?
Most top-ranked options (Testim, Mabl, Functionize, Applitools, testRigor, Virtuoso, Checksum, ACCELQ) are cloud-hosted platforms with a proprietary scripting format, a recorder that captures DOM paths into an object repository, and monthly per-seat pricing in the $150 to $7,500 range. Your tests live on their servers. Assrt is an MIT-licensed Node package that dispatches to the official @playwright/mcp server. Tests are a Markdown file on your disk. The browser is the Chromium binary on your machine. There is no cloud dependency; app.assrt.ai is opt-in sync only.
Which LLM is the agent, and can I swap it?
The default is claude-haiku-4-5-20251001 (see DEFAULT_ANTHROPIC_MODEL at agent.ts line 9). The TestAgent constructor also accepts a Gemini provider and a default of gemini-3.1-pro-preview, plus an authType of 'oauth' for Claude Code sessions and 'apiKey' for regular API keys. Override with --model on the CLI or ANTHROPIC_MODEL / GEMINI_MODEL environment variables. Bring-your-own key; Assrt never proxies your traffic.
What happens when the agent cannot find the element I described?
It falls back in the order prescribed by the SYSTEM_PROMPT at agent.ts line 220. First it calls snapshot again to get a fresh accessibility tree with ref IDs (the rule on line 213: 'ALWAYS call snapshot FIRST'). Then it tries a different ref or a scroll. If three attempts fail it calls assert with passed:false and an evidence string, captures a screenshot (emitted to the run report), then complete_scenario. You do not get a silent hang; you get a red result with a frame at the moment of failure saved under /tmp/assrt/<runId>/.
Is there a way to use the agent for exploratory testing without writing any #Case blocks?
Yes. assrt_plan navigates to a URL, queues up to 20 pages (MAX_DISCOVERED_PAGES at agent.ts), and fans out up to 3 concurrent Haiku calls (MAX_CONCURRENT_DISCOVERIES) that each generate 1 to 2 #Case blocks per page from the live accessibility tree plus a screenshot. Feed that output straight into assrt_test. The tooling is intentionally small enough that the discovery pass and the execution pass share the same browser session and the same TOOLS palette.
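The bounded fan-out reduces to a small concurrency limiter. The sketch below is an assumption about shape, not a copy of assrt_plan; only the `MAX_CONCURRENT_DISCOVERIES = 3` constant comes from the text, and `worker` stands in for the per-page Haiku discovery call.

```typescript
// Sketch of a bounded fan-out: drain a queue of discovered pages with
// at most MAX_CONCURRENT_DISCOVERIES workers in flight at once.
const MAX_CONCURRENT_DISCOVERIES = 3; // constant named in the text

async function fanOut<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,    // stands in for the per-page Haiku call
  limit = MAX_CONCURRENT_DISCOVERIES,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;               // claim the next queue slot (single-threaded, safe)
      results[i] = await worker(items[i]);
    }
  }
  // spin up at most `limit` lanes; each drains the shared queue
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane));
  return results;
}
```

The design choice mirrors the claim in the answer above: a shared queue with three lanes keeps exactly three discovery calls in flight until the 20-page queue drains, without any external concurrency library.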
Adjacent guides
Keep reading
QA automation for beginners: the login problem no other guide solves
How persistent ~/.assrt/browser-profile keeps Case 2 through N logged in after Case 1 authenticates. Complements the OTP coverage on this page.
Playwright for beginners: the one regex that replaces the entire API
Companion deep dive into the scenario parser at agent.ts:621 and why #Case Markdown is the only grammar you need to learn.
Self-healing tests: how the agent retries when an element disappears
The snapshot-refs-retry loop baked into the SYSTEM_PROMPT at agent.ts:220. Why the agent almost never needs a human to re-record a broken selector.