Auth flow automation, the built-in version

Three tools, one pinned paste expression, zero mailbox setup. OTP and magic link tests that run from a plain-English #Case

Every other piece on automating OTP and magic-link verification hands you the same kit. Stand up Inbucket or register a Mailosaur server. Hand-roll a regex against the email body. Copy a DataTransfer paste helper into your spec because typing digit-by-digit into split boxes drops characters on React apps. The Assrt runner already ships those three pieces as built-in MCP tools, so the scenario file says what you want, not how to plumb it.


Matthew Diakonov, Written with AI

Published April 23, 2026 · 10 min read

Based on the actual file layout of assrt-mcp

Three OTP tools defined in agent.ts lines 114-131: create_temp_email, wait_for_verification_code, check_email_inbox.

Seven-pattern code extractor at email.ts lines 101-109; labelled matches before bare-digit fallbacks.

Pinned ClipboardEvent paste expression at agent.ts line 235; the system prompt forbids modifying it.

Three tools, zero SMTP setup

how Assrt handles OTP and magic-link tests without an external mailbox

create_temp_email grabs a fresh disposable address

wait_for_verification_code polls the inbox every 3 seconds

the seven-regex extractor pulls the code from the body

the pinned paste expression fills split input[maxlength="1"] boxes

same #Case handles the magic-link branch with one extra tool call


Every other guide hands you a build-your-own kit

Pull up the current writing on this topic and the shape is always the same. Pick an email-interception service (Inbucket if you use Supabase locally, Mailosaur if you need real SMTP in CI, Mailtrap if you want a shared sandbox). Install the client library. Configure an API key and a server ID. Write a polling helper, usually twenty lines. Regex the code out of the email body yourself, remembering to decode HTML entities. Copy a DataTransfer-based paste helper into your spec because typing into split OTP boxes with keyboard events silently drops characters on most React-controlled forms. Clean the inbox afterwards so the next run does not fetch a stale message.

That stack works. It is also forty lines of boilerplate per spec file, plus an account somewhere, plus a per-run vendor bill. The actual interesting part of the test — "does a real user finish signup after getting a code?" — is six lines buried inside it.

The Assrt runner collapses the plumbing into three MCP tools and a pinned JavaScript expression, all defined in the agent itself. The scenario file only describes the user intent; the agent calls the built-in tools in order, and the plain-English case reads the same for a six-digit OTP, a four-digit PIN, an eight-digit token, or a magic link.


The three tools, by their exact names

The Anthropic-tool schemas live in a TOOLS array in assrt-mcp/src/core/agent.ts. The email-related entries are at lines 114 through 131; everything else (browser navigation, clicks, asserts) sits around them. The agent picks these tools the same way it picks click or navigate: by intent inferred from the scenario text.

create_temp_email

A zero-argument MCP tool (agent.ts:115). Calls DisposableEmail.create(), which POSTs to api.internal.temp-mail.io/api/v3/email/new with a fixed ten-character local part, and hands the generated address back to the scenario. No API key, no tenant, no routing config.

wait_for_verification_code

Polls the disposable inbox every three seconds for up to sixty seconds, runs the email body through a seven-pattern extractor, and returns { code, from, subject, body } (email.ts:82-129). Accepts a timeout_seconds argument if your SMTP is slow.
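The poll-until-timeout behaviour described above can be sketched in a few lines. This is a minimal illustration, not the code in email.ts: the InboxMessage shape, the FetchMessages callback, and the function name waitForFirstMessage are all hypothetical, and only the three-second interval and sixty-second default come from the article.

```typescript
// Hypothetical message shape; the real inbox API may return different fields.
interface InboxMessage {
  from: string;
  subject: string;
  body: string;
}

// Hypothetical callback that lists the messages currently in the inbox.
type FetchMessages = () => Promise<InboxMessage[]>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Polls fetchMessages until a message arrives or the timeout elapses.
 * Returns the first message, or null if nothing arrived in time.
 */
async function waitForFirstMessage(
  fetchMessages: FetchMessages,
  timeoutSeconds = 60, // default wait named in the article
  intervalMs = 3000, // poll interval named in the article
): Promise<InboxMessage | null> {
  const deadline = Date.now() + timeoutSeconds * 1000;
  while (true) {
    const messages = await fetchMessages();
    const first = messages[0];
    if (first !== undefined) return first;
    // Give up when another full interval would overshoot the deadline.
    if (Date.now() + intervalMs > deadline) return null;
    await sleep(intervalMs);
  }
}
```

Injecting the fetch callback keeps the loop testable without a network: a fake that returns an empty list twice and then one message exercises the same path a slow SMTP pipeline would.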

check_email_inbox

A sibling tool (agent.ts:128) that lists everything in the disposable inbox. Useful when wait_for_verification_code's regex ladder misses an oddly-formatted message and the agent needs to read the raw text.

Seven-regex extractor

email.ts:101-109. Labelled matches first: 'code:', 'verification:', 'OTP:', 'pin:'. Then bare six-digit, four-digit, and eight-digit fallbacks. The returned code field is the capture group; body is the first 5000 characters of the email verbatim.

Pinned ClipboardEvent paste

agent.ts:235. The system prompt instructs the agent to call evaluate with a verbatim expression that builds a DataTransfer, targets input[maxlength="1"], and dispatches a paste event on the parent container. Works on React-controlled split OTP fields where per-digit typing drops characters.

Magic link fallback

If the email is a link rather than a numeric code, the body field is still populated. The agent extracts the href via evaluate, hands it to navigate, and the same #Case handles both OTP and magic-link flows without branching.
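As a rough illustration of the link-extraction step, here is a pure-function sketch that pulls the first URL out of a returned body string. The article says the agent does this through evaluate; this standalone helper, its name extractFirstLink, and its regex are assumptions made for the example, not code from the runner.

```typescript
/**
 * Returns the first http(s) URL found in an email body, or null if none.
 * Hypothetical helper for illustration; not part of assrt-mcp.
 */
function extractFirstLink(body: string): string | null {
  // Stop at whitespace, quotes, and angle brackets so an href value
  // is captured without the surrounding HTML attribute syntax.
  const match = body.match(/https?:\/\/[^\s"'<>]+/i);
  const raw = match?.[0];
  if (raw === undefined) return null;
  // HTML emails encode "&" in query strings as "&amp;"; undo that.
  const decoded = raw.replace(/&amp;/g, "&");
  // Plain-text emails often end the sentence right after the link.
  return decoded.replace(/[.,;:!?)]+$/, "");
}
```

The returned string is what would be handed to navigate. Real magic-link emails can contain several links (logo, unsubscribe footer), so a production version would likely need to filter by host or path rather than take the first match.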

The split-box paste expression, verbatim

The most annoying part of automating OTP UIs is the six-input pattern where each digit is its own input[maxlength="1"]. Most React implementations intercept keypress events and autoadvance focus, which means typing digits keyboard-style either fails silently or fills fewer than six fields. The real-user behaviour that does work is paste: the browser fires a single clipboard event, the component handler sees the whole code as a string, and it splits it across the inputs itself.

The agent's system prompt pins the exact expression to use. It is the line below, and the prompt explicitly tells the agent not to modify it except to substitute the real code for CODE_HERE. That one restriction is what keeps the behaviour stable across scenarios; the agent does not get creative about how to fill OTP fields.

agent.ts line 235 (quoted from the system prompt)

If you are curious why this specific incantation and not some other approach, the short version: dispatchEvent on the parent container (not on the individual input) lets the React handler see the paste as a bubbling event, which is what the usual OTP component libraries listen for. Dispatching on the input itself misses the delegation; setting value imperatively skips the synthetic event entirely. The pattern works because it matches what a human clicking Paste actually causes.
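To make the shape of that technique concrete, here is a sketch of a builder that produces a paste expression as a string. It is modelled on the Playwright helper in the comparison section of this article and is not the text pinned in agent.ts; the function name buildOtpPasteExpression and the returned status strings are hypothetical.

```typescript
/**
 * Builds a JavaScript expression (as a string) that pastes `code` into a
 * split OTP group by dispatching a paste event on the inputs' parent.
 * Illustrative only; not the expression pinned in the agent's prompt.
 */
function buildOtpPasteExpression(code: string): string {
  // JSON.stringify yields a safely quoted string literal, so the code
  // value cannot break out of the generated expression.
  const literal = JSON.stringify(code);
  return `(() => {
  const first = document.querySelector('input[maxlength="1"]');
  if (!first || !first.parentElement) return 'no otp input found';
  const dt = new DataTransfer();
  dt.setData('text/plain', ${literal});
  first.parentElement.dispatchEvent(
    new ClipboardEvent('paste', { clipboardData: dt, bubbles: true, cancelable: true })
  );
  return 'pasted';
})()`;
}
```

In a plain Playwright spec the result could be passed to page.evaluate, which accepts an expression string. The sketch assumes the first input's parentElement is the container the component listens on; OTP libraries that nest each input in its own wrapper would need a different target.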

The seven patterns in the extractor ladder

After the disposable inbox returns a message, the runner has to pull the code out of the email body. Rather than delegate that to every scenario, waitForVerificationCode in email.ts runs a fixed seven-pattern ladder. Labelled matches come first, because otherwise "Your reference 482910" in a marketing footer would beat the real code. Bare-digit fallbacks come last.

email.ts lines 101-109

What the extractor handles

Labelled six-digit codes: 'Your code is 482910'
Labelled four-digit PINs: 'PIN: 7421'
Labelled eight-digit tokens: 'Verification: 11928273'
Bare six-digit sequences inline in sentences
Bare four-digit fallback for legacy flows
HTML-only emails (body_html stripped to plain text, email.ts:91-98)
Magic links (body field returned, extract + navigate)
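The ordering idea behind the list above (labelled patterns first, bare digits last, HTML stripped beforehand) can be sketched as follows. The regexes here are illustrative stand-ins written for this example, not the seven expressions at email.ts:101-109, and the names htmlToText and extractCode are hypothetical.

```typescript
// Illustrative ladder: four labelled patterns, then bare 6-, 4-, and
// 8-digit fallbacks. NOT the actual patterns shipped in email.ts.
const CODE_PATTERNS: RegExp[] = [
  /\bcode(?:\s+is)?\s*[:\-]?\s*(\d{4,8})\b/i,
  /\bverification(?:\s+is)?\s*[:\-]?\s*(\d{4,8})\b/i,
  /\botp(?:\s+is)?\s*[:\-]?\s*(\d{4,8})\b/i,
  /\bpin(?:\s+is)?\s*[:\-]?\s*(\d{4,8})\b/i,
  /\b(\d{6})\b/,
  /\b(\d{4})\b/,
  /\b(\d{8})\b/,
];

/** Crude HTML-to-text pass for HTML-only emails (assumed behaviour). */
function htmlToText(html: string): string {
  return html
    .replace(/<(style|script)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&")
    .replace(/\s+/g, " ")
    .trim();
}

/** Returns the first capture from the first pattern that matches, or null. */
function extractCode(body: string): string | null {
  for (const pattern of CODE_PATTERNS) {
    const match = body.match(pattern);
    if (match && match[1] !== undefined) return match[1];
  }
  return null;
}
```

Because the ladder is tried in order, a labelled code wins even when an unrelated six-digit number appears earlier in the body. Swapping the order would let a bare-digit rule pick up the footer reference instead.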

The whole run, at a terminal

The log below is what two scenarios look like when the runner works. Note the create_temp_email line before the form submission — that is the tool hop that usually takes twenty lines of your own helper code — and the pasted 6 fields line, which is the pinned expression confirming it filled the split-box OTP.

npx assrt run

3 built-in MCP tools

7 code patterns in the extractor

60s default inbox wait timeout

0 SMTP credentials required

Six OTP tests the same scenario handles without branching

Because the regex ladder and the paste expression are inside the runner, the scenario text does not have to distinguish between six-digit codes, four-digit PINs, eight-digit tokens, or magic links. You write the user intent once and let the agent pick the right primitive. When a flow changes from OTP to magic link (or vice versa) during a product pivot, the scenario stays the same.

Write one #Case, no helpers

In /tmp/assrt/scenario.md: "Navigate to the signup page. Request an account using a fresh test email. Wait for the verification code. Enter the code. Assert the dashboard loads." No selectors, no API keys, no helper import.

Agent calls create_temp_email

The runner recognises the intent and calls the MCP tool before any form interaction. DisposableEmail.create() returns a ten-character address under a temp-mail.io subdomain (email.ts:49) that the agent plugs into the email input.

Signup form submits, the agent waits

wait_for_verification_code fires. It polls every three seconds (email.ts:67) against /email/<address>/messages. The default wait is sixty seconds, enough for almost any SMTP pipeline; pass timeout_seconds to stretch it.

Extractor returns the code

The seven-regex ladder at email.ts:101-109 runs most-specific patterns first. 'code: 482910' matches before the bare six-digit rule would, so no false positives from a transaction ID that happens to also be six digits elsewhere in the email body.

Agent pastes into split OTP boxes (or clicks the link)

For numeric codes in six-box UIs, the agent runs the pinned ClipboardEvent paste expression exactly as written in agent.ts:235 — DataTransfer, paste event, input[maxlength="1"]. For magic links, it extracts the href and calls navigate. Same #Case, both branches.

Scenario completes, file artifacts land

The runner writes /tmp/assrt/results/latest.json with the pass/fail, the assertions array, screenshots from each step, and the webm recording. No dashboard to poll, no vendor cloud to log into.

The forty-line Playwright version, next to the one-paragraph Assrt version

Tab through both. The left tab is the typical plain-Playwright implementation with Mailosaur and the DataTransfer paste helper inlined. The right tab is the Assrt scenario file that the runner compiles into the same behaviour. Everything the Playwright version does by hand, the runner does through its built-in tools.

Plain Playwright + Mailosaur vs Assrt scenario

import { test, expect } from "@playwright/test";
import MailosaurClient from "mailosaur";

const mailosaur = new MailosaurClient(process.env.MAILOSAUR_API_KEY!);
const serverId = process.env.MAILOSAUR_SERVER_ID!;

test("signup with OTP via Mailosaur", async ({ page }) => {
  const email = `signup.${Date.now()}@${serverId}.mailosaur.net`;

  await page.goto("/signup");
  await page.getByLabel("Email").fill(email);
  await page.getByRole("button", { name: /create account/i }).click();

  const message = await mailosaur.messages.waitFor(
    serverId,
    { sentTo: email },
    { timeout: 30_000 }
  );

  const body = message.text?.body ?? message.html?.body ?? "";
  const match = body.match(/\b(\d{6})\b/);
  if (!match) throw new Error("No 6-digit code found in email");
  const code = match[1];

  // Handle split OTP boxes with maxlength=1 via DataTransfer paste
  await page.evaluate((c) => {
    const inp = document.querySelector('input[maxlength="1"]');
    if (!inp) throw new Error("no otp input found");
    const container = inp.parentElement!;
    const dt = new DataTransfer();
    dt.setData("text/plain", c);
    container.dispatchEvent(
      new ClipboardEvent("paste", {
        clipboardData: dt,
        bubbles: true,
        cancelable: true,
      })
    );
  }, code);

  await page.getByRole("button", { name: /verify/i }).click();
  await expect(page.getByRole("heading", { name: /dashboard/i })).toBeVisible();

  await mailosaur.messages.del(message.id!);
});

65% fewer lines

Side-by-side: the bolt-together stack vs the built-in stack

A traditional approach means picking one of Inbucket, Mailosaur, or Mailtrap (your choice of three flavours of "mailbox you stand up yourself") and writing or maintaining helpers for extraction and paste. The Assrt row is what happens when those three pieces are primitives the runner owns.

Feature	Bolt-together stack	Assrt built-in
Mailbox provisioning	Stand up Inbucket on port 54324, or register a Mailosaur server, or wire up Mailtrap API credentials	create_temp_email returns a fresh address, no setup
Where the test code lives	Helper file with API client, polling loop, cleanup	One #Case in /tmp/assrt/scenario.md
Code extraction	Hand-roll a regex per email format; maintain as templates change	Seven-pattern ladder at email.ts:101-109, runtime-evaluated
Split-box OTP input	Copy a DataTransfer/ClipboardEvent helper into your spec; debug React controlled inputs	Pinned in the agent's system prompt at agent.ts:235
Magic link branch	Second helper: extract href, decode entities, navigate	Same scenario; body field + evaluate + navigate
Unique address per run	Append Date.now() to local part, track collisions across CI	Every create_temp_email call is a new address
Cost model	$40–$7,500 per month in seat or usage fees	MIT-licensed, local, pay only for your own LLM tokens
Vendor lock-in	Proprietary YAML or dashboard formats	Plain-text scenario.md + Playwright-shaped results

Two edge cases still want a real mailbox: apps that blocklist known disposable email domains, and deliverability tests where the point is to verify production SMTP end to end. For those, swap create_temp_email for a named address and keep the rest of the scenario identical.

The uncopyable anchor

Open assrt-mcp/src/core/agent.ts. Jump to line 115. The next sixteen lines define three MCP tools in plain JSON: create_temp_email, wait_for_verification_code, and check_email_inbox. Jump to line 235. The system prompt block contains the exact ClipboardEvent paste expression the agent is instructed to use for split OTP boxes, verbatim. Open the sibling file email.ts. Line 9 holds the base URL (https://api.internal.temp-mail.io/api/v3). Lines 82 through 129 are the waitForVerificationCode body, including the seven-pattern regex ladder at lines 101 through 109. That is the whole mechanism, and it is the part no competing piece on this question can copy, because they do not own the runner.

What this makes obsolete

Each chip below is a line item from a typical OTP or magic-link automation stack. When the three tools and the pinned expression live in the runner, you stop provisioning any of them.

Inbucket on port 54324
Mailosaur server IDs
Mailtrap sandbox credentials
Gmail API OAuth
IMAP polling loop
Per-test regex helper
DataTransfer paste boilerplate
Entity-decoding magic-link extractor
Unique-address collision logic
Vendor inbox cleanup
Proprietary YAML scenario format
$7,500/month per-team tier

When a real mailbox still matters

Two cases. First, apps that blocklist known disposable domains. Some products reject signup from temp-mail.io and similar domains to avoid analytics pollution; for those, the disposable address bounces at the email-validation step and never reaches the OTP flow. The fix is boring: the scenario passes in a named test address, and everything downstream still works the same way. Second, deliverability tests — when the actual point of the test is to verify that your production SMTP provider delivers the message, not just that the app handles the code. Those need the real pipeline end to end, not a disposable intercept.

For everything else — the ninety-plus percent of OTP and magic link cases that are really asking "does the app handle the code correctly" rather than "does the mail server send it" — the disposable mailbox is enough, and the runner owns it.

Want this run against your own signup flow?

Twenty minutes, a live screenshare, a #Case written against your real app, and a recording.webm you can carry away.

Questions this topic usually raises

How do you automate an OTP test without hooking up Inbucket, Mailosaur, or Mailtrap?

You call three tools the runner already exposes. create_temp_email (agent.ts:115) grabs a fresh disposable address from api.internal.temp-mail.io, you type it into the signup form, and after the form submits you call wait_for_verification_code (agent.ts:120) with a timeout_seconds argument. That tool polls the disposable inbox every three seconds for up to sixty seconds, runs the email body through a seven-pattern extractor at email.ts:101-109, and returns the code as a plain string. No SMTP credentials, no sandbox inbox to provision, no hand-rolled helper in your spec file.

What if the app uses those split six-box OTP inputs where each digit has maxlength=1?

The agent has a ClipboardEvent paste expression pinned verbatim in its system prompt at agent.ts:235. It targets input[maxlength="1"], builds a DataTransfer with the code, and dispatches a paste event on the parent container — which is the only reliable way to get all six digits into those controlled inputs at once. Typing digit-by-digit with keyboard events drops characters on React-controlled components; the clipboard event is what actually works. The system prompt instructs the agent to use that exact expression and not modify it, which keeps the behaviour stable across scenarios.

Does this also work for magic link email flows, not just numeric OTP codes?

Yes, through the same primitives. wait_for_verification_code returns an object with code, from, subject, and body (email.ts:85) where body is the full text of the message up to 5000 characters. If no pattern in the seven-regex ladder matches a numeric code, the body field still comes back populated, so the agent extracts the href with an evaluate call, then navigates to it with the navigate tool. A magic link flow is the same three-tool sequence as an OTP flow, just with the last step swapping 'paste the digits' for 'visit the URL'.

Where does the mailbox actually live?

api.internal.temp-mail.io/api/v3. The constant is at email.ts:9. create() posts to /email/new with a fixed local-part length of ten characters (email.ts:49), getMessages() GETs /email/<address>/messages, and waitForEmail polls that endpoint every three seconds (email.ts:67) until a message arrives or the sixty-second default timeout elapses. You can raise the timeout by passing timeout_seconds to wait_for_verification_code. No account, no API key, no inbox routing to set up — the disposable address is generated on demand and tossed after the run.

What if the code comes as a four-digit PIN or an eight-digit token instead of the usual six digits?

The extractor runs seven patterns in order, most-specific first (email.ts:101-109). It first tries labelled formats: 'code:', 'verification:', 'OTP:', 'pin:'. If nothing labelled matches it falls back to bare six-digit, four-digit, and eight-digit sequences. The vast majority of real emails match one of the labelled patterns on the first pass; the bare-digit fallbacks catch emails that just print the code without context. The returned object always includes the raw body alongside the extracted code, so if the regex misses you can still read the string and continue the scenario.

Is there anything that still requires a real mailbox instead of the disposable one?

Two cases. First, when the app's signup flow rejects known-disposable email domains (some SaaS products blocklist temp-mail.io and similar domains to avoid analytics pollution). For those apps, the agent's variables feature lets the scenario pass in a real test mailbox address, and you swap create_temp_email for a named address in your Gmail or a dedicated Mailosaur server. Second, deliverability tests where the point is to verify that mail sent through your production SMTP actually arrives; those need the real pipeline end to end, not a disposable intercept. For the other ninety-plus percent of OTP and magic link cases, the disposable mailbox is enough.

How does this compare to writing the same test in plain Playwright with Mailosaur?

The plain-Playwright version is roughly forty lines: instantiate MailosaurClient with an API key, generate a unique address per run, submit the form, call messages.waitFor with a thirty-second timeout, pull message.html.body, regex the code out, paste it into the UI with either page.type (works for single-input codes) or a page.evaluate block with DataTransfer (works for split-box codes), click Verify, clean up with messages.del. The Assrt version is a #Case block that says 'fill the signup form, wait for the verification code, enter it, and assert the dashboard loads' — and the runner chains the three built-in tools plus the pinned paste expression behind the scenes.

What file should I open first if I want to see the real implementation?

Two files, three spots, in this order. assrt-mcp/src/core/email.ts (131 lines) is where DisposableEmail lives: constructor, create(), getMessages(), waitForEmail, and the extractor with its seven-regex ladder. In assrt-mcp/src/core/agent.ts, lines 114-131 are the three MCP tool definitions the agent can call. Line 235 of the same file is the system-prompt block that pins the ClipboardEvent paste expression. Reading those three spots takes ten minutes and gives you the whole mechanism.

Is this open source?

Yes. assrt-mcp is MIT-licensed. npx assrt-mcp runs it locally; the disposable-mailbox integration is built into the runner itself, not gated behind a cloud tier. Competing agent-testing platforms charge up to $7,500 per month for comparable flows and ship proprietary YAML. Assrt emits real Playwright-style scenario files you can commit, diff, grep, and carry to any other runner if you ever decide to stop using Assrt.