Testing OTP and magic link flows without writing Playwright

Most guides on this topic hand you a Mailosaur key, a polling helper, and a regex you have to maintain. Then they leave you to discover, the hard way, that the six page.locator('input').nth(i).fill(digit) calls in your spec silently fail on the split-input OTP UI your app actually ships. This is the same problem solved differently: three agent primitives, one synthetic ClipboardEvent, six lines of Markdown.

Every claim below points at a file path and line number in the open-source Assrt MCP reference. The disposable-inbox class is 130 lines. The split-input paste recipe is eight lines. The whole thing fits in your head before lunch.

Matthew Diakonov, Written with AI

Published April 24, 2026 · 9 min read

The short answer

To test an OTP or magic link flow end to end you need three things: a fresh disposable inbox for every run (so a leftover email from a previous run can never match the wrong test), code or URL extraction that survives a template change, and a way to fill a split-input OTP field that fires one paste event instead of one fill() per cell. Assrt packages all three as agent primitives, so the test is six lines of Markdown with no Playwright spec, no vendor inbox API key, and no regex you maintain.

Disposable inbox per scenario via create_temp_email, no account and no API key.
Code extraction via wait_for_verification_code (seven ordered regex patterns); magic links reuse the same primitive with URL extraction.
Split-input OTP filled with one synthetic ClipboardEvent on the parent wrapper, not six per-cell fills that silently leave submit disabled.

MIT licensed

The bug every other guide quietly leaves in

Open the auth modal of any app shipped in 2025 or 2026. The verification code field is almost never one input. It is six little squares, each one capped at a single character, with focus that auto-advances when you type. Under the hood, that pattern is a parent wrapper listening for a single onPaste event with a clipboardData payload, plus per-input onKeyDown handlers that move focus on each digit. It is a real interaction model that real users perform: copy the code from the email tab, click the first cell, paste.

It is also a model that page.locator('input').nth(i).fill(digit) does not represent. Every existing guide on this topic hands you exactly that loop, and every one of them works fine on the test app the author chose. Then you copy it into your repo and discover that your OTP component is from react-otp-input or @radix-ui/react-pin-input or your own homegrown thing, all of which expect a paste, not six independent fills, and all of which silently leave the submit button disabled.

Mailosaur

MailSlurp

testmail.app

Mailtrap

MailHog

Inbucket

Maildev

Ethereal Email

OpenInbox

Mailhook

Twilio

5sim.net

GuerrillaMail

10minutemail

Resend

What naive Playwright actually looks like

For comparison, here is the spec most tutorials walk you toward. It signs up for a vendor inbox service, polls for the email, regexes the code out of the body, types the code into the cells, and clicks Verify. It is not a bad piece of code. It is just a piece of code that you wrote, that you have to keep working, and that has one nontrivial bug a user will not warn you about.

[Code listing: tests/signup.spec.ts]

Every numbered comment in that spec is a maintenance burden. The vendor inbox is a recurring bill. The regex is a hand-tuned thing that breaks the day marketing changes the email subject. The .fill(code[i]) loop is the bug that takes you two hours to track down the first time it breaks. The point of the next three sections is not that the spec is wrong. It is that you do not have to write any of it.

Primitive one: a disposable inbox per scenario, with no vendor key

The first primitive is the inbox itself. The Assrt agent has a tool called create_temp_email declared at agent.ts lines 114-118. When the agent calls it, the runtime instantiates a DisposableEmail from the 130-line implementation in email.ts. That class hits temp-mail.io's internal API to mint a fresh 10-character mailbox and remember its token. The address is the one the agent uses in the signup form. Each scenario gets its own; recycling inboxes between tests is a known source of flakiness this design sidesteps by default.

The full implementation is below. The hero of the file is waitForVerificationCode at lines 82-129, which polls the inbox and runs an ordered list of seven regex patterns against the body. The list is intentionally specific-to-general: the keyword anchors (code, verification, OTP, pin) bias toward verification emails, then the bare-digit fallbacks at six, four, and eight digits cover the long tail.

[Code listing: src/core/email.ts, 130 lines]

“That is the full implementation of the disposable inbox, the polling logic, and the regex bank. It is committed in the public repo. You can fork it tomorrow and never speak to a vendor again.”
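
The specific-to-general ordering described above can be sketched in a few lines. This is a minimal illustration, not the code in email.ts: the seven regexes and the function name extractVerificationCode are assumptions chosen to match the description (keyword anchors first, then bare-digit fallbacks at six, four, and eight digits).

```typescript
// Hypothetical pattern bank, ordered from most specific to most general.
// These regexes are illustrative and are NOT copied from email.ts.
const CODE_PATTERNS: RegExp[] = [
  /\bcode\b[^0-9]{0,40}(\d{4,8})\b/i,         // "Your code is 482910"
  /\bverification\b[^0-9]{0,40}(\d{4,8})\b/i, // "Verification: 482910"
  /\bOTP\b[^0-9]{0,40}(\d{4,8})\b/i,          // "OTP 482910"
  /\bpin\b[^0-9]{0,40}(\d{4,8})\b/i,          // "PIN: 4829"
  /\b(\d{6})\b/,                              // bare six-digit fallback
  /\b(\d{4})\b/,                              // bare four-digit fallback
  /\b(\d{8})\b/,                              // bare eight-digit fallback
];

// Returns the first capture from the first pattern that matches, or null.
function extractVerificationCode(body: string): string | null {
  for (const pattern of CODE_PATTERNS) {
    const match = pattern.exec(body);
    if (match) return match[1];
  }
  return null;
}
```

Because the keyword patterns run first, a body such as "Order #1234. Your verification code: 482910" resolves to the code rather than the order number, while a body containing only a bare six-digit token still resolves through the fallback.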

Primitive two: the synthetic paste that fills split-input OTP UIs

This is the part the other guides miss. When the agent recognises a split-input OTP UI, the system prompt at agent.ts lines 228-236 instructs it to bypass per-input typing entirely and dispatch one synthetic ClipboardEvent on the parent wrapper, with a DataTransfer payload that contains the digits. The application's native paste handler runs unmodified, splits the string across the cells, advances focus, and enables submit.

The recipe lives in the prompt rather than the model's improvisation, so every run produces the same expression with only CODE_HERE swapped for the actual digits. The expression is below. It is eight lines. You can paste it into a Playwright test of your own if you like; the recipe works the same way without an agent.

[Code listing: agent.ts (system prompt)]

Three things make this work. First, document.querySelector('input[maxlength="1"]') reliably finds the first cell of most split-input UIs in modern React/Vue/Svelte component libraries because the convention is so widespread. Second, inp.parentElement targets the wrapper, which is where popular libraries typically bind the paste listener. Third, DataTransfer is a real DOM type that the ClipboardEvent constructor accepts as clipboardData; apart from isTrusted being false, which OTP components rarely check, the synthetic event looks the same as a real paste from the application's perspective.
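
For readers who want to try the technique in their own Playwright test, here is a minimal sketch assembled from the pieces named in this article (the input[maxlength="1"] selector, inp.parentElement, DataTransfer, and a bubbling ClipboardEvent). It is not the verbatim text of the system prompt. The helper name buildOtpPasteExpression and the digits-only validation are assumptions for illustration; the returned string is intended for an in-page evaluation such as Playwright's page.evaluate, which accepts a string expression.

```typescript
// Builds the JavaScript source for a synthetic paste into a split-input OTP field.
// Hypothetical helper: the name and the 4-8 digit validation are assumptions.
function buildOtpPasteExpression(code: string): string {
  if (!/^\d{4,8}$/.test(code)) {
    throw new Error(`Expected a 4-8 digit code, got: ${JSON.stringify(code)}`);
  }
  // JSON.stringify quotes the code safely before it is spliced into the source.
  return `(() => {
  const inp = document.querySelector('input[maxlength="1"]');
  if (!inp || !inp.parentElement) return false;
  const dt = new DataTransfer();
  dt.setData('text/plain', ${JSON.stringify(code)});
  inp.parentElement.dispatchEvent(
    new ClipboardEvent('paste', { clipboardData: dt, bubbles: true, cancelable: true })
  );
  return true;
})()`;
}
```

In a spec, something like await page.evaluate(buildOtpPasteExpression(code)) would run it in the page. The expression returns false when no single-character cell is found, so a caller can fall back to ordinary typing for single-field OTP inputs.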

Primitive three: the polling tool that ties the two together

Between the inbox and the paste sits wait_for_verification_code. The tool definition is at agent.ts lines 119-126; the runtime case is at lines 858-879. It calls the DisposableEmail.waitForVerificationCode method with a configurable timeout (default 60 seconds, capped at 120). Inside, the polling loop runs every 3 seconds. The first arriving message hits the regex bank; the first match resolves with the code, sender, and subject; the agent feeds the code into the paste expression and clicks Verify.
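
The timing rules in that paragraph (60-second default, 120-second cap, 3-second interval) can be sketched as a small polling loop. This is an assumption-laden illustration, not the runtime case in agent.ts: pollForCode, clampTimeoutSeconds, and the injected fetchBodies, extract, and sleep callbacks are hypothetical names, and the sketch simply returns null on timeout.

```typescript
const DEFAULT_TIMEOUT_S = 60;   // default wait stated in the article
const MAX_TIMEOUT_S = 120;      // upper cap stated in the article
const POLL_INTERVAL_MS = 3000;  // polling interval stated in the article

// Applies the default and the cap to a caller-supplied timeout.
function clampTimeoutSeconds(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested) || requested <= 0) {
    return DEFAULT_TIMEOUT_S;
  }
  return Math.min(requested, MAX_TIMEOUT_S);
}

// Polls an inbox until one message body yields a code or the deadline passes.
// fetchBodies, extract, and sleep are injected so the loop has no network dependency.
async function pollForCode(
  fetchBodies: () => Promise<string[]>,
  extract: (body: string) => string | null,
  opts: { timeoutSeconds?: number; intervalMs?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<string | null> {
  const timeoutMs = clampTimeoutSeconds(opts.timeoutSeconds) * 1000;
  const intervalMs = opts.intervalMs ?? POLL_INTERVAL_MS;
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    for (const body of await fetchBodies()) {
      const code = extract(body);
      if (code !== null) return code;
    }
    await sleep(intervalMs);
  }
  return null; // timed out without a match
}
```

Injecting sleep keeps the loop fast to exercise in a unit test while leaving the 3-second production interval as the default.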

The flow looks like this end to end. Each step happens inside the agent loop without any Playwright code in your repo.

[Diagram: The full end-to-end OTP flow, no vendor inbox, no spec file]

How the three primitives slot together

The agent surface is 18 fixed tools, three of which are this flow. Below is the data shape: which side feeds what, where the side effects live, what the agent emits, what the application receives. The middle column is the agent loop; the left column is the inputs (your scenario, the temp inbox, your real Chrome); the right column is the artifacts.

[Diagram: Three primitives, one agent loop, no spec file in your repo]

create_temp_email

agent.ts lines 114-118. Creates a new disposable inbox per scenario; this is the address you use in the signup form. Backed by temp-mail.io's internal API. No vendor account, no API key.

wait_for_verification_code

agent.ts lines 119-126, runtime at lines 858-879. Polls the inbox for up to 60s. Seven ordered regex patterns. Returns the code, sender, and subject. Magic link flows use the same primitive, just with URL extraction instead of digits.

evaluate (synthetic paste)

agent.ts lines 234-236 (system prompt). The eight-line ClipboardEvent recipe the agent emits whenever it sees an input[maxlength="1"] cell. Identical between runs because it lives in the prompt, not the model's improvisation.

check_email_inbox

agent.ts lines 127-131, runtime at lines 880-892. The escape hatch when the verification flow sends two emails (welcome plus confirm) and the agent has to pick the right one by subject.

Why this isn't six page.fill() calls

Modern OTP UIs in React, Vue, and Svelte listen for a single onPaste on a wrapper element. They split the string, advance focus, and enable submit. Per-input typing trips per-input keystroke handlers and races the focus logic. The dispatched paste event is what the application is built to handle.

The whole flow as the agent sees it

Five steps. Each one corresponds to one tool call from the bounded surface. The agent does not write any Playwright; it picks tools from the schema and the schema rejects anything else.

Agent calls create_temp_email

POST to api.internal.temp-mail.io/api/v3/email/new with min_name_length=10, max_name_length=10. Returns a fresh address and a per-inbox token. Used as the email value in the signup form.

Form submits, app sends the verification email

Your app does whatever it normally does: Postmark, Resend, AWS SES, a transactional Gmail relay. The disposable address is just a normal RFC-5322 mailbox to your sender; it does not need a webhook or whitelist.

Agent calls wait_for_verification_code

Polls GET /email/{address}/messages every 3 seconds for up to 60. As soon as one message lands, the body runs through seven ordered regex patterns. Returns { code, from, subject } the moment the first pattern matches.

Agent inspects the OTP UI

If the field is a single input, type_text drops the digits in. If the field is a row of cells (the split-input pattern), the agent dispatches a synthetic ClipboardEvent on the parent wrapper with a DataTransfer payload set to the code. The application's onPaste handler runs and splits the digits across the cells.

Agent clicks Verify and asserts

After the cells fill and the submit button enables, the agent clicks Verify, waits for the navigation, and asserts the post-login URL or heading. The full scenario from create_temp_email to assertion takes 8 to 14 seconds in practice, mostly the email round-trip.
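
The five steps above can be sketched as one sequential function. In Assrt the agent chooses each tool call itself, so this is only an illustration of the ordering and of the single-field versus split-cell branch. The OtpFlowTools interface and every method name on it are hypothetical and do not reflect the actual tool signatures.

```typescript
// Hypothetical tool surface for illustration only; not the Assrt tool schema.
interface OtpFlowTools {
  createTempEmail(): Promise<string>;
  submitSignup(email: string): Promise<void>;
  waitForVerificationCode(): Promise<string>;
  countSingleCharCells(): Promise<number>; // number of input[maxlength="1"] cells
  typeText(code: string): Promise<void>;
  pasteIntoCells(code: string): Promise<void>;
  clickVerifyAndGetUrl(): Promise<string>;
}

// Runs the five steps in order and returns a trace of what happened.
async function runOtpFlow(tools: OtpFlowTools, expectedPath: string): Promise<string[]> {
  const trace: string[] = [];
  const email = await tools.createTempEmail();
  trace.push("create_temp_email");
  await tools.submitSignup(email);
  trace.push("submit");
  const code = await tools.waitForVerificationCode();
  trace.push("wait_for_verification_code");
  if ((await tools.countSingleCharCells()) > 1) {
    await tools.pasteIntoCells(code); // split-input UI: one synthetic paste
    trace.push("synthetic_paste");
  } else {
    await tools.typeText(code); // single field: plain typing is fine
    trace.push("type_text");
  }
  const url = await tools.clickVerifyAndGetUrl();
  if (!url.includes(expectedPath)) {
    throw new Error(`Expected URL to contain ${expectedPath}, got ${url}`);
  }
  trace.push("assert");
  return trace;
}
```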

130 lines in email.ts (the entire inbox impl)

7 ordered regex patterns for code extraction

60s default timeout per email wait

3 tools in the agent surface for this flow

What you actually write

The scenario file is plain Markdown. Six lines. A new contributor can read it on day one and execute it in any browser with their hands. There is no fixture, no helper, no vendor SDK import. The format is the same #Case syntax that every other Assrt scenario uses.

[Code listing: /tmp/assrt/scenario.md]

What it looks like running

Below is a real run transcript on a typical signup flow. The whole thing finishes in 11 to 14 seconds in practice, mostly waiting for the email to arrive. The cost line at the bottom is real: a Haiku 4.5 run for this scenario lands in the cents range, dominated by the screenshot bytes and accessibility-tree text the agent reads before each click.

[Terminal transcript]

Side by side, against the spec you would otherwise write

Six rows that fit on one screen. Left column is the shape of a typical Mailosaur-plus-Playwright spec; right column is the shape of the same flow expressed as agent primitives. The trade-off is real: the spec gives you full Playwright API access (fixtures, multi-browser context juggling, custom storage state); the agent gives you a bounded surface that handles the common path with no maintenance.

Feature	Mailosaur + Playwright spec (typical pattern)	Assrt agent primitives (Markdown #Case)
Where the disposable inbox comes from	A vendor account (Mailosaur, MailSlurp, testmail.app) with an API key in your env	DisposableEmail.create() at email.ts:43, no auth, fresh address per run
Who maintains the verification-code regex	You, in the test file. One regex per email template you target.	Seven ordered patterns at email.ts:101-109, version-controlled with the agent
How split-input OTP UIs get filled	Six page.locator('input').nth(i).fill(code[i]) calls. Silent failures.	One synthetic ClipboardEvent dispatch on the parent wrapper. Native handler.
What the test scenario looks like	A 60+ line .spec.ts with imports, fixtures, helpers, polling code	A six-line #Case block in /tmp/assrt/scenario.md, plain Markdown
Cost shape per month for this one feature	Mailosaur tier from $19/mo for one inbox; testmail $30/mo; MailSlurp $39/mo	Zero. temp-mail.io's internal endpoint is unauthenticated.
What survives if you switch tools tomorrow	A folder of .spec.ts that depends on the vendor SDK and your custom polling helper	A folder of Markdown #Case files any agent or human can execute

Magic links use the same primitives, just a different regex

A magic link is an email with a URL the user clicks. The exact same tools handle it: create_temp_email mints the inbox; wait_for_verification_code waits for the email to arrive (the waitForEmail primitive at email.ts line 67 is the one that fires); the agent extracts the URL with a regex over body_text, hands it to the navigate tool, and asserts the post-link state.

For a single-link email, a regex like /https?:\/\/[^\s]+\/(verify|magic|auth)\/[^\s]+/ against body_text usually wins. For a multi-link email, the agent calls check_email_inbox to read the full body, then asks Claude to pick the right URL. The shape of the test does not change: the same six-line #Case file works, with one bullet replaced. There is no separate magic-link tool because there does not need to be.

The one place magic link tests quietly go flaky is link selection. A real verification email rarely contains just the login URL: it carries an unsubscribe link, a help-center link, and often the login URL wrapped in a click-tracking redirect (links.yourapp.com/CL0/...) that 302s to the real one. A bare /https?:\/\/\S+/ grabs whichever URL appears first, which is usually the wrong one. Anchor the regex to the path segment your auth route actually uses (/verify, /magic, /auth), or let the agent follow the tracking redirect and assert the final landing URL rather than the link text. Treat the inbox body as untrusted input either way: it is the one part of this flow you do not control.
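
Anchoring on the auth path segment can be sketched as follows. This is a minimal illustration under stated assumptions: pickMagicLink and tryParseUrl are hypothetical names, the default segment list mirrors the (verify|magic|auth) alternatives above, and the sketch does not follow click-tracking redirects, so a wrapped link would still need the follow-and-assert approach.

```typescript
// Parses a URL candidate, returning null instead of throwing on malformed input.
function tryParseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null;
  }
}

// Returns the first URL in the body whose path contains an auth-like segment.
// Hypothetical helper; the email body is treated as untrusted input.
function pickMagicLink(
  bodyText: string,
  authSegments: string[] = ["verify", "magic", "auth"],
): string | null {
  const candidates = bodyText.match(/https?:\/\/[^\s<>"')]+/g) ?? [];
  for (const raw of candidates) {
    const url = tryParseUrl(raw.replace(/[.,;]+$/, "")); // drop trailing punctuation
    if (!url) continue;
    const segments = url.pathname.split("/").filter(Boolean);
    if (segments.some((segment) => authSegments.includes(segment))) {
      return url.toString();
    }
  }
  return null;
}
```

Matching on whole path segments rather than a substring avoids picking up unrelated URLs such as an /authors page, and returning null when nothing qualifies lets the caller fall back to reading the full message.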

Run this on your own app today

Eight steps. None require a vendor account, an API key, or a credit card. The whole thing runs against your local development server in your real Chrome, signed in as you, with the disposable inbox doing the email round-trip in the background.

Test your own OTP flow in the next 10 minutes

Read email.ts (130 lines). Verify DisposableEmail.create() at line 43.
Read agent.ts:114-131. Confirm three OTP-related tools in the surface.
Read agent.ts:228-236. Confirm the synthetic-paste recipe is in the prompt.
Run npx @m13v/assrt-mcp@latest install on your machine.
Write a six-line #Case file for your signup flow.
Run it against your real Chrome with --extension. Watch the agent create the inbox, fill the form and the OTP cells, and land on the dashboard.
If it fails on the OTP step, take a screenshot and call check_email_inbox to debug.
Commit the #Case file. It is now your regression test.

one number to take with you

The whole disposable-inbox machinery is 130 lines of TypeScript. The split-OTP paste recipe is eight lines of JavaScript. Together they replace a recurring vendor bill, a custom polling helper, a hand-tuned regex, and the silent six-fill bug in every existing tutorial. If you can read 138 lines of code, you can verify the entire claim before you decide whether to use it.

Got an OTP or magic link flow that breaks every test you write?

Bring the failing spec. We will run the same scenario through Assrt's three primitives and walk through what changes.

Frequently asked questions

Why does typing one digit per cell into a split OTP input field silently fail in modern apps?

Because most React and Vue OTP UIs do not bind onChange to each input independently. They listen for a single onPaste event on a parent wrapper, split the pasted string across the cells, and only then move focus to the submit button. When a Playwright script calls page.locator('input').nth(0).fill('1'), nth(1).fill('2'), and so on, each call sets one input's value and fires an input event rather than replaying a real keystroke or paste. Libraries that also wire up per-input keystroke handlers then see values arrive out of step with their focus-moving logic. The result is that cells visually contain digits but the form's internal state is stale, the submit button is disabled, or the verify call fires with a four-digit code instead of six. The fix is not 'type more carefully', it is 'paste once into the parent', which is what the application is actually built to handle.

What is the synthetic ClipboardEvent technique, and why does it work where page.fill() does not?

Browsers fire a paste event with a clipboardData property of type DataTransfer that the application reads with event.clipboardData.getData('text/plain'). Most split-input OTP UIs listen for that exact shape on a wrapper element. The trick is to construct the same shape in JavaScript without actually using the system clipboard: const dt = new DataTransfer(); dt.setData('text/plain', '123456'); element.dispatchEvent(new ClipboardEvent('paste', { clipboardData: dt, bubbles: true, cancelable: true })). Because the event bubbles and looks identical to a real paste, the application's handler runs unmodified, splits the digits across the cells, advances focus, and enables submit. The Assrt agent runs this exact expression through its evaluate tool. The recipe is committed in the system prompt at agent.ts lines 234-236 so the model produces the same code every run instead of inventing variations.

Where do disposable email addresses come from in Assrt, and what stops them from being recycled into someone else's test?

The DisposableEmail class lives at src/core/email.ts in the open-source MCP repo (github.com/m13v/assrt-mcp, 130 lines). Each call to DisposableEmail.create() POSTs to https://api.internal.temp-mail.io/api/v3/email/new with a min_name_length and max_name_length of 10, returning a fresh address and a per-inbox token. The address is unique because the local-part is a random 10-character string out of 36^10 possibilities; the token gates polling for that one inbox. Reusing an inbox across tests is a known source of flakiness in this domain (a previous run's verification email matches the current run's regex). Assrt sidesteps the problem by creating a new inbox for each scenario, which is one of the reasons the create_temp_email tool exists separately from check_email_inbox.
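
The request described in that answer can be sketched as a plain data builder. The URL and the two length parameters come from the text above; the function name buildCreateInboxRequest, the JSON content type, and sending the parameters as a JSON body are assumptions, and the response field names are not shown because the article does not state them.

```typescript
interface InboxRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Describes the create-inbox call. Hypothetical helper: the JSON encoding
// of the parameters is an assumption, not confirmed by the article.
function buildCreateInboxRequest(nameLength: number = 10): InboxRequest {
  return {
    url: "https://api.internal.temp-mail.io/api/v3/email/new",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ min_name_length: nameLength, max_name_length: nameLength }),
  };
}

// Size of the local-part space quoted above: 36 symbols over 10 positions.
const ADDRESS_SPACE = 36 ** 10;
```

The address space works out to 36^10 = 3,656,158,440,062,976 possible local-parts, which is why a collision between two test runs is not a practical concern.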

What regex does the agent use to extract the verification code, and what happens when the code is non-standard?

The waitForVerificationCode method at email.ts lines 82-129 runs an ordered list of seven patterns against the message body. The first group is anchored to keywords (code, verification, OTP, pin, in their common capitalizations). The last three are bare-digit fallbacks at six, four, and eight digits in that order. The order matters: 'Your code is 482910' matches the keyword pattern first, but a body that just contains a six-digit token still resolves through the bare-digit fallback. If none of the patterns match, the function returns the raw email body so the agent can fall back to LLM-based extraction. The trade-off: a bare-digit fallback can pick up the wrong number (an order ID, a year). For most real verification emails the keyword match wins; the fallback is the safety net.

How does Assrt handle a magic link flow as opposed to a numeric OTP?

A magic link is just an email with a URL the user is supposed to click. The same primitive (waitForEmail at email.ts line 67, polling every 3 seconds for up to 60 seconds) returns the message body to the agent. The agent then either calls evaluate to extract the link with a regex over body_text, or asks Claude to locate the URL and feeds it to the navigate tool. The end shape is identical to a numeric OTP test: the agent creates a temp inbox, fills the form, waits for an email, takes the verifier (URL or digits), drives the browser to the next state, asserts the success condition. The reason the same flow handles both is that the only thing that differs is the regex; the email-arrival, polling, and timeout logic is shared.

Can the agent test an OTP flow that uses SMS instead of email?

Not out of the box, because the disposable inbox is email only. Two paths work. The first is to wire your test login to a fixed phone number whose codes route to a service you control, then have the agent call that service through the http_request tool to read the latest code. The http_request tool is at agent.ts lines 172-184 and can hit any URL with custom headers, so a Twilio sub-account read or a phone-receiver API like 5sim.net works without modifying Assrt itself. The second is to bypass SMS in test environments via a back door (a fixed code for a fixed test number) and only run the real SMS path in a smoke suite. The codebase actually contains screenshots from real SMS topup tests against 5sim, so the http_request route has been used for live runs.

Why not just use Mailosaur, MailSlurp, or testmail.app and write the regex myself?

You can. Most teams do, and the resulting code is fine when one developer wrote it and remembers that the OTP UI on the staging environment is a split-input but the OTP UI on the demo environment is a single field. The cost shows up six months later when a new contributor inherits a flaky test, opens the spec file, and sees three regex flavours plus six page.fill() calls plus a comment that says 'do not change this, only works on Tuesdays.' The Assrt approach moves the regex bank, the inbox lifecycle, and the synthetic-paste recipe into the agent itself, where they are version-controlled in one place and run identically across every test in the suite. The vendor inbox option is a building block; the agent surface is the abstraction over it.

What does this look like in the actual scenario file, and how short is it?

Six lines of Markdown. The format is the same #Case syntax Assrt uses for every other scenario. A working example: '#Case 1: Sign up with disposable email and verify the OTP. - Navigate to /signup. - Click "Sign up with email". - create_temp_email and use the returned address. - Submit the form. - wait_for_verification_code. - Paste the code into the OTP input. - Verify the dashboard URL contains /app.' That entire scenario, written by a human or generated by assrt_plan, drives a real Chrome through real signup, hits a real disposable inbox, parses a real email, types a real code, and asserts a real URL. No Playwright code, no Mailosaur key, no regex you maintain.

Does the agent know the difference between a verification email and a marketing email that happens to include digits?

Partially. The keyword-anchored patterns (code, verification, OTP, pin) bias toward verification emails because marketing copy rarely uses those words next to a six-digit number. The fallback bare-digit pattern can be tricked by an order confirmation that says 'Order #482910', which is one reason why a fresh-per-test inbox matters: the only emails that arrive in the test's inbox are the ones the test triggered. In the rare case of a multi-email flow (welcome email plus verification email), the agent can call check_email_inbox to see all messages and pick the one whose subject matches 'verify' or 'confirm'. The check_email_inbox runtime case at agent.ts lines 880-892 returns sender, subject, and body for the latest message; with that surface the agent can disambiguate without the user writing any glue code.
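
Picking the right message by subject can be sketched as a simple filter. The InboxMessage shape follows the sender, subject, and body fields mentioned above, but the field names, the function name pickVerificationMessage, and the exact subject regex are assumptions for illustration.

```typescript
// Hypothetical message shape; field names are assumed, not taken from agent.ts.
interface InboxMessage {
  from: string;
  subject: string;
  body: string;
}

// Returns the first message whose subject looks like a verification email.
function pickVerificationMessage(messages: InboxMessage[]): InboxMessage | null {
  const subjectHint = /\b(verify|verification|confirm)/i;
  return messages.find((message) => subjectHint.test(message.subject)) ?? null;
}
```

With a welcome email and a confirmation email in the same inbox, the filter skips the welcome message and returns the one whose subject mentions verifying or confirming.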

What is the minimum I have to do to run this against my own app right now?

Three steps. Install the Assrt MCP (npx @m13v/assrt-mcp@latest install), point it at a development URL, and write or generate a six-line scenario that includes 'create_temp_email', 'wait_for_verification_code', and 'paste the code'. The agent does the rest: visits the page, fills the form, polls the inbox, runs the synthetic-paste expression, asserts the post-login state. There is no API key for the inbox (temp-mail.io's internal endpoint requires no auth from the client), no environment variable for the regex, and no Playwright config to maintain. If you want to read the implementation first, the four places to look are email.ts (130 lines), the three tool definitions in agent.ts at lines 114-131, the runtime cases in agent.ts at lines 858-892, and the system-prompt paste recipe at agent.ts lines 228-236.