AI QA Testing: The OTP Problem Every Other Agent Quietly Fails
Every vendor page for "ai qa testing" talks about self-healing selectors, natural language scenarios, and no-code coverage. None of them talk about the one screen that reliably breaks every AI browser agent the moment you point it at a real product: the split-input verification code form. This guide is about that screen, and the exact workaround Assrt ships in its system prompt so its AI QA tester gets past it.
“If the code input is split across multiple single-character fields, you MUST use evaluate to paste all digits at once. Do NOT type into each field one by one.”
Assrt system prompt, src/core/agent.ts:234
1. The Wall: Split OTP Inputs
Most modern auth flows send a one-time code to email or SMS, and render the input as six or eight separate single-character boxes. The boxes are glued together with custom JavaScript: listeners for keydown, paste, and input, focus shifts from box to box, deletion walks backward. It looks trivial from the outside. From inside an AI QA testing agent, it is the point where runs stall silently and time out.
If your AI QA tester cannot cross this screen, it cannot test anything behind a login. For most SaaS products that is ninety percent of the surface area.
2. Why Typing Into Each Field Fails
The obvious approach (click box one, type "1", click box two, type "2", and so on) fails for three overlapping reasons:
- Focus is driven by the widget, not the browser. Programmatic typing can fire an input event in box one while the widget was trying to hand focus to box two, leaving an orphaned digit.
- Most of these components only enable the submit button after a single paste event populates all boxes at once. Six individual input events do not trigger the same code path.
- Some widgets debounce or dedupe rapid keystrokes, so the agent's typing rhythm, faster than a human and slower than a real paste, is the exact case the widget was never tested against.
The agent sees no error. It sees a form with visible digits and a disabled Verify button. It waits. It times out.
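The silent failure can be reproduced with a toy model. The sketch below is a hypothetical widget, not code from any real library. It assumes, as described above, that only the container's paste handler re-evaluates the submit button, while per-box input events just store a digit.

```javascript
// Hypothetical model of a split-OTP widget (not taken from any real library).
// Assumption: only the paste path re-evaluates the submit button.
function createOtpWidget(fieldCount) {
  const boxes = new Array(fieldCount).fill('');
  let submitEnabled = false;

  return {
    // Per-box input event: stores the digit, never touches the submit state.
    onInput(index, digit) {
      boxes[index] = digit;
    },
    // Container paste event: distributes the whole code across the boxes.
    onPaste(text) {
      const digits = text.replace(/\D/g, '');
      if (digits.length !== fieldCount) return false; // wrong length: rejected silently
      for (let i = 0; i < fieldCount; i++) boxes[i] = digits[i];
      submitEnabled = true;
      return true;
    },
    state() {
      return { boxes: boxes.slice(), submitEnabled };
    },
  };
}

// Typing digit by digit fills every box but leaves Verify disabled.
const typed = createOtpWidget(6);
'123456'.split('').forEach((d, i) => typed.onInput(i, d));

// One paste fills the boxes and enables Verify.
const pasted = createOtpWidget(6);
pasted.onPaste('123456');
```

In this model typed.state() reports six visible digits with submitEnabled still false, which is the state the agent ends up waiting on. pasted.state() reports the same digits with submitEnabled true.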
3. The Exact Paste Expression, From the System Prompt
Assrt bakes the workaround directly into the agent's system prompt at src/core/agent.ts:234-236 of the public assrt-mcp repo. The model is told to call the evaluate tool with this expression, verbatim, replacing only CODE_HERE:
() => {
  const inp = document.querySelector('input[maxlength="1"]');
  if (!inp) return 'no otp input found';
  const c = inp.parentElement;
  const dt = new DataTransfer();
  dt.setData('text/plain', 'CODE_HERE');
  c.dispatchEvent(new ClipboardEvent('paste', {
    clipboardData: dt,
    bubbles: true,
    cancelable: true,
  }));
  return 'pasted ' + document.querySelectorAll('input[maxlength="1"]').length + ' fields';
}
Three things make this work where naive typing does not. The event is a real ClipboardEvent, which the widget's own paste handler already subscribes to. It is dispatched at the parent container, which is where react-verification-code-input, Chakra's PinInput, and almost every hand-rolled OTP widget attach the listener. And the payload is a real DataTransfer, so the handler can call e.clipboardData.getData('text/plain') and get the full code.
The prompt is deliberate about one thing: "Do NOT modify this expression except to replace CODE_HERE." Models, given flexibility, will try to loop and type. The prompt blocks that entirely.
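Inside Assrt the model performs the CODE_HERE substitution itself. If you want to reuse the expression from your own script, a minimal sketch of the same substitution could look like the following. The helper name buildPasteExpression and the 4 to 8 character alphanumeric check are assumptions made for illustration, not part of the Assrt source.

```javascript
// The expression from the system prompt, kept verbatim as a template.
const PASTE_TEMPLATE = `() => {
  const inp = document.querySelector('input[maxlength="1"]');
  if (!inp) return 'no otp input found';
  const c = inp.parentElement;
  const dt = new DataTransfer();
  dt.setData('text/plain', 'CODE_HERE');
  c.dispatchEvent(new ClipboardEvent('paste', {
    clipboardData: dt,
    bubbles: true,
    cancelable: true,
  }));
  return 'pasted ' + document.querySelectorAll('input[maxlength="1"]').length + ' fields';
}`;

// Hypothetical helper: substitute the code and change nothing else.
function buildPasteExpression(code) {
  // Allow only letters and digits so the value cannot break out of the string literal.
  if (!/^[A-Za-z0-9]{4,8}$/.test(code)) {
    throw new Error('unexpected verification code format: ' + code);
  }
  return PASTE_TEMPLATE.replace('CODE_HERE', code);
}

// Example: the returned string is what gets handed to the evaluate tool.
const expression = buildPasteExpression('482913');
```

Validating the code before substitution matters because the value lands inside a quoted JavaScript string that is then executed in the page.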
Point an AI QA tester at your signup flow
Run npx @m13v/assrt, then call assrt_test with a scenario that starts 'Sign up with a new email.' The OTP hop is already handled.
4. Where the Code Comes From: Disposable Inbox
Before the paste expression runs, the agent needs a code. Assrt provisions a real inbox on every signup using the temp-mail.io internal API (src/core/email.ts:9):
POST https://api.internal.temp-mail.io/api/v3/email/new
→ { email, token }
GET https://api.internal.temp-mail.io/api/v3/email/{email}/messages
→ [{ subject, body_text, ... }]
The agent calls create_temp_email before typing anything into the signup form, uses the returned address, submits, then calls wait_for_verification_code to poll the inbox until a code arrives. Only then does the paste expression fire. Every step is in the prompt at src/core/agent.ts:228-236.
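The polling step can be sketched roughly as below. This is not the wait_for_verification_code implementation, which is not shown here. The function names, the 4 to 8 digit pattern, and the retry defaults are all assumptions, and the inbox fetch is injected as a callback so the sketch does not guess at request headers or authentication.

```javascript
// Hypothetical helpers; the real wait_for_verification_code is not reproduced here.

// Return the first standalone 4-8 digit run found in messages shaped like
// [{ subject, body_text }], or null if none of them contains one.
function extractCode(messages) {
  for (const msg of messages) {
    const text = `${msg.subject || ''} ${msg.body_text || ''}`;
    const match = text.match(/\b\d{4,8}\b/);
    if (match) return match[0];
  }
  return null;
}

// Poll until a code shows up or the attempts run out.
// fetchMessages is an injected async function that resolves to the message list.
async function waitForCode(fetchMessages, { attempts = 20, delayMs = 3000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const code = extractCode(await fetchMessages());
    if (code) return code;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('no verification code arrived in time');
}
```

A bare digit pattern can also match years or order numbers in a subject line, so a production version would likely anchor on wording specific to the sender's template.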
5. How the Full AI QA Chain Runs
1. snapshot                    → find email field
2. create_temp_email           → disposable address
3. type_text(email)            → into signup form
4. click(SignUp)               → form submits
5. wait_for_verification_code  → poll temp-mail
6. evaluate(pasteExpression)   → DataTransfer ClipboardEvent fires at parent
7. snapshot                    → all boxes filled
8. click(Verify)               → now enabled
9. ...rest of the scenario     → authed state
Notice step 7. The prompt tells the agent to snapshot after evaluate returns, to confirm all digits are visible, before it clicks Verify. That guards the one failure mode where the paste handler silently rejects the payload (widget expects a 4-digit code, got 6). The agent sees a half-filled field and can retry, rather than clicking a disabled button.
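The check at step 7 amounts to a small decision. In Assrt the model makes it from the snapshot; the function below is only an illustrative sketch of that logic, and its name, its return values, and the retry budget are assumptions.

```javascript
// Hypothetical decision step for the snapshot taken after evaluate returns.
// boxValues: the visible value of each OTP box, as read from the snapshot.
function nextStepAfterPaste(boxValues, code, retriesLeft) {
  const visible = boxValues.join('');
  // Every box filled, and the digits match what was pasted.
  if (boxValues.length === code.length && visible === code) return 'click_verify';
  // Half-filled or empty: paste again rather than clicking a disabled button.
  if (retriesLeft > 0) return 'retry_paste';
  return 'fail_scenario';
}
```

Failing the scenario explicitly once retries are exhausted turns a silent timeout into a reportable result.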
6. Where the Workaround Stops Working
Two known failure modes, both narrow. First, widgets that use contenteditable spans instead of real inputs. The selector input[maxlength="1"] returns nothing, and the expression reports "no otp input found." Rare in auth flows, common in custom hardware UIs. The fix is to swap the selector for the target site.
Second, flows that send the code over SMS to a real phone number rather than email. Temp-mail does not help. Assrt handles this by letting scenarios reference an sms.dev or Twilio virtual number; the pattern is the same (wait for code, paste) but the poll target changes.
Everything else (react-verification-code-input, Chakra PinInput, Mantine PinInput, Auth0's default OTP, Supabase's email template, Clerk's default UI) crosses on the first try with the expression above.
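For the first failure mode, swapping the selector is a plain string substitution on the expression source. The helper below is a hypothetical sketch, not part of Assrt, and it only changes which element is located. Whether the target widget's container handles the paste event the same way still has to be checked per site.

```javascript
const DEFAULT_SELECTOR = 'input[maxlength="1"]';

// Hypothetical helper: point the expression at a different element type.
function swapOtpSelector(expressionSource, selector) {
  // The expression wraps the selector in single quotes, so reject characters
  // that would end the string literal early.
  if (/['\\\n]/.test(selector)) {
    throw new Error('selector must not contain single quotes, backslashes, or newlines');
  }
  // Replace every quoted occurrence (the lookup and the final field count).
  return expressionSource
    .split("'" + DEFAULT_SELECTOR + "'")
    .join("'" + selector + "'");
}

// Example with a shortened stand-in for the full expression.
const sample = "document.querySelector('input[maxlength=\"1\"]')";
const swapped = swapOtpSelector(sample, '[contenteditable="true"]');
```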
Frequently Asked Questions
Why do most AI QA testing agents get stuck on verification code forms?
Signup flows often render the 6-digit code as six separate <input maxlength="1"> boxes. A generic AI agent will try to click the first box, type one digit, tab, type the next, and so on. Focus management inside those components is usually custom JavaScript: typing programmatically does not always move focus, paste handlers fire on the container rather than each input, and one stuck keystroke wedges the whole field. The agent sees no error, just a form it cannot submit.
What is Assrt's actual workaround for split OTP inputs?
A verbatim JavaScript expression hardcoded in the system prompt at src/core/agent.ts:234-236. It calls document.querySelector('input[maxlength="1"]'), builds a new DataTransfer, sets the code as text/plain, and dispatches a synthetic ClipboardEvent('paste') at the parent container. The prompt instructs the model: "Do NOT modify this expression except to replace CODE_HERE." Typing into each field one by one is explicitly forbidden.
Why dispatch a paste event instead of calling input.value = code?
Split-input OTP widgets almost always subscribe to the paste event on the container to distribute digits across inputs. Setting .value on a single box bypasses that controller, leaves the other boxes empty, and does not fire the React/Vue synthetic input handler that enables the submit button. Dispatching a real ClipboardEvent with a DataTransfer payload is the one thing the widget's own paste handler is guaranteed to listen for.
Where does the code come from in the first place?
From a disposable inbox on temp-mail.io's internal API (src/core/email.ts:9). The Assrt agent is told, in the same system prompt section, to call create_temp_email before submitting any signup form, then wait_for_verification_code after submit. The wait uses the token returned from POST /email/new to poll GET /email/{address}/messages. Only then does the OTP paste expression run.
Does the classic record-and-replay or Playwright Codegen approach solve this?
No, and the reason is not obvious. Codegen records a literal paste into the first input, which works on the dev's machine because their clipboard had the real code, and because the component's paste handler was already loaded. In a headless CI run, the clipboard is empty and the timing is different, so the recorded paste replays against a form that does not have the expected state. An AI agent that constructs the DataTransfer in-page sidesteps the OS clipboard entirely.
Is this workaround specific to one frontend framework?
It works for the common patterns (react-verification-code-input, @chakra-ui/pin-input, and most hand-rolled variants) because they all listen to the paste event on a parent container. The selector input[maxlength="1"] is the one load-bearing assumption. If a site builds its OTP UI with a single input that visually splits digits with CSS, the same paste event applies, but that input usually will not carry maxlength="1", so the selector has to be adjusted to match it first. The other layout it fails on is a custom component that uses contenteditable spans, which is rare in auth flows.
Can I read the exact system prompt, or is it a trade secret?
It is in the public assrt-mcp repo at src/core/agent.ts:198-254. The Email Verification Strategy section, including the word-for-word paste expression, is lines 228 through 236. The repo is MIT licensed; you can fork it, swap the temp mail provider, or tighten the selector for your own targets. Nothing about the AI QA testing behavior is behind a paid API.
Run AI QA Testing Past the Login Wall
Assrt's AI QA tester ships the OTP workaround in its system prompt. Open-source, MIT licensed, self-hosted, no vendor lock-in.