For solo founders shipping daily

E2E tests for an MVP, in three Markdown files.

Most guides on this topic argue Cypress versus Playwright versus Selenium and recommend a testing pyramid that takes a five-person team a quarter to wire up. An MVP is one person shipping daily. Three to five tests cover the surface that matters. The bottleneck is not which framework you pick, it is the signup-with-OTP flow, and that is where this page spends most of its time.

Matthew Diakonov
11 min read

Direct answer, verified 2026-05-08

What an MVP should test end to end: three flows at launch, no more than five before product-market fit. Signup with OTP verification (the front door, and the test most teams skip because it is hardest). The one critical action a paying user performs (the thing your billing depends on). Billing itself if money moves. Optionally a destructive action (delete account, leave team) so you know account state is reversible. Skip everything else until you have users.

The signup-with-OTP test is the hard one for everyone. The disposable-email plumbing, the multi-input paste workaround, and the polling loop together account for most of the manual Playwright code you would write for an MVP suite. Verified against the open-source assrt-mcp source (parseScenarios at agent.ts:621, DisposableEmail at email.ts:67).

The pyramid is wrong for an MVP. Here is the shape that fits.

The testing pyramid (lots of unit, fewer integration, very few E2E) is good advice for a 20-engineer team with a stable product. It is actively bad advice for an MVP, because it tells you to write hundreds of unit tests against internal shapes that are still moving. Every time you change the shape, you change the tests, and you have spent the morning on test maintenance instead of on the product.

The shape that fits an MVP is inverted. Three to five end-to-end tests against user-visible contracts (signup, the paid action, billing), nothing in the middle, nothing at the bottom. The contracts at the top are the only things you cannot break without losing users or money. Everything else is moving and does not need to be locked in yet. When the surface stabilizes (usually 6 to 12 months in, after customers), you start filling in the layers below. Not before.

What most MVP testing guides recommend vs what actually works for one or two people

Pick Cypress or Playwright. Set up the testing pyramid. Aim for 70% unit coverage. Write integration tests at module boundaries. Add a handful of E2E tests at the top. Wire CI to fail on any drop in coverage. Allocate two weeks for the initial scaffolding.

  • Two weeks of scaffolding before the first test runs
  • Hundreds of unit tests against shapes still in flux
  • 70% coverage is a lie when the code is rewritten weekly
  • Test maintenance scales with team size you do not have yet

The three-test starter set

These are the flows. Each one is a single Markdown file, 4 to 6 lines. The shape is the same across products, only the specifics of the paid action change. If your product does not have one of these, drop it and stop at two tests; do not invent a third for the sake of a round number.

The flows that matter, in order of priority

  1. Signup with OTP

    Disposable email, fill form, wait for code, paste, assert dashboard. The hardest one. Worth setting up first because it gates every other test.

  2. The one paid action

    Whatever the user pays you for. For a SaaS, the first project. For a marketplace, the first listing. For a tool, the first export. Test what you bill on.

  3. Billing posts a charge

    Stripe test mode, full checkout flow, assert the webhook lands and the user record is updated. Skip if you are not yet charging money.

  4. Optional: destructive teardown

    Delete account, leave team, cancel subscription. Confirms account state is reversible. Add the moment you have a paying customer.

  5. Optional: third-party API

    If you have one upstream service whose outage breaks the product visibly (Stripe, OpenAI, a webhook target), one test that exercises it. Not for completeness, for the page that catches the outage.

Why the signup-with-OTP test is the hard one

This test gates everything else. If it breaks, every downstream test passes against fake users and you learn nothing about whether the product works. So it is the first one you write and the one you must get reliable. It is also genuinely harder than the others, for three reasons that compound.

First, the OTP arrives out of band. The runner needs a real email inbox it can read at runtime, which means either spinning up a throwaway IMAP listener or hitting a disposable-email service. There is no way to fake this from inside the browser. Second, the OTP UI is commonly six single-character inputs side by side, and Playwright's default fill() does not always cascade across them. Third, OTP arrival times (10 to 60 seconds) blow past the default test timeout, so you need a dedicated polling pattern.

The code below is what those three concerns look like when you write them by hand in Playwright versus what they look like in an Assrt Markdown #Case. Both run in real Chromium, driven by real Playwright under the hood. The difference is the maintenance surface.

Signup-with-OTP test, two ways

// signup-otp.spec.ts (manual Playwright)
import { test, expect } from "@playwright/test";

test("signup with OTP verification", async ({ page }) => {
  // 1. Get a disposable email (you wire this yourself)
  const email = await createTempInbox();

  // 2. Open signup, fill form
  await page.goto("/signup");
  await page.fill('input[name="email"]', email.address);
  await page.click('button[type="submit"]');

  // 3. Poll for the OTP email (60s timeout, 3s interval)
  const start = Date.now();
  let code: string | null = null;
  while (Date.now() - start < 60_000) {
    const messages = await email.fetchInbox();
    if (messages.length > 0) {
      const match = messages[0].body.match(/\b(\d{6})\b/);
      if (match) { code = match[1]; break; }
    }
    await page.waitForTimeout(3000);
  }
  if (!code) throw new Error("OTP did not arrive");

  // 4. Multi-input OTP paste workaround
  await page.evaluate((c) => {
    const inp = document.querySelector('input[maxlength="1"]');
    if (!inp) throw new Error("no otp input");
    const dt = new DataTransfer();
    dt.setData("text/plain", c);
    inp.parentElement!.dispatchEvent(
      new ClipboardEvent("paste", {
        clipboardData: dt, bubbles: true, cancelable: true,
      })
    );
  }, code);

  // 5. Click verify, assert dashboard
  await page.click('button:has-text("Verify")');
  await expect(page.getByRole("heading", { name: /dashboard/i }))
    .toBeVisible({ timeout: 10_000 });
});
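For comparison, the Assrt version of the same flow is a four-line Markdown #Case. This is a sketch following the format described on this page; the exact wording of the steps is illustrative:

```markdown
#Case 1: Sign up with email
Use a disposable email
Enter the OTP from the inbox
Verify the dashboard heading appears
```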
87% fewer lines, same browser, same Playwright underneath

Anchor fact 1

The whole scenario format is one regex

Assrt's parseScenarios function at /Users/matthewdi/assrt-mcp/src/core/agent.ts:621 splits a Markdown file on `/(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi`. Anything between two case headers is treated as imperative steps. There is no schema, no DSL, no test-framework setup. You write English between #Case headers and the agent runs it.
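A minimal reproduction of that split. The regex is copied verbatim from the citation above; the sample file contents are illustrative:

```typescript
// The split regex, verbatim from the cited parseScenarios source.
const CASE_SPLIT = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

// Illustrative scenario file: two cases, one step each.
const file = [
  "#Case 1: Sign up with email",
  "Use a disposable email",
  "#Case 2: Billing posts a charge",
  "Complete Stripe test-mode checkout",
].join("\n");

// Splitting on the header pattern leaves one chunk per case:
// the title line followed by its imperative steps.
const cases = file.split(CASE_SPLIT).filter((c) => c.trim() !== "");
// cases[0] === "Sign up with email\nUse a disposable email\n"
```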

Anchor fact 2

Seven OTP regex patterns, in fall-through order

The OTP extractor at /Users/matthewdi/assrt-mcp/src/core/email.ts:101-109 tries seven regex patterns in order: explicit `code: 123456`, then `verification:`, then `OTP:`, then `PIN:`, then bare 6-digit, then 4-digit, then 8-digit. Most providers hit the first three. Outliers fall through to the bare-digit patterns. You write zero lines of regex.
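The fall-through shape looks roughly like this. The patterns below are paraphrased, not copied from email.ts; only the ordering principle (labeled codes first, bare digits last) is from the source:

```typescript
// Hypothetical paraphrase of a fall-through OTP extractor: labeled
// patterns first, bare-digit patterns as the last resort.
const OTP_PATTERNS: RegExp[] = [
  /code[:\s]+(\d{4,8})/i,          // "Your code: 123456"
  /verification[:\s]+(\d{4,8})/i,  // "verification: 123456"
  /OTP[:\s]+(\d{4,8})/i,
  /PIN[:\s]+(\d{4,8})/i,
  /\b(\d{6})\b/,                   // bare 6-digit
  /\b(\d{4})\b/,                   // bare 4-digit
  /\b(\d{8})\b/,                   // bare 8-digit
];

function extractOtp(body: string): string | null {
  for (const pattern of OTP_PATTERNS) {
    const match = body.match(pattern);
    if (match) return match[1];
  }
  return null; // no recognizable code in this message
}
```

The order matters: a labeled match wins even when the body also contains unrelated digit runs, and an unlabeled code still falls through to the bare-digit patterns.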

What it actually looks like when you run the suite

Three files, three #Cases, one CLI invocation. The runner streams each step to stdout, records video and screenshots to disk, and writes a structured log you can grep. If a case fails, the agent emits a diagnosis with the step that broke and a suggested fix-the-test patch. There is no remote dashboard you must visit, no cloud account, no UUID gate. Everything lives in /tmp/assrt and your repo.

assrt run, three cases, real browser

Setup, end to end, in 20 minutes

The framework is MIT-licensed and ships as an npm package. There is no signup, no API key, no cloud account. You install it in your repo, drop three Markdown files into /tmp/assrt, and run.

From cold repo to first passing test

  • Install: npx @m13v/assrt-mcp connects the runner to your dev server
  • Author scenario.md: three #Case blocks, four lines each, in plain English
  • Save to /tmp/assrt/scenario.md (path defined at scenario-files.ts:16-20)
  • Run: assrt_test against http://localhost:3000 (or your preview URL)
  • Watch: video, log, screenshots write to /tmp/assrt/results/<runId>/
  • Commit scenario.md to the repo, optionally export to standard Playwright later
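The scenario.md from step two can look like this. A sketch only: the case titles and steps are illustrative, following the three flows described above:

```markdown
#Case 1: Sign up with email
Use a disposable email
Enter the OTP from the inbox
Verify the dashboard heading appears

#Case 2: Create the first project
Log in as the new user
Create a project from the dashboard
Verify the project appears in the list

#Case 3: Billing posts a charge
Complete Stripe test-mode checkout
Verify the plan name updates on the account page
```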

What this approach does not do

Worth being explicit about the boundary. A three-test MVP suite does not give you regression coverage on edge cases. It does not catch a subtle accessibility regression, a layout drift on iPad, or a performance regression in a deeply nested route. It does not replace Sentry, error monitoring, or analytics. It tells you whether the three flows you cannot afford to break still work, on the morning of a ship. That is its job.

When an MVP has 100 customers and the product surface stabilizes, you graduate. You add unit tests for the bug classes that bit you, integration tests at the module boundaries that stopped moving, and more E2E tests for the secondary flows that are now load-bearing. The Markdown #Cases either stay (because reading them is faster than reading a 60-line Playwright spec) or get exported to standard Playwright code. Either way, no rewrite, no day-zero break.

The thing to resist at MVP stage is the urge to set up the end-state suite on day one. It is cheaper to start with three tests and grow the suite as the product earns each new one than to start with thirty tests and watch most of them fall into disrepair before they catch their first real bug.

Sanity-check your MVP test plan in 20 minutes

Bring your signup flow and the one critical action; we will sketch the three-test minimum together and you will leave with a scenario.md you can run today.

Common questions

Frequently asked questions

How many E2E tests does an MVP actually need before launch?

Three is the floor. Signup, one critical paid action (the thing the user pays you for, or the thing that proves activation), and billing if money moves. Add a fourth if you have a destructive action that needs to be reversible (delete account, leave team, cancel subscription). A fifth if there is a single dependency on a third-party API that breaks visibly when the upstream service degrades. That is it. Five tests, end to end, in real browsers. The temptation is to write more, because each one feels small, but the maintenance cost of E2E tests scales superlinearly with how many you have. A single founder cannot realistically maintain 30 of them through daily UI churn. The discipline is to keep the suite small enough that you reliably keep it green.

Why is signup-with-OTP the hard test for an MVP, and what makes it harder than the others?

Three reasons compound. First, the test needs a real email inbox the runner can read, because the OTP is delivered out of band; you cannot fake it from inside the browser. Second, OTP screens are commonly built as multiple single-character inputs (six boxes, one digit each), which trips up traditional Playwright fill() because the framework reaches for the first input and the autocomplete logic does not always cascade. Third, OTP timeouts (60-120 seconds for code arrival) blow past the default test timeout in most frameworks, so you need an explicit polling pattern. The reason this one matters most for an MVP is that signup is the front door. If signup breaks, every other test passes against ghost users and you have no idea anything is wrong. Assrt's pattern: create_temp_email at the start, fill the form, wait_for_verification_code (polls every 3 seconds for 60 seconds, defined at /Users/matthewdi/assrt-mcp/src/core/email.ts:67), extract the digits with the seven-pattern regex at email.ts:101, paste them with a clipboard-event hack the system prompt teaches the agent verbatim. That is the whole signup test. Four lines of Markdown.

What does an MVP-grade E2E test actually look like in Assrt's format?

It is a Markdown file. The format is parsed by one regex at /Users/matthewdi/assrt-mcp/src/core/agent.ts:621 — `/(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi` — which splits the file on lines starting with #Case, #Test, or #Scenario, and treats everything between as imperative steps. A first signup test looks like: `#Case 1: Sign up with email`. New line: `Use a disposable email`. New line: `Enter the OTP from the inbox`. New line: `Verify the dashboard heading appears`. The agent reads that, decides at runtime which buttons to click, picks selectors from the live accessibility tree, and runs everything through real Playwright under the hood. You did not import a test framework. You did not write a page-object class. You did not pin a selector. You wrote four lines of English and saved it as scenario.md.

Why not just write the same three tests in vanilla Playwright? It is the standard.

You can. The generated artifacts Assrt produces are real Playwright code, so there is no lock-in either way. The argument for plain Markdown #Cases at MVP stage is purely time. A vanilla Playwright signup-with-OTP test is roughly 60 to 100 lines once you account for the disposable email plumbing, the OTP polling loop, the multi-input paste workaround, the dashboard assertion, and the cleanup. The Assrt equivalent is four lines. At MVP stage you are shipping daily, the UI is shifting, and the maintenance cost of those 100 lines is the part that kills the test. When the UI changes, the agent re-reads the accessibility tree and finds the new button. When the Playwright test breaks, you spend an hour figuring out which selector drifted. After product-market fit, when the UI stabilizes, exporting the recorded scenarios to standard Playwright is a one-shot operation. Before product-market fit, leave the tests in plain English.

What about unit tests, integration tests, and the testing pyramid? The standard advice is start at the bottom.

The pyramid was written for engineering teams of 5 to 50 with a stable product surface. An MVP is one or two people shipping behavior the team has not yet decided is right. Unit tests freeze internal shapes that are still moving. Integration tests freeze module boundaries that are still moving. E2E tests freeze user-visible contracts (does signup work, does the paid action work, does billing post a charge), and those are the contracts you cannot break without losing users or money. Start at the top of the pyramid. Add tests downward only when you have proof that a specific class of bug keeps biting you and a unit-level test would have caught it. For most MVPs, that day comes 6 to 12 months in, after you have customers and a stable surface. Do not pre-pay for it.

How do I run these tests in CI without paying for a SaaS testing platform?

Two choices. The first: run them locally before every push, on the assumption that you push 1 to 5 times per day at MVP stage and the human-loop friction is fine. `npx @m13v/assrt run` against your local dev server, eyeball the result, ship. The second: a GitHub Actions workflow that runs the suite against a preview deployment on every PR, using the runner from the open-source npm package. There is no SaaS gate. The framework is MIT-licensed, the runner ships as a CLI, and the artifacts (video, log, screenshots) write to /tmp/assrt by default and can be uploaded as workflow artifacts on failure. Cost: zero, except the GitHub Actions minutes you would already be spending. Compare that to QA Wolf at roughly $7,500 per month and the math is not close at MVP scale.

What about flake? E2E tests are notorious for being flaky.

Two sources of flake matter at MVP scale. The first is timing flake: the page is not yet rendered when the test tries to click. Assrt addresses this with wait_for_stable, a MutationObserver that polls every 500ms and waits for 2 seconds of zero DOM mutations before continuing (default, configurable, ceiling 30 seconds). The second is selector flake: a button class changed and the test cannot find the element. Traditional frameworks pin selectors at write time and break at run time. Assrt re-reads the accessibility tree on every action, which is slower per test but eliminates the entire category of selector-drift flake. Net effect at MVP scale: a suite of 5 tests reliably stays green through routine UI changes, where the same suite written in vanilla Playwright would need a selector touch-up every other ship. If a test does fail, the run produces a video with an injected red cursor (so you can see exactly what the agent clicked) and a log with the agent's reasoning at every step.
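The quiet-window logic behind that wait can be sketched framework-free. Assumptions: `waitForQuiet` and its signature are hypothetical; the 500ms poll, 2-second quiet window, and 30-second ceiling are the defaults stated above; the real implementation feeds `lastMutationAt` from a MutationObserver inside the page:

```typescript
// Hypothetical sketch: wait until no DOM mutation has been seen for
// `quietMs`, polling every `pollMs`, giving up after `ceilingMs`.
async function waitForQuiet(
  lastMutationAt: () => number, // timestamp of most recent DOM mutation
  opts = { pollMs: 500, quietMs: 2_000, ceilingMs: 30_000 },
): Promise<boolean> {
  const start = Date.now();
  while (Date.now() - start < opts.ceilingMs) {
    if (Date.now() - lastMutationAt() >= opts.quietMs) {
      return true; // DOM has been quiet long enough; safe to act
    }
    await new Promise((resolve) => setTimeout(resolve, opts.pollMs));
  }
  return false; // ceiling hit; proceed (or flag the step) anyway
}
```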

Should I write the tests before or after the feature?

After, at MVP stage. Test-first is a hypothesis-stable practice; an MVP feature is a hypothesis being tested. Build the feature, ship it to one or two users, watch them use it, and only then write the E2E test that locks in the behavior you confirmed they care about. The order matters because writing the test first against a feature you might rip out next week is wasted maintenance debt. The flip side: any feature that ships to users without an E2E test is on a stopwatch. By the end of the week, either it gets a test or it gets ripped out. That is the discipline that keeps the suite small and useful.

How long does the three-test minimum take to set up from a cold repo?

Roughly 20 minutes if you have an MVP with signup, a dashboard, and Stripe checkout already built. Step one: `npx @m13v/assrt-mcp` connects the agent to your dev server. Step two: write three #Case files, four lines each, save to /tmp/assrt/scenario.md (the directory and filename layout are at /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts:16-20). Step three: run `assrt_test` against http://localhost:3000 (or wherever your dev server lives), watch the runner execute each scenario in a real browser. Step four: if any test fails, the agent emits a diagnosis with the exact step that broke and a suggested fix. Step five: commit scenario.md to your repo. That is the whole setup. The framework is MIT-licensed and free; the only cost is the LLM tokens for the agent to drive the browser, which run on the order of pennies per test on Claude Haiku.

What changes when we are no longer an MVP and graduate to a real testing strategy?

Three things shift. First, you start adding tests downward into the pyramid because the surface is stable and unit-level bugs become repeat offenders. Second, you start exporting Markdown #Cases to standard Playwright code because the LLM-driven runtime cost stops mattering once you have hundreds of test runs per day. Third, you start branching the suite by environment: local, staging, production, with different scenarios in each. Assrt's outputs are standard Playwright files, which means the migration is mechanical, not a rewrite. The MVP-stage Markdown is a reading-easy front end on top of code that already exists; you keep the underlying tests, you just add a different way of editing them. There is no day-zero break.

assrt
Open-source AI testing framework
© 2026 Assrt. MIT License.
