Integration vs E2E testing, reconsidered

The testing pyramid is a setup-cost argument. Share one browser and it collapses.

Every guide on integration vs E2E testing defends the same pyramid: write many integration tests, few E2E tests, because each E2E test pays a heavy per-test setup cost. That argument evaporates when every scenario in your plan shares a single persistent browser profile, logs in exactly once, and inherits session state from the previous case. This page shows the exact files in Assrt's source where that becomes real, and contrasts the result against a classic Playwright spec suite forced to re-authenticate before every it().

Matthew Diakonov, Assrt

Published April 20, 2026 · 10 min read

Scenario regex at src/core/agent.ts:621 accepts #Case, Scenario, or Test headings

Shared-session invariant stated in the system prompt at agent.ts:239-240

isolated defaults to false at src/mcp/server.ts:354, profile at ~/.assrt/browser-profile

assert primitive at agent.ts:886: one passed:false flips scenarioPassed

MIT licensed, real @playwright/mcp under the hood, no cloud dependency

The pyramid collapses

When every scenario shares a browser, setup cost is amortized to zero

Classic pyramid: few E2E tests, because each one re-launches the browser

Assrt: isolated=false by default, profile at ~/.assrt/browser-profile

#Case 1 signs up. Cookies and localStorage persist.

#Cases 2-N inherit the session, never re-authenticate.

Integration-sized scenarios, E2E fidelity, one-time setup.

Where the integration-vs-E2E trade-off comes from

The orthodox framing is a per-test cost argument. An integration test spins up a database, fakes a network, runs in a worker in under a second, and verifies that modules A and B agree. An E2E test spins up a browser, loads the app, signs in, clicks through the UI, and verifies that the whole stack behaves like a user would experience it. Multiply E2E setup across a thousand assertions and the suite takes hours, so the pyramid advice is to keep the E2E layer small, only for critical paths.

This is economics, not purity. Nothing about E2E tests is inherently unwholesome. They are slower because the setup cost is paid many times. If you remove the per-test setup cost, the trade-off disappears and the pyramid flattens. That is exactly what happens when every scenario shares one browser.
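
The amortization argument can be written down as a small cost model. This is only a sketch: the field names and the numbers (50 s of setup, 10 s of test body, 20 tests) are illustrative assumptions, not measurements from Assrt or Playwright.

```typescript
// Hypothetical cost model: all numbers are illustrative, not measurements.
interface SuiteCost {
  setupSeconds: number; // launch browser + log in + navigate
  bodySeconds: number;  // the actions and checks unique to one test
  tests: number;
}

// Classic suite: every test pays the setup cost again.
function perTestSetupTotal({ setupSeconds, bodySeconds, tests }: SuiteCost): number {
  return tests * (setupSeconds + bodySeconds);
}

// Shared session: setup is paid once, then only the test bodies run.
function sharedSetupTotal({ setupSeconds, bodySeconds, tests }: SuiteCost): number {
  return setupSeconds + tests * bodySeconds;
}

const example: SuiteCost = { setupSeconds: 50, bodySeconds: 10, tests: 20 };
const classicTotal = perTestSetupTotal(example); // 20 * (50 + 10) = 1200
const sharedTotal = sharedSetupTotal(example);   // 50 + 20 * 10 = 250
```

Under these assumed numbers the gap comes entirely from the `tests * setupSeconds` term, which is the term a shared browser removes.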

What "shared browser" actually means in Assrt

Three things. First, the MCP tool accepts a plan containing many #Case blocks, which a regex splits into scenarios. Second, the runner launches one Chromium instance and reuses it across every case. Third, the browser profile persists to disk so re-runs also inherit state. Each of the three comes down to a few lines of code, spread across two files.

src/core/agent.ts:620-631 (scenario splitter)

src/core/agent.ts:238-240 (shared-session invariant)

src/mcp/server.ts:354 (isolated defaults to false)

1 login per plan

“isolated: false (default) → persist cookies, localStorage, and logins to ~/.assrt/browser-profile”

src/mcp/server.ts:354

How state flows between scenarios

Every #Case in a plan targets the same sharedBrowser singleton declared at src/mcp/server.ts:31. Cookies, localStorage, and auth tokens written during one case are visible to the next. Nothing about the next case needs to re-authenticate, re-navigate, or re-establish any session. The browser does not know where one case ended and the next one began.
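
A lazily created singleton is one common way to get this behavior. The sketch below is an assumption about the shape, not a copy of src/mcp/server.ts: `BrowserHandle`, `getSharedBrowser`, and `launchCount` are hypothetical names, and the "browser" is a plain object standing in for the real one.

```typescript
// Hypothetical stand-in for a browser handle; the real object wraps a
// Playwright-driven Chromium, which this sketch does not launch.
interface BrowserHandle {
  id: number;
  cookies: Record<string, string>;
}

let launchCount = 0;
let sharedBrowser: BrowserHandle | null = null;

// Create the browser on first use, then hand back the same instance.
function getSharedBrowser(): BrowserHandle {
  if (sharedBrowser === null) {
    launchCount += 1;
    sharedBrowser = { id: launchCount, cookies: {} };
  }
  return sharedBrowser;
}

// Case 1 writes a session cookie; case 2 reads it without logging in again.
getSharedBrowser().cookies["session"] = "abc123";
const seenByCase2 = getSharedBrowser().cookies["session"];
```

Because every case asks the same accessor for its browser, state written by one case is visible to the next without any hand-off code.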

One sharedBrowser, many #Case blocks, session carries

The economics, side by side

The first listing, tests/dashboard.spec.ts, is a classic Playwright spec that inherits its auth state from a JSON file reloaded per test. It works, but the per-test overhead is real and the authoring surface (fixtures, beforeEach, storageState JSON) adds up. The second, /tmp/assrt/scenario.md, is the same behavior written as a single Assrt plan with three #Case blocks. Auth is inherited by construction, not by configuration.

tests/dashboard.spec.ts (classic Playwright)

/tmp/assrt/scenario.md (Assrt)

What a twenty-scenario plan looks like in practice

The plan below is a single markdown file. One assrt_test call runs all twenty cases against the same browser. The first case signs up. Every subsequent case assumes the signup happened because the browser still holds the session. You would have written these as many small integration tests before; now they are many small E2E tests at the same price.

/tmp/assrt/scenario.md

Run output with the shared session visible

When you actually execute a twenty-case plan, the terminal makes the shared-browser property visible: the browser launch happens once, the profile mount happens once, and the scenarios unroll one after another with no navigation between them. A failed case does not abort the run; the browser continues into the next case still holding the previous state.

claude-code: localhost:3000

Six observations about why this actually works

Start with the assumption the pyramid is baking in

Classic guidance says: few E2E tests because each one pays a high fixed cost (launch browser, auth, navigate). Integration tests are preferred in bulk because they skip that cost. The whole argument rests on per-test setup being expensive. That is the assumption Assrt removes.

Write the plan as #Case blocks, not spec files

Each #Case in /tmp/assrt/scenario.md is an integration-sized assertion: one action, one verification, a handful of steps. The parser at agent.ts:621 splits on the regex and feeds each block to the browser agent in order. You author at the granularity of an integration test.

Run with isolated: false (the default)

The first assrt_test call spawns @playwright/mcp as a stdio child and points it at ~/.assrt/browser-profile. Every subsequent scenario in the plan uses the same Chromium process, the same browser context, the same auth. The setup cost is paid once for the whole plan, not once per case.

Case 1 logs in, cases 2–N inherit the session

Because cookies and localStorage persist, the LLM agent steering cases 2 through N finds itself already inside the app. It reads the accessibility tree with ref=eN IDs, clicks where it needs to, and emits { description, passed, evidence } assertions. No beforeEach, no storageState round-trip.

Fail-fast verification per case, not per suite

Each case calls complete_scenario when done. A single passed: false on any assert flips that case to failed (agent.ts:886), but the next case still runs in the same session. You get twenty small, independently-passing-or-failing scenarios in the time a classic suite takes to re-launch the browser twenty times.

Re-run one failing case without re-running the plan

Every scenario gets a UUID saved at ~/.assrt/scenarios/<uuid>.json. If Case 17 failed, you call assrt_test with scenarioId alone and it re-executes that single case against the still-warm browser. No need to re-run Cases 1 through 16 to reproduce.

The mechanics, briefly

Six concrete pieces of the runtime that together do the work of collapsing the pyramid. Each one maps to a line of code in the open-source repository at github.com/assrt-ai/assrt-mcp, so none of this is marketing-side speculation.

Shared browser profile, on by default

isolated defaults to false at src/mcp/server.ts:354. Cookies, localStorage, and the current tab persist to ~/.assrt/browser-profile between scenarios and between runs. The first signup is the only signup.

Scenario splitter that accepts three headings

The regex /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi at agent.ts:621 matches #Case N:, Scenario N:, and Test N:. Each block becomes an independent scenario in the shared browser, no code generation, no YAML parser.
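
To see what the regex does to a plan, here is a minimal sketch that applies it with `String.prototype.split`. The regex is the one quoted above; `splitPlan` and the three-case plan text are hypothetical, and the real parser may post-process the pieces differently.

```typescript
// Regex quoted in the text above (agent.ts:621).
const SCENARIO_HEADING = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

// Hypothetical helper: split a plan into scenario bodies, dropping empty pieces
// (the text before the first heading is empty when the plan starts with one).
function splitPlan(plan: string): string[] {
  return plan
    .split(SCENARIO_HEADING)
    .map((block) => block.trim())
    .filter((block) => block.length > 0);
}

// Illustrative plan that uses all three accepted heading styles.
const plan = [
  "#Case 1: Sign up",
  "Open /signup and create an account.",
  "",
  "Scenario 2: Open settings",
  "Click the avatar and verify the Settings page loads.",
  "",
  "Test 3. Rename workspace",
  "Change the workspace name and verify the header updates.",
].join("\n");

const scenarios = splitPlan(plan); // three bodies, in plan order
```

One consequence of a plain split with this pattern: because it is case-insensitive and not anchored to the start of a line, a body sentence ending in "test." or "case:" would also be treated as a heading, so plan text is safest when it avoids those endings.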

Real @playwright/mcp under the hood

Every click, type, snapshot, and setOffline is forwarded to the official Playwright MCP server spawned as a stdio child process. The browser work is indistinguishable from a hand-written Playwright script.

wait_for_stable replaces waitForTimeout

agent.ts:941-994 injects a MutationObserver, polls every 500ms for a 2-second quiet window (configurable), capped at 10s. Async flows wait exactly as long as they need, no flat sleeps.
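
The polling decision can be modeled without a browser. The sketch below is a pure simulation of the quiet-window logic under the constants named above (500 ms poll, 2 s quiet window, 10 s cap); `simulateWaitForStable` and its input format are hypothetical, and the real code observes the DOM through an injected MutationObserver rather than a prepared array.

```typescript
type StableState = "waiting" | "stable" | "timeout";

interface StableOptions {
  quietMs: number;   // required quiet window
  timeoutMs: number; // hard cap on the total wait
}

// Defaults taken from the text: 2 s quiet window, 10 s cap, 500 ms poll.
const DEFAULTS: StableOptions = { quietMs: 2000, timeoutMs: 10000 };
const POLL_MS = 500;

// Hypothetical pure model. mutationCounts[i] is the cumulative number of DOM
// mutations seen at poll i, with polls taken every POLL_MS.
function simulateWaitForStable(
  mutationCounts: number[],
  options: StableOptions = DEFAULTS,
): { state: StableState; elapsedMs: number } {
  let lastCount = 0;
  let lastChangeMs = 0;
  for (let i = 0; i < mutationCounts.length; i++) {
    const nowMs = (i + 1) * POLL_MS;
    if (mutationCounts[i] !== lastCount) {
      lastCount = mutationCounts[i];
      lastChangeMs = nowMs; // the DOM changed, so the quiet window restarts
    }
    if (nowMs - lastChangeMs >= options.quietMs) return { state: "stable", elapsedMs: nowMs };
    if (nowMs >= options.timeoutMs) return { state: "timeout", elapsedMs: nowMs };
  }
  return { state: "waiting", elapsedMs: mutationCounts.length * POLL_MS };
}

const quietPage = simulateWaitForStable([0, 0, 0, 0]);         // stable at 2000 ms
const settlingPage = simulateWaitForStable([3, 5, 5, 5, 5, 5]); // stable at 3000 ms
```

A page that never mutates returns after the minimum quiet window, while one that settles after two polls returns two seconds after its last change, which is the "wait exactly as long as needed" property in miniature.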

Per-case UUIDs, re-runnable in isolation

Every scenario gets a UUID saved under ~/.assrt/scenarios/<uuid>.json. Pass scenarioId to assrt_test to re-run a single case against the warm browser, skipping the other 19 that already passed.

assert is strict, not graded

The verification primitive takes { description, passed, evidence }. One passed: false flips the whole case. No fuzzy 'mostly green' scoring. This is how Assrt avoids the AI e2e failure mode where the page loaded so it must be fine.
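
The strictness rule is easy to state in code. This is a sketch of the rule only: the `{ description, passed, evidence }` shape and the `scenarioPassed` flag come from the text, while `CaseResult`, `runCase`, and the sample assertions are hypothetical.

```typescript
interface Assertion {
  description: string;
  passed: boolean;
  evidence: string;
}

// Hypothetical model of one case: it starts green, any failed assertion turns
// it red, and nothing turns it green again.
class CaseResult {
  scenarioPassed = true;
  readonly assertions: Assertion[] = [];

  assert(assertion: Assertion): void {
    this.assertions.push(assertion);
    if (!assertion.passed) this.scenarioPassed = false;
  }
}

function runCase(assertions: Assertion[]): boolean {
  const result = new CaseResult();
  for (const a of assertions) result.assert(a);
  return result.scenarioPassed;
}

const pageLoaded: Assertion = { description: "page loaded", passed: true, evidence: "heading visible" };
const nameSaved: Assertion = { description: "name saved", passed: false, evidence: "header still shows old name" };

const allGreen = runCase([pageLoaded]);                      // true
const oneRed = runCase([pageLoaded, nameSaved, pageLoaded]); // false
```

The later passing assertion does not rescue `oneRed`, which is the difference between a strict verdict and a graded score.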

Constants you are working against

Four numbers pulled directly from the agent's source. No invented benchmarks, no vendor-supplied charts.

claude-haiku-4-5-20251001: default scenario LLM

500 ms: poll interval for wait_for_stable

2 s: default quiet window

10 s: cap for quiet window

20 scenarios per plan is routine when each case inherits session state

1 browser launched per assrt_test call, reused for every case

How Assrt compares to a classic Playwright E2E suite

Fidelity is identical, since both drive a real Chromium. The difference is everything around the browser: how plans are authored, how the setup cost amortizes, and how small you are willing to make a single scenario. The smaller you can make one scenario, the more the suite looks like integration tests, and the more the pyramid collapses.

Feature	Classic Playwright E2E suite	Assrt
Typical per-test setup cost	Paid again for every spec file / worker	Paid once for the whole plan
Authoring surface	JS/TS spec files + fixtures + storageState	Plain markdown #Case blocks, one file
Shared auth across scenarios	storageState JSON you maintain by hand	~/.assrt/browser-profile persists by default
Granularity encouraged	Mega-flows to amortize setup cost	Integration-sized: one #Case = one assertion bundle
What drives the browser	Hand-written CSS / data-testid selectors	LLM agent with snapshot + ref IDs (a11y tree)
Fidelity	Real Chromium (every case)	Real Chromium via @playwright/mcp (every case)
Flakiness mitigation	Per-spec waitForSelector / waitForLoadState	wait_for_stable MutationObserver, 500ms poll
Cost per scenario	Full browser spin-up × N, plus CI minutes	Cents of LLM tokens + shared browser time

When integration tests still beat Assrt

Pure-function unit tests are still the cheapest way to catch logic errors. If you are verifying a reducer, a pricing calculation, a date parser, there is no reason to launch a browser at all. Pytest or Vitest will run hundreds of those in milliseconds. Assrt does not replace that layer and should not be used for it.

Where Assrt flips the calculus is everything that used to fall into "integration tests that are really just shallow E2E tests." Any test where you spin up a real component, render it with real children, click it, and inspect the DOM was already in the blurry middle of the pyramid. Move that work into an Assrt #Case and you get higher fidelity for the same authoring effort.

Frequently asked questions

What is the textbook definition of integration testing vs E2E testing?

Integration tests verify that two or more units (functions, components, modules) work together against a real dependency like a database or an API, but stop short of driving a real browser. E2E tests drive a real browser through the full stack: DOM rendering, network, authentication, third-party scripts. Integration tests are fast and cheap per test; E2E tests are slow and flaky per test. The classic pyramid advice is to write many of the former and few of the latter.

Why did the integration-vs-E2E trade-off exist in the first place?

Because each E2E test traditionally paid the full startup cost: launch a browser, sign up or log in, navigate to the feature, drive the flow, tear down. When that setup cost is multiplied across dozens of tests the suite time explodes. The pyramid is not a moral law, it is an economic argument that only holds when every E2E test starts from a blank browser.

What does Assrt actually do differently?

By default (isolated: false in src/mcp/server.ts:354) Assrt persists the browser profile to ~/.assrt/browser-profile between scenarios. Each #Case in a plan runs in the same Playwright MCP browser instance, which means cookies, localStorage, and authenticated session state carry over. You can write one signup #Case followed by nineteen feature #Cases and the signup only happens once. The per-scenario setup cost that forced the pyramid in the first place does not exist.

Where in the code does this behavior live?

Three places. The scenario parser at src/core/agent.ts:621 uses the regex /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi and splits the plan text into independent scenarios. The runner at agent.ts:239-240 tells the agent explicitly: 'Scenarios run in the SAME browser session. Cookies, auth state carry over between scenarios.' The MCP tool at src/mcp/server.ts:354 defaults isolated to false, persisting the profile to ~/.assrt/browser-profile.

Is this really E2E testing, or is it a glorified integration test?

It is E2E. Assrt wraps the official @playwright/mcp server, which uses Playwright to drive a real Chromium. Every snapshot comes from the accessibility tree of a real page, every click goes through the CDP command queue, network requests hit the real dev server. The only thing that is different from a traditional Playwright script is that the test plan is plain markdown #Case blocks and an LLM agent decides which ref to click. The browser, the rendering, and the assertions are all real browser behavior.

Does isolation ever matter? When should I flip isolated to true?

When you deliberately want to test a blank-browser flow, like a first-time signup from zero state, or when you are checking that a feature works for a user who is not logged in and you do not want prior runs polluting the result. Pass isolated: true to assrt_test and the runner launches an in-memory profile that is discarded on exit. The flag is per-call, so you can mix logged-in feature runs with fresh-session smoke tests in the same suite.
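
The per-call choice can be pictured as a small decision function. This is a sketch under stated assumptions: only the `isolated` flag, the `scenarioId` parameter, and the profile path are taken from the text, while `AssrtTestArgs`, `Profile`, and `resolveProfile` are hypothetical names that do not describe the real tool's schema or internals.

```typescript
// Hypothetical argument shape. Only `isolated` and `scenarioId` are named in
// the text; every other field of the real tool is deliberately left out.
interface AssrtTestArgs {
  isolated?: boolean;
  scenarioId?: string;
}

type Profile =
  | { kind: "persistent"; dir: string }
  | { kind: "in-memory" };

function resolveProfile(args: AssrtTestArgs): Profile {
  // isolated is opt-in: anything other than an explicit true keeps the disk profile.
  if (args.isolated === true) return { kind: "in-memory" };
  return { kind: "persistent", dir: "~/.assrt/browser-profile" };
}

const featureRun = resolveProfile({});                // logged-in feature run
const smokeRun = resolveProfile({ isolated: true });  // fresh-session smoke test
```

Because the decision is made per call, the two runs above can sit side by side in the same suite without sharing state.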

What happens between scenarios when they share a browser?

Nothing happens. The agent finishes Case 1, calls complete_scenario, and the outer runner starts Case 2 against the same page the browser ended on. There is no reset, no navigate, no cookie clear. This is documented at agent.ts:489-492 where the finally block comments 'Don't close the browser here — keep it alive so the user can take over and interact after the test finishes.' You write your scenarios to compose: log in once in Case 1, assume logged-in state in Case 2 onwards.

How is this different from just using Playwright's storageState?

Playwright's storageState requires you to author a setup script, persist cookies to a JSON file, and reload them into each test context. It works, but it is a separate authoring surface you maintain by hand. Assrt inherits the shared session automatically because the scenarios run in a single Playwright process against a single browser context, not separate spec files in separate workers. You get the effect of storageState without writing storageState code.

Is the pyramid wrong now, or is it still useful at some level?

Unit tests are still the cheapest way to catch logic errors in a pure function. Nothing about Assrt changes that. What changes is the integration-vs-E2E middle of the pyramid: the reason integration tests were preferred over E2E was setup cost, and Assrt has made the setup cost a one-time payment across all scenarios in a plan. For UI and flow-level verification, E2E is now competitive with integration on speed and strictly better on fidelity. Keep unit tests. Rewrite integration tests that were really shallow E2E tests as real E2E scenarios.

What does it cost to run a 20-scenario plan compared to a typical E2E suite?

The Playwright browser launch is paid once. The dev-server navigate is paid once. The login flow (signup email, OTP) is paid once thanks to the shared profile. The remaining cost is per-scenario LLM tokens to drive the accessibility tree (default claude-haiku-4-5-20251001, a few cents per scenario) plus real browser action time. A twenty-case plan that would take twenty minutes as independent Playwright specs typically finishes in three to four minutes on Assrt because nineteen of the twenty scenarios start mid-flow, not at login.