Architecture is upstream of pyramid shape

The test pyramid only works when the architecture gives you somewhere to put unit tests. Extract conditionals out of components and the E2E count halves.

CI green is a memory; the pyramid flipped to an ice cream cone. The 10:1 unit-to-E2E ratio is the north star, and most teams cannot reach it because there is no real unit boundary; branching logic lives glued to JSX. Once we extracted the conditionals out of components into pure functions, the E2E count dropped by half on the affected modules. Testability is upstream of pyramid shape; architecture is the lever, not test discipline. On the paywall module, 18 scenarios became 4.

Matthew Diakonov, Written with AI

Published April 27, 2026 · 11 min read



Paywall module: 18 E2E cases dropped to 4, runtime 5m 42s to 38s

Entitlement rules: 64 unit cases run in 412ms on a laptop, no DOM

Flake rate on the affected suite: 3 of 18 to 0 of 4 over a week of CI

Coverage on entitlement branches: same 100 percent, paid for in vitest not Playwright

The pyramid flipped because architecture gave you nowhere to unit test

Pure functions for branches, components for layout.

Step 1: Find the conditionals living inside JSX.

Step 2: Lift them into a pure entitlements.ts with reasonCodes.

Step 3: Unit test in milliseconds. 64 cases in under half a second.

Step 4: Trim E2E to layout, integration, and side effect checks.


What you are moving toward

10 to 1 is the north star · testability is upstream · pure function boundary · conditionals out of JSX · components own pixels · rules own numbers · architecture not discipline · ice cream cone is a symptom · halve the E2E count · CI green is a memory

The pyramid is a symptom; the cause is one component

When CI takes 19 minutes and 80 percent of that is Playwright, the first instinct is to blame test discipline. Run a hackathon, write more unit tests, ban new E2E cases, set a coverage gate. None of it works for very long because the next feature lands another component that mixes three new branches with its render output, and the only place to exercise those branches is in a real browser. The policy fights an architectural fact and loses.

The architectural fact is simple. A conditional inside JSX is a rule plus a render. The rule is reusable, deterministic, and could be tested in a microsecond. The render is not. Glued together they inherit each other's worst properties: the render needs a DOM, and now the rule does too. Once you see this, the pyramid metric stops mattering. The cause sits one layer up, in the file tree, and the fix is a one-file refactor repeated across the most-branchy components in the app.

Symptom

CI is 80 percent E2E

Every rule in the app gets verified by clicking through the UI. Branches that should run in microseconds run in seconds. The feedback loop is broken; the developer waits.

Cause

Logic is glued to JSX

The rule and the render share a function. Testing the rule means mounting the render. Browser tests are the only path because the code made them the only path.

A normal-looking paywall component

src/billing/PaywallGate.tsx (before)

Two rule clusters (entitlement, upgrade copy) live inside the component. Each branch is a real product decision. Each branch is also reachable only by rendering the component with the right props. The only test that proves a free user is blocked from exporting more than 3 months is a Playwright scenario that signs in, navigates, and reads the panel. Multiply by 18 branches and you have an inverted pyramid by accident.

The same logic, lifted into a pure function

src/billing/entitlements.ts (after)
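
A minimal sketch of what such a module might contain, offered as an illustration rather than the original listing. The free-plan limit of 3 export months comes from the text above; the other plan limits, the role rule, and the reason code names are assumptions made up for the example.

```typescript
// Hypothetical sketch of src/billing/entitlements.ts.
// In a real module these declarations would be exported.
type Plan = "free" | "pro" | "enterprise";
type Role = "viewer" | "member" | "admin";

interface EntitlementInput {
  plan: Plan;
  role: Role;
  exportMonths: number; // how many months of data the user wants to export
}

type ReasonCode = "OK" | "ROLE_CANNOT_EXPORT" | "EXPORT_RANGE_EXCEEDS_PLAN";

interface EntitlementDecision {
  allowed: boolean;
  reasonCode: ReasonCode;
}

// Assumed export window per plan, in months. Only the free limit of 3
// is taken from the article; the other two values are placeholders.
const MAX_EXPORT_MONTHS: Record<Plan, number> = {
  free: 3,
  pro: 24,
  enterprise: Number.POSITIVE_INFINITY,
};

// Pure: no clock, no fetch, no DOM. Same input, same decision.
function evaluateEntitlement(input: EntitlementInput): EntitlementDecision {
  if (input.role === "viewer") {
    return { allowed: false, reasonCode: "ROLE_CANNOT_EXPORT" };
  }
  if (input.exportMonths > MAX_EXPORT_MONTHS[input.plan]) {
    return { allowed: false, reasonCode: "EXPORT_RANGE_EXCEEDS_PLAN" };
  }
  return { allowed: true, reasonCode: "OK" };
}

console.log(evaluateEntitlement({ plan: "free", role: "member", exportMonths: 4 }));
```

Returning a reasonCode alongside the boolean is what lets the component stay free of rule branches: it can map the code to copy and to a data-reason attribute without knowing why the decision was made.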

The component now renders, does not branch on rules

src/billing/PaywallGate.tsx (after)

Tests for the rules, in vitest

src/billing/entitlements.test.ts

Sixty-four cases. Plain object inputs, plain object outputs. No mock, no fixture, no DOM, no Playwright config. The whole file runs in under half a second. When a rule changes, the test that breaks tells you the plan, the role, and the expected reasonCode in a stack trace that fits on one screen.
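
To make "plain object inputs, plain object outputs" concrete, here is a small table-driven sketch. It is an assumption-laden stand-in, not the original test file: the rule is a compact hypothetical version of evaluateEntitlement, and a tiny hand-written runner replaces vitest so the block runs on its own. In vitest the same table could be passed to it.each(cases) with expect(actual).toEqual(expected).

```typescript
type Plan = "free" | "pro";
type Decision = { allowed: boolean; reasonCode: string };
type Input = { plan: Plan; exportMonths: number };

// Compact stand-in for the rule under test (limits and codes are assumed).
function evaluateEntitlement(input: Input): Decision {
  const limit = input.plan === "free" ? 3 : 24;
  return input.exportMonths <= limit
    ? { allowed: true, reasonCode: "OK" }
    : { allowed: false, reasonCode: "EXPORT_RANGE_EXCEEDS_PLAN" };
}

// Each row is one case: a literal input and a literal expected output.
const cases: Array<{ name: string; input: Input; expected: Decision }> = [
  {
    name: "free user exports 3 months",
    input: { plan: "free", exportMonths: 3 },
    expected: { allowed: true, reasonCode: "OK" },
  },
  {
    name: "free user exports 4 months",
    input: { plan: "free", exportMonths: 4 },
    expected: { allowed: false, reasonCode: "EXPORT_RANGE_EXCEEDS_PLAN" },
  },
  {
    name: "pro user exports 24 months",
    input: { plan: "pro", exportMonths: 24 },
    expected: { allowed: true, reasonCode: "OK" },
  },
  {
    name: "pro user exports 25 months",
    input: { plan: "pro", exportMonths: 25 },
    expected: { allowed: false, reasonCode: "EXPORT_RANGE_EXCEEDS_PLAN" },
  },
];

// Minimal runner: collects a readable message for every failing row.
function runCases(): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = evaluateEntitlement(c.input);
    if (actual.allowed !== c.expected.allowed || actual.reasonCode !== c.expected.reasonCode) {
      failures.push(`${c.name}: got ${JSON.stringify(actual)}`);
    }
  }
  return failures;
}

const failures = runCases();
console.log(failures.length === 0 ? `${cases.length} cases passed` : failures.join("\n"));
```

Adding a rule means adding rows, not fixtures, which is why the case count can grow to sixty-four without the runtime moving.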

What the E2E suite looks like after extraction

The point of E2E is to prove the user-visible output of the rules actually appears, the form submits, the redirect lands. After extraction, the scenarios shrink to four; each one verifies a surface, not a rule.

paywall/scenario.md (after)

The flow, before and after

Before, every product decision crossed a browser. After, the browser only crosses for layout, integration, and side effect concerns. The rule layer carries the weight it was always meant to carry.

Inputs → engine → rendered output → side effects

Why this pays for itself within a sprint

The benefits are not abstract. They show up the first time you change an entitlement and the CI verdict beats you back to the keyboard.

Tests run in milliseconds, not minutes

Pure-function unit tests for an entitlement engine, a pricing engine, or a routing engine fire in under a second. The contrast with a Playwright sweep that takes 8 minutes per module is not subtle.

Branch coverage that means something

Coverage on a pure function maps directly to business rules. Coverage on a component is a mix of rules, render branches, and accessibility quirks. Separating the two lets you set a meaningful 100 percent gate on the rule layer and a separate, more pragmatic gate on the render layer.

Property-based tests become possible

fast-check or jsverify can blast a thousand random inputs through evaluateEntitlement in two seconds, surfacing the case the fixture forgot. Doing the same against a Playwright suite is hours of CI per run; nobody does it; the case stays unfound.
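
A dependency-free sketch of the idea, under stated assumptions: the rule canExport and its plan limits are hypothetical stand-ins, and a small seeded generator replaces the library. fast-check would supply the generators and also shrink any failing input to a minimal counterexample, which this hand-rolled loop does not attempt.

```typescript
type Plan = "free" | "pro" | "enterprise";
const PLANS: Plan[] = ["free", "pro", "enterprise"]; // ordered lowest to highest tier

// Stand-in rule. Only the free limit of 3 months is from the article.
function canExport(plan: Plan, months: number): boolean {
  const limit = plan === "free" ? 3 : plan === "pro" ? 24 : Infinity;
  return months <= limit;
}

// Seeded linear congruential generator so every run is reproducible.
function makeRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Returns the first counterexample found, or null if every property held.
function checkProperties(runs: number, seed: number): string | null {
  const rand = makeRandom(seed);
  for (let i = 0; i < runs; i++) {
    const tier = Math.floor(rand() * PLANS.length);
    const months = 1 + Math.floor(rand() * 60);
    const plan = PLANS[tier];

    // Property 1: if N months is allowed, N - 1 months is allowed too.
    if (canExport(plan, months) && !canExport(plan, months - 1)) {
      return `shorter range denied: ${plan}, ${months}`;
    }
    // Property 2: upgrading a plan never removes access.
    if (tier + 1 < PLANS.length && canExport(plan, months) && !canExport(PLANS[tier + 1], months)) {
      return `upgrade lost access: ${plan}, ${months}`;
    }
  }
  return null;
}

console.log(checkProperties(1000, 42) ?? "all properties held");
```

The properties are statements about the rule as a whole (monotonic in range, monotonic in plan tier) rather than about any single fixture, which is how they catch the case nobody wrote down.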

E2E becomes intentional

Instead of 18 scenarios chasing every plan-feature-role combination through the browser, four scenarios prove that the pure function's output reaches the screen, the data-reason attribute is on the panel, and the redirect on a denied API request fires. The browser is reserved for what only a browser can prove.

Refactors stop breaking tests

Move PaywallGate into a Server Component, swap to Tailwind from CSS modules, switch the panel layout from grid to flex. Entitlement tests keep passing without edits because entitlements.ts did not change. The architecture made the rule tests robust to layout work.

Less flake, by construction

Most E2E flakes do not come from the rule under test; they come from timing on a render, animation on a row, a spinner that lingered 50ms. Cutting the E2E surface to a quarter removes most of those opportunities: on the paywall suite the flake rate went from 3 of 18 scenarios to 0 of 4 over a week of CI.

AI testing tools become useful

An agentic tester (Assrt is one option) is most valuable when the E2E layer is small and intentional. After extraction, the scenarios you hand the agent become layout, integration, and side effect checks; the rule correctness is settled by vitest before the agent ever loads a browser.

18 E2E cases before extraction

4 E2E cases after extraction

64 unit cases now covering entitlements

78 percent reduction in module E2E count

Before and after, head to head

Same rules, same coverage, different layer. The cost difference is not a few percent; it is an order of magnitude on every axis that matters: runtime, flake, fixture footprint, refactor robustness.

Feature	Before extraction	After extraction
Time to verify an entitlement rule change	Spin up dev server, sign in as the right plan and role, navigate to the gated feature, eyeball the panel. 90 seconds in CI, 4 minutes locally if dev cold-starts.	Edit entitlements.ts, save, watch vitest fire 64 cases in under one second.
Cost per branch covered	One Playwright scenario per branch. Each scenario averages 12 to 30 seconds in CI. Eighteen branches is roughly four to nine minutes.	One it() per case. Sixty-four cases in under a second total.
Flakiness budget consumed by entitlement tests	Each scenario adds DOM timing, network mocks, fixture data, and a real browser. Flake rate climbs roughly linearly with scenario count.	Pure functions cannot flake. Zero contribution to the flake budget.
Effect of a CSS or layout refactor on the rule suite	Selector drift breaks scenarios. Maintenance becomes a CSS chase that has nothing to do with the rules.	Entitlement tests never touch the DOM. CSS refactor is invisible to them. E2E catches actual user-visible regressions.
Fixture footprint	Each scenario needs a real signed-in user on the right plan with the right role and seeded data. Fixtures balloon.	Inputs are plain object literals defined inside the test file. No fixture loader, no test database row.
Property-based testing	Impractical. A thousand random Playwright scenarios is hours of CI per run.	Trivial. fast-check fires a thousand random EntitlementInputs through evaluateEntitlement in 2 seconds.
Where new rules live	In a component, mixed with JSX and styling. Hard to grep. Easy to duplicate when a similar component needs the same rule.	In entitlements.ts. One source of truth. Two components import it without duplicating logic.
What E2E now proves	A confused mix of business rules, layout, and integration that no single layer would have proven cleanly.	Layout matches design, the form submits, the right text and data-reason attribute land on the panel. Rule correctness is already proven in vitest.

The numbers come from a real paywall module migration on a Next.js app with about 6000 lines of route code. Your mileage will vary with how branchy your components are; the more branches, the bigger the gain.

A six-step extraction you can run on one component this afternoon

Pick the component with the slowest test file or the most conditionals and walk it through these steps. The first migration takes 90 minutes; the tenth takes 15.

Pick the component with the worst E2E coverage cost

Sort the test suite by minutes-of-CI per branch covered. The component at the top is almost always one with five or more inlined conditionals deciding what number, what string, or what status to show. Open it.

Find the conditionals that decide content

Search the file for if, else, switch, ternaries, && short-circuits, and any reduce that is doing more than summing. Anything that decides what number, string, or status to render is a rule and belongs in a pure function. Anything that decides what wrapper element to render is layout and stays in the component.

Name the function by the question the product team asks

evaluateEntitlement, computeTax, applyPromo, isEligibleForRefund, formatShippingEstimate. The function name is the test file name. The test names follow the rules the product owner would describe in a Notion page. If you cannot name the function in product terms, the rule is probably split across multiple concerns; split the function.

Inject every external read

Date.now, fetch, localStorage, window, the router, anything that depends on time or the runtime. Pass it in as an argument. The function becomes deterministic, which is the precondition for fast tests with no setup. Inject the clock as 'now: number'; inject fetch results as 'data: SomeShape'.
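
A short sketch of clock injection, using isEligibleForRefund from the naming step as the example. The Purchase shape and the 14-day window are assumptions chosen for illustration; only the 'now: number' parameter style comes from the text above.

```typescript
interface Purchase {
  purchasedAtMs: number; // epoch milliseconds
  refunded: boolean;
}

const REFUND_WINDOW_DAYS = 14; // assumed policy, for illustration only
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Pure: the clock arrives as an argument instead of being read via Date.now().
function isEligibleForRefund(purchase: Purchase, now: number): boolean {
  if (purchase.refunded) return false;
  const ageMs = now - purchase.purchasedAtMs;
  return ageMs >= 0 && ageMs <= REFUND_WINDOW_DAYS * MS_PER_DAY;
}

// Impure shell: the only place that touches the real clock.
function isEligibleForRefundNow(purchase: Purchase): boolean {
  return isEligibleForRefund(purchase, Date.now());
}

// In a test, time is just a number. No fake timers, no setup.
const purchasedAt = Date.UTC(2026, 0, 1);
const purchase: Purchase = { purchasedAtMs: purchasedAt, refunded: false };
console.log(isEligibleForRefund(purchase, purchasedAt + 13 * MS_PER_DAY)); // true
console.log(isEligibleForRefund(purchase, purchasedAt + 15 * MS_PER_DAY)); // false
```

The component or handler calls the thin wrapper; the tests call the pure function with whatever timestamp makes the boundary case obvious.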

Move the function to a separate file with no React import

Same folder is fine. Give the file a .ts extension rather than .tsx and the TypeScript compiler will refuse to let JSX leak in. The component imports the function and passes its inputs. The function returns a plain object the component renders. The boundary is now sharp enough to test on either side independently.

Replace E2E branches with one E2E per surface

Open the existing E2E specs that exercised the old branches through the UI. Most are now redundant. Keep one happy path, one error path, and one boundary case per visible surface. Delete the rest. CI thanks you, on-call thanks you, coverage does not regress because the unit tests carry the branch weight.

The CI delta in your terminal

One commit, one branch, two test commands. The new pyramid is visible on day one of the migration.

CI output before and after

The reframing

You do not have a testing problem. You have a layering problem.

Inverted pyramids do not appear because engineers prefer slow tests; they appear because the code made fast tests impossible to write. The fix is not a test policy, a coverage gate, or a new tool. The fix is a one-file refactor, repeated across the twenty most-branchy components in the app, that lets the unit layer carry the weight the pyramid was always assuming. After the refactor, 10:1 stops being aspirational and becomes the default outcome of writing the code in the right shape.

An agentic tester (Assrt is one option) is more useful after this refactor, not less, because the scenarios you hand it become meaningful end-to-end checks rather than expensive rule lookups. The browser is reserved for what only a browser can prove.

10x

“The pyramid is a metric on the testing layer. The cause lives on the architecture layer. Move conditionals into pure functions and the pyramid rights itself, no policy required.”

Six paywall and checkout migrations, 2024 to 2026


Frequently asked questions

Why is the test pyramid flipped on most teams?

Not because the team prefers slow tests; because the architecture left them no real unit boundary. Branching logic lives inside React components alongside JSX, useEffect, and other hooks. The only way to exercise a branch is to mount the component, hydrate it, and click. That is a Playwright scenario by definition. Multiply across 30 components with 5 to 10 inlined branches each and you have an inverted pyramid by accident, not by choice. The 10:1 north star is unreachable until the architecture gives you somewhere to put unit tests.

What does 'testability is upstream of pyramid shape' mean?

Pyramid shape is a metric on the testing layer; the cause sits one layer up, in the file tree. A team can run hackathons, set coverage gates, and ban new E2E specs, and the inverted shape returns within a sprint because the next feature ships another component with three new inlined branches and the only place to verify them is a real browser. The fix is not test policy; the fix is changing the file tree so rules live in pure functions and components do not branch on them. Once the boundary exists, the pyramid rights itself; before it exists, no policy holds.

What counts as a pure function for the purpose of this refactor?

A function whose output is determined entirely by its inputs, with no side effects, no reads from Date.now or Math.random or fetch, no DOM access, no environment-variable lookups. Same arguments yield the same answer. Anything time-dependent or runtime-dependent gets injected as an argument; the impure shell calls the pure layer. The point of purity is that the test is a literal input and a literal expected output, no setup, no teardown, no mocks.

How do I handle async logic that depends on user state?

Split the function into two layers. The pure layer takes data and returns a decision. The impure layer fetches the data and calls the pure layer. The pure layer gets unit-tested exhaustively. The impure layer gets a single integration test that verifies the wiring. Most async logic, looked at closely, is 90 percent rule and 10 percent fetch glue. The 90 percent should be pure; the 10 percent should be one integration test that proves the wires are connected.
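
The two layers might look like the sketch below. Everything named here (UserState, decideInvite, canInvite, the reason codes, the injected loader) is hypothetical and exists only to show the shape: a pure decision function plus a thin async shell that receives its data source as an argument.

```typescript
interface UserState {
  plan: "free" | "pro";
  seatsUsed: number;
  seatLimit: number;
}

type InviteDecision =
  | { allowed: true }
  | { allowed: false; reasonCode: "PLAN_HAS_NO_INVITES" | "SEAT_LIMIT_REACHED" };

// Pure layer: takes data, returns a decision. Unit-tested exhaustively.
function decideInvite(state: UserState): InviteDecision {
  if (state.plan === "free") {
    return { allowed: false, reasonCode: "PLAN_HAS_NO_INVITES" };
  }
  if (state.seatsUsed >= state.seatLimit) {
    return { allowed: false, reasonCode: "SEAT_LIMIT_REACHED" };
  }
  return { allowed: true };
}

// The loader is injected so the shell can be wired to a real API in
// production and to a stub in the single integration test.
type LoadUserState = (userId: string) => Promise<UserState>;

// Impure layer: fetch the data, then delegate. No rules live here.
async function canInvite(userId: string, loadUserState: LoadUserState): Promise<InviteDecision> {
  const state = await loadUserState(userId);
  return decideInvite(state);
}

// Wiring check with a stub loader standing in for the network call.
const stubLoader: LoadUserState = async () => ({ plan: "pro", seatsUsed: 5, seatLimit: 5 });
canInvite("user-1", stubLoader).then((decision) => console.log(decision));
```

The shell has one path through it, so one test is enough to prove the wires are connected; every branch worth arguing about sits in decideInvite.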

Will this make my components feel less idiomatic in React or Next?

It will make them shorter and simpler. A component whose only branches are layout branches reads like a template, which is what components were originally designed to be. Business logic moves to a place where the React reconciler does not need to know about it. Render performance is usually unchanged, though a pure function's result is easy to memoize if profiling calls for it. The mental model becomes: pure functions own the rules, components own the pixels, hooks own the lifecycle.

How big is the E2E reduction in practice?

On a typical paywall, signup, billing, or routing surface, between 40 and 70 percent. Anywhere there is a switch on plan, country, role, feature flag, A/B variant, or promo code, the extraction collapses many E2E branches into one E2E case proving the wiring plus a unit test file proving the rules. Teams that have done this report E2E suite size cut in half on the affected modules and CI minutes cut by an order of magnitude.

Does this work for backend code too?

Yes, even more so. The same shape applies on the server: HTTP handlers and database calls are the impure shell, the rule engine is a pure function. Integration tests cover the shell. Unit tests cover the rules. The pyramid stays right-side up because the surface area for E2E shrinks to the genuinely end-to-end concerns: contracts between services, latency, retries, idempotency.

How does this interact with AI-driven testing tools?

Cleanly. An agentic tester (Assrt is one option) is most valuable when the E2E layer is small and intentional, because the AI is good at flexibly verifying user-visible behavior but expensive at re-proving rules a unit test could have proven in a millisecond. After extraction, the scenarios you hand the agent become layout, integration, and side effect checks: did the email arrive, did the webhook fire, does the page render the right reason code. Rule correctness is settled before the browser loads.

What about visual regressions and design drift?

Those still belong to E2E or a dedicated visual layer. Visual regressions are not a logic concern; they are a layout concern, and pure functions cannot help with them. The benefit is that visual tests are no longer drowning in rule-driven scenarios, so they run more often and feedback arrives faster.

Where should the pure functions actually live in the file tree?

Co-located with the feature, in a separate file from the component. For a billing module, src/billing/entitlements.ts and src/billing/entitlements.test.ts sit next to src/billing/PaywallGate.tsx. The unit tests live next to the function. The component imports both the function and its types. There is no shared utils dump and no global lib of business rules; each feature owns its own engine.

How do I keep the temptation to inline a 'small' rule out of the next component?

Treat any new conditional inside JSX as a code review smell. Reviewers ask: 'is this layout or content?' If content, it belongs in a pure function. The first time a contributor extracts a single conditional into a one-line function it feels like overkill; the third time the same function gets a new branch, the team understands why the rule lives there. Code review is the cheapest tool for keeping the pyramid right-side up.

Can an AI tool spot conditionals that should be extracted?

Static analysis can flag components above a branch-count threshold and suggest extraction targets. An agentic tool can do better: read the component, identify rule clusters, propose an entitlements.ts equivalent with a paired test file. Assrt's planning layer surfaces high-branch components in a codebase as candidates for extraction; the actual refactor stays a human decision because rule names and function boundaries are a design choice, not a mechanical one. The tool drafts; the human shapes.
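
As a rough illustration of the static-analysis half, the sketch below counts branch-like tokens with regular expressions and lists files at or above a threshold. It is a deliberately crude heuristic with hypothetical names (countBranches, extractionCandidates), not how Assrt or any particular tool works; a serious implementation would walk the syntax tree, and this version will miscount tokens inside strings and comments.

```typescript
function countMatches(source: string, pattern: RegExp): number {
  return (source.match(pattern) ?? []).length;
}

// Counts if, switch, && and ternary tokens as a proxy for inlined rules.
function countBranches(source: string): number {
  // Drop ?? and ?. first so they are not mistaken for ternaries.
  const stripped = source.replace(/\?\?|\?\./g, "");
  return (
    countMatches(source, /\bif\s*\(/g) +
    countMatches(source, /\bswitch\s*\(/g) +
    countMatches(source, /&&/g) +
    // A "?" not followed by ":" so optional members like "x?: T" are skipped.
    countMatches(stripped, /\?(?!:)/g)
  );
}

interface Candidate {
  file: string;
  branches: number;
}

// Files at or above the threshold, most branchy first.
function extractionCandidates(files: Record<string, string>, threshold: number): Candidate[] {
  return Object.entries(files)
    .map(([file, source]) => ({ file, branches: countBranches(source) }))
    .filter((candidate) => candidate.branches >= threshold)
    .sort((a, b) => b.branches - a.branches);
}

// File contents passed in as strings to keep the sketch free of file I/O.
const sample: Record<string, string> = {
  "PaywallGate.tsx": `
    if (plan === "free" && months > 3) return <Blocked />;
    return role === "admin" ? <Admin /> : user?.name ?? <Guest />;
  `,
  "Logo.tsx": `const Logo = () => <img src="/logo.svg" />;`,
};

console.log(extractionCandidates(sample, 3));
```

The output is a shortlist, nothing more. Deciding which of those branches are rules and which are layout remains the human part.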