For the "passes in staging, fails in prod" thread

Cross environment test regression is a string substitution problem, not a platform problem

The usual fix for "passes in staging, fails in prod" is to fork the test. A second file with the production URL, a second test account, a second expected total. In a month the two copies have drifted and you do not know which one is lying. The mechanic that actually solves this is embarrassingly small: one regex that swaps {{KEY}} placeholders for real values before the agent reads the plan. Assrt ships exactly that mechanic, six lines of it, and because the plan stays plain Markdown and the runner stays real Playwright, the solution does not become a new vendor format you then have to migrate off.

Assrt Engineering · 10 min read · 4.9 from Assrt MCP users
  • One regex drives cross-env substitution: agent.ts:379
  • `variables` param is typed `z.record(z.string(), z.string())` at server.ts:344
  • Three viewport presets built in: 1600x900, 1440x900, 375x812
  • Plan stays plain Markdown; zero vendor lock-in

What the top results for this keyword leave out

Search the phrase and you get two genres of article. One is the QA-lead essay: "cross-environment testing is hard because environments drift, here are eight management-level strategies." True, architectural, zero code. The other is the vendor blog: "our platform has environment profiles, book a demo." True for them, not cheap. What none of the top results show is the concrete mechanic that lets you write one test and run it against many environments without forking. The rest of this page is that mechanic, pulled from the actual source.

The wrong instinct
Fork the test per env

Works for a week. Drifts in a month. Now you have two tests that disagree and no single source of truth.

The right instinct
One plan, many variable maps

The test file never changes between envs. Only the map changes. Drift becomes impossible by construction.

The whole mechanic, six lines

Most vendor tools that claim cross-environment support hide the mechanic in a closed binary. Assrt's version lives in six lines of agent.ts in /Users/matthewdi/assrt-mcp. Before the browser launches, before any Playwright call, the agent walks the plan text and swaps every {{KEY}} with its mapped value. That is it. That is the whole feature.

assrt-mcp/src/core/agent.ts
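A minimal sketch of that substitution pass, reconstructed from the one-line replace call quoted later on this page (agent.ts:379). The function name `interpolateVariables` is illustrative, not the actual export:

```typescript
// Sketch only: the {{KEY}} substitution pass described above. The real
// implementation lives at agent.ts:376-381; this function name is illustrative.
function interpolateVariables(
  scenariosText: string,
  variables: Record<string, string>
): string {
  let result = scenariosText;
  for (const [key, value] of Object.entries(variables)) {
    // Replace every occurrence of {{KEY}} with its mapped value.
    result = result.replace(new RegExp(`\\{\\{${key}\\}\\}`, "g"), value);
  }
  return result;
}
```

Run it against a staging map and then a prod map and the same plan text resolves to two different concrete runs; the file on disk never changes.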

The matching surface on the MCP tool is one line of Zod. The description matters because it is also what the agent reads, so the model knows the map is how you parameterize the run.

assrt-mcp/src/mcp/server.ts

The cross-environment feature is a for-loop around a regex. If a competitor tells you cross-env testing requires a platform, they are upselling a for-loop.

/Users/matthewdi/assrt-mcp/src/core/agent.ts, lines 376-381

How one Markdown plan reaches three environments

On the left are the things that are identical across environments: the plan file, the Playwright code the agent emits, the scenarioId. In the middle is the runner, which consumes those plus a per-env variables map. On the right is the actual browser run, against whatever host you pointed {{BASE_URL}} at.

Same scenario, different variables, many environments

[Diagram: scenario.md, the scenarioId, and a per-env variables map feed the assrt_test runner; the same browser run then targets Local (localhost), Staging, Production, or a Preview deploy depending on the map.]

Forked-per-env vs parameterized, same user story

Left: the checkout test as two files, one per environment. This is what most repos have today. It is readable for a week. Right: the checkout scenario as a single scenario.md invoked with two different variables maps. Same user story, one source of truth, drift is no longer possible.

The same user story, expressed once instead of twice

// The way almost every team does cross-env tests today.
// One recorded test per environment. They drift in a month.

// tests/checkout.staging.spec.ts
import { test, expect } from "@playwright/test";

test("checkout flow", async ({ page }) => {
  await page.goto("https://staging.example.com/cart");
  await page.fill('[name="email"]', "qa+staging@example.com");
  await page.click('button:has-text("Checkout")');
  await expect(page.getByText("Total: $42.99")).toBeVisible();
});

// tests/checkout.prod.spec.ts   <-- copy-paste, edit a handful of strings
import { test, expect } from "@playwright/test";

test("checkout flow", async ({ page }) => {
  await page.goto("https://example.com/cart");
  await page.fill('[name="email"]', "qa+readonly@example.com");
  await page.click('button:has-text("Checkout")');
  await expect(page.getByText("Total: $39.99")).toBeVisible();
});
// Now you have two sources of truth. One will drift.
// When prod fails, you diff... what? Two slightly different
// tests. Good luck.
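The single-plan version of the same story might look like this. The exact #Case conventions below are a sketch, not copied from a real Assrt plan; the variable names match the ones used throughout this page:

```markdown
# Case: checkout flow

1. Navigate to {{BASE_URL}}/cart
2. Fill the email field with {{USER_EMAIL}}
3. Click the "Checkout" button
4. Verify the text "Total: {{EXPECTED_TOTAL}}" is visible
```

Invoke it once with the staging map (`BASE_URL: https://staging.example.com`, `USER_EMAIL: qa+staging@example.com`, `EXPECTED_TOTAL: $42.99`) and once with the prod values. The plan text itself never forks.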

The six kinds of cross-environment drift, and the variable that kills each one

Every cross-environment regression I have debugged in the last two years falls into one of these buckets. The playbook is the same for each: give the drift a name, turn that name into a {{KEY}}, move the literal into the variables map. The scenario file goes back to being env-agnostic.

URL and routing drift

Different base URLs, different redirect chains, canonicalization that strips a trailing slash on prod and keeps it on staging. Fix: parameterize {{BASE_URL}}, verify every cross-env click target is a relative path in the plan, let the interpolation handle the host.

Auth and account drift

Staging has magic login links. Prod demands SSO. Test account exists in staging, has been deleted in prod. Fix: {{USER_EMAIL}}, {{USER_PASSWORD}}, {{SSO_TENANT}} as first-class variables, and a read-only prod account that is never deleted.

Feature flag drift

`checkout_v2` is on in staging, off in prod. A test that asserts on the v2 total fails in prod for the right reason. Fix: {{FEATURE_FLAG_CHECKOUT_V2}} read by the plan prose so the agent branches on the expected state.

Data and fixture drift

SKU PROD-001 exists in staging seed data, never seeded in prod. Price differs by currency gateway. Fix: {{SKU}}, {{EXPECTED_TOTAL}}, {{CURRENCY}}. Plan asserts on the variable, not a hardcoded string.

Viewport drift

Passes at desktop 1440x900, fails at mobile 375x812 because the CTA wraps below the fold. Run the same scenario with {viewport: 'mobile'} and {viewport: 'desktop'} to catch it.

Third-party drift

Staging hits a Stripe test key; prod hits live. CSPs differ. Analytics snippets differ. Fix: {{STRIPE_KEY}}, {{ANALYTICS_ID}} surfaced as variables so the assertions stay env-agnostic.


What a cross-env regression looks like on the wire

Two runs of the same scenario.md, thirty seconds apart. Same steps. Same assertions. Only the variables map differs. Staging is green. Prod surfaces a real pricing drift that the test caught because the expected total came from the map, not from a hardcoded string.

npx assrt run — staging
npx assrt run — production

From here, the failing run goes to assrt_diagnose (prompt at server.ts:240-268). The verdict is almost always one of three buckets; for cross-env failures it is overwhelmingly "environment issue." The Analysis section names the concrete drifted variable. You add it to the map. The scenario is portable again.

6
Lines of code that implement cross-env (agent.ts:376-381)
3
Viewport presets (1600x900, 1440x900, 375x812)
1
Plan file, regardless of how many environments you run against
$0
Signup cost; Assrt is open-source, tokens only

Adopt the pattern in a single afternoon

You do not need to rewrite your suite. Start with one scenario, the one that already hurts. Unfork it. Wire two variable maps. Run it twice in CI. When the first prod regression that would have shipped gets caught by the read-only prod run, the rest of the team will ask you to do the same for their scenarios.

Fork → parameterize → dual-run → diagnose

  1. Pick the one scenario that hurts most

     Start with the scenario you already forked per env. Usually it's checkout, signup, or an auth flow. Merge the two copies back into one Markdown #Case; every string that differs between envs becomes a {{KEY}}.

  2. Define the variables map once

     Create two JSON files (or two CI secrets groups): staging.vars.json and prod.vars.json. Same keys, different values. Missing keys fail loudly at the preflight probe; that's the contract.

  3. Wire it into CI as two invocations

     Run the same plan against staging first. On green, re-invoke the same plan against prod with the prod map. Upload both /tmp/assrt/<runId>/ dirs as CI artifacts so you can diff events.json if prod regresses.

  4. When prod fails, run assrt_diagnose

     Hand the failing scenario, variables, and stderr to the diagnose tool. The Root Cause section will name the drifted variable (URL, flag, fixture). Add the variable to the map and the scenario is portable again.
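The CI wiring in steps 2 and 3 amounts to a loop over environments. This is a sketch; `assrt_run` in the commented line is a hypothetical stand-in for however you invoke the runner, since the real CLI flags are not shown on this page:

```shell
#!/usr/bin/env sh
# Dual-run CI sketch: same plan, two variables files, staging before prod.
set -eu

PLAN="scenarios/checkout.md"

for ENV in staging prod; do
  VARS="config/${ENV}.vars.json"
  echo "running ${PLAN} against ${ENV} with ${VARS}"
  # assrt_run "$PLAN" --variables "$VARS"   # hypothetical invocation
done
```

Because the loop exits on the first failure (`set -e`), a red staging run blocks the prod invocation, which is the ordering the article recommends.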

Cross-env readiness checklist

A scenario is cross-environment ready when each of these is true. If your current suite fails any of them, that is the work.

Eight properties of a cross-env-ready scenario

  • Every cross-env-sensitive string in the plan is a {{KEY}}, not a literal
  • staging.vars.json and prod.vars.json have identical keys
  • Your CI runs the scenario against staging first, then against prod
  • Both runs upload /tmp/assrt/<runId>/ as a CI artifact
  • Responsive-sensitive scenarios run with viewport 'mobile' and 'desktop'
  • A prod failure is routed through assrt_diagnose before anyone hand-patches
  • No scenario in your repo has a hardcoded hostname, email, or total
  • Read-only prod account is separate from staging account; both are in the vars map

Assrt vs vendor QA, for the cross-environment case specifically

This comparison is not about which tool records better. It is about what you own after a cross-environment regression bites and you need to investigate, fix, and move on. Vendor tools keep the env profile; Assrt keeps the plan and map in your repo.

| Feature | Closed vendor QA | Assrt |
|---|---|---|
| Where test steps live | Proprietary JSON in a vendor DB, viewed through a web dashboard | Plain Markdown #Case in /tmp/assrt/scenario.md, committable to any repo |
| Environment profiles | Dashboard-owned env objects; invisible to git diff | A plain JS/JSON variables map; visible in CI logs and the run record |
| How one test runs against two envs | Duplicate the suite per env or clone within the dashboard | Same scenarioId, swap the variables map, re-invoke |
| Under-the-hood runner | Vendor-specific replay engine | Real Playwright via @playwright/mcp, same Chromium your CI already uses |
| Artifact you can diff between envs | Vendor report export, if you pay for the plan | /tmp/assrt/<runId>/events.json, plus video, plus screenshots, always |
| Signup and vendor cost | Around $7,500/month for the named enterprise tools | $0; open-source, self-hosted, LLM tokens only |
| What you keep if you churn | A pile of JSON you cannot run | A Markdown plan plus a variables map plus real Playwright code |
| Adopting a new env (e.g. preview deployments) | Create a new dashboard profile, reconfigure per-seat | Pass a new {{BASE_URL}} on the next invocation; that's it |

Bring one forked-per-env scenario, leave with a single plan

Thirty minutes. We take one of your cross-env tests, merge the forks into a single scenario.md, wire two variables maps, and run it against both envs live. You leave with a working pattern you can apply to the rest of the suite.

Book a call

FAQ on cross environment test regression

What is 'cross environment test regression' actually about?

It is the recurring failure mode where the same end-to-end test behaves differently on staging than on production, or passes on your laptop and dies in CI. The symptom is usually a green suite that ships broken code, or a red suite that blocks a correct deploy. The root cause is almost never the test logic. It is a drift between environments: a feature flag that is on in staging and off in prod, a base URL that canonicalizes differently, seed data that exists in one and not the other, an OAuth client that only trusts the staging callback, a stricter CSP in prod that blocks a script the test assumes is there. Most teams solve this by forking the test, which works for a week and then diverges. Assrt solves it by keeping one #Case file and injecting the differences at run time.

How does Assrt interpolate variables across environments?

The MCP tool `assrt_test` accepts a `variables` parameter typed `z.record(z.string(), z.string())` at /Users/matthewdi/assrt-mcp/src/mcp/server.ts line 344. Before the browser launches, the agent walks the plan text and runs a regex substitution one line long: `scenariosText.replace(new RegExp(\`\\{\\{${key}\\}\\}\`, 'g'), value)` in /Users/matthewdi/assrt-mcp/src/core/agent.ts line 379. So `{{BASE_URL}}` becomes `https://staging.example.com` on the staging run and `https://example.com` on the prod run, and the agent also receives the resolved map in context so it can reason about what changed. The plan file on disk is untouched.

Why a regex substitution and not a proprietary DSL with typed envs?

Because a regex substitution is a forcing function for simplicity. It means the plan is still plain Markdown, the variables are visible to the human reading the file, and nothing about the mechanic is vendor-specific. You can copy the plan into a Google Doc, into a GitHub issue, into a hand-rolled script. Contrast that with vendor tools that encode env differences in a binary blob behind a web dashboard. When the vendor raises prices or you churn, those encoded differences leave with the vendor. With Assrt, the plan is yours. The variables are yours. The Playwright code the agent emits is yours.

What are the default viewport presets and why do they matter for cross-env regression?

Assrt's agent ships three viewports at /Users/matthewdi/assrt-mcp/src/core/agent.ts lines 401-404: headless default 1600x900, preset `mobile` 375x812, preset `desktop` 1440x900. Responsive regressions are a second axis of cross-environment drift. A button that renders below the fold at 1440 may render above the fold at 1600 because an ad slot collapses differently. The same `variables` mechanic drives cross-env parameterization, but you pair it with `viewport: 'mobile'` on one invocation and `viewport: 'desktop'` on another to catch the layout regressions that a single-viewport test misses.

How does Assrt decide whether a cross-env failure is an app bug, a flawed test, or an environment issue?

There is a sibling tool, `assrt_diagnose`, whose system prompt at /Users/matthewdi/assrt-mcp/src/mcp/server.ts lines 240-268 forces exactly that fork. It demands a Root Cause section that picks one of three buckets (app bug, flawed test, environment issue), a three-to-five-sentence Analysis citing evidence, a Recommended Fix, and a drop-in Corrected Test Scenario in the #Case format. For cross-environment regressions, the verdict is usually 'environment issue' and the Analysis names the concrete drifted variable. You can then add a new `{{KEY}}` to the plan and the test is portable again.

Where do the test artifacts live after a cross-environment run?

The layout is defined at /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts lines 16-20: the plan text at /tmp/assrt/scenario.md, the scenario metadata at /tmp/assrt/scenario.json, and the latest results at /tmp/assrt/results/latest.json. Per-run artifacts (events, screenshots, WebM video) land under /tmp/assrt/<runId>/. When a staging run passes and a prod run fails with the same scenario and the same variables schema, you can diff /tmp/assrt/results/latest.json against a saved staging run and see which step changed verdict. That is a real cross-env regression signal, not a vibe.

How does this differ from Testim, Mabl, or other vendor QA tools?

Vendor tools typically store test steps as a proprietary JSON-in-database format, tie env differences to a dashboard-owned profile, and gate re-use behind per-seat pricing that starts around seven thousand five hundred dollars a month. Assrt is open-source and self-hosted. The plan is plain Markdown in your repo. The variables are a plain JavaScript object. The runner is real Playwright under the hood and emits real Playwright code you can commit. There is no vendor format to migrate off. Your cost is the Anthropic tokens the agent consumes, usually pennies per scenario, plus whatever it costs to run the Chromium your CI already runs.

Can I parameterize authentication and feature flags, not just URLs?

Yes. The variables map is untyped strings so anything you can express as text is fair game. Typical keys: `BASE_URL` for the target environment, `USER_EMAIL` and `USER_PASSWORD` for env-specific test accounts, `API_KEY` for a staging versus prod key pair, `FEATURE_FLAG_CHECKOUT_V2` to assert the flag state the test expects, `SKU` or `PRICE_ID` for products that exist in one env and not the other. The agent also sees the variables, so you can phrase the plan as `navigate to {{BASE_URL}}/checkout and verify the total matches {{EXPECTED_TOTAL}}` and the verification reads from the map.
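Concretely, a pair of maps covering auth, flags, and fixtures might be stored as two files with identical keys. File names and values here are illustrative, reusing the examples from earlier on this page:

staging.vars.json:

```json
{
  "BASE_URL": "https://staging.example.com",
  "USER_EMAIL": "qa+staging@example.com",
  "FEATURE_FLAG_CHECKOUT_V2": "on",
  "EXPECTED_TOTAL": "$42.99"
}
```

prod.vars.json:

```json
{
  "BASE_URL": "https://example.com",
  "USER_EMAIL": "qa+readonly@example.com",
  "FEATURE_FLAG_CHECKOUT_V2": "off",
  "EXPECTED_TOTAL": "$39.99"
}
```

Identical keys are the contract: a key present in one file and missing in the other is exactly the drift the preflight failure is designed to surface.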

What happens to the scenario if I forget to pass a variable that the plan references?

The literal string `{{BASE_URL}}` survives into the plan the agent sees, which means the agent will try to navigate to the literal URL `{{BASE_URL}}/checkout` and Playwright will fail the preflight probe at /Users/matthewdi/assrt-mcp/src/core/agent.ts line 390 with an actionable error. That failure is loud and early (before the browser even launches), which is the right behavior: you do not want a cross-env run that silently falls back to a default. If a key is optional, encode the optionality in the plan as prose ('if {{FEATURE_FLAG_X}} is enabled, also verify the banner appears') so the agent can branch explicitly.
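If you want that same loud failure even earlier, before the runner is invoked at all, a preflight check in CI is a few lines. This helper is illustrative, not part of Assrt:

```typescript
// Illustrative preflight (not Assrt source): list any {{KEY}} tokens that
// survived interpolation so the pipeline can fail before a browser launches.
function findUnresolvedPlaceholders(plan: string): string[] {
  return [...plan.matchAll(/\{\{([A-Za-z0-9_]+)\}\}/g)].map((m) => m[1]);
}
```

Run it on the interpolated plan text; a non-empty result means the variables map is missing a key the plan references.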

How do I run the same scenario against staging and prod in CI?

Two invocations of `assrt_test` (or two runs of the `npx assrt` CLI) with the same `plan` text and different `variables` maps. A typical CI job takes the checked-in /scenarios/checkout.md, passes it to staging first with the staging `BASE_URL` and test account, and on green, runs the identical text against prod with the prod `BASE_URL` and a read-only prod account. Each run drops a video under /tmp/assrt/<runId>/, which CI can upload as an artifact. Because the scenario text is identical, the diff between the staging events.json and the prod events.json is the regression signal you actually want.
