How to do multi-browser support across Chromium, Firefox, and WebKit with one plan

Every other guide ends at projects: [{ name: "firefox" }] in playwright.config.ts. This one starts where that ends. The plan is a single Markdown file, the loop is four lines of bash, and the diff is five lines of jq. Three engines, three JSON reports, zero locator strings left to drift.

Matthew Diakonov · 9 min read

The advice everyone gives, and where it stops

Ask around and the answer is the same three lines: add a projects array to playwright.config.ts, name the projects chromium, firefox, and webkit, and run npx playwright test. That is real advice, and it is half an answer. The other half is: what do you do with the three outputs? The reason that half is always missing is that classic Playwright tests contain locator strings, and locator strings are the single largest source of cross-browser noise. So people write three-engine configs, wade through flake, and quietly go back to Chromium.

The rest of this page is the workflow that only starts making sense once the locator strings are gone.

Input artifact: what you keep in git

// playwright.config.ts — what every other guide ends with
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  projects: [
    { name: "chromium", use: devices["Desktop Chrome"] },
    { name: "firefox",  use: devices["Desktop Firefox"] },
    { name: "webkit",   use: devices["Desktop Safari"] },
  ],
});

// Then, in every spec you ever write again:
await page.locator('[data-testid="submit"]').click();
await page.getByRole("button", { name: /create/i }).click();
await page.locator('.signup-form >> text=Continue').click();

// These strings resolve differently per engine.
// Your three-engine CI looks green-red-green for selector reasons
// before any real engine bug has even had a chance to show up.

The shape of a three-engine sweep

One input. Three engines. Three outputs. The plan file is the cross-browser contract; the three JSONs are its receipts.

Scenario.md goes in. Three engine reports come out.

/tmp/assrt/scenario.md
  → assrt run
    → results/chromium.json
    → results/firefox.json
    → results/webkit.json

The loop

Whatever your plan file is called locally, copy it to the path the runner watches, then iterate over three engine names. That is the whole job.

three-engine-sweep.sh
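A minimal sketch of that script. The assrt invocation is the one quoted elsewhere on this page; here the command is echoed rather than executed, so the sketch runs even where the assrt CLI is not installed. Drop the `echo` to go live.

```shell
#!/usr/bin/env sh
# three-engine-sweep.sh (sketch): one plan in, three engine reports out.
# DRY RUN: the assrt command is echoed, not executed; remove `echo` for real runs.
PLAN=/tmp/assrt/scenario.md
OUT=/tmp/assrt/results
mkdir -p "$OUT"

for engine in chromium firefox webkit; do
  echo assrt run --url http://localhost:3000 \
       --plan-file "$PLAN" --browser "$engine" --json "> $OUT/$engine.json"
done
```

The loop body never changes per engine; only the `--browser` flag and the output filename do.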

The anchor fact: where the three JSONs come from

The three-engine workflow only works because the runner commits to one predictable output path for every run. That path is hard-coded in /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts. At lines 77 through 84, writeResultsFile(runId, results) writes the full TestReport to two places at once:

  • /tmp/assrt/results/latest.json, overwritten every run
  • /tmp/assrt/results/<runId>.json, keyed by a fresh UUID

The TestReport shape (defined in types.ts lines 19-35) wraps an array of ScenarioResult. Each entry has a stable name and a boolean passed. Because the plan is the same file across the three runs, the three scenarios arrays align element-for-element. That alignment is what makes a diff possible at all.
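A sketch of that shape in TypeScript, reconstructed from the field names this article lists (name, passed, steps, assertions, summary, duration); the authoritative definitions are ScenarioResult and TestReport in assrt-mcp/src/core/types.ts.

```typescript
// Reconstructed sketch, not the verbatim types.ts source.
interface ScenarioResult {
  name: string;                        // comes from the plan file, so it is stable across engines
  passed: boolean;
  steps: { error?: string }[];         // plain-English failure text per step
  assertions: { evidence?: string }[]; // plain-English evidence per assertion
  summary: string;
  duration: number;
}

interface TestReport {
  scenarios: ScenarioResult[];         // same order in every engine's file
}
```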


Three result JSONs that align scenario-for-scenario mean a diff is a join, not a comparison.


The diff, in five lines of jq

Extract [.name, .passed] from each of the three reports into a TSV, paste them side by side, and awk for the rows where any engine did not agree with the others. That output is your real cross-browser signal.

diff-three-engines.sh
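A self-contained sketch of that diff. The jq/paste/awk pipeline is the one this page describes; the three mock report files (with a simulated WebKit-only failure) stand in for real assrt output so the sketch runs anywhere, and their scenario names are illustrative.

```shell
#!/usr/bin/env sh
# diff-three-engines.sh (sketch). Mock reports simulate a WebKit-only failure;
# point the pipeline at your real /tmp/assrt/results/<engine>.json instead.
mkdir -p /tmp/assrt/results
for engine in chromium firefox webkit; do
  pass=true; [ "$engine" = webkit ] && pass=false
  printf '{"scenarios":[{"name":"signup","passed":%s},{"name":"login","passed":true}]}' "$pass" \
    > "/tmp/assrt/results/$engine.json"
done

# The five-line diff: one TSV per engine, paste side by side, keep divergent rows.
for e in chromium firefox webkit; do
  jq -r '.scenarios[] | [.name, .passed] | @tsv' "/tmp/assrt/results/$e.json" > "/tmp/$e.tsv"
done
paste /tmp/chromium.tsv /tmp/firefox.tsv /tmp/webkit.tsv \
  | awk '$2!="true" || $4!="true" || $6!="true"'
```

With the mock data above, the pipeline prints only the signup row, because login passed on all three engines.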

For every divergent row, open the matching <engine>.json and read steps[].error and assertions[].evidence. Those fields carry plain English the agent wrote at execution time, not an engine-internal stack trace. A line like “the date input did not accept keyboard input” points to a real WebKit native-control difference. A line like “the element ‘Create account’ was not visible on this page” usually points to a genuine Firefox layout regression, not a stale selector string.

The full workflow, step by step

Step 1: Write the plan once, in English

Save your scenarios as a Markdown file at /tmp/assrt/scenario.md. Use `#Case 1: ...` blocks, dash bullets for steps, and `Assert:` lines for the checks. No locator strings. No per-engine branches.

Step 2: Loop over the three engines

A four-line bash loop. For each of chromium, firefox, and webkit, run `assrt run --plan-file /tmp/assrt/scenario.md --browser $engine --json > /tmp/assrt/results/$engine.json`. The runner re-resolves every element against that engine's live accessibility tree per step.

Step 3: Let three result JSONs land on disk

Each run writes /tmp/assrt/results/latest.json and /tmp/assrt/results/<runId>.json. Your loop redirect keeps one file per engine. The `scenarios` array is in the same order across all three files because the plan is the same file.

Step 4: Diff by scenario name

Extract `[.name, .passed]` from each file with jq, paste them side by side, and awk for rows where any engine failed. Five lines total. The output is the list of scenarios that did not agree across engines.

Step 5: Read the agent's plain-English error

For each divergent row, open /tmp/assrt/results/<engine>.json and read `steps[].error` and `assertions[].evidence`. The agent writes natural language, not stack traces. 'The date input did not accept keyboard input on WebKit' is the kind of sentence you get.

Step 6: Commit the plan, not the fix to a locator

Because the plan is a plain file, `git add /tmp/assrt/scenario.md` stages your cross-browser contract. Future engine upgrades re-run the same loop without touching the plan. There is no locator to re-tune per engine release.

What this looks like next to “add a projects array and hope”

| Feature | Classic Playwright projects array | Assrt + one /tmp/assrt/scenario.md |
|---|---|---|
| Source of truth for the test | Per-spec TypeScript file with locator strings | One Markdown plan (/tmp/assrt/scenario.md) |
| Per-engine selector drift | Yes; a locator can resolve to a different node on WebKit vs Chromium | Zero; no locator string is ever written to disk |
| Target resolution per step | Static string evaluated by Playwright at click time | Fresh accessibility-tree lookup from the live engine for that step |
| Three-engine run command | npx playwright test (reads `projects` array) | bash for-loop over three --browser values writing three JSONs |
| Shape of the result artifact | Playwright HTML report + trace.zip per project | Plain JSON TestReport at /tmp/assrt/results/&lt;runId&gt;.json |
| Diff strategy | Open the HTML reports for each project and eyeball | jq + paste + awk on three aligned scenarios arrays |
| Cost to run across three engines | BrowserStack / Sauce / Mabl meter sessions; Mabl is around $7.5K/month per seat | Cents in Claude Haiku tokens per sweep; engine binaries are free |
| Where the artifact lives after the run | Inside the vendor dashboard, sometimes exportable | Flat files you can `git add` |

The numbers, roughly

Ballparks for a five-case, twenty-step plan run across three engines, based on the writeResultsFile output and Haiku rate card as of April 2026.

  • 1 Markdown plan file
  • 3 engines per sweep
  • 3 JSON reports on disk
  • 5 lines of jq to diff them
  • Per-sweep runtime: roughly two minutes with the three engines run serially on a typical five-case plan against a localhost app; parallelize with GNU parallel for ~40s
  • 0 locator strings in the repo: search the 18 tool signatures in agent.ts lines 16-67 for selector, xpath, testid, or locator, and nothing matches

What is already wired end-to-end

Playwright MCP accepts all three engines as flags today. Assrt spawns it via stdio with a custom args array, so forwarding --browser is a one-line CLI patch.

  • @playwright/mcp --browser chrome
  • @playwright/mcp --browser firefox
  • @playwright/mcp --browser webkit
  • @playwright/mcp --browser msedge
  • --extension (real Chrome session)
  • headless (default)
  • --headed for local repro
  • --isolated profile
  • persistent ~/.assrt/browser-profile

The sample plan file, verbatim

For reference, a plan short enough to keep on one screen. This exact text works on Chromium, Firefox, and WebKit.

/tmp/assrt/scenario.md
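A plan in that shape, following the conventions this page describes (`#Case` blocks, dash bullets, `Assert:` lines), might read like the following. The concrete app, fields, and step wording here are illustrative assumptions, not the verbatim sample file.

```markdown
<!-- Illustrative plan sketch; steps and field names are assumptions. -->
#Case 1: Signup reaches the dashboard
- Go to the signup page
- Fill in the email and password fields
- Click the Sign up button
- Assert: the heading on the page says Dashboard

#Case 2: Existing users can log in
- Go to the login page
- Fill in the email and password fields
- Click the Log in button
- Assert: the heading on the page says Dashboard
```

Note that nothing in the file names an engine, a selector, or a DOM node; that is what lets the same text run unchanged on all three browsers.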

What stays true on every engine

“Click the Sign up button” finds the accessibility node labelled Sign up whether the engine rendered it via Blink, Gecko, or WebKit. “Assert the heading says Dashboard” reads the h1 text no matter who laid out the DOM. The things that differ across engines are real: a Safari-only native date control, a Firefox-only form-validation popover rendered by the browser itself, a Chromium-only autofill bar. Those are the exact differences a cross-browser test is for. What the three-engine loop filters out is the synthetic differences — the ones that used to come from a locator string guessing wrong on one engine.

Set up the three-engine sweep on your own repo

30 minutes. We will pair on the bash loop, the jq diff, and read the first three JSONs together against your app.

Frequently asked questions

What is the smallest command that actually runs one plan on all three engines?

A four-line bash loop. Pin the plan file to /tmp/assrt/scenario.md, then loop over the three engines and call `assrt run` each time. The runner writes results to /tmp/assrt/results/<runId>.json for every engine. The loop body looks like this: `for engine in chromium firefox webkit; do assrt run --url http://localhost:3000 --plan-file /tmp/assrt/scenario.md --browser "$engine" --json > /tmp/assrt/results/${engine}.json; done`. Today you pick the engine by patching the flag forwarded to Playwright MCP in /Users/matthewdi/assrt-mcp/src/core/browser.ts around line 296 (the `args` array). Playwright MCP itself accepts `--browser chrome|firefox|webkit|msedge` already, so the engines are there; the CLI patch is a one-liner.

What file does each engine run actually write, and where?

Two files, every run. /tmp/assrt/results/latest.json gets overwritten with the most recent TestReport, and /tmp/assrt/results/<runId>.json keeps the historical copy keyed by a UUID. The writer is writeResultsFile in /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts lines 77-84. The JSON shape is the TestReport interface in /Users/matthewdi/assrt-mcp/src/core/types.ts lines 28-35, which wraps an array of ScenarioResult (lines 19-26) where each entry has `name`, `passed`, `steps`, `assertions`, `summary`, and `duration`. If you run three engines in sequence and redirect `--json` to per-engine files, you end up with three JSON files whose scenarios array aligns one-to-one.

Why does this approach find real engine bugs instead of selector drift?

Classic Playwright tests put locator strings into the repo: `page.locator('[data-testid="submit"]')`. Those strings resolve to one DOM node on Chromium and sometimes a different one on WebKit because shadow-DOM traversal and accessibility-role defaults differ. The diff between three result files then shows mostly locator drift, which is not an engine bug. Assrt writes no locator strings. The plan is sentence-level intent, and the runner re-resolves the target element per engine per step from that engine's live accessibility tree. If a step passes on Chromium and fails on WebKit after that, the failure is about the engine's behavior, not about a string you wrote. You can search the 18 tool signatures in /Users/matthewdi/assrt-mcp/src/core/agent.ts lines 16-67 for `selector`, `xpath`, `testid`, or `locator` and find nothing.

What does the three-file diff look like in practice?

Five lines of jq. Extract pass status per scenario per engine, join on `name`, and print the rows where not all three passed. Something like: `jq -r '.scenarios[] | [.name, .passed] | @tsv' /tmp/assrt/results/chromium.json > /tmp/c.tsv` (repeat for firefox and webkit), then `paste /tmp/c.tsv /tmp/f.tsv /tmp/w.tsv | awk '$2!="true" || $4!="true" || $6!="true"'`. The output is one line per scenario that is not universally green. For each, open the matching <runId>.json and read `assertions[].evidence` and `steps[].error`; the agent writes plain English there, not engine internals. A two-out-of-three pattern is usually a real rendering or event-handling divergence; a one-out-of-three pattern is usually a real engine bug.

What about the Assrt web app itself, does it run on Firefox and WebKit too?

The hosted scenario runner at assrt.ai boots Chromium only, because it uses ephemeral Freestyle VMs whose image bakes in Chromium (see /Users/matthewdi/assrt/src/core/freestyle.ts lines 582 and 612: `apt-get install chromium` and `PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/usr/bin/chromium`). That is fine for in-browser recording and preview. For actual Firefox and WebKit runs, the assrt CLI and MCP server launch Playwright locally via stdio (see assrt-mcp/src/core/browser.ts lines 274-378), and Playwright ships all three engine binaries with its installer. So the visual recorder is Chromium-only, the executor is three-engine. The scenario.md that comes out of the recorder is the portable part.

Does the plan text need to change per engine?

Almost never. The plan is phrased in terms of user-visible behavior: 'Click the Sign up button', 'Assert the heading on the page says Dashboard'. Those map to accessibility nodes on every engine. Where divergence is real, it usually shows up as a WebKit-only date picker that becomes a native control, a Firefox-only form validation message rendered by the browser, or a Chromium-only autofill suggestion bar. Those are exactly the differences a cross-browser test is meant to catch. The plan text stays the same; the result JSONs differ because the engines differ. When you do want per-engine conditional logic (rare), a single `#Case` block with the engine name in its title keeps the plan scannable: `#Case: Signup works on WebKit (native date picker path)`.

What does this cost compared to BrowserStack or cross-browser AI platforms?

BrowserStack Automate, Sauce Labs, and LambdaTest meter by parallel session and engine minute. Mabl and Testim reach around $7.5K/month per seat once cross-browser add-ons are included. The three-engine sweep described here has no per-session fee because the engines ship with Playwright. The only variable cost is Anthropic tokens for Claude Haiku calls that interpret each step during execution. A five-case plan with roughly 20 total steps and three engines produces about 60 to 90 Haiku tool-call turns. At Haiku's 2026 rates that is cents per full cross-browser sweep, not dollars.

Where exactly do the result files land on disk?

Hard-coded paths in /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts. Lines 16-20 define `ASSRT_DIR = "/tmp/assrt"`, `SCENARIO_FILE = /tmp/assrt/scenario.md`, `RESULTS_DIR = /tmp/assrt/results`, and `LATEST_RESULTS = /tmp/assrt/results/latest.json`. The `writeResultsFile(runId, results)` function on lines 77-84 writes both `latest.json` and `<runId>.json` on every run. Those are plain filesystem paths, so `cp`, `mv`, `jq`, and `git add` all work. For a three-engine run, redirect `--json` on the CLI to /tmp/assrt/results/<engine>.json per loop iteration to keep them out of each other's way.

Do I need to install Firefox and WebKit separately?

No. `npx playwright install` downloads all three engine binaries as part of the default install. The version pinning is defined by the `@playwright/mcp` dependency that assrt-mcp spawns (see /Users/matthewdi/assrt-mcp/src/core/browser.ts line 284 where it resolves the package via `require_.resolve('@playwright/mcp/package.json')`). If you already ran Assrt once, Chromium was installed; running `npx playwright install firefox webkit` picks up the other two in one go. Disk cost is roughly 500 MB for Firefox and 200 MB for WebKit, one-time.

What is a failing scenario that is legitimately engine-specific, and how does the JSON show it?

The classic example is a form that uses a native `<input type="date">`. On Chromium and Firefox the control renders roughly the same way and typing a date works. On WebKit macOS, the date input becomes a native calendar popover and typing text into it does nothing. Run your signup plan on all three engines, diff the three JSONs, and you will see a ScenarioResult with `passed: false` and a `steps[].error` like "Could not type into the date input; element did not accept keyboard input". That is the engine telling you the truth. The fix is a plan step that clicks the date cell rather than types, and the same step works on all three.
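As a sketch of that plan-level fix, the change is one step rewritten; the exact wording below is illustrative, not a verbatim plan.

```markdown
<!-- Illustrative before/after; step wording is an assumption. -->
Before (fails on WebKit, where the native calendar popover ignores typing):
- Type 2026-04-01 into the date input

After (clicks work on Chromium, Firefox, and WebKit alike):
- Click the date input, then click the calendar cell for April 1
```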

assrt · Open-source AI testing framework
© 2026 Assrt. MIT License.
