Cross-browser automation, open source

Multi-browser automation for Chromium, Firefox, and WebKit, using an API that does not know which browser it is driving.

Open /Users/matthewdi/assrt-mcp/src/core/agent.ts and grep for chromium, firefox, or webkit. Zero matches, across 1,087 lines and 18 tool definitions. That is the whole trick. The automation contract lives one layer above the engine, on the accessibility tree every browser is obligated to expose.


Matthew Diakonov, Written with AI

Published April 24, 2026 · 10 min read

4.9 from early adopters

18-tool agent API, zero engine parameters

Accessibility tree as the cross-engine contract

Open source, MIT licensed, no cloud dependency

One API, three engines

The automation does not know which browser it is driving

18 tools. None take a browser parameter.

The agent reads an accessibility tree.

Every engine exposes one.

The launcher picks the engine, not the code.


The whole argument in four commands

Most cross-browser automation guides argue by reference: they tell you the product works on three engines and you have to take their word for it. This one argues by file path. Open a terminal in the assrt-mcp repository and run the four commands listed in the last FAQ answer on this page. The automation surface either names an engine or it does not, and the answer is a grep away.

Verify it yourself

Eighteen tool definitions. Zero of them take a browser parameter. That is the anchor fact. Everything below this section is a consequence of it.
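The same check can be written as a few lines of TypeScript, which may be convenient if you want it to run in CI. This is a minimal sketch and not part of assrt-mcp: `countEngineMentions` is a hypothetical helper, and the two sample strings stand in for file contents you would read from disk.

```typescript
// Hypothetical helper (not from assrt-mcp): counts the lines of a source text
// that mention a browser engine, similar to `grep -ciE 'chromium|firefox|webkit'`.
const ENGINE_PATTERN = /chromium|firefox|webkit/i;

function countEngineMentions(sourceText: string): number {
  return sourceText
    .split("\n")
    .filter((line) => ENGINE_PATTERN.test(line)).length;
}

// Stand-in snippets; in practice you would pass in the contents of agent.ts.
const engineNeutralSample = [
  'name: "navigate",',
  'name: "snapshot",',
  'name: "click",',
].join("\n");

const engineAwareSample = [
  'if (browserName === "webkit") {',
  "  // WebKit-specific workaround",
  "}",
].join("\n");

console.log(countEngineMentions(engineNeutralSample)); // → 0
console.log(countEngineMentions(engineAwareSample)); // → 2
```

A non-zero count on the automation file would mean an engine name has leaked into the layer that is supposed to be engine-neutral.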

The tool contract

Here is the TOOLS array from agent.ts, elided to just the shape. Read the parameter lists. Ask yourself which parameter, if removed, would require a second one to replace it. The answer is none, because there is no engine parameter to replace.

assrt-mcp/src/core/agent.ts (lines 16-196, elided)
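The listing itself is not reproduced on this page. As a rough illustration of the shape being described, and not the actual contents of agent.ts, the sketch below defines three tool schemas using the parameter names given in the descriptions that follow. The `ToolDefinition` type, the description strings, and the `ENGINE_FIELDS` check are assumptions made for the example.

```typescript
// Illustrative shape only; the real TOOLS array in agent.ts has 18 entries.
interface ToolDefinition {
  name: string;
  description: string;
  // Parameter name -> primitive type, in the spirit of a JSON schema.
  parameters: Record<string, "string" | "boolean" | "number">;
}

const SKETCH_TOOLS: ToolDefinition[] = [
  { name: "navigate", description: "Go to a URL", parameters: { url: "string" } },
  {
    name: "click",
    description: "Click an element from the latest snapshot",
    parameters: { element: "string", ref: "string" },
  },
  {
    name: "assert",
    description: "Record a pass/fail check with evidence",
    parameters: { description: "string", passed: "boolean", evidence: "string" },
  },
];

// Field names that would indicate an engine selector.
const ENGINE_FIELDS = ["browser", "browserName", "engine"];

// Returns the names of tools whose parameters include an engine selector.
function toolsNamingAnEngine(tools: ToolDefinition[]): string[] {
  return tools
    .filter((tool) =>
      Object.keys(tool.parameters).some((param) => ENGINE_FIELDS.indexOf(param) !== -1),
    )
    .map((tool) => tool.name);
}

console.log(toolsNamingAnEngine(SKETCH_TOOLS).length); // → 0
```

The point of the check is structural: if no schema carries an engine field, nothing downstream of the schema can branch on one.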

navigate

Takes a url string. No engine. The accessibility tree the next snapshot returns is whatever engine the launcher picked.

snapshot

Returns the current engine's accessibility tree with [ref=eN] handles. Called before and after every action, per the agent system prompt.

click / type_text / select_option

Take element (human-readable description) and ref (transient accessibility-tree id). No selector string is ever persisted to disk.

wait_for_stable

Waits for 2s of zero DOM mutations via MutationObserver. Handles per-engine paint-speed differences without hardcoded timeouts.

assert

Takes description, passed, evidence. Engine-agnostic by design — evidence is plain English the agent writes.

complete_scenario

Terminal tool. When the agent calls it, the runner stops. No engine knowledge needed to decide when a scenario is done.
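The quiet-window rule behind wait_for_stable can be modeled without a browser. The sketch below is a simplified model and not the implementation in agent.ts: `firstStableAt` is a hypothetical function that takes the times at which mutations were observed (in milliseconds since the wait began) and returns the earliest moment at which the required quiet period has elapsed.

```typescript
// Simplified model: every observed mutation restarts the quiet timer,
// the way a MutationObserver callback would reset a pending timeout.
function firstStableAt(mutationTimesMs: number[], quietMs: number): number {
  const sorted = mutationTimesMs.slice().sort((a, b) => a - b);
  let windowStart = 0; // waiting begins at t = 0
  for (const t of sorted) {
    if (t - windowStart >= quietMs) break; // quiet window completed before this mutation
    windowStart = t;
  }
  return windowStart + quietMs;
}

// Hypothetical timings: one engine settles quickly, another keeps painting longer.
const fastEngineMutations = [100, 300];
const slowEngineMutations = [100, 900, 1700];

console.log(firstStableAt(fastEngineMutations, 2000)); // → 2300
console.log(firstStableAt(slowEngineMutations, 2000)); // → 3700
```

The same 2-second rule yields a different wait on each engine, which is why no per-engine timeout needs to be hardcoded.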

Where the engine actually gets chosen

If the agent does not pick the engine, someone else does. That someone is the launcher at /Users/matthewdi/assrt-mcp/src/core/browser.ts line 296. The args array you see below is forwarded to node_modules/@playwright/mcp/cli.js via a stdio transport. Playwright MCP itself accepts --browser chrome|firefox|webkit|msedge, documented on line 364 of its README, and it also reads the PLAYWRIGHT_MCP_BROWSER environment variable.

assrt-mcp/src/core/browser.ts (line 296)
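The listing for this line is not reproduced here, but the args array is quoted in the FAQ further down. A hedged sketch of how an engine flag could be appended to it: `buildMcpArgs` and its `browser` parameter are hypothetical and do not exist in browser.ts, while `--browser` is the Playwright MCP flag mentioned above.

```typescript
type McpBrowser = "chrome" | "firefox" | "webkit" | "msedge";

// Hypothetical wrapper around the args array quoted from browser.ts line 296.
function buildMcpArgs(cliPath: string, outputDir: string, browser?: McpBrowser): string[] {
  const args = [
    cliPath,
    "--viewport-size", "1600x900",
    "--output-mode", "file",
    "--output-dir", outputDir,
    "--caps", "devtools",
  ];
  // Assumed addition: forward the engine choice to Playwright MCP when one is given.
  if (browser) args.push("--browser", browser);
  return args;
}

const firefoxArgs = buildMcpArgs("cli.js", "/tmp/out", "firefox");
console.log(firefoxArgs.slice(-2).join(" ")); // → --browser firefox
```

Nothing in the agent layer would need to change for this to take effect; the engine choice stays entirely in the spawn arguments.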

Where the engine actually gets picked

Everything on the left chooses the engine. Nothing on the left is the agent. The agent is a layer up, consuming whatever accessibility tree the selected engine returns on the next snapshot call.

18 Tools in the agent API

0 Tools that take an engine parameter

3 Engines Playwright MCP drives

1 Flag to switch engines

The per-step loop is the whole portability story

Engines do not disagree about intent. They disagree about structure. Chromium and WebKit render the same page and produce accessibility trees that are shaped differently, with different ref ids, different role defaults, different combobox conventions. Any automation that caches a selector across actions inherits every one of those disagreements as flake. The Assrt loop never caches.

Call snapshot

Agent asks the current engine for its accessibility tree. Chromium, Firefox, and WebKit each return their own tree with engine-native refs like e3, e4, e5.

Model reads the plan step

Claude Haiku receives the step ('Click the Sign up button') plus the tree. It picks the ref that matches the described element on this engine's tree.

Agent calls a tool

click({ element: 'Sign up button', ref: 'e5' }). The tool definition lives at agent.ts lines 31-42. No browser parameter exists.

Re-snapshot

Per agent.ts line 209: 'After each action, call snapshot again to see the updated page state'. Every engine's mutation is read back fresh.

On failure, auto-resnapshot

agent.ts lines 1014-1017: the error handler calls browser.snapshot() before re-prompting, so stale refs from one engine never cross over into the next retry.

Repeat until complete_scenario

The loop only exits when the agent calls complete_scenario (agent.ts lines 145-156). Engine identity never enters the loop body.

The key line of the system prompt is at agent.ts:209: “After each action, call snapshot again to see the updated page state.” The error path at agent.ts:1014-1017 does this automatically on any exception, folding a fresh tree back into the re-prompt. A ref from Chromium never crosses over into a Firefox retry. A Firefox tree never contaminates a WebKit step. Per-step, per-engine resolution is the default, not a feature.
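The loop can be simulated without a real browser. In the sketch below every name is illustrative and none is taken from agent.ts: `makeEngine` builds a fake engine whose refs start at a different number, and `resolveRef` stands in for the model call that picks a ref from the tree.

```typescript
interface A11yEntry { ref: string; role: string; label: string }

interface FakeEngine {
  snapshot(): A11yEntry[];  // fresh tree with engine-native refs
  click(ref: string): void; // throws if the ref is not in the current tree
  clicked: string[];        // labels clicked so far, for inspection
}

const PAGE_NODES = [
  { role: "link", label: "Home" },
  { role: "button", label: "Sign up" },
];

// Two fake engines render the same page but number their refs differently.
function makeEngine(firstRef: number): FakeEngine {
  const clicked: string[] = [];
  const snapshot = (): A11yEntry[] =>
    PAGE_NODES.map((n, i) => ({ ref: `e${firstRef + i}`, role: n.role, label: n.label }));
  return {
    clicked,
    snapshot,
    click(ref: string): void {
      const hit = snapshot().filter((n) => n.ref === ref)[0];
      if (!hit) throw new Error(`stale ref ${ref}`);
      clicked.push(hit.label);
    },
  };
}

// Stand-in for the model: pick the ref whose label matches the step.
function resolveRef(tree: A11yEntry[], wanted: string): string {
  const hit = tree.filter((n) => n.label === wanted)[0];
  if (!hit) throw new Error(`no node for "${wanted}"`);
  return hit.ref;
}

function runStep(engine: FakeEngine, wanted: string): string {
  const tree = engine.snapshot();       // 1. snapshot
  const ref = resolveRef(tree, wanted); // 2. resolve against this engine's tree
  engine.click(ref);                    // 3. act
  engine.snapshot();                    // 4. re-snapshot for the next step
  return ref;
}

const engineA = makeEngine(3);
const engineB = makeEngine(40);
console.log(runStep(engineA, "Sign up")); // → e4
console.log(runStep(engineB, "Sign up")); // → e41
```

The refs differ per engine, but `runStep` never sees which engine it is talking to, and a ref resolved on one engine is never reused on the other.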

A sequence view of one step running on any engine

The dashed line at the end of this diagram is the critical one. The agent calls snapshot again immediately after the tool returns. The next step starts with a fresh tree, regardless of which engine served it.

One plan, three engines, per-step re-resolution

Engine-aware code vs engine-agnostic scenarios

On the left, a typical cross-browser Playwright spec. Three projects, one matrix, and a per-engine branch once a locator starts drifting. On the right, the same intent as a Markdown scenario Assrt executes.

typical .spec.ts — engine named in code

scenario.md + agent.ts — engine-neutral

Side-by-side against the usual cross-browser patterns

Feature	Typical Playwright + projects	Assrt
Engine named in automation code	Yes — browserName, devices[name], project config	No — grep returns 0 matches in agent.ts
Where engine is chosen	Config file, per-test fixture, or CI matrix	Spawn args at browser.ts:296 (one-line flag)
What drifts between engines	Locator strings (persistent, need per-engine branches)	Accessibility tree refs (transient, re-read each step)
File the automation lives in	.spec.ts bound to Playwright locator API	/tmp/assrt/scenario.md (plain English)
Selector re-resolution cadence	Once at test authoring time	Per step, per engine, from live a11y tree
Per-engine conditional branches needed	Common when DOM trees diverge	Never (agent cannot see engine)
Cost to run on three engines	$7.5K/mo (Mabl/Testim tier) or session metering	Open source + Haiku tokens (cents per sweep)

Engines, roles, and what actually moves between them

The three engines that Playwright drives all agree on one surface, and disagree on almost everything else. The accessibility tree is anchored to WAI-ARIA, which browsers must honor for screen readers. CSS selector resolution, shadow-root traversal, and DOM event ordering are not similarly anchored. Automating against roles is automating against the one layer engineers at Google, Mozilla, and Apple are externally accountable to keep aligned.

role=button · role=textbox · role=listbox · role=combobox · role=dialog · role=menu · role=tab · role=tabpanel · role=link · role=navigation · role=heading · role=alert · role=img · role=checkbox · role=radio · role=slider

Every one of those roles is what Assrt's snapshot tool returns from the current engine. The agent asks the model: “which node in this tree matches ‘the Sign up button’?” The model replies with a ref like e5. That ref is valid for the duration of one tool call on one engine. It is not saved, not diffed across engines, not committed to a selector file. It has no value outside that one moment.

0 engine params

“The automation surface is literally incapable of naming an engine, because the data type is not in the tool signatures.”

agent.ts lines 16-196

What this changes in your workflow

If your cross-browser coverage plan today is: (a) maintain a projects matrix in playwright.config.ts, (b) accept the resulting locator drift, and (c) write per-engine branches when needed — this is an honest upgrade. You write one Markdown scenario, you point the launcher at a different engine, and the agent re-resolves every step against whatever that engine rendered. Per-engine branches stop being useful, because the agent cannot read a branch condition that says “if webkit” — the agent cannot see webkit.

The scenarios you generate live at /tmp/assrt/scenario.md (path hardcoded at scenario-files.ts line 17). You can check them into Git beside your app. They do not drift per engine. They drift only if the product changes.

Want to run one Markdown scenario on all three engines?

Book 20 minutes. I will show you the grep, the spawn args, and a live three-engine run against your own URL.

Cross-browser automation FAQ

What is the actual proof that Assrt's automation API does not name a browser?

Run `grep -Ei 'chromium|firefox|webkit' /Users/matthewdi/assrt-mcp/src/core/agent.ts` from the assrt-mcp repo root. Zero matches. The file is 1,087 lines; the TOOLS array runs from line 16 to line 196 and defines 18 tool schemas (navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable). None of their input schemas accept a field named browser, browserName, engine, or anything resembling an engine selector. The agent is literally incapable of branching on engine identity because the data type is not in its tool signatures.

If the agent does not know the engine, how does the engine actually get picked?

At the Playwright MCP spawn point, which is one function below the agent. Open /Users/matthewdi/assrt-mcp/src/core/browser.ts line 296 and you see the exact line: `const args = [cliPath, "--viewport-size", "1600x900", "--output-mode", "file", "--output-dir", outputDir, "--caps", "devtools"];`. That array is forwarded to node_modules/@playwright/mcp/cli.js via a stdio transport. The @playwright/mcp CLI accepts `--browser chrome|firefox|webkit|msedge` (documented at node_modules/@playwright/mcp/README.md line 364) and also reads the `PLAYWRIGHT_MCP_BROWSER` environment variable. Appending `--browser firefox` to that args array is a one-line patch; the agent above it does not change.

Why is the accessibility tree the right abstraction for cross-browser automation?

Because it is the only DOM projection all three engines are contractually obligated to expose. CSS selectors resolve through each engine's own DOM traversal, which means shadow-root encapsulation rules, slot assignment, and display-contents behavior diverge. XPath traversal depends on the engine's document-order resolution, which is spec-leaky at edges. The accessibility tree is anchored to WAI-ARIA roles, names, and states, which browsers must match for screen readers. When Assrt's snapshot tool returns, it is not returning 'what Chromium thinks the page looks like' — it is returning the accessibility projection each engine renders for assistive tech. That is the one layer where Chromium, Firefox, and WebKit actually agree by construction.

What does the re-snapshot-after-every-action loop actually do for engine portability?

It makes engine differences absorbable per step instead of fatal per test file. The agent's protocol at /Users/matthewdi/assrt-mcp/src/core/agent.ts lines 207-209 says verbatim: 'ALWAYS call snapshot FIRST... Use the ref IDs from snapshots... After each action, call snapshot again to see the updated page state'. Line 218 adds: 'If a ref is stale (action fails), call snapshot again to get fresh refs'. At lines 1014-1017 the runtime does this automatically on any error — it catches the exception, calls `this.browser.snapshot()` before re-prompting, and hands the fresh accessibility tree back to the agent as part of the error context. When Firefox and WebKit render the same page with slightly different ARIA label generation, each engine gets its own fresh tree per step. The refs (like 'e5') are transient. Nothing about the previous engine's tree contaminates this one.

Is there a concrete case where engine differences would break a Playwright test but not an Assrt scenario?

Dropdown menus. On Chromium, a `<select>` rendered via a custom component typically produces aria-haspopup="listbox". On WebKit, the same component may produce aria-haspopup="menu" or emit role=combobox depending on how it registers options. A Playwright locator like `page.getByRole('listbox')` resolves on Chromium but matches nothing on WebKit, so the next action on it waits until the timeout and fails. Assrt's `click` tool takes a human description ('the country dropdown') plus a ref from the latest snapshot. On WebKit, the snapshot returns the element under whatever role WebKit assigned; the agent asks Claude Haiku 'which ref matches the country dropdown' and clicks that ref. Same intent, different engine-native ref, same outcome.
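The difference between a role-pinned lookup and a description-driven one can be shown on two hand-written trees. Everything in this sketch is hypothetical: the trees, the refs, and `findDropdown`, which is a crude stand-in for the model matching 'the country dropdown' against whatever role the engine reported.

```typescript
interface TreeEntry { ref: string; role: string; accName: string }

// Hypothetical snapshots of the same custom dropdown on two engines.
const chromiumTree: TreeEntry[] = [{ ref: "e7", role: "listbox", accName: "Country" }];
const webkitTree: TreeEntry[] = [{ ref: "e12", role: "combobox", accName: "Country" }];

// Role-pinned lookup, analogous to a locator that hardcodes the role.
function findByRole(tree: TreeEntry[], role: string): string | undefined {
  const hit = tree.filter((n) => n.role === role)[0];
  return hit ? hit.ref : undefined;
}

// Description-driven lookup: match on the accessible name and accept any
// role a dropdown could plausibly be reported under.
const DROPDOWN_ROLES = ["listbox", "combobox", "menu"];
function findDropdown(tree: TreeEntry[], accName: string): string | undefined {
  const hit = tree.filter(
    (n) => n.accName === accName && DROPDOWN_ROLES.indexOf(n.role) !== -1,
  )[0];
  return hit ? hit.ref : undefined;
}

console.log(findByRole(chromiumTree, "listbox")); // → e7
console.log(findByRole(webkitTree, "listbox")); // → undefined
console.log(findDropdown(chromiumTree, "Country")); // → e7
console.log(findDropdown(webkitTree, "Country")); // → e12
```

The role-pinned lookup encodes one engine's answer at authoring time; the description-driven one defers the decision until the tree is in hand.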

Does Assrt expose the browser choice to the end user yet, or is it only the underlying Playwright MCP that supports it?

Today the plumbing is all there but the CLI flag is not surfaced. /Users/matthewdi/assrt-mcp/src/mcp/server.ts lines 339-356 show the `assrt_test` MCP tool schema accepts viewport, headed, isolated, extension, and extensionToken, but not browserName. The one-line path to expose it is adding a `browser` Zod field to that schema and forwarding it into the args array on /Users/matthewdi/assrt-mcp/src/core/browser.ts line 296. In the meantime you can already set `PLAYWRIGHT_MCP_BROWSER=firefox` in the environment before spawning assrt-mcp; the env var is inherited by the child process because browser.ts line 363 spawns with default `process.env` when no extension token override is present.
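One way the missing option could be resolved before it reaches the args array is sketched below. `resolveBrowser` is hypothetical, it uses a plain allow-list instead of Zod to stay self-contained, and the precedence (explicit option first, then `PLAYWRIGHT_MCP_BROWSER`) is an assumption rather than documented behavior.

```typescript
const ALLOWED_BROWSERS = ["chrome", "firefox", "webkit", "msedge"] as const;
type BrowserChoice = (typeof ALLOWED_BROWSERS)[number];

// Hypothetical helper: an explicit option wins over the environment variable;
// undefined means "let the launcher use its default".
function resolveBrowser(
  option: string | undefined,
  env: Record<string, string | undefined>,
): BrowserChoice | undefined {
  const raw = option !== undefined ? option : env["PLAYWRIGHT_MCP_BROWSER"];
  if (raw === undefined) return undefined;
  const match = ALLOWED_BROWSERS.filter((b) => b === raw)[0];
  if (!match) throw new Error(`Unsupported browser: ${raw}`);
  return match;
}

console.log(resolveBrowser(undefined, { PLAYWRIGHT_MCP_BROWSER: "firefox" })); // → firefox
console.log(resolveBrowser("webkit", { PLAYWRIGHT_MCP_BROWSER: "firefox" })); // → webkit
```

Passing the environment in as a parameter keeps the function easy to test; a real caller would hand it `process.env`.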

How does this compare to running a Playwright test file with a `projects` array?

A projects-array approach reruns your .spec.ts file N times with a different engine each time. What actually gets rerun is a file full of locator strings that resolve differently per engine. Your test becomes a matrix of passes and fails whose failures are usually locator drift, not product bugs. Assrt inverts the relationship: the automation is re-resolved per engine at execution time, not per engine at config time. There is no spec file to drift. The scenario is a plain Markdown file at /tmp/assrt/scenario.md (path hardcoded at /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts line 17). When three engines run the same scenario, the failures you see are real engine divergence — a WebKit date picker becoming native, a Firefox form validation message rendering differently — not a locator that stopped matching.

What is the cost per three-engine run compared to the commercial cross-browser platforms?

BrowserStack, Sauce Labs, and LambdaTest meter by parallel session and engine minute. Mabl and Testim land around $7,500 per month per seat once cross-browser add-ons are included. Assrt has no per-session fee: Chromium, Firefox, and WebKit ship with Playwright at zero cost via `npx playwright install`. The only variable cost is Anthropic tokens for the Claude Haiku calls that interpret steps at runtime (the default model is set at /Users/matthewdi/assrt-mcp/src/core/agent.ts line 9: `DEFAULT_ANTHROPIC_MODEL = "claude-haiku-4-5-20251001"`). A five-case scenario with roughly twenty total steps produces about 60-90 tool-call turns per engine, which is cents per three-engine sweep at 2026 Haiku rates.

What do I actually keep if I stop using Assrt tomorrow?

The scenario file at /tmp/assrt/scenario.md, which is plain Markdown with `#Case N:` blocks. That file describes what to test in English sentences. If Playwright is replaced tomorrow by something else, you feed the same file into whatever replaces it. Compare against a .spec.ts file full of `await page.getByRole('button', { name: 'Submit' }).click()` — that file is bound to Playwright's locator API. It cannot be moved to a different runner without a rewrite. The engine-neutrality of Assrt's automation contract also makes the artifact runner-neutral as a side effect: a scenario that does not name a browser also does not name a framework.
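To illustrate how little a replacement runner would need to understand, here is a minimal parser sketch. Only the `#Case N:` header comes from the description above; the `ScenarioCase` type, the one-step-per-line body format, and the sample text are assumptions for the example.

```typescript
interface ScenarioCase { index: number; title: string; steps: string[] }

// Splits a scenario file into cases on `#Case N:` headers.
// Assumes each non-empty line under a header is one plain-English step.
function parseScenario(markdown: string): ScenarioCase[] {
  const header = /^#\s*Case\s+(\d+):\s*(.*)$/;
  const cases: ScenarioCase[] = [];
  let current: ScenarioCase | undefined;
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    const m = header.exec(line);
    if (m) {
      current = { index: Number(m[1]), title: m[2] || "", steps: [] };
      cases.push(current);
    } else if (line !== "" && current) {
      current.steps.push(line);
    }
  }
  return cases;
}

// Hypothetical scenario text, not copied from a real Assrt run.
const sampleScenario = [
  "#Case 1: Sign up",
  "Navigate to the home page",
  "Click the Sign up button",
  "",
  "#Case 2: Log in",
  "Click the Log in link",
].join("\n");

const parsedCases = parseScenario(sampleScenario);
console.log(parsedCases.length); // → 2
console.log(parsedCases[0].steps.length); // → 2
```

Nothing in the parsed structure names a browser or a framework, which is the portability property the answer above describes.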

How do I prove to my team that the automation code really does not care which engine is running?

Four commands. First, `wc -l /Users/matthewdi/assrt-mcp/src/core/agent.ts` to show the file exists and is 1,087 lines. Second, `grep -cE 'chromium|firefox|webkit' /Users/matthewdi/assrt-mcp/src/core/agent.ts` returns 0. Third, `grep -n '^ name:' /Users/matthewdi/assrt-mcp/src/core/agent.ts | head -20` lists the 18 tool names; not one mentions a browser. Fourth, `grep -n 'args.*viewport\|args.*browser' /Users/matthewdi/assrt-mcp/src/core/browser.ts` shows that engine-related concerns live in browser.ts, not agent.ts. The separation is physical, not convention.