Read the source, not the pitch

Open-source AI testing as a process tree, not a brand directory

Every other guide on this topic is a list of tools tagged "open source" with no explanation of what that means at runtime. This one uses the Assrt source as a worked example: exactly six runtime dependencies, one stdio spawn of @playwright/mcp, 18 tools the model can call, and a plan file you can open in any text editor. Every number on this page is countable in an MIT-licensed repo.

Matthew Diakonov
11 min read
  • 6 runtime dependencies in assrt-mcp/package.json, MIT licensed: @anthropic-ai/sdk, @google/genai, @modelcontextprotocol/sdk, @playwright/mcp, posthog-node, ws
  • One subprocess per run: @playwright/mcp cli.js over stdio
  • Plans at /tmp/assrt/scenario.md, results at /tmp/assrt/results/latest.json
  • 18 agent tools in agent.ts
  • Zero API key needed with Claude Code
  • No YAML DSL

Why brand directories are not an answer

If you query this topic you get listicles. Eight or ten tool names, one paragraph each, a little icon, a link. The word "open-source" is used as a marketing tag, not as a technical claim. None of those articles tell you what a single test run looks like at the operating-system level: which child processes start, what they write to disk, how they authenticate to a model, or what the agent is allowed to do. That is the actual question someone choosing an open runtime wants answered.

This guide is the opposite. It uses one real runtime, Assrt, as a worked example. Every number and file path on this page corresponds to a specific line in src/core or assrt-mcp/package.json. If you want a different runtime, the template still applies: count its dependencies, find its spawn command, list the tools it exposes to the model, and locate the file its plan lives in. A genuinely open runtime will give you all four without making you sign up.

The six-dependency floor

Open the repo and grep package.json. Assrt's runtime declares six dependencies, each with a boring job. Anthropic's SDK calls the LLM. Google's GenAI SDK is the Gemini path. The MCP SDK speaks the protocol. @playwright/mcp is the subprocess. posthog-node handles optional telemetry. ws is used only to fan out screencast frames when the agent runs in server mode. There is no hidden transitive binary, no proprietary runtime, no compiled blob. This is the whole dependency set.

assrt-mcp/package.json
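A hedged reconstruction of the dependency block, listing the six packages named on this page. Version ranges are elided, except @playwright/mcp, which this page says is pinned at ^0.0.70:

```json
{
  "name": "assrt-mcp",
  "license": "MIT",
  "dependencies": {
    "@anthropic-ai/sdk": "…",
    "@google/genai": "…",
    "@modelcontextprotocol/sdk": "…",
    "@playwright/mcp": "^0.0.70",
    "posthog-node": "…",
    "ws": "…"
  }
}
```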

Some of these deps are optional. You can set ANTHROPIC_API_KEY="" and never import the Anthropic SDK code paths. You can delete posthog-node and the telemetry module and the loop still runs. The one hard dependency is @playwright/mcp: everything in Assrt sits on top of that upstream subprocess. If @playwright/mcp were closed, Assrt would be too. Because it is open, the whole stack is.

One spawn, no cloud call on the spawn path

When you run a test, exactly one child process is started. It is the upstream Playwright MCP CLI. Everything Assrt does to configure the browser is expressed as argv for that one process. The full argv is assembled in src/core/browser.ts around line 296. Here is what the process tree looks like if you watch it with ps -ef during a run.

The actual spawn, reconstructed from browser.ts
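A sketch of that argv assembly, using only the flags documented on this page. buildMcpArgs is an illustrative helper name, not the real symbol in browser.ts:

```typescript
// Illustrative reconstruction of the argv assembled in src/core/browser.ts.
// The flags mirror the ones this article documents for the @playwright/mcp spawn.
function buildMcpArgs(opts: { cliPath: string; headed?: boolean; isolated?: boolean }): string[] {
  const args = [
    opts.cliPath,
    "--viewport-size", "1600x900",
    "--output-mode", "file",
    "--output-dir", `${process.env.HOME}/.assrt/playwright-output`,
    "--caps", "devtools",
  ];
  // Headless is spliced in at position 1 when the run is not headed.
  if (!opts.headed) args.splice(1, 0, "--headless");
  // Isolated runs get an in-memory profile; otherwise cookies persist on disk.
  if (opts.isolated) args.push("--isolated");
  else args.push("--user-data-dir", `${process.env.HOME}/.assrt/browser-profile`);
  return args;
}

// The runner hands process.execPath plus these args to StdioClientTransport
// from @modelcontextprotocol/sdk, which forks the one child process of the run.
const argv = buildMcpArgs({ cliPath: "/path/to/@playwright/mcp/cli.js" });
console.log([process.execPath, ...argv].join(" "));
```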

Before the spawn happens, zero network calls have been made. The LLM-side API call only happens once the agent loop is ready to ask the model for its first tool decision, and even that call can be routed through a proxy of your choice. Between your terminal and the OS there are exactly three processes: your assrt CLI, the one Node subprocess running the MCP CLI, and the Chromium it manages. That is the entire footprint.

What actually flows between model, runner and browser

Three local inputs, one agent, three local outputs

  scenario.md      ──►                 ──►  @playwright/mcp subprocess
  Keychain / env   ──►    assrt-mcp    ──►  LLM endpoint over HTTPS
  your localhost   ──►                 ──►  /tmp/assrt/results

Every edge in this diagram is code you can read. Left column: the plan is a file on your disk, the credential comes from your own shell or your own Keychain, and the URL is whatever dev server you already have running. Right column: the Playwright MCP subprocess is a Node child, the LLM endpoint is a regular HTTPS request to a documented API, the results land in /tmp. No part of this requires a vendor endpoint to operate.

The six steps of a run, exactly as the code executes them

If you put a breakpoint on TestAgent.run and step through, this is what you see, in order.


1. CLI parses args, loads credentials

assrt CLI reads your command line, then calls getCredential in src/core/keychain.ts. ANTHROPIC_API_KEY env var wins; otherwise on macOS the function shells out to 'security find-generic-password' and pulls the Claude Code OAuth token. Three lines of keychain code, zero cloud round-trip.


2. TestAgent spawns @playwright/mcp over stdio

browser.ts line 284 resolves @playwright/mcp's cli.js via require_.resolve, then StdioClientTransport forks a Node subprocess with --viewport-size 1600x900 --output-mode file --output-dir ~/.assrt/playwright-output --caps devtools. This is the only child process the runner creates.


3. Scenario written to /tmp/assrt/scenario.md

writeScenarioFile writes the plan as plain Markdown. Metadata goes to /tmp/assrt/scenario.json. fs.watch subscribes so you can edit the file with your normal editor and have changes debounced for sync. None of these files need a login to read.


4. Agent loop: snapshot → decide → act → capture

The model calls snapshot first. The accessibility tree is saved to ~/.assrt/playwright-output/<uuid>.yml, returned as text. Model emits a tool call citing ref=eN. Agent runs it through the MCP client. If the tool was one of the six visual tools, a JPEG is attached to the next model message.


5. Assertions, video, and results written locally

Every assert call is appended to the report. If --video is set, a .webm is saved to /tmp/assrt/video-*/. When the model calls complete_scenario, writeResultsFile dumps the report to /tmp/assrt/results/latest.json and /tmp/assrt/results/<runId>.json. Both are plain JSON.


6. Browser quits (or stays open if you asked)

Close the Playwright MCP subprocess, exit. Pass --keep-open and the browser window stays so you can interact with the remaining state. Scenario.md and latest.json remain on disk; nothing is uploaded unless cloud sync is explicitly configured.
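Step 3's watch-and-sync behaviour can be sketched as a one-second debounce over fs.watch. watchPlan and debounce are illustrative names under that assumption, not symbols from assrt-mcp:

```typescript
import { watch } from "node:fs";
import type { FSWatcher } from "node:fs";

// debounce: collapse a burst of change events into one call after `ms` of quiet.
function debounce(fn: () => void, ms: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    clearTimeout(timer);
    timer = setTimeout(fn, ms);
  };
}

// Sketch of the step-3 watcher: scenario.md edits are debounced for one
// second before any sync runs, so saving twice in your editor fires once.
function watchPlan(path: string, onChange: () => void): FSWatcher {
  return watch(path, debounce(onChange, 1000));
}
```

Because the watcher sits on a plain file, any editor works: the runner never cares who wrote the bytes.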

Your plan is a .md file. That is the whole input contract.

There is no YAML DSL, no JSON schema you have to learn, no visual editor. The plan is a sequence of #Case N blocks in a plain Markdown file. The parser in src/core/agent.ts splits on the #Case regex, turns each block into an independent scenario, and hands the steps to the model as human-language instructions. The only formatting rule is the case header.

/tmp/assrt/scenario.md
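A hypothetical plan showing the single formatting rule, the case header. Every step underneath it is free-form prose:

```markdown
#Case 1: Sign-in happy path
Open the app and click "Sign in".
Type a valid email and password, then submit.
Assert the dashboard greets the user by name.

#Case 2: Wrong password shows an error
Open the app and click "Sign in".
Type a valid email and the password "wrong", then submit.
Assert an error message appears and the user stays on the sign-in page.
```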

Because the plan is text, you can commit it to the same git repository as the code it tests. You can generate it from a template. You can have an agent write it. You can diff two plans to see what coverage changed. None of this requires Assrt itself; any open runtime that uses plain-text plans inherits the same property.

The report is a .json file. That is the whole output contract.

writeResultsFile writes the TestReport to /tmp/assrt/results/latest.json and a copy to /tmp/assrt/results/<runId>.json. The TestReport schema is five fields plus a scenarios array. Nothing else. This is what a pass looks like on the wire.

/tmp/assrt/results/latest.json
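An illustrative pass, using the TestReport field names listed later on this page; every value here is invented for the example:

```json
{
  "url": "http://localhost:3000",
  "totalDuration": 48210,
  "passedCount": 1,
  "failedCount": 0,
  "generatedAt": "2025-01-01T12:00:48.000Z",
  "scenarios": [
    {
      "name": "Sign-in happy path",
      "passed": true,
      "duration": 23100,
      "summary": "Signed in and verified the dashboard greeting.",
      "assertions": ["dashboard greets the user by name"],
      "steps": [
        {
          "action": "navigate",
          "description": "Open http://localhost:3000",
          "status": "ok",
          "timestamp": "2025-01-01T12:00:01.000Z"
        }
      ]
    }
  ]
}
```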

jq, diff, a CI checker, a dashboard you write yourself: any of these can consume this file. There is no required SDK, no authentication to read it, no server-side rendering step. If you want a dashboard, you build one. If you want a CI gate, you pipe the JSON into your existing check.
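As a sketch, a CI gate over that file is a few lines. TestReport here is trimmed to the two fields the gate needs, and gate/gateFile are illustrative names:

```typescript
import { readFileSync } from "node:fs";

// Minimal CI gate over the report: fail the build when any scenario failed.
// Field names follow the TestReport shape as described on this page.
type TestReport = { passedCount: number; failedCount: number };

function gate(report: TestReport): boolean {
  return report.failedCount === 0;
}

// Parse latest.json from disk and gate on it.
function gateFile(path = "/tmp/assrt/results/latest.json"): boolean {
  return gate(JSON.parse(readFileSync(path, "utf8")) as TestReport);
}
```

Anything that can parse JSON can implement the same check; `jq -e '.failedCount == 0'` does the equivalent in one line of shell.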

The artefacts that land on your disk

Every file a run can produce, with the exact path it ends up at. None of these are cloud-only. None of them require a login to read.

Plan in Markdown

Test cases are #Case blocks in a plain .md file at /tmp/assrt/scenario.md. Commit to git. Diff between runs. No DSL, no visual editor required.

Results in JSON

Every run writes /tmp/assrt/results/latest.json and /tmp/assrt/results/<runId>.json. TestReport schema is defined in src/core/types.ts.

Browser profile on disk

Cookies and localStorage persist in ~/.assrt/browser-profile by default. Pass --isolated for in-memory, or --extension to reuse your real Chrome.

Snapshot output files

Accessibility trees are written to ~/.assrt/playwright-output/<uuid>.yml instead of shipped inline over the MCP transport, so huge DOMs do not blow up the stdio pipe.

Video recordings

Pass --video and every scenario is captured to /tmp/assrt/video-*/<scenario>.webm. A local HTML player auto-opens on finish. No cloud upload required.

Auth in macOS Keychain

If you already use Claude Code, the OAuth token is pulled from the 'Claude Code-credentials' service in Keychain. Zero extra setup on macOS.

Treat these as the public interface of an open-source AI test

Paths you will see in ~/.assrt and /tmp/assrt

  • scenario.md: plain Markdown test plan at /tmp/assrt/scenario.md
  • scenario.json: metadata at /tmp/assrt/scenario.json (id, name, url)
  • results/latest.json: most recent run at /tmp/assrt/results/latest.json
  • results/<runId>.json: historical runs keyed by UUID
  • playwright-output/: accessibility trees at ~/.assrt/playwright-output/<uuid>.yml
  • browser-profile/: persistent profile at ~/.assrt/browser-profile for cookies
  • extension-token: Playwright Bridge token cached at ~/.assrt/extension-token
  • video-*/*.webm: WebM recordings at /tmp/assrt/video-<ts>/

Auth without an API key: Keychain, if you want it

Most AI test runners make you paste an API key into a dashboard. On macOS, if you already use Claude Code, Assrt does not need one. The function getCredential in src/core/keychain.ts checks ANTHROPIC_API_KEY first, then shells out to the macOS security binary and reads the OAuth access token Claude Code already saved under the service name "Claude Code-credentials". If that succeeds, the Anthropic SDK is initialised with an oauth-2025-04-20 beta header instead of an X-Api-Key header. You do nothing. On Linux, the function throws and asks for ANTHROPIC_API_KEY. That is the full auth surface.

src/core/keychain.ts (abridged)
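A sketch consistent with the behaviour described above. The real file may differ line for line, and the parsing of the Keychain payload is elided:

```typescript
import { execSync } from "node:child_process";

// Credential resolution, in the order this article describes:
// 1. ANTHROPIC_API_KEY env var wins.
// 2. On macOS, fall back to the OAuth token Claude Code saved in Keychain.
// 3. Otherwise, fail loudly and ask for the env var.
function getCredential(): { token: string; kind: "api-key" | "oauth" } {
  const key = process.env.ANTHROPIC_API_KEY;
  if (key) return { token: key, kind: "api-key" };
  if (process.platform === "darwin") {
    // Shells out to the macOS `security` binary; zero network round-trips.
    const raw = execSync(
      'security find-generic-password -s "Claude Code-credentials" -w',
      { encoding: "utf8" },
    ).trim();
    // The stored payload wraps the OAuth access token; exact parsing elided.
    return { token: raw, kind: "oauth" };
  }
  throw new Error("No ANTHROPIC_API_KEY set and no macOS Keychain available.");
}
```

The "oauth" kind is what switches the Anthropic SDK to the oauth-2025-04-20 beta header instead of X-Api-Key.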

This is an assrt-specific shortcut, but the pattern is the point: because the runtime is on your machine, it can read from your machine's secure storage. A SaaS AI test runner cannot read your macOS Keychain. It cannot use an OAuth token you have not explicitly handed over. Local trust is a capability that only makes sense when the runtime is open and local.

What the agent can actually do: 18 tools, no more

The set of actions the model is permitted to take is a closed, enumerable list. It lives in the TOOLS array at src/core/agent.ts starting at line 16. Every tool has an input_schema the model must satisfy. There is no plugin loader, no remote tool registry, no hidden tool you cannot see. Extending the agent means adding one record to the array and one branch in the switch at line 766.

Browser-facing tools (11)

  • navigate
  • snapshot
  • click
  • type_text
  • select_option
  • scroll
  • press_key
  • wait
  • wait_for_stable
  • screenshot
  • evaluate

Assertion and side-channel tools (7)

  • create_temp_email
  • wait_for_verification_code
  • check_email_inbox
  • assert
  • complete_scenario
  • suggest_improvement
  • http_request
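A sketch of what one record in that array looks like, based on the closed-list design described here. The field values are illustrative, not copied from agent.ts:

```typescript
// Shape of one TOOLS entry as this page describes the array in
// src/core/agent.ts: a name, a description, and a JSON Schema the
// model's arguments must satisfy.
type Tool = {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
};

// Illustrative record for the `assert` tool.
const assertTool: Tool = {
  name: "assert",
  description: "Record a pass/fail check against the current page state.",
  input_schema: {
    type: "object",
    properties: {
      condition: { type: "string" },
      passed: { type: "boolean" },
    },
    required: ["condition", "passed"],
  },
};
```

Extending the agent is the other half of this shape: one more record in the array, one more case branch in the dispatch switch.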

Listicles on this topic never enumerate a tool surface. They cannot, because for a closed runtime the list is not published. An open runtime is one where the list is a file you can read. The difference in auditability is the difference between knowing what the agent can do and hoping you know.

Brand-directory article vs. source-level guide

Same topic, two incompatible levels of resolution. Side by side, the gap looks like this.

Same topic, different resolution

  • What the article actually tells you. The usual listicle: a list of tool names with one-paragraph descriptions. This guide: the exact spawn command, dependency list, file paths, and tool surface.
  • Number of runtime dependencies. The usual listicle: usually unstated. This guide: exactly 6, listed in assrt-mcp package.json.
  • Where test plans are stored. The usual listicle: often a vendor-hosted database or YAML spec. This guide: /tmp/assrt/scenario.md as plain Markdown, fs.watch-synced.
  • Where results are stored. The usual listicle: a vendor dashboard, sometimes with an export button. This guide: /tmp/assrt/results/latest.json and /<runId>.json, both plain JSON.
  • Auth for the LLM. The usual listicle: paste an API key into the vendor UI. This guide: the ANTHROPIC_API_KEY env var, or the Claude Code OAuth token from macOS Keychain.
  • Ability to read the agent's perception policy. The usual listicle: closed, the agent is a black box. This guide: open, src/core/agent.ts lines 16-196 and the SYSTEM_PROMPT at line 198.

Numbers you can count in the source

Clone assrt-mcp, open the files, count. Every number below has a concrete referent.

6 runtime dependencies in package.json
18 agent tools defined in agent.ts
1 child process spawned per test run
0 cloud calls on the spawn path

Reading the runtime yourself

This page is a derivation of the source, not a summary of it. If you want to verify every claim, these are the exact landmarks.

Where each claim lives in assrt-mcp

  • TOOLS array at src/core/agent.ts line 16, 18 tools, each with a JSON Schema input_schema
  • SYSTEM_PROMPT at src/core/agent.ts line 198, one string that shapes the agent
  • Spawn command assembled at src/core/browser.ts line 296, passed to StdioClientTransport
  • Plan and results paths at src/core/scenario-files.ts lines 16-20
  • Keychain read at src/core/keychain.ts line 48, shells out to the macOS 'security' binary
  • Six runtime dependencies declared in assrt-mcp package.json (MIT licence in LICENSE)
  • Playwright MCP is the upstream subprocess, pinned to ^0.0.70 in package.json

Want a walk-through of your own open test run?

Bring a URL and a plan. We will spawn the subprocess, watch the tools fire, and point at every file the runner writes in real time.

Frequently asked questions

What does 'open source' actually mean when applied to an AI testing tool?

It means the code that drives the browser, the prompt that shapes the agent, and the policy that decides when to take a screenshot all live in a repository you can clone, read, patch, and run locally. In Assrt's case the entire runtime sits in assrt-mcp, which is MIT licensed and published to npm as assrt-mcp. There are six runtime dependencies declared in package.json: @anthropic-ai/sdk, @google/genai, @modelcontextprotocol/sdk, @playwright/mcp, posthog-node, and ws. No proprietary YAML DSL, no cloud-only control plane, no signup wall to read the code. You can grep the TOOLS array in src/core/agent.ts and see exactly which tools the model can call. When people say 'open source' in a listicle they often mean 'has a GitHub repo somewhere' without the guarantee you can audit and self-host the full loop. The distinction matters because the AI decisions are the thing you most want to audit.

What literally spawns on my machine when I run an assrt test?

A single Node.js process spawns @playwright/mcp's cli.js over stdio. The full argv assembled in src/core/browser.ts is: process.execPath followed by [cliPath, '--viewport-size', '1600x900', '--output-mode', 'file', '--output-dir', '~/.assrt/playwright-output', '--caps', 'devtools']. If you pass --isolated, '--isolated' is appended. Otherwise the command adds '--user-data-dir ~/.assrt/browser-profile' so cookies persist. If you pass --headed=false (the default), '--headless' is spliced in at position 1. That subprocess manages the actual Chromium. Your assrt-mcp process talks to it via StdioClientTransport from @modelcontextprotocol/sdk and speaks the MCP protocol. There is no cloud call on this path. The LLM call is the only network hop, and even that one can be local if you run it through an Anthropic-compatible proxy or a self-hosted Gemini endpoint.

Where does the test plan live? Can I edit it with a normal text editor?

Yes. The plan is written to /tmp/assrt/scenario.md as plain UTF-8 Markdown. Metadata sits next to it at /tmp/assrt/scenario.json. Results for the last run are at /tmp/assrt/results/latest.json and every historical run is dumped to /tmp/assrt/results/<runId>.json. These paths are defined as constants in src/core/scenario-files.ts. An fs.watch subscribes to scenario.md; if you edit the file in Cursor or VS Code, the change is debounced for one second and optionally synced to the cloud scenario store. Nothing about reading or editing the plan requires a browser, a login, or a dashboard. You can commit scenario.md to git alongside the app it tests. You can diff two runs by diffing two latest.json files. This plain-file surface is what people mean when they talk about treating tests as first-class code.

How many tools can the model call, and where are they defined?

Eighteen. They are declared as a const TOOLS array at the top of src/core/agent.ts, starting at line 16. In order of definition: navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable. Each one has a JSON Schema input_schema the model must satisfy. The SYSTEM_PROMPT at line 198 explains how to use them. There is no external registry, no plugin loader, no secret tool surface hidden behind a feature flag. If you want to add a tool, you add a record to TOOLS and a case branch in the switch at line 766. That is the whole extension mechanism, and that is a property you only get from an open-source runtime.

Can I run this without an API key if I already use Claude Code?

Yes, on macOS. src/core/keychain.ts calls 'security find-generic-password -s Claude Code-credentials -w' to read the OAuth access token Claude Code already stored in Keychain, then sends it to Anthropic with the 'oauth-2025-04-20' beta header. The getCredential function tries ANTHROPIC_API_KEY first, then falls back to the Keychain entry, and throws a clear error on Linux if neither is present. The implication for installing the tool is that a developer with Claude Code already logged in has zero additional auth to do. That behaviour depends on the tool reading from your local system, which is only something an open-source runtime can do with your trust. A SaaS test runner cannot read your Keychain on your behalf, and it cannot use a token you have not explicitly pasted into its dashboard.

What is the actual test output format? Is it locked to Assrt or can I grep it?

Plain JSON. The writeResultsFile function in src/core/scenario-files.ts writes a pretty-printed JSON report to /tmp/assrt/results/latest.json and a copy to /tmp/assrt/results/<runId>.json. The TestReport type in src/core/types.ts is the schema: url, scenarios, totalDuration, passedCount, failedCount, generatedAt, and each scenario has name, passed, steps, assertions, summary, duration. Steps themselves carry action, description, status, timestamp. There is no proprietary binary format, no encryption layer, no required SDK to parse the output. 'cat', 'jq', 'diff', and 'grep' work. This matters because when teams describe vendor lock-in in test tools they usually mean a proprietary test definition (like a YAML DSL) plus a proprietary result format that only renders in the vendor dashboard. Assrt exposes plain markdown in and plain JSON out.

How does the agent decide which element to click without being told?

It calls the 'snapshot' tool, which asks @playwright/mcp for the accessibility tree of the current page, written out to a file in ~/.assrt/playwright-output and returned as a text blob. Each interactive element is tagged with a ref ID like [ref=e5]. The system prompt instructs the model to cite the ref when it emits a click or type_text call. This is the standard Playwright MCP selector pattern; Assrt just wires the model's tool calls to it. If a ref goes stale because the page changed, the model re-snapshots and picks a fresh ref rather than guessing pixel coordinates. The behaviour is traceable because the snapshot output is literally saved to a file you can open: ~/.assrt/playwright-output/<uuid>.yml.
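An illustrative shape for one of those snapshot files; the real format is whatever upstream @playwright/mcp emits, so treat this as a sketch of the ref-tagging idea rather than the exact output:

```yaml
# ~/.assrt/playwright-output/<uuid>.yml (illustrative)
- heading "Welcome back" [ref=e1]
- textbox "Email" [ref=e4]
- button "Sign in" [ref=e5]
```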

Can I use this with my existing Chrome session, cookies and logins intact?

Yes, via the extension mode. Pass --extension (or extension: true to assrt_test via MCP) and the spawn command adds '--extension' instead of the user-data-dir flag. This connects to your live Chrome via the Playwright MCP Bridge extension using Chrome DevTools Protocol. The first time you run it, you will be prompted for a one-time token printed by 'npx @playwright/mcp@latest --extension' after you approve the connection in Chrome; Assrt saves that token to ~/.assrt/extension-token so future runs are zero-friction. Running against the real Chrome you sit in front of every day means you inherit your own cookies, your own 2FA state, your own Google sign-in. There is no parallel test-user-fleet to manage. Cookies stay on your laptop.

What is the licence, and can I fork it?

MIT. The LICENSE file in assrt-mcp reads 'MIT License, Copyright (c) 2026 Assrt'. You can fork it, change the system prompt, rip out the cloud sync, add your own tools, ship a binary to your own team. The only things Assrt does to 'check in with home' are the optional Firestore scenario store and optional PostHog telemetry; both are gated behind configuration and both can be disabled without touching the test loop. If you want a private, air-gapped AI test runner, you can build one by deleting the scenario-store import and pointing getCredential at your local LLM proxy. The licence does not constrain that, and the architecture does not either.
