Self-hosted AI testing, by the filesystem

Run tests locally, self-hosted: the four files an AI-driven run writes to your disk, and the one env var that keeps them there.

Most articles on this topic tell you to npm install Playwright and call it a day. That works when the runner is a single file. When the runner is an AI agent that receives a live accessibility tree and picks one of 18 tools per turn, "self-hosted" means something more specific: the browser, the scenario, the results, and the profile all have to live on your machine, and you need a kill switch for anything that tries to phone home. This guide shows you every path Assrt touches, every env var that matters, and the exact two lines of source that form the offline fallback.

Matthew Diakonov, Written with AI

Published April 23, 2026. 10 min read.



browser.ts line 286: spawns local Playwright MCP via stdio

browser.ts line 313: persistent profile at ~/.assrt/browser-profile

scenario-store.ts line 14: CENTRAL_API_URL = process.env.ASSRT_API_URL || default

scenario-files.ts line 93: local- prefix skips the sync watcher

Self-hosted, by the filesystem

Four directories. One env var. No cloud in the hot path.

~/.assrt/browser-profile — logins persist

/tmp/assrt/scenario.md — the plan you edit by hand

/tmp/assrt/results/latest.json — structured run output

ASSRT_API_URL='' — kills scenario sync at the root

local-<uuid> — the two-line offline fallback


What most pages on this topic actually tell you

Open any of the top existing write-ups for this question and you get the same four-step shape: npm install, write a playwright.config.ts, set the webServer option to your localhost, run npx playwright test. Great advice when the test artifact is a .spec.ts file a developer wrote by hand. None of it helps when the artifact is the runtime trace of an AI agent that never committed a test file.

runs leave your network
proprietary YAML you can't grep
seat-based pricing
cloud-only selectors
quota on test minutes
vendor-managed credentials
no local repro without a subscription
artifact lock-in

Those are the problems a hosted AI testing platform hands you the moment you want to run something against a private staging URL, an air-gapped laptop, or a regulated environment. Self-hosting is not about frugality. It is about keeping the scenario plan, the browser profile, and the run results on a disk you own.

The four directories a local Assrt run touches

These paths are not a cache. They are the canonical storage the agent itself reads and writes. Every interaction with a live test flows through them.

~/.assrt and /tmp/assrt — the entire on-disk surface

Each path, and what puts it there

~/.assrt/browser-profile

Persistent Chromium profile. Cookies, localStorage, service workers, IndexedDB. Survives reboots so logins stick across runs. Created by browser.ts line 313 on first launch.

/tmp/assrt/scenario.md

The #Case plan, written in plain markdown. Watched by fs.watch with a 1-second debounce. Edit it by hand while the agent is idle — your changes get picked up on the next run.

/tmp/assrt/results/latest.json

Last run's structured output: scenarios, assertions, timings, error evidence. Same schema the MCP tool returns. Tail it from your CI runner instead of screen-scraping the CLI.

~/.assrt/playwright-output

Where accessibility-tree snapshots land as .yml files so the MCP transport doesn't choke on 2MB Wikipedia-sized trees. Set via --output-mode file at browser.ts line 296.

~/.assrt/extension-token

One-time token for --extension mode. Written on first use, read on every subsequent run. Only needed if you want the agent to attach to your real Chrome session.

~/.assrt/scenarios/local-<uuid>.json

The offline-mode fallback. When scenario-store.ts line 124 hits its catch block, this is where the scenario gets written — with a 'local-' prefix so the watcher knows never to try to sync it.
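The paths listed above can be gathered into one small map, which is convenient when a CI script needs to locate them. This is a sketch, not code from Assrt: assrtPaths and its field names are hypothetical, and the home and temp directories are passed in as arguments rather than read from the environment.

```typescript
// Hypothetical helper (not part of Assrt): builds the on-disk paths
// described above from a home directory and a temp directory.
interface AssrtPaths {
  browserProfile: string;
  playwrightOutput: string;
  extensionToken: string;
  scenarioPlan: string;
  latestResults: string;
  localScenario: (uuid: string) => string;
}

function assrtPaths(homeDir: string, tmpDir: string = "/tmp"): AssrtPaths {
  // Strip trailing slashes so the joined paths never contain "//".
  const home = `${homeDir.replace(/\/+$/, "")}/.assrt`;
  const tmp = `${tmpDir.replace(/\/+$/, "")}/assrt`;
  return {
    browserProfile: `${home}/browser-profile`,
    playwrightOutput: `${home}/playwright-output`,
    extensionToken: `${home}/extension-token`,
    scenarioPlan: `${tmp}/scenario.md`,
    latestResults: `${tmp}/results/latest.json`,
    // Offline scenarios carry the local- prefix in the file name.
    localScenario: (uuid: string) => `${home}/scenarios/local-${uuid}.json`,
  };
}
```

On a real machine the first argument would typically be os.homedir() from Node's os module.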

The kill switch: one env var, two lines of source

The only external endpoint the MCP server talks to (beyond the Anthropic API) is the scenario store at app.assrt.ai. That endpoint is overridable with ASSRT_API_URL and, more importantly, it is optional. When the POST fails, the code path at scenario-store.ts line 124 generates a local-only UUID and writes the scenario to disk. A second check at scenario-files.ts line 93 then disables the sync watcher for any ID with a local- prefix. That is the entire offline mode.

assrt-mcp/src/core/scenario-store.ts

assrt-mcp/src/core/scenario-files.ts

The anchor fact, in one place

When the scenario store POST fails — whether because you unset ASSRT_API_URL, pointed it at an unreachable self-hosted endpoint, or blocked app.assrt.ai at the network layer — this exact catch block runs. Note the local- prefix on line 124:

} catch (err) {
  console.error("[scenario-store] Central save failed:",
                (err as Error).message);
  // Generate a local-only ID with a prefix so we know it's unsynced
  const crypto = await import("crypto");
  const localId = `local-${crypto.randomUUID()}`;
  writeLocal({ id: localId, plan: data.plan, ... });
  return localId;  // <-- line ~131
}

The prefix is not cosmetic. It is how the watcher in scenario-files.ts (line 93) decides never to attach to the file. One prefix, two files, a complete offline mode. Open the source after installing and verify it yourself: the relevant functions are saveScenario in scenario-store.ts and startWatching in scenario-files.ts.
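The two checks described above fit in a few lines. The sketch below is a simplified, synchronous stand-in and not the Assrt source: saveWithFallback, shouldAttachWatcher, and the injected postToCentral and newUuid callbacks are hypothetical names used only for illustration.

```typescript
// Simplified stand-in for the offline fallback. All names are hypothetical.
type Scenario = { plan: string };

// Try the central store first; on any thrown error, fall back to a
// local-only ID. postToCentral and newUuid are injected so the sketch
// runs without a network or a crypto dependency.
function saveWithFallback(
  data: Scenario,
  postToCentral: (data: Scenario) => string,
  newUuid: () => string,
): string {
  try {
    return postToCentral(data);
  } catch {
    return `local-${newUuid()}`;
  }
}

// Watcher-side guard: an ID minted offline is never synced.
function shouldAttachWatcher(scenarioId: string): boolean {
  return !scenarioId.startsWith("local-");
}
```

The design point is that the ID itself carries the sync decision, so no separate "offline" flag has to be stored or passed around.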

How the pieces connect during a self-hosted run

Three inputs flow into a single agent, which fans out to real services on your own infrastructure plus a single LLM call. Nothing in this diagram is a hosted Assrt component.

Self-hosted data flow — everything but the LLM call stays on your disk

Three launch modes, three levels of "local"

A CLI flag decides which of these modes you get. Every mode keeps the browser, the plan, and the results on your machine. The difference is how much state sticks around between runs.

Picking the right launch mode for a self-hosted run

Persistent profile (default)

Chromium profile at ~/.assrt/browser-profile survives reboots. Sign into Gmail, Shopify admin, or your staging dashboard once; every future run starts already authenticated. browser.ts line 313.

Isolated (no disk writes)

Pass --isolated. Browser profile lives in memory and dies with the process. Every run is a clean slate. Useful for untrusted apps or CI where every artifact is ephemeral. browser.ts --isolated flag.

Attach to real Chrome (--extension)

Pass --extension. The agent connects to your already-running Chrome (the one with your extensions, password manager, enterprise SSO session) via @playwright/mcp's extension bridge. First run saves a token to ~/.assrt/extension-token so subsequent runs just work.
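As a compact summary of the three modes, here is a small decision function. It is only a sketch: planLaunch and its types are hypothetical, and the precedence when both flags are passed (extension first) is an assumption, not something the source states.

```typescript
// Hypothetical summary of the three launch modes described above.
type LaunchMode = "persistent" | "isolated" | "extension";

interface LaunchFlags {
  isolated?: boolean;
  extension?: boolean;
}

interface LaunchPlan {
  mode: LaunchMode;
  // Path Assrt keeps on disk for this mode; null means nothing persists.
  stateOnDisk: string | null;
}

function planLaunch(flags: LaunchFlags): LaunchPlan {
  // ASSUMPTION: --extension wins if both flags are given.
  if (flags.extension) {
    return { mode: "extension", stateOnDisk: "~/.assrt/extension-token" };
  }
  if (flags.isolated) {
    return { mode: "isolated", stateOnDisk: null };
  }
  return { mode: "persistent", stateOnDisk: "~/.assrt/browser-profile" };
}
```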

The persistent profile is why authenticated apps are testable at all

Cookies and service-worker storage survive across runs because the Chromium profile lives in a stable directory on disk. That is what turns a 5-minute Gmail sign-in dance into a one-time setup. The singleton-lock cleanup below is the part most homegrown profile persistence implementations forget.

assrt-mcp/src/core/browser.ts

A first self-hosted run, end to end

This is the exact transcript you get when ASSRT_API_URL is set to an empty string and the agent points at a local dev server. Note the offline-mode log line: the scenario saves locally and keeps running.

assrt run --url http://localhost:3000 --plan-file tests/signup.md --video

Getting to your first self-hosted run in five steps

The shortest path from zero to a passing test against a private localhost URL, with zero outbound traffic except the Anthropic model call.

Install the CLI

npx @assrt-ai/assrt setup — registers the MCP server globally, installs a PostToolUse hook that nudges the agent after git commits, and appends a QA testing section to ~/.claude/CLAUDE.md. The install itself is local: the CLI lives inside node_modules.

Export ANTHROPIC_API_KEY

Or let the CLI pull a Claude Code OAuth token from the macOS Keychain. Either way, the model call goes straight from your machine to api.anthropic.com. No Assrt middle tier.

Write a #Case plan

A plan is markdown with #Case headers. Example: '#Case 1: Signup flow\nNavigate to /signup, fill the form with a disposable email, verify the dashboard heading.' Pipe it via stdin, paste it with --plan, or point at a file with --plan-file.

Run against localhost

assrt run --url http://localhost:3000 --plan-file tests/signup.md. The agent does an 8-second HEAD preflight (browser.ts), spawns Playwright MCP over stdio, and starts calling its 18 tools against your real DOM. No tunneling, no port forwarding, no WebSocket to a cloud runner.

Optional: kill scenario sync

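export ASSRT_API_URL="" is the switch this guide uses. One caveat worth verifying in your installed copy: if scenario-store.ts line 14 reads the variable with a || fallback, as quoted near the top of this page, an empty string is falsy in JavaScript and resolves to the default endpoint. Pointing the variable at an address that refuses connections, or blocking app.assrt.ai at the network layer, is the more reliable way to force the failure. Once the POST fails, every scenario save hits the catch block at scenario-store.ts line 123, gets a local- ID, and stays on disk. Useful for air-gapped laptops and regulated environments.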
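The "Write a #Case plan" step above describes the plan format as markdown with #Case headers. A minimal sketch of how such a file could be split into cases follows. parseCases is a hypothetical helper, and the header pattern (#Case, an optional number, an optional colon, then a title) is inferred from the single example in that step rather than taken from Assrt's own parser.

```typescript
// Hypothetical plan splitter; the header grammar is an assumption.
interface PlanCase {
  title: string;
  steps: string;
}

function parseCases(markdown: string): PlanCase[] {
  const cases: PlanCase[] = [];
  let current: { title: string; lines: string[] } | null = null;

  for (const line of markdown.split(/\r?\n/)) {
    const header = /^#\s*Case\b\s*\d*\s*:?\s*(.*)$/i.exec(line);
    if (header) {
      // A new header closes the previous case, if there was one.
      if (current) {
        cases.push({ title: current.title, steps: current.lines.join("\n").trim() });
      }
      current = { title: (header[1] ?? "").trim(), lines: [] };
    } else if (current) {
      current.lines.push(line);
    }
    // Lines before the first header are ignored.
  }
  if (current) {
    cases.push({ title: current.title, steps: current.lines.join("\n").trim() });
  }
  return cases;
}
```

Because the agent reads the plan as natural language, a strict parser like this is mainly useful for linting a plan file or counting cases before a run.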

Counting what leaves the box

Four concrete numbers, each verifiable in the Assrt MCP source.

1 external call per run (Anthropic)

18 agent tools, all local

8s local preflight timeout

$0/mo license fee

Self-hosted Assrt vs. a hosted AI testing platform

Where the artifacts live is the fault line between the two approaches. Everything downstream (pricing, auditability, offline support, regulated-environment fit) follows from that one decision.

Where things actually live

Feature	Hosted AI testing platform	Assrt (self-hosted)
Scenario plan	Proprietary YAML / DSL on their servers	/tmp/assrt/scenario.md (plain markdown)
Run results	Web dashboard behind login	/tmp/assrt/results/latest.json (structured)
Browser profile	Ephemeral worker VM per run	~/.assrt/browser-profile (persistent)
Video recording	Cloud-hosted viewer URL	127.0.0.1 local player, .webm on disk
Private localhost support	Tunnel or agent runner required	Direct — localhost IS the target
Offline / air-gapped mode	Unavailable	local-<uuid> fallback at scenario-store.ts line 124
Outbound endpoints per run	Control plane + browser worker + storage	One (Anthropic) once scenario sync is disabled
License cost	$1,000 to $7,500+ per month	MIT, free
Source readable	Closed	github.com/assrt-ai/assrt-mcp

What "self-hosted" actually buys you, feature by feature

Every item below maps to a named file or function in the MCP source. Install the package, open the file, verify the claim.

The real self-hosted surface, line by line

Spawns a local Playwright MCP subprocess — no remote runner
Browser profile at ~/.assrt/browser-profile survives reboots
Scenario plan is plain markdown at /tmp/assrt/scenario.md
Results are plain JSON at /tmp/assrt/results/latest.json
ASSRT_API_URL env var redirects (or kills) scenario sync
local- prefixed IDs skip the sync watcher entirely (scenario-files.ts line 93)
Video recording stays at a 127.0.0.1 player URL — not a cloud viewer
LLM call goes to Anthropic directly, no Assrt proxy
MIT license; npm install, no signup to run

Want to run this against your private staging URL, live?

Bring a localhost dev server, an air-gapped laptop, or a regulated environment. We will walk through every path the agent touches on your disk and show you the kill switch in source.

Run-tests-locally questions, answered from the source

What exactly does 'self-hosted' mean when the runner is an AI agent, not a pre-written test file?

It means three things stay on your machine. First, the browser: spawned via @playwright/mcp version 0.0.70 as a local stdio subprocess (browser.ts line 286 logs 'spawning local Playwright MCP via stdio'). Second, the scenario text and all run results: written to /tmp/assrt/scenario.md and /tmp/assrt/results/latest.json (scenario-files.ts lines 17 to 22). Third, the persistent browser profile with cookies and logins: ~/.assrt/browser-profile (browser.ts line 313). The only outbound traffic on a default run is the LLM call to Anthropic and, if you leave ASSRT_API_URL unset, an optional scenario sync to app.assrt.ai that silently falls back to local-only when unreachable.

How do I stop the scenario plan from being uploaded anywhere?

Either set ASSRT_API_URL to a host you control (your own self-hosted scenario store), or leave it unset and block app.assrt.ai at the network layer. scenario-store.ts line 14 reads the env var with a fallback to the default endpoint, and any failed save (a non-200 response or a refused connection) trips the catch block at line 123. That block generates an ID with a local- prefix, writes to ~/.assrt/scenarios/local-<uuid>.json, and returns. The watcher at scenario-files.ts line 93 then checks scenarioId.startsWith('local-') and returns early, so the 1-second debounced sync loop never fires. Two lines of source form a complete offline mode.
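The fallback on line 14 deserves a closer look, because || in JavaScript treats an empty string the same as a missing value. The sketch below mirrors the quoted expression with a hypothetical resolveApiUrl helper. The default URL string is a placeholder built from the hostname mentioned above, not a value copied from the source.

```typescript
// Placeholder default; the real constant lives in scenario-store.ts.
const DEFAULT_API_URL = "https://app.assrt.ai";

// Mirrors `process.env.ASSRT_API_URL || default`, with the environment
// passed in as a plain object so the behavior is easy to inspect.
function resolveApiUrl(env: Record<string, string | undefined>): string {
  // `||` falls through on undefined AND on "", since both are falsy.
  return env["ASSRT_API_URL"] || DEFAULT_API_URL;
}
```

Under this reading, an unset or empty variable still resolves to the default host, so the offline path depends on the request actually failing: point the variable at an address that refuses connections, or block the default host at the network layer.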

Where do my test results actually live on disk?

Three places, all predictable: /tmp/assrt/scenario.md holds the plan text, /tmp/assrt/scenario.json holds metadata (id, name, url, updatedAt), and /tmp/assrt/results/latest.json holds the most recent run's structured output. Historical runs are keyed by runId at /tmp/assrt/results/<runId>.json. These are the paths the agent itself reads and writes — not a separate cache or export layer. If you want to pipe results into a CI system, tail the files directly.
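For a CI step, reading latest.json can be reduced to an exit code. The sketch below is illustrative only. The article says the file holds scenarios, assertions, timings, and error evidence, but does not give field names, so the scenarios array and its passed boolean are a hypothetical schema. Check the real file before relying on it.

```typescript
// HYPOTHETICAL schema: field names below are guesses, not Assrt's format.
interface ScenarioOutcome {
  name: string;
  passed: boolean;
}

// Returns 0 if every scenario passed, 1 if any failed,
// and 2 if the text is not usable (bad JSON, wrong shape, no scenarios).
function exitCodeFor(jsonText: string): number {
  let parsed: unknown;
  try {
    parsed = JSON.parse(jsonText);
  } catch {
    return 2;
  }
  if (typeof parsed !== "object" || parsed === null) return 2;

  const scenarios = (parsed as { scenarios?: unknown }).scenarios;
  if (!Array.isArray(scenarios) || scenarios.length === 0) return 2;

  const allPassed = scenarios.every(
    (s) =>
      typeof s === "object" &&
      s !== null &&
      (s as Partial<ScenarioOutcome>).passed === true,
  );
  return allPassed ? 0 : 1;
}
```

In a script, the input would come from reading /tmp/assrt/results/latest.json with fs.readFileSync(path, "utf8").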

How does Assrt avoid re-logging-in every time it tests a page that requires auth?

Default mode persists a full Chromium profile at ~/.assrt/browser-profile. browser.ts line 313 creates the directory and passes it to Playwright MCP via --user-data-dir, so cookies, localStorage, and session tokens survive across test runs. Sign into Gmail once, and the next 50 test runs against Gmail start already authenticated. If you prefer zero persistence, pass --isolated and the profile is in-memory only. If you want to use your real Chrome (with your real profile, real extensions, real bookmarks), pass --extension and the agent attaches to your running Chrome instance via a one-time token saved at ~/.assrt/extension-token.

What is the 'local-' prefix in the scenario ID and why does it matter for self-hosting?

It is a marker the watcher uses to decide whether to sync. When scenario-store.ts saveScenario() fails to POST to the central API (lines 107 to 131), it generates an ID using crypto.randomUUID() prefixed with 'local-', writes the scenario to ~/.assrt/scenarios/<id>.json via writeLocal(), and returns the local ID. Then whenever the scenario file is loaded or saved, scenario-files.ts line 93 short-circuits: 'if (scenarioId.startsWith("local-")) return'. No watcher attaches, no debounced sync fires, no network call happens. You can see the local- prefix in the scenario JSON file itself.
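The interaction between the prefix check and the 1-second debounce can be sketched as follows. This is not the startWatching implementation: makeChangeHandler and the injected schedule function are hypothetical, and the timer is injected so the behavior can be exercised without waiting on a real clock.

```typescript
// Hypothetical sketch: a debounced sync trigger with the local- guard.
type Cancel = () => void;
type Schedule = (fn: () => void, delayMs: number) => Cancel;

function makeChangeHandler(
  scenarioId: string,
  sync: () => void,
  schedule: Schedule,
  delayMs: number = 1000,
): () => void {
  // Offline IDs get a no-op handler, so no sync can ever fire.
  if (scenarioId.startsWith("local-")) return () => {};

  let cancelPending: Cancel | null = null;
  return () => {
    // A new change resets the timer; only the last change in a burst syncs.
    if (cancelPending) cancelPending();
    cancelPending = schedule(() => {
      cancelPending = null;
      sync();
    }, delayMs);
  };
}
```

With real timers, schedule would wrap setTimeout and return a function that calls clearTimeout on the handle.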

Can I run the whole stack without Chrome ever launching a visible window?

Yes. The default launch mode is headless. browser.ts line 348 inserts '--headless' into the Playwright MCP args whenever headed is false. So 'assrt run --url ...' with no extra flags runs fully offscreen. Pass --headed to see what the agent sees, --video to record the whole thing to a .webm plus an auto-opening local 127.0.0.1 player, or --extension to attach to your already-running Chrome. Video recording goes through the devtools capability in the Playwright MCP, and the recording stays on your disk at the path printed in the run log.

Do I need an API key, and if so, where does it go?

You need one LLM key: ANTHROPIC_API_KEY env var, or a Claude Code OAuth token the CLI pulls from the macOS Keychain automatically (cli.ts). The agent runs Claude Haiku 4.5 by default (agent.ts line 9: DEFAULT_ANTHROPIC_MODEL = 'claude-haiku-4-5-20251001') and calls Anthropic directly from your machine. There is no Assrt proxy, no Assrt-issued key, no vendor-managed credential. Your dev server is still localhost; only the model call leaves the box.

What's actually in the 18-tool agent surface, and can I verify it without trusting this page?

Eighteen tools defined in the TOOLS array between agent.ts lines 16 and 196: navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable. No hidden SDK, no proprietary YAML schema, no cloud-only capabilities. Open the file after installing (node_modules/assrt-sdk/dist/cli.mjs if you npx-installed it, or the GitHub source directly) and count.
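The count can be checked mechanically. The array below is copied from the names in the paragraph above. TOOL_NAMES is a hypothetical constant for this check, not the TOOLS array from agent.ts, which holds full tool definitions rather than bare names.

```typescript
// Names copied from the list above; this is not the source's TOOLS array.
const TOOL_NAMES: string[] = [
  "navigate", "snapshot", "click", "type_text", "select_option",
  "scroll", "press_key", "wait", "screenshot", "evaluate",
  "create_temp_email", "wait_for_verification_code", "check_email_inbox",
  "assert", "complete_scenario", "suggest_improvement",
  "http_request", "wait_for_stable",
];

// A Set drops duplicates, so equal sizes mean every name is distinct.
const uniqueToolCount: number = new Set<string>(TOOL_NAMES).size;
```

Running the same comparison against the names extracted from the installed agent.ts is a quick way to notice when a new release adds or removes a tool.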

How does this compare to running a hosted AI testing platform?

The hosted platforms in this category price somewhere between $1,000 and $7,500+ per month for midmarket tiers and produce a proprietary test artifact (YAML, DSL, their own spec format) that you can't run anywhere else. Assrt is MIT licensed (LICENSE file in assrt-mcp root), npm-installable, and the 'test artifact' is a markdown #Case plan you edit as plain text. The only ongoing cost is the Anthropic invoice for Haiku calls during runs, typically a few cents per scenario. Zero per-seat fee, zero platform fee, zero vendor lock-in.