Automation Guide

AI-Powered Agentic Test Execution with Automation: When Your Coding Agent Runs Its Own QA

By the Assrt team · 12 min read

Every "agentic testing" page on the internet describes a standalone platform. You sign up, you configure a project, you write tests inside the platform, and the platform runs them. The AI part lives inside their walls.

This guide describes a different architecture. Assrt runs as an MCP (Model Context Protocol) server. That means your existing AI coding agent (Claude Code, Cursor, Windsurf, or anything that speaks MCP) can call assrt_test as a tool, read the structured result, and act on it. The coding agent becomes the orchestrator. It writes code, runs tests, reads failures, and fixes bugs in a single automated loop. No dashboard. No browser tab. No human handoff between "development" and "testing."

$0/mo

Free, open-source MCP server. No monthly subscription, no per-seat pricing, no vendor lock-in.

Assrt vs $7.5K/mo platforms

1. The Handoff Problem in Agentic Testing

Agentic test execution is a real advance. Instead of brittle CSS selectors, an AI agent reads the page, reasons about what to do, and recovers from unexpected states. That is genuinely useful. But every commercial platform that offers this capability packages it as a separate product: a dashboard, a project, a billing plan, an API key, and a separate context for understanding your application.

That packaging creates a handoff. Your coding agent (Claude Code, Cursor, Copilot) writes a feature. Then you, the human, switch context: open the testing platform, configure a test, run it, read the results, go back to the code, explain the failure to your coding agent, and ask it to fix the bug. You are the message bus between two AI systems that could talk to each other directly.

The "automation" in "agentic test execution with automation" should mean eliminating that handoff. The coding agent should be able to run tests, read structured results, and fix problems without waiting for a human to copy error messages between windows.

2. How Assrt Exposes Test Execution as MCP Tool Calls

MCP (Model Context Protocol) is a standard for connecting AI agents to external tools. An MCP server exposes a set of callable tools with typed parameters and structured return values. Any MCP client (Claude Code, Cursor, Windsurf, custom scripts) can discover and call those tools without custom integration code.

Assrt implements an MCP server in server.ts using the McpServer class from @modelcontextprotocol/sdk. It runs on stdio transport, which means the host application spawns it as a child process and communicates over standard input/output. No HTTP server, no open port, no authentication handshake.

The entire server starts with one line in your MCP configuration:

npx assrt-mcp

That is the entry point. When your coding agent starts a session, it discovers the four tools Assrt exposes, their parameter schemas (defined with Zod), and their descriptions. From that point on, the coding agent can call assrt_test or assrt_plan the same way it calls any other tool, like reading a file or running a shell command.

No platform to sign up for

Assrt is a local MCP server. Install it, add one entry to your coding agent's MCP config, and run tests from your terminal. Free and open-source.

Get Started

3. The Four Tools: Plan, Test, Diagnose, Analyze

The MCP server registers exactly four tools. Each one does one thing. Together they form a complete test automation workflow that a coding agent can orchestrate without human intervention.

| Tool | Purpose | Key Parameters |
| --- | --- | --- |
| assrt_plan | Navigate to a URL, analyze the page, generate test cases automatically | url |
| assrt_test | Execute test scenarios against a live URL with a real browser | url, plan or scenarioId |
| assrt_diagnose | Analyze a failed test: distinguish app bug from flawed test from environment issue | url, scenario, error |
| assrt_analyze_video | Analyze a test recording using Gemini vision to review what the agent did | video file path |

The assrt_test tool is the workhorse. It accepts either a plan (free-form text with #Case N: markers) or a scenarioId (UUID from a previous run). It returns a structured TestReport with pass/fail per case, individual assertion results, improvement suggestions, and screenshots. The coding agent parses this report and decides what to do next.
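To make "the coding agent parses this report" concrete, here is a sketch of how an agent might extract failure evidence from a TestReport. The field names (cases, assertions, expected, actual) are assumptions for illustration, not the documented schema:

```typescript
// Sketch: consuming an assrt_test result. The TestReport shape below is
// an assumption for illustration, not Assrt's documented schema.
interface AssertionResult {
  description: string;
  passed: boolean;
  expected?: string;
  actual?: string;
}

interface TestCaseResult {
  name: string;
  passed: boolean;
  assertions: AssertionResult[];
}

interface TestReport {
  scenarioId: string;
  cases: TestCaseResult[];
}

// Collect every failed assertion so the agent can reason about a fix.
function failedAssertions(report: TestReport): AssertionResult[] {
  return report.cases
    .filter((c) => !c.passed)
    .flatMap((c) => c.assertions.filter((a) => !a.passed));
}

// Example report with one failing case.
const report: TestReport = {
  scenarioId: "123e4567-e89b-12d3-a456-426614174000",
  cases: [
    {
      name: "User signup",
      passed: false,
      assertions: [
        {
          description: "redirects to dashboard",
          passed: false,
          expected: "/dashboard",
          actual: "/signup",
        },
      ],
    },
  ],
};
```

Because the failure data is structured rather than a prose summary, the agent can compare expected against actual values directly instead of parsing log text.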

The assrt_diagnose tool is what makes the loop self-correcting. When a test fails, the coding agent does not need to guess whether the failure is a real bug or a flawed test case. It sends the failure to assrt_diagnose, which classifies the root cause into three categories: application bug, bad test case, or environment issue. It also returns a corrected test case that can be re-run immediately.
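The three root-cause categories each imply a different next step for the coding agent. A minimal sketch of that branching, where the category values and field names are assumptions based on the description above:

```typescript
// Sketch: branching on an assrt_diagnose result. The category strings and
// the Diagnosis shape are assumptions, not Assrt's documented schema.
type DiagnosisCategory = "application_bug" | "bad_test_case" | "environment_issue";

interface Diagnosis {
  category: DiagnosisCategory;
  explanation: string;
  correctedTestCase?: string; // present when the test itself was flawed
}

function nextAction(d: Diagnosis): string {
  switch (d.category) {
    case "application_bug":
      return "fix the code, then re-run assrt_test with the same scenarioId";
    case "bad_test_case":
      return d.correctedTestCase
        ? "re-run assrt_test with the corrected test case"
        : "rewrite the failing case in the scenario file";
    case "environment_issue":
      return "restart the dev server or check the URL, then re-run";
  }
}
```

The point of the classification is that only one of the three branches requires touching application code; the other two can be resolved without a "fix" at all.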

4. The Closed Loop: Code, Test, Fix, Repeat

This is where "agentic test execution with automation" becomes something concrete instead of a marketing phrase. Here is the loop a coding agent runs when it has access to Assrt as an MCP tool:

  1. The agent implements a feature or bug fix by editing files in the project.
  2. The agent calls assrt_test with the local dev server URL and a test plan describing what the feature should do.
  3. Assrt launches a real browser, runs the test agent (an LLM that navigates the page using accessibility tree snapshots), and returns a structured report with pass/fail results and screenshots.
  4. If all tests pass, the coding agent moves on (commits, reports to the user, starts the next task).
  5. If a test fails, the coding agent reads the failure details from the report. It can optionally call assrt_diagnose to classify the root cause.
  6. The coding agent fixes the code based on the failure evidence and re-runs assrt_test with the same scenario ID.
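The six steps above reduce to a small driver loop. In this sketch, callTool, applyFix, and the report shape are stand-ins for whatever the host agent actually does; only the control flow mirrors the steps:

```typescript
// Sketch of the outer agent's code-test-fix loop. callTool, applyFix, and
// Report are illustrative stand-ins, not Assrt's real interfaces.
interface Report {
  passed: boolean;
  failures: string[];
  scenarioId: string;
}

async function codeTestFixLoop(
  callTool: (name: string, args: Record<string, unknown>) => Promise<Report>,
  applyFix: (failures: string[]) => Promise<void>,
  url: string,
  plan: string,
  maxAttempts = 3,
): Promise<boolean> {
  // First run sends the plan; later runs reuse the scenarioId it produced.
  let report = await callTool("assrt_test", { url, plan });
  for (let attempt = 1; !report.passed && attempt < maxAttempts; attempt++) {
    await applyFix(report.failures); // the agent edits the code
    report = await callTool("assrt_test", {
      url,
      scenarioId: report.scenarioId, // re-run the same scenario
    });
  }
  return report.passed;
}
```

Bounding the loop with maxAttempts matters: without it, a misclassified failure could keep the agent re-running forever.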

The human is not in this loop. The coding agent is the orchestrator. Assrt is a tool the agent calls, not a separate platform the human monitors. This is the difference between "AI-powered testing" (a platform that uses AI internally) and "agentic test execution with automation" (test execution that is composable with other AI workflows).

Because assrt_test returns structured data (not a dashboard link), the coding agent can programmatically parse which assertions failed, what the expected vs. actual behavior was, and what the page looked like at the point of failure. It has everything it needs to make an informed fix.

Your coding agent already knows how to call tools

Add Assrt as an MCP server and your agent gains the ability to run real browser tests. No new workflow to learn.

Get Started

5. What Happens Inside a Single assrt_test Call

When your coding agent calls assrt_test, a lot happens behind a single tool invocation. Understanding this is important because the caller just sees "tool call in, report out" but the internal execution is itself an agentic loop.

  1. The server writes the test plan to /tmp/assrt/scenario.md and pre-saves a scenario UUID to cloud storage for deterministic artifact URLs.
  2. It selects a browser mode: local Playwright (spawns a new browser process with video recording), or an existing remote VM with Playwright MCP already running, selected based on whether ASSRT_PLAYWRIGHT_SSE_URL is set.
  3. A TestAgent instance is created with the chosen LLM (default: Claude Haiku). This is the inner agent that actually drives the browser.
  4. The inner agent enters its execution loop: for each test case, it reads the accessibility tree, reasons about what action to take, executes it, observes the result, and repeats for up to 60 steps per scenario.
  5. Every action is logged with duration (e.g., [mcp] browser_click el="Sign In" (243ms)). Screenshots are captured after visual actions and saved to the run directory.
  6. When all cases complete, the browser closes (finalizing the video recording), results are written to /tmp/assrt/results/latest.json, and the structured TestReport is returned to the calling agent.
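Step 4 is itself a bounded agentic loop: observe, decide, act, repeat. A minimal sketch of that inner loop, with the accessibility-tree snapshot, the LLM call, and the browser action abstracted into stand-in functions (the 60-step cap comes from the text above):

```typescript
// Sketch of the inner test agent's per-case loop. observe/decide/act are
// stand-ins for the accessibility-tree snapshot, the LLM decision, and the
// browser action; only the loop structure mirrors the description above.
interface Action {
  kind: "click" | "fill" | "navigate" | "done";
  target?: string;
}

function runCase(
  observe: () => string,                // accessibility tree snapshot
  decide: (snapshot: string) => Action, // LLM chooses the next action
  act: (action: Action) => void,        // execute the action in the browser
  maxSteps = 60,                        // per-scenario step cap
): boolean {
  for (let step = 0; step < maxSteps; step++) {
    const action = decide(observe());
    if (action.kind === "done") return true; // case completed
    act(action);
  }
  return false; // step budget exhausted without completing the case
}
```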

The key architectural detail: there are two AI agents in this system. Your coding agent (Claude Code, Cursor, etc.) is the outer agent that decides when to run tests and what to do with results. The test execution agent (Claude Haiku inside Assrt) is the inner agent that navigates the browser. They communicate through a structured tool interface, not through natural language prompts pasted between windows.

6. The File System Contract: Scenarios and Results on Disk

One design choice that separates Assrt from cloud-only testing platforms: every test run produces files on your local disk that any tool can read. This is deliberate. The MCP protocol handles tool call I/O, but sometimes the coding agent needs to read or modify test scenarios outside of a tool call.

| File | Contents | Editable? |
| --- | --- | --- |
| /tmp/assrt/scenario.md | The test plan in Markdown with #Case N: markers | Yes, changes sync to cloud |
| /tmp/assrt/scenario.json | Scenario metadata: UUID, name, URL | Read-only reference |
| /tmp/assrt/results/latest.json | Structured pass/fail results from the last run | Read-only reference |
| /tmp/assrt/<runId>/video/ | Video recording of the test run with cursor overlays | Read-only artifact |

The scenario file is the most important one. It is plain Markdown. Your coding agent can read it to understand what was tested, edit it to add or remove test cases, and the changes sync back to cloud storage. The scenario ID persists across runs, so you can re-execute the same scenario months later by passing scenarioId instead of a plan.

This file-based contract means you are never locked in. The test plan is a text file you own. The results are a JSON file you can pipe to any dashboard. The video is a standard recording. Nothing requires Assrt to be running in order to read, share, or archive your test artifacts.
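As a concrete example of that portability, here is a sketch of summarizing the last run from the results file with no Assrt process involved. The report shape (a cases array with passed flags) is an assumption; the real JSON fields may differ:

```typescript
// Sketch: reading the last run's results straight from disk, outside any
// MCP tool call. The report field names (cases, passed) are assumptions.
import { readFileSync } from "node:fs";

interface CaseResult {
  name?: string;
  passed: boolean;
}

// Pure summary over a parsed report, so it is easy to test and reuse.
function summarize(report: { cases: CaseResult[] }): string {
  const passed = report.cases.filter((c) => c.passed).length;
  return `${passed}/${report.cases.length} cases passed`;
}

// Convenience wrapper over the documented results path.
function summarizeLatest(path = "/tmp/assrt/results/latest.json"): string {
  return summarize(JSON.parse(readFileSync(path, "utf8")));
}
```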

7. Setting It Up

Adding Assrt to your coding agent takes one configuration entry. Here is how it works with Claude Code (other MCP clients follow the same pattern):

Step 1: Add the MCP server

Add this to your Claude Code MCP settings (either project-level .mcp.json or global config):

{
  "mcpServers": {
    "assrt": {
      "command": "npx",
      "args": ["assrt-mcp"]
    }
  }
}

Step 2: Start your dev server

Make sure your application is running locally. Assrt tests against a live URL, so the dev server needs to be up before you run tests.

Step 3: Ask your agent to test

Tell your coding agent what to test. It will call assrt_test with the URL and a test plan, or use assrt_plan to auto-generate cases from the page. Example:

assrt_test({
  url: "http://localhost:3000",
  plan: "#Case 1: User signup\n" +
        "Navigate to /signup\n" +
        "Fill in email and password\n" +
        "Click Create Account\n" +
        "Verify redirect to dashboard"
})

The agent handles browser launch, page navigation, element targeting, stability waiting, and pass/fail verification. You get back a structured report with individual assertion results and screenshots.

Frequently Asked Questions

What coding agents work with Assrt as an MCP server?

Any MCP-compatible client. Claude Code, Cursor, Windsurf, and custom agents built with the MCP SDK all work. The server uses stdio transport, which is the most widely supported MCP transport. If your tool can spawn a child process and communicate over stdin/stdout in JSON-RPC, it can call Assrt.

How is this different from running Playwright tests in CI?

CI tests run after you push. Assrt runs before you push, during development, as part of the coding agent's workflow. The coding agent can run a test, read the failure, fix the code, and re-run the test, all before making a commit. CI catches regressions after the fact. Assrt catches them in the development loop itself.

Does the coding agent need to write Playwright code?

No. The coding agent writes test plans in plain text (#Case 1: Login flow, navigate to /login, fill email...). The inner test agent (Claude Haiku) translates those natural language instructions into browser actions using accessibility tree snapshots and ref-based element targeting. No selectors, no page objects, no test framework boilerplate.

What LLM does the inner test agent use?

Claude Haiku by default (claude-haiku-4-5-20251001). You can override this with the model parameter on assrt_test. Haiku is chosen because test execution needs fast, cheap inference (dozens of steps per scenario) rather than deep reasoning. A typical 10-step test costs fractions of a cent in inference.

Can I re-run the same test scenario without rewriting the plan?

Yes. Every test run generates a UUID. Pass that UUID as scenarioId to assrt_test instead of a plan. The server fetches the saved scenario from cloud storage and re-executes it. You can also edit the plan file at /tmp/assrt/scenario.md and the changes sync back to the cloud automatically.

Is there vendor lock-in?

No. Test plans are plain Markdown files. Results are JSON. Videos are standard recordings. The MCP server is open-source. Nothing requires Assrt to be running in order to read, share, or archive your test artifacts. If you stop using Assrt, your scenarios and results remain accessible as ordinary files.

How does the inner agent handle pages that load content asynchronously?

The agent uses a wait_for_stable mechanism that attaches a MutationObserver to the DOM. It monitors child list mutations, subtree changes, and character data updates, polling every 500ms. When mutations stop for 2 seconds (configurable), the page is considered stable. This adapts automatically to both fast and slow pages without requiring per-page timeout configuration. The maximum wait is 60 seconds to prevent infinite hangs on pages with continuous animations.
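The stability check described above reduces to: track mutation timestamps, declare the page stable after a quiet period, and give up at a hard cap. A minimal sketch of that decision logic, with the MutationObserver abstracted into a list of mutation times (the 2-second quiet window and 60-second cap come from the answer above; the function itself is illustrative, not Assrt's implementation):

```typescript
// Sketch of the wait_for_stable decision logic, with DOM observation
// abstracted into mutation timestamps (ms since page load). Returns the
// time at which the page would be declared stable, or null if the maxWait
// cap would be hit first (e.g. continuous animations).
function stableAt(
  mutationTimes: number[],
  quietMs = 2000,   // mutations must stop for this long
  maxWaitMs = 60000, // hard cap to prevent infinite hangs
): number | null {
  let lastMutation = 0; // page load counts as the first "mutation"
  for (const t of [...mutationTimes].sort((a, b) => a - b)) {
    if (t - lastMutation >= quietMs) break; // quiet window already elapsed
    lastMutation = t;
  }
  const stable = lastMutation + quietMs;
  return stable <= maxWaitMs ? stable : null;
}
```

The appeal of this design is that fast pages are declared stable almost immediately, while slow pages simply push the stability point later, with no per-page timeout tuning.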

Add agentic test execution to your coding workflow. One config line.

Assrt is a local MCP server. Your coding agent calls assrt_test, reads structured results, and fixes failures. No dashboard, no subscription, no lock-in.

$ npx assrt-mcp