Service test automation

Service test automation where the test file is a prompt, not a spec folder.

Every other article on this keyword reads like the same 2018 framework decision tree: pick data-driven or BDD or keyword-driven, commit to Selenium or Cypress or Playwright, spin up a Page Object Model, wire it into CI, budget for a maintenance tax. That story assumes the test is a codebase. Assrt assumes it is a paragraph. One markdown file of English #Case blocks. An LLM agent reads it at runtime and drives real Chromium through exactly ten tools registered in a single TOOLS array. This page is about those ten tools and the file that replaces your test suite.

Matthew Diakonov
11 min read
Open-source, self-hosted, MIT
One plan.md replaces the spec folder, page objects, and DSL
Real Chromium via @playwright/mcp — the same package you would use directly
Free and self-hosted versus $500-15K/mo managed platforms

The claim in one paragraph

The spec file is the bottleneck of service test automation. Assrt deletes it. The tests are two-line English paragraphs; the runner is an LLM agent driving real Playwright under the hood.

Because the agent re-reads the plan and re-picks refs from a live accessibility tree on every run, a button rename does not break anything. Because the runner is the official @playwright/mcp package, the browser behavior matches what a Playwright script would do. Because the entire input is English, the coding agent that wrote your service can write the #Cases, too.

What every other guide on this keyword tells you to pick

The SERP for "service test automation" is a decision tree. Framework type, runner, assertion library, CI orchestrator, reporting layer. Each of the boxes below is a real choice a real team has debated. Assrt replaces the whole tree.

Data-driven · Keyword-driven · BDD / Gherkin · Page Object Model · Selenium · Cypress · Playwright specs · ServiceNow ATF · Karate DSL · REST Assured · Postman collections · UiPath Studio · Assrt (one plan.md, ten tools)

The incumbents all assume the test is a codebase you own and maintain. The framework picks are about how that codebase should be structured. Assrt starts from a different axiom: the test is the prompt, and the prompt is versioned as a single markdown file.

Four numbers that describe the whole stack

Everything downstream of this page is derivable from these four constants.

10 — Tools in the single TOOLS array the agent sees
$0 — Vendor cost. MIT license, self-hosted
1 — Markdown file is the entire test suite
5x — Default playback speed on the auto-opening video

10 tools is the full vocabulary the agent composes from.
$0 platform cost for the job that $7.5K/mo managed platforms charge for.
1 file is your entire test suite. It sits next to README.md.
5x playback is the default when the video auto-opens in your browser.

The anchor: exactly ten tools in one TOOLS array

This is the hinge of the entire design. The agent has a finite, inspectable vocabulary. Nine tools drive the browser; one fires an HTTP request. Every English sentence you write has to express itself as a sequence of these ten. If you can imagine the call stack, you can predict the behavior.

assrt-mcp/src/core/agent.ts:14-196

MAX_STEPS_PER_SCENARIO is Infinity because a real end-to-end flow can take dozens of actions: open a modal, upload a file, wait for OCR, pay with Stripe, refresh, assert the confirmation email. There is no artificial budget the way there is with many AI test tools. The agent keeps going until a scenario completes or fails.
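The TOOLS source is not reproduced on this page, but its shape is stated: each entry is a plain object with name, description, and input_schema, the format the Anthropic SDK expects for tool definitions. The sketch below shows two illustrative entries in that shape; the tool names snapshot and http_request appear in the run trace on this page, while the specific input_schema fields are assumptions for illustration, not the actual assrt-mcp source.

```typescript
// Illustrative sketch only -- not the actual assrt-mcp TOOLS array.
// Each tool is a plain object with name, description, and input_schema,
// the shape the Anthropic SDK expects. The schema fields are assumptions.
type ToolDef = {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
};

const TOOLS: ToolDef[] = [
  {
    name: "snapshot",
    description: "Return the live accessibility tree with [ref=eN] handles.",
    input_schema: { type: "object", properties: {} },
  },
  {
    name: "http_request",
    description: "Fire a direct HTTP request and return status + body.",
    input_schema: {
      type: "object",
      properties: { method: { type: "string" }, url: { type: "string" } },
      required: ["method", "url"],
    },
  },
];

// A finite, inspectable vocabulary: the agent can only ever emit these names.
const vocabulary = TOOLS.map((t) => t.name);
console.log(vocabulary);
```

The point of the fixed array is that every run is auditable: any tool_use the model emits must match one of these names, so the full behavior space is enumerable in advance.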

The file that replaces your test suite

This is a real plan.md. Three #Cases, a UI flow, an external webhook check, a rate-limit assertion. No imports. No page objects. No base URLs hardcoded. The only file you version.

plan.md
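The original file is not reproduced here. The sketch below shows what a plan.md in this style could look like, following the page's description (three #Cases: a UI flow, an API side-effect check, a rate-limit assertion). The #Case heading convention and the "GET /api/orders and assert response.orders.length == 1" phrasing come from this page; the specific flows, endpoints, and the {{VAR}} placeholder syntax are hypothetical.

```markdown
#Case 1: New signup
Open /signup, type {{TEST_EMAIL}} into the email field, and submit the form.
Wait for the text "Check your inbox" to appear on the page.

#Case 2: Order side effect
GET /api/orders and assert response.orders.length == 1.
Assert the newest order has a pending status.

#Case 3: Rate limit
Send six rapid GET requests to /api/orders.
Assert the last response status is 429.
```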

The trace of one run

This is what npx assrt run actually prints. Note that each browser action is preceded by a fresh snapshot, and http lines sit next to click lines in the same case.

assrt run --plan-file plan.md --video

From English paragraph to real browser action

The runtime shape of one #Case. Three actors, one coordinator. The LLM never talks to the browser directly; every browser action is mediated by @playwright/mcp over stdio. Every HTTP assertion is a direct fetch from the same process.

One #Case: paragraph -> tools -> verdict

plan.md → LLM Agent: #Case 1: New signup...
LLM Agent → Playwright MCP: navigate /signup → load page → 200 OK + DOM
LLM Agent → Playwright MCP: snapshot → 47 refs (a11y tree)
LLM Agent → Playwright MCP: type_text email [ref=e9]; click Submit [ref=e14]
LLM Agent → Your Service: http_request GET /api/orders → 200 + JSON (truncated 4000c)
LLM Agent: assert orders.length == 1 → PASS
LLM Agent: scenario_complete passed=true

Where the service and the validator meet

Below: the data flow that makes "service test automation" a single command. Your agent-authored service, your production APIs, and your CI trigger all feed into the same Assrt MCP process, which drives a real browser and returns structured verdicts.

Your service, your agent, your real browser

plan.md, coding agent, and CI trigger → assrt_test (MCP) → real Chromium + your service API → video + report

Six mechanical properties that follow from the design

None of these are features bolted on top. They are consequences of the shape: one TOOLS array, one plan.md, one LLM tool loop over @playwright/mcp.

The scenario is the spec. No second artifact.

A #Case paragraph is the input and the documentation at once. There is no parallel test spec file to drift out of sync. When the flow changes, edit the paragraph.

Snapshot-then-act, every single action

The system prompt at agent.ts:207-209 forces the agent to call snapshot before each interaction, then use a fresh [ref=eN] handle. That is why a color or layout change never breaks the test.
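The mechanic can be sketched in a few lines. This is illustration only, not assrt-mcp source: the A11yNode shape and the pickRef helper are hypothetical, but they show why a fresh snapshot plus ref-picking survives renames where a hard-coded selector does not.

```typescript
// Illustrative sketch of the snapshot-then-act rule -- not assrt-mcp source.
// A snapshot returns accessibility nodes with [ref=eN] handles that are
// valid only for the current render.
type A11yNode = { ref: string; role: string; name: string };

// Hypothetical helper: pick the ref for an element by role + accessible name.
// If nothing matches, return undefined -- the agent then records a failed
// assertion with evidence instead of clicking a guessed selector.
function pickRef(tree: A11yNode[], role: string, name: string): string | undefined {
  return tree.find(
    (n) => n.role === role && n.name.toLowerCase() === name.toLowerCase()
  )?.ref;
}

// A rename ("Submit" -> "Create account") just changes which ref gets picked
// on the next snapshot; nothing hard-coded breaks.
const snapshot: A11yNode[] = [
  { ref: "e9", role: "textbox", name: "Email" },
  { ref: "e14", role: "button", name: "Create account" },
];

console.log(pickRef(snapshot, "button", "create account")); // "e14"
console.log(pickRef(snapshot, "button", "Submit"));         // undefined
```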

Browser is persistent by default

Scenarios run in the same Chromium tab. Cookies, localStorage, and auth survive across #Cases. The persistent profile lives under ~/.assrt/browser-profile; the code is in browser.ts.

http_request sits beside click

The agent can GET an API endpoint mid-scenario to verify a side effect, in the same tool loop as the browser actions. One test covers both UI and API. No second runner.

Recording auto-opens at 5x

Every run writes a .webm. The generated HTML player (assrt-mcp/src/mcp/server.ts) opens in your browser at 5x, with Space/arrow-key seeking and 1x/2x/3x/5x/10x speed pills.

Self-hosted, open source, MIT

The runner is npx @assrt-ai/assrt. The core repos are assrt (web/recorder) and assrt-mcp (CLI + MCP server). No account required. Every byte of the stack is inspectable.

Six steps from zero to a passing service test

There is no scaffold. There is no generator. You install one package, write one file, call one tool. The ordering below is the exact flow end-to-end, and the last step is the recovery path when a scenario fails.

1

Install the MCP server

npx @assrt-ai/assrt setup registers the assrt MCP server with Claude Code, Cursor, or any MCP-speaking agent. Writes a reminder hook and updates your CLAUDE.md.

2

Write a plan.md in English

#Case 1: ... two sentences describing what to click, what to type, what to assert. No YAML, no Gherkin, no .spec.ts. You can keep it in the repo root next to README.md.

3

Call assrt_test or the CLI

The coding agent fires the assrt_test MCP tool with url and plan. Or from a terminal: npx assrt run --url ... --plan-file plan.md. Either way it spawns @playwright/mcp over stdio.

4

The agent reads the plan at runtime

Claude Haiku by default. Each #Case becomes a conversation turn. On every action the agent calls snapshot first, picks a [ref=eN] from the live a11y tree, then issues click/type_text/http_request.

5

Assertions stream back as events

scenario_start, step, assertion, improvement_suggestion, scenario_complete. A structured JSON report lands in /tmp/assrt/results/latest.json. The video auto-opens in a browser tab at 5x.
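A CI step can gate on the structured report instead of scraping logs. The report path (/tmp/assrt/results/latest.json) comes from this page, but its schema is not documented here: the scenarios/passed fields below are a hypothetical shape mirroring the scenario_complete passed=true event in the trace, not a documented API.

```typescript
// Illustrative CI gate -- the report path /tmp/assrt/results/latest.json
// comes from this page; the Report shape below is hypothetical, mirroring
// the scenario_complete passed=true event, not a documented schema.
type Report = { scenarios: { name: string; passed: boolean }[] };

function failedScenarios(report: Report): string[] {
  return report.scenarios.filter((s) => !s.passed).map((s) => s.name);
}

// In a real CI step you would read the file, e.g.:
//   const report = JSON.parse(fs.readFileSync("/tmp/assrt/results/latest.json", "utf8"));
// and exit non-zero when failedScenarios(report).length > 0.
const report: Report = {
  scenarios: [
    { name: "#Case 1: New signup", passed: true },
    { name: "#Case 2: Order side effect", passed: false },
  ],
};

const failures = failedScenarios(report);
console.log(failures.length === 0 ? "all cases passed" : `failing: ${failures.join(", ")}`);
```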

6

Failed run? Use assrt_diagnose

The diagnose MCP tool reads the run's events, screenshots, and DOM snapshots and returns a root-cause analysis with a suggested fix. You never guess what went wrong from a flat log.

Classic service test automation vs. Assrt

Every line below is the honest shape of the tradeoff. The incumbents are not wrong for their use case; they just assume the test is a codebase.

| Feature | Selenium / Cypress / Playwright spec folders + DSLs | Assrt |
|---|---|---|
| Where tests live | A repo folder of .spec.ts, .feature, or .java files that must be kept in sync with the UI | One plan.md of English #Case paragraphs. No folder. No framework. |
| What the runner reads | A compiled test script with hard-coded selectors and assertions | A prompt. An LLM agent interprets each #Case fresh, picking refs from a live accessibility tree on every run. |
| When selectors change | CI breaks. Someone reads the diff, updates locators, reruns. | Agent gets the new snapshot and re-picks the ref. No code change needed for cosmetic UI drift. |
| Mixing UI and API assertions | Two runners, two artifacts, a CI step to glue them | http_request is in the same TOOLS array as click. Both halves live in one #Case. |
| Cross-scenario state | Per-suite beforeEach / afterEach hooks | Shared browser session by default. Cookies, auth, localStorage carry across #Cases. |
| Artifacts from a run | JUnit XML, HTML report, maybe a CI artifact | Video recording (5x playback auto-opens), screenshot per step, structured JSON report, shareable run URL. |
| Who can author a test | A QA engineer or SDET who knows the DSL | Anyone who can describe the flow in English. Including the coding agent that wrote the service. |
| Cost to start | Framework setup + CI wiring + locator maintenance cadence | npx @assrt-ai/assrt setup. Free. MIT. Zero vendor lock-in. |

The honest tradeoff

If you need hand-tuned deterministic test scripts with line-level coverage reports for a regulated industry audit, a classic Playwright or Cypress suite is still the right pick. Assrt is not competing on that axis.

Assrt is the right pick when the cost of maintaining the suite is larger than the cost of running it. When a UI rename breaks a dozen tests no one wants to fix. When the flow to test is a thing the product manager can describe in three sentences. When the coding agent that wrote the service is still in the loop. For that job, a markdown file plus a tool loop beats a codebase plus a framework.

Why a traditional framework cannot collapse into a prompt-driven runner

The instinct is always "add an AI layer on top of my existing Playwright suite." The problem is that the suite is what you are trying to delete. Playwright scripts hard-code selectors; the LLM layer cannot rewrite them mid-run without becoming a second, fragile codegen step. A framework like ServiceNow ATF encodes the flow in its own XML schema; an AI layer that interprets English still has to translate to that schema before executing. Karate is a DSL; Gherkin is a DSL; every DSL is a second grammar for the agent to guess against.

Assrt starts from the other end. The TOOLS array is the grammar. The plan.md is the sentence. The agent is the parser. There is no second language, no second runner, no second artifact. Adding a new capability is a new entry in TOOLS and a new case in the executor switch. That is the whole product surface.

Turn your next feature spec into a passing #Case in 20 minutes

Give us a URL and one flow that should work end to end. We write the plan.md live, run npx assrt-mcp, and watch the ten-tool loop validate your service in a real browser. You keep the plan.

Book a call

Service test automation questions the framework guides skip

What do you mean by 'no test suite'? There is no .spec.ts file at all?

Correct. The artifact you maintain is a plan.md file (or inline string) of #Case paragraphs. There is no test runner to install (beyond @playwright/mcp, which Assrt spawns for you), no test helper library to import, no selector constants file to keep in sync. The entire input is English. At runtime, an LLM agent reads the plan and drives real Chromium through the ten tools in assrt-mcp/src/core/agent.ts. Everything else, including the pass/fail verdict, is emitted as events from the tool loop. If you delete plan.md, nothing remains that looks like a traditional test.

How does the agent not hallucinate a click on a button that does not exist?

Because it does not guess. The system prompt at agent.ts:207-209 explicitly requires calling snapshot before every interaction. Snapshot returns the live accessibility tree with [ref=eN] handles. The agent must pick a ref that is present in that tree and include it with the click. If the element is not on the page, there is no ref to reference, and the agent records a failed assertion with evidence instead of a fake click. The refs come from @playwright/mcp, not from the LLM.

Is this really different from 'AI-powered test automation' I have read about for years?

The differences are mechanical. First, the tests are the prompt: there is no code-gen step that turns English into Playwright scripts you then maintain. Second, the browser driver is the official @playwright/mcp (the same Microsoft package you would use directly), not a proprietary replay engine. Third, MAX_STEPS_PER_SCENARIO is Infinity (agent.ts:7), so a single #Case can run for hundreds of steps without an artificial cutoff. Fourth, the runner is distributed as an MCP server, so the same coding agent that writes your service is the one that calls assrt_test. That feedback loop did not exist two years ago.

How do tests stay deterministic if an LLM interprets them every run?

Three mechanisms. Assertions you write as English are graded against the agent's evidence (screenshots, DOM snapshots, tool results), not against its opinion. Concrete phrases like "wait for text 'Order confirmed'" or "GET /api/orders and assert response.orders.length == 1" are pass/fail with a clear signal. Variables are substituted before the prompt ever reaches the model, so secrets and IDs are stable. And the default model is Claude Haiku with temperature 0 where supported; bigger models are available behind --model when you want stronger reasoning on flaky flows. Determinism is a function of how specific your #Case is, which is the same tradeoff a human QA engineer already makes.
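The substitution mechanism can be illustrated in a few lines. This page says variables are substituted before the prompt reaches the model but does not document the syntax, so the {{NAME}} placeholder style below is an assumption for illustration.

```typescript
// Illustrative only: the page states that variables are substituted before
// the prompt reaches the model, but does not document the syntax.
// The {{NAME}} placeholder style here is an assumption.
function substituteVars(plan: string, vars: Record<string, string>): string {
  // Replace each {{KEY}} with its value; leave unknown keys untouched.
  return plan.replace(/\{\{(\w+)\}\}/g, (match, key: string) => vars[key] ?? match);
}

const plan =
  "Type {{TEST_EMAIL}} into the email field, then GET {{BASE_URL}}/api/orders.";
const rendered = substituteVars(plan, {
  TEST_EMAIL: "qa+1@example.test",
  BASE_URL: "https://staging.example.test",
});

console.log(rendered);
```

Because the model only ever sees the rendered string, IDs stay stable run to run and secrets never depend on the model echoing them back correctly.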

Can I run this as part of CI, or is it only for local development?

It runs anywhere Node runs. In CI, call npx @assrt-ai/assrt run --url $PREVIEW_URL --plan-file plan.md --json and parse the JSON output. The --headless default is fine for CI; --extension is for attaching to a developer's already-open Chrome when you need a real login session. Videos are written to /tmp/assrt/<run>/video and can be uploaded as CI artifacts. If you sync to app.assrt.ai (optional), every run gets a shareable URL; otherwise nothing leaves your runner.

Where is the code for the ten-tool TOOLS array I keep seeing referenced?

assrt-mcp/src/core/agent.ts, lines 14 to 196. The const is literally called TOOLS. Each entry is a plain object with name, description, and input_schema in the shape the Anthropic SDK expects. The executor switch lower in the same file dispatches each tool_use to either a @playwright/mcp call (for browser tools) or a direct fetch (for http_request). The SYSTEM_PROMPT immediately below the array (lines 198-254) is what tells the agent to snapshot first and use refs. Clone assrt-mcp and grep for 'TOOLS:' to read it yourself. MIT license, no account required.
