Test automation, beginner edition

Test automation for beginners, without picking a tool.

Every other guide for this keyword starts the same way. Selenium or Cypress. Python or JavaScript. Testing pyramid. CSS selector. Page object. Those are real choices, but they are not your first step. Your first step can be: point a tool at your URL, get a Markdown plan, run it, and if something breaks, paste back the #Case the repair tool gave you.

Matthew Diakonov · 11 min read · rated 4.8 by Assrt MCP users
- Three MCP tools: plan, test, diagnose
- Plans are plain-English Markdown #Case blocks
- Failed cases return a drop-in corrected scenario

The whole idea, one sentence

The three-tool loop turns a URL into 5-8 #Case blocks, runs them, and repairs the ones that fail.

You did not pick a framework. You did not pick a language. You did not even decide what to test. The plan tool read your app and wrote the plan for you, the test tool ran it in a real browser, and when a case failed the diagnose tool handed you a new #Case you could paste back in.

The curriculum you don't have to learn first

A good chunk of beginner content for this keyword is, in practice, a reading list: a language, a framework, a locator syntax, a runner, a reporter, a CI config, an HTML report reader. Fine stuff to eventually know, but not your first hour. Everything in the list below is something you can postpone indefinitely if you start from the three-tool loop.

Selenium IDE · WebDriver · Cypress config.js · Playwright Test Runner · page-object-model · cy.get('[data-testid=...]') · JUnit · pytest fixtures · XPath · npm install @playwright/test · describe() / it() · headless: true · JavaScript promises · Python virtualenv · Allure reporting · TestNG · Selenium Grid · Mocha · custom waits · implicit waits · locator resolution · WebDriverWait

None of those concepts is wrong. They are just downstream of a more useful first question: when someone uses my app, does the important thing still happen? You can answer that question in Markdown.
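For instance, "does the important thing still happen" for a small app might be a single case like this (an illustrative sketch; the page and button names are hypothetical):

```markdown
#Case 1: Checkout completes
Navigate to /pricing. Click the "Buy now" button.
Verify the page shows an order confirmation heading.
```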

The loop in three steps

Three MCP tools, one linear flow. Each tool removes a thing a beginner would otherwise have to figure out.

Plan → Test → Diagnose

  1. assrt_plan (URL in, 5-8 #Case blocks out)

     Launches local Chromium, scrolls in three chunks, captures screenshots and accessibility snapshots, sends them to Claude Haiku with a system prompt that forces the #Case Markdown format.

  2. assrt_test (#Case blocks in, real Playwright run)

     Feeds the plan to an AI browser agent that calls Playwright MCP tools to drive a real Chromium. Returns structured scenario-by-scenario pass/fail with a WebM video and a screenshot per step.

  3. assrt_diagnose (failure in, corrected #Case out)

     Takes a failing case and the failure evidence. Returns Root Cause, Analysis, Recommended Fix, and a Corrected Test Scenario — that last section is a literal #Case block you paste straight back into your plan file.

The anchor: the prompt that writes your tests

Here is the literal system prompt that assrt_plan sends to the model with each screenshot. The output format and the count (5 to 8 cases) are hard constraints, which is why the tool never returns a 40-case plan you would not read.

assrt-mcp/src/mcp/server.ts:219-236

Notice what the prompt forbids: CSS inspection, network error testing, JavaScript execution. That is the prompt keeping the generated plan on the narrow set of things a browser agent can reliably do. A beginner does not have to pick that scope; the prompt picks it for them.

What a plan looks like when assrt_plan hands it to you

Five cases for a typical small-SaaS landing page and app. You did not write any of these. You ran one tool call. Save this block to scenario.md or pass it straight to assrt_test.

scenario.md (returned by assrt_plan)
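An illustrative plan in that format, with every app detail hypothetical (modeled on the signup example discussed later in the FAQ):

```markdown
#Case 1: Homepage loads and shows the main heading
Navigate to /. Verify the main heading is visible.
Verify the primary "Get started" button is visible.

#Case 2: Email signup succeeds with a disposable inbox
Navigate to /signup. Click the "Sign up with email" button.
Type a disposable email into the email field. Press Continue.
Wait for the verification code and paste it into the six-digit OTP field.
Verify the URL changes to /app.

#Case 3: Pricing page lists the plans
Navigate to /pricing. Verify at least one plan name and price are visible.

#Case 4: Login link leads to the login form
From the homepage, click "Log in". Verify an email field and a password field are visible.

#Case 5: Docs link opens the documentation
From the homepage, click "Docs". Verify the docs page heading is visible.
```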

When it fails, the repair tool writes the next test

This is the uncopyable part of the loop. Every other beginner guide ends at "and now you read the error output." The output contract below, enforced by the diagnose system prompt, ends at "and here is a drop-in replacement case."

assrt-mcp/src/mcp/server.ts:240-268
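Because the contract pins the section headings, the drop-in case can be pulled out mechanically. A minimal TypeScript sketch (the four section names follow the contract described in this article; the parsing logic and the sample text are illustrative, not Assrt's code):

```typescript
// Pull the drop-in #Case block out of a diagnose response.
// The section names (Root Cause, Analysis, Recommended Fix, Corrected
// Test Scenario) come from the article; this parser is a sketch.
function extractCorrectedCase(diagnoseOutput: string): string | null {
  const start = diagnoseOutput.indexOf("Corrected Test Scenario");
  if (start === -1) return null;
  // The drop-in block begins at the first "#Case" after the heading.
  const caseStart = diagnoseOutput.indexOf("#Case", start);
  if (caseStart === -1) return null;
  return diagnoseOutput.slice(caseStart).trim();
}

// Hypothetical diagnose response, for illustration only:
const sample = [
  "Root Cause",
  "The signup button label was ambiguous.",
  "Analysis",
  "Two elements on the page contain the word Sign.",
  "Recommended Fix",
  "Name the exact button label in the case.",
  "Corrected Test Scenario",
  "#Case 2: Signup with a disposable email",
  'Click the button labeled exactly "Sign up with email".',
].join("\n");

const corrected = extractCorrectedCase(sample);
// corrected now starts with "#Case 2:", ready to paste into scenario.md
```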

Below, the two tabs show a real before and after. The left-hand case is what the plan tool produced; it failed because the signup button label was ambiguous and the OTP field lacked a label hint. The right-hand case is what the diagnose tool handed back, verbatim: every missing hint is now explicit, the wait is bounded, and the button disambiguation is spelled out.

A failing case vs. the diagnose output

#Case 2: Signup with a disposable email
Click "Sign up". Enter a disposable email.
Click Continue.
Verify /app in URL.
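An illustrative corrected case of the shape described here, with explicit hints, a bounded wait, and a disambiguated button (the wording is hypothetical, not the tool's verbatim output):

```markdown
#Case 2: Signup with a disposable email
Click the button labeled exactly "Sign up with email" (not the nav "Sign up" link).
Type a disposable email into the field labeled "Email address".
Click the "Continue" button below the email field.
Wait up to 30 seconds for the verification code, then type it into the field labeled "6-digit code".
Verify the URL contains /app.
```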

You copy the right-hand block, paste it over the left-hand block in your scenario.md, and re-run. No stack trace parsing, no selector hunting, no guessing a wait timeout. The beginner loop stays a copy-paste loop.

The round-trip, as a sequence

Here is the literal order of calls when you kick off the loop from a Claude session with the Assrt MCP server registered. Read it bottom to top as you gain confidence: once you trust the plan, you skip the diagnose step.

One loop, five hops

1. you → agent: "give me tests for http://localhost:3000"
2. agent → assrt_plan: call assrt_plan({ url })
3. assrt_plan → Playwright: launch, navigate, 3x scroll + snapshot
4. Playwright → assrt_plan: screenshots + a11y tree
5. assrt_plan → agent: 5-8 #Case blocks in Markdown
6. agent → assrt_test: call assrt_test with the plan
7. assrt_test → agent: pass/fail + video + screenshots
8. agent → assrt_diagnose: assrt_diagnose(url, case, error)
9. assrt_diagnose → agent: Root Cause + Corrected #Case
10. you: paste the corrected #Case, re-run

The numbers from the source

All verifiable by opening assrt-mcp/src/mcp/server.ts.

- 3 MCP tools exposed
- 8 maxCases per plan generation
- 3 screenshot scrolls before planning
- 8,000-character cap on snapshot text in the prompt
- 4 sections in the diagnose output

The remaining limits (max model tokens per call, retries with exponential backoff, the default verification-code wait) are pinned in the same file.

What you do vs. what a traditional beginner does

A fair comparison. "Traditional beginner" here means someone following any of the top five SERP guides for this keyword, which universally prescribe a tool choice, a language choice, and a written test.

| Feature | Traditional beginner path | Assrt three-tool loop |
| --- | --- | --- |
| Pick a framework (Selenium, Cypress, Playwright) | Required, chapter one | Skipped — Playwright MCP is implicit |
| Pick a language (Java, Python, JS) | Required, chapter two | Skipped — tests are Markdown |
| Decide what to test | You write the list | assrt_plan writes 5-8 cases from a URL |
| Write your first test | Locators, imports, fixtures | 3-5 sentences of English per #Case |
| Handle a test failure | Parse stack trace, hunt selector | Paste the Corrected #Case, re-run |
| Tests are yours to keep | Yes, if you wrote them | Yes — plans are Markdown, videos WebM, results JSON |
| Self-hosted, no cloud dependency | Varies; hosted platforms lock the data | Local Node + local Chromium + your own API key |
| Cost to start | Free, but many hours of setup | Free, npx assrt-mcp, minutes to first green case |

Anchor fact

The Corrected Test Scenario section of assrt_diagnose's output is byte-compatible with the #Case N: format assrt_test consumes.

The system prompt at assrt-mcp/src/mcp/server.ts:249-263 pins the output to four sections and locks the last one to the same grammar the plan tool emits. That is why the beginner loop stays a copy-paste loop all the way through the first failing test.

Four sections in every diagnose response. One of them is a drop-in test.

Write zero tests. Run real ones.

Point assrt_plan at your URL, pass the Markdown to assrt_test, paste the Corrected #Case from assrt_diagnose when something breaks. Three tools. Local. Free.

Install: npx assrt-mcp

Test automation for beginners: specific answers

I have never written a test in my life. Do I need to know Python or JavaScript to start?

No. With Assrt the input is plain English inside a Markdown file. A single scenario is a line starting with #Case 1:, a short name, then 3-5 imperative sentences about what to click, what to type, and what to verify. You never import a test framework, never write a selector, never hand-roll a waitForElementVisible. If you are comfortable writing a Jira ticket or a bug report, you already know the dialect. Traditional tools (Cypress, Selenium, Playwright) do want a real language; that is a legitimate path, just not a required one.
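Because a scenario file is just Markdown, the grammar is trivial to work with programmatically. A hedged TypeScript sketch (the "#Case N: name" heading format is from this article; the splitter itself is illustrative, not Assrt's own parser):

```typescript
// Split a scenario.md plan into individual cases.
interface PlanCase {
  title: string; // the "#Case N: name" line
  steps: string; // the imperative sentences below it
}

function splitCases(plan: string): PlanCase[] {
  return plan
    .split(/^(?=#Case\s+\d+:)/m) // break at each case heading
    .filter((block) => block.trim().startsWith("#Case"))
    .map((block) => {
      const [first, ...rest] = block.trim().split("\n");
      return { title: first.trim(), steps: rest.join("\n").trim() };
    });
}
```

Running it over a two-case plan returns two objects whose titles are the #Case lines, which is all you need to re-run a single failing case on its own.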

What is the assrt_plan tool actually doing under the hood?

It launches a local Chromium via Playwright MCP, navigates to the URL you gave it, scrolls through the page in three chunks, takes a screenshot and an accessibility snapshot at each position, then sends all three screenshots plus the first 8,000 characters of visible text to claude-haiku-4-5-20251001 with a system prompt that says 'generate 5-8 cases in #Case N: name format, keep each case self-contained, 3-5 actions max, verify only observable things.' The output is a markdown block you can either save to scenario.md and run with assrt_test, or edit in place. The full system prompt lives in assrt-mcp/src/mcp/server.ts at line 219.

What does a #Case actually look like? Show me a real example the tool would produce.

For a signup form, assrt_plan will output something like: '#Case 1: Email signup succeeds with a disposable inbox. Navigate to /signup. Click the Sign up with email button. Type a disposable email into the email field. Press Continue. Wait for the verification code and paste it into the six-digit OTP field. Verify the URL changes to /app and the heading contains Welcome.' No setup blocks, no imports, no fixture files. Two to five sentences of imperative English. Assrt's own agent knows how to find the signup button because it calls snapshot() first and matches the text you wrote against the accessibility tree.

My test fails. A normal framework gives me a stack trace. What does Assrt give me?

Four sections. assrt_diagnose takes the URL, the plan text, and the failure evidence, then sends it to the same Claude Haiku model with a system prompt that constrains the output to Root Cause (1-2 sentences), Analysis (3-5 sentences), Recommended Fix (concrete steps), and Corrected Test Scenario (a literal #Case N: block in the same format assrt_test consumes). The last section is the important one: you do not read a stack trace and infer the fix, you paste a complete new case into your scenario.md and re-run. The output contract is pinned in assrt-mcp/src/mcp/server.ts at line 240.

Which traditional beginner concepts do I actually get to skip with this loop?

You skip: picking a framework, installing it, picking a language runtime, learning CSS selectors, learning XPath, learning the page object pattern, writing describe()/it() blocks, configuring a test runner, configuring a headless browser, wiring up fixtures, writing hand-tuned waits, reading HTML reports, reading selector timeout stack traces, and deciding which flows to cover on a brand-new app. You still need: an idea of what your app is supposed to do, and the willingness to eyeball the plan assrt_plan returned before you run it. Everything else is automated. This is not a claim that you will never need to know these things; it is a claim that you do not need to know any of them to run your first test.

Is the generated test actually a Playwright test I can check into my repo, or a proprietary format?

Assrt does not emit a .spec.ts file you can commit next to your Cypress tests. The #Case Markdown file is the test. Under the hood the agent uses real Playwright MCP calls (navigate, click, type_text, snapshot, press_key) to drive a real Chromium, so the underlying execution is plain Playwright. If you outgrow the English format later and want full control, you can read the agent's tool-call transcript and port any scenario to a Playwright script by hand in an afternoon. Nothing about the data you create is locked to Assrt: plans are Markdown, results are JSON, videos are WebM.

How many test cases does assrt_plan generate, and can I control that?

Five to eight. The number is a hard constraint in the PLAN_SYSTEM_PROMPT at line 236 of assrt-mcp/src/mcp/server.ts: 'Generate 5-8 cases max, focused on the MOST IMPORTANT user flows visible on the page.' You can override the model (pass a different value in the tool call), but the count is baked into the prompt so that a beginner never gets back a 40-case plan they will not read. If you want more coverage, run assrt_plan on different URLs (signup page, pricing page, docs page) and concatenate.
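If you do concatenate plans from several runs, the case numbers will collide. A small TypeScript helper that renumbers the headings while merging (a convenience sketch of my own, not part of assrt-mcp):

```typescript
// Merge plans from several assrt_plan runs (e.g. signup, pricing, docs
// pages) into one scenario.md, renumbering "#Case N:" headings so the
// numbers stay unique across the merged file.
function mergePlans(...plans: string[]): string {
  let n = 0;
  return plans.join("\n\n").replace(/^#Case\s+\d+:/gm, () => `#Case ${++n}:`);
}
```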

How long does the first loop actually take for someone with no testing experience?

Launching Claude with the Assrt MCP server registered is one npx command. Calling assrt_plan on a localhost URL typically returns a plan in 20 to 45 seconds (three screenshots plus model latency). Calling assrt_test on the resulting plan depends on what the test does: a simple homepage-loads case returns in 1-2 seconds, a signup flow with an email verification round-trip takes 25-30 seconds because of the real inbox poll. If a case fails, assrt_diagnose usually returns a corrected case in under 10 seconds. So the end-to-end first loop (generate a plan, run it, fix the one broken case) is usually in the 2-3 minute range, not the half-afternoon a traditional framework setup would cost.

Does any of this send my data to a vendor, or is it self-hosted?

The MCP server runs on your machine as a local Node process. The browser Assrt drives is a local Chromium started by Playwright. Plans, videos, and screenshots land in /tmp/assrt on your disk. The only remote call is to the Anthropic API for the model that reads the accessibility snapshot and writes the plan, which uses your own token. There is no Assrt cloud you have to sign up for to run the three tools. Compare that to hosted AI test platforms that charge $7,500 per month and require your production URL to be reachable from their cloud: the entire loop here is something you can run with an API key and a localhost URL, no outbound exposure of your app.

Which of the three tools should I call first?

If you have nothing written yet, call assrt_plan with your app's URL. That returns a Markdown plan. Save it as scenario.md (or just pass the string straight to assrt_test). If you already know what you want to test, skip assrt_plan and call assrt_test directly with a plan you wrote. Only call assrt_diagnose after assrt_test reports a failure; it is the repair step, not an inspection step. The server's own instructions are explicit about this ordering and they ship with the MCP server so any connected Claude instance follows the same order without prompting.

The tests write themselves

A URL in. Markdown out. Pass/fail in a real browser. A replacement case when it breaks.

Three MCP tools, one plan file, zero vendor lock-in.

Try Assrt free
