Direct answer, verified May 1, 2026

Automated and manual testing, collapsed into one plain markdown file

Manual testing means a human exercises the app and decides, using their own judgment, whether what they saw is correct. Automated testing means a machine executes a written script and checks the result against pre-declared assertions. You need both, because they catch different bug classes. The hidden cost everyone pays: maintaining two test artifacts in two file formats that drift apart over time. This page walks through how to stop paying that cost by collapsing both into one file at /tmp/assrt/scenario.md.

Matthew Diakonov · 9 min read

Assrt generates standard Playwright execution against vanilla Chromium. The artifact you keep is plain markdown, MIT-licensed, and portable to any runner.

Assrt MCP, src/core/scenario-files.ts line 17

/tmp/assrt/scenario.md · #Case 1: name · plain markdown · no YAML · no Gherkin · no DSL · agent.ts line 620 parseScenarios · vanilla Chromium · @playwright/mcp · MIT-licensed · self-hosted · your tests in your repo

The expensive thing about running both is the second file

Every guide on this topic stops at the same point. Manual testing is good for exploratory, usability, and ad-hoc work. Automated testing is good for regression, repeatability, and high-volume coverage. The recommended balance is something like seventy percent automated, twenty percent exploratory manual, ten percent AI-assisted. That advice is correct, and it is what every other article currently says.

What none of those articles say is what running both actually costs. The cost is not licensing. The cost is not compute. The cost is the second file. The manual plan lives in a Notion doc owned by Sarah on the QA team. The Playwright spec lives in tests/homepage.spec.ts owned by Marco on the engineering team. They start synchronized in week one and they are visibly out of sync by week six.

The reason the two artifacts drift is structural, not cultural. They are written in different formats, by different people, on different schedules, against different review processes. Sarah adds an exploratory case after a bug bash. Marco adds a regression test after a customer ticket. Neither has any reason to know the other one happened, and the tools do not enforce a single source of truth.

What the two-artifact world actually looks like

Same coverage, half the surface area

// What a manual-tester writes in Notion or Confluence:

Test Plan: Homepage gating
Last updated by Sarah, 3 weeks ago
Test 1: Homepage hero install command
- Open homepage
- Click the install chip in the hero
- A modal should open with an email input
- The command should NOT be copied to clipboard yet
- Type a test email
- Click "Show me the command"
- Verify the brew command appears with a Copy button

// What an engineer writes in tests/homepage.spec.ts (after transcribing):

import { test, expect } from "@playwright/test";

test("homepage hero install command is gated", async ({ page }) => {
  await page.goto("/");
  const chip = page.getByRole("button", { name: /brew install/i });
  await chip.click();
  const modal = page.getByRole("dialog");
  await expect(modal).toBeVisible();
  await expect(modal.getByRole("textbox")).toBeVisible();
  await modal.getByRole("textbox").fill("test+claude-meter@example.com");
  await modal.getByRole("button", { name: /show me the command/i }).click();
  await expect(modal.getByText(/brew install --cask/)).toBeVisible();
  await expect(modal.getByRole("button", { name: /copy/i })).toBeVisible();
});

// Two artifacts. Two file formats. Two owners. They drift.

The unification trick is the file format, not the framework

Most attempts to merge manual and automated testing do it the wrong way around. They try to make manual testers write code, by giving them low-code GUIs that compile down to a proprietary YAML or some bespoke DSL. The plan stops being readable as a plan; it becomes a recipe a tool understands. The manual side loses.

Assrt does it the other way. The agent reads plain markdown that a manual tester would have written anyway. The format is whatever a competent QA engineer would have typed into a Notion page: a heading like #Case 1: Homepage hero brew chip is gated behind email, followed by numbered steps describing what to click, what to type, and what to verify.
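
To make that concrete, here is a sketch of a two-case file. The first case mirrors the Notion plan above; the second, a footer newsletter form, is invented purely for illustration.

#Case 1: Homepage hero brew chip is gated behind email
1. Open the homepage
2. Click the install chip in the hero
3. Verify a modal opens with an email input and the command is not yet copied to the clipboard
4. Type a throwaway email and click "Show me the command"
5. Verify the brew command appears with a Copy button

#Case 2: Footer newsletter form rejects an empty submit
1. Scroll to the footer
2. Click the subscribe button without typing anything
3. Verify an inline validation message appears and the field gets focus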

The parser is small enough to inline here. It is a single regex against the file:

// /Users/matthewdi/assrt-mcp/src/core/agent.ts (lines 620-631)
//
// The parser that splits scenario.md into runnable cases.
// No DSL. No YAML. Just plain markdown headings.

private parseScenarios(text: string): { name: string; steps: string }[] {
  const scenarioRegex = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;
  const parts = text.split(scenarioRegex).filter((s) => s.trim());
  if (parts.length > 1) {
    const names = text.match(scenarioRegex) || [];
    return parts.map((steps, i) => ({
      name: (names[i] || `Case ${i + 1}`).replace(/^#\s*/, "").replace(/[:.]\s*$/, "").trim(),
      steps: steps.trim(),
    }));
  }
  return [{ name: "Test Scenario", steps: text.trim() }];
}

// That regex is the entire "schema" for the scenario format.
// Anything that matches '#Case 1:', 'Scenario 2.', 'Test 3:'
// becomes a discrete runnable scenario. The body is plain English.

That is the entire schema. Anything matching #Case 1:, Scenario 2., or Test 3: becomes a runnable case. The body of each case is plain English. At runtime, the agent reads each step, calls browser_snapshot against the live page to find the elements you described, and executes the action through @playwright/mcp. The same paragraph a human would read becomes a sequence of vanilla Playwright tool calls.
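
Run the two-case sketch above through parseScenarios and the result is predictable: the matched heading prefix becomes the case name, and the descriptive title stays as the first line of the step body.

// What parseScenarios returns for the two-case sketch above (abbreviated):
[
  {
    name: "Case 1",
    steps: "Homepage hero brew chip is gated behind email\n1. Open the homepage\n…",
  },
  {
    name: "Case 2",
    steps: "Footer newsletter form rejects an empty submit\n1. Scroll to the footer\n…",
  },
]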

Before and after the artifact collapse

Before: QA writes a test plan in Notion in plain English. Engineering transcribes it into a Playwright spec file. Both live in different places, get reviewed on different cadences, and get out of sync within a sprint.

  • Manual plan in Notion, owned by QA
  • Playwright spec in repo, owned by engineering
  • Two reviewers, two diff surfaces
  • Drift is the default state
  • Adding a case requires touching two systems

After: everyone edits the same scenario.md. The #Case block a tester writes is the exact text the agent executes on the next run.

  • One markdown file, in the repo or in /tmp/assrt, owned by whoever edited it last
  • One reviewer, one diff surface
  • Nothing left to drift from
  • Adding a case means appending a #Case block

What manual testing still does that the agent does not

The unified-artifact argument is not the same as saying manual testing is unnecessary. The agent can execute a written plan, but it cannot tell you the plan is wrong. It cannot notice that the homepage looks visually broken at a viewport you forgot to specify. It cannot tell you that the copy on the upgrade button is condescending. It cannot decide that a flow is technically correct but pragmatically confusing for a real user. Those judgment calls are still human work.

What changes is the cost of capturing what the human found. In the two-artifact world, a tester runs an exploratory pass, finds three issues, drafts notes in Notion, and someone else later transcribes the reproducible ones into Playwright. By the time it lands as automated coverage, two days have passed and at least one of the three has been forgotten. In the one-artifact world, the tester writes a #Case block while their finger is still on the bug, appends it to scenario.md, and the next CI run replays it on every commit going forward. The exploratory finding becomes regression coverage in the same pull request.

Said another way: humans are still the search algorithm for novel bugs. The agent is the indexer that records what they found in a format that runs forever. The thing you stop doing is the transcription step in the middle.

  • 1 markdown file as the source of truth
  • 2 audiences read it (the human and the agent)
  • 0 YAML, Gherkin, or DSL files to learn

Two-artifact teams vs one-artifact teams

Same coverage on both sides. The difference is what you maintain.

| Feature | Typical setup (two files) | Assrt (one file) |
| --- | --- | --- |
| Format of the test artifact | A Confluence or Notion plan in plain English (manual) AND a .spec.ts file in TypeScript (automated) | One file: /tmp/assrt/scenario.md, plain markdown with #Case headings |
| Who can read it | Manual tester reads the plan; engineer reads the spec file. Each is opaque to the other. | Anyone literate. The same #Case block is the manual instruction and the automation source. |
| How a new test is added | Tester drafts in Notion. Engineer transcribes into Playwright. Two PRs, two reviewers, two file formats. | Append a #Case block to scenario.md. The agent runs it on the next CI pass. One artifact, one diff. |
| How drift is prevented | It is not. Six months in, the manual plan and the automated suite cover meaningfully different flows. | There is one file. Drift is impossible because there is nothing to drift from. |
| Resilience to UI changes | Manual plan still works (humans adapt). Spec file breaks on every selector change. | Steps are described in plain English. The agent re-resolves selectors against the live accessibility tree at runtime. |
| What the runner produces as evidence | Manual run yields a Slack screenshot. Automated run yields a CI log. Two formats, two surfaces. | Every run writes JSON to /tmp/assrt/results/<runId>.json plus a webm recording with cursor overlay. Same evidence shape regardless of who triggered it. |
| License of the artifact | Plan locked to a vendor portal. Spec locked to a framework version. | MIT-licensed source. Markdown is portable. Browser is vanilla Chromium via @playwright/mcp. |

How the file gets from a tester's editor into CI

The unified artifact only works if it actually lives in one place across local development, CI, and the cloud. Assrt does this with a watcher rather than a publish step. When you run assrt_test, the server writes scenario.md to /tmp and starts an fs.watch on it (scenario-files.ts line 97). Any edit you make from your text editor or IDE triggers a one-second debounce, then a sync back to cloud storage. The next run on a different machine pulls the same content. There is no commit step, no portal, no publish flow. You save the file; the next run sees it.
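
A minimal sketch of that watch-and-debounce loop, assuming Node's fs.watch. The real logic lives in scenario-files.ts; syncToCloud here is a hypothetical stand-in for whatever storage sync the server actually performs.

// Sketch only: watch scenario.md, debounce editor save events, then sync.
// syncToCloud() is a placeholder, not an Assrt API.
import { watch } from "node:fs";

const SCENARIO_FILE = "/tmp/assrt/scenario.md";
const DEBOUNCE_MS = 1_000;

async function syncToCloud(file: string): Promise<void> {
  // Placeholder: push the file to the backing store so the next run,
  // on any machine, pulls the same content.
}

let timer: ReturnType<typeof setTimeout> | undefined;

watch(SCENARIO_FILE, () => {
  // Editors fire several change events per save; collapse them into one sync.
  if (timer) clearTimeout(timer);
  timer = setTimeout(() => void syncToCloud(SCENARIO_FILE), DEBOUNCE_MS);
});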

For teams that want git-tracked tests (most of them), the workflow is the same as any other file in the repo. Keep scenario.md (or a directory of tests/scenarios/*.md files) under version control, review changes in a normal pull request, and let the watcher handle the live-edit case during local debugging. The artifact is a regular file. The unix toolchain already knows how to operate on it.

The thing that finally clicked for me: I stopped translating my exploratory notes into Playwright. I just append a #Case block and the agent runs it. Same English I would have written for a human tester.
An engineer who got tired of maintaining two suites
Paraphrased from the assrt-mcp issue tracker

When you would still want a hand-written Playwright spec

The honest version of the argument is that scenario.md is the right artifact for almost every case, but not every case. There are flows where you want absolute determinism: payment redirects, OAuth callbacks, complex multi-tab session juggling, performance assertions tied to specific load events. For those, a hand-written .spec.ts file is more precise because every selector, every wait, and every timing constraint is explicit in the source.

The good news is that hand-written Playwright and Assrt scenarios are not mutually exclusive. The browser the agent drives is vanilla Chromium via @playwright/mcp, so a flow you have already stabilized as TypeScript can sit in the same repo alongside scenario.md and run from the same CI pipeline. The unified-artifact argument is about the seventy or eighty percent of test coverage that does not need that level of determinism. The rest is still your call.

In practice, most teams end up with a small directory of TypeScript specs for the load-bearing flows, and a markdown directory for everything else. The markdown directory is where the manual plans used to live, and it is where the exploratory findings now go to become regression coverage in a single edit.

Walk through your first scenario.md with us

Bring a flow your team currently runs both manually and through Playwright. We will turn it into one #Case block and run it live against your real Chrome.

Frequently asked questions

What is the actual difference between automated and manual testing?

Manual testing is when a human exercises an application and decides, using their own judgment, whether what they saw is correct. Automated testing is when a machine executes a written script and checks the result against pre-declared assertions. Manual catches usability problems, surprising state, and bugs no one anticipated. Automated catches regressions on flows you have already specified. Most teams need both, which is why most teams end up maintaining two separate sets of test artifacts that drift apart over time.

Why do teams end up with two test artifacts?

Because the two activities historically used different file formats. The manual tester writes a plan in plain English in Confluence, Notion, or a Google Doc. The automation engineer writes a Playwright or Cypress file in TypeScript or JavaScript. Each artifact lives in a different place, gets edited by a different person, and gets reviewed on a different schedule. Six months in, the two have meaningfully different coverage and nobody is sure which one is authoritative for any given flow.

Where exactly is the Assrt scenario file on disk, and what format is it?

It is /tmp/assrt/scenario.md, defined as the constant SCENARIO_FILE in assrt-mcp/src/core/scenario-files.ts at line 17. The format is plain markdown. Each scenario starts with a heading like '#Case 1: Homepage hero brew chip is gated behind email' and is followed by a numbered list of plain-English steps and assertions. There is no YAML, no Gherkin, no proprietary DSL. The parseScenarios function at agent.ts line 620 splits the file using the regex /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi so any '#Case', 'Scenario', or 'Test' heading works.

How can the same file be both a manual plan and an automated test?

The agent reads the same plain-English steps a manual tester would read and translates each step into Playwright tool calls (browser_click, browser_type, browser_snapshot, browser_evaluate) at runtime. So 'Click the submit button' becomes a snapshot to find the button followed by a click on its accessibility-tree ref. 'Verify a modal opens' becomes a snapshot followed by an assertion. The instruction set lives in agent.ts SYSTEM_PROMPT at line 198. Nothing in scenario.md is machine-only, and nothing is human-only.
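
As a rough illustration rather than a transcript of agent.ts, the sequence for "Click the submit button, then verify a modal opens" looks something like this; the element/ref argument shape follows @playwright/mcp's snapshot convention and is not copied from the Assrt source.

// Illustrative tool-call sequence, not actual agent.ts output:
browser_snapshot()                                        // read the accessibility tree
browser_click({ element: "Submit button", ref: "e12" })   // ref comes from the snapshot
browser_snapshot()                                        // re-read the page
// assertion: the new snapshot contains a dialog node, so the step passes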

Does the file actually get committed to a repo, or just live in /tmp?

The /tmp path is the working directory the MCP server reads from. You can copy or symlink scenario.md into your repo and check it into git. Most teams keep a tests/scenarios/*.md directory in their repo and pass the path through plan or planFile in the assrt_test call. The runner does not care where the file came from; it cares about the #Case heading regex.

What about exploratory testing, which is the part automation never replaces?

Exploratory testing is the part where a human uses the application with curiosity and notices things no script anticipated. Assrt does not try to replace that. What it changes is the cost of capturing what an exploratory tester found. When a tester finds a bug, they write a #Case block describing the steps that reproduce it. That block goes into scenario.md. From that moment on, the bug is a manual test that anyone can read and an automated test that runs on every CI pass. The artifact is the same; the audience is what changes.

How does this compare to writing Playwright tests by hand?

A hand-written Playwright file is much more precise: every selector, every assertion, every wait is explicit. It is also much more code to maintain. A scenario.md file is shorter and survives UI changes better because the agent re-resolves selectors at runtime against the current accessibility tree. The trade-off is determinism for resilience. The thing to know is that Assrt scenarios and hand-written Playwright are not mutually exclusive. The browser the agent drives is vanilla Chromium via @playwright/mcp, so a flow you stabilized as a hand-written .spec.ts can sit alongside a scenario.md file in the same repo and the same CI pipeline.

Where do automated test results land for the manual reviewer to look at?

Three places. /tmp/assrt/results/latest.json is the most recent run as JSON. /tmp/assrt/results/<runId>.json is every historical run, identified by a UUID. And next to each run is a self-contained player.html that auto-opens in your browser when a test finishes; it plays the webm recording of the session at 1x to 10x speed with a visible cursor overlay. A manual reviewer reads the same scenario.md they would have written by hand, then watches the video to see what the machine actually did.
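
A minimal sketch of pulling the most recent run into a script, assuming only the paths named above; the JSON shape is whatever the runner wrote and is not specified here.

// Sketch: read the latest run result; player.html and the webm sit alongside it.
import { readFile } from "node:fs/promises";

const latest = JSON.parse(
  await readFile("/tmp/assrt/results/latest.json", "utf8"),
);
console.log(latest);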

What is the licensing and lock-in story?

Assrt is MIT-licensed and self-hosted. The source is at github.com/assrt-ai/assrt-mcp. There is no cloud runtime to provision, no proprietary file format you cannot leave with, and no per-seat fee. Your test artifact is a markdown file. Your browser is your own Chromium. If you decide to leave, you take scenario.md with you and run it through any Playwright-based runner; the steps are plain English that translate to any browser-automation tool.