QA automation engineer in 2026: a Markdown plan and a JSON report
Almost every other guide on this role is a job description template. Required languages, required tools, expected salary band, soft skills. That work is fine for hiring, but it does not describe what the job actually feels like to do once an LLM agent is the runtime that drives the browser. This page is the opposite: the actual artifacts a QA automation engineer authors and reviews each day, with file paths and line numbers from a working open-source reference.
Every claim below comes from the Assrt reference loop on GitHub. You can clone it and read along; nothing here is a brochure number.
The frame the job-description templates miss
Search for this role in any general source and the answers come back aimed at hiring managers. They list languages (Java, Python, JavaScript), frameworks (Selenium, Cypress, Playwright, TestNG), CI tools (Jenkins, GitLab), and a paragraph about “working alongside AI as a co-tester.” All accurate; none of it tells an engineer what the work feels like. The interesting question, in 2026, is what changes about the daily authoring loop when an LLM agent reads a live accessibility tree and clicks the buttons for you.
The short answer: most of the artifacts collapse. The page object hierarchy, the locator file, the fixture wrappers, the bespoke retry helpers, the per-test config sprinkles. They become one Markdown file. The report shape collapses too. JUnit XML, vendor JSON, custom Allure plugins all become one TestReport object on disk. What stays is the part that was always actually engineering: deciding what to verify, modelling user identity and state, and gating the build.
Eighteen tool schemas. That is the entire authoring surface the agent can reach for, defined as Anthropic input_schema objects between lines 16 and 196 of /Users/matthewdi/assrt-mcp/src/core/agent.ts. Every Cypress plugin, every Selenium addon, every Playwright helper a traditional automation engineer would reach for has to map onto one of those eighteen calls. It almost always does.
The anchor: a twelve-line parser holds the whole authoring layer together
Before any agent loop runs, before any tool is called, the runtime has to turn your plan into runnable scenarios. The function that does it is twelve lines of TypeScript at agent.ts line 620. There is one regex. There are no plugins. The shape of every test in your repo is determined by what this function accepts.
The whole DSL is “a header that starts with #Case, #Scenario, or #Test, optionally numbered, followed by free-form English.” That is it. The ergonomic consequence is significant: the artifact in your repo is a Markdown file your product manager can read and your designer can suggest edits to in a pull request. There is no separate “test code” layer to maintain.
“The entire authoring layer is a twelve-line parser whose only regex is /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi.”
agent.ts line 620
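The behavior that regex implies is small enough to reconstruct. The sketch below is not the project's source, just a minimal parser built around the quoted pattern, to make the "header plus free-form English" shape concrete:

```typescript
// Minimal reconstruction of a #Case plan parser around the quoted regex.
// Illustrative only; not the actual agent.ts implementation.
interface Scenario {
  name: string; // text on the header line, after "#Case 1:" etc.
  body: string; // the free-form English bullets that follow
}

const HEADER = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

function parsePlan(plan: string): Scenario[] {
  // Splitting on the header regex yields [preamble, case1, case2, ...];
  // slice(1) drops anything before the first header. Note the caveat:
  // with the i flag, a lowercase "case:" inside prose would also split.
  const chunks = plan.split(HEADER).slice(1);
  return chunks.map((chunk) => {
    const [first, ...rest] = chunk.trim().split("\n");
    return { name: first.trim(), body: rest.join("\n").trim() };
  });
}
```

The point of showing it is the inverse of complexity: whatever this function accepts is the whole DSL, so there is nothing else to learn.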
What the daily plan file actually looks like
Three concerns, three #Case blocks, all in one Markdown file. Browser state carries between blocks, so the third case can rely on the second case's signup. Notice there are no locator strings, no test-id selectors, no page.getByRole chains. The agent reads a fresh accessibility tree before each action and resolves the English description to a [ref=eN] handle at runtime.
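A plan of that shape might read as follows. This example is invented for this page (the URLs, labels, and wording are illustrative, not taken from the reference repo), but it follows the three-block, state-carrying pattern described above:

```markdown
#Case 1: Landing page loads
- Navigate to https://staging.example.com
- Assert the page shows the product name and a "Sign up" button

#Case 2: Signup with a disposable email
- Click "Sign up"
- Create a temp email and register with it and any strong password
- Wait for the verification code, enter it, and assert the dashboard appears

#Case 3: New account sees an empty project list
- Relying on the signup from Case 2, open the Projects page
- Assert the list is empty and a "Create your first project" prompt is shown
```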
The same checkout test, before and after
For an honest comparison, here is roughly what the same checkout happy-path looks like as a traditional Playwright page-object setup vs as a single #Case block. Same coverage, different artifact.
Same coverage, two artifacts
```typescript
// page-objects/checkout.page.ts
import { Page, Locator } from "@playwright/test";

export class CheckoutPage {
  readonly page: Page;
  readonly emailInput: Locator;
  readonly cardNumberInput: Locator;
  readonly cardExpiryInput: Locator;
  readonly cardCvcInput: Locator;
  readonly submitButton: Locator;
  readonly errorBanner: Locator;
  readonly confirmHeading: Locator;

  constructor(page: Page) {
    this.page = page;
    this.emailInput = page.getByRole("textbox", { name: "Email" });
    this.cardNumberInput = page.locator('input[name="cardnumber"]');
    this.cardExpiryInput = page.locator('input[name="exp-date"]');
    this.cardCvcInput = page.locator('input[name="cvc"]');
    this.submitButton = page.getByRole("button", { name: /pay/i });
    this.errorBanner = page.locator(".error-banner").first();
    this.confirmHeading = page.getByRole("heading", { name: /thanks/i });
  }

  async pay(email: string, card: string, exp: string, cvc: string) {
    await this.emailInput.fill(email);
    await this.cardNumberInput.fill(card);
    await this.cardExpiryInput.fill(exp);
    await this.cardCvcInput.fill(cvc);
    await this.submitButton.click();
  }
}
```

```typescript
// tests/checkout.spec.ts
import { test, expect } from "@playwright/test";
import { CheckoutPage } from "../page-objects/checkout.page";

test("happy path checkout", async ({ page }) => {
  const checkout = new CheckoutPage(page);
  await page.goto("/cart");
  await checkout.pay("test@example.com", "4242 4242 4242 4242", "12 / 30", "123");
  await expect(checkout.confirmHeading).toBeVisible();
});
```

The locator chains, the page-object class, the constructor wiring, and the fixture import all disappear because the agent does not need them. What is left is what the engineer actually wanted to express: navigate, fill, click, assert. The leverage is not in writing fewer lines for one test; it is in not having to maintain those lines as the UI evolves.
The before-and-after of an automation engineer's day
One feature, four days of work
A new export-to-CSV feature ships. The automation engineer spends Monday writing a page-object class for the export modal, Tuesday wiring the download fixture and waiting for the file system, Wednesday chasing a flaky locator that broke when the marketing team renamed the button, and Thursday plumbing the test into CI.
- Page-object class for the export modal
- Fixture for the file-system download wait
- Locator update after a button rename
- CI plumbing and flake retries
Where the agent fits in your stack
At the box-and-arrow level, the agent is a thin layer between a Markdown plan and the Playwright MCP server. The plan goes in, the JSON report comes out, and along the way the agent is allowed to talk to a disposable mailbox and to your own backend.
A QA automation engineer's day, end to end
The wire-protocol of one step
For the click-the-Pay-button step in a checkout #Case, here is what happens between the agent and the page. The engineer writes one English bullet; the loop fans out into roughly six wire-level calls.
One bullet, traced end to end
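Based on the snapshot-act loop the system prompt enforces, those six calls plausibly look like the trace below. This is an illustrative reconstruction, not captured wire output; the ref value and the parameter field names are assumptions:

```
tools/call snapshot                           → accessibility tree; "Pay now" is [ref=e42]
tools/call click {ref: "e42"}                 → click dispatched through Playwright MCP
tools/call wait_for_stable {stable_seconds: 2}
tools/call snapshot                           → fresh tree; confirmation heading present
tools/call screenshot                         → per-step evidence written to disk
tools/call assert {condition: "a confirmation heading is visible"}
```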
How the day reshapes itself
Below is the actual list of activities that fill an automation engineer's week when the runtime is an agent loop. None of these are about locator strings, page objects, or test-runner config. All of them are engineering.
The week, six categories
1. Author #Case blocks for new feature work
One Markdown file per feature, three to eight #Case blocks each. Plain English steps, one bullet per agent action. The parser at agent.ts line 620 turns the file into runnable scenarios; you do not author locators, refs, or selectors.
2. Review the JSON report from the last run
Open /tmp/assrt/results/latest.json. Look at .scenarios[].assertions[] for soft passes. Open the per-step screenshots in /tmp/assrt/<runId>/screenshots and the .webm video for any FAIL. Both paths are in the response from the assrt_test tool.
3. Tighten pass criteria where the agent was too lenient
If a scenario passed but the screenshot shows the wrong tenant logo, add an explicit passCriteria string (server.ts line 343) that names the tenant. The agent must verify every condition or the run fails.
4. Decide which plans run on which gate
smoke/*.md on every commit, regression/*.md on every PR, full/*.md on every deploy. The same .md files; what differs is which list your CI step reads. Because plans are plain text, this lives in your repo as a directory tree, not a vendor dashboard.
5. Keep the variables file boring
Use the variables map in the run options to inject fixture identities ({{TEST_USER_EMAIL}}, {{TEST_ORG_ID}}). The runtime interpolates them into the plan text before the agent ever sees it. One variables.json per environment, no per-test mocking.
6. Pair-review with the engineer who wrote the feature
Plans live in the same PR as the feature. Reviewing 'do we have a #Case for the new export endpoint?' is a one-line check on a Markdown diff, not a spelunk through a page-object hierarchy. This is the part where automation engineering becomes embedded in feature work, not a downstream queue.
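Items 3 and 5 above meet in the run options. A plausible shape, keeping only the field names the text mentions (variables, passCriteria) and inventing everything else, might be:

```json
{
  "url": "https://staging.example.com",
  "planFile": "plans/regression.md",
  "variables": {
    "TEST_USER_EMAIL": "qa+tenant-a@example.com",
    "TEST_ORG_ID": "org_tenant_a"
  },
  "passCriteria": "the dashboard shows the Tenant A logo, not the default logo"
}
```

The runtime interpolates the variables into the plan text before the agent sees it, and the passCriteria string becomes a condition the agent must verify before the scenario can pass.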
A real CI run on the terminal
Nine scenarios, one Markdown file, one bash line that gates the build. The cost line at the bottom is the part that surprised everyone the first time we ran it on a real product: pennies, not dollars.
The two-hundred-character bash invocation at the end is the entire CI integration. There is no agent runner to install on a worker, no SaaS sidecar, no proprietary report parser. The JSON file is just a JSON file, and jq -e is just jq.
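The jq half of that gate can be sketched independently of the runner. Assuming the report shape named in the article (a top-level scenarios array whose items carry a passed boolean), a minimal gate looks like this:

```shell
# Gate the build on a TestReport JSON file: succeed only if every
# scenario passed. jq -e sets the exit code from the expression, so
# the shell's exit status is the gate; no report parser needed.
gate() {
  jq -e '[.scenarios[].passed] | all' "$1" > /dev/null
}

# Typical CI usage (path from the article):
#   gate /tmp/assrt/results/latest.json || exit 1
```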
Why this beats the traditional stack for the application layer
Selenium and Cypress remain excellent at what they do. The point of this section is not to argue against them in general; it is to be specific about which slices of an automation engineer's day get easier under an agent loop and which do not.
| Feature | Selenium / Cypress / classic Playwright | Agent loop with #Case files |
|---|---|---|
| Source artifact in your repo | .spec.ts files plus page-object classes plus fixtures | One scenario.md per feature, all #Case blocks plain English |
| Where locators come from | Hand-authored CSS, role, or test-id strings | Live accessibility tree, [ref=eN] resolved at runtime |
| What review on a pull request looks like | Locator diffs, test runner config diffs | Markdown diffs of the English #Case blocks |
| Test data primitives built in | BYO email server, BYO OTP fixture | create_temp_email + wait_for_verification_code in the loop |
| Authentication against your real identity provider | Login macro saved as JSON or recorded session | --extension attaches the agent to your running Chrome |
| API and UI parity check in one test | Two test suites, two fixture stacks | http_request inside the same #Case as the UI assertion |
| Report shape consumed by CI | JUnit XML, vendor-specific JSON, vendor dashboard | TestReport JSON in /tmp/assrt/results/<runId>.json |
| Evidence on a failure | Screenshot + HTML dump | Screenshots + per-step trace + .webm video, all on disk |
| Cost shape | Vendor seat licenses or per-scan SaaS | MIT license + LLM tokens (a few cents per scenario) |
Five ideas worth keeping
scenario.md is the test, the fixture, and the doc
Path is /tmp/assrt/scenario.md, defined at scenario-files.ts line 17. Each #Case block is parsed by the regex at agent.ts line 620 and run as its own scenario, sharing browser state. No spec.ts, no page object, no fixture file. The Markdown diff in a pull request is the test diff.
18 tool schemas is the entire authoring surface
agent.ts lines 16-196. navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable.
Plain JSON results, one bash line gates CI
writeResultsFile at scenario-files.ts lines 77-84 dumps a TestReport (types.ts lines 28-35) to /tmp/assrt/results/<runId>.json. Pipe through jq, exit non-zero on any .scenarios[].passed === false.
wait_for_stable replaces page.waitForLoadState ladders
agent.ts line 187. Waits until the DOM stops mutating for a configurable stable_seconds window. Replaces the chain of waitForResponse, waitForLoadState, waitForSelector calls that traditional Playwright suites accumulate.
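The idea generalizes beyond the DOM: poll a fingerprint of the thing you care about until it stops changing for a quiet window. A generic sketch of that pattern (my own illustration of the technique, not the agent.ts implementation):

```typescript
// Generic "wait until stable": resolve true once the fingerprint has
// been unchanged for `stableMs`, or false if `timeoutMs` elapses first.
// Illustrative sketch of the idea behind wait_for_stable.
async function waitForStable(
  fingerprint: () => string,
  stableMs: number,
  timeoutMs: number,
  pollMs = 25
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  let last = fingerprint();
  let quietSince = Date.now();
  while (Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    const current = fingerprint();
    if (current !== last) {
      last = current;
      quietSince = Date.now(); // mutation seen: restart the quiet window
    } else if (Date.now() - quietSince >= stableMs) {
      return true; // unchanged for the whole quiet window
    }
  }
  return false; // still mutating at the deadline
}
```

In the browser case the fingerprint would be something like a serialized snapshot of the DOM; the one helper replaces the stack of waitForResponse, waitForLoadState, and waitForSelector calls because it does not care *why* the page was still changing, only *whether* it was.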
create_temp_email retires fixture email infrastructure
agent.ts line 115. Disposable inbox per scenario, polling for OTPs at line 120. Lets you write signup, password-reset, and rate-limit cases without a mailbox server, without burning a real address per run.
The numbers, ballparked from a real run
Orders of magnitude for a nine-scenario regression run on a typical SaaS staging environment, executed locally on a dev MacBook against the default model.
Two zeros worth pausing on. Zero locator strings means no merge conflicts on a renamed button, no per-PR locator audit, no “why did the suite go red overnight” standup question. Zero monthly seat licenses means the cost is the LLM tokens for an actual run, not a contract that bills whether you ran or not.
What this does not change
The work that survives is the work that was always actually the job. Test design, especially across the user model. Risk modelling, especially around what to gate on. Flake budget ownership. Synthetic data and fixture identity. Pair-review with feature engineers in the same pull request as the change. The judgement call about what should run in production and what should not. None of those evaporate, and a senior automation engineer's career still hinges on doing them well. What changes is that the locator-wrangling tax is gone, and the energy that used to go there can go to the parts that matter.
Walk through your test stack with the engineer who built the loop
Bring a real plan file or a real flake. Forty-five minutes, no slides, and you leave with a working scenario.md you can run.
Frequently asked questions
Is the QA automation engineer role going away in 2026?
No, it is changing shape. The slice that goes away is the locator-by-locator authoring layer: the page object models, the brittle CSS selectors, the wrappers around wrappers around fixtures. The slice that grows is the part that was always the actual job: deciding what to verify, modelling user identity and state, owning the flake budget, gating CI, and writing assertions that catch real product regressions. When the runtime is an LLM agent reading a live accessibility tree, the engineer authors plain-English #Case blocks (see /Users/matthewdi/assrt-mcp/src/core/agent.ts line 620 for the parser) and reviews structured JSON reports (see writeResultsFile at /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts lines 77-84). The skill that matters is good test design, which has not changed.
What is the daily artifact a QA automation engineer authors when the agent runs the loop?
One Markdown file per concern. The reference path is /tmp/assrt/scenario.md (defined at /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts line 17). Each block in that file starts with a #Case heading and contains 3 to 8 bullet points in plain English. The parser at /Users/matthewdi/assrt-mcp/src/core/agent.ts line 620 splits on the regex /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi and runs each block as an independent scenario, sharing browser state. There is no separate fixture file, no page object, no locator file, no test runner config. The plan is the test, and a Markdown diff in a pull request is exactly what review looks like.
What does the agent actually have access to during a run?
Eighteen tool schemas, all defined as Anthropic tool input_schema objects between lines 16 and 196 of agent.ts. The list: navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, and wait_for_stable. That is the entire authoring surface. Anything an automation engineer would need a Cypress plugin or a Selenium addon for is one of those eighteen calls; the agent picks which to invoke based on the English in your plan.
How does this compare to the day-to-day with Selenium, Cypress, or Playwright on its own?
With Selenium or Cypress, the engineer's day is roughly 60% locator wrangling, 20% fixture and CI plumbing, and 20% actual test design. With raw Playwright (or Playwright MCP) plus an LLM, the locator share drops to near zero because the agent reads a fresh accessibility tree before each action (see the system prompt at agent.ts lines 206-218 for the snapshot-act loop). The remaining time goes to test design, which is where the engineering judgment actually lives. The reference loop wraps Playwright MCP and adds the test execution layer (#Case parser, scenario continuity, JSON report shape, video recording per run). The README at /Users/matthewdi/assrt-mcp/README.md frames it as adding a QA engineer on top of the Playwright tools.
Do I still need to know Playwright or Selenium to be useful?
Yes, but not at the locator level. You need to know how a real browser actually behaves: when an event fires, when a form submits, when a navigation happens, what an accessibility tree looks like. Those mental models still drive how you write a #Case. What you do not need is muscle memory for page.locator() chains or expect().toHaveText() variants. The agent emits the equivalent calls under the hood through the Playwright MCP server. browser.ts at line 296 of the reference implementation shows the actual Playwright MCP launch args, including --viewport-size 1600x900 and --output-mode file. Knowing what those flags mean is part of the modern automation engineer's literacy; writing them out by hand on every project is not.
What does a CI gate look like with this stack?
One bash line. The CLI emits a JSON TestReport (typed at /Users/matthewdi/assrt-mcp/src/core/types.ts lines 28-35) which wraps a ScenarioResult[] (lines 19-26). Each scenario carries name, passed, assertions[], and steps[]. In CI you run assrt run --url ... --plan-file plans/regression.md --json, pipe through jq, and exit non-zero on any .scenarios[].passed === false. There is no proprietary report format to parse, no SaaS dashboard to log into, no quota on how often you can run. The same JSON file lives at /tmp/assrt/results/latest.json (see scenario-files.ts line 20) for local debugging and at /tmp/assrt/results/<runId>.json for historical comparison.
What about flaky tests? Does the agent loop solve flake?
It changes the failure mode. Locator flake (the test failed because the button got renamed from 'Submit' to 'Save changes') goes away, because the agent reads the accessibility tree on every step and finds the new label. Timing flake (the test failed because the API response had not arrived) is partly addressed by wait_for_stable at agent.ts line 187, which waits until the DOM stops mutating for a configurable stable_seconds window. What does not go away is intent flake: a #Case that says 'verify dashboard appears' will pass under multiple legitimate dashboard renderings, and that is the engineer's design problem. The fix is the same as it has always been: tighter assertions and explicit pass criteria, which the run interface accepts as a passCriteria string at server.ts line 343.
Where does an automation engineer focus their time once the locator work is gone?
Five places, roughly in order of leverage. First, scenario design across the user model: which user, with which permissions, doing which sequence. Second, the assertion layer: what counts as 'passed', written as explicit pass criteria. Third, the regression boundary: which #Case files run on every commit, which on every release, which on every deploy. Fourth, the test data layer: disposable email is built in via create_temp_email at agent.ts line 115, but synthetic fixtures (orgs, projects, billing states) are still your problem to seed. Fifth, the report-review habit: a passing build with passing assertions but suspicious step traces (twenty retries on the same button, a screenshot that shows an unexpected modal) is something only a human catches.
Is this open source, or is there a vendor I will end up paying $7,500 a month to?
The reference runner at github.com/m13v/assrt-mcp is MIT licensed. The Playwright MCP package it spawns (@playwright/mcp) is Microsoft's, also open source. The only closed thing in the loop is whichever LLM you point at it; the default is Claude Haiku 4.5 (DEFAULT_ANTHROPIC_MODEL at agent.ts line 9), but the Gemini path is also wired up at lines 354-367. There is an optional hosted runner at app.assrt.ai for sharing scenarios in a browser, but the local CLI produces identical reports without it. No proprietary YAML, no closed-source rule engine, no per-seat license. A senior automation engineer's evaluation budget for this stack is: clone the repo, read agent.ts top to bottom, decide.
What does the hand-off between engineer and LLM look like inside a single test run?
The engineer writes English. The agent translates to tool calls. Inside a #Case, every step is roughly: agent calls snapshot to get the accessibility tree (with [ref=eN] handles), reasons about which ref matches the English instruction, calls click or type_text with that ref, waits for the page to settle, calls snapshot again. The system prompt at agent.ts lines 206-218 documents this loop explicitly: 'ALWAYS call snapshot FIRST', 'After each action, call snapshot again', 'If a ref is stale (action fails), call snapshot again to get fresh refs'. The engineer never writes a ref. The agent never writes a sentence the engineer did not authorise.
Can I run this against production, or only against staging?
Both, with judgment. The agent does what your plan says and nothing else. A read-only #Case (login as a synthetic user, navigate to a public report, assert a number) is fine in production with a dedicated test account. A destructive #Case (create and delete projects, mutate billing state) belongs in staging with a fixture dataset. The --extension flag at /Users/matthewdi/assrt-mcp/src/core/browser.ts lines 299-306 even lets the agent attach to your real running Chrome with your SSO session intact, which is the right answer for verifying that a feature works under your actual identity provider state without re-implementing it in a fixture. The trade-off is that you must be sure the plan does only what you intend; the same way you would not run a flaky destructive Cypress suite against prod, you would not run a flaky destructive #Case there either.
How do I onboard a junior engineer onto this stack?
Faster than onto Selenium or Cypress. There is no DSL to learn. Step one: read /Users/matthewdi/assrt-mcp/README.md (under 100 lines). Step two: read the system prompt at agent.ts lines 198-254. Step three: read the parser at line 620 and the eighteen tool schemas at lines 16-196. That is roughly an hour of reading, and at the end of it the engineer can author a working #Case. Compare to a typical Selenium codebase, where the page-object hierarchy and the custom-fixture conventions are several days of orientation before the first test lands. The remaining ramp is on test design, which is the part that was always actually hard.