QA automation engineer job in 2026: a four-tool surface, a runnable PR
Almost every page on this role reads like a LinkedIn job description: required tools, salary band, soft skills. Useful for a recruiter, useless to a working engineer. This page is the opposite. It walks through the four MCP tools the daily loop actually uses (assrt_test, assrt_plan, assrt_diagnose, assrt_analyze_video), shows the take-home a 2026 candidate should ship to demonstrate them, and points at the open-source reference loop a candidate or a hiring manager can clone before tomorrow.
Every claim below is grounded in the Assrt reference loop on GitHub. No vendor key, no procurement form, no per-seat license. Read along.
The frame the job-description templates miss
Search for this role in any general source and the answers come back aimed at recruiters. They list languages (Java, Python, JavaScript), frameworks (Selenium, Cypress, Playwright, TestNG), CI tools (Jenkins, GitLab), and a paragraph about “working alongside AI as a co-tester.” Accurate as a checklist, useless as a description of what the work feels like to do or to evaluate. The interesting question in 2026 is shorter: which tools does the role actually loop over each day, and what does a take-home look like that demonstrates literacy across all of them in four hours?
The answer in this guide is grounded in one specific open-source reference. If your shop uses a different stack the shape generalises; the four-tool surface (run, plan, diagnose, review) is the same regardless of vendor. The advantage of an MIT-licensed reference is that a candidate or a hiring manager can read every line; nothing in this writeup hides behind a closed-source rule engine.
The anchor: four tools, three unconditional, one gated on a key
The reference MCP server registers exactly four tools. Three are unconditional (the engineer always has them). The fourth registers only when GEMINI_API_KEY is present in the environment. That is the entire surface a 2026 take-home should exercise, and it is the entire surface a 2026 hiring manager should interview against. The line numbers are real and you can read along.
The daily loop is one Markdown plan in, one JSON report out
The four tools, mapped to interview questions
Each tool maps cleanly onto an interview question a hiring manager can ask, and onto a portfolio artifact a candidate can ship. The questions below are the ones you should be ready to answer or to ask; the stack comparison further down shows what the answers look like in code.
assrt_test is the primary loop the role lives in
Registered at server.ts line 335. Takes a URL plus a plan and returns a TestReport with screenshots and a per-step trace. The first thing a hiring manager should ask the candidate to demonstrate. If you can author one #Case and run it green in three minutes, you have demonstrated the entire core competency.
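Job pages never show the artifact itself, so here is a hedged sketch of what a single green #Case might look like as Markdown. The #Case marker and the passCriteria idea come from this guide; the concrete steps, URL, and wording are illustrative, not copied from the reference repo:

```markdown
# Case 1: Sign-in happy path
- open http://localhost:3000/login
- type the demo user's email and password
- click the "Sign in" button
- passCriteria: the URL path is /home and the header greets the demo user by name
```

Plain English, one bullet per agent action; the agent resolves each bullet against the live page, so there is no locator to maintain when the markup changes.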
assrt_plan is the scenario draft layer
server.ts line 768. Navigates to a URL, reads the live page, drafts #Case blocks. The interview question this maps to: 'given this URL, what should we test first?' If the candidate can edit the draft into something tighter, they have the scenario design instinct the role rewards.
assrt_diagnose is your stack-trace literacy
server.ts line 866. Takes a failed scenario plus error and returns a root cause and corrected plan. The interview question: 'this #Case failed in CI but passes locally; walk me through your triage.' Junior says 'rerun it.' Senior reads the JSON, finds the wait_for_stable that should be there, and cites the line.
assrt_analyze_video is the evidence-review habit
server.ts line 930. Conditional on GEMINI_API_KEY (registered at server.ts line 1015, skipped at line 1017). Reviews the .webm recording of a passing run for unexpected modals, double-renders, and other things assertions miss. The interview signal: do you treat a passing build as final, or as evidence that needs a second look?
Setup is one command, checked into your repo
assrt setup writes a git commit hook (cli.ts lines 201-275) that injects a QA-testing reminder. A candidate who checks in .claude/settings.json with this hook in their portfolio is signalling continuous-test discipline. A hiring manager who asks 'how would you wire QA into your daily flow?' has a single-command answer to test against.
“A reviewer can clone the candidate's portfolio repo and reproduce the run on their own machine in about a minute. No vendor account, no SaaS sign-up, no per-seat license to provision.”
The take-home: four hours, one PR, all four tools
A clean take-home for this role fits in one PR. The brief: pick an open-source web app the candidate has not seen, author at least three #Case blocks in a single Markdown plan, run the suite once, commit the JSON report, deliberately mistype a button label so one assertion fails, run assrt_diagnose against the failure, and ship the resulting commit. A reviewer can read the diff in fifteen minutes and see test design, evidence handling, and triage instinct in one place.
One bash line runs the whole thing. A reviewer pastes the line into a terminal, watches the recording auto-open, and reads the JSON report. The same line is what the CI gate runs. The same line is what the candidate runs locally before pushing. There is no second flow, no second tool, no vendor dashboard in the loop.
The senior signal: the OTP paste trick at agent.ts line 234
One specific anchor separates a senior candidate from a junior in five minutes of code review. Web apps with split-character OTP fields (each digit in its own input) cannot be filled by typing one character at a time, because each input consumes the focus and the next field is unmounted before the keystroke lands. The fix is a single paste event with a constructed DataTransfer, dispatched at the parent. The reference loop documents this trick in the agent system prompt itself, with the exact expression the agent must call.
A senior candidate already knows why per-field typing breaks; a junior learns it the day a signup flow flakes intermittently in CI and they can't reproduce it locally. Asking “walk me through the OTP step in your take-home plan” surfaces the gap immediately. Either the candidate references the paste mechanism, or they do not.
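To make the mechanism concrete, here is a runnable sketch of the single-paste fix. DataTransfer and ClipboardEvent are browser globals, so minimal stand-ins are defined below to make the sketch executable outside a browser; fillOtp is an illustrative name, not the reference implementation.

```typescript
// Stand-in for the browser's DataTransfer: a typed bag of clipboard payloads.
class StubDataTransfer {
  private data = new Map<string, string>();
  setData(type: string, value: string): void { this.data.set(type, value); }
  getData(type: string): string { return this.data.get(type) ?? ""; }
}

interface PasteEvent { type: "paste"; bubbles: boolean; clipboardData: StubDataTransfer; }
interface PasteTarget { received: PasteEvent[]; dispatchEvent(e: PasteEvent): void; }

// Typing digit-by-digit fails because each input consumes focus and the next
// field may not be mounted when the keystroke lands. The fix: one paste event
// carrying the whole code, dispatched once at the parent so it bubbles to
// whichever field is listening. Browser equivalent (roughly):
//   const dt = new DataTransfer(); dt.setData("text/plain", code);
//   el.dispatchEvent(new ClipboardEvent("paste", { clipboardData: dt, bubbles: true }));
function fillOtp(container: PasteTarget, code: string): void {
  const dt = new StubDataTransfer();
  dt.setData("text/plain", code);
  container.dispatchEvent({ type: "paste", bubbles: true, clipboardData: dt });
}

const container: PasteTarget = {
  received: [],
  dispatchEvent(e) { this.received.push(e); },
};
fillOtp(container, "482913");
```

One event, one payload, no focus race: that is the whole trick the interview question is probing for.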
The daily loop, step by step
Once a candidate lands the role, the day looks like this. None of the steps require a vendor key. None require a separate test repository. None route through a QA queue.
Standup names two features in flight
The plan files for those features live at /plans/<feature>.md. You open the relevant Markdown alongside the engineer who is writing the implementation. No separate test repository, no cross-team handoff, no QA queue.
Author or update three #Case blocks
Plain English, one bullet per agent action. The parser at agent.ts line 620 splits the file on the regex /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi. No locators, no page objects, no fixtures.
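A runnable sketch of that split, with the regex copied verbatim and an illustrative two-case plan (the plan text is made up; only the regex comes from the source):

```typescript
// The split regex from the reference parser, copied verbatim.
const CASE_SPLIT = /(?:#?\s*(?:Scenario|Test|Case))\s*\d*[:.]\s*/gi;

// Split a Markdown plan into per-case bodies, dropping the empty leading chunk.
function splitPlan(markdown: string): string[] {
  return markdown
    .split(CASE_SPLIT)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

const plan = [
  "# Case 1: Sign-in happy path",
  "- open /login and submit valid credentials",
  "# Case 2: Bad password shows an error toast",
  "- open /login and submit a wrong password",
].join("\n");

const cases = splitPlan(plan);
// cases[0] begins "Sign-in happy path", cases[1] begins "Bad password..."
```

That is the entire authoring layer: a candidate can read it on a whiteboard, which is exactly why "I know what my tooling does" is a fair interview bar.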
Run the suite locally with --video on
npx assrt run --url http://localhost:3000 --plan-file plans/<feature>.md --video. The recording auto-opens in a player. You watch it, looking for unexpected modals, slow loads, or assertions that passed too easily.
Tighten passCriteria where the agent was lenient
If the screenshot shows the wrong tenant logo or the wrong toast text, you add an explicit passCriteria string (server.ts line 343) that names what must be true. The agent verifies every criterion or the run fails.
Run assrt_diagnose on the one flake
The agent reads the failed step trace, the screenshot before failure, and the error. It returns a root-cause sentence and a corrected #Case. You commit the corrected plan diff in the same PR as the feature.
CI runs the same plan, exits on the same JSON shape
jq -e '.scenarios[] | select(.passed == false)' out.json. There is no proprietary report format to teach the build server, no vendor SaaS to log into, and no quota on how often the build can run.
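For a build script that prefers not to shell out to jq, the same gate can be sketched in TypeScript. The scenarios[].passed shape comes from the report described in this guide; every other field name is an assumption. One practical note on the jq line itself: with -e, jq exits 0 when select finds a failed scenario, so a shell gate typically negates the line; the sketch below collects the failing scenario names a real gate would exit non-zero on.

```typescript
// Minimal report shape, assumed from the article's jq filter:
// { scenarios: [{ name, passed }, ...] }. Extra fields are illustrative.
interface ScenarioResult { name: string; passed: boolean; }
interface TestReport { scenarios: ScenarioResult[]; }

// Return the names of scenarios that failed; empty array means the gate is green.
function failedScenarios(report: TestReport): string[] {
  return report.scenarios.filter((s) => !s.passed).map((s) => s.name);
}

const report: TestReport = {
  scenarios: [
    { name: "Sign-in happy path", passed: true },
    { name: "Bad password shows an error toast", passed: false },
  ],
};
const failures = failedScenarios(report);
// A real gate would then set process.exitCode = failures.length > 0 ? 1 : 0
// after printing the names, so the PR check surfaces exactly what broke.
```

The point is the shape, not the language: one JSON file in, one exit code out, with nothing proprietary in between.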
Two job markets, two resumes, two salary bands
The market for this role is bifurcating. Postings still anchored to the 2022 stack (Selenium plus Java plus Jenkins, single-engineer team owning a brittle suite) cluster in the United States in the rough range of $95,000 to $135,000 base. Postings written for the 2026 loop (engineer embedded in a feature team, owns the plan-design layer, ships scenarios in the same PR as the feature) are paying closer to $150,000 to $200,000 because the role is functionally that of a backend engineer with QA judgment. The compression is real: the locator-by-locator authoring layer that bottlenecked junior IC throughput is gone, every engineer on the team operates at higher leverage, and the cost line that justified offshore QA contractors is shrinking to a few cents per scenario in tokens.
The practical implication for a candidate: do not lead your resume with “eight years of Selenium.” That sentence used to read as senior; in 2026 it reads as “measured my career by a tool that is now legacy.” Lead with test design outcomes (flake budget reduced from N to M, time-to-detect cut from H to M) and follow with the reference loop in your portfolio repo. The bar for “I know what my tooling does” has gone up because the tooling is now readable.
| Feature | The 2022 stack (Selenium plus vendor SaaS) | The 2026 stack (open-source reference loop) |
|---|---|---|
| Authoring artifact in your portfolio | Selenium spec.ts files plus a page-object hierarchy plus fixture wrappers | One Markdown plan file with three to eight #Case blocks per feature |
| Reviewer setup time | Sign up for the vendor dashboard, get a sandbox key, paste your test ID | git clone, npx assrt run --url ... --plan-file plans/take-home.md |
| Failure triage tool the role uses | Vendor dashboard 'flaky test detector' with a black-box rule engine | assrt_diagnose at server.ts:866 reads the JSON report and returns a corrected plan |
| Evidence on a passing build | Vendor screenshot diff overlay, often hidden behind a paywall | Per-step screenshots plus .webm recording on disk, reviewed by assrt_analyze_video |
| Day-zero CI gate | JUnit XML wrangling plus vendor-specific webhook plus dashboard integration | jq -e '.scenarios[] \| select(.passed == false)' out.json |
| Cost shape an employer signs off on | Per-seat license at thousands of dollars per engineer per year | MIT license plus a few cents in LLM tokens per scenario; nothing on a procurement form |
| Identity in front of a real SSO provider | Recorded session JSON or a brittle login macro | --extension flag (cli.ts:36) attaches the agent to the candidate's running Chrome |
Interview-day prep: a concrete checklist
If you have an interview for this role on the calendar this week, the prep is short and concrete. None of these steps require a vendor account or a credit card.
Interview-day prep: the things to demonstrate
- Clone github.com/m13v/assrt-mcp before the call.
- Read the parser at agent.ts:620 (twelve lines).
- Read the four MCP tool registrations at server.ts:335, 768, 866, 930.
- Have a target URL ready (your own side project is fine).
- Write one #Case block for that target before the interview.
- Run it with --video so you can show the recording on screen.
- Commit /plans and /results to your portfolio repo.
- Add the assrt setup hook to .claude/settings.json and check it in.
Why this works
Every artifact above is a real file on disk that a hiring manager can read without trusting your screen-share. The plan is a Markdown diff. The report is a JSON file. The recording is a .webm on disk. The hook is a .claude/settings.json checked in. Each one is independently verifiable. The asymmetry is in your favour: a candidate who shows up with a runnable portfolio interviews with different leverage than one who shows up with a list of frameworks they have used.
Who is shaping the role from the inside
The reference loop in this guide is built and maintained by the team behind Assrt. Open source, MIT licensed, public roadmap. The fastest way to influence what the 2026 role looks like is to read the source, file an issue, or open a PR.
Matthew Diakonov
Founder, Assrt
Building the open-source reference loop QA engineers can read end-to-end. Prior: backend infra at consumer apps, ten years of testing pain.
The MIT-licensed reference
github.com/m13v/assrt-mcp
Four MCP tools, eighteen agent tool schemas, twelve-line parser. Read it on a flight, ship a take-home with it the next day.
Hiring for this role, or interviewing for it?
Talk through what a 2026 take-home should look like at your team and the loop the reference runner exposes.
Frequently asked questions
What does a QA automation engineer job look like in 2026 vs 2022?
The shape of the work has compressed. In 2022, the engineer's day was largely page-object maintenance, locator chasing, fixture wiring, and CI plumbing across Selenium, Cypress, and a stack of vendor plugins. In 2026, the same engineer ships one Markdown plan and one JSON report per feature. The reference loop at github.com/m13v/assrt-mcp registers exactly four MCP tools that cover the entire surface: assrt_test (server.ts line 335) runs scenarios, assrt_plan (server.ts line 768) generates them from a live page, assrt_diagnose (server.ts line 866) explains failures, and assrt_analyze_video (server.ts line 930) reviews recordings. Three are always-on; the fourth registers only when GEMINI_API_KEY is set. Hiring criteria that have not changed: test design judgment, ownership of the flake budget, ability to read a stack trace.
What should a candidate put in their interview portfolio for this role?
A public GitHub repo with three things. First, a /plans directory of Markdown #Case files written for one open-source web app (your choice; Excalidraw and Cal.com are popular targets). Second, the JSON results from the last run committed at /results/latest.json. Third, a README with one bash line: npx assrt run --url https://example-target.com --plan-file plans/regression.md --json. A reviewer can clone the repo and reproduce the run on their own machine in 60 seconds without signing up for a vendor dashboard. The reference loop is MIT licensed, so the only billable thing is the LLM tokens (default model is claude-haiku-4-5-20251001, set at agent.ts line 9).
What does a sensible take-home look like for this job?
Four hours, one PR. The brief: pick a target web app the candidate has never seen. Author at least three #Case blocks in a single Markdown file. Run the suite once and commit the JSON report. Cause one assertion to fail intentionally (mistype a button label) and run assrt_diagnose against it. Ship the resulting commit. The PR exercises all four tools (assrt_plan to draft, assrt_test to run, assrt_diagnose to explain, assrt_analyze_video to review the recording), exactly mirroring the daily loop. A reviewer can read the diff in fifteen minutes and see the candidate's test design, evidence-handling, and triage instinct in one place. No proprietary YAML, no vendor sandbox.
What languages and frameworks should a QA automation engineer know in 2026?
TypeScript or JavaScript at a working level, because the reference runner is TypeScript and the agent's emitted calls go through the Playwright MCP server (a Microsoft package). Python is acceptable for environments that prefer it; the Markdown plan is language-neutral and the JSON report is consumed by jq. Beyond language, the candidate should be fluent reading an accessibility tree (the agent reads one before every action; see the system prompt at agent.ts lines 206 to 218), comfortable with one CI runner (GitHub Actions or GitLab is fine; the JSON-in, exit-code-out shape is identical), and able to reason about identity (which user, which permissions, which session). Selenium and Cypress muscle memory still helps for legacy code review; new code increasingly does not need it.
How does a hiring manager evaluate a candidate for this job in 2026?
Three filters that map cleanly onto the four-tool surface. First, ask the candidate to draft a #Case for a flow you describe in one sentence; this exercises the same judgment as assrt_plan and surfaces whether they reach for the right assertions. Second, hand them a deliberately wrong #Case (a stale button label, a missing wait_for_stable) and ask what assrt_diagnose would say; this surfaces their stack-trace literacy. Third, hand them a 90-second screen recording of a passing run and ask what is suspicious about it; this tests whether they would catch the unexpected modal a passing assertion misses. None of these require a license, a SaaS sign-up, or a per-seat fee. The whole interview can run on your laptop with the open-source CLI.
What is the salary range for a QA automation engineer in 2026, and how is it shifting?
The market is bifurcating. Postings still anchored to the 2022 stack (Selenium plus Java plus Jenkins, with a single-engineer team owning a brittle suite) cluster in the United States in the rough range of $95,000 to $135,000 base. Postings written for the 2026 loop (engineer embedded in a feature team, owns the plan-design layer, ships scenarios in the same PR as the feature) are paying closer to $150,000 to $200,000 because the role is functionally that of a backend engineer with QA judgment. The compression: the locator-by-locator authoring layer that bottlenecked junior IC throughput is gone, so every engineer on the team operates at higher leverage. The open-source reference loop costs a few cents in LLM tokens per scenario, so the cost line that justified offshore QA contractors is also shrinking.
Is the QA automation engineer role going away because of AI?
No. The artifact stack collapses, but the judgment layer expands. The agent can drive the browser; it cannot decide which user identity to test as, what counts as a passing dashboard render, which scenarios go on the smoke gate vs the full release gate, or whether a passing run with an unexpected modal in the screenshot is actually a regression. The open-source reference makes this concrete: every passCriteria string (server.ts line 343) is something a human authored. Every variables map (server.ts line 344) is a fixture set the engineer shaped. The role compresses on its repetitive parts and grows on its design parts, which is the same trajectory backend engineering took with ORM frameworks twenty years ago.
What should I avoid putting in my QA automation engineer resume in 2026?
Two anti-patterns. First, do not list 'X years of Selenium' as the headline metric; it signals you measured your career by a tool that is now legacy. Lead with the test design and ownership outcomes (flake budget reduced from N to M, time-to-detect cut from H to M, feature PRs that shipped with embedded scenarios). Second, do not name a vendor framework as your primary skill if you cannot also describe what its closed-source rule engine does. The open-source loop is reviewable end-to-end (the entire authoring layer is the twelve-line parser at agent.ts line 620), so the bar for 'I know what my tooling does' has gone up. A candidate who can read the parser and explain it on a whiteboard is competitive.
How does the assrt setup hook fit into the daily loop?
The CLI exposes a setup command at cli.ts lines 201 to 275. It writes a Claude Code hook that fires on every git commit and push, injecting a reminder to run assrt_test after any user-facing change. For a candidate, this is the small but specific thing you can show in your portfolio repo: a .claude/settings.json hook checked in, with a reproducible bootstrap. For a hiring manager, asking 'how would you wire continuous testing into your local workflow' has a single-command answer (assrt setup) that produces a checked-in artifact you can read. It is the smallest concrete demonstration that the candidate operates a continuous-test discipline, not a quarterly-test one.
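For orientation, the kind of artifact that lands in .claude/settings.json looks roughly like this. This is a hand-written illustration of Claude Code's hook shape, not the JSON cli.ts actually emits; the matcher choice and reminder text are assumptions:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "echo 'Reminder: run assrt_test after any user-facing change before committing.'"
          }
        ]
      }
    ]
  }
}
```

The value to a reviewer is that the artifact is a ten-second read: the discipline is checked in, not claimed.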
Can I use the open-source reference at work without my company paying a vendor?
Yes. The reference runner at github.com/m13v/assrt-mcp is MIT licensed. The Playwright MCP package it spawns is Microsoft's, also open source. The only thing your employer pays for is the LLM token usage; default is Anthropic Claude Haiku 4.5, but the Gemini path is wired at agent.ts lines 354 to 367 (DEFAULT_GEMINI_MODEL = 'gemini-3.1-pro-preview' at line 10). If the team prefers a hosted run viewer, app.assrt.ai exists, but the local CLI produces the same JSON report without it. There is no per-seat fee, no quota on how often you can run, and no proprietary YAML to learn. A candidate who walks into an interview with a working portfolio against this stack arrives with leverage on tooling cost.
What does the daily loop actually look like once I have this job?
Roughly: morning standup names two features in flight. You open the relevant /plans/<feature>.md, write or update three #Case blocks while the engineer pairs on the implementation. You run assrt_test against the local dev server (one bash line, video on with --video), watch the recording, tighten one passCriteria string after seeing the agent shrug at an empty toast. You run assrt_diagnose against a flaky retry that fired twenty times on the same button (the evidence is the .scenarios[].steps trace in /tmp/assrt/results/latest.json). You commit the plan diff alongside the feature commit. The build runs the same plan, the jq gate fails the build if any .scenarios[].passed is false, and the PR ships. Four tools, two artifacts (plan plus report), one PR. That is the day.
What signals separate a senior candidate from a junior one in this role?
Two specific things, both visible in the take-home. First, evidence handling: a junior writes assertions that read 'expect(result).toBe(true)' and stops; a senior writes a passCriteria string that names the tenant logo, the toast text, and the URL path, then verifies the screenshot matches. Second, OTP and email-verification literacy: there is a real DOM trick documented in the system prompt at agent.ts lines 234 to 236 for split-character OTP fields (the agent must dispatch a paste ClipboardEvent via DataTransfer rather than typing into each field). A senior candidate already knows why per-field typing breaks; a junior learns it the day a signup flow flakes in CI. Both are visible in five minutes of code review.