QA Automation Guide

How to QA Automation: From First Test to a Self-Healing Suite

QA automation means a machine drives your application the way a user would, every time you ship. This guide walks the practical path: pick a runner, write a first test by hand, wire it into CI, stamp out flakiness, then hand the long tail to an AI that emits real Playwright code you own forever.

QA Wolf starts at $7,500 per month with an annual contract (public pricing, 2025). Assrt is open-source, free, and self-hosted, and the Playwright files it generates belong to you.


The QA Automation Loop

Developer → Git: push a feature branch. Git → CI runner: trigger the workflow on the pull request. CI runner → browser: npx playwright test, producing pass / fail plus trace and video. On failure, Assrt runs assrt heal --diff, re-runs with the corrected locator, and the healed test passes. Assrt opens an auto-fix PR; a human reviews, merges, and ships.

1. What QA Automation Actually Is (And Is Not)

QA automation is the practice of replacing manual click-through regression testing with code that runs the same checks every commit. The point is not to eliminate humans. The point is to free humans from the tedious repetition of the same fifty checks every Friday, so they can spend their time on the work that actually requires judgment: exploratory testing, edge-case hunting, accessibility audits, and incident review.

A useful working definition has three parts. The tests live in version control next to the application code. They run without human intervention on every pull request. And the result is a green or red signal that engineers trust enough to use as a merge gate. If any of those three things is missing, you do not yet have QA automation. You have a folder of scripts.

QA automation is not record-and-replay, although a recorder can help you bootstrap the first few tests. It is not a manual QA team writing Selenium scripts in their off hours. And it is absolutely not a vendor SaaS that hides your tests in their cloud and bills you per execution. The tests are assets and assets belong on your hard drive.

What a Healthy QA Automation Setup Looks Like

  • Tests live in the application repo, version-controlled with the code they cover
  • Every pull request runs the suite headlessly in CI without manual triggers
  • Failures block merges; engineers trust the signal enough to act on it
  • Flakiness rate stays under 1% across the whole suite over a 30-day window
  • Adding a new test takes minutes, not hours, because the harness is solid
  • Tests are written in code engineers actually read: TypeScript, not vendor YAML
  • AI generation handles the tedious long-tail flows so humans focus on judgment

2. Pick a Runner: Playwright, Cypress, Selenium, or Assrt

Tool choice is the first thing teams agonize over and the last thing that actually matters once you start writing tests. The short answer in 2026 is Playwright. It is fast, it ships cross-browser support out of the box, the locator API is built around accessibility roles, and the trace viewer is the best debugging tool any test runner has ever shipped. Cypress is a reasonable second choice if your team already runs it. Selenium still works and you should not migrate off it without a real reason.

The longer answer is that the runner is a substrate. What you actually need to build on top of it is three things: a way to spin up the application with seeded data, a way to authenticate once and reuse the session across tests, and a way to write new tests fast enough that the suite keeps up with the product. Assrt sits on top of Playwright and gives you the third one for free by generating real Playwright spec files from a running URL.

Runner                | Strengths                                  | Weakness                              | License
Playwright            | Cross-browser, fast, role locators, traces | Steeper learning curve than recorders | Apache 2.0
Cypress               | Friendly UI, time-travel debugger          | Single-tab, weaker iframe support     | MIT, paid Cloud
Selenium              | Mature, every language binding             | Slower, brittle waits, dated API      | Apache 2.0
Assrt (on Playwright) | Generates real Playwright, self-heals, OSS | Needs an LLM key (your own)           | Open-source, free
QA Wolf               | Fully managed, human in loop               | $7.5K/mo, vendor-hosted               | Proprietary SaaS

For the rest of this guide, the runner is Playwright. Every code sample is real TypeScript that you can copy into a fresh project. The Assrt examples generate the same Playwright files that a human would write by hand, which is the entire point.

3. Write Your First Test in Thirty Minutes

The single best move on day one is to write one test, by hand, against your real application, and get it green in CI before you do anything else. Not five tests. One. The goal is to prove the entire pipeline works end to end. Once it does, every additional test is cheap.

install.sh
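A minimal bootstrap, assuming a Node project; both commands are standard Playwright tooling:

```shell
# Scaffold Playwright: config file, example test, optional CI workflow
npm init playwright@latest

# Install the pinned browser binaries plus OS dependencies (needed on CI images)
npx playwright install --with-deps
```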

With the harness installed, write the first real test against something concrete. The pattern that works on every product is the same: navigate to the homepage, do the one thing the product is famous for, assert the thing the user came to see. For an e-commerce site this is search and add to cart. For a SaaS dashboard this is log in and load the main view. For a content site this is open an article and verify the body renders.

tests/smoke.spec.ts
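A sketch of that first test for a hypothetical storefront; the selectors and copy are illustrative, and the pattern is the one just described:

```typescript
import { test, expect } from '@playwright/test';

test('search finds a product', async ({ page }) => {
  await page.goto('/');

  // Do the one thing the product is famous for
  await page.getByRole('searchbox', { name: 'Search' }).fill('hoodie');
  await page.getByRole('button', { name: 'Search' }).click();

  // Assert the thing the user came to see: visible state, not internals
  await expect(page.getByRole('link', { name: /hoodie/i }).first()).toBeVisible();
});
```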

Two things to notice. The locators use getByRole rather than CSS selectors. Role locators survive design refreshes that would shatter a CSS-based test. And the assertions are about user-visible state, not implementation details. A test that asserts an internal CSS class will break the next time the design system ships an update; a test that asserts a heading is visible will keep working.

By Hand vs Generated by Assrt

// Hand-written: takes a developer 20 minutes
import { test, expect } from '@playwright/test';

test('homepage loads', async ({ page }) => {
  await page.goto('/');
  await expect(page).toHaveTitle(/Acme/);
  await expect(
    page.getByRole('link', { name: 'Get started' }),
  ).toBeVisible();
});
(The Assrt-generated version of the same test comes out roughly a third shorter.)
First Run, Local
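Running the new spec locally is two standard Playwright commands; show-report opens the HTML report from the last run:

```shell
# Run headlessly; add --headed or --ui to watch the browser work
npx playwright test tests/smoke.spec.ts

# Open the HTML report generated by the run
npx playwright show-report
```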

4. Wire Tests Into CI on the First Day

A test that only runs on a developer laptop is not QA automation. It is a personal productivity tool. The transition from one to the other happens the moment the suite runs on every pull request and blocks merges when it fails. Do this on day one, with a single test, before you accumulate a backlog of tests that have never executed in CI.

.github/workflows/playwright.yml
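A minimal workflow along these lines, assuming a Node project whose playwright.config.ts uses the webServer option to build and start the app on port 3000 (action versions and script names are assumptions to adapt):

```yaml
name: Playwright Tests
on: [pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15                # hard ceiling from day one
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps
      # playwright.config.ts's webServer option starts the app on
      # port 3000 and waits for it before any test runs
      - run: npx playwright test
      - uses: actions/upload-artifact@v4
        if: failure()                  # traces, videos, screenshots on red
        with:
          name: playwright-report
          path: playwright-report/
```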

From Push to Merge Block

  • Push: developer opens a pull request
  • Build: CI builds the app and starts it on port 3000
  • Test: Playwright runs the full smoke suite
  • Report: HTML report uploaded as a CI artifact
  • Gate: failures block merge until fixed

Three things to enforce from day one. First, the pipeline must be required for merge in your branch protection settings. A test that does not block merges is decorative. Second, the artifact upload step matters more than it looks; when a developer hits a CI failure they will need the trace, the video, and the screenshot to debug, and uploading them on failure is one line of YAML. Third, set a hard timeout of fifteen minutes. If the suite cannot finish in fifteen minutes, you have an architecture problem to solve before you add more tests.

Skip the manual authoring grind

Point Assrt at your running app and it generates real Playwright tests, validates each one, and writes only the green ones to disk. Open-source, self-hosted, zero lock-in.


5. Three Real Scenarios You Should Cover First

Once one test runs in CI, the question is which tests to write next. The answer is the three flows that would cause an immediate customer escalation if they broke. For most products that means signup, the primary value action, and the billing handoff. Write one test for each, then stop and look at how the suite behaves before adding more.

  1. Signup with email verification: tests/signup.spec.ts (moderate)
  2. Primary value action behind login: tests/create-project.spec.ts (moderate)
  3. Billing handoff with Stripe Checkout: tests/billing.spec.ts (complex)

These three tests cover roughly 80 percent of the customer blast radius. If they all pass, the product is fundamentally working. If any of them fails, you have a Sev-1 blocker that needs human attention before merging. That is the entire point of QA automation: convert the vague question of “is the build okay” into a deterministic green or red signal that engineers can act on without thinking.
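A sketch of the first scenario, assuming a hypothetical /signup route and form labels; real email verification needs a test inbox or an API hook, which is elided here:

```typescript
import { test, expect } from '@playwright/test';

test('signup with email verification', async ({ page }) => {
  // Unique address per run: no shared fixtures between tests
  const email = `qa+${Date.now()}@example.com`;

  await page.goto('/signup');
  await page.getByLabel('Email').fill(email);
  await page.getByLabel('Password').fill('correct-horse-battery');
  await page.getByRole('button', { name: 'Create account' }).click();

  // Assert the user-visible confirmation state
  await expect(page.getByText('Check your email')).toBeVisible();

  // A real suite would now pull the verification link from a test
  // inbox API, visit it, and assert the account is active.
});
```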

6. Kill Flakiness Before It Kills Trust

Flakiness is the silent killer of QA automation programs. A test that fails one run in ten teaches engineers that CI lies, and once they learn that lesson they stop reading the results. From that point the suite is decorative. The fix is to treat every flake as a bug in the test or a bug in the application, never as a fact of life. Retries are not a fix. Retries are how you hide the bug until it shows up in production.

The Flakiness Triage Checklist

  • Use auto-waiting locators (getByRole, getByLabel) instead of arbitrary timeouts
  • Replace sleep() with web-first assertions like toBeVisible() that retry until timeout
  • Seed every test with its own data; never share fixtures across tests
  • Pin the browser version with playwright install --with-deps and a version lock
  • Mock third-party APIs at the network layer with page.route() not at the application layer
  • Run the suite ten times in a row against the same commit; quarantine any test that fails
  • Treat the quarantined queue as a bug list with an SLA, not a graveyard
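Mocking at the network layer, as the checklist suggests, looks like this in Playwright; the endpoint and UI copy are illustrative:

```typescript
import { test, expect } from '@playwright/test';

test('checkout survives a slow payment provider', async ({ page }) => {
  // Stub the third-party API at the network layer so the test
  // never depends on the provider being up or fast
  await page.route('**/api.payments.example.com/**', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ status: 'succeeded' }),
    }),
  );

  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Pay now' }).click();
  await expect(page.getByText('Payment confirmed')).toBeVisible();
});
```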

Flaky Test vs Resilient Test

// Resilient: web-first assertions auto-wait
import { test, expect } from '@playwright/test';

test('cart total updates after add', async ({ page }) => {
  await page.goto('/products/hoodie');
  await page.getByRole('button', { name: 'Add to cart' }).click();

  // toHaveText retries up to the test timeout
  await expect(page.getByTestId('cart-total')).toHaveText('$42.00');
});
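For contrast, a reconstruction of the flaky pattern described in the next paragraph, a hard sleep plus a CSS-class selector (selectors illustrative):

```typescript
// Flaky: hard sleep + CSS-class selector
import { test, expect } from '@playwright/test';

test('cart total updates after add', async ({ page }) => {
  await page.goto('/products/hoodie');
  await page.locator('.add-to-cart-btn').click();

  // Races the UI: on a slow CI runner the total may not have updated yet
  await page.waitForTimeout(500);
  expect(await page.locator('.cart-total').textContent()).toBe('$42.00');
});
```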

The resilient version uses web-first assertions, which retry internally until either the condition is met or the test timeout is reached. The flaky version uses a hard sleep and a CSS-class selector. On a slow CI runner the 500ms wait can fire before the cart total has updated, and the test fails even though the application is working. Worse, the CSS class will break the next time anyone touches the cart component, and the failure will look identical to a real bug.

Quarantine in Action
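Playwright has no built-in quarantine, so a common convention (an assumption here, not an Assrt feature) is a tag in the test title that CI excludes; the ten-in-a-row check maps to Playwright's real --repeat-each flag:

```typescript
import { test, expect } from '@playwright/test';

// Quarantined: fails intermittently; root cause tracked in the bug
// list with an SLA (test and tag are illustrative)
test('cart badge updates @quarantine', async ({ page }) => {
  await page.goto('/');
  await expect(page.getByTestId('cart-badge')).toHaveText('1');
});
```

CI then runs the trusted suite with npx playwright test --grep-invert @quarantine, while a nightly job can run npx playwright test --grep @quarantine --repeat-each=10 to verify a fix before releasing a test from quarantine.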

7. Hand the Long Tail Off to an AI Runner

Once the harness is solid and a handful of critical tests are green in CI, the bottleneck shifts. The hard part is no longer writing the harness. It is writing the next two hundred tests that cover the long tail of the product: every form field, every error state, every settings page nobody remembers existed. This is where AI generation earns its keep. A human should never spend a Friday afternoon writing a test for the password reset flow. A model should write that test in ten seconds and the human should review the diff.

Assrt Generation Pipeline

  • Crawl: headless Chromium walks every reachable route
  • Plan: an LLM groups interactions into user-meaningful flows
  • Generate: emits Playwright code with role-based locators
  • Validate: runs each spec live; failures enter a heal loop
  • Commit: only passing tests land in your repo


The output is a folder of standard Playwright spec files, committed to the same repo as the hand-written tests, run by the same npx playwright test command, and reviewed in the same pull requests. There is no second runner, no separate dashboard, no vendor login. If Assrt disappeared tomorrow, the tests would still run because they are not Assrt-format files. They are Playwright files that Assrt happened to write.

  4. Generated test for password reset: tests/ai/password-reset.spec.ts (moderate)

8. Measure What Matters, Iterate Weekly

QA automation programs that measure nothing slowly drift into disuse. The four metrics that actually matter are mean time to feedback, flakiness rate, defect escape rate, and authoring cost per test. Track them weekly, post the chart in the engineering channel, and use it as an honest mirror.

  • Mean time to feedback: target under 5 minutes
  • Flakiness rate: target under 1%
  • Defect escape rate: the ceiling your customers feel; keep it near zero
  • Authoring cost per test: target under 30 minutes

Mean time to feedback is the wall-clock time from git push to a green or red signal. Under five minutes is the gold standard for smoke tests. Anything above fifteen minutes degrades developer behavior because engineers context-switch while waiting and forget what they were testing.

Flakiness rate is the percentage of failures that are not real bugs. Above 1% and engineers stop trusting the suite. Above 5% and the suite becomes worse than no suite at all because it absorbs attention without producing signal.

Defect escape rate is the percentage of production bugs that the suite did not catch. This is the metric your customers feel. A program that optimizes for everything else but ignores escapes is solving the wrong problem.

Authoring cost is how long it takes a developer to add one new test. If it is above thirty minutes, the harness is fighting you and you should fix the harness before adding more tests. With Assrt generating the long tail, this metric should trend toward zero on the AI-generated portion of the suite.

9. Common Mistakes That Sink QA Automation Projects

Most QA automation programs do not fail for technical reasons. They fail for organizational reasons, and the failure modes are remarkably consistent. Knowing what they look like is the cheapest insurance against repeating them.

Trying to automate everything before anything works. The teams that succeed start with one test in CI. The teams that fail draft a 200-test plan, spend three months building it, and never get to a green run. Ship one test the first day. Ship ten by the end of week one. Stop optimizing the plan and start producing the artifact.

Buying a vendor before understanding the problem. Test infrastructure is something you grow into. The shape of your test suite is specific to your application, your CI budget, and your team. A vendor sells you their shape, and once you adopt it, you are paying forever. Start with Playwright in your repo and add tools when you have a specific bottleneck to solve.

Letting flakiness accumulate. Every flaky test is a tax on every other test in the suite. The temptation is to add a retry annotation and move on. Resist it. Quarantine the test, file a bug, and fix the root cause within the week. A 99% reliable suite of 200 tests still produces two false failures per run, which is enough to destroy trust.
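The arithmetic behind that last claim, as a quick sketch:

```typescript
// Expected false failures per run for a suite of n tests, each of
// which falsely fails with the given per-test flake rate.
function expectedFalseFailures(n: number, perTestFlakeRate: number): number {
  return n * perTestFlakeRate;
}

// Probability that at least one test in the run falsely fails.
function probAnyFalseFailure(n: number, perTestFlakeRate: number): number {
  return 1 - Math.pow(1 - perTestFlakeRate, n);
}

console.log(expectedFalseFailures(200, 0.01)); // prints 2
console.log(probAnyFalseFailure(200, 0.01).toFixed(2)); // prints 0.87
```

In other words, a "99% reliable" 200-test suite shows a false red on the large majority of runs, which is why the 1% target is per suite, not per test.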

Generating tests no one reviews. AI generation is a productivity lever, not an accountability lever. The diff still goes through code review. The author of record is whoever merges the PR. If a generated test is wrong, the human who clicked merge owns it. Treat AI output with the same scrutiny as a junior engineer’s pull request, not a black box that produces gospel.

Hiding tests in a vendor cloud. If your tests do not live in git, they are not under version control, they cannot be reviewed, they cannot be diffed, and they cannot be rolled back. The whole point of automation is that the artifacts outlive the people and tools that wrote them. Insist on real code on disk.

10. FAQ

How do I start QA automation from scratch with no existing tests?

Install Playwright, write one smoke test against the homepage, wire it into your CI pipeline as a required check, and merge it the same day. Once the loop is proven, add the two or three flows that would cause an immediate customer escalation if they broke. From there you can either keep writing tests by hand or point Assrt at the running app to generate the long tail automatically.

How long does it take to set up QA automation?

One day to get the first test green in CI. One week to cover the three highest-impact flows. One month to reach a suite of twenty hand-written tests with under 1% flakiness. With Assrt generating the long tail, you can layer on another fifty AI-generated tests in a single afternoon. The bottleneck after that is not authoring, it is review.

Should QA automation replace manual testers?

No. Automation absorbs the regression checks that humans hated doing anyway. The humans shift to exploratory testing, edge-case discovery, accessibility audits, and incident review. The output of the QA function goes up; the headcount does not need to change. The teams that try to automate humans out of the loop end up with brittle suites and angry engineers.

What is the difference between QA automation and unit testing?

Unit tests verify individual functions in isolation, usually without a browser, in milliseconds. QA automation verifies the entire application end to end, in a real browser, the way a user would interact with it. Both matter. Unit tests catch logic bugs early; QA automation catches integration bugs that only appear when the full stack is wired together. A healthy product has both.

How much does QA automation cost?

With Playwright and Assrt, the license cost is zero. The real cost is engineering time: roughly four to eight hours to wire the harness into CI and another hour per hand-written test. AI generation collapses the per-test number to under a minute. Compare that to managed vendors that start at $7,500 per month with annual contracts and still bill you per execution.

Does QA automation work for mobile applications?

Playwright targets web only. For native mobile, the equivalent runners are Appium, Maestro, and Detox. The principles in this guide transfer: write code, version it in git, run it in CI, kill flakiness early, measure mean time to feedback. For mobile web (Safari on iOS, Chrome on Android), Playwright handles it natively through device emulation.

What happens to my tests if Assrt shuts down?

Nothing. The tests Assrt generates are standard Playwright spec files that live in your git repository. You keep running them with npx playwright test forever. Assrt is open-source, so the source is available for audit and forking. This is the entire reason to choose real code over proprietary YAML: zero lock-in is a property of the artifact, not a promise from the vendor.


Ready to automate your testing?

Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.

$ npm install @assrt/sdk