
Automated AI Testing: Real Playwright Code You Own, Not Vendor YAML You Rent

Automated AI testing uses language models to crawl your running application, discover every user flow, and emit executable test code. The critical question is what the AI writes. This guide covers the full loop (plan, generate, execute, heal) with runnable TypeScript and compares tools that output real Playwright against vendors that trap you in proprietary YAML behind a monthly invoice.

$7,500/mo

QA Wolf starts at $7,500 per month with an annual contract. Assrt is free, open-source, and self-hosted, and the Playwright files it generates belong to you permanently.

QA Wolf public pricing, 2025


The Automated AI Testing Loop

The loop is a round-trip between four actors: the developer, the Assrt CLI, a headless browser, and the LLM engine.

1. Developer → Assrt CLI: assrt run https://localhost:3000
2. Assrt CLI → headless browser: launch Chromium, start the crawl
3. Headless browser → Assrt CLI: route graph, ARIA tree, forms, auth state
4. Assrt CLI → LLM engine: scenario planning prompt + DOM
5. LLM engine → Assrt CLI: Playwright spec files, one per flow
6. Assrt CLI → headless browser: execute every generated test
7. Headless browser → Assrt CLI: pass/fail + trace + screenshot
8. Assrt CLI → LLM engine: heal failing selectors, retry
9. Assrt CLI → developer: write .spec.ts files to tests/ai/

1. What Automated AI Testing Actually Means

Automated AI testing is the combination of two ideas that used to live in separate tools. Automated testing means a machine runs the browser without a human clicking through it. AI testing means a language model decides what to click, what to assert, and how to recover when a selector drifts. Put them together and you get a pipeline where a developer points a CLI at a running URL and the machine produces a validated test suite in minutes.

The term is used loosely in marketing material, so it is worth being specific. Automated AI testing is not screen recorders that replay a click sequence. It is not brittle CSS selector harvesters that snapshot a DOM. It is a closed loop of plan, generate, execute, and heal in which the model reads the application like a user and writes code that another developer could read, review, and edit by hand.

The decisive question when choosing a tool is what file lands on disk. If the artifact is a proprietary YAML file that only runs inside a vendor cloud, your test investment is a rental. If the artifact is a standard .spec.ts file that runs with npx playwright test, your tests are durable assets that outlive the tool that wrote them. Assrt is built around the second answer.

What Makes Automated AI Testing Different

  • Reads a running application, not a requirements document or a Figma file
  • Plans user journeys the way a human QA engineer would: signup, login, CRUD, edge cases
  • Generates code that uses accessible locators (getByRole, getByLabel) not CSS selectors
  • Executes every test against the live app before writing it to disk
  • Self-heals selectors when the UI shifts, so maintenance drops toward zero
  • Produces artifacts you commit to git: plain TypeScript, readable, editable, forkable
  • Runs on your laptop, on a container, or on any CI pipeline that understands Node

2. Real Playwright Code vs Proprietary YAML

Every automated AI testing vendor claims to produce reusable artifacts. Then you read the output and realize it is a JSON or YAML file that only means something to their runtime. This is the single largest trap in the category, and it is worth showing concretely.

The Same Checkout Test, Two Output Formats

// Assrt output: standard Playwright you own forever
import { test, expect } from '@playwright/test';

test('user completes checkout with valid card', async ({ page }) => {
  await page.goto('/cart');
  await page.getByRole('button', { name: 'Checkout' }).click();

  await page.getByLabel('Full name').fill('Priya Raman');
  await page.getByLabel('Email').fill('priya@example.com');
  await page.getByLabel('Shipping address').fill('1 Market St');

  const cardFrame = page.frameLocator('iframe[title="Secure card"]');
  await cardFrame.getByPlaceholder('Card number').fill('4242424242424242');
  await cardFrame.getByPlaceholder('MM / YY').fill('12/30');
  await cardFrame.getByPlaceholder('CVC').fill('123');

  await page.getByRole('button', { name: 'Pay now' }).click();
  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
});
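For contrast, here is the same flow in a proprietary step format. This schema is invented for illustration (no specific vendor's syntax is reproduced here), but the pattern is representative: opaque machine-generated locator IDs, vendor-specific keywords, and nothing a standard runner can execute.

```yaml
# Hypothetical vendor format, shown for illustration only. Keys like
# "smartLocator" and "vendorRuntime" are made up, but typical of the
# category: no open-source tool can interpret or run this file.
test: user completes checkout with valid card
runtime: vendorRuntime/v2
steps:
  - navigate: /cart
  - click:
      smartLocator: btn-checkout-7f3a
  - type:
      smartLocator: field-fullname-91bc
      value: Priya Raman
  - type:
      smartLocator: field-card-22de
      value: "4242424242424242"
  - click:
      smartLocator: btn-pay-0a11
  - assertVisible:
      smartLocator: heading-confirm-55e0
```

Note what is missing: there is no way to open this in an editor and reason about it the way you can with the Playwright file above, and the locator IDs are meaningless outside the vendor's database.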

The Assrt output is a file you can paste into any Playwright project. A vendor's proprietary output for the same flow looks superficially similar but cannot run anywhere except inside that vendor's runtime. Cancel the subscription and the tests stop working. Export them and you get a dump that no open-source tool understands. This is the rent trap.

Artifact Ownership: Rented vs Owned

Rent: vendor YAML tied to the vendor's cloud runtime. A monthly invoice is required to execute the tests, and the artifacts stop working the moment you churn.

Own: standard Playwright in your git repo. The tests keep running forever, even if Assrt disappears.


3. Architecture of an Automated AI Test Runner

Understanding the architecture helps you spot tools that take shortcuts. A serious automated AI testing system has four stages, not three. Most vendors skip the validation stage and ship tests that look plausible but break on first run. Assrt validates every file before it touches disk.

Assrt's Four-Stage Pipeline

1. Crawl: Chromium walks every route reachable from the URL.
2. Plan: the LLM groups interactions into scenarios.
3. Generate: emits Playwright code with accessible locators.
4. Validate: runs each file, discards or heals failures.
5. Commit: writes passing tests into your repo.

Stage 1: Crawl

A headless Chromium instance starts from your URL and follows every link, form, and route transition it can discover. It records DOM snapshots, the ARIA accessibility tree, form definitions, network traffic, and cookies. The output is a directed graph of application states with edges labeled by the actions that connect them.
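The crawl output described above can be pictured as a small data structure. The following TypeScript is a minimal sketch, with names (AppState, ActionEdge, reachableRoutes) that are illustrative rather than Assrt's actual internal API:

```typescript
// Illustrative shape of the crawl output: a directed graph of
// application states connected by user actions. All names hypothetical.
interface AppState {
  url: string;
  ariaSnapshot: string;      // serialized accessibility tree
  forms: string[];           // form identifiers discovered on the page
}

interface ActionEdge {
  from: string;              // URL of the source state
  to: string;                // URL the action leads to
  action: string;            // e.g. "click button 'Checkout'"
}

interface CrawlGraph {
  states: Map<string, AppState>;
  edges: ActionEdge[];
}

// Breadth-first walk: every route reachable from the start URL.
function reachableRoutes(graph: CrawlGraph, start: string): string[] {
  const seen = new Set<string>([start]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const edge of graph.edges) {
      if (edge.from === current && !seen.has(edge.to)) {
        seen.add(edge.to);
        queue.push(edge.to);
      }
    }
  }
  return [...seen];
}
```

The graph, not the raw DOM, is what makes the planning stage tractable: the model reasons over a few hundred states and edges instead of megabytes of HTML.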

Stage 2: Plan

The graph and a structured system prompt go to the language model. The model reasons about which sequences of actions form a meaningful user journey and outputs a test plan. A plan entry looks less like code and more like a design doc: flow name, preconditions, ordered steps, and the observable outcome that should be asserted.
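That design-doc shape can be captured as plain data. A sketch of one plan entry, with field names that are an assumption about the format rather than Assrt's documented schema:

```typescript
// Hypothetical shape of one test-plan entry, mirroring the description
// above: flow name, preconditions, ordered steps, observable outcome.
interface PlanStep {
  action: 'goto' | 'click' | 'fill';
  target: string;            // accessible name, label, or URL
  value?: string;            // text to type, for 'fill' steps
}

interface PlanEntry {
  flow: string;
  preconditions: string[];
  steps: PlanStep[];
  expectedOutcome: string;   // the assertion the generator will emit
}

const checkoutPlan: PlanEntry = {
  flow: 'user completes checkout with valid card',
  preconditions: ['cart contains at least one item'],
  steps: [
    { action: 'goto', target: '/cart' },
    { action: 'click', target: "button 'Checkout'" },
    { action: 'fill', target: 'Full name', value: 'Priya Raman' },
    { action: 'click', target: "button 'Pay now'" },
  ],
  expectedOutcome: "heading 'Order confirmed' is visible",
};
```

Because the plan is structured data rather than free text, the generation stage can compile it deterministically instead of asking the model to emit code in one shot.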

Stage 3: Generate

Each plan entry is compiled into a Playwright spec file. The generator prefers role-based locators over CSS selectors because role-based locators survive refactors. If a form field has a visible label, the generator uses getByLabel. If a button has an accessible name, it uses getByRole. CSS selectors appear only when nothing else matches.
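The locator-preference rule reduces to a short decision function. A sketch under the assumption that the crawler records a role, accessible name, visible label, and raw CSS selector per element; the function and field names are illustrative:

```typescript
// Sketch of the preference order described above: label, then role +
// accessible name, then raw CSS as a last resort. Names illustrative.
interface ElementInfo {
  role?: string;             // ARIA role, e.g. "button"
  accessibleName?: string;   // computed accessible name
  label?: string;            // visible <label> text, for form fields
  css: string;               // raw CSS selector, always available
}

function pickLocator(el: ElementInfo): string {
  // A visible label survives markup refactors, so it wins.
  if (el.label) {
    return `page.getByLabel('${el.label}')`;
  }
  // Next best: role plus accessible name.
  if (el.role && el.accessibleName) {
    return `page.getByRole('${el.role}', { name: '${el.accessibleName}' })`;
  }
  // Last resort: the brittle CSS selector.
  return `page.locator('${el.css}')`;
}
```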

Stage 4: Validate

Generated files are executed against the live app in a sandboxed worker. Passing tests are written to disk. Failing tests enter a repair loop: the model receives the failure message and retries with a corrected locator or assertion. If three retries fail, the test is dropped rather than shipped broken. You never commit a red test generated by Assrt.
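The validate-and-heal loop can be sketched in a few lines. The run and repair callbacks stand in for the sandboxed Playwright worker and the LLM repair call; the function itself is illustrative, not Assrt's actual API:

```typescript
// Sketch of the repair loop described above: run the spec, on failure
// ask the model for a corrected version, give up after three retries.
type RunResult = { passed: boolean; error?: string };

async function validateSpec(
  spec: string,
  run: (spec: string) => Promise<RunResult>,
  repair: (spec: string, error: string) => Promise<string>,
  maxRetries = 3,
): Promise<string | null> {
  let current = spec;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = await run(current);
    if (result.passed) return current;    // safe to write to disk
    if (attempt === maxRetries) break;    // out of retries
    current = await repair(current, result.error ?? 'unknown failure');
  }
  return null;                            // dropped, never committed
}
```

Returning null rather than the failing spec is the design choice that enforces the guarantee in the paragraph above: a red test never reaches the repository.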


Own every test file you generate

Assrt discovers your app, emits real Playwright, and validates every spec against your running build. Open-source, self-hosted, zero lock-in.

Get Started

4. Scenario: Checkout Flow With Stripe Elements

Checkout flows are the highest-stakes tests in any product. Revenue runs through them. They also touch iframes, third-party scripts, and asynchronous state changes, which is where brittle test runners fall apart. Here is what Assrt generates for a Stripe Elements checkout on a storefront.

1. Happy path with valid test card (Moderate): tests/ai/checkout-happy.spec.ts
2. Declined card surfaces inline error (Moderate): tests/ai/checkout-declined.spec.ts
3. Network partition during payment (Complex): tests/ai/checkout-offline.spec.ts
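To give a flavor of the second scenario, here is a sketch of what a declined-card spec plausibly looks like. The card number 4000000000000002 is Stripe's standard always-decline test card; the locators, error copy, and URL pattern are assumptions about the storefront under test, so treat this as illustrative rather than verbatim Assrt output:

```typescript
import { test, expect } from '@playwright/test';

// Illustrative sketch: a declined test card must surface an inline
// error and leave the user on the payment step. Locators and error
// text are assumptions about the app under test.
test('declined card surfaces inline error', async ({ page }) => {
  await page.goto('/cart');
  await page.getByRole('button', { name: 'Checkout' }).click();

  await page.getByLabel('Full name').fill('Priya Raman');
  await page.getByLabel('Email').fill('priya@example.com');

  const cardFrame = page.frameLocator('iframe[title="Secure card"]');
  // Stripe's documented always-decline test card.
  await cardFrame.getByPlaceholder('Card number').fill('4000000000000002');
  await cardFrame.getByPlaceholder('MM / YY').fill('12/30');
  await cardFrame.getByPlaceholder('CVC').fill('123');

  await page.getByRole('button', { name: 'Pay now' }).click();

  // The error renders inline; the page does not navigate away.
  await expect(page.getByText('Your card was declined')).toBeVisible();
  await expect(page).toHaveURL(/checkout/);
});
```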

5. Scenario: Authenticated Multi-Tenant Dashboard

Most SaaS tests live behind a login wall and target a specific tenant. Automated AI testing has to handle both the authentication step and the tenant scoping. Assrt does this by reusing a stored Playwright storageState from a login fixture so every generated test starts already logged in.
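The storageState mechanism is standard Playwright, so the setup is reviewable like everything else. A minimal sketch: a setup script logs in once and persists the session to a file that generated specs reuse. The URL, labels, credentials, and file path are placeholders for your app:

```typescript
// auth.setup.ts -- log in once, persist cookies + localStorage.
// Everything app-specific here (URL, labels, path) is a placeholder.
import { test as setup } from '@playwright/test';

setup('authenticate', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill('qa@example.com');
  await page.getByLabel('Password').fill('correct-horse-battery');
  await page.getByRole('button', { name: 'Sign in' }).click();
  // Save the authenticated session for every generated spec to reuse.
  await page.context().storageState({ path: 'auth/tenant-a.json' });
});
```

Generated specs then opt in with the standard `test.use({ storageState: 'auth/tenant-a.json' })`, and a multi-tenant run simply swaps in a different file per tenant.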

4. Tenant isolation: data does not leak across workspaces (Complex): tests/ai/tenant-isolation.spec.ts
5. Role-based access: viewer cannot edit (Moderate): tests/ai/rbac-viewer.spec.ts

6. Scenario: Form Validation and Error States

Forms are the place where automated AI testing shines because the state space is tedious for humans to enumerate. Empty required fields, out-of-range numbers, mismatched confirm passwords, invalid emails, the same form submitted twice. Assrt walks the combinations systematically and writes one test per meaningful failure mode.

6. Required field validation (Straightforward): tests/ai/form-required.spec.ts
7. Email format validation with aria-invalid (Moderate): tests/ai/form-email.spec.ts
8. Idempotent submission: double click does not double send (Complex): tests/ai/form-idempotent.spec.ts
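The systematic walk over failure modes is essentially a cross-product: every field paired with every input it must reject. A sketch of the enumeration idea, with field definitions that are purely illustrative:

```typescript
// Sketch: one generated test per (field, invalid input) pair -- the
// tedious enumeration described above. Field specs are illustrative.
interface FieldSpec {
  label: string;
  invalidSamples: string[];  // inputs the form must reject
}

const signupForm: FieldSpec[] = [
  { label: 'Email', invalidSamples: ['', 'not-an-email', 'a@b'] },
  { label: 'Age', invalidSamples: ['-1', '500'] },
  { label: 'Password', invalidSamples: [''] },
];

// Flatten the cross-product into one test case per failure mode.
function failureCases(
  fields: FieldSpec[],
): Array<{ label: string; input: string }> {
  return fields.flatMap(f =>
    f.invalidSamples.map(input => ({ label: f.label, input })),
  );
}
```

Three fields with a handful of bad inputs each already yields six cases; a real signup form easily produces dozens, which is exactly the work humans skip and machines do not.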

7. Self-Hosting: Docker, CI, and Your Own LLM

Because Assrt is open-source and ships as a Node package, you can run it anywhere Node runs. That includes your laptop, your internal Kubernetes cluster, a GitHub Actions runner, a GitLab CI job, or a self-hosted LLM endpoint. There is no phone-home, no account required, and no rate limit beyond what your own LLM budget enforces.

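A CI wiring might look like the following workflow sketch. The `assrt run <url>` invocation mirrors the CLI call shown in the pipeline diagram; the package name, app start command, and LLM key variable are assumptions to adapt to your project:

```yaml
# .github/workflows/ai-testing.yml -- illustrative sketch. The assrt
# call mirrors the CLI usage shown earlier; start command, secret name,
# and package details are assumptions for your own setup.
name: ai-testing
on: [pull_request]
jobs:
  generate-and-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      # Boot the app in the background, then point Assrt at it.
      - run: npm run start &
      - run: npx assrt run http://localhost:3000
        env:
          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
      # The generated suite is plain Playwright; run it like any other.
      - run: npx playwright test tests/ai
```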

Self-Hosted Guarantees

  • Runs on any Node 18+ environment: laptop, Docker, CI, Kubernetes, bare metal
  • Point to your own LLM endpoint (Anthropic, OpenAI, vLLM, Ollama, internal gateway)
  • No telemetry, no phone-home, no vendor account required to execute
  • Playwright artifacts (traces, videos, HAR) stay on your infrastructure
  • You control retention, encryption, and access of test evidence
  • Compliance teams review the same code the AI ships, not a SaaS black box

8. Migration From Vendor Lock-In to Owned Code

If you are currently running an automated AI testing tool that emits proprietary configs, the migration path to Assrt is simpler than you would expect. You do not translate the old files. Assrt regenerates the entire suite from your running application, and because the output is standard Playwright, the new suite drops straight into your existing CI pipeline.

Migration in Five Steps

1. Snapshot: freeze your current proprietary suite as a reference.
2. Run Assrt: generate fresh Playwright from the same URL.
3. Diff coverage: compare scenarios, add missing cases by hand.
4. Dual-run: ship both suites for a week and compare failures.
5. Cut over: cancel the vendor and delete the proprietary files.

scripts/migration-coverage-diff.ts
Running the Coverage Diff
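Step 3 is mechanical once both suites exist. A minimal sketch of what scripts/migration-coverage-diff.ts might contain; the input shape (suite name plus a list of human-readable scenario names) is an assumption about how you export each suite's inventory:

```typescript
// Sketch of the coverage diff from step 3: which scenarios in the
// frozen vendor suite are missing from the regenerated Playwright
// suite. The Suite shape is an assumption.
interface Suite {
  name: string;
  scenarios: string[];       // human-readable flow names
}

function coverageDiff(oldSuite: Suite, newSuite: Suite): {
  covered: string[];
  missing: string[];         // in the old suite, absent from the new one
} {
  const fresh = new Set(newSuite.scenarios.map(s => s.toLowerCase()));
  const covered: string[] = [];
  const missing: string[] = [];
  for (const scenario of oldSuite.scenarios) {
    (fresh.has(scenario.toLowerCase()) ? covered : missing).push(scenario);
  }
  return { covered, missing };
}
```

Anything in `missing` is a case to add by hand before starting the dual-run week.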

9. Cost Breakdown: Assrt vs QA Wolf vs mabl vs Testim

The price tags in this market vary by two orders of magnitude. QA Wolf starts at $7,500 per month. mabl and Testim sit in the mid-hundreds. Assrt is free. What you actually pay, though, is more than the license. Here is the honest first-year total cost of ownership for a 50-test automated AI testing program.

Tool               | License    | Output               | Year 1 TCO            | Lock-in
Assrt              | Free, OSS  | Standard Playwright  | $4,200 (review time)  | None
QA Wolf            | $7,500+/mo | Playwright in cloud  | $90,000 + integration | High
mabl               | $500+/mo   | Proprietary JSON     | $6,000 + migration    | High
Testim (Tricentis) | $450+/mo   | Testim runtime JS    | $5,400 + migration    | High
Octomind           | $500+/mo   | Playwright (managed) | $6,000 + platform     | Medium
Momentic           | $300+/mo   | Proprietary steps    | $3,600 + rebuild      | High

The headline is that Assrt is between roughly 1.3 and 21 times cheaper on year-one TCO, depending on the comparison. The softer but more important number is what happens in year three. With Assrt, your tests are git-tracked TypeScript, so the marginal cost of keeping them is the marginal cost of running Playwright, which you were going to pay anyway. With proprietary vendors, every additional year is another full license payment and you still own nothing at the end of it.

scripts/tco-calculator.ts
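The table's year-one numbers reduce to simple arithmetic. A sketch of what scripts/tco-calculator.ts might look like; the $4,200 review-time figure is the table's own assumption, and the unspecified "+ integration" and "+ migration" extras are excluded here:

```typescript
// Sketch reproducing the table's year-one license math. One-off extras
// (integration, migration) are left out because the table does not
// quantify them; the $4,200 review-time figure is the table's own.
interface ToolCost {
  tool: string;
  monthlyLicense: number;    // USD per month
  oneOff: number;            // quantified one-time cost, if any
}

function yearOneTco(t: ToolCost): number {
  return t.monthlyLicense * 12 + t.oneOff;
}

const tools: ToolCost[] = [
  { tool: 'Assrt', monthlyLicense: 0, oneOff: 4_200 },
  { tool: 'QA Wolf', monthlyLicense: 7_500, oneOff: 0 },
  { tool: 'mabl', monthlyLicense: 500, oneOff: 0 },
];
```

Running the numbers: QA Wolf lands at $90,000 against Assrt's $4,200, which is where the roughly 21x headline comes from.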

What You Ship to Git

// With Assrt: every file is reviewable code
// tests/ai/checkout.spec.ts
import { test, expect } from '@playwright/test';

test('checkout completes', async ({ page }) => {
  await page.goto('/cart');
  await page.getByRole('button', { name: 'Checkout' }).click();
  // ... 20 more lines of readable Playwright
});
// Git diff shows every change. Review is a code review.
// Rollback is a git revert. Ownership is total.

10. FAQ

What is the difference between automated AI testing and AI-assisted testing?

Automated AI testing runs the full loop without human intervention: crawl, plan, generate, execute, heal. AI-assisted testing keeps a human in the authoring loop and uses the model as a copilot that suggests selectors or drafts a single test at a time. Assrt supports both modes. The CLI runs end-to-end on its own; the VS Code extension offers inline suggestions while a human writes a test by hand.

Can automated AI testing handle apps that require login?

Yes. Assrt accepts a Playwright storageState file or a login script. It reuses the authenticated session for every generated test, which means generated specs start already logged in and can target any route behind the login wall. Multi-tenant apps are handled by passing a per-tenant storageState.

Does automated AI testing replace the QA team?

No, and the teams that treat it that way regret it. Automated AI testing absorbs the tedious regression work that a QA team already hated writing. The humans shift to exploratory testing, test planning, edge-case discovery, and incident review. The output of the QA team goes up, not the headcount.

How does Assrt avoid shipping broken generated tests?

Every generated spec file is executed against the live application before it is written to disk. Tests that fail enter a repair loop with up to three retries. Tests that still fail are discarded, not committed. Teams never see a red test file that Assrt generated because Assrt would have dropped it before it landed in the repo.

What happens if Assrt shuts down tomorrow?

Nothing happens to your tests. They are standard Playwright files in your git repository. You keep running them with npx playwright test forever. Assrt is open-source, so the source code is available for audit and forking. This is the whole point of generating real code instead of a proprietary format.

Which LLM does Assrt use, and can I swap it?

Assrt defaults to Claude for planning and generation but supports any OpenAI-compatible endpoint. You can point it at OpenAI, Anthropic, Azure OpenAI, a self-hosted vLLM deployment, or an internal gateway. Cost and latency are tunable because the LLM call is a standard HTTP request you control.

Can automated AI testing handle visual regressions?

Yes. Playwright ships with screenshot assertions out of the box and Assrt generates tests that use them when a page has visual state worth snapshotting. Visual diffs live next to functional tests in the same tests/ai folder and run on the same CI job.


Ready to automate your testing?

Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.

$ npm install @assrt/sdk