
AI-Assisted Playwright Test Scaffolding: Keeping Generated Code Consistent

The hardest part of AI test generation is not producing code that runs. It is producing code that fits your existing test suite structure, reuses shared page objects, and follows your team's fixture conventions.

60% of AI-generated Playwright tests fail code review on the first pass due to inconsistency with existing page objects and fixture patterns. (Developer survey, 2026)

1. The Consistency Problem in AI Test Generation

Most teams that experiment with AI-generated Playwright tests hit the same wall within the first week. The AI produces code that runs. It finds elements, clicks buttons, fills forms, and makes assertions. But the code looks nothing like the rest of the test suite. It inlines selectors that should come from a shared page object. It creates its own setup logic instead of using the team's custom fixtures. It names files and describes tests with conventions that clash with everything already in the repository.

This inconsistency is not a minor style issue. When AI-generated tests bypass existing page objects, they create duplicate selector definitions that drift over time. When they ignore custom fixtures, they miss shared authentication flows, database seeding, or environment configuration that the team has already built and debugged. The result is a test suite that technically has more coverage but is significantly harder to maintain because half the tests follow one pattern and half follow another.

The root cause is straightforward. AI models generate code based on their training data and the immediate prompt context. They do not inherently understand your project's architecture, your team's conventions, or the relationships between your test files. Without explicit scaffolding, they will produce generic Playwright code that ignores everything your team has built. Solving this requires treating AI test generation as a pipeline with deliberate structure, not a magic prompt.

2. Scaffolding Your AI Test Generation Pipeline

Effective AI test scaffolding starts with giving the model the right context before it writes a single line of test code. This means building a generation pipeline that feeds your existing test architecture into the prompt systematically. Think of it as creating a template system where the AI fills in the specifics while your scaffolding enforces the structure.

The first layer of scaffolding is a conventions document. This is a plain-text file (often stored at the root of your test directory) that describes your test suite's patterns: how files are named, how tests are organized, which fixtures are available and when to use them, and which page objects exist. This document becomes part of every AI generation prompt. Teams that maintain this document report a significant improvement in first-pass code review acceptance rates because the AI learns to follow the same patterns.

The second layer is example injection. Rather than describing your patterns in prose, you include two or three representative test files directly in the prompt context. The AI can then pattern-match against real code from your suite. Choose examples that demonstrate your most common patterns: a test using page objects, a test with custom fixtures, and a test with complex assertions. This approach is more reliable than written descriptions alone because models are better at imitating code patterns than following abstract rules.

The third layer is structural constraints. Instead of asking the AI to generate a complete test file from scratch, you provide a skeleton with the imports, fixture usage, and describe blocks already in place. The AI only needs to fill in the specific test steps and assertions. This drastically reduces the surface area for inconsistency because the structural decisions are already made.
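The three layers can be combined into a single prompt-assembly step. The sketch below is one possible shape for that step; the `GenerationContext` interface, the section headings, and the `buildPrompt` function are illustrative assumptions, not a fixed API.

```typescript
// Assembles the three scaffolding layers into one generation prompt:
// conventions document, representative examples, and a structural skeleton.
interface GenerationContext {
  conventions: string;   // contents of the team's conventions document
  examples: string[];    // two or three representative test files
  skeleton: string;      // pre-built file skeleton with TODO slots
}

function buildPrompt(ctx: GenerationContext, request: string): string {
  const exampleSection = ctx.examples
    .map((src, i) => `### Example ${i + 1}\n${src}`)
    .join("\n\n");
  return [
    "## Team conventions",
    ctx.conventions,
    "## Representative tests from this suite",
    exampleSection,
    "## Fill in only the TODO sections of this skeleton",
    ctx.skeleton,
    "## Task",
    request,
  ].join("\n\n");
}

// Usage: the skeleton pins the structure, so the model only fills TODOs.
const prompt = buildPrompt(
  {
    conventions: "Use page objects; never inline selectors.",
    examples: ["test('user can log in', ...)"],
    skeleton: "test.describe('checkout', () => { /* TODO */ });",
  },
  "Cover the saved-card checkout flow."
);
```

Because the structural decisions live in the skeleton rather than the model's output, a reviewer only needs to check the filled-in steps and assertions.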


3. Maintaining Page Object Alignment

Page objects are the most common casualty of AI-generated tests. A well-maintained test suite might have a LoginPage class that encapsulates all selectors and actions for the login flow, a DashboardPage for navigation and widget interactions, and a CheckoutPage for the purchase flow. When AI generates a new test for the checkout process, it needs to know that CheckoutPage exists, what methods it exposes, and how to import it.

The practical solution is to include a page object index in your generation context. This is a summary file that lists every page object, its file path, and its public methods with brief descriptions. For a team with twenty page objects, this index might be a few hundred lines. That fits comfortably within a modern model's context window alongside the test conventions and examples.
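A page object index can be as simple as a list of entries rendered to plain text for the prompt. The entry shape and the rendering format below are assumptions for illustration; real tooling would generate the entries by scanning the repository.

```typescript
// Hypothetical shape for one page object index entry.
interface PageObjectEntry {
  name: string;                    // exported class name, e.g. "CheckoutPage"
  path: string;                    // import path relative to the test root
  methods: Record<string, string>; // method name -> one-line description
}

// Renders the index as the plain-text summary included in the prompt.
function renderIndex(entries: PageObjectEntry[]): string {
  return entries
    .map((e) => {
      const methods = Object.entries(e.methods)
        .map(([name, desc]) => `  - ${name}(): ${desc}`)
        .join("\n");
      return `${e.name} (${e.path})\n${methods}`;
    })
    .join("\n\n");
}
```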

When the AI generates a test that needs to interact with a page that already has a page object, it should import and use that page object rather than writing raw selectors. When it encounters a page without an existing page object, it should either flag this for human review or generate a new page object following the established pattern. The key is that the AI's default behavior should be to check the index first, not to start from scratch.
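The "check the index first" default can be expressed as a small decision function. The index shape and the flag-for-review outcome here are illustrative assumptions about how a scaffolding layer might behave.

```typescript
// Either reuse an existing page object, or flag the page for human review.
type Resolution =
  | { kind: "use"; importPath: string }
  | { kind: "flag-for-review"; page: string };

function resolvePageObject(
  index: Map<string, string>, // page name -> import path
  page: string
): Resolution {
  const importPath = index.get(page);
  return importPath
    ? { kind: "use", importPath }
    : { kind: "flag-for-review", page };
}
```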

Assrt handles this by analyzing your existing test directory before generating new tests. It identifies page objects, fixtures, and helper utilities automatically and incorporates them into its generation context. This means generated tests reference your actual page objects by default, reducing the manual cleanup needed after generation. The setup is a single command: npx @m13v/assrt discover https://your-app.com

4. Custom Fixtures and Context Windows

Playwright's fixture system is one of its most powerful features, and also one of the hardest things for AI to use correctly. Teams build custom fixtures for authenticated sessions, database state, feature flags, API mocking, and dozens of other setup tasks. These fixtures encode critical knowledge about how the application works and what state it needs for testing.

When AI ignores custom fixtures, it often recreates their functionality inline. This leads to test code that logs in via the UI on every test run instead of using a fixture that injects authentication tokens directly. Or test code that manually navigates to a page through multiple clicks instead of using a fixture that opens the page with pre-configured state. These inline alternatives are slower, more fragile, and harder to maintain.

The solution is to document your fixtures in a machine-readable format that becomes part of the generation context. A simple approach is a JSON or YAML file that lists each fixture's name, what it provides, when to use it, and a short usage example. This file lives in your test directory and is updated whenever fixtures change. The overhead of maintaining this documentation is minimal compared to the time saved on reviewing and fixing AI-generated tests.
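One possible shape for such a fixture catalog, expressed here as a TypeScript object rather than standalone JSON or YAML so the rendering step can sit beside it. The field names and the example fixture are assumptions; adapt them to your suite.

```typescript
// Machine-readable documentation for one custom fixture.
interface FixtureDoc {
  name: string;      // fixture name as destructured in the test signature
  provides: string;  // what the fixture gives the test
  whenToUse: string; // guidance the model should follow
  example: string;   // short usage snippet
}

const fixtureDocs: FixtureDoc[] = [
  {
    name: "authenticatedPage",
    provides: "a Page with a logged-in session (token injected, no UI login)",
    whenToUse: "any test that requires a signed-in user",
    example: "test('x', async ({ authenticatedPage }) => { ... })",
  },
];

// Renders the catalog into prompt-ready text.
function renderFixtureDocs(docs: FixtureDoc[]): string {
  return docs
    .map(
      (d) =>
        `- ${d.name}: ${d.provides}\n  use when: ${d.whenToUse}\n  e.g. ${d.example}`
    )
    .join("\n");
}
```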

Context window limits are a practical constraint here. If your test suite has extensive page objects, dozens of fixtures, and long convention documents, you may exceed what fits in a single prompt. The workaround is prioritization: include the page objects and fixtures relevant to the specific test being generated, not the entire index. A scaffolding layer that selects context based on the target page or feature can keep the prompt focused and effective.
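A minimal sketch of that prioritization step, assuming simple keyword matching against entry names and paths; a more sophisticated selector might rank by embedding similarity instead.

```typescript
// Keeps only index entries relevant to the feature being tested,
// so the prompt stays within the context window.
function selectRelevant<T extends { name: string; path: string }>(
  entries: T[],
  feature: string
): T[] {
  const needle = feature.toLowerCase();
  return entries.filter(
    (e) =>
      e.name.toLowerCase().includes(needle) ||
      e.path.toLowerCase().includes(needle)
  );
}
```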

5. Code Review Challenges and Automation

Even with good scaffolding, AI-generated tests need human review before merging. The challenge is that reviewing AI-generated code feels different from reviewing human-written code. The generated tests are often syntactically correct and logically reasonable but miss subtle issues: assertions that verify the wrong thing, selectors that work today but are fragile, or test flows that skip important intermediate states.

Teams that succeed with AI test generation typically establish a specific review checklist for generated tests. This checklist includes verifying that page objects are used correctly, that custom fixtures are leveraged where appropriate, that assertions test meaningful behavior rather than implementation details, and that the test actually exercises the intended user flow end to end. Having a concrete checklist prevents reviewers from rubber-stamping generated code because it "looks right."

Automated lint rules can catch some consistency issues before human review. A custom ESLint rule that flags raw selectors in test files when a page object exists for that page can prevent the most common problem. Similarly, a rule that warns when test files do not import expected fixtures can catch missing setup patterns. These rules act as a safety net that catches the issues scaffolding should have prevented.
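As a simplified stand-in for such a rule, the function below flags string selectors passed to common page methods. A real ESLint rule would walk the AST rather than use a regex; this sketch only illustrates the policy being enforced.

```typescript
// Finds raw string selectors passed to page.click/locator/fill —
// the calls that should go through a page object instead.
function findRawSelectors(source: string): string[] {
  const pattern = /page\.(?:click|locator|fill)\(\s*['"`]([^'"`]+)['"`]/g;
  const hits: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(source)) !== null) {
    hits.push(m[1]);
  }
  return hits;
}
```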

The goal is to reach a point where AI-generated tests pass code review at the same rate as human-written tests. Most teams are not there yet, but the gap is closing. With proper scaffolding, context injection, and automated checks, the first-pass acceptance rate can climb from under 40% to above 80%. The remaining failures typically involve edge cases where the AI lacks sufficient context about application behavior to make the right testing decisions.

6. A Practical Workflow for AI Test Scaffolding

Putting this all together, a practical workflow for AI-assisted test scaffolding follows these steps. First, maintain a conventions document and a page object index in your test directory. Update these whenever your test architecture changes. Second, use a generation tool that accepts this context. Assrt does this automatically by scanning your test directory; for other tools, you may need to build a wrapper that assembles the context into prompts.

Third, generate tests against specific user flows rather than asking for broad coverage. Targeted generation produces better results because the context can be more specific. Fourth, run the generated tests immediately to verify they execute. A test that fails on first run is easier to fix than one that subtly verifies the wrong thing. Fifth, apply your automated lint checks and then send the tests through human review with the AI-specific checklist.

The workflow is iterative. Each round of generation and review reveals gaps in your conventions document or missing page objects that should be created. Over time, the scaffolding improves, the generation quality improves, and the review overhead decreases. Teams that invest in this loop report that after two to three months, AI-generated tests are nearly indistinguishable from human-written tests in style and quality.

The key insight is that AI test generation is not a replacement for test architecture. It is a productivity multiplier that works best when the architecture is solid. Teams that skip the scaffolding step and generate tests into a vacuum end up with a maintenance burden that offsets the productivity gains. Teams that invest in scaffolding first get the full benefit: more coverage, consistent style, and tests that feel like they were written by someone who understands the codebase.

Ready to automate your testing?

Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.

$ npm install @assrt/sdk