Test Automation
AI Playwright Test Generation: What Actually Works in 2026
Writing and maintaining Playwright selectors by hand is one of the biggest time sinks in test automation. Here's how teams are using AI to generate real, runnable test code instead.
“60% of end-to-end test maintenance effort goes toward updating selectors and locators after UI changes, not fixing actual test logic.”
Playwright Community Survey, 2025
1. The Selector Maintenance Problem
Anyone who has maintained a Playwright or Selenium test suite for more than six months knows the pain. A designer changes the navigation layout, and fifteen tests break because they relied on a CSS class that no longer exists. A React refactor swaps a div for a section element, and your XPath selectors silently start targeting the wrong node. A component library update renames data-testid attributes across forty components, and suddenly your entire checkout flow suite is red.
Playwright's built-in locator strategies (role-based selectors, text matching, and the accessibility tree) have improved this situation significantly compared to the raw CSS and XPath days. But even with best-practice locators, the fundamental problem remains: someone has to write the initial selectors, keep them in sync with the evolving UI, and debug the failures when they break. For a large application with hundreds of user flows, this maintenance overhead is enormous.
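The difference can be sketched without a browser. In this toy model (all element data is hypothetical), a class-based query breaks after a markup refactor, while a role-plus-name query of the kind Playwright's `getByRole` builds on still matches:

```typescript
// Toy DOM model illustrating why role-based locators outlive CSS selectors.
type El = { tag: string; classes: string[]; role?: string; name?: string };

const beforeRefactor: El[] = [
  { tag: 'div', classes: ['nav-item', 'nav-cta'], role: 'button', name: 'Sign up' },
];
// After a refactor: new tag, new class names, same accessible role and name.
const afterRefactor: El[] = [
  { tag: 'section', classes: ['header-action'], role: 'button', name: 'Sign up' },
];

const byClass = (els: El[], cls: string) => els.filter(e => e.classes.includes(cls));
const byRole = (els: El[], role: string, name: string) =>
  els.filter(e => e.role === role && e.name === name);

console.log(byClass(beforeRefactor, 'nav-cta').length); // 1 — matches before the refactor
console.log(byClass(afterRefactor, 'nav-cta').length);  // 0 — the CSS selector silently breaks
console.log(byRole(afterRefactor, 'button', 'Sign up').length); // 1 — role locator still finds it
```

The accessibility tree changes far less often than markup, which is exactly why role-based locators are the recommended default.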
The real cost is not just the time spent fixing broken selectors. It is the opportunity cost of QA engineers spending their week updating locators instead of improving test coverage, investigating production issues, or building better test infrastructure. This is the gap that AI test generation tools are trying to close.
2. Manual Playwright and Cypress: The Baseline
Before evaluating AI alternatives, it is worth understanding what good manual test authoring looks like in 2026. Playwright has become the dominant framework for browser automation, surpassing Cypress in adoption for new projects (though Cypress remains widely used in existing codebases). Playwright's advantages include multi-browser support, better handling of iframes and shadow DOM, native network interception, and a more flexible execution model that supports parallelism out of the box.
The standard workflow for manual test writing looks like this: a QA engineer or developer identifies a user flow to test, uses Playwright Codegen to record a rough draft, then refactors the generated code to use stable locators, add meaningful assertions, and handle timing issues. The recording step saves time on the initial draft, but the refactoring step is where most of the effort goes. Codegen tends to produce brittle selectors that need manual improvement.
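The kind of selector that needs refactoring is usually recognizable by pattern. A rough, illustrative heuristic (the pattern list is an assumption, not exhaustive):

```typescript
// Rough heuristic for the kind of selectors Codegen drafts that usually need
// manual refactoring: positional selectors, bare structural tags, generated
// class names, and XPath.
const brittlePatterns: RegExp[] = [
  /:nth-child\(/,    // positional — breaks when siblings reorder
  /^div\b/,          // anchored to a structural tag
  /\.css-[a-z0-9]+/, // framework-generated class name
  /\/\//,            // XPath
];

function needsRefactor(selector: string): boolean {
  return brittlePatterns.some(p => p.test(selector));
}

console.log(needsRefactor('div:nth-child(3) > button')); // true — positional
console.log(needsRefactor('//div[3]/button'));           // true — XPath
console.log(needsRefactor("getByRole('button', { name: 'Checkout' })")); // false
```

A lint rule built on a check like this can flag brittle selectors at review time, before they reach the suite.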
Cypress follows a similar workflow with its own recording tools and selector strategies. The main tradeoff compared to Playwright is architectural: Cypress runs inside the browser, which gives it direct access to the application's JavaScript context for stubbing, but its cross-browser coverage is narrower (Firefox is supported, while WebKit support remains experimental). For teams that primarily test in Chrome, this tradeoff may be acceptable.
The manual approach works well for small to medium applications with stable UIs. Where it breaks down is at scale: applications with hundreds of pages, frequent UI changes, or multiple teams shipping to the same codebase. At that point, the maintenance overhead exceeds what a QA team can reasonably handle, and teams start looking for automation.
3. Proprietary Platforms: Convenience vs. Lock-In
Several commercial platforms have emerged to address the test maintenance problem. Tools like Momentic, Testim (now part of Tricentis), Mabl, and Katalon offer visual test builders, AI-powered selector healing, and cloud execution infrastructure. The pitch is compelling: record your tests visually, let AI handle selector updates when the UI changes, and run everything in the cloud without managing infrastructure.
These platforms genuinely reduce the barrier to entry for test automation. A product manager or manual tester can create end-to-end tests without writing code. The AI-powered self-healing features can reduce selector maintenance by 40% to 70% according to vendor claims, and the real-world experience of many teams confirms that the reduction is significant, if not always as dramatic as advertised.
The tradeoff is lock-in. Tests created in these platforms exist in proprietary formats. You cannot run them locally with a standard test runner. You cannot integrate them into your existing CI pipeline the same way you would Playwright tests. If the vendor raises prices, changes their product direction, or shuts down, your test suite is stranded. For enterprise teams with compliance requirements around test artifact ownership, this can be a dealbreaker.
There is also the question of test quality. Visual test builders make it easy to create tests, but they also make it easy to create bad tests: tests with implicit waits instead of explicit conditions, tests that verify CSS properties instead of functional behavior, tests that pass in the cloud environment but fail locally because of timing differences. The convenience of no-code creation can mask the need for solid test design principles.
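The implicit-versus-explicit wait distinction can be sketched framework-agnostically. This hypothetical helper polls a condition rather than sleeping for a fixed duration, which is roughly what Playwright's web-first assertions do internally:

```typescript
// Framework-agnostic sketch: wait for an explicit condition instead of a
// fixed sleep. Resolves as soon as the condition holds; fails fast on timeout.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 2000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() > deadline) throw new Error('condition not met in time');
    await new Promise<void>(r => setTimeout(r, intervalMs));
  }
}

// Usage: resolves ~100ms in, instead of always sleeping a fixed 2 seconds.
let loaded = false;
setTimeout(() => { loaded = true; }, 100);
waitFor(() => loaded).then(() => console.log('ready'));
```

A fixed sleep either wastes time (too long) or flakes (too short); an explicit condition does neither, which is why it is the mark of a well-designed test regardless of how the test was created.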
4. AI Test Discovery and Open-Source Code Generation
A newer category of tools takes a fundamentally different approach. Instead of recording individual tests or translating natural language into test code, these tools crawl your running application, automatically discover testable user flows, and output standard framework code that you own and can run anywhere. The key distinction is that the output is real Playwright (or Cypress, or whatever framework you choose) code, not a proprietary representation.
Assrt is one example of this approach. You point it at your application URL, it crawls the pages, identifies interactive elements, discovers multi-step user flows, and generates Playwright test files that you can check into your repository and run with your existing test infrastructure. The generated code uses accessibility-tree selectors where possible and includes self-healing logic that adapts when selectors change. Because the output is standard Playwright, you can modify it, extend it, and integrate it into your CI pipeline with no vendor dependency.
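The self-healing idea itself is simple to sketch. The following is a hypothetical illustration of a fallback chain, not Assrt's actual output format:

```typescript
// Sketch of self-healing: try an ordered chain of candidate selectors and
// fall back when the preferred one no longer matches the page.
type Probe = (selector: string) => boolean; // e.g. wraps page.locator(sel).count() > 0

function healSelector(candidates: string[], exists: Probe): string {
  for (const sel of candidates) {
    if (exists(sel)) return sel; // first candidate still present wins
  }
  throw new Error(`no candidate matched: ${candidates.join(', ')}`);
}

// Simulated page where the original test id was renamed in a component update.
const pageHas: Probe = (sel) =>
  ["[data-testid='checkout-v2']", "role=button[name='Checkout']"].includes(sel);

console.log(healSelector(
  ["[data-testid='checkout']", "role=button[name='Checkout']"],
  pageHas,
)); // → role=button[name='Checkout'] — falls back to the role-based candidate
```

Because the fallback chain lives in plain code rather than a vendor's runtime, the healing behavior is visible in code review and debuggable like everything else in the suite.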
Other tools in this space include Playwright Codegen (built into Playwright itself, though limited to recording rather than discovery), various LLM-based generators that produce test code from application descriptions, and internal tools that large companies have built by combining crawling with their own AI models. The common thread is that they output code in a standard format that the team owns.
The advantage of this approach is clear: you get the productivity benefits of AI generation without sacrificing ownership or flexibility. The generated tests are just files in your repository. You can review them in pull requests, track their history in git, run them locally or in any CI system, and modify them as needed. When you outgrow the tool or want to switch approaches, your tests still work because they were always just Playwright code.
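Wiring generated tests into CI looks the same as for hand-written ones. A minimal GitHub Actions sketch (job name, Node version, and scripts are assumptions; adapt to your repo):

```yaml
# Generated Playwright tests run like any checked-in test files.
name: e2e
on: [pull_request]
jobs:
  playwright:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test
```

Nothing in this pipeline knows or cares whether a human or a crawler wrote the tests, which is the point of keeping the output in a standard format.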
5. Choosing the Right Approach for Your Team
The right approach depends on your team's size, technical depth, and priorities. For small teams with strong engineering culture, manual Playwright with AI-assisted code generation (through Copilot or similar) may be sufficient. You get full control, no vendor dependencies, and the flexibility to customize everything. The tradeoff is higher upfront investment in writing and maintaining tests.
For teams that need to scale test coverage quickly and have non-technical stakeholders creating tests, a proprietary platform can be the right choice. The faster time-to-value and lower technical barrier can outweigh the lock-in concerns, especially if you are an early-stage company that needs coverage now and can migrate later if needed.
For teams that want AI-powered productivity without vendor lock-in, open-source discovery tools that output standard framework code offer the best of both worlds. You get automatic test generation and selector maintenance without giving up ownership of your test suite. This approach works particularly well for teams that already have Playwright infrastructure and want to accelerate coverage without changing their workflow.
Regardless of which approach you choose, the most important principle is to keep your tests in a standard, portable format. The frameworks will evolve, the AI tools will improve, and your application will change constantly. The teams that maintain their test suites as first-class code artifacts, version-controlled and reviewed alongside application code, will always have the most flexibility to adapt.