Testing Guide
Playwright v1.56 AI Agents: Planner, Generator, and Healer
Playwright v1.56 introduced three AI agent modes that address different parts of the test lifecycle. Healer is the most immediately useful. Generator needs careful review. Planner requires deliberate prompt strategy. Here is how they work together.
“Playwright v1.56 introduced three AI agent modes: Planner for test scenario design, Generator for code output, and Healer for self-repair. Early adopters report Healer alone cuts selector maintenance time by half.”
Playwright release notes
1. The Agent Trio: What v1.56 Introduced
Playwright v1.56 marked a significant shift in how the framework approaches AI integration. Rather than adding a single "AI-powered" feature, the team introduced three distinct agent modes, each targeting a different pain point in the test automation lifecycle. The Planner agent helps design test scenarios from application descriptions. The Generator agent produces test code from natural language specifications. The Healer agent automatically repairs broken selectors and adapts tests to DOM changes.
The separation into three agents reflects a mature understanding of where AI adds value in testing. Planning, generating, and maintaining tests are fundamentally different tasks that require different approaches. A system that plans test scenarios needs to understand application behavior and user flows. A system that generates code needs to produce syntactically correct, idiomatically clean test files. A system that heals tests needs to understand DOM changes and selector equivalence. Trying to do all three with a single model and prompt would compromise each capability.
Each agent can be used independently or in combination. Teams can adopt the Healer immediately for existing test suites, experiment with the Generator for new tests, and explore the Planner for test design sessions. This modular approach lets teams adopt AI capabilities incrementally based on where they feel the most pain rather than requiring an all-or-nothing commitment.
2. Healer: The Biggest Immediate Impact
The Healer agent addresses the single biggest time sink in test maintenance: broken selectors. When a test fails because a selector no longer matches, the Healer analyzes the current page, identifies the intended target element using multiple signals (role, text, position, surrounding context), and suggests or automatically applies a corrected selector. Early adopters report that Healer reduces selector maintenance time by approximately 50%, which for teams spending twenty hours per sprint on selector fixes translates to ten hours recovered for other work.
What makes Healer particularly effective is that it solves a well-constrained problem. The input is clear (a failing selector and the current page state), the success criterion is objective (does the new selector target the correct element?), and verification is automatic (re-run the test and see whether it passes). Unlike test generation, where quality is subjective and context-dependent, selector healing has a measurable right answer. This is why Healer's accuracy is significantly higher than the Generator's first-pass quality.
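The multi-signal matching described above can be sketched as a scoring function over candidate elements. This is an illustrative sketch of the general technique, not Playwright's internal implementation; the names (`Candidate`, `scoreCandidate`, `pickHealedTarget`) and the signal weights are our assumptions.

```typescript
// Hypothetical sketch of multi-signal candidate scoring for selector
// healing. Names and weights are illustrative, not Playwright internals.

interface Candidate {
  role: string;          // ARIA role of the candidate element
  text: string;          // visible text content
  nearbyText: string[];  // text of surrounding elements (context signal)
}

interface Target {
  expectedRole: string;
  expectedText: string;
  expectedContext: string[];
}

// Weight each signal and return a combined score in [0, 1].
function scoreCandidate(c: Candidate, t: Target): number {
  const roleMatch = c.role === t.expectedRole ? 1 : 0;
  const textMatch = c.text.trim() === t.expectedText.trim() ? 1 : 0;
  const contextOverlap =
    t.expectedContext.length === 0
      ? 0
      : t.expectedContext.filter((s) => c.nearbyText.includes(s)).length /
        t.expectedContext.length;
  // Role and text dominate; surrounding context breaks ties.
  return 0.4 * roleMatch + 0.4 * textMatch + 0.2 * contextOverlap;
}

// Pick the best-scoring candidate above a confidence threshold, or null
// if nothing clears the bar (leave the test failing for a human).
function pickHealedTarget(
  candidates: Candidate[],
  target: Target,
  threshold = 0.6
): Candidate | null {
  let best: Candidate | null = null;
  let bestScore = threshold;
  for (const c of candidates) {
    const s = scoreCandidate(c, target);
    if (s > bestScore) {
      best = c;
      bestScore = s;
    }
  }
  return best;
}
```

The confidence threshold is the important design choice: below it, the healer should surface the failure rather than guess, which is why a well-constrained problem with an objective success check is so amenable to automation.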
The Healer works best when combined with the re-run strategy discussed in self-healing literature. When it detects a broken selector, it fixes the selector and re-runs the test from the beginning. This ensures that the healed selector works in the context of a clean application state, avoiding the state drift issues that plague mid-flow patching approaches.
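The re-run-from-the-beginning strategy can be expressed as a small control loop. This is an illustrative sketch, not a Playwright API: `runWithHealing`, `SelectorError`, and the `heal` callback are hypothetical names for the pattern described above.

```typescript
// Illustrative sketch (not a Playwright API): after a heal, re-run the
// whole test body from a clean state instead of resuming mid-flow.

class SelectorError extends Error {}

// `heal` attempts a fix and returns true if a new selector was applied.
async function runWithHealing(
  testBody: () => Promise<void>,
  heal: (err: SelectorError) => Promise<boolean>,
  maxHeals = 2
): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await testBody(); // always starts from the beginning: clean state
      return;
    } catch (err) {
      const canRetry =
        err instanceof SelectorError &&
        attempt < maxHeals &&
        (await heal(err));
      if (!canRetry) throw err;
      // Loop re-runs testBody from scratch with the healed selector,
      // avoiding the state drift of patching mid-flow.
    }
  }
}
```

Restarting from the top costs an extra run, but it guarantees the healed selector is validated against the same clean state every other run sees.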
For teams evaluating which agent to adopt first, Healer is the clear recommendation. It requires no changes to your existing test code, no new workflows, and no training. You enable it in your Playwright configuration, and it starts reducing selector-related test failures immediately. The ROI is measurable from the first week.
3. Generator: Useful but Needs Heavy Review
The Generator agent takes a natural language description of a test scenario and produces a complete Playwright test file. You describe what you want to test ("verify that a user can add an item to their cart and complete checkout with a credit card"), and the Generator produces a test file with navigation steps, interactions, and assertions. The output is standard Playwright code that you can run immediately.
The quality of Generator output varies significantly based on the complexity of the flow and the specificity of the prompt. For simple, linear flows (login, form submission, single-page interactions), the Generator produces good first drafts that need minor adjustments. For complex flows with conditional logic, multi-step wizards, or interactions between multiple application areas, the output often needs substantial restructuring.
The most common issues with Generator output are: assertions that are too shallow (checking element visibility rather than content correctness), selectors that work but are more fragile than necessary (using text matching when getByRole would be more resilient), and missing edge case handling (not checking error states, empty states, or loading states). These issues are consistent enough that teams can develop a review checklist specifically for Generator output.
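Because these issues recur, part of the review checklist can be automated as a lint pass over generated files. The heuristics below are our assumptions about common patterns, expressed as a sketch; they are not an official Playwright tool, and they complement rather than replace human review.

```typescript
// A lightweight, illustrative review checklist for generated test code.
// The regex heuristics are assumptions about common Generator issues.

interface Finding {
  rule: string;
  message: string;
}

function reviewGeneratedTest(source: string): Finding[] {
  const findings: Finding[] = [];
  // Shallow assertion: visibility checked, but no content assertions.
  if (
    /toBeVisible\(\)/.test(source) &&
    !/toHaveText|toHaveValue|toContainText/.test(source)
  ) {
    findings.push({
      rule: "shallow-assertion",
      message: "Only visibility is asserted; assert content correctness too.",
    });
  }
  // Fragile selector: bare text matching with no role-based locators.
  if (/getByText\(/.test(source) && !/getByRole\(/.test(source)) {
    findings.push({
      rule: "fragile-selector",
      message: "Prefer getByRole over bare text matching where possible.",
    });
  }
  // Missing edge cases: no sign of error or empty-state coverage.
  if (!/error|invalid|empty/i.test(source)) {
    findings.push({
      rule: "missing-edge-cases",
      message: "No error-state or empty-state handling detected.",
    });
  }
  return findings;
}
```

Running a pass like this before human review lets the reviewer spend their time on test architecture rather than mechanical pattern-spotting.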
The practical approach to Generator is to treat it as a first-draft tool rather than a finished-code tool. Use it to scaffold new tests quickly, then review and refine the output against your team's standards. The time savings compared to writing from scratch are real (typically 30% to 50% faster for simple tests), but the review step is not optional. Skipping review leads to a test suite full of shallow, fragile tests that generate noise without catching real defects.
4. Planner: Prompt Strategy for Application Logic
The Planner agent is the most conceptually ambitious of the three. Given a description of an application feature or user flow, it produces a structured test plan: a set of test scenarios with preconditions, steps, expected outcomes, and edge cases to consider. The plan can then be fed to the Generator to produce actual test code, or used as a guide for manual test authoring.
The quality of Planner output depends heavily on the quality of the input. Generic prompts like "plan tests for the checkout feature" produce generic plans with obvious scenarios that any experienced QA engineer would identify in minutes. Specific prompts that include application context produce significantly better results. The key is providing the Planner with information about your application's specific business rules, known edge cases, integration points, and user personas.
An effective prompt strategy for the Planner includes three components. First, describe the feature's functional requirements in plain language, including what happens on success and what happens on various failure conditions. Second, provide context about related features and integration points (does checkout interact with inventory? does it trigger email notifications?). Third, mention any known edge cases or areas where bugs have appeared historically.
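The three-component structure can be captured in a small prompt-assembly helper. This is a sketch of the strategy above; the `PlannerInput` shape and `buildPlannerPrompt` name are our own, not part of Playwright.

```typescript
// Illustrative helper (names are ours, not a Playwright API) that
// assembles the three-part Planner prompt described above.

interface PlannerInput {
  feature: string;
  requirements: string[];    // success and failure behavior, in plain language
  integrations: string[];    // related features and integration points
  knownEdgeCases: string[];  // historical bug areas
}

function buildPlannerPrompt(input: PlannerInput): string {
  return [
    `Plan test scenarios for: ${input.feature}`,
    "",
    "Functional requirements:",
    ...input.requirements.map((r) => `- ${r}`),
    "",
    "Integration points:",
    ...input.integrations.map((i) => `- ${i}`),
    "",
    "Known edge cases and historical bug areas:",
    ...input.knownEdgeCases.map((e) => `- ${e}`),
  ].join("\n");
}
```

Keeping the template in code also makes the prompt reviewable and versionable alongside the tests it produces.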
Teams report the best results when they use the Planner as a brainstorming partner rather than an authoritative source: the Planner generates a broad set of scenarios; the QA team reviews and filters them, adds domain-specific scenarios the Planner missed, and prioritizes based on risk. This collaborative approach leverages the Planner's breadth (it considers many scenarios quickly) while relying on human judgment for depth (which scenarios actually matter for this specific application).
5. How the Three Agents Compose Together
The agents form a pipeline that covers the entire test lifecycle. The Planner produces test scenarios. The Generator turns those scenarios into code. The Healer maintains the code as the application evolves. In theory, this pipeline can operate with minimal human intervention. In practice, human oversight is essential at each stage, with the amount of oversight decreasing as you move from Planner to Healer.
The Planner-to-Generator handoff works best when the Planner output is reviewed and refined before being passed to the Generator. Feeding unreviewed Planner scenarios directly into the Generator produces test code for scenarios that may be irrelevant, redundant, or incorrectly specified. Taking ten minutes to review and curate the Planner's output saves hours of reviewing bad generated code downstream.
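The curation step in that handoff can be modeled explicitly. The `Scenario` shape below is our assumption about what a reviewed plan item might look like, not a documented Planner output format; the point is that only human-approved scenarios flow onward, highest risk first.

```typescript
// Sketch of a curation step between Planner and Generator. The Scenario
// shape is an assumption, not a documented Playwright format.

interface Scenario {
  title: string;
  priority: "high" | "medium" | "low";
  approved: boolean; // set by a human reviewer, false by default
}

// Only reviewed-and-approved scenarios proceed to code generation,
// highest priority first.
function curateForGenerator(scenarios: Scenario[]): Scenario[] {
  const order = { high: 0, medium: 1, low: 2 };
  return scenarios
    .filter((s) => s.approved)
    .sort((a, b) => order[a.priority] - order[b.priority]);
}
```

Defaulting `approved` to false encodes the article's rule in the pipeline itself: unreviewed Planner output never reaches the Generator.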
The Generator-to-Healer relationship is more automatic. Once Generator output has been reviewed, approved, and merged, the Healer takes over maintenance responsibility for the selectors in those tests. This is where the real long-term value emerges: the Generator reduces the cost of creating new tests, and the Healer reduces the cost of maintaining them. Combined, the two agents address the largest cost centers in test automation.
Assrt provides a complementary approach to this agent pipeline by handling the discovery phase that precedes planning. Before you can plan tests, you need to know what your application does. Assrt crawls your application, identifies user flows, and generates Playwright tests with self-healing selectors built in. This fills the gap between "we need tests" and "we know what to test," which the Planner agent assumes someone has already figured out.
6. Practical Adoption Strategy
The recommended adoption sequence is Healer first, Generator second, Planner third. This order follows the principle of starting with the highest-confidence, lowest-risk capability and moving toward more experimental ones. Healer has the clearest value proposition, the most objective success criteria, and the least risk of producing bad output that goes unnoticed.
Start by enabling Healer on your existing test suite and running it for two to three weeks without other changes. Monitor the healing events to understand how often your selectors break, what types of changes cause breakages, and whether the Healer's fixes are correct. This baseline data informs your confidence in the Healer and helps you tune its settings (confidence thresholds, re-run behavior, logging verbosity).
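The baseline data from that monitoring period can be summarized with a small aggregation. The `HealingEvent` shape is an assumption for illustration, not a documented Playwright log format; adapt it to whatever your healer actually emits.

```typescript
// Illustrative aggregation over healing-event logs (the event shape is
// an assumption, not a documented Playwright format).

interface HealingEvent {
  testFile: string;
  oldSelector: string;
  newSelector: string;
  fixCorrect: boolean; // verified by a human during the baseline period
}

interface BaselineStats {
  totalHeals: number;
  correctRate: number; // fraction of heals a human judged correct
  healsPerFile: Map<string, number>;
}

function summarizeHeals(events: HealingEvent[]): BaselineStats {
  const healsPerFile = new Map<string, number>();
  let correct = 0;
  for (const e of events) {
    healsPerFile.set(e.testFile, (healsPerFile.get(e.testFile) ?? 0) + 1);
    if (e.fixCorrect) correct++;
  }
  return {
    totalHeals: events.length,
    correctRate: events.length === 0 ? 0 : correct / events.length,
    healsPerFile,
  };
}
```

Files with unusually high heal counts are good candidates for a selector rewrite rather than continued automatic patching.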
Once the Healer is stable, introduce the Generator for new test development. Establish a review process specifically for Generator output: a checklist, a designated reviewer with test architecture experience, and a policy for how much modification is acceptable before regenerating versus hand-editing. Track the Generator's first-pass acceptance rate over time. If it stays below 50%, your prompts or conventions documentation needs improvement.
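The acceptance-rate tracking above reduces to a couple of helpers. This is a minimal sketch; defining "accepted" as merged with only minor edits is our assumption, and the 50% bar comes from the guidance in this section.

```typescript
// Minimal sketch for tracking Generator first-pass acceptance.
// "Accepted" means merged with only minor edits (our definition).

function firstPassAcceptanceRate(accepted: boolean[]): number {
  if (accepted.length === 0) return 0;
  return accepted.filter(Boolean).length / accepted.length;
}

// Below the 50% bar from the guidance above, revisit your prompts or
// conventions documentation before generating more tests.
function promptsNeedWork(accepted: boolean[]): boolean {
  return firstPassAcceptanceRate(accepted) < 0.5;
}
```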
The Planner is the last addition and the most optional. Many teams find that their existing test planning processes (sprint planning, backlog grooming, dedicated QA planning sessions) work well enough and that adding the Planner does not significantly improve scenario coverage. The Planner shines for teams that are building test coverage from scratch for a new feature or application, where the breadth of its scenario suggestions is most valuable. For mature applications with established test suites, the Planner's incremental value is smaller.
Ready to automate your testing?
Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.