Flaky Test Selector Strategies: Stop Retrying, Start Fixing

If your QA team is retrying tests 40 times to get a green build, the problem is not infrastructure. It is selector fragility. This guide covers practical strategies for building selectors that survive real-world UI changes without relying on brute-force retries.

1. The Retry Trap and Why It Makes Things Worse

When a test fails intermittently, the most common response is to add retries. Retry once. Retry three times. Some teams configure their test runner to retry up to 40 times before marking a test as failed. This approach masks the underlying problem and creates a false sense of stability. The test suite appears green, but it can take several times longer to run than it should, and real regressions hide behind retries that eventually succeed.

Retries also waste CI resources. If a test takes 10 seconds and needs five attempts on average (one initial run plus four retries), that single test consumes 50 seconds of pipeline time, 40 of which are pure retry overhead. Multiply that by the number of flaky tests in the suite and you are paying for minutes or hours of wasted compute on every pipeline run. Some teams report that retries account for 30% to 50% of their total CI runtime.
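The arithmetic above can be sketched in a few lines of TypeScript. This is a rough illustration, reading the example as five attempts in total; the interface and function names are hypothetical, and the numbers should come from your own CI history.

```typescript
// Illustrative sketch: estimate pipeline time lost to retries.
// All names here are hypothetical, not part of any test runner's API.

interface FlakyTest {
  name: string;
  durationSec: number; // time for a single attempt
  avgAttempts: number; // average attempts until green (1 = never retried)
}

// Total time a test consumes per pipeline run, retries included.
function totalSeconds(t: FlakyTest): number {
  return t.durationSec * t.avgAttempts;
}

// Time beyond the first attempt, i.e. time spent purely on retries.
function wastedSeconds(t: FlakyTest): number {
  return t.durationSec * (t.avgAttempts - 1);
}

// Share of the suite's runtime that goes to retries, between 0 and 1.
function retryShare(tests: FlakyTest[]): number {
  const total = tests.reduce((sum, t) => sum + totalSeconds(t), 0);
  const wasted = tests.reduce((sum, t) => sum + wastedSeconds(t), 0);
  return total === 0 ? 0 : wasted / total;
}

const example: FlakyTest = { name: 'login', durationSec: 10, avgAttempts: 5 };
console.log(totalSeconds(example));  // 50
console.log(wastedSeconds(example)); // 40
```

Running retryShare over every test in the suite gives a figure you can compare against the 30% to 50% range that some teams report.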

The deeper problem is that retries erode trust. When the team knows that tests are flaky and retries are expected, they stop investigating failures. A real regression can hide behind a retry and reach production because everyone assumed it was "just flakiness." The only sustainable fix is to address the root cause, which in most cases is selector fragility.

2. Anatomy of a Fragile Selector

A fragile selector is one that depends on implementation details that change frequently. The classic example is a CSS class name generated by a build tool: .btn_a3f2c. A generated name like this can change on any build, so any test that uses it can break on any deployment. But fragility exists on a spectrum. Even human-written class names like .submit-button can change during a design system migration or CSS refactor.

Deep DOM path selectors are another common source of fragility. A selector like div > div:nth-child(3) > form > button breaks whenever the page layout changes, even if the button itself is unchanged. Adding a wrapper div, reordering sections, or restructuring a component hierarchy will all invalidate this selector. The deeper the path, the more opportunities for breakage.

XPath selectors that use absolute paths have the same problem but are often even longer and harder to debug. Selectors based on element indices (the third button on the page, the second input in the form) break whenever a new element is added before the target. All of these patterns share a common flaw: they describe where the element is rather than what the element is.
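The fragile patterns described in this section can be flagged with a small heuristic check. The sketch below is an assumption-laden illustration, not a complete linter: the regular expressions and the depth threshold are arbitrary choices, and the function name is hypothetical.

```typescript
// Heuristic sketch: flag selector patterns that describe where an element is
// rather than what it is. Thresholds and patterns are illustrative assumptions.

type Fragility =
  | 'generated-class'
  | 'positional-index'
  | 'absolute-xpath'
  | 'deep-path';

function findFragility(selector: string): Fragility[] {
  const issues: Fragility[] = [];

  // Class with a hash-like suffix (hex characters including a digit), e.g. ".btn_a3f2c".
  if (/\.[A-Za-z][\w-]*[_-](?=[0-9a-f]*\d)[0-9a-f]{5,}\b/.test(selector)) {
    issues.push('generated-class');
  }

  // Index-based matching: CSS :nth-child(3) / :nth-of-type(2) or XPath div[2].
  if (/:nth-(child|of-type)\(\d+\)/.test(selector) || /\[\d+\]/.test(selector)) {
    issues.push('positional-index');
  }

  // XPath starting at the document root ("/html/..."), as opposed to "//".
  if (/^\/(?!\/)/.test(selector)) {
    issues.push('absolute-xpath');
  }

  // Three or more child combinators means the selector encodes page layout.
  const depth = (selector.match(/>/g) || []).length;
  if (depth >= 3) {
    issues.push('deep-path');
  }

  return issues;
}

console.log(findFragility('div > div:nth-child(3) > form > button'));
// [ 'positional-index', 'deep-path' ]
```

A check like this could run over the selectors in an existing suite to produce a first list of candidates for rewriting.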

3. Building Resilient Selectors from Scratch

The principle behind resilient selectors is simple: select elements by their meaning, not their position or styling. A button that submits a login form should be found by its role and accessible name, not by its CSS class or DOM position. Playwright's locator API is designed around this principle with methods like page.getByRole('button', { name: 'Sign In' }).

When accessible names are not available, use text content as the next best option. A link that says "View Dashboard" can be found with page.getByText('View Dashboard') regardless of its class, ID, or position. For elements without visible text (icons, images, decorative elements), add data-testid attributes as explicit test anchors. These survive refactors because they are a contract between the application and its tests.

For complex scenarios where multiple elements match the same text or role, use Playwright's filtering and chaining capabilities. Narrow your selection by finding the element within a specific section: page.getByRole('navigation').getByRole('link', { name: 'Home' }). This structural context makes the selector specific without making it fragile, because navigation landmarks rarely change position.
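The locator choices from this section can be collected into small helper functions. The sketch below is one possible arrangement, not a prescribed pattern. LocatorLike is a local stand-in for the subset of Playwright's locator API used here, so the example type-checks without Playwright installed; in a real suite you would pass Playwright's page object. The helper names and the 'settings-icon' test ID are hypothetical.

```typescript
// Stand-in for the part of Playwright's Page/Locator API used below.
// In a real test, `page` would be the Playwright Page from the test fixture.
interface LocatorLike {
  getByRole(role: string, options?: { name?: string }): LocatorLike;
  getByText(text: string): LocatorLike;
  getByTestId(testId: string): LocatorLike;
}

// 1. Role plus accessible name: the first choice.
function signInButton(page: LocatorLike): LocatorLike {
  return page.getByRole('button', { name: 'Sign In' });
}

// 2. Visible text, when no accessible name is available.
function dashboardLink(page: LocatorLike): LocatorLike {
  return page.getByText('View Dashboard');
}

// 3. Explicit test anchor for an element without visible text.
//    'settings-icon' is a hypothetical data-testid value.
function settingsIcon(page: LocatorLike): LocatorLike {
  return page.getByTestId('settings-icon');
}

// 4. Chaining: scope the search to a landmark when several elements match.
function homeNavLink(page: LocatorLike): LocatorLike {
  return page.getByRole('navigation').getByRole('link', { name: 'Home' });
}
```

Keeping locators in named helpers like these means a copy change or a renamed test ID is fixed in one place rather than in every test that touches the element.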

4. Selector Strategies Ranked by Stability

Not all selector strategies are equally stable. Based on real-world data from large test suites, here is a ranking from most to least resilient. Test IDs (data-testid) are the most stable because they are explicitly maintained for testing. ARIA roles and labels are next because they are maintained for accessibility, which provides a similar durability guarantee. Visible text content is slightly less stable because text changes during copy updates and localization, but it remains a strong choice for stable UI labels.

Below these top-tier strategies, placeholder text and label associations (getByPlaceholder, getByLabel) provide good stability for form elements. Alt text (getByAltText) works well for images. Human-written CSS classes are moderately stable but carry refactoring risk. Generated CSS classes, XPath with indices, and deep DOM path selectors are at the bottom of the stability ranking and should be avoided in production test suites.

The practical recommendation is not simply to use the top of the ranking everywhere. Test IDs are the most stable, but role-based and text-based locators also confirm that the element is exposed to users the way you expect. Prefer getByRole, fall back to getByText, use data-testid when semantic strategies are insufficient, and reach for CSS selectors only as a last resort. This layered approach gives you the best balance of specificity and resilience.
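The fallback order in this recommendation can be written down as a small decision function. This is a minimal sketch under assumptions: ElementInfo and chooseLocator are hypothetical names, the function returns the locator call as a string for illustration only, and it does not escape quotes in names or text.

```typescript
// Sketch of the layered fallback: role, then text, then test ID, then CSS.
// ElementInfo is a hypothetical description of what is known about an element.
interface ElementInfo {
  role?: string;
  accessibleName?: string;
  text?: string;
  testId?: string;
  cssSelector?: string;
}

function chooseLocator(el: ElementInfo): string {
  // Prefer role plus accessible name when both are available.
  if (el.role && el.accessibleName) {
    return `getByRole('${el.role}', { name: '${el.accessibleName}' })`;
  }
  // Fall back to visible text.
  if (el.text) {
    return `getByText('${el.text}')`;
  }
  // Then an explicit test anchor.
  if (el.testId) {
    return `getByTestId('${el.testId}')`;
  }
  // CSS is the last resort.
  if (el.cssSelector) {
    return `locator('${el.cssSelector}')`;
  }
  throw new Error('No usable attribute; consider adding a data-testid');
}

console.log(chooseLocator({ role: 'button', accessibleName: 'Sign In', cssSelector: '.btn' }));
// getByRole('button', { name: 'Sign In' })
```

The error branch is deliberate: an element with nothing to select on is a signal to add a test anchor in the application rather than to invent a positional selector.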

5. Tools That Generate Stable Selectors Automatically

Writing resilient selectors by hand is possible but time-consuming. Several tools can generate them automatically. Playwright's codegen tool records browser interactions and generates selectors using its built-in locator strategies. While the generated selectors are generally good, codegen only covers the flows you manually perform during recording.

Assrt goes further by auto-discovering test scenarios across your entire application. It crawls the running app, identifies interactive elements, and generates Playwright test files with selectors that prioritize accessibility attributes and text content over implementation-dependent CSS. Because Assrt observes the actual DOM structure, it can choose the most stable selector strategy for each element based on what attributes are available.

Commercial options like Testim and Mabl use machine learning to maintain selector stability over time, but they come with vendor lock-in and significant cost. The advantage of tools that output standard Playwright files (like Assrt and codegen) is that you own the generated tests completely. You can modify them, extend them, and run them without any proprietary runtime.

6. Measuring and Monitoring Selector Health

Fixing selectors once is not enough. Selector health degrades over time as the application evolves. Implement monitoring by tracking the retry rate per test, the failure rate per selector strategy, and the time spent on selector maintenance each sprint. These metrics reveal which tests and which selector patterns are causing the most friction.
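Two of these metrics, retry rate per test and failure rate per selector strategy, can be computed from plain run records. The sketch below assumes a hypothetical TestRun shape that you would populate from your own test reports; it is not the output format of any particular runner. Time spent on maintenance each sprint has to be tracked separately.

```typescript
// Hypothetical record of one test execution, built from your own CI reports.
interface TestRun {
  test: string;
  strategy: 'role' | 'text' | 'testid' | 'css'; // dominant selector strategy
  attempts: number; // 1 means no retry was needed
  passed: boolean;  // final outcome after any retries
}

// Group runs by a key and return the fraction of runs in each group
// for which `isHit` is true.
function rateBy(
  runs: TestRun[],
  keyOf: (run: TestRun) => string,
  isHit: (run: TestRun) => boolean
): Record<string, number> {
  const hits: Record<string, number> = {};
  const counts: Record<string, number> = {};
  for (const run of runs) {
    const key = keyOf(run);
    counts[key] = (counts[key] || 0) + 1;
    hits[key] = (hits[key] || 0) + (isHit(run) ? 1 : 0);
  }
  const rates: Record<string, number> = {};
  for (const key of Object.keys(counts)) {
    rates[key] = hits[key] / counts[key];
  }
  return rates;
}

// Fraction of runs per test that needed more than one attempt.
function retryRateByTest(runs: TestRun[]): Record<string, number> {
  return rateBy(runs, (r) => r.test, (r) => r.attempts > 1);
}

// Fraction of runs per selector strategy that ultimately failed.
function failureRateByStrategy(runs: TestRun[]): Record<string, number> {
  return rateBy(runs, (r) => r.strategy, (r) => !r.passed);
}
```

Tagging each test with a single strategy is a simplification; a test that mixes strategies could be recorded once per strategy it uses.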

Create a flake dashboard that shows the most frequently failing tests over the past week. Review this dashboard weekly and address the top offenders. Teams that do this consistently report flake rates dropping from double digits to below 1% within a quarter. The key is consistency: selector health improves when it is treated as ongoing maintenance, not a one-time cleanup.
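The data behind such a dashboard can be as simple as a ranked count of failures over the past week. The sketch below is a minimal illustration; the Failure shape, the function name, and the default limit of five are assumptions, and the records would come from whatever your CI system exports.

```typescript
// Hypothetical failure record taken from CI history.
interface Failure {
  test: string;
  at: Date;
}

// Return the most frequently failing tests within the last `windowDays` days,
// most failures first, ties broken alphabetically by test name.
function topOffenders(
  failures: Failure[],
  now: Date,
  limit: number = 5,
  windowDays: number = 7
): Array<{ test: string; failures: number }> {
  const cutoff = now.getTime() - windowDays * 24 * 60 * 60 * 1000;
  const counts: Record<string, number> = {};
  for (const f of failures) {
    const t = f.at.getTime();
    if (t >= cutoff && t <= now.getTime()) {
      counts[f.test] = (counts[f.test] || 0) + 1;
    }
  }
  return Object.keys(counts)
    .map((test) => ({ test, failures: counts[test] }))
    .sort((a, b) => b.failures - a.failures || a.test.localeCompare(b.test))
    .slice(0, limit);
}
```

Reviewing the output of a function like this once a week gives the team a short, fixed list of tests to investigate first.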

When you identify a frequently failing test, check which selector strategy it uses. If it uses CSS selectors, replace them with role-based or text-based locators. If it uses text that changes often, switch to data-testid. If the flakiness is not selector-related (timing, state, network), address those root causes separately. The goal is a test suite where retries are unnecessary because selectors are resilient by design.
