Testing Guide
Your UI Tests Are Not Failing: Your Product Is Evolving Faster Than Your Tests
Every fast-moving product team has experienced this: a design refresh ships on Monday, and by Tuesday morning 40+ tests are red. Not because anything is broken. Because buttons moved, class names changed, and the DOM restructured. The team spends the next sprint updating selectors instead of testing new features. This guide explains why this happens and how writing tests at the right abstraction level prevents it.
“Teams that switched from element-specific selectors to user flow-based tests saw a 70% reduction in test breakage during UI redesigns.”
— Test maintenance studies
1. The Redesign Problem: 40 Tests Break Overnight
The story plays out the same way at company after company. The design team ships a refreshed UI. New component library, updated color system, restructured navigation. The application works exactly the same from a user perspective. Every feature is intact. But the CI pipeline lights up red because dozens of tests reference CSS classes that no longer exist, DOM structures that have been reorganized, and element positions that have shifted.
A team at a mid-stage SaaS company tracked their test maintenance after a Material UI to Tailwind migration. 43 of their 120 E2E tests broke. Zero actual regressions. Every failure was a selector problem: tests looking for .MuiButton-root on elements that were now plain button elements with Tailwind classes. The application was fine. The tests were not testing the application; they were testing the CSS framework.
The frustrating part is that the team spent two developer-weeks fixing selectors. During those two weeks, they shipped no new tests for the three features that launched alongside the redesign. The test maintenance created a coverage gap precisely when it was most needed: during a period of significant change.
2. Testing at the Wrong Abstraction Level
The root cause of redesign-induced test breakage is almost always the same: tests that assert on UI implementation details instead of user-visible behavior. There is a spectrum of abstraction levels for E2E tests, and most teams write theirs too low on that spectrum.
At the lowest level, tests reference specific CSS selectors, exact DOM paths, or pixel coordinates. These tests break when anything about the visual implementation changes, even if the user experience is identical. A button that moved from the left side to the right side of the header still works for users, but a test that expected it at a specific position fails.
At the middle level, tests use semantic selectors like ARIA roles, data-testid attributes, and text content. These survive visual changes but can still break when content is reworded or components are restructured. A button labeled "Submit" that becomes "Continue" breaks a text-based selector even though the behavior is unchanged.
At the highest level, tests describe user flows in terms of intent: "the user completes the signup form and reaches the dashboard." These tests survive almost any redesign because they test the outcome, not the mechanism. The signup form could change from a single page to a multi-step wizard, and the test still passes as long as users can sign up and see the dashboard.
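In Playwright terms, the spectrum might look like the following sketch. The class names and button labels are hypothetical, chosen only to illustrate the contrast:

```typescript
import { type Page } from '@playwright/test';

// Illustrative only: the same submit action targeted at three
// abstraction levels. Class names and labels are hypothetical.
function submitAtThreeLevels(page: Page) {
  // Lowest: encodes the CSS framework. Dies in every redesign.
  const low = page.locator('.MuiButton-root.signup-submit');

  // Middle: semantic. Survives restyling; breaks if "Submit" is
  // reworded to "Continue".
  const mid = page.getByRole('button', { name: 'Submit' });

  // Highest is not a selector at all: the test clicks whatever button
  // submits the form and asserts only the outcome, e.g.
  //   await expect(page).toHaveURL(/dashboard/);
  return { low, mid };
}
```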
Tests that survive your next redesign
Assrt generates Playwright tests focused on user flows, not CSS selectors. Self-healing selectors adapt when the UI changes. Open-source, zero vendor lock-in.
Get Started →
3. User Flow Tests That Survive Redesigns
Writing tests at the user flow level requires a shift in thinking. Instead of "click the submit button in the form on the signup page," the test says "complete the signup process." The test fills in whatever fields are present, clicks whatever button submits the form, and verifies the outcome (reaching the dashboard, seeing a welcome message) rather than verifying intermediate UI states.
In Playwright, this means preferring page.getByRole() and page.getByText() over page.locator('.css-class'). Role-based selectors work regardless of styling because they target the semantic meaning of elements, not their visual appearance. A button is a button whether it has a Material UI class, a Tailwind utility, or no CSS class at all.
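A flow-level version of the signup test might look like this sketch. The URL, field labels, and dashboard heading are assumptions for illustration, not details from a real application:

```typescript
import { test, expect } from '@playwright/test';

// A minimal sketch of a flow-level signup test. The /signup route,
// field labels, and /dashboard URL are hypothetical.
test('user can sign up and reach the dashboard', async ({ page }) => {
  await page.goto('/signup');

  // Fill whatever fields the form presents, by accessible label.
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('correct-horse-battery');

  // Click whatever button submits, tolerating rewording.
  await page.getByRole('button', { name: /sign up|submit|continue/i }).click();

  // Assert the outcome, not intermediate UI state.
  await expect(page).toHaveURL(/dashboard/);
  await expect(page.getByRole('heading', { name: /welcome/i })).toBeVisible();
});
```

Note that nothing in this test mentions styling, layout, or DOM structure, so a restyle or component-library swap leaves it untouched.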
The team that lost two weeks to selector fixes rewrote their tests using role-based selectors and flow-level assertions. Their next redesign (switching from a sidebar navigation to a top navigation bar) broke 3 of their 120 tests instead of 43. The three that broke were legitimate: the navigation restructuring changed which pages were accessible from which other pages, which was an actual UX change worth testing.
4. A Selector Strategy Built for Change
A resilient selector strategy uses a priority order. First, try to find elements by their accessible role and name. This works for buttons, links, headings, form fields, and most interactive elements. If the role is ambiguous (multiple buttons on the page with the same text), add context by scoping to a parent landmark or section.
Second, use data-testid attributes for elements that do not have meaningful roles or unique text. These are intentional test hooks that survive redesigns because they exist specifically for testing. The convention of adding data-testid attributes to key elements is an investment that pays off every time the UI changes.
Third, use text content only when it is a stable part of the product (brand names, navigation labels, form field labels). Avoid asserting on dynamic content, error messages that might be reworded, or placeholder text that design might change during a refresh. The more stable the text, the more reliable it is as a selector.
Never use CSS class selectors, nth-child selectors, or XPath expressions that depend on DOM depth. These are the selectors that break during every redesign. They encode the visual implementation of the UI rather than its semantic structure, and visual implementations change constantly in a product that is evolving.
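One way to encode this priority order directly in a locator is Playwright's `locator.or()` fallback chaining, which resolves to whichever alternative matches. The test id and label below are hypothetical:

```typescript
import { type Page, type Locator } from '@playwright/test';

// A sketch of the selector priority order from this section.
// The accessible name and data-testid value are hypothetical.
function submitButton(page: Page): Locator {
  return page
    .getByRole('button', { name: 'Submit' })        // 1. role + accessible name
    .or(page.getByTestId('signup-submit'))          // 2. intentional test hook
    .or(page.getByText('Submit', { exact: true })); // 3. stable text, last resort
}
```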
5. Self-Healing Tests and Automated Recovery
Even with the best selector strategy, some tests will break during significant UI changes. Self-healing test frameworks address this by maintaining multiple selector strategies for each element and automatically falling back when the primary strategy fails. If the role-based selector breaks, the framework tries the text-based selector. If that fails, it tries a visual match. If the element is found via a fallback, the test file is updated with the new primary selector.
Assrt implements self-healing at the Playwright level. When you run npx @m13v/assrt discover https://your-app.com against your application, the generated tests include multiple selector strategies for each interaction. When a redesign changes the DOM, the self-healing layer finds the element through an alternative strategy and continues the test. The test file gets updated automatically so subsequent runs use the new selector without manual intervention.
Self-healing is not magic, and it has limits. If a redesign removes a feature entirely, the test should fail because the feature is gone. If a redesign moves a critical action behind a new step (adding a confirmation dialog before deletion), the test should fail because the flow changed. Self-healing handles cosmetic and structural changes. It does not and should not handle behavioral changes, because those are exactly what tests are supposed to catch.
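The fallback loop described above can be sketched as a small strategy-priority function. This is a simplified illustration, not Assrt's actual implementation; the strategy shape and `find` callback are hypothetical:

```typescript
// A sketch of the self-healing idea: try each selector strategy in
// priority order, and report whether a fallback matched so the test
// file can be rewritten to promote it to primary.
type Strategy = { kind: 'role' | 'testid' | 'text'; query: string };

function heal(
  strategies: Strategy[],
  find: (s: Strategy) => boolean // e.g. wraps a Playwright locator check
): { matched: Strategy; promoted: boolean } {
  for (const [i, s] of strategies.entries()) {
    // promoted === true means a fallback matched and should become
    // the new primary selector in the test file.
    if (find(s)) return { matched: s, promoted: i > 0 };
  }
  // Every strategy failed: likely a real regression, so the test
  // should fail loudly rather than heal.
  throw new Error('All selector strategies failed; likely a real regression');
}
```

The important property is the last line: when no strategy matches, the test fails, which keeps behavioral changes visible instead of silently papered over.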
6. The Test Maintenance Budget: How to Stay Under It
Every test in your suite has a maintenance cost. It might break during redesigns, produce false positives during infrastructure issues, or slow down CI. A useful mental model is to think of each test as having a maintenance budget: the amount of human time it will consume over its lifetime. If that cost exceeds the value the test provides (bugs it catches, confidence it creates), the test should not exist.
Tests at higher abstraction levels have lower maintenance costs. A test that verifies "users can complete a purchase" might need updating once a year when the checkout flow changes significantly. A test that verifies "the checkout button has class .btn-primary and appears below the cart summary" might break every month during routine UI polish. The behavioral test provides equal or better regression protection at a fraction of the maintenance cost.
Track maintenance events per test. If a test has broken more than three times in six months without catching a real bug, it is a net negative. Either rewrite it at a higher abstraction level or remove it entirely. The goal is a suite where every test earns its maintenance cost by providing genuine regression protection, not just inflating the coverage number.
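The heuristic above is simple enough to automate. A sketch, assuming you record breakages and caught bugs per test (the record shape is hypothetical):

```typescript
// A sketch of the "maintenance budget" check from this section: flag
// tests that broke more than three times in six months without ever
// catching a real bug. The record shape is hypothetical.
type TestRecord = {
  name: string;
  breakagesLast6Months: number;
  realBugsCaught: number;
};

function netNegativeTests(records: TestRecord[]): string[] {
  return records
    .filter((r) => r.breakagesLast6Months > 3 && r.realBugsCaught === 0)
    .map((r) => r.name);
}
```

Tests returned by this check are candidates for rewriting at a higher abstraction level or for deletion.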
7. Building an Evolution-Friendly Test Suite
An evolution-friendly test suite embraces the reality that the product will change constantly. It does not fight change; it accommodates it by design. The principles are straightforward: test user outcomes instead of UI implementation, use semantic selectors instead of structural ones, employ self-healing to handle cosmetic changes automatically, and reserve manual maintenance time for genuine flow changes.
When using AI test generation tools, evaluate them on how resilient the output is, not just how comprehensive. A tool that generates 100 tests using CSS class selectors creates a maintenance liability. A tool that generates 50 tests using role-based selectors with self-healing fallbacks creates a durable asset. Assrt generates Playwright tests with multiple selector strategies specifically because durability matters more than initial count.
The next time your product team announces a redesign, your reaction should be "great, the tests will probably still pass" instead of "we need two weeks to fix the test suite." If your current reaction is the latter, the problem is not the redesign. The problem is tests written at the wrong abstraction level. Fix the abstraction level and product evolution becomes something your test suite handles gracefully instead of something it dreads.
The product is supposed to evolve. That is a sign of a healthy business. Your test suite should be designed to evolve with it, not against it. Tests that break because the product improved are tests that were testing the wrong thing.