Playwright flaky selectors after UI changes: the fix every other guide stops short of

Direct answer (verified 2026-05-16)

Use getByRole, getByLabel, and getByTestId as the floor. Then accept that those locators still break when accessible names, labels, or test-ids get renamed in a real refactor (which they will). The only durable fix is to re-resolve elements from the live accessibility tree on every action. Assrt does this via Playwright MCP refs ([ref=eN]); the recovery rule is defined in ~/assrt-mcp/src/core/agent.ts. A practical suite mixes the two: runtime re-resolution for flaky flows, committed Playwright files for stable ones. Authoritative reference: playwright.dev/docs/locators.

Matthew Diakonov
7 min read

Open any of the top guides on flaky Playwright selectors and you will read the same five paragraphs in a different order: prefer getByRole, add data-testid, avoid :nth-child, write proper waits, do not retry your way out of the problem. That advice is correct. It is also the easy 80%. It does nothing for the failure mode that actually wrecks production suites: the UI changed and the role-based locator you trusted resolved to zero elements anyway.

This page is about the harder 20%: why role and test-id locators still drift, what the durable fix looks like, and when to stop running an AI agent and just commit the file.

The problem is not the selector. It is that a selector is a string in a file.

Every committed locator is the same shape: page.getByRole('button', { name: 'Submit' }), page.getByTestId('save-btn'), page.locator('.btn-primary'). Different layers of the same thing: a string captured at one moment in time, pinned in a test file, asked to find a live element later. The string is dead. The DOM is alive. A locator written on Monday and a UI shipped on Friday do not negotiate.

Most advice on this topic is about picking a better string. Role beats CSS because the accessible name changes less often than the CSS class. Test-id beats role because nobody (in theory) renames the test-id. Both are true on average and both fail individually. A redesign renames buttons. A design system migration renames test-ids. A localisation pass replaces the English name with a key. Any one of those breaks a static locator that was fine yesterday, and no amount of role discipline can prevent it. The locator is still committed to git; the UI moved without telling git.

The honest framing: static locators are good. They are not enough. There is a class of flakiness that only goes away when the locator is no longer a string in a file, but a query executed at action time against the live accessibility tree.

Three real ways role-based locators still flake on UI changes

1. Accessible name drift

getByRole('button', { name: 'Submit' }) survives a class rename, a wrapper restructure, even a tag swap from <button> to a <div role="button"> with the right ARIA semantics. What it does not survive is a copywriter changing "Submit" to "Continue", a localisation engineer wrapping the string in an i18n function that renders different copy, or a product manager A/B-testing two CTA names. The role is still "button" and the element is still on the page, but the query returns zero elements.

2. Test-id refactor in a shared design system

getByTestId('save-btn') breaks when the design system team replaces the single Button component with PrimaryButton, SecondaryButton, DestructiveButton and decides each variant owns its own test-id namespace. The consumer test never knew the migration happened. The test file says save-btn, the DOM says primary-button-save, and nothing in the type system flags the mismatch.

3. Structural drift on multi-step flows

A locator resolves uniquely on step 3 of the recorded flow because there was exactly one button with that role and name on the page. After the redesign, the same button appears inside a new wrapper that also contains a stepper with a step labelled "Submit". The role-and-name query is suddenly ambiguous: the locator resolves to multiple elements, and the action fails with a strict mode violation. Nothing about the button itself changed; the page around it did.
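The three failure modes above can be reproduced in a toy model. This is a sketch, not Playwright's implementation: A11yNode, resolveStrict, and the sample trees are hypothetical names invented for illustration, and the stepper step is assumed to expose the same role and name as the real button.

```typescript
// Toy model of strict locator resolution. NOT Playwright's code;
// every type and function name here is hypothetical.
type A11yNode = { role: string; name: string; testId?: string };

type Resolution =
  | { kind: "ok"; node: A11yNode }
  | { kind: "no-match" }
  | { kind: "ambiguous"; count: number };

// A strict query must match exactly one node; zero or many is a failure.
function resolveStrict(
  tree: A11yNode[],
  matches: (n: A11yNode) => boolean,
): Resolution {
  const hits = tree.filter(matches);
  const [first, ...rest] = hits;
  if (first === undefined) return { kind: "no-match" };
  if (rest.length > 0) return { kind: "ambiguous", count: hits.length };
  return { kind: "ok", node: first };
}

// The two committed queries from the text, frozen at write time.
const submitButton = (n: A11yNode) => n.role === "button" && n.name === "Submit";
const saveBtn = (n: A11yNode) => n.testId === "save-btn";

// The page as it was when the test was written.
const before: A11yNode[] = [{ role: "button", name: "Submit", testId: "save-btn" }];

// 1. Accessible name drift: the copy changed to "Continue".
const renamed: A11yNode[] = [{ role: "button", name: "Continue", testId: "save-btn" }];

// 2. Test-id refactor: the design system renamed the id.
const migrated: A11yNode[] = [
  { role: "button", name: "Submit", testId: "primary-button-save" },
];

// 3. Structural drift: a stepper step now shares the role and name (assumed).
const withStepper: A11yNode[] = [
  { role: "button", name: "Submit", testId: "save-btn" },
  { role: "button", name: "Submit" },
];

const outcomes = {
  before: resolveStrict(before, submitButton).kind,
  nameDrift: resolveStrict(renamed, submitButton).kind,
  testIdRefactor: resolveStrict(migrated, saveBtn).kind,
  structuralDrift: resolveStrict(withStepper, submitButton).kind,
};
```

In each drifted tree the button is still present and still clickable. Only the frozen predicate stopped agreeing with the page.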

What "the UI changed and the test broke" actually looks like

The static path. A committed test runs against a UI that has drifted out from under it. Auto-retry cannot save you because the locator simply resolves to nothing.

Committed locator vs renamed button

Participants: Static Playwright test, Locator string, Live DOM (after UI change), Test result.

Steps, in order:

  • getByRole('button', { name: 'Submit' })
  • Look for accessible button named 'Submit'
  • Button is now named 'Continue', zero matches
  • Locator resolved to 0 elements
  • FAIL after 30s of auto-retry

The only fix that holds: re-resolve from the accessibility tree on every action

If the problem is that a locator is a string captured at one moment, the fix is to never capture it. Resolve the element, fresh, every time you act on it. The browser already exposes a structure that supports this: the accessibility tree, the same one screen readers use. Playwright MCP wraps the browser's a11y tree and assigns every element a temporary id, in the shape [ref=e1], [ref=e2], etc. The id is valid for that one snapshot only. The next snapshot builds a fresh tree with fresh ids.
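To make the ref shape concrete, here is a minimal sketch that pulls role, name, and ref out of a snapshot. The line format (a dash, a role, an optional quoted name, then [ref=eN]) is an assumption about what the snapshot text looks like; RefNode and parseSnapshot are hypothetical names, not part of Playwright MCP.

```typescript
// One parsed element from a snapshot. Hypothetical type for illustration.
type RefNode = { ref: string; role: string; name: string };

// Parse snapshot text line by line. ASSUMED line format:
//   - button "Continue" [ref=e5]
// Lines that carry no [ref=eN] marker are skipped.
function parseSnapshot(snapshot: string): RefNode[] {
  const line = /^\s*-\s+([\w-]+)(?:\s+"([^"]*)")?.*?\[ref=(e\d+)\]/;
  const nodes: RefNode[] = [];
  for (const raw of snapshot.split("\n")) {
    const m = line.exec(raw);
    if (m) {
      nodes.push({ role: m[1] ?? "", name: m[2] ?? "", ref: m[3] ?? "" });
    }
  }
  return nodes;
}

// Sample snapshot written by hand for this example, not captured output.
const snapshot = [
  "- generic [ref=e1]:",
  '  - heading "Pricing" [level=2] [ref=e2]',
  '  - button "Continue" [ref=e5]',
  '  - link "Back" [ref=e6]',
].join("\n");

const nodes = parseSnapshot(snapshot);
```

The parsed refs are only meaningful alongside the snapshot they came from. Nothing in this structure is worth committing to a file, which is the point.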

An agent driving this loop does not commit any locator. It calls browser_snapshot, reads the tree, picks the element by description ("the primary CTA on the pricing card"), and acts on the matching ref. If the action succeeds, it snapshots again to see the result. If it fails, it snapshots again to recover. The rule in the Assrt agent prompt (~/assrt-mcp/src/core/agent.ts, the "Error Recovery" block) reads, literally: "If a ref is stale (action fails), call snapshot again to get fresh refs." The ref is never trusted across snapshots. There is nothing to drift because there is no committed string.
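The recovery rule can be sketched as a small loop. This is not the code in agent.ts, which this page only quotes. SnapshotClient, TreeNode, Picker, and clickByIntent are hypothetical names, the retry cap is an assumption, and the picker stands in for whatever maps a description to a node (an LLM in the agent's case).

```typescript
// Hypothetical shapes for illustration; a real client would wrap the MCP tools.
type TreeNode = { ref: string; role: string; name: string };

interface SnapshotClient {
  snapshot(): Promise<TreeNode[]>; // fresh tree with fresh refs
  click(ref: string): Promise<void>; // rejects when the ref is stale
}

// Stand-in for the intent-based picker.
type Picker = (tree: TreeNode[]) => TreeNode | undefined;

// Resolve the target from a fresh snapshot on every attempt.
// A ref from an earlier snapshot is never reused.
async function clickByIntent(
  client: SnapshotClient,
  pick: Picker,
  maxAttempts = 3, // assumed cap, not taken from the source
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const tree = await client.snapshot();
    const target = pick(tree);
    if (!target) {
      throw new Error("no element matches the intent in the current tree");
    }
    try {
      await client.click(target.ref);
      return target.ref; // the ref that actually landed the click
    } catch (err) {
      lastError = err; // stale ref: loop around and snapshot again
    }
  }
  throw new Error(
    `action failed after ${maxAttempts} attempts: ${String(lastError)}`,
  );
}
```

A call such as clickByIntent(client, (tree) => tree.find((n) => n.role === "button")) returns whichever ref was current when the click succeeded. The caller never stores it.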

When the UI changes between two runs, the agent does not know or care. It requests a new tree, finds the element by intent again, and continues. The button name went from "Submit" to "Continue"? Still one button with role=button at the bottom of the form, still the primary CTA. The test runs. The locator was never the bottleneck.

The snapshot loop, drawn

The same UI change as above, this time against an agent that re-resolves on every action. Same renamed button. Different outcome.

Snapshot-driven resolution vs renamed button

Participants: Assrt agent, Playwright MCP, Live DOM (after UI change), Test result.

Steps, in order:

  • browser_snapshot
  • Build accessibility tree
  • [ref=e5] button 'Continue', [ref=e6] link 'Back'
  • Tree with [ref=eN] labels
  • click(element: 'primary CTA on pricing card', ref: 'e5')
  • Resolve e5 to live element
  • Click landed, scenario continues

The honest counterargument: re-resolution is not free

Re-resolving from the accessibility tree on every action costs something. A snapshot of a large page takes time (low hundreds of milliseconds on a normal page, more on something the size of Wikipedia, which is why the Assrt browser layer truncates trees above a threshold and stores the full tree in a file the agent reads on demand). An LLM-driven picker that maps "the primary CTA" to a ref adds latency and tokens. Running a full suite of 400 tests through this loop on every CI push would be expensive in both wall time and dollars.

So you do not. The shape that actually works:

  • Static locators for the stable 90%. A login form that has not changed in eighteen months does not need an agent at runtime. Commit the Playwright file, run it in your fast CI lane.
  • Runtime resolution for the flaky 10%. The flows that touch the area of the app under active redesign, the multi-step checkout that the product team revises every sprint, the third-party widgets you cannot annotate. Run those through the agent during regression sweeps, get fresh evidence each time.
  • Re-discovery as a maintenance loop, not a one-time event. When a static test fails because the UI changed, do not hand-patch the locator. Re-run discovery on the same flow, get a freshly-generated Playwright file, diff the locator changes against the DOM changes, commit. The maintenance is automated, the artefact stays portable, and you keep the fast CI lane.
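On the static side, one way to keep the lanes apart is a pair of Playwright projects. This is a sketch under assumptions, not a drop-in config: the @volatile tag, the project names, and the ./tests directory are hypothetical, and it covers only committed files (the agent runs outside Playwright's test runner).

```typescript
// playwright.config.ts: a sketch, assuming volatile flows are tagged @volatile.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./tests", // hypothetical location of the committed files
  projects: [
    {
      // Fast lane: gates deploys and skips flows under active redesign.
      name: "gate",
      grepInvert: /@volatile/,
      retries: 0,
    },
    {
      // Sweep lane: regenerated files for volatile flows, run off the push path.
      name: "sweep",
      grep: /@volatile/,
      retries: 1,
    },
  ],
});
```

With this layout, npx playwright test --project=gate runs on every push, and the sweep project runs on whatever schedule the regression sweeps use.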

So which approach is right for "flaky selectors after UI changes"?

Both. The two layers are different jobs.

The static Playwright file serves the fast CI lane. It runs in seconds, fails loudly, and fits into your existing GitHub Actions or Buildkite or whatever you already pay for. It is good at catching regressions on stable flows and deterministic enough to gate deploys on. Its weakness is the one this whole page is about: when the UI moves, the file rots, and nobody fixes the rot until somebody is on call.

The agent loop does maintenance. It runs slower, costs more per execution, and uses an LLM. Its job is to survive the UI changes that would have rotted the static file, and to regenerate that static file from the new DOM when the change is real. You do not run this loop on every push. You run it nightly, or when a flaky test bisects against a UI commit, or when a designer pushes a redesign and you need to know which of your 400 tests still pass.

The two layers feed each other. The agent writes the static file. The static file gates the deploy. When the static file fails because the UI changed, the agent rewrites it. The locator string in your repo is no longer a permanent commitment; it is a cache of the latest discovery run, regenerable on demand. That is what fixes flaky selectors after UI changes. Not a better string. A loop that does not commit to one.
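If the committed file is a cache, the useful review artefact is the locator diff between the old file and the regenerated one. A minimal sketch, assuming a simple regex is good enough for generated code: extractLocators and diffLocators are hypothetical helpers, and the pattern handles only the call shapes named on this page, with at most one level of nested parentheses.

```typescript
// Collect locator calls from Playwright source text.
// Deliberately simple: regex-based, not a real parser.
function extractLocators(source: string): Set<string> {
  const call =
    /\b(?:getByRole|getByTestId|getByLabel|locator)\((?:[^()]|\([^()]*\))*\)/g;
  return new Set(source.match(call) ?? []);
}

// Compare the committed file with the regenerated one.
function diffLocators(
  committed: string,
  regenerated: string,
): { removed: string[]; added: string[] } {
  const before = extractLocators(committed);
  const after = extractLocators(regenerated);
  return {
    removed: Array.from(before).filter((l) => !after.has(l)),
    added: Array.from(after).filter((l) => !before.has(l)),
  };
}

// Hand-written sample inputs for illustration.
const committed = `
await page.getByLabel('Email').fill('a@example.com');
await page.getByRole('button', { name: 'Submit' }).click();
`;

const regenerated = `
await page.getByLabel('Email').fill('a@example.com');
await page.getByRole('button', { name: 'Continue' }).click();
`;

const changes = diffLocators(committed, regenerated);
```

Here changes.removed holds the 'Submit' locator and changes.added holds the 'Continue' one. That pair is what a reviewer checks against the UI commit before accepting the regenerated file.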

Common questions

Why do my Playwright selectors still break after UI changes, even when I use getByRole and data-testid?

Because role-based locators encode the same thing CSS selectors did, just at a different abstraction layer. getByRole('button', { name: 'Submit' }) breaks the moment somebody renames the button to 'Continue'. getByTestId('save-btn') breaks when the new design system component drops or renames the data-testid. The locator is still a string committed to a file at one point in time, and the UI has moved on without it. Pinned strings against a moving target will always drift.

Is data-testid actually stable across UI changes?

Only when the team treats it as a public API, which most teams do not. A data-testid lives in a component file like Button.tsx. The next developer who refactors Button into PrimaryButton and SecondaryButton has no incentive to preserve the original testid string, because it does not affect users, types, or runtime behavior. Six months in, the value has drifted from save-btn to primary-cta-save to btn-save-changes, and three test files reference each version. Pinning data-testid only works if your team treats it as a stability contract and your CI enforces it.
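"Your CI enforces it" can be as small as a script that fails the build when a contracted test-id disappears from the component source. A sketch under assumptions: the contract list, findTestIds, and missingFromContract are hypothetical, and the regex only sees static string values (data-testid="..."), not ids computed at runtime.

```typescript
// Find statically written data-testid values in component source text.
function findTestIds(source: string): Set<string> {
  const attr = /data-testid=(?:"([^"]+)"|'([^']+)')/g;
  const ids = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = attr.exec(source)) !== null) {
    const id = m[1] ?? m[2];
    if (id) ids.add(id);
  }
  return ids;
}

// Return every contracted id that no component source still declares.
function missingFromContract(
  contract: string[],
  componentSources: string[],
): string[] {
  const present = new Set<string>();
  for (const src of componentSources) {
    findTestIds(src).forEach((id) => present.add(id));
  }
  return contract.filter((id) => !present.has(id));
}

// Hypothetical contract: ids the test suite is allowed to depend on.
const contract = ["save-btn", "cancel-btn"];

// Component source after the design system migration described above.
const sources = [
  '<button data-testid="primary-button-save">Save</button>',
  "<button data-testid='cancel-btn'>Cancel</button>",
];

const missing = missingFromContract(contract, sources);
```

A CI step would exit non-zero whenever missing is non-empty. The rename then fails in the pull request that made it, not in a consumer's test run weeks later.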

What does it mean to 're-resolve from the accessibility tree on every action'?

Instead of committing a selector at write time and trusting it at run time, you ask the live browser, right now, for the accessibility tree of the page (the same tree screen readers use). Each element gets a temporary id (Playwright MCP labels them ref=e1, ref=e2, etc.). Your action picks the element by semantic intent ('the primary CTA on the pricing card'), maps that to a fresh ref, and clicks. The ref is only valid for that one snapshot, so it is never carried forward long enough to drift; if an action does fail on a stale ref, the agent snapshots again and picks from the new tree. The next action takes a new snapshot. The selector is never a string in a file; it is a query against the current DOM, executed every time.

How does Assrt do this in practice?

Assrt drives Playwright through the Playwright MCP server. The agent calls browser_snapshot to get an accessibility tree where each element is labelled [ref=eN]. It picks the element by description, performs the action with that ref, and re-snapshots after each interaction. The recovery rule in the agent prompt (~/assrt-mcp/src/core/agent.ts, the Error Recovery section) is explicit: 'If a ref is stale (action fails), call snapshot again to get fresh refs.' The picker is semantic intent, the ref is ephemeral, and the UI can change between actions without breaking the run.

Then why does Assrt also generate real Playwright code?

Because once a flow is proven to work, you want to bake it into CI without re-running the agent every time. Assrt outputs the discovered scenario as standard Playwright (page.getByRole, page.getByTestId, page.locator) so you can commit it, version it, and run it in any CI pipeline. The static code will still break on a big UI refactor, just like any hand-written suite. The difference is you can re-run discovery on the same flow, get a fresh test file, and diff the locator changes against the DOM changes. The maintenance loop is automated; the runtime artefact stays portable.

Are there tradeoffs to runtime re-resolution?

Yes. Snapshots cost time (low hundreds of milliseconds on a normal page, more on huge pages with thousands of nodes), and an LLM-driven picker has occasional latency and cost. You would not want every CI run to call an LLM for every click. The right shape is: run the agent during discovery and during exploratory regression sweeps, where flakiness is the bottleneck, and run the generated Playwright file in fast deterministic CI lanes. Mix the two. Static locators for the 90% of tests where the UI rarely changes; agent re-resolution for the 10% that always flake.

What about Playwright's own retry mechanism, doesn't it handle this?

It handles timing flakiness (the element exists but is not interactive yet, the DOM is mid-update). It does not handle structural flakiness (the element you wrote the locator for no longer exists under that locator). Auto-waiting and retries cannot rescue a locator that resolves to nothing. When the button is renamed, getByRole('button', { name: 'Submit' }) will still find zero elements when the default 30-second test timeout expires, and the test will fail, as it should. The retry layer is downstream of the resolution problem.

How do I know if my flakiness is timing-based or selector-based?

Open the Playwright trace viewer for the failed run and look at the locator step. If the screenshot shows the element clearly on the page but the locator resolved to zero or many elements, the selector drifted. If the screenshot shows a loading state or the element half-rendered, the wait condition was wrong. Most teams conflate the two and blame 'flakiness' for both, which is why retries-as-fix culture is so common. The fixes are different: better waits for timing, re-resolution for structural drift.
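The triage rule above fits in a few lines, which helps when several people read traces and need to label failures the same way. A sketch only: the inputs are what a human reads off the trace viewer, nothing here parses a trace file, and all names are hypothetical.

```typescript
// What the reviewer saw in the screenshot at the failing locator step.
type ScreenshotState = "target-rendered" | "loading-or-partial" | "other";

type TraceObservation = {
  matchCount: number; // how many elements the locator resolved to
  screenshot: ScreenshotState;
};

type Diagnosis = "selector-drift" | "timing" | "inconclusive";

// Encode the rule from the text: element clearly present but zero or many
// matches means drift; a loading or half-rendered page means a bad wait.
function diagnose(o: TraceObservation): Diagnosis {
  if (o.screenshot === "target-rendered" && o.matchCount !== 1) {
    return "selector-drift";
  }
  if (o.screenshot === "loading-or-partial") {
    return "timing";
  }
  return "inconclusive";
}

const renamedButton = diagnose({ matchCount: 0, screenshot: "target-rendered" });
const spinnerStillUp = diagnose({ matchCount: 0, screenshot: "loading-or-partial" });
```

Labelling failures this way keeps the two fixes apart: "timing" goes to whoever owns the wait conditions, "selector-drift" goes to re-discovery.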
