Argument

Structural locators are not the right primitive for AI Playwright tests. The accessibility-tree ref is.

The pages that argue this question online stop after “prefer getByRole over CSS”. That is correct, but it is the answer to a smaller question. An AI agent driving a real browser has to make an architectural choice one level below the locator API: at runtime the agent reads a snapshot of the accessibility tree and clicks by ref=e5; at serialization time the test file lands in your repo as getByRole. CSS chains, XPath, and :nth-child belong to neither layer. This page is about why those two layers exist, and why structural locators do not survive in either.

Matthew Diakonov, Written with AI

Published April 29, 2026 · 8 min read

Direct answer (verified 2026-04-29)

No, structural locators are not safe for AI Playwright tests. The resilient pattern is two-layered:

At runtime, the agent calls snapshot() to get the accessibility tree, then clicks or types by the ref of the node it just read. The ref is a snapshot-relative ID, not a DOM selector.
At serialization, the durable test file uses Playwright's user-facing locators: getByRole, getByLabel, getByTestId.

Source for the recommendation against CSS and XPath: playwright.dev/docs/locators. CSS and XPath are explicitly called out as “not recommended as the DOM can often change leading to non resilient tests.”

The two-layer model

Most arguments online collapse the runtime layer and the disk layer into one decision. They are not the same decision.

The runtime layer is what the AI agent does in the moment it is about to click something. It needs an unambiguous handle on the element it just looked at. The disk layer is what gets committed to tests/checkout.spec.ts and read by a human three months later, or by a different agent on the next run.

A structural locator like div.app > div:nth-child(2) > button.btn.cta is a string. It survives between layers, which sounds convenient, but that survival is exactly what makes it brittle. The string was captured against one DOM at one moment. The DOM is constantly drifting. By the time the test runs again, the string is a fossil.

The two-layer model fixes this by refusing to ship the string between layers at all. At runtime, the handle is a ref pinned to a fresh snapshot. At rest, the handle is a semantic description (role plus accessible name) the agent re-resolves against the current tree on the next run. Neither side persists a coupling to render order.
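
A minimal sketch of the two handles as TypeScript types. The names `RuntimeHandle`, `DurableLocator`, `toPlaywrightSource`, and `isFresh` are hypothetical and do not come from Assrt or Playwright. The point is only that the runtime handle is tied to one snapshot and is never written out, while the durable handle carries role and name and is rendered as a `getByRole` call.

```typescript
// Hypothetical types for illustration; not Assrt's or Playwright's actual API.

// Runtime layer: valid only against the snapshot that minted it.
interface RuntimeHandle {
  snapshotId: number;
  ref: string; // e.g. "e5"
}

// Disk layer: a semantic description that is re-resolved on every run.
interface DurableLocator {
  role: string; // e.g. "button"
  name: string; // accessible name, e.g. "Checkout"
}

// Render the durable handle as Playwright source text.
// JSON.stringify quotes and escapes the role and name.
function toPlaywrightSource(locator: DurableLocator): string {
  const role = JSON.stringify(locator.role);
  const name = JSON.stringify(locator.name);
  return `page.getByRole(${role}, { name: ${name} })`;
}

// A runtime handle is usable only while its snapshot is the current one.
function isFresh(handle: RuntimeHandle, currentSnapshotId: number): boolean {
  return handle.snapshotId === currentSnapshotId;
}
```

Only `DurableLocator` has a serializer here. There is deliberately no function that turns a `RuntimeHandle` into text for the repo.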

What the agent actually clicks

The Assrt agent has a system prompt that names this directly. It is worth reading in full because the warning at the bottom is the part that competing pages miss:

From src/core/agent.ts in @m13v/assrt-mcp

## Selector Strategy

1. Call snapshot to get the accessibility tree

2. Find the element you want to interact with in the tree

3. Use its ref value (e.g. "e5") in the ref parameter of click/type_text

4. Also provide a human-readable element description

5. If a ref is stale (action fails), call snapshot again

## Important

Do NOT use ref attributes as DOM selectors (they don't exist in the DOM).

The last line is the load-bearing one. A new contributor reading the agent's tool calls might guess that ref="e5" is some kind of attribute on a button ([ref="e5"], surely?). It is not. It is an opaque ID Playwright MCP minted when it serialized the accessibility tree. The browser does not know it exists. Trying to query it with a CSS selector returns nothing.

What goes wrong when an LLM ships a structural locator

The fastest way for an AI test generator to ship structural locators is to read the rendered HTML, find a path, and emit it. It happens when the model treats the DOM as the source of truth rather than the accessibility tree. Here is the failure mode in code:

Same intent, two different durability profiles

await page
  .locator("#root > div:nth-child(2) > main button.btn-primary")
  .click();
// 3 weeks later the team adds a banner above main:
// the same locator now points at "Dismiss banner".
// The test still passes. The wrong button gets clicked.

Couples the test to render order (nth-child)
Couples the test to class names that change at every build
Silently passes when the wrong button gets clicked because the path still resolves
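
The silent pass can be reproduced without a browser. The sketch below is a toy model, not Playwright: a page is an ordered array of elements, `byNthChild` stands in for a positional selector, and `byRole` stands in for a role-plus-name lookup. All names are hypothetical.

```typescript
// Toy model of a page: an ordered list of sibling elements.
// Hypothetical types for illustration only; this is not Playwright's API.
interface El {
  role: string;
  name: string;
}

// Structural lookup: "the Nth child", 1-indexed like :nth-child.
function byNthChild(children: El[], n: number): El | undefined {
  return children[n - 1];
}

// Semantic lookup: role plus accessible name; must match exactly one element.
function byRole(children: El[], role: string, name: string): El {
  const matches = children.filter((el) => el.role === role && el.name === name);
  const [first, ...rest] = matches;
  if (!first || rest.length > 0) {
    throw new Error(`expected 1 match for ${role} "${name}", got ${matches.length}`);
  }
  return first;
}

const heading: El = { role: "heading", name: "Your cart" };
const checkout: El = { role: "button", name: "Checkout" };
const banner: El = { role: "button", name: "Dismiss banner" };

const before: El[] = [heading, checkout];
// Later, a banner button is rendered ahead of the checkout button.
const after: El[] = [heading, banner, checkout];

const structuralBefore = byNthChild(before, 2)?.name;
const structuralAfter = byNthChild(after, 2)?.name;
const semanticAfter = byRole(after, "button", "Checkout").name;

console.log({ structuralBefore, structuralAfter, semanticAfter });
```

The positional lookup still returns an element after the change, which is why the click succeeds and the test stays green. The semantic lookup either finds the same button or throws.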

Side by side: what gets shipped to your repo

The output side of an AI testing tool is where structural locators either show up in your codebase or do not. This is also the dimension on which most closed-source tools punt: they emit YAML or a proprietary scenario format, so the question of whether the underlying handle is a CSS chain or a role-based locator never surfaces in your repo at all. You cannot review what you cannot see.

checkout.spec.ts

// what AI-generated Playwright tests sometimes ship,
// because the model copied a Chrome DevTools "copy selector"
await page
  .locator(
    "#root > div.app > div:nth-child(2) > main > section.checkout > "
    + "div.summary > button.btn.btn-primary.cta-checkout"
  )
  .click();

await page
  .locator("xpath=//div[@id='order']/div[3]/div[1]/span[2]")
  .innerText();

await page
  .locator(".promo-apply")  // class will be hashed at next build
  .click();

The right column is two things at once. The first half is the runtime: the agent calling Assrt's tools with refs from a snapshot it just took. The second half is the durable artifact: the Playwright code Assrt writes to disk, in standard page.getByRole form. A reviewer can read it without running it. A new agent on the next CI run can re-resolve it without knowing what the page looked like last week.
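
A hedged sketch of what the durable half could look like for the three structural locators in checkout.spec.ts. The accessible names and test id used here ("Checkout", "Apply promo code", "order-total") are assumptions, as are the `PageLike` and `LocatorLike` interfaces, which are a locally declared subset of Playwright's `Page` and `Locator` so the sketch stands alone. `getByRole`, `getByTestId`, `click`, and `innerText` are real Playwright methods.

```typescript
// Minimal structural subset of Playwright's Page and Locator, declared locally
// so the sketch compiles on its own. These interfaces are illustrative.
interface LocatorLike {
  click(): Promise<void>;
  innerText(): Promise<string>;
}
interface PageLike {
  getByRole(role: string, options?: { name?: string }): LocatorLike;
  getByTestId(testId: string): LocatorLike;
}

// Role, name, and test id values below are assumptions for illustration.
async function checkoutFlow(page: PageLike): Promise<string> {
  // Replaces the "#root > div.app > ... > button.cta-checkout" chain.
  await page.getByRole("button", { name: "Checkout" }).click();

  // Replaces the positional XPath. A test id is the escape hatch for a
  // value that has no role or accessible name of its own.
  const total = await page.getByTestId("order-total").innerText();

  // Replaces ".promo-apply", which breaks once class names are hashed.
  await page.getByRole("button", { name: "Apply promo code" }).click();

  return total;
}
```

In a real spec, the `page` fixture from `@playwright/test` would be passed where `PageLike` is expected.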

The runtime sequence in one diagram

Here is what actually happens when the Assrt agent decides to click the Checkout button. Note that no string selector ever travels from the agent to the browser:

ref-based interaction, no DOM selector in flight

Step 6 is where the architectural break happens. The agent never sees, holds, or sends a CSS string. It sees ref=e42 and a human-readable element description for logging. The ref is only valid against this snapshot. If the next click fails, step 1 repeats and a fresh ref is minted. There is no cached selector to go stale.
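
The loop can be sketched against an in-memory stand-in. `FakeBrowser`, `AxNode`, and `clickByIntent` are hypothetical names, and the shapes are assumptions rather than Playwright MCP's actual wire format. What the sketch preserves is the contract described above: every snapshot mints fresh refs, a click resolves only against the latest snapshot, and a stale ref is handled by taking a new snapshot.

```typescript
// Hypothetical stand-in for the snapshot and click tools; not a real API.
interface AxNode {
  ref: string;
  role: string;
  name: string;
}

class FakeBrowser {
  private counter = 0;
  private live = new Map<string, AxNode>();
  private elements: { role: string; name: string }[];
  clicked: string[] = [];

  constructor(elements: { role: string; name: string }[]) {
    this.elements = elements;
  }

  // Each snapshot mints fresh refs and invalidates all earlier ones.
  snapshot(): AxNode[] {
    this.live.clear();
    const nodes = this.elements.map((el) => ({ ref: `e${++this.counter}`, ...el }));
    for (const node of nodes) this.live.set(node.ref, node);
    return nodes;
  }

  // A click resolves the ref against the latest snapshot only.
  click(ref: string): void {
    const node = this.live.get(ref);
    if (!node) throw new Error(`stale ref: ${ref}`);
    this.clicked.push(node.name);
  }
}

// Agent side: snapshot, pick by role and name, click by ref, re-snapshot on failure.
function clickByIntent(browser: FakeBrowser, role: string, name: string, maxAttempts = 2): string {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const node = browser.snapshot().find((n) => n.role === role && n.name === name);
    if (!node) throw new Error(`no ${role} named "${name}" in snapshot`);
    try {
      browser.click(node.ref);
      return node.ref;
    } catch (err) {
      lastError = err; // ref went stale; the next iteration takes a fresh snapshot
    }
  }
  throw lastError;
}

const browser = new FakeBrowser([
  { role: "link", name: "Home" },
  { role: "button", name: "Checkout" },
]);

const oldRef = browser.snapshot()[1].ref; // ref from the first snapshot
browser.snapshot(); // the tree is read again, so oldRef no longer resolves

let staleError = "";
try {
  browser.click(oldRef);
} catch (err) {
  staleError = (err as Error).message;
}

const usedRef = clickByIntent(browser, "button", "Checkout");
console.log({ oldRef, staleError, usedRef, clicked: browser.clicked });
```

No selector string appears anywhere in the agent-side function. The only thing it holds between calls is the intent (role and name).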

The honest counterargument: when do CSS selectors still earn a place?

A guide that pretends CSS never has a role is dishonest. Three places structural selectors still earn their keep:

1. Asserting a CSS property on an element you already located. If you found the button by getByRole and now want to assert it has cursor: pointer, the assertion is structural by definition. That is fine. Structural assertion on a semantically-located element is not the same as structural location.

2. Targeting elements with no accessible name. Decorative SVGs, drag handles, third-party iframe widgets that refuse to expose ARIA. A data-testid is the right escape hatch here, but if you do not control the markup, a narrow CSS selector scoped under a getByRole parent is preferable to a top-level XPath.

3. Reading non-interactive computed styles for visual regression. If the question is “is this banner red?” the answer comes from a screenshot diff or a computed-style read, not from click semantics. CSS is the right shape there.

What none of those three justify is using a CSS chain to find and click a thing the agent could have located by role and name. The interaction primitive is the place to be strict. Everything else is a downstream choice.

Resolution: rules a team can actually adopt

Forbid CSS chains and XPath in the interaction layer. If a test file in your repo contains .locator("div > div > ...") for a click or a type, treat it as a lint failure.
Allow getByTestId as an escape hatch when the surface has no accessible name. Pair it with a code-owner rule that catches abuse.
When you adopt an AI testing tool, ask which layer the tool emits into your repo. If the answer is “a YAML file in our cloud”, you cannot lint the locator strategy at all because the strings are not in your repo.
Inspect what the agent does at runtime. If it sends string selectors over the wire, every selector is a candidate for staleness. If it works against snapshot refs, the runtime is structurally immune to that class of failure.
Treat the question “does the test still pass for the right reason?” as primary, and the question “is the test green?” as secondary. Structural locators fail this on every DOM restructure; ref-plus-getByRole survives it.
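
The first rule can be approximated with a text scan before reaching for a full lint plugin. This is a rough heuristic, assuming three regular expressions that are illustrative rather than exhaustive; `findStructuralLocators` and the pattern labels are hypothetical names. It scans the whole source rather than line by line, so a `.locator(` call split across lines is still caught.

```typescript
// Heuristic check for structural locators in Playwright source text.
// The patterns are assumptions for illustration, not an official rule set.
const STRUCTURAL_PATTERNS: { label: string; pattern: RegExp }[] = [
  // .locator("xpath=...") or .locator("//...")
  { label: "xpath", pattern: /\.locator\(\s*["'`](xpath=|\/\/)/ },
  // :nth-child( or :nth-of-type( anywhere inside the locator argument
  { label: "nth-child", pattern: /\.locator\([^)]*:nth-(child|of-type)\(/ },
  // a child combinator " > " inside the first string literal
  { label: "css-chain", pattern: /\.locator\(\s*["'`][^"'`]*\s>\s/ },
];

interface Finding {
  line: number; // 1-indexed line where the .locator( call starts
  label: string;
}

function findStructuralLocators(source: string): Finding[] {
  const findings: Finding[] = [];
  for (const { label, pattern } of STRUCTURAL_PATTERNS) {
    const re = new RegExp(pattern.source, "g");
    let match: RegExpExecArray | null;
    while ((match = re.exec(source)) !== null) {
      // Count newlines before the match to recover the line number.
      const line = source.slice(0, match.index).split("\n").length;
      findings.push({ line, label });
    }
  }
  return findings.sort((a, b) => a.line - b.line);
}
```

A CI step could run this over `tests/**/*.spec.ts` and fail when the result is non-empty. It will not catch selectors built from variables, so it complements review rather than replacing it.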

Try it on a real page

The fastest way to see the two-layer split for yourself is to run Assrt against an app you already have. The agent will print every tool call, including the ref it picked from the snapshot. The generated .spec.ts file lands in your repo with getByRole-style locators you can read line by line.

npx @m13v/assrt discover https://your-app.com

Open github.com/assrt-ai/assrt-mcp and read the agent prompt at src/core/agent.ts to verify the runtime contract before you run it.

Walk through your test suite with us

If your team is mid-rewrite away from CSS-heavy tests, we will sit down with one of your specs and show what changes when the agent drives by ref instead. 30 minutes, no slides.

Frequently asked questions

Are CSS or XPath structural locators safe to use in AI-generated Playwright tests?

No. Playwright's own documentation says CSS and XPath are not recommended because the DOM changes and tied-to-implementation selectors break. AI testing pushes that further: at runtime the agent should drive the page through accessibility-tree refs returned by snapshot(), and at serialization time the test file should land in the repo as getByRole, getByLabel, or getByTestId. Structural locators belong to neither layer.

What is the difference between a Playwright MCP ref and a CSS selector?

A ref is a snapshot-relative ID like 'e5' that Playwright MCP assigns to a node in the accessibility tree it just produced. It only exists inside the snapshot the agent is reading. A CSS selector is a string against the live DOM. The Assrt agent system prompt explicitly warns 'Do NOT use ref attributes as DOM selectors (they don't exist in the DOM).' Treating a ref as a DOM selector is a category error and will fail.

If structural locators are not used, how does the AI agent decide what to click?

It calls snapshot() first, reads the accessibility tree, finds the node by role and accessible name, then passes the ref of that node to the click tool. There is no string selector traveling between the agent and the browser. When the agent has to recover from an error, it calls snapshot() again to get fresh refs.

Then what should appear in the Playwright test file that gets committed to git?

Real Playwright code with user-facing locators: page.getByRole('button', { name: 'Checkout' }), page.getByLabel('Email'), page.getByTestId('apply-promo'). Not CSS chains, not XPath, and not refs. The ref is a runtime tool for the agent. The committed test is a durable artifact a human or another agent will read in three months.

What about nth-child or position-based selectors when role and name are ambiguous?

Disambiguate by chaining user-facing locators rather than indexing structurally. page.getByRole('listitem').filter({ hasText: 'Product 2' }).getByRole('button', { name: 'Add to cart' }) is the recommended Playwright pattern. nth-child or :nth-of-type couples the test to render order, which AI-driven UIs rearrange constantly.

Does this mean visual regression and AI testing are mutually exclusive?

No. Visual regression catches pixel drift; ref-based interaction catches behavioral drift. They run on top of each other. Assrt does both: drive the page through refs, capture a screenshot per scenario, diff against the baseline. The structural-locator question is orthogonal to whether you take screenshots.

What happens when the page restructures and the accessible name changes?

The next snapshot returns a different tree. The agent matches the intent ('click checkout button') against the new tree by role and name, gets a new ref, and proceeds. The test does not need a code change because the durable artifact is getByRole('button', { name: 'Checkout' }), and at runtime the agent re-resolves anyway.

How is this different from ZeroStep or ai-locators libraries that take a natural-language description?

ZeroStep and ai-locators wrap a page.locator-style API around an LLM call that returns a string selector. Assrt does not generate or hold a selector string at runtime. The agent walks the tree, picks a node, and passes its ref. Selector synthesis happens once, when the test is written to disk, and what gets written is a getByRole-style locator, not an AI-resolved CSS chain.
