Accessibility tree web testing: how AI test agents drive browsers without CSS selectors

A practical guide to using the browser's accessibility tree as the target for end-to-end tests, with a real walk through Assrt's open source agent loop.

Matthew Diakonov, Written with AI

Published May 1, 2026 · 9 min read

Direct answer (verified 2026-05-01)

Accessibility tree web testing means writing end-to-end tests that target elements by their ARIA role and accessible name rather than by CSS class, ID, or XPath. The accessibility tree is the same structure screen readers consume, so it survives most UI refactors. Modern runners serialize the tree to a YAML snapshot with stable element refs (Playwright calls these aria snapshots, see playwright.dev/docs/aria-snapshots), and AI test agents like Assrt use those refs to drive a real browser without ever writing a CSS selector.

Why the accessibility tree, why now

For two decades, the standard way to address an element in a browser test was a CSS selector or an XPath expression. That worked when markup was hand-written, components were stable, and a button stayed a button for a year at a time. It does not work in a 2026 codebase where the same button is implemented by three different design system versions across the app, where Tailwind class strings change every sprint, and where an LLM rewrites a wrapper div without anyone noticing.

The accessibility tree is different. It is the structure the browser exposes to assistive technology. Every node has a stable identity built from three things: a role (button, textbox, link, heading, dialog), an accessible name (the text a screen reader would announce), and a small set of states (disabled, checked, expanded, selected). None of those are coupled to your CSS, your component library, or how you write your JSX. They are coupled to what the element is, semantically.
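As a rough illustration of that identity, the sketch below models a node as role, name, and states. The `AccessibleNode` type and `describeNode` helper are hypothetical names invented for this example, not Playwright or Assrt types.

```typescript
// Hypothetical shape for illustration; not a Playwright or Assrt type.
type AccessibleState = "disabled" | "checked" | "expanded" | "selected";

interface AccessibleNode {
  role: string; // e.g. "button", "textbox", "link"
  name: string; // the text a screen reader would announce
  states: AccessibleState[];
}

// Render the identity a role-based test depends on.
// No CSS class, ID, or DOM path is part of it.
function describeNode(node: AccessibleNode): string {
  const states = node.states.length > 0 ? ` [${node.states.join(", ")}]` : "";
  return `${node.role} "${node.name}"${states}`;
}

// The same control before and after a restyle: the markup differs,
// but the semantic identity a test would target is unchanged.
const beforeRestyle: AccessibleNode = { role: "button", name: "Sign in", states: [] };
const afterRestyle: AccessibleNode = { role: "button", name: "Sign in", states: [] };

console.log(describeNode(beforeRestyle) === describeNode(afterRestyle)); // true
```

Anything a restyle can change (classes, wrappers, inline styles) has no field in this shape, which is why a test keyed on it tends to survive the restyle.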

Tests that address elements by role and name behave like users do. When a designer rewrites the button as a styled div without restoring the role, the test fails. So does the experience for anyone using a screen reader. That is a useful coincidence: the test is now a continuous accessibility check, paid for by the QA budget instead of fought for separately.

What changes in the test code

Same test, different selector strategy

// Brittle: dies on any class rename
await page.click('.btn.btn-primary.signin');

// Even worse: positional
await page.click('form > div:nth-child(3) > button');

// Test-id: works, requires team buy-in
await page.click('[data-testid="signin-button"]');
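For contrast, the role-based version of the same click might look like the sketch below. `getByRole` with a `name` option is Playwright's locator API. `RolePage` is a hypothetical structural subset of Playwright's `Page`, declared locally only so the snippet has no dependencies. In a real spec you would pass the `page` fixture directly.

```typescript
// Hypothetical minimal subset of Playwright's Page, declared locally so this
// sketch compiles on its own. A real Playwright Page should satisfy it.
interface RolePage {
  getByRole(
    role: string,
    options?: { name?: string | RegExp },
  ): { click(): Promise<void> };
}

// Robust: targets what the element is (a button named "Sign in"),
// not how it is styled or where it sits in the markup.
async function clickSignIn(page: RolePage): Promise<void> {
  await page.getByRole("button", { name: "Sign in" }).click();
}
```

The class rename, the new wrapper component, and the extra tooltip div would not touch this call, as long as the element is still exposed as a button named "Sign in".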

The flaky test you have written before

A login test, two ways

Worked the day it was written. Broke when marketing reskinned the homepage. The class .signin-btn was renamed to .auth-cta, the form was wrapped in a new <SectionHero> component, and the input gained an extra div for a label tooltip. Three pull requests, three test failures, two false alarms about a real bug.

Couples the test to one specific markup
Breaks on dark mode rewrite
Breaks on component library upgrade
False positives waste oncall time

How an AI test agent actually consumes the tree

A hand-written Playwright test calls getByRole('button', { name: 'Sign in' }) and the framework walks the accessibility tree for you. An AI agent driving the browser does not get that affordance for free. It needs to see the tree, decide what to act on, and use a stable reference back to the framework. Here is the loop, taken from Assrt's open source agent at src/core/agent.ts.

The snapshot, act, re-snapshot loop

1. Call snapshot. The tool returns the accessibility tree as YAML with element refs like [ref=e5]. Tagged elements are everything actionable on the page.
2. Decide. The agent reads the tree, picks the next ref to click or type into, and emits a tool call referencing that ref by ID.
3. Act. The browser performs the click or fill against the resolved Playwright locator. Refs map back to the same accessibility node, not to a fragile CSS path.
4. Re-snapshot. Old refs may now be stale (modal opened, form re-rendered, focus moved). Fresh snapshot, fresh refs, repeat until the scenario is complete.
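The four steps can be sketched as a small control loop. Everything named here (`BrowserTools`, `Decide`, `Action`, `runScenario`, the `maxSteps` guard) is a hypothetical stand-in chosen for illustration. It mirrors the steps above and is not Assrt's actual code in src/core/agent.ts.

```typescript
// All names are hypothetical; they mirror the four steps, not Assrt's API.
type Action =
  | { kind: "click"; ref: string }
  | { kind: "fill"; ref: string; text: string }
  | { kind: "done" };

interface BrowserTools {
  snapshot(): Promise<string>; // YAML tree with [ref=eN] tags
  click(ref: string): Promise<void>;
  fill(ref: string, text: string): Promise<void>;
}

// Stand-in for the model call: reads the current snapshot, returns one action.
type Decide = (snapshot: string, goal: string) => Promise<Action>;

// Returns the number of actions performed before the agent reported "done".
async function runScenario(
  tools: BrowserTools,
  decide: Decide,
  goal: string,
  maxSteps = 20,
): Promise<number> {
  for (let step = 0; step < maxSteps; step++) {
    const tree = await tools.snapshot(); // 1. snapshot (fresh refs every pass)
    const action = await decide(tree, goal); // 2. decide
    if (action.kind === "done") return step; // scenario complete
    if (action.kind === "click") {
      await tools.click(action.ref); // 3. act
    } else {
      await tools.fill(action.ref, action.text);
    }
    // 4. loop back: refs from `tree` are never reused after an action
  }
  throw new Error(`Scenario not finished after ${maxSteps} steps`);
}
```

Taking the snapshot at the top of every iteration is what keeps stale refs out of the picture: no ref outlives the action it was chosen for.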

The anchor fact: a 120,000 character ceiling

One detail that AI test tool marketing pages rarely mention: there is a hard cap on snapshot size. In Assrt's source it is set in src/core/browser.ts:

/** Max characters for a snapshot before truncation
 *  (roughly ~30k tokens). */
private static readonly SNAPSHOT_MAX_CHARS = 120_000;

The reason is grim and obvious: a Wikipedia article, a sprawling admin dashboard, an infinite scroll feed, all produce accessibility trees that would blow past any agent's context window the moment you serialized them. The cap forces an honest conversation about scope. Tests that try to operate on entire pages are usually testing the wrong thing. Tests that target a section, a dialog, or a flow rarely come close to the limit.

When the cap is hit, the snapshot is truncated at the last clean line break and a marker is appended so the agent knows the view was partial. The behavior lives in the resolveAndTruncate method right above the constant. If you are building your own agentic test runner, copy this pattern. It is the difference between a test that runs and a test that exhausts your token budget mid-flow.
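A minimal sketch of that pattern, assuming a plain string snapshot, is shown below. The function name `truncateSnapshot` and the marker text are made up for this example. Only the 120,000 character limit comes from the Assrt source quoted above, and the real resolveAndTruncate method may differ in detail.

```typescript
// Limit taken from the constant quoted above; everything else is hypothetical.
const SNAPSHOT_MAX_CHARS = 120_000;
const TRUNCATION_MARKER = "# [snapshot truncated]"; // assumed marker text

function truncateSnapshot(
  snapshot: string,
  maxChars: number = SNAPSHOT_MAX_CHARS,
): string {
  if (snapshot.length <= maxChars) return snapshot;

  const head = snapshot.slice(0, maxChars);
  const lastBreak = head.lastIndexOf("\n");
  // Cut at the last line break inside the limit so no node is left half-written.
  const clean = lastBreak > 0 ? head.slice(0, lastBreak) : head;

  // The marker tells the agent it is looking at a partial view.
  return `${clean}\n${TRUNCATION_MARKER}`;
}
```

In this sketch the marker is appended after the cut, so the result can exceed `maxChars` by the marker's length. Reserve that space up front if the limit has to be strict.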

What a snapshot actually contains

One YAML node per accessible element

Role: button, textbox, link, heading, dialog, listitem, etc.
Accessible name (the label a screen reader would speak)
State: disabled, checked, expanded, selected, focused
Hierarchy: nested children, so a dialog contains its buttons
Element ref like [ref=e5] for the agent to reference back
Truncation marker if the tree exceeded 120,000 characters

What it does not contain: CSS classes, inline styles, image pixel data, JavaScript event handlers, or any markup that exists purely for layout. That omission is the point. If something is not in the accessibility tree, your tests cannot target it, and your users with assistive tech cannot use it either. Both gaps are the same gap.
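To make the field list concrete, here is an illustrative snapshot and a small parser for it. The exact YAML layout of `sampleSnapshot` is an assumption based on the fields listed above, and `SnapshotEntry` and `parseEntries` are hypothetical names. A real snapshot may carry more attributes and a different line shape.

```typescript
// Illustrative snapshot text. The exact layout is an assumption; only the
// role, name, state, nesting, and [ref=eN] fields come from the article.
const sampleSnapshot = [
  '- dialog "Sign in" [ref=e1]:',
  '  - textbox "Email" [ref=e2]',
  '  - textbox "Password" [ref=e3]',
  '  - button "Sign in" [disabled] [ref=e4]',
].join("\n");

interface SnapshotEntry {
  role: string;
  name: string;
  ref: string;
  disabled: boolean;
}

// Pull role, accessible name, disabled state, and ref out of each tagged line.
function parseEntries(snapshot: string): SnapshotEntry[] {
  const linePattern = /^\s*- (\w+) "([^"]*)"(.*)\[ref=(e\d+)\]/;
  const entries: SnapshotEntry[] = [];
  for (const text of snapshot.split("\n")) {
    const match = linePattern.exec(text);
    if (match) {
      entries.push({
        role: match[1],
        name: match[2],
        disabled: match[3].includes("[disabled]"),
        ref: match[4],
      });
    }
  }
  return entries;
}

console.log(parseEntries(sampleSnapshot).map((entry) => entry.ref).join(" ")); // e1 e2 e3 e4
```

Nothing in the parsed entries mentions a class or a style, which matches the omission described above.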

Accessibility tree testing vs the alternatives

Feature	CSS / XPath / test-id	Accessibility tree
Survives class renames	Breaks on every rename	Yes, role and name are independent of CSS
Survives wrapper / layout changes	Breaks if path-based	Yes, wrappers do not change the accessible name
Requires team cooperation	Every element needs a stable hook	Only that elements have a role and a name
Doubles as accessibility check	No correlation	Yes, missing labels fail tests
Works with AI test agents	Agent has to invent a selector	Native fit, refs are stable per snapshot
Reads like the user story	click .btn-primary.signin-cta	click button named Sign in

When the accessibility tree is the wrong tool

It is not a universal answer. There are real cases where you want to target the DOM directly. Canvas-rendered apps (large data visualizations, in-browser CAD, some game engines) have no accessibility tree to speak of, just a single canvas node. Tests for those have to fall back to coordinate clicks or hidden DOM mirrors. Visual regression tests measure pixels, not roles, and should not pretend otherwise. And anything that depends on a specific CSS property (a hover color, a transition timing, a print layout) needs to look at styles, not the tree.

The honest position is: use the accessibility tree as the default target for behavioral tests, and reach for CSS or visual diffs only when the thing you are checking is not behavior.

Try the loop on your own app

Assrt is open source and runs as a Node CLI. You point it at a URL, it crawls the accessibility tree, generates scenarios, and writes real Playwright spec files into your repo. The tests are yours, not a rented YAML format you have to escape from later.

npx @m13v/assrt discover https://your-app.com

Read the generated spec before you commit it. If a scenario is junk, delete it. If it is good, treat it like any other Playwright test in your suite: run it in CI, fix it when product behavior changes, retire it when the feature is gone. The accessibility tree is the substrate; the tests are still yours to own.

Frequently asked questions

What is accessibility tree web testing?

It is the practice of driving end-to-end browser tests against the page's ARIA accessibility tree (roles, accessible names, states, properties) instead of CSS classes, IDs, XPath, or test-id attributes. The accessibility tree is what screen readers consume, so it tends to be a more stable target than markup that designers rewrite every sprint. Modern runners like Playwright (via aria snapshots) and AI test agents like Assrt serialize this tree to a YAML structure and let tests address elements by role and name.

Is the accessibility tree the same thing as the DOM?

No. The DOM is the raw HTML element tree the browser parses. The accessibility tree is a derived structure the browser builds on top of the DOM for assistive technology. Each accessibility node has a role (button, textbox, link, heading), a name (the label a screen reader would announce), and properties (disabled, checked, expanded, level). Many DOM nodes do not appear in the accessibility tree at all, and the same DOM can produce different accessibility trees depending on ARIA attributes, CSS visibility, and focus management.

Why use the accessibility tree instead of getByTestId?

Test IDs work, but they require constant cooperation from the people writing the UI. Every new component needs a data-testid. Every refactor risks orphaning one. Accessibility tree selectors do not depend on a hidden contract with the design team. They depend on the same thing your users depend on: the element having a role and a name. If a button has no accessible name, your test fails and your users with assistive tech also fail. That is signal, not friction.

Does this work for sites that have poor accessibility?

Less well than it works on accessible sites, which is honest feedback. If your modal is built from divs with no role, no name, and no focus trap, the accessibility tree will show a blob and tests have nothing to grab. The fix is the same fix you owe your users: give the element a role, give it a name. In our experience, teams that adopt accessibility tree testing end up shipping more accessible products as a side effect.

How does Assrt use the accessibility tree?

Assrt is an open-source AI test agent. When it runs a scenario, it calls a tool named snapshot that returns the page's accessibility tree as YAML with element refs like [ref=e5]. The agent reads that snapshot, decides what to click or type, and uses the ref instead of a CSS selector. After every action it re-snapshots to get fresh refs, because the page may have changed. The output is a real Playwright .spec.ts file you can read, modify, and commit.

Why re-snapshot after every action?

Element refs from a snapshot are stable for that snapshot only. As soon as you click something, the DOM may re-render, components may unmount, focus may move, modals may open. Refs from the old snapshot may now point at nothing. Re-snapshotting is cheap (it is a serialization of an in-memory tree, not a network call) and it keeps the agent honest about what is actually on screen.

Is there a size limit on the accessibility tree snapshot?

Yes, in Assrt the cap is 120,000 characters per snapshot, which works out to roughly 30,000 LLM tokens. The cap exists because some pages (Wikipedia, large dashboards, infinite scrolls) produce accessibility trees that would otherwise blow past an agent's context window. When the cap is hit, the snapshot is truncated at the last clean line break and a marker is appended. Practically this means tests should target the section of the page they care about rather than crawling everything.

What does this look like in real Playwright code?

Once you stop thinking in CSS, the test reads almost like prose. await page.getByRole('button', { name: 'Sign in' }).click() instead of await page.click('.btn-primary.signin-cta'). The first one survives a class rename, a wrapper change, a dark mode rewrite. The second one survives only the next merge.