The locator is the problem. Not the patch list.
Every guide on this topic teaches you a locator strategy and then a list of patches to keep it from rotting. Prioritise id over name over CSS over XPath. Add data-testid attributes. Adopt the Page Object Model. Pin chromedriver to your Chrome major. Move to WebDriver BiDi for real-time events. Each patch is good. None of them remove the input surface that produces the flake. This piece argues that the locator should be removed from the input surface entirely, and shows the file in an MIT-licensed package where one sentence replaces the entire retry framework.
What Selenium actually is, at the protocol layer
Selenium WebDriver is a wire protocol. Your test sends an HTTP request to a driver process, the driver translates it to the browser's native automation API, and the browser does the thing. For Chrome, that driver is chromedriver listening on port 9515. For Firefox it is geckodriver. The protocol itself, codified at w3.org/TR/webdriver2, is a list of endpoints with names like /session/{id}/element.
Look at the endpoint list and the philosophy is right there: you POST to /element with a "using" key (css selector, xpath, link text, partial link text, tag name) and a "value" string. The protocol presumes you know how to identify the element. The locator is the primitive. Everything Selenium 5 added on top, including the BiDi WebSocket layer for events, leaves that primitive intact.
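To make the "locator is the primitive" point concrete, here is a sketch of the wire-level payload behind every find_element call. The session ID is hypothetical; the payload shape follows the W3C find-element endpoint described above.

```python
import json

def find_element_request(session_id: str, using: str, value: str) -> tuple[str, str]:
    """Build the request Selenium sends for a find_element call.

    Returns (path, body). Every element lookup in the protocol funnels
    through this one endpoint, and the locator string IS the payload.
    """
    path = f"/session/{session_id}/element"
    body = json.dumps({"using": using, "value": value})
    return path, body

# What driver.find_element(By.XPATH, "//button[...]") becomes on the wire.
# "abc123" is a made-up session id for illustration.
path, body = find_element_request(
    "abc123",
    "xpath",
    "//button[normalize-space()='Continue']",
)
# path == "/session/abc123/element"; the body pairs the strategy name
# with the raw locator string -- there is no other way to ask for an element.
```

Whatever binding language you use, this is the request underneath; the locator string travels on every lookup.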
That is fine until the page changes. Every selector you wrote is a guess that the page's structural identity is stable. The structural identity is rarely stable. The patch list is a response to that fact, not an alternative to it.
The patch list every senior team has tried
If you have run a Selenium suite for more than a year, you already know the playbook. None of these are wrong. They are all good. The point is that they are all patches, and they all live on top of an input surface that still expects a locator string on every step.
The patches that keep selector-based suites alive
- Add data-testid attributes everywhere a Selenium test points
- Adopt the Page Object Model so locators have one home
- Use explicit WebDriverWait everywhere, never implicitly_wait
- Layer @retry(reraise=True) on flaky tests at the framework level
- Pin chromedriver to your Chrome major and update both in lock-step
- Switch to WebDriver BiDi for bidirectional events (Selenium 5+)
- Centralize selector constants so a UI change is a one-file diff
- Adopt visual regression as a backstop for missed text changes
Each one is real engineering. Adopting the Page Object Model does centralise locators; pinning chromedriver does eliminate the worst class of version drift; BiDi does give you events the old protocol could not. But the test in your file still reads: driver.find_element(By.XPATH, "//button[normalize-space()='Continue']").click(). Tomorrow marketing renames the button to "Sign in." The XPath stops matching. The patch list does not save you; it only tells you which patch to add next.
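For readers who have not built one: the framework-level retry decorator from the patch list looks roughly like the sketch below. This is a minimal stand-in, not tenacity or any specific library, and the flaky_step test function is invented for illustration.

```python
import functools
import time

def retry(times: int = 3, delay: float = 0.5, reraise: bool = True):
    """Minimal sketch of a framework-level retry decorator.

    It re-runs the whole test on any exception -- treating the symptom
    (a failed lookup) rather than the cause (a locator that no longer
    matches the page). The locator string is unchanged on every attempt.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(times):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
                    time.sleep(delay)
            if reraise and last_exc is not None:
                raise last_exc
        return wrapper
    return decorator

@retry(times=3, delay=0.0)
def flaky_step(counter={"calls": 0}):
    """Simulated flaky step: fails twice, then succeeds."""
    counter["calls"] += 1
    if counter["calls"] < 3:
        raise RuntimeError("stale element reference")  # simulated flake
    return "clicked"
```

Note what the decorator cannot do: if the button was renamed rather than slow to render, all three attempts send the same dead locator.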
What an alternative input surface looks like
The accessibility tree is already a labelled, role-tagged structure of every interactive element on the page. Browsers compute it for screen readers. Playwright MCP exposes it via a tool called browser_snapshot. Every node has a role (textbox, button, link), an accessible name (the visible label or aria-label), and a short ref ID like e7. That is enough to identify any interactive element on the page without writing a CSS selector or an XPath.
Assrt is built on this idea. It is a small Node MCP server that spawns a local Playwright MCP process and exposes eighteen tools to a model (Claude Haiku 4.5 by default, Gemini 3.1 Pro optionally). The tools live in the TOOLS array starting at line 16 of assrt-mcp/src/core/agent.ts: navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, and wait_for_stable. Notice what is not there: any verb that takes a CSS selector or an XPath. The two element-targeting tools (click, type_text) take a human-readable description and optionally a ref. They do not take a locator string.
The agent's contract with the model is in the system prompt at line 198 of the same file. The relevant block, headed Selector Strategy (Playwright MCP refs), lives at lines 213-218 and reads, almost verbatim:
## Selector Strategy (Playwright MCP refs)
1. Call snapshot to get the accessibility tree
2. Find the element you want to interact with in the tree
3. Use its ref value (e.g. "e5") in the ref parameter
4. Also provide a human-readable element description for logging
5. If a ref is stale (action fails), call snapshot again to get fresh refs
That is the entire flaky-test recovery contract. Five sentences. The fifth is doing nearly all of the work. Every Selenium retry framework, every page object base class, every visual regression harness exists to compensate for the absence of that line. With it, the agent re-reads the page on every action; it never depends on a stale identifier. There is no global retry decorator and no centralised locator file because there are no locators to centralise.
The same test, both ways
One test, written twice. Left: a typical pytest + Selenium WebDriver port of "sign in and create a project." Seven locators, three of them XPath. Right: the same intent expressed as a Markdown plan file. The agent reads it line by line and calls real Playwright MCP under the hood.
Locator inventory: 7 → 0
# tests/test_create_project.py
# A typical Selenium WebDriver test, the way every "automation
# test selenium" guide teaches it.
import os
import pytest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
@pytest.fixture
def driver():
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    drv = webdriver.Chrome(options=options)
    drv.implicitly_wait(0)
    yield drv
    drv.quit()

def test_create_project(driver):
    driver.get("https://app.example.com/login")
    wait = WebDriverWait(driver, 10)
    # Locator #1: email field. Will rot if the design system
    # renames the input or wraps it in a new shadow root.
    email = wait.until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, "input[name='email']")))
    email.send_keys(os.environ["DEMO_EMAIL"])
    # Locator #2: password. Same fragility.
    pw = driver.find_element(By.CSS_SELECTOR, "input[name='password']")
    pw.send_keys(os.environ["DEMO_PASSWORD"])
    # Locator #3: submit button. Watch for marketing renaming
    # "Continue" to "Sign in" next quarter.
    driver.find_element(By.XPATH,
        "//button[normalize-space()='Continue']").click()
    wait.until(EC.url_contains("/dashboard"))
    # Locator #4: the new-project button.
    driver.find_element(By.XPATH,
        "//button[normalize-space()='New project']").click()
    # Locator #5: the project-name input.
    name = wait.until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, "input[aria-label='Project name']")))
    name.send_keys("smoke")
    # Locator #6: the Create button.
    driver.find_element(By.XPATH,
        "//button[normalize-space()='Create']").click()
    # Locator #7: assertion target.
    heading = wait.until(EC.presence_of_element_located(
        (By.XPATH, "//h1[normalize-space()='smoke']")))
    assert heading.is_displayed()

The line count is not the point. The locator count is. The Selenium file has seven independent reasons to break next quarter; the Markdown file has zero. If marketing renames "Continue" to "Sign in", the agent's snapshot returns a button named "Sign in", and the model picks that ref because the scenario says "Press Continue" and the closest semantic match is the only sign-in button on the page. (You can be stricter than that with an explicit pass-criteria block; see the FAQ.)
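For comparison, the right-hand side of the split might look like the plan file below. This is a sketch: the exact #Case syntax is assumed from how plan files are described elsewhere in this piece, not quoted from Assrt's documentation.

```markdown
#Case Sign in and create a project
Open https://app.example.com/login
Type the demo email into the Email field
Type the demo password into the Password field
Press Continue
Wait until the dashboard loads
Press New project
Type "smoke" into the Project name field
Press Create
Pass if a heading named "smoke" is visible
```

Every line is intent; the agent resolves each one to a fresh ref from the current snapshot, so there is nothing in the file for a rename to break.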
What happens on the wire, one step at a time
The model never sees a screenshot for routine clicks (it can request one). It sees the accessibility tree as text, picks a ref, and calls a tool. The tool calls a real Playwright MCP method, which speaks CDP to a real Chromium binary. If the click fails, the model re-snapshots. The diagram below traces one round-trip of one click.
One click, end to end
On a stale-element error, the agent does not retry the same ref. It calls snapshot again; the page returns a fresh tree; the click goes to the new ref. This is not a feature anyone implemented as a retry framework. It is what happens when the system prompt says "if a ref is stale, call snapshot again to get fresh refs" and the model takes that instruction literally. The line is at assrt-mcp/src/core/agent.ts:218.
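The behavior that one sentence induces can be sketched as a loop. The names here (snapshot, click, find_ref, FakePage) are illustrative stand-ins written for this article, not Assrt's real API; the point is where the retry re-reads the page.

```python
class StaleRefError(Exception):
    """Raised when an action targets a ref the page no longer has."""

def find_ref(tree: dict, description: str) -> str:
    """Pick the ref whose accessible name contains the description."""
    for name, ref in tree.items():
        if description.lower() in name.lower():
            return ref
    raise LookupError(description)

def click_with_recovery(page, description: str, max_attempts: int = 2) -> str:
    """Snapshot, pick a ref, click; on a stale ref, re-snapshot and retry.

    Unlike a retry decorator, the retry re-reads the page, so the second
    attempt targets whatever ref the element carries *now*.
    """
    for _ in range(max_attempts):
        tree = page.snapshot()            # fresh accessibility tree every attempt
        ref = find_ref(tree, description)
        try:
            return page.click(ref)
        except StaleRefError:
            continue                      # ref went stale mid-action: loop re-snapshots
    raise TimeoutError(f"could not click {description!r}")

class FakePage:
    """Test double: the first click goes stale (page re-rendered), then succeeds."""
    def __init__(self):
        self.clicks = 0
        self.tree = {"button 'Continue'": "e5"}
    def snapshot(self):
        return dict(self.tree)
    def click(self, ref):
        self.clicks += 1
        if self.clicks == 1:
            self.tree = {"button 'Continue'": "e9"}   # re-render rotated the ref
            raise StaleRefError(ref)
        return ref
```

Running click_with_recovery(FakePage(), "Continue") clicks e5, catches the stale error, re-snapshots, and lands the click on e9 -- the same shape as the transcript the agent produces.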
What you give up, and where Selenium still wins
This is not a "Selenium is dead" piece. The honest tradeoffs:
- Language portability. Selenium has first-party bindings for Java, Python, C#, Ruby, JavaScript. Assrt is a Node MCP server. Tests run from Claude Code, Cursor, an editor with an MCP client, or the CLI. If your QA team writes Java, that is friction.
- Browser matrix. Selenium drives Chrome, Firefox, Edge, Safari, IE 11, plus mobile via Appium. Assrt drives whatever Playwright MCP drives (Chromium reliably; Firefox and WebKit when configured). If you must test IE 11 on Windows Server 2019, Selenium Grid wins.
- Determinism on stable suites. An LLM in the loop introduces a small amount of non-determinism. For tests that already pass 99% of the time with selector-based code, swapping in a model is a downgrade. The Assrt migration target is the 1% that always flakes, not the 99% that already works.
- Cost shape. A Selenium run is free at the unit-economics level once you own the CI runner. An Assrt run pays per-token to the model provider. With Claude Haiku 4.5 the average #Case costs a few cents; for a thousand-test suite, the monthly bill is real and should be budgeted.
Where Selenium does not win and never will: the locator-rot problem. It is structural. No best-practice list eliminates the input surface that produces it. The choice is whether to keep paying the patch tax forever or to route the suites that hurt most onto a substrate where the patch tax is zero.
Compare a real Selenium failure against the Assrt path
This is the most common Selenium failure mode in real production suites. One side is what your CI dashboard looks like the morning after a frontend rename; the other is what actually happened in the test that did not break.
A button rename ships, the night build runs
7 of 142 tests fail overnight after a frontend release. The release diff renames a single primary button:

<Button>Continue</Button> -> <Button>Sign in</Button>

Selenium tests that locate by text break on the next run. Tests that locate by data-testid keep passing, but only because someone added data-testid="continue-cta" to that button six months ago and the diff did not touch it. Triage path:
- Read the failure logs (StaleElementReferenceException, NoSuchElementException, TimeoutException)
- Find every test that referenced the old button text
- Update XPath / link-text locators in 7 files
- Re-run, hope nothing else moved
- File a ticket asking the FE team to use stable IDs
- Repeat next quarter when something else gets renamed
- 7 broken tests from a single rename
- Triage tied to the patch list, not the system
- Same failure class will recur on the next rename
How to verify the claim in your own node_modules
Three commands. None require an Assrt account.
- npm install assrt-mcp pulls the MIT-licensed package from the public registry.
- grep -n stale node_modules/assrt-mcp/dist/core/agent.js prints the exact line that says "If a ref is stale (action fails), call snapshot again to get fresh refs". The same string lives in your install as on the page above.
- grep -c 'name: "' node_modules/assrt-mcp/dist/core/agent.js counts the eighteen tool names in the TOOLS array, confirming you are running an agent that dispatches to a fixed, inspectable contract, not a black-box DSL.
If those three commands give you the file paths, line numbers, and counts described in this piece, the rest of the architecture follows; all that remains is running it against your own URLs.
Migrating one Selenium suite this quarter? Talk to us.
Bring a flaky locator-heavy file. We will rewrite it as a #Case plan on a call and run it against your staging URL with you watching.
Frequently asked questions
I already have a Selenium suite. Why would I rewrite it as English #Case blocks?
You don't have to. Most teams that adopt Assrt keep their Selenium-driven CI smoke pack and add Assrt for the things WebDriver was always painful at: signup with a real disposable email and OTP, exploratory regression on a deploy, and tests written by a designer or PM who cannot edit a Python file. The two stacks share a Chrome binary, so cookies persisted by Assrt at ~/.assrt/browser-profile do not interfere with a Selenium test runner that points at its own profile. The decision is not 'replace Selenium today' — it is 'route the brittle locator-heavy cases to a substrate that does not need locators at all.'
What does 'no locator' actually mean? I still need to identify a button somehow.
The agent does, but you don't. On every step the agent calls a tool named `snapshot`, which returns the page's accessibility tree from Playwright MCP — every interactive element with its role, accessible name, and a short ID like `ref="e7"`. The agent picks the ref whose accessible name matches your English ("Email field" → the textbox with name "Email") and passes that ref to `click` or `type_text`. The whole tree is sourced from ARIA, which is what assistive technology already trusts. You write the intent in English; the agent picks the ref; the ref is fresh on every step because every step starts with a snapshot. There is no By.id, no By.xpath, no By.cssSelector — neither in your scenario file nor inside the agent's transcript.
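The name-to-ref step is simple enough to sketch. The outline format below is assumed from the example given later in this FAQ (role, quoted accessible name, bracketed ref); the parsing function is illustrative, not Assrt's code.

```python
import re

def refs_by_name(snapshot_text: str) -> dict[str, str]:
    """Parse a Playwright-MCP-style accessibility outline into
    {accessible name: ref}. Format assumed from this article's example."""
    pattern = re.compile(r"-\s+\w+\s+'([^']*)'\s+\[ref=(\w+)\]")
    return {m.group(1): m.group(2) for m in pattern.finditer(snapshot_text)}

tree = """
- main [ref=e1]
  - form [ref=e2]
    - textbox 'Email' [ref=e3]
    - textbox 'Password' [ref=e4]
    - button 'Continue' [ref=e5]
"""
refs = refs_by_name(tree)
# "Type into the Email field" resolves to the textbox named 'Email':
email_ref = refs["Email"]   # -> "e3"
```

The model does this resolution with semantic judgment rather than a regex, which is why "Press Continue" can still land on a button renamed "Sign in"; but the shape of the lookup (accessible name in, fresh ref out) is the same.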
Show me the line where 'self-heal on stale locators' lives. I want to verify it.
Open assrt-mcp/src/core/agent.ts. The system prompt starts at line 198. The `## Selector Strategy (Playwright MCP refs)` block lives at lines 213-218. Line 218 reads, verbatim: `5. If a ref is stale (action fails), call snapshot again to get fresh refs`. That single sentence is the entire flaky-test recovery contract. The package is MIT-licensed and published on npm as `assrt-mcp`, so once you `npm install assrt-mcp` you can run `grep -n stale node_modules/assrt-mcp/dist/core/agent.js` and confirm the same string lives in your own node_modules. No proprietary retry framework, no cloud-side healing service.
Selenium 5 introduced WebDriver BiDi for real-time browser events. Doesn't that solve the locator-rot problem?
BiDi solves a different problem. It gives Selenium a WebSocket channel for events the browser pushes — console logs, network requests, DOM mutations — so your test doesn't have to poll. That is genuinely better. But it does not change the input surface: you still call `driver.find_element(By.XPATH, ...)` to identify the thing you want to click. BiDi makes the wire protocol bidirectional; it does not remove the locator. Assrt removes the locator. The two changes compose: a BiDi-enabled Selenium test still benefits from data-testid hardening; an Assrt #Case benefits from the underlying browser exposing reliable accessibility metadata in the first place.
What is the catch? When does Selenium win over Assrt?
Three honest cases. First, language portability: if your team writes tests in Java, C#, or Ruby, Selenium's bindings cover all three; Assrt is a Node MCP server, and your scenario authors run it from Claude Code or the CLI rather than from a Maven build. Second, on-device matrices: if you must run the same test on Internet Explorer 11 (still a thing in some industries), Edge Legacy, or a real iPhone via Appium, Selenium Grid is mature and Assrt is Chromium-only via Playwright MCP. Third, deterministic CI for tests that are already stable: if you have a working selector-based suite that passes 99% of the time, switching it to an LLM-driven loop introduces non-determinism you may not want. The right migration target is the part of your suite that is already flaky.
How is this not just a hosted no-code tool with an LLM bolted on?
Three checks. (1) Open `node_modules/assrt-mcp/dist/core/agent.js` and read the TOOLS array starting at line 16 — you will see the eighteen tools the agent dispatches: navigate, snapshot, click, type_text, select_option, scroll, press_key, wait, screenshot, evaluate, create_temp_email, wait_for_verification_code, check_email_inbox, assert, complete_scenario, suggest_improvement, http_request, wait_for_stable. They are JSON tool definitions, not a wrapped DSL. (2) Open `browser.ts:561` and you will see the `snapshot` method making a literal call to `browser_snapshot` on the local Playwright MCP process. The browser bytes on the wire are vanilla Playwright. (3) The license is MIT and the source is on GitHub under the assrt-ai org. You can self-host the entire stack on your laptop with your own Anthropic key and never talk to assrt.ai's cloud.
What does the accessibility tree actually look like? I want to know what the model is reading.
It is a YAML-style outline of every focusable / labeled element, indented by parent-child relationship. A login form looks roughly like: `- main [ref=e1]` → `- form [ref=e2]` → `- textbox 'Email' [ref=e3]` → `- textbox 'Password' [ref=e4]` → `- button 'Continue' [ref=e5]`. The agent sees role, accessible name, and `ref` ID for each node. The whole tree is capped at 120,000 characters (browser.ts:523, `SNAPSHOT_MAX_CHARS = 120_000`, ≈30k tokens) before truncation, so even a Wikipedia-sized DOM fits in the model's context. If the page does not expose accessibility metadata, the agent falls back to a screenshot and visible text — but in 2026, most production apps already pass a basic ARIA audit because of design-system pressure, so the snapshot-first path covers the common case.
Can I drive Assrt from CI in the same place I currently run Selenium?
Yes. Assrt ships as a CLI that takes a plan file and a URL and exits non-zero on failure: `npx @assrt-ai/assrt run --plan scenario.md --url https://app.example.com`. Wire that into the same GitHub Actions workflow that runs your `pytest` suite. Each scenario emits a JSON report at `/tmp/assrt/<runId>/results.json` and a video at `/tmp/assrt/<runId>/video/recording.webm`, both of which you can upload as build artefacts. The cloud sync to app.assrt.ai is opt-in; without an account, the entire run stays on the CI runner.
What about parallelism and Selenium Grid?
Assrt is single-process per scenario, but scenarios run sequentially within one plan file by design — they share browser session state (cookies, localStorage) so case N+1 can build on case N's logged-in session. For real parallelism, run multiple plan files in parallel as separate processes, each with its own `userDataDir` (set with the `isolated` flag, browser.ts:336). That mirrors what Grid would do — N independent Chromium instances, each running a slice of the suite. There is no shared scheduler, but for a CI matrix that is rarely the bottleneck.
If the locator-elimination argument lands, these go further.
Adjacent reading
Open-source test automation with Playwright (no cloud required)
Every component of the Assrt stack is self-hostable: the MIT-licensed MCP server, the Playwright binary on your machine, and the cookie store on disk. The companion piece for teams who want to keep their CI runner offline.
Playwright vs Selenium: same flaky test, different substrates
A walk through the same suite ported from Selenium WebDriver to Playwright, then to Assrt. Where the protocol matters, where the syntax matters, and where neither does.
Test automation for beginners: the loop where you never pick a tool
If you are starting from scratch and not migrating from Selenium, this is the entry point. Three MCP tools, one URL, 5-8 #Case blocks generated for you.