WCAG · accessibility testing · E2E

Why no single WCAG test SaaS actually covers WCAG.

Every “best WCAG test SaaS” listicle ranks the same handful of scanners, most of them built on axe-core: axe DevTools, Pa11y, WAVE, Siteimprove, AudioEye, UserWay, Accessibility Insights. They all cite the same number, usually buried in the article: automated scans catch about 57% of accessibility issues, per Deque’s own 2021 study of 13,000 pages and ~300,000 issues. Nobody talks about the other 43%, which is where most of the actually painful bugs live.

Matthew Diakonov, Written with AI

Published May 17, 2026 · 9 min read

Direct answer · verified 2026-05-17

There is no single “WCAG test SaaS” that fully tests WCAG. Per Deque’s 2021 study, axe-core, the engine behind most major scanners (axe DevTools, Accessibility Insights, Lighthouse, Storybook a11y, and Pa11y’s axe runner), catches roughly 57% of accessibility issues on real pages, and tools built on other engines, such as WAVE, check the same family of rules. The other ~43% needs human review for judgment calls (alt text quality, focus order intent) plus runtime tests that exercise modals, dialogs, async loads, and route transitions. Buy a scanner and pair it with an accessibility-tree-driven E2E agent so flow-time regressions cause test failures, not silent scanner passes.

Source: Deque, “Automated Testing Identifies 57% of Digital Accessibility Issues” (study spans ~13,000 pages, ~300,000 issues).

57%

“On average, 57 percent of accessibility issues were completely covered by this automated testing.”

Deque, 2021 study of 13,000 pages and ~300,000 issues across 2,000 audits

The listicle problem: every roundup recommends the same scanner under different names

Look at the top results when you search for a WCAG test SaaS in May 2026. They list axe DevTools, WAVE, Siteimprove, AudioEye, UserWay, Pa11y, accessiBe, Accessibility Insights, Storybook’s a11y addon, Lighthouse’s accessibility audit, sometimes a few niche players like Pope Tech or GetWCAG. Five of those wrap axe-core directly. Lighthouse wraps a subset of axe-core. The Storybook addon literally re-exports axe. WAVE is the one significant outlier in implementation, and it still measures the same family of rules. The market is much narrower than the listicles imply.

That is fine. It is good that the industry converged on one solid engine. What is not fine is that the listicle format hides the ceiling. Every page tells you to “layer tools” or “combine automated with manual” in one throwaway paragraph at the bottom, and then ranks the tools as if picking the right scanner is the decision. It is not. Picking any axe-based scanner gets you to the same ceiling. The decision is what you do about the other half.

the static scanner half

Pick any one of these. Most wrap the same engine, and the rest check the same family of rules.

axe DevTools

Deque's commercial wrapper around axe-core; same 57% coverage.

Pa11y

Open-source CLI for CI; runs HTML_CodeSniffer by default, with axe-core as an optional runner. Free.

WAVE

WebAIM's in-page visual feedback scanner.

Siteimprove

Enterprise governance platform, continuous scan.

AudioEye

Hybrid scan plus paid expert audit.

UserWay

Scanner plus overlay; use the scanner side only.

Storybook a11y addon

axe inside @storybook/test-runner per story.

Accessibility Insights

Microsoft's axe-powered desktop and browser tool.

What lives in the other 43%

Two categories. The first is judgment work no scanner can automate. Is this alt text meaningful, or auto-generated noise? Does the focus order on this form follow the visual order? Is the link text descriptive out of context? Are the form errors actually associated with their inputs in a way a screen reader can announce? These require humans, and that is what the “manual review” bullet in every listicle is pointing at.

The second category is the one that gets ignored: flow-time regressions. A static scan sees one moment in the page lifecycle, usually after initial load. It does not open the settings modal that traps focus on an aria-hidden element. It does not submit the signup form that should announce the success message via aria-live but instead pops up a div with role="banner" that no screen reader will announce. It does not step through the wizard that hides the Back button behind a custom div with no role. It does not follow the Skip to content link that points to an id that no longer exists because someone refactored the layout last sprint.
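The aria-live case is the easiest of these to express in code. Below is a minimal TypeScript sketch, assuming you already have an element's role and aria-live attribute values in hand; the Attrs type and liveness function are hypothetical helpers, not part of axe or Assrt. It applies the WAI-ARIA implicit aria-live values for the status, log, and alert roles, so a container with role="banner" and no aria-live resolves to "off", meaning nothing is announced.

```typescript
// Hypothetical attribute bag for one element; in a real test you would read
// these values off the DOM node that appears after the form submits.
type Attrs = { role?: string; "aria-live"?: string };

// WAI-ARIA 1.2 implicit aria-live values for the common live-region roles.
const LIVE_ROLES = new Map<string, "polite" | "assertive">([
  ["status", "polite"],
  ["log", "polite"],
  ["alert", "assertive"],
]);

// Returns how (or whether) changes inside the element are announced.
// In this sketch an explicit aria-live value takes precedence over the role.
function liveness(attrs: Attrs): "off" | "polite" | "assertive" {
  const explicit = attrs["aria-live"];
  if (explicit === "off" || explicit === "polite" || explicit === "assertive") {
    return explicit;
  }
  // Any role without an implicit live value (banner, generic, ...) is silent.
  return LIVE_ROLES.get(attrs.role ?? "") ?? "off";
}
```

The check itself is trivial; what matters is that it only means something after the form has actually been submitted and the message exists, which is the moment a load-time scan never reaches.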

All of those can pass a load-time axe scan on every page they touch. All of those make your app unusable for someone navigating with a screen reader or keyboard. The gap between “the scanner is green” and “the app is accessible” is exactly the gap an E2E agent that drives the accessibility tree can close.

Scanner only vs scanner plus AX-tree-driven E2E

axe runs on each URL at initial load. It reports color contrast failures, missing alt attributes, missing form labels, and basic ARIA misuse. CI gates on the violations array, and the build goes green when every page is clean at load time. The modal-trap bug, the aria-live-that-isn't bug, the focus-lost-on-close bug, and the broken-skip-link bug all ship to production because the scanner never clicked anything.

Catches ~57% of issues, measured by volume
Static page snapshot at load
Cannot click, type, or navigate
Modal focus traps invisible
ARIA live regions never tested
Skip-to-content links never followed

The anchor fact: the test runs through the accessibility tree, not around it

Most E2E test frameworks let you choose how to locate an element. You can use a CSS selector, a data-testid, an XPath expression, text content, an ARIA role with an accessible name. The choice is up to you. In practice teams pick whichever is most stable, and the most stable thing is usually a data-testid, because product designers do not refactor invisible attributes when they refactor the visible UI.

Assrt removes the choice. Every action is dispatched against a ref from a fresh accessibility-tree snapshot. The SYSTEM_PROMPT at /Users/matthewdi/assrt-mcp/src/core/agent.ts lines 206-218 enforces this as a CRITICAL Rule: “ALWAYS call snapshot FIRST to get the accessibility tree with element refs.” The snapshot itself comes from McpBrowserManager.snapshot() at /Users/matthewdi/assrt-mcp/src/core/browser.ts line 589, which invokes Playwright MCP’s browser_snapshot tool. That tool returns the page's accessibility tree: the computed roles, accessible names, and states that NVDA, JAWS, VoiceOver, and TalkBack rely on to describe your page to a user.
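To make the mechanism concrete, here is a simplified TypeScript sketch of acting by intent against an accessibility-tree snapshot. The AxNode shape and the resolveRef helper are hypothetical: the real browser_snapshot output is a text rendering with element refs, not this in-memory structure, and Assrt's own resolution is model-driven. The sketch only shows the principle: a step names a role and an accessible name, and if no node in the tree matches, the step fails instead of clicking something else.

```typescript
// Hypothetical in-memory snapshot node; the real browser_snapshot output is a
// text rendering of the tree with refs, not this exact structure.
interface AxNode {
  ref: string; // handle an action can be dispatched against
  role: string; // computed role, e.g. "button", "dialog", "generic"
  name: string; // computed accessible name ("" when there is none)
  children?: AxNode[];
}

// Collect every node whose role and accessible name match the step's intent.
function findAll(root: AxNode, role: string, name: string): AxNode[] {
  const hits: AxNode[] = [];
  const stack: AxNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as AxNode;
    if (node.role === role && node.name === name) hits.push(node);
    for (const child of node.children ?? []) stack.push(child);
  }
  return hits;
}

// Resolve exactly one ref, or fail the step instead of guessing.
function resolveRef(root: AxNode, role: string, name: string): string {
  const hits = findAll(root, role, name);
  if (hits.length !== 1) {
    throw new Error(`expected one ${role} named "${name}", found ${hits.length}`);
  }
  return hits[0].ref;
}
```

A button that a refactor wrapped in a role-less div shows up as a generic node with no name, so resolveRef(tree, "button", "Save") throws rather than silently passing.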

This sounds incidental. It is not. It means every Assrt test is an unintentional smoke test of your accessibility tree. If the tree on your dashboard misses a role for the primary CTA, the agent picks the wrong element or none, and the test fails. If a new modal traps focus on a hidden element, the next snapshot has no clickable ref for the dialog primary action, and the test fails. If a form’s success state is announced via a div without an aria-live region or accessible name, the assertion against the success heading fails. None of these failures need WCAG-specific logic in the test. They fall out for free because the substrate of the test is the accessibility layer.

How the two halves cooperate in one suite

What this is not: it is not a substitute for the scanner

An AX-tree-driven E2E agent does not parse your CSS for color contrast. It does not lint your alt attributes. It does not inspect your heading hierarchy. It does not measure whether your font sizes resize correctly. The scanner half exists because those checks are cheap to automate and tediously easy to forget. Buy a scanner. Run it on every page state your test suite visits.

What the agent does is close the gap where the scanner cannot see. A real WCAG program in 2026 is layered: scanner per page state, manual review for judgment calls, screen-reader smoke test on the critical paths, and E2E flow tests that exercise the interactive surface area. The first and the last are automatable. The middle two still need a human, but you are not asking the human to do the part the machine is good at.

The recipe, in order

Pick a static scanner (the 57% half)

Use any axe-core based tool. Pa11y if you want free in CI, axe DevTools if you want the team UI, Storybook a11y if you have a component library. Run it on every page state your scanner can reach without clicking anything. Treat its violations array as a build gate.
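Treating the violations array as a build gate takes only a few lines. A minimal TypeScript sketch, assuming an axe-style result: the AxeViolation type below is a locally declared subset of axe-core's violation fields (id, impact, nodes), and the impact threshold is an illustrative knob, not something this article prescribes. With the default threshold of "minor", any violation at all fails the build.

```typescript
type Impact = "minor" | "moderate" | "serious" | "critical";

// Locally declared subset of an axe-core violation: rule id, impact, affected nodes.
interface AxeViolation {
  id: string;
  impact?: Impact | null;
  nodes: { target: string[] }[];
}

const RANK: Record<Impact, number> = { minor: 0, moderate: 1, serious: 2, critical: 3 };

// Return the rule ids that should fail the build at the given threshold.
// Violations with no recorded impact are treated as failing, to stay strict.
function gate(violations: AxeViolation[], threshold: Impact = "minor"): string[] {
  return violations
    .filter((v) => v.impact == null || RANK[v.impact] >= RANK[threshold])
    .map((v) => v.id);
}
```

In a Node-based CI script you might set the process exit code to 1 whenever gate(...) returns a non-empty list; how you obtain the violations array depends on the scanner you picked.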

Write Markdown scenarios for accessibility-sensitive flows (the 43% half)

Modal open and close with focus return. Tab order through a multi-step form. Async submit with an aria-live announcement. Skip-to-content link. Keyboard navigation past a sticky header. Write each scenario as a #Case block in a Markdown scenario file (/tmp/assrt/scenario.md by default) and commit that file to your repo alongside the Playwright project.
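Because the scenarios are plain Markdown, they are easy to lint or inventory in CI. A hypothetical TypeScript parser is sketched below; it assumes the layout used in this article's example scenario (a "#Case N: title" header followed by numbered steps), while the authoritative format is whatever Assrt's scenario-files.ts defines, which may accept more than this.

```typescript
// Hypothetical parsed form of one #Case block.
interface Case {
  title: string;
  steps: string[];
}

// Parse "#Case N: title" headers and the numbered steps that follow them.
// Lines that match neither pattern (blank lines, prose) are ignored.
function parseCases(markdown: string): Case[] {
  const cases: Case[] = [];
  let current: Case | null = null;
  for (const raw of markdown.split(/\r?\n/)) {
    const line = raw.trim();
    const header = /^#\s*Case\s+\d+:\s*(.+)$/.exec(line);
    if (header) {
      current = { title: header[1], steps: [] };
      cases.push(current);
      continue;
    }
    const step = /^\d+\.\s+(.+)$/.exec(line);
    if (step && current) current.steps.push(step[1]);
  }
  return cases;
}
```

A check like "every case has at least one Assert step" then becomes a one-line filter over the parsed cases.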

Run Assrt so the AX tree is the test substrate

The agent calls snapshot() before every action, picks a ref from the live accessibility tree, dispatches the click or type, then snapshots again. If a step cannot find an element with the expected role and accessible name, the test fails with the snapshot attached as evidence. Flow-time accessibility regressions become test failures, not silent passes.
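The loop described above can be sketched in TypeScript as follows. BrowserDriver, Step, and StepFailure are hypothetical names for illustration, not Assrt's API, and a real agent picks targets with a model rather than an exact role-and-name match. What the sketch preserves is the order of operations: a fresh snapshot before every action, resolution against that snapshot, and a failure that carries the searched snapshot as evidence.

```typescript
// Hypothetical snapshot node, browser driver, and step shapes.
interface AxNode { ref: string; role: string; name: string; children?: AxNode[] }
interface BrowserDriver {
  snapshot(): Promise<AxNode>;
  click(ref: string): Promise<void>;
}
interface Step { role: string; name: string }

// Error that keeps the snapshot the step was resolved against.
class StepFailure extends Error {
  evidence: AxNode;
  constructor(message: string, evidence: AxNode) {
    super(message);
    this.evidence = evidence;
  }
}

// Depth-first search for the node a step refers to.
function findTarget(node: AxNode, step: Step): AxNode | undefined {
  if (node.role === step.role && node.name === step.name) return node;
  for (const child of node.children ?? []) {
    const hit = findTarget(child, step);
    if (hit) return hit;
  }
  return undefined;
}

async function runSteps(browser: BrowserDriver, steps: Step[]): Promise<AxNode> {
  for (const step of steps) {
    const tree = await browser.snapshot(); // fresh tree before every action
    const target = findTarget(tree, step);
    if (!target) {
      throw new StepFailure(`no ${step.role} named "${step.name}"`, tree);
    }
    await browser.click(target.ref);
  }
  return browser.snapshot(); // snapshot again so the caller can assert on the end state
}
```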

Drop @axe-core/playwright into the same suite

At each assertion checkpoint that lands on a meaningful page state, run await new AxeBuilder({ page }).withTags(['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa', 'wcag22aa']).analyze() and expect the violations array to be empty. You get the static scanner's coverage at every state the flow visits, not just at page load. Enterprise WCAG SaaS cannot offer this path because it does not run inside your test runner.
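A small helper keeps those checkpoint failures readable. This is a sketch: AxeLikeResults is a locally declared subset of the object AxeBuilder's analyze() resolves to (url and violations, with id, impact, help, and node targets), declared here so the code runs without axe installed; it has not been checked against every axe-core version's type definitions. assertNoViolations and the checkpoint label are hypothetical names.

```typescript
// Locally declared subset of axe-core's results, so this helper runs without axe.
interface AxeNodeLike { target: Array<string | string[]> }
interface AxeViolationLike {
  id: string;
  impact?: string | null;
  help: string;
  nodes: AxeNodeLike[];
}
interface AxeLikeResults { url: string; violations: AxeViolationLike[] }

// One readable line per violation: impact, rule id, help text, first few targets.
function describeViolations(results: AxeLikeResults, maxTargets = 3): string {
  return results.violations
    .map((v) => {
      const targets = v.nodes
        .slice(0, maxTargets)
        .map((n) => n.target.join(" "))
        .join(", ");
      return `[${v.impact ?? "unknown"}] ${v.id}: ${v.help} (${targets})`;
    })
    .join("\n");
}

// Fail the current flow step if the scan at this checkpoint is not clean.
function assertNoViolations(results: AxeLikeResults, checkpoint: string): void {
  if (results.violations.length === 0) return;
  throw new Error(
    `${results.violations.length} axe violation(s) at "${checkpoint}" (${results.url}):\n` +
      describeViolations(results),
  );
}
```

In a Playwright test the call would look like assertNoViolations(await new AxeBuilder({ page }).withTags(['wcag2a', 'wcag2aa']).analyze(), 'settings modal open'), so the failure names the flow step, not just the URL.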

Triage with the assertion log, not a vendor dashboard

Both halves produce failures inside the same test run. Static violations come from axe; flow violations come from the agent giving up on a missing ref. Both land in results/latest.json and in the run's .webm video. You read the artifact, find the offending step, and fix the markup. There is no SaaS account to manage, no seat to buy, no quarterly renewal call.

The honest counterargument

An enterprise WCAG SaaS sells you something Assrt does not: a paper trail. Audit reports with the vendor’s name on them. VPATs. Quarterly remediation tickets routed to the right teams. Expert human reviewers in the loop for the judgment calls the machine cannot make. A lawyer-friendly artifact when you get an ADA complaint letter. If you are an enterprise procuring accessibility tooling specifically to satisfy a buying requirement, an open-source stack does not produce that paper. Buy AudioEye or Siteimprove for that, and run Assrt alongside it because the SaaS still cannot test your flows.

If you are an engineering team trying to build something that actually works for screen reader users, switch-access users, and keyboard-only users (and the audit paper is a downstream consequence, not the goal), Pa11y plus Assrt covers more ground than the enterprise SaaS at roughly two orders of magnitude less cost. The tradeoff is real and predictable: no white-glove audit, no vendor name on the report, no quarterly review call. The work output, when measured by what your users with disabilities actually experience, is better, because runtime regressions stop shipping.

WCAG test SaaS, answered honestly

Is Assrt a WCAG compliance scanner like axe DevTools or UserWay?

No, and I want to be honest about that up front. Assrt does not output a WCAG 2.2 violations report. It does not classify issues by success criterion. It does not know what a Level AA contrast ratio is. What Assrt does is run real Playwright tests against a real Chromium browser, where every click and type targets an element pulled from the live accessibility tree via a snapshot taken right before that step. The rule is in SYSTEM_PROMPT at /Users/matthewdi/assrt-mcp/src/core/agent.ts lines 206-218: ALWAYS call snapshot FIRST to get the accessibility tree with element refs. If the accessibility tree on your page is missing a label, the agent will pick the wrong element or fail to find one at all. That is a different signal than a scanner gives you, and that is the half of WCAG coverage nobody else is selling.

Which WCAG test SaaS should I actually buy then?

If you only buy one tool, buy something axe-core based: axe DevTools is the commercial wrapper, Pa11y is the free CLI (with axe as an optional runner), Storybook's a11y addon runs axe against every story, and GitHub Actions has the deque-systems/axe-core-action. They all use the same engine and catch the same approximately 57 percent of WCAG issues (Deque's 2021 study, cited at the top of this page). For the rest, you need a process: per-page manual review for keyboard traps and focus order, a screen reader check on the critical paths, and runtime tests that exercise your modals, dialogs, async loads, and route transitions. Assrt fits the runtime-tests slot. It is not a substitute for the scanner; it is the layer above the scanner.

Why can scanners only catch 57% of WCAG issues?

Two reasons. The first is structural: many WCAG criteria require human judgment to evaluate, like 'is this alt text actually meaningful' or 'does this focus order match the visual flow.' A scanner can flag a missing alt attribute but cannot grade a present one. The second is temporal: a static scan happens at one moment in the page lifecycle, usually after load. It does not click anything. It does not open the modal that traps focus. It does not submit the form that triggers the loading spinner that should announce itself via aria-live but does not. Deque's 57 percent number is measured over 13,000 pages and roughly 300,000 issues, weighted by volume rather than success criterion. That weighting is what most people miss when they argue automated coverage is closer to 30 percent. Both numbers are right; they are measuring different things.
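A toy calculation shows how both figures can be right at once. The numbers below are invented for illustration, not taken from Deque's study, and treating each criterion as either fully automatable or fully manual is a simplification. Because a few automatable criteria (contrast, missing names) generate most of the issue volume, the same toy audit scores 90% coverage by volume and 50% by criterion.

```typescript
// Toy audit tallies. The numbers are invented for illustration, not Deque's data,
// and real criteria are often only partly automatable.
interface CriterionTally {
  criterion: string;
  automatable: boolean;
  issues: number;
}

function coverage(tallies: CriterionTally[]): { byVolume: number; byCriterion: number } {
  const sum = (ts: CriterionTally[]) => ts.reduce((acc, t) => acc + t.issues, 0);
  const auto = tallies.filter((t) => t.automatable);
  return {
    byVolume: sum(auto) / sum(tallies), // share of all issues found automatically
    byCriterion: auto.length / tallies.length, // every criterion counts once
  };
}

const toyAudit: CriterionTally[] = [
  { criterion: "1.4.3 Contrast (Minimum)", automatable: true, issues: 300 },
  { criterion: "4.1.2 Name, Role, Value", automatable: true, issues: 150 },
  { criterion: "2.4.3 Focus Order", automatable: false, issues: 30 },
  { criterion: "1.3.2 Meaningful Sequence", automatable: false, issues: 20 },
];

const toyCoverage = coverage(toyAudit); // byVolume 0.9 (450 / 500), byCriterion 0.5 (2 / 4)
```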

How exactly does an E2E test catch accessibility regressions a scanner misses?

By depending on the accessibility tree to drive the test. Assrt's snapshot call (McpBrowserManager.snapshot() at /Users/matthewdi/assrt-mcp/src/core/browser.ts line 589) invokes the browser_snapshot tool of the official @playwright/mcp server, which returns the page's accessibility tree: the roles, accessible names, and states a screen reader relies on. If a modal opens and traps focus on a hidden element, the next snapshot has no clickable ref for the dialog's primary action, and the test fails on that step. If a button loses its accessible name during a refactor (someone wraps it in a styled div that swallows the role), the agent cannot find it by intent and reports the failure with the snapshot as evidence. If a form posts and the success message is supposed to arrive via aria-live but the markup is broken, the agent's wait_for_stable returns and the assertion against the success heading fails. None of these failures need WCAG-specific logic in the test framework. They fall out for free because the test reads the same surface assistive technology reads.
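The modal-trap case reduces to one question you can ask of any snapshot taken while a dialog is open: is the focused node inside the dialog? A minimal TypeScript sketch over a hypothetical AxNode shape with a focused flag (not the actual browser_snapshot format, and not Assrt's own logic) distinguishes the failure modes. It assumes hidden content is pruned from the snapshot, so focus parked on an aria-hidden element shows up as no focused node at all.

```typescript
// Hypothetical snapshot node with a focus flag; not the real browser_snapshot format.
interface AxNode {
  ref: string;
  role: string;
  name: string;
  focused?: boolean;
  children?: AxNode[];
}

// Depth-first search for the first node matching a predicate.
function findFirst(root: AxNode, pred: (n: AxNode) => boolean): AxNode | undefined {
  if (pred(root)) return root;
  for (const child of root.children ?? []) {
    const hit = findFirst(child, pred);
    if (hit) return hit;
  }
  return undefined;
}

type DialogFocus = "no-dialog" | "ok" | "focus-outside" | "focus-lost";

// Classify where focus sits relative to the first open dialog in the tree.
// Assumes hidden (e.g. aria-hidden) content is already pruned from the snapshot.
function checkDialogFocus(root: AxNode): DialogFocus {
  const isFocused = (n: AxNode) => n.focused === true;
  const dialog = findFirst(root, (n) => n.role === "dialog" || n.role === "alertdialog");
  if (!dialog) return "no-dialog";
  if (findFirst(dialog, isFocused)) return "ok";
  // Focus is either behind the dialog or on something the tree does not expose.
  return findFirst(root, isFocused) ? "focus-outside" : "focus-lost";
}
```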

Can I run axe-core inside Assrt-generated tests?

Yes, and that is the obvious path if you want both halves in one runner. The Assrt agent emits real Playwright instructions (it drives @playwright/mcp), and Deque publishes @axe-core/playwright, an official integration that injects axe into a Playwright page and returns the violations array. You write the scenario in Markdown for Assrt to execute the flow, and at the assertion step you call a Playwright helper that runs new AxeBuilder({ page }).analyze() and asserts the violations array is empty. The combined result is one suite that fails on flow-time accessibility breakage (the Assrt half) and on the WCAG 2.x violations axe can detect on each page state visited (the axe half). That is closer to a real audit than any single SaaS gives you, and it costs whatever your CI minutes plus your LLM tokens cost.

What about overlay tools like UserWay and accessiBe — are they a WCAG test SaaS?

They are accessibility overlays first and a scanner second. The scanner functionality piggybacks on the same axe-core or in-house engines as everyone else; the overlay is a separate, controversial product that injects a widget into your page to remediate issues client-side. The W3C, WebAIM, and a long list of disability advocacy groups have argued the overlay approach is harmful, and there is ongoing litigation around it. If you want to test WCAG compliance, use the scanner side; do not deploy the overlay as a fix. If you want runtime coverage on top of the scanner, that is what an accessibility-tree-driven E2E agent gives you, and it does not modify the production page in any way.

Does my app's accessibility actually have to be good for Assrt to run?

It degrades gracefully. The accessibility tree @playwright/mcp returns is the computed tree, which infers roles from HTML semantics even when ARIA is absent. A button without aria-label still appears with its visible text. A div with an onclick still appears as a generic interactive node. The agent can fall back to evaluate for arbitrary JavaScript if a page is genuinely opaque, and the SYSTEM_PROMPT documents an evaluate-based workaround for OTP inputs split into single-character fields (agent.ts line 235). What you do not get is reliability: the worse your accessibility is, the more your tests will pick the wrong element, take a wrong path, or fail on a stale ref. The fix is to improve the accessibility, which is also what your users with screen readers, switch controls, and keyboard-only navigation need. The test suite turns into a forcing function for the work that should have happened anyway.
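The role inference described above comes from the HTML Accessibility API Mappings. A deliberately small TypeScript sketch of a few of those mappings (the El shape and implicitRole helper are hypothetical, and browsers cover far more elements, attributes, and edge cases) shows why a native button keeps its role with no ARIA at all, while a div with an onclick handler maps to generic, a node an intent-based step like "click the Save button" cannot match by role.

```typescript
// Hypothetical element description: tag name plus attributes.
interface El { tag: string; attrs?: Record<string, string> }

// A small subset of the HTML-AAM implicit role mappings. Real browsers handle
// many more elements and conditions; this sketch falls back to "generic" for
// every tag it does not list, which a full implementation would not.
function implicitRole(el: El): string {
  const attrs = el.attrs ?? {};
  const explicit = attrs.role?.trim().split(/\s+/)[0];
  if (explicit) return explicit; // explicit role wins (first token only, unvalidated)
  switch (el.tag.toLowerCase()) {
    case "button":
      return "button";
    case "a":
      return "href" in attrs ? "link" : "generic";
    case "input": {
      const type = (attrs.type ?? "text").toLowerCase();
      if (type === "checkbox" || type === "radio") return type;
      if (type === "submit" || type === "reset" || type === "button") return "button";
      if (type === "search") return "searchbox";
      return "textbox"; // text, email, tel, url; other input types are out of scope here
    }
    case "h1": case "h2": case "h3": case "h4": case "h5": case "h6":
      return "heading";
    case "nav":
      return "navigation";
    case "main":
      return "main";
    case "dialog":
      return "dialog";
    default:
      return "generic"; // div and span really are generic; other tags are simplified
  }
}
```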

How is this different from running Pa11y on a CI cron?

Pa11y runs a static scan (HTML_CodeSniffer by default, axe-core if you enable its axe runner) on a list of URLs you give it. It is great. It catches the 57 percent half. It does not click anything, open any modal, submit any form, or navigate any in-app state, so it cannot catch the 43 percent that lives in flows. Pa11y plus a manual QA checklist is the most common shape of an accessibility program today, and the manual checklist is the part everyone hates. Assrt automates the flow part of that checklist: write the flow in Markdown, run it on every PR, fail the build when the flow can no longer be completed via the accessibility tree. Pa11y for the static scan, Assrt for the flow regression, axe-core as the shared engine between them (through Pa11y's axe runner and @axe-core/playwright). That is the actual recipe.

What does the scenario file look like when I want to test an accessibility-sensitive flow?

A plain Markdown file with #Case blocks. Example for a focus-after-modal-close test:

#Case 1: Closing the settings modal returns focus to the trigger
1. Navigate to /dashboard
2. Click the Settings button in the top nav
3. Assert: a dialog with the heading Settings is visible
4. Click the Close button in the dialog
5. Assert: focus is on the Settings button.

The agent runs the case, calls snapshot between steps, and the focus assertion either finds a focused ref matching Settings or fails. The scenario file lives at /tmp/assrt/scenario.md by default, and the format is defined at /Users/matthewdi/assrt-mcp/src/core/scenario-files.ts. You can commit it to your repo next to a Playwright project, version-control it, diff it, review it. There is no proprietary YAML, no vendor dashboard, no seat-based pricing.
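The final focus assertion of that scenario reduces to a check on the snapshot taken after the Close click: the dialog is gone and the focused node is the button named Settings. A minimal TypeScript sketch over a hypothetical AxNode shape (not the real snapshot format, and not how Assrt evaluates natural-language assertions):

```typescript
// Hypothetical snapshot node with a focus flag; not the real browser_snapshot format.
interface AxNode {
  ref: string;
  role: string;
  name: string;
  focused?: boolean;
  children?: AxNode[];
}

// Flatten the tree into a list so simple predicates can scan every node.
function flatten(node: AxNode, out: AxNode[] = []): AxNode[] {
  out.push(node);
  for (const child of node.children ?? []) flatten(child, out);
  return out;
}

// True when no dialog remains open and focus is back on the expected trigger.
function focusReturnedTo(root: AxNode, role: string, name: string): boolean {
  const nodes = flatten(root);
  const dialogOpen = nodes.some((n) => n.role === "dialog");
  const focused = nodes.find((n) => n.focused === true);
  return !dialogOpen && focused !== undefined && focused.role === role && focused.name === name;
}
```

The common regression this catches is focus falling back to the document body after the dialog closes, which leaves keyboard users at the top of the page.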

Is this cheaper than buying an enterprise WCAG SaaS?

Substantially, for most teams. AudioEye, Siteimprove, and similar enterprise platforms price into the five to six figures per year for sites of any meaningful scale. UserWay and accessiBe are in the $490 to $4,900 per year range for the overlay plus scanner, plus the reputational and legal cost of running an overlay. axe DevTools Pro is around $40 per developer per month. Pa11y is free. Assrt is open source, self-hosted, and free beyond the Anthropic API tokens your runs consume; a typical flow run is on the order of cents. The right comparison is not Assrt vs an enterprise WCAG SaaS, it is Pa11y plus Assrt vs an enterprise WCAG SaaS. The open stack covers the same approximately 57 percent of issues as the enterprise scanner, plus the runtime flow regressions enterprise scanners do not test, at roughly two orders of magnitude less cost. The trade is that you do not get the white-glove audit and the legal posture a vendor sells; if you need those, buy them separately.