
Modern E2E Testing: From Flaky Scripts to Reliable AI-Powered Tests

By Pavel Borji · Founder @ Assrt

End-to-end testing has undergone a radical transformation over the past decade. What started as brittle Selenium scripts maintained by dedicated QA armies has evolved into AI-powered frameworks that discover, generate, and self-heal tests autonomously. This guide traces that evolution and shows you what modern E2E testing looks like in practice.

Global software failures cost an estimated $1.7 trillion annually, with a significant portion traced to inadequate testing and regression defects that slipped past manual QA.

1. The Evolution of E2E Testing

To understand where E2E testing is going, you need to understand where it has been. The history of browser automation is a story of incremental improvements, each solving the most painful problem of its era while creating new ones.

2004 to 2015: The Selenium Era

Selenium WebDriver became the universal standard. Tests were written in Java or Python, using explicit XPath selectors and manual waits. Every team maintained a custom framework on top of Selenium. Test suites were slow, brittle, and required constant maintenance. A typical enterprise Selenium suite had a 30 to 40% flakiness rate.

2017 to 2019: The Cypress Revolution

Cypress introduced a fundamentally different architecture: running inside the browser rather than controlling it externally. This brought automatic waiting, time-travel debugging, and a dramatically better developer experience. But Cypress was initially limited to Chromium, could not handle multiple tabs, and introduced its own set of constraints.

2020 to 2023: The Playwright Rise

Microsoft's Playwright combined the best of both worlds: modern auto-waiting and a great developer experience (like Cypress) with multi-browser support, multiple contexts, and network interception (like Selenium). Its trace viewer, codegen tool, and built-in assertions set a new standard for test reliability and debuggability.

2024 and Beyond: The AI Layer

AI-powered testing frameworks like Assrt add intelligence on top of Playwright. Instead of manually writing every test, AI discovers testable scenarios, generates tests from natural language descriptions, and automatically fixes tests when the UI changes. The human role shifts from writing tests to reviewing and curating them.

Each era solved real problems. But until the AI layer, one fundamental challenge remained: the cost of creating and maintaining tests scaled linearly with the size and complexity of the application. AI breaks that linear relationship.

2. Why Traditional E2E Testing Breaks

The numbers tell a grim story. The global cost of software defects reaches $1.7 trillion annually, with a disproportionate share attributable to bugs that should have been caught by automated tests but were not. More than 60% of test suites in production suffer from chronic flakiness, meaning tests that pass and fail intermittently without any code changes.

Why does traditional E2E testing fail so reliably? The root causes are structural, not technical.

Selector fragility. Traditional tests rely on CSS selectors, XPath expressions, or test IDs to locate elements. When a developer reorganizes a component, renames a class, or moves an element to a different container, tests break. In a fast-moving codebase, these UI changes happen constantly. A single redesign sprint can break dozens of tests simultaneously.

Timing assumptions. Despite improvements in auto-waiting, many tests still encode timing assumptions. A test that works on a fast CI machine may fail on a slower one. Network latency, animation durations, and server response times all introduce nondeterminism that is difficult to eliminate entirely.
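Auto-waiting itself is conceptually simple. The sketch below is my own illustration (not any framework's actual implementation) of the core idea: poll a condition until it holds or a deadline passes, instead of sleeping for a fixed duration and hoping the machine was fast enough.

```typescript
// Retry a condition until it becomes true or a deadline passes.
// Fixed sleeps encode a timing assumption; bounded polling does not.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 2000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (condition()) return; // condition met: stop waiting immediately
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}

// Usage: a value that becomes ready after an unpredictable delay.
let ready = false;
setTimeout(() => { ready = true; }, 120);
waitFor(() => ready).then(() => console.log('ready'));
```

On a fast machine the wait returns almost immediately; on a slow one it simply polls longer, up to the timeout. That is why auto-waiting suites are less sensitive to CI hardware than sleep-based ones.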

Maintenance cost curve. The effort required to maintain a test suite grows faster than the suite itself. At 50 tests, maintenance is manageable. At 500 tests, teams routinely spend 40 to 60% of their automation budget on maintenance alone. This creates a vicious cycle: as maintenance costs rise, teams write fewer new tests, coverage stagnates, and more bugs escape to production.

The discovery gap. Traditional testing requires a human to identify what should be tested. This means test coverage is limited by human imagination, time, and institutional knowledge. When an engineer leaves the team, the understanding of which flows need testing often leaves with them. New team members tend to write tests for what they know, not for what the application actually needs.

These are not problems that better tooling alone can solve. They require a fundamentally different approach to how tests are discovered, written, and maintained. That is where the three pillars of modern testing come in.


3. The Three Pillars of Modern Testing

Modern AI-powered E2E testing rests on three capabilities that, taken together, address every structural failure of traditional approaches.

Pillar 1: Auto-Discovery

AI crawls your application, analyzes its structure, and identifies every testable scenario. No human needs to enumerate flows manually. The discovery process is exhaustive and repeatable.

Pillar 2: Natural Language Generation

Tests are authored in plain English and translated into production-ready Playwright code. This democratizes test authoring, allowing product managers, designers, and junior engineers to contribute to the test suite.

Pillar 3: Self-Healing

When UI changes break existing tests, the framework detects the breakage, determines the correct fix, and submits a pull request with the updated selectors and assertions. Maintenance becomes nearly zero-cost.

These three pillars work together as a closed loop. Auto-discovery identifies what to test. Natural language generation creates the tests. Self-healing maintains them over time. The result is a test suite that grows and adapts alongside your application with minimal human intervention.

4. Auto-Discovery: Let AI Map Your App

Auto-discovery is the most transformative capability in modern testing. Instead of relying on engineers to manually identify and catalog every testable flow, AI systematically explores your application and builds a comprehensive map of its functionality.

The process works through multiple analysis layers. First, crawl-based flow detection navigates every reachable page, clicking links, filling forms, and mapping the application's state transitions. Second, accessibility tree analysis examines the semantic structure of each page, identifying interactive elements, form fields, navigation patterns, and content regions. Third, vision understanding uses visual analysis to identify UI patterns that may not be reflected in the DOM, such as canvas-based interfaces, dynamically rendered content, or complex widget states.
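The crawl layer, in particular, reduces to a graph traversal. A minimal sketch, assuming the application is modeled as a link graph (the `SiteMap` type and the example pages are hypothetical, not Assrt's internal representation):

```typescript
// Toy model of crawl-based flow detection: pages form a graph, and a
// breadth-first walk enumerates every page reachable from the entry point.
type SiteMap = Record<string, string[]>; // page -> links it contains

function discoverPages(site: SiteMap, entry: string): string[] {
  const seen = new Set<string>([entry]);
  const queue = [entry];
  while (queue.length > 0) {
    const page = queue.shift()!;
    for (const link of site[page] ?? []) {
      if (!seen.has(link)) {
        seen.add(link);   // first time we reach this page
        queue.push(link); // crawl its links later
      }
    }
  }
  return [...seen];
}

// Hypothetical site structure for illustration.
const site: SiteMap = {
  '/': ['/login', '/signup'],
  '/login': ['/', '/reset-password'],
  '/signup': ['/'],
  '/reset-password': ['/login'],
};
console.log(discoverPages(site, '/').length); // 4 reachable pages
```

A real crawler must additionally handle authentication states, forms, and client-side routing, but the exhaustive-and-repeatable property comes from exactly this kind of systematic traversal.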

Auto-Discovery in Action

$ assrt discover https://my-saas-app.com

Crawling application...
  Found 23 unique pages
  Found 47 interactive flows
  Found 12 form submissions
  Found 8 authentication states

Discovered test scenarios:
  1. User signup with email and password
  2. User login with valid credentials
  3. User login with invalid credentials (error state)
  4. Password reset flow
  5. Dashboard loads with correct user data
  6. Create new project from dashboard
  7. Edit project settings
  8. Invite team member via email
  9. Accept team invitation
  10. Upgrade plan via checkout
  ... and 37 more scenarios

Total: 47 test scenarios discovered in 2m 14s

What makes auto-discovery particularly powerful is its ability to find scenarios that humans overlook. Edge cases like error states, empty states, permission boundaries, and multi-step flows that span multiple pages are commonly missed in manual test planning but are naturally discovered by AI exploration.

Auto-discovery is also repeatable. You can run it after every major release to identify new flows that need testing. As your application grows, your test coverage grows with it, automatically.

5. Natural Language Generation: Tests Everyone Can Write

The second pillar addresses one of the oldest bottlenecks in testing: the skill barrier. Traditional E2E tests require deep knowledge of the testing framework, the application's DOM structure, and the quirks of browser automation. This limits test authoring to a small subset of the engineering team.

Natural language test generation changes this equation entirely. You describe what you want to test in plain English, and the AI translates your description into production-ready Playwright code. The generated code uses proper selectors, includes appropriate assertions, and handles common edge cases like loading states and animations.

Natural Language Input

$ assrt generate "User signs up with email, confirms
their account, and sees the onboarding wizard"

Generated Playwright Code

import { test, expect } from '@playwright/test';

test('user signs up and sees onboarding', async ({ page }) => {
  // Navigate to signup page
  await page.goto('/signup');

  // Fill in registration form
  await page.getByLabel('Email').fill('newuser@example.com');
  await page.getByLabel('Password').fill('SecurePass123!');
  await page.getByLabel('Confirm password').fill('SecurePass123!');
  await page.getByRole('button', { name: 'Create account' }).click();

  // Verify confirmation page
  await expect(page.getByText('Check your email')).toBeVisible();

  // Simulate email confirmation (via API shortcut)
  await page.goto('/confirm?token=test-token');

  // Verify onboarding wizard appears
  await expect(page.getByRole('heading', {
    name: 'Welcome'
  })).toBeVisible();
  await expect(page.getByText('Step 1 of 3')).toBeVisible();
});

Notice the quality of the generated code. It uses semantic locators (getByLabel, getByRole) rather than brittle CSS selectors. It includes meaningful assertions that verify actual user-visible behavior. It follows Playwright best practices without requiring the author to know them.

The democratization effect is profound. Product managers can describe acceptance criteria in natural language and receive executable tests. Junior engineers can contribute to the test suite from day one. Design teams can specify visual behavior and generate tests that verify their designs are implemented correctly.

This does not eliminate the need for human review. Generated tests should always be reviewed by an engineer who understands the application. But the review-and-refine workflow is dramatically faster than writing tests from scratch, typically reducing test authoring time by 70 to 80%.

6. Self-Healing: Tests That Fix Themselves

The third pillar attacks the most expensive problem in test automation: maintenance. Traditional test suites degrade over time. Every UI change, every component refactor, every design update risks breaking existing tests. Teams that do not invest heavily in maintenance end up with suites where 30% or more of the tests are disabled, skipped, or ignored.

Self-healing testing works differently. Instead of matching elements by a single selector, the framework understands the intent of each test step. When a selector breaks, the framework uses multiple strategies to relocate the target element.

Intent-based locators understand what the test is trying to interact with, not just how to find it in the DOM. If a test step says “click the submit button,” the framework looks for any element that serves as the submission trigger, regardless of whether its class name, ID, or position has changed.

Multi-attribute analysis examines multiple properties simultaneously: the element's role, accessible name, text content, relative position, visual appearance, and surrounding context. This multi-signal approach is far more robust than relying on any single attribute.
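As an illustration, multi-attribute matching can be sketched as a weighted scoring function over candidate elements. The signals, weights, and threshold below are invented for the example, not Assrt's actual algorithm:

```typescript
// Score candidates on several signals at once; pick the best match
// only if it clears a confidence threshold. Weights are illustrative.
interface ElementInfo {
  role: string; // ARIA role, e.g. 'button'
  name: string; // accessible name
  text: string; // visible text content
}

function matchScore(target: ElementInfo, c: ElementInfo): number {
  let score = 0;
  if (c.role === target.role) score += 0.4; // same role
  if (c.name === target.name) score += 0.4; // same accessible name
  if (c.text === target.text) score += 0.2; // same visible text
  return score;
}

function bestMatch(
  target: ElementInfo,
  candidates: ElementInfo[],
  threshold = 0.6,
): { element: ElementInfo; score: number } | null {
  let best: { element: ElementInfo; score: number } | null = null;
  for (const c of candidates) {
    const score = matchScore(target, c);
    if (!best || score > best.score) best = { element: c, score };
  }
  return best && best.score >= threshold ? best : null;
}

// The old '#submit-btn' is gone; relocate it by role + name + text.
const target = { role: 'button', name: 'Save', text: 'Save' };
const candidates = [
  { role: 'link', name: 'Cancel', text: 'Cancel' },
  { role: 'button', name: 'Save', text: 'Save' },
];
console.log(bestMatch(target, candidates)?.score); // 1
```

The key property is that no single attribute is load-bearing: a renamed class or moved container changes none of these signals, so the match survives.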

Automatic PR fixes close the loop. When the self-healing system detects and resolves a broken selector, it does not just fix the test silently. It creates a pull request with the updated test code, explains what changed and why, and lets your team review the fix before merging. This maintains full visibility and control while eliminating the manual work.

Self-Healing in Action

// Original test (written 3 months ago):
await page.click('#submit-btn');

// After redesign, #submit-btn no longer exists.
// The button is now: <button class="primary-action">Save</button>

// Assrt self-healing detects the break and opens a PR:
//
// PR #247: "fix(tests): update submit button selector"
// - Old: page.click('#submit-btn')
// + New: page.getByRole('button', { name: 'Save' })
//
// Reason: Element #submit-btn not found.
// Matched by role + accessible name with 98% confidence.

The impact on maintenance costs is dramatic. Teams using self-healing testing report 80 to 90% reductions in time spent on test maintenance. That time is redirected to writing new tests, improving coverage, and building features. The test suite becomes an asset that appreciates over time rather than a liability that depreciates.

7. The Open Source Advantage

The AI testing landscape is crowded with proprietary platforms that promise to solve your testing problems in exchange for a substantial monthly fee and complete dependence on their infrastructure. Assrt takes a fundamentally different approach.

Code ownership. Every test generated by Assrt is a standard Playwright TypeScript file that lives in your repository. You can read it, modify it, run it without Assrt, and share it with anyone. If you decide to stop using Assrt tomorrow, your tests continue to work exactly as they did before. There is no proprietary runtime, no cloud dependency, and no vendor lock-in.

Transparency. With open source, you can inspect every line of code that generates, modifies, or executes your tests. There is no black box. If the AI makes a mistake, you can see exactly what happened and why. This transparency is essential for teams that need to trust their testing infrastructure.

Community-driven development. Assrt benefits from contributions by engineers around the world who face the same testing challenges you do. Bug fixes, new features, and improvements are driven by real-world usage, not by a product roadmap designed to maximize subscription revenue.

Capability            Assrt (Open Source)         Proprietary Platforms
Cost                  Free                        $500 to $5,000+/month
Test ownership        Your repo, your code        Locked in platform
Vendor lock-in        None                        High
Runs without vendor   Yes (standard Playwright)   No
Self-healing          Yes                         Varies
Auto-discovery        Yes                         Some
Source available      Full source on GitHub       Closed source

The open source model also means that Assrt works with your existing infrastructure. Run it locally, in GitHub Actions, in GitLab CI, or in any environment that supports Node.js. There is no proprietary agent to install, no cloud dashboard you must log into, and no network traffic leaving your organization unless you choose to send it.

8. What's Next: Agentic Testing

The evolution of E2E testing is far from over. The next frontier is agentic testing: autonomous AI agents that go beyond scripted scenarios to explore, reason about, and test applications the way a skilled human QA engineer would.

Autonomous test agents represent a paradigm shift from “execute these steps” to “verify this behavior.” Instead of scripting a precise sequence of clicks and assertions, you give the agent a goal: “Verify that a new user can complete the onboarding flow and reach the dashboard.” The agent figures out the steps, adapts to UI variations, and verifies the outcome.
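The control flow of such an agent can be sketched as an observe, act, check loop. Everything below is a conceptual toy: the app model is a hand-written lookup table, where a real agent would observe a live page and choose actions with a language model.

```typescript
// Goal-based agent loop: keep acting until the goal predicate holds
// or the step budget runs out, then report what was done.
type State = string;
type Action = { name: string; next: State };

// Hypothetical app model: which actions each state offers.
const app: Record<State, Action[]> = {
  login: [{ name: 'submit credentials', next: 'dashboard' }],
  dashboard: [],
};

function runAgent(
  start: State,
  goal: (s: State) => boolean,
  maxSteps = 10,
): { reached: boolean; steps: string[] } {
  let state = start;
  const steps: string[] = [];
  for (let i = 0; i < maxSteps && !goal(state); i++) {
    const action = app[state]?.[0]; // a real agent would reason here
    if (!action) break;             // dead end: nothing left to try
    steps.push(action.name);
    state = action.next;
  }
  return { reached: goal(state), steps };
}

console.log(runAgent('login', (s) => s === 'dashboard'));
// reached: true, steps: ['submit credentials']
```

The contrast with scripted tests is the return value: the agent reports whether the goal was reached and how, rather than failing on the first step that does not match a script.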

Exploratory AI testing takes this further. Agents autonomously explore the application looking for bugs, rather than verifying known scenarios. They try unusual input combinations, navigate unexpected paths, test edge cases that no human thought to specify, and report anomalies. This is the digital equivalent of hiring an infinitely patient exploratory tester who works 24 hours a day.

Predictive test selection uses historical data and code change analysis to determine which tests are most likely to fail for a given code change. Instead of running the entire suite on every commit, the system runs only the tests that matter, reducing CI time by 60 to 80% while maintaining the same defect detection rate.
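A simple heuristic version of predictive selection can be sketched as follows, assuming a table of historical co-failure rates between source files and tests. The data, threshold, and scoring rule are invented for illustration; production systems use richer signals.

```typescript
// Rank tests by how often they historically failed when each changed
// file was touched; run only tests whose risk clears a threshold.
type FailureHistory = Record<string, Record<string, number>>;
// file -> test name -> historical co-failure rate (0..1)

function selectTests(
  history: FailureHistory,
  changedFiles: string[],
  threshold = 0.3,
): string[] {
  const risk = new Map<string, number>();
  for (const file of changedFiles) {
    for (const [test, rate] of Object.entries(history[file] ?? {})) {
      // A test touched by several changed files keeps its highest risk.
      risk.set(test, Math.max(risk.get(test) ?? 0, rate));
    }
  }
  return [...risk.entries()]
    .filter(([, r]) => r >= threshold) // skip low-risk tests
    .sort((a, b) => b[1] - a[1])       // riskiest first
    .map(([test]) => test);
}

// Hypothetical history for illustration.
const history: FailureHistory = {
  'src/auth.ts': { 'login.spec.ts': 0.8, 'dashboard.spec.ts': 0.2 },
  'src/billing.ts': { 'checkout.spec.ts': 0.6 },
};
console.log(selectTests(history, ['src/auth.ts'])); // ['login.spec.ts']
```

The trade-off is explicit in the threshold: lower it and CI runs more tests with fewer missed defects; raise it and builds get faster at some risk of escapes.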

The 2026 outlook is promising. As language models become more capable and inference costs continue to drop, the economics of AI-powered testing improve rapidly. We are approaching a point where the cost of AI-generated and AI-maintained tests approaches zero, while the quality and coverage surpass what human-only teams can achieve.

The Progression: Selenium to Agentic Testing

// 2012: Selenium (Java)
WebDriver driver = new ChromeDriver();
driver.get("https://app.com/login");
driver.findElement(By.id("email")).sendKeys("user@test.com");
driver.findElement(By.id("pass")).sendKeys("password");
driver.findElement(By.xpath("//button[@type='submit']")).click();
Thread.sleep(3000);
assert driver.getCurrentUrl().contains("/dashboard");

// 2018: Cypress (JavaScript)
cy.visit('/login')
cy.get('#email').type('user@test.com')
cy.get('#pass').type('password')
cy.get('button[type=submit]').click()
cy.url().should('include', '/dashboard')

// 2021: Playwright (TypeScript)
await page.goto('/login');
await page.getByLabel('Email').fill('user@test.com');
await page.getByLabel('Password').fill('password');
await page.getByRole('button', { name: 'Sign in' }).click();
await expect(page).toHaveURL(/dashboard/);

// 2024: Assrt (Natural Language)
assrt generate "User logs in and reaches dashboard"
// Outputs production-ready Playwright code

// 2026: Agentic (Goal-based)
assrt agent verify "Authentication flow works correctly"
// Agent explores, adapts, and reports autonomously

The trajectory is clear: each generation of testing technology reduces the human effort required while increasing the coverage and reliability of the test suite. The teams that adopt these tools early gain a compounding advantage: faster releases, fewer production bugs, and engineering time redirected from maintenance to innovation. The future of E2E testing is not about writing more tests. It is about building systems that test themselves.


Ready to automate your testing?

Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.

$ npm install @assrt/sdk