AI in Testing

How AI Is Automating the Repetitive Parts of Software Testing

Across roles, AI is absorbing the roughly 30% of work that is repetitive. In QA, that means test script generation, selector maintenance, and regression execution are shifting from manual effort to automated pipelines. Here is what that looks like in practice.


1. The 30% Rule: Which Testing Tasks Are Repetitive

A recurring theme in discussions about AI in the workplace is that roughly 30% of tasks in any given role are repetitive enough to automate. In software testing, this maps cleanly to specific activities: writing boilerplate test scripts for standard user flows, updating CSS selectors after UI changes, re-running the same regression suite after every deploy, and triaging test results that are usually green.

These tasks share a common trait. They require knowledge of the application's structure but not deep judgment about what should be tested or why. A tester who spends two hours updating selectors after a frontend refactor is doing work that follows predictable rules: find the element that moved, determine its new location, update the locator. That is exactly the kind of pattern recognition AI handles well.

The remaining 70% is where human testers add irreplaceable value: exploratory testing, risk assessment, understanding user intent, designing test strategies for new features, and recognizing when a passing test is actually testing the wrong thing. These tasks require contextual reasoning that current AI models cannot reliably perform. The goal is not to replace testers but to redirect their time toward high-judgment work.

2. AI for Test Script Generation

The most visible application of AI in testing today is automatic test script generation. Several approaches exist. LLM-based code generation tools (GitHub Copilot, Cursor, Codeium) can write Playwright or Cypress tests from natural language descriptions. You describe the flow you want to test, and the model generates executable test code.
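To make the idea concrete, here is a minimal sketch of the translation step such tools perform: mapping structured plain-English steps onto Playwright code. The `generatePlaywrightTest` function and its `Step` type are illustrative inventions, not any real tool's API; real LLM-based generators infer both the steps and the selectors from free-form prose and the application itself.

```typescript
// Sketch: turn structured plain-English steps into a Playwright test file.
// The step-to-code mapping is a toy heuristic, not a real LLM pipeline.

type Step =
  | { action: "goto"; url: string }
  | { action: "fill"; label: string; value: string }
  | { action: "click"; label: string }
  | { action: "expectVisible"; text: string };

function generatePlaywrightTest(name: string, steps: Step[]): string {
  const body = steps.map((s) => {
    switch (s.action) {
      case "goto":
        return `  await page.goto(${JSON.stringify(s.url)});`;
      case "fill":
        return `  await page.getByLabel(${JSON.stringify(s.label)}).fill(${JSON.stringify(s.value)});`;
      case "click":
        return `  await page.getByRole("button", { name: ${JSON.stringify(s.label)} }).click();`;
      case "expectVisible":
        return `  await expect(page.getByText(${JSON.stringify(s.text)})).toBeVisible();`;
    }
  });
  return [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `test(${JSON.stringify(name)}, async ({ page }) => {`,
    ...body,
    `});`,
  ].join("\n");
}

// "User can log in" described as steps, emitted as a runnable spec file.
const loginTest = generatePlaywrightTest("user can log in", [
  { action: "goto", url: "https://example.com/login" },
  { action: "fill", label: "Email", value: "qa@example.com" },
  { action: "fill", label: "Password", value: "hunter2" },
  { action: "click", label: "Sign in" },
  { action: "expectVisible", text: "Dashboard" },
]);
```

The output is a standard Playwright spec file, which is the key property shared by all of these approaches: the generated code is plain test code you can read, edit, and commit.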

A different approach is crawl-based test discovery, where a tool navigates your running application, identifies interactive elements and user paths, and generates test code based on what it finds. This has the advantage of testing what actually exists rather than what someone remembered to describe. Tools like Assrt use this approach, crawling your app and producing standard Playwright test files that you can inspect, modify, and commit to your repository.

Record-and-playback tools like Playwright Codegen and Cypress Studio offer a middle ground. You perform actions in the browser, and the tool generates test code from your interactions. This works well for straightforward flows but tends to produce brittle tests that break when the UI changes, which brings us to the next challenge.


3. Automated Selector Maintenance and Self-Healing

Selector maintenance is the single largest time sink in end-to-end test automation. Every time a developer changes a class name, restructures a component, or updates a design system, tests break, not because the application is broken but because the test is looking for an element that moved. Teams with large test suites commonly report that 40% to 60% of their test maintenance effort goes to fixing selectors.

AI-powered self-healing selectors address this by using multiple identification strategies simultaneously. Instead of relying on a single CSS selector, a self-healing system considers the element's text content, ARIA attributes, position relative to other elements, visual appearance, and DOM structure. When the primary selector fails, the system tries alternative strategies to find the same logical element.

Several tools implement this pattern, including Healenium (an open-source Selenium extension), Testim, and Assrt. The practical result is that routine UI refactors no longer cause cascading test failures. Tests continue to pass as long as the element still exists on the page, regardless of how its technical attributes changed. This eliminates a significant portion of that repetitive 30%.

4. Regression Suites That Run Themselves

Modern CI/CD pipelines already run regression suites automatically on every push, but the suite itself still requires manual curation. Someone has to decide which tests to include, maintain the test data, handle environment configuration, and review results. AI is starting to automate parts of this curation process.

Smart test selection tools (like Launchable and Buildkite Test Analytics) use historical data to predict which tests are most likely to fail for a given code change. Instead of running 2,000 tests on every PR, they run the 200 most relevant ones, which can cut pipeline time by 80% or more with little loss in defect detection.
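A toy version of this prediction is just counting: rank tests by how often they failed in past runs that touched the same files, then run only the top slice. Real tools train models on far richer signals; the `selectTests` function below is a hypothetical illustration of the idea:

```typescript
// Sketch of history-based test selection: score tests by co-failure
// with the files changed in this PR, then keep only the top `budget`.

interface HistoricalRun {
  changedFiles: string[];
  failedTests: string[];
}

function selectTests(
  changedFiles: string[],
  allTests: string[],
  history: HistoricalRun[],
  budget: number,
): string[] {
  const score = new Map<string, number>(allTests.map((t) => [t, 0]));
  for (const run of history) {
    // Weight each past failure by how many of its changed files
    // overlap with the current change.
    const overlap = run.changedFiles.filter((f) => changedFiles.includes(f)).length;
    if (overlap === 0) continue;
    for (const t of run.failedTests) {
      score.set(t, (score.get(t) ?? 0) + overlap);
    }
  }
  return [...score.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, budget)
    .map(([t]) => t);
}

const history: HistoricalRun[] = [
  { changedFiles: ["checkout.ts"], failedTests: ["checkout.spec.ts"] },
  { changedFiles: ["checkout.ts", "cart.ts"], failedTests: ["cart.spec.ts"] },
  { changedFiles: ["auth.ts"], failedTests: ["login.spec.ts"] },
];

// A PR touching checkout.ts keeps the checkout and cart specs,
// and skips the unrelated login spec.
const picked = selectTests(
  ["checkout.ts"],
  ["checkout.spec.ts", "cart.spec.ts", "login.spec.ts"],
  history,
  2,
);
```

The same ranking can also drive ordering rather than skipping: run the likeliest failures first so the pipeline fails fast.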

Automatic test result triage is another area where AI adds value. When a test fails, an AI system can classify the failure as a genuine bug, a flaky test, an environment issue, or a test that needs updating. This classification, which used to require a human to investigate each failure, can now happen automatically for the majority of cases. The human reviewer only needs to look at the genuinely ambiguous ones.
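The classification step can be sketched with plain heuristics standing in for the AI model; the categories mirror the ones above, while the `FailureReport` fields, thresholds, and error patterns are illustrative assumptions:

```typescript
// Sketch of automatic failure triage: simple heuristics standing in
// for an AI classifier. Thresholds and regexes are arbitrary demo values.

type Verdict = "flaky" | "environment" | "needs-update" | "likely-bug";

interface FailureReport {
  errorMessage: string;
  passRateLast20: number; // historical pass rate of this test (0..1)
  selectorNotFound: boolean;
}

function triage(f: FailureReport): Verdict {
  // Network/infra errors point at the environment, not the product.
  if (/ECONNREFUSED|timeout|502|503/i.test(f.errorMessage)) return "environment";
  // A test that almost always passes and failed without a structural
  // signal is probably flaky.
  if (f.passRateLast20 >= 0.9 && !f.selectorNotFound) return "flaky";
  // A missing selector after a UI change means the test needs updating.
  if (f.selectorNotFound) return "needs-update";
  return "likely-bug";
}

const infraVerdict = triage({
  errorMessage: "connect ECONNREFUSED 127.0.0.1:3000",
  passRateLast20: 1.0,
  selectorNotFound: false,
});
const bugVerdict = triage({
  errorMessage: "expected 'Saved' but received 'Error'",
  passRateLast20: 0.4,
  selectorNotFound: false,
});
```

Only the `likely-bug` bucket (and anything the classifier is unsure about) needs to land on a human's desk; the rest can be auto-retried, auto-quarantined, or auto-filed for test maintenance.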

5. Where Humans Still Matter Most

The most important thing AI cannot automate in testing is deciding what matters. A test suite with 100% code coverage can still miss critical bugs if the tests are checking the wrong things. Choosing what to test, how to prioritize risk, and when to invest in deeper testing requires understanding the business context, the user base, and the failure modes that actually cost money or trust.

Exploratory testing is another irreducibly human activity. When a skilled tester uses an application with the intent of finding problems, they draw on intuition, domain knowledge, and a mental model of what "feels wrong." They notice that a loading state flickers oddly, that a form field allows negative quantities, or that a race condition is theoretically possible when two users edit the same record. AI can eventually be taught to check for known patterns, but it cannot replicate the creative skepticism of a good tester.

Communication is the third pillar. The value of a test failure depends entirely on how well it is communicated to the developer who needs to fix it. A tester who can explain why a bug matters, how it affects users, and how to reproduce it reliably is far more valuable than one who simply files "button does not work" tickets. AI can generate bug report templates, but contextualizing failures for a specific team remains a human skill.

The path forward is clear: let AI handle the mechanical repetition (writing scripts, fixing selectors, running regressions, triaging results) and invest human time in strategy, exploration, and communication. Teams that make this shift effectively will find that their testers become more impactful, not less relevant.

Ready to automate your testing?

Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.

$ npm install @assrt/sdk