Testing Strategy
Smart Test Selection: Why Fewer Tests Can Mean Better Coverage
Most test suites are bloated. A small percentage of tests catch the vast majority of real bugs. Here is how to find which tests actually matter and get leadership on board with deleting the rest.
“Analysis of 12 enterprise test suites found that roughly 5% of tests detected 95% of real production incidents over a 12-month period.”
Google Testing Blog, Launchable case studies
1. The Problem with Bloated Test Suites
Every mature codebase has the same problem: a test suite that grew organically over years, accumulating tests that no one dares to touch. Some were written for features that no longer exist. Others test the same behavior three different ways because three different engineers wrote them independently. A significant portion is flaky, passing most of the time but occasionally failing for reasons unrelated to actual code defects.
The cost of this bloat is real and measurable. CI pipelines that should take 10 minutes take 45. Engineers start ignoring test failures because they assume flakiness. Merge queues back up, slowing the entire team. Infrastructure costs balloon as you provision more parallel runners to maintain acceptable feedback times. And paradoxically, the team’s confidence in the test suite decreases as its size increases, because nobody understands what all those tests actually cover.
The counterintuitive insight is that running fewer, better-selected tests can actually improve your defect detection rate. A focused suite of high-signal tests that runs quickly and reliably provides more value than an enormous suite that takes an hour and produces noise. The challenge is figuring out which tests belong in that focused set.
2. Changeset Analysis for Targeted Test Runs
The core idea behind changeset analysis is simple: if a pull request only modifies the checkout flow, you do not need to run tests for the user profile page. By analyzing which files changed and mapping those changes to the tests that exercise the affected code paths, you can dramatically reduce the number of tests that need to run on each commit.
There are several approaches to building this mapping. Static analysis tools can trace import graphs and dependency chains to determine which tests depend on which source files. Runtime instrumentation (code coverage collected during test execution) can build precise maps of which tests exercise which lines of code. Hybrid approaches combine both, using static analysis for speed and runtime data for accuracy.
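As a rough sketch of the static approach, a selection script can invert the dependency graph and walk outward from the changed files, collecting every test it reaches. The file names and graph below are hypothetical; in practice the graph would come from an import-analysis tool or coverage data.

```python
from collections import deque

def select_tests(dep_graph: dict[str, set[str]],
                 test_files: set[str],
                 changed: set[str]) -> set[str]:
    """Return every test that transitively depends on a changed file.

    dep_graph maps a file to the files it imports. We invert the graph
    (who imports me?) and breadth-first walk from the changed files,
    collecting any test file reached along the way.
    """
    importers: dict[str, set[str]] = {}
    for src, deps in dep_graph.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(src)

    selected: set[str] = set()
    queue = deque(changed)
    seen = set(changed)
    while queue:
        f = queue.popleft()
        if f in test_files:
            selected.add(f)
        for importer in importers.get(f, ()):
            if importer not in seen:
                seen.add(importer)
                queue.append(importer)
    return selected

# Hypothetical layout: checkout tests import checkout code, which
# imports the payments module; profile tests are unrelated.
graph = {
    "tests/test_checkout.py": {"src/checkout.py"},
    "tests/test_profile.py": {"src/profile.py"},
    "src/checkout.py": {"src/payments.py"},
}
tests = {"tests/test_checkout.py", "tests/test_profile.py"}
print(select_tests(graph, tests, {"src/payments.py"}))
# Only the checkout test is selected; the profile test is skipped.
```

A change to `src/payments.py` selects only `tests/test_checkout.py`, because the walk reaches it through `src/checkout.py` and never touches the profile subtree.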
Tools like Launchable, Buildkite Test Analytics, and Jest’s built-in changed-file detection all implement variations of this approach. For Playwright-based E2E suites, the mapping is trickier because tests interact with the application through the browser rather than importing source files directly. Some teams solve this by tagging tests with the features they cover, then using a mapping file that connects source directories to feature tags.
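The directory-to-tag mapping for an E2E suite can be as simple as a dictionary checked against changed paths. The directories and tag names here are illustrative, not a prescribed convention.

```python
# Hypothetical mapping from source directories to E2E feature tags.
DIR_TO_TAGS = {
    "src/checkout/": {"@checkout", "@payments"},
    "src/profile/": {"@profile"},
}

def tags_for_changes(changed_files: list[str]) -> set[str]:
    """Collect the feature tags whose source directories were touched."""
    tags: set[str] = set()
    for path in changed_files:
        for prefix, dir_tags in DIR_TO_TAGS.items():
            if path.startswith(prefix):
                tags |= dir_tags
    return tags

tags = tags_for_changes(["src/checkout/cart.ts", "README.md"])
print(sorted(tags))  # ['@checkout', '@payments']
```

The resulting tag set can then drive a grep-style filter on the E2E runner, for example `npx playwright test --grep "@checkout|@payments"`.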
The results can be dramatic. Teams that implement changeset-based test selection typically see 60 to 80% reductions in CI test execution time for average pull requests. The full suite still runs on merges to main or on a nightly schedule, but the per-commit feedback loop becomes fast enough that engineers actually wait for results before merging.
3. Impact-Based Test Prioritization
Not all tests carry equal weight. A test that verifies your payment processing flow is more important than one that checks whether a tooltip appears on hover. Impact-based prioritization assigns a risk score to each test based on factors like the business criticality of the feature it covers, the historical frequency of bugs in that area, and the blast radius of a failure.
Building an impact model requires combining several data sources. Start with production analytics: which features do users interact with most? Layer in incident history: which parts of the codebase have produced the most production bugs? Add code churn data: files that change frequently are more likely to introduce regressions. Finally, factor in revenue impact: a bug in the signup funnel has different consequences than a bug in an admin settings panel.
Once you have a risk model, you can use it to order test execution. High-impact tests run first, providing early signal on the areas that matter most. If you need to fail fast (say, in a pre-merge check with a time budget), you can run only the top-priority tier and defer lower-priority tests to a post-merge or nightly run.
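A minimal version of this risk model combines the factors above into a weighted score and splits the ordered suite at a time budget. The weights and test names below are illustrative assumptions; real weights should be calibrated against your own incident history.

```python
from dataclasses import dataclass

@dataclass
class TestStats:
    name: str
    criticality: float    # 0-1: business importance of the covered feature
    incident_rate: float  # 0-1: normalized production incidents in this area
    churn: float          # 0-1: normalized change frequency of covered files

def risk_score(t: TestStats) -> float:
    """Weighted risk score; these weights are illustrative, not prescriptive."""
    return 0.5 * t.criticality + 0.3 * t.incident_rate + 0.2 * t.churn

def tiered(tests: list[TestStats], budget: int):
    """Order by descending risk, then split into a pre-merge tier
    (run now) and a deferred tier (post-merge or nightly)."""
    ranked = sorted(tests, key=risk_score, reverse=True)
    return ranked[:budget], ranked[budget:]

suite = [
    TestStats("test_payment_capture", 1.0, 0.8, 0.6),
    TestStats("test_tooltip_hover",   0.1, 0.0, 0.1),
    TestStats("test_signup_funnel",   0.9, 0.5, 0.7),
]
now, later = tiered(suite, budget=2)
print([t.name for t in now])  # payment and signup run first
```

With a pre-merge budget of two, the payment and signup tests run immediately and the tooltip test is deferred, which matches the intuition about blast radius.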
Tools like Assrt can help here by automatically discovering and generating tests for critical user flows based on actual application structure. Combined with manual risk scoring for business logic, this gives you a layered approach where AI handles coverage breadth and human judgment handles prioritization depth.
4. Getting Leadership Buy-In to Delete Tests
The hardest part of smart test selection is often not the technical implementation. It is convincing stakeholders that deleting tests is a good idea. “More tests equals more quality” is deeply ingrained in engineering culture, and suggesting the opposite feels risky.
The most effective approach is to lead with data. Start by measuring the current state: how long does the full suite take? What is the flakiness rate? How many tests have not caught a real bug in the past year? How much are you spending on CI infrastructure? These numbers make the cost of bloat concrete and visible.
Next, run a pilot. Pick one team or one service, implement changeset-based selection for two weeks, and measure the results. Track CI time, developer satisfaction (via a quick survey), merge frequency, and most importantly, escaped defects. If the defect rate stays flat while CI time drops by 70%, you have a compelling case to present to engineering leadership.
Frame the conversation around risk management rather than test deletion. You are not removing safety nets. You are replacing a blunt instrument (run everything, every time) with a precision tool (run what matters, when it matters). The full suite still exists and still runs; it just does not gate every single merge.
For tests that genuinely should be retired, propose a quarantine process rather than immediate deletion. Move low-value tests to a separate suite that runs weekly instead of per-commit. If nothing breaks after a quarter, delete them permanently. This gives nervous stakeholders a safety valve while still delivering the efficiency gains.
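The quarantine ledger can be a plain JSON file of test names and dates, plus a helper that flags entries old enough to delete. This is a sketch of one possible bookkeeping scheme; the file name and 90-day window are assumptions.

```python
import datetime
import json

def quarantine(test_name: str, path: str = "quarantine.json") -> None:
    """Record a low-value test with the date it was quarantined, so a
    follow-up job can find entries older than one quarter."""
    try:
        with open(path) as f:
            entries = json.load(f)
    except FileNotFoundError:
        entries = {}
    entries[test_name] = datetime.date.today().isoformat()
    with open(path, "w") as f:
        json.dump(entries, f, indent=2)

def expired(entries: dict[str, str], days: int = 90) -> list[str]:
    """Tests quarantined longer than `days` ago are deletion candidates."""
    cutoff = datetime.date.today() - datetime.timedelta(days=days)
    return [name for name, d in entries.items()
            if datetime.date.fromisoformat(d) < cutoff]
```

A weekly CI job runs the quarantined suite; a quarterly job calls `expired()` and opens a deletion PR for anything that survived the waiting period without breaking.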
5. Implementing Smart Selection in Practice
A practical implementation plan starts with instrumentation. Collect coverage data during test runs and store it somewhere queryable. This does not need to be fancy; a JSON file in your CI artifacts that maps test names to touched files is a viable starting point.
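The starting artifact really can be that simple: a JSON map from test identifiers to the files they touched, written at the end of a run and uploaded alongside the other CI artifacts. The test and file names below are hypothetical.

```python
import json

# Hypothetical per-run coverage map: test name -> source files it touched.
coverage_map = {
    "tests/test_checkout.py::test_happy_path": [
        "src/checkout.py", "src/payments.py",
    ],
    "tests/test_profile.py::test_avatar_upload": [
        "src/profile.py",
    ],
}

with open("coverage-map.json", "w") as f:
    json.dump(coverage_map, f, indent=2, sort_keys=True)

# Upload coverage-map.json as a CI artifact so the selection step
# on the next commit can download and query it.
```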
Build a selection script that takes a list of changed files (from your git diff) and outputs the set of tests that need to run. Start conservatively by including any test that has ever touched any changed file. Over time, refine the mapping to be more precise, using recent coverage data rather than historical maximums.
Integrate the selection script into your CI pipeline as an optional step. Run both the selected subset and the full suite in parallel for a few weeks, comparing results. This shadow mode lets you validate that the selection algorithm is not missing important tests before you trust it to gate merges.
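The shadow-mode comparison reduces to one question per run: did the full suite catch a failure the selection algorithm would have skipped? A small report function along these lines (names are illustrative) is enough to track that over a few weeks.

```python
def shadow_report(full_failures: set[str], selected: set[str]) -> dict:
    """In shadow mode both runs happen; a 'missed' failure is one the
    full suite caught but the selection algorithm did not schedule."""
    missed = full_failures - selected
    return {
        "missed_failures": sorted(missed),
        "safe_to_gate": not missed,
    }

report = shadow_report(
    full_failures={"tests/test_checkout.py"},
    selected={"tests/test_checkout.py", "tests/test_cart.py"},
)
print(report["safe_to_gate"])  # True: every real failure was selected
```

Once `missed_failures` stays empty across a representative stretch of commits, the selected subset can start gating merges while the full suite moves to post-merge.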
For E2E test suites built on Playwright, consider using test tags or annotations to categorize tests by feature area. This makes the mapping from changed files to relevant tests more explicit and easier to maintain. Some teams also use tools like Assrt to periodically regenerate their E2E suite, ensuring that test coverage stays aligned with the current state of the application rather than drifting as features evolve.
The goal is not perfection on day one. It is building a system that gets smarter over time, running fewer tests while catching the same (or more) real defects. Teams that commit to this approach consistently report that their CI becomes faster, their engineers become happier, and their defect escape rate stays flat or improves.