From 6 Hours to 60 Minutes: How to Speed Up a Slow Test Suite
A slow test suite kills developer productivity, delays releases, and trains your team to ignore CI results. Here is how to systematically cut execution time without sacrificing coverage.
1. Classify Your Tests Before Optimizing
The most common mistake teams make when facing a slow test suite is jumping straight to parallelization. Before you throw hardware at the problem, you need to understand what you are actually running. A team at a mid-stage SaaS company recently audited their 6-hour suite and found that 40% of their end-to-end tests were verifying logic that could be covered by unit or integration tests. The remaining 60% included redundant scenarios that tested the same user flow through slightly different entry points.
Start by tagging every test with its type: unit, integration, API, or end-to-end. Then measure execution time per tag. You will almost certainly find that a small percentage of your tests account for the majority of your runtime. In most codebases, E2E tests make up 10 to 15 percent of total test count but consume 70 to 80 percent of execution time. This classification step alone reveals the highest-leverage optimization targets.
Once you have this map, you can make informed decisions. Some E2E tests should be deleted because they duplicate coverage. Others should be rewritten at a lower level. And the ones that genuinely need a browser should be optimized for speed with proper parallelization and resource management.
2. Move UI Validations Down to the API Level
Every test that launches a browser to verify business logic is paying a massive overhead tax. Browser startup, page rendering, DOM interaction, and screenshot capture all add seconds per test step. When you multiply that across hundreds of tests, the cost becomes enormous. A single API call that verifies the same backend behavior typically runs in 50 to 200 milliseconds, compared to 5 to 30 seconds for the equivalent browser-based test.
Consider a checkout flow test. The E2E version navigates to the product page, adds an item to cart, fills in shipping details, enters payment information, and confirms the order. That is 15 to 20 seconds minimum. The API-level equivalent sends a POST request with the cart payload and asserts on the response body and status code. It runs in under a second and covers the same business logic.
The rule of thumb: if a test verifies data transformation, validation rules, or business logic, it belongs at the API or integration level. Reserve E2E browser tests for scenarios where the UI interaction itself is what you are testing, such as drag-and-drop, complex form wizards, or visual rendering.
3. Parallelize with Isolation, Not Just More Machines
Parallelization is the most impactful single change you can make, but it only works when your tests are properly isolated. If tests share database state, file system resources, or authentication sessions, running them in parallel will introduce flakiness that erodes the time savings. Before scaling horizontally, ensure each test creates and tears down its own data, uses unique user accounts, and operates on independent browser contexts.
Playwright supports sharding natively, splitting your test suite across multiple workers or CI machines with a single configuration change. Combined with a CI provider that supports matrix builds (GitHub Actions, GitLab CI, or CircleCI), you can distribute 200 tests across 10 machines and reduce wall-clock time from 60 minutes to 6. The key metric to track is not just total execution time but the standard deviation between shard durations. If one shard takes 8 minutes while others finish in 4, you need to rebalance test distribution.
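As a minimal GitHub Actions sketch of this setup (the workflow and job names are illustrative; the `--shard` flag is Playwright's built-in sharding option), a four-way split looks like:

```yaml
# .github/workflows/e2e.yml (names illustrative)
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test --shard=${{ matrix.shard }}/4
```

Scaling to ten machines is a matter of widening the matrix and the shard denominator; the balance between shard durations is what you then tune.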
Tools like Assrt can help here by generating tests that are isolated by default, using independent browser contexts and unique test data per scenario. This makes parallelization straightforward because there are no shared state conflicts to debug.
4. Smart Test Selection Per Pull Request
Running your entire test suite on every pull request is wasteful. If a developer changes a CSS file in the settings page, there is no reason to run the checkout flow tests. Smart test selection (sometimes called test impact analysis) maps code changes to the tests that exercise those code paths and runs only the relevant subset.
There are several approaches to implementing this. The simplest is directory-based filtering: changes in src/features/billing/ trigger tests tagged with @billing. More sophisticated approaches use code coverage data from previous runs to build a dependency graph between source files and test files. Tools like Launchable and Develocity offer this capability, typically reducing per-PR test execution by 60 to 80 percent.
The tradeoff is risk. Smart selection can miss tests that should have run, especially when changes affect shared utilities or configuration files. Mitigate this by always running the full suite on merges to main and by including a "blast radius" rule that triggers the complete suite when changes touch core modules like authentication, database schemas, or API middleware.
5. Reserve Full Suite Runs for Nightly Builds
The combination of API-level testing, parallelization, and smart selection should bring your per-PR feedback loop under 15 minutes. But you still need the confidence that comes from running everything. The solution is to schedule full suite runs as nightly builds. These run outside of developer working hours, use all available CI capacity, and produce a comprehensive report that the team reviews each morning.
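In GitHub Actions terms, scheduling the full run is one `schedule` trigger (a sketch; the cron time is an illustrative choice, and cron schedules run in UTC):

```yaml
# Nightly full-suite run (time illustrative; cron is evaluated in UTC)
on:
  schedule:
    - cron: "0 3 * * *"   # 03:00 UTC, outside most working hours
jobs:
  full-suite:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test   # no filtering: run everything
```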
Nightly builds serve as a safety net. They catch the edge cases that smart selection might miss, verify cross-feature interactions, and provide a historical trend of test health. When a nightly build fails, the team investigates before the next development cycle begins. This is far more efficient than blocking every pull request with a 6-hour pipeline.
Real numbers from teams that have adopted this approach: per-PR feedback drops from hours to 8 to 12 minutes, developer satisfaction with CI increases measurably, and the total number of deployments per week rises by 30 to 50 percent. The nightly suite catches roughly 2 to 5 percent of issues that would have been missed by smart selection alone, which is an acceptable tradeoff for the massive improvement in daily velocity.