Why AI Is Making Software Testing the Biggest Bottleneck in Engineering
AI coding tools have supercharged code generation. But verification has not kept pace. The result is a growing backlog of untested code, rising defect rates, and QA teams that cannot scale fast enough to match the output of AI-assisted developers.
“Engineering teams report that testing and QA now consume 40% or more of total development cycle time, up from 25% before widespread AI coding tool adoption.”
State of Testing Report, 2025
1. More Code, Same QA Capacity
The premise of the “AI is killing software engineering” debate misses the real story. AI is not reducing the need for engineers. It is shifting where the constraint sits. Before AI coding assistants became mainstream, the bottleneck was typically writing code: translating requirements into working implementations. Developers spent most of their time typing, debugging syntax, and wiring together boilerplate. Testing was a secondary concern, often squeezed into whatever time remained before a deadline.
Now the equation has flipped. Tools like Cursor, Claude Code, and GitHub Copilot can produce feature implementations in minutes that used to take hours or days. The generation side of the pipeline has been turbocharged. But the verification side has not changed. The same three QA engineers who struggled to keep up when developers shipped ten pull requests per week are now drowning under fifty. The same manual test plans that barely covered critical paths before are now hopelessly inadequate for the volume of new code landing every sprint.
This is why there is “more code than ever.” AI did not replace engineers; it amplified their output. And that amplification exposed the weakest link in the software delivery chain: the ability to verify that all this new code actually works correctly. Every team that has adopted AI coding tools aggressively has discovered the same thing. Code velocity went up. Quality did not follow. The testing bottleneck, once a minor annoyance, became the critical constraint on delivery.
2. Why Verification Is Fundamentally Harder Than Generation
There is a reason code generation responded to AI faster than testing did. Generation is a convergent problem: given a specification, produce one implementation that satisfies it. There are many possible correct implementations, and the AI just needs to find one. Verification is a divergent problem: given an implementation, enumerate all the ways it could fail and confirm none of them happen. The space of possible failures is combinatorially larger than the space of correct implementations.
Consider a checkout flow with ten form fields, three payment methods, two shipping options, and a discount code input. A developer (or AI) writes one implementation of this flow. Testing it thoroughly requires checking valid submissions for every combination, invalid inputs for each field, partial form completions, network failures at each step, concurrent modifications to the cart, session expiration during the flow, and browser-specific rendering issues. The number of meaningful test scenarios grows exponentially with each new feature interaction.
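The growth described above can be sketched with a toy model. This is illustrative only (the numbers come from the checkout example, not from any real application): each independent choice multiplies the number of happy paths, and each form field adds at least one invalid-input variant on top of every path.

```typescript
// Toy model of test-scenario growth: independent choices multiply the
// number of paths; each field adds at least one invalid-input variant.
function scenarioCount(choices: number[], fields: number): number {
  // Happy paths: the product of all independent choice counts.
  const happyPaths = choices.reduce((a, b) => a * b, 1);
  // Each happy path, plus one per-field error variant of it.
  return happyPaths * (1 + fields);
}

// Checkout flow from the text: 3 payment methods, 2 shipping options,
// a discount code that is present or absent, and 10 form fields.
console.log(scenarioCount([3, 2, 2], 10)); // 132
```

And this still ignores network failures, concurrent cart edits, session expiry, and browser differences, each of which multiplies the count again.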
AI coding tools also create a specific verification challenge: the code they produce often works but contains subtle assumptions that differ from what the team expects. The implementation passes basic smoke tests because it does something reasonable. But “reasonable” and “correct per the business requirements” are not always the same thing. Catching these semantic mismatches requires understanding the intent behind the code, not just its behavior with happy-path inputs. This is why throwing more AI at the testing problem without structural changes to the testing approach does not work. The same models that generate plausible-looking code also generate plausible-looking tests that miss the same blind spots.
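A minimal sketch of such a semantic mismatch, using a hypothetical pricing rule (the spec, tax rate, and function names are all invented for illustration): both versions pass a smoke test with a zero discount, and only the business rule tells them apart.

```typescript
// Assumed business rule for this example: the discount applies BEFORE tax.
const TAX_RATE = 0.1;

// Plausible AI output: subtract the discount from the taxed total.
function totalA(price: number, discount: number): number {
  return price * (1 + TAX_RATE) - discount;
}

// What the (assumed) requirement actually says: tax the discounted price.
function totalB(price: number, discount: number): number {
  return (price - discount) * (1 + TAX_RATE);
}

// A zero-discount smoke test cannot distinguish them; a nonzero
// discount with nonzero tax can.
console.log(totalA(100, 0) === totalB(100, 0)); // identical on the happy path
console.log(totalA(100, 10), totalB(100, 10)); // they disagree here
```

Both implementations are "reasonable"; only a test derived from the business requirement, not from the code's own behavior, catches the wrong one.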
Your code velocity doubled. Did your test coverage?
Assrt auto-discovers test scenarios by crawling your running app, then generates real Playwright tests. No YAML configs, no proprietary DSLs. Just real test code you own.
Get Started →

3. The QA Scaling Problem
Before AI coding tools, a rough industry standard was one QA engineer for every four to six developers. That ratio assumed developers produced code at a relatively predictable human rate. AI has effectively multiplied each developer's output by three to five times, which means the effective developer-to-QA ratio has ballooned to something like 15:1 or 20:1 in terms of code volume per QA person.
Hiring more QA engineers is not a realistic solution. The labor market for experienced QA engineers was already tight before this shift, and the problem is not just headcount. Manual testing fundamentally cannot scale linearly with code volume. Every new feature adds test cases to maintain. Every UI change invalidates existing test scripts. Every integration point creates new combinations to verify. The maintenance burden of manual test suites grows superlinearly with application complexity.
Teams have tried various approaches to cope. Some have shifted testing responsibility entirely to developers (“you build it, you test it”), which reduces the QA bottleneck at the cost of test quality since developers tend to test their own assumptions. Some have cut testing scope to only cover critical paths, accepting that less important features ship with minimal verification. Some have invested heavily in automated test suites, only to find that the maintenance cost of those suites creates its own bottleneck. None of these approaches solve the fundamental problem: testing capacity needs to scale with code generation capacity, and traditional methods cannot achieve that scaling.
4. Manual Test Automation at Its Breaking Point
The conventional answer to testing bottlenecks has been test automation: write Playwright, Cypress, or Selenium scripts that exercise the application automatically. This works, but writing and maintaining these scripts is itself a significant engineering effort. A single end-to-end test for a multi-step flow can take hours to write correctly, and it breaks every time a selector changes, a page layout shifts, or a backend response format updates.
The maintenance problem is severe. Industry data suggests that teams spend 30% to 50% of their test automation effort on maintaining existing tests rather than writing new ones. When AI coding tools accelerate the pace of UI and backend changes, that maintenance burden intensifies. Selectors that worked last week break because the AI-assisted developer restructured a component. API response schemas change because the backend was refactored with AI assistance. The test suite falls behind, becomes unreliable, and eventually gets ignored.
This is the cycle that creates learned helplessness around testing. Teams invest in automation, the automation becomes expensive to maintain, tests become flaky, developers lose trust in the suite, and the team gradually reverts to manual spot-checking and hoping for the best. The answer is not to abandon automation. The answer is to automate the automation itself: use AI to generate, maintain, and adapt tests at the same pace that AI generates and modifies code.
5. How AI Can Help Close the Testing Gap
The same AI capabilities that accelerated code generation can be directed at test generation and maintenance. Several categories of tools are emerging to address different parts of the testing bottleneck.
AI test discovery and generation. Tools in this category crawl your running application, identify testable user flows, and generate test code automatically. Assrt takes this approach: you run npx @m13v/assrt discover https://your-app.com and it explores your application, identifies test scenarios (including edge cases like form validation failures and navigation dead ends), and generates real Playwright test files. The output is standard Playwright code that you can review, edit, and commit. It is open-source, free, and produces no vendor lock-in since the generated tests are just TypeScript files.
Self-healing test selectors. One of the biggest maintenance costs in test automation is fixing broken selectors. When a developer renames a CSS class or restructures a component, every test that targets those elements breaks. AI-powered selector strategies can adapt by using multiple identification methods (text content, ARIA roles, structural position, visual appearance) and falling back gracefully when one method breaks. Assrt uses this approach to generate selectors that survive typical UI refactors without requiring manual updates.
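The multi-strategy idea can be sketched in a few lines. This is not Assrt's actual implementation, just a minimal model with invented types and a mock DOM: try the most stable identification method first (test IDs), then fall back to more semantic ones (role, text) when a refactor breaks the first.

```typescript
// Invented types for the sketch: a page element and a target description.
interface UiElement {
  testId?: string;
  role?: string;
  text?: string;
}

// Ordered strategies: most stable first, most brittle last. A strategy
// only applies when the target actually specifies that attribute.
const strategies: Array<(el: UiElement, t: UiElement) => boolean> = [
  (el, t) => t.testId !== undefined && el.testId === t.testId,
  (el, t) => t.text !== undefined && el.text === t.text,
  (el, t) => t.role !== undefined && el.role === t.role,
];

function locate(dom: UiElement[], target: UiElement): UiElement | undefined {
  for (const matches of strategies) {
    const hit = dom.find((el) => matches(el, target));
    if (hit) return hit; // the first strategy that still works wins
  }
  return undefined;
}

// The submit button lost its test ID in a refactor, but its text and
// role survived, so the fallback strategies still find it.
const dom: UiElement[] = [
  { role: "button", text: "Submit order" },
  { role: "link", text: "Back to cart" },
];
console.log(locate(dom, { testId: "submit-btn", role: "button", text: "Submit order" }));
```

A selector built this way fails only when every identification method breaks at once, which is far rarer than a single CSS class rename.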
Managed QA services. For teams that want to outsource the testing bottleneck entirely, services like QA Wolf offer fully managed test automation with human QA engineers augmented by AI tools. This approach solves the scaling problem by making testing someone else's job, but it comes with significant cost (typically $7,500 per month or more) and the usual trade-offs of closed-source vendor dependency. You get tests that work, but you do not own or control the testing infrastructure.
Proprietary test platforms. Tools like Momentic offer AI-assisted test creation through proprietary interfaces and YAML-based test definitions. These can reduce the initial effort of writing tests, but they introduce platform dependency. Your tests are described in the vendor's format rather than standard code, browser support may be limited (Chrome-only in some cases), and migrating away means rewriting everything. For teams that prioritize speed of setup over long-term flexibility, these tools can be a reasonable starting point.
6. Choosing the Right Approach for Your Team
The right solution depends on your team's size, budget, and technical preferences. Small teams and startups that need to move fast without adding headcount benefit most from AI test generation tools that produce standard test code. You get the coverage without the ongoing vendor cost, and the generated tests integrate into your existing CI pipeline without special infrastructure.
Larger teams with dedicated QA staff may benefit from a layered approach: AI-generated tests for broad coverage of standard flows, combined with manually written tests for business-critical edge cases that require domain knowledge. The AI handles the repetitive work (testing that every form validates correctly, every navigation link works, every error state renders properly) while human QA engineers focus on the creative work of imagining failure scenarios the AI would not think of.
Whatever approach you choose, the key principle is the same: testing capacity must scale with code generation capacity. If your team adopted AI coding tools six months ago and your testing process has not changed, you are accumulating technical debt in the form of untested code. That debt compounds. Every untested feature becomes a potential production incident. Every production incident erodes user trust and consumes engineering time that could be spent building new features.
The teams that will thrive in the AI-augmented engineering era are the ones that treat testing as a first-class engineering concern, not an afterthought. They invest in test infrastructure with the same urgency they invest in development tooling. They measure test coverage not just by lines of code touched but by user journeys verified. And they use AI on both sides of the equation: to generate code faster and to verify it faster, keeping the two in balance so that velocity and quality scale together.