Testing Guide

Code Is Free, Testing Is Not: The Real Cost of Software Verification

There is a discussion that happens regularly in developer communities when someone shares that they used AI to build an entire app in a weekend: “code is cheap now.” This is technically true and practically misleading. The code is the smallest cost in modern software development. The cost that has not changed is the cost of knowing whether the code actually works, at scale, for real users, in production. That cost is verification.

The gap between code that works on my machine and code I trust in production is almost entirely a testing and verification problem. AI made code cheap. It did not make confidence cheap.

1. Code Is Cheap: What This Actually Means

When developers on r/AppDevelopers argue about whether software is expensive, the disagreement is usually about what the word “software” means. If software means the code that implements a feature, then yes, code is cheap now. An AI assistant can produce the implementation of a modal dialog, a payment flow, or a data export function in minutes. The cost of generating code has dropped by roughly an order of magnitude in two years.

But if software means a working product that you can confidently put in front of users and know it will behave correctly, code is only a small fraction of the cost. The rest of the cost is in the verification, the infrastructure for knowing whether the code does what it is supposed to do for real users under real conditions.

This distinction matters because the economics of the two things have changed very differently. Code generation got cheap. Verification infrastructure did not. The tools exist, many of them are free or nearly free, but the time and knowledge required to use them effectively has not been automated away. This creates a situation where developers can build more code faster than they can verify it.

The practical consequence is that codebases are growing faster than their test coverage. AI-assisted development makes it easy to add features. Adding features without adding tests increases the surface area that can break without anyone knowing. At some point, the accumulated unverified code becomes a liability that slows development down, because every change might break something that was never tested, and finding out which thing broke requires manual investigation.

2. The Verification Gap That AI-Generated Code Makes Worse

AI-generated code has specific properties that make verification more important, not less. Understanding these properties explains why “the AI tested it” is not a substitute for actual test infrastructure.

First, AI-generated code is confidently wrong in ways that hand-written code typically is not. A developer who writes code and makes a mistake usually has some awareness of the uncertain areas where bugs might hide. They know which edge cases they did not think about. AI generates complete-looking implementations that handle the obvious cases correctly and fail on less obvious cases with no warning signal. The code looks finished because it is syntactically and structurally complete.

Second, AI-generated code often relies on patterns from training data that are technically correct but contextually wrong for a specific codebase. The code follows conventions that are valid in some contexts but incorrect given your specific dependencies, your specific database schema, or your specific business rules. These errors do not show up in static analysis. They show up when real users encounter the edge cases that the AI-generated code does not handle.

Third, AI generates code faster than a human can review it for correctness. When a developer accepts AI-generated code without fully understanding it, they have added code they cannot reason about confidently. If something breaks, debugging AI-generated code they do not fully understand is more expensive than debugging code they wrote themselves. The short-term velocity gain can become a long-term debugging cost.

None of this means AI-generated code is bad. It means that the gap between “AI generated this code” and “I trust this code in production” is at least as wide as the gap for hand-written code, and in some cases wider. Verification is what closes that gap.

Close the verification gap without weeks of test writing

Assrt discovers your application flows and generates real Playwright tests automatically. It produces standard Playwright files you can inspect, modify, and run in any CI pipeline.

Get Started

3. The Hidden Costs of Verification Infrastructure

When developers discuss the cost of software, they typically account for the cost of building features but not the full cost of verification. The hidden costs add up in ways that are not obvious until you are in the middle of them.

The cost of setting up a test environment. Running automated tests requires a test database, a way to start your application in test mode, environment variables for test credentials, and CI configuration that orchestrates all of this. For a solo developer, setting this up correctly the first time can take a week. Maintaining it as dependencies change adds ongoing cost.

The cost of writing tests. For a codebase of moderate size, comprehensive test coverage requires tests totaling roughly 30 to 60 percent of the lines of code in the implementation. If AI generates the implementation at high speed, tests cannot be generated at the same speed without tools specifically designed for test generation. The asymmetry means codebases grow faster than their tests can keep up.

The cost of maintaining tests. Tests break when code changes. Maintaining a test suite for a product under active development requires updating tests whenever the code they test changes in ways that affect the expected behavior. For a fast-moving product, this maintenance cost can be 20 to 40 percent of the total engineering time spent on any given feature.

The cost of investigating failures. When a test fails in CI, someone has to investigate why. Is it a real bug, a flaky test, a test that is testing the wrong thing, or a change in expected behavior? This investigation takes time that could be spent building. Without good test infrastructure, failures are frequent enough to consume a meaningful fraction of engineering capacity.

4. The Confidence Problem: Why Trust in Code Is Expensive to Build

Confidence in code is not binary. It is a spectrum from “I think this probably works” to “I have verified that this works for these specific inputs under these specific conditions.” The cost of moving from the first state to the second is the verification cost.

For app developers shipping to real users, confidence matters in concrete terms. Low confidence means more production incidents. More production incidents mean more time debugging, more user complaints, more reputation damage, and more time fixing bugs instead of shipping features. High confidence means fewer incidents, faster shipping, and less time in crisis mode.

The confidence problem is particularly acute for indie developers and small teams. Enterprise companies employ QA teams, DevOps engineers, and site reliability engineers specifically to build and maintain the verification infrastructure that produces confidence. Indie developers either replicate this infrastructure solo, accept lower confidence and higher incident rates, or find tools that reduce the cost of building confidence.

Browser testing specifically addresses confidence in the most user-visible layer of the application. A browser test that verifies your checkout flow works end-to-end, with a real browser, real HTTP requests, and real form submissions, provides stronger confidence than unit tests that verify individual functions in isolation. It is also the layer where failures are most costly, because a broken checkout flow directly prevents revenue.
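To make that concrete, here is a minimal sketch of what an end-to-end checkout test looks like in Playwright Test. The URL, the selectors, the button labels, and the test card number are all illustrative assumptions, not a real application:

```typescript
// A sketch of an end-to-end checkout test. Every URL and selector below
// is a placeholder; adapt them to your own application's markup.
import { test, expect } from '@playwright/test';

test('user can complete checkout', async ({ page }) => {
  // Real browser, real HTTP requests, real DOM.
  await page.goto('https://your-app.example.com/store');

  // Walk the flow the way a user would.
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await page.getByRole('link', { name: 'Cart' }).click();
  await page.getByRole('button', { name: 'Checkout' }).click();

  // Fill the payment form with a test card.
  await page.getByLabel('Card number').fill('4242 4242 4242 4242');
  await page.getByLabel('Expiry').fill('12/30');
  await page.getByRole('button', { name: 'Pay' }).click();

  // The assertion that matters: the order actually completed.
  await expect(page.getByText('Order confirmed')).toBeVisible();
});
```

One test like this exercises routing, form handling, payment integration, and rendering together, which is exactly the integration coverage unit tests cannot provide.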

5. Reducing Verification Cost Without Reducing Confidence

The goal is not to accept lower confidence. It is to get high confidence at a cost that is proportionate to the size and stage of your application. For an early-stage app with a few hundred users, full enterprise-grade QA infrastructure is overkill. For an app with tens of thousands of users and real revenue, skipping verification to move fast is a false economy.

Automated test generation is the highest-leverage tool for reducing verification cost without reducing confidence. Instead of spending weeks writing tests from scratch, tools like Assrt crawl your deployed application, identify the flows users actually take, and generate Playwright tests for those flows automatically. The output is real, inspectable Playwright code that lives in your repository and runs in CI. The time to meaningful coverage drops from weeks to hours.

The principle behind this approach is that verification should match the reality of how users interact with your application, not the theoretical coverage of your code. A test that verifies a user can complete checkout provides more confidence for most applications than 20 unit tests that verify individual functions in the checkout module, because it tests the integration of all those functions together in a real browser.

Prioritization is the other major cost reduction lever. You do not need to verify everything equally. Identify the flows where failures have the highest cost (checkout, authentication, data export, any flow that touches user data or billing) and invest in deep verification for those. Use lighter coverage for flows where failures are recoverable and low-cost. This risk-based approach provides the confidence that matters most at a fraction of the cost of uniform coverage.

6. A Practical Verification Strategy for Indie App Developers

If you are an indie developer or a small team who has been building fast with AI assistance and now wants to add meaningful verification without stopping to write tests for weeks, here is a practical approach.

Start with your most critical flows. Map out the two or three user flows that, if broken, would immediately impact revenue or user trust. For most apps, this is authentication (users can sign up and log in), the core value delivery flow (whatever your app does for users), and billing if you charge money. These are the flows where verification delivers the most value per test.

Use automated test generation for the initial suite. Running a tool that generates tests from your live application (Assrt, which crawls your deployed app, or Playwright Codegen, which records your own interactions, depending on your stack) produces an initial coverage baseline in hours. Review the generated tests for the critical flows and add assertions that capture the specific behaviors you care about most. The generated tests give you coverage. Your review adds the assertions that make that coverage meaningful.
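For the recording route, Playwright's codegen command opens a browser and writes test code as you click through your app. The URL and output path here are placeholders:

```shell
# Record interactions against your deployed app and emit a Playwright test file.
npx playwright codegen https://your-app.example.com --output tests/recorded.spec.ts
```

The recorded file is ordinary Playwright code, so the review step above applies to it the same way it applies to crawler-generated tests.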

Run tests in CI on every push to main. The point of tests is to catch regressions before they reach users. A test suite that runs only locally, only manually, provides much weaker protection than one that runs automatically on every change. Setting up a basic GitHub Actions workflow to run Playwright tests after deployment takes an afternoon and provides protection indefinitely.
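A workflow along these lines is enough to get started. This is a sketch, assuming a Node project with Playwright configured via `npm ci`; adjust the Node version and install steps to your stack:

```yaml
# .github/workflows/e2e.yml — run Playwright tests on every push to main.
name: e2e
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test
```

If your tests run against a deployed environment rather than a locally started app, trigger this workflow after your deployment step instead of directly on push.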

Add tests incrementally when bugs occur. When a production bug reaches users, write a test that would have caught it before fixing the bug. This builds a suite that reflects your actual failure history and provides the strongest protection against the kinds of bugs your application actually encounters. Over time, this produces a test suite that is both comprehensive and directly connected to real user impact, which is exactly what good verification infrastructure should be.

Verification that matches the speed of AI-assisted development

Assrt generates real Playwright tests from your running application. Get meaningful coverage in hours, not weeks.

Generates standard Playwright files you can inspect, modify, and run in any CI pipeline.