Verification and Confidence

The Verification Gap: Why Code Is Free but Software Is Still Expensive

"Code is free now but software is still expensive." That observation from a recent discussion captures something important about the current state of AI-assisted development. The gap between "code that works on my machine" and "code I trust in production" is almost entirely a testing and verification problem. Generating code has become trivially cheap. Knowing that the code actually works remains stubbornly expensive.


The gap between 'code that works on my machine' and 'code I trust in production' is almost entirely a testing and verification problem.

Developer on the cost of software vs code

1. Why Generating Code Is Now Cheap

Two years ago, writing a CRUD API endpoint took a developer 30 minutes to an hour. Today, an AI coding assistant generates the same endpoint in under a minute. The code is syntactically correct, follows framework conventions, includes basic error handling, and often comes with suggested unit tests. The marginal cost of producing code has collapsed.

This cost collapse extends beyond simple endpoints. AI tools generate entire React components, database migration scripts, API integrations, and configuration files. Teams report generating thousands of lines of code per day with AI assistance. Some companies run batch sessions with hundreds of AI prompts, generating and committing code at a pace that would have required a team of ten developers just three years ago.

But generating code and shipping software are different activities. Software requires that the code works correctly, integrates with existing systems, handles edge cases, performs adequately under load, and continues working as the rest of the codebase evolves. None of these properties are guaranteed by code generation. They require verification, and verification has not gotten cheaper. If anything, it has gotten more expensive because the volume of code that needs verification has exploded.

2. The Verification Gap

The verification gap is the difference between the rate at which code is produced and the rate at which it can be validated. Before AI, these rates were roughly matched. A developer wrote code at the same speed they could reason about it, test it, and verify its correctness. The bottleneck was typing and thinking, and testing happened as a natural part of the development flow.

AI breaks this balance. Code production accelerates by 5x to 10x, but verification remains manual, slow, and cognitively expensive. A developer who generates 500 lines of code in 10 minutes still needs to review those 500 lines, understand them, run them, test the edge cases, and verify integration with existing code. The review process takes roughly the same amount of time whether the code was written by a human or generated by an AI.

This creates a dangerous dynamic. Teams feel productive because they are generating code quickly. But the unverified code accumulates faster than it can be validated. Technical debt, which used to accumulate gradually over months, can now accumulate in days. A week of aggressive AI-assisted development can produce a codebase where nobody fully understands what every module does, how they interact, or whether they handle failure cases correctly.

The verification gap is not a technology problem that AI will solve automatically. It is a fundamental constraint. Generating plausible code requires pattern matching. Verifying correct code requires understanding intent, context, and consequences. These are categorically different tasks, and the second one resists automation in ways the first does not.

Close the verification gap automatically

Assrt generates real Playwright tests from your running app, verifying that AI-generated code actually works in a browser. Not just that it compiles.

Get Started

3. Types of Testing and What Each Catches

Effective verification requires multiple layers of testing, each catching a different category of defect. No single layer is sufficient, and skipping any layer creates a blind spot where bugs hide.

Unit tests verify individual functions and methods in isolation. They catch logic errors, boundary conditions, and type mismatches. They run in milliseconds and give immediate feedback. Their limitation is that they test components in isolation, so they miss bugs that emerge from interactions between components. A function that correctly processes valid input might receive invalid input from its caller, and no unit test catches that.

Integration tests verify that components work together. They test API endpoints with real database connections, service-to-service communication, and middleware chains. They catch configuration errors, serialization mismatches, and contract violations. They are slower than unit tests (seconds instead of milliseconds) and require more infrastructure to run.

End-to-end (E2E) tests verify the application from the user's perspective. They run a real browser, click buttons, fill forms, and verify that the correct content appears on screen. They catch CSS bugs, JavaScript runtime errors, layout issues, and the entire class of problems where individual components work correctly but the assembled application does not. They are the slowest layer (seconds to minutes per test) but catch the bugs that matter most to users.
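A typical Playwright test at this layer looks like the sketch below. The route, labels, and credentials are assumptions about a hypothetical app, not a prescription:

```typescript
import { test, expect } from "@playwright/test";

// Illustrative login flow; selectors and routes are assumptions.
test("user can sign in and reach the dashboard", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill("demo@example.com");
  await page.getByLabel("Password").fill("correct horse battery staple");
  await page.getByRole("button", { name: "Sign in" }).click();

  // Fails on JS runtime errors, broken routing, or content that never renders,
  // none of which a unit or integration test would observe.
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
});
```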

Visual regression tests go beyond functional correctness to verify that the application looks right. They compare screenshots against known-good baselines and flag visual changes. They catch CSS regressions, font loading failures, responsive layout breakages, and the subtle styling bugs that functional tests miss entirely.
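In Playwright, a visual check can be a one-line addition to an E2E test; the page and thresholds below are illustrative:

```typescript
import { test, expect } from "@playwright/test";

test("pricing page matches baseline", async ({ page }) => {
  await page.goto("/pricing");
  // The first run records a baseline screenshot; later runs diff against it
  // and fail on CSS regressions, font fallbacks, or layout shifts.
  await expect(page).toHaveScreenshot("pricing.png", { maxDiffPixelRatio: 0.01 });
});
```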

4. The Hidden Cost of Trusting AI-Generated Code

When a developer writes code manually, they build a mental model of how it works as they write it. They understand the edge cases because they thought about them. They know which parts are fragile because they struggled with the implementation. This implicit knowledge has always been a critical (if invisible) part of software quality.

AI-generated code comes without this mental model. The developer who accepts an AI-generated function knows what it is supposed to do but may not understand how it handles edge cases, what assumptions it makes about input, or how it behaves under concurrent access. This is not a failing of the developer. It is a natural consequence of receiving code rather than writing it.

The hidden cost shows up in maintenance. When a bug is reported in AI-generated code, the developer who committed it has to reverse-engineer the AI's approach before they can fix the bug. This debugging process takes longer than it would for hand-written code because the developer is working without the context that normally accompanies authorship. Multiply this across a codebase with thousands of lines of AI-generated code, and maintenance costs can exceed the time saved by generation.

Automated testing mitigates this cost by creating an executable specification. When tests describe what the code should do, a developer can fix bugs by making the tests pass without needing to fully understand the original implementation. The tests themselves become the mental model that AI-generated code lacks.

5. Building a Testing Pipeline That Scales with AI Output

If code generation has accelerated 10x, your testing pipeline needs to keep pace. A manual testing process that worked when your team shipped 100 lines of code per day breaks down when AI helps you ship 1,000. The pipeline needs to be automated, layered, and fast.

Start with automated test generation. Just as AI generates application code, it can generate test code. Tools like Assrt take this a step further by generating tests from the running application rather than from source code. Point it at your app, and it discovers user flows, generates Playwright tests, and outputs standard test files that commit to your repo. This approach catches an important category of bugs that source-level test generation misses: integration and rendering issues that only manifest in a real browser.

Layer your automated tests by speed and scope. Unit tests run on every file save (using watch mode). Integration tests run on every commit. E2E browser tests run on every PR and every deploy. Visual regression tests run on PRs that touch frontend code. Each layer adds confidence, and each layer runs only when the previous layer passes.
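The layering above can be encoded directly in CI so each stage gates the next. A minimal sketch, assuming GitHub Actions and npm scripts named `test:unit` and `test:integration` (both assumptions; adapt to your runner):

```yaml
# Illustrative pipeline: each layer runs only when the previous one passes.
name: test-pipeline
on: [pull_request]

jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run test:unit         # milliseconds per test

  integration:
    needs: unit                                   # gated on unit passing
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run test:integration

  e2e:
    needs: integration                            # gated on integration passing
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npx playwright install --with-deps
      - run: npx playwright test                  # browser + visual checks
```

The `needs:` chaining is the key design choice: expensive browser tests never run against code that already failed a cheaper layer, which keeps feedback fast without sacrificing coverage.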

Implement test generation as part of your AI coding workflow, not as a separate step. When an AI generates a new component, the test for that component should be generated in the same session. When a batch of AI-generated code is committed, the corresponding tests should be committed alongside it. If generating tests is a separate, deferred activity, it will eventually be skipped under time pressure.

6. Verification as the New Competitive Advantage

When everyone has access to the same AI code generation tools, the ability to generate code stops being a differentiator. Every team can build features quickly. The teams that win are the ones that ship features reliably, with confidence that they work correctly and will not break existing functionality.

This shifts the competitive advantage from code production to code verification. The team with a comprehensive, fast, automated testing pipeline deploys ten times a day with confidence. The team without one deploys once a week and still has incidents. Both teams generate code at the same speed. The difference is entirely in their ability to trust what they have generated.

The original observation was right: code is free now. But software, the reliable, tested, production-grade artifact that customers depend on, still requires investment. That investment is increasingly in verification infrastructure: testing pipelines, CI/CD systems, monitoring, and the automated tools that bridge the gap between generated code and trusted software. The teams that build this infrastructure now will compound their advantage as AI-generated code volumes continue to grow. The teams that skip it will drown in their own output.

Make verification scale with your code generation

Assrt generates Playwright tests from your running application automatically. Close the verification gap without slowing down your AI-assisted development.

Open-source. Real Playwright code. Free to start.