Testing Guide

Testing AI-Generated and Vibe-Coded Apps: A Practical E2E Coverage Guide

Vibe coding ships apps fast. What it does not ship is confidence. When you generate code with AI, you inherit all the systematic blind spots of the model that generated it. Happy paths work. Error handling is often placeholder-quality. Edge cases that the model did not consider during generation simply do not exist. Auditing a vibe-coded app with E2E tests is one of the most useful things you can do, both to document what the app actually does and to build the regression safety net that makes future changes safe.


1. Unique Risks of AI-Generated Code

AI-generated code is not bad code. It is often quite good at the happy path. The problem is that it has systematic blind spots that are different from, and sometimes harder to detect than, the blind spots in human-written code. Understanding these blind spots is the first step toward testing them effectively.

Happy path bias

Language models are trained on code that works. The happy path is heavily represented in training data. Error conditions, partial failures, and unusual state combinations are much less common. The result is AI-generated code that handles the main flow well but handles failure modes poorly. You will see things like: the success case is beautifully implemented, and the error case shows an empty catch block or a generic "something went wrong" message with no logging and no recovery logic.

Edge case gaps

When you vibe-code a feature, you typically describe it in terms of the normal scenario. The model generates code for that scenario. Edge cases that were not in the prompt are not in the code. What happens when a user submits an empty form? What happens when a network request times out halfway through a multi-step process? What happens when two users edit the same record simultaneously? These are the questions that require deliberate thought to include in a prompt, and they are the questions that most vibe coding sessions never ask.

Race conditions

Race conditions are notoriously difficult to reason about, and AI-generated code is particularly prone to them in async contexts. The model generates code that works in a sequential mental model but fails when two operations complete out of order in real async JavaScript. State updates that assume a particular ordering, event handlers that fire before initialization completes, and optimistic UI updates that do not handle the case where the server returns an error after the local state has already been updated are all common patterns in vibe-coded apps.

Inconsistent validation

AI-generated code often validates inputs differently across different parts of the app. The login form validates email format strictly. The profile update form does not. The API endpoint that both forms call does neither. This inconsistency happens because each piece of code was generated in a separate context, and the model had no way to know that the validation logic should be centralized. Users who discover these inconsistencies will exploit them, intentionally or not.
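One fix is to centralize the rule so drift is impossible. The sketch below is a minimal illustration, not any framework's API: the regex and the `validateEmail` name are assumptions, and real apps would likely use a schema library instead.

```typescript
// Sketch: one shared validator used by every form and endpoint,
// instead of each generated component re-implementing its own rules.
// The regex and function name are illustrative assumptions.

const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateEmail(input: string): { ok: boolean; error?: string } {
  const trimmed = input.trim();
  if (trimmed.length === 0) return { ok: false, error: "Email is required" };
  if (!EMAIL_RE.test(trimmed)) return { ok: false, error: "Invalid email format" };
  return { ok: true };
}

// Every caller -- login form, profile form, API handler -- imports this
// one function, so the rules cannot drift apart the way independently
// generated code does.
console.log(validateEmail("user@example.com").ok); // true
console.log(validateEmail("not-an-email").error);  // "Invalid email format"
```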

Invisible state leakage

Vibe-coded apps frequently have state management that works for a single user session but breaks across sessions, across browser tabs, or when a user is logged out and back in. Global state that holds onto stale data from a previous user session. Cached API responses that do not invalidate when the underlying data changes. These bugs are invisible in a fresh browser session and appear reliably in the messier sessions real users actually have.

2. How to Audit a Vibe-Coded App Systematically

The most important thing to know about auditing a vibe-coded app is that you often do not need to understand the codebase deeply to find the problems. The bugs reveal themselves through behavior, not through code inspection. Start from the outside.

Map the user flows first

Before writing a single test, spend thirty minutes using the app as a real user would. Not a careful developer user who knows where things are and what the expected behavior is. A confused first-time user who clicks the wrong buttons, navigates back when they should not, and tries inputs that were not anticipated. Document every flow you can find, including the ones that seem broken. This map becomes the foundation for your test suite.

Test boundary conditions deliberately

For each input field and each form, test the boundaries: empty inputs, maximum length inputs, special characters, unicode, numbers where strings are expected and vice versa. For each async operation, test what happens when it is slow or when it fails. For each stateful flow, test what happens when you interrupt it halfway through.

| Area | What to test | Common AI code failure |
| --- | --- | --- |
| Forms | Empty submit, long inputs, special chars | Silent failure, no error message |
| Auth flows | Wrong password, expired session, multi-tab | No redirect, stale state shown |
| Async operations | Slow network, request failure, double submit | Race condition, duplicate data |
| Navigation | Back button mid-flow, direct URL access | Broken state, 404 on valid route |
| Data edge cases | Empty lists, max items, deleted items | Crash, blank screen, wrong count |
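Boundary probes like the forms row above translate directly into Playwright tests. The sketch below assumes a hypothetical signup page; the URL, field labels, and button names are placeholders you would swap for your app's actual ones.

```typescript
import { test, expect } from '@playwright/test';

// Boundary probes for a signup form. URL and selectors are assumptions.
test('empty submit shows a validation error, not a silent failure', async ({ page }) => {
  await page.goto('https://your-app.com/signup');
  await page.getByRole('button', { name: 'Sign up' }).click();
  // Any visible error beats silence; tighten this once behavior is fixed.
  await expect(page.getByRole('alert')).toBeVisible();
});

test('very long input is rejected or truncated, not crashed on', async ({ page }) => {
  await page.goto('https://your-app.com/signup');
  await page.getByLabel('Name').fill('x'.repeat(10_000));
  await page.getByRole('button', { name: 'Sign up' }).click();
  await expect(page.getByText('Internal Server Error')).not.toBeVisible();
});
```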

Check the network layer

Open the browser developer tools and watch network requests while you use the app. AI-generated code often makes unnecessary requests, exposes sensitive data in query parameters, or fails to handle non-200 responses gracefully. Watching the network traffic for ten minutes of normal usage will often surface issues that would take hours to find through code review.

Audit a vibe-coded app without reading the codebase

Assrt auto-discovers test scenarios from your running app. Point it at your URL, and it maps the flows users can take and generates Playwright tests for each. No codebase access required.

Get Started

3. Why E2E Tests Are Essential for AI-Generated Codebases

You might think that AI-generated code is well-suited to unit tests: the code is modular, the functions are often small, and the logic is usually clear. That is true. But unit tests for vibe-coded apps have a fundamental limitation: they test the code that exists, not the behavior that users experience. The most dangerous bugs in AI-generated code are not in individual functions. They are in the interactions between functions, in the state transitions that were never explicitly designed, and in the gaps between what the developer asked for and what the AI generated.

E2E tests are the only kind of test that catches these integration-level failures reliably. They simulate a real browser session, interact with the UI the way a user would, and observe outcomes at the level users observe them. A race condition that causes a form to submit twice when clicked rapidly will not show up in a unit test. It will show up in an E2E test that clicks the submit button twice in quick succession.
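The double-submit case reads naturally as a Playwright test. This is a sketch under assumptions: the URL, selectors, and `/api/projects` endpoint are hypothetical, and the second click is wrapped in a catch because a correctly fixed app may disable the button before it lands.

```typescript
import { test, expect } from '@playwright/test';

// Reproduces a double-submit race: two rapid clicks should create ONE record.
// URL, selectors, and the API path are illustrative assumptions.
test('rapid double-click on submit creates only one project', async ({ page }) => {
  let createRequests = 0;
  await page.route('**/api/projects', (route) => {
    if (route.request().method() === 'POST') createRequests += 1;
    route.continue();
  });
  await page.goto('https://your-app.com/projects/new');
  await page.getByLabel('Project name').fill('Audit test');
  const submit = page.getByRole('button', { name: 'Create' });
  await submit.click();
  // The second click may be blocked by a (correctly) disabled button.
  await submit.click({ timeout: 1000 }).catch(() => {});
  await page.waitForLoadState('networkidle');
  expect(createRequests).toBe(1); // duplicate POSTs reveal the race
});
```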

E2E tests as documentation

For vibe-coded apps, E2E tests serve a documentation function that is arguably more valuable than their bug-catching function. When you write an E2E test that says "user logs in, navigates to the dashboard, creates a new project, and sees it appear in the list," you are documenting what the app is supposed to do. That documentation is executable: you can run it at any time to verify that the documented behavior still holds.

This matters enormously for AI-generated codebases because the developer who wrote the code may not remember what decisions were made or why. The AI that generated it certainly does not remember. The test suite becomes the authoritative record of intended behavior.

E2E tests as constraints for future generation

When you continue developing a vibe-coded app with AI assistance, having an E2E test suite gives you a verification layer for each round of generation. Ask the AI to add a feature, run the tests, and confirm that the existing flows still work. This is the workflow that makes iterative vibe coding sustainable. Without tests, each new generation is a gamble that the previous behavior was not broken.

4. Writing E2E Tests During an Audit

The most effective approach to auditing a vibe-coded app is to write tests as you go, not after. Each flow you explore becomes a test case. Each bug you find becomes a failing test that documents the current incorrect behavior and will become a passing test after the fix.

Prioritize flows by impact

You cannot audit everything simultaneously. Start with the flows that represent the highest user impact or the highest business risk. For most apps, this means authentication and session handling, any flow that takes payment or creates user data, and the core workflow the product exists to provide.

Write tests that document current behavior, not ideal behavior

When auditing a vibe-coded app, your first tests should document what the app actually does, not what it should do. If the error message on a failed login is unhelpfully generic, write a test that asserts that generic message. Now you have a documented baseline. When you improve the error message, the test fails (correctly) until you update it to assert the new, better message. This prevents you from accidentally reverting improvements.
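A baseline test of this kind is short. The sketch below assumes a hypothetical login page and a generic "Something went wrong" message; substitute whatever your app actually shows today.

```typescript
import { test, expect } from '@playwright/test';

// Baseline test: asserts the error message the app shows TODAY, even
// though it is unhelpfully generic. URL, labels, and the message text
// are assumptions for illustration.
test('failed login shows the current (generic) error message', async ({ page }) => {
  await page.goto('https://your-app.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('wrong-password');
  await page.getByRole('button', { name: 'Log in' }).click();
  // Documents current behavior; update this assertion when the message improves.
  await expect(page.getByText('Something went wrong')).toBeVisible();
});
```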

Name tests so they explain the scenario

Good test names are documentation. "User submits login form with correct credentials and is redirected to dashboard" tells you exactly what the test covers and what behavior it asserts. "Login test" tells you nothing. In a vibe-coded codebase where you did not write the code yourself, the test names often become the primary source of truth for what the application is supposed to do.

5. Building Regression Safety Nets for Ongoing Development

A one-time audit is valuable. A persistent regression safety net is invaluable. The difference is automation: the tests run on every significant code change, not just when someone remembers to run them.

Integrate with CI from day one

The moment you have even five E2E tests, integrate them into CI. Run them on every pull request and every push to main. Block merges on failure. This sounds aggressive, but the discipline it creates is worth the occasional friction. Every time a CI failure catches a regression that would have shipped to production, the investment pays for itself.
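A CI-friendly Playwright configuration supports this discipline. The fragment below is a suggested starting point, not a canonical setup; the base URL and retry counts are assumptions to adjust for your project.

```typescript
// playwright.config.ts -- a CI-friendly starting point (values are suggestions).
import { defineConfig } from '@playwright/test';

export default defineConfig({
  forbidOnly: !!process.env.CI,    // fail CI if a stray test.only slipped in
  retries: process.env.CI ? 2 : 0, // retry flaky E2E tests only in CI
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000',
    trace: 'on-first-retry',       // capture a trace when a retry happens
  },
});
```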

Expand coverage after every bug found

The best time to write a test for a bug is immediately after finding it. The bug's reproduction steps are clear in your mind. You can write a failing test that reproduces the bug, fix the bug, and confirm the test now passes. This practice, sometimes called test-after-bug, gradually fills in the coverage gaps that the initial audit missed. Over time, the test suite becomes a comprehensive catalog of every failure mode the app has ever exhibited.

Use feature flags to test incrementally

When continuing to develop a vibe-coded app with AI assistance, feature flags let you ship new AI-generated code to a subset of users while the E2E tests run against the full feature before a wider rollout. This gives you a real-world feedback loop on new AI-generated behavior before it reaches your entire user base.

Handle selector brittleness in vibe-coded UIs

AI-generated frontend code often uses class names that are either too generic (meaningless for test targeting) or too specific (fragile when the AI regenerates the component with slightly different naming). The solution is to use data attributes for test targeting: add data-testid attributes to important elements when you write tests, and ask the AI to preserve them in future generations. This makes your test selectors stable even when the component structure changes.
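In practice this pairs a data-testid attribute in the markup with Playwright's getByTestId locator. The attribute values and URL below are assumptions for illustration.

```typescript
import { test, expect } from '@playwright/test';

// Stable targeting via data-testid. In the component's JSX you would add:
//   <button data-testid="submit-order">Place order</button>
// The testid values and URL here are illustrative assumptions.
test('order submit button stays targetable across regenerations', async ({ page }) => {
  await page.goto('https://your-app.com/checkout');
  await page.getByTestId('submit-order').click();
  await expect(page.getByTestId('order-confirmation')).toBeVisible();
});
```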

6. Tools and Workflow for Vibe-Coded App Testing

The right toolset for testing AI-generated apps leans heavily on tools that work from the outside in: starting from what the app does as a user sees it, rather than what the code says it does.

Playwright for E2E tests

Playwright is the standard choice for E2E testing in 2026. It supports all major browsers, has excellent async handling, and produces tests that are readable enough to serve as documentation. The Playwright test runner's built-in screenshot comparison, network interception, and trace viewer make it particularly useful for debugging the kinds of intermittent failures that vibe-coded apps tend to produce.

AI-assisted test discovery

For vibe-coded apps where you did not write the code and may not fully understand the codebase, auto-discovery tools that start from the running application are particularly useful. Assrt, for example, crawls a live app and discovers the user flows it can navigate, generating Playwright tests for each flow it finds. This approach requires no codebase understanding: you point it at your URL with npx @m13v/assrt discover https://your-app.com, and it returns a set of tests that document the behavior it observed. This is a useful starting point for an audit even if you intend to expand and customize the tests manually.

Browser developer tools for network auditing

The network tab in Chrome DevTools is one of the most powerful audit tools for vibe-coded apps. Filter for failed requests, watch for unexpected data exposure in URLs, and look for requests that fire more times than expected (often a sign of race conditions or missing deduplication). Playwright's network interception API lets you encode what you find in the network tab as test assertions that run automatically.
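A manual network-tab audit can be encoded as an automatic assertion. The sketch below listens to every response during a flow and fails on anything 400 or above; the URL is an assumption, and a real suite would likely allowlist known-failing third-party requests.

```typescript
import { test, expect } from '@playwright/test';

// Encodes a network-tab audit as an assertion: collect every 4xx/5xx
// response during a normal flow and fail if any appear.
// The URL is an illustrative assumption.
test('dashboard flow produces no failed requests', async ({ page }) => {
  const failures: string[] = [];
  page.on('response', (response) => {
    if (response.status() >= 400) {
      failures.push(`${response.status()} ${response.url()}`);
    }
  });
  await page.goto('https://your-app.com/dashboard');
  await page.waitForLoadState('networkidle');
  expect(failures).toEqual([]); // the failure list doubles as a debug report
});
```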

A practical audit workflow

Pulling the sections above together, a repeatable audit loop looks like this: spend thirty minutes using the app as a confused first-time user and map every flow you find; probe boundary conditions on forms, async operations, and navigation while watching the network tab; write a Playwright test for each flow and each bug as you go, documenting current behavior; and wire the suite into CI so it runs on every change.

Vibe coding is a legitimate and increasingly common development approach. The apps it produces are not inherently worse than human-written apps: they are differently risky. The risks are concentrated in edge cases, error handling, and integration-level behaviors that the generation prompt never described. E2E tests are the tool that makes those risks visible and manageable. Building the test suite as part of the development process rather than after it is the difference between a vibe-coded app you can confidently maintain and one you are perpetually afraid to touch.

Audit Your Vibe-Coded App Without Reading the Codebase

Assrt auto-discovers test scenarios from your running app and generates real Playwright tests for every flow it finds. No codebase understanding required. Start with npx @m13v/assrt discover https://your-app.com and build your regression safety net from there.

$ npx @m13v/assrt discover https://your-app.com