Code Audit & Testing

Automated Test Coverage for Vibe-Coded Apps: How to Audit AI-Generated Code with Confidence

Vibe coding (using AI to generate entire applications from natural language prompts) is exploding in popularity. The results can be impressive, but the code often ships without meaningful test coverage. This guide walks you through adding automated E2E tests when auditing a vibe-coded application, so you can catch the bugs that AI missed and build a regression safety net for future development.

82%

82% of AI-generated code samples in a recent study contained at least one untested edge case that could cause user-facing failures in production.

Galois Research on LLM Code Quality, 2025

Why Vibe-Coded Apps Have Unique Testing Challenges

When a developer writes code by hand, they typically build up mental models of failure modes as they go. They think about what happens when the network drops, when a user double-clicks a submit button, or when a session token expires. AI code generators work differently. They optimize for the happy path because that is what the prompt describes.

This creates a specific pattern of blind spots in vibe-coded applications:

  • Happy path bias. LLMs generate code that satisfies the stated requirement but rarely account for invalid inputs, race conditions, or timeout scenarios. The “it works on my machine” demo often masks fragile underlying logic.
  • Inconsistent error handling. AI-generated code may handle errors in one module (because the training data included it) and silently swallow exceptions in another. There is no consistent strategy, just statistical patterns from the training corpus.
  • Copy-paste architecture. Vibe-coded apps frequently contain duplicated logic across components because each prompt response is generated independently. A bug fix in one place may not propagate to the three other places where the same pattern was repeated.
  • Missing boundary validation. File size limits, pagination boundaries, concurrent user access, and locale-specific formatting are routinely absent from AI-generated code unless the prompt specifically requested them.
  • No regression safety net. Most vibe-coded apps ship with zero automated tests. Every future change, whether made by a human or AI, risks breaking existing functionality with no way to detect it before users do.
0%

In a sample of 50 vibe-coded open-source projects on GitHub, none included E2E tests at the time of initial release. Only 4 had any unit tests at all.

Developer audit survey, 2025

Approaching a Vibe-Code Audit with Testing in Mind

A traditional code audit focuses on reading source code, identifying vulnerabilities, and writing a report. When you audit a vibe-coded app, the deliverable should be more than a document. It should include a working test suite that validates the findings and protects against regressions going forward.

Here is a practical framework for structuring the audit:

1. Run the application first, read the code second

Before opening a single source file, spend 30 to 60 minutes using the application as a real user would. Document every user flow you encounter: signup, login, core actions, settings, edge cases you can trigger through the UI. This gives you a map of what the application actually does (as opposed to what the code intends to do). Record your session if possible; these recordings become the basis for your test scenarios.

2. Identify the critical user paths

Not every flow needs a test on day one. Prioritize based on business impact and risk. For most applications, the critical paths fall into a few categories: authentication and authorization, payment or subscription flows, data creation and persistence (can users save their work and retrieve it?), and any flow involving third-party integrations. These paths should get E2E test coverage before you write a single line of audit commentary.

3. Set up the testing infrastructure early

Install your testing framework on day one of the audit. Whether you choose Playwright, Cypress, or another tool, having it ready means you can write a test the moment you discover a bug. This habit transforms your audit from a static report into a living, executable specification. Each finding becomes a test case. Each test case becomes a regression guard.
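A minimal playwright.config.ts is enough to start. The baseURL, test directory, and trace setting below are illustrative defaults for a local dev server, not requirements.

```typescript
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./tests",
  use: {
    baseURL: "http://localhost:3000", // assumed local dev server
    trace: "on-first-retry",          // keep traces for flaky failures
  },
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
  ],
});
```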

Automate test discovery

Assrt can crawl a running application and generate Playwright test scaffolds for discovered user flows. Useful for kickstarting coverage during an audit.

Get Started

4. Document assumptions, then test them

Vibe-coded apps are full of implicit assumptions. The AI assumed a user would always be logged in before hitting a certain endpoint. The AI assumed the API response would always include a specific field. The AI assumed file uploads would be small. For each assumption you identify, write a test that violates it. These negative tests are where you will find the most impactful bugs.

Prioritizing User Flows for Test Coverage

When time is limited (and it always is during an audit), use this prioritization matrix to decide which flows get tests first:

  • P0: Authentication and session management. Security-critical. AI often generates auth flows that work but miss token refresh, session expiry, or CSRF protections.
  • P0: Data persistence (create, read, update, delete). Core value proposition. If users lose data, nothing else matters.
  • P1: Payment and billing flows. Revenue-critical. Double charges, failed webhooks, and subscription state mismatches are common in AI-generated payment integrations.
  • P1: Error states and empty states. These are almost never tested by AI. What happens when the API returns 500? When the list is empty? When the network drops mid-request?
  • P2: Navigation and routing. Ensures deep links work, back button behavior is correct, and protected routes actually redirect unauthenticated users.
  • P2: Responsive layout and accessibility. AI-generated layouts often break at specific viewport sizes or lack proper ARIA attributes.
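If you are triaging many discovered flows at once, the matrix above can be encoded as a small helper so flows are sorted consistently. The keyword patterns below are illustrative assumptions, not a standard taxonomy.

```typescript
type Priority = "P0" | "P1" | "P2";

// Ordered by priority: the first matching pattern wins.
const MATRIX: Array<{ pattern: RegExp; priority: Priority }> = [
  { pattern: /auth|login|session/i, priority: "P0" },
  { pattern: /create|read|update|delete|save|persist/i, priority: "P0" },
  { pattern: /payment|billing|subscription/i, priority: "P1" },
  { pattern: /error|empty/i, priority: "P1" },
  { pattern: /navigation|routing|deep.?link/i, priority: "P2" },
  { pattern: /responsive|accessib/i, priority: "P2" },
];

// Unmatched flows default to P2 so nothing is silently dropped.
function priorityFor(flowName: string): Priority {
  for (const { pattern, priority } of MATRIX) {
    if (pattern.test(flowName)) return priority;
  }
  return "P2";
}
```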

Writing Regression Tests During Code Review

The most efficient time to write a regression test is the moment you find a bug. While reviewing the vibe-coded source, you will inevitably discover logic errors, missing validations, and fragile patterns. For each one, follow this workflow:

  1. Reproduce the bug in the browser. Confirm you can trigger the failure manually. If you cannot reproduce it, note the theoretical risk but do not write a test for it yet.
  2. Write a failing test that captures the bug. This test should fail right now, proving the bug exists. Use descriptive test names that reference the audit finding (for example, test('AUDIT-007: session token is refreshed before expiry')).
  3. Fix the bug and verify the test passes. If your audit scope includes fixes, implement the fix. If not, hand the failing test to the development team as a precise, executable specification of what needs to change.
  4. Add the test to CI. The test should run on every pull request going forward. This is how you ensure the bug stays fixed, even as more vibe-coded changes land in the repository.

This approach produces two deliverables from every bug you find: a human-readable audit report entry and a machine-executable test that enforces the fix. Clients value this significantly more than a PDF that collects dust.

3x

Audit engagements that deliver executable test suites alongside written reports see 3x higher remediation rates within 30 days.

Trail of Bits engineering blog, 2025

Tools for Building E2E Coverage During an Audit

Several tools can accelerate the process of adding test coverage to a vibe-coded application. The right choice depends on your team's experience, the application's tech stack, and your timeline.

Playwright

The most popular E2E testing framework in 2026. Playwright supports Chromium, Firefox, and WebKit, includes a built-in test runner, and offers excellent APIs for intercepting network requests. Its codegen tool can record user interactions and generate test code, which is a fast way to bootstrap coverage during a time-limited audit. Playwright is free, open-source, and maintained by Microsoft.

Cypress

A mature E2E framework with an interactive test runner that is especially useful for debugging. Cypress runs tests inside the browser, giving you direct access to the DOM and network layer. The tradeoff is that it only supports Chromium-based browsers (plus experimental Firefox support). The open-source version is free; the Cypress Cloud dashboard is a paid add-on for CI recording and analytics.

Assrt

An open-source AI testing framework built on top of Playwright. Assrt uses LLMs to crawl a running application, discover user flows, and generate complete Playwright test files. During an audit, this can save significant time: point Assrt at the application URL, let it discover the critical paths, then review and customize the generated tests. The output is standard .spec.ts files that you own and can run with npx playwright test, with no vendor dependency. Assrt also includes self-healing capabilities, so tests automatically adapt when selectors change.

Manual Recording Tools

Tools like Playwright's built-in recorder, Chrome DevTools Recorder, and Selenium IDE let you record browser interactions and export them as test code. These are useful for quickly capturing flows you discovered during the manual exploration phase of your audit. The generated code typically needs cleanup, but it is faster than writing every test from scratch.

Practical Tips for Auditing Vibe-Coded Applications

  • Check for hardcoded secrets. AI-generated code frequently includes API keys, database connection strings, and service credentials directly in the source. Run a secrets scanner (like gitleaks or trufflehog) before doing anything else.
  • Test with different user roles. AI often generates authorization checks for the admin role but forgets to restrict access for regular users. Create test accounts at every permission level and verify each endpoint respects the boundaries.
  • Look for orphaned routes and dead code. Vibe-coded apps accumulate unused components and API endpoints from earlier prompt iterations. These represent attack surface with no business justification. Flag them for removal.
  • Verify third-party dependencies. LLMs sometimes hallucinate package names or pin outdated versions with known CVEs. Run npm audit (or the equivalent for your package manager) and check that every dependency actually exists and is actively maintained.
  • Test offline and slow-network behavior. Use Playwright's network throttling or Cypress intercepts to simulate degraded conditions. Vibe-coded apps almost never handle these scenarios gracefully.
  • Establish a test baseline before any fixes. Run your full test suite against the unmodified codebase and record the results. This baseline documents the state of the application at the start of the audit and gives the client a clear before-and-after comparison.

Delivering an Audit That Lasts

The final deliverable of a vibe-code audit should include three things: a written report documenting findings and recommendations, an executable test suite that validates those findings, and a CI configuration that runs the tests automatically on every code change.

The written report alone is a snapshot in time. It becomes outdated the moment the next commit lands. The test suite, on the other hand, is a living document. It catches regressions automatically, communicates expected behavior through test names, and gives the development team confidence to make changes without fear of breaking existing functionality.

For vibe-coded applications specifically, this is especially important. The next round of AI-generated changes will introduce the same categories of bugs you just found. Without automated tests, the audit cycle repeats indefinitely. With them, each new change is validated against the established baseline before it reaches users.

Start your test suite in minutes

Assrt generates Playwright tests from a running app. Point it at your audit target and get a coverage baseline fast.

Get Started

Conclusion

Vibe-coded applications are here to stay. The speed of AI code generation makes it an attractive option for prototyping and even production development. But speed without verification produces fragile software. As an auditor, your job is to add the verification layer that AI skipped.

The approach is straightforward: explore the application as a user, identify critical flows, set up your testing infrastructure early, and write a test for every bug you find. Use tools like Playwright, Cypress, or Assrt to accelerate the process. Deliver executable tests alongside your written report. And make sure those tests run in CI so the safety net persists long after your engagement ends.

The goal is not to slow down vibe coding. It is to make it safe. Automated test coverage is how you get there.

Add test coverage to any app in minutes

Assrt discovers user flows, generates Playwright tests, and self-heals when your UI changes. Free and open-source.

$ npm install @assrt/sdk