How to Test AI-Generated Code: Automated E2E Testing for Vibe Coding Projects
AI agents can write code faster than any human. But testing is still the step most people skip. If your verification strategy stops at "it compiles," you are shipping bugs. Here is how browser-based E2E testing catches what unit tests and type checkers miss.
“Generates standard Playwright test files you can inspect, modify, and run in any CI pipeline. No vendor lock-in.” (Assrt)
1. The Testing Gap in AI-Assisted Development
The software engineering process does not fundamentally change because an agent is writing the code. You still need requirements, implementation, review, and verification. But when AI handles the implementation step, something interesting happens: the bottleneck shifts. Code generation becomes nearly instant. Review becomes harder because you did not write the code yourself. And testing, the step that was already the first thing most teams cut, gets skipped entirely.
This is the core problem with vibe coding workflows today. Tools like Cursor, Bolt, Lovable, and Copilot make it trivially easy to generate a working frontend in minutes. The code compiles. The page renders. The demo looks great. But nobody verified that the form actually submits data correctly, that the error states render properly, that the auth flow works across browsers, or that the checkout page handles network failures gracefully.
The result is a new class of software that looks finished but is not tested. The surface area of untested behavior grows with every prompt. And because the developer did not write the code line by line, they lack the intuitive understanding of where the fragile points are. When something breaks in production, debugging AI-generated code you only half-understand is significantly harder than debugging code you wrote yourself.
2. Why Unit Tests Are Not Enough for AI-Generated Frontends
The instinctive response to "we need more testing" is to write unit tests. And for backend logic, data transformations, and utility functions, unit tests are still the right tool. But for AI-generated frontend code, unit tests have a fundamental limitation: they test components in isolation, not in the browser where users actually interact with them.
Consider a signup form generated by an AI coding tool. A unit test might verify that the validation function returns the correct error message for an invalid email. That is useful. But it will not catch that the error message renders behind the keyboard on mobile, that the submit button remains disabled after correcting the input, that the password visibility toggle breaks the form layout in Safari, or that pasting a long email from a password manager overflows the input container.
These are not edge cases. They are the everyday failures that users encounter, and they only surface when you test the actual rendered output in a real browser. AI-generated code is particularly prone to these issues because LLMs optimize for making the happy path work. The model does not test its own output in a browser. It generates markup and logic that is syntactically correct and pattern-matched from training data, but it has no concept of how the result actually looks or behaves when rendered.
Type checkers and linters catch another slice of issues (missing props, incorrect types, unused imports) but they say nothing about whether the UI works correctly from the user's perspective. The verification you need is something that can open a browser, interact with the application the way a user would, and assert against the actual UI state.
3. Browser-Based E2E Testing as the Missing Layer
End-to-end testing with a real browser is the layer that closes the verification gap. An E2E test launches your application, navigates to a page, performs actions (clicking buttons, filling forms, scrolling), and asserts that the resulting UI state matches expectations. It catches the entire class of bugs that only manifest when HTML, CSS, JavaScript, network requests, and browser rendering all interact together.
For vibe-coded projects, E2E tests serve a second purpose: they act as a specification of what "working" means. When you generate code with an AI agent, the prompt is your intent but the rendered output is the reality. An E2E test that crawls your application and verifies each page loads, each button is clickable, and each form submits correctly creates an objective record of what the application actually does. This is enormously valuable when you are iterating through multiple AI-generated versions and need to verify that a new change did not break something that was working before.
Playwright has become the standard tool for this kind of testing. It supports Chromium, Firefox, and WebKit, runs headlessly in CI pipelines, and has excellent APIs for intercepting network requests, handling authentication, and taking screenshots for visual comparison. The challenge has always been writing and maintaining the tests, which is where automation enters the picture.
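A minimal `playwright.config.ts` shows how little setup the cross-browser and CI features require. The `baseURL` and test directory are assumptions for your project.

```typescript
import { defineConfig, devices } from '@playwright/test';

// Minimal cross-browser config; baseURL is an assumption for your app.
export default defineConfig({
  testDir: './tests',
  retries: process.env.CI ? 2 : 0, // retry flaky tests only in CI
  use: {
    baseURL: 'http://localhost:3000',
    trace: 'on-first-retry', // capture a debugging trace when a test retries
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});
```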
4. Approaches Compared: Manual, Proprietary, and AI-Powered
There are broadly three approaches to getting E2E test coverage for your application, each with different tradeoffs in cost, speed, and flexibility.
Manual Playwright (or Cypress, or Selenium). You write tests by hand. This gives you full control over what gets tested and how. The tests are real code in your repository, they run in your CI pipeline, and you can debug them with standard developer tools. The downside is time. Writing comprehensive E2E tests for a full application takes days or weeks. Maintaining them as the UI changes takes even longer. For teams shipping fast with AI-generated code, the manual approach often cannot keep pace with the rate of UI changes.
Proprietary testing platforms. Tools like Momentic and QA Wolf offer managed testing services. They handle test creation, execution, and maintenance. The tradeoff is vendor lock-in and cost. Momentic uses proprietary YAML instead of standard test code, supports only Chrome, and locks you into their platform. QA Wolf provides human-maintained tests but starts at $7,500 per month, which is prohibitive for most startups and indie developers. Both approaches mean your tests are not portable; if you leave the platform, you start over.
AI-powered test generation. A newer category of tools uses AI to automatically discover test scenarios and generate real test code. The idea is to crawl your application, identify interactive elements and user flows, and produce Playwright (or similar) test files that you own and can modify. Assrt is one example of this approach: you point it at your running application with npx @assrt-ai/assrt discover https://your-app.com, and it generates standard Playwright test files that you can inspect, edit, and run in any CI pipeline. Because the output is real Playwright code (not proprietary YAML), there is no lock-in. Other tools in this space include Playwright's own codegen recorder and various GPT-based test generators.
The AI-powered approach is particularly well-suited to vibe coding workflows because it matches the speed of code generation. If you can generate a new feature in five minutes, you need a testing approach that can generate coverage in a similar timeframe. Manual test writing cannot do this. AI-generated tests, while not perfect, provide a baseline of coverage that catches obvious regressions and gives you a foundation to build on.
5. Building a Testing Workflow for Vibe-Coded Projects
A practical testing workflow for AI-generated code does not require perfection. It requires coverage that grows with your application and catches regressions before users do. Here is a concrete approach that works whether you are a solo developer or a small team.
First, establish a baseline. Run an automated discovery tool against your deployed application to generate E2E tests for every page and user flow that currently exists. This gives you regression coverage for free. Every future change that breaks an existing flow will be caught.
Second, add targeted tests for critical paths. Automated discovery catches navigation and basic interactions, but your checkout flow, authentication, and data mutation endpoints deserve hand-written or hand-tuned tests that verify specific business logic. Use the auto-generated tests as a starting point and extend them with assertions specific to your domain.
Third, integrate tests into your CI pipeline. Tests that do not run automatically are tests that stop running. Configure your pipeline to execute E2E tests on every pull request. Tools like Playwright support parallel execution across multiple browsers and produce trace files for debugging failures. Self-healing selectors (available in tools like Assrt and others) reduce maintenance overhead when AI-generated UI changes cause selector drift.
Fourth, re-run discovery after significant changes. When you generate a new feature or substantially refactor existing pages, run the discovery process again to pick up new flows and update existing coverage. This keeps your test suite aligned with the actual state of the application rather than a snapshot of how it looked weeks ago.
The pattern that emerges is simple: generate code fast, generate tests fast, verify in a real browser, and keep the feedback loop tight. The developers and teams that succeed with vibe coding will be the ones who treat testing as an equal partner to code generation, not an afterthought. The tools exist to make this practical. The question is whether you use them.
Ready to automate your testing?
Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.