AI Code Quality

Vibe Coding Is Breaking Your App: Build a Regression Testing Safety Net

"Vibe coding is breaking our app and I'm tired of it" topped r/webdev with thousands of upvotes. The developer described a pattern many teams now recognize: AI-generated code that looks correct, passes code review, and then silently breaks a payment flow three pages away. The fix is not to stop using AI. It is to add a safety net.

$0

Generates real Playwright code, not proprietary YAML. Open-source and free, versus competitors charging $7.5K/mo. Self-hosted, with no cloud dependency. The tests are yours to keep: zero vendor lock-in.

Assrt project philosophy

1. How Vibe Coding Creates Cascading Failures

"Vibe coding" describes the practice of generating code with AI tools (Cursor, Copilot, Claude, ChatGPT) by describing what you want in natural language and accepting the output with minimal review. The term, coined by Andrej Karpathy, captures the feeling of flowing with the AI rather than carefully engineering every line.

The problem is not that AI-generated code is bad. Modern LLMs produce syntactically correct, well-structured code most of the time. The problem is context. When you ask an AI to "add a discount field to the product page," it modifies the product component. What it does not know is that the checkout flow reads the product price from a shared state object, the invoice generator assumes a specific price format, and the analytics pipeline tracks revenue based on the pre-discount amount.

The AI makes a locally correct change that is globally destructive. The product page looks fine. The discount displays correctly. But checkout now calculates the wrong total, invoices show negative amounts for discounted items, and the revenue dashboard reports inflated numbers. These failures do not surface until a real customer hits the checkout flow, possibly days later.

This pattern is especially dangerous because the failure mode is silent. There is no crash, no error in the console, no red banner. The application works, just incorrectly. Traditional error monitoring (Sentry, Datadog) does not catch it because there is no exception to report. Only a test that verifies the end-to-end behavior of the checkout flow would catch the discrepancy.
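To make the failure mode concrete, here is a sketch of the kind of end-to-end assertion that catches it. The URL, selectors, test id, and expected total are all hypothetical placeholders; the point is that the test asserts the business-level number, not merely that a page renders.

```javascript
// Hypothetical Playwright check. The product page "looks fine" after the
// AI's change; only an assertion on the actual checkout total exposes
// the silent breakage downstream.
const { test, expect } = require('@playwright/test');

test('checkout total reflects the discount', async ({ page }) => {
  await page.goto('https://staging.example.com/products/widget');
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await page.goto('https://staging.example.com/checkout');

  // Not "does the page render" -- assert the number the customer is charged.
  await expect(page.getByTestId('order-total')).toHaveText('$18.00');
});
```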

2. The Minimum Viable Safety Net

You do not need 100% test coverage to catch the failures that vibe coding creates. You need tests for the 5 to 10 flows that, if broken, cost your business real money or real customers. For most web applications, this list is short and predictable.

The critical flows for a typical SaaS application: signup and login (if users cannot get in, nothing else matters), the primary value action (whatever the user pays for), the payment flow (signup to checkout to successful payment confirmation), and the settings/account page (where users manage billing, which directly affects revenue). For an e-commerce site, replace the "primary value action" with the product browsing and cart flow.

Five to ten well-written Playwright tests covering these flows provide an outsized return on investment. They run in under two minutes, they catch the most expensive categories of bugs, and they create a foundation you can expand over time. The goal is not comprehensive coverage. The goal is catching the failures that wake you up at 3 AM.

Getting these initial tests written is the hardest part because it requires someone to sit down and author them. This is where automated test generation tools help significantly. An open-source tool like Assrt can crawl your application, identify the critical user flows, and generate Playwright test files that cover them. You review the generated tests, adjust assertions if needed, and commit them. The time investment drops from days to hours.
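As a sketch, a minimal suite along these lines might look like the following, with one flow filled in. Every name, URL, and credential is a placeholder for your own application.

```javascript
// Skeleton of a minimal smoke suite. The login flow is filled in as an
// example; the other flows are stubs to complete for your own app.
const { test, expect } = require('@playwright/test');

test.describe('critical-path smoke suite', () => {
  test('signup and login', async ({ page }) => {
    await page.goto('https://staging.example.com/login');
    await page.getByLabel('Email').fill('smoke-user@example.com');
    await page.getByLabel('Password').fill(process.env.SMOKE_PASSWORD);
    await page.getByRole('button', { name: 'Log in' }).click();
    await expect(page).toHaveURL(/\/dashboard/);
  });

  // Stubs: Playwright's test.fixme marks them as known-incomplete.
  test.fixme('primary value action', async ({ page }) => {});
  test.fixme('checkout through payment confirmation', async ({ page }) => {});
  test.fixme('billing settings page', async ({ page }) => {});
});
```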

Your AI writes the code. Who tests it?

Assrt generates Playwright regression tests for your critical flows automatically. Catch the bugs that vibe coding creates before your customers do.

Get Started

3. Smoke Tests for the Flows That Make You Money

A smoke test is a quick, shallow check that a critical path works at all. It does not verify every edge case. It verifies that the happy path completes without errors. For payment flows, a smoke test walks through: select a product, add to cart, enter test payment details, submit, verify the confirmation page appears with the correct amount.

Stripe provides test card numbers (4242 4242 4242 4242) and a test mode that processes fake transactions. PayPal has sandbox accounts. Every major payment provider offers testing infrastructure. Your smoke test should use these, and it should run on every deployment to staging and production.

The specific assertions matter. Do not just check that the confirmation page renders. Check that the displayed amount matches what the user expected. Check that the order appears in the user's order history. Check that the receipt email contains the correct line items. These are the assertions that catch the "locally correct, globally broken" failures from vibe coding, where a component change silently corrupts the data flowing through the payment pipeline.
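Putting the two ideas together, a payment smoke test in Stripe test mode might look like this sketch. The 4242 card is Stripe's documented test card; the URLs, field placeholders, test ids, and amounts are assumptions about a hypothetical app (Stripe Elements renders card fields inside iframes whose names start with `__privateStripeFrame`).

```javascript
// Sketch of a happy-path payment smoke test against Stripe test mode.
const { test, expect } = require('@playwright/test');

test('happy-path checkout charges the expected amount', async ({ page }) => {
  await page.goto('https://staging.example.com/products/widget');
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await page.goto('https://staging.example.com/checkout');

  // Stripe's test card always succeeds in test mode.
  const card = page.frameLocator('iframe[name^="__privateStripeFrame"]');
  await card.getByPlaceholder('Card number').fill('4242 4242 4242 4242');
  await card.getByPlaceholder('MM / YY').fill('12/34');
  await card.getByPlaceholder('CVC').fill('123');
  await page.getByRole('button', { name: 'Pay' }).click();

  // Assert the business outcome, not just that a confirmation page rendered.
  await expect(page.getByText('Payment confirmed')).toBeVisible();
  await expect(page.getByTestId('charged-amount')).toHaveText('$19.00');

  // The order should also land in the user's history.
  await page.goto('https://staging.example.com/account/orders');
  await expect(page.getByText('Widget')).toBeVisible();
});
```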

For authentication flows, the smoke test should cover the complete cycle: register, verify email (if applicable), login, access a protected page, logout. If your app uses OAuth (Google, GitHub), test the redirect flow with mocked OAuth responses using Playwright's route interception. A broken login flow is an emergency regardless of what caused it, so catching it before deployment is worth the 30 seconds the test takes to run.
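One way to mock the OAuth dance with route interception is to fulfill the app's own session endpoint, so the test never leaves your domain. The endpoint path and response shape below are assumptions about a hypothetical app; adjust them to whatever your backend actually serves.

```javascript
// Sketch: stub the OAuth result via Playwright route interception so the
// test does not depend on a live identity provider.
const { test, expect } = require('@playwright/test');

test('mocked OAuth login grants dashboard access', async ({ page }) => {
  // Answer the app's session check as if the OAuth flow had succeeded.
  await page.route('**/api/auth/session', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ user: { email: 'smoke@example.com' } }),
    })
  );

  await page.goto('https://staging.example.com/dashboard');
  await expect(page.getByText('smoke@example.com')).toBeVisible();
});
```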

4. Integrating Regression Checks into AI Workflows

The most effective safety net runs automatically, not when someone remembers to trigger it. Your CI pipeline should run the smoke test suite on every pull request. If an AI-generated code change breaks the checkout flow, the PR gets a red check before it merges. The developer (or the AI tool) can then fix the issue before it reaches staging or production.

Modern AI coding tools are increasingly capable of running tests as part of their workflow. Cursor and Claude Code can execute Playwright tests, read the results, and iterate on their code until the tests pass. This creates a feedback loop where the AI generates code, tests catch regressions, and the AI fixes them, all before a human reviews the change. The key is having the tests exist in the first place.

For teams using Vercel, Netlify, or similar deployment platforms, run smoke tests against preview deployments. Every PR gets a preview URL, and your CI pipeline runs the Playwright suite against that URL. This catches environment-specific issues (broken environment variables, missing API endpoints) that local tests might miss.
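A CI job along these lines (GitHub Actions shown; the idea transfers to other CI systems) can run the suite against the preview deployment. How the preview URL reaches the job depends on your platform, so the `BASE_URL` wiring below is a placeholder.

```yaml
# Sketch of a PR-gating smoke job. BASE_URL wiring is platform-specific.
name: smoke
on: pull_request
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - run: npx playwright test --grep @smoke
        env:
          # Replace with however your deploy platform exposes the preview URL.
          BASE_URL: ${{ vars.PREVIEW_URL }}
```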

The execution time budget matters. If your smoke suite takes 10 minutes, developers will find ways to skip it. Keep the core smoke tests under 2 minutes by running them in parallel and focusing only on critical paths. Save comprehensive test suites for nightly runs. The fast suite catches the obvious regressions; the nightly suite catches the subtle ones.
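One way to enforce that split is in the Playwright config itself: a fast `smoke` project gated by a tag for PRs, and a `nightly` project for everything else. The project names and the `@smoke` tag convention are illustrative, not a fixed Playwright convention.

```javascript
// Sketch of a playwright.config.js that keeps the PR gate fast.
const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  fullyParallel: true, // run test files concurrently
  timeout: 15_000,     // a smoke test should fail fast, not hang
  projects: [
    { name: 'smoke', grep: /@smoke/ },         // PR gate: under 2 minutes
    { name: 'nightly', grepInvert: /@smoke/ }, // comprehensive, runs nightly
  ],
});
```

Run `npx playwright test --project=smoke` in CI on every PR, and `--project=nightly` on a schedule.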

5. Building Confidence Without Slowing Down

The objection to adding tests is always the same: "It will slow us down." This is true if testing means manually writing hundreds of test files. It is not true if testing means generating a focused smoke suite and running it automatically.

The math works out clearly. A team that deploys 10 times per week with no smoke tests will ship approximately 1 to 2 regressions per week to production (industry average for teams without E2E tests, per the Accelerate State of DevOps research). Each regression takes 2 to 8 hours to diagnose, fix, deploy, and communicate to affected customers. That is 2 to 16 hours per week spent on preventable fires.

A 2-minute smoke suite catches roughly 60% to 80% of these regressions before they reach production. The time saved on fire-fighting (roughly 1 to 13 hours per week) far exceeds the cost of waiting for tests to run: a few minutes of CI time per PR, accruing in the background while you do other work.
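Spelled out as a back-of-envelope calculation, using the article's assumed inputs (these are estimates, not measurements):

```javascript
// Back-of-envelope version of the numbers above.
const regressionsPerWeek = [1, 2];   // shipped without smoke tests (low, high)
const hoursPerRegression = [2, 8];   // diagnose, fix, deploy, communicate
const catchRate = [0.6, 0.8];        // fraction a smoke suite catches

const firefighting = [
  regressionsPerWeek[0] * hoursPerRegression[0], // best case: 2 h/week
  regressionsPerWeek[1] * hoursPerRegression[1], // worst case: 16 h/week
];
const saved = [
  firefighting[0] * catchRate[0], // ~1.2 h/week
  firefighting[1] * catchRate[1], // ~12.8 h/week
];
console.log(firefighting, saved); // [ 2, 16 ] [ 1.2, 12.8 ]
```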

The teams that ship fastest with AI coding tools are not the ones who accept every AI output blindly. They are the ones who have a safety net that lets them accept AI output confidently. Vibe coding with a regression suite is velocity. Vibe coding without one is a time bomb.

Start small. Generate smoke tests for your top 5 revenue paths. Hook them into CI. Let them run on every PR. You will likely catch your first vibe-coding regression within a week, and the 30 minutes spent setting up tests will save you a full day of debugging. That is not a prediction; it is what teams that have built this safety net consistently report.

Build the safety net your AI workflow needs

Assrt generates Playwright smoke tests for your critical flows. Point it at your app, get regression tests back, and ship AI-generated code with confidence.

Open-source. Free. Real Playwright code you own.