Testing Infrastructure

AI Tests Should Speak Your Framework's Language. Not Invent a New One.

When AI testing tools generate tests in proprietary formats, you end up maintaining two parallel test systems. That doubles your maintenance burden and splits your team's attention. The fix is simple: AI tools should output standard framework code that fits the infrastructure you already have.

63% of engineering teams that adopted AI testing tools later abandoned them because the generated tests could not integrate with their existing CI pipelines.

State of Testing Survey, 2025

1. The Island Problem: AI Tests That Live in Isolation

Most AI testing tools were built to solve one problem: generating test cases faster. They succeeded at that. What they failed at was connecting those generated tests to the rest of your engineering workflow. The result is a strange situation where you have a powerful AI producing tests that exist in a completely separate universe from the test suite your team has spent years building.

This isolation manifests in several ways. The generated tests run on the vendor's cloud infrastructure, not in your CI pipeline. They use proprietary selectors or YAML definitions instead of standard Playwright or Cypress locators. Their results show up in a separate dashboard rather than alongside your existing test reports. When a generated test fails, the debugging workflow is completely different from how your team debugs hand-written tests.

The fundamental issue is that these tools were designed as standalone products, not as components in an existing system. They optimize for an impressive demo (watch the AI write 50 tests in two minutes) while ignoring the harder question of how those 50 tests fit into the 500 tests you already have, the CI pipeline that runs them, and the monitoring that tracks their reliability over time.

2. The Nightmare of Maintaining Two Test Systems

Running two parallel test systems is not twice the work. It is significantly more than twice the work, because of the coordination overhead. You need to ensure both systems cover the right areas without excessive overlap. You need separate debugging workflows for each. You need to reconcile conflicting results when one system passes and the other fails. You need to train your team on two different tools, two different reporting formats, and two different failure triage processes.

The coordination cost is especially painful around flaky tests. Every test suite has some degree of flakiness, and managing it requires consistent tooling: retry policies, quarantine mechanisms, historical pass rate tracking. When your AI-generated tests run on a separate platform with its own retry logic and reporting, you lose the ability to manage flakiness holistically. A test that is flaky in the vendor's cloud might be stable in your CI environment (or vice versa), and you have no way to correlate the two.
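When every test, generated or hand-written, runs under the same framework, flakiness policy lives in one place. As a minimal sketch (the specific settings here are illustrative, not a recommendation), a single Playwright config can apply one retry policy and one reporting pipeline to the whole suite:

```typescript
// playwright.config.ts — sketch of a single, suite-wide flakiness policy.
// Because AI-generated specs are ordinary Playwright files, these retry and
// reporting settings apply to them automatically, alongside hand-written tests.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // One retry policy for every test, regardless of who (or what) wrote it.
  retries: process.env.CI ? 2 : 0,
  reporter: [
    ["html"], // one report covering generated and hand-written tests together
    ["json", { outputFile: "test-results/results.json" }], // feeds pass-rate tracking
  ],
});
```

With a vendor platform in the mix, none of this applies to the generated half of your suite, which is exactly the holistic-management gap described above.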

Teams that adopt proprietary AI testing tools alongside their existing framework inevitably face a choice: migrate everything to the vendor's platform (accepting vendor lock-in and abandoning years of investment in existing infrastructure) or maintain both systems indefinitely (accepting the ongoing coordination cost). Neither option is appealing. The better path is to avoid the split in the first place by choosing AI tools that output standard framework code.


3. Why Standard Framework Output Changes Everything

When an AI testing tool outputs a standard Playwright test file, that file is just code. It runs with the same npx playwright test command your team already uses. It shows up in the same CI job. It uses the same configuration, the same base URL environment variables, the same authentication helpers. Your existing retry policies, parallelization settings, and reporting integrations all apply automatically.
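To make "just code" concrete, here is the kind of file such a tool would emit. This is an illustrative sketch, not actual Assrt output; the route, labels, and credentials are hypothetical:

```typescript
// tests/e2e/sign-in.spec.ts — a generated test that is plain Playwright.
// It runs under `npx playwright test` with the project's existing config:
// same baseURL, same retries, same reporters as every hand-written spec.
import { test, expect } from "@playwright/test";

test("user can sign in and reach the dashboard", async ({ page }) => {
  await page.goto("/login"); // resolved against the shared baseURL
  await page.getByLabel("Email").fill("user@example.com");
  await page.getByLabel("Password").fill("correct horse battery staple");
  await page.getByRole("button", { name: "Sign in" }).click();
  await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
});
```

Nothing in the file reveals that a tool generated it, which is precisely the point.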

This compatibility extends to the entire development workflow. When a generated test fails, your developers debug it the same way they debug any other test: they read the Playwright trace, check the screenshot, maybe add a page.pause() and step through it locally. There is no context switching to a different tool or dashboard. The generated test is indistinguishable from a hand-written one, which means your team's existing expertise applies directly.

Standard output also preserves optionality. If you decide to switch AI testing tools later, the tests you have already generated are just Playwright files. They keep working regardless of which tool generated them. Compare this to proprietary formats where switching vendors means re-creating your entire AI-generated test suite from scratch. The test files themselves become an asset you own, not a dependency you rent.

4. Comparing Approaches: Proprietary YAML vs. Standard Playwright

Proprietary YAML definitions. Tools like Momentic use custom YAML or JSON schemas to define test steps. You describe what you want to test in their format, and their engine interprets it at runtime. This approach offers simplicity for non-technical users and allows the vendor to optimize execution behind the scenes. The downside is significant: your tests only run on their platform, you cannot extend them with custom JavaScript logic, and the YAML definitions are opaque when debugging failures. Some tools in this category are also limited to Chrome, which means you cannot run cross-browser tests without additional tooling.

Managed services with closed-source execution. Companies like QA Wolf provide a fully managed testing service where their team (augmented by AI) writes and maintains tests for you. At $7,500 per month or more, this approach works for organizations that want to outsource QA entirely. The tradeoff is that you have limited visibility into how tests work, limited ability to customize them, and a hard dependency on an external team's availability and priorities.

Standard Playwright output (open source). Tools like Assrt take a different approach: they crawl your application, discover testable scenarios automatically, and generate standard Playwright test files that you commit to your repository. The generated files use real Playwright APIs with self-healing selectors, so they run anywhere Playwright runs. Because Assrt is open source, you can inspect exactly how tests are generated, modify the generation logic to fit your conventions, and run everything on your own infrastructure with zero vendor dependency.
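The self-healing idea can be sketched in plain TypeScript. This illustrates the general technique (ranked candidate selectors with a runtime fallback), not Assrt's actual implementation; every name below is invented:

```typescript
// Sketch of self-healing selectors: a generated test carries several
// candidate selectors per element, ranked by expected stability. At runtime,
// the first candidate that still matches the page wins, so a renamed class
// or a moved wrapper does not break the test.
type SelectorProbe = (selector: string) => boolean;

function resolveSelector(candidates: string[], matches: SelectorProbe): string {
  for (const candidate of candidates) {
    if (matches(candidate)) return candidate;
  }
  throw new Error(`No candidate selector matched: ${candidates.join(", ")}`);
}

// Example: the data-testid survived a refactor even though the class changed.
const currentDom = new Set(['[data-testid="checkout"]', "button.btn-primary"]);
const chosen = resolveSelector(
  [".checkout-button", '[data-testid="checkout"]', "text=Checkout"],
  (selector) => currentDom.has(selector),
);
console.log(chosen); // → [data-testid="checkout"]
```

Because the fallback list lives inside an ordinary code file, you can read it, reorder it, or prune it in code review like anything else in the repository.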

The choice between these approaches depends on your team's priorities. If you value control, portability, and integration with existing infrastructure, standard framework output is the clear winner. If you value managed convenience and are comfortable with vendor lock-in, proprietary platforms have their place. Most engineering teams building serious products lean toward the former, because the cost of lock-in compounds over time while the convenience premium fades as the team builds internal expertise.

5. Practical Integration: Making AI Tests Part of Your Pipeline

Integrating AI-generated tests into an existing pipeline requires treating them exactly like hand-written tests. Store them in the same directory structure, subject them to the same code review process, and run them in the same CI jobs. If your team uses a tests/e2e/ directory with a naming convention like *.spec.ts, the AI tool should output files that follow that convention. Any deviation creates friction that will eventually cause your team to treat AI-generated tests as second-class citizens.
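In Playwright terms, that convention lives in the config file, so a generated file that lands anywhere else simply never runs. A minimal sketch (the directory and pattern mirror the example above):

```typescript
// playwright.config.ts — sketch: one directory, one naming convention.
// Generated files must follow the same convention to be picked up at all,
// which keeps them first-class citizens of the existing suite.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "tests/e2e",
  testMatch: "**/*.spec.ts",
});
```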

The review process matters. AI-generated tests should go through pull requests just like any other code change. This gives your team the opportunity to evaluate whether the generated scenarios are meaningful, whether the assertions are correct, and whether the test adds value to the suite or just adds noise. Treating generated tests as automatically trustworthy is a mistake that leads to bloated, unreliable test suites.

Consider setting up a periodic generation workflow rather than a continuous one. Run the AI discovery tool weekly or after significant feature changes, review the generated tests as a batch, and merge the ones that add meaningful coverage. This cadence gives your team control over suite growth while still benefiting from AI-powered scenario discovery. It also makes it easier to track which generated tests prove stable over time and which need human refinement.

The end goal is a unified test suite where you cannot tell (and do not need to tell) which tests were written by a human and which were generated by AI. When both types use the same framework, the same patterns, and the same infrastructure, the distinction becomes irrelevant. What matters is coverage, reliability, and speed. The tool that generated the test is just an implementation detail.

Ready to automate your testing?

Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.

$ npm install @assrt/sdk