Testing Guide
Natural Language Testing: Automating Tests Without Code
How teams are using plain English to generate, run, and maintain end-to-end tests. From concept to implementation with real-world examples and generated Playwright code.
“In our experience, combining natural language authoring with AI-powered self-healing meaningfully reduces test maintenance effort.”
1. The Evolution of Test Automation
Test automation has gone through four distinct generations over the past two decades. Each generation solved problems from the previous one while introducing new tradeoffs. Understanding this progression helps explain why natural language testing is the logical next step.
Generation 1: Manual Testing
The earliest approach to QA was entirely manual. Testers followed written test plans, clicked through the application step by step, and recorded results in spreadsheets. This approach was thorough but could not scale. A complex application might require days of manual regression testing before each release, creating a bottleneck that slowed the entire development team.
Generation 2: Scripted Automation
Selenium arrived in 2004 and introduced the concept of programmatic browser control. Engineers could write scripts in Java, Python, or JavaScript that automated the same clicks and assertions a manual tester would perform. This was a massive improvement in speed and repeatability, but it required significant programming expertise. The scripts were brittle, breaking whenever the UI changed, and only engineers could write or maintain them.
Generation 3: Record and Playback
Tools like Selenium IDE, Cypress Studio, and Playwright Codegen attempted to lower the barrier by recording user interactions and converting them to test scripts. While this made test creation faster, the generated code was often difficult to maintain. Recorded tests used fragile selectors, lacked proper assertions, and produced code that was hard to read or refactor. Most teams treated recorded tests as a starting point and rewrote significant portions by hand.
Generation 4: AI and Natural Language
The current generation combines large language models with application context to enable a fundamentally different approach. Instead of writing code or recording interactions, you describe what you want to test in plain English. The AI understands your intent, maps it to the actual application structure, and generates production-quality test code. This is not a gimmick or a demo toy; teams are using this approach in production today to cut test authoring time by 80% or more.
2. What Is Natural Language Testing?
Natural language testing is the practice of writing test specifications in plain English (or any human language) and using AI to transform those specifications into executable test code. The core idea is that the person who best understands what needs to be tested (a product manager, a QA analyst, a designer) should be able to express that understanding directly, without learning a programming language or a testing framework's API.
Consider this example. A product manager writes: "Verify that a new user can sign up with an email address, receive a confirmation email, click the verification link, and land on the onboarding screen." In a traditional workflow, this requirement would be handed to an engineer who would spend an hour or more translating it into Playwright or Cypress code. With natural language testing, the AI reads that sentence, understands the intent, analyzes the application's UI to find the relevant elements, and generates the complete test.
The generated output is standard Playwright TypeScript. It is not a proprietary format locked into a specific vendor. You can review the code, edit it, commit it to your repository, and run it in any CI system. The AI accelerates the authoring phase while keeping you in full control of the final test.
This approach bridges the gap between business stakeholders and QA teams. Product managers can participate directly in test creation. QA engineers spend less time writing boilerplate and more time designing meaningful test strategies. Engineers focus on the complex edge cases that genuinely require programming expertise, rather than spending their time translating requirements into code.
3. How NLP Test Generation Works
Translating a natural language description into executable test code requires several steps. Modern NLP testing tools chain these together in a pipeline that runs in seconds.
Step 1: Intent Parsing
The system parses the English description to extract the user's intent: what actions should be performed, in what order, and what outcomes should be verified. For example, "log in with email test@example.com and password secret123" is parsed into four actions: navigate to the login page, fill the email field, fill the password field, and click the submit button.
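The action extraction above can be sketched as a tiny parser. This is an illustrative, regex-based toy (real tools use an LLM for this step); `parseLoginIntent` and the `Action` type are invented for the example:

```typescript
// A minimal, hypothetical sketch of intent parsing. Real NLP testing
// tools use a language model here; this regex version only handles
// the specific "log in with email ... and password ..." phrasing.
type Action =
  | { kind: 'navigate'; target: string }
  | { kind: 'fill'; field: string; value: string }
  | { kind: 'click'; target: string };

function parseLoginIntent(description: string): Action[] {
  const email = description.match(/email (\S+)/)?.[1];
  const password = description.match(/password (\S+)/)?.[1];
  const actions: Action[] = [{ kind: 'navigate', target: '/login' }];
  if (email) actions.push({ kind: 'fill', field: 'email', value: email });
  if (password) actions.push({ kind: 'fill', field: 'password', value: password });
  actions.push({ kind: 'click', target: 'submit' });
  return actions;
}

console.log(parseLoginIntent('log in with email test@example.com and password secret123'));
```

The output is an ordered list of four structured actions, which the later pipeline stages turn into concrete locators and code.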
Step 2: Application Context Analysis
The AI needs to know what your application looks like to generate accurate selectors. Tools like Assrt use a discovery phase to crawl the application and build a map of pages, forms, buttons, and navigation paths. This context is fed to the language model along with the parsed intent so the generated code targets real elements in your actual UI.
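As a rough sketch, a discovered page entry in the context file might look like the following. The `PageContext` shape and its field names are hypothetical illustrations, not Assrt's actual schema:

```typescript
// Hypothetical shape of one page's entry in a discovery context file.
// The real schema used by any given tool will differ.
interface PageContext {
  url: string;
  forms: { label: string; fields: string[] }[];
  buttons: { role: string; name: string }[];
  links: string[];
}

const loginPage: PageContext = {
  url: '/login',
  forms: [{ label: 'Sign in', fields: ['Email address', 'Password'] }],
  buttons: [{ role: 'button', name: 'Sign in' }],
  links: ['/register', '/forgot-password'],
};

// The parsed intent plus this structured context is what the model
// sees, so generated selectors target elements that actually exist.
console.log(JSON.stringify(loginPage.forms[0].fields));
```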
Step 3: Selector Generation
Using the application context, the system chooses the most resilient selector for each element. It follows the same locator hierarchy recommended by Playwright: getByRole first, then getByText, getByLabel, and getByTestId as a fallback. Because the AI understands the semantic structure of the page, it avoids brittle CSS selectors automatically.
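The fallback order can be sketched as a simple priority function. `ElementInfo` and `chooseLocator` are invented names for illustration; a real generator would work from richer accessibility-tree data:

```typescript
// Sketch of the locator-priority fallback described above:
// getByRole first, then getByText, getByLabel, getByTestId.
interface ElementInfo {
  role?: string;
  accessibleName?: string;
  text?: string;
  label?: string;
  testId?: string;
}

function chooseLocator(el: ElementInfo): string {
  if (el.role && el.accessibleName)
    return `getByRole('${el.role}', { name: '${el.accessibleName}' })`;
  if (el.text) return `getByText('${el.text}')`;
  if (el.label) return `getByLabel('${el.label}')`;
  if (el.testId) return `getByTestId('${el.testId}')`;
  throw new Error('no resilient locator available');
}

console.log(chooseLocator({ role: 'button', accessibleName: 'Add to cart' }));
// → getByRole('button', { name: 'Add to cart' })
```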
Step 4: Assertion Inference
The final step is determining what to assert. When the natural language description says "the user should see a success message," the AI maps this to an appropriate Playwright assertion like toBeVisible() or toContainText(). It uses auto-retrying assertions by default, ensuring the generated tests are resilient to timing variations.
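Assertion inference can be imagined as a mapping from outcome phrases to Playwright matchers. This keyword-based sketch is a stand-in for what is actually LLM-driven, and `inferAssertion` is a hypothetical helper:

```typescript
// Toy sketch of assertion inference: map an outcome phrase to the
// name of an auto-retrying Playwright assertion. Real tools infer
// this with a language model rather than keyword rules.
function inferAssertion(outcome: string): string {
  if (/should see|appears|is visible/.test(outcome)) return 'toBeVisible()';
  if (/contains|shows the text/.test(outcome)) return 'toContainText(...)';
  if (/url|redirected/.test(outcome)) return 'toHaveURL(...)';
  return 'toBeVisible()'; // default to a presence check
}

console.log(inferAssertion('the user should see a success message'));
```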
// Natural language input:
// "Add 'Wireless Headphones' to the cart and verify the cart count shows 1"
// Generated Playwright code:
import { test, expect } from '@playwright/test';
test('add wireless headphones to cart', async ({ page }) => {
  await page.goto('/shop');
  const productCard = page
    .getByRole('article')
    .filter({ hasText: 'Wireless Headphones' });
  await productCard.getByRole('button', { name: 'Add to cart' }).click();
  await expect(page.getByTestId('cart-count')).toHaveText('1');
});
4. Benefits Over Traditional Scripting
Natural language testing offers several concrete advantages over writing tests by hand. These are not theoretical; they are measurable improvements that teams report after adopting NLP-based tools.
Accessibility for Non-Engineers
Product managers, designers, and QA analysts can author test specifications directly. They do not need to learn Playwright's API, TypeScript syntax, or CSS selectors. This shifts test creation left, allowing the people closest to the requirements to define what needs to be verified. Engineering time is freed up for tasks that genuinely require programming expertise.
Faster Authoring
Writing a natural language description takes seconds. Generating the test code from that description takes seconds more. Compare this to the typical 15 to 30 minutes required to hand-write a well-structured Playwright test with proper locators, assertions, and error handling. Teams report an 80% reduction in time spent creating new tests.
Improved Maintainability
When tests are described in natural language, the intent is always clear. Six months from now, reading "verify that a user can reset their password via email" is far more informative than parsing through 40 lines of Playwright code. When the UI changes, regenerating the test from the same natural language description produces updated code that targets the new elements.
Living Documentation
The natural language test descriptions serve as up-to-date documentation of your application's expected behavior. Unlike separate requirements documents that drift out of sync with the code, these descriptions are tied directly to executable tests. If the description says the feature works a certain way and the test passes, you know the documentation is accurate.
5. Current Tools and Approaches
Several tools in the market offer some form of natural language or AI-assisted test generation. Each takes a different approach with distinct tradeoffs. Understanding these differences will help you choose the right tool for your team.
Assrt
Assrt is an open-source, local-first framework that generates standard Playwright tests from natural language descriptions. Its key differentiators are transparency and ownership. All AI inference runs locally or through your chosen LLM provider. The generated output is plain TypeScript files committed to your repository. There is no vendor lock-in, no proprietary format, and no runtime AI dependency. Assrt also includes auto-discovery (it crawls your app to build context) and self-healing (it detects broken selectors and submits fix PRs automatically).
Momentic
Momentic provides a cloud-based platform where you describe tests in natural language and the service generates and runs them. The approach is convenient for teams that want a managed solution, but the tradeoff is vendor lock-in. Tests run on Momentic's infrastructure, and the test logic lives in their proprietary format. If you decide to leave, you need to rewrite your tests.
Mabl
Mabl combines record-and-playback with AI-powered healing. While it does not offer full natural language authoring in the same way, its AI layer can automatically adapt tests when the UI changes. Mabl is a SaaS platform with per-seat pricing, making it better suited for larger organizations with dedicated QA budgets. The tests are stored in Mabl's cloud, and export options are limited.
Choosing Between Approaches
If you value transparency, portability, and the ability to run everything locally, Assrt is the strongest option. If you prefer a fully managed platform and are comfortable with SaaS lock-in, Momentic or Mabl may suit your workflow. The critical question to ask is: "Can I take my tests with me if I leave this tool?" If the answer is no, factor that risk into your evaluation.
6. Real-World Examples
The best way to understand NLP test generation is to see it in action. Below are four real-world examples showing the natural language input and the Playwright code that Assrt generates from it.
Example 1: User Registration Flow
// Natural language input:
// "Register a new account with name 'Jane Doe', email 'jane@test.com',
// and password 'Str0ngP@ss'. Verify the welcome screen appears
// with the user's first name."
// Generated Playwright code:
import { test, expect } from '@playwright/test';
test('register new account and see welcome screen', async ({ page }) => {
  await page.goto('/register');
  await page.getByLabel('Full name').fill('Jane Doe');
  await page.getByLabel('Email address').fill('jane@test.com');
  await page.getByLabel('Password').fill('Str0ngP@ss');
  await page.getByLabel('Confirm password').fill('Str0ngP@ss');
  await page.getByRole('button', { name: 'Create account' }).click();
  await expect(page).toHaveURL(/\/welcome/);
  await expect(
    page.getByRole('heading', { name: /welcome.*jane/i })
  ).toBeVisible();
});
Example 2: E-Commerce Search and Filter
// Natural language input:
// "Search for 'running shoes', filter by price under $100,
// and verify that all displayed products show prices below $100."
// Generated Playwright code:
import { test, expect } from '@playwright/test';
test('search and filter products by price', async ({ page }) => {
  await page.goto('/shop');
  await page.getByRole('searchbox', { name: 'Search products' })
    .fill('running shoes');
  await page.getByRole('searchbox', { name: 'Search products' })
    .press('Enter');
  // Apply price filter
  await page.getByRole('button', { name: 'Price' }).click();
  await page.getByLabel('Max price').fill('100');
  await page.getByRole('button', { name: 'Apply filters' }).click();
  // Verify all visible prices are under $100
  const prices = page.locator('[data-testid="product-price"]');
  const count = await prices.count();
  expect(count).toBeGreaterThan(0);
  for (let i = 0; i < count; i++) {
    const text = await prices.nth(i).textContent();
    const value = parseFloat(text!.replace('$', ''));
    expect(value).toBeLessThan(100);
  }
});
Example 3: Multi-Step Form Wizard
// Natural language input:
// "Complete the three-step onboarding wizard: select the 'Startup' plan,
// enter company name 'Acme Corp', invite team member 'bob@acme.com',
// and verify the dashboard loads with company name visible."
// Generated Playwright code:
import { test, expect } from '@playwright/test';
test('complete onboarding wizard', async ({ page }) => {
  await page.goto('/onboarding');
  // Step 1: Select plan
  await page.getByRole('radio', { name: 'Startup' }).check();
  await page.getByRole('button', { name: 'Continue' }).click();
  // Step 2: Company details
  await page.getByLabel('Company name').fill('Acme Corp');
  await page.getByRole('button', { name: 'Continue' }).click();
  // Step 3: Invite team
  await page.getByLabel('Email address').fill('bob@acme.com');
  await page.getByRole('button', { name: 'Send invite' }).click();
  await page.getByRole('button', { name: 'Finish setup' }).click();
  // Verify dashboard
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByText('Acme Corp')).toBeVisible();
});
Example 4: Error Handling and Validation
// Natural language input:
// "Try to submit the contact form with an invalid email address.
// Verify that an error message appears and the form is not submitted."
// Generated Playwright code:
import { test, expect } from '@playwright/test';
test('contact form shows validation error for invalid email', async ({ page }) => {
  await page.goto('/contact');
  await page.getByLabel('Name').fill('Test User');
  await page.getByLabel('Email').fill('not-an-email');
  await page.getByLabel('Message').fill('Hello, this is a test.');
  await page.getByRole('button', { name: 'Send message' }).click();
  // Verify error state
  await expect(page.getByText('Please enter a valid email')).toBeVisible();
  // Verify we're still on the contact page (form was not submitted)
  await expect(page).toHaveURL(/\/contact/);
});
7. Limitations and When Not to Use NLP Testing
Natural language testing is powerful, but it is not a silver bullet. There are scenarios where traditional scripted tests remain the better choice. Being honest about these limitations will help you adopt the technology effectively rather than over-applying it and getting frustrated.
Complex Conditional Logic
Tests that require intricate branching logic, loops over dynamic data sets, or complex calculations are difficult to express clearly in natural language. For example, "iterate over all items in the cart, verify each price matches the catalog, compute the subtotal, apply a 15% discount if the user has a promo code, and compare the final total to the displayed amount" is better expressed as code where each step is explicit and debuggable.
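For contrast, here is how the expected-total half of that scenario might look as plain code. `expectedTotal` and the data shapes are hypothetical, but they illustrate the point: each step is explicit, typed, and debuggable in a way a single English sentence cannot be:

```typescript
// Hypothetical helper computing the total a test would compare
// against the displayed amount. Names and the 15% promo rule come
// from the example description; the catalog lookup is a stand-in.
interface CartItem { sku: string; displayedPrice: number }

function expectedTotal(
  items: CartItem[],
  catalog: Map<string, number>,
  hasPromo: boolean
): number {
  let subtotal = 0;
  for (const item of items) {
    // Verify each displayed price against the catalog
    if (catalog.get(item.sku) !== item.displayedPrice) {
      throw new Error(`price mismatch for ${item.sku}`);
    }
    subtotal += item.displayedPrice;
  }
  // Apply a 15% discount when a promo code is present,
  // then round to cents to avoid floating-point drift
  const total = hasPromo ? subtotal * 0.85 : subtotal;
  return Math.round(total * 100) / 100;
}
```

A hand-written Playwright test would call this helper and assert the result against the total rendered on the page.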
Performance Testing
NLP testing tools are designed for functional end-to-end tests. They are not suitable for performance benchmarks, load testing, or stress testing. These scenarios require specialized tools like k6, Artillery, or Lighthouse that measure response times, throughput, and resource utilization under controlled conditions.
Highly Dynamic UIs
Applications with heavily randomized content (such as games, generative art tools, or real-time collaboration canvases) can be challenging for NLP test generation because the expected state is not easily described in a static sentence. The AI needs a relatively stable target to generate meaningful assertions.
Edge Cases and Security Testing
Security-focused tests (XSS injection, SQL injection, CSRF token validation) require specialized knowledge and precise payloads that are better handled by dedicated security testing tools like OWASP ZAP or Burp Suite. Similarly, obscure edge cases that require deep domain knowledge may need an engineer to craft the exact scenario.
The Right Balance
The most effective teams use natural language testing for the 70 to 80% of tests that cover standard user flows, form submissions, navigation paths, and CRUD operations. They reserve hand-written tests for the remaining 20 to 30% that involve complex logic, performance constraints, or security requirements. This hybrid approach maximizes productivity without sacrificing coverage or precision.
8. Getting Started with Assrt
Assrt makes it straightforward to go from zero to natural language testing in under ten minutes. Follow these steps to install the tool, discover your application's structure, and generate your first tests from plain English descriptions.
Step 1: Install Assrt
# Install globally via npm
npm install -g @assrt/sdk
# Or add to your project as a dev dependency
npm install --save-dev @assrt/sdk
# Verify the installation
assrt --version
Step 2: Discover Your Application
Point Assrt at your running application to build a map of all testable flows. The discovery process crawls your app, identifies pages, forms, navigation paths, and interactive elements, then generates a structured context file.
# Start your app (e.g., on localhost:3000)
npm run dev
# Run discovery in a separate terminal
assrt discover http://localhost:3000
# Output:
# Discovered 47 test scenarios across 12 pages
# - /login: email login, social login (Google, GitHub)
# - /register: multi-step registration
# - /dashboard: navigation, widgets, settings link
# - /shop: search, filter, add to cart, checkout
# - ... and 8 more pages
#
# Context saved to .assrt/context.json
Step 3: Generate Tests from English
With the context file in place, you can generate tests by writing plain English descriptions. Assrt uses the application context to produce accurate, well-structured Playwright code.
# Generate a single test from a description
assrt generate "Log in with email user@test.com and password test123, then verify the dashboard shows a welcome message"
# Generate multiple tests from a description file
cat > tests.txt << 'EOF'
Verify that searching for "laptop" shows at least 5 results
Add the first search result to the cart and verify cart count
Complete checkout with test credit card and verify confirmation
EOF
assrt generate --from tests.txt --output tests/e2e/
# Output:
# Generated 3 test files:
# - tests/e2e/search-results.spec.ts
# - tests/e2e/add-to-cart.spec.ts
# - tests/e2e/checkout-flow.spec.ts
Step 4: Run and Iterate
The generated tests are standard Playwright files. Run them with the regular Playwright test runner, inspect the results, and refine your descriptions if needed.
# Run the generated tests
npx playwright test tests/e2e/
# Run with the HTML reporter for detailed results
npx playwright test tests/e2e/ --reporter=html
# If a test needs adjustment, edit the description and regenerate
assrt generate "Log in with email user@test.com and password test123, then verify the dashboard heading contains the user's email address"
# Enable self-healing for ongoing maintenance
assrt heal --watch
Once your tests are generated and passing, commit them to your repository like any other test file. Assrt's self-healing will monitor for broken selectors and submit fix PRs automatically. The tests remain plain TypeScript, so your team can edit them by hand whenever needed. There is no lock-in; Assrt is a development tool, not a runtime dependency.