Feature Flag Testing Guide
How to Test LaunchDarkly Flags with Playwright: Complete 2026 Guide
A scenario-by-scenario walkthrough of testing LaunchDarkly feature flags with Playwright. Client-side evaluation, targeting rules, boolean and multivariate variations, streaming vs polling, the test data API, and the pitfalls that break real feature flag test suites.
“LaunchDarkly evaluates over ten billion feature flags per day across thousands of customers, making it the most widely deployed feature management platform in production.”
LaunchDarkly Client-Side Flag Evaluation Flow
1. Why Testing LaunchDarkly Flags Is Harder Than It Looks
Feature flags look simple on the surface: a boolean that toggles a code path. In practice, LaunchDarkly flags introduce several layers of indirection that make end-to-end testing genuinely difficult. The SDK evaluates flags client-side using a context object (user key, email, custom attributes), and the evaluation result depends on targeting rules, percentage rollouts, and prerequisite flags configured in the LaunchDarkly dashboard. Your test needs to control all of those inputs to get deterministic output.
The first structural challenge is timing. The LaunchDarkly JavaScript SDK initializes asynchronously. When your React app mounts, it calls LDClient.initialize() with a client-side ID and a context object. The SDK fetches the evaluated flag values from LaunchDarkly's edge CDN, and only then does your app know which variation to render. If your Playwright test checks the DOM before the SDK has initialized, you will see the default (fallback) value, not the actual flag value. This race condition is the most common source of flaky flag tests.
The second challenge is evaluation context. LaunchDarkly evaluates flags per context: a combination of user key, anonymous flag, custom attributes like plan or country, and multi-context kinds (user, organization, device). If your test does not control the exact context the SDK sends, you cannot predict which variation the flag will return. Percentage rollouts hash the context key to assign a bucket, so even a different test user email changes the outcome.
Third, streaming adds real-time complexity. In streaming mode (the default for browser SDKs), the SDK opens a Server-Sent Events connection to LaunchDarkly. When someone changes a flag in the dashboard mid-test, the SDK receives the update and triggers a re-render. This is powerful for production, but in tests it means your flag values can change underneath you if another developer modifies the test environment. Polling mode avoids the real-time updates but introduces its own timing issues because flag changes only take effect after the next poll interval.
Fourth, the LaunchDarkly test data source (the recommended approach for unit and integration tests) works differently from the production SDK initialization path. Tests that pass with the test data source may fail in production because of subtle differences in initialization timing, event flushing, and context evaluation. Your end-to-end tests need to cover both the happy path with real SDK initialization and the controlled path with test data.
LaunchDarkly SDK Initialization Flow: App Mounts (React component renders) → SDK Init (LDClient.initialize()) → Fetch Flags (GET from LD edge CDN) → Evaluate (client-side evaluation) → Open Stream (SSE for real-time updates) → Render (UI shows correct variation)
Flag Evaluation Decision Tree: Context (user key + attributes) → Prerequisites (check prerequisite flags) → Individual Targets (exact user key match) → Targeting Rules (attribute-based rules) → Percentage Rollout (hash-based bucketing) → Default (fallback variation)
A thorough LaunchDarkly test suite must account for all of these surfaces. The sections below walk through each scenario with runnable Playwright TypeScript you can paste directly into your project.
2. Setting Up a Reliable Test Environment
Before writing any test scenarios, configure an isolated LaunchDarkly environment for testing. LaunchDarkly supports multiple environments per project (Development, Staging, Production). Create a dedicated test environment that your CI pipeline uses exclusively. This prevents developers from accidentally changing flag values that your tests depend on.
LaunchDarkly Test Environment Setup Checklist
- Create a dedicated 'test' environment in your LaunchDarkly project
- Note the client-side ID and SDK key for the test environment
- Create an API access token with writer role for the test environment only
- Configure all feature flags with explicit default values in the test environment
- Disable percentage rollouts in the test environment (use individual targeting instead)
- Set up a service account context key for deterministic evaluation
- Install the LaunchDarkly Node.js server SDK for test setup scripts
- Add test environment variables to your CI secrets
Environment Variables
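A small startup check keeps a missing secret from surfacing later as a cryptic SDK error. The variable names below (LD_CLIENT_SIDE_ID, LD_API_TOKEN, LD_PROJECT_KEY, LD_ENV_KEY, APP_BASE_URL) are illustrative assumptions — align them with whatever your CI secrets are actually called.

```typescript
// env.ts — fail fast when a required LaunchDarkly test variable is missing.
export function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Variable names here are assumptions — match them to your CI secrets.
export const ldTestEnv = () => ({
  clientSideId: requireEnv('LD_CLIENT_SIDE_ID'), // browser SDK init
  apiToken: requireEnv('LD_API_TOKEN'),          // REST API writer token
  projectKey: requireEnv('LD_PROJECT_KEY'),
  envKey: requireEnv('LD_ENV_KEY'),              // e.g. 'test'
  appBaseUrl: requireEnv('APP_BASE_URL'),
});
```

Call ldTestEnv() once in your Playwright global setup so a misconfigured pipeline fails before any test runs.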
REST API Helper for Flag State Management
The LaunchDarkly REST API lets you programmatically toggle flags, update targeting rules, and add individual targets before each test. This is essential for deterministic test setup. Use semantic patch operations to modify flag configurations without overwriting the entire flag definition.
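One possible shape for that helper, matching the addUserTarget(flagKey, variationIndex, userKey) signature used in the test examples in this guide. The semantic patch Content-Type and the addTargets instruction follow the LaunchDarkly REST API docs, but treat the exact instruction kind as an assumption to verify against the current API reference; the fetch calls assume a Node 18+ global fetch.

```typescript
// helpers/ld-api.ts — sketch of a semantic-patch helper for individual
// user targets. Assumes env vars LD_API_TOKEN, LD_PROJECT_KEY, LD_ENV_KEY.
const LD_API = 'https://app.launchdarkly.com/api/v2';

// Pure payload builder: unit-testable without any network access.
export function buildAddTargetPatch(envKey: string, variationId: string, userKey: string) {
  return {
    environmentKey: envKey,
    instructions: [
      { kind: 'addTargets', contextKind: 'user', variationId, values: [userKey] },
    ],
  };
}

export async function addUserTarget(
  flagKey: string,
  variationIndex: number,
  userKey: string
): Promise<void> {
  const token = process.env.LD_API_TOKEN!;
  const project = process.env.LD_PROJECT_KEY!;
  const envKey = process.env.LD_ENV_KEY ?? 'test';

  // Resolve the variation index to its stable _id so the patch is not
  // broken by someone reordering variations in the dashboard.
  const flagRes = await fetch(`${LD_API}/flags/${project}/${flagKey}`, {
    headers: { Authorization: token },
  });
  const flag = await flagRes.json();
  const variationId = flag.variations[variationIndex]._id;

  await fetch(`${LD_API}/flags/${project}/${flagKey}`, {
    method: 'PATCH',
    headers: {
      Authorization: token,
      // The domain-model parameter selects semantic patch over JSON Patch.
      'Content-Type': 'application/json; domain-model=launchdarkly.semanticpatch',
    },
    body: JSON.stringify(buildAddTargetPatch(envKey, variationId, userKey)),
  });
}
```

A matching removeTargets instruction in an afterEach hook keeps individual targets from leaking between tests (see the pitfalls section).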
Playwright Configuration for LaunchDarkly
LaunchDarkly SDK initialization is asynchronous, so your Playwright tests need to wait for the SDK to be ready before asserting on flag-dependent UI. The most reliable pattern is to expose a global ready signal from your app and wait for it in your test setup.
3. Scenario: Boolean Flag Controls a UI Component
The most common flag type is a simple boolean that shows or hides a UI component. For example, a new-dashboard flag that controls whether users see the redesigned dashboard or the legacy version. The test must verify both variations render correctly and that the toggle is responsive to flag changes.
Boolean Flag Controls a UI Component
Difficulty: Straightforward

Goal
Configure a boolean flag to true for a specific test user, load the app, and verify the flag-gated component renders. Then toggle the flag to false and verify the component disappears.
Preconditions
- App running at APP_BASE_URL with LD SDK initialized
- Feature flag new-dashboard exists in the test environment
- Test user key is deterministic and known
Playwright Implementation
What to Assert Beyond the UI
- Verify the LD SDK sent an analytics event for the flag evaluation by intercepting network requests to events.launchdarkly.com
- Confirm the evaluation reason in the event payload matches your expected targeting rule
- Check that no console errors related to LD initialization appear during the test
Boolean Flag: Playwright vs Assrt
import { test, expect } from '@playwright/test';
import { addUserTarget } from '../helpers/ld-api';
test('new dashboard visible when flag on', async ({ page }) => {
await addUserTarget('new-dashboard', 0, 'e2e-test-user-001');
await page.goto('/dashboard');
await page.waitForFunction(
() => (window as any).__LD_READY__ === true
);
await expect(
page.getByTestId('new-dashboard-panel')
).toBeVisible();
});

4. Scenario: Multivariate Flag with String Variations
Not all flags are simple on/off toggles. LaunchDarkly supports multivariate flags that return strings, numbers, or JSON objects. A common pattern is a string flag that selects between multiple UI themes or checkout flow versions: "control", "variant-a", and "variant-b". Testing these requires iterating over every variation and verifying each one renders the correct UI.
Multivariate Flag with String Variations
Difficulty: Moderate

Goal
Test a multivariate string flag checkout-flow-version that has three variations. Verify each variation renders the correct checkout experience.
Preconditions
- Flag checkout-flow-version has three variations: "control", "variant-a", "variant-b"
- Each variation maps to a distinct UI layout
- Test user has no conflicting individual targeting from previous runs
Playwright Implementation
What to Assert Beyond the UI
- Intercept the LD analytics event and verify the value field matches the expected variation string
- Confirm the variationIndex in the event payload matches what you set via the API
- Check that analytics and conversion tracking fire correctly for each variation (this validates your experiment setup)
5. Scenario: Targeting Rules and User Segments
Targeting rules are where LaunchDarkly flags get complex. A single flag can have multiple rules that evaluate in order: "if user is in segment 'beta-testers' serve variation 1, else if user attribute 'plan' equals 'enterprise' serve variation 2, else serve the default." Testing this requires creating contexts with specific attributes and verifying each rule fires correctly.
Targeting Rules and User Segments
Difficulty: Complex

Goal
Verify that a flag with multiple targeting rules serves the correct variation based on user attributes. Test the rule evaluation order and confirm that the first matching rule wins.
Playwright Implementation
What to Assert Beyond the UI
- Intercept the LD event stream and verify the reason field shows RULE_MATCH with the correct ruleIndex
- For the free user, confirm the reason is FALLTHROUGH, not OFF (which would indicate the flag is globally disabled)
- Verify that attribute-based rules evaluate in the documented order by testing a user who matches multiple rules
Targeting Rules: Playwright vs Assrt
import { test, expect } from '@playwright/test';
test('enterprise user gets premium features', async ({ page }) => {
await page.goto('/login');
await page.getByLabel('Email').fill('enterprise@yourapp.com');
await page.getByLabel('Password').fill('TestPass123!');
await page.getByRole('button', { name: /sign in/i }).click();
await page.waitForURL('/dashboard');
await page.waitForFunction(
() => (window as any).__LD_READY__ === true
);
await expect(page.getByTestId('premium-analytics')).toBeVisible();
await expect(page.getByTestId('premium-exports')).toBeVisible();
await expect(page.getByTestId('premium-api-access')).toBeVisible();
});

6. Scenario: Real-Time Flag Changes via Streaming
One of LaunchDarkly's most powerful features is real-time flag updates via Server-Sent Events (SSE). When a flag changes in the dashboard, the browser SDK receives the update within milliseconds and triggers a re-render. Testing this behavior requires changing a flag value while the page is loaded and verifying the UI updates without a page refresh. This is also the scenario most likely to introduce flakiness if not handled carefully.
Real-Time Flag Changes via Streaming
Difficulty: Complex

Goal
Load the app with a flag set to one variation, then change the flag via the LaunchDarkly API while the page is open. Verify the UI updates in real time without a page reload.
Playwright Implementation
What to Assert Beyond the UI
- Confirm the SSE connection to clientstream.launchdarkly.com remains open throughout the test
- Verify no full page reload occurred (check performance.navigation.type is still 0)
- Measure the latency between the API call and the UI update to catch regressions in your app's flag change handler
7. Scenario: Using the Test Data Source for Deterministic Tests
LaunchDarkly provides a TestData source that lets you create a fully local, in-process flag evaluation engine. Instead of connecting to LaunchDarkly's servers, the SDK evaluates flags against data you define in your test setup. This eliminates network latency, rate limits, and cross-test interference. It is the recommended approach for integration tests and can also be used for end-to-end tests when you control the app's initialization.
Test Data Source for Deterministic Tests
Difficulty: Moderate

Goal
Replace the real LaunchDarkly SDK connection with the TestData source. Define flag values programmatically and verify evaluation behavior without any network calls.
Server-Side Test Data Setup
Playwright Integration with Test Data
What to Assert Beyond the UI
- When using route interception, verify the SDK does not fall back to cached values from a previous test run
- Confirm that the bootstrapped flag values match what the production SDK would return for the same context
- Check that event tracking still works correctly (or is properly disabled) when using test data
Test Data Source: Playwright vs Assrt
import { test, expect } from '@playwright/test';
test('override flags via route interception', async ({ page }) => {
await page.route('**/sdk/evalx/**', async (route) => {
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({
'new-dashboard': {
value: true, variation: 0, version: 100,
reason: { kind: 'FALLTHROUGH' },
},
}),
});
});
await page.goto('/dashboard');
await page.waitForFunction(
() => (window as any).__LD_READY__ === true
);
await expect(page.getByTestId('new-dashboard-panel')).toBeVisible();
});

8. Scenario: Testing Fallback Behavior When LD Is Unavailable
What happens when LaunchDarkly is down? Your app should degrade gracefully, falling back to default flag values. This is a critical scenario that most teams skip. The LaunchDarkly SDK caches flag values in localStorage (browser) or in memory (server-side), so a temporary outage should not break your app. But you need to verify that the fallback values are correct and that the app renders a reasonable experience.
Fallback Behavior When LD Is Unavailable
Difficulty: Moderate

Goal
Block all LaunchDarkly network requests and verify the app uses the fallback values defined in your code. Confirm no crashes, no blank screens, and no broken layouts.
Playwright Implementation
What to Assert Beyond the UI
- Verify localStorage contains the cached LD flag data after the first successful load
- Confirm the SDK emits an error event (not a crash) when the connection fails
- Measure time-to-render when using cached values versus fresh evaluation to ensure no performance regression
9. Common Pitfalls That Break Feature Flag Test Suites
These are real problems sourced from LaunchDarkly community forums, GitHub issues, and Stack Overflow threads. Each one has broken production test suites and wasted hours of debugging time.
Feature Flag Testing Anti-Patterns
- Asserting on flag-dependent UI before the SDK initializes. The SDK returns fallback values synchronously before the async initialization completes. Always wait for the ready event.
- Sharing a LaunchDarkly environment between CI and developers. When a developer changes a flag in the shared environment, every CI test that depends on that flag breaks. Use a dedicated test environment.
- Testing percentage rollouts in e2e tests. Rollouts are deterministic per context key but non-obvious. A different test user key can land in a different bucket. Use individual targeting for deterministic test results.
- Forgetting to clean up individual targets after tests. If test A targets a user to variation 1 and test B expects the default, test B will fail because the individual target persists. Add cleanup in afterEach hooks.
- Ignoring the bootstrap vs streaming race. When you use bootstrap data and streaming mode simultaneously, the SDK may briefly show the bootstrap value then switch to the streamed value. This causes flaky visual tests.
- Not testing the flag OFF state. Most teams test only the ON state. When someone turns off a flag in production, the app should degrade gracefully. Always test both ON and OFF for every flag.
- Hardcoding variation indices instead of using semantic patch operations. Variation indices can shift when someone adds or reorders variations in the dashboard. Use the LD API's semantic patch format instead.
- Running parallel tests that modify the same flag. Two tests changing the same flag simultaneously create a race condition. Use unique flags per test or serialize flag-dependent tests.
The Initialization Race Condition in Detail
The most insidious pitfall is the initialization timing issue. The LaunchDarkly JavaScript SDK exposes a ready event and a waitForInitialization() promise. If your React app renders before the SDK is ready, all flag evaluations return fallback values. The fix is to gate your app's render on the SDK's ready event. In your app code, expose a global signal that your tests can wait for.
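A minimal sketch of that gate, written as a standalone helper so it can be unit-tested without React. The __LD_READY__ name is simply the global used by the waitForFunction calls in this guide, and InitializableClient is a hypothetical interface matching the client returned by LDClient.initialize().

```typescript
// ld-ready-signal.ts — gate rendering (and test assertions) on SDK readiness.
export interface InitializableClient {
  waitForInitialization(): Promise<unknown>;
}

export async function exposeReadySignal(client: InitializableClient): Promise<void> {
  try {
    await client.waitForInitialization();
  } catch {
    // Initialization failed (e.g. LD unreachable). The app will run on
    // fallback values, but tests should still be unblocked.
  } finally {
    (globalThis as any).__LD_READY__ = true;
  }
}
```

In a React app, call this right after LDClient.initialize() and render flag-gated components only once the signal is set; Playwright tests then wait on window.__LD_READY__ instead of sleeping.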
10. Writing These Scenarios in Plain English with Assrt
The Playwright tests above work, but they require deep knowledge of the LaunchDarkly SDK, API, and evaluation model. Every team member who reads or maintains these tests needs to understand route interception, SSE streams, context evaluation, and the LD REST API's semantic patch format. Assrt lets you express the same test intent in plain English.
Here is the real-time streaming scenario (Section 6) compiled into an Assrt file. Assrt handles the LaunchDarkly API calls, SDK initialization wait, and UI assertions behind the scenes.
Compare the 60+ lines of Playwright code for the streaming test (including the API helper, route interception, SSE monitoring, and navigation checks) with the 16 lines of Assrt above. The test intent is identical. The difference is that Assrt encapsulates the LaunchDarkly integration details so your test reads like a specification, not an implementation.
For the targeting rules scenario (Section 5), the Assrt version eliminates the login ceremony, the SDK wait boilerplate, and the manual attribute matching. You declare which user should see which features, and Assrt handles the rest.
Each of these Assrt files can run against your real app, against a staging environment, or against a local dev server. Assrt translates the plain-English steps into the same Playwright API calls, LD REST API operations, and SDK initialization waits that you would write by hand. The advantage is that anyone on your team can read, write, and review these tests without being a LaunchDarkly SDK expert.