Feature Flag Testing Guide

How to Test LaunchDarkly Flags with Playwright: Complete 2026 Guide

A scenario-by-scenario walkthrough of testing LaunchDarkly feature flags with Playwright, covering client-side evaluation, targeting rules, boolean and multivariate variations, streaming vs polling, the test data API, and the pitfalls that break real feature flag test suites.

LaunchDarkly evaluates over ten billion feature flags per day across thousands of customers, making it the most widely deployed feature management platform in production.


LaunchDarkly Client-Side Flag Evaluation Flow

Participants: Browser, Your App, LD SDK, LD Edge CDN, LD Streaming API.

Sequence: Page load / navigate → Initialize LD client with context → GET flag values for context → Return evaluated flags → Open SSE stream (streaming mode) → Push flag change event → Trigger re-render with new values → UI updates reflect new flag state.

1. Why Testing LaunchDarkly Flags Is Harder Than It Looks

Feature flags look simple on the surface: a boolean that toggles a code path. In practice, LaunchDarkly flags introduce several layers of indirection that make end-to-end testing genuinely difficult. The SDK evaluates flags client-side using a context object (user key, email, custom attributes), and the evaluation result depends on targeting rules, percentage rollouts, and prerequisite flags configured in the LaunchDarkly dashboard. Your test needs to control all of those inputs to get deterministic output.

The first structural challenge is timing. The LaunchDarkly JavaScript SDK initializes asynchronously. When your React app mounts, it calls LDClient.initialize() with a client-side ID and a context object. The SDK fetches the evaluated flag values from LaunchDarkly's edge CDN, and only then does your app know which variation to render. If your Playwright test checks the DOM before the SDK has initialized, you will see the default (fallback) value, not the actual flag value. This race condition is the most common source of flaky flag tests.

The second challenge is evaluation context. LaunchDarkly evaluates flags per context: a combination of user key, anonymous flag, custom attributes like plan or country, and multi-context kinds (user, organization, device). If your test does not control the exact context the SDK sends, you cannot predict which variation the flag will return. Percentage rollouts hash the context key to assign a bucket, so even a different test user email changes the outcome.
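To make the idea of a controlled context concrete, here is a minimal sketch of a multi-context object. The local type definitions, the buildTestContext helper, and the organization key are illustrative assumptions; the real SDK ships its own context types, so treat this as a shape to adapt rather than the SDK's API.

```typescript
// Simplified local types for illustration only; the SDK provides its own context types.
interface SingleKindAttributes {
  key: string;
  anonymous?: boolean;
  [attribute: string]: unknown;
}

interface MultiContext {
  kind: 'multi';
  [contextKind: string]: SingleKindAttributes | 'multi';
}

// Hypothetical helper: pins every attribute a targeting rule might read,
// so the same inputs always produce the same evaluation.
export function buildTestContext(
  userKey: string,
  plan: string,
  country: string,
): MultiContext {
  return {
    kind: 'multi',
    user: { key: userKey, anonymous: false, plan, country },
    organization: { key: 'org-e2e-001', name: 'E2E Test Org' },
  };
}

const ctx = buildTestContext('e2e-test-user-001', 'enterprise', 'US');
console.log(JSON.stringify(ctx, null, 2));
```

Building the context in one place keeps the user key stable across runs, which matters because a different key can land in a different rollout bucket.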

Third, streaming adds real-time complexity. In streaming mode (which the JavaScript SDK enables when you set streaming: true or subscribe to change events), the SDK opens a Server-Sent Events connection to LaunchDarkly. When someone changes a flag in the dashboard mid-test, the SDK receives the update and triggers a re-render. This is powerful for production, but in tests it means your flag values can change underneath you if another developer modifies the test environment. Polling mode avoids the real-time updates but introduces its own timing issues because flag changes only take effect after the next poll interval.

Fourth, the LaunchDarkly test data source (the recommended approach for unit and integration tests) works differently from the production SDK initialization path. Tests that pass with the test data source may fail in production because of subtle differences in initialization timing, event flushing, and context evaluation. Your end-to-end tests need to cover both the happy path with real SDK initialization and the controlled path with test data.

LaunchDarkly SDK Initialization Flow

App Mounts (React component renders) → SDK Init (LDClient.initialize()) → Fetch Flags (GET from LD edge CDN) → Evaluate (client-side evaluation) → Open Stream (SSE for real-time updates) → Render (UI shows correct variation)

Flag Evaluation Decision Tree

Context (user key + attributes) → Prerequisites (check prerequisite flags) → Individual Targets (exact user key match) → Targeting Rules (attribute-based rules) → Percentage Rollout (hash-based bucketing) → Default (fallback variation)

A thorough LaunchDarkly test suite must account for all of these surfaces. The sections below walk through each scenario with runnable Playwright TypeScript you can paste directly into your project.

2. Setting Up a Reliable Test Environment

Before writing any test scenarios, configure an isolated LaunchDarkly environment for testing. LaunchDarkly supports multiple environments per project (Development, Staging, Production). Create a dedicated test environment that your CI pipeline uses exclusively. This prevents developers from accidentally changing flag values that your tests depend on.

LaunchDarkly Test Environment Setup Checklist

  • Create a dedicated 'test' environment in your LaunchDarkly project
  • Note the client-side ID and SDK key for the test environment
  • Create an API access token with writer role for the test environment only
  • Configure all feature flags with explicit default values in the test environment
  • Disable percentage rollouts in the test environment (use individual targeting instead)
  • Set up a service account context key for deterministic evaluation
  • Install the LaunchDarkly Node.js server SDK for test setup scripts
  • Add test environment variables to your CI secrets

Environment Variables

.env.test

REST API Helper for Flag State Management

The LaunchDarkly REST API lets you programmatically toggle flags, update targeting rules, and add individual targets before each test. This is essential for deterministic test setup. Use semantic patch operations to modify flag configurations without overwriting the entire flag definition.

test/helpers/ld-api.ts
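As a rough illustration of what such a helper could look like, the sketch below builds a semantic patch request for a flag and sends it with fetch. The config field names and function names are hypothetical, and the instruction kinds (turnFlagOn, turnFlagOff, addTargets) are assumptions based on the public REST API documentation, so verify them against the API version you use.

```typescript
// Sketch only: field and function names here are hypothetical.
export interface LdApiConfig {
  apiToken: string;       // API access token with writer role
  projectKey: string;
  environmentKey: string; // e.g. the dedicated 'test' environment
}

type Instruction = Record<string, unknown>;

export interface PatchRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Pure builder, kept apart from the network call so it can be unit tested.
export function buildSemanticPatch(
  cfg: LdApiConfig,
  flagKey: string,
  instructions: Instruction[],
): PatchRequest {
  return {
    url: `https://app.launchdarkly.com/api/v2/flags/${cfg.projectKey}/${flagKey}`,
    init: {
      method: 'PATCH',
      headers: {
        Authorization: cfg.apiToken,
        // The domain-model suffix marks the body as a semantic patch.
        'Content-Type': 'application/json; domain-model=launchdarkly.semanticpatch',
      },
      body: JSON.stringify({ environmentKey: cfg.environmentKey, instructions }),
    },
  };
}

// Instruction builders (kinds assumed from the REST API docs).
export const turnFlag = (on: boolean): Instruction => ({
  kind: on ? 'turnFlagOn' : 'turnFlagOff',
});

export const addTargets = (variationId: string, contextKeys: string[]): Instruction => ({
  kind: 'addTargets',
  contextKind: 'user',
  values: contextKeys,
  variationId,
});

// Sends the patch and fails loudly so a broken setup does not look like a flaky test.
export async function patchFlag(
  cfg: LdApiConfig,
  flagKey: string,
  instructions: Instruction[],
): Promise<void> {
  const req = buildSemanticPatch(cfg, flagKey, instructions);
  const res = await fetch(req.url, req.init);
  if (!res.ok) {
    throw new Error(`LaunchDarkly API returned ${res.status} for flag ${flagKey}`);
  }
}
```

Note that this sketch targets variations by ID rather than by index. A helper with the addUserTarget(flagKey, variationIndex, userKey) shape used elsewhere in this guide would first need to read the flag and map the index to the variation's ID.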

Playwright Configuration for LaunchDarkly

LaunchDarkly SDK initialization is asynchronous, so your Playwright tests need to wait for the SDK to be ready before asserting on flag-dependent UI. The most reliable pattern is to expose a global ready signal from your app and wait for it in your test setup.
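On the test side, the wait can be wrapped in a small helper. The sketch below assumes the app sets a global named __LD_READY__ (the name used in this guide's examples) and uses a minimal structural type in place of Playwright's Page so the snippet stands alone; in a real suite you would type the parameter as Page from @playwright/test.

```typescript
// Minimal structural stand-in for the one Playwright Page method this helper uses.
interface ReadyAwaitable {
  waitForFunction(
    fn: () => boolean,
    arg?: unknown,
    options?: { timeout?: number },
  ): Promise<unknown>;
}

// Resolves once the app reports that the LD SDK has initialized.
// The callback is evaluated inside the browser, where globalThis is the window object.
export async function waitForLdReady(
  page: ReadyAwaitable,
  timeoutMs = 10000,
): Promise<void> {
  await page.waitForFunction(
    () => (globalThis as any).__LD_READY__ === true,
    undefined,
    { timeout: timeoutMs },
  );
}
```

Calling waitForLdReady(page) right after page.goto() keeps every flag-dependent assertion behind the same gate, and the explicit timeout turns a stalled initialization into a clear failure.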

playwright.config.ts

3. Scenario: Boolean Flag Controls a UI Component

The most common flag type is a simple boolean that shows or hides a UI component. For example, a new-dashboard flag that controls whether users see the redesigned dashboard or the legacy version. The test must verify both variations render correctly and that the toggle is responsive to flag changes.

Scenario 1: Boolean Flag Controls a UI Component (difficulty: Straightforward)

Goal

Configure a boolean flag to true for a specific test user, load the app, and verify the flag-gated component renders. Then toggle the flag to false and verify the component disappears.

Preconditions

  • App running at APP_BASE_URL with LD SDK initialized
  • Feature flag new-dashboard exists in the test environment
  • Test user key is deterministic and known

Playwright Implementation

boolean-flag.spec.ts

What to Assert Beyond the UI

  • Verify the LD SDK sent an analytics event for the flag evaluation by intercepting network requests to events.launchdarkly.com
  • Confirm the evaluation reason in the event payload matches your expected targeting rule
  • Check that no console errors related to LD initialization appear during the test

Boolean Flag: Playwright vs Assrt

import { test, expect } from '@playwright/test';
import { addUserTarget } from '../helpers/ld-api';

test('new dashboard visible when flag on', async ({ page }) => {
  await addUserTarget('new-dashboard', 0, 'e2e-test-user-001');
  await page.goto('/dashboard');
  await page.waitForFunction(
    () => (window as any).__LD_READY__ === true
  );
  await expect(
    page.getByTestId('new-dashboard-panel')
  ).toBeVisible();
});
Assrt version: 55% fewer lines.

4. Scenario: Multivariate Flag with String Variations

Not all flags are simple on/off toggles. LaunchDarkly supports multivariate flags that return strings, numbers, or JSON objects. A common pattern is a string flag that selects between multiple UI themes or checkout flow versions: "control", "variant-a", and "variant-b". Testing these requires iterating over every variation and verifying each one renders the correct UI.

Scenario 2: Multivariate Flag with String Variations (difficulty: Moderate)

Goal

Test a multivariate string flag checkout-flow-version that has three variations. Verify each variation renders the correct checkout experience.

Preconditions

  • Flag checkout-flow-version has three variations: "control", "variant-a", "variant-b"
  • Each variation maps to a distinct UI layout
  • Test user has no conflicting individual targeting from previous runs

Playwright Implementation

multivariate-flag.spec.ts
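One way to iterate over every variation is a case table that drives one test per entry. The sketch below is an assumption-laden outline: the test IDs are hypothetical, and instead of setting the variation through the REST API it builds a mocked evaluation payload in the same shape as the route interception example later in this guide.

```typescript
interface VariationCase {
  value: string;
  variationIndex: number;
  expectedTestId: string; // hypothetical data-testid of the layout this variation renders
}

export const checkoutCases: VariationCase[] = [
  { value: 'control', variationIndex: 0, expectedTestId: 'checkout-control' },
  { value: 'variant-a', variationIndex: 1, expectedTestId: 'checkout-variant-a' },
  { value: 'variant-b', variationIndex: 2, expectedTestId: 'checkout-variant-b' },
];

// Builds the body a mocked evaluation endpoint could return for one case.
export function mockEvaluation(flagKey: string, c: VariationCase, version = 1) {
  return {
    [flagKey]: {
      value: c.value,
      variation: c.variationIndex,
      version,
      reason: { kind: 'FALLTHROUGH' },
    },
  };
}

for (const c of checkoutCases) {
  console.log(c.expectedTestId, JSON.stringify(mockEvaluation('checkout-flow-version', c)));
}
```

In a spec file, the same loop would wrap a test() call per case, fulfill the intercepted request with mockEvaluation(...), and assert that the element with expectedTestId is visible.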

What to Assert Beyond the UI

  • Intercept the LD analytics event and verify the value field matches the expected variation string
  • Confirm the variationIndex in the event payload matches what you set via the API
  • Check that analytics and conversion tracking fire correctly for each variation (this validates your experiment setup)


5. Scenario: Targeting Rules and User Segments

Targeting rules are where LaunchDarkly flags get complex. A single flag can have multiple rules that evaluate in order: "if user is in segment 'beta-testers' serve variation 1, else if user attribute 'plan' equals 'enterprise' serve variation 2, else serve the default." Testing this requires creating contexts with specific attributes and verifying each rule fires correctly.
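The "first matching rule wins" behavior can be illustrated with a toy model. This is a simplified sketch written for this guide, not LaunchDarkly's evaluation engine: it ignores prerequisites, individual targets, and rollouts, and the Rule shape is an assumption.

```typescript
type Context = { key: string; segments?: string[]; [attr: string]: unknown };

interface Rule {
  description: string;
  matches: (ctx: Context) => boolean;
  variation: number;
}

interface Evaluation {
  variation: number;
  reason: { kind: 'RULE_MATCH'; ruleIndex: number } | { kind: 'FALLTHROUGH' };
}

// Walks the rules in order and stops at the first match.
export function evaluate(rules: Rule[], fallthrough: number, ctx: Context): Evaluation {
  for (let i = 0; i < rules.length; i++) {
    const rule = rules[i];
    if (rule.matches(ctx)) {
      return { variation: rule.variation, reason: { kind: 'RULE_MATCH', ruleIndex: i } };
    }
  }
  return { variation: fallthrough, reason: { kind: 'FALLTHROUGH' } };
}

const rules: Rule[] = [
  {
    description: "segment 'beta-testers'",
    matches: (c) => c.segments?.includes('beta-testers') ?? false,
    variation: 1,
  },
  {
    description: "plan equals 'enterprise'",
    matches: (c) => c.plan === 'enterprise',
    variation: 2,
  },
];

// A user matching both rules is served by rule 0, because it is checked first.
console.log(evaluate(rules, 0, { key: 'u1', plan: 'enterprise', segments: ['beta-testers'] }));
// An enterprise user outside the segment is served by rule 1.
console.log(evaluate(rules, 0, { key: 'u2', plan: 'enterprise' }));
// A free user matches nothing and falls through to the default.
console.log(evaluate(rules, 0, { key: 'u3', plan: 'free' }));
```

The three calls mirror the three test users a targeting suite needs: one per rule, plus one that matches nothing, with the overlapping user proving the evaluation order.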

Scenario 3: Targeting Rules and User Segments (difficulty: Complex)

Goal

Verify that a flag with multiple targeting rules serves the correct variation based on user attributes. Test the rule evaluation order and confirm that the first matching rule wins.

Playwright Implementation

targeting-rules.spec.ts

What to Assert Beyond the UI

  • Intercept the LD event stream and verify the reason field shows RULE_MATCH with the correct ruleIndex
  • For the free user, confirm the reason is FALLTHROUGH, not OFF (which would indicate the flag is globally disabled)
  • Verify that attribute-based rules evaluate in the documented order by testing a user who matches multiple rules

Targeting Rules: Playwright vs Assrt

import { test, expect } from '@playwright/test';

test('enterprise user gets premium features', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill('enterprise@yourapp.com');
  await page.getByLabel('Password').fill('TestPass123!');
  await page.getByRole('button', { name: /sign in/i }).click();
  await page.waitForURL('/dashboard');
  await page.waitForFunction(
    () => (window as any).__LD_READY__ === true
  );
  await expect(page.getByTestId('premium-analytics')).toBeVisible();
  await expect(page.getByTestId('premium-exports')).toBeVisible();
  await expect(page.getByTestId('premium-api-access')).toBeVisible();
});
Assrt version: 64% fewer lines.

6. Scenario: Real-Time Flag Changes via Streaming

One of LaunchDarkly's most powerful features is real-time flag updates via Server-Sent Events (SSE). When a flag changes in the dashboard, the browser SDK receives the update within milliseconds and triggers a re-render. Testing this behavior requires changing a flag value while the page is loaded and verifying the UI updates without a page refresh. This is also the scenario most likely to introduce flakiness if not handled carefully.

Scenario 4: Real-Time Flag Changes via Streaming (difficulty: Complex)

Goal

Load the app with a flag set to one variation, then change the flag via the LaunchDarkly API while the page is open. Verify the UI updates in real time without a page reload.

Playwright Implementation

streaming-update.spec.ts

What to Assert Beyond the UI

  • Confirm the SSE connection to clientstream.launchdarkly.com remains open throughout the test
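  • Verify no full page reload occurred (check performance.navigation.type is still 0; because that API is deprecated, you can alternatively check that performance.getEntriesByType('navigation')[0].type is not 'reload')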
  • Measure the latency between the API call and the UI update to catch regressions in your app's flag change handler
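The latency measurement in the last bullet can be sketched as a small timing helper. The function name, the callback parameters, and the default budget are assumptions for illustration; the callbacks stand in for whatever flips the flag and whatever waits for the UI in your suite.

```typescript
// Times how long the UI takes to reflect a flag change, and fails if it exceeds a budget.
export async function measureUpdateLatencyMs(
  changeFlag: () => Promise<void>, // e.g. the REST call that flips the flag
  uiUpdated: () => Promise<void>,  // e.g. () => expect(locator).toBeVisible()
  budgetMs = 5000,
): Promise<number> {
  await changeFlag();
  // Start the clock once the API has accepted the change, so the measurement
  // covers streaming delivery plus the app's own change handler.
  const startedAt = Date.now();
  await uiUpdated();
  const elapsed = Date.now() - startedAt;
  if (elapsed > budgetMs) {
    throw new Error(`Flag change took ${elapsed} ms to reach the UI (budget ${budgetMs} ms)`);
  }
  return elapsed;
}
```

Logging the returned value on each CI run gives a baseline, so a regression in the flag change handler shows up as a trend before it crosses the budget.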

7. Scenario: Using the Test Data Source for Deterministic Tests

LaunchDarkly provides a TestData source that lets you create a fully local, in-process flag evaluation engine. Instead of connecting to LaunchDarkly's servers, the SDK evaluates flags against data you define in your test setup. This eliminates network latency, rate limits, and cross-test interference. It is the recommended approach for integration tests and can also be used for end-to-end tests when you control the app's initialization.

Scenario 5: Test Data Source for Deterministic Tests (difficulty: Moderate)

Goal

Replace the real LaunchDarkly SDK connection with the TestData source. Define flag values programmatically and verify evaluation behavior without any network calls.

Server-Side Test Data Setup

test/helpers/ld-test-data.ts

Playwright Integration with Test Data

test-data-e2e.spec.ts

What to Assert Beyond the UI

  • When using route interception, verify the SDK does not fall back to cached values from a previous test run
  • Confirm that the bootstrapped flag values match what the production SDK would return for the same context
  • Check that event tracking still works correctly (or is properly disabled) when using test data

Test Data Source: Playwright vs Assrt

import { test, expect } from '@playwright/test';

test('override flags via route interception', async ({ page }) => {
  await page.route('**/sdk/evalx/**', async (route) => {
    await route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({
        'new-dashboard': {
          value: true, variation: 0, version: 100,
          reason: { kind: 'FALLTHROUGH' },
        },
      }),
    });
  });
  await page.goto('/dashboard');
  await page.waitForFunction(
    () => (window as any).__LD_READY__ === true
  );
  await expect(page.getByTestId('new-dashboard-panel')).toBeVisible();
});
Assrt version: 78% fewer lines.

8. Scenario: Testing Fallback Behavior When LD Is Unavailable

What happens when LaunchDarkly is down? Your app should degrade gracefully, falling back to default flag values. This is a critical scenario that most teams skip. The LaunchDarkly SDK can cache flag values in localStorage (in the browser, when localStorage bootstrapping is enabled) or in memory (server-side), so a temporary outage should not break your app. But you need to verify that the fallback values are correct and that the app renders a reasonable experience.

Scenario 6: Fallback Behavior When LD Is Unavailable (difficulty: Moderate)

Goal

Block all LaunchDarkly network requests and verify the app uses the fallback values defined in your code. Confirm no crashes, no blank screens, and no broken layouts.

Playwright Implementation

fallback-behavior.spec.ts
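A minimal sketch of the blocking step follows. It assumes every LaunchDarkly endpoint the SDK talks to lives under the launchdarkly.com domain (a relay proxy or custom base URL would need a different matcher), and it uses small structural types in place of Playwright's Page and Route so the snippet stands alone.

```typescript
// True for launchdarkly.com and any of its subdomains.
export function isLaunchDarklyUrl(rawUrl: string): boolean {
  const host = new URL(rawUrl).hostname;
  return host === 'launchdarkly.com' || host.endsWith('.launchdarkly.com');
}

// Structural stand-ins for the parts of Playwright's Route and Page used here.
interface RouteLike {
  abort(): Promise<void>;
}

interface RoutablePage {
  route(
    matcher: (url: URL) => boolean,
    handler: (route: RouteLike) => Promise<void>,
  ): Promise<void>;
}

// Aborts every request to LaunchDarkly so the app has to rely on its fallback values.
export async function blockLaunchDarkly(page: RoutablePage): Promise<void> {
  await page.route(
    (url) => isLaunchDarklyUrl(url.href),
    (route) => route.abort(),
  );
}
```

Register the block before page.goto() so the very first flag request fails; the test can then assert that the legacy or default UI renders and that no error surfaces to the user.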

What to Assert Beyond the UI

  • Verify localStorage contains the cached LD flag data after the first successful load
  • Confirm the SDK emits an error event (not a crash) when the connection fails
  • Measure time-to-render when using cached values versus fresh evaluation to ensure no performance regression

9. Common Pitfalls That Break Feature Flag Test Suites

These are real problems sourced from LaunchDarkly community forums, GitHub issues, and Stack Overflow threads. Each one has broken production test suites and wasted hours of debugging time.

Feature Flag Testing Anti-Patterns

  • Asserting on flag-dependent UI before the SDK initializes. The SDK returns fallback values synchronously before the async initialization completes. Always wait for the ready event.
  • Sharing a LaunchDarkly environment between CI and developers. When a developer changes a flag in the shared environment, every CI test that depends on that flag breaks. Use a dedicated test environment.
  • Testing percentage rollouts in e2e tests. Rollouts are deterministic per context key but non-obvious. A different test user key can land in a different bucket. Use individual targeting for deterministic test results.
  • Forgetting to clean up individual targets after tests. If test A targets a user to variation 1 and test B expects the default, test B will fail because the individual target persists. Add cleanup in afterEach hooks.
  • Ignoring the bootstrap vs streaming race. When you use bootstrap data and streaming mode simultaneously, the SDK may briefly show the bootstrap value then switch to the streamed value. This causes flaky visual tests.
  • Not testing the flag OFF state. Most teams test only the ON state. When someone turns off a flag in production, the app should degrade gracefully. Always test both ON and OFF for every flag.
  • Hardcoding variation indices instead of using semantic patch operations. Variation indices can shift when someone adds or reorders variations in the dashboard. Use the LD API's semantic patch format instead.
  • Running parallel tests that modify the same flag. Two tests changing the same flag simultaneously create a race condition. Use unique flags per test or serialize flag-dependent tests.
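To see why percentage rollouts are deterministic but non-obvious, the sketch below approximates hash-based bucketing. It is modeled on the SHA-1 scheme LaunchDarkly describes in its documentation, but the salt value and the exact input format are assumptions, so it is not guaranteed to reproduce the bucket a real flag assigns.

```typescript
import { createHash } from 'node:crypto';

// Maps a context key to a number between 0 and 1 for a given flag and salt.
// The same inputs always give the same bucket; a different key gives an unrelated one.
export function approximateBucket(flagKey: string, salt: string, contextKey: string): number {
  const hex = createHash('sha1')
    .update(`${flagKey}.${salt}.${contextKey}`)
    .digest('hex')
    .slice(0, 15); // the first 15 hex characters, i.e. 60 bits of the hash
  return parseInt(hex, 16) / Math.pow(16, 15);
}

// 'hypothetical-salt' is a placeholder; real flags carry their own salt.
for (const key of ['e2e-test-user-001', 'e2e-test-user-002', 'e2e-test-user-003']) {
  console.log(key, approximateBucket('new-dashboard', 'hypothetical-salt', key).toFixed(4));
}
```

Nothing about a key's name predicts where it lands, which is why individual targeting is the safer choice for end-to-end assertions.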

The Initialization Race Condition in Detail

The most insidious pitfall is the initialization timing issue. The LaunchDarkly JavaScript SDK exposes a ready event and a waitForInitialization() promise. If your React app renders before the SDK is ready, all flag evaluations return fallback values. The fix is to gate your app's render on the SDK's ready event. In your app code, expose a global signal that your tests can wait for.

app/providers/ld-provider.tsx
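On the app side, the signal could be exposed along these lines. This is a sketch under assumptions: the structural InitializableClient type stands in for the SDK client, and the __LD_INIT_ERROR__ property is a hypothetical addition so tests can tell a failed initialization from a slow one.

```typescript
// Stand-in for the part of the LD client this helper needs.
interface InitializableClient {
  waitForInitialization(): Promise<void>;
}

// Sets a global flag once the SDK is ready so tests have something concrete to wait for.
export async function exposeReadySignal(
  client: InitializableClient,
  target: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): Promise<boolean> {
  try {
    await client.waitForInitialization();
    target.__LD_READY__ = true;
    return true;
  } catch (err) {
    // Initialization failed: the app will render with fallback values.
    target.__LD_READY__ = false;
    target.__LD_INIT_ERROR__ = err instanceof Error ? err.message : String(err);
    return false;
  }
}
```

Awaiting exposeReadySignal(client) before the first render gates the UI on initialization, and recording the failure separately lets a fallback test assert on __LD_INIT_ERROR__ instead of timing out.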

10. Writing These Scenarios in Plain English with Assrt

The Playwright tests above work, but they require deep knowledge of the LaunchDarkly SDK, API, and evaluation model. Every team member who reads or maintains these tests needs to understand route interception, SSE streams, context evaluation, and the LD REST API's semantic patch format. Assrt lets you express the same test intent in plain English.

Here is the real-time streaming scenario (Section 6) compiled into an Assrt file. Assrt handles the LaunchDarkly API calls, SDK initialization wait, and UI assertions behind the scenes.

streaming-flag-update.assrt

Compare the 60+ lines of Playwright code for the streaming test (including the API helper, route interception, SSE monitoring, and navigation checks) with the 16 lines of Assrt above. The test intent is identical. The difference is that Assrt encapsulates the LaunchDarkly integration details so your test reads like a specification, not an implementation.

For the targeting rules scenario (Section 5), the Assrt version eliminates the login ceremony, the SDK wait boilerplate, and the manual attribute matching. You declare which user should see which features, and Assrt handles the rest.

targeting-rules.assrt

Each of these Assrt files can run against your real app, against a staging environment, or against a local dev server. Assrt translates the plain-English steps into the same Playwright API calls, LD REST API operations, and SDK initialization waits that you would write by hand. The advantage is that anyone on your team can read, write, and review these tests without being a LaunchDarkly SDK expert.

Ready to automate your testing?

Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.

$ npm install @assrt/sdk