Feature Flag Testing Guide
How to Test LaunchDarkly Flags with Playwright: Complete 2026 Guide
A scenario-by-scenario walkthrough of testing LaunchDarkly feature flags with Playwright. Client-side evaluation, targeting rules, boolean and multivariate variations, streaming vs polling, the test data API, and the pitfalls that break real feature flag test suites.
“LaunchDarkly evaluates over ten billion feature flags per day across thousands of customers, making it the most widely deployed feature management platform in production.”
LaunchDarkly Client-Side Flag Evaluation Flow
1. Why Testing LaunchDarkly Flags Is Harder Than It Looks
Feature flags look simple on the surface: a boolean that toggles a code path. In practice, LaunchDarkly flags introduce several layers of indirection that make end-to-end testing genuinely difficult. The SDK evaluates flags client-side using a context object (user key, email, custom attributes), and the evaluation result depends on targeting rules, percentage rollouts, and prerequisite flags configured in the LaunchDarkly dashboard. Your test needs to control all of those inputs to get deterministic output.
The first structural challenge is timing. The LaunchDarkly JavaScript SDK initializes asynchronously. When your React app mounts, it calls LDClient.initialize() with a client-side ID and a context object. The SDK fetches the evaluated flag values from LaunchDarkly's edge CDN, and only then does your app know which variation to render. If your Playwright test checks the DOM before the SDK has initialized, you will see the default (fallback) value, not the actual flag value. This race condition is the most common source of flaky flag tests.
The second challenge is evaluation context. LaunchDarkly evaluates flags per context: a combination of user key, anonymous flag, custom attributes like plan or country, and multi-context kinds (user, organization, device). If your test does not control the exact context the SDK sends, you cannot predict which variation the flag will return. Percentage rollouts hash the context key to assign a bucket, so even a different test user email changes the outcome.
Third, streaming adds real-time complexity. In streaming mode (the default for browser SDKs), the SDK opens a Server-Sent Events connection to LaunchDarkly. When someone changes a flag in the dashboard mid-test, the SDK receives the update and triggers a re-render. This is powerful for production, but in tests it means your flag values can change underneath you if another developer modifies the test environment. Polling mode avoids the real-time updates but introduces its own timing issues because flag changes only take effect after the next poll interval.
Fourth, the LaunchDarkly test data source (the recommended approach for unit and integration tests) works differently from the production SDK initialization path. Tests that pass with the test data source may fail in production because of subtle differences in initialization timing, event flushing, and context evaluation. Your end-to-end tests need to cover both the happy path with real SDK initialization and the controlled path with test data.
LaunchDarkly SDK Initialization Flow: App Mounts (React component renders) → SDK Init (LDClient.initialize()) → Fetch Flags (GET from LD edge CDN) → Evaluate (client-side evaluation) → Open Stream (SSE for real-time updates) → Render (UI shows correct variation)
Flag Evaluation Decision Tree: Context (user key + attributes) → Prerequisites (check prerequisite flags) → Individual Targets (exact user key match) → Targeting Rules (attribute-based rules) → Percentage Rollout (hash-based bucketing) → Default (fallback variation)
A thorough LaunchDarkly test suite must account for all of these surfaces. The sections below walk through each scenario with runnable Playwright TypeScript you can paste directly into your project.
2. Setting Up a Reliable Test Environment
Before writing any test scenarios, configure an isolated LaunchDarkly environment for testing. LaunchDarkly supports multiple environments per project (Development, Staging, Production). Create a dedicated test environment that your CI pipeline uses exclusively. This prevents developers from accidentally changing flag values that your tests depend on.
LaunchDarkly Test Environment Setup Checklist
- Create a dedicated 'test' environment in your LaunchDarkly project
- Note the client-side ID and SDK key for the test environment
- Create an API access token with writer role for the test environment only
- Configure all feature flags with explicit default values in the test environment
- Disable percentage rollouts in the test environment (use individual targeting instead)
- Set up a service account context key for deterministic evaluation
- Install the LaunchDarkly Node.js server SDK for test setup scripts
- Add test environment variables to your CI secrets
Environment Variables
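A small startup check keeps a missing secret from surfacing later as a cryptic SDK error. The variable names below (LD_CLIENT_SIDE_ID, LD_API_TOKEN, LD_PROJECT_KEY, LD_ENV_KEY, APP_BASE_URL) are illustrative assumptions — align them with whatever your CI secrets are actually called.

```typescript
// env.ts — fail fast when a required LaunchDarkly test variable is missing.
export function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Variable names here are assumptions — match them to your CI secrets.
export const ldTestEnv = () => ({
  clientSideId: requireEnv('LD_CLIENT_SIDE_ID'), // browser SDK init
  apiToken: requireEnv('LD_API_TOKEN'),          // REST API writer token
  projectKey: requireEnv('LD_PROJECT_KEY'),
  envKey: requireEnv('LD_ENV_KEY'),              // e.g. 'test'
  appBaseUrl: requireEnv('APP_BASE_URL'),
});
```

Call ldTestEnv() once in your Playwright global setup so a misconfigured pipeline fails before any test runs.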
REST API Helper for Flag State Management
The LaunchDarkly REST API lets you programmatically toggle flags, update targeting rules, and add individual targets before each test. This is essential for deterministic test setup. Use semantic patch operations to modify flag configurations without overwriting the entire flag definition.
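One possible shape for that helper, matching the addUserTarget(flagKey, variationIndex, userKey) signature used in the test examples in this guide. The semantic patch Content-Type and the addTargets instruction follow the LaunchDarkly REST API docs, but treat the exact instruction kind as an assumption to verify against the current API reference; the fetch calls assume a Node 18+ global fetch.

```typescript
// helpers/ld-api.ts — sketch of a semantic-patch helper for individual
// user targets. Assumes env vars LD_API_TOKEN, LD_PROJECT_KEY, LD_ENV_KEY.
const LD_API = 'https://app.launchdarkly.com/api/v2';

// Pure payload builder: unit-testable without any network access.
export function buildAddTargetPatch(envKey: string, variationId: string, userKey: string) {
  return {
    environmentKey: envKey,
    instructions: [
      { kind: 'addTargets', contextKind: 'user', variationId, values: [userKey] },
    ],
  };
}

export async function addUserTarget(
  flagKey: string,
  variationIndex: number,
  userKey: string
): Promise<void> {
  const token = process.env.LD_API_TOKEN!;
  const project = process.env.LD_PROJECT_KEY!;
  const envKey = process.env.LD_ENV_KEY ?? 'test';

  // Resolve the variation index to its stable _id so the patch is not
  // broken by someone reordering variations in the dashboard.
  const flagRes = await fetch(`${LD_API}/flags/${project}/${flagKey}`, {
    headers: { Authorization: token },
  });
  const flag = await flagRes.json();
  const variationId = flag.variations[variationIndex]._id;

  await fetch(`${LD_API}/flags/${project}/${flagKey}`, {
    method: 'PATCH',
    headers: {
      Authorization: token,
      // The domain-model parameter selects semantic patch over JSON Patch.
      'Content-Type': 'application/json; domain-model=launchdarkly.semanticpatch',
    },
    body: JSON.stringify(buildAddTargetPatch(envKey, variationId, userKey)),
  });
}
```

A matching removeTargets instruction in an afterEach hook keeps individual targets from leaking between tests (see the pitfalls section).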
Playwright Configuration for LaunchDarkly
LaunchDarkly SDK initialization is asynchronous, so your Playwright tests need to wait for the SDK to be ready before asserting on flag-dependent UI. The most reliable pattern is to expose a global ready signal from your app and wait for it in your test setup.
3. Scenario: Boolean Flag Controls a UI Component
The most common flag type is a simple boolean that shows or hides a UI component. For example, a new-dashboard flag that controls whether users see the redesigned dashboard or the legacy version. The test must verify both variations render correctly and that the toggle is responsive to flag changes.
Boolean Flag Controls a UI Component
Difficulty: Straightforward

Goal
Configure a boolean flag to true for a specific test user, load the app, and verify the flag-gated component renders. Then toggle the flag to false and verify the component disappears.
Preconditions
- App running at APP_BASE_URL with LD SDK initialized
- Feature flag new-dashboard exists in the test environment
- Test user key is deterministic and known
Playwright Implementation
What to Assert Beyond the UI
- Verify the LD SDK sent an analytics event for the flag evaluation by intercepting network requests to events.launchdarkly.com
- Confirm the evaluation reason in the event payload matches your expected targeting rule
- Check that no console errors related to LD initialization appear during the test
Boolean Flag: Playwright vs Assrt
import { test, expect } from '@playwright/test';
import { addUserTarget } from '../helpers/ld-api';
test('new dashboard visible when flag on', async ({ page }) => {
await addUserTarget('new-dashboard', 0, 'e2e-test-user-001');
await page.goto('/dashboard');
await page.waitForFunction(
() => (window as any).__LD_READY__ === true
);
await expect(
page.getByTestId('new-dashboard-panel')
).toBeVisible();
});

4. Scenario: Multivariate Flag with String Variations
Not all flags are simple on/off toggles. LaunchDarkly supports multivariate flags that return strings, numbers, or JSON objects. A common pattern is a string flag that selects between multiple UI themes or checkout flow versions: "control", "variant-a", and "variant-b". Testing these requires iterating over every variation and verifying each one renders the correct UI.
Multivariate Flag with String Variations
Difficulty: Moderate

Goal
Test a multivariate string flag checkout-flow-version that has three variations. Verify each variation renders the correct checkout experience.
Preconditions
- Flag checkout-flow-version has three variations: "control", "variant-a", "variant-b"
- Each variation maps to a distinct UI layout
- Test user has no conflicting individual targeting from previous runs
Playwright Implementation
What to Assert Beyond the UI
- Intercept the LD analytics event and verify the value field matches the expected variation string
- Confirm the variationIndex in the event payload matches what you set via the API
- Check that analytics and conversion tracking fire correctly for each variation (this validates your experiment setup)
5. Scenario: Targeting Rules and User Segments
Targeting rules are where LaunchDarkly flags get complex. A single flag can have multiple rules that evaluate in order: "if user is in segment 'beta-testers' serve variation 1, else if user attribute 'plan' equals 'enterprise' serve variation 2, else serve the default." Testing this requires creating contexts with specific attributes and verifying each rule fires correctly.
Targeting Rules and User Segments
Difficulty: Complex

Goal
Verify that a flag with multiple targeting rules serves the correct variation based on user attributes. Test the rule evaluation order and confirm that the first matching rule wins.
Playwright Implementation
What to Assert Beyond the UI
- Intercept the LD event stream and verify the reason field shows RULE_MATCH with the correct ruleIndex
- For the free user, confirm the reason is FALLTHROUGH, not OFF (which would indicate the flag is globally disabled)
- Verify that attribute-based rules evaluate in the documented order by testing a user who matches multiple rules
Targeting Rules: Playwright vs Assrt
import { test, expect } from '@playwright/test';
test('enterprise user gets premium features', async ({ page }) => {
await page.goto('/login');
await page.getByLabel('Email').fill('enterprise@yourapp.com');
await page.getByLabel('Password').fill('TestPass123!');
await page.getByRole('button', { name: /sign in/i }).click();
await page.waitForURL('/dashboard');
await page.waitForFunction(
() => (window as any).__LD_READY__ === true
);
await expect(page.getByTestId('premium-analytics')).toBeVisible();
await expect(page.getByTestId('premium-exports')).toBeVisible();
await expect(page.getByTestId('premium-api-access')).toBeVisible();
});

6. Scenario: Real-Time Flag Changes via Streaming
One of LaunchDarkly's most powerful features is real-time flag updates via Server-Sent Events (SSE). When a flag changes in the dashboard, the browser SDK receives the update within milliseconds and triggers a re-render. Testing this behavior requires changing a flag value while the page is loaded and verifying the UI updates without a page refresh. This is also the scenario most likely to introduce flakiness if not handled carefully.
Real-Time Flag Changes via Streaming
Difficulty: Complex

Goal
Load the app with a flag set to one variation, then change the flag via the LaunchDarkly API while the page is open. Verify the UI updates in real time without a page reload.
Playwright Implementation
What to Assert Beyond the UI
- Confirm the SSE connection to clientstream.launchdarkly.com remains open throughout the test
- Verify no full page reload occurred (check performance.navigation.type is still 0)
- Measure the latency between the API call and the UI update to catch regressions in your app's flag change handler
7. Scenario: Using the Test Data Source for Deterministic Tests
LaunchDarkly provides a TestData source that lets you create a fully local, in-process flag evaluation engine. Instead of connecting to LaunchDarkly's servers, the SDK evaluates flags against data you define in your test setup. This eliminates network latency, rate limits, and cross-test interference. It is the recommended approach for integration tests and can also be used for end-to-end tests when you control the app's initialization.
Test Data Source for Deterministic Tests
Difficulty: Moderate

Goal
Replace the real LaunchDarkly SDK connection with the TestData source. Define flag values programmatically and verify evaluation behavior without any network calls.
Server-Side Test Data Setup
Playwright Integration with Test Data
What to Assert Beyond the UI
- When using route interception, verify the SDK does not fall back to cached values from a previous test run
- Confirm that the bootstrapped flag values match what the production SDK would return for the same context
- Check that event tracking still works correctly (or is properly disabled) when using test data
Test Data Source: Playwright vs Assrt
import { test, expect } from '@playwright/test';
test('override flags via route interception', async ({ page }) => {
await page.route('**/sdk/evalx/**', async (route) => {
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({
'new-dashboard': {
value: true, variation: 0, version: 100,
reason: { kind: 'FALLTHROUGH' },
},
}),
});
});
await page.goto('/dashboard');
await page.waitForFunction(
() => (window as any).__LD_READY__ === true
);
await expect(page.getByTestId('new-dashboard-panel')).toBeVisible();
});

8. Scenario: Testing Fallback Behavior When LD Is Unavailable
What happens when LaunchDarkly is down? Your app should degrade gracefully, falling back to default flag values. This is a critical scenario that most teams skip. The LaunchDarkly SDK caches flag values in localStorage (browser) or in memory (server-side), so a temporary outage should not break your app. But you need to verify that the fallback values are correct and that the app renders a reasonable experience.
Fallback Behavior When LD Is Unavailable
Difficulty: Moderate

Goal
Block all LaunchDarkly network requests and verify the app uses the fallback values defined in your code. Confirm no crashes, no blank screens, and no broken layouts.
Playwright Implementation
What to Assert Beyond the UI
- Verify localStorage contains the cached LD flag data after the first successful load
- Confirm the SDK emits an error event (not a crash) when the connection fails
- Measure time-to-render when using cached values versus fresh evaluation to ensure no performance regression
9. Common Pitfalls That Break Feature Flag Test Suites
These are real problems sourced from LaunchDarkly community forums, GitHub issues, and Stack Overflow threads. Each one has broken production test suites and wasted hours of debugging time.
Feature Flag Testing Anti-Patterns
- Asserting on flag-dependent UI before the SDK initializes. The SDK returns fallback values synchronously before the async initialization completes. Always wait for the ready event.
- Sharing a LaunchDarkly environment between CI and developers. When a developer changes a flag in the shared environment, every CI test that depends on that flag breaks. Use a dedicated test environment.
- Testing percentage rollouts in e2e tests. Rollouts are deterministic per context key but non-obvious. A different test user key can land in a different bucket. Use individual targeting for deterministic test results.
- Forgetting to clean up individual targets after tests. If test A targets a user to variation 1 and test B expects the default, test B will fail because the individual target persists. Add cleanup in afterEach hooks.
- Ignoring the bootstrap vs streaming race. When you use bootstrap data and streaming mode simultaneously, the SDK may briefly show the bootstrap value then switch to the streamed value. This causes flaky visual tests.
- Not testing the flag OFF state. Most teams test only the ON state. When someone turns off a flag in production, the app should degrade gracefully. Always test both ON and OFF for every flag.
- Hardcoding variation indices instead of using semantic patch operations. Variation indices can shift when someone adds or reorders variations in the dashboard. Use the LD API's semantic patch format instead.
- Running parallel tests that modify the same flag. Two tests changing the same flag simultaneously create a race condition. Use unique flags per test or serialize flag-dependent tests.
The Initialization Race Condition in Detail
The most insidious pitfall is the initialization timing issue. The LaunchDarkly JavaScript SDK exposes a ready event and a waitForInitialization() promise. If your React app renders before the SDK is ready, all flag evaluations return fallback values. The fix is to gate your app's render on the SDK's ready event. In your app code, expose a global signal that your tests can wait for.
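A minimal sketch of that gate, written as a standalone helper so it can be unit-tested without React. The __LD_READY__ name is simply the global used by the waitForFunction calls in this guide, and InitializableClient is a hypothetical interface matching the client returned by LDClient.initialize().

```typescript
// ld-ready-signal.ts — gate rendering (and test assertions) on SDK readiness.
export interface InitializableClient {
  waitForInitialization(): Promise<unknown>;
}

export async function exposeReadySignal(client: InitializableClient): Promise<void> {
  try {
    await client.waitForInitialization();
  } catch {
    // Initialization failed (e.g. LD unreachable). The app will run on
    // fallback values, but tests should still be unblocked.
  } finally {
    (globalThis as any).__LD_READY__ = true;
  }
}
```

In a React app, call this right after LDClient.initialize() and render flag-gated components only once the signal is set; Playwright tests then wait on window.__LD_READY__ instead of sleeping.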
10. Writing These Scenarios in Plain English with Assrt
The Playwright tests above work, but they require deep knowledge of the LaunchDarkly SDK, API, and evaluation model. Every team member who reads or maintains these tests needs to understand route interception, SSE streams, context evaluation, and the LD REST API's semantic patch format. Assrt lets you express the same test intent in plain English.
Here is the real-time streaming scenario (Section 6) compiled into an Assrt file. Assrt handles the LaunchDarkly API calls, SDK initialization wait, and UI assertions behind the scenes.
Compare the 60+ lines of Playwright code for the streaming test (including the API helper, route interception, SSE monitoring, and navigation checks) with the 16 lines of Assrt above. The test intent is identical. The difference is that Assrt encapsulates the LaunchDarkly integration details so your test reads like a specification, not an implementation.
For the targeting rules scenario (Section 5), the Assrt version eliminates the login ceremony, the SDK wait boilerplate, and the manual attribute matching. You declare which user should see which features, and Assrt handles the rest.
Each of these Assrt files can run against your real app, against a staging environment, or against a local dev server. Assrt translates the plain-English steps into the same Playwright API calls, LD REST API operations, and SDK initialization waits that you would write by hand. The advantage is that anyone on your team can read, write, and review these tests without being a LaunchDarkly SDK expert.