How do you test PostHog feature flags in Playwright?

Intercept the PostHog /decide API endpoint using Playwright's page.route() method. Override the featureFlags and featureFlagPayloads fields in the response body to force specific flag values and payloads. Then navigate to the page and assert that the flag-gated UI renders correctly using expect().toBeVisible() with a timeout to account for the asynchronous flag evaluation.

How do you bootstrap PostHog feature flags for server-side rendering?

Evaluate feature flags on the server using the PostHog Node SDK, then pass the results as the bootstrap property in posthog.init() on the client. This gives the client SDK immediate access to flag values without waiting for a /decide API call. Test for flicker by using a MutationObserver to verify no DOM changes occur on flag-gated elements after hydration.

How do you test multivariate feature flags with Playwright?

Use Playwright route interception to override the featureFlags field in the PostHog /decide response with the specific variant string you want to test. Loop through each variant in a parameterized test to verify every variant renders its expected UI. This approach gives you deterministic control without depending on PostHog's hash-based assignment.

How do you verify feature flag rollout percentages?

Use the PostHog Node SDK to evaluate the flag for a large sample of unique distinct IDs (at least 500). Count how many evaluations return true, calculate the percentage, and assert it falls within a tolerance range (typically plus or minus 5%) of the configured rollout percentage. Also verify that the same distinct ID always returns the same flag value to confirm deterministic hashing.

Feature Flag Testing Guide

How to Test PostHog Feature Flags with Playwright: Complete 2026 Guide

A scenario-by-scenario walkthrough for testing PostHog feature flags end to end. It covers flag payload evaluation, bootstrapping flags for server-side rendering, local overrides, multivariate flags, rollout percentage verification, and the pitfalls that silently break feature flag test suites.

100K+

“PostHog is used by over 100,000 teams, and feature flags are one of the most-used features on the platform, evaluated billions of times per month across production applications.”

PostHog 2025 company blog



1. Why Testing PostHog Feature Flags Is Harder Than It Looks

PostHog feature flags look simple on the surface. You create a flag in the dashboard, wrap a UI element in a conditional check, and toggle it on or off. But the moment your test suite tries to assert that a flag is evaluated correctly in a real browser, five structural challenges emerge that make reliable testing genuinely difficult.

First, feature flags are evaluated asynchronously. The PostHog JavaScript SDK calls the /decide endpoint after initialization, which means there is always a window where flags are undefined. If your test checks for flag-gated UI immediately after page load, you will get intermittent failures because the SDK has not yet received the flag values from PostHog. Second, multivariate flags return string values (not just booleans), and the variant a user sees depends on a hash of their distinct ID. Your tests need deterministic control over which variant is active, which requires either overriding the distinct ID to match a pre-computed hash bucket or intercepting the API response entirely.

Third, flag payloads add a second layer of dynamic data. A multivariate flag might return "variant-b" as the flag value, but also attach a JSON payload like {"cta_text": "Start free trial", "color": "#22c55e"} that drives the actual rendering. Your test must verify both the variant and the payload separately. Fourth, server-side rendering with bootstrapped flags introduces a timing issue: the server evaluates flags via the PostHog Node SDK before HTML is sent to the browser, and then the client SDK re-evaluates on load. If the two evaluations disagree (due to timing or eventual consistency), the UI flickers. Your test must catch that flicker and fail if it happens. Fifth, rollout percentages are probabilistic. A flag set to 30% rollout does not guarantee that exactly 30 out of 100 test users see it, making assertion logic tricky.

PostHog Feature Flag Evaluation Chain

posthog.init(): SDK initializes with API key
POST /decide: Send distinct_id + groups
Flag Evaluation: Hash distinct_id, check rules
Return Flags: Boolean, multivariate, payloads
SDK Callback: onFeatureFlags() fires
Render UI: Conditional rendering based on flag

2. Setting Up a Reliable Test Environment

Before writing a single test, you need a PostHog project configured for testability. Create a dedicated project in PostHog (free tier works fine for testing) with its own API key. Never run tests against your production project because flag evaluations generate events that pollute your analytics data and can trigger rate limits under heavy parallel test execution.

Environment Setup

PostHog Feature Flag Helper for Tests

Use the PostHog API to programmatically create, update, and clean up feature flags in your test suite. The personal API key (not the project API key) is required for flag management. This helper ensures each test run starts with flags in a known state.

test/helpers/posthog-flags.ts
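A minimal sketch of what such a helper might contain. It assumes PostHog's REST API exposes feature flags under /api/projects/:project_id/feature_flags/ and accepts the personal API key as a Bearer token. The type names, function names, and the soft-delete request shape are hypothetical, so verify them against the API reference for your PostHog version.

```typescript
// test/helpers/posthog-flags.ts (illustrative sketch; names are hypothetical)

export interface PostHogApiConfig {
  host: string;           // base URL of your PostHog instance
  projectId: string;      // project ID from the project settings page
  personalApiKey: string; // personal API key, NOT the project API key
}

export interface FlagSpec {
  key: string;
  rolloutPercentage: number; // 0..100
  active?: boolean;
}

export interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body?: string };
}

const flagsUrl = (cfg: PostHogApiConfig): string =>
  `${cfg.host.replace(/\/+$/, '')}/api/projects/${cfg.projectId}/feature_flags/`;

const authHeaders = (cfg: PostHogApiConfig): Record<string, string> => ({
  Authorization: `Bearer ${cfg.personalApiKey}`,
  'Content-Type': 'application/json',
});

/** Build the request that creates a flag rolled out to a percentage of users. */
export function buildCreateFlagRequest(cfg: PostHogApiConfig, flag: FlagSpec): PreparedRequest {
  return {
    url: flagsUrl(cfg),
    init: {
      method: 'POST',
      headers: authHeaders(cfg),
      body: JSON.stringify({
        key: flag.key,
        name: `e2e: ${flag.key}`,
        active: flag.active ?? true,
        filters: { groups: [{ properties: [], rollout_percentage: flag.rolloutPercentage }] },
      }),
    },
  };
}

/** Build the cleanup request (assumed here to be a soft delete via PATCH). */
export function buildDeleteFlagRequest(cfg: PostHogApiConfig, flagId: number): PreparedRequest {
  return {
    url: `${flagsUrl(cfg)}${flagId}/`,
    init: { method: 'PATCH', headers: authHeaders(cfg), body: JSON.stringify({ deleted: true }) },
  };
}

/** Send a prepared request with the global fetch (Node 18+) and fail loudly on errors. */
export async function send(req: PreparedRequest): Promise<unknown> {
  const res = await fetch(req.url, req.init);
  if (!res.ok) throw new Error(`PostHog API ${req.init.method} ${req.url} failed: ${res.status}`);
  return res.json();
}
```

Keeping request construction separate from sending makes the helper easy to unit test, and a global setup hook can call send(buildCreateFlagRequest(...)) once per run.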

Playwright Configuration for Feature Flag Tests

Feature flag tests need extra time for the SDK to call the /decide endpoint and render conditional UI. Set a comfortable action timeout and configure route interception for deterministic flag control.

playwright.config.ts
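One possible configuration, as a sketch. The timeout values, the ./test directory, and the localhost fallback are illustrative assumptions rather than recommendations from Playwright or PostHog. APP_BASE_URL is the environment variable used in the preconditions of this guide.

```typescript
// playwright.config.ts (illustrative values; tune them for your app)
const config = {
  testDir: './test',
  timeout: 30_000,              // budget for each test
  expect: { timeout: 10_000 },  // default wait for expect(...).toBeVisible() and friends
  fullyParallel: true,
  retries: process.env.CI ? 1 : 0,
  use: {
    baseURL: process.env.APP_BASE_URL ?? 'http://localhost:3000',
    actionTimeout: 15_000,      // clicks, fills, and other actions
    trace: 'on-first-retry',    // keep a trace when a flaky flag test retries
  },
};

export default config;
```

Playwright also ships a defineConfig helper in @playwright/test. Wrapping the object with it adds type checking without changing behavior.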

3. Scenario: Boolean Flag Controls UI Visibility

The simplest feature flag test verifies that a boolean flag toggles a UI element on or off. This is the smoke test every flag-driven feature needs. When the flag is enabled, the new component renders. When it is disabled, the old UI remains. The catch is timing: you cannot check for the element immediately after navigation because the PostHog SDK evaluates flags asynchronously via the /decide endpoint.

Boolean Flag Controls UI Visibility

Straightforward

Goal

Intercept the PostHog /decide response to force a boolean flag to true, then verify the flag-gated component renders. Repeat with the flag forced to false and confirm the component is absent.

Preconditions

App running at APP_BASE_URL with PostHog JS SDK initialized
A page that conditionally renders based on flag new-pricing-banner

Playwright Implementation

boolean-flag.spec.ts
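The goal above also calls for the flag-off case. Here is a hedged sketch of a reusable override helper. RouteLike and PageLike are hypothetical minimal stand-ins for the parts of Playwright's Route and Page that it touches, so the helper can be exercised without a browser. In a real spec you would pass the Playwright page object.

```typescript
type FlagValue = boolean | string;

// Hypothetical structural stand-ins for Playwright's Route and Page.
interface RouteLike {
  fetch(): Promise<{ json(): Promise<any> }>;
  fulfill(options: { response?: unknown; body?: string }): Promise<void>;
}
interface PageLike {
  route(url: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

/** Rewrite featureFlags in every intercepted /decide response. */
export async function forceFlags(page: PageLike, flags: Record<string, FlagValue>): Promise<void> {
  await page.route('**/decide/**', async (route) => {
    const response = await route.fetch();      // let the real request go through
    const body = await response.json();
    body.featureFlags = { ...body.featureFlags, ...flags };
    await route.fulfill({ response, body: JSON.stringify(body) });
  });
}

// Usage in boolean-flag.spec.ts (assumes @playwright/test):
//   test('hides banner when flag is disabled', async ({ page }) => {
//     await forceFlags(page, { 'new-pricing-banner': false });
//     const decided = page.waitForResponse((r) => r.url().includes('/decide/'));
//     await page.goto('/pricing');
//     await decided; // without this, an absence check passes before flags load
//     await expect(page.getByTestId('pricing-banner')).toHaveCount(0);
//   });
```

Absence assertions are the risky half of a boolean flag test, because the element is also absent before the SDK has evaluated anything. Waiting for the /decide response first keeps the check meaningful.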

What to Assert Beyond the UI

Boolean Flag Verification Checklist

The flag-gated element renders when the flag is true
The flag-gated element is absent when the flag is false
PostHog captures a $feature_flag_called event with the correct value
No layout shift or flicker during flag evaluation
The page does not flash the wrong variant before settling

Boolean Flag Test: Playwright vs Assrt

import { test, expect } from '@playwright/test';

test('shows banner when flag is enabled', async ({ page }) => {
  await page.route('**/decide/**', async (route) => {
    const response = await route.fetch();
    const body = await response.json();
    body.featureFlags = {
      ...body.featureFlags,
      'new-pricing-banner': true,
    };
    await route.fulfill({
      response,
      body: JSON.stringify(body),
    });
  });

  await page.goto('/pricing');

  await expect(
    page.getByTestId('pricing-banner')
  ).toBeVisible({ timeout: 10_000 });

  await expect(
    page.getByText('Try our new pricing plan')
  ).toBeVisible();
});

58% fewer lines

4. Scenario: Multivariate Flag Renders Correct Variant

Multivariate flags return a string value instead of a boolean. In PostHog, you define variants like "control", "variant-a", and "variant-b", each with a rollout percentage. The tricky part is that the variant a user receives depends on a hash of their distinct ID combined with the flag key. In production this creates a consistent experience, but in tests it means you need deterministic control over which variant is active, or you will get different results every time a test user ID changes.

Multivariate Flag Renders Correct Variant

Moderate

Goal

Force each multivariate variant via API interception and confirm the UI renders the correct component for each variant string.

Preconditions

Feature flag checkout-flow-experiment with variants: control, single-page, wizard
Checkout page renders different layouts per variant

Playwright Implementation

multivariate-flag.spec.ts
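A sketch of the parameterized approach: one table of variants, one generated test per row. Only the single-page expectations (the "Quick Checkout" heading and a single step) come from this guide. The headings and step counts for control and wizard are hypothetical placeholders to replace with your own UI.

```typescript
type Variant = 'control' | 'single-page' | 'wizard';

interface VariantCase {
  variant: Variant;
  heading: string; // heading expected on the checkout page
  steps: number;   // expected number of checkout-step elements
}

// control and wizard rows are hypothetical; adjust them to your app.
export const VARIANT_CASES: VariantCase[] = [
  { variant: 'control', heading: 'Checkout', steps: 3 },
  { variant: 'single-page', heading: 'Quick Checkout', steps: 1 },
  { variant: 'wizard', heading: 'Checkout Wizard', steps: 4 },
];

/** Flags to merge into the intercepted /decide response for one variant. */
export function decideOverrideFor(variant: Variant): { featureFlags: Record<string, string> } {
  return { featureFlags: { 'checkout-flow-experiment': variant } };
}

export const testTitle = (c: VariantCase): string =>
  `checkout-flow-experiment: ${c.variant} variant`;

// Usage in multivariate-flag.spec.ts (assumes @playwright/test):
//   for (const c of VARIANT_CASES) {
//     test(testTitle(c), async ({ page }) => {
//       await page.route('**/decide/**', async (route) => {
//         const response = await route.fetch();
//         const body = await response.json();
//         body.featureFlags = { ...body.featureFlags, ...decideOverrideFor(c.variant).featureFlags };
//         await route.fulfill({ response, body: JSON.stringify(body) });
//       });
//       await page.goto('/checkout');
//       await expect(page.getByRole('heading', { name: c.heading })).toBeVisible({ timeout: 10_000 });
//       await expect(page.getByTestId('checkout-step')).toHaveCount(c.steps);
//     });
//   }
```

Keeping the expectations in a table means adding a fourth variant is a one-line change, and a missing row shows up as a missing test rather than a silent gap.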

Multivariate Flag: Playwright vs Assrt

import { test, expect } from '@playwright/test';

test('checkout-flow-experiment: single-page variant', async ({ page }) => {
  await page.route('**/decide/**', async (route) => {
    const response = await route.fetch();
    const body = await response.json();
    body.featureFlags = {
      ...body.featureFlags,
      'checkout-flow-experiment': 'single-page',
    };
    await route.fulfill({
      response,
      body: JSON.stringify(body),
    });
  });

  await page.goto('/checkout');

  await expect(
    page.getByRole('heading', { name: 'Quick Checkout' })
  ).toBeVisible({ timeout: 10_000 });

  const steps = page.getByTestId('checkout-step');
  await expect(steps).toHaveCount(1);
});

49% fewer lines


5. Scenario: Flag Payload Drives Dynamic Configuration

PostHog feature flag payloads let you attach arbitrary JSON data to each flag variant. Instead of hardcoding configuration per variant in your frontend code, you read the payload from PostHog and use it to drive rendering. This is powerful for A/B testing copy, colors, pricing values, and other parameters without deploying new code. But it also means your tests need to verify both the flag value and the payload content separately, because a correct variant with a malformed payload will still break the UI.

Flag Payload Drives Dynamic Configuration

Complex

Goal

Verify that a flag payload containing CTA text, button color, and discount percentage is correctly parsed and rendered in the UI.

Preconditions

Feature flag promo-banner-config with a JSON payload per variant
The promo banner component reads its content from posthog.getFeatureFlagPayload()

Playwright Implementation

payload-flag.spec.ts

What to Assert Beyond the UI

Payload-driven flags require deeper assertions. The payload is JSON, so your test must verify that the parsing works correctly, that missing fields are handled gracefully, and that type coercion does not silently break the rendering logic.

Payload Flag Verification Checklist

CTA text from payload renders in the correct element
Dynamic color from payload applies to button styles
Numeric payload values (discount_percent) display correctly
Banner is absent when payload is missing or flag is false
Malformed JSON in payload does not crash the page
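Several items in this checklist depend on how the app parses the payload. The following is a sketch of a defensive parser, not the guide's actual component code. The field names cta_text, color, and discount_percent follow the examples in this guide. The PromoConfig type, the fallback color, and the validation rules are assumptions.

```typescript
export interface PromoConfig {
  ctaText: string;
  color: string;
  discountPercent: number;
}

const DEFAULT_COLOR = '#22c55e'; // hypothetical fallback

/** Returns a validated config, or null when the banner should not render. */
export function parsePromoPayload(raw: unknown): PromoConfig | null {
  let data: unknown = raw;
  if (typeof raw === 'string') {
    try {
      data = JSON.parse(raw); // payload may arrive as a JSON string
    } catch {
      return null;            // malformed JSON must not crash the page
    }
  }
  if (typeof data !== 'object' || data === null || Array.isArray(data)) return null;
  const record = data as Record<string, unknown>;

  const ctaText = record.cta_text;
  if (typeof ctaText !== 'string' || ctaText.trim() === '') return null;

  // Accept numbers and numeric strings only; null, '' and booleans are rejected
  // so that type coercion cannot silently turn them into 0 or 1.
  const rawDiscount = record.discount_percent;
  const discount =
    typeof rawDiscount === 'number'
      ? rawDiscount
      : typeof rawDiscount === 'string' && rawDiscount.trim() !== ''
        ? Number(rawDiscount)
        : NaN;
  if (!Number.isFinite(discount) || discount < 0 || discount > 100) return null;

  const color =
    typeof record.color === 'string' && /^#[0-9a-f]{6}$/i.test(record.color)
      ? record.color
      : DEFAULT_COLOR;

  return { ctaText, color, discountPercent: discount };
}
```

With a parser like this, the Playwright test can feed deliberately broken payloads through the /decide override and assert that the banner is simply absent.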

6. Scenario: Bootstrap Flags for SSR Without Flicker

When you use server-side rendering with PostHog, the server evaluates feature flags via the PostHog Node SDK and embeds the results into the HTML. On the client side, you pass these pre-evaluated flags as the bootstrap option in posthog.init(). This eliminates the flash of default content that happens while the client SDK calls /decide. But bootstrapping introduces a subtle failure mode: if the server and client evaluate the flag differently (because the flag was updated between the server render and the client hydration), the UI will switch from one variant to another after hydration. Your test must detect this flicker.

Bootstrap Flags for SSR Without Flicker

Complex

Goal

Verify that a bootstrapped flag renders the correct variant in the initial HTML (no flash of wrong content), and that if the client SDK receives a different value, the transition is either suppressed or handled gracefully.

Playwright Implementation

bootstrap-ssr.spec.ts
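A hedged sketch of the flicker check. The recorder is a browser-side script kept as a string so it can be injected before any app code runs, and the analysis is a pure function. The window.__flagRenders name is hypothetical, and the pricing-banner test id is borrowed from the boolean flag scenario.

```typescript
// Injected into the page; records each distinct state of the flag-gated element.
export const FLICKER_RECORDER = `
  (() => {
    const renders = (window.__flagRenders = []);
    const snapshot = () => {
      const el = document.querySelector('[data-testid="pricing-banner"]');
      const text = el ? el.textContent.trim() : null;
      if (renders.length === 0 || renders[renders.length - 1] !== text) renders.push(text);
    };
    new MutationObserver(snapshot).observe(document, {
      childList: true, subtree: true, characterData: true,
    });
    snapshot();
  })();
`;

export type RenderLog = Array<string | null>;

/**
 * True when the element changed after it first appeared. Leading nulls are
 * ignored because the element does not exist until the HTML is parsed.
 */
export function hasFlicker(log: RenderLog): boolean {
  const first = log.findIndex((entry) => entry !== null);
  if (first === -1) return false; // element never rendered
  return log.slice(first + 1).some((entry) => entry !== log[first]);
}

// Usage in bootstrap-ssr.spec.ts (assumes @playwright/test):
//   await page.addInitScript({ content: FLICKER_RECORDER });
//   const decided = page.waitForResponse((r) => r.url().includes('/decide/'));
//   await page.goto('/pricing');
//   await decided; // let the client-side revalidation finish
//   await expect(page.getByTestId('pricing-banner')).toBeVisible();
//   const log = await page.evaluate(() => (window as any).__flagRenders);
//   expect(hasFlicker(log)).toBe(false);
```

To prove the check can fail, override /decide with a value that disagrees with the server-rendered one and confirm that hasFlicker returns true.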

SSR Bootstrap Flag Flow

Server Request: User requests page
Node SDK: Evaluate flags server-side
Render HTML: Embed flag values in bootstrap
Client Hydration: posthog.init({ bootstrap })
Client /decide: Revalidate flags from PostHog
Compare: Bootstrap vs fresh evaluation
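The server-side half of this flow can be sketched as follows. FlagClient is a hypothetical structural stand-in for the PostHog Node SDK client, and it assumes a getAllFlagsAndPayloads method with this shape, so check your SDK version. The __PH_BOOTSTRAP__ global in the usage comment is also made up for illustration.

```typescript
type FlagValue = boolean | string;

// Assumed shape of the Node SDK method used to evaluate every flag at once.
interface FlagClient {
  getAllFlagsAndPayloads(distinctId: string): Promise<{
    featureFlags: Record<string, FlagValue>;
    featureFlagPayloads: Record<string, unknown>;
  }>;
}

export interface BootstrapData {
  distinctID: string;
  featureFlags: Record<string, FlagValue>;
  featureFlagPayloads: Record<string, unknown>;
}

/** Evaluate flags on the server and shape them for the client's bootstrap option. */
export async function buildBootstrap(client: FlagClient, distinctId: string): Promise<BootstrapData> {
  const { featureFlags, featureFlagPayloads } = await client.getAllFlagsAndPayloads(distinctId);
  return { distinctID: distinctId, featureFlags, featureFlagPayloads };
}

/** Serialize for an inline script tag; escaping "<" keeps payload text from closing the tag. */
export function serializeBootstrap(data: BootstrapData): string {
  return JSON.stringify(data).replace(/</g, '\\u003c');
}

// Server template (sketch):
//   <script>window.__PH_BOOTSTRAP__ = ${serializeBootstrap(data)}</script>
// Client (assumes posthog-js):
//   posthog.init(PROJECT_API_KEY, { bootstrap: window.__PH_BOOTSTRAP__ });
```

The same distinct ID must be used on the server and passed to the client. Otherwise the client-side revalidation evaluates a different user and a mismatch becomes likely.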

7. Scenario: Verifying Rollout Percentage Distribution

Rollout percentages are one of the most misunderstood aspects of feature flags. A flag set to 30% rollout does not mean "30% of page loads see it." It means 30% of distinct user IDs hash into the enabled bucket. PostHog uses a deterministic hashing algorithm (based on the distinct ID and flag key) so the same user always gets the same result. This is great for user consistency but makes statistical verification in tests more nuanced. You cannot just run 100 requests and expect exactly 30 to be enabled.
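To see why a small sample cannot be expected to match the configured percentage, here is a short worked calculation. It treats each distinct ID as an independent draw, which is a simplifying binomial assumption rather than a description of PostHog's internals, and the function names are illustrative.

```typescript
/** Standard error of an observed proportion over n independent evaluations. */
export function standardError(p: number, n: number): number {
  return Math.sqrt((p * (1 - p)) / n);
}

/** Half-width of a band spanning z standard errors around p. */
export function tolerance(p: number, n: number, z = 2): number {
  return z * standardError(p, n);
}

// Band covering about two standard errors for a 30% rollout.
for (const n of [100, 500, 2000]) {
  const band = tolerance(0.3, n);
  console.log(`n=${n}: 30% +/- ${(band * 100).toFixed(1)} points`);
}
// n=100: 30% +/- 9.2 points
// n=500: 30% +/- 4.1 points
// n=2000: 30% +/- 2.0 points
```

At 100 samples, a result of 22% or 38% is ordinary noise. At 500 samples, a 5-point tolerance sits a little beyond two standard errors, which is why that pairing works as a practical default.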

Verifying Rollout Percentage Distribution

Complex

Goal

Generate a large number of distinct user IDs, evaluate the flag for each, and verify the distribution falls within an acceptable margin of the configured rollout percentage.

Playwright Implementation

rollout-percentage.spec.ts
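A sketch of the sampling logic. FlagEvaluator is a hypothetical stand-in that assumes the Node SDK client exposes isFeatureEnabled(key, distinctId). The ID prefix, defaults, and function names are illustrative.

```typescript
// Assumed shape of the Node SDK call used to evaluate one flag for one user.
interface FlagEvaluator {
  isFeatureEnabled(key: string, distinctId: string): Promise<boolean | undefined>;
}

export interface RolloutResult {
  enabled: number;
  total: number;
  ratio: number;
}

/** Evaluate the flag for sampleSize unique IDs and report the enabled share. */
export async function measureRollout(
  client: FlagEvaluator,
  flagKey: string,
  sampleSize = 500,
  idPrefix = 'rollout-user',
): Promise<RolloutResult> {
  let enabled = 0;
  for (let i = 0; i < sampleSize; i++) {
    if ((await client.isFeatureEnabled(flagKey, `${idPrefix}-${i}`)) === true) enabled++;
  }
  return { enabled, total: sampleSize, ratio: enabled / sampleSize };
}

/** True when the observed ratio is within +/- tolerance of the configured rollout. */
export function withinTolerance(ratio: number, expected: number, tolerance = 0.05): boolean {
  return Math.abs(ratio - expected) <= tolerance;
}

/** Re-evaluate one ID several times; every call must return the same value. */
export async function isDeterministic(
  client: FlagEvaluator,
  flagKey: string,
  distinctId: string,
  repeats = 3,
): Promise<boolean> {
  const first = await client.isFeatureEnabled(flagKey, distinctId);
  for (let i = 1; i < repeats; i++) {
    if ((await client.isFeatureEnabled(flagKey, distinctId)) !== first) return false;
  }
  return true;
}
```

In a spec, the assertion might read expect(withinTolerance(result.ratio, 0.3)).toBe(true). If the SDK evaluates remotely, 500 sequential calls can be slow, so local evaluation or batching is worth considering.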

Rollout Distribution Test Output

8. Scenario: Local Overrides for Deterministic CI

The biggest problem with testing feature flags in CI is nondeterminism. If your tests rely on the live PostHog API, a flag change in the dashboard during a test run can cause unexpected failures. The solution is local flag overrides: intercept every PostHog API call in your test and return predetermined values. This removes the network dependency entirely and makes your tests deterministic, fast, and runnable offline.

PostHog itself supports two override mechanisms. First, the JavaScript SDK accepts a bootstrap property at initialization time that provides flag values synchronously. Second, you can call posthog.featureFlags.override() after initialization to force specific flag values for the current session. Both depend on how your app initializes or exposes the SDK, so in Playwright tests the cleanest approach is route interception, which works regardless of how the SDK is set up.

Local Overrides for Deterministic CI

Moderate

Goal

Create a reusable Playwright fixture that intercepts all PostHog API calls and returns a fully controlled set of feature flags, making every test deterministic regardless of the live PostHog state.

Playwright Implementation

test/fixtures/posthog-override.ts
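A hedged sketch of the core of such a fixture. RouteLike and PageLike are hypothetical stand-ins for the Playwright types. The response body is deliberately minimal, and real /decide responses carry more fields that some SDK versions may expect. The event-blocking pattern is an assumption you should narrow to your PostHog host.

```typescript
type FlagValue = boolean | string;

// Hypothetical structural stand-ins for Playwright's Route and Page.
interface RouteLike {
  fulfill(options: { status?: number; contentType?: string; body?: string }): Promise<void>;
}
interface PageLike {
  route(url: string | RegExp, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

export class PostHogOverride {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  /** Serve flags and payloads locally; no flag request reaches PostHog. */
  async withFlags(
    flags: Record<string, FlagValue>,
    payloads: Record<string, string> = {},
  ): Promise<void> {
    await this.page.route('**/decide/**', (route) =>
      route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify({
          featureFlags: flags,
          featureFlagPayloads: payloads, // values are JSON-encoded strings
          errorsWhileComputingFlags: false,
        }),
      }),
    );
    // Swallow event ingestion so test runs do not pollute analytics.
    await this.page.route(/\/(e|batch)\/(\?|$)/, (route) =>
      route.fulfill({ status: 200, contentType: 'application/json', body: '{"status":1}' }),
    );
  }
}

// Wiring it up as a fixture (assumes @playwright/test):
//   import { test as base } from '@playwright/test';
//   export const test = base.extend<{ posthog: PostHogOverride }>({
//     posthog: async ({ page }, use) => { await use(new PostHogOverride(page)); },
//   });
//   export { expect } from '@playwright/test';
```

Because the handler never calls route.fetch(), the tests run offline and cannot be affected by dashboard changes made during a CI run.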

Using the Fixture in Tests

test/flags/overrides.spec.ts

Local Overrides: Playwright vs Assrt

import { test, expect } from '../fixtures/posthog-override';

test('deterministic flag test', async ({ page, posthog }) => {
  await posthog.withFlags({
    'new-pricing-banner': true,
    'checkout-flow-experiment': 'wizard',
    'promo-banner-config': 'active',
  }, {
    'promo-banner-config': JSON.stringify({
      cta_text: 'Limited time offer',
      discount_percent: 25,
    }),
  });

  await page.goto('/');

  await expect(page.getByTestId('pricing-banner')).toBeVisible();
  await expect(page.getByText('Limited time offer')).toBeVisible();
});

50% fewer lines

9. Common Pitfalls That Break Feature Flag Test Suites

Feature flag testing has a unique set of failure modes that differ from standard UI testing. These pitfalls come from real PostHog GitHub issues, community discussions, and production incident reports. Understanding them before you build your test suite will save you hours of debugging flaky tests.

Pitfalls to Avoid

Asserting flag-gated UI immediately after page.goto() without waiting for the /decide response. The PostHog SDK evaluates flags asynchronously, so checking too early gives you the default (unflagged) state.
Using the same distinct_id across parallel test workers. PostHog's deterministic hashing means the same user always gets the same variant, but parallel workers sharing an ID can interfere with each other's flag evaluations.
Forgetting to intercept featureFlagPayloads alongside featureFlags. If your route interception only overrides featureFlags, any code that reads payloads via getFeatureFlagPayload() will get null or stale data.
Testing rollout percentages with too small a sample. With 100 evaluations at a 30% rollout, the binomial standard error is about 4.6 percentage points, so results anywhere from roughly 20% to 40% are within normal variation. Use at least 500 samples and a +/- 5% tolerance.
Not blocking PostHog event tracking in tests. The /e/ and /batch/ endpoints fire on every page load and flag evaluation. Without blocking them, your test PostHog project fills with garbage events and API calls slow down your tests.
Relying on posthog.onFeatureFlags() callback timing. This callback fires after the /decide response, but React re-renders are asynchronous. Use Playwright's expect().toBeVisible() with a timeout instead of waitForTimeout after the callback.
Updating flags via the PostHog dashboard during a CI run. Flag changes propagate to the /decide endpoint within seconds, but CDN caching can delay the change by up to 60 seconds, causing intermittent test failures that only happen in CI.

Common Error: Flag Evaluated Before /decide Response
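This error usually means the assertion ran before the SDK received its flags. One way to rule it out is a navigation helper that waits for the flag response, sketched below. PageLike and ResponseLike are hypothetical stand-ins for the Playwright types. Matching a /flags path in addition to /decide is an assumption about newer SDK releases, so check your network tab.

```typescript
// Hypothetical structural stand-ins for Playwright's Response and Page.
interface ResponseLike {
  url(): string;
  ok(): boolean;
}
interface PageLike {
  goto(url: string): Promise<unknown>;
  waitForResponse(
    predicate: (response: ResponseLike) => boolean,
    options?: { timeout?: number },
  ): Promise<ResponseLike>;
}

/** True for flag-evaluation requests (/decide, or /flags as an assumed alternative). */
export const isFlagResponse = (url: string): boolean =>
  /\/(decide|flags)\/?(\?|$)/.test(url);

/** Navigate and resolve only once the flag response has arrived. */
export async function gotoAndAwaitFlags(
  page: PageLike,
  path: string,
  timeout = 10_000,
): Promise<void> {
  // Start waiting BEFORE navigating so a fast response cannot be missed.
  await Promise.all([
    page.waitForResponse((r) => isFlagResponse(r.url()) && r.ok(), { timeout }),
    page.goto(path),
  ]);
}

// Usage (assumes @playwright/test):
//   await gotoAndAwaitFlags(page, '/pricing');
//   await expect(page.getByTestId('pricing-banner')).toBeVisible();
```

If the app bootstraps flags and skips the network call entirely, this helper will time out. In that setup, assert on the rendered UI directly instead.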

10. Writing These Scenarios in Plain English with Assrt

Every Playwright test in this guide follows the same pattern: intercept the PostHog /decide endpoint, override the flag values, navigate to the page, and assert the UI matches the expected variant. That pattern is powerful but verbose. Each scenario requires 20 to 40 lines of TypeScript with route interception boilerplate, JSON body manipulation, and explicit timeout management. Assrt lets you express the same intent in plain English and compiles it to the same Playwright code.

scenarios/posthog-feature-flags.assrt

Assrt compiles each scenario block into the same Playwright TypeScript shown in the preceding sections. The route interception boilerplate, JSON body manipulation, and timeout handling are all generated automatically. When PostHog changes their /decide API response format or your app restructures its flag-gated components, Assrt detects the failure, analyzes the updated DOM and API responses, and opens a pull request with the updated test code. Your scenario files stay untouched.

Start with the boolean flag scenario. Once it passes in CI, add the multivariate test, then the payload verification, then the SSR bootstrap flicker check, then the local overrides fixture. Within an afternoon you will have complete feature flag coverage that most teams never achieve because the interception boilerplate is too tedious to maintain by hand.

Ready to automate your testing?

Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.

View on GitHub


Related Guides

How to Test GA4 Events

How to Test Mixpanel Events

How to Test Segment Track Events
