How to Test AI Chat Streaming UI with Playwright: Complete 2026 Guide
A scenario-by-scenario walkthrough of testing AI chat streaming interfaces with Playwright. Server-Sent Events, token-by-token rendering, AbortController cancellation, typing indicators, markdown rendering mid-stream, auto-scroll behavior, retry on error, and message history persistence.
“ChatGPT reached 100 million weekly active users in early 2024, making AI chat interfaces one of the fastest-adopted UI patterns in software history. By 2026, nearly every SaaS product ships some form of streaming AI chat.”
OpenAI
AI Chat Streaming End-to-End Flow
1. Why Testing AI Chat Streaming Is Harder Than It Looks
An AI chat streaming interface looks simple on the surface: the user types a prompt, the assistant responds token by token, and the conversation grows. Under the hood, the complexity is substantial. The frontend opens a persistent connection (Server-Sent Events or a fetch with ReadableStream), receives dozens or hundreds of small JSON chunks over several seconds, and must parse, decode, and render each chunk into the DOM while simultaneously handling markdown formatting, code block syntax highlighting, scroll position management, and user interaction events like cancellation.
Traditional E2E tests assume a request/response cycle: click a button, wait for a network response, assert on the final state. Streaming breaks that model entirely. The response arrives incrementally over seconds, the DOM mutates continuously, and the “final” state only exists once the stream terminates. If your test waits for the complete response before asserting, you miss the entire streaming experience. If your test asserts too early, the content is incomplete and the assertion fails.
There are six structural reasons streaming chat UIs are hard to test. First, the transport layer varies: some apps use Server-Sent Events (SSE), others use fetch with ReadableStream, and some use WebSocket connections. Your mock strategy must match the transport. Second, token arrival timing is non-deterministic in production and must be simulated with controlled delays in tests. Third, the UI must handle partial markdown: a code fence might open in one chunk and close three seconds later, and the renderer must not break during that interval. Fourth, auto-scroll behavior depends on the user’s scroll position, creating a stateful interaction between user input and stream output. Fifth, AbortController cancellation must cleanly terminate both the network connection and the rendering pipeline. Sixth, error recovery (retry buttons, rate limit banners) must work correctly when the stream fails mid-response.
Stream Lifecycle in a Chat UI: User Prompt (text input + send) → API Request (POST with stream flag) → SSE Connection (EventSource opens) → Token Chunks (data: {content} events) → DOM Render (append + markdown parse) → Stream End ([DONE] signal received)
2. Setting Up Your Streaming Test Environment
The key to reliable streaming tests is full control over the mock server response. You never want your Playwright tests hitting a real LLM API: the latency is unpredictable, the cost adds up fast, and rate limits will break your CI. Instead, you intercept the streaming endpoint with page.route() and return a controlled SSE response with deterministic token timing.
Your playwright.config.ts needs a web server configuration that starts your chat application in development mode. The critical setting is webServer.reuseExistingServer set to true for local development, and use.actionTimeout set generously (15 seconds minimum) because streaming responses are intentionally slow.
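A sketch of those settings follows; the dev command, port, and exact timeout values are placeholders to adapt to your application.

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Whole-test budget: each streaming scenario runs for several seconds.
  timeout: 60_000,
  use: {
    baseURL: 'http://localhost:3000',
    // Generous per-action budget; streaming responses are slow on purpose.
    actionTimeout: 15_000,
  },
  webServer: {
    command: 'npm run dev',
    url: 'http://localhost:3000',
    // Reuse a dev server already running locally; always start fresh in CI.
    reuseExistingServer: !process.env.CI,
  },
});
```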
The helper function below is the foundation of every streaming test in this guide. It creates a mock SSE response that sends tokens at a configurable interval, giving you complete control over timing. You will reuse this helper across all scenarios.
Abort/Cancel Flow: Stream Active (tokens arriving) → User Clicks Stop (cancel button) → AbortController (signal.abort()) → Connection Closed (ReadableStream cancelled) → UI Settled (partial message preserved)
3. Mocking an SSE Stream with page.route()
The foundation of every AI chat streaming test is intercepting the chat API endpoint and returning a controlled Server-Sent Events response. Playwright’s page.route() method intercepts any matching network request and lets you fulfill it with a custom response. For streaming, you return a ReadableStream body that emits SSE-formatted chunks over time.
The OpenAI-compatible SSE format sends each chunk as a JSON object prefixed with data: and terminated with two newlines. The final signal is data: [DONE], which tells the client the stream is complete. Your mock must replicate this format exactly, because most chat UI libraries (Vercel AI SDK, LangChain, custom implementations) parse it strictly and will silently drop malformed chunks.
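Here is one possible shape for that helper — a sketch, not a canonical implementation. Playwright's route.fulfill() buffers the whole response body, so to deliver chunks over time this version starts a throwaway local HTTP server that writes one SSE chunk per interval, then redirects the intercepted request to it via route.continue({ url }) (the URL override must keep the original protocol, so this assumes a plain-http dev server). The RouteLike type is a structural stand-in for Playwright's Route, keeping the snippet dependency-free; pass a real Route in your tests.

```typescript
import { createServer } from 'node:http';

// Structural stand-in for Playwright's Route (pass a real Route in tests).
type RouteLike = {
  continue(overrides: { url: string }): Promise<void>;
};

// Build OpenAI-style SSE chunks from a token list, ending with [DONE].
export function sseChunks(tokens: string[]): string[] {
  const chunks = tokens.map(
    (content) =>
      `data: ${JSON.stringify({ choices: [{ delta: { content } }] })}\n\n`
  );
  chunks.push('data: [DONE]\n\n');
  return chunks;
}

export async function mockSSEStream(
  route: RouteLike,
  { tokens, delayMs = 50 }: { tokens: string[]; delayMs?: number }
): Promise<void> {
  const chunks = sseChunks(tokens);
  // Throwaway server that streams one chunk per delayMs interval.
  const server = createServer(async (_req, res) => {
    res.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
    });
    for (const chunk of chunks) {
      res.write(chunk);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
    res.end();
    server.close();
  });
  await new Promise<void>((resolve) => server.listen(0, resolve));
  const address = server.address();
  const port =
    typeof address === 'object' && address !== null ? address.port : 0;
  // Redirect the intercepted request to the local streaming server.
  await route.continue({ url: `http://127.0.0.1:${port}/` });
}
```

In tests you then call `await page.route('**/api/chat', (route) => mockSSEStream(route, { tokens, delayMs: 100 }))`, which is the pattern the scenarios in this guide rely on.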
Notice that mockSSEStream takes an array of tokens, not a single string. This lets you control exactly where the “word boundaries” fall in the stream. In production, LLM APIs send tokens that rarely align with word boundaries (you might receive "Hel", then "lo, how"), so testing with irregular splits helps catch rendering bugs that only appear with real API responses. For the initial test, clean word-aligned tokens are fine; later scenarios use realistic splits.
4. Verifying Token-by-Token Rendering
The previous scenario waited for the final message. This scenario verifies that tokens actually render incrementally, which is the whole point of a streaming UI. A broken streaming implementation might buffer the entire response and render it at once, which technically passes a “final text matches” assertion but delivers a poor user experience. To catch this, you need to observe the DOM during the stream, not just after it.
The approach uses slower token timing (200ms per token) and polls the message element’s text content at multiple checkpoints during the stream. If the UI renders tokens incrementally, you will see partial text at each checkpoint. If it buffers, all checkpoints will show either empty or the full response.
The key assertion is the mid-stream check: after 450ms with 200ms per token, approximately two tokens should have rendered. The test confirms the early text includes the first two tokens but not the final token. This proves the UI streams tokens incrementally rather than buffering. In production, you would also verify that the cursor or caret animation is active during streaming, which we cover in the typing indicator scenario.
Incremental Rendering Test
test('tokens render incrementally', async ({ page }) => {
  const tokens = ['The ', 'answer ', 'to ', 'your ', 'question ', 'is ', '42.'];
  await page.route('**/api/chat', (route) =>
    mockSSEStream(route, { tokens, delayMs: 200 })
  );
  await page.goto('/');
  await page.getByPlaceholder('Type a message').fill('What is the answer?');
  await page.getByRole('button', { name: 'Send' }).click();

  const msg = page.locator('[data-testid="assistant-message"]').last();
  await page.waitForTimeout(450);
  const earlyText = await msg.textContent();
  expect(earlyText).toContain('The answer');
  expect(earlyText).not.toContain('42.');

  await expect(msg).toContainText('The answer to your question is 42.');
});
5. Cancel Mid-Stream with AbortController
Every production AI chat interface needs a “Stop generating” button. When the user clicks it, the frontend calls AbortController.abort(), which cancels the fetch request, closes the SSE connection, and stops rendering new tokens. Testing this correctly requires verifying three things: the network connection actually closes, the partial message is preserved in the DOM, and the UI returns to an idle state where the user can send a new prompt.
The challenge is timing. You need the stream to be actively sending tokens when the cancel button is clicked. If the stream finishes before your test clicks Stop, you are testing a no-op. The solution is to use a longer token list with generous delays, ensuring the stream is still active when the click happens.
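A sketch of the full cancel flow, assuming a Stop control whose accessible name matches /stop/i (adjust to your UI) and reusing the mockSSEStream helper:

```typescript
test('stop preserves partial message and returns to idle', async ({ page }) => {
  // 40 slow tokens keep the stream active while the test clicks Stop.
  const tokens = Array.from({ length: 40 }, (_, i) => `token${i} `);
  await page.route('**/api/chat', (route) =>
    mockSSEStream(route, { tokens, delayMs: 150 })
  );
  await page.goto('/');
  await page.getByPlaceholder('Type a message').fill('Tell me a long story');
  await page.getByRole('button', { name: 'Send' }).click();

  const msg = page.locator('[data-testid="assistant-message"]').last();
  // Wait until a few tokens have rendered, then cancel mid-stream.
  await expect(msg).toContainText('token2');
  await page.getByRole('button', { name: /stop/i }).click();

  // Let the abort settle before capturing the frozen text.
  await page.waitForTimeout(300);
  const partial = (await msg.textContent()) ?? '';
  expect(partial).toContain('token0');       // partial text preserved
  expect(partial).not.toContain('token39 '); // stream really was cut short

  // No further tokens render after the abort.
  await page.waitForTimeout(500);
  expect(await msg.textContent()).toBe(partial);
  // UI is idle again: the user can send a new prompt.
  await expect(page.getByRole('button', { name: 'Send' })).toBeEnabled();
});
```

If your Send button morphs into the Stop button during streaming, assert on whichever control your markup actually exposes.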
A subtle bug to watch for: some implementations clear the partial message on abort, showing a blank assistant bubble. Others freeze the typing indicator animation but never remove it, leaving a permanently “thinking” UI. Both are real bugs found in production chat applications. The test above catches the first by asserting the partial text is preserved, and the typing indicator scenario (section 6) catches the second.
6. Typing Indicator and Loading States
The typing indicator (often a pulsing dots animation or a blinking cursor) is the user’s primary feedback that the AI is processing their request. It should appear immediately after the user sends a prompt (before any tokens arrive), remain visible during streaming, and disappear once the stream completes. A missing or stuck typing indicator is one of the most commonly reported UX bugs in chat applications.
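A sketch of that lifecycle test, assuming the indicator carries data-testid="typing-indicator" (a placeholder — use your own hook):

```typescript
test('typing indicator appears, persists, then disappears', async ({ page }) => {
  await page.route('**/api/chat', (route) =>
    mockSSEStream(route, {
      tokens: ['Sure, ', 'here ', 'you ', 'go.'],
      delayMs: 300,
    })
  );
  await page.goto('/');
  await page.getByPlaceholder('Type a message').fill('Hello');

  const indicator = page.locator('[data-testid="typing-indicator"]');
  await page.getByRole('button', { name: 'Send' }).click();
  // Visible immediately after send, before and during token arrival.
  await expect(indicator).toBeVisible();
  // Gone once the [DONE] signal transitions the UI back to idle.
  await expect(indicator).toBeHidden({ timeout: 15_000 });
  await expect(
    page.locator('[data-testid="assistant-message"]').last()
  ).toContainText('Sure, here you go.');
});
```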
The 300ms delay per token is intentional. With faster timing, the stream might complete before Playwright can assert on the typing indicator, making the test flaky. Slower tokens give the test a reliable observation window. In your CI configuration, you can reduce the delay for speed, but during development, keep it generous enough for visual debugging with --headed mode.
7. Markdown Rendering and Auto-Scroll During Streaming
AI assistants frequently respond with markdown: headings, bullet lists, code blocks with syntax highlighting, bold text, and inline code. The renderer must handle partial markdown gracefully. A code fence that opens with ```typescript in one chunk should not crash the parser before the closing fence arrives seconds later. Similarly, the chat container should auto-scroll to keep the latest content visible, but it must stop auto-scrolling if the user manually scrolls up to read earlier messages.
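Two sketches of those checks. The pre code selector assumes your renderer emits standard fenced-code markup, and data-testid="chat-scroll" is a placeholder for your scrollable container:

```typescript
test('partial code fence renders without breaking', async ({ page }) => {
  // The fence opens in one chunk and closes three chunks later.
  const tokens = ['Here is an example:\n\n', '```ts\n', 'const a = 1;\n',
    'const b = 2;\n', '```\n', ' Done.'];
  await page.route('**/api/chat', (route) =>
    mockSSEStream(route, { tokens, delayMs: 200 })
  );
  await page.goto('/');
  await page.getByPlaceholder('Type a message').fill('Show me code');
  await page.getByRole('button', { name: 'Send' }).click();

  const msg = page.locator('[data-testid="assistant-message"]').last();
  // After the stream ends, the fenced block is a real <pre><code>.
  await expect(msg.locator('pre code')).toContainText('const b = 2;');
  // Text after the closing fence was not swallowed by the parser.
  await expect(msg).toContainText('Done.');
});

test('auto-scroll follows, then pauses when the user scrolls up', async ({ page }) => {
  const tokens = Array.from({ length: 80 }, (_, i) => `Line ${i}.\n\n`);
  await page.route('**/api/chat', (route) =>
    mockSSEStream(route, { tokens, delayMs: 100 })
  );
  await page.goto('/');
  await page.getByPlaceholder('Type a message').fill('Write a long answer');
  await page.getByRole('button', { name: 'Send' }).click();

  const container = page.locator('[data-testid="chat-scroll"]');
  await page.waitForTimeout(1500);
  // While following, the container stays pinned near the bottom.
  const pinned = await container.evaluate(
    (el) => el.scrollHeight - el.scrollTop - el.clientHeight < 50
  );
  expect(pinned).toBe(true);

  // The user scrolls up to read earlier content...
  await container.evaluate((el) => el.scrollTo({ top: 0 }));
  await page.waitForTimeout(1000);
  // ...and incoming tokens must not yank the viewport back down.
  expect(await container.evaluate((el) => el.scrollTop)).toBeLessThan(100);
});
```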
The auto-scroll tests verify a subtle but important UX contract. When the user is passively watching the stream, the container scrolls down automatically. When the user scrolls up to review previous content, the auto-scroll must pause so it does not yank the viewport away from what the user is reading. Many chat implementations get this wrong by either never auto-scrolling (bad for long responses) or always auto-scrolling (bad for review).
8. Retry on Error and Rate Limiting UI Feedback
Streaming connections fail. The LLM provider returns a 429 (rate limited), the connection drops mid-stream, or the server returns a 500 error after sending partial tokens. A robust chat UI handles all three cases gracefully: showing an error message, offering a retry button, and preserving any partial content that was already rendered. Testing these error paths requires mocking failures at different points in the stream lifecycle.
Error Recovery Test
test('retry after error resends prompt', async ({ page }) => {
  let callCount = 0;
  await page.route('**/api/chat', async (route) => {
    callCount++;
    if (callCount === 1) {
      await route.fulfill({
        status: 500,
        body: JSON.stringify({ error: { message: 'Internal error' } }),
      });
    } else {
      await mockSSEStream(route, {
        tokens: ['Here ', 'is ', 'your ', 'answer.'],
        delayMs: 30,
      });
    }
  });
  await page.goto('/');
  await page.getByPlaceholder('Type a message').fill('Help me');
  await page.getByRole('button', { name: 'Send' }).click();

  await expect(page.getByRole('button', { name: /retry/i })).toBeVisible();
  await page.getByRole('button', { name: /retry/i }).click();

  const msg = page.locator('[data-testid="assistant-message"]').last();
  await expect(msg).toContainText('Here is your answer.');
});

9. Common Pitfalls That Break Streaming Chat Tests
After building streaming chat test suites for dozens of applications, these are the recurring failures that waste the most debugging time. Every pitfall below comes from real issues observed in production codebases and CI pipelines.
Pitfalls to Avoid
- Using waitForResponse() on streaming endpoints. It resolves on the response headers, not the body, so your assertion runs before any tokens arrive.
- Asserting on textContent() immediately after click. The stream has not started yet. Use expect().toContainText() which auto-retries with Playwright's built-in polling.
- Setting timeouts too low. Streaming responses can take 5 to 15 seconds; Playwright's default 5-second expect timeout will cause intermittent failures. Raise actionTimeout and assertion timeouts accordingly.
- Forgetting to mock the [DONE] signal. Without it, the UI never transitions from streaming to idle state, and your typing indicator stays visible forever.
- Testing only with word-aligned tokens. Real LLM APIs split tokens at arbitrary byte boundaries. Use irregular chunks like ['Hel', 'lo, h', 'ow are'] to catch parser bugs.
- Not testing the empty response case. If the LLM returns zero tokens before [DONE], the UI should show a graceful fallback, not a blank bubble.
- Ignoring race conditions on rapid send. If the user sends two prompts in quick succession, the second request should either queue or cancel the first stream cleanly.
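The irregular-split pitfall from the list above is cheap to cover with one extra test:

```typescript
test('irregular token splits still render correct text', async ({ page }) => {
  // Splits that cross word boundaries, like real LLM output.
  const tokens = ['Hel', 'lo, h', 'ow ', 'are', ' you', '?'];
  await page.route('**/api/chat', (route) =>
    mockSSEStream(route, { tokens, delayMs: 50 })
  );
  await page.goto('/');
  await page.getByPlaceholder('Type a message').fill('Hi');
  await page.getByRole('button', { name: 'Send' }).click();
  await expect(
    page.locator('[data-testid="assistant-message"]').last()
  ).toContainText('Hello, how are you?');
});
```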
Message History Persistence
One frequently overlooked test is message persistence across page reloads. Most chat applications store conversation history in localStorage, IndexedDB, or a server-side database. If the storage layer has a bug, a page refresh after a long streaming conversation can lose all messages. Test this by sending a streamed message, reloading the page, and asserting that the conversation history is intact.
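A minimal version of that check might look like this, assuming your app rehydrates history automatically on load:

```typescript
test('conversation survives a page reload', async ({ page }) => {
  await page.route('**/api/chat', (route) =>
    mockSSEStream(route, { tokens: ['Persisted ', 'reply.'], delayMs: 30 })
  );
  await page.goto('/');
  await page.getByPlaceholder('Type a message').fill('Remember this');
  await page.getByRole('button', { name: 'Send' }).click();

  const msg = page.locator('[data-testid="assistant-message"]').last();
  await expect(msg).toContainText('Persisted reply.');

  await page.reload();
  // Both sides of the exchange should be restored from storage.
  await expect(page.getByText('Remember this')).toBeVisible();
  await expect(
    page.locator('[data-testid="assistant-message"]').last()
  ).toContainText('Persisted reply.');
});
```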
Pre-Flight Checklist for Streaming Tests
- Mock SSE helper returns proper Content-Type: text/event-stream header
- Token arrays include irregular splits to test real-world chunking
- actionTimeout in config is at least 15 seconds
- Every mock stream sends the [DONE] termination signal
- Error scenarios cover 429, 500, and mid-stream connection drop
- Cancel/abort tests verify partial content preservation
- Auto-scroll tests verify both follow and pause behaviors
10. Writing These Scenarios in Plain English with Assrt
Every Playwright test in this guide follows the same pattern: set up a mock SSE stream, navigate to the chat UI, type a prompt, click send, and assert on the rendered output. The pattern is clear, but the boilerplate adds up. Each test requires the route interception, the SSE helper, the token array, the delay configuration, and the locator queries. Across 12 tests, that is hundreds of lines of TypeScript that must be maintained as your chat UI evolves.
Assrt lets you express the same scenarios in plain English. It compiles each scenario into the exact Playwright TypeScript shown above, committed to your repository as real tests you can inspect, modify, and run. When your chat UI changes its markup (renaming data-testid="assistant-message" to data-testid="ai-response", for example), Assrt detects the failure, analyzes the new DOM, and opens a pull request with updated locators. Your scenario files remain untouched.
Each scenario block compiles to the same Playwright test patterns shown throughout this guide. The mock configuration section tells Assrt how to intercept the streaming endpoint, so you do not need to write the SSE helper manually. The expect blocks map to Playwright’s expect() assertions with automatic retry logic.
Start with the basic streaming scenario. Once it passes in your CI, add the incremental rendering test, then the cancel flow, then error recovery, then markdown rendering, and finally the persistence check. In a single afternoon you can have complete AI chat streaming coverage that most production applications never achieve by hand.
Ready to automate your testing?
Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.