Can Playwright test video captions and subtitles?

Yes. Playwright can access the HTML5 TextTrack API through page.evaluate() to read cue lists, check active cues at specific timestamps, toggle track modes between showing, hidden, and disabled, and verify multi-language caption switching. You interact with the native TextTrack and VTTCue objects directly in the browser context.

How do you verify caption timing in Playwright?

Seek the video to a specific timestamp by setting video.currentTime inside page.evaluate(), then wait for the seeked event to fire. After seeking completes, read the activeCues property from the TextTrack object. Each active VTTCue has startTime, endTime, and text properties you can assert against your expected values.

Why are my VTT captions not loading in Playwright tests?

The three most common causes are: (1) the track mode is still set to disabled, which prevents the browser from fetching the VTT file at all; (2) a CORS error when the VTT file is on a different origin from the page, which requires the crossorigin attribute on the parent video element (the track element has no crossorigin attribute of its own and inherits the CORS mode from the video) plus proper response headers; (3) the VTT file is actually an SRT file that was renamed without converting comma-based timestamps to period-based timestamps, which causes silent parsing failure.

How do you test multi-language captions with Playwright?

Access the video element's textTracks collection, which contains one TextTrack per language. Set the desired track's mode to showing and all others to disabled. Wait for the new track's cues to load using waitForFunction, then seek to a timestamp and verify the activeCues text matches the expected translation. Confirm only one track is in showing mode to prevent overlapping captions.

Specialized Testing Guide

How to Test Video Captions with Playwright: TextTrack API, VTT Parsing, and Multi-Language Switching

A scenario-by-scenario walkthrough of testing video captions and subtitles with Playwright. Covers the TextTrack API, WebVTT parsing, cue timing assertions, track mode toggling between showing, hidden, and disabled, multi-language switching, and the silent failures that break caption testing in production.


“The FCC reports that 98% of surveyed viewers use captions at least some of the time, and caption-related accessibility lawsuits have grown steadily since 2019.”

FCC / 3Play Media 2024 Study



1. Why Testing Video Captions Is Harder Than It Looks

Video captions seem simple on the surface. You add a <track> element to your <video>, point it at a WebVTT file, and the browser renders text overlays at the right moment. In practice, the system has five layers of complexity that make automated testing surprisingly difficult.

First, the TextTrack API is entirely asynchronous. When the browser encounters a <track> element, it does not load the VTT file immediately. It waits until the track mode is set to "showing" or "hidden" (a track with the default attribute is enabled automatically), then fetches and parses the file in the background. Your test cannot simply check for cues right after page load because the cue list may still be empty.

Second, cue timing is continuous. Unlike DOM interactions where an element is either visible or not, captions depend on the video's currentTime property. A cue that starts at 5.200 seconds and ends at 8.100 seconds is only active during that window. Your test must seek the video to a specific timestamp and then query the activeCues property, accounting for the fact that seeking itself is asynchronous.

Third, track modes create a three-state system. A TextTrack can be "showing" (renders visually and fires events), "hidden" (fires events but renders nothing), or "disabled" (the cue list is not even loaded). Toggling between these modes has side effects that vary across browsers.

Fourth, multi-language support means multiple tracks compete for the active slot, and the browser's built-in language preference logic can override your programmatic selection.

Fifth, custom video players (Video.js, Plyr, JW Player) wrap the native elements in their own DOM and often manage captions through JavaScript rather than native track elements, meaning your selectors and API calls differ from one player to the next.

Caption Loading Pipeline

HTML Parsed: <video> and <track> elements created
Track Mode Set: mode changes to showing or hidden
VTT Fetched: browser requests the .vtt file
Cues Parsed: WebVTT cues added to the TextTrackCueList
Video Plays: currentTime advances
Active Cues: activeCues updates in real time

2. Setting Up a Reliable Caption Test Environment

Before writing any caption tests, you need a deterministic video fixture and a known VTT file. Using production video URLs introduces network latency and potential CDN failures. Instead, serve a short, silent test video from your local fixture directory and pair it with a handcrafted VTT file that has precisely timed cues.

fixtures/test-captions-en.vtt

fixtures/test-captions-es.vtt
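The two fixture files are not reproduced in this guide. One way to keep handcrafted cues precise is to generate the VTT text from a small cue table, as in the minimal sketch below. The helper names (formatTimestamp, buildVtt) and the cue times are assumptions made for illustration; only the two caption strings come from the tests later in this guide.

```typescript
// Hypothetical fixture generator. Cue times are illustrative assumptions.
interface CueSpec {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(value: number, width: number): string {
  let out = String(value);
  while (out.length < width) out = '0' + out;
  return out;
}

// Format seconds as a WebVTT timestamp: HH:MM:SS.mmm (period, not comma).
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ':' + pad(m, 2) + ':' + pad(s, 2) + '.' + pad(ms, 3);
}

// Build a WebVTT document: signature line, then cue blocks separated by blank lines.
function buildVtt(cues: CueSpec[]): string {
  const blocks = cues.map(
    (cue) => formatTimestamp(cue.start) + ' --> ' + formatTimestamp(cue.end) + '\n' + cue.text,
  );
  return ['WEBVTT'].concat(blocks).join('\n\n') + '\n';
}

const englishCues: CueSpec[] = [
  { start: 1.0, end: 4.0, text: 'Welcome to the product demo.' },
  { start: 5.2, end: 8.1, text: 'Click the dashboard icon to begin.' },
];

const englishVtt = buildVtt(englishCues);
```

Writing englishVtt to fixtures/test-captions-en.vtt in a setup script keeps the cue table and the expected values in your assertions in one place.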

Test HTML Fixture

Create a minimal HTML page that your Playwright tests can load directly. This isolates caption behavior from your application framework and eliminates variables like lazy loading, route transitions, and JavaScript hydration delays.

fixtures/video-captions.html

Playwright Configuration

Configure Playwright to serve your fixtures directory as a static site. The webServer option handles this cleanly, or you can use page.goto with a file:// URL. The static server approach is more reliable because some browsers restrict TextTrack loading from file:// origins.

playwright.config.ts
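A minimal sketch of such a configuration follows. The port number, the tests directory, and the choice of the http-server package as the static server are assumptions; any static file server works. The config is written as a plain object here, which Playwright accepts as a default export.

```typescript
// playwright.config.ts (sketch). Port 4173 and http-server are assumptions.
const FIXTURE_PORT = 4173;
const baseURL = 'http://localhost:' + FIXTURE_PORT;

const config = {
  testDir: './tests',
  use: {
    // Lets tests call page.goto('/video-captions.html') with a relative path.
    baseURL,
  },
  webServer: {
    // Serve ./fixtures over HTTP so <track> loading is not subject to file:// restrictions.
    command: 'npx http-server ./fixtures -p ' + FIXTURE_PORT,
    // Playwright waits for this URL to respond before starting the tests.
    url: baseURL + '/video-captions.html',
    // Reuse a server that is already running locally; consider disabling this in CI.
    reuseExistingServer: true,
  },
};

export default config;
```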


3. Scenario: Verifying Captions Load Successfully

The most fundamental caption test confirms that the browser successfully fetches the VTT file, parses it without errors, and populates the TextTrack's cue list. This is your smoke test. A failed VTT fetch (wrong URL, CORS error, malformed file) results in an empty cue list with no visible error in the console, making it a silent failure that only surfaces when a real user tries to enable captions.

Verify Caption Track Loads and Cues Are Parsed

Straightforward

Goal

Load the video fixture, wait for the default caption track to reach the "loaded" readyState, and confirm the cue list contains the expected number of cues.

Preconditions

Fixture HTML served at /video-captions.html
English VTT file contains exactly 5 cues
The English track has the default attribute

Playwright Implementation

captions-load.spec.ts

What to Assert Beyond the UI

The <track> element's readyState is 2 (LOADED), not 3 (ERROR); readyState lives on the HTMLTrackElement, not on the TextTrack object
The cues.length matches the VTT file exactly
No console errors related to VTT parsing or CORS
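The first two assertions can be turned into readable failure messages with a small helper, sketched below. checkTrackLoaded and TrackSnapshot are hypothetical names; the numeric readyState values are the ones the HTML standard defines for the <track> element.

```typescript
// Numeric values of HTMLTrackElement.readyState, in order: 0, 1, 2, 3.
const TRACK_READY_STATES = ['NONE', 'LOADING', 'LOADED', 'ERROR'] as const;
type TrackReadyState = (typeof TRACK_READY_STATES)[number];

interface TrackSnapshot {
  readyState: number; // read from the <track> element
  cueCount: number; // read from track.cues?.length ?? 0
}

// Hypothetical helper: turn a snapshot into a list of human-readable failures.
function checkTrackLoaded(snapshot: TrackSnapshot, expectedCues: number): string[] {
  const failures: string[] = [];
  const state: TrackReadyState | undefined = TRACK_READY_STATES[snapshot.readyState];
  if (state !== 'LOADED') {
    failures.push(
      'readyState is ' + snapshot.readyState + ' (' + (state || 'unknown') + '), expected 2 (LOADED)',
    );
  }
  if (snapshot.cueCount !== expectedCues) {
    failures.push('cue count is ' + snapshot.cueCount + ', expected ' + expectedCues);
  }
  return failures;
}
```

In a test, the snapshot would be collected inside page.evaluate() by reading readyState from the element with id track-en and the cue count from video.textTracks[0], then passed to the helper on the Node side, where an empty result means the track loaded as expected.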

Caption Load Check: Playwright vs Assrt

import { test, expect } from '@playwright/test';

test('caption track loads', async ({ page }) => {
  await page.goto('/video-captions.html');
  const video = page.locator('#test-video');
  await expect(video).toBeVisible();

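  // Resolves true once the English track has finished loading, false on error or timeout.
  const trackReady = await page.evaluate(() => {
    return new Promise<boolean>((resolve) => {
      const video = document.getElementById('test-video') as HTMLVideoElement;
      const track = video.textTracks[0];
      // A disabled track is never fetched, so enable it first.
      if (track.mode === 'disabled') track.mode = 'showing';
      if (track.cues && track.cues.length > 0) {
        resolve(true);
        return;
      }
      const el = document.getElementById('track-en') as HTMLTrackElement;
      // The load may have settled before listeners were attached: 2 is LOADED, 3 is ERROR.
      if (el.readyState === 2 || el.readyState === 3) {
        resolve(el.readyState === 2);
        return;
      }
      el.addEventListener('load', () => resolve(true), { once: true });
      el.addEventListener('error', () => resolve(false), { once: true });
      // Fail rather than hang if neither event fires.
      setTimeout(() => resolve(false), 10_000);
    });
  });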

  expect(trackReady).toBe(true);
  const cueCount = await page.evaluate(() => {
    const v = document.getElementById('test-video') as HTMLVideoElement;
    return v.textTracks[0].cues?.length ?? 0;
  });
  expect(cueCount).toBe(5);
});


4. Scenario: Asserting Cue Timing and Content

Confirming that captions load is only step one. The real value of caption testing is verifying that the right text appears at the right time. A common production bug is a VTT file where cue timestamps are shifted by a few seconds (often caused by re-encoding or editing tools that recalculate offsets incorrectly). This test seeks the video to known timestamps and asserts which cues are active.

Cue Timing and Active Cue Content Verification

Moderate

Goal

Seek the video to multiple known timestamps and verify that the activeCues property returns the expected text at each position.

Playwright Implementation

cue-timing.spec.ts

What to Assert Beyond the UI

Each cue's startTime and endTime match the VTT file within a 50ms tolerance
The activeCues list is empty when seeking to gaps between cues
Cue text content matches exactly, including punctuation
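The tolerance and gap checks can be expressed as plain functions over cue data that has been copied out of the browser, as in the sketch below. compareCues and activeTextsAt are hypothetical helpers, and CueTiming mirrors the startTime, endTime, and text properties of VTTCue. The active window is treated as start-inclusive and end-exclusive, which matches how browsers decide whether a cue is active.

```typescript
interface CueTiming {
  startTime: number; // seconds
  endTime: number; // seconds
  text: string;
}

const TOLERANCE_SECONDS = 0.05; // the 50ms tolerance from the checklist

// Compare parsed cues against the expected table; returns one message per mismatch.
function compareCues(
  expected: CueTiming[],
  actual: CueTiming[],
  tolerance: number = TOLERANCE_SECONDS,
): string[] {
  const problems: string[] = [];
  if (expected.length !== actual.length) {
    problems.push('expected ' + expected.length + ' cues, got ' + actual.length);
  }
  expected.forEach((want, i) => {
    const got = actual[i];
    if (!got) return; // already reported by the length check
    if (Math.abs(want.startTime - got.startTime) > tolerance) {
      problems.push('cue ' + i + ': startTime ' + got.startTime + ' differs from ' + want.startTime);
    }
    if (Math.abs(want.endTime - got.endTime) > tolerance) {
      problems.push('cue ' + i + ': endTime ' + got.endTime + ' differs from ' + want.endTime);
    }
    if (want.text !== got.text) {
      problems.push('cue ' + i + ': text "' + got.text + '" differs from "' + want.text + '"');
    }
  });
  return problems;
}

// Texts of the cues that should be active at time t (empty inside a gap).
function activeTextsAt(cues: CueTiming[], t: number): string[] {
  return cues.filter((c) => c.startTime <= t && t < c.endTime).map((c) => c.text);
}
```

The actual array would come from a page.evaluate() call that maps textTracks[0].cues to plain objects, since VTTCue instances themselves cannot be returned across the browser boundary.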

Cue Timing: Playwright vs Assrt

import { test, expect } from '@playwright/test';

test('correct captions at each timestamp', async ({ page }) => {
  await page.goto('/video-captions.html');
  await page.evaluate(() => {
    const v = document.getElementById('test-video') as HTMLVideoElement;
    v.textTracks[0].mode = 'showing';
  });
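  // Wait for the cue list to populate; reject on a failed load instead of hanging.
  await page.evaluate(() => {
    return new Promise<void>((resolve, reject) => {
      const v = document.getElementById('test-video') as HTMLVideoElement;
      if (v.textTracks[0].cues?.length) { resolve(); return; }
      const el = document.getElementById('track-en')!;
      el.addEventListener('load', () => resolve(), { once: true });
      el.addEventListener('error', () => reject(new Error('VTT track failed to load')), { once: true });
    });
  });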

  const assertions = [
    { seekTo: 2.0, text: 'Welcome to the product demo.' },
    { seekTo: 6.0, text: 'Click the dashboard icon to begin.' },
  ];

  for (const { seekTo, text } of assertions) {
    const active = await page.evaluate((t) => {
      return new Promise<string>((resolve) => {
        const v = document.getElementById('test-video') as HTMLVideoElement;
        v.currentTime = t;
        v.addEventListener('seeked', () => {
          const c = v.textTracks[0].activeCues;
          resolve(c?.length ? (c[0] as VTTCue).text : '');
        }, { once: true });
      });
    }, seekTo);
    expect(active).toBe(text);
  }
});



5. Scenario: Toggling Track Modes (Showing, Hidden, Disabled)

The TextTrack API has three modes that control caption behavior. The "showing" mode renders captions visually on the video and fires cuechange events. The "hidden" mode fires events but renders nothing, which is useful for programmatic access to caption data without visual display. The "disabled" mode stops everything: no events, no cue loading, and the cues property returns null. Testing mode transitions is critical because many video players implement a "CC" toggle button that cycles through these modes, and a bug in the toggle logic can leave captions in the wrong state.

Track Mode Transitions and Side Effects

Moderate

Goal

Programmatically toggle the track through all three modes and verify that captions render in showing mode, are invisible in hidden mode, and that the cue list behaves correctly in disabled mode.

track-modes.spec.ts
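The spec file itself is not reproduced here. As a sketch of the assertion logic, the expected behavior of each mode can be written as a table and compared against what the test observes. EXPECTED_BEHAVIOR and modeViolations are hypothetical names. How captionVisible is observed is an assumption that depends on your player, because native caption rendering is generally not reachable through page selectors, so a screenshot comparison or the player's own caption element may be needed.

```typescript
type TrackMode = 'showing' | 'hidden' | 'disabled';

interface ModeBehavior {
  captionVisible: boolean; // caption text is rendered over the video
  cueChangeFired: boolean; // a cuechange event arrived after seeking into a cue
  cuesIsNull: boolean; // track.cues === null
}

// Expected behavior per mode, following the description of the three modes.
const EXPECTED_BEHAVIOR: Record<TrackMode, ModeBehavior> = {
  showing: { captionVisible: true, cueChangeFired: true, cuesIsNull: false },
  hidden: { captionVisible: false, cueChangeFired: true, cuesIsNull: false },
  disabled: { captionVisible: false, cueChangeFired: false, cuesIsNull: true },
};

// Compare one observation against the table; returns one message per mismatch.
function modeViolations(mode: TrackMode, observed: ModeBehavior): string[] {
  const expected = EXPECTED_BEHAVIOR[mode];
  const keys: (keyof ModeBehavior)[] = ['captionVisible', 'cueChangeFired', 'cuesIsNull'];
  const problems: string[] = [];
  for (const key of keys) {
    if (observed[key] !== expected[key]) {
      problems.push(mode + ': expected ' + key + '=' + expected[key] + ', observed ' + observed[key]);
    }
  }
  return problems;
}
```

A test would set the mode inside page.evaluate(), seek into a known cue, collect the three observations, and assert that modeViolations returns an empty array for each mode in turn.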

6. Scenario: Multi-Language Caption Switching

Multi-language support is where caption testing gets genuinely tricky. Only one caption or subtitle track should be showing at a time, but the HTML spec does not enforce that for scripted changes: the browser's automatic track selection enables at most one captions or subtitles track on load, while setting a track's mode to "showing" from JavaScript leaves every other track in whatever mode it already had. If your switching code, or the player's, forgets to disable the previously active track, both tracks stay in "showing" mode and the captions overlap. Custom video players often manage track switching in their own JavaScript, so the exclusivity logic is theirs to get right and yours to test.

Switch Between English and Spanish Captions

Complex

Goal

Start with English captions active, switch to Spanish, verify the Spanish cue text appears at the same timestamp, and confirm the English track is no longer showing.

Preconditions

Both English and Spanish VTT files loaded
English track has the default attribute
Spanish VTT cues are time-aligned with English

language-switch.spec.ts
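Selecting tracks by index works for a two-track fixture but is brittle once track order changes. The sketch below selects by language instead and reports which languages are showing afterwards. showOnly and showingLanguages are hypothetical helpers written against a minimal structural type, so the same logic can be unit-tested in Node; inside page.evaluate() the function bodies have to be inlined, because functions defined on the Node side are not available in the browser context.

```typescript
type TrackMode = 'showing' | 'hidden' | 'disabled';

// Minimal structural view of a TextTrack.
interface TrackLike {
  kind: string;
  language: string;
  mode: TrackMode;
}

// Show the first captions/subtitles track in the given language and disable the
// other captions/subtitles tracks. Returns false (with all of them disabled) if
// no track matches.
function showOnly(tracks: ArrayLike<TrackLike>, language: string): boolean {
  let found = false;
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track.kind !== 'captions' && track.kind !== 'subtitles') continue;
    const isMatch = !found && track.language === language;
    track.mode = isMatch ? 'showing' : 'disabled';
    if (isMatch) found = true;
  }
  return found;
}

// Languages of all tracks currently in showing mode; a healthy player has at most one.
function showingLanguages(tracks: ArrayLike<TrackLike>): string[] {
  const languages: string[] = [];
  for (let i = 0; i < tracks.length; i++) {
    if (tracks[i].mode === 'showing') languages.push(tracks[i].language);
  }
  return languages;
}
```

Asserting that showingLanguages returns exactly one entry after every switch is a direct check for the overlapping-captions bug.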

Language Switching: Playwright vs Assrt

import { test, expect } from '@playwright/test';

test('switch English to Spanish', async ({ page }) => {
  await page.goto('/video-captions.html');
  await page.evaluate(() => {
    const v = document.getElementById('test-video') as HTMLVideoElement;
    v.textTracks[0].mode = 'showing';
    v.textTracks[1].mode = 'disabled';
  });
  await page.waitForFunction(() => {
    const v = document.getElementById('test-video') as HTMLVideoElement;
    return (v.textTracks[0].cues?.length ?? 0) > 0;
  });
  // ... seek, check English, switch, check Spanish ...
  await page.evaluate(() => {
    const v = document.getElementById('test-video') as HTMLVideoElement;
    v.textTracks[0].mode = 'disabled';
    v.textTracks[1].mode = 'showing';
  });
  await page.waitForFunction(() => {
    const v = document.getElementById('test-video') as HTMLVideoElement;
    return (v.textTracks[1].cues?.length ?? 0) > 0;
  });
  const spanishCue = await page.evaluate(() => {
    return new Promise<string>((resolve) => {
      const v = document.getElementById('test-video') as HTMLVideoElement;
      v.currentTime = 2.0;
      v.addEventListener('seeked', () => {
        const c = v.textTracks[1].activeCues;
        resolve(c?.length ? (c[0] as VTTCue).text : '');
      }, { once: true });
    });
  });
  expect(spanishCue).toBe('Bienvenido...');
});


7. Scenario: VTT File Parsing and Validation

WebVTT files have a strict format. The file must start with the string WEBVTT, followed by an optional header, then cue blocks separated by blank lines. Common production errors include a missing WEBVTT header (typically an SRT file that was renamed without being converted), overlapping cue timestamps, UTF-8 BOM characters that trip up stricter parsers and tooling (the WebVTT spec allows a leading BOM, but not every custom player or pipeline handles it), and malformed timestamps that use commas instead of periods for milliseconds, which is the SRT convention. This scenario validates the VTT file structure programmatically.

VTT File Structure Validation

Moderate

Goal

Fetch the VTT file directly, parse its contents, and validate the structure: correct header, valid timestamps, no overlaps, and proper encoding.

vtt-validation.spec.ts
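The spec file is not reproduced here, so the following is a minimal sketch of the validation step rather than the original test. validateVtt is a hypothetical, deliberately simplified checker: it is not a full WebVTT parser, it ignores cue settings, and it treats a leading BOM and overlapping cues as errors because this guide's checklist does, even though the WebVTT format itself permits both.

```typescript
interface ParsedCue {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

interface VttReport {
  cues: ParsedCue[];
  errors: string[];
}

// Optional hours, then MM:SS.mmm, on both sides of the arrow. Periods only.
const TIMING_LINE =
  /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})/;

function toSeconds(h: string | undefined, m: string, s: string, ms: string): number {
  return Number(h || 0) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

function validateVtt(raw: string): VttReport {
  const errors: string[] = [];
  const cues: ParsedCue[] = [];
  let text = raw;
  if (text.charCodeAt(0) === 0xfeff) {
    errors.push('file starts with a UTF-8 BOM');
    text = text.slice(1);
  }
  // Normalize line endings, then split into blocks separated by blank lines.
  const blocks = text.replace(/\r\n?/g, '\n').trim().split(/\n{2,}/);
  if (/^WEBVTT(?:[ \t\n]|$)/.test(blocks[0] || '')) {
    blocks.shift(); // drop the header block
  } else {
    errors.push('missing WEBVTT signature on the first line');
  }
  for (const block of blocks) {
    const lines = block.split('\n');
    if (/^(NOTE|STYLE|REGION)\b/.test(lines[0])) continue;
    // A cue block is an optional identifier line followed by the timing line.
    const timingIndex = lines[0].indexOf('-->') !== -1 ? 0 : 1;
    const timing = lines[timingIndex];
    if (timing === undefined || timing.indexOf('-->') === -1) {
      if (block.trim() !== '') errors.push('block without a timing line: "' + lines[0] + '"');
      continue;
    }
    const match = TIMING_LINE.exec(timing.trim());
    if (!match) {
      errors.push('malformed timestamp line: "' + timing + '"');
      continue;
    }
    const start = toSeconds(match[1], match[2], match[3], match[4]);
    const end = toSeconds(match[5], match[6], match[7], match[8]);
    const cueText = lines.slice(timingIndex + 1).join('\n').trim();
    if (end <= start) errors.push('cue does not end after it starts: "' + timing + '"');
    if (cueText === '') errors.push('empty cue text at "' + timing + '"');
    const previous = cues[cues.length - 1];
    if (previous && start < previous.end) {
      errors.push('cue starting at ' + start + 's overlaps the previous cue');
    }
    cues.push({ start, end, text: cueText });
  }
  return { cues, errors };
}
```

In a Playwright test the raw text could be fetched with page.request.get('/test-captions-en.vtt') followed by response.text(), then passed to validateVtt, asserting that errors is empty and that cues.length matches the fixture.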

VTT Validation Pipeline

Fetch VTT: HTTP GET the raw file
Check Header: must start with WEBVTT
Parse Cues: extract timestamps and text
Validate Order: no overlapping timestamps
Check Encoding: no BOM, valid UTF-8
Assert Content: no empty cue text

8. Scenario: Dynamically Loaded and Live Captions

Not all captions come from static VTT files declared in HTML. Modern applications often load captions dynamically via JavaScript, either from an API response, a WebSocket for live captions, or through the addTextTrack() method on the video element. Live streaming platforms generate captions in real time using speech-to-text services, and those cues are appended to the track as the stream progresses. Testing dynamic captions requires intercepting the caption source and verifying cues appear after the JavaScript loading logic completes.

Dynamically Added Caption Track via JavaScript

Complex

Goal

Simulate a JavaScript-driven caption loading flow where tracks are added programmatically after page load, and verify the dynamically added cues are accessible through the TextTrack API.

dynamic-captions.spec.ts
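The spec file is not reproduced here. As a sketch of the loading logic under test, the helper below appends cue data (for example, from an API response) to a track and skips entries that would never be active. appendCues, CueSink, and CueSpec are hypothetical names. The cue constructor is passed in as a factory so the function can be exercised in Node with a fake track; in the browser the factory would create VTTCue objects.

```typescript
interface CueSpec {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Minimal structural view of a TextTrack; the object returned by
// video.addTextTrack() satisfies it in the browser.
interface CueSink<TCue> {
  mode: string;
  addCue(cue: TCue): void;
}

// Append usable cues to the track and return how many were added.
function appendCues<TCue>(
  track: CueSink<TCue>,
  specs: CueSpec[],
  makeCue: (start: number, end: number, text: string) => TCue,
): number {
  let added = 0;
  for (const spec of specs) {
    // Skip zero-length, reversed, or blank cues.
    if (!(spec.end > spec.start) || spec.text.trim() === '') continue;
    track.addCue(makeCue(spec.start, spec.end, spec.text));
    added++;
  }
  return added;
}

// Browser-side usage (inside page.evaluate or application code):
//   const track = video.addTextTrack('captions', 'Live captions', 'en');
//   track.mode = 'showing'; // tracks created by addTextTrack() start out hidden
//   appendCues(track, specs, (s, e, t) => new VTTCue(s, e, t));
```

Because a track created with addTextTrack() has no <track> element and fires no load event, the test should wait on the cue count with waitForFunction rather than on a load listener.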

Route Interception for Caption Testing

Playwright's route interception lets you replace VTT file responses with custom content. This is invaluable for testing error handling (what happens when the VTT file returns a 404?), testing specific cue edge cases, or simulating slow network conditions that delay caption loading.

caption-intercept.spec.ts
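The spec file is not reproduced here. One way to organize interception tests is to keep the canned responses in a single function and hand its result to route.fulfill(), as sketched below. vttResponse and the scenario names are hypothetical; the returned object uses only the status, contentType, and body options that route.fulfill() accepts.

```typescript
// Subset of the options accepted by Playwright's route.fulfill().
interface FulfillOptions {
  status: number;
  contentType: string;
  body: string;
}

type VttScenario = 'ok' | 'notFound' | 'srtTimestamps' | 'empty';

// Canned responses for a caption request, one per scenario under test.
function vttResponse(scenario: VttScenario): FulfillOptions {
  switch (scenario) {
    case 'ok':
      return {
        status: 200,
        contentType: 'text/vtt',
        body: 'WEBVTT\n\n00:00:01.000 --> 00:00:04.000\nWelcome to the product demo.\n',
      };
    case 'notFound':
      return { status: 404, contentType: 'text/plain', body: 'Not Found' };
    case 'srtTimestamps':
      // Valid signature, but comma timestamps left over from an SRT file.
      return {
        status: 200,
        contentType: 'text/vtt',
        body: 'WEBVTT\n\n00:00:01,000 --> 00:00:04,000\nWelcome to the product demo.\n',
      };
    case 'empty':
      return { status: 200, contentType: 'text/vtt', body: 'WEBVTT\n' };
  }
}
```

In a test, register the handler before navigating, for example with await page.route('**/test-captions-en.vtt', (route) => route.fulfill(vttResponse('notFound'))), then load the page and assert on the <track> element's readyState and the cue count.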

9. Common Pitfalls That Break Caption Test Suites

Caption testing has a unique set of failure modes that do not appear in standard UI testing. These issues are sourced from real bug reports on the Playwright GitHub repository, the WebVTT spec errata, and video.js community forums.

Querying Cues Before the Track Has Loaded

The most frequent mistake in caption tests is reading textTracks[0].cues immediately after setting the mode. The VTT file is fetched asynchronously, and the cue list is null or empty until parsing completes. Always wait for the load event on the <track> element or poll the cue count with waitForFunction.

Forgetting That Seeking Is Asynchronous

Setting video.currentTime = 5.0 does not instantly move the playhead. The browser must decode the video at that position, and the seeked event fires only after the seek completes. Checking activeCues immediately after assignment will return stale data from the previous position. Always wrap seek operations in a Promise that resolves on the seeked event.

CORS Blocking VTT Requests

If the VTT file is served from a different origin than the page (for example, from a CDN), the browser blocks the track load unless the VTT response includes the correct Access-Control-Allow-Origin header and the parent <video> element has the crossorigin attribute. The <track> element has no crossorigin attribute of its own; it inherits the CORS mode from the video. In some browsers the <track> element's readyState transitions to ERROR with no console message. Test this explicitly by checking readyState after the load attempt.

SRT Files Served as VTT

SRT (SubRip) files use commas for millisecond separators (00:00:01,000) while VTT uses periods (00:00:01.000). If someone renames a .srt file to .vtt without converting it, the browser fails to parse any cues and reports nothing in the console. A file with no WEBVTT signature fails outright, and the <track> element's readyState ends at ERROR. A file that had the WEBVTT line added but kept its comma timestamps loads, readyState reaches LOADED, and the cue list is empty. This is one of the most common caption bugs in production, and the smoke test catches both variants by asserting readyState and cue count after load.
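The conversion that a plain rename skips is small. The sketch below shows one way to do it, assuming well-formed SRT input; srtToVtt and hasSrtTimestamps are hypothetical helpers, and SRT-specific styling tags are not handled.

```typescript
// An SRT timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
const SRT_TIMING = /^(\d{2}:\d{2}:\d{2}),(\d{3})(\s+-->\s+)(\d{2}:\d{2}:\d{2}),(\d{3})/;

// Convert SRT text to WebVTT: add the signature and switch commas to periods
// on timing lines only. Numeric SRT counters remain as cue identifiers.
function srtToVtt(srt: string): string {
  const lines = srt.replace(/^\uFEFF/, '').replace(/\r\n?/g, '\n').split('\n');
  const converted = lines.map((line) => line.replace(SRT_TIMING, '$1.$2$3$4.$5'));
  return 'WEBVTT\n\n' + converted.join('\n').trim() + '\n';
}

// Detect leftover SRT timestamps in a file that claims to be VTT.
function hasSrtTimestamps(text: string): boolean {
  return /^\d{2}:\d{2}:\d{2},\d{3}\s+-->/m.test(text);
}
```

Running hasSrtTimestamps over every .vtt file in the repository as a CI check is a cheap guard against this failure mode.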

Browser Differences in Track Mode Behavior

Chromium, Firefox, and WebKit handle the transition from disabled to showing differently. Chromium re-fetches the VTT file when re-enabling a disabled track. Firefox may keep the parsed cues in memory. WebKit sometimes requires a small delay after mode change before the cue list populates. Always use waitForFunction rather than fixed timeouts when waiting for cues.

Caption Testing Pre-Flight Checklist

VTT file starts with WEBVTT header (no BOM)
Timestamps use periods, not commas, for milliseconds
Wait for track load event before querying cues
Wait for seeked event before checking activeCues
CORS headers set for cross-origin VTT files
Test fixture uses a short, deterministic video file
Only one caption track in showing mode at a time
Check readyState for ERROR (value 3) after load attempt

Common Anti-Patterns to Avoid

Reading cues immediately after setting track mode without waiting
Checking activeCues without waiting for the seeked event
Using fixed sleeps instead of waitForFunction for cue loading
Serving SRT files renamed to .vtt without format conversion
Assuming all browsers handle disabled-to-showing transition identically
Ignoring the crossorigin attribute on the parent video element when the VTT file is cross-origin


10. Writing Caption Tests in Plain English with Assrt

Every scenario above requires deep knowledge of the TextTrack API, asynchronous seeking patterns, and browser-specific cue loading behavior. The cue timing test alone is over 40 lines of Playwright TypeScript with nested Promises and event listeners. Assrt lets you describe the test intent in plain English, generates the equivalent Playwright code, and regenerates the selectors and API calls automatically when the underlying video player changes.

The multi-language switching scenario from Section 6 demonstrates this well. In raw Playwright, you need to manually manage track indices, wait for cue loading with polling functions, wrap seek operations in Promise-based event handlers, and verify mode transitions across multiple tracks. In Assrt, you describe what you want to verify and the framework handles the async coordination.

scenarios/video-captions-suite.assrt

Assrt compiles each scenario into the same Playwright TypeScript you saw in the preceding sections, committed to your repo as real tests you can read, run, and modify. When a video player library updates its DOM structure, when a new browser version changes track mode behavior, or when your VTT file format evolves, Assrt detects the failure, analyzes the new API surface, and opens a pull request with updated test code. Your scenario files remain unchanged.

Start with the caption load smoke test. Once it passes in CI, add the cue timing scenario, then track mode toggling, then multi-language switching, then dynamic caption loading. Within a single afternoon you can have comprehensive video caption coverage that catches the silent VTT failures, CORS blocks, and timing bugs that most applications never detect until users report missing captions.

