How to Debug CI Test Failures in GitLab (and Other Pipelines)
CI failures that pass locally are one of the most frustrating experiences in software development. This guide walks through a systematic approach to diagnosing pipeline test failures, from environment drift to flaky selectors, with practical fixes you can apply today.
1. Why Tests Pass Locally but Fail in CI
The most common CI debugging scenario is the test that passes perfectly on your machine but fails in the pipeline. This happens because your local environment and the CI environment are never truly identical. Your local machine has cached dependencies, a specific browser version, environment variables set in your shell profile, and leftover state from previous runs. The CI environment starts fresh every time, which is both its strength and the source of these discrepancies.
In GitLab CI specifically, the runner executes inside a Docker container (or a shell executor) that may have different system libraries, timezone settings, locale configurations, and network characteristics. A test that relies on Date.now()-based timing, locale-specific date or number rendering, or fast network responses can behave differently in this environment. The first step in any CI debugging session is to identify which category of difference is causing the failure.
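To make the locale pitfall concrete, here is a small sketch (locales passed explicitly for illustration; code that omits them silently inherits the runner's default):

```typescript
// The same timestamp renders differently depending on locale settings,
// a classic source of local-vs-CI divergence when tests assert on
// formatted strings.
const ts = new Date(Date.UTC(2024, 2, 1, 12, 0, 0)); // 2024-03-01T12:00Z

const us = ts.toLocaleDateString('en-US', { timeZone: 'UTC' });
const de = ts.toLocaleDateString('de-DE', { timeZone: 'UTC' });

console.log(us); // e.g. "3/1/2024"
console.log(de); // e.g. "1.3.2024"
// An assertion hard-coded to one format passes on a machine whose
// default locale matches and fails on a runner configured differently.
```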
Start by checking whether the failure is deterministic or intermittent. Run the failing job three times using GitLab's retry button. If it fails every time with the same error, you have an environment or configuration issue. If it fails sometimes and passes sometimes, you have a flaky test. These two categories require completely different debugging approaches.
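Beyond the manual retry button, GitLab can rerun a job automatically and record every attempt in the job history, which makes intermittency visible at a glance. A minimal `.gitlab-ci.yml` fragment (job name illustrative):

```yaml
e2e-tests:
  script:
    - npx playwright test
  retry:
    max: 2                      # GitLab allows at most 2 automatic retries
    when:
      - script_failure          # retry when the test command itself fails
      - runner_system_failure   # ...or when the runner infrastructure fails
```

Use this for diagnosis and infrastructure hiccups, not as a permanent mask for flaky tests.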
2. Reading GitLab CI Logs Like a Detective
GitLab CI job logs contain more information than most developers realize. Beyond the test output itself, the log includes the full environment setup: which Docker image was pulled, which dependencies were installed, which environment variables were set, and the exact commands that were executed. When debugging a failure, start at the top of the log, not the bottom.
Look for version mismatches first. Compare the Node.js version, browser version, and dependency versions in the CI log against your local versions. A common culprit is Playwright's browser binaries. If your CI installs Playwright 1.48 but your lock file pins 1.47, the browser binaries might not match, causing subtle rendering differences or API incompatibilities.
For GitLab specifically, use the artifacts:reports:junit configuration to generate structured test reports. This gives you a test-by-test breakdown in the merge request UI, making it much easier to identify which tests failed and their exact error messages. Combine this with artifacts:paths to save screenshots, videos, and trace files from failed runs.
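A sketch of what that looks like in `.gitlab-ci.yml`, assuming Playwright is configured with a JUnit reporter writing to the path shown (file names and directories illustrative):

```yaml
test:
  script:
    - npx playwright test
  artifacts:
    when: always                  # upload even (especially) when the job fails
    reports:
      junit: results/junit.xml    # per-test breakdown shown in the MR UI
    paths:
      - playwright-report/        # HTML report
      - test-results/             # Playwright's default output directory for
                                  # screenshots, videos, and trace zips
```

On the Playwright side, the matching reporter entry would be something like `reporter: [['junit', { outputFile: 'results/junit.xml' }]]`.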
3. Environment Drift and Container Mismatches
Environment drift is the gradual divergence between your local setup and the CI environment. It happens slowly: you update Node.js locally but the CI image still uses the old version. You install a system library for a one-off task and forget that CI doesn't have it. You set an environment variable in your .env file that never made it into the CI configuration.
The fix is to pin everything explicitly. Use a specific Docker image tag (not :latest) in your .gitlab-ci.yml. Pin your Node.js version with .nvmrc or .node-version. Use a lock file for dependencies and verify it's committed. For Playwright, run npx playwright install in CI to ensure the correct browser binaries are present.
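Pinned, that advice looks roughly like this in `.gitlab-ci.yml` (image tag illustrative; pick the version you actually run locally):

```yaml
image: node:20-bookworm           # a specific tag, never :latest

test:
  script:
    - npm ci                      # installs exactly what the lock file pins
    - npx playwright install --with-deps   # browser binaries matching the
                                           # installed Playwright version
    - npx playwright test
```

`npm ci` (rather than `npm install`) fails loudly if the lock file and package.json disagree, which is exactly the drift you want to catch.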
For teams that want to eliminate environment drift entirely, consider running your tests inside the same Docker image locally and in CI. Playwright provides official Docker images (mcr.microsoft.com/playwright) that come with all required system dependencies and browser binaries pre-installed. Using the same image everywhere guarantees identical behavior.
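With the official image, the browser-install step disappears entirely; a sketch (pin the image version to your installed Playwright version, here assumed to be 1.47):

```yaml
image: mcr.microsoft.com/playwright:v1.47.0-jammy  # browsers and system
                                                   # libraries pre-installed
test:
  script:
    - npm ci
    - npx playwright test
```

Locally, running the suite inside the same image (for example via `docker run` with your project directory mounted) reproduces CI behavior on your own machine.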
4. Flaky Tests: Timing, State, and Ordering
Flaky tests are tests that sometimes pass and sometimes fail without any code changes. In CI, they are more common than locally because CI runners often have less CPU, less memory, and more variable network latency. A test that relies on an animation completing in 300ms might work fine on your fast development machine but fail on a shared CI runner under load.
The three most common causes of flakiness are timing assumptions, shared state, and test ordering. Timing issues appear when tests use hard-coded waits or assume operations complete within a specific duration. The fix is to use Playwright's built-in auto-waiting and assertion retries instead of page.waitForTimeout(). Shared state issues appear when tests depend on data created by previous tests. The fix is to isolate each test with its own data setup and teardown.
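Playwright's retrying assertions work by polling a condition until it holds or a timeout expires. The mechanism can be sketched in plain TypeScript (expectEventually is an illustrative helper, not a Playwright API):

```typescript
// Sketch of a retrying assertion: poll a condition until it holds or a
// timeout elapses, instead of sleeping for a hard-coded duration.
async function expectEventually(
  check: () => boolean | Promise<boolean>,
  opts: { timeoutMs?: number; intervalMs?: number } = {}
): Promise<void> {
  const { timeoutMs = 5000, intervalMs = 100 } = opts;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return; // condition met: "assertion" passes
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise((r) => setTimeout(r, intervalMs)); // wait, then retry
  }
}

// Example: a flag that flips after ~250ms, as a slow UI update might.
let ready = false;
setTimeout(() => { ready = true; }, 250);
expectEventually(() => ready, { timeoutMs: 2000 })
  .then(() => console.log('eventually ready'));
```

A hard-coded 300ms wait fails the moment the runner is slower than expected; the polling version tolerates any delay up to the timeout, which is why auto-waiting assertions are so much less flaky on loaded CI machines.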
Test ordering issues are the sneakiest. Test A creates a side effect that Test B unknowingly depends on. When tests run in parallel or in a different order, Test B fails. To catch these, run your test suite with Playwright's --shard option, which changes how tests are grouped across machines, or otherwise randomize the test order. If tests fail when reordered, they have hidden dependencies that need to be removed.
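Another way to surface ordering dependencies is to let Playwright parallelize within each file, so tests that silently depend on a sibling's side effects fail fast. A playwright.config.ts fragment (worker count illustrative):

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Run tests within each file in parallel workers rather than strictly
  // in declaration order.
  fullyParallel: true,
  workers: 4,
});
```

In CI, running `npx playwright test --shard=1/3` (once per shard index) splits the suite across machines, which also changes which tests end up sharing a worker.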
5. Artifact Inspection and Trace Analysis
Playwright's trace viewer is the single most powerful tool for debugging CI failures. A trace file captures every action the test performed, every network request, every console message, and a screenshot at every step. Configure your playwright.config.ts to save traces on failure: trace: 'retain-on-failure'. Then configure GitLab to save the trace files as artifacts.
Download the trace zip file from GitLab's artifact browser and open it at trace.playwright.dev. This gives you a step-by-step replay of exactly what happened in CI. You can see the DOM state at each step, inspect network requests, check console errors, and compare screenshots. Most CI failures become obvious within seconds when you have a trace file.
For video recordings, set video: 'retain-on-failure' in your Playwright config. Videos are larger than traces but sometimes reveal issues that traces miss, such as visual glitches, unexpected popups, or timing problems that are easier to see in real-time playback.
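The trace and video settings described above live together in the `use` block of playwright.config.ts; a minimal fragment:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    trace: 'retain-on-failure',    // full trace zip for every failed test
    video: 'retain-on-failure',    // real-time playback of failures
    screenshot: 'only-on-failure', // final-state screenshot as a quick check
  },
});
```

The 'retain-on-failure' values keep artifact sizes manageable by discarding recordings from passing tests.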
6. Preventing CI Failures Before They Happen
The best debugging strategy is prevention. Start by generating tests that are designed for CI from the beginning, rather than written on a developer's machine and merely hoped to work in a pipeline. Tools like Assrt auto-discover test scenarios by crawling your application and generate Playwright tests that use stable, auto-waiting selectors. Because the generated tests follow Playwright best practices by default, they tend to be less flaky than hand-written tests that accumulate shortcuts over time.
Implement a pre-merge test gate that runs the full suite before code lands on main. GitLab makes this straightforward with merge request pipelines. Configure your pipeline to run tests only on merge requests (using rules: - if: $CI_MERGE_REQUEST_IID) and require the pipeline to pass before merging. This catches failures before they affect the rest of the team.
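As a `.gitlab-ci.yml` fragment, the merge-request-only rule looks like this:

```yaml
test:
  script:
    - npx playwright test
  rules:
    - if: $CI_MERGE_REQUEST_IID   # run only in merge request pipelines
```

Pair it with the project's merge-check setting that requires pipelines to succeed, so a red pipeline blocks the merge button.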
Finally, track your CI test health metrics over time. Monitor the failure rate, the median test duration, and the number of retries per pipeline. When these metrics trend upward, it is a signal that the test suite needs attention before it becomes unmaintainable. A weekly flake review, where the team looks at the most frequently failing tests and fixes or removes them, keeps the suite healthy and keeps CI green.