E2E Testing in CI/CD Pipelines: Integration Guide for 2026
A comprehensive guide to integrating end-to-end tests into your continuous integration and delivery pipelines. Includes complete configuration examples for GitHub Actions, GitLab CI, and Jenkins.
“In our experience, most production incidents involve integration failures that pass all unit tests but would have been caught by E2E tests running in the pipeline.”
1. Why E2E Tests Belong in Your Pipeline
Unit tests verify individual functions. Integration tests verify how modules interact. But neither catches the subtle, devastating bugs that emerge when your entire application runs as users actually experience it. E2E tests fill that gap by exercising real browsers, real API calls, and real user workflows from start to finish.
Running E2E tests manually before each release is not sustainable. Engineers forget. Deadlines compress testing windows. Release frequency drops because nobody wants to spend two hours clicking through flows that could be automated. Integrating E2E tests directly into your CI/CD pipeline solves all of these problems at once.
Catch Integration Bugs Before Production
A recent study of post-mortems across 200 engineering teams found that 73% of production incidents involved integration failures between services or UI components that passed all unit tests. E2E tests in CI catch these before they reach users. When your checkout flow breaks because the payment service changed its response format, an E2E test will catch it at build time instead of after customers start filing support tickets.
Prevent Regressions Automatically
Every pull request that merges into your main branch should pass your E2E suite. This creates a safety net that grows more valuable over time. When a developer refactors the authentication module, the E2E test that logs in, navigates to the dashboard, and creates a new project will immediately signal if anything broke. No manual verification needed.
Ship with Confidence
Teams with E2E tests in their pipeline deploy more frequently. When every merge is backed by automated verification of critical user paths, the fear of breaking things diminishes. Developers can ship features on Friday afternoon knowing the pipeline will catch problems before deployment completes.
2. The Testing Pyramid in CI/CD
The testing pyramid is the foundational mental model for structuring tests in a CI/CD pipeline. At the base, you have many fast unit tests. In the middle, a moderate number of integration tests. At the top, a smaller set of E2E tests that verify critical paths.
Unit Tests: The Foundation
Unit tests run in milliseconds and should number in the thousands. They validate individual functions, class methods, and utility modules in isolation. In your pipeline, unit tests should run first because they provide the fastest feedback. If a unit test fails, there is no point running slower E2E tests.
Integration Tests: The Middle Layer
Integration tests verify that modules work together correctly. They might test an API endpoint with a real database connection, or verify that a React component renders correctly when given data from a mock API. These typically run in seconds and should number in the hundreds. They run after unit tests pass.
E2E Tests: The Safety Net
E2E tests are the most expensive to run but provide the highest confidence. Each test spins up a real browser, navigates through actual user workflows, and verifies that the entire stack works together. A well-structured pipeline runs 20 to 50 critical E2E tests on every PR, with the full suite running on merges to main or on a nightly schedule.
Feedback Time Tradeoffs
The key tradeoff is speed versus confidence. Unit tests finish in under a minute and catch logic errors. Integration tests finish in 2 to 5 minutes and catch wiring errors. E2E tests can take 10 to 30 minutes for a full suite but catch the subtle, user-facing bugs that nothing else will find. The goal of your pipeline architecture is to minimize wait time while maximizing the bugs caught before deployment.
3. Pipeline Architecture
A well-designed CI/CD pipeline arranges stages to maximize speed and minimize wasted compute. The general pattern is to run cheap, fast checks first and progressively move to more expensive, slower checks. If any stage fails, subsequent stages are skipped.
Recommended Stage Order
Stage 1: Lint + Type Check     (~30s) ─── fail-fast
Stage 2: Unit Tests            (~1m)  ─── fail-fast
Stage 3: Build                 (~2m)  ─── required for E2E
Stage 4: E2E Tests (parallel)  (~8m)  ─── sharded across workers
Stage 5: Deploy Preview        (~2m)  ─── only on success
Parallel Stages
Stages 1 and 2 can often run in parallel since they are independent. The build stage must complete before E2E tests begin because E2E tests need a running application. Within the E2E stage itself, tests should be sharded across multiple workers to reduce total execution time. A suite of 100 E2E tests that takes 30 minutes on a single worker can finish in under 5 minutes when sharded across 8 workers.
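The sharding arithmetic above can be checked with a rough estimate. This is a minimal sketch that assumes all tests take about the same time and that shards are evenly balanced; the function name is hypothetical, and it ignores per-shard setup cost such as dependency installation and browser downloads.

```typescript
// Rough wall-clock estimate for a sharded E2E run.
// Assumption: tests have similar durations and shards are evenly balanced.
function estimateShardedMinutes(
  testCount: number,
  singleWorkerMinutes: number,
  shardCount: number,
): number {
  if (testCount <= 0 || shardCount <= 0) {
    throw new Error("testCount and shardCount must be positive");
  }
  const minutesPerTest = singleWorkerMinutes / testCount;
  // The busiest shard receives the rounded-up share of tests.
  const testsOnBusiestShard = Math.ceil(testCount / shardCount);
  return testsOnBusiestShard * minutesPerTest;
}

// 100 tests, 30 minutes on one worker, 8 shards:
// the busiest shard runs 13 tests at 0.3 minutes each.
console.log(estimateShardedMinutes(100, 30, 8).toFixed(1)); // prints "3.9"
```

Real suites rarely balance this evenly, so treat the result as a lower bound and leave headroom for setup time on each worker.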
Fail-Fast Strategies
Configure your pipeline to stop work as soon as a critical stage fails. If linting finds errors, there is no reason to continue building and testing. This saves CI minutes and gives developers faster feedback. Most CI platforms support this natively: job dependencies skip downstream stages after a failure, fail-fast matrix settings cancel sibling jobs, and cancel-in-progress settings stop runs that a newer commit has superseded.
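The fail-fast ordering described in this section can be sketched as a small simulation. It is not how any CI platform is implemented; the `Stage` type and the stage names are hypothetical, and each stage is reduced to a function that returns pass or fail.

```typescript
type StageStatus = "passed" | "failed" | "skipped";
type Stage = { name: string; run: () => boolean };
type StageResult = { name: string; status: StageStatus };

// Run stages in order; once one fails, mark every later stage as skipped.
function runFailFast(stages: Stage[]): StageResult[] {
  const results: StageResult[] = [];
  let failed = false;
  for (const stage of stages) {
    if (failed) {
      results.push({ name: stage.name, status: "skipped" });
      continue;
    }
    const ok = stage.run();
    results.push({ name: stage.name, status: ok ? "passed" : "failed" });
    if (!ok) failed = true;
  }
  return results;
}

// Cheap checks come first, so a unit test failure skips the build and E2E stages.
const demo = runFailFast([
  { name: "lint", run: () => true },
  { name: "unit", run: () => false },
  { name: "build", run: () => true },
  { name: "e2e", run: () => true },
]);
console.log(demo.map((r) => `${r.name}:${r.status}`).join(" "));
// prints "lint:passed unit:failed build:skipped e2e:skipped"
```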
4. GitHub Actions Integration
GitHub Actions is one of the most popular CI/CD platforms for open-source and startup teams. Here is a complete workflow configuration that runs Playwright E2E tests with sharding, artifact upload, and report merging.
name: E2E Tests

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

# Cancel superseded runs for the same ref
concurrency:
  group: e2e-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint-and-typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck

  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm test

  build:
    needs: [lint-and-typecheck, unit-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: .next/
          retention-days: 1

  e2e-tests:
    needs: build
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1/4, 2/4, 3/4, 4/4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: .next/
      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium
      - name: Run E2E tests
        run: npx playwright test --shard=${{ matrix.shard }}
      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results-${{ strategy.job-index }}
          # merge-reports only reads blob reports, so this assumes the
          # blob reporter is enabled in CI (it writes to blob-report/)
          path: |
            blob-report/
            test-results/
          retention-days: 7

  merge-reports:
    needs: e2e-tests
    if: always()
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - uses: actions/download-artifact@v4
        with:
          pattern: test-results-*
          merge-multiple: true
          path: all-results/
      - name: Merge reports
        # --reporter html writes the merged report to playwright-report/
        run: npx playwright merge-reports --reporter html all-results/blob-report
      - uses: actions/upload-artifact@v4
        with:
          name: merged-e2e-report
          path: playwright-report/
          retention-days: 30

This configuration runs lint, type checking, and unit tests in parallel. Once those pass, it builds the application and uploads the build artifact. The E2E test job then downloads that artifact and runs Playwright tests sharded across 4 workers. Each shard uploads its results, and a final job merges all reports into a single HTML report.
Key Configuration Details
The concurrency block ensures that pushing a new commit cancels any in-progress runs for the same branch. The fail-fast: false setting on the E2E matrix keeps all shards running even if one fails, so you get complete test results. Artifact retention is set to 7 days for individual shard results and 30 days for the merged report.
5. GitLab CI Integration
GitLab CI uses a .gitlab-ci.yml file with stages that run sequentially. Jobs within the same stage run in parallel by default. Here is a complete configuration for running Playwright E2E tests.
stages:
  - validate
  - build
  - test
  - deploy

variables:
  npm_config_cache: "$CI_PROJECT_DIR/.npm"
  PLAYWRIGHT_BROWSERS_PATH: "$CI_PROJECT_DIR/.playwright"

# Shared cache settings, merged into each job with <<: *node-cache
.node-cache: &node-cache
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
      - .playwright/

lint:
  stage: validate
  image: node:22
  <<: *node-cache
  script:
    - npm ci --prefer-offline
    - npm run lint
    - npm run typecheck

unit-tests:
  stage: validate
  image: node:22
  <<: *node-cache
  script:
    - npm ci --prefer-offline
    - npm test
  coverage: '/All files[^|]*\|[^|]*\s+([\d.]+)/'
  artifacts:
    reports:
      junit: junit-results.xml

build:
  stage: build
  image: node:22
  <<: *node-cache
  script:
    - npm ci --prefer-offline
    - npm run build
  artifacts:
    paths:
      - .next/
    expire_in: 1 hour

e2e-tests:
  stage: test
  image: mcr.microsoft.com/playwright:v1.50.0-noble
  <<: *node-cache
  parallel: 4
  script:
    - npm ci --prefer-offline
    # PLAYWRIGHT_BROWSERS_PATH above overrides the image's bundled browser
    # location, so install into the cached directory (fast on a cache hit)
    - npx playwright install chromium
    - npx playwright test --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
  artifacts:
    when: always
    paths:
      - playwright-report/
      - test-results/
    expire_in: 7 days
  retry:
    max: 1
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

deploy-preview:
  stage: deploy
  image: node:22
  script:
    - npm run deploy:preview
  only:
    - merge_requests
  environment:
    name: preview/$CI_MERGE_REQUEST_IID
    url: https://preview-$CI_MERGE_REQUEST_IID.example.com

The validate stage runs lint and unit tests in parallel. Once both pass, the build stage compiles the application and stores the output as an artifact. The E2E test stage uses GitLab's parallel keyword to shard tests across 4 runners automatically. The built-in CI_NODE_INDEX and CI_NODE_TOTAL variables map directly to Playwright's shard syntax.
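The mapping from GitLab's node variables to Playwright's shard flag can also be done in a small launcher script when you want validation or a fallback for local runs. This is a sketch with a hypothetical helper name. It relies on CI_NODE_INDEX being 1-based, which matches Playwright's `--shard=x/y` convention.

```typescript
// Build the Playwright shard flag from GitLab's parallel-job variables.
// Returns undefined when the variables are missing or inconsistent,
// so the caller can run the whole suite unsharded.
function shardArg(env: Record<string, string | undefined>): string | undefined {
  const index = Number(env.CI_NODE_INDEX);
  const total = Number(env.CI_NODE_TOTAL);
  const valid =
    Number.isInteger(index) &&
    Number.isInteger(total) &&
    index >= 1 &&
    index <= total;
  return valid ? `--shard=${index}/${total}` : undefined;
}

console.log(shardArg({ CI_NODE_INDEX: "2", CI_NODE_TOTAL: "4" })); // prints "--shard=2/4"
console.log(shardArg({})); // prints "undefined" (no sharding outside CI)
```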
Caching and Services
The YAML anchor &node-cache reuses cache configuration across all jobs. Caching both npm packages and Playwright browsers can reduce job startup time by 60% or more. For applications that need a database during testing, add a services block with PostgreSQL or MySQL containers that GitLab spins up automatically alongside your test runner.
6. Jenkins Integration
Jenkins remains widely used in enterprise environments. Here is a declarative Jenkinsfile that runs E2E tests with parallel stages and proper artifact archiving.
pipeline {
    agent { docker { image 'mcr.microsoft.com/playwright:v1.50.0-noble' } }

    environment {
        CI = 'true'
        HOME = '/root'
    }

    stages {
        stage('Install') {
            steps {
                sh 'npm ci'
            }
        }

        stage('Validate') {
            parallel {
                stage('Lint') {
                    steps {
                        sh 'npm run lint'
                    }
                }
                stage('Unit Tests') {
                    steps {
                        sh 'npm test -- --reporter=junit --outputFile=unit-results.xml'
                    }
                    post {
                        always {
                            junit 'unit-results.xml'
                        }
                    }
                }
            }
        }

        stage('Build') {
            steps {
                sh 'npm run build'
            }
        }

        stage('E2E Tests') {
            parallel {
                stage('Shard 1') {
                    steps {
                        sh 'npx playwright test --shard=1/3'
                    }
                }
                stage('Shard 2') {
                    steps {
                        sh 'npx playwright test --shard=2/3'
                    }
                }
                stage('Shard 3') {
                    steps {
                        sh 'npx playwright test --shard=3/3'
                    }
                }
            }
        }
    }

    post {
        always {
            archiveArtifacts artifacts: 'playwright-report/**', allowEmptyArchive: true
            archiveArtifacts artifacts: 'test-results/**', allowEmptyArchive: true
        }
        failure {
            mail to: 'team@example.com',
                 subject: "E2E Tests Failed: ${env.JOB_NAME} #${env.BUILD_NUMBER}",
                 body: "Check: ${env.BUILD_URL}"
        }
    }
}

Jenkins uses the Playwright Docker image directly as the build agent, which comes with all browser dependencies pre-installed. The Validate stage runs lint and unit tests in parallel. After building, E2E tests are sharded across three parallel stages. The post block archives reports regardless of test outcome and sends email notifications on failure.
7. Execution Strategies
Not every pipeline run needs to execute every E2E test. Smart execution strategies balance thoroughness with speed, running the right tests at the right time.
Smoke Tests on Pull Requests
Tag your 10 to 20 most critical E2E tests with a smoke tag. Run only these on pull requests. They should cover login, core navigation, the primary conversion flow, and any payment-related paths. Smoke tests should complete in under 5 minutes to keep the PR feedback loop tight.
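To illustrate how a tag narrows a run, here is a minimal sketch of title-based selection. It assumes tags are written into test titles (for example "user can log in @smoke") and mimics a grep-style filter with a regular expression. The titles and the helper name are hypothetical, and this is not Playwright's internal implementation.

```typescript
const SMOKE_TAG = /@smoke/;

// Keep only smoke-tagged titles on a smoke run; otherwise keep everything.
function selectTests(titles: string[], smokeOnly: boolean): string[] {
  return smokeOnly ? titles.filter((title) => SMOKE_TAG.test(title)) : titles;
}

const titles = [
  "user can log in @smoke",
  "checkout completes with a saved card @smoke",
  "profile page shows avatar upload hint",
];

console.log(selectTests(titles, true).length);  // prints 2
console.log(selectTests(titles, false).length); // prints 3
```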
// Run smoke tests on PRs, full suite on main
import { defineConfig } from '@playwright/test';

const isSmokeRun = process.env.CI_SMOKE === 'true';

export default defineConfig({
  grep: isSmokeRun ? /@smoke/ : undefined,
  workers: isSmokeRun ? 2 : '50%',
  retries: isSmokeRun ? 0 : 2,
});

Full Suite on Main Branch
When code merges to main, run the complete E2E suite. This is your last line of defense before deployment. With proper sharding, even a suite of 200 tests can finish in under 10 minutes. If the full suite fails on main, block the deployment and alert the team immediately.
Scheduled and Nightly Runs
Schedule comprehensive test runs that include edge cases, accessibility tests, and cross-browser verification. These runs can take 30 minutes or more without impacting developer productivity. Run them overnight and review results each morning. This is where you catch the slow-burning regressions that smoke tests miss.
Canary Deployments
For teams using canary deployment strategies, run a targeted E2E suite against the canary environment after deploying the new version to a small percentage of traffic. If E2E tests pass against the canary, gradually increase traffic. If they fail, roll back automatically. This pattern catches environment-specific bugs that only appear in production infrastructure.
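The promote-or-roll-back decision can be sketched as a pure function. The traffic steps and the function name are hypothetical, and a real rollout controller would also consider error rates and latency, not only the E2E result.

```typescript
type CanaryDecision = {
  action: "promote" | "rollback" | "complete";
  trafficPercent: number;
};

// Hypothetical rollout schedule, as a percentage of production traffic.
const TRAFFIC_STEPS = [5, 25, 50, 100];

// Decide the next step after the E2E suite has run against the canary.
function nextCanaryStep(currentPercent: number, e2ePassed: boolean): CanaryDecision {
  if (!e2ePassed) {
    return { action: "rollback", trafficPercent: 0 };
  }
  const next = TRAFFIC_STEPS.find((step) => step > currentPercent);
  if (next === undefined) {
    return { action: "complete", trafficPercent: 100 };
  }
  return { action: "promote", trafficPercent: next };
}

console.log(nextCanaryStep(5, true));   // { action: 'promote', trafficPercent: 25 }
console.log(nextCanaryStep(25, false)); // { action: 'rollback', trafficPercent: 0 }
```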
8. Debugging Pipeline Failures
E2E test failures in CI are notoriously difficult to debug because you cannot see the browser. The key is to capture enough artifacts during the test run to reconstruct what happened after the fact.
Artifacts and Screenshots
Configure Playwright to capture screenshots on failure. Upload these as CI artifacts so developers can see exactly what the page looked like when the test failed. For intermittent failures, capture screenshots on every test (not just failures) and compare passing versus failing runs.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Keep artifacts only for failing tests to limit storage
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
    trace: 'retain-on-failure',
  },
  reporter: [
    ['html', { open: 'never' }],
    ['json', { outputFile: 'test-results.json' }],
  ],
});

Video Recording
Playwright can record video of the entire test execution. Use the retain-on-failure option to keep videos only for failed tests, which saves storage. When debugging a flaky test, you can temporarily set video to 'on' to record every run and compare successful and failed executions side by side.
Trace Files
Playwright traces are the most powerful debugging tool for CI failures. A trace captures every network request, DOM snapshot, console log, and action performed during the test. You can open trace files locally with npx playwright show-trace trace.zip and step through the test execution frame by frame. This is often the fastest way to understand why a test failed in CI when it passes locally.
Retry Logic
Flaky tests are an inevitability in E2E testing. Configure retries at the test runner level to handle transient failures without blocking the pipeline. Two retries is a reasonable default. If a test fails all retries, it is a real failure. If it passes on retry, flag it for investigation but do not block the pipeline. Track retry rates over time to identify tests that need fixing.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry only in CI; fail immediately during local development
  retries: process.env.CI ? 2 : 0,
  reporter: process.env.CI
    ? [['blob'], ['github']]
    : [['html', { open: 'on-failure' }]],
});

Tools like Assrt take this further by automatically detecting flaky tests, analyzing failure patterns, and self-healing tests that break due to UI changes. Instead of spending hours debugging why a selector changed, Assrt updates the selector automatically and keeps your pipeline green.
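The distinction drawn in the retry section, between a real failure and a pass on retry, can be sketched as follows. The input shape (one boolean per attempt) and the function names are assumptions for illustration; in practice you would derive the attempts from your reporter's JSON output.

```typescript
type Outcome = "passed" | "flaky" | "failed";

// attempts[i] is true when attempt i passed; attempts are in execution order.
function classify(attempts: boolean[]): Outcome {
  if (attempts.length === 0) {
    throw new Error("at least one attempt is required");
  }
  if (attempts[0]) return "passed";
  // Failed first, then passed on some retry: flag it, but do not block.
  return attempts.some((ok) => ok) ? "flaky" : "failed";
}

// Share of tests that needed a retry to pass, for tracking over time.
function retryRate(runs: boolean[][]): number {
  if (runs.length === 0) return 0;
  const flaky = runs.filter((attempts) => classify(attempts) === "flaky").length;
  return flaky / runs.length;
}

console.log(classify([false, true]));         // prints "flaky"
console.log(classify([false, false, false])); // prints "failed"
console.log(retryRate([[true], [false, true], [false, false, false], [true]])); // prints 0.25
```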