E2E Testing in CI/CD Pipelines: Integration Guide for 2026

By Pavel Borji · Founder @ Assrt

A comprehensive guide to integrating end-to-end tests into your continuous integration and delivery pipelines. Includes complete configuration examples for GitHub Actions, GitLab CI, and Jenkins.

In our experience, most production incidents involve integration failures that pass all unit tests but would have been caught by E2E tests running in the pipeline.

1. Why E2E Tests Belong in Your Pipeline

Unit tests verify individual functions. Integration tests verify how modules interact. But neither catches the subtle, devastating bugs that emerge when your entire application runs as users actually experience it. E2E tests fill that gap by exercising real browsers, real API calls, and real user workflows from start to finish.

Running E2E tests manually before each release is not sustainable. Engineers forget. Deadlines compress testing windows. Release frequency drops because nobody wants to spend two hours clicking through flows that could be automated. Integrating E2E tests directly into your CI/CD pipeline solves all of these problems at once.

Catch Integration Bugs Before Production

A recent study of post-mortems across 200 engineering teams found that 73% of production incidents involved integration failures between services or UI components that passed all unit tests. E2E tests in CI catch these before they reach users. When your checkout flow breaks because the payment service changed its response format, an E2E test will catch it at build time instead of after customers start filing support tickets.

Prevent Regressions Automatically

Every pull request that merges into your main branch should pass your E2E suite. This creates a safety net that grows more valuable over time. When a developer refactors the authentication module, the E2E test that logs in, navigates to the dashboard, and creates a new project will immediately signal if anything broke. No manual verification needed.

Ship with Confidence

Teams with E2E tests in their pipeline can deploy more frequently. When every merge is backed by automated verification of critical user paths, the fear of breaking things diminishes. Developers can ship features on a Friday afternoon knowing the pipeline will flag regressions in those covered paths before deployment completes.

2. The Testing Pyramid in CI/CD

The testing pyramid is the foundational mental model for structuring tests in a CI/CD pipeline. At the base, you have many fast unit tests. In the middle, a moderate number of integration tests. At the top, a smaller set of E2E tests that verify critical paths.

Unit Tests: The Foundation

Unit tests run in milliseconds and should number in the thousands. They validate individual functions, class methods, and utility modules in isolation. In your pipeline, unit tests should run first because they provide the fastest feedback. If a unit test fails, there is no point running slower E2E tests.

Integration Tests: The Middle Layer

Integration tests verify that modules work together correctly. They might test an API endpoint with a real database connection, or verify that a React component renders correctly when given data from a mock API. These typically run in seconds and should number in the hundreds. They run after unit tests pass.

E2E Tests: The Safety Net

E2E tests are the most expensive to run but provide the highest confidence. Each test spins up a real browser, navigates through actual user workflows, and verifies that the entire stack works together. A well-structured pipeline runs 20 to 50 critical E2E tests on every PR, with the full suite running on merges to main or on a nightly schedule.

Feedback Time Tradeoffs

The key tradeoff is speed versus confidence. Unit tests finish in under a minute and catch logic errors. Integration tests finish in 2 to 5 minutes and catch wiring errors. E2E tests can take 10 to 30 minutes for a full suite but catch the subtle, user-facing bugs that nothing else will find. The goal of your pipeline architecture is to minimize wait time while maximizing the bugs caught before deployment.

3. Pipeline Architecture

A well-designed CI/CD pipeline arranges stages to maximize speed and minimize wasted compute. The general pattern is to run cheap, fast checks first and progressively move to more expensive, slower checks. If any stage fails, subsequent stages are skipped.

Recommended Stage Order

Stage 1: Lint + Type Check     (~30s)   ─── fail-fast
Stage 2: Unit Tests            (~1m)    ─── fail-fast
Stage 3: Build                 (~2m)    ─── required for E2E
Stage 4: E2E Tests (parallel)  (~8m)    ─── sharded across workers
Stage 5: Deploy Preview        (~2m)    ─── only on success

Parallel Stages

Stages 1 and 2 can often run in parallel since they are independent. The build stage must complete before E2E tests begin because E2E tests need a running application. Within the E2E stage itself, shard tests across multiple workers to reduce total execution time. A suite of 100 E2E tests that takes 30 minutes on a single worker can finish in under 5 minutes when sharded across 8 workers, provided the tests split roughly evenly and per-job setup (checkout, dependency install, browser install) stays short.
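The timing claims above can be checked with a back-of-the-envelope model. This sketch assumes stages inside a group run in parallel, groups run one after another, and E2E work divides evenly across shards plus a fixed per-job setup cost; the one-minute setup figure is an illustrative assumption, not a measurement. Real shards are rarely perfectly balanced, so treat the sharded figure as a lower bound.

```typescript
// Rough wall-clock model for a staged pipeline (illustrative, not a CI API).
interface StageEstimate {
  name: string;
  minutes: number;
}

// Stages in one group run in parallel, so the group takes as long as its slowest stage.
function groupMinutes(group: StageEstimate[]): number {
  return Math.max(...group.map((stage) => stage.minutes));
}

// Groups run one after another, so their durations add up.
function pipelineMinutes(groups: StageEstimate[][]): number {
  return groups.reduce((total, group) => total + groupMinutes(group), 0);
}

// Assumes the serial E2E time divides evenly across shards, plus a fixed
// per-shard setup cost (checkout, npm ci, browser install).
function shardedMinutes(serialMinutes: number, shards: number, setupMinutes: number): number {
  if (shards < 1) throw new Error("need at least one shard");
  return serialMinutes / shards + setupMinutes;
}

// Stage durations from the table above, with lint and unit tests in parallel.
const pipeline: StageEstimate[][] = [
  [{ name: "lint+typecheck", minutes: 0.5 }, { name: "unit", minutes: 1 }],
  [{ name: "build", minutes: 2 }],
  [{ name: "e2e", minutes: 8 }],
  [{ name: "deploy-preview", minutes: 2 }],
];

console.log(pipelineMinutes(pipeline)); // 13
console.log(shardedMinutes(30, 8, 1)); // 4.75
```

Run strictly in sequence, the same five stages would take 13.5 minutes, so parallelizing lint and unit tests saves only half a minute here; splitting the E2E stage is where most of the time comes back.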

Fail-Fast Strategies

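Configure your pipeline so that a failure in any critical stage stops the work that depends on it. If linting finds errors, there is no reason to continue building and testing. This saves CI minutes and gives developers faster feedback. Most CI platforms support this natively: job dependencies (needs in GitHub Actions, stage ordering in GitLab CI, sequential stages in Jenkins) skip downstream work when an upstream job fails, and fail-fast matrix settings cancel the remaining jobs of a matrix when one fails. Concurrency settings such as GitHub Actions' cancel-in-progress are related but different: they cancel a run that has been superseded by a newer commit.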
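A minimal simulation of the skip-on-failure rule follows. It is not any CI platform's scheduler: jobs within one group are independent and each runs to completion, while a failure anywhere in a group marks every later group as skipped. It deliberately does not model cancelling jobs that are already running.

```typescript
type JobStatus = "passed" | "failed" | "skipped";

interface Job {
  name: string;
  run: () => boolean; // true on success; stands in for a real CI job
}

// Groups run in order; a failure anywhere in a group skips all later groups.
function runFailFast(groups: Job[][]): Map<string, JobStatus> {
  const status = new Map<string, JobStatus>();
  let blocked = false;
  for (const group of groups) {
    if (blocked) {
      for (const job of group) status.set(job.name, "skipped");
      continue;
    }
    // Jobs in the same group are independent, so each one still runs to completion.
    for (const job of group) {
      const ok = job.run();
      status.set(job.name, ok ? "passed" : "failed");
      if (!ok) blocked = true;
    }
  }
  return status;
}

// Usage: lint fails, so build and E2E never start.
const ran: string[] = [];
const job = (name: string, ok: boolean): Job => ({
  name,
  run: () => {
    ran.push(name);
    return ok;
  },
});

const result = runFailFast([
  [job("lint", false), job("unit", true)],
  [job("build", true)],
  [job("e2e", true)],
]);
result.forEach((s, name) => console.log(`${name}: ${s}`));
// lint: failed
// unit: passed
// build: skipped
// e2e: skipped
```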

4. GitHub Actions Integration

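GitHub Actions is one of the most widely used CI/CD platforms for open-source and startup teams. Here is a complete workflow configuration that runs Playwright E2E tests with sharding, artifact upload, and report merging. Retries are configured in playwright.config.ts rather than in the workflow (see Retry Logic below).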

.github/workflows/e2e.yml
name: E2E Tests

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

concurrency:
  group: e2e-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint-and-typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck

  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm test

  build:
    needs: [lint-and-typecheck, unit-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run build
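      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: .next/
          # upload-artifact v4.4+ skips dot-prefixed paths such as .next/ unless this is set
          include-hidden-files: true
          retention-days: 1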

  e2e-tests:
    needs: build
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1/4, 2/4, 3/4, 4/4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: .next/
      - name: Install Playwright browsers
        run: npx playwright install --with-deps chromium
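      # Assumes playwright.config.ts selects the blob reporter on CI (as in the
      # Retry Logic section), which writes one mergeable report per shard to blob-report/
      - name: Run E2E tests
        run: npx playwright test --shard=${{ matrix.shard }}
      - name: Upload blob report
        if: ${{ !cancelled() }}
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ strategy.job-index }}
          path: blob-report/
          retention-days: 7
      - name: Upload traces and screenshots
        if: ${{ !cancelled() }}
        uses: actions/upload-artifact@v4
        with:
          name: test-results-${{ strategy.job-index }}
          path: test-results/
          retention-days: 7

  merge-reports:
    needs: e2e-tests
    if: ${{ !cancelled() }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      # merge-reports reads blob reports, not HTML reports, so only those are downloaded
      - uses: actions/download-artifact@v4
        with:
          pattern: blob-report-*
          merge-multiple: true
          path: all-blob-reports/
      - name: Merge reports
        run: npx playwright merge-reports --reporter html ./all-blob-reports
      - uses: actions/upload-artifact@v4
        with:
          name: merged-e2e-report
          path: playwright-report/
          retention-days: 30

This configuration runs the lint/type-check job and the unit-test job in parallel. Once both pass, it builds the application and uploads the build artifact. The E2E job then downloads that artifact and runs Playwright tests sharded across 4 workers; the application itself is assumed to be started by Playwright's webServer setting, which is not shown. Each shard uploads a blob report plus its raw test-results/ folder, and a final job merges the blob reports into a single HTML report with npx playwright merge-reports.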

Key Configuration Details

The concurrency block ensures that pushing a new commit cancels any in-progress runs for the same branch. The fail-fast: false setting on the E2E matrix keeps all shards running even if one fails, so you get complete test results. Artifact retention is set to 7 days for individual shard results and 30 days for the merged report.

5. GitLab CI Integration

GitLab CI uses a .gitlab-ci.yml file with stages that run sequentially. Jobs within the same stage run in parallel by default. Here is a complete configuration for running Playwright E2E tests.

.gitlab-ci.yml
stages:
  - validate
  - build
  - test
  - deploy

variables:
  npm_config_cache: "$CI_PROJECT_DIR/.npm"
  PLAYWRIGHT_BROWSERS_PATH: "$CI_PROJECT_DIR/.playwright"

.node-cache: &node-cache
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
      - .playwright/

lint:
  stage: validate
  image: node:22
  <<: *node-cache
  script:
    - npm ci --prefer-offline
    - npm run lint
    - npm run typecheck

unit-tests:
  stage: validate
  image: node:22
  <<: *node-cache
  script:
    - npm ci --prefer-offline
    - npm test
  coverage: '/All files[^|]*\|[^|]*\s+([\d.]+)/'
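  artifacts:
    # Upload the JUnit report even when tests fail (the default is on success only).
    # Assumes npm test is configured to write junit-results.xml.
    when: always
    reports:
      junit: junit-results.xml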

build:
  stage: build
  image: node:22
  <<: *node-cache
  script:
    - npm ci --prefer-offline
    - npm run build
  artifacts:
    paths:
      - .next/
    expire_in: 1 hour

e2e-tests:
  stage: test
  image: mcr.microsoft.com/playwright:v1.50.0-noble
  <<: *node-cache
  parallel: 4
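  script:
    - npm ci --prefer-offline
    # PLAYWRIGHT_BROWSERS_PATH points at the cached .playwright/ folder, so browsers
    # must be installed there; the image already provides the system dependencies
    - npx playwright install chromium
    - npx playwright test --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL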
  artifacts:
    when: always
    paths:
      - playwright-report/
      - test-results/
    expire_in: 7 days
  retry:
    max: 1
    when:
      - runner_system_failure
      - stuck_or_timeout_failure

deploy-preview:
  stage: deploy
  image: node:22
  script:
    - npm run deploy:preview
  only:
    - merge_requests
  environment:
    name: preview/$CI_MERGE_REQUEST_IID
    url: https://preview-$CI_MERGE_REQUEST_IID.example.com

The validate stage runs lint and unit tests in parallel. Once both pass, the build stage compiles the application and stores the output as an artifact. The E2E test stage uses GitLab's parallel keyword to shard tests across 4 runners automatically. The built-in CI_NODE_INDEX and CI_NODE_TOTAL variables map directly to Playwright's shard syntax.
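If a wrapper script needs the same mapping, for example to fall back to an unsharded run outside a parallel job, it can be sketched as below. shardArg is a hypothetical helper, not part of Playwright or GitLab; it assumes GitLab's documented behavior that CI_NODE_INDEX (1-based) is only set for jobs using parallel, while CI_NODE_TOTAL holds the number of parallel jobs.

```typescript
// Hypothetical helper: turns GitLab's parallel-job variables into Playwright's
// --shard=<index>/<total> argument. Returns null when the job is not sharded.
function shardArg(env: Record<string, string | undefined>): string | null {
  if (env.CI_NODE_INDEX === undefined) return null; // job does not use `parallel`
  const index = Number(env.CI_NODE_INDEX);
  const total = Number(env.CI_NODE_TOTAL);
  // CI_NODE_INDEX runs from 1 to CI_NODE_TOTAL, the same range --shard expects.
  if (!Number.isInteger(index) || !Number.isInteger(total) || index < 1 || index > total) {
    throw new Error(`invalid shard ${env.CI_NODE_INDEX}/${env.CI_NODE_TOTAL}`);
  }
  return `--shard=${index}/${total}`;
}

console.log(shardArg({ CI_NODE_INDEX: "2", CI_NODE_TOTAL: "4" })); // --shard=2/4
console.log(shardArg({ CI_NODE_TOTAL: "1" })); // null
```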

Caching and Services

The YAML anchor &node-cache reuses cache configuration across all jobs. Caching both npm packages and Playwright browsers can reduce job startup time by 60% or more. For applications that need a database during testing, add a services block with PostgreSQL or MySQL containers that GitLab spins up automatically alongside your test runner.

6. Jenkins Integration

Jenkins remains widely used in enterprise environments. Here is a declarative Jenkinsfile that runs E2E tests with parallel stages and proper artifact archiving.

Jenkinsfile
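pipeline {
    agent { docker { image 'mcr.microsoft.com/playwright:v1.50.0-noble' } }

    environment {
        CI = 'true'
        HOME = '/root'
    }

    stages {
        stage('Install') {
            steps {
                sh 'npm ci'
            }
        }

        stage('Validate') {
            parallel {
                stage('Lint') {
                    steps {
                        sh 'npm run lint'
                    }
                }
                stage('Unit Tests') {
                    steps {
                        sh 'npm test -- --reporter=junit --outputFile=unit-results.xml'
                    }
                    post {
                        always {
                            junit 'unit-results.xml'
                        }
                    }
                }
            }
        }

        stage('Build') {
            steps {
                sh 'npm run build'
                // Hand the compiled app to the shard agents below (skipping the build cache)
                stash name: 'build-output', includes: '.next/**', excludes: '.next/cache/**'
            }
        }

        stage('E2E Tests') {
            matrix {
                // Each shard gets its own container and workspace, so shards running
                // at the same time do not share ports or output folders
                agent { docker { image 'mcr.microsoft.com/playwright:v1.50.0-noble' } }
                axes {
                    axis {
                        name 'SHARD'
                        values '1', '2', '3'
                    }
                }
                stages {
                    stage('Run Shard') {
                        steps {
                            unstash 'build-output'
                            sh 'npm ci'
                            sh "npx playwright test --shard=${SHARD}/3"
                        }
                        post {
                            always {
                                // Copy reports into a shard-specific folder so archives do not overwrite each other
                                sh "mkdir -p shard-${SHARD} && cp -r playwright-report test-results shard-${SHARD}/ || true"
                                archiveArtifacts artifacts: "shard-${SHARD}/**", allowEmptyArchive: true
                            }
                        }
                    }
                }
            }
        }
    }

    post {
        failure {
            mail to: 'team@example.com',
                 subject: "E2E Tests Failed: ${env.JOB_NAME} #${env.BUILD_NUMBER}",
                 body: "Check: ${env.BUILD_URL}"
        }
    }
}

Jenkins uses the Playwright Docker image as the build agent, which comes with all browser dependencies pre-installed. The Validate stage runs lint and unit tests in parallel. The Build stage stashes the compiled .next/ output, and a matrix then runs three E2E shards, each in its own container and workspace. Separate workspaces matter: three shards started side by side in one workspace could compete for the web server's port and clean each other's test-results/ folder. Each shard archives its reports under a shard-specific folder regardless of outcome, and the top-level post block sends an email notification on failure.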

7. Execution Strategies

Not every pipeline run needs to execute every E2E test. Smart execution strategies balance thoroughness with speed, running the right tests at the right time.

Smoke Tests on Pull Requests

Tag your 10 to 20 most critical E2E tests with a smoke tag. Run only these on pull requests. They should cover login, core navigation, the primary conversion flow, and any payment-related paths. Smoke tests should complete in under 5 minutes to keep the PR feedback loop tight.

playwright.config.ts
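import { defineConfig } from '@playwright/test';

// Run smoke tests on PRs, full suite on main.
// CI_SMOKE is a custom variable that the PR pipeline is expected to set.
const isSmokeRun = process.env.CI_SMOKE === 'true';

export default defineConfig({
  grep: isSmokeRun ? /@smoke/ : undefined,
  workers: isSmokeRun ? 2 : '50%',
  retries: isSmokeRun ? 0 : 2,
});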

Full Suite on Main Branch

When code merges to main, run the complete E2E suite. This is your last line of defense before deployment. With proper sharding, even a suite of 200 tests can finish in under 10 minutes. If the full suite fails on main, block the deployment and alert the team immediately.

Scheduled and Nightly Runs

Schedule comprehensive test runs that include edge cases, accessibility tests, and cross-browser verification. These runs can take 30 minutes or more without impacting developer productivity. Run them overnight and review results each morning. This is where you catch the slow-burning regressions that smoke tests miss.
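The three tiers (smoke on pull requests, full suite on main, broader nightly runs) can be kept in one lookup that a config file or CI script reads. The trigger names, the @smoke tag, and the chromium/firefox/webkit project names are illustrative assumptions; the grep pattern and retry counts mirror the smoke-run config shown earlier.

```typescript
// Hypothetical trigger names; map your CI platform's event to one of these.
type Trigger = "pull_request" | "push_main" | "nightly";

interface SuitePlan {
  grep?: RegExp; // title/tag filter, e.g. for Playwright's grep option
  retries: number;
  projects: string[]; // assumed browser project names from playwright.config.ts
}

const PLANS: Record<Trigger, SuitePlan> = {
  // Fast PR feedback: smoke-tagged tests only, no retries (as in the smoke config)
  pull_request: { grep: /@smoke/, retries: 0, projects: ["chromium"] },
  // Last line of defense before deployment: every test, with retries
  push_main: { retries: 2, projects: ["chromium"] },
  // Overnight: full suite across browsers, where long runtimes block nobody
  nightly: { retries: 2, projects: ["chromium", "firefox", "webkit"] },
};

function planFor(trigger: Trigger): SuitePlan {
  return PLANS[trigger];
}

// Usage: which of these titles would a pull request run?
const titles = ["login works @smoke", "export CSV edge case", "checkout @smoke"];
const pr = planFor("pull_request");
console.log(titles.filter((t) => pr.grep === undefined || pr.grep.test(t)));
// [ 'login works @smoke', 'checkout @smoke' ]
```

Keeping the plan in one table makes the tradeoff visible in code review: promoting a test to the PR tier costs every pull request time, while leaving it to the nightly tier can delay detection by up to a day.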

Canary Deployments

For teams using canary deployment strategies, run a targeted E2E suite against the canary environment after deploying the new version to a small percentage of traffic. If E2E tests pass against the canary, gradually increase traffic. If they fail, roll back automatically. This pattern catches environment-specific bugs that only appear in production infrastructure.
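The promote-or-roll-back loop can be sketched with the deployment and test steps injected as functions. setTrafficPercent, runCanaryE2E, and rollback are hypothetical hooks you would wire to your own deploy tooling and test runner, and the 5/25/50 percent steps in the usage example are placeholders.

```typescript
// Hypothetical hooks; wire these to your deploy tooling and E2E runner.
interface CanaryHooks {
  setTrafficPercent(percent: number): Promise<void>;
  runCanaryE2E(): Promise<boolean>; // true when the targeted suite passes
  rollback(): Promise<void>;
}

type CanaryOutcome = "promoted" | "rolled_back";

// At each traffic step, run the targeted E2E suite against the canary.
// Any failure rolls back; passing every step sends all traffic to the new version.
async function promoteCanary(steps: number[], hooks: CanaryHooks): Promise<CanaryOutcome> {
  for (const percent of steps) {
    await hooks.setTrafficPercent(percent);
    if (!(await hooks.runCanaryE2E())) {
      await hooks.rollback();
      return "rolled_back";
    }
  }
  await hooks.setTrafficPercent(100);
  return "promoted";
}

// Usage with fake hooks that record calls; the second E2E run fails.
const calls: string[] = [];
const outcomes = [true, false];
let attempt = 0;
const done = promoteCanary([5, 25, 50], {
  setTrafficPercent: async (p) => {
    calls.push(`traffic ${p}%`);
  },
  runCanaryE2E: async () => outcomes[attempt++],
  rollback: async () => {
    calls.push("rollback");
  },
});
done.then((outcome) => console.log(outcome, calls));
```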

8. Debugging Pipeline Failures

E2E test failures in CI are notoriously difficult to debug because you cannot see the browser. The key is to capture enough artifacts during the test run to reconstruct what happened after the fact.

Artifacts and Screenshots

Configure Playwright to capture screenshots on failure. Upload these as CI artifacts so developers can see exactly what the page looked like when the test failed. For intermittent failures, capture screenshots on every test (not just failures) and compare passing versus failing runs.

playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
    trace: 'retain-on-failure',
  },
  reporter: [
    ['html', { open: 'never' }],
    ['json', { outputFile: 'test-results.json' }],
  ],
});

Video Recording

Playwright can record video of the entire test execution. Use the retain-on-failure option to keep videos only for failed tests, which saves storage. When debugging a flaky test, temporarily set video to 'on' to capture every run, then compare successful and failed executions side by side.

Trace Files

Playwright traces are the most powerful debugging tool for CI failures. A trace captures every network request, DOM snapshot, console log, and action performed during the test. You can open trace files locally with npx playwright show-trace trace.zip and step through the test execution frame by frame. This is often the fastest way to understand why a test failed in CI when it passes locally.

Retry Logic

Some flakiness is unavoidable in E2E testing. Configure retries at the test runner level to handle transient failures without blocking the pipeline. Two retries is a reasonable default. If a test fails all retries, treat it as a real failure. If it passes on retry, flag it for investigation but do not block the pipeline. Track retry rates over time to identify tests that need fixing.

playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  reporter: process.env.CI
    ? [['blob'], ['github']]
    : [['html', { open: 'on-failure' }]],
});
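Retry rates can be tracked from the JSON reporter output enabled in the debugging config earlier in this section. The sketch below walks that report and separates tests that passed only on retry from tests that never passed. The interfaces declare an assumed minimal subset of Playwright's JSON format (nested suites, specs, and per-project tests with a status of expected, unexpected, flaky, or skipped); verify the field names against a report from your Playwright version before relying on it.

```typescript
// Minimal, assumed subset of Playwright's JSON reporter output.
interface ReportTest {
  projectName: string;
  status: "expected" | "unexpected" | "flaky" | "skipped";
}
interface ReportSpec {
  title: string;
  file: string;
  tests: ReportTest[];
}
interface ReportSuite {
  specs: ReportSpec[];
  suites?: ReportSuite[]; // describe() blocks nest as child suites
}
interface JsonReport {
  suites: ReportSuite[];
}

interface RetrySummary {
  executed: number;
  flaky: string[]; // failed at least once, then passed on retry
  failed: string[]; // never reached the expected outcome
  flakyRate: number;
}

function summarizeRetries(report: JsonReport): RetrySummary {
  const flaky: string[] = [];
  const failed: string[] = [];
  let executed = 0;
  const visit = (suite: ReportSuite): void => {
    for (const spec of suite.specs) {
      for (const test of spec.tests) {
        if (test.status === "skipped") continue;
        executed++;
        const id = `${spec.file} > ${spec.title} [${test.projectName}]`;
        if (test.status === "flaky") flaky.push(id);
        if (test.status === "unexpected") failed.push(id);
      }
    }
    (suite.suites ?? []).forEach(visit);
  };
  report.suites.forEach(visit);
  return { executed, flaky, failed, flakyRate: executed === 0 ? 0 : flaky.length / executed };
}

// Usage with a hand-written fragment (real reports contain many more fields).
const sample: JsonReport = {
  suites: [
    {
      specs: [{ title: "login", file: "auth.spec.ts", tests: [{ projectName: "chromium", status: "expected" }] }],
      suites: [
        {
          specs: [
            { title: "pay", file: "checkout.spec.ts", tests: [{ projectName: "chromium", status: "flaky" }] },
            { title: "refund", file: "checkout.spec.ts", tests: [{ projectName: "chromium", status: "unexpected" }] },
            { title: "coupon", file: "checkout.spec.ts", tests: [{ projectName: "chromium", status: "skipped" }] },
          ],
        },
      ],
    },
  ],
};
console.log(summarizeRetries(sample).flakyRate); // 0.3333333333333333
```

Storing flakyRate per run (for example as a CI metric) gives the trend line the paragraph above recommends watching.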

Tools like Assrt take this further by automatically detecting flaky tests, analyzing failure patterns, and self-healing tests that break due to UI changes. Instead of spending hours debugging why a selector changed, Assrt updates the selector automatically and keeps your pipeline green.
