Production-Grade Playwright Framework Setup
Setting up Playwright for a side project takes five minutes. Setting it up for production takes thought. This guide covers everything you need for a Playwright framework that scales: project structure, configuration, CI integration, parallel execution, reporting, and maintenance strategies.
1. Project Structure for Large Test Suites
A production Playwright project needs a clear directory structure that scales with the test suite. The recommended layout separates tests by feature or domain, keeps page objects (or page models) in their own directory, and isolates fixtures and utilities. A typical structure uses tests/ for test files organized by feature, pages/ for page object models, fixtures/ for shared test setup, and utils/ for helpers.
Page objects encapsulate interaction patterns for each page of your application. A LoginPage object provides methods like login(email, password) and expectLoggedIn() that tests call without knowing the underlying selectors. When the login UI changes, you update the page object in one place instead of every test that logs in. This is the single most impactful pattern for maintaining a large test suite.
Use Playwright's built-in fixture system for shared setup. Define custom fixtures for authenticated state, test data creation, and API clients. Fixtures compose naturally: a "logged in user" fixture can depend on a "test user" fixture that depends on a "database connection" fixture. This composition keeps each fixture focused and reusable.
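A sketch of that composition using test.extend. The TestUser shape and the createTestUser/deleteTestUser helpers are hypothetical stand-ins for calls to your own backend:

```typescript
// fixtures/index.ts -- composed custom fixtures (a sketch).
import { test as base, expect, type Page } from '@playwright/test';

// Hypothetical test-data helpers; replace with calls to your own API.
type TestUser = { id: string; email: string; password: string };
async function createTestUser(): Promise<TestUser> {
  return { id: crypto.randomUUID(), email: `qa+${Date.now()}@example.com`, password: 'test-password' };
}
async function deleteTestUser(id: string): Promise<void> {
  // e.g. DELETE /api/users/:id against your backend
}

export const test = base.extend<{ testUser: TestUser; loggedInPage: Page }>({
  // Creates an isolated user before each test, removes it afterwards.
  testUser: async ({}, use) => {
    const user = await createTestUser();
    await use(user); // the test body runs here
    await deleteTestUser(user.id);
  },
  // Depends on testUser: fixtures compose automatically.
  loggedInPage: async ({ page, testUser }, use) => {
    await page.goto('/login');
    await page.locator('#email').fill(testUser.email);
    await page.locator('#password').fill(testUser.password);
    await page.locator('button[type=submit]').click();
    await use(page);
  },
});
export { expect };
```

Tests import `test` from this file instead of from @playwright/test and simply declare `{ loggedInPage }` in their arguments to get an authenticated page.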
2. Configuration Best Practices
Your playwright.config.ts is the control center for your test suite. For production, configure it with explicit settings rather than relying on defaults. Set retries: 2 (not more) to catch genuine flakiness without hiding real failures. Set timeout: 30000 for individual test timeouts and expect.timeout: 5000 for assertion timeouts. These values work for most applications but should be tuned based on your app's performance characteristics.
Configure trace collection to activate on failure: trace: 'retain-on-failure'. This saves trace files only for failed tests, keeping artifact storage manageable while ensuring you have debugging data when you need it. Add screenshot: 'only-on-failure' and video: 'retain-on-failure' for additional debugging artifacts. In CI, save these as pipeline artifacts so they are accessible from the build page.
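Collected into a config file, those settings look like this. Treat it as a starting point, not a prescription; tune the timeouts and retries to your app:

```typescript
// playwright.config.ts -- explicit production settings (a sketch).
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  timeout: 30_000,            // per-test timeout
  expect: { timeout: 5_000 }, // per-assertion timeout
  retries: 2,                 // catch genuine flakiness without hiding real failures
  use: {
    trace: 'retain-on-failure',      // trace files only for failed tests
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
});
```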
Use environment-specific configuration through Playwright projects. Define separate projects for local development (against localhost), staging (against your staging URL), and production smoke tests (against your production URL). Each project can have its own base URL, timeouts, and browser configuration. This lets you run the same tests against different environments with a single command-line flag.
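One way to sketch environment-specific projects (the staging and production URLs are placeholders for your own):

```typescript
// playwright.config.ts (excerpt) -- one project per environment.
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      name: 'local',
      use: { ...devices['Desktop Chrome'], baseURL: 'http://localhost:3000' },
    },
    {
      name: 'staging',
      use: { ...devices['Desktop Chrome'], baseURL: 'https://staging.example.com' },
    },
    {
      name: 'prod-smoke',
      grep: /@smoke/, // only tests tagged @smoke run against production
      use: { ...devices['Desktop Chrome'], baseURL: 'https://www.example.com' },
    },
  ],
});
```

Then `npx playwright test --project=staging` runs the same suite against staging.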
3. CI Integration and Parallel Execution
Playwright's built-in parallelism uses worker processes to run tests concurrently. By default, it uses half of your CPU cores. In CI, configure the worker count based on your runner's resources: workers: process.env.CI ? 2 : undefined. Too many workers on a small runner cause memory pressure and flakiness; too few waste pipeline time.
For large test suites (100+ tests), use Playwright's sharding feature to distribute tests across multiple CI machines. Configure your CI pipeline to run multiple jobs in parallel, each with a different shard: npx playwright test --shard=1/4 through --shard=4/4. This scales nearly linearly: four shards run approximately four times faster than a single machine. Most CI platforms support matrix builds that make this configuration straightforward.
Install browser binaries efficiently in CI. Use Playwright's Docker images (mcr.microsoft.com/playwright), which come with browsers pre-installed, or cache the browser binaries between CI runs. A fresh npx playwright install downloads 300+ MB of browsers, which adds significant time to every pipeline if not cached.
4. Test Data Management and Isolation
Test data management is where most Playwright setups fail at scale. When tests share data (using the same user account, the same database records), they interfere with each other. Test A creates a record that Test B deletes, causing Test A to fail on its next assertion. This is the primary source of ordering-dependent flakiness.
The gold standard is complete test isolation: each test creates its own data, operates on that data exclusively, and cleans up after itself. Use Playwright fixtures to create test-specific users, organizations, and records through your API before each test. This is more expensive than sharing data but eliminates an entire category of flakiness.
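Playwright's built-in request fixture is a convenient way to do this API-side setup. A sketch, where the /api/orgs endpoint and the Org shape are hypothetical:

```typescript
// fixtures/org.ts -- per-test record creation through the API (a sketch).
// The /api/orgs endpoint and Org shape are placeholders for your backend.
import { test as base } from '@playwright/test';

type Org = { id: string; name: string };

export const test = base.extend<{ org: Org }>({
  org: async ({ request }, use) => {
    // Create an isolated organization for this test only.
    const res = await request.post('/api/orgs', {
      data: { name: `e2e-${Date.now()}` },
    });
    const org = (await res.json()) as Org;
    await use(org);
    // Teardown: remove the record so nothing leaks between runs.
    await request.delete(`/api/orgs/${org.id}`);
  },
});
```

Because creation happens over HTTP rather than through the UI, setup stays fast even when every test gets fresh data.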
For applications where creating fresh data for every test is too slow, use data pooling. Create a pool of test accounts and data sets, assign each test worker its own pool member, and reset the pool between test runs. This provides isolation at the worker level without the overhead of creating data for every individual test.
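Each Playwright worker has a stable parallelIndex (0 through workers - 1), so pool assignment can be a pure mapping from that index onto a fixed pool. A sketch, with hypothetical pool accounts:

```typescript
// utils/account-pool.ts -- worker-level data pooling (a sketch).
// The pool accounts are hypothetical; size the pool to your worker count.
type PoolAccount = { email: string; password: string };

const ACCOUNT_POOL: PoolAccount[] = [
  { email: 'pool-0@example.com', password: 'pw-0' },
  { email: 'pool-1@example.com', password: 'pw-1' },
  { email: 'pool-2@example.com', password: 'pw-2' },
  { email: 'pool-3@example.com', password: 'pw-3' },
];

export function accountForWorker(parallelIndex: number): PoolAccount {
  if (ACCOUNT_POOL.length === 0) throw new Error('empty account pool');
  // Wrap around so more workers than accounts still get a deterministic
  // slot (at the cost of sharing; avoid this by sizing the pool correctly).
  return ACCOUNT_POOL[parallelIndex % ACCOUNT_POOL.length];
}
```

Inside a worker-scoped fixture this would be called as accountForWorker(workerInfo.parallelIndex), giving every worker its own account for the whole run.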
5. Reporting, Artifacts, and Debugging
Playwright's HTML reporter is excellent for local debugging but insufficient for team-wide visibility. For production, add JUnit XML output (reporter: [['junit', { outputFile: 'results.xml' }]]) for CI integration and a custom or third-party reporter for historical trend analysis. Most CI platforms (GitLab, GitHub Actions, Jenkins) can parse JUnit XML and display test results in the build UI.
Configure artifact retention carefully. Trace files, videos, and screenshots from failed tests are invaluable for debugging but consume storage. Set your CI to retain artifacts for 7 to 14 days, long enough to investigate failures but not so long that storage costs accumulate. For critical failures, download and archive artifacts before they expire.
Build a test health dashboard that tracks pass rate, flake rate, average duration, and failure trends over time. This does not require a fancy tool. A script that parses JUnit XML files and writes metrics to a database or spreadsheet is sufficient. The important thing is visibility: when the team can see test health degrading, they fix it before it becomes a crisis.
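A minimal version of that parsing step might look like this. It reads the aggregate counters off the root <testsuites> element with a small regex; a real script might prefer a proper XML parser:

```typescript
// health-metrics.ts -- extract pass/fail counts from a JUnit XML report (a sketch).
export type HealthMetrics = {
  total: number;
  failures: number;
  skipped: number;
  passRate: number;
};

export function parseJUnitSummary(xml: string): HealthMetrics {
  // Pull a numeric attribute off the root <testsuites> element.
  const attr = (name: string): number => {
    const m = xml.match(new RegExp(`<testsuites[^>]*\\b${name}="(\\d+)"`));
    return m ? Number(m[1]) : 0;
  };
  const total = attr('tests');
  const failures = attr('failures');
  const skipped = attr('skipped');
  const executed = total - skipped;
  return {
    total,
    failures,
    skipped,
    // Pass rate over executed tests; 1 when nothing ran.
    passRate: executed > 0 ? (executed - failures) / executed : 1,
  };
}
```

Run nightly over the results.xml artifacts and appended to a spreadsheet or database table, this is enough to spot a degrading pass rate before it becomes a crisis.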
6. Scaling from 10 to 1,000 Tests
The transition from a small test suite to a large one happens faster than most teams expect. What works for 10 tests (simple structure, shared data, single CI worker) breaks at 100 tests and collapses at 1,000. Plan for scale from the beginning by using the patterns described above: page objects, fixtures, data isolation, and parallel execution.
Automated test generation accelerates scaling. Instead of writing every test by hand, use Assrt to discover test scenarios and generate Playwright test files. This is particularly effective for increasing coverage across pages and flows that the team has not prioritized for manual test writing. The generated tests follow Playwright best practices (resilient selectors, auto-waiting, proper assertions) and integrate with your existing page objects and fixtures.
At scale, test suite maintenance becomes a dedicated responsibility. Assign ownership of test health metrics, flake investigation, and test infrastructure to a specific person or team. Without clear ownership, large test suites degrade gradually until the team loses confidence and stops relying on them. The investment in ownership pays for itself through faster deployments, fewer production incidents, and higher developer confidence.
Ready to automate your testing?
Assrt discovers test scenarios, writes Playwright tests from plain English, and self-heals when your UI changes.