Manufacturing software QA

The test coverage gap in manufacturing software is the screens nobody reaches

MES dashboards, ERP portals, quality systems, configurators, work-order UIs. They are deep, role-gated, and stateful. The coverage gap that bites is not a low percentage on a report. It is the dozens of operational screens an operator can reach that your test suite never opens.

Matthew Diakonov, Written with AI

Published June 17, 2026 · 8 min read

Direct answer

Test coverage gaps in manufacturing software live in the operational screens that are only reachable through auth, navigation, and state: work orders, BOM editors, batch records, quality holds, role-specific dashboards. Code-coverage metrics cannot see them, because a screen no test ever opens does not show up as a missed line. You close the gap by walking the app the way an operator does and writing tests for the screens you find, not by chasing a higher coverage percentage on the code you already exercise.

Mechanism verified 2026-06-17 in assrt-mcp (core/agent.ts).

The number that lies

Most advice on coverage gaps starts and ends with a percentage. Run the suite, read the report, find the red lines, add tests until the bar turns green. For a unit of business logic that is fine. For the kind of software that runs a plant, it quietly answers the wrong question.

Manufacturing web apps are not mostly logic. They are mostly screens: a work-order detail with a dozen status transitions, a BOM editor whose behavior depends on revision and effectivity, a quality module gated to a quality role, a configurator where option A disables option C. A code-coverage report cannot tell you the disposition screen is broken under a supervisor session, because no test ever navigated there. The screen is not 0 percent covered. It is absent from the report entirely. That is the gap, and it is invisible to the tool everyone reaches for.

Two ways to look at coverage

Code coverage measures which lines, branches, and functions your existing tests execute. A green bar means the paths your tests touch are exercised.

Cannot see a screen no test opens
Rewards adding assertions to code you already cover
Says nothing about role-gated or stateful behavior

Where the gap actually hides in a plant app

The screens below are the ones I see fall out of coverage first. They are reachable in production every shift, and they are exactly the ones a hand-maintained suite stops keeping up with after a few releases. None of them is the login page, which is usually the one screen that does have a test.

Manufacturing screens that go untested first

Work-order detail: status transitions, operator sign-offs, hold and release
BOM and routing editors: revision changes, component swaps, effectivity dates
Batch and device-history records: e-signature steps, lot genealogy, deviations
Quality screens: nonconformance, CAPA, inspection results, disposition
Role-gated dashboards: supervisor vs operator vs quality views of the same data
Multi-step configurators and order forms: option dependencies, validation, totals
Schedule and dispatch boards: drag, reassign, re-sequence, capacity warnings
Integration-facing pages: ERP sync status, label print, SCADA tag mapping views

What these have in common: you can only get to them through a workflow. Sign in, pick the right object, advance its state, switch to the right role. That is precisely the path a code-coverage report cannot follow and a busy team cannot re-walk by hand every release.

Find the gap by walking the app, not by reading a report

If the gap is the screens nobody reaches, the way to find it is to reach them. That is what Assrt's discovery does. While a test run executes one of your real scenarios, it watches where the run goes and quietly catalogs every page it lands on. Each new screen becomes a candidate for coverage.

How discovery turns a run into a coverage map

Land and act

The agent runs your scenario: signs in, opens a work order, applies a hold.

Notice new URLs

Every page the run navigates to is queued, normalized to origin + path, deduped.

Snapshot the page

Accessibility tree plus a screenshot, captured without interrupting the run.

Propose cases

A QA model reads the page and writes 1-2 cases against the buttons it can see.

Surface the gap

You get a list of reachable screens and what to test on each, not a number.

The output is not a percentage. It is a list of reachable screens with one or two proposed test cases each, written against the buttons and inputs that are actually on the page. You decide which to keep. Because Assrt emits standard Playwright files, closing a gap means committing a real test to your repository, not trusting a number on someone else's dashboard.

The exact behavior, so you can check it

This is the part no generic coverage guide can copy, because it is the real implementation. The discovery crawler in core/agent.ts is deliberately bounded so it covers a deep app fast without wandering forever.

20 reachable pages the discovery crawler will walk per run

3 pages discovered concurrently so a deep app gets covered fast

4000 characters of the accessibility tree fed to the case generator

1-2 test cases proposed per discovered page, tied to real buttons

Concretely: every URL the run navigates to is passed through queueDiscoverPage, which normalizes it to origin plus pathname (so /work-orders/42 and the same page with a trailing slash or query string are not covered twice), dedupes against everything seen, and stops at 20 pages. A skip list drops /logout, /api/, javascript:, about:blank, and data URLs so the crawler never burns budget on noise. For each surviving page it captures the accessibility tree, slices the first 4000 characters, attaches a screenshot, and asks a QA model for one or two cases that reference visible elements only. The same prompt forbids login, signup, CSS, layout, and performance cases on purpose: discovery is for the operational screens that the login already unlocked.
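The queueing rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in core/agent.ts: the names `DiscoveryQueue`, `normalizeUrl`, `SKIP_PREFIXES`, and `SKIP_PATH_PARTS` are hypothetical, and stripping a trailing slash is one guess at how the trailing-slash case collapses to the same key.

```typescript
// Hypothetical sketch of the queueing rules described above.
// NOT the code in core/agent.ts; names and details are assumptions.

const MAX_DISCOVERED_PAGES = 20; // per-run cap stated in the article

// Noise the article says is skipped, split by where it appears in the URL.
const SKIP_PREFIXES = ["javascript:", "about:blank", "data:"];
const SKIP_PATH_PARTS = ["/logout", "/api/"];

/** Reduce a URL to origin + pathname, or return null if it should be skipped. */
function normalizeUrl(raw: string): string | null {
  if (SKIP_PREFIXES.some((prefix) => raw.startsWith(prefix))) return null;
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return null; // not an absolute URL, so there is nothing to queue
  }
  if (SKIP_PATH_PARTS.some((part) => parsed.pathname.includes(part))) return null;
  // Assumption: trailing slashes are stripped so /work-orders/42/ matches /work-orders/42.
  const path = parsed.pathname.replace(/\/+$/, "") || "/";
  return parsed.origin + path; // query string and hash are dropped
}

class DiscoveryQueue {
  private seen = new Set<string>();
  private accepted: string[] = [];

  /** Queue a URL for discovery; returns true only if it was newly accepted. */
  queue(raw: string): boolean {
    const key = normalizeUrl(raw);
    if (key === null || this.seen.has(key)) return false;
    if (this.seen.size >= MAX_DISCOVERED_PAGES) return false; // budget spent
    this.seen.add(key);
    this.accepted.push(key);
    return true;
  }

  /** Pages accepted so far, in the order the run reached them. */
  pages(): string[] {
    return this.accepted.slice();
  }
}

const demo = new DiscoveryQueue();
demo.queue("https://mes.example.com/work-orders/42");            // accepted
demo.queue("https://mes.example.com/work-orders/42/?tab=holds"); // same screen, rejected
demo.queue("https://mes.example.com/api/sync");                  // noise, rejected
console.log(demo.pages());
```

Keying on origin plus path keeps the budget for distinct screens. The trade-off is that two screens distinguished only by a query parameter count as one.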

What this does not do

Honesty matters more here than in most software, because regulated manufacturing has real obligations. Discovery does not replace requirements-to-test traceability. If you owe an auditor proof that every documented requirement has a verifying test, you still need that mapping and the paper trail. Discovery answers the orthogonal question: which screens a person can reach have no test at all. The 20-page cap means a very large app needs more than one run, seeded from different starting scenarios, to walk its whole surface. And discovery proposes cases; it does not decide which risks matter to your line. You do.

What you avoid is the trap most coverage tooling sets: a comforting number that goes up while the screens that would actually stop a shift stay untested. Reachable-screen coverage is harder to fake, because the unit is a real page with a real test, kept in your own repository.

Walk your plant app and see what has no test

Bring your MES, ERP, or quality web app and we will run discovery against it live, then hand you the Playwright files for the screens it surfaces.

Questions teams ask about manufacturing-software coverage gaps

Frequently asked questions

What actually causes test coverage gaps in manufacturing software?

Not the things coverage reports flag. The gap lives in operational screens that are only reachable after a real workflow: sign in, pick a work order, advance its status, then land on the disposition page. Code-coverage tools measure which lines ran in your existing tests, so a screen no test ever opens is invisible to them, not flagged as 0 percent. Manual QA cannot keep up with dozens of role-gated, state-dependent screens every release. So the gap is the set of pages a user can reach that your suite never does.

Is code coverage a bad metric for a MES or ERP web app?

It is a useful metric for the wrong question. A high line-coverage number tells you the code paths your tests touch are exercised. It tells you nothing about whether the work-order hold screen, the BOM revision editor, or the supervisor dashboard renders and behaves under a real session. Those are flow-and-state problems, not line problems. Treat code coverage as a floor for unit logic and track operational-screen coverage separately: how many of the screens an operator can actually reach are under test.
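One way to track operational-screen coverage separately is to treat it as a set difference: screens an operator can reach, minus screens some test opens. The sketch below assumes you already have both lists as normalized URLs. The function name `screenCoverage` and the example hostnames are hypothetical, and none of this is an Assrt API.

```typescript
// Hypothetical sketch: compute which reachable screens no test opens.
// Inputs are assumed to be normalized URLs (origin + path) you collected yourself.

interface ScreenCoverage {
  reachable: number;  // distinct screens an operator can reach
  tested: number;     // of those, how many some test opens
  untested: string[]; // the gap itself, sorted for stable review
}

function screenCoverage(reachable: string[], openedByTests: string[]): ScreenCoverage {
  const reachableSet = new Set(reachable);
  const opened = new Set(openedByTests);
  const untested = Array.from(reachableSet)
    .filter((url) => !opened.has(url))
    .sort();
  return {
    reachable: reachableSet.size,
    tested: reachableSet.size - untested.length,
    untested,
  };
}

const report = screenCoverage(
  [
    "https://mes.example.com/work-orders/42",
    "https://mes.example.com/quality/disposition",
    "https://mes.example.com/bom/editor",
  ],
  ["https://mes.example.com/work-orders/42"],
);
console.log(`${report.tested}/${report.reachable} reachable screens under test`);
console.log(report.untested);
```

The `untested` list is the part worth reviewing. The ratio only summarizes it.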

How does Assrt find screens that no test currently covers?

It walks the app the way an operator would. As an Assrt run executes your scenario, every new URL it navigates to is queued for discovery. The crawler caps at 20 reachable pages per run and processes 3 at a time, normalizes each URL to origin plus pathname so it does not re-cover the same screen, and skips noise like /logout, /api/, and blank pages. For each discovered page it captures the accessibility tree and a screenshot, then a QA model proposes one or two test cases tied to the actual buttons and inputs on that page. You can read the exact behavior in core/agent.ts in the assrt-mcp repository.
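To make "3 at a time" concrete, here is a minimal sketch that bounds concurrency with fixed-size batches. It is an assumption about one simple way to enforce the limit, not a description of how core/agent.ts schedules work. The names `toBatches`, `discoverAll`, and the `inspect` callback are hypothetical.

```typescript
// Hypothetical sketch of "3 at a time" as fixed-size batches.
// The real crawler may schedule work differently; this only shows the bound.

const DISCOVERY_CONCURRENCY = 3; // figure stated in the article

/** Split a list into consecutive batches of at most `size` items. */
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/** Run `inspect` over every page, never more than DISCOVERY_CONCURRENCY at once. */
async function discoverAll<R>(
  pages: string[],
  inspect: (url: string) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of toBatches(pages, DISCOVERY_CONCURRENCY)) {
    // Pages within a batch run in parallel; batches run one after another.
    const batchResults = await Promise.all(batch.map((url) => inspect(url)));
    results.push(...batchResults);
  }
  return results;
}
```

Batching is the simplest bound to reason about. A worker pool that starts the next page as soon as one finishes would keep three pages in flight more consistently, at the cost of a little more code.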

Why does the crawler skip login and signup pages?

The discovery prompt explicitly tells the case generator not to write login or signup cases, and not to write cases about CSS, responsive layout, or performance. The reason is focus: the coverage gap that matters in manufacturing software is operational behavior on screens behind the login, not the login itself. Auth gets tested as the first step of your scenario; discovery spends its budget on the deep screens that auth unlocks.
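The constraints on the case generator can be pictured as a small request object: a truncated accessibility tree, a bound of one or two cases, and a list of excluded topics. This is a sketch only. The field names and the `buildCaseRequest` helper are hypothetical, the screenshot attachment is omitted, and the real prompt wording in assrt-mcp is not reproduced here.

```typescript
// Hypothetical sketch of the input handed to the case generator.
// Field names and request shape are assumptions, not Assrt's actual prompt.

const MAX_TREE_CHARS = 4000; // slice length stated in the article

// Topics the article says the prompt rules out.
const EXCLUDED_TOPICS: readonly string[] = [
  "login",
  "signup",
  "CSS",
  "responsive layout",
  "performance",
];

interface CaseRequest {
  url: string;
  accessibilityTree: string; // truncated to MAX_TREE_CHARS
  minCases: number;
  maxCases: number;
  excludedTopics: readonly string[];
}

function buildCaseRequest(url: string, accessibilityTree: string): CaseRequest {
  return {
    url,
    // Keep only the first 4000 characters so large pages stay within budget.
    accessibilityTree: accessibilityTree.slice(0, MAX_TREE_CHARS),
    minCases: 1,
    maxCases: 2,
    excludedTopics: EXCLUDED_TOPICS,
  };
}

const request = buildCaseRequest(
  "https://mes.example.com/quality/disposition",
  "button 'Release hold'\nbutton 'Scrap lot'\ntextbox 'Disposition reason'",
);
console.log(request.accessibilityTree.length, request.maxCases);
```

Keeping the exclusions as data rather than prose makes it easy to see what discovery will never propose.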

Does this replace requirements-to-test traceability?

No, and it should not pretend to. Traceability answers whether every documented requirement has a test. Discovery answers a different question: which screens a user can actually reach have no test at all. Regulated manufacturing still needs requirements coverage and the paper trail that goes with it. Discovery is the layer that catches the screens nobody wrote a requirement for but operators use every shift. Run both.

What do I get out of a run, a number or actual tests?

Both, but the useful output is concrete. Discovery gives you a list of reachable screens plus 1-2 proposed cases per screen, written against buttons that exist on the page. Assrt generates standard Playwright files you can read, edit, and commit, so closing a gap means keeping a test, not trusting a dashboard. There is no proprietary format and nothing to migrate out of later.