Argument, with file and line numbers

A green E2E run in CI/CD only ships when each assertion writes down why it passed

Direct answer

A green CI run earns ship-confidence only when every passing assertion in the report records a one-line written reason next to the boolean, not just an absent stack trace. The structural change is an assertion type that carries three fields per check (description, passed, evidence), so the JSON artifact from the runner is auditable without rerunning the suite. The rest of this page is the argument, walked against the open Assrt source so every claim is checkable.

Matthew Diakonov
9 min read

The agent source referenced here is at github.com/assrt-ai/assrt-mcp. If a line number looks specific, it is because the file is public and you can open it to check.

What "confidence" actually means in a CI/CD context

When a team says they want confidence from their E2E suite in CI, they mean something specific. They mean: when the build is green, we ship without a human re-reading the diff. When the build is red, we trust the signal enough to roll back or block the deploy. The whole point of the runner sitting in the pipeline is to remove a human from the loop.

That contract has two halves. The red half is well-served by every modern runner: a failed assertion throws, the stack trace points at the line, the trace viewer replays the DOM. You know what broke. The green half is where most suites silently fall short. A green Playwright run carries the information "none of the assertions threw". That is a weaker statement than "the app worked". The gap between those two statements is where shipping anxiety lives.

The closing move is not more tests. It is a different shape for the existing tests, so the green path carries the same kind of evidence the red path already does.

The structural change: three fields per assertion, not two

Vanilla Playwright assertions are two-field: the thing you are checking, and the boolean outcome. The boolean lives implicitly in "did the line throw". There is no recorded reason for the passes, and the API gives you nowhere natural to put one: the optional custom message on expect names the check, but nothing records what was actually observed when it passed.

Assrt's assertion shape adds a third field. The tool declaration at src/core/agent.ts:134-142 requires three properties on every assert call:

{
  name: "assert",
  description: "Make a test assertion about the current page state.",
  input_schema: {
    type: "object",
    properties: {
      description: { type: "string", description: "What you are asserting" },
      passed:      { type: "boolean", description: "Whether the assertion passed" },
      evidence:    { type: "string", description: "Evidence for the result" },
    },
    required: ["description", "passed", "evidence"],
  },
}

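The required array is the load-bearing part. The agent cannot call assert without supplying an evidence string; a call that omits the field does not match the tool's schema and is rejected at the model layer before any test state is mutated. (Strictly, required guarantees the field is present, not that it is non-empty; forbidding a blank string would take an extra constraint such as minLength.) That is why every entry in the final report carries a written reason.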
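The difference between "present" and "non-empty" is easy to miss, so here is a minimal hand-rolled check of an assert call against the three-field shape. It is a sketch, not Assrt's validation path: validateAssertInput, isAssertInput and the non-empty rule are hypothetical additions for illustration, since the declared schema above only lists the fields as required.

```typescript
// Sketch only: checks an incoming assert-tool input against the three-field shape.
// validateAssertInput / isAssertInput are hypothetical names, not Assrt code.

interface AssertInput {
  description: string;
  passed: boolean;
  evidence: string;
}

const REQUIRED = ["description", "passed", "evidence"] as const;

// Returns a list of problems; an empty list means the input is acceptable.
// requireNonEmpty models a minLength-style rule that JSON Schema "required" alone does not give you.
function validateAssertInput(input: unknown, requireNonEmpty = true): string[] {
  if (typeof input !== "object" || input === null) return ["input must be an object"];
  const record = input as Record<string, unknown>;
  const problems: string[] = [];

  // Presence check: what "required" actually enforces.
  for (const key of REQUIRED) {
    if (!(key in record)) problems.push(`missing required field: ${key}`);
  }

  // Type checks, mirroring the declared property types.
  if ("description" in record && typeof record.description !== "string") problems.push("description must be a string");
  if ("passed" in record && typeof record.passed !== "boolean") problems.push("passed must be a boolean");
  if ("evidence" in record && typeof record.evidence !== "string") problems.push("evidence must be a string");

  // "" satisfies "required", so blank evidence needs its own rule.
  if (requireNonEmpty && typeof record.evidence === "string" && record.evidence.trim() === "") {
    problems.push("evidence must not be empty");
  }
  return problems;
}

// Type guard for callers that only need a yes/no answer.
function isAssertInput(input: unknown): input is AssertInput {
  return validateAssertInput(input).length === 0;
}
```

With requireNonEmpty set to false, the function behaves like a bare required check: an input with evidence set to "" passes, which is exactly the gap the extra rule closes.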

The TestAssertion type that lands in the JSON report is at src/core/types.ts:13-17. The same three fields, persisted:

export interface TestAssertion {
  description: string;
  passed: boolean;
  evidence: string;
}

What the test artifact carries

In a typical vanilla Playwright CI setup, a failed assertion produces a stack trace, a screenshot, and a trace.zip. A passed assertion produces nothing a reviewer would read.

  • 200 silent passes are indistinguishable from 200 short-circuits
  • On a green run, the audit trail is the absence of errors
  • To check what a pass meant, you re-read the test source
  • Re-reading does not tell you what the DOM looked like at run time

How the runner talks to CI: one exit code, one JSON artifact

The other half of a usable CI step is the wiring. A runner that carries rich assertion data internally but only prints to stdout is still opaque to the pipeline. Two things have to happen at the boundary: the process must exit non-zero on any failure (so the CI step actually fails the build), and the report must land somewhere the pipeline can grab it (so the failure summary is one curl away).

The exit-code wiring in Assrt is one line, at src/cli.ts:601:

process.exit(report.failedCount > 0 ? 1 : 0);

The JSON artifact comes from the --json flag, parsed at src/cli.ts:590-598. Pass the flag, redirect stdout to a file, and you have a build artifact you can upload from any pipeline (GitHub Actions upload-artifact, GitLab artifacts:paths, Buildkite artifact upload). The report contains every scenario, every step, every assertion, every evidence string, the total duration, and the timestamp. Reading the artifact usually tells you enough to decide whether to ship; you open the video only when the text does not settle it.
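A follow-up CI step that reads the artifact and prints every assertion, passes included, might look like the sketch below. The ReportLike and ScenarioLike shapes (a scenarios array whose entries carry name and assertions, plus failedCount) are assumptions for illustration; check TestReport in src/core/types.ts for the real field names before relying on them. summarize and gate are hypothetical helpers, not part of the Assrt CLI.

```typescript
// Sketch of a post-run report reader. Field names other than those of TestAssertion
// and failedCount are assumptions; verify them against TestReport in types.ts.

interface TestAssertion {
  description: string;
  passed: boolean;
  evidence: string;
}

interface ScenarioLike {
  name: string;                // assumed field name
  assertions: TestAssertion[]; // the per-scenario assertions array
}

interface ReportLike {
  scenarios: ScenarioLike[];   // assumed field name
  failedCount: number;
}

// One line per assertion, so the evidence for passes is visible, not just failures.
function summarize(report: ReportLike): string[] {
  const lines: string[] = [];
  for (const scenario of report.scenarios) {
    for (const a of scenario.assertions) {
      lines.push(`${a.passed ? "PASS" : "FAIL"} [${scenario.name}] ${a.description} -- ${a.evidence}`);
    }
  }
  return lines;
}

// Parses the raw report text, logs the summary, and returns an exit code
// using the same rule as the CLI: non-zero failedCount means 1.
function gate(rawJson: string, log: (line: string) => void = (line) => console.log(line)): number {
  const report = JSON.parse(rawJson) as ReportLike;
  for (const line of summarize(report)) log(line);
  return report.failedCount > 0 ? 1 : 0;
}
```

In a Node script you would pass it the file the CLI step wrote, for example process.exitCode = gate(readFileSync("report.json", "utf8")). Because the CLI step already exits 1 on failure, a reader like this only makes sense as a step configured to run even when the previous one failed.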

The end-to-end shape of a CI run

Participants: CI runner, Assrt CLI, Agent, Report artifact. Messages, in order:
  1. npx @m13v/assrt run --url ... --json
  2. load plan, start scenarios
  3. assert({description, passed, evidence})
  4. scenario_complete event
  5. TestReport JSON to stdout
  6. process.exit(failedCount > 0 ? 1 : 0)
  7. upload as build artifact, read on failure

What a green run looks like, end to end

Concretely: this is what a CI step running the Assrt CLI prints on a simple checkout scenario. Two assertions, both passing. Notice that each PASSED line is followed by an evidence line describing what the agent actually observed. The shell exit code at the end is 0, which is what the pipeline reads to decide green-or-red.

checkout flow, green

The thing to look at is not the PASSED lines. It is the evidence lines below them. 'Subtotal text was "$29.00" matching the unit price of the single SKU in the cart' tells you the assertion was actually exercising the price calculation, not silently passing on an unrelated DOM. Six months later, reading the same artifact, you will know what that scenario covered without rerunning it.

Where this matters in practice: a regression slips in and the test still passes for the wrong reason. With evidence-per-pass, you can grep the historical reports for the assertion description and watch the evidence string drift over time. A pass whose evidence used to say "total was $29.00" and now says "heading text included '29'" is a pass that used to mean something and no longer does. That is a signal vanilla green-or-red does not carry.
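Watching that drift can be sketched as a small function over past runs. This is an illustration under assumptions: each historical report is taken to be already reduced to a HistoricalRun with a runId (a hypothetical label such as a CI build number) and its assertions, and findEvidenceDrift is a hypothetical helper. Because evidence is free text written by a model, exact comparison would also flag harmless rewording, so the sketch compares through a normalize function that you would tune (for example, to extract just the price).

```typescript
// Sketch: flag points where a still-passing assertion's evidence changed between runs.

interface TestAssertion {
  description: string;
  passed: boolean;
  evidence: string;
}

interface HistoricalRun {
  runId: string;               // hypothetical label, e.g. a CI build number
  assertions: TestAssertion[];
}

interface EvidenceChange {
  runId: string;
  before: string;
  after: string;
}

// Walks runs oldest-first. Failing or missing entries are skipped: drift is about
// passes whose meaning may have changed, not about failures.
function findEvidenceDrift(
  runs: HistoricalRun[],
  description: string,
  normalize: (evidence: string) => string = (e) => e.trim().toLowerCase(),
): EvidenceChange[] {
  const changes: EvidenceChange[] = [];
  let previous: TestAssertion | undefined;
  for (const run of runs) {
    const hit = run.assertions.find((a) => a.description === description && a.passed);
    if (!hit) continue;
    if (previous && normalize(hit.evidence) !== normalize(previous.evidence)) {
      changes.push({ runId: run.runId, before: previous.evidence, after: hit.evidence });
    }
    previous = hit;
  }
  return changes;
}
```

Fed a history where the evidence moves from "total was $29.00" to "heading text included '29'", the function reports one change at the run where the weaker evidence first appears.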

The honest pushback

The fair objection: you can already write the equivalent in vanilla Playwright. Wrap every expect in a try/catch, attach a custom annotation on success, write the annotation to a side file. Mechanically true. In practice no one does it, because the framework does not require it and the discipline collapses the third time someone writes a test under deadline pressure.
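For comparison, that convention could be packaged as a helper like the one below. It is a sketch: checkWithEvidence and the "evidence" annotation type are hypothetical names. The only Playwright detail it relies on is that test.info().annotations is a runtime-mutable array of { type, description? } entries; the helper takes that array as a parameter so it can run without Playwright. No try/catch is needed, because a check that throws simply never reaches the push, and the failure propagates to the runner as usual.

```typescript
// Hypothetical helper for the "annotate on success" convention; not part of Playwright.
// EvidenceAnnotation mirrors the { type, description? } shape of testInfo.annotations entries.
interface EvidenceAnnotation {
  type: string;
  description?: string;
}

// Runs a check that throws on failure and, on success, returns a sentence
// describing what it observed. Only passing checks record evidence.
async function checkWithEvidence(
  description: string,
  check: () => Promise<string>,
  annotations: EvidenceAnnotation[],
): Promise<void> {
  const observed = await check();
  annotations.push({ type: "evidence", description: `${description}: ${observed}` });
}
```

Inside a test you would pass test.info().annotations as the third argument, with a callback that runs the expect and returns the observed value as a sentence. Nothing forces anyone to use it, which is the author's point: the helper is optional, and optional discipline is the kind that erodes.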

The structural argument is that the tool schema is the right place to enforce the discipline. If the runner's tool schema makes the evidence field required (Assrt's does, at agent.ts:142), every assert call has to carry the string or it does not go through. The discipline is enforced once, in the tool definition, not 200 times across a growing test suite. That is what separates a structure that survives from a convention that erodes.

The other fair objection: evidence strings can lie. The agent could write "page loaded correctly" on a page that loaded incorrectly. True, and the mitigation is the same as for any test: the evidence string is reviewed alongside the description in code review, and evidence drift is tracked across historical reports. Lying evidence is at least visibly present in the report; you can read it and challenge it. Silent passes give you nothing to challenge.

What to do on Monday

Three concrete moves, in order of cost.

  1. Pick one critical user flow that currently has a green E2E test in CI. Read the test source. Ask: can I tell, from the code, what each pass actually proves? If the answer is "no, I would have to rerun it in headed mode", that test is a candidate for the structural rewrite.
  2. Run Assrt against the same flow: npx @m13v/assrt run --url <your-app> --plan "Walk the checkout flow with a valid card" --json > report.json. Read the resulting JSON. The evidence strings on the passes are the data your current suite does not produce.
  3. Add the run to your existing pipeline as a second step, behind a feature flag if the team is cautious. The exit code at cli.ts:601 is already standard. The JSON artifact goes into the same uploader your existing tests use. You do not need to rewrite the suite to start seeing the difference; you need one flow producing evidence-per-pass and a habit of reading it.

Want to walk a real CI artifact together?

Bring a current green run from your suite. We will look at what the report carries today, what it could carry, and whether evidence-per-pass would survive your team's review culture.

Common questions about CI confidence and assertion structure

When does a green E2E run in CI/CD actually mean ship?

When every passing assertion records a one-line reason (evidence string) next to the boolean, so the report is auditable without rerunning the suite. A green Playwright run that only carries the absence of thrown errors is a thumbs-up, not a shipping decision. The structural fix is an assertion type that captures three fields per check (what was asserted, the boolean, a written reason). Assrt's TestAssertion interface at src/core/types.ts lines 13-17 is exactly that shape. Once your report carries the evidence, you can usually read a CI artifact and decide whether to ship or roll back without opening a video.

What is wrong with the standard Playwright assert model for CI confidence?

Nothing for failures, almost everything for passes. await expect(page.locator('.welcome')).toBeVisible() either throws (and you get a stack trace, a screenshot, a trace.zip) or returns silently. The silent return is the problem. In a 200-assertion suite with two failures, the other 198 passed silently. Your CI artifact tells you no error happened on those lines, but cannot tell you which assertions were exercising the code under test and which ones short-circuited on selectors that no longer matched the intended element. The confidence ceiling is 'the runner did not throw on these lines'. That ceiling is below what most teams assume they bought.

How does Assrt change the assertion structure?

The agent's assert tool, declared at src/core/agent.ts lines 134-142, requires three fields per check: description (string), passed (boolean), evidence (string). A call that omits the evidence field does not match the schema and does not go through. When the agent dispatches an assertion at agent.ts lines 893-903, the evidence string is pushed onto a per-scenario assertions array and emitted on the SSE stream. The final TestReport (types.ts:28-35) carries every assertion's evidence in JSON. In CI, that JSON is the artifact your release gate reads.

Does the Assrt CLI return a clean exit code for CI?

Yes. The CLI's final statement is process.exit(report.failedCount > 0 ? 1 : 0) at src/cli.ts line 601. Pair that with --json and you have a CI step that fails the build on any failed assertion AND writes a structured JSON report to stdout in the same call. The runner sets a single exit code for the whole run, 1 if failedCount is non-zero and 0 otherwise, and the report carries enough detail that you do not need to rerun in headed mode to know what happened.

Does this slow the suite down?

It costs the few tokens the model takes to write the evidence string. The actual selector resolution, network requests, and DOM work are the dominant cost; producing one short sentence per assertion is a rounding error. The trade is paying a small per-check overhead to avoid the much larger cost of rerunning a flaky CI build in headed mode to figure out which of yesterday's 200 silent passes was actually doing something. If the suite is small, you pay nothing meaningful. If the suite is large, the saved rerun cost dwarfs the overhead.

How does this interact with screenshots and traces?

Additively. Evidence-per-assertion does not replace screenshots, video, or the Playwright trace viewer. Those are still the right tools when something failed and you need to see the rendered pixels. What evidence-per-assertion adds is an answer to 'did the passes mean anything', which screenshots cannot give (a pass leaves no failure screenshot to look at). In a CI pipeline, the structured JSON tells you whether to bother loading the video at all. Most of the time you do not need to.

What does a failure look like, end-to-end?

Agent calls assert with passed: false and evidence: 'Heading text was "Welcome to dashboard" but I was looking for "Welcome back"'. The assertion is pushed onto the scenario's assertions array at agent.ts:897. scenarioPassed flips to false at agent.ts:901. The SSE stream emits an 'assertion' event for any live UI watching. complete_scenario writes the scenario into the TestReport. The CLI prints the report, finds failedCount > 0 in cli.ts:601, exits 1. The CI step fails. The JSON artifact written to disk has the failing assertion's description and evidence string as the first thing you read. No rerun required to know what broke.
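The failure path above can be modeled in a few lines. This is a sketch with hypothetical names (ScenarioState, recordAssertion, failedCount, exitCode, firstFailure), not Assrt's code: it mirrors the described behavior of pushing the assertion, flipping the scenario to failed when passed is false, emitting an event, and deriving the exit code. Whether Assrt's failedCount counts scenarios or assertions is not stated on this page; the sketch counts failed scenarios, which gives the same 0/1 outcome either way.

```typescript
// Sketch of the described failure path; names are illustrative, not Assrt's.

interface TestAssertion {
  description: string;
  passed: boolean;
  evidence: string;
}

interface ScenarioState {
  name: string;
  assertions: TestAssertion[];
  scenarioPassed: boolean;
}

type Emit = (event: string, payload: unknown) => void;

// Push onto the per-scenario array, fail the scenario on any failed assertion,
// and notify live watchers (a stand-in for the SSE "assertion" event).
function recordAssertion(state: ScenarioState, a: TestAssertion, emit: Emit): void {
  state.assertions.push(a);
  if (!a.passed) state.scenarioPassed = false;
  emit("assertion", a);
}

// Counts scenarios that did not pass.
function failedCount(scenarios: ScenarioState[]): number {
  return scenarios.filter((s) => !s.scenarioPassed).length;
}

// Same rule as the CLI's exit line.
function exitCode(scenarios: ScenarioState[]): number {
  return failedCount(scenarios) > 0 ? 1 : 0;
}

// The first failing assertion, i.e. the entry a reviewer reads first.
function firstFailure(scenarios: ScenarioState[]): TestAssertion | undefined {
  for (const s of scenarios) {
    const failed = s.assertions.find((a) => !a.passed);
    if (failed) return failed;
  }
  return undefined;
}
```

The design point the sketch makes visible: the evidence string travels with the boolean from the moment of recording, so the failure summary needs no extra lookup.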

Is this open source so I can verify it?

Yes. The MCP server, agent loop, browser wrapper, and CLI live at github.com/assrt-ai/assrt-mcp. The TestAssertion interface is at src/core/types.ts (line 13). The assert tool declaration is at src/core/agent.ts (line 134). The assert dispatch and evidence push is at src/core/agent.ts (line 893). The exit-code wiring is at src/cli.ts (line 601). The npm package is @m13v/assrt. Every claim on this page is checkable on the main branch.

A short closing

E2E test confidence in CI/CD is a question about the shape of the test artifact, not about the volume of tests. A small suite where every assertion carries written evidence will out-ship a large suite of silent passes. The runner's tool schema is the right place to enforce that, because it is the only place the discipline cannot erode under deadline pressure. The rest is paperwork.
