CI & Verification

AI config CI verification, done where the agent edits the config

Your AI agent rewrites next.config.mjs, tweaks vercel.json, renames an env var. The change typechecks. CI is green. The deployment boots into a 500. This is the gap, and a behavioral gate that fires on git commit closes it.

Matthew Diakonov
9 min read
Rated 4.9 by developers running Assrt locally
Plain-text scenarios live next to next.config.mjs in the repo
Hook fires on git commit, no GitHub Actions wiring required
JSON output (TestReport schema) pipes into any existing CI

The actual failure mode

Most CI pipelines verify config in three ways: schema validation, type checking, and unit tests. All three operate on the file. None of them operate on the running app. So when an AI agent renames NEXT_PUBLIC_PRICING_TIER to PUBLIC_PRICING_TIER because it "looks cleaner," the build is happy. The homepage is happy. The pricing page is rendering an empty array.

The same pattern hides in dozens of places an AI agent will edit without thinking twice: a missing transpilePackages entry after adding a new internal package, a regex change in a rewrites() rule, a Tailwind @source path that no longer matches, an origin allowlist in images.remotePatterns. Each of those typechecks. Each of them breaks the running app.

The fix is not better config validation. The fix is to actually boot the app and look at it, in the same loop that produced the edit, before the change leaves the developer machine.

Static validation vs. behavioral verification

Catches a typo in an env var name
  Static validation: only if a Zod schema asserts it exists
  Behavioral (Assrt): yes. The page that reads the var renders blank and the scenario fails

Catches a missing transpilePackages entry
  Static validation: no. The build succeeds and the runtime throws on import
  Behavioral (Assrt): yes. The route that imports it returns 500 and the scenario fails

Catches a broken rewrite rule
  Static validation: no. The config object is shaped correctly
  Behavioral (Assrt): yes. Navigate to /old-path, expect /new-path, and the scenario fails

Catches a Tailwind @source path drift
  Static validation: no. Tailwind silently produces a smaller CSS file
  Behavioral (Assrt): yes. A visual scenario sees the unstyled component

Where verification runs
  Static validation: remote CI, after the PR opens
  Behavioral (Assrt): the local agent loop, before git push

Who authors the test
  Static validation: a human, after the regression ships
  Behavioral (Assrt): the same agent that edited the config, in the same turn

The shape of the loop

The verification gate has three inputs (the agent edit, the test file checked into the repo, the local dev server) and one output (a pass/fail JSON the agent reads back in the same turn).

Inputs and the gate

next.config.mjs, vercel.json, .env.local, and tests/*.txt feed into assrt_test, which produces the TestReport JSON and a recorded video that the agent re-reads.

The hook that wires it

The reason this loop works without a separate CI pipeline is a Claude Code PostToolUse hook installed by npx @assrt-ai/assrt setup. The hook intercepts every Bash tool call the agent makes and checks whether the command was a git commit or push. If yes, it emits a structured response that Claude Code injects into the conversation as additionalContext, instructing the agent to run a verification before continuing.

The hook is short enough to read in full. Open it yourself at ~/.claude/hooks/assrt-qa-reminder.sh after running setup, or read the source at src/cli.ts:199-208.

~/.claude/hooks/assrt-qa-reminder.sh
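The script itself is not reproduced here, but based on the behavior this article describes (grep the Bash tool call for a git commit or push, then emit additionalContext JSON for Claude Code to inject), a minimal sketch of its logic might look like this. Everything beyond the field names the article names is an assumption; the real script is the QA_REMINDER_HOOK constant in src/cli.ts:

```shell
# Sketch of the PostToolUse hook's logic, written as a function so it is
# easy to test. The real hook is installed by `npx @assrt-ai/assrt setup`.
assrt_qa_reminder() {
  # stdin: the Bash tool call as JSON, e.g. {"tool_input":{"command":"git commit -m x"}}
  # A crude grep over the raw JSON stands in for parsing tool_input.command.
  if grep -qE 'git (commit|push)'; then
    # Claude Code injects additionalContext into the conversation.
    printf '%s\n' '{"hookSpecificOutput":{"hookEventName":"PostToolUse","additionalContext":"Config may have changed. Run assrt_test against the local dev server and read the TestReport before continuing."}}'
  fi
}

# A commit triggers the reminder; an unrelated command emits nothing.
echo '{"tool_input":{"command":"git commit -m \"rename env var\""}}' | assrt_qa_reminder
echo '{"tool_input":{"command":"npm run build"}}' | assrt_qa_reminder
```

The function-plus-grep shape is for illustration only; the installed script reads the same stdin JSON and prints the same one-line JSON response.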

What the agent sees

The hook turns a one-shot commit into a feedback loop. The agent runs a commit, gets reminded to verify, runs assrt_test, reads the JSON, and either pushes or fixes the regression in the same turn. Here is the actual sequence.

claude code session
No GitHub Actions YAML required, six keys in the TestReport JSON shape, a hook script under 10 lines, zero vendor lock-in.

Wiring it up

The whole setup is four steps and runs once per machine. After that, every config edit the agent makes flows through the same behavioral gate, automatically, on every commit.

1. Install the MCP server and the hook

npx @assrt-ai/assrt setup registers the assrt MCP server globally with Claude Code, writes the QA reminder hook to ~/.claude/hooks/, and appends a QA Testing section to your global CLAUDE.md so the agent knows the contract.

2. Check in a #Case file next to the config

Drop a tests/post-commit.txt (or any path you like) into the repo. Each #Case is a plain-language scenario the agent will execute in a real browser. No Playwright code. The same agent that edits next.config.mjs can extend this file in the same turn.

3. Let the hook fire on git commit

After any git commit or git push the agent runs, the PostToolUse hook injects a reminder to run assrt_test. The agent calls the MCP tool against the local dev server, gets a structured TestReport back, and acts on it.

4. Pipe the JSON into your existing CI if you want

assrt run --json writes a TestReport to stdout. failedCount > 0 is the only condition you need to fail a build. You can run the same command in a GitHub Actions step, a Vercel build hook, or a pre-push git hook.
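As a sketch, that gate can be a few lines of shell. It assumes only what this article states: the report arrives on stdout and failedCount is a top-level key. The grep here is a jq-free stand-in for `jq -e '.failedCount == 0'`, so the gate needs nothing beyond a POSIX shell:

```shell
# Gate a build on the TestReport that `assrt run --json` writes to stdout.
# assrt_gate reads the report on stdin and succeeds only when failedCount is 0.
assrt_gate() {
  grep -qE '"failedCount"[[:space:]]*:[[:space:]]*0([^0-9]|$)'
}

# In CI you would run:  assrt run --json | assrt_gate || exit 1
# Demo with canned reports:
echo '{"failedCount":0,"passedCount":3}' | assrt_gate && echo "gate: pass"
echo '{"failedCount":2,"passedCount":1}' | assrt_gate || echo "gate: fail the build"
```

The same function works unchanged in a GitHub Actions step, a Vercel build hook, or a pre-push git hook.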

The contract: TestReport JSON

Everything the verification gate produces flows through one JSON shape. It is six keys at the top level. There is no proprietary YAML, no DSL, no SaaS payload. Read the type and you have read the API.

src/core/types.ts

The full file is 106 lines and lives in the open-source assrt-mcp repo. There is no second schema you need to learn to consume the results. jq '.failedCount' is a complete CI gate.
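For concreteness, here is a hand-written illustration of the shape, using only the keys this article names. The value types, nesting, and every concrete value are assumptions; treat src/core/types.ts as the source of truth:

```json
{
  "url": "http://localhost:3000",
  "scenarios": [
    {
      "name": "Pricing page renders tiers",
      "passed": true,
      "steps": ["Open /pricing", "Wait for tier cards"],
      "assertions": ["At least one tier card is visible"],
      "summary": "2 steps, 1 assertion, all passed",
      "duration": 4120
    }
  ],
  "totalDuration": 4120,
  "passedCount": 1,
  "failedCount": 0,
  "generatedAt": "2025-01-01T00:00:00.000Z"
}
```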

What a checked-in #Case file looks like

The scenarios that guard your config live in the repo as plain text. Three scenarios is usually enough to catch the failure modes that schema validation misses. Scope them to the surface area the config actually controls.

tests/post-commit.txt
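The original file is not reproduced here. As an illustration only, scenarios in this style might read as follows; everything beyond the #Case marker and the plain-language register is invented, and the routes and package name are borrowed from examples elsewhere in this article:

```
#Case Pricing page reads the renamed env var
Open /pricing and wait for the page to finish loading.
Expect at least one pricing tier card to be visible.

#Case Rewrite from the old path still lands on the new one
Navigate to /old-path.
Expect the browser to end up on /new-path with a 200 response.

#Case Internal package import survives the config edit
Open the route that renders @seo/components.
Expect no 500 error and no blank component area.
```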

Try the setup once, keep the loop forever

One npx command installs the MCP server, the PostToolUse hook, and the global CLAUDE.md instructions. Every config edit your agent makes from then on flows through a real-browser gate before it leaves your machine.

Read the install steps

Frequently asked questions

What does AI config CI verification actually mean?

It is the part of CI that confirms a config change still produces a working application. Schema validation only proves the config parses. Type checks only prove it satisfies the type system. Verification means the app actually runs, the routes actually respond, and the user-facing behavior the config controls (rewrites, redirects, headers, transpilePackages, env-driven features) is still correct in a real browser.

Why is verification harder when an AI agent writes the config?

AI agents are good at producing config that satisfies the type signature and looks like the documentation. They are bad at noticing that withSeoContent must wrap the export, that transpilePackages: ["@seo/components"] is required for the imports they just added, or that the env var they renamed is read in three other files. The compiler does not catch any of that. The browser does.

Where does Assrt fit relative to GitHub Actions or my existing CI?

Assrt runs inside the agent loop on the developer machine, before the push. Your remote CI still runs unit tests, type checks, and lint on the resulting commit. The point of the local behavioral gate is to fail the change early, in the same conversation that produced it, so the agent fixes it now instead of opening a PR you have to revert.

What does the PostToolUse hook actually do?

After npx @assrt-ai/assrt setup, a script lands at ~/.claude/hooks/assrt-qa-reminder.sh. It reads the Bash tool input as JSON, greps tool_input.command for the regex git (commit|push), and emits a JSON object with hookSpecificOutput.additionalContext that instructs the agent to run assrt_test against the local dev server. The full body is the QA_REMINDER_HOOK constant in src/cli.ts:199-208 of the assrt-mcp source.

What JSON does Assrt return that I can pipe into a CI gate?

The TestReport schema in src/core/types.ts:28-35 is the contract: url, scenarios[], totalDuration, passedCount, failedCount, generatedAt. Each scenario has name, passed, steps[], assertions[], summary, duration. failedCount > 0 is the only condition you need to fail a build. assrt run --json writes this to stdout so you can pipe it directly to jq, a script, or a CI step.

Why use plain-language #Case scenarios instead of writing Playwright code directly?

Two reasons. First, the agent that wrote the config can also write the scenarios in the same turn. There is no second translation layer. Second, when the config changes, the scenario rarely changes; you described the user-facing behavior, not the selectors. The Playwright execution is real (Assrt wraps @playwright/mcp under the hood), so you keep deterministic browser semantics with a much shorter authoring loop.

Is this open source and self-hosted?

Yes. The MCP server and CLI are open source, run locally, and require no cloud account. The artifact uploader is opt-in. The keychain integration reads your existing Claude Code credentials so you do not need a separate API key for the runner. Tests stay in your repo as plain text files; there is no proprietary YAML and no vendor lock-in.