CI & Verification
AI config CI verification, run where the agent edited the config
Your AI agent rewrites next.config.mjs, tweaks vercel.json, renames an env var. The change typechecks. CI is green. The deployment boots into a 500. This is the gap, and a behavioral gate that fires on git commit closes it.
The actual failure mode
Most CI pipelines verify config in three ways: schema validation, type checking, and unit tests. All three operate on the file. None of them operate on the running app. So when an AI agent renames NEXT_PUBLIC_PRICING_TIER to PUBLIC_PRICING_TIER because it "looks cleaner," the build is happy. The homepage is happy. The pricing page is rendering an empty array.
The same pattern hides in dozens of places an AI agent will edit without thinking twice: a missing transpilePackages entry after adding a new internal package, a regex change in a rewrites() rule, a Tailwind @source path that no longer matches, an origin allowlist in images.remotePatterns that no longer covers the image host. Each of those typechecks. Each breaks the running app.
The fix is not better config validation. The fix is to actually boot the app and look at it, in the same loop that produced the edit, before the change leaves the developer machine.
Static validation vs. behavioral verification
| Feature | Static validation | Behavioral (Assrt) |
|---|---|---|
| Catches typo in env var name | Only if a Zod schema asserts it exists | Yes. The page that reads the var renders blank, scenario fails |
| Catches a missing transpilePackages entry | No. Build succeeds, runtime throws on import | Yes. The route that imports it returns 500, scenario fails |
| Catches a broken rewrite rule | No. Config object is shaped correctly | Yes. Navigate /old-path, expect /new-path, scenario fails |
| Catches a Tailwind @source path drift | No. Tailwind silently produces a smaller CSS file | Yes. Visual scenario sees unstyled component |
| Where verification runs | Remote CI, after PR open | Local agent loop, before git push |
| Author of the test | Human, after the regression ships | Same agent that edited the config, in the same turn |
The shape of the loop
The verification gate has three inputs (the agent edit, the test file checked into the repo, the local dev server) and one output (a pass/fail JSON the agent reads back in the same turn).
Inputs and the gate
The hook that wires it
The reason this loop works without a separate CI pipeline is a Claude Code PostToolUse hook installed by npx @assrt-ai/assrt setup. The hook intercepts every Bash tool call the agent makes and checks whether the command was a git commit or push. If yes, it emits a structured response that Claude Code injects into the conversation as additionalContext, instructing the agent to run a verification before continuing.
The hook is short enough to read in full. Open it yourself at ~/.claude/hooks/assrt-qa-reminder.sh after running setup, or read the source at src/cli.ts:199-208.
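The mechanics are easy to sketch. The function below is an illustrative reconstruction, not the installed script: it greps the raw payload rather than parsing tool_input.command out of the JSON (a simplification), and the reminder wording is invented. The real logic is the QA_REMINDER_HOOK constant in the assrt-mcp source.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the hook's logic -- NOT the installed script.
# Read the real one at ~/.claude/hooks/assrt-qa-reminder.sh after setup.

qa_reminder() {
  # $1: the PostToolUse payload Claude Code pipes to the hook
  if printf '%s' "$1" | grep -qE 'git (commit|push)'; then
    # Claude Code injects additionalContext into the conversation,
    # which is what turns the commit into a verification step.
    printf '%s\n' '{"hookSpecificOutput":{"hookEventName":"PostToolUse","additionalContext":"A commit or push just ran. Call assrt_test against the local dev server and act on the TestReport before continuing."}}'
  fi
}

# A commit triggers the reminder; unrelated commands stay silent.
qa_reminder '{"tool_input":{"command":"git commit -m \"rename env var\""}}'
qa_reminder '{"tool_input":{"command":"npm run lint"}}'
```

The important property is that the hook only nudges; the agent still decides to run the test, which keeps the loop inside the conversation instead of in a separate process.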
What the agent sees
The hook turns a one-shot commit into a feedback loop: the agent runs a commit, gets reminded to verify, runs assrt_test, reads the JSON, and either pushes or fixes the regression in the same turn.
Wiring it up
The whole setup is four steps and runs once per machine. After that, every config edit the agent makes flows through the same behavioral gate, automatically, on every commit.
Install the MCP server and the hook
npx @assrt-ai/assrt setup registers the assrt MCP server globally with Claude Code, writes the QA reminder hook to ~/.claude/hooks/, and appends a QA Testing section to your global CLAUDE.md so the agent knows the contract.
Check in a #Case file next to the config
Drop a tests/post-commit.txt (or any path you like) into the repo. Each #Case is a plain-language scenario the agent will execute in a real browser. No Playwright code. The same agent that edits next.config.mjs can extend this file in the same turn.
Let the hook fire on git commit
After any git commit or git push the agent runs, the PostToolUse hook injects a reminder to run assrt_test. The agent calls the MCP tool against the local dev server, gets a structured TestReport back, and acts on it.
Pipe the JSON into your existing CI if you want
assrt run --json writes a TestReport to stdout. failedCount > 0 is the only condition you need to fail a build. You can run the same command in a GitHub Actions step, a Vercel build hook, or a pre-push git hook.
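In a CI step that gate can be a few lines of shell. The function below is a sketch, assuming jq is available on the runner; it relies only on the failedCount, scenarios[].passed, and scenarios[].name fields from the TestReport.

```shell
#!/usr/bin/env bash
# Sketch of a CI gate over `assrt run --json` output. Assumes jq is
# installed; uses only fields named in the TestReport contract.

gate() {
  # $1: path to a TestReport JSON file; exit status is the build verdict.
  # jq -e exits non-zero when the expression evaluates to false.
  if jq -e '.failedCount == 0' "$1" > /dev/null; then
    echo "behavioral gate: pass"
  else
    echo "behavioral gate: FAIL"
    # Name the failing scenarios so the CI log is actionable
    jq -r '.scenarios[] | select(.passed | not) | "  - " + .name' "$1"
    return 1
  fi
}

# Typical usage in a GitHub Actions step or pre-push hook:
#   assrt run --json > report.json
#   gate report.json
```

The same function works unchanged in a Vercel build hook or a local pre-push hook, because the only interface is a JSON file and an exit code.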
The contract: TestReport JSON
Everything the verification gate produces flows through one JSON shape. It is six keys at the top level. There is no proprietary YAML, no DSL, no SaaS payload. Read the type and you have read the API.
The full file is 106 lines and lives in the open-source assrt-mcp repo. There is no second schema you need to learn to consume the results: jq -e '.failedCount == 0' is a complete CI gate.
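Concretely, a report in that shape looks like the sample below. This one is hand-written for illustration: the values and scenario wording are invented, and the element types inside steps[] and assertions[] are an assumption here; only the key names come from the published type.

```json
{
  "url": "http://localhost:3000",
  "scenarios": [
    {
      "name": "Pricing page renders tiers",
      "passed": false,
      "steps": ["Open /pricing"],
      "assertions": ["At least three pricing tiers are visible"],
      "summary": "Pricing grid rendered zero tiers after the env var rename.",
      "duration": 4120
    }
  ],
  "totalDuration": 4120,
  "passedCount": 0,
  "failedCount": 1,
  "generatedAt": "2025-01-01T12:00:00.000Z"
}
```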
What a checked-in #Case file looks like
The scenarios that guard your config live in the repo as plain text. Three scenarios is usually enough to catch the failure modes that schema validation misses. Scope them to the surface area the config actually controls.
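A representative file, written here for illustration: the #Case marker is the syntax Assrt reads, the scenario bodies are free-form plain language, and these particular scenarios are invented to match the failure modes discussed above.

```text
#Case Pricing page shows tiers
Open /pricing. At least three pricing tiers render with non-empty
names and prices. No tier shows an empty or undefined value.

#Case Legacy docs path rewrites
Navigate to /docs/old-getting-started. The new getting-started
content loads; no 404 and no redirect loop.

#Case Internal package renders styled
Open /. The hero section renders with its intended styles, not
unstyled fallback markup, and no console error mentions a failed import.
```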
Try the setup once, keep the loop forever
One npx command installs the MCP server, the PostToolUse hook, and the global CLAUDE.md instructions. Every config edit your agent makes from then on flows through a real-browser gate before it leaves your machine.
Read the install steps →
Frequently asked questions
What does AI config CI verification actually mean?
It is the part of CI that confirms a config change still produces a working application. Schema validation only proves the config parses. Type checks only prove it satisfies the type system. Verification means the app actually runs, the routes actually respond, and the user-facing behavior the config controls (rewrites, redirects, headers, transpilePackages, env-driven features) is still correct in a real browser.
Why is verification harder when an AI agent writes the config?
AI agents are good at producing config that satisfies the type signature and looks like the documentation. They are bad at noticing that withSeoContent must wrap the export, that transpilePackages: ["@seo/components"] is required for the imports they just added, or that the env var they renamed is read in three other files. The compiler does not catch any of that. The browser does.
Where does Assrt fit relative to GitHub Actions or my existing CI?
Assrt runs inside the agent loop on the developer machine, before the push. Your remote CI still runs unit tests, type checks, and lint on the resulting commit. The point of the local behavioral gate is to fail the change early, in the same conversation that produced it, so the agent fixes it now instead of opening a PR you have to revert.
What does the PostToolUse hook actually do?
After npx @assrt-ai/assrt setup, a script lands at ~/.claude/hooks/assrt-qa-reminder.sh. It reads the Bash tool input as JSON, greps tool_input.command for the regex git (commit|push), and emits a JSON object with hookSpecificOutput.additionalContext that instructs the agent to run assrt_test against the local dev server. The full body is the QA_REMINDER_HOOK constant in src/cli.ts:199-208 of the assrt-mcp source.
What JSON does Assrt return that I can pipe into a CI gate?
The TestReport schema in src/core/types.ts:28-35 is the contract: url, scenarios[], totalDuration, passedCount, failedCount, generatedAt. Each scenario has name, passed, steps[], assertions[], summary, duration. failedCount > 0 is the only condition you need to fail a build. assrt run --json writes this to stdout so you can pipe it directly to jq, a script, or a CI step.
Why use plain-language #Case scenarios instead of writing Playwright code directly?
Two reasons. First, the agent that edited the config can write the scenarios in the same turn; there is no second translation layer. Second, when the config changes, the scenario rarely changes, because it describes the user-facing behavior, not the selectors. The Playwright execution is real (Assrt wraps @playwright/mcp under the hood), so you keep deterministic browser semantics with a much shorter authoring loop.
Is this open source and self-hosted?
Yes. The MCP server and CLI are open source, run locally, and require no cloud account. The artifact uploader is opt-in. The keychain integration reads your existing Claude Code credentials so you do not need a separate API key for the runner. Tests stay in your repo as plain text files; there is no proprietary YAML and no vendor lock-in.
Open source. Self-hosted. No account required.