Continuous monitoring for web app tests, when the test plan is a markdown file the agent keeps editing.
Almost every guide on this topic describes the same shape: write a Playwright spec, drop it in a Docker image, run it on a five-minute cron, alert when it fails. That works. It also freezes the test definition the moment you push it. When a selector drifts, every interval fails until a human opens the editor and ships a fix. This page is about a different shape: a continuous monitoring loop designed around a markdown plan an agent can rewrite between fires, a watcher that picks the rewrite up within a second, and a five-position schedule that just keeps reading the latest plan.
Everything in this guide cites a real file. The two paths that do most of the load-bearing work are assrt-mcp/src/core/scenario-files.ts and assrt/src/app/app/test/page.tsx. Open them while you read.
Step 1
The shape every other monitoring guide ships, and where it stops working.
Read the public Playwright synthetic monitoring write-ups from Checkly, Tracetest, OneUptime, the workadventure Docker image, and the Microsoft Application Insights post. They converge on one recipe. Commit a Playwright spec to a repo. Build a container with Chromium. Trigger it from a five-minute cron, an Azure Function, or a Checkly schedule. Page someone when failedCount goes above zero.
The recipe assumes the spec file is correct and stable. When it is, the loop is fine. When it is not, every interval between the moment a selector silently drifts and the moment a human pushes a fix is wasted: the test fires, fails, pages, and tells you nothing new. You are paying compute and alert noise to confirm a known broken test is still broken. The interesting product question is whether the loop can fix the test itself.
The frozen-spec loop versus the watched-plan loop
A committed Playwright file fires every 5 minutes. The plan is whatever you last pushed. If a selector drifts at 02:14, every fire from 02:15 onward fails until a human pushes a fix. The monitoring stack pages you 12 times an hour with the same broken assertion until then.
- Plan is immutable between deploys
- Self-healing requires a human in the deploy loop
- Same alert fires every interval until a fix ships
- Drift between code and reality only closes on push
Step 2
The watcher that turns the plan from a file into a feed.
Open assrt-mcp/src/core/scenario-files.ts and look at the top of the file. Three paths are declared as constants: /tmp/assrt/scenario.md for the plan, /tmp/assrt/scenario.json for metadata, and /tmp/assrt/results/latest.json for the most recent run. Every assrt_test invocation writes the plan to the first path, runs against your URL, then writes the report to the third.
The interesting bit is what happens between writes. At line 97 the file calls fs.watch on scenario.md. Any edit, whether from the agent rewriting a step, from a human opening the file in an editor, or from another process, fires the listener. The listener does not sync immediately; it sets a 1000ms debounce timer (line 100). If another edit lands inside that window, the timer resets. After one quiet second, the handler reads the file, compares it to lastWrittenContent to skip echoing the runner's own writes, then PATCHes /api/public/scenarios/:id via the updateScenario function in scenario-store.ts.
One concrete consequence: a scenario whose ID begins with local- never sets up the watcher (line 94). Local-only plans skip sync entirely, because they have no central row to update. So you can run a private monitoring loop with no network at all, and the only thing you lose is the cross-machine consistency that the central PATCH provides.
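The watch, debounce, compare, sync loop described above is compact enough to sketch. The following is a reconstruction, not the project's source; `watchPlan`, `shouldSync`, and `syncToApi` are hypothetical names, but the two guards (the `local-` skip and the `lastWrittenContent` echo check) mirror the ones scenario-files.ts is described as having:

```typescript
import * as fs from "node:fs";

// Pure decision: should this file content be PATCHed to the central store?
// local- IDs never sync (no central row); content matching the runner's own
// last write is an echo of an internal write, not an external edit.
export function shouldSync(
  scenarioId: string,
  content: string,
  lastWrittenContent: string,
): boolean {
  if (scenarioId.startsWith("local-")) return false;
  return content !== lastWrittenContent;
}

// Watcher wiring: any edit resets a 1000ms quiet-period timer; after one
// quiet second the file is read and, if shouldSync approves, synced out.
export function watchPlan(
  planPath: string,
  scenarioId: string,
  getLastWritten: () => string,
  syncToApi: (content: string) => void,
  debounceMs = 1000,
) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  fs.watch(planPath, () => {
    if (timer) clearTimeout(timer);
    timer = setTimeout(() => {
      const content = fs.readFileSync(planPath, "utf8");
      if (shouldSync(scenarioId, content, getLastWritten())) {
        syncToApi(content); // e.g. PATCH /api/public/scenarios/:id
      }
    }, debounceMs);
  });
}
```

Keeping the decision in a pure function makes the echo-suppression behavior trivially testable without touching the filesystem.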
One scheduled fire end-to-end
Step 3
The five cadences the dashboard offers, and what each one is for.
The job creation form lives at assrt/src/app/app/test/page.tsx, and the schedule selector is hardcoded between lines 1255 and 1259. Five options, no cron-string field. The runner translates the choice into a fire interval, and the job state machine in src/components/types.ts:50 tracks the result through five states (scheduled, running, completed, failed, cancelled). Pick the cadence by what failure mode you are trying to catch.
From one shot to weekly, the five positions on the dial
1. Run once: smoke a deploy by hand. Same plan, single fire.
2. Hourly: a tight loop for a flow under active iteration.
3. Every 6 hours: the default for most production paths.
4. Daily at 9am: slow flows, long signups, weekly-cohort funnels.
5. Weekly, Monday 9am: compliance-grade or low-volume flows.
If you need a cadence that is not in the dropdown, wrap the runner in your own cron. The dashboard is one trigger; the underlying CLI npx @m13v/assrt run --url <url> --plan-file scenario.md --json is the same command the scheduled job calls. Drop it in a GitHub Actions schedule, a systemd timer, or a Cloudflare cron trigger and you get a custom interval against the same watched plan. The five-option dropdown is the convenient path; the CLI is the escape hatch.
Step 4
Read both shapes side by side.
The left panel is the canonical Playwright synthetic monitoring recipe pushed by every guide listed at the top of this page: a GitHub Actions workflow plus a committed spec file. The right panel is what a watched plan looks like in plain Markdown. Same flow under test, same output schema, two different stories about who is allowed to edit the plan and when.
Frozen workflow versus watched plan
```yaml
# .github/workflows/synthetic.yml
# What every "Playwright synthetic monitoring" guide proposes:
# a frozen .spec.ts firing on a fixed cron.
name: Synthetic monitor
on:
  schedule:
    - cron: "*/5 * * * *" # every 5 minutes, forever
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright install chromium
      - run: npx playwright test tests/login.spec.ts --reporter=line
        env:
          PROD_URL: ${{ vars.PROD_URL }}
```
```typescript
// tests/login.spec.ts — committed, immutable until a human edits it.
// If the selector drifts, every interval fails until someone pushes a fix.
import { test, expect } from "@playwright/test";

test("login still works", async ({ page }) => {
  await page.goto(process.env.PROD_URL!);
  await page.getByTestId("email-input").fill("monitor@example.com");
  await page.getByTestId("password-input").fill(process.env.MONITOR_PASS!);
  await page.getByRole("button", { name: "Sign in" }).click();
  await expect(page).toHaveURL(/.*\/dashboard/);
});
```

Step 5
The artifact every alerting hook should read.
You do not need to subscribe to a webhook to integrate with a continuous run. Every fire writes two files locally: /tmp/assrt/results/latest.json (overwritten each run) and /tmp/assrt/results/<runId>.json (immutable, keyed by runId). Both follow the TestReport schema declared in core/types.ts: a top-level passedCount and failedCount, totalDuration in ms, and an array of scenarios each carrying assertions with per-assertion evidence strings. The simplest possible alert is a tail of latest.json piped to your notifier when failedCount is nonzero. The richer path is to read the per-runId file and POST a summary to your incident channel.
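A minimal version of that alert hook, assuming only the TestReport fields named above (passedCount, failedCount, totalDuration, scenarios, assertions, evidence); the per-assertion `passed` flag, the scenario `name` field, and the `notify` callback are assumptions added for the sketch:

```typescript
import * as fs from "node:fs";

// Shape assumed from the fields cited above; the per-assertion `passed`
// flag and scenario `name` are assumptions for this sketch.
interface Assertion { passed: boolean; evidence?: string }
interface Scenario { name: string; assertions: Assertion[] }
interface TestReport {
  passedCount: number;
  failedCount: number;
  totalDuration: number; // ms
  scenarios: Scenario[];
}

// Null when everything passed; otherwise a one-message summary of failures.
export function summarizeFailures(report: TestReport): string | null {
  if (report.failedCount === 0) return null;
  const lines = report.scenarios.flatMap((s) =>
    s.assertions
      .filter((a) => !a.passed)
      .map((a) => `${s.name}: ${a.evidence ?? "no evidence"}`),
  );
  return `${report.failedCount} failed (${report.totalDuration}ms)\n${lines.join("\n")}`;
}

// The "tail latest.json" integration: read, summarize, notify on failure.
export function checkLatest(path: string, notify: (msg: string) => void) {
  const report: TestReport = JSON.parse(fs.readFileSync(path, "utf8"));
  const msg = summarizeFailures(report);
  if (msg) notify(msg);
}
```

Point `checkLatest` at /tmp/assrt/results/latest.json after each fire and plug your Slack or PagerDuty call in as `notify`.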
If you want artifacts beyond the JSON, the same run uploads screenshots, a webm of the browser session, and the full event log to /api/public/scenarios/:id/runs/:runId/artifacts via the uploadArtifacts function in scenario-store.ts (line 202). The cloud URLs are deterministic: scenario id, run id, file name. So a Slack alert can link directly to the failing run's video without any signed-URL ceremony.
Step 6
Why the plan is Markdown, and why the run is real Playwright.
The reason the plan is Markdown and not YAML or a proprietary AST is that the agent has to write it. An agent emitting structured YAML is doing extra work to please a schema; an agent emitting numbered Markdown bullets is doing what it already does well. The format is forgiving enough that a human can hand-edit the same file in any text editor, with no schema validator yelling about missing keys.
The reason the run underneath is real Playwright (Chromium, Firefox, WebKit, all the standard APIs) is that the moment the plan format becomes proprietary, you have a vendor lock-in problem. The continuous monitoring loop is a habit you build over months; if leaving the vendor means rewriting every plan in someone else's DSL, you do not have a habit, you have a dependency. With Markdown plans plus a Playwright runner, the plans port to any engine that knows the same nine verbs (open, type, click, wait, assert, and so on) and the runner output is standard Playwright trace and video.
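For concreteness, a plan in this style might look like the following. The step wording is hypothetical; only the numbered-bullet format and the verbs (open, type, click, wait, assert) come from the description above.

```markdown
# Login flow — production monitor
1. open https://example.com/login
2. type monitor@example.com into the email field
3. type the monitor password into the password field
4. click the "Sign in" button
5. wait for navigation to finish
6. assert the URL contains /dashboard
```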
Step 7
When the frozen shape is the right answer.
The watched plan is not always the right model. If you are running regulated workflows where the test definition is a control under audit, you want the plan to be immutable between human-reviewed commits, full stop. A SOX or HIPAA-flavored shop should run the frozen .spec.ts loop and treat any drift as a deployment event, not as a self-heal. The Assrt model assumes you trust the agent more than you trust the cron of last week's spec; for some teams that is the wrong tradeoff and the answer is to keep using Checkly, Tracetest, or the bare workadventure container.
The honest pitch for the watched-plan model is for fast-moving teams shipping daily, where the cost of a brittle test catching a real regression is high but the cost of a brittle test pinging for two days because the deploy queue is full is also high. There the self-heal saves real toil. Outside that, the boring shape wins.
Want this loop running against your URL by Friday?
Bring a flow you care about. We will set up the watched plan, the schedule, and the alert hook on a 30-minute call.
Frequently asked questions
What does continuous monitoring actually do for a web app, in one sentence?
It re-runs a small set of end-to-end tests against your production URL on a schedule, so a regression that breaks login, checkout, or any other critical flow gets caught by the test agent rather than by a user opening a support ticket. The tests are written once but fire forever; the value is in the time between when something breaks and when you find out.
How is this different from uptime monitoring or status-page pings?
An uptime check tells you the homepage returned a 200. It does not tell you that the login form silently swallows the password because someone shipped a typo in a React state setter, or that checkout returns 200 but the Stripe redirect never fires, or that the magic-link email path errors after the first redirect. Continuous monitoring drives the actual flow: it types into the form, clicks the button, waits for the post-login URL, and asserts what the user would assert. The cost is more setup; the payoff is catching everything between the homepage 200 and the working feature.
Why is the test plan stored as markdown instead of a TypeScript file?
The plan is what the agent reasons about. Storing it as Markdown means the agent can read it, edit it, and write it back without parsing TypeScript. The actual run still produces real Playwright execution underneath, but the plan layer above is text the AI can author. The watcher in scenario-files.ts (line 97) listens for any edit to /tmp/assrt/scenario.md, debounces for 1000ms, and PATCHes the change to /api/public/scenarios/:id so the next scheduled run picks up the new plan automatically.
What cadences does the scheduler offer, and where is that defined?
Five options, defined in assrt/src/app/app/test/page.tsx between lines 1255 and 1259: Run Once (Now), Every Hour, Every 6 Hours, Daily at 9am, and Weekly (Monday 9am). The job state machine has five states (scheduled, running, completed, failed, cancelled) declared in src/components/types.ts:50. There is no cron-string field in the UI: you pick from the dropdown and the runner translates it. If you need a custom cadence, you can wrap the CLI in your own GitHub Actions workflow or systemd timer, since the underlying runner is just `npx @m13v/assrt run --url <url> --plan-file <path>`.
What stops the watcher from echoing back its own writes in a loop?
scenario-files.ts keeps a `lastWrittenContent` variable (line 24). Every time the agent or the user writes to scenario.md through the public API, the new content is stashed in that variable. When the watcher fires, it reads the current file contents and compares them against `lastWrittenContent`; if they match, the sync function returns early. So the round trip is: external edit happens, watcher fires, content does not match, PATCH goes out. Internal write happens, content matches, PATCH is skipped. Local-only scenario IDs (those starting with `local-`) skip watching entirely (line 94), since they have nowhere to sync.
If the central API is down when a scheduled run fires, what happens?
The runner reads scenario.md from disk and from the local cache at ~/.assrt/scenarios/<id>.json, which is populated on every successful fetch by the writeLocal call in scenario-store.ts. The Playwright run still executes against your URL. The PATCH on edits and the run-result POST both fail soft; sync errors are logged to stderr but the test does not abort. The next time the API is reachable, the local cache and the central store reconverge on the next write. So a scheduled monitoring run with no internet to assrt.ai still produces a real local report, just without the cloud-side run history.
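The fail-soft contract reduces to a wrapper like this sketch. It is a hypothetical reconstruction, not the project's code; `syncFailSoft` and the logging format are invented names, but the behavior matches the description: sync errors go to stderr, the run continues.

```typescript
// Hypothetical sketch of fail-soft sync: the PATCH on edits and the
// run-result POST may throw (API down, no network), but the monitoring
// run itself must never abort because of a sync failure.
export async function syncFailSoft<T>(
  label: string,
  op: () => Promise<T>,
): Promise<T | undefined> {
  try {
    return await op();
  } catch (err) {
    // Logged to stderr, not rethrown: the local report is written either way.
    process.stderr.write(`[sync] ${label} failed: ${String(err)}\n`);
    return undefined;
  }
}
```

Wrapping both the edit PATCH and the run-result POST in a guard like this is what lets an offline scheduled fire still produce a real local report.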
Does the agent actually rewrite the plan, or is that aspirational?
The diagnose path edits the plan when an assertion is brittle. assrt_diagnose reads the failure, proposes a more specific selector or step, and writes the revised plan back to /tmp/assrt/scenario.md. The watcher picks it up, debounces 1000ms, syncs. Whether you let the agent author edits autonomously is a policy choice; in MCP-driven setups the human reviews the diff and accepts before the schedule continues. The point is that the data path supports the loop. The plan is text, the storage is a watched file, and the API has an idempotent PATCH endpoint.
What does the run output that I can monitor against?
Every run writes /tmp/assrt/results/latest.json and /tmp/assrt/results/<runId>.json (defined in scenario-files.ts:19-20). The schema is the TestReport type in core/types.ts: a top-level passedCount, failedCount, totalDuration, plus an array of scenarios each carrying a list of assertions, a summary, and per-assertion evidence. If you want to fan run results into your own observability stack, point a tail or a notify-send hook at that file. If you want richer artifacts, the same run uploads screenshots, video, and the full event log to /api/public/scenarios/:id/runs/:runId/artifacts (scenario-store.ts:222).
How is this different from Checkly, Tracetest, or the workadventure synthetic-monitoring Docker image?
Those run a frozen Playwright spec on a cron. The .spec.ts is committed code; if it breaks because a selector drifted, you open the editor, fix it, push, redeploy, and wait for the next interval. Assrt's monitoring is built around a watched markdown file the agent can edit between fires, so a brittle assertion can be rewritten by the agent, picked up by the watcher within a second, and exercised on the next scheduled run without redeploying anything. The trade is honest: a frozen spec is reproducible to the byte, while a self-healing plan can drift if you do not review the edits. Pick frozen for compliance-grade audit trails, pick the watched plan for a startup that wants the test agent to keep up.
Can I run this in CI as well as continuously, against the same plan?
Yes. The CLI command `npx @m13v/assrt run --url <url> --plan-file scenario.md --json` is the same call the scheduler uses, just driven by your CI runner instead of the dashboard's interval. The JSON it prints to stdout (with --json) matches the latest.json schema, so a CI step can fail the build on `failedCount > 0`. Same plan file, same runner, two different triggers. The continuous-monitoring schedule is best for catching production drift; the CI run is best for catching merges before they ship.
Other guides that touch the same files this page cites.
Adjacent reads on the same product
AI agent browser isolation: four layers, not one toggle, with the file paths to prove it
Profile, session, process, and per-run artifact isolation for an AI test agent, with the source paths.
Visual regression testing built in to AI automation
How a screenshot diff fits into the same scenario and result file the schedule reads from.
AI Playwright cached selector and stale-element detection
What goes wrong inside a single run when a selector drifts, and how the agent rewrites the plan.