Playwright + Vitest testing
This document is the long-form reference for the web suite. The short version lives in AGENTS.md under “Web Dashboard Playwright Tests”. Read the short version first.
The three suites
| Suite | Config | Where | Speed | When to use |
|---|---|---|---|---|
| Mocked Playwright | web/playwright.config.ts | web/tests/*.spec.ts | Fast | UI logic that does not depend on real backend state |
| Live Playwright | web/playwright.live.config.ts | web/tests/live/*.spec.ts | Slower (real cargo binary + tmux) | Backend, persistence, auth, sessions, cockpit, read-only |
| Vitest + RTL + MSW | web/vite.config.ts (test block) | web/src/**/__tests__/, web/src/**/*.test.{ts,tsx} | Very fast | Request-payload permutations, local UI state |
Run them:
cd web
npx playwright test --config=playwright.config.ts # mocked
npx playwright test --config=playwright.live.config.ts # live
npm run test:unit # vitest
npm run test:unit -- --coverage # vitest with v8 coverage
Picking the right tool
Decision tree:
- Does the test need real backend persistence, auth, tmux, git, or cockpit? → live Playwright.
- Does the test need a browser-specific behavior (focus, keyboard, drag-drop, modal escape, touch event) with no real backend? → mocked Playwright.
- Otherwise → Vitest + React Testing Library, with MSW if you need to assert request payloads.
Heuristics:
- “Every settings control emits the right JSON keys” → Vitest, not Playwright. Looping 30 controls through a live server is slow without adding signal.
- “Theme persists across page reload” → live Playwright. The point of the test is the backend round-trip.
- “The wizard’s review step lets me edit the title inline and Escape cancels” → mocked Playwright. No backend needed; browser-specific Escape handling matters.
- “POST /api/settings with body X returns 200, the value persists” → both: Vitest contract for the payload shape, live Playwright for one representative round-trip.
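The decision tree above can be read as a small ordered function. This is only an illustrative sketch: `TestNeeds` and `pickSuite` are hypothetical names that do not exist in the repo, and the returned strings reuse the `kind` values from the coverage matrix.

```typescript
// Suite names reuse the "kind" values from web/tests/coverage-matrix.json.
type Suite = "live-playwright" | "mocked-playwright" | "vitest";

// Hypothetical description of what a test needs; not part of the repo.
interface TestNeeds {
  realBackend: boolean; // persistence, auth, tmux, git, or cockpit
  browserBehavior: boolean; // focus, keyboard, drag-drop, modal escape, touch
}

// The order of the checks matters: a real-backend need wins over everything else.
function pickSuite(needs: TestNeeds): Suite {
  if (needs.realBackend) return "live-playwright";
  if (needs.browserBehavior) return "mocked-playwright";
  return "vitest";
}
```

A case such as the settings POST heuristic still calls for two tests (a Vitest contract plus one live round-trip), which a single return value does not capture.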
Live harness (web/tests/helpers/aoeServe.ts)
spawnAoeServe() boots a real aoe serve subprocess against a per-test isolated filesystem root. Five fixtures wrap it in liveTest.ts; three of them appear here:
import { test, expect, seedAuth } from "../helpers/liveTest";
test("dashboard loads", async ({ serve, page }) => {
  await page.goto(serve.baseUrl);
  await expect(page.getByRole("heading", { name: "Sessions" })).toBeVisible();
});
test("login flow", async ({ servePassphrase, page }) => {
  await page.goto(servePassphrase.baseUrl);
  await page.locator("input#passphrase").fill(servePassphrase.passphrase!);
  await page.locator("button[type=submit]").click();
});
test("read-only blocks new sessions", async ({ serveReadOnly }) => {
  const res = await fetch(`${serveReadOnly.baseUrl}/api/sessions`, {
    method: "POST",
    body: "...",
  });
  expect(res.status).toBe(403);
});
Fixtures:
- serve: aoe serve --no-auth. Default for backend round-trip flows.
- servePassphrase: aoe serve --passphrase aoe-e2e-fixed-passphrase. The harness mints a session cookie via POST /api/login and exposes it as handle.sessionCookie. Specs that drive auth from the browser side use seedAuth(page, handle) to inject cookie + binding secret before the first navigation.
- serveToken: aoe serve --auth=token. The harness reads the daemon-written serve.token from the isolated app dir and exposes the value as handle.authToken plus its on-disk path as handle.tokenFile. Rotation-aware specs call spawnAoeServe directly with tokenLifetimeSecs / tokenGraceSecs overrides; both env vars are debug-build only (AOE_TEST_TOKEN_LIFETIME_SECS, AOE_TEST_TOKEN_GRACE_SECS) and ignored in release.
- serveReadOnly: aoe serve --no-auth --read-only.
- serveCockpit: like serve but the fake-ACP agent (see below) is on $PATH as claude, claude-agent-acp, and aoe-agent, and PATCH /api/cockpit/master is called after startup. The claude-agent-acp name matters because the cockpit supervisor resolves the claude tool key through AgentRegistry to command claude-agent-acp, not claude; without that shim the supervisor would fall through to the system-installed adapter and fail with “Authentication required” on the first prompt.
Isolation per test:
- Fresh mkdtemp for HOME, with XDG_CONFIG_HOME, TMPDIR, TMUX_TMPDIR, and bin/ as subdirs (all 0700).
- Port: 5200 + workerIndex*100 + parallelIndex + attempt*7. Five retries on bind failure.
- Fake claude shim in home/bin/claude (exec tail -f /dev/null). Cockpit fixture overrides with the fake-ACP shim, installed under claude, claude-agent-acp, and aoe-agent.
- stop() does SIGTERM with a 2s wait, SIGKILL fallback, then rm -rf home. Never calls tmux kill-server (would kill the developer’s tmux).
- restart() kills the running proc (SIGTERM then SIGKILL after 2s) and respawns with the same args on the same port. Used by connectivity-recovery specs (disconnect-banner.spec.ts) that need to observe setServerDown(true) on SIGTERM and setServerDown(false) after the server comes back. Token mode re-reads the freshly written serve.token so handle.authToken tracks the second boot. Does NOT re-run passphrase prelogin or cockpit master-enable across the restart; specs that need those should call spawnAoeServe again.
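As a rough illustration of two of the isolation rules, the sketch below computes the port for a given attempt and performs the SIGTERM-then-SIGKILL shutdown. The names `candidatePort`, `Killable`, and `stopGracefully` are hypothetical and not taken from aoeServe.ts. The real stop() also removes the temp home, which is left out here.

```typescript
// Port for a given worker, parallel slot, and bind attempt (formula from the list above).
function candidatePort(workerIndex: number, parallelIndex: number, attempt: number): number {
  return 5200 + workerIndex * 100 + parallelIndex + attempt * 7;
}

// Minimal structural view of a child process. Node's ChildProcess is
// expected to satisfy it, and a fake can stand in for unit tests.
interface Killable {
  exitCode: number | null;
  signalCode: string | null;
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
  once(event: "exit", listener: () => void): unknown;
}

// Ask politely with SIGTERM, wait graceMs, then force SIGKILL if still alive.
async function stopGracefully(proc: Killable, graceMs = 2000): Promise<void> {
  // Nothing to do if the process already exited or was killed by a signal.
  if (proc.exitCode !== null || proc.signalCode !== null) return;

  const exited = new Promise<void>((resolve) => proc.once("exit", () => resolve()));
  proc.kill("SIGTERM");

  let timer: ReturnType<typeof setTimeout> | undefined;
  const graceElapsed = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), graceMs);
  });

  const outcome = await Promise.race([exited, graceElapsed]);
  clearTimeout(timer);

  if (outcome === "timeout") {
    proc.kill("SIGKILL");
    await exited;
  }
}
```

With this formula, worker 1 in parallel slot 2 tries port 5302 first and 5309 on its first retry, so neighbouring workers stay in separate blocks of 100.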
Binary resolution: AOE_E2E_BINARY env wins; otherwise <repo>/target/release/aoe. liveGlobalSetup.ts runs once before any worker and calls cargo build --features serve --release if the binary is missing.
Fake ACP agent (web/tests/helpers/fakeAcpAgent.mjs)
Cockpit specs need a deterministic ACP agent because the real claude subprocess depends on Anthropic credentials and emits non-deterministic output. The fake speaks the minimal slice of the Agent Client Protocol over newline-delimited JSON-RPC 2.0:
- initialize returns protocolVersion 1 + agentCapabilities.
- session/new and session/load return a deterministic sessionId.
- session/prompt consumes one entry from a script file (path supplied via FAKE_ACP_SCRIPT env), emits its scripted session/update notifications, then responds with stopReason. Default script (used when env is absent) emits one agent_message_chunk then stops.
- session/setMode responds and emits current_mode_changed.
- session/cancel responds and emits stopped { stopReason: "cancelled" }.
- Other methods return -32601 Method not found.
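To make the request/response flow concrete, here is a minimal sketch of a dispatcher for part of the method list above. It is not the code in fakeAcpAgent.mjs. `handleRequest`, `frame`, and the session id are hypothetical. The default turn's text and stop reason, the fallback to it once the script runs out, and the `{ sessionId, update }` wrapper around each notification are all assumptions. session/setMode and session/cancel are omitted, so in this sketch they hit the unknown-method branch, unlike the real fake.

```typescript
type Json = Record<string, unknown>;

interface RpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: unknown;
}

interface Turn {
  updates: Json[];
  stopReason: string;
}

// Assumed default turn: one agent_message_chunk, then stop.
const DEFAULT_TURN: Turn = {
  updates: [
    { sessionUpdate: "agent_message_chunk", content: { type: "text", text: "Hello from fake ACP agent" } },
  ],
  stopReason: "end_turn",
};

// Returns every message the agent would write for one request, in order.
function handleRequest(req: RpcRequest, remainingTurns: Turn[], sessionId = "fake-session-1"): Json[] {
  const respond = (result: Json): Json => ({ jsonrpc: "2.0", id: req.id, result });
  switch (req.method) {
    case "initialize":
      return [respond({ protocolVersion: 1, agentCapabilities: {} })];
    case "session/new":
    case "session/load":
      return [respond({ sessionId })];
    case "session/prompt": {
      // Consume one scripted turn; fall back to the default (assumption).
      const turn = remainingTurns.shift() ?? DEFAULT_TURN;
      const notifications = turn.updates.map((update): Json => ({
        jsonrpc: "2.0",
        method: "session/update",
        params: { sessionId, update },
      }));
      return [...notifications, respond({ stopReason: turn.stopReason })];
    }
    default:
      return [{ jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } }];
  }
}

// Newline-delimited framing: one JSON document per line.
function frame(messages: Json[]): string {
  return messages.map((m) => JSON.stringify(m) + "\n").join("");
}
```

Keeping the dispatcher a pure function of the request and the remaining turns is what makes the agent deterministic: the same script always yields the same byte stream.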
Script file shape:
{
  "turns": [
    {
      "updates": [
        { "sessionUpdate": "agent_message_chunk", "content": { "type": "text", "text": "..." } },
        { "sessionUpdate": "permission_request", "nonce": "fake-nonce", "toolCall": { "id": "...", "title": "...", "kind": "edit" } }
      ],
      "stopReason": "end_turn"
    }
  ]
}
Specs that need a custom script call spawnAoeServe({ cockpit: true, fakeAcpScript: "/tmp/script.json", ... }) directly instead of using the serveCockpit fixture.
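One way to produce such a script without hard-coding a path is to write it into a fresh temp directory. The sketch below is an assumption about how a spec might do that, not the repo's canonical setup: `writeFakeAcpScript`, `readFakeAcpScript`, and the type names are hypothetical, and only the `fakeAcpScript` option name comes from the text above.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Mirrors the script file shape shown above.
interface ScriptedTurn {
  updates: Array<Record<string, unknown>>;
  stopReason: string;
}

interface FakeAcpScript {
  turns: ScriptedTurn[];
}

// Writes the script into a unique temp directory and returns the file path.
function writeFakeAcpScript(script: FakeAcpScript): string {
  const dir = mkdtempSync(join(tmpdir(), "fake-acp-script-"));
  const file = join(dir, "script.json");
  writeFileSync(file, JSON.stringify(script, null, 2), "utf8");
  return file;
}

// Reads a script back, e.g. to sanity-check it before spawning the server.
function readFakeAcpScript(file: string): FakeAcpScript {
  return JSON.parse(readFileSync(file, "utf8")) as FakeAcpScript;
}

const examplePath = writeFakeAcpScript({
  turns: [
    {
      updates: [{ sessionUpdate: "agent_message_chunk", content: { type: "text", text: "scripted reply" } }],
      stopReason: "end_turn",
    },
  ],
});
// examplePath would then be passed as fakeAcpScript to spawnAoeServe({ cockpit: true, ... }).
```

A unique directory per call keeps parallel workers from overwriting each other's scripts.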
Cockpit user-story specs
web/tests/live/cockpit-stories/ holds UI-driven cockpit specs that
drive the React surface end-to-end (clicks, keystrokes, navigation) and
assert on rendered DOM. They complement the REST-contract specs at
web/tests/live/cockpit-*.spec.ts, which assert against
/api/sessions/:id/cockpit/replay. The story specs catch reducer-to-render
plumbing breakage that the REST tracers cannot see.
Pattern:
import { test as base, expect } from "@playwright/test";
import { spawnAoeServe, listSessions, seedSessionViaAoeAdd } from "../../helpers/aoeServe";
import { enableCockpitAndWait, waitForCockpitView } from "../../helpers/cockpit";
base("send message via Enter renders agent chunk", async ({ page }, testInfo) => {
  // Boot an isolated server in cockpit mode with one pre-seeded session.
  const serve = await spawnAoeServe({
    authMode: "none",
    cockpit: true,
    workerIndex: testInfo.workerIndex,
    parallelIndex: testInfo.parallelIndex,
    seedFn: seedSessionViaAoeAdd({ title: "story" }),
  });
  try {
    // Find the seeded session by title, not by position.
    const sessions = await listSessions(serve.baseUrl);
    const seeded = sessions.find((s) => s.title === "story");
    if (!seeded) throw new Error("seeded session 'story' missing");
    const sessionId = seeded.id;
    // Wait for both the supervisor and the React view before typing.
    await enableCockpitAndWait(serve.baseUrl, sessionId);
    await page.goto(`${serve.baseUrl}/session/${sessionId}`);
    await waitForCockpitView(page);
    const composer = page.getByRole("textbox", { name: /Send a message/i });
    await composer.fill("hello");
    await composer.press("Enter");
    await expect(page.getByText(/Hello from fake ACP agent/)).toBeVisible();
  } finally {
    // Always tear the server down, even when an assertion fails.
    await serve.stop();
  }
});
enableCockpitAndWait posts to /cockpit/enable, asserts the response
was 2xx (so a 4xx/5xx surfaces immediately rather than as a noisy
readiness timeout), and then waits for the supervisor handshake.
waitForCockpitView waits for the React tree to mount the composer.
Together they ensure both sides are ready before any click or keystroke.
Look up the seeded session by title rather than sessions[0] so the
spec stays deterministic if seeding adds more rows later.
Custom per-spec scripts go through a temp file (see
cockpit-stories/approval-allow.spec.ts or cockpit-approval.spec.ts
for the canonical setup); the serveCockpit fixture is for stories
happy with the default chunk-then-stop script.
Coverage matrix
web/tests/coverage-matrix.json is the source of truth for “what does each spec cover”. Every entry has:
{
  "id": "auth.passphrase-login",
  "kind": "live-playwright", // live-playwright | mocked-playwright | vitest | deferred | out-of-scope
  "risk": "high", // high | medium | low | n/a
  "specs": ["tests/live/auth-login-passphrase.spec.ts"],
  "components": ["web/src/components/LoginPage.tsx"]
}
deferred entries also have issue: "<URL>"; out-of-scope entries have reason: "<string>".
web/tests/coverage-matrix.exempt.json lists component files intentionally not assigned to a surface (small presentational primitives covered transitively).
web/tests/validate-coverage-matrix.mjs runs in CI on every PR. It fails if:
- A referenced spec file is missing.
- A deferred entry has no issue URL.
- An out-of-scope entry has no reason.
- A .tsx file under web/src/components/** appears in neither the matrix nor the exempt list.
- An exempt entry has no reason.
Add a new surface to the matrix at the same time you add the spec. Add a new component to either the matrix or the exempt list at the same time you create the file. CI catches you on the same PR otherwise.
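The failure conditions can be sketched as one pure function over already-loaded data. This is an illustration of the rules, not the contents of validate-coverage-matrix.mjs: `validateMatrix` is a hypothetical name, the `path` field on exempt entries is an assumed shape, and file discovery is left to the caller, who passes in the set of existing spec paths and the list of component files.

```typescript
type Kind = "live-playwright" | "mocked-playwright" | "vitest" | "deferred" | "out-of-scope";

interface MatrixEntry {
  id: string;
  kind: Kind;
  specs?: string[];
  components?: string[];
  issue?: string; // required for deferred entries
  reason?: string; // required for out-of-scope entries
}

// Assumed shape of an exempt-list entry; the real field names may differ.
interface ExemptEntry {
  path: string;
  reason?: string;
}

function validateMatrix(
  entries: MatrixEntry[],
  exempt: ExemptEntry[],
  existingSpecs: Set<string>,
  componentFiles: string[],
): string[] {
  const errors: string[] = [];
  const covered = new Set<string>();

  for (const entry of entries) {
    for (const spec of entry.specs ?? []) {
      if (!existingSpecs.has(spec)) errors.push(`${entry.id}: spec file missing: ${spec}`);
    }
    if (entry.kind === "deferred" && !entry.issue) errors.push(`${entry.id}: deferred without issue URL`);
    if (entry.kind === "out-of-scope" && !entry.reason) errors.push(`${entry.id}: out-of-scope without reason`);
    for (const component of entry.components ?? []) covered.add(component);
  }

  for (const item of exempt) {
    if (!item.reason) errors.push(`${item.path}: exempt without reason`);
    covered.add(item.path);
  }

  // Every .tsx component must be claimed by a surface or explicitly exempted.
  for (const file of componentFiles) {
    if (file.endsWith(".tsx") && !covered.has(file)) {
      errors.push(`${file}: in neither the matrix nor the exempt list`);
    }
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a CI run report every violation in one pass.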
Coverage reports
Vitest writes coverage/vitest/ via @vitest/coverage-v8 (set in web/vite.config.ts’s test.coverage block).
Playwright collects window.__coverage__ after each test when AOE_COVERAGE=1 is set. The instrumentation is added by vite-plugin-istanbul, conditionally registered in web/vite.config.ts. build.rs honors the same env so the embedded web bundle in the aoe binary carries instrumentation when requested.
npm run coverage:merge runs web/scripts/merge-coverage.mjs, which feeds both inputs into monocart-coverage-reports and emits:
- web/coverage/merged/lcov.info
- web/coverage/merged/coverage-summary.json and coverage-final.json
- web/coverage/merged/index.html
The CI coverage job:
- Builds aoe with AOE_COVERAGE=1 cargo build --features serve --release.
- Runs Vitest with --coverage.
- Runs mocked + live Playwright with AOE_COVERAGE=1.
- Merges via the merge script.
- Posts a PR comment via davelosert/vitest-coverage-report-action with per-file deltas against the latest main-branch baseline artifact.
- Optionally uploads to codecov.io if CODECOV_TOKEN is set.
Report-only in this PR. Phase-2 threshold floor and phase-3 ratchet upward are tracked in issue #1225.
Gotchas
- --no-auth, --passphrase, and --auth=token are the supported auth modes. Token-mode specs need a debug-build aoe because AOE_TEST_TOKEN_LIFETIME_SECS and AOE_TEST_TOKEN_GRACE_SECS are gated behind cfg!(debug_assertions); release builds keep the production 24h/4h lifetimes and 300s grace.
- The fake-ACP agent does not delegate FS or terminal calls back to the supervisor. If a scripted turn emits tool_call_started for a tool that would normally call fs/write, the supervisor will not actually write anything (and the test should not assume it does).
- Cockpit replay uses the in-memory broadcast channel plus the SQLite event store. If a test asserts a specific event in the replay, give the supervisor up to a few hundred ms to flush (the included tracer specs poll for up to 6 seconds).
- Synthetic touchmove events for mobile specs fire back-to-back with Δt≈1ms. Cap velocity and per-frame emit counts in production code, or a real device will look sane while the e2e produces runaway momentum (and vice versa).
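For the touchmove gotcha, a velocity cap might look like the sketch below. The constants and the name `clampedVelocity` are hypothetical and not taken from the production code, and only the velocity half of the advice is shown (the per-frame emit cap is left out). Flooring the time delta is what keeps a Δt≈1ms synthetic event from producing a huge velocity.

```typescript
// Hypothetical tuning constants; the real values belong to the production code.
const MAX_VELOCITY_PX_PER_MS = 3;
const MIN_DT_MS = 8; // floor for the time delta, about half a 60 Hz frame

// Velocity in px/ms, with the time delta floored and the result clamped
// symmetrically so back-to-back synthetic events cannot run away.
function clampedVelocity(deltaPx: number, deltaMs: number): number {
  const dt = Math.max(deltaMs, MIN_DT_MS);
  const velocity = deltaPx / dt;
  return Math.max(-MAX_VELOCITY_PX_PER_MS, Math.min(MAX_VELOCITY_PX_PER_MS, velocity));
}
```

With these values, a 40px move reported 1ms after the previous event yields 3 px/ms instead of 40, while an ordinary 16px move over 16ms passes through unchanged at 1 px/ms.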
Adding a new live spec
- Create web/tests/live/<surface>.spec.ts.
- Import from ../helpers/liveTest.
- Pick a fixture (serve, servePassphrase, serveToken, serveReadOnly, serveCockpit) or call spawnAoeServe() directly if you need custom options.
- Add (or update) the matching surface entry in web/tests/coverage-matrix.json. Make sure every component the spec touches is in its components[] (or already in another surface or the exempt list).
- Run node web/tests/validate-coverage-matrix.mjs locally. CI runs the same script.
- Run npx playwright test --config=playwright.live.config.ts <your spec> to confirm it passes. Live specs require tmux installed and cargo build --features serve --release to have run at least once.
Adding a Vitest contract test
- Create web/src/<area>/__tests__/<Thing>.test.tsx (or co-locate as <Thing>.test.tsx).
- Use // @vitest-environment jsdom at the top if the test renders React.
- Import the component, mount it with React Testing Library, fire events, assert on callback invocations (and MSW interceptions if the component makes real fetch calls).
- Update the coverage matrix entry’s kind to vitest and the specs path.
- Run npm run test:unit -- <your test>.