Structured View Persistence & Recovery
Structured view workers and transcripts are designed to outlive a daemon restart, a closed laptop, or a reconnect. This page covers what survives, what gets cleaned up on delete, and how conversation context is rehydrated. For the failure modes and their fixes, see Structured view Troubleshooting.
Worker persistence across aoe serve restart
Behavior change (structured view-only). Prior releases tore down every structured view ACP worker on
aoe serve --stop(and any other daemon shutdown). As of this release, the daemon detaches without killing the runner: in-flight turns surviveaoe serve --stop,aoe update, daemon crashes, and host suspend/wake. To actually terminate workers, useaoe acp stop <session>oraoe acp stop --all(graceful), oraoe acp kill <session>(force). tmux-based (non-structured view) sessions are unaffected.
Structured view workers run as detached aoe __acp-runner processes that
outlive the daemon. aoe serve --stop drops the daemon’s connection
to each worker but does not terminate the runner: the agent
keeps running, in-flight turns continue, and a subsequent aoe serve
reattaches via the worker’s unix socket.
Each runner registers itself at
<app_dir>/acp-workers/<session_id>.json with its PID, socket
path, and cached ACP session id. The same directory holds the
per-session .sock (unix socket) and .log (runner stderr drain)
files. aoe acp ps lists running workers.
Practical implications:
aoe updatefollowed by a daemon restart (aoe serve --restart, or the restart promptaoe updatenow shows) keeps every structured view agent’s in-flight turn alive; a worker left on the old build then respawns on the new binary once its turn finishes (see Build-version upgrade afteraoe update).- Closing the laptop or restarting the host with
aoe serverunning: the daemon dies on suspend, but the runner continues. On wake the nextaoe servereattaches. - To actually terminate a worker, run
aoe acp stop <session>oraoe acp stop --all. To force-kill,aoe acp kill <session>. - No orphaned agents on teardown. Every termination path (idle
auto-stop,
aoe acp stop|kill|restart, snooze/archive, daemonshutdown_all, and a fresh spawn that supersedes a stale runner) signals the runner’s whole process group, so the agent subprocess and its own children (the node ACP wrapper and the SDK child it execs) die with the runner instead of reparenting to PID 1. Earlier releases signalled only the runner pid, so superseded or torn-down runners leakednode/claudeprocesses that accumulated across daemon restarts and e2e runs (#1689). A runner that is hard-killed withSIGKILL(which cannot run its own cleanup) can still leak; prefer the verbs above overkill -9. - Self-termination when abandoned. The reapers above all run inside a
live daemon, so a daemon that dies WITHOUT killing its runners (a crash,
a
SIGKILL, or an ephemeral test$HOMEthat gets deleted) would otherwise orphan the runner plus its agent tree forever. Each runner carries a watchdog that polls its own registry record and self-destructs (terminating its whole process group when it is the group leader, which it normally is viasetsid) when it observes that it has been abandoned: the record file vanished, the record was taken over by a newer runner, or it has been detached with no daemon attached for longer than a 48h retention window. The 48h window is generous enough to cover an overnight or weekendaoe serve --stop, and the clock resets on every reattach, so an intentionally stopped session stays reattachable. A pendingaoe acp restartis exempt (the runner sees the restart marker and waits). This is a backstop; the explicit verbs above are still the fast path. See #1921. - During the detach window (between
aoe serve --stopand the nextaoe serve), the runner buffers up to 256 agent → daemon notification lines so per-stream chunks emitted while the daemon was down get replayed on reattach. Permission requests issued while detached block the agent’s turn until reattach. - Mid-turn reattach. When the daemon comes back up against a
session that was actively streaming a prompt, the new daemon resumes
the existing ACP session id directly (no
session/neworsession/loadis sent; the agent process never died, so its in- memory session is still addressable). The agent’s eventual response to the orphaned in-flightsession/promptis dropped silently by the transport because its request id was issued by the previous daemon; to keep the UI from staying stuck on “thinking” forever, the daemon arms a resume-idle watchdog. It disarms on the first inbound notification the runner forwards after reattach: once any event arrives the turn is observable, and later gaps are normal mid-turn silence (Task subagents, slow Bash, reasoning between tool calls), not an orphan. It synthesizes aStopped { reason: "reattach_idle" }event only when the runner forwards no notification at all within 30s of reattach. The narrow residual, a turn that completes after reattach whose lost response leaves a stale spinner, is rare and clears via Force end turn or the next prompt. Sessions that the runner cannot reattach to (dead PID, missing socket, etc.) fall through to a fresh spawn; if the on-disk event log shows that fresh spawn’s session was mid-prompt at the moment the daemon died, the reconciler publishes aStopped { reason: "orphaned_at_restart" }event before the new agent starts so the UI clears immediately. The same path covers themain-branch case where there is no runner at all and every structured view session takes the fresh-spawn branch on restart.
Build-version upgrade after aoe update
Surviving a daemon restart is the right behavior for in-flight turns,
but it means a daemon restarted on a newly-installed binary would
otherwise re-adopt workers still executing the previous build, running
mixed versions silently. To prevent that, each runner records the build
identity of the binary that spawned it (build_version in its registry
JSON, shown as the BUILD column in aoe acp ps), and the daemon
compares it against its own on every reattach:
- A worker whose build matches the daemon is reattached as usual; a
same-version
aoe serverestart keeps in-flight turns alive exactly as before. - A worker on an older build with no in-flight turn is terminated and respawned on the new binary immediately.
- A worker on an older build that is mid-turn is adopted so the turn keeps streaming, then respawned on the new binary at the next idle boundary. The in-flight turn is drained, never hard-killed.
aoe acp ps tags any such not-yet-respawned worker (stale) in its
BUILD column so a mixed-version state is visible rather than silent.
The new binary takes effect only once the daemon itself restarts.
aoe update closes this loop: after an in-place (tarball) update it
detects a running self-managed daemon and offers to restart it
(prompting by default, automatic with aoe update -y) so the reconciler
pass above runs against the new binary. You can also bounce a running
daemon at any time with aoe serve --restart, which replays the host,
port, mode, and auth it was launched with; the passphrase is recalled
from serve.passphrase or AOE_SERVE_PASSPHRASE before the old daemon
is stopped, so a passphrase-protected daemon is never left down. A daemon
run in the foreground or under a service supervisor (systemd, launchd) is
left to its manager: restart only touches daemons started by
aoe serve --daemon. A worker’s build identity pairs the package version
with a git commit hash, so local rebuilds across commits are detected,
not just shipped release upgrades. See
#1754
and #1794.
Session deletion
session/delete fires only on permanent removal: deleting a
structured view session, or disabling structured view on a session (which discards
the conversation). Reversible teardown, aoe acp stop, snooze,
archive, and idle auto-stop, deliberately does NOT fire it, so the
agent’s transcript stays on disk and the next respawn resumes via
session/load instead of resetting the conversation. Firing it on
those paths previously reset context on every snooze / archive /
idle-stop. See
#1710.
When you permanently delete a structured view session, aoe opportunistically
fires the experimental session/delete ACP request against the live
worker (bounded by a 2-second timeout) whenever a stored ACP session id
exists, and then proceeds with the existing kill path
(session/cancel for in-flight prompts, then SIGTERM on the runner,
then on-disk cleanup). aoe does not inspect the initialize-time
capability flag: adapters that implement the method use the request
to release adapter-side persisted state, for example
claude-agent-acp 0.37.0+ clearing the on-disk Claude session
record so a recreated session id does not inherit prior transcripts.
Adapters that do not implement the method (aoe-agent, codex,
opencode, older claude-agent-acp) reply with JSON-RPC
-32601 method_not_found; aoe logs that at debug and proceeds to
SIGTERM. A wedged adapter is bounded by the 2-second timeout, so
delete never stalls the UI. Outcomes are logged under
target = "acp.protocol" in debug.log with an adapter=<agent_key>
field so operators can correlate behavior across adapters: success
and -32601 land at debug, timeout and other errors at warn. See
#1404.
Conversation persistence
Structured view transcripts survive page reloads, session switches, and
aoe serve --stop/restart cycles. For agents that support session
restoration (Claude today), the model itself also retains conversation
context across restarts; so a follow-up like “what did we just
decide?” still works after a daemon restart.
The web dashboard mirrors each session’s reduced state into
localStorage under the aoe:acp-state:v1:<session_id> key so a
full page reload (mobile OS evicting the tab, Cloudflare tunnel
re-auth, PWA cold start) hydrates the chat surface instantly from the
last-known state and only fetches the seq-delta from the server. Entries
expire after seven days; an oversized session that exceeds the
per-origin quota falls back to the full server replay path without
warning. clearAcpCache and the session-delete handler drop the
matching entry so a freshly-recreated session id doesn’t briefly show
the prior transcript.
If context restoration fails (e.g., the agent’s stored session is no longer available), structured view falls back to a fresh session and renders an amber “Conversation context reset” callout in the transcript so you know prior turns are no longer in the model’s context window.
After that callout, an inline “Resume with prior context” banner
appears above the composer. Clicking it calls
GET /api/sessions/{id}/acp/context-primer?before_seq=<reset-seq>,
which walks the SQLite event log and returns a compact markdown
recap of the last ~20 turns (capped at ~24k characters, bulky tool
inputs/outputs elided, tool calls collapsed to one-liners). The
primer is pre-filled into the composer so you can review, trim, or
extend it before sending; nothing is sent silently. The banner is
one-shot per reset: dismiss it or submit any prompt and it stays
gone until the next session/load failure. See #1004.
The bundled aoe-agent doesn’t yet support context restoration; its
transcript still replays from disk, but the model starts fresh on each
spawn. Tracked in
#1005.