Add the Windows Sandbox state-aware lifecycle + host daemon (Phase 2)#578

Open
MGudgin wants to merge 1 commit into user/gudge/wsb-phase1-one-shot from user/gudge/wsb-phase2-state-aware

Conversation

@MGudgin MGudgin commented Jun 26, 2026

Note

Stacked on #577 (which is stacked on #576). This PR's base is
user/gudge/wsb-phase1-one-shot, so the diff shows only the Phase 2 commit.
Merge order: #576 -> #577 -> this PR. GitHub auto-retargets the base to main
as each parent merges and its branch is deleted.


Summary

Phase 2 of the Windows Sandbox rewrite stack: add the state-aware lifecycle
(provision -> start -> exec* -> stop -> deprovision) for Windows Sandbox on top
of the Phase 1 one-shot rewrite, holding a single live VM across separate
wxc-exec phase processes behind a persistent detached host-side daemon. The SDK
follows in Phase 3.
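
The phase ordering above can be read as a small state machine. The following is a minimal sketch of that reading, not the backend's actual types: `Phase`, `Op`, and `next_phase` are hypothetical names, and the sketch assumes the strict order shown in the summary (it does not model restarting after `stop` or deprovisioning a sandbox that was never started, since the PR does not say whether those are allowed).

```rust
/// Lifecycle phases implied by the PR summary. Names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Provisioned,
    Running,
    Stopped,
    Deprovisioned,
}

/// Operations that follow `provision` in the lifecycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Start,
    Exec,
    Stop,
    Deprovision,
}

/// Returns the phase reached by applying `op` in `current`,
/// or `None` when the operation is out of order.
pub fn next_phase(current: Phase, op: Op) -> Option<Phase> {
    match (current, op) {
        (Phase::Provisioned, Op::Start) => Some(Phase::Running),
        // `exec*`: zero or more execs, each leaving the VM running.
        (Phase::Running, Op::Exec) => Some(Phase::Running),
        (Phase::Running, Op::Stop) => Some(Phase::Stopped),
        (Phase::Stopped, Op::Deprovision) => Some(Phase::Deprovisioned),
        _ => None,
    }
}
```

Because each `wxc-exec` phase is a separate process, a check like this would have to run against persisted state (or the daemon) rather than in-memory state; the sketch only captures the ordering.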

Details

  • windows_sandbox_lifecycle: re-enables state_aware.rs -- the
    StatefulSandboxBackend impl on WindowsSandboxRunner. provision validates +
    snapshots the filesystem policy and mints wsb:<token>; start spawns the
    detached daemon (which boots the VM and holds the guest control connection);
    exec relays a script over the held connection; stop/deprovision tear down.
    A process-global named mutex serialises start/stop/deprovision; the single host
    VM slot is enforced.
  • Daemon rewrite (wxc_windows_sandbox_daemon): control_server.rs serves a
    nonce-authenticated localhost line protocol (PING/STOP/EXEC), each
    connection on its own task so a long exec can't block a STOP. The auth nonce
    is read from the daemon's stdin; the sandbox token comes via --token. The
    crate is re-added to the workspace members (dropped
    in Phase 1); the legacy pipe_server/sandbox_vm/tcp_bridge/rendezvous
    modules are deleted (rendezvous now lives in the lifecycle crate).
  • wxc dispatch: run_state_aware routes the wsb: prefix to
    WindowsSandboxRunner with the same --experimental gate as one-shot;
    state_aware_dispatch learns the wsb -> WindowsSandbox prefix mapping.
  • Docs: the full dual-mode rewrites land here with the behavior they describe --
    windows-sandbox.md/windows-sandbox-reference.md, schema.md (dual-mode
    backend row + state-aware envelope section), the state-aware API doc, and
    copilot-instructions.md. While touching schema.md I reconciled its version
    table to the canonical schemas/schema-version.json (>=0.4, <=0.8; stable
    0.7.0-alpha; state-aware 0.6.0-alpha) -- the rewrite had left it at the pre-0.8
    <=0.7.
  • Tests: adds run_windows_sandbox_state_aware_tests.ps1 (state-aware E2E).
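
To illustrate the `wsb:` prefix routing and the `--experimental` gate described in the dispatch bullet, here is a rough sketch. `Backend`, `RouteError`, and `route_sandbox_id` are hypothetical names and do not reflect the real `state_aware_dispatch` signatures; the only details taken from the PR are the `wsb:<token>` shape and the fact that the Windows Sandbox path is gated.

```rust
/// Backends reachable through a sandbox-id prefix (only `wsb` is sketched).
#[derive(Debug, PartialEq, Eq)]
pub enum Backend {
    WindowsSandbox,
}

/// Reasons a sandbox id cannot be routed. Illustrative, not the real error type.
#[derive(Debug, PartialEq, Eq)]
pub enum RouteError {
    Malformed,
    UnknownPrefix,
    ExperimentalRequired,
}

/// Splits `<prefix>:<token>` and maps the prefix to a backend.
/// `experimental` stands in for the `--experimental` flag.
pub fn route_sandbox_id(id: &str, experimental: bool) -> Result<(Backend, &str), RouteError> {
    let (prefix, token) = id.split_once(':').ok_or(RouteError::Malformed)?;
    if token.is_empty() {
        return Err(RouteError::Malformed);
    }
    match prefix {
        // Same gate for state-aware as for one-shot, per the PR description.
        "wsb" if experimental => Ok((Backend::WindowsSandbox, token)),
        "wsb" => Err(RouteError::ExperimentalRequired),
        _ => Err(RouteError::UnknownPrefix),
    }
}
```

For example, `route_sandbox_id("wsb:abc123", true)` yields the Windows Sandbox backend together with the token `abc123`, while the same id without the flag is rejected.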

Tests

  • cargo check --workspace --all-targets, cargo clippy --workspace --all-targets -- -D warnings, and cargo fmt --all -- --check: clean.
  • cargo test: windows_sandbox_lifecycle 157 (incl. the state-aware suite),
    wxc_windows_sandbox_daemon 23, wxc_common 364 (incl. the new wsb resolver
    test), wxc 13 -- all passed.
  • node scripts/versioning/check-schema-versions.js: green.

Related Issues

Microsoft Reviewers: Open in CodeFlow
@MGudgin MGudgin requested a review from a team as a code owner June 26, 2026 19:08
This PR adds the state-aware lifecycle (provision -> start -> exec* -> stop
-> deprovision) for the Windows Sandbox backend on top of the Phase 1
one-shot rewrite, holding a single live VM across separate `wxc-exec`
phase processes behind a persistent detached host-side daemon. It is
Phase 2 of the Windows Sandbox rewrite stack (stacked on the Phase 1 PR);
the SDK lands in Phase 3.

Details

* `windows_sandbox_lifecycle`: re-enable `state_aware.rs` -- the
  `StatefulSandboxBackend` impl on `WindowsSandboxRunner`. `provision`
  validates + snapshots the filesystem policy and mints `wsb:<token>`;
  `start` spawns the detached daemon (which boots the VM and holds the
  guest control connection); `exec` relays a script over the held
  connection; `stop`/`deprovision` tear down. A process-global named mutex
  serialises start/stop/deprovision; the single host VM slot is enforced.
* Daemon rewrite (`wxc_windows_sandbox_daemon`): replaces the old
  pipe/warm-reuse client model. `control_server.rs` serves a
  nonce-authenticated localhost line protocol (`PING`/`STOP`/`EXEC`),
  handling each connection on its own task so a long exec can't block a
  `STOP`. The auth nonce is read from the daemon's stdin; the sandbox
  token comes via `--token`. Re-added to the workspace members (dropped in
  Phase 1). The legacy `pipe_server`/`sandbox_vm`/`tcp_bridge`/`rendezvous`
  modules are deleted (rendezvous now lives in the lifecycle crate).
* wxc dispatch: `run_state_aware` now routes the `wsb:` prefix to
  `WindowsSandboxRunner`, with the same `--experimental` gate the one-shot
  path enforces. `state_aware_dispatch` learns the `wsb` -> WindowsSandbox
  prefix mapping.
* Docs: the full dual-mode rewrites land here with the behavior they
  describe -- `windows-sandbox.md`/`windows-sandbox-reference.md` (one-shot
  + state-aware architecture, daemon protocol, auth handshake),
  `schema.md` (windows_sandbox dual-mode backend row + the state-aware
  lifecycle envelope section), the state-aware API doc, and
  `copilot-instructions.md` (full backend description). While touching
  `schema.md` I also reconciled its version table to the canonical
  `schemas/schema-version.json` (>=0.4, <=0.8; stable 0.7.0-alpha;
  state-aware 0.6.0-alpha), which the rewrite branch had left at the
  pre-0.8 `<=0.7` (relates to #565).
* Tests: adds `run_windows_sandbox_state_aware_tests.ps1` (provision/
  start/exec*/stop/deprovision E2E). The one-shot e2e/script from Phase 1
  is unchanged.
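
As a rough illustration of the nonce-authenticated line protocol, the sketch below parses one request line. The wire format is an assumption: the commit message names the verbs (`PING`/`STOP`/`EXEC`) but not the framing, so this sketch supposes each line is `<nonce> <VERB> [payload]`. `Command`, `ParseError`, `nonce_matches`, and `parse_line` are hypothetical names, not the contents of `control_server.rs`.

```rust
/// Verbs named in the commit message; the payload shape for EXEC is assumed.
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    Ping,
    Stop,
    Exec(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    Malformed,
    BadNonce,
    UnknownVerb,
    MissingPayload,
}

/// Compares every byte without an early exit, so timing does not reveal
/// how long a matching prefix is. Length mismatch is rejected up front.
fn nonce_matches(expected: &[u8], got: &[u8]) -> bool {
    if expected.len() != got.len() {
        return false;
    }
    expected.iter().zip(got).fold(0u8, |acc, (a, b)| acc | (a ^ b)) == 0
}

/// Parses one line of the assumed format `<nonce> <VERB> [payload]`.
pub fn parse_line(expected_nonce: &str, line: &str) -> Result<Command, ParseError> {
    let line = line.trim_end_matches(&['\r', '\n'][..]);
    let (nonce, rest) = line.split_once(' ').ok_or(ParseError::Malformed)?;
    if !nonce_matches(expected_nonce.as_bytes(), nonce.as_bytes()) {
        return Err(ParseError::BadNonce);
    }
    let (verb, payload) = match rest.split_once(' ') {
        Some((v, p)) => (v, p),
        None => (rest, ""),
    };
    match verb {
        "PING" => Ok(Command::Ping),
        "STOP" => Ok(Command::Stop),
        "EXEC" if !payload.is_empty() => Ok(Command::Exec(payload.to_string())),
        "EXEC" => Err(ParseError::MissingPayload),
        _ => Err(ParseError::UnknownVerb),
    }
}
```

Keeping parsing separate from execution fits the per-connection task design: a `STOP` arriving on a second connection can be parsed and acted on while an `EXEC` on the first connection is still running.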

Tests

* `cargo check --workspace --all-targets`, `cargo clippy --workspace
  --all-targets -- -D warnings`, and `cargo fmt --all -- --check`: clean.
* `cargo test`: windows_sandbox_lifecycle 157 (incl. the state-aware
  suite), wxc_windows_sandbox_daemon 23, wxc_common 364 (incl. the new
  `wsb` resolver test), wxc 13 -- all passed.
* `node scripts/versioning/check-schema-versions.js`: green
  (maxSupported 0.8.0-alpha, stable 0.7.0-alpha, state-aware 0.6.0-alpha).

Refs #565

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Generated-with: claude-opus-4.8