════════════════════════════════════════════════════════════════════════
  WAVE B ROADMAP
  Gnomad Desktop Assistant · docs/WAVE_B_ROADMAP.md
════════════════════════════════════════════════════════════════════════

WAVE B ROADMAP — STATE-OF-THE-ART SYSTEMS
=========================================

Status: Shipped on main (June 2026)  
Audience: Engineering, security review, portfolio narrative  
Last updated: June 2026

This document captures four systems-level upgrades from the follow-up evaluation. They build on what shipped in Wave A (structured GnomadError payloads, elevation hardening, App.tsx decomposition) and address residual alpha gaps called out in CODE_REVIEW.md and SECURITY_MODEL.md.

────────────────────────────────────────

CURRENT BASELINE
----------------

  Area         |  Today                                                                                       
  HITL         |  B1 shipped: HMAC tokens via hitl_token.rs; boolean bypass rejected                          
  Local LLM    |  External Ollama HTTP; in-process GGUF for planner + optional local chat (embedded-llm build)
  YOLO!        |  Broader FS via agent_settings; optional sandboxed shell (B4) in YOLO + experimental flag    
  Terminal UX  |  xterm.js live stream + replay on command cards; summary cards for simple runs               

  [mermaid]
    flowchart LR
      subgraph today [Wave A]
        UI[Sudo Gate UI]
        IPC["invoke(hitl_approved: true)"]
        Rust[shell_session + privilege]
        UI --> IPC --> Rust
      end
      subgraph waveB [Wave B target]
        Token[HMAC approval token]
        Rust2[Verify token + command hash]
        UI2[Sudo Gate UI]
        UI2 --> Token --> Rust2
      end
      today -.-> waveB

────────────────────────────────────────

RECOMMENDED DELIVERY ORDER
--------------------------

  Order  |  Initiative                 |  Rationale / status                                                                          
  B1     |  Cryptographic HITL tokens  |  ✓ Shipped; closes a real IPC bypass class; small Rust surface; unblocks enterprise narrative
  B2     |  In-process local LLM       |  ✓ B2a planner + B2b local chat shipped                                                      
  B3     |  True terminal (xterm.js)   |  ✓ Live stream + replay in chat                                                              
  B4     |  Micro-sandboxing for YOLO  |  ✓ Experimental sandbox-exec / bwrap                                                         

────────────────────────────────────────

B1 — CRYPTOGRAPHIC HITL APPROVAL TOKENS
---------------------------------------

Problem

Any client that can call shell_session_run or agent_execute_tool with hitl_approved: true may bypass the UI gate. Safety heuristics still run, but approval is not cryptographically bound to a specific command, time window, or session.

Design

  1. check_command_safety (or a dedicated request_hitl_token command) returns:
  • requires_hitl_approval, danger_reason (unchanged)
  • approval_nonce + approval_token when HITL is required

  2. Token payload (signed, not encrypted — local app only):

  [text]
       v1 | command_sha256 | nonce | issued_at_unix | expires_at_unix | scope

  • command_sha256: SHA-256 of normalized command string (trim, NFC)
  • scope: shell_run | elevated | path_once (future)
  • TTL: e.g. 60–120 seconds, single-use (nonce stored in memory until consumed or expired)

  3. Signing: HMAC-SHA256 with a per-install secret, generated on first launch and stored in the OS keychain (gnomad-hitl-secret), never in the frontend.

  4. Execution: shell_session_run / elevated path accepts optional approval_token instead of bare hitl_approved. Rust:
  • Verifies HMAC + expiry + command hash match + nonce not reused
  • Re-runs check_command_safety (defense in depth)
  • On success, burns nonce

  5. Frontend: After Sudo Gate approve, pass approval_token from the pending safety response — never a raw boolean.

  6. Elevation: execute_elevated_command requires token with scope=elevated and matching command hash.
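
As an illustration of steps 2 to 4, the token layout and checks can be sketched as follows. The shipped path is Rust (hitl_token.rs); this TypeScript version only demonstrates the logic. The "|" delimiter, hex encoding, 90-second TTL, in-memory nonce map, and all function names are assumptions made for the sketch, and the re-run of check_command_safety is omitted.

```typescript
import { createHash, createHmac, randomBytes, timingSafeEqual } from "node:crypto";

type Scope = "shell_run" | "elevated";

// Hypothetical in-memory nonce store: nonce -> expiry (unix seconds).
const liveNonces = new Map<string, number>();

// SHA-256 of the normalized command (trim + NFC), hex-encoded.
function commandSha256(command: string): string {
  return createHash("sha256").update(command.trim().normalize("NFC"), "utf8").digest("hex");
}

function sign(secret: Buffer, payload: string): string {
  return createHmac("sha256", secret).update(payload, "utf8").digest("hex");
}

// Issue "payload|mac", where payload follows the v1 field order.
function issueToken(secret: Buffer, command: string, scope: Scope, now: number, ttl = 90): string {
  const nonce = randomBytes(16).toString("hex");
  const payload = ["v1", commandSha256(command), nonce, now, now + ttl, scope].join("|");
  liveNonces.set(nonce, now + ttl);
  return `${payload}|${sign(secret, payload)}`;
}

// Verify HMAC, expiry, command hash, scope, and nonce; burn the nonce on success.
function verifyToken(secret: Buffer, token: string, command: string, scope: Scope, now: number): boolean {
  const parts = token.split("|");
  if (parts.length !== 7 || parts[0] !== "v1") return false;
  const [, hash, nonce, , expiresAt, tokenScope, mac] = parts;
  const expected = Buffer.from(sign(secret, parts.slice(0, 6).join("|")), "hex");
  const given = Buffer.from(mac, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false;
  if (now >= Number(expiresAt)) {
    liveNonces.delete(nonce); // expired nonces are dropped
    return false;
  }
  if (hash !== commandSha256(command)) return false;
  if (tokenScope !== scope) return false;
  if (!liveNonces.has(nonce)) return false; // unknown or already used
  liveNonces.delete(nonce);
  return true;
}
```

Because the MAC covers the command hash, a token issued for one command fails verification for any other, and deleting the nonce on first success is what makes a replay fail.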

Files (indicative)

  Layer  |  Files                                                                      
  Rust   |  New hitl_token.rs; privilege.rs, shell_session.rs, agent_runtime.rs, lib.rs
  TS     |  shellSession.ts, agentRuntime.ts, useAgentExecution.ts, agentLoop.ts       
  Docs   |  SECURITY_MODEL.md, QA_CHECKLIST.md (IPC bypass test case)                  

Acceptance criteria

  • [x] hitl_approved: true without valid token → safety_blocked JSON payload
  • [x] Token for command A rejected when executing command B
  • [x] Reused token rejected (expiry is covered by a manual test)
  • [x] Unit tests: sign/verify, wrong hash, replay, boolean bypass
  • [x] Manual: approve in UI → success; devtools invoke with boolean only → fail

Effort & risk

  Effort  |  ~3–5 days                                                                       
  Risk    |  Low–medium (API migration; keep the deprecated boolean for one release, behind a feature flag)

────────────────────────────────────────

B2 — IN-PROCESS NATIVE LOCAL LLM (CANDLE OR LLAMA.CPP)
------------------------------------------------------

Problem

Ollama adds an extra daemon, version skew, and an install step. The portfolio story that “local works out of the box” requires an embedded inference path for small models (e.g. 1B Qwen-Coder for planner / tag extraction).

Options

  Backend                      |  Pros                                                         |  Cons                                                  
  llama-cpp-2 (Rust bindings)  |  Mature GGUF ecosystem; matches existing GGUF settings field  |  Binary size, CPU/GPU feature matrix per platform      
  Candle (Hugging Face)        |  Pure Rust, good for custom models                            |  Heavier integration for chat templates; GPU paths vary

Recommendation: llama-cpp-2 for v1 in-process path (aligns with stored GGUF path + planner use case); keep Ollama as optional “bring your own models.”

Design

  1. local_inference module in src-tauri:
  • load_model(path: PathBuf, n_ctx, n_threads) — lazy singleton
  • complete(prompt, max_tokens, stop) -> string
  • plan_command(prose) -> Result — used by command planner

  2. Model delivery:
  • Phase B2a: User-selected GGUF on disk (Settings)
  • Phase B2b (optional): Bundled tiny model in app resources (size cap ~500MB–1GB for alpha)

  3. Frontend: Provider mode local-embedded vs local-ollama; same chat UI, different backend invoke.

  4. Build: Feature flag embedded-llm; CI builds without it on constrained runners; macOS/Windows/Linux matrix docs in BUILD_PLATFORMS.md.
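
A minimal frontend-side sketch of the lazy singleton (step 1) and the provider switch (step 3) follows. The LocalBackend interface, the loader signature, the prompt text, and the result payload shape are hypothetical stand-ins for the real Tauri invokes, not the shipped API.

```typescript
type ProviderMode = "local-embedded" | "local-ollama";

// Hypothetical backend surface shared by both provider modes.
interface LocalBackend {
  complete(prompt: string, maxTokens: number): Promise<string>;
}

// Hypothetical loader; stands in for an invoke that loads the GGUF at `path`.
type Loader = (path: string) => Promise<LocalBackend>;

// Lazy singleton: the model loads on first use and is reused while the path is unchanged.
function lazyModel(load: Loader): (path: string) => Promise<LocalBackend> {
  let cached: { path: string; backend: Promise<LocalBackend> } | null = null;
  return (path) => {
    if (cached && cached.path === path) return cached.backend;
    const backend = load(path);
    cached = { path, backend };
    // Forget a failed load so a later call can retry.
    backend.catch(() => {
      if (cached?.backend === backend) cached = null;
    });
    return backend;
  };
}

type PlanResult =
  | { ok: true; command: string }
  | { ok: false; kind: "llm"; message: string };

// Same planner call for both modes; only the backend differs.
async function planCommand(
  mode: ProviderMode,
  prose: string,
  deps: { embedded: () => Promise<LocalBackend>; ollama: LocalBackend },
): Promise<PlanResult> {
  try {
    const backend = mode === "local-embedded" ? await deps.embedded() : deps.ollama;
    const command = await backend.complete(`Translate to one shell command:\n${prose}`, 64);
    return { ok: true, command: command.trim() };
  } catch (err) {
    // A missing model or failed load becomes a structured llm error, not a crash.
    return { ok: false, kind: "llm", message: err instanceof Error ? err.message : String(err) };
  }
}
```

Caching the promise rather than the resolved model means concurrent callers during the first load share one load instead of starting several.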

Acceptance criteria

  • [x] Planner works with no Ollama process when GGUF configured
  • [x] Graceful llm error payload if model missing or load fails
  • [x] Document RAM/CPU expectations (e.g. 1B Q4 ≈ 1GB RAM) — see BUILD.md
  • [x] Ollama path unchanged (regression)

Effort & risk

  Effort  |  ~2–4 weeks (bindings, threading, packaging)                                        
  Risk    |  Medium–high (artifact size, Metal/CUDA/CPU fallbacks, licensing of bundled weights)

────────────────────────────────────────

B3 — TRUE TERMINAL EMULATION (XTERM.JS)
---------------------------------------

Problem

PTY output is reduced to stdout/stderr strings for cards, so ANSI colors, progress bars, TUI apps, and interactive prompts render lossily or confusingly, and the stall/timeout heuristics misfire on full-screen TUIs.

Design

  1. Rust (minimal change): The backend already emits PTY chunks via events. If those are not sufficient, add an optional shell-pty-output frame event (base64 or raw UTF-8) that carries the session id.

  2. Frontend:
  • Add @xterm/xterm + fit addon
  • TerminalPanel component: embedded in chat when user expands a run, or docked below composer
  • Wire subscribeShellOutput → term.write(data)
  • Input: optional term.onData → new shell_session_write command for interactive sessions (separate from one-shot shell_run)

  3. Modes:
  • Compact (default): Keep ShellCommandBlock summary for simple commands
  • Live: Open xterm when command tagged interactive or user clicks “Show terminal”

  4. Security: xterm does not bypass gates; interactive mode still requires HITL token for flagged commands.
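
The frontend wiring in steps 2 and 3 can be sketched without pulling in xterm itself. The PtyFrame shape and the function names below are assumptions for illustration; TerminalSink is a structural stand-in that an xterm Terminal should satisfy, since its write() accepts a string or a Uint8Array.

```typescript
// Assumed frame shape for the shell-pty-output event; field names are illustrative.
interface PtyFrame {
  sessionId: string;
  encoding: "utf8" | "base64";
  data: string;
}

// Only write() is needed here, so tests can pass a plain object.
interface TerminalSink {
  write(data: string | Uint8Array): void;
}

type ViewMode = "compact" | "live";

// Compact summary card by default; live terminal for interactive runs or on request.
function chooseMode(run: { interactive: boolean; userRequestedTerminal: boolean }): ViewMode {
  return run.interactive || run.userRequestedTerminal ? "live" : "compact";
}

// Base64 frames are passed on as raw bytes so that a multi-byte UTF-8
// sequence split across two chunks is not corrupted by early decoding.
function decodeFrame(frame: PtyFrame): string | Uint8Array {
  if (frame.encoding === "utf8") return frame.data;
  const binary = atob(frame.data); // one char per byte
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Returns the handler to register with the output subscription for one session.
function attach(sessionId: string, sink: TerminalSink): (frame: PtyFrame) => void {
  return (frame) => {
    if (frame.sessionId === sessionId) sink.write(decodeFrame(frame));
  };
}
```

Filtering by session id in the handler keeps a replayed or concurrent run from writing into the wrong terminal panel.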

Acceptance criteria

  • [x] ls --color=auto, npm install progress render in live mode / replay
  • [x] Stop button sends interrupt
  • [x] No regression for cloud agent tool loop (summary cards still work)
  • [x] Cross-platform: macOS, Windows, Linux windowed + panel

Effort & risk

  Effort  |  ~1–2 weeks                                                          
  Risk    |  Medium (bundle size, focus/keyboard in Tauri webview, accessibility)

────────────────────────────────────────

B4 — MICRO-SANDBOXING FOR YOLO! MODE
------------------------------------

Problem

YOLO expands filesystem reach, and a shell on the host PTY can still exfiltrate data, pivot, or cause damage outside the workspace if the model is tricked. Goal: when YOLO is on, contain shell side effects without breaking normal Standard mode.

Design (platform-specific)

  OS       |  Mechanism                                      |  Notes                                                                                                                                
  macOS    |  sandbox-exec with dynamic profile per session  |  Profile allows workspace R/W and temp dir; optionally denies network; denies ~/.ssh etc. Fragile across macOS versions; needs a version matrix
  Linux    |  bubblewrap / user namespaces                   |  Mount minimal FS; require optional dep (LINUX_PACKAGES.md)                                                                           
  Windows  |  Workspace-scoped init (yolo-shell-init.cmd)    |  TEMP + cwd scoped to workspace; network not blocked — full AppContainer deferred                                                     

Principle: The sandbox wraps only the shell execution path, since agent_fs is already path-gated. Optionally, route the YOLO shell through a bwrap helper binary shipped with the app.
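
For the Linux row, a per-session bubblewrap argument list might be assembled roughly as below. This is a sketch under assumptions: the read-only system directories, the option set, and the function name are illustrative and are not the shipped profile.

```typescript
interface SandboxOptions {
  workspace: string; // absolute path, mounted read-write
  tempDir: string; // absolute path, mounted read-write
  allowNetwork: boolean;
}

// Build the bwrap arguments followed by the shell command to run inside.
function bwrapArgs(opts: SandboxOptions, shell: string[]): string[] {
  if (!opts.workspace.startsWith("/") || !opts.tempDir.startsWith("/")) {
    throw new Error("sandbox paths must be absolute");
  }
  const args = ["--die-with-parent", "--new-session"];
  // Minimal read-only system view (assumed list); the home directory is
  // never bound, so ~/.ssh does not exist inside the sandbox.
  for (const dir of ["/usr", "/bin", "/lib", "/lib64", "/etc"]) {
    args.push("--ro-bind-try", dir, dir);
  }
  args.push("--proc", "/proc", "--dev", "/dev");
  args.push("--bind", opts.workspace, opts.workspace);
  args.push("--bind", opts.tempDir, opts.tempDir);
  args.push("--chdir", opts.workspace);
  if (!opts.allowNetwork) args.push("--unshare-net");
  // bwrap treats the first non-option argument as the command.
  return [...args, ...shell];
}
```

Starting from an empty filesystem and binding in only what the shell needs is an allowlist approach, so a path missing from the profile fails closed instead of leaking host files.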

UX

  • Settings → YOLO: sub-option “Sandbox shell (experimental)” with platform badge
  • Audit log records sandboxed: true and profile hash

Acceptance criteria

  • [x] In sandboxed YOLO: reads outside workspace blocked; workspace writes allowed (profile-dependent)
  • [x] Escape attempts documented in test notes (not pen-test complete)
  • [x] Clear fallback when the sandbox helper is missing (disable the feature and surface an error; never fall back silently to a host run)
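
The fallback criterion and the audit fields from the UX notes can be sketched together. The helper names come from the platform table; the decision type, the injected availability check, and the use of SHA-256 for the profile hash are assumptions.

```typescript
import { createHash } from "node:crypto";

type Platform = "macos" | "linux" | "windows";

// Helper per platform, as listed in the platform table.
const HELPERS: Record<Platform, string> = {
  macos: "sandbox-exec",
  linux: "bwrap",
  windows: "yolo-shell-init.cmd",
};

type SandboxDecision =
  | { kind: "sandboxed"; helper: string; audit: { sandboxed: true; profileHash: string } }
  | { kind: "error"; message: string };

// helperAvailable is injected so the check can be faked in tests;
// a real build would probe the filesystem or app resources.
function decideSandbox(
  platform: Platform,
  profile: string,
  helperAvailable: (name: string) => boolean,
): SandboxDecision {
  const helper = HELPERS[platform];
  if (!helperAvailable(helper)) {
    // No third variant exists for "run on host anyway": the caller must stop.
    return { kind: "error", message: `sandbox helper "${helper}" not found; sandboxed shell disabled` };
  }
  const profileHash = createHash("sha256").update(profile, "utf8").digest("hex");
  return { kind: "sandboxed", helper, audit: { sandboxed: true, profileHash } };
}
```

Encoding the outcome as a two-variant union makes a silent host run unrepresentable at the type level, which is the property the last criterion asks for.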

Effort & risk

  Effort  |  ~4–8 weeks across OSes                                                                                   
  Risk    |  High — support burden, false sense of security if profiles wrong; enterprise may demand third-party audit

────────────────────────────────────────

CROSS-CUTTING DEPENDENCIES
--------------------------

  [mermaid]
    flowchart TB
      B1[B1 HITL tokens]
      B2[B2 Embedded LLM]
      B3[B3 Xterm.js]
      B4[B4 Sandbox YOLO]
      B1 --> B4
      B3 --> B4
      B1 --> B3

  • B4 should assume B1 so sandboxed runs cannot skip gates via IPC.
  • B3 interactive PTY should require tokens for elevated/interactive flows.
  • B2 is largely orthogonal; improves planner without widening shell attack surface.
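
As a quick consistency check, the dependency edges in the diagram can be fed through a topological sort. The function below is an illustrative sketch (Kahn's algorithm), not project code; with the nodes listed in roadmap order it yields B1, B2, B3, B4, which matches the recommended delivery order.

```typescript
type Edge = [from: string, to: string];

// Kahn's algorithm: repeatedly emit a node with no unmet dependencies.
function deliveryOrder(nodes: string[], edges: Edge[]): string[] {
  const indegree = new Map<string, number>();
  for (const n of nodes) indegree.set(n, 0);
  for (const [, to] of edges) indegree.set(to, (indegree.get(to) ?? 0) + 1);

  const ready = nodes.filter((n) => indegree.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const next = ready.shift() as string;
    order.push(next);
    for (const [from, to] of edges) {
      if (from !== next) continue;
      const left = (indegree.get(to) ?? 0) - 1;
      indegree.set(to, left);
      if (left === 0) ready.push(to);
    }
  }
  if (order.length !== nodes.length) throw new Error("dependency cycle");
  return order;
}

// Edges as drawn in the diagram: B1 -> B4, B3 -> B4, B1 -> B3.
const waveB = deliveryOrder(
  ["B1", "B2", "B3", "B4"],
  [["B1", "B4"], ["B3", "B4"], ["B1", "B3"]],
);
console.log(waveB.join(", ")); // B1, B2, B3, B4
```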

────────────────────────────────────────

MAPPING TO PRODUCT VERSIONS
---------------------------

  Version    |  Wave B items                                                                  
  v0.2 beta  |  B1 (HITL tokens) + Wave B error migration (llm, command_planner, chat_history)
  v0.3       |  B3 (xterm) + B2a (GGUF in-process planner)                                    
  v0.4+      |  B2b (optional bundled model), B4 (sandbox experimental per OS)                

See also ROADMAP.md.

────────────────────────────────────────

EXPLICIT NON-GOALS (WAVE B)
---------------------------

  • Full VM per command (Docker/Podman required) — too heavy for default install
  • Remote attestation / hardware security modules
  • Replacing cloud providers with on-device 70B models
  • Autonomous background agents that act without a user message

────────────────────────────────────────

RELATED DOCS
------------

  • SECURITY_MODEL.md — threat model + future HITL token section
  • AGENT_CAPABILITIES_PROPOSAL.md — agent architecture
  • CODE_REVIEW.md — Wave A completion status
  • CROSS_PLATFORM_CHECKLIST.md — smoke tests when each B item lands

════════════════════════════════════════════════════════════════════════
Built with ❤️ by Gnomad Studio 🦙
https://gnomadstudio.org
════════════════════════════════════════════════════════════════════════