Operations Runbook
Use this page for quick operator triage. The full repository reference lives in docs/operations/runbook.md.
Auth checks
- 401 unauthorized:
  - verify Authorization: Bearer <api-key> or the mc_session cookie
  - inspect scopes with GET /api/auth/api_keys
- Safety review:
  - run GET /api/config/self_check
  - fix warnings marked severity: high first
- Login throttling:
  - wait for the cooldown window and retry
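The auth checks above can be scripted with curl. This is a minimal sketch, not the project's own tooling: MC_BASE_URL and MC_API_KEY are hypothetical variable names, and the default address is a placeholder. Only the bearer header format and the two endpoint paths come from this page.

```shell
#!/bin/sh
# Hypothetical settings: neither variable name is defined by the project.
MC_BASE_URL="${MC_BASE_URL:-http://localhost:3000}"  # placeholder address
MC_API_KEY="${MC_API_KEY:-}"

# Build the header in the form the 401 check expects.
auth_header() {
  printf 'Authorization: Bearer %s' "$1"
}

# GET an API path with the bearer header.
# Prints the response body, then the HTTP status code on its own line.
mc_get() {
  curl -sS -H "$(auth_header "$MC_API_KEY")" \
    -w '\n%{http_code}\n' "${MC_BASE_URL}$1"
}

# Usage (nothing runs when this file is sourced):
#   mc_get /api/auth/api_keys      # inspect scopes
#   mc_get /api/config/self_check  # safety review
```

A trailing status of 401 points back to the header or cookie check. The response bodies are printed as returned, since their shape is not described on this page.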
Gateway bridge checks
- The Mission Control-compatible WebSocket bridge is served from GET / with WebSocket upgrade.
- Quick smoke:
MICROCLAW_GATEWAY_TOKEN=... microclaw gateway call health
MICROCLAW_GATEWAY_TOKEN=... microclaw gateway call status
MICROCLAW_GATEWAY_TOKEN=... microclaw gateway call sessions_send \
--params '{"sessionKey":"main","message":"status summary"}'
- Supported operator methods:
  - health, status, chat.send, chat.history
  - session_delete, sessions_send, sessions_kill, sessions_spawn
  - session_setThinking, session_setVerbose, session_setReasoning, session_setLabel
- Expected live events: connect.challenge, chat, tick
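The two read-only quick smoke calls above can be wrapped in one function that reports each result. This is a sketch under two assumptions: MICROCLAW_GATEWAY_TOKEN is already exported, and microclaw exits non-zero when a call fails (this page does not confirm the exit-code behavior). The function name gateway_smoke is hypothetical.

```shell
#!/bin/sh
# Call each read-only bridge method once and report the result.
# Assumption: a failed call makes the microclaw CLI exit non-zero.
gateway_smoke() {
  failed=0
  for method in health status; do
    if microclaw gateway call "$method" >/dev/null 2>&1; then
      echo "ok   $method"
    else
      echo "FAIL $method"
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every call succeeded.
  [ "$failed" -eq 0 ]
}

# Usage:
#   export MICROCLAW_GATEWAY_TOKEN=...
#   gateway_smoke || echo "bridge smoke failed"
```

sessions_send is left out because it writes a message into a session. Run it by hand when a side-effect-free check is not enough.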
Session controls
- Session tree: GET /api/sessions/tree
- Fork session: POST /api/sessions/fork
- Delete session: Web API POST /api/delete_session or bridge session_delete
- Kill active run: bridge sessions_kill
- Persist per-session label/settings: session_setLabel, session_setThinking, session_setVerbose, session_setReasoning
Metrics and SLOs
- Snapshot: GET /api/metrics
- History: GET /api/metrics/history?minutes=60
- Summary/SLOs: GET /api/metrics/summary
- OTLP gaps under burst traffic:
  - raise otlp_queue_capacity
  - review retry settings and endpoint reachability
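The three metrics endpoints above can be pulled in one pass during triage. This is a rough sketch: MC_BASE_URL and MC_API_KEY are hypothetical variable names, the default address is a placeholder, and sending the bearer header to the metrics endpoints is an assumption carried over from the auth checks.

```shell
#!/bin/sh
# Hypothetical settings: neither variable name is defined by the project.
MC_BASE_URL="${MC_BASE_URL:-http://localhost:3000}"  # placeholder address
MC_API_KEY="${MC_API_KEY:-}"

# Build the history URL for a window in minutes (defaults to 60).
metrics_history_url() {
  printf '%s/api/metrics/history?minutes=%s' "$MC_BASE_URL" "${1:-60}"
}

# Fetch snapshot, history, and summary in order; stop at the first failure.
# Assumption: these endpoints accept the same bearer header as the auth checks.
metrics_triage() {
  for url in \
    "${MC_BASE_URL}/api/metrics" \
    "$(metrics_history_url "${1:-60}")" \
    "${MC_BASE_URL}/api/metrics/summary"; do
    echo "== $url"
    curl -sS --fail -H "Authorization: Bearer ${MC_API_KEY}" "$url" || return 1
    echo
  done
}

# Usage:
#   metrics_triage       # last 60 minutes of history
#   metrics_triage 15    # last 15 minutes of history
```

With --fail, curl returns a non-zero status on HTTP errors, so an unreachable or rejecting endpoint stops the loop instead of printing an error page as if it were metrics.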
Stability checks
- Local smoke: scripts/ci/stability_smoke.sh
- CI job: Stability Smoke
- When SLO burn alerts fire:
  - freeze non-critical feature merges
  - assign an incident owner
  - prepare rollback or hotfix if user impact continues