OpenClaw + Ollama API Failover (2026, Cloud Mac): Local LLM Routing When Cloud APIs Fail
Teams running OpenClaw on rented Apple Silicon Mac minis hit the same wall: upstream LLM APIs return 429, time out after 45–60 seconds, or disappear during regional incidents, while a local Ollama instance on the same M4 host could still answer simpler prompts. The takeaway of this guide: define explicit routing tiers (cloud model first, quantized local model second) with measured timeouts, and log every switch between them. You will get a decision matrix, a seven-step runbook, launchd pairing notes, and observability rules so that MacLogin nodes in Hong Kong, Japan, Korea, Singapore, and the US stay useful when vendors wobble.
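For the baseline install, see OpenClaw Installation and Deployment; for environment variables and launchd, see Environment Variables and launchd. Align periodic jobs with OpenClaw cron and launchd so that failover does not conflict with the scheduler.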
Why API Failover Matters for 24/7 OpenClaw Agents
Agents are not humans: they do not “try again later” gracefully unless you code it. Shared cloud Macs also carry neighbor noise—CPU spikes from another tenant’s Xcode build can lengthen token latency just enough to cross your HTTP client deadline.
- Rate limits: Burst traffic from 3–5 parallel subagents can exhaust per-minute quotas.
- Network partitions: Trans-Pacific paths to US APIs from Tokyo nodes occasionally exceed 200 ms RTT during peering events.
- Cost caps: Hard stops on billing profiles surface as synthetic errors—your fallback must not recurse infinitely.
Routing Matrix: Primary, Secondary, and Hard Stop
| Tier | Trigger | Typical model | Latency budget |
|---|---|---|---|
| Primary | Healthy API, quota OK | Cloud-hosted frontier or mid model | ≤ 45 s connect+first token |
| Secondary | 429/5xx/timeout | Ollama local on 127.0.0.1 | ≤ 120 s for batch summarization |
| Hard stop | Both tiers fail twice | None; surface the error to a human | Alert if > 15% of sessions per hour hit this |
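The matrix can be read as a small state machine. The sketch below is one possible reading, not OpenClaw's actual routing code: `call_primary` and `call_secondary` are hypothetical callables that return `(http_status, body)` or raise `TimeoutError`, and "both tiers fail twice" is interpreted as two full passes through primary, then secondary.

```python
from typing import Callable, Optional, Tuple

# Hypothetical call shape: returns (http_status, body); raises TimeoutError on deadline.
Call = Callable[[str], Tuple[int, str]]


class HardStop(Exception):
    """Raised when both tiers have failed on every allowed pass."""


def should_fail_over(status: Optional[int]) -> bool:
    # None stands for a timeout; 429 and any 5xx also move on to the next tier.
    return status is None or status == 429 or 500 <= status <= 599


def attempt(call: Call, prompt: str) -> Tuple[Optional[int], str]:
    try:
        return call(prompt)
    except TimeoutError:
        return None, ""


def route(prompt: str, call_primary: Call, call_secondary: Call,
          max_passes: int = 2) -> Tuple[str, int, str]:
    """Try primary, then secondary, for at most max_passes passes."""
    for _ in range(max_passes):
        for tier, call in (("primary", call_primary), ("secondary", call_secondary)):
            status, body = attempt(call, prompt)
            if not should_fail_over(status):
                # Success or a non-trigger status (e.g. 400) goes back to the caller as is.
                return tier, status, body
    # The loop is bounded, so a failing fallback cannot recurse or retry forever.
    raise HardStop(f"both tiers failed {max_passes} times")
```

A caller would catch `HardStop` and surface the error to a human instead of retrying again.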
Seven-Step Failover Runbook
- Instrument attempts: Log provider, model id, HTTP status, and duration for every call.
- Set client timeouts: Start with a 45 s socket read timeout for interactive calls and 90 s for codegen batches.
- Health-check Ollama: Run `curl -sS http://127.0.0.1:11434/api/tags` in launchd-friendly scripts.
- Map tools: Disable high-risk tools in the fallback tier if local models hallucinate on shell access.
- Cap retries: Maximum 2 round-trips per user intent before escalation.
- Persist state: Use `OPENCLAW_STATE_DIR` consistently, per the environment variables guide.
- Review weekly: Plot fallback percentage; tune model size if the secondary tier exceeds 35% of traffic.
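The Ollama health check from the runbook can also be done without shelling out to curl. This is a minimal sketch using only the Python standard library; the function name `ollama_healthy` and the 3-second default timeout are assumptions for illustration, and it treats the endpoint as healthy only when `/api/tags` returns 200 with a JSON object containing a `models` key.

```python
import http.client
import json
import urllib.error
import urllib.request


def ollama_healthy(base_url: str = "http://127.0.0.1:11434",
                   timeout_s: float = 3.0) -> bool:
    """Return True if GET /api/tags answers 200 with a JSON "models" field."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout_s) as resp:
            if resp.status != 200:
                return False
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, bad HTTP, or a non-JSON body all count as unhealthy.
        return False
    return isinstance(payload, dict) and "models" in payload
```

In a launchd-friendly script, the boolean can be turned into an exit code (0 when healthy, non-zero otherwise) so the calling job can decide whether the secondary tier is available.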
launchd: Start Ollama Before OpenClaw Gateway
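Declare Ollama as a LaunchDaemon with RunAtLoad, and have OpenClaw’s gateway unit start only after a TCP connection to port 11434 succeeds. launchd has no built-in way to order one job after another, so the gateway job has to perform that wait itself, for example in a wrapper script. On Apple Silicon, keep unified memory pressure in mind: parallel pulls of large GGUF files while OpenClaw spawns agents can trigger memory compression, so stagger model preloads to off-peak minutes.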
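One way to gate the gateway on the Ollama port is a small wait loop that runs before the gateway command. The sketch below is illustrative only: `wait_for_port` is a hypothetical helper, the 60-second deadline and 1-second interval are assumptions, and how the gateway itself is launched is left to your existing setup.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 11434,
                  deadline_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll until a TCP connect to host:port succeeds or the deadline passes."""
    stop_at = time.monotonic() + deadline_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= stop_at:
                return False
            time.sleep(interval_s)
```

A wrapper would call `wait_for_port()` first and exit non-zero when it returns `False`, so the gateway never starts against a missing local tier.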
Observability: Prove Failover in Audits
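Export JSON lines that include tier, latency_ms, and outcome. When compliance teams gather evidence, they can correlate these records with CLI hooks and audit logs. Before scaling up hardware, choose a low-latency region on the pricing page and measure RTT yourself.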
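As a rough illustration of that log format, the sketch below writes one JSON object per call and computes each tier's share of traffic from the resulting lines. Only the field names `tier`, `latency_ms`, and `outcome` come from the text above; the helper names and the example values (`"primary"`, `"secondary"`, `"ok"`) are assumptions.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def log_line(tier: str, latency_ms: int, outcome: str) -> str:
    """Serialize one call as a single JSON line with stable key order."""
    return json.dumps(
        {"tier": tier, "latency_ms": latency_ms, "outcome": outcome},
        sort_keys=True,
    )


def tier_shares(lines: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of logged calls handled by each tier."""
    counts = Counter(json.loads(line)["tier"] for line in lines if line.strip())
    total = sum(counts.values())
    return {tier: n / total for tier, n in counts.items()} if total else {}
```

Comparing the secondary tier's share with the 35% review threshold from the runbook then becomes a one-line check over a week of log lines.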