AI Automation, April 2, 2026

OpenClaw + Ollama API Failover (2026 Cloud Mac): Local LLM Routing When Cloud APIs Fail

MacLogin AI Solutions, April 2, 2026, about 10 min read

Teams running OpenClaw on rented Apple Silicon Mac minis hit the same wall: upstream LLM APIs return 429, time out after 45–60 seconds, or disappear during regional incidents, while a local Ollama instance on the same M4 host could still answer simpler prompts. The takeaway of this guide: define explicit routing tiers (cloud model first, quantized local model second) with measured timeouts, and log every switch between tiers. You will get a decision matrix, a seven-step runbook, launchd pairing notes, and observability rules so MacLogin nodes in Hong Kong, Japan, Korea, Singapore, and the US stay useful when vendors wobble.

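The baseline is the OpenClaw installation and deployment guide. For environment setup, see the environment variables and launchd guide. Keep scheduled jobs consistent with the cron and launchd guide.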

Why API Failover Matters for 24/7 OpenClaw Agents

Agents are not humans: they do not “try again later” gracefully unless you code it. Shared cloud Macs also carry neighbor noise—CPU spikes from another tenant’s Xcode build can lengthen token latency just enough to cross your HTTP client deadline.

  • Rate limits: Burst traffic from 3–5 parallel subagents can exhaust per-minute quotas.
  • Network partitions: Trans-Pacific paths to US APIs from Tokyo nodes occasionally exceed 200 ms RTT during peering events.
  • Cost caps: Hard stops on billing profiles surface as synthetic errors—your fallback must not recurse infinitely.

Tip: Keep a small quantized model (for example 7B–8B class) on SSD; loading multi-hundred-GB bundles on cloud Macs wastes IOPS budgets and slows failover.

Routing Matrix: Primary, Secondary, and Hard Stop

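Tier      | Trigger               | Typical model                      | Latency budget
----------+-----------------------+------------------------------------+---------------------------------------------------
Primary   | Healthy API, quota OK | Cloud-hosted frontier or mid model | ≤ 45 s connect + first token
Secondary | 429 / 5xx / timeout   | Ollama local on 127.0.0.1          | ≤ 120 s for batch summarization
Hard stop | Both tiers fail twice | None; surface the error to a human | Alert if > 15% of sessions per hour reach this tier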
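
The matrix above can be sketched as a small bounded router. This is a minimal illustration, not OpenClaw's own routing code: `ProviderError`, `RouteResult`, and `route` are hypothetical names, and the two provider callables are assumed to enforce their own latency budgets (45 s and 120 s) and to raise `ProviderError` on 429, 5xx, or timeout.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class ProviderError(Exception):
    """Hypothetical error a provider callable raises on 429, 5xx, or timeout."""


@dataclass
class RouteResult:
    tier: str                      # "primary", "secondary", or "hard_stop"
    text: Optional[str] = None
    attempts: List[dict] = field(default_factory=list)


def route(prompt: str,
          primary: Callable[[str], str],
          secondary: Callable[[str], str],
          max_rounds: int = 2) -> RouteResult:
    """Try primary, then secondary; hard stop once both failed max_rounds times."""
    attempts: List[dict] = []
    tiers = (("primary", primary), ("secondary", secondary))
    for _ in range(max_rounds):            # bounded loop: no infinite fallback
        for name, call in tiers:
            start = time.monotonic()
            try:
                text = call(prompt)
                outcome = "ok"
            except ProviderError as exc:
                text, outcome = None, f"error: {exc}"
            latency_ms = int((time.monotonic() - start) * 1000)
            attempts.append(
                {"tier": name, "latency_ms": latency_ms, "outcome": outcome}
            )
            if outcome == "ok":
                return RouteResult(name, text, attempts)
    # Both tiers failed in every round: surface the error to a human.
    return RouteResult("hard_stop", None, attempts)
```

Because the loop is a fixed `for` over rounds and tiers, a billing hard stop that keeps failing can never trigger unbounded fallback; the caller receives `tier == "hard_stop"` together with the full attempt history.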

Seven-Step Failover Runbook

  1. Instrument attempts: Log provider, model id, HTTP status, and duration for every call.
  2. Set client timeouts: Start 45 s socket read for interactive, 90 s for codegen batches.
  3. Health-check Ollama: curl -sS http://127.0.0.1:11434/api/tags in launchd-friendly scripts.
  4. Map tools: Disable high-risk tools in fallback tier if local models hallucinate on shell access.
  5. Cap retries: Maximum 2 round-trips per user intent before escalation.
  6. Persist state: Use OPENCLAW_STATE_DIR consistently, per the environment variables guide.
  7. Review weekly: Plot fallback percentage; tune model size if secondary tier exceeds 35% of traffic.
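
Step 3 can also be expressed in Python for scripts that need more than an exit code. The sketch below queries the same `/api/tags` endpoint as the curl command; the function name `ollama_healthy` and the optional `required_model` argument are illustrative assumptions, and the check assumes the endpoint returns a JSON object with a `models` list whose entries carry a `name` field.

```python
import http.client
import json
import urllib.request

# Loopback checks should never be sent through an HTTP proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def ollama_healthy(base_url="http://127.0.0.1:11434",
                   timeout_s=3.0,
                   required_model=None):
    """Return True if GET /api/tags answers 200 with a JSON 'models' list.

    If required_model is given, it must also appear among the model names.
    """
    try:
        with _OPENER.open(f"{base_url}/api/tags", timeout=timeout_s) as resp:
            if resp.status != 200:
                return False
            body = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or invalid JSON.
        return False
    if not isinstance(body, dict) or not isinstance(body.get("models"), list):
        return False
    if required_model is None:
        return True
    return any(
        isinstance(m, dict) and m.get("name") == required_model
        for m in body["models"]
    )
```

Passing `required_model` is useful before failover: a running daemon without the quantized model pulled would otherwise look healthy and then stall on the first request.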

launchd: Start Ollama Before OpenClaw Gateway

Declare Ollama as a LaunchDaemon with RunAtLoad. launchd has no declarative ordering between jobs, so have OpenClaw's gateway job wait for a successful TCP connection to port 11434 before it starts serving. On Apple Silicon, keep unified memory pressure in mind: parallel pulls of large GGUF files while OpenClaw spawns agents can trigger memory compression, so stagger model preload to off-peak minutes.
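
One way to gate the gateway on Ollama's port is a small wrapper that polls the socket and then hands over to the real command. The following is a sketch under assumptions: `wait_for_port` and `main` are hypothetical helpers, the 60 s deadline is an arbitrary starting value, and the gateway command is passed in as arguments rather than hard-coded, since its exact invocation depends on your install.

```python
import os
import socket
import time


def wait_for_port(host="127.0.0.1", port=11434, deadline_s=60.0, interval_s=1.0):
    """Poll until a TCP connect to host:port succeeds or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= end:
                return False
            time.sleep(interval_s)


def main(argv):
    """Wait for Ollama, then replace this process with the given command.

    Returns a non-zero exit code if no command was given or Ollama never
    came up; on success os.execvp does not return.
    """
    if not argv:
        return 2
    if not wait_for_port():
        return 1
    os.execvp(argv[0], argv)
```

To use it as a launchd wrapper, end the script with `sys.exit(main(sys.argv[1:]))` and list the gateway command after the script path in the job's ProgramArguments. A non-zero exit lets a KeepAlive policy retry the job instead of starting the gateway without its fallback tier.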

Security: Never expose Ollama to the public interface. If you need remote pull, tunnel over SSH instead of opening the port on the cloud Mac edge.

Observability: Prove Failover in Audits

Export JSON lines with the fields tier, latency_ms, and outcome. Correlate them with the CLI hooks and audit logs guide when compliance teams ask for evidence. When you are ready to scale hardware, pick a low-latency MacLogin region on the pricing page and validate RTT before locking API regions.
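
As a rough illustration of the export and the weekly review, the sketch below writes one JSON line per routed request and computes each tier's share of traffic. The helper names (`record`, `tier_shares`, `review`) and the tier value `"hard_stop"` are assumptions made for this example; the 35% and 15% thresholds come from the runbook and the routing matrix. It also assumes one line per session, so per-record shares approximate per-session shares.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List

SECONDARY_TUNE_THRESHOLD = 0.35   # weekly review step in the runbook
HARD_STOP_ALERT_THRESHOLD = 0.15  # hard stop row in the routing matrix


def record(tier: str, latency_ms: int, outcome: str) -> str:
    """Serialize one routing decision as a single JSON line."""
    return json.dumps({"tier": tier, "latency_ms": latency_ms, "outcome": outcome})


def tier_shares(lines: Iterable[str]) -> Dict[str, float]:
    """Return each tier's fraction of all records; blank lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)["tier"]] += 1
    total = sum(counts.values())
    return {tier: n / total for tier, n in counts.items()} if total else {}


def review(shares: Dict[str, float]) -> List[str]:
    """Flag the conditions the guide asks operators to watch."""
    flags = []
    if shares.get("secondary", 0.0) > SECONDARY_TUNE_THRESHOLD:
        flags.append("secondary share above 35%: revisit local model size")
    if shares.get("hard_stop", 0.0) > HARD_STOP_ALERT_THRESHOLD:
        flags.append("hard stops above 15%: alert a human")
    return flags
```

Feeding `tier_shares` one hour of lines at a time matches the per-hour hard stop alert, while a full week of lines matches the weekly review.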

Connectivity questions belong in Help; debugging GUI issues alongside agents may still need VNC for Keychain or browser-based OAuth flows.

Rent Apple Silicon and Route Models with Intent

Set up a cloud Mac and deploy OpenClaw. Keep Ollama as the safety net.