Should Ollama listen on localhost only?

Yes on shared cloud Macs: bind to 127.0.0.1 and firewall port 11434 so that only OpenClaw on the same host consumes the API. Exposing Ollama to the LAN multiplies abuse risk.

What timeout is reasonable before falling back?

Start with 30–45 seconds for interactive agents and 60–90 seconds for batch jobs; tune with p95 latency from your region to the vendor API.

Does failover replace monitoring?

No—emit structured logs on every switch and alert if fallback rate exceeds a threshold such as 15 percent over a rolling hour.
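The rolling-hour threshold mentioned above can be tracked with a small in-memory counter. This is a minimal sketch, not part of OpenClaw: the class name `FallbackRateMonitor` and its methods are hypothetical, and the 15 percent default simply mirrors the example threshold in the answer.

```python
from collections import deque
import time


class FallbackRateMonitor:
    """Tracks the share of calls served by the fallback tier in a rolling window.

    Hypothetical helper for illustration; not an OpenClaw or Ollama API.
    """

    def __init__(self, window_s=3600.0, threshold=0.15):
        self.window_s = window_s
        self.threshold = threshold
        self._events = deque()  # (timestamp, used_fallback) pairs, oldest first

    def record(self, used_fallback, now=None):
        now = time.time() if now is None else now
        self._events.append((now, bool(used_fallback)))
        self._evict(now)

    def _evict(self, now):
        # Drop events that have aged out of the rolling window.
        while self._events and now - self._events[0][0] > self.window_s:
            self._events.popleft()

    def rate(self, now=None):
        now = time.time() if now is None else now
        self._evict(now)
        if not self._events:
            return 0.0
        fallbacks = sum(1 for _, used in self._events if used)
        return fallbacks / len(self._events)

    def should_alert(self, now=None):
        # Alert only when the rate strictly exceeds the threshold.
        return self.rate(now) > self.threshold
```

Call `record(True)` whenever a request is served by the local tier and `record(False)` otherwise, then poll `should_alert()` from whatever alerting job you already run.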

AI Automation, April 2, 2026

OpenClaw + Ollama API Fallback (2026 Cloud Mac): Local LLM Routing When Cloud APIs Fail

MacLogin AI Automation Team, April 2, 2026, about a 10-minute read

Teams running OpenClaw on rented Apple Silicon Mac minis hit the same wall: upstream LLM APIs return 429, time out after 45–60 seconds, or disappear during regional incidents—while local Ollama on the same M4 host could answer simpler prompts. This guide’s takeaway: define an explicit routing tier—cloud model first, quantized local model second—with measured timeouts and logs every time you switch. You will get a decision matrix, a seven-step runbook, launchd pairing notes, and observability rules so MacLogin nodes in Hong Kong, Japan, Korea, Singapore, and the US stay useful when vendors wobble.

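For the baseline install, see OpenClaw Installation and Deployment; for environment variables and launchd, see Environment Variables and launchd. Align periodic tasks with OpenClaw cron and launchd so that fallback does not conflict with the scheduler.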

Why API Failover Matters for 24/7 OpenClaw Agents

Agents are not humans: they do not “try again later” gracefully unless you code it. Shared cloud Macs also carry neighbor noise—CPU spikes from another tenant’s Xcode build can lengthen token latency just enough to cross your HTTP client deadline.

Rate limits: Burst traffic from 3–5 parallel subagents can exhaust per-minute quotas.
Network partitions: Trans-Pacific paths to US APIs from Tokyo nodes occasionally exceed 200 ms RTT during peering events.
Cost caps: Hard stops on billing profiles surface as synthetic errors—your fallback must not recurse infinitely.

Tip: Keep a small quantized model (for example 7B–8B class) on SSD; loading multi-hundred-GB bundles on cloud Macs wastes IOPS budgets and slows failover.

Routing Matrix: Primary, Secondary, and Hard Stop

Tier	Trigger	Typical model	Latency budget
Primary	Healthy API, quota OK	Cloud-hosted frontier or mid model	≤ 45 s connect+first token
Secondary	429/5xx/timeout	Ollama local on `127.0.0.1`	≤ 120 s for batch summarization
Hard stop	Both tiers fail twice	None—surface error to human	Alert if > 15% sessions/hour hit this
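The matrix above can be read as a small decision procedure. The following is a sketch under stated assumptions: `Attempt`, `should_fall_back`, and `route` are hypothetical names, the two callables stand in for whatever client code talks to the cloud API and to Ollama, and treating other 4xx responses as an immediate hard stop is an assumption rather than something the matrix specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    """Result of one call to a tier (hypothetical shape for illustration)."""
    ok: bool
    status: Optional[int] = None
    timed_out: bool = False
    text: str = ""


def should_fall_back(status: Optional[int], timed_out: bool) -> bool:
    """True when the Secondary trigger fires: 429, any 5xx, or a timeout."""
    if timed_out:
        return True
    return status is not None and (status == 429 or 500 <= status <= 599)


def route(prompt, call_primary, call_secondary, max_rounds=2):
    """Try primary then secondary each round; hard-stop after max_rounds."""
    for _ in range(max_rounds):
        first = call_primary(prompt)
        if first.ok:
            return "primary", first.text
        if not should_fall_back(first.status, first.timed_out):
            # Assumption: errors such as 400 or 401 go straight to a hard stop,
            # since a local model cannot fix a bad request or bad credentials.
            break
        second = call_secondary(prompt)
        if second.ok:
            return "secondary", second.text
    return "hard_stop", ""
```

Because the loop is bounded by `max_rounds`, a billing hard stop or a dead local model cannot cause infinite recursion; the caller receives `"hard_stop"` and can surface the error to a human.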

Seven-Step Failover Runbook

Instrument attempts: Log provider, model id, HTTP status, and duration for every call.
Set client timeouts: Start with a 45 s socket read timeout for interactive agents and 90 s for codegen batches.
Health-check Ollama: Run `curl -sS http://127.0.0.1:11434/api/tags` in launchd-friendly scripts.
Map tools: Disable high-risk tools in fallback tier if local models hallucinate on shell access.
Cap retries: Maximum 2 round-trips per user intent before escalation.
Persist state: Use `OPENCLAW_STATE_DIR` consistently, as described in the environment variables guide.
Review weekly: Plot fallback percentage; tune model size if secondary tier exceeds 35% of traffic.
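The health-check step above can also be done from Python when the fallback code needs to know whether a specific model is already pulled. This is a minimal sketch: it assumes the `/api/tags` response is a JSON object with a `models` array whose entries carry a `name` field, and the helper names and the model name in the usage note are hypothetical.

```python
import http.client
import json
import urllib.request

OLLAMA_TAGS_URL = "http://127.0.0.1:11434/api/tags"


def model_names(body: bytes) -> list:
    """Extract model names from an /api/tags response body."""
    payload = json.loads(body)
    return [m.get("name") for m in payload.get("models", [])]


def ollama_ready(required_model: str,
                 url: str = OLLAMA_TAGS_URL,
                 timeout_s: float = 3.0) -> bool:
    """True only if Ollama answers and lists the required model."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return required_model in model_names(resp.read())
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed JSON, or a broken HTTP
        # response all mean the fallback tier is not usable right now.
        return False
```

For example, `ollama_ready("example-8b:latest")` returns `False` both when the daemon is down and when it is up but the model has not been pulled, so the router can skip the secondary tier instead of waiting out a long timeout.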

launchd: Start Ollama Before OpenClaw Gateway

Declare Ollama as a LaunchDaemon with RunAtLoad, and have OpenClaw’s gateway job wait until a TCP connection to port 11434 succeeds before it starts. launchd has no built-in dependency ordering between jobs, so this wait usually lives in a small wrapper script. On Apple Silicon, keep Unified Memory pressure in mind: parallel pulls of large GGUF files while OpenClaw spawns agents can trigger memory compression, so stagger model preload to off-peak minutes.
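One way to gate the gateway on Ollama's port is a short polling loop run before the gateway process starts. This is a sketch only: `wait_for_port` is a hypothetical helper, and the 60-second deadline and 1-second interval are assumed defaults to tune for your host.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=11434,
                  deadline_s=60.0, interval_s=1.0) -> bool:
    """Poll until a TCP connection succeeds or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= end:
                return False
            time.sleep(interval_s)
```

A wrapper would call `wait_for_port()` and exit non-zero when it returns `False`, leaving it to launchd's restart policy to try again rather than starting the gateway without a fallback tier.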

See also: Ollama vs. LM Studio: choosing between them.

Security: Never expose Ollama to the public interface. If you need remote pull, tunnel over SSH instead of opening the port on the cloud Mac edge.

Observability: Prove Failover in Audits

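Export JSON lines that include `tier`, `latency_ms`, and `outcome`. When compliance teams gather evidence, they can correlate CLI hooks with audit logs. Before scaling up hardware, pick a low-latency region on the pricing page and measure RTT yourself.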
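A JSON-lines record per attempt might look like the sketch below. The `tier`, `latency_ms`, and `outcome` fields come from this section, and provider, model, and status follow the runbook's instrumentation step; the `ts` field, the function name `log_attempt`, and the example values in the usage note are assumptions for illustration.

```python
import json
import sys
import time


def log_attempt(tier, provider, model, status, latency_ms, outcome,
                stream=sys.stdout):
    """Write one JSON line describing a single model call and return it."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),  # assumed field
        "tier": tier,              # "primary", "secondary", or "hard_stop"
        "provider": provider,
        "model": model,
        "status": status,          # HTTP status, or None on timeout
        "latency_ms": latency_ms,
        "outcome": outcome,
    }
    line = json.dumps(record, sort_keys=True)
    stream.write(line + "\n")
    return line
```

For instance, `log_attempt("secondary", "ollama", "example-8b", None, 1840, "ok")` emits a single line that tools such as `jq` can filter by tier when you compute the weekly fallback percentage.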

For connection issues, see the Help page. GUI debugging alongside agents may still require VNC to handle Keychain prompts or browser OAuth.

Rent Apple Silicon, Route Models Strategically

Provision a cloud Mac, deploy OpenClaw, and treat Ollama as a safety net rather than the only brain.
