
OpenClaw with Ollama: Running Local AI Models

nacre.sh Team · May 4, 2026 · 9 min read

Configure OpenClaw with Ollama to run AI models locally. Complete guide for privacy-first setups with no API costs — setup, model recommendations, and performance.

Tags: openclaw ollama local llm, openclaw local ai, ollama setup, offline ai agent

OpenClaw with Ollama gives you a fully local AI agent: no API costs, no data sent to external servers, and no dependency on third-party service availability. Ollama runs LLMs directly on your hardware and exposes an OpenAI-compatible API that OpenClaw connects to seamlessly. This guide covers setup, model selection, and realistic performance expectations.

Requirements

Local LLM performance scales directly with RAM. As a rough guide:

RAM | Best Model | Quality Level
--- | --- | ---
8GB | Llama 3.2 3B, Phi-3.5 mini | Good for simple tasks
16GB | Llama 3.1 8B, Mistral 7B | Very good general use
32GB | Llama 3.1 70B Q4, Qwen 2.5 32B | Near GPT-3.5 quality
64GB+ | Llama 3.1 70B full, Qwen 2.5 72B | Excellent for most tasks

Apple Silicon (M2/M3/M4) unified memory makes Macs particularly efficient: a Mac Mini M4 Pro with 24GB runs Llama 3.1 8B or Mistral 7B very comfortably, and higher-memory configurations can take on the quantized 70B models.
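
Before picking a model, it helps to check how much memory the machine actually has. These are standard system commands, nothing Ollama-specific:

# macOS: total unified memory, converted from bytes to GB
sysctl -n hw.memsize | awk '{print $1/1073741824 " GB"}'

# Linux: total and available RAM
free -h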

Installing Ollama

# macOS or Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download installer from ollama.com

# Verify installation
ollama --version

Pulling Models

# General purpose, excellent for most agent tasks
ollama pull llama3.1:8b

# Lightweight and fast, good for high-frequency tasks
ollama pull phi3.5

# Strong coding model at this size
ollama pull qwen2.5-coder:7b

# More capable, needs 32GB+ RAM
ollama pull llama3.1:70b
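
After pulling, confirm what is installed and give a model a quick interactive smoke test:

# List downloaded models and their sizes
ollama list

# Chat with a model interactively (type /bye to exit)
ollama run llama3.1:8b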

Running Ollama as a Service

# Start Ollama server (it listens on localhost:11434)
ollama serve

On macOS, Ollama starts automatically as a menu bar app. On Linux, enable the systemd service:

sudo systemctl enable ollama
sudo systemctl start ollama
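
To confirm the server is reachable, query its HTTP API directly. The /api/tags route lists the models available locally, and /api/generate runs a one-off completion:

# Should return a JSON list of the models you pulled
curl http://localhost:11434/api/tags

# One-off generation request against the native API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'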

Configuring OpenClaw for Ollama

Point OpenClaw's LLM settings at the local Ollama endpoint:

{
  "llm": {
    "provider": "ollama",
    "base_url": "http://localhost:11434",
    "model": "llama3.1:8b",
    "temperature": 0.7,
    "max_tokens": 4096
  }
}
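
Ollama also exposes the OpenAI-compatible route that clients like OpenClaw use, at /v1/chat/completions. A quick curl confirms the model responds on that route before you point OpenClaw at it; no real API key is needed, since Ollama ignores the Authorization header:

# Test the OpenAI-compatible chat endpoint OpenClaw will talk to
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Reply with OK if you can hear me."}]
  }'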

Performance Expectations

Local models are slower than cloud APIs for generation. Representative numbers on a Mac Mini M4 with 16GB (the 70B quantization needs a higher-memory configuration):

Model | Tokens/second | Typical response time
--- | --- | ---
Phi-3.5 mini | 60–80 | 1–3 seconds
Llama 3.1 8B | 40–55 | 3–8 seconds
Llama 3.1 70B Q4 | 8–15 | 20–60 seconds

For an agent that processes a few messages per hour, these speeds are perfectly acceptable.
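
You can measure throughput on your own hardware with the --verbose flag, which prints timing statistics, including the eval rate in tokens per second, after each response:

# Prints load time, prompt eval rate, and eval rate after the reply
ollama run llama3.1:8b --verbose "Summarize the plot of Hamlet in two sentences."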

Hybrid Setup: Local + Cloud Fallback

Many users combine Ollama for routine tasks with a cloud LLM for complex reasoning:

{
  "llm": {
    "provider": "ollama",
    "model": "llama3.1:8b",
    "fallback_provider": "anthropic",
    "fallback_model": "claude-3-5-sonnet-20261022",
    "fallback_api_key": "sk-ant-...",
    "fallback_on_complexity": true
  }
}

nacre.sh with Ollama

nacre.sh supports connecting to an external Ollama instance via the dashboard's LLM provider settings. Your Ollama endpoint must be publicly accessible or within the same network as your nacre.sh instance.
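
By default Ollama binds only to localhost. To reach it from another machine, such as a nacre.sh instance on the same network, set OLLAMA_HOST so it listens on all interfaces, and keep the port behind a firewall or VPN, since the Ollama API has no authentication:

# Listen on all interfaces instead of 127.0.0.1
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On Linux with systemd, set it for the service instead
sudo systemctl edit ollama
# add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl restart ollama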

Frequently Asked Questions

Which Ollama model is best for OpenClaw tasks?

For general agent use, Llama 3.1 8B is the best combination of quality and speed on 16GB+ systems. For coding tasks, Qwen 2.5 Coder 7B is excellent.

Can Ollama use my GPU?

Yes. Ollama automatically uses NVIDIA GPUs (via CUDA), AMD GPUs (via ROCm), and Apple Silicon's Metal framework. GPU acceleration significantly increases generation speed.
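
You can confirm where a loaded model is actually running: ollama ps shows whether the active model sits on the GPU, the CPU, or is split between them.

# With a model loaded, the PROCESSOR column shows GPU vs CPU placement
ollama run llama3.1:8b "hello"
ollama ps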

Is a locally running agent completely private?

Yes. Messages never leave your device when using Ollama with a local OpenClaw instance. This is the maximum privacy configuration available.

