Raspberry AI

Ollama + Claude Code Setup Guide

To run Claude Code on a Raspberry Pi 5 through Ollama, install both tools, point Claude Code at the Ollama endpoint, and choose models that suit the Pi's limited CPU and memory: either cloud-hosted models served through Ollama or small local ones.

1. Install Prerequisites

Install Ollama and Node.js (required for Claude Code) on your Raspberry Pi OS (64-bit recommended).

# Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

# Install Node.js 22 (if not already installed)

curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -

sudo apt install -y nodejs git
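After the install commands finish, a quick check can confirm that the tools are on your PATH. The following is a minimal sketch, not part of the official install steps; the helper name check_tool is made up for illustration.

```shell
# check_tool NAME: report whether NAME is on PATH (hypothetical helper).
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Tools installed in this step (npm ships with the NodeSource nodejs package).
for tool in ollama node npm git; do
  check_tool "$tool"
done

# A 64-bit Raspberry Pi OS install reports aarch64 here.
echo "architecture: $(uname -m)"
```

Any line that reports "missing" points at the install command to rerun.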

2. Install Claude Code

Install the official Claude Code CLI globally via npm.

npm install -g @anthropic-ai/claude-code

3. Configure Environment Variables

Set ANTHROPIC_BASE_URL to point to your local Ollama instance and ANTHROPIC_AUTH_TOKEN to the placeholder value ollama. Add these lines to your ~/.bashrc or ~/.zshrc:

export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_AUTH_TOKEN=ollama

Reload your shell: source ~/.bashrc
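Before launching Claude Code, it can help to confirm that the Ollama server is actually listening. The sketch below assumes the default address http://localhost:11434 and uses Ollama's /api/version endpoint; the helper name check_ollama is hypothetical.

```shell
# check_ollama URL: report whether an Ollama server answers at URL (hypothetical helper).
check_ollama() {
  if curl -fsS --max-time 3 "$1/api/version" >/dev/null 2>&1; then
    echo "reachable: $1"
  else
    echo "unreachable: $1"
  fi
}

# Default Ollama address; adjust if your server listens elsewhere.
check_ollama "http://localhost:11434"
```

If the server is unreachable, the Ollama service is probably not running yet.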

4. Pull Cloud-Hosted Models That Support Tool Calling

Because the Raspberry Pi has limited RAM (even on the 8GB or 16GB boards), use cloud-hosted models. They are ideal for users without high-end local hardware.

No Large Download: Unlike local models, a cloud model does not download gigabytes of weights to your disk. The pull only registers the model reference locally.

Authentication Required: You must be signed in to use cloud models.

If you haven't already, from the command line run: ollama signin

How to Pull Cloud Models

Run the standard pull command with a cloud model tag (usually the :cloud suffix), for example:

ollama pull qwen3.5:cloud

Cloud Models via Ollama (Free Tier)

The recommended cloud models with full tool-calling capabilities for Claude Code are:

qwen3.5:cloud: A strong, all-around coding model competitive with proprietary options.

glm-5:cloud: Performs well on code generation benchmarks.

qwen3-coder:480b-cloud: A very large 480B parameter model for demanding tasks.

kimi-k2.5:cloud: Excels at multi-step reasoning.

These cloud models run on Ollama's infrastructure, are free with rate limits, offer full 128K+ context windows, and provide performance close to frontier models like Claude Opus.
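To register several of the cloud models listed above in one pass, a small loop is enough. This is a sketch only: the DRY_RUN variable and the pull_model helper are assumptions introduced here, and by default the script only prints the commands it would run.

```shell
# Cloud model tags taken from the list above.
CLOUD_MODELS="qwen3.5:cloud glm-5:cloud qwen3-coder:480b-cloud kimi-k2.5:cloud"

# pull_model TAG: print the pull command, or run it when DRY_RUN=0.
# DRY_RUN is a hypothetical switch used only in this sketch.
pull_model() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    ollama pull "$1"
  else
    echo "would run: ollama pull $1"
  fi
}

for tag in $CLOUD_MODELS; do
  pull_model "$tag"
done
```

Set DRY_RUN=0 in your shell before running the loop to perform the pulls for real, after signing in with ollama signin.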

4a. Local Models (Free, No Rate Limits)

GLM-4.7-flash: This is the most highly recommended model. It features native tool-calling support and a 128K context window, and it is optimized for agentic workflows, making it the top choice for running Claude Code locally. Download size: 19GB.

Qwen3-Coder: This model also supports tool calling and is suitable for coding tasks. While reliable, it may be slightly less consistent than GLM-4.7-flash on complex, multi-step operations. Download size: 51GB.
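Before pulling a local model, it is worth comparing its download size with the RAM on your board. The sketch below is a rough check only: it reads total memory from /proc/meminfo and ignores the extra memory needed for the context window and the operating system. The helper name fits_in_ram is hypothetical.

```shell
# fits_in_ram MODEL_GB TOTAL_GB: rough check that a model is smaller than total RAM.
fits_in_ram() {
  if [ "$1" -lt "$2" ]; then
    echo "fits"
  else
    echo "too large"
  fi
}

# Total RAM in whole GB, read from /proc/meminfo (Linux only; 0 if unavailable).
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
total_gb=$(( ${total_kb:-0} / 1024 / 1024 ))

# Sizes are the download sizes quoted above.
echo "GLM-4.7-flash (19 GB): $(fits_in_ram 19 "$total_gb")"
echo "Qwen3-Coder (51 GB): $(fits_in_ram 51 "$total_gb")"
```

A model that reports "too large" will swap heavily or fail to load, so a cloud model is the safer choice in that case.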

5. Launch Claude Code

Run Claude Code through Ollama, specifying the model to use:

ollama launch claude --model <model_name>

ollama launch claude --model qwen3.5:cloud

When you use that command, Ollama starts Claude Code and sets the required environment variables for that session (ANTHROPIC_BASE_URL=http://localhost:11434 and ANTHROPIC_AUTH_TOKEN=ollama). All traffic then goes to your local Ollama server instead of Anthropic's API, so no Anthropic account or API key is needed. With a local model, inference stays on the Pi, which makes it free and private. With a cloud model, Ollama forwards requests to its own cloud service, which is why the ollama signin step is required and rate limits apply.
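For comparison, the same routing can be set up by hand by passing the two variables to the claude command directly. This is a sketch of what is assumed to be a roughly equivalent manual invocation, not a description of what ollama launch does internally; the function name run_claude_via_ollama is hypothetical.

```shell
# run_claude_via_ollama MODEL: start Claude Code against the local Ollama server.
# Hypothetical helper; the variable values are the ones named in this guide.
run_claude_via_ollama() {
  if ! command -v claude >/dev/null 2>&1; then
    echo "claude not found on PATH"
    return 0
  fi
  # The variables apply to this one invocation only.
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  ANTHROPIC_AUTH_TOKEN=ollama \
  claude --model "$1"
}
```

Calling run_claude_via_ollama qwen3.5:cloud starts an interactive session, and the variables do not leak into the rest of your shell.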

Important Considerations for Raspberry Pi

Performance: Small models (1.5B–3B) will be slow; expect significant latency for code generation.
Memory Management: Avoid running other heavy processes. If using a Raspberry Pi 4, it is strongly recommended to skip local Ollama entirely and use cloud API keys instead, as local inference will likely cause swapping and system hangs.
Context Window: Keep prompts short. Large context windows will exhaust Pi memory quickly.
