Ollama

Run large language models (LLMs) locally on your own machine or server. Ollama supports models such as Llama, Mistral, and Gemma.

Review notes

Resource requirements depend on model size: 7B models need at least 8 GB of RAM. A GPU significantly improves inference speed.

Deployment guide

Ollama runs as a single Docker container. Models are pulled after startup.

Run the Ollama container with a volume for model storage.
Pull the first model with `docker exec ollama ollama pull llama3.2`.
Test with `docker exec -it ollama ollama run llama3.2` or call the API on port 11434.
Pair with Open WebUI for a chat interface.
Configure GPU passthrough if an NVIDIA GPU is available.
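
For the GPU step, here is a minimal sketch that uses plain `docker run` as an alternative to the compose file. It assumes the NVIDIA Container Toolkit is already installed on the host, and it treats a working `nvidia-smi` only as a rough signal that a GPU and driver are present. The helper names `gpu_flags` and `start_ollama` are made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Ollama or Docker.

# Print the docker flag for GPU access when nvidia-smi works, else print nothing.
# Assumption: a working nvidia-smi means the driver is usable; the NVIDIA
# Container Toolkit must be installed separately for --gpus to take effect.
gpu_flags() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    printf '%s' '--gpus=all'
  fi
}

# Start Ollama with the same image, volume, and port as the compose file.
start_ollama() {
  local flags
  flags="$(gpu_flags)"
  # ${flags:+"$flags"} expands to no argument at all when no GPU was detected.
  docker run -d ${flags:+"$flags"} \
    -v "$PWD/models:/root/.ollama" \
    -p 11434:11434 \
    --restart unless-stopped \
    --name ollama \
    ollama/ollama:latest
}
```

Run `start_ollama` from the directory that should hold `./models`. Because it reuses the container name `ollama`, use either this or the compose file, not both at once.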

Backup: Models can be re-downloaded. Back up the config directory if you have custom Modelfiles.
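
A backup of that directory could be sketched as a timestamped archive. The function name `backup_dir` and the archive naming scheme are assumptions for illustration; which directory actually holds your Modelfiles depends on your setup.

```shell
#!/usr/bin/env bash
# Sketch only: backup_dir is a hypothetical helper.
# Usage: backup_dir SRC_DIR DEST_DIR
# Prints the path of the archive it created.
backup_dir() {
  local src="$1" dest="$2" stamp archive
  [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
  mkdir -p "$dest" || return 1
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$dest/ollama-backup-$stamp.tar.gz"
  # -C keeps the paths inside the archive relative to SRC_DIR.
  tar -czf "$archive" -C "$src" . || return 1
  printf '%s\n' "$archive"
}
```

For example, `backup_dir /opt/ollama/models /var/backups/ollama` would archive the mounted volume from the compose setup (the destination path here is an assumption). Keep the destination outside the source directory. The archive can be large if the source contains downloaded model weights, and reading files written by the container may require `sudo`.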

Copy and run on your server

The two blocks are alternatives: save the compose file yourself, or run the bash script, which writes the same compose file and starts the container.

docker-compose.yml (yaml)

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ./models:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

setup.sh (bash)

#!/usr/bin/env bash
set -euo pipefail

# Create the working directory and hand it to the current user and primary group.
sudo mkdir -p /opt/ollama
sudo chown "$(id -un)":"$(id -gn)" /opt/ollama
cd /opt/ollama

cat > docker-compose.yml <<'COMPOSE'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ./models:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped
COMPOSE

docker compose up -d
echo "Ollama is running on http://SERVER_IP:11434"
echo "Pull a model with: docker exec ollama ollama pull llama3.2"
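
After `docker compose up -d` returns, the server may need a moment before it accepts requests. The sketch below waits for the API and then sends one prompt. It assumes `curl` is installed, that it runs on the server itself (hence the `localhost` default for `OLLAMA_URL`), and that `llama3.2` has already been pulled. `/api/tags` and `/api/generate` are Ollama REST endpoints; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Poll the API until it answers, up to $1 attempts (default 30), one second apart.
wait_for_ollama() {
  local tries="${1:-30}" i
  for ((i = 1; i <= tries; i++)); do
    curl -fsS "$OLLAMA_URL/api/tags" >/dev/null 2>&1 && return 0
    sleep 1
  done
  echo "Ollama did not answer at $OLLAMA_URL" >&2
  return 1
}

# Build the JSON body for a non-streaming POST /api/generate request.
# Assumes model name and prompt contain no double quotes or backslashes.
generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Send one prompt and print the raw JSON reply.
ollama_generate() {
  curl -fsS "$OLLAMA_URL/api/generate" -d "$(generate_payload "$1" "$2")"
}
```

A typical call would be `wait_for_ollama && ollama_generate llama3.2 "Why is the sky blue?"`. For prompts with quotes or other special characters, build the JSON with a proper encoder instead of `printf`.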

Stack

Go, Docker