Local LLM
Ollama
Run large language models locally on your own machine or server. Supports models such as Llama, Mistral, and Gemma.
Review notes
Resource requirements depend on model size. 7B models need at least 8 GB of RAM. A GPU significantly improves inference speed.
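To compare a host against the 8 GB figure above, the following is a minimal sketch that reports total memory. It assumes a Linux host with `/proc/meminfo`; the helper name `kb_to_gib` is illustrative and not part of Ollama.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert kB (the unit used in /proc/meminfo)
# to GiB with one decimal place.
kb_to_gib() {
  awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / 1048576 }'
}

# Linux only: read MemTotal and print it in GiB.
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  echo "Total RAM: $(kb_to_gib "$total_kb") GiB"
fi
```

MemTotal is usually slightly below the installed RAM because the kernel reserves some memory, so a nominal 8 GB machine may report a little under 8.0.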
Deployment guide
Run a simple Docker container. Pull models after startup.
- Run the Ollama container with a volume for model storage.
- Pull the first model with `docker exec ollama ollama pull llama3.2`.
- Test with `docker exec -it ollama ollama run llama3.2`, or call the API on port 11434.
- Pair with Open WebUI for a chat interface.
- Configure GPU passthrough if an NVIDIA GPU is available.
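The steps above can be sketched as a few shell helpers. This is a minimal sketch, not an official script: the function names and the `OLLAMA_URL` variable are hypothetical, the container is assumed to be named `ollama` as in the compose file, and the GPU override assumes the NVIDIA Container Toolkit is already installed on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the deployment steps; names are illustrative.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Pull a model inside the container named "ollama".
pull_model() {
  docker exec ollama ollama pull "$1"
}

# Build the JSON body for POST /api/generate.
# Assumes model and prompt contain no double quotes or backslashes.
generate_body() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Send one non-streaming prompt to the API and print the raw JSON reply.
ask() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(generate_body "$1" "$2")"
}

# Write a Compose override file that reserves all NVIDIA GPUs for the service.
# Assumes the NVIDIA Container Toolkit is installed on the host.
write_gpu_override() {
  cat > "${1:-docker-compose.override.yml}" <<'COMPOSE'
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
COMPOSE
}
```

After sourcing the file, `pull_model llama3.2` followed by `ask llama3.2 "Why is the sky blue?"` exercises the pull and API steps. Running `write_gpu_override` next to `docker-compose.yml` and then `docker compose up -d` applies the GPU reservation, since Compose merges `docker-compose.override.yml` automatically.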
Backup: Models can be re-downloaded. Back up the config directory if you have custom Modelfiles.
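One way to archive a directory of custom Modelfiles is a dated tarball. The sketch below is an illustration only: the `backup_dir` helper and both paths in the usage note are hypothetical, so substitute the directory where your Modelfiles actually live.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive a directory into a dated .tar.gz
# and print the path of the archive it created.
backup_dir() {
  local src=$1 dest_dir=$2
  local archive="$dest_dir/ollama-config-$(date +%F).tar.gz"
  mkdir -p "$dest_dir"
  # -C keeps only the directory name in the archive, not its full path.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
  echo "$archive"
}
```

For example, `backup_dir /opt/ollama/modelfiles /opt/backups` would write an archive named like `ollama-config-<date>.tar.gz` into `/opt/backups`, assuming those example paths exist on your server.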
Copy and run on your server
Use one of the two blocks: either save the compose file and start it yourself, or run the bash script, which creates the compose file and starts the container.
docker-compose.yml (yaml)
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ./models:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped
setup.sh (bash)
#!/usr/bin/env bash
set -euo pipefail
sudo mkdir -p /opt/ollama
sudo chown "$USER":"$USER" /opt/ollama
cd /opt/ollama
cat > docker-compose.yml <<'COMPOSE'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ./models:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped
COMPOSE
docker compose up -d
echo "Ollama is running on http://SERVER_IP:11434"
echo "Pull a model with: docker exec ollama ollama pull llama3.2"
Stack
Go, Docker