What is Ollama?

Ollama is an open-source runtime for running large language models (LLMs) locally. Its core design philosophy is simple: anyone should be able to run AI models on their own computer.

Before Ollama, running large models like Llama or Mistral locally required complex dependency configurations and CUDA environment setup. Ollama abstracts all of this into just a few simple commands:

# After installation, just this line runs an AI
ollama run llama3.1

That's it!

Why Choose Ollama?

🎯 Core Advantages

| Feature | Ollama | Traditional Approach (PyTorch + Transformers) |
| --- | --- | --- |
| Installation Difficulty | ⭐️ One-click install | ⭐️⭐️⭐️⭐️⭐️ Requires complex environment setup |
| Resource Usage | Automatically optimized | Manual tuning required |
| Model Support | Dozens of prepackaged models available | Each model requires individual configuration |
| Update Speed | New models added weekly | Weight files downloaded manually |
| API Support | Built-in RESTful API | Requires additional service setup |

💡 Use Cases

  • 🔐 Privacy First: Data stays local; nothing is uploaded to the cloud
  • 💰 Cost Savings: No need to pay for API call fees
  • 🏃 Low Latency: Fast response times without network transmission delays
  • 🧪 Developer Friendly: Ideal for rapid prototyping and experimentation

System Requirements

Minimum Configuration

  • CPU: 64-bit processor (x86_64 or ARM64)
  • Memory: 8GB RAM
  • Storage: At least 10GB free space (add ~5–20GB per additional model)
Recommended Configuration

  • CPU: Apple M1/M2/M3 chips, or Intel Core i7 / Ryzen 7 and above
  • Memory: 16GB RAM or more (32GB+ recommended for running large models)
  • GPU: NVIDIA RTX 3060 12GB or higher (optional, but accelerates inference)

Supported Operating Systems

macOS: 12.0 (Monterey) and above
Linux: Ubuntu 20.04+, Debian 11+, Fedora 36+
Windows: Windows 10/11 (64-bit), via the native installer or WSL2


macOS Installation Steps

Method 1: Official Installer (Recommended)

This is the simplest method, suitable for most Mac users.

Step 1: Download the Installer

Open Terminal and run:

# Visit the official download page
open https://ollama.com/download

Or visit https://ollama.com/download in your browser.

You'll see two options:

  • Apple Silicon (M1/M2/M3): Choose this if your Mac has an Apple M-series chip (most Macs from late 2020 onward)
  • Intel Mac: Choose this for older Intel-based models

Click the download button to get a .pkg installer file.

Step 2: Run the Installer

Double-click ollama-darwin-x86_64.pkg or ollama-darwin-arm64.pkg.

The installer will prompt:

Welcome to the Ollama Installer
--------------------------------
A launch agent will be created with default installation path at /Applications/Ollama.app
Continue? [Y/n]

Type Y and press Enter to confirm.

Step 3: Verify Installation

Open Terminal and run:

ollama --version

If you see output like ollama version 0.5.2, the installation was successful!

Method 2: Homebrew Installation (For Developers)

If you prefer managing apps via Homebrew:

# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Ollama
brew install ollama

Linux Installation Steps

Method 1: Official Script Installation (Universal)

Works on almost all Linux distributions.

Step 1: Run the Installation Script

# Run the script with root privileges
curl -fsSL https://ollama.com/install.sh | sh

The script automatically detects your system type and selects the appropriate installation method.

Note: If running as a non-root user, give sh (not curl) the elevated privileges — prefixing curl with sudo only downloads the script as root without installing as root:

curl -fsSL https://ollama.com/install.sh | sudo sh

Step 2: Start the Ollama Service

After installation, Ollama starts automatically via systemd. You can check its status:

# Check service status
systemctl status ollama

# If not running, start manually
sudo systemctl start ollama

# Enable auto-start on boot
sudo systemctl enable ollama

Example output:

● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled)
     Active: active (running) since Wed 2026-04-15 10:30:00 CST

Step 3: Verify Installation

ollama --version

Method 2: Docker Containerized Installation

If you have Docker installed, you can also run Ollama in a container:

# Pull the image
docker pull ollama/ollama

# Run the container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
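Once the container is running, you can confirm the API answers from the host. Below is a minimal sketch using only the Python standard library; it queries the documented /api/tags endpoint and assumes the container's port 11434 is mapped as in the command above (parse_tags and list_local_models are illustrative names, not part of Ollama):

```python
import json
import urllib.request

def parse_tags(payload):
    """Extract (name, size) pairs from an /api/tags JSON payload."""
    return [(m["name"], m.get("size")) for m in payload.get("models", [])]

def list_local_models(base_url="http://localhost:11434"):
    """Ask a running Ollama server (or container) which models it has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_tags(json.load(resp))
```

Calling list_local_models() should return an empty list on a fresh container, and one entry per model after you pull something.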

Windows Installation Steps

Method 1: Direct Installer (Windows 10/11)

Step 1: Download the Installer

Visit https://ollama.com/download in your browser and click the download button for the Windows version.

After downloading, you'll receive an ollama-setup.exe file.

Step 2: Run the Installation Wizard

Double-click ollama-setup.exe. The wizard will ask:

  1. Installation Location: Defaults to C:\Program Files\Ollama; proceed as-is
  2. Create Desktop Shortcut: Recommended to check
  3. Associate Model Folder: Keep default settings

Once complete, Ollama will start automatically in the background.

Step 3: Use from Command Line

Open PowerShell or CMD and run:

ollama --version

Method 2: WSL2 Installation (Linux Environment on Windows)

To use a Linux environment:

# 1. Ensure WSL2 is installed
wsl --install

# 2. Enter the WSL subsystem (Ubuntu)
wsl

# 3. Follow the Linux installation steps above
curl -fsSL https://ollama.com/install.sh | sh

First Run and Model Download

Now that installation is complete, let's run our first AI model!

Step 1: Start Ollama

On graphical systems (macOS/Windows), Ollama runs as a background service. You can confirm the process exists via Activity Monitor (macOS) or Task Manager (Windows).

On Linux servers, ensure the service is running:

systemctl status ollama

Step 2: Run Your First Model

In the terminal, enter:

ollama run llama3.1

On first run, Ollama will:

  1. Check whether llama3.1 exists locally
  2. If not, automatically download it from the Ollama model registry (~4.7GB)
  3. Load it into memory and begin chatting

Wait for the download to finish; depending on your connection, it can take anywhere from a few minutes to considerably longer.

Step 3: Try the Conversation

Once loaded, you'll see:

>>> 

Now you can start asking questions! For example:

Hello, please introduce yourself

Llama 3.1 will respond naturally. Try more complex queries:

Write a quicksort algorithm in Python and explain each step

Type /bye or press Ctrl+D to end the session (Ctrl+C stops the current response without exiting).

Step 4: List Installed Models

ollama list

Example output:

NAME            ID              SIZE    MODIFIED
llama3.1        8a7b9e...       4.7 GB  2 hours ago

Step 5: Remove Unwanted Models

ollama rm llama3.1

Recommended Models

Ollama supports dozens of open-source models. Here are some top picks:

🧠 All-Purpose Large Models

| Model Name | Size | Features | Best For |
| --- | --- | --- | --- |
| llama3.1 | 4.7GB | Meta's latest, best overall performance | Daily chat, writing, coding |
| llama3.1:70b | 40GB | Larger version, smarter but resource-heavy | Tasks requiring high intelligence |
| mistral | 4.1GB | Strong European open model, excellent code support | Programming assistance |
| gemma2:9b | 5.6GB | From Google, strong multilingual capabilities | Cross-language tasks |

💻 Coding-Specific

| Model Name | Size | Features |
| --- | --- | --- |
| codellama | 3.8GB | Specialized in code generation and debugging |
| deepseek-coder | 2.9GB | Excellent understanding of Chinese code comments |
| starcoder2 | 3.8GB | Supports multiple programming languages |

📱 Small & Efficient

| Model Name | Size | Features |
| --- | --- | --- |
| phi3 | 2.3GB | Microsoft lightweight model, extremely fast |
| tinyllama | 0.4GB | Tiny size, ideal for testing |
| qwen2:0.5b | 0.4GB | From Alibaba, smallest yet useful |

🚀 Quick Start Recommendations

Beginner Users (standard laptop):

# Most balanced choice
ollama run llama3.1

# Or faster and smaller
ollama run phi3

Developers:

# Coding-focused
ollama run codellama

# Bilingual (Chinese/English)
ollama run qwen2:7b

Pro Users (high-end PC):

# Maximum capability
ollama run llama3.1:70b
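If you are unsure which tier fits your machine, one practical test is to time the same prompt against two or three pulled models. A rough sketch using the standard library and the HTTP API on the default port (time_generate and build_generate_body are illustrative names; it assumes the models have already been pulled):

```python
import json
import time
import urllib.request

def build_generate_body(model, prompt):
    """Serialize a non-streaming /api/generate request body."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def time_generate(model, prompt, base_url="http://localhost:11434"):
    """Return the wall-clock seconds one model takes to answer one prompt."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=build_generate_body(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        json.load(resp)  # block until the full response arrives
    return time.perf_counter() - start
```

Comparing, say, time_generate("phi3", "Hello") against time_generate("llama3.1", "Hello") gives a feel for the speed/quality trade-off on your hardware.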

API Usage

Ollama includes a simple RESTful API for easy integration into your applications.

Starting the API Service

By default, as long as Ollama is running, the API is available at http://localhost:11434.
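Before wiring the API into an application, it is worth checking that the service is actually reachable. A minimal sketch using only the standard library (ollama_is_up is an illustrative helper name; the default port is assumed):

```python
import urllib.request

def ollama_is_up(base_url="http://localhost:11434"):
    """Return True if an Ollama server answers at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=2) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False
```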

Basic API Endpoints

1. Generate Response (Chat)

HTTP Request:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Explain the basic principles of quantum computing",
  "stream": false
}'

Python Example:

import requests

response = requests.post('http://localhost:11434/api/chat', json={
    'model': 'llama3.1',
    'messages': [{'role': 'user', 'content': 'Hello'}],
    'stream': False  # /api/chat streams by default; this returns a single JSON object
})

print(response.json()['message']['content'])
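When you leave streaming on (the /api/chat default), the server sends one JSON object per line, which lets you display the reply as it is generated. A sketch using only the standard library (stream_chat and parse_chat_chunk are illustrative names; the default port is assumed):

```python
import json
import urllib.request

def parse_chat_chunk(line):
    """Decode one NDJSON line from a streaming /api/chat response."""
    return json.loads(line)

def stream_chat(model, prompt, base_url="http://localhost:11434"):
    """Print the reply incrementally as the server streams it."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # no "stream": false here, so the server streams one JSON object per line
    }).encode()
    req = urllib.request.Request(f"{base_url}/api/chat", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            chunk = parse_chat_chunk(line)
            print(chunk["message"]["content"], end="", flush=True)
            if chunk.get("done"):
                break
    print()
```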

2. List Available Models

curl http://localhost:11434/api/tags

3. Copy a Model

curl http://localhost:11434/api/copy -d '{
  "source": "llama3.1",
  "destination": "my-llama"
}'

Using the Python Client

Install dependencies:

pip install ollama

Usage example:

import ollama

# Simple chat
response = ollama.chat(model='llama3.1', messages=[
  {
    'role': 'user',
    'content': 'Write a Fibonacci sequence in Python',
  },
])

print(response['message']['content'])

Advanced Configuration

Environment Variables

# Customize model storage path
export OLLAMA_MODELS="/data/ollama/models"

# Specify GPU device (CUDA)
export CUDA_VISIBLE_DEVICES=0,1

# Maximum number of requests served in parallel (impacts memory usage)
export OLLAMA_NUM_PARALLEL=4

# Enable verbose debug logging
export OLLAMA_DEBUG=true

Customizing Models with Modelfile

You can modify parameters based on existing models:

Create a Modelfile

FROM llama3.1

# Set temperature (creativity)
PARAMETER temperature 0.7

# Set context length
PARAMETER num_ctx 4096

# System instruction
SYSTEM "You are a professional programming assistant who always provides concise and accurate code solutions."

Build the Custom Model

ollama create my-coder -f Modelfile
ollama run my-coder
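The same parameters can also be overridden per request without building a new model: /api/generate (and /api/chat) accept an "options" object mirroring Modelfile PARAMETER values. A minimal sketch on the default port (build_options_body and generate_with_options are illustrative names):

```python
import json
import urllib.request

def build_options_body(model, prompt, temperature=0.7, num_ctx=4096):
    """Build an /api/generate body whose "options" override Modelfile PARAMETERs."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "num_ctx": num_ctx},
    }

def generate_with_options(model, prompt, base_url="http://localhost:11434", **opts):
    """Request a completion with per-call parameter overrides."""
    body = json.dumps(build_options_body(model, prompt, **opts)).encode()
    req = urllib.request.Request(f"{base_url}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

A Modelfile is still the better choice when you want the overrides baked in permanently; per-request options suit one-off experiments.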

Docker Persistent Storage Configuration

# Mount external storage
docker run -d \
  -v /your/host/path:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

Troubleshooting

Issue 1: Command Not Found After Installation

Symptoms: command not found: ollama

Solutions:

  1. Check PATH environment variable:

    command -v ollama  # prints the binary's path if it is on your PATH
    
  2. Reinstall:

    # macOS
    brew reinstall ollama
    
    # Linux
    sudo systemctl restart ollama
    
  3. Restart Terminal: Sometimes closing and reopening the terminal window is necessary.

Issue 2: Model Download Timeout or Failure

Symptoms: Download hangs or fails

Solutions:

  1. Check Network Connectivity: Ensure you can reach the Ollama model registry

    ping ollama.com
    
  2. Use a Proxy (for restricted networks): Ollama honors the standard proxy variables

    export HTTPS_PROXY=https://your-proxy:port
    
  3. Retry the Download: Interrupted pulls resume where they left off

    ollama pull llama3.1
    
    

Issue 3: Out of Memory

Symptoms: out of memory error

Solutions:

  1. Use Smaller Models:

    ollama run phi3  # Much smaller than llama3.1
    
  2. Close Other Applications: Free up more memory

  3. Limit Concurrency:

    export OLLAMA_NUM_PARALLEL=1
    

Issue 4: GPU Not Enabled

Symptoms: Model runs slowly

Solutions:

  1. Check GPU Detection:

    nvidia-smi  # For NVIDIA users
    system_profiler SPDisplaysDataType  # For Mac users
    
  2. Confirm Drivers Are Working: Ensure graphics drivers are properly installed

  3. Force Specification via Environment Variable:

    export ROCM_PATH=/opt/rocm  # For AMD GPUs
    

Issue 5: Port Conflict

Symptoms: address already in use

Solutions:

  1. Find the Occupying Process:

    lsof -i :11434
    
  2. Kill the Process or Change Port:

    kill -9 <PID>
    
    # Or serve on a different port (set via OLLAMA_HOST; there is no --port flag)
    OLLAMA_HOST=127.0.0.1:11435 ollama serve
    

Summary

Congratulations! You've completed the full Ollama installation journey. You now know how to:

  • Install on three platforms: macOS, Linux, or Windows
  • Choose and run models: From llama3.1 to phi3, pick what fits your needs
  • Integrate APIs: Call AI directly from your own projects
  • Troubleshoot common issues: Fix typical errors confidently

🎉 Next Steps

  1. Try Different Models: Compare performance across various models
  2. Explore API Features: Integrate AI into your website or app
  3. Share Your Experience: Write a blog post about your journey
  4. Join the Community: Follow Ollama GitHub for updates

📚 Further Reading