
Ollama Cloud Support

The LLM Client now supports both local Ollama instances and Ollama Cloud for running large models without local GPU requirements.

Overview

Ollama Cloud allows you to run large language models in the cloud without needing powerful local hardware. This is ideal for:

- Running models that don't fit on your local machine
- Working on devices without GPUs
- Quick prototyping without model downloads
- Accessing cloud-exclusive models

Setup

1. Get Ollama Cloud API Key

First, create an account and get your API key:

# Sign in to Ollama Cloud
ollama signin

# Or create an API key at: https://ollama.com/settings/keys

2. Set Environment Variable

Add your API key to your environment:

# In .env or secrets.env file
OLLAMA_API_KEY=your_api_key_here

# Or export directly
export OLLAMA_API_KEY=your_api_key_here
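As a quick sanity check before constructing a client, you can fail fast when the key is missing. This is a minimal sketch; `require_ollama_key` is a hypothetical helper, not part of the library:

```python
import os

def require_ollama_key(env=os.environ) -> str:
    """Return the Ollama Cloud API key, or raise with a setup hint."""
    key = env.get("OLLAMA_API_KEY", "")
    if not key:
        raise RuntimeError(
            "OLLAMA_API_KEY not set; run `ollama signin` or export it first."
        )
    return key
```

Calling this at startup surfaces a clear message instead of a mid-request authentication failure.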

Usage

Basic Cloud Usage

from llm_client import LLMClient

# Method 1: Explicit cloud mode
client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b-cloud",
    use_ollama_cloud=True
)

# Method 2: Auto-detect from model name (ends with -cloud)
client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b-cloud"
)

# Method 3: From environment with cloud model
client = LLMClient(llm="gpt-oss:120b-cloud")

# Use it like any other provider
messages = [{"role": "user", "content": "Explain quantum computing"}]
response = client.chat_completion(messages)
print(response)

Available Cloud Models

Cloud models are marked with a -cloud suffix:

Model                Size   Description
gpt-oss:120b-cloud   120B   Large open-source model via Ollama Cloud

Check ollama.com/search?c=cloud for the latest cloud models.
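The auto-detection described under Usage hinges on this suffix, and the check itself is trivial. A sketch, using a hypothetical `is_cloud_model` name:

```python
def is_cloud_model(name: str) -> bool:
    # Cloud models carry a "-cloud" suffix, e.g. "gpt-oss:120b-cloud".
    return name.endswith("-cloud")

print(is_cloud_model("gpt-oss:120b-cloud"))  # True
print(is_cloud_model("llama3.2:1b"))         # False
```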

Streaming with Cloud

from llm_client import LLMClient

client = LLMClient(llm="gpt-oss:120b-cloud")

messages = [{"role": "user", "content": "Tell me a story about AI"}]

print("Response: ", end="")
for chunk in client.chat_completion_stream(messages):
    print(chunk, end="", flush=True)
print()

Switch Between Local and Cloud

from llm_client import LLMClient

# Start with local Ollama
client = LLMClient(api_choice="ollama", llm="llama3.2:1b")
messages = [{"role": "user", "content": "Hello"}]
local_response = client.chat_completion(messages)

# Switch to cloud for larger model
client.switch_provider(
    "ollama",
    llm="gpt-oss:120b-cloud",
    use_ollama_cloud=True
)
cloud_response = client.chat_completion(messages)

# Switch back to local
client.switch_provider(
    "ollama",
    llm="llama3.2:1b",
    use_ollama_cloud=False
)

Configuration File

Add Ollama Cloud to your config file:

# llm_config.yaml
providers:
  # Local Ollama
  ollama_local:
    model: llama3.2:1b
    temperature: 0.7
    use_cloud: false
    keep_alive: 5m

  # Ollama Cloud
  ollama_cloud:
    model: gpt-oss:120b-cloud
    temperature: 0.7
    use_cloud: true

Load and use:

# Use local
client = LLMClient.from_config("llm_config.yaml", provider="ollama_local")

# Use cloud
client = LLMClient.from_config("llm_config.yaml", provider="ollama_cloud")

Direct API Access (Advanced)

For direct access to the Ollama Cloud API without a local Ollama installation:

from llm_client import LLMClient

# Direct cloud access with custom host
client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b",
    use_ollama_cloud=True,
    ollama_host="https://ollama.com"
)

Comparison: Local vs Cloud

Local Ollama

Pros:

- ✅ Complete privacy: data never leaves your machine
- ✅ No API costs
- ✅ No rate limits
- ✅ Works offline
- ✅ Full control

Cons:

- ⚠️ Requires local compute resources
- ⚠️ Limited by available hardware
- ⚠️ Need to manage model downloads

Ollama Cloud

Pros:

- ✅ Access to large models (120B+)
- ✅ No local GPU required
- ✅ No model downloads
- ✅ Fast inference
- ✅ Works on any device

Cons:

- ⚠️ Requires API key
- ⚠️ Data sent to cloud
- ⚠️ Potential costs (check pricing)
- ⚠️ Requires internet connection
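The trade-offs above can be folded into a small model-selection helper. This is a sketch, not a library feature; the model names mirror the ones used throughout this page:

```python
def pick_model(needs_privacy: bool, needs_large_model: bool) -> str:
    # Privacy wins: local keeps data on your machine.
    if needs_privacy:
        return "llama3.2:1b"            # local Ollama
    # Otherwise reach for cloud only when the task needs a large model.
    return "gpt-oss:120b-cloud" if needs_large_model else "llama3.2:1b"
```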

Best Practices

1. Hybrid Approach

Use local for small tasks, cloud for complex ones:

from llm_client import LLMClient

# Small tasks - use local
local_client = LLMClient(
    api_choice="ollama",
    llm="llama3.2:1b"
)

# Complex tasks - use cloud
cloud_client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b-cloud"
)

# Simple question - local
quick_answer = local_client.chat_completion([
    {"role": "user", "content": "What is Python?"}
])

# Complex analysis - cloud
detailed_answer = cloud_client.chat_completion([
    {"role": "user", "content": "Provide a detailed analysis of..."}
])

2. Fallback Strategy

Try local first, fall back to cloud:

from llm_client import LLMClient
from llm_client.exceptions import ChatCompletionError

messages = [{"role": "user", "content": "Your query here"}]

try:
    # Try local first
    client = LLMClient(api_choice="ollama", llm="llama3.2:1b")
    response = client.chat_completion(messages)
except ChatCompletionError:
    # Fallback to cloud
    print("Local Ollama unavailable, using cloud...")
    client = LLMClient(llm="gpt-oss:120b-cloud")
    response = client.chat_completion(messages)

print(response)

3. Environment-Specific Configuration

import os
from llm_client import LLMClient

# Use cloud in production, local in development
if os.getenv("ENVIRONMENT") == "production":
    client = LLMClient(llm="gpt-oss:120b-cloud")
else:
    client = LLMClient(api_choice="ollama", llm="llama3.2:1b")

Python Library Examples

Complete Example

from llm_client import LLMClient

# Initialize cloud client
client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b-cloud",
    temperature=0.7,
    max_tokens=1024
)

# Chat completion
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain machine learning in simple terms."}
]

response = client.chat_completion(messages)
print(f"Response: {response}")

# Streaming response
print("\nStreaming response:")
for chunk in client.chat_completion_stream(messages):
    print(chunk, end="", flush=True)
print()

# Token counting
token_count = client.count_tokens(messages)
print(f"\nTokens used: {token_count}")

Error Handling

from llm_client import LLMClient
from llm_client.exceptions import (
    APIKeyNotFoundError,
    ChatCompletionError,
    ProviderNotAvailableError
)

messages = [{"role": "user", "content": "Hello"}]

try:
    client = LLMClient(
        api_choice="ollama",
        llm="gpt-oss:120b-cloud"
    )
    response = client.chat_completion(messages)

except APIKeyNotFoundError:
    print("Ollama Cloud API key not found!")
    print("Set OLLAMA_API_KEY environment variable or sign in with: ollama signin")

except ProviderNotAvailableError:
    print("Ollama package not installed!")
    print("Install with: pip install ollama")

except ChatCompletionError as e:
    print(f"Chat completion failed: {e}")

Troubleshooting

API Key Not Found

# Error: APIKeyNotFoundError for ollama_cloud
# Solution: Set your API key
export OLLAMA_API_KEY=your_key_here

Model Not Available

# If cloud model not found, check available models:
# Visit: https://ollama.com/search?c=cloud

# Or list via API:
# curl https://ollama.com/api/tags
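If you fetch the tag list programmatically, filtering it down to cloud models is straightforward. The response shape assumed here (`{"models": [{"name": ...}]}`) matches the local Ollama `/api/tags` endpoint; verify it against the cloud endpoint before relying on it:

```python
import json

def cloud_model_names(tags_json: str) -> list:
    # Keep only entries whose name carries the "-cloud" suffix.
    data = json.loads(tags_json)
    return [m["name"] for m in data.get("models", [])
            if m["name"].endswith("-cloud")]

sample = '{"models": [{"name": "gpt-oss:120b-cloud"}, {"name": "llama3.2:1b"}]}'
print(cloud_model_names(sample))  # ['gpt-oss:120b-cloud']
```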

Connection Issues

# If connection fails, check:
# 1. Internet connection
# 2. API key is valid
# 3. Ollama cloud service is up
# 4. Try with custom host:

client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b-cloud",
    ollama_host="https://ollama.com"
)

Migration from Local to Cloud

Before (Local Only)

client = LLMClient(
    api_choice="ollama",
    llm="llama3.2:1b"
)

After (Cloud Enabled)

# Option 1: Explicit cloud
client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b-cloud",
    use_ollama_cloud=True
)

# Option 2: Auto-detect from model name
client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b-cloud"
)

Additional Resources

Example: RAG with Ollama Cloud

from llm_client import LLMClient

# Use large cloud model for better RAG performance
client = LLMClient(llm="gpt-oss:120b-cloud")

# Your document
document = """
Ollama Cloud is a new service that allows running large language
models in the cloud without requiring powerful local hardware.
"""

# RAG query
query = "What is Ollama Cloud?"

messages = [
    {
        "role": "system",
        "content": f"Answer based on this document:\n\n{document}"
    },
    {
        "role": "user",
        "content": query
    }
]

response = client.chat_completion(messages)
print(f"Answer: {response}")

Conclusion

Ollama Cloud support in LLM Client provides:

- Seamless integration with existing code
- Automatic detection from model names
- Easy switching between local and cloud
- Same API for both modes

Try it out with:

pip install llm_client
export OLLAMA_API_KEY=your_key

from llm_client import LLMClient

client = LLMClient(llm="gpt-oss:120b-cloud")
response = client.chat_completion([
    {"role": "user", "content": "Hello, Ollama Cloud!"}
])
print(response)