
Features Overview

LLM Client provides a comprehensive set of features for working with multiple LLM providers. This page gives you an overview of all available features.


🎯 Core Features

graph TB
    subgraph "👤 Developer"
        DEV[Your Code]
        SIMPLE["Simple API:<br/>client.chat_completion(messages)"]
    end
    subgraph "🎯 LLM Client - Universal Python Client"
        CLIENT[🧠 LLMClient<br/>Unified interface]
        subgraph "✨ Core Features"
            AUTO[🔍 Auto-Detection<br/>Automatic provider selection]
            SWITCH[🔄 Dynamic Switching<br/>Change providers at runtime]
            TOKENS[📊 Token Counting<br/>Cost control with tiktoken]
            ASYNC[⚡ Async Support<br/>Non-blocking operations]
            STREAM[🌊 Streaming<br/>Real-time responses]
            CONFIG[📝 Config Files<br/>YAML/JSON configuration]
        end
        FACTORY[🏭 ProviderFactory<br/>Strategy pattern]
    end
    subgraph "🌐 LLM Providers"
        OPENAI[OpenAI<br/>gpt-4o, gpt-4o-mini<br/>💰 Paid]
        GROQ[Groq<br/>llama-3.3-70b<br/>⚡ Ultra-fast]
        GEMINI[Google Gemini<br/>gemini-2.5-pro/flash<br/>🌍 Long contexts]
        OLLAMA_LOCAL[Ollama Local<br/>llama3.2:1b/3b<br/>🔒 Private & offline]
        OLLAMA_CLOUD[Ollama Cloud<br/>gpt-oss:120b-cloud<br/>☁️ No local GPU]
    end
    subgraph "💡 Use Cases"
        UC1[💰 Cost optimization<br/>Cheaper provider for<br/>simple tasks]
        UC2[🔄 Fallback strategy<br/>Automatic switch<br/>on outage]
        UC3[🎯 Quality vs. speed<br/>Balance between<br/>speed & quality]
        UC4[🔒 Privacy first<br/>Local models for<br/>sensitive data]
    end
    %% Developer connections
    DEV --> SIMPLE
    SIMPLE --> CLIENT
    %% Client to features
    CLIENT -.-> AUTO
    CLIENT -.-> SWITCH
    CLIENT -.-> TOKENS
    CLIENT -.-> ASYNC
    CLIENT -.-> STREAM
    CLIENT -.-> CONFIG
    %% Client to factory
    CLIENT --> FACTORY
    %% Factory to providers
    FACTORY --> OPENAI
    FACTORY --> GROQ
    FACTORY --> GEMINI
    FACTORY --> OLLAMA_LOCAL
    FACTORY --> OLLAMA_CLOUD
    %% Use cases
    SWITCH -.-> UC1
    SWITCH -.-> UC2
    SWITCH -.-> UC3
    OLLAMA_LOCAL -.-> UC4
    %% Styling
    classDef userClass fill:#e1f5ff,stroke:#01579b,stroke-width:3px,color:#000
    classDef clientClass fill:#fff3e0,stroke:#e65100,stroke-width:3px,color:#000
    classDef featureClass fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000
    classDef providerClass fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#000
    classDef usecaseClass fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#000
    class DEV,SIMPLE userClass
    class CLIENT,FACTORY clientClass
    class AUTO,SWITCH,TOKENS,ASYNC,STREAM,CONFIG featureClass
    class OPENAI,GROQ,GEMINI,OLLAMA_LOCAL,OLLAMA_CLOUD providerClass
    class UC1,UC2,UC3,UC4 usecaseClass
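The ProviderFactory in the diagram follows the strategy pattern: every provider implements the same interface, and the factory picks a concrete one by name. A minimal sketch of that idea (class names and bodies are illustrative, not the library's actual internals):

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Common interface every provider strategy implements."""

    @abstractmethod
    def chat_completion(self, messages: list) -> str: ...


class OpenAIProvider(Provider):
    def chat_completion(self, messages):
        return "openai response"  # real code would call the OpenAI API


class GroqProvider(Provider):
    def chat_completion(self, messages):
        return "groq response"  # real code would call the Groq API


class ProviderFactory:
    """Maps a provider name to a concrete strategy class."""

    _registry = {"openai": OpenAIProvider, "groq": GroqProvider}

    @classmethod
    def create(cls, name: str) -> Provider:
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Unknown provider: {name}") from None


provider = ProviderFactory.create("groq")
print(provider.chat_completion([{"role": "user", "content": "Hi"}]))
```

Because callers only ever see the `Provider` interface, switching providers never touches application code, which is what makes the unified API below possible.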

Automatic API Detection

LLM Client automatically detects which LLM provider to use based on available API keys:

from llm_client import LLMClient

# Automatically selects first available provider:
# 1. OpenAI (if OPENAI_API_KEY set)
# 2. Groq (if GROQ_API_KEY set)
# 3. Gemini (if GEMINI_API_KEY set)
# 4. Ollama (local fallback, no key needed)
client = LLMClient()

print(f"Using: {client.api_choice}")  # e.g., "openai"
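The detection order amounts to a first-match scan over environment variables, with Ollama as the keyless fallback. A hypothetical sketch of that logic (function and table names are illustrative, not the library's internals):

```python
import os

# Scan order mirrors the priority list above.
PROVIDER_KEYS = [
    ("openai", "OPENAI_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
]


def detect_provider(env=None):
    """Return the first provider whose API key is set, else fall back to ollama."""
    env = os.environ if env is None else env
    for name, key in PROVIDER_KEYS:
        if env.get(key):
            return name
    return "ollama"  # local fallback, no key needed


print(detect_provider({"GROQ_API_KEY": "gsk-..."}))
```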

Learn more


Unified Interface

One consistent API for all providers - no need to learn different APIs:

# Same code works with any provider
messages = [{"role": "user", "content": "Hello!"}]

# Works with OpenAI
openai_client = LLMClient(api_choice="openai")
response1 = openai_client.chat_completion(messages)

# Works with Groq
groq_client = LLMClient(api_choice="groq")
response2 = groq_client.chat_completion(messages)

# Works with Gemini
gemini_client = LLMClient(api_choice="gemini")
response3 = gemini_client.chat_completion(messages)

See examples


✨ Advanced Features (v0.3.0)

📊 Token Counting

Accurate token counting with tiktoken for cost management:

# Count tokens before sending
token_count = client.count_tokens(messages)
print(f"This will use ~{token_count} tokens")

# Check budget
if token_count < 4000:
    response = client.chat_completion(messages)
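Token counts translate directly into spend. A rough sketch of turning a count into a budget figure (the prices below are made-up placeholders, not real rates; check each provider's pricing page):

```python
# Hypothetical per-1M-input-token prices in USD -- illustrative only.
PRICE_PER_1M_INPUT = {
    "gpt-4o-mini": 0.15,
    "llama-3.3-70b-versatile": 0.59,
}


def estimate_input_cost(token_count: int, model: str) -> float:
    """Rough input-side cost for a prompt of `token_count` tokens."""
    return token_count / 1_000_000 * PRICE_PER_1M_INPUT[model]


print(f"${estimate_input_cost(3_500, 'gpt-4o-mini'):.6f}")
```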

Token Counting Guide


⚡ Async Support

Full async/await support for non-blocking operations:

# Create async client
async_client = LLMClient(use_async=True)

# Async completion
response = await async_client.achat_completion(messages)

# Async streaming
async for chunk in async_client.achat_completion_stream(messages):
    print(chunk, end="", flush=True)
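Top-level `await` only works in notebooks; in a script these calls need an event loop. A self-contained sketch of the same consumption pattern, with a stub async generator standing in for the real client:

```python
import asyncio
from typing import AsyncIterator


async def fake_stream(messages) -> AsyncIterator[str]:
    """Stub standing in for achat_completion_stream()."""
    for chunk in ["Hel", "lo", "!"]:
        await asyncio.sleep(0)  # yield control, like real network I/O
        yield chunk


async def main() -> str:
    parts = []
    async for chunk in fake_stream([{"role": "user", "content": "Hi"}]):
        parts.append(chunk)
    return "".join(parts)


result = asyncio.run(main())
print(result)
```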

Async Guide


πŸ“ Configuration FilesΒΆ

Manage multiple provider configurations with YAML/JSON:

# Load from config file
client = LLMClient.from_config("llm_config.yaml")

# Use specific provider
client = LLMClient.from_config("llm_config.yaml", provider="groq")

Example llm_config.yaml:

default_provider: openai

providers:
  openai:
    model: gpt-4o-mini
    temperature: 0.7

  groq:
    model: llama-3.3-70b-versatile
    temperature: 0.5
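Resolving such a config is straightforward: pick the requested provider's block, or fall back to `default_provider`. A hypothetical sketch of what `from_config` might do internally, using the JSON variant so it runs with the standard library alone (`load_provider_config` is an illustrative name, not the library's API):

```python
import json
from typing import Optional

config_text = """
{
  "default_provider": "openai",
  "providers": {
    "openai": {"model": "gpt-4o-mini", "temperature": 0.7},
    "groq": {"model": "llama-3.3-70b-versatile", "temperature": 0.5}
  }
}
"""


def load_provider_config(text: str, provider: Optional[str] = None) -> dict:
    """Pick the requested provider's settings, else the default."""
    config = json.loads(text)
    name = provider or config["default_provider"]
    settings = config["providers"][name]
    return {"provider": name, **settings}


print(load_provider_config(config_text))           # resolves the default: openai
print(load_provider_config(config_text, "groq"))   # explicit override
```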

Configuration Guide


🌊 Response Streaming

Stream responses in real-time for better UX:

messages = [{"role": "user", "content": "Tell me a story"}]

print("Response: ", end="")
for chunk in client.chat_completion_stream(messages):
    print(chunk, end="", flush=True)
print()

Streaming Guide


🔄 Dynamic Provider Switching

Switch between providers at runtime:

# Start with OpenAI
client = LLMClient(api_choice="openai")
response1 = client.chat_completion(messages)

# Switch to Groq
client.switch_provider("groq")
response2 = client.chat_completion(messages)

# Switch to Gemini with new parameters
client.switch_provider("gemini", temperature=0.8)
response3 = client.chat_completion(messages)

Use Cases:

  • Cost optimization
  • Fallback strategies
  • A/B testing
  • Quality vs. speed trade-offs
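The fallback use case builds directly on `switch_provider`: try providers in order until one succeeds. A minimal sketch with a stub client (the stub mimics the call surface shown above; real code would catch the library's `ChatCompletionError` rather than bare `Exception`):

```python
class FlakyClient:
    """Stub with the same surface as LLMClient, for illustration only."""

    def __init__(self):
        self.api_choice = "openai"

    def switch_provider(self, name):
        self.api_choice = name

    def chat_completion(self, messages):
        if self.api_choice == "openai":
            raise RuntimeError("simulated outage")
        return f"answer from {self.api_choice}"


def complete_with_fallback(client, messages, providers=("openai", "groq", "gemini")):
    """Try each provider in order; re-raise the last error if all fail."""
    last_error = None
    for name in providers:
        try:
            client.switch_provider(name)
            return client.chat_completion(messages)
        except Exception as exc:
            last_error = exc
    raise last_error


print(complete_with_fallback(FlakyClient(), [{"role": "user", "content": "Hi"}]))
```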

Provider Switching Guide


🧰 Tool Calling

OpenAI-compatible function/tool calling for all providers:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            }
        }
    }
}]

result = client.chat_completion_with_tools(messages, tools)

if result['tool_calls']:
    for call in result['tool_calls']:
        print(f"Calling: {call['function']['name']}")
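Returned tool calls still have to be executed locally. A sketch of a simple dispatch step, assuming the result shape shown above with arguments as a JSON string (as in the OpenAI format); the weather function and registry are illustrative stand-ins:

```python
import json


def get_weather(location: str) -> str:
    return f"Sunny in {location}"  # stand-in for a real weather lookup


TOOL_REGISTRY = {"get_weather": get_weather}

# Shape assumed from the OpenAI-compatible format above.
result = {
    "tool_calls": [{
        "function": {
            "name": "get_weather",
            "arguments": '{"location": "Berlin"}',
        }
    }]
}

for call in result["tool_calls"]:
    fn = TOOL_REGISTRY[call["function"]["name"]]   # look up the local function
    kwargs = json.loads(call["function"]["arguments"])  # decode JSON arguments
    output = fn(**kwargs)
    print(output)
```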

Tool Calling Guide


📎 File Upload

Upload images, PDFs, videos, and audio with your messages:

# Analyze an image
messages = [{"role": "user", "content": "What's in this image?"}]
response = client.chat_completion_with_files(
    messages,
    files=["photo.jpg"]
)

# Analyze a PDF
messages = [{"role": "user", "content": "Summarize this document"}]
response = client.chat_completion_with_files(
    messages,
    files=["report.pdf"]
)

Supported Formats by Provider:

| Provider | Images | PDFs | Videos | Audio |
|----------|--------|------|--------|-------|
| OpenAI   | ✅     | ✅   | ❌     | ❌    |
| Gemini   | ✅     | ✅   | ✅     | ✅    |
| Groq     | ✅     | ❌   | ❌     | ❌    |
| Ollama   | ✅     | ❌   | ❌     | ❌    |

File Upload Guide


☁️ Ollama Cloud

Access powerful cloud models without local GPU:

# Automatic cloud detection
client = LLMClient(llm="gpt-oss:120b-cloud")

# Or explicit cloud mode
client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b-cloud",
    use_ollama_cloud=True
)

Benefits:

  • No local GPU needed
  • Access to large models (120B+)
  • Fast inference
  • Easy switching between local and cloud

Ollama Cloud Guide


πŸ› οΈ Developer FeaturesΒΆ

Comprehensive Logging

Built-in logging for debugging and monitoring:

from llm_client import setup_logging

# Enable debug logging
setup_logging(level="DEBUG")

# Your code here
client = LLMClient()

Logging Guide


Custom Exceptions

Detailed exception hierarchy for better error handling:

from llm_client.exceptions import (
    APIKeyNotFoundError,
    ChatCompletionError,
    InvalidProviderError
)

try:
    client = LLMClient(api_choice="openai")
    response = client.chat_completion(messages)
except APIKeyNotFoundError as e:
    print(f"Missing API key: {e.key_name}")
except ChatCompletionError as e:
    print(f"API error: {e}")

Exception Reference


Retry Logic

Automatic retry with exponential backoff:

# Automatically retries up to 3 times
# with delays: 4s, 8s, 10s
response = client.chat_completion(messages)
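The same pattern can be wrapped around any call. A self-contained sketch of capped exponential backoff (the 4s/8s/10s schedule mirrors the comment above, though the library's exact schedule may differ; the demo uses tiny delays so it runs fast):

```python
import time


def retry_with_backoff(fn, retries=3, base_delay=4.0, max_delay=10.0):
    """Call fn(), retrying on failure with capped exponential delays."""
    # With the defaults: 4s, 8s, then capped at 10s.
    delays = [min(base_delay * 2**i, max_delay) for i in range(retries)]
    for attempt, delay in enumerate(delays, start=1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            time.sleep(delay)


calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"


print(retry_with_backoff(flaky, base_delay=0.01, max_delay=0.02))
```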

Type Hints

Full type hints for better IDE support:

from llm_client import LLMClient
from typing import List, Dict

def process_conversation(
    client: LLMClient,
    messages: List[Dict[str, str]]
) -> str:
    return client.chat_completion(messages)

📦 Integration Features

Google Colab Support

Automatic secret loading in Google Colab:

# Add keys to Colab Secrets (🔑 icon)
# Keys: OPENAI_API_KEY, GROQ_API_KEY, etc.

from llm_client import LLMClient

# Automatically loads from Colab secrets
client = LLMClient()

llama-index Integration

Seamless integration with llama-index:

from llm_client import LLMClientAdapter, LLMClient
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Create adapter
llm_adapter = LLMClientAdapter(client=LLMClient())

# Use in llama-index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, llm=llm_adapter)

🎯 Comparison with Other Libraries

vs. OpenAI SDK

| Feature            | LLM Client | OpenAI SDK |
|--------------------|------------|------------|
| Multi-provider     | ✅         | ❌         |
| Auto-detection     | ✅         | ❌         |
| Token counting     | ✅         | ❌         |
| Provider switching | ✅         | ❌         |
| Unified interface  | ✅         | ❌         |
| Streaming          | ✅         | ✅         |

vs. LangChain

| Feature        | LLM Client | LangChain  |
|----------------|------------|------------|
| Simplicity     | ✅ Simple  | ⚠️ Complex |
| Multi-provider | ✅         | ✅         |
| Async support  | ✅         | ✅         |
| File upload    | ✅         | ⚠️ Limited |
| Learning curve | Low        | High       |

🚀 Coming Soon

Features planned for future releases:

  • [ ] Embedding support
  • [ ] Batch processing
  • [ ] Caching layer
  • [ ] Prompt templates
  • [ ] More providers (Anthropic, Cohere)
  • [ ] Advanced RAG utilities

📚 Learn More


💡 Need Help?