
Features Overview

LLM Client provides a comprehensive set of features for working with multiple LLM providers. This page gives you an overview of all available features.


🎯 Core Features

graph TB
    subgraph "👤 Developer"
        DEV[Your Code]
        SIMPLE["Simple API:<br/>client.chat_completion(messages)"]
    end
    subgraph "🎯 LLM Client - Universal Python Client"
        CLIENT[🧠 LLMClient<br/>Unified interface]
        subgraph "✨ Core Features"
            AUTO[🔍 Auto-Detection<br/>Automatic provider selection]
            SWITCH[🔄 Dynamic Switching<br/>Change providers at runtime]
            TOKENS[📊 Token Counting<br/>Cost control with tiktoken]
            ASYNC[⚡ Async Support<br/>Non-blocking operations]
            STREAM[🌊 Streaming<br/>Real-time responses]
            CONFIG[📝 Config Files<br/>YAML/JSON configuration]
        end
        FACTORY[🏭 ProviderFactory<br/>Strategy pattern]
    end
    subgraph "🌐 LLM Providers"
        OPENAI[OpenAI<br/>gpt-4o, gpt-4o-mini<br/>💰 Paid]
        GROQ[Groq<br/>llama-3.3-70b<br/>⚡ Ultra-fast]
        GEMINI[Google Gemini<br/>gemini-2.5-pro/flash<br/>🌍 Long contexts]
        OLLAMA_LOCAL[Ollama Local<br/>llama3.2:1b/3b<br/>🔒 Private & offline]
        OLLAMA_CLOUD[Ollama Cloud<br/>gpt-oss:120b-cloud<br/>☁️ No local GPU]
    end
    subgraph "💡 Use Cases"
        UC1[💰 Cost optimization<br/>Cheaper provider for<br/>simple tasks]
        UC2[🔄 Fallback strategy<br/>Automatic switch<br/>on outage]
        UC3[🎯 Quality vs. speed<br/>Balance between<br/>speed & quality]
        UC4[🔒 Privacy first<br/>Local models for<br/>sensitive data]
    end
    %% Developer connections
    DEV --> SIMPLE
    SIMPLE --> CLIENT
    %% Client to features
    CLIENT -.-> AUTO
    CLIENT -.-> SWITCH
    CLIENT -.-> TOKENS
    CLIENT -.-> ASYNC
    CLIENT -.-> STREAM
    CLIENT -.-> CONFIG
    %% Client to factory
    CLIENT --> FACTORY
    %% Factory to providers
    FACTORY --> OPENAI
    FACTORY --> GROQ
    FACTORY --> GEMINI
    FACTORY --> OLLAMA_LOCAL
    FACTORY --> OLLAMA_CLOUD
    %% Use cases
    SWITCH -.-> UC1
    SWITCH -.-> UC2
    SWITCH -.-> UC3
    OLLAMA_LOCAL -.-> UC4
    %% Styling
    classDef userClass fill:#e1f5ff,stroke:#01579b,stroke-width:3px,color:#000
    classDef clientClass fill:#fff3e0,stroke:#e65100,stroke-width:3px,color:#000
    classDef featureClass fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000
    classDef providerClass fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px,color:#000
    classDef usecaseClass fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#000
    class DEV,SIMPLE userClass
    class CLIENT,FACTORY clientClass
    class AUTO,SWITCH,TOKENS,ASYNC,STREAM,CONFIG featureClass
    class OPENAI,GROQ,GEMINI,OLLAMA_LOCAL,OLLAMA_CLOUD providerClass
    class UC1,UC2,UC3,UC4 usecaseClass
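The ProviderFactory in the diagram follows the strategy pattern: every provider implements the same interface, and the factory picks a concrete one by name. A minimal sketch of that idea (class names and bodies are illustrative, not the library's actual internals):

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Common interface every provider strategy implements."""

    @abstractmethod
    def chat_completion(self, messages: list) -> str: ...


class OpenAIProvider(Provider):
    def chat_completion(self, messages):
        return "openai response"  # real code would call the OpenAI API


class GroqProvider(Provider):
    def chat_completion(self, messages):
        return "groq response"  # real code would call the Groq API


class ProviderFactory:
    """Maps a provider name to a concrete strategy class."""

    _registry = {"openai": OpenAIProvider, "groq": GroqProvider}

    @classmethod
    def create(cls, name: str) -> Provider:
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Unknown provider: {name}") from None


provider = ProviderFactory.create("groq")
print(provider.chat_completion([{"role": "user", "content": "Hi"}]))
```

Because callers only ever see the `Provider` interface, switching providers never touches application code, which is what makes the unified API below possible.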

Automatic API Detection

LLM Client automatically detects which LLM provider to use based on available API keys:

from llm_client import LLMClient

# Automatically selects first available provider:
# 1. OpenAI (if OPENAI_API_KEY set)
# 2. Groq (if GROQ_API_KEY set)
# 3. Gemini (if GEMINI_API_KEY set)
# 4. Ollama (local fallback, no key needed)
client = LLMClient()

print(f"Using: {client.api_choice}")  # e.g., "openai"
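The detection order amounts to a first-match scan over environment variables, with Ollama as the keyless fallback. A hypothetical sketch of that logic (function and table names are illustrative, not the library's internals):

```python
import os

# Scan order mirrors the priority list above.
PROVIDER_KEYS = [
    ("openai", "OPENAI_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
]


def detect_provider(env=None):
    """Return the first provider whose API key is set, else fall back to ollama."""
    env = os.environ if env is None else env
    for name, key in PROVIDER_KEYS:
        if env.get(key):
            return name
    return "ollama"  # local fallback, no key needed


print(detect_provider({"GROQ_API_KEY": "gsk-..."}))
```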

Learn more


Unified Interface

One consistent API for all providers - no need to learn different APIs:

# Same code works with any provider
messages = [{"role": "user", "content": "Hello!"}]

# Works with OpenAI
openai_client = LLMClient(api_choice="openai")
response1 = openai_client.chat_completion(messages)

# Works with Groq
groq_client = LLMClient(api_choice="groq")
response2 = groq_client.chat_completion(messages)

# Works with Gemini
gemini_client = LLMClient(api_choice="gemini")
response3 = gemini_client.chat_completion(messages)

See examples


✨ Advanced Features (v0.3.0)

📊 Token Counting

Accurate token counting with tiktoken for cost management:

# Count tokens before sending
token_count = client.count_tokens(messages)
print(f"This will use ~{token_count} tokens")

# Check budget
if token_count < 4000:
    response = client.chat_completion(messages)
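Token counts translate directly into spend. A rough sketch of turning a count into a budget figure (the prices below are made-up placeholders, not real rates; check each provider's pricing page):

```python
# Hypothetical per-1M-input-token prices in USD -- illustrative only.
PRICE_PER_1M_INPUT = {
    "gpt-4o-mini": 0.15,
    "llama-3.3-70b-versatile": 0.59,
}


def estimate_input_cost(token_count: int, model: str) -> float:
    """Rough input-side cost for a prompt of `token_count` tokens."""
    return token_count / 1_000_000 * PRICE_PER_1M_INPUT[model]


print(f"${estimate_input_cost(3_500, 'gpt-4o-mini'):.6f}")
```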

Token Counting Guide


⚡ Async Support

Full async/await support for non-blocking operations:

# Create async client
async_client = LLMClient(use_async=True)

# Async completion
response = await async_client.achat_completion(messages)

# Async streaming
async for chunk in async_client.achat_completion_stream(messages):
    print(chunk, end="", flush=True)
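Top-level `await` only works in notebooks; in a script these calls need an event loop. A self-contained sketch of the same consumption pattern, with a stub async generator standing in for the real client:

```python
import asyncio
from typing import AsyncIterator


async def fake_stream(messages) -> AsyncIterator[str]:
    """Stub standing in for achat_completion_stream()."""
    for chunk in ["Hel", "lo", "!"]:
        await asyncio.sleep(0)  # yield control, like real network I/O
        yield chunk


async def main() -> str:
    parts = []
    async for chunk in fake_stream([{"role": "user", "content": "Hi"}]):
        parts.append(chunk)
    return "".join(parts)


result = asyncio.run(main())
print(result)
```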

Async Guide


πŸ“ Configuration FilesΒΆ

Manage multiple provider configurations with YAML/JSON:

# Load from config file
client = LLMClient.from_config("llm_config.yaml")

# Use specific provider
client = LLMClient.from_config("llm_config.yaml", provider="groq")

Example llm_config.yaml:

default_provider: openai

providers:
  openai:
    model: gpt-4o-mini
    temperature: 0.7

  groq:
    model: llama-3.3-70b-versatile
    temperature: 0.5
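Resolving such a config is straightforward: pick the requested provider's block, or fall back to `default_provider`. A hypothetical sketch of what `from_config` might do internally, using the JSON variant so it runs with the standard library alone (`load_provider_config` is an illustrative name, not the library's API):

```python
import json
from typing import Optional

config_text = """
{
  "default_provider": "openai",
  "providers": {
    "openai": {"model": "gpt-4o-mini", "temperature": 0.7},
    "groq": {"model": "llama-3.3-70b-versatile", "temperature": 0.5}
  }
}
"""


def load_provider_config(text: str, provider: Optional[str] = None) -> dict:
    """Pick the requested provider's settings, else the default."""
    config = json.loads(text)
    name = provider or config["default_provider"]
    settings = config["providers"][name]
    return {"provider": name, **settings}


print(load_provider_config(config_text))           # resolves the default: openai
print(load_provider_config(config_text, "groq"))   # explicit override
```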

Configuration Guide


🌊 Response Streaming

Stream responses in real-time for better UX:

messages = [{"role": "user", "content": "Tell me a story"}]

print("Response: ", end="")
for chunk in client.chat_completion_stream(messages):
    print(chunk, end="", flush=True)
print()

Streaming Guide


🔄 Dynamic Provider Switching

Switch between providers at runtime:

# Start with OpenAI
client = LLMClient(api_choice="openai")
response1 = client.chat_completion(messages)

# Switch to Groq
client.switch_provider("groq")
response2 = client.chat_completion(messages)

# Switch to Gemini with new parameters
client.switch_provider("gemini", temperature=0.8)
response3 = client.chat_completion(messages)

Use Cases:

  • Cost optimization
  • Fallback strategies
  • A/B testing
  • Quality vs. speed trade-offs
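The fallback use case builds directly on `switch_provider`: try providers in order until one succeeds. A minimal sketch with a stub client (the stub mimics the call surface shown above; real code would catch the library's `ChatCompletionError` rather than bare `Exception`):

```python
class FlakyClient:
    """Stub with the same surface as LLMClient, for illustration only."""

    def __init__(self):
        self.api_choice = "openai"

    def switch_provider(self, name):
        self.api_choice = name

    def chat_completion(self, messages):
        if self.api_choice == "openai":
            raise RuntimeError("simulated outage")
        return f"answer from {self.api_choice}"


def complete_with_fallback(client, messages, providers=("openai", "groq", "gemini")):
    """Try each provider in order; re-raise the last error if all fail."""
    last_error = None
    for name in providers:
        try:
            client.switch_provider(name)
            return client.chat_completion(messages)
        except Exception as exc:
            last_error = exc
    raise last_error


print(complete_with_fallback(FlakyClient(), [{"role": "user", "content": "Hi"}]))
```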

Provider Switching Guide


🧰 Tool Calling

OpenAI-compatible function/tool calling for all providers:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            }
        }
    }
}]

result = client.chat_completion_with_tools(messages, tools)

if result['tool_calls']:
    for call in result['tool_calls']:
        print(f"Calling: {call['function']['name']}")
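Returned tool calls still have to be executed locally. A sketch of a simple dispatch step, assuming the result shape shown above with arguments as a JSON string (as in the OpenAI format); the weather function and registry are illustrative stand-ins:

```python
import json


def get_weather(location: str) -> str:
    return f"Sunny in {location}"  # stand-in for a real weather lookup


TOOL_REGISTRY = {"get_weather": get_weather}

# Shape assumed from the OpenAI-compatible format above.
result = {
    "tool_calls": [{
        "function": {
            "name": "get_weather",
            "arguments": '{"location": "Berlin"}',
        }
    }]
}

for call in result["tool_calls"]:
    fn = TOOL_REGISTRY[call["function"]["name"]]   # look up the local function
    kwargs = json.loads(call["function"]["arguments"])  # decode JSON arguments
    output = fn(**kwargs)
    print(output)
```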

Tool Calling Guide


📎 File Upload

Upload images, PDFs, videos, and audio with your messages:

# Analyze an image
messages = [{"role": "user", "content": "What's in this image?"}]
response = client.chat_completion_with_files(
    messages,
    files=["photo.jpg"]
)

# Analyze a PDF
messages = [{"role": "user", "content": "Summarize this document"}]
response = client.chat_completion_with_files(
    messages,
    files=["report.pdf"]
)

Supported Formats by Provider:

| Provider | Images | PDFs | Videos | Audio |
|----------|--------|------|--------|-------|
| OpenAI   | ✅     | ✅   | ❌     | ❌    |
| Gemini   | ✅     | ✅   | ✅     | ✅    |
| Groq     | ✅     | ❌   | ❌     | ❌    |
| Ollama   | ✅     | ❌   | ❌     | ❌    |

File Upload Guide


☁️ Ollama Cloud

Access powerful cloud models without local GPU:

# Automatic cloud detection
client = LLMClient(llm="gpt-oss:120b-cloud")

# Or explicit cloud mode
client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b-cloud",
    use_ollama_cloud=True
)

Benefits:

  • No local GPU needed
  • Access to large models (120B+)
  • Fast inference
  • Easy switching between local and cloud

Ollama Cloud Guide


πŸ› οΈ Developer FeaturesΒΆ

Comprehensive Logging

Built-in logging for debugging and monitoring:

from llm_client import setup_logging

# Enable debug logging
setup_logging(level="DEBUG")

# Your code here
client = LLMClient()

Logging Guide


Custom Exceptions

Detailed exception hierarchy for better error handling:

from llm_client.exceptions import (
    APIKeyNotFoundError,
    ChatCompletionError,
    InvalidProviderError
)

try:
    client = LLMClient(api_choice="openai")
    response = client.chat_completion(messages)
except APIKeyNotFoundError as e:
    print(f"Missing API key: {e.key_name}")
except ChatCompletionError as e:
    print(f"API error: {e}")

Exception Reference


Retry Logic

Automatic retry with exponential backoff:

# Automatically retries up to 3 times
# with delays: 4s, 8s, 10s
response = client.chat_completion(messages)
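The same pattern can be wrapped around any call. A self-contained sketch of capped exponential backoff (the 4s/8s/10s schedule mirrors the comment above, though the library's exact schedule may differ; the demo uses tiny delays so it runs fast):

```python
import time


def retry_with_backoff(fn, retries=3, base_delay=4.0, max_delay=10.0):
    """Call fn(), retrying on failure with capped exponential delays."""
    # With the defaults: 4s, 8s, then capped at 10s.
    delays = [min(base_delay * 2**i, max_delay) for i in range(retries)]
    for attempt, delay in enumerate(delays, start=1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            time.sleep(delay)


calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"


print(retry_with_backoff(flaky, base_delay=0.01, max_delay=0.02))
```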

Type Hints

Full type hints for better IDE support:

from llm_client import LLMClient
from typing import List, Dict

def process_conversation(
    client: LLMClient,
    messages: List[Dict[str, str]]
) -> str:
    return client.chat_completion(messages)

📦 Integration Features

Google Colab Support

Automatic secret loading in Google Colab:

# Add keys to Colab Secrets (🔑 icon)
# Keys: OPENAI_API_KEY, GROQ_API_KEY, etc.

from llm_client import LLMClient

# Automatically loads from Colab secrets
client = LLMClient()

llama-index Integration

Seamless integration with llama-index:

from llm_client import LLMClientAdapter, LLMClient
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Create adapter
llm_adapter = LLMClientAdapter(client=LLMClient())

# Use in llama-index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, llm=llm_adapter)

🎯 Comparison with Other Libraries

vs. OpenAI SDK

| Feature            | LLM Client | OpenAI SDK |
|--------------------|------------|------------|
| Multi-provider     | ✅         | ❌         |
| Auto-detection     | ✅         | ❌         |
| Token counting     | ✅         | ❌         |
| Provider switching | ✅         | ❌         |
| Unified interface  | ✅         | ❌         |
| Streaming          | ✅         | ✅         |

vs. LangChain

| Feature        | LLM Client | LangChain  |
|----------------|------------|------------|
| Simplicity     | ✅ Simple  | ⚠️ Complex |
| Multi-provider | ✅         | ✅         |
| Async support  | ✅         | ✅         |
| File upload    | ✅         | ⚠️ Limited |
| Learning curve | Low        | High       |

🚀 Coming Soon

Features planned for future releases:

  • [ ] Embedding support
  • [ ] Batch processing
  • [ ] Caching layer
  • [ ] Prompt templates
  • [ ] More providers (Anthropic, Cohere)
  • [ ] Advanced RAG utilities

📚 Learn More


💡 Need Help?