# Features Overview
LLM Client provides a comprehensive set of features for working with multiple LLM providers. This page gives you an overview of all available features.
## 🎯 Core Features
At a glance: your application talks to a single `LLMClient` through one unified interface, `client.chat_completion(messages)`. The client exposes the core features (auto-detection, dynamic provider switching, token counting with tiktoken, async support, streaming, and YAML/JSON config files) and delegates to a `ProviderFactory` (strategy pattern) that selects one of the supported providers:

- **OpenAI**: gpt-4o, gpt-4o-mini (💰 paid)
- **Groq**: llama-3.3-70b (⚡ ultra-fast)
- **Google Gemini**: gemini-2.5-pro/flash (long contexts)
- **Ollama local**: llama3.2:1b/3b (private and offline)
- **Ollama Cloud**: gpt-oss:120b-cloud (☁️ no local GPU)

Typical use cases for dynamic switching: cost optimization (a cheaper provider for simple tasks), automatic fallback when a provider is down, balancing quality against speed, and privacy-first local models for sensitive data.
### Automatic API Detection
LLM Client automatically detects which LLM provider to use based on available API keys:
```python
from llm_client import LLMClient

# Automatically selects the first available provider:
# 1. OpenAI (if OPENAI_API_KEY set)
# 2. Groq (if GROQ_API_KEY set)
# 3. Gemini (if GEMINI_API_KEY set)
# 4. Ollama (local fallback, no key needed)
client = LLMClient()
print(f"Using: {client.api_choice}")  # e.g., "openai"
```
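The priority order above can be mirrored in a few lines. This is an illustrative sketch of the selection logic, not the library's actual implementation; the helper name `detect_provider` is hypothetical:

```python
import os

def detect_provider(env=None):
    """Pick a provider name using the documented priority order."""
    env = os.environ if env is None else env
    for key, provider in [
        ("OPENAI_API_KEY", "openai"),
        ("GROQ_API_KEY", "groq"),
        ("GEMINI_API_KEY", "gemini"),
    ]:
        if env.get(key):
            return provider
    return "ollama"  # local fallback, no key needed

print(detect_provider({"GROQ_API_KEY": "gsk-..."}))  # groq
```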
### Unified Interface
One consistent API for all providers - no need to learn different APIs:
```python
# Same code works with any provider
messages = [{"role": "user", "content": "Hello!"}]

# Works with OpenAI
openai_client = LLMClient(api_choice="openai")
response1 = openai_client.chat_completion(messages)

# Works with Groq
groq_client = LLMClient(api_choice="groq")
response2 = groq_client.chat_completion(messages)

# Works with Gemini
gemini_client = LLMClient(api_choice="gemini")
response3 = gemini_client.chat_completion(messages)
```
## ✨ Advanced Features (v0.3.0)
### 📊 Token Counting
Accurate token counting with tiktoken for cost management:
```python
# Count tokens before sending
token_count = client.count_tokens(messages)
print(f"This will use ~{token_count} tokens")

# Check budget
if token_count < 4000:
    response = client.chat_completion(messages)
```
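Because `count_tokens` accepts a full message list, it can drive simple context-window management. A sketch that drops the oldest turns until the conversation fits a budget (the `trim_to_budget` helper is ours, not part of the library):

```python
def trim_to_budget(messages, count_tokens, budget):
    """Drop the oldest non-system messages until the count fits the budget.

    `count_tokens` is any callable that maps a message list to an int,
    e.g. `client.count_tokens`.
    """
    trimmed = list(messages)
    while len(trimmed) > 1 and count_tokens(trimmed) > budget:
        # Preserve a leading system prompt; drop the oldest turn after it.
        drop_at = 1 if trimmed[0].get("role") == "system" else 0
        del trimmed[drop_at]
    return trimmed
```

With a real client this would be called as `trim_to_budget(history, client.count_tokens, 4000)` before each request.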
### ⚡ Async Support
Full async/await support for non-blocking operations:
```python
# Create async client
async_client = LLMClient(use_async=True)

# Async completion
response = await async_client.achat_completion(messages)

# Async streaming
async for chunk in async_client.achat_completion_stream(messages):
    print(chunk, end="", flush=True)
```
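Async clients also make concurrent fan-out straightforward. A sketch (the `fan_out` helper is illustrative, not a library API) that sends the same messages to several clients at once with `asyncio.gather`:

```python
import asyncio

async def fan_out(clients, messages):
    """Query several async clients concurrently; results keep input order."""
    tasks = [client.achat_completion(messages) for client in clients]
    return await asyncio.gather(*tasks)
```

For example, `asyncio.run(fan_out([LLMClient(api_choice=p, use_async=True) for p in ("openai", "groq")], messages))` would compare two providers side by side.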
### 📝 Configuration Files
Manage multiple provider configurations with YAML/JSON:
```python
# Load from config file
client = LLMClient.from_config("llm_config.yaml")

# Use specific provider
client = LLMClient.from_config("llm_config.yaml", provider="groq")
```
Example `llm_config.yaml`:

```yaml
default_provider: openai

providers:
  openai:
    model: gpt-4o-mini
    temperature: 0.7
  groq:
    model: llama-3.3-70b-versatile
    temperature: 0.5
```
### 🌊 Response Streaming
Stream responses in real-time for better UX:
```python
messages = [{"role": "user", "content": "Tell me a story"}]

print("Response: ", end="")
for chunk in client.chat_completion_stream(messages):
    print(chunk, end="", flush=True)
print()
```
### 🔄 Dynamic Provider Switching
Switch between providers at runtime:
```python
# Start with OpenAI
client = LLMClient(api_choice="openai")
response1 = client.chat_completion(messages)

# Switch to Groq
client.switch_provider("groq")
response2 = client.chat_completion(messages)

# Switch to Gemini with new parameters
client.switch_provider("gemini", temperature=0.8)
response3 = client.chat_completion(messages)
```
Use Cases:

- Cost optimization
- Fallback strategies
- A/B testing
- Quality vs. speed trade-offs
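The fallback use case can be built directly on `switch_provider`. An illustrative sketch (the helper name and the broad `except` are ours; in practice you would catch `ChatCompletionError` from `llm_client.exceptions`):

```python
def complete_with_fallback(client, messages, providers):
    """Try each provider in order, moving on when a call fails."""
    last_error = None
    for provider in providers:
        try:
            client.switch_provider(provider)
            return client.chat_completion(messages)
        except Exception as exc:  # narrow to ChatCompletionError in real code
            last_error = exc
    raise last_error
```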
### 🧰 Tool Calling
OpenAI-compatible function/tool calling for all providers:
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            }
        }
    }
}]

result = client.chat_completion_with_tools(messages, tools)

if result['tool_calls']:
    for call in result['tool_calls']:
        print(f"Calling: {call['function']['name']}")
```
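Once the model requests a tool, your code executes it and feeds the result back. Assuming the OpenAI-compatible shape, where `arguments` arrives as a JSON string, a minimal dispatcher could look like this (the helper is illustrative, not a library API):

```python
import json

def dispatch_tool_calls(tool_calls, registry):
    """Run each requested tool from a name -> callable registry."""
    results = []
    for call in tool_calls:
        func = registry[call["function"]["name"]]
        # OpenAI-style tool calls carry arguments as a JSON-encoded string.
        args = json.loads(call["function"].get("arguments") or "{}")
        results.append(func(**args))
    return results
```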
### 📎 File Upload
Upload images, PDFs, videos, and audio with your messages:
```python
# Analyze an image
messages = [{"role": "user", "content": "What's in this image?"}]
response = client.chat_completion_with_files(
    messages,
    files=["photo.jpg"]
)

# Analyze a PDF
messages = [{"role": "user", "content": "Summarize this document"}]
response = client.chat_completion_with_files(
    messages,
    files=["report.pdf"]
)
```
Supported Formats by Provider:
| Provider | Images | PDFs | Videos | Audio |
|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ❌ | ✅ |
| Gemini | ✅ | ✅ | ✅ | ✅ |
| Groq | ✅ | ❌ | ❌ | ❌ |
| Ollama | ✅ | ❌ | ❌ | ❌ |
### ☁️ Ollama Cloud
Access powerful cloud models without local GPU:
```python
# Automatic cloud detection
client = LLMClient(llm="gpt-oss:120b-cloud")

# Or explicit cloud mode
client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b-cloud",
    use_ollama_cloud=True
)
```
Benefits:

- No local GPU needed
- Access to large models (120B+)
- Fast inference
- Easy switching between local and cloud
## 🛠️ Developer Features
### Comprehensive Logging
Built-in logging for debugging and monitoring:
```python
from llm_client import setup_logging

# Enable debug logging
setup_logging(level="DEBUG")

# Your code here
client = LLMClient()
```
### Custom Exceptions
Detailed exception hierarchy for better error handling:
```python
from llm_client.exceptions import (
    APIKeyNotFoundError,
    ChatCompletionError,
    InvalidProviderError
)

try:
    client = LLMClient(api_choice="openai")
    response = client.chat_completion(messages)
except APIKeyNotFoundError as e:
    print(f"Missing API key: {e.key_name}")
except ChatCompletionError as e:
    print(f"API error: {e}")
```
### Retry Logic
Automatic retry with exponential backoff:
```python
# Automatically retries up to 3 times
# with delays: 4s, 8s, 10s
response = client.chat_completion(messages)
```
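The documented delays follow a standard exponential-backoff schedule with a cap. A sketch of the arithmetic (illustrative only; the library handles this internally and the helper name is ours):

```python
def backoff_delays(attempts, base=2.0, cap=10.0):
    """Exponential backoff delays: base * 2**n, capped at `cap` seconds."""
    return [min(cap, base * 2 ** (n + 1)) for n in range(attempts)]

print(backoff_delays(3))  # [4.0, 8.0, 10.0]
```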
### Type Hints
Full type hints for better IDE support:
```python
from llm_client import LLMClient
from typing import List, Dict

def process_conversation(
    client: LLMClient,
    messages: List[Dict[str, str]]
) -> str:
    return client.chat_completion(messages)
```
## 📦 Integration Features
### Google Colab Support
Automatic secret loading in Google Colab:
```python
# Add keys to Colab Secrets (🔑 icon)
# Keys: OPENAI_API_KEY, GROQ_API_KEY, etc.
from llm_client import LLMClient

# Automatically loads from Colab secrets
client = LLMClient()
```
### llama-index Integration
Seamless integration with llama-index:
```python
from llm_client import LLMClientAdapter, LLMClient
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Create adapter
llm_adapter = LLMClientAdapter(client=LLMClient())

# Use in llama-index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, llm=llm_adapter)
```
## 🎯 Comparison with Other Libraries
### vs. OpenAI SDK
| Feature | LLM Client | OpenAI SDK |
|---|---|---|
| Multi-provider | ✅ | ❌ |
| Auto-detection | ✅ | ❌ |
| Token counting | ✅ | ❌ |
| Provider switching | ✅ | ❌ |
| Unified interface | ✅ | ❌ |
| Streaming | ✅ | ✅ |
### vs. LangChain
| Feature | LLM Client | LangChain |
|---|---|---|
| Simplicity | ✅ Simple | ⚠️ Complex |
| Multi-provider | ✅ | ✅ |
| Async support | ✅ | ✅ |
| File upload | ✅ | ⚠️ Limited |
| Learning curve | Low | High |
## 🚀 Coming Soon
Features planned for future releases:
- [ ] Embedding support
- [ ] Batch processing
- [ ] Caching layer
- [ ] Prompt templates
- [ ] More providers (Anthropic, Cohere)
- [ ] Advanced RAG utilities
## 📚 Learn More
- Getting Started - Installation and setup
- API Reference - Complete API documentation
- Examples - Real-world examples
- Troubleshooting - Common issues
## 💡 Need Help?
- Documentation
- Report Issues