Features Overview
LLM Client provides a comprehensive set of features for working with multiple LLM providers. This page gives you an overview of all available features.
Core Features
[Architecture diagram: the developer calls client.chat_completion(messages) on LLMClient, the universal Python client with a unified interface. The client exposes six core features: auto-detection (automatic provider selection), dynamic switching (change providers at runtime), token counting (cost control with tiktoken), async support (non-blocking operations), streaming (real-time responses), and config files (YAML/JSON configuration). A ProviderFactory (strategy pattern) routes requests to OpenAI (gpt-4o, gpt-4o-mini; paid), Groq (llama-3.3-70b; very fast), Google Gemini (gemini-2.5-pro/flash; long contexts), local Ollama (llama3.2:1b/3b; private and offline), or Ollama Cloud (gpt-oss:120b-cloud; no local GPU required). Use cases: cost optimization (a cheaper provider for simple tasks), fallback strategy (automatic switch on outage), quality vs. speed (balancing the two), and privacy first (local models for sensitive data).]
Automatic API Detection
LLM Client automatically detects which LLM provider to use based on available API keys.
Generic API Key Support (Recommended)
In addition to provider-specific keys (OPENAI_API_KEY, etc.), the client supports a generic API_KEY variable. This is the recommended approach, because the client infers the provider from the key prefix:
| Prefix | Detected Provider |
|---|---|
| sk- | OpenAI |
| gsk- | Groq |
| gsk_ | Groq |
| AIza | Google Gemini |
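The prefix table can be read as a simple lookup. The following is a minimal sketch of that idea, not the library's actual detection code; `detect_provider` and `PREFIX_TO_PROVIDER` are hypothetical names introduced here for illustration.

```python
# Hypothetical helper: the prefix table above expressed as a lookup.
# This is an illustration, not the library's implementation.
from typing import Optional

PREFIX_TO_PROVIDER = {
    "sk-": "openai",
    "gsk-": "groq",
    "gsk_": "groq",
    "AIza": "gemini",
}


def detect_provider(api_key: str) -> Optional[str]:
    """Return the provider whose prefix matches the key, or None."""
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if api_key.startswith(prefix):
            return provider
    return None


print(detect_provider("gsk_example"))  # groq
print(detect_provider("unknown"))      # None
```

Returning `None` for an unrecognized prefix leaves it to the caller to decide whether to fall back to a provider-specific variable or raise an error.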
Example
import os
from llm_client import LLMClient
# Set only a generic key
os.environ["API_KEY"] = "sk-..."
# Client automatically detects it's OpenAI
client = LLMClient()
print(client.api_choice) # "openai"
Unified Interface
One consistent API for all providers, so there is no need to learn a different API for each:
# Same code works with any provider
messages = [{"role": "user", "content": "Hello!"}]
# Works with OpenAI
openai_client = LLMClient(api_choice="openai")
response1 = openai_client.chat_completion(messages)
# Works with Groq
groq_client = LLMClient(api_choice="groq")
response2 = groq_client.chat_completion(messages)
# Works with Gemini
gemini_client = LLMClient(api_choice="gemini")
response3 = gemini_client.chat_completion(messages)
Advanced Features (v0.3.0)
Token Counting
Accurate token counting with tiktoken for cost management:
# Count tokens before sending
token_count = client.count_tokens(messages)
print(f"This will use ~{token_count} tokens")
# Check budget
if token_count < 4000:
    response = client.chat_completion(messages)
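When a conversation exceeds the budget, one option is to drop the oldest messages until it fits. The sketch below assumes a token counter is passed in as a callable; `trim_to_budget` and `fake_count` are hypothetical names, and the word-based counter is only a stand-in so the example runs without the library.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_to_budget(
    messages: List[Message],
    count_tokens: Callable[[List[Message]], int],
    budget: int,
) -> List[Message]:
    """Drop the oldest messages until the conversation fits the budget."""
    trimmed = list(messages)
    # Always keep at least the most recent message.
    while len(trimmed) > 1 and count_tokens(trimmed) > budget:
        trimmed.pop(0)
    return trimmed


# Stand-in counter for demonstration only: counts whitespace-separated words.
# In real use, pass client.count_tokens instead.
def fake_count(msgs: List[Message]) -> int:
    return sum(len(m["content"].split()) for m in msgs)


history = [
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six"},
]
print(trim_to_budget(history, fake_count, budget=3))
```

Injecting the counter keeps the trimming logic independent of any particular provider or tokenizer.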
Async Support
Full async/await support for non-blocking operations:
import asyncio
from llm_client import LLMClient

async def main():
    # Create async client
    async_client = LLMClient(use_async=True)
    # Async completion
    response = await async_client.achat_completion(messages)
    # Async streaming
    async for chunk in async_client.achat_completion_stream(messages):
        print(chunk, end="", flush=True)

asyncio.run(main())
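The main benefit of non-blocking calls is running several requests at once. The following sketch uses `asyncio.gather` for that; `complete_many` and `fake_complete` are hypothetical names, and the fake coroutine stands in for `async_client.achat_completion` so the example runs on its own.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Message = Dict[str, str]


async def complete_many(
    complete: Callable[[List[Message]], Awaitable[str]],
    conversations: List[List[Message]],
) -> List[str]:
    """Run several completions concurrently; results keep input order."""
    return await asyncio.gather(*(complete(c) for c in conversations))


# Stand-in coroutine for demonstration; in real use pass
# async_client.achat_completion instead.
async def fake_complete(messages: List[Message]) -> str:
    await asyncio.sleep(0.01)
    return "echo: " + messages[-1]["content"]


conversations = [
    [{"role": "user", "content": "first"}],
    [{"role": "user", "content": "second"}],
]
results = asyncio.run(complete_many(fake_complete, conversations))
print(results)
```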
Configuration Files
Manage multiple provider configurations with YAML/JSON:
# Load from config file
client = LLMClient.from_config("llm_config.yaml")
# Use specific provider
client = LLMClient.from_config("llm_config.yaml", provider="groq")
Example llm_config.yaml:
default_provider: openai
providers:
  openai:
    model: gpt-4o-mini
    temperature: 0.7
  groq:
    model: llama-3.3-70b-versatile
    temperature: 0.5
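To illustrate how a default provider and a per-call override might be resolved from such a file, here is a minimal sketch. It is not the behavior of `LLMClient.from_config` itself; `resolve_provider` is a hypothetical helper, and the configuration is written as equivalent JSON so that only the standard library is needed.

```python
import json
from typing import Any, Dict, Optional

# The YAML example above, written as equivalent JSON so the sketch
# needs only the standard library.
CONFIG_TEXT = """
{
  "default_provider": "openai",
  "providers": {
    "openai": {"model": "gpt-4o-mini", "temperature": 0.7},
    "groq": {"model": "llama-3.3-70b-versatile", "temperature": 0.5}
  }
}
"""


def resolve_provider(
    config: Dict[str, Any], provider: Optional[str] = None
) -> Dict[str, Any]:
    """Return the settings for `provider`, or for the default provider."""
    name = provider or config["default_provider"]
    if name not in config["providers"]:
        raise KeyError(f"No configuration for provider {name!r}")
    return {"provider": name, **config["providers"][name]}


config = json.loads(CONFIG_TEXT)
print(resolve_provider(config))
print(resolve_provider(config, provider="groq"))
```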
Response Streaming
Stream responses in real time for better UX:
messages = [{"role": "user", "content": "Tell me a story"}]
print("Response: ", end="")
for chunk in client.chat_completion_stream(messages):
    print(chunk, end="", flush=True)
print()
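Often the full text is also needed after streaming, for example to append it to the conversation history. The sketch below assumes the stream yields plain strings, as the loop above suggests; `print_and_collect` is a hypothetical helper, and a list stands in for the real stream.

```python
from typing import Iterable


def print_and_collect(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the full response text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)


# A list stands in for client.chat_completion_stream(messages).
full_text = print_and_collect(["Once ", "upon ", "a time."])
```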
Dynamic Provider Switching
Switch between providers at runtime:
# Start with OpenAI
client = LLMClient(api_choice="openai")
response1 = client.chat_completion(messages)
# Switch to Groq
client.switch_provider("groq")
response2 = client.chat_completion(messages)
# Switch to Gemini with new parameters
client.switch_provider("gemini", temperature=0.8)
response3 = client.chat_completion(messages)
Use Cases:
- Cost optimization
- Fallback strategies
- A/B testing
- Quality vs. speed trade-offs
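A fallback strategy can be built on `switch_provider` by trying providers in order until one succeeds. The following is a sketch under that assumption; `complete_with_fallback` and `FakeClient` are hypothetical, and the fake client simulates an outage so the example runs without network access.

```python
from typing import Dict, List, Sequence

Message = Dict[str, str]


def complete_with_fallback(
    client, messages: List[Message], providers: Sequence[str]
) -> str:
    """Try each provider in order and return the first successful response."""
    last_error = None
    for name in providers:
        try:
            client.switch_provider(name)
            return client.chat_completion(messages)
        except Exception as exc:  # narrow to the library's errors in real code
            last_error = exc
    raise RuntimeError("All providers failed") from last_error


# Stand-in client for demonstration: "openai" is simulated as unavailable.
class FakeClient:
    def __init__(self) -> None:
        self.provider = None

    def switch_provider(self, name: str) -> None:
        self.provider = name

    def chat_completion(self, messages: List[Message]) -> str:
        if self.provider == "openai":
            raise ConnectionError("simulated outage")
        return f"answer from {self.provider}"


question = [{"role": "user", "content": "Hello!"}]
print(complete_with_fallback(FakeClient(), question, ["openai", "groq"]))
```

The same loop also covers cost optimization: list the cheapest provider first and more expensive ones after it.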
Tool Calling
OpenAI-compatible function/tool calling for all providers:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            }
        }
    }
}]
result = client.chat_completion_with_tools(messages, tools)
if result['tool_calls']:
    for call in result['tool_calls']:
        print(f"Calling: {call['function']['name']}")
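After the model requests a tool, the application still has to run it. The sketch below dispatches each call to a local Python function. It assumes the arguments arrive as a JSON string, as in the OpenAI tool-call format; `run_tool_calls`, `TOOL_REGISTRY`, and the placeholder `get_weather` implementation are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(location: str) -> str:
    """Placeholder implementation; a real tool would query a weather service."""
    return f"Sunny in {location}"


# Maps tool names from the schema to local Python functions.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Any]:
    """Execute each requested tool with its decoded arguments."""
    outputs = []
    for call in tool_calls:
        function = call["function"]
        handler = TOOL_REGISTRY[function["name"]]
        # Assumption: arguments arrive as a JSON string (OpenAI format).
        arguments = json.loads(function.get("arguments") or "{}")
        outputs.append(handler(**arguments))
    return outputs


sample_calls = [
    {"function": {"name": "get_weather", "arguments": '{"location": "Berlin"}'}}
]
print(run_tool_calls(sample_calls))
```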
File Upload
Upload images, PDFs, videos, and audio with your messages:
# Analyze an image
messages = [{"role": "user", "content": "What's in this image?"}]
response = client.chat_completion_with_files(
    messages,
    files=["photo.jpg"]
)
# Analyze a PDF
messages = [{"role": "user", "content": "Summarize this document"}]
response = client.chat_completion_with_files(
    messages,
    files=["report.pdf"]
)
Supported Formats by Provider:
| Provider | Images | PDFs | Videos | Audio |
|---|---|---|---|---|
| OpenAI | β | β | β | β |
| Gemini | β | β | β | β |
| Groq | β | β | β | β |
| Ollama | β | β | β | β |
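Because format support differs by provider, it can help to classify a file before sending it. This sketch uses the standard-library `mimetypes` module to map a path to one of the four categories in the table; `classify_file` is a hypothetical helper and does not reflect how the library inspects files.

```python
import mimetypes


def classify_file(path: str) -> str:
    """Map a file path to one of: image, pdf, video, audio, unknown."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return "unknown"
    if mime == "application/pdf":
        return "pdf"
    category = mime.split("/")[0]
    return category if category in {"image", "video", "audio"} else "unknown"


for name in ["photo.jpg", "report.pdf", "clip.mp4", "README"]:
    print(name, "->", classify_file(name))
```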
Ollama Cloud
Access powerful cloud models without a local GPU:
# Automatic cloud detection
client = LLMClient(llm="gpt-oss:120b-cloud")
# Or explicit cloud mode
client = LLMClient(
    api_choice="ollama",
    llm="gpt-oss:120b-cloud",
    use_ollama_cloud=True
)
Benefits:
- No local GPU needed
- Access to large models (120B+)
- Fast inference
- Easy switching between local and cloud
Developer Features
Comprehensive Logging
Built-in logging for debugging and monitoring:
from llm_client import LLMClient, setup_logging
# Enable debug logging
setup_logging(level="DEBUG")
# Your code here
client = LLMClient()
Custom Exceptions
Detailed exception hierarchy for better error handling:
from llm_client import LLMClient
from llm_client.exceptions import (
    APIKeyNotFoundError,
    ChatCompletionError,
    InvalidProviderError
)
try:
    client = LLMClient(api_choice="openai")
    response = client.chat_completion(messages)
except APIKeyNotFoundError as e:
    print(f"Missing API key: {e.key_name}")
except InvalidProviderError as e:
    print(f"Invalid provider: {e}")
except ChatCompletionError as e:
    print(f"API error: {e}")
Retry Logic
Automatic retry with exponential backoff:
# Automatically retries up to 3 times
# with delays: 4s, 8s, 10s
response = client.chat_completion(messages)
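The quoted delays of 4s, 8s, and 10s are consistent with a delay that doubles from 4 seconds and is capped at 10 seconds. The sketch below reproduces that schedule under this assumption; the library's actual retry implementation may differ, and `backoff_delays` and `call_with_retry` are hypothetical names.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(
    retries: int = 3, base: float = 4.0, cap: float = 10.0
) -> List[float]:
    """Delay before each retry: base * 2**n, capped at `cap`."""
    return [min(base * 2 ** n, cap) for n in range(retries)]


def call_with_retry(
    func: Callable[[], T], sleep: Callable[[float], None] = time.sleep
) -> T:
    """Call func, sleeping and retrying after each failure."""
    for delay in backoff_delays():
        try:
            return func()
        except Exception:
            sleep(delay)
    return func()  # final attempt; any exception propagates to the caller


print(backoff_delays())  # [4.0, 8.0, 10.0]
```

Passing `sleep` as a parameter makes the wrapper testable without waiting for real delays.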
Type Hints
Full type hints for better IDE support:
from llm_client import LLMClient
from typing import List, Dict
def process_conversation(
    client: LLMClient,
    messages: List[Dict[str, str]]
) -> str:
    return client.chat_completion(messages)
Integration Features
Google Colab Support
Automatic secret loading in Google Colab:
# Add keys to Colab Secrets (key icon)
# Keys: OPENAI_API_KEY, GROQ_API_KEY, etc.
from llm_client import LLMClient
# Automatically loads from Colab secrets
client = LLMClient()
llama-index Integration
Seamless integration with llama-index:
from llm_client import LLMClientAdapter, LLMClient
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Create adapter
llm_adapter = LLMClientAdapter(client=LLMClient())
# Use in llama-index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, llm=llm_adapter)
Comparison with Other Libraries
vs. OpenAI SDK
| Feature | LLM Client | OpenAI SDK |
|---|---|---|
| Multi-provider | ✅ | ❌ |
| Auto-detection | ✅ | ❌ |
| Token counting | ✅ | ❌ |
| Provider switching | ✅ | ❌ |
| Unified interface | ✅ | ❌ |
| Streaming | ✅ | ✅ |
vs. LangChain
| Feature | LLM Client | LangChain |
|---|---|---|
| Simplicity | ✅ Simple | ⚠️ Complex |
| Multi-provider | ✅ | ✅ |
| Async support | ✅ | ✅ |
| File upload | ✅ | ⚠️ Limited |
| Learning curve | Low | High |
Coming Soon
Features planned for future releases:
- [ ] Embedding support
- [ ] Batch processing
- [ ] Caching layer
- [ ] Prompt templates
- [ ] More providers (Anthropic, Cohere)
- [ ] Advanced RAG utilities
Learn More
- Getting Started - Installation and setup
- API Reference - Complete API documentation
- Examples - Real-world examples
- Troubleshooting - Common issues
Need Help?
- Documentation
- Report Issues