Changelog¶
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
v0.4.1 (2026-02-12)¶
v0.4.0 (2026-02-11)¶
Feat¶
- add automatic versioning and changelog updates
Fix¶
- update branch references from main to master
- enable emoji and icon support in documentation
- resolve async provider test failures and CI issues
- tests: resolve failing async provider tests
- tests: make test_detect_file_type_unsupported deterministic
[0.3.0] - 2025-01-XX¶
Added¶
Token Counting¶
- Token counting with tiktoken - Accurate token counting for all messages
- `count_tokens()` method for counting tokens in message lists
- `count_string_tokens()` method for counting tokens in plain text
- Automatic fallback to estimation when tiktoken is not available
- Support for all GPT models (GPT-4o, GPT-4o-mini, GPT-3.5-turbo)
Async Support¶
- Full async/await support for chat completions
- `achat_completion()` - Async chat completion method
- `achat_completion_stream()` - Async streaming support
- `achat_completion_with_tools()` - Async tool calling support
- Async providers: `AsyncOpenAIProvider`, `AsyncGroqProvider`, `AsyncGeminiProvider`
- `use_async=True` parameter for creating async clients
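The async call shape looks roughly like the sketch below. A stub coroutine stands in for a real provider here; `achat_completion` as defined in this snippet is a hypothetical stand-in, not the package's method.

```python
# Sketch of the async/await call shape described above, with a stub
# coroutine standing in for a real provider round-trip.
import asyncio

async def achat_completion(messages: list[dict]) -> str:
    await asyncio.sleep(0)  # stands in for the network call
    return f"echo: {messages[-1]['content']}"

async def main() -> str:
    messages = [{"role": "user", "content": "Hi"}]
    return await achat_completion(messages)

print(asyncio.run(main()))
```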
Configuration Files¶
- YAML/JSON configuration file support
- `LLMConfig` class for managing configurations
- `from_config()` class method to load a client from config files
- `generate_config_template()` utility function
- `create_default_config()` helper function
- Configuration validation with detailed error messages
- Support for multiple provider configurations in one file
- Global settings with per-provider overrides
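A multi-provider config file of the kind described above might look like the JSON below (JSON rather than YAML so the sketch needs only the standard library). The key names (`default_provider`, `providers`) and the validation message are illustrative assumptions, not the package's actual schema.

```python
# Sketch: load and validate a multi-provider config file of the shape
# described above. Key names are assumptions, not the real schema.
import json
import os
import tempfile

config_text = """
{
  "default_provider": "openai",
  "providers": {
    "openai": {"model": "gpt-4o-mini", "temperature": 0.2},
    "ollama": {"model": "llama3", "host": "http://localhost:11434"}
  }
}
"""

def load_config(path: str) -> dict:
    with open(path) as f:
        config = json.load(f)
    if "providers" not in config:
        # Detailed, actionable error message as described in the changelog
        raise ValueError(f"{path}: missing required 'providers' section")
    return config

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    f.write(config_text)
    path = f.name
config = load_config(path)
os.unlink(path)
print(config["default_provider"])
```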
Ollama Cloud Support¶
- Ollama Cloud API integration - Access to cloud-hosted Ollama models
- Automatic cloud detection from model names ending with `-cloud`
- Support for the `OLLAMA_API_KEY` environment variable
- `use_ollama_cloud=True` parameter for explicit cloud mode
- `ollama_host` parameter for custom Ollama endpoints
- Seamless switching between local and cloud Ollama instances
- Example: `ollama_cloud_examples.py` demonstrating all cloud features
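The `-cloud` suffix detection can be sketched like this. The endpoint URLs and the missing-key handling are illustrative assumptions; only the `-cloud` naming convention, the `use_ollama_cloud` flag, and the `OLLAMA_API_KEY` variable come from the changelog.

```python
# Sketch of the "-cloud" model-name detection described above.
# Endpoint URLs are assumptions, not the package's actual values.
import os

def resolve_ollama_host(model: str, use_ollama_cloud: bool = False) -> str:
    is_cloud = use_ollama_cloud or model.endswith("-cloud")
    if is_cloud:
        if not os.environ.get("OLLAMA_API_KEY"):
            pass  # a real client would raise a missing-API-key error here
        return "https://ollama.com"
    return "http://localhost:11434"

print(resolve_ollama_host("gpt-oss:120b-cloud"))
print(resolve_ollama_host("llama3"))
```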
Changed¶
- Enhanced documentation with examples for all new features
- Expanded test suite with >92% coverage
- Updated dependencies: `tiktoken`, `pyyaml`, `asyncio`
- Improved type hints throughout the codebase
Examples¶
- `examples/usage_examples.py` - Demonstrates token counting, async, and config features
- `examples/ollama_cloud_examples.py` - Comprehensive Ollama Cloud usage examples
- Updated all existing examples with new capabilities
Dependencies¶
- Added `tiktoken` for accurate token counting
- Added `pyyaml` for YAML configuration support
- Added `asyncio` for async support
Maintained¶
- 100% backward compatibility
- All existing functionality preserved
- No breaking changes
[0.2.0] - 2024-12-XX¶
Added¶
Response Streaming¶
- Response streaming support for all providers (OpenAI, Groq, Gemini, Ollama)
- `chat_completion_stream()` method for real-time token streaming
- Generator-based API for memory-efficient streaming
- Enables progressive response display in UIs
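The generator-based shape of the streaming API can be sketched as follows, with a stub standing in for the provider's token stream; the real method yields chunks as they arrive over the wire.

```python
# Sketch of the generator-based streaming API shape described above.
# The token list is a stub for a real provider's incremental output.
from typing import Iterator

def chat_completion_stream(messages: list[dict]) -> Iterator[str]:
    for token in ["Hello", ", ", "world", "!"]:
        yield token  # a real provider yields tokens as they arrive

chunks = []
for chunk in chat_completion_stream([{"role": "user", "content": "Hi"}]):
    chunks.append(chunk)  # progressive display would print each chunk here
print("".join(chunks))
```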
Retry Logic¶
- Automatic retry with exponential backoff
- Up to 3 retry attempts on transient failures
- Exponential backoff: 4s, 8s, 10s delays
- Powered by the `tenacity` library
- Transparent handling of temporary API errors
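The package uses tenacity for this; the stdlib-only sketch below shows the equivalent stop-after-3-attempts pattern with the delay schedule described above. The helper name and the flaky stub are illustrative.

```python
# Stdlib sketch of the retry pattern described above (the package itself
# uses tenacity). Delays follow the 4s/8s/10s schedule from the changelog.
import time

def retry_with_backoff(func, attempts: int = 3, delays=(4, 8, 10)):
    """Call func, retrying on exception with the given per-attempt delays."""
    for i in range(attempts):
        try:
            return func()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(delays[i])

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

# Zero delays so the demo runs instantly; real code keeps (4, 8, 10).
result = retry_with_backoff(flaky, delays=(0, 0, 0))
print(result)
```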
Custom Exceptions¶
- Comprehensive exception hierarchy:
  - `LLMClientError` - Base exception for all package errors
  - `APIKeyNotFoundError` - Missing API key errors with context
  - `ProviderNotAvailableError` - Package installation errors
  - `InvalidProviderError` - Invalid provider name errors
  - `ChatCompletionError` - API call failures with original error
  - `StreamingNotSupportedError` - Streaming not available errors
- Detailed error messages with actionable information
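A hierarchy of this shape lets callers catch the base class to handle any package error. Only the class names come from the changelog; the bodies below are illustrative.

```python
# Sketch of the exception hierarchy listed above (subset of the classes).
class LLMClientError(Exception):
    """Base exception for all package errors."""

class APIKeyNotFoundError(LLMClientError):
    """Raised when a required API key is missing."""

class ChatCompletionError(LLMClientError):
    """Raised when an API call fails; carries the original error."""
    def __init__(self, message, original=None):
        super().__init__(message)
        self.original = original

# Catching the base class handles any package error uniformly.
try:
    raise APIKeyNotFoundError("OPENAI_API_KEY not set")
except LLMClientError as e:
    caught = e

print(type(caught).__name__)
```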
Architecture Improvements¶
- Strategy Pattern implementation with provider classes
- Factory Pattern for provider creation
- `BaseProvider` abstract class for a consistent interface
- Concrete providers: `OpenAIProvider`, `GroqProvider`, `GeminiProvider`, `OllamaProvider`
- `ProviderFactory` for centralized provider management
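The Strategy and Factory patterns named above fit together roughly as sketched below. The class and method names mirror the changelog, but the bodies and the registry layout are stand-ins, not the actual implementation.

```python
# Sketch of the Strategy (BaseProvider subclasses) + Factory
# (ProviderFactory) arrangement described above.
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    @abstractmethod
    def chat_completion(self, messages: list[dict]) -> str: ...

class OpenAIProvider(BaseProvider):
    def chat_completion(self, messages: list[dict]) -> str:
        return "openai response"  # a real provider would call the API here

class OllamaProvider(BaseProvider):
    def chat_completion(self, messages: list[dict]) -> str:
        return "ollama response"

class ProviderFactory:
    _registry = {"openai": OpenAIProvider, "ollama": OllamaProvider}

    @classmethod
    def create(cls, name: str) -> BaseProvider:
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Invalid provider: {name}") from None

provider = ProviderFactory.create("openai")
print(provider.chat_completion([{"role": "user", "content": "Hi"}]))
```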
Changed¶
- `chat_completion()` now includes automatic retry logic by default
- Error messages are more descriptive with custom exceptions
- Better type hints and documentation throughout
- Refactored codebase for better maintainability
Dependencies¶
- Added `tenacity>=8.2.0` for retry logic
Examples¶
- `examples/streaming_example.py` - Comprehensive streaming examples
- Demonstrates retry behavior and exception handling
Tests¶
- Full test coverage for streaming functionality
- Tests for retry logic with transient failures
- Exception handling tests
- Provider switching with streaming support
Maintained¶
- 100% backward compatibility
- All existing functionality preserved
- No breaking changes
[0.1.0] - 2024-11-XX¶
Added¶
- Initial release
- Support for multiple LLM providers:
  - OpenAI (GPT-4, GPT-4o, GPT-3.5-turbo)
  - Groq (Llama, Mixtral, Gemma models)
  - Google Gemini (Gemini 2.0 Flash, Gemini 2.5 Flash)
  - Ollama (local models)
- Automatic provider detection based on available API keys
- Dynamic provider switching at runtime with `switch_provider()`
- Unified interface - one method for all LLM backends
- Flexible configuration - model, temperature, and max_tokens customizable
- Google Colab support - automatic secret loading from userdata
- Zero-config - works out of the box with Ollama (no API keys needed)
- llama-index integration via `LLMClientAdapter`
- Comprehensive test suite with >90% coverage
- Detailed documentation with examples and tutorials
- Jupyter notebooks for RAG applications
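The unified-interface and runtime-switching ideas above can be sketched as follows. `LLMClient` here is a simplified stand-in built on stub providers, not the real class.

```python
# Sketch of a unified interface with runtime provider switching,
# as described above. All classes here are simplified stand-ins.
class _StubProvider:
    def __init__(self, name: str):
        self.name = name
    def chat_completion(self, messages: list[dict]) -> str:
        return f"{self.name} response"  # a real provider would call its API

class LLMClient:
    def __init__(self, provider: str = "ollama"):
        self.switch_provider(provider)
    def switch_provider(self, provider: str) -> None:
        # Swap the active backend without changing calling code
        self._provider = _StubProvider(provider)
    def chat_completion(self, messages: list[dict]) -> str:
        return self._provider.chat_completion(messages)

client = LLMClient(provider="openai")
r1 = client.chat_completion([{"role": "user", "content": "Hi"}])
client.switch_provider("groq")
r2 = client.chat_completion([{"role": "user", "content": "Hi"}])
print(r1, "/", r2)
```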
Project Structure¶
- Clean architecture with provider abstraction
- CI/CD pipelines (GitHub Actions)
- Automated testing (pytest)
- Code quality tools (black, ruff, mypy, bandit)
- Pre-commit hooks for code quality
Documentation¶
- Comprehensive README with examples
- Contribution guidelines (development/contributing.md)
- Test documentation (tests/README.md)
- Notebook tutorials for RAG applications
Release History¶
- 0.3.0 - Token counting, async support, configuration files, Ollama Cloud
- 0.2.0 - Streaming, retry logic, custom exceptions
- 0.1.0 - Initial release with multi-provider support
Upgrading¶
From 0.2.0 to 0.3.0¶
No breaking changes. New features are opt-in:
```python
# Token counting (optional)
token_count = client.count_tokens(messages)

# Async support (optional)
async_client = LLMClient(use_async=True)
response = await async_client.achat_completion(messages)

# Config files (optional)
client = LLMClient.from_config("config.yaml")

# Ollama Cloud (optional)
cloud_client = LLMClient(llm="gpt-oss:120b-cloud", use_ollama_cloud=True)
```
From 0.1.0 to 0.2.0¶
No breaking changes. New features work automatically:
- Retry logic is enabled by default
- Streaming available via `chat_completion_stream()`
- Custom exceptions provide better error messages
- All existing code continues to work unchanged
Contributing¶
See development/contributing.md for guidelines on contributing to this project.
License¶
This project is licensed under the MIT License - see LICENSE for details.