Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.


v0.4.1 (2026-02-12)

v0.4.0 (2026-02-11)

Feat

  • add automatic versioning and changelog updates

Fix

  • update branch references from main to master
  • enable emoji and icon support in documentation
  • resolve async provider test failures and CI issues
  • tests: resolve failing async provider tests
  • tests: make test_detect_file_type_unsupported deterministic

[0.3.0] - 2025-01-XX

Added

Token Counting

  • ✨ Token counting with tiktoken - Accurate token counting for all messages
  • 📊 count_tokens() method for counting tokens in message lists
  • 📊 count_string_tokens() method for counting tokens in plain text
  • 🔄 Automatic fallback to estimation when tiktoken is not available
  • 📦 Support for all GPT models (GPT-4o, GPT-4o-mini, GPT-3.5-turbo)
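
The fallback behaviour above can be sketched as follows. This is an illustrative pattern, not the package's actual implementation; the ~4-characters-per-token estimate is an assumption commonly used for GPT-style tokenizers.

```python
def count_string_tokens(text: str, model: str = "gpt-4o") -> int:
    """Count tokens with tiktoken when available, else estimate."""
    try:
        import tiktoken  # optional dependency
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        # Fallback heuristic: roughly 4 characters per token
        return max(1, len(text) // 4)
```

The try/except around the import keeps tiktoken optional: environments without it still get a usable, if approximate, count.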

Async Support

  • ⚡ Full async/await support for chat completions
  • 🔄 achat_completion() - Async chat completion method
  • 🔄 achat_completion_stream() - Async streaming support
  • 🔄 achat_completion_with_tools() - Async tool calling support
  • 📦 Async providers: AsyncOpenAIProvider, AsyncGroqProvider, AsyncGeminiProvider
  • 🎯 use_async=True parameter for creating async clients
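
The calling pattern these a*-methods expose can be sketched with a stand-in provider. `StubAsyncProvider` below is hypothetical and only illustrates the await/async-iteration shape; it is not part of the package.

```python
import asyncio
from typing import AsyncIterator

class StubAsyncProvider:
    async def achat_completion(self, messages: list[dict]) -> str:
        await asyncio.sleep(0)  # yield control, as a real HTTP call would
        return f"echo: {messages[-1]['content']}"

    async def achat_completion_stream(self, messages: list[dict]) -> AsyncIterator[str]:
        for token in ["Hel", "lo", "!"]:
            await asyncio.sleep(0)
            yield token  # tokens arrive incrementally

async def main() -> tuple[str, str]:
    provider = StubAsyncProvider()
    full = await provider.achat_completion([{"role": "user", "content": "hi"}])
    streamed = "".join([t async for t in provider.achat_completion_stream([])])
    return full, streamed
```

Run with `asyncio.run(main())`; the streaming variant is an async generator, so tokens can be consumed with `async for` as they are produced.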

Configuration Files

  • πŸ“ YAML/JSON configuration file support
  • πŸ”§ LLMConfig class for managing configurations
  • πŸ”§ from_config() class method to load client from config files
  • πŸ”§ generate_config_template() utility function
  • πŸ”§ create_default_config() helper function
  • βœ… Configuration validation with detailed error messages
  • πŸ”„ Support for multiple provider configurations in one file
  • 🌍 Global settings with per-provider overrides
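
A config file combining global settings with per-provider overrides might look like the sketch below. The field names are illustrative assumptions, not the package's exact schema; generate_config_template() produces the authoritative layout.

```yaml
# Hypothetical config.yaml sketch - field names are illustrative
default_provider: openai
global_settings:
  temperature: 0.7
  max_tokens: 1024
providers:
  openai:
    model: gpt-4o-mini
    temperature: 0.2   # per-provider override of the global setting
  ollama:
    model: llama3
    host: http://localhost:11434
```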

Ollama Cloud Support

  • ☁️ Ollama Cloud API integration - Access to cloud-hosted Ollama models
  • 🔄 Automatic cloud detection from model names ending with -cloud
  • 🔑 Support for OLLAMA_API_KEY environment variable
  • 🎯 use_ollama_cloud=True parameter for explicit cloud mode
  • 🌐 ollama_host parameter for custom Ollama endpoints
  • 🔀 Seamless switching between local and cloud Ollama instances
  • 📝 Example: ollama_cloud_examples.py demonstrating all cloud features

Changed

  • 📚 Enhanced documentation with examples for all new features
  • 🧪 Expanded test suite with >92% coverage
  • 📦 Updated dependencies: tiktoken, pyyaml
  • 🔧 Improved type hints throughout the codebase

Examples

  • 📖 examples/usage_examples.py - Demonstrates token counting, async, and config features
  • 📖 examples/ollama_cloud_examples.py - Comprehensive Ollama Cloud usage examples
  • 📖 Updated all existing examples with new capabilities

Dependencies

  • ➕ Added tiktoken for accurate token counting
  • ➕ Added pyyaml for YAML configuration support
  • ➕ Uses asyncio (Python standard library) for async support

Maintained

  • ✅ 100% backward compatibility
  • ✅ All existing functionality preserved
  • ✅ No breaking changes

[0.2.0] - 2024-12-XX

Added

Response Streaming

  • ✨ Response streaming support for all providers (OpenAI, Groq, Gemini, Ollama)
  • 🔄 chat_completion_stream() method for real-time token streaming
  • 📦 Generator-based API for memory-efficient streaming
  • ⚡ Enables progressive response display in UIs
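
The generator-based shape can be sketched as follows; `fake_api_chunks` stands in for the provider's network stream and is not part of the package.

```python
from typing import Iterator

def fake_api_chunks() -> Iterator[str]:
    """Stand-in for chunks arriving from a provider's streaming API."""
    yield from ["Stream", "ing ", "works"]

def chat_completion_stream() -> Iterator[str]:
    for chunk in fake_api_chunks():
        # Each chunk can be rendered immediately (progressive UI display),
        # so the full response is never buffered in memory.
        yield chunk
```

A UI consumes it chunk by chunk, e.g. `for chunk in chat_completion_stream(): print(chunk, end="")`.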

Retry Logic

  • 🔄 Automatic retry with exponential backoff
  • 🎯 Up to 3 retry attempts on transient failures
  • ⏱️ Exponential backoff: 4s, 8s, 10s delays
  • 📦 Powered by tenacity library
  • 🛡️ Transparent handling of temporary API errors
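
The 4s/8s/10s schedule is doubling from a 4-second base with a 10-second cap; with tenacity this roughly corresponds to wait_exponential(multiplier=1, min=4, max=10) plus stop_after_attempt(3). The helper below just shows the arithmetic and is not the package's code.

```python
def backoff_delays(attempts: int = 3, base: int = 4, cap: int = 10) -> list[int]:
    # Doubling from `base`, clamped at `cap`: 4, 8, then 16 -> capped to 10
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```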

Custom Exceptions

  • 🚨 Comprehensive exception hierarchy
  • LLMClientError - Base exception for all package errors
  • APIKeyNotFoundError - Missing API key errors with context
  • ProviderNotAvailableError - Package installation errors
  • InvalidProviderError - Invalid provider name errors
  • ChatCompletionError - API call failures with original error
  • StreamingNotSupportedError - Streaming not available errors
  • 📋 Detailed error messages with actionable information

Architecture Improvements

  • πŸ—οΈ Strategy Pattern implementation with provider classes
  • 🏭 Factory Pattern for provider creation
  • 🎯 BaseProvider abstract class for consistent interface
  • πŸ“¦ Concrete providers: OpenAIProvider, GroqProvider, GeminiProvider, OllamaProvider
  • πŸ”§ ProviderFactory for centralized provider management
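
The Strategy/Factory structure can be sketched as below. Only the class names come from this changelog; the method bodies and registry are illustrative stand-ins for the real API-calling code.

```python
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    """Strategy interface: every provider exposes the same method."""
    @abstractmethod
    def chat_completion(self, messages: list[dict]) -> str: ...

class OpenAIProvider(BaseProvider):
    def chat_completion(self, messages: list[dict]) -> str:
        return "response from OpenAI"   # real class would call the API

class OllamaProvider(BaseProvider):
    def chat_completion(self, messages: list[dict]) -> str:
        return "response from Ollama"

class ProviderFactory:
    """Factory: central place that maps provider names to classes."""
    _registry = {"openai": OpenAIProvider, "ollama": OllamaProvider}

    @classmethod
    def create(cls, name: str) -> BaseProvider:
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"Invalid provider: {name}") from None
```

Client code depends only on BaseProvider, so switching providers at runtime is just asking the factory for a different name.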

Changed

  • 🔄 chat_completion() now includes automatic retry logic by default
  • 🔄 Error messages are more descriptive with custom exceptions
  • 🔄 Better type hints and documentation throughout
  • 📚 Refactored codebase for better maintainability

Dependencies

  • ➕ Added tenacity>=8.2.0 for retry logic

Examples

  • 📖 examples/streaming_example.py - Comprehensive streaming examples
  • 📖 Demonstrates retry behavior and exception handling

Tests

  • 🧪 Full test coverage for streaming functionality
  • 🧪 Tests for retry logic with transient failures
  • 🧪 Exception handling tests
  • 🧪 Provider switching with streaming support

Maintained

  • ✅ 100% backward compatibility
  • ✅ All existing functionality preserved
  • ✅ No breaking changes

[0.1.0] - 2024-11-XX

Added

  • 🎉 Initial release
  • 🤖 Support for multiple LLM providers:
      • OpenAI (GPT-4, GPT-4o, GPT-3.5-turbo)
      • Groq (Llama, Mixtral, Gemma models)
      • Google Gemini (Gemini 2.0 Flash, Gemini 2.5 Flash)
      • Ollama (local models)
  • 🔍 Automatic API detection based on available API keys
  • 🔄 Dynamic provider switching at runtime with switch_provider()
  • ⚙️ Unified interface - one method for all LLM backends
  • 🔧 Flexible configuration - model, temperature, max_tokens customizable
  • 🔐 Google Colab support - automatic secret loading from userdata
  • 📦 Zero-config - works out-of-the-box with Ollama (no API keys needed)
  • 🧩 llama-index integration via LLMClientAdapter
  • 🧪 Comprehensive test suite with >90% coverage
  • 📚 Detailed documentation with examples and tutorials
  • 📓 Jupyter notebooks for RAG applications

Project Structure

  • Clean architecture with provider abstraction
  • CI/CD pipelines (GitHub Actions)
  • Automated testing (pytest)
  • Code quality tools (black, ruff, mypy, bandit)
  • Pre-commit hooks for code quality

Documentation

  • Comprehensive README with examples
  • Contribution guidelines (development/contributing.md)
  • Test documentation (tests/README.md)
  • Notebook tutorials for RAG applications

Release History

  • 0.4.0 - Automatic versioning and changelog updates
  • 0.3.0 - Token counting, async support, configuration files, Ollama Cloud
  • 0.2.0 - Streaming, retry logic, custom exceptions
  • 0.1.0 - Initial release with multi-provider support

Upgrading

From 0.2.0 to 0.3.0

No breaking changes. New features are opt-in:

# Token counting (optional)
token_count = client.count_tokens(messages)

# Async support (optional)
async_client = LLMClient(use_async=True)
response = await async_client.achat_completion(messages)

# Config files (optional)
client = LLMClient.from_config("config.yaml")

# Ollama Cloud (optional)
cloud_client = LLMClient(llm="gpt-oss:120b-cloud", use_ollama_cloud=True)

From 0.1.0 to 0.2.0

No breaking changes. New features work automatically:

  • Retry logic is enabled by default
  • Streaming available via chat_completion_stream()
  • Custom exceptions provide better error messages
  • All existing code continues to work unchanged

Contributing

See development/contributing.md for guidelines on contributing to this project.

License

This project is licensed under the MIT License - see LICENSE for details.