LiteLLM: Unifying the LLM Galaxy for Seamless Development
Project Description
LiteLLM is a Python SDK and Proxy Server (LLM Gateway) designed to simplify interactions with over 100 Large Language Model (LLM) APIs. It unifies various LLM providers (e.g., Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq) under a consistent OpenAI-like format.
LiteLLM aims to manage complexities such as:
- Translating inputs to provider-specific completion, embedding, and image generation endpoints.
- Ensuring consistent output formats across different LLMs.
- Implementing retry and fallback logic across multiple deployments (e.g., Azure/OpenAI) using its Router feature.
- Enforcing budgets and rate limits per project, API key, and model via the LiteLLM Proxy Server.
Usage Instructions
To use LiteLLM, you can install it via pip:
pip install litellm
Basic Chat Completion
from litellm import completion
import os
# Set environment variables for API keys
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
messages = [{"content": "Hello, how are you?", "role": "user"}]
# OpenAI call
response = completion(model="openai/gpt-4o", messages=messages)
# Anthropic call
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)
print(response)
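The same unified call format extends beyond chat to embedding (and image generation) endpoints. A minimal embedding sketch, assuming OPENAI_API_KEY is set as above (the model name is illustrative):
from litellm import embedding

# Same unified calling convention, different endpoint type (embeddings)
response = embedding(
    model="text-embedding-3-small",  # illustrative model name
    input=["LiteLLM exposes one format for many providers"],
)
print(response)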
Asynchronous Calls
from litellm import acompletion
import asyncio
async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="openai/gpt-4o", messages=messages)
    return response
response = asyncio.run(test_get_response())
print(response)
Streaming Responses
from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

response = completion(model="openai/gpt-4o", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")
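Streaming also composes with the asynchronous API. A minimal sketch, assuming the same environment variables as above:
from litellm import acompletion
import asyncio

async def stream_response():
    messages = [{"content": "Hello, how are you?", "role": "user"}]
    response = await acompletion(model="openai/gpt-4o", messages=messages, stream=True)
    # The streamed response can be iterated asynchronously, chunk by chunk
    async for part in response:
        print(part.choices[0].delta.content or "", end="")

asyncio.run(stream_response())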
Logging and Observability
LiteLLM supports various logging and observability tools (Lunary, MLflow, Langfuse, DynamoDB, S3, Helicone, Promptlayer, Traceloop, Athina, Slack) via callbacks.
from litellm import completion
import os
import litellm
# Set environment variables for logging tools and API keys
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = "" # Use actual keys
os.environ["LANGFUSE_SECRET_KEY"] = "" # Use actual keys
os.environ["ATHINA_API_KEY"] = "your-athina-api-key"
os.environ["OPENAI_API_KEY"] = "your-openai-key"
# Set callbacks
litellm.success_callback = ["lunary", "mlflow", "langfuse", "athina", "helicone"]
response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
LiteLLM Proxy Server
To run the LiteLLM Proxy Server:
- Install with proxy dependencies:
pip install 'litellm[proxy]'
- Start the proxy:
litellm --model huggingface/bigcode/starcoder # INFO: Proxy running on http://0.0.0.0:4000
- Make requests to the proxy using an OpenAI SDK:
import openai  # openai v1.0.0+

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")  # set proxy to base_url

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "this is a test request, write a short poem"}],
)
print(response)
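When the proxy is started with a master key (e.g., the LITELLM_MASTER_KEY environment variable) and a PostgreSQL DATABASE_URL, it can also mint virtual keys via its /key/generate route. A hedged sketch; the master key, duration, and metadata values below are placeholders:
import requests

# Assumes the proxy is running on http://0.0.0.0:4000 with key management enabled
resp = requests.post(
    "http://0.0.0.0:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},  # placeholder master key
    json={
        "models": ["gpt-3.5-turbo"],              # restrict the key to specific models
        "duration": "20m",                        # key lifetime
        "metadata": {"user": "someone@example.com"},
    },
)
print(resp.json()["key"])  # use this value as api_key in the OpenAI client example above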
Key Features
- Unified API Interface: Connect to 100+ LLMs using a single OpenAI-like API format.
- Provider Support: Supports major LLM providers including Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq, and many more.
- Consistent Output: Text responses are always available at ['choices'][0]['message']['content'], regardless of provider.
- Router (Retry/Fallback Logic): Automatically handles retry and fallback mechanisms across multiple LLM deployments (see the sketch after this list).
- Streaming Support: Supports streaming responses for all integrated models.
- Asynchronous Operations: Provides async counterparts (such as acompletion) for non-blocking, concurrent calls.
- Observability: Integrates with various logging and observability tools (e.g., Lunary, MLflow, Langfuse, Helicone) via callbacks.
- LiteLLM Proxy Server (LLM Gateway):
  - Cost Tracking: Monitor spend across different projects.
  - Load Balancing: Distribute requests across multiple LLM deployments.
  - Rate Limiting: Enforce rate limits per project, API key, and model.
  - Key Management: Connects to a PostgreSQL database to create and manage proxy keys with granular control over models, durations, and metadata.
  - Web UI: Offers a user interface (/ui) for managing the proxy server, including setting budgets and rate limits.
- Enterprise Features: Offers enhanced security, user management, and professional support for commercial users, including custom integrations and SLAs.
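A minimal sketch of the Router's retry/fallback setup (the deployment name, keys, and API version below are placeholders):
import os
from litellm import Router

# Two deployments registered under one alias; the Router load balances,
# retries, and falls back between them
model_list = [
    {
        "model_name": "gpt-4o",  # alias the application calls
        "litellm_params": {
            "model": "azure/my-gpt-4o-deployment",  # placeholder Azure deployment
            "api_key": os.environ["AZURE_API_KEY"],
            "api_base": os.environ["AZURE_API_BASE"],
            "api_version": "2024-02-01",            # placeholder API version
        },
    },
    {
        "model_name": "gpt-4o",
        "litellm_params": {
            "model": "openai/gpt-4o",
            "api_key": os.environ["OPENAI_API_KEY"],
        },
    },
]

router = Router(model_list=model_list, num_retries=2)
response = router.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response["choices"][0]["message"]["content"])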
Target Audience
- Developers and Engineers: Seeking a unified interface to interact with various LLM providers, simplifying code and management.
- AI/ML Teams: Looking for solutions to manage LLM access, monitor usage, control costs, and implement robust retry/fallback strategies.
- Organizations building LLM-powered applications: Requiring features like rate limiting, budget management, and observability for their AI infrastructure.
- Researchers: Who need to experiment with multiple LLM models and providers efficiently.
Project Links
- GitHub Repository: https://github.com/BerriAI/litellm
- Documentation: https://docs.litellm.ai/docs/
- PyPI Package: https://pypi.org/project/litellm/
Application Scenarios
- Building Multi-LLM Applications: Easily switch between or combine different LLMs (e.g., GPT-4 for creative writing, Claude for summarization) without changing core application logic (a sketch follows this list).
- Cost Optimization and Budgeting: Implement hard quotas and soft budgets per user, project, or API key using the proxy, preventing unexpected spending.
- Ensuring High Availability and Reliability: Utilize the Router for automatic fallback to an alternative LLM provider or deployment if one fails or hits rate limits.
- A/B Testing LLM Models: Seamlessly route traffic to different models to compare performance and cost in production.
- Centralized LLM Gateway: Establish a single entry point for all LLM calls within an organization, simplifying security, logging, and access control.
- Monitoring and Observability: Gain insights into LLM usage, performance, and costs by integrating with various observability tools.
- Developing LLM Agents/Orchestration: Provide a robust and flexible backend for agents that need to interact with diverse LLM capabilities (chat, embeddings, image generation).
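As an illustration of the first scenario, a minimal sketch of per-task model selection; the task-to-model mapping is hypothetical:
from litellm import completion

# Hypothetical mapping: pick a provider per task while keeping one call signature
TASK_MODELS = {
    "creative_writing": "openai/gpt-4o",
    "summarization": "anthropic/claude-3-sonnet-20240229",
}

def run_task(task: str, prompt: str) -> str:
    response = completion(
        model=TASK_MODELS[task],
        messages=[{"role": "user", "content": prompt}],
    )
    # Output shape is identical regardless of provider
    return response["choices"][0]["message"]["content"]

print(run_task("summarization", "Summarize: LiteLLM unifies many LLM APIs."))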