LiteLLM: Unifying the Large Language Model (LLM) Ecosystem for a Seamless Development Experience

June 04, 2025

Practical Open Source Projects

Project Description

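LiteLLM is a Python SDK and proxy server (LLM gateway) designed to simplify interaction with more than 100 large language model (LLM) APIs. It unifies LLM providers such as Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, and Groq behind a single OpenAI-style format.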

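LiteLLM addresses the following problems:
- Translating inputs to each provider's completion, embedding, and image generation endpoints.
- Keeping the output format consistent across different LLMs.
- Implementing retry and fallback logic across multiple deployments (for example, Azure and OpenAI) through its routing feature.
- Enforcing budgets and rate limits per project, API key, and model through the LiteLLM proxy server.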
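
To make the retry and fallback idea concrete, here is a minimal conceptual sketch in plain Python. It is not LiteLLM's routing implementation; `call_with_fallback`, `flaky_azure`, and `healthy_openai` are hypothetical names used only for illustration, and the two "deployments" are stand-in functions rather than real API calls.

```python
import time
from typing import Callable, Sequence


def call_with_fallback(
    deployments: Sequence[Callable[[str], str]],
    prompt: str,
    retries_per_deployment: int = 2,
    backoff_seconds: float = 0.0,
) -> str:
    """Try each deployment in order, retrying it before falling back to the next."""
    last_error = None
    for deployment in deployments:
        for attempt in range(retries_per_deployment):
            try:
                return deployment(prompt)
            except Exception as exc:  # a real router would catch narrower errors
                last_error = exc
                time.sleep(backoff_seconds * (attempt + 1))
    raise RuntimeError("all deployments failed") from last_error


# Hypothetical stand-ins for two deployments of the same model.
def flaky_azure(prompt: str) -> str:
    raise TimeoutError("azure deployment timed out")


def healthy_openai(prompt: str) -> str:
    return f"openai answered: {prompt}"


result = call_with_fallback([flaky_azure, healthy_openai], "Hello")
print(result)
```

The first deployment is retried twice, then the call falls through to the second one. A gateway performs this kind of logic on the caller's behalf so application code does not have to.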

Usage

To use LiteLLM, install it with pip:

pip install litellm

Basic Chat Completion

from litellm import completion
import os

# Set the API keys as environment variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# Call OpenAI
response = completion(model="openai/gpt-4o", messages=messages)

# Call Anthropic
response = completion(model="anthropic/claude-3-sonnet-20240229", messages=messages)
print(response)
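
Because responses follow an OpenAI-style shape, the generated text can be read from the same path regardless of provider. The sketch below uses a hand-written dictionary as a stand-in for a real response, so it runs without any API key; `extract_text` and `sample_response` are hypothetical names, and the field values are made up for illustration.

```python
def extract_text(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat response."""
    return response["choices"][0]["message"]["content"]


# Hand-written stand-in with the same shape as a chat completion response.
sample_response = {
    "model": "openai/gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "I'm doing well, thanks!"},
            "finish_reason": "stop",
        }
    ],
}

text = extract_text(sample_response)
print(text)
```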

Asynchronous Calls

from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="openai/gpt-4o", messages=messages)
    return response

response = asyncio.run(test_get_response())
print(response)
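
The main benefit of an async API is that several requests can be in flight at once. The sketch below shows that pattern with `asyncio.gather`. To stay runnable offline it uses `fake_acompletion`, a hypothetical stand-in coroutine that accepts the same `model` and `messages` arguments as the call above; with real credentials one would presumably swap in `acompletion`.

```python
import asyncio


async def fake_acompletion(model: str, messages: list) -> dict:
    """Stand-in for a real async LLM call; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    reply = f"{model} received: {messages[-1]['content']}"
    return {"choices": [{"message": {"role": "assistant", "content": reply}}]}


async def ask_all(models: list, prompt: str) -> list:
    """Send the same prompt to every model concurrently."""
    messages = [{"role": "user", "content": prompt}]
    tasks = [fake_acompletion(model=m, messages=messages) for m in models]
    # gather returns results in the same order as the tasks were passed in.
    return await asyncio.gather(*tasks)


models = ["openai/gpt-4o", "anthropic/claude-3-sonnet-20240229"]
responses = asyncio.run(ask_all(models, "Hello, how are you?"))
for r in responses:
    print(r["choices"][0]["message"]["content"])
```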

Streaming Responses

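from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

# stream=True returns an iterator of chunks instead of a single response
response = completion(model="openai/gpt-4o", messages=messages, stream=True)
for part in response:
    # delta.content can be None (for example on the final chunk), so fall back to ""
    print(part.choices[0].delta.content or "", end="", flush=True)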
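
When the full text is needed after streaming finishes, the chunks can be accumulated as they arrive. The sketch below builds a few fake chunks with `SimpleNamespace` so that it runs without an API key; the chunks mimic only the `choices[0].delta.content` path used in the loop above, and `make_chunk` and `collect_stream` are hypothetical helper names.

```python
from types import SimpleNamespace


def make_chunk(text):
    """Build a fake chunk exposing only choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(chunks) -> str:
    """Join the text of all chunks, treating a missing content as empty."""
    parts = []
    for chunk in chunks:
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


# The last chunk carries no text, which is why the `or ""` guard is needed.
fake_stream = [make_chunk("Hel"), make_chunk("lo"), make_chunk(None)]
full_text = collect_stream(fake_stream)
print(full_text)
```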

Logging and Observability

LiteLLM supports a range of logging and observability tools through callbacks (Lunary, MLflow, Langfuse, DynamoDB, S3, Helicone, Promptlayer, Traceloop, Athina, Slack).

from litellm import completion
import os
import litellm

# Set environment variables for the logging tools and API keys
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = ""  # use your actual key
os.environ["LANGFUSE_SECRET_KEY"] = ""  # use your actual key
os.environ["ATHINA_API_KEY"] = "your-athina-api-key"
os.environ["OPENAI_API_KEY"] = "your-openai-key"

# Set the callbacks
litellm.success_callback = ["lunary", "mlflow", "langfuse", "athina", "helicone"]

response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
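
Besides named integrations, a callback can also be an ordinary Python function. The sketch below assumes the custom-callback signature described in LiteLLM's documentation, `(kwargs, completion_response, start_time, end_time)`; treat that signature and the commented-out registration line as assumptions to verify against the version you install. The function is invoked by hand with made-up values so the example runs without LiteLLM, and `log_success` and `call_log` are hypothetical names.

```python
from datetime import datetime, timedelta

call_log = []


def log_success(kwargs, completion_response, start_time, end_time):
    """Record the model name and latency of a successful call."""
    call_log.append(
        {
            "model": kwargs.get("model"),
            "latency_seconds": (end_time - start_time).total_seconds(),
        }
    )


# With LiteLLM installed, registration would presumably look like:
# litellm.success_callback = [log_success]

# Manual invocation with made-up values, standing in for a real call.
started = datetime(2025, 6, 4, 12, 0, 0)
finished = started + timedelta(seconds=1.5)
log_success({"model": "openai/gpt-4o"}, {"choices": []}, started, finished)
print(call_log)
```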

LiteLLM Proxy Server

Steps to run the LiteLLM proxy server:

Install the proxy dependencies:
```
pip install 'litellm[proxy]'
```

Start the proxy:

litellm --model huggingface/bigcode/starcoder
# INFO: Proxy running on http://0.0.0.0:4000

Send a request to the proxy with the OpenAI SDK:

import openai  # openai v1.0.0+
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")  # point base_url at the proxy
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])
print(response)
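
Since the proxy accepts requests from the OpenAI SDK, any HTTP client should be able to talk to it as well. The sketch below only builds the request with the standard library and does not send it, so it runs without a proxy. The `/chat/completions` path is the one the OpenAI SDK appends to `base_url`; `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "http://0.0.0.0:4000",
    "anything",
    "gpt-3.5-turbo",
    "this is a test request, write a short poem",
)
print(request.get_method(), request.full_url)

# To actually send it (requires a running proxy):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```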

Key Features

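Unified API interface: Connect to more than 100 LLMs through a single OpenAI-style API format.
Provider support: Supports major LLM providers, including Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, and Groq.
Consistent output: Text responses are always available at ['choices'][0]['message']['content'].
Routing (retry/fallback logic): Automatically handles retries and fallbacks across multiple LLM deployments.
Streaming support: Supports streaming responses for all integrated models.
Asynchronous operation: Provides asynchronous API calls for better performance.
Observability: Integrates with logging and observability tools (such as Lunary, MLflow, Langfuse, and Helicone) through callbacks.
LiteLLM proxy server (LLM gateway):
- Cost tracking: Monitors spend across projects.
- Load balancing: Distributes requests across multiple LLM deployments.
- Rate limiting: Enforces rate limits per project, API key, and model.
- Key management: Connects to a PostgreSQL database to create and manage proxy keys, with fine-grained control over models, durations, and metadata.
- Web UI: Provides a user interface (/ui) for managing the proxy server, including setting budgets and rate limits.
Enterprise features: Offers commercial users enhanced security, user management, and professional support, including custom integrations and SLAs.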
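
To illustrate what enforcing a per-key budget involves, here is a small conceptual sketch in plain Python. It is not how the LiteLLM proxy is implemented (the proxy keeps key data in PostgreSQL, as noted above); `BudgetTracker` and `BudgetExceeded` are hypothetical names, and the costs are made-up numbers.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a request would push a key over its budget."""


class BudgetTracker:
    """Track spend per API key against a fixed budget (in-memory only)."""

    def __init__(self, max_budget_usd: float):
        self.max_budget_usd = max_budget_usd
        self.spend = defaultdict(float)

    def record(self, api_key: str, cost_usd: float) -> float:
        """Add a request's cost and return the remaining budget for the key."""
        if self.spend[api_key] + cost_usd > self.max_budget_usd:
            raise BudgetExceeded(f"{api_key} would exceed ${self.max_budget_usd}")
        self.spend[api_key] += cost_usd
        return self.max_budget_usd - self.spend[api_key]


tracker = BudgetTracker(max_budget_usd=1.0)
remaining_first = tracker.record("key-a", 0.5)
remaining_second = tracker.record("key-a", 0.25)
print(remaining_first, remaining_second)

try:
    tracker.record("key-a", 0.5)  # 0.75 already spent, so this is rejected
    blocked = False
except BudgetExceeded:
    blocked = True
print("blocked:", blocked)
```

A rejected request leaves the recorded spend unchanged, and each key is tracked independently, so one project exhausting its budget does not affect another.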