Background
- An LLM does not come with a Web API of its own, so a wrapper layer is required
- Wrapping the LLM's core interface as a Web API is the standard path to serving users
API Wrapping
Uvicorn + FastAPI
- Uvicorn plays a role similar to Tomcat but is far more lightweight; it acts as the web server
- It handles HTTP requests asynchronously, which makes it well suited to concurrent traffic
- Built on uvloop and httptools, it delivers very high performance
- FastAPI is comparable to Spring Boot, likewise much more lightweight; it acts as the API framework
- Combining Uvicorn and FastAPI
  - yields a high-performance, easily extensible asynchronous web application
  - with Uvicorn serving the FastAPI app, concurrency handling is excellent
Installing Dependencies
```
$ pip install fastapi
$ pip install uvicorn
```
Code Layering
```python
import uvicorn
from fastapi import FastAPI

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Hello World"}


if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8888, log_level="info", workers=1)
```
```
$ curl -s 127.0.0.1:8888 | jq
{
  "message": "Hello World"
}
```
Model Definition
In Python, data structures are defined with Pydantic models
Pydantic provides data validation and data management, comparable to Java's Bean Validation
```python
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel, Field
from typing import List

app = FastAPI()


class Message(BaseModel):
    role: str
    content: str


class ChatMessage(BaseModel):
    history: List[Message]
    prompt: str
    max_tokens: int
    temperature: float
    top_p: float = Field(default=1.0)


@app.post("/v1/chat/completions")
async def create_chat_response(message: ChatMessage):
    return {"message": "Hello World"}


if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8888, log_level="info", workers=1)
```
- BaseModel is designed for data validation and management
- Subclassing BaseModel automatically provides validation, serialization, and deserialization, as the sketch below shows
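As a quick illustration of that behavior, here is a minimal sketch using the ChatMessage model above (model_dump is the Pydantic v2 API; on v1 use .dict() instead):

```python
from pydantic import ValidationError

# A valid payload: field types are checked (and coerced where possible)
msg = ChatMessage(history=[], prompt="hi", max_tokens=100, temperature=0.7)
print(msg.model_dump())  # serialization back to a plain dict

# An invalid payload: "lots" cannot become an int, so validation fails
try:
    ChatMessage(history=[], prompt="hi", max_tokens="lots", temperature=0.7)
except ValidationError as e:
    print(e)
```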
Project Structure
```
project_name/
│
├── app/                          # Main application directory
│   ├── main.py                   # FastAPI application entry point
│   ├── controller/               # API-specific logic
│   │   └── chat_controller.py
│   └── common/                   # Shared API components
│       └── errors.py             # Error handling and custom exceptions
│
├── services/                     # Service layer
│   └── chat_service.py           # Chat service logic
│
├── schemas/                      # Pydantic models (request/response schemas)
│   └── chat_schema.py            # Chat data schemas
│
├── database/                     # Database connection and session management
│   ├── session.py                # Database session configuration
│   └── engine.py                 # Database engine configuration
│
├── tools/                        # Tools and utilities
│   └── data_migration.py         # Data migration tool
│
├── tests/                        # Tests
│   ├── conftest.py               # Test configuration and fixtures
│   ├── test_services/            # Service-layer tests
│   │   └── test_chat_service.py
│   └── test_controller/
│       └── test_chat_controller.py
│
├── requirements.txt              # Project dependencies
└── setup.py                      # Install / packaging / distribution configuration
```
Router Integration
FastAPI integrates separate routers into the main application via include_router
```
├── app
│   ├── controller
│   │   └── chat_controller.py
│   └── main.py
├── schemas
│   └── chat_schema.py
├── services
│   └── chat_service.py
└── tests
    └── test_controller
        └── test_chat_controller.py
```
chat_schema.py
```python
from typing import List

from pydantic import BaseModel, Field


class Message(BaseModel):
    role: str
    content: str


class ChatMessage(BaseModel):
    # conversation history, used by the chat service later on
    history: List[Message] = Field(default_factory=list)
    prompt: str
    max_tokens: int
    temperature: float = Field(default=1.0)
    top_p: float = Field(default=1.0)
```
chat_service.py
```python
from schemas.chat_schema import ChatMessage


class ChatService:
    def post_message(self, message: ChatMessage):
        print(message.prompt)
        return {"message": "post message"}

    def get_messages(self):
        return {"message": "get message"}
```
chat_controller.py
```python
from fastapi import APIRouter

from schemas.chat_schema import ChatMessage
from services.chat_service import ChatService

chat_router = APIRouter()
chat_service = ChatService()


@chat_router.post("/new/message/")
def post_message(message: ChatMessage):
    return chat_service.post_message(message)


@chat_router.get("/get/messages/")
def get_messages():
    return chat_service.get_messages()
```
main.py
```python
import uvicorn
from fastapi import FastAPI

from controller.chat_controller import chat_router

app = FastAPI()
app.include_router(chat_router, prefix="/chat", tags=["chat"])

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8888, log_level="info", workers=1)
```
test_chat_controller.py
```python
import json

import requests

data = {
    'prompt': 'hello',
    'max_tokens': 1000
}

url1 = 'http://localhost:8888/chat/new/message/'
response = requests.post(url1, data=json.dumps(data))
print(response.text)

url2 = 'http://localhost:8888/chat/get/messages/'
response = requests.get(url2)
print(response.text)
```
LLM
Different LLMs expose different chat interfaces, so the wrapper has to adapt per model
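For instance, ChatGLM-style checkpoints ship their own high-level chat() helper, while a plain Hugging Face causal LM only exposes generate(). A rough sketch of the difference (prompt templating details vary by model and are glossed over here):

```python
# ChatGLM-style interface: the checkpoint provides a chat() helper
# (available because the model is loaded with trust_remote_code=True)
response, history = model.chat(tokenizer, "Hello", history=[])

# Generic Hugging Face interface: only generate() exists, so prompt
# encoding and response decoding have to be done by hand
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True)
```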
Lazy Loading
```python
from transformers import AutoTokenizer, AutoModelForCausalLM


class ModelManager:
    _model = None
    _tokenizer = None

    @classmethod
    def get_model(cls):
        # Load on first use and cache on the class, so later calls are free
        if cls._model is None:
            cls._model = AutoModelForCausalLM.from_pretrained(
                "chatglm3-6b", trust_remote_code=True).half().cuda().eval()
        return cls._model

    @classmethod
    def get_tokenizer(cls):
        if cls._tokenizer is None:
            cls._tokenizer = AutoTokenizer.from_pretrained(
                "chatglm3-6b", trust_remote_code=True)
        return cls._tokenizer
```
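Usage then goes through the class methods: the first call pays the loading cost, and later calls reuse the cached objects, e.g.:

```python
model = ModelManager.get_model()       # first call: loads the weights
tokenizer = ModelManager.get_tokenizer()

same_model = ModelManager.get_model()  # later calls: returns the cached instance
assert same_model is model
```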
Chat
Return the LLM's response in a single shot
```python
from datetime import datetime

from schemas.chat_schema import ChatMessage
from services.model_service import ModelManager


class ChatService:
    def post_message(self, message: ChatMessage):
        print(message.prompt)
        model = ModelManager.get_model()
        tokenizer = ModelManager.get_tokenizer()
        response, history = model.chat(
            tokenizer,
            message.prompt,
            history=message.history,
            max_length=message.max_tokens,
            top_p=message.top_p,
            temperature=message.temperature
        )
        now = datetime.now()
        time = now.strftime("%Y-%m-%d %H:%M:%S")
        answer = {
            "response": response,
            "history": history,
            "status": 200,
            "time": time
        }
        log = "[" + time + "] " + 'prompt: "' + message.prompt \
              + '", response: "' + repr(response) + '"'
        print(log)
        return answer

    def get_messages(self):
        return {"message": "get message"}
```
Stream Chat
model.stream_chat()
A stream flag controls whether the output is streamed
```python
if stream:
    # Streaming: emit every token as soon as it arrives
    async for token in callback.aiter():
        yield json.dumps(
            {"text": token, "message_id": message_id},
            ensure_ascii=False)
else:
    # Non-streaming: accumulate the full answer, then emit it once
    answer = ""
    async for token in callback.aiter():
        answer += token
    yield json.dumps(
        {"text": answer, "message_id": message_id},
        ensure_ascii=False)
await task
```
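The fragment above assumes a callback object that relays tokens asynchronously. As a more self-contained sketch, the same idea can be built directly on ChatGLM's model.stream_chat(), which yields the cumulative response text on each step; the endpoint path and the delta computation here are illustrative assumptions, not the document's actual implementation:

```python
import json
import uuid

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

from schemas.chat_schema import ChatMessage
from services.model_service import ModelManager

app = FastAPI()


def sse_events(message: ChatMessage, stream: bool):
    model = ModelManager.get_model()
    tokenizer = ModelManager.get_tokenizer()
    message_id = uuid.uuid4().hex
    history = [m.model_dump() for m in message.history]  # plain role/content dicts
    sent = ""
    # stream_chat() yields the cumulative response text on each step
    for response, _ in model.stream_chat(tokenizer, message.prompt, history=history):
        if stream:
            delta = response[len(sent):]  # emit only the newly generated piece
            sent = response
            yield "data: " + json.dumps(
                {"text": delta, "message_id": message_id},
                ensure_ascii=False) + "\n\n"
        else:
            sent = response               # keep accumulating, emit once at the end
    if not stream:
        yield "data: " + json.dumps(
            {"text": sent, "message_id": message_id},
            ensure_ascii=False) + "\n\n"


@app.post("/chat/stream/message/")
def stream_message(message: ChatMessage):
    # text/event-stream turns the generator into a Server-Sent Events response
    return StreamingResponse(sse_events(message, stream=True),
                             media_type="text/event-stream")
```

With stream=True each SSE event carries only the newly generated fragment, which is what produces the stream=true output shown below.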
stream=true
```
data: {"text": "你", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "好", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "👋", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "!", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "我是", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "人工智能", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "助手", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": " Chat", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "GL", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "M", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "3", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "-", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "6", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "B", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": ",", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "很高兴", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "见到", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "你", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": ",", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "欢迎", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "问我", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "任何", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "问题", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
data: {"text": "。", "message_id": "80b2af55c5b7440eaca6b9d510677a75"}
```
stream=false
```
data: {"text": "你好!我是人工智能助手,很高兴为您服务。请问有什么问题我可以帮您解答吗?", "message_id": "741a630ac3d64fd5b1832cc0bae6bb68"}
```
Calling the API
- In engineering practice, AI-related logic is usually encapsulated in a Python application
- Upper-layer applications are typically written in other languages, such as Java or Go
Java
Java -> Python: Server-Sent Events implemented on top of OkHttp
JavaScript
JavaScript -> Java: Server-Sent Events implemented with EventSource
```html
<script>
    let eventData = '';
    // EventSource opens a persistent GET connection for Server-Sent Events
    const eventSource = new EventSource('http://localhost:9999/sendMessage');
    eventSource.onmessage = function (event) {
        eventData += event.data;  // append each streamed fragment
    };
</script>
```