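# LLM Call Lifecycle Control: Implementation Plan
> Reference implementation: [HKUDS/OpenHarness](https://github.com/HKUDS/OpenHarness)
## Goal
Implement basic lifecycle control for large-model calls, serving as a core low-level component of agcore.
## Scope
- Only OpenAI-compatible APIs are supported (`POST /v1/chat/completions`)
- Non-streaming calls only (streaming can be added later)
- Tool definitions can be passed in and tool_use responses are parsed, but **the automatic tool-execution loop is not included**
- Lifecycle control covers a single request-response cycle
## Domain module structure
All code related to the LLM call lifecycle lives under the `llm` domain directory; future features (tools, memory, prompts, and so on) will be organized the same way.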
```
src/
  lib.rs              # crate root
  llm.rs              # mod llm: domain root (declarations + re-exports)
  llm/
    types.rs          # llm::types: Message, ContentBlock, ChatRequest/Response, ToolDefinition
    error.rs          # llm::error: LlmError
    provider.rs       # llm::provider: LlmProvider trait (interface only)
    provider/
      openai.rs       # llm::provider::openai: OpenaiProvider implementation
    cycle.rs          # llm::cycle: lifecycle engine (submodule root)
    cycle/
      retry.rs        # llm::cycle::retry: retry strategy
      usage.rs        # llm::cycle::usage: token usage
  # Example future domains (placeholders):
  # tools.rs + tools/     # tool calling, MCP
  # memory.rs + memory/   # memory system
  # prompt.rs + prompt/   # prompt engineering
  # agent.rs + agent/     # agent runtime
```
Root module declarations in `llm.rs`:
```rust
// llm.rs
pub mod types;
pub mod error;
pub mod provider;
pub mod cycle;
```
## Module design
### 1. llm/types.rs: core data types
```rust
use serde_json::Value;

use crate::llm::cycle::Usage;

pub enum Role { User, Assistant, System, Tool }

pub enum ContentBlock {
    Text { text: String },
    ImageUrl { url: String }, // multimodal support
    ToolUse { id: String, name: String, input: Value }, // reserved; the automatic tool-execution loop is not implemented yet
    ToolResult { tool_use_id: String, content: String }, // reserved; the automatic tool-execution loop is not implemented yet
}

pub struct Message {
    pub role: Role,
    pub content: Vec<ContentBlock>,
}

pub struct ToolDefinition {
    pub name: String,
    pub description: String,
    pub input_schema: Value,
}

pub struct ChatRequest {
    pub model: String,
    pub messages: Vec<Message>,
    pub system_prompt: Option<String>,
    pub tools: Vec<ToolDefinition>,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub extra_body: Option<Value>, // extension parameters such as enable_thinking (e.g. Alibaba Cloud DashScope)
}

pub struct ChatResponse {
    pub message: Message,
    pub usage: Usage,
    pub stop_reason: Option<StopReason>,
}

pub enum StopReason {
    Stop,
    ToolUse,       // reserved; the automatic tool-execution loop is not implemented yet
    MaxTokens,     // the max_tokens limit was reached
    ContentFilter,
    Length,        // same as MaxTokens; kept for compatibility with the finish_reason of some APIs
    Other(String),
}
```
> **Note:** `ContentBlock::ToolUse`, `ContentBlock::ToolResult`, and the `ToolUse` variant of `StopReason` are reserved types; the automatic tool-execution loop is not implemented yet.
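
As an illustration of how `StopReason` might be populated, the sketch below maps a raw `finish_reason` string onto the enum. `parse_stop_reason` is a hypothetical helper, not part of the plan; the `"stop"`, `"tool_calls"`, `"length"`, and `"content_filter"` spellings follow the OpenAI chat completions API, while `"max_tokens"` is an assumed spelling for other backends.

```rust
/// Mirror of the `StopReason` enum from `llm/types.rs`, with derives added for testing.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StopReason {
    Stop,
    ToolUse,
    MaxTokens,
    ContentFilter,
    Length,
    Other(String),
}

/// Hypothetical helper: map a raw `finish_reason` value to `StopReason`.
/// A missing field (`None`) stays `None` instead of being guessed.
pub fn parse_stop_reason(raw: Option<&str>) -> Option<StopReason> {
    raw.map(|s| match s {
        "stop" => StopReason::Stop,
        "tool_calls" => StopReason::ToolUse,
        "length" => StopReason::Length,
        // Assumed spelling for backends that do not use "length".
        "max_tokens" => StopReason::MaxTokens,
        "content_filter" => StopReason::ContentFilter,
        // Unknown values are preserved verbatim rather than dropped.
        other => StopReason::Other(other.to_string()),
    })
}
```

Keeping unknown values in `Other(String)` means a new `finish_reason` from a provider does not turn into a parse error.
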
### 2. llm/error.rs: error hierarchy
```rust
use std::time::Duration;

// `thiserror::Error` requires `Debug`, so it is derived alongside.
#[derive(Debug, thiserror::Error)]
pub enum LlmError {
    #[error("authentication failed: {0}")]
    Authentication(String),
    #[error("rate limited (retry after: {retry_after:?})")]
    RateLimit { retry_after: Option<Duration> },
    #[error("request failed ({status}): {body}")]
    Request { status: u16, body: String },
    #[error("request timed out ({duration:?})")]
    Timeout { duration: Duration },
    #[error("streaming response error: {0}")]
    Stream(String),
    #[error("context length exceeded (actual: {actual}, limit: {limit})")]
    ContextLength { actual: u32, limit: u32 },
    #[error("LLM call failed: {0}")]
    Other(String),
}
```
**Retryable errors:** `RateLimit`, `Timeout`, and `5xx` status codes.
**Non-retryable errors:** `Authentication`, `4xx` status codes (except 429), and `ContextLength`.
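
One way a provider could produce these variants from an HTTP response is sketched below. `classify_status` is a hypothetical helper, and treating 403 as `Authentication` is an assumption; only the delta-seconds form of the `Retry-After` header is parsed.

```rust
use std::time::Duration;

/// Trimmed copy of `LlmError` with only the variants this sketch needs.
#[derive(Debug, PartialEq)]
pub enum LlmError {
    Authentication(String),
    RateLimit { retry_after: Option<Duration> },
    Request { status: u16, body: String },
}

/// Hypothetical helper: turn a non-2xx HTTP response into an `LlmError`.
/// `retry_after` is the raw `Retry-After` header value, if present.
pub fn classify_status(status: u16, body: String, retry_after: Option<&str>) -> LlmError {
    match status {
        // Assumption: 403 is grouped with 401 as an authentication failure.
        401 | 403 => LlmError::Authentication(body),
        429 => LlmError::RateLimit {
            // Only delta-seconds is handled; an HTTP-date value yields `None`.
            retry_after: retry_after
                .and_then(|value| value.trim().parse::<u64>().ok())
                .map(Duration::from_secs),
        },
        // Everything else keeps its status so the retry policy can inspect it.
        _ => LlmError::Request { status, body },
    }
}
```
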
### 3. llm/provider.rs: provider interface
The trait is kept in its own file; concrete implementations live in the `provider/` submodule.
```rust
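// llm/provider.rs
pub mod openai;

use async_trait::async_trait;

use crate::llm::error::LlmError;
use crate::llm::types::{ChatRequest, ChatResponse};

#[async_trait]
pub trait LlmProvider: Send + Sync {
    async fn chat(&self, request: ChatRequest) -> Result<ChatResponse, LlmError>;
}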
```
#### 3.1 llm/provider/openai.rs: OpenAI-compatible implementation
```rust
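use async_trait::async_trait;

use super::LlmProvider;
use crate::llm::error::LlmError;
use crate::llm::types::{ChatRequest, ChatResponse};

pub struct OpenaiProvider {
    http_client: reqwest::Client,
    base_url: String,
    api_key: String,
    model: String,
}

// The impl needs the same attribute as the trait definition.
#[async_trait]
impl LlmProvider for OpenaiProvider {
    async fn chat(&self, request: ChatRequest) -> Result<ChatResponse, LlmError> {
        // POST {base_url}/chat/completions
        // extra_body is merged into the request body (e.g. enable_thinking)
        // Parse the response into a ChatResponse
        todo!()
    }
}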
```
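> **Note:** The fields in `extra_body` must be compatible with the target API. Some APIs (such as Alibaba Cloud DashScope) accept extension parameters (such as `enable_thinking`) through `extra_body`.
Implementations to be added later: `provider/anthropic.rs`, `provider/azure.rs`, and so on.
### 4. llm/cycle.rs: lifecycle engine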
```rust
mod retry;
mod usage;

pub use retry::RetryConfig;
pub use usage::{CostTracker, Usage};

use crate::llm::provider::LlmProvider;
use crate::llm::types::Message;

pub struct CycleConfig {
    pub model: String,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub max_turns: Option<u32>,
    pub retry: RetryConfig,
}

pub struct LlmCycle {
    provider: Box<dyn LlmProvider>,
    config: CycleConfig,
    usage: CostTracker,
    messages: Vec<Message>,
    system_prompt: Option<String>,
}
```
Full flow of `submit()`:
```
submit(prompt, tools)
├─ ① push Message(user, [Text(prompt)])
├─ ② build ChatRequest { messages, system, tools, max_tokens, temperature }
├─ ③ [retry loop] provider.chat(request)
│   ├─ Ok → parse ChatResponse
│   └─ Err (retryable) → compute_delay → sleep → retry
├─ ④ push Message(assistant, [Text(...) | ToolUse(...)])
├─ ⑤ usage.add(&response.usage)
└─ ⑥ return ChatResponse
```
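
The six steps can be sketched with the standard library alone. This is a simplified, blocking stand-in, not the planned async engine: `BlockingProvider`, `FlakyProvider`, and `Cycle` are hypothetical names, messages carry plain text instead of `ContentBlock`s, only `RateLimit` is treated as retryable, and the sleep is a fixed placeholder rather than the backoff policy.

```rust
use std::thread::sleep;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum Role { User, Assistant }

#[derive(Debug, Clone, PartialEq)]
pub struct Message { pub role: Role, pub text: String }

#[derive(Debug, Default, Clone, PartialEq)]
pub struct Usage { pub input_tokens: u32, pub output_tokens: u32 }

#[derive(Debug, PartialEq)]
pub enum LlmError { RateLimit, Authentication(String) }

#[derive(Debug, Clone, PartialEq)]
pub struct ChatResponse { pub message: Message, pub usage: Usage }

/// Hypothetical blocking stand-in for the async `LlmProvider::chat`.
pub trait BlockingProvider {
    fn chat(&mut self, messages: &[Message]) -> Result<ChatResponse, LlmError>;
}

/// Test double: fails with `RateLimit` a fixed number of times, then succeeds.
pub struct FlakyProvider { pub failures_left: u32 }

impl BlockingProvider for FlakyProvider {
    fn chat(&mut self, _messages: &[Message]) -> Result<ChatResponse, LlmError> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            return Err(LlmError::RateLimit);
        }
        Ok(ChatResponse {
            message: Message { role: Role::Assistant, text: "ok".to_string() },
            usage: Usage { input_tokens: 5, output_tokens: 2 },
        })
    }
}

pub struct Cycle<P: BlockingProvider> {
    pub provider: P,
    pub messages: Vec<Message>,
    pub total: Usage,
    pub max_retries: u32,
}

impl<P: BlockingProvider> Cycle<P> {
    pub fn submit(&mut self, prompt: &str) -> Result<ChatResponse, LlmError> {
        // (1) Record the user turn.
        self.messages.push(Message { role: Role::User, text: prompt.to_string() });
        // (2)+(3) Send the history, retrying up to `max_retries` times.
        let mut attempt = 0;
        let response = loop {
            match self.provider.chat(&self.messages) {
                Ok(response) => break response,
                Err(LlmError::RateLimit) if attempt < self.max_retries => {
                    attempt += 1;
                    // Placeholder delay; the real engine would use the backoff policy.
                    sleep(Duration::from_millis(1));
                }
                Err(err) => return Err(err),
            }
        };
        // (4) Record the assistant turn.
        self.messages.push(response.message.clone());
        // (5) Accumulate token usage.
        self.total.input_tokens += response.usage.input_tokens;
        self.total.output_tokens += response.usage.output_tokens;
        // (6) Hand the response back to the caller.
        Ok(response)
    }
}
```

Note that this sketch leaves the user message in the history when the call ultimately fails; the flow above does not say whether it should be rolled back.
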
#### 4.1 llm/cycle/retry.rs: retry strategy
```rust
use std::time::Duration;

pub struct RetryConfig {
    pub max_retries: u32,     // default: 3
    pub base_delay: Duration, // default: 1s
    pub max_delay: Duration,  // default: 30s
    pub jitter_factor: f64,   // default: 0.25
}
```
Exponential backoff with jitter: `backoff = min(base_delay * 2^attempt, max_delay)`, then `delay = backoff + random(0, backoff * jitter_factor)`.
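
A minimal sketch of that calculation follows, reading the formula as jitter proportional to the capped backoff. The `Default` values come from the comments on `RetryConfig`; the `unit` parameter is an assumption made so the sketch needs no random-number crate: the caller passes a random sample in [0, 1].

```rust
use std::time::Duration;

pub struct RetryConfig {
    pub max_retries: u32,
    pub base_delay: Duration,
    pub max_delay: Duration,
    pub jitter_factor: f64,
}

impl Default for RetryConfig {
    fn default() -> Self {
        Self {
            max_retries: 3,
            base_delay: Duration::from_secs(1),
            max_delay: Duration::from_secs(30),
            jitter_factor: 0.25,
        }
    }
}

/// `attempt` is zero-based; `unit` is a caller-supplied random sample in [0, 1].
pub fn compute_delay(cfg: &RetryConfig, attempt: u32, unit: f64) -> Duration {
    // base_delay * 2^attempt, falling back to max_delay if the arithmetic overflows.
    let backoff = 2u32
        .checked_pow(attempt)
        .and_then(|factor| cfg.base_delay.checked_mul(factor))
        .unwrap_or(cfg.max_delay)
        .min(cfg.max_delay);
    // Jitter is drawn from [0, backoff * jitter_factor].
    let jitter = backoff.mul_f64(cfg.jitter_factor.max(0.0) * unit.clamp(0.0, 1.0));
    backoff + jitter
}
```

With the defaults, attempts 0 through 4 back off 1s, 2s, 4s, 8s, and 16s before jitter, and every later attempt is capped at 30s. Because jitter is added after the cap, the final delay can exceed `max_delay` by up to `max_delay * jitter_factor`.
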
**Retryable errors:** `RateLimit`, `Timeout`, and `5xx` status codes.
**Non-retryable errors:** `Authentication`, `4xx` status codes (except 429), and `ContextLength`.
Decision logic for `should_retry(err: &LlmError) -> bool`:
- `RateLimit` → true
- `Timeout` → true
- `Request { status, .. }` → status >= 500 || status == 429
- anything else → false
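
The rules above translate directly into a `match`. The sketch below copies the `LlmError` variants without the `thiserror` derive so that it compiles with the standard library alone.

```rust
use std::time::Duration;

/// Copy of the `LlmError` variants, without the `thiserror` derive.
#[derive(Debug)]
pub enum LlmError {
    Authentication(String),
    RateLimit { retry_after: Option<Duration> },
    Request { status: u16, body: String },
    Timeout { duration: Duration },
    Stream(String),
    ContextLength { actual: u32, limit: u32 },
    Other(String),
}

/// Returns true when the error is worth retrying.
pub fn should_retry(err: &LlmError) -> bool {
    match err {
        LlmError::RateLimit { .. } | LlmError::Timeout { .. } => true,
        // Server errors, plus a 429 that was not mapped to `RateLimit`.
        LlmError::Request { status, .. } => *status >= 500 || *status == 429,
        // Authentication, ContextLength, Stream, and Other are not retried.
        _ => false,
    }
}
```
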
#### 4.2 llm/cycle/usage.rs: token usage
```rust
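#[derive(Debug, Default)]
pub struct Usage { pub input_tokens: u32, pub output_tokens: u32 }

#[derive(Debug, Default)]
pub struct CostTracker { accumulated: Usage }

// Method bodies are a sketch; saturating on overflow is an assumption.
impl CostTracker {
    pub fn add(&mut self, usage: &Usage) {
        let acc = &mut self.accumulated;
        acc.input_tokens = acc.input_tokens.saturating_add(usage.input_tokens);
        acc.output_tokens = acc.output_tokens.saturating_add(usage.output_tokens);
    }
    pub fn total(&self) -> &Usage { &self.accumulated }
    pub fn reset(&mut self) { self.accumulated = Usage::default(); }
}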
```
## Dependencies
```toml
[dependencies]
tokio = { version = "1", features = ["full"] }
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
thiserror = "2"
async-trait = "0.1"
tracing = "0.1"
```
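## Testing
- Unit: serialization of types, retry backoff calculation, usage accumulation
- Mock: an HTTP mock server to test provider requests, responses, and error handling
- Integration (optional): verification against a real local call through Ollama
## Future extensions
- Streaming interface (`Stream<CycleEvent>`)
- Automatic tool-execution loop (see OpenHarness `run_query()`)
- Registration and discovery of multiple providers (see OpenHarness `ProviderRegistry`)
- Context compression (auto-compaction)
- Lifecycle hooks (pre/post tool use hooks)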