# LLM 调用周期控制 — 实施方案 > 参考实现: [HKUDS/OpenHarness](https://github.com/HKUDS/OpenHarness) ## 目标 实现大模型基础调用周期控制,作为 agcore 的核心底层件。 ## 范围 - 仅支持 OpenAI-compatible API (`POST /v1/chat/completions`) - 仅非流式调用(后续可扩展流式) - 支持传入 tool definitions 和解析 tool_use response,但**不含 tool 自动执行循环** - 单次请求-响应周期控制 ## 领域模块结构 所有 LLM 调用周期相关代码归入 `llm` 领域目录,未来其他功能(工具、记忆、提示词等)以同样方式组织。 ``` src/ lib.rs # crate 根 llm.rs # mod llm — 领域根(声明 + 重导出) llm/ types.rs # llm::types — Message, ContentBlock, ChatRequest/Response, ToolDefinition error.rs # llm::error — LlmError provider.rs # llm::provider — LlmProvider trait(仅接口) provider/ openai.rs # llm::provider::openai — OpenaiProvider 实现 cycle.rs # llm::cycle — 生命周期引擎(子模块根) cycle/ retry.rs # llm::cycle::retry — 重试策略 usage.rs # llm::cycle::usage — Token 用量 # 未来领域示例(占位): # tools.rs + tools/ # 工具调用、MCP # memory.rs + memory/ # 记忆系统 # prompt.rs + prompt/ # 提示词工程 # agent.rs + agent/ # Agent 运行时 ``` `llm.rs` 根模块声明: ```rust // llm.rs pub mod types; pub mod error; pub mod provider; pub mod cycle; ``` ## 模块设计 ### 1. llm/types.rs — 核心数据类型 ```rust pub enum Role { User, Assistant, System, Tool } pub enum ContentBlock { Text { text: String }, ImageUrl { url: String }, // 多模态支持 ToolUse { id: String, name: String, input: Value }, // 预留,暂不实现 tool 自动执行循环 ToolResult { tool_use_id: String, content: String }, // 预留,暂不实现 tool 自动执行循环 } pub struct Message { pub role: Role, pub content: Vec, } pub struct ToolDefinition { pub name: String, pub description: String, pub input_schema: Value, } pub struct ChatRequest { pub model: String, pub messages: Vec, pub system_prompt: Option, pub tools: Vec, pub max_tokens: Option, pub temperature: Option, pub extra_body: Option, // 用于 enable_thinking 等扩展参数(如阿里云 DashScope) } pub struct ChatResponse { pub message: Message, pub usage: Usage, pub stop_reason: Option, } pub enum StopReason { Stop, ToolUse, // 预留,暂不实现 tool 自动执行循环 MaxTokens, // 达到 max_tokens 限制 ContentFilter, Length, // 同 MaxTokens,兼容某些 API 的 finish_reason Other(String), } ``` > **注意**:`ToolUse` / `ToolResult` / `ToolUse` variant of `StopReason` 为预留类型,暂不实现 tool 自动执行循环。 ### 2. llm/error.rs — 错误体系 ```rust #[derive(thiserror::Error)] pub enum LlmError { #[error("认证失败: {0}")] Authentication(String), #[error("限流{retry_after:?}")] RateLimit { retry_after: Option }, #[error("请求失败({status}): {body}")] Request { status: u16, body: String }, #[error("请求超时({duration:?})")] Timeout { duration: Duration }, #[error("流式响应错误: {0}")] Stream(String), #[error("上下文超限(actual:{actual}, limit:{limit})")] ContextLength { actual: u32, limit: u32 }, #[error("LLM 调用失败: {0}")] Other(String), } ``` **可重试错误**:`RateLimit`、`Timeout`、状态码 `5xx`。 **不可重试**:`Authentication`、状态码 `4xx`(除 429)、`ContextLength`。 ### 3. llm/provider.rs — Provider 接口 trait 单独存放,具体实现在 `provider/` 子模块。 ```rust // llm/provider.rs pub mod openai; #[async_trait] pub trait LlmProvider: Send + Sync { async fn chat(&self, request: ChatRequest) -> Result; } ``` #### 3.1 llm/provider/openai.rs — OpenAI 兼容实现 ```rust use super::LlmProvider; pub struct OpenaiProvider { http_client: reqwest::Client, base_url: String, api_key: String, model: String, } impl LlmProvider for OpenaiProvider { async fn chat(&self, request: ChatRequest) -> Result { // POST {base_url}/chat/completions // extra_body 会被合并到请求体中(如 enable_thinking) // 解析 response → ChatResponse todo!() } } ``` > **注意**:`extra_body` 中的字段需与目标 API 兼容。部分 API(如阿里云 DashScope)通过 `extra_body` 传递扩展参数(如 `enable_thinking`)。 后续新增实现: `provider/anthropic.rs`、`provider/azure.rs` 等。 ### 4. llm/cycle.rs — 生命周期引擎 ```rust mod retry; mod usage; pub use retry::RetryConfig; pub use usage::{CostTracker, Usage}; pub struct CycleConfig { pub model: String, pub max_tokens: Option, pub temperature: Option, pub max_turns: Option, pub retry: RetryConfig, } pub struct LlmCycle { provider: Box, config: CycleConfig, usage: CostTracker, messages: Vec, system_prompt: Option, } ``` `submit()` 完整流程: ``` submit(prompt, tools) │ ├─ ① push Message(user, [Text(prompt)]) ├─ ② 构建 ChatRequest { messages, system, tools, max_tokens, temperature } ├─ ③ [重试循环] provider.chat(request) │ ├─ Ok → 解析 ChatResponse │ └─ Err(可重试) → compute_delay → sleep → retry ├─ ④ push Message(assistant, [Text(...) | ToolUse(...)]) ├─ ⑤ usage.add(response.usage) └─ ⑥ return ChatResponse ``` #### 4.1 llm/cycle/retry.rs — 重试策略 ```rust pub struct RetryConfig { pub max_retries: u32, // 默认 3 pub base_delay: Duration, // 默认 1s pub max_delay: Duration, // 默认 30s pub jitter_factor: f64, // 默认 0.25 } ``` 指数退避 + jitter: `delay = min(base * 2^attempt, max_delay) + random(0, delay * jitter_factor)` **可重试错误**: `RateLimit`、`Timeout`、状态码 `5xx` **不可重试**: `Authentication`、状态码 `4xx`(除 429)、`ContextLength` `should_retry(err: &LlmError) -> bool` 判断逻辑: - `RateLimit` → true - `Timeout` → true - `Request { status, .. }` → status >= 500 || status == 429 - 其他 → false #### 4.2 llm/cycle/usage.rs — Token 用量 ```rust #[derive(Default)] pub struct Usage { pub input_tokens: u32, pub output_tokens: u32 } pub struct CostTracker { accumulated: Usage } impl CostTracker { pub fn add(&mut self, usage: &Usage); pub fn total(&self) -> &Usage; pub fn reset(&mut self); } ``` ## 依赖 ```toml [dependencies] tokio = { version = "1", features = ["full"] } reqwest = { version = "0.12", features = ["json"] } serde = { version = "1", features = ["derive"] } serde_json = "1" thiserror = "2" async-trait = "0.1" tracing = "0.1" ``` ## 测试 - Unit: types 序列化、retry 退避计算、usage 累计 - Mock: HTTP mock server 测试 provider 请求/响应/错误处理 - Integration (可选): Ollama 本地真实调用验证 ## 后续扩展 - 流式接口 (`Stream`) - Tool 自动执行循环 (参考 OpenHarness `run_query()`) - 多 Provider 注册发现 (参考 OpenHarness `ProviderRegistry`) - 上下文压缩 (auto-compaction) - 生命周期钩子 (pre/post tool use hooks)