OpenAI Responses API
The agent uses OpenAI’s Responses API to access model reasoning and structured streaming events.
Configuration
Enable in agent_config.yaml:

```yaml
openai:
  model: gpt-5-mini
  base_url: https://api.openai.com/v1
  use_responses_api: true   # Required
  reasoning_effort: low     # For GPT-5 models
  text_verbosity: low       # For GPT-5 models
```

The config system validates this on startup. A missing `use_responses_api` raises an error.
Requires langchain-openai>=0.3.9 (specified in pyproject.toml).
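The startup check described above might look roughly like the sketch below. It assumes the YAML has already been parsed into a dict. The function name `validate_openai_config`, the use of `ValueError`, and the rejection of GPT-5-only keys for other models are all hypothetical illustrations, not the agent's actual validator.

```python
from typing import Any

# Hypothetical sketch: the agent's real config validator is not shown in this guide.
GPT5_ONLY_KEYS = ("reasoning_effort", "text_verbosity")


def validate_openai_config(config: dict[str, Any]) -> dict[str, Any]:
    """Check the `openai` section of a parsed agent_config.yaml."""
    openai_cfg = config.get("openai")
    if not isinstance(openai_cfg, dict):
        raise ValueError("agent_config.yaml: missing 'openai' section")
    # The Responses API flag is mandatory and must be explicitly true
    if openai_cfg.get("use_responses_api") is not True:
        raise ValueError("use_responses_api missing in agent_config.yaml (must be true)")
    model = str(openai_cfg.get("model", ""))
    # reasoning_effort / text_verbosity are documented for GPT-5 models only
    if not model.startswith("gpt-5"):
        for key in GPT5_ONLY_KEYS:
            if key in openai_cfg:
                raise ValueError(f"{key} is only supported for GPT-5 models, got model={model!r}")
    return openai_cfg
```

Failing fast at startup means a misconfigured deployment stops before it makes any API calls. It does not fail mid-conversation.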
Streaming Events
The agent emits structured events during response generation:

```python
import asyncio

from consumer_agent.agent import Agent
from consumer_agent.agent.streaming import (
    ReasoningEvent, TextEvent, ToolCallEndEvent, ToolCallStartEvent, UsageEvent,
)
from consumer_agent.factory import create_chat_model


async def main() -> None:
    model = create_chat_model()
    agent = Agent(model)

    messages = [{"role": "user", "content": "What is 2+2?"}]
    system_prompt = "You are a helpful assistant."

    # Dispatch on event type as events arrive
    async for event in agent.stream(messages, system_prompt):
        if isinstance(event, ReasoningEvent):
            print(f"Thinking: {event.content}")
        elif isinstance(event, TextEvent):
            print(f"Response: {event.content}")
        elif isinstance(event, ToolCallStartEvent):
            print(f"Calling tool: {event.tool_name}")
        elif isinstance(event, ToolCallEndEvent):
            print(f"Tool completed in {event.duration_ms}ms")
        elif isinstance(event, UsageEvent):
            print(f"Tokens: {event.total_tokens} ({event.reasoning_tokens} reasoning)")


asyncio.run(main())
```

Event types (src/consumer_agent/agent/streaming.py):
- ReasoningEvent: Model reasoning before tool calls
- TextEvent: Final response text after tools
- ToolCallStartEvent: Tool execution begins
- ToolCallEndEvent: Tool execution completes with duration
- UsageEvent: Token usage metrics (input, output, reasoning, total)
- CompletedEvent: Stream finished
- ErrorEvent: Error occurred
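`ToolCallEndEvent` carries a `duration_ms` value. One plausible way to produce it is to record a monotonic timestamp when a tool starts and subtract it when the tool completes. The sketch below pairs start and end through a `call_id`. That field, the `ToolTimer` class, and these simplified dataclasses are hypothetical stand-ins, not the definitions in streaming.py.

```python
import time
from dataclasses import dataclass


# Simplified, hypothetical stand-ins for the events in streaming.py
@dataclass
class ToolCallStartEvent:
    tool_name: str
    call_id: str


@dataclass
class ToolCallEndEvent:
    tool_name: str
    call_id: str
    duration_ms: int


class ToolTimer:
    """Pairs tool start/end by call_id and measures elapsed time."""

    def __init__(self) -> None:
        self._started: dict[str, float] = {}

    def start(self, tool_name: str, call_id: str) -> ToolCallStartEvent:
        # perf_counter is a monotonic high-resolution clock suited to short durations
        self._started[call_id] = time.perf_counter()
        return ToolCallStartEvent(tool_name=tool_name, call_id=call_id)

    def end(self, tool_name: str, call_id: str) -> ToolCallEndEvent:
        # Raises KeyError if end() is called without a matching start()
        started = self._started.pop(call_id)
        elapsed_ms = round((time.perf_counter() - started) * 1000)
        return ToolCallEndEvent(tool_name=tool_name, call_id=call_id, duration_ms=elapsed_ms)
```

Keying on a per-call identifier keeps timings correct when the same tool runs more than once in a turn. Keying on the tool name alone would not.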
Reasoning vs Response Text
The agent separates reasoning from final responses:

- Before the first tool call: tokens are emitted as `ReasoningEvent` (internal thinking)
- After tool calls: tokens are emitted as `TextEvent` (final answer)
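The rule above amounts to a single flag that flips once the first tool call starts. The sketch below is minimal and assumes raw chunks arrive as simple `(kind, payload)` tuples. The tuple format, the `classify` function, and the simplified event classes are hypothetical. This is not the agent's internal implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


# Simplified, hypothetical stand-ins for the agent's event classes
@dataclass
class ReasoningEvent:
    content: str


@dataclass
class TextEvent:
    content: str


@dataclass
class ToolCallStartEvent:
    tool_name: str


Event = Union[ReasoningEvent, TextEvent, ToolCallStartEvent]


def classify(chunks: Iterable[tuple[str, str]]) -> Iterator[Event]:
    """Label token chunks as reasoning or response text by tool-call position.

    Each chunk is ("token", text) or ("tool_call", tool_name).
    """
    seen_tool_call = False
    for kind, payload in chunks:
        if kind == "tool_call":
            seen_tool_call = True
            yield ToolCallStartEvent(tool_name=payload)
        elif seen_tool_call:
            yield TextEvent(content=payload)       # after tools: final answer
        else:
            yield ReasoningEvent(content=payload)  # before tools: thinking
```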
Example:
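```python
from consumer_agent.agent.streaming import ReasoningEvent, TextEvent

# Runs inside an async function, with agent, messages, and system_prompt set up as above
reasoning = []
response = []

async for event in agent.stream(messages, system_prompt):
    if isinstance(event, ReasoningEvent):
        reasoning.append(event.content)
    elif isinstance(event, TextEvent):
        response.append(event.content)

print("Reasoning:", "".join(reasoning))
print("Response:", "".join(response))
```

Model Family Differences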
GPT-5 Series (gpt-5-mini, gpt-5, gpt-5-nano)
Reasoning tokens are generated and accessible.
Configuration:
```yaml
openai:
  model: gpt-5-mini
  reasoning_effort: low   # minimal, low, medium, high
  text_verbosity: low     # low, medium, high
```

Reasoning effort levels:

- minimal: Fast, few or no reasoning tokens
- low: Basic reasoning (default)
- medium: Balanced reasoning and speed
- high: Deep reasoning, more tokens
Reasoning appears in ReasoningEvent during streaming.
O1/O3 Series (o1, o1-mini, o3-mini)
Reasoning tokens are generated but hidden (not accessible via the API).

Configuration:

```yaml
openai:
  model: o1-mini
  # No reasoning_effort or text_verbosity parameters
```

You pay for reasoning tokens but cannot see the reasoning text.
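The Responses API accepts these settings as nested request fields: `reasoning={"effort": ...}` and `text={"verbosity": ...}`. The sketch below shows one way the `openai:` config block could map onto request keyword arguments, with both omitted for non-GPT-5 models as described above. The helper name `responses_request_kwargs` and the `gpt-5` prefix check are assumptions for illustration. The agent's real mapping goes through langchain-openai and may differ.

```python
from typing import Any


def responses_request_kwargs(openai_cfg: dict[str, Any], user_input: str) -> dict[str, Any]:
    """Build keyword arguments for a Responses API call (hypothetical helper)."""
    kwargs: dict[str, Any] = {"model": openai_cfg["model"], "input": user_input}
    # Following this guide, only GPT-5 configs set these; O1/O3 configs omit them
    if openai_cfg["model"].startswith("gpt-5"):
        if "reasoning_effort" in openai_cfg:
            kwargs["reasoning"] = {"effort": openai_cfg["reasoning_effort"]}
        if "text_verbosity" in openai_cfg:
            kwargs["text"] = {"verbosity": openai_cfg["text_verbosity"]}
    return kwargs
```

With the official `openai` Python SDK, the result could be passed as `client.responses.create(**kwargs)`.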
Token Costs
Reasoning tokens are billed as output tokens.

Example with GPT-5-mini:

- Input: 100 tokens
- Reasoning: 50 tokens
- Visible output: 150 tokens
- Billed output: 200 tokens (50 reasoning + 150 visible)
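The same arithmetic in code. `output_tokens` already includes reasoning (see the `UsageEvent` fields), so the visible text is the difference between the two counts. The function name below is hypothetical. The numbers are the example values from the list.

```python
def split_output_tokens(output_tokens: int, reasoning_tokens: int) -> dict[str, int]:
    """Split billed output tokens into reasoning and visible text."""
    if not 0 <= reasoning_tokens <= output_tokens:
        raise ValueError("reasoning_tokens must be between 0 and output_tokens")
    return {
        "billed_output": output_tokens,                 # all of it is charged at the output rate
        "reasoning": reasoning_tokens,                  # generated, not part of the answer
        "visible": output_tokens - reasoning_tokens,    # text the user actually sees
    }


# Example from the list: 50 reasoning + 150 visible = 200 billed output tokens
print(split_output_tokens(output_tokens=200, reasoning_tokens=50))
# {'billed_output': 200, 'reasoning': 50, 'visible': 150}
```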
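Token counts also appear on the model's `AIMessage`. LangChain's standard `usage_metadata` field reports reasoning tokens under `output_token_details`:

```python
# message: an AIMessage returned by the chat model, e.g. message = await model.ainvoke(messages)
usage = message.usage_metadata or {}
details = usage.get("output_token_details", {})
reasoning_tokens = details.get("reasoning") or 0
```

Token Usage Tracking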
The agent emits a UsageEvent at the end of each stream with token consumption metrics.
Usage Event Structure
```python
from consumer_agent.agent.streaming import UsageEvent

# Event fields (OpenAI/LangChain standard names)
event.input_tokens      # Prompt/input tokens
event.output_tokens     # Completion tokens (includes reasoning)
event.total_tokens      # Sum of input + output
event.reasoning_tokens  # Reasoning tokens (subset of output)
event.cached_tokens     # Cached input tokens
```

Tracking Token Usage
```python
async for event in agent.stream(messages, system_prompt):
    if isinstance(event, UsageEvent):
        print(f"Input: {event.input_tokens}")
        print(f"Output: {event.output_tokens}")
        print(f"Reasoning: {event.reasoning_tokens}")
        print(f"Total: {event.total_tokens}")
```

Cumulative Session Tracking
Track token usage across multiple turns:

```python
session_tokens = {"input": 0, "output": 0, "reasoning": 0, "total": 0}


async def run_turn(messages):
    # Call once per conversation turn; totals accumulate in session_tokens
    async for event in agent.stream(messages, system_prompt):
        if isinstance(event, UsageEvent):
            session_tokens["input"] += event.input_tokens
            session_tokens["output"] += event.output_tokens
            session_tokens["reasoning"] += event.reasoning_tokens
            session_tokens["total"] += event.total_tokens


# After one or more turns:
print(f"Session total: {session_tokens['total']} tokens")
print(f"Reasoning: {session_tokens['reasoning']} tokens")
```

Cost Calculation
Calculate estimated costs:

```python
# Illustrative rates only; look up current GPT-5-mini pricing before relying on these
INPUT_COST_PER_1M = 0.15   # USD per 1M input tokens
OUTPUT_COST_PER_1M = 0.60  # USD per 1M output tokens (reasoning is billed as output)

async for event in agent.stream(messages, system_prompt):
    if isinstance(event, UsageEvent):
        input_cost = (event.input_tokens / 1_000_000) * INPUT_COST_PER_1M
        output_cost = (event.output_tokens / 1_000_000) * OUTPUT_COST_PER_1M
        total_cost = input_cost + output_cost

        # Six decimals so small per-request costs don't round to $0.0000
        print(f"Estimated cost: ${total_cost:.6f}")
        print(f"  Input: ${input_cost:.6f}")
        print(f"  Output: ${output_cost:.6f} (includes {event.reasoning_tokens} reasoning tokens)")
```

Streaming Implementation
The agent uses dual streaming modes for low latency:
- “messages” mode: Direct LLM token streaming (lowest latency)
- “updates” mode: Agent state updates (tool tracking)
Benefits:
- 3-5x faster time-to-first-token
- Separate reasoning/response: Automatic classification
- Tool tracking: Start/end events with durations
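These mode names match LangGraph's `stream_mode` options. If the agent is built on LangGraph, which this guide does not state, requesting both modes yields `(mode, chunk)` pairs. A `messages` chunk is a `(message_chunk, metadata)` tuple, and an `updates` chunk maps node names to state updates. The sketch below routes such pairs into token and tool-call events. It substitutes plain strings for LangChain message chunks and assumes a hypothetical `tool_calls` key in each update, so it runs without LangGraph installed.

```python
import asyncio
from typing import Any, AsyncIterator


async def route_dual_stream(
    chunks: AsyncIterator[tuple[str, Any]],
) -> AsyncIterator[tuple[str, Any]]:
    """Route (mode, chunk) pairs from a dual-mode stream to event kinds."""
    async for mode, chunk in chunks:
        if mode == "messages":
            # chunk is (token_text, metadata); forward text immediately for low latency
            token, _metadata = chunk
            if token:
                yield ("token", token)
        elif mode == "updates":
            # chunk maps node name -> state update; "tool_calls" is a hypothetical key
            for update in chunk.values():
                for tool_name in update.get("tool_calls", []):
                    yield ("tool_call", tool_name)


async def fake_stream() -> AsyncIterator[tuple[str, Any]]:
    # Stand-in for a real dual-mode stream, interleaving both modes
    yield ("messages", ("Let me ", {}))
    yield ("updates", {"agent": {"tool_calls": ["calculator"]}})
    yield ("messages", ("4", {}))


async def main() -> None:
    async for kind, payload in route_dual_stream(fake_stream()):
        print(kind, payload)


asyncio.run(main())
```

Token text goes out as soon as it arrives from the `messages` mode. The `updates` mode carries only coarser state changes, which suit tool tracking.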
Tool Call Tracking
Track tool execution timing:

```python
from collections import defaultdict

from consumer_agent.agent.streaming import ToolCallEndEvent, ToolCallStartEvent

# A list per tool, so repeated calls to the same tool are not overwritten
tool_timings = defaultdict(list)

async for event in agent.stream(messages, system_prompt):
    if isinstance(event, ToolCallStartEvent):
        print(f"Starting {event.tool_name}...")
    elif isinstance(event, ToolCallEndEvent):
        tool_timings[event.tool_name].append(event.duration_ms)
        print(f"{event.tool_name}: {event.duration_ms}ms")

print("Performance:", dict(tool_timings))
```

Error Handling
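Handle stream errors:

```python
from consumer_agent.agent.streaming import ErrorEvent, TextEvent

try:
    async for event in agent.stream(messages, system_prompt):
        if isinstance(event, ErrorEvent):
            # The agent reported the error as an event; stop consuming the stream
            print(f"Error: {event.error} (code: {event.code})")
            break
        elif isinstance(event, TextEvent):
            print(event.content)
except Exception as e:
    # Exceptions raised by the stream itself rather than reported as ErrorEvent
    print(f"Stream failed: {e}")
```

Troubleshooting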
Error: use_responses_api missing in agent_config.yaml

Add the required parameter:

```yaml
openai:
  use_responses_api: true
```

No reasoning events appear

Check configuration:

- Verify `reasoning_effort` is set (GPT-5 only)
- Ensure `use_responses_api: true` is in the config
- Confirm the model is GPT-5 series (O1/O3 hide reasoning)
High reasoning token costs
Reduce reasoning effort:

```yaml
openai:
  reasoning_effort: minimal
```

Import error: langchain-openai version too old

Update dependencies:

```bash
uv sync
```

References

- Setup Guide - Environment configuration
- Architecture Guide - Streaming internals
- MCP Guide - Tool integration