Sub-Agents-as-Tools Architecture Design
Sub-Agents-as-Tools Architecture Design
Section titled “Sub-Agents-as-Tools Architecture Design”Last Updated: 2026-04-20
Section titled “Last Updated: 2026-04-20”Related
Section titled “Related”- Jira: PLT-616 — Spike: Evaluate sub-agents-as-tools architecture
- Branch:
feature/sub-agents-as-tools - Prior work: PLT-616 spike notebook (PR #246), intent-classification-research.md
- OpenAI guidance: Section 5b — “one orchestrator agent calling agent as tools > multi agent handoff for simplicity”
- Depends on: PR #245 (gpt-5.4-nano swap) for model validation, but built independently on
main
Context and Motivation
Section titled “Context and Motivation”Building on PR #246’s Findings
Section titled “Building on PR #246’s Findings”PR #246’s multi-agent orchestration notebook measured four patterns: monolith, supervisor (langgraph_supervisor.create_supervisor), swarm, and A2A. Its conclusion — keep the monolithic shopping subgraph + static-routing gateway — is well-supported for the patterns it evaluated.
The notebook focused on multi-agent handoff patterns (supervisor with full delegation, swarm with peer handoffs). OpenAI’s Fetch guidance document (Section 5b) recommends a different pattern:
“one orchestrator agent calling agent as tools > multi agent handoff for simplicity”
This agents-as-tools pattern (one orchestrator calling sub-agents as function calls) wasn’t covered in PR #246. This spike builds on those findings by evaluating the pattern OpenAI specifically recommended.
Phase 1 → Phase 3 History
Section titled “Phase 1 → Phase 3 History”The intent-classification-research.md documents the evolution:
-
Phase 1 (tool-based): Scout as a
BaseTool— single conversational agent callsscout_answerlike any other tool. Parallel tool calls for mixed intent. Zero added TTFB. Worked well (validated in Appendix B of that doc), butBaseTool._arun()returns a singlestr— no streaming. Combined with Forethought’s ~12s latency, this meant 12 seconds of dead silence for support queries. -
Phase 3 (gateway graph, current): Custom LangGraph
StateGraphwith intent classifier (gpt-4.1-mini structured output) routing tosupport_handler(Forethought streaming viaStreamWriter) or shopping agent subgraph. Solves the streaming problem but uses static conditional routing — closer to the handoff pattern OpenAI says not to use.
Combine the best of both: the agents-as-tools simplicity OpenAI recommends with the streaming capability Phase 3 provides. Upgrade the gateway classifier into an orchestrator that calls sub-agents as tool functions, with streaming through tool execution via LangGraph’s StreamWriter.
Architecture
Section titled “Architecture”Current (Static Routing)
Section titled “Current (Static Routing)”START → classifier (gpt-4.1-mini, structured output) → [static conditional edges] → support_handler → END → rewrite_query → shopping_agent → END → support_handler → rewrite_query → shopping_agent → ENDNew (Orchestrator with Tool Calling)
Section titled “New (Orchestrator with Tool Calling)”START → orchestrator_node (gpt-5.4-nano, tool calling) → LLM emits preamble text (streams to user immediately) → LLM returns tool call(s) → post-processing guards validate tool calls → execute tool(s): ask_support() — calls Forethought, streams via StreamWriter ask_shopping() — runs shopping agent, streams via StreamWriter → ENDWhat Changes
Section titled “What Changes”gateway_node(classifier + structured output) →orchestrator_node(tool calling)- Static conditional edges (
_route_by_intent,_after_support) → removed; LLM tool selection is the routing _rewrite_shopping_query→ absorbed into_execute_shopping()tool functionsupport_handler_node→ core logic wrapped as_execute_support()callable inside orchestrator
What Stays the Same
Section titled “What Stays the Same”- Forethought client, conversation_id continuity, chunk streaming — all internal to
_execute_support() - Shopping agent subgraph (
create_agentwith MCP tools) — invoked inside_execute_shopping() - Verbatim query protection — prompt-level + code-level override in tool functions
- SSE event types, HistoryMiddleware — unchanged
api/main.py— callscreate_gateway_graph(), streams result
Orchestrator Node Design
Section titled “Orchestrator Node Design”Execution Flow
Section titled “Execution Flow”The orchestrator node is a single async function — a plain LangGraph node with writer: StreamWriter, not a ReAct agent.
async def orchestrator_node(state: dict, writer: StreamWriter, *, ...): # 1. Build messages with system prompt # 2. Call gpt-5.4-nano with tool schemas # 3. Stream preamble text tokens → writer(TextEvent) # 4. Intercept tool calls, apply post-processing guards # 5. Execute validated tool calls (FIFO queue for mixed intent) # 6. Return final state (AIMessages for history)Tool Schemas
Section titled “Tool Schemas”ask_support = { "name": "ask_support", "description": "Answer customer support questions about Fetch.", "parameters": { "query": "The user's exact support question, verbatim.", "support_category": "Topic category (missing_points, ereceipts, " "rejected_receipt, rewards, fetch_shop, fetch_play, " "fetch_card, point_pass, referrals, profile_help, " "goodrx, social). Optional.", "support_summary": "One-sentence summary of the support issue using " "full conversation context. Under 300 chars.", "prior_context": "Brief summary of relevant prior conversation context " "when the user switches intent (e.g., from shopping to " "support). Optional." }}
ask_shopping = { "name": "ask_shopping", "description": "Handle shopping queries — product search, deals, " "recommendations, price comparisons, purchase history.", "parameters": { "query": "The user's exact shopping question, verbatim.", "prior_context": "Brief summary of relevant prior conversation context " "when the user switches intent (e.g., from support to " "shopping). Optional." }}Orchestrator System Prompt
Section titled “Orchestrator System Prompt”XML format (matching PR #245’s style for gpt-5.4). Carries over all classification intelligence from the current classifier prompt:
- Support keywords list → guides tool selection
- Intent categories → mapped to tool descriptions
- Image classification rules (receipt → support, product → shopping)
- Greeting/check-in rules (→ Fastpath direct response, or
ask_shoppingif beyond simple greetings) - No-carryforward rule (only classify latest message)
- No-reformulation rule (pass user’s exact words)
- Support category taxonomy
- Preamble instruction: “Before calling tools, emit a brief sentence acknowledging what you’re about to do”
- gpt-5.4-nano with
reasoning_effort: none+ intent_count failsafe
Model Selection Decision
Section titled “Model Selection Decision”Benchmarking compared gpt-5.4-nano and gpt-4.1-mini for orchestrator tool calling:
| Model | Mixed intent (30 queries, 3 batches) | Avg latency | Notes |
|---|---|---|---|
gpt-5.4-nano (none) | 30/30 (100%) | ~1,450ms | Rare stochastic failures observed in early testing (~1 in 15) |
gpt-5.4-nano (low) | N/A | N/A | Requires Responses API for tool calling — not compatible with current LangChain path |
| gpt-4.1-mini | 30/30 (100%) | ~3,230ms | Reliable but 2.2x slower |
gpt-5.4-nano is 2.2x faster and nearly as reliable. The rare stochastic failure (dropping the second tool call for mixed intent) is mitigated by the intent_count failsafe — a code-level safety net that scales to N sub-agents.
Intent Count Failsafe
Section titled “Intent Count Failsafe”Every tool call includes a required intent_count parameter — the LLM reports how many distinct intents it detected. After the LLM returns tool calls, the orchestrator compares:
intent_count(from tool args) vslen(tool_calls)(actual tool calls made)- If
intent_count > len(tool_calls): the LLM dropped a tool call → retry with a stronger prompt that explicitly demands the missing tool call(s) - The retry merges new tool calls with existing ones (no duplicates)
This is a general mechanism that scales to N sub-agents — it compares numbers, not hard-coded intent types. If we add 5 more sub-agents and the LLM reports 3 intents but only calls 2 tools, the failsafe catches it and retries.
intent_count = tool_calls[0].get("args", {}).get("intent_count", len(tool_calls))if intent_count > len(tool_calls): # Re-invoke orchestrator with explicit instruction to call missing tools retry_response = await orchestrator_model.ainvoke(retry_messages) # Merge new tool calls with existing onesFinal decision: Use gpt-4.1-mini. Stage testing confirmed gpt-5.4-nano stochastically drops the second tool call for mixed-intent queries (~1 in 15), even with the intent_count failsafe — the LLM reports intent_count=1 (misclassifies as single intent), so the failsafe cannot detect the miss. gpt-4.1-mini is 2.2x slower but 100% reliable on mixed intent. The latency cost is offset by preamble streaming.
Streaming Architecture
Section titled “Streaming Architecture”Three Scenarios
Section titled “Three Scenarios”| Scenario | Behavior |
|---|---|
| Single intent (support) | ask_support streams live via writer |
| Single intent (shopping) | ask_shopping streams live via writer |
| Mixed intent | FIFO queue — both fire concurrently, first to produce output streams live, other buffers and flushes with simulated streaming |
FIFO Queue for Mixed Intent
Section titled “FIFO Queue for Mixed Intent”Both sub-agent calls fire concurrently via asyncio.gather. First to produce output claims the “live” streaming slot (protected by asyncio.Lock). The other buffers its output. When the live stream finishes, the buffered response flushes with a small artificial delay between chunks to simulate streaming (configurable, e.g., 15-20ms).
async def _execute_tools_concurrent(tool_calls, state, writer, ...): lock = asyncio.Lock() winner = None # first to produce output buffers = {} # name → list of chunks
async def run_tool(name, func): nonlocal winner async for chunk in func(state, ...): async with lock: # Lock covers the entire check-and-emit path to prevent # race where two coroutines both claim the live slot. if winner is None: winner = name if name == winner: writer(TextEvent(content=chunk)) # live stream else: buffers.setdefault(name, []).append(chunk)
await asyncio.gather( run_tool("ask_support", _execute_support), run_tool("ask_shopping", _execute_shopping), )
# Flush buffered responses with simulated streaming delay for name, chunks in buffers.items(): for chunk in chunks: await asyncio.sleep(0.015) # 15ms simulated delay writer(TextEvent(content=chunk))Sub-Agent Streaming Internals
Section titled “Sub-Agent Streaming Internals”-
_execute_support: CallsForethoughtClient.ask_stream(), yields chunks (Forethought streams chunk-by-chunk, not token-by-token). Each chunk is a widget component fragment — typically a sentence or paragraph. -
_execute_shopping: This is the most complex part of the migration. Today,stream_adapter.pyis a 350-line adapter that depends onsubgraphs=Trueand named node namespaces (gateway,support_handler,shopping_agent) to classify streaming events (reasoning vs text, tool call tracking, response_id extraction). Collapsing the graph to a single orchestrator node means the adapter can no longer distinguish event sources by namespace.Solution:
_execute_shoppingruns the shopping agent subgraph as its own innerastream(subgraphs=True)call and internally processes the three stream modes — replicating the relevant parts of stream_adapter’s shopping-path logic.subgraphs=Trueis required to get real-time token streaming from the shopping agent’s innermodel/toolsnodes; without it, LangGraph batches inner tokens until the subgraph completes:messagesmode: Extract LLM token chunks. Classify as reasoning (before tools) or text (after tools) using thehas_called_toolsflag. Extract reasoning fromadditional_kwargs. YieldTextEventorReasoningEvent.updatesmode: Track tool call start/end events from innermodelandtoolsnodes. YieldToolCallStartEvent,ToolResultEvent,ToolCallEndEvent. Trackresponse_idfrom message metadata.custommode: Pass through any custom events from inner nodes.
This is essentially extracting the shopping-specific logic from
stream_adapter.pyinto a self-contained async generator. The stream_adapter itself simplifies — it only needs to handle the orchestrator’s top-levelcustomevents (preamble text, support chunks, shopping chunks all arrive asStreamEventobjects viawriter()).
Revised Stream Adapter
Section titled “Revised Stream Adapter”With the orchestrator emitting all events via writer(), the stream adapter simplifies significantly:
async for namespace, mode, chunk in graph.astream( graph_input, stream_mode=["custom"], # only custom mode needed config=config,): if mode == "custom" and isinstance(chunk, StreamEvent): yield chunkThe complex namespace/mode dispatch logic moves into _execute_shopping() where it processes the inner shopping subgraph’s stream. The adapter becomes a thin passthrough for typed StreamEvent objects.
Events emitted by the orchestrator via writer():
ThinkingEvent— TTFB optimization (emitted immediately)ResponseIdEvent— syntheticft_*for support, realresp_*from shoppingTextEvent— preamble text, support chunks, shopping textReasoningEvent— shopping agent reasoning tokensToolCallStartEvent/ToolCallEndEvent/ToolResultEvent— shopping agent tool activitySupportContentEvent— separator between support and shopping in mixed intentUsageEvent— token consumption (from shopping agent, zero for support-only)CompletedEvent— stream finished
Preamble
Section titled “Preamble”The orchestrator LLM emits text tokens before returning tool calls. These stream to the user immediately via writer(TextEvent) — no dead silence. Example:
User: "my receipt didn't scan and find me coffee deals"Assistant: "Let me look into your receipt issue and find some deals for you."[support response streams][SupportContentEvent emitted — internal, not visible to user][shopping response streams]Mixed Intent History Separation
Section titled “Mixed Intent History Separation”For mixed intent, the orchestrator emits a SupportContentEvent between the support and shopping responses. This is an internal event, not visible to the user — it is consumed by HistoryMiddleware to store the support response as a separate assistant message in DynamoDB, keeping support and shopping history cleanly separated. The SSE endpoint filters it out before sending to the client.
The user sees one continuous response: support text streams, then shopping text streams, with no visible separator. Same behavior as today, but emitted explicitly by the orchestrator node rather than detected by the stream adapter from node completion updates.
Fastpath (Greetings, Chitchat, Simple Follow-Ups)
Section titled “Fastpath (Greetings, Chitchat, Simple Follow-Ups)”When the orchestrator LLM responds with text only and no tool calls, this is the Fastpath — the orchestrator handles the response directly without invoking any sub-agent. Examples: greetings (“hi”), chitchat (“what’s up”), acknowledgments (“thanks”), simple follow-ups (“ok”).
Currently these go to the shopping agent (gpt-5-mini with full conversational prompt and MCP tools) — heavyweight for a simple reply. The Fastpath lets gpt-5.4-nano respond directly, saving a full shopping agent LLM call and delivering a faster response.
The orchestrator prompt includes lightweight persona instructions (tone, name, brief greeting style) so Fastpath responses match the assistant’s voice without needing the full conversational prompt.
Safety constraint: The orchestrator has no safety guardrails (no refusal instructions, no content policy, no sensitive topic handling). Fastpath must be limited to a strict allowlist of trivially safe message types:
- Greetings and hellos
- Acknowledgments (“ok”, “thanks”, “got it”)
- Simple chitchat (“what’s up”, “how are you”)
Anything beyond this allowlist must route to ask_shopping. If the user asks a question, makes a request, or says anything that could require judgment about content safety, the orchestrator must call ask_shopping and let the shopping agent handle it — the shopping agent’s full conversational prompt has built-in guardrails for safety, refusals, sensitive topics, and policy compliance.
The orchestrator prompt enforces this: “Only respond directly for simple greetings, acknowledgments, and chitchat. For ALL other messages — including general questions, opinions, advice, or anything you’re unsure about — call ask_shopping.”
Verbatim Query Protection
Section titled “Verbatim Query Protection”Two layers, same as current architecture:
Layer 1: Prompt-Level
Section titled “Layer 1: Prompt-Level”Orchestrator system prompt instructs: “Pass the user’s exact words as the query argument — do not reformulate, rephrase, expand, or add context.”
Layer 2: Code-Level Override
Section titled “Layer 2: Code-Level Override”Applied inside each tool function before execution:
- Single intent (orchestrator called one tool): always use the user’s exact text from state, ignoring LLM’s query arg.
- Mixed intent, short message (≤4 words): also use exact text.
- Mixed intent, long message: use LLM’s split but validate keyword overlap against user’s original text. Fall back to user’s text if no overlap.
The “single intent” signal comes from whether the orchestrator returned one tool call or two.
async def _execute_support(state, tool_args, ...): user_text = get_latest_user_message(state) is_single = len(tool_calls) == 1
if is_single: effective_query = user_text elif len(user_text.split()) <= 4: effective_query = user_text else: query = tool_args.get("query", "") effective_query = query if has_overlap(query, user_text) else user_text
# proceed with effective_query...Context Isolation and Cross-Turn Context
Section titled “Context Isolation and Cross-Turn Context”Design Principles
Section titled “Design Principles”Context management has two competing goals:
- Within a single turn (mixed intent): Each sub-agent should only see its own portion — support content shouldn’t leak into the shopping agent’s current-turn context and vice versa.
- Across turns (intent switch): The full conversation history is valuable context. A user who discussed points issues in turn 1 and asks for deals in turn 2 expects the assistant to remember the whole conversation.
General Rule (Scales to N Sub-Agents)
Section titled “General Rule (Scales to N Sub-Agents)”Each sub-agent receives:
- Full cross-turn history — all prior turns from all sub-agents. This shared context makes the assistant feel coherent across intent switches.
- Current turn query — only the portion relevant to this sub-agent (from the orchestrator’s tool call args).
prior_context(optional) — a brief summary of relevant prior conversation context, populated by the orchestrator when the user switches intent. Useful for sub-agents that can’t consume full message history (e.g., API-based services).
The orchestrator is the natural place to manage this — it sees the full history, knows which sub-agent it’s calling, and can adapt context format per sub-agent type:
For each sub-agent tool call: if sub-agent is LLM-based (accepts message history): pass full cross-turn history + current turn query prior_context is supplementary (nice-to-have, not required) if sub-agent is API-based (accepts only a query string): pass current turn query prior_context is essential — summarizes relevant cross-turn contextUpdated Tool Schemas
Section titled “Updated Tool Schemas”Both tools include an optional prior_context parameter:
ask_support = { "parameters": { "query": "The user's exact support question, verbatim.", "support_category": "...", "support_summary": "...", "prior_context": "Brief summary of relevant prior conversation " "context when the user switches intent. Optional." }}
ask_shopping = { "parameters": { "query": "The user's exact shopping question, verbatim.", "prior_context": "Brief summary of relevant prior conversation " "context when the user switches intent. Optional." }}How Each Sub-Agent Uses Context
Section titled “How Each Sub-Agent Uses Context”Support (API-based — Forethought):
- Forethought accepts a query string, not structured history
prior_contextis prepended to the query or passed as a Forethought context variable- Example: User shopped for Folgers in turn 1, then says “the coffee never gave me points” in turn 2 →
prior_context: "User previously searched for Folgers coffee offers"helps Forethought understand the specific product
Shopping (LLM-based — agent subgraph):
- Receives full cross-turn message history (including prior support turns from other turns)
prior_contextis supplementary — the LLM can read the history directly- Current-turn isolation preserved: For mixed intent, the current turn’s support content is still stripped from the shopping agent’s view (same logic as today’s
_rewrite_shopping_query)
Future sub-agents: Same pattern — LLM-based sub-agents get full history, API-based sub-agents rely on prior_context from the orchestrator.
Orchestrator Context
Section titled “Orchestrator Context”The orchestrator LLM sees the full conversation history (all turns, all sub-agents) to make routing decisions and populate prior_context. It doesn’t generate user-facing content from this context (except Fastpath responses and preambles).
Post-Processing Guards
Section titled “Post-Processing Guards”Three guards from the current classifier, applied between the LLM returning tool calls and execution:
Guard 1: Echo Follow-Up Detection
Section titled “Guard 1: Echo Follow-Up Detection”Short user replies that echo the previous assistant’s support question (e.g., “Fetch shop” after “Is it from a Fetch Shop purchase?”) should route to support.
Adaptation: If the LLM calls ask_shopping on a short echo reply, override to ask_support. Same _is_support_echo_followup logic.
Guard 2: Keyword Overlap Check
Section titled “Guard 2: Keyword Overlap Check”For mixed intent (both tools called), if one tool’s query arg has no content word overlap with the user’s message, drop that tool call.
Adaptation: Same _content_words and _has_overlap functions, applied to tool call args.
Guard 3: Verbatim Override
Section titled “Guard 3: Verbatim Override”Handled inside tool functions (see Verbatim Query Protection section above).
Application Order
Section titled “Application Order”tool_calls = response.tool_calls
# Guard 1: echo follow-up — may replace ask_shopping with ask_supporttool_calls = _apply_echo_guard(tool_calls, state)
# Guard 2: overlap check — may drop one tool call from mixedtool_calls = _apply_overlap_guard(tool_calls, user_text)
# Guard 3: verbatim override — applied inside tool functionsawait _execute_tools(tool_calls, state, writer)Conversation Continuity
Section titled “Conversation Continuity”Forethought conversation_id
Section titled “Forethought conversation_id”Unchanged. _execute_support() contains the same logic as today’s support_handler_node:
- Retrieve
conversation_idfrom episode metadata in DynamoDB - Check TTL (55 min expiry)
- If valid → pass to Forethought (PUT to continue)
- If expired → replay last 6 support turns as context prefix
- After call → store new
conversation_idback to episode metadata
All managed inside _execute_support(), independent of routing pattern.
OpenAI previous_response_id
Section titled “OpenAI previous_response_id”The shopping agent subgraph receives previous_response_id for OpenAI server-side context. This is bound to the model at graph creation time in factory.py — unchanged.
File Changes
Section titled “File Changes”Modified
Section titled “Modified”| File | Change |
|---|---|
gateway/classifier.py | Replace → rename to gateway/orchestrator.py. Structured output prompt + GatewayOutput → tool-calling orchestrator prompt + tool schemas. Post-processing guards adapted for tool calls. |
gateway/graph.py | Simplify — remove _route_by_intent, _after_support, _rewrite_shopping_query, conditional edges. Graph becomes START → orchestrator_node → END. |
gateway/support_handler.py | Refactor — extract core logic into callable async generator with explicit function signature: async def execute_support(query, support_category, support_summary, prior_context, *, forethought_client, history_store, episode_id, user_id, forethought_stream) -> AsyncGenerator[str, None]. Yields chunks instead of writing directly to StreamWriter. Fields previously read from state (scout_query, support_category, support_summary) become function arguments; prior_context is prepended to the query for cross-turn context on intent switches. The node wrapper (support_handler_node) is removed; the function is called from the orchestrator. |
gateway/state.py | Simplify — GatewayState drops intent, scout_query, shopping_query, support_category, support_summary (become local to orchestrator). |
gateway/stream_adapter.py | Simplify significantly — becomes a thin passthrough for StreamEvent objects from writer(). The complex namespace/mode dispatch logic (350 lines) moves into _execute_shopping() inside the orchestrator. Only needs stream_mode=["custom"]. |
agent_config.yaml | Update gateway agent config if needed for tool-calling. |
factory.py | create_gateway_graph() simplified — fewer params. |
tests/ | Classifier tests → orchestrator tests. Validate tool selection parity with current classification quality. |
| File | Purpose |
|---|---|
gateway/orchestrator.py | Orchestrator node + FIFO queue + tool execution functions |
Unchanged
Section titled “Unchanged”| File | Why |
|---|---|
tools/scout.py | ForethoughtClient — same interface, called from _execute_support() |
agent/agent.py | Shopping agent subgraph — invoked from _execute_shopping() |
agent/streaming.py | Same event types (TextEvent, ReasoningEvent, etc.) |
history/middleware.py | Wraps graph stream — unchanged |
api/main.py | Calls create_gateway_graph(), streams result — unchanged |
Risks and Mitigations
Section titled “Risks and Mitigations”| Risk | Mitigation |
|---|---|
| gpt-5.4-nano tool calling quality insufficient | Validate with PR #245’s 38 integration test scenarios adapted for tool selection. Fall back to structured output + code dispatch (Approach C) if quality is unreliable. |
| Orchestrator LLM reformulates queries | Two-layer protection: prompt instruction + code-level verbatim override. |
| FIFO queue produces jarring UX for mixed intent | Simulated streaming delay (15-20ms between chunks) for buffered responses. |
| Shopping agent token streaming through orchestrator writer | _execute_shopping() runs the subgraph with astream() and forwards tokens via the FIFO queue callback. Needs validation that dual-mode streaming works correctly in this forwarding pattern. |
| Added latency vs current classifier | Orchestrator replaces classifier (not adds to it). Net cost: tool-calling overhead vs structured output — expected ~50-150ms additional. Offset by preamble UX improvement. |
Validation Plan
Section titled “Validation Plan”- Tool selection parity: Port PR #245’s 38 integration tests to validate orchestrator routes correctly (same scenarios, tool calls instead of structured output).
- Streaming correctness: Verify preamble + support streaming + shopping streaming produce correct SSE events end-to-end.
- FIFO queue: Test mixed-intent scenarios with both orderings (support first, shopping first).
- Verbatim protection: Test that queries reach Forethought and shopping agent unmodified.
- conversation_id continuity: Multi-turn support conversations maintain Forethought context.
- Context isolation (same turn): Verify shopping agent never sees current-turn support content and vice versa in mixed intent.
- Cross-turn context (intent switch): Test that
prior_contextis populated correctly when intent switches between turns (shopping→support, support→shopping). Verify Forethought receives enriched context and shopping agent receives full cross-turn history. - Fastpath: Validate greetings/chitchat are handled directly by orchestrator without invoking sub-agents. Verify safety — anything beyond trivial greetings routes to
ask_shopping. - Eval parity: Run Opik eval suite against orchestrator path, compare response_quality and policy scores against current gateway baseline.