Sub-Agents-as-Tools Architecture Design

Last Updated: 2026-04-20

Jira: PLT-616 — Spike: Evaluate sub-agents-as-tools architecture
Branch: feature/sub-agents-as-tools
Prior work: PLT-616 spike notebook (PR #246), intent-classification-research.md
OpenAI guidance: Section 5b — “one orchestrator agent calling agent as tools > multi agent handoff for simplicity”
Depends on: PR #245 (gpt-5.4-nano swap) for model validation, but built independently on main

Context and Motivation

Building on PR #246’s Findings

PR #246’s multi-agent orchestration notebook measured four patterns: monolith, supervisor (langgraph_supervisor.create_supervisor), swarm, and A2A. Its conclusion — keep the monolithic shopping subgraph + static-routing gateway — is well-supported for the patterns it evaluated.

The notebook focused on multi-agent handoff patterns (supervisor with full delegation, swarm with peer handoffs). OpenAI’s Fetch guidance document (Section 5b) recommends a different pattern:

“one orchestrator agent calling agent as tools > multi agent handoff for simplicity”

This agents-as-tools pattern (one orchestrator calling sub-agents as function calls) wasn’t covered in PR #246. This spike builds on those findings by evaluating the pattern OpenAI specifically recommended.

Phase 1 → Phase 3 History

The intent-classification-research.md documents the evolution:

Phase 1 (tool-based): Scout as a BaseTool — single conversational agent calls scout_answer like any other tool. Parallel tool calls for mixed intent. Zero added TTFB. Worked well (validated in Appendix B of that doc), but BaseTool._arun() returns a single str — no streaming. Combined with Forethought’s ~12s latency, this meant 12 seconds of dead silence for support queries.
Phase 3 (gateway graph, current): Custom LangGraph StateGraph with intent classifier (gpt-4.1-mini structured output) routing to support_handler (Forethought streaming via StreamWriter) or shopping agent subgraph. Solves the streaming problem but uses static conditional routing — closer to the handoff pattern OpenAI says not to use.

Goal

Combine the best of both: the agents-as-tools simplicity OpenAI recommends with the streaming capability Phase 3 provides. Upgrade the gateway classifier into an orchestrator that calls sub-agents as tool functions, with streaming through tool execution via LangGraph’s StreamWriter.

Architecture

Current (Static Routing)

START → classifier (gpt-4.1-mini, structured output)
         → [static conditional edges]
              → support_handler → END
              → rewrite_query → shopping_agent → END
              → support_handler → rewrite_query → shopping_agent → END

New (Orchestrator with Tool Calling)

START → orchestrator_node (gpt-5.4-nano, tool calling)
         → LLM emits preamble text (streams to user immediately)
         → LLM returns tool call(s)
         → post-processing guards validate tool calls
         → execute tool(s):
              ask_support() — calls Forethought, streams via StreamWriter
              ask_shopping() — runs shopping agent, streams via StreamWriter
         → END

What Changes

gateway_node (classifier + structured output) → orchestrator_node (tool calling)
Static conditional edges (_route_by_intent, _after_support) → removed; LLM tool selection is the routing
_rewrite_shopping_query → absorbed into _execute_shopping() tool function
support_handler_node → core logic wrapped as _execute_support() callable inside orchestrator

What Stays the Same

Forethought client, conversation_id continuity, chunk streaming — all internal to _execute_support()
Shopping agent subgraph (create_agent with MCP tools) — invoked inside _execute_shopping()
Verbatim query protection — prompt-level + code-level override in tool functions
SSE event types, HistoryMiddleware — unchanged
api/main.py — calls create_gateway_graph(), streams result

Orchestrator Node Design

Execution Flow

The orchestrator node is a single async function — a plain LangGraph node with writer: StreamWriter, not a ReAct agent.

async def orchestrator_node(state: dict, writer: StreamWriter, *, ...):
    # 1. Build messages with system prompt
    # 2. Call gpt-5.4-nano with tool schemas
    # 3. Stream preamble text tokens → writer(TextEvent)
    # 4. Intercept tool calls, apply post-processing guards
    # 5. Execute validated tool calls (FIFO queue for mixed intent)
    # 6. Return final state (AIMessages for history)

Tool Schemas

ask_support = {
    "name": "ask_support",
    "description": "Answer customer support questions about Fetch.",
    "parameters": {
        "query": "The user's exact support question, verbatim.",
        "support_category": "Topic category (missing_points, ereceipts, "
                            "rejected_receipt, rewards, fetch_shop, fetch_play, "
                            "fetch_card, point_pass, referrals, profile_help, "
                            "goodrx, social). Optional.",
        "support_summary": "One-sentence summary of the support issue using "
                           "full conversation context. Under 300 chars.",
        "prior_context": "Brief summary of relevant prior conversation context "
                         "when the user switches intent (e.g., from shopping to "
                         "support). Optional."
    }
}

ask_shopping = {
    "name": "ask_shopping",
    "description": "Handle shopping queries — product search, deals, "
                   "recommendations, price comparisons, purchase history.",
    "parameters": {
        "query": "The user's exact shopping question, verbatim.",
        "prior_context": "Brief summary of relevant prior conversation context "
                         "when the user switches intent (e.g., from support to "
                         "shopping). Optional."
    }
}

Orchestrator System Prompt

XML format (matching PR #245’s style for gpt-5.4). Carries over all classification intelligence from the current classifier prompt:

Support keywords list → guides tool selection
Intent categories → mapped to tool descriptions
Image classification rules (receipt → support, product → shopping)
Greeting/check-in rules (→ Fastpath direct response, or ask_shopping if beyond simple greetings)
No-carryforward rule (only classify latest message)
No-reformulation rule (pass user’s exact words)
Support category taxonomy
Preamble instruction: “Before calling tools, emit a brief sentence acknowledging what you’re about to do”

Model

gpt-5.4-nano with reasoning_effort: none + intent_count failsafe

Model Selection Decision

Benchmarking compared gpt-5.4-nano and gpt-4.1-mini for orchestrator tool calling:

Model	Mixed intent (30 queries, 3 batches)	Avg latency	Notes
gpt-5.4-nano (`none`)	30/30 (100%)	~1,450ms	Rare stochastic failures observed in early testing (~1 in 15)
gpt-5.4-nano (`low`)	N/A	N/A	Requires Responses API for tool calling — not compatible with current LangChain path
gpt-4.1-mini	30/30 (100%)	~3,230ms	Reliable but 2.2x slower

gpt-5.4-nano is 2.2x faster and nearly as reliable. The rare stochastic failure (dropping the second tool call for mixed intent) is mitigated by the intent_count failsafe — a code-level safety net that scales to N sub-agents.

Intent Count Failsafe

Every tool call includes a required intent_count parameter — the LLM reports how many distinct intents it detected. After the LLM returns tool calls, the orchestrator compares:

intent_count (from tool args) vs len(tool_calls) (actual tool calls made)
If intent_count > len(tool_calls): the LLM dropped a tool call → retry with a stronger prompt that explicitly demands the missing tool call(s)
The retry merges new tool calls with existing ones (no duplicates)

This is a general mechanism that scales to N sub-agents — it compares numbers, not hard-coded intent types. If we add 5 more sub-agents and the LLM reports 3 intents but only calls 2 tools, the failsafe catches it and retries.

intent_count = tool_calls[0].get("args", {}).get("intent_count", len(tool_calls))
if intent_count > len(tool_calls):
    # Re-invoke orchestrator with explicit instruction to call missing tools
    retry_response = await orchestrator_model.ainvoke(retry_messages)
    # Merge new tool calls with existing ones

Final decision: Use gpt-4.1-mini. Stage testing confirmed gpt-5.4-nano stochastically drops the second tool call for mixed-intent queries (~1 in 15), even with the intent_count failsafe — the LLM reports intent_count=1 (misclassifies as single intent), so the failsafe cannot detect the miss. gpt-4.1-mini is 2.2x slower but 100% reliable on mixed intent. The latency cost is offset by preamble streaming.

Streaming Architecture

Three Scenarios

Scenario	Behavior
Single intent (support)	`ask_support` streams live via writer
Single intent (shopping)	`ask_shopping` streams live via writer
Mixed intent	FIFO queue — both fire concurrently, first to produce output streams live, other buffers and flushes with simulated streaming

FIFO Queue for Mixed Intent

Both sub-agent calls fire concurrently via asyncio.gather. First to produce output claims the “live” streaming slot (protected by asyncio.Lock). The other buffers its output. When the live stream finishes, the buffered response flushes with a small artificial delay between chunks to simulate streaming (configurable, e.g., 15-20ms).

async def _execute_tools_concurrent(tool_calls, state, writer, ...):
    lock = asyncio.Lock()
    winner = None  # first to produce output
    buffers = {}   # name → list of chunks

    async def run_tool(name, func):
        nonlocal winner
        async for chunk in func(state, ...):
            async with lock:
                # Lock covers the entire check-and-emit path to prevent
                # race where two coroutines both claim the live slot.
                if winner is None:
                    winner = name
                if name == winner:
                    writer(TextEvent(content=chunk))  # live stream
                else:
                    buffers.setdefault(name, []).append(chunk)

    await asyncio.gather(
        run_tool("ask_support", _execute_support),
        run_tool("ask_shopping", _execute_shopping),
    )

    # Flush buffered responses with simulated streaming delay
    for name, chunks in buffers.items():
        for chunk in chunks:
            await asyncio.sleep(0.015)  # 15ms simulated delay
            writer(TextEvent(content=chunk))

Sub-Agent Streaming Internals

_execute_support: Calls ForethoughtClient.ask_stream(), yields chunks (Forethought streams chunk-by-chunk, not token-by-token). Each chunk is a widget component fragment — typically a sentence or paragraph.
_execute_shopping: This is the most complex part of the migration. Today, stream_adapter.py is a 350-line adapter that depends on subgraphs=True and named node namespaces (gateway, support_handler, shopping_agent) to classify streaming events (reasoning vs text, tool call tracking, response_id extraction). Collapsing the graph to a single orchestrator node means the adapter can no longer distinguish event sources by namespace.

Solution: _execute_shopping runs the shopping agent subgraph as its own inner astream(subgraphs=True) call and internally processes the three stream modes — replicating the relevant parts of stream_adapter’s shopping-path logic. subgraphs=True is required to get real-time token streaming from the shopping agent’s inner model/tools nodes; without it, LangGraph batches inner tokens until the subgraph completes:
1. messages mode: Extract LLM token chunks. Classify as reasoning (before tools) or text (after tools) using the has_called_tools flag. Extract reasoning from additional_kwargs. Yield TextEvent or ReasoningEvent.
2. updates mode: Track tool call start/end events from inner model and tools nodes. Yield ToolCallStartEvent, ToolResultEvent, ToolCallEndEvent. Track response_id from message metadata.
3. custom mode: Pass through any custom events from inner nodes.
This is essentially extracting the shopping-specific logic from stream_adapter.py into a self-contained async generator. The stream_adapter itself simplifies — it only needs to handle the orchestrator’s top-level custom events (preamble text, support chunks, shopping chunks all arrive as StreamEvent objects via writer()).

Revised Stream Adapter

With the orchestrator emitting all events via writer(), the stream adapter simplifies significantly:

async for namespace, mode, chunk in graph.astream(
    graph_input,
    stream_mode=["custom"],  # only custom mode needed
    config=config,
):
    if mode == "custom" and isinstance(chunk, StreamEvent):
        yield chunk

The complex namespace/mode dispatch logic moves into _execute_shopping() where it processes the inner shopping subgraph’s stream. The adapter becomes a thin passthrough for typed StreamEvent objects.

Events emitted by the orchestrator via writer():

ThinkingEvent — TTFB optimization (emitted immediately)
ResponseIdEvent — synthetic ft_* for support, real resp_* from shopping
TextEvent — preamble text, support chunks, shopping text
ReasoningEvent — shopping agent reasoning tokens
ToolCallStartEvent / ToolCallEndEvent / ToolResultEvent — shopping agent tool activity
SupportContentEvent — separator between support and shopping in mixed intent
UsageEvent — token consumption (from shopping agent, zero for support-only)
CompletedEvent — stream finished

Preamble

The orchestrator LLM emits text tokens before returning tool calls. These stream to the user immediately via writer(TextEvent) — no dead silence. Example:

User: "my receipt didn't scan and find me coffee deals"
Assistant: "Let me look into your receipt issue and find some deals for you."
[support response streams]
[SupportContentEvent emitted — internal, not visible to user]
[shopping response streams]

Mixed Intent History Separation

For mixed intent, the orchestrator emits a SupportContentEvent between the support and shopping responses. This is an internal event, not visible to the user — it is consumed by HistoryMiddleware to store the support response as a separate assistant message in DynamoDB, keeping support and shopping history cleanly separated. The SSE endpoint filters it out before sending to the client.

The user sees one continuous response: support text streams, then shopping text streams, with no visible separator. Same behavior as today, but emitted explicitly by the orchestrator node rather than detected by the stream adapter from node completion updates.

Fastpath (Greetings, Chitchat, Simple Follow-Ups)

When the orchestrator LLM responds with text only and no tool calls, this is the Fastpath — the orchestrator handles the response directly without invoking any sub-agent. Examples: greetings (“hi”), chitchat (“what’s up”), acknowledgments (“thanks”), simple follow-ups (“ok”).

Currently these go to the shopping agent (gpt-5-mini with full conversational prompt and MCP tools) — heavyweight for a simple reply. The Fastpath lets gpt-5.4-nano respond directly, saving a full shopping agent LLM call and delivering a faster response.

The orchestrator prompt includes lightweight persona instructions (tone, name, brief greeting style) so Fastpath responses match the assistant’s voice without needing the full conversational prompt.

Safety constraint: The orchestrator has no safety guardrails (no refusal instructions, no content policy, no sensitive topic handling). Fastpath must be limited to a strict allowlist of trivially safe message types:

Greetings and hellos
Acknowledgments (“ok”, “thanks”, “got it”)
Simple chitchat (“what’s up”, “how are you”)

Anything beyond this allowlist must route to ask_shopping. If the user asks a question, makes a request, or says anything that could require judgment about content safety, the orchestrator must call ask_shopping and let the shopping agent handle it — the shopping agent’s full conversational prompt has built-in guardrails for safety, refusals, sensitive topics, and policy compliance.

The orchestrator prompt enforces this: “Only respond directly for simple greetings, acknowledgments, and chitchat. For ALL other messages — including general questions, opinions, advice, or anything you’re unsure about — call ask_shopping.”

Verbatim Query Protection

Two layers, same as current architecture:

Layer 1: Prompt-Level

Orchestrator system prompt instructs: “Pass the user’s exact words as the query argument — do not reformulate, rephrase, expand, or add context.”

Layer 2: Code-Level Override

Applied inside each tool function before execution:

Single intent (orchestrator called one tool): always use the user’s exact text from state, ignoring LLM’s query arg.
Mixed intent, short message (≤4 words): also use exact text.
Mixed intent, long message: use LLM’s split but validate keyword overlap against user’s original text. Fall back to user’s text if no overlap.

The “single intent” signal comes from whether the orchestrator returned one tool call or two.

async def _execute_support(state, tool_args, ...):
    user_text = get_latest_user_message(state)
    is_single = len(tool_calls) == 1

    if is_single:
        effective_query = user_text
    elif len(user_text.split()) <= 4:
        effective_query = user_text
    else:
        query = tool_args.get("query", "")
        effective_query = query if has_overlap(query, user_text) else user_text

    # proceed with effective_query...

Context Isolation and Cross-Turn Context

Design Principles

Context management has two competing goals:

Within a single turn (mixed intent): Each sub-agent should only see its own portion — support content shouldn’t leak into the shopping agent’s current-turn context and vice versa.
Across turns (intent switch): The full conversation history is valuable context. A user who discussed points issues in turn 1 and asks for deals in turn 2 expects the assistant to remember the whole conversation.

General Rule (Scales to N Sub-Agents)

Each sub-agent receives:

Full cross-turn history — all prior turns from all sub-agents. This shared context makes the assistant feel coherent across intent switches.
Current turn query — only the portion relevant to this sub-agent (from the orchestrator’s tool call args).
prior_context (optional) — a brief summary of relevant prior conversation context, populated by the orchestrator when the user switches intent. Useful for sub-agents that can’t consume full message history (e.g., API-based services).

The orchestrator is the natural place to manage this — it sees the full history, knows which sub-agent it’s calling, and can adapt context format per sub-agent type:

For each sub-agent tool call:
  if sub-agent is LLM-based (accepts message history):
    pass full cross-turn history + current turn query
    prior_context is supplementary (nice-to-have, not required)
  if sub-agent is API-based (accepts only a query string):
    pass current turn query
    prior_context is essential — summarizes relevant cross-turn context

Updated Tool Schemas

Both tools include an optional prior_context parameter:

ask_support = {
    "parameters": {
        "query": "The user's exact support question, verbatim.",
        "support_category": "...",
        "support_summary": "...",
        "prior_context": "Brief summary of relevant prior conversation "
                         "context when the user switches intent. Optional."
    }
}

ask_shopping = {
    "parameters": {
        "query": "The user's exact shopping question, verbatim.",
        "prior_context": "Brief summary of relevant prior conversation "
                         "context when the user switches intent. Optional."
    }
}

How Each Sub-Agent Uses Context

Support (API-based — Forethought):

Forethought accepts a query string, not structured history
prior_context is prepended to the query or passed as a Forethought context variable
Example: User shopped for Folgers in turn 1, then says “the coffee never gave me points” in turn 2 → prior_context: "User previously searched for Folgers coffee offers" helps Forethought understand the specific product

Shopping (LLM-based — agent subgraph):

Receives full cross-turn message history (including prior support turns from other turns)
prior_context is supplementary — the LLM can read the history directly
Current-turn isolation preserved: For mixed intent, the current turn’s support content is still stripped from the shopping agent’s view (same logic as today’s _rewrite_shopping_query)

Future sub-agents: Same pattern — LLM-based sub-agents get full history, API-based sub-agents rely on prior_context from the orchestrator.

Orchestrator Context

The orchestrator LLM sees the full conversation history (all turns, all sub-agents) to make routing decisions and populate prior_context. It doesn’t generate user-facing content from this context (except Fastpath responses and preambles).

Post-Processing Guards

Three guards from the current classifier, applied between the LLM returning tool calls and execution:

Guard 1: Echo Follow-Up Detection

Short user replies that echo the previous assistant’s support question (e.g., “Fetch shop” after “Is it from a Fetch Shop purchase?”) should route to support.

Adaptation: If the LLM calls ask_shopping on a short echo reply, override to ask_support. Same _is_support_echo_followup logic.

Guard 2: Keyword Overlap Check

For mixed intent (both tools called), if one tool’s query arg has no content word overlap with the user’s message, drop that tool call.

Adaptation: Same _content_words and _has_overlap functions, applied to tool call args.

Guard 3: Verbatim Override

Handled inside tool functions (see Verbatim Query Protection section above).

Application Order

tool_calls = response.tool_calls

# Guard 1: echo follow-up — may replace ask_shopping with ask_support
tool_calls = _apply_echo_guard(tool_calls, state)

# Guard 2: overlap check — may drop one tool call from mixed
tool_calls = _apply_overlap_guard(tool_calls, user_text)

# Guard 3: verbatim override — applied inside tool functions
await _execute_tools(tool_calls, state, writer)

Conversation Continuity

Forethought conversation_id

Unchanged. _execute_support() contains the same logic as today’s support_handler_node:

Retrieve conversation_id from episode metadata in DynamoDB
Check TTL (55 min expiry)
If valid → pass to Forethought (PUT to continue)
If expired → replay last 6 support turns as context prefix
After call → store new conversation_id back to episode metadata

All managed inside _execute_support(), independent of routing pattern.

OpenAI previous_response_id

The shopping agent subgraph receives previous_response_id for OpenAI server-side context. This is bound to the model at graph creation time in factory.py — unchanged.

File Changes

Modified

File	Change
`gateway/classifier.py`	Replace → rename to `gateway/orchestrator.py`. Structured output prompt + `GatewayOutput` → tool-calling orchestrator prompt + tool schemas. Post-processing guards adapted for tool calls.
`gateway/graph.py`	Simplify — remove `_route_by_intent`, `_after_support`, `_rewrite_shopping_query`, conditional edges. Graph becomes `START → orchestrator_node → END`.
`gateway/support_handler.py`	Refactor — extract core logic into callable async generator with explicit function signature: `async def execute_support(query, support_category, support_summary, prior_context, *, forethought_client, history_store, episode_id, user_id, forethought_stream) -> AsyncGenerator[str, None]`. Yields chunks instead of writing directly to `StreamWriter`. Fields previously read from `state` (`scout_query`, `support_category`, `support_summary`) become function arguments; `prior_context` is prepended to the query for cross-turn context on intent switches. The node wrapper (`support_handler_node`) is removed; the function is called from the orchestrator.
`gateway/state.py`	Simplify — `GatewayState` drops `intent`, `scout_query`, `shopping_query`, `support_category`, `support_summary` (become local to orchestrator).
`gateway/stream_adapter.py`	Simplify significantly — becomes a thin passthrough for `StreamEvent` objects from `writer()`. The complex namespace/mode dispatch logic (350 lines) moves into `_execute_shopping()` inside the orchestrator. Only needs `stream_mode=["custom"]`.
`agent_config.yaml`	Update gateway agent config if needed for tool-calling.
`factory.py`	`create_gateway_graph()` simplified — fewer params.
`tests/`	Classifier tests → orchestrator tests. Validate tool selection parity with current classification quality.

New

File	Purpose
`gateway/orchestrator.py`	Orchestrator node + FIFO queue + tool execution functions

Unchanged

File	Why
`tools/scout.py`	`ForethoughtClient` — same interface, called from `_execute_support()`
`agent/agent.py`	Shopping agent subgraph — invoked from `_execute_shopping()`
`agent/streaming.py`	Same event types (`TextEvent`, `ReasoningEvent`, etc.)
`history/middleware.py`	Wraps graph stream — unchanged
`api/main.py`	Calls `create_gateway_graph()`, streams result — unchanged

Risks and Mitigations

Risk	Mitigation
gpt-5.4-nano tool calling quality insufficient	Validate with PR #245’s 38 integration test scenarios adapted for tool selection. Fall back to structured output + code dispatch (Approach C) if quality is unreliable.
Orchestrator LLM reformulates queries	Two-layer protection: prompt instruction + code-level verbatim override.
FIFO queue produces jarring UX for mixed intent	Simulated streaming delay (15-20ms between chunks) for buffered responses.
Shopping agent token streaming through orchestrator writer	`_execute_shopping()` runs the subgraph with `astream()` and forwards tokens via the FIFO queue callback. Needs validation that dual-mode streaming works correctly in this forwarding pattern.
Added latency vs current classifier	Orchestrator replaces classifier (not adds to it). Net cost: tool-calling overhead vs structured output — expected ~50-150ms additional. Offset by preamble UX improvement.

Validation Plan

Tool selection parity: Port PR #245’s 38 integration tests to validate orchestrator routes correctly (same scenarios, tool calls instead of structured output).
Streaming correctness: Verify preamble + support streaming + shopping streaming produce correct SSE events end-to-end.
FIFO queue: Test mixed-intent scenarios with both orderings (support first, shopping first).
Verbatim protection: Test that queries reach Forethought and shopping agent unmodified.
conversation_id continuity: Multi-turn support conversations maintain Forethought context.
Context isolation (same turn): Verify shopping agent never sees current-turn support content and vice versa in mixed intent.
Cross-turn context (intent switch): Test that prior_context is populated correctly when intent switches between turns (shopping→support, support→shopping). Verify Forethought receives enriched context and shopping agent receives full cross-turn history.
Fastpath: Validate greetings/chitchat are handled directly by orchestrator without invoking sub-agents. Verify safety — anything beyond trivial greetings routes to ask_shopping.
Eval parity: Run Opik eval suite against orchestrator path, compare response_quality and policy scores against current gateway baseline.

Sub-Agents-as-Tools Architecture Design

Sub-Agents-as-Tools Architecture Design

Last Updated: 2026-04-20

Related

Context and Motivation

Building on PR #246’s Findings

Phase 1 → Phase 3 History

Goal

Architecture

Current (Static Routing)

New (Orchestrator with Tool Calling)

What Changes

What Stays the Same

Orchestrator Node Design

Execution Flow

Tool Schemas

Orchestrator System Prompt

Model

Model Selection Decision

Intent Count Failsafe

Streaming Architecture

Three Scenarios

FIFO Queue for Mixed Intent

Sub-Agent Streaming Internals

Revised Stream Adapter

Preamble

Mixed Intent History Separation

Fastpath (Greetings, Chitchat, Simple Follow-Ups)

Verbatim Query Protection

Layer 1: Prompt-Level

Layer 2: Code-Level Override

Context Isolation and Cross-Turn Context

Design Principles

General Rule (Scales to N Sub-Agents)

Updated Tool Schemas

How Each Sub-Agent Uses Context

Orchestrator Context

Post-Processing Guards

Guard 1: Echo Follow-Up Detection

Guard 2: Keyword Overlap Check

Guard 3: Verbatim Override

Application Order

Conversation Continuity

Forethought conversation_id

OpenAI previous_response_id

File Changes

Modified

New

Unchanged

Risks and Mitigations

Validation Plan