PC2: Runtime Discovery & Sub-Agent Execution

1. Problem Statement

PC1 specifies what an agent card is — Agent Definition shape, orchestrator pattern, prompt assembly, vertical integration. PC1 stops at the surface where the orchestrator emits tool calls and sub-agents return tool results. It does not specify what happens between those two events.

That gap is PC2’s scope. When a user sends a turn:

How does the orchestrator’s LLM know which sub-agents exist and what each one does?
What happens between “tool call emitted” and “tool result returned”?
When a user expresses multiple intents in one turn, how does the system handle that?
When a sub-agent reaches CCS for data, what contract governs that call?
Who is “this user” at each hop?
When something fails, what does the user see?
When is routing considered broken and what’s the lever to fix it?

Without PC2, three things break:

Downstream specs invent contracts. PC3 (streaming), PF2 (multi-agent coordination), PD1 (scheduled execution) all assume a dispatch loop exists. Without PC2’s specification, each downstream spec defines its own model of dispatch, and the platform fragments.
CCS integration is ambiguous. The PS6 spec (consumer-context-service#77) commits CCS to returning EnricherResponse[T] envelopes synchronously and explicitly places “multi-enricher orchestration” out of scope, because consumer-agent owns chat orchestration. PC2 is what consumer-agent commits to in return.
Vertical onboarding stalls at runtime questions. PC1 lets a vertical engineer author an Agent Definition. PC2 lets them predict how it will execute, fail, and be observed.

PC2 is the orchestrator’s dispatch contract. Everything downstream cites this contract instead of re-deriving it.

Companion: Miro design board — supplementary architecture diagrams. The spec is the source of truth.

2. Capabilities Source

Per the Platform Spec Lab, PC2 owns the Runtime Discovery & Sub-Agent Execution capability for the AI Assistant Platform’s consumer-agent runtime. PC2 specifies what happens after the orchestrator emits tool calls (PC1 §5.3) and before sub-agent tool results return — the dispatch contract every downstream spec cites.

3. Background & Context

3.1 Today’s reality

Today, consumer-agent uses one production dispatch path for chat turns: the sub-agents-as-tools dispatch. The orchestrator’s LLM emits tool calls for ask_shopping (Shop sub-agent) and ask_support (Scout sub-agent, which wraps Forethought as its upstream response-generation service). Each is a registered sub-agent on the orchestrator’s sub_agents list (PC1 §5.9). The legacy classifier path is deprecated; PC2 codifies the sub-agents-as-tools dispatch as the canonical surface.

The orchestrator-as-tool-caller pattern (PC1 §5.3) is implemented and live for the shopping and support verticals. Mixed-intent fan-out has been validated at N=2 (PC1 §11.2). N=3 is untested because a third sub-agent has not been registered.

CCS integration today is per-tool wiring: each sub-agent’s tool calls one or more CCS endpoints. The EnricherResponse[T] envelope shape from PS6 is being adopted iteratively as enrichers ship; layered storage and principal verification are not yet uniformly enforced across all sub-agent tool paths.

Status events exist informally: sub-agents may write progress messages to the LangGraph stream, but a typed event vocabulary and orchestrator-mediated rendering are not yet implemented. The reference implementation informs PC2’s commitments, but PC2 codifies what must hold post-implementation, not what holds today.

3.2 What PC1 leaves to PC2

PC2 inherits from PC1:

Agent Definition shape (PC1 §5.2) — sub_agents list is the dispatch-time enumeration surface.
Orchestrator pattern (PC1 §5.3) — sub-agents materialized as tools at factory time; orchestrator LLM owns routing via tool calls.
Vertical integration model (PC1 §5.9) — verticals contribute Agent Definitions; runtime is platform-owned.

PC2 owns content that PC1 references by pointer:

Mixed-intent fan-out — concurrency, cap, partial-failure handling, suppression policy. Lives in PC2 §5.4; PC1 §5.4 carries the dispatch primitive declaration and points here.

PC2 defers to downstream specs:

Status-event vocabulary and transport mechanism → PC3
Envelope schema → PS6
Tool authoring workflow → PS2
Memory model and per-sub-agent retrieval schema → consumer-agent memory architecture
Lifecycle of sub-agent definitions (promotion, rollback) → PF1

3.3 Vocabulary

Term	Meaning
Dispatch	The runtime act of resolving an orchestrator tool call into a sub-agent invocation, executing the sub-agent, and returning a tool result to the orchestrator.
Fan-out	Concurrent dispatch of multiple sub-agents in response to a single user turn with multiple intents.
Intent count	The number of distinct sub-agent tool calls the orchestrator’s LLM emits in a single LLM response.
Sub-agent tool	The materialized representation of a registered sub-agent as a tool callable by the orchestrator’s LLM. Named `ask_<sub_agent_id>`.
Inherited context	The bundle the runtime passes to a sub-agent at instantiation: principal, episode reference, locale, location, prior-context summary. Schema lives in the consumer-agent memory architecture.
Envelope	`EnricherResponse[T]` returned synchronously by CCS to sub-agent tool calls. Defined in PS6 §5.2.
Principal	The on-behalf-of user (or service) a dispatch and CCS call operates for. Propagated unchanged across the dispatch chain; verified at every downstream consumer.
Routing accuracy	The rate at which the orchestrator’s LLM dispatches to the correct sub-agent(s) for a given user turn. Measured via eval.
Status event	A typed identifier emitted by a sub-agent during longer operations, rendered by the orchestrator into user-visible progress strings. Vocabulary owned by PC3.

4. Requirements

4.1 Functional requirements

FR-1 — Factory-time discovery. The orchestrator’s set of callable sub-agents MUST be bound at factory invocation time from the orchestrator’s Agent Definition sub_agents list. The runtime MUST NOT perform runtime sub-agent discovery (registry probe, DB lookup, MCP enumeration) after the orchestrator’s LLM has been called for a given turn.

FR-2 — Tool name derivation. Each registered sub-agent MUST be materialized as a tool callable by the orchestrator’s LLM with name ask_<sub_agent_id> and description equal to the sub-agent’s Agent Definition description field.

FR-3 — Single-intent dispatch loop. For a turn that resolves to a single sub-agent tool call, the runtime MUST execute the six-step loop in §5.3 (tool call emission → Definition resolution → instantiation with inherited context → sub-agent execution → tool result return → orchestrator composition).

FR-4 — Orchestrator-mediated composition. Every user-facing response MUST pass through orchestrator composition. Sub-agent results MUST NOT stream directly to the user, regardless of intent count. The orchestrator MAY pass a sub-agent’s response through verbatim, but the routing of the response through orchestrator composition is unconditional.

FR-5 — Inherited session context at sub-agent boundary. When the runtime instantiates a sub-agent, the sub-agent MUST receive: user identity (principal), episode reference, locale, location, and orchestrator-curated prior_context summary when cross-vertical context exists. The exact schema lives in the consumer-agent memory architecture; PC2 commits only that this set is non-empty.

FR-6 — Concurrent fan-out for mixed intent. When the orchestrator’s LLM emits multiple tool calls in a single LLM response, the runtime MUST execute them concurrently via asyncio.gather. Sub-agents MUST NOT block on each other.

FR-7 — Fan-out cap. Concurrent sub-agent invocations per turn MUST be capped at small N (initial value tuned via experiment). When the orchestrator’s LLM emits more tool calls than the cap, the orchestrator’s system prompt MUST instruct it to select top-N by relevance; the runtime MUST NOT execute over-cap tool calls.

FR-8 — Partial-failure tolerance during fan-out. When a sub-agent fails within a fan-out, other sub-agents MUST continue. The orchestrator’s final response MUST acknowledge the failure in natural language. Raw error events MUST NOT be exposed to the user.

FR-9 — Sync CCS interaction. Sub-agent tools backed by CCS MUST receive EnricherResponse[T] synchronously. The contract MUST NOT introduce a streaming seam between sub-agent and CCS.

FR-10 — Status-based suppression. Sub-agent tools MUST use envelope.status as the single decision boundary for downstream handling: ok proceeds, partial proceeds with optional caveat, error treats the call as failed. Sub-agents MUST NOT re-inspect envelope.partial[] for criticality.

FR-11 — Principal chain of custody. Principal MUST be propagated unchanged across the dispatch chain (session → orchestrator → sub-agent → CCS). Sub-agent tool wrappers MUST populate EnricherRequest.Principal on every CCS call.

FR-12 — Principal verification at sub-agent boundary. Sub-agent tool wrappers MUST verify envelope.principal matches the invoking user before passing payload to the sub-agent’s LLM context. Mismatch MUST be treated as a security incident (suppress + alert + emit ps6.envelope.principal_mismatch_total{consumer="consumer-agent"}). Retry MUST NOT be attempted.

FR-13 — Typed status events. Sub-agents MUST emit status events as typed identifiers, not prose. The orchestrator MUST own rendering policy (forward / transform / suppress / batch) per event. The typed event vocabulary and transport mechanism are PC3’s contract.

FR-14 — Natural-language failure surfacing. Across all failure axes (sub-agent invocation failure, CCS status="error", partial fan-out failure), the orchestrator’s final response MUST acknowledge the failure in natural language. Raw exceptions, error codes, and structured error payloads MUST NOT reach the user.

FR-15 — Routing observability. Each turn MUST emit a routing-decision trace event capturing: which sub-agents were invoked, intent count, fan-out cap behavior (within/at/over cap), per-sub-agent outcome (success/failure/status="error"). Trace shape is platform-standard OpenTelemetry; observability metrics are §10.2.
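
The routing-decision event in FR-15 can be pictured as a small per-turn record. The sketch below is illustrative only: the field names (`sub_agents_invoked`, `cap_behavior`, `outcomes`) and the `status_error` label are assumptions, and the real trace shape is whatever the platform's OpenTelemetry conventions define.

```python
from dataclasses import dataclass, asdict
from typing import Literal

CapBehavior = Literal["within", "at", "over"]
Outcome = Literal["success", "failure", "status_error"]


@dataclass(frozen=True)
class RoutingDecision:
    """Hypothetical per-turn routing-decision record (field names are assumptions)."""
    sub_agents_invoked: list[str]
    intent_count: int              # tool calls emitted in one LLM response
    cap_behavior: CapBehavior
    outcomes: dict[str, Outcome]   # per-sub-agent outcome


def cap_behavior(intent_count: int, cap: int) -> CapBehavior:
    """Classify the emitted tool-call count against the fan-out cap."""
    if intent_count < cap:
        return "within"
    return "at" if intent_count == cap else "over"


def build_routing_decision(emitted: list[str], cap: int,
                           outcomes: dict[str, Outcome]) -> dict:
    """Build a plain dict that could be attached to a trace span as attributes."""
    decision = RoutingDecision(
        sub_agents_invoked=sorted(outcomes),
        intent_count=len(emitted),
        cap_behavior=cap_behavior(len(emitted), cap),
        outcomes=outcomes,
    )
    return asdict(decision)
```

Because `intent_count` counts what the LLM emitted while `outcomes` only covers what actually ran, an over-cap turn is visible in the record without extra aggregation.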

4.2 Non-functional requirements

NFR-1 — Dispatch overhead latency. The runtime overhead for the dispatch loop (steps 2–5 of §5.3, excluding sub-agent LLM call and CCS call) MUST be bounded such that it does not dominate turn latency. Target: dispatch overhead p95 below the per-tool latency baseline of the slowest CCS-backed tool. Baseline source: PLT-609 Phase C production latency report.

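NFR-2 — Fan-out total latency. Total turn latency for a fan-out of N sub-agents MUST be bounded by max(sub_agent_latency_i) + orchestrator_composition_latency, not sum(sub_agent_latency_i). For example, two sub-agents taking 1 s and 3 s with 0.5 s of composition should yield roughly 3.5 s, not 4.5 s. asyncio.gather provides the concurrency that makes this bound achievable at the implementation layer (FR-6); NFR-2 is the observable contract.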

NFR-3 — Routing accuracy. Per-vertical routing accuracy MUST meet eval thresholds before the vertical’s promotion (PF1). Initial threshold target: 95% on intent-switch eval pairs covering registered verticals. Baseline source: PC1 §11.2 empirical validation (100% across 4 test cases on a small dataset; production scale requires re-measurement post-merge).

NFR-4 — Principal mismatch is zero-tolerance. ps6.envelope.principal_mismatch_total MUST be zero in production at steady state. Any non-zero value MUST trigger immediate alert (alerting threshold: >0 events in any 5-minute window). This is a security boundary, not a quality metric.

NFR-5 — Status event cadence. Sub-agent status events SHOULD be emitted at a cadence that gives users perceptible feedback during operations longer than a soft threshold. Target: a 2-second threshold, meaning that when a sub-agent operation exceeds 2 seconds, at least one event is visible within 2 seconds of sub-agent invocation start. This is implementation tuning, not a hard SLO.

NFR-6 — Eval coverage on intent switches. Routing eval coverage MUST include intent-switch pairs across every pairwise combination of registered verticals at the time of PC2 implementation. Coverage scales with vertical count; cost stays bounded because, although pair count grows quadratically, vertical count is capped at ≤10 at expected scale (at most 45 unordered pairs).

NFR-7 — Fan-out reliability at N greater than 2 is unmeasured at PC2 commit time. Today’s orchestrator has only two sub-agent tools (ask_shopping, ask_support). N=3 reliability cannot be tested until a third sub-agent is registered. NFR-6 measurement at N=3 is a follow-up; NFR-7 is the documented gap.
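
The pairwise coverage required by NFR-6 can be enumerated mechanically. This is a minimal sketch, assuming vertical ids are plain strings; whether switch direction matters (A to B versus B to A) is not stated in the spec, so the `ordered` flag is an assumption.

```python
from itertools import combinations, permutations
from math import comb


def intent_switch_pairs(verticals: list[str],
                        ordered: bool = False) -> list[tuple[str, str]]:
    """Enumerate the vertical pairs an intent-switch eval suite would cover.

    ordered=True treats A->B and B->A as distinct switches (an assumption;
    the spec does not say whether direction matters).
    """
    pairs = permutations(verticals, 2) if ordered else combinations(verticals, 2)
    return list(pairs)


verticals = ["shop", "rewards", "support"]   # example ids from AC-1
print(intent_switch_pairs(verticals))
print(comb(10, 2))  # 45 unordered pairs at the 10-vertical ceiling
```

At the expected ceiling of 10 verticals the suite needs 45 unordered pairs, or 90 if direction is significant, which keeps eval cost manageable.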

4.3 Acceptance criteria

AC-1 — Given an orchestrator Agent Definition with sub_agents: [shop, rewards, support], when the orchestrator’s LLM is called, it sees exactly three sub-agent tools named ask_shop, ask_rewards, ask_support, with descriptions matching each sub-agent’s Definition description field.

AC-2 — Given a user turn resolving to a single sub-agent, the dispatch trace shows: one tool call emitted, one sub-agent instantiated, one tool result returned, one orchestrator composition. The user-facing response originates from the orchestrator’s LLM, not the sub-agent’s LLM directly.

AC-3 — Given a user turn resolving to two sub-agent intents, both sub-agent tool calls execute concurrently (overlapping invocation timestamps in trace) and the final response composes from both results, even if one sub-agent’s latency is significantly greater than the other.

AC-4 — Given the orchestrator’s LLM emits N+1 tool calls where N is the configured cap, only N sub-agents are invoked. The remaining tool call is dropped before runtime execution, with a trace event noting the over-cap behavior.

AC-5 — Given a sub-agent’s CCS call returns envelope.status="error", the sub-agent’s tool result indicates data unavailable. The orchestrator’s final response acknowledges the unavailability in natural language. The user does not see error codes, stack traces, or partial[] contents.

AC-6 — Given a CCS envelope where envelope.principal != invoking_user, the sub-agent’s tool wrapper suppresses the payload, emits ps6.envelope.principal_mismatch_total{consumer="consumer-agent"}, and returns a tool result indicating data unavailable. The mismatch surfaces to alerting; the user receives a natural-language failure acknowledgment.

AC-7 — Given an orchestrator dispatching to a sub-agent, the sub-agent’s EnricherRequest.Principal on every CCS call equals the orchestrator’s inherited principal. CCS responses with 403 principal_mismatch are zero in production.

AC-8 — Given a fan-out of N sub-agents where M succeed and (N-M) fail, the orchestrator’s final response presents the M successes and acknowledges the (N-M) failures in natural language. Trace events show per-sub-agent outcomes.

AC-9 — Given a sub-agent operation expected to exceed the status-event soft threshold, at least one typed status event is emitted from the sub-agent and renders as a user-visible progress string in the streaming output (mechanism is PC3’s; AC-9 observes the contract surface).

AC-10 — Routing-decision trace events are present on every turn and contain: sub-agents invoked (list), intent count, cap behavior, per-sub-agent outcome. Trace events feed §10.2 observability metrics without additional client-side aggregation.

5. Solution Design

5.1 The architectural through-line

Discovery is factory-time; execution is per-turn; the orchestrator’s LLM is the runtime decision-maker.

Three properties hold across every PC2 contract:

No runtime discovery. The set of sub-agents available to the orchestrator is fixed at factory time from the orchestrator’s Agent Definition (see PC1 §5.2). No probe, no DB lookup, no MCP-style enumeration at request time. A sub-agent is either bound at factory time or not callable this turn.
The orchestrator’s LLM owns routing. Routing decisions are emitted as tool calls by the orchestrator’s LLM operating on Agent Definition description fields. PC2 does not introduce a classifier, a router service, or any deterministic routing layer. The LLM is the routing primitive.
The dispatch loop is uniform. Single-intent and mixed-intent turns use the same primitive — one or more tool calls, each resolved to a sub-agent invocation, each returning a tool result, all composed by the orchestrator. Mixed-intent is gather over the same loop, not a separate code path.

These three properties together produce a system where adding a vertical means adding an Agent Definition to the sub_agents list — no router config, no dispatch table, no execution-graph change.

5.2 Discovery surface

A sub-agent is discoverable to the orchestrator when its Agent Definition is bound into the orchestrator’s sub_agents list at factory time. The runtime materializes each entry as a tool callable by the orchestrator’s LLM (PC1 §5.3). The orchestrator’s tool list, as presented to the LLM, is:

Platform-level tools the orchestrator calls directly (e.g., llm_feedback)
One tool per registered sub-agent, with name derived from the sub-agent’s id and description from the sub-agent’s description field

The description field is load-bearing. It is the only signal the orchestrator’s LLM has about when to dispatch to a given sub-agent. Description quality directly drives routing accuracy. A vertical that ships a vague description ships a routing problem.

No runtime introspection. Once the LLM has been called for a turn, the orchestrator does not query a registry, page in tool sets, or conditionally add feature-flag-gated sub-agents. Feature-flag gating, if needed, is resolved at factory invocation, which runs per-request but produces a fully-bound orchestrator before the LLM is called.

Cap on sub-agent count. At expected scale (≤10 verticals), all sub-agents fit comfortably in the orchestrator’s tool list with their descriptions. Beyond that scale, escalation patterns exist (lazy-loaded descriptions, RAG-over-verticals, hierarchical routing) and are deferred to follow-on specs.

Tool name derivation. Sub-agent tool names follow a stable convention: ask_<sub_agent_id> (e.g., ask_shop, ask_rewards). This is observable in eval traces and production logs.
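
Factory-time binding and name derivation can be sketched as follows. `AgentDefinition` here is a hypothetical, trimmed-down stand-in for the PC1 Agent Definition (only `id`, `description`, and `sub_agents`), and `SubAgentTool` and `materialize_sub_agent_tools` are illustrative names, not the runtime's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDefinition:
    """Hypothetical, trimmed-down stand-in for the PC1 Agent Definition."""
    id: str
    description: str
    sub_agents: tuple["AgentDefinition", ...] = ()


@dataclass(frozen=True)
class SubAgentTool:
    name: str                    # ask_<sub_agent_id>
    description: str             # copied verbatim from the Definition
    definition: AgentDefinition  # looked up again at dispatch time


def materialize_sub_agent_tools(orchestrator: AgentDefinition) -> dict[str, SubAgentTool]:
    """Bind the tool set once, at factory time; nothing is discovered later."""
    tools: dict[str, SubAgentTool] = {}
    for sub in orchestrator.sub_agents:
        name = f"ask_{sub.id}"
        if name in tools:
            raise ValueError(f"duplicate sub-agent id: {sub.id}")
        tools[name] = SubAgentTool(name, sub.description, sub)
    return tools
```

The returned mapping is the whole discovery surface for the turn: a sub-agent that is absent from it cannot be called, which is the property FR-1 relies on.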

5.3 Single-intent dispatch

The canonical loop for a turn that resolves to a single sub-agent:

Tool call emission. Orchestrator LLM, conditioned on the user turn and its prompt blocks, emits a tool call naming exactly one sub-agent tool.
Definition resolution. Runtime looks up the named tool, retrieves the bound sub-agent’s Agent Definition.
Sub-agent instantiation. The shared Agent runtime class instantiates the sub-agent with inherited session context: user identity, episode reference, locale, location, and the orchestrator’s prior_context summary if cross-vertical context exists. The precise schema lives in the consumer-agent memory architecture; PC2 commits only that some inherited context exists at this boundary.
Sub-agent execution. The sub-agent runs its own LLM loop with its bound tools (CCS endpoints via the connector framework per PS2, and any direct tools per PC1 §5.8). The sub-agent may emit structured status events during this loop (§5.7).
Tool result return. The sub-agent’s response — narration text plus optional structured component payloads via the render_* tool path — returns to the orchestrator as the tool result.
Final response composition. The orchestrator’s LLM composes the user-facing response from the tool result. For single-intent dispatches, this typically passes the sub-agent’s response through with minimal modification, preserving the orchestrator’s voice and applying platform-level format/safety blocks.

Why orchestrator-mediated composition for single-intent too. A naive optimization would have single-intent results bypass the orchestrator and stream directly to the user. PC1 §5.3 commits against this — the orchestrator owns the voice/UX seam unconditionally. PC2 inherits that commitment: every user-facing response passes through orchestrator composition, regardless of intent count.

Inherited context is reference-bounded. The sub-agent receives identifiers (episode_id, user_id), context values (locale, location, principal), and the orchestrator-curated prior_context prose when cross-vertical context exists. The sub-agent retrieves its own prior session history via the consumer-agent memory architecture. This keeps the dispatch payload bounded regardless of conversation length.
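
The loop above can be sketched in a few lines. This is a simplified illustration, not the reference implementation: sub-agents are modelled as plain async callables, `InheritedContext` field names are assumptions (the real schema lives in the consumer-agent memory architecture), and `compose` stands in for the orchestrator's LLM composition step.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass(frozen=True)
class InheritedContext:
    """Reference-bounded context; field names are illustrative assumptions."""
    principal: str
    episode_id: str
    locale: str
    location: Optional[str] = None
    prior_context: Optional[str] = None  # orchestrator-curated prose, if any


# A sub-agent is modelled as an async callable: (query, context) -> narration.
SubAgent = Callable[[str, InheritedContext], Awaitable[str]]


async def dispatch(tool_name: str, query: str, ctx: InheritedContext,
                   bound_tools: dict[str, SubAgent]) -> str:
    """Steps 2-5: resolve the tool, run the sub-agent, return a tool result."""
    try:
        sub_agent = bound_tools[tool_name]      # step 2: definition resolution
    except KeyError:
        raise LookupError(f"{tool_name} was not bound at factory time") from None
    return await sub_agent(query, ctx)          # steps 3-5


async def handle_turn(tool_name: str, query: str, ctx: InheritedContext,
                      bound_tools: dict[str, SubAgent],
                      compose: Callable[[str], str]) -> str:
    """Step 6: the orchestrator always composes the user-facing response."""
    tool_result = await dispatch(tool_name, query, ctx, bound_tools)
    return compose(tool_result)
```

Note that `handle_turn` has no branch that returns the sub-agent's output directly; even a verbatim pass-through goes via `compose`.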

5.4 Mixed-intent fan-out

Priority note. Mixed-intent queries are rare in production today. The contract below is the v1 shape for when mixed-intent fan-out ships; optimization is deferred until production data shows that mixed-intent volume warrants it.

Detection. Mixed-intent is detected by the orchestrator’s LLM emitting multiple tool calls in a single LLM response, not by a separate classifier. The runtime executes whatever the LLM emits; the LLM is the intent-count signal.

Concurrency. Multiple tool calls in a single LLM response execute concurrently via asyncio.gather. Sub-agents do not block on each other. Each sub-agent’s invocation follows the single-intent loop (§5.3) independently.

Cap on N. Concurrent sub-agents per turn are capped at small N, tuned via experiment. The primary motivation is latency — total turn latency is bounded by the slowest sub-agent, so unbounded fan-out risks pathological tails. A secondary cognitive-load hypothesis (composed responses from many sub-agents may lose coherence) is the working assumption but is unmeasured until N>2 production data exists.

Over-cap selection. The orchestrator’s system prompt instructs the LLM to emit at most the top-N tool calls by relevance, so selection happens in the LLM, not in the runtime. If the LLM emits more tool calls than the cap anyway, the runtime enforces the cap by not executing the excess calls and records the over-cap behavior in the trace (AC-4).
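
Concurrency and cap enforcement together might look like the sketch below. The cap value is a placeholder, tools are modelled as async callables taking a single argument, and keeping the first `cap` calls in emission order is an assumption: the spec only says over-cap calls are not executed. Failure handling is deliberately omitted here.

```python
import asyncio
from typing import Awaitable, Callable

FAN_OUT_CAP = 2  # placeholder; the spec leaves the value to experiment

Tool = Callable[[str], Awaitable[str]]


async def fan_out(calls: list[tuple[str, str]], tools: dict[str, Tool],
                  cap: int = FAN_OUT_CAP) -> tuple[dict[str, str], list[str]]:
    """Run up to `cap` tool calls concurrently; report the rest as dropped.

    Keeping the first `cap` calls in emission order is an assumption. Results
    are keyed by tool name, which assumes one call per sub-agent per turn.
    """
    accepted, dropped = calls[:cap], calls[cap:]
    results = await asyncio.gather(*(tools[name](arg) for name, arg in accepted))
    by_tool = {name: res for (name, _), res in zip(accepted, results)}
    return by_tool, [name for name, _ in dropped]
```

Because the accepted calls are awaited together, wall-clock time tracks the slowest sub-agent rather than the sum, and the dropped list gives the runtime something concrete to record in the trace.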

Final-response composition. The orchestrator composes the final response from aggregated sub-agent results. Typical composition presents each result in turn with brief transitions (“Here are the offers I found… And on your points balance…”) rather than synthesizing a single rewritten response. The composition mechanism is an implementation detail; the contract is that the orchestrator owns the final response and the user sees one coherent flow under the orchestrator’s voice.

Partial failure during fan-out. If a sub-agent in a fan-out fails (timeout, runtime exception, unrecoverable tool error), other sub-agents continue. The orchestrator’s final response acknowledges the failure in natural language (e.g., “I wasn’t able to get your points balance right now, but here are the offers I found”). Raw error events are never exposed to the user. System-level error signaling for client telemetry is PC3’s seam (see OQ-5).

Status-event suppression during parallel execution. The orchestrator owns per-event suppression policy across concurrent sub-agents. Concretely: when two sub-agents emit searching_offers and looking_up_points_balance simultaneously, the orchestrator decides whether to render both (“Searching offers and looking up your points…”), one at a time, or suppress entirely. There is no architectural distinction between “sub-agent streaming direct” and “sub-agent streaming via orchestrator”; both flow through the same ambient stream primitive (PC3 owns the mechanism), and the orchestrator decides per-event what reaches the user.

5.5 Sub-agent to CCS interaction

Sub-agents reach external data through tools. The canonical tool path is a CCS endpoint exposed via the connector framework (PS2 owns the framework; PC2 owns the dispatch contract through it).

Sync seam. CCS returns the universal envelope EnricherResponse[T] synchronously on REST or MCP. Sub-agents do not stream from CCS; the sub-agent’s tool call is a request/response unit. Streaming to the user is PC3’s protocol, applied over the orchestrator’s composed final response, not over CCS calls.

Envelope unwrapping. The sub-agent’s tool wrapper receives EnricherResponse[T] and unwraps payload using (domain_type, enricher_id) against the registry. The registry is owned by CCS (PS6 §5.2 defines the typing rules); the sub-agent’s tool consumes it.

Status-based suppression. The envelope’s status field is the single decision boundary:

status="ok" — payload safe to use; sub-agent proceeds with the payload as tool output.
status="partial" — usable degraded data; sub-agent proceeds, may surface a soft caveat in its narration (“I found these offers, though some retailers were unavailable”).
status="error" — response is not safe to act on; sub-agent treats the call as failed, returns a tool result indicating data unavailable. Final-response acknowledgment is the orchestrator’s responsibility per §5.4.

PC2 does not re-inspect envelope.partial[] for criticality. PS6 forces status="error" whenever any FailedSource.Critical=true; the status field already encodes that decision. Re-inspection at the sub-agent layer would duplicate PS6’s logic and risk divergence.
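
The status-only decision boundary can be shown with a small wrapper function. `Envelope` below is a minimal stand-in for EnricherResponse[T] with only the fields this example needs, `ToolResult` is a hypothetical shape, and treating unrecognised status values as failures is an assumption rather than something PS6 or PC2 states.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Envelope:
    """Minimal stand-in for EnricherResponse[T]; only the fields used here."""
    status: str                                   # "ok" | "partial" | "error"
    payload: Any = None
    partial: list = field(default_factory=list)   # deliberately never inspected


@dataclass
class ToolResult:
    available: bool
    data: Any = None
    caveat: Optional[str] = None


def to_tool_result(envelope: Envelope) -> ToolResult:
    """Branch on status alone; PS6 already folds criticality into it."""
    if envelope.status == "ok":
        return ToolResult(available=True, data=envelope.payload)
    if envelope.status == "partial":
        return ToolResult(available=True, data=envelope.payload,
                          caveat="some sources were unavailable")
    # "error" and any unrecognised status are treated as failed (assumption).
    return ToolResult(available=False)
```

The function never reads `envelope.partial`, so a change to PS6's criticality rules cannot cause the sub-agent layer to diverge from them.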

Version skew tolerance. The envelope carries a semver version field on the payload schema. Sub-agents tolerate non-breaking skew (same major version); breaking skew is reported via the upstream metric ps6.envelope.version_mismatch_total (PS6-owned), not surfaced to the user. PS6 commits to a deprecation window on breaking changes plus a zero-use metric gate, except for security-driven changes which are atomic.
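
A same-major check is enough to express the skew rule. This sketch assumes plain `MAJOR.MINOR.PATCH` strings with no prefix; the function name is hypothetical.

```python
def is_non_breaking_skew(consumer_version: str, envelope_version: str) -> bool:
    """Return True when both semver strings share the same major component."""
    def major(version: str) -> int:
        return int(version.split(".", 1)[0])
    return major(consumer_version) == major(envelope_version)
```

A wrapper would proceed when this returns True and otherwise leave reporting to the PS6-owned version-mismatch metric rather than surfacing anything to the user.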

Cache metadata is observable, not actionable. Sub-agents do not make decisions on envelope.cache_meta (cached/cacheable/TTL/key). It exists for observability and downstream telemetry. Sub-agents pass cache metadata to their own observability surface (Opik traces) without conditional logic on its values.

5.6 Principal propagation

The principal field identifies the on-behalf-of user (or service) every dispatch and CCS call is operating for. PC2 specifies how it flows across the dispatch chain.

Chain of custody.

Session → orchestrator. The orchestrator receives principal from the incoming session context. For chat turns, principal is the authenticated user_id; for scheduled invocations (PD1), it is the scheduled job’s target user.
Orchestrator → sub-agent. The orchestrator passes principal to the sub-agent via inherited session context at instantiation time (§5.3 step 3). Sub-agents do not re-derive principal; they inherit it.
Sub-agent → CCS. The sub-agent’s tool wrapper populates EnricherRequest.Principal on every CCS call. Missing principal returns 400 missing_principal from CCS (PS6 AC-10). Principal mismatch against the calling credential returns 403 principal_mismatch (PS6 FR-11).
CCS → sub-agent (verification). When the envelope returns, the sub-agent’s tool wrapper verifies envelope.principal matches the invoking user before passing the payload to the sub-agent’s LLM context. Mismatch is treated as a security incident: suppress the payload, alert via the upstream metric ps6.envelope.principal_mismatch_total{consumer="consumer-agent"}, return a tool result indicating data unavailable. This is not a recoverable error; retry is not appropriate.
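
The verification step at the end of the chain can be sketched as a guard in the shared tool wrapper. The metric name is taken from the spec; everything else (`Envelope` as a reduced stand-in, the `METRICS` counter in place of a real metrics client, the `DataUnavailable` exception) is a hypothetical illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any

# Stand-in for a real metrics client; the metric name comes from the spec.
METRICS: Counter = Counter()
MISMATCH_METRIC = 'ps6.envelope.principal_mismatch_total{consumer="consumer-agent"}'


@dataclass(frozen=True)
class Envelope:
    """Reduced stand-in for EnricherResponse[T]."""
    principal: str
    status: str
    payload: Any = None


class DataUnavailable(Exception):
    """Signals the wrapper to return 'data unavailable' without retrying."""


def verify_principal(envelope: Envelope, invoking_principal: str) -> Any:
    """Return the payload only if the envelope belongs to the invoking user."""
    if envelope.principal != invoking_principal:
        METRICS[MISMATCH_METRIC] += 1   # alerting keys off any non-zero value
        raise DataUnavailable("principal mismatch; payload suppressed")
    return envelope.payload
```

Raising before the payload is returned means mismatched data never reaches the sub-agent's LLM context, and the absence of any retry path mirrors FR-12.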

Why verification happens at the sub-agent boundary. Two alternatives existed: verify at the orchestrator (post-tool-result) or verify at the sub-agent’s tool wrapper (post-CCS-call). PC2 places verification at the sub-agent’s tool wrapper because:

It is closest to the failure surface (the CCS call itself), keeping blame localizable.
It prevents tainted data from entering the sub-agent’s LLM context, where it could influence subsequent tool calls before mismatch is detected.
It composes naturally — every sub-agent that wraps a CCS endpoint inherits verification via the shared tool wrapper, regardless of which orchestrator dispatched it.

v1 caveat — producer-asserted principal. PS6 R-7 acknowledges that v1 principal is producer-asserted, not cryptographically bound. HMAC signing is deferred to PS6 v2. PC2 inherits this trust assumption; if and when PS6 ships HMAC signing, PC2 sub-agents will verify the signature as part of the same wrapper.

Scheduled-execution and notification paths. When a turn is initiated by PD1 (scheduled execution) or routed through Notification Service (DM delivery), the same chain applies — principal is set at the originating boundary, propagated unchanged, and verified at every downstream consumer. PD1 and PD2/Notification Service are out of scope for PC2’s dispatch contract but inherit the propagation rule.

5.7 Status event primitive

During longer sub-agent operations, sub-agents emit structured status events — typed identifiers describing work in progress (e.g., searching_offers, matching_receipt, looking_up_purchase_history), not prose. The orchestrator renders these into user-visible progress strings, keeping the user informed without surfacing raw event types.

PC2 commits to the primitive; PC3 owns the rest.

PC2’s commitments:

Sub-agents emit typed status events; events are identifiers, not prose. Sub-agents drive cadence (they know what they’re doing); the orchestrator owns the words.
The orchestrator has authority to forward, transform, suppress, or batch any event before it reaches the user. This is per-event policy, not per-sub-agent.
During fan-out (§5.4), suppression policy applies across concurrent emitters; the orchestrator may rate-limit, collapse, or interleave events from parallel sub-agents.
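
As an illustration of these commitments, the sketch below models events as an enum and gives the orchestrator a rendering table that can transform, suppress, or batch. The three identifiers are examples from this spec; the table contents and the `render` function are hypothetical, and the real vocabulary and transport belong to PC3.

```python
from enum import Enum
from typing import Optional


class StatusEvent(str, Enum):
    """Example identifiers taken from the spec; the real vocabulary is PC3's."""
    SEARCHING_OFFERS = "searching_offers"
    MATCHING_RECEIPT = "matching_receipt"
    LOOKING_UP_POINTS_BALANCE = "looking_up_points_balance"


# Hypothetical rendering table owned by the orchestrator; None means "suppress".
RENDERINGS: dict[StatusEvent, Optional[str]] = {
    StatusEvent.SEARCHING_OFFERS: "Searching offers",
    StatusEvent.LOOKING_UP_POINTS_BALANCE: "looking up your points",
    StatusEvent.MATCHING_RECEIPT: None,
}


def render(events: list[StatusEvent]) -> Optional[str]:
    """Batch concurrent events into one progress string, dropping suppressed ones."""
    unique = dict.fromkeys(events)  # de-duplicate while preserving order
    phrases = [RENDERINGS[e] for e in unique if RENDERINGS.get(e)]
    if not phrases:
        return None
    return " and ".join(phrases) + "..."
```

Sub-agents only ever produce `StatusEvent` members; all user-visible wording lives in the orchestrator's table, so copy changes never touch sub-agent code.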

Deferred to PC3:

The typed-event vocabulary — what events exist, their naming, their lifecycle, additions/deprecations.
The transport mechanism — how events flow from sub-agent to orchestrator to user. The reference implementation uses LangGraph’s ambient get_stream_writer(); PC3 owns the protocol contract.
The user-facing event protocol — how events serialize on the wire to the client (SSE shape, payload format, ordering guarantees).
Client-side rendering and telemetry — how the iOS/web client renders progress strings and reports event analytics.

Why split this way. Status events have two surfaces: (1) the dispatch-time contract between sub-agent and orchestrator (PC2-owned, because it’s part of the execution loop), and (2) the streaming protocol from orchestrator to user (PC3-owned, because it’s part of the client-facing event protocol). PC2 specifies that the primitive exists and how the orchestrator participates; PC3 specifies the wire shape and vocabulary.

5.8 Failure propagation

Three failure axes cross PC2’s contract surface. Each has a defined orchestrator response.

Axis 1: Sub-agent invocation fails entirely.

Scenarios: sub-agent runtime exception, sub-agent LLM call times out, sub-agent’s instantiation fails (e.g., model registry lookup error). The dispatch loop (§5.3) does not return a tool result; the runtime returns an error to the orchestrator.

Orchestrator response: treat as a missing answer. In single-intent, the orchestrator’s final response acknowledges inability to handle the request (“I wasn’t able to look into that right now”). In fan-out, the orchestrator composes from the sub-agents that did succeed and acknowledges the missing one. PC2 does not specify automatic retry; retry policy is operational tuning.

Axis 2: Sub-agent’s tool call to CCS returns status="error".

Scenario: CCS enricher fails on a critical source (PS6 FR-5 forces status="error"). The sub-agent’s tool wrapper receives the envelope, observes status="error", returns a tool result to the sub-agent’s LLM indicating data unavailable.

Sub-agent response: the sub-agent’s LLM may exercise judgment within the current turn (try a different tool, refine a query, or surface unavailability in its narration). This is the LLM’s own reasoning loop, not a runtime retry of the failed tool call. The sub-agent’s final tool result to the orchestrator carries the degraded outcome.

Orchestrator response: composes the final user-facing response from the sub-agent’s degraded outcome, applying natural-language acknowledgment as needed. Same surface as Axis 1.

Axis 3: Partial failure during fan-out.

Scenario: of N concurrent sub-agents, M succeed and (N-M) fail (any combination of Axis 1 or Axis 2 causes). asyncio.gather, called with return_exceptions=True, collects all results regardless of individual outcomes (by default it would propagate the first exception to the caller instead); the orchestrator sees per-sub-agent outcomes.

Orchestrator response: compose from the M successes, acknowledge the (N-M) failures in natural language. The user sees one coherent flow, never an exception trace or raw error.
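The partial-failure collection described above can be sketched with `asyncio.gather(..., return_exceptions=True)`. The `fan_out` helper and the shape of `calls` (sub-agent id mapped to a zero-argument coroutine function) are assumptions made for illustration; PC2 does not prescribe this implementation.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


async def fan_out(
    calls: Dict[str, Callable[[], Awaitable[str]]],
) -> Tuple[Dict[str, str], List[str]]:
    """Run all sub-agent calls concurrently; return (successes, failed ids)."""
    ids = list(calls)
    # return_exceptions=True makes gather hand back exceptions as values,
    # so one failing sub-agent does not hide its siblings' results.
    results = await asyncio.gather(
        *(calls[sub_agent_id]() for sub_agent_id in ids),
        return_exceptions=True,
    )
    successes: Dict[str, str] = {}
    failed: List[str] = []
    # gather preserves input order, so results line up with ids.
    for sub_agent_id, result in zip(ids, results):
        if isinstance(result, BaseException):
            failed.append(sub_agent_id)
        else:
            successes[sub_agent_id] = result
    return successes, failed
```

The orchestrator would then compose from `successes` and acknowledge each id in `failed` in natural language.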

What PC2 does not own at this layer.

Client-side telemetry on status="error" propagation — does consumer-agent emit a client-facing error signal beyond the natural-language acknowledgment? PC1 §5.4 says this is PC3’s seam; PC2 inherits that boundary. See OQ-5.
Observability emission for failure events — counters, traces, alerts. Observability is §10.2 of every spec; PC2’s §10.2 lists what to count, but the protocol for emitting them is the platform’s standard OpenTelemetry surface, not a PC2 contract.
Operational retry policy — when to retry, how many times, with what backoff. Tunable, not architectural.

5.9 Routing accuracy as a contract

Routing is owned by the orchestrator’s LLM operating on Agent Definition description fields (§5.2). This makes routing quality a function of two things:

Description quality. A vertical’s Agent Definition description is the only signal the orchestrator LLM has about when to dispatch to that sub-agent. Vague descriptions degrade routing. Specific, capability-anchored descriptions improve it.
Orchestrator prompt quality. The orchestrator’s prompt blocks (persona, routing instructions, prior-context-generation) shape how the LLM interprets descriptions. Drift here degrades routing across all verticals at once.

Routing accuracy is measured, not asserted.

Eval coverage on intent-switch pairs across registered verticals (§9.3) catches routing regressions.
Production sampling on a representative slice catches drift between eval and production distribution.
Per-vertical routing-accuracy thresholds gate vertical onboarding (PF1’s promotion semantics).

Lever when routing degrades. When routing accuracy falls below threshold for a specific vertical, the lever is the responsible Agent Definition’s description field, which is PR-reviewable, git-versioned (PC1 §5.7), and evaluable in isolation. When degradation crosses multiple verticals, the lever is the orchestrator prompt; those changes go through Opik prompt versioning and the orchestrator’s eval suite.

Routing is not a service. PC2 does not introduce a separate routing service, classifier model, or deterministic dispatch table. The orchestrator’s LLM is the routing primitive; PC2’s contract is the runtime loop around that primitive, not a replacement for it.

6. Cross-Section Impact

Spec	Citation
PC1 (Agent Composition)	PC2 inherits Agent Definition shape and orchestrator pattern from PC1; PC2 owns the runtime execution contract (concurrency model, cap, partial-failure semantics, status-event suppression policy during parallel execution).
PC3 (Execution Modes & Event Streaming)	PC2 commits to the status-event primitive; PC3 owns vocabulary, transport mechanism, and user-facing streaming protocol. Client-side telemetry on `status="error"` propagation is PC3’s seam.
PS6 (Domain Object Enrichment & BFF Assembly)	PC2 dispatches to CCS via sub-agent tools; consumes `EnricherResponse[T]`; honors principal verification, status-based suppression, version skew tolerance. PS6 owns envelope shape; PC2 owns multi-enricher orchestration (explicitly out-of-scope in PS6).
PS2 (Connector Framework)	PC2 references tool reference shape from Agent Definition; tool authoring workflow lives in PS2.
PF1 (Agent Lifecycle)	PC2 inherits Agent Definition active-in-registry state from PF1’s promotion semantics.
PF2 (Teams & Multi-Agent Coordination)	PC2 specifies orchestrator → sub-agent dispatch (single hop). PF2 covers multi-hop coordination patterns if/when needed.

7. Dependencies

Platform spec dependencies: PC1 (Agent Composition). Implementation dependencies: LangChain v1, LangGraph, consumer-context-service (CCS) endpoints. External dependencies: None.

8. Risks & Open Questions

8.1 Risks

R-1: Orchestrator routing quality is the system’s single point of routing failure. The orchestrator’s LLM operating on Agent Definition description fields is the only routing mechanism. Description drift, prompt drift, or model regression degrades the entire dispatch surface at once. Mitigated by routing-accuracy eval coverage on intent-switch pairs (NFR-3, §9.3) plus per-vertical thresholds gating promotion (PF1).

R-2: Fan-out cap heuristic is unmeasured at N greater than 2. Today’s orchestrator has only ask_shopping and ask_support. The cap is framed as tuned via experiment because N=3 reliability cannot be measured until a third sub-agent ships. Mitigated by documenting the gap as NFR-7 and scheduling re-measurement on third-vertical onboarding.

R-3: Partial-failure acknowledgment quality depends on orchestrator prompt. FR-14 requires natural-language acknowledgment of failures across all three axes. Quality is driven by the orchestrator’s prompt instructions — vague prompts produce vague acknowledgments, which can confuse users. Mitigated by eval coverage on partial-failure scenarios (§9.5) and prompt iteration with Opik versioning.

R-4: Principal mismatch is producer-asserted at PS6 v1. PS6 R-7 notes that envelope.principal is producer-asserted, not cryptographically bound, until HMAC signing ships in PS6 v2. PC2 inherits this trust assumption — a malicious or buggy CCS enricher could in principle write any value to envelope.principal. Mitigated by network/IAM boundaries between consumer-agent and CCS today; PC2 will tighten verification when PS6 ships v2.

R-5: Status-event cadence is implementation-tunable, not contract-enforced. NFR-5 sets a soft target (events visible within 2 seconds for operations exceeding the threshold). Sub-agents that emit events too sparsely produce silent-feeling waits; sub-agents that emit too aggressively flood the user. Mitigated by per-sub-agent eval coverage on perceived-latency proxy metrics; tuning is operational.

R-6: Inherited context schema lives outside PC2 but is PC2-load-bearing. FR-5 requires inherited context include principal, episode reference, locale, location, and orchestrator-curated prior_context. The schema lives in the consumer-agent memory architecture. If that schema evolves, PC2’s FR-5 must absorb the redefinition. Mitigated by cross-section contract test (§9.4) and explicit dependency declaration in §6.

R-7: Routing observability traces depend on platform OpenTelemetry surface. FR-15 and AC-10 require routing-decision trace events. If the platform’s OpenTelemetry instrumentation lags or drops events under load, observability degrades and §10.2 metrics become unreliable. Mitigated by platform-standard tracing infrastructure ownership (not PC2’s concern); PC2 commits to the contract surface, not to instrumentation reliability.

R-8: asyncio.gather error handling is implementation-detail-sensitive. asyncio.gather(*tasks, return_exceptions=True) is the canonical pattern for fan-out, but the choice between return_exceptions=True and default error propagation determines whether one sub-agent’s failure hides its siblings’ results: by default the first exception propagates to the awaiting caller immediately, while the sibling tasks keep running with their results no longer collected. PC2 commits to the contract (FR-8: other sub-agents continue) without prescribing the implementation pattern. Mitigated by integration tests on partial-failure scenarios (§9.2).

8.2 Open Questions

OQ-1: At what point in the inherited-context construction does the orchestrator’s prior_context summary get generated: eagerly on every cross-vertical dispatch, or lazily only when the orchestrator’s LLM determines the sub-agent will benefit? PC2 leans toward eager generation when the orchestrator detects an intent switch from the prior turn (matches PC1 §5.6 framing). The trigger heuristic is an implementation detail of the consumer-agent memory architecture; PC2 commits that generation happens, not when.

OQ-2: Sub-agent invocation timeout policy — fixed timeout, per-sub-agent configurable, dynamically adjusted based on user-visible status events? PC2 commits only that timeouts exist and trigger Axis 1 failure (§5.8); the policy is operational. Worth surfacing if fan-out latency degrades production turns.

OQ-3: Status-event rendering policy when sub-agents emit events at different rates during fan-out — strict interleave, batching by time window, batching by sub-agent? PC2 commits to “orchestrator owns rendering policy” (FR-13); the specific algorithm is PC3’s concern.

OQ-4: Whether the orchestrator’s LLM should ever be allowed to make zero tool calls on a turn (declining to route entirely) — versus always being required to route to some sub-agent or to itself. PC1 implicitly assumes routing happens; PC2 inherits without explicit treatment. Edge case; surface only if it becomes a production pattern.

OQ-5 (client-side error telemetry shape): When a sub-agent’s CCS call returns envelope.status="error", PC3 §5.10 and FR-17 specify payload-level is_final semantics on error frames with v0.4-compat behavior. The remaining open question is the telemetry surface: counter cardinality, sampling rate, PII discipline. Lean: emit a typed agent.error_emit event with {sub_agent_id, axis, is_final} payload; PII never on the event. Needs PS6-owner (Frank) input on whether enricher_id is acceptable on the telemetry event.
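One possible shape for the lean in OQ-5, sketched as a frozen dataclass. The class name `AgentErrorEmit`, the integer encoding of `axis`, and the validation rule are assumptions; only the event name and the three payload fields come from the text above. `enricher_id` is deliberately left out because its inclusion is still pending PS6-owner input.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Union

EVENT_NAME = "agent.error_emit"


@dataclass(frozen=True)
class AgentErrorEmit:
    """Typed telemetry event; carries no user identifiers or free text."""
    sub_agent_id: str
    axis: int        # 1, 2, or 3, matching the failure axes in section 5.8
    is_final: bool

    def __post_init__(self) -> None:
        if self.axis not in (1, 2, 3):
            raise ValueError(f"axis must be 1, 2, or 3, got {self.axis}")

    def to_payload(self) -> Dict[str, Union[str, int, bool]]:
        # Only the declared fields are emitted, which keeps PII off the event.
        return asdict(self)
```

Keeping the payload to a closed set of low-cardinality fields also bounds counter cardinality, which is one of the open telemetry concerns.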

9. Testing Strategy

9.1 Unit tests

Sub-agent tool materialization at factory time: given sub_agents: [a, b, c], the orchestrator’s tool list contains exactly three sub-agent tools with correct name and description
Tool name derivation (ask_<sub_agent_id>) for valid and edge-case agent ids
Inherited-context bundle assembly: principal, episode reference, locale, location, prior-context summary present at sub-agent instantiation
Envelope unwrapping by (domain_type, enricher_id) against the typing registry
Principal verification in sub-agent tool wrappers: pass when match, suppress + alert when mismatch
Status-based suppression decision matrix: ok/partial/error mapped to correct sub-agent outcome
Fan-out cap enforcement: tool call count beyond cap is dropped before runtime execution
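As an illustration of the first two unit-test targets, tool materialization and name derivation might look like the sketch below. `SubAgentTool`, `tool_name`, and `materialize_tools` are hypothetical names, and the id pattern (lowercase letters, digits, underscores) is an assumed validity rule that neither PC1 nor PC2 states.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Assumed id rule for this sketch only; the real rule is not specified here.
_AGENT_ID = re.compile(r"[a-z][a-z0-9_]*")


@dataclass(frozen=True)
class SubAgentTool:
    name: str
    description: str


def tool_name(sub_agent_id: str) -> str:
    """Derive the orchestrator-facing tool name: ask_<sub_agent_id>."""
    if not _AGENT_ID.fullmatch(sub_agent_id):
        raise ValueError(f"invalid sub-agent id: {sub_agent_id!r}")
    return f"ask_{sub_agent_id}"


def materialize_tools(
    sub_agents: List[str], descriptions: Dict[str, str]
) -> List[SubAgentTool]:
    """Build one tool per declared sub-agent, once, at factory time."""
    return [SubAgentTool(tool_name(a), descriptions[a]) for a in sub_agents]
```

A unit test can then assert that the tool list has exactly one entry per declared sub-agent and that each entry carries the Agent Definition’s description unchanged.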

9.2 Integration tests

Single-intent dispatch end-to-end: user turn → orchestrator tool call → sub-agent invocation → tool result → orchestrator composition → user-facing response
Mixed-intent fan-out at N=2: both sub-agents invoked concurrently (overlapping invocation timestamps), final response composes from both results
Fan-out partial failure: one sub-agent fails (timeout or status="error"), other succeeds; final response acknowledges failure in natural language
Principal chain of custody across the full dispatch chain: session → orchestrator → sub-agent → CCS, verified against trace events
CCS envelope handling for each status value: ok proceeds, partial proceeds with caveat, error triggers natural-language failure surfacing
Inherited-context propagation: sub-agent receives orchestrator-curated prior_context on intent-switch turns
Status events emitted from a sub-agent surface as user-visible progress strings during operations exceeding the soft threshold

9.3 Eval coverage (Opik)

Routing accuracy per vertical on intent-switch eval pairs (every pairwise combination of registered verticals)
Mixed-intent eval set: turns with multiple intents validate fan-out behavior and final-response composition quality
Partial-failure eval set: scenarios with sub-agent failures or status="error" envelopes validate natural-language acknowledgment quality
Per-vertical response quality on dispatched sub-agent outputs (each vertical’s own judge thresholds)
Routing-accuracy threshold gates per-vertical promotion (PF1’s promotion semantics)

9.4 Contract tests

Cross-section contract with PC1: Agent Definition shape and orchestrator pattern surfaces match
Cross-section contract with PS6: EnricherResponse[T] envelope unwrapping, principal verification, status-based suppression match PS6 §5.2 / FR-5 / FR-11
Cross-section contract with PC3: status-event primitive boundary — sub-agent emits typed events, orchestrator owns rendering, vocabulary/transport in PC3
Cross-section contract with consumer-agent memory architecture: inherited-context bundle conforms to the memory layer’s schema
Cross-section contract with PS2: sub-agent tool reference shape matches connector framework registration

9.5 Failure-mode testing

Sub-agent runtime exception during invocation (Axis 1): orchestrator surfaces natural-language acknowledgment; trace event records failure
Sub-agent LLM timeout (Axis 1): same behavior as runtime exception
CCS returns envelope.status="error" (Axis 2): sub-agent’s tool result indicates data unavailable; orchestrator composes degraded response
Principal mismatch (envelope.principal != invoking_user): payload suppressed, metric emitted, security alert triggered, user receives natural-language failure
Fan-out partial failure at varying success/failure ratios (M of N succeed)
Over-cap fan-out: orchestrator LLM emits N+1 tool calls, only N execute, trace records cap behavior
Sub-agent emits malformed status event (event type not in typed vocabulary): orchestrator either drops the event silently or logs it, and in neither case surfaces it to the user (specific policy deferred to PC3)
CCS version skew: envelope version field mismatches sub-agent expectation; sub-agent tolerates compatible skew, surfaces upstream metric on incompatible skew
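The principal-mismatch case in the list above could be exercised against a wrapper along these lines. The in-memory `METRICS` counter and `ALERTS` list are stand-ins for the platform’s real metric and alerting surfaces, and the dict-shaped envelope with `principal` and `data` keys is an assumption; only the metric name is taken from §10.2.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

MISMATCH_METRIC = "ps6.envelope.principal_mismatch_total"
METRICS: Counter = Counter()  # stand-in for the real metrics client
ALERTS: List[str] = []        # stand-in for the real alerting hook


def verify_principal(
    envelope: Dict[str, Any], invoking_user: str
) -> Tuple[bool, Optional[Any]]:
    """Return (verified, payload); the payload is suppressed on mismatch."""
    if envelope.get("principal") == invoking_user:
        return True, envelope.get("data")
    # Mismatch: count it, alert, and keep the payload out of LLM context.
    METRICS[MISMATCH_METRIC] += 1
    # The alert text carries no principal values, to avoid leaking PII.
    ALERTS.append("principal mismatch on CCS envelope")
    return False, None
```

On a `(False, None)` return the tool wrapper would report data as unavailable, so the user sees a natural-language failure rather than another user’s data.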

10. Rollout & Observability

10.1 Rollout phases

Phase 1 — Spec validation. PC2 reviewed and approved; cross-section contracts confirmed with PC1, PC3, PS6, PS2, PF1 reviewers.

Phase 2 — Implementation tickets. PC2 decomposed into ~3pt Jira tickets per Spec Lab assembly-line workflow.

Phase 3 — Compliance retrofit. Existing sub-agent tool wrappers (shop, support) retrofitted to FR-11 / FR-12 principal handling and FR-10 status-based suppression. Status-event emission added to long-running tool operations.

Phase 4 — Third-vertical onboarding as fan-out validation. First new vertical (e.g., rewards / PointPass) onboarded; mixed-intent fan-out at N=3 measured (resolves NFR-7).

Phase 5 — Eval scale-up. Per-vertical routing-accuracy thresholds operationalized at production scale; intent-switch eval coverage expanded as new verticals onboard.

10.2 Observability metrics

Routing accuracy by vertical (eval + production sample) — catches misclassification regressions
Intent count distribution per turn — histogram (1 / 2 / 3+) informs fan-out cap tuning
Fan-out cap-exceeded count — count of turns where orchestrator LLM emitted more tool calls than cap; non-zero is the signal to raise cap
Sub-agent invocation latency p50/p95/p99 per sub-agent — feeds NFR-1 dispatch-overhead bound and NFR-2 fan-out latency bound
Dispatch overhead latency p95 — runtime cost of dispatch loop excluding sub-agent LLM + CCS calls
ps6.envelope.principal_mismatch_total{consumer="consumer-agent"} — security boundary metric; non-zero triggers immediate alert (NFR-4)
Status event emission rate per sub-agent invocation — informs NFR-5 cadence tuning
Partial-failure rate during fan-out — M/N success ratio across fan-out turns
Per-axis failure count (Axis 1 / 2 / 3) — informs which failure mode dominates production
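A sketch of how the intent-count histogram and the cap-exceeded count might be recorded at the point where the fan-out cap is enforced. The metric keys, the `apply_cap` helper, and the per-call relevance score are all hypothetical; PC2 states only that over-cap calls are dropped and that selection is top-N by relevance.

```python
from collections import Counter
from typing import List, Tuple

METRICS: Counter = Counter()  # stand-in for the real metrics client


def intent_bucket(n: int) -> str:
    """Histogram bucket for tool calls emitted in one turn: 1 / 2 / 3+."""
    return "3+" if n >= 3 else str(n)


def apply_cap(tool_calls: List[Tuple[str, float]], cap: int) -> List[str]:
    """Enforce the fan-out cap on (tool_name, relevance) pairs.

    Returns the tool names that will actually execute.
    """
    METRICS[f"intent_count.{intent_bucket(len(tool_calls))}"] += 1
    if len(tool_calls) <= cap:
        return [name for name, _ in tool_calls]
    METRICS["fanout_cap_exceeded"] += 1
    # Keep the top-N by relevance; sorted() is stable, so ties keep
    # their original emission order.
    ranked = sorted(tool_calls, key=lambda call: call[1], reverse=True)
    return [name for name, _ in ranked[:cap]]
```

For example, three emitted calls against a cap of 2 would increment both the `3+` bucket and the cap-exceeded counter, and only the two highest-relevance tools would run.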

10.3 Rollback

PC2 is a contract spec, not deployable code. Rollback semantics apply at three layers:

Sub-agent registration rollback — removing a sub-agent from the orchestrator’s sub_agents list reverts to the prior dispatch surface; Agent Definition versioning per PC1 §5.7 / PF1.
Orchestrator prompt rollback — orchestrator prompt-block changes that degrade routing roll back via Opik prompt versioning.
Architecture-level rollback — reverting to legacy classifier path requires platform team intervention; not expected.

11. Appendix

11.1 Source references

PC1: Agent Composition — the primitive PC2 builds on (Agent Definition shape, orchestrator pattern)
PS6: Domain Object Enrichment & BFF Assembly — EnricherResponse[T] envelope, principal verification, status-based suppression
Platform Spec Lab — Wave 0/1 sequencing and dependency graph
PLT-609 Phase C: Production Latency Report — production latency baselines cited in NFR-1
Orchestrator Model Benchmark Report — orchestrator routing model + prompt benchmark; routing-accuracy baseline
Miro design board — supplementary architecture diagrams

11.2 Decisions resolved during design

#	Decision	Resolution
1	Routing primitive	Orchestrator’s LLM operating on Agent Definition `description` fields. No separate classifier, router service, or deterministic dispatch table. Routing is measured via eval; description quality is the lever.
2	Discovery timing	Factory-time only. No runtime registry probe, no MCP-style enumeration, no DB lookup per turn. Feature-flag gating resolved at factory invocation.
3	Composition seam	Orchestrator-mediated composition is unconditional. Single-intent and fan-out both pass through orchestrator composition, regardless of intent count. Inherited from PC1 §5.3 commitment.
4	Sync vs streaming at CCS seam	CCS calls are synchronous per sub-agent tool. Streaming to the user is PC3’s protocol applied over orchestrator’s composed final response, not over CCS calls. Aligned with PS6 out-of-scope statement.
5	Principal verification location	At the sub-agent tool wrapper, post-CCS-call. Closest to the failure surface; prevents tainted data from entering sub-agent LLM context; composes uniformly across all sub-agents wrapping CCS.
6	Partial-failure signal	`envelope.status` is the single decision boundary. Sub-agents do not re-inspect `envelope.partial[]` for criticality. PS6 already encodes criticality in `status`.
7	Failure surfacing to user	Natural-language acknowledgment for all failure axes (FR-14). Raw exceptions / error codes / structured error payloads never reach the user. Client-side telemetry on errors is PC3’s seam (scope refined in OQ-5; PC3 §5.10 specifies wire semantics, telemetry shape pending).
8	Fan-out concurrency model	`asyncio.gather`-based concurrent dispatch with a small-N cap, top-N-by-relevance selection if exceeded. Cap value operational, not architectural.

11.3 Migration receipts from PC1

PC2 absorbs the following from PC1:

PC1 §5.4 Mixed-intent fan-out → PC2 §5.4 (this spec)

The cleanup sequence applies the topic-ownership discipline: each downstream spec absorbs its receiving content natively, then a follow-up commit on the source spec strips the migrated content and replaces it with a one-line reference. This keeps each spec’s surface area scoped to its own ownership boundary.