Sub-Agent Lifecycle (dev → test → promote → rollback)

PF1: Sub-Agent Lifecycle (dev → test → promote → rollback)

1. Problem Statement

The consumer-agent runs multiple sub-agents (Scout / Forethought, Shopping, Earnings Coach when it lands, future Play / eReceipts / Rewards / Restaurant verticals). Each sub-agent is a factory-registered Python module conforming to PC1’s Agent Definition contract. Today: a new sub-agent appears in agent_config.yaml, gets imported by factory.py, and goes live on the next deploy. Removing one is symmetric — delete the entry, redeploy. No staged ramp, no formal promotion criteria, no codified decommission cleanup path.

That works at the current vertical count (Shop + Support/Scout live, others in flight) but doesn’t scale. As Play / eReceipts / Rewards / Restaurant onboard, the “appears in config + deploy” approach hits three failure modes:

No promotion discipline. A new sub-agent’s first production traffic is also its widest blast radius — 100% of routed users see it. Bad sub-agents are caught by user reports, not eval gates.
Rollback requires redeploy. Reverting a bad sub-agent change means a config change + image build + ECS rollout. Slow when the bug is user-facing.
Decommission is undefined. When a sub-agent is permanently removed from the codebase (e.g., a sub-agent is replaced by another, or its capability is folded into a different sub-agent), the old sub-agent’s eval suite, dataset entries, prompt blocks, and tool dependencies leave residue. No documented cleanup contract.

PF1 closes these gaps. Three load-bearing commitments:

A sub-agent moves through an explicit lifecycle state machine: dev → test → promote → rollback. Each transition has a clear gate (PC5 owns the gate semantics) or a clear operator action. Operational sub-states inside promote (ramp percentage progression) are recorded in the PC6 experiment config, not as separate lifecycle states.
Promotion = Agent Definition becomes active in the registry. No per-agent deploy targets, no separate runtimes. PC1 §5.2’s agent_config.yaml::agents registry IS the activation surface; PF1 commits the staged path to landing there.
Rollback is config-revert + factory refresh. Same pattern as PC5’s rule rollback (PC5 §5.7) and PC6’s prompt-version rollback (PC6 §5.10). No consumer-agent redeploy required.

Without PF1:

PC5’s gates have no lifecycle to gate. PC5 §5.9’s PF1-partnership table references the lifecycle states; without PF1, PC5’s three milestones are gates against an undefined backdrop.
Cohort-gated sub-agent rollout has no spec home. Today, cohort gating happens for prompts (PC6) and flags (PF8) but not for whole sub-agents. PF1 owns the sub-agent rollout cohort discipline.
Vertical onboarding stalls at “now what?” A vertical drafts an Agent Definition (PC1’s contract); without PF1, they don’t know how to move it through staged production exposure.

Companion: Platform Spec Lab row PF1. The spec is the source of truth.

2. Capabilities Source

Per the Platform Spec Lab, PF1 owns the Sub-Agent Lifecycle (dev → test → promote → rollback) capability for the AI Assistant Platform’s consumer-agent runtime. The capability has four components:

Lifecycle state machine — the explicit states a sub-agent moves through.
Promotion semantics — what activates a sub-agent in the registry.
Rollback semantics — how a bad sub-agent change reverts.
Decommission cleanup — how a sub-agent’s artifacts are cleanly removed after rollback.

PF1 is scoped to Sub-Agent Lifecycle — the lifecycle is inside the consumer-agent process; sub-agents are factory-registered Python modules, not deployable units of their own. PF1 does not adopt TEAM/FAM-style per-agent deploy targets.

PF1 partners with:

PC1 (Agent Composition) — owns the Agent Definition contract sub-agents conform to. PF1 commits the staged path to becoming active in the registry.
PC5 (Agent CI/CD Pipeline) — owns the three gate milestones (pre_merge, pre_ramp, pre_full). PF1 references gates by milestone name; PC5 §5.9 commits the partnership table.
PF8 (Feature Flag & Cross-Vertical Observability Conventions) — owns flag naming (ai_assistant_*), kill-switch lifecycle, and required Grafana panels. PF1 reuses these for sub-agent cohort gating.
PF5 (Vertical Scaffolding + Validation Tools) — scaffolds the artifacts a new sub-agent ships with. PF1 commits the state shape PF5 scaffolds toward.

3. Background & Context

3.1 Today’s reality

Sub-agent registration (production):

consumer-agent/agent_config.yaml::agents — declarative registry of Agent Definitions. Each entry:

- id: shop
  description: Handles shopping queries, product discovery, offers
  role: native
  model: gpt-5.4-mini-low
  tools: [search_offers, search_products, ...]
  prompt_blocks: [persona-shop, instructions-shop, safety-base]
  sub_agents: []
  tuning: {reasoning_effort: low}

consumer-agent/src/consumer_agent/factory.py::create_gateway_agent_from_config — reads the registry, resolves the Agent Definition’s references (model from registry, tools from registry, prompt_blocks from registry, sub_agents from registry), constructs a LangGraph CompiledStateGraph.

The orchestrator’s sub_agents list determines what’s callable at the dispatch layer. Adding a sub-agent = adding a new entry to agents + listing it in the orchestrator’s sub_agents. Removing = the reverse.

Cohort-gated behavior (production):

consumer-agent/src/consumer_agent/utils/feature_flags.py — Feature Flipper integration. Used today for cohort-style behavior gating (e.g., consumer_agent_xml_prompt, consumer_agent_sub_agents_as_tools). Naming convention ai_assistant_* per PLT-552; codified by PF8.

What’s NOT yet codified:

A lifecycle state machine for sub-agents — sub-agents either exist in config or don’t
Promotion criteria — there’s no formal “this sub-agent passes the bar for production traffic”
Staged rollout for new sub-agents — first production traffic is also the widest blast radius
Rollback path without redeploy — config changes require image build + ECS rollout
Retirement — no documented cleanup path when a sub-agent is replaced

3.2 What PC1 leaves to PF1

PF1 inherits from PC1 (Agent Composition):

Agent Definition shape (PC1 §5.2) — the declarative contract a sub-agent conforms to. PF1 doesn’t redefine; commits the lifecycle path to activation.
Factory-time binding (PC1 §5.3, PC1 §3.3) — sub-agents resolve at factory invocation. PF1 commits the staged-rollout discipline that determines which factory invocation gets which sub-agent.
Vertical integration model (PC1 §5.9) — verticals contribute Agent Definitions, not runtime classes. PF1 commits the lifecycle each Agent Definition moves through.

3.3 PF1 ↔ PC5 partnership

PC5 §5.9 commits the partnership table. PF1 mirrors it from the lifecycle side:

PF1 lifecycle transition or sub-state	PC5 gate(s)	Trigger
`dev → test`	None (developer-driven)	Developer ready for review
`test → promote` (PR merge + operator initiates ramp)	`pre_merge` (PR-driven, dataset-scoped)	PR merge
Ramp progression inside `promote` — first non-zero step	`pre_ramp` (PC6-driven, trace-sampled)	Operator advances ramp percentage from 0 to first non-zero step
Ramp progression inside `promote` — 100% step	`pre_full` (PC6-driven, larger trace sample)	Operator advances ramp percentage to 100%
`* → rollback`	None	Operator action OR kill-switch fires from any active state

PC5’s three gates fire at distinct points in the PF1 lifecycle: pre_merge at the test → promote transition (PR merge), pre_ramp at the first non-zero ramp step inside promote, and pre_full at the 100% ramp step inside promote. Operational sub-states inside promote (initial ramp percentage, intermediate ramp percentages, fully rolled out at 100%) are recorded in the PC6 experiment config rather than as distinct lifecycle states. Gate failure halts ramp progression but does not move the lifecycle out of promote; only operator action or a kill-switch firing moves the lifecycle to rollback.
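
The gate placement described above can be sketched in Python. This is a minimal illustration, not PC5's or PC6's actual interface: gate_for_step, advance_ramp, and the run_gate callback are hypothetical names, and the sketch assumes that intermediate steps between the first non-zero step and 100% carry no PC5 gate, as the table suggests.

```python
from typing import Callable, Optional


def gate_for_step(current_percent: int, next_percent: int) -> Optional[str]:
    """Name the PC5 gate guarding a ramp step, or None if the step is ungated."""
    if next_percent == 100:
        return "pre_full"
    if current_percent == 0:
        return "pre_ramp"
    return None  # assumption: intermediate steps carry no PC5 gate


def advance_ramp(current_percent: int, next_percent: int,
                 run_gate: Callable[[str], bool]) -> int:
    """Return the ramp percentage in effect after an advancement attempt."""
    if not 0 <= current_percent < next_percent <= 100:
        raise ValueError("ramp must move upward within 0..100")
    if current_percent == 0 and next_percent == 100:
        raise ValueError("initial activation must start below 100% (FR-3)")
    gate = gate_for_step(current_percent, next_percent)
    if gate is not None and not run_gate(gate):
        # Gate failure halts progression; the lifecycle stays in promote.
        return current_percent
    return next_percent
```

A failed gate returns the unchanged percentage rather than raising, which mirrors the rule that gate failure halts the ramp without moving the sub-agent to rollback.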

3.4 Vocabulary

Term	Meaning
Sub-agent	A factory-registered Python module conforming to PC1’s Agent Definition contract. Examples: Scout, Shopping, Earnings Coach.
Agent Definition	The declarative bundle (id, description, role, model, tools, prompt_blocks, sub_agents, tuning) PC1 commits. Source of truth for a sub-agent’s behavior.
Registry	`consumer-agent/agent_config.yaml::agents`. Where Agent Definitions become active.
Lifecycle state	One of: `dev`, `test`, `promote`, `rollback`.
Promotion	The transition from `test` to `promote` after `pre_merge` passes. The sub-agent’s Agent Definition becomes eligible for production traffic, gated by a Feature Flipper cohort flag whose ramp percentage the operator advances over time.
Ramp progression	The operator-driven progression of the cohort flag’s percentage inside `promote` (e.g., first non-zero step → intermediate steps → 100%). Recorded in the PC6 experiment config; not a lifecycle state change. `pre_ramp` and `pre_full` PC5 gates fire as percentage advances.
Rollback	The transition from any active state (`test`, `promote`) to inactive. Same path as PC5/PC6 rollback: config revert + factory refresh OR kill-switch fires, no redeploy. Sub-agents leaving production reach `rollback`. `rollback` is the terminal state for that Agent Definition; fix-forward requires authoring a new Agent Definition (see FR-12). PF1 has no separate `retired` terminal state.
Decommission cleanup	Operational cleanup of artifacts (Python module deletion, eval-suite tagging, prompt-block cleanup, tool dependency cleanup) for a sub-agent that has been rolled back and is being retired from the codebase. Cleanup runs after the lifecycle state reaches `rollback`; the cleanup itself is not a lifecycle state. See §5.6.
Kill-switch	A Feature Flipper flag (per PF8 naming) that, when off, drops the sub-agent from the orchestrator’s dispatch surface at next factory invocation. Firing the kill-switch is one of the triggers that moves a sub-agent to `rollback`.

4. Requirements

4.1 Functional requirements

FR-1 — Lifecycle state machine. Every sub-agent MUST move through the four explicit states declared in §3.4 (dev, test, promote, rollback). Forward transitions are unidirectional (dev → test → promote); rollback is reachable from any active state.

FR-2 — Promotion semantics. Promotion = the sub-agent’s entry in agent_config.yaml::agents becomes active, gated by a Feature Flipper cohort flag. The entry MAY exist in config without being active by being gated behind a Feature Flipper flag (off by default). Activation = the flag flips on for the configured cohort at some ramp percentage; the operator advances ramp progression over time inside promote.

FR-3 — Cohort-gated rollout for new sub-agents. A new sub-agent’s transition from test to promote MUST be cohort-gated via a Feature Flipper flag (per PF8 naming). The cohort definition follows PC1 §5.5 cohort discipline (Agent Definition version + registry versions + feature flags + experiment-arm membership per PC6 §5.7). The initial activation MUST start below 100% of eligible traffic; the operator advances ramp percentage over time, gated by PC5’s pre_ramp and pre_full checks.

FR-4 — Gates inside promote. The pre_merge gate from PC5 §5.5 fires at the test → promote transition. The pre_ramp and pre_full gates fire as the operator advances the cohort flag’s ramp percentage inside promote. Gate failure halts ramp progression at the current percentage; operator action is required to proceed or to move the sub-agent to rollback.

FR-5 — Rollback without redeploy. Rolling back a sub-agent change MUST be achievable by reverting the agent_config.yaml change (or flipping the cohort flag off) AND triggering a factory refresh. No consumer-agent image build or ECS redeploy required. Matches PC5 §5.7 and PC6 §5.10 rollback patterns.

FR-6 — Decommission cleanup contract. When a sub-agent reaches rollback and is being retired from the codebase (not just paused for investigation), cleanup MUST follow a documented path (§5.6):

The Agent Definition is removed from agent_config.yaml::agents.
The sub-agent’s Python module is deleted from the codebase (or marked deprecated for a transition window).
Eval-suite entries (datasets, judge configs in PC5’s manifest) tagged for the decommissioned sub-agent are removed OR explicitly retained for replay (operator decision).
Prompt blocks unique to the decommissioned sub-agent are deleted (per PC1 §5.7 file-backed registry pattern).
Tool dependencies (PS2 connectors) used only by the decommissioned sub-agent are deleted OR explicitly retained (operator decision).

Decommission cleanup is operational work that follows the lifecycle reaching rollback; it is not itself a lifecycle state.

FR-7 — Rollback target tracking. Every promotion MUST record the previous stable state of the sub-agent (the Agent Definition’s prior git commit). Rollback resolves to this target. Matches PC6 §5.6 rollout config pattern.

FR-8 — In-flight conversation tolerance. A rollback or kill-switch event MUST NOT corrupt in-flight conversations. Sub-agents resolve at factory invocation per PC1 §5.5; in-flight turns complete on the sub-agent they started with. New turns get the rolled-back state.

FR-9 — Kill-switch contract. Every cohort-gated sub-agent rollout MUST have a Feature Flipper kill-switch flag created at the test → promote transition. The kill-switch operates independently of cohort assignment; firing it drops the sub-agent from the orchestrator’s dispatch surface entirely at next factory invocation and moves the lifecycle to rollback.

FR-10 — Lifecycle audit trail. Every state transition MUST emit an audit event capturing: sub-agent ID, source state, target state, trigger (gate / operator / kill-switch), Agent Definition commit, cohort definition. Trace shape is platform-standard OpenTelemetry; events land in PS5’s store.

FR-11 — Cross-sub-agent dependency tracking. If sub-agent A’s Agent Definition includes sub-agent B in its sub_agents list (PC1 §5.2 — the orchestrator pattern), then retiring sub-agent B MUST fail until sub-agent A’s reference is removed. CI enforces this at PR time.

FR-12 — Re-baselining on Agent Definition major changes. A major Agent Definition change (model swap, prompt block additions/removals, tool list changes) MUST be authored as a NEW Agent Definition (new identity/commit), which enters the lifecycle at dev. The new Agent Definition then progresses through the full lifecycle — re-passing pre_merge at the test → promote transition, then pre_ramp and pre_full as ramp percentage advances inside promote. The prior Agent Definition remains in rollback as the rollback target. Modifying an Agent Definition in-place while it is in promote or rollback is not permitted. Minor changes (typo fixes, tuning adjustments) MAY be applied in-place with operator override.
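
As an illustration of the major/minor split in FR-12, the following sketch classifies the difference between two versions of an Agent Definition. The function name major_changes is hypothetical, and the sketch assumes that reordering a list without adding or removing entries does not count as a change.

```python
from typing import Any, Mapping


def major_changes(old: Mapping[str, Any], new: Mapping[str, Any]) -> list[str]:
    """Return the FR-12 'major' fields that differ between two definitions."""
    changed = []
    if old.get("model") != new.get("model"):
        changed.append("model")
    for key in ("tools", "prompt_blocks"):
        # Compare as sets: only additions and removals count (assumption).
        if set(old.get(key, [])) != set(new.get(key, [])):
            changed.append(key)
    return changed
```

An empty result means the change is minor (for example a tuning adjustment) and may be applied in place with operator override; any non-empty result means the change must be authored as a new Agent Definition.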

4.2 Non-functional requirements

NFR-1 — State transition latency. Lifecycle state transitions (excluding gate evaluation latency, which is PC5’s NFR) MUST complete in under 1 minute. State is config + flag; transition is operator action + factory refresh.

NFR-2 — Rollback latency. Full rollback path (revert + factory refresh + dispatch-surface update) MUST complete in under 5 minutes. Matches PC6 NFR-3 cancellation cadence; production users see no torn state during the window.

NFR-3 — Audit trail retention. Lifecycle audit events MUST be retained for at least 90 days for post-hoc analysis. Matches PC6 NFR-4 rollout-audit retention. Storage backend is PS5’s trace + event store.

NFR-4 — Cohort-gating overhead. The Feature Flipper cohort check at factory invocation MUST add negligible latency — sub-millisecond per check, dict-lookup against pre-fetched flag state. Cache invalidation on flag change is operational.

NFR-5 — Decommission cleanup window. Decommission cleanup (FR-6) MUST complete within one quarter from initiation. The decommissioned sub-agent’s agent_config.yaml entry MAY remain in a deprecated: true state during the window for replay support.

4.3 Acceptance criteria

AC-1 — Given a new sub-agent’s agent_config.yaml entry added with enabled_via_flag: ai_assistant_subagent_<id> and the flag set off, when the runtime starts, the sub-agent MUST NOT appear in the orchestrator’s dispatch surface. Existing sub-agents MUST be unaffected.

AC-2 — Given a sub-agent at test state with pre_merge gate passed, when an operator initiates the transition to promote via PC6-style rollout config with rollout_mode: experiment, the sub-agent MUST become dispatchable for users in the treatment cohort only. Control users MUST continue without the sub-agent.

AC-3 — Given a sub-agent in promote at an intermediate ramp percentage and the pre_ramp gate fails on the next advancement attempt, the ramp progression MUST halt — no further percentage increase. The sub-agent remains accessible to whatever cohort percentage was active at the moment of failure; operator action is required to proceed or to move to rollback.

AC-4 — Given a kill-switch fires (Feature Flipper flag off) during a sub-agent rollout, when the next factory invocation runs, the sub-agent MUST NOT appear in the orchestrator’s dispatch surface for any user. In-flight conversations complete on the sub-agent they started with; new turns route to the previous stable state. The lifecycle moves to rollback.

AC-5 — Given a sub-agent in promote at 100% ramp and an operator initiates rollback, when the rollback completes within NFR-2’s 5-minute budget, the sub-agent’s prior stable Agent Definition (per FR-7’s recorded target) becomes the active version. A consumer-agent redeploy MUST NOT be required. The lifecycle moves to rollback.

AC-6 — Given sub-agent A’s sub_agents list references sub-agent B, when a PR attempts to decommission sub-agent B without removing A’s reference, CI MUST fail with a clear error per FR-11.

AC-7 — Given a sub-agent has reached rollback and is being decommissioned, the Agent Definition entry MAY remain in agent_config.yaml with status: deprecated for up to one quarter (per NFR-5). After the window, the entry, the sub-agent’s Python module, decommissioned prompt blocks, and unused tools MUST be removed per FR-6’s cleanup contract.

AC-8 — Given any lifecycle state transition, an OpenTelemetry audit event MUST be emitted with the fields declared in FR-10 (sub-agent ID, source/target state, trigger, Agent Definition commit, cohort definition). Event lands in PS5’s trace + event store.

AC-9 — Given a major change to a deployed sub-agent’s Agent Definition (e.g., model swap from gpt-5.4-mini-low to gpt-5.4-nano), when a PR opens with the change, CI MUST require that the change be authored as a NEW Agent Definition entering the lifecycle at dev, with the prior Agent Definition remaining in rollback as the rollback target. The new Agent Definition re-passes pre_merge at the test → promote transition, then pre_ramp and pre_full as ramp percentage advances inside promote. Minor changes MAY be applied in-place with explicit operator override annotation.

AC-10 — Given a sub-agent change PR, the factory-time cohort check MUST complete in under one millisecond per check (NFR-4). Per-request overhead from PF1’s cohort-gating MUST be negligible against turn latency.

5. Solution Design

5.1 The architectural through-line

Sub-agents are factory-registered Python modules; the lifecycle is in-process; promotion = registry activation gated by Feature Flipper + PC5’s three gates; rollback is config-revert + factory refresh.

Three properties hold across every PF1 contract:

In-process, not deployable. Sub-agents are Python modules instantiated by factory.py. The lifecycle controls which Agent Definitions are active in the registry, not which deploy targets are running. No K8s pools, no separate processes, no per-agent runtimes.
State is config + flag. A sub-agent’s lifecycle state is fully determined by: (a) its presence in agent_config.yaml::agents, (b) its enabled_via_flag cohort flag’s state, (c) the rollout config’s rollout_mode (cohort vs full). Reading these three sources tells you the state; mutating them is the only way to transition.
All transitions are gated or operator-initiated. Automatic state advancement (e.g., “after 24h with no errors, advance”) is explicitly out of scope. Operator action (or PC5’s gate verdict) drives every transition.

5.2 The state machine

PF1’s lifecycle has four states per PLT-690: dev, test, promote, rollback. Operational sub-states inside promote (initial ramp percentage, intermediate ramp percentages, fully-rolled-out at 100%) are recorded in the PC6 experiment config, not as distinct lifecycle states.

                              ┌───────────────┐
                              │      dev      │
                              └──────┬────────┘
                                     │ developer-driven
                                     ▼
                              ┌───────────────┐
                              │     test      │   (PR opens)
                              └──────┬────────┘
                                     │ pre_merge gate (PC5)
                                     │ — PASS; operator initiates ramp
                                     ▼
                              ┌───────────────┐
                              │    promote    │   (cohort flag on, ramp percentage advanced over time)
                              │               │      pre_ramp gate fires at first non-zero step
                              │               │      pre_full gate fires at 100% advancement
                              └──────┬────────┘
                                     │ operator action OR
                                     │ kill-switch flips off
                                     ▼
                              ┌───────────────┐
                              │   rollback    │
                              └───────────────┘

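Note: the diagram draws only the promote → rollback edge. rollback is also reachable from test (kill-switch trigger during gate failures); from promote it is reached by operator action or kill-switch. PC5’s auto-ramp-down (PC5 §5.7) lowers the ramp percentage but does not by itself move the lifecycle to rollback. See the transition rules below and §5.4.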

Transition rules:

dev → test: developer-driven. No platform gate. PR opens.
test → promote: PC5 pre_merge gate fires (PC5 §5.5). Pass → PR mergeable AND operator initiates ramp via PC6-style rollout config; the cohort flag activates at the first non-zero step. Inside promote, the operator advances ramp percentage over time; PC5’s pre_ramp gate fires at the first non-zero step and pre_full fires at 100% advancement. Gate failure halts ramp progression at the current percentage but does not move the lifecycle out of promote.
* → rollback (from test or promote): Operator action OR kill-switch flag flips off. Same path: factory refresh on next invocation drops the sub-agent (or reverts to prior Agent Definition commit per FR-7). rollback is terminal for that Agent Definition — there is no rollback → dev or rollback → test edge. Fix-forward requires authoring a new Agent Definition (see FR-12), which enters the lifecycle at dev. Decommission cleanup per FR-6 runs after the lifecycle reaches rollback if the sub-agent is being permanently removed from the codebase (rather than paused for investigation).
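
The transition rules above can be encoded as a small lookup table. This is a sketch only: LifecycleState, ALLOWED_TRANSITIONS, IllegalTransition, and transition are hypothetical names, and the real state is derived from config and flags (§5.1) rather than stored in an enum.

```python
from enum import Enum


class LifecycleState(Enum):
    DEV = "dev"
    TEST = "test"
    PROMOTE = "promote"
    ROLLBACK = "rollback"


# Allowed edges. ROLLBACK has no outgoing edges because it is terminal
# for a given Agent Definition.
ALLOWED_TRANSITIONS = {
    LifecycleState.DEV: {LifecycleState.TEST},
    LifecycleState.TEST: {LifecycleState.PROMOTE, LifecycleState.ROLLBACK},
    LifecycleState.PROMOTE: {LifecycleState.ROLLBACK},
    LifecycleState.ROLLBACK: set(),
}


class IllegalTransition(ValueError):
    """Raised when a requested edge is not in the state machine."""


def transition(current: LifecycleState, target: LifecycleState,
               pre_merge_passed: bool = False) -> LifecycleState:
    """Validate one lifecycle edge and return the new state."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise IllegalTransition(f"{current.value} -> {target.value} is not allowed")
    # test -> promote is the only edge guarded by a PC5 gate (pre_merge).
    if (current, target) == (LifecycleState.TEST, LifecycleState.PROMOTE) \
            and not pre_merge_passed:
        raise IllegalTransition("test -> promote requires a passing pre_merge gate")
    return target
```

Keeping the edges in one table makes the absence of rollback → dev and rollback → test explicit: fix-forward has to start from a new Agent Definition at dev.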

5.3 Promotion semantics

Promotion is a config change activating the sub-agent in the registry, not a deploy. Concretely:

The sub-agent’s Agent Definition entry exists in agent_config.yaml::agents with enabled_via_flag: ai_assistant_subagent_<id>. The flag is off — the sub-agent is registered but not dispatchable.
The orchestrator’s sub_agents list includes the sub-agent’s id (PC1 §5.3 orchestrator pattern).
Operator initiates the test → promote transition via a rollout config (matches PC6 §5.6 schema, but for sub-agent activation instead of prompt-version selection).
PC5 pre_ramp gate fires at the first non-zero ramp percentage; pre_full fires when the operator advances to 100%.
Pass → Feature Flipper flag activates for the configured cohort at the declared ramp percentage.
Next factory invocation reads the flag state → sub-agent becomes dispatchable for treatment-cohort users only at the current ramp percentage.

No deploy, no image build, no ECS rollout. The sub-agent’s Python module is in the deployed image already (from when it landed in the codebase); promotion is purely configuration.
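
A minimal sketch of how a factory invocation could derive the dispatch surface from the registry and flag state. The function resolve_dispatch_surface and the flag_state mapping (flag name to the already-resolved on/off value for the current user) are assumptions for illustration, not the actual factory.py interface; the sketch also assumes that entries without enabled_via_flag are always dispatchable.

```python
from typing import Any, Mapping, Sequence


def resolve_dispatch_surface(
    orchestrator_sub_agents: Sequence[str],
    agents: Mapping[str, Mapping[str, Any]],
    flag_state: Mapping[str, bool],
) -> list[str]:
    """Return the sub-agent ids dispatchable for one factory invocation."""
    dispatchable = []
    for agent_id in orchestrator_sub_agents:
        definition = agents.get(agent_id)
        if definition is None:
            continue  # not registered: never dispatchable
        flag_name = definition.get("enabled_via_flag")
        # Assumption: entries without a gating flag are treated as always on.
        if flag_name is None or flag_state.get(flag_name, False):
            dispatchable.append(agent_id)
    return dispatchable
```

Because the result depends only on config and flag state, flipping the flag changes the outcome of the next invocation without any change to the deployed image.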

5.4 Rollback semantics

Two trigger paths:

Operator-initiated rollback:

Operator identifies a regression in the currently-active Agent Definition (production traffic + judge scores + manual review).
Operator selects a rollback target — a prior Agent Definition (<prior-commit-id>) that is still in promote from its earlier active period (per FR-12, no Agent Definition that has reached rollback can be re-activated; the rollback target’s lifecycle state is promote, just not currently active).
Operator updates the rollout config to set the active commit to the rollback target. The current Agent Definition transitions to rollback (terminal). The rollback target’s promote state is preserved unchanged — no state transition is required for the rollback target itself.
Factory refresh — next invocation reads the updated rollout config and resolves the Agent Definition at the rollback target commit.
NFR-2 commits to a 5-minute end-to-end budget for this path.

Kill-switch-initiated rollback:

Operator flips the Feature Flipper kill-switch flag off (e.g., ai_assistant_subagent_<id> → off).
Factory refresh — sub-agent absent from dispatch surface.
In-flight conversations complete on the prior sub-agent state (no torn state per FR-8); new turns route without it.

Why no consumer-agent redeploy:

Agent Definitions live in agent_config.yaml, read at factory invocation.
Feature Flipper flags are runtime-toggled.
Python modules for the sub-agent are already in the deployed image — they’re loaded if active, ignored if not.

Rollback target tracking (FR-7): every promotion records the prior stable Agent Definition commit — the previously-active Agent Definition whose lifecycle state is still promote. The rollback target is never an Agent Definition in rollback (per FR-12, rollback is terminal). Multiple-level rollback (V_new fails → V_prev → V_prev itself regresses) is operator-managed; each step re-activates a still-in-promote predecessor while the just-deactivated Agent Definition enters rollback. PF1 doesn’t auto-chain (matches PC6 OQ-6 disposition).

Multiple Agent Definitions in promote: Multiple Agent Definitions (ADs) MAY occupy promote concurrently. At any time, exactly one is the active AD (it receives traffic per its ramp percentage in PC6’s rollout config); the others are rollback targets (preserved at promote with their last known ramp percentage, receiving no traffic). When a rollback executes, the active AD transitions to rollback (terminal) and a rollback target becomes active. Each in-promote AD has independent ramp-percentage state in PC6’s rollout config; PF1’s lifecycle state machine tracks each AD’s dev/test/promote/rollback label, not the active/target distinction.
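
The rollback bookkeeping described above can be sketched as follows. RolloutConfig and roll_back are hypothetical names; the real rollout config follows the PC6 §5.6 schema, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class RolloutConfig:
    """Hypothetical in-memory view of one sub-agent's rollout config."""
    active_commit: str
    # commit -> lifecycle state label ("promote" or "rollback")
    states: dict[str, str] = field(default_factory=dict)
    # commit -> last known ramp percentage
    ramp_percent: dict[str, int] = field(default_factory=dict)


def roll_back(config: RolloutConfig, target_commit: str) -> RolloutConfig:
    """Make target_commit active and move the current AD to rollback (in place)."""
    if target_commit == config.active_commit:
        raise ValueError("rollback target is already active")
    if config.states.get(target_commit) != "promote":
        # An Agent Definition in rollback is terminal and cannot be re-activated.
        raise ValueError("rollback target must still be in promote")
    config.states[config.active_commit] = "rollback"
    config.active_commit = target_commit
    return config
```

The target's own state and ramp percentage are left untouched, matching the rule that no state transition is required for the rollback target itself.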

Automatic ramp-down vs lifecycle rollback transition. PC5’s post-deploy regression response (PC5 §5.7) provides operational protection by automatically ramping a regressing sub-agent down to 0% on safety/refusal judge floor breach. That auto-ramp-down operates on the cohort flag’s ramp percentage: it pauses exposure without leaving promote. The lifecycle transition to rollback is a deliberate human decision after triage; the operator either advances the ramp percentage back up after a hotfix (lifecycle stays in promote) or moves the sub-agent to rollback durably. The intermediate operational state (auto-ramp-down has fired, rollback decision pending) is observable in the rollout config’s recorded ramp percentage, not as a separate lifecycle state.

5.5 Cohort-gated rollout

PF1 reuses PF8’s flag conventions and PC6’s cohort tuple discipline:

Flag convention (per PF8):

Naming: ai_assistant_subagent_<id> (kill-switch) and ai_assistant_subagent_<id>_cohort (cohort gating).
Lifecycle: created at the test → promote transition; retired when the sub-agent reaches 100% ramp inside promote AND operator decides the flag’s purpose is served.

Cohort tuple (per PC1 §5.5 + PC6 §5.7):

cohort = (
  agent_definition_version,
  prompt_block_registry_version,
  xml_prompt_component_registry_version,
  active_feature_flags,
  experiment_arm_assignment,
  active_sub_agents,                          # new in PF1
)

PF1 extends the cohort tuple with active_sub_agents — the set of sub-agent IDs the factory resolves as dispatchable for this cohort. Cardinality stays bounded (≤10 sub-agents at maturity per PC1 §11.3 verticals landscape).

Cache-friendly contract preserved: same-cohort users still share a stable prompt prefix per turn. Cohort cardinality grows by active_sub_agents set membership (typically a single-flag flip per sub-agent in flight), staying within the bound PC1 §5.5 commits.
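
One way to keep the extended cohort tuple usable as a cache key is a frozen dataclass with frozenset members. This is a sketch under the assumption that the flag and sub-agent collections are unordered sets of strings; the class name Cohort and the field types are illustrative, not the PC1/PC6 definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    """Hashable cohort key; equal cohorts can share a cached prompt prefix."""
    agent_definition_version: str
    prompt_block_registry_version: str
    xml_prompt_component_registry_version: str
    active_feature_flags: frozenset[str]
    experiment_arm_assignment: str
    active_sub_agents: frozenset[str]  # new in PF1
```

Using frozenset makes the key independent of the order in which flags or sub-agents were resolved, so two users with the same memberships map to the same cohort.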

Ramp curve. Declared per sub-agent in the rollout config, not platform-mandated. Fetch does not have a documented standard ramp curve, and PF1 does not invent one; experiment owners pick the curve appropriate to their sub-agent’s risk profile, time-to-stat-sig needs, and the observation window each step warrants. Considerations: safety-sensitive sub-agents warrant a smaller first step and longer observation windows; lower-risk sub-agents can ramp more aggressively. A platform-mandated default may be derived after several sub-agent rollouts produce empirical signal.

5.6 Decommission cleanup contract

Decommission cleanup (FR-6) runs after the lifecycle reaches rollback if the sub-agent is being permanently removed from the codebase (rather than paused for investigation). It is a five-step path:

Step 1: Remove the Agent Definition entry from agent_config.yaml::agents. This marks the sub-agent as no longer dispatchable.
Step 2: Update orchestrator sub_agents lists in any Agent Definition that referenced the decommissioned sub-agent. CI catches dangling references (FR-11).
Step 3: Delete the sub-agent’s Python module(s) from the codebase, or mark them deprecated for a transition window (NFR-5 commits up to one quarter).
Step 4: Clean up unique prompt blocks: files in prompts/components/ that were only referenced by the decommissioned sub-agent’s Agent Definition. Per PC1 §5.7 file-backed registry pattern.
Step 5: Operator decides on eval-suite + tool retention:
- Eval suite (PC5): dataset items, judge configs in evaluation_manifest.yaml’s categories. Operator chooses retain (for replay) or remove (cleanup).
- Tool dependencies (PS2 connectors): connectors used only by the decommissioned sub-agent. Operator chooses retain (for reuse by future sub-agents) or remove.

Audit trail: each decommission step emits a lifecycle audit event per FR-10. The decommission is reversible up to the point of code-level deletion (Step 3).
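
Finding the prompt blocks and tools that are unique to a decommissioned sub-agent is a set difference over the registry. The helper below is a hypothetical sketch (unique_artifacts is not an existing function), operating on the registry as a plain mapping of agent id to Agent Definition.

```python
from typing import Any, Mapping


def unique_artifacts(agents: Mapping[str, Mapping[str, Any]],
                     decommissioned_id: str, key: str) -> set[str]:
    """Return entries under `key` that only the decommissioned agent references."""
    own = set(agents[decommissioned_id].get(key, []))
    used_elsewhere: set[str] = set()
    for agent_id, definition in agents.items():
        if agent_id != decommissioned_id:
            used_elsewhere.update(definition.get(key, []))
    return own - used_elsewhere
```

Calling it with key "prompt_blocks" or "tools" yields the candidates for cleanup; shared entries such as a common safety block are excluded because another Agent Definition still references them.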

5.7 Audit trail

Per FR-10, every state transition emits:

{
  "event": "subagent.lifecycle.transition",
  "subagent_id": "earnings_coach",
  "source_state": "promote",
  "target_state": "rollback",
  "trigger": "operator_initiated",
  "agent_definition_commit": "abc123",
  "cohort": {
    "agent_definition_version": "5",
    "active_flags": ["ai_assistant_subagent_earnings_coach"],
    "ramp_step_percent": 100
  }
}

Footnote on ramp_step_percent: this is a snapshot at the time of the lifecycle event, not the trigger; see PC6 §5.10 for ramp-config events.

Emitted at factory invocation when a transition takes effect. Lands in PS5’s trace + event store per PS5 §boundary contract. Feeds:

PC6’s rollout dashboards (correlate prompt rollouts with sub-agent rollouts)
Per-vertical Grafana panels per PF8
Operational query path for “when did sub-agent X become active in cohort Y”

Ramp-percentage changes inside promote are NOT lifecycle transitions — they are recorded as PC6-rollout-config events per PC6 §5.10, not as PF1 audit events.
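
A minimal sketch of building the audit payload shown above as a plain dictionary. The helper name build_transition_event is hypothetical, and how the payload is attached to an OpenTelemetry trace or shipped to PS5 is out of scope here.

```python
from typing import Any, Mapping

ALLOWED_STATES = {"dev", "test", "promote", "rollback"}


def build_transition_event(subagent_id: str, source_state: str, target_state: str,
                           trigger: str, agent_definition_commit: str,
                           cohort: Mapping[str, Any]) -> dict[str, Any]:
    """Assemble the FR-10 fields into the event shape used in this section."""
    for state in (source_state, target_state):
        if state not in ALLOWED_STATES:
            raise ValueError(f"unknown lifecycle state: {state}")
    return {
        "event": "subagent.lifecycle.transition",
        "subagent_id": subagent_id,
        "source_state": source_state,
        "target_state": target_state,
        "trigger": trigger,
        "agent_definition_commit": agent_definition_commit,
        "cohort": dict(cohort),  # copy so later mutation cannot alter the event
    }
```

Validating the state names at construction time keeps ramp-percentage changes, which are not lifecycle transitions, from being emitted through this path by mistake.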

5.8 PF1 ↔ PC5 partnership (extends PC5 §5.9 with Trigger column)

Already detailed in §3.3. Restated for symmetry:

PF1 lifecycle transition or sub-state	PC5 gate(s)	Trigger
`dev → test`	None (developer-driven)	Developer ready for review
`test → promote` (PR merge + operator initiates ramp)	`pre_merge` (PR-driven, dataset-scoped)	PR merge
Ramp progression inside `promote` — first non-zero step	`pre_ramp` (PC6-driven, trace-sampled)	Operator advances ramp percentage from 0 to first non-zero step
Ramp progression inside `promote` — 100% step	`pre_full` (PC6-driven, larger trace sample)	Operator advances ramp percentage to 100%
`* → rollback`	None	Operator action OR kill-switch fires from any active state

PF1 §5.8 extends PC5 §5.9’s gate-mapping table with a Trigger column (the lifecycle action that fires each gate). PC5 §5.9 is canonical for gate definitions; PF1 §5.8 is canonical for the lifecycle-trigger pairing. Updates to gate definitions happen in PC5; updates to trigger semantics happen in PF1.
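The table above can be read as two lookups: one for lifecycle transitions and one for ramp steps inside promote. The sketch below encodes it that way. The function names are hypothetical, and two behaviours are assumptions the table does not state: intermediate ramp steps require no gate, and a direct jump from 0 to 100 requires both pre_ramp and pre_full.

```python
from typing import List


def gates_for_transition(source_state: str, target_state: str) -> List[str]:
    """PC5 gates required for a lifecycle transition (rows 1, 2, 5)."""
    if target_state == "rollback":
        return []  # `* -> rollback`: operator action or kill-switch, no gate
    if (source_state, target_state) == ("dev", "test"):
        return []  # developer-driven
    if (source_state, target_state) == ("test", "promote"):
        return ["pre_merge"]
    raise ValueError(f"no such lifecycle transition: {source_state} -> {target_state}")


def gates_for_ramp_step(previous_percent: int, next_percent: int) -> List[str]:
    """PC5 gates required when the ramp percentage advances inside promote."""
    gates: List[str] = []
    if previous_percent == 0 and next_percent > 0:
        gates.append("pre_ramp")  # first non-zero step
    if previous_percent < 100 and next_percent == 100:
        gates.append("pre_full")  # 100% step
    return gates


print(gates_for_transition("test", "promote"))
print(gates_for_ramp_step(0, 5))
print(gates_for_ramp_step(50, 100))
```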

5.9 Cross-sub-agent dependency enforcement

FR-11 commits a CI gate that blocks PRs whose changes would leave dangling sub-agent references. Concrete mechanism:

GitHub Actions workflow .github/workflows/subagent-validate-refs.yml runs on PRs that modify agent_config.yaml::agents or src/consumer_agent/agents/. The workflow invokes consumer_agent.subagents.validate_refs (new module; created as part of the PF1 implementation tracked in PLT-690) which:

(a) loads all active Agent Definitions;
(b) verifies every sub_agents: entry resolves to an Agent Definition NOT in rollback;
(c) for decommissions (an Agent Definition being removed), verifies no other active orchestrator’s sub_agents: list still references it;
(d) inspects the PR diff: if the PR modifies agent_config.yaml::agents[N] such that id is unchanged but the model field changed (to any value), or the tools list added or removed entries, or the prompt_blocks list added or removed entries (per FR-12: “model swap, prompt block additions/removals, tool list changes”), CI MUST fail with the message: “Major Agent Definition changes require a new `agent_id`; in-place mutation of promoted Agent Definitions is prohibited (FR-12).”

Failure of any check blocks merge.
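A minimal sketch of checks (b) through (d), assuming the registry is loaded into a dict keyed by sub-agent id and that each entry carries a state field. Both the dict shape and the state field are assumptions made for illustration; the real agent_config.yaml::agents layout and the validate_refs module interface are defined elsewhere.

```python
from typing import Any, Dict, List

Registry = Dict[str, Dict[str, Any]]

FR12_MESSAGE = (
    "Major Agent Definition changes require a new `agent_id`; "
    "in-place mutation of promoted Agent Definitions is prohibited (FR-12)."
)


def dangling_reference_errors(agents: Registry) -> List[str]:
    """Checks (b) and (c): sub_agents entries must resolve to a live definition."""
    errors: List[str] = []
    for agent_id, definition in agents.items():
        if definition.get("state") == "rollback":
            continue  # only active definitions are checked
        for ref in definition.get("sub_agents", []):
            target = agents.get(ref)
            if target is None:
                errors.append(f"{agent_id}: sub_agents entry '{ref}' has no Agent Definition")
            elif target.get("state") == "rollback":
                errors.append(f"{agent_id}: sub_agents entry '{ref}' is in rollback")
    return errors


def major_change_errors(before: Registry, after: Registry) -> List[str]:
    """Check (d): same id, but model, tools, or prompt_blocks changed."""
    errors: List[str] = []
    for agent_id, old in before.items():
        new = after.get(agent_id)
        if new is None:
            continue  # removals are covered by the dangling-reference check
        changed = (
            old.get("model") != new.get("model")
            or set(old.get("tools", [])) != set(new.get("tools", []))
            or set(old.get("prompt_blocks", [])) != set(new.get("prompt_blocks", []))
        )
        if changed:
            errors.append(f"{agent_id}: {FR12_MESSAGE}")
    return errors


def validate_refs(before: Registry, after: Registry) -> List[str]:
    """Return all errors; an empty list means the PR may merge."""
    return dangling_reference_errors(after) + major_change_errors(before, after)
```

A CI entry point would load the base-branch and PR-branch registries, call validate_refs, print the errors, and exit non-zero if the list is non-empty.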

6. Cross-Section Impact

Spec	Citation
PC1 (Agent Composition)	Inherits Agent Definition shape + factory-time binding + vertical integration model. PF1 commits the staged path to becoming active in the registry.
PC5 (Agent CI/CD Pipeline)	Partners on the lifecycle ↔ gate mapping (§5.8 / PC5 §5.9). PC5 owns gate semantics; PF1 owns lifecycle states.
PC6 (Agent Variant CI/CD)	Independent — PC6 is per-variant (Agent Definition fork; prompt-fork is the v1 worked example), PF1 is per-sub-agent. They share cohort discipline (PC1 §5.5 + PF1 §5.5).
PF5 (Vertical Scaffolding + Validation Tools)	Scaffolds new sub-agent stubs conforming to PF1’s lifecycle states. PF1 commits the state shape PF5 scaffolds toward.
PF8 (Feature Flag & Cross-Vertical Observability Conventions)	PF1 reuses PF8’s flag naming (`ai_assistant_subagent_<id>`), kill-switch lifecycle, required Grafana panels.
PS5 (Trace + Event Store)	Persists PF1’s lifecycle audit events for post-hoc analysis.
PS2 (Connector Framework)	Decommission cleanup (FR-6 Step 5) coordinates with PS2 on tool-connector cleanup.

7. Dependencies

Platform spec dependencies: PC1 (Agent Composition), PC5 (Agent CI/CD Pipeline).

Implementation dependencies:

LangChain v1, LangGraph (consumer-agent runtime)
Feature Flipper (cohort gating, kill-switch)
Opik (for audit trail correlation; partnered with PS5)

External dependencies: None.

Cross-section soft dependencies:

PF8 — flag naming + kill-switch conventions
PS5 — lifecycle audit-event persistence

8. Risks & Open Questions

8.1 Risks

R-1: Orphaned sub-agent references during decommission. Decommissioning sub-agent B while sub-agent A still references B in its sub_agents list breaks factory invocation. Mitigated by FR-11’s CI enforcement; AC-6 covers the test.

R-2: Mid-conversation kill-switch. A user mid-conversation when a sub-agent’s kill-switch flips off — does the user see torn state? Mitigated by FR-8’s in-flight tolerance (in-flight turns complete on the sub-agent they started with; new turns route without the rolled-back sub-agent).

R-3: Rollback target staleness. Same risk as PC6 OQ-6. If multiple rollbacks chain, the rollback target may itself be regressed. Mitigated by operator-managed multi-level rollback; PF1 doesn’t auto-chain.

R-4: Eval-suite + dataset cleanup ambiguity. Decommissioned sub-agents leave eval-suite artifacts (dataset items tagged with the sub-agent’s category, judge configs referencing it in the manifest). FR-6 Step 5 makes this operator-decided; the risk is that “retain for replay” becomes “forgotten forever” with no cleanup. Mitigated by NFR-5’s one-quarter decommission cleanup window forcing the decision.

R-5: Major-change re-baselining churn. FR-12 requires re-cycling on major Agent Definition changes (model swap, prompt block additions/removals, tool list changes). Defining “major” too strictly causes churn; too loosely misses regressions. Mitigated by operator-override annotation (AC-9); long-term, “major” criteria codified in PC5’s threshold conventions.

R-6: Cohort tuple cardinality from concurrent sub-agent rollouts. Each sub-agent in promote at an intermediate ramp percentage adds a flag dimension to the cohort tuple. N concurrent rollouts → 2^N cohorts in the worst case. Mitigated by the bounded ≤10 sub-agents at maturity (PC1 §11.3) and Feature Flipper’s stable-assignment property.
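The worst-case bound in R-6 is simple arithmetic: each sub-agent at an intermediate ramp percentage splits users into flag-on and flag-off, so N concurrent rollouts give 2^N flag combinations. A short illustration (the function name is hypothetical):

```python
def worst_case_cohorts(concurrent_rollouts: int) -> int:
    """Upper bound on cohort tuples when each rollout adds one on/off flag."""
    if concurrent_rollouts < 0:
        raise ValueError("concurrent_rollouts must be non-negative")
    return 2 ** concurrent_rollouts


for n in (1, 3, 10):
    print(n, worst_case_cohorts(n))
```

At the ≤10 sub-agent ceiling the bound is 1024 cohorts, and it is only reached if all ten are at an intermediate ramp percentage at the same time. Sub-agents at 0% or 100% contribute no split.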

8.2 Open Questions

OQ-1: RESOLVED. FR-12 defines “major change” concretely: model swap, prompt block additions/removals, tool list changes. Any of these requires authoring a NEW Agent Definition entering at dev. Finer-grained quantitative thresholds (e.g., % of system-prompt token churn) are deferred to PC5’s threshold conventions if needed later.

OQ-2: RESOLVED by NFR-5. Eval-suite retention follows NFR-5’s one-quarter decommission cleanup window: operator decides retain-vs-remove at decommission time; the decision is made within the NFR-5 window. FR-6 Step 5 carries the operator choice.

OQ-3: Cross-vertical orchestrator sub_agents list ownership. The orchestrator’s sub_agents list is platform-owned (PC1 §5.9 vertical integration model). When a vertical decommissions its sub-agent, the orchestrator’s list change crosses team boundaries. Lean: vertical drafts the PR; platform reviews. Codify in PF5’s scaffolding workflow. No external dependency.

OQ-4: Lifecycle state visibility — should there be a CLI command to list current state of every sub-agent (consumer-agent subagent list)? Lean: yes, low-effort, high operational value. No external dependency but worth a follow-on ticket.

OQ-5: Automatic advance vs operator-only. PF1’s design is operator-only between gates (no auto-advance). Lean: keep operator-only for v1; surface auto-advance as a follow-on if operational burden warrants. No external dependency.

9. Testing Strategy

9.1 Unit tests

State-machine transition validation: each valid transition succeeds; each invalid transition raises a clear error
Rollback target resolution: given an Agent Definition commit history, get_rollback_target returns the previous stable commit
Cohort tuple composition: active_sub_agents set is included in the cohort tuple; cardinality stays bounded
Kill-switch override: when the flag is off, factory invocation excludes the sub-agent regardless of other state
In-flight tolerance: simulated factory refresh mid-conversation doesn’t change the active sub-agent for that conversation
Decommission CI check: PR decommissioning sub-agent B with sub-agent A still referencing B in sub_agents is blocked
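The state-machine transition test in the list above presupposes a validator along these lines. This is a sketch, not the committed implementation: the class and function names are hypothetical, and treating `* → rollback` as "any state except rollback itself" is an assumption drawn from the §5.8 table.

```python
from enum import Enum


class LifecycleState(str, Enum):
    DEV = "dev"
    TEST = "test"
    PROMOTE = "promote"
    ROLLBACK = "rollback"


class InvalidTransition(ValueError):
    """Raised when a requested lifecycle transition is not allowed."""


# Forward transitions listed in the §5.8 table.
_FORWARD = {
    (LifecycleState.DEV, LifecycleState.TEST),
    (LifecycleState.TEST, LifecycleState.PROMOTE),
}


def validate_transition(source: LifecycleState, target: LifecycleState) -> None:
    """Return silently for a valid transition; raise InvalidTransition otherwise."""
    if target is LifecycleState.ROLLBACK and source is not LifecycleState.ROLLBACK:
        return  # `* -> rollback` (assumed to exclude rollback -> rollback)
    if (source, target) in _FORWARD:
        return
    raise InvalidTransition(
        f"{source.value} -> {target.value} is not a valid lifecycle transition"
    )


validate_transition(LifecycleState.DEV, LifecycleState.TEST)  # passes
try:
    validate_transition(LifecycleState.DEV, LifecycleState.PROMOTE)
except InvalidTransition as exc:
    print(exc)
```

The unit test then enumerates all state pairs and asserts that exactly the allowed ones pass.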

9.2 Integration tests

End-to-end promotion: new sub-agent → pre_merge (test → promote) → operator advances ramp percentage; pre_ramp fires at first non-zero step; pre_full fires at 100%
End-to-end rollback: sub-agent at 100% ramp in promote → operator initiates rollback → previous stable Agent Definition resolves at next factory invocation → NFR-2 budget verified
Kill-switch-initiated rollback: flag flips off → next factory invocation drops the sub-agent → lifecycle moves to rollback → in-flight conversations unaffected
Cohort-gated dispatch: cohort A gets the new sub-agent; cohort B doesn’t; trace events differentiate
Major-change re-baseline: PR with model swap → CI requires re-entry to test → all three PC5 gates run as ramp percentage advances
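The end-to-end rollback test depends on rollback target resolution. A minimal sketch of get_rollback_target, assuming the commit history is available oldest-first and each commit carries a boolean stable marker; the signature, the DefinitionCommit type, and how "stable" is determined are all assumptions for illustration:

```python
from typing import List, NamedTuple, Optional


class DefinitionCommit(NamedTuple):
    sha: str
    stable: bool  # hypothetical marker: this commit completed promotion cleanly


def get_rollback_target(
    history: List[DefinitionCommit], current_sha: str
) -> Optional[str]:
    """Return the most recent stable commit before current_sha, if any.

    `history` is ordered oldest first.
    """
    shas = [commit.sha for commit in history]
    if current_sha not in shas:
        raise ValueError(f"unknown commit: {current_sha}")
    index = shas.index(current_sha)
    # Walk backwards from the commit just before the current one.
    for commit in reversed(history[:index]):
        if commit.stable:
            return commit.sha
    return None  # nothing stable to roll back to; operator must decide


history = [
    DefinitionCommit("a1", stable=True),
    DefinitionCommit("b2", stable=False),
    DefinitionCommit("c3", stable=True),
    DefinitionCommit("d4", stable=False),
]
print(get_rollback_target(history, "d4"))
```

Returning None instead of guessing keeps the behaviour consistent with R-3: PF1 does not auto-chain rollbacks, so a missing target is surfaced to the operator.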

9.3 Eval coverage (Opik)

Per-sub-agent eval suites cover the sub-agent’s declared capabilities (per PC1 §5.2 + PC6 XML prompt components)
Lifecycle audit events visible in PS5; correlate with prompt-rollout (PC6) and DM-rollout (PD3) events for full-platform view

9.4 Contract tests

PC1: Agent Definition shape conforms to PC1 §5.2; promotion = registry activation per PC1 §5.3
PC5: lifecycle transition trigger calls PC5’s gate API per PC5 §5.8; verdict drives transition outcome
PF8: kill-switch flag naming + lifecycle matches PF8 conventions
PS5: lifecycle audit-event shape persists correctly in PS5’s store

9.5 Failure-mode testing

Factory refresh fails mid-rollback: in-flight conversations continue; next factory invocation retries the rollback target resolution
Feature Flipper unreachable: factory falls back to fail-closed (sub-agent absent from dispatch surface)
PC5 gate API outage during transition: transition halts; operator notification; trace event records the gate-API failure
Concurrent state-transition attempts (race): only one succeeds; the other observes the new state and proceeds or aborts cleanly
Decommission with eval-suite cleanup partially completed: re-run idempotent; cleanup completes on retry
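The "Feature Flipper unreachable" case above can be exercised against a lookup wrapper that fails closed. The sketch below takes the flag client as a plain callable so it does not assume anything about Feature Flipper's real API; subagent_enabled and the callable signature are hypothetical, while the flag name follows PF8's ai_assistant_subagent_<id> convention.

```python
from typing import Callable


def subagent_enabled(flag_lookup: Callable[[str], bool], subagent_id: str) -> bool:
    """Return True only if the sub-agent's flag is confirmed on."""
    flag_name = f"ai_assistant_subagent_{subagent_id}"
    try:
        return bool(flag_lookup(flag_name))
    except Exception:
        # Fail closed: if the flag service cannot be reached, the sub-agent
        # is left out of the dispatch surface. A real client would likely
        # catch a narrower exception type and log the failure.
        return False


def unreachable(flag_name: str) -> bool:
    raise ConnectionError("flag service unreachable")


print(subagent_enabled(lambda name: True, "earnings_coach"))
print(subagent_enabled(unreachable, "earnings_coach"))
```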

10. Rollout & Observability

10.1 Rollout phases

Phase 1 — Spec validation. PF1 reviewed and approved; cross-section contracts confirmed with PC1, PC5, PC6, PF5, PF8 reviewers.

Phase 2: State machine + audit event implementation. Add the lifecycle state machine to consumer-agent runtime; wire audit-event emission to PS5’s store. Changes to the existing agent_config.yaml and factory.py are shaped by §5.3.

Phase 3 — Cohort-gated rollout machinery. Wire Feature Flipper cohort + kill-switch lookups into factory invocation per §5.5. Cohort tuple extended with active_sub_agents.

Phase 4 — CI enforcement. Add CI checks per FR-11 (dangling sub-agent references) and FR-12 (major-change re-baseline). PF5 scaffolds the corresponding PR stub.

Phase 5 — First gated sub-agent rollout. Pick a low-risk new sub-agent (e.g., the first new vertical’s sub-agent after PF1 lands) and run it through the full lifecycle as the validation.

Phase 6 — Decommission worked example. When a deployed sub-agent is permanently removed (replaced by another sub-agent, or no longer needed), run the decommission cleanup contract as the first end-to-end exercise.

10.2 Observability metrics

subagent.lifecycle.transition_total by subagent_id, source_state, target_state, trigger — transition volume per sub-agent per state
subagent.lifecycle.gate_failed_total by subagent_id, milestone — gate failure rate per sub-agent per milestone; correlate with PC5’s gate metrics
subagent.lifecycle.kill_switch_fired_total by subagent_id — kill-switch firing rate
subagent.lifecycle.rollback_duration_seconds by subagent_id — rollback latency (feeds NFR-2)
subagent.lifecycle.decommission_cleanup_pending_total — decommissioned sub-agents with cleanup not yet complete (FR-6 + NFR-5)
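To make the label sets concrete, the first counter above can be modelled as a count keyed by its label tuple. This uses an in-memory collections.Counter as a stand-in for whatever metrics client the runtime actually uses; record_transition is a hypothetical helper.

```python
from collections import Counter
from typing import Tuple

# Stand-in for subagent.lifecycle.transition_total; keys are
# (subagent_id, source_state, target_state, trigger).
transition_total: Counter = Counter()


def record_transition(
    subagent_id: str, source_state: str, target_state: str, trigger: str
) -> Tuple[str, str, str, str]:
    """Increment the transition counter for one label combination."""
    labels = (subagent_id, source_state, target_state, trigger)
    transition_total[labels] += 1
    return labels


record_transition("earnings_coach", "test", "promote", "pr_merge")
record_transition("earnings_coach", "promote", "rollback", "operator_initiated")
record_transition("earnings_coach", "promote", "rollback", "operator_initiated")

# Rollbacks per sub-agent, aggregated across triggers.
rollbacks = sum(
    count for labels, count in transition_total.items() if labels[2] == "rollback"
)
print(rollbacks)
```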

10.3 Rollback

PF1 is a contract spec, not deployable code. Rollback semantics apply at three layers:

Sub-agent-level rollback: per §5.4. Standard operational lever.
CI gate rollback: if a CI check (FR-11 / FR-12) produces false positives, disable in CI config independently of the spec.
Architecture-level rollback: reverting the in-process sub-agent model would require revisiting PC1 §3.3’s factory-time binding decision. Not expected.

11. Appendix

11.1 Source references

PC1: Agent Composition — Agent Definition shape, factory-time binding, vertical integration
PC5: Agent CI/CD Pipeline — eval-gate definitions PF1 references
PC6: Agent Variant CI/CD + Experiment-Gated Rollout — shared cohort discipline
Platform Spec Lab — Wave 1 sequencing; PF1 scope row
consumer-agent/agent_config.yaml — production sub-agent registry
consumer-agent/src/consumer_agent/factory.py — factory invocation logic
consumer-agent/src/consumer_agent/utils/feature_flags.py — Feature Flipper integration

11.2 Decisions resolved during design

#	Decision	Resolution
1	Lifecycle scope	Sub-agents only (Scout, Shopping, etc.) — not generic mesh-agent runtime.
2	Promotion semantics	Registry activation via `agent_config.yaml` + Feature Flipper cohort flag. No deploy, no image build, no ECS rollout.
3	Rollback semantics	Config revert + factory refresh. NFR-2 commits a 5-minute target. Matches PC5/PC6 rollback patterns.
4	Cohort-gated rollout	Feature Flipper flag per sub-agent (`ai_assistant_subagent_<id>`); cohort tuple extended with `active_sub_agents` per §5.5.
5	Decommission contract	Five-step cleanup (FR-6): config removal → reference cleanup → code deletion → prompt-block cleanup → eval/tool operator decision. NFR-5 commits one-quarter cleanup window.
6	Major-change re-baselining	FR-12 requires re-cycling on major Agent Definition changes (model swap, prompt block additions/removals, tool list changes). Operator override annotation for minor changes.
7	Lifecycle state machine	Four states: `dev`, `test`, `promote`, `rollback`. Operational ramp-percentage progression inside `promote` is recorded in the PC6 experiment config, not as separate lifecycle states. Transitions are gate-driven or operator-driven, never automatic.
8	In-flight tolerance	In-flight turns complete on the sub-agent they started with; new turns route to current state. Matches PC1 §5.5 factory-time binding.

11.3 Migration receipts

From PC1 §5.2 (Agent Definition shape): PF1 inherits the declarative contract; commits the staged path to registry activation.
From PC5 §5.9 (PF1-partnership table): PF1 §3.3 + §5.8 mirror the table for readability; PC5 is the canonical commitment.
No content migration from other specs: PF1’s lifecycle states are new commitments codifying what’s been informal.