Flag & Observability Conventions

One-page operator reference for the cross-vertical conventions every vertical in the consumer-agent stack inherits. Source: spec PR #300.

Flag naming grammar

Every feature flag follows:

ai_assistant_<domain>_<purpose>[_<modifier>]

<domain> — one of the enumerated vertical names (shop, rewards, play, ereceipts, offerdetails, restaurant, retailer, support), a platform component (orchestrator, gateway, worker, relay), a special scope (pool, infra, user), or a sub-agent id (subagent_<id>).
<purpose> — one of experiment, pool, killswitch, enabled, ramp, test.
<modifier> (optional) — version (v3), variant, or platform constraint.

The validator picks the last recognized purpose token. So ai_assistant_rewards_v3_experiment_killswitch parses as purpose=killswitch, domain rewards_v3_experiment.

Enforced by: scripts/lint_flag_names.py (wired into make lint + pre-commit). Legacy consumer_agent_* flags are grandfathered via LEGACY_FLAG_NAMES in src/consumer_agent/platform/flag_naming.py.

Lifecycle (five computed states)

created → ramping → fully-rolled-out → eligible-for-removal → removed

State is computed from Flipper API + code-grep + flip history, never authored. A flag at 100% for ≥30 days with zero churn moves to eligible-for-removal. Cleanup PRs follow the PLT-553 / PLT-554 working example.

Kill-switch convention

Every production-impacting flag must have a paired flag with the _killswitch suffix in the same enum. Example:

Cohort/test: ai_assistant_rewards_v3_experiment
Paired killswitch: ai_assistant_rewards_v3_experiment_killswitch

The killswitch is operator-flippable (no engineering approval needed at flip time). Defaults ON. OFF immediately drops to the previous stable state at next factory invocation.

Internal-only flags (FORCE_* env-var overrides, debug toggles) are exempt — register them in INTERNAL_ONLY_FLAGS in flag_naming.py with a one-line comment.

Enforced by: scripts/lint_kill_switch_presence.py.

Parent/child pool-test pattern

ai_assistant_user_pool (platform-owned)
├── ai_assistant_rewards_v3_test (rewards-owned)
├── ai_assistant_shop_pricing_test (shop-owned)
└── ai_assistant_restaurant_v1_test (restaurant-owned)

Cohort key is user_id. Sticky assignment via Feature Flipper. Multiple test flags hang off one pool without coordination.

Required Grafana panels per vertical

Every vertical with production traffic has:

Engagement — turns/episode, attach rate, intent-completion, p50/p95 latency.
Judge scores — per-judge score distribution from PC5; threshold lines visible.
Kill-switch state — current state of every kill-switch scoped to the vertical; flips in the last 7 days.

Verticals that emit DM types additionally have:

DM funnel (per PLT-629) — Targeted → Sent → Delivered → Displayed → Tapped → CTA; sliced by dm_type and experiment_arm.

Templates live in PLT-628’s implementation repo; per-vertical dashboards inherit them.

Five mandatory slicing dimensions

Every per-traffic metric label, span attribute, and PS5 event-store column carries:

Dimension	Source	Null when
`vertical`	`consumer_agent.platform.verticals`	Never null; `platform` for cross-vertical orchestrator/gateway
`sub_agent_id`	PC1 Agent Definition `id`	Non-agent emissions (e.g., gateway routing)
`dm_type`	PD3 DM type id	Chat-flow traffic
`experiment_arm`	PC6 cohort assignment	No experiment active
`agent_definition_version`	PC1 + PF1	Non-Agent-Definition emissions

Import from consumer_agent.platform.slicing_dimensions:

from consumer_agent.platform.slicing_dimensions import (
    VERTICAL, SUB_AGENT_ID, DM_TYPE, EXPERIMENT_ARM, AGENT_DEFINITION_VERSION,
)

Enforced by: scripts/lint_metric_dimensions.py. Existing sites that pre-date the convention are in LEGACY_METRIC_SITES; they shrink toward zero as code is touched. Sites that are genuinely not per-traffic opt out with a # convention: not-per-traffic line comment.

Cardinality denylist

Raw user_id, request_id, principal, session_id are forbidden as label keys on metric/span emissions (FR-8). For debug-trail attribution use hashed_user_id():

from consumer_agent.platform.hashing import hashed_user_id

counter.add(1, {
    VERTICAL: "rewards",
    ...
    "user_hash": hashed_user_id(user_id),  # bounded-cardinality
})

Even hashed identifiers shouldn’t appear on metric labels — only on trace attributes where operational cardinality bounds can be enforced.

Enforced by: scripts/lint_metric_cardinality.py.

Required alerts per vertical

Wired to Rootly (per PLT-628 alert-routing track):

Alert	Condition	Severity
Safety/refusal judge floor breach	`eval.judge.score_distribution{judge_classification=safety}` < floor	page
Quality judge regression	same, `quality` < baseline - tolerance	notify
Kill-switch flipped OFF	flag state change observed	page
Engagement drop	engagement metric drops > threshold% vs baseline	notify
DM funnel anomaly (DM verticals only)	tap rate or CTA rate vs baseline > threshold%	notify

Noise budget: ≤ 1 page per on-call week at steady state.

Onboarding a new vertical

See .claude/skills/onboarding-a-vertical-to-conventions/SKILL.md for the five-step sequence. PF5 (Vertical Scaffolding + Validation) invokes it.

Where the code lives

Surface	Path
Domain enum	`src/consumer_agent/platform/verticals.py`
Dimension constants	`src/consumer_agent/platform/slicing_dimensions.py`
Flag-name validator	`src/consumer_agent/platform/flag_naming.py`
Legacy allow-lists	`src/consumer_agent/platform/legacy_metric_sites.py`
Hashing helper	`src/consumer_agent/platform/hashing.py`
Lint scripts	`scripts/lint_flag_names.py`, `lint_kill_switch_presence.py`, `lint_metric_dimensions.py`, `lint_metric_cardinality.py`

Go-side parity lives at internal/platform/ in consumer-graph-worker, rover-agent, and consumer-context-service.