Skip to content

Flag & Observability Conventions

One-page operator reference for the cross-vertical conventions every vertical in the consumer-agent stack inherits. Source: spec PR #300.

Every feature flag follows:

ai_assistant_<domain>_<purpose>[_<modifier>]
  • <domain> — one of the enumerated vertical names (shop, rewards, play, ereceipts, offerdetails, restaurant, retailer, support), a platform component (orchestrator, gateway, worker, relay), a special scope (pool, infra, user), or a sub-agent id (subagent_<id>).
  • <purpose> — one of experiment, pool, killswitch, enabled, ramp, test.
  • <modifier> (optional) — version (v3), variant, or platform constraint.

The validator picks the last recognized purpose token. So ai_assistant_rewards_v3_experiment_killswitch parses as purpose=killswitch, domain rewards_v3_experiment.

Enforced by: scripts/lint_flag_names.py (wired into make lint + pre-commit). Legacy consumer_agent_* flags are grandfathered via LEGACY_FLAG_NAMES in src/consumer_agent/platform/flag_naming.py.

created → ramping → fully-rolled-out → eligible-for-removal → removed

State is computed from Flipper API + code-grep + flip history, never authored. A flag at 100% for ≥30 days with zero churn moves to eligible-for-removal. Cleanup PRs follow the PLT-553 / PLT-554 working example.

Every production-impacting flag must have a paired flag with the _killswitch suffix in the same enum. Example:

  • Cohort/test: ai_assistant_rewards_v3_experiment
  • Paired killswitch: ai_assistant_rewards_v3_experiment_killswitch

The killswitch is operator-flippable (no engineering approval needed at flip time). Defaults ON. OFF immediately drops to the previous stable state at next factory invocation.

Internal-only flags (FORCE_* env-var overrides, debug toggles) are exempt — register them in INTERNAL_ONLY_FLAGS in flag_naming.py with a one-line comment.

Enforced by: scripts/lint_kill_switch_presence.py.

ai_assistant_user_pool (platform-owned)
├── ai_assistant_rewards_v3_test (rewards-owned)
├── ai_assistant_shop_pricing_test (shop-owned)
└── ai_assistant_restaurant_v1_test (restaurant-owned)

Cohort key is user_id. Sticky assignment via Feature Flipper. Multiple test flags hang off one pool without coordination.

Every vertical with production traffic has:

  1. Engagement — turns/episode, attach rate, intent-completion, p50/p95 latency.
  2. Judge scores — per-judge score distribution from PC5; threshold lines visible.
  3. Kill-switch state — current state of every kill-switch scoped to the vertical; flips in the last 7 days.

Verticals that emit DM types additionally have:

  1. DM funnel (per PLT-629) — Targeted → Sent → Delivered → Displayed → Tapped → CTA; sliced by dm_type and experiment_arm.

Templates live in PLT-628’s implementation repo; per-vertical dashboards inherit them.

Every per-traffic metric label, span attribute, and PS5 event-store column carries:

DimensionSourceNull when
verticalconsumer_agent.platform.verticalsNever null; platform for cross-vertical orchestrator/gateway
sub_agent_idPC1 Agent Definition idNon-agent emissions (e.g., gateway routing)
dm_typePD3 DM type idChat-flow traffic
experiment_armPC6 cohort assignmentNo experiment active
agent_definition_versionPC1 + PF1Non-Agent-Definition emissions

Import from consumer_agent.platform.slicing_dimensions:

from consumer_agent.platform.slicing_dimensions import (
VERTICAL, SUB_AGENT_ID, DM_TYPE, EXPERIMENT_ARM, AGENT_DEFINITION_VERSION,
)

Enforced by: scripts/lint_metric_dimensions.py. Existing sites that pre-date the convention are in LEGACY_METRIC_SITES; they shrink toward zero as code is touched. Sites that are genuinely not per-traffic opt out with a # convention: not-per-traffic line comment.

Raw user_id, request_id, principal, session_id are forbidden as label keys on metric/span emissions (FR-8). For debug-trail attribution use hashed_user_id():

from consumer_agent.platform.hashing import hashed_user_id
counter.add(1, {
VERTICAL: "rewards",
...
"user_hash": hashed_user_id(user_id), # bounded-cardinality
})

Even hashed identifiers shouldn’t appear on metric labels — only on trace attributes where operational cardinality bounds can be enforced.

Enforced by: scripts/lint_metric_cardinality.py.

Wired to Rootly (per PLT-628 alert-routing track):

AlertConditionSeverity
Safety/refusal judge floor breacheval.judge.score_distribution{judge_classification=safety} < floorpage
Quality judge regressionsame, quality < baseline - tolerancenotify
Kill-switch flipped OFFflag state change observedpage
Engagement dropengagement metric drops > threshold% vs baselinenotify
DM funnel anomaly (DM verticals only)tap rate or CTA rate vs baseline > threshold%notify

Noise budget: ≤ 1 page per on-call week at steady state.

See .claude/skills/onboarding-a-vertical-to-conventions/SKILL.md for the five-step sequence. PF5 (Vertical Scaffolding + Validation) invokes it.

SurfacePath
Domain enumsrc/consumer_agent/platform/verticals.py
Dimension constantssrc/consumer_agent/platform/slicing_dimensions.py
Flag-name validatorsrc/consumer_agent/platform/flag_naming.py
Legacy allow-listssrc/consumer_agent/platform/legacy_metric_sites.py
Hashing helpersrc/consumer_agent/platform/hashing.py
Lint scriptsscripts/lint_flag_names.py, lint_kill_switch_presence.py, lint_metric_dimensions.py, lint_metric_cardinality.py

Go-side parity lives at internal/platform/ in consumer-graph-worker, rover-agent, and consumer-context-service.