
GSI Deployment Guide: GSI_EpisodeMessages

This deployment adds a Global Secondary Index (GSI_EpisodeMessages) to the consumer-agent-history DynamoDB table to make episode validation queries efficient. Reads through the index are gated by the Feature Flipper flag consumer_agent_use_episode_gsi, so the service keeps serving requests with zero downtime while DynamoDB backfills the index.

Changes in this deployment:

  • consumer-agent.yml:
    • Added EpisodeId attribute definition
    • Added GSI_EpisodeMessages index (EpisodeId HASH + CreatedAt RANGE)
  • Feature Flipper integration: the flag consumer_agent_use_episode_gsi is now managed in the Feature Flipper service
    • Environment-level flag (no user segments): applies to all requests in an environment
    • Default: enabled: false in Feature Flipper
    • Environment variable override: FORCE_USE_EPISODE_GSI=true or FORCE_USE_EPISODE_GSI=false takes precedence over the Feature Flipper value, for immediate control
  • feature_flags.py: added the FeatureFlag.EPISODE_GSI enum member
  • storage.py: updated to use the Feature Flipper integration:
    • Checks the cached Feature Flipper value at runtime
    • Supports the environment variable override
    • When enabled: true: queries GSI_EpisodeMessages by EpisodeId (O(1): cost does not grow with the user's message history)
    • When enabled: false: uses a FilterExpression on EpisodeId (O(N) in the user's message count; works while the GSI is still backfilling)
  • api/main.py: added a startup event that loads the environment-wide flags from Feature Flipper
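The storage.py selection logic described above can be sketched as follows. This is a minimal illustration, not the actual storage.py: the function names, the dictionary standing in for the startup-loaded Feature Flipper cache, the `UserId` partition key, the string type of `EpisodeId`, and the `Limit: 1` existence check are all assumptions. It only builds keyword arguments in the shape accepted by boto3's low-level DynamoDB `client.query()`, so it runs without AWS access.

```python
import os
from typing import Any, Dict, Mapping, Tuple

FLAG_NAME = "consumer_agent_use_episode_gsi"
ENV_OVERRIDE = "FORCE_USE_EPISODE_GSI"
INDEX_NAME = "GSI_EpisodeMessages"


def use_episode_gsi(flag_cache: Mapping[str, bool], env: Mapping[str, str] = os.environ) -> bool:
    """Resolve the flag: env override first, then the cached Feature Flipper value, else False."""
    override = env.get(ENV_OVERRIDE, "").strip().lower()
    if override in ("true", "false"):
        return override == "true"
    # flag_cache is a hypothetical stand-in for values loaded once at startup.
    return bool(flag_cache.get(FLAG_NAME, False))


def build_episode_query(
    table: str, user_id: str, episode_id: str, use_gsi: bool
) -> Tuple[str, Dict[str, Any]]:
    """Return (method label, kwargs for a boto3 DynamoDB client.query call)."""
    if use_gsi:
        # Reads only items whose EpisodeId matches; one item is enough to validate.
        return "GSI", {
            "TableName": table,
            "IndexName": INDEX_NAME,
            "KeyConditionExpression": "EpisodeId = :eid",
            "ExpressionAttributeValues": {":eid": {"S": episode_id}},  # assumes EpisodeId is type S
            "Limit": 1,
        }
    # Reads every item in the user's partition; the filter runs after the read.
    return "FilterExpression", {
        "TableName": table,
        "KeyConditionExpression": "UserId = :uid",  # assumed partition key name
        "FilterExpression": "EpisodeId = :eid",
        "ExpressionAttributeValues": {":uid": {"S": user_id}, ":eid": {"S": episode_id}},
    }
```

With boto3 the result would be passed as `boto3.client("dynamodb").query(**kwargs)`. `Limit` is set only on the GSI path because DynamoDB applies `Limit` before `FilterExpression`, so a limited filtered query could miss a matching item; the filtered path would also need to follow `LastEvaluatedKey` across pages, which this sketch omits.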

Phase 1: Deploy Infrastructure + Code (GSI Disabled)

  1. Merge PR with all changes

  2. Deploy to Stage

    # Deploy consumer-agent.yml + code
    # GSI_EpisodeMessages will be created in "CREATING" state
  3. Monitor GSI Backfill

    aws dynamodb describe-table \
    --table-name stage-consumer-agent-history \
    --profile stage-services \
    --query 'Table.GlobalSecondaryIndexes[?IndexName==`GSI_EpisodeMessages`].IndexStatus'

    Wait for status: "ACTIVE"

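  4. Enable GSI in Stage

    Set enabled: true for consumer_agent_use_episode_gsi in Feature Flipper (or set FORCE_USE_EPISODE_GSI=true), then confirm the logs show method=GSI. If they still show method=FilterExpression, restart the service so the cached flag value is reloaded.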

  5. Deploy to Prod

    # Deploy consumer-agent.yml + code
    # GSI_EpisodeMessages will be created in "CREATING" state
  6. Monitor GSI Backfill in Prod

    aws dynamodb describe-table \
    --table-name prod-consumer-agent-history \
    --query 'Table.GlobalSecondaryIndexes[?IndexName==`GSI_EpisodeMessages`].IndexStatus'

    Wait for status: "ACTIVE" (may take hours depending on table size)

  7. Enable GSI in Prod
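The describe-table polling in steps 3 and 6 can be automated. A minimal Python sketch: the function names, the 60-second poll interval, and the three-day timeout are hypothetical choices. `describe_table` is passed in so the loop can be tested without AWS; in real use it would be the boto3 DynamoDB client method of the same name, whose response lists each index's `IndexName` and `IndexStatus` under `Table.GlobalSecondaryIndexes`.

```python
import time
from typing import Any, Callable, Mapping, Optional


def index_status(description: Mapping[str, Any], index_name: str) -> Optional[str]:
    """Return the IndexStatus of index_name from a DescribeTable response, or None."""
    for gsi in description.get("Table", {}).get("GlobalSecondaryIndexes", []):
        if gsi.get("IndexName") == index_name:
            return gsi.get("IndexStatus")
    return None


def wait_for_index_active(
    describe_table: Callable[..., Mapping[str, Any]],
    table_name: str,
    index_name: str = "GSI_EpisodeMessages",
    poll_seconds: float = 60.0,
    timeout_seconds: float = 3 * 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll DescribeTable until the index is ACTIVE; raise if it is missing or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = index_status(describe_table(TableName=table_name), index_name)
        if status is None:
            raise RuntimeError(f"{index_name} not found on {table_name}")
        if status == "ACTIVE":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{index_name} still {status} after {timeout_seconds}s")
        print(f"{index_name}: {status}, checking again in {poll_seconds}s")
        sleep(poll_seconds)
```

For stage, the call would look like `wait_for_index_active(boto3.Session(profile_name="stage-services").client("dynamodb").describe_table, "stage-consumer-agent-history")`. A missing index is treated as an error rather than waited on, so a deployment that never created the GSI fails loudly.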

If issues are detected after enabling the GSI:

  1. Immediate Rollback: Disable GSI

    • Option 1 (fastest): set the environment variable FORCE_USE_EPISODE_GSI=false and restart the service
    • Option 2: toggle enabled: false in the Feature Flipper UI
    • Option 3: restart the service to clear the flag cache (it then uses Feature Flipper’s current value)
    • Once the flag is off, all requests fall back to FilterExpression (the original behavior)
  2. Remove GSI (if needed)

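    • First confirm the flag is off and the logs show only method=FilterExpression, because queries against a deleted index fail
    • Update consumer-agent.yml to remove the GSI definition
    • Redeploy infrastructure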
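To verify which path is active, check the episode-validation logs (these are debug-level messages, so debug logging must be enabled).

Before enabling the GSI (flag off):

# Logs should show: method=FilterExpression for ALL users
logger.debug("Validated existing episode ... method=FilterExpression")

After enabling the GSI (flag on):

# Logs should show: method=GSI for ALL users
logger.debug("Validated existing episode ... method=GSI")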
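To confirm that every request takes the expected path (only GSI after enabling, only FilterExpression after a rollback), the validation lines can be tallied. A minimal sketch that assumes each line contains the text "Validated existing episode" and a `method=<name>` token, as in the log statements above; how the lines are exported (CloudWatch Logs, files) is left out, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches episode-validation lines and captures the method label.
METHOD_RE = re.compile(r"Validated existing episode.*?method=(\w+)")


def count_validation_methods(lines: Iterable[str]) -> Counter:
    """Count method=... values on episode-validation log lines; other lines are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = METHOD_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

After enabling the flag, a healthy environment should yield only a `GSI` key; any remaining `FilterExpression` entries point to an instance still running with the old cached value.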
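  • FilterExpression: reads every message for the user and discards non-matching ones afterwards; read units are charged for all items read, so cost grows with the user's message count (O(N))
  • GSI: direct lookup by EpisodeId; cost does not grow with the user's message count (O(1))
  • Monitor the CloudWatch metric ConsumedReadCapacityUnits for reduced read consumption (reads on the index are reported separately under the GlobalSecondaryIndexName dimension)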
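  • Stage backfill: ~minutes (small dataset)
  • Prod backfill: ~hours to days (depends on table size)
    • DynamoDB backfills at roughly 10-15 WCUs (about 10-15 items per second for items up to 1 KB)
    • Estimate: (Total Items / 10) seconds
    • Treat this as a rough guide: actual time also depends on the size of the projected attributes, the index's write capacity, and write traffic on the table during the backfill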
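The backfill estimate is plain arithmetic. A minimal sketch that treats the guide's ~10-15 WCUs as 10-15 items per second (a rule of thumb, not a DynamoDB guarantee) and uses a hypothetical 1,000,000-item table; a real item count could come from `ItemCount` in the DescribeTable response, which DynamoDB refreshes only about every six hours.

```python
def estimate_backfill_seconds(total_items: int, items_per_second: float = 10.0) -> float:
    """Rule of thumb from this guide: (Total Items / 10) seconds."""
    if total_items < 0 or items_per_second <= 0:
        raise ValueError("total_items must be >= 0 and items_per_second must be > 0")
    return total_items / items_per_second


def format_duration(seconds: float) -> str:
    """Format seconds as 'Hh Mm', dropping leftover seconds."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    return f"{hours}h {remainder // 60}m"


# Hypothetical 1,000,000-item table at the low and high ends of the quoted rate.
for rate in (10.0, 15.0):
    print(rate, format_duration(estimate_backfill_seconds(1_000_000, rate)))
# 10.0 27h 46m
# 15.0 18h 31m
```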

✅ Feature flag defaults to false (safe)
✅ Code works with both GSI and FilterExpression
✅ No breaking changes to API contracts
✅ Backward compatible (the GSI can be disabled at any time)
✅ DynamoDB handles the backfill automatically

Watch for these metrics after enabling GSI:

  1. Error Rates: Should remain unchanged
  2. Response Times: May improve for episode validation
  3. DynamoDB Read Units: Should decrease for get_or_create_episode() calls
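  4. Logs: Check for “Episode not found” errors (should be rare; GSI reads are always eventually consistent, so an episode written moments earlier may briefly be missing from the index)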
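The expected drop in read units (item 3 above) follows from how DynamoDB charges a Query: it sums the size of every item read, rounds up to the next 4 KB, and halves the result for eventually consistent reads. Items discarded by a FilterExpression still count. A minimal sketch of that arithmetic with hypothetical numbers (800 messages of ~1 KB for one user, 5 of them in the episode), assuming both paths use eventually consistent reads and the index projects items at roughly their base-table size.

```python
import math
from typing import Sequence


def query_read_units(item_sizes_bytes: Sequence[int], eventually_consistent: bool = True) -> float:
    """Read units for one Query page: total size of items READ, rounded up to 4 KB.

    Items removed by a FilterExpression still count because the filter runs after
    the read. Paging (1 MB per page) and any per-request minimum are ignored here.
    """
    units = math.ceil(sum(item_sizes_bytes) / 4096)
    return units / 2 if eventually_consistent else float(units)


# Hypothetical user: 800 messages of ~1 KB, 5 of which belong to the episode.
filter_path = query_read_units([1024] * 800)  # FilterExpression reads all 800 messages
gsi_path = query_read_units([1024] * 5)       # GSI query reads only the episode's items
print(filter_path, gsi_path)  # 100.0 1.0
```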