GSI Deployment Guide: GSI_EpisodeMessages
Overview
This deployment adds a Global Secondary Index (`GSI_EpisodeMessages`) to the `consumer-agent-history` DynamoDB table for efficient episode validation queries. The new query path is gated by the Feature Flipper flag `consumer_agent_use_episode_gsi`, so the service keeps using the existing query until the GSI backfill completes, with zero downtime.
What Changed
Infrastructure (consumer-agent.yml)
- Added `EpisodeId` attribute definition
- Added `GSI_EpisodeMessages` index (`EpisodeId` HASH + `CreatedAt` RANGE)
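After the stack deploys, one way to confirm that the index matches the definition above is to inspect the DescribeTable response. The sketch below assumes the response shape returned by boto3's DynamoDB client `describe_table` and checks only the key schema listed here; the helper name `has_expected_gsi` is hypothetical, and projection or throughput settings are not checked.

```python
from typing import Any, Dict

# Key schema taken from the infrastructure change above.
EXPECTED_KEY_SCHEMA = [
    {"AttributeName": "EpisodeId", "KeyType": "HASH"},
    {"AttributeName": "CreatedAt", "KeyType": "RANGE"},
]


def has_expected_gsi(
    table_description: Dict[str, Any],
    index_name: str = "GSI_EpisodeMessages",
) -> bool:
    """Return True if the named GSI exists with the expected HASH/RANGE keys."""
    for gsi in table_description["Table"].get("GlobalSecondaryIndexes", []):
        if gsi["IndexName"] == index_name:
            return gsi["KeySchema"] == EXPECTED_KEY_SCHEMA
    # Index not present (e.g. the deploy has not run yet).
    return False
```

For example, pass it `boto3.client("dynamodb").describe_table(TableName="stage-consumer-agent-history")`. The check works while the index is still `CREATING`, so it can catch a wrong key schema before the backfill finishes.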
Configuration
- Feature Flipper integration: flag `consumer_agent_use_episode_gsi` is now managed in the Feature Flipper service
- Environment-level flag (no user segments): applies to all requests in an environment
- Default: `enabled: false` in Feature Flipper
- Environment variable override: `FORCE_USE_EPISODE_GSI=true` (or `false`) takes precedence over the Feature Flipper value; it takes effect after a redeploy or restart
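The precedence described above (environment variable first, then the cached Feature Flipper value, then the safe default) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `feature_flags.py` or `storage.py`: the enum value, the accepted spellings of the override, and the `cached_flags` mapping (standing in for whatever values the service loads from Feature Flipper) are all hypothetical.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class FeatureFlag(Enum):
    # Hypothetical mirror of the enum in feature_flags.py; assumes the member's
    # value is the Feature Flipper flag name.
    EPISODE_GSI = "consumer_agent_use_episode_gsi"


def parse_override(raw: Optional[str]) -> Optional[bool]:
    """Interpret FORCE_USE_EPISODE_GSI. None means "no override"."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    return None  # Assumption: unrecognized values are ignored rather than rejected.


def use_episode_gsi(
    cached_flags: Mapping[str, bool],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Decide whether episode validation should query GSI_EpisodeMessages."""
    # 1. The environment variable wins, so operators can force either path.
    override = parse_override(env.get("FORCE_USE_EPISODE_GSI"))
    if override is not None:
        return override
    # 2. Otherwise use the cached Feature Flipper value.
    #    A missing entry defaults to False, matching the safe default.
    return bool(cached_flags.get(FeatureFlag.EPISODE_GSI.value, False))
```

Passing `env` explicitly keeps the decision easy to unit-test; in the running service it defaults to the process environment, which is why changing the variable needs a restart or redeploy to be seen.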
Code Changes
Section titled “Code Changes”feature_flags.py: AddedFeatureFlag.EPISODE_GSIenumstorage.py: Updated to use Feature Flipper integration:- Checks Feature Flipper cached value at runtime
- Environment variable override support
- When
enabled: true: Use GSI query (O(1) - efficient) - When
enabled: false: Use FilterExpression (O(N) - works during backfill)
api/main.py: Added startup event to initialize environment-wide flags from Feature Flipper
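The two query paths in `storage.py` might look roughly like the sketch below, which builds keyword arguments for a boto3 DynamoDB client `query` call. Only the index name and `EpisodeId` come from this guide; the base-table partition key name `UserId`, the string (`S`) attribute types, and the function name `build_episode_query` are hypothetical.

```python
from typing import Any, Dict

GSI_NAME = "GSI_EpisodeMessages"


def build_episode_query(
    table_name: str, user_id: str, episode_id: str, use_gsi: bool
) -> Dict[str, Any]:
    """Return kwargs for a boto3 DynamoDB client `query` call (hypothetical helper)."""
    if use_gsi:
        # Query the GSI directly by EpisodeId. Every returned item matches the
        # key condition, so Limit=1 is enough for an existence check.
        return {
            "TableName": table_name,
            "IndexName": GSI_NAME,
            "KeyConditionExpression": "EpisodeId = :eid",
            "ExpressionAttributeValues": {":eid": {"S": episode_id}},
            "Limit": 1,
        }
    # Fallback: read the user's partition (assumed key: UserId) and filter on
    # EpisodeId. DynamoDB applies FilterExpression after reading, so read
    # capacity is consumed for all of the user's items, not just the matches.
    return {
        "TableName": table_name,
        "KeyConditionExpression": "UserId = :uid",
        "FilterExpression": "EpisodeId = :eid",
        "ExpressionAttributeValues": {":uid": {"S": user_id}, ":eid": {"S": episode_id}},
    }
```

Pass the result to `boto3.client("dynamodb").query(**kwargs)`. On the FilterExpression path, follow `LastEvaluatedKey` across pages, because each page is filtered after it is read and a match can sit on any page; `Limit` must not be used there, since DynamoDB applies it before the filter.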
Deployment Steps
Phase 1: Deploy Infrastructure + Code (GSI Disabled)

1. Merge PR with all changes
2. Deploy to Stage

   ```bash
   # Deploy consumer-agent.yml + code
   # GSI_EpisodeMessages will be created in "CREATING" state
   ```

3. Monitor GSI Backfill

   ```bash
   aws dynamodb describe-table \
     --table-name stage-consumer-agent-history \
     --profile stage-services \
     --query 'Table.GlobalSecondaryIndexes[?IndexName==`GSI_EpisodeMessages`].IndexStatus'
   ```

   Wait for status: "ACTIVE"
4. Enable GSI in Stage
   - Go to Feature Flipper UI: https://stage-feature-flipper.fetchrewards.com/flags/edit/consumer_agent_use_episode_gsi
   - Toggle `enabled: true`, OR set environment variable `FORCE_USE_EPISODE_GSI=true` and redeploy
   - Service will pick up the flag on next check (or restart)
   - Test episode validation
   - Monitor logs for `method=GSI` (all users should use GSI)
5. Deploy to Prod

   ```bash
   # Deploy consumer-agent.yml + code
   # GSI_EpisodeMessages will be created in "CREATING" state
   ```

6. Monitor GSI Backfill in Prod

   ```bash
   aws dynamodb describe-table \
     --table-name prod-consumer-agent-history \
     --query 'Table.GlobalSecondaryIndexes[?IndexName==`GSI_EpisodeMessages`].IndexStatus'
   ```

   Wait for status: "ACTIVE" (may take hours depending on table size)
7. Enable GSI in Prod
   - Go to Feature Flipper UI: https://prod-feature-flipper.fetchrewards.com/flags/edit/consumer_agent_use_episode_gsi
   - Toggle `enabled: true`, OR set environment variable `FORCE_USE_EPISODE_GSI=true` and redeploy
   - Service will pick up the flag on next startup
   - Monitor metrics:
     - Error rates (should be unchanged)
     - DynamoDB read capacity (should decrease)
     - Response times (may improve)
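Instead of re-running `describe-table` by hand while a backfill runs, the wait can be scripted. This sketch polls a `describe_table` callable (for example, the boto3 DynamoDB client method) until the index reports `ACTIVE`. The helper names, the 60-second poll interval, and the three-day timeout are assumptions chosen for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional

INDEX_NAME = "GSI_EpisodeMessages"


def gsi_status(table_description: Dict[str, Any], index_name: str = INDEX_NAME) -> Optional[str]:
    """Extract IndexStatus for one GSI from a DescribeTable response (None if absent)."""
    for gsi in table_description["Table"].get("GlobalSecondaryIndexes", []):
        if gsi["IndexName"] == index_name:
            return gsi["IndexStatus"]
    return None


def wait_for_gsi_active(
    describe_table: Callable[..., Dict[str, Any]],
    table_name: str,
    index_name: str = INDEX_NAME,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 3 * 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until the GSI is ACTIVE, or raise TimeoutError after timeout_seconds."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = gsi_status(describe_table(TableName=table_name), index_name)
        if status == "ACTIVE":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{index_name} on {table_name} is still {status!r}")
        # Still CREATING (or not visible yet): wait and check again.
        sleep(poll_seconds)
```

For Stage, the call could be `wait_for_gsi_active(boto3.Session(profile_name="stage-services").client("dynamodb").describe_table, "stage-consumer-agent-history")`. The injectable `sleep` exists only so the loop can be tested without waiting.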
Rollback Plan
If issues are detected after enabling the GSI:

1. Immediate Rollback: Disable GSI
   - Option 1 (Fastest): Set environment variable `FORCE_USE_EPISODE_GSI=false` and restart the service
   - Option 2: Toggle `enabled: false` in the Feature Flipper UI
   - Option 3: Restart the service to clear the cached flag (it will then use Feature Flipper's current value)
   - Once the change takes effect, all requests fall back to FilterExpression (the original behavior)
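2. Remove GSI (if needed)
   - First confirm the GSI is disabled in every environment, since GSI queries will fail once the index is removed
   - Update `consumer-agent.yml` to remove the GSI definition
   - Redeploy infrastructure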
Verification
Before Enabling GSI (Default Behavior)

```python
# Logs should show: method=FilterExpression for ALL users
logger.debug("Validated existing episode ... method=FilterExpression")
```

After Enabling GSI (Optimized Behavior)

```python
# Logs should show: method=GSI for ALL users
logger.debug("Validated existing episode ... method=GSI")
```

Performance Comparison
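- FilterExpression: Reads all of the user's messages and discards non-matching ones after the read, so it consumes O(N) read units for N user messages
- GSI: Direct lookup by EpisodeId; read units depend only on the matching episode's items, not on the user's history (O(1) for an existence check)
- Monitor CloudWatch metrics (e.g. `ConsumedReadCapacityUnits`) for reduced read capacity consumption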
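To compare the two paths for a specific request rather than through aggregate CloudWatch metrics, a query can ask DynamoDB to report its own cost with `ReturnConsumedCapacity="TOTAL"`. Because capacity for a filtered query is charged for every item read before filtering, this makes the O(N) versus direct-lookup difference visible. The helper name `total_read_units` is hypothetical; `query` is any callable with the boto3 DynamoDB client `query` signature.

```python
from typing import Any, Callable, Dict


def total_read_units(query: Callable[..., Dict[str, Any]], **query_kwargs: Any) -> float:
    """Run a DynamoDB Query to completion and sum the read capacity it reports."""
    # Ask DynamoDB to include ConsumedCapacity in every page of the response.
    kwargs = dict(query_kwargs, ReturnConsumedCapacity="TOTAL")
    total = 0.0
    while True:
        page = query(**kwargs)
        total += page.get("ConsumedCapacity", {}).get("CapacityUnits", 0.0)
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return total
        # Resume from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

Run it once with the base-table key condition plus FilterExpression and once against `GSI_EpisodeMessages` for the same episode, passing `boto3.client("dynamodb").query` as `query`, and compare the two totals.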
Timeline Estimate
- Stage Backfill: ~Minutes (small dataset)
- Prod Backfill: ~Hours to Days (depends on table size)
  - DynamoDB backfills at ~10-15 WCUs
  - Estimate: (Total Items / 10) seconds
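The rule of thumb above converts to wall-clock time as follows. The 10 items/second default simply mirrors the guide's (Total Items / 10) formula and is not a DynamoDB guarantee. For the item count, DescribeTable's `ItemCount` is approximate (AWS refreshes it roughly every six hours), which is fine for an estimate of this precision.

```python
def estimate_backfill_hours(total_items: int, items_per_second: float = 10.0) -> float:
    """Rough backfill duration using the guide's (Total Items / 10) seconds heuristic."""
    if items_per_second <= 0:
        raise ValueError("items_per_second must be positive")
    seconds = total_items / items_per_second
    return seconds / 3600


# Example: 1,000,000 items -> 100,000 seconds, about 27.8 hours.
print(round(estimate_backfill_hours(1_000_000), 1))  # prints 27.8
```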
Safety Checks
✅ Feature flag defaults to false (safe)
✅ Code works with both GSI and FilterExpression
✅ No breaking changes to API contracts
✅ Backward compatible (can disable GSI anytime)
✅ DynamoDB handles backfill automatically
Monitoring
Watch for these metrics after enabling GSI:
- Error Rates: Should remain unchanged
- Response Times: May improve for episode validation
- DynamoDB Read Units: Should decrease for `get_or_create_episode()` calls
- Logs: Check for “Episode not found” errors (should be rare; GSI reads are eventually consistent, so a message written moments earlier may not yet be visible through the index)
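For the log checks in Verification and Monitoring, a short script can tally which path served each validation. This sketch assumes each relevant line contains the literal `method=GSI` or `method=FilterExpression` (as in the debug messages shown under Verification) and the phrase "Episode not found"; the full log format is elided in this guide, so the patterns and the helper name `summarize_validation_logs` are assumptions to adjust against real output.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: validation log lines contain "method=GSI" or "method=FilterExpression".
METHOD_PATTERN = re.compile(r"method=(GSI|FilterExpression)\b")
NOT_FOUND_MARKER = "Episode not found"  # Assumed literal text of the error message.


def summarize_validation_logs(lines: Iterable[str]) -> Counter:
    """Count episode validations per query path, plus "Episode not found" errors."""
    counts: Counter = Counter()
    for line in lines:
        match = METHOD_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
        if NOT_FOUND_MARKER in line:
            counts["episode_not_found"] += 1
    return counts
```

After the flag is enabled, a healthy environment should show `counts["FilterExpression"] == 0` and only an occasional `episode_not_found`.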