Phase 2 Schema Enhancements
Phase 2 Schema Enhancements
Section titled “Phase 2 Schema Enhancements”This document describes the Neo4j schema enhancements needed to unlock full Phase 2 functionality.
Current Status
Section titled “Current Status”✅ Phase 2 tools are WORKING NOW with the current Phase 1 schema by using:
- COALESCE for default values on missing properties
- Intelligent fallbacks and mock data where needed
- Zip code matching as proxy for user similarity
🔄 Full feature richness unlocked when schema is enhanced with the properties documented below.
Node Enhancements
Section titled “Node Enhancements”Product Node
Section titled “Product Node”Current Properties:
Product { product_id: string, name: string, brand: string, category: string, upc: string, norm_name: string}Phase 2 Additions:
Product { // ... existing properties ...
// Generic Product Attributes (not food-specific) product_type: string, // "food", "household", "personal_care", "pet" organic: boolean, // Is product organic gluten_free: boolean, // Is gluten-free vegan: boolean, // Is vegan sustainable: boolean, // Has sustainability certifications local: boolean, // Locally sourced handmade: boolean, // Artisan/handmade eco_friendly: boolean, // Eco-friendly packaging/process tags: [string], // Free-form tags: ["high-protein", "probiotic", "low-sugar"]
// Product Lifecycle Tracking first_seen_date: date, // When product first appeared in data trending: boolean, // Manually or algorithmically flagged as trending
// Optional: Food-Specific Nutrition (if product_type = "food") calories: int, protein_g: float, sugar_g: float, sodium_mg: float, fat_g: float}Migration Strategy:
// Set defaults for all existing productsMATCH (p:Product)SET p.product_type = COALESCE(p.product_type, 'food'), p.organic = COALESCE(p.organic, false), p.gluten_free = COALESCE(p.gluten_free, false), p.vegan = COALESCE(p.vegan, false), p.sustainable = COALESCE(p.sustainable, false), p.local = COALESCE(p.local, false), p.handmade = COALESCE(p.handmade, false), p.eco_friendly = COALESCE(p.eco_friendly, false), p.tags = COALESCE(p.tags, []), p.first_seen_date = COALESCE(p.first_seen_date, date()), p.trending = COALESCE(p.trending, false)Benefits:
- Enables rich filtering in
get_product_details_enhanced - Powers
discover_new_productswith first_seen_date tracking - Supports dietary/lifestyle preferences matching
User Node
Section titled “User Node”Current Properties:
User { user_id: string, zip: string}Phase 2 Additions:
User { // ... existing properties ...
// Already exists in Phase 1, ensure populated: zip: string // Required for location-based features}Note: User.zip already exists and is used by Phase 2 tools. No schema changes needed.
Retailer Node
Section titled “Retailer Node”Current Properties:
Retailer { retailer_id: string, name: string, address: string, city: string, state: string, zip: string}Phase 2 Additions:
Retailer { // ... existing properties ...
// Venue Classification venue_type: string // "grocery", "convenience", "warehouse", "pharmacy", "gas_station", "restaurant", "airport"}Migration Strategy:
// Set default venue_type for existing retailersMATCH (r:Retailer)SET r.venue_type = COALESCE(r.venue_type, 'grocery')
// Optionally classify by name patternsMATCH (r:Retailer)WHERE r.name CONTAINS 'Walmart' OR r.name CONTAINS 'Target'SET r.venue_type = 'warehouse'
MATCH (r:Retailer)WHERE r.name CONTAINS '7-Eleven' OR r.name CONTAINS 'Circle K'SET r.venue_type = 'convenience'Benefits:
- Powers
get_user_location_contextwith venue preferences - Enables venue-based offer filtering
Offer Node
Section titled “Offer Node”Current Properties:
Offer { offer_id: string, title: string, points: int, start: date, end: date, priority: int, venue_type: string // Added in Phase 1 fixes}Phase 2 Additions:
Offer { // ... existing properties ...
// Offer Stacking Rules stackable: boolean, // Can this offer be combined with others? max_stacks_per_transaction: int, // Max times this offer can apply in one transaction incompatible_offers: [string] // List of offer_ids that conflict with this one}Migration Strategy:
// Set defaults for existing offersMATCH (o:Offer)SET o.stackable = COALESCE(o.stackable, true), o.max_stacks_per_transaction = COALESCE(o.max_stacks_per_transaction, 1), o.incompatible_offers = COALESCE(o.incompatible_offers, [])Benefits:
- Powers
optimize_offer_activationwith stacking rules - Prevents invalid offer combinations
Relationship Enhancements
Section titled “Relationship Enhancements”PURCHASED Relationship
Section titled “PURCHASED Relationship”Current Properties:
User -[:PURCHASED { times: int, qty: int, first: datetime, last: datetime, timestamps: [datetime]}]-> ProductPhase 2 Additions:
User -[:PURCHASED { // ... existing properties ...
// Spending Analytics total_spent: float, // Total dollars spent on this product avg_price_per_unit: float, // Average price per unit min_price: float, // Lowest price paid max_price: float, // Highest price paid price_trend: string // "stable", "increasing", "decreasing"}]-> ProductMigration Strategy:
// Initialize with mock data (replace with real transaction data)MATCH (u:User)-[p:PURCHASED]->(prod:Product)SET p.total_spent = COALESCE(p.total_spent, p.qty * 5.0), // Mock: $5 per unit p.avg_price_per_unit = COALESCE(p.avg_price_per_unit, 5.0), p.min_price = COALESCE(p.min_price, 4.5), p.max_price = COALESCE(p.max_price, 5.5), p.price_trend = COALESCE(p.price_trend, 'stable')Benefits:
- Powers
track_spending_patternswith accurate spending data - Enables price trend analysis in
get_product_details_enhanced
VISITED Relationship
Section titled “VISITED Relationship”Current Properties:
User -[:VISITED { times: int, first: datetime, last: datetime, total_spent: float}]-> RetailerPhase 2 Additions:
User -[:VISITED { // ... existing properties ...
// Visit Pattern Analytics dates: [datetime], // Array of visit timestamps for frequency analysis avg_basket_size: float, // Average dollars spent per visit preferred_day_of_week: int // 0=Sunday, 6=Saturday (most common visit day)}]-> RetailerMigration Strategy:
// Initialize with calculated/mock dataMATCH (u:User)-[v:VISITED]->(r:Retailer)SET v.dates = COALESCE(v.dates, [v.last]), v.avg_basket_size = COALESCE(v.avg_basket_size, v.total_spent / v.times), v.preferred_day_of_week = COALESCE(v.preferred_day_of_week, 6) // Default to SaturdayBenefits:
- Powers
get_user_location_contextwith visit frequency calculations - Enables shopping pattern insights
SIMILAR Relationship (Products)
Section titled “SIMILAR Relationship (Products)”Current:
Product -[:SIMILAR { score: float, version: int}]-> ProductPhase 2 Note:
This relationship already exists with the score property needed for get_product_details_enhanced. No changes required.
New Nodes
Section titled “New Nodes”Household Node
Section titled “Household Node”Purpose: Group users into households for shared shopping analysis
Household { household_id: string, // Unique household identifier member_count: int, // Number of household members combined_monthly_budget: float // Optional budget tracking}Creation Strategy:
// Example: Create households based on shared zip codes (simplified)MATCH (u:User)WITH u.zip AS zip, collect(u) AS usersWHERE size(users) > 1CREATE (h:Household { household_id: 'HH_' + zip, member_count: size(users), combined_monthly_budget: 1000.0})WITH h, usersUNWIND users AS userCREATE (user)-[:MEMBER_OF { role: 'primary_shopper', // or 'secondary_shopper' joined_date: date()}]->(h)Benefits:
- Powers
get_household_contextwith household member relationships - Enables shared preference analysis
- Supports multi-user shopping optimization
New Relationships
Section titled “New Relationships”MEMBER_OF (User → Household)
Section titled “MEMBER_OF (User → Household)”User -[:MEMBER_OF { role: string, // "primary_shopper", "secondary_shopper" joined_date: date, // When user joined household contribution_pct: int // Percentage contribution to household shopping (0-100)}]-> HouseholdSIMILAR_TO (User → User)
Section titled “SIMILAR_TO (User → User)”Purpose: Pre-computed user similarity for collaborative filtering
User -[:SIMILAR_TO { similarity_score: float, // 0.0-1.0 similarity measure common_products: int, // Count of shared purchased products computed_date: date // When similarity was last calculated}]-> UserComputation Strategy:
// Find similar users based on shared product purchasesMATCH (u1:User)-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(u2:User)WHERE u1.user_id < u2.user_id // Avoid duplicatesWITH u1, u2, count(DISTINCT p) AS common_productsWHERE common_products >= 5 // Minimum overlap threshold
// Calculate Jaccard similarityMATCH (u1)-[:PURCHASED]->(p1:Product)WITH u1, u2, common_products, count(DISTINCT p1) AS u1_productsMATCH (u2)-[:PURCHASED]->(p2:Product)WITH u1, u2, common_products, u1_products, count(DISTINCT p2) AS u2_productsWITH u1, u2, common_products, u1_products + u2_products - common_products AS union_productsWITH u1, u2, common_products, toFloat(common_products) / toFloat(union_products) AS similarity_scoreWHERE similarity_score >= 0.3 // Minimum similarity threshold
CREATE (u1)-[:SIMILAR_TO { similarity_score: similarity_score, common_products: common_products, computed_date: date()}]->(u2)Benefits:
- Powers
get_community_insightswith similar user discovery - Enables
get_collaborative_recommendationsimprovements - Supports
discover_new_productswith community trends
Maintenance:
- Re-compute periodically (weekly/monthly) as purchase patterns change
- Consider indexing on similarity_score for fast queries
STACKS_WITH (Offer → Offer)
Section titled “STACKS_WITH (Offer → Offer)”Purpose: Define which offers can be combined
Offer -[:STACKS_WITH { max_combined_points: int // Maximum points when both offers are active}]-> OfferCONFLICTS_WITH (Offer → Offer)
Section titled “CONFLICTS_WITH (Offer → Offer)”Purpose: Define mutually exclusive offers
Offer -[:CONFLICTS_WITH]-> OfferBenefits:
- Powers
optimize_offer_activationwith stacking logic - Prevents invalid offer combinations
Migration Checklist
Section titled “Migration Checklist”Immediate (No Breaking Changes)
Section titled “Immediate (No Breaking Changes)”- Add default boolean values to Product nodes (organic, vegan, etc.)
- Add venue_type to Retailer nodes with default “grocery”
- Add stackable properties to Offer nodes with defaults
- Initialize spending properties on PURCHASED relationships
- Initialize visit pattern properties on VISITED relationships
Phase 2 (New Features)
Section titled “Phase 2 (New Features)”- Add product_type and tags to Product nodes
- Add first_seen_date and trending to Product nodes
- Populate nutrition data for food products (if available)
- Create Household nodes for multi-user households
- Create MEMBER_OF relationships
- Compute and create SIMILAR_TO relationships between users
- Create offer stacking relationships (STACKS_WITH, CONFLICTS_WITH)
Data Quality
Section titled “Data Quality”- Validate all boolean fields are true/false (not null)
- Ensure all date fields use proper Neo4j date() format
- Verify price_trend values are: “stable”, “increasing”, or “decreasing”
- Check venue_type values match allowed enum
Performance Considerations
Section titled “Performance Considerations”Indexes Needed
Section titled “Indexes Needed”// Product attribute filteringCREATE INDEX product_organic FOR (p:Product) ON (p.organic);CREATE INDEX product_vegan FOR (p:Product) ON (p.vegan);CREATE INDEX product_type FOR (p:Product) ON (p.product_type);CREATE INDEX product_trending FOR (p:Product) ON (p.trending);CREATE INDEX product_first_seen FOR (p:Product) ON (p.first_seen_date);
// Retailer venue typeCREATE INDEX retailer_venue FOR (r:Retailer) ON (r.venue_type);
// User similarity lookupsCREATE INDEX user_similar_score FOR ()-[r:SIMILAR_TO]-() ON (r.similarity_score);
// Household lookupsCREATE INDEX household_id FOR (h:Household) ON (h.household_id);Query Optimization
Section titled “Query Optimization”- Pre-compute SIMILAR_TO relationships (don’t calculate on-the-fly)
- Use LIMIT clauses to prevent large result sets
- Consider materialized views for expensive aggregations
- Cache frequently accessed user patterns
Backward Compatibility
Section titled “Backward Compatibility”All Phase 2 schema additions are backward compatible:
- New properties use COALESCE with defaults in queries
- Existing Phase 1 tools continue to work unchanged
- No breaking changes to existing node/relationship structures
- Optional relationships (MEMBER_OF, SIMILAR_TO) don’t affect existing queries
Testing Strategy
Section titled “Testing Strategy”After schema updates:
- Run all Phase 1 integration tests - should still pass
- Run all Phase 2 integration tests - should pass with enhanced data
- Verify MCP tools return richer responses
- Load test with realistic data volumes
- Monitor query performance with EXPLAIN/PROFILE