Phase 2 Schema Enhancements

This document describes the Neo4j schema enhancements needed to unlock full Phase 2 functionality.

Current Status

✅ Phase 2 tools are WORKING NOW with the current Phase 1 schema by using:

COALESCE for default values on missing properties
Intelligent fallbacks and mock data where needed
Zip code matching as proxy for user similarity

🔄 Full feature richness unlocked when schema is enhanced with the properties documented below.

Node Enhancements

Product Node

Current Properties:

Product {
  product_id: string,
  name: string,
  brand: string,
  category: string,
  upc: string,
  norm_name: string
}

Phase 2 Additions:

Product {
  // ... existing properties ...

  // Generic Product Attributes (not food-specific)
  product_type: string,        // "food", "household", "personal_care", "pet"
  organic: boolean,             // Is product organic
  gluten_free: boolean,         // Is gluten-free
  vegan: boolean,               // Is vegan
  sustainable: boolean,         // Has sustainability certifications
  local: boolean,               // Locally sourced
  handmade: boolean,            // Artisan/handmade
  eco_friendly: boolean,        // Eco-friendly packaging/process
  tags: [string],               // Free-form tags: ["high-protein", "probiotic", "low-sugar"]

  // Product Lifecycle Tracking
  first_seen_date: date,        // When product first appeared in data
  trending: boolean,            // Manually or algorithmically flagged as trending

  // Optional: Food-Specific Nutrition (if product_type = "food")
  calories: int,
  protein_g: float,
  sugar_g: float,
  sodium_mg: float,
  fat_g: float
}

Migration Strategy:

// Set defaults for all existing products
MATCH (p:Product)
SET p.product_type = COALESCE(p.product_type, 'food'),
    p.organic = COALESCE(p.organic, false),
    p.gluten_free = COALESCE(p.gluten_free, false),
    p.vegan = COALESCE(p.vegan, false),
    p.sustainable = COALESCE(p.sustainable, false),
    p.local = COALESCE(p.local, false),
    p.handmade = COALESCE(p.handmade, false),
    p.eco_friendly = COALESCE(p.eco_friendly, false),
    p.tags = COALESCE(p.tags, []),
    p.first_seen_date = COALESCE(p.first_seen_date, date()),
    p.trending = COALESCE(p.trending, false)

Benefits:

Enables rich filtering in get_product_details_enhanced
Powers discover_new_products with first_seen_date tracking
Supports dietary/lifestyle preferences matching

User Node

Current Properties:

User {
  user_id: string,
  zip: string
}

Phase 2 Additions:

User {
  // ... existing properties ...

  // Already exists in Phase 1, ensure populated:
  zip: string  // Required for location-based features
}

Note: User.zip already exists and is used by Phase 2 tools. No schema changes needed.

Retailer Node

Current Properties:

Retailer {
  retailer_id: string,
  name: string,
  address: string,
  city: string,
  state: string,
  zip: string
}

Phase 2 Additions:

Retailer {
  // ... existing properties ...

  // Venue Classification
  venue_type: string  // "grocery", "convenience", "warehouse", "pharmacy", "gas_station", "restaurant", "airport"
}

Migration Strategy:

// Set default venue_type for existing retailers
MATCH (r:Retailer)
SET r.venue_type = COALESCE(r.venue_type, 'grocery')

// Optionally classify by name patterns
MATCH (r:Retailer)
WHERE r.name CONTAINS 'Walmart' OR r.name CONTAINS 'Target'
SET r.venue_type = 'warehouse'

MATCH (r:Retailer)
WHERE r.name CONTAINS '7-Eleven' OR r.name CONTAINS 'Circle K'
SET r.venue_type = 'convenience'

Benefits:

Powers get_user_location_context with venue preferences
Enables venue-based offer filtering

Offer Node

Current Properties:

Offer {
  offer_id: string,
  title: string,
  points: int,
  start: date,
  end: date,
  priority: int,
  venue_type: string  // Added in Phase 1 fixes
}

Phase 2 Additions:

Offer {
  // ... existing properties ...

  // Offer Stacking Rules
  stackable: boolean,               // Can this offer be combined with others?
  max_stacks_per_transaction: int,  // Max times this offer can apply in one transaction
  incompatible_offers: [string]     // List of offer_ids that conflict with this one
}

Migration Strategy:

// Set defaults for existing offers
MATCH (o:Offer)
SET o.stackable = COALESCE(o.stackable, true),
    o.max_stacks_per_transaction = COALESCE(o.max_stacks_per_transaction, 1),
    o.incompatible_offers = COALESCE(o.incompatible_offers, [])

Benefits:

Powers optimize_offer_activation with stacking rules
Prevents invalid offer combinations

Relationship Enhancements

PURCHASED Relationship

Current Properties:

User -[:PURCHASED {
  times: int,
  qty: int,
  first: datetime,
  last: datetime,
  timestamps: [datetime]
}]-> Product

Phase 2 Additions:

User -[:PURCHASED {
  // ... existing properties ...

  // Spending Analytics
  total_spent: float,        // Total dollars spent on this product
  avg_price_per_unit: float, // Average price per unit
  min_price: float,          // Lowest price paid
  max_price: float,          // Highest price paid
  price_trend: string        // "stable", "increasing", "decreasing"
}]-> Product

Migration Strategy:

// Initialize with mock data (replace with real transaction data)
MATCH (u:User)-[p:PURCHASED]->(prod:Product)
SET p.total_spent = COALESCE(p.total_spent, p.qty * 5.0),  // Mock: $5 per unit
    p.avg_price_per_unit = COALESCE(p.avg_price_per_unit, 5.0),
    p.min_price = COALESCE(p.min_price, 4.5),
    p.max_price = COALESCE(p.max_price, 5.5),
    p.price_trend = COALESCE(p.price_trend, 'stable')

Benefits:

Powers track_spending_patterns with accurate spending data
Enables price trend analysis in get_product_details_enhanced

VISITED Relationship

Current Properties:

User -[:VISITED {
  times: int,
  first: datetime,
  last: datetime,
  total_spent: float
}]-> Retailer

Phase 2 Additions:

User -[:VISITED {
  // ... existing properties ...

  // Visit Pattern Analytics
  dates: [datetime],         // Array of visit timestamps for frequency analysis
  avg_basket_size: float,    // Average dollars spent per visit
  preferred_day_of_week: int // 0=Sunday, 6=Saturday (most common visit day)
}]-> Retailer

Migration Strategy:

// Initialize with calculated/mock data
MATCH (u:User)-[v:VISITED]->(r:Retailer)
SET v.dates = COALESCE(v.dates, [v.last]),
    v.avg_basket_size = COALESCE(v.avg_basket_size, v.total_spent / v.times),
    v.preferred_day_of_week = COALESCE(v.preferred_day_of_week, 6)  // Default to Saturday

Benefits:

Powers get_user_location_context with visit frequency calculations
Enables shopping pattern insights

SIMILAR Relationship (Products)

Current:

Product -[:SIMILAR {
  score: float,
  version: int
}]-> Product

Phase 2 Note: This relationship already exists with the score property needed for get_product_details_enhanced. No changes required.

New Nodes

Household Node

Purpose: Group users into households for shared shopping analysis

Household {
  household_id: string,           // Unique household identifier
  member_count: int,              // Number of household members
  combined_monthly_budget: float  // Optional budget tracking
}

Creation Strategy:

// Example: Create households based on shared zip codes (simplified)
MATCH (u:User)
WITH u.zip AS zip, collect(u) AS users
WHERE size(users) > 1
CREATE (h:Household {
  household_id: 'HH_' + zip,
  member_count: size(users),
  combined_monthly_budget: 1000.0
})
WITH h, users
UNWIND users AS user
CREATE (user)-[:MEMBER_OF {
  role: 'primary_shopper',  // or 'secondary_shopper'
  joined_date: date()
}]->(h)

Benefits:

Powers get_household_context with household member relationships
Enables shared preference analysis
Supports multi-user shopping optimization

New Relationships

MEMBER_OF (User → Household)

User -[:MEMBER_OF {
  role: string,         // "primary_shopper", "secondary_shopper"
  joined_date: date,    // When user joined household
  contribution_pct: int // Percentage contribution to household shopping (0-100)
}]-> Household

SIMILAR_TO (User → User)

Purpose: Pre-computed user similarity for collaborative filtering

User -[:SIMILAR_TO {
  similarity_score: float,  // 0.0-1.0 similarity measure
  common_products: int,     // Count of shared purchased products
  computed_date: date       // When similarity was last calculated
}]-> User

Computation Strategy:

// Find similar users based on shared product purchases
MATCH (u1:User)-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(u2:User)
WHERE u1.user_id < u2.user_id  // Avoid duplicates
WITH u1, u2, count(DISTINCT p) AS common_products
WHERE common_products >= 5  // Minimum overlap threshold

// Calculate Jaccard similarity
MATCH (u1)-[:PURCHASED]->(p1:Product)
WITH u1, u2, common_products, count(DISTINCT p1) AS u1_products
MATCH (u2)-[:PURCHASED]->(p2:Product)
WITH u1, u2, common_products, u1_products, count(DISTINCT p2) AS u2_products
WITH u1, u2, common_products, u1_products + u2_products - common_products AS union_products
WITH u1, u2, common_products, toFloat(common_products) / toFloat(union_products) AS similarity_score
WHERE similarity_score >= 0.3  // Minimum similarity threshold

CREATE (u1)-[:SIMILAR_TO {
  similarity_score: similarity_score,
  common_products: common_products,
  computed_date: date()
}]->(u2)

Benefits:

Powers get_community_insights with similar user discovery
Enables get_collaborative_recommendations improvements
Supports discover_new_products with community trends

Maintenance:

Re-compute periodically (weekly/monthly) as purchase patterns change
Consider indexing on similarity_score for fast queries

STACKS_WITH (Offer → Offer)

Purpose: Define which offers can be combined

Offer -[:STACKS_WITH {
  max_combined_points: int  // Maximum points when both offers are active
}]-> Offer

CONFLICTS_WITH (Offer → Offer)

Purpose: Define mutually exclusive offers

Offer -[:CONFLICTS_WITH]-> Offer

Benefits:

Powers optimize_offer_activation with stacking logic
Prevents invalid offer combinations

Migration Checklist

Immediate (No Breaking Changes)

Add default boolean values to Product nodes (organic, vegan, etc.)
Add venue_type to Retailer nodes with default “grocery”
Add stackable properties to Offer nodes with defaults
Initialize spending properties on PURCHASED relationships
Initialize visit pattern properties on VISITED relationships

Phase 2 (New Features)

Add product_type and tags to Product nodes
Add first_seen_date and trending to Product nodes
Populate nutrition data for food products (if available)
Create Household nodes for multi-user households
Create MEMBER_OF relationships
Compute and create SIMILAR_TO relationships between users
Create offer stacking relationships (STACKS_WITH, CONFLICTS_WITH)

Data Quality

Validate all boolean fields are true/false (not null)
Ensure all date fields use proper Neo4j date() format
Verify price_trend values are: “stable”, “increasing”, or “decreasing”
Check venue_type values match allowed enum

Performance Considerations

Indexes Needed

// Product attribute filtering
CREATE INDEX product_organic FOR (p:Product) ON (p.organic);
CREATE INDEX product_vegan FOR (p:Product) ON (p.vegan);
CREATE INDEX product_type FOR (p:Product) ON (p.product_type);
CREATE INDEX product_trending FOR (p:Product) ON (p.trending);
CREATE INDEX product_first_seen FOR (p:Product) ON (p.first_seen_date);

// Retailer venue type
CREATE INDEX retailer_venue FOR (r:Retailer) ON (r.venue_type);

// User similarity lookups
CREATE INDEX user_similar_score FOR ()-[r:SIMILAR_TO]-() ON (r.similarity_score);

// Household lookups
CREATE INDEX household_id FOR (h:Household) ON (h.household_id);

Query Optimization

Pre-compute SIMILAR_TO relationships (don’t calculate on-the-fly)
Use LIMIT clauses to prevent large result sets
Consider materialized views for expensive aggregations
Cache frequently accessed user patterns

Backward Compatibility

All Phase 2 schema additions are backward compatible:

New properties use COALESCE with defaults in queries
Existing Phase 1 tools continue to work unchanged
No breaking changes to existing node/relationship structures
Optional relationships (MEMBER_OF, SIMILAR_TO) don’t affect existing queries

Testing Strategy

After schema updates:

Run all Phase 1 integration tests - should still pass
Run all Phase 2 integration tests - should pass with enhanced data
Verify MCP tools return richer responses
Load test with realistic data volumes
Monitor query performance with EXPLAIN/PROFILE