Skip to content

Phase 2 Schema Enhancements

This document describes the Neo4j schema enhancements needed to unlock full Phase 2 functionality.

Phase 2 tools are WORKING NOW with the current Phase 1 schema by using:

  • COALESCE for default values on missing properties
  • Intelligent fallbacks and mock data where needed
  • Zip code matching as proxy for user similarity

🔄 Full feature richness unlocked when schema is enhanced with the properties documented below.


Current Properties:

Product {
product_id: string,
name: string,
brand: string,
category: string,
upc: string,
norm_name: string
}

Phase 2 Additions:

Product {
// ... existing properties ...
// Generic Product Attributes (not food-specific)
product_type: string, // "food", "household", "personal_care", "pet"
organic: boolean, // Is product organic
gluten_free: boolean, // Is gluten-free
vegan: boolean, // Is vegan
sustainable: boolean, // Has sustainability certifications
local: boolean, // Locally sourced
handmade: boolean, // Artisan/handmade
eco_friendly: boolean, // Eco-friendly packaging/process
tags: [string], // Free-form tags: ["high-protein", "probiotic", "low-sugar"]
// Product Lifecycle Tracking
first_seen_date: date, // When product first appeared in data
trending: boolean, // Manually or algorithmically flagged as trending
// Optional: Food-Specific Nutrition (if product_type = "food")
calories: int,
protein_g: float,
sugar_g: float,
sodium_mg: float,
fat_g: float
}

Migration Strategy:

// Set defaults for all existing products
MATCH (p:Product)
SET p.product_type = COALESCE(p.product_type, 'food'),
p.organic = COALESCE(p.organic, false),
p.gluten_free = COALESCE(p.gluten_free, false),
p.vegan = COALESCE(p.vegan, false),
p.sustainable = COALESCE(p.sustainable, false),
p.local = COALESCE(p.local, false),
p.handmade = COALESCE(p.handmade, false),
p.eco_friendly = COALESCE(p.eco_friendly, false),
p.tags = COALESCE(p.tags, []),
p.first_seen_date = COALESCE(p.first_seen_date, date()),
p.trending = COALESCE(p.trending, false)

Benefits:

  • Enables rich filtering in get_product_details_enhanced
  • Powers discover_new_products with first_seen_date tracking
  • Supports dietary/lifestyle preferences matching

Current Properties:

User {
user_id: string,
zip: string
}

Phase 2 Additions:

User {
// ... existing properties ...
// Already exists in Phase 1, ensure populated:
zip: string // Required for location-based features
}

Note: User.zip already exists and is used by Phase 2 tools. No schema changes needed.


Current Properties:

Retailer {
retailer_id: string,
name: string,
address: string,
city: string,
state: string,
zip: string
}

Phase 2 Additions:

Retailer {
// ... existing properties ...
// Venue Classification
venue_type: string // "grocery", "convenience", "warehouse", "pharmacy", "gas_station", "restaurant", "airport"
}

Migration Strategy:

// Set default venue_type for existing retailers
MATCH (r:Retailer)
SET r.venue_type = COALESCE(r.venue_type, 'grocery')
// Optionally classify by name patterns
MATCH (r:Retailer)
WHERE r.name CONTAINS 'Walmart' OR r.name CONTAINS 'Target'
SET r.venue_type = 'warehouse'
MATCH (r:Retailer)
WHERE r.name CONTAINS '7-Eleven' OR r.name CONTAINS 'Circle K'
SET r.venue_type = 'convenience'

Benefits:

  • Powers get_user_location_context with venue preferences
  • Enables venue-based offer filtering

Current Properties:

Offer {
offer_id: string,
title: string,
points: int,
start: date,
end: date,
priority: int,
venue_type: string // Added in Phase 1 fixes
}

Phase 2 Additions:

Offer {
// ... existing properties ...
// Offer Stacking Rules
stackable: boolean, // Can this offer be combined with others?
max_stacks_per_transaction: int, // Max times this offer can apply in one transaction
incompatible_offers: [string] // List of offer_ids that conflict with this one
}

Migration Strategy:

// Set defaults for existing offers
MATCH (o:Offer)
SET o.stackable = COALESCE(o.stackable, true),
o.max_stacks_per_transaction = COALESCE(o.max_stacks_per_transaction, 1),
o.incompatible_offers = COALESCE(o.incompatible_offers, [])

Benefits:

  • Powers optimize_offer_activation with stacking rules
  • Prevents invalid offer combinations

Current Properties:

User -[:PURCHASED {
times: int,
qty: int,
first: datetime,
last: datetime,
timestamps: [datetime]
}]-> Product

Phase 2 Additions:

User -[:PURCHASED {
// ... existing properties ...
// Spending Analytics
total_spent: float, // Total dollars spent on this product
avg_price_per_unit: float, // Average price per unit
min_price: float, // Lowest price paid
max_price: float, // Highest price paid
price_trend: string // "stable", "increasing", "decreasing"
}]-> Product

Migration Strategy:

// Initialize with mock data (replace with real transaction data)
MATCH (u:User)-[p:PURCHASED]->(prod:Product)
SET p.total_spent = COALESCE(p.total_spent, p.qty * 5.0), // Mock: $5 per unit
p.avg_price_per_unit = COALESCE(p.avg_price_per_unit, 5.0),
p.min_price = COALESCE(p.min_price, 4.5),
p.max_price = COALESCE(p.max_price, 5.5),
p.price_trend = COALESCE(p.price_trend, 'stable')

Benefits:

  • Powers track_spending_patterns with accurate spending data
  • Enables price trend analysis in get_product_details_enhanced

Current Properties:

User -[:VISITED {
times: int,
first: datetime,
last: datetime,
total_spent: float
}]-> Retailer

Phase 2 Additions:

User -[:VISITED {
// ... existing properties ...
// Visit Pattern Analytics
dates: [datetime], // Array of visit timestamps for frequency analysis
avg_basket_size: float, // Average dollars spent per visit
preferred_day_of_week: int // 0=Sunday, 6=Saturday (most common visit day)
}]-> Retailer

Migration Strategy:

// Initialize with calculated/mock data
MATCH (u:User)-[v:VISITED]->(r:Retailer)
SET v.dates = COALESCE(v.dates, [v.last]),
v.avg_basket_size = COALESCE(v.avg_basket_size, v.total_spent / v.times),
v.preferred_day_of_week = COALESCE(v.preferred_day_of_week, 6) // Default to Saturday

Benefits:

  • Powers get_user_location_context with visit frequency calculations
  • Enables shopping pattern insights

Current:

Product -[:SIMILAR {
score: float,
version: int
}]-> Product

Phase 2 Note: This relationship already exists with the score property needed for get_product_details_enhanced. No changes required.


Purpose: Group users into households for shared shopping analysis

Household {
household_id: string, // Unique household identifier
member_count: int, // Number of household members
combined_monthly_budget: float // Optional budget tracking
}

Creation Strategy:

// Example: Create households based on shared zip codes (simplified)
MATCH (u:User)
WITH u.zip AS zip, collect(u) AS users
WHERE size(users) > 1
CREATE (h:Household {
household_id: 'HH_' + zip,
member_count: size(users),
combined_monthly_budget: 1000.0
})
WITH h, users
UNWIND users AS user
CREATE (user)-[:MEMBER_OF {
role: 'primary_shopper', // or 'secondary_shopper'
joined_date: date()
}]->(h)

Benefits:

  • Powers get_household_context with household member relationships
  • Enables shared preference analysis
  • Supports multi-user shopping optimization

User -[:MEMBER_OF {
role: string, // "primary_shopper", "secondary_shopper"
joined_date: date, // When user joined household
contribution_pct: int // Percentage contribution to household shopping (0-100)
}]-> Household

Purpose: Pre-computed user similarity for collaborative filtering

User -[:SIMILAR_TO {
similarity_score: float, // 0.0-1.0 similarity measure
common_products: int, // Count of shared purchased products
computed_date: date // When similarity was last calculated
}]-> User

Computation Strategy:

// Find similar users based on shared product purchases
MATCH (u1:User)-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(u2:User)
WHERE u1.user_id < u2.user_id // Avoid duplicates
WITH u1, u2, count(DISTINCT p) AS common_products
WHERE common_products >= 5 // Minimum overlap threshold
// Calculate Jaccard similarity
MATCH (u1)-[:PURCHASED]->(p1:Product)
WITH u1, u2, common_products, count(DISTINCT p1) AS u1_products
MATCH (u2)-[:PURCHASED]->(p2:Product)
WITH u1, u2, common_products, u1_products, count(DISTINCT p2) AS u2_products
WITH u1, u2, common_products, u1_products + u2_products - common_products AS union_products
WITH u1, u2, common_products, toFloat(common_products) / toFloat(union_products) AS similarity_score
WHERE similarity_score >= 0.3 // Minimum similarity threshold
CREATE (u1)-[:SIMILAR_TO {
similarity_score: similarity_score,
common_products: common_products,
computed_date: date()
}]->(u2)

Benefits:

  • Powers get_community_insights with similar user discovery
  • Enables get_collaborative_recommendations improvements
  • Supports discover_new_products with community trends

Maintenance:

  • Re-compute periodically (weekly/monthly) as purchase patterns change
  • Consider indexing on similarity_score for fast queries

Purpose: Define which offers can be combined

Offer -[:STACKS_WITH {
max_combined_points: int // Maximum points when both offers are active
}]-> Offer

Purpose: Define mutually exclusive offers

Offer -[:CONFLICTS_WITH]-> Offer

Benefits:

  • Powers optimize_offer_activation with stacking logic
  • Prevents invalid offer combinations

  • Add default boolean values to Product nodes (organic, vegan, etc.)
  • Add venue_type to Retailer nodes with default “grocery”
  • Add stackable properties to Offer nodes with defaults
  • Initialize spending properties on PURCHASED relationships
  • Initialize visit pattern properties on VISITED relationships
  • Add product_type and tags to Product nodes
  • Add first_seen_date and trending to Product nodes
  • Populate nutrition data for food products (if available)
  • Create Household nodes for multi-user households
  • Create MEMBER_OF relationships
  • Compute and create SIMILAR_TO relationships between users
  • Create offer stacking relationships (STACKS_WITH, CONFLICTS_WITH)
  • Validate all boolean fields are true/false (not null)
  • Ensure all date fields use proper Neo4j date() format
  • Verify price_trend values are: “stable”, “increasing”, or “decreasing”
  • Check venue_type values match allowed enum

// Product attribute filtering
CREATE INDEX product_organic FOR (p:Product) ON (p.organic);
CREATE INDEX product_vegan FOR (p:Product) ON (p.vegan);
CREATE INDEX product_type FOR (p:Product) ON (p.product_type);
CREATE INDEX product_trending FOR (p:Product) ON (p.trending);
CREATE INDEX product_first_seen FOR (p:Product) ON (p.first_seen_date);
// Retailer venue type
CREATE INDEX retailer_venue FOR (r:Retailer) ON (r.venue_type);
// User similarity lookups
CREATE INDEX user_similar_score FOR ()-[r:SIMILAR_TO]-() ON (r.similarity_score);
// Household lookups
CREATE INDEX household_id FOR (h:Household) ON (h.household_id);
  • Pre-compute SIMILAR_TO relationships (don’t calculate on-the-fly)
  • Use LIMIT clauses to prevent large result sets
  • Consider materialized views for expensive aggregations
  • Cache frequently accessed user patterns

All Phase 2 schema additions are backward compatible:

  • New properties use COALESCE with defaults in queries
  • Existing Phase 1 tools continue to work unchanged
  • No breaking changes to existing node/relationship structures
  • Optional relationships (MEMBER_OF, SIMILAR_TO) don’t affect existing queries

After schema updates:

  1. Run all Phase 1 integration tests - should still pass
  2. Run all Phase 2 integration tests - should pass with enhanced data
  3. Verify MCP tools return richer responses
  4. Load test with realistic data volumes
  5. Monitor query performance with EXPLAIN/PROFILE