Consumer Graph MCP - Complete Database Schema
Consumer Graph MCP - Complete Database Schema
Section titled “Consumer Graph MCP - Complete Database Schema”Overview
Section titled “Overview”This document provides a comprehensive view of the Neo4j graph database schema for the Consumer Graph MCP server, including all node types, relationships, properties, indexes, and detailed calculation notes.
Last Updated: 2025-11-08 Schema Version: Phase 2 Complete Total Node Types: 5 Total Relationship Types: 6
Node Types
Section titled “Node Types”1. User
Section titled “1. User”Represents a consumer/shopper in the system.
Properties:
| Property | Type | Phase | Required | Description | Calculation Notes |
|---|---|---|---|---|---|
user_id | string | 1 | ✅ | Unique user identifier | Primary key |
zip | string | 1 | ✅ | User’s zip code | Used for location-based features |
created_at | datetime | 1 | ✅ | Account creation timestamp | Set to time.Now().UTC() on creation |
Indexes:
- None currently (recommend:
user_idfor lookups)
Sample Cypher:
CREATE (u:User { user_id: 'USER_123', zip: '60614', created_at: datetime()})Count: ~5 test users (production: thousands)
2. Product
Section titled “2. Product”Represents a purchasable product/item.
Properties:
| Property | Type | Phase | Required | Description | Calculation Notes |
|---|---|---|---|---|---|
product_id | string | 1 | ✅ | Unique product identifier | Primary key |
name | string | 1 | ✅ | Product display name | Human-readable |
norm_name | string | 1 | ✅ | Normalized product name | Lowercase, no special chars |
brand | string | 1 | ✅ | Brand/manufacturer name | |
category | string | 1 | ✅ | Product category | E.g., “Dairy”, “Produce”, “Meat” |
upc | string | 1 | ✅ | Universal Product Code | Barcode identifier |
product_type | string | 2 | ❌ | Product type classification | Default: ‘food’ |
organic | boolean | 2 | ❌ | Organic certification flag | true for Dairy/Produce |
vegan | boolean | 2 | ❌ | Vegan product flag | true for Produce/Grains |
gluten_free | boolean | 2 | ❌ | Gluten-free flag | true for Produce/Dairy/Meat |
sustainable | boolean | 2 | ❌ | Sustainability flag | Default: false |
local | boolean | 2 | ❌ | Locally sourced flag | Default: false |
handmade | boolean | 2 | ❌ | Handmade/artisan flag | Default: false |
eco_friendly | boolean | 2 | ❌ | Eco-friendly packaging flag | Default: false |
tags | array[string] | 2 | ❌ | Descriptive tags | Category-specific (e.g., [‘high-protein’, ‘calcium’]) |
first_seen_date | date | 2 | ❌ | Product introduction date | Default: current date |
trending | boolean | 2 | ❌ | Trending product flag | Random 20% marked as trending |
Indexes:
- None currently (recommend:
product_id,category,brand)
Sample Cypher:
CREATE (p:Product { product_id: 'PROD_123', name: 'Organic Whole Milk', norm_name: 'organic whole milk', brand: 'Horizon', category: 'Dairy', upc: '074660011000', product_type: 'food', organic: true, vegan: false, gluten_free: true, tags: ['high-protein', 'calcium'], first_seen_date: date(), trending: false})Count: ~5 test products (production: thousands)
3. Retailer
Section titled “3. Retailer”Represents a physical store/retailer location.
Properties:
| Property | Type | Phase | Required | Description | Calculation Notes |
|---|---|---|---|---|---|
retailer_id | string | 1 | ✅ | Unique retailer identifier | Primary key |
name | string | 1 | ✅ | Store name | E.g., “Whole Foods Market” |
address | string | 1 | ✅ | Street address | |
city | string | 1 | ✅ | City name | |
state | string | 1 | ✅ | State abbreviation | E.g., “IL” |
zip | string | 1 | ✅ | Zip code | Used for location matching |
venue_type | string | 2 | ❌ | Store type classification | grocery, warehouse, convenience, pharmacy |
Venue Type Classification:
- Default: ‘grocery’
- ‘warehouse’: Walmart, Target, “Warehouse” in name
- ‘convenience’: 7-Eleven, Circle K, “Convenience” in name
- ‘pharmacy’: CVS, Walgreens, “Pharmacy” in name
Indexes:
- None currently (recommend:
retailer_id,zip)
Sample Cypher:
CREATE (r:Retailer { retailer_id: 'RET_123', name: 'Whole Foods Market', address: '1000 W North Ave', city: 'Chicago', state: 'IL', zip: '60642', venue_type: 'grocery'})Count: ~3 test retailers (production: thousands)
4. Offer
Section titled “4. Offer”Represents a promotional offer/reward opportunity.
Properties:
| Property | Type | Phase | Required | Description | Calculation Notes |
|---|---|---|---|---|---|
offer_id | string | 1 | ✅ | Unique offer identifier | Primary key |
title | string | 1 | ✅ | Offer display title | Human-readable description |
points | integer | 1 | ✅ | Points value | Reward points earned |
start | date | 1 | ✅ | Offer start date | Uses Neo4j date() type |
end | date | 1 | ✅ | Offer end date | Uses Neo4j date() type |
priority | integer | 1 | ✅ | Display priority | Higher = more prominent |
venue_type | string | 1 | ❌ | Applicable venue type | grocery, warehouse, convenience, pharmacy, or null (all) |
stackable | boolean | 2 | ❌ | Can stack with other offers | Default: true |
max_stacks_per_transaction | integer | 2 | ❌ | Max stacks allowed | Default: 1 |
incompatible_offers | array[string] | 2 | ❌ | Incompatible offer IDs | Default: [] |
Date Handling:
startandendare Neo4j DATE types- CRITICAL: Must use
date()function in comparisons, NOTdatetime() - Example:
WHERE date() >= o.start AND date() <= o.end
Stacking Rules:
- Offers with ≥1000 points: stackable=false
- Others: stackable=true by default
Indexes:
- None currently (recommend:
offer_id,start,end)
Sample Cypher:
CREATE (o:Offer { offer_id: 'OFFER_123', title: 'Buy 2 Get 500 Points', points: 500, start: date('2025-01-01'), end: date('2025-12-31'), priority: 5, venue_type: 'grocery', stackable: true, max_stacks_per_transaction: 1, incompatible_offers: []})Count: ~5 test offers (production: hundreds)
5. Household
Section titled “5. Household”Phase 2 Only - Represents a household grouping of users.
Properties:
| Property | Type | Phase | Required | Description | Calculation Notes |
|---|---|---|---|---|---|
household_id | string | 2 | ✅ | Unique household identifier | Format: HH_{zip}_{random} |
member_count | integer | 2 | ✅ | Number of household members | Max 3 per household |
combined_monthly_budget | float | 2 | ✅ | Household budget | Default: $1000.00 |
Creation Logic:
- Groups users by zip code
- Maximum 3 users per household
- Only creates if 2+ users in same zip
Sample Cypher:
CREATE (h:Household { household_id: 'HH_60614_742', member_count: 3, combined_monthly_budget: 1000.0})Count: Varies by zip code distribution (test: 0-2)
Relationship Types
Section titled “Relationship Types”1. PURCHASED
Section titled “1. PURCHASED”Pattern: (User)-[:PURCHASED]->(Product)
Tracks user purchase history for a product.
Properties:
| Property | Type | Phase | Required | Description | Calculation Notes |
|---|---|---|---|---|---|
times | integer | 1 | ✅ | Number of times purchased | Count of purchase events |
qty | integer | 1 | ✅ | Total quantity purchased | Sum across all purchases |
first | datetime | 1 | ✅ | First purchase timestamp | Earliest purchase date/time (UTC) |
last | datetime | 1 | ✅ | Last purchase timestamp | Most recent purchase date/time (UTC) |
total_spent | float | 2 | ❌ | Total amount spent | Calculated: qty * avg_price_per_unit |
avg_price_per_unit | float | 2 | ❌ | Average price per unit | Varies by price_trend |
min_price | float | 2 | ❌ | Minimum price paid | Historical minimum |
max_price | float | 2 | ❌ | Maximum price paid | Historical maximum |
price_trend | string | 2 | ❌ | Price trend indicator | ’stable’, ‘increasing’, ‘decreasing’ |
Price Trend Distribution:
- 30% ‘increasing’ (avg $5.50/unit)
- 20% ‘decreasing’ (avg $4.50/unit)
- 50% ‘stable’ (avg $5.00/unit)
Derived Calculations:
- Purchase Interval:
duration.inDays(first, last) / (times - 1)(avg days between purchases) - Avg Qty Per Purchase:
qty / times - Next Purchase Prediction:
last + purchase_interval
Sample Cypher:
CREATE (u:User {user_id: 'USER_1'})-[p:PURCHASED { times: 5, qty: 12, first: datetime('2024-01-15T10:30:00Z'), last: datetime('2024-11-01T14:20:00Z'), total_spent: 66.0, avg_price_per_unit: 5.5, min_price: 4.99, max_price: 5.99, price_trend: 'increasing'}]->(p:Product {product_id: 'PROD_1'})Count: ~5 test relationships (production: millions)
2. VISITED
Section titled “2. VISITED”Pattern: (User)-[:VISITED]->(Retailer)
Tracks user visits to retail locations.
Properties:
| Property | Type | Phase | Required | Description | Calculation Notes |
|---|---|---|---|---|---|
times | integer | 1 | ✅ | Number of visits | Count of visit events |
first | datetime | 1 | ✅ | First visit timestamp | Earliest visit (UTC) |
last | datetime | 1 | ✅ | Last visit timestamp | Most recent visit (UTC) |
total_spent | float | 1 | ✅ | Total amount spent | Sum across all visits |
dates | array[datetime] | 2 | ❌ | Individual visit dates | Array of visit timestamps |
avg_basket_size | float | 2 | ❌ | Average spending per visit | total_spent / times |
preferred_day_of_week | integer | 2 | ❌ | Most common shopping day | 0=Monday, 6=Sunday |
Derived Calculations:
- Visit Frequency:
total_spent / times(avg per visit) - Days Since Last Visit:
duration.inDays(last, datetime()) - Visit Pattern: Analyze
datesarray for patterns
Sample Cypher:
CREATE (u:User {user_id: 'USER_1'})-[v:VISITED { times: 12, first: datetime('2024-01-01T09:00:00Z'), last: datetime('2024-11-05T18:30:00Z'), total_spent: 1200.0, dates: [datetime('2024-01-01T09:00:00Z'), datetime('2024-02-15T10:00:00Z')], avg_basket_size: 100.0, preferred_day_of_week: 6}]->(r:Retailer {retailer_id: 'RET_1'})Count: ~5 test relationships (production: millions)
3. ELIGIBLE_FOR
Section titled “3. ELIGIBLE_FOR”Pattern: (Offer)-[:ELIGIBLE_FOR]->(User)
Defines which users are eligible for which offers.
Properties:
| Property | Type | Phase | Required | Description | Calculation Notes |
|---|---|---|---|---|---|
start | date | 1 | ✅ | Eligibility start date | When user became eligible |
Eligibility Logic:
- User must match offer criteria (implicit)
- Eligibility can start before/after offer.start
- Used for personalized offer targeting
Sample Cypher:
CREATE (o:Offer {offer_id: 'OFFER_1'})-[e:ELIGIBLE_FOR { start: date('2025-01-01')}]->(u:User {user_id: 'USER_1'})Count: ~7 test relationships (production: millions)
4. APPLIES_TO
Section titled “4. APPLIES_TO”Pattern: (Offer)-[:APPLIES_TO]->(Product)
Defines which products an offer applies to.
Properties: None
Usage:
- Links offers to qualifying products
- An offer can apply to multiple products
- Used to determine offer applicability during purchase
Sample Cypher:
CREATE (o:Offer {offer_id: 'OFFER_1'})-[:APPLIES_TO]->(p:Product {product_id: 'PROD_1'})Count: ~4 test relationships (production: thousands)
5. SIMILAR_TO (Product Similarity)
Section titled “5. SIMILAR_TO (Product Similarity)”Pattern: (Product)-[:SIMILAR_TO]->(Product)
Represents product-to-product similarity for recommendations.
Properties:
| Property | Type | Phase | Required | Description | Calculation Notes |
|---|---|---|---|---|---|
score | float | 1 | ✅ | Similarity score | 0.0-1.0, higher = more similar |
v | integer | 1 | ✅ | Version/update counter | Increments on recalculation |
Calculation:
- Based on product attributes (category, brand, tags, etc.)
- Collaborative filtering (users who bought X also bought Y)
- Periodically recomputed
Sample Cypher:
CREATE (p1:Product {product_id: 'PROD_1'})-[s:SIMILAR_TO { score: 0.85, v: 1}]->(p2:Product {product_id: 'PROD_2'})Count: ~5 test relationships (production: millions)
6. SIMILAR_TO (User Similarity)
Section titled “6. SIMILAR_TO (User Similarity)”Phase 2 Only
Pattern: (User)-[:SIMILAR_TO]->(User)
Represents user-to-user similarity for collaborative filtering.
Properties:
| Property | Type | Phase | Required | Description | Calculation Notes |
|---|---|---|---|---|---|
similarity_score | float | 2 | ✅ | Jaccard similarity score | 0.0-1.0 |
common_products | integer | 2 | ✅ | Count of shared products | Minimum: 3 |
computed_date | date | 2 | ✅ | Last computation date | When similarity calculated |
Calculation (Jaccard Similarity):
similarity_score = |products(u1) ∩ products(u2)| / |products(u1) ∪ products(u2)|Creation Criteria:
- Minimum 3 shared products
- Minimum 20% similarity score
- One-way relationships (u1.user_id < u2.user_id to avoid duplicates)
Sample Cypher:
CREATE (u1:User {user_id: 'USER_1'})-[s:SIMILAR_TO { similarity_score: 0.42, common_products: 8, computed_date: date()}]->(u2:User {user_id: 'USER_2'})Count: Varies by purchase overlap (test: 0-10)
7. MEMBER_OF
Section titled “7. MEMBER_OF”Phase 2 Only
Pattern: (User)-[:MEMBER_OF]->(Household)
Links users to their household.
Properties:
| Property | Type | Phase | Required | Description | Calculation Notes |
|---|---|---|---|---|---|
role | string | 2 | ✅ | Household role | ’primary_shopper’ or ‘secondary_shopper’ |
joined_date | date | 2 | ✅ | Date joined household | When relationship created |
contribution_pct | integer | 2 | ✅ | Contribution percentage | Primary: 60%, Secondary: 40% |
Role Assignment:
- First user in household: ‘primary_shopper’ (60%)
- Subsequent users: ‘secondary_shopper’ (40%)
Sample Cypher:
CREATE (u:User {user_id: 'USER_1'})-[m:MEMBER_OF { role: 'primary_shopper', joined_date: date(), contribution_pct: 60}]->(h:Household {household_id: 'HH_60614_123'})Count: Matches household member_count
Indexes & Constraints
Section titled “Indexes & Constraints”Current State
Section titled “Current State”Indexes: None currently defined
Constraints: None currently defined
Recommended Indexes
Section titled “Recommended Indexes”For production performance, consider adding:
// Node property indexesCREATE INDEX user_id_index FOR (u:User) ON (u.user_id);CREATE INDEX product_id_index FOR (p:Product) ON (p.product_id);CREATE INDEX product_category_index FOR (p:Product) ON (p.category);CREATE INDEX product_brand_index FOR (p:Product) ON (p.brand);CREATE INDEX retailer_id_index FOR (r:Retailer) ON (r.retailer_id);CREATE INDEX retailer_zip_index FOR (r:Retailer) ON (r.zip);CREATE INDEX offer_id_index FOR (o:Offer) ON (o.offer_id);CREATE INDEX offer_dates_index FOR (o:Offer) ON (o.start, o.end);CREATE INDEX household_id_index FOR (h:Household) ON (h.household_id);
// Uniqueness constraintsCREATE CONSTRAINT user_id_unique FOR (u:User) REQUIRE u.user_id IS UNIQUE;CREATE CONSTRAINT product_id_unique FOR (p:Product) REQUIRE p.product_id IS UNIQUE;CREATE CONSTRAINT retailer_id_unique FOR (r:Retailer) REQUIRE r.retailer_id IS UNIQUE;CREATE CONSTRAINT offer_id_unique FOR (o:Offer) REQUIRE o.offer_id IS UNIQUE;CREATE CONSTRAINT household_id_unique FOR (h:Household) REQUIRE h.household_id IS UNIQUE;Calculation Notes & Formulas
Section titled “Calculation Notes & Formulas”Purchase Frequency Predictions
Section titled “Purchase Frequency Predictions”Average Days Between Purchases:
MATCH (u:User)-[p:PURCHASED]->(prod:Product)WHERE p.times > 1WITH duration.inDays(p.first, p.last).days AS total_days, p.times AS purchase_countRETURN total_days / (purchase_count - 1) AS avg_interval_daysNext Purchase Prediction:
MATCH (u:User)-[p:PURCHASED]->(prod:Product)WITH p.last AS last_purchase, duration.inDays(p.first, p.last).days / (p.times - 1) AS intervalRETURN last_purchase + duration({days: interval}) AS predicted_next_purchaseSpending Analytics
Section titled “Spending Analytics”Total Spending by Category:
MATCH (u:User)-[p:PURCHASED]->(prod:Product)WITH prod.category AS category, sum(p.total_spent) AS category_totalRETURN category, category_totalORDER BY category_total DESCPrice Trend Analysis:
MATCH (u:User)-[p:PURCHASED]->(prod:Product)RETURN prod.product_id, prod.name, p.price_trend, p.avg_price_per_unit, (p.max_price - p.min_price) / p.avg_price_per_unit * 100 AS price_volatility_pctLocation Intelligence
Section titled “Location Intelligence”Primary Shopping Location:
MATCH (u:User)-[v:VISITED]->(r:Retailer)RETURN r.retailer_id, r.name, v.times, v.total_spentORDER BY v.times DESC, v.total_spent DESCLIMIT 1Venue Type Distribution:
MATCH (u:User)-[v:VISITED]->(r:Retailer)WITH r.venue_type AS venue_type, count(v) AS visit_count, sum(v.total_spent) AS total_spentRETURN venue_type, visit_count, total_spentORDER BY visit_count DESCCollaborative Filtering
Section titled “Collaborative Filtering”Similar Users Query:
MATCH (u1:User {user_id: $user_id})-[s:SIMILAR_TO]->(u2:User)RETURN u2.user_id, s.similarity_score, s.common_productsORDER BY s.similarity_score DESCLIMIT 10Products Purchased by Similar Users:
MATCH (u1:User {user_id: $user_id})-[:SIMILAR_TO]->(u2:User)MATCH (u2)-[:PURCHASED]->(p:Product)WHERE NOT (u1)-[:PURCHASED]->(p)WITH p, count(DISTINCT u2) AS similar_user_count, avg(s.similarity_score) AS avg_similarityRETURN p.product_id, p.name, similar_user_count, avg_similarityORDER BY similar_user_count DESC, avg_similarity DESCLIMIT 20Offer Optimization
Section titled “Offer Optimization”Eligible Active Offers:
MATCH (o:Offer)-[:ELIGIBLE_FOR]->(u:User {user_id: $user_id})WHERE date() >= o.start AND date() <= o.endRETURN oORDER BY o.priority DESC, o.points DESCOffer Stacking Optimization:
MATCH (o:Offer)-[:ELIGIBLE_FOR]->(u:User {user_id: $user_id})MATCH (o)-[:APPLIES_TO]->(p:Product)WHERE date() >= o.start AND date() <= o.end AND (u)-[:PURCHASED]->(p) AND o.stackable = trueWITH o, p, count(DISTINCT p) AS applicable_productsRETURN o.offer_id, o.title, o.points, o.max_stacks_per_transaction, applicable_products, o.points * o.max_stacks_per_transaction AS max_points_possibleORDER BY max_points_possible DESCData Types Reference
Section titled “Data Types Reference”Neo4j Native Types
Section titled “Neo4j Native Types”- string: Text data
- integer: Whole numbers (int64)
- float: Decimal numbers (float64)
- boolean: true/false
- date: Calendar date (no time component) - Use
date()function - datetime: Date + time with timezone - Use
datetime()function - array[T]: List of values of type T
Type Conversion Notes
Section titled “Type Conversion Notes”CRITICAL: Date vs DateTime
- DATE properties (
start,endon Offer): Usedate()for comparisons - DATETIME properties (
created_at,first,last): Usedatetime()for comparisons - Mixing types returns NULL - Always match the property type!
Example:
// CORRECTWHERE date() >= o.start AND date() <= o.end
// WRONG - Returns NULL!WHERE datetime() >= o.start AND datetime() <= o.endSchema Evolution
Section titled “Schema Evolution”Phase 1 → Phase 2 Changes
Section titled “Phase 1 → Phase 2 Changes”New Properties Added:
- Product: 11 new properties (product_type, organic, vegan, etc.)
- Retailer: 1 new property (venue_type)
- PURCHASED: 5 new properties (total_spent, price trends, etc.)
- VISITED: 3 new properties (dates, avg_basket_size, preferred_day)
- Offer: 3 new properties (stackable, max_stacks, incompatible_offers)
New Node Types:
- Household
New Relationship Types:
- User SIMILAR_TO User
- User MEMBER_OF Household
Backward Compatibility:
- All Phase 2 properties use
COALESCEfor defaults - Existing Phase 1 data unchanged
- Queries work with or without Phase 2 enhancements
Performance Considerations
Section titled “Performance Considerations”Query Optimization Tips
Section titled “Query Optimization Tips”- Use Indexes: Always index on properties used in WHERE clauses
- Limit Results: Use LIMIT to prevent large result sets
- Profile Queries: Use
EXPLAINandPROFILEto optimize - Batch Updates: Use UNWIND for bulk operations
- Avoid Cartesian Products: Always have relationship paths in MATCH
Recommended Index Strategy
Section titled “Recommended Index Strategy”High Priority:
- Primary keys:
user_id,product_id,retailer_id,offer_id - Frequently queried:
category,brand,zip - Date ranges:
start,endon Offer
Medium Priority:
- Phase 2 flags:
organic,vegan,trending - Relationship properties:
times,laston PURCHASED/VISITED
Low Priority:
- Descriptive fields:
name,title,address
Maintenance & Updates
Section titled “Maintenance & Updates”Regular Maintenance Tasks
Section titled “Regular Maintenance Tasks”-
Update User Similarity (Monthly)
- Recalculate SIMILAR_TO relationships
- Archive stale relationships (computed_date > 30 days)
-
Update Product Similarity (Weekly)
- Refresh SIMILAR_TO scores
- Increment version counter
-
Clean Expired Offers (Daily)
- Archive offers where
end < date() - Maintain ELIGIBLE_FOR relationships
- Archive offers where
-
Recompute Price Trends (Weekly)
- Analyze PURCHASED.avg_price_per_unit changes
- Update price_trend flags
This schema documentation is current as of Phase 2 completion (2025-11-08)