Skip to content

Consumer Graph MCP - Complete Database Schema

Consumer Graph MCP - Complete Database Schema

Section titled “Consumer Graph MCP - Complete Database Schema”

This document provides a comprehensive view of the Neo4j graph database schema for the Consumer Graph MCP server, including all node types, relationships, properties, indexes, and detailed calculation notes.

Last Updated: 2025-11-08 Schema Version: Phase 2 Complete Total Node Types: 5 Total Relationship Types: 6


Represents a consumer/shopper in the system.

Properties:

PropertyTypePhaseRequiredDescriptionCalculation Notes
user_idstring1Unique user identifierPrimary key
zipstring1User’s zip codeUsed for location-based features
created_atdatetime1Account creation timestampSet to time.Now().UTC() on creation

Indexes:

  • None currently (recommend: user_id for lookups)

Sample Cypher:

CREATE (u:User {
user_id: 'USER_123',
zip: '60614',
created_at: datetime()
})

Count: ~5 test users (production: thousands)


Represents a purchasable product/item.

Properties:

PropertyTypePhaseRequiredDescriptionCalculation Notes
product_idstring1Unique product identifierPrimary key
namestring1Product display nameHuman-readable
norm_namestring1Normalized product nameLowercase, no special chars
brandstring1Brand/manufacturer name
categorystring1Product categoryE.g., “Dairy”, “Produce”, “Meat”
upcstring1Universal Product CodeBarcode identifier
product_typestring2Product type classificationDefault: ‘food’
organicboolean2Organic certification flagtrue for Dairy/Produce
veganboolean2Vegan product flagtrue for Produce/Grains
gluten_freeboolean2Gluten-free flagtrue for Produce/Dairy/Meat
sustainableboolean2Sustainability flagDefault: false
localboolean2Locally sourced flagDefault: false
handmadeboolean2Handmade/artisan flagDefault: false
eco_friendlyboolean2Eco-friendly packaging flagDefault: false
tagsarray[string]2Descriptive tagsCategory-specific (e.g., [‘high-protein’, ‘calcium’])
first_seen_datedate2Product introduction dateDefault: current date
trendingboolean2Trending product flagRandom 20% marked as trending

Indexes:

  • None currently (recommend: product_id, category, brand)

Sample Cypher:

CREATE (p:Product {
product_id: 'PROD_123',
name: 'Organic Whole Milk',
norm_name: 'organic whole milk',
brand: 'Horizon',
category: 'Dairy',
upc: '074660011000',
product_type: 'food',
organic: true,
vegan: false,
gluten_free: true,
tags: ['high-protein', 'calcium'],
first_seen_date: date(),
trending: false
})

Count: ~5 test products (production: thousands)


Represents a physical store/retailer location.

Properties:

PropertyTypePhaseRequiredDescriptionCalculation Notes
retailer_idstring1Unique retailer identifierPrimary key
namestring1Store nameE.g., “Whole Foods Market”
addressstring1Street address
citystring1City name
statestring1State abbreviationE.g., “IL”
zipstring1Zip codeUsed for location matching
venue_typestring2Store type classificationgrocery, warehouse, convenience, pharmacy

Venue Type Classification:

  • Default: ‘grocery’
  • ‘warehouse’: Walmart, Target, “Warehouse” in name
  • ‘convenience’: 7-Eleven, Circle K, “Convenience” in name
  • ‘pharmacy’: CVS, Walgreens, “Pharmacy” in name

Indexes:

  • None currently (recommend: retailer_id, zip)

Sample Cypher:

CREATE (r:Retailer {
retailer_id: 'RET_123',
name: 'Whole Foods Market',
address: '1000 W North Ave',
city: 'Chicago',
state: 'IL',
zip: '60642',
venue_type: 'grocery'
})

Count: ~3 test retailers (production: thousands)


Represents a promotional offer/reward opportunity.

Properties:

PropertyTypePhaseRequiredDescriptionCalculation Notes
offer_idstring1Unique offer identifierPrimary key
titlestring1Offer display titleHuman-readable description
pointsinteger1Points valueReward points earned
startdate1Offer start dateUses Neo4j date() type
enddate1Offer end dateUses Neo4j date() type
priorityinteger1Display priorityHigher = more prominent
venue_typestring1Applicable venue typegrocery, warehouse, convenience, pharmacy, or null (all)
stackableboolean2Can stack with other offersDefault: true
max_stacks_per_transactioninteger2Max stacks allowedDefault: 1
incompatible_offersarray[string]2Incompatible offer IDsDefault: []

Date Handling:

  • start and end are Neo4j DATE types
  • CRITICAL: Must use date() function in comparisons, NOT datetime()
  • Example: WHERE date() >= o.start AND date() <= o.end

Stacking Rules:

  • Offers with ≥1000 points: stackable=false
  • Others: stackable=true by default

Indexes:

  • None currently (recommend: offer_id, start, end)

Sample Cypher:

CREATE (o:Offer {
offer_id: 'OFFER_123',
title: 'Buy 2 Get 500 Points',
points: 500,
start: date('2025-01-01'),
end: date('2025-12-31'),
priority: 5,
venue_type: 'grocery',
stackable: true,
max_stacks_per_transaction: 1,
incompatible_offers: []
})

Count: ~5 test offers (production: hundreds)


Phase 2 Only - Represents a household grouping of users.

Properties:

PropertyTypePhaseRequiredDescriptionCalculation Notes
household_idstring2Unique household identifierFormat: HH_{zip}_{random}
member_countinteger2Number of household membersMax 3 per household
combined_monthly_budgetfloat2Household budgetDefault: $1000.00

Creation Logic:

  • Groups users by zip code
  • Maximum 3 users per household
  • Only creates if 2+ users in same zip

Sample Cypher:

CREATE (h:Household {
household_id: 'HH_60614_742',
member_count: 3,
combined_monthly_budget: 1000.0
})

Count: Varies by zip code distribution (test: 0-2)


Pattern: (User)-[:PURCHASED]->(Product)

Tracks user purchase history for a product.

Properties:

PropertyTypePhaseRequiredDescriptionCalculation Notes
timesinteger1Number of times purchasedCount of purchase events
qtyinteger1Total quantity purchasedSum across all purchases
firstdatetime1First purchase timestampEarliest purchase date/time (UTC)
lastdatetime1Last purchase timestampMost recent purchase date/time (UTC)
total_spentfloat2Total amount spentCalculated: qty * avg_price_per_unit
avg_price_per_unitfloat2Average price per unitVaries by price_trend
min_pricefloat2Minimum price paidHistorical minimum
max_pricefloat2Maximum price paidHistorical maximum
price_trendstring2Price trend indicator’stable’, ‘increasing’, ‘decreasing’

Price Trend Distribution:

  • 30% ‘increasing’ (avg $5.50/unit)
  • 20% ‘decreasing’ (avg $4.50/unit)
  • 50% ‘stable’ (avg $5.00/unit)

Derived Calculations:

  • Purchase Interval: duration.inDays(first, last) / (times - 1) (avg days between purchases)
  • Avg Qty Per Purchase: qty / times
  • Next Purchase Prediction: last + purchase_interval

Sample Cypher:

CREATE (u:User {user_id: 'USER_1'})-[p:PURCHASED {
times: 5,
qty: 12,
first: datetime('2024-01-15T10:30:00Z'),
last: datetime('2024-11-01T14:20:00Z'),
total_spent: 66.0,
avg_price_per_unit: 5.5,
min_price: 4.99,
max_price: 5.99,
price_trend: 'increasing'
}]->(p:Product {product_id: 'PROD_1'})

Count: ~5 test relationships (production: millions)


Pattern: (User)-[:VISITED]->(Retailer)

Tracks user visits to retail locations.

Properties:

PropertyTypePhaseRequiredDescriptionCalculation Notes
timesinteger1Number of visitsCount of visit events
firstdatetime1First visit timestampEarliest visit (UTC)
lastdatetime1Last visit timestampMost recent visit (UTC)
total_spentfloat1Total amount spentSum across all visits
datesarray[datetime]2Individual visit datesArray of visit timestamps
avg_basket_sizefloat2Average spending per visittotal_spent / times
preferred_day_of_weekinteger2Most common shopping day0=Monday, 6=Sunday

Derived Calculations:

  • Visit Frequency: total_spent / times (avg per visit)
  • Days Since Last Visit: duration.inDays(last, datetime())
  • Visit Pattern: Analyze dates array for patterns

Sample Cypher:

CREATE (u:User {user_id: 'USER_1'})-[v:VISITED {
times: 12,
first: datetime('2024-01-01T09:00:00Z'),
last: datetime('2024-11-05T18:30:00Z'),
total_spent: 1200.0,
dates: [datetime('2024-01-01T09:00:00Z'), datetime('2024-02-15T10:00:00Z')],
avg_basket_size: 100.0,
preferred_day_of_week: 6
}]->(r:Retailer {retailer_id: 'RET_1'})

Count: ~5 test relationships (production: millions)


Pattern: (Offer)-[:ELIGIBLE_FOR]->(User)

Defines which users are eligible for which offers.

Properties:

PropertyTypePhaseRequiredDescriptionCalculation Notes
startdate1Eligibility start dateWhen user became eligible

Eligibility Logic:

  • User must match offer criteria (implicit)
  • Eligibility can start before/after offer.start
  • Used for personalized offer targeting

Sample Cypher:

CREATE (o:Offer {offer_id: 'OFFER_1'})-[e:ELIGIBLE_FOR {
start: date('2025-01-01')
}]->(u:User {user_id: 'USER_1'})

Count: ~7 test relationships (production: millions)


Pattern: (Offer)-[:APPLIES_TO]->(Product)

Defines which products an offer applies to.

Properties: None

Usage:

  • Links offers to qualifying products
  • An offer can apply to multiple products
  • Used to determine offer applicability during purchase

Sample Cypher:

CREATE (o:Offer {offer_id: 'OFFER_1'})-[:APPLIES_TO]->(p:Product {product_id: 'PROD_1'})

Count: ~4 test relationships (production: thousands)


Pattern: (Product)-[:SIMILAR_TO]->(Product)

Represents product-to-product similarity for recommendations.

Properties:

PropertyTypePhaseRequiredDescriptionCalculation Notes
scorefloat1Similarity score0.0-1.0, higher = more similar
vinteger1Version/update counterIncrements on recalculation

Calculation:

  • Based on product attributes (category, brand, tags, etc.)
  • Collaborative filtering (users who bought X also bought Y)
  • Periodically recomputed

Sample Cypher:

CREATE (p1:Product {product_id: 'PROD_1'})-[s:SIMILAR_TO {
score: 0.85,
v: 1
}]->(p2:Product {product_id: 'PROD_2'})

Count: ~5 test relationships (production: millions)


Phase 2 Only

Pattern: (User)-[:SIMILAR_TO]->(User)

Represents user-to-user similarity for collaborative filtering.

Properties:

PropertyTypePhaseRequiredDescriptionCalculation Notes
similarity_scorefloat2Jaccard similarity score0.0-1.0
common_productsinteger2Count of shared productsMinimum: 3
computed_datedate2Last computation dateWhen similarity calculated

Calculation (Jaccard Similarity):

similarity_score = |products(u1) ∩ products(u2)| / |products(u1) ∪ products(u2)|

Creation Criteria:

  • Minimum 3 shared products
  • Minimum 20% similarity score
  • One-way relationships (u1.user_id < u2.user_id to avoid duplicates)

Sample Cypher:

CREATE (u1:User {user_id: 'USER_1'})-[s:SIMILAR_TO {
similarity_score: 0.42,
common_products: 8,
computed_date: date()
}]->(u2:User {user_id: 'USER_2'})

Count: Varies by purchase overlap (test: 0-10)


Phase 2 Only

Pattern: (User)-[:MEMBER_OF]->(Household)

Links users to their household.

Properties:

PropertyTypePhaseRequiredDescriptionCalculation Notes
rolestring2Household role’primary_shopper’ or ‘secondary_shopper’
joined_datedate2Date joined householdWhen relationship created
contribution_pctinteger2Contribution percentagePrimary: 60%, Secondary: 40%

Role Assignment:

  • First user in household: ‘primary_shopper’ (60%)
  • Subsequent users: ‘secondary_shopper’ (40%)

Sample Cypher:

CREATE (u:User {user_id: 'USER_1'})-[m:MEMBER_OF {
role: 'primary_shopper',
joined_date: date(),
contribution_pct: 60
}]->(h:Household {household_id: 'HH_60614_123'})

Count: Matches household member_count


Indexes: None currently defined

Constraints: None currently defined

For production performance, consider adding:

// Node property indexes
CREATE INDEX user_id_index FOR (u:User) ON (u.user_id);
CREATE INDEX product_id_index FOR (p:Product) ON (p.product_id);
CREATE INDEX product_category_index FOR (p:Product) ON (p.category);
CREATE INDEX product_brand_index FOR (p:Product) ON (p.brand);
CREATE INDEX retailer_id_index FOR (r:Retailer) ON (r.retailer_id);
CREATE INDEX retailer_zip_index FOR (r:Retailer) ON (r.zip);
CREATE INDEX offer_id_index FOR (o:Offer) ON (o.offer_id);
CREATE INDEX offer_dates_index FOR (o:Offer) ON (o.start, o.end);
CREATE INDEX household_id_index FOR (h:Household) ON (h.household_id);
// Uniqueness constraints
CREATE CONSTRAINT user_id_unique FOR (u:User) REQUIRE u.user_id IS UNIQUE;
CREATE CONSTRAINT product_id_unique FOR (p:Product) REQUIRE p.product_id IS UNIQUE;
CREATE CONSTRAINT retailer_id_unique FOR (r:Retailer) REQUIRE r.retailer_id IS UNIQUE;
CREATE CONSTRAINT offer_id_unique FOR (o:Offer) REQUIRE o.offer_id IS UNIQUE;
CREATE CONSTRAINT household_id_unique FOR (h:Household) REQUIRE h.household_id IS UNIQUE;

Average Days Between Purchases:

MATCH (u:User)-[p:PURCHASED]->(prod:Product)
WHERE p.times > 1
WITH duration.inDays(p.first, p.last).days AS total_days,
p.times AS purchase_count
RETURN total_days / (purchase_count - 1) AS avg_interval_days

Next Purchase Prediction:

MATCH (u:User)-[p:PURCHASED]->(prod:Product)
WITH p.last AS last_purchase,
duration.inDays(p.first, p.last).days / (p.times - 1) AS interval
RETURN last_purchase + duration({days: interval}) AS predicted_next_purchase

Total Spending by Category:

MATCH (u:User)-[p:PURCHASED]->(prod:Product)
WITH prod.category AS category,
sum(p.total_spent) AS category_total
RETURN category, category_total
ORDER BY category_total DESC

Price Trend Analysis:

MATCH (u:User)-[p:PURCHASED]->(prod:Product)
RETURN prod.product_id,
prod.name,
p.price_trend,
p.avg_price_per_unit,
(p.max_price - p.min_price) / p.avg_price_per_unit * 100 AS price_volatility_pct

Primary Shopping Location:

MATCH (u:User)-[v:VISITED]->(r:Retailer)
RETURN r.retailer_id,
r.name,
v.times,
v.total_spent
ORDER BY v.times DESC, v.total_spent DESC
LIMIT 1

Venue Type Distribution:

MATCH (u:User)-[v:VISITED]->(r:Retailer)
WITH r.venue_type AS venue_type,
count(v) AS visit_count,
sum(v.total_spent) AS total_spent
RETURN venue_type, visit_count, total_spent
ORDER BY visit_count DESC

Similar Users Query:

MATCH (u1:User {user_id: $user_id})-[s:SIMILAR_TO]->(u2:User)
RETURN u2.user_id,
s.similarity_score,
s.common_products
ORDER BY s.similarity_score DESC
LIMIT 10

Products Purchased by Similar Users:

MATCH (u1:User {user_id: $user_id})-[:SIMILAR_TO]->(u2:User)
MATCH (u2)-[:PURCHASED]->(p:Product)
WHERE NOT (u1)-[:PURCHASED]->(p)
WITH p, count(DISTINCT u2) AS similar_user_count,
avg(s.similarity_score) AS avg_similarity
RETURN p.product_id,
p.name,
similar_user_count,
avg_similarity
ORDER BY similar_user_count DESC, avg_similarity DESC
LIMIT 20

Eligible Active Offers:

MATCH (o:Offer)-[:ELIGIBLE_FOR]->(u:User {user_id: $user_id})
WHERE date() >= o.start AND date() <= o.end
RETURN o
ORDER BY o.priority DESC, o.points DESC

Offer Stacking Optimization:

MATCH (o:Offer)-[:ELIGIBLE_FOR]->(u:User {user_id: $user_id})
MATCH (o)-[:APPLIES_TO]->(p:Product)
WHERE date() >= o.start
AND date() <= o.end
AND (u)-[:PURCHASED]->(p)
AND o.stackable = true
WITH o, p, count(DISTINCT p) AS applicable_products
RETURN o.offer_id,
o.title,
o.points,
o.max_stacks_per_transaction,
applicable_products,
o.points * o.max_stacks_per_transaction AS max_points_possible
ORDER BY max_points_possible DESC

  • string: Text data
  • integer: Whole numbers (int64)
  • float: Decimal numbers (float64)
  • boolean: true/false
  • date: Calendar date (no time component) - Use date() function
  • datetime: Date + time with timezone - Use datetime() function
  • array[T]: List of values of type T

CRITICAL: Date vs DateTime

  • DATE properties (start, end on Offer): Use date() for comparisons
  • DATETIME properties (created_at, first, last): Use datetime() for comparisons
  • Mixing types returns NULL - Always match the property type!

Example:

// CORRECT
WHERE date() >= o.start AND date() <= o.end
// WRONG - Returns NULL!
WHERE datetime() >= o.start AND datetime() <= o.end

New Properties Added:

  • Product: 11 new properties (product_type, organic, vegan, etc.)
  • Retailer: 1 new property (venue_type)
  • PURCHASED: 5 new properties (total_spent, price trends, etc.)
  • VISITED: 3 new properties (dates, avg_basket_size, preferred_day)
  • Offer: 3 new properties (stackable, max_stacks, incompatible_offers)

New Node Types:

  • Household

New Relationship Types:

  • User SIMILAR_TO User
  • User MEMBER_OF Household

Backward Compatibility:

  • All Phase 2 properties use COALESCE for defaults
  • Existing Phase 1 data unchanged
  • Queries work with or without Phase 2 enhancements

  1. Use Indexes: Always index on properties used in WHERE clauses
  2. Limit Results: Use LIMIT to prevent large result sets
  3. Profile Queries: Use EXPLAIN and PROFILE to optimize
  4. Batch Updates: Use UNWIND for bulk operations
  5. Avoid Cartesian Products: Always have relationship paths in MATCH

High Priority:

  • Primary keys: user_id, product_id, retailer_id, offer_id
  • Frequently queried: category, brand, zip
  • Date ranges: start, end on Offer

Medium Priority:

  • Phase 2 flags: organic, vegan, trending
  • Relationship properties: times, last on PURCHASED/VISITED

Low Priority:

  • Descriptive fields: name, title, address

  1. Update User Similarity (Monthly)

    • Recalculate SIMILAR_TO relationships
    • Archive stale relationships (computed_date > 30 days)
  2. Update Product Similarity (Weekly)

    • Refresh SIMILAR_TO scores
    • Increment version counter
  3. Clean Expired Offers (Daily)

    • Archive offers where end < date()
    • Maintain ELIGIBLE_FOR relationships
  4. Recompute Price Trends (Weekly)

    • Analyze PURCHASED.avg_price_per_unit changes
    • Update price_trend flags

This schema documentation is current as of Phase 2 completion (2025-11-08)