Consumer Graph MCP - Complete Database Schema

Overview

This document provides a comprehensive view of the Neo4j graph database schema for the Consumer Graph MCP server, including all node types, relationships, properties, indexes, and detailed calculation notes.

Last Updated: 2025-11-08 Schema Version: Phase 2 Complete Total Node Types: 5 Total Relationship Types: 6

Node Types

1. User

Represents a consumer/shopper in the system.

Properties:

Property	Type	Phase	Required	Description	Calculation Notes
`user_id`	string	1	✅	Unique user identifier	Primary key
`zip`	string	1	✅	User’s zip code	Used for location-based features
`created_at`	datetime	1	✅	Account creation timestamp	Set to `time.Now().UTC()` on creation

Indexes:

None currently (recommend: user_id for lookups)

Sample Cypher:

CREATE (u:User {
    user_id: 'USER_123',
    zip: '60614',
    created_at: datetime()
})

Count: ~5 test users (production: thousands)

2. Product

Represents a purchasable product/item.

Properties:

Property	Type	Phase	Required	Description	Calculation Notes
`product_id`	string	1	✅	Unique product identifier	Primary key
`name`	string	1	✅	Product display name	Human-readable
`norm_name`	string	1	✅	Normalized product name	Lowercase, no special chars
`brand`	string	1	✅	Brand/manufacturer name
`category`	string	1	✅	Product category	E.g., “Dairy”, “Produce”, “Meat”
`upc`	string	1	✅	Universal Product Code	Barcode identifier
`product_type`	string	2	❌	Product type classification	Default: ‘food’
`organic`	boolean	2	❌	Organic certification flag	true for Dairy/Produce
`vegan`	boolean	2	❌	Vegan product flag	true for Produce/Grains
`gluten_free`	boolean	2	❌	Gluten-free flag	true for Produce/Dairy/Meat
`sustainable`	boolean	2	❌	Sustainability flag	Default: false
`local`	boolean	2	❌	Locally sourced flag	Default: false
`handmade`	boolean	2	❌	Handmade/artisan flag	Default: false
`eco_friendly`	boolean	2	❌	Eco-friendly packaging flag	Default: false
`tags`	array[string]	2	❌	Descriptive tags	Category-specific (e.g., [‘high-protein’, ‘calcium’])
`first_seen_date`	date	2	❌	Product introduction date	Default: current date
`trending`	boolean	2	❌	Trending product flag	Random 20% marked as trending

Indexes:

None currently (recommend: product_id, category, brand)

Sample Cypher:

CREATE (p:Product {
    product_id: 'PROD_123',
    name: 'Organic Whole Milk',
    norm_name: 'organic whole milk',
    brand: 'Horizon',
    category: 'Dairy',
    upc: '074660011000',
    product_type: 'food',
    organic: true,
    vegan: false,
    gluten_free: true,
    tags: ['high-protein', 'calcium'],
    first_seen_date: date(),
    trending: false
})

Count: ~5 test products (production: thousands)

3. Retailer

Represents a physical store/retailer location.

Properties:

Property	Type	Phase	Required	Description	Calculation Notes
`retailer_id`	string	1	✅	Unique retailer identifier	Primary key
`name`	string	1	✅	Store name	E.g., “Whole Foods Market”
`address`	string	1	✅	Street address
`city`	string	1	✅	City name
`state`	string	1	✅	State abbreviation	E.g., “IL”
`zip`	string	1	✅	Zip code	Used for location matching
`venue_type`	string	2	❌	Store type classification	grocery, warehouse, convenience, pharmacy

Venue Type Classification:

Default: ‘grocery’
‘warehouse’: Walmart, Target, “Warehouse” in name
‘convenience’: 7-Eleven, Circle K, “Convenience” in name
‘pharmacy’: CVS, Walgreens, “Pharmacy” in name

Indexes:

None currently (recommend: retailer_id, zip)

Sample Cypher:

CREATE (r:Retailer {
    retailer_id: 'RET_123',
    name: 'Whole Foods Market',
    address: '1000 W North Ave',
    city: 'Chicago',
    state: 'IL',
    zip: '60642',
    venue_type: 'grocery'
})

Count: ~3 test retailers (production: thousands)

4. Offer

Represents a promotional offer/reward opportunity.

Properties:

Property	Type	Phase	Required	Description	Calculation Notes
`offer_id`	string	1	✅	Unique offer identifier	Primary key
`title`	string	1	✅	Offer display title	Human-readable description
`points`	integer	1	✅	Points value	Reward points earned
`start`	date	1	✅	Offer start date	Uses Neo4j date() type
`end`	date	1	✅	Offer end date	Uses Neo4j date() type
`priority`	integer	1	✅	Display priority	Higher = more prominent
`venue_type`	string	1	❌	Applicable venue type	grocery, warehouse, convenience, pharmacy, or null (all)
`stackable`	boolean	2	❌	Can stack with other offers	Default: true
`max_stacks_per_transaction`	integer	2	❌	Max stacks allowed	Default: 1
`incompatible_offers`	array[string]	2	❌	Incompatible offer IDs	Default: []

Date Handling:

start and end are Neo4j DATE types
CRITICAL: Must use date() function in comparisons, NOT datetime()
Example: WHERE date() >= o.start AND date() <= o.end

Stacking Rules:

Offers with ≥1000 points: stackable=false
Others: stackable=true by default

Indexes:

None currently (recommend: offer_id, start, end)

Sample Cypher:

CREATE (o:Offer {
    offer_id: 'OFFER_123',
    title: 'Buy 2 Get 500 Points',
    points: 500,
    start: date('2025-01-01'),
    end: date('2025-12-31'),
    priority: 5,
    venue_type: 'grocery',
    stackable: true,
    max_stacks_per_transaction: 1,
    incompatible_offers: []
})

Count: ~5 test offers (production: hundreds)

5. Household

Phase 2 Only - Represents a household grouping of users.

Properties:

Property	Type	Phase	Required	Description	Calculation Notes
`household_id`	string	2	✅	Unique household identifier	Format: `HH_{zip}_{random}`
`member_count`	integer	2	✅	Number of household members	Max 3 per household
`combined_monthly_budget`	float	2	✅	Household budget	Default: $1000.00

Creation Logic:

Groups users by zip code
Maximum 3 users per household
Only creates if 2+ users in same zip

Sample Cypher:

CREATE (h:Household {
    household_id: 'HH_60614_742',
    member_count: 3,
    combined_monthly_budget: 1000.0
})

Count: Varies by zip code distribution (test: 0-2)

Relationship Types

1. PURCHASED

Pattern: (User)-[:PURCHASED]->(Product)

Tracks user purchase history for a product.

Properties:

Property	Type	Phase	Required	Description	Calculation Notes
`times`	integer	1	✅	Number of times purchased	Count of purchase events
`qty`	integer	1	✅	Total quantity purchased	Sum across all purchases
`first`	datetime	1	✅	First purchase timestamp	Earliest purchase date/time (UTC)
`last`	datetime	1	✅	Last purchase timestamp	Most recent purchase date/time (UTC)
`total_spent`	float	2	❌	Total amount spent	Calculated: `qty * avg_price_per_unit`
`avg_price_per_unit`	float	2	❌	Average price per unit	Varies by price_trend
`min_price`	float	2	❌	Minimum price paid	Historical minimum
`max_price`	float	2	❌	Maximum price paid	Historical maximum
`price_trend`	string	2	❌	Price trend indicator	’stable’, ‘increasing’, ‘decreasing’

Price Trend Distribution:

30% ‘increasing’ (avg $5.50/unit)
20% ‘decreasing’ (avg $4.50/unit)
50% ‘stable’ (avg $5.00/unit)

Derived Calculations:

Purchase Interval: duration.inDays(first, last) / (times - 1) (avg days between purchases)
Avg Qty Per Purchase: qty / times
Next Purchase Prediction: last + purchase_interval

Sample Cypher:

CREATE (u:User {user_id: 'USER_1'})-[p:PURCHASED {
    times: 5,
    qty: 12,
    first: datetime('2024-01-15T10:30:00Z'),
    last: datetime('2024-11-01T14:20:00Z'),
    total_spent: 66.0,
    avg_price_per_unit: 5.5,
    min_price: 4.99,
    max_price: 5.99,
    price_trend: 'increasing'
}]->(p:Product {product_id: 'PROD_1'})

Count: ~5 test relationships (production: millions)

2. VISITED

Pattern: (User)-[:VISITED]->(Retailer)

Tracks user visits to retail locations.

Properties:

Property	Type	Phase	Required	Description	Calculation Notes
`times`	integer	1	✅	Number of visits	Count of visit events
`first`	datetime	1	✅	First visit timestamp	Earliest visit (UTC)
`last`	datetime	1	✅	Last visit timestamp	Most recent visit (UTC)
`total_spent`	float	1	✅	Total amount spent	Sum across all visits
`dates`	array[datetime]	2	❌	Individual visit dates	Array of visit timestamps
`avg_basket_size`	float	2	❌	Average spending per visit	`total_spent / times`
`preferred_day_of_week`	integer	2	❌	Most common shopping day	0=Monday, 6=Sunday

Derived Calculations:

Visit Frequency: total_spent / times (avg per visit)
Days Since Last Visit: duration.inDays(last, datetime())
Visit Pattern: Analyze dates array for patterns

Sample Cypher:

CREATE (u:User {user_id: 'USER_1'})-[v:VISITED {
    times: 12,
    first: datetime('2024-01-01T09:00:00Z'),
    last: datetime('2024-11-05T18:30:00Z'),
    total_spent: 1200.0,
    dates: [datetime('2024-01-01T09:00:00Z'), datetime('2024-02-15T10:00:00Z')],
    avg_basket_size: 100.0,
    preferred_day_of_week: 6
}]->(r:Retailer {retailer_id: 'RET_1'})

Count: ~5 test relationships (production: millions)

3. ELIGIBLE_FOR

Pattern: (Offer)-[:ELIGIBLE_FOR]->(User)

Defines which users are eligible for which offers.

Properties:

Property	Type	Phase	Required	Description	Calculation Notes
`start`	date	1	✅	Eligibility start date	When user became eligible

Eligibility Logic:

User must match offer criteria (implicit)
Eligibility can start before/after offer.start
Used for personalized offer targeting

Sample Cypher:

CREATE (o:Offer {offer_id: 'OFFER_1'})-[e:ELIGIBLE_FOR {
    start: date('2025-01-01')
}]->(u:User {user_id: 'USER_1'})

Count: ~7 test relationships (production: millions)

4. APPLIES_TO

Pattern: (Offer)-[:APPLIES_TO]->(Product)

Defines which products an offer applies to.

Properties: None

Usage:

Links offers to qualifying products
An offer can apply to multiple products
Used to determine offer applicability during purchase

Sample Cypher:

CREATE (o:Offer {offer_id: 'OFFER_1'})-[:APPLIES_TO]->(p:Product {product_id: 'PROD_1'})

Count: ~4 test relationships (production: thousands)

5. SIMILAR_TO (Product Similarity)

Pattern: (Product)-[:SIMILAR_TO]->(Product)

Represents product-to-product similarity for recommendations.

Properties:

Property	Type	Phase	Required	Description	Calculation Notes
`score`	float	1	✅	Similarity score	0.0-1.0, higher = more similar
`v`	integer	1	✅	Version/update counter	Increments on recalculation

Calculation:

Based on product attributes (category, brand, tags, etc.)
Collaborative filtering (users who bought X also bought Y)
Periodically recomputed

Sample Cypher:

CREATE (p1:Product {product_id: 'PROD_1'})-[s:SIMILAR_TO {
    score: 0.85,
    v: 1
}]->(p2:Product {product_id: 'PROD_2'})

Count: ~5 test relationships (production: millions)

6. SIMILAR_TO (User Similarity)

Phase 2 Only

Pattern: (User)-[:SIMILAR_TO]->(User)

Represents user-to-user similarity for collaborative filtering.

Properties:

Property	Type	Phase	Required	Description	Calculation Notes
`similarity_score`	float	2	✅	Jaccard similarity score	0.0-1.0
`common_products`	integer	2	✅	Count of shared products	Minimum: 3
`computed_date`	date	2	✅	Last computation date	When similarity calculated

Calculation (Jaccard Similarity):

similarity_score = |products(u1) ∩ products(u2)| / |products(u1) ∪ products(u2)|

Creation Criteria:

Minimum 3 shared products
Minimum 20% similarity score
One-way relationships (u1.user_id < u2.user_id to avoid duplicates)

Sample Cypher:

CREATE (u1:User {user_id: 'USER_1'})-[s:SIMILAR_TO {
    similarity_score: 0.42,
    common_products: 8,
    computed_date: date()
}]->(u2:User {user_id: 'USER_2'})

Count: Varies by purchase overlap (test: 0-10)

7. MEMBER_OF

Phase 2 Only

Pattern: (User)-[:MEMBER_OF]->(Household)

Links users to their household.

Properties:

Property	Type	Phase	Required	Description	Calculation Notes
`role`	string	2	✅	Household role	’primary_shopper’ or ‘secondary_shopper’
`joined_date`	date	2	✅	Date joined household	When relationship created
`contribution_pct`	integer	2	✅	Contribution percentage	Primary: 60%, Secondary: 40%

Role Assignment:

First user in household: ‘primary_shopper’ (60%)
Subsequent users: ‘secondary_shopper’ (40%)

Sample Cypher:

CREATE (u:User {user_id: 'USER_1'})-[m:MEMBER_OF {
    role: 'primary_shopper',
    joined_date: date(),
    contribution_pct: 60
}]->(h:Household {household_id: 'HH_60614_123'})

Count: Matches household member_count

Indexes & Constraints

Current State

Indexes: None currently defined

Constraints: None currently defined

Recommended Indexes

For production performance, consider adding:

// Node property indexes
CREATE INDEX user_id_index FOR (u:User) ON (u.user_id);
CREATE INDEX product_id_index FOR (p:Product) ON (p.product_id);
CREATE INDEX product_category_index FOR (p:Product) ON (p.category);
CREATE INDEX product_brand_index FOR (p:Product) ON (p.brand);
CREATE INDEX retailer_id_index FOR (r:Retailer) ON (r.retailer_id);
CREATE INDEX retailer_zip_index FOR (r:Retailer) ON (r.zip);
CREATE INDEX offer_id_index FOR (o:Offer) ON (o.offer_id);
CREATE INDEX offer_dates_index FOR (o:Offer) ON (o.start, o.end);
CREATE INDEX household_id_index FOR (h:Household) ON (h.household_id);

// Uniqueness constraints
CREATE CONSTRAINT user_id_unique FOR (u:User) REQUIRE u.user_id IS UNIQUE;
CREATE CONSTRAINT product_id_unique FOR (p:Product) REQUIRE p.product_id IS UNIQUE;
CREATE CONSTRAINT retailer_id_unique FOR (r:Retailer) REQUIRE r.retailer_id IS UNIQUE;
CREATE CONSTRAINT offer_id_unique FOR (o:Offer) REQUIRE o.offer_id IS UNIQUE;
CREATE CONSTRAINT household_id_unique FOR (h:Household) REQUIRE h.household_id IS UNIQUE;

Calculation Notes & Formulas

Purchase Frequency Predictions

Average Days Between Purchases:

MATCH (u:User)-[p:PURCHASED]->(prod:Product)
WHERE p.times > 1
WITH duration.inDays(p.first, p.last).days AS total_days,
     p.times AS purchase_count
RETURN total_days / (purchase_count - 1) AS avg_interval_days

Next Purchase Prediction:

MATCH (u:User)-[p:PURCHASED]->(prod:Product)
WITH p.last AS last_purchase,
     duration.inDays(p.first, p.last).days / (p.times - 1) AS interval
RETURN last_purchase + duration({days: interval}) AS predicted_next_purchase

Spending Analytics

Total Spending by Category:

MATCH (u:User)-[p:PURCHASED]->(prod:Product)
WITH prod.category AS category,
     sum(p.total_spent) AS category_total
RETURN category, category_total
ORDER BY category_total DESC

Price Trend Analysis:

MATCH (u:User)-[p:PURCHASED]->(prod:Product)
RETURN prod.product_id,
       prod.name,
       p.price_trend,
       p.avg_price_per_unit,
       (p.max_price - p.min_price) / p.avg_price_per_unit * 100 AS price_volatility_pct

Location Intelligence

Primary Shopping Location:

MATCH (u:User)-[v:VISITED]->(r:Retailer)
RETURN r.retailer_id,
       r.name,
       v.times,
       v.total_spent
ORDER BY v.times DESC, v.total_spent DESC
LIMIT 1

Venue Type Distribution:

MATCH (u:User)-[v:VISITED]->(r:Retailer)
WITH r.venue_type AS venue_type,
     count(v) AS visit_count,
     sum(v.total_spent) AS total_spent
RETURN venue_type, visit_count, total_spent
ORDER BY visit_count DESC

Collaborative Filtering

Similar Users Query:

MATCH (u1:User {user_id: $user_id})-[s:SIMILAR_TO]->(u2:User)
RETURN u2.user_id,
       s.similarity_score,
       s.common_products
ORDER BY s.similarity_score DESC
LIMIT 10

Products Purchased by Similar Users:

MATCH (u1:User {user_id: $user_id})-[:SIMILAR_TO]->(u2:User)
MATCH (u2)-[:PURCHASED]->(p:Product)
WHERE NOT (u1)-[:PURCHASED]->(p)
WITH p, count(DISTINCT u2) AS similar_user_count,
     avg(s.similarity_score) AS avg_similarity
RETURN p.product_id,
       p.name,
       similar_user_count,
       avg_similarity
ORDER BY similar_user_count DESC, avg_similarity DESC
LIMIT 20

Offer Optimization

Eligible Active Offers:

MATCH (o:Offer)-[:ELIGIBLE_FOR]->(u:User {user_id: $user_id})
WHERE date() >= o.start AND date() <= o.end
RETURN o
ORDER BY o.priority DESC, o.points DESC

Offer Stacking Optimization:

MATCH (o:Offer)-[:ELIGIBLE_FOR]->(u:User {user_id: $user_id})
MATCH (o)-[:APPLIES_TO]->(p:Product)
WHERE date() >= o.start
  AND date() <= o.end
  AND (u)-[:PURCHASED]->(p)
  AND o.stackable = true
WITH o, p, count(DISTINCT p) AS applicable_products
RETURN o.offer_id,
       o.title,
       o.points,
       o.max_stacks_per_transaction,
       applicable_products,
       o.points * o.max_stacks_per_transaction AS max_points_possible
ORDER BY max_points_possible DESC

Data Types Reference

Neo4j Native Types

string: Text data
integer: Whole numbers (int64)
float: Decimal numbers (float64)
boolean: true/false
date: Calendar date (no time component) - Use date() function
datetime: Date + time with timezone - Use datetime() function
array[T]: List of values of type T

Type Conversion Notes

CRITICAL: Date vs DateTime

DATE properties (start, end on Offer): Use date() for comparisons
DATETIME properties (created_at, first, last): Use datetime() for comparisons
Mixing types returns NULL - Always match the property type!

Example:

// CORRECT
WHERE date() >= o.start AND date() <= o.end

// WRONG - Returns NULL!
WHERE datetime() >= o.start AND datetime() <= o.end

Schema Evolution

Phase 1 → Phase 2 Changes

New Properties Added:

Product: 11 new properties (product_type, organic, vegan, etc.)
Retailer: 1 new property (venue_type)
PURCHASED: 5 new properties (total_spent, price trends, etc.)
VISITED: 3 new properties (dates, avg_basket_size, preferred_day)
Offer: 3 new properties (stackable, max_stacks, incompatible_offers)

New Node Types:

Household

New Relationship Types:

User SIMILAR_TO User
User MEMBER_OF Household

Backward Compatibility:

All Phase 2 properties use COALESCE for defaults
Existing Phase 1 data unchanged
Queries work with or without Phase 2 enhancements

Performance Considerations

Query Optimization Tips

Use Indexes: Always index on properties used in WHERE clauses
Limit Results: Use LIMIT to prevent large result sets
Profile Queries: Use EXPLAIN and PROFILE to optimize
Batch Updates: Use UNWIND for bulk operations
Avoid Cartesian Products: Always have relationship paths in MATCH

Recommended Index Strategy

High Priority:

Primary keys: user_id, product_id, retailer_id, offer_id
Frequently queried: category, brand, zip
Date ranges: start, end on Offer

Medium Priority:

Phase 2 flags: organic, vegan, trending
Relationship properties: times, last on PURCHASED/VISITED

Low Priority:

Descriptive fields: name, title, address

Maintenance & Updates

Regular Maintenance Tasks

Update User Similarity (Monthly)
- Recalculate SIMILAR_TO relationships
- Archive stale relationships (computed_date > 30 days)
Update Product Similarity (Weekly)
- Refresh SIMILAR_TO scores
- Increment version counter
Clean Expired Offers (Daily)
- Archive offers where end < date()
- Maintain ELIGIBLE_FOR relationships
Recompute Price Trends (Weekly)
- Analyze PURCHASED.avg_price_per_unit changes
- Update price_trend flags

This schema documentation is current as of Phase 2 completion (2025-11-08)