Graph Schema Reference
This document describes the Neo4j graph database schema powering the Consumer Graph MCP.
Node Types
User
Represents a shopper with purchase history and preferences.
Properties:
- user_id (string, unique) - User identifier
- zip (string) - ZIP code for location context
- created_at (datetime) - Account creation timestamp
- profile_type (string) - Shopping profile (e.g., “Health-Conscious”, “Family-Bulk”)
- latitude (float, optional) - Geographic coordinate
- longitude (float, optional) - Geographic coordinate
Relationships:
- PURCHASED → Product
- ELIGIBLE_FOR → Offer
- SHOPS_AT → Retailer
- MEMBER_OF → Household
- MEMBER_OF → Community
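A minimal sketch of how a User node with these properties might be upserted, assuming Neo4j 5 Cypher. The parameters `$zip` and `$profileType` are illustrative names that do not appear in the source; `$userId` follows the query examples later in this document.

```cypher
// Sketch only: parameter names other than $userId are assumptions.
// MERGE on the unique key avoids creating a duplicate User.
MERGE (u:User {user_id: $userId})
ON CREATE SET
  u.zip = $zip,
  u.profile_type = $profileType,
  u.created_at = datetime()   // account creation timestamp
RETURN u
```

Merging on `user_id` alone, rather than on the full property map, keeps the upsert aligned with the unique constraint on that property.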
Product
Represents a purchasable product.
Properties:
- product_id (string, unique) - Product identifier
- upc (string) - Universal Product Code
- name (string) - Product name
- brand (string) - Brand name
- category (string) - Product category
- organic (boolean) - Organic status
- vegan (boolean, optional) - Vegan status
- trending (boolean) - Trending flag
- tags (array of strings) - Descriptive tags
Relationships:
- IN_CATEGORY → Category
- SIMILAR_TO → Product (with similarity_score property)
- HAS_OFFER ← Offer
Offer
Represents a promotional offer for points/rewards.
Properties:
- offer_id (string, unique) - Offer identifier
- points (integer) - Point value
- description (string) - Offer description
- start (date) - Start date
- end (date) - End date
- priority (string) - Priority level (“High”, “Medium”, “Low”)
- offer_type (string) - Type of offer
- venue_type (string) - Where offer applies
Relationships:
- HAS_OFFER → Product (specific product offer)
- HAS_OFFER → Brand (brand-level offer via Category node)
- HAS_OFFER → Category (category-level offer)
- ELIGIBLE_FOR ← User
- AVAILABLE_AT → Retailer
Retailer
Represents a store or shopping venue.
Properties:
- retailer_id (string, unique) - Retailer identifier
- name (string) - Retailer name
- venue_type (string) - Type of venue (“Grocery”, “Drugstore”, “Warehouse”, etc.)
- tier (string) - Price tier (“Premium”, “Mid-Range”, “Discount”, etc.)
Relationships:
- AVAILABLE_AT ← Offer
- SHOPS_AT ← User
Demo Retailers:
- RET_WHOLE_FOODS - Premium grocery
- RET_ALDI - Discount grocery
- RET_TARGET - Big box
- RET_COSTCO - Warehouse club
- RET_MARIANOS - Regional chain
- RET_WALGREENS - Drugstore
- RET_TRADER_JOES - Specialty
- RET_AMAZON_FRESH - Online delivery
Category
Intermediary node for product categorization and graph optimization.
Properties:
- category_id (string, unique) - Category identifier
- name (string) - Category name
- product_count (integer) - Number of products in category
Relationships:
- IN_CATEGORY ← Product
- SHOPS_IN ← User
- HAS_OFFER ← Offer (category-level offers)
Purpose: Reduces graph complexity by serving as an intermediary between users and products. Instead of relying on direct User-Product edges, queries traverse (User)-[:SHOPS_IN]->(Category)<-[:IN_CATEGORY]-(Product).
Demo Categories:
- Dairy (milk, cheese, eggs, yogurt)
- Produce (fruits, vegetables)
- Meat & Seafood (proteins)
- Bakery (bread, baked goods)
- Pantry (dry goods, canned items)
- Beverages (drinks, juices)
- Snacks (chips, bars, treats)
- Frozen (frozen meals, ice cream)
- Personal Care (hygiene products)
- Household (cleaning, paper goods)
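The intermediary role of Category can be illustrated with a traversal that reaches products through the categories a user shops in. This is a sketch under the relationship directions documented in this schema, not a query taken from the codebase; the result aliases are illustrative.

```cypher
// Sketch: reach products via Category instead of direct User-Product edges.
MATCH (u:User {user_id: $userId})-[s:SHOPS_IN]->(c:Category)<-[:IN_CATEGORY]-(p:Product)
RETURN c.name AS category,
       s.purchase_count AS category_purchases,
       p.name AS product
ORDER BY category_purchases DESC, product
LIMIT 20
```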
Community
Intermediary node representing a group of similar shoppers.
Properties:
- community_id (string, unique) - Community identifier
- name (string) - Community name
- member_count (integer) - Number of members
- primary_category (string) - Main category affinity
- zip_code (string, optional) - Geographic location
Relationships:
- MEMBER_OF ← User
- POPULAR_IN ← Product
Purpose: Graph optimization pattern reducing edges by 70%. Groups similar shoppers to enable efficient collaborative filtering and trend analysis.
Benefits:
- Faster recommendation queries
- Better scalability
- Enables community-based insights
- Reduces graph memory footprint
Household
Represents a shared living/shopping unit.
Properties:
- household_id (string, unique) - Household identifier
- member_count (integer) - Number of household members
- combined_monthly_budget (float) - Total household budget
- household_type (string) - Type (“Couple”, “Family”, “Roommates”)
Relationships:
- MEMBER_OF ← User
Purpose: Enables household-level shopping insights and recommendations considering shared context.
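One way household-level insight might be queried is to aggregate purchases across every member of a user's household. The query below is a sketch built only from the node and relationship types in this schema; the aliases are illustrative.

```cypher
// Sketch: most-purchased products across a user's household.
// Two separate MATCH clauses let `member` also bind to the starting user,
// so the user's own purchases are included in the totals.
MATCH (u:User {user_id: $userId})-[:MEMBER_OF]->(h:Household)
MATCH (h)<-[:MEMBER_OF]-(member:User)-[p:PURCHASED]->(prod:Product)
RETURN h.household_id AS household,
       prod.name AS product,
       sum(p.quantity) AS total_quantity
ORDER BY total_quantity DESC
LIMIT 10
```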
Relationship Types
PURCHASED
User purchased a product.
Properties:
- quantity (integer) - Items purchased
- timestamp (datetime) - Purchase time
- price (float, optional) - Purchase price
Direction: User → Product
Usage: Purchase history, frequency analysis, preference modeling
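Frequency analysis over PURCHASED might look like the sketch below. It assumes each purchase event is stored as its own PURCHASED relationship, which the source does not state explicitly; if purchases are instead merged into one relationship per user-product pair, `count(p)` will always be 1.

```cypher
// Sketch: per-product purchase frequency for one user.
// Assumption: one PURCHASED relationship per purchase event.
MATCH (u:User {user_id: $userId})-[p:PURCHASED]->(prod:Product)
RETURN prod.name AS product,
       count(p) AS purchase_events,
       sum(p.quantity) AS total_quantity,
       max(p.timestamp) AS last_purchased
ORDER BY purchase_events DESC
LIMIT 10
```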
ELIGIBLE_FOR
User is eligible to redeem an offer.
Properties: None
Direction: User → Offer
Usage: Personalized offer filtering, targeted promotions
HAS_OFFER
Offer applies to product, brand, or category.
Properties: None
Direction: Offer → Product/Category
Usage: Offer discovery, product-to-offer matching
SIMILAR_TO
Products are similar based on collaborative filtering.
Properties:
- similarity_score (float) - Similarity strength (0.0 to 1.0)
Direction: Product ↔ Product (bidirectional)
Usage: Product recommendations, alternative suggestions, explainable AI
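Neo4j stores every relationship with a direction, so a "bidirectional" SIMILAR_TO is usually modeled as a single stored relationship that is matched without a direction. The sketch below assumes that approach; `$productA`, `$productB`, and `$score` are hypothetical parameter names.

```cypher
// Sketch: store one SIMILAR_TO edge per pair and ignore direction when matching.
MATCH (a:Product {product_id: $productA})
MATCH (b:Product {product_id: $productB})
// An undirected MERGE reuses an existing edge in either direction
// and only creates a new one if none exists.
MERGE (a)-[s:SIMILAR_TO]-(b)
SET s.similarity_score = $score
RETURN s.similarity_score AS similarity_score
```

Storing one edge per pair, rather than one in each direction, avoids duplicate rows when the relationship is later matched undirected.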
IN_CATEGORY
Product belongs to a category.
Properties: None
Direction: Product → Category
Usage: Category browsing, categorization
SHOPS_AT
User shops at a retailer.
Properties:
- frequency (integer) - Visit count
- last_visit (datetime) - Most recent visit
Direction: User → Retailer
Usage: Retailer preferences, location-based offers
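The `frequency` and `last_visit` properties suggest an upsert on each visit. The following sketch shows one way that might be written; it is an assumption about how the counters are maintained, and `$retailerId` is a hypothetical parameter name.

```cypher
// Sketch: record a store visit, creating the relationship on first visit.
MATCH (u:User {user_id: $userId})
MATCH (r:Retailer {retailer_id: $retailerId})
MERGE (u)-[s:SHOPS_AT]->(r)
ON CREATE SET s.frequency = 1, s.last_visit = datetime()
ON MATCH SET s.frequency = s.frequency + 1, s.last_visit = datetime()
RETURN s.frequency AS visits, s.last_visit AS last_visit
```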
MEMBER_OF
User belongs to household or community.
Properties: None
Direction: User → Household/Community
Usage: Household context, community insights, collaborative filtering
POPULAR_IN
Product is popular within a community.
Properties:
- purchase_count (integer) - Community purchases
Direction: Product → Community
Usage: Community trending, social proof
SHOPS_IN
User shops in a category.
Properties:
- purchase_count (integer) - Category purchases
Direction: User → Category
Usage: Category affinity, expansion prediction
Graph Patterns
User Shopping Profile
    (User)-[:MEMBER_OF]->(Household)
    (User)-[:MEMBER_OF]->(Community)
    (User)-[:SHOPS_IN]->(Category)
    (User)-[:SHOPS_AT]->(Retailer)
    (User)-[:PURCHASED]->(Product)
Product Relationships
    (Product)-[:IN_CATEGORY]->(Category)
    (Product)-[:SIMILAR_TO]-(Product)
    (Offer)-[:HAS_OFFER]->(Product)
    (Product)-[:POPULAR_IN]->(Community)
Offer Discovery
    (User)-[:ELIGIBLE_FOR]->(Offer)
    (Offer)-[:HAS_OFFER]->(Product/Category)
    (Offer)-[:AVAILABLE_AT]->(Retailer)
Collaborative Filtering (Optimized)
    (User)-[:MEMBER_OF]->(Community)<-[:MEMBER_OF]-(OtherUser)
    (OtherUser)-[:PURCHASED]->(Product)
Old pattern (slow): (User)-[:SIMILAR_TO]-(OtherUser)
New pattern (fast): Uses Community as intermediary
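The Offer Discovery pattern can be combined into a single query. The sketch below returns currently active offers that a user is eligible for at retailers the user already shops at, along with whatever each offer targets. Restricting to the user's own retailers is an illustrative choice, not something the source prescribes.

```cypher
// Sketch: active, eligible offers at retailers the user already visits.
MATCH (u:User {user_id: $userId})-[:ELIGIBLE_FOR]->(o:Offer)-[:AVAILABLE_AT]->(r:Retailer)<-[:SHOPS_AT]-(u)
WHERE o.start <= date() AND date() <= o.end
// An offer may target a Product or a Category, so the label is left open.
OPTIONAL MATCH (o)-[:HAS_OFFER]->(target)
RETURN o.offer_id AS offer,
       o.points AS points,
       r.name AS retailer,
       labels(target) AS target_labels,
       target.name AS target_name
ORDER BY points DESC
LIMIT 10
```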
Optimization Strategy
Problem
Direct user-to-user and user-to-product relationships created 2,182 edges, causing:
- Slow queries
- High memory usage
- Poor scalability
Solution: Intermediary Nodes
- Community Nodes: Group similar users
- Category Nodes: Group related products
Results
- Edges reduced: 2,182 → 662 (70% reduction)
- Query performance: ~3x faster
- Scalability: Linear growth instead of quadratic
- Memory: Reduced graph memory footprint
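As a quick check, the reported reduction follows directly from the two edge counts. The second line is only an illustration of the scaling claim, under the assumption (not stated in the source) that each of n users belongs to a small fixed number k of communities.

```latex
\[
\frac{2182 - 662}{2182} = \frac{1520}{2182} \approx 0.697 \approx 70\%
\]
\[
\underbrace{\frac{n(n-1)}{2}}_{\text{pairwise user-user edges (upper bound)}}
\quad\text{vs.}\quad
\underbrace{n \cdot k}_{\text{MEMBER\_OF edges}}
\]
```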
Pattern Transformation
    Before: (User)-[:PURCHASED]->(Product)                                [2,182 edges]
    After:  (User)-[:SHOPS_IN]->(Category)<-[:IN_CATEGORY]-(Product)      [662 edges]
Indexes
The following indexes are created for query performance:
User:
- user_id (unique constraint)
Product:
- product_id (unique constraint)
- name (text index for search)
- brand (index)
- category (index)
Offer:
- offer_id (unique constraint)
- start, end (composite index for date range queries)
Category:
- category_id (unique constraint)
- name (index)
Community:
- community_id (unique constraint)
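A few of these could be declared as follows. This is a sketch using Neo4j 5 syntax; the constraint and index names are hypothetical, and the source does not say whether the "text index" on `name` is a TEXT index (assumed here) or a full-text index, which would use `CREATE FULLTEXT INDEX` instead.

```cypher
// Sketch only: names such as user_id_unique are illustrative.
CREATE CONSTRAINT user_id_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.user_id IS UNIQUE;

CREATE CONSTRAINT product_id_unique IF NOT EXISTS
FOR (p:Product) REQUIRE p.product_id IS UNIQUE;

// Assumed to be a TEXT index; see the note above.
CREATE TEXT INDEX product_name_text IF NOT EXISTS
FOR (p:Product) ON (p.name);

CREATE INDEX product_brand IF NOT EXISTS
FOR (p:Product) ON (p.brand);

// Composite index supporting date-range filters on offers.
CREATE INDEX offer_dates IF NOT EXISTS
FOR (o:Offer) ON (o.start, o.end);
```

`IF NOT EXISTS` makes the statements safe to re-run during setup.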
Query Examples
Find User’s Purchase History
    MATCH (u:User {user_id: $userId})-[p:PURCHASED]->(prod:Product)
    RETURN prod, p.quantity, p.timestamp
    ORDER BY p.timestamp DESC
    LIMIT 20
Get Active Offers for User
    MATCH (u:User {user_id: $userId})-[:ELIGIBLE_FOR]->(o:Offer)
    WHERE date() >= o.start AND date() <= o.end
    RETURN o
    ORDER BY o.points DESC
Find Similar Products
    MATCH (p1:Product {product_id: $productId})-[s:SIMILAR_TO]-(p2:Product)
    WHERE s.similarity_score >= $minSimilarity
    RETURN p2, s.similarity_score
    ORDER BY s.similarity_score DESC
    LIMIT 10
Community Recommendations (Optimized)
    MATCH (u:User {user_id: $userId})-[:MEMBER_OF]->(c:Community)
    MATCH (c)<-[:MEMBER_OF]-(other:User)-[:PURCHASED]->(p:Product)
    WHERE NOT (u)-[:PURCHASED]->(p)
    RETURN p, count(*) AS purchase_count
    ORDER BY purchase_count DESC
    LIMIT 10
Data Volumes
Demo Dataset
- Users: 6 demo + 120+ similar users
- Products: 135+ items across 10 categories
- Offers: 120+ active offers
- Retailers: 8 venues
- Categories: 47 categories
- Communities: 45 communities
- Households: 3 households
- Total edges: ~662 (after optimization)
  - User relationships: ~180
  - Product relationships: ~300
  - Offer relationships: ~150
  - Household/Community: ~32
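These figures can be checked against a running instance with two generic counting queries. They are a sketch that relies only on standard Cypher functions and makes no assumption about the demo data itself.

```cypher
// Node counts per label combination.
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC;

// Edge counts per relationship type.
MATCH ()-[r]->()
RETURN type(r) AS relationship_type, count(*) AS edges
ORDER BY edges DESC;
```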
Schema Evolution
Phase 1: Direct Relationships
- User → Product (PURCHASED)
- User → User (SIMILAR_TO)
- Simple but slow at scale
Phase 2: Community & Category Nodes
- User → Community → User (implicit similarity)
- User → Category → Product (categorized shopping)
- 70% edge reduction, 3x faster queries
Phase 3 (Future):
- Temporal nodes for time-based patterns
- Brand nodes for brand-level analysis
- Location nodes for geographic clustering
Best Practices
For Queries
- Filter on indexed properties (user_id, product_id, etc.)
- Limit result sets with a LIMIT clause
- Use date filters on active offers
- Leverage Community nodes for recommendations
- Profile queries with PROFILE or EXPLAIN
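Several of these practices can be seen together in one profiled query. The sketch below assumes it is run in Neo4j Browser, where a parameter can be set beforehand with `:param userId => 'DEMO_USER_HEALTH'`.

```cypher
// Sketch: indexed lookup, date filter, LIMIT, and PROFILE in one query.
PROFILE
MATCH (u:User {user_id: $userId})-[:ELIGIBLE_FOR]->(o:Offer)
WHERE o.start <= date() AND date() <= o.end
RETURN o.offer_id AS offer, o.points AS points
ORDER BY points DESC
LIMIT 10
```

In the resulting plan, an index seek on `User` (rather than a label scan) indicates that the `user_id` constraint is being used. `EXPLAIN` shows the same plan without executing the query.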
For Data Modeling
- Use intermediary nodes for many-to-many relationships
- Store frequently accessed properties on nodes
- Use relationship properties for metadata
- Normalize data appropriately
- Consider query patterns in schema design
For Performance
- Create indexes on frequently queried properties
- Use parameterized queries
- Batch write operations
- Monitor query execution time
- Use APOC procedures for complex operations
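Parameterized, batched writes are commonly expressed with `UNWIND` over a list parameter. The sketch below assumes a hypothetical `$purchases` parameter: a list of maps with keys `user_id`, `product_id`, `quantity`, `timestamp` (ISO 8601 string), and `price`. Neither the parameter nor its shape comes from the source.

```cypher
// Sketch: write many PURCHASED relationships in one round trip.
// $purchases is a hypothetical list of maps, one per purchase event.
UNWIND $purchases AS row
MATCH (u:User {user_id: row.user_id})
MATCH (p:Product {product_id: row.product_id})
CREATE (u)-[:PURCHASED {
  quantity: row.quantity,
  timestamp: datetime(row.timestamp),
  price: row.price
}]->(p)
```

Rows whose user or product does not exist are skipped silently by the `MATCH` clauses, so callers may want to compare the number of relationships created against the batch size.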
Schema Visualization
Access Neo4j Browser at http://localhost:7474 to visualize the graph:
    // Visualize demo user's shopping pattern
    MATCH path = (u:User {user_id: 'DEMO_USER_HEALTH'})-[*1..2]-(n)
    RETURN path
    LIMIT 50

    // Visualize product category structure
    MATCH (p:Product)-[:IN_CATEGORY]->(c:Category)
    RETURN p, c
    LIMIT 100

    // Visualize community structure
    MATCH (u:User)-[:MEMBER_OF]->(c:Community)<-[:POPULAR_IN]-(p:Product)
    WHERE c.community_id = 'COMM_001'
    RETURN u, c, p