Phase 2 Tool Development Plan (Revised)

Executive Summary

Key Principle: Focus on graph-backed features that require traversal, relationships, and temporal patterns. Leave location finding, price comparison, recipe generation, and substitution reasoning to the LLM, which calls external APIs where real-time data is needed.

Phase 1 Complete: 20 graph tools + 2 Points tools implemented.

Phase 2 Scope: 8 new tools focusing on graph-unique capabilities while maintaining a small, efficient graph.

Design Principles

✅ What Belongs in the Graph

User behavior patterns - purchase history, preferences, temporal patterns
Product relationships - similarity, category hierarchies, brand relationships
User-product affinities - what users buy together, collaborative filtering
Offer eligibility - personalized, time-bound relationships
Retailer visit patterns - where users shop, frequency, spending

❌ What Should Stay External

Real-time pricing - Use external pricing API, too volatile for graph
Stock/inventory - Use external inventory API, changes too frequently
Store locations - Use geocoding service, not graph traversal
Recipe instructions - LLM can generate from product lists
Product substitutions - LLM can reason about similarity

🎯 Graph Size Strategy

Keep graph to users, products, offers, retailers (core entities)
Avoid cartesian explosion (Product × Retailer × Price × Time)
Use aggregates and patterns, not granular transactions
Prune old data (>2 years) periodically

Revised Gap Analysis

🔴 Gap 1: Product Details & Attributes

What’s Missing: Comprehensive product metadata beyond name/brand/category

Why Graph, Not LLM:

User’s historical relationship with products (how often, when, spending)
Product attributes that enable filtering (organic, gluten-free, sustainable)
Dynamic attributes based on user context (previously_purchased, in_usual_rotation)

Solution: Enhance Product node with rich attributes

🔴 Gap 2: Location-Aware User Context

What’s Missing: Answering “What stores are near me?” without an external lookup on every query

Why Graph, Not External API:

User’s habitual stores are already in graph (VISITED relationship)
Historical store preferences inform recommendations
Store-offer associations require graph traversal

Solution: Pre-compute each user’s “primary stores” cluster instead of storing every product × store combination

🔴 Gap 3: Spending & Budget Analytics

What’s Missing: Track spending patterns over time

Why Graph, Not External:

Temporal spending patterns by category/brand/store
Behavioral changes (spending more/less on category)
Budget adherence tracking over time

Solution: Add spending data to PURCHASED relationship (aggregate)

🔴 Gap 4: Offer Stacking & Optimization

What’s Missing: Which offers work together, optimal activation strategy

Why Graph, Not LLM:

Offer compatibility rules (can’t stack certain offers)
User eligibility across multiple offers
Product-offer mappings for user’s predicted purchases

Solution: Add offer stacking metadata, graph traversal for optimization

🟡 Gap 5: Product Lifecycle & Trends

What’s Missing: New product discovery, discontinuation alerts

Why Graph:

Track when products enter/leave user’s purchase rotation
Identify emerging patterns in user behavior
Collaborative trends (similar users trying new products)

🟡 Gap 6: Household & Family Context

What’s Missing: Multi-user households, shared preferences

Why Graph:

Household relationships between users
Shared purchase patterns
Preference conflicts within household

Phase 2: Enhanced Context & Intelligence (8-10 weeks)

Goal

Add rich product metadata, spending analytics, smarter offer optimization, household context, and social discovery without exploding graph size.

2.1: get_product_details_enhanced

Replaces: get_product_context (enhance existing tool)

Why This Tool: The LLM needs structured, personalized product data that only the graph can provide:

User’s purchase history with this product
How it relates to user’s preferences
Dynamic attributes based on user context

Tool Spec:

{
  "name": "get_product_details_enhanced",
  "description": "Get comprehensive product details with user-specific context and attributes",
  "input": {
    "user_id": "string (required)",
    "product_id": "string (required)"
  },
  "output": {
    "product": {
      "product_id": "string",
      "name": "Greek Yogurt",
      "brand": "Chobani",
      "category": "Dairy",
      "attributes": {
        "organic": true,
        "gluten_free": true,
        "sustainable": false,
        "tags": ["high-protein", "probiotic", "low-sugar"]
      }
    },
    "user_context": {
      "purchased_times": 15,
      "last_purchase": "2025-01-01",
      "avg_interval_days": 14,
      "next_predicted": "2025-01-15",
      "in_usual_rotation": true,
      "spending_on_product": 74.85,
      "price_trend": "stable"
    },
    "alternatives": [
      {
        "product_id": "PROD456",
        "name": "Fage Greek Yogurt",
        "similarity_score": 0.92,
        "user_has_tried": false
      }
    ],
    "applicable_offers": [
      {
        "offer_id": "O123",
        "title": "Dairy Bonus",
        "points": 500
      }
    ]
  }
}
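The user_context fields can be derived from the purchase dates already kept on PURCHASED. Below is a minimal Python sketch. It assumes the dates for one user-product pair arrive as a list. The in_usual_rotation rule is an illustrative assumption, not part of the spec: the last purchase must be no older than 1.5 times the average interval.

```python
from datetime import date, timedelta
from statistics import mean


def purchase_context(purchase_dates: list[date], today: date, rotation_slack: float = 1.5) -> dict:
    """Derive user_context fields for one product from the user's purchase dates.

    rotation_slack is an assumed threshold: the product counts as "in rotation"
    while the time since the last purchase is within slack x average interval.
    """
    dates = sorted(purchase_dates)
    if not dates:
        return {"purchased_times": 0, "in_usual_rotation": False}
    ctx = {"purchased_times": len(dates), "last_purchase": dates[-1].isoformat()}
    if len(dates) >= 2:
        # Average gap in days between consecutive purchases.
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        avg = round(mean(gaps))
        ctx["avg_interval_days"] = avg
        ctx["next_predicted"] = (dates[-1] + timedelta(days=avg)).isoformat()
        ctx["in_usual_rotation"] = (today - dates[-1]).days <= rotation_slack * avg
    else:
        # A single purchase gives no interval to reason about.
        ctx["in_usual_rotation"] = False
    return ctx
```

With 15 purchases spaced 14 days apart and ending 2025-01-01, this yields the values shown in the spec above: avg_interval_days 14 and next_predicted 2025-01-15.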

Schema Changes:

Product {
  product_id: "P123",
  name: "Greek Yogurt",
  brand: "Chobani",
  category: "Dairy",

  // NEW: Attributes for filtering (not food-specific), stored as flat
  // properties since Cypher databases such as Neo4j reject map-valued properties
  organic: true,
  gluten_free: true,
  vegan: false,
  sustainable: false,
  local: false,
  handmade: false,
  eco_friendly: true,
  tags: ["high-protein", "probiotic", "low-sugar"],

  // Keep generic - works for any product type
  product_type: "food",  // or "household", "personal_care", "pet"

  // Optional food-specific fields (only if product_type = "food")
  nutrition_calories: 150,
  nutrition_protein_g: 12,
  nutrition_sugar_g: 4
}

Implementation Effort: 1 week (enhance existing tool)

2.2: track_spending_patterns

New Tool

Why This Tool: Graph stores temporal purchase patterns with spending - LLM can’t do time-series analysis without this data.

Tool Spec:

{
  "name": "track_spending_patterns",
  "description": "Analyze user spending patterns by category, brand, store, and time period",
  "input": {
    "user_id": "string (required)",
    "lookback_days": "number (default: 90)",
    "group_by": "enum: category|brand|store|week|month"
  },
  "output": {
    "spending_summary": {
      "total_spent": 487.56,
      "period_start": "2024-10-08",
      "period_end": "2025-01-07",
      "breakdown": [
        {
          "group_name": "Dairy",
          "spent": 125.43,
          "percent_of_total": 25.7,
          "purchase_count": 23,
          "vs_prev_period": "+12%",
          "trend": "increasing"
        }
      ]
    },
    "insights": {
      "top_spending_category": "Dairy",
      "fastest_growing_category": "Snacks (+35%)",
      "declining_categories": ["Beverages (-10%)"]
    }
  }
}
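One way to fill the breakdown array is in three steps. First, sum spend per group over the lookback window. Second, do the same over the equally long window before it. Third, compare the two. The Python sketch below follows those steps. Two labels are assumptions: the ±5% band that separates "stable" from "increasing"/"decreasing", and the "new" trend for groups with no prior spend.

```python
from collections import defaultdict


def breakdown(current: list[tuple[str, float]], previous: list[tuple[str, float]]) -> list[dict]:
    """Build breakdown rows from (group_name, amount) pairs.

    `current` covers the lookback window; `previous` the same-length window before it.
    """
    cur_sum, cur_cnt, prev_sum = defaultdict(float), defaultdict(int), defaultdict(float)
    for group, amount in current:
        cur_sum[group] += amount
        cur_cnt[group] += 1
    for group, amount in previous:
        prev_sum[group] += amount
    total = sum(cur_sum.values())
    rows = []
    # Largest spend first, matching "top_spending_category" ordering.
    for group, spent in sorted(cur_sum.items(), key=lambda kv: -kv[1]):
        prev = prev_sum.get(group, 0.0)
        change = (spent - prev) / prev * 100 if prev else None
        if change is None:
            trend = "new"
        elif change > 5:
            trend = "increasing"
        elif change < -5:
            trend = "decreasing"
        else:
            trend = "stable"
        rows.append({
            "group_name": group,
            "spent": round(spent, 2),
            "percent_of_total": round(spent / total * 100, 1) if total else 0.0,
            "purchase_count": cur_cnt[group],
            "vs_prev_period": f"{change:+.0f}%" if change is not None else "n/a",
            "trend": trend,
        })
    return rows
```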

Schema Changes:

// Add to existing PURCHASED relationship
PURCHASED {
  times: 5,
  qty: 10,
  first: datetime,
  last: datetime,
  timestamps: [datetime, ...],

  // NEW: Aggregated spending data
  total_spent: 49.90,
  avg_price_per_unit: 4.99,
  min_price: 4.49,
  max_price: 5.49
}

// Optional: Add budget tracking to User
User {
  user_id: "U123",
  zip: "60601",

  // NEW: Budget preferences (optional)
  monthly_budget: 500.0,
  budget_alerts: true
}
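PURCHASED stores aggregates rather than line items, so ingestion has to fold each receipt line into the running totals. Below is a sketch of that update in Python. It uses a hypothetical PurchasedAgg class that mirrors the relationship properties. The timestamps list is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PurchasedAgg:
    """Hypothetical in-memory mirror of one PURCHASED relationship's aggregates."""
    times: int = 0
    qty: int = 0
    total_spent: float = 0.0
    min_price: Optional[float] = None
    max_price: Optional[float] = None
    first: Optional[datetime] = None
    last: Optional[datetime] = None

    @property
    def avg_price_per_unit(self) -> float:
        # Quantity-weighted average unit price.
        return round(self.total_spent / self.qty, 2) if self.qty else 0.0

    def add(self, when: datetime, qty: int, unit_price: float) -> None:
        """Fold one receipt line into the running aggregates."""
        self.times += 1
        self.qty += qty
        self.total_spent = round(self.total_spent + qty * unit_price, 2)
        self.min_price = unit_price if self.min_price is None else min(self.min_price, unit_price)
        self.max_price = unit_price if self.max_price is None else max(self.max_price, unit_price)
        if self.first is None or when < self.first:
            self.first = when
        if self.last is None or when > self.last:
            self.last = when
```

Three lines of 4 units at $4.49, 4 units at $5.49, and 2 units at $4.99 fold into qty 10 and total_spent 49.90. That gives avg_price_per_unit 4.99, min 4.49, and max 5.49.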

Implementation Effort: 2 weeks

2.3: optimize_offer_activation

New Tool

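Why This Tool: The graph stores offer eligibility, stacking rules, and product applicability. Picking the compatible set of offers that maximizes points is a constraint-satisfaction problem the LLM cannot solve reliably without that structured data.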

Tool Spec:

{
  "name": "optimize_offer_activation",
  "description": "Find optimal offer activation strategy for maximum points based on predicted purchases",
  "input": {
    "user_id": "string (required)",
    "prediction_window_days": "number (default: 7)",
    "include_shopping_list": "array of product_ids (optional)"
  },
  "output": {
    "strategy": {
      "total_points_potential": 2500,
      "offers_to_activate": [
        {
          "offer_id": "O123",
          "title": "Dairy Bonus - 500 pts",
          "points": 500,
          "products_needed": ["PROD1", "PROD2"],
          "user_likely_to_buy": ["PROD1", "PROD2"],
          "confidence": "high",
          "stackable_with": ["O456"]
        }
      ],
      "recommended_purchases": [
        {
          "product_id": "PROD1",
          "name": "Greek Yogurt",
          "triggers_offers": ["O123", "O456"],
          "total_points": 800,
          "next_predicted_purchase": "2025-01-10"
        }
      ]
    },
    "stacking_rules": {
      "max_points_per_transaction": 5000,
      "incompatible_offer_pairs": [["O789", "O801"]]
    }
  }
}
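A single user is eligible for only a small number of offers. For that case, exhaustive search over offer subsets is a simple way to honour incompatible pairs and the per-transaction points cap. The Python sketch below works under those assumptions. The function best_offer_set and its inputs are hypothetical. A large offer pool would need an integer-programming or greedy approach instead, because subsets grow as 2^n.

```python
from itertools import combinations


def best_offer_set(
    points: dict[str, int],
    needs: dict[str, set[str]],
    likely: set[str],
    conflicts: set[frozenset[str]],
    max_points: int,
) -> tuple[int, list[str]]:
    """Return (total_points, offer_ids) for the best compatible offer subset."""
    # Only offers whose required products the user is likely to buy are candidates.
    eligible = sorted(o for o in points if needs.get(o, set()) <= likely)
    best: tuple[int, list[str]] = (0, [])
    for size in range(1, len(eligible) + 1):
        for combo in combinations(eligible, size):
            # Skip any subset containing an incompatible pair (CONFLICTS_WITH).
            if any(frozenset(pair) in conflicts for pair in combinations(combo, 2)):
                continue
            total = sum(points[o] for o in combo)
            # Respect the per-transaction cap; keep the first subset reaching a new max.
            if best[0] < total <= max_points:
                best = (total, list(combo))
    return best
```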

Schema Changes:

// Add to Offer node
Offer {
  offer_id: "O123",
  title: "Dairy Bonus",
  points: 500,
  start: date,
  end: date,
  priority: 10,
  venue_type: "grocery",

  // NEW: Stacking rules
  stackable: true,
  max_stacks_per_transaction: 3,
  incompatible_offers: ["O789"]
}

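// NEW: Offer compatibility relationship (MATCH existing offers so no duplicate nodes are created)
MATCH (o1:Offer {offer_id: "O123"}), (o2:Offer {offer_id: "O456"})
MERGE (o1)-[s:STACKS_WITH]->(o2)
SET s.max_combined_points = 5000

// NEW: Offer incompatibility
MATCH (o1:Offer {offer_id: "O789"}), (o2:Offer {offer_id: "O801"})
MERGE (o1)-[:CONFLICTS_WITH]->(o2)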

Implementation Effort: 2 weeks

2.4: get_user_location_context

New Tool

Why This Tool: Graph stores user’s habitual stores with visit frequency - don’t need to store all products at all stores.

Tool Spec:

{
  "name": "get_user_location_context",
  "description": "Get user's primary shopping locations and venue preferences",
  "input": {
    "user_id": "string (required)"
  },
  "output": {
    "primary_stores": [
      {
        "retailer_id": "R123",
        "name": "Target",
        "address": "123 Main St, Chicago IL 60601",
        "venue_type": "grocery",
        "visit_frequency": "weekly",
        "times_visited": 47,
        "last_visit": "2025-01-05",
        "avg_basket_size": 45.67,
        "user_preference_rank": 1
      }
    ],
    "venue_preferences": {
      "most_frequent_venue": "grocery",
      "venue_distribution": {
        "grocery": 65,
        "convenience": 20,
        "pharmacy": 15
      }
    },
    "location_hints": {
      "user_zip": "60601",
      "typical_shopping_radius_miles": 5.2
    }
  }
}
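user_preference_rank, avg_basket_size, and venue_distribution can all be computed from VISITED properties. The Python sketch below rests on two assumptions. Stores are ranked by visit count, with spend as the tie-breaker. The venue distribution is weighted by visits rather than spend.

```python
from collections import Counter


def location_context(visits: list[dict], top_n: int = 3) -> dict:
    """Rank a user's retailers and summarise venue mix from VISITED rows.

    Each row needs retailer_id, venue_type, times (>= 1) and total_spent.
    """
    ranked = sorted(visits, key=lambda v: (-v["times"], -v["total_spent"]))
    primary = [
        {
            **v,
            "avg_basket_size": round(v["total_spent"] / v["times"], 2),
            "user_preference_rank": rank,
        }
        for rank, v in enumerate(ranked[:top_n], start=1)
    ]
    # Visit-weighted share of each venue type, as whole percentages.
    venue_visits = Counter()
    for v in visits:
        venue_visits[v["venue_type"]] += v["times"]
    total = sum(venue_visits.values())
    distribution = {venue: round(n / total * 100) for venue, n in venue_visits.most_common()}
    return {
        "primary_stores": primary,
        "venue_preferences": {
            "most_frequent_venue": venue_visits.most_common(1)[0][0] if total else None,
            "venue_distribution": distribution,
        },
    }
```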

Why Not Store All Product Locations:

Graph explosion: 100K products × 10K stores = 1B relationships
Real-time data: Inventory/pricing changes minute-by-minute
Solution: Give LLM user’s preferred stores, then LLM calls external location/inventory API

Schema Changes:

// Enhance existing Retailer (address fields only; no coordinates or inventory)
Retailer {
  retailer_id: "R123",
  name: "Target",
  address: "123 Main St",
  city: "Chicago",
  state: "IL",
  zip: "60601",
  venue_type: "grocery",

  // Don't add: lat/lon (use geocoding service)
  // Don't add: inventory (use external API)
}

// Enhance VISITED relationship
VISITED {
  times: 47,
  first: datetime,
  last: datetime,
  total_spent: 2145.67,

  // NEW: Visit patterns
  typical_visit_frequency_days: 7,
  preferred_day_of_week: "Saturday",
  avg_basket_size: 45.67
}

Implementation Effort: 1 week

2.5: discover_new_products

New Tool

Why This Tool: Graph stores emerging user preferences and collaborative trends - LLM doesn’t have this temporal/social context.

Tool Spec:

{
  "name": "discover_new_products",
  "description": "Discover products user hasn't tried based on evolving preferences and similar users",
  "input": {
    "user_id": "string (required)",
    "category": "string (optional)",
    "limit": "number (default: 20)"
  },
  "output": {
    "new_products": [
      {
        "product_id": "PROD999",
        "name": "Overnight Oats",
        "brand": "Quaker",
        "category": "Breakfast",
        "why_recommended": "Similar users in your zip code recently started buying this",
        "confidence": "medium",
        "similar_to_products_you_buy": ["PROD1", "PROD5"],
        "trending": true,
        "adoption_rate_in_area": 23.5
      }
    ]
  }
}

Why Graph:

Track when products enter graph (new product launches)
Find similar users trying new products (collaborative discovery)
Identify category expansion patterns (user buying from new categories)
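The collaborative-discovery step can be sketched as a weighted vote. Each similar user (for example, via SIMILAR_TO) contributes their similarity score to every product they bought that the target user has not. This user-based collaborative-filtering scheme is an assumed choice, not the plan's stated algorithm. The Python below is a minimal version:

```python
from collections import defaultdict


def discover(
    user_products: set[str],
    neighbors: dict[str, float],
    purchases: dict[str, set[str]],
    limit: int = 20,
) -> list[tuple[str, float]]:
    """Score untried products by summed similarity of neighbors who bought them.

    neighbors: similar user_id -> similarity_score
    purchases: user_id -> set of purchased product_ids
    """
    scores = defaultdict(float)
    for other, sim in neighbors.items():
        # Only products the target user has never purchased are candidates.
        for product in purchases.get(other, set()) - user_products:
            scores[product] += sim
    # Highest score first; product_id breaks ties deterministically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```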

Schema Changes:

Product {
  // ... existing fields ...

  // NEW: Product lifecycle
  first_seen_date: date("2024-12-01"),
  is_new_product: true  // first_seen_date within the last 90 days; recompute periodically so the flag does not go stale
}

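// NEW: Track product adoption (source is one of "recommendation", "trending", "search")
MATCH (u:User {user_id: $user_id}), (p:Product {product_id: $product_id})
MERGE (u)-[d:DISCOVERED]->(p)
ON CREATE SET d.date = date("2025-01-07"),
              d.source = "recommendation"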

Implementation Effort: 1.5 weeks

2.6: get_household_context

New Tool

Why This Tool: Graph stores household relationships between users - critical for shared shopping.

Tool Spec:

{
  "name": "get_household_context",
  "description": "Get household members and shared purchase patterns",
  "input": {
    "user_id": "string (required)"
  },
  "output": {
    "household": {
      "household_id": "HH123",
      "member_count": 4,
      "members": [
        {
          "user_id": "U123",
          "role": "primary_shopper",
          "contribution_to_shopping": 70
        },
        {
          "user_id": "U456",
          "role": "secondary_shopper",
          "contribution_to_shopping": 30
        }
      ]
    },
    "shared_preferences": {
      "common_categories": ["Dairy", "Produce", "Snacks"],
      "common_brands": ["Chobani", "Horizon"],
      "divergent_preferences": {
        "U123_only": ["Coffee"],
        "U456_only": ["Tea"]
      }
    },
    "combined_purchase_power": {
      "monthly_spending": 950.0,
      "points_earned": 15000
    }
  }
}
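shared_preferences reduces to plain set operations over each member's purchased categories (or brands). The Python sketch below assumes the per-member sets have already been gathered through MEMBER_OF and PURCHASED.

```python
def shared_preferences(member_categories: dict[str, set[str]]) -> dict:
    """Split household members' category sets into shared and member-only parts."""
    all_sets = list(member_categories.values())
    common = set.intersection(*all_sets) if all_sets else set()
    divergent = {}
    for uid, cats in member_categories.items():
        # Categories bought by any other member of the household.
        others = set().union(*(c for o, c in member_categories.items() if o != uid))
        divergent[f"{uid}_only"] = sorted(cats - others)
    return {"common_categories": sorted(common), "divergent_preferences": divergent}
```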

Schema Changes:

// NEW: Household node
Household {
  household_id: "HH123",
  member_count: 4,
  combined_monthly_budget: 1000.0
}

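// NEW: User belongs to household
MATCH (u:User {user_id: $user_id}), (h:Household {household_id: $household_id})
MERGE (u)-[m:MEMBER_OF]->(h)
ON CREATE SET m.joined_date = date("2023-01-01")
SET m.role = "primary_shopper"

// NEW: Household-level preferences
MATCH (h:Household {household_id: $household_id}), (p:Product {product_id: $product_id})
MERGE (h)-[r:PREFERS]->(p)
SET r.strength = 0.9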

Implementation Effort: 2 weeks

2.7: predict_category_expansion

New Tool

Why This Tool: Graph tracks category evolution over time - predict when users will try new categories.

Tool Spec:

{
  "name": "predict_category_expansion",
  "description": "Predict which new product categories user is likely to explore next",
  "input": {
    "user_id": "string (required)"
  },
  "output": {
    "current_categories": ["Dairy", "Produce", "Snacks"],
    "expansion_predictions": [
      {
        "category": "Organic Foods",
        "confidence": "high",
        "reasoning": "You've been buying more organic dairy products",
        "similar_users_who_expanded": 45,
        "suggested_entry_products": ["PROD888", "PROD999"]
      }
    ],
    "category_trends": {
      "growing": ["Organic Foods (+30%)"],
      "stable": ["Dairy", "Produce"],
      "declining": ["Beverages (-5%)"]
    }
  }
}

Why Graph:

Track category adoption timeline per user
Find similar users’ category expansion patterns
Identify bridge products (connect categories)
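The similar_users_who_expanded count can be computed from per-user category adoption dates, the "category adoption timestamps" listed in the schema summary. It counts similar users who recently started a category the target user has never bought. The Python sketch below makes three assumptions: the input shape, the cutoff date, and the minimum-user threshold.

```python
from collections import Counter
from datetime import date


def expansion_candidates(
    user_cats: set[str],
    similar_first_seen: dict[str, dict[str, date]],
    since: date,
    min_users: int = 2,
) -> list[tuple[str, int]]:
    """Return (category, similar_users_who_expanded), most common first.

    similar_first_seen: similar user_id -> {category: first purchase date}.
    """
    counts = Counter()
    for cats in similar_first_seen.values():
        for cat, first in cats.items():
            # Recent adoption of a category the target user has not entered yet.
            if cat not in user_cats and first >= since:
                counts[cat] += 1
    return [(cat, n) for cat, n in counts.most_common() if n >= min_users]
```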

Implementation Effort: 2 weeks

2.8: get_community_insights

New Tool

Why This Tool: Graph stores collaborative filtering data - “users like you” patterns.

Tool Spec:

{
  "name": "get_community_insights",
  "description": "Get insights from similar users in your area",
  "input": {
    "user_id": "string (required)",
    "category": "string (optional)"
  },
  "output": {
    "similar_users_count": 234,
    "community_trends": [
      {
        "product_id": "PROD555",
        "name": "Oat Milk",
        "adoption_rate": 45.2,
        "growth_rate": "+67% this quarter",
        "user_has_tried": false
      }
    ],
    "emerging_brands": [
      {
        "brand": "Oatly",
        "product_count": 5,
        "users_trying": 89,
        "growth": "+120%"
      }
    ]
  }
}

Why Graph:

Find similar users via collaborative filtering (SIMILAR_TO relationship)
Track community adoption of new products/brands
Geo-filtered trends (same zip/city)

Schema Changes:

// NEW: User similarity (collaborative filtering)
MATCH (u1:User {user_id: $u1}), (u2:User {user_id: $u2})
MERGE (u1)-[s:SIMILAR_TO]->(u2)
SET s.similarity_score = 0.87,
    s.common_products = 45,
    s.computed_date = date("2025-01-07")

// Pre-compute periodically, don't compute on every query
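A periodic batch job could compute the SIMILAR_TO rows outside the graph and write them back. The Python sketch below uses Jaccard similarity over purchased product sets, which is an assumed metric. All-pairs comparison is O(n^2), so a production job would likely first partition users (for example, by zip) to limit the number of pairs.

```python
from itertools import combinations


def similar_pairs(
    purchases: dict[str, set[str]], min_score: float = 0.3, min_common: int = 2
) -> list[tuple[str, str, float, int]]:
    """Return (u1, u2, similarity_score, common_products) rows for SIMILAR_TO."""
    rows = []
    for (u1, p1), (u2, p2) in combinations(sorted(purchases.items()), 2):
        common = len(p1 & p2)
        if common < min_common:
            continue
        # Jaccard: shared products over all products either user bought.
        score = common / len(p1 | p2)
        if score >= min_score:
            rows.append((u1, u2, round(score, 2), common))
    return rows
```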

Implementation Effort: 2 weeks

Phase 2 Summary

Tools to Build (8)

get_product_details_enhanced - Rich product metadata with user context (1 week)
track_spending_patterns - Spending analytics by category/brand/store (2 weeks)
optimize_offer_activation - Offer stacking optimization (2 weeks)
get_user_location_context - Primary stores without graph explosion (1 week)
discover_new_products - Emerging product discovery (1.5 weeks)
get_household_context - Household members and shared patterns (2 weeks)
predict_category_expansion - Category growth predictions (2 weeks)
get_community_insights - Similar users and community trends (2 weeks)
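The per-tool estimates above sum to 13.5 weeks of build effort. The arithmetic check below treats each estimate as engineer-weeks, which is an assumption because the plan does not say. Against 2-3 engineers over the 10-week timeline (20-30 engineer-weeks), the tool builds alone use roughly 45-68% of capacity. That leaves about a third to a half for work the list does not itemize, such as the ingestion pipeline and schema migrations.

```python
# Per-tool build estimates (assumed to be engineer-weeks) from the list above.
efforts = {
    "get_product_details_enhanced": 1.0,
    "track_spending_patterns": 2.0,
    "optimize_offer_activation": 2.0,
    "get_user_location_context": 1.0,
    "discover_new_products": 1.5,
    "get_household_context": 2.0,
    "predict_category_expansion": 2.0,
    "get_community_insights": 2.0,
}
total_build = sum(efforts.values())

# Capacity over the 10-week timeline for the 2-3 engineer staffing range.
capacity = {engineers: engineers * 10 for engineers in (2, 3)}
# Share of capacity consumed by tool builds alone.
build_share = {engineers: total_build / cap for engineers, cap in capacity.items()}
```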

Schema Changes

Enhance Product with attributes/tags and lifecycle tracking
Add spending data to PURCHASED relationship
Add offer stacking rules and relationships
Add visit patterns to VISITED relationship
Add Household nodes and MEMBER_OF relationships
Add User similarity relationships (SIMILAR_TO, pre-computed)
Add category adoption timestamps

What LLM Handles (No Tool Needed)

Store location finding → LLM calls geocoding API with user_zip
Price comparison → LLM calls pricing API with product_id + store_ids
Inventory checking → LLM calls inventory API
Recipe generation → LLM generates from product list
Product substitution → LLM reasons about similarity + user context

Timeline: 10 weeks

Cost: $65-85K (2-3 engineers)

Integration Strategy: Graph + External APIs

Architecture Pattern

User Query
  ↓
LLM Agent
  ↓
┌─────────────────────┬──────────────────────┐
│   Graph Tools       │   External APIs      │
│   (via MCP)         │   (via LLM)          │
├─────────────────────┼──────────────────────┤
│ User patterns       │ Store locations      │
│ Product context     │ Real-time pricing    │
│ Offer eligibility   │ Inventory status     │
│ Purchase history    │ Recipe generation    │
│ Spending trends     │ Substitution logic   │
│ Collaborative data  │ Nutrition lookup     │
└─────────────────────┴──────────────────────┘
  ↓
Combined Response

Example: “Where can I buy Greek yogurt near me?”

LLM Orchestration:

1. LLM calls: get_user_location_context(user_id)
   → Returns: Primary stores = ["Target Main St", "Jewel Oak St"]

2. LLM calls: get_product_details_enhanced(user_id, product_id)
   → Returns: User buys "Chobani Greek Yogurt" regularly

3. LLM uses function calling to: location_api.find_nearby_stores(
     zip="60601",
     store_names=["Target", "Jewel"],
     max_distance=5
   )
   → Returns: Store addresses with lat/lon

4. LLM uses function calling to: inventory_api.check_product_availability(
     product="Chobani Greek Yogurt",
     stores=[store_ids]
   )
   → Returns: Stock status and prices

5. LLM synthesizes:
   "Based on your shopping history, you usually buy Chobani Greek Yogurt.
    Here are your nearby stores:
    - Target (2.3 mi): In stock, $4.99
    - Jewel (3.1 mi): In stock, $4.49 (cheapest!)

    You also have a 500-point offer on dairy purchases this week."

Key Insight: Graph provides user context, external APIs provide real-time data.
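The five orchestration steps can be traced end to end with stubbed calls. All four functions below return canned data and are hypothetical stand-ins. The first two mimic the graph tools' outputs, and the last two mimic the external location and inventory APIs from the example. In production, the LLM drives these calls through function calling. The fixed sequence here only illustrates the data flow.

```python
# Hypothetical stand-ins returning canned data.
def get_user_location_context(user_id):
    return {"primary_stores": [{"name": "Target"}, {"name": "Jewel"}],
            "location_hints": {"user_zip": "60601"}}


def get_product_details_enhanced(user_id, product_id):
    return {"product": {"name": "Greek Yogurt", "brand": "Chobani"},
            "applicable_offers": [{"title": "Dairy Bonus", "points": 500}]}


def find_nearby_stores(zip_code, store_names, max_distance):
    stores = [{"store_id": "S1", "name": "Target", "miles": 2.3},
              {"store_id": "S2", "name": "Jewel", "miles": 3.1}]
    return [s for s in stores if s["name"] in store_names and s["miles"] <= max_distance]


def check_product_availability(product, store_ids):
    prices = {"S1": 4.99, "S2": 4.49}
    return {sid: {"in_stock": True, "price": prices[sid]} for sid in store_ids}


def answer_where_to_buy(user_id, product_id):
    # Steps 1-2: user context from the graph.
    loc = get_user_location_context(user_id)
    prod = get_product_details_enhanced(user_id, product_id)
    label = f'{prod["product"]["brand"]} {prod["product"]["name"]}'
    # Steps 3-4: real-time data from external services.
    names = [s["name"] for s in loc["primary_stores"]]
    stores = find_nearby_stores(loc["location_hints"]["user_zip"], names, max_distance=5)
    stock = check_product_availability(label, [s["store_id"] for s in stores])
    in_stock = [s for s in stores if stock[s["store_id"]]["in_stock"]]
    if not in_stock:
        return f"None of your usual stores nearby has {label} in stock."
    cheapest = min(in_stock, key=lambda s: stock[s["store_id"]]["price"])
    # Step 5: synthesize the answer.
    lines = [f"You usually buy {label}. Nearby stores:"]
    for s in in_stock:
        note = " (cheapest!)" if s is cheapest else ""
        lines.append(f'- {s["name"]} ({s["miles"]} mi): In stock, ${stock[s["store_id"]]["price"]:.2f}{note}')
    for offer in prod["applicable_offers"]:
        lines.append(f'You also have a {offer["points"]}-point {offer["title"]} offer.')
    return "\n".join(lines)
```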

Total Investment

8 new graph tools
Enhanced product metadata
Spending analytics
Offer optimization
Household features
Community insights
Category predictions

Total: 8 new tools, 10 weeks, $65-85K

Success Metrics

Graph Efficiency

Graph size: <5M nodes, <50M relationships
Query latency: P95 <500ms
Data freshness: User purchase data updated daily
Pruning: Archive purchases >2 years old
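The pruning rule can be sketched as a partition of a PURCHASED.timestamps list at a two-year cutoff. The sketch approximates two years as 730 days and leaves the archive destination to the caller. Both are assumptions.

```python
from datetime import datetime, timedelta


def split_for_archive(
    timestamps: list[datetime], now: datetime, keep_days: int = 730
) -> tuple[list[datetime], list[datetime]]:
    """Partition purchase timestamps into (keep, archive) at now - keep_days."""
    cutoff = now - timedelta(days=keep_days)
    keep = [t for t in timestamps if t >= cutoff]
    archive = [t for t in timestamps if t < cutoff]
    return keep, archive
```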

Feature Adoption

Spending insights: 50% of users engage
Offer optimization: 35% adoption, +20% points earned
Household features: 25% of users are in households
Community insights: 40% view trending products

Risk Mitigation

Graph Explosion Risk

❌ DON’T: Store Product×Retailer×Price (1B+ relationships)
✅ DO: Store user’s primary retailers (50-100 per user)
✅ DO: Aggregate spending, not individual transactions
✅ DO: Prune old data (>2 years)

Real-Time Data Risk

❌ DON’T: Try to keep pricing/inventory in graph
✅ DO: Use external APIs for volatile data
✅ DO: Cache API responses (15-30 min TTL)
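Response caching for pricing and inventory calls can be as small as a time-stamped dictionary. The Python sketch below is single-process and has no size bound. A TTL of 900-1800 seconds would match the 15-30 minute guidance. The clock is injectable so expiry can be tested without waiting.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Minimal time-based cache for external API responses (sketch)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[Any, tuple[float, Any]] = {}

    def get_or_fetch(self, key: Any, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        # Serve the cached value only while it is younger than the TTL.
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch()
        self._store[key] = (now, value)
        return value
```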

LLM vs Graph Boundary Risk

✅ Graph: Relationships, patterns, user context
✅ LLM: Reasoning, substitution, recipe generation
✅ External APIs: Real-time pricing, inventory, locations

Next Steps

Review and prioritize Phase 2 tools by business value
Validate external API availability (pricing, inventory, geocoding)
Prototype get_product_details_enhanced (1 week POC)
Design data ingestion pipeline for spending data
Begin Phase 2 Week 1 implementation

Appendix: Tool Classification

✅ Phase 2 Graph-Backed Tools (Build These - 8 tools)

get_product_details_enhanced
track_spending_patterns
optimize_offer_activation
get_user_location_context
discover_new_products
get_household_context
predict_category_expansion
get_community_insights

🤖 LLM-Native Capabilities (Don’t Build Tools)

Store location finding (geocoding API + LLM reasoning)
Price comparison (pricing API + LLM comparison)
Recipe generation (LLM generates from ingredients)
Product substitution (LLM reasons about similarity)
Inventory checking (inventory API)
Nutrition analysis (nutrition API + LLM interpretation)

🔮 Future Consideration (Not in Phase 2)

track_offer_performance - Offer activation history tracking
optimize_multi_user_shopping - Multi-user household optimization
