Pre-Computation Patterns for Read Optimization
Pre-Computation Patterns for Read Optimization
Section titled “Pre-Computation Patterns for Read Optimization”This document describes the pre-computation patterns used to optimize read queries. Background workers and write operations calculate properties ahead of time, enabling O(1) or O(P) read complexity instead of expensive runtime aggregations.
Design Philosophy
Section titled “Design Philosophy”Write Heavy, Read Light
- Accept higher write costs to pre-compute metrics
- Optimize for read performance (most queries are reads)
- Background workers can recalculate properties during off-peak hours
- Event-driven updates maintain consistency
Current Pre-Computed Properties
Section titled “Current Pre-Computed Properties”PURCHASED Relationship Properties
Section titled “PURCHASED Relationship Properties”All properties are maintained incrementally via AddPurchaseEvent (user.go:308-378).
| Property | Type | Calculation | Update Pattern |
|---|---|---|---|
times | int | Count of purchases | Incremented on each purchase |
qty | int | Total quantity purchased | Accumulated on each purchase |
first | datetime | First purchase date | Set on CREATE, never changed |
last | datetime | Most recent purchase | Updated on each purchase |
purchase_dates | datetime[] | Array of all purchase dates | Appended on each purchase |
total_spent | float | Cumulative spend | Accumulated on each purchase |
avg_interval_days | float | Running average of days between purchases | Recalculated using formula below |
Running Average Formula
Section titled “Running Average Formula”The avg_interval_days is calculated incrementally without scanning all purchases:
// Running average: ((old_avg * (n-1)) + new_interval) / n// where n = number of intervals (which is times - 1)rel.avg_interval_days = ((rel.avg_interval_days * (rel.times - 1)) + duration.between(rel.last, $purchase_date).days) / rel.timesWhy This Works:
- Avoids re-scanning
purchase_datesarray - O(1) update complexity per purchase
- Maintains accurate average across all intervals
Example:
Purchase 1: avg_interval = 0.0 (no interval yet)Purchase 2: avg_interval = 30 days (1 interval)Purchase 3: avg_interval = ((30 * 1) + 25) / 2 = 27.5 daysPurchase 4: avg_interval = ((27.5 * 2) + 35) / 3 = 30 daysTools That Benefit from Pre-Computation
Section titled “Tools That Benefit from Pre-Computation”User Context Tools ✅ OPTIMIZED
Section titled “User Context Tools ✅ OPTIMIZED”| Tool | Pre-Computed Properties Used | Query Complexity |
|---|---|---|
get_user_purchase_history | last, times, avg_interval_days, total_spent | O(P) - user’s purchases |
get_user_categories | times, total_spent, avg_interval_days | O(P) |
get_user_brands | times, total_spent | O(P) |
get_purchase_patterns | times, avg_interval_days, last, total_spent | O(P) |
get_category_brand_affinity | times, total_spent, last | O(P) |
All these queries are O(P) where P = user’s product count, typically <100.
Temporal Tools ✅ OPTIMIZED
Section titled “Temporal Tools ✅ OPTIMIZED”| Tool | Pre-Computed Properties Used | Query Complexity |
|---|---|---|
predict_repurchases | times, avg_interval_days, last | O(P) |
get_purchase_timeline | times, avg_interval_days, last | O(P) |
Without pre-computation, these would require:
- Scanning all purchase_dates arrays: O(P × D) where D = avg dates per product
- Calculating intervals in Cypher: expensive datetime operations
- Sorting dates: O(D log D) per product
With pre-computation: Direct property access, simple arithmetic.
Product Discovery Tools ⚠️ COULD BE OPTIMIZED
Section titled “Product Discovery Tools ⚠️ COULD BE OPTIMIZED”| Tool | Current Complexity | Optimization Opportunity |
|---|---|---|
find_similar_products | O(S) where S = similar products | ✅ Already optimal (uses SIMILAR_TO.score) |
find_alternatives_in_category | O(A × S) where A = alternatives, S = tried | ⚠️ Could pre-compute relevance_score |
search_products | O(N) where N = matching products | ✅ Already optimal |
Collaborative Tools ❌ EXPENSIVE (By Design)
Section titled “Collaborative Tools ❌ EXPENSIVE (By Design)”| Tool | Complexity | Why Not Pre-Computed |
|---|---|---|
get_collaborative_recommendations | O(U × P) where U = similar users, P = products | Too dynamic, user-dependent |
get_local_trending_products | O(L × P) where L = local users, P = products | Too dynamic, time-dependent |
These tools intentionally query at runtime because:
- Results are highly personalized
- Trending data changes frequently
- Pre-computing all user combinations is infeasible
- Mitigation: Aggressive caching (20-30 min TTL)
Additional Pre-Computation Opportunities
Section titled “Additional Pre-Computation Opportunities”1. Product Popularity Score ⭐ HIGH VALUE
Section titled “1. Product Popularity Score ⭐ HIGH VALUE”Current State: Not implemented Proposal: Add to Product nodes via background job
// Background job (runs nightly)MATCH (p:Product)<-[purchased:PURCHASED]-()WITH p, count(DISTINCT purchased) AS purchase_count, sum(purchased.times) AS total_purchases, avg(purchased.times) AS avg_repurchasesSET p.popularity_score = total_purchases, p.repurchase_rate = avg_repurchases, p.unique_buyers = purchase_countBenefits:
search_productscan sort by popularity: O(1) per productfind_alternatives_in_categorycan use as tie-breaker- Recommendations can factor in popularity
Update Frequency: Daily (popularity doesn’t change rapidly)
2. Category/Brand Purchase Counts ⭐ MEDIUM VALUE
Section titled “2. Category/Brand Purchase Counts ⭐ MEDIUM VALUE”Current State: Not implemented Proposal: Add aggregates to User nodes
// Background job or incremental updateMATCH (u:User)-[p:PURCHASED]->(prod:Product)WITH u, prod.category AS category, count(DISTINCT prod) AS product_count, sum(p.times) AS total_purchasesSET u.category_counts = collect({category: category, count: total_purchases})Benefits:
get_user_categoriesbecomes O(1) instead of O(P)- Faster filtering for “users who buy coffee”
Trade-off: More complex to maintain, questionable value for O(P) queries
3. Offer Applicability Count ⭐ LOW VALUE
Section titled “3. Offer Applicability Count ⭐ LOW VALUE”Current State: Computed at query time Proposal: Pre-count applicable products per offer
// On offer creation or product changesMATCH (o:Offer)-[:APPLIES_TO]->(p:Product)WITH o, count(p) AS product_countSET o.applicable_product_count = product_countBenefits:
get_active_offerscan include count without sub-query- Minimal value (already fast)
Update: On offer creation/modification
Background Worker Jobs
Section titled “Background Worker Jobs”Current Jobs
Section titled “Current Jobs”None implemented yet.
Proposed Jobs
Section titled “Proposed Jobs”| Job | Frequency | Purpose | Complexity |
|---|---|---|---|
| Recalculate Intervals | Weekly | Verify avg_interval_days accuracy | O(U × P × D) |
| Update Product Popularity | Daily | Set p.popularity_score | O(Products) |
| Prune Old Purchase Dates | Monthly | Trim purchase_dates arrays >2 years old | O(U × P) |
| Rebuild Similarity Graph | Weekly | Refresh SIMILAR_TO relationships | O(P²) |
Job Implementation Pattern
Section titled “Job Implementation Pattern”package workers
type PreComputeWorker struct { repo *repository.Repositories}
func (w *PreComputeWorker) RecalculateIntervals(ctx context.Context) error { // Query all PURCHASED relationships with purchase_dates // Recalculate avg_interval_days from scratch // Verify matches current running average // Log discrepancies for monitoring}
func (w *PreComputeWorker) UpdateProductPopularity(ctx context.Context) error { // Aggregate purchase metrics per product // Set popularity_score, repurchase_rate, unique_buyers}Consistency Guarantees
Section titled “Consistency Guarantees”Strong Consistency (Write Path)
Section titled “Strong Consistency (Write Path)”Properties updated synchronously on every write:
times,qty,last,total_spent- Guaranteed accurateavg_interval_days- Guaranteed accurate via running averagepurchase_dates- Guaranteed complete
Eventual Consistency (Background Jobs)
Section titled “Eventual Consistency (Background Jobs)”Properties updated asynchronously by workers:
p.popularity_score- Eventually consistent (daily refresh)- Staleness acceptable for ranking/sorting
Verification Strategy
Section titled “Verification Strategy”// Weekly verification jobMATCH (u:User)-[p:PURCHASED]->(prod:Product)WHERE p.purchase_dates IS NOT NULL AND size(p.purchase_dates) > 1WITH u, prod, p, // Calculate actual average from purchase_dates reduce(total = 0.0, i IN range(0, size(p.purchase_dates)-2) | total + duration.between(p.purchase_dates[i], p.purchase_dates[i+1]).days ) / (size(p.purchase_dates) - 1) AS calculated_avgWHERE abs(calculated_avg - p.avg_interval_days) > 0.1RETURN u.user_id, prod.product_id, p.avg_interval_days AS stored, calculated_avg AS actual, abs(calculated_avg - p.avg_interval_days) AS diffORDER BY diff DESCPerformance Impact
Section titled “Performance Impact”Write Performance
Section titled “Write Performance”| Operation | Without Pre-Compute | With Pre-Compute | Overhead |
|---|---|---|---|
| Single purchase event | 5-10ms | 8-12ms | +20-40% |
| Batch 1000 purchases | 500ms | 650ms | +30% |
Acceptable trade-off: Writes are infrequent compared to reads.
Read Performance
Section titled “Read Performance”| Query | Without Pre-Compute | With Pre-Compute | Speedup |
|---|---|---|---|
| get_user_purchase_history | 50-100ms | 5-15ms | 5-10x |
| predict_repurchases | 100-200ms | 5-15ms | 10-20x |
| get_purchase_patterns | 80-150ms | 5-10ms | 10-15x |
Massive improvement: Most queries are reads (90%+ of traffic).
Best Practices
Section titled “Best Practices”When to Pre-Compute
Section titled “When to Pre-Compute”✅ DO pre-compute when:
- Property is queried frequently (>10x more than updated)
- Calculation is expensive (aggregations, date arithmetic)
- Result is deterministic and consistent
- Staleness is acceptable (eventual consistency OK)
❌ DON’T pre-compute when:
- Property changes as often as it’s queried
- Calculation is trivial (simple lookups)
- Storage cost outweighs compute cost
- Real-time accuracy is critical
Incremental vs Batch
Section titled “Incremental vs Batch”Incremental (user.go:AddPurchaseEvent pattern):
- Update on every write operation
- Use running averages, counters
- Maintains strong consistency
- Example: avg_interval_days
Batch (background job pattern):
- Update periodically (hourly/daily)
- Full recalculation from source data
- Eventual consistency
- Example: product popularity scores
Testing Pre-Computed Properties
Section titled “Testing Pre-Computed Properties”func TestAvgIntervalDaysAccuracy(t *testing.T) { // Add multiple purchase events // Query avg_interval_days // Calculate expected average manually // Assert they match within tolerance}
func TestRunningAverageFormula(t *testing.T) { // Test edge cases: // - First purchase (no interval) // - Second purchase (first interval) // - Irregular intervals // - Many purchases}Migration Strategy
Section titled “Migration Strategy”When adding new pre-computed properties:
- Add property to schema: Update constraint/index creation
- Implement incremental update: Modify write path (AddPurchaseEvent)
- Backfill existing data: One-time migration query
- Add verification: Background job to check accuracy
- Monitor performance: Log query times before/after
Conclusion
Section titled “Conclusion”Pre-computation is critical for this system’s performance:
- Enables O(P) instead of O(P × D) for temporal queries
- Makes repurchase predictions 10-20x faster
- Acceptable write overhead (+20-40%)
- Running averages maintain accuracy without full scans
Continue this pattern as new tools are added. When in doubt, pre-compute.