Skip to content

Pre-Computation Patterns for Read Optimization

Pre-Computation Patterns for Read Optimization

Section titled “Pre-Computation Patterns for Read Optimization”

This document describes the pre-computation patterns used to optimize read queries. Background workers and write operations calculate properties ahead of time, enabling O(1) or O(P) read complexity instead of expensive runtime aggregations.

Write Heavy, Read Light

  • Accept higher write costs to pre-compute metrics
  • Optimize for read performance (most queries are reads)
  • Background workers can recalculate properties during off-peak hours
  • Event-driven updates maintain consistency

All properties are maintained incrementally via AddPurchaseEvent (user.go:308-378).

PropertyTypeCalculationUpdate Pattern
timesintCount of purchasesIncremented on each purchase
qtyintTotal quantity purchasedAccumulated on each purchase
firstdatetimeFirst purchase dateSet on CREATE, never changed
lastdatetimeMost recent purchaseUpdated on each purchase
purchase_datesdatetime[]Array of all purchase datesAppended on each purchase
total_spentfloatCumulative spendAccumulated on each purchase
avg_interval_daysfloatRunning average of days between purchasesRecalculated using formula below

The avg_interval_days is calculated incrementally without scanning all purchases:

// Running average: ((old_avg * (n-1)) + new_interval) / n
// where n = number of intervals (which is times - 1)
rel.avg_interval_days = ((rel.avg_interval_days * (rel.times - 1)) + duration.between(rel.last, $purchase_date).days) / rel.times

Why This Works:

  • Avoids re-scanning purchase_dates array
  • O(1) update complexity per purchase
  • Maintains accurate average across all intervals

Example:

Purchase 1: avg_interval = 0.0 (no interval yet)
Purchase 2: avg_interval = 30 days (1 interval)
Purchase 3: avg_interval = ((30 * 1) + 25) / 2 = 27.5 days
Purchase 4: avg_interval = ((27.5 * 2) + 35) / 3 = 30 days
ToolPre-Computed Properties UsedQuery Complexity
get_user_purchase_historylast, times, avg_interval_days, total_spentO(P) - user’s purchases
get_user_categoriestimes, total_spent, avg_interval_daysO(P)
get_user_brandstimes, total_spentO(P)
get_purchase_patternstimes, avg_interval_days, last, total_spentO(P)
get_category_brand_affinitytimes, total_spent, lastO(P)

All these queries are O(P) where P = user’s product count, typically <100.

ToolPre-Computed Properties UsedQuery Complexity
predict_repurchasestimes, avg_interval_days, lastO(P)
get_purchase_timelinetimes, avg_interval_days, lastO(P)

Without pre-computation, these would require:

  • Scanning all purchase_dates arrays: O(P × D) where D = avg dates per product
  • Calculating intervals in Cypher: expensive datetime operations
  • Sorting dates: O(D log D) per product

With pre-computation: Direct property access, simple arithmetic.

Product Discovery Tools ⚠️ COULD BE OPTIMIZED

Section titled “Product Discovery Tools ⚠️ COULD BE OPTIMIZED”
ToolCurrent ComplexityOptimization Opportunity
find_similar_productsO(S) where S = similar products✅ Already optimal (uses SIMILAR_TO.score)
find_alternatives_in_categoryO(A × S) where A = alternatives, S = tried⚠️ Could pre-compute relevance_score
search_productsO(N) where N = matching products✅ Already optimal

Collaborative Tools ❌ EXPENSIVE (By Design)

Section titled “Collaborative Tools ❌ EXPENSIVE (By Design)”
ToolComplexityWhy Not Pre-Computed
get_collaborative_recommendationsO(U × P) where U = similar users, P = productsToo dynamic, user-dependent
get_local_trending_productsO(L × P) where L = local users, P = productsToo dynamic, time-dependent

These tools intentionally query at runtime because:

  • Results are highly personalized
  • Trending data changes frequently
  • Pre-computing all user combinations is infeasible
  • Mitigation: Aggressive caching (20-30 min TTL)

1. Product Popularity Score ⭐ HIGH VALUE

Section titled “1. Product Popularity Score ⭐ HIGH VALUE”

Current State: Not implemented Proposal: Add to Product nodes via background job

// Background job (runs nightly)
MATCH (p:Product)<-[purchased:PURCHASED]-()
WITH p,
count(DISTINCT purchased) AS purchase_count,
sum(purchased.times) AS total_purchases,
avg(purchased.times) AS avg_repurchases
SET p.popularity_score = total_purchases,
p.repurchase_rate = avg_repurchases,
p.unique_buyers = purchase_count

Benefits:

  • search_products can sort by popularity: O(1) per product
  • find_alternatives_in_category can use as tie-breaker
  • Recommendations can factor in popularity

Update Frequency: Daily (popularity doesn’t change rapidly)

2. Category/Brand Purchase Counts ⭐ MEDIUM VALUE

Section titled “2. Category/Brand Purchase Counts ⭐ MEDIUM VALUE”

Current State: Not implemented Proposal: Add aggregates to User nodes

// Background job or incremental update
MATCH (u:User)-[p:PURCHASED]->(prod:Product)
WITH u, prod.category AS category,
count(DISTINCT prod) AS product_count,
sum(p.times) AS total_purchases
SET u.category_counts = collect({category: category, count: total_purchases})

Benefits:

  • get_user_categories becomes O(1) instead of O(P)
  • Faster filtering for “users who buy coffee”

Trade-off: More complex to maintain, questionable value for O(P) queries

3. Offer Applicability Count ⭐ LOW VALUE

Section titled “3. Offer Applicability Count ⭐ LOW VALUE”

Current State: Computed at query time Proposal: Pre-count applicable products per offer

// On offer creation or product changes
MATCH (o:Offer)-[:APPLIES_TO]->(p:Product)
WITH o, count(p) AS product_count
SET o.applicable_product_count = product_count

Benefits:

  • get_active_offers can include count without sub-query
  • Minimal value (already fast)

Update: On offer creation/modification

None implemented yet.

JobFrequencyPurposeComplexity
Recalculate IntervalsWeeklyVerify avg_interval_days accuracyO(U × P × D)
Update Product PopularityDailySet p.popularity_scoreO(Products)
Prune Old Purchase DatesMonthlyTrim purchase_dates arrays >2 years oldO(U × P)
Rebuild Similarity GraphWeeklyRefresh SIMILAR_TO relationshipsO(P²)
pkg/workers/precompute.go
package workers
type PreComputeWorker struct {
repo *repository.Repositories
}
func (w *PreComputeWorker) RecalculateIntervals(ctx context.Context) error {
// Query all PURCHASED relationships with purchase_dates
// Recalculate avg_interval_days from scratch
// Verify matches current running average
// Log discrepancies for monitoring
}
func (w *PreComputeWorker) UpdateProductPopularity(ctx context.Context) error {
// Aggregate purchase metrics per product
// Set popularity_score, repurchase_rate, unique_buyers
}

Properties updated synchronously on every write:

  • times, qty, last, total_spent - Guaranteed accurate
  • avg_interval_days - Guaranteed accurate via running average
  • purchase_dates - Guaranteed complete

Properties updated asynchronously by workers:

  • p.popularity_score - Eventually consistent (daily refresh)
  • Staleness acceptable for ranking/sorting
// Weekly verification job
MATCH (u:User)-[p:PURCHASED]->(prod:Product)
WHERE p.purchase_dates IS NOT NULL
AND size(p.purchase_dates) > 1
WITH u, prod, p,
// Calculate actual average from purchase_dates
reduce(total = 0.0, i IN range(0, size(p.purchase_dates)-2) |
total + duration.between(p.purchase_dates[i], p.purchase_dates[i+1]).days
) / (size(p.purchase_dates) - 1) AS calculated_avg
WHERE abs(calculated_avg - p.avg_interval_days) > 0.1
RETURN u.user_id, prod.product_id,
p.avg_interval_days AS stored,
calculated_avg AS actual,
abs(calculated_avg - p.avg_interval_days) AS diff
ORDER BY diff DESC
OperationWithout Pre-ComputeWith Pre-ComputeOverhead
Single purchase event5-10ms8-12ms+20-40%
Batch 1000 purchases500ms650ms+30%

Acceptable trade-off: Writes are infrequent compared to reads.

QueryWithout Pre-ComputeWith Pre-ComputeSpeedup
get_user_purchase_history50-100ms5-15ms5-10x
predict_repurchases100-200ms5-15ms10-20x
get_purchase_patterns80-150ms5-10ms10-15x

Massive improvement: Most queries are reads (90%+ of traffic).

DO pre-compute when:

  • Property is queried frequently (>10x more than updated)
  • Calculation is expensive (aggregations, date arithmetic)
  • Result is deterministic and consistent
  • Staleness is acceptable (eventual consistency OK)

DON’T pre-compute when:

  • Property changes as often as it’s queried
  • Calculation is trivial (simple lookups)
  • Storage cost outweighs compute cost
  • Real-time accuracy is critical

Incremental (user.go:AddPurchaseEvent pattern):

  • Update on every write operation
  • Use running averages, counters
  • Maintains strong consistency
  • Example: avg_interval_days

Batch (background job pattern):

  • Update periodically (hourly/daily)
  • Full recalculation from source data
  • Eventual consistency
  • Example: product popularity scores
func TestAvgIntervalDaysAccuracy(t *testing.T) {
// Add multiple purchase events
// Query avg_interval_days
// Calculate expected average manually
// Assert they match within tolerance
}
func TestRunningAverageFormula(t *testing.T) {
// Test edge cases:
// - First purchase (no interval)
// - Second purchase (first interval)
// - Irregular intervals
// - Many purchases
}

When adding new pre-computed properties:

  1. Add property to schema: Update constraint/index creation
  2. Implement incremental update: Modify write path (AddPurchaseEvent)
  3. Backfill existing data: One-time migration query
  4. Add verification: Background job to check accuracy
  5. Monitor performance: Log query times before/after

Pre-computation is critical for this system’s performance:

  • Enables O(P) instead of O(P × D) for temporal queries
  • Makes repurchase predictions 10-20x faster
  • Acceptable write overhead (+20-40%)
  • Running averages maintain accuracy without full scans

Continue this pattern as new tools are added. When in doubt, pre-compute.