
Graph Database Capacity Experiment Plan

Date: 2026-02-16 Author: f.luo Status: Draft — awaiting review

Results from this experiment plan inform the Action Plan’s decision gates — specifically whether to invest further in EC2 (solidification Phases 1-2) or migrate to a managed service (Phase 3).


  1. Find the breaking point of EC2 Neo4j — How many users can the current r6i.xlarge (32GB) hold before performance degrades? What about with vectors? (Informs Action Plan — Decision Gate A)
  2. Evaluate AuraDB — Is managed Neo4j viable? What’s the cost/performance tradeoff vs self-managed EC2? (Informs Action Plan — Decision Gate B)
  3. Evaluate Neptune Analytics — Can it replace Neo4j? How does graph+vector unification compare to Neo4j + external vector store? (Informs Action Plan — Decision Gate B)
  4. Compare vector storage options — Neo4j native, Neptune Analytics, Valkey VSS, OpenSearch Serverless
  5. Establish cost models — $/user/month for each backend at 100K, 1M, 5M users

Recommendation: New repo — graph-capacity-experiments

Why not keep it in consumer-graph-worker?

| Concern | consumer-graph-worker | New repo |
|---|---|---|
| Lifecycle | Long-lived production service | Throwaway experiments |
| Dependencies | Go + Neo4j driver + Kafka | Go + Neo4j driver + Neptune SDK + OpenSearch SDK + vector libs |
| CI/CD | Build → Docker → ECS deploy | Build → run locally or on EC2 |
| Data generators | Not appropriate in prod code | Core purpose |
| Risk | Benchmark code could accidentally ship | Isolated |

Repo structure:

graph-capacity-experiments/
├── cmd/
│   ├── datagen/                  # Synthetic user + relationship generator (uses real catalog)
│   │   └── main.go
│   ├── loader/                   # Multi-backend data loader
│   │   └── main.go
│   └── benchmark/                # Benchmark runner
│       └── main.go
├── scripts/
│   ├── export_catalog.sh         # Export real products/categories/retailers from Neo4j
│   ├── export_snowflake_catalog.py  # Export larger catalog from Snowflake
│   └── embed.py                  # Generate embeddings (OpenAI or sentence-transformer)
├── internal/
│   ├── datagen/                  # Data generation logic
│   │   ├── users.go              # Synthetic user generation
│   │   ├── purchases.go          # Purchase relationship generation (assigns real products to fake users)
│   │   └── distributions.go      # Statistical distributions matching prod
│   ├── loader/                   # Backend-specific loaders
│   │   ├── neo4j.go              # Bolt protocol (EC2 + AuraDB)
│   │   ├── neptune.go            # openCypher over HTTPS
│   │   └── vectors.go            # Vector-specific loaders (Valkey VSS, OpenSearch)
│   ├── benchmark/                # Benchmark queries + harness
│   │   ├── queries.go            # Standard query set
│   │   ├── runner.go             # Execution + timing
│   │   └── report.go             # Results formatting
│   └── model/                    # Shared data model (mirrors consumer-graph-worker types)
│       └── types.go
├── data/
│   └── catalog/                  # Real product/category/retailer data (exported, gitignored)
├── infra/                        # FSD configs for experiment instances
│   ├── experiment-neo4j-ec2.yml
│   └── experiment-neptune.yml
├── results/                      # Benchmark results (committed for reference)
│   └── .gitkeep
├── go.mod
├── go.sum
├── Makefile
└── README.md

Alternative considered: Keep in consumer-graph-worker under experiments/. Rejected because it adds unnecessary dependencies to the production module and blurs the boundary.


Clone the existing consumer-graph-neo4j-ec2.yml with modifications:

infra/experiment-neo4j-ec2.yml
variables:
  default:
    instance_type: 'r6i.xlarge' # Start with same as prod (32GB)
    data_volume_size: '500'
  stage:
    instance_type: 'r6i.xlarge'
    data_volume_size: '500'
tags:
  service: consumer-graph-neo4j-experiment
  purpose: capacity-testing
  ttl: 30d # Remind to tear down

Deploy to stage account only (cheaper, no prod risk):

fsd service ec2 deploy --env stage --account stage-services experiment-neo4j-ec2.yml

Later, to test larger instances: Change instance_type to r6i.2xlarge (64GB) or r6i.4xlarge (128GB) and redeploy.

Cost: r6i.xlarge on-demand = ~$0.252/hr = ~$6/day. Budget ~$200 for a month of experiments.

Use AuraDB Professional (not Free — 200K node limit is too restrictive):

  • Create via AuraDB Console
  • Region: us-east-1 (same as our infra)
  • Size: Start with 2GB RAM, scale up as needed
  • Cost: ~$65/mo for 2GB, ~$130/mo for 4GB
  • Connection: Bolt protocol (same Neo4j Go driver, different connection URI)

No FSD config needed — AuraDB is fully managed by Neo4j Inc.

Create a Neptune Analytics graph (no instances to manage; capacity is set as provisioned memory in m-NCUs):

aws neptune-graph create-graph \
--graph-name consumer-graph-experiment \
--provisioned-memory 128 \
--vector-search-configuration dimension=1024 \
--region us-east-1
  • Uses openCypher (compatible with Neo4j Cypher, with caveats)
  • Has native vector search built in
  • Pricing: billed per hour for the provisioned memory (m-NCUs), not per query
  • No FSD config needed — use AWS CLI or CloudFormation

| Backend | Setup | Notes |
|---|---|---|
| Neo4j native (HNSW) | Already on the EC2 experiment instance | CREATE VECTOR INDEX |
| Neptune Analytics vectors | Already included in Neptune graph | Built-in |
| Valkey VSS | Create a separate Valkey node or use existing stage cache | Needs redis-cli with VSS module |
| OpenSearch Serverless | Create a vector search collection | aws opensearchserverless create-collection |

Matches the production schema exactly:

(:User {user_id, timezone, created_at, last_updated_at})
-[:PURCHASED {times, last, timestamps[], receipt_ids[], avg_interval_days, repurchase_likelihood}]->
(:Product {product_id, name, brand, category, created_at})
-[:IN_CATEGORY]->
(:Category {category_id, name})
(:User)-[:SHOPS_IN {purchase_count}]->(:Category)
(:User)-[:SHOPS_AT {frequency, last_visit}]->(:Retailer {name, venue_type})
(:User)-[:MEMBER_OF]->(:Community {community_id, name, primary_category, member_count, zip_code})

Use distributions measured from production (from capacity planning doc):

| Relationship | Distribution | Params |
|---|---|---|
| PURCHASED per user | Log-normal | mean=5.8, median=3, p95=20, max=103 |
| SHOPS_IN per user | Log-normal | mean=5.7, p95=17 |
| MEMBER_OF per user | Log-normal | mean=2.6, p95=6 |
| SHOPS_AT per user | Log-normal | mean=1.6, p95=4 |
| Products (shared) | Power-law | ~2.5 products per user (amortized), popular products purchased by many users |
| Categories | Fixed catalog | ~50 realistic categories (Dairy, Bakery, Snacks, etc.) |
| Retailers | Fixed catalog | ~500 realistic retailer names |
| Communities | Derived | ~3 per zip code × category combination |
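
As a rough check on the PURCHASED row, the log-normal parameters can be derived from the stated mean and median. This is a minimal sketch, not the datagen implementation: the function names are hypothetical, and clamping every user to at least one purchase is an assumption the plan does not state.

```python
import math
import random

def lognormal_params(mean, median):
    """Return (mu, sigma) of a log-normal with the given mean and median."""
    mu = math.log(median)                              # median = exp(mu)
    sigma = math.sqrt(2.0 * math.log(mean / median))   # mean = exp(mu + sigma^2 / 2)
    return mu, sigma

def sample_count(rng, mu, sigma, max_count):
    """Draw one per-user relationship count, clamped to [1, max_count] (assumed bounds)."""
    value = round(rng.lognormvariate(mu, sigma))
    return max(1, min(max_count, value))

# PURCHASED per user: mean=5.8, median=3, max=103 (from the table above)
mu, sigma = lognormal_params(mean=5.8, median=3.0)
implied_p95 = math.exp(mu + 1.645 * sigma)   # 1.645 is the z-score of the 95th percentile

rng = random.Random(42)                      # fixed seed keeps generation reproducible
counts = [sample_count(rng, mu, sigma, max_count=103) for _ in range(10_000)]
```

The implied 95th percentile comes out close to the measured p95 of 20, which suggests the mean and median alone pin the distribution down reasonably well. The other rows list only mean and p95, so their parameters would need a different fit.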

4c. Real Catalog Data (Products, Categories, Retailers)

Products, categories, and retailers use real Fetch data — not synthetic names. This ensures embeddings reflect actual product semantics and similarity searches return meaningful results.

Current prod Neo4j catalog (as of 2026-02-16, from 30 backfilled users)

| Entity | Count | Properties |
|---|---|---|
| Products | 4,586 | product_id, name, brand, category |
| Categories | 2,970 | category_id, name (3-level hierarchy: GROCERY\|DAIRY\|MILK) |
| Retailers | 212 | retailer_id, name, venue_type |

| Scale | Products Needed | Source | Method |
|---|---|---|---|
| Quick-start (≤50K users) | ~5K | Prod Neo4j export | Cypher query → CSV |
| Medium (100K–500K users) | ~60K | Snowflake export | SQL query → CSV |
| Large (1M+ users) | ~400K+ | Snowflake export | SQL query → CSV |

Source 1: Prod Neo4j export (immediate, no extra access needed)

// Export products
MATCH (p:Product)
RETURN p.product_id AS product_id, p.name AS name, p.brand AS brand, p.category AS category;
// Export categories
MATCH (c:Category)
RETURN c.category_id AS category_id, c.name AS name;
// Export retailers
MATCH (r:Retailer)
RETURN r.retailer_id AS retailer_id, r.name AS name, r.venue_type AS venue_type;
// Export product-to-category mapping
MATCH (p:Product)-[:IN_CATEGORY]->(c:Category)
RETURN p.product_id AS product_id, c.category_id AS category_id;

A scripts/export_catalog.sh script runs these via the Neo4j HTTP API and writes CSVs. This gives us ~4.5K real products with names like “Hormel Black Label Thick Cut Maple Bacon - 12 Oz” and real brands like HORMEL, GREAT VALUE, CELSIUS.

Source 2: Snowflake export (for larger catalogs)

-- Unique products from receipt items (full Fetch catalog)
SELECT DISTINCT
i.FIDO AS product_id,
i.DESCRIPTION AS name,
i.BRAND AS brand,
COALESCE(i.CATEGORY_1, 'UNCATEGORIZED') AS category_l1,
i.CATEGORY_2 AS category_l2,
i.CATEGORY_3 AS category_l3
FROM FETCH_SERVICES_PROD.RECEIPT_SERVICE.RECEIPT_ITEMS i
WHERE i.FIDO IS NOT NULL
AND i.DESCRIPTION IS NOT NULL
AND i.DESCRIPTION != ''
LIMIT 500000;
-- Unique retailers
SELECT DISTINCT
r.STORE_NAME AS name,
r.RETAILER_CHANNEL AS venue_type
FROM FETCH_SERVICES_PROD.RECEIPT_SERVICE.RECEIPTS r
WHERE r.STORE_NAME IS NOT NULL
AND r.STORE_NAME != '';

Run via Snowflake CLI (snowsql) or the Snowflake Python connector. Export to CSV, then use in datagen.

Source 3: Purchase History API (alternative, slower)

If Snowflake access is not available, we can discover more products by calling the Purchase History API for a batch of user IDs. Each user averages ~5.8 unique products. Querying ~30K users at 10 req/s (~50 min) would yield ~50-60K unique products. This uses the existing backfill infrastructure but is slower than a direct Snowflake query.

  1. Load real product/category/retailer CSVs (exported from Neo4j or Snowflake)
  2. Generate synthetic users with fake user_ids
  3. Assign real products to fake users following a power-law (Zipf) distribution — popular products (bananas, eggs, milk) purchased by many users, long-tail products by few
  4. Build SHOPS_IN, SHOPS_AT, MEMBER_OF relationships from the assigned purchases
  5. Communities are derived from category + zip code combinations (synthetic zip codes, real categories)
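
Step 4 can be illustrated for SHOPS_IN with a short sketch. It assumes `purchase_count` means the number of purchased products that fall in a category, which the plan does not define; `derive_shops_in` and the sample IDs are hypothetical.

```python
from collections import Counter

def derive_shops_in(purchases, product_category):
    """Count, per (user, category), how many purchased products fall in that category."""
    counts = Counter()
    for user_id, product_id in purchases:
        category_id = product_category.get(product_id)
        if category_id is not None:          # skip products with no category mapping
            counts[(user_id, category_id)] += 1
    return counts

# (user_id, product_id) pairs as they would appear in purchased.csv
purchases = [("u1", "p1"), ("u1", "p2"), ("u1", "p3"), ("u2", "p1")]
# product_id -> category_id, as in in_category.csv
product_category = {"p1": "DAIRY", "p2": "DAIRY", "p3": "BAKERY"}

shops_in = derive_shops_in(purchases, product_category)
```

SHOPS_AT and MEMBER_OF could presumably be derived the same way, keyed on retailer and on the category plus zip code pair.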

Product popularity distribution from prod (top products by buyer count):

| Product | Buyers | Category |
|---|---|---|
| Fresh Fruits | 12 | Pantry |
| Fresh Vegetables | 10 | Pantry |
| Eggs | 10 | Pantry |
| Fresh Bananas | 9 | Pantry |
| Fresh Blueberries | 8 | Pantry |
| Avocados | 6 | PRODUCE\|FRUITS\|AVOCADOS |

The Zipf distribution in datagen should match this pattern: ~40% of purchases hit the top 5% of products.
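
One way to hit the "40% of purchases in the top 5% of products" target is to fit the Zipf exponent numerically and then sample ranks from the resulting weights. The sketch below is an illustration under that assumption, not the datagen code; `top_share` and `fit_exponent` are hypothetical helpers, and the catalog size is the 4,586 products from the prod export.

```python
import random
from itertools import accumulate

def top_share(n_products, exponent, top_fraction=0.05):
    """Share of total purchase weight held by the top `top_fraction` of products."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_products + 1)]
    top_n = max(1, int(n_products * top_fraction))
    return sum(weights[:top_n]) / sum(weights)

def fit_exponent(n_products, target_share, top_fraction=0.05):
    """Bisect on the exponent; the top share grows monotonically with it."""
    lo, hi = 0.0, 3.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if top_share(n_products, mid, top_fraction) < target_share:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

N_PRODUCTS = 4586
s = fit_exponent(N_PRODUCTS, target_share=0.40)

# Sample product ranks (1 = most popular) for 10,000 synthetic purchases.
cum_weights = list(accumulate(1.0 / (rank ** s) for rank in range(1, N_PRODUCTS + 1)))
rng = random.Random(42)
sampled_ranks = rng.choices(range(1, N_PRODUCTS + 1), cum_weights=cum_weights, k=10_000)
```

A classic Zipf exponent of 1 would concentrate well over half of all purchases in the top 5% at this catalog size, so the fitted exponent lands below 1. The fit should be repeated for the larger Snowflake catalogs, since the share depends on the number of products.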

All embeddings use real models so that kNN results reflect actual semantic similarity (not random noise). Three embedding approaches:

| Model | Dimensions | Speed | Cost | Where it runs |
|---|---|---|---|---|
| OpenAI text-embedding-3-small | 512 (via dimensions param) | ~3,000 items/min | ~$0.02/1M tokens | API call |
| OpenAI text-embedding-3-large | 1024 (via dimensions param) | ~2,500 items/min | ~$0.13/1M tokens | API call |
| Sentence Transformer (all-MiniLM-L6-v2) | 384 | ~10,000 items/min | Free | Local (Python) |

OpenAI embedding (512-dim) — good quality, low cost:

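# Using text-embedding-3-small with dimensions=512
import openai  # the client reads OPENAI_API_KEY from the environment

response = openai.embeddings.create(
    model="text-embedding-3-small",
    input="Organic Whole Milk, Horizon, Dairy",
    dimensions=512,
)
embedding = response.data[0].embedding  # list of 512 floats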
  • Input text per product: "{name}, {brand}, {category}" (e.g. “Organic Whole Milk, Horizon, Dairy”)
  • Input text per user: Concatenation of their top-5 purchased product names + top-3 categories
  • Cost estimate: 1M products × ~10 tokens each = 10M tokens → ~$0.20

OpenAI embedding (1024-dim) — highest quality, tests scaling with larger vectors:

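# Using text-embedding-3-large with dimensions=1024
import openai  # the client reads OPENAI_API_KEY from the environment

response = openai.embeddings.create(
    model="text-embedding-3-large",
    input="Organic Whole Milk, Horizon, Dairy",
    dimensions=1024,
)
embedding = response.data[0].embedding  # list of 1024 floats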
  • Same input text format as 512-dim
  • Cost estimate: 1M products × ~10 tokens each = 10M tokens → ~$1.30
  • 2× storage and index overhead vs 512-dim — tests whether higher quality justifies the cost

Sentence Transformer — free, local, 384-dim:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')  # 384-dim
texts = ["Organic Whole Milk, Horizon, Dairy"]  # one "{name}, {brand}, {category}" string per product
embeddings = model.encode(texts)  # array of shape (len(texts), 384)
  • Same input text format as OpenAI
  • Runs on local machine (CPU is fine for <1M items)
  • No API cost, fully offline

Embedding pipeline:

  1. Real product catalog is exported from Neo4j or Snowflake (see section 4c)
  2. A Python script (scripts/embed.py) reads the product CSV and generates embeddings via OpenAI API or local sentence-transformer
  3. Embeddings are saved as .npy files (product_id → float32 array)
  4. datagen assigns real products to synthetic users; user embeddings = weighted average of their purchased product embeddings (weighted by purchase count)
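
The weighted average in step 4 can be sketched in plain Python as follows. The function name and the two-dimensional toy vectors are illustrative only; the real pipeline would presumably operate on the .npy arrays.

```python
def user_embedding(purchases, product_embeddings):
    """Average product vectors, weighting each by its purchase count."""
    dim = len(next(iter(product_embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for product_id, times in purchases.items():
        vector = product_embeddings[product_id]
        for i, value in enumerate(vector):
            total[i] += times * value
        weight_sum += times
    if weight_sum == 0:
        raise ValueError("user has no purchases to average")
    return [value / weight_sum for value in total]

# Toy 2-dim embeddings; real ones are 384, 512, or 1024 dimensions.
product_embeddings = {"milk": [1.0, 0.0], "bread": [0.0, 1.0]}
# product_id -> purchase count (the `times` property on PURCHASED)
profile = user_embedding({"milk": 3, "bread": 1}, product_embeddings)
```

The averaged vector is generally not unit length. That does not matter for cosine similarity, which ignores scale, but it would matter if an index were configured with a Euclidean or dot-product metric.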

Recommended approach:

  • Use OpenAI 512-dim (text-embedding-3-small) as the primary embedding for most experiments (good quality, low cost)
  • Use OpenAI 1024-dim (text-embedding-3-large) to test whether higher dimensionality improves recall enough to justify 2× storage/index overhead
  • Use sentence-transformer 384-dim as a free alternative for rapid iteration and local development
  • Compare all three on HNSW index size, kNN recall, and query latency to determine the best quality/cost/performance tradeoff

Embeddings go on:

  • Product nodes: Embed "{name}, {brand}, {category}" — represents the product semantically
  • User nodes: Weighted average of purchased product embeddings — represents the user’s purchase behavior profile
  • Category nodes (optional): Embed category name — enables category-level similarity search
# Step 0: Export real catalog from prod Neo4j (one-time)
./scripts/export_catalog.sh --env prod --output data/catalog/
# Or from Snowflake for a larger catalog:
python scripts/export_snowflake_catalog.py --output data/catalog/ --limit 500000
# Catalog output:
# data/catalog/products.csv (product_id, name, brand, category)
# data/catalog/categories.csv (category_id, name)
# data/catalog/retailers.csv (retailer_id, name, venue_type)
# data/catalog/in_category.csv (product_id, category_id)
# Step 1: Generate embeddings for real products (Python, one-time per model)
python scripts/embed.py \
--products data/catalog/products.csv \
--model openai-small --dimensions 512 \
--output data/catalog/embeddings-512/
python scripts/embed.py \
--products data/catalog/products.csv \
--model openai-large --dimensions 1024 \
--output data/catalog/embeddings-1024/
python scripts/embed.py \
--products data/catalog/products.csv \
--model sentence-transformer \
--output data/catalog/embeddings-384/
# Embedding output (per model):
# data/catalog/embeddings-512/products.npy (float32 array)
# data/catalog/embeddings-512/manifest.json (model, dimensions, count, cost)
# Step 2: Generate synthetic users + purchase relationships (Go)
./datagen \
--users 100000 \
--seed 42 \
--catalog data/catalog/ \
--embeddings data/catalog/embeddings-512/ \
--output data/100k/
# Generated output (synthetic):
# data/100k/users.csv (user_id, timezone, created_at)
# data/100k/embeddings/users.npy (weighted avg of product embeddings)
# data/100k/embeddings/manifest.json (model, dimensions, count)
# data/100k/communities.csv (community_id, name, primary_category, zip_code)
# data/100k/purchased.csv (user_id, product_id, times, last, ...)
# data/100k/shops_in.csv (user_id, category_id, purchase_count)
# data/100k/shops_at.csv (user_id, retailer_name, frequency, last_visit)
# data/100k/member_of.csv (user_id, community_id)
# data/100k/manifest.json (metadata: counts, seed, generation time)
#
# Products, categories, retailers, and in_category are real data from
# data/catalog/ — shared across all dataset sizes, not regenerated.

Find the maximum user count where:

  • Read query p95 < 100ms
  • Write throughput > 100 nodes/sec
  • Page cache hit ratio > 95%
  • No OOM crashes
  1. Deploy fresh experiment EC2 (r6i.xlarge, 32GB)
  2. Load data incrementally:
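
| Checkpoint | Users | Est. Graph Size | Est. + Vectors (512-dim) | Est. + Vectors (1024-dim) |
|---|---|---|---|---|
| C1 | 10,000 | ~43 MB | ~150 MB | ~260 MB |
| C2 | 50,000 | ~215 MB | ~650 MB | ~1.1 GB |
| C3 | 100,000 | ~430 MB | ~1.3 GB | ~2.2 GB |
| C4 | 250,000 | ~1.1 GB | ~3.1 GB | ~5.3 GB |
| C5 | 500,000 | ~2.1 GB | ~6 GB | ~10.5 GB |
| C6 | 1,000,000 | ~4.3 GB | ~11.5 GB | ~20 GB |
| C7 | 2,000,000 | ~8.6 GB | ~23 GB | ~40 GB |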

Vector estimates (users + products both embedded, HNSW index overhead ≈ 1.5× vector storage):

  • 512-dim float32 = ~2.1 KB/embedding
  • 1024-dim float32 = ~4.2 KB/embedding
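
The per-embedding sizes follow from 4 bytes per float32 component. The sketch below also reproduces the C1 row under two assumptions that the plan only implies: the 1.5× factor is applied to the raw vector bytes to get the total vector footprint, and the embedding count is users plus ~2.5 products per user.

```python
BYTES_PER_FLOAT32 = 4

def embedding_bytes(dimensions):
    """Raw size of one float32 embedding."""
    return dimensions * BYTES_PER_FLOAT32

def vector_footprint_mb(n_embeddings, dimensions, index_overhead=1.5):
    """Raw vector bytes scaled by the assumed HNSW factor, in MB (10^6 bytes)."""
    raw_bytes = n_embeddings * embedding_bytes(dimensions)
    return raw_bytes * index_overhead / 1e6

# Checkpoint C1: 10,000 users plus ~2.5 products per user, all embedded.
c1_embeddings = 10_000 + int(10_000 * 2.5)
c1_total_512 = 43 + vector_footprint_mb(c1_embeddings, 512)     # graph MB + vectors
c1_total_1024 = 43 + vector_footprint_mb(c1_embeddings, 1024)
```

Under these assumptions C1 lands near the table's ~150 MB and ~260 MB. The larger checkpoints in the table grow more slowly than this linear estimate, presumably because the shared product catalog stops growing in step with users.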
  3. At each checkpoint, run the benchmark suite
  4. Stop when performance degrades below thresholds

// Q1: Single user lookup (point query)
MATCH (u:User {user_id: $uid})-[r:PURCHASED]->(p:Product)
RETURN u, r, p
// Q2: 2-hop category aggregation
MATCH (u:User {user_id: $uid})-[:PURCHASED]->(p:Product)-[:IN_CATEGORY]->(c:Category)
RETURN c.name, count(p) AS products ORDER BY products DESC
// Q3: Community-based recommendation (expensive)
MATCH (u:User {user_id: $uid})-[:MEMBER_OF]->(comm:Community)<-[:MEMBER_OF]-(other:User)
MATCH (other)-[:PURCHASED]->(p:Product)
WHERE NOT (u)-[:PURCHASED]->(p)
RETURN p.name, count(DISTINCT other) AS score
ORDER BY score DESC LIMIT 10
// Q4: User's full profile (all relationship types)
MATCH (u:User {user_id: $uid})
OPTIONAL MATCH (u)-[pur:PURCHASED]->(prod:Product)
OPTIONAL MATCH (u)-[si:SHOPS_IN]->(cat:Category)
OPTIONAL MATCH (u)-[sa:SHOPS_AT]->(ret:Retailer)
OPTIONAL MATCH (u)-[mo:MEMBER_OF]->(comm:Community)
RETURN count(DISTINCT pur) AS purchases,
       count(DISTINCT si) AS categories,
       count(DISTINCT sa) AS retailers,
       count(DISTINCT mo) AS communities
// Q5: Global aggregation (stress test)
MATCH (u:User)-[:PURCHASED]->(p:Product)
WITH p, count(u) AS buyers
ORDER BY buyers DESC LIMIT 20
RETURN p.name, buyers
// Q6: Vector similarity (only when vectors loaded)
CALL db.index.vector.queryNodes('product-embedding-index', 10, $queryVector)
YIELD node, score
RETURN node.product_id, node.name, score
// Q7: Graph + vector combined (vector search → drop products the user already purchased)
MATCH (u:User {user_id: $uid})-[:PURCHASED]->(p:Product)
WITH collect(p) AS purchased
CALL db.index.vector.queryNodes('product-embedding-index', 50, $queryVector)
YIELD node, score
WHERE NOT node IN purchased
RETURN node.product_id, node.name, score LIMIT 10

Each query runs 100 iterations with random user IDs. Record p50, p95, p99, max latency.
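
A minimal sketch of the latency summary, using the nearest-rank percentile definition (an assumption; the plan does not say which definition the benchmark runner uses). The function names are hypothetical and the sample latencies are synthetic.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def summarize(latencies_ms):
    """Return the four figures recorded per query."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "max_ms": max(latencies_ms),
    }

latencies = list(range(1, 101))   # stand-in for 100 measured latencies, in ms
summary = summarize(latencies)
```

With only 100 iterations, p99 under this definition is simply the second-slowest sample, so a single outlier moves it noticeably. More iterations per query would give a steadier tail estimate.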

| Run | Instance | Vectors | Embedding Model | Max Users |
|---|---|---|---|---|
| R1 | r6i.xlarge (32GB) | No | n/a | Until degradation |
| R2 | r6i.xlarge (32GB) | Yes (512-dim) | OpenAI text-embedding-3-small | Until degradation |
| R3 | r6i.xlarge (32GB) | Yes (384-dim) | Sentence Transformer all-MiniLM-L6-v2 | Until degradation |
| R4 | r6i.xlarge (32GB) | Yes (1024-dim) | OpenAI text-embedding-3-large | Until degradation |
| R5 | r6i.2xlarge (64GB) | Yes (512-dim) | OpenAI text-embedding-3-small | Until degradation |
| R6 | r6i.2xlarge (64GB) | Yes (1024-dim) | OpenAI text-embedding-3-large | Until degradation |

Compare AuraDB Professional vs self-managed EC2 Neo4j on latency, throughput, and cost.

  1. Create AuraDB Professional instance (us-east-1, 4GB RAM)
  2. Load 100K users (same dataset as EC2 experiment)
  3. Run identical benchmark query set
  4. Scale to 500K, 1M if 100K passes
  5. Test vector support (AuraDB Professional supports vector indexes)
  • Cypher compatibility: Is our production Cypher 100% compatible? (MERGE, UNWIND, CASE WHEN, datetime, array properties)
  • Write throughput: How does Bolt-over-internet compare to Bolt-over-VPC?
  • Latency: Network hop to AuraDB vs local VPC EC2
  • Cost: AuraDB pricing vs EC2 + ops overhead
  • Vector support: Same HNSW API as Community Edition?

Same Neo4j Go driver, different connection string:

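// EC2: Bolt routing inside the VPC (error handling omitted for brevity)
ec2Driver, _ := neo4j.NewDriverWithContext("neo4j://10.4.19.205:7687", auth)
// AuraDB: same driver; the neo4j+s scheme adds TLS
auraDriver, _ := neo4j.NewDriverWithContext("neo4j+s://xxxxx.databases.neo4j.io", auth)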

Evaluate Neptune Analytics as a unified graph + vector database, replacing both Neo4j and a separate vector store.

  1. Create Neptune Analytics graph (128 m-NCUs, 1024-dim vector config)
  2. Load data via openCypher endpoint or Bulk Load from S3
  3. Test Cypher compatibility (Neptune uses openCypher, not full Cypher)
  4. Run benchmark queries (translate where needed)
  5. Test vector search with neptune.algo.vectors.topKByNode() and topKByEmbedding()
  6. Measure combined graph traversal + vector similarity queries

Neptune’s openCypher has known gaps vs Neo4j Cypher:

| Feature | Neo4j | Neptune | Risk |
|---|---|---|---|
| MERGE with ON CREATE/ON MATCH | Yes | Yes | Low |
| UNWIND | Yes | Yes | Low |
| CASE WHEN in SET | Yes | Partial | Medium: test the receipt_ids dedup pattern |
| datetime() | Yes | timestamp() | High: different function names |
| Array properties | Yes | Yes (with caveats) | Medium: test append operations |
| CALL procedures | Yes (APOC, vector) | Different API | High: vector queries use different syntax |
| CREATE VECTOR INDEX | Yes (Neo4j 5.x) | Not applicable | N/A: vectors are built in differently |

// Neptune Analytics vector similarity (different from Neo4j)
CALL neptune.algo.vectors.topKByEmbedding($queryVector, {topK: 10})
YIELD node, score
RETURN node.product_id, score

Neptune supports bulk loading from S3 (CSV format with specific headers):

# Upload generated CSV to S3
aws s3 cp data/100k/ s3://experiment-bucket/neptune/100k/ --recursive
# Start bulk load
# --graph-identifier takes the graph ID (g-...) returned by create-graph
aws neptune-graph start-import-task \
  --graph-identifier consumer-graph-experiment \
  --source s3://experiment-bucket/neptune/100k/ \
  --role-arn arn:aws:iam::<account-id>:role/NeptuneBulkLoadRole \
  --format OPEN_CYPHER

8. Experiment 4: Vector Storage Comparison

Compare vector storage options for kNN similarity search at scale. Find the best option for our latency, cost, and operational requirements.

| Backend | Setup | kNN API |
|---|---|---|
| Neo4j HNSW | Vector index on experiment EC2 | db.index.vector.queryNodes() |
| Neptune Analytics | Built into Neptune graph | neptune.algo.vectors.topKByEmbedding() |
| Valkey VSS | FT.CREATE with HNSW, FT.SEARCH with KNN | FT.SEARCH idx "*=>[KNN k @vec $blob]" |
| OpenSearch Serverless | Vector search collection | knn query via REST API |

For each backend, at 100K and 1M vectors (1024-dim):

  1. Insert throughput: vectors/sec for bulk load
  2. kNN latency: k=10, k=50, k=100 — p50, p95, p99
  3. Recall@10: Against brute-force exact results (measures HNSW approximation quality)
  4. Combined query: kNN candidates → graph enrichment (two-hop) — end-to-end latency
  5. Memory usage: Index size in RAM
  6. Cost: $/month at steady state
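
Recall@k against brute-force results (item 3) can be computed as below. This is a sketch with toy two-dimensional vectors; `exact_knn` and `recall_at_k` are hypothetical names, and the "approximate" result is hand-written rather than produced by a real HNSW index.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_knn(query, vectors, k):
    """Brute-force top-k product IDs by cosine similarity (the ground truth)."""
    ranked = sorted(vectors, key=lambda pid: cosine(query, vectors[pid]), reverse=True)
    return ranked[:k]

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate index also returned."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        raise ValueError("exact result list is empty")
    hits = sum(1 for pid in approx_ids[:k] if pid in exact_top)
    return hits / len(exact_top)

vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
truth = exact_knn([1.0, 0.0], vectors, k=2)
recall = recall_at_k(["a", "c"], truth, k=2)   # pretend the index returned a and c
```

Brute force is quadratic in practice (every query scans every vector), so at 1M vectors the ground truth is probably best computed once for a fixed sample of queries and cached.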

The production use case is:

  1. Vector search → Find 50 candidate products similar to user’s embedding
  2. Graph filter → Remove products the user already purchased
  3. Graph enrich → Get category, brand, retailer info for remaining candidates
  4. Graph rerank → Boost candidates purchased by users in the same community

This requires either:

  • Single-engine (Neptune Analytics, Neo4j native vectors): One query does it all
  • Two-engine (Neo4j + Valkey, Neo4j + OpenSearch): Vector search → ID list → graph query

Measure end-to-end latency for both patterns.
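
The two-engine pattern can be sketched with two pluggable callables, one per engine, and a timer around the whole round trip. Everything here is a stand-in: the function names are hypothetical, and the fake search and lookup functions replace what would be a Valkey or OpenSearch kNN call and a Neo4j query. Only steps 1 and 2 of the use case are covered; enrichment and reranking are omitted.

```python
import time

def two_engine_recommend(user_id, user_vector, vector_search, fetch_purchased,
                         k=50, limit=10):
    """Vector search -> ID list -> graph filter, timed end to end."""
    started = time.perf_counter()
    candidates = vector_search(user_vector, k)   # engine 1: [(product_id, score)], best first
    purchased = fetch_purchased(user_id)         # engine 2: set of product_ids
    results = [(pid, score) for pid, score in candidates if pid not in purchased]
    elapsed_ms = (time.perf_counter() - started) * 1000
    return results[:limit], elapsed_ms

def fake_vector_search(vector, k):
    """Stand-in for the vector store's kNN call."""
    ranked = [("p1", 0.97), ("p2", 0.95), ("p3", 0.90), ("p4", 0.72)]
    return ranked[:k]

def fake_fetch_purchased(user_id):
    """Stand-in for a graph query returning the user's purchased product IDs."""
    return {"u1": {"p2"}}.get(user_id, set())

results, elapsed_ms = two_engine_recommend(
    "u1", [0.1, 0.2], fake_vector_search, fake_fetch_purchased, k=50, limit=2
)
```

Timing the whole function rather than each call separately captures the extra network hop that the single-engine pattern avoids, which is the comparison this experiment is after.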


{
"experiment": "ec2-neo4j-stress",
"checkpoint": "C3",
"users": 100000,
"timestamp": "2026-02-20T14:30:00Z",
"backend": {
"type": "neo4j-ec2",
"instance": "r6i.xlarge",
"memory_gb": 32,
"version": "5.15.0"
},
"data": {
"total_nodes": 180000,
"total_relationships": 1570000,
"store_size_mb": 430,
"vector_index_size_mb": 0
},
"write_metrics": {
"load_duration_sec": 120,
"nodes_per_sec": 1500,
"rels_per_sec": 13000,
"batch_size": 1000
},
"read_metrics": {
"Q1_point_lookup": {"p50_ms": 2, "p95_ms": 5, "p99_ms": 12},
"Q2_2hop_category": {"p50_ms": 5, "p95_ms": 15, "p99_ms": 30},
"Q3_community_rec": {"p50_ms": 50, "p95_ms": 120, "p99_ms": 250},
"Q4_full_profile": {"p50_ms": 3, "p95_ms": 8, "p99_ms": 20},
"Q5_global_agg": {"p50_ms": 200, "p95_ms": 500, "p99_ms": 1000},
"Q6_vector_knn": {"p50_ms": 0, "p95_ms": 0, "p99_ms": 0},
"Q7_graph_plus_vector": {"p50_ms": 0, "p95_ms": 0, "p99_ms": 0}
},
"system_metrics": {
"page_cache_hit_ratio": 0.99,
"heap_used_mb": 3200,
"heap_max_mb": 8192,
"cpu_percent": 15,
"disk_iops": 120
},
"cost": {
"instance_hourly": 0.252,
"estimated_monthly": 181
}
}

Results are committed to results/ in the experiment repo for historical comparison.
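
A result record in this shape can be checked against the pass/fail thresholds of the EC2 stress test with a few lines of code. The sketch below is hypothetical: in particular, the plan does not say which queries the p95 < 100ms limit applies to, so the gated set is passed in as a parameter.

```python
def check_thresholds(result, gated_queries, max_p95_ms=100,
                     min_nodes_per_sec=100, min_cache_hit=0.95):
    """Return a list of threshold violations; an empty list means the checkpoint passed."""
    failures = []
    for name in gated_queries:
        p95 = result["read_metrics"][name]["p95_ms"]
        if p95 >= max_p95_ms:
            failures.append(f"{name}: p95 {p95}ms is not below {max_p95_ms}ms")
    if result["write_metrics"]["nodes_per_sec"] <= min_nodes_per_sec:
        failures.append("write throughput is not above the minimum")
    if result["system_metrics"]["page_cache_hit_ratio"] <= min_cache_hit:
        failures.append("page cache hit ratio is not above the minimum")
    return failures

# Trimmed copy of the example record above.
record = {
    "read_metrics": {
        "Q1_point_lookup": {"p95_ms": 5},
        "Q3_community_rec": {"p95_ms": 120},
    },
    "write_metrics": {"nodes_per_sec": 1500},
    "system_metrics": {"page_cache_hit_ratio": 0.99},
}

point_only = check_thresholds(record, ["Q1_point_lookup"])
with_community = check_thresholds(record, ["Q1_point_lookup", "Q3_community_rec"])
```

In the example record, Q3 and Q5 already exceed 100ms at 100K users, so whether they count toward the limit decides where the "breaking point" falls. That choice is worth settling before the runs start.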


Don’t load all data at once. Load in checkpoints so we can measure the system at each scale.

Checkpoint C1: 10K users  ──┐
                            ├── Benchmark suite
Checkpoint C2: 50K users  ──┤   (runs at each checkpoint)
                            ├──
Checkpoint C3: 100K users ──┤
                            ├──
Checkpoint C4: 250K users ──┤
                            ├──
Checkpoint C5: 500K users ──┤
                            ├──
Checkpoint C6: 1M users   ──┤
                            ├──
Checkpoint C7: 2M users   ──┘   (EC2 may OOM here with vectors)

  1. Pre-generate all data using datagen (deterministic seed, so regeneration is identical)
  2. Load in batches of 1,000 users — each batch includes the user + all their relationships
  3. Use MERGE (same as production) — allows re-running without duplicates
  4. After each checkpoint: Run full benchmark suite, save results, continue loading
  5. If backend crashes/degrades: Record the failure point, stop loading, note as the limit

| Backend | Protocol | Batch Method |
|---|---|---|
| EC2 Neo4j | Bolt (Go driver) | UNWIND + MERGE (same as production writer) |
| AuraDB | Bolt (Go driver) | Same UNWIND + MERGE |
| Neptune Analytics | openCypher HTTPS or S3 bulk load | Bulk load for initial, incremental MERGE for checkpoints |
| Valkey VSS | Redis protocol | FT.CREATE index, HSET for each vector |
| OpenSearch Serverless | REST API | _bulk API for batch indexing |
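
The loading procedure can be sketched as a loop that loads fixed-size batches, runs the benchmark whenever a checkpoint is reached, and stops at the first checkpoint that fails. This is an illustration only: `load_batch` and `run_benchmark` are hypothetical callbacks standing in for the backend-specific loader and the benchmark suite.

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def load_with_checkpoints(user_ids, checkpoints, load_batch, run_benchmark,
                          batch_size=1000):
    """Return the first checkpoint whose benchmark fails, or None if all pass."""
    pending = sorted(checkpoints)
    loaded = 0
    for batch in batches(user_ids, batch_size):
        load_batch(batch)                 # user + all their relationships
        loaded += len(batch)
        while pending and loaded >= pending[0]:
            checkpoint = pending.pop(0)
            if not run_benchmark(checkpoint):
                return checkpoint         # record as the limit and stop loading
    return None

batch_sizes = []
benchmarked = []
failed_at = load_with_checkpoints(
    user_ids=list(range(50_000)),
    checkpoints=[10_000, 50_000],
    load_batch=lambda batch: batch_sizes.append(len(batch)),
    run_benchmark=lambda checkpoint: benchmarked.append(checkpoint) or True,
)
```

Because the loader uses MERGE, a run that stops partway can be resumed by replaying from the last completed checkpoint without creating duplicates.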

A. Neo4j Native Vectors (HNSW)

// Create vector index
CREATE VECTOR INDEX `product-embedding-index`
FOR (p:Product) ON (p.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 512,
  `vector.similarity_function`: 'cosine'
}}
// Query
CALL db.index.vector.queryNodes('product-embedding-index', 10, $queryVector)
YIELD node, score
RETURN node.product_id, score

Pros: Single query engine, simplest architecture
Cons: HNSW index must fit in RAM, scaling requires bigger instance

B. Neptune Analytics Vectors

// Vectors are attached to nodes with the upsert procedure rather than stored as ordinary properties
MATCH (p:Product {product_id: $id})
CALL neptune.algo.vectors.upsert(p, $vector)
YIELD success
RETURN success
// Query (Neptune-specific syntax)
CALL neptune.algo.vectors.topKByEmbedding($queryVector, {topK: 10})
YIELD node, score
RETURN node.product_id, score

Pros: Graph + vector in one engine, no instances to manage
Cons: Different query syntax, potential Cypher compatibility gaps

C. Valkey VSS (Vector Similarity Search)

# Create index
FT.CREATE product-idx ON HASH PREFIX 1 "product:" \
SCHEMA embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1024 DISTANCE_METRIC COSINE
# Insert
HSET product:abc123 embedding <binary_vector> name "Product Name" category "Dairy"
# kNN search
FT.SEARCH product-idx "*=>[KNN 10 @embedding $query_vec AS score]" \
PARAMS 2 query_vec <binary_vector> \
RETURN 2 name score \
SORTBY score ASC

Pros: Sub-millisecond latency, reuses existing Valkey infrastructure
Cons: Two-engine pattern (Valkey kNN → Neo4j graph enrichment), no graph traversal in vector query

D. OpenSearch Serverless (Vector Search Collection)

// Create collection
{
"name": "product-vectors",
"type": "VECTORSEARCH"
}
// Index document
{
"product_id": "abc123",
"name": "Product Name",
"embedding": [0.1, 0.2, ...]
}
// kNN query
{
"query": {
"knn": {
"embedding": {
"vector": [0.1, 0.2, ...],
"k": 10
}
}
}
}

Pros: Fully managed, scales independently, FAISS-backed
Cons: Higher latency (~10-50ms), separate service, two-engine pattern


  • Create graph-capacity-experiments repo
  • Export real product catalog from prod Neo4j (~4.5K products) via scripts/export_catalog.sh
  • (Optional) Export larger catalog from Snowflake (~500K products) via scripts/export_snowflake_catalog.py
  • Generate product embeddings for all three models (512-dim, 1024-dim, 384-dim) via scripts/embed.py
  • Implement datagen tool (synthetic users + relationships using real catalog)
  • Implement loader tool (Neo4j Bolt backend first)
  • Implement benchmark tool (query runner + metrics capture)
  • Deploy experiment EC2 Neo4j instance (stage account)

Phase 2: EC2 Neo4j Stress Test (Graph Only)

  • Generate datasets: 10K → 2M users (no vectors)
  • Run incremental load + benchmark (run R1)
  • Find the graph-only breaking point
  • Document results

  • Create Neo4j vector indexes on experiment instance
  • Run incremental load + benchmark with 512-dim (run R2)
  • Run with 384-dim sentence-transformer (run R3)
  • Run with 1024-dim on r6i.xlarge (run R4); expect earlier OOM than 512-dim
  • Test with larger instance r6i.2xlarge: 512-dim (run R5) and 1024-dim (run R6)
  • Document vector overhead and breaking points per dimension

  • Create AuraDB Professional instance
  • Add AuraDB connection to loader
  • Run 100K → 1M user benchmarks
  • Compare with EC2 results
  • Document cost model

  • Create Neptune Analytics graph
  • Add Neptune openCypher loader
  • Test Cypher compatibility (especially MERGE + CASE WHEN patterns)
  • Run benchmarks (graph + vector combined)
  • Document compatibility gaps and performance

  • Set up Valkey VSS + OpenSearch Serverless
  • Load same vector dataset to all four backends
  • Run kNN benchmarks
  • Run combined graph+vector end-to-end benchmarks
  • Produce comparison matrix

  • Compile all results into a recommendation document
  • Cost projections at 100K, 1M, 5M, 10M users
  • Recommended architecture per scale tier
  • Migration effort estimate for each option

All experiment infrastructure should be tagged with purpose: capacity-testing and torn down after experiments:

  • EC2 Neo4j experiment instance
  • AuraDB Professional instance
  • Neptune Analytics graph
  • OpenSearch Serverless collection
  • Any experiment Valkey nodes
  • S3 buckets with generated data

| Resource | Duration | Cost |
|---|---|---|
| EC2 r6i.xlarge (experiment Neo4j) | 30 days | ~$180 |
| EC2 r6i.2xlarge (upgrade test) | 7 days | ~$85 |
| AuraDB Professional (4GB) | 30 days | ~$130 |
| Neptune Analytics (128 m-NCUs) | 30 days | ~$100-200 (usage-based) |
| OpenSearch Serverless (2 OCUs) | 7 days | ~$60 |
| S3 storage for datasets | 30 days | ~$5 |
| Total estimated | | ~$560-660 |
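
The EC2 line and the $/user/month figure the cost model calls for both come down to simple arithmetic. The sketch below uses the r6i.xlarge hourly rate quoted earlier in this plan; the function names are hypothetical, and the per-user figure assumes a single instance holds all 100K users, which is exactly what the stress test is meant to confirm.

```python
HOURS_PER_DAY = 24

def instance_cost(hourly_rate, days):
    """On-demand cost of one instance running continuously for `days`."""
    return hourly_rate * HOURS_PER_DAY * days

def cost_per_user_month(monthly_cost, users):
    """Monthly infrastructure cost divided across the user base."""
    return monthly_cost / users

ec2_month = instance_cost(0.252, days=30)            # r6i.xlarge, 30 days
per_user_100k = cost_per_user_month(ec2_month, 100_000)
```

The same two functions can be reused for each backend once the benchmark results show how many users a given instance size or memory tier can actually hold.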

Go (datagen, loader, benchmark):

// go.mod for graph-capacity-experiments
github.com/neo4j/neo4j-go-driver/v5 // Neo4j + AuraDB
github.com/aws/aws-sdk-go-v2 // Neptune, OpenSearch, S3
github.com/redis/go-redis/v9 // Valkey VSS
golang.org/x/time/rate // Rate limiting for loaders
gonum.org/v1/gonum // Statistical distributions

Python (embedding generation + catalog export):

requirements.txt
openai>=1.0 # OpenAI text-embedding-3-small/large
sentence-transformers>=2.0 # all-MiniLM-L6-v2 (384-dim, local)
numpy>=1.24 # Vector I/O (.npy format)
pandas>=2.0 # CSV reading
snowflake-connector-python>=3.0 # Snowflake catalog export (optional)