Fuzzy-Match Product Exclusion Classification — Test Results

This classifier determines whether a product should be excluded from a Button merchant’s enrichment, based on the merchant’s exclusion terms (e.g., Walmart excludes “Grocery, Food & Beverages”, “Beauty”, “Pharmacy”). Today these terms are displayed as T&C text but never enforced programmatically.

The classifier uses fuzzy string matching (RapidFuzz WRatio) to compare product attributes (category, name, brand) against a merchant’s exclusion list. It’s designed to err on the side of letting products through — we’d rather show a product that should be excluded than block one that shouldn’t be.

Product: Dove Body Wash
  Category: "PERSONAL CARE"      ──┐
  Name:     "Dove Body Wash..."  ──┼── fuzzy match against ──> Walmart exclusions:
  Brand:    "DOVE"               ──┘       "Grocery, Food & Beverages"
                                           "Personal Care (Bath, Body, Oral & Feminine)"  ← match (90.0)
                                           "Beauty"
                                           "Baby (Bedding, Bags, Diapers & Hardlines)"
                                           ...
Result: EXCLUDED (category matched "Personal Care" at score 90/100)

Matching pipeline:

  1. Extract top-level product category (before first | or , delimiter)
  2. Strip domain-specific stopwords from both product attributes and exclusion terms (“all”, “select”, “products”, “orders”, “over”, etc.)
  3. Skip queries shorter than 4 characters (too noisy)
  4. Send category, name, and brand to fuzzy matcher — first match wins (category checked first for highest precision)
  5. Threshold: 80/100 similarity score (tuned via testing)
  6. Results cached to disk with 30-day TTL
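Steps 1 to 5 above can be sketched as follows. This is a minimal illustration, not the production code: the names `top_level_category`, `normalize`, `window_score` and `classify` are hypothetical, the stopword set contains only the words this page names, and `window_score` is a standard-library stand-in scorer built on `difflib`, not RapidFuzz WRatio, so its scores will differ from the numbers reported here.

```python
import re
from difflib import SequenceMatcher

# Only the stopwords this page names; the full production list is not shown.
STOPWORDS = {"all", "select", "products", "orders", "over"}
THRESHOLD = 80      # similarity cutoff on a 0-100 scale (step 5)
MIN_QUERY_LEN = 4   # shorter queries are skipped as too noisy (step 3)


def top_level_category(category):
    """Step 1: keep the text before the first '|' or ',' delimiter."""
    return re.split(r"[|,]", category, maxsplit=1)[0].strip()


def normalize(text):
    """Step 2: lowercase, drop punctuation, remove stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def window_score(query, term):
    """Stand-in scorer (NOT WRatio): best difflib ratio of the shorter
    string against every same-length window of the longer one, 0-100."""
    short, long_ = sorted((query, term), key=len)
    width = len(short)
    return 100 * max(
        SequenceMatcher(None, short, long_[i:i + width]).ratio()
        for i in range(len(long_) - width + 1)
    )


def classify(product, exclusion_terms, scorer=window_score):
    """Steps 3-5: check category, then name, then brand; first match wins."""
    terms = [(raw, normalize(raw)) for raw in exclusion_terms]
    queries = [
        ("category", top_level_category(product.get("category", ""))),
        ("name", product.get("name", "")),
        ("brand", product.get("brand", "")),
    ]
    for field, value in queries:
        query = normalize(value)
        if len(query) < MIN_QUERY_LEN:
            continue  # too short to match reliably
        # Terms that normalize to an empty string are skipped entirely.
        scored = [(scorer(query, norm), raw) for raw, norm in terms if norm]
        if not scored:
            continue
        score, raw = max(scored)
        if score >= THRESHOLD:
            return {"excluded": True, "field": field, "term": raw, "score": score}
    return {"excluded": False, "field": None, "term": None, "score": None}
```

Because the scorer is a parameter, passing `scorer=rapidfuzz.fuzz.WRatio` (with the `rapidfuzz` package installed) would swap in the scorer this page names while leaving the rest of the sketch unchanged.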

Key design decisions:

  • Category is checked first because it has the highest precision (98%+)
  • Errors in the fuzzy match sidecar return “not excluded” (non-blocking)

Test setup:

  • 10,000 products pulled from production Snowflake (FIDO catalog: real products across all categories)
  • 55 merchants from production Button service (with non-gift-card exclusions and a retailer ID)
  • 550,000 cross-tests (every product against every merchant)
  • Completed in 6.9 seconds

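Two operational details mentioned earlier, the 30-day disk cache from the matching pipeline and the non-blocking error handling from the key design decisions, can be sketched together. This is an assumption-laden illustration: the function name `cached_is_excluded`, the one-JSON-file-per-key layout, and the `lookup` callable standing in for the sidecar call are all hypothetical, and whether failed lookups are cached in production is not stated on this page (here they are not).

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days


def cached_is_excluded(key, lookup, cache_dir, now=None):
    """Return the cached verdict for `key`, otherwise call `lookup()`.

    Any exception raised by `lookup` yields False ("not excluded") and is
    not written to the cache, so a sidecar outage never blocks a product.
    """
    now = time.time() if now is None else now
    # Hash the key so arbitrary merchant/product ids are safe as file names.
    name = hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json"
    path = Path(cache_dir) / name

    if path.exists():
        entry = json.loads(path.read_text())
        if now - entry["stored_at"] < CACHE_TTL_SECONDS:
            return entry["excluded"]  # fresh cache hit

    try:
        excluded = bool(lookup())
    except Exception:
        return False  # fail open: treat errors as "not excluded"

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"excluded": excluded, "stored_at": now}))
    return excluded
```

Leaving failures out of the cache means a transient outage does not pin a product to “not excluded” for 30 days; the next request simply retries the lookup.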
| Metric | Value |
| --- | --- |
| Total exclusions | 11,388 / 550,000 (2.07%) |
| Match by category | 7,402 (65%) |
| Match by product name | 3,614 (32%) |
| Match by brand | 372 (3%) |

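As a quick check on how these figures relate, the totals and shares can be recomputed from the per-field counts. The variable names below are illustrative only; the numbers are the ones reported on this page.

```python
products, merchants = 10_000, 55
cross_tests = products * merchants  # every product against every merchant

matches_by_field = {"category": 7_402, "name": 3_614, "brand": 372}
total_exclusions = sum(matches_by_field.values())

# Overall exclusion rate, as a percentage of all cross-tests.
exclusion_rate = round(100 * total_exclusions / cross_tests, 2)

# Share of exclusions attributed to each matched field, in whole percent.
shares = {
    field: round(100 * count / total_exclusions)
    for field, count in matches_by_field.items()
}

print(cross_tests, total_exclusions, exclusion_rate, shares)
```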
| Merchant | Excluded / 10k | Exclusion Terms |
| --- | --- | --- |
| Walmart | 2,148 (21.5%) | 12 terms covering grocery, personal care, beauty, baby, pets, pharmacy, auto, video games |
| JCPenney | 1,052 (10.5%) | Furniture, Kitchen, Patio, Mattresses, TVs, Game consoles, Electronics, Appliances |
| Gap | 981 (9.8%) | Brand collaboration exclusions (Gap x DOEN, Gap x Palace, Gap x Love Shack Fancy) |
| Disney+ | 435 (4.4%) | Free trials, Premium Duo Bundle |
| eBay | 375 (3.8%) | Household cleaners, paper towels, travel, hand sanitizers, heavy equipment |
| Sam’s Club | 305 (3.1%) | Grocery & Food, Alcohol, Cough/Cold medicine, Restaurant Supplies |
| Walgreens | 297 (3.0%) | Grocery, Household products, Electronics, Diet & Fitness |
| Best Buy | 260 (2.6%) | Laptops/Tablets, Video Game Hardware, All Apple Products, POSA cards |
| Kohl’s | 176 (1.8%) | Kohl’s Cares, Apple/Amazon/Nintendo/Google products, Gaming, Crocs, Jewelry over $2k |
| Nike | 57 (0.6%) | Gift cards, Air Jordans, Converse, Hyperice |
| CVS | 36 (0.4%) | Gift cards, Discounts, COVID tests |

Merchants with 0 exclusions: Athleta, Belk, Children’s Place, Crocs (only “Limited edition collaborations”), e.l.f., Fanatics, JD Sports, Old Navy, PacSun, StockX, Temu, Zappos

Category matching (65% of exclusions) is highly accurate. These are clear, correct matches:

| Product Category | Exclusion Term | Merchant |
| --- | --- | --- |
| GROCERY | Grocery, Food & Beverages | Walmart |
| BEAUTY | Beauty | Walmart |
| DIAPERS | Baby (Bedding, Bags, Diapers & Hardlines) | Walmart, ACE |
| WINE | Wine | Naked Wines |
| AUTOMOTIVE | Automotive Tires | Walmart |
| FURNITURE | Furniture | JCPenney |
| ELECTRONICS | Electronics | Walgreens, Verizon |
| PET FOOD | Grocery, Food & Beverages | Home Depot |

Exact brand matches work perfectly:

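| Brand | Exclusion | Merchant | Score |
| --- | --- | --- | --- |
| BISSELL | Bissell | Kohl’s | 100 |
| CONVERSE | Converse | Nike | 100 |
| CROCS | Crocs | Kohl’s | 100 |
| REEBOK | Reebok items | Journeys | 100 |
| NIKE | Select Nike items | Finish Line | 100 |
| HP | Select HP products | Staples | 100 |
| APPLE | All Apple-Branded Products | Best Buy | 90 |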

Product name matching catches real exclusions that category alone misses:

| Product Name | Exclusion | Merchant |
| --- | --- | --- |
| “Charmin Ultra Strong Toilet Tissue…” | Paper towels and toilet paper | eBay |
| “Bounty Select-a-size Paper Towel…” | Paper towels and toilet paper | eBay |
| “Miami Baked Challah Bread” | Bread | Aldi |
| “Huggies Snug & Dry Baby Diapers…” | Baby (Bedding, Bags, Diapers) | Walmart |

Risk 1: Channel exclusions misread as product exclusions (medium impact)

~10% of exclusion terms describe how/where you buy, not what you buy. The fuzzy matcher can’t distinguish these:

| Exclusion Term | Merchant | Problem |
| --- | --- | --- |
| “Purchases made on BoohooMan (Men’s Clothing)” | Boohoo | Matches all men’s clothing products (3,331 false blocks). This exclusion means purchases on the BoohooMan website, not that men’s clothing is excluded. |
| “Scan and Go Orders” | Sam’s Club | Matches product names containing “Go” |
| “Orders placed on mobile web” | Uber | Matches product names containing “web” |
| “Bookings in Russia or Ukraine” | IHG | Not a product category at all |
| “Disney Premium Duo Bundle Upgrade” | Disney+ | Matches Disney-branded products |

Mitigation: Curate a denylist of channel-type exclusions to skip before fuzzy matching, or add a channel_exclusion flag to the Button data model.
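Both mitigation options can be sketched in a few lines. This is a hypothetical illustration: the `ExclusionTerm` shape, the `matchable_terms` helper and the denylist contents (seeded from the Risk 1 examples on this page) are assumptions, and `channel_exclusion` is the proposed flag, not a field that exists in the Button data model today.

```python
from dataclasses import dataclass

# Channel-type terms seeded from the Risk 1 examples; a real denylist
# would be curated and maintained by hand.
CHANNEL_DENYLIST = {
    "purchases made on boohooman (men's clothing)",
    "scan and go orders",
    "orders placed on mobile web",
    "bookings in russia or ukraine",
    "disney premium duo bundle upgrade",
}


@dataclass(frozen=True)
class ExclusionTerm:
    text: str
    channel_exclusion: bool = False  # proposed flag, hypothetical for now


def _key(text):
    """Normalize a term for denylist lookup (case, spacing, curly apostrophes)."""
    return text.replace("\u2019", "'").strip().lower()


def matchable_terms(terms):
    """Return only the terms that should reach the fuzzy matcher."""
    return [
        t.text
        for t in terms
        if not t.channel_exclusion and _key(t.text) not in CHANNEL_DENYLIST
    ]
```

Filtering before matching keeps the matcher itself unchanged; the denylist can be retired once the flag is available on every term.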

Risk 2: Fuzzy name matching has ~5-10% false positive rate

Product names are long, natural-language strings. Some false matches are unavoidable:

| Product Name | Falsely Matched To | Score | Why |
| --- | --- | --- | --- |
| “Crestlive Products 10 Ft Aluminum…” | “Apple, Amazon, Nintendo, or Google products” | 85 | Word “Products” (now mitigated by stopword) |
| “BALLOON DELIGHTS” | “Flights” (Expedia) | 83 | Suffix “-LIGHTS” ≈ “Flights” |
| “PAMPERS” | “Paper” (HP) | 83 | PAP substring |
| “GARMIN” | “Gaming” (Kohl’s) | 83 | GAMI substring |

Mitigation: The impact of these false positives is limited, because a product falsely excluded at one merchant still enriches normally at other merchants. The 80/100 threshold was tuned to minimize them. Category matching alone (65% of all matches) has a <2% FP rate.

Risk 3: Short or generic exclusion terms match too broadly

| Exclusion | Merchant | Issue |
| --- | --- | --- |
| “All non-grocery items” | Target | After stopword stripping, becomes “non-grocery”: better, but still vague |
| “Select SKUs” | Athleta, Belk, Fanatics, HSN, etc. | Eliminated by stopword stripping, but these merchants now get 0 exclusions, which may or may not be correct |
| “Discounts” | 8 merchants | Matches discount-related product names |

Guards in place:

| Guard | What It Does | Impact |
| --- | --- | --- |
| Top-level category only | Keeps only the top-level category and strips sub-categories (e.g. “ORAL CARE” from “ORAL CARE\|TOOTHPASTE\|ADULT”) | Eliminated sub-category FPs like “BEEF” matching “Beer”, “AB” matching “JLab or Beats” |
| Exclusion stopwords | Strips “all”, “select”, “products”, “orders”, “over”, etc. from both sides before matching | “Select SKUs” → 0 FPs, “All non-grocery” 84 → 2, “Apple/Amazon/Nintendo/Google products” → 0 |
| Min query length (4 chars) | Skips brands/names shorter than 4 characters | Eliminated 122 noisy short brands (ON, SO, GE, HP) from matching |
| Threshold 80/100 | Requires high similarity score | Prevents weak partial matches while allowing category-level matches |