Visual search lets shoppers upload a photo and instantly find matching or similar products in your catalog. In 2026 it is table stakes for fashion, home decor, and beauty retailers. This guide walks through each layer of the stack, from training data to production serving.
Architecture Overview
- Image encoder — produces a fixed-length embedding vector for any input image
- ANN index — approximate nearest-neighbour search over your product catalog embeddings
- Re-ranker (optional) — a second-stage model that re-scores only the top-K results
- Product catalog sync — keeps embeddings fresh as SKUs are added/removed
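The catalog sync component can be sketched as a diff between the live catalog and what the index already holds. This is a minimal illustration, not a prescribed design: the function name `plan_sync`, the `SyncPlan` type, and the idea of keying each SKU by an image content hash are all assumptions made for the example.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    to_embed: List[str]   # new SKUs, or SKUs whose image changed
    to_delete: List[str]  # SKUs that are no longer in the catalog


def plan_sync(catalog: Dict[str, str], indexed: Dict[str, str]) -> SyncPlan:
    """Compare the catalog against the index.

    Both arguments map SKU -> image content hash (a hypothetical schema).
    """
    to_embed = sorted(
        sku for sku, img_hash in catalog.items()
        if indexed.get(sku) != img_hash
    )
    to_delete = sorted(sku for sku in indexed if sku not in catalog)
    return SyncPlan(to_embed, to_delete)


catalog = {"sku-1": "a1", "sku-2": "b2-new", "sku-4": "d4"}
indexed = {"sku-1": "a1", "sku-2": "b2", "sku-3": "c3"}
plan = plan_sync(catalog, indexed)
print(plan.to_embed)   # ['sku-2', 'sku-4']
print(plan.to_delete)  # ['sku-3']
```

Comparing hashes rather than SKU IDs alone means a product whose photo was replaced is re-embedded, while unchanged products are skipped.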
Step 1 — Choose (or Train) an Image Encoder
Off-the-shelf CLIP or DINOv2 works surprisingly well out of the box. Fine-tuning on domain-specific product images, however, typically improves top-10 recall by 8–15 percentage points. For fine-tuning you need a large, diverse visual search dataset that spans the same product categories your catalog covers.
The most common encoder sizes and trade-offs:
- ViT-B/16 — good balance of speed and accuracy; 512-dim embeddings in CLIP
- ViT-L/14 — higher accuracy, 3× slower inference
- EfficientNet-B4 — fast CPU inference, slightly lower recall
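Embedding width also drives storage cost, which is worth estimating before settling on an encoder. The sketch below computes the raw size of a float32 embedding matrix. The 768-dim figure is only an illustrative width for a larger encoder, so check your model's actual output size, and note that any index structure adds overhead on top of this number.

```python
def index_size_gb(num_skus: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw embedding matrix size in decimal gigabytes (float32 by default)."""
    return num_skus * dim * bytes_per_value / 1e9


print(index_size_gb(5_000_000, 512))  # 10.24
print(index_size_gb(5_000_000, 768))  # 15.36
```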
Step 2 — Build the ANN Index
Popular choices are FAISS (Meta), ScaNN (Google), and Qdrant. For catalogs under 5 million SKUs, FAISS with HNSW indexing is the simplest path to production. Larger catalogs benefit from a managed vector database.
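Whichever library you pick, an exact brute-force search over a small sample is a useful baseline: comparing its ranking with the ANN index's ranking shows how much recall the approximation costs. The following is a minimal standard-library sketch that assumes cosine similarity as the metric; `exact_top_k` and the toy catalog are hypothetical names and data, and it is far too slow for a full catalog.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def normalise(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def exact_top_k(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k catalog SKUs most similar to the query, best first."""
    q = normalise(query)
    # In practice, normalise catalog vectors once up front, not per query.
    scored = (
        (sum(a * b for a, b in zip(q, normalise(vec))), sku)
        for sku, vec in catalog.items()
    )
    return [(sku, score) for score, sku in heapq.nlargest(k, scored)]


toy_catalog = {
    "red-chair": [1.0, 0.0],
    "red-sofa": [0.8, 0.6],
    "blue-lamp": [0.0, 1.0],
}
top = exact_top_k([1.0, 0.1], toy_catalog, k=2)
print([sku for sku, _ in top])  # ['red-chair', 'red-sofa']
```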
Step 3 — Assemble Training Data
Your encoder fine-tuning dataset should reflect your product mix. If you sell furniture, clothing, and homeware, your training data must cover all three. Under-represented categories will drag down recall scores. Our ecommerce image dataset covers 25+ categories across 60+ retailers, making it a strong base for multi-category encoders.
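One way to check coverage is to compare each category's share of the training set with its share of the catalog. The sketch below is an illustration under assumptions: the helper names, the toy label counts, and the `min_ratio=0.5` threshold (flag a category when its training share is less than half its catalog share) are all hypothetical and should be tuned to your data.

```python
from collections import Counter
from typing import Dict, Iterable, List


def category_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of items that fall into each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def under_represented(
    catalog_labels: Iterable[str],
    train_labels: Iterable[str],
    min_ratio: float = 0.5,
) -> List[str]:
    """Categories whose training share is below min_ratio times their catalog share."""
    catalog = category_shares(catalog_labels)
    train = category_shares(train_labels)
    return sorted(
        cat for cat, share in catalog.items()
        if train.get(cat, 0.0) < min_ratio * share
    )


catalog_labels = ["furniture"] * 40 + ["clothing"] * 40 + ["homeware"] * 20
train_labels = ["furniture"] * 60 + ["clothing"] * 35 + ["homeware"] * 5
print(under_represented(catalog_labels, train_labels))  # ['homeware']
```

Here homeware makes up 20% of the catalog but only 5% of the training data, so it is flagged; a category missing from the training data entirely is flagged as well.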
Step 4 — Evaluate and Iterate
Measure Recall@K (typically K = 5 and K = 20) on a held-out query set. Common failure modes:
- Background leakage — the model latches onto the shooting style rather than the product
- Colour bias — over-weights colour at the expense of shape
- Category collapse — embeddings for different categories cluster together
Each issue is addressed differently at the data level — see the retail AI dataset guide for attribute-aware sampling strategies.
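The Recall@K metric from this step can be computed in a few lines. This sketch assumes one common definition, the share of queries with at least one relevant product in the top K; some teams instead measure the fraction of all relevant products retrieved, so confirm which definition your benchmarks use. The query IDs and SKUs are made-up illustration data.

```python
from typing import Dict, List, Set


def recall_at_k(
    results: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Share of queries with at least one relevant SKU in the top k results."""
    if not results:
        raise ValueError("no queries to evaluate")
    hits = sum(
        1 for query_id, ranked in results.items()
        if relevant.get(query_id, set()) & set(ranked[:k])
    )
    return hits / len(results)


# Ranked search results per held-out query, best match first.
results = {
    "q1": ["sku-7", "sku-2", "sku-9"],
    "q2": ["sku-4", "sku-1", "sku-3"],
    "q3": ["sku-8", "sku-6", "sku-5"],
    "q4": ["sku-2", "sku-3", "sku-1"],
}
# Ground-truth matches per query.
relevant = {"q1": {"sku-7"}, "q2": {"sku-3"}, "q3": {"sku-0"}, "q4": {"sku-3"}}

print(recall_at_k(results, relevant, k=1))  # 0.25
print(recall_at_k(results, relevant, k=3))  # 0.75
```

Breaking the same score down by product category is a quick way to spot the failure modes listed above, since a collapsed or under-trained category tends to show up as a sharp drop in its per-category recall.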
Next Steps
Download a free sample dataset to start benchmarking your encoder today, or request a custom image bundle matched to your vertical.