How DoorDash’s In-House Search Engine Achieved a 50% Drop in Latency

DoorDash built an in-house search engine to improve food discovery, addressing limitations of third-party solutions like Elasticsearch. The system combines an offline feature generation pipeline with real-time updates, using a two-phase retrieval and ranking approach for low-latency, high-relevance results. Key innovations include custom embeddings for semantic search and a hybrid architecture balancing freshness with performance.
Core Technical Concepts/Technologies
- Two-phase retrieval (candidate generation + ranking)
- Feature stores (offline/online)
- Embeddings (BERT-like models for semantic search)
- Hybrid architecture (batch + real-time updates)
- Query understanding (query rewriting, intent classification)
- Apache Flink (stream processing)
Main Points
- Motivation:
  - Third-party solutions lacked flexibility for food-specific ranking (e.g., dietary preferences, delivery time).
  - Needed sub-100ms latency at peak loads (1M+ QPS).
- Architecture:
  - Offline Pipeline: Precomputes store/meal features (popularity, pricing) using Spark.
  - Online Pipeline: Real-time updates (e.g., inventory changes) via Flink.
  - Feature Store: Syncs offline/online data for consistency (sketched below).
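
The article doesn't show the feature store's read path, so here is a minimal sketch, assuming a simple two-tier lookup in which fresher real-time values (as Flink would produce) override batch values (as a Spark job would produce). All class, field, and TTL choices are hypothetical:

```python
import time
from dataclasses import dataclass, field

@dataclass
class FeatureStore:
    """Two-tier feature store: batch features refreshed offline,
    overridden by fresher real-time values while they are still valid."""
    batch: dict = field(default_factory=dict)     # store_id -> {feature: value}
    realtime: dict = field(default_factory=dict)  # store_id -> {feature: (value, ts)}
    ttl_s: float = 300.0                          # assumed real-time expiry window

    def load_batch(self, store_id: str, features: dict) -> None:
        self.batch[store_id] = features

    def update_realtime(self, store_id: str, feature: str, value) -> None:
        self.realtime.setdefault(store_id, {})[feature] = (value, time.time())

    def get(self, store_id: str) -> dict:
        # Start from batch features, then apply non-expired real-time
        # overrides so online and offline views stay consistent.
        merged = dict(self.batch.get(store_id, {}))
        for name, (value, ts) in self.realtime.get(store_id, {}).items():
            if time.time() - ts < self.ttl_s:
                merged[name] = value
        return merged

fs = FeatureStore()
fs.load_batch("store_42", {"popularity": 0.87, "avg_price": 14.5})
fs.update_realtime("store_42", "in_stock_items", 113)  # inventory change event
print(fs.get("store_42"))
```
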
- Search Flow (see the sketch after this list):
  - Candidate Generation: Fast retrieval using inverted indexes (BM25) + embeddings.
  - Ranking: ML model (LightGBM) scores candidates using 100+ features (price, distance, etc.).
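
As a concrete illustration of the two-phase flow, here is a minimal, self-contained sketch: phase one retrieves candidates with a from-scratch Okapi BM25 over a tiny corpus, and phase two reranks them with a weighted feature sum standing in for the production LightGBM model. The corpus, weights, and feature values are made up:

```python
import math
from collections import Counter

docs = {
    "d1": "spicy chicken sandwich fast delivery",
    "d2": "vegan salad bowl healthy organic",
    "d3": "chicken tikka masala indian curry",
}
tokenized = {d: text.split() for d, text in docs.items()}
N = len(docs)
avgdl = sum(len(t) for t in tokenized.values()) / N
df = Counter(term for toks in tokenized.values() for term in set(toks))

def bm25(query_terms, doc_id, k1=1.5, b=0.75):
    """Okapi BM25 score of one document for a tokenized query."""
    toks = tokenized[doc_id]
    tf = Counter(toks)
    score = 0.0
    for t in query_terms:
        if t not in tf:
            continue
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
        score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(toks) / avgdl))
    return score

def retrieve(query, k=2):
    """Phase 1: cheap lexical retrieval of top-k candidates."""
    q = query.split()
    return sorted(docs, key=lambda d: bm25(q, d), reverse=True)[:k]

def rerank(candidates, features):
    """Phase 2: score candidates on richer features. A hand-set weighted
    sum stands in for the production LightGBM model."""
    w = {"bm25_like": 1.0, "popularity": 2.0, "eta_min": -0.05}
    return sorted(candidates,
                  key=lambda d: sum(w[f] * features[d][f] for f in w),
                  reverse=True)

features = {
    "d1": {"bm25_like": 1.2, "popularity": 0.9, "eta_min": 25},
    "d3": {"bm25_like": 1.0, "popularity": 0.6, "eta_min": 40},
}
print(rerank(retrieve("chicken delivery"), features))
```
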
- Embeddings:
  - Fine-tuned BERT models convert queries/store descriptions to vectors for semantic matching (see the sketch below).
  - Hybrid scoring combines BM25 (text) + cosine similarity (embeddings).
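
The post doesn't name DoorDash's exact model, so the sketch below uses sentence-transformers' off-the-shelf all-MiniLM-L6-v2 as a stand-in (it is also a distilled BERT-family model that emits 384-dimensional vectors, matching the spec below) to show the query/document matching path:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Stand-in for the fine-tuned DistilBERT variant described in the article.
model = SentenceTransformer("all-MiniLM-L6-v2")

store_descriptions = [
    "Authentic Thai noodles and curries",
    "Wood-fired pizza and Italian classics",
    "Fresh smoothies, acai bowls, and salads",
]
store_vecs = model.encode(store_descriptions, normalize_embeddings=True)

query_vec = model.encode(["healthy fruit bowl"], normalize_embeddings=True)[0]

# With L2-normalized vectors, cosine similarity reduces to a dot product.
sims = store_vecs @ query_vec
best = int(np.argmax(sims))
print(store_descriptions[best], float(sims[best]))
```
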
- Optimizations:
  - Cached embeddings for high-frequency queries (sketched below).
  - Sharded indexes to distribute load.
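
One way to implement the query-embedding cache; the article doesn't describe DoorDash's actual caching layer, and the `embed` stub here just hashes text to keep the sketch dependency-free:

```python
import hashlib
from functools import lru_cache

def embed(text: str) -> tuple:
    """Stand-in for a real model call; a hash substitutes for inference."""
    h = hashlib.sha256(text.encode()).digest()
    return tuple(b / 255 for b in h[:8])

@lru_cache(maxsize=100_000)
def cached_query_embedding(query: str) -> tuple:
    # Normalize first so trivially different spellings share a cache slot.
    return embed(" ".join(query.lower().split()))

cached_query_embedding("Vegan Pizza")
cached_query_embedding("vegan  pizza")  # cache hit after normalization
print(cached_query_embedding.cache_info())
```
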
Technical Specifications
- Latency: <50ms p99 for ranking phase.
- Scale: 10B+ documents indexed, 1M+ QPS during peaks.
- Embedding Model: DistilBERT variant with 384-dimensional vectors.
- Code Example: Hybrid scoring formula:

```
final_score = α * BM25(query, doc) + β * cosine_sim(embedding_query, embedding_doc)
```
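
A direct translation of the formula into Python (a sketch; the `alpha` and `beta` weights are illustrative, not DoorDash's values):

```python
import numpy as np

def hybrid_score(bm25_score: float,
                 query_emb: np.ndarray,
                 doc_emb: np.ndarray,
                 alpha: float = 0.6,
                 beta: float = 0.4) -> float:
    """final_score = alpha * BM25 + beta * cosine_sim, per the formula above."""
    cosine = float(np.dot(query_emb, doc_emb)
                   / (np.linalg.norm(query_emb) * np.linalg.norm(doc_emb)))
    return alpha * bm25_score + beta * cosine
```

Note that raw BM25 scores are unbounded while cosine similarity lies in [-1, 1], so in practice the two signals would need normalization before mixing; the article doesn't cover that detail.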
Key Takeaways
- Domain-specific tuning (e.g., food preferences) often justifies building over buying.
- Hybrid retrieval (lexical + semantic) improves recall without sacrificing latency.
- Feature consistency between batch/real-time pipelines is critical for relevance.
- Embedding caching reduces computational overhead for common queries.
- Sharding enables horizontal scaling for high QPS.
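
For the sharding point, a minimal sketch of hash-based routing with scatter-gather querying (the shard count and routing scheme are assumptions; the article doesn't specify DoorDash's):

```python
import heapq
import zlib

NUM_SHARDS = 8
shards = [dict() for _ in range(NUM_SHARDS)]  # shard -> {doc_id: doc}

def shard_for(doc_id: str) -> int:
    # Stable hash so the same document always routes to the same shard.
    return zlib.crc32(doc_id.encode()) % NUM_SHARDS

def index(doc_id: str, doc: dict) -> None:
    shards[shard_for(doc_id)][doc_id] = doc

def search(score, k: int = 10):
    """Scatter the query to every shard, gather per-shard top-k, merge."""
    per_shard = (heapq.nlargest(k, s.items(), key=lambda kv: score(kv[1]))
                 for s in shards)
    merged = [kv for part in per_shard for kv in part]
    return heapq.nlargest(k, merged, key=lambda kv: score(kv[1]))

for i in range(100):
    index(f"store_{i}", {"popularity": (i * 37) % 100})
print(search(lambda doc: doc["popularity"], k=3))
```
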
Limitations & Future Work
- Cold starts: New stores/meals lack historical data for ranking.
- Geo-distribution: Challenges in maintaining low latency globally.
- Exploration: Testing LLMs for query understanding (e.g., "healthy breakfast under $10").
This article was originally published on ByteByteGo