How Meta Achieves 99.99999999% Cache Consistency 🎯

Executive Summary
The article explores cache consistency challenges in distributed systems, detailing common strategies like write-through, write-around, and write-back caches. It emphasizes trade-offs between performance and data freshness, introduces coherence protocols (e.g., MESI), and discusses real-world implementations (e.g., CDNs, databases). The piece concludes with best practices for balancing latency and consistency.
Core Technical Concepts/Technologies
- Cache consistency models (strong, eventual)
- Write strategies: write-through, write-around, write-back
- Cache invalidation techniques (TTL, polling, push-based)
- Coherence protocols (MESI, snooping)
- Distributed systems challenges (network partitions, latency)
Main Points
- Cache Consistency Challenges:
  - Stale data risks due to replication delays or failures.
  - Trade-offs between low latency (weak consistency) and accuracy (strong consistency).
- Write Strategies:
  - Write-through: Writes to cache and DB simultaneously; high consistency but slower.
  - Write-around: Writes bypass cache to DB; avoids cache pollution but may cause misses.
  - Write-back: Writes to cache first, DB later; high performance but risk of data loss.
- Invalidation Techniques:
  - TTL: Simple but may serve stale data until expiration.
  - Polling: Periodic checks for updates (e.g., conditional requests with HTTP ETags).
  - Push-based: Immediate invalidation via pub/sub (e.g., Redis Pub/Sub).
- Coherence Protocols:
  - MESI (Modified, Exclusive, Shared, Invalid): CPU-level cache synchronization.
  - Snooping: Broadcasts changes across caches (scalability limits).
- Real-World Implementations:
  - CDNs use edge caches with TTL for static content.
  - In-memory stores like Redis are often used as the cache tier, for example in a write-back setup with invalidation hooks.
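
The three write strategies above can be sketched in a few lines. This is a minimal illustration, not a production design: the Cache class, its method names, and the dict standing in for the database are all hypothetical. Dropping the cached copy in write_around is one possible choice to avoid serving the old value; a stricter reading of write-around would leave the cache untouched.

```python
class Cache:
    """Toy cache in front of a dict-backed 'database' (both hypothetical)."""

    def __init__(self, db):
        self.db = db        # backing store, stands in for a real database
        self.data = {}      # cached entries
        self.dirty = set()  # keys written to the cache but not yet to the db

    def read(self, key):
        # On a miss, load the value from the backing store and cache it.
        if key not in self.data:
            self.data[key] = self.db[key]
        return self.data[key]

    def write_through(self, key, value):
        # Update the db and the cache in the same operation.
        self.db[key] = value
        self.data[key] = value

    def write_around(self, key, value):
        # Write only to the db; drop any cached copy so the next read misses.
        self.db[key] = value
        self.data.pop(key, None)

    def write_back(self, key, value):
        # Write only to the cache and remember the key needs flushing.
        self.data[key] = value
        self.dirty.add(key)

    def flush(self):
        # Persist deferred writes; anything unflushed is lost if the cache dies.
        for key in self.dirty:
            self.db[key] = self.data[key]
        self.dirty.clear()
```

Between write_back and flush the database does not yet hold the new value, which is the data-loss window the write-back bullet refers to.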
Technical Specifications/Examples
- Code Snippet: Redis cache invalidation using PUBLISH to notify clients of key changes.
- TTL Configuration: Example: Cache-Control: max-age=3600 for 1-hour freshness.
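
As a rough sketch of how the two items above combine, the example below pairs a 1-hour TTL with push-based invalidation. It runs in a single process, so InvalidationBus is only a stand-in for a real pub/sub channel such as Redis PUBLISH/SUBSCRIBE; both class names and the injectable clock parameter are assumptions made for illustration.

```python
import time


class InvalidationBus:
    """In-process stand-in for a pub/sub channel (hypothetical)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, key):
        # Notify every subscribed cache that `key` has changed.
        for callback in self.subscribers:
            callback(key)


class TTLCache:
    """Cache whose entries expire after ttl_seconds or on a pushed invalidation."""

    def __init__(self, bus, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested without sleeping
        self.entries = {}    # key -> (value, expires_at)
        bus.subscribe(self.invalidate)

    def put(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # TTL elapsed: treat the entry as a miss.
            del self.entries[key]
            return None
        return value

    def invalidate(self, key):
        # Called by the bus when a writer announces a change.
        self.entries.pop(key, None)
```

The TTL acts as a safety net here: if an invalidation message is lost, the entry still expires after ttl_seconds.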
Key Takeaways
- Choose write strategies based on use case: Write-back for performance, write-through for critical data.
- Combine invalidation methods: TTL + push-based for balance between overhead and freshness.
- Monitor staleness: Metrics like hit rate vs. stale reads reveal consistency gaps.
- Leverage protocols: MESI optimizes multi-core systems; snooping suits small clusters.
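
One way to track the staleness metrics mentioned above is to compare sampled cache reads against the source of truth. The CacheStats class below is a hypothetical sketch: the field names and the convention that None means a miss are assumptions, and a real system would likely check only a sample of reads, since checking every read would defeat the purpose of caching.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0
    stale_reads: int = 0

    def record(self, cached_value, source_value):
        """Compare one cache lookup with the source of truth (None means a miss)."""
        if cached_value is None:
            self.misses += 1
            return
        self.hits += 1
        if cached_value != source_value:
            self.stale_reads += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    @property
    def stale_rate(self):
        # Fraction of hits that returned an outdated value.
        return self.stale_reads / self.hits if self.hits else 0.0
```

A high hit rate combined with a rising stale rate suggests the cache is serving traffic efficiently but invalidation is lagging.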
Limitations/Further Exploration
- Trade-offs: Strong consistency increases latency (CAP theorem constraints).
- Scalability: Push-based invalidation struggles with large-scale systems.
- Emerging solutions: Explore conflict-free replicated data types (CRDTs) for eventual consistency.
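
To make the CRDT pointer above concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class and replica names are illustrative only. Each replica increments its own slot, and merge takes the per-replica maximum, so replicas converge regardless of the order or number of merges.

```python
class GCounter:
    """Grow-only counter CRDT: one non-decreasing count per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count observed for that replica

    def increment(self, amount=1):
        # A replica only ever increases its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)
```

Because merging is idempotent, re-delivering the same state does not double-count, which is what lets replicas exchange state without coordination.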
#52: Break Into Meta Engineering (4 minutes)
This article was originally published on The System Design Newsletter