How Google Spanner Powers Trillions of Rows with 5 Nines Availability

Google Cloud Spanner is a globally distributed, strongly consistent relational database that combines SQL semantics with the horizontal scalability usually associated with NoSQL systems. It achieves 99.999% availability through Paxos-based replication, TrueTime clock synchronization, and dynamic sharding, which lets it handle trillions of rows across multiple regions. Spanner’s architecture builds on Google’s infrastructure (e.g., the Colossus file system) for fault tolerance and low-latency access.
Core Technical Concepts/Technologies
- Globally Distributed Architecture: Multi-zone/region deployment.
- Paxos Consensus Algorithm: Ensures consistency during replication.
- TrueTime API: GPS receivers and atomic clocks provide global timestamps with bounded uncertainty.
- Dynamic Sharding (Splits): Auto-resizing partitions for load balancing.
- Colossus File System: Fault-tolerant, distributed storage.
- Multi-Version Concurrency Control (MVCC): Timestamped data versions.
- Two-Phase Commit: For cross-split transactions.
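
As a rough illustration of how timestamped versions (MVCC) and TrueTime-style uncertainty intervals fit together, here is a minimal Python sketch. It is not Spanner's implementation: `SimulatedTrueTime`, `VersionedStore`, and the 2 ms error bound `epsilon` are hypothetical names and values chosen for this example, and a local monotonic clock stands in for GPS and atomic clocks. A write picks a commit timestamp at the upper end of the uncertainty interval, then waits until that timestamp is certainly in the past ("commit wait") before becoming visible. A read at a given timestamp returns the newest version at or before it.

```python
import bisect
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """Time is reported as an interval: the true time lies in [earliest, latest]."""
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Hypothetical stand-in for TrueTime: a local clock plus a fixed error bound."""

    def __init__(self, epsilon: float = 0.002):
        self.epsilon = epsilon  # assumed clock uncertainty, in seconds

    def now(self) -> TTInterval:
        # A monotonic local clock is used here; real TrueTime reports absolute time.
        t = time.monotonic()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, ts: float) -> bool:
        """True once ts is guaranteed to be in the past."""
        return self.now().earliest > ts


class VersionedStore:
    """Toy MVCC store: each key keeps a list of (commit_ts, value) versions."""

    def __init__(self, tt: SimulatedTrueTime):
        self.tt = tt
        self._versions = {}  # key -> list of (commit_ts, value), oldest first

    def commit(self, key, value) -> float:
        # Choose a timestamp no earlier than the true current time.
        commit_ts = self.tt.now().latest
        # Commit wait: block until commit_ts has definitely passed (about 2 * epsilon).
        while not self.tt.after(commit_ts):
            time.sleep(self.tt.epsilon / 4)
        # Only now does the new version become visible to readers.
        self._versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts

    def read(self, key, read_ts: float):
        """Snapshot read: newest value with commit_ts <= read_ts, or None."""
        versions = self._versions.get(key, [])
        timestamps = [ts for ts, _ in versions]
        idx = bisect.bisect_right(timestamps, read_ts)
        return versions[idx - 1][1] if idx else None


if __name__ == "__main__":
    store = VersionedStore(SimulatedTrueTime())
    ts1 = store.commit("user:1", "alice")
    ts2 = store.commit("user:1", "alice-renamed")
    print(store.read("user:1", ts1))  # value as of the first commit
    print(store.read("user:1", ts2))  # value as of the second commit
```

Because each commit waits out the uncertainty window, a later commit in this sketch always receives a larger timestamp than an earlier one, which is the property that lets timestamp order match real-time order.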
Main Points
- Architecture:
  - Organized into universes (logical) and zones (physical), with spanservers managing data.
  - Data is stored as tablets (key-value pairs with timestamps) in Colossus.
- Replication & Consistency:
This article was originally published on ByteByteGo