How Uber Built Odin to Handle 3.8 Million Containers
Uber developed Odin, an automated, technology-agnostic platform for managing 3.8 million containers and 300,000 stateful workloads across 100,000+ hosts. Odin replaced manual database management with declarative automation, self-healing remediation loops, and dynamic resource scheduling, enabling exabyte-scale storage management for services like ride-hailing and payment processing. Key innovations include make-before-break migrations, colocated databases, and a global coordination system for fault tolerance.
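As a rough illustration of the make-before-break idea, the sketch below moves one replica of a workload to a new host by bringing up the replacement first and removing the old replica only once the new one reports healthy. The `Replica` and `Cluster` classes, the host names, and the `start_ok` flag are hypothetical and exist only for this example; they are not Odin's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    host: str
    healthy: bool = False


@dataclass
class Cluster:
    """Hypothetical in-memory model of one workload's replicas."""
    replicas: list[Replica] = field(default_factory=list)

    def healthy_count(self) -> int:
        return sum(r.healthy for r in self.replicas)


def migrate(cluster: Cluster, old_host: str, new_host: str,
            start_ok: bool = True) -> bool:
    """Move a replica from old_host to new_host, make-before-break.

    Returns True if the migration completed, False if it was aborted.
    start_ok simulates whether the replacement starts successfully.
    """
    old = next(r for r in cluster.replicas if r.host == old_host)

    # Make: add the replacement replica before touching the old one.
    new = Replica(host=new_host, healthy=start_ok)
    cluster.replicas.append(new)

    if not new.healthy:
        # Abort: discard the failed replacement; the old replica keeps serving.
        cluster.replicas.remove(new)
        return False

    # Break: remove the old replica only after the new one is healthy.
    cluster.replicas.remove(old)
    return True
```

In this sketch the number of healthy replicas never drops below its starting value during a migration, which is the property that makes the strategy attractive for stateful workloads.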
Core Technical Concepts/Technologies
- Declarative state management (goal-driven automation)
- Self-healing remediation loops (Kubernetes-inspired)
- Grail: Real-time global infrastructure monitoring
- Cadence workflows (orchestration)
- Containerized stateful workloads (100 databases/host)
- Make-before-break migration strategy
- Host-level agents (Odin-Agent + tech-specific workers)
- Support for 23+ storage systems (MySQL, Cassandra, Kafka, HDFS)
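To make the declarative, self-healing idea concrete, here is a minimal sketch of a remediation loop: it compares a declared goal state with the observed actual state and applies corrective actions until the two match. The state model (a mapping from workload name to replica count), the function names, and the workload names are assumptions made for illustration, not Odin's actual data model. A real loop would run continuously against live host state rather than an in-memory dictionary.

```python
def plan(goal: dict[str, int], actual: dict[str, int]) -> list[tuple[str, str, int]]:
    """Compare goal and actual replica counts and return remediation actions."""
    actions = []
    for workload in sorted(goal.keys() | actual.keys()):
        want = goal.get(workload, 0)
        have = actual.get(workload, 0)
        if have < want:
            actions.append(("start", workload, want - have))
        elif have > want:
            actions.append(("stop", workload, have - want))
    return actions


def reconcile(goal: dict[str, int], actual: dict[str, int],
              max_rounds: int = 10) -> int:
    """Apply planned actions until actual matches goal.

    Mutates `actual` in place and returns the number of rounds that
    applied changes. Raises RuntimeError if the state does not converge.
    """
    for round_no in range(max_rounds):
        actions = plan(goal, actual)
        if not actions:
            return round_no
        for verb, workload, count in actions:
            delta = count if verb == "start" else -count
            actual[workload] = actual.get(workload, 0) + delta
            if actual[workload] == 0:
                # Drop workloads that no longer have any replicas.
                del actual[workload]
    raise RuntimeError("state did not converge within max_rounds")


goal = {"mysql-a": 3, "kafka-b": 2}
actual = {"mysql-a": 1, "cassandra-c": 1}
rounds = reconcile(goal, actual)
```

The operator only edits `goal`; the loop works out which containers to start or stop, which is what distinguishes goal-driven automation from running imperative scripts by hand.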
Main Points
- Scale:
  - 100,000+ hosts, 3.8M containers, 300K workloads
  - Exabyte-scale storage (multiple exbibytes)
- Automation:
  - Declar
The details in this post are derived from the Uber Engineering Blog and other sources.
This article was originally published on ByteByteGo.