How Netflix Stores 140 Million Hours of Viewing Data Per Day

ByteByteGo

Alex Xu • Published 3 months ago • 1 min read

Netflix manages massive amounts of time-series viewing data (140M+ hours/day) by evolving its storage architecture from a simple Apache Cassandra-based system to a multi-tiered, sharded approach. Key optimizations include categorizing data by type (full plays, previews, language preferences), splitting storage by recency (recent, past, historical), and introducing caching (EVCache) and compression. This ensures efficient storage, faster retrieval, and scalability despite exponential data growth.
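To make the two sharding axes concrete, here is a minimal sketch of routing a record by data type and by recency. The 30-day and 365-day cutoffs, the store naming scheme, and every class and function name are assumptions made for illustration; the summary does not state the boundaries or names Netflix actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Kind(Enum):
    # The three data types named in the summary.
    FULL_PLAY = "full_play"
    PREVIEW = "preview"
    LANGUAGE_PREF = "language_pref"


@dataclass(frozen=True)
class ViewingRecord:
    customer_id: str
    kind: Kind
    viewed_at: datetime


# Hypothetical cutoffs: the source does not give the real tier boundaries.
RECENT_WINDOW = timedelta(days=30)
PAST_WINDOW = timedelta(days=365)


def age_tier(record: ViewingRecord, now: datetime) -> str:
    """Classify a record as recent, past, or historical by its age."""
    age = now - record.viewed_at
    if age <= RECENT_WINDOW:
        return "recent"
    if age <= PAST_WINDOW:
        return "past"
    return "historical"


def target_store(record: ViewingRecord, now: datetime) -> str:
    """Pick a logical store name from the record's type and age tier."""
    return f"{record.kind.value}_{age_tier(record, now)}"
```

Keeping the type and the age tier as separate decisions means each store can be sized and tuned on its own, for example by compressing only the historical tier.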

2. Core Technical Concepts & Technologies

  • Apache Cassandra – Distributed NoSQL database for scalable storage.
  • EVCache – In-memory caching layer for fast data access.
  • Sharding – Splitting data by type and age for optimized storage.
  • Compression – Reducing storage footprint for older data.
  • Time-To-Live (TTL) – Automatic expiration of stale records.
  • Parallel Reads/Writes – Improving retrieval and migration speeds.
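The way caching, compression, and TTL fit together can be sketched as a cache-aside read path. This is a rough illustration only: `TtlCache` is a small in-memory stand-in and not the EVCache API, a plain dict stands in for the Cassandra table, `zlib` is an assumed compression choice, and the TTL is applied here to cache entries rather than to stored records. All names are hypothetical.

```python
import json
import time
import zlib


class TtlCache:
    """In-memory stand-in for a cache layer (not the real EVCache API)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry outlived its TTL: drop it and report a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)


def pack(history):
    """Serialize and compress a viewing history before storing it."""
    return zlib.compress(json.dumps(history).encode("utf-8"))


def unpack(blob):
    """Reverse of pack: decompress and deserialize."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def read_history(customer_id, cache, store):
    """Cache-aside read: try the cache, fall back to the compressed store."""
    cached = cache.get(customer_id)
    if cached is not None:
        return cached
    blob = store.get(customer_id)  # store: dict standing in for the database
    history = unpack(blob) if blob is not None else []
    cache.set(customer_id, history)
    return history
```

Passing the clock in as a parameter keeps expiry testable without sleeping; a real deployment would rely on the expiry handling of the cache and database themselves.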

3. Main Points

  • Initial Approach:
    • Used Apache Cassandra for flexible, write-heavy workloads (9:1 write-to-read ratio).
    • Stored viewing history under CustomerId with horizontal partitioning.
    • Challenges: Too many SSTables, slow comp

In this article, we’ll learn how Netflix tackled these problems and improved their storage system to handle millions of hours of viewing data every day.

This article was originally published on ByteByteGo
