Friday, January 28, 2011

NoSQL at Netflix

This is Yury Izrailevsky, Director of Cloud and Systems Infrastructure here at Netflix. As Netflix moved into the cloud, we needed to find the appropriate mechanisms to persist and query data within our highly distributed infrastructure. Our goal is to build fast, fault tolerant systems at Internet scale. We realized that in order to achieve this goal, we needed to move beyond the constraints of the traditional relational model. In the distributed world governed by Eric Brewer’s CAP theorem , high availability (a.k.a. better customer experience) usually trumps strong consistency. There is little room for vertical scalability or single points of failure. And while it is not easy to re-architect your systems to not run join queries, or not rely on read-after-write consistency (hey, just cache the value in your app!), we have found ourselves braving the new frontier of NoSQL distributed databases.

Our cloud-based infrastructure has many different use cases requiring structured storage access. Netflix is all about using the right tool for the job. In this post, I’d like to touch on the reasons behind our choice of three such NoSQL tools: SimpleDB, Hadoop/HBase and Cassandra.

Amazon SimpleDB was a natural choice for a number of our use cases as we moved into AWS cloud. SimpleDB is highly durable, with writes automatically replicated across availability zones within a region. It also features some really handy query and data format features beyond a simple key/value interface, such as multiple attributes per row key, batch operations, consistent reads, etc. Besides, SimpleDB is a hosted solution, administered by our friends at AWS. We love it when others do undifferentiated heavy lifting for us; after all, this was one of the reasons we moved to the cloud in the first place. If you are accustomed to other AWS products and services, using SimpleDB is… well, simple – same AWS account, familiar interfaces, APIs, integrated support and billing, etc.

For our systems based on Hadoop, Apache HBase is a convenient, high-performance column-oriented distributed database solution. With its dynamic partitioning model, HBase makes it really easy to grow your cluster and re-distribute load across nodes at runtime, which is great for managing our ever-growing data volume needs and avoiding hot spots. Built-in support for data compression, range queries spanning multiple nodes, and even native support for distributed counters make it an attractive alternative for many of our use cases. HBase’s strong consistency model can also be handy, although it comes with some availability trade offs. Perhaps the biggest utility comes from being able to combine real-time HBase queries with batch map-reduce Hadoop jobs, using HDFS as a shared storage platform.

Last but not least, I want to talk about our use of Cassandra. Distributed under the Apache license, Cassandra is an open source NoSQL database that is all about flexibility, scalability and performance. DataStax, a company that professionally support Cassandra, has been great at helping us quickly learn and operate the system. Unlike a distributed database solution using e.g. MySQL or even SimpleDB, Cassandra (like HBase) can scale horizontally and dynamically by adding more servers, without the need to re-shard – or reboot, for that matter. In fact, Cassandra seeks to avoid vertical scalability limits and bottlenecks of any sort: there are no dedicated name nodes (all cluster nodes can serve as such), no practical architectural limitations on data sizes, row/column counts, etc. Performance is strong, especially for the write throughput. Cassandra’s extremely flexible data model deserves a special mention. The sparse two-dimensional “super-column family” architecture allows for rich data model representations (and better performance) beyond just a simple key-value look up. And there are no underlying storage format requirements like HDFS; all you need is a file system. Some of the most attractive features of Cassandra are its uniquely flexible consistency and replication models. Applications can determine at call level what consistency level to use for reads and writes (single, quorum or all replicas). This, combined with customizable replication factor, and special support to determine which cluster nodes to designate as replicas, makes it particularly well suited for cross-datacenter and cross-regional deployments. In effect, a single global Cassandra cluster can simultaneously service applications and asynchronously replicate data across multiple geographic locations.

The reason why we use multiple NoSQL solutions is because each one is best suited for a specific set of use cases. For example, HBase is naturally integrated with the Hadoop platform, whereas Cassandra is best for cross-regional deployments and scaling with no single points of failure. Adopting the non-relational model in general is not easy, and Netflix has been paying a steep pioneer tax while integrating these rapidly evolving and still maturing NoSQL products. There is a learning curve and an operational overhead. Still, the scalability, availability and performance advantages of the NoSQL persistence model are evident and are paying for themselves already, and will be central to our long-term cloud strategy.

Building the leading global content streaming platform is a huge challenge. NoSQL is just one example of an exciting technology area that we aggressively leverage (and in the case of open source projects, contribute back to). Our goal is infinite scale. It takes no less than a superstar team to make it a reality. For those technology superstars out there: Netflix is hiring (http://jobs.netflix.com).