Friday, January 28, 2011

NoSQL at Netflix

This is Yury Izrailevsky, Director of Cloud and Systems Infrastructure here at Netflix. As Netflix moved into the cloud, we needed to find the appropriate mechanisms to persist and query data within our highly distributed infrastructure. Our goal is to build fast, fault-tolerant systems at Internet scale. We realized that in order to achieve this goal, we needed to move beyond the constraints of the traditional relational model. In the distributed world governed by Eric Brewer’s CAP theorem, high availability (a.k.a. better customer experience) usually trumps strong consistency. There is little room for vertical scalability or single points of failure. And while it is not easy to re-architect your systems so they don’t run join queries or rely on read-after-write consistency (hey, just cache the value in your app!), we have found ourselves braving the new frontier of NoSQL distributed databases.

Our cloud-based infrastructure has many different use cases requiring structured storage access. Netflix is all about using the right tool for the job. In this post, I’d like to touch on the reasons behind our choice of three such NoSQL tools: SimpleDB, Hadoop/HBase and Cassandra.

Amazon SimpleDB was a natural choice for a number of our use cases as we moved into AWS cloud. SimpleDB is highly durable, with writes automatically replicated across availability zones within a region. It also features some really handy query and data format features beyond a simple key/value interface, such as multiple attributes per row key, batch operations, consistent reads, etc. Besides, SimpleDB is a hosted solution, administered by our friends at AWS. We love it when others do undifferentiated heavy lifting for us; after all, this was one of the reasons we moved to the cloud in the first place. If you are accustomed to other AWS products and services, using SimpleDB is… well, simple – same AWS account, familiar interfaces, APIs, integrated support and billing, etc.

For our systems based on Hadoop, Apache HBase is a convenient, high-performance column-oriented distributed database solution. With its dynamic partitioning model, HBase makes it really easy to grow your cluster and re-distribute load across nodes at runtime, which is great for managing our ever-growing data volume needs and avoiding hot spots. Built-in support for data compression, range queries spanning multiple nodes, and even native support for distributed counters make it an attractive alternative for many of our use cases. HBase’s strong consistency model can also be handy, although it comes with some availability trade-offs. Perhaps the biggest utility comes from being able to combine real-time HBase queries with batch map-reduce Hadoop jobs, using HDFS as a shared storage platform.

Last but not least, I want to talk about our use of Cassandra. Distributed under the Apache license, Cassandra is an open source NoSQL database that is all about flexibility, scalability and performance. DataStax, a company that professionally supports Cassandra, has been great at helping us quickly learn and operate the system. Unlike a distributed database solution using e.g. MySQL or even SimpleDB, Cassandra (like HBase) can scale horizontally and dynamically by adding more servers, without the need to re-shard – or reboot, for that matter. In fact, Cassandra seeks to avoid vertical scalability limits and bottlenecks of any sort: there are no dedicated name nodes (all cluster nodes can serve as such), no practical architectural limitations on data sizes, row/column counts, etc. Performance is strong, especially for write throughput. Cassandra’s extremely flexible data model deserves a special mention. The sparse two-dimensional “super-column family” architecture allows for rich data model representations (and better performance) beyond a simple key-value lookup. And there are no underlying storage format requirements like HDFS; all you need is a file system. Some of the most attractive features of Cassandra are its uniquely flexible consistency and replication models. Applications can determine at the call level what consistency level to use for reads and writes (one, a quorum, or all replicas). This, combined with a customizable replication factor, and special support to determine which cluster nodes to designate as replicas, makes it particularly well suited for cross-datacenter and cross-regional deployments. In effect, a single global Cassandra cluster can simultaneously service applications and asynchronously replicate data across multiple geographic locations.
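The per-call consistency choice comes down to a simple overlap rule: with replication factor N, a read that consults R replicas and a write acknowledged by W replicas are guaranteed to intersect whenever R + W > N. Here is a minimal Python sketch of that reasoning (the function names are ours for illustration, not the Cassandra API):

```python
# Illustrative sketch of Cassandra-style tunable consistency. With
# replication factor N, a read touching R replicas and a write acked by
# W replicas overlap on at least one replica whenever R + W > N, so the
# read is guaranteed to see the latest write.

def required_acks(level, replication_factor):
    """Replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")

def is_strongly_consistent(read_level, write_level, replication_factor):
    """True if every read is guaranteed to observe the latest write."""
    r = required_acks(read_level, replication_factor)
    w = required_acks(write_level, replication_factor)
    return r + w > replication_factor

# With N = 3, QUORUM reads + QUORUM writes overlap (2 + 2 > 3), while
# ONE/ONE does not (1 + 1 <= 3) -- that is the availability dial.
print(is_strongly_consistent("QUORUM", "QUORUM", 3))  # True
print(is_strongly_consistent("ONE", "ONE", 3))        # False
```

This is why quorum reads and writes give read-after-write semantics while ONE/ONE trades that guarantee away for lower latency and higher availability.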

We use multiple NoSQL solutions because each one is best suited to a specific set of use cases. For example, HBase is naturally integrated with the Hadoop platform, whereas Cassandra is best for cross-regional deployments and scaling with no single points of failure. Adopting the non-relational model in general is not easy, and Netflix has been paying a steep pioneer tax while integrating these rapidly evolving and still maturing NoSQL products. There is a learning curve and an operational overhead. Still, the scalability, availability and performance advantages of the NoSQL persistence model are evident, are paying for themselves already, and will be central to our long-term cloud strategy.

Building the leading global content streaming platform is a huge challenge. NoSQL is just one example of an exciting technology area that we aggressively leverage (and in the case of open source projects, contribute back to). Our goal is infinite scale. It takes no less than a superstar team to make it a reality. For those technology superstars out there: Netflix is hiring (http://jobs.netflix.com).

Thursday, January 27, 2011

Netflix Performance on Top ISP Networks

Hi there. This is Ken Florance, Director of Content Delivery here at Netflix.
 
As we continue to stream more and more great movies and TV shows, we find ourselves in the unique position of having insight into the performance of hundreds of millions of long duration, high-definition video streams delivered over the Internet.
 
The throughput we are able to achieve with these streams can tell us a great deal about the actual capacity our subscribers are able to sustain to their homes. In the charts below, we’re using a time-weighted bitrate metric to represent the effective data throughput our subscribers receive over many of the top ISPs.
 
Currently, our top HD streams are about 4800 kilobits per second. Clients may switch through a number of bitrates as they ramp up to the highest stream, or shift down from the highest stream if they cannot sustain play at that rate due to throughput constraints. No client would sustain a 4800 stream from start to finish (there would at least be a few smaller streams averaged in for startup) but the higher the sustained average, the greater the throughput the client can achieve, and the greater the image quality over the duration of the play.
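As a hypothetical sketch (the actual Netflix computation is not public), a time-weighted bitrate can be computed by weighting each bitrate a session plays at by how long it was actually played, which is why a session that ramps up through smaller streams averages below the top 4800 kbps stream:

```python
# Hypothetical sketch of a time-weighted bitrate metric. A streaming
# session is a sequence of (bitrate_kbps, seconds_played) segments as
# the client adapts; each bitrate is weighted by its play time.

def time_weighted_bitrate(segments):
    """Average bitrate in kbps, weighted by seconds played per segment."""
    total_bits = sum(kbps * secs for kbps, secs in segments)
    total_secs = sum(secs for _, secs in segments)
    return total_bits / total_secs

# A one-hour session that ramps 1000 -> 2500 -> 4800 kbps spends almost
# all its time at the top stream yet still averages below 4800:
session = [(1000, 30), (2500, 60), (4800, 3510)]
print(round(time_weighted_bitrate(session)))  # 4730
```

Averaged over millions of such sessions per ISP, this is the kind of number the charts report.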
 
As we use a number of CDNs, and our clients can adapt to changing network conditions by selecting the network path that’s currently giving them the best throughput, Netflix streaming performance ends up being an interesting way to measure sustained throughput available from a given ISP over time, and therefore the quality of Netflix streaming that ISP is providing to our subscribers. Obviously, this can vary by network technology (e.g. DSL, Cable), region, etc., but it's a great high-level view of Netflix performance across a large number of individual streaming sessions.
 
In the metric below, we’re filtering for titles that have HD streams available, and for devices capable of playing HD streams (which also filters out mobile networks), to highlight what’s achievable in terms of HD performance on the various ISP networks. As you can see, Charter is in the lead for US streams with an impressive 2667 kilobits per second average over the period. Rogers leads in Canada with a whopping 3020 kbps average.
 
We'll update these charts monthly, and we welcome questions, comments and suggestions to help improve our understanding of Netflix performance on top ISP networks.

Wednesday, January 19, 2011

How We Determine Product Success


At Netflix we engage in what we call consumer science: we test new ideas with real customers, at scale, and we measure for statistically significant differences in how they engage with our product. Are members staying with the service longer? Are they instantly watching more TV shows and movies from us?

For an employee, the results of these tests matter more than your confidence in what the outcome will be, your title, or your ability to persuade. I’ve seen even our best product minds bet wrong on such tests on occasion. We absolutely believe we couldn’t build one of the best loved internet brands in the world without consumer science at the core of our product development methodology.

Job number one for our product-focused engineers is to effectively innovate for Netflix members. The product we built in 2006 would not satisfy our members today. The best product in our market in 2015 will be far better than Netflix is today. It is our fundamental challenge to figure out what a better product can be on behalf of our members, and to build it.

Innovation involves a lot of failure. If we’re never failing, we aren’t trying for something out on the edge from where we are today. In this regard, failure is perfectly acceptable at Netflix. This wouldn’t be the case if we were operating a nuclear power plant or manufacturing cars. The only real failure that’s unacceptable at Netflix is the failure to innovate.

So if you’re going to fail, fail cheaply. And know when you’ve failed, vs. when you’ve gotten it right.

Product development at Netflix starts with a hypothesis, which typically goes something like this:

Algorithm/feature/design X will increase member engagement with our service, and ultimately member retention.

The idea may be a way to increase the relevance of our search results, a new design for device UIs, or a new feature, such as showing members what their Facebook friends are watching from Netflix. This is the crucial first step in our creative process, from which any improvement we can hope to deliver starts. Our intuition and imagination in how better to serve our members fuels our entire product development approach.

The second step is to design a test that will measure the impact of the hypothesis. Sometimes this simply means build it, but often we can build a prototype more quickly that captures the essence of the concept. Maybe the back end isn’t fully scalable; maybe it lacks polish or all of the bells and whistles we’d like to include if we roll it out for everyone.

This allows us to move quickly and gives us something we can test with our members for a positive or negative signal. There is a big lesson we’ve learned here, which is that the ideal execution of an idea can be twice as effective as a prototype, or maybe even more. But the ideal implementation is never ten times better than an artful prototype. Polish won’t turn a negative signal into a positive one. Often, the ideal execution is barely better than a good prototype, from a measurement perspective. Embracing this simple, battle-tested heuristic can free an innovator to move incredibly quickly by removing extraneous detail in the testing process.

Step three is the test itself. We roll out our prototype to a set of members, and we create an equal cohort set up as a control for the experiment. And then we wait. We let our members quietly tell us what the best product is simply by using our service. We’re always focused on increasing engagement and retention. There are, to use a technical term, zillions of other metrics we measure to understand our results in more detail. But in terms of business value, those headline metrics are what drive success for our product.
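With cohorts this large, even a small lift in a headline metric like retention can be distinguished from noise. As an illustrative sketch (not Netflix’s actual analysis pipeline), a two-proportion z-test comparing a test cell against its control looks like this:

```python
# Illustrative sketch: comparing retention between a test cell and its
# control cohort with a two-proportion z-test, stdlib only. The cohort
# sizes and retention rates below are made-up numbers for illustration.
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for the difference between two retention rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# 100,000 members per cell; test cell retains 91.0% vs 90.5% control.
z = two_proportion_z(91_000, 100_000, 90_500, 100_000)
print(two_sided_p_value(z) < 0.05)  # True: even a half-point lift is
                                    # detectable at this scale
```

The same machinery extends to the many secondary metrics measured per test; the headline engagement and retention numbers are simply the ones that decide the outcome.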

Any one test could have hundreds of thousands of members taking part. It could have two test cells or twenty, each trying a different approach or mixing different new elements. At any one time, we'll have dozens of different consumer tests running.

Here is where the real beauty of the approach comes in. Sometimes our hypothesis is sound, we have a winner for our members, and we add the scale and polish necessary to get our improvement out for everyone. Or, as I mentioned, maybe the idea failed. The wonderful truth is, both outcomes help our product intuition, and therefore increase the chances that our next hypothesis will knock it out of the park. Our product team has great freedom to apply their best thinking to our product, and our collective effort helps each of us improve our understanding of our members’ desires.

It is humbling to be very confident in an idea, and wind up being totally wrong. But that is how we learn and grow, and we always try to take the time to discuss and internalize what we believe the lessons are from each of the scores of tests we run each year. It is the impact of the best ideas we come up with that will determine whether our members elect to continue with our service.

It can be frustrating to be in a product development environment where force of personality or hierarchy determines product outcomes. At Netflix the focus on customer value makes a teachable moment of those times one guesses wrong. My product intuition is vastly better today for the benefit of my mistakes.

I’ll close with this last point about consumer science. Testing our product ideas frees us to make big bets, to try radical or unpopular ideas. It allows the best product thinkers to build a track record based on real customer value. It allows us to build consensus out of debate and to build on our best ideas. It helps us avoid the tyranny of “or,” because we can test many approaches to solving the hardest challenges we face.

-John Ciancutti.