Wednesday, December 22, 2010

HTML5 and Video Streaming

This is Christian Kaiser, VP of Engineering here at Netflix.

My colleague John Ciancutti recently posted about our use of HTML5 for user interfaces; I thought I’d add a perspective on HTML5 and video streaming.

Since HTML5 includes a facility to embed video playback (the <video> tag), it seems like a natural next step for us to use it for streaming video playback within our HTML5-based user interfaces. However, as of today, there is no accepted standard for advanced streaming through the <video> tag.

Such a standard would enable agreement between streaming video clients and services about the following:

  1. The acceptable A/V container formats (e.g. Fragmented MP4, WebM, etc.);
  2. The acceptable audio and video codecs (e.g. H.264, VP8, AAC, etc.);
  3. The streaming protocol (e.g. HTTP, RTP, etc.);
  4. A way for the streaming protocol to adapt to available bandwidth;
  5. A way of conveying information about available streams and other parameters to the streaming player module;
  6. A way of supporting protected content (e.g. with DRM systems);
  7. A way of exposing all this functionality into HTML5.

That is a long list of complex issues. We’ve resolved the first six items for ourselves with our proprietary technology. Similarly, many other companies have their own proprietary, and mutually incompatible, solutions.

But what if we could replace all these proprietary solutions with an industry-wide standard? Then Netflix, or any other video streaming service, could deliver to a standard browser as a pure HTML5 web application, both on computers and in CE devices with embedded browsers. Browser builders and CE manufacturers could support every OS and device they choose, leveraging the same implementations across multiple streaming services instead of building and integrating a one-off implementation for each service. Consumers would benefit from a growing number of continually evolving choices on their devices, just as the web works today for other types of services. We believe that this is an attractive goal.

In order to help achieve this goal, we are looking into a number of options. We are already actively participating in the MPEG committee for Dynamic Adaptive Streaming over HTTP (DASH) to define an industry standard for adaptive streaming, together with Apple, Microsoft and a number of other companies.

The proposed DASH standard covers the first five items listed above. It defines a way to advertise a range of different streams to a player, together with the information the player needs to choose which ones to stream. It also defines media file formats suitable for adaptive streaming; these formats enable efficient, seamless switching between streams, so a player can adapt to changing network conditions without pausing playback to re-buffer. The standard considers the differing needs of on-demand services such as ours and of live services. And it’s all based on industry-standard HTTP servers.
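The stream-selection step described above can be sketched as follows. This is a minimal illustration, not Netflix’s player logic or anything mandated by DASH: it assumes each advertised stream is reduced to an id and a bitrate, that the player already has a throughput estimate, and that it keeps a hypothetical 20% safety margin. `Representation` borrows DASH’s term for one encoding of the content, but the class and `select_representation` are invented for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Representation:
    """One advertised encoding of the same content (simplified, hypothetical)."""
    id: str
    bandwidth_bps: int  # bitrate the player must sustain to stream this encoding


def select_representation(
    representations: Sequence[Representation],
    measured_throughput_bps: float,
    safety_factor: float = 0.8,  # hypothetical headroom for throughput fluctuations
) -> Representation:
    """Pick the highest-bitrate stream that fits within a fraction of measured throughput.

    Falls back to the lowest-bitrate stream when none fits, so playback can continue
    at reduced quality instead of pausing to re-buffer.
    """
    if not representations:
        raise ValueError("no representations advertised")
    budget = measured_throughput_bps * safety_factor
    ordered = sorted(representations, key=lambda r: r.bandwidth_bps)
    best = ordered[0]
    for rep in ordered:
        if rep.bandwidth_bps <= budget:
            best = rep
    return best


streams = [
    Representation("low", 500_000),
    Representation("mid", 1_500_000),
    Representation("high", 3_800_000),
]
print(select_representation(streams, 2_500_000).id)  # mid (budget is 2.0 Mbps)
print(select_representation(streams, 400_000).id)    # low (nothing fits)
```

Re-running the selection before each segment request is what lets a player step down, or back up, without stopping playback; real players typically also weigh buffer level and smooth their throughput estimates.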

We expect to be able to publish a draft of a Netflix profile describing a limited subset of the MPEG DASH standard early next year. It will define the requirements for premium on-demand streaming services like ours and will take advantage of hooks included in the DASH standard to integrate the DRM technologies that we need to fulfill our contractual obligations to the content providers, thus covering the sixth item on our list.

What’s still missing is the last item: how exactly to tie advanced streaming standards (MPEG DASH and others) into the HTML5 <video> tag.

To this end, we are starting to get involved in the community, with the goal of helping to shape a great standard that will be useful to everybody involved in building browsers, CE devices and services for streaming video over the Internet.

We know that achieving this goal will take a while. In the meantime, we’ll continue to evolve our own streaming technology to make sure our members have the best streaming experience possible and to get to as many platforms as we can.

Addendum: We are hiring! Specifically, this opening is for a position related to making video streaming in HTML5 a reality. We have a lot of other opportunities, too. Please visit our careers site!

Thursday, December 16, 2010

5 Lessons We’ve Learned Using AWS

In my last post I talked about some of the reasons we chose AWS as our computing platform. We’re about one year into our transition to AWS from our own data centers. We’ve learned a lot so far, and I thought it might be helpful to share with you some of the mistakes we’ve made and some of the lessons we’ve learned.

1. Dorothy, you’re not in Kansas anymore.

If you’re used to designing and deploying applications in your own data centers, you need to be prepared to unlearn a lot of what you know. Seek to understand and embrace the differences of operating in a cloud environment.

Many examples come to mind, such as hardware reliability. In our own data centers, session-based memory management was a fine approach, because failure of any single hardware instance was rare. Managing state in volatile memory was reasonable, because we rarely had to migrate from one instance to another. I knew to expect higher rates of individual instance failure in AWS, but I hadn’t thought through some of these sorts of implications.
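One common way to stop depending on any single instance is to keep session state out of the serving process entirely. The sketch below assumes a shared store with simple get/put semantics; `SessionStore`, `InMemoryStandIn` and `StatelessHandler` are hypothetical names, and the dict-backed stand-in would be a replicated data store in practice. It illustrates a general pattern, not Netflix’s code.

```python
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    """Anything that keeps session state outside the serving instance."""
    def get(self, session_id: str) -> Optional[dict]: ...
    def put(self, session_id: str, state: dict) -> None: ...


class InMemoryStandIn:
    """Stand-in for a shared, replicated store (hypothetical; a dict for the demo)."""
    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, session_id: str) -> Optional[dict]:
        state = self._data.get(session_id)
        return dict(state) if state is not None else None

    def put(self, session_id: str, state: dict) -> None:
        self._data[session_id] = dict(state)


class StatelessHandler:
    """Serving instance that keeps no session state in its own memory between requests."""
    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_queue(self, session_id: str, title: str) -> list:
        state = self.store.get(session_id) or {"queue": []}
        state["queue"] = state["queue"] + [title]
        self.store.put(session_id, state)
        return state["queue"]


store = InMemoryStandIn()
instance_a = StatelessHandler("a", store)
instance_a.add_to_queue("s1", "Title 1")
del instance_a  # simulate instance A being terminated
instance_b = StatelessHandler("b", store)
print(instance_b.add_to_queue("s1", "Title 2"))  # ['Title 1', 'Title 2']
```

Because any instance can serve the next request, losing one costs only the requests in flight on it, not the members’ sessions.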

Another example: in the Netflix data centers, we have a high-capacity, super-fast, highly reliable network. This has afforded us the luxury of designing around chatty APIs to remote systems. AWS networking has more variable latency. We’ve had to be much more structured about “over the wire” interactions, even as we’ve transitioned to a more highly distributed architecture.
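To see why chatty APIs hurt when latency varies, compare per-item calls with batched calls against a simulated remote service. `RemoteTitleService` is hypothetical and counts round trips rather than doing real network I/O; the batch size of 50 is arbitrary.

```python
from typing import Dict, Iterable, List


class RemoteTitleService:
    """Hypothetical remote service; counts round trips instead of doing real I/O."""
    def __init__(self, catalog: Dict[int, str]) -> None:
        self.catalog = catalog
        self.round_trips = 0

    def get_title(self, title_id: int) -> str:
        self.round_trips += 1
        return self.catalog[title_id]

    def get_titles(self, title_ids: Iterable[int]) -> Dict[int, str]:
        self.round_trips += 1  # one request carries the whole batch
        return {tid: self.catalog[tid] for tid in title_ids}


def chatty_lookup(svc: RemoteTitleService, ids: List[int]) -> List[str]:
    # One round trip per item: cheap on a fast LAN, costly when latency varies.
    return [svc.get_title(i) for i in ids]


def batched_lookup(svc: RemoteTitleService, ids: List[int], batch_size: int = 50) -> List[str]:
    # Group items so round trips grow with len(ids) / batch_size, not len(ids).
    results: Dict[int, str] = {}
    for start in range(0, len(ids), batch_size):
        results.update(svc.get_titles(ids[start:start + batch_size]))
    return [results[i] for i in ids]


catalog = {i: f"title-{i}" for i in range(120)}
ids = list(range(120))
chatty, batched = RemoteTitleService(catalog), RemoteTitleService(catalog)
assert chatty_lookup(chatty, ids) == batched_lookup(batched, ids)
print(chatty.round_trips, batched.round_trips)  # 120 3
```

Each round trip is another chance to hit a slow response, so cutting 120 calls to 3 tends to help the worst-case latency as well as the average.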

2. Co-tenancy is hard.

When designing customer-facing software for a cloud environment, it is all about managing down the expected overall latency of responses. AWS is built around a model of sharing resources: hardware, network, storage, etc. Co-tenancy can introduce variance in throughput at any level of the stack. You’ve got to either be willing to abandon any specific subtask, or manage your resources within AWS to avoid co-tenancy where you must.
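Abandoning a subtask usually means putting a deadline on it and answering without its result if it runs late. Here is a minimal sketch using Python’s standard `concurrent.futures`, assuming a fallback value exists for the subtask; `call_with_deadline`, the pool size and the timeouts are illustrative, and `slow_subtask` merely sleeps to imitate a call slowed by a co-tenant.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_deadline(task: Callable[[], T], timeout_s: float, fallback: T) -> T:
    """Run task on a worker thread; if it misses the deadline, stop waiting and return fallback.

    The worker thread is not interrupted (Python threads cannot be killed); the caller
    simply abandons the result so overall response latency stays bounded.
    """
    future = _pool.submit(task)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only has an effect if the task has not started yet
        return fallback


def slow_subtask() -> str:
    time.sleep(0.5)  # stands in for a call slowed down by a noisy neighbor
    return "enriched"


def fast_subtask() -> str:
    return "enriched"


print(call_with_deadline(fast_subtask, timeout_s=0.2, fallback="basic"))   # enriched
print(call_with_deadline(slow_subtask, timeout_s=0.05, fallback="basic"))  # basic
```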

Your best bet is to build your systems to expect and accommodate failure at any level, which introduces the next lesson.

3. The best way to avoid failure is to fail constantly.

We’ve sometimes referred to the Netflix software architecture in AWS as our Rambo Architecture. Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends.

If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine.
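The degradation described above amounts to wrapping the personalized call in a fallback. In this sketch, `ServiceUnavailable`, `POPULAR_TITLES` and the recommender functions are hypothetical stand-ins; only the named error triggers the fallback, so unexpected bugs still surface.

```python
from typing import Callable, List


class ServiceUnavailable(Exception):
    """Raised by a dependency that is down (hypothetical error type)."""


POPULAR_TITLES: List[str] = ["Popular A", "Popular B", "Popular C"]  # illustrative


def rows_for_member(member_id: str, personalize: Callable[[str], List[str]]) -> List[str]:
    """Return personalized picks, or degrade to popular titles if the recommender fails."""
    try:
        picks = personalize(member_id)
        if picks:
            return picks
    except ServiceUnavailable:
        pass  # degrade quality, but still answer the request
    return list(POPULAR_TITLES)


def healthy_recommender(member_id: str) -> List[str]:
    return [f"Pick for {member_id}"]


def broken_recommender(member_id: str) -> List[str]:
    raise ServiceUnavailable("recommendations system is down")


print(rows_for_member("m1", healthy_recommender))  # ['Pick for m1']
print(rows_for_member("m1", broken_recommender))   # ['Popular A', 'Popular B', 'Popular C']
```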

One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.
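The post doesn’t describe how the Chaos Monkey is implemented. As a rough illustration of the idea only, the toy simulation below removes a random instance from each service group in an in-memory model. `ServiceGroup`, `unleash_monkey`, the kill probability and the rule that a group’s last instance is spared are all hypothetical choices for this sketch, and nothing here calls a real cloud API.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceGroup:
    """A hypothetical group of interchangeable instances behind one service name."""
    name: str
    instances: List[str] = field(default_factory=list)

    def healthy(self) -> bool:
        return len(self.instances) > 0


def unleash_monkey(groups: Dict[str, ServiceGroup], rng: random.Random,
                   kill_probability: float = 0.5) -> List[str]:
    """For each group, with some probability, terminate one randomly chosen instance.

    Returns the killed instances. This only mutates an in-memory model; a real tool
    would call the cloud provider's termination API. This sketch spares a group's
    last instance (an assumption, not necessarily how the real tool behaves).
    """
    killed: List[str] = []
    for group in groups.values():
        if len(group.instances) > 1 and rng.random() < kill_probability:
            victim = rng.choice(group.instances)
            group.instances.remove(victim)
            killed.append(victim)
    return killed


rng = random.Random(42)  # seeded so the demo is repeatable
groups = {
    "api": ServiceGroup("api", ["api-1", "api-2", "api-3"]),
    "search": ServiceGroup("search", ["search-1", "search-2"]),
}
killed = unleash_monkey(groups, rng, kill_probability=1.0)
print(len(killed))  # 2: one victim per group when kill_probability is 1.0
print(all(g.healthy() for g in groups.values()))  # True
```

The value lies less in the killing than in what follows it: running this routinely forces every service to prove it keeps working when a dependency loses an instance.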

4. Learn with real scale, not toy models.

Before we committed ourselves to AWS, we spent time researching the platform and building test systems within it. We tried hard to simulate realistic traffic patterns against these research projects.

This was critical in helping us select AWS, but not as helpful as we expected in thinking through our architecture. Early in our production build-out, we built a simple repeater and started copying full customer request traffic to our AWS systems. That is what really taught us where our bottlenecks were; some design choices that had seemed wise on the whiteboard turned out to be foolish at big scale.
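Copying live traffic to a second system is often called traffic shadowing. The post doesn’t describe Netflix’s repeater, so the following is a hypothetical sketch: `Repeater` serves each request from the primary (the data center, in the story above) and queues a copy for the shadow (the AWS systems), discarding the shadow’s responses and errors, and dropping copies rather than slowing customers down when the backlog is full.

```python
import queue
import threading
from typing import Callable, List

Handler = Callable[[str], str]


class Repeater:
    """Serve every request from the primary and copy it to a shadow system (hypothetical)."""

    def __init__(self, primary: Handler, shadow: Handler, max_backlog: int = 1000) -> None:
        self.primary = primary
        self.shadow = shadow
        self.backlog: "queue.Queue[str]" = queue.Queue(maxsize=max_backlog)
        self.shadow_errors = 0
        threading.Thread(target=self._drain, daemon=True).start()

    def handle(self, request: str) -> str:
        try:
            self.backlog.put_nowait(request)  # never block the customer on the shadow
        except queue.Full:
            pass  # drop the copy rather than slow down production traffic
        return self.primary(request)  # customers only ever see the primary's answer

    def _drain(self) -> None:
        while True:
            request = self.backlog.get()
            try:
                self.shadow(request)  # response discarded; we only want the load
            except Exception:
                self.shadow_errors += 1
            finally:
                self.backlog.task_done()


seen_by_shadow: List[str] = []


def primary(req: str) -> str:
    return f"dc:{req}"


def shadow(req: str) -> str:
    seen_by_shadow.append(req)
    if req == "boom":
        raise RuntimeError("shadow bug")
    return f"aws:{req}"


repeater = Repeater(primary, shadow)
responses = [repeater.handle(r) for r in ["a", "boom", "c"]]
repeater.backlog.join()  # wait for the shadow to catch up (demo only)
print(responses)                               # ['dc:a', 'dc:boom', 'dc:c']
print(seen_by_shadow, repeater.shadow_errors)  # ['a', 'boom', 'c'] 1
```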

We continue to research new technologies within AWS, but today we’re doing it at full scale with real data. If we’re thinking about new NoSQL options, for example, we’ll pick a real data store and port it full scale to the options we want to learn about.

5. Commit yourself.

When I look back at what the team has accomplished this year in our AWS migration, I’m truly amazed. But it didn’t always feel this good. AWS is only a few years old, and building at a high scale within it is a pioneering enterprise today. There were some dark days as we struggled with the sheer size of the task we’d taken on, and some of the differences between how AWS operates vs. our own data centers.

As you run into the hurdles, have the grit and the conviction to fight through them. Our CEO, Reed Hastings, has not only been fully on board with this migration, he is the person who motivated it! His commitment, and the commitment of the technology leaders across the company, helped us push through to success when we could have chosen to retreat instead.

AWS is a tremendous suite of services, getting better all the time, and some big technology companies are running successfully there today. You can too! We hope some of our mistakes and the lessons we’ve learned can help you do it well.

-john ciancutti.

Tuesday, December 14, 2010

The Future of the Netflix API

Two weeks ago, Michael Hart (Director of Engineering for Social Systems and founder of the Netflix API) and I were prepared to present the current and future state of the Netflix API at an O'Reilly webinar. Because of technical difficulties, the event never got off the ground, so we are going to give it another shot. The webinar will now be held tomorrow, December 15th, at 10am PT.

One benefit to the delay is that it gave me a chance to prepare some slides (found below) detailing some of the ideas that the API team will be pursuing in the coming months. Although we are still in the research phase, we think that some of the ideas in these slides will challenge us in new ways to improve upon our already robust foundation.

Michael's slides for tomorrow's presentation were also posted on this blog and can be found here.

Please join us at the event tomorrow; we look forward to your feedback and questions!

Daniel Jacobson
Director of Engineering, API

Four Reasons We Choose Amazon’s Cloud as Our Computing Platform

One year ago, none of Netflix’s customer traffic was supported out of AWS, Amazon’s cloud services. Today, most of our member traffic is supported by software we’ve built and deployed in AWS. This includes much of our member website at Netflix.com as well as the software supporting many Netflix Ready Devices, for example: the Xbox, PS3, Wii, AppleTV, iPhone, and iPad.

We’ve built and deployed search engines, recommendation systems, A|B testing infrastructure, streaming servers, encoding software, huge data stores, caching architectures, and various SQL and NoSQL solutions in AWS, all for the first time in 2010.

This has been a Herculean effort by our engineering teams, involving re-architecture, quick learning, nimble coding and a can-do attitude. The teams have managed this feat while supporting massive business growth throughout the year, adding thousands of new titles to our streaming library, and launching many new Netflix Ready Devices.

Our move to AWS will be the subject of various future posts to this blog. I wanted to kick the subject off by explaining why we chose AWS as a platform, a question that comes up fairly often when we interview engineering candidates.

1. We needed to re-architect, which allowed us to question everything, including whether to keep building out our own data center solution.

Netflix has been fortunate to experience incredible growth in recent years, both in terms of customers and streaming devices. We reached the point where every layer of our software stack needed to be able to scale horizontally. With the shift to streaming, our software needs to be much more reliable, redundant, and fault tolerant.

After many fruitful years of growth on our previous three tier, database oriented architecture, it was time for us to go back to the drawing board. We could have chosen to build out new data centers, build our own redundancy and failover, data synchronization systems, etc. Or, we could opt to write a check to someone else to do that instead.

2. Letting Amazon focus on data center infrastructure allows our engineers to focus on building and improving our business.

Amazon describes the work its web services take on as “undifferentiated heavy lifting,” and that’s what it is. The problems they are trying to solve are incredibly difficult ones, but they aren’t specific to our business. Every successful internet company has to figure out great storage solutions, hardware failover, networking infrastructure, etc.

We want our engineers to focus as much of their time as possible on product innovation for the Netflix customer experience; that is what differentiates us from our competitors.

3. We’re not very good at predicting customer growth or device engagement.

Over the course of the year, Netflix has revised its public guidance three times for the number of subscribers we expect to have at the end of 2010. We are operating in a fast-changing and emerging market.

How many subscribers would you guess used our Wii application the week it launched? How many would you guess will use it next month? We have to ask ourselves these questions for each device we launch because our software systems need to scale to the size of the business, every time.

Cloud environments are ideal for horizontally scaling architectures. We don’t have to guess months ahead what our hardware, storage, and networking needs are going to be. We can programmatically access more of these resources from shared pools within AWS almost instantly.
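As an illustration of the arithmetic a horizontally scaled service can run on demand, here is a hypothetical capacity policy: size the group so each instance runs at about a target utilization, within fixed bounds. The function, its defaults and the example loads are invented for this sketch; actually acquiring the instances would go through the provider’s APIs, which are not shown.

```python
import math


def desired_instances(requests_per_sec: float, capacity_per_instance: float,
                      target_utilization: float = 0.5, min_instances: int = 2,
                      max_instances: int = 500) -> int:
    """How many identical instances to run for the current load (hypothetical policy).

    Keeps each instance at roughly target_utilization of its capacity, leaving headroom
    for spikes and for losing instances, and clamps the result to a fixed range.
    """
    if capacity_per_instance <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid capacity or utilization")
    needed = math.ceil(requests_per_sec / (capacity_per_instance * target_utilization))
    return max(min_instances, min(max_instances, needed))


# Hypothetical: 200 requests/sec per instance, 50% target utilization.
print(desired_instances(12_000, 200))  # 120 for a launch-week spike
print(desired_instances(300, 200))     # 3 on a quiet day
print(desired_instances(10, 200))      # 2, the floor kept for redundancy
```

Because the answer is recomputed from current load rather than forecast months ahead, a wrong guess about a device launch costs minutes of adjustment instead of a hardware order.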

4. We think cloud computing is the future.

One year ago, our cloud computing expertise was limited to research and some prior experience at other companies. Today Netflix is running one of the highest volume cloud computing deployments in the world. Engineers who choose to work at Netflix are developing skills that will be increasingly relevant over the years to come as cloud computing becomes the dominant platform for growing internet companies.

We believe this transition will take place. It will help foster a competitive environment for cloud service providers, which will help keep innovation high and prices dropping. We chose to be pioneers in this transition so we could leverage our investment as we grow, rather than double down on a model we expect will decline in the industry. We think this will help differentiate Netflix as a place to work, and it will help us scale our business.

Next up I’ll post on some of the mistakes we’ve made and lessons we’ve learned through our transition to AWS.

-john ciancutti.

Friday, December 10, 2010

Why we use and contribute to open source software

This is Kevin McEntee, VP of Systems & ECommerce Engineering here at Netflix. Netflix is a technology company. We develop and apply great software technology to deliver a great streaming video experience. Our budget, measured in dollars, time, people, and energy, is limited, and we must therefore focus our technology development efforts on the streaming video software that clearly differentiates Netflix and creates delight for our customers. These limits require that we stand on the shoulders of giants who have solved technology challenges shared in common by all companies that operate at Internet scale. I'm really just articulating the classic build vs. buy trade-off that everyone deals with when developing software. So, for the software we 'buy', the question becomes: how do you select the software to buy, and whom do you buy it from?


One choice is to buy from commercial vendors. To help select among commercial vendors I have been offered documentation, peer referrals, advertising, and dog 'n pony visits from those vendors. I've also been offered Starbucks gift cards, cameras, luxury suite boxes for major league baseball games, and 3 day golf junkets to Arizona. I've always declined this second set of offers. I love both baseball and playing golf but I don't see what they have to do with my decision for how to spend Netflix's money.


We do utilize some commercial software, but there is often the alternative of utilizing open source software, preferably open source software that implements an open standard. Open source projects often originate as a labor of love by software developers who are tired of seeing a shared problem solved over and over again in one-off solutions, or who realize that they can offer a simpler and more elegant alternative to a commercial product. The great thing about a good open source project that solves a shared challenge is that it develops its own momentum and is sustained for a long time by a virtuous cycle of continuous improvement. At Netflix we jumped on for the ride a long time ago, and we have benefited enormously from the virtuous cycles of actively evolving open source projects. We benefit from the continuous improvements provided by the community of contributors outside of Netflix. We also benefit by contributing back the changes we make to the projects: when we share our bug fixes and new features with the community, the community continues to improve upon them, and we complete the cycle by bringing those improvements back into Netflix.


Here is an incomplete sampling of the projects we utilize, most of which we have contributed back to: Hudson, Hadoop, Hive, Honu, Apache, Tomcat, Ant, Ivy, Cassandra, HBase, etc.


Netflix is hiring. If this blog resonates for you then I hope you'll visit http://jobs.netflix.com and apply for one of our open positions.


Thanks,

Kevin

Friday, December 3, 2010

Why We Choose HTML5 for User Experiences on Devices

This is John Ciancutti, VP of Personalization Technology here at Netflix. As you know, last month we released a completely new user experience for Netflix on the PS3. I thought I would tell you about the technology choices we’re using for user experiences on Netflix Ready Devices.

Our PS3 UI was written entirely in HTML5, on a custom build of WebKit ported to the PS3 by our crack team of engineers. If you don’t have a PS3, you can check out what the user experience looks and feels like here.

Doesn’t look like web technology, does it? That’s what HTML5 brings to the table: the freedom to create rich, dynamic and interactive experiences for any platform with a web browser. In fact, we’re also using HTML5 to create the user experience for our iPhone, iPad and Android applications.

What HTML5 is capable of speaks for itself, but there are other reasons HTML5 is the right choice for us. The technology teams I manage at Netflix are responsible for the movie and TV show discovery experience we offer our members, both the UIs and the back-end server systems. Our core mandate is to relentlessly experiment with the technologies, features and experiences we deliver to our members. We test every new idea so we can measure the impact we’re having on our customers. Are they finding more content to watch? Are they enjoying the TV shows and movies they watch more than before?

If so, the idea is a winner and we quickly roll it out to every customer. This approach allows us to understand when we’ve gotten it right or, conversely, to fail quickly and cheaply. We will post more fully on our testing methodology in the future. To work well, it requires great execution, a nimble team, and a highly iterative approach. Every week we’re rolling out new tests to one platform or another.
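The testing methodology itself is left for a future post, so the following only illustrates a common building block of this kind of testing: deterministically assigning members to test cells by hashing. `assign_cell`, the test name and the cell names are hypothetical, and this is not a description of Netflix’s allocation scheme.

```python
import hashlib
from typing import Sequence


def assign_cell(member_id: str, test_name: str, cells: Sequence[str]) -> str:
    """Deterministically place a member into one cell of a test (hypothetical scheme).

    Hashing the test name together with the member id means a member always sees the
    same experience for a given test, while different tests split members independently.
    """
    if not cells:
        raise ValueError("a test needs at least one cell")
    digest = hashlib.sha256(f"{test_name}:{member_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return cells[bucket % len(cells)]


cells = ["control", "new_row_layout"]
first = assign_cell("member-123", "ps3-home-test", cells)
assert first == assign_cell("member-123", "ps3-home-test", cells)  # stable across launches

counts = {c: 0 for c in cells}
for i in range(10_000):
    counts[assign_cell(f"member-{i}", "ps3-home-test", cells)] += 1
print(counts)
```

Stable assignment matters because a server-delivered UI is fetched fresh on every launch; without it, a member could bounce between experiences and blur the comparison between cells.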

That’s where HTML5 comes in. The technology is delivered from Netflix servers every time you launch our application. This means we can constantly update, test and improve the experience we offer. We’ve already run several experiments on the PS3, for example, and we’re working hard on more as I write this. Our customers don’t have to go through a manual process to install new software every time we make a change, it “just happens.”

This capacity for testing is so critical to how we innovate that we’re willing to forgo a native UI experience to accommodate it. Our market and our customer needs are moving very quickly, and server-delivered experiences allow us to keep up.

HTML5 also means that our world class UI engineers can seamlessly move between working on our website, our mobile experience, and our television-based applications.

We’re testing new ideas across all of these platforms, and every time a new feature wins for our members, it wins for Netflix. Just by using our service, you’re helping us make it better.

Update: Thank you for the comments and questions. I want to clarify for you that this post is focused on the user experience we build for devices, but not on video playback and DRM. Our video playback technology is fodder for an upcoming post; unfortunately we can’t use just HTML5 yet for video streaming on Netflix Ready Devices. We will fill you guys in on what we are doing, though, and where we hope to take that side of our technology stack in a future post. Thanks!

Wednesday, December 1, 2010

API Strategy Evolution at Netflix

In the '90s, many businesses began their first web site experiments, trying to determine what new opportunities this new technology might bring. Today, many businesses are starting the same experimentation with web APIs.

At Netflix we are using APIs to accelerate innovation on connected TV and mobile device platforms. Realizing this winning strategy was the end result of a few experiments over the last two years. We've just published a presentation describing the journey we took to discover this winning strategy and the lessons learned along the way.

(note: deck is best read on slideshare with speaker notes enabled)

Tomorrow at 10am PT our new director of API engineering, Daniel Jacobson, and I will discuss this evolution in API strategy and how it has enabled broad device reach and innovation for the Netflix service in a free O'Reilly webcast.

Michael Hart
Director of Engineering, Social Systems

Netflix Tech Blog

Hi there,

This is a new Netflix blog focused purely on technology issues. We'll share our perspectives, decisions and challenges regarding the software we build and use to create the Netflix service.

Netflix is a software company. We don't sell software, but nevertheless software is our lifeblood. We've been a software company since an accomplished engineer, Reed Hastings, co-founded the company in 1997. That competence has been critical to our growth and in our transition to a streaming-focused company from a company that mailed DVDs.

The markets we compete in move very quickly, and Netflix moves quickly within those markets. The deep talent in our engineering teams is the engine that drives our business. Whether it is getting our service onto cutting-edge new consumer devices, improving the start time and quality for streaming, building cloud-based architectures to support a business that's growing more than 50% year-over-year, or inventing new personalization algorithms, it all comes back to our incredible engineering team.

We intend to use this blog to share the details of our approach and the technical challenges we face. We'd like it to be a tool for prospective employees and fellow engineers in our industry to understand the company's take on the technology issues of our day. Thank you for reading and participating in the conversation!

Kevin McEntee, Greg Peters, and John Ciancutti, Netflix VPs of engineering