Monday, June 2, 2014

Building Netflix Playback with Self-Assembling Components


Our 48 million members are accustomed to seeing a screen like this, whether on their TV or one of the 1,000+ other Netflix-ready devices they enjoy watching on.  But the simple act of pressing play calls into action a deep and complex system that handles DRM licenses, contract evaluations, CDN selection, and more.  This system, known internally as the Playback Service, is responsible for making your Netflix streaming experience seem effortless.

The original Playback Service was built before Netflix was synonymous with streaming.  As our product matured, the existing architecture became more difficult to support and started showing signs of stress.  For example, we debuted HTML5 support last summer on IE11 in a major step towards standards-based playback on web platforms.  This required adoption of a new device security model that works with the emerging HTML5 Web Cryptography API.  It would have been very challenging to integrate this new model into our original architecture due to poor separation of business and security logic.  This and other shortcomings pushed us to re-imagine our design, leading to a radically different solution with the following as our key design requirements:

  • Operation at massive scale
  • High velocity innovation
  • Reusability of components

High-level Architecture

The new Playback Service uses self-assembling components to handle the enormous volume of traffic Netflix gets each day.  The following is an overview of that architecture, with special focus on how requests are delegated to these components, which are dynamically assembled and executed within an internal rule engine.

[Figure: Playback Service high-level architecture]

We will examine the building blocks of this architecture and show the benefits of using small, reusable components that are automatically wired together to create an emergent system.  This post will focus on the smallest units of the new architecture and the ideas behind self-assembly.  It will be followed up by others that go deeper into how we implement these concepts in the new architecture and address some challenges inherent to this approach.


We started from the bottom up, defining the building blocks of the new architecture in a way that promotes loose coupling and clear separation of concerns.  These building blocks are called Processors: entities that take zero or more inputs and generate no more than one output.  These are the smallest computational units of the architecture and behave like commands à la the Gang of Four's Command pattern.  Below is a diagram that shows a processor that takes A, B, C and generates D.
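As a rough sketch, such a processor can be expressed in Java as a command that advertises the keys it consumes and the single key it may produce.  The interface name and signatures below are illustrative assumptions, not the actual Netflix API:

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Illustrative sketch only: a processor declares the keys it consumes
// and the single key it may produce, then computes that output on demand.
interface Processor {
    Set<String> inputs();                        // e.g. {"A", "B", "C"}
    String output();                             // e.g. "D"
    Optional<Object> process(Map<String, Object> scope);
}

// A processor that takes A, B, C and generates D, as in the diagram.
class AbcToD implements Processor {
    public Set<String> inputs() { return Set.of("A", "B", "C"); }
    public String output() { return "D"; }
    public Optional<Object> process(Map<String, Object> scope) {
        // Toy computation: concatenate the three inputs into D.
        String d = "" + scope.get("A") + scope.get("B") + scope.get("C");
        return Optional.of(d);
    }
}
```

Because each processor only knows about its own inputs and output, it can be written and unit-tested without any knowledge of the rest of the system.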

This metaphor generalizes well given that many complex tasks can be subdivided into discrete, function-like steps.  It also matches the way most engineers already think about problem decomposition.  This definition enables processors to be as specialized as necessary, promoting low interconnectedness with other parts of the system.  These qualities make the system easier to reason about, enhance, and test.

The Playback Service, like other complex systems, can be modelled as a black box that takes inputs and generates outputs.  The conversion of some input A to an output E can be defined as a function f(A) = E and modelled as a single processor.
Of course, using a single processor only makes sense for very simple systems.  More complex services would be decomposed into finer-grained processors, as illustrated below.
Here you can see that the computation of E is handled as several processor invocations.  This flow resembles a series of function calls in a Java program, but there are some fundamental differences.  The difficulty with normal functions is that someone has to invoke them and decide how they are wired together.  Essentially, the decomposition of f(A) = E above is usually something the entire team needs to understand and maintain.  This places a cap on system evolution, since scaling the system means scaling each engineer.  It also increases the cost of scaling the team, since minimum ramp-up time is directly proportional to system complexity.
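Hand-wired, that decomposition looks like ordinary nested calls.  The step names below are illustrative, but the point is real: the caller must know the exact chain and its order:

```java
// Illustrative hand-wiring of f(A) = E: each step is a plain function,
// and the caller must know the full chain and invoke it in order.
class HandWired {
    static String aToB(String a) { return a + "->B"; }
    static String bToC(String b) { return b + "->C"; }
    static String cToE(String c) { return c + "->E"; }

    // The team-wide knowledge lives here: change the decomposition,
    // and every call site like this one must be updated to match.
    static String f(String a) {
        return cToE(bToC(aToB(a)));
    }
}
```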

But what if you could have functions that self-assemble?  What if processors could simply advertise their inputs/outputs and the wiring between them were an emergent property of that particular collection of processors?

Self-assembling Components

The hypothesis is that complex systems can be built efficiently if they are reduced to small, local problems that are solved in relative isolation with processors.  These small blocks are then automatically assembled to reveal a fully formed system.  Such a system would no longer require engineers to understand their entire scope before making significant contributions.  These systems would be free to scale without taxing their engineering teams proportionally.  Likewise, their teams could grow without investing in lots of onboarding time for each new member.

We can use the decomposition we did above for f(A) = E to illustrate how self-assembly would work.  Here is a simplified version of the diagram we saw earlier.
This system solves for A => E using the processors shown.  However, this could be a more sophisticated system containing other processors that do not participate in the computation of E given A.  Consider the following, where the system’s complete set of processors is included in the diagram.

The other processors are inactive for this computation, but various combinations would become active under different inputs.  Take a case where the inputs were J and W, and processors were in place to handle them such that the computation J,W => Y was possible.

The inputs J and W would trigger a different set of processors than before, leaving those that computed A => E dormant.

The set of processors triggered for a given set of inputs is an emergent property of the complete set of processors within the system.  An assembler mechanism determines when each processor can participate in the computation.  It makes this decision at runtime, allowing for fully dynamic wiring on each request.  As a result, processors can be organized in any way and do not need to be aware of each other.  This makes their functionality easier to add, remove, and update than conventional mechanisms like switch statements or inheritance, which are statically determined and more rigidly structured.
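A minimal sketch of such an assembler is shown below.  The fixed-point loop and the names are assumptions for illustration, not the production implementation: each processor advertises the keys it needs and the key it produces, and the assembler repeatedly fires any processor whose inputs are present and whose output is missing, until nothing more can run:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Illustrative assembler sketch: wiring is derived at runtime from the
// values present in the scope, not hard-coded by a caller.
class Assembler {
    record Proc(Set<String> inputs, String output,
                Function<Map<String, Object>, Object> body) {}

    private final List<Proc> procs = new ArrayList<>();

    void register(Proc p) { procs.add(p); }

    // Repeatedly fire any processor whose inputs are all present and
    // whose output has not been computed yet, until no processor can run.
    Map<String, Object> run(Map<String, Object> initial) {
        Map<String, Object> scope = new HashMap<>(initial);
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Proc p : procs) {
                if (!scope.containsKey(p.output())
                        && scope.keySet().containsAll(p.inputs())) {
                    scope.put(p.output(), p.body().apply(scope));
                    progress = true;
                }
            }
        }
        return scope;
    }
}
```

Given only the input A, a processor registered for J,W stays dormant, just as in the diagrams above; the same collection of processors answers a J,W request without any rewiring.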

Extending traditional systems often means ramping up on a lot of code to understand where the relevant inflection points are for a change or feature.  Self-assembly relaxes the urgency for this deeper context and shifts the focus towards getting the right interaction designs for each component.  It also enables more thorough testing since processors are naturally isolated from each other and simpler to unit test.  They can also be assembled and run as a group with mocked dependencies to facilitate thorough end-to-end validation.

Self-assembly frees engineers to focus on solving local problems and adding value without having to wrestle with the entire end-to-end context.  State validation is a good example of an enhancement that requires only local context with this architecture.  The computation of J,W => Y above can be enhanced to include additional validation of V whenever it is generated.  This could be achieved by adding a new processor that operates on V as an input, as illustrated below.

The new processor V => V would take a value and raise an error if that value is invalid for some reason.  This validation would be triggered whenever V is present in the system, whether or not J,W => Y is being computed.  This is by design: each processor is reused whenever its services are needed.
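As a rough Java sketch (the class name and the particular check are illustrative assumptions), such a validator takes V, passes it through unchanged, and raises an error when the value fails its check:

```java
// Illustrative V => V validator sketch: it produces no new value, only
// passes V through, and raises an error if V fails its check.  It fires
// whenever V appears, regardless of which computation produced it.
class RangeValidator {
    static Object validate(Object v) {
        // Hypothetical rule for illustration: V must be a non-negative integer.
        if (!(v instanceof Integer i) || i < 0) {
            throw new IllegalArgumentException("invalid V: " + v);
        }
        return v; // V => V: the value passes through unchanged
    }
}
```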

This validator pattern emerges often in the new Playback Service.  For example, we use it to detect whether data sent by clients has been tampered with in flight.  This is done with an HMAC calculation that verifies the data matches a client-provided hash value.  As with other processors, the integrity protection provided this way is available for use during any request.
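An integrity check in that spirit can be sketched with the standard `javax.crypto` API: recompute the HMAC over the payload and compare it, in constant time, against the client-provided tag.  Key handling, encodings, and the class name here are assumptions for illustration, not the production code:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.MessageDigest;

// Illustrative HMAC integrity check: recompute the tag over the payload
// and compare against the client-provided tag in constant time.
class IntegrityCheck {
    static byte[] hmacSha256(byte[] key, byte[] payload) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(payload);
    }

    static boolean verify(byte[] key, byte[] payload, byte[] clientTag)
            throws Exception {
        // MessageDigest.isEqual performs a constant-time comparison,
        // avoiding timing side channels when rejecting tampered data.
        return MessageDigest.isEqual(hmacSha256(key, payload), clientTag);
    }
}
```

Packaged as a processor, a check like this fires whenever a payload and its client-provided tag are present in the scope, making the protection available on any request.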

Challenges of Self-assembly

The use of self-assembling components offers clear advantages over hand-wiring.  It enables fluid architectures that can change dynamically at runtime and simplifies feature isolation so components can evolve rapidly with minimal impact to the overall system.  Moreover, it decouples team size from system complexity so the two can scale independently.

Despite these benefits, building a working solution that enables self-assembly is non-trivial.  Such a system has to decide which operations are executed when, and in what order.  It has to manage the computation pipeline without adding too much overhead or complexity, all while scaling up with the set of processors.  It also needs to be relatively unobtrusive so developers can remain focused on building the service.  These were some of the challenges my team had to overcome when building the new Playback architecture atop the concepts of self-assembly.


Subsequent blog posts will take us deeper into the workings of the new Playback Service architecture and provide more details about how we solved the challenges above and other issues intrinsic to self-assembly.  We will also be discussing how this architecture is designed to enable fully dynamic end-points (where the set of rules/processors can change for each request) as well as dynamic services where the set of end-points can change for a running server.

The new Playback Service architecture based on self-assembling components provides a flexible programming model that is easy to develop and test.  It greatly improves our ability to innovate as we continue to enhance the viewing experience for our members.

We are always looking for talented engineers to join us.  So reach out if you are excited about this kind of engineering endeavor and would like to learn more about this and other things we are working on.