Thursday, January 12, 2017

Crafting a high-performance TV user interface using React


The Netflix TV interface is constantly evolving as we strive to figure out the best experience for our members. For example, after A/B testing, eye-tracking research, and customer feedback we recently rolled out video previews to help members make better decisions about what to watch. We’ve written before about how our TV application consists of an SDK installed natively on the device, a JavaScript application that can be updated at any time, and a rendering layer known as Gibbon. In this post we’ll highlight some of the strategies we’ve employed along the way to optimize our JavaScript application performance.

React-Gibbon

In 2015, we embarked on a wholesale rewrite and modernization of our TV UI architecture. We decided to use React because its one-way data flow and declarative approach to UI development make it easier to reason about our app. Obviously, we’d need our own flavor of React since at that time it only targeted the DOM. We were able to create a prototype that targeted Gibbon pretty quickly. This prototype eventually evolved into React-Gibbon and we began to work on building out our new React-based UI.

React-Gibbon’s API would be very familiar to anyone who has worked with React-DOM. The primary difference is that instead of divs, spans, inputs, etc., we have a single “widget” drawing primitive that supports inline styling.

React.createClass({
   render() {
       return <Widget style={{ text: 'Hello World', textSize: 20 }} />;
   }
});

Performance is a key challenge

Our app runs on hundreds of different devices, from the latest game consoles like the PS4 Pro to budget consumer electronics devices with limited memory and processing power. The low-end machines we target can often have sub-GHz single core CPUs, low memory and limited graphics acceleration. To make things even more challenging, our JavaScript environment is an older non-JIT version of JavaScriptCore. These restrictions make super responsive 60fps experiences especially tricky and drive many of the differences between React-Gibbon and React-DOM.

Measure, measure, measure

When approaching performance optimization it’s important to first identify the metrics you will use to measure the success of your efforts. We use the following metrics to gauge overall application performance:

  • Key Input Responsiveness - the time taken to render a change in response to a key press
  • Time To Interactivity - the time to start up the app
  • Frames Per Second - the consistency and smoothness of our animations
  • Memory Usage


The strategies outlined below are primarily aimed at improving key input responsiveness. They were all identified, tested and measured on our devices and are not necessarily applicable in other environments. As with all “best practice” suggestions it is important to be skeptical and verify that they work in your environment, and for your use case. We started off by using profiling tools to identify what code paths were executing and what their share of the total render time was; this led us to some interesting observations.

Observation: React.createElement has a cost

When Babel transpiles JSX it converts it into a number of React.createElement function calls which when evaluated produce a description of the next Component to render. If we can predict what the createElement function will produce, we can inline the call with the expected result at build time rather than at runtime.


// JSX
render() {
   return <MyComponent key='mykey' prop1='foo' prop2='bar' />;
}

// Transpiled
render() {
   return React.createElement(MyComponent, { key: 'mykey', prop1: 'foo', prop2: 'bar' });
}

// With inlining
render() {
   return {
       type: MyComponent,
       props: {
           prop1: 'foo',
           prop2: 'bar'
       },
       key: 'mykey'
   };
}


As you can see we have removed the cost of the createElement call completely, a triumph for the “can we just not?” school of software optimization.
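
To make the saved work concrete, here is a rough sketch of what an element factory does on every call. `createElementSketch` is a simplified stand-in written for illustration, not React's actual source: it leaves out children, default props, refs and owner tracking, and the names are assumptions.

```javascript
// Simplified stand-in for an element factory (hypothetical, not React's source).
// Each call allocates a props object and iterates over every key in `config`.
function createElementSketch(type, config) {
  const props = {};
  let key = null;
  for (const name in config) {
    if (!Object.prototype.hasOwnProperty.call(config, name)) continue;
    if (name === 'key') {
      key = String(config.key);   // `key` is lifted out of props
    } else {
      props[name] = config[name]; // everything else is copied one key at a time
    }
  }
  return { type, props, key };
}

function MyComponent() {}

// Runtime path: one function call plus a loop over the config keys.
const viaCall = createElementSketch(MyComponent, { key: 'mykey', prop1: 'foo', prop2: 'bar' });

// Inlined path: the same shape written as a literal, with no call and no loop.
const inlined = { type: MyComponent, props: { prop1: 'foo', prop2: 'bar' }, key: 'mykey' };

console.log(JSON.stringify(viaCall.props) === JSON.stringify(inlined.props)); // true
```

Both paths produce the same description of the element; the literal simply skips the call and the per-key copy.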

We wondered whether it would be possible to apply this technique across our whole application and avoid calling createElement entirely. What we found was that if we used a ref on our elements, createElement needs to be called in order to hook up the owner at runtime. This also applies if you’re using the spread operator which may contain a ref value (we’ll come back to this later).

We use a custom Babel plugin for element inlining, but there is an official plugin that you can use right now. Rather than an object literal, the official plugin will emit a call to a helper function that is likely to disappear thanks to the magic of V8 function inlining. After applying our plugin there were still quite a few components that weren’t being inlined, specifically Higher-order Components which make up a decent share of the total components being rendered in our app.

Problem: Higher-order Components can’t use Inlining

We love Higher-order Components (HOCs) as an alternative to mixins. HOCs make it easy to layer on behavior while maintaining a separation of concerns. We wanted to take advantage of inlining in our HOCs, but we ran into an issue: HOCs usually act as a pass-through for their props. This naturally leads to the use of the spread operator, which prevents the Babel plugin from being able to inline.

When we began the process of rewriting our app, we decided that all interactions with the rendering layer would go through declarative APIs. For example, instead of doing:

componentDidMount() {
   this.refs.someWidget.focus()
}

In order to move application focus to a particular Widget, we instead implemented a declarative focus API that allows us to describe what should be focused during render like so:

render() {
   return <Widget focused={true} />;
}

This had the fortunate side-effect of allowing us to avoid the use of refs throughout the application. As a result we were able to apply inlining regardless of whether the code used a spread or not.


// before inlining
render() {
   return <MyComponent {...this.props} />;
}

// after inlining
render() {
   return {
       type: MyComponent,
       props: this.props
   };
}

This greatly reduced the number of function calls and the amount of property merging that we previously had to do, but it did not eliminate them completely.

Problem: Property interception still requires a merge

After we had managed to inline our components, our app was still spending a lot of time merging properties inside our HOCs. This was not surprising, as HOCs often intercept incoming props in order to add their own or change the value of a particular prop before forwarding on to the wrapped component.

We analyzed how stacks of HOCs scaled with prop count and component depth on one of our devices, and the results were informative.


[Figure: render time by prop count and component depth]


They showed that there is a roughly linear relationship between the number of props moving through the stack and the render time for a given component depth.

Death by a thousand props

Based on our findings we realized that we could improve the performance of our app substantially by limiting the number of props passed through the stack. We found that groups of props were often related and always changed at the same time. In these cases, it made sense to group those related props under a single “namespace” prop. If a namespace prop can be modeled as an immutable value, subsequent shouldComponentUpdate calls can be optimized further by checking referential equality rather than doing a deep comparison. This gave us some good wins but eventually we found that we had reduced the prop count as much as was feasible. It was now time to resort to more extreme measures.
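
As a rough illustration of the namespace idea, the sketch below groups three related text props under one frozen object and compares it by reference. The names `makeTextStyle`, `textStyle` and `shouldUpdate` are hypothetical and chosen only for this example.

```javascript
// Hypothetical example: three related props grouped under one "namespace" prop.
// The object is frozen so that any change has to produce a new object.
function makeTextStyle(size, weight, color) {
  return Object.freeze({ size, weight, color });
}

// shouldComponentUpdate-style check: one reference comparison per namespace prop
// instead of comparing every nested value.
function shouldUpdate(prevProps, nextProps) {
  return prevProps.textStyle !== nextProps.textStyle || prevProps.title !== nextProps.title;
}

const shared = makeTextStyle(20, 'bold', 'blue');
const prev = { title: 'Hello', textStyle: shared };

console.log(shouldUpdate(prev, { title: 'Hello', textStyle: shared }));                          // false
console.log(shouldUpdate(prev, { title: 'Hello', textStyle: makeTextStyle(24, 'bold', 'blue') })); // true
```

The reference check only pays off if the same object is reused while its values are unchanged; a freshly built object with identical values still counts as a change.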

Merging props without key iteration

Warning, here be dragons! This is not recommended and most likely will break many things in weird and unexpected ways.

After reducing the props moving through our app we were experimenting with other ways to reduce the time spent merging props between HOCs. We realized that we could use the prototype chain to achieve the same goals while avoiding key iteration.


// before proto merge
render() {
   const newProps = Object.assign({}, this.props, { prop1: 'foo' })
   return <MyComponent {...newProps} />;
}

// after proto merge
render() {
   const newProps = { prop1: 'foo' };
   newProps.__proto__ = this.props;
   return {
       type: MyComponent,
       props: newProps
   };
}

In the example above we reduced the render time for the case of 100 props at a depth of 100 from ~500ms to ~60ms. Be advised that this approach introduced some interesting bugs, namely when this.props is a frozen object. When this happens, the prototype chain approach only works if the __proto__ is assigned after the newProps object is created. Needless to say, if you are not the owner of newProps it would not be wise to assign the prototype at all.
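
The behavior of a prototype-based merge can be explored outside of React with a few lines of plain JavaScript. In the sketch below, `protoMerge` is a hypothetical helper, and `Object.setPrototypeOf` stands in for the `__proto__` assignment. It shows standard language semantics only and is not a reproduction of the specific bugs we hit.

```javascript
// Hypothetical helper: layer a few overrides on top of incoming props using the
// prototype chain instead of copying every key.
function protoMerge(baseProps, overrides) {
  const merged = Object.assign({}, overrides); // a fresh object we own; copies only the overrides
  Object.setPrototypeOf(merged, baseProps);    // standard counterpart of merged.__proto__ = baseProps
  return merged;
}

const incoming = Object.freeze({ prop1: 'original', prop2: 'bar' });
const merged = protoMerge(incoming, { prop1: 'foo' });

console.log(merged.prop1);        // foo (own property shadows the inherited one)
console.log(merged.prop2);        // bar (found on the prototype)
console.log(Object.keys(merged)); // own keys only, so code that enumerates keys misses prop2

// Writing to a key that exists only on a frozen prototype is rejected:
// Reflect.set reports false here, and a plain assignment would throw in strict mode.
const accepted = Reflect.set(merged, 'prop2', 'changed');
console.log(accepted, merged.prop2); // false bar
```

Two consequences are worth keeping in mind: inherited props are invisible to `Object.keys`, and a frozen prototype blocks later writes to inherited keys on the merged object.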

Problem: “Diffing” styles was slow

Once React knows the elements it needs to render it must then diff them with the previous values in order to determine the minimal changes that must be applied to the actual DOM elements. Through profiling we found that this process was costly, especially during mount - partly due to the need to iterate over a large number of style properties.

Separate out style props based on what’s likely to change

We found that often many of the style values we were setting were never actually changed. For example, say we have a Widget used to display some dynamic text value. It has the properties text, textSize, textWeight and textColor. The text property will change during the lifetime of this Widget but we want the remaining properties to stay the same. The cost of diffing the 4 widget style props is spent on each and every render. We can reduce this by separating out the things that could change from the things that don't.

const memoizedStylesObject = { textSize: 20, textWeight: 'bold', textColor: 'blue' };


<Widget staticStyle={memoizedStylesObject} style={{ text: this.props.text }} />

If we are careful to memoize the memoizedStylesObject object, React-Gibbon can then check for referential equality and only diff its values if that check proves false. This has no effect on the time it takes to mount the widget but pays off on every subsequent render.
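
The sketch below shows the kind of check this enables. `applyStyle` is a hypothetical stand-in for a style diff, not React-Gibbon's actual code, and it ignores removed keys for brevity; it returns the number of writes so the effect is easy to see.

```javascript
// Hypothetical style diff that skips the key iteration when the style object
// is the same reference as last time.
function applyStyle(widget, nextStyle, prevStyle) {
  if (nextStyle === prevStyle) return 0; // same reference: nothing to diff
  let writes = 0;
  for (const key in nextStyle) {
    if (!prevStyle || nextStyle[key] !== prevStyle[key]) {
      widget[key] = nextStyle[key];
      writes++;
    }
  }
  return writes;
}

const staticStyle = { textSize: 20, textWeight: 'bold', textColor: 'blue' };
const widget = {};

// Mount: every value has to be applied once.
const mountWrites =
  applyStyle(widget, staticStyle, undefined) + applyStyle(widget, { text: 'a' }, undefined);

// Update: the static object is skipped by reference, only `text` is diffed.
const updateWrites =
  applyStyle(widget, staticStyle, staticStyle) + applyStyle(widget, { text: 'b' }, { text: 'a' });

console.log(mountWrites, updateWrites); // 4 1
```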

Why not avoid the iteration altogether?

Taking this idea further, if we know what style props are being set on a particular widget, we can write a function that does the same work without having to iterate over any keys. We wrote a custom Babel plugin that performed static analysis on component render methods. It determines which styles are going to be applied and builds a custom diff-and-apply function which is then attached to the widget props.


// This function is written by the static analysis plugin
function __update__(widget, nextProps, prevProps) {
   var style = nextProps.style,
       prev_style = prevProps && prevProps.style;


   if (prev_style) {
       var text = style.text;
       if (text !== prev_style.text) {
           widget.text = text;
       }
   } else {
       widget.text = style.text;
   }
}


React.createClass({
   render() {
       return (
           <Widget __update__={__update__} style={{ text: this.props.title }}  />
       );
   }
});


Internally React-Gibbon looks for the presence of the “special” __update__ prop and will skip the usual iteration over previous and next style props, instead applying the properties directly to the widget if they have changed. This had a huge impact on our render times at the cost of increasing the size of the distributable.
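
A minimal sketch of how a renderer could honor such a prop is shown below. `updateWidget`, `genericUpdate` and `updateText` are hypothetical names written for this illustration; React-Gibbon's internals are not shown in this post and may differ.

```javascript
// Generic path: iterate over every style key (what the special prop avoids).
function genericUpdate(widget, nextProps, prevProps) {
  const style = nextProps.style || {};
  const prevStyle = (prevProps && prevProps.style) || {};
  for (const key in style) {
    if (style[key] !== prevStyle[key]) widget[key] = style[key];
  }
}

// Hypothetical renderer hook: prefer the generated updater when one is attached.
function updateWidget(widget, nextProps, prevProps) {
  const update =
    typeof nextProps.__update__ === 'function' ? nextProps.__update__ : genericUpdate;
  update(widget, nextProps, prevProps);
  return widget;
}

// Hand-written stand-in for a generated updater that knows only `text` is ever set.
function updateText(widget, nextProps, prevProps) {
  const text = nextProps.style.text;
  if (!prevProps || text !== prevProps.style.text) widget.text = text;
}

const fast = updateWidget({}, { __update__: updateText, style: { text: 'Title' } }, null);
const slow = updateWidget({}, { style: { text: 'Title' } }, null);
console.log(fast.text === slow.text); // true
```

Both paths leave the widget in the same state; the generated updater just reaches it without looping over keys.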

Performance is a feature

Our environment is unique, but the techniques we used to identify opportunities for performance improvements are not. We measured, tested and verified all of our changes on real devices. Those investigations led us to discover a common theme: key iteration was expensive. As a result we set out to identify the merges in our application and determine whether they could be optimized. Here’s a list of some of the other things we’ve done in our quest to improve performance:

  • Custom Composite Component - hyper optimized for our platform
  • Pre-mounting screens to improve perceived transition time
  • Component pooling in Lists
  • Memoization of expensive computations

Building a Netflix TV UI experience that can run on the variety of devices we support is a fun challenge. We nurture a performance-oriented culture on the team and are constantly trying to improve the experiences for everyone, whether they use the Xbox One S, a smart TV or a streaming stick. Come join us if that sounds like your jam!


Tuesday, January 3, 2017

Netflix Now Supports Ultra HD 4K on Windows 10 with New Intel Core Processors

We're excited to bring Netflix support for Ultra HD 4K to Windows 10, making the vast catalog of Netflix TV shows and movies in 4K even more accessible for our members around the world to watch in the best picture quality.

For the last several years, we've been working with our partners across the spectrum of CE devices to add support for the richer visual experience of 4K.  Since launching on Smart TVs in 2014, many different devices can now play our 4K content, including Smart TVs, set top boxes and game consoles.  We are pleased to add Windows 10 and 7th Gen Intel® Core™ CPUs to that list.

Microsoft and Intel both did great work to enable 4K on their platforms.  Intel added support for new, more efficient codecs necessary to stream 4K as well as hardware-based content security in their latest CPUs.  Microsoft enhanced the Edge browser with the latest HTML5 video support and made it work beautifully with Intel's latest processors.  The sum total is an enriched Netflix experience. Thanks to Microsoft's Universal Windows Platform, our app for Windows 10 includes the same 4K support as the Edge browser.

As always, you can enjoy all of our movies and TV shows on all supported platforms. We are working hard with our partners to further expand device support of 4K. An increasing number of our Netflix originals are shot, edited, and delivered in this format, with more than 600 hours available to watch, such as Stranger Things, The Crown, Gilmore Girls: A Year in the Life and Marvel's Luke Cage.

By Matt Trunnell, Nick Eddy, and Greg Wallace-Freedman

Monday, December 12, 2016

Netflix Conductor: A microservices orchestrator

The Netflix Content Platform Engineering team runs a number of business processes which are driven by asynchronous orchestration of tasks executing on microservices.  Some of these are long running processes spanning several days. These processes play a critical role in getting titles ready for streaming to our viewers across the globe.

A few examples of these processes are:

  • Studio partner integration for content ingestion
  • IMF based content ingestion from our partners
  • Process of setting up new titles within Netflix
  • Content ingestion, encoding, and deployment to CDN

Traditionally, some of these processes had been orchestrated in an ad-hoc manner using a combination of pub/sub, making direct REST calls, and using a database to manage the state.  However, as the number of microservices grows and the complexity of the processes increases, getting visibility into these distributed workflows becomes difficult without a central orchestrator.

We built Conductor “as an orchestration engine” to address the following requirements, remove the need for boilerplate in apps, and provide a reactive flow:

  • Blueprint based. A JSON DSL based blueprint defines the execution flow.
  • Tracking and management of workflows.
  • Ability to pause, resume and restart processes.
  • User interface to visualize process flows.
  • Ability to synchronously process all the tasks when needed.
  • Ability to scale to millions of concurrently running process flows.
  • Backed by a queuing service abstracted from the clients.
  • Be able to operate over HTTP or other transports e.g. gRPC.

Conductor was built to serve the above needs and has been in use at Netflix for almost a year now. To date, it has helped orchestrate more than 2.6 million process flows ranging from simple linear workflows to very complex dynamic workflows that run over multiple days.

Today, we are open sourcing Conductor to the wider community hoping to learn from others with similar needs and enhance its capabilities.  You can find the developer documentation for Conductor here.

Why not peer to peer choreography?

With peer to peer task choreography, we found it was harder to scale with growing business needs and complexities.  The pub/sub model worked for the simplest of flows, but quickly highlighted some of the issues associated with the approach:
  • Process flows are “embedded” within the code of multiple applications
  • Often, there is tight coupling and assumptions around input/output, SLAs etc, making it harder to adapt to changing needs
  • Almost no way to systematically answer “What is remaining for a movie's setup to be complete?”

Why Microservices?

In a microservices world, a lot of business process automations are driven by orchestrating across services. Conductor enables orchestration across services while providing control and visibility into their interactions. Having the ability to orchestrate across microservices also helped us leverage existing services to build new flows, or update existing flows to use Conductor, very quickly, effectively providing an easier route to adoption.

Architectural Overview


At the heart of the engine is a state machine service aka Decider service. As the workflow events occur (e.g. task completion, failure etc.), Decider combines the workflow blueprint with the current state of the workflow, identifies the next state, and schedules appropriate tasks and/or updates the status of the workflow.
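
The decision step can be pictured with a small sketch. The `decide` function below is a hypothetical, heavily simplified decider for a purely linear blueprint; the state shape and the status names are assumptions for illustration and do not show Conductor's actual implementation, which also handles control tasks such as fork, join and decision.

```javascript
// Hypothetical decider for a linear blueprint.
// `blueprint.tasks` is an ordered list; `state.completed` holds finished reference names.
function decide(blueprint, state) {
  if (state.failed) return { status: 'FAILED', schedule: [] };
  const next = blueprint.tasks.find(
    task => !state.completed.includes(task.taskReferenceName)
  );
  if (!next) return { status: 'COMPLETED', schedule: [] };
  return { status: 'RUNNING', schedule: [next.taskReferenceName] };
}

const blueprint = {
  name: 'encode_and_deploy',
  tasks: [
    { taskReferenceName: 'inspect' },
    { taskReferenceName: 'encode' },
    { taskReferenceName: 'publish' }
  ]
};

// Each workflow event (for example a task completion) triggers a new decision.
console.log(decide(blueprint, { completed: [] }).schedule[0]);                            // inspect
console.log(decide(blueprint, { completed: ['inspect'] }).schedule[0]);                   // encode
console.log(decide(blueprint, { completed: ['inspect', 'encode', 'publish'] }).status);   // COMPLETED
```

Because the decision depends only on the blueprint and the recorded state, it can be re-evaluated after every event without the decider holding any state of its own.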

Decider works with a distributed queue to manage scheduled tasks.  We have been using dyno-queues on top of Dynomite for managing distributed delayed queues. The queue recipe was open sourced earlier this year and here is the blog post.

Task Worker Implementation

Tasks, implemented by worker applications, communicate via the API layer. Workers achieve this by either implementing a REST endpoint that can be called by the orchestration engine or by implementing a polling loop that periodically checks for pending tasks. Workers are intended to be idempotent stateless functions. The polling model allows us to handle backpressure on the workers and provide auto-scalability based on the queue depth when possible. Conductor provides APIs to inspect the workload size for each worker that can be used to autoscale worker instances.
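
The polling model can be sketched as follows. `TaskQueue` is a hypothetical in-memory stand-in for the engine's task API, and the method names (`poll`, `update`, `size`) are assumptions for this example rather than Conductor's real endpoints.

```javascript
// Hypothetical in-memory stand-in for the engine's task queue API.
class TaskQueue {
  constructor() { this.pending = []; this.results = []; }
  add(task) { this.pending.push(task); }
  poll(taskType) { // returns one pending task of this type, or null
    const index = this.pending.findIndex(task => task.type === taskType);
    return index === -1 ? null : this.pending.splice(index, 1)[0];
  }
  update(taskId, status, output) { this.results.push({ taskId, status, output }); }
  size(taskType) { return this.pending.filter(task => task.type === taskType).length; }
}

// One iteration of a worker's polling loop; a real worker would repeat this on a timer.
function pollOnce(queue, taskType, handler) {
  const task = queue.poll(taskType);
  if (!task) return false; // nothing pending for this worker
  try {
    queue.update(task.id, 'COMPLETED', handler(task.input));
  } catch (err) {
    queue.update(task.id, 'FAILED', { reason: String(err.message) });
  }
  return true;
}

const queue = new TaskQueue();
queue.add({ id: 't1', type: 'encode', input: { url: 'file-a' } });
queue.add({ id: 't2', type: 'encode', input: { url: 'file-b' } });

// Stateless handler: the output depends only on the input.
const encode = input => ({ location: input.url + '.encoded' });

while (pollOnce(queue, 'encode', encode)) { /* drain the queue */ }
console.log(queue.results.length, queue.size('encode')); // 2 0
```

Since workers pull work at their own pace, a slow worker simply leaves tasks in the queue, and the queue depth reported by `size` is the kind of signal an autoscaler could act on.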


Worker communication with the engine

API Layer

The APIs are exposed over HTTP - using HTTP allows for ease of integration with different clients. However, adding another protocol (e.g. gRPC) should be possible and relatively straightforward.

Storage

We use Dynomite “as a storage engine” along with Elasticsearch for indexing the execution flows. The storage APIs are pluggable and can be adapted for various storage systems, including traditional RDBMSs or NoSQL stores like Apache Cassandra.

Key Concepts

Workflow Definition

Workflows are defined using a JSON based DSL.  A workflow blueprint defines a series of tasks that need to be executed.  Each task is either a control task (e.g. fork, join, decision, sub workflow, etc.) or a worker task.  Workflow definitions are versioned, providing flexibility in managing upgrades and migration.

An outline of a workflow definition:
{
 "name": "workflow_name",
 "description": "Description of workflow",
 "version": 1,
 "tasks": [
   {
     "name": "name_of_task",
     "taskReferenceName": "ref_name_unique_within_blueprint",
     "inputParameters": {
       "movieId": "${workflow.input.movieId}",
       "url": "${workflow.input.fileLocation}"
     },
     "type": "SIMPLE",
     ... (any other task specific parameters)
   },
   {}
   ...
 ],
 "outputParameters": {
   "encoded_url": "${encode.output.location}"
 }
}

Task Definition

Each task’s behavior is controlled by its template, known as a task definition. A task definition provides control parameters for each task, such as timeouts and retry policies. A task can be a worker task implemented by an application or a system task executed by the orchestration server.  Conductor provides out-of-the-box system tasks such as Decision, Fork, Join, and Sub Workflows, and an SPI that allows plugging in custom system tasks. We have added support for HTTP tasks that facilitate making calls to REST services.

JSON snippet of a task definition:
{
 "name": "encode_task",
 "retryCount": 3,
 "timeoutSeconds": 1200,
 "inputKeys": [
   "sourceRequestId",
   "qcElementType"
 ],
 "outputKeys": [
   "state",
   "skipped",
   "result"
 ],
 "timeoutPolicy": "TIME_OUT_WF",
 "retryLogic": "FIXED",
 "retryDelaySeconds": 600,
 "responseTimeoutSeconds": 3600
}

Inputs / Outputs

Input to a task is a map whose values come from the workflow instantiation or from the output of some other task. This configuration allows the inputs and outputs of the workflow or of other tasks to be routed as inputs to tasks that can then act upon them. For example, the output of an encoding task can be provided to a publish task as input to deploy to CDN.

JSON snippet for defining task inputs:
{
 "name": "name_of_task",
 "taskReferenceName": "ref_name_unique_within_blueprint",
 "inputParameters": {
   "movieId": "${workflow.input.movieId}",
   "url": "${workflow.input.fileLocation}"
 },
 "type": "SIMPLE"
}
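
To illustrate how expressions such as "${workflow.input.movieId}" could be turned into concrete task input, here is a minimal sketch. `resolvePath` and `resolveInputs` are hypothetical helpers, and the sample values are made up; this is not Conductor's actual resolution logic.

```javascript
// Hypothetical resolver for "${a.b.c}" expressions.
function resolvePath(context, path) {
  return path
    .split('.')
    .reduce((node, key) => (node == null ? undefined : node[key]), context);
}

function resolveInputs(inputParameters, context) {
  const resolved = {};
  for (const [name, template] of Object.entries(inputParameters)) {
    // Only strings that consist of a single ${...} expression are resolved.
    const match = typeof template === 'string' && /^\$\{([^}]+)\}$/.exec(template);
    resolved[name] = match ? resolvePath(context, match[1]) : template;
  }
  return resolved;
}

// Made-up workflow input used only for this example.
const context = {
  workflow: { input: { movieId: 'movie-123', fileLocation: 'example/location' } }
};

const taskInput = resolveInputs(
  { movieId: '${workflow.input.movieId}', url: '${workflow.input.fileLocation}' },
  context
);
console.log(taskInput.movieId, taskInput.url); // movie-123 example/location
```

The same lookup would work for another task's output if that output were placed in the context under the task's reference name.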

An Example

Let’s look at a very simple encode and deploy workflow:



There are a total of 3 worker tasks and a control task (Errors) involved:

  1. Content Inspection: Checks the file at input location for correctness/completeness
  2. Encode: Generates a video encode
  3. Publish: Publishes to CDN

These three tasks are implemented by different workers which poll for pending tasks using the task APIs. These are ideally idempotent tasks that operate on the input given to the task, perform work, and update the status back.

As each task is completed, the Decider evaluates the state of the workflow instance against the blueprint (for the version corresponding to the workflow instance) and identifies the next set of tasks to be scheduled, or completes the workflow if all tasks are done.

UI

The UI is the primary mechanism for monitoring and troubleshooting workflow executions. It provides much-needed visibility into the processes by allowing searches based on various parameters, including input/output parameters, and presents the blueprint and the paths an execution has taken visually, to make process flow execution easier to understand. For each workflow instance, the UI shows the following details of each task execution:

  • Timestamps for when the task was scheduled, picked up by the worker and completed.
  • If the task has failed, the reason for failure.
  • Number of retry attempts
  • Host on which the task was executed.
  • Inputs provided to the task and output from the task upon completion.

Here’s a UI snippet from a kitchen sink workflow used to generate performance numbers:




Other solutions considered

Amazon SWF

We started with an early version built on Simple Workflow (SWF) from AWS. However, we chose to build Conductor given some of the limitations of SWF:
  • Need for blueprint based orchestration, as opposed to programmatic deciders as required by SWF.
  • UI for visualization of flows.
  • Need for more synchronous nature of APIs when required (rather than purely message based)
  • Need for indexing inputs and outputs for workflow and tasks and ability to search workflows based on that.
  • Need to maintain a separate data store to hold workflow events to recover from failures, search etc.

AWS Step Functions

The recently announced AWS Step Functions added some of the features we were looking for in an orchestration engine. There is potential for Conductor to adopt the states language to define workflows.

Some Stats

Below are some of the stats from the production instance we have been running for a little over a year now. Most of these workflows are used by content platform engineering in supporting various flows for content acquisition, ingestion and encoding.

Total instances created YTD: 2.6 million
No. of distinct workflow definitions: 100
No. of unique workers: 190
Avg. no. of tasks per workflow definition: 6
Largest workflow: 48 tasks

Future considerations

  • Support for AWS Lambda (or similar) functions as tasks for serverless simple tasks.
  • Tighter integration with container orchestration frameworks that will allow worker instance auto-scalability.
  • Logging execution data for each task. We think this is a useful addition that helps in troubleshooting.
  • Ability to create and manage the workflow blueprints from the UI.
  • Support for states language.

If you like the challenges of building distributed systems and are interested in building the Netflix studio ecosystem and the content pipeline at scale, check out our job openings.

By Viren Baraiya, Vikram Singh