Thursday, December 17, 2015

HTML5 Video is now supported in Firefox

Today we’re excited to announce the availability of our HTML5 player in Firefox! Windows support is rolling out this week, and OS X support will roll out next year.

Firefox ships with the very latest versions of the HTML5 Premium Video Extensions. These include the Media Source Extensions (MSE), which enable our video streaming algorithms to adapt to your available bandwidth; the Encrypted Media Extensions (EME), which allow for the viewing of protected content; and the Web Cryptography API (WebCrypto), which implements the cryptographic functions used by our open source Message Security Layer client-server protocol.

We worked closely with Mozilla and Adobe throughout development. Adobe supplies a content decryption module (CDM) that powers the EME API and allows protected content to play. We were pleased to find through our joint field testing that Adobe Primetime's CDM, Mozilla’s <video> tag, and our player all work together seamlessly to provide a high quality viewing experience in Firefox. With the new Premium Video Extensions, Firefox users will no longer need to take an extra step of installing a plug-in to watch Netflix.

We’re gratified that our HTML5 player support now extends to the latest versions of all major browsers, including Firefox, IE, Edge, Safari, and Chrome. Upgrade today to the latest version of your browser to get our best-in-class playback experience.

Wednesday, December 16, 2015

An update to our Windows 10 app

We have published a new version of our Windows 10 app to the Windows Store. This update features an updated user experience that is powered by an entirely new implementation on the Universal Windows Platform.

The New User Experience

The updated Browse experience provides vertical scrolling through categories and horizontal scrolling through the items in a category.


The updated Details view features large, cinematic artwork for the show or movie. The Details view for a Show includes episode information while the Details view for a Movie includes suggestions for other content.


Our members on Windows run across many different screen sizes, resolutions and scaling factors. The new version of the application uses a responsive layout to optimize the size and placement of items based on the window size and scaling factor.

Since many Windows 10 devices support touch input on integrated displays or via gestures on their trackpad, we have included affordances for both in this update. When a member is browsing content with a mouse, paginated scrolling of the content within a row is enabled by buttons on the ends of the row. When a member is using touch on an integrated display or via gestures on their trackpad, inertial scrolling of rows is enabled via swipe gestures.

Using the Universal Windows Platform

Over the last few years we have launched several applications for Windows and Windows Phone. The applications were built from a few code bases that span several technologies, including Silverlight, XAML, and C#. Bringing new features to our members on Windows platforms has required us to make changes in several code bases and ship multiple application updates.

With the Universal Windows Platform, we’re able to build an application from a single code base and run on many Windows 10 devices. Although the initial release of this application supports desktops, laptops and tablets running Windows 10, we have run our application on other Windows 10 devices and we will be adding support for phones running Windows 10 in the near future.

This new version of the application is a JavaScript-based implementation that utilizes Microsoft’s WinJS library. Like several other teams at Netflix (see tech blog posts for Netflix Likes React and Making Netflix.com Faster), we chose to use Facebook React. Using JavaScript to build the application has allowed us to use the same HTML5 video playback engine that is used in our browser-based applications.

Windows Features

The new version of the app continues to support two features that are unique to the Windows platform.

When the app is pinned to the Start menu, the application tile is a Live Tile that will show artwork representing the items in a member’s Continue Watching list. We support several Live Tile sizes, and the large tile size is a new addition in this version of the app.

Our integration with Cortana enables members to search with voice commands. On app start, we register our supported commands with Cortana. Once a member has signed-in, they can issue one of our supported commands to Cortana. Here are the Cortana commands that we support in English:


If a user were to tell Cortana: Netflix find Jessica Jones, Cortana would start the app (if needed) and perform a search for Jessica Jones.

What’s Next?

We’re excited to share this update with our members and we’re hard at work on a new set of features and enhancements. The Universal Windows Platform will enable us to support phones running Windows 10 in the near future.

Visit the Windows Store to get the app today!

by Sean Sharma

Monday, December 14, 2015

Per-Title Encode Optimization


We’ve spent years developing an approach, called per-title encoding, where we run analysis on an individual title to determine the optimal encoding recipe based on its complexity. Imagine having very involved action scenes that need more bits to encapsulate the information versus unchanging landscape scenes or animation that need less. This allows us to deliver the same or better experience while using less bandwidth, which will be particularly important in lower bandwidth countries and as we expand to places where video viewing often happens on mobile networks.

Background
In traditional terrestrial, cable or satellite TV, broadcasters have an allocated bandwidth and the program or set of programs are encoded such that the resulting video streams occupy the given fixed capacity. Statistical multiplexing is oftentimes employed by the broadcaster to efficiently distribute the bitrate among simultaneous programs. However, the total accumulated bitrate across the programs should still fit within the limited capacity. In many cases, padding is even added using null packets to guarantee strict constant bitrate for the fixed channel, thus wasting precious data rate. Furthermore, with pre-set channel allocations, less popular programs or genres may be allocated lower bitrates (and therefore, worse quality) than shows that are viewed by more people.

With the advantages of Internet streaming, Netflix is not bound to pre-allocated channel constraints. Instead, we can deliver the best video quality stream to a member, no matter what the program or genre, tailored to the member’s available bandwidth and viewing device capability. We pre-encode streams at various bitrates applying optimized encoding recipes. On the member’s device, the Netflix client runs adaptive streaming algorithms which instantaneously select the best encode to maximize video quality while avoiding playback interruptions due to rebuffers.

Encoding with the best recipe is not a simple problem. For example, assuming a 1 Mbps bandwidth, should we stream H.264/AVC at 480p, 720p or 1080p? With 480p, 1 Mbps will likely not exhibit encoding artifacts such as blocking or ringing, but if the member is watching on an HD device, the upsampled video will not be sharp. On the other hand, if we encode at 1080p we send a higher resolution video, but the bitrate may be too low, such that most scenes will contain annoying encoding artifacts.
The Best Recipe for All
When we first deployed our H.264/AVC encodes in late 2010, our video engineers developed encoding recipes that worked best across our video catalogue (at that time). They tested various codec configurations and performed side-by-side visual tests to settle on codec parameters that produced the best quality trade-offs across different types of content. A set of bitrate-resolution pairs (referred to as a bitrate ladder), listed below, were selected such that the bitrates were sufficient to encode the stream at that resolution without significant encoding artifacts:

Bitrate (kbps)    Resolution
235               320x240
375               384x288
560               512x384
750               512x384
1050              640x480
1750              720x480
2350              1280x720
3000              1280x720
4300              1920x1080
5800              1920x1080

This “one-size-fits-all” fixed bitrate ladder achieves, for most content, good quality encodes given the bitrate constraint.  However, for some cases, such as scenes with high camera noise or film grain noise, the highest 5800 kbps stream would still exhibit blockiness in the noisy areas. On the other end, for simple content like cartoons, 5800 kbps is far more than needed to produce excellent 1080p encodes. In addition, a customer whose network bandwidth is constrained to 1750 kbps might be able to watch the cartoon at HD resolution, instead of the SD resolution specified by the ladder above.
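
To make the limitation concrete, here is a minimal Python sketch of picking a rung from the fixed ladder for a given bandwidth. The function name `pick_rung` and the simple "highest bitrate that fits" rule are assumptions made for illustration; the actual adaptive streaming logic in the Netflix client is not described in this post.

```python
# Fixed bitrate ladder listed above: (bitrate in kbps, width, height).
FIXED_LADDER = [
    (235, 320, 240),
    (375, 384, 288),
    (560, 512, 384),
    (750, 512, 384),
    (1050, 640, 480),
    (1750, 720, 480),
    (2350, 1280, 720),
    (3000, 1280, 720),
    (4300, 1920, 1080),
    (5800, 1920, 1080),
]


def pick_rung(bandwidth_kbps, ladder=FIXED_LADDER):
    """Return the highest-bitrate rung that fits the bandwidth (hypothetical rule).

    Falls back to the lowest rung when even that one exceeds the bandwidth.
    """
    fitting = [rung for rung in ladder if rung[0] <= bandwidth_kbps]
    return max(fitting) if fitting else min(ladder)


# A member limited to 1750 kbps lands on the 720x480 rung for every title,
# whether it is a simple cartoon or a grainy action movie.
print(pick_rung(1750))
```

Because the ladder is the same for every title, the content itself never influences which resolution is chosen at a given bitrate.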

The titles in Netflix’s video collection have very high diversity in signal characteristics. In the graph below we present a depiction of the diversity of 100 randomly sampled titles.  We encoded 100 sources at 1080p resolution using x264 constant QP (Quantization Parameter) rate control. At each QP point, for every title, we calculate the resulting bitrate in kbps, shown on the x-axis, and PSNR (Peak Signal-To-Noise Ratio) in dB, shown on the y-axis, as a measure of video quality.


The plots show that some titles reach very high PSNR (45 dB or more) at bitrates of 2500 kbps or less. On the other extreme, some titles require bitrates of 8000 kbps or more to achieve an acceptable PSNR of 38 dB.

Given this diversity, a one-size-fits-all scheme obviously cannot provide the best video quality for a given title and member’s allowable bandwidth. It can also waste storage and transmission bits because, in some cases, the allocated bitrate goes beyond what is necessary to achieve a perceptible improvement in video quality.

Side Note on Quality Metrics:  For the above figure, and many of the succeeding plots, we plot PSNR as the measure of quality. PSNR is the most commonly used metric in video compression. Although PSNR does not always reflect perceptual quality, it is a simple way to measure the fidelity to the source, gives good indication of quality at the high and low ends of the range (e.g. 45 dB is very good quality, 35 dB will show encoding artifacts), and is a good indication of quality trends within a single title. The analysis can also be applied using other quality measures such as the VMAF perceptual metric. VMAF (Video Multi-Method Assessment Fusion) is a perceptual quality metric developed by Netflix in collaboration with University of Southern California researchers.  We will publish details of this quality metric in a future blog post.
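
For readers unfamiliar with PSNR, the sketch below shows the standard textbook definition, 10·log10(MAX² / MSE), for 8-bit samples. It is only an illustration: how PSNR is computed per frame and combined across a whole title (for example, which color planes are used or how frames are averaged) is not specified in this post, and the function name `psnr` is our own.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the source and the decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# An average squared error of 1 on 8-bit samples corresponds to about 48 dB.
print(round(psnr([100, 100, 100, 100], [101, 99, 101, 99]), 2))
```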
The Best Recipe for the Content
Why Per-Title?
Consider an animation title where the content is “simple”, that is, the video frames are composed mostly of flat regions with no camera or film grain noise and minimal motion between frames. We compare the quality curve for the fixed bitrate ladder with a bitrate ladder optimized for the specific title:


As shown in the figure above, encoding this video clip at 1920x1080, 2350 kbps (A) produces a high quality encode, and adding bits to reach 4300 kbps (B) or even 5800 kbps (C) will not deliver a noticeable improvement in visual quality (for encodes with PSNR 45 dB or above, the distortion is perceptually unnoticeable). In the fixed bitrate ladder, for 2350 kbps, we encode at 1280x720 resolution (D). Therefore members with bandwidth constraints around that point are limited to 720p video instead of the better quality 1080p video.

On the other hand, consider an action movie that has significantly more temporal motion and spatial texture than the animation title. It has scenes with fast-moving objects, quick scene changes, explosions and water splashes. The graph below shows the quality curve of an action movie.

Encoding these high complexity scenes at 1920x1080, 4300 kbps (A), would result in encoding artifacts such as blocking, ringing and contouring. A better quality trade-off would be to encode at a lower resolution 1280x720 (B), to eliminate the encoding artifacts at the expense of adding scaling. Encoding artifacts are typically more annoying and visible than blurring introduced by downscaling (before the encode) then upsampling at the member’s device. It is possible that for this title with high complexity scenes, it would even be beneficial to encode 1920x1080 at a bitrate beyond 5800 kbps, say 7500 kbps, to eliminate the encoding artifacts completely.

To deliver the best quality video to our members, each title should receive a unique bitrate ladder, tailored to its specific complexity characteristics. Over the last few years, the encoding team at Netflix invested significant research and engineering to investigate and answer the following questions:
  • Given a title, how many quality levels should be encoded such that each level produces a just-noticeable-difference (JND)?
  • Given a title, what is the best resolution-bitrate pair for each quality level?
  • Given a title, what is the highest bitrate required to achieve the best perceivable quality?
  • Given a video encode, what is the human perceived quality?
  • How do we design a production system that can answer the above questions in a robust and scalable way?
The Algorithm
To design the optimal per-title bitrate ladder, we select the total number of quality levels and the bitrate-resolution pair for each quality level according to several practical constraints.  For example, we need backward-compatibility (streams are playable on all previously certified Netflix devices), so we limit the resolution selection to a finite set -- 1920x1080, 1280x720, 720x480, 512x384, 384x288 and 320x240. In addition, the bitrate selection is also limited to a finite set, where the adjacent bitrates have an increment of roughly 5%.
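
As a rough illustration of the bitrate constraint, the following Python sketch generates a finite set of candidate bitrates in which each value is about 5% higher than the previous one. The function name `bitrate_grid`, the rounding to whole kbps, and the 235 to 5800 kbps range used in the example are assumptions for illustration only.

```python
def bitrate_grid(min_kbps, max_kbps, step=0.05):
    """Candidate bitrates where each value is about `step` above the previous one."""
    if min_kbps <= 0 or max_kbps < min_kbps or step <= 0:
        raise ValueError("invalid range or step")
    grid = []
    value = float(min_kbps)
    while value <= max_kbps:
        rounded = round(value)
        if not grid or rounded != grid[-1]:  # skip duplicates caused by rounding
            grid.append(rounded)
        value *= 1 + step
    return grid


grid = bitrate_grid(235, 5800)
print(grid[:5], "...", grid[-1])
```

A geometric grid like this keeps the relative step between neighboring candidates constant, so the choice is equally fine-grained at low and high bitrates.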

We also have a number of optimality criteria that we consider.
  • The selected bitrate-resolution pair should be efficient, i.e. at a given bitrate, the produced encode should have as high quality as possible.
  • Adjacent bitrates should be perceptually spaced. Ideally, the perceptual difference between two adjacent bitrates should fall just below one JND. This ensures that the quality transitions can be smooth when switching between bitrates. It also ensures that the fewest possible quality levels are used, given the wide range of perceptual quality that the bitrate ladder has to span.

To build some intuition, consider the following example where we encode a source at three different resolutions with various bitrates.
Encoding at three resolutions and various bitrates. Blue marker depicts encoding point and the red curve indicates the PSNR-bitrate convex hull.

At each resolution, the quality of the encode monotonically increases with the bitrate, but the curve starts flattening out (A and B) when the bitrate goes above some threshold. This is because every resolution has an upper limit in the perceptual quality it can produce. When a video gets downsampled to a low resolution for encoding and later upsampled to full resolution for display, its high frequency components get lost in the process.

On the other hand, a high-resolution encode may produce a quality lower than the one produced by encoding at the same bitrate but at a lower resolution (see C and D). This is because encoding more pixels with lower precision can produce a worse picture than encoding fewer pixels at higher precision combined with upsampling and interpolation. Furthermore, at very low bitrates the encoding overhead associated with every fixed-size coding block starts to dominate in the bitrate consumption, leaving very few bits for encoding the actual signal.  Encoding at high resolution at insufficient bitrate would produce artifacts such as blocking, ringing and contouring.

Based on the discussion above, we can draw a conceptual plot to depict the bitrate-quality relationship for any video source encoded at different resolutions, as shown below:

We can see that each resolution has a bitrate region in which it outperforms other resolutions. If we collect all these regions from all the resolutions available, they collectively form a boundary called the convex hull. In an economic sense, the convex hull is where the encoding point achieves Pareto efficiency. Ideally, we want to operate exactly at the convex hull, but due to practical constraints (for example, we can only select from a finite number of resolutions), we would like to select bitrate-resolution pairs that are as close to the convex hull as possible.

It is practically infeasible to construct the full bitrate-quality graphs spanning the entire quality region for each title in our catalogue. To implement a practical solution in production, we perform trial encodings at different quantization parameters (QPs), over a finite set of resolutions. The QPs are chosen such that they are one JND apart. For each trial encode, we measure the bitrate and quality. By interpolating curves based on the sample points, we produce bitrate-quality curves at each candidate resolution. The final per-title bitrate ladder is then derived by selecting points closest to the convex hull.
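
The selection step can be sketched in Python as follows. This is a simplified illustration, not the production algorithm: it uses linear interpolation between trial encodes and, for each candidate bitrate, simply keeps the resolution with the highest interpolated quality (the upper envelope of the curves) as a stand-in for "closest to the convex hull". The function names and the sample numbers are made up for the example and are not measurements.

```python
from bisect import bisect_left


def interpolate(points, bitrate):
    """Linearly interpolate quality at `bitrate` from sorted (bitrate, quality) samples.

    Returns None outside the sampled range rather than extrapolating.
    """
    xs = [p[0] for p in points]
    if bitrate < xs[0] or bitrate > xs[-1]:
        return None
    i = bisect_left(xs, bitrate)
    if xs[i] == bitrate:
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (bitrate - x0) / (x1 - x0)


def best_resolution_per_bitrate(curves, candidate_bitrates):
    """For each candidate bitrate, pick the resolution with the highest quality."""
    ladder = []
    for bitrate in candidate_bitrates:
        scored = []
        for resolution, points in curves.items():
            quality = interpolate(points, bitrate)
            if quality is not None:
                scored.append((quality, resolution))
        if scored:
            quality, resolution = max(scored)
            ladder.append((bitrate, resolution, quality))
    return ladder


# Illustrative (bitrate kbps, PSNR dB) trial encodes; values are invented.
curves = {
    "720x480": [(500, 36.0), (1000, 39.0), (2000, 41.0), (4000, 42.0)],
    "1280x720": [(500, 33.0), (1000, 38.0), (2000, 42.5), (4000, 45.0)],
    "1920x1080": [(500, 30.0), (1000, 36.0), (2000, 42.0), (4000, 47.0)],
}
ladder = best_resolution_per_bitrate(curves, [500, 1000, 2000, 3000, 4000])
for rung in ladder:
    print(rung)
```

In this toy data the lowest resolution wins at low bitrates and the highest resolution wins at high bitrates, which mirrors the crossover behavior described above.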
Sample Results
BoJack Horseman is an example of an animation with simple content - flat regions and low motion from frame to frame. In the fixed bitrate ladder scheme, we use 1750 kbps for the 480p encode. For this particular episode, with the per-title recipe we start streaming 1080p video at 1540 kbps. Below we compare cropped screenshots (assuming a 1080p display) from the two versions (top: 1750 kbps, bottom: new 1540 kbps). The new encode is crisper and has better visual quality.

Orange is the New Black has video characteristics with more average complexity. At the low bitrate range, there is no significant quality improvement seen with the new scheme. At the high end, the new per-title encoding assigns 4640 kbps for the highest quality 1080p encode. This is a 20% bitrate savings compared to 5800 kbps for the fixed ladder scheme. For this title we avoid wasting bits but maintain the same excellent visual quality for our members. The images below show a screenshot at 5800 kbps (top) vs. 4640 kbps (bottom).

The Best Recipe for Your Device
In the description above where we select the optimized per-title bitrate ladder, there is an inherent assumption that the viewing device can receive and play any of the encoded resolutions. However, because of hardware constraints, some devices may be limited to resolutions lower than the original resolution of the source content. If we select the convex hull covering resolutions up to 1080p, this could lead to suboptimal viewing experiences for, say, a tablet limited to 720p decoding hardware. For example, given an animation title, we may switch to 1080p at 2000 kbps because it results in better quality than a 2000 kbps 720p stream. However, the tablet will not be able to utilize the 1080p encode and would be constrained to a sub-2000 kbps stream even if the bandwidth allows for a better quality 720p encode.
To remedy this, we design additional per-title bitrate ladders corresponding to the maximum playable resolution on the device. More specifically, we design additional optimal per-title bitrate ladders tailored to 480p and 720p-capped devices.  While these extra encodes reduce the overall storage efficiency for the title, adding them ensures that our customers have the best experience.
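
One way to picture the capped ladders is as the same ladder construction run over a restricted set of candidate resolutions. The Python sketch below only shows that restriction step; the names `candidate_resolutions` and `LADDER_CAPS` are hypothetical, and the resolution list is the finite set mentioned earlier in this post.

```python
# Finite set of encode resolutions mentioned earlier, as (width, height).
RESOLUTIONS = [
    (1920, 1080),
    (1280, 720),
    (720, 480),
    (512, 384),
    (384, 288),
    (320, 240),
]

# Hypothetical device classes and their maximum playable height.
LADDER_CAPS = {"480p": 480, "720p": 720, "1080p": 1080}


def candidate_resolutions(max_height, resolutions=RESOLUTIONS):
    """Resolutions a device capped at `max_height` lines can actually play."""
    return [(w, h) for (w, h) in resolutions if h <= max_height]


# Each capped ladder would be derived only from the resolutions its devices support.
per_device_candidates = {
    name: candidate_resolutions(cap) for name, cap in LADDER_CAPS.items()
}
for name, candidates in per_device_candidates.items():
    print(name, candidates)
```

With the 1080p points removed from consideration, a 720p-capped ladder is free to keep raising the 720p bitrate instead of switching resolution.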
What does this mean for my Netflix shows?
Per-title encoding allows us to deliver higher quality video in two ways:
  • Under low-bandwidth conditions, per-title encoding will often give you better video quality, as titles with “simple” content, such as BoJack Horseman, will now be streamed at a higher resolution for the same bitrate.
  • When the available bandwidth is adequate for high bitrate encodes, per-title encoding will often give you even better video quality for complex titles, such as Marvel's Daredevil, because we will encode at a higher maximum bitrate than our current recipe.
Our continuous innovation on this front recognizes the importance of providing an optimal viewing experience for our members while simultaneously using less bandwidth and being better stewards of the Internet.


by Anne Aaron, Zhi Li, Megha Manohara, Jan De Cock and David Ronca

Thursday, December 10, 2015

Optimizing Content Quality Control at Netflix with Predictive Modeling

By Nirmal Govind and Athula Balachandran

Over 69 million Netflix members stream billions of hours of movies and shows every month in North and South America, parts of Europe and Asia, Australia and New Zealand. Soon, Netflix will be available in every corner of the world with an even more global member base.

As we expand globally, our goal is to ensure that every member has a high-quality experience every time they stream content on Netflix. This challenging problem is impacted by factors that include quality of the member's Internet connection, device characteristics, content delivery network, algorithms on the device, and quality of content.

We previously looked at opportunities to improve the Netflix streaming experience using data science. In this post, we'll focus on predictive modeling to optimize the quality control (QC) process for content at Netflix.

Content Quality

An important aspect of the streaming experience is the quality of the video, audio, and text (subtitle, closed captions) assets that are used.

Imagine sitting down to watch the first episode of a new season of your favorite show, only to find that the video and audio are off by 20 seconds. You decide to watch it anyway and turn on subtitles to follow along. What if the subtitles are poorly positioned and run off the screen?

Depending on the severity of the issue, you may stop watching, or continue because you’re already invested in the content. Either way, it leaves a bad impression and can negatively impact member satisfaction and retention. Netflix sets a high bar on content quality and has a QC process in place to ensure this bar is met. Let’s take a quick look at how the Netflix digital supply chain works and the role of the QC process.

We receive assets either from the content owners (e.g. studios, documentary filmmakers) or from a fulfillment house that obtains content from the owners and packages the assets for delivery to Netflix. Our QC process consists of automated and manual inspections to identify and replace assets that do not meet our specified quality standards.

Automated inspections are performed before and after the encoding process that compresses the larger “source” files into a set of smaller encoded distribution files (at different bitrates, for different devices, etc.). Manual QC is then done to check for issues easily detected with the human eye: depending on the content, a QCer either spot checks selected points of the movie or show, or watches the entire duration of the content. Examples of issues caught during the QC process include video interlacing artifacts, audio-video sync issues, and text issues such as missing or poorly placed subtitles.

It is worth noting that the fraction of assets that fail quality checks is small. However, to optimize the streaming experience, we’re focused on detecting and replacing those sub-par assets. This is even more important as Netflix expands globally and more members consume content in a variety of new languages (both dubbed audio and subtitles). Also, we may receive content from new partners who have not delivered to us before and are not familiar with our quality standards.

Predictive Quality Control

As the Netflix catalog, member base, and global reach grow, it is important to scale the manual QC process by identifying defective assets accurately and efficiently.

Looking at the data

Data and data science play a key role in how Netflix operates, so the natural question to ask was:
Can we use data science to help identify defective assets?

We looked at the data on manual QC failures and observed that certain factors affected the likelihood of an asset failing QC. For example, some combinations of content and fulfillment partners had a higher rate of defects for certain types of assets. Metadata related to the content also showed patterns of failure. For example, older content (by release year) had a higher defect rate, likely due to the use of older formats for the creation and storage of assets. The genre of the content also exhibited certain patterns of failure.

These types of factors were used to build a machine learning model that predicts the probability that a delivered asset would not meet the Netflix quality standards.

A predictive model to identify defective assets helps in two significant ways:

  • Scale the content QC process by reducing QC effort on assets that are not defective.
  • Improve member experience by re-allocating resources to the discovery of hard-to-find quality issues that may otherwise be missed due to spot checks.

Machine Learning

Using results from past manual QC checks, a supervised machine learning (ML) approach was used to train a predictive quality control model that predicts a “fail” (likely has content quality issue) or “pass.” If an asset is predicted to fail QC, it is sent to manual QC. The modified supply chain workflow with the predictive QC model is shown below.

Netflix Supply Chain with Predictive Quality Control

A key goal of the model is to identify all defective assets even if this results in extra manual checks. Hence, we tuned the model for a low false-negative rate (i.e. fewer uncaught defects) at the cost of an increased false-positive rate.

Given that only a small fraction of the delivered assets are defective, one of the main challenges is class imbalance in the training data, i.e. we have a lot more data on “pass” assets than “fail” assets. We tackled this by using cost-sensitive training that heavily penalizes misclassification of the minority class (i.e. defective assets).
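
To illustrate what cost-sensitive training means in practice, here is a minimal Python sketch of a class-weighted log loss. The inverse-frequency ("balanced") weighting is one common heuristic and is an assumption here; the post does not state which weights or which learning algorithm were actually used, and the function names are our own.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def weighted_log_loss(labels, fail_probs, weights, eps=1e-12):
    """Weighted mean binary cross-entropy; label 1 = "fail", 0 = "pass"."""
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(labels, fail_probs):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        loss = -math.log(p) if y == 1 else -math.log(1 - p)
        total += weights[y] * loss
        weight_sum += weights[y]
    return total / weight_sum


# Illustrative imbalance: 95 "pass" assets and 5 "fail" assets.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

# Same pair of assets, two different mistakes by the model.
missed_defect = weighted_log_loss([0, 1], [0.1, 0.1], weights)
false_alarm = weighted_log_loss([0, 1], [0.9, 0.9], weights)
print(weights, missed_defect > false_alarm)
```

With these weights, a model that misses a defective asset is penalized far more heavily than one that raises a false alarm on a good asset, which pushes training toward fewer uncaught defects.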

As with most model-building exercises, domain knowledge played an important role in this project. An observation that led to improved model performance was that defective assets are typically delivered in batches. For example, video assets from episodes within the same season of a show are mostly defective or mostly non-defective. It’s likely that assets in a batch were created or packaged around the same time and/or with the same equipment, and hence with similar defects.

We performed offline validation of the model by passively making predictions on incoming assets and comparing with actual results from manual QC. This allowed us to fine tune the model parameters and validate the model before deploying into production. Offline validation also confirmed the scaling and quality improvement benefits outlined earlier.
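
A minimal sketch of this kind of offline comparison is shown below, assuming the model outputs a failure probability per asset and that manual QC provides the ground truth. The function names, the decision thresholds, and the sample values are illustrative assumptions rather than details from the production system.

```python
def confusion_counts(actual_fail, predicted_fail):
    """Count (tp, fp, tn, fn) where "positive" means predicted to fail QC."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(actual_fail, predicted_fail):
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1  # defective asset the model would have let through
        elif predicted:
            fp += 1  # good asset sent to manual QC unnecessarily
        else:
            tn += 1
    return tp, fp, tn, fn


def error_rates(actual_fail, fail_probs, threshold):
    """Return (false-negative rate, false-positive rate) at a decision threshold."""
    predicted = [p >= threshold for p in fail_probs]
    tp, fp, tn, fn = confusion_counts(actual_fail, predicted)
    fnr = fn / (fn + tp) if fn + tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fnr, fpr


# Invented manual QC outcomes and model scores for six assets.
actual = [True, True, False, False, False, False]
probs = [0.9, 0.4, 0.6, 0.2, 0.1, 0.3]

print(error_rates(actual, probs, threshold=0.5))
print(error_rates(actual, probs, threshold=0.25))
```

Lowering the threshold in this toy example removes the missed defect at the price of more manual checks, which is the trade-off described above.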

Looking Ahead

Predictive QC is a significant step forward in ensuring that members have an amazing viewing experience every time they watch a movie or show on Netflix. As the slate of Netflix Originals grows and more aspects of content creation—for example, localization, including subtitling and dubbing—are owned by Netflix, there is opportunity to further use data to improve content quality and the member experience.

We’re continuously innovating with data to build creative models and algorithms that improve the streaming experience for Netflix members. The scale of problems we encounter—Netflix accounts for 37.1% of North American downstream traffic at peak—provides for a set of unique modeling challenges. Also, we partner closely with the engineering teams to design and build production systems that embed such machine learning models. If you're interested in working in this exciting space, please check out the Streaming Science & Algorithms and Content Platform Engineering positions on the Netflix jobs site.

Wednesday, December 9, 2015

High Quality Video Encoding at Scale

At Netflix we receive high quality sources for our movies and TV shows and encode them to the best video streams possible for a given member’s viewing device and bandwidth capabilities. With the continued growth of our service it has been essential to build a video encoding pipeline that is highly robust, efficient and scalable. Our production system is designed to easily scale to support the demands of the business (i.e., more titles, more video encodes, shorter time to deploy), while guaranteeing a high quality of experience for our members.
Pipeline in the Cloud
The video encoding pipeline runs on EC2 Linux cloud instances. The elasticity of the cloud enables us to seamlessly scale up when more titles need to be processed, and scale down to free up resources. Our video processing applications don’t require any special hardware and can run on a number of EC2 instance types. Long processing jobs are divided into smaller tasks and parallelized to reduce end-to-end delay and local storage requirements. This division also allows us to exploit our internal spot market, where instances are dynamically allocated based on real-time availability of the compute resources. If a task does not complete because an instance is abruptly terminated, only a small amount of work is lost and the task is rescheduled for another instance. The ability to recover from these transient errors is essential for a robust cloud-based system.
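
The idea of splitting a long job into small tasks and rescheduling the ones that fail can be sketched in Python as follows. This is a single-process illustration under stated assumptions: `TransientError` stands in for an instance being terminated, and `split_into_chunks`, `run_with_retries`, and `flaky_task` are hypothetical names, not part of the actual pipeline.

```python
class TransientError(Exception):
    """Stand-in for an instance being terminated mid-task (hypothetical)."""


def split_into_chunks(total_frames, chunk_frames):
    """Return (start, end) frame ranges, end exclusive, covering the whole job."""
    return [
        (start, min(start + chunk_frames, total_frames))
        for start in range(0, total_frames, chunk_frames)
    ]


def run_with_retries(task, chunk, max_attempts=3):
    """Run one chunk, rescheduling it when a transient error occurs."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except TransientError:
            if attempt == max_attempts:
                raise  # give up after the final attempt


failed_once = set()


def flaky_task(chunk):
    """Pretend to process a chunk; the second chunk fails on its first attempt."""
    if chunk == (100, 200) and chunk not in failed_once:
        failed_once.add(chunk)
        raise TransientError("instance terminated")
    start, end = chunk
    return end - start  # number of frames processed


chunks = split_into_chunks(250, 100)
results = [run_with_retries(flaky_task, chunk) for chunk in chunks]
print(chunks, results)
```

Only the interrupted chunk is repeated, so the work lost to a terminated instance is bounded by the chunk size rather than by the length of the title.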

The figure below shows a high-level overview of our system. We ingest high quality video sources and generate video encodes of various codec profiles, at multiple quality representations per profile. The encodes are packaged and then deployed to a content delivery network for streaming. During a streaming session, the client requests the encodes it can play and adaptively switches among quality levels based on network conditions.

Video Source Inspection
To ensure that we have high quality output streams, we need pristine video sources. Netflix ingests source videos from our originals production houses or content partners. In some undesirable cases, the delivered source video contains distortion or artifacts which would result in bad quality video encodes – garbage in means garbage out. These artifacts may have been introduced by multiple processing and transcoding steps before delivery, data corruption during transmission or storage, or human errors during content production. Rather than fixing the source video issues after ingest (for example, apply error concealment to corrupted frames or re-edit sources which contain extra content), Netflix rejects the problematic source video and requests redelivery. Rejecting problematic sources ensures that:
  • The best source video available is ingested into the system. In many cases, error mitigation techniques only partially fix the problem.
  • Complex algorithms (which could have been avoided by better processes upstream) do not unnecessarily burden the Netflix ingest pipeline.
  • Source issues are detected early where a specific and actionable error can be raised.
  • Content partners are motivated to triage their production pipeline and address the root causes of the problems. This will lead to improved video source deliveries in the future.
Our preferred source type is Interoperable Master Format (IMF).  In addition we support ProRes, DPX, and MPEG (typically older sources).  During source inspection, we 1) verify that the source conforms to the relevant specification(s), 2) detect content that could lead to a bad viewing experience and 3) generate metadata required by the encoding pipeline. If the inspection deems the source unacceptable, the system automatically informs our content partner about issues and requests a redelivery of the source.

A modern 4K source file can be quite large. Larger, in fact, than a typical drive on an EC2 instance. In order to efficiently support these large source files, we must run the inspection on the file in smaller chunks. This chunked model lends itself to parallelization. As shown in the more detailed diagram below, an initial inspection step is performed to index the source file, i.e. determine the byte offsets for frame-accurate seeking, and generate basic metadata such as resolution and frame count. The file segments are then processed in parallel on different instances. For each chunk, bitstream-level and pixel-level analysis is applied to detect errors and generate metadata such as temporal and spatial fingerprints. After all the chunks are inspected, the results are assembled by the inspection aggregator to determine whether the source should be allowed into the encoding pipeline.  With our highly optimized inspection workflow, we can inspect a 4K source in less than 15 minutes.  Because chunks are inspected in parallel, a longer duration source simply has more chunks, so the total inspection time will still be less than 15 minutes.
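
The chunk-then-aggregate structure of the inspection can be sketched in Python as below. This is a toy illustration: threads in one process stand in for separate instances, a frame is just a string, and `inspect_chunk`, `aggregate`, and `inspect_source` are hypothetical names whose checks are far simpler than real bitstream-level and pixel-level analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def inspect_chunk(chunk):
    """Hypothetical per-chunk check: flag frames marked as corrupted."""
    start, frames = chunk
    errors = [start + i for i, frame in enumerate(frames) if frame == "corrupt"]
    return {"frame_count": len(frames), "errors": errors}


def aggregate(results, expected_frames):
    """Accept the source only if every chunk is clean and no frames are missing."""
    total = sum(r["frame_count"] for r in results)
    errors = [e for r in results for e in r["errors"]]
    return {"accepted": not errors and total == expected_frames, "errors": errors}


def inspect_source(frames, chunk_size):
    """Split the source into chunks, inspect them concurrently, then aggregate."""
    chunks = [
        (i, frames[i:i + chunk_size]) for i in range(0, len(frames), chunk_size)
    ]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(inspect_chunk, chunks))
    return aggregate(results, expected_frames=len(frames))


clean = ["ok"] * 10
damaged = ["ok"] * 5 + ["corrupt"] + ["ok"] * 4
print(inspect_source(clean, chunk_size=4))
print(inspect_source(damaged, chunk_size=4))
```

Since each chunk reports errors with absolute frame positions, the aggregator can raise a specific, actionable error without re-reading the source.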

[Diagram: chunked source inspection workflow]
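
To make the chunked inspection model concrete, here is a minimal Python sketch. It is illustrative only and not the production implementation: the names `plan_chunks`, `inspect_chunk`, and `inspect_source` are hypothetical, the "analysis" is reduced to looking up a set of known-bad frame numbers, and a local thread pool stands in for separate cloud instances.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class ChunkReport:
    start_frame: int          # first frame in the chunk (inclusive)
    end_frame: int            # last frame in the chunk (exclusive)
    errors: list = field(default_factory=list)


def plan_chunks(frame_count, chunk_size):
    """Indexing step: split [0, frame_count) into consecutive frame ranges."""
    return [(start, min(start + chunk_size, frame_count))
            for start in range(0, frame_count, chunk_size)]


def inspect_chunk(frame_range, bad_frames):
    """Stand-in for bitstream-level and pixel-level analysis of one chunk."""
    start, end = frame_range
    errors = [f"corrupt frame {f}" for f in sorted(bad_frames) if start <= f < end]
    return ChunkReport(start, end, errors)


def inspect_source(frame_count, chunk_size, bad_frames=frozenset(), workers=4):
    """Inspect all chunks in parallel, then aggregate into an accept/reject decision."""
    chunks = plan_chunks(frame_count, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in chunk order, regardless of completion order.
        reports = list(pool.map(lambda c: inspect_chunk(c, bad_frames), chunks))
    all_errors = [err for report in reports for err in report.errors]
    return len(all_errors) == 0, all_errors


if __name__ == "__main__":
    accepted, errors = inspect_source(frame_count=10, chunk_size=4, bad_frames={5})
    print(accepted, errors)
```

Because each chunk is inspected independently, adding frames only adds chunks; with enough workers, wall-clock time is bounded by the slowest chunk plus the indexing and aggregation steps.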
Parallel Video Encoding
At Netflix we stream to a heterogeneous set of viewing devices. This requires a number of codec profiles: VC1, H.264/AVC Baseline, H.264/AVC Main and HEVC. We also support varying bandwidth scenarios for our members, all the way from sub-0.5 Mbps cellular to 100+ Mbps high-speed Internet. To deliver the best experience, we generate multiple quality representations at different bitrates (ranging from 100 kbps to 16 Mbps) and the Netflix client adaptively selects the optimal stream given the instantaneous bandwidth.
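
As a rough illustration of adaptive stream selection, the Python sketch below picks the highest bitrate that fits within a bandwidth estimate. This is not the actual client algorithm, which is not described here. The ladder values between 100 kbps and 16 Mbps, the `select_bitrate` function, and the `safety` headroom factor are all assumptions made for the example.

```python
# Hypothetical bitrate ladder in kbps. Only the end points (100 kbps and
# 16 Mbps) come from the text; the intermediate steps are made up.
LADDER_KBPS = [100, 250, 500, 1000, 2000, 4000, 8000, 16000]


def select_bitrate(ladder_kbps, bandwidth_kbps, safety=0.8):
    """Return the highest ladder bitrate within a fraction of measured bandwidth.

    `safety` leaves headroom for bandwidth fluctuations (an assumption of this
    sketch). If nothing fits, fall back to the lowest available bitrate.
    """
    budget = bandwidth_kbps * safety
    candidates = [rate for rate in ladder_kbps if rate <= budget]
    return max(candidates) if candidates else min(ladder_kbps)


if __name__ == "__main__":
    for bandwidth in (50, 3000, 100_000):
        print(bandwidth, "kbps ->", select_bitrate(LADDER_KBPS, bandwidth), "kbps")
```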

[Diagram: parallel video encoding workflow]

Similar to inspection, encoding is performed on chunks of the source file, which allows for efficient parallelization. Since we strive for quality control at every step of the pipeline, we verify the correctness of each encoded chunk right after it completes encoding. If a problem is detected, we can immediately triage the problem (or, in the case of transient errors, resubmit the task) without waiting for the entire video to complete. When all the chunks corresponding to a stream have successfully completed, they are stitched together by a video assembler. To guard against frame accuracy issues that may have been introduced by incorrect parallel encoding (for example, chunks assembled in the wrong order, or frames dropped or duplicated at chunk boundaries), we validate the assembled stream by comparing the spatial and temporal fingerprints of the encode with those of the source video (fingerprints of the source are generated during the inspection stage).
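
The fingerprint check can be sketched in a few lines of Python. The real spatial and temporal fingerprints are not specified here, so this example assumes a hypothetical representation: one integer per frame that survives encoding within a small `tolerance`. The function name `find_frame_mismatches` is likewise made up for illustration.

```python
def find_frame_mismatches(source_fp, encode_fp, tolerance=0):
    """Compare per-frame fingerprints of the source and the assembled encode.

    Returns a list of human-readable issues; an empty list means the encode
    is frame-accurate with respect to the source under this simple model.
    """
    issues = []
    if len(source_fp) != len(encode_fp):
        issues.append(
            f"frame count mismatch: source={len(source_fp)} encode={len(encode_fp)}"
        )
    for index, (src, enc) in enumerate(zip(source_fp, encode_fp)):
        if abs(src - enc) > tolerance:
            # Report only the first divergence; everything after a dropped or
            # duplicated frame would otherwise be flagged as well.
            issues.append(f"fingerprint mismatch at frame {index}")
            break
    return issues


if __name__ == "__main__":
    source = [10, 11, 12, 13, 14, 15]
    print(find_frame_mismatches(source, [10, 11, 12, 13, 14, 15]))  # clean encode
    print(find_frame_mismatches(source, [13, 14, 15, 10, 11, 12]))  # chunks swapped
    print(find_frame_mismatches(source, [10, 11, 13, 14, 15]))      # frame dropped
```

A swapped pair of chunks keeps the frame count intact, which is why the per-frame comparison is needed in addition to the count check.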

In addition to straightforward encoding, the system calculates multiple full-reference video quality metrics for each output video stream. By automatically generating quality scores for each encode, we can monitor video quality at scale. The metrics also help pinpoint bugs in the system and guide us in finding areas for improving our encode recipes. We will provide more detail on the quality metrics we utilize in our pipeline in a future blog post.
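
For readers unfamiliar with full-reference metrics, PSNR is one widely known example: it scores an encode by comparing it sample by sample against the source. The text does not say which metrics the pipeline uses, so the Python sketch below is only an illustration of the general idea, assuming 8-bit samples given as flat lists of equal length.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    source_samples = [100, 100, 100, 100]
    encoded_samples = [101, 99, 101, 99]
    print(f"PSNR: {psnr(source_samples, encoded_samples):.2f} dB")
```

Higher values mean the encode is closer to the source. Computing a score like this per encode is what makes it possible to track quality across a large catalog without manual review.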
Quality of Service
Before we implemented parallel chunked encoding, a 1080p movie could take days to encode, and a failure occurring late in the process would delay the encode even further. With our current pipeline, a title can be fully inspected and encoded at the different profiles and quality representations, with automatic quality control checks, within a few hours. This enables us to stream titles within just a few hours of their original broadcast. We are currently working on further improvements to our system which will allow us to inspect and encode a 1080p source in 30 minutes or less.  Note that since the work is done in parallel, processing time is not increased for longer sources.

Before automated quality checks were integrated into our system, encoding issues (picture corruption, inserted black frames, frame rate conversion, interlacing artifacts, frozen frames, etc.) could go unnoticed until reported by Netflix members through Customer Support. Not only was this a poor member experience, but triaging these issues was also costly and inefficient, often escalating through many teams before the root cause was found. In addition, encoding failures (for example, due to corrupt sources) would require manual intervention and cause long delays in root-causing the failure. With our investment in automated inspection at scale, we detect issues early, whether they stem from a bad source delivery, an implementation bug, or a glitch in one of the cloud instances, and we provide specific and actionable error messages. For a source that passes our inspections, we have an encode reliability of 99.99% or better. When we do find a problem that was not caught by our algorithms, we design new inspections to detect those issues in the future.
In Summary
High quality video streams are essential for delivering a great Netflix experience to our members. We have developed, and continue to improve on, a video ingest and encode pipeline that runs on the cloud reliably and at scale. We designed for automated quality control checks throughout so that we fail fast and detect issues early in the processing chain. Video is processed in parallel segments. This decreases end-to-end processing delay, reduces the required local storage and improves the system’s error resilience. We have invested in integrating video quality metrics into the pipeline so that we can continuously monitor performance and further optimize our encoding.

Our encoding pipeline, combined with the compute power of the Netflix internal spot market, has value outside our day-to-day production operations. We leverage this system to run large-scale video experiments (codec comparisons, encode recipe optimizations, quality metrics design, etc.) which strive to answer questions that are important to delivering the highest quality video streams, and at the same time could benefit the larger video research community.

by Anne Aaron and David Ronca