Tuesday, March 21, 2017

Update on HTML5 Video for Netflix

About four years ago, we shared our plans for playing premium video in HTML5, replacing Silverlight and eliminating the extra step of installing and updating browser plug-ins.  

Since then, we have launched HTML5 video on Chrome OS, Chrome, Internet Explorer, Safari, Opera, Firefox, and Edge on all supported operating systems.  And though we do not officially support Linux, Chrome playback has worked on that platform since late 2014.  Starting today, users of Firefox can also enjoy Netflix on Linux.  This marks a huge milestone for us and for our partners, including Google, Microsoft, Apple, and Mozilla, who helped make it possible.

But this is just the beginning.  We launched 4K Ultra HD on Microsoft Edge in December of 2016, and look forward to high-resolution video being available on more platforms soon.  We are also looking ahead to HDR video.  Netflix-supported TVs with Chromecast built-in—which use a version of our web player—already support Dolby Vision and HDR10.  And we are working with our partners to provide similar support on other platforms over time.

Netflix's adoption of HTML5 has led us to contribute to a number of related industry standards, including:
  • MPEG-DASH, which describes our streaming file formats, including fragmented MP4 and common encryption.  
  • WebCrypto, which protects user data from inspection or tampering and allows us to provide our subscription video service on the web.  
  • Media Source Extensions (MSE), which enable our web application to dynamically manage the playback session in response to ever-changing network conditions.
  • Encrypted Media Extensions (EME), which enable playback of protected content and hardware acceleration on capable platforms.

We intend to remain active participants in these and other standards over time.  This includes areas that are just beginning to take shape, like the handling of HDR images and graphics in CSS, which is being discussed in the Color on the Web community group.

Our excitement about HTML5 video has remained strong over the past four years.  Plugin-free playback that works seamlessly on all major platforms helps us deliver compelling experiences no matter how you choose to watch.  This is apparent when you venture through Stranger Things in hardware accelerated HD on Safari, or become transfixed by The Crown in Ultra HD on Edge.  And eventually, you will be able to delight in the darkest details of Marvel’s Daredevil in stunning High Dynamic Range.




Monday, March 13, 2017

Netflix Security Monkey on Google Cloud Platform

Today we are happy to announce that Netflix Security Monkey has beta support for tracking Google Cloud Platform (GCP) services. Initially we are providing support for the following GCP services:


  • Firewall Rules
  • Networking
  • Google Cloud Storage Buckets (GCS)
  • Service Accounts (IAM)


This work was performed by a few incredible Googlers with the mission to take open source projects and add support for Google’s cloud offerings. Thank you for the commits!


GCP support is available in the develop branch and will be included in release 0.9.0. This work helps to fulfill Security Monkey’s mission as the single place to go to monitor your entire deployment.


To get started with Security Monkey on GCP, check out the documentation.


See Rae Wang, Product Manager on GCP, highlight Security Monkey in her talk, “Gaining full control over your organization's cloud resources” (presented at Google Cloud Next '17).

Security Monkey’s History

We released Security Monkey in June 2014 as an open source tool to monitor Amazon Web Services (AWS) changes and alert on potential security problems. In 2014 it was monitoring 11 AWS services and shipped with about two dozen security checks. Now the tool monitors 45 AWS services, 4 GCP services, and ships with about 130 security checks.

Future Plans for Security Monkey

We plan to continue decomposing Security Monkey into smaller, more maintainable, and reusable modules. We also plan to use new event driven triggers so that Security Monkey will recognize updates more quickly. With Custom Alerters, Security Monkey will transform from a purely monitoring tool to one that will allow for active response.


More Modular:
  • We have begun the process of moving the service watchers out of Security Monkey and into CloudAux. CloudAux currently supports the four GCP services and three (of the 45) AWS services.
  • We have plans to move the security checks (auditors) out of Security Monkey and into a separate library.
  • Admins may change polling intervals, enable/disable technologies, and modify issue scores from within the settings panel of the web UI.
Event Driven:
  • On AWS, CloudTrail will trigger CloudWatch Event Rules, which will then trigger Lambda functions. We have a working prototype of this flow.
  • On GCP, Stackdriver Logging and Audit Logs will trigger Cloud Functions.
  • As a note, CloudSploit has a product in beta that implements this event driven approach.
Custom Alerters:
  • These can be used to provide new notification methods or correct problems.
  • The documentation describes a custom alerter that sends events to Splunk.
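
As a rough illustration of the AWS event-driven flow described above, a Lambda function triggered by a CloudWatch Event Rule could inspect the CloudTrail record and decide whether a watched technology needs to be rescanned. This is only a sketch, not Security Monkey code: the `WATCHED_SOURCES` mapping, the `summarize_event` helper, and the return shape are all assumptions made for the example. The only parts taken from AWS are the event fields (`account`, `region`, `detail.eventSource`, `detail.eventName`).

```python
# Hypothetical sketch; names and return shape are illustrative assumptions.

# Assumed mapping from CloudTrail event sources to watched technologies.
WATCHED_SOURCES = {
    "s3.amazonaws.com": "s3",
    "iam.amazonaws.com": "iam",
}


def summarize_event(event):
    """Return a small change summary, or None if the event is not watched."""
    detail = event.get("detail", {})
    technology = WATCHED_SOURCES.get(detail.get("eventSource"))
    if technology is None:
        return None
    return {
        "technology": technology,
        "action": detail.get("eventName"),
        "account": event.get("account"),
        "region": event.get("region"),
    }


def lambda_handler(event, context):
    """Entry point invoked by the event rule for each matching API call."""
    change = summarize_event(event)
    if change is None:
        return {"rescan": False}
    # A real implementation would enqueue a targeted rescan here.
    return {"rescan": True, "change": change}
```

Reacting to individual API calls in this way is what would let changes be recognized sooner than with interval-based polling.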
We’ll be following up with a future blog post to discuss these changes in more detail. In the meantime, check out Security Monkey on GitHub, join the community of users, and jump into conversation in our Gitter room if you have questions or comments.

Special Thanks

We appreciate the great community support and contributions for Security Monkey and want to specially thank:

  • Google: GCP Support in CloudAux/Security Monkey
  • Bridgewater Associates: Modularization of Watchers, Auditors, Alerters. Dozens of new watchers. Modifying the architecture to abstract the environment being monitored.

Wednesday, March 8, 2017

Netflix Downloads on Android

By Greg Benson, Francois Goldfain, and Ashish Gupta

Netflix is now a global company, so we wanted to provide a viewing experience that was truly available everywhere even when the Internet is not working well. This led to these three prioritized download use cases:
  1. Better, uninterrupted videos on unreliable Internet
  2. Reducing mobile data usage
  3. Watching Netflix without an Internet connection (e.g. on a train or plane)

... So, What Do We Build?


From a product perspective, we had many initial questions about how the feature should behave: What bitrate & resolution should we download content at? How much configuration should we offer to users? How will video bookmarks work when offline? How do we handle profiles?

We adopted some guiding principles based on general Netflix philosophies about what kind of products we want to create: the Downloads interface should not be so prominent that it's distracting, and the UX should be as simple as possible.

We chose an aggressive timeline for the feature since we wanted to deliver the experience to our members as soon as possible. We aimed to create a great experience with just the right amount of scope, and we could iterate and run A/B tests to improve the feature later on. Fortunately, our Consumer Insights team also had enough time to qualify our initial user-experience ideas with members and non-members before they were built.

How Do Downloads Work?


From an organizational perspective, the downloads feature was a test of coordination between a wide variety of teams. A technical spec was created that represented a balancing act of meeting license requirements, member desires, and security requirements (protecting from fraud). For Android, we used the technical spec to define which pieces of data we'd need to transfer to the client in order to provide a single 'downloaded video':
  • Content manifest (URLs for audio and video files)
  • Media files:
    • Primary video track
    • 2 audio tracks (one primary language plus an alternate based on user language preferences)
    • 2 subtitle tracks (based on user language preferences)
    • Trick play data (images while scrubbing)
  • DRM licenses
  • Title-level metadata and artwork (cached to disk)

Download Mechanics


We initially looked at Android's DownloadManager as the mechanism to actually transfer files and data to the client. This component was easy to use and handled some of the functionality we wanted. Ultimately, however, it didn’t allow us to create the UX we needed.

We created the Netflix DownloadManager for the following reasons:
  • Download Notifications: display download progress in a notification as an aggregate of all the files related to one 'downloadable video'.
  • Pause/Resume Downloads: provide a way for users to temporarily halt downloading.
  • Network Handling: dynamic network selection criteria in case the user changes this preference during a download (WiFi-only vs. any connection).
  • Analytics: understanding the details of all user behavior and the reasons why a download was halted.
  • Change of URL (CDN switching): our download manifest provides multiple CDNs for the same media content. If one CDN fails, we wanted the ability to fail over to alternate sources.
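
Two of the requirements above, CDN failover and aggregate progress across all the files of one 'downloadable video', can be sketched in a few lines. This is a simplified illustration rather than the Netflix DownloadManager itself (which is Android code): `MediaFile`, `download_with_failover`, `aggregate_progress`, and the injected `fetch` callable are hypothetical names chosen for the example.

```python
# Hypothetical sketch; not the actual Netflix DownloadManager.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MediaFile:
    name: str
    size_bytes: int
    cdn_urls: List[str]        # alternate sources, in order of preference
    downloaded_bytes: int = 0


def download_with_failover(media: MediaFile, fetch: Callable[[str], int]) -> str:
    """Try each CDN in order and return the URL that succeeded.

    `fetch` is an assumed callable that downloads from one URL and returns
    the number of bytes written, raising OSError on network failure.
    """
    errors = []
    for url in media.cdn_urls:
        try:
            media.downloaded_bytes = fetch(url)
            return url
        except OSError as exc:
            errors.append((url, str(exc)))  # keep the reason for analytics
    raise RuntimeError(f"all CDNs failed for {media.name}: {errors}")


def aggregate_progress(files: List[MediaFile]) -> float:
    """Progress for one title, as a fraction across all of its files."""
    total = sum(f.size_bytes for f in files)
    done = sum(f.downloaded_bytes for f in files)
    return 0.0 if total == 0 else done / total
```

Reporting a single fraction per title is what lets the notification show one progress bar instead of one per track.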

Storing Metadata


To store metadata for downloaded titles, our first implementation was a simple solution of serializing and deserializing JSON blobs to files on disk. We knew there would be problems with this (many objects created, GC churn, not developer-friendly), so while it wasn't our desired long-term solution, it met our needs to get a prototype off the ground.

For our second iteration of managing stored data, we looked at a few possible solutions, including built-in SQLite support. We’d also heard a lot about Realm lately, and about a few companies that had success using it as a fast and simple data-storage solution. Because we had limited experience with Realm and the downloads metadata case was relatively small and straightforward, we thought it would be a great opportunity to try Realm out.

Realm turned out to be easy to use and has a few benefits we like:
  • Zero-copy IO (by using memory mapping)
  • Strong performance profile
  • It's transactional and has crash safety (via MVCC)
  • Objects are easy to implement
  • Easy to query, no SQL statements
Realm also provides straightforward support for versioning of data, which allows data to be migrated from schema to schema if changed as part of an application update. In this case, a RealmMigration can be created which allows for mapping of data.
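
The idea behind schema-to-schema migration can be shown independently of Realm. The sketch below is not the RealmMigration API; it is a generic, hypothetical illustration in Python in which each stored record is upgraded one schema version at a time until it reaches the current version. The field names and version numbers are invented for the example.

```python
# Generic illustration of stepwise schema migration; not Realm's API.
# Field names and version numbers are hypothetical.

def _v1_to_v2(record):
    # Assumed change: a single "language" field becomes a list.
    upgraded = dict(record)
    upgraded["languages"] = [upgraded.pop("language")]
    return upgraded


def _v2_to_v3(record):
    # Assumed change: a new field with a default value.
    upgraded = dict(record)
    upgraded.setdefault("storage", "internal")
    return upgraded


# Each entry upgrades a record from that version to the next one.
MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_VERSION = 3


def migrate(record, from_version):
    """Apply every migration step between from_version and CURRENT_VERSION."""
    version = from_version
    while version < CURRENT_VERSION:
        record = MIGRATIONS[version](record)
        version += 1
    return record
```

Chaining single-version steps means a user who skips several app updates still ends up with data in the latest schema.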

The challenges we had that most impacted our implementation included single thread access for objects and a lack of support for vectors such as List<>.

Now that the stability of Realm has been demonstrated in the field with downloads metadata, we are moving forward with adopting it more broadly in the app for generalized video metadata storage.


Updating Metadata


JobScheduler was introduced in Lollipop and allows us to be more resource-efficient in our background processing and network requests. The OS can batch jobs together for an overall efficiency gain. Longer-term, we wanted to build up our experience with this system component since developers will be encouraged more strongly by Google to use it in the future (e.g. Android ‘O’).

For our download use cases, it provided a great opportunity to get low-cost (or effectively free) network usage by creating jobs that would only activate when the user was on an unmetered network. What can our app do in the background?


1. Maintenance jobs:
  • Content license renewals
  • Metadata updates
  • Sync playback metrics: operations, state data, and usage
2. Resume downloads when connectivity is restored

We found two major issues with JobScheduler. The first was how to provide the same background updates on pre-Lollipop devices, where JobScheduler is not available. We wrote an abstraction layer over the job-scheduling component; on pre-Lollipop devices, it uses the system's network connectivity receiver and the AlarmManager service to schedule background tasks manually at set times.
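
The shape of such an abstraction layer can be sketched as follows. The real implementation is Android code; this Python version only illustrates the design, and every class and function name in it is hypothetical. One backend hands constraints to the platform scheduler, while the legacy backend keeps its own pending list and checks connectivity itself when an alarm fires.

```python
# Hypothetical sketch of a scheduling abstraction; not Netflix's Android code.

LOLLIPOP_API_LEVEL = 21  # JobScheduler was introduced in API level 21


class PlatformJobBackend:
    """Stands in for JobScheduler: the OS enforces the network constraint."""

    def __init__(self):
        self.submitted = []

    def schedule(self, job_name, requires_unmetered):
        self.submitted.append((job_name, requires_unmetered))


class LegacyBackend:
    """Stands in for the AlarmManager plus connectivity receiver approach."""

    def __init__(self):
        self.pending = []

    def schedule(self, job_name, requires_unmetered):
        self.pending.append((job_name, requires_unmetered))

    def on_alarm(self, network_is_unmetered):
        """Return the jobs whose constraints are met and drop them from pending."""
        runnable = [name for name, needs_unmetered in self.pending
                    if network_is_unmetered or not needs_unmetered]
        self.pending = [(name, needs) for name, needs in self.pending
                        if name not in runnable]
        return runnable


def backend_for(api_level):
    """Pick the backend once, so callers never branch on OS version."""
    if api_level >= LOLLIPOP_API_LEVEL:
        return PlatformJobBackend()
    return LegacyBackend()
```

Callers schedule maintenance jobs through the same `schedule` call on either backend, which keeps the version check in one place.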

The second major problem was that JobScheduler crashes in certain circumstances (public bug report filed here). While we weren't able to fix this crash directly, we found a workaround: in certain cases we avoid calling JobService.jobFinished() altogether. The job ultimately times out on its own, so the cost of operating like this seemed better than letting the app crash.


Playback of Content


There are a number of methods of playing video on Android, varying in their complexity and level of control:

Method | Comments/Limitations | Netflix Usage
MediaPlayer | High-level API with hardware decoding. Limited support for formats such as DASH, and no easy way to apply DRM. | Never used
In-app solution | Bundles everything in-app, including media playback and DRM, with no use of Android system components. Everything runs on the CPU, which means more battery drain and potentially lower-quality playback. | Used for early versions of SD playback
OpenMAX AL | Introduced in ICS; low-level APIs from the platform that opened many doors. However, the interfaces are native C, and Google deprecated it quickly. SD only, since all DRM was done in-app. | Used for ICS/JB
MediaCodec | Low-level but in Java. Playback can be built on top of system components. The first version only supported in-app DRM, which was complex and SD only. In Android 4.3, Google introduced a modular framework that allowed us to use the built-in platform Widevine DRM support. This provided Widevine L1, which allows us to play HD content with a hardware dependency. | Used for 4.2 and above; HD for some 4.3+ devices

Further, playback of offline (non-streaming) content is not supported by the Android system DASH player. It wasn't the only option, but we felt that downloads were a good opportunity to try Google’s new Android ExoPlayer. The features we liked were:
  • Support for DASH, HLS, Smooth Streaming, and local sources
  • Extremely modular design, extensible and customizable
  • Used by Google/OEMs/SoC vendors as part of Android certification (which helps with device support and fragmentation)
  • Great documentation and tutorials
The modularity of ExoPlayer was attractive for us since it allowed us to plug in a variety of DRM solutions. Our previous in-app DRM solution did not support offline licenses so we also needed to provide support for an alternate DRM mechanism.

Supporting Widevine


Widevine was selected due to its broad Android support, ability to work with offline licenses, a hardware-based decryption module with a software fallback (suitable for nearly any mobile device), and validation required by Android's Compatibility Test Suite (CTS).

However, this was a difficult migration due to Android fragmentation. Some devices that should have had L3 didn’t, some devices had insecure implementations, and other devices had Widevine APIs that failed whenever we called them. Support was therefore inconsistent, so we had to have reporting in place to monitor these failure rates.

If we detect this kind of failure during app init then we have little choice but to disable the Downloads feature on that device since playback would not be possible. This is unfortunate for users but will hopefully improve over time as the operating system is updated on devices.

Complete block diagram of playback components used for downloads.





Improving Video Quality


Our encoding team has written previously about the specific work they did to enable high-quality, low-bandwidth mobile encodes using VP9 for Android. However, how did we decide to use VP9 in the first place?

Most mobile video streams for Netflix use H.264/AVC with the Main Profile (AVCMain). Downloads were a good opportunity for us to migrate to a new video codec to reduce downloaded content size and pave the way for improved streaming bitrates in the future. The advantages of VP9 encoding for us included:
  • Encodes produced using libvpx are ~32% more efficient than our x264 encodes.
  • Decoder required since Android KitKat, i.e. 100% coverage for current Netflix app deployment
  • Fragmented but growing hardware support: 33% of phones and 4% of tablets using Netflix have a chipset that supports VP9 decoding in hardware
Migrating to support a new video encode had some up-front and ongoing costs, not the least of which was an increased burden placed on our content-delivery system, specifically our Open-Connect Appliances (OCAs). Due to the new encoding formats, more versions of the video streams needed to be deployed and cached in our CDN which required more space on the boxes. This cost was worthwhile for us to provide improved efficiency for downloaded content in the near term, and in the long term will also benefit members streaming on mobile as we migrate to VP9 more broadly.


Results


Many teams at Netflix were aligned to work together and release this feature under an ambitious timeline. We were pleased to bring lots of joy to our members around the world and give them the ability to take their favorite shows with them on the go.

The biggest proportion of downloading has been in Asia, where we see strong traction in countries like India, Thailand, Singapore, Malaysia, the Philippines, and Hong Kong.

The main suggestion we received for Android was around lack of SD card support, which we quickly addressed in a subsequent release in early 2017. We have now established a baseline experience for downloads, and will be able to A/B test a number of improvements and feature enhancements in coming months.



Tuesday, February 21, 2017

Introducing Netflix Stethoscope

Netflix is pleased to announce the open source release of Stethoscope, our first project following a User Focused Security approach.

The notion of “User Focused Security” acknowledges that attacks against corporate users (e.g., phishing, malware) are the primary mechanism leading to security incidents and data breaches, and it’s one of the core principles driving our approach to corporate information security. It’s also reflective of our philosophy that tools are only effective when they consider the true context of people’s work.

Stethoscope is a web application that collects information for a given user’s devices and gives them clear and specific recommendations for securing their systems.

If we provide employees with focused, actionable information and low-friction tools, we believe they can get their devices into a more secure state without heavy-handed policy enforcement.


Software that treats people like people, not like cogs in the machine

We believe that Netflix employees fundamentally want to do the right thing, and, as a company, we give people the freedom to do their work as they see fit. As we say in the Netflix Culture Deck, responsible people thrive on freedom, and are worthy of freedom. This isn’t just a nice thing to say–we believe people are most productive and effective when they aren’t hemmed in by excessive rules and process.

That freedom must be respected by the systems, tools, and procedures we design, as well.

By providing personalized, actionable information–and not relying on automatic enforcement–Stethoscope respects people’s time, attention, and autonomy, while improving our company’s security outcomes.

If you have similar values in your organization, we encourage you to give Stethoscope a try.

Education, not automatic enforcement

It’s important to us that people understand what simple steps they can take to improve the security state of their devices, because personal devices–which we don’t control–may very well be the first target of attack for phishing, malware, and other exploits. If someone falls for a phishing attack on their personal laptop, that may be the first step in an attack on our systems here at Netflix.

We also want people to be comfortable making these changes themselves, on their own time, without having to go to the help desk.

To make this self service, and so people can understand the reasoning behind our suggestions, we show additional information about each suggestion, as well as a link to detailed instructions.

Security practices

We currently track the following device configurations, which we call “practices”:

  • Disk encryption
  • Firewall
  • Automatic updates
  • Up-to-date OS/software
  • Screen lock
  • Not jailbroken/rooted
  • Security software stack (e.g., Carbon Black)

Each practice is given a rating that determines how important it is. The more important practices will sort to the top, with critical practices highlighted in red and collected in a top banner.
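
The ordering rule can be illustrated with a short sketch. This is not Stethoscope's code: the rating names in `RATING_ORDER`, the `PracticeResult` type, and the choice to put only failing critical practices in the banner are assumptions made for the example.

```python
# Hypothetical sketch of rating-based ordering; not Stethoscope's code.
from dataclasses import dataclass

# Assumed rating levels; lower numbers sort first.
RATING_ORDER = {"critical": 0, "important": 1, "suggested": 2}


@dataclass
class PracticeResult:
    name: str
    rating: str
    passing: bool


def arrange(results):
    """Return (banner, ordered): banner items first, then the full sorted list."""
    # sorted() is stable, so practices with equal ratings keep their input order.
    ordered = sorted(results, key=lambda r: RATING_ORDER[r.rating])
    # Assumption: the banner collects critical practices that need action.
    banner = [r for r in ordered if r.rating == "critical" and not r.passing]
    return banner, ordered
```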

Implementation and data sources

Stethoscope is powered by a Python backend and a React front end. The web application doesn’t have its own data store, but directly queries various data sources for device information, then merges that data for display.


The various data sources are implemented as plugins, so it should be relatively straightforward to add new inputs. We currently support LANDESK (for Windows), JAMF (for Macs), and Google MDM (for mobile devices).
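
A minimal sketch of this query-and-merge pattern is shown below. It does not use Stethoscope's actual plugin interface: `StaticSource`, `devices_for`, `merge_devices`, and the use of a `serial` field to match records across sources are all assumptions made for illustration.

```python
# Hypothetical sketch of merging device data from pluggable sources.
# The interface and the "serial" matching key are illustrative assumptions.

class StaticSource:
    """A stand-in data source plugin backed by an in-memory dictionary."""

    def __init__(self, name, records_by_email):
        self.name = name
        self._records = records_by_email

    def devices_for(self, email):
        return [dict(record) for record in self._records.get(email, [])]


def merge_devices(email, sources):
    """Query every source for the user and merge records for the same device."""
    merged = {}
    for source in sources:
        for device in source.devices_for(email):
            entry = merged.setdefault(device["serial"], {"sources": []})
            entry["sources"].append(source.name)
            for key, value in device.items():
                # Earlier sources win when two sources report the same field.
                entry.setdefault(key, value)
    return list(merged.values())
```

Because nothing is stored between requests, adding a new input in this design only requires another object with a `devices_for` method.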

Notifications

In addition to device status, Stethoscope provides an interface for viewing and responding to notifications.

For instance, if you have a system that tracks suspicious application accesses, you could choose to present a notification like this:


We recommend that you only use these alerts when there is an action for somebody to take–alerts without corresponding actions are often confusing and counterproductive.

Mobile friendly

The Stethoscope user interface is responsive, so it’s easy to use on mobile devices. This is especially important for notifications, which should be easy for people to address even if they aren’t at their desk.

What’s next?

We’re excited to work with other organizations to extend the data sources that can feed into Stethoscope. Osquery is next on our list, and there are many more possible integrations.

Getting started

Stethoscope is available now on GitHub. If you’d like to get a feel for it, you can run the front end with sample data with a single command. We also have a Docker Compose configuration for running the full application.

Join us!

We hope that other organizations find Stethoscope to be a useful tool, and we welcome contributions, especially new plugins for device data.

Our team, Information Security, is also hiring a Senior UI Engineer at our Los Gatos office. If you’d like to help us work on Stethoscope and related tools, please apply!

Presentations

We’d like to thank ShmooCon for giving us the chance to present this work earlier this year. The slides and video are now both publicly available.

by Jesse Kriss and Andrew White

Tuesday, February 7, 2017

Introducing HubCommander

By Mike Grima, Andrew Spyker, and Jason Chan

Netflix is pleased to announce the open source release of HubCommander, a ChatOps tool for GitHub management.

Why HubCommander?

Netflix uses GitHub, a source code management and collaboration site, extensively for both open source and internal projects. The security model for GitHub does not permit users to perform repository management without granting administrative permissions. Management of many users on GitHub can be a challenge without tooling. We needed to provide enhanced security capabilities while maintaining developer agility. As such, we created HubCommander to provide these capabilities in a method optimized for Netflix.

Why ChatOps?

Our approach leverages ChatOps, which uses chat applications to perform operational tasks. ChatOps is increasingly popular among developers, since chat tools are ubiquitous, provide a single context for what actions occurred when and by whom, and offer an effective way to give developers self-service capabilities.

How Netflix leverages GitHub:

All Netflix-owned GitHub repositories reside within one of several GitHub organizations. Organizations contain the git repositories and the users that maintain them. Users can be added to teams, and teams are given access to individual repositories. In this model, a GitHub user is invited to an organization by an administrator. Once invited, the user becomes a member of the organization, and is placed into one or more teams.

At Netflix, we have several organizations that serve specific purposes. We have our primary OSS organization “Netflix”, our “Spinnaker” organization that is dedicated to our OSS continuous delivery platform, and a skunkworks organization, “Netflix-Skunkworks”, for projects that are in rough development that may or may not become fully-fledged OSS projects, to name a few.

Challenges we face:

One of the biggest challenges with using GitHub organizations is user management. GitHub organizations are individual entities that must be separately administered. As such, the complexity of user management increases with the number of organizations. To reduce complexity, we enforce a consistent permissions model across all of our organizations. This allows us to develop tools to simplify and streamline our GitHub organization administration.

How we apply security to our GitHub organizations:

The permissions model that we follow applies the principle of least privilege, but is still open enough that developers can obtain the access they need and move fast. The general structure we use is to place all employees in an “employees” team that has “push” (write) access to all repositories. We similarly have teams for “bot” accounts to provide for automation. Lastly, we have very few users with the “owner” role, as owners are full administrators that can make changes to the organization itself.

While we permit our developers to have write access to all of our repositories, we do not directly permit them to create or delete repositories, or to change repository visibility. Additionally, all developers are required to have multi-factor authentication enabled. All of our developers on GitHub have their IDs linked in our internal employee tracking system, and GitHub membership in our organizations is removed automatically when employees leave the company (we have scripts for this).

We also enable third-party application restrictions on our organizations to only allow specific third-party GitHub applications access to our repositories.

Why is tooling required?

We want self-service tooling that offers the same usability as giving users administrative access, but without the risk of making all users administrators.

Our tooling provides a consistent permissions model across all of our GitHub organizations. It also empowers our users to perform privileged operations on GitHub in a consistent and supported manner, while limiting their individual GitHub account permissions.

Limiting individual GitHub account permissions can be problematic for developers when creating repositories, since they also want to update the description and homepage, and even set default branches. Many of our developers also use Travis CI for automated builds. Enabling Travis CI requires that users be administrators of their repositories, which we do not permit. Our developers also collaborate on projects with teams outside of Netflix, but they do not have permissions to invite users to our organizations or to add outside collaborators to repositories. This is where HubCommander comes in.

The HubCommander Bot

HubCommander is a Slack bot for GitHub organizational management. It provides a ChatOps means for administering GitHub organizations. HubCommander operates by using a privileged account on GitHub to perform administrative actions on behalf of our users. Our developers issue commands to the bot to perform their desired actions. This has a number of advantages:
  1. Self-Service: By providing a self-service mechanism, we have significantly reduced our administrative burden for managing our GitHub repositories. The reduction in administrative overhead has significantly simplified our open source efforts.
  2. Consistent and Supported: The bot performs all of the tasks that are required for operating on GitHub. For example, when creating repositories, the bot will automatically provide the correct teams access to the new repository.
  3. Least Privilege for Users: Because the bot can perform the tasks that users need to perform, we can reduce the GitHub API permissions granted to our users.
  4. Developer Familiarity: ChatOps is very popular at Netflix, so utilizing a bot for this purpose is natural for our developers.
  5. Easy to Use: The bot has an easily discoverable command structure.
  6. Secure: The bot also features integration with Duo for additional authentication.

HubCommander Features:

Out of the box, HubCommander has the following features:
  • Repository creation
  • Repository description and website modification
  • Granting outside collaborators specific permissions to repositories
  • Repository default branch modification
  • Travis CI enablement
  • Duo support to provide authentication to privileged commands
  • Docker image support
HubCommander is also extendable and configurable. You can develop authentication and command-based plugins. At Netflix, we have developed a command plugin which allows our developers to invite themselves to any one of our organizations. When they perform this process, their GitHub ID is automatically linked in our internal employee tracking system. With this linkage, we can automatically remove their GitHub organization membership when they leave the company.

Duo is also supported to add safeguards for privileged commands. This has the added benefit of protecting against accidental command issuance, as well as against compromised Slack credentials. With the Duo plugin, issuing a command will also trigger a "Duo push" to the employee’s device. The command only continues to execute if the request is approved. If your company doesn’t use Duo, you can develop your own authentication plugin to integrate with any internal or external authentication system to safeguard commands.
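
The way an authentication plugin gates a privileged command can be sketched as follows. This is not HubCommander's actual plugin interface: the `Command` class, the `dispatch` function, the `authenticate` callable, and the `!CreateRepo <name>` syntax shown here are hypothetical and chosen only to illustrate the flow, in which the command runs only if the authentication step approves it.

```python
# Hypothetical sketch of command dispatch with an authentication gate.
# Not HubCommander's real plugin API; names and syntax are assumptions.

class Command:
    def __init__(self, handler, usage, min_args, needs_auth=False):
        self.handler = handler        # callable(user, args) -> reply text
        self.usage = usage            # syntax shown when arguments are missing
        self.min_args = min_args
        self.needs_auth = needs_auth  # True for privileged commands


def dispatch(message, user, commands, authenticate):
    """Route a chat message to a command, checking authentication first.

    `authenticate` is an assumed callable(user) -> bool, for example a
    wrapper that sends a push request and waits for approval.
    """
    parts = message.split()
    if not parts or parts[0] not in commands:
        return "Unknown command. Try !help"
    command = commands[parts[0]]
    args = parts[1:]
    if len(args) < command.min_args:
        return "Usage: " + command.usage
    if command.needs_auth and not authenticate(user):
        return "Authentication failed; command not run."
    return command.handler(user, args)
```

Passing the authentication step in as a callable is one way to keep the bot independent of any particular provider, so a different authentication system can be swapped in without touching the commands.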

Using the Bot:

Using the bot is as easy as typing !help in the Slack channel. This will provide a list of commands that HubCommander supports:
To learn how to issue a specific command, simply issue that command without any arguments. HubCommander will output the syntax for the command. For example, to create a new repository, you would issue the !CreateRepo command:
If you are safeguarding commands with Duo (or your own authentication plugin), an example of that flow would look like this:

Contributions:

These features are only a starting point, and we plan on adding more soon. If you’d like to extend these features, we’d love contributions to our repository on GitHub.