Wednesday, May 27, 2015

Netflix Streaming - More Energy Efficient than Breathing

Netflix Streaming: Energy Consumption for 2014 was 0.0013 kWh per Streaming Hour Delivered

  • 36% was from renewable sources

  • 28% was offset with renewable energy credits

  • We plan to be fully offset by 2015, and to increase the contribution of renewable sources
  • Carbon footprint of about 300g of CO2 per customer represents about 0.0007% of the typical US household footprint of 43,000 kg (48 tons) of CO2 per year


Since 2007 when Netflix launched its streaming service, usage has grown exponentially. Last quarter alone, our 60 million members collectively enjoyed 10 billion streaming hours worldwide.
Netflix streaming consumes energy in two main ways:
  1. The majority of our technology is operated in the Amazon Web Services (AWS) cloud platform. AWS offers us unprecedented global scale, hosting tens of thousands of virtual instances and many petabytes of data across several cloud regions.
  2. The audio-video media itself is delivered from “Open Connect” content servers, which are forward positioned close to, or inside of, ISP networks for efficient delivery.
In addition, energy is consumed by:
  1. The ISP networks, which carry the data across “the last mile” from our content servers to our customers.
  2. The “consumer premises equipment” (CPE) that includes cable or DSL modems, routers, WiFi access points, set-top boxes, and TVs, laptops, tablets, and phones.
First and foremost, we have focused on efficiency -- making sure that the technology we have built and use is as efficient as possible, which helps with all four components: those for which Netflix is responsible, and those associated with ISP operations and consumer choices.  Then we have focused on procuring renewables or offsets for the power that our own systems consume.

AWS Footprint

Because Netflix relies more heavily on AWS regions that are powered primarily by renewable energy (including the carbon-neutral Oregon region), our energy mix is approximately 50% from renewable sources today. We mitigate all of the remaining carbon emissions, which added up to approximately 10,200 tons of CO2e in 2014, by investing in renewable energy credits (RECs) in the geographic areas that host our cloud footprint; last year, the majority went to RECs for wind projects in North America, with the remainder going to Guarantees of Origin (GOs) for hydropower in Europe.
Purchasing renewable energy credits (RECs) allows us to be carbon-neutral in the cloud, but our main strategy is to be more efficient and consume less energy in the first place. Back in the data center days, long provisioning cycles and spikes in customer demand required us to maintain large capacity buffers that went unused most of the time: overall server utilization percentage was in the single digits. Thanks to the elasticity of the cloud, we are able to instantaneously grow and shrink our capacity along with customer demand, generally keeping our server utilization above 50%. This brought significant benefits to our bottom line (moving to the cloud reduced our server-side costs per streaming hour by 85%), but also allowed us to drastically improve our carbon efficiency.
Open Connect Footprint
Open Connect, the Netflix Content Delivery Network, was designed with power efficiency in mind. Today, Netflix’s Content Delivery servers together consume 1.4 megawatts of power. While these servers are located in hundreds of locations across the globe, a majority of them are hosted by major colocation vendors that share our interest in ensuring a bright future for renewable energy.
As we have evolved Open Connect, we have reduced the energy consumption of our servers significantly. At our 2012 launch, we consumed nearly 0.6 watts per megabit per second (Mbps) of peak capacity. In 2015, our flash-based servers consume less than 0.006 watts per Mbps, a 100x improvement. Those flash-based servers generate nearly 70% of Netflix’s global traffic footprint.
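The improvement factor follows directly from the two efficiency figures above. A quick check in Python, using only the approximate numbers quoted in this post (the 10 Gbps capacity is a hypothetical example, not a real deployment size):

```python
# Open Connect power efficiency, in watts per Mbps of peak capacity.
# Both values are the approximate figures quoted in this post.
watts_per_mbps_2012 = 0.6
watts_per_mbps_2015 = 0.006

improvement = watts_per_mbps_2012 / watts_per_mbps_2015
print(f"Improvement: about {improvement:.0f}x")

# Hypothetical example: power needed per 10 Gbps (10,000 Mbps) of peak capacity.
peak_mbps = 10_000
watts_2012 = watts_per_mbps_2012 * peak_mbps
watts_2015 = watts_per_mbps_2015 * peak_mbps
print(f"2012 servers: {watts_2012:.0f} W, 2015 flash servers: {watts_2015:.0f} W")
```

For the same peak capacity, the 2015 figure works out to roughly one hundredth of the 2012 power draw.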
When choosing where to locate Open Connect CDN servers, sustainability is a key metric used to evaluate our potential partners. It’s important that our data center providers commit to 100% green power through RECs and that they continue to find new and innovative ways to become carbon neutral.  One such example is Equinix’s experiment with Bloom Energy fuel cells in its SV5 data center in San Jose, one of the facilities in which Netflix equipment is colocated.  Equinix recently announced a major initiative to adopt 100% clean and renewable energy across their global platform. We have a goal to work with datacenter operators to increase their use of renewable sources of power, and we expect to buy offsets for 100% of any power that is not from renewable sources for 2015 and beyond.
We estimate that our Open Connect servers used non-renewable power responsible for about 7,500 tons of CO2e in 2014.

ISPs

While we don’t control the energy choices of ISPs, we have engineered our Open Connect media servers to minimize the requirements for routers, by providing routing technology as part of the package, so that an ISP who chooses to interconnect directly with Netflix can usually use a smaller, cheaper, and much more power-efficient switch instead of a router for bringing Netflix traffic onto their networks.  In some cases, avoiding the need for a router might eliminate three quarters of the power footprint of a particular deployment.

Consumer Premises Equipment

The energy footprint of the consumers’ home equipment (shared between various entertainment and computing uses in the consumers’ homes) dwarfs all the upstream elements by perhaps two orders of magnitude. Our focus here has been to provide streaming technology for Smart TVs, set-top boxes, game consoles, tablets, phones, and computers that is as efficient as possible. For example, a big focus for the 2015 Smart TV platforms has been suspend and resume capabilities, which ensure that Netflix can be started quickly from a powered-down state. This helps TV manufacturers build Energy Star-compliant TVs that don’t waste energy while the user is not watching, and it is one of several components in our “Netflix Recommended TV” program. Similarly, our choice of encoder technology takes into account the hardware acceleration capabilities of devices such as smartphones, tablets, and laptop graphics chips. Hardware acceleration can reduce the power consumed by video rendering, which might extend tablet battery life by 4x, with a matching reduction in the total power consumed by streaming.
In a typical household watching Netflix, the equipment might draw 5W for the cable modem, 10W for the WiFi access point, and 100W for the Smart TV. The resulting 115Wh of home power is responsible for about 70g CO2e for one hour of viewing.
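The household estimate can be reproduced with a few lines of Python. This is only a sketch built on the rounded figures in this post: the emission factor is whatever the 115Wh and 70g numbers imply, not an official grid-intensity value, and the 0.5g infrastructure figure is the 2014 number given in the Comparisons section.

```python
# Approximate power draw of home equipment while streaming, in watts (from this post).
device_watts = {"cable modem": 5, "WiFi access point": 10, "Smart TV": 100}

viewing_hours = 1
home_energy_wh = sum(device_watts.values()) * viewing_hours

# The post puts 115 Wh at about 70 g CO2e. The factor below is simply what
# those two numbers imply; it is not an official grid-intensity figure.
home_g_co2e = 70
implied_g_per_kwh = home_g_co2e / (home_energy_wh / 1000)

# Netflix infrastructure in 2014, in g CO2e per streaming hour (from this post).
infrastructure_g_co2e = 0.5
ratio = home_g_co2e / infrastructure_g_co2e

print(f"Home energy for one hour: {home_energy_wh} Wh")
print(f"Implied emission factor: {implied_g_per_kwh:.0f} g CO2e per kWh")
print(f"Home equipment vs. Netflix infrastructure: {ratio:.0f}x")
```

The ratio of 140x is consistent with the "two orders of magnitude" estimate above.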
We encourage our CE partners to make energy-wise designs, but ultimately the choices that customers make are also governed by their other home entertainment and computing needs and desires, and accordingly we don’t measure or attempt to offset those impacts.

Comparisons

In 2014, Netflix infrastructure generated only 0.5g of CO2e emissions for each hour of streaming. The average human breathing emits about 40g/hour, nearly 100x as much.  Sitting still while watching Netflix probably saves more CO2 than Netflix burns.
The amount of carbon equivalent emitted in order to produce a single quarter-pound hamburger can power Netflix infrastructure to enable viewing by 10 member families for an entire year!
A viewer who turned off their TV to read books would consume about 24 books a year in equivalent time, for a carbon footprint around 65kg CO2e - over 200 times more than Netflix streaming servers, while the 100W reading light they might use would match the consumption of the TV they could have watched instead!
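For readers who want to check the comparisons, here is the arithmetic in Python. It uses only the rounded figures from this post, and the variable names are ours for illustration. With these inputs the breathing ratio comes out at 80x, in the same range as the "nearly 100x" quoted above.

```python
# Rounded figures from this post.
infrastructure_g_per_hour = 0.5   # Netflix infrastructure, g CO2e per streaming hour
breathing_g_per_hour = 40         # average human breathing, g CO2 per hour
per_customer_g_per_year = 300     # approximate Netflix footprint per customer
books_g_per_year = 65_000         # about 24 books, 65 kg CO2e
books_per_year = 24

breathing_ratio = breathing_g_per_hour / infrastructure_g_per_hour
books_ratio = books_g_per_year / per_customer_g_per_year
per_book_kg = books_g_per_year / books_per_year / 1000

print(f"Breathing vs. streaming infrastructure: {breathing_ratio:.0f}x")
print(f"Books vs. streaming, per year: {books_ratio:.0f}x")
print(f"Implied footprint per book: {per_book_kg:.1f} kg CO2e")
```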

Monday, May 4, 2015

Introducing FIDO: Automated Security Incident Response



We're excited to announce the open source release of FIDO (Fully Integrated Defense Operation - apologies to the FIDO Alliance for acronym collision), our system for automatically analyzing security events and responding to security incidents.

Overview

The typical process for investigating security-related alerts is labor intensive and largely manual. To make the situation more difficult, as attacks increase in number and diversity, there is an increasing array of detection systems deployed and generating even more alerts for security teams to investigate.

Netflix, like all organizations, has a finite amount of resources to combat this phenomenon, so we built FIDO to help. FIDO is an orchestration layer that automates the incident response process by evaluating, assessing and responding to malware and other detected threats.

The idea for FIDO came from a simple proof of concept a number of years ago. Our process for handling alerts from one of our network-based malware systems was to have a help desk ticket created and assigned to a desktop engineer for follow-up - typically a scan of the impacted system or perhaps a re-image of the hard drive. The time from alert generation to resolution of these tickets spanned from days to over a week. Our help desk system had an API, so we had a hypothesis that we could cut down resolution time by automating the alert-to-ticket process. The simple system we built to ingest the alerts and open the tickets cut the resolution time to a few hours, and we knew we were onto something - thus FIDO was born.

Architecture and Operation

This section describes FIDO's operation, and the following diagram provides an overview of FIDO’s architecture.




Detection

FIDO’s operation begins with the receipt of an event via one of FIDO’s detectors. Detectors are off-the-shelf security products (e.g. firewalls, IDS, anti-malware systems) or custom systems that detect malicious activities or threats. Detectors generate alerts or messages that FIDO ingests for further processing. FIDO provides a number of ways to ingest events, including via API (the preferred method), SQL database, log file, and email. FIDO currently supports a variety of detectors (e.g. Cyphort, ProtectWise, CarbonBlack/Bit9) with more planned or under development.
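To make the idea of ingesting events from different channels concrete, here is a minimal Python sketch. It is not taken from the FIDO codebase: the `Event` structure, the parser functions, and every field name (`src_ip`, `threat`, `src`) are assumptions made for illustration. The point is only that alerts arriving in different shapes can be normalized into one common form before further processing.

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical normalized alert; not FIDO's actual data model."""
    detector: str       # which detector raised the alert
    source_ip: str      # host that triggered the alert
    threat_name: str    # the detector's label for the threat
    ingest_method: str  # "api" or "log" in this sketch


def from_api_payload(detector: str, payload: str) -> Event:
    """Parse a JSON alert as it might arrive from a detector's API."""
    data = json.loads(payload)
    return Event(detector, data["src_ip"], data["threat"], "api")


def from_log_line(detector: str, line: str) -> Event:
    """Parse a 'key=value key=value' log line into the same Event shape."""
    fields = dict(pair.split("=", 1) for pair in line.split())
    return Event(detector, fields["src"], fields["threat"], "log")


api_event = from_api_payload(
    "detector-a", '{"src_ip": "10.0.0.5", "threat": "Trojan.Generic"}'
)
log_event = from_log_line("detector-b", "src=10.0.0.5 threat=Trojan.Generic")
print(api_event)
print(log_event)
```

Both events end up with the same fields, so later phases do not need to know which channel delivered them.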

Analysis and Enrichment

The next phase of FIDO operation involves deeper analysis of the event and enrichment of the event data with both internal and external data sources. Raw security events often have little associated context, and this phase of operation is designed to supplement the raw event data with supporting information to enable more accurate and informed decision making.

The first component of this phase is analysis of the event’s target - typically a computer and/or user (but potentially any targeted resource). Is the machine a Windows host or a Linux server? Is it in the PCI zone? Does the system have security software installed and the latest patches? Is the targeted user a Domain Administrator? An executive? Having answers to these questions allows us to better evaluate the threat and determine what actions need to be taken (and with what urgency). To gather this data, FIDO queries various internal data sources - currently supported are Active Directory, LANDesk, and JAMF, with other sources under consideration.

In addition to querying internal sources, FIDO consults external threat feeds for information relevant to the event under analysis. The use of threat feeds helps FIDO determine whether a generated event may be a false positive or how serious and pervasive the issue may be. Another way to think of this step is ‘never trust, always verify.’ A generated alert is simply raw data - it must be enriched, evaluated, and corroborated before it is acted upon. FIDO supports several threat feeds, including ThreatGrid and VirusTotal, with additional feeds under consideration.
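The enrichment phase described above might be sketched as follows. This is an illustration under stated assumptions, not FIDO's implementation: the in-memory dictionaries stand in for internal sources (a directory or asset inventory) and for an external threat feed, and all keys and values are invented for the example. No real Active Directory, LANDesk, JAMF, ThreatGrid, or VirusTotal API is called.

```python
# Hypothetical stand-ins for internal data sources.
HOST_INVENTORY = {
    "10.0.0.5": {"os": "Windows", "pci_zone": True, "patched": False, "user": "alice"},
}
USER_DIRECTORY = {
    "alice": {"domain_admin": True, "executive": False},
}
# Hypothetical stand-in for an external threat feed:
# file hash -> number of engines that flag it as malicious.
THREAT_FEED = {"abc123": 37}


def enrich(event: dict) -> dict:
    """Return a copy of the raw event with host, user, and feed context added."""
    enriched = dict(event)
    host = HOST_INVENTORY.get(event["source_ip"], {})
    enriched["host"] = host
    enriched["user"] = USER_DIRECTORY.get(host.get("user"), {})
    enriched["feed_detections"] = THREAT_FEED.get(event.get("file_hash"), 0)
    # 'Never trust, always verify': treat the alert as corroborated only if
    # the external feed also knows about the file.
    enriched["corroborated"] = enriched["feed_detections"] > 0
    return enriched


raw_event = {"source_ip": "10.0.0.5", "file_hash": "abc123"}
enriched_event = enrich(raw_event)
print(enriched_event)
```

An event for an unknown host or hash still passes through, just with empty context and `corroborated` set to `False`.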

Correlation and Scoring

Once internal and external data has been gathered about a given event and its target(s), FIDO seeks to correlate the information with other data it has seen and score the event to facilitate ultimate disposition. The correlation component serves several functions. First, have multiple detectors identified this same issue? If so, it could potentially be a more serious threat. Second, has one of your detectors already blocked or remediated the issue (for example, a network-based malware detector identifies an issue, and a separate host-based system repels the same item)? If the event has already been addressed by one of your controls, FIDO may simply provide a notification that requires no further action. The following image gives a sense of how the various scoring components work together.


Scoring is multi-dimensional and highly customizable in FIDO. Essentially, scoring lets you tune FIDO’s response to the threat and your own organization’s unique requirements. FIDO implements separate scoring for the threat, the machine, and the user, and rolls the separate scores into a total score. Scoring allows you to treat PCI systems differently from lab systems, customer service representatives differently from engineers, and new event sources differently from event sources with which you have more experience (and perhaps trust). Scoring leads into the last phase of FIDO’s operation - Notification and Enforcement.
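As a rough illustration of rolling separate threat, machine, and user scores into a total, consider the Python sketch below. The weights, point values, and function names are all hypothetical and chosen only to show the shape of the idea. FIDO's real scoring is configurable and is not reproduced here.

```python
# Hypothetical weights: how much each dimension contributes to the total.
WEIGHTS = {"threat": 0.5, "machine": 0.3, "user": 0.2}


def threat_score(feed_detections: int, detector_count: int) -> float:
    """0-100: more feed hits and more corroborating detectors score higher."""
    base = min(feed_detections * 2, 80)
    bonus = 20 if detector_count > 1 else 0
    return float(min(base + bonus, 100))


def machine_score(in_pci_zone: bool, patched: bool) -> float:
    """0-100: sensitive or unpatched machines score higher."""
    score = 30.0
    if in_pci_zone:
        score += 50
    if not patched:
        score += 20
    return score


def user_score(is_domain_admin: bool, is_executive: bool) -> float:
    """0-100: privileged or high-profile users score higher."""
    score = 20.0
    if is_domain_admin:
        score += 50
    if is_executive:
        score += 30
    return score


def total_score(threat: float, machine: float, user: float) -> float:
    """Weighted sum of the three separate scores."""
    return (WEIGHTS["threat"] * threat
            + WEIGHTS["machine"] * machine
            + WEIGHTS["user"] * user)


# The same threat, seen by two detectors, on two different targets.
threat = threat_score(feed_detections=37, detector_count=2)
pci_total = total_score(threat, machine_score(True, False), user_score(True, False))
lab_total = total_score(threat, machine_score(False, True), user_score(False, False))
print(f"PCI host, domain admin: {pci_total:.0f}")
print(f"Lab host, regular user: {lab_total:.0f}")
```

With these made-up weights, the identical threat scores much higher on an unpatched PCI host used by a domain administrator than on a patched lab machine, which is the kind of differentiation described above.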

Notification and Enforcement

In this phase, FIDO determines and executes a next action based on the ingested event, collected data, and calculated scores. This action may simply be an email to the security team with details or storing the information for later retrieval and analysis. Or, FIDO may implement more complex and proactive measures such as disabling an account, ending a VPN session, or disabling a network port. Importantly, the vast majority of enforcement logic in FIDO has been Netflix-specific. For this reason, we’ve removed most of this logic and code from the current OSS version of FIDO. We will re-implement this functionality in the OSS version when we are better able to provide the end-user reasonable and scalable control over enforcement customization and actions.
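The decision step can be pictured as a mapping from a total score to one of the actions mentioned above. The sketch below is purely illustrative: the thresholds, the ordering of actions, and the function name are assumptions, and it only returns the name of an action rather than performing any enforcement.

```python
def choose_action(total_score: float, already_remediated: bool = False) -> str:
    """Pick a next action from a 0-100 total score. Thresholds are hypothetical."""
    if already_remediated:
        # Another control already handled the issue: notification only.
        return "notify_only"
    if total_score >= 90:
        return "disable_network_port"
    if total_score >= 75:
        return "disable_account"
    if total_score >= 50:
        return "email_security_team"
    return "store_for_later_analysis"


for score in (95, 80, 60, 20):
    print(score, choose_action(score))
print(95, choose_action(95, already_remediated=True))
```

Keeping the decision separate from the code that carries it out is one way to let each organization plug in its own enforcement logic.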

Open Items & Future Plans

Netflix has been using FIDO for a bit over 4 years, and while it is meeting our requirements well, we have a number of features and improvements planned. On the user interface side, we are planning for an administrative UI with dashboards and assistance for enforcement configuration. Additional external integrations planned include PAN, OpenDNS, and SentinelOne. We're also working on improvements around correlation and host detection. And, because it's now OSS, you are welcome to suggest and submit your own improvements!

(Netflix's FIDO - Fully Integrated Defense Operation - is not a part of or service of the FIDO Alliance)

-Rob Fry, Brooks Evans, Jason Chan