Monday, June 15, 2015

NTS: Real-time Streaming for Test Automation

by Peter Hausel and Jwalant Shah

Netflix Test Studio



Netflix members can enjoy instant access to TV shows and movies on over 1,400 different device/OS permutations. Assessing long-duration playback quality and delivering a great member experience across such a diverse set of playback devices presented a huge challenge to the team.


Netflix Test Studio (NTS) was created to give internal and external developers a consistent way to deploy and execute tests. This is achieved by abstracting away device differences. NTS also provides a standard set of tools for assessing the responsiveness and quality of the overall experience. NTS now runs over 40,000 long-running tests each day on over 600 devices around the world.


Overview


NTS is a cloud-based automation framework that lets you remotely control most Netflix Ready Devices. In this post we’ll focus on two key aspects of the framework:
  • Collecting test results in near real time.
    • A highly event-driven architecture makes this possible: JSON snippets are sent from the single-page UI to the device, and JavaScript listeners on the device fire events back. We also need to be able to play back events as they happened, just like a state machine.
  • Allowing testers to interact with both the device and various Netflix services during execution.
    • Integrated tests require control of the test execution stream in order to simulate real-world conditions: we want to be able to simulate failures, and to pause, debug, and resume during test execution.


A typical user interface for Test Execution using NTS

A Typical NTS Test:



Architecture overview

The early implementation of NTS had a relatively simple design: hijack a Netflix Ready Device for automation via various redirection methods, then have a Test Harness (test executor) coordinate the execution with the help of a central, public-facing Controller service. We would then get data out of the device via long polling, validate steps, and bubble validation results back up to the client. We built a separate cluster of this architecture for each Netflix SDK version.

Original Architecture using Long Polling


Event playback is not supported

This model worked relatively well in the beginning. However, as the number of supported devices, SDKs, and test cases grew, we started seeing the limitations of this approach: messages were sometimes lost, there was no way of knowing exactly what had happened, error messages were misleading, and tests were hard to monitor and play back in real time. Finally, maintaining nearly identical clusters with different test content and SDK versions added a maintenance burden of its own.

In the next iteration of the tool, we removed the Controller service and most of the polling by introducing a WebSocket proxy (built on top of JSR-356) that sits between the clients and the Test Executors. We also adopted JSON-RPC as the command protocol.
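
To give a concrete feel for the command protocol, here is a minimal sketch of how a test command could be serialized as a JSON-RPC 2.0 request using Jackson. The method name and parameters are hypothetical placeholders, not the actual NTS command set.

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class CommandExample {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // Build a JSON-RPC 2.0 request envelope.
        ObjectNode request = mapper.createObjectNode();
        request.put("jsonrpc", "2.0");
        request.put("id", 42);                  // correlates the response with this request
        request.put("method", "startPlayback"); // hypothetical command name

        // Hypothetical command parameters.
        ObjectNode params = request.putObject("params");
        params.put("sessionId", "test-session-123");
        params.put("titleId", 70143836);
        params.put("positionMs", 0);

        // This string would be pushed to the device over the WebSocket channel.
        System.out.println(mapper.writeValueAsString(request));
    }
}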

Updated Version - Near-Realtime (Almost There)


Pub/Sub without event playback support

  • The Test Executor submits events, as a time series, to a WebSocket bus that terminates at the Dispatcher.
  • The client connects to a Dispatcher with session ID information; there is a one-to-many relationship between a Dispatcher and Test Executors.
  • Each Dispatcher instance keeps an internal lookup table from test execution session IDs to the WebSocket connections of Test Executors, and delivers messages received over those connections to the client (a minimal sketch of such an endpoint follows this list).
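
For illustration, a stripped-down dispatcher endpoint along these lines could look like the following JSR-356 sketch. The endpoint path and the fan-out logic are simplifying assumptions for the example, not the production NTS code.

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;

import javax.websocket.OnClose;
import javax.websocket.OnMessage;
import javax.websocket.OnOpen;
import javax.websocket.Session;
import javax.websocket.server.PathParam;
import javax.websocket.server.ServerEndpoint;

// Clients subscribe to a test execution session; Test Executors publish into it.
@ServerEndpoint("/events/{sessionId}")
public class DispatcherEndpoint {

    // Test execution session ID -> WebSocket sessions of subscribed clients.
    private static final Map<String, Set<Session>> SUBSCRIBERS = new ConcurrentHashMap<>();

    @OnOpen
    public void onOpen(Session socket, @PathParam("sessionId") String sessionId) {
        SUBSCRIBERS.computeIfAbsent(sessionId, id -> new CopyOnWriteArraySet<>()).add(socket);
    }

    @OnMessage
    public void onMessage(String event, Session socket, @PathParam("sessionId") String sessionId) {
        // An event published into the session is fanned out to every other subscriber.
        for (Session subscriber : SUBSCRIBERS.getOrDefault(sessionId, Set.of())) {
            if (subscriber.isOpen() && !subscriber.equals(socket)) {
                subscriber.getAsyncRemote().sendText(event);
            }
        }
    }

    @OnClose
    public void onClose(Session socket, @PathParam("sessionId") String sessionId) {
        Set<Session> subscribers = SUBSCRIBERS.get(sessionId);
        if (subscribers != null) {
            subscribers.remove(socket);
        }
    }
}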

This approach solved most of our issues: fewer indirections, real-time streaming capabilities, and a push-based design. Two problems remained: message durability was still not supported and, more importantly, the WebSocket proxy was difficult to scale out due to its stateful nature.

At this point, we started looking into Apache Kafka to replace the internal WebSocket layer with a distributed pub/sub and message queue solution.

Current Version - Kafka
Pub/Sub with event playback support

A few interesting properties of this pub/sub system:
  • The Dispatcher is responsible for handling client requests to subscribe to a test execution event stream.
  • Kafka provides a scalable message queue between the Test Executors and the Dispatcher. Since each session ID is mapped to a particular partition and each message sent to the client includes its Kafka offset, we can now guarantee reliable delivery of messages to clients, with support for replaying messages after a network reconnection (see the sketch after this list).
  • Multiple clients can subscribe to the same stream without additional overhead, and admin users can view and monitor remote users' test executions in real time.
  • The same stream is consumed for analytics purposes as well.
  • Throughput/Latency: during load testing, we consistently saw ~90-100ms latency per message with 100 concurrent users (our test setup was 6 brokers deployed on 6 d2.xlarge instances). In our production system, latency is often lower due to batching.
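
To illustrate the partition-per-session and offset-based replay ideas, here is a minimal sketch using the current Kafka Java clients. The topic name, partition-selection scheme, and replay entry point are assumptions made for the example rather than the actual NTS implementation.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class SessionEventStream {

    private static final String TOPIC = "test-execution-events"; // hypothetical topic name
    private static final int PARTITIONS = 32;                    // hypothetical partition count

    // Deterministically map a test execution session ID to a partition,
    // so all events for one session are totally ordered.
    static int partitionFor(String sessionId) {
        return Math.floorMod(sessionId.hashCode(), PARTITIONS);
    }

    // Test Executor side: publish an event for a session.
    static void publish(KafkaProducer<String, String> producer, String sessionId, String eventJson) {
        producer.send(new ProducerRecord<>(TOPIC, partitionFor(sessionId), sessionId, eventJson));
    }

    // Dispatcher side: replay a session's events starting after the offset
    // the client last acknowledged (e.g. following a network reconnection).
    static void replayFrom(Properties consumerConfig, String sessionId, long lastSeenOffset) {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerConfig)) {
            TopicPartition tp = new TopicPartition(TOPIC, partitionFor(sessionId));
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, lastSeenOffset + 1);

            for (ConsumerRecord<String, String> event : consumer.poll(Duration.ofMillis(500))) {
                if (sessionId.equals(event.key())) {
                    // Forward the event (and its offset) to the client over the WebSocket.
                    System.out.printf("offset=%d event=%s%n", event.offset(), event.value());
                }
            }
        }
    }
}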

Where do we go from here?

With HTTP/2 on the horizon, it’s unclear where WebSockets will fit in the long run. That said, if you need a TCP-based, persistent channel today, you don’t have a better option. While we are actively migrating from JSR-356 (and Tomcat WebSocket) to RxNetty due to numerous issues we ran into, we continue to invest more in WebSockets.

As for Kafka, the transition was not problem-free either. But Kafka solved some very hard problems for us (a distributed event bus, message durability, consuming a stream both as a distributed queue and as pub/sub, etc.) and, more importantly, it opened the door to further decoupling. As a result, we are moving forward with our strategic plan to use this technology as the unified backend for our data pipeline needs.

(Engineers who worked on this project: Jwalant Shah, Joshua Hua, Matt Sun)

Thursday, June 4, 2015

Localization Technologies at Netflix

The localization program at Netflix is centered around linguistic excellence, a great team environment, and cutting-edge technology. The program is only 4 years old, which is unusual for a company of our size. We’ve built a team and toolset representative of the scope and scale that a localization team needs to operate at in 2015, not one that is bogged down with years of legacy process and technology, as is often the case.
We haven’t been afraid to experiment with new localization models and tools, going against localization industry norms and achieving great things along the way. At Netflix we are given the freedom to trailblaze.
In this blog post we’re going to take a look at two major pieces of technology we’ve developed to assist us on our path to global domination…
Netflix Global String Repository
Having great content by itself is not enough to make Netflix successful; how the content is presented has a huge impact. Having an intuitive, easy to use, and localized user interface (UI) contributes significantly to Netflix's success. Netflix is available on the web and on a vast number of devices and platforms including Apple iOS, Google Android, Sony PlayStation, Microsoft Xbox, and TVs from Sony, Panasonic, etc. Each of these platforms has its own standards for internationalization, and that poses a challenge to our localization team.
Here are some situations that require localization of UI strings:
- New languages are introduced
- New features are developed
- Fixes are made to current text data
Traditionally, getting UI strings translated is a high-touch process where a localization PM partners with a dev team to understand where to get the source strings from, what languages to translate them into, and where to deliver the final localized files. This gets further complicated when multiple features are being developed in parallel using different branches in Git.
Once translations are completed and the final files delivered, an application typically goes through a build, test and deploy process. For device UIs, a build might need additional approval from a third party like Apple. This causes unnecessary delays, especially in cases where a fix to a string needs to be rolled out immediately.
What if we could make this whole process transparent to the various stakeholders, developers and localization alike? What if we could make builds unnecessary when fixes to text need to be delivered?
In order to answer those questions we have developed a global repository for UI strings, called Global String Repository, that allows teams to store their localized string data and pull it out at runtime. We have also integrated Global String Repository with our current localization pipeline making the whole process of localization seamless. All translations are available immediately for consumption by applications.
Global String Repository allows isolation through bundles and namespaces. A bundle is a container for string data across multiple languages. A namespace is a placeholder for bundles that are being worked upon. There is a default namespace that is used for publishing. A simple workflow would be:
  1. A developer makes a change to the English string data in a bundle in a namespace
  2. Translation workflows are automatically triggered
  3. A linguist completes the translation workflow
  4. Translations are made available to the bundle in the namespace
Applications have a choice when integrating with Global String Repository:
  • Runtime: Allows fast propagation of changes to UIs
  • Build time: Uses Global String Repository solely for localization but packages the data with the builds
Global String Repository allows build time integration by making all necessary localized data available through a simple REST API.
We expose the Global String Repository via the Netflix edge APIs, and it is subject to the same scaling and availability requirements as the other metadata APIs. It is a critical piece, especially for applications that integrate at runtime. With over 60 million customers, a large portion of whom stream Netflix on devices, Global String Repository is in the critical path.
True to the Netflix way, Global String Repository is composed of a back-end microservice and a UI. The microservice is built as a Java web application using Apache Cassandra and Elasticsearch. It is deployed in AWS across 3 regions. We collect telemetry for every API interaction.
The Global String Repository UI is developed using Node.js, Bootstrap and Backbone and is also deployed in the AWS cloud.
On the client side, Global String Repository exposes REST APIs to retrieve string data and also offers a Java client with built-in caching.
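
As a rough illustration of runtime integration, the sketch below fetches a bundle's strings over REST and caches them in memory. The endpoint path and JSON shape are hypothetical, and the real Java client does considerably more (refresh, fallback, telemetry).

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

public class StringRepositoryClient {

    private final HttpClient http = HttpClient.newHttpClient();
    private final ObjectMapper mapper = new ObjectMapper();

    // Cache key "bundle/locale" -> key/value string data.
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    private final String baseUrl; // e.g. "https://strings.example.com" (hypothetical)

    public StringRepositoryClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    public String get(String bundle, String locale, String key) {
        Map<String, String> strings = cache.computeIfAbsent(bundle + "/" + locale,
                k -> fetchBundle(bundle, locale));
        return strings.getOrDefault(key, key); // fall back to the key itself if missing
    }

    private Map<String, String> fetchBundle(String bundle, String locale) {
        try {
            // Hypothetical endpoint: bundles are published from the default namespace.
            URI uri = URI.create(baseUrl + "/v1/bundles/" + bundle + "/" + locale);
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            return mapper.readValue(response.body(), new TypeReference<Map<String, String>>() {});
        } catch (Exception e) {
            throw new RuntimeException("Failed to load bundle " + bundle + "/" + locale, e);
        }
    }
}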
While we have Global String Repository up and running, there is still a long way to go. Some of the things we are currently working on are:
- Enhancing support for quantity strings (plurals) and gender-based strings (see the example after this list)
- Making the solution more resilient to failures
- Improving scalability
- Supporting multiple export formats (Android XML, Microsoft .Resx, etc)
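
To show why first-class plural support matters, here is a small JDK-only example using a choice subformat in java.text.MessageFormat; it illustrates the problem, not the Global String Repository's own format.

import java.text.MessageFormat;
import java.util.Locale;

public class PluralExample {
    public static void main(String[] args) {
        // A single English pattern hides three grammatical cases; many languages have more,
        // which is why plural handling needs to live in the string data, not in code.
        String pattern = "{0,choice,0#No titles|1#One title|1<{0,number,integer} titles} in your list";
        MessageFormat format = new MessageFormat(pattern, Locale.US);

        for (int count : new int[] {0, 1, 5}) {
            System.out.println(format.format(new Object[] {count}));
        }
        // Output: "No titles in your list", "One title in your list", "5 titles in your list"
    }
}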
The Global String Repository has no binding to Netflix's business domain, so we plan on releasing it as open source software.
Hydra
Netflix, as a soon-to-be global service, supports many locales across a myriad of device/UI combinations; testing this manually just does not scale. Previously, members of the localization and UI teams would manually use actual devices, from game consoles to iOS and Android, to see all of these strings in context and test both the content and any UI issues, such as truncation.
At Netflix, we think there is always a better way; with that attitude we rethought how we do in-context, on-device localization testing, and Hydra was born.
The motivation behind Hydra is to catalogue every possible unique screen and allow anyone to see a specific set of screens that they are interested in, across a wide range of filters including devices and locales. For example, as a German localization specialist you could, by selecting the appropriate filters, see the non-member flow in German across PS3, Website and Android. These screens can then be reviewed in a fraction of the time it would take to get to all of those different screens across those devices.
How Screens Reach Hydra
Hydra itself does not take any of the screens; it serves to catalogue and display them. To get screens into Hydra, we leverage our existing UI automation. Through Jenkins CI jobs, data-driven tests are run in parallel across all supported locales to take screenshots and post those screens to Hydra with appropriate metadata, including page name, feature area, major UI platform, and one critical piece of metadata: the unique screen definition.
The purpose of the unique screen definition is to have a full catalogue of screens without any unnecessary overlap. This reduces the number of screens that need to be reviewed and, longer term, makes it possible to compare a given screen against itself over time. The definition of a unique screen differs from UI to UI; for the browser it is a combination of page name, browser, resolution, locale, and dev environment (a sketch of such a key follows).
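
As a rough sketch, a browser unique screen definition could be modeled as a small value class whose key combines those fields. The field names are assumptions based on the description above, not Hydra's actual schema.

import java.util.Objects;

// Hypothetical value class for the browser UI's unique screen definition.
public final class ScreenDefinition {
    private final String pageName;
    private final String browser;
    private final String resolution;  // e.g. "1920x1080"
    private final String locale;      // e.g. "da-DK"
    private final String environment; // e.g. "test" or "prod"

    public ScreenDefinition(String pageName, String browser, String resolution,
                            String locale, String environment) {
        this.pageName = pageName;
        this.browser = browser;
        this.resolution = resolution;
        this.locale = locale;
        this.environment = environment;
    }

    // The key under which a screenshot is catalogued; two posts with the same key
    // are treated as the same screen, so the catalogue stays free of duplicates.
    public String uniqueKey() {
        return String.join("|", pageName, browser, resolution, locale, environment);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ScreenDefinition && uniqueKey().equals(((ScreenDefinition) o).uniqueKey());
    }

    @Override
    public int hashCode() {
        return Objects.hash(uniqueKey());
    }
}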
The Technology
Hydra is a full-stack web application deployed to AWS. The Java-based backend has two main functions: it processes incoming screenshots and exposes data to the frontend through REST APIs. When the UI automation posts a screen to Hydra, the image file itself is written to S3, allowing for more or less infinite storage, and the much smaller metadata is written to an RDS database so it can be queried later through the REST APIs. The REST endpoints provide a mapping of query string params to MySQL queries.
For example:
REST/v1/lists/distinctList?item=feature&selectors=uigroup,TVUI;area,signupwizard;locale,da-DK
This call would essentially map to this query to populate the values for the ‘feature’ filter:
select distinct feature where uigroup = ‘TVUI’ AND area = ‘signupwizard’ AND locale = ‘da-DK’
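
A minimal sketch of how such an endpoint could translate the query string into a parameterized query is shown below. The table name, the column whitelist, and the JDBC wiring are illustrative assumptions, not Hydra's actual code.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class DistinctListQuery {

    // Whitelist of filterable columns, so selector names are never interpolated blindly.
    private static final Set<String> COLUMNS = Set.of("uigroup", "area", "locale", "feature", "browser");

    // selectors look like "uigroup,TVUI;area,signupwizard;locale,da-DK"
    public static List<String> distinctValues(Connection db, String item, String selectors) throws SQLException {
        if (!COLUMNS.contains(item)) {
            throw new IllegalArgumentException("Unknown item: " + item);
        }

        StringBuilder sql = new StringBuilder("SELECT DISTINCT " + item + " FROM screens WHERE 1=1"); // hypothetical table
        List<String> values = new ArrayList<>();
        for (String selector : selectors.split(";")) {
            String[] pair = selector.split(",", 2);
            if (pair.length != 2 || !COLUMNS.contains(pair[0])) {
                throw new IllegalArgumentException("Bad selector: " + selector);
            }
            sql.append(" AND ").append(pair[0]).append(" = ?");
            values.add(pair[1]);
        }

        List<String> results = new ArrayList<>();
        try (PreparedStatement stmt = db.prepareStatement(sql.toString())) {
            for (int i = 0; i < values.size(); i++) {
                stmt.setString(i + 1, values.get(i));
            }
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    results.add(rs.getString(1));
                }
            }
        }
        return results;
    }
}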
The JavaScript frontend, which leverages knockout.js, allows users to select filters and view the screens that match them. The content of the filters, as well as the screens matching the filters already selected, are both provided by calls to the REST endpoints mentioned above.
Allowing for Scale
With Hydra in place and the automation running, adding support for new locales becomes as easy as adding one line to an existing property file that feeds the TestNG data provider (sketched below). The screens in the new locale will then flow in with the next Jenkins builds that run.
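
For illustration, a locale-driven TestNG data provider along those lines might look like the sketch below; the property file name and test structure are hypothetical.

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class LocalizedScreenshotTest {

    // supported-locales.properties might contain a single line such as:
    //   locales=en-US,da-DK,de-DE,fr-FR
    // Adding a new locale to that line is all it takes to include it in the next run.
    @DataProvider(name = "locales", parallel = true)
    public Object[][] locales() throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream("supported-locales.properties")) {
            props.load(in);
        }
        String[] locales = props.getProperty("locales", "en-US").split(",");
        Object[][] data = new Object[locales.length][1];
        for (int i = 0; i < locales.length; i++) {
            data[i][0] = locales[i].trim();
        }
        return data;
    }

    @Test(dataProvider = "locales")
    public void captureScreens(String locale) {
        // Drive the UI in the given locale, take screenshots, and post them to Hydra
        // with their metadata (page name, feature area, UI platform, unique screen definition).
    }
}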
Next Steps
One known improvement is a mechanism to know when a screen has changed. In its current state, if a string changes there is nothing that automatically identifies that the screen has changed. Hydra could evolve into more or less a work queue: localization experts could log in and see only the specific set of screens that have changed.
Another feature would be the ability to map individual string keys to the screens they appear on. This would allow a translator to change a string, search for that string key, and see the screens affected by that change. The translator could then see the string change in context before even making it.
If what we’re doing here at Netflix with regards to localization technology excites you, please take a moment to review the open positions on our Localization Platform Engineering team:


We like big challenges and have no shortage of them to work on. We currently operate in 50 countries; by the end of 2016 that number will grow to 200. Netflix will be a truly global product, and our localization team needs to scale to support that. Challenges like these have allowed us to attract the best and brightest talent, and we’ve built a team that can do what seems impossible.

Wednesday, May 27, 2015

Netflix Streaming - More Energy Efficient than Breathing

Netflix Streaming: Energy Consumption for 2014 was 0.0013 kWh per Streaming Hour Delivered

  • 36% was from renewable sources

  • 28% was offset with renewable energy credits

  • We plan to be fully offset by 2015, and to increase the contribution of renewable sources
  • A carbon footprint of about 300g of CO2 per customer per year represents about 0.0007% of the typical US household footprint of 43,000 kg (48 tons) of CO2 per year


Since 2007, when Netflix launched its streaming service, usage has grown exponentially. Last quarter alone, our 60 million members collectively enjoyed 10 billion streaming hours worldwide.
Netflix streaming consumes energy in two main ways:
  1. The majority of our technology is operated in the Amazon Web Services (AWS) cloud platform. AWS offers us unprecedented global scale, hosting tens of thousands of virtual instances and many petabytes of data across several cloud regions.
  2. The audio-video media itself is delivered from “Open Connect” content servers, which are forward positioned close to, or inside of, ISP networks for efficient delivery.
In addition, energy is consumed by:
  1. The ISP networks, which carry the data across “the last mile” from our content servers to our customers.
  2. The “consumer premises equipment” (CPE) that includes cable or DSL modems, routers, WiFi access points, set-top boxes, and TVs, laptops, tablets, and phones.
First and foremost, we have focused on efficiency -- making sure that the technology we have built and use is as efficient as possible, which helps with all four components: those for which Netflix is responsible, and those associated with ISP operations and consumer choices.  Then we have focused on procuring renewables or offsets for the power that our own systems consume.

AWS Footprint

Because Netflix relies more heavily on AWS regions that are powered primarily by renewable energy (including the carbon-neutral Oregon region), our energy mix is approximately 50% from renewable sources today. We mitigate all of the remaining carbon emissions, which added up to approximately 10,200 tons of CO2e in 2014, by investing in renewable energy credits (RECs) in the geographic areas that host our cloud footprint; last year, the majority went to RECs for wind projects in North America, with the remainder going to Guarantees of Origin (GOs) for hydropower in Europe.
Purchasing renewable energy credits (RECs) allows us to be carbon-neutral in the cloud, but our main strategy is to be more efficient and consume less energy in the first place. Back in the data center days, long provisioning cycles and spikes in customer demand required us to maintain large capacity buffers that went unused most of the time: overall server utilization percentage was in the single digits. Thanks to the elasticity of the cloud, we are able to instantaneously grow and shrink our capacity along with customer demand, generally keeping our server utilization above 50%. This brought significant benefits to our bottom line (moving to the cloud reduced our server-side costs per streaming hour by 85%), but also allowed us to drastically improve our carbon efficiency.
Open Connect Footprint
Open Connect, the Netflix Content Delivery Network, was designed with power efficiency in mind. Today, the entirety of Netflix’s content delivery servers consumes 1.4 megawatts of power. While these servers are located in hundreds of locations across the globe, a majority of them are hosted with major colocation vendors who share our interest in ensuring a bright future for renewable energy.
As we have evolved Open Connect, we have reduced the energy consumption of our servers significantly. At our 2012 launch, we consumed nearly 0.6 watts per megabit per second (Mbps) of peak capacity. In 2015, our flash-based servers consume less than 0.006 watts per Mbps, a 100x improvement. Those flash-based servers generate nearly 70% of Netflix’s global traffic footprint.
When choosing where to locate Open Connect CDN servers, sustainability is a key metric used to evaluate our potential partners. It’s important that our data center providers commit to 100% green power through RECs and that they continue to find new and innovative ways to become carbon neutral.  One such example is Equinix’s experiment with Bloom Energy fuel cells in its SV5 data center in San Jose, one of the facilities in which Netflix equipment is colocated.  Equinix recently announced a major initiative to adopt 100% clean and renewable energy across their global platform. We have a goal to work with datacenter operators to increase their use of renewable sources of power, and we expect to buy offsets for 100% of any power that is not from renewable sources for 2015 and beyond.
We estimate that our Open Connect servers used non-renewable power responsible for about 7,500 tons of CO2e in 2014.

ISPs

While we don’t control the energy choices of ISPs, we have engineered our Open Connect media servers to minimize the need for routers by providing routing technology as part of the package. An ISP that chooses to interconnect directly with Netflix can usually use a smaller, cheaper, and much more power-efficient switch instead of a router to bring Netflix traffic onto its network. In some cases, avoiding the need for a router might eliminate three quarters of the power footprint of a particular deployment.

Consumer Premise Equipment

The energy footprint of consumers’ home equipment (shared among the various entertainment and computing uses in their homes) dwarfs all the upstream elements by perhaps two orders of magnitude. Our focus here has been to make the streaming technology we provide for smart TVs, set-top boxes, game consoles, tablets, phones, and computers as efficient as possible. For example, a big focus for the 2015 smart TV platforms has been suspend and resume capabilities, which ensure that Netflix can be started quickly from a powered-down state and help TV manufacturers build Energy Star compliant TVs that don’t waste energy while the user is not watching. This is one of several components in our “Netflix Recommended TV” program. Similarly, our choice of encoder technology takes into account the hardware acceleration capabilities of devices such as smartphones, tablets, and laptop graphics chips, which can reduce the power consumption of video rendering; this might extend tablet battery life by 4x, with a matching reduction in the total power consumed by streaming activity.
A typical household watching Netflix might use 5W for the cable modem, 10W for the WiFi access point, and 100W for the smart TV. That 115Wh of home power per hour of viewing is responsible for about 70g of CO2e (implying a grid intensity of roughly 600g CO2e per kWh).
We encourage our CE partners to make energy-wise designs, but ultimately the choices that customers make are also governed by their other home entertainment and computing needs and desires, and accordingly we don’t measure or attempt to offset those impacts.

Comparisons

In 2014, Netflix infrastructure generated only 0.5g of CO2e emissions for each hour of streaming. The average human breathing emits about 40g per hour, roughly 80 times as much. Sitting still while watching Netflix probably saves more CO2 than Netflix burns.
The amount of carbon equivalent emitted in order to produce a single quarter-pound hamburger can power Netflix infrastructure to enable viewing by 10 member families for an entire year!
A viewer who turned off their TV to read books would get through about 24 books a year in the equivalent time, for a carbon footprint of around 65kg CO2e, over 200 times more than Netflix's streaming servers, while the 100W reading light they might use would match the consumption of the TV they could have watched instead!