Friday, March 29, 2013

NetflixOSS Meetup Series 1 Episode 2 - Announcing the Cloud Prize

by Adrian Cockcroft

On March 13th we held our second NetflixOSS meetup. It was well attended and used the same format of some presentations followed by demonstrations of the latest projects. Videos and slides are embedded below.

We started off with lightning talks by the engineers who own Karyon, our RSS Reader Recipe, EVCache, Denominator, Aminator, Netflix Graph, and Continuous integration services for all projects. Coming on the same day that Google announced that they are ending support for Google Reader, we wonder if anyone wants to use our recipe as the basis for a replacement?

Ruslan then announced that our first project, Curator, a set of Apache ZooKeeper recipes, is now an Apache Incubator project and is on a path to eventually become part of ZooKeeper itself. It’s an example of giving up control of a body of code in return for wider adoption of the technology.

Looking ahead Ruslan talked about upcoming projects to release more monkeys into the wild, our Genie big data analytics platform, a visualization tool for the Pig language that we will call Lipstick, and our Explorers dashboard user interface toolkit.

I then took over to make the surprise announcement for the evening. In 2012 Netflix was fortunate to win an Emmy award for our streaming service technology, and when we were thinking about how to inspire engineers, we thought about how proud everyone was that we had collectively won an Emmy.




Wouldn’t it be great if we could give prizes for contributions to the NetflixOSS platform?
The contributions would have to be Apache licensed and hosted on GitHub, like our own code.

What would the prize categories be? 

How about these ten:
  • Best example application mash-up
  • Best new monkey
  • Best contribution to code quality
  • Best new feature
  • Best contribution to operational tools, availability and manageability
  • Best portability enhancement
  • Best contribution to performance improvements
  • Best datastore integration
  • Best usability enhancement
  • Judges’ choice award

How long should the contest be open?
We opened the Netflix Cloud Prize contest during the meeting, and it will run for six months, until September 15th.

Who can enter and win?
Almost anyone, almost anywhere. We have to exclude current and former Netflix employees and AWS employees, and a few countries for legal reasons.

Who decides who wins?
We expect there to be a lot of entries, so a nominating committee made up of Netflix engineers and managers (who aren’t eligible to enter themselves) will track the submissions over the six-month period and come up with the best candidates in each category. The final decision on who wins will be made by a panel of independent expert judges.

We are very happy to have such a distinguished and diverse team of judges.



What are the attributes that they will be looking for in a winning submission?

  • is eligible and licensed under the Apache License, Version 2.0
  • identifies pull requests that have been made against existing NetflixOSS projects
  • provides an original and useful contribution to the NetflixOSS platform
  • follows good accepted code quality and structure practices
  • contains documentation on how to build and run code provided as part of the Submission
  • contains code that successfully builds and passes a test suite provided as part of the Submission
  • provides evidence that code is in use by other projects, or is running in production at Netflix or elsewhere
  • has a large number of watchers, stars and forks on github.com

What everyone really wanted to know at this point was the prize...

What do the ten winners get?
They get to be guests of Netflix for a visit to AWS Re:Invent in Las Vegas in November.
We’ll have a prize-giving ceremony and host a dinner with the winners.
They get a trophy. It won’t be quite as nice as the Emmy, but what we have in mind will be cool and geeky, and you will want to show it to your friends.
The big prize is that they each get $10,000 in cash from Netflix, and a $5,000 credit from AWS!

To enter the contest, you first need a GitHub account. From the Netflix site at http://netflix.github.com, go to the http://github.com/Netflix/Cloud-Prize repo, read the rules, fork the repo, fill in a form to tell us who you are, and create your submission by updating your fork of the Cloud-Prize repo.
Here’s an info-graphic that summarizes the prize:



Wrapping up, we set the date of the next NetflixOSS meetup to be about halfway through the Cloud Prize period, in June, but haven’t set an exact date yet. People who have entered the Cloud Prize and given us their email addresses will be given early sign-up privileges for future meetups.

This is the opening video that covers the lightning talks on the projects.


And here are the slides.





This video continues with the Netflix OSS Cloud Prize Announcement. Unfortunately the end of this section had a technical problem and wasn't recorded, so that part was re-recorded in an empty room a few days later.





Wednesday, March 27, 2013

System Architectures for Personalization and Recommendation

by Xavier Amatriain and Justin Basilico


In our previous posts about Netflix personalization, we highlighted the importance of using both data and algorithms to create the best possible experience for Netflix members. We also talked about the importance of enriching the interaction and engaging the user with the recommendation system. Today we're exploring another important piece of the puzzle: how to create a software architecture that can deliver this experience and support rapid innovation. Coming up with a software architecture that handles large volumes of existing data, is responsive to user interactions, and makes it easy to experiment with new recommendation approaches is not a trivial task. In this post we will describe how we address some of these challenges at Netflix.

To start with, we present an overall system diagram for recommendation systems in the following figure. The main components of the architecture contain one or more machine learning algorithms. 


The simplest thing we can do with data is to store it for later offline processing, which leads to part of the architecture for managing Offline jobs. However, computation can be done offline, nearline, or online. Online computation can respond better to recent events and user interaction, but has to respond to requests in real-time. This can limit the computational complexity of the algorithms employed as well as the amount of data that can be processed. Offline computation has fewer limitations on the amount of data and the computational complexity of the algorithms since it runs in a batch manner with relaxed timing requirements. However, it can easily grow stale between updates because the most recent data is not incorporated. One of the key issues in a personalization architecture is how to combine and manage online and offline computation in a seamless manner. Nearline computation is an intermediate compromise between these two modes in which we can perform online-like computations, but do not require them to be served in real-time. Model training is another form of computation that uses existing data to generate a model that will later be used during the actual computation of results.

Another part of the architecture describes how the different kinds of events and data need to be handled by the Event and Data Distribution system. A related issue is how to combine the different Signals and Models that are needed across the offline, nearline, and online regimes. Finally, we also need to figure out how to combine intermediate Recommendation Results in a way that makes sense for the user.

The rest of this post will detail these components of this architecture as well as their interactions. In order to do so, we will break the general diagram into different sub-systems and go into the details of each of them. As you read on, it is worth keeping in mind that our whole infrastructure runs across the public Amazon Web Services cloud.

Offline, Nearline, and Online Computation



As mentioned above, our algorithmic results can be computed either online in real-time, offline in batch, or nearline in between. Each approach has its advantages and disadvantages, which need to be taken into account for each use case.

Online computation can respond quickly to events and use the most recent data. An example is to assemble a gallery of action movies sorted for the member using the current context. Online components are subject to availability and response time Service Level Agreements (SLAs) that specify the maximum latency of the process in responding to requests from client applications while our member is waiting for recommendations to appear. This can make it harder to fit complex and computationally costly algorithms in this approach. Also, a purely online computation may fail to meet its SLA in some circumstances, so it is always important to think of a fast fallback mechanism such as reverting to a precomputed result. Computing online also means that the various data sources involved also need to be available online, which can require additional infrastructure.

On the other end of the spectrum, offline computation allows for more choices in algorithmic approach such as complex algorithms and fewer limitations on the amount of data that is used. A trivial example might be to periodically aggregate statistics from millions of movie play events to compile baseline popularity metrics for recommendations. Offline systems also have simpler engineering requirements. For example, relaxed response time SLAs imposed by clients can be easily met. New algorithms can be deployed in production without the need to put too much effort into performance tuning. This flexibility supports agile innovation. At Netflix we take advantage of this to support rapid experimentation: if a new experimental algorithm is slower to execute, we can choose to simply deploy more Amazon EC2 instances to achieve the throughput required to run the experiment, instead of spending valuable engineering time optimizing performance for an algorithm that may prove to be of little business value. However, because offline processing does not have strong latency requirements, it will not react quickly to changes in context or new data. Ultimately, this can lead to staleness that may degrade the member experience. Offline computation also requires having infrastructure for storing, computing, and accessing large sets of precomputed results.
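The play-event aggregation example above can be sketched in a few lines of plain Python. This is an illustrative toy with made-up names and data, not the actual distributed batch job:

```python
from collections import Counter

def popularity_baseline(play_events):
    """Aggregate raw play events into a popularity ranking.

    Each event is a (member_id, movie_id) pair; in a real offline job
    this would be a distributed batch query over millions of events,
    not an in-memory loop.
    """
    plays = Counter(movie for _, movie in play_events)
    # Most-played first; a precomputed list like this can later serve
    # as a fallback when an online algorithm misses its SLA.
    return [movie for movie, _ in plays.most_common()]

events = [(1, 'A'), (2, 'B'), (1, 'B'), (3, 'B'), (2, 'A'), (4, 'C')]
print(popularity_baseline(events))  # ['B', 'A', 'C']
```

Because the job runs in batch with relaxed timing, it can afford to scan the full event history, at the cost of growing stale between runs.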

Nearline computation can be seen as a compromise between the two previous modes. In this case, computation is performed exactly like in the online case. However, we remove the requirement to serve results as soon as they are computed and can instead store them, allowing it to be asynchronous. The nearline computation is done in response to user events so that the system can be more responsive between requests. This opens the door for potentially more complex processing to be done per event. An example is to update recommendations to reflect that a movie has been watched immediately after a member begins to watch it. Results can be stored in an intermediate caching or storage back-end. Nearline computation is also a natural setting for applying incremental learning algorithms.
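As a toy illustration of the nearline pattern just described (all names here are hypothetical), an event triggers a recomputation whose result is stored and served on a later request rather than inline with the event:

```python
# Results computed in response to events, served on a *later* request.
nearline_cache = {}

def on_play_event(member_id, movie_id, recommend):
    # Recompute this member's list in response to the event; nothing is
    # returned to the member here, so the work can run asynchronously.
    nearline_cache[member_id] = recommend(member_id, movie_id)

def serve_recommendations(member_id, offline_fallback):
    # A later request is served from the fresh nearline result, falling
    # back to a precomputed offline list if no result exists yet.
    return nearline_cache.get(member_id, offline_fallback)

on_play_event(42, 'Heat', lambda member, seen: ['Ronin', 'Collateral'])
print(serve_recommendations(42, ['Top 10']))  # ['Ronin', 'Collateral']
print(serve_recommendations(7, ['Top 10']))   # ['Top 10']
```

The cache in the sketch stands in for the intermediate caching or storage back-end mentioned above.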

In any case, the choice of online/nearline/offline processing is not an either/or question. All approaches can and should be combined. There are many ways to combine them. We already mentioned the idea of using offline computation as a fallback. Another option is to precompute part of a result with an offline process and leave the less costly or more context-sensitive parts of the algorithms for online computation.

Even the modeling part can be done in a hybrid offline/online manner. This is not a natural fit for traditional supervised classification applications where the classifier has to be trained in batch from labeled data and will only be applied online to classify new inputs. However, approaches such as Matrix Factorization are a more natural fit for hybrid online/offline modeling: some factors can be precomputed offline while others can be updated in real-time to create a more fresh result. Other unsupervised approaches such as clustering also allow for offline computation of the cluster centers and online assignment of clusters. These examples point to the possibility of separating our model training into a large-scale and potentially complex global model training on the one hand and a lighter user-specific model training or updating phase that can be performed online.
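The clustering variant mentioned above can be made concrete with a toy sketch (made-up data, not a Netflix algorithm): the expensive computation of cluster centers stands in for the offline batch job, while the nearest-center assignment is cheap enough to run online per request.

```python
def offline_train_centers(points, centers, rounds=10):
    """Plain k-means on 1-D taste scores; stands in for a large batch job."""
    for _ in range(rounds):
        buckets = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest current center.
            buckets[min(range(len(centers)),
                        key=lambda i: abs(p - centers[i]))].append(p)
        # Move each center to the mean of its assigned points.
        centers = [sum(b) / len(b) if b else c
                   for b, c in zip(buckets, centers)]
    return centers

def online_assign(score, centers):
    """Cheap nearest-center lookup, fast enough to run per request."""
    return min(range(len(centers)), key=lambda i: abs(score - centers[i]))

centers = offline_train_centers([0.1, 0.2, 0.15, 0.8, 0.9, 0.85], [0.0, 1.0])
print(online_assign(0.12, centers))  # 0
print(online_assign(0.95, centers))  # 1
```

The same split applies to the matrix factorization case: the global factors play the role of the centers, and the user-specific update plays the role of the assignment.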

Offline Jobs



Much of the computation we need to do when running personalization machine learning algorithms can be done offline. This means that the jobs can be scheduled to be executed periodically and their execution does not need to be synchronous with the request or presentation of the results. There are two main kinds of tasks that fall in this category: model training and batch computation of intermediate or final results. In the model training jobs, we collect relevant existing data and apply a machine learning algorithm that produces a set of model parameters (which we will henceforth refer to as the model). This model will usually be encoded and stored in a file for later consumption. Although most of the models are trained offline in batch mode, we also have some online learning techniques where incremental training is indeed performed online. Batch computation of results is the offline computation process defined above in which we use existing models and corresponding input data to compute results that will be used at a later time either for subsequent online processing or direct presentation to the user.

Both of these tasks need refined data to process, which is usually generated by running a database query. Since these queries run over large amounts of data, it can be beneficial to run them in a distributed fashion, which makes them very good candidates for running on Hadoop via either Hive or Pig jobs. Once the queries have completed, we need a mechanism for publishing the resulting data. We have several requirements for that mechanism: First, it should notify subscribers when the result of a query is ready. Second, it should support different repositories (not only HDFS, but also S3 or Cassandra, for instance). Finally, it should transparently handle errors and allow for monitoring and alerting. At Netflix we use an internal tool named Hermes that provides all of these capabilities and integrates them into a coherent publish-subscribe framework. It allows data to be delivered to subscribers in near real-time. In some sense, it covers some of the same use cases as Apache Kafka, but it is not a message/event queue system.
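The notify-on-result requirement can be sketched as a toy publish-subscribe bus. This is purely illustrative (Hermes is an internal Netflix tool and its API is not public; all names below are made up):

```python
from collections import defaultdict

class ResultBus:
    """Toy pub-sub: subscribers are told where a query's result landed."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, dataset, callback):
        self._subscribers[dataset].append(callback)

    def publish(self, dataset, repository, path):
        # Notify every subscriber; the repository could be HDFS, S3, or
        # Cassandra -- the bus is agnostic about where results live.
        for callback in self._subscribers[dataset]:
            callback(repository, path)

bus = ResultBus()
seen = []
bus.subscribe('daily-popularity', lambda repo, path: seen.append((repo, path)))
bus.publish('daily-popularity', 's3', 's3://bucket/popularity/2013-03-22')
print(seen)  # [('s3', 's3://bucket/popularity/2013-03-22')]
```

A production system would add the error handling, monitoring, and alerting called out above; the sketch only shows the notification contract.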

Signals & Models



Regardless of whether we are doing an online or offline computation, we need to think about how an algorithm will handle three kinds of inputs: models, data, and signals. Models are usually small files of parameters that have been previously trained offline. Data is previously processed information that has been stored in some sort of database, such as movie metadata or popularity. We use the term "signals" to refer to fresh information we input to algorithms. This data is obtained from live services and can be made of user-related information, such as what the member has watched recently, or context data such as session, device, date, or time.

Event & Data Distribution


Our goal is to turn member interaction data into insights that can be used to improve the member's experience. For that reason, we would like the various Netflix user interface applications (Smart TVs, tablets, game consoles, etc.) to not only deliver a delightful user experience but also collect as many user events as possible. These actions can be related to clicks, browsing, viewing, or even the content of the viewport at any time. Events can then be aggregated to provide base data for our algorithms. Here we try to make a distinction between data and events, although the boundary is certainly blurry. We think of events as small units of time-sensitive information that need to be processed with the least amount of latency possible. These events are routed to trigger a subsequent action or process, such as updating a nearline result set. On the other hand, we think of data as more dense information units that might need to be processed and stored for later use. Here the latency is not as important as the information quality and quantity. Of course, there are user events that can be treated as both events and data and therefore sent to both flows.

At Netflix, our near-real-time event flow is managed through an internal framework called Manhattan. Manhattan is a distributed computation system that is central to our algorithmic architecture for recommendation. It is somewhat similar to Twitter's Storm, but it addresses different concerns and responds to a different set of internal requirements. The data flow is managed mostly by logging via Chukwa to Hadoop for the initial steps of the process. Later we use Hermes as our publish-subscribe mechanism.

Recommendation Results


The goal of our machine learning approach is to come up with personalized recommendations. These recommendation results can be served directly from lists that we have previously computed or they can be generated on the fly by online algorithms. Of course, we can think of using a combination of both where the bulk of the recommendations are computed offline and we add some freshness by post-processing the lists with online algorithms that use real-time signals.

At Netflix, we store offline and intermediate results in various repositories to be later consumed at request time: the primary data stores we use are Cassandra, EVCache, and MySQL. Each solution has advantages and disadvantages over the others. MySQL allows for storage of structured relational data that might be required for some future process through general-purpose querying. However, the generality comes at the cost of scalability issues in distributed environments. Cassandra and EVCache both offer the advantages of key-value stores. Cassandra is a well-known and standard solution when in need of a distributed and scalable NoSQL store. Cassandra works well in some situations; however, in cases where we need intensive and constant write operations we find EVCache to be a better fit. The key issue, however, is not so much where to store the results as how to handle the requirements in a way that conflicting goals such as query complexity, read/write latency, and transactional consistency meet at an optimal point for each use case.

Conclusions

In previous posts, we have highlighted the importance of data, models, and user interfaces for creating a world-class recommendation system. When building such a system it is critical to also think of the software architecture in which it will be deployed. We want the ability to use sophisticated machine learning algorithms that can grow to arbitrary complexity and can deal with huge amounts of data. We also want an architecture that allows for flexible and agile innovation where new approaches can be developed and plugged-in easily. Plus, we want our recommendation results to be fresh and respond quickly to new data and user actions. Finding the sweet spot between these desires is not trivial: it requires a thoughtful analysis of requirements, careful selection of technologies, and a strategic decomposition of recommendation algorithms to achieve the best outcomes for our members. We are always looking for great engineers to join our team. If you think you can help us, be sure to look at our jobs page.



Friday, March 22, 2013

AMI Creation with Aminator

by Michael Tripoli & Karate Vick

Aminator is a tool for creating custom Amazon Machine Images (AMIs). It is the latest implementation of a series of AMI creation tools that we have developed over the past three years. A little retrospective on AMI creation at Netflix will help you better understand Aminator.

Building on the Basics

Very early in our migration to EC2, we knew that we would leverage some form of auto-scaling in the operation of our services. We also knew that application startup latency would be very important, especially during scale-up operations. We concluded that application instances should have no dependency on external services, be they package repositories or configuration services. The AMI would have to be discrete and hermetic. After hand rolling AMIs for the first couple of apps, it was immediately clear that a tool for creating custom AMIs was needed. There are generally two strategies for creating Linux AMIs:
  1. Create from loopback
  2. Customize an existing AMI.
The loopback method has its place for creating a foundation AMI and is analogous to a bare-metal OS installation, but it is too complex and time-consuming to automate at the scale we need. Our tools follow the latter strategy, which requires a source, or base, AMI against which customizations can be applied.

Foundation AMI

The initial component of our AMI construction pipeline is the foundation AMI. These AMIs will generally be pristine Linux distribution images, but in an AMI form that we can work with. Starting with a standard Linux distribution such as CentOS or Ubuntu, we mount an empty EBS volume, create a file system, install the minimal OS, snapshot and register an AMI based on the snapshot. That AMI and EBS snapshot are ready for the next step.

Base AMI

Most of our applications are Java / Tomcat based. To simplify development and deployment, we provide a common base platform that includes a stable JDK, recent Tomcat release, and Apache along with Python, standard configuration, monitoring, and utility packages.

The base AMI is constructed by mounting an EBS volume created from the foundation AMI snapshot, then customizing it with a meta package (RPM or DEB) that, through dependencies, pulls in other packages that comprise the Netflix base AMI. This volume is dismounted, snapshotted, and then registered as a candidate base AMI which makes it available for building application AMIs.
This base AMI goes through a candidate test and release process every week or two, which yields common stable machine configurations marching through time. Developers can choose to "aminate" against the current release base AMI, or elect to use a candidate base AMI that may have improvements that will benefit their application or contain a critical update that they can help verify.

“Get busy baking or get busy writing configs” ~ mt

Customize Existing AMIs

Phase 1: Launch and Bake

Our first approach to making application AMIs was the simplest way: customize an existing AMI by first running an instance of it, modifying that, and then snapshotting the result. There are roughly five steps in this launch / bake process.
  1. Launch an instance of a base AMI.
  2. Provision an application package on the instance.
  3. Cleanup the instance to remove state established by running the instance.
  4. Run the ec2-ami-tools on the instance to create and upload an image bundle.
  5. Register the bundle manifest to make it an AMI. 
This creates an instance-store or S3 backed AMI.

S3

While functional, this process is slow and became an impediment in the development lifecycle as our cloud footprint grew. To give an idea of how slow: an S3 bake often takes between 15 and 20 minutes. The creation of an S3 AMI is slow because it is so I/O intensive. The I/O involved in the launch / bake process includes these operations:
  1. Download an S3 image bundle.
  2. Unpack bundle into the root file system.
  3. Provision the application package.
  4. Copy root file system to local image file.
  5. Bundle the local image file.
  6. Upload the image bundle to S3.
Wouldn’t it be great if this I/O could somehow be reduced?

EBS

The advent of EBS backed AMIs was a boon to the AMI creation process. This is in large part due to the incremental nature of EBS snapshots. The launch / bake process significantly improved when we converted to EBS backed AMIs. Notice that there are fewer I/O operations (not to be confused with IOPS):
  1. Provision EBS volume.
  2. Load enough blocks from EBS to get a running OS.
  3. Provision the application package.
  4. Snapshot the root volume.
The big win here is with the amount of data being moved. First, no on-instance copying is involved. Second, considering a 100MB application package, the amount of data copied to S3 in the incremental snapshot of the root volume is roughly 7-8% of that for a typical image bundle. The resulting bake time for EBS backed AMIs is typically in the range of 8-10 minutes.

Phase 2: Bakery

The AMI Bakery was the next step in the evolution of our AMI creation tools. The Bakery was a big improvement over the launch/bake strategy as it does not customize a running instance of the base AMI. Rather, it customizes an EBS volume created from the base AMI snapshot. The time to obtain a serviceable EC2 instance is replaced by the time to create and attach an EBS volume.

The Bakery consists of a collection of bash command-line utilities installed on long-running bakery instances in multiple regions. Each bakery instance maintains a pool of base AMI EBS volumes which are asynchronously attached and mounted. Bake requests are dispatched to bakery instances from a central bastion host over ssh. Here is an outline of the bake process.
  1. Obtain a volume from the pool.
  2. Provision the application package on the volume.
  3. Snapshot the volume.
  4. Register the snapshot.
The bakery reduced AMI creation time to under 5 minutes. This improvement led to further automation by engineers around Netflix who began scripting bakery calls in their Jenkins builds. Coupled with Asgard deployment scripts, by committing code to SCM, developers can have the latest build of their application running on an EC2 instance in as little as 15 minutes.

The Bakery has been the de facto tool for AMI creation at Netflix for nearly two years but we are nearing the end of its usefulness. The Bakery is customized for our CentOS base AMI and does not lend itself to experimenting with other Linux OSs such as Ubuntu. At least, not without major refactoring of both the Bakery and the base AMI. There has also been a swell of interest in our Bakery from external users of our other open source projects but it is not suitable for open sourcing as it is replete with assumptions about our operating environment.

Phase 3: Aminator

Aminator is a complete rewrite of the Bakery but utilizes the same operations to create an AMI:

  1. Create volume.
  2. Attach volume.
  3. Provision package.
  4. Snapshot volume.
  5. Register snapshot.
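As a rough sketch, the five operations above map onto boto-style EC2 calls. This is a hypothetical helper, not Aminator's actual code; the connection object is passed in so the flow can be exercised against a stub, and the volume sizes, device names, and zone are illustrative assumptions:

```python
def bake_ami(conn, base_snapshot_id, bakery_instance_id, provision, ami_name):
    """Customize a base-AMI volume into a new EBS-backed application AMI."""
    # 1. Create an EBS volume from the base AMI's snapshot.
    volume = conn.create_volume(size=8, zone='us-east-1a',
                                snapshot=base_snapshot_id)
    # 2. Attach the volume to the instance doing the bake.
    volume.attach(bakery_instance_id, '/dev/sdf')
    # 3. Provision the application package onto the mounted volume.
    provision(volume)
    # 4. Snapshot the customized volume (incremental against the base).
    snapshot = conn.create_snapshot(volume.id, 'aminator bake')
    # 5. Register the snapshot as a new EBS-backed AMI.
    return conn.register_image(name=ami_name, snapshot_id=snapshot.id,
                               root_device_name='/dev/sda1')
```

Injecting `conn` and `provision` mirrors Aminator's plugin structure, where the cloud provider and the packaging format are swappable.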

Aminator is written in Python and uses several open source Python libraries such as boto, PyYAML, envoy and others. As released, Aminator supports EBS backed AMIs for Red Hat and Debian based Linux distributions in Amazon's EC2. It is in use within Netflix for creating CentOS-5 AMIs and has been tested against Ubuntu 12.04, but this is not the extent of its possibilities. The Aminator project is structured using a plugin architecture leveraging Doug Hellmann's stevedore library. Plugins can be written for other cloud providers, operating systems, or packaging formats.

Aminator has fewer features than the Bakery. First, Aminator does not utilize a volume pool. Pool management is an optimization that we sacrificed for agility and manageability. Second, unlike the Bakery, Aminator does not create S3 backed AMIs. Since we have a handful of applications that deploy S3 backed AMIs, we continue to operate Bakery instances. In the future, we intend to eliminate the Bakery instances and run Aminator on our Jenkins build slaves. We also plan to integrate Amazon’s cross-region AMI copy into Aminator.

Aminator offers plenty of opportunity for prospective Netflix Cloud Prize entrants. We'll welcome and consider contributions related to plugins, enhancements, or bug fixes. For more information on Aminator, see the project on github.

Tuesday, March 12, 2013

Public Continuous Integration Builds for our OSS Projects

by Gareth Bowles

As Netflix continues to open source our platform (26 projects at the last count), our GitHub repos are seeing more and more changes, both commits from the project owners and pull requests from external contributors.

To provide more visibility into the quality of those changes, we're setting up public builds for all our open source projects so that everyone can see the latest build status and test results.
  
The builds run on a public Jenkins hosted by our friends at CloudBees, using their Dev@Cloud service which is free for open source projects.

Building Commits to Master

We build and test each commit to a project's master branch.  The current build status is shown in the README on the GitHub page.

Clicking on the build status badge takes us to the Jenkins build that gets triggered on every push to the master, where we can look at the build details and test results.


Verifying Pull Requests

We automatically verify pull requests by merging the contents of the pull request with the current tip of the branch that the request was made against, then building and running tests.  The pull request builds are executed in an isolated environment to protect against malicious code.  

When the build finishes, it will add a comment to the pull request in GitHub. 


Auto-Generating Build Jobs

We use the Jenkins Job DSL plugin, available via the public Jenkins plugin repository, to automate the creation of master and pull request builds for each new OSS project.  The plugin allows Jenkins jobs to be created by writing a small amount of Groovy code; since the jobs for each project are very similar (the only significant variation is the GitHub repo URL), this is much faster than clicking through the UI and allows us to have a one-click build setup for new projects.  Here's the Groovy DSL code that sets up the master build job; it's available on GitHub at https://github.com/Netflix-Skunkworks/netflixoss-dsl-seed.
   
    job {
        name "${repo.name}-master"                   // one master build job per repo
        description ellipsize(repo.description, 255) // GitHub repo description, truncated
        logRotator(60,-1,-1,20)                      // keep 60 days of builds, 20 builds' artifacts
        scm {
            git(repo.git_url, 'master')              // build the master branch
        }
        jdk('Sun JDK 1.6 (latest)')
        configure { project ->
            // trigger a build on every push to GitHub
            project / triggers / 'com.cloudbees.jenkins.GitHubPushTrigger'(plugin:'github@1.5') / spec {
            }
        }
        steps {
            gradle('clean build')                    // standard Gradle build
        }
        publishers {
            archiveJunit('**/build/test-results/TEST*.xml')  // publish JUnit test results
        }
    }

If you would like to contribute to any of our open source projects, please take a look at http://netflix.github.com. You can follow us on Twitter at @NetflixOSS.
We will be hosting a NetflixOSS Open House on the 13th of March, 2013 (limited seats, please RSVP), during which we will showcase our public CI builds and other OSS projects.

We are constantly looking for great talent to join us and we welcome you to take a look at our Jobs page or contact @bettyatnetflix for positions in the Engineering Tools team.



Monday, March 11, 2013

Introducing the first NetflixOSS Recipe: RSS Reader

by Prasanna Padmanabhan, Shashi Madappa, Kedar Sadekar and Chris Fregly

Over the past year, Netflix has open sourced many of its components such as Hystrix, Eureka, Servo, Astyanax, Ribbon, etc.


While we continue to open source more of our components, it is also useful to open source a set of applications that tie these components together.  This is the first in the series of “Netflix OSS recipes” that we intend to open source in the coming year, showcasing the various components and how they work in conjunction with each other.  While we expect many users to cherry-pick parts of the Netflix OSS stack, we hope to show how using some of the Netflix OSS components in unison can be very advantageous.

Our goal is to illustrate the power of the Netflix OSS platform through real-life examples.  We hope to increase adoption by building out the Netflix OSS stack, increase awareness by holding more Netflix OSS meetups, and lower the barriers to entry by working on push-button deployments of our recipes in the coming months.


Netflix OSS Recipe Name: RSS Reader application

This document explains how to build a simple RSS reader application (also commonly referred to as a news aggregator).  As defined on Wikipedia, an RSS reader application does the following:

“In computing, a news aggregator, also termed a feed aggregator, feed reader, news reader, RSS reader or simply aggregator, is client software or a Web application which aggregates syndicated web content such as news headlines, blogs, podcasts, and video blogs (vlogs) in one location for easy viewing.”




The source code for this RSS recipe application can be found on GitHub.  This application uses the following open source components:


  • Archaius: Dynamic configuration properties client.
  • Astyanax: Cassandra client and pattern library.
  • Blitz4j: Non-blocking logging.
  • Eureka: Service registration and discovery.
  • Governator: Guice based dependency injection.
  • Hystrix: Dependency fault tolerance.
  • Karyon: Base server for inbound requests.
  • Ribbon: REST client for outbound requests.
  • Servo: Metrics collection.


RSS Reader Recipes Architecture




Recipes RSS Reader is composed of the following three major components:

Eureka Client/Server

All middle tier instances register with Eureka on startup under a unique logical name. The edge service later uses that logical name to discover the available middle tier instances.
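Eureka exposes its registration and discovery operations over REST, so even a non-Java client can participate. The sketch below builds a minimal registration payload and POSTs it; the host, port, and service names are placeholder values, and the exact field names should be checked against the Eureka wiki.

```python
import json
import urllib.request

def build_instance_payload(app_name, host, port, vip_address):
    """Build a minimal Eureka instance-registration payload.
    Field names follow Eureka's REST registration schema; the
    values here are illustrative placeholders."""
    return {
        "instance": {
            "hostName": host,
            "app": app_name.upper(),
            "vipAddress": vip_address,
            "ipAddr": host,
            "status": "UP",
            "port": {"$": port, "@enabled": "true"},
            "dataCenterInfo": {
                "@class": "com.netflix.appinfo.InstanceInfo$DefaultDataCenterInfo",
                "name": "MyOwn",
            },
        }
    }

def register(eureka_url, app_name, payload):
    # POST the instance to Eureka; eureka_url is a placeholder base URL.
    req = urllib.request.Request(
        f"{eureka_url}/apps/{app_name.upper()}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)  # Eureka replies 204 on success
```

The edge service side of the handshake is then a GET against the same logical name to list the instances currently registered under it.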

RSS Middle Tier Service

The RSS middle tier is responsible for fetching the contents of RSS feeds from external feed publishers, parsing them, and returning the data via REST entry points. Ribbon’s HTTP client is used to fetch the RSS feeds from external publishers. This tier is also responsible for persisting the user’s RSS subscriptions into Cassandra or into an in-memory hash map; Astyanax is used to persist data into Cassandra. The choice of data store, Cassandra host configuration, retry policies, etc. are specified in a properties file, which is accessed using Archaius.  Latencies and the success and failure rates of the calls made to Cassandra and to external publishers are captured using Servo. The base container for this middle tier is built on top of Karyon, and all log events are captured using Blitz4j.
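As a rough illustration of the kind of configuration Archaius picks up for the middle tier, a properties file along these lines selects the data store and Cassandra connection settings. The property names here are invented for illustration; the recipe's actual keys are documented in its wiki.

```properties
# Hypothetical properties, read through Archaius at runtime.
rss.store=cassandra          # or "inmemory" for local runs
cassandra.host=127.0.0.1
cassandra.port=9160
cassandra.keyspace=rss
cassandra.retry.count=3
```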

RSS Edge Service

Eureka is used to discover the middle tier instances that can fetch the RSS feeds a user has subscribed to. Hystrix is used to provide greater tolerance of latency and failures when communicating with the middle tier service. The base container for this edge service is also built on top of Karyon. A simple UI, shown below, is provided by the edge service to add, delete and display RSS feeds.
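Hystrix itself is a Java library, but the command-with-fallback idea it implements is easy to sketch. The toy Python circuit breaker below illustrates only the pattern the edge service relies on, not Hystrix's actual API: call the dependency, fall back on failure, and short-circuit once too many consecutive failures accumulate.

```python
class CircuitBreaker:
    """Toy circuit breaker in the spirit of a Hystrix command."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # consecutive failures before opening
        self.failures = 0

    def call(self, dependency, fallback):
        if self.failures >= self.threshold:
            # Circuit is open: skip the dependency entirely and
            # serve the fallback (e.g. an empty feed list).
            return fallback()
        try:
            result = dependency()
            self.failures = 0        # success closes the circuit again
            return result
        except Exception:
            self.failures += 1
            return fallback()
```

In the recipe, the "dependency" would be the Ribbon call to a middle tier instance, and the fallback a degraded response so the UI stays up while the middle tier recovers.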


The Hystrix dashboard for the Recipes RSS application during a sample run is shown below:



Getting Started

Our GitHub page for the Recipes RSS Reader has instructions for building and running the application. Full documentation is available on the wiki pages. The default configuration that ships with the application allows all of these services to run on a single machine.


Future Roadmap

In the coming months, we intend to open source other commonly used recipes that share best practices on how to integrate the various bits of Netflix OSS stack to build highly scalable and reliable cloud-based applications.  

Our next NetflixOSS Meetup will be on March 13th, and we are going to announce some exciting news at the event. Follow us on Twitter at @NetflixOSS.

Python at Netflix

By Roy Rapoport, Brian Moyles, Jim Cistaro, and Corey Bertram

We’ve blogged a lot about how we use Java here at Netflix, but Python’s footprint in our environment continues to increase.  In honor of our sponsorship of PyCon, we wanted to highlight our many uses of Python at Netflix.

Developers at Netflix have the freedom to choose the technologies best suited for the job. More and more, developers turn to Python due to its rich, batteries-included standard library, its succinct yet expressive syntax, its large developer community, and the wealth of third-party libraries available to solve a given problem. Its dynamic underpinnings enable developers to rapidly iterate and innovate, two very important qualities at Netflix. These features (and more) have led to increasingly pervasive use of Python: from small tools that use boto to talk to AWS, store information with python-memcached and pycassa, manage processes with Envoy, and poll RESTful APIs with requests, to large applications that provide web interfaces with CherryPy and Bottle and crunch data with scipy. To illustrate, here are some current projects taking advantage of Python:

Alerting

The Central Alert Gateway (CAG) is a RESTful web application written in Python to which any process can post an alert, though the vast majority of alerts are triggered by our telemetry system, Atlas (which will be open sourced in the near future).  Based on configuration, CAG can send these alerts via email to interested parties, dispatch them to our notification system to page on-call engineers, suppress them if we’ve already alerted someone, or perform automated remediation (for example, rebooting or terminating an EC2 instance that starts appearing unhealthy).  At our scale, we generate hundreds of thousands of alerts every day, and handling as many of these as possible automatically -- and making sure to notify people only of new issues rather than telling them again about something they’re already aware of -- is critical to our production efficiency (and quality of life).
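CAG itself is not open source, so the endpoint and field names below are hypothetical, but posting an alert amounts to a small JSON POST along these lines:

```python
import json
import urllib.request

CAG_URL = "http://cag.example.com/alerts"  # hypothetical endpoint

def build_alert(source, severity, message):
    """Shape of an alert a process might post to a gateway like CAG.
    The field names are illustrative, not CAG's real schema."""
    assert severity in ("info", "warning", "critical")
    return {"source": source, "severity": severity, "message": message}

def post_alert(alert):
    # Fire the alert at the gateway; CAG_URL is a placeholder.
    req = urllib.request.Request(
        CAG_URL,
        data=json.dumps(alert).encode(),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)
```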

Chaos Gorilla

We’ve talked before about how we use Chaos Monkey to make sure our services are resilient to the termination of any small number of instances.  As we’ve improved resiliency to instance failures, we’ve been working to set the reliability bar much, much higher.  Chaos Gorilla integrates with Asgard and Edda, and allows us to simulate the loss of an entire availability zone in a given region.  This sort of failure mode -- an AZ either going down or simply becoming inaccessible to other AZs -- happens once in a blue moon, but it’s a big enough problem that simulating it and making sure our entire ecosystem is resilient to that failure is very important to us.


Security Monkey and Howler Monkey

Security Monkey is designed to keep track of configuration history and alert on changes in EC2 security-related policies such as security groups, IAM roles, S3 access control lists, etc.  This makes our Cloud Security team very happy, since without it there’s no way to know when, or how, a change occurred in the environment.  

Howler Monkey is designed to automatically discover and keep track of SSL certificates in our environments and domain names, no matter where they may reside, and alert us as we get close to an SSL certificate’s expiration date, with flexible and powerful subscription and alerting mechanisms.  Because of it, we moved from having an SSL certificate expire surprisingly and with production impact about once a quarter to having no production outages due to SSL expirations in the last eighteen months.  It’s a simple tool that makes a huge difference for us and our dozens of SSL certificates.  

Chronos

We push hard to always increase our speed of innovation while reducing the cost of making changes in the environment.  In the datacenter days, we forced every production change to be logged in a change control system, because the first question everyone asks when looking at an issue is “What changed recently?”.  We found that a formal change control system didn’t work well with our culture of freedom and responsibility, so we deprecated the formal change control process for the vast majority of changes in favor of Chronos.  Chronos accepts events via a REST interface and allows humans and machines to ask questions like “What happened in the last hour?” or “What software did we deploy in the last day?”.  It integrates with our monkeys and Asgard, so the vast majority of changes in our environment are automatically reported to it, including event types such as deployments, A/B tests, security events, and other automated actions.
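Chronos is also internal, so the query shape below is illustrative rather than its real API, but asking "what happened in the last hour?" boils down to building a time window and passing it as query parameters:

```python
from datetime import datetime, timedelta, timezone

def last_hour_window(now=None):
    """Compute the [start, end] window for a 'what happened in the
    last hour?' query against an event store like Chronos."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=1)
    return start.isoformat(), end.isoformat()

def event_query(event_type, window):
    # Query parameters for a hypothetical GET /events endpoint.
    start, end = window
    return {"type": event_type, "start": start, "end": end}
```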

Aminator


Readers of the blog or those who have seen our engineers present on the Netflix Platform may have seen numerous references to baking -- our name for the process by which we take an application and turn it into a deployable Amazon Machine Image. Aminator is the tool that does the heavy lifting and produces almost every single image that powers Netflix.

Aminator attaches a foundation image to a running EC2 instance, preps the image, installs packages into the image, and turns the resultant image into a complete Netflix application. Simple in concept and execution, but absolutely critical to our success. Pre-staging images and avoiding post-launch configuration really helps when launching hundreds or thousands of instances.


Cass Ops

Netflix Cassandra Operations uses Python for automation and monitoring tools.  We have created many modules for the management and maintenance of our Cassandra clusters.  These modules use REST APIs to interface with other Netflix tools to manage our instances within AWS, as well as interfacing directly with the Cassandra instances themselves.  These activities include creating clusters using Asgard, tracking our inventory with Edda, monitoring Eureka to make sure clusters are visible to clients, managing Cassandra repairs and compactions, and doing software upgrades.  In addition to our internally developed tools, we take advantage of various Python packages.  We use JenkinsAPI to interface with Jenkins for both job configuration and status information on our monitoring and maintenance jobs.  Pycassa is used to access our operational data stored in Cassandra.  Boto gives us the ability to communicate with various AWS services such as S3.  Paramiko allows us to ssh to instances without needing to create a subprocess.  Python has allowed us to rapidly develop and enhance these tools as Cassandra has grown at Netflix.
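As a hedged sketch of what such a maintenance module might look like, the helper below builds plain nodetool commands and runs them over ssh with Paramiko; the hosts, user, and choice of commands are placeholders, not our actual tooling.

```python
def maintenance_commands(keyspace):
    """nodetool invocations for routine maintenance on one keyspace;
    flags are kept to the basics for illustration."""
    return [
        f"nodetool repair {keyspace}",
        f"nodetool compact {keyspace}",
    ]

def run_on_host(host, commands, user="cassandra"):
    """Run commands over ssh with Paramiko (imported lazily so the
    helper above stays usable without it)."""
    import paramiko
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user)
    for cmd in commands:
        stdin, stdout, stderr = client.exec_command(cmd)
        print(host, cmd, stdout.read().decode())
    client.close()
```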

Data Science and Engineering

Our Data Science and Engineering teams rely heavily on Python to help surface insights from the vast quantities of data produced by the organization.  Python is used in tools for monitoring data quality, managing data movement and syncing, expressing business logic inside our ETL workflows, and running various web applications to visualize data.  

One such application is Sting, a lightweight RESTful web service that slices, dices, and produces visualizations of large in-memory datasets.  Our data science teams use Sting to analyze and iterate against the results of Hive queries on our big data platform.  While a Hive query may take hours to complete, once the initial dataset is loaded in Sting, additional iterations using OLAP style operations enjoy sub-second response times.  Datasets can be set to periodically refresh, so results are kept fresh and up to date.  Sting is written entirely in Python, making heavy use of libraries such as pandas and numpy to perform fast filtering and aggregation operations.
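The pandas operations behind this kind of slicing and aggregation are straightforward; a minimal example (with an invented dataset, since Sting is internal) looks like:

```python
import pandas as pd

# A tiny stand-in for a dataset Sting might hold in memory; the
# columns and values are invented for illustration.
views = pd.DataFrame({
    "country": ["US", "US", "CA", "CA"],
    "device":  ["tv", "web", "tv", "web"],
    "hours":   [4.0, 1.5, 2.0, 0.5],
})

# OLAP-style slice (filter to one device) followed by an aggregation
# by country -- the kind of operation that stays sub-second once the
# dataset is already resident in memory.
tv_views = views[views["device"] == "tv"]
hours_by_country = tv_views.groupby("country")["hours"].sum()
```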

General Tooling and the Service Class

Pythonistas at Netflix have been championing the adoption of Python and striving to make its power accessible to everyone within the organization.  To do this, we wrapped libraries for many of the OSS tools now being released by Netflix, as well as a few internal services, in a general-use ‘Service’ class.  With this we have helped our users quickly and easily stand up new services that have access to many common facilities such as alerting, telemetry, Eureka, and easy AWS API access.  We expect to make many of these libraries available this year and will be around to chat about them at PyCon!
Here is an example of how easily we can stand up a service with Eureka registration, Route 53 registration, a basic status page, and a fully functional Bottle service:
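The original post showed this example as a screenshot; in its place, here is a minimal hedged sketch. The internal ‘Service’ class is not yet public, so the registration steps appear only as comments with hypothetical helper names; the Bottle usage itself follows the library's documented API.

```python
def make_status():
    # Payload for the basic status page; fields are illustrative.
    return {"status": "UP", "service": "example"}

def make_app():
    # Bottle is imported lazily so make_status() stays usable without it.
    from bottle import Bottle
    app = Bottle()

    @app.route("/healthcheck")
    def healthcheck():
        return make_status()

    # register_with_eureka(app)      # hypothetical: what the internal
    # register_route53_cname(app)    # Service class would do for us
    return app

# To serve it:  from bottle import run; run(make_app(), port=8080)
```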


These systems and applications offer a glimpse of the overall use and importance of Python at Netflix. They contribute heavily to our overall service quality, allow us to rapidly innovate, and are a whole lot of fun to work on to boot!

We’re sponsoring PyCon this year, and in addition to a slew of Netflixers attending, we’ll also have a booth in the expo area and give a talk expanding on some of the use cases above.  If any of this sounds interesting, come by and chat.  Also, we’re hiring Senior Site Reliability Engineers, Senior DevOps Engineers, and Data Science Platform Engineers.