Tuesday, January 31, 2012

Announcing Astyanax

By Eran Landau 

Over the past year we have been investing heavily in Cassandra as our primary persistent storage solution. We currently run 55 separate clusters, ranging from 6 to 48 nodes. We've been active contributors to Cassandra and have developed additional tools and our own client library. Today we are open sourcing that client, Astyanax, as part of our ongoing open source initiative. Astyanax started as a re-factor of Hector, but our experience running with a large number of diverse clusters has enabled us to tune the client for various scenarios and focus on wide range of client use cases.

What is Astyanax?

Astyanax is a Java Cassandra client. It borrows many concepts from Hector but diverges in the connection pool implementation as well as the client API. One of the main design considerations was to provide a clean abstraction between the connection pool and Cassandra API so that each may be customized and improved separately. Astyanax provides a fluent style API which guides the caller to narrow the query from key to column as well as providing queries for more complex use cases that we have encountered. The operational benefits of Astyanax over Hector include lower latency, reduced latency variance, and better error handling.
Astyanax is broken up into three separate parts:
connection poolThe connection pool abstraction and several implementations including round robin, token aware and bag of connections.
cassandra-thrift API implementationCassandra Keyspace and Cluster level APIs implementing the thrift interface.
recipes and utilitiesUtility classes built on top of the astyanax-cassandra-thrift API.

Astyanax API

Astyanax implements a fluent API which guides the caller to narrow or customize the query via a set of well defined interfaces. We've also included some recipes that will be executed efficiently and as close to the low level RPC layer as possible. The client also makes heavy use of generics and overloading to almost eliminate the need to specify serializers.
Some key features of the API include:
  • Key and column types are defined in a ColumnFamily class which eliminates the need to specify serializers.
  • Multiple column family key types in the same keyspace.
  • Annotation based composite column names.
  • Automatic pagination.
  • Parallelized queries that are token aware.
  • Configurable consistency level per operation.
  • Configurable retry policy per operation.
  • Pin operations to specific node.
  • Async operations with a single timeout using Futures.
  • Simple annotation based object mapping.
  • Operation result returns host, latency, attempt count.
  • Tracer interfaces to log custom events for operation failure and success.
  • Optimized batch mutation.
  • Completely hide the clock for the caller, but provide hooks to customize it.
  • Simple CQL support.
  • RangeBuilders to simplify constructing simple as well as composite column ranges.
  • Composite builders to simplify creating composite column names.


Recipes for some common use cases:
  • CSV importer.
  • JSON exporter to convert any query result to JSON with a wide range of customizations.
  • Parallel reverse index search.
  • Key unique constraint validation.

Connection pool

The Astyanax connection pool was designed to provide a complete abstraction from the client API layer. One of our main goals when preparing Astyanax to be open sourced was to properly decouple components of the connection pool so that others may easily plug in their customizations. For example, we have our own middle tier load balancer that keeps track of nodes in the cluster and have made it the source of seed nodes to the client.
Key features of the connection pool are:
HostSupplier/NodeAutoDiscoveryBackground task that frequently refreshes the client host list. There is also an implementation that consolidates a describe ring against the local region's list of hosts to prevent cross-regional client traffic.
TokenAwareThe token aware connection pool implementation keeps track of which hosts own which tokens and intelligently directs traffic to a specific range with fallback to round robin.
RoundRobinThis is a standard round robin implementation
BagWe found that for our shared cluster we needed to limit the number of client connections. This connection pool opens a limited number of connections to random hosts in a ring.
ExecuteWithFailoverThis abstraction lets the connection pool implementation capture a fail-over context efficiently
RetryPolicyRetry on top of the normal fail-over in ExecuteWithFailover. Fail-over addresses problems such as connections not available on a host, a host going down in the middle of an operation or timeouts. Retry implements backoff and retrying the entire operation with an entirely new context.
BadHostDetectorDetermine when a node has gone down, based on timeouts
LatencyScoreStrategyAlgorithm to determine a node's score based on latency. Two modes, unordered (round robin) ordered (best score gets priority)
ConnectionPoolMonitorMonitors all events in the connection pool and can tie into proprietary monitoring mechanisms. We found that logging deep in the connection pool code tended to be very verbose so we funnel all events to a monitoring interface so that logging and alerting may be controlled externally.
Pluggable real-time configurationThe connection pool configuration is kept by a single object referenced throughout the code. For our internal implementation this configuration object is tied to volatile properties that may change at run time and are picked up by the connection pool immediately thereby allowing us to tweak client performance at runtime without having to restart.
RateLimiterLimits the number of connections that can be opened within a given time window. We found this necessary for certain types of network outages that cause thundering herd of connection attempts overwhelming Cassandra.

A taste of Astyanax

Here's a brief code snippet to give you a taste of what the API looks like

Accessing Astyanax

Friday, January 27, 2012

Ephemeral Volatile Caching in the cloud

by Shashi Madappa

In most applications there is some amount of data that will be frequently used. Some of this data is transient and can be recalculated, while other data will need to be fetched from the database or a middle tier service. In the Netflix cloud architecture we use caching extensively to offset some of these operations.  This document details Netflix’s implementation of a highly scalable memcache-based caching solution, internally referred to as EVCache.

Why do we need Caching?

Some of the objectives of the Cloud initiative were
    • Faster response time compared to Netflix data center based solution
    • Session based App in data center to Stateless without sessions in the cloud
    • Use NoSQL based persistence like Cassandra/SimpleDB/S3

To solve these we needed the ability to store data in a cache that was Fast, Shared and Scalable. We use cache to front the data that is computed or retrieved from a persistence store like Cassandra, or other Amazon AWS’ services like S3 and SimpleDB and they can take several hundred milliseconds at the 99th percentile, thus causing a widely variable user experience.  By fronting this data with a cache, the access times would be much faster & linear and the load on these datastores would be greatly reduced. Caching also enables us to respond to sudden request spikes more effectively. Additionally, an overloaded service can often return a prior cached response; this ensures that user gets a personalized response instead of a generic response. By using caching effectively we have reduced the total cost of operation.  

What is EVCache?

EVCache is a memcached & spymemcached based caching solution that is well integrated with Netflix and AWS EC2 infrastructure.

EVCache is an abbreviation for:

Ephemeral  - The data stored is for a short duration as specified by its TTL1 (Time To Live).
Volatile  - The data can disappear any time (Evicted2).
Cache – An in-memory key-value store.

How is it used?

We have over 25 different use cases of EVCache within Netflix. A particular use case is a users Home Page. For Ex, to decide which Rows to show to a particular user, the algorithm needs to know the Users Taste, Movie Viewing History, Queue, Rating, etc. This data is fetched from various services in parallel and is fronted using EVCache by these services.


We will now detail the features including both add-ons by Netflix and those that come with memcache.

  • Overview
    • Distributed Key-Value store,  i.e., the cache is spread across multiple instances
    • AWS Zone-Aware and data can be replicated across zones.  
    • Registers and works with Netflix’s internal Naming Service for automatic discovery of new nodes/services.
    • To store the data, Key has to be a non-null String and value can be a non-null byte-array, primitives, or serializable object. Value should be less than 1 MB.
    • As a generic cache cluster that can be used across various applications, it supports an optional Cache Name, to be used as namespace to avoid key collisions.
    • Typical cache hit rates are above 99%.
    • Works well with Netflix Persister Framework7. For E.g., In-memory ->backed by EVCache -> backed by Cassandra/SimpleDB/S3
  • Elasticity and deployment ease:  EVCache is linearly scalable. We monitor capacity and can add capacity within a minute and potentially re-balance and warm data in the new node within a few minutes.  Note that we have pretty good capacity modeling in place and so capacity change is not something we do very frequently but we have good ways of adding capacity while actively managing the cache hit rate.  Stay tuned for more on this scalable cache warmer in an upcoming blog post.  
  • Latency: Typical response time in low milliseconds.  Reads from EVCache are typically served back from within the same AWS zone.  A nice side effect of zone affinity is that we don’t have any data transfer fees for reads.
  • Inconsistency: This is a Best Effort Cache and the data can get inconsistent. The architecture we have chosen is speed instead of consistency and the applications that depend on EVCache are capable of handling any inconsistency. For data that is stored for a short duration, TTL ensures that the inconsistent data expires and for the data that is stored for a longer duration we have built consistency checkers that repairs it.
  • Availability: Typically, the cluster never goes down as they are spread across multiple Amazon Availability Zones. When instances do go down occasionally, cache misses are minimal as we use consistent hashing to shard the data across the cluster.
  • Total Cost of Operations: Beyond the very low cost of operating the EVCache cluster, one has to be aware that cache misses are generally much costlier - the cost of accessing services AWS SimpleDB, AWS S3, and (to a lesser degree) Cassandra on EC2, must be factored in as well. We are happy with the overall cost of operations of EVCache clusters which are highly stable, linearly scalable.

Under the Hood

Server: The Server consist of the following:
  • memcached server.
  • Java Sidecar - A Java app that communicates with the Discovery Service6( Name Server)  and hosts admin pages.
  • Various apps that monitor the health of the services and report stats.

Client:  A Java client discovers EVCache servers and manages all the CRUD3 (Create, Read, Update & Delete) operations. The client automatically handles the case when servers are added to or removed from the cluster.  The client replicates data (AWS Zone5 based) during Create, Update & Delete Operations; on the other hand, for Read operations the client gets the data from the server which is in the same zone as the client.

We will be open sourcing this Java client sometime later this year so we can share more of our learnings with the developer community.

Single Zone Deployment

The figure below image illustrates the scenario in AWS US-EAST Region4 and Zone-A where an EVCache cluster with 3 instances has a Web Application performing CRUD operations (on the EVcache system).

  1. Upon startup, an EVCache Server instance registers with the Naming Service6 (Netflix’s internal name service that contains all the hosts that we run).
  2. During startup of the Web App, the EVCache Client library is initialized which looks up for all the EVCache server instances registered with the Naming Services and establishes a connection with them.
  3. When the Web App needs to perform CRUD operation for a key the EVCache client selects  the instance on which these operations can be performed. We use Consistent Hashing to shard the data across the cluster.

Multi-Zone Deployment

The figure below illustrates the scenario where we have replication across multiple zones in AWS US-EAST Region. It has an EVCache cluster with 3 instances and a Web App in Zone-A and Zone-B.  

  1. Upon startup, an EVCache Server instance in Zone-A registers with the Naming Service in Zone-A and Zone-B.
  2. During the startup of the Web App in Zone-A , The Web App initializes the EVCache Client library which looks up for all the EVCache server instances registered with the Naming Service and connects to them across all Zones.
  3. When the Web App in Zone-A needs to Read the data for a key, the EVCache client looks up the EVCache Server instance in Zone –A which stores this data and fetches the data from this instance.
  4. When the Web App in Zone-A needs to Write or Delete the data for a key, the EVCache client looks up the EVCache Server instances in Zone–A and Zone-B and writes or deletes it.

Case Study : Movie and TV show similarity

One of the applications that uses caching heavily is the Similars application. This application suggests Movies and TV Shows that have similarities to each other. Once the similarities are calculated they are persisted in SimpleDB/S3 and are fronted using EVCache. When any service, application or algorithm needs this data it is retrieved from the EVCache and result is returned.
  1. A Client sends a request to the WebApp requesting a page and the algorithm that is processing this requests needs similars for a Movie to compute this data.
  2. The WebApp that needs a list of similars for a Movie or TV show looks up EVCache for this data. Typical cache hit rate is above 99.9%.
  3. If there is a cache miss then the WebApp calls the Similars App to compute this data.
  4. If the data was previously computed but missing in the cache then Similars App will read it from SimpleDB. If it were missing in SimpleDB then the app Calculates the similars for the given Movie or TV show.
  5. This computed data for the Movie or TV Show is then written to EVCache.
  6. The Similars App then computes the response needed by the client and returns it to the client.

Metrics, Monitoring, and Administration

Administration of the various clusters is centralized and all the admin & monitoring of the cluster and instances can be performed via web illustrated below.

The server view below shows the details of each instance in the cluster and also rolls up by the stats for the zone. Using this tool the contents of a memcached slab can be viewed

The EVCache Clusters currently serve over 200K Requests/sec at peak loads. The below chart shows number of requests to EVCache every hour. 

The average latency is around 1 millisecond to 5 millisecond. The 99th percentile is around 20 millisecond. 
Typical cache hit rates are above 99%.

Join Us

Like what you see and want to work on bleeding edge performance and scale?  
We’re hiring !


  1. TTL : Time To Live for data stored in the cache. After this time the data will expire and when requested will not be returned.
  2. Evicted : Data associated with a key can be evicted(removed) from the cache even though its TTL has not yet exceed. This happens when the cache is running low on memory and it needs to make some space to add the new data that we are storing. The eviction is based on LRU (Least Recently Used).
  3. CRUD : Create, Read, Update and Delete are the basic functions of storage.
  4. AWS Region :  It is a Geographical region and currently in US East (virginia), US West, EU (Ireland), Asia Pacific (Singapore), Asia Pacific (Tokyo) and South America (Sao Palo).
  5. AWS Zone: Each availability zone runs on its own physically distinct and independent infrastructure. You can also think this as a data center.
  6. Naming Service : It is a service developed  by Netflix and is a registery for all the instances that run Netflix Services.
  7. Netflix Persister Framework : A Framework developed by Netflix that helps user to persist data across various datastore like In-Memory/EVCache/Cassandra/SimpleDB/S3 by providing a simple API.

by Shashi Madappa, Senior Software Engineer, Personalization Infrastructure Team

Wednesday, January 18, 2012

WebKit in Your Living Room

Hi, it's Matt Seeley, engineer on the device UI team at Netflix.  My team uses WebKit, JavaScript, HTML5 and CSS3 to build user interfaces for the PlayStation 3, Wii, Blu-ray players, Internet-connected TVs, phones and tablets.

Recently I spoke at the HTML5 Dev Conf about WebKit-based UI development on consumer electronics.  I discussed:
  • Responding to user input quickly while deferring expensive user interface updates
  • Managing main and video memory footprint of a single page application
  • Understanding WebKit's interpretation and response to changes in HTML and CSS 
  • Achieving highest possible animation frame rates using accelerated compositing
Watch the video presentation from the conference:

Slides are also available in PDF

Astute readers will realize that portions of the content are also suitable for mobile and desktop development. It's all about building great user interfaces that take best possible advantage of the device.

Interested in joining our team? We're hiring!

Auto Scaling in the Amazon Cloud

Since we began migrating our production infrastructure to the cloud in 2010, we have used Amazon's auto scaling groups to manage all of the server pools that we run. We believe auto scaling greatly improves the availability of our services and provides an excellent means of optimizing our cloud costs. With more than two years of experience using the auto scaling service, we thought it a good time to share why and how we use it, as well as some of the lessons that we have learned. For those who aren't familiar with Auto Scaling, it is an automation service Amazon provides as part of their cloud offering. Auto Scaling provides features to manage a running pool of servers, including the capability to replace failed instances and automatically grow and shrink the size of the pool. For a more thorough description, please see the Amazon Documentation.


Every running server in our production environment must be part of an auto scaling group; we even verify this with one of the simian army members, which locates and terminates any stray instances in the environment. The goal is to detect and terminate unhealthy instances as quickly as possible, we can count on them being replaced automatically by Amazon. This also applies to any hosts that are terminated by Amazon or have hardware failures. This works for both stateful and stateless services, because even our stateful services know how to setup everything they need when the AMI launches. However, most of our services are stateless, which makes this really easy to handle and provides the opportunity to also use auto scaling for optimization.

While I would say that availability is the most important use of auto scaling, cost and resource optimization is certainly its more sexy side. Being able to allocate resources based on need, and pay for them accordingly, is one of the big promises of the cloud. Very few applications have a constant workload, and Netflix is no exception. In fact, we have a large variation in the peak to trough for our usage pattern. Auto scaling allows us to vary the size of our pools, based on usage, which saves money and allows us to adapt to unforeseen spikes without having an outage or needing someone to manually size the capacity.

Making it Work

Configuring auto scaling is a complex task. Briefly, the configuration consists of three primary steps: first, identify the constraining resources (e.g. memory, CPU). Then, have a way to track the constraining resource in CloudWatch, Amazon's cloud resource and monitoring service. Finally, configure alarms and polices to take the correct action when the profile associated with your constraining resource changes. This is especially true given the only usable metric provided by AWS out of the box is CPU Utilization, which isn't a great indicator for all application types. We have created two levels of tooling, and some basic scripts to help make this process easier.

The first level of tooling is a monitoring library providing the infrastructure to export application metrics for monitoring to CloudWatch. The library provides annotations to make export easy for developers. When a field is annotated with the "@Monitor" tag, it is automatically registered with JMX, and can be published to CloudWatch based on a configurable filter. More features exist, but the critical step is to tag, for export, a field to be used by the auto scaling configuration. Look for a blog in the coming weeks discussing Netflix open sourcing this library.

The second level of tooling is a set of features built into our Netflix Application Console (slides). The tool as a whole drives our entire cloud infrastructure, but lets us focus on what we've added to make auto scaling easy. As part of our push process, we create a new auto scaling group for each new version of code. This means that we need to make sure the entire configuration from the old group is copied to the new group. The tool also displays and allows users to modify the rule settings in a simple HTML UI. In addition, we have some simple scripts to help setup auto scaling rules, configure SNS notifications, and create roll-back options. The scripts are available at our github site.

The final and most important piece of having dynamic auto scaling work well is to understand and test the application's behavior under load. We do this by either squeezing down the traffic in production to a smaller set of servers, or generating artificial load against a single server. Not understanding how an application behaves under load, or what the true limiting factors of the application are, may result in an ineffective or even destructive auto scaling configuration.

The End Result

Below is a set of graphs that show our request traffic over two days. The number of servers we are running to support that traffic and the aggregate CPU utilization of the pool. Notice that server count mirrors request rate and that under load the aggregate CPU is essentially flat.

Lessons learned

Scale up early, scale down slowly
At Netflix, we prefer to scale up early and scale down slowly. We advocate teams use symmetric percentages and periods for auto scaling policies and CloudWatch alarms, more here.

To scale up early we recommend tripping a CloudWatch alarm at 75% of the target threshold for a small amount of time. We typically recommend 5-10 minutes to trigger an event. Note, be mindful about the time required to start an instance, consider both EC2 and application startup time. The 25% headroom provides excess capacity for short irregular request spikes. It also protects against the loss of capacity due to instances failing on startup. For example, if max CPU utilization is 80%, set the alarm to trigger after 5 minutes at 60% CPU.

Scaling down slowly is important to mitigate the risk of removing capacity too quickly, or incorrectly reducing capacity. To prevent these scenarios we use time as a proxy to scaling slowly. For example, scale up by 10% if CPU utilization is greater than 60% for 5 minutes, scale down by 10% if CPU utilization is less than 30% for 20 minutes. The advantage to using time, as opposed to asymmetric scaling policies, is to prevent capacity 'thrashing', or removing too much capacity followed by quickly re-adding the capacity. This can happen if the scale down policy is too aggressive. Time-based scaling can also prevent incorrectly scaling down during an unplanned service outage. For example, suppose an edge service temporarily goes down. As a result of reduced requests associated with the outage, the middle tier may incorrectly scale down. If the edge service is down for less than the configured alarm time, no scale down event will occur.

Provision for availability zone capacity
Auto scaling policies should be defined based on the capacity needs per availability zone. This is especially critical for auto scaling groups configured to leverage multiple availability zones with a percent-based scaling policy. For example, suppose an edge service is provisioned in three zones with the min and max set to 10 and 30 respectively. The same service was load tested with a max 110 requests per second (RPS), per instance. With a single elastic load balancer (ELB), fronting the service, each request is uniformly routed, round robin, per zone. Effectively, each zone must be provisioned with enough capacity to handle one third of the total traffic. With a percent based policy one or more zones may become under provisioned. Assume 1850 total RPS, 617 RPS per zone. Two zones with 6 instances, the other having 5 instances, 17 total. The zones with 6 instances, on average, are processing 103 RPS per server. The zone with 5 instances, on average, are processing 124 RPS per instance, about 13% beyond desired (load tested) RPS. The root of the problem is the unbalanced availability zone. A zone can become unbalanced by scaling up/down by a factor less than the number of zones. This tends to occur when using a percent based policy. Also note, the aggregate, computed by CloudWatch, is a simple, equally weighted, average, masking the under-provisioned zone.


Auto scaling is a very powerful tool, but it can also be a double-edged sword. Without the proper configuration and testing it can do more harm than good. A number of edge cases may occur when attempting to optimize or make the configuration more complex. As seen above, when configured carefully and correctly, auto scaling can increase availability while simultaneously decreasing overall costs. For more details on our tools and lessons we have learned, check out our auto scaling github project which has the source for some our tools as well as a lot of documentation on the wiki.

Greg Orzell
Justin Becker