Dynomite has been adopted widely inside Netflix due to its high performance and low latency attributes. In our recent blog, we showcased the performance of Dynomite with Redis as the underlying data storage engine. At this point (Q2 2016), there are almost 50 clusters with more than 1000 nodes, centrally managed by the Cloud Database Engineering (CDE) team. CDE team has a wide experience with other data stores, such as Cassandra, ElasticSearch and Amazon RDS.
Dynomite is used at Netflix both as a:
- Cache, with global replication, in front of Netflix’s data store systems e.g. Cassandra, ElasticSearch etc.
- Data store layer by itself with persistence and backups
The latter is achieved by keeping multiple copies of the data across AWS regions and Availability zones (high availability), client failover, cold bootstrapping (warm up), S3 backups, and other features. Most of these features are enabled through the use of Dynomite-manager (internally named Florida).
A Dynomite node consists of three processes:
- Dynomite (the proxy layer)
- Storage Engine (Redis, Memcached, RocksDB etc)
Fig.1 Two Dynomite instances with Dynomite, Dynomite-manager and Redis as a data store
Depending on the requirements, Dynomite can support multiple storage engines from in-memory data stores like Redis and Memcached to SSD optimized storage engines like RocksDB, LMDB, ForestDB, etc.
Dynomite-manager is a sidecar specifically developed to manage Netflix’s Dynomite clusters and integrate it with the AWS (and Netflix) Ecosystem. It follows similar design principles from more than 6 years of experience of managing Cassandra with Priam, and ElasticSearch clusters with Raigad. Dynomite-manager was designed based on Quartz in order to be extensible to other data stores, and platforms. In the following, we briefly capture some of the key features of Dynomite-manager.
Service Discovery and Healthcheck
Dynomite-manager schedules a Quartz (lightweight thread) every 15 seconds that checks the health of both Dynomite and the underlying storage engine. Since most of our current production deployments leverage Redis, the healthcheck involves a two-step approach. In the first step, we check if Dynomite and Redis are running as Linux processes, and in the second step, Dynomite-manager uses the Redis API to perform a PING to both Dynomite and Redis. A Redis PING, and the corresponding response, Redis PONG, ensures that both processes are alive and are able to serve client traffic. If any of these healthcheck steps fail, Dynomite-manager informs Eureka (Netflix Service registry for resilient mid-tier load balancing and failover) and the node is removed from Discovery. This ensures that the Dyno client can gracefully failover the traffic to another Dynomite node with the same token.
Fig.2 Dynomite healthcheck failure on Spinnaker
Token Management and Node Configuration
Dynomite occupies the whole token range on a per rack basis. Hence, it uses a unique token in each rack (differently from Cassandra that uses unique token throughout the cluster). Therefore, tokens can repeat across racks and in the same datacenter.
Dynomite-manager calculates the token of every node by looking at the number of slots (nodes), by which the token range is divided by the rack, and the position of the node. The tokens are then stored in an external data store along with application id, availability zone, datacenter, instance id, hostname, and elastic IP. Since nodes are by nature volatile in the cloud, if a node gets replaced, Dynomite-manager in the new node queries the data store to find if a token was pre-generated. At Netflix, we leverage a Cassandra cluster to store this information.
Dynomite-manager receives other instance metadata from AWS, and dynamic configuration through Archaius Configuration Management API or through the use of external data sources like SimpleDB. For the instance metadata, Dynomite-manager also includes an implementation for local deployments.
Monitoring and Insights Integration
Dynomite-manager exports the statistics of Dynomite and Redis to Atlas for plotting and time-series analysis. We use a tiered architecture for our monitoring system.
- Dynomite-manager receives information about Dynomite through a REST call;
- Dynomite-manager receives information about Redis through the INFO command.
Currently, Dynomite-manager leverages the Servo client to publish the metrics for time series processing. Nonetheless, other Insight clients can be added in order to deliver metrics to a different Insight system.
Cold bootstrapping, also known as warm up, is the process of populating a node with the most recent data before joining the ring. Dynomite has a single copy of the data on each rack, essentially having multiple copies per datacenter (depending on the number of racks per datacenter). At Netflix, when Dynomite is used as a data store, we use three racks per datacenter for high availability. We manage these copies as separate AWS Auto scaling groups. Due to the volatile nature of the cloud or Chaos Monkey exercises, nodes may get terminated. During this time the Dyno client fails over to another node that holds the same token in a different availability zone.
When the new node comes up, Dynomite-manager on the new node is responsible for cold bootstrapping Redis in the same node. The above process enables our infrastructure to sustain multiple failures in production with minimal effect on the client side. In the following, we explain the operation in more detail:
- Dynomite-manager boots up due to auto-scaling activities, and a new token is generated for that node. In this case the new token is : 1383429731
Fig.3 Autoscaling brings a new node without data
- Dynomite-manager queries the external data store to identify which nodes within the local region have the same token. Dynomite-manager does not warm up from remote regions to avoid cross-region communication latencies. Once a target node is identified, Dynomite-manager tries to connect to it.
Fig.4 A peer node with the same token is identified
- Dynomite-manager issues a Redis SLAVEOF command to that peer. Effectively the target Redis instance sets itself as a slave of the Redis instance on the peer node. For this, it leverages Redis diskless master-slave replication. In addition, Dynomite-manager sets Dynomite in buffering (standby) mode. This effectively allows the Dyno client to continuously failover to another AWS availability zone during the warm up process.
Fig.5 Dynomite-manager enables Redis replication
- Dynomite-manager continuously checks the offset between the Redis master node (the source of truth), and the Redis slave (the node that needs warm up). The offset is determined based on what the Redis master reports via the INFO command. A Dynomite node is considered fully warmed up, if it has received all the data from the remote node, or if the difference is less than a pre-set value. We use the latter to limit the warm-up duration in high throughput deployments.
Fig.6 Data streaming across nodes through Redis diskless replication
- Once master and slave are in sync, Dynomite-manager sets Dynomite to allow writes only. This mode allows writes to get buffered and flushed to Redis once everything is complete.
- Dynomite-manager stops Redis from peer syncing by using the Redis SLAVEOF NO ONE command.
- Dynomite-manager sets Dynomite back to normal state, performs a final check if Dynomite is operational and notifies Service Discovery through the healthcheck.
Fig.8 Dynomite node is warmed up
At Netflix, Dynomite is used as a single point of truth (data store) as well as a cache. A dependable backup and recovery process is therefore critical for Disaster Recovery (DR) and Data Corruption (CR) when choosing a data store in the cloud. With Dynomite-manager, a daily snapshot for all clusters that leverage Dynomite as a data store is used to back them up to Amazon S3. S3 was an obvious choice due to its simple interface and ability to access any amount of data from anywhere.
Dynomite-manager initiates the S3 backups. The backups feature leverages the persistence feature of Redis to dump data to the drive. Dynomite-manager supports both the RDB and the AOF persistence of Redis, offering the ability to the users to use a readable format of their data for debugging or a memory direct snapshot. The backups leverage the IAM credentials in order to encrypt the communication. Backups (a) can be scheduled using a date in the configuration (or by leveraging Archaius, Netflix configuration management API), and (b) on demand using the REST API.
Dynomite-manager supports restoring a single node through a REST API, or the complete ring. When performing a restore, Dynomite-manager (on each node), shuts down Redis and Dynomite, locates the snapshot files in S3, and orchestrates the download of the files. Once the snapshot is transferred to the node, Dynomite-manager starts Redis and waits until the data are in memory, and then follows up with starting the Dynomite process. Dynomite-manager can also restore data to clusters with different names. This allows us to spin up multiple test clusters with the same data, enabling refreshes. Refreshes are very important at Netflix, because cluster users can leverage production data in a test environment, hence perform realistics benchmarks and offline analysis on production data. Finally, Dynomite-manager allows for targeted refreshes on a specific date, allowing cluster users to restore data to point prior to the data corruption, test production data for a specific time frame and opening the doors for many other use cases that we have not yet explored.
In regards to credentials, Dynomite-manager supports Amazon’s Identity and Access Management (IAM) key profile management. Using IAM Credentials allows the cluster administrator can provide access to the AWS API without storing an AccessKeyId or SecretAccessKey on the node itself. Alternatively, one can implement the IAMCredential interface.
With Dynomite-manager we can perform upgrades and rolling restarts of Dynomite clusters in production without any down time. For example, when we want to upgrade or restart Dynomite-manager itself, we increase the polling interval of the Discovery service, allowing the reads/writes to Dynomite and Redis to flow. On the other hand, when performing upgrades of Dynomite and Redis, we take the node out of the Discovery service by shutting down Dynomite-manager itself, and therefore allowing Dyno to gracefully fail over to another availability zone.
Dynomite-manager provides a REST API for multiple management activities. For example, the following administration operations can be performed through Dynomite-manager:
/start: starts Dynomite
/stop: stops Dynomite
/startstorageprocess: starts storage process
/stopstorageprocess: stops storage process
/get_seeds: responds with the hostnames and tokens
/cluster_describe: responds with a JSON file of the cluster level information
/backup: forces an S3 backup
/restore: forces an S3 restore
/takesnapshot: persist the storage data (if Redis) to the drive (based on configuration properties, this can be RDB or AOF)
/status: returns the status of the processes managed by Dynomitemanager and itself.
Future Ideas: Dynomite-manager 2.0
- Backups: we will be investigating the use a bandwidth throttler during backup operation to reduce disk and network I/O. This is important for nodes that are receiving thousands of OPS. For better DR, we will also investigate the diversification of our backups across multiple object storage vendors.
- Warm up: we will explore further resiliency in our warm up process. For example, we will be considering the use of incrementals for clusters that need to vertically scale to better instance types, as well as perform parallel warm up from multiple nodes if all nodes in the same region are healthy.
- In line updates and restarts: currently, we manage Dynomite and the storage engine through python and shell scripts that are invoked through REST calls by our continuous integration system. Our plan is to integrate most of these management operations inside Dynomite-manager (binary upgrades, rolling restarts etc)
- Healthcheck: Dynomite-manager has the perfect view of every Dynomite node, hence as Dynomite gets more mature, we plan to integrate auto-remediation inside Dynomite-manager. This can potentially minimize the amount of involvement of our engineers once the cluster is operational.
Today, we are open-sourcing Dynomite-manager https://github.com/netflix/dynomite-manager
We will be looking forward to feedback, issues, and bugs so that we can improve the Dynomite Ecosystem.