Tuesday, November 6, 2012

Edda - Learn the Stories of Your Cloud Deployments

By Cory Bennett, Engineering Tools

Operating "in the cloud" has its challenges, and one of those challenges is that nothing is static. Virtual host instances are constantly coming and going, IP addresses can get reused by different applications, and firewalls suddenly appear as security configurations are updated. At Netflix we needed something to help us keep track of our ever-shifting environment within Amazon Web Services (AWS). Our solution is Edda.

Today we are proud to announce that the source code for Edda is open and available.

What is Edda?


Edda is a service that polls your AWS resources via AWS APIs and records the results. It allows you to quickly search through your resources and shows you how they have changed over time.

Previously this project was known within Netflix as Entrypoints (and was mentioned under that name in some blog posts), but the name was changed as the scope of the project grew. Edda (meaning "a tale of Norse mythology") seemed appropriate for the new name, as our application records the tales of Asgard.

Why did we create Edda?


Dynamic Querying


At Netflix we need to be able to quickly query and analyze our AWS resources with widely varying search criteria. For instance, if we see a host with an EC2 hostname that is causing problems on one of our API servers, we need to find out what that host is and which team is responsible. Edda allows us to do this. The APIs AWS provides are fast and efficient but limited in their querying ability. There is no way to find an instance by its hostname, or to find all instances in a specific Availability Zone, without first fetching all the instances and iterating through them.
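
As a rough illustration of the fetch-then-filter pattern described above, the following Python sketch scans an in-memory list of instance records. The records, their field names, and the helper functions are hypothetical stand-ins; a real client would first have to download every instance from AWS before it could run either search.

```python
# Hypothetical, heavily simplified instance records. A real AWS response
# carries many more fields; these names are assumptions for illustration.
ALL_INSTANCES = [
    {"instanceId": "i-0123456789", "hostname": "ec2-1-2-3-4.compute-1.amazonaws.com", "zone": "us-east-1a"},
    {"instanceId": "i-012345678a", "hostname": "ec2-1-2-3-5.compute-1.amazonaws.com", "zone": "us-east-1b"},
    {"instanceId": "i-012345678b", "hostname": "ec2-1-2-3-6.compute-1.amazonaws.com", "zone": "us-east-1a"},
]


def find_by_hostname(instances, hostname):
    """Return the IDs of instances whose hostname matches (a full linear scan)."""
    return [i["instanceId"] for i in instances if i["hostname"] == hostname]


def find_by_zone(instances, zone):
    """Return the IDs of instances in one Availability Zone (another full scan)."""
    return [i["instanceId"] for i in instances if i["zone"] == zone]


by_host = find_by_hostname(ALL_INSTANCES, "ec2-1-2-3-5.compute-1.amazonaws.com")
by_zone = find_by_zone(ALL_INSTANCES, "us-east-1a")
```

Every new question means another pass over the complete list, which is the cost a queryable store avoids.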

With Edda's REST APIs we can use matrix arguments to find the resources that we are looking for. Furthermore, we can trim out unnecessary data in the responses with Field Selectors.

Example: Find any instances that have ever had a specific public IP address:
$ curl "http://edda/api/v2/view/instances;publicIpAddress=1.2.3.4;_since=0"
["i-0123456789","i-012345678a","i-012345678b"]
Now find out what AutoScalingGroup the instances were tagged with:
$ export INST_API=http://edda/api/v2/view/instances
$ curl "$INST_API;publicIpAddress=1.2.3.4;_pp;_since=0;_expand:(instanceId,tags)"
[
  {
    "instanceId" : "i-0123456789",
    "tags" : [
      {
        "key" : "aws:autoscaling:groupName",
        "value" : "app1-v123"
      }
    ]
  },
  {
    "instanceId" : "i-012345678a",
    "tags" : [
      {
        "key" : "aws:autoscaling:groupName",
        "value" : "app2-v123"
      }
    ]
  },
  {
    "instanceId" : "i-012345678b",
    "tags" : [
      {
        "key" : "aws:autoscaling:groupName",
        "value" : "app3-v123"
      }
    ]
  }
]
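
The two requests above can also be assembled and post-processed from a script. This is a minimal Python sketch, not part of Edda itself: `edda_url` and `asg_by_instance` are hypothetical helper names, the `http://edda` host is the placeholder used in the examples, and the sample body is a shortened copy of the response shown above. No network call is made.

```python
import json


def edda_url(collection_url, matrix_args, fields=None):
    """Append matrix arguments, and optionally a field selector, to a collection URL.

    matrix_args is a list of (key, value) pairs; a value of None emits a bare
    flag such as "_pp".
    """
    parts = [collection_url]
    for key, value in matrix_args:
        parts.append(key if value is None else f"{key}={value}")
    if fields:
        # Field selector written the same way as in the curl example above.
        parts.append("_expand:(" + ",".join(fields) + ")")
    return ";".join(parts)


def asg_by_instance(records):
    """Map each instanceId to its aws:autoscaling:groupName tag (None if absent)."""
    result = {}
    for record in records:
        tags = {tag["key"]: tag["value"] for tag in record.get("tags", [])}
        result[record["instanceId"]] = tags.get("aws:autoscaling:groupName")
    return result


url = edda_url(
    "http://edda/api/v2/view/instances",
    [("publicIpAddress", "1.2.3.4"), ("_pp", None), ("_since", 0)],
    fields=["instanceId", "tags"],
)

# Shortened copy of the response body shown above.
sample_body = """[
  {"instanceId": "i-0123456789",
   "tags": [{"key": "aws:autoscaling:groupName", "value": "app1-v123"}]},
  {"instanceId": "i-012345678a",
   "tags": [{"key": "aws:autoscaling:groupName", "value": "app2-v123"}]}
]"""
groups = asg_by_instance(json.loads(sample_body))
```

Here `url` reproduces the second curl request, and `groups` pairs each instance ID with its AutoScalingGroup name.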

History/Changes


When trying to analyze causes and impacts of outages we have found the historical data stored in Edda to be very valuable. Currently AWS does not provide APIs that allow you to see the history of your resources, but Edda records each AWS resource as versioned documents that can be recalled via the REST APIs. The "current state" is stored in memory, which allows for quick access. Previous resource states and expired resources are stored in MongoDB (by default), which allows for efficient retrieval. Not only can you see how resources looked in the past, but you can also get unified diff output quickly and see all the changes a resource has gone through.
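
The idea of versioned documents can be sketched in a few lines of Python. This is an illustration only and not Edda's implementation: the `VersionedStore` class is hypothetical, it keeps history in a plain list rather than in MongoDB, and it assumes that a new version is written only when a crawl sees a changed state.

```python
import copy


class VersionedStore:
    """Toy store that keeps the current state plus every earlier version."""

    def __init__(self):
        self.current = {}   # resource id -> latest state (the in-memory "current state")
        self.history = {}   # resource id -> list of (timestamp_ms, state), oldest first

    def record(self, resource_id, state, timestamp_ms):
        """Save a new version if the state changed; return True when one was saved."""
        if self.current.get(resource_id) == state:
            return False
        snapshot = copy.deepcopy(state)  # protect stored versions from later mutation
        self.current[resource_id] = snapshot
        self.history.setdefault(resource_id, []).append((timestamp_ms, snapshot))
        return True

    def at(self, resource_id, timestamp_ms):
        """Return the state in effect at timestamp_ms, or None if none existed yet."""
        found = None
        for version_ts, state in self.history.get(resource_id, []):
            if version_ts > timestamp_ms:
                break
            found = state
        return found


store = VersionedStore()
first = store.record("sg-0123456789", {"ipRanges": ["10.10.1.1/32"]}, 1000)
repeat = store.record("sg-0123456789", {"ipRanges": ["10.10.1.1/32"]}, 2000)
changed = store.record("sg-0123456789", {"ipRanges": ["10.10.1.1/32", "10.10.1.2/32"]}, 3000)
```

Calling `store.at("sg-0123456789", 2500)` then returns the first version, because the unchanged crawl at 2000 did not create a new one.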

For example, this shows the most recent change to a security group:
$ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"
--- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810
+++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504
@@ -1,33 +1,33 @@
 {
   "class" : "com.amazonaws.services.ec2.model.SecurityGroup",
   "description" : "App1",
   "groupId" : "sg-0123456789",
   "groupName" : "app1-frontend",
   "ipPermissions" : [
     {
       "class" : "com.amazonaws.services.ec2.model.IpPermission",
       "fromPort" : 80,
       "ipProtocol" : "tcp",
       "ipRanges" : [
         "10.10.1.1/32",
         "10.10.1.2/32",
+        "10.10.1.3/32",
-        "10.10.1.4/32"
       ],
       "toPort" : 80,
       "userIdGroupPairs" : [ ]
     }
   ],
   "ipPermissionsEgress" : [ ],
   "ownerId" : "2345678912345",
   "tags" : [ ],
   "vpcId" : null
 }
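
For readers curious how output of this shape can be produced, here is a small Python sketch that diffs two versions of a JSON document with the standard library's `difflib`. It is not how Edda generates its diffs; the `diff_versions` helper, the trimmed-down documents, and the `_at` labels are assumptions made for the example.

```python
import difflib
import json


def diff_versions(old, new, old_label, new_label):
    """Return a unified diff between two JSON-serializable documents."""
    # Pretty-print with sorted keys so that only real changes show up as diff lines.
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_label, tofile=new_label, lineterm=""
        )
    )


# Two hypothetical versions of a much-reduced security group document.
old_doc = {"groupId": "sg-0123456789", "ipRanges": ["10.10.1.1/32", "10.10.1.4/32"]}
new_doc = {"groupId": "sg-0123456789", "ipRanges": ["10.10.1.1/32", "10.10.1.3/32"]}

patch = diff_versions(old_doc, new_doc, "sg-0123456789;_at=1000", "sg-0123456789;_at=2000")
```

Printing `patch` shows the removed range prefixed with `-` and the added range prefixed with `+`, surrounded by unchanged context lines.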

High Level Architecture


Edda is a Scala application that can either run on a single instance or scale up to many instances running behind a load balancer for high availability. The data store that Edda currently supports is MongoDB, which is also versatile enough to run on a single instance alongside the Edda service or to be grown into large replica sets. When running as a cluster, Edda will automatically select a leader, which then does all the AWS polling (by default every 60 seconds) and persists the data. The other, secondary servers refresh their in-memory records (by default every 30 seconds) and handle REST requests.
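
The division of labor in a cluster can be pictured with a toy schedule. The Python sketch below only models the default intervals stated above; the `tasks_due` function and the task labels are hypothetical, leader election is not modeled, and whether the leader also serves REST traffic is not stated here, so the sketch leaves that out.

```python
POLL_INTERVAL_S = 60     # leader: default AWS polling interval stated above
REFRESH_INTERVAL_S = 30  # secondaries: default in-memory refresh interval stated above


def tasks_due(is_leader, elapsed_s):
    """Return the work a node performs at a given second since startup."""
    tasks = []
    if is_leader:
        # Only the leader talks to AWS and writes to the data store.
        if elapsed_s % POLL_INTERVAL_S == 0:
            tasks.append("poll AWS and persist")
    else:
        # Secondaries reload their in-memory records and answer queries.
        if elapsed_s % REFRESH_INTERVAL_S == 0:
            tasks.append("refresh in-memory records")
        tasks.append("serve REST requests")
    return tasks


leader_at_60 = tasks_due(True, 60)
secondary_at_30 = tasks_due(False, 30)
secondary_at_45 = tasks_due(False, 45)
```

With these defaults a secondary refreshes twice for every crawl the leader completes.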

Currently only MongoDB is supported for the persistence layer, but we are analyzing alternatives. MongoDB supports JSON documents and allows for advanced query options, both of which are necessary for Edda. However, as our previous blog posts have indicated, Netflix is heavily invested in Cassandra. We are therefore looking at some options for advanced query services that can work in conjunction with Cassandra.

Edda was designed to allow for easily implementing custom crawlers to track collections of resources other than those of AWS. In the near future we will be releasing some examples we have implemented which track data from AppDynamics, and others which track our Asgard applications and clusters.

Configuration


There are many configuration options for Edda. It can be configured to poll a single AWS region (as we run it here) or to poll across multiple regions. If you have multiple AWS accounts (e.g., test and prod), Edda can be configured to poll both from the same instance. Edda currently polls 15 different resource types within AWS. Each collection can be individually enabled or disabled. Additionally, crawl frequency and cache refresh rates can be tuned.

Coming up


In the near future we are planning to release some new collections for Edda to monitor. The first will be APIs that allow us to pull details about application health and traffic patterns out of AppDynamics. We also plan to release APIs that track our Asgard application and cluster resources.

Summary


We hope you find Edda to be a useful tool. We'd appreciate any and all feedback on it. Are you interested in working on great open source software? Netflix is hiring! http://jobs.netflix.com

Edda Links