Wednesday, September 30, 2015

Moving from Asgard to Spinnaker

Six years ago, Netflix successfully jumped headfirst into the AWS Cloud and along the way we ended up writing quite a lot of software to help us out. One particular project proved instrumental in allowing us to efficiently automate AWS deployments: Asgard

Asgard created an intuitive model for cloud-based applications that has made deployment and ongoing management of AWS resources easy for hundreds of engineers at Netflix. Introducing the notion of clusters, applications, specific naming conventions, and deployment options like rolling push and red/black has ultimately yielded more productive teams who can spend more time coding business logic rather than becoming AWS experts.  What’s more, Asgard has been a successful OSS project adopted by various companies. Indeed, the utility of Asgard’s battle-hardened AWS deployment and management features is undoubtedly due to the hard work and innovation of its contributors both within Netflix and the community.

Netflix, nevertheless, has evolved since first embracing the cloud. Our footprint within AWS has expanded to meet the demand of an increasingly global audience; moreover, the number of applications required to service our customers has swelled. Our rate of innovation, which maintains our global competitive edge, has also grown. Consequently, our desire to move code rapidly, with a high degree of confidence and overall visibility, has also increased. In this regard Asgard has fallen short.

Everything required to produce a deployment artifact, in this case an AMI, has never been addressed in Asgard. Consequently, many teams at Netflix constructed their own Continuous Delivery workflows. These workflows were typically related Jenkins jobs that tied together code check-ins with building and testing, then AMI creations and, finally, deployments via Asgard. This final step involved automation against Asgard’s REST API, which was never intended to be leveraged as a first class citizen.

Roughly a year ago a new project, dubbed Spinnaker, kicked off to enable end-to-end global Continuous Delivery at Netflix. The goals of this project were to create a Continuous Delivery platform that would:
  • enable repeatable automated deployments captured as flexible pipelines and configurable pipeline stages
  • provide a global view across all the environments that an application passes through in its deployment pipeline
  • offer programmatic configuration and execution via a consistent and reliable API
  • be easy to configure, maintain, and extend  
  • be operationally resilient  
  • provide the existing benefits of Asgard without a migration

What’s more, we wanted to leverage a few lessons learned from Asgard. One particular goal of this new platform is to facilitate innovation within its umbrella. The original Asgard model was difficult to extend so the community forked Asgard to provide alternative implementations. Since these changes weren’t merged back into Asgard, those innovations were lost to the wider community. Spinnaker aims to make it easier to extend and enhance cloud deployment models in a way that doesn't require forking. Whether the community desires additional cloud providers, different deployment artifacts or new stages in a Continuous Delivery pipeline, extensions to Spinnaker will be available to everyone in the community without the need to fork. 

We additionally wanted to create a platform that, while replacing Asgard, doesn’t exclude it. A big-bang migration process off Asgard would be out of the question for Netflix and for the community. Consequently, changes to cloud assets via Asgard are completely compatible with changes to those same assets via our new platform. And vice versa!

Finally, we deliberately chose not to reimplement everything in Asgard. Ultimately, Asgard took on too much undifferentiated heavy lifting from the AWS console. Consequently, for those features that are not directly related to cluster management, such as SNS, SQS, and RDS Management, Netflix users and the community are encouraged to use the AWS Console.

Our new platform only implements those Asgard-like features related to cluster management from the point of view of an application (and even a group of related applications: a project). This application context allows you to work with a particular application’s related clusters, ASGs, instances, Security Groups, and ELBs, in all the AWS accounts in which the application is deployed.

Today, we have both systems running side by side with the vast majority of all deployments leveraging our new platform. Nevertheless, we’re not completely done with gaining the feature parity we desire with Asgard. That gap is closing rapidly and in the near future we will be sunsetting various Asgard instances running in our infrastructure. At this point, Netflix engineers aren’t committing code to Asgard’s Github repository; nevertheless, we happily encourage the OSS community’s active participation in Asgard going forward. 

Asgard served Netflix well for quite a long time. We learned numerous lessons along our journey and are ready to focus on the future with a new platform that makes Continuous Delivery a first-class citizen at Netflix and elsewhere. We plan to share this platform, Spinnaker, with the Open Source Community in the coming months.

-Delivery Engineering Team at Netflix