Wednesday, August 3, 2016

Vizceral Open Source

Previously we wrote about our traffic intuition tool, Flux.  We have some announcements and updates to share about this project.  First, we have renamed the project to Vizceral.  More importantly, Vizceral is now open source!

Open Source

Vizceral transformed the way we understand and digest information about the state of traffic flowing into the Netflix control plane. We wanted to be able to intuit decisions based on the holistic state of the system. To get that, we needed a tool that gives us an intuitive understanding of the entire system at a glance. We can’t afford to be bogged down in analysis of quantitative or numerical data, or otherwise ‘parse’ the information in a typical dashboard. When we can apply an intuitive approach instead of relying on the need to parse data, we can minimize the time an outage impacts millions of members. We call the practice of building these types of systems Intuition Engineering. Vizceral is our first, flagship example of Intuition Engineering.

Here is a video of a simulation of the global view of Vizceral when moving traffic between regions.

Here is a screenshot of the same global view.

Global View. Note that the numbers in the screenshot are example data, not real.

After proving the importance of Intuition Engineering internally on the Traffic Team at Netflix, we weighed the responsibilities of maintaining an open source project against two benefits: the diverse input we would get from the community at large, and the value we think we can provide to it. The feedback that we received after the initial blog post was overwhelmingly positive. Several individuals and companies expressed interest in the code and in contributing back to the project. This gave us a pretty strong signal of value to the community. Ultimately, we decided to take the plunge and share our solution. Here are the four repos that we are open sourcing:
  • vizceral: The main UI component that lets you view and interact with the graph data.
  • vizceral-react: A React component wrapper around vizceral to make it easier to integrate the visualization into a React project.
  • vizceral-component: A web component wrapper around vizceral to make it easier to integrate the visualization into a project using web components.
  • vizceral-example: An example project that uses vizceral-react and sample data as a proof of concept and a jumping-off point for integrating the visualization into your own data sources.
The component takes a simple JSON definition of graph data (nodes and connections) with some metrics and handles all of the rendering.
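To make the idea of a graph definition more concrete, here is a minimal TypeScript sketch of what such a structure might look like. The type names and field names (`TrafficGraph`, `metrics.normal`, `metrics.warning`, `metrics.danger`, and so on) are illustrative assumptions, not the documented vizceral schema, and the numbers are made up. Check the vizceral repo for the actual format.

```typescript
// Illustrative shape only: every name below is an assumption for this sketch,
// not the documented vizceral input format.
interface TrafficMetrics {
  normal: number;   // rate of healthy responses
  warning?: number; // rate of degraded responses
  danger?: number;  // rate of error responses
}

interface GraphNode {
  name: string;
}

interface GraphConnection {
  source: string; // name of the calling node
  target: string; // name of the node being called
  metrics: TrafficMetrics;
}

interface TrafficGraph {
  name: string;
  nodes: GraphNode[];
  connections: GraphConnection[];
}

// Example data, not real.
const exampleGraph: TrafficGraph = {
  name: "us-east-1",
  nodes: [{ name: "INTERNET" }, { name: "api" }, { name: "playback" }],
  connections: [
    { source: "INTERNET", target: "api", metrics: { normal: 1200, danger: 3 } },
    { source: "api", target: "playback", metrics: { normal: 800, warning: 12 } },
  ],
};

// Sanity check: return every connection that references a node
// missing from the node list.
function findDanglingConnections(graph: TrafficGraph): GraphConnection[] {
  const known: { [name: string]: boolean } = {};
  graph.nodes.forEach((node) => {
    known[node.name] = true;
  });
  return graph.connections.filter((c) => !known[c.source] || !known[c.target]);
}
```

Because the definition is plain data, `JSON.stringify(exampleGraph)` yields something a server could hand to the UI as-is.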

Internally at Netflix, we have a server-side service that gathers data from Atlas and our internal distributed request tracing service called Salp. This server-side service transforms the data into the format needed for the Vizceral component and updates the UI via WebSockets. We split the logic into two distinct parts, the Vizceral component and the server-side service, so that we can reuse the visualization with any number of data sources.
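As a rough illustration of that transformation step, the sketch below aggregates per-call samples into nodes and connections. Everything here is hypothetical: the `CallSample` input shape, the `buildGraphPayload` function, and the output field names are assumptions made for this example. It does not reflect the Atlas or Salp APIs or our internal service.

```typescript
// Hypothetical input: one sample per (caller, callee, status) combination.
type CallStatus = "ok" | "degraded" | "error";

interface CallSample {
  caller: string;
  callee: string;
  status: CallStatus;
  ratePerSecond: number;
}

// Hypothetical output shape for the visualization.
interface EdgeMetrics {
  normal: number;
  warning: number;
  danger: number;
}

interface Edge {
  source: string;
  target: string;
  metrics: EdgeMetrics;
}

interface GraphPayload {
  nodes: { name: string }[];
  connections: Edge[];
}

// Collapse raw samples into one connection per caller/callee pair,
// summing rates by response status.
function buildGraphPayload(samples: CallSample[]): GraphPayload {
  const nodes: { name: string }[] = [];
  const seenNodes: { [name: string]: boolean | undefined } = {};
  const connections: Edge[] = [];
  const edgeByKey: { [key: string]: Edge | undefined } = {};

  const addNode = (name: string): void => {
    if (!seenNodes[name]) {
      seenNodes[name] = true;
      nodes.push({ name });
    }
  };

  samples.forEach((sample) => {
    addNode(sample.caller);
    addNode(sample.callee);

    const key = sample.caller + "->" + sample.callee;
    let edge = edgeByKey[key];
    if (edge === undefined) {
      edge = {
        source: sample.caller,
        target: sample.callee,
        metrics: { normal: 0, warning: 0, danger: 0 },
      };
      edgeByKey[key] = edge;
      connections.push(edge);
    }

    if (sample.status === "ok") {
      edge.metrics.normal += sample.ratePerSecond;
    } else if (sample.status === "degraded") {
      edge.metrics.warning += sample.ratePerSecond;
    } else {
      edge.metrics.danger += sample.ratePerSecond;
    }
  });

  return { nodes, connections };
}
```

A server built along these lines could serialize the result with `JSON.stringify` and push it to connected clients each time fresh samples arrive. Keeping the transformation separate from the rendering is what lets the same visualization sit on top of other data sources.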

Regional View

In the previous post, we discussed the global view, showing the traffic flowing into all the Netflix regions and the traffic being proxied between regions. What if you want to get more detail about a specific region?

Introducing the regional view:

Here is a screenshot of the same view.

Regional View

If we click on one of the regions, it brings us to a zoomed-in view of the microservices operating in that region. The far left side has one node which represents the ‘internet’, and all the connections from the internet are the entry points into the stack. We use concepts similar to those in the global view, but simplified: circle nodes joined by connections, with traffic dots flowing along the connections.

We reduced the inter-node connections to a single lane of travel to minimize noise. The traffic dots represent the same thing as in the global view, with yellow and red dots showing degraded and error responses between services. The nodes can also change color based on the assumed health of the underlying service, giving another quick focal point for where problems might exist in the system.

We tried a bunch of standard graph layout algorithms, but all of the ones we found were more focused on ‘grouping close nodes’ or ‘not overlapping connections.’ Grouping close nodes actually did us a disservice, since closeness of nodes does not mean they are dependent on one another. Connections not overlapping would be nice, but not at the expense of left-to-right flow. We tested our own very simple layout algorithm that focused on a middle-weighted, left-to-right flow with a few simple modifications. This algorithm has much room for improvement, but we were immediately happier with this layout than with any of the pre-canned options. Even with the less-than-perfect layout, this visualization provides a great overall picture of the traffic within a given region and a good gut feeling about the current state of the region.
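To give a feel for what a left-to-right layout of this kind could look like, here is a minimal sketch. It is one plausible reading of the idea, not the algorithm that ships in Vizceral: each node's column is its hop count from an entry node (found with a breadth-first search), and the nodes in each column are centered vertically around the middle. The function name `layoutLeftToRight` and its parameters are assumptions made for this example.

```typescript
interface LayoutEdge {
  source: string;
  target: string;
}

interface Position {
  x: number;
  y: number;
}

// Place nodes in columns by hop count from the entry nodes (nodes with no
// incoming connections), and center each column around y = 0.
function layoutLeftToRight(
  nodeNames: string[],
  edges: LayoutEdge[],
  columnSpacing: number,
  rowSpacing: number
): { [name: string]: Position } {
  const outgoing: { [name: string]: string[] } = {};
  const hasIncoming: { [name: string]: boolean } = {};
  nodeNames.forEach((name) => {
    outgoing[name] = [];
    hasIncoming[name] = false;
  });
  edges.forEach((edge) => {
    const targets = outgoing[edge.source];
    // Ignore connections that reference unknown nodes.
    if (targets === undefined || outgoing[edge.target] === undefined) return;
    targets.push(edge.target);
    hasIncoming[edge.target] = true;
  });

  // Breadth-first search from the entry nodes. Each node is visited once,
  // so cycles in the service graph cannot cause an infinite loop.
  const depth: { [name: string]: number } = {};
  const queue: string[] = nodeNames.filter((name) => !hasIncoming[name]);
  queue.forEach((name) => {
    depth[name] = 0;
  });
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    const nextDepth = depth[current] + 1;
    (outgoing[current] || []).forEach((next) => {
      if (depth[next] === undefined) {
        depth[next] = nextDepth;
        queue.push(next);
      }
    });
  }

  // Group nodes by column; nodes unreachable from any entry go in column 0.
  const columns: string[][] = [];
  nodeNames.forEach((name) => {
    const d = depth[name] === undefined ? 0 : depth[name];
    while (columns.length <= d) columns.push([]);
    columns[d].push(name);
  });

  const positions: { [name: string]: Position } = {};
  columns.forEach((column, columnIndex) => {
    column.forEach((name, rowIndex) => {
      positions[name] = {
        x: columnIndex * columnSpacing,
        y: (rowIndex - (column.length - 1) / 2) * rowSpacing,
      };
    });
  });
  return positions;
}
```

A sketch like this makes no attempt to avoid overlapping connections, which matches the trade-off described above: left-to-right flow comes first.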

If you want to look at a service in even more detail, you can hover over the node to highlight incoming and outgoing connections.

Service Highlighted

You can click on a node, and a contextual panel pops up that we can fill with any relevant information.

Context Panel for Highlighted Service

Currently, we just show a tabular view of the connections, and the list of services that make up this node, but we are adding some more detailed metrics and integrations with our other insight tooling.

If you want to dig in even further, you can double-click on the node to enter the node-focused view.

Focus on Service

This view allows us to really focus on traffic between the service and its upstream and downstream dependencies without being distracted by the rest of the region.
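Both the hover highlight and the focused view come down to selecting the connections that touch a single node. The sketch below shows one way that selection could be written. The `focusOnService` function and its types are hypothetical, and the sketch assumes that "upstream" means services that call the focused service and "downstream" means services it calls.

```typescript
interface ServiceConnection {
  source: string; // calling service
  target: string; // called service
}

interface FocusedView {
  upstream: string[];   // services that call the focused service
  downstream: string[]; // services that the focused service calls
  connections: ServiceConnection[];
}

// Keep only the connections that touch the given service, and list its
// direct neighbors on each side without duplicates.
function focusOnService(service: string, connections: ServiceConnection[]): FocusedView {
  const touching = connections.filter(
    (c) => c.source === service || c.target === service
  );
  const upstream: string[] = [];
  const downstream: string[] = [];
  touching.forEach((c) => {
    if (c.target === service && c.source !== service && upstream.indexOf(c.source) === -1) {
      upstream.push(c.source);
    }
    if (c.source === service && c.target !== service && downstream.indexOf(c.target) === -1) {
      downstream.push(c.target);
    }
  });
  return { upstream, downstream, connections: touching };
}
```

Connections between other services, such as a sibling calling the same database, are dropped, which is what keeps the rest of the region from being a distraction.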

Getting Started

The easiest way to get started is to follow the setup instructions in the vizceral-example project. This will set up a fully functional project with dummy data, running on your development machine.

If you would like more information on this project, check out the following presentation. Justin Reynolds, tech lead on this project, gave a talk about Vizceral at Monitorama on June 29, 2016, that provides additional context on the how and the why.

Vizceral has proven extremely useful for us on the Traffic Team at Netflix, and we are happy to have the opportunity to share that value. Now that it is open source, we are looking forward to discussions about use cases, other possible integrations, and any feature requests or pull requests you may have.

-Intuition Engineering Team at Netflix
Justin Reynolds, Casey Rosenthal