Previously we wrote about our traffic intuition tool, Flux. We have some announcements and updates to share about this project. First, we have renamed the project to Vizceral. More importantly, Vizceral is now open source!
Open SourceVizceral transformed the way we understand and digest information about the state of traffic flowing into the Netflix control plane. We wanted to be able to intuit decisions based on the holistic state of the system. To get that, we needed a tool that gives us an intuitive understanding of the entire system at a glance. We can’t afford to be bogged down in analysis of quantitative or numerical data, or otherwise ‘parse’ the information in a typical dashboard. When we can apply an intuitive approach instead of relying on the need to parse data, we can minimize the time an outage impacts millions of members. We call the practice of building these types of systems Intuition Engineering. Vizceral is our first, flagship example of Intuition Engineering.
Here is a video of a simulation of the global view of Vizceral when moving traffic between regions.
Here is a screenshot of the same global view.
|Global View. Note that the numbers in the screenshot are example data, not real.|
- vizceral: The main UI component that lets you view and interact with the graph data.
- vizceral-react: A react component wrapper around vizceral to make it easier to integrate the visualization into a react project.
- vizceral-component: A web component wrapper around vizceral to make it easier to integrate the visualization into a project using web components.
- vizceral-example: An example project that uses vizceral-react and sample data as a proof of concept and a jumping off point for integrating the visualization into your own data sources.
Internally at Netflix, we have a server-side service that gathers data from Atlas and our internal distributed request tracing service called Salp. This server-side service transforms the data into the format needed for the Vizceral component and updates the UI via web sockets. We separated the logic into the distinct parts of Vizceral and the server-side service so that we can reuse the visualization with any number of data sources.
Regional ViewIn the previous post, we discussed the global view, showing the traffic flowing into all the Netflix regions and the traffic being proxied between regions. What if you want to get more detail about a specific region?
Introducing the regional view:
Here is a screenshot of the same view.
We minimized the inter-node connections to a single lane of travel to minimize noise. The traffic dots represent the same thing as the global view, with yellow and red dots showing degraded and error responses between services. The nodes also can change color based on assumed health of the underlying service to give another quick focal point for where problems might exist in the system.
We tried a bunch of standard graph layout algorithms, but all of the ones we found were more focused on ‘grouping close nodes’ or ‘not overlapping connections.’ Grouping close nodes actually did us a disservice since closeness of nodes does not mean they are dependent on one another. Connections not overlapping would be nice, but not at the expense of left-to-right flow. We tested our own, very simple layout algorithm that focused on a middle weighted, left-to-right flow with a few simple modifications. This algorithm has much room for improvement, but we were immediately happier with this layout than any of pre-canned options. Even with the less than perfect layout, this visualization provides a great overall picture of the traffic within a given region and a good gut feeling about the current state of the region.
If you want to look at a service in even more detail, you can hover over the node to highlight incoming and outgoing connections.
|Context Panel for Highlighted Service|
Currently, we just show a tabular view of the connections, and the list of services that make up this node, but we are adding some more detailed metrics and integrations with our other insight tooling.
If you want to dig in even further, you can double click on the node to enter the node focused view.
|Focus on Service|
Getting StartedThe easiest way to get started would be to follow the setup instructions in the vizceral-example project. This will setup a fully functional project with dummy data, running on your development machine.
If you would like more information on this project, check out the following presentation. Justin Reynolds, tech lead on this project, gave a talk at Monitorama on 6/29/2016 about Vizceral that provides additional context on the how and the why.
Vizceral has proven extremely useful for us on the Traffic Team at Netflix, and we are happy to have the opportunity to share that value. Now that it is open sourced, we are looking forward to discussions about use cases, other possible integrations, and any feature/pull requests you may have.
-Intuition Engineering Team at Netflix
Justin Reynolds, Casey Rosenthal