In a distributed environment, failure of any given service is inevitable. Hystrix is a library designed to control the interactions between these distributed services, providing greater tolerance of latency and failure. Hystrix does this by isolating points of access between the services, stopping cascading failures across them, and providing fallback options, all of which improve the system's overall resiliency.
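To make the isolation and fallback ideas above concrete, here is a minimal sketch in plain Java. It does not use the Hystrix API. The class `IsolatedCall`, the method `callWithFallback`, the pool size of 4, and the timeouts are all hypothetical choices made for illustration. The sketch assumes one small, bounded thread pool per dependency, so that a slow or failing dependency can tie up only its own threads, and it returns a fallback value on timeout, error, or pool saturation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical illustration only; this is NOT the Hystrix API.
class IsolatedCall {

    // Assumption: one bounded pool per dependency. With a SynchronousQueue,
    // a fifth concurrent call is rejected instead of queuing up behind a
    // slow dependency.
    private static final ExecutorService POOL = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS,
            new SynchronousQueue<Runnable>(),
            runnable -> {
                Thread t = new Thread(runnable, "dependency-pool");
                t.setDaemon(true); // do not keep the JVM alive for stuck calls
                return t;
            });

    // Runs remoteCall on the isolated pool and waits at most timeoutMillis.
    // Any rejection, timeout, or failure yields the fallback value instead.
    static <T> T callWithFallback(Callable<T> remoteCall,
                                  long timeoutMillis,
                                  Supplier<T> fallback) {
        Future<T> future;
        try {
            future = POOL.submit(remoteCall);
        } catch (RejectedExecutionException e) {
            return fallback.get(); // pool saturated: fail fast
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow call
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get(); // the call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        // A healthy call, a failing call, and a call that exceeds its timeout.
        System.out.println(callWithFallback(() -> "live", 500, () -> "fallback"));
        System.out.println(IsolatedCall.<String>callWithFallback(
                () -> { throw new IllegalStateException("down"); },
                500, () -> "fallback"));
        System.out.println(callWithFallback(
                () -> { Thread.sleep(2000); return "late"; },
                50, () -> "fallback"));
    }
}
```

The caller always gets an answer within a bounded time, which is what keeps one misbehaving dependency from cascading into its callers. A production library adds much more than this sketch shows, such as metrics and circuit breaking.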
Hystrix evolved out of resilience engineering work that the Netflix API team began in 2011. Over the course of 2012, Hystrix continued to evolve and mature, eventually leading to adoption across many teams within Netflix. Today, tens of billions of thread-isolated and hundreds of billions of semaphore-isolated calls are executed via Hystrix every day at Netflix, and its use has dramatically improved uptime and resilience.
The following links provide more context around Hystrix and the challenges that it attempts to address:
- Making the Netflix API More Resilient
- Fault Tolerance in a High Volume, Distributed System
- Performance and Fault Tolerance for the Netflix API
Hystrix is available on GitHub at http://github.com/Netflix/Hystrix
Full documentation is available at http://github.com/Netflix/Hystrix/wiki, including the Getting Started, How To Use, How It Works, and Operations pages, with examples of how it is used in a distributed system.
You can get and build the code as follows:
$ git clone git://github.com/Netflix/Hystrix.git
$ cd Hystrix/
$ ./gradlew build
In the near future we will also be releasing the real-time dashboard that we use at Netflix to monitor Hystrix.
We hope you find Hystrix to be a useful library. We'd appreciate any and all feedback on it and look forward to fork/pulls and other forms of contribution as we work on its roadmap.
Are you interested in working on great open source software? Netflix is hiring!