Every system needs a Simian Army

In a post published on techblog.netflix.com on July 19, 2011, Netflix explains Chaos Monkey, a tool that randomly disables its production instances to make sure the service can survive a failure, and the Simian Army that grew out of the same idea.
The Simian Army has seven kinds of “soldiers”, each with a specific job:
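As a rough illustration of the idea (not Netflix’s code), the following Python sketch simulates a small cluster, lets a Chaos Monkey-style function terminate a random healthy instance, and checks that requests are still served. The names `Instance`, `Cluster`, `chaos_monkey` and `serve` are hypothetical, and the “cluster” is only an in-memory list, not real cloud instances.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    healthy: bool = True


@dataclass
class Cluster:
    instances: list[Instance] = field(default_factory=list)

    def healthy_instances(self) -> list[Instance]:
        return [i for i in self.instances if i.healthy]

    def serve(self, request: str) -> str:
        # Route to any healthy instance; the service only fails if none are left.
        candidates = self.healthy_instances()
        if not candidates:
            raise RuntimeError("outage: no healthy instances")
        return f"{request} handled by {candidates[0].name}"


def chaos_monkey(cluster: Cluster, rng: random.Random) -> Instance | None:
    """Hypothetical helper: mark one randomly chosen healthy instance as terminated."""
    candidates = cluster.healthy_instances()
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.healthy = False
    return victim


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the simulation is reproducible
    cluster = Cluster([Instance(f"i-{n}") for n in range(3)])
    victim = chaos_monkey(cluster, rng)
    print("terminated", victim.name)
    print(cluster.serve("GET /"))
```

Running the script prints the terminated instance and the surviving instance that handled the request; the point of the exercise is that losing one instance does not cause an outage.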
  • Latency Monkey induces artificial delays in the RESTful client-server communication layer to simulate service degradation and measures whether upstream services respond appropriately.
  • Conformity Monkey finds instances that don’t adhere to best-practices and shuts them down.
  • Doctor Monkey taps into health checks that run on each instance as well as monitors other external signs of health to detect unhealthy instances.
  • Janitor Monkey ensures that the cloud environment is running free of clutter and waste.
  • Security Monkey finds security violations or vulnerabilities.
  • 10–18 Monkey (short for localization-internationalization, or l10n-i18n) detects configuration and runtime problems in instances serving customers in multiple geographic regions.
  • Chaos Gorilla simulates an outage of an entire Amazon availability zone.
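The Latency Monkey idea can be sketched in the same spirit: wrap a client call so that it is sometimes delayed, then check that the caller times out and degrades gracefully instead of hanging. This is a minimal in-process sketch, assuming a plain Python function stands in for a RESTful call; `latency_monkey`, `call_with_fallback` and the `recommendations` service are hypothetical names.

```python
import concurrent.futures
import random
import time
from typing import Callable


def latency_monkey(call: Callable[[], str], delay_s: float, probability: float,
                   rng: random.Random) -> Callable[[], str]:
    """Hypothetical helper: delay the wrapped call by delay_s seconds with the given probability."""
    def wrapped() -> str:
        if rng.random() < probability:
            time.sleep(delay_s)  # simulated service degradation
        return call()
    return wrapped


def call_with_fallback(call: Callable[[], str], timeout_s: float, fallback: str,
                       executor: concurrent.futures.Executor) -> str:
    """Upstream behaviour under test: stop waiting after timeout_s and return a fallback."""
    future = executor.submit(call)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback


def recommendations() -> str:
    # Stand-in for a downstream service.
    return "personalized rows"


if __name__ == "__main__":
    rng = random.Random(7)
    slow = latency_monkey(recommendations, delay_s=0.2, probability=1.0, rng=rng)
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        print(call_with_fallback(slow, timeout_s=0.05, fallback="popular titles", executor=pool))
```

With `probability=1.0` the delayed call misses the 0.05 s deadline and the caller returns the fallback. Note that `future.result(timeout=...)` only stops waiting; the delayed call keeps running in its worker thread until it finishes.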
Seven years later, it is interesting to ask whether every system needs a Simian Army to ensure the availability of its platform.
More information: The Netflix Simian Army
Enjoy!!!
