The Netflix Simian Army: reliability, security, resiliency and recoverability

Along with price and scalability, redundancy and fault tolerance are possibly the most important drivers of cloud migration.
A cloud architecture should tolerate the failure of individual components without affecting the availability of the entire system.
To trust that it does, we need to be able to test failure scenarios deliberately. The Simian Army is a set of tools Netflix built for that purpose.

Chaos Monkey

Randomly disables production instances.
Tests the ability to survive such a failure without overall impact on the service.
Encourages building automatic recovery mechanisms to deal with system failures.
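
The selection step can be sketched as follows. This is a minimal illustration, not Netflix's implementation: the names `pick_victims` and `unleash`, the `probability` parameter, and the `terminate` callback are all assumptions made for the example, and a real tool would call a cloud provider API where the callback is.

```python
import random
from typing import Callable, Dict, List, Optional


def pick_victims(
    groups: Dict[str, List[str]],
    probability: float,
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Choose at most one random instance per group to disable."""
    if rng is None:
        rng = random.Random()
    victims: Dict[str, str] = {}
    for group, instances in groups.items():
        # Skip empty groups; otherwise roll the dice once per group.
        if instances and rng.random() < probability:
            victims[group] = rng.choice(instances)
    return victims


def unleash(
    groups: Dict[str, List[str]],
    probability: float,
    terminate: Callable[[str], None],  # hypothetical hook, e.g. a cloud API call
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Terminate the chosen instances and return their IDs."""
    victims = pick_victims(groups, probability, rng)
    for instance_id in victims.values():
        terminate(instance_id)
    return list(victims.values())
```

Limiting the damage to one instance per group keeps the experiment survivable: the point is to confirm that the remaining instances absorb the loss, not to take the whole group down.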

Latency Monkey

Introduces artificial delays in the RESTful client-server communication layer to simulate service degradation.
Measures whether upstream services respond appropriately.
With very large delays, simulates the downtime of a node or an entire service without physically bringing the instances down.
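
As a rough sketch of the idea, the snippet below wraps a call with an artificial delay and shows a caller that falls back to a default when its time budget is exceeded. The helper names `inject_latency` and `call_with_fallback` are hypothetical, and the thread-pool timeout is just one simple way to model a client-side deadline.

```python
import concurrent.futures
import time
from typing import Any, Callable


def inject_latency(call: Callable[..., Any], delay_seconds: float) -> Callable[..., Any]:
    """Return a version of `call` that sleeps before doing its work."""
    def delayed(*args: Any, **kwargs: Any) -> Any:
        time.sleep(delay_seconds)  # the artificial delay
        return call(*args, **kwargs)
    return delayed


def call_with_fallback(call: Callable[[], Any], timeout_seconds: float, fallback: Any) -> Any:
    """Run `call`, but return `fallback` if it exceeds the time budget."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(call)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # The dependency is too slow: degrade gracefully instead of hanging.
        return fallback
    finally:
        # Do not block the caller while the slow call finishes in the background.
        executor.shutdown(wait=False)
```

A delay far larger than the caller's timeout behaves, from the caller's point of view, like the dependency being down, which is how a latency tool can stand in for an outage.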

Conformity Monkey

Finds instances that don’t adhere to best practices and shuts them down.

Doctor Monkey

Detects unhealthy instances using health checks and other external signs of health.
Removes unhealthy instances from service.
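
A minimal sketch of that loop, under the assumption that a health check is a function returning True or False and that an instance is only declared unhealthy after several consecutive failed checks. The names `find_unhealthy`, `remove_from_service`, and the `attempts` parameter are illustrative, not part of any Netflix API.

```python
from typing import Callable, Iterable, List, Set


def find_unhealthy(
    instances: Iterable[str],
    is_healthy: Callable[[str], bool],  # hypothetical health check
    attempts: int = 3,
) -> List[str]:
    """Return instances that fail `attempts` health checks in a row."""
    unhealthy: List[str] = []
    for instance_id in instances:
        # any() stops at the first passing check, so a single transient
        # failure does not condemn an instance.
        if not any(is_healthy(instance_id) for _ in range(attempts)):
            unhealthy.append(instance_id)
    return unhealthy


def remove_from_service(in_service: Set[str], unhealthy: Iterable[str]) -> Set[str]:
    """Return the set of instances that should keep receiving traffic."""
    return in_service - set(unhealthy)
```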

Janitor Monkey

Searches for unused resources and disposes of them.
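
What counts as "unused" is a policy decision. The sketch below assumes one possible rule: a resource is unused if it is not attached to anything and has been idle longer than a threshold. The `Resource` type, its fields, and the 30-day default are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class Resource:
    resource_id: str
    attached: bool       # still referenced by a running instance?
    last_used: datetime  # last time the resource was known to be in use


def find_unused(
    resources: Iterable[Resource],
    now: datetime,
    max_idle: timedelta = timedelta(days=30),  # assumed retention policy
) -> List[Resource]:
    """Return detached resources that have been idle longer than `max_idle`."""
    return [
        r for r in resources
        if not r.attached and now - r.last_used > max_idle
    ]
```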

Security Monkey

Finds security violations and vulnerabilities and terminates the offending instances.

10-18 Monkey (Localization / Internationalization)

Detects configuration and runtime problems in instances serving customers in multiple geographic regions.

Chaos Gorilla

Simulates an outage of an entire Amazon Availability Zone (AZ).
Services should re-balance to the functional AZs without user-visible impact or manual intervention.
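
The capacity question behind such a test can be modelled in a few lines. This is a toy model under stated assumptions: load is a single integer per zone, the failed zone's load is spread evenly across the survivors, and every zone has the same capacity. The function names `fail_zone` and `can_absorb` are made up for the example.

```python
from typing import Dict


def fail_zone(load_by_zone: Dict[str, int], failed_zone: str) -> Dict[str, int]:
    """Spread the failed zone's load evenly across the remaining zones."""
    survivors = {z: load for z, load in load_by_zone.items() if z != failed_zone}
    if not survivors:
        raise RuntimeError("no functional zones left")
    displaced = load_by_zone.get(failed_zone, 0)
    share, remainder = divmod(displaced, len(survivors))
    for index, zone in enumerate(sorted(survivors)):
        # The first `remainder` zones take one extra unit so nothing is lost.
        survivors[zone] += share + (1 if index < remainder else 0)
    return survivors


def can_absorb(load_by_zone: Dict[str, int], capacity_per_zone: int) -> bool:
    """True if no zone is pushed past its capacity."""
    return all(load <= capacity_per_zone for load in load_by_zone.values())
```

With three equally loaded zones, losing one raises the load on each survivor by half, so each zone needs to run at no more than two thirds of its capacity in normal operation to survive the outage without user-visible impact.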

The Simian Army project on GitHub has been retired, and its functionality has been moved to other Netflix projects. Check the Simian Army GitHub page for more details about the new projects.