Chaos Engineering: Building Confidence in System Behavior through Experiments

Quick Summary: This was a good (and brief) introduction to Chaos Engineering by some top-notch software engineers. It was quick and easy to read and includes some nice case studies.

Book Notes:

– Using Chaos Engineering may be as simple as manually running kill -9 on a box in your staging environment to simulate the failure of a service.

– Failure testing breaks a system in some preconceived way, but it doesn’t explore the wide-open field of weird, unpredictable things that could happen.

– In testing, an assertion is made: given specific conditions, a system will emit a specific output. Tests are typically binary: they determine whether a property is true or false. Strictly speaking, this does not generate new knowledge about the system; it just assigns valence to a known property of it.

– Chaos Engineering is a form of experimentation that generates new knowledge about the system.

– Complexity is a challenge and an opportunity for engineers.

– Any organization that designs a system (defined broadly) will inevitably produce a design whose structure is a copy of the organization’s communication structure. – Melvin Conway, 1967

– The performance of complex systems is typically optimized at the edge of chaos, just before system behavior will become unrecognizably turbulent. – Sidney Dekker, Drift Into Failure

– In our field, the idea of doing software verification in a production environment is generally met with derision. “We’ll test it in prod” is a form of gallows humor, which translates to “we aren’t going to bother verifying this code properly before we deploy it.”
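The kill -9 note near the top of these notes can be made concrete with a small sketch. This is a minimal Python illustration, assuming a POSIX system (signal.SIGKILL does not exist on Windows). A throwaway child process stands in for a real service, and the helper names start_dummy_service and kill_minus_nine are hypothetical, not something from the book. On a real staging box you would target the actual service's PID instead.

```python
import signal
import subprocess
import sys


def start_dummy_service() -> subprocess.Popen:
    """Start a stand-in 'service': a child Python process that just sleeps.

    Hypothetical placeholder for a real service process on a staging box.
    """
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def kill_minus_nine(proc: subprocess.Popen) -> int:
    """Send SIGKILL (the signal behind `kill -9`) and wait for the process to exit.

    On POSIX, a child terminated by a signal reports the negated signal
    number as its return code, so SIGKILL shows up as -9.
    """
    proc.send_signal(signal.SIGKILL)
    return proc.wait(timeout=10)


if __name__ == "__main__":
    service = start_dummy_service()
    code = kill_minus_nine(service)
    print(f"service exited with return code {code}")
```

SIGKILL is used (rather than SIGTERM) because the process cannot catch or ignore it, which is what makes it a blunt simulation of a crashed service.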
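The notes on testing versus experimentation describe two different kinds of question. The sketch below contrasts them in Python under loudly stated assumptions: handle_request, run_experiment, and ExperimentResult are hypothetical names for a toy, simulated service that retries a flaky dependency once. The experiment loosely follows the steady-state hypothesis and control-versus-experimental-group framing associated with Chaos Engineering, not any tooling from the book. The unit test can only confirm a property we already expected, while the experiment produces a measurement (how far the success rate drifts under injected faults) that we did not know beforehand.

```python
import random
from dataclasses import dataclass


# Hypothetical system under study: a request handler that calls a flaky
# dependency and retries once before giving up. A toy stand-in only.
def handle_request(rng: random.Random, dependency_failure_rate: float) -> bool:
    """Return True if the request succeeds within two attempts (one retry)."""
    for _attempt in range(2):
        if rng.random() >= dependency_failure_rate:
            return True
    return False


# 1) A classic test: a binary assertion about a known property.
def test_succeeds_when_dependency_is_healthy() -> None:
    rng = random.Random(0)
    assert handle_request(rng, dependency_failure_rate=0.0) is True


# 2) A chaos-style experiment: measure a steady-state metric (success rate)
# for a control group and for an experimental group with injected faults.
@dataclass
class ExperimentResult:
    control_success_rate: float
    experimental_success_rate: float

    @property
    def deviation(self) -> float:
        """How far the experimental group drifted below the control group."""
        return self.control_success_rate - self.experimental_success_rate


def run_experiment(
    requests_per_group: int,
    baseline_failure_rate: float,
    injected_failure_rate: float,
    seed: int = 42,
) -> ExperimentResult:
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible

    def success_rate(failure_rate: float) -> float:
        ok = sum(handle_request(rng, failure_rate) for _ in range(requests_per_group))
        return ok / requests_per_group

    control = success_rate(baseline_failure_rate)
    experimental = success_rate(baseline_failure_rate + injected_failure_rate)
    return ExperimentResult(control, experimental)


if __name__ == "__main__":
    test_succeeds_when_dependency_is_healthy()
    result = run_experiment(10_000, baseline_failure_rate=0.01, injected_failure_rate=0.2)
    tolerance = 0.05  # hypothesis: steady state drops by less than 5 points
    print(
        f"control={result.control_success_rate:.3f} "
        f"experimental={result.experimental_success_rate:.3f} "
        f"deviation={result.deviation:.3f} "
        f"hypothesis held: {result.deviation < tolerance}"
    )
```

A real experiment would run against production-like traffic while keeping the blast radius small; the simulation here only illustrates the shape of the comparison.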