It's not too often that we talk about corporations displaying courage. The stories that make it to the newspapers usually concern bureaucratic incompetence or uncaring actions, both of which, I would argue, arise largely from groupthink and the failure of any individual to stand out and take chances outside of the system. But I'd like to call attention today to Netflix, which made a startlingly intelligent and risky decision.
During a recent outage of Amazon Cloud services, Netflix was one of the very few Amazon customers that felt minimal impact from the incident. This was because of an internal tool at Netflix called (don't laugh) "Chaos Monkey". It seems that Netflix decided early on that it was important to have redundant systems, and they wanted to make sure no single failure could take down their environment.
So far so good. Most companies make the same decision, and it seems a safe and easy decision for a CIO to put a bit more money into his budget for redundancy. If it doesn't work, hey, don't look at me, I tried! But Netflix went enormously further. They built "Chaos Monkey", designed to take down their own servers, anytime, anywhere. This isn't in a controlled sandbox. It isn't even their development environment. This is production. Developers who don't design 100% redundant systems find out about it REAL fast.
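For the curious, here is a toy sketch of the idea in Python. It is not Netflix's code and talks to no real cloud API: the `Cluster` class, the service names, and `unleash_monkey` are all made up for illustration, and the "servers" are just strings in memory. It only shows the principle: kill one running instance at random, then check whether every service still has something left to serve traffic.

```python
import random


class Cluster:
    """Toy in-memory model (hypothetical): service name -> set of running instance ids."""

    def __init__(self, replicas):
        # replicas maps a service name to how many instances it starts with.
        self.instances = {
            service: {f"{service}-{i}" for i in range(count)}
            for service, count in replicas.items()
        }

    def running(self):
        # Flat list of every instance id that is still up.
        return [i for ids in self.instances.values() for i in sorted(ids)]

    def terminate(self, instance_id):
        # Remove the instance from whichever service owns it.
        for ids in self.instances.values():
            ids.discard(instance_id)

    def is_healthy(self):
        # Healthy here means every service has at least one instance left.
        return all(len(ids) > 0 for ids in self.instances.values())


def unleash_monkey(cluster, rng):
    """Terminate one randomly chosen running instance and return its id."""
    victims = cluster.running()
    if not victims:
        return None
    victim = rng.choice(victims)
    cluster.terminate(victim)
    return victim


if __name__ == "__main__":
    rng = random.Random()
    # "api" has spare replicas; "billing" is a single point of failure.
    cluster = Cluster({"api": 3, "billing": 1})
    victim = unleash_monkey(cluster, rng)
    print("terminated:", victim, "| still healthy:", cluster.is_healthy())
```

Run it a few times and the service with three replicas always survives, while the lone "billing" instance eventually gets picked and the health check fails. That is the whole point: the team that skipped redundancy finds out quickly.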
I've worked in many corporations, some of them quite good, and have yet to work in an environment that would have the stomach for such a radical proposal. And yet what could be more effective at achieving Netflix's goal of true redundancy?
What are your convictions? And what would you be willing to risk to live them?