The Resilient Enterprise - Learning to Fail, Part 1

Let's talk a little bit more about The Resilient Enterprise, and why organizations need to learn to fail.  Failing isn't necessarily good, or bad ...it just is.  Failing is a fact of humanity and a fact of the universe.  Failure is one of those immutable universal laws, driven by chaos.  Furthermore, I think we can probably agree at some base level that failure is one of the few things we can count on as a universal constant.

 

Having said all that, you may think me all "doom and gloom" but don't misunderstand - I repeat that failure isn't bad or good - it just is.

 

After my conversations here at HP Discover 2012 in Las Vegas with Genefa Murphy and Matt Morgan, some of HP's top minds when it comes to DevOps, I believe even more firmly that one of the things that will link the tribe (as Gene Kim calls it) of diverse people into a cohesive DevOps group is failure.

 

When things go badly, we as IT teams have two options, and unfortunately, just about everywhere I've ever worked, we consistently pick the same one.  See if you agree with me here...

 

Option 1 is by far the one that anyone who has ever worked an operations role in an enterprise is used to.  Option 1 involves a large mass of people getting together, most likely virtually, huddling and trying to pass the blame off on one another.  We always start with the firewall, don't we?  It's always the firewall's fault somehow, or at least that's how I remember it.  There are two terrible habits at work here.  The first is the finger-pointing.  The second, if we get past the "Whose fault is it anyway?" game, is the fun situation where, out of the 30 people on the conference call, someone says "Now, <name> try <random option>. Done? OK, does it work now?" ... sound familiar to anyone?  This situation is exactly what the DevOps movement is trying to eradicate as much as possible, since it makes the whole process painful, creates animosity and makes us no better at figuring out failure the next time.  We simply repeat this train wreck again.

 

Option 2 is subtly different.  While we still have a group of people getting together, again most likely virtually, it's a different type of scenario.  I'm thinking of a situation where we have a representative from each team that has a stake in the application or system, someone who is intimately familiar with the deployment and architecture.  It should be rare for people to meet for the first time here, or to be part of a "one ops team triages all" type of function.  If a project I'm responsible for falls over at 4am, I get woken up to fix it ...odds are I can diagnose it and get it back serving the business faster than some team whose job is simply to do "operations".  As a side effect, since these are all stakeholders, the tribe Gene talks about forms, and the knowledge of failure and repair stays within the group and maybe, just maybe, gets documented for future use.  Hopefully the failure is documented and we can build resiliency against that type of failure into the application either now or in the future.

 

Adam Shostack told me we don't learn enough from our failures in IT.  I wholeheartedly agree, Adam.  Learning to fail and get back up is critical ...and I think this makes the DevOps tribe idea that much more crucial and realistic.  I think this is where IT is evolving to, and watching as an outsider across these different functions, I'm noticing these types of patterns appear.  Enterprise resiliency is a brilliant concept that I'm sure has been talked about before, but it has never been more crucial than it is today.  We're at an inflection point, and things must, absolutely must, evolve.

 

If the agile enterprise is to become a reality, not just something we talk about at conferences and write books about, then it needs to be a core ideal, served by every technical and non-technical function, with products and services that enable that core ideal.  The road to the agile enterprise starts with an awakening to DevOps.  Step 1: learning to fail, recover and move on.

 

Next time, I'll give you some of the how behind this idea.

Comments
| ‎06-10-2012 12:15 PM

There's another, third option here.  Not failing.

 

As a musician, there are many tricks that you can use to hide your mistakes from the audience so that they might never know that you accidentally started on the wrong note, or came in late on a measure.  You combine conviction with a musical escape route - no one will know your failure.  This artistic license that you employ during a performance can be used sparingly, or liberally - mostly at the expense of your ticket sales over the long run of the show.  These "tricks" work well because the majority of the audience members don't have a clue about how to play your instrument, and they are typically expecting to hear something unique and different.

 

Audiences for IT are different.  First, they usually have more than just a clue about how your application is supposed to be working - in fact, as we deliver our applications to an increasingly technology-savvy market, most of the audience members know how apps are supposed to work, just like you do.  E-commerce applications are a good example.  Second, unless your application is explicitly creative (e.g. movies, entertainment, games), the audience is absolutely not interested in something unique and different.  They just want it to work correctly and efficiently - more so as they experience increased failure rates.

 

I don't agree with failure.  IT management should take off the training wheels and stop accepting rationalization of failure. Don't be sorry.  Be right.

northwaysg | ‎06-10-2012 02:29 PM

What Mark said.....

Michael Fornal(anon) | ‎06-12-2012 10:53 AM

Have to say that I don't agree with the first comment at all.  Failure is a part of life and no one likes it but ...it just is. It's easier to play the blame game than to stop and look at what went wrong and try to learn from it and correct it now or in the future. I think if more companies tried option number 2, they would find that they would be more productive and produce a better quality of product for their customers.

 

Thnxs,

@fornalm 

The opinions expressed above are the personal opinions of the authors, not of HP.