In my most recent post, I wrote about good IT configuration management. This post turns to the discipline of problem management: identifying and classifying problems; determining their root causes; and providing timely resolution to prevent recurring incidents. Doing this well accomplishes many things, including increasing availability, improving service levels, reducing IT costs, and improving customer satisfaction by reducing the sheer number of operational problems.
So while incident management is about getting the customer up and running, problem management is about making incidents go away permanently. Simply put, problem management is an incident terminator. Executed well, problem management can be a game changer.
5 metrics to track for optimal problem management
I think the essence of effective problem management is resolving IT problems so they do not recur. How can you work toward this goal? COBIT 5 suggests five metrics you should track:
1) Number of recurring incidents caused by unresolved problems (should decrease over time)
2) Percent of major incidents for which problems were logged
3) Percent of workarounds defined for open problems
4) Percent of problems logged as part of proactive problem management activity
5) Number of problems for which a satisfactory resolution that addressed root causes was found (should increase over time)
Together, these measures show whether problem management is being managed effectively within an IT organization and whether the quality of your problem management is in fact improving. Effective problem management should clearly reduce recurring incidents, but workarounds should also be defined more quickly. Problem management should become more proactive so that solutions are found sooner. And determining root causes is the proof of the pudding.
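If your service desk tool can export incident and problem records, the five metrics above can be computed with very little code. The following is only a minimal sketch, not a COBIT 5 reference implementation: the `Problem` and `Incident` record shapes, their field names, and the `problem_metrics` function are all hypothetical, and I am assuming that metric 3 means the share of open problems that have a workaround defined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    # Hypothetical record shape; map these fields to your own tool's export.
    problem_id: str
    is_open: bool
    has_workaround: bool
    proactive: bool             # logged by proactive activity, not after an incident
    root_cause_resolved: bool   # closed with a resolution that addressed the root cause


@dataclass
class Incident:
    # Hypothetical record shape.
    incident_id: str
    major: bool
    problem_id: Optional[str]   # linked problem record, if one was logged
    recurring: bool             # a repeat of an earlier incident


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when there is nothing to count."""
    return 100.0 * part / whole if whole else 0.0


def problem_metrics(incidents: list, problems: list) -> dict:
    """Compute the five metrics for one reporting period."""
    by_id = {p.problem_id: p for p in problems}
    major = [i for i in incidents if i.major]
    open_problems = [p for p in problems if p.is_open]

    # Metric 1: recurring incidents linked to a problem that is still open.
    recurring_unresolved = sum(
        1 for i in incidents
        if i.recurring and i.problem_id in by_id and by_id[i.problem_id].is_open
    )

    return {
        "recurring_incidents_from_open_problems": recurring_unresolved,
        "pct_major_incidents_with_problem": percent(
            sum(1 for i in major if i.problem_id is not None), len(major)
        ),
        # Assumption: read as the percent of open problems with a workaround.
        "pct_open_problems_with_workaround": percent(
            sum(1 for p in open_problems if p.has_workaround), len(open_problems)
        ),
        "pct_problems_logged_proactively": percent(
            sum(1 for p in problems if p.proactive), len(problems)
        ),
        "problems_resolved_at_root_cause": sum(
            1 for p in problems if p.root_cause_resolved
        ),
    }
```

Run over each reporting period, the first number should trend down and the last should trend up, which is the direction the list above calls for.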
So where should you start?
As always, my suggestion is that you start where the most immediate value can be driven. But if it were up to just me, I would start with the number of problems for which a satisfactory resolution that addressed root causes was found. We need to drive this number up to increase availability and reliability. What do you think? What would be first on your list? I would love to hear back from you.