Application Performance Management that meets today’s Ops challenges head-on

I have been allowed away from my desk of late to see the inner workings of our customers' operations support teams. I don't get the chance as often as I used to, so I enjoy taking an almost time-lapse view of how Operations Management has evolved over the years.


We have seen tectonic shifts in technology over the past four to five decades (for the record, not quite personally in my case). Each era has brought its own set of challenges to isolating infrastructure issues so they can be diagnosed and fixed. With the mainframe, we had to diagnose a path from a simple terminal device, through (early on) an SNA network, to "THE HOST", and once there it was reasonably easy (no offence to the Mainframe Jedis who maintain the equilibrium) to spot the failed jobs.


Client-server introduced a new component to the mix, namely the local client. More intelligence was built into the client to reduce the non-essential workload on the server and to improve the user experience. Then came the browser wars of the mid-90s, leading into the noughties and Java becoming ready for the enterprise. Welcome to the world of the n-tier architecture, separating presentation, logic and persistence onto different platforms and making it possible to spread users even farther apart. We obviously got too good at managing that, as we decided to push the envelope into the mobile age, where we take the web n-tier model and add complexity at the front by way of a cellular network connection and a plethora of almost (but not quite) identical client devices.


When we moved into the web age, we found that siloed support teams no longer worked efficiently in an n-tier architecture, as diagnosing an incident required a multitude of skills from across most, if not all, core capabilities of IT: networks, systems administration, development and database administration. It was a challenging time trying to adjust to this horizontal support model in a compartmentalised environment. We introduced skillsets built around knowledge of the transaction path through the system rather than focussing on just the technologies it passed through along the way. As a consequence, operational support systems needed to adapt and grow to better support the new world order. Transactions now touch an alarming number of different technologies on their way from the user to the persistence layer and back again. Monitoring overall service health has never been so fraught with complexity. And it will only get more difficult.


So we need a business service management solution that goes deeper and broader than before, covering three main areas of assurance: user experience, transaction integrity and application diagnostics. It has to be able to monitor all of the technology along the way and help isolate a problem at any point on that transaction path. Further, we may need to single out an individual transaction and diagnose the unique characteristics that are causing the issue. And because we are talking about an n-tier application, different service providers may be responsible for different stages of the transaction path, so we may not be allowed to deploy agents or other monitoring hooks into code or application servers, depending on warranties or SLAs.
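To make that agentless constraint concrete, here is a minimal sketch of what a synthetic, outside-in probe of a transaction path could look like. This is purely my own illustration in Python, not an HP product feature; the stage names and URLs are hypothetical placeholders.

```python
# A minimal sketch of an agentless "synthetic transaction" probe: it times
# each externally visible stage of a path from the outside, so no agents or
# code hooks need to be deployed into anyone else's tier.
# The stage names and URLs below are hypothetical placeholders.
import time
import urllib.request

STAGES = [
    ("presentation", "https://www.example.com/login"),
    ("logic",        "https://api.example.com/orders/health"),
    ("persistence",  "https://api.example.com/orders/health?deep=db"),
]

def probe(timeout=5.0):
    """Time each stage from the outside and record its status."""
    results = []
    for name, url in STAGES:
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                status = resp.status
        except Exception as exc:
            status = f"error: {exc}"
        elapsed_ms = (time.monotonic() - start) * 1000
        results.append((name, status, round(elapsed_ms, 1)))
    return results

if __name__ == "__main__":
    for name, status, ms in probe():
        print(f"{name:<12} status={status} latency={ms}ms")
```

Because it measures each tier from the outside, a probe like this can run even where warranties or SLAs forbid instrumenting another provider's code.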


So we find ourselves trapped.


Or not. At HP, we have amassed decades of experience in monitoring technology and managing IT infrastructure through each of these shifts, bringing knowledge, experience and excellence along the way.


Back to my customer visits: I was somewhat disappointed to see that there are still large organisations out there performing only the bare minimum of monitoring in their environments, thereby missing out on critical accelerators of mean-time-to-resolution. I was dumbfounded to hear one customer state that they had suffered a two-day outage, due in part to internal political issues around getting access to log files! Even worse, within an hour of the outage they suspected where the problem was. As usual, someone had applied a patch to a dependent system and it had broken the path. It took the rest of the two days to get the logs and prove that was the fault. I had hoped we had moved beyond these times.


With a well-implemented Application Performance Management (APM) system, they would have been able to demonstrate immediately and precisely where that problem lay in the path. Further, had diagnostics been in place, they would have isolated the individual API, system or service call that caused the issue and presented it in an easy-to-interpret way, reducing this two-day outage to less than a day.
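For illustration only (a hand-rolled sketch, not how the APM product does it): if every tier stamps its log entries with a shared correlation ID, even a simple script can assemble one time-ordered view of a single transaction across the tiers, which is exactly the evidence that took this customer two days of politics to gather. The log file names and the assumed log format (ISO timestamp, level, message) are made up for the example.

```python
# Given a correlation ID, pull the matching entries out of each tier's log
# and order them by timestamp, so the last good hop and the first failing
# one become obvious at a glance. Paths and log format are assumptions.
import re
from datetime import datetime
from pathlib import Path

LOGS = [Path("web.log"), Path("appserver.log"), Path("db.log")]  # hypothetical
LINE = re.compile(r"^(?P<ts>\S+) (?P<level>\w+) (?P<msg>.*)$")

def trace(correlation_id: str):
    """Collect every log line mentioning the correlation ID, across all tiers."""
    entries = []
    for log in LOGS:
        if not log.exists():
            continue
        for line in log.read_text().splitlines():
            if correlation_id not in line:
                continue
            m = LINE.match(line)
            if m:
                entries.append((datetime.fromisoformat(m["ts"]),
                                log.name, m["level"], m["msg"]))
    # One time-ordered view across all tiers, instead of two days of politics.
    for ts, source, level, msg in sorted(entries):
        print(f"{ts.isoformat()} [{source}] {level} {msg}")

if __name__ == "__main__":
    trace("txn-4711")  # hypothetical transaction ID
```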


What would be even better still is building predictive analytics into APM. APM is a fantastic leap forward in monitoring and managing the health of a business service, but even if it responds within milliseconds, the event, or at least its symptoms, must already have begun to occur before it can be trapped. Even the ultimate reactive solution happens after the fact.


Moving into Operations Analytics and predictive service health monitoring can finally get us ahead of this curve and help stop events before they become a problem. Big Data as an engine is a massive opportunity to gain insight into the problems we see on a daily basis. Previously, the sheer volume of logging and event information meant we simply couldn't make economic use of that data. This tide is turning, and I am excited to see the new era of system health monitoring evolving. But how do we keep justifying the effort we put into monitoring? We need self-learning and self-healing systems that can detect and remediate issues without human intervention.
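As a toy illustration of the predictive idea (my own sketch, not Operations Analytics itself): instead of waiting for a hard threshold to be breached, a detector can learn a rolling baseline for a metric and raise an early warning as soon as readings drift away from it. The window size, the three-sigma rule and the sample values below are all assumptions for the example.

```python
# Learn a rolling baseline for a metric and flag readings that drift away
# from it before a hard failure threshold is ever reached.
from collections import deque
from statistics import mean, stdev

class BaselineDetector:
    def __init__(self, window=60, sigmas=3.0):
        self.history = deque(maxlen=window)  # recent observations only
        self.sigmas = sigmas

    def observe(self, value):
        """Return True if the value is anomalous against the learned baseline."""
        anomalous = False
        if len(self.history) >= 10:  # need a minimum baseline first
            mu, sd = mean(self.history), stdev(self.history)
            if sd > 0 and abs(value - mu) > self.sigmas * sd:
                anomalous = True
        self.history.append(value)
        return anomalous

if __name__ == "__main__":
    detector = BaselineDetector()
    # Hypothetical response-time samples (ms): steady, then drifting upward.
    samples = [50, 52, 49, 51, 50, 53, 48, 50, 51, 49, 52, 50, 95]
    for i, ms in enumerate(samples):
        if detector.observe(ms):
            print(f"sample {i}: {ms}ms looks anomalous - raise an early warning")
```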


Terminator fans, we are getting ever closer to the day that “Skynet comes online and becomes self-aware.”


Related links:

The Agile—on balance—enterprise

Make Big Data work for Ops


Ken O'Hagan is director of software presales for UK&I at Hewlett-Packard. Before coming to HP, Ken amassed close to 10 years of technical experience, working for companies such as Perot Systems and The Bank of Ireland. During his time at the latter, he was responsible for architecture definition and validation, hardware specification, technical design and implementation, and was a key part of the team that successfully implemented the five largest programmes ever delivered for Bank of Ireland.
