What a CIO should know about DevOps

The term DevOps has taken 3 years to become something of an overnight success. There are probably more myths and misunderstandings about it than there are hard facts. This is especially challenging if you’re a CIO or senior IT leader trying to make sense of whether it’s worth taking a look at.

 

To help cut through the noise, I sat down with Gene Kim and Patrick Debois, two of the DevOps movement’s most experienced practitioners, to swap experiences and discuss how a CIO should be thinking about DevOps. We discussed when to consider it, the business case for adopting it, and the patterns of success and – importantly – of failure in adopting a DevOps approach.

 

Follow this link to listen to the podcast from my interview with Gene Kim and Patrick Debois or you can read a full transcript on the Discover Performance blog.

 

Can it work for me?

One of the most challenging questions for an IT leadership team to consider is whether DevOps will benefit your organization. You don’t want to simply charge into it just because the last consultant who walked through your door made you feel silly for not having already kicked off a transformation project.

 

Between us we agreed that there are three general attributes and KPIs that provide a good indication that DevOps might help:

1. Work in progress (undeployed change, such as a new feature or application) is on the rise.

2. “Fragile” applications with poor availability are finding their way into production, resulting in a low tolerance for experimentation.

3. Long waiting lines for line-of-business projects and rampant “shadow IT”.

 

Each of these symptoms can be traced back to a root cause that DevOps might be able to help with, but you need to take the time to understand that root cause. Let’s take a look at each in turn.

 

Work-in-progress

“Kanban” author Dave Anderson’s dislike of excessive work-in-progress (WIP) is so profound that he helped found the limitedwipsociety.org website, home of the wonderfully misappropriated “Yes we kanban!” slogan. But beneath the humor lies a deep truth: excess work-in-progress is often the root cause of fragile applications, change avoidance and “shadow IT” – the result of frustrated line-of-business executives feeling they have no choice but to bypass corporate IT in order to execute against their goals.

 

Let’s assume you’re using Agile and Kanban boards; measuring and establishing KPIs for excess WIP is then relatively straightforward. We’re looking for work that has been started but not finished: work that isn’t yet being actively tested by QA, or isn’t in production being tested by real users and customers. The simplest way to establish this measure is to chat with your Agile team leader and take a look at the boards (going to the gemba, as it were).

 

Sadly, we’re not all at that stage of maturity. If you aren’t, there are a few steps to take. First, check in with your Finance or Program Management Office and look for long-running projects, and/or projects burning through a lot of cash that haven’t been delivered into production. Second, simply go and chat with the line-of-business executives and get a sense of the tempo of delivery vs. project initiation.
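If you do have a tooled Kanban or tracking system, the same check can be scripted. Here’s a minimal sketch, assuming a hypothetical export of work items – the field names, states and IDs below are illustrative, not any particular vendor’s schema: count everything started but not yet delivered, and flag how long the oldest item has been in flight.

```python
from datetime import date

# Hypothetical export of work items from a Kanban or tracking tool.
items = [
    {"id": "APP-101", "state": "in_progress", "started": date(2013, 1, 7)},
    {"id": "APP-102", "state": "in_progress", "started": date(2013, 3, 4)},
    {"id": "APP-103", "state": "done",        "started": date(2013, 2, 11)},
    {"id": "APP-104", "state": "in_test",     "started": date(2013, 2, 25)},
]

# WIP = everything started but not yet delivered into production.
wip = [i for i in items if i["state"] != "done"]

# The age of each WIP item is a simple early-warning KPI:
# old items signal work that was started and then stalled.
today = date(2013, 4, 1)
ages = {i["id"]: (today - i["started"]).days for i in wip}

print(len(wip))            # current WIP count
print(max(ages.values()))  # age in days of the oldest unfinished item
```

Even a crude report like this, run weekly, turns “WIP is on the rise” from a hunch into a trend line you can put in front of the leadership team.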

 

Fragile

Measuring the fragility of applications and services is a more subtle exercise. Virtually everyone will complain that an application or service “crashes” from time to time. What we’re looking for here are signs of a systemic failure to establish hardened, resilient services, where the root causes can be traced back to design-time decisions. This is not about finger-pointing at developers for producing buggy code, or at operations staff for not having paid for enough “nines” in their 99.xxx% availability infrastructure.

 

The KPIs we’re looking for here are preventable defects in production, extended fault-diagnosis times due to insufficient diagnostic/log-file telemetry, and outages or bugs due to discrepancies between development, test and production environments. The ideal person to help with this process is your problem manager – assuming you’ve established a problem management process!

 

Long waiting lines

By far the easiest thing to measure is the waiting line of projects that cannot commence because insufficient resources are available. Again, your Program Management Office should be able to furnish you with this, although it might be easier to ask at your next senior leadership team meeting (just remember to wear a flak jacket if you don’t already know the answer!).

 

Sadly, there are a number of anti-patterns that might disguise how bad the situation really is. I am mainly thinking of the use of shadow IT resources, particularly cloud services, to get around IT bottlenecks. These users can be hard to find because they’ve often gone to great lengths to hide their footprints – especially if shadow IT is not endorsed by the CxO suite.

 

Thankfully, it’s easier to identify where teams have sought to reduce waiting lines by increasing the number of projects they run in parallel. Again, your Agile team leaders and PMO should be able to furnish these numbers. But remember that it’s a zero-sum game: your teams have simply swapped the frustration of waiting for a project to start for the frustration of waiting for it to finish. Unrealised WIP increases, as does the pressure to ship code before it’s ready, and before you know it you’re in a worse position than when you started.
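Queueing theory makes the zero-sum nature of this concrete. Little’s Law – a staple of the Kanban literature – says that average lead time equals WIP divided by throughput, so starting more projects in parallel without adding delivery capacity simply lengthens every project’s lead time. A sketch with illustrative numbers:

```python
def average_lead_time(wip, throughput_per_month):
    """Little's Law: lead time = WIP / throughput.

    It holds for any stable system regardless of scheduling policy,
    which is why starting more projects in parallel cannot reduce
    total waiting -- it only moves the wait from "before start"
    to "before finish".
    """
    return wip / throughput_per_month

# Illustrative portfolio that completes 2 projects a month:
print(average_lead_time(10, 2))  # 10 projects in flight -> 5 months each
print(average_lead_time(20, 2))  # doubling WIP doubles lead time -> 10 months
```

The arithmetic is trivial, but it gives you a defensible answer when someone proposes “just run more projects at once” as a cure for the queue.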

 

Not quite as Agile as you thought?

“Wait a minute!” I hear you say. “Didn’t we implement Agile specifically to address all of these concerns?!” Before you reach for your scrum master’s throat, some clarification is in order.

 

Agile can and should improve the timeliness, quality and cost of new applications, maintenance and services. You wouldn’t believe the number of Fortune 1000 IT shops I have visited where the mere mention of Agile near an Operations manager results in a 30-minute tirade. Where Agile has failed to achieve these goals is where it results in a local optimisation – one that ignores the impact upstream and downstream of the development process. This leads to the symptoms described above, effectively undermining the Agile transformation and giving it a bad name.

 

It’s because of these near-sighted optimisations that “bad Agile” can create an impedance mismatch between development and the other parts of the IT supply chain, in particular the security and operations functions. Agile teams work to a “release per day” philosophy (i.e. the whole code package should be able to be compiled and “run” at the end of each working day) – something you can only confirm by subjecting it to QA on a test environment.

 

Operations teams don’t have the time to build and deploy unfinished assets on a daily basis, so the work is left to the dev and QA teams, who jerry-rig their own build and deployment processes. This typically results in a massive difference between the system under development and test and the ultimate production system.

 

Because assets and artifacts from the Agile build process are not placed under effective change control, they are rarely used downstream of development. Meanwhile, the teams that will spend the most time with the application (75-95 percent of an application’s lifecycle is spent in production rather than in development) are not involved in the requirements management process until far too late, if at all.

 

DevOps, along with the principles of continuous integration, delivery and deployment, aims to reduce this impedance mismatch. It does so by aligning people and processes with automated tools along the value chain, helping to deliver applications and services faster at each stage of the application lifecycle.

 

The DevOps payback

We all agreed that innovation is the real goal here. It’s all about looking for and removing the bottlenecks to innovation.

 

It’s not about saving cost; it’s about activating innovation. In other words, if technology is a sideline for your business then, frankly, I’m not sure you’re going to see a lot of payback. Chances are you’re not doing continuous development, delivery or deployment.

 

In cases like this, a typical combination of waterfall/Agile, ITIL and COBIT processes is likely to suffice – assuming the cost of carrying large amounts of work-in-progress is not excessive, of course.

 

That’s not to say that you wouldn’t benefit from better collaboration across organizational lines, tool consolidation and the rest. The return on that investment is going to have to come from hard cost savings, not reduced opportunity cost/top-line benefits.

 

 

Patterns and anti-patterns of success

There are so many patterns for success that rather than skim through them here, we’ll save them for a later post. Suffice it to say that you should be aiming to reduce the quantum (the size) of changes while increasing the frequency of change – in a phrase borrowed from manufacturing, to “reduce the batch size”. This is a critical insight that runs against the common-sense experience that fewer changes mean fewer outages.

We’ve all had the experience of arriving at work on Monday morning to find bleary-eyed operations staff sleeping under their desks. They’ve just saved the organization by backing out a major change that went horribly wrong – and as a result, we’ve learned to avoid change. Change avoidance is a critical anti-pattern in a DevOps transformation.

 

That’s not to say small changes can’t hurt you – they are often the source of some of the biggest outages. An example is Google’s outage several years back, when a single errant punctuation mark in a change took the service down. You need to change how you think about quality so you catch defects like these earlier.

 

It’s one of the reasons why quality assurance and testing must become a core discipline for DevOps and Agile shops. To make it stick and to keep the cost of quality low, testing should be automated and comprehensive. Most importantly, it should reflect the production environment to the greatest degree possible – ideally by automating the entire dev-to-test-to-stage-to-production process.
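One small, easily automated piece of this is an environment-drift check: compare each pre-production environment’s configuration against production and fail the pipeline when they diverge. The sketch below is a minimal illustration under assumed data – the environment dictionaries and their keys stand in for whatever your configuration-management tooling actually reports.

```python
# Production is the baseline every other environment should match.
production = {"os": "rhel6", "jdk": "1.6.0_45", "app_server": "tomcat6"}

# Hypothetical snapshots of the pre-production environments.
environments = {
    "dev":   {"os": "rhel6", "jdk": "1.7.0_21", "app_server": "tomcat6"},
    "test":  {"os": "rhel6", "jdk": "1.6.0_45", "app_server": "tomcat6"},
    "stage": {"os": "rhel6", "jdk": "1.6.0_45", "app_server": "tomcat6"},
}

def drift(env, baseline):
    """Return settings where an environment differs from the baseline,
    as {setting: (environment_value, baseline_value)}."""
    return {k: (env.get(k), v) for k, v in baseline.items() if env.get(k) != v}

for name, env in environments.items():
    delta = drift(env, production)
    if delta:
        print(f"{name}: DRIFT {delta}")   # e.g. dev runs a different JDK
    else:
        print(f"{name}: matches production")
```

Wire a check like this into the promotion pipeline and “it worked on my machine” outages become build failures instead of production incidents.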

 

Speaking of anti-patterns: much like a fool with a tool is still a fool, a Dev who thinks like Ops is still a flop if they’re unable to execute due to organizational misalignment outside of IT. Patrick in particular highlighted this point: while a CIO might be thinking about aligning incentives within their own organization, they should remember that people outside the IT group can be just as big a bottleneck to innovation.

 

As the most senior leader in the IT team, the CIO’s role in a DevOps transformation is to anticipate those bottlenecks (such as overzealous compliance teams) and ensure they’re on board and part of the process, not part of the problem.

 

DevOps is not an all-or-nothing proposition. Any transformation should start with identifying where its unique value can be applied. Where’s yours?

Labels: Agile | DevOps | kanban
About the Author
Paul Muller leads the global IT management evangelist team within the Software business at HP. In this role, Muller heads the team responsib...

