The Incident Management Paradox

A connection of mine on Twitter was looking to hire an incident manager within their organization and wanted to bounce the job posting off me to see if anything should be added or removed. Since many organizations are looking to cut costs wherever they can, I asked if this was a replacement hire. They said they were looking to hire an additional person because their incident rates had been going up.

As I read this I had two thoughts. On the one hand, this organization was looking to ensure that it could manage the issues it was seeing as effectively as possible. On the other hand, that same organization was enabling the mismanagement of issues by not understanding what drives them in the first place.

Here is an example. If we had a performance issue with an application and the suggestion was to bolster the clustered environment by throwing more servers at it, we could fix the issue. That would work, but what is the value proposition of doing that? Not a particularly good one, I would say.

Eventually, when we look at the value of the service we are providing, this just does not make sense.

The addition of the incident manager is a similar situation. We should have a fundamental understanding of what is driving these issues. While initially we may need this person to handle the flood of work that is coming in, I advised my friend to think of a long-term strategy to reduce incidents in the first place.

To start, take a look at the top 10 issues you seem to be facing and determine their source. By source I am talking about breaking them down into chunks such as application, infrastructure, training, and user perception.

There are a few others you could use, but these will at least show where we as an IT capability can address issues ourselves and where we may need other teams to help resolve the issues being experienced.
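If your ticketing tool can export incidents, a few lines of script can do this first pass for you. The sketch below is only an illustration: the incident records, the category names, and the helper functions top_issues and count_by_source are all made up for the example, so swap in whatever your own export looks like.

```python
from collections import Counter

# Hypothetical incident records: (issue summary, source category).
# In practice these would come from an export of your ticketing tool.
INCIDENTS = [
    ("Login page times out", "application"),
    ("Login page times out", "application"),
    ("Cannot find the new expense form", "training"),
    ("Disk full on file server", "infrastructure"),
    ("Report looks different after upgrade", "user perception"),
    ("Login page times out", "application"),
    ("Cannot find the new expense form", "training"),
]


def top_issues(incidents, n=10):
    """Return the n most frequent (summary, source) pairs with their counts."""
    return Counter(incidents).most_common(n)


def count_by_source(incidents, n=10):
    """Total the incidents behind the top n issues for each source category."""
    totals = Counter()
    for (_summary, source), count in top_issues(incidents, n):
        totals[source] += count
    return totals


if __name__ == "__main__":
    # Print the source categories, busiest first.
    for source, count in count_by_source(INCIDENTS).most_common():
        print(f"{source}: {count}")
```

Even a rough tally like this makes it easier to see which sources IT can tackle on its own and which ones need a conversation with another team.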

While the application and infrastructure issues are fairly straightforward, training issues may stem from the way training was handled after a new tool was implemented or updated. As a result, we may need to handle similar rollouts or updates in a more managed and consistent way to reduce these escalations.

User perception issues are similar in that users may have expected something from a deployment and are not seeing it. This could be the result of communication breakdowns or gaps in user acceptance, to name a few challenges, but we need to know what they are if we are going to build a strategy to reduce them.

Depending on how projects are transitioned into operations, we may see a combination of the two previous examples. In some cases, if IT was not looped in until just before ‘go live’, other issues that could have been prevented may end up being escalated.

In the end, IT needs to broaden its understanding of the services it provides so that it can take a list of issues, plan a strategy to reduce them one at a time, and improve the customer experience overall.

Follow me on Twitter @ryanrogilvie or connect with me on LinkedIn

 

 
