Friday, 5 April 2013

Incidents Caused by Changes, Nessie and the Yeti

I recently overheard someone who was having application issues ask, “Was there a change?” At an initial look it didn’t appear so. Much like the Yeti and my favorite, Nessie, the relationship between Changes and Incidents has its believers even when actual proof of its existence is hard to come by. There is a statistic that says that 80% of Incidents are the result of a Change. The real question here is how you quantify the number of Changes which actually cause Incidents, and how you can better position yourself to be able to do so.

In some cases identifying that a Change has inadvertently caused an issue is easy (and obvious: a network change followed by a network outage). Other times the relationship is not quite as clear, such as a network change followed by application issues. Some factors which can complicate matters include:

·         the time elapsed between the change being completed and the incident being reported,
·         the description is not reflective of the symptoms which are being displayed,
·         there were change activities which may not have been tracked in the known change, or
·         the change was not tracked at all.

Leaving the latter aside, let’s take this example into consideration:
Assuming we have Problem, Change and Incident Management process owners, we should start by getting them together to review activities at regular intervals. Each of the process owners brings the following information to the table:

Change Management
There were 200 Changes last month. The success rate was 99 percent.
This meant there were 2 Changes which were reported to have caused Incidents.
In total, 10 Incidents were related to these Changes.

Incident Management
There were 100 Incidents of varying severities reported last month.

Problem Management
What they are working on, or what has been escalated to them.

So if we are working on the premise that 80% of Incidents are the result of a Change, we should have had 80 Incidents related to Changes rather than 10. Obviously this metric isn’t an exact science; however, what it represents is important.
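The arithmetic behind that gap can be laid out in a few lines. This is only a sketch using the figures from the example review; the variable names are made up for illustration, and the 80% figure is treated as an assumption rather than a measured value.

```python
# Figures taken from the example review; names are illustrative only.
total_changes = 200
change_success_rate = 0.99
total_incidents = 100
incidents_linked_to_changes = 10
assumed_change_share = 0.80  # the "80% of Incidents" statistic, taken as an assumption

# Changes reported to have caused Incidents: 200 * (1 - 0.99) = 2
failed_changes = round(total_changes * (1 - change_success_rate))

# Incidents we would expect to trace back to Changes if the 80% premise held.
expected_from_changes = round(total_incidents * assumed_change_share)

# Share of Incidents actually linked to a Change, and the unexplained gap.
observed_share = incidents_linked_to_changes / total_incidents
unexplained_gap = expected_from_changes - incidents_linked_to_changes

print(f"Changes reported to have caused Incidents: {failed_changes}")
print(f"Incidents expected from Changes at 80%: {expected_from_changes}")
print(f"Incidents actually linked to Changes: {incidents_linked_to_changes} ({observed_share:.0%})")
print(f"Gap to investigate: {unexplained_gap}")
```

The gap of 70 is not a finding in itself. It is the number of Incidents the review group would need to account for before accepting or rejecting the 80% premise for their own environment.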

Can we really quantify the relationships between Incidents, Problems and Changes? We now need to ask ourselves: were there more Incidents resulting from a Change that were not identified? Also, how many of these Incidents were common to one particular issue or required a Problem Management review? The more you discuss these activities amongst yourselves, the more you will realize what other moving parts can contribute to streamlining and improving service.
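One way to go looking for Incidents that were never tied to a Change is to list unlinked Incidents reported shortly after a Change to the same service. The sketch below is a hypothetical starting point, not a feature of any particular ITSM tool: the records, the field names and the three-day window are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical records; IDs, field names and dates are illustrative only.
changes = [
    {"id": "CHG-1", "service": "network", "completed": datetime(2013, 3, 4, 22, 0)},
    {"id": "CHG-2", "service": "payroll", "completed": datetime(2013, 3, 18, 6, 0)},
]
incidents = [
    {"id": "INC-1", "service": "network", "reported": datetime(2013, 3, 5, 8, 30), "linked_change": "CHG-1"},
    {"id": "INC-2", "service": "network", "reported": datetime(2013, 3, 6, 9, 0), "linked_change": None},
    {"id": "INC-3", "service": "payroll", "reported": datetime(2013, 3, 30, 9, 0), "linked_change": None},
]


def candidate_links(changes, incidents, window=timedelta(days=3)):
    """Pair each unlinked Incident with Changes to the same service
    that completed no more than `window` before it was reported."""
    pairs = []
    for inc in incidents:
        if inc["linked_change"] is not None:
            continue  # already attributed to a Change
        for chg in changes:
            elapsed = inc["reported"] - chg["completed"]
            if chg["service"] == inc["service"] and timedelta(0) <= elapsed <= window:
                pairs.append((inc["id"], chg["id"]))
    return pairs


candidates = candidate_links(changes, incidents)
print(candidates)
```

The pairs it returns are candidates for the process owners to review, not proof of cause. Matching on the same service will also miss the harder case mentioned earlier, where a network change shows up as application issues, so the matching rule and the window are things to tune in the review meetings.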

These types of reviews should, at the very least, be going on within your ITSM teams right now. If they are not, it is time to take a closer look at what you are doing, review your KPIs and work towards improving the customer experience.

Follow us on Twitter @ryanrogilvie

1 comment:

  1. Could the use of analytical techniques help to identify Changes that are likely to cause Incidents? I was wondering if we could use transfer functions for this?