At a recent service management event I was
speaking with a manager who was looking to “improve the way that incidents were
handled, in both time reductions and amount”. She said that recently they were
reviewing the incident records for a self-audit and felt that they didn’t have
enough information in them to make lasting improvements.
The tool she is using has a built-in template
for incidents with a standard set of questions. The trouble, as she pointed out, was that many of the questions were going unanswered. The questions were reviewed, and some were removed because they appeared to provide little value. Now only a few questions are asked regularly,
along with additional information that the service desk gathers
depending on the escalation. To me, the questions that go unanswered may hold more information than you might think.
I explained that I had a similar experience;
however, I found that the unanswered questions could provide as much
information as the tangible data that was captured.
Here are some practical examples complete with
responses and translations:
When did the issue begin?
Response: “I don’t know”
Translation: We may not know when the issue began, but we should have an idea of when it was last working as intended. From there we can
narrow down when the issue began, even if we cannot pin down the exact time. For example, this could be the
result of a change gone wrong, in which case the start of the issue is likely to line up with the change.
How many people are impacted?
Response: “I don’t know, it’s only me, I am the
only one on site”
Translation: Depending on the business
operations, you may have sites with very few people, which limits how much
information you can gather from a selection of users. Since this person represents the
entire office, you may want to investigate what else they are unable to do in an
effort to understand any impact beyond what is reported. For example, they may
not be able to access a particular application, but after further questioning
you may identify that they can access the network, email, and phones. This will
help to identify the scope of the issue at their site, which might not
be immediately clear otherwise.
What steps were taken to reach this error?
Response: “I just can’t perform activity X”
Translation: Typically when people escalate
an issue they tell you what they are not able to do. In some cases we need to
know all the steps leading up to the activity which is not working
in order to diagnose the underlying cause. You are likely not that familiar with
all the workflows of your business, so get the caller to take you through, step by
step, what works and then what does not. Get a screen capture whenever
possible.
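For the first question above ("When did the issue begin?"), one way to narrow the start time is to take the last moment the caller knew things were working, take the time of the report, and look at which changes landed in between. The Python below is only a minimal sketch under assumptions: the `Change` record, its field names, and the `changes_in_window` helper are hypothetical and do not come from any particular service desk tool, and the sample data is made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Change:
    """Hypothetical change record; the field names are assumptions."""
    change_id: str
    service: str
    implemented_at: datetime


def changes_in_window(changes, service, last_known_good, reported_at):
    """Return changes to `service` implemented between the last time it was
    known to work and the time the incident was reported, oldest first."""
    if last_known_good > reported_at:
        raise ValueError("last_known_good must not be later than reported_at")
    suspects = [
        c for c in changes
        if c.service == service
        and last_known_good <= c.implemented_at <= reported_at
    ]
    return sorted(suspects, key=lambda c: c.implemented_at)


# Illustrative data only.
changes = [
    Change("CHG-1", "payroll", datetime(2024, 3, 1, 22, 0)),
    Change("CHG-2", "payroll", datetime(2024, 3, 4, 6, 30)),
    Change("CHG-3", "email", datetime(2024, 3, 4, 7, 0)),
]

suspects = changes_in_window(
    changes,
    service="payroll",
    last_known_good=datetime(2024, 3, 3, 17, 0),
    reported_at=datetime(2024, 3, 4, 9, 15),
)
print([c.change_id for c in suspects])
```

Even when the caller answers "I don't know", asking when the service last worked gives the lower bound this kind of lookup needs.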
In some cases the issue may have nothing to do
with infrastructure or applications. It could boil down to communication or
training.
Communication
We have all heard that a large majority of
incidents are generated by changes, but what about the escalations that
come in because a change is underway? For example, someone calls
in because they are unable to access application X. In the initial check you
identify that they can't access the application because there is a scheduled
outage underway. Be careful how you communicate this to them. We don't want to
belittle them, but we do want to understand why they were not aware. It could be
that they simply did not see the communication, or that they did not receive
it in the first place. This information should be shared with the
manager of the service desk as well as any pertinent service management teams
such as change management.
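The initial check described above can be recorded in a way that keeps the "did not see it" and "did not receive it" cases apart, since they point to different fixes. This is a rough Python sketch, assuming scheduled outages are available as simple records; the dictionary keys, the `classify_call` helper, and the category names are all hypothetical.

```python
from datetime import datetime


def active_outage(outages, service, when):
    """Return the first scheduled outage covering `service` at `when`, or None."""
    for outage in outages:
        if outage["service"] == service and outage["start"] <= when <= outage["end"]:
            return outage
    return None


def classify_call(outages, service, when, notice_received):
    """Classify a call so the result can be passed to the service desk manager
    and change management. `notice_received` is what the caller tells us."""
    if active_outage(outages, service, when) is None:
        return "not_outage_related"
    if notice_received:
        return "notice_not_seen"      # the notice arrived but was not read
    return "notice_not_received"      # the notice never reached the caller


# Illustrative data only.
outages = [
    {
        "service": "application X",
        "start": datetime(2024, 3, 4, 8, 0),
        "end": datetime(2024, 3, 4, 10, 0),
    },
]

result = classify_call(
    outages, "application X", datetime(2024, 3, 4, 9, 0), notice_received=False
)
print(result)
```

Counting these categories over time would show whether the notices are being missed or are not reaching people at all.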
Training
Applications tend to change in both functionality and appearance over time. Because of this, some
escalations are not actual issues but rather the result of the person not having the appropriate training for the new functionality. Again, we want to capture this information so that we
can build a knowledge article in the event that anyone else has a similar
challenge. We should also communicate this information back to the team
responsible for the update as well as change and release management.
In the end, we want to position IT to
respond to escalations quickly and also to make lasting improvements by
understanding all of the moving parts of the issues that we are working to
reduce for our customers.
Follow me on Twitter @ryanrogilvie or connect with me on LinkedIn