Monday, 6 February 2017

What Happened? Performing Service Management Reviews

Whether you are reviewing implementations, incidents or some other activity, a review lets you go back and see where you can improve. This applies to anything. Take Super Bowl 51, for example. At halftime I am sure that the losing team, who had made some significant errors in the first half of the game, was reviewing what had happened. A review gives us the opportunity to take what we have learned and apply it so that we can move forward. We can’t change the past, but we can work on improving the future.

In practical terms, here are a few examples of reviews we should be performing with some regularity.

Incident Reviews
A critical service failed over the weekend and several resources were engaged to restore it. After several hours, a conference call, and many escalations to support resources, your team was able to fix the situation. Reviewing these incidents after the fact will allow you to see where you could have improved the time to restore service. There may have been escalation challenges or vendor engagement delays, or it may simply have been an issue which required a high level of technical expertise. Gathering the resources who worked on the issue to review what could have been improved and what went well will help ensure that, in future critical incidents, your team can focus on the resolution rather than on internal process issues.
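To make "where could we have improved the time to restore service" concrete, here is a minimal sketch of how a review might break one incident down into phases. The event names and timestamps are made up for illustration; in practice they would come from whatever your incident record or conference call notes captured.

```python
from datetime import datetime

# Hypothetical timeline for one critical incident. The event names and
# timestamps are illustrative assumptions, not data from a real tool.
timeline = [
    ("detected", datetime(2017, 2, 4, 22, 0)),
    ("support_engaged", datetime(2017, 2, 4, 22, 45)),
    ("vendor_engaged", datetime(2017, 2, 5, 0, 30)),
    ("service_restored", datetime(2017, 2, 5, 2, 15)),
]


def phase_durations(events):
    """Return (from_event, to_event, minutes) for each consecutive pair."""
    ordered = sorted(events, key=lambda e: e[1])
    return [
        (a[0], b[0], int((b[1] - a[1]).total_seconds() // 60))
        for a, b in zip(ordered, ordered[1:])
    ]


def time_to_restore(events):
    """Minutes between the earliest and the latest event."""
    ordered = sorted(events, key=lambda e: e[1])
    return int((ordered[-1][1] - ordered[0][1]).total_seconds() // 60)


for start, end, minutes in phase_durations(timeline):
    print(f"{start} -> {end}: {minutes} min")
print(f"total time to restore: {time_to_restore(timeline)} min")
```

Laid out this way, the longest phases stand out and give the review group something specific to discuss, such as why it took as long as it did to get the vendor on the call.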
Change Reviews
A change was implemented last week and looked to have been successful. However, a few days later it turned out that there were some unforeseen side effects. These issues were corrected, but we need to ensure that a change like this, which caused some incidents, does not happen again. The first question we ask is: what went wrong? The change looked successful in our testing. The challenge may be that we missed a piece of testing, or that another change occurring in the same time frame inadvertently impacted the results of our change. Whatever the case, we need to make sure that we learn something from the failure so that it does not happen again. Depending on how you manage your knowledge, this finding may have a particular home or format. Whatever the case, the information should at the very least be in the change record itself.
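One of the possibilities above, another change landing in the same time frame, is easy to check for during a review. The following is a rough sketch only: the change IDs and implementation windows are hypothetical, and a real review would pull them from your change records.

```python
from datetime import datetime
from itertools import combinations

# Hypothetical change records: (change id, window start, window end).
changes = [
    ("CHG-A", datetime(2017, 1, 30, 20, 0), datetime(2017, 1, 30, 22, 0)),
    ("CHG-B", datetime(2017, 1, 30, 21, 0), datetime(2017, 1, 30, 23, 0)),
    ("CHG-C", datetime(2017, 1, 31, 20, 0), datetime(2017, 1, 31, 21, 0)),
]


def overlapping_changes(records):
    """Return pairs of change ids whose implementation windows overlap."""
    overlaps = []
    for (id_a, start_a, end_a), (id_b, start_b, end_b) in combinations(records, 2):
        # Two windows overlap when each one starts before the other ends.
        if start_a < end_b and start_b < end_a:
            overlaps.append((id_a, id_b))
    return overlaps


print(overlapping_changes(changes))
```

An overlap does not prove that one change affected the other, but it tells the review where to look first.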
Disaster Recovery
This may be more of a test than a review in a sense, but think about your organization. How is it set up to deal with a service-impacting disaster? Do you review your IT continuity plans at regular intervals? While it may not always be practical to test every situation and function, this exercise should at least identify gaps or areas of weakness, which should be understood and, where appropriate, corrected. Certainly no one wants a disaster to occur, but being prepared for what to do when one happens is invaluable. Some of the key items to address are:
  • Initiation – what to do and when to do it
  • What is involved with recovery of services, likely critical ones
  • How the disaster recovery is carried out and managed

How you choose to test your disaster recovery may vary in scope and frequency, but doing so will give you a level of comfort that, if something does happen, you will know how to keep services operating.

There is really no end to the types of reviews you can complete; the ones mentioned above are just a sample. The key is sharing the findings with the appropriate stakeholders so that the information can be leveraged and revisited again and again on the road to improvement.

Follow me on Twitter @ryanrogilvie or connect with me on LinkedIn