Monday, 7 December 2015

Blame Free Post Mortems

In Latin, mortem means "death" and post means "after"; in other words, a post mortem is something that happens after death.

This morose definition lends itself to a less than proactive view of a critical improvement activity. How the review is managed, however, determines whether it improves service through transparency or leaves people holding back out of fear.

Recently, and this is part of the reason I am sharing this post, I ran into a friend who was in the middle of his holiday. After exchanging the usual questions and answers, we ended up on the topic of work, as people tend to do. He was thrilled to be off work this week, since his colleagues were going over a post mortem for a recent outage his company had with one of its key applications.

He went on to tell me that very few things are dreaded as much as the post mortem. People hate them so much that, even during the outage, they are already thinking about which actions will get them into hot water during the review. Imagine that!

He said that the post mortem at his company is lovingly referred to as “the blame game”. To add insult to injury, he said that the team he is on, infrastructure, typically gets the lion’s share of the blame, since they aren’t as “inventive” with their explanations for issues and so cannot conclusively rule out that they are responsible for the issue to some degree.

This should clearly not be the intention of a post mortem.

By their very nature, post mortems should be an exercise in understanding, sharing and learning.

These principles should be applied early on. Everyone who took part in restoring the service(s), as well as anyone who has a vested interest in the service(s), should be invited to the review meeting and should receive the document that outlines all the findings and outcomes. We need to foster transparency, and our culture should allow us to be open enough to see where we can make improvements without worrying about who to blame.

After all, the issue happened and was fixed. That was the hard part; now we need to learn enough from this exercise that we don’t repeat the same mistakes if we can avoid them.

Digging deep into the timeline will allow us to clearly see what actions we took, and why, at intervals throughout the issue. After all, we may have experienced many different symptoms that led us to make particular assessments which seemed appropriate at the time but, in hindsight, might not have been.

Personally, I avoid the phrase ‘post mortem’ whenever I can and replace it with ‘incident review’. Holding these reviews is a step in the right direction, but if you are not fostering a culture of collaboration and transparency, you risk some details being suppressed for fear of some form of punitive action.

Key components of a blame free incident review:
  • People involved during the issue
  • What contributing factors came into play during the issue
  • What was the impact of the issue
  • What did we learn as a result of this issue

Keep these components in mind and take small actions as a result of the discussion you have after the incident, and you will be setting your team up to make improvements on these issues rather than pointing fingers.
 
If you like this article, please take a few minutes to share it on social media or leave a comment.

Follow me on Twitter @ryanrogilvie or connect with me on LinkedIn

 
