It turned out that CAB attendance was better on Mondays, because more people were available, than on Wednesdays, when the CAB used to be held. This got me thinking: what does the cycle look like from the implementation of a change, to an incident (in some unfortunate cases), to the resolution of that incident through another change or a rollback?
Looking at this example more closely, a Monday CAB allows changes to be implemented (for good or bad) as early as 24 hours later, in this case on Tuesday. When these changes cause issues, we start to see them on Wednesday, which coincides with the influx of incidents. As a result, depending on the circumstances, some changes are rolled back to their pre-change state, while others require an emergency change to correct the issues.
To me the issue seems obvious. However, I often find that the obvious is only seen by those looking at it with fresh eyes. Those who are waist deep in the situation may not see it as clearly as someone external, or, if they can see the issue, they cannot visualize the solution as clearly. When I asked my friend about it, they said, “I think we have always had incidents as a result of change, but since the CAB date change we don’t get as many of them on Mondays like we used to, so it seems as though it has improved.” They went on to say that, in reality, the incidents are now just spread more evenly across the week, which is why the issue isn’t pressed as hard from an IT perspective.
Wait a second… I thought they were putting me on, but they weren’t. I had to point out that they were simply moving the issues from one day to another without addressing the main problem: changes were failing. Even if the issue isn’t that “pressing”, what ultimately matters is the delivery of services. If you were to ask customers about these issues, they might say that the service has not improved one bit over time, and they would be correct.
As you can see in the picture below, from a customer experience perspective the issues are still present regardless of which day the changes are implemented. If you were simply to shuffle the days along the bottom axis, the humps representing changes and incidents would remain.
“Why does no one see that this is an issue?” I asked.
The response was that they report on service management metrics (incident and change) separately and do not connect the dots between the two.
“That has to change,” I insisted. As a start, I would look at the following:
1. The number of changes that cause incidents and/or are rolled back. Where changes are not updated to reflect any issues and we see a high rate of implementation success (which is likely if the two teams are not working collaboratively), we need to look at how many emergency changes we are creating and why. Even where change is measured in a silo, we should be able to see that a change led to an emergency change.
2. We may need to look at the time between a change being reviewed in CAB and its implementation in production. It may be that the changes that fail are also the ones being implemented 24 hours after CAB. Knowing which changes affect the business negatively will allow us as an IT organization to assess what is not working as well as it could be and to build a strategy to improve it.
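The two measures above can be sketched in a few lines of Python, assuming change records can be exported with a CAB review time, an implementation time, a rollback flag, and links to any incidents or emergency changes they caused. The `Change` class, its field names, the definition of a "failed" change, the 48-hour threshold, and the sample records are all hypothetical and chosen for illustration; real ITSM tools store and link these differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


# Hypothetical change record; field names are illustrative, not from any ITSM tool.
@dataclass
class Change:
    change_id: str
    cab_reviewed: datetime
    implemented: datetime
    rolled_back: bool = False
    linked_incidents: list[str] = field(default_factory=list)
    fixed_by_emergency_change: str | None = None

    @property
    def failed(self) -> bool:
        # Assumption: a change "failed" if it was rolled back, caused an
        # incident, or needed an emergency change to correct it.
        return (self.rolled_back
                or bool(self.linked_incidents)
                or self.fixed_by_emergency_change is not None)

    @property
    def hours_after_cab(self) -> float:
        # Time between CAB review and implementation in production.
        return (self.implemented - self.cab_reviewed).total_seconds() / 3600


def failure_rate(changes: list[Change]) -> float:
    """Measure 1: share of changes that failed."""
    if not changes:
        return 0.0
    return sum(c.failed for c in changes) / len(changes)


def failure_rate_by_lead_time(changes: list[Change],
                              threshold_hours: float = 48) -> dict[str, float]:
    """Measure 2: failure rate for changes implemented soon after CAB vs later."""
    fast = [c for c in changes if c.hours_after_cab <= threshold_hours]
    slow = [c for c in changes if c.hours_after_cab > threshold_hours]
    return {"within_threshold": failure_rate(fast),
            "after_threshold": failure_rate(slow)}


# Illustrative sample data only (2024-01-01 was a Monday CAB).
cab = datetime(2024, 1, 1, 10)
changes = [
    Change("CHG1", cab, datetime(2024, 1, 2, 10), linked_incidents=["INC7"]),
    Change("CHG2", cab, datetime(2024, 1, 2, 11), fixed_by_emergency_change="CHG9"),
    Change("CHG3", cab, datetime(2024, 1, 5, 10)),
    Change("CHG4", cab, datetime(2024, 1, 4, 10), rolled_back=True),
]

print(failure_rate(changes))               # 0.75
print(failure_rate_by_lead_time(changes))  # {'within_threshold': 1.0, 'after_threshold': 0.5}
```

The point of linking incidents and emergency changes back to the originating change is that the failure shows up in the change metrics even if the change record itself was closed as "successful". The threshold is a tunable guess; comparing a few values against real data would show whether changes implemented shortly after CAB really do fail more often.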
Whatever the cause, we want to ensure that the customer is getting a good experience and that our metrics review the provision of that service as a whole, challenging the inputs and outputs of each process as they pertain to delivering service. Getting these teams together regularly with key stakeholders will give visibility into the areas that need improvement so that we can provide a solid customer experience.
Follow me on Twitter @ryanrogilvie or connect with me on LinkedIn