Because service needs to be restored as quickly as possible, many of the support people outside of the actual incident become very hands-off, in an effort not to slow things down with too many hands trying to help. However, I started to think about this a bit further…
Just because you are not an incident manager doesn’t mean that you can’t help improve the process.
Think about that for a moment: everyone has some part to play in limiting how an incident impacts your business. Here's just a sample:
Service Desk
Depending on how your organization is set up, the service desk analysts may not be managing the incidents themselves. However, they are the face of IT to the business, so what they do during an incident is important. While they are likely capturing the escalations, this is also a good time to capture some knowledge about the impacted service. We may know that a capability is unavailable, but does IT truly understand the business impact? Gathering further information from the business allows us as IT to better understand that impact and improve communications. After the incident, details such as these are important in a post mortem, so that if we need to adjust our responses we can do so based on the impact the business is reporting.
IT Operations Manager
Infrastructure monitoring is something that is done ‘by operations for operations’ in many organizations, as we just haven’t tied it into service management for one reason or another. It would make sense to correlate these alerts into real-time incidents, so why isn’t this being done? Doing so would allow us to identify issues before the business sees the impact, but in reality the alerting mechanism is often set up as an afterthought to the incident process. If the Ops team streamlined what this looks like, they would be able to weed out the garbage alerts they currently get and, in the process, better track what their infrastructure is doing.
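To make the idea of correlating alerts into incidents a little more concrete, here is a minimal sketch in Python. It is not taken from any particular monitoring or service management tool. The `Alert` and `Incident` classes, the `NOISE_SEVERITIES` set and the ten-minute `WINDOW` are all assumptions made up for illustration, and a real setup would use whatever your own tooling provides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    """A hypothetical monitoring alert (field names are illustrative)."""
    service: str
    severity: str  # assumed values: "info", "warning", "critical"
    message: str
    raised_at: datetime


@dataclass
class Incident:
    """A hypothetical incident record that groups related alerts."""
    service: str
    opened_at: datetime
    alerts: list = field(default_factory=list)


# Assumption: alerts at these severities are the 'garbage' we weed out.
NOISE_SEVERITIES = {"info"}
# Assumption: alerts for the same service within this gap belong together.
WINDOW = timedelta(minutes=10)


def correlate(alerts):
    """Group non-noise alerts into incidents, per service and time window."""
    incidents = []
    latest_by_service = {}
    for alert in sorted(alerts, key=lambda a: a.raised_at):
        if alert.severity in NOISE_SEVERITIES:
            continue  # drop noisy alerts before they reach the incident process
        current = latest_by_service.get(alert.service)
        # Open a new incident if the service has none yet, or if the last
        # alert attached to its incident is older than the window.
        if current is None or alert.raised_at - current.alerts[-1].raised_at > WINDOW:
            current = Incident(alert.service, alert.raised_at)
            incidents.append(current)
            latest_by_service[alert.service] = current
        current.alerts.append(alert)
    return incidents
```

Grouping by service and time is only one possible rule. The point is that once the noise is filtered out, what remains can be handed to the incident process as a small number of meaningful records rather than a stream of individual alerts.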
IT Application Manager
We have all been involved in an incident that was escalated to networks because we all know that ‘this must be a networks issue’. One of the many challenges with application-level incidents is that the symptoms could point to many things. From an application management perspective, having a solid knowledge repository of issues allows the incident manager, or even the service desk, to ask better questions when an issue occurs. Rather than just saying that users are not able to see module x in the application, they would be able to look up previous issues and see that when a problem with module x arises, you need to check the following three items to better determine the cause. Remember, knowledge is power. When we review the incident at the post mortem, ensure someone from the IT applications team is invited, even if this wasn’t an application issue. They will get a sense of the issue, and they may have better insight into the service we provide as a whole and any areas with potential weaknesses. Getting input from various angles is important in order to improve.
Everyone plays a part in incident management, big or small, from dealing with escalations to event management and improving communications. Start to think about what you can do, not only to improve your incident management process, but also your overall delivery of services.
Follow me on Twitter @ryanrogilvie or connect with me on LinkedIn