Monday, 8 July 2013

Event Management – Often Used yet Overlooked

Given how heavily organizations depend on technology, it is a safe assessment that infrastructure and applications are monitored in some capacity for performance. So why is it that this piece seems to sit “off to the side” when we are discussing Service Management?

Since Event Management is part of Service Operation, one would think that in a journey to provide exceptional service this process would get the same level of oversight that, let's say, Incident Management does. We should consider providing service to our customers from an "end to end" approach. This would include proactive monitoring to reduce the impact and volume of incidents.

In short, Event Management is monitoring your infrastructure or applications so that you are notified if something “bad” is happening or about to happen. It also gives you the opportunity to understand what 'normal' looks like and to generate a baseline to measure against.
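
As a rough illustration of what a baseline could look like in practice, here is a minimal Python sketch. The function names, the sample readings and the three-standard-deviation tolerance are all assumptions made for the example; they do not come from any particular monitoring tool.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarise historical readings as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def is_abnormal(value, baseline, tolerance=3.0):
    """Flag a reading more than `tolerance` standard deviations from the mean.

    The default of 3.0 is an arbitrary choice for this sketch.
    """
    average, spread = baseline
    return abs(value - average) > tolerance * spread


# Made-up load readings (percent), for illustration only.
history = [40, 42, 38, 41, 39, 43, 40, 41]
baseline = build_baseline(history)

print(is_abnormal(41, baseline))  # a reading close to the usual range
print(is_abnormal(85, baseline))  # a reading far outside the usual range
```

The point is only that "normal" becomes something you can measure against once it is written down as numbers rather than gut feel.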

One of the primary challenges you will face is that, because it is an activity done in the background, it may not get the visibility it deserves. Depending on the organization, monitoring may be used only by infrastructure teams to look for simple 'up and down' alerts.

Whether you are at the early stages of monitoring or at an advanced stage, you should be leveraging these alerts to actually improve the delivery of services. The alerts are only as good as the improvement initiatives we allow them to drive.

Here is an example:

Application X has the following infrastructure
What do we know about Application X?
John E User uses the application Monday to Friday from 10am to 4pm

Infrastructure Team
  • There are two application servers with a load balancer
  • The Infrastructure team monitors this 24x7, with a threshold in place that shows only whether an application server is up or down
  • Server OGILP02 has had some issues lately and has only had an uptime of 75% on average
  • The plan is to replace it, but since there is another server taking the load, the rush to beg for money isn’t quite there.

Application Team
  • The Application support team also monitors the servers at the application level
  • Server OGILP01 has shown some issues: when the load exceeds 80%, some users experience slow performance

None of these issues have been tracked in the form of an Incident or a Problem.
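
The two kinds of checks described above could be folded into one event classification rather than a bare up/down flag. The sketch below is only an illustration: the `classify` function and the 70% warning level are assumptions, the 80% level is taken from the OGILP01 observation, and the three labels follow the ITIL event types (informational, warning, exception).

```python
def classify(server_up, load_percent, warn_at=70.0, critical_at=80.0):
    """Map one monitoring reading to an ITIL-style event type.

    warn_at is an assumed early-warning level; critical_at mirrors the
    80% load at which users reported slowness in the example.
    """
    if not server_up:
        return "exception"
    if load_percent > critical_at:
        return "exception"
    if load_percent >= warn_at:
        return "warning"
    return "informational"


print(classify(True, 50.0))
print(classify(True, 75.0))
print(classify(True, 85.0))
print(classify(False, 10.0))
```

A warning level below the point where users feel pain is what gives the teams time to act before an incident is raised.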

From the perspective of the IT department, the service is up and running and is looking pretty solid. From the business perspective, aside from some intermittent slowness, the service seems pretty stable. For the most part life is good for John E User and the people at AnyCorp, but what they don’t realize is that something bad is about to happen.

Monday morning John rolls in to find that he cannot launch Application X. He calls his Service Desk, who assure him that they will take care of it. They escalate to the Infrastructure team, who indicate that an alert last night showed OGILP02 had once again fallen over and required a manual restart. The Service Analyst, who is sharp as a tack, also calls the Application Manager. Their discussion reveals that the load on the application has reached critical mass, and the application running on one server on a Monday morning is unusable. Both the Infrastructure and Application teams independently fix their issues and everything seems good... or does it?

What did we learn?

While the monitoring did tell us everything we needed to know in basic terms, we did not track it anywhere. By integrating these events with the Service Operation processes we could:
  • Address availability concerns by quantifying any known issues
  • Establish capacity targets
  • Investigate the root cause of the OGILP01 slowness
  • Gather solid statistics to raise capital for a more robust environment. Three servers, for example, would allow us to provide high availability.
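
To make "quantifying" concrete, here is a minimal Python sketch of what could be done once events are recorded somewhere instead of being fixed and forgotten. Everything in it is hypothetical: the `Event` record, the event kinds, the repeat threshold of three and the sample data are invented for illustration and are not taken from any real tool or from actual AnyCorp figures.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    server: str
    kind: str      # "down" or "high_load" (hypothetical event kinds)
    minutes: int   # how long the condition lasted


def uptime_percent(events, server, period_minutes):
    """Percentage of the period the server was not reported down."""
    down = sum(e.minutes for e in events
               if e.server == server and e.kind == "down")
    return 100.0 * (period_minutes - down) / period_minutes


def problem_candidates(events, min_repeats=3):
    """(server, kind) pairs that recur often enough to justify a Problem record."""
    counts = Counter((e.server, e.kind) for e in events)
    return sorted(key for key, n in counts.items() if n >= min_repeats)


# Invented sample data covering a 1000-minute window.
events = [
    Event("OGILP02", "down", 100),
    Event("OGILP02", "down", 50),
    Event("OGILP02", "down", 100),
    Event("OGILP01", "high_load", 30),
    Event("OGILP01", "high_load", 45),
    Event("OGILP01", "high_load", 20),
]

print(uptime_percent(events, "OGILP02", 1000))
print(problem_candidates(events))
```

Numbers like these are what turn "the server has had some issues lately" into a case for a Problem record or a funding request.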

The challenge, as always, is to market to our teams why we need to make sure that all activities are tied together, from incidents to problems to changes. When they are not tied together, it does not always mean that our service management tools are limiting us. In some cases we do not have governance over the process, or we have a lack of communication.

While there may be a little work in the beginning, you will save that effort later.

It boils down to proactively solving issues before they are apparent to the business (which, oddly enough, is why you have the monitoring in the first place). Implementing this may require small steps, but their successes will enable you to show other teams that Event Management has significant benefits.

Follow me on Twitter @ryanrogilvie or connect with me on LinkedIn


