Monday, 31 March 2014

DevOps – Breaking Down the Wall of Confusion

Why does it seem that the volume of requirements from the business is increasing and our ability to deliver is not matching up to expectations? Is there something from an IT perspective that we are doing wrong?

More often than not, the needs of our business are changing at a rapid rate, while our ability to scale to meet these demands has not kept pace over the same period. Taking a closer look at the processes we currently have in place, we might consider them appropriate for delivering exceptional service. However, while the business has changed over time, IT has not adapted the way it manages business requests. The execution of processes is not as streamlined as it could be and, as a result, produces outcomes which from the business perspective are less than satisfactory. When Development and Operations teams take a closer look at the overall delivery, they may be quick to blame the other team for any bottlenecks. This “Us and Them” mentality is part of the problem.

The DevOps methodology looks to improve upon this. In essence DevOps attempts to enable development and operations teams to improve communication and collaboration with each other for the betterment of service delivery through the breaking down of silos where they may exist.

Typically, organizations which are not leveraging DevOps have challenges with work being developed and deployed in relative isolation. The common complaint is that once a piece of development is ready for production, operations raises issues with it, saying, “this can’t work in a production environment in this manner, this needs to be fixed”. Development then takes it back for rework, which in turn adds time to the project. This added time and rework comes at a price, and not only a financial one: our business sees this as another delayed project from IT.

This is not strictly a development problem or accountability. The Operations teams need to ensure that they are at the table at the beginning to make sure that the requirements they have for operational delivery will be reviewed long before the move to production. They are just as accountable for the timely implementation of this project as well. This is after all an IT solution we are delivering so working together as an IT team is pretty important.

If you do any reading on the subject, you will see that DevOps can improve delivery efficiency through faster releases with fewer errors. However, implementing it isn’t as simple as all that. Culturally, you need to be in a position to manage this effectively. Delivering releases continuously in an agile way may be easier for a smaller organization than for an enterprise. You may have heard the term “Two Pizza Teams”, which refers to teams small enough to be fed with two pizzas.

Start to think about what makes sense for your organization from the perspectives of both your IT teams and your business needs. Take away the components of this methodology that work for you and use them to help improve your service delivery. At the very least, improving the communication and collaboration within your IT teams will net some positive results. In starting these discussions you may identify the common bottlenecks which keep you from delivering new services on time. These bottlenecks might have been obvious to operations and development separately, but until they are discussed they are not going to change. Use your findings as a base for future improvements.

Follow me on Twitter @ryanrogilvie

Monday, 24 March 2014

Incident Management, Executive Gargoyles and Communication

A critical business service is unavailable… you, the Incident Manager are coordinating the fix with several IT support resources including applications and infrastructure teams. Two things are immediately obvious:

·         You (IT) need to restore service as quickly as possible.
·         You (The Incident Manager) need to communicate with your business on the progress of the restoration.

Depending on your experiences from this type of scenario you may recall having people from an executive level who like to “get in there” to help out. At times they have the appearance of a gargoyle, complete with frightening face, around the corners of your desk. From your perspective they do not appear to be providing any value or assistance to restoring service.

Why is this happening?
Don’t forget, your IT executives are accountable to your business as well. They are likely responding to a flurry of enquiries regarding the nature of the issue and how long it will take to restore service.

There are a number of reasons why they might choose to take this approach, the most likely being a communication breakdown. It’s possible that, as a result of other issues, recent or otherwise, they lack confidence in the support team's ability to communicate current progress effectively. Notice that we are talking about communication of the issue, not the actual restoration. The executives are regularly being quizzed on the progress of restoration by their superiors and the business. If they are lacking in details they may look uninformed, which undermines confidence that the situation is under control, even though it may well be.

Take the incident manager, for example. Those who have had to coordinate resources for a fix of any kind will refer to the herding of cats; trying to get a status update, however, may be like trying to bathe the cats as well. IT support resources don’t like to say “we are still working on it”; they would rather have some tangible solution in the works each time they are asked. However, this might not always be the case. Incident managers may, from time to time, need to be wordsmiths. Instead of telling your executive that the issue is “not fixed yet”, try saying "we have tried and ruled out 3 possible issues which have happened in the past as part of our troubleshooting. Our plan is to review and attempt the next possibility which has been supplied by our vendor support resources." In other words - not fixed yet.

Within your Incident process you should have a communications plan covering both the business and your leadership. For example, for all critical outages you might send one communication to all IT stakeholders (such as managers and senior leaders) indicating what progress you are or are not making. This communication may be more technical in nature, while a second, more service-specific communication is delivered to your business. These updates should be sent on a regular schedule, perhaps every hour or so.
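
As a rough illustration of such a plan, the sketch below lists when the next updates are due and what each audience should hear. The audience names, the wording of each focus and the three-update horizon are assumptions made for the example; only the hourly cadence comes from the suggestion above.

```python
from datetime import datetime, timedelta

# Hypothetical audiences and what each update should focus on.
AUDIENCES = {
    "IT stakeholders": "technical detail: what has been ruled out, next step",
    "Business": "service impact: what is affected, when the next update is due",
}

def update_schedule(start, interval_minutes=60, count=3):
    """Return the times at which the next `count` updates are due."""
    step = timedelta(minutes=interval_minutes)
    return [start + step * i for i in range(1, count + 1)]

def planned_updates(start, interval_minutes=60, count=3):
    """Pair every scheduled time with each audience and its focus."""
    return [
        (due, audience, focus)
        for due in update_schedule(start, interval_minutes, count)
        for audience, focus in AUDIENCES.items()
    ]

if __name__ == "__main__":
    outage_start = datetime(2014, 3, 24, 9, 0)  # example start time
    for due, audience, focus in planned_updates(outage_start):
        print(f"{due:%H:%M}  {audience}: {focus}")
```

Even a simple list like this gives the incident manager something to point to when asked when the next update is coming.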

If you can position yourself in front of the questions from your senior leadership, they will be able to do the same for their stakeholders. Having your leadership actively engaging you and your teams shouldn’t be a bad thing; they are likely looking to support you and IT in getting things fixed. This relationship just needs to be managed to some level. Once you have a consistent process for informing your stakeholders, your leadership can build confidence in the updates you are providing. After the issue is over, ensure you wrap up the incident with a post incident review. Make sure it is complete with findings from the incident, whether technical or not, so that improvements can be made for the next time an issue arises.

Keeping people informed is an important part of the incident process, so ensure you keep your communications skills sharpened. 

Feel free to connect with me on Twitter @ryanrogilvie and/or on LinkedIn



Monday, 17 March 2014

Peeling Back the Reporting Onion

There is a question that gets asked over and over again. “With so much information available, what should I report on?” More often than not we find ourselves reporting on more than we really need to in the absence of knowing what is really required. But the real answer is that we need to report on what makes our business outcomes a reality.

This is why I refer to reporting in terms of an onion. Sometimes peeling it back will make you cry, well maybe only figuratively, but sometimes it might feel this way.

For a moment let’s call the outer layer of the onion the customer experience. This is the service(s) that the business consumes each day. Overall we need to report on our ability to provide an exceptional customer experience.  This might be where some of the crying starts, but this is only because we tend to over complicate things.

Start small
What does the business see in terms of the service that is provided? The key question: is it available? If not, how often are outages occurring and for how long? You can already see how the scope starts to grow, which is why reporting gets out of control. One of the most important questions you should ask is whether the service enables the business to achieve its outcomes. Your business has a purpose, whatever it might be, and we need to make sure from an IT perspective that we are enabling them to achieve whatever that goal is.

Just as the onion has several layers, so too does the reporting available to us. Assuming that we want to report on service availability, we need to know what is making that happen or, when it isn’t available, what the cause of the downtime is.

Start small... yes again
First, IT needs to know what makes your services work in the first place. What are the ‘things’ that make up your service? Some of you might be thinking to yourselves, we have a "CMDB", while others may have less formalized configuration management in place, with an understanding that there are several components supported by several teams. Whether you have a formalized system or a spreadsheet, or even if it is on cocktail napkins, gather together what you know about the service and start to report on what makes it function. Understand the service.

This is where we need to ensure that we have some alignment on the goals to enable your business outcomes. Remember, this is not a finger pointing exercise. In one report we may identify application issues and in the next reporting cycle there may be a network outage. Working together we can ensure that we are putting our best foot forward to provide great service by correcting issues accordingly.

Initially we should be able to identify any gaps which are holding us back. Some of these may present themselves as issues with the service, while others may be issues with our ability to report. Either way, this information will allow us to make improvements each time we collect data and report our findings.

Overall we may see from our IT layers of reporting that the service is operating consistently, but if our customers indicate otherwise we need to investigate where reporting may be inconsistent. For example, we report to our business that the service was up 100% last month; they may challenge that, indicating that there were 2 relatively small but impactful outages that may not have been escalated. While there is an expected feeling of embarrassment that we missed something, put that aside and welcome this information.

Identifying this inaccuracy allows us to look for a hidden gap in our reporting. Whether the gap is in our escalation process, in training, or in IT monitoring or reporting that is incorrect, we can still learn from this and make improvements which will move us closer to our goal of providing a great customer experience.
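
To make the availability example concrete, here is a minimal sketch of the arithmetic. The 30-day reporting period and the two outage durations are made-up figures for illustration; only the situation of IT reporting 100% while the business experienced 2 outages comes from the example above.

```python
def availability_percent(period_minutes, outage_minutes):
    """Availability as a percentage of the reporting period."""
    downtime = sum(outage_minutes)
    if downtime > period_minutes:
        raise ValueError("downtime exceeds the reporting period")
    return 100.0 * (period_minutes - downtime) / period_minutes

PERIOD = 30 * 24 * 60  # assumed 30-day month, in minutes

# Outages IT monitoring recorded versus those the business reports.
# Durations (in minutes) are hypothetical.
monitored_outages = []
business_reported_outages = [20, 45]

reported = availability_percent(PERIOD, monitored_outages)
experienced = availability_percent(PERIOD, business_reported_outages)
gap_minutes = sum(business_reported_outages) - sum(monitored_outages)

if __name__ == "__main__":
    print(f"Reported by IT:       {reported:.2f}%")
    print(f"Experienced by users: {experienced:.2f}%")
    print(f"Unrecorded downtime:  {gap_minutes} minutes")
```

The percentages differ only slightly, which is part of the point: the unrecorded minutes are what the business remembers, so the gap is worth reporting in its own right.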

Follow me on Twitter @ryanrogilvie or connect with me on LinkedIn


Tuesday, 11 March 2014

Process Automation – Enter the Numbers and Push the Button

I recently heard that a managed process was referred to as a “push button” activity. When I asked my colleague about this she said that the process pretty much handled itself.  I would say that I have more questions on this subject than answers and would start with this question – When can automation take over and are we in a position to have the process managed by itself?

Let’s assume for a moment that we have achieved a high level of capability for a particular process. For an example we will use… change management.

Taking a closer look, we can see that our example IT organization has been using this process for almost a decade. The stakeholders who leverage it take part in a semi-annual Continual Service Improvement (CSI) review to see where it can be streamlined further. At the latest stakeholder meeting the question was asked: “Do we need to have people to facilitate this process in a day to day operational way any longer?”

There are always going to be pros and cons for this discussion point, depending on which side of the table you ask. The real question is “Can you mitigate the risks that present themselves as a result of the automation activity?” It doesn’t only apply to this example, but in general.

Back at the stakeholder review, the subject of who is on the hook for change management comes up, along with whether automation is even possible from a governance standpoint. Looking at their RACI chart, they see that someone is responsible and / or accountable for the process. Can they be either if they do not actively take part in the day to day management of this process?

Take a closer look at the general definitions for responsible and accountable:
     - Responsible: having an obligation to do something, or having control over or care for someone, as part of one's job or role.
     - Accountable: (of a person, organization, or institution) required or expected to justify actions or decisions.

… so could it be possible that a person(s) could be either and not actively “do” anything?

Going back to change management: if we automated the daily activities for changes in a ticketing system so that approvals were only requested once the fields were filled in, for example, would we still need someone to review the documentation within the record itself? This is where the challenge lies with PEOPLE, remember them... one of those "P's". Since not all people are created equal, we might find that in the beginning there were very few issues with automating this. However, after a while some people might take the route where less is more, and over time there is less documentation in the change record. It can be a bit of a tightrope. We might need some level of oversight on what goes in the record, which means we might not truly be able to automate this level of work. At the end of the day we still have someone who is responsible for the output of the change. If it is not successful, who will we look to for answers? It is likely that the ones responsible for the process would not see the automation as a way to improve anything with regards to the delivery of services to the customer. Since providing an exceptional customer experience is the top priority, the automation might not be worth pursuing.
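
One way to picture the tightrope described above is a rule that only requests approval automatically when the record is complete, and hands thin records back to a person. This is a sketch under stated assumptions rather than how any particular ticketing system works: the field names, the word-count threshold and the three outcomes are all hypothetical.

```python
# Hypothetical required fields for a change record.
REQUIRED_FIELDS = ("description", "implementation_plan", "backout_plan", "test_evidence")
MIN_WORDS = 5  # arbitrary threshold for "too thin to approve unseen"

def route_change(record):
    """Return 'reject', 'manual_review' or 'request_approval' for a change record."""
    # Fields that are absent, None, or only whitespace count as missing.
    missing = [f for f in REQUIRED_FIELDS if not (record.get(f) or "").strip()]
    if missing:
        return "reject"
    # Filled in, but with so little text that a person should take a look.
    thin = [f for f in REQUIRED_FIELDS if len(record[f].split()) < MIN_WORDS]
    if thin:
        return "manual_review"
    return "request_approval"

if __name__ == "__main__":
    minimal = {field: "n/a" for field in REQUIRED_FIELDS}
    print(route_change(minimal))  # every field is filled in, but only just
```

A record with every field set to "n/a" passes a simple filled-in check, which is exactly the less-is-more drift described above. The word count is a crude stand-in for the judgement a reviewer brings, and it is that judgement which keeps a person in the loop.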
