Problem Management
eBook - ePub

Problem Management

An implementation guide for the real world

Michael G. Hall

190 pages, English, ePUB

About This Book

Problem management is the one IT service management process that tends to return more benefits more quickly than any of the others. This book offers practical, real-world guidance on all aspects of implementing and running an effective problem management function. Offering advice and recommendations tailored to different types of organisations, it gives IT practitioners, consultants and managers the tools to add real value to their businesses.

SECTION 1 – INTRODUCING PROBLEM MANAGEMENT
The title has a deliberate double meaning and the chapters included in this section have two separate but related aims:
  • Firstly, to introduce problem management to you and confirm your understanding of what it is and what it does, so that you can persuasively explain it to others.
  • Secondly, to outline a practical business case for introducing problem management into your organisation or area.
Of course, how you build or modify the business case depends on your particular situation and organisation. For example:
  • There might be a lack of transparency. Customers and management are worried about their risk exposure to incidents that have affected them. They don’t know what is going on, whether problems are being investigated or being fixed, or whether and when an incident will recur, with more impact on their business processes.
  • There might be a concern about the lack of effective problem-solving techniques. Support teams are not equipped with the skills and tools to solve the problems they face; there is a lot of variation in the skill levels of staff; and no well-understood, consistent and repeatable process is in place to manage problems as they arise.
  • The stability of the IT services offered might be poor or even getting worse. If it is improving, it is not doing so quickly enough to satisfy management, customers or the IT service teams themselves.
  • Root cause investigation might be working well; however, it might be challenging to get the fixes implemented that are needed to resolve the causes.
  • Outsourced, vendor-managed or cloud-sourced service delivery models make it difficult to resolve problems.
  • Or all of the above in some combination.
It is important to know from where you are starting. A careful assessment will help you decide what the most important issues are and what needs to be addressed first. This is covered in Chapter 2, as well as in Chapter 5.
1 WHAT IS PROBLEM MANAGEMENT?
So what is problem management and what does it aim to achieve? And how is it different from incident management?
OBJECTIVES
This is what ITIL Service Operation (Cabinet Office, 2011, page 97) says problem management is all about:
  • Prevent problems and resulting incidents from happening.
  • Eliminate recurring incidents.
  • Minimise the impact of incidents that cannot be prevented.
In other words, problem management is about improving stability across platforms. There is quite a lot of excellent guidance in the problem management section of the 2011 edition of the ITIL Service Operation book. The chapter is only about 14 pages long and well worth reading in detail. The following short section draws on some of this material, as well as on practical experience and other sources.
SCOPE
ITIL distinguishes between reactive and proactive problem management. This is a debatable distinction because the only real difference between the two types of problem activities is in how the problem is detected. The distinction causes extensive discussion in the problem management community of practice. See Chapter 11 for more on this topic.
Reactive problem management is where the problem management process flows from an incident that has occurred (Figure 1.1 – I am indebted to Chris Finden-Browne, a Distinguished Engineer at IBM UK, who produced the original version of this diagram).
Once service has been restored, at least partially, problem management follows on from incident management to investigate the root cause and implement a fix to prevent the incident happening again.
Proactive problem management is where the emphasis is on identifying and resolving problems before they cause incidents (Figure 1.2).
Figure 1.1 Reactive problem management
Figure 1.2 Proactive problem management
Proactive problem management uses data such as patterns and trends, monitoring, knowledge, the outputs of other processes and intuition to find potential problems. Chapter 11 provides in-depth coverage of how this analysis can be approached.
Naturally, it is better to find a problem and fix it proactively before it causes an incident; however, this is not always possible, so effective investigation and correction will usually be required after an incident as well.
Once a problem has been identified (‘detected’ in the ITIL terminology), the process is the same for both reactive and proactive problems:
  • Diagnose the real cause of the actual or potential problem. Effective problem management requires methods for finding the true or root cause and showing ‘beyond a reasonable level of doubt’ that it causes the problem and is not just a symptom or an apparent cause.
  • Determine the resolution required.
  • Ensure that the resolution is implemented completely. That is, a fix needs to be developed and applied that can then be tested in some way to prove that it has eliminated the problem.
  • Maintain information, throughout the problem life cycle, in a format that is useful for the future and readily accessible, including:
    • the root cause;
    • any workarounds that were used to recover from further incidents while the fix was being developed and applied;
    • the resolution that was implemented;
    • any lessons learned and so on.
Chapter 6 contains a section that goes into the details of the relationship between incident, problem and knowledge management and how the known error records this information and makes it available. Section 3 goes through the details of how to achieve these objectives.
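As a rough illustration of the information listed above, a problem record could be sketched as a small data structure. This is only a sketch under stated assumptions: the class name, field names and the closure rule are hypothetical and are not taken from ITIL or from any particular service management tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProblemRecord:
    """Hypothetical problem record; all names here are illustrative."""

    problem_id: str
    detected: str  # "reactive" (follows an incident) or "proactive"
    root_cause: Optional[str] = None
    workarounds: List[str] = field(default_factory=list)
    resolution: Optional[str] = None
    resolution_verified: bool = False  # fix tested and shown to work
    lessons_learned: List[str] = field(default_factory=list)

    def can_close(self) -> bool:
        # Assumed rule: close only when the cause is known, a fix has
        # been applied, and the fix has been tested in some way.
        return (
            self.root_cause is not None
            and self.resolution is not None
            and self.resolution_verified
        )


# Example with made-up data
record = ProblemRecord(problem_id="PRB-001", detected="reactive")
record.workarounds.append("Remove and re-add the printers")
record.root_cause = "Outdated printer driver on the laptop"
print(record.can_close())  # False: no verified resolution yet
```

The same structure serves reactive and proactive problems, since only the detection route differs between them.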
The debate about proactive versus reactive problem management
Surprisingly, the question of what constitutes reactive and proactive problem management is the subject of much debate, even to the point of asking the question ‘Is there any such thing as proactive problem management?’ The ITIL Service Operation book (Cabinet Office, 2011, pp.97–99) differentiates between the two categories of problems, but it is not very definitive.
The labels themselves are useful, but debate about the meaning of the two terms is not. It does not matter whether a problem is raised directly to investigate the cause of one or more incidents or is found in some other way. Therefore, in this book, I take the following pragmatic approach:
There is more to problem management than just reacting to incidents as they occur. It is more effective to look at the many other ways that possible problems in the environment can be discovered and try to do something about them before they cause incidents.
PROBLEM MANAGEMENT IS DIFFERENT FROM INCIDENT MANAGEMENT
When introducing problem management, the difference between it and incident management can often be the most difficult concept to get across to staff and management. Getting people to understand that incident and problem management have different objectives and benefits is a critical success factor for any implementation.
Although incident and problem management aim to achieve very different outcomes, the words ‘incident’, ‘problem’, ‘outage’ and similar terms are often used interchangeably by customers, as well as by IT staff. It makes sense to distinguish between an incident – a full or partial failure of a service being used by a customer – and a problem – the underlying cause of an incident, which must be found and eliminated to prevent more occurrences of that incident. ITIL introduced the two terms ‘incident’ and ‘problem’ to make this distinction. The aims of incident management and problem management are different.
  • Incident management has a primary focus on restoring service rapidly to minimise the down time and business impact of incidents. It is justifiably reactive by nature and does not (or should not) focus on underlying causes. Although excellent incident management can reduce business impact by reducing down time, by itself it cannot reduce the number of service interruptions in the long term, because it does not focus on finding causes and eliminating them. The emphasis is very firmly on what is wrong and putting it right, not on why it happened – it is about finding the immediate cause of the service interruption and removing it, so that services start working again. This immediate cause can also be referred to as the indicative, proximate or technical cause. Finding the root cause (why the immediate cause happened) is not the objective, although it is at times required to restore service.
  • Problem management investigates to find the real cause, usually referred to as the root cause. Once the cause is established, problem management then makes sure that the problem is fixed completely, so that it cannot happen again. To reiterate, the aim is to:
    • establish the real reason for either an incident that has occurred, or a risk or situation that has potential to cause an incident; and then to
    • execute a plan to fix the cause permanently.
An example of how the approaches differ is a recent episode where I had a strange printer issue. For an unknown reason, the print dialogue box looked different from usual and printing did not work. The service desk referred the incident to the desktop support team, who solved the incident by deleting all the installed printers from the laptop and adding them back, which forced them to pick up freshly installed printer drivers from the server. This resolved the incident and printing now worked. At this point, it was job done, case closed.
Although it was an example of good incident management, it was an unsatisfying experience from a problem manager’s point of view. Questions such as why it happened, whether it would happen again and if others were affected were all left unanswered.
Incident and problem management are complementary and work closely together to achieve the desired outcome of reducing the number and impact of incidents. Incident management reduces impact by applying a structured and organised approach to restoring service as quickly as possible. The objective is to minimise the business impact of an incident that has actually happened. Problem management reduces the overall number of incidents through finding and fixing problems before they cause future incidents, as well as stopping repeat incidents by fixing problems after they have triggered incidents.
In practice, incident management should hand off smoothly to problem management at some point. Unfortunately, it is common for the line between the two to blur – it is not always clear when incident management stops and problem management starts. In fact, the relationship between incident management and problem management is one of the most important to discuss, make clear and formalise as part of the problem management implementation project. The best organisations put a lot of effort into getting this right and I discuss how best to achieve it in Chapter 11.
Now that we have clarified what problem management is about, let us move on to look at some of the factors that are critical to a successful implementation.
2 FACTORS FOR SUCCESS
Personal experience has shown that several factors make the difference between a good implementation and a less successful one. I want to highlight:
  • some common challenges every implementation faces;
  • winning the support of management;
  • the importance of training;
  • stakeholder management and communications; and
  • the decisions and agreements that need to be made before you can propose the implementation of problem management.
CHALLENGES
Starting point
It might sound obvious to say this, but it is important to know where you are starting from, so that you know how big a change implementing problem management is going to be. As mentioned in the introduction to this section, different organisations will have different priorities, attitudes and ‘maturity’. Comparing perception with reality can be revealing, particularly if there are measurements available to define the current state. An obvious measurement is the number of incidents and their impact, duration and whether the numbers are constant or trending up or down. What does this measure say about stability and the quality of the current IT service? That is, is the situation getting better or worse, or staying the same? What about the number of recurring incidents (incidents that have the same or similar underlying causes, or are perceived to be the same or of the same type)? In addition, are there any unresolved problems in the environment (perceived or real), and if so, how many?
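The baseline measurements described above can be sketched in a few lines of code. This is a minimal sketch, assuming incidents can be exported as (month, suspected cause) pairs. The function names, the sample data and the deliberately crude trend rule (comparing the last month with the first) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Incident = Tuple[str, str]  # (month as "YYYY-MM", suspected cause label)


def monthly_counts(incidents: List[Incident]) -> Dict[str, int]:
    """Number of incidents per month, in chronological order."""
    counts = Counter(month for month, _ in incidents)
    return dict(sorted(counts.items()))


def trend(counts: Dict[str, int]) -> str:
    """Crude trend: compare the last month with the first month."""
    values = list(counts.values())
    if len(values) < 2:
        return "unknown"
    if values[-1] > values[0]:
        return "worsening"
    if values[-1] < values[0]:
        return "improving"
    return "constant"


def recurring_causes(incidents: List[Incident]) -> Dict[str, int]:
    """Causes seen more than once, with their incident counts."""
    by_cause = Counter(cause for _, cause in incidents)
    return {cause: n for cause, n in by_cause.items() if n > 1}


# Made-up sample data, for illustration only
sample = [
    ("2024-01", "disk full"),
    ("2024-01", "expired certificate"),
    ("2024-02", "disk full"),
    ("2024-03", "disk full"),
    ("2024-03", "failed deployment"),
    ("2024-03", "expired certificate"),
]

counts = monthly_counts(sample)
print(counts)
print(trend(counts))
print(recurring_causes(sample))
```

In practice the cause labels are often perceived rather than confirmed, so the recurring-incident count is better read as a starting point for discussion than as a precise figure.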
Assessing the ...
