The term mission critical applies to any activity, system, or equipment whose failure can result in the disruption of an organization’s operations. Depending on the organization involved, the consequences of failure can be very wide-ranging.
At one extreme, the failure of an online vendor’s website can result in a loss of sales. While this could be disastrous to the business concerned, the impact is limited in scope and recovery may not be difficult. Most of us have experienced problems accessing Amazon, Facebook, or Twitter, for instance. While these outages may make the news and can result in a significant financial impact for the business concerned, service is normally restored promptly and there are few, if any, lasting consequences.
At the other extreme, the failure of control systems in a petrochemical operation could result in injury or loss of life among personnel and the public, as well as harm to the environment from which recovery may be extremely time-consuming, expensive, and difficult. Consider, for example, the Deepwater Horizon accident in 2010. Eleven people lost their lives on the day of the accident, which resulted in the largest oil spill ever in U.S. waters. The negative impact on the environment is still being experienced in the Gulf region of the United States and will be for many years to come. As of 2017, the costs arising from the accident, including financial settlements and fines, exceeded $62 billion.
Mission critical operations can be impacted by a wide variety of factors: hardware or software failures, network communications problems, accidental damage or disruption, or natural disasters. One factor making the news regularly is cyber attack. High-profile incidents have impacted household names such as Sony, Target, eBay, P.F. Chang’s, and Domino’s Pizza. In these cases, confidential information was stolen, resulting in the need for major disaster-recovery activities. On Christmas Day 2014, a group known as the Lizard Squad successfully brought down the Xbox Live and PlayStation networks. As a result, 48 million Xbox Live subscribers and 110 million PlayStation users were unable to access their respective networks, causing major disruptions on one of the year’s biggest days of demand.
In the industrial space, reports indicate a 10-fold increase in the number of successful cyber attacks on infrastructure control systems since 2000. This is partly a consequence of advances in control systems that enable them to be integrated into the business environment. Although this integration has proven to be a huge benefit for businesses, allowing better visibility of process information in near real-time, the increased connectivity has exposed new vulnerabilities that can be targeted by attackers. The connection between operational technology (OT) systems, also called industrial systems, and information technology (IT) systems has created problems for both types of systems. For instance, in Germany in December 2014, a steel mill was attacked and the blast furnace suffered major damage. The attack originated in the business network, from which the attackers were able to navigate to the control system network and disrupt the emergency shutdown systems that were designed to prevent major damage to the plant.
There are many potential cyber attackers, such as hackers seeking to prove their capabilities, criminals seeking financial gain, and state-funded operations designed to damage another state’s activities. As a result, mission critical systems must be designed and operated to cope with both accidental and deliberate incidents. In addition, the management of such systems requires an enhanced level of diligence, as the nature and source of threats are always changing.
A whole culture of mission critical operations specialists has emerged. These specialists understand the threats, risks, and consequences of failure. Although they may focus on areas such as robust IT network design, control system security, control room operations, and alarm handling, these specialists will normally have a broad understanding of all key aspects of mission critical systems. Few other careers require so many different aspects to be brought together in one role.
About This Book
This book is a primer on mission critical operations. The objective of the book is to provide a high-level overview of key concepts. There are many aspects to mission critical operations and each one can be studied in further depth. It is not the author’s intent to repeat the detail already provided in other books; a list of these is provided for further reading.
Intended Audience
This book is intended for those who need a high-level understanding of the key concepts in mission critical operations, including students on entry-level programs and those beginning their careers.
Critical Infrastructure
Modern society is dependent on the underlying critical infrastructure that provides power, water, waste disposal, transportation, financial services, and emergency services. Mission critical operations are essential to the smooth and continued availability of these services.
Technology plays a major role in modern mission critical operations management. There are two distinct forms of technology that exist in mission critical organizations:
•Information technology (IT) – This includes computing equipment and systems, networking equipment and systems, and associated processes required to manage a typical business. Most mission critical organizations will have an IT function or department that is responsible for the technology and processes.
•Operational (or operations) technology (OT) – This includes the systems, devices, and associated processes that are required to manage physical processes and plants, such as control valves, engines, conveyors, and other machines. In general, OT is the responsibility of an engineering function or department.
Availability, Integrity, and Confidentiality
Management of IT and OT involves many similar aspects, but there are some crucial differences. One fundamental difference is the relative importance of the following factors, as shown in Figure 2-1:
•Availability – Making sure the system or information is there when it is needed.
•Integrity – Making sure the system is operating correctly or that information is complete and not corrupted.
•Confidentiality – Protecting information from falling into the wrong hands.
Figure 2-1. Availability, Integrity, and Confidentiality
All three aspects are important in OT and IT systems, but the relative importance of each varies depending on the type of system.
OT is responsible for monitoring and controlling industrial processes, and failure could have a significant impact on safety, production, and the environment. For OT, the relative order of importance is:
•Availability
•Integrity
•Confidentiality
In other words, OT places priority on the continuous and accurate operation of systems over securing confidential information. However, a breach of confidentiality can indirectly lead to loss of availability and integrity. In addition, access to OT systems could provide access to proprietary information, such as details of a novel manufacturing process or a product recipe or contents.
For IT, the relative order of importance is:
•Confidentiality
•Integrity
•Availability
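The two orderings can be written down as a small lookup table, which makes the contrast easy to see. The following Python sketch is illustrative only: the names Property, PRIORITY, rank, and higher_priority are hypothetical and do not come from any standard or product. It records the OT and IT orders exactly as listed above and answers which of two properties takes precedence for a given type of system.

```python
from enum import Enum


class Property(Enum):
    """The three properties discussed in this section."""
    AVAILABILITY = "availability"
    INTEGRITY = "integrity"
    CONFIDENTIALITY = "confidentiality"


# Relative order of importance as listed in the text, most important first.
# The keys "OT" and "IT" are labels chosen for this sketch.
PRIORITY = {
    "OT": (Property.AVAILABILITY, Property.INTEGRITY, Property.CONFIDENTIALITY),
    "IT": (Property.CONFIDENTIALITY, Property.INTEGRITY, Property.AVAILABILITY),
}


def rank(system_type, prop):
    """Return the 1-based rank of a property for a system type (1 = highest)."""
    return PRIORITY[system_type].index(prop) + 1


def higher_priority(system_type, first, second):
    """Return whichever of two properties ranks higher for the system type."""
    if rank(system_type, first) <= rank(system_type, second):
        return first
    return second


if __name__ == "__main__":
    for system_type in ("OT", "IT"):
        winner = higher_priority(
            system_type, Property.AVAILABILITY, Property.CONFIDENTIALITY
        )
        print(f"{system_type}: {winner.value} takes precedence")
```

A real organization would weigh these properties system by system rather than with a fixed table; the sketch only captures the general tendency described in this section.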
In other words, IT plac...