System and Task Analysis
Much has been said in safety literature about the effects of human error. Estimates of the share of accidents involving human error of some type range from 60 to 90 percent. However, root causes of these errors can often be traced to fundamental problems in the design of systems, processes and tasks related to the human activities in which these errors occurred. Sound safety planning, including hazard identification, risk management and safety assurance, must be based on a thorough understanding of the processes and activities of people in the system, and the other components of the systems and environments in which they work. Consequently, the world of aviation safety has accepted the practice of system safety and looked to the concept of safety management systems.
The foundational building blocks of a safety management system (SMS) are referred to as the four components, or the four pillars, of safety management. These pillars are policy, safety risk management, safety assurance, and safety promotion (FAA, 2006a; ICAO, 2009). The processes of safety risk management (SRM) and safety assurance are in turn built upon system and task analysis. Systems analysis, within the context of SMS, is a process for describing and evaluating organizational systems, and evaluating risk and compliance with safety regulations at the system level. Task analysis is a methodology that supports system analysis by documenting the activities and workplace conditions of a specific task to include people, hardware, software and environment. There are a number of objectives of system and task analysis within SMS, including:
• Initial design of processes (describing workflows, attributes, etc.)
• Task and procedure development (including documentation, job aids, etc.)
• Hazard identification (what if things go wrong?)
• Training development (assessment, design, development, implementation and evaluation)
• Shaping safety assurance processes (what needs to be monitored, measured and evaluated?)
• Performance assessment (how are we doing?)
The system and task analysis should completely explain the interactions among the hardware, software, people and environment that make up the system in sufficient detail to identify hazards and perform risk analyses. Systems and task analysis (including design) are the primary means of proactively identifying and addressing potential problems before the system or process goes into operation. Such problems consist of hazards that are conditions and not outcomes. We cannot directly manage errors that occur, but we can manage the conditions that caused them. Oftentimes, hazard identification is viewed as a process that starts more or less spontaneously. For example, employees report conditions that they encounter, audits provide findings of new hazards observed, investigations identify failed processes and uncontrolled environmental conditions, etc. While each of these is, of course, a legitimate source of hazard information, many hazards are built in to the system. Many more of the hazards are the result of system, process and procedural designs, or the cumulative effects of individually small, but collectively significant changes that have occurred incrementally in the systems or their operational environments. Thus, it pays to look carefully at the factors that can affect human and equipment performance and take proactive steps to avoid problems that can be identified early.
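As a minimal sketch of how such an analysis might be recorded, the Python below describes one task in terms of the people, hardware, software and environment involved, and attaches hazards as conditions rather than outcomes. The class names, field names and the sample ramp-fueling entries are illustrative assumptions, not part of any FAA or ICAO format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four system elements named in the text.
ELEMENTS = ("people", "hardware", "software", "environment")


@dataclass
class Hazard:
    """A hazard is recorded as a condition, not as an outcome."""
    condition: str  # e.g. "poor ramp lighting at night"
    element: str    # the system element the condition belongs to


@dataclass
class TaskAnalysis:
    """Hypothetical record of one task and its workplace conditions."""
    task: str
    people: List[str] = field(default_factory=list)
    hardware: List[str] = field(default_factory=list)
    software: List[str] = field(default_factory=list)
    environment: List[str] = field(default_factory=list)
    hazards: List[Hazard] = field(default_factory=list)

    def add_hazard(self, condition: str, element: str) -> None:
        """Attach a hazardous condition to one of the four elements."""
        if element not in ELEMENTS:
            raise ValueError(f"unknown system element: {element}")
        self.hazards.append(Hazard(condition, element))

    def undescribed_elements(self) -> List[str]:
        """Elements with no entries yet, so the description is incomplete."""
        return [e for e in ELEMENTS if not getattr(self, e)]

    def hazards_by_element(self) -> Dict[str, List[str]]:
        """Group recorded conditions by the element they belong to."""
        grouped: Dict[str, List[str]] = {e: [] for e in ELEMENTS}
        for hazard in self.hazards:
            grouped[hazard.element].append(hazard.condition)
        return grouped


# Illustrative entries only.
fueling = TaskAnalysis(
    task="Ramp fueling",
    people=["fueler"],
    hardware=["fuel truck"],
    environment=["night ramp"],
)
fueling.add_hazard("poor ramp lighting at night", "environment")
print(fueling.undescribed_elements())  # ['software']
```

Even a record this simple shows where the description is still incomplete and which conditions have been identified for each element, before any error has occurred.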
System and task analyses should completely explain the interactions among the hardware, software, people and environment that make up the system in sufficient detail to identify hazards and perform risk analyses (FAA, 2006a). Analysis, as opposed to description, is a process aimed at identifying system problems, performance difficulties and sources of error (Annett, 2003). Additionally, and perhaps most importantly, system and task analysis provides the opportunity to gain a fundamental understanding of the organization’s systems, processes and operational environment. This understanding allows for meaningful compliance as opposed to perfunctory compliance. Meaningful compliance is achieved when the organization applies regulations in the context of its operations in a way that accomplishes the safety intent of the regulations. Perfunctory compliance is basically a check-the-block mentality.
It is important to emphasize at this point that management decisions made during the design phase regarding compliance and risk management both reflect and shape the organizational culture and organizational behavior. The authors’ past experiences have shown that when an organization implements and aggressively uses SMS practices such as employee reporting systems, risk management procedures and safety assurance processes early in the organizational lifecycle, it can expect success in fostering pro-safety attitudes and behaviors in its employees. In the operational arena, we have seen many safety programs consisting essentially of reporting systems that fail to take advantage of system and task analysis. This leads to treating problems in isolation, as each report, finding or occurrence is brought to some type of closure independently of other related systems or operating conditions (see also Sparrow, 2000, 2008). It is safe to say that a truly predictive process, or a robust management-of-change process, cannot reach fruition without comprehensive system and task analyses.
The International Civil Aviation Organization (ICAO) also emphasizes the importance of SRM and safety assurance. ICAO makes clear that the two core operational processes of an SMS are safety risk management and safety assurance (ICAO, 2009). ICAO goes on to say that SRM is a generic term that encompasses the assessment and mitigation of the safety risks of the consequences of hazards that threaten the capabilities of an organization, to a level as low as reasonably practicable.
The Federal Aviation Administration (FAA) states that SRM is a formal process within SMS that describes an organization’s systems, identifies hazards, analyzes risks and assesses and controls those risks. The FAA also cautions that the SRM process should be embedded in the organization’s operational processes that are used to provide products and services, and should not be a separate or distinct process (FAA, 2009a). As such, SRM becomes an activity that not only supports the management of safety, but also contributes to other related organizational processes. In essence, the results of the SRM process, beginning with a sound system and task analysis, become part of the system rather than an add-on or an after-the-fact corrective process. Risk management isn’t something you do; it’s the way you do it that makes the process function properly and safely. It is important to remember that SRM should be part of the entire management domain of the organization. It cannot simply be relegated to a safety officer or safety department. It has to be part of the way that management, and especially senior management, runs the entire business enterprise.
Safety assurance is a formal management process within an SMS that systematically provides confidence that an organization’s products or services meet or exceed safety requirements (FAA, 2009b). ICAO (2009) reminds us that safety assurance must be considered as a continuous, ongoing activity. It should be aimed at ensuring that the initial identification of hazards and assessments of the consequences of safety risks, and the defenses that are used as a means of control, remain valid and applicable over time.
As stated earlier, SRM and safety assurance rely on system and task analysis. We have observed that these processes are vastly misunderstood by a great number of operators. We have seen many operators shy away from these processes because they were unfamiliar with them and they, frankly, were scared to delve into what they believed to be a very complex and purely academic area, or an endeavor that requires additional specialist personnel, or processes that are beyond their means. In reality, system/task analysis does not have to be that complicated. While there are some sophisticated software tools and complex methods available, simple lists and descriptions can often suffice. In fact, simple flow charts that we will explore later can pave the way.
One of the primary purposes of system and task analysis is to reach a complete and predictive state within an organization’s SMS. This is the type of risk management in which the organization considers activities that are in the future. The organization reviews changes in systems and processes, revisits its mission, evaluates business practices and assesses the operational environment. It then becomes the basis for proactive safety management, where the organization considers its current practices and foresees contingencies – the ‘what if…’ questions.
System and task analysis are at the forefront of an effective risk management system (see Figure 1.1). Alternatively, system and task analysis are termed ‘system analysis’ in the systems engineering world, ‘systems description’ in much of the industrial system safety literature, and ‘establishing the context’ in the AS/NZS 4360 standard (Standards Australia, 2004). The ISO 31000 risk management standards offer concepts by which organizations can gain a fundamental understanding of their respective systems, an understanding that is essential to hazard identification, risk analysis and assessment, and risk control. The relationship between system/task analysis and SRM is depicted in Figure 1.1. It reflects that some degree of variability can be expected in normal human performance. In addition, workplace conditions can affect this variability and can introduce triggers that activate human error (see Dismukes, Berman and Loukopoulos, 2007; Hollnagel, 2004).
As this chapter progresses, we will explore recommended methodologies to conduct both systems analysis and task analysis. Before we delve too deeply into actual analysis methods, it is important to provide a brief review of systems, and of the process approach to system analysis as it relates to SMS.
Figure 1.1 Safety risk management and task analysis
Roland and Moriarty (1990) provide an excellent definition of system in their text on safety engineering. They define a system as ‘a composite of people, procedures and equipment that are integrated to perform a specific operational task or function within a specific environment’ (p. 6). Notice particularly the terms integrated, function and environment. Every system is designed to perform a specific mission, or produce a product or service; consequently, it is the responsibility of management (the owners of the system) to define the goals for the mission, product, or service. These goals represent their expectations for the outcomes of their investment. In order to meet these expectations, an understanding of the system’s environment and the processes that make up the system becomes critical. A common error is to define a system only in terms of physical assets (for example, aircraft, computers, facilities), and to refer to organizational components as organizational factors. In the context of a safety management system, the organization is the system and these assets are resources used to accomplish the organization’s mission.
Systems include the processes which consist of activities people do within the systems. Systems also include the tools, facilities, procedures and equipment that are used to perform the processes. In addition, systems have a defined mission, purpose, or objective. This is where the concepts of systems and processes converge. A system has a mission or objective – to produce a product or service (outcome). A process has outputs which are the results of the efforts put forth by the people in the system.
Processes are the essential operations that a system must perform to achieve its outcomes. ISO 9000, the international quality management system standard, defines a process as an ‘interrelated or interacting set of activities which transforms inputs into outputs’. In every case, inputs are turned into outputs because some kind of activity is completed. Note the active nature of this definition. It does not define the product or service, but rather the things that people in the system do to produce them. A process is people doing things rather than things people have done. The products or services are the outcomes, what the system owners want out of it, while the processes are the activities that get them there (ISO, 2000).
There is a subtle but important difference between outputs and outcomes. Process outputs should be defined such that they can be measured. Practitioners need to avoid defining outputs without considering process measurements. Additionally, outputs should be defined in a manner to support the desired outcomes. Activities are often defined in terms of outputs – flights made, maintenance procedures accomplished, audits conducted, etc., but the goal of meeting the system’s desired outcome has been neglected. It’s easy, though, to define things that are easily countable and neglect or even ignore outcomes. For example, a common measure in practice is timeliness. While this is a desirable goal in most activities and is usually easy to measure, it does not matter how timely we produce an inferior result. In safety, it does not matter how many audits we conduct, how many reports we process, etc., if we aren’t achieving the outcome of enhanced safety.
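One way to keep outputs tied to outcomes is to record, for every output measure, the outcome it is meant to support. The sketch below is a hypothetical illustration in Python; the measure names, units and the `unlinked_outputs` helper are assumptions made for the example, not measures prescribed by any standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OutputMeasure:
    name: str                        # what is counted
    unit: str                        # how it is measured
    supports_outcome: Optional[str]  # desired outcome it supports, if any


def unlinked_outputs(measures: List[OutputMeasure]) -> List[str]:
    """Return the names of outputs that are countable but tied to no outcome."""
    return [m.name for m in measures if not m.supports_outcome]


# Illustrative measures only.
measures = [
    OutputMeasure("audits conducted", "count per quarter", None),
    OutputMeasure("reports processed", "count per month", None),
    OutputMeasure(
        "audit findings closed with verified controls",
        "count per quarter",
        "reduced exposure to identified hazards",
    ),
]
print(unlinked_outputs(measures))  # ['audits conducted', 'reports processed']
```

Any measure the helper returns is easy to count but says nothing by itself about whether the desired outcome is being achieved.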
In some cases, outcomes are difficult to measure directly. This is often (if not always) the case in safety, where the desired outcome (absence of accidents) can be intangible and its inverse events (accidents, incidents, etc.) are rare. Process measures are often employed in these cases. Process measures place emphasis on performance-shaping and process-supporting factors such as equipment, procedures, personnel selection, training, qualification, supervision and facilities. The theory is that, if these process factors are well managed, confidence in the probability of successful outcomes is enhanced (readers are referred to ISO 9001:2000 sub-clause 7.5.2 for an example standard for process verification and validation).
The Process Approach
The process approach is basically a systems approach in that the process owner is identified, procedures are evaluated, outputs to the next process are verified, controls to ensure the desired output are confirmed and, finally, performance measures are reviewed to ensure consistent results. This approach accepts that different types of processes take on different characteristics and have different expectations. For the purpose of our analyses, there are three basic types of processes: operational, management and resource allocation processes.
• Operational processes do the real work of the system. Inputs are transformed into outputs in the form of products or services. These may be direct customer services such as providing transportation, manufacturing or repairing aircraft or aircraft components, directing air traffic, etc. They may also be internal processes such as providing employee training.
• Management processes serve as controls on operational processes. This may take the long-term form of such activities as establishing policies or authoring and disseminating policy and procedural documents. It may also take the form of more real-time activities such as direct supervision and management of employees’ activities.
• Resource allocation processes concern procurement and allocation of personnel, facilities, equipment, tools and other resources that are used in operational processes.
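The process approach and the three process types above can be sketched as a simple record with a gap check. This is a minimal Python illustration under stated assumptions: the `Process` class, its field names, the `review_gaps` method and the sample training entries are hypothetical and do not come from ATOS or any other published format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ProcessType(Enum):
    OPERATIONAL = "operational"
    MANAGEMENT = "management"
    RESOURCE_ALLOCATION = "resource allocation"


@dataclass
class Process:
    """Hypothetical record of one process under the process approach."""
    name: str
    kind: ProcessType
    owner: str = ""
    procedures: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    controls: List[str] = field(default_factory=list)
    performance_measures: List[str] = field(default_factory=list)

    def review_gaps(self) -> List[str]:
        """List the process-approach elements that are still undefined."""
        checks = {
            "owner": self.owner,
            "procedures": self.procedures,
            "outputs": self.outputs,
            "controls": self.controls,
            "performance measures": self.performance_measures,
        }
        return [label for label, value in checks.items() if not value]


# Illustrative entries only: an internal operational process.
training = Process(
    name="Employee training",
    kind=ProcessType.OPERATIONAL,
    owner="Director of Training",
    procedures=["training manual"],
    outputs=["qualified employees"],
)
print(training.review_gaps())  # ['controls', 'performance measures']
```

A simple list like this is often enough to show which processes lack an identified owner, controls or performance measures before more sophisticated tools are considered.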
Process attributes are important factors to consider in system analysis. These attributes are the same attributes as described in the FAA’s Air Transportation Oversight System (ATOS), however, we use a s...