1
Introduction
The ValuJet accident continues to raise troubling questions, no longer about what happened but about why it happened, and what is to keep something similar from happening in the future. As these questions lead into the complicated and human core of flight safety, they become increasingly difficult to answer.
Langewiesche, 1998
The Atlantic Monthly
"To err is human," it is said, and people make mistakes; it is part of the human condition. When people err, they may be embarrassed or angry with themselves, but most often the errors are minor and little attention is paid to the consequences. However, sometimes errors lead to more serious consequences. Occasionally, people working in hospitals, airlines, power stations, chemical refineries, or similar settings commit errors that cause accidents with catastrophic consequences, potentially injuring or killing those who played no part in the error.
Such work settings, known as "complex systems" (Perrow, 1999), generate electricity, refine crude oil, manage air traffic, transport products and people, and treat the sick, to name a few. They have brought substantial benefits to our way of life and permitted a standard of living to which many have become accustomed, but when someone who works in these systems makes an error, the consequences may be severe. Although companies and their regulators typically establish extensive performance standards to prevent errors, errors that would be inconsequential in other environments can, in these settings, have severe consequences.
A new catastrophe seems to occur somewhere in the world with regularity, often one that is later attributed to someone doing something wrong. Whether it is an airplane accident, a train derailment, a tanker grounding, or any of the myriad similar events, the tendency of often simple errors to wreak havoc continues. Despite the progress made, systems have not yet been developed that are immune to the errors of those who operate them. The human genetic structure has been mapped, the Internet developed, and cell phones designed with more computing power than most computers had but a few short years ago, but human error has not yet been eliminated from complex systems.
However, while error has not been eliminated, our understanding of its causes has increased. Particularly in complex systems, where there is little tolerance for error, regulators, system designers, and operators have developed and implemented techniques that anticipate and address potential opportunities for error and, it is hoped, prevent errors that can jeopardize system safety.
The Crash of ValuJet Flight 592
To illustrate how even simple errors can lead to a catastrophic accident, let us look at an event in one of our safest complex systems: commercial air transportation. Despite numerous measures that had been developed to prevent the very types of errors that occurred, several people, including some who were not even involved in the conduct of the accident flight, committed critical errors that led to an accident.
On May 11, 1996, just minutes after it had taken off from Miami, Florida, a McDonnell Douglas DC-9 crashed into the nearby Florida Everglades (National Transportation Safety Board, 1997). Investigators determined that the cause of the accident was relatively simple and straightforward: an intense fire broke out in the airplane's cargo compartment and within minutes burned through the compartment into the cabin, where it spread quickly. The pilots were unable to land before the fire degraded the airplane's structural integrity. All on board were killed (Figure 1.1).
The investigation attracted considerable worldwide media attention. As with any large-scale event involving a substantial loss of life, this was understandable, but other factors played a part as well. The airline had been operating for less than 3 years, and it had employed what were then nontraditional airline practices. It had expanded rapidly, and in the months before the accident it had experienced two nonfatal accidents. After this accident, many criticized the airline, questioning its management practices and its safety record. Government officials initially defended the airline's practices, but then reversed themselves. Just over a month after the accident, government regulators, citing deficiencies in the airline's operations, forced it to suspend operations until it could satisfy their demands for reform. This led to even more media attention.
FIGURE 1.1
The ValuJet accident site in the Florida Everglades. (Courtesy of the National Transportation Safety Board, 1997.)
As details about the crash emerged and more was learned, the scope of the tragedy increased. Minutes after takeoff, the pilots had declared an emergency, describing smoke in the cockpit. Within days investigators learned that despite strict prohibitions, canisters of chemical oxygen generators had been loaded onto the aircraft. It was believed that the canisters, the report of smoke in the cockpit, and the accident were related.
Oxygen generators provide oxygen to airline passengers in the event of a cabin depressurization and are therefore designed to be safely transported in aircraft, provided the canisters are properly installed within protective housings. However, if the canisters are not packaged properly, or are shipped without locks to prevent initiation of oxygen generation, they could inadvertently generate oxygen. The process creates heat as a by-product, bringing the surface temperature of the canisters to as high as 500°F (260°C).
Investigators believed that canisters that lacked locks or other protection were placed loosely in boxes and loaded into the airplane's cargo hold underneath the cabin. After being jostled during takeoff and climb out, the canisters began generating oxygen. The canister surfaces became hot enough to ignite adjacent material in the cargo compartment, and a fire began. The canisters then fed the fire with pure oxygen, producing one of extraordinary intensity that quickly penetrated the fire-resistant material lining the cargo hold, material that had not been designed to protect against an oxygen-fed fire. The fire burned through the cabin floor and, with the pure oxygen continuing to feed it, grew to the point where the structure weakened and the airplane became uncontrollable. It crashed into the Everglades, a body of shallow water, becoming submerged under its soft silt floor (Figure 1.2).
FIGURE 1.2
Unexpended, unburned chemical oxygen generator, locking cap in place, but open. (Courtesy of the National Transportation Safety Board, 1997.)
Because of the potential danger that unprotected oxygen generators pose, they are considered hazardous, and airlines are prohibited from loading unexpended and unprotected canisters of oxygen generators onto aircraft. Yet, after the accident, it was clear that someone had placed the canisters on the airplane. As a result, a major focus of the investigation became determining how and why the canisters were loaded onto the airplane.
Investigators learned that no single error led to loading the canisters onto the aircraft. To the contrary, about 2 months before the accident, several individuals committed relatively insignificant errors, in a particular sequence. Each error, in itself, was seemingly minor, the type that people may commit when rushed, for example. Rarely do such errors cause catastrophic consequences. However, in this accident, despite government-approved standards and procedures designed and implemented to prevent them, people still committed critical errors that resulted in a maintenance technician shipping three boxes of unexpended oxygen generators on the accident airplane.
Although the errors may have appeared insignificant, a complex system such as commercial aviation has little room for even insignificant errors. Investigators seeking to identify the errors to determine their role in the cause of the accident faced multiple challenges. Many specialists had to methodically gather and examine a vast amount of information, then analyze it to identify the critical errors, the persons who committed them, and the context in which the errors occurred.
It took substantial effort to understand the nature of the errors that led to this accident, and investigators succeeded in learning how the errors were committed. The benefits of their efforts were equally substantial. By meticulously collecting and analyzing the necessary data, investigators were able to learn what happened and why, information that managers and regulators then applied to system operations to make them safer. Many learned lessons from this accident and applied what they learned to their own operations. While the tragedy of the accident cannot be diminished, it made the aviation industry a safer one; the industry has not witnessed a similar type of accident since. This is the hope that guides error investigations: that circumstances similar to those of the event being investigated will not recur and that those facing the same circumstances will not repeat the errors made earlier.
Investigating Error
Today, in many industrialized countries, government agencies or commissions generally investigate major incidents and accidents. Some countries have established agencies dedicated to that purpose. For example, the National Transportation Safety Board in the United States, the Transportation Safety Board of Canada, and the Australian Transport Safety Bureau investigate incidents and accidents across transportation modes in their respective countries. In other countries, government agencies investigate accidents in selected transportation modes, such as the Air Accidents Investigation Branch of Great Britain and the BEA (Bureau d'Enquêtes et d'Analyses pour la sécurité de l'aviation civile) of France, which investigate commercial aviation accidents and incidents.
However, when relatively minor accidents or incidents occur, organizations with little, if any, investigative experience may need to conduct the investigations themselves. Without the proper understanding, those investigating error may apply investigative procedures incorrectly or fail to recognize how the error came about. Although researchers have extensively examined error (e.g., Reason, 1990, 1997; Woods, Johannesen, Cook, and Sarter, 1994), there is little available to guide those wishing to investigate error. Despite the many accidents and incidents caused by operator error, few appear to know a formal process for investigating errors or how to apply such a method during the course of an investigation.
This book presents a method of investigating errors believed to have led to an accident or incident. It can be applied to error investigations in any complex system, although most of the examples presented are aviation related. This primarily reflects the long tradition and experience of agencies that investigate aviation accidents, and the author's experience participating in such investigations. Please consider the examples presented as tools to illustrate points made in the book and not as reflections on the susceptibility of any one system or transportation mode to incidents or accidents. Neither the nature of the errors nor the process of investigating errors differs substantially among systems.
This book is designed for practitioners and investigators, as well as for students of error. It is intended to serve as a roadmap to those with little or no experience in human factors or in conducting error investigations. Though formal training in human factors, psychology, or ergonomics, or experience in formal investigative methodology is helpful, it is not required. The ability to understand and effectively apply an investigative discipline to the process is as important as formal training and experience.
Chapters begin with reviews of the literature and, where appropriate, follow with explicit techniques for documenting data specific to the discussion in that chapter. Most chapters also end with "helpful techniques," designed to serve as quick investigative references.
Outline of the Book
The book is divided into five sections, each addressing a different aspect of error in complex systems. Section I defines concepts basic to the book: errors and complex systems. Section II focuses on types of antecedents to error, Section III describes data sources and analysis techniques, Section IV discusses three contemporary issues in human error, and Section V reviews an accident in detail and presents thoughts on selected issues important to error investigations.
Chapter 2 defines error in complex systems and introduces such critical concepts as operator, incident, accident, and investigation. Contemporary error theories are discussed, with particular attention devoted to Perrow's description of system accidents (1999) and to Moray's (2000) and Reason's (1990, 1997) models of error in complex systems. Changes in views of error over the years are also discussed.
Chapter 3 discusses the analysis of data obtained in a human error investigation. Diffe...