PART I
SETTING THE STAGE
CHAPTER 1
ADVANCING ORGANIZATIONAL RELIABILITY
Karlene H. Roberts
THE FIELD OF HIGH RELIABILITY ORGANIZATION (HRO) research is now more than thirty years old. This chapter discusses the original reasons for delineating this area of research and the nature of the early work. I go on to indicate why this book is needed at this time and conclude with a brief description of each chapter.
IN THE BEGINNING
HRO research began in the mid-1980s at the University of California, Berkeley. Berkeley researchers Todd La Porte, Karlene Roberts, and Gene Rochlin were soon joined by Karl Weick, then at the University of Texas. We were interested in the growing number of catastrophes that appeared to us to have been partially or wholly caused by organizational processes, including the 1981 Hyatt Regency walkway collapse; the 1994 South Canyon Fire, about which Weick wrote (1995, 1996); and the 1984 Union Carbide disaster in Bhopal, India. Unfortunately, the list goes on and on.1 Some time ago we were asked to write reflections on the early work. The first section of this chapter draws heavily on those reflections.
As Weick (2012) points out:
Prominent ideas were available to analyze evocative events such as the preceding [Hyatt Regency walkway collapse, etc.]. [Charles] Perrow (1984) had proposed that increasingly tightly coupled, complex systems fostered accidents as normal occurrences, a proposal that encouraged counterpoint. Barry Turner (1978) had sketched the outlines of organizational culture, the incubation of small failures (later to be conceptualized as "normalization"), and organizational blind spots. "As a caricature it could be said that organizations achieve a minimal level of coordination by persuading their decision makers to agree they will all neglect the same kinds of considerations when they make decisions" (p. 166). Not long thereafter, Linda Smircich (1983), in a definitive [Administrative Science Quarterly] article, gave legitimacy to the notion of organizational culture. Trial and error learning was a basic assumption, which meant that the possibility of groups in which the first error was the last trial provoked interest and a search for explanations. . . . My point is that these ideas, and others not mentioned, were available to make sense of an emerging set of organizations that were complex systems, under time pressure, conducting risky operations, with different authority structures for normal, high-tempo, and emergency times, and yet in the best cases were nearly error free. (p. 2)
At the beginning of our research, we were introduced to three technologically sophisticated, complex subunits of organizations that were charged with doing their tasks without major incident: the US Navy's Nimitz-class aircraft carriers, the US Federal Aviation Administration's [FAA] air traffic control [ATC] system, and Pacific Gas and Electric Company's [PG&E] Diablo Canyon nuclear power plant. To us these organizations appeared to engage in different processes, or seemed to bundle the same processes differently, than the average organization studied in organizational research. We kicked off the research with a one-day workshop held on the aircraft carrier USS Carl Vinson, which was also attended by members from the other two HRO organizations we planned to study. One outcome of the workshop was that managers in all three organizations felt they faced the same sophisticated challenges.
IMPORTANT CHARACTERISTICS OF THE INITIAL WORK
THE HRO project did not start by looking at failures but rather at the manner in which organizations with a disposition to fail had avoided failure. It became readily apparent that HROs do not maintain reliability purely by mechanistic control, by redundancy, or by "gee whiz" technology. They work into the fabric of these mechanistic concerns a mindset and culture that makes everyone mindful of their surroundings, how events are unfolding, and what they may be missing. High reliability organizing deploys limited conceptions to unlimited interdependencies. These organizations are set apart from other organizations because they handle complexity with self-consciousness, humility, explicitness, and an awareness that simplification inherently produces misrepresentations.
The initial high reliability project focused on current functioning because we researchers knew little of past practices and operations. In all cases it took many years to reach the level of performance observed by the researchers. For example, the Air Commerce Act was signed into law in the United States in 1926. It contained provisions for certifying aircraft, licensing pilots, and so on. By the mid-1930s there was a growing awareness that something needed to be done to improve air travel safety. At the same time, the federal government encouraged airlines to embed control centers in five US airports. Maps, blackboards, and boat-shaped weights were used to track air traffic. Ground personnel had no direct radio link with aircraft, and ATC centers contacted each other by phone.
Technological changes have vastly altered what high reliability functioning looks like today. Not long after the emergence of ATC centers, semi-automated systems were developed based on the marriage of radar and computer technology. In 2004 the US Department of Transportation announced plans to develop a "next gen" plan to guide air traffic from 2025 onward. This plan will take advantage of the growing number of onboard technologies for precision guidance (Federal Aviation Administration, 2015).
Because the researchers had no experience with the histories of these organizations, many existing organizational processes that may no longer serve a purpose were probably not uncovered. An apocryphal story is told about the US Army on the eve of World War II. A senior officer was reviewing an artillery crew in training. The officer noticed that each time the gun fired, one of the firing team members stood off to the side with his arm extended straight out and his fist clenched. The inspecting officer asked the purpose of this procedure, but no one seemed to know. Sometime later a World War I veteran reviewed the gun drill and said, "Well, of course, he's holding the horses." An obsolete part of the drill was still in use (Brunvand, 2000), as is probably true in HROs.
The units under study in the original research were subunits of larger organizations and not necessarily representative of the organization as a whole. Flight operations, for example, are essential to the missions of an aircraft carrier but not its entire menu of complex tasks, which include navigation, weapons handling, supply, housing and feeding six thousand people, and so on (Rochlin, 2012).2 The carrier is central to the task force and is an important part of the navy, which is part of the US Department of Defense. It was beyond the scope of the project to determine how safer, or rather more reliable, operation of the suborganizations contributed to this nested series of organizations. By studying subunits the team may have created an error of the third kind (Mitroff & Featheringham, 1974), that is, solving the wrong problem precisely by examining only part of the problem. Paul Schulman and Emory Roe attempt to rectify this problem in Chapter 9.
More research is needed to explore how organizations or units of organizations that can fail disastrously are linked to other organizations or units of organizations. Specifically, more attention needs to be given to organizations that help other organizations fail or fail alongside them (Roberts, 2012). The failure of BP and its semisubmersible deepwater drilling rig Deepwater Horizon is a good example. As reported, on April 20, 2010, the Macondo well blew out; the accident cost the lives of eleven men and began an environmental catastrophe when the drilling rig sank and over four million barrels of crude oil spilled into the Gulf of Mexico (National Commission on the Deepwater Horizon Oil Spill and Offshore Drilling, 2011, back cover).
There are a number of reasons for the paucity of research on interlinked and interdependent organizations. Such research is costly and resource demanding. Moreover, most organizations in which some units need to avoid catastrophe are complex in a manner that would require large, multidisciplinary research teams. The Diablo Canyon nuclear power plant is an example of an interdependent, hence complex, organization requiring extensive research resources. It is enmeshed in the problems, politics, and legalities of PG&E and its regulator, the California Public Utilities Commission (CPUC), to say nothing of local community politics. Building a multidisciplinary (or interdisciplinary) team is not easy. Scholars are not used to talking with other scholars who speak different languages and are enmeshed in different constructs.
The observations at the root of the original HRO conceptual and process findings came from intensive case studies of the three organizations. Early on, we realized that formal interviews and questionnaires were of little value in organizations for which researchers didn't have an available literature on which to build. Both of these research methodologies assume researchers know some basics about what is going on in the organization.
A CONCEPTUAL PROBLEM THAT DOESN'T GO AWAY
A frequent criticism of HRO research is the lack of agreement on a definition by the (now) many authors contributing to the work (e.g., Hopkins, 2007). The chapters in this book reflect this lack of consensus. Despite this repeated criticism, early in the work Rochlin (1993, p. 16) provided a list of defining criteria that seem to provide fairly clear boundaries for organizations to be labeled as high reliability or reliability seeking:
1. The organization is required to maintain high levels of operational reliability and/or safety if it is to be allowed to continue to carry out its tasks (La Porte & Consolini, 1991).
2. The organization is also required to maintain high levels of capability, performance, and service to meet public and/or economic expectations and requirements (Roberts, 1990a, 1990b).
3. Because of the consequentiality of error or failure, the organization cannot easily make marginal trade-offs between capacity and safety. In a deep sense, safety is not fungible (Schulman, 1993).
4. As a result, the organization is reluctant to allow primary task-related learning to proceed by the usual modalities of trial and error for fear that the first error will be the last trial (La Porte & Consolini, 1991).
5. Because of the complexity of both technology and task environment, the organization must actively manage its activities and technologies in real time while maintaining capacity and flexibility to respond to events and circumstances that can at most be generally bounded (Roberts, 1990a, 1990b).
6. The organization will be judged to have "failed," either operationally or socially, if it does not perform at high levels. Whether service or safety is degraded, the degradation will be noticed and criticized almost immediately (Rochlin, La Porte, & Roberts, 1987).
The labeling problem is further compounded by the fact that most high reliability research still selects on the dependent variable by first identifying organizations that researchers think are or should be high reliability or reliability seeking. But does reliability mean the same thing to all employees in a single organization or across organizations? Definitional problems, too, may have led to the fact that the research project went by several names before "high reliability" stuck as the work matured. It is disconcerting that the acronym HRO has become a marketing label: "When it is treated as a catchword this is unfortunate because it makes thinking seem unnecessary and even worse, impossible. The implication is that once you have achieved the honor of being an HRO, you can move on to other things" (Weick, 2012).
EARLY RESEARCH FINDINGS
According to Chrysanthi Lekka (2011), the original research identified several characteristics and processes that enabled the three organizations to achieve and maintain their excellent safety records (e.g., Roberts & Rousseau, 1989; Roberts, 1990b, 1993a; La Porte & Consolini, 1991; Roberts & Bea, 2001). These include:
Deference to expertise. In emergencies and high-tempo operations, decision making migrates to people with expertise regardless of their hierarchical position in the organization. During routine operations decision making is hierarchical.
Management by exception. Managers focus on the "bigger picture" (strategy) and let operational decisions be made closer to the decision implementation site. Managers monitor these decisions but only intervene when they see something that is about to go wrong (Roberts, 1993a, 1993b).
Continuous training. The organization engages in continuous training to enhance and maintain operator knowledge of the complex operations within the organization and improve technical competence. Such training also enables people to recognize hazards and respond to "unexpected" problems appropriately and is a means to build interpersonal trust and credibility among coworkers.
Safety-critical information communicated using a number of channels. Using a variety of communication channels ensures that workers can receive and act on information in a timely way, especially during high-tempo or emergency operations. For example, at the time of the research, nuclear-powered aircraft carriers used twenty different communication devices ranging from radios to sound-powered phones (Roberts, 1990b). Currently, the Boeing 777 aircraft uses eight communication devices.
Built-in redundancy. The provision of backup systems in case of failure is one redundancy mechanism. Other such mechanisms include internal cross-checks of safety-critical decisions and continuous monitoring of safety-critical activities (e.g., Roberts, 1990b; Hofmann, Jacobs, & Landy, 1995). Nuclear-powered aircraft carriers operate a "buddy system" whereby activities carried out by one individual are observed by a second crew member (Roberts, 1990b).
ORGANIZING FOR RELIABILITY: A GUIDE FOR RESEARCH AND PRACTICE
The central purpose of Organizing for Reliability is to showcase the different perspectives about high reliability organizing that have emerged over the past thirty years. We feel that all too often reliability is embedded in studies of accidents (e.g., Caird & Kline, 2004), safety (e.g., Naveh & Katz-Navon, 2015), errors (e.g., Jones & Atchley, 2002), and disasters (e.g., Guiberson, 2010) rather than being given attention in its own right. The basic question in HRO research is, What accounts for the exceptional ability of some organizations to continuously maintain high levels of operational reliability under demanding conditions? Its importance is underscored by earlier accidents (e.g., the Challenger and Columbia space shuttles) and more recent mishaps (e.g., the Volkswagen emissions debacle, PG&E's San Bruno, California, pipeline explosion) in which organizations failed to operate reliably.
Contributors to this book are richly diverse in terms of career sta...