“Introduction” to Normal Accidents, by Charles Perrow

Welcome to the world of high-risk technologies. You may have noticed that they seem to be multiplying, and it is true. As our technology expands, as our wars multiply, and as we invade more and more of nature, we create systems—organizations, and the organization of organizations—that increase the risks for the operators, passengers, innocent bystanders, and for future generations. In this book we will review some of these systems—nuclear power plants, chemical plants, aircraft and air traffic control, ships, dams, nuclear weapons, space missions, and genetic engineering. Most of these risky enterprises have catastrophic potential, the ability to take the lives of hundreds of people in one blow, or to shorten or cripple the lives of thousands or millions more. Every year there are more such systems. That is the bad news.

The good news is that if we can understand the nature of risky enterprises better, we may be able to reduce or even remove these dangers. I have to present a lot of the bad news here in order to reach the good, but it is the possibility of managing high-risk technologies better than we are doing now that motivates this inquiry. There are many improvements we can make that I will not dwell on, because they are fairly obvious—such as better operator training, safer designs, more quality control, and more effective regulation. Experts are working on these solutions in both government and industry. I am not too sanguine about these efforts, since the risks seem to appear faster than the reduction of risks, but that is not the topic of this book. Rather, I will dwell upon characteristics of high-risk technologies that suggest that no matter how effective conventional safety devices are, there is a form of accident that is inevitable. This is not good news for systems that have high catastrophic potential, such as nuclear power plants, nuclear weapons systems, recombinant DNA production, or even ships carrying highly toxic or explosive cargoes. It suggests, for example, that the probability of a nuclear plant meltdown with dispersion of radioactive materials to the atmosphere is not one chance in a million a year, but more like one chance in the next decade.

Most high-risk systems have some special characteristics, beyond their toxic or explosive or genetic dangers, that make accidents in them inevitable, even “normal.” This has to do with the way failures can interact and the way the system is tied together. It is possible to analyze these special characteristics and in doing so gain a much better understanding of why accidents occur in these systems, and why they always will. If we know that, then we are in a better position to argue that certain technologies should be abandoned, and others, which we cannot abandon because we have built much of our society around them, should be modified. Risk will never be eliminated from high-risk systems, and we will never eliminate more than a few systems at best. At the very least, however, we might stop blaming the wrong people and the wrong factors, and stop trying to fix the systems in ways that only make them riskier.

The argument is basically very simple. We start with a plant, airplane, ship, biology laboratory, or other setting with a lot of components (parts, procedures, operators). Then we need two or more failures among components that interact in some unexpected way.
No one dreamed that when X failed, Y would also be out of order and the two failures would interact so as to both start a fire and silence the fire alarm. Furthermore, no one can figure out the interaction at the time and thus know what to do. The problem is just something that never occurred to the designers. Next time they will put in an extra alarm system and a fire suppressor, but who knows, that might just allow three more unexpected interactions among inevitable failures.

This interacting tendency is characteristic of a system, not of a part or an operation; we will call it the “interactive complexity” of the system. For some systems that have this kind of complexity, such as universities or research and development labs, the accident will not spread and be serious because there is a lot of slack available, and time to spare, and other ways to get things done. But suppose the system is also “tightly coupled,” that is, processes happen very fast and can’t be turned off, the failed parts cannot be isolated from other parts, or there is no other way to keep the production going safely. Then recovery from the initial disturbance is not possible; it will spread quickly and irretrievably for at least some time. Indeed, operator action or the safety systems may make it worse, since for a time it is not known what the problem really is.

Probably many production processes started out this way—complexly interactive and tightly coupled. But with experience, better designs, equipment, and procedures appeared, and the unsuspected interactions were avoided and the tight coupling reduced. This appears to have happened in the case of air traffic control, where interactive complexity and tight coupling have been reduced by better organization and “technological fixes.” We will also see how the interconnection between dams and earthquakes is beginning to be understood. We now know that it involves a larger system than we originally thought when we just closed off a canyon and let it fill with water. But for most of the systems we shall consider in this book, neither better organization nor technological innovations appear to make them any less prone to system accidents. In fact, these systems require organizational structures that have large internal contradictions, and technological fixes that only increase interactive complexity and tighten the coupling; they become still more prone to certain kinds of accidents.

If interactive complexity and tight coupling—system characteristics—inevitably will produce an accident, I believe we are justified in calling it a normal accident, or a system accident. The odd term normal accident is meant to signal that, given the system characteristics, multiple and unexpected interactions of failures are inevitable. This is an expression of an integral characteristic of the system, not a statement of frequency. It is normal for us to die, but we only do it once. System accidents are uncommon, even rare; yet this is not all that reassuring, if they can produce catastrophes.

The best way to introduce the idea of a normal accident or a system accident is to give a hypothetical example from a homey, everyday experience. It should be familiar to all of us; it is one of those days when everything seems to go wrong.

A Day in the Life

You stay home from work or school because you have an important job interview downtown this morning that you have finally negotiated.
Your friend or spouse has already left when you make breakfast, but unfortunately he or she has left the glass coffeepot on the stove with the heat on. The coffee has boiled dry and the glass pot has cracked. Coffee is an addiction for you, so you rummage about in the closet until you find an old drip coffeemaker. Then you wait for the water to boil, watching the clock, and after a quick cup dash out the door. When you get to your car you find that in your haste you have left your car keys (and the apartment keys) in the apartment. That’s okay, because there is a spare apartment key hidden in the hallway for just such emergencies. (This is a safety device, a redundancy, incidentally.) But then you remember that you gave a friend the key the other night because he had some books to pick up, and, planning ahead, you knew you would not be home when he came. (That finishes that redundant pathway, as engineers call it.)

Well, it is getting late, but there is always the neighbor’s car. The neighbor is a nice old gent who drives his car about once a month and keeps it in good condition. You knock on the door, your tale ready. But he tells you that it just so happened that the generator went out last week and the man is coming this afternoon to pick it up and fix it. Another “backup” system has failed you, this time through no connection with your behavior at all (uncoupled or independent events, in this case, since the key and the generator are rarely connected). Well, there is always the bus. But not always. The nice old gent has been listening to the radio and tells you the threatened lock-out of the drivers by the bus company has indeed occurred. The drivers refuse to drive what they claim are unsafe buses, and incidentally want more money as well. (A safety system has foiled you, of all things.) You call a cab from your neighbor’s apartment, but none can be had because of the bus strike. (These two events, the bus strike and the lack of cabs, are tightly connected, dependent events, or tightly coupled events, as we shall call them, since one triggers the other.)

You call the interviewer’s secretary and say, “It’s just too crazy to try to explain, but all sorts of things happened this morning and I can’t make the interview with Mrs. Thompson. Can we reschedule it?” And you say to yourself, next week I am going to line up two cars and a cab and make the morning coffee myself. The secretary answers “Sure,” but says to himself, “This person is obviously unreliable; now this after pushing for weeks for an interview with Thompson.” He makes a note to that effect on the record and searches for the most inconvenient time imaginable for next week, one that Mrs. Thompson might have to cancel.

Now I would like you to answer a brief questionnaire about this event. Which was the primary cause of this “accident” or foulup?

1. Human error (such as leaving the heat on under the coffee, or forgetting the keys in the rush)?
Yes______ No______ Unsure______

2. Mechanical failure (the generator on the neighbor’s car)?
Yes______ No______ Unsure______

3. The environment (bus strike and taxi overload)?
Yes______ No______ Unsure______

4. Design of the system (in which you can lock yourself out of the apartment rather than having to use a door key to set the lock; a lack of emergency capacity in the taxi fleet)?
Yes______ No______ Unsure______

5. Procedures used (such as warming up coffee in a glass pot; allowing only normal time to get out on this morning)?
Yes______ No______ Unsure______

If you answered “not sure” or “no” to all of the above, I am with you. If you answered “yes” to the first, human error, you are taking a stand on multiple failure accidents that resembles that of the President’s Commission to Investigate the Accident at Three Mile Island. The Commission blamed everyone, but primarily the operators. The builders of the equipment, Babcock and Wilcox, blamed only the operators. If you answered “yes” to the second choice, mechanical error, you can join the Metropolitan Edison officials who run the Three Mile Island plant. They said the accident was caused by the faulty valve, and then sued the vendor, Babcock and Wilcox. If you answered “yes” to the fourth, design of the system, you can join the experts of the Essex Corporation, who did a study of the control room for the Nuclear Regulatory Commission.

The best answer is not “all of the above” or any one of the choices, but rather “none of the above.” (Of course I did not give you this as an option.) The cause of the accident is to be found in the complexity of the system. That is, each of the failures—design, equipment, operators, procedures, or environment—was trivial by itself. Such failures are expected to occur since nothing is perfect, and we normally take little notice of them. The bus strike would not affect you if you had your car key or the neighbor’s car. The neighbor’s generator failure would be of little consequence if taxis were available. If it were not an important appointment, the absence of cars, buses, and taxis would not matter. On any other morning the broken coffeepot would have been an annoyance (an incident, we will call it), but would not have added to your anxiety and caused you to dash out without your keys.

Though the failures were trivial in themselves, and each one had a backup system or redundant path to tread if the main one were blocked, the failures became serious when they interacted. It is the interaction of the multiple failures that explains the accident. We expect bus strikes occasionally, we expect to forget our keys with that kind of apartment lock (why else hide a redundant key?), we occasionally loan the extra key to someone rather than disclose its hiding place. What we don’t expect is for all of these events to come together at once. That is why we told the secretary that it was a crazy morning, too complex to explain, and invoked Murphy’s law to ourselves (if anything can go wrong, it will).

That accident had its cause in the interactive nature of the world for us that morning and in its tight coupling—not in the discrete failures, which are to be expected and which are guarded against with backup systems. Most of the time we don’t notice the inherent coupling in our world, because most of the time there are no failures, or the failures that occur do not interact. But all of a sudden, things that we did not realize could be linked (buses and generators, coffee and a loaned key) became linked. The system is suddenly more tightly coupled than we had realized. When we have interactive systems that are also tightly coupled, it is “normal” for them to have this kind of an accident, even though it is infrequent. It is normal not in the sense of being frequent or being expected—indeed, neither is true, which is why we were so baffled by what went wrong. It is normal in the sense that it is an inherent property of the system to occasionally experience this interaction.
Three Mile Island was such a normal or system accident, and so were countless others that we shall examine in this book. We have such accidents because we have built an industrial society that has some parts, like industrial plants or military adventures, that have highly interactive and tightly coupled units. Unfortunately, some of these have high potential for catastrophic accidents.

Our “day in the life” example introduced some useful terms. Accidents can be the result of multiple failures. Our example illustrated failures in five components: in design, equipment, procedures, operators, and environment. To apply this concept to accidents in general, we will need to add a sixth area—supplies and materials. All six will be abbreviated as the DEPOSE components (for design, equipment, procedures, operators, supplies and materials, and environment). The example showed how different parts of the system can be quite dependent upon one another, as when the bus strike created a shortage of taxis. This dependence is known as tight coupling. On the other hand, events in a system can occur independently, as we noted with the failure of the generator and forgetting the keys. These are loosely coupled events, because although at this time they were both involved in the same production sequence, one was not caused by the other.

One final point, which our example cannot illustrate: it isn’t the best case of a normal accident or system accident, as we shall use these terms, because the interdependence of the events was comprehensible for the person or “operator.” She or he could not do much about the events singly or in their interdependence, but she or he could understand the interactions. In complex industrial, space, and military systems, the normal accident generally (not always) means that the interactions are not only unexpected, but are incomprehensible for some critical period of time. In part this is because in these human-machine systems the interactions literally cannot be seen. In part it is because, even if they are seen, they are not believed. As we shall find out, and as Robert Jervis and Karl Weick have noted, seeing is not necessarily believing; sometimes we must believe before we can see.

Variations on the Theme

While basically simple, the idea that guides this book has some quite radical ramifications. For example, virtually every system we will examine places “operator error” high on its list of causal factors—generally about 60 to 80 percent of accidents are attributed to this factor. But if, as we shall see time and time again, the operator is confronted by unexpected and usually mysterious interactions among failures, saying that he or she should have zigged instead of zagged is possible only after the fact. Before the accident no one could know what was going on and what should have been done. Sometimes the errors are bizarre. We will encounter “noncollision course collisions,” for example, where ships that were about to pass in the night suddenly turn and ram each other. But careful inquiry suggests that the mariners had quite reasonable explanations for their actions; it is just that the interaction of small failures led them to construct quite erroneous worlds in their minds, and in this case these conflicting images led to collision.

Another ramification is that great events have small beginnings.
Running through the book are accidents that start with trivial kitchen mishaps; we will find them on aircraft and ships and in nuclear plants, having to do with making coffee or washing up. Small failures abound in big systems; accidents are not often caused by massive pipe breaks, wings coming off, or motors running amok. Patient accident reconstruction reveals the banality and triviality behind most catastrophes.

Small beginnings all too often cause great events when the system uses a “transformation” process rather than an additive or fabricating one. Where chemical reactions, high temperature and pressure, or air, vapor, or water turbulence is involved, we cannot see what is going on or even, at times, understand the principles. In many transformation systems we generally know what works, but sometimes do not know why. These systems are particularly vulnerable to small failures that “propagate” unexpectedly, because of complexity and tight coupling. We will examine other systems where there is less transformation and more fabrication or assembly, systems that process raw materials rather than change them. Here there is an opportunity to learn from accidents and greatly reduce complexity and coupling. These systems can still have accidents—all systems can. But they are more likely to stem from major failures whose dynamics are obvious, rather than the trivial ones that are hidden from understanding.

Another ramification is the role of organizations and management in preventing failures—or causing them. Organizations are at the center of our inquiry, even though we will often talk about hardware and pressure and temperature and the like. High-risk systems have a double penalty: because normal accidents stem from the mysterious interaction of failures, those closest to the system, the operators, have to be able to take independent and sometimes quite creative action. But because these systems are so tightly coupled, control of operators must be centralized because there is little time to check everything out and be aware of what another part of the system is doing. An operator can’t just do her own thing; tight coupling means tightly prescribed steps and invariant sequences that cannot be changed. But systems cannot be both decentralized and centralized at the same time; they are organizational Pushme-pullyous, straight out of the Dr. Dolittle stories, trying to go in opposite directions at once. So we must add organizational contradictions to our list of problems.

Even aside from these inherent contradictions, the role of organizations is important in other respects for our story. Time and time again warnings are ignored, unnecessary risks taken, sloppy work done, deception and downright lying practiced. As an organizational theorist I am reasonably unshaken by this; it occurs in all organizations, and it is a part of the human condition. But when it comes to systems with radioactive, toxic, or explosive materials, or those operating in an unforgiving, hostile environment in the air, at sea, or under the ground, these routine sins of organizations have very nonroutine consequences. Our ability to organize does not match the inherent hazards of some of our organized activities. Better organization will always help any endeavor. But the best is not good enough for some that we have decided to pursue. Nor can better technology always do the job.

Besides being a book about organizations (but painlessly, without the jargon and the sacred texts), this is a book about technology.
You will probably learn more than you ever wanted to about condensate polishers, buffet boundaries, reboilers, and slat retraction systems. But that is in passing (and even while passing you are allowed a considerable measure of incomprehension). What is not in passing but is essential here is an evaluation of technology and its “fixes.”

As the saying goes, man’s reach has always exceeded his grasp (and of course that goes for women too). It should be so. But we might begin to learn that of all the glorious possibilities out there to reach for, some are going to be beyond our grasp in catastrophic ways. There is no technological imperative that says we must have power or weapons from nuclear fission or fusion, or that we must create and loose upon the earth organisms that will devour our oil spills. We could reach for, and grasp, solar power or safe coal-fired plants, and the safe ship designs and industry controls that would virtually eliminate oil spills. No catastrophic potential flows from these.

It is particularly important to evaluate technological fixes in the systems that we cannot or will not do without. Fixes, including safety devices, sometimes create new accidents, and quite often merely allow those in charge to run the system faster, or in worse weather, or with bigger explosives. Some technological fixes are error-reducing—the jet engine is simpler and safer than the piston engine; fathometers are better than lead lines; three engines are better than two on an airplane; computers are more reliable than pneumatic controls. But other technological fixes are excuses for poor organization or an attempt to compensate for poor system design. The attention of authorities in some of these systems, unfortunately, is hard to get when safety is involved.

When we add complexity and coupling to catastrophe, we have something that is fairly new in the world. Catastrophes have always been with us. In the distant past, the natural ones easily exceeded the human-made ones. Human-made catastrophes appear to have increased with industrialization as we built devices that could crash, sink, burn, or explode. In the last fifty years, however, and particularly in the last twenty-five, to the usual cause of accidents—some component failure, which could be prevented in the future—was added a new cause: interactive complexity in the presence of tight coupling, producing a system accident. We have produced designs so complicated that we cannot anticipate all the possible interactions of the inevitable failures; we add safety devices that are deceived or avoided or defeated by hidden paths in the systems. The systems have become more complicated because either they are dealing with more deadly substances, or we demand they function in ever more hostile environments or with ever greater speed and volume. And still new systems keep appearing, such as gene splicing, and others grow ever more complex and tightly tied together.

In the past, designers could learn from the collapse of a medieval cathedral under construction, or the explosion of boilers on steamboats, or the collision of railroad trains on a single track. But we seem to be unable to learn from chemical plant explosions or nuclear plant accidents. We may have reached a plateau where our learning curve is nearly flat. It is true that I should be wary of that supposition.
Reviewing the wearisome Cassandras in history who prophesied that we had reached our limit with the reciprocating steam engine or the coal-fired railroad engine reminds us that predicting the course of technology in history is perilous. Some well-placed warnings will not harm us, however.

One last warning before outlining the chapters to come. The new risks have produced a new breed of shamans, called risk assessors. As with the shamans and the physicians of old, it might be more dangerous to go to them for advice than to suffer unattended. In our last chapter we will examine the dangers of this new alchemy, where body counting replaces social and cultural values and excludes us from participating in decisions about the risks that a few have decided the many cannot do without. The issue is not risk, but power.

Fast Forward

Chapter 1 will examine the accident at Three Mile Island (TMI), where there were four independent failures, all small, none of which the operators could be aware of. The system caused that accident, not the operators.

Chapter 2 raises the question of why, if these plants are so complex and tightly coupled, we have not had more TMIs. A review of the nuclear power industry and some of its trivial and its serious accidents will suggest that we have not given large plants of the size of TMI time to express themselves. The record of the industry and the Nuclear Regulatory Commission is frightening, but not because it is all that different from the records of other industries and regulatory agencies. It isn’t. It is frightening because of the catastrophic potential of this industry; it has to have a perfect performance record, and it is far from achieving that.

We can go a fair distance with some loosely defined concepts such as complexity, coupling, and catastrophe, but in order to venture further into the world of high-risk systems we need better definitions, and a better model of systems and accidents and their consequences. This is the work of Chapter 3, where terms are defined and amply illustrated with still more accident stories. In this chapter we explore the advantages of loose coupling, map the industrial, service, and voluntary organizational world according to complexity and coupling, and add a definition of types of catastrophes.

Chapter 4 applies our complexity, coupling, and catastrophe theories to the chemical industry. I wish to make it clear that normal accidents or, as we will generally call them, system accidents, are not limited to the nuclear industry. Some of the most interesting and bizarre examples of the unanticipated interaction of failures appear in this chapter—and we are now talking about a quite well-run industry with ample riches to spend on safety, training, and high-technology solutions. Yet chemical plants mostly just sit there, though occasionally they will send a several-hundred-pound missile a mile away into a community or incinerate a low-flying airplane.

In Chapter 5 we move out into the environment and examine aircraft and flying, and air traffic control and the airports and airways. Flying is in part a transformation system, but largely just very complex and tightly coupled. Technological fixes are made continuously here, but designers and airlines just keep pushing up against the limits with each new advance. Flying is risky, and always will be.
With the airways system, on the other hand, we will examine the actual reduction of complexity and coupling through organizational changes and technological developments; this system has become very safe, as safety goes in inherently risky systems. An examination of the John Wayne International Airport in Orange County, California, will remind us of the inherent risks.

With marine transport, in Chapter 6, the opposite problem is identified. No reduction in complexity or coupling has been achieved. Horrendous tales are told, three of which we will detail, about the needless perils of this system. We will analyze it as one that induces errors through its very structure, examining insurance, shipbuilders, shippers, captains and crews, collision avoidance systems, and the international anarchy that prevents effective regulation and encourages cowboys and hot rodders at sea. One would not think that ships could pile up as if they were on the Long Island Expressway, but they do.

Chapter 7 might seem to be a diversion, since dams, lakes, and mines are not prone to system accidents. But it will support our point because they are also linear, rather than complex, systems, and the accidents there are foreseeable and avoidable. However, when we move away from the individual dam or mine and take into account the larger system in which they exist, we find the “eco-system accident,” an interaction of systems that were thought to be independent but are not, because of the larger ecology. Once we realize this we can prevent future accidents of this type; in linear systems we can learn from our mistakes. Dams, lakes, and mines also simply provide tales worth telling. Do dams sink or float when they fail? Could we forestall a colossal earthquake in California by a series of mammoth chiropractic spinal adjustments? How could we lose a whole lake and barges and tugs in a matter of hours? (By inadvertently creating an eco-system accident.)

Chapter 8 deals with far more esoteric systems. Space missions are very complex and tightly coupled, but the catastrophic potential was small and now is smaller. More important, this system allows us to examine the role of the operator (in this case, extraordinarily well-trained astronauts) whom the omniscient designers and managers tried to treat like chimpanzees. It is a cautionary tale for all high-technology systems. Accidents with nuclear weapons, from dropping them to firing them by mistake, will illustrate a system so complicated and error-prone that the fate of the earth may be decided more by inadvertence than anger. The prospects are, I am afraid, terrifying. Equally frightening is the section in this chapter on gene splicing, or recombinant DNA. In this case, in the unseemly haste for prizes and profits, we have abandoned even the most elementary safeguards, and may loose upon the world a rude beast whose time need not have come.

In the last chapter we shall examine the new shamans, the risk assessors, and their inadvertent allies, the cognitive psychologists. Naturally, as a sociologist, I will have a few sharp words to say about the latter, but point out that their research has really provided the grounds for a public role in high-risk decision making, one that risk assessors do not envisage. Finally, we will add up the credits and deficits of the systems we examined, and I will make a few modest suggestions for complicating the lives of some systems—and shutting others down completely.