1 An Introduction to Failure Analysis
The people we’ve worked with started doing in-depth failure analysis on industrial equipment in the mid-1960s. Prior to that time there weren’t a lot of industrial failure analyses, and the ones that were done were concerned only with understanding the physical causes. The efforts of those early folks were primarily linked with an interest in improving production equipment reliability and capacity in chemical plants. From their work, and from a manufacturing and processing viewpoint, it wasn’t until the early 1970s that a realization began to develop that the true sources of industrial problems were much more complex.
What is failure analysis? There are probably as many definitions as there are people to ask, but we prefer to think of it as “the process of interpreting the features of a deteriorated system or component to determine why it no longer performs the intended function”. Failure analysis entails using deductive logic to find the physical and human causes of the failure, then using inductive logic to find the latent causes. From an understanding of these “failure roots”, there should be a path to the changes needed to prevent the recurrence of the incident.
Some people in industry prefer not to use the term failure analysis, and more than once we have heard a statement such as “We don’t want our maintenance improvement (or some similar) program being driven by concentrating on failures”. It’s easy to understand their words but impossible to understand their logic. Most of us learn from our mistakes and, in the same manner that professional athletes use when they study game videos or farmers use in analyzing soils and crop yields, failure analysis allows us to look at our weaknesses and errors, gain knowledge from them, and try to do a better job the next time.
This book is an attempt at a manual that explains how and why mechanical machinery fails and how to solve those problems. Realizing that no single text can address all failures, this book tries to explain how the basic failure mechanisms occur, the things we all do to cause machinery problems, how to recognize those things, and what to do to prevent similar incidents in the future. Unfortunately, there is an almost infinite number of failure symptoms and appearances, and the book can’t address all of them. But it should allow the careful reader to analyze and solve by far the majority of the mechanical failures that occur in typical paper mills, chemical plants, power plants, and manufacturing facilities.
THE CAUSES OF FAILURES
Why are there premature equipment failures? When the people closely involved with the failure are asked this question, they almost always say it is “the other guy’s fault”. If one were to ask a plant millwright or a maintenance mechanic, the most likely answer to that question would be “operator error”. But if the same question were asked of an operator who worked in the plant with that millwright, their answer might be “because it wasn’t properly repaired”. At times there is some validity to both of these answers, but the honest and complete answer is always much more complex.
It would be nice and neat if there were only one cause per failure, because eliminating the problem would be easy, but in reality there are multiple causes for every equipment problem. Unfortunately, there are many people who believe that there is only one cause for a failure. However, look at the analysis of any well-studied major disaster and ask if there was only one cause. Was there a single cause for the BP oil well disaster? … Three Mile Island? … the Exxon Valdez mess? … Bhopal? … Chernobyl? … a major airplane crash? The analyses of these and other well-recognized and extensively studied failures show that they all have multiple causes. Why, then, would any intelligent person believe a typical pump or fan failure would be different? In the case of Three Mile Island, there were three huge studies, each commissioned by one of the responsible groups. All three of the studies said there were numerous causes but that it was “primarily the fault of the other two organizations”. In doing failure analyses, it is often amusing to listen to the management staff talk about how the workforce employees “messed up” without any recognition at all of how their own engineering and management practices were involved.
At an international conference on failure analysis, a presentation was made on the causes of aircraft equipment failures. The presentation data showed:
- 30% – Manufacturing Errors
- 26% – Design Errors
- 23% – Maintenance Errors
- 18% – Material Selection
- 3% – Operation
During the question-and-answer session after the presentation, a member of the audience asked the speaker why they had listed only one cause for each failure when there were usually multiple causes. The speaker agreed with the questioner’s point, but then said, “There was only one blank on the form”. This answer is a quote and an interesting testimony to the general public’s lack of perception.
When people discuss cases that have been carefully studied, such as those listed earlier, they almost always agree that there are multiple causes for each. Yet when directly involved with a failure, the ability to be objective seems to disappear and, ignoring reality, many people come up with conclusions such as those in the presentation above. They then take this data, draw an attractive pie chart or bar graph, and point to a nice neat single cause for every failure … when an honest analysis clearly shows that this is neither true nor logical.
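The effect of “only one blank on the form” can be illustrated with a small sketch. The failure records below are invented purely for illustration (they are not the conference data), and the function names are our own. The point is only that forcing one cause per failure always produces percentages that add up to a tidy 100%, while recording every contributing cause shows how often each category is actually involved.

```python
from collections import Counter

# Hypothetical failure records, invented for illustration only.
# Each record lists every cause category that contributed to the failure.
failures = [
    {"id": "pump-01", "causes": ["design", "maintenance", "operation"]},
    {"id": "fan-02", "causes": ["manufacturing", "maintenance"]},
    {"id": "gearbox-03", "causes": ["design", "material selection"]},
    {"id": "pump-04", "causes": ["maintenance", "operation"]},
]


def single_cause_tally(records):
    """Mimic a form with one blank: keep only the first cause listed."""
    return Counter(r["causes"][0] for r in records)


def multi_cause_tally(records):
    """Count every contributing cause for every failure."""
    return Counter(c for r in records for c in r["causes"])


def as_percent(tally, n_failures):
    """Express each count as a percentage of the number of failures."""
    return {cause: 100 * count / n_failures for cause, count in tally.items()}


single = as_percent(single_cause_tally(failures), len(failures))
multi = as_percent(multi_cause_tally(failures), len(failures))

print("One blank on the form:", single, "total =", sum(single.values()))
print("All causes recorded:  ", multi, "total =", sum(multi.values()))
```

In this made-up data set the single-cause tally never mentions operation at all, even though it contributed to half of the failures, and maintenance appears in only a quarter of the failures instead of three quarters. The multi-cause percentages add up to more than 100%, which is exactly what should be expected when most failures have several causes.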
Two comments:
- A. At a later time in the session, another group analyzed the same basic airplane equipment failure data set but reached very different conclusions. They too sorted the data with the idea of a single cause for each failure!
- B. One of the questions that I find interesting is, “Why don’t these people recognize that many failures ha...