Control System Safety and Reliability
Safety and reliability have been essential parameters of automatic control systems design for decades. It is clearly recognized that a safe and reliable system provides many benefits. Economic benefits include less lost production, higher quality product, reduced maintenance costs, and lower risk costs. Other benefits include regulatory compliance, the ability to schedule maintenance, and many others, including peace of mind and the satisfaction of a job well done.
Given the importance of safety and reliability, how are they achieved? How are they measured? The science of Reliability Engineering has advanced quite a bit in recent decades. That science offers a number of fundamental concepts used to achieve high reliability and high safety. These concepts include high-strength design, fault-tolerant design, on-line failure diagnostics, and high-common-cause strength. All of these important concepts will be developed in later chapters of this book. When these concepts are actually understood and used, great benefits can result.
Reliability and safety are measured using a number of well-defined parameters including Reliability, Availability, MTTF (Mean Time To Failure), RRF (Risk Reduction Factor), PFD (Probability of Failure on Demand), PFDavg (Average Probability of Failure on Demand), PFS (Probability of Safe Failure), and other special metrics. These terms have been developed over the last 60 years or so by the reliability and safety engineering community.
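Several of these parameters are linked by simple formulas. The following minimal Python sketch illustrates a few of them under a constant failure rate assumption. The function names and the mean time to restore (`mttr_hours`) parameter are illustrative assumptions, not notation taken from this chapter.

```python
import math


def reliability(failure_rate_per_hour: float, mission_hours: float) -> float:
    """Probability of operating without failure for the whole mission.

    Assumes a constant failure rate (exponential model).
    """
    return math.exp(-failure_rate_per_hour * mission_hours)


def mttf(failure_rate_per_hour: float) -> float:
    """Mean Time To Failure, valid only for a constant failure rate."""
    return 1.0 / failure_rate_per_hour


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a repairable device.

    mttr_hours (mean time to restore) is an assumed input.
    """
    return mttf_hours / (mttf_hours + mttr_hours)


def risk_reduction_factor(pfd_avg: float) -> float:
    """RRF is the reciprocal of the average probability of failure on demand."""
    return 1.0 / pfd_avg


if __name__ == "__main__":
    print(availability(9990.0, 10.0))      # fraction of time the device is working
    print(risk_reduction_factor(0.01))     # risk reduction for PFDavg = 0.01
```

For example, a safety function with a PFDavg of 0.01 fails on one demand in a hundred on average, which corresponds to a risk reduction factor of 100.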
Reliability Engineering
The science of reliability engineering has developed a number of qualitative and semi-quantitative techniques that allow an engineer to understand system operation in the presence of a component failure. These techniques include failure modes and effects analysis (FMEA), qualitative fault tree analysis (FTA), and hazard and operability studies (HAZOP). Other techniques based on probability theory and statistics allow the control engineer to quantitatively evaluate the reliability and safety of control system designs. Reliability block diagrams and fault trees use combinational probability to evaluate the system-level probability of success, probability of safe failure, or probability of dangerous failure. Another popular technique, Markov modeling, represents system success and failure conditions as circles called states, with transitions between them. These techniques will be covered in this book.
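The combinational probability behind a reliability block diagram can be sketched in a few lines of Python. This is a minimal illustration that assumes all blocks fail independently; the component values in the usage example are hypothetical.

```python
from math import prod


def series(reliabilities):
    """All blocks must work: multiply the probabilities of success."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one block must work: one minus the chance that all fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


if __name__ == "__main__":
    # Hypothetical loop: one sensor, one controller, two redundant valves.
    sensor, controller, valve = 0.99, 0.995, 0.95
    system = series([sensor, controller, parallel([valve, valve])])
    print(f"System probability of success: {system:.6f}")
```

The independence assumption matters: common-cause failures, discussed later, reduce the benefit of the parallel branch.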
Life-cycle cost modeling may be the most useful technique of all to answer questions of optimal cost and justification. Using this analysis tool, the output of a reliability analysis in the language of statistics is converted to the clearly understood language of money. It is frequently quite surprising how much money can be saved using reliable and safe equipment. This is especially true when the cost of failure is high.
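As a rough illustration of how a reliability result becomes a cost figure, the sketch below adds the expected cost of failures to purchase and operating costs. It is a deliberately simplified model: it assumes a constant expected number of failures per year, ignores discounting, and uses hypothetical numbers throughout.

```python
def life_cycle_cost(purchase_cost, annual_operating_cost,
                    failures_per_year, cost_per_failure, years):
    """Undiscounted life-cycle cost under a constant expected failure frequency."""
    expected_failure_cost = failures_per_year * cost_per_failure * years
    return purchase_cost + annual_operating_cost * years + expected_failure_cost


if __name__ == "__main__":
    # Hypothetical comparison over five years with a high cost of failure.
    basic = life_cycle_cost(50_000, 2_000, 0.5, 100_000, 5)
    robust = life_cycle_cost(120_000, 2_000, 0.1, 100_000, 5)
    print(f"Basic system:  {basic:,.0f}")
    print(f"Robust system: {robust:,.0f}")
```

With these assumed inputs the more expensive but more reliable system has the lower total cost, because the expected cost of failure dominates the purchase price.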
Reliability engineering is built upon a foundation of probability and statistics. But a successful control system reliability evaluation depends just as much on control and safety systems knowledge. This knowledge includes an understanding of the components used in these systems, the component failure modes and their effect on the system, and the system failure modes and failure stress sources present in the system environment. Thus logic, systems engineering, and some mathematics are combined to complete the tool-set needed for reliability and safety evaluation. Real-world factors (including on-line diagnostic capability, repair times, software failures, human failures, common-cause failures, failure modes, and time-dependent failure rates) must be addressed in a complete analysis.
Perspective
The field of reliability engineering is relatively new compared to other engineering disciplines, with significant research having been driven by military needs in the mid-1940s. Introductory work in hardware reliability was done in conjunction with the German V2 rocket program, where innovations such as the 2oo3 (two out of three) voting scheme were invented [Ref. 1, 2]. Human reliability research began with American studies done on radar operators and gunners during World War II. Military systems were among the first to reach complexity levels at which reliability engineering became important. Methods were needed to answer important questions, such as: "Which configuration is more reliable on an airplane, four small engines or two large engines?"
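Voting schemes such as 2oo3 can be evaluated with the binomial distribution. The sketch below is a minimal illustration that assumes identical, independent units. The engine comparison at the end rests on two further hypothetical assumptions: every engine has the same reliability, and the airplane can fly on half of its engines.

```python
from math import comb


def k_out_of_n(k: int, n: int, unit_reliability: float) -> float:
    """Probability that at least k of n identical, independent units work."""
    r = unit_reliability
    return sum(comb(n, i) * r**i * (1.0 - r)**(n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # 2oo3 voting with a unit reliability of 0.9 (equals 3r^2 - 2r^3).
    print(k_out_of_n(2, 3, 0.9))

    # Hypothetical engine question: two engines needing one (1oo2)
    # versus four engines needing two (2oo4), same per-engine reliability.
    print(k_out_of_n(1, 2, 0.99), k_out_of_n(2, 4, 0.99))
```

Under those assumptions the four-engine configuration comes out ahead, but the answer changes if the engines differ in reliability or if the flight requirement is different, which is exactly why a quantitative method is needed.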
Control systems and safety protection systems have also followed an evolutionary path toward greater complexity. Early control systems were simple. Push buttons and solenoid valves, sight gauges, thermometers, and dipsticks were typical control tools. Later, single loop pneumatic controllers dominated. Most of these machines were not only inherently reliable; many also failed in predictable ways. With a pneumatic system, when the air tubes leaked, the output went down. When an air filter clogged, the output went to zero. When the hissing noise changed, a good technician could "run diagnostics" just by listening to determine where the problem was. Safety protection systems were built from relays and sensing switches. With the addition of safety springs and special contacts, these devices would virtually always fail with the contacts open. Again, they were simple devices that were inherently reliable with predictable, (mostly) fail-safe failure modes.
The inevitable need for better processes eventually pushed control systems to a level of complexity at which sophisticated electronics became the optimal solution for control and safety protection. Distributed microcomputer-based controllers introduced in the mid-1970s offered economic benefits, improved reliability, and flexibility.
The level of complexity in our control systems has continued to increase, and programmable electronic systems have become the standard. Systems today utilize a hierarchical collection of computers of all sizes, from microcomputer-based sensors to world-wide computer communication networks. Industrial control and safety protection systems are now among the most complex systems anywhere. These complex systems are the type that can benefit most from reliability engineering. Control systems designers need answers to their questions: "Which control architecture gives the best reliability for the application?" "What combination of systems will give me the lowest cost of ownership for the next five years?" "Should I use a personal computer to control our reactor?" "What architecture is needed to meet SIL3 safety requirements?"
These questions are best answered using quantitative reliability and safety analysis. Markov analysis has been developed into one of the best techniques for answering these questions, especially when time-dependent variables such as imperfect proof testing are important. Failure Modes, Effects, and Diagnostic Analysis (FMEDA) has been developed and refined as a new tool for quantitative measurement of diagnostic capability. These new tools and refined methods have made it easier to optimize designs using reliability engineering.
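To give a feel for Markov analysis, the sketch below steps a two-state model (OK and Failed) forward in time. It is a minimal illustration and not the full method developed later: it assumes a single repairable component with constant failure and repair rates, and it approximates the continuous model with small discrete time steps. The rate values are hypothetical.

```python
def probability_ok(failure_rate: float, repair_rate: float,
                   hours: float, step: float = 1.0) -> float:
    """Probability of being in the OK state after `hours`, starting in OK.

    Each step moves a fraction of the probability between the two states:
    OK -> Failed at failure_rate * step, Failed -> OK at repair_rate * step.
    """
    p_ok, p_failed = 1.0, 0.0
    for _ in range(int(hours / step)):
        to_failed = p_ok * failure_rate * step
        to_ok = p_failed * repair_rate * step
        p_ok = p_ok - to_failed + to_ok
        p_failed = p_failed + to_failed - to_ok
    return p_ok


if __name__ == "__main__":
    # Hypothetical rates: one failure per 1000 h, average repair time 10 h.
    print(probability_ok(failure_rate=0.001, repair_rate=0.1, hours=1000.0))
```

After enough steps the result settles at repair_rate / (failure_rate + repair_rate), the steady-state availability. Real models add more states, for example to separate safe failures from dangerous ones or to represent imperfect proof testing.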
Standards
Many new international standards have been created in the world of reliability engineering. Standards now provide detailed methods of determining component failure rates [Ref. 3]. Standards provide checklists of issues that should be addressed in qualitative evaluation. Standards define performance measures against which quantitative reliability and safety calculations can be compared. Standards also provide explanations and examples of how systems can be designed to maximize safety and reliability.
Several of these international standards play an important role in the safety and reliability evaluation of control systems. The ISA-84.01 standard [Ref. 4], Applications of Safety Instrumented Systems for the Process Industries, was a pioneering effort and first described quantitative means to show safety integrity (Figure 1-1). It also described the boundaries of the Safety Instrumented System (SIS) and the Basic Process Control System (BPCS). When used with ANSI/ISA-91.01 [Ref. 5], which provides definitions to identify components of a safety critical system, various plant equipment can be classified into the proper group.
Figure 1-1. Safety Integrity Levels (SIL)
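The figure is not reproduced in this text. As a hedged illustration of how a calculated PFDavg is compared against safety integrity levels, the sketch below uses the commonly cited order-of-magnitude bands for low-demand operation. The band edges are an assumption here and should be checked against the applicable standard; ISA-84.01-1996 itself addressed SIL 1 through SIL 3 only.

```python
# Assumed low-demand bands: (lower PFDavg, upper PFDavg, SIL).
SIL_BANDS = [
    (1e-5, 1e-4, 4),
    (1e-4, 1e-3, 3),
    (1e-3, 1e-2, 2),
    (1e-2, 1e-1, 1),
]


def sil_from_pfd_avg(pfd_avg: float):
    """Return the SIL whose assumed band contains pfd_avg, or None if outside."""
    for lower, upper, sil in SIL_BANDS:
        if lower <= pfd_avg < upper:
            return sil
    return None


if __name__ == "__main__":
    for value in (5e-2, 5e-3, 5e-4):
        print(value, "->", sil_from_pfd_avg(value))
```

Each step up in SIL corresponds to a tenfold reduction in the allowed average probability of failure on demand, or equivalently a tenfold increase in the risk reduction factor.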
ISA-84.01 also pioneered the concept of a "safety life-cycle," a systematic design process that begins with conceptual process design and ends with SIS decommissioning. A simplified version of the safety life-cycle chart is shown in Figure 1-2.
Figure 1-2. Simplified Safety Life-cycle (SLC)
The original ISA-84.01-1996 standard has been replaced by the updated...