Design for Reliability

Dev G. Raheja, Louis J. Gullo

About This Book

A unique, design-based approach to reliability engineering

Design for Reliability provides engineers and managers with a range of tools and techniques for incorporating reliability into the design process for complex systems. It clearly explains how to design for zero failure of critical system functions, leading to enormous savings in product life-cycle costs and a dramatic improvement in the ability to compete in global markets.

Readers will find a wealth of design practices not covered in typical engineering books, allowing them to think outside the box when developing reliability requirements. They will learn to address high failure rates associated with systems that are not properly designed for reliability, avoiding expensive and time-consuming engineering changes, such as excessive testing, repairs, maintenance, inspection, and logistics.

Special features of this book include:

  • A unified approach that integrates ideas from computer science and reliability engineering
  • Techniques applicable to reliability as well as safety, maintainability, system integration, and logistic engineering
  • Chapters on design for extreme environments, developing reliable software, design for trustworthiness, and HALT influence on design

Design for Reliability is a must-have guide for engineers and managers in R&D, product development, reliability engineering, product safety, and quality assurance, as well as anyone who needs to deliver high product performance at a lower cost while minimizing system failure.

Chapter 1
Design for Reliability Paradigms
Dev Raheja

Why Design for Reliability?

The science of reliability has not kept pace with user expectations. Many corporations still use MTBF (mean time between failures) as a measure of reliability, which, depending on the statistical distribution of failure data, implies acceptance of roughly 50 to 70% failures during the time indicated by the MTBF. No user today can tolerate such a high number of failures. Ideally, a user does not want any failures for the entire expected life! The life expected is determined by the life inferred by users, such as 100,000 miles or 10 years for an automobile, at least 10 years for kitchen appliances, and at least 20 years for a commercial airliner. Most commercial companies, such as automotive and medical device manufacturers, have stopped using the MTBF measure and aim at 1 to 10% failures during a self-defined time. This is still not in line with users' dreams. The real question is: Why not design for zero failures if we can increase profits and gain more market share? Zero failures implies zero mission-critical failures or zero safety-critical system failures. As a minimum, systems in which failures can lead to catastrophic consequences must be designed for zero failures. There are companies that are able to do this. Toyota, Apple, Gillette, Honda, Boeing, Johnson & Johnson, Corning, and Hewlett-Packard are a few examples.
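The dependence of the failure fraction on the life distribution can be checked numerically. The following is a minimal sketch, not taken from the book: the function names, the 1000-hour MTBF, and the Weibull shape values are illustrative assumptions. It computes the fraction of units expected to have failed by the time the elapsed time equals the mean life.

```python
import math


def fraction_failed_exponential(t, mtbf):
    """Fraction of units failed by time t under a constant failure rate."""
    return 1.0 - math.exp(-t / mtbf)


def fraction_failed_weibull(t, shape, scale):
    """Fraction of units failed by time t for a Weibull life distribution."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_mean(shape, scale):
    """Mean life of a Weibull distribution."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def fraction_failed_normal(t, mean, std_dev):
    """Fraction of units failed by time t for a normal life distribution."""
    return 0.5 * (1.0 + math.erf((t - mean) / (std_dev * math.sqrt(2.0))))


def fraction_failed_at_mean_weibull(shape, scale=1.0):
    """Fraction failed when the elapsed time equals the Weibull mean life."""
    return fraction_failed_weibull(weibull_mean(shape, scale), shape, scale)


if __name__ == "__main__":
    mtbf = 1000.0  # hypothetical MTBF in hours (assumption for illustration)
    print(f"exponential: {fraction_failed_exponential(mtbf, mtbf):.1%}")
    print(f"normal:      {fraction_failed_normal(mtbf, mtbf, 100.0):.1%}")
    for shape in (1.0, 2.0, 3.5):  # illustrative Weibull shape parameters
        print(f"Weibull shape {shape}: {fraction_failed_at_mean_weibull(shape):.1%}")
```

Under the constant-failure-rate (exponential) assumption, about 63% of units have failed by the MTBF. For a symmetric distribution such as the normal, the figure is 50%.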
The aim of design for reliability (DFR) is to design-out failures of critical system functions in a system. The number of such failures should be zero for the expected life of the product. Some components may be allowed to fail, such as in redundant systems. For example, in aerospace, as long as a system can function at least for the duration of the mission and the failed components are replaced prior to the next mission to maintain redundancy, certain failures can be tolerated. This is, however, insufficient for complex systems where thousands of software interactions, hundreds of wiring connections, and hundreds of human factors affect the systems' reliability. Then there are issues of compatibility [1] among components and materials, among subsystems, and among hardware and software interactions. Therefore, for complex systems we may find it impossible to have zero failures, but we must at least prevent the potential failures we know about. Since failures can come from unknown and unexpected interactions, we should try to design-in fallback modes for unexpected events. A “what-if” analysis usually points to some events of this type. To minimize failures in complex systems, in this book we describe techniques for improving software and interface reliability.
As indicated earlier, some companies have built a strong and long-lasting reputation for reliability based on aiming at zero failures. Toyota and Sony built their world leadership mostly on high reliability, and Hyundai has been offering a 10-year warranty and increasing its market share steadily. In 1974, when nobody in the world gave a warranty longer than one year, Cooper Industries gave a 15-year warranty to electric power utilities on high-voltage transformer components and stood out as the leader in profitability among all Fortune 500 electrical companies. Progress has been made since then. Raytheon has established a culture at the highest level in the corporation of providing customers with mission assurance through a “no doubt” mindset. Says Bill Swanson, chairman and CEO of Raytheon: “[T]here must be no doubt that our products will work in the field when they are needed” (Raytheon Company, Technology Today, 2005, Issue 4). Similarly, with its new lifetime power train warranty, Chrysler is creating new standards for reliability.

Reflections on the Current State of the Art

Reliability is defined as the probability of performing all the functions (including safety functions) satisfactorily for a specified time and specified use conditions. The functions and use conditions come from the specification. If a specification misses or is vague 60% or more of the time, the reliability predictions are of very little value. This is usually the case [2]. The second big issue is: How many failures should be tolerable? Some readers may not agree that we can design for zero critical failures, but the evidence suggests that we can. We may not be able to prevent failures that we did not foresee, but we can design out all the critical failure modes that we discover during the requirements analysis and in the failure mode and effects analysis (FMEA). In over 30 years' experience, I have yet to encounter a failure mode that cannot be designed-out. The cost is usually not an issue if the FMEA is conducted and the improvements are made during the early design stage. The time specified for critical failures in the reliability definition should be the entire lifetime expected.
In this chapter we address how to write a good system specification and how to design so as not to fail. We make it clear that the design for reliability should concentrate on the critical and major failures. This prevents us from solving easy problems and ignoring the complex ones. The following incident raises issues that are central to designing for reliability.
The lessons learned from the collapse of the Interstate 35 bridge in Minnesota into the Mississippi River on August 1, 2007, which killed 13 people, give us some clues about what needs to be done. Similar failure mechanisms can be found in many large electrical and mechanical systems, such as aircraft and electric power plants.
The bridge was expanded from four lanes to six, and eventually to eight. Some wonder whether that might have played a role in its collapse. Investigators said the failure resulted from a flaw in its design. The designers had specified a metal plate that was too thin to serve as a junction of several girders.
Like many products, it gradually got exposed to higher loads, adding strain to the weak spot. At the time of the collapse, the maintenance crews had brought tons of equipment and material onto the deck for a repair job. The bridge was of a design known as a nonredundant structure, meaning that if a single part failed, the entire structure could collapse. Experts say that the pigeon dung all over the steel could have caused faster corrosion than was predicted.
This case history challenges the fundamentals of engineering taught in the universities.
  • Should the design margin be 100% or 800%? How does the designer determine the design margin?
  • Should we design for pigeons doing their dirty job? What about designing for all the other environmental stressors, such as chemicals sprayed during snow emergencies, tornados, and earthquakes?
  • Should we design-in redundancy on large mechanical systems to avoid disasters? The wisdom says that redundancy delays failures but may not avoid disasters. The failure could occur in both the redundant paths, such as in an aircraft accident where the flying debris cut through all three redundant hydraulic lines.
  • Should we design for sudden shocks experienced by the bridge during repair and maintenance?
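The point that redundancy may delay failures without preventing disasters can be illustrated with a simple common-cause calculation. This is a minimal sketch under stated assumptions, not a method from the book: it uses a beta-factor style model in which a fraction `beta` of each channel's failure probability comes from a shared cause that disables all channels at once (such as debris cutting every hydraulic line). The function name and all numbers are hypothetical.

```python
def system_failure_probability(p_channel, n_channels, beta=0.0):
    """Failure probability of n redundant channels with a shared-cause fraction.

    p_channel  -- failure probability of one channel over the mission (assumed)
    n_channels -- number of redundant channels; the system fails only if all fail
    beta       -- assumed fraction of channel failures due to a common cause
    """
    if not 0.0 <= p_channel <= 1.0 or not 0.0 <= beta <= 1.0 or n_channels < 1:
        raise ValueError("probabilities must be in [0, 1] and n_channels >= 1")
    p_common = beta * p_channel               # one event takes out every channel
    p_independent = (1.0 - beta) * p_channel  # failures unique to one channel
    # Either the common cause occurs, or it does not and all channels
    # fail independently.
    return p_common + (1.0 - p_common) * p_independent ** n_channels


if __name__ == "__main__":
    p = 0.01  # hypothetical per-channel failure probability
    for beta in (0.0, 0.01, 0.1):
        print(f"beta={beta}: {system_failure_probability(p, 3, beta):.2e}")
```

With fully independent channels, three redundant paths multiply out to a very small failure probability. Once even a modest shared cause is assumed, the common-cause term dominates and adding more channels helps very little, which is why the failure scenario has to be foreseen in the specification rather than covered by redundancy alone.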
These concerns apply to any product, such as electronics, electrical power systems, and even a complex software design. In software, the corrosion can be symbolic for applying too many patches without knowing the interactions. Call it “software corrosion.”
The answers to the questions above should be a resounding “yes.” An engineering team should foresee all these and many more failure scenarios before starting to design. The obvious strategy is to write a good system specification by first predicting all major potential failures and avoiding them by writing robust requirements. Oversights and omissions in specifications are the biggest weakness in the design for reliability. Typically, 200 to 300 requirements are missing or vague for a reasonably complex system such as an automotive transmission.
Analysis techniques covered in this book for hardware and software help us discover many missing requirements, and a good brainstorming session for overlooked requirements always results in discovering many more. What we really need, perhaps, are paradigms based on lessons learned.

The Paradigms for Design for Reliability

Reliability is a process. If the right process is followed, results are likely to be right. The opposite is also true in the absence of the right process. There is a saying: “If we don't know where we are going, that's where we will go.” It is difficult enough to do the right things, but it is even more difficult to know what the right things are!
Knowledge of the right things comes from practicing the use of lessons learned. Just having all the facts at your fingertips does not work. One must utilize the accumulated knowledge for arriving at correct decisions. Theory is not enough. One must keep becoming better by practicing. Take the example of swimming. One cannot learn to swim from books alone; one must practice swimming. It is okay to fail as long as mistakes are the stepping stones to failure prevention. Thomas Edison was reminded that he failed 2000 times before the success of the light bulb. His answer, “I never failed. There were 2000 steps in this process.”
One of the best techniques is to use lessons learned in the form of paradigms. They are easy to remember and they make good topics for brainstorming during design reviews.

Paradigm 1: Learn To Be Lean Instead of Mean

When engineers say that a component's life is five years, they usually imply the calculation of the mean value, which says that there is a 50% chance of failure during the five years. In other words, either the supplier or the customer has to pay for 50% failures during the product cycle. This is expensive for both: a lose–lose situation. Besides, there are many indirect expenses: for warranties, production testing, and more inventories to replace failed parts. This is mean management. It has a negative return on investment. It is mean to the supplier because of loss of future business and mean to the customer in putting up with the frustrations of downtime and the cost of business interruptions. Therefore, our failure rate goal should be as lean as possible. Engineers should promise minimum life to customers, not mean life. Never use averages in reliability; they are of no use to anyone.
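The gap between mean life and a minimum life that can be promised to customers can be made concrete. The following is a minimal sketch, assuming a Weibull life distribution with an illustrative shape parameter of 2 and the five-year mean life from the example above. The function names are hypothetical, and the "B-life" convention (B10 is the time by which 10% of units have failed) is a common reliability term that the chapter itself does not use.

```python
import math


def weibull_scale_from_mean(mean_life, shape):
    """Weibull scale parameter that produces the given mean life."""
    return mean_life / math.gamma(1.0 + 1.0 / shape)


def life_at_fraction_failed(fraction, shape, scale):
    """Time by which the given fraction of units has failed (0.10 gives B10)."""
    return scale * (-math.log(1.0 - fraction)) ** (1.0 / shape)


def fraction_failed(t, shape, scale):
    """Fraction of units failed by time t for a Weibull life distribution."""
    return 1.0 - math.exp(-((t / scale) ** shape))


if __name__ == "__main__":
    mean_life_years = 5.0  # the "five-year life" quoted as a mean
    shape = 2.0            # assumed wear-out shape, for illustration only
    scale = weibull_scale_from_mean(mean_life_years, shape)

    b10 = life_at_fraction_failed(0.10, shape, scale)
    b1 = life_at_fraction_failed(0.01, shape, scale)
    print(f"failed by mean life: {fraction_failed(mean_life_years, shape, scale):.0%}")
    print(f"B10 life: {b10:.2f} years")
    print(f"B1 life:  {b1:.2f} years")
```

Under these assumptions, a component advertised with a five-year mean life reaches 10% failures in roughly 1.8 years and 1% failures in roughly 0.6 years. A life figure tied to a small allowed failure fraction is therefore far shorter than the mean, and it is much closer to what a customer actually experiences.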

Paradigm 2: Spend a Lot of Time on Requirement Analysis

It is worth repeating that the sources of most failures are incomplete, ambiguous, and poorly defined requirements. That is why we introduce unnecessary design changes and write deviations when we are in a hurry to ship a product. Look particularly for missing functions in the specifications. There is often practically nothing in a specification about modularity, reliability, safety, serviceability, logistics, human factors, reduction of “no faults found,” diagnostics capability, and prevention of warranty failures. Very few specifications address even obvious requirements, such as internal interface, external interface, user–hardware interface, user–software interface, and how the product should behave if and when a sneak failure occurs. Developing a good specification is an iterative process with inputs from the customer and the entities that are downstream in the process. Those who are trying to build reliability around a faulty specification should only expect a faulty product. Unfortunately, most companies think of reliability when the design is already approved. At this stage there is no budget and no time for major design changes. The only thing a company can do is to hope for reasonable reliability and commit to do better the next time.
To identify missing functions, a cross-functional team is necessary. At least one member from each discipline should be present, such as manufacturing, field service, and marketing, as well as a customer representative. If the specification contains only 50% of the necessary features, how can one even think of reliability? Reliability is not possible without accurate and comprehensive specifications. Therefore, writing accurate performance specifications is a prerequisite for reliability. Such specifications should aim at zero failures for the modes that result in product recalls, high downtime, and inability to diagnose. My interviews with those attending my reliability courses reveal that the dealers are unable to diagnose about 65% of the problems (no faults found). Obviously, fault isolation requirements in the specifications are necessary to reduce downtime.
To ensure the accuracy and completeness of a specification, only those who have knowledge of what makes a good specification should approve it. They must ensure that the specification is clear on what the product should never do, however stupid it may sound. For example: “There shall be no sudden acceleration during landing” for an aircraft. In addition, the marketing and sales experts should participate in writing the specification to make sure that old warranty problems “shall not” be in the new product and that there is enough gain in reliability to give the product a competitive edge.
The “shall not” specification is not limited to failures. That would be too simple. We must be able to see the complexity in this simplicity. This is called interconnectedness. We need to know that reliability is intertwined with many elements of life-cycle costs. The costs of downtime, repairs, preventive maintenance, amount of logistics support required, safety, diagnostics, and serviceability are dependent on the level of reliability. In the same spirit, we should also analyze product friendliness and modularity, which are interconnected with reliability. For example, General Motors is designing its hydrogen cars to have a single chassis for all models instead of 80 different chassis as is the case with current production. This action influences reliability in many ways. Similarly, an analysis of downtime should be conducted by service engineering staff to ensure that each fault will be diagnosed in a timely manner, repairs will be quick, and life-cycle costs will be reduced by extending the maintenance cycles or eliminating the need for maintenance altogether. The specification should be critiqued for quick serviceability and ease of access. Until the specification is written thoroughly and approved, no design work should begin. An example of the need to identify missing requirements is that nearly 1000 people around the world lost their lives while the kinks were being removed from the 290-ton McDonnell Douglas DC-...
