Chapter 1
Reliability Theory
A solid foundation in the theory of system reliability is fundamental to the analysis of telecommunications systems. All modern system reliability analysis relies heavily on probability and statistics. This chapter presents the theories, mathematics, and concepts required to analyze telecommunications systems. It begins by presenting the system metrics that are most important to telecommunications engineers, managers, and executives. These metrics are the typical desired output of an analysis, design, or concept. They form the basis of contract language, system specifications, and network design. Without a target metric for design or evaluation, a system can be constructed that fails to meet the end customer's expectations. System metrics are calculated by making assumptions about, or assignments of, statistical distributions. These statistical distributions form the basis for an analysis and are crucial to the accuracy of the system model, so a fundamental understanding of the statistical models used in reliability is important. The statistical distributions commonly used in telecommunications reliability analysis are presented from a quantitative mathematical perspective, along with a review of the basic concepts of probability and statistics that are relevant to reliability analysis.
With the required probability and statistics theory established, this chapter then focuses on techniques of reliability analysis. Assumptions adopted for the failure and repair of individual components or systems are incorporated into models of larger systems made up of many components or subsystems. Several techniques exist for performing system analysis, each with its own advantages and drawbacks. These techniques include reliability block diagrams (RBDs), Markov analysis, and numerical Monte Carlo simulation. The advantages and disadvantages of each approach are discussed along with the technical methodology for conducting each type of analysis.
System sparing considerations are presented in the final section of this chapter. Determining component sparing levels for large systems is a common consideration in telecommunications. That section presents methods for calculating sparing levels based on the RMA repair period, failure rate, and redundancy level.
Chapter 1 makes considerable reference to the well-established and foundational work published in "System Reliability Theory: Models, Statistical Methods and Applications" by M. Rausand and A. Høyland. References to this text are made in Chapter 1 using a superscript 1 indicator.
1.1 System Metrics
System metrics are arguably the most important topic presented in this book. The definitions and concepts of reliability, availability, maintainability, and failure rate are fundamental to both defining and analyzing telecommunications systems. During the analysis phase of a system design, metrics such as availability and failure rate may be calculated as predictive values. These calculated values can be used to develop contracts and guide customer expectations in contract negotiations.
This section discusses the metrics of importance in telecommunications from both a detailed technical perspective and a practical operational perspective. The predictive and empirical calculation of each metric is presented along with caveats associated with each approach.
1.1.1 Reliability
MIL-STD-721C (MILSTD, 1981) gives two complementary definitions of reliability.
1. The duration or probability of failure-free performance under stated conditions.
2. The probability that an item can perform its intended function for a specified interval under stated conditions. (For nonredundant items, this is equivalent to definition 1. For redundant items this is equivalent to the definition of mission reliability.)
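Definition 2 suggests a direct empirical estimate: observe n identical items under the stated conditions and count the fraction still working at the end of the specified interval. The Python sketch below is a minimal illustration under assumed conditions, not a prescribed method. It assumes complete records in which every item was observed for at least the interval t, and items that never failed are recorded with a failure time of `math.inf`. The function name `empirical_reliability` and the sample data are hypothetical. Data in which items leave service before failing (censored data) would require a more careful estimator.

```python
import math
from typing import Sequence


def empirical_reliability(failure_times: Sequence[float], t: float) -> float:
    """Estimate R(t) as the fraction of items with no failure in [0, t].

    Assumption (hypothetical setup): every item was observed under the
    stated conditions for at least t hours; items that never failed are
    recorded with a failure time of math.inf.
    """
    if not failure_times:
        raise ValueError("need at least one observed item")
    survivors = sum(1 for ft in failure_times if ft > t)
    return survivors / len(failure_times)


# Hypothetical failure times (hours) for five items; two never failed.
observed = [120.0, 450.0, 980.0, math.inf, math.inf]
print(empirical_reliability(observed, 500.0))  # 3 of 5 items survive 500 h
```

The same data yield a different estimate for every interval chosen (all five items survive 0 h, but only two survive 1,000 h), so the estimate has no meaning until the interval is stated.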
Both MIL-STD-721C definitions of reliability focus on the same performance measure. The probability of failure-free performance or mission success is the likelihood that the system being examined works for a stated period of time. To quantify, and thus calculate, reliability as a system metric, the terms "stated period" and "stated conditions" must be clearly defined for any system or mission.
The stated period defines the duration over which the system analysis is valid. Without a defined stated period, the term reliability has no meaning, because reliability is a time-dependent function. Once reliability is defined as a statistical probability, quantifying it becomes a problem of distribution selection and metric calculation.
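To illustrate why reliability depends on the stated period, consider the common case of a constant failure rate λ, for which the reliability function is the exponential R(t) = e^(-λt). The constant-failure-rate assumption and the value of λ below are illustrative choices made for this sketch only; they are not claims about any particular telecommunications component. The helper name `exponential_reliability` is hypothetical.

```python
import math


def exponential_reliability(failure_rate_per_hour: float, hours: float) -> float:
    """R(t) = exp(-lambda * t); valid only under an assumed constant failure rate."""
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and time must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)


# Hypothetical component: lambda = 1e-5 failures per hour (illustrative value).
lam = 1e-5
for period in (0.0, 720.0, 8760.0, 87600.0):  # 0 h, ~1 month, 1 year, 10 years
    print(f"t = {period:>8.0f} h  R(t) = {exponential_reliability(lam, period):.4f}")
```

Under these assumptions the same component has a reliability of about 0.99 over a one-month period but below 0.5 over ten years. A single "reliability" figure quoted without its period is therefore ambiguous.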
The stated conditions define the operating parameters under which the reliability function is valid. These conditions are crucial to both defining and limiting the scope under which a reliability analysis or function is valid. Both designers and consumers of telecommunications systems must pay particular attention to the "stated conditions" in order to ensure that the decisions and judgments derived are correct and appropriate.
Reliability viewed from a qualitative perspective often invokes personal experience and perceptions. Qualitative analysis of reliability should be done as a broad-brush or high-level analysis grounded in a quantitative technical understanding of the term. In many cases, qualitative reliability is defined as a sense or "gut feeling" of how well a system can or will perform. Th...