
- 360 pages
- English
- ePUB (mobile friendly)
- Available on iOS & Android
eBook - ePub
Architecture Design for Soft Errors
About this book
Architecture Design for Soft Errors provides a comprehensive description of the architectural techniques to tackle the soft error problem. It covers the new methodologies for quantitative analysis of soft errors as well as novel, cost-effective architectural techniques to mitigate them.
To provide readers with a better grasp of the broader problem definition and solution space, this book also delves into the physics of soft errors and reviews current circuit and software mitigation techniques. There are a number of different ways this book can be read or used in a course: as a complete course on architecture design for soft errors covering the entire book; a short course on architecture design for soft errors; and as a reference book on classical fault-tolerant machines.
This book is recommended for practitioners in semi-conductor industry, researchers and developers in computer architecture, advanced graduate seminar courses on soft errors, and (iv) as a reference book for undergraduate courses in computer architecture.
- Helps readers build-in fault tolerance to the billions of microchips produced each year, all of which are subject to soft errors
- Shows readers how to quantify their soft error reliability
- Provides state-of-the-art techniques to protect against soft errors
Frequently asked questions
Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription.
No, books cannot be downloaded as external files, such as PDFs, for use outside of Perlego. However, you can download books within the Perlego app for offline reading on mobile or tablet. Learn more here.
Perlego offers two plans: Essential and Complete
- Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
- Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, weâve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes! You can use the Perlego app on both iOS or Android devices to read anytime, anywhere â even offline. Perfect for commutes or when youâre on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Yes, you can access Architecture Design for Soft Errors by Shubu Mukherjee in PDF and/or ePUB format, as well as other popular books in Computer Science & Computer Engineering. We have over one million books available in our catalogue for you to explore.
Information
CHAPTER 1
Introduction
1.1 Overview
In the past few decades, the exponential growth in the number of transistors per chip has brought tremendous progress in the performance and functionality of semiconductor devices and, in particular, microprocessors. In 1965, Intel Corporationâs cofounder, Gordon Moore, predicted that the number of transistors per chip will double every 18â24 months. The first Intel microprocessor with 2200 transistors was developed in 1971, 24 years after the invention of the transistor by John Bardeen, Walter Brattain, and William Shockley in Bell Labs. Thirty-five years later, in 2006, Intel announced its first billion-transistor ItaniumÂŽ microprocessorâcodenamed Montecitoâwith approximately 1.72 billion transistors. This exponential growth in the number of transistorsâpopularly known as Mooreâs lawâhas fueled the growth of the semiconductor industry for the past four decades.
Each succeeding technology generation has, however, introduced new obstacles to maintaining this exponential growth rate in the number of transistors per chip. Packing more and more transistors on a chip requires printing ever-smaller features. This led the industry to change lithographyâthe technology used to print circuits onto computer chipsâmultiple times. The performance of off-chip dynamic random access memories (DRAM) compared to microprocessors started slowing down resulting in the âmemory wallâ problem. This led to faster DRAM technologies, as well as to adoption of higher level architectural solutions, such as prefetching and multithreading, which allow a microprocessor to tolerate longer latency memory operations. Recently, the power dissipation of semiconductor chips started reaching astronomical proportions, signaling the arrival of the âpower wall.â This caused manufacturers to pay special attention to reducing power dissipation via innovation in process technology as well as in architecture and circuit design. In this series of challenges, transient faults from alpha particles and neutrons are next in line. Some refer to this as the âsoft error wall.â
Radiation-induced transient faults arise from energetic particles, such as alpha particles from packaging material and neutrons from the atmosphere, generating electronâhole pairs (directly or indirectly) as they pass through a semiconductor device. Transistor source and diffusion nodes can collect these charges. A sufficient amount of accumulated charge may invert the state of a logic device, such as a latch, static random access memory (SRAM) cell, or gate, thereby introducing a logical fault into the circuitâs operation. Because this type of fault does not reflect a permanent malfunction of the device, it is termed soft or transient.
This book describes architectural techniques to tackle the soft error problem. Computer architecture has long coped with various types of faults, including faults induced by radiation. For example, error correction codes (ECC) are commonly used in memory systems. High-end systems have often used redundant copies of hardware to detect faults and recover from errors. Many of these solutions have, however, been prohibitively expensive and difficult to justify in the mainstream commodity computing market.
The necessity to find cheaper reliability solutions has driven a whole new class of quantitative analysis of soft errors and corresponding solutions that mitigate their effects. This book covers the new methodologies for quantitative analysis of soft errors and novel cost-effective architectural techniques to mitigate them. This book also reevaluates traditional architectural solutions in the context of the new quantitative analysis. To provide readers with a better grasp of the broader problem definition and solution space, this book also delves into the physics of soft errors and reviews current circuit and software mitigation techniques.
Specifically, this chapter provides a general introduction to and necessary background for radiation-induced soft errors, which is the topic of this book. The chapter reviews basic terminologies, such as faults and errors, and dependability models and describes basic types of permanent and transient faults encountered in silicon chips. Readers not interested in a broad overview of permanent faults could skip that section. The chapter will go into the details of the physics of how alpha particles and neutrons cause a transient fault. Finally, this chapter reviews architectural models of soft errors and corresponding trends in soft error rates (SERs).
1.1.1 Evidence of Soft Errors
The first report on soft errors due to alpha particle contamination in computer chips was from Intel Corporation in 1978. Intel was unable to deliver its chips to AT&T, which had contracted to use Intel components to convert its switching system from mechanical relays to integrated circuits. Eventually, Intelâs May and Woods traced the problem to their chip packaging modules. These packaging modules got contaminated with uranium from an old uranium mine located upstream on Coloradoâs Green River from the new ceramic factory that made these modules. In their 1979 landmark paper, May and Woods [15] described Intelâs problem with alpha particle contamination. The authors introduced the key concept of Qcrit or âcritical charge,â which must be overcome by the accumulated charge generated by the particle strike to introduce the fault into the circuitâs operation. Subsequently, IBM Corporation faced a similar problem of radioactive contamination in its chips from 1986 to 1987. Eventually, IBM traced the problem to a distant chemical plant, which used a radioactive contaminant to clean the bottles that stored an acid required in the chip manufacturing process.
The first report on soft errors due to cosmic radiation in computer chips came in 1984 but remained within IBM Corporation [30]. In 1979, Ziegler and Lanford predicted the occurrence of soft errors due to cosmic radiation at terrestrial sites and aircraft altitudes [29]. Because it was difficult to isolate errors specifically from cosmic radiation, Ziegler and Lanfordâs prediction was treated with skepticism. Then, the duo postulated that such errors would increase with altitude, thereby providing a unique signature for soft errors due to cosmic radiation. IBM validated this hypothesis from the data gathered from its computer repair logs. Subsequently, in 1996, Normand reported a number of incidents of cosmic ray strikes by studying error logs of several large computer systems [17].
In 1995, Baumann et al. [4] observed a new kind of soft errors caused by boron-10 isotopes, which were activated by low-energy atmospheric neutrons. This discovery prompted the removal of boro-phospho-silicate glass (BPSG) and boron-10 isotopes from the manufacturing process, th...
Table of contents
- Cover
- Title page
- Table of Contents
- Copyright
- Dedication
- Foreword
- Preface
- Chapter 1: Introduction
- Chapter 2: Device- and Circuit-Level Modeling, Measurement, and Mitigation
- Chapter 3: Architectural Vulnerability Analysis
- Chapter 4: Advanced Architectural Vulnerability Analysis
- Chapter 5: Error Coding Techniques
- Chapter 6: Fault Detection via Redundant Execution
- Chapter 7: Hardware Error Recovery
- Chapter 8: Software Detection and Recovery
- Index