Distributed Algorithms
eBook - ePub

Nancy A. Lynch

  1. 904 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
About This Book

In Distributed Algorithms, Nancy Lynch provides a blueprint for designing, implementing, and analyzing distributed algorithms. She directs her book at a wide audience, including students, programmers, system designers, and researchers.

Distributed Algorithms contains the most significant algorithms and impossibility results in the area, all in a simple automata-theoretic setting. The algorithms are proved correct, and their complexity is analyzed according to precisely defined complexity measures. The problems covered include resource allocation, communication, consensus among distributed processes, data consistency, deadlock detection, leader election, global snapshots, and many others.

The material is organized according to the system model: first by the timing model and then by the interprocess communication mechanism. The material on system models is isolated in separate chapters for easy reference.

The presentation is completely rigorous, yet is intuitive enough for immediate comprehension. This book familiarizes readers with important problems, algorithms, and impossibility results in the area: readers can then recognize the problems when they arise in practice, apply the algorithms to solve them, and use the impossibility results to determine whether problems are unsolvable. The book also provides readers with the basic mathematical tools for designing new algorithms and proving new impossibility results. In addition, it teaches readers how to reason carefully about distributed algorithms: to model them formally, devise precise specifications for their required behavior, prove their correctness, and evaluate their performance with realistic measures.

Information

Year: 1996
ISBN: 9780080504704
Chapter 1 Introduction

1.1 The Subject Matter

The term distributed algorithms covers a large variety of concurrent algorithms used for a wide range of applications. Originally, this term was used to refer to algorithms that were designed to run on many processors "distributed" over a large geographical area. But over the years, the usage of this term has been broadened, so that it now includes algorithms that run on local area networks and even algorithms for shared memory multiprocessors. This has happened because it has become recognized that the algorithms used in these various settings have a great deal in common.
Distributed algorithms arise in many applications, including telecommunications, distributed information processing, scientific computing, and real-time process control. An important part of the job of building a system for any of these applications is the design, implementation, and analysis of distributed algorithms. The algorithms that arise, and the problems that they are designed to solve, form the subject matter of the field of study covered in this book.
There are many different kinds of distributed algorithms. Some of the attributes by which they differ include
  • The interprocess communication (IPC) method: Distributed algorithms run on a collection of processors, which need to communicate somehow. Some common methods of communication include accessing shared memory, sending point-to-point or broadcast messages (either over a long distance or local area network), and executing remote procedure calls. (A small sketch contrasting shared memory with message passing appears just after this list.)
  • The timing model: Several different assumptions can be made about the timing of events in the system, reflecting the different types of timing information that might be used by algorithms. At one extreme, processors can be completely synchronous, performing communication and computation in perfect lock-step synchrony. At the other extreme, they can be completely asynchronous, taking steps at arbitrary speeds and in arbitrary orders. In between, there is a wide range of possible assumptions that can be grouped together under the designation partially synchronous; in all of these cases, processors have partial information about the timing of events. For example, processors might have bounds on their relative speeds or might have access to approximately synchronized clocks.
  • The failure model: The hardware upon which an algorithm runs might be assumed to be completely reliable. Or, the algorithm might need to tolerate some limited amount of faulty behavior. Such faulty behavior can include processor failures: processors might just stop, with or without warning; might fail transiently; or might exhibit more severe Byzantine failures, where a failed processor can behave arbitrarily. Faulty behavior can also include failures of the communication mechanisms, including message loss or duplication.
  • The problems addressed: Of course, the algorithms also differ in the problems that they are supposed to solve. The typical problems that are considered are those that arise in the application areas mentioned above. They include problems of resource allocation, communication, consensus among distributed processors, database concurrency control, deadlock detection, global snapshots, synchronization, and implementation of various types of objects.
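As a rough illustration of the first attribute above, the following Python sketch contrasts the two main IPC styles: a point-to-point FIFO channel for message passing and a single read/write register for shared memory. The class and method names used here (Channel, Register, send, receive, read, write) are invented for this sketch and are not the formal definitions used later in the book.

    import collections

    class Channel:
        """Point-to-point FIFO channel: the sender appends, the receiver removes."""
        def __init__(self):
            self._queue = collections.deque()

        def send(self, msg):
            self._queue.append(msg)

        def receive(self):
            # Returns None when no message has been delivered yet.
            return self._queue.popleft() if self._queue else None

    class Register:
        """Shared read/write register: any process may overwrite the single stored value."""
        def __init__(self, initial=None):
            self._value = initial

        def write(self, value):
            self._value = value

        def read(self):
            return self._value

    # Message passing: information moves only when a message is sent and later received.
    c = Channel()
    c.send("hello")
    print(c.receive())   # -> hello

    # Shared memory: information is visible to any process as soon as it is written.
    r = Register(0)
    r.write(42)
    print(r.read())      # -> 42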
Some kinds of concurrent algorithms, such as Parallel Random Access Machine (PRAM) algorithms and algorithms for fixed-connection networks (for example, arrays, trees, and hypercubes), are not covered in this book. The algorithms presented here are distinguished within the larger class of concurrent algorithms by having a higher degree of uncertainty and more independence of activities. Some of the types of uncertainty and independence that the algorithms in this book must contend with include
  • unknown number of processors
  • unknown network topology
  • independent inputs at different locations
  • several programs executing at once, starting at different times, and operating at different speeds
  • processor nondeterminism
  • uncertain message delivery times
  • unknown message ordering
  • processor and communication failures
Fortunately, not every algorithm has to contend with all of these types of uncertainty!
Because of all this uncertainty, the behavior of distributed algorithms is often quite difficult to understand. Even though the code for an algorithm may be short, the fact that many processors execute the code in parallel, with steps interleaved in some undetermined way, implies that there are many different ways in which the algorithm can behave, even for the same inputs. Thus, it is generally impossible to understand the algorithm by predicting exactly how it will execute. This can be contrasted with other kinds of parallel algorithms such as PRAM algorithms, for which we can often predict exactly what the algorithm will do at every point in time. For a distributed algorithm, instead of understanding everything about its behavior, the best that we usually can do is to understand certain selected properties of its behavior.
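The following toy Python sketch, not taken from the book, illustrates this point: two processes each read a shared counter and then write back one more than the value they read. Enumerating the legal interleavings of their four steps shows that the same program, on the same input, can end with different final values.

    import itertools

    # Two processes each perform: read the shared counter, then write back (read value + 1).
    # Each process's two steps must stay in order, but steps of different processes may interleave.
    STEPS = ["A.read", "A.write", "B.read", "B.write"]

    def run(schedule):
        counter = 0
        local = {"A": None, "B": None}
        for step in schedule:
            proc, op = step.split(".")
            if op == "read":
                local[proc] = counter
            else:
                counter = local[proc] + 1
        return counter

    # Enumerate every interleaving that preserves each process's own step order.
    results = set()
    for schedule in itertools.permutations(STEPS):
        if schedule.index("A.read") < schedule.index("A.write") and \
           schedule.index("B.read") < schedule.index("B.write"):
            results.add(run(schedule))

    print(results)   # {1, 2}: the final counter value depends on how the steps interleave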
The study of distributed algorithms has developed over the past 15 years into a fairly coherent field. The general style of work in this field is more or less as follows. First, problems of significance in practical distributed computing are identified, and abstract versions of these problems, suitable for mathematical study, are defined. Then, algorithms that solve the problems are developed. These are described precisely and proved to solve the stated problems, and their complexity, according to various measures, is analyzed. Designers of such algorithms typically try to minimize their complexity. Also, impossibility results and lower bounds are proved, demonstrating limitations on what problems can be solved and with what costs. Underlying all of this work are mathematical models for distributed systems.
These results comprise a very interesting mathematical theory. But they are more than a mathematical theory: the problem statements can be used to formulate specifications for portions of real systems, the algorithms can (in many cases) be engineered for practical use, and the impossibility results can help to tell designers when to stop trying to build something. All of these results, as well as the underlying mathematical models, can provide designers with assistance in understanding the systems they build.

1.2 Our Viewpoint

This book contains a study of the field of distributed algorithms. Because this field is so large and active, we cannot give an exhaustive study. Since we have had to select, we have tried to choose the most fundamental results in the area, both theoretically and practically speaking. These are not always the optimal results in terms of the complexity measures; instead, we have favored those that are simple and that illustrate important general methods of design or reasoning. The results we present involve a small number of problems that are typical of this area, including leader election, network searching, spanning tree construction, distributed consensus, mutual exclusion, resource allocation, construction of objects, synchronization, global snapshots, and reliable communication. These problems recur in many different applications. We consider the same problems in several different system models.
One feature of this book is that we present all the algorithms, impossibility results, and lower bounds in terms of a more or less unified formal framework. This framework consists of a small number of formal, automata-theoretic models for various types of distributed systems, together with some standard ways of reasoning about systems using the models. Our framework is automata-theoretic, rather than being based on any particular formal language or formal proof logic; this allows us to present results in terms of basic set-theoretic mathematics without worrying too much about language details. It also allows flexibility, in that a variety of languages and logics could be used to describe and reason about algorithms in the same framework. Using a formal framework permits a rigorous treatment of all the results.
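As a very informal preview of what "automata-theoretic" means here, the sketch below renders a trivial automaton in Python as a start state plus a transition function, and generates one execution as an alternating sequence of states and actions. The names used (ToggleAutomaton, transition, and so on) are invented for this illustration; the precise models are defined later in the book.

    # A loose, informal rendering of an automaton as (states, start state, transitions).
    class ToggleAutomaton:
        """An automaton whose state is a single bit, flipped by the action 'toggle'."""
        start_state = 0
        actions = {"toggle"}

        def transition(self, state, action):
            # Returns the new state reached by performing `action` from `state`.
            if action == "toggle":
                return 1 - state
            raise ValueError("action not enabled")

    # An execution is an alternating sequence: state, action, state, action, ...
    a = ToggleAutomaton()
    state, execution = a.start_state, [a.start_state]
    for action in ["toggle", "toggle", "toggle"]:
        state = a.transition(state, action)
        execution += [action, state]
    print(execution)   # [0, 'toggle', 1, 'toggle', 0, 'toggle', 1]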
Some more remarks about rigor are in order. A rigorous treatment is especially important in the area of distributed algorithms because of the many subtle complications that arise. Without such care, it is difficult to avoid mistakes. However, it is not clear how we could make a completely rigorous presentation both reasonably short and intuitively understandable. In this book, we compromise and use a mixture of intuitive and rigorous reasoning. Namely, we give precise descriptions of appropriate formal models. We sometimes give precise descriptions of algorithms in terms of the formal models, sometimes English descriptions, and sometimes both. The degree of rigor in correctness arguments for algorithms varies greatly: sometimes we give rather formal proofs and sometimes only intuitive sketches. We hope, however, that we have provided enough tools for you to expand our intuitive sketches into formal proofs when you want to. We generally present impossibility arguments rather rigorously, in terms of the formal models.
Because there are so many different settings and problems to consider, it is not obvious how best to organize the presentation of the material. We have chosen to organize it primarily according to the formal models (in particular, according to those aspects of the models that seem to make the most difference in the results), and secondarily by abstract problem statements. The deepest distinctions among the models seem to be based on timing assumptions, but IPC mechanisms and failure assumptions are also important factors.
The timing models we consider are the following.
  • The synchronous model: This is the simplest model to describe, to program, and to reason about. We assume that components take steps simultaneously, that is, that execution proceeds in synchronous rounds. Of course, this is not what actually happens in most distributed systems, but the synchronous model can be useful anyway. Understanding how to solve a problem in the synchronous model is often a useful intermediate step toward understanding how to solve it in more realistic models. For example, it is sometimes possible for a real distributed system to "simulate" a synchronous system. Also, impossibility results for the synchronous model carry over directly to less well-behaved models. On the other hand, it is impossible or inefficient to implement the synchronous model in many types of distributed systems. (A small sketch of computation in synchronous rounds appears after this list.)
  • The asynchronous model: Here we assume that the separate components take steps in an arbitrary order, at arbitrary relative speeds. This model is also reasonably simple to describe, although there are a few subtleties, mainly involving liveness considerations. It is harder to program than the synchronous model because of the extra uncertainty in the order of events. However, the asynchronous model does allow the programmer to ignore specific timing considerations. Since the asynchronous model assumes less about time than is guaranteed by typical distributed systems, algorithms designed for the asynchronous model are general and portable: they are guaranteed to run correctly in networks with arbitrary timing guarantees. On the other hand, the asynchronous model sometimes does not provide enough power to solve problems efficiently, or even to solve them at all.
  • The partially synchronous (timing-based) model: Here we assume some restrictions on the relative timing of events, but execution is not completely lock-step as it is in the synchronous model. These models are the most realistic, but they are also the most difficult to program. Algorithms designed using knowledge of the timing of events can be efficient, but they can also be fragile in that they will not run correctly if the timing assumptions are violated.
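To make the notion of synchronous rounds concrete, here is a small Python simulation of leader election on a ring, in the spirit of the classic LeLann-Chang-Roberts idea: in each round every process simultaneously sends the largest identifier it has seen to its clockwise neighbor, and a process that receives its own identifier declares itself the leader. The simulator and its details (the function elect_leader, the use of exactly n rounds) are an illustrative sketch under these assumptions, not the book's formal presentation.

    # Leader election on a synchronous ring. In each round, all processes send at once,
    # then all receive and update their state; this lock-step structure is what the
    # synchronous model assumes.
    def elect_leader(uids):
        n = len(uids)
        seen = list(uids)            # largest UID each process has seen so far
        leader = [None] * n
        for _ in range(n):           # after n rounds the maximum UID has gone all the way around
            # All sends happen simultaneously: process i receives from process (i - 1) mod n.
            incoming = [seen[(i - 1) % n] for i in range(n)]
            for i in range(n):
                if incoming[i] == uids[i]:
                    leader[i] = uids[i]      # my own UID came back around: I am the leader
                seen[i] = max(seen[i], incoming[i])
        return leader

    print(elect_leader([3, 7, 2, 9, 4]))
    # -> [None, None, None, 9, None]: only the process with UID 9 declares itself leader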
The next basis we use for classification is the IPC mechanism. In this book, we consider both shared memory and message passing. We present the shared memory model first, because it is more powerful and simpler to understand, and because many of the techniques and results for the shared memory setting can be adapted for use in the network setting. Next, we organize the material according to the problem studied. And finally, we study many of the problems under different failure assumptions. You should see, as we present the same problems in a variety of different models, that apparently minor differences in assumptions can make a big difference in the results. We have tried to identify and highlight such differences.
We have tried to make our presentation as modular as possible by composing algorithms to obtain other algorithms, by developing algorithms using levels of abstraction, and by transforming algorithms for one model into algorithms for other models. This helps greatly to reduce the complexity of the ideas and allows us to accomplish more with less work. The same kinds of modularity can serve the same purposes in practical distributed system design.

1.3 Overview of Chapters 2–25

The specific topics that this book covers are as ...
