High Performance Parallel Runtimes
eBook - ePub

High Performance Parallel Runtimes

Design and Implementation

Michael Klemm, Jim Cownie

  1. 356 pages
  2. English
  3. ePub (mobile-friendly)
  4. Available on iOS and Android
About This Book

This book focuses on the theoretical and practical aspects of parallel programming systems for today's high-performance multi-core processors and discusses the efficient implementation of key algorithms needed to implement parallel programming models. Such implementations need to take into account the specific features of the underlying computer architecture and of the execution environment. The book briefly reviews key concepts of modern computer architecture, focusing particularly on the performance of parallel codes, as well as the relevant concepts of parallel programming models. It then turns to the fundamental algorithms used to implement these programming models and discusses how they interact with modern processors.

While the book focuses on the general mechanisms, we will mostly use the Intel processor architecture to exemplify the implementation concepts discussed, but will present other processor architectures where appropriate. All algorithms and concepts are discussed in an easy-to-understand way, with many illustrative examples, figures, and source code fragments. The target audience of the book is students in Computer Science who are studying compiler construction, parallel programming, or programming systems. Software developers who have an interest in the core algorithms used to implement a parallel runtime system, or who need to educate themselves for projects that require the algorithms and concepts discussed in this book, will also benefit from reading it.

You can find the source code for this book at https://github.com/parallel-runtimes/lomp.


Information

Year
2021
ISBN
9783110632897

1 Setting the stage

Today’s world is a parallel world. Parallelism is ubiquitous. From the smallest devices, like processors that enable the Internet of Things, to the largest supercomputers, almost all devices now provide an execution environment with multiple processing elements. Thus, they require programmers to write parallel code that can exploit the parallelism available in the hardware. This ubiquity also means that it is necessary to implement runtime environments that support such parallel programs. In this book, we discuss the issues involved in building parallel runtime systems so that most (application) programmers don’t have to worry about the complicated low-level details of parallel programming, but rather mainly concern themselves with the, unfortunately still complicated, higher-level issues!
We will cover the fundamental building blocks on which a parallel programming language relies, and discuss how they interact with modern machine architectures to help you understand how to provide high-performance implementations of these building blocks. Obviously, this also requires that you understand:
  • What the sensible performance measures for each construct are.
  • What the theoretical limits of performance are, given the properties of the underlying hardware.
  • How to measure the performance of both the hardware and the code.
  • How to use measurements of the hardware properties to design software that will perform well.
Throughout the book, we will show some interesting effects of the way modern processors are designed and the (performance) pitfalls that await programmers who have to reason about low-level machine details when implementing a high-performance parallel runtime system. You will see that some conclusions are counterintuitive; without understanding them, your reasoning about machine performance, and thus your implementation decisions, can easily be led in the wrong direction.

1.1 Structure of the book

To better understand the structure of the book, please have a look at Figure 1.1. It shows the typical layers of a parallel runtime system. The application code sits atop the parallel runtime library that implements the key functionality to support the parallelism in the application. The parallel runtime usually relies on a native library that supplies the concept of threading via the operating system (e. g., the POSIX* thread library pthreads [34]). The lowest level in the stack is the multi-core processor that executes the code of the parallel runtime system and the application. In many cases, both the threading library and the parallel runtime system will use functionality provided by the multi-core processor for improved efficiency.
Figure 1.1 Layers of a parallel runtime system.
The remainder of the book generally approaches the topic from top to bottom. We start with the layer that is accessible to the (application) programmer: the parallel programming model. We discuss some of the design choices and how they affect the general structure of the software stack that implements a parallel programming model. Chapter 2 briefly introduces the key concepts of parallel programming models needed for this book. Don't be disappointed that this is not an in-depth introduction to parallel programming; it assumes that you already have a basic familiarity with parallel programming and only scratches the surface of the topic.
Chapter 3 describes the basics of multi-core architectures. While this is, for sure, the lowest level (even below the software stack!), covering the machine-level details early on in the book seems useful, as some of the implementation choices and algorithms that we will present are clearly motivated by how modern processors work and how they behave when they are executing a parallel application.
In Chapter 4, we explain how the parallel programming model interacts with the runtime system via the compiler and the runtime entry points. Chapter 5 discusses some cross-cutting aspects that are usually needed in a parallel runtime system, like how to manage parallelism or how to do memory management.
Chapter 6 through Chapter 9 cover the details! These chapters dive deep into the specific aspects and the implementation of key concepts like mutual exclusion, atomic operations, barriers, reductions, and task pools. All of these chapters focus on how the implemented algorithms interact with the machine and what effect they cause in a modern processor. This should provide a clear picture about what you, as a low-level ninja programmer to be, will have to understand to be able to extract parallel performance. Ideally, this will lead to a much better use of the expensive machine that executes your parallel runtime system (and the parallel application on top of it).

1.2 Design space exploration

Before you can think about implementing a parallel runtime system, you have to think about what the parallel programming model should look like. This is, of course, if you start from scratch. If your task is to implement an existing programming model, your options are somewhat more limited, as now the programmer-facing parts of the model are defined, though you may still have some flexibility about what the internals of your implementations will look like.
One of the main questions for the implementer of a parallel programming model is whether your programming model should be implemented as a library that provides an application programming interface (API) or as part of the programming language itself (or an extension of it). To make things even more complicated, you could also think of a hybrid model where parts of the model are expressed in the language while others are covered through API routines. Figure 1.2 shows the three categories and gives a few examples of well-known parallel programming models. Figure 1.3 shows a different categorization by the parallel architecture that is targeted by these programming models.
As you may imagine, each of these designs can bring some benefits, but at the same time they may come at a price: the design may have drawbacks compared with alternative implementations of the parallel programming model. Here, we review the two main design choices and discuss their benefits and drawbacks.

1.2.1 Parallelism as a library

Injecting parallelism by using a library seems like an obvious choice. Since most programming languages support libraries, you can potentially perform parallel programming from any programming language. One particularly good example of this is the POSIX thread library pthreads, which brings multi-threading to C and other languages on POSIX-compatible systems, e. g., the GNU/Linux* operating system. Another example is Intel* Threading Building Blocks [151], which adds task-based parallelism to the C++ language.
Figure 1.2 Paradigms for parallel programming models.
Figure 1.3 Parallel programming models categorized by memory architecture.
So, what’s wrong with this idea? The main issue with using an API-only approach to parallelism is that the compiler is not normally aware of the special meaning of the calls to the library that are creating parallelism, and therefore has to treat the API routines as black boxes whose content is hidden. Historically, this issue was exposed because the C language did not define a memory model (see Section 3.2.2 for a discussion of memory models). Instead, the compiler assumed that ...
