eBook - ePub

High Performance Parallel Runtimes

Name: High Performance Parallel Runtimes
Author: Michael Klemm, Jim Cownie

Design and Implementation

Michael Klemm, Jim Cownie

Condividi libro

356 pagine
English
ePUB (disponibile sull'app)
Disponibile su iOS e Android

eBook - ePub

High Performance Parallel Runtimes

Design and Implementation

Michael Klemm, Jim Cownie

Dettagli del libro

Anteprima del libro

Indice dei contenuti

Citazioni

Informazioni sul libro

This book focuses on the theoretical and practical aspects of parallel programming systems for today's high performance multi-core processors and discusses the efficient implementation of key algorithms needed to implement parallel programming models. Such implementations need to take into account the specific architectural aspects of the underlying computer architecture and the features offered by the execution environment.This book briefly reviews key concepts of modern computer architecture, focusing particularly on the performance of parallel codes as well as the relevant concepts in parallel programming models. The book then turns towards the fundamental algorithms used to implement the parallel programming models and discusses how they interact with modern processors.

While the book will focus on the general mechanisms, we will mostly use the Intel processor architecture to exemplify the implementation concepts discussed but will present other processor architectures where appropriate. All algorithms and concepts are discussed in an easy to understand way with many illustrative examples, figures, and source code fragments.The target audience of the book is students in Computer Science who are studying compiler construction, parallel programming, or programming systems. Software developers who have an interest in the core algorithms used to implement a parallel runtime system, or who need to educate themselves for projects that require the algorithms and concepts discussed in this book will also benefit from reading it.

You can find the source code for this book at https://github.com/parallel-runtimes/lomp.

Domande frequenti

Come faccio ad annullare l'abbonamento?

È semplicissimo: basta accedere alla sezione Account nelle Impostazioni e cliccare su "Annulla abbonamento". Dopo la cancellazione, l'abbonamento rimarrà attivo per il periodo rimanente già pagato. Per maggiori informazioni, clicca qui

È possibile scaricare libri? Se sì, come?

Al momento è possibile scaricare tramite l'app tutti i nostri libri ePub mobile-friendly. Anche la maggior parte dei nostri PDF è scaricabile e stiamo lavorando per rendere disponibile quanto prima il download di tutti gli altri file. Per maggiori informazioni, clicca qui

Che differenza c'è tra i piani?

Entrambi i piani ti danno accesso illimitato alla libreria e a tutte le funzionalità di Perlego. Le uniche differenze sono il prezzo e il periodo di abbonamento: con il piano annuale risparmierai circa il 30% rispetto a 12 rate con quello mensile.

Cos'è Perlego?

Perlego è un servizio di abbonamento a testi accademici, che ti permette di accedere a un'intera libreria online a un prezzo inferiore rispetto a quello che pagheresti per acquistare un singolo libro al mese. Con oltre 1 milione di testi suddivisi in più di 1.000 categorie, troverai sicuramente ciò che fa per te! Per maggiori informazioni, clicca qui.

Perlego supporta la sintesi vocale?

Cerca l'icona Sintesi vocale nel prossimo libro che leggerai per verificare se è possibile riprodurre l'audio. Questo strumento permette di leggere il testo a voce alta, evidenziandolo man mano che la lettura procede. Puoi aumentare o diminuire la velocità della sintesi vocale, oppure sospendere la riproduzione. Per maggiori informazioni, clicca qui.

High Performance Parallel Runtimes è disponibile online in formato PDF/ePub?

Sì, puoi accedere a High Performance Parallel Runtimes di Michael Klemm, Jim Cownie in formato PDF e/o ePub, così come ad altri libri molto apprezzati nelle sezioni relative a Informatik e Systemarchitektur. Scopri oltre 1 milione di libri disponibili nel nostro catalogo.

Informazioni

Editore

De Gruyter Oldenbourg

Anno

2021

ISBN

9783110632897

Edizione

Argomento

Informatik

Categoria

Systemarchitektur

1 Setting the stage

Today’s world is a parallel world. Parallelism is ubiquitous. From the smallest devices, like processors that enable the Internet of Things, to the largest supercomputers, almost all devices now provide an execution environment with multiple processing elements. Thus, they require programmers to write parallel code that can exploit the parallelism available in the hardware. This ubiquity also means that it is necessary to implement runtime environments that support such parallel programs. In this book, we discuss the issues involved in building parallel runtime systems so that most (application) programmers don’t have to worry about the complicated low-level details of parallel programming, but rather mainly concern themselves with the, unfortunately still complicated, higher-level issues!

We will cover the fundamental building blocks on which a parallel programming language relies, and discuss how they interact with modern machine architectures to help you understand how to provide high-performance implementations of these building blocks. Obviously, this also requires that you understand:

What the sensible performance measures for each construct are.
What the theoretical limits of performance are, given the properties of the underlying hardware.
How to measure the performance of both the hardware and the code.
How to use measurements of the hardware properties to design software that will perform well.

Throughout the book, we will show some interesting effects of the way modern processors are designed and the (performance) pitfalls that await programmers who have to reason about low-level machine details when they are implementing a high-performance parallel runtime system. You will see that there are some counterintuitive conclusions that will most certainly direct your thoughts about machine performance, but also implementation decisions, in the wrong direction.

1.1 Structure of the book

To better understand the structure of the book, please have a look at Figure 1.1. It shows the typical layers of a parallel runtime system. The application code sits atop the parallel runtime library that implements the key functionality to support the parallelism in the application. The parallel runtime usually relies on a native library that supplies the concept of threading via the operating system (e. g., the POSIX^* thread library pthreads [34]). The lowest level in the stack is the multi-core processor that executes the code of the parallel runtime system and the application. In many cases, both the threading library and the parallel runtime system will use functionality provided by the multi-core processor for improved efficiency.

Figure 1.1 Layers of a parallel runtime system.

The remainder of book generally approaches the topic from the top to the bottom. We start with the layer that is accessible to the (application) programmer: the parallel programming model. We discuss some of the design choices and how they affect the general structure of the software stack that implements a parallel programming model. Chapter 2 briefly introduces some of the key concepts of parallel programming models for this book. Don’t be disappointed that this is not going to be an in-depth introduction to parallel programming, but rather assumes that you have a basic familiarity with parallel programming already and only scratches the surfaces of this topic.

Chapter 3 describes the basics of multi-core architectures. While this is, for sure, the lowest level (even below the software stack!), covering the machine-level details early on in the book seems useful, as some of the implementation choices and algorithms that we will present are clearly motivated by how modern processors work and how they behave when they are executing a parallel application.

In Chapter 4, we explain how the parallel programming model interacts with the runtime system via the compiler and the runtime entry points. Chapter 5 discusses some cross-cutting aspects that are usually needed in a parallel runtime system, like how to manage parallelism or how to do memory management.

Chapter 6 through Chapter 9 cover the details! These chapters dive deep into the specific aspects and the implementation of key concepts like mutual exclusion, atomic operations, barriers, reductions, and task pools. All of these chapters focus on how the implemented algorithms interact with the machine and what effect they cause in a modern processor. This should provide a clear picture about what you, as a low-level ninja programmer to be, will have to understand to be able to extract parallel performance. Ideally, this will lead to a much better use of the expensive machine that executes your parallel runtime system (and the parallel application on top of it).

1.2 Design space exploration

Before you can think about implementing a parallel runtime system, you have to think about what the parallel programming model should look like. This is, of course, if you start from scratch. If your task is to implement an existing programming model, your options are somewhat more limited, as now the programmer-facing parts of the model are defined, though you may still have some flexibility about what the internals of your implementations will look like.

One of the main questions for the implementer of a parallel programming model is whether your programming model should be implemented as a library that provides an application programming interface (API) or as part of the programming language itself (or an extension of it). To make things even more complicated, you could also think of a hybrid model where parts of the model are expressed in the language while others are covered through API routines. Figure 1.2 shows the three categories and gives a few examples of well-known parallel programming models. Figure 1.3 shows a different categorization by the parallel architecture that is targeted by these programming models.

As you may imagine, each of these designs can bring some benefits, but at the same time these may come at price—that is, the design may have drawbacks with respect to the alternative implementation of the parallel programming model. Here, we review the two main design choices and discuss their benefits and drawbacks.

1.2.1 Parallelism as a library

Injecting parallelism by using a library seems like an obvious choice. Since most programming languages support libraries, you can potentially perform parallel programming from any programming language. One particularly good example of this is the POSIX thread library pthreads, which brings multi-threading to C and other languages on POSIX-compatible systems, e. g., the GNU/Linux^* operating system. Another example is Intel^* Threading Building Blocks [151], which adds task-based parallelism to the C++ language.

Figure 1.2 Paradigms for parallel programming models.

Figure 1.3 Parallel programming models by categorized by memory architecture.

So, what’s wrong with this idea? The main issue with using an API-only approach to parallelism is that the compiler is not normally aware of the special meaning of the calls to the library that are creating parallelism, and therefore has to treat the API routines as black boxes whose content is hidden. Historically, this issue was exposed because the C language did not define a memory model (see Section 3.2.2 for a discussion of memory models). Instead, the compiler assumed that ...