eBook - ePub

High Performance Parallel Runtimes

Design and Implementation

Michael Klemm, Jim Cownie

356 pages
English
ePUB (mobile-friendly)


Book information

This book focuses on the theoretical and practical aspects of parallel programming systems for today's high-performance multi-core processors and discusses the efficient implementation of the key algorithms needed to implement parallel programming models. Such implementations need to take into account the specific aspects of the underlying computer architecture and the features offered by the execution environment. This book briefly reviews key concepts of modern computer architecture, focusing particularly on the performance of parallel codes, as well as the relevant concepts in parallel programming models. The book then turns to the fundamental algorithms used to implement parallel programming models and discusses how they interact with modern processors.

While the book focuses on the general mechanisms, we mostly use the Intel processor architecture to exemplify the implementation concepts discussed, but present other processor architectures where appropriate. All algorithms and concepts are discussed in an easy-to-understand way with many illustrative examples, figures, and source code fragments. The target audience of the book is Computer Science students who are studying compiler construction, parallel programming, or programming systems. Software developers who are interested in the core algorithms used to implement a parallel runtime system, or who need to educate themselves for projects that require the algorithms and concepts discussed in this book, will also benefit from reading it.

You can find the source code for this book at https://github.com/parallel-runtimes/lomp.


Publication details

Publisher: De Gruyter Oldenbourg
Year: 2021
ISBN: 9783110632897
Categories: Computer Science, System Architecture

1 Setting the stage

Today’s world is a parallel world. Parallelism is ubiquitous. From the smallest devices, like processors that enable the Internet of Things, to the largest supercomputers, almost all devices now provide an execution environment with multiple processing elements. Thus, they require programmers to write parallel code that can exploit the parallelism available in the hardware. This ubiquity also means that it is necessary to implement runtime environments that support such parallel programs. In this book, we discuss the issues involved in building parallel runtime systems so that most (application) programmers don’t have to worry about the complicated low-level details of parallel programming, but rather mainly concern themselves with the, unfortunately still complicated, higher-level issues!

We will cover the fundamental building blocks on which a parallel programming language relies, and discuss how they interact with modern machine architectures to help you understand how to provide high-performance implementations of these building blocks. Obviously, this also requires that you understand:

What the sensible performance measures for each construct are.
What the theoretical limits of performance are, given the properties of the underlying hardware.
How to measure the performance of both the hardware and the code.
How to use measurements of the hardware properties to design software that will perform well.

Throughout the book, we will show some interesting effects of the way modern processors are designed and the (performance) pitfalls that await programmers who have to reason about low-level machine details when they are implementing a high-performance parallel runtime system. You will see that some of these effects are counterintuitive, and that intuition alone would very likely lead both your thinking about machine performance and your implementation decisions in the wrong direction.

1.1 Structure of the book

To better understand the structure of the book, please have a look at Figure 1.1. It shows the typical layers of a parallel runtime system. The application code sits atop the parallel runtime library that implements the key functionality to support the parallelism in the application. The parallel runtime usually relies on a native library that supplies the concept of threading via the operating system (e. g., the POSIX* thread library pthreads [34]). The lowest level in the stack is the multi-core processor that executes the code of the parallel runtime system and the application. In many cases, both the threading library and the parallel runtime system will use functionality provided by the multi-core processor for improved efficiency.

Figure 1.1 Layers of a parallel runtime system.
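To make the layering of Figure 1.1 concrete, here is a minimal sketch of a toy runtime layer built on pthreads. The names rt_parallel, rt_body_fn, and mark_thread are hypothetical and do not come from any real runtime. The sketch assumes that the "application" code has already been packaged (outlined) into a function that the runtime calls once per thread, and that the calling thread participates as thread 0, a common choice in fork/join runtimes.

```c
#include <assert.h>
#include <pthread.h>

/* Layer 2: a toy parallel runtime built on pthreads (layer 1).
 * rt_parallel and rt_body_fn are hypothetical names, not the API of
 * any real runtime. */
#define RT_MAX_THREADS 64

/* Signature of an outlined parallel-region body. */
typedef void (*rt_body_fn)(int thread_id, int num_threads, void *arg);

typedef struct {
    rt_body_fn body;
    void *arg;
    int thread_id;
    int num_threads;
} rt_work_t;

/* Adapts the pthreads start-routine signature to rt_body_fn. */
static void *rt_trampoline(void *p) {
    rt_work_t *w = (rt_work_t *)p;
    w->body(w->thread_id, w->num_threads, w->arg);
    return NULL;
}

/* Runtime entry point: runs body on num_threads threads (the calling
 * thread acts as thread 0) and returns once all of them have finished.
 * Returns 0 on success, -1 if a thread could not be created. */
int rt_parallel(rt_body_fn body, void *arg, int num_threads) {
    pthread_t threads[RT_MAX_THREADS];
    rt_work_t work[RT_MAX_THREADS];
    int created = 1, rc = 0;
    assert(num_threads >= 1 && num_threads <= RT_MAX_THREADS);

    for (int i = 0; i < num_threads; i++)
        work[i] = (rt_work_t){ body, arg, i, num_threads };

    for (; created < num_threads; created++) {
        if (pthread_create(&threads[created], NULL, rt_trampoline,
                           &work[created]) != 0) {
            rc = -1;
            break;
        }
    }
    if (rc == 0)
        rt_trampoline(&work[0]);   /* the caller does its share of the work */
    for (int i = 1; i < created; i++)
        pthread_join(threads[i], NULL);   /* wait only for threads we started */
    return rc;
}

/* Layer 3: application code. A compiler for a language-level parallel
 * construct might outline the region body into a function like this. */
static void mark_thread(int thread_id, int num_threads, void *arg) {
    int *seen = (int *)arg;
    (void)num_threads;
    seen[thread_id] = thread_id + 1;   /* each thread writes only its own slot */
}
```

For example, with `int seen[4] = {0};`, the call `rt_parallel(mark_thread, seen, 4)` leaves `seen` holding `{1, 2, 3, 4}`. The application never touches pthreads directly; only the runtime layer does.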

The remainder of the book generally approaches the topic from top to bottom. We start with the layer that is accessible to the (application) programmer: the parallel programming model. We discuss some of the design choices and how they affect the general structure of the software stack that implements a parallel programming model. Chapter 2 briefly introduces some of the key concepts of parallel programming models that are relevant to this book. Don’t be disappointed that it is not an in-depth introduction to parallel programming: it assumes that you already have a basic familiarity with parallel programming and only scratches the surface of the topic.

Chapter 3 describes the basics of multi-core architectures. While this is, for sure, the lowest level (even below the software stack!), covering the machine-level details early on in the book seems useful, as some of the implementation choices and algorithms that we will present are clearly motivated by how modern processors work and how they behave when they are executing a parallel application.

In Chapter 4, we explain how the parallel programming model interacts with the runtime system via the compiler and the runtime entry points. Chapter 5 discusses some cross-cutting aspects that are usually needed in a parallel runtime system, like how to manage parallelism or how to do memory management.

Chapter 6 through Chapter 9 cover the details! These chapters dive deep into the specific aspects and the implementation of key concepts like mutual exclusion, atomic operations, barriers, reductions, and task pools. All of these chapters focus on how the implemented algorithms interact with the machine and what effects they have on a modern processor. This should provide a clear picture of what you, as a future low-level ninja programmer, will have to understand to be able to extract parallel performance. Ideally, this will lead to much better use of the expensive machine that executes your parallel runtime system (and the parallel application on top of it).
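As a small taste of how mutual exclusion can be built from an atomic operation, the following sketch implements a plain test-and-set spin lock with C11 `<stdatomic.h>` and uses it to protect a shared counter updated by several pthreads. It is an illustrative assumption, not the book's own lock algorithm; the names tas_lock_t, tas_acquire, tas_release, and run_counter are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* A test-and-set spin lock built on one atomic read-modify-write
 * operation. Illustrative only; not taken from the book's code. */
typedef struct {
    atomic_flag held;   /* set while some thread owns the lock */
} tas_lock_t;

static void tas_acquire(tas_lock_t *l) {
    /* test_and_set returns the previous value: keep spinning until we
     * are the thread that changed the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void tas_release(tas_lock_t *l) {
    /* Release ordering publishes the writes made inside the critical section. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

#define ITERATIONS 10000
#define MAX_THREADS 16

static tas_lock_t counter_lock = { ATOMIC_FLAG_INIT };
static long counter = 0;   /* protected by counter_lock */

static void *increment(void *unused) {
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        tas_acquire(&counter_lock);
        counter++;               /* critical section */
        tas_release(&counter_lock);
    }
    return NULL;
}

/* Starts num_threads threads that each increment the shared counter
 * ITERATIONS times, waits for them, and returns the final count
 * (or -1 if not all threads could be created). */
long run_counter(int num_threads) {
    pthread_t threads[MAX_THREADS];
    int created = 0;
    assert(num_threads >= 1 && num_threads <= MAX_THREADS);
    counter = 0;
    while (created < num_threads &&
           pthread_create(&threads[created], NULL, increment, NULL) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return created == num_threads ? counter : -1;
}
```

With the lock in place, `run_counter(4)` returns `4 * ITERATIONS`. A plain test-and-set loop is the simplest possible design; because every waiting thread keeps writing to the same cache line while it spins, it is also a natural starting point for discussing why more refined lock algorithms exist.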

1.2 Design space exploration

Before you can think about implementing a parallel runtime system, you have to think about what the parallel programming model should look like. That is, of course, if you are starting from scratch. If your task is to implement an existing programming model, your options are more limited, as the programmer-facing parts of the model are already defined, though you may still have some flexibility in how the internals of your implementation will look.

One of the main questions for the implementer of a parallel programming model is whether your programming model should be implemented as a library that provides an application programming interface (API) or as part of the programming language itself (or an extension of it). To make things even more complicated, you could also think of a hybrid model where parts of the model are expressed in the language while others are covered through API routines. Figure 1.2 shows the three categories and gives a few examples of well-known parallel programming models. Figure 1.3 shows a different categorization by the parallel architecture that is targeted by these programming models.

As you may imagine, each of these designs can bring some benefits, but they may also come at a price; that is, a design may have drawbacks compared with the alternative ways of implementing the parallel programming model. Here, we review the two main design choices and discuss their benefits and drawbacks.

1.2.1 Parallelism as a library

Injecting parallelism by using a library seems like an obvious choice. Since most programming languages support libraries, you can potentially perform parallel programming from any programming language. One particularly good example of this is the POSIX thread library pthreads, which brings multi-threading to C and other languages on POSIX-compatible systems, e. g., the GNU/Linux* operating system. Another example is Intel* Threading Building Blocks [151], which adds task-based parallelism to the C++ language.

Figure 1.2 Paradigms for parallel programming models.

Figure 1.3 Parallel programming models categorized by memory architecture.
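The library approach described for pthreads can be sketched as follows: the application itself splits a sum into contiguous chunks, starts one thread per chunk with `pthread_create`, and combines the partial results after `pthread_join`. The names parallel_sum, sum_task_t, and NUM_THREADS are hypothetical. The point of the sketch is that all of the parallelism lives inside ordinary calls to library functions; nothing in the C source itself marks the code as parallel.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4

typedef struct {
    const double *data;
    size_t begin, end;   /* half-open range [begin, end) for this thread */
    double partial;      /* result written by the worker */
} sum_task_t;

static void *sum_worker(void *p) {
    sum_task_t *t = (sum_task_t *)p;
    double s = 0.0;
    for (size_t i = t->begin; i < t->end; i++)
        s += t->data[i];
    t->partial = s;
    return NULL;
}

/* Sums data[0..n) using up to NUM_THREADS pthreads; each thread handles
 * one contiguous chunk, and the partial sums are combined after the joins. */
double parallel_sum(const double *data, size_t n) {
    pthread_t threads[NUM_THREADS];
    sum_task_t tasks[NUM_THREADS];
    int started[NUM_THREADS];
    size_t chunk = (n + NUM_THREADS - 1) / NUM_THREADS;
    double total = 0.0;
    assert(data != NULL || n == 0);

    for (int i = 0; i < NUM_THREADS; i++) {
        size_t begin = (size_t)i * chunk;
        size_t end = begin + chunk;
        if (begin > n) begin = n;
        if (end > n) end = n;
        tasks[i] = (sum_task_t){ data, begin, end, 0.0 };
        started[i] = pthread_create(&threads[i], NULL, sum_worker, &tasks[i]) == 0;
        if (!started[i])
            sum_worker(&tasks[i]);   /* fall back to summing this chunk inline */
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        if (started[i])
            pthread_join(threads[i], NULL);   /* after the join, tasks[i].partial is visible */
        total += tasks[i].partial;
    }
    return total;
}
```

Splitting into per-thread partial sums avoids any shared mutable state during the parallel phase, so no lock is needed; the only synchronization is the join.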

So, what’s wrong with this idea? The main issue with using an API-only approach to parallelism is that the compiler is not normally aware of the special meaning of the calls to the library that are creating parallelism, and therefore has to treat the API routines as black boxes whose content is hidden. Historically, this issue was exposed because the C language did not define a memory model (see Section 3.2.2 for a discussion of memory models). Instead, the compiler assumed that ...