High Performance Parallel Runtimes
eBook - ePub

High Performance Parallel Runtimes

Design and Implementation

Michael Klemm, Jim Cownie

Share book
  1. 356 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

High Performance Parallel Runtimes

Design and Implementation

Michael Klemm, Jim Cownie

Book details
Book preview
Table of contents
Citations

About This Book

This book focuses on the theoretical and practical aspects of parallel programming systems for today's high performance multi-core processors and discusses the efficient implementation of key algorithms needed to implement parallel programming models. Such implementations need to take into account the specific architectural aspects of the underlying computer architecture and the features offered by the execution environment.This book briefly reviews key concepts of modern computer architecture, focusing particularly on the performance of parallel codes as well as the relevant concepts in parallel programming models. The book then turns towards the fundamental algorithms used to implement the parallel programming models and discusses how they interact with modern processors.

While the book will focus on the general mechanisms, we will mostly use the Intel processor architecture to exemplify the implementation concepts discussed but will present other processor architectures where appropriate. All algorithms and concepts are discussed in an easy to understand way with many illustrative examples, figures, and source code fragments.The target audience of the book is students in Computer Science who are studying compiler construction, parallel programming, or programming systems. Software developers who have an interest in the core algorithms used to implement a parallel runtime system, or who need to educate themselves for projects that require the algorithms and concepts discussed in this book will also benefit from reading it.

You can find the source code for this book at https://github.com/parallel-runtimes/lomp.

Frequently asked questions

How do I cancel my subscription?
Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.
Can/how do I download books?
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
What is the difference between the pricing plans?
Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.
What is Perlego?
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Do you support text-to-speech?
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Is High Performance Parallel Runtimes an online PDF/ePUB?
Yes, you can access High Performance Parallel Runtimes by Michael Klemm, Jim Cownie in PDF and/or ePUB format, as well as other popular books in Informatik & Systemarchitektur. We have over one million books available in our catalogue for you to explore.

Information

Year
2021
ISBN
9783110632897
Edition
1

1 Setting the stage

Today’s world is a parallel world. Parallelism is ubiquitous. From the smallest devices, like processors that enable the Internet of Things, to the largest supercomputers, almost all devices now provide an execution environment with multiple processing elements. Thus, they require programmers to write parallel code that can exploit the parallelism available in the hardware. This ubiquity also means that it is necessary to implement runtime environments that support such parallel programs. In this book, we discuss the issues involved in building parallel runtime systems so that most (application) programmers don’t have to worry about the complicated low-level details of parallel programming, but rather mainly concern themselves with the, unfortunately still complicated, higher-level issues!
We will cover the fundamental building blocks on which a parallel programming language relies, and discuss how they interact with modern machine architectures to help you understand how to provide high-performance implementations of these building blocks. Obviously, this also requires that you understand:
  • What the sensible performance measures for each construct are.
  • What the theoretical limits of performance are, given the properties of the underlying hardware.
  • How to measure the performance of both the hardware and the code.
  • How to use measurements of the hardware properties to design software that will perform well.
Throughout the book, we will show some interesting effects of the way modern processors are designed and the (performance) pitfalls that await programmers who have to reason about low-level machine details when they are implementing a high-performance parallel runtime system. You will see that there are some counterintuitive conclusions that will most certainly direct your thoughts about machine performance, but also implementation decisions, in the wrong direction.

1.1 Structure of the book

To better understand the structure of the book, please have a look at Figure 1.1. It shows the typical layers of a parallel runtime system. The application code sits atop the parallel runtime library that implements the key functionality to support the parallelism in the application. The parallel runtime usually relies on a native library that supplies the concept of threading via the operating system (e. g., the POSIX* thread library pthreads [34]). The lowest level in the stack is the multi-core processor that executes the code of the parallel runtime system and the application. In many cases, both the threading library and the parallel runtime system will use functionality provided by the multi-core processor for improved efficiency.
Figure 1.1 Layers of a parallel runtime system.
The remainder of book generally approaches the topic from the top to the bottom. We start with the layer that is accessible to the (application) programmer: the parallel programming model. We discuss some of the design choices and how they affect the general structure of the software stack that implements a parallel programming model. Chapter 2 briefly introduces some of the key concepts of parallel programming models for this book. Don’t be disappointed that this is not going to be an in-depth introduction to parallel programming, but rather assumes that you have a basic familiarity with parallel programming already and only scratches the surfaces of this topic.
Chapter 3 describes the basics of multi-core architectures. While this is, for sure, the lowest level (even below the software stack!), covering the machine-level details early on in the book seems useful, as some of the implementation choices and algorithms that we will present are clearly motivated by how modern processors work and how they behave when they are executing a parallel application.
In Chapter 4, we explain how the parallel programming model interacts with the runtime system via the compiler and the runtime entry points. Chapter 5 discusses some cross-cutting aspects that are usually needed in a parallel runtime system, like how to manage parallelism or how to do memory management.
Chapter 6 through Chapter 9 cover the details! These chapters dive deep into the specific aspects and the implementation of key concepts like mutual exclusion, atomic operations, barriers, reductions, and task pools. All of these chapters focus on how the implemented algorithms interact with the machine and what effect they cause in a modern processor. This should provide a clear picture about what you, as a low-level ninja programmer to be, will have to understand to be able to extract parallel performance. Ideally, this will lead to a much better use of the expensive machine that executes your parallel runtime system (and the parallel application on top of it).

1.2 Design space exploration

Before you can think about implementing a parallel runtime system, you have to think about what the parallel programming model should look like. This is, of course, if you start from scratch. If your task is to implement an existing programming model, your options are somewhat more limited, as now the programmer-facing parts of the model are defined, though you may still have some flexibility about what the internals of your implementations will look like.
One of the main questions for the implementer of a parallel programming model is whether your programming model should be implemented as a library that provides an application programming interface (API) or as part of the programming language itself (or an extension of it). To make things even more complicated, you could also think of a hybrid model where parts of the model are expressed in the language while others are covered through API routines. Figure 1.2 shows the three categories and gives a few examples of well-known parallel programming models. Figure 1.3 shows a different categorization by the parallel architecture that is targeted by these programming models.
As you may imagine, each of these designs can bring some benefits, but at the same time these may come at price—that is, the design may have drawbacks with respect to the alternative implementation of the parallel programming model. Here, we review the two main design choices and discuss their benefits and drawbacks.

1.2.1 Parallelism as a library

Injecting parallelism by using a library seems like an obvious choice. Since most programming languages support libraries, you can potentially perform parallel programming from any programming language. One particularly good example of this is the POSIX thread library pthreads, which brings multi-threading to C and other languages on POSIX-compatible systems, e. g., the GNU/Linux* operating system. Another example is Intel* Threading Building Blocks [151], which adds task-based parallelism to the C++ language.
Figure 1.2 Paradigms for parallel programming models.
Figure 1.3 Parallel programming models by categorized by memory architecture.
So, what’s wrong with this idea? The main issue with using an API-only approach to parallelism is that the compiler is not normally aware of the special meaning of the calls to the library that are creating parallelism, and therefore has to treat the API routines as black boxes whose content is hidden. Historically, this issue was exposed because the C language did not define a memory model (see Section 3.2.2 for a discussion of memory models). Instead, the compiler assumed that ...

Table of contents