High Performance Parallelism Pearls Volume Two

Multicore and Many-core Programming Approaches

592 pages · English · ePub
About this book

High Performance Parallelism Pearls Volume Two offers another set of examples that demonstrate how to leverage parallelism. As in Volume One, the techniques included here explain how to target processors and coprocessors with the same source code, illustrating the most effective ways to combine Intel Xeon Phi coprocessors with Xeon and other multicore processors. The book includes examples of successful programming efforts drawn from industries and domains such as biomedicine, genetics, finance, manufacturing, and imaging. Each chapter in this edited work includes a detailed explanation of the programming techniques used and shows high-performance results on both Intel Xeon Phi coprocessors and multicore processors. Learn from dozens of new examples and case studies, "success stories" that demonstrate not just the features of Xeon-powered systems but also how to leverage parallelism across these heterogeneous systems.
  • Promotes write-once, run-anywhere coding, showing how to code for high performance on multicore processors and Xeon Phi
  • Examples from multiple vertical domains illustrating real-world use of Xeon Phi coprocessors
  • Source code available for download to facilitate further exploration


Chapter 1

Introduction

James Reinders; Jim Jeffers Intel Corporation, USA

Abstract

This chapter introduces this book, written by 73 experts sharing real-world examples and techniques that led to high-performance applications on multicore and many-core systems. All chapters reference actual code and the modifications made to that code to improve performance. The codes discussed are freely available for download (http://lotsofcores.com). All the figures and diagrams from the book are freely available as well, to help facilitate the teaching of parallel programming.
Keywords
512-bit SIMD
AVX-512
Coarse-grain
Embree
MPI
MPI shared memory
OSPRay
OpenCL
OpenMP
Python
SIMD
TBB
Xeon Phi
Heterogeneous
Hybrid parallelism
In-order
Latency optimizations
Many-core
Multicore
Nested parallelism
New era in programming
Offloading
Out-of-order
Power savings
Prefetching
pyMIC
Reserved core
Stream programming
Thread-safe
Vectorization
It has become well known that programming for the Intel Xeon Phi coprocessor heightens awareness of the need for scaling, vectorization, and increasing temporal locality of reference—exactly the keys to parallel programming. Once these keys to effective parallel programming are addressed, the result is a parallel program optimized for higher performance on both Intel® Xeon Phi™ coprocessors and multicore processors. That represents a highly compelling preservation of investment when the focus is on modifying code in a portable, performance-portable manner. Unsurprisingly, that means the chapters in this book use C, C++, and Fortran with standard parallel programming models including OpenMP, MPI, TBB, and OpenCL. We see that the optimizations improve applications both on processors, such as Intel® Xeon® processors, and on Intel Xeon Phi products.
We are not supposed to have a favorite chapter, especially since 73 amazing experts contributed to this second Pearls book. They share compelling lessons in effective parallel programming through both specific application optimizations and their illustration of key techniques…and we can learn from every one of them. However, we cannot avoid feeling a little like the characters on The Big Bang Theory (a popular television show) who get excited by the mere mention of Stephen Hawking. Now, to be very clear, Stephen Hawking did not work on this book. At least, not to our knowledge.

Applications and techniques

The programming topics that receive the most discussion in this book are OpenMP and vectorization, followed closely by MPI. However, many more topics receive serious attention, including nested parallelism, latency optimizations, prefetching, Python, OpenCL, offloading, stream programming, making code thread-safe, and power savings.
This book does not have distinct sections, but you will find that the first half of the book consists of chapters that dive deeply into optimizing a single application and dealing with the work that is needed to optimize that application for parallelism. The second half of the book switches to chapters that dive into a technique or approach, and illustrate it with a number of different examples.
In all the chapters, the examples were selected for their educational content, applicability, and success. You can download the codes and try them yourself! The examples demonstrate successful approaches to parallel programming that apply to both processors and coprocessors. Not all the examples scale well enough to make an Intel Xeon Phi coprocessor run faster than a processor. This is a reality we all face in programming, and it reinforces something we should never be bashful in pointing out: a common programming model matters a great deal. The programming does not force a choice about what will run better; it focuses on parallel programming and can target either multicore or many-core products. The techniques utilized almost always apply to both processors and coprocessors. Some chapters utilize nonportable techniques and explain why. The most common use of nonportable programming you will see in this book is focused targeting of 512-bit SIMD, a feature that arrived in Intel Xeon Phi coprocessors before appearing in processors. The strong benefits of common programming emerge over and over in real-life examples, including those in this book.

SIMD and vectorization

Many chapters make code changes in their applications to utilize the SIMD capabilities of processors and coprocessors, including Chapters 2–4 and 8. There are three additional vectorization-focused chapters tackling key techniques or tools that you may find indispensable. The concept of an SIMD function is covered in Chapter 22. SIMD functions allow a program written to operate on scalar (one at a time) data to be vectorized by the appropriate use of OpenMP SIMD directives. A tool to help analyze your vectorization opportunities and give advice is the subject of Chapter 23. An increasingly popular library approach to parallel vector programming, called OpenVec, is covered in Chapter 24.
We do have a really cool chapter that begins with "The best current explanation of how our universe began is with a period of rapid exponential expansion, termed inflation. This created the large, mostly empty, universe that we observe today. The principal piece of evidence for this comes from… the Cosmic Microwave Background (CMB), a microwave frequency background radiation, thought to have been left over from the big bang…"
Who would not be excited by that?
In an attempt to avoid accusations that we have a favorite chapter… we buried "Cosmic Microwave Background Analysis: Nested Parallelism in Practice" in the middle of the book, so it is as far as possible from the cover, which features the Cosmos supercomputer used by theoretical physicists at the University of Cambridge. That same cover shows an OSPRay-rendered visualization from the Modal program that they optimize in their chapter (and yes, they do work with Dr. Hawking – but we still are not saying he actually worked on the book!).

OpenMP and nested parallelism

Many chapters make code changes in their applications to harness task- or thread-level parallelism with OpenMP. Chapter 17 drives home the meaning and value of being more "coarse-grained" in order to scale well. The challenges of making legacy code thread-safe are discussed in some detail in Chapter 5, including discussions of choices that did not work.
Two chapters advocate nested parallelism in OpenMP and use it to get significant performance gains: Chapters 10 and 18. Exploiting multilevel parallelism deserves consideration even if it was rejected in the past. OpenMP nesting is turned off by default in most implementations and is generally considered unsafe by typical users.

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Contributors
  6. Acknowledgments
  7. Foreword
  8. Preface
  9. Chapter 1: Introduction
  10. Chapter 2: Numerical Weather Prediction Optimization
  11. Chapter 3: WRF Goddard Microphysics Scheme Optimization
  12. Chapter 4: Pairwise DNA Sequence Alignment Optimization
  13. Chapter 5: Accelerated Structural Bioinformatics for Drug Discovery
  14. Chapter 6: Amber PME Molecular Dynamics Optimization
  15. Chapter 7: Low-Latency Solutions for Financial Services Applications
  16. Chapter 8: Parallel Numerical Methods in Finance
  17. Chapter 9: Wilson Dslash Kernel From Lattice QCD Optimization
  18. Chapter 10: Cosmic Microwave Background Analysis: Nested Parallelism in Practice
  19. Chapter 11: Visual Search Optimization
  20. Chapter 12: Radio Frequency Ray Tracing
  21. Chapter 13: Exploring Use of the Reserved Core
  22. Chapter 14: High Performance Python Offloading
  23. Chapter 15: Fast Matrix Computations on Heterogeneous Streams
  24. Chapter 16: MPI-3 Shared Memory Programming Introduction
  25. Chapter 17: Coarse-Grained OpenMP for Scalable Hybrid Parallelism
  26. Chapter 18: Exploiting Multilevel Parallelism in Quantum Simulations
  27. Chapter 19: OpenCL: There and Back Again
  28. Chapter 20: OpenMP Versus OpenCL: Difference in Performance?
  29. Chapter 21: Prefetch Tuning Optimizations
  30. Chapter 22: SIMD Functions Via OpenMP
  31. Chapter 23: Vectorization Advice
  32. Chapter 24: Portable Explicit Vectorization Intrinsics
  33. Chapter 25: Power Analysis for Applications and Data Centers
  34. Author Index
  35. Subject Index
