
eBook - ePub
High Performance Parallelism Pearls Volume One
Multicore and Many-core Programming Approaches
- 600 pages
- English
- ePUB (mobile friendly)
- Available on iOS & Android
About this book
High Performance Parallelism Pearls shows how to leverage parallelism on processors and coprocessors with the same programming, illustrating the most effective ways to better tap the computational potential of systems with Intel Xeon Phi coprocessors and Intel Xeon processors or other multicore processors. The book includes examples of successful programming efforts, drawn from across industries and domains such as chemistry, engineering, and environmental science. Each chapter in this edited work includes detailed explanations of the programming techniques used, while showing high performance results on both Intel Xeon Phi coprocessors and multicore processors. Dozens of new examples and case studies illustrate "success stories" that demonstrate not just the features of these powerful systems, but also how to leverage parallelism across these heterogeneous systems.
- Promotes consistent standards-based programming, showing in detail how to code for high performance on multicore processors and Intel® Xeon Phi™
- Examples from multiple vertical domains illustrating parallel optimizations to modernize real-world codes
- Source code available for download to facilitate further exploration
Chapter 1
Introduction
James Reinders Intel Corporation
Abstract
This chapter introduces this book, which shares the experience of software developers who have written highly scalable code to take advantage of both multicore (Xeon or other) and many-core (Intel Xeon Phi) machines. Such modernization of code can come from concurrent algorithms, vectorization and data locality, management of power usage, and other techniques. The advantages of neo-heterogeneous systems are apparent because the programming techniques used benefit both multicore and many-core devices. Sixty-nine experts contributed to this book so that we can all learn from their experiences.
Keywords
Heterogeneous
Many-core
Multicore
Neo-heterogeneous
Xeon Phi
AVX-512
New era in programming
We should "create a cookbook" was a frequent comment that Jim Jeffers and I heard after Intel® Xeon Phi™ Coprocessor High-Performance Programming was published. Guillaume Colin de Verdière was early in his encouragement to create such a book and was pleased when we moved forward with this project. Guillaume matched action with words by also coauthoring, with Jason Sewall, the first contributed chapter, From "correct" to "correct & efficient": a Hydro2D case study with Godunov's scheme. Their chapter reflects a basic premise of this book: sharing experience and success can be highly educational to others. It also contains a theme familiar to those who program the massive parallelism of the Intel Xeon Phi family: getting code to run on Intel Xeon Phi coprocessors is easy. This lets you quickly focus on optimization and on achieving high performance, but we do need to tune for parallelism in our applications! Notably, such optimization work also improves the performance on processors. As the authors note, "a rising tide lifts all boats."
Learning from successful experiences
Learning from others is what this book is all about. It brings together the work of numerous experts in parallel programming to share their experience. The examples were selected for their educational content, applicability, and success, and you can download the codes and try them yourself! All the examples demonstrate successful approaches to parallel programming, but not all of them scale well enough to make an Intel Xeon Phi coprocessor run faster than a processor. This is what we face in the real world, and it reinforces something we are not bashful about pointing out: a common programming model matters a great deal. You will see that notion emerge over and over in real-life examples, including those in this book.
We are indebted to the many contributors to this book, in which you will find a rich set of examples and advice. Since this is the introduction, we offer a little perspective to bind the collection together. Most of all, we encourage you to dive into the rich examples, which begin in Chapter 2.
Code modernization
It is popular to talk about "code modernization" these days. Having experienced the "inspired by 61 cores" phenomenon, we are excited to see it has gone viral and is now being discussed by more and more people. You will find lots of "modernization" shown in this book.
Code modernization is reorganizing the code, and perhaps changing algorithms, to increase the amount of thread parallelism, vector/SIMD operations, and compute intensity to optimize performance on modern architectures. Thread parallelism, vector/SIMD operations, and an emphasis on temporal data reuse are all critical for high-performance programming. Many existing applications were written before these elements were required for performance, and therefore, such codes are not yet optimized for modern computers.
Modernize with concurrent algorithms
Examples of opportunities to rethink approaches to better suit the parallelism of modern computers are scattered throughout this book. Chapter 5 encourages using barriers with an eye toward more concurrency. Chapter 11 stresses the importance of not statically decomposing workloads because neither workloads nor the machines we run them on are truly uniform. Chapter 18 shows the power of not thinking that the parallel world is flat. Chapter 26 juggles data, computation, and storage to increase performance. Chapter 12 increases performance by ensuring parallelism in a heterogeneous node. Enhancing parallelism across a heterogeneous cluster is illustrated in Chapter 13 and Chapter 25.
Modernize with vectorization and data locality
Chapter 8 provides a solid examination of data layout issues in the quest to process data as vectors. Chapters 27 and 28 provide additional education and motivation for doing data layout and vectorization work.
Understanding power usage
Power usage is mentioned in enough chapters that we invited Intel's power tuning expert, Claude Wright, to write Chapter 14. His chapter looks directly at methods to measure power, including building a simple software-based power analyzer with the Intel MPSS tools, and at the difficulties of measuring idle power, since you are not idle if you are busy measuring power!
ISPC and OpenCL anyone?
While OpenMP and TBB dominate as parallel programming solutions in industry and in this book, we have included some mind-stretching chapters that make the case for other solutions.
SPMD programming offers interesting solutions for vectorization, including help with data layout, at the cost of dropping sequential consistency. Is that okay? Chapters 6 and 21 include usage of ispc and its SPMD approach for your consideration. SPMD thinking resonates well when you approach vectorization, even if you do not adopt ispc.
Chapter 22 is written to advocate for OpenCL usage in a heterogeneous world. The contributors describe results from the BUDE molecular docking code, which sustains over 30% of peak floating point performance on a wide variety of systems.
Intel Xeon Phi coprocessor specific
While most of the chapters move algorithms forward on processors and coprocessors, three chapters are dedicated to a deeper look at Intel Xeon Phi coprocessor specific topics. Chapter 15 presents current best practices for managing Intel Xeon Phi coprocessors in a cluster. Chapters 16 and 20 give valuable insights for users of Intel Xeon Phi coprocessors.
Many-core, neo-heterogeneous
The adoption rate of Intel Xeon Phi coprocessors has been steadily increasing since they were first introduced in November 2012. By mid-2013, the cumulative number of FLOPs contributed by Intel Xeon Phi coprocessors in TOP 500 machines exceeded the combined FLOPs contributed by all the graphics processing units (GPUs) installed as floating-point accelerators in the TOP 500 list. In fact, the only device type contributing more FLOPs to TOP 500 supercomputers was Intel Xeon® processors.
As we mentioned in the Preface, the 61 cores of an Intel Xeon Phi coprocessor have inspired a new era of interest in parallel programming. As we saw in our introductory book, Intel Xeon Phi Coprocessor High-Performance Programming, the coprocessors use the same programming languages, the same parallel programming models, and the same tools as processors. In essence, this means that the challenge of programming the coprocessor is largely the same as the challenge of parallel programming for a general-purpose processor. This is because both processors and the Intel Xeon Phi coprocessor were designed to avoid the restricted programming nature inherent in heterogeneous programming with devices of limited programming capabilities.
The experiences of programmers using the Intel Xeon Phi coprocessor have, time and time again, reinforced the value of a common programming model, a fact that is independently and repeatedly emphasized by the chapter authors in this book. The take-away message is clear: the effort spent to tune for scaling and vectorization on the Intel Xeon Phi coprocessor is time well spent for improving performance on processors such as Intel Xeon processors.
No "Xeon Phi" in the title, neo-heterogeneous programming
Because the key programming challenges are generically parallel, we knew we needed to emphasize the applicability to both multicore and many-core computing instead of focusing only on Intel Xeon Phi coprocessors, which is why âXeon Phiâ does not appear in the title of this book.
However, systems with coprocessors and processors combined do usher in two unique challenges that are addressed in this book: (1) hiding the latency of moving data to and from an attached device, a challenge common to any "attached" device including GPUs and coprocessors (future Intel Xeon Phi products will offer configurations that eliminate this data-movement challenge by being offered as processors instead of packaged coprocessors); and (2) a broader challenge in programming heterogeneous systems. Previously, heterogeneous programming referred to systems that combined incompatible computational devices: incompatible in that they used programming methods different enough to require separate development tools and coding approaches. The Intel Xeon Phi products changed all that. Intel Xeon Phi coprocessors offer...
Table of contents
- Cover image
- Title page
- Table of Contents
- Copyright
- Contributors
- Acknowledgments
- Foreword
- Preface
- Chapter 1: Introduction
- Chapter 2: From "Correct" to "Correct & Efficient": A Hydro2D Case Study with Godunov's Scheme
- Chapter 3: Better Concurrency and SIMD on HBM
- Chapter 4: Optimizing for Reacting Navier-Stokes Equations
- Chapter 5: Plesiochronous Phasing Barriers
- Chapter 6: Parallel Evaluation of Fault Tree Expressions
- Chapter 7: Deep-Learning Numerical Optimization
- Chapter 8: Optimizing Gather/Scatter Patterns
- Chapter 9: A Many-Core Implementation of the Direct N-Body Problem
- Chapter 10: N-Body Methods
- Chapter 11: Dynamic Load Balancing Using OpenMP 4.0
- Chapter 12: Concurrent Kernel Offloading
- Chapter 13: Heterogeneous Computing with MPI
- Chapter 14: Power Analysis on the Intel® Xeon Phi™ Coprocessor
- Chapter 15: Integrating Intel Xeon Phi Coprocessors into a Cluster Environment
- Chapter 16: Supporting Cluster File Systems on Intel® Xeon Phi™ Coprocessors
- Chapter 17: NWChem: Quantum Chemistry Simulations at Scale
- Chapter 18: Efficient Nested Parallelism on Large-Scale Systems
- Chapter 19: Performance Optimization of Black-Scholes Pricing
- Chapter 20: Data Transfer Using the Intel COI Library
- Chapter 21: High-Performance Ray Tracing
- Chapter 22: Portable Performance with OpenCL
- Chapter 23: Characterization and Optimization Methodology Applied to Stencil Computations
- Chapter 24: Profiling-Guided Optimization
- Chapter 25: Heterogeneous MPI application optimization with ITAC
- Chapter 26: Scalable Out-of-Core Solvers on a Cluster
- Chapter 27: Sparse Matrix-Vector Multiplication: Parallelization and Vectorization
- Chapter 28: Morton Order Improves Performance
- Author Index
- Subject Index