eBook - ePub

C++ High Performance

Name: C++ High Performance
Author: Viktor Sehr, Bjorn Andrist

374 pages
English

About This Book

Write code that scales across CPU registers, multi-core, and machine clusters

• Explore concurrent programming in C++
• Identify memory management problems
• Use SIMD and STL containers for performance improvement

Who This Book Is For

If you're a C++ developer looking to improve the speed of your code or simply wanting to take your skills up to the next level, then this book is perfect for you.

What You Will Learn

• Benefits of modern C++ constructs and techniques
• Identify hardware bottlenecks, such as CPU cache misses, to boost performance
• Write specialized data structures for performance-critical code
• Use modern metaprogramming techniques to reduce runtime calculations
• Achieve efficient memory management using custom memory allocators
• Reduce boilerplate code using reflection techniques
• Reap the benefits of lock-free concurrent programming
• Perform under-the-hood optimizations with preserved readability using proxy objects
• Gain insights into subtle optimizations used by STL algorithms
• Utilize the Range V3 library for expressive C++ code
• Parallelize your code over CPU and GPU, without compromising readability

In Detail

C++ is a highly portable language and can be used to write both large-scale applications and performance-critical code. It has evolved over the last few years to become a modern and expressive language. This book will guide you through optimizing the performance of your C++ apps by allowing them to run faster and consume fewer resources on the device they're running on without compromising the readability of your code base.

The book begins by helping you measure and identify bottlenecks in a C++ code base. It then moves on by teaching you how to use modern C++ constructs and techniques. You'll see how this affects the way you write code. Next, you'll see the importance of data structure optimization and memory management, and how it can be used efficiently with respect to CPU caches. After that, you'll see how STL algorithms and the composable Range V3 library should be used to achieve both faster execution and more readable code, followed by how to use STL containers and how to write your own specialized iterators.

Moving on, you'll get hands-on experience in making use of modern C++ metaprogramming and reflection to reduce boilerplate code as well as in working with proxy objects to perform optimizations under the hood. After that, you'll learn concurrent programming and understand lock-free data structures. The book ends with an overview of parallel algorithms using STL execution policies, Boost Compute, and OpenCL to utilize both the CPU and the GPU.

Style and approach

This easy-to-follow guide is full of examples and self-sufficient code snippets that help you with high performance programming with C++. You'll get your hands dirty with this all-inclusive guide that uncovers hidden performance improvement areas for any C++ code.

Information

Publisher: Packt Publishing
Year: 2018
ISBN: 9781787124776
Topic: Computer Science
Subtopic: Programming in C++

Parallel STL

In this chapter, you will learn how to use the computer's graphics processing unit (GPU) for computationally heavy tasks. We will use the excellent Boost Compute library, which exposes the GPU via an interface that resembles the STL, meaning that you can move your standard C++ code almost seamlessly from the CPU to the GPU.

This chapter is not going to go in depth into theories of parallelizing algorithms or parallel programming in general, as these subjects are far too complex to cover in a single chapter. Also, there is a multitude of books on this subject. Instead, this chapter is going to take a more practical approach and demonstrate how to extend a current C++ code base to utilize parallelism while preserving the readability of the code base.

In other words, we do not want the parallelism to get in the way of readability; rather, we want the parallelism to be abstracted away so that parallelizing the code is only a matter of changing a parameter to an algorithm.

In earlier chapters, we have stressed that we prefer STL algorithms over handcrafted for-loops; in this chapter, we will see some great advantages of using algorithms.

We will start this chapter by looking at a few parallel implementations of standard algorithms and the added complexity of writing parallel versions of them. We will then see how we can adapt a code base to use the parallel extensions of the STL, and finally we will take a brief look at how we can use the capabilities of the GPU in a simple way by using Boost Compute and OpenCL.

Importance of parallelism

From a programmer's perspective, it would be very convenient if today's computer hardware were a 100 GHz single-core CPU rather than a 3 GHz multi-core CPU, so that we would not need to care about parallelism. But as computer hardware is evolving in the direction of multi-core CPUs, programmers have to use efficient parallel patterns to make the most of the hardware.

Parallel algorithms

As mentioned in Chapter 10, Concurrency, parallelism refers to programming that takes advantage of hardware with multiple cores. It makes no sense to parallelize an algorithm if the hardware cannot provide any benefit from doing so.

A parallel equivalent of a sequential algorithm is algorithmically slower than the sequential version, because it typically has to do extra work such as dividing the input and synchronizing tasks. Its benefits come from the ability to spread the work across several processing units.

With that in mind, it's also notable that not all algorithms gain the same performance increase when run in parallel. As a simple measurement of how well an algorithm scales, we can measure:

A: The time it takes to execute sequentially on one CPU core
B: The time it takes to execute in parallel, multiplied by the number of cores

If A and B are equal, the algorithm parallelizes perfectly, and the larger B is compared to A, the worse the algorithm parallelizes.
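The comparison of A and B can be expressed as a single ratio. The following is a minimal sketch, assuming the two timings have already been measured in seconds; the function name parallel_efficiency and its parameters are illustrative and do not come from the book:

```cpp
#include <cassert>

// Hypothetical helper (not from the book): returns A / B, where
//   A = time to execute sequentially on one core
//   B = time to execute in parallel, multiplied by the number of cores
// A result of 1.0 means the algorithm parallelizes perfectly; the closer
// the result is to 0, the worse the algorithm parallelizes.
inline double parallel_efficiency(double seq_seconds, double par_seconds,
                                  unsigned num_cores) {
  assert(num_cores > 0 && par_seconds > 0.0);
  const double a = seq_seconds;
  const double b = par_seconds * num_cores;
  return a / b;
}
```

For instance, an algorithm that takes 8 seconds sequentially and 2 seconds on four cores gives 8 / (2 × 4) = 1.0, whereas 4 seconds on four cores gives 8 / (4 × 4) = 0.5.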

How well an algorithm parallelizes depends on how independently each element can be processed. For example, std::transform() is trivial to parallelize in the sense that each element is processed completely independently of every other. This means that theoretically, on n cores, it would execute n times as fast as a sequential execution. In practice, though, many factors limit parallel execution, such as the cost of creating threads, context switches, and so on, as mentioned in Chapter 10, Concurrency.
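To illustrate why independent elements make the work easy to split, the following sketch transforms the two halves of a range in separate passes, deliberately handling the second half first, and still produces the same output as a single std::transform() call. It is not the book's code; the helper name transform_in_two_halves is hypothetical, and random-access iterators are assumed:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper (not from the book): applies f to [first, last) in
// two separate passes. The second half is processed before the first to
// show that the order in which the chunks run does not affect the result.
// Assumes random-access iterators and a function without side effects.
template <typename SrcIt, typename DstIt, typename Func>
void transform_in_two_halves(SrcIt first, SrcIt last, DstIt dst, Func f) {
  const auto n = std::distance(first, last);
  assert(n >= 0);
  const auto mid = n / 2;
  std::transform(first + mid, last, dst + mid, f);  // second half
  std::transform(first, first + mid, dst, f);       // first half
}
```

Because neither pass reads anything the other writes, the two calls could equally be handed to two different cores.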

As parallel algorithms always have a higher computational cost than their sequential equivalents, there are some cases where you may want a sequential version even though it's slower. An example of such a case is if you are optimizing for low energy consumption rather than low computational time. Even though this is probably a very rare case (perhaps a solar-powered galaxy-exploring spacecraft), it might be worth noting.

Implementing parallel std::transform()

Although std::transform() is algorithmically simple, implementing even a rudimentary parallel version is in practice more complex than it might appear at first sight.

A naive parallel implementation of std::transform() would probably look something like this:

Divide the elements into chunks corresponding to the number of cores in the computer
Execute each chunk in a separate task in parallel
Wait for all tasks to finish
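The first of these steps, dividing the elements into chunks, can be sketched as follows. This is an illustrative helper rather than the book's code: the names chunk_ranges and clamped_hardware_concurrency are assumptions, and the chunk size is rounded up so that the chunks together cover every element:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: number of hardware threads, but at least 1, since
// std::thread::hardware_concurrency() may return 0 when it is unknown.
inline std::size_t clamped_hardware_concurrency() {
  return std::max<std::size_t>(std::thread::hardware_concurrency(), 1);
}

// Hypothetical helper: splits the index range [0, n) into at most
// num_chunks half-open ranges [start, stop) of near-equal size.
inline std::vector<std::pair<std::size_t, std::size_t>>
chunk_ranges(std::size_t n, std::size_t num_chunks) {
  assert(num_chunks > 0);
  // Round up so that no trailing elements are left out
  const std::size_t chunk_sz = (n + num_chunks - 1) / num_chunks;
  auto ranges = std::vector<std::pair<std::size_t, std::size_t>>{};
  for (std::size_t start = 0; start < n; start += chunk_sz) {
    ranges.emplace_back(start, std::min(start + chunk_sz, n));
  }
  return ranges;
}
```

Each resulting range could then be handed to its own task, for example by calling chunk_ranges(n, clamped_hardware_concurrency()). With 10 elements and 4 chunks, the ranges are [0, 3), [3, 6), [6, 9), and [9, 10).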

Naive implementation

Using std::thread::hardware_concurrency() to determine the number of supported hardware threads, a naive implementation could look like this. Note that hardware_concurrency() might return 0 if the number cannot be determined, so the value is clamped to be at least one:

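#include <algorithm>
#include <cstddef>
#include <future>
#include <iterator>
#include <thread>
#include <vector>

template <typename SrcIt, typename DstIt, typename Func>
auto par_transform_naive(SrcIt first, SrcIt last, DstIt dst, Func f) {
  auto n = static_cast<size_t>(std::distance(first, last));
  // hardware_concurrency() returns an unsigned value that may be 0;
  // clamp it to at least 1 using a common type for both arguments
  auto num_tasks = std::max<size_t>(std::thread::hardware_concurrency(), 1);
  // Round up so that the chunks together cover all n elements
  auto chunk_sz = std::max<size_t>((n + num_tasks - 1) / num_tasks, 1);
  auto futures = std::vector<std::future<void>>{};
  futures.reserve(num_tasks);
  // Invoke each chunk on a separate task, to be executed in parallel
  for (size_t task_idx = 0; task_idx < num_tasks; ++task_idx) {
    // Clamp both indices so that surplus tasks get an empty range
    auto start_idx = std::min(chunk_sz * task_idx, n);
    auto stop_idx = std::min(chunk_sz * (task_idx + 1), n);
    auto fut = std::async([first, dst, start_idx, stop_idx, &f](){
      std::transform(fi...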