C++ High Performance

Viktor Sehr, Bjorn Andrist

eBook - ePub · 374 pages · English

About this book

Write code that scales across CPU registers, multi-core, and machine clusters.

About This Book
  • Explore concurrent programming in C++
  • Identify memory management problems
  • Use SIMD and STL containers for performance improvements

Who This Book Is For
If you're a C++ developer looking to improve the speed of your code, or simply wanting to take your skills to the next level, then this book is perfect for you.

What You Will Learn
  • Understand the benefits of modern C++ constructs and techniques
  • Identify hardware bottlenecks, such as CPU cache misses, to boost performance
  • Write specialized data structures for performance-critical code
  • Use modern metaprogramming techniques to reduce runtime calculations
  • Achieve efficient memory management using custom memory allocators
  • Reduce boilerplate code using reflection techniques
  • Reap the benefits of lock-free concurrent programming
  • Perform under-the-hood optimizations with preserved readability using proxy objects
  • Gain insights into subtle optimizations used by STL algorithms
  • Utilize the Range V3 library for expressive C++ code
  • Parallelize your code over CPU and GPU without compromising readability

In Detail
C++ is a highly portable language that can be used to write both large-scale applications and performance-critical code. It has evolved over the last few years into a modern and expressive language. This book will guide you through optimizing the performance of your C++ applications so that they run faster and consume fewer resources, without compromising the readability of your code base.
The book begins by helping you measure and identify bottlenecks in a C++ code base. It then teaches you how to use modern C++ constructs and techniques, and shows how these affect the way you write code. Next, you'll see the importance of data-structure optimization and memory management, and how they can be used efficiently with respect to CPU caches.
After that, you'll see how STL algorithms and composable Range V3 views can be used to achieve both faster execution and more readable code, followed by how to use STL containers and how to write your own specialized iterators. Moving on, you'll get hands-on experience using modern C++ metaprogramming and reflection to reduce boilerplate code, as well as working with proxy objects to perform optimizations under the hood. After that, you'll learn concurrent programming and understand lock-free data structures. The book ends with an overview of parallel algorithms using STL execution policies, Boost Compute, and OpenCL to utilize both the CPU and the GPU.

Style and Approach
This easy-to-follow guide is full of examples and self-sufficient code snippets to help you with high-performance programming in C++. You'll get your hands dirty with this all-inclusive guide that uncovers hidden performance-improvement areas in any C++ code.


Information

Year: 2018
ISBN: 9781787124776
Edition: 1

Parallel STL

In this chapter, you will learn how to use the computer's graphics processing unit for computationally heavy tasks. We will use the excellent Boost Compute library, which exposes the GPU via an interface that resembles the STL, meaning that you can move your standard C++ code almost seamlessly from the CPU to the GPU.
This chapter is not going to go in depth into theories of parallelizing algorithms or parallel programming in general, as these subjects are far too complex to cover in a single chapter. Also, there is a multitude of books on this subject. Instead, this chapter is going to take a more practical approach and demonstrate how to extend a current C++ code base to utilize parallelism while preserving the readability of the code base.
In other words, we do not want the parallelism to get in the way of readability; rather, we want the parallelism to be abstracted away so that parallelizing the code is only a matter of changing a parameter to an algorithm.
In earlier chapters, we have stressed that we prefer STL algorithms over handcrafted for-loops; in this chapter, we will see some great advantages of using algorithms.
We will start this chapter off by looking at a few parallel implementations of standard algorithms, and the added complexity of writing parallel versions of them. We will then go on to see how we can adapt a code base to use the parallel extensions of STL, and finally we will take a brief look at how we can use the capabilities of the GPU in a simple way by using Boost Compute and OpenCL.

Importance of parallelism

From a programmer's perspective, it would have been very convenient if today's computer hardware were a 100 GHz single-core CPU rather than a 3 GHz multi-core CPU; then we wouldn't need to care about parallelism at all. But since the evolution of computer hardware is going in the direction of multi-core CPUs, programmers have to use efficient parallel patterns in order to make the most of the hardware.

Parallel algorithms

As mentioned in Chapter 10, Concurrency, with parallelism we refer to programming that takes advantage of hardware with multiple cores. It makes no sense to parallelize an algorithm if the hardware cannot provide any of the benefits of doing so.
A parallel equivalent of a sequential algorithm is therefore algorithmically slower than the sequential version; its benefit comes from the ability to spread the work onto several processing units.
With that in mind, it's also notable that not all algorithms gain the same performance increase when run in parallel. As a simple measure of how well an algorithm scales, we can compare:
  • A: The time it takes to execute sequentially on one CPU core
  • B: The time it takes to execute in parallel, multiplied by the number of cores
If A and B are equal, the algorithm parallelizes perfectly; the larger B is compared to A, the worse the algorithm parallelizes.
How well an algorithm parallelizes depends on how independently each element can be processed. For example, std::transform() is trivial to parallelize in the sense that each element is processed completely independently of every other. This means that, theoretically, on n cores it would execute n times as fast as a sequential execution. In practice, though, there are a multitude of factors that limit parallel speedup, such as the cost of creating threads, context switches, and so on, as mentioned in Chapter 10, Concurrency.
As parallel algorithms always have a higher computational cost than their sequential equivalents, there are cases where you may want the sequential version even though it's slower. One example is when you are optimizing for low energy consumption rather than low computational time. Even though this is probably a very rare case (perhaps a solar-powered, galaxy-exploring spacecraft), it might be worth noting.

Implementing parallel std::transform()

Although std::transform() is algorithmically easy to implement, in practice even a rudimentary parallel version is more complex than it might appear at first sight.
A naive parallel implementation of std::transform() would probably look something like this:
  • Divide the elements into chunks corresponding to the number of cores in the computer
  • Execute each chunk in a separate task in parallel
  • Wait for all tasks to finish

Naive implementation

Using std::thread::hardware_concurrency() to determine the number of supported hardware threads, a naive implementation could look like the following. Note that hardware_concurrency() might return 0 if the value cannot be determined, and therefore it is clamped to be at least 1:
#include <algorithm>
#include <future>
#include <thread>
#include <vector>

template <typename SrcIt, typename DstIt, typename Func>
auto par_transform_naive(SrcIt first, SrcIt last, DstIt dst, Func f) {
  auto n = static_cast<size_t>(std::distance(first, last));
  auto num_tasks = std::max<size_t>(std::thread::hardware_concurrency(), 1);
  // Round upwards so that the chunks cover all n elements
  auto chunk_sz = (n + num_tasks - 1) / num_tasks;
  auto futures = std::vector<std::future<void>>{};
  futures.reserve(num_tasks);
  // Invoke each chunk on a separate task, to be executed in parallel
  for (size_t task_idx = 0; task_idx < num_tasks; ++task_idx) {
    auto start_idx = chunk_sz * task_idx;
    auto stop_idx = std::min(chunk_sz * (task_idx + 1), n);
    if (start_idx >= stop_idx) { break; } // No elements left to process
    auto fut = std::async([first, dst, start_idx, stop_idx, &f] {
      std::transform(first + start_idx, first + stop_idx,
                     dst + start_idx, f);
    });
    futures.emplace_back(std::move(fut));
  }
  // Wait for all tasks to finish
  for (auto& fut : futures) { fut.wait(); }
}
