C++ High Performance
eBook - ePub

Viktor Sehr, Bjorn Andrist

  1. 374 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

Book information

Write code that scales across CPU registers, multi-core, and machine clusters.

About This Book
  • Explore concurrent programming in C++
  • Identify memory management problems
  • Use SIMD and STL containers for performance improvement

Who This Book Is For
If you're a C++ developer looking to improve the speed of your code, or simply wanting to take your skills up to the next level, then this book is perfect for you.

What You Will Learn
  • Benefits of modern C++ constructs and techniques
  • Identify hardware bottlenecks, such as CPU cache misses, to boost performance
  • Write specialized data structures for performance-critical code
  • Use modern metaprogramming techniques to reduce runtime calculations
  • Achieve efficient memory management using custom memory allocators
  • Reduce boilerplate code using reflection techniques
  • Reap the benefits of lock-free concurrent programming
  • Perform under-the-hood optimizations with preserved readability using proxy objects
  • Gain insights into subtle optimizations used by STL algorithms
  • Utilize the Range V3 library for expressive C++ code
  • Parallelize your code over CPU and GPU, without compromising readability

In Detail
C++ is a highly portable language that can be used to write both large-scale applications and performance-critical code. It has evolved over the last few years to become a modern and expressive language. This book will guide you through optimizing the performance of your C++ apps, allowing them to run faster and consume fewer resources on the device they're running on, without compromising the readability of your code base.
The book begins by helping you measure and identify bottlenecks in a C++ code base. It then teaches you how to use modern C++ constructs and techniques, and shows how this affects the way you write code. Next, you'll see the importance of data structure optimization and memory management, and how memory can be used efficiently with respect to CPU caches. After that, you'll see how STL algorithms and composable Range V3 views can be used to achieve both faster execution and more readable code, followed by how to use STL containers and how to write your own specialized iterators. Moving on, you'll get hands-on experience in using modern C++ metaprogramming and reflection to reduce boilerplate code, and in working with proxy objects to perform optimizations under the hood. After that, you'll learn about concurrent programming and lock-free data structures. The book ends with an overview of parallel algorithms using STL execution policies, Boost Compute, and OpenCL to utilize both the CPU and the GPU.

Style and approach
This easy-to-follow guide is full of examples and self-sufficient code snippets to help you with high-performance programming in C++. You'll get your hands dirty with this all-inclusive guide that uncovers hidden performance-improvement areas for any C++ code.


Information

Year: 2018
ISBN: 9781787124776
Edition: 1

Parallel STL

In this chapter, you will learn how to use the computer's graphics processing unit (GPU) for computationally heavy tasks. We will use the excellent Boost Compute library, which exposes the GPU via an interface resembling the STL, meaning that you can move standard C++ code almost seamlessly from the CPU to the GPU.
This chapter is not going to go in depth into the theory of parallelizing algorithms or parallel programming in general; these subjects are far too complex to cover in a single chapter, and there is a multitude of books on them. Instead, this chapter takes a more practical approach and demonstrates how to extend an existing C++ code base to utilize parallelism while preserving its readability.
In other words, we do not want the parallelism to get in the way of readability; rather, we want the parallelism to be abstracted away so that parallelizing the code is only a matter of changing a parameter to an algorithm.
In earlier chapters, we have stressed that we prefer STL algorithms over handcrafted for-loops; in this chapter, we will see some great advantages of using algorithms.
We will start this chapter off by looking at a few parallel implementations of standard algorithms, and the added complexity of writing parallel versions of them. We will then go on to see how we can adapt a code base to use the parallel extensions of STL, and finally we will take a brief look at how we can use the capabilities of the GPU in a simple way by using Boost Compute and OpenCL.

Importance of parallelism

From a programmer's perspective, it would have been very convenient if today's computer hardware were a 100 GHz single-core CPU rather than a 3 GHz multi-core CPU; then we wouldn't need to care about parallelism at all. But since the evolution of computer hardware is heading in the direction of multi-core CPUs, programmers have to use efficient parallel patterns in order to make the most of the hardware.

Parallel algorithms

As mentioned in Chapter 10, Concurrency, by parallelism we refer to programming that takes advantage of hardware with multiple cores. It makes no sense to parallelize an algorithm if the hardware does not provide any of these benefits.
A parallel equivalent of a sequential algorithm is therefore, algorithmically, slower than the sequential one; its benefit comes from the ability to spread the work onto several processing units.
With that in mind, it's also notable that not all algorithms gain the same performance increase when run in parallel. As a simple measure of how well an algorithm scales, we can compare:
  • A: The time it takes to execute sequentially on one CPU core
  • B: The time it takes to execute in parallel, multiplied by the number of cores
If A and B are equal, the algorithm parallelizes perfectly; the larger B is compared to A, the worse the algorithm parallelizes.
How well an algorithm parallelizes depends on how independently each element can be processed. For example, std::transform() is trivial to parallelize in the sense that each element is processed completely independently of every other. This means that, theoretically, on n cores it would execute n times as fast as a sequential execution. In practice, though, a multitude of factors limits parallel execution, such as the cost of creating threads, context switches, and so on, as mentioned in Chapter 10, Concurrency.
As parallel algorithms always have a higher total computational cost than their sequential equivalents, there are cases where you may prefer the sequential version even though it takes longer in wall-clock time. An example of such a case is when you are optimizing for low energy consumption rather than low computation time. Even though this is probably a very rare case (perhaps a solar-powered, galaxy-exploring spacecraft), it might be worth noting.

Implementing parallel std::transform()

Although std::transform() is algorithmically easy to implement, in practice even a rudimentary parallel version is more complex than it might appear at first sight.
A naive parallel implementation of std::transform() would probably look something like this:
  • Divide the elements into chunks corresponding to the number of cores in the computer
  • Execute each chunk in a separate task in parallel
  • Wait for all tasks to finish

Naive implementation

Using std::thread::hardware_concurrency() to determine the number of supported hardware threads, a naive implementation could look like the following. Note that hardware_concurrency() may return 0 if the value cannot be determined, and it is therefore clamped to at least one:
template <typename SrcIt, typename DstIt, typename Func>
auto par_transform_naive(SrcIt first, SrcIt last, DstIt dst, Func f) {
  auto n = static_cast<size_t>(std::distance(first, last));
  auto num_tasks = std::max<size_t>(std::thread::hardware_concurrency(), 1);
  // Round the chunk size up, so that the chunks cover all n elements
  auto chunk_sz = (n + num_tasks - 1) / num_tasks;
  auto futures = std::vector<std::future<void>>{};
  futures.reserve(num_tasks);
  // Invoke each chunk on a separate task, to be executed in parallel
  for (size_t task_idx = 0; task_idx < num_tasks; ++task_idx) {
    auto start_idx = std::min(chunk_sz * task_idx, n);
    auto stop_idx = std::min(chunk_sz * (task_idx + 1), n);
    auto fut = std::async([first, dst, start_idx, stop_idx, &f]() {
      std::transform(first + start_idx, first + stop_idx,
                     dst + start_idx, f);
    });
    futures.emplace_back(std::move(fut));
  }
  // Wait for all tasks to finish
  for (auto& fut : futures) {
    fut.wait();
  }
}
