C++ High Performance
Viktor Sehr, Bjorn Andrist
- 374 pages
- English
About This Book
Write code that scales across CPU registers, multi-core, and machine clusters
- Explore concurrent programming in C++
- Identify memory management problems
- Use SIMD and STL containers for performance improvement
Who This Book Is For
If you're a C++ developer looking to improve the speed of your code or simply wanting to take your skills up to the next level, then this book is perfect for you.
What You Will Learn
- Benefits of modern C++ constructs and techniques
- Identify hardware bottlenecks, such as CPU cache misses, to boost performance
- Write specialized data structures for performance-critical code
- Use modern metaprogramming techniques to reduce runtime calculations
- Achieve efficient memory management using custom memory allocators
- Reduce boilerplate code using reflection techniques
- Reap the benefits of lock-free concurrent programming
- Perform under-the-hood optimizations with preserved readability using proxy objects
- Gain insights into subtle optimizations used by STL algorithms
- Utilize the Range V3 library for expressive C++ code
- Parallelize your code over CPU and GPU, without compromising readability
In Detail
C++ is a highly portable language and can be used to write both large-scale applications and performance-critical code. It has evolved over the last few years to become a modern and expressive language. This book will guide you through optimizing the performance of your C++ apps by allowing them to run faster and consume fewer resources on the device they're running on without compromising the readability of your code base.
The book begins by helping you measure and identify bottlenecks in a C++ code base. It then moves on by teaching you how to use modern C++ constructs and techniques. You'll see how this affects the way you write code. Next, you'll see the importance of data structure optimization and memory management, and how they can be used efficiently with respect to CPU caches. After that, you'll see how STL algorithms and the composable Range V3 library can be used to achieve both faster execution and more readable code, followed by how to use STL containers and how to write your own specialized iterators.
Moving on, you'll get hands-on experience in making use of modern C++ metaprogramming and reflection to reduce boilerplate code as well as in working with proxy objects to perform optimizations under the hood. After that, you'll learn concurrent programming and understand lock-free data structures. The book ends with an overview of parallel algorithms using STL execution policies, Boost Compute, and OpenCL to utilize both the CPU and the GPU.
Style and approach
This easy-to-follow guide is full of examples and self-sufficient code snippets that help you with high performance programming with C++. You'll get your hands dirty with this all-inclusive guide that uncovers hidden performance improvement areas for any C++ code.
Parallel STL
Importance of parallelism
Parallel algorithms
- A: The time it takes to execute sequentially on one CPU core
- B: The time it takes to execute in parallel, multiplied by the number of cores
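One way to read the two quantities above is as a ratio, A divided by B. The following is a minimal sketch, assuming both timings have already been measured; the function name `parallel_efficiency` and its parameters are hypothetical and not taken from the book. A result of 1.0 means the work parallelizes perfectly, and smaller values mean some core time is lost to overhead or waiting.

```cpp
#include <cassert>

// A: sequential time on one core, in seconds.
// B: parallel time multiplied by the number of cores, in seconds.
// Returns A / B (hypothetical helper, for illustration only).
double parallel_efficiency(double sequential_s, double parallel_s,
                           unsigned cores) {
  assert(parallel_s > 0.0 && cores > 0);
  const double b = parallel_s * static_cast<double>(cores);
  return sequential_s / b;
}
```

For example, a job that takes 8 s sequentially and 4 s on 4 cores has A = 8 and B = 16, so the ratio is 0.5.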
Implementing parallel std::transform()
- Divide the elements into chunks corresponding to the number of cores in the computer
- Execute each chunk in a separate task in parallel
- Wait for all tasks to finish
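The first step above, dividing the elements into chunks, can be sketched in isolation. This is an illustration under stated assumptions: the helper name `chunk_ranges` is hypothetical, and the choice of ceiling division (so that leftover elements land in a final, shorter chunk) is one possible way to split the range, not necessarily the book's.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split the index range [0, n) into at most
// num_tasks half-open ranges [start, stop).
std::vector<std::pair<std::size_t, std::size_t>>
chunk_ranges(std::size_t n, std::size_t num_tasks) {
  assert(num_tasks > 0);
  // Ceiling division, so elements left over by an uneven split
  // end up in a final, shorter chunk instead of being dropped.
  const std::size_t chunk_sz = (n + num_tasks - 1) / num_tasks;
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  for (std::size_t start = 0; start < n; start += chunk_sz) {
    ranges.emplace_back(start, std::min(start + chunk_sz, n));
  }
  return ranges;
}
```

With 10 elements and 4 tasks this yields the ranges [0, 3), [3, 6), [6, 9) and [9, 10). Each range would then be handed to its own task, and the caller waits for all of them.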
Naive implementation
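#include <algorithm>
#include <cstddef>
#include <future>
#include <iterator>
#include <thread>
#include <vector>

template <typename SrcIt, typename DstIt, typename Func>
auto par_transform_naive(SrcIt first, SrcIt last, DstIt dst, Func f) {
  auto n = static_cast<size_t>(std::distance(first, last));
  // hardware_concurrency() returns unsigned and may be 0 when unknown
  auto num_tasks = std::max<size_t>(std::thread::hardware_concurrency(), 1);
  // Round up so that leftover elements are not dropped
  auto chunk_sz = std::max<size_t>((n + num_tasks - 1) / num_tasks, 1);
  auto futures = std::vector<std::future<void>>{};
  futures.reserve(num_tasks);
  // Invoke each chunk on a separate task, to be executed in parallel
  for (size_t task_idx = 0; task_idx < num_tasks; ++task_idx) {
    // Clamp both ends to n so that start_idx never passes stop_idx
    auto start_idx = std::min(chunk_sz * task_idx, n);
    auto stop_idx = std::min(chunk_sz * (task_idx + 1), n);
    auto fut = std::async([first, dst, start_idx, stop_idx, &f](){
      std::transform(fi...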