
eBook - ePub
CUDA Fortran for Scientists and Engineers
Best Practices for Efficient CUDA Fortran Programming
- 338 pages
- English
- ePUB (mobile friendly)
- Available on iOS & Android
About this book
CUDA Fortran for Scientists and Engineers shows how high-performance application developers can leverage the power of GPUs using Fortran, the familiar language of scientific computing and supercomputer performance benchmarking. The authors presume no prior parallel computing experience, and cover the basics along with best practices for efficient GPU computing using CUDA Fortran.
To help you add CUDA Fortran to existing Fortran codes, the book explains how to understand the target GPU architecture, identify computationally intensive parts of the code, and modify the code to manage the data and parallelism and optimize performance. All of this is done in Fortran, without having to rewrite in another language. Each concept is illustrated with actual examples so you can immediately evaluate the performance of your code in comparison.
- Leverage the power of GPU computing with PGI's CUDA Fortran compiler
- Gain insights from members of the CUDA Fortran language development team
- Includes multi-GPU programming in CUDA Fortran, covering both peer-to-peer and message passing interface (MPI) approaches
- Includes full source code for all the examples and several case studies
- Download source code and slides from the book's companion website
Part I: CUDA Fortran Programming
Outline
Chapter 1 Introduction
Chapter 2 Performance Measurement and Metrics
Chapter 3 Optimization
Chapter 4 Multi-GPU Programming
Chapter 1
Introduction
Abstract
After a short discussion of the history of parallel computation on graphics processing units, or GPUs, this chapter goes through a sequence of simple examples that illustrate the fundamental aspects of computation on GPUs using CUDA Fortran. These examples illustrate the hybrid nature of CUDA Fortran programming, which combines host code that runs on the CPU with device code that executes on the GPU. Ways to determine hardware features and capabilities from within CUDA Fortran code are presented, as are error handling, compilation of CUDA Fortran code, and system management.
Keywords
Data parallelism; Hybrid computation; Host and device code; Kernel; Execution configuration; Compute capability; Error handling; Compilation; Device management
1.1 A brief history of GPU computing
Parallel computing has been around in one form or another for many decades. In the early stages it was generally confined to practitioners who had access to large and expensive machines. Today, things are very different. Almost all consumer desktop and laptop computers have central processing units, or CPUs, with multiple cores. Even most processors in cell phones and tablets have multiple cores. The principal reason for the nearly ubiquitous presence of multiple cores in CPUs is the inability of CPU manufacturers to increase performance in single-core designs by boosting the clock speed. As a result, since about 2005 CPU designs have "scaled out" to multiple cores rather than "scaled up" to higher clock rates. Although CPUs are available with a few to tens of cores, this amount of parallelism pales in comparison to the number of cores in a graphics processing unit (GPU). For example, the NVIDIA Tesla® K20X contains 2688 cores. GPUs were highly parallel architectures from their beginning, in the mid-1990s, since graphics processing is an inherently parallel task.
The use of GPUs for general-purpose computing, often referred to as GPGPU, was initially a challenging endeavor. One had to program to the graphics application programming interface (API), which proved to be very restrictive in the types of algorithms that could be mapped to the GPU. Even when such a mapping was possible, the programming required to make this happen was difficult and not intuitive for scientists and engineers outside the computer graphics vocation. As such, adoption of the GPU for scientific and engineering computations was slow.
Things changed for GPU computing with the advent of NVIDIA's CUDA® architecture in 2007. The CUDA architecture included both hardware components on NVIDIA's GPU and a software programming environment that eliminated the barriers to adoption that plagued GPGPU. Since CUDA's first appearance in 2007, its adoption has been tremendous, to the point where, in November 2010, three of the top five supercomputers in the Top 500 list used GPUs. In the November 2012 Top 500 list, the fastest computer in the world was also GPU-powered. One of the reasons for this very fast adoption of CUDA is that the programming model was very simple. CUDA C, the first interface to the CUDA architecture, is essentially C with a few extensions that can offload portions of an algorithm to run on the GPU. It is a hybrid approach where both CPU and GPU are used, so porting computations to the GPU can be performed incrementally.
In late 2009, a joint effort between The Portland Group® (PGI®) and NVIDIA led to the CUDA Fortran compiler. Just as CUDA C is C with extensions, CUDA Fortran is essentially Fortran 90 with a few extensions that allow users to leverage the power of GPUs in their computations. Many books, articles, and other documents have been written to aid in the development of efficient CUDA C applications (e.g., Sanders and Kandrot, 2011; Kirk and Hwu, 2012; Wilt, 2013). Because it is newer, CUDA Fortran has relatively fewer aids for code development. Much of the material for writing efficient CUDA C translates easily to CUDA Fortran, since the underlying architecture is the same, but there is still a need for material that addresses how to write efficient code in CUDA Fortran. There are a couple of reasons for this. First, though CUDA C and CUDA Fortran are similar, there are some differences that will affect how code is written. This is not surprising, since CPU code written in C and Fortran will typically take on a different character as projects grow. Also, there are some features in CUDA C that are not present in CUDA Fortran, such as certain aspects of textures. Conversely, there are some features in CUDA Fortran, such as the device variable attribute used to denote data that resides on the GPU, that are not present in CUDA C.
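To make the device attribute concrete, here is a minimal CUDA Fortran sketch of the kind of "first example" this chapter builds toward. The module name, kernel name, and array sizes are illustrative choices, not taken from the book; compiling it requires a CUDA Fortran compiler (e.g., PGI's pgfortran) and CUDA-enabled hardware.

```fortran
module kernels_m
contains
  ! Device code: each thread increments one element of the array
  attributes(global) subroutine increment(a, b)
    implicit none
    integer, intent(inout) :: a(:)
    integer, value :: b
    integer :: i
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= size(a)) a(i) = a(i) + b
  end subroutine increment
end module kernels_m

program main
  use cudafor
  use kernels_m
  implicit none
  integer, parameter :: n = 256
  integer :: a(n)
  integer, device :: a_d(n)  ! the device attribute places a_d in GPU memory

  a = 1
  a_d = a                    ! host-to-device transfer via simple assignment
  call increment<<<1, n>>>(a_d, 3)  ! launch one block of n threads
  a = a_d                    ! device-to-host transfer
  if (all(a == 4)) print *, 'Test Passed'
end program main
```

Note how the device attribute lets data movement be expressed as ordinary array assignment, one of the conveniences CUDA Fortran offers over CUDA C's explicit memory-copy calls.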
This book is written for those who want to use parallel computation as a tool in getting other work done rather than as an end in itself. The aim is to give the reader the basic set of skills needed to write reasonably optimized CUDA Fortran code that takes advantage of the NVIDIA® computing hardware. The reason for taking this approach rather than attempting to teach how to extract every last ounce of performance from the hardware is the assumption that those using CUDA Fortran do so as a means rather than an end. Such users typically value clear and maintainable code that is simple to write and performs reasonably well across many generations of CUDA-enabled hardware and CUDA Fortran software.
But where is the line drawn in terms of the effort-performance tradeoff? In the end it is up to the developer to decide how much effort to put into optimizing code. In making this decision, we need to know what type of payoff we can expect when eliminating various bottlenecks and what effort is involved in doing so. One goal of this book is to help the reader develop the intuition needed to make such a return-on-investment assessment. To achieve this end, we discuss bottlenecks encountered in writing common algorithms in science and engineering applications in CUDA Fortran. Multiple workarounds are presented when possible, along with the performance impact of each optimization effort.
1.2 Parallel computation
Before jumping into writing CUDA Fortran code, we should say a few words about where CUDA fits in with other types of parallel programming models. Familiarity with and an understanding of other parallel programming models is not a prerequisite for this book, but for readers who do have some parallel programming experience, this section might be helpful in categorizing CUDA.
We have already mentioned that CUDA is a hybrid computing model, where both the CPU and GPU are used in an application. This is advantageous for development because sections of an existing CPU code can be ported to the GPU incrementally. It is possible to overlap computation on the CPU with computation on the GPU, so this is one aspect of parallelism.
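Because kernel launches in CUDA Fortran are asynchronous with respect to the host, this CPU/GPU overlap falls out naturally. The following sketch is illustrative (the kernel, names, and sizes are assumptions, not from the book): the host continues with independent work while the GPU executes the kernel, and a subsequent device-to-host assignment implicitly waits for the kernel to complete.

```fortran
module scale_m
contains
  ! Device code: each thread scales one element of the array
  attributes(global) subroutine scale(x, c)
    implicit none
    real, intent(inout) :: x(:)
    real, value :: c
    integer :: i
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= size(x)) x(i) = c * x(i)
  end subroutine scale
end module scale_m

program overlap
  use cudafor
  use scale_m
  implicit none
  integer, parameter :: n = 1024
  real :: x(n), y(n)
  real, device :: x_d(n)

  x = 1.0
  x_d = x
  ! The launch returns immediately; the GPU works on x_d ...
  call scale<<<(n + 255)/256, 256>>>(x_d, 2.0)
  ! ... while the CPU performs independent work concurrently
  y = sin(x)
  ! Copying back synchronizes: the assignment waits for the kernel to finish
  x = x_d
  print *, 'max after scaling: ', maxval(x)
end program overlap
```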
A far greater degree of parallelism occurs within the GPU...
Table of contents
- Cover image
- Title page
- Table of Contents
- Copyright
- Dedication
- Acknowledgments
- Preface
- Part I: CUDA Fortran Programming
- Part II: Case Studies
- Part III: Appendices
- References
- Index
CUDA Fortran for Scientists and Engineers, by Gregory Ruetsch and Massimiliano Fatica, is available in PDF and ePUB formats under Computer Science & Digital Media.