Julia High Performance
eBook - ePub

Julia High Performance

Optimizations, distributed computing, multithreading, and GPU programming with Julia 1.0 and beyond, 2nd Edition

  1. 218 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Design and develop high-performance programs in Julia 1.0

Key Features

  • Learn the characteristics of high-performance Julia code
  • Use the power of the GPU to write efficient numerical code
  • Speed up your computation with the help of newly introduced shared memory multi-threading in Julia 1.0

Book Description

Julia is a high-level, high-performance dynamic programming language for numerical computing. If you want to understand how to avoid bottlenecks and design your programs for the highest possible performance, then this book is for you.

The book starts with how Julia uses type information to achieve its performance goals, and how to use multiple dispatch to help the compiler emit high-performance machine code. After that, you will learn how to analyze Julia programs and identify issues with time and memory consumption. We teach you how to use Julia's typing facilities accurately to write high-performance code, and describe how the Julia compiler uses type information to create fast machine code. Moving ahead, you'll master design constraints and learn how to use the power of the GPU in your Julia code, compiling Julia code directly to the GPU. Then, you'll learn how tasks and asynchronous IO help you create responsive programs, and how to use shared memory multithreading in Julia. Toward the end, you will get a flavor of Julia's distributed computing capabilities and learn how to run Julia programs on a large distributed cluster.

By the end of this book, you will have the ability to build large-scale, high-performance Julia applications, design systems with a focus on speed, and improve the performance of existing programs.

What you will learn

  • Understand how Julia code is transformed into machine code
  • Measure the time and memory taken by Julia programs
  • Create fast machine code using Julia's type information
  • Define and call functions without compromising Julia's performance
  • Accelerate your code via the GPU
  • Use tasks and asynchronous IO for responsive programs
  • Run Julia programs on large distributed clusters

Who this book is for

This book is for beginners and intermediate Julia programmers who are interested in high-performance technical programming. A basic knowledge of Julia programming is assumed.

Julia High Performance by Avik Sengupta is available in PDF and ePUB format, along with other popular books in Computer Science & Natural Language Processing.

Using Arrays

It should come as no surprise to the reader of this book that array operations are often the cornerstone of scientific and numeric programming. While arrays are a fundamental data structure in all programming, there are special considerations to bear in mind when they are used in numerical programming. One particular difference is that arrays are not viewed merely as entities for data storage. Rather, they may represent the fundamental mathematical structures of vectors and matrices.
In this chapter, we will discuss how to use arrays in Julia in the fastest possible way. When you profile your program, you will find that in many cases, the majority of its execution time is spent in array operations. Therefore, the discussions in this chapter will likely turn out to be crucial in creating high-performance Julia code. The following are the topics we will cover:
  • Array internals and storage
  • Bounds checks
  • In-place operations
  • Broadcasting
  • Subarrays and array views
  • SIMD parallelization using AVX
  • Specialized array types
  • Writing generic library functions using arrays

Array internals in Julia

We discussed how Julia's performance comes from using type information to compile specific, fast machine code for different data types. Nowhere is this more apparent than in array-related code. This is probably the use case in which all of Julia's design choices pay off in creating high-performance code.

Array representation and storage

The array type in Julia is parameterized by the type of its elements and the number of its dimensions. Hence, the type of an array is represented as Array{T, N}, where T is the type of its elements, and N is the number of dimensions. So, for example, Array{String, 1} is a one-dimensional array of strings, while Array{Float64, 2} is a two-dimensional array of floating point numbers.
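These type parameters can be inspected directly in the REPL. The following is a small illustrative sketch (the variable names are ours, not the book's):

```julia
# Inspecting the element type and dimensionality of arrays
a = ["hello", "world"]      # a one-dimensional array of String
b = rand(3, 4)              # a two-dimensional array of Float64

typeof(a)   # Array{String,1} (also written Vector{String})
typeof(b)   # Array{Float64,2} (also written Matrix{Float64})
eltype(b)   # Float64
ndims(b)    # 2
```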
Type parameters
You may have realized that type parameters in Julia do not always have to be other types; they can be symbols, or instances of any bits type. This makes Julia's type system enormously powerful. It allows the type system to represent complex relationships, and enables many operations to be performed at compile (or dispatch) time, rather than at runtime.
Representing the type of its element within the type of arrays as a type parameter allows powerful optimization. It allows arrays of primitive types (and many immutable types) to be stored inline. In other words, the elements of the array are stored within the array's own primary memory allocation.
In the following diagram, we illustrate this storage mechanism. The numbers in the top row represent array indexes, while the numbers in the boxes are the integer elements stored within the array. The numbers in the bottom row represent the memory addresses where these elements are stored:
In most other dynamic languages, all the arrays are stored using pointers to their values. This is usually because the language runtime does not have enough information about the types of values to be stored in an array, and hence, cannot allocate the correctly sized storage.
As represented in the following diagram, when an array is allocated, the contiguous storage simply consists of pointers to the actual elements, even when these elements are primitive types that can be stored natively in memory:
This method of storing arrays inline, avoiding pointer indirection as much as possible, has many advantages and, as we discussed earlier, is responsible for many of Julia's performance claims. In other dynamic languages, the type of every element of the array is uncertain, and the compiler has to insert type checks on each access. This can quickly add up and become a major performance drain.
Further, even when every element of the array is of the same type, we pay the cost of memory load for every array element if they are stored as pointers. Given the relative costs of a CPU operation versus a memory load on a modern processor, not doing this is a huge benefit.
There are other benefits too. When the compiler and CPU notice operations on a contiguous block of memory, CPU pipelining and caching are much more efficient. Some CPU optimizations, such as Single Instruction Multiple Data (SIMD), are also unavailable when using indirect array loads.
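The difference between inline and pointer-based storage can be observed directly. In the following sketch, the variable names are ours; Base.summarysize measures the total memory reachable from a value, so it counts the boxed elements that a Vector{Any} points to:

```julia
# Contrasting inline storage with pointer-based (boxed) storage.
# A Vector{Int64} stores its 8-byte elements inline, while a
# Vector{Any} stores one pointer per element, with each boxed
# value allocated separately on the heap.
inline = collect(Int64, 1:1000)
boxed  = collect(Any, 1:1000)

sizeof(inline)            # 8000: the element data itself
Base.summarysize(inline)  # slightly more: the data plus the array header
Base.summarysize(boxed)   # considerably more: pointers plus boxed values
```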

Column-wise storage

When an array has only one dimension, its elements can be stored one after the other in a contiguous block of memory. As we observed in the previous section, operating on this array sequentially from its starting index to its end can be very fast, making it amenable to many compiler and CPU optimizations.
Two-dimensional (or higher) arrays can, however, be stored in two different ways: row-wise or column-wise. In other words, we can store, from the beginning of the array, the elements of the first row, followed by the elements of the second row, and so on. Alternatively, we can store the elements of the first column, then the elements of the second column, and so on.
Consider, in the following diagram, the matrix with three rows and four columns:
This matrix can be stored in two different ways, as illustrated in the following diagram. In the Row Major option, the elements of a row are stored together. In the Column Major option, the elements of a column are stored together:
Arrays in C are stored in row-major order. Julia, on the other hand, chooses the latter strategy, storing arrays in column-major order, similar to MATLAB and Fortran. This rule generalizes to higher-dimensional arrays as well: the first dimension is contiguous in memory, while the last dimension varies slowest.
Naming conventions

Conventionally, the term row refers to the first dimension of a two-dimensional array, and column refers to the second dimension. As an example, for a two-dimensional array of floats, x::Array{Float64, 2}, the expression x[2, 4] refers to the element in the second row and fourth column.
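The column-major layout can be seen directly by indexing an array with a single linear index, or by flattening it with vec, as in this short sketch:

```julia
# Linear indexing and vec expose the column-major layout.
m = [1 2 3; 4 5 6]   # a 2×3 matrix

m[1, 2]   # 2 — first row, second column
m[2]      # 4 — the second element in memory lies down the first column
vec(m)    # [1, 4, 2, 5, 3, 6]: columns stored one after another
```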
This particular strategy of storing arrays has implications for how we navigate them. The most efficient way to read an array is in the same order in which it is laid out in memory. That is, each sequential read should access contiguous areas in memory.
We can demonstrate the performance impact of reading arrays in sequence with the following code that squares and sums the elements of a two-dimensional floating point array, writing the result at each step back to the same position. The following code exercises both the read and write operations for the array:
function col_iter(x)
    s = zero(eltype(x))
    for i in 1:size(x, 2)
        for j in 1:size(x, 1)
            s = s + x[j, i] ^ 2
            x[j, i] = s
        end
    end
end

function row_iter(x)
    s = zero(eltype(x))
    for i in 1:size(x, 1)
        for j in 1:size(x, 2)
            s = s + x[i, j] ^ 2
            x[i, j] = s
        end
    end
end
The row_iter function traverses the array row by row, while the col_iter function traverses it column by column. Based on the storage layout described previously, we expect the col_iter function to be considerably faster than the row_iter function. Running the benchmarks, this is indeed what we see, as follows:
julia> a = rand(1000, 1000);


julia> @btime col_iter($a)
1.116 ms (0 allocations: 0 bytes)

julia> @btime row_iter($a)
3.429 ms (0 allocations: 0 bytes)
The difference between the two is quite significant: the column-major access is roughly three times as fast. This kind of difference in the inner loop of an algorithm can make a very noticeable difference in the overall runtime of a program. It is, therefore, crucial to consider the order in which multidimensional arrays are processed when writing performance-sensitive code.
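When the traversal order does not affect the algorithm's result, iterating with eachindex sidesteps the problem entirely: Julia chooses an index order that is efficient for the array's actual layout. A minimal sketch (sum_squares is a hypothetical helper, not from the book):

```julia
# eachindex iterates in an order that is efficient for the array's
# memory layout, so we need not hand-order the nested loops ourselves.
function sum_squares(x)
    s = zero(eltype(x))
    for i in eachindex(x)
        s += x[i] ^ 2
    end
    return s
end

sum_squares([1 2; 3 4])   # 30
```

This style also works unchanged for array types with non-standard layouts, such as the Adjoint views discussed next.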

Adjoints

Sometimes, you may have an algorithm that is naturally expressed row-first. Converting it to column-first iteration can be a difficult, error-prone process. In such cases, Julia provides an Adjoint type that stores the transpose of a matrix as a view into the original matrix, making row-wise iteration the fast option. We illustrate this in the following code:
julia> b=a'
1000×1000 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
...

julia> @btime col_iter($b)
3.521 ms (0 allocations: 0 bytes)

julia> @btime row_iter($b)
1.160 ms (0 allocations: 0 bytes)
We see here that the timings of the two functions have been inverted from those in the previous section: column iteration is now slower than row iteration. You will also notice that the time for row iteration over the adjoint is essentially the same as column iteration over the original array was previously. This implies that using an Adjoint adds virtually no performance overhead.
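One point worth remembering is that a' is a lazy view sharing memory with the original, whereas permutedims produces an independent transposed copy. A small deterministic sketch (variable names are ours):

```julia
# The adjoint is a view that shares data with its parent matrix,
# while permutedims produces an independent transposed copy.
a = [1.0 2.0; 3.0 4.0]
b = a'                 # Adjoint view: no data copied
c = permutedims(a)     # explicit transposed copy

a[2, 1] = 99.0
b[1, 2]   # 99.0 — the view sees the mutation
c[1, 2]   # 3.0  — the copy is unaffected
```

Choose the view when you only need transposed iteration or indexing, and the copy when the transposed data will be traversed many times and a contiguous layout pays off.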

Array initialization

Creating a new array is a two-step process. First, memory for the array needs to be allocated. The size of this memory is the product of the length of the array and the size of each element (ignoring the small overhead of the type tag at the top). Second, the allocated memory n...
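Although the excerpt is cut off here, the standard allocation idioms in Base can be sketched as follows:

```julia
# Common allocation and initialization idioms.
x = Array{Float64}(undef, 1000)  # allocate only; contents are uninitialized
y = zeros(Float64, 1000)         # allocate and fill with zeros
z = fill(1.5, 100, 100)          # allocate and fill with a given value
s = similar(y)                   # same type and shape as y, uninitialized
```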

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. Dedication
  4. About Packt
  5. Foreword
  6. Contributors
  7. Preface
  8. Julia is Fast
  9. Analyzing Performance
  10. Types, Type Inference, and Stability
  11. Making Fast Function Calls
  12. Fast Numbers
  13. Using Arrays
  14. Accelerating Code with the GPU
  15. Concurrent Programming with Tasks
  16. Threads
  17. Distributed Computing with Julia
  18. Licences
  19. Other Books You May Enjoy