Mastering PostgreSQL 11
eBook - ePub

Mastering PostgreSQL 11

Expert techniques to build scalable, reliable, and fault-tolerant database applications, 2nd Edition

Hans-Jürgen Schönig

  1. 450 pages
  2. English
  3. ePUB (mobile friendly)

About This Book

Master the capabilities of PostgreSQL 11 to efficiently manage and maintain your database

Key Features

  • Master advanced concepts of PostgreSQL 11 with real-world datasets and examples
  • Explore query parallelism, data replication, and database performance while working with larger datasets
  • Extend the functionalities of your PostgreSQL instance to suit your organization's needs with minimal effort

Book Description

This second edition of Mastering PostgreSQL 11 helps you build dynamic database solutions for enterprise applications using the latest release of PostgreSQL, which enables database analysts to design both the physical and technical aspects of the system architecture with ease.

This book begins with an introduction to the newly released features in PostgreSQL 11 to help you build efficient and fault-tolerant PostgreSQL applications. You'll examine all of the advanced aspects of PostgreSQL in detail, including logical replication, database clusters, performance tuning, monitoring, and user management. You will also work with the PostgreSQL optimizer, configuring PostgreSQL for high speed, and see how to move from Oracle to PostgreSQL. As you progress through the chapters, you will cover transactions, locking, indexes, and optimizing queries to improve performance. Additionally, you'll learn to manage network security and explore backups and replications, while understanding the useful extensions of PostgreSQL so that you can optimize the speed and performance of large databases.

By the end of this book, you will be able to use your database to its utmost capacity by implementing advanced administrative tasks with ease.

What you will learn

  • Get to grips with advanced PostgreSQL 11 features and SQL functions
  • Make use of the indexing features in PostgreSQL and fine-tune the performance of your queries
  • Work with stored procedures and manage backup and recovery
  • Master replication and failover techniques
  • Troubleshoot your PostgreSQL instance for solutions to common and not-so-common problems
  • Perform database migration from MySQL and Oracle to PostgreSQL with ease

Who this book is for

This book is for data and database professionals who want to implement advanced functionalities and master complex administrative tasks with PostgreSQL 11. Prior experience of database administration with PostgreSQL will aid in understanding the concepts covered in this book.




Optimizing Queries for Good Performance

In Chapter 5, Log Files and System Statistics, you learned how to read system statistics and how to make use of what PostgreSQL provides. Armed with this knowledge, this chapter is all about good query performance. You will learn more about the following topics:
  • Optimizer internals
  • Execution plans
  • Partitioning data
  • Enabling and disabling optimizer settings
  • Parameters for good query performance
  • Parallel queries
  • JIT compilation
By the end of this chapter, we will be able to write better and faster queries. If the queries still aren't very good, we should be able to understand why this is the case. We will also be able to use the new techniques we have learned about to partition data.

Learning what the optimizer does

Before even attempting to think about query performance, it makes sense to familiarize yourself with what the query optimizer does. A deeper understanding of what is going on under the hood helps you see what the database is really doing and why.

Optimizations by example

To demonstrate how the optimizer works, I have compiled an example, one that I have used over the years for PostgreSQL training. Let's assume that there are three tables, as follows:
CREATE TABLE a (aid int, ...);   -- 100 million rows
CREATE TABLE b (bid int, ...);   -- 200 million rows
CREATE TABLE c (cid int, ...);   -- 300 million rows
Let's further assume that those tables contain millions, or maybe hundreds of millions, of rows. In addition to that, there are indexes:
CREATE INDEX idx_a ON a (aid);
CREATE INDEX idx_b ON b (bid);
CREATE INDEX idx_c ON c (cid);

Finally, there is a view joining the first two tables together:

CREATE VIEW v AS
SELECT *
FROM a, b
WHERE aid = bid;
Let's suppose that the end user wants to run the following query. What will the optimizer do with this query? What choices does the planner have?

SELECT *
FROM v, c
WHERE v.aid = c.cid
AND cid = 4;
Before looking at the real optimization process, we will focus on some of the options that the planner has.

Evaluating join options

The planner has a couple of options here, so let's take this opportunity to understand what can go wrong if trivial approaches are used.
Suppose the planner just steams ahead and calculates the output of the view. What is the best way to join 100 million rows with 200 million rows?
In this section, a couple of (not all) join options will be discussed to show you what PostgreSQL is able to do.

Nested loops

One way to join two tables is to use a nested loop. The principle is simple. Here is some pseudocode:

for x in table1:
    for y in table2:
        if x.field == y.field:
            issue row
        else:
            keep doing
Nested loops are often used if one of the sides is very small and contains only a limited set of data. In our example, a nested loop would lead to 100 million x 200 million iterations through the code. This is clearly not an option because the runtime would simply explode.
A nested loop is generally O(n²), so it is only efficient if one side of the join is very small. In this example, this is not the case, so a nested loop can be ruled out for calculating the view.
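The pseudocode above can be made runnable as a minimal Python sketch. Rows are modeled here as (key, payload) tuples; this is an illustration of the algorithm, not PostgreSQL's actual implementation:

```python
def nested_loop_join(table1, table2):
    """Join two lists of (key, payload) rows by scanning all of
    table2 for every row of table1: O(len(table1) * len(table2))."""
    result = []
    for k1, p1 in table1:
        for k2, p2 in table2:
            if k1 == k2:
                result.append((k1, p1, p2))
    return result

a = [(1, "a1"), (2, "a2"), (4, "a4")]
b = [(2, "b2"), (4, "b4"), (5, "b5")]
print(nested_loop_join(a, b))  # rows with matching keys 2 and 4
```

With 100 million and 200 million rows, the inner loop body would run 2 × 10¹⁶ times, which is why this strategy is only attractive when one side is tiny.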

Hash joins

The second option is a hash join. The following listing shows how a hash join could solve our little problem:

Hash join
    Sequential scan table 1
    Sequential scan table 2
Both sides can be hashed and the hash keys could be compared, leaving us with the result of the join. The trouble here is that all of the values have to be hashed and stored somewhere.
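To make the build-and-probe idea concrete, here is a minimal Python sketch of a hash join (again using (key, payload) tuples; a simplified illustration, not PostgreSQL's implementation, which also has to spill the hash table to disk when it does not fit in memory):

```python
def hash_join(table1, table2):
    """Hash join: build a hash table on one side, probe with the other."""
    # Build phase: hash table1 on its join key.
    hashed = {}
    for key, payload in table1:
        hashed.setdefault(key, []).append(payload)
    # Probe phase: scan table2 and look up each key in the hash table.
    result = []
    for key, payload in table2:
        for match in hashed.get(key, []):
            result.append((key, match, payload))
    return result

a = [(1, "a1"), (2, "a2"), (4, "a4")]
b = [(2, "b2"), (4, "b4"), (5, "b5")]
print(hash_join(a, b))  # rows with matching keys 2 and 4
```

Note that each side is scanned only once, so the join itself is roughly linear; the cost lies in hashing every value and keeping the hash table somewhere.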

Merge joins

Finally, there is the merge join. The idea here is to use sorted lists to join the results. If both sides of the join are sorted, the system can just take rows from the top and see if they match and return them. The main requirement here is that the lists are sorted. Here is a sample plan:
Merge join
    Sort table 1
        Sequential scan table 1
    Sort table 2
        Sequential scan table 2
To join these two tables (table 1 and table 2), data has to be provided in a sorted order. In many cases, PostgreSQL will just sort the data. However, there are other options for providing the join with sorted data. One way is to consult an index, as shown in the following example:
Merge join
    Index scan table 1
    Index scan table 2
One side of the join or both sides can use sorted data coming from lower levels of the plan. If the table is accessed directly, an index is the obvious choice to do that, but only if the returned result set is significantly smaller than the entire table. Otherwise, we encounter almost double the overhead because first we have to read the entire index, then the entire table. If the result set is a large portion of the table, a sequential scan is more efficient, especially if it is being accessed in the primary key order.
The beauty of a merge join is that it can handle a lot of data. The downside is that data has to be sorted or taken from an index at some point.
Sorting is O(n * log(n)). Therefore, sorting 300 million rows to perform the join is not attractive either.
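The "take rows from the top and see if they match" idea can be sketched in a few lines of Python. Both inputs are lists of (key, payload) tuples that are already sorted by key; this is a simplified illustration of the algorithm, not PostgreSQL's implementation:

```python
def merge_join(table1, table2):
    """Merge join: walk two key-sorted lists in lockstep."""
    result = []
    i = j = 0
    while i < len(table1) and j < len(table2):
        k1, k2 = table1[i][0], table2[j][0]
        if k1 < k2:
            i += 1          # left side is behind; advance it
        elif k1 > k2:
            j += 1          # right side is behind; advance it
        else:
            # Keys match: emit all right-side rows with this key.
            j_start = j
            while j < len(table2) and table2[j][0] == k1:
                result.append((k1, table1[i][1], table2[j][1]))
                j += 1
            i += 1
            # If the next left row repeats the key, rewind the right side.
            if i < len(table1) and table1[i][0] == k1:
                j = j_start
    return result

a = [(1, "a1"), (2, "a2"), (4, "a4")]   # sorted by key
b = [(2, "b2"), (4, "b4"), (5, "b5")]   # sorted by key
print(merge_join(a, b))  # rows with matching keys 2 and 4
```

The join itself is a single linear pass over both inputs; all the real cost is in producing the sorted order, either by sorting or by reading an index.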
Note that since PostgreSQL 10.0, all of the join options described here are also available in parallel versions. The optimizer will therefore consider not just these standard join options, but will also evaluate whether it makes sense to run them as parallel queries.

Applying transformations

Clearly, doing the obvious thing (calculating the view first) makes no sense at all. A nested loop would send the execution time through the roof, a hash join has to hash hundreds of millions of rows, and a merge join has to sort up to 300 million rows. All three options are clearly unsuitable here. The way out is to apply logical transformations to make the query fast. In this section, you will learn what the planner does to speed up the query. A couple of steps will be performed.

Inlining the view

The first transformation done by the optimizer is to inline views. Here is what happens:

SELECT *
FROM
(
    SELECT *
    FROM a, b
    WHERE aid = bid
) AS v, c
WHERE v.aid = c.cid
AND cid = 4;
The view is inlined and transformed into a subselect. What does this o...
