Mastering PostgreSQL 11
eBook - ePub

Mastering PostgreSQL 11

Expert techniques to build scalable, reliable, and fault-tolerant database applications, 2nd Edition

Hans-Jürgen Schönig

  1. 450 pages
  2. English
  3. ePUB (mobile friendly)

Book Information

Master the capabilities of PostgreSQL 11 to efficiently manage and maintain your database

Key Features

  • Master advanced concepts of PostgreSQL 11 with real-world datasets and examples
  • Explore query parallelism, data replication, and database performance while working with larger datasets
  • Extend the functionalities of your PostgreSQL instance to suit your organization's needs with minimal effort

Book Description

This second edition of Mastering PostgreSQL 11 helps you build dynamic database solutions for enterprise applications using the latest release of PostgreSQL, which enables database analysts to design both the physical and technical aspects of the system architecture with ease.

This book begins with an introduction to the newly released features in PostgreSQL 11 to help you build efficient and fault-tolerant PostgreSQL applications. You'll examine all of the advanced aspects of PostgreSQL in detail, including logical replication, database clusters, performance tuning, monitoring, and user management. You will also work with the PostgreSQL optimizer, configure PostgreSQL for high speed, and see how to move from Oracle to PostgreSQL. As you progress through the chapters, you will cover transactions, locking, indexes, and optimizing queries to improve performance. Additionally, you'll learn to manage network security and explore backups and replications, while understanding the useful extensions of PostgreSQL so that you can optimize the speed and performance of large databases.

By the end of this book, you will be able to use your database to its utmost capacity by implementing advanced administrative tasks with ease.

What you will learn

  • Get to grips with advanced PostgreSQL 11 features and SQL functions
  • Make use of the indexing features in PostgreSQL and fine-tune the performance of your queries
  • Work with stored procedures and manage backup and recovery
  • Master replication and failover techniques
  • Troubleshoot your PostgreSQL instance for solutions to common and not-so-common problems
  • Perform database migration from MySQL and Oracle to PostgreSQL with ease

Who this book is for

This book is for data and database professionals wanting to implement advanced functionalities and master complex administrative tasks with PostgreSQL 11. Prior experience of PostgreSQL database administration will aid in understanding the concepts covered in this book.


Information

Year
2018
ISBN
9781789539172
Edition
2
Category
Databases

Optimizing Queries for Good Performance

In Chapter 5, Log Files and System Statistics, you learned how to read system statistics and how to make use of what PostgreSQL provides. Armed with this knowledge, this chapter is all about good query performance. You will learn more about the following topics:
  • Optimizer internals
  • Execution plans
  • Partitioning data
  • Enabling and disabling optimizer settings
  • Parameters for good query performance
  • Parallel queries
  • JIT compilation
By the end of this chapter, you will be able to write better and faster queries. If the queries still aren't very good, you should be able to understand why this is the case. You will also be able to use the new techniques you have learned about to partition data.

Learning what the optimizer does

Before even attempting to think about query performance, it makes sense to familiarize yourself with what the query optimizer does. A deeper understanding of what happens under the hood helps you see what the database is really doing with your queries.

Optimizations by example

To demonstrate how the optimizer works, I have compiled an example that I have used over the years in PostgreSQL training. Let's assume that there are three tables, as follows:
CREATE TABLE a (aid int, ...);  -- 100 million rows
CREATE TABLE b (bid int, ...);  -- 200 million rows
CREATE TABLE c (cid int, ...);  -- 300 million rows
Let's further assume that those tables contain millions, or maybe hundreds of millions, of rows. In addition to that, there are indexes:
CREATE INDEX idx_a ON a (aid);
CREATE INDEX idx_b ON b (bid);
CREATE INDEX idx_c ON c (cid);
CREATE VIEW v AS SELECT *
FROM a, b
WHERE aid = bid;
Finally, there is a view joining the first two tables together.
Let's suppose that the end user wants to run the following query. What will the optimizer do with this query? What choices does the planner have?
SELECT *
FROM v, c
WHERE v.aid = c.cid
AND cid = 4;
Before looking at the real optimization process, we will focus on some of the options that the planner has.

Evaluating join options

The planner has a couple of options here, so let's take this opportunity to understand what can go wrong if trivial approaches are used.
Suppose the planner just steams ahead and calculates the output of the view. What is the best way to join 100 million rows with 200 million rows?
In this section, a couple of (not all) join options will be discussed to show you what PostgreSQL is able to do.

Nested loops

One way to join two tables is to use a nested loop. The principle is simple. Here is some pseudocode:
for x in table1:
    for y in table2:
        if x.field == y.field:
            issue row
        else:
            keep going
Nested loops are often used if one of the sides is very small and contains only a limited set of data. In our example, a nested loop would lead to 100 million × 200 million iterations through the code. This is clearly not an option because the runtime would simply explode.
A nested loop is generally O(n²), so it is only efficient if one side of the join is very small. In this example, this is not the case, so a nested loop can be ruled out for calculating the view.
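The pseudocode above can be sketched as a runnable Python function. The table representation (lists of dicts) and the key names are hypothetical choices for illustration, not PostgreSQL internals:

```python
def nested_loop_join(table1, table2, key1, key2):
    # Compare every row of table1 with every row of table2:
    # O(len(table1) * len(table2)) comparisons.
    result = []
    for x in table1:
        for y in table2:
            if x[key1] == y[key2]:
                result.append({**x, **y})
    return result

# Tiny hypothetical tables; with 100 million x 200 million rows,
# this double loop would never finish in reasonable time.
a = [{"aid": 1}, {"aid": 2}]
b = [{"bid": 2}, {"bid": 3}]
print(nested_loop_join(a, b, "aid", "bid"))  # [{'aid': 2, 'bid': 2}]
```

The quadratic cost is exactly why the planner only picks this strategy when one input is tiny.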

Hash joins

The second option is a hash join. The following listing shows the strategy that could be applied to solve our little problem:
Hash join
    Sequential scan table 1
    Sequential scan table 2
Both sides can be hashed, and the hash keys can be compared, leaving us with the result of the join. The trouble here is that all of the values have to be hashed and stored somewhere.
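In the same Python sketch style as before (again with hypothetical table and key names), a hash join builds a hash table on one side and probes it with the other:

```python
def hash_join(table1, table2, key1, key2):
    # Build phase: hash every row of table1 by its join key.
    # This hash table has to be stored somewhere -- the "trouble"
    # mentioned above when the input has hundreds of millions of rows.
    hashed = {}
    for x in table1:
        hashed.setdefault(x[key1], []).append(x)
    # Probe phase: look up each row of table2 in the hash table.
    result = []
    for y in table2:
        for x in hashed.get(y[key2], []):
            result.append({**x, **y})
    return result
```

Average cost is linear in the size of both inputs, which is why a hash join scales far better than a nested loop; PostgreSQL typically builds the hash on the smaller side and spills it to disk in batches if it does not fit in memory.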

Merge joins

Finally, there is the merge join. The idea here is to use sorted lists to join the results. If both sides of the join are sorted, the system can just take rows from the top and see if they match and return them. The main requirement here is that the lists are sorted. Here is a sample plan:
Merge join
    Sort table 1
        Sequential scan table 1
    Sort table 2
        Sequential scan table 2
To join these two tables (table 1 and table 2), data has to be provided in a sorted order. In many cases, PostgreSQL will just sort the data. However, there are other options for providing the join with sorted data. One way is to consult an index, as shown in the following example:
Merge join
    Index scan table 1
    Index scan table 2
One side of the join or both sides can use sorted data coming from lower levels of the plan. If the table is accessed directly, an index is the obvious choice to do that, but only if the returned result set is significantly smaller than the entire table. Otherwise, we encounter almost double the overhead because first we have to read the entire index, then the entire table. If the result set is a large portion of the table, a sequential scan is more efficient, especially if it is being accessed in the primary key order.
The beauty of a merge join is that it can handle a lot of data. The downside is that data has to be sorted or taken from an index at some point.
Sorting is O(n * log(n)). Therefore, sorting 300 million rows to perform the join is not attractive either.
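A merge join over two sorted inputs can be sketched in the same hypothetical Python style, advancing a cursor on each side:

```python
def merge_join(table1, table2, key1, key2):
    # The sort step is the O(n log n) cost discussed above; in a real plan
    # it may be replaced by an index scan that delivers sorted data.
    table1 = sorted(table1, key=lambda r: r[key1])
    table2 = sorted(table2, key=lambda r: r[key2])
    i = j = 0
    result = []
    while i < len(table1) and j < len(table2):
        if table1[i][key1] < table2[j][key2]:
            i += 1          # left side is behind: advance it
        elif table1[i][key1] > table2[j][key2]:
            j += 1          # right side is behind: advance it
        else:
            # Keys match: emit a row for every equal key on the right side.
            k = j
            while k < len(table2) and table2[k][key2] == table1[i][key1]:
                result.append({**table1[i], **table2[k]})
                k += 1
            i += 1
    return result
```

Once both inputs are sorted, the merge itself is a single linear pass, which is what lets this strategy handle very large joins.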
Note that since the introduction of PostgreSQL 10.0, all join options described here are also available in a parallel version. The optimizer will therefore not just consider those standard join options, but also evaluate whether it makes sense to perform parallel queries or not.

Applying transformations

Clearly, the naive approach (joining the view first) makes no sense at all. A nested loop would send the execution time through the roof, a hash join has to hash hundreds of millions of rows, and a merge join has to sort 300 million rows. All three options are clearly unsuitable here. The way out is to apply logical transformations to make the query fast. In this section, you will learn what the planner does to speed up the query. A couple of steps will be performed.

Inlining the view

The first transformation that's done by the optimizer is to inline views. Here is what happens:
SELECT * 
FROM
(
SELECT *
FROM a, b
WHERE aid = bid
) AS v, c
WHERE v.aid = c.cid
AND cid = 4;
The view is inlined and transformed into a subselect. What does this o...
