From ad7cd5f4e437d4a926828c625615f761375b1330 Mon Sep 17 00:00:00 2001 From: Amy Qualls <aqualls@gitlab.com> Date: Thu, 4 Aug 2022 09:34:42 +0000 Subject: [PATCH] Move explain-plan page to database folder Reunite this page with its friends in development/database/ --- doc/development/chatops_on_gitlabcom.md | 2 +- .../database/database_reviewer_guidelines.md | 4 +- doc/development/database/index.md | 2 +- .../database/pagination_guidelines.md | 2 +- .../database/understanding_explain_plans.md | 829 +++++++++++++++++ doc/development/database_review.md | 8 +- doc/development/query_performance.md | 4 +- .../understanding_explain_plans.md | 832 +----------------- 8 files changed, 847 insertions(+), 836 deletions(-) create mode 100644 doc/development/database/understanding_explain_plans.md diff --git a/doc/development/chatops_on_gitlabcom.md b/doc/development/chatops_on_gitlabcom.md index 2065021c61b5..7309b92c702c 100644 --- a/doc/development/chatops_on_gitlabcom.md +++ b/doc/development/chatops_on_gitlabcom.md @@ -59,5 +59,5 @@ To request access to ChatOps on GitLab.com: ## See also - [ChatOps Usage](../ci/chatops/index.md) -- [Understanding EXPLAIN plans](understanding_explain_plans.md) +- [Understanding EXPLAIN plans](database/understanding_explain_plans.md) - [Feature Groups](feature_flags/index.md#feature-groups) diff --git a/doc/development/database/database_reviewer_guidelines.md b/doc/development/database/database_reviewer_guidelines.md index a34ea52454fe..64a570ed43d9 100644 --- a/doc/development/database/database_reviewer_guidelines.md +++ b/doc/development/database/database_reviewer_guidelines.md @@ -53,14 +53,14 @@ that require a more in-depth discussion between the database reviewers and maint - [Database Office Hours Agenda](https://docs.google.com/document/d/1wgfmVL30F8SdMg-9yY6Y8djPSxWNvKmhR5XmsvYX1EI/edit). - <i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [YouTube playlist with past recordings](https://www.youtube.com/playlist?list=PL05JrBw4t0Kp-kqXeiF7fF7cFYaKtdqXM). -You should also join the [#database-lab](../understanding_explain_plans.md#database-lab-engine) +You should also join the [#database-lab](understanding_explain_plans.md#database-lab-engine) Slack channel and get familiar with how to use Joe, the Slackbot that provides developers with their own clone of the production database. Understanding and efficiently using `EXPLAIN` plans is at the core of the database review process. The following guides provide a quick introduction and links to follow on more advanced topics: -- Guide on [understanding EXPLAIN plans](../understanding_explain_plans.md). +- Guide on [understanding EXPLAIN plans](understanding_explain_plans.md). - [Explaining the unexplainable series in `depesz`](https://www.depesz.com/tag/unexplainable/). 
We also have licensed access to The Art of PostgreSQL available, if you are interested in getting access please check out the
diff --git a/doc/development/database/index.md b/doc/development/database/index.md
index b427f54ff3cb..7dff125dc42f 100644
--- a/doc/development/database/index.md
+++ b/doc/development/database/index.md
@@ -16,7 +16,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 
 ## Tooling
 
-- [Understanding EXPLAIN plans](../understanding_explain_plans.md)
+- [Understanding EXPLAIN plans](understanding_explain_plans.md)
 - [explain.depesz.com](https://explain.depesz.com/) or [explain.dalibo.com](https://explain.dalibo.com/) for visualizing the output of `EXPLAIN`
 - [pgFormatter](https://sqlformat.darold.net/) a PostgreSQL SQL syntax beautifier
 - [db:check-migrations job](dbcheck-migrations-job.md)
diff --git a/doc/development/database/pagination_guidelines.md b/doc/development/database/pagination_guidelines.md
index 1641708ce016..fe2e3b469394 100644
--- a/doc/development/database/pagination_guidelines.md
+++ b/doc/development/database/pagination_guidelines.md
@@ -192,7 +192,7 @@ The query execution plan shows that this query is efficient, the database only r
 (6 rows)
 ```
 
-See the [Understanding EXPLAIN plans](../understanding_explain_plans.md) to find more information about reading execution plans.
+See [Understanding EXPLAIN plans](understanding_explain_plans.md) for more information about reading execution plans.
 
 Let's visit the 50_000th page:
 
diff --git a/doc/development/database/understanding_explain_plans.md b/doc/development/database/understanding_explain_plans.md
new file mode 100644
index 000000000000..49babde737a8
--- /dev/null
+++ b/doc/development/database/understanding_explain_plans.md
@@ -0,0 +1,829 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Understanding EXPLAIN plans
+
+PostgreSQL allows you to obtain query plans using the `EXPLAIN` command. This
+command can be invaluable when trying to determine how a query performs.
+You can use this command directly in your SQL query, as long as the query starts
+with it:
+
+```sql
+EXPLAIN
+SELECT COUNT(*)
+FROM projects
+WHERE visibility_level IN (0, 20);
+```
+
+When running this on GitLab.com, we are presented with the following output:
+
+```sql
+Aggregate  (cost=922411.76..922411.77 rows=1 width=8)
+  ->  Seq Scan on projects  (cost=0.00..908044.47 rows=5746914 width=0)
+        Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+```
+
+When using _just_ `EXPLAIN`, PostgreSQL does not actually execute our query;
+instead, it produces an _estimated_ execution plan based on the available
+statistics. This means the actual plan can differ quite a bit. Fortunately,
+PostgreSQL provides us with the option to execute the query as well. 
To do so,
+we need to use `EXPLAIN ANALYZE` instead of just `EXPLAIN`:
+
+```sql
+EXPLAIN ANALYZE
+SELECT COUNT(*)
+FROM projects
+WHERE visibility_level IN (0, 20);
+```
+
+This produces:
+
+```sql
+Aggregate  (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
+  ->  Seq Scan on projects  (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
+        Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+        Rows Removed by Filter: 65677
+Planning time: 2.861 ms
+Execution time: 3428.596 ms
+```
+
+As we can see this plan is quite different, and includes a lot more data. Let's
+discuss this step by step.
+
+Because `EXPLAIN ANALYZE` executes the query, care should be taken when using a
+query that writes data or might time out. If the query modifies data,
+consider wrapping it in a transaction that rolls back automatically like so:
+
+```sql
+BEGIN;
+EXPLAIN ANALYZE
+DELETE FROM users WHERE id = 1;
+ROLLBACK;
+```
+
+The `EXPLAIN` command also takes additional options, such as `BUFFERS`:
+
+```sql
+EXPLAIN (ANALYZE, BUFFERS)
+SELECT COUNT(*)
+FROM projects
+WHERE visibility_level IN (0, 20);
+```
+
+This then produces:
+
+```sql
+Aggregate  (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
+  Buffers: shared hit=208846
+  ->  Seq Scan on projects  (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
+        Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+        Rows Removed by Filter: 65677
+        Buffers: shared hit=208846
+Planning time: 2.861 ms
+Execution time: 3428.596 ms
+```
+
+For more information, refer to the official
+[`EXPLAIN` documentation](https://www.postgresql.org/docs/current/sql-explain.html)
+and [using `EXPLAIN` guide](https://www.postgresql.org/docs/current/using-explain.html).
+
+## Nodes
+
+Every query plan consists of nodes. Nodes can be nested, and are executed from
+the inside out. This means that the innermost node is executed before an outer
+node. This can be best thought of as nested function calls, returning their
+results as they unwind. For example, a plan starting with an `Aggregate`
+followed by a `Nested Loop`, followed by an `Index Only scan` can be thought of
+as the following Ruby code:
+
+```ruby
+aggregate(
+  nested_loop(
+    index_only_scan(),
+    index_only_scan()
+  )
+)
+```
+
+Nodes are indicated using a `->` followed by the type of the node. For
+example:
+
+```sql
+Aggregate  (cost=922411.76..922411.77 rows=1 width=8)
+  ->  Seq Scan on projects  (cost=0.00..908044.47 rows=5746914 width=0)
+        Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+```
+
+Here the first node executed is `Seq scan on projects`. The `Filter:` is an
+additional filter applied to the results of the node. A filter is very similar
+to Ruby's `Array#select`: it takes the input rows, applies the filter, and
+produces a new list of rows. After the node is done, we perform the `Aggregate`
+above it. 
+
+Nested nodes look like this:
+
+```sql
+Aggregate  (cost=176.97..176.98 rows=1 width=8) (actual time=0.252..0.252 rows=1 loops=1)
+  Buffers: shared hit=155
+  ->  Nested Loop  (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1)
+        Buffers: shared hit=155
+        ->  Index Only Scan using users_pkey on users users_1  (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1)
+              Index Cond: (id < 100)
+              Heap Fetches: 0
+        ->  Index Only Scan using users_pkey on users  (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36)
+              Index Cond: (id = users_1.id)
+              Heap Fetches: 0
+Planning time: 2.585 ms
+Execution time: 0.310 ms
+```
+
+Here we first perform two separate "Index Only" scans, followed by performing a
+"Nested Loop" on the result of these two scans.
+
+## Node statistics
+
+Each node in a plan has a set of associated statistics, such as the cost, the
+number of rows produced, the number of loops performed, and more. For example:
+
+```sql
+Seq Scan on projects  (cost=0.00..908044.47 rows=5746914 width=0)
+```
+
+Here we can see that our cost ranges from `0.00..908044.47` (we cover this in
+a moment), and we estimate (since we're using `EXPLAIN` and not `EXPLAIN
+ANALYZE`) a total of 5,746,914 rows to be produced by this node. The `width`
+statistic describes the estimated width of each row, in bytes.
+
+The `costs` field specifies how expensive a node was. The cost is measured in
+arbitrary units determined by the query planner's cost parameters. What
+influences the costs depends on a variety of settings, such as `seq_page_cost`,
+`cpu_tuple_cost`, and various others.
+The format of the costs field is as follows:
+
+```sql
+STARTUP COST..TOTAL COST
+```
+
+The startup cost states how expensive it was to start the node, with the total
+cost describing how expensive the entire node was. In general: the greater the
+values, the more expensive the node.
+
+When using `EXPLAIN ANALYZE`, these statistics also include the actual time
+(in milliseconds) spent, and other runtime statistics (for example, the actual number of
+produced rows):
+
+```sql
+Seq Scan on projects  (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
+```
+
+Here we can see we estimated 5,746,969 rows to be returned, but in reality we
+returned 5,746,940 rows. We can also see that _just_ this sequential scan took
+2.98 seconds to run.
+
+Using `EXPLAIN (ANALYZE, BUFFERS)` also gives us information about the
+number of rows removed by a filter, the number of buffers used, and more. For
+example:
+
+```sql
+Seq Scan on projects  (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
+  Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+  Rows Removed by Filter: 65677
+  Buffers: shared hit=208846
+```
+
+Here we can see that our filter has to remove 65,677 rows, and that we use
+208,846 buffers. Each buffer in PostgreSQL is 8 KB (8192 bytes), meaning our
+above node uses *1.6 GB of buffers*. That's a lot! 
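+
+To put that figure in perspective, you can do the arithmetic directly in
+`psql`. This is just a back-of-the-envelope check, not part of the plan
+output:
+
+```sql
+-- 208,846 shared buffers at 8 KB (8192 bytes) each, in gigabytes
+SELECT (208846 * 8192) / (1024.0 * 1024 * 1024) AS buffers_gb;
+```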
+
+Keep in mind that some statistics are per-loop averages, while others are total values:
+
+| Field name | Value type |
+| --- | --- |
+| Actual Total Time | per-loop average |
+| Actual Rows | per-loop average |
+| Buffers Shared Hit | total value |
+| Buffers Shared Read | total value |
+| Buffers Shared Dirtied | total value |
+| Buffers Shared Written | total value |
+| I/O Read Time | total value |
+| I/O Write Time | total value |
+
+For example:
+
+```sql
+ ->  Index Scan using users_pkey on public.users  (cost=0.43..3.44 rows=1 width=1318) (actual time=0.025..0.025 rows=1 loops=888)
+       Index Cond: (users.id = issues.author_id)
+       Buffers: shared hit=3543 read=9
+       I/O Timings: read=17.760 write=0.000
+```
+
+Here we can see that this node used 3552 buffers (3543 + 9), returned 888 rows (`888 * 1`), and the actual duration was 22.2 milliseconds (`888 * 0.025`).
+17.76 milliseconds of the total duration was spent in reading from disk, to retrieve data that was not in the cache.
+
+## Node types
+
+There are quite a few different types of nodes, so we only cover some of the
+more common ones here.
+
+A full list of all the available nodes and their descriptions can be found in
+the [PostgreSQL source file `plannodes.h`](https://gitlab.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h).
+pgMustard's [EXPLAIN docs](https://www.pgmustard.com/docs/explain) also offer a detailed look into nodes and their fields.
+
+### Seq Scan
+
+A sequential scan over (a chunk of) a database table. This is like using
+`Array#each`, but on a database table. Sequential scans can be quite slow when
+retrieving lots of rows, so it's best to avoid these for large tables.
+
+### Index Only Scan
+
+A scan on an index that did not require fetching anything from the table. In
+certain cases an index only scan may still fetch data from the table; in this
+case the node includes a `Heap Fetches:` statistic.
+
+### Index Scan
+
+A scan on an index that required retrieving some data from the table.
+
+### Bitmap Index Scan and Bitmap Heap Scan
+
+Bitmap scans fall between sequential scans and index scans. These are typically
+used when we would read too much data from an index scan, but too little to
+perform a sequential scan. A bitmap scan uses what is known as a [bitmap
+index](https://en.wikipedia.org/wiki/Bitmap_index) to perform its work.
+
+The [source code of PostgreSQL](https://gitlab.com/postgres/postgres/blob/REL_11_STABLE/src/include/nodes/plannodes.h#L441)
+states the following on bitmap scans:
+
+> Bitmap Index Scan delivers a bitmap of potential tuple locations; it does not
+> access the heap itself. The bitmap is used by an ancestor Bitmap Heap Scan
+> node, possibly after passing through intermediate Bitmap And and/or Bitmap Or
+> nodes to combine it with the results of other Bitmap Index Scans.
+
+### Limit
+
+Applies a `LIMIT` on the input rows.
+
+### Sort
+
+Sorts the input rows as specified using an `ORDER BY` statement.
+
+### Nested Loop
+
+A nested loop executes its child nodes for every row produced by a node that
+precedes it. 
For example:
+
+```sql
+->  Nested Loop  (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1)
+      Buffers: shared hit=155
+      ->  Index Only Scan using users_pkey on users users_1  (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1)
+            Index Cond: (id < 100)
+            Heap Fetches: 0
+      ->  Index Only Scan using users_pkey on users  (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36)
+            Index Cond: (id = users_1.id)
+            Heap Fetches: 0
+```
+
+Here the first child node (`Index Only Scan using users_pkey on users users_1`)
+produces 36 rows, and is executed once (`rows=36 loops=1`). The next node
+produces 1 row (`rows=1`), but is repeated 36 times (`loops=36`). This is
+because the previous node produced 36 rows.
+
+This means that nested loops can quickly slow the query down if the various
+child nodes keep producing many rows.
+
+## Optimising queries
+
+With that out of the way, let's see how we can optimise a query. Let's use the
+following query as an example:
+
+```sql
+SELECT COUNT(*)
+FROM users
+WHERE twitter != '';
+```
+
+This query counts the number of users that have a Twitter profile set.
+Let's run this using `EXPLAIN (ANALYZE, BUFFERS)`:
+
+```sql
+EXPLAIN (ANALYZE, BUFFERS)
+SELECT COUNT(*)
+FROM users
+WHERE twitter != '';
+```
+
+This produces the following plan:
+
+```sql
+Aggregate  (cost=845110.21..845110.22 rows=1 width=8) (actual time=1271.157..1271.158 rows=1 loops=1)
+  Buffers: shared hit=202662
+  ->  Seq Scan on users  (cost=0.00..844969.99 rows=56087 width=0) (actual time=0.019..1265.883 rows=51833 loops=1)
+        Filter: ((twitter)::text <> ''::text)
+        Rows Removed by Filter: 2487813
+        Buffers: shared hit=202662
+Planning time: 0.390 ms
+Execution time: 1271.180 ms
+```
+
+From this query plan we can see the following:
+
+1. We need to perform a sequential scan on the `users` table.
+1. This sequential scan filters out 2,487,813 rows using a `Filter`.
+1. We use 202,662 buffers, which equals 1.58 GB of memory.
+1. It takes us 1.2 seconds to do all of this.
+
+Considering we are just counting users, that's quite expensive!
+
+Before we start making any changes, let's see if there are any existing indexes
+on the `users` table that we might be able to use. 
We can obtain this +information by running `\d users` in a `psql` console, then scrolling down to +the `Indexes:` section: + +```sql +Indexes: + "users_pkey" PRIMARY KEY, btree (id) + "index_users_on_confirmation_token" UNIQUE, btree (confirmation_token) + "index_users_on_email" UNIQUE, btree (email) + "index_users_on_reset_password_token" UNIQUE, btree (reset_password_token) + "index_users_on_static_object_token" UNIQUE, btree (static_object_token) + "index_users_on_unlock_token" UNIQUE, btree (unlock_token) + "index_on_users_name_lower" btree (lower(name::text)) + "index_users_on_accepted_term_id" btree (accepted_term_id) + "index_users_on_admin" btree (admin) + "index_users_on_created_at" btree (created_at) + "index_users_on_email_trigram" gin (email gin_trgm_ops) + "index_users_on_feed_token" btree (feed_token) + "index_users_on_group_view" btree (group_view) + "index_users_on_incoming_email_token" btree (incoming_email_token) + "index_users_on_managing_group_id" btree (managing_group_id) + "index_users_on_name" btree (name) + "index_users_on_name_trigram" gin (name gin_trgm_ops) + "index_users_on_public_email" btree (public_email) WHERE public_email::text <> ''::text + "index_users_on_state" btree (state) + "index_users_on_state_and_user_type" btree (state, user_type) + "index_users_on_unconfirmed_email" btree (unconfirmed_email) WHERE unconfirmed_email IS NOT NULL + "index_users_on_user_type" btree (user_type) + "index_users_on_username" btree (username) + "index_users_on_username_trigram" gin (username gin_trgm_ops) + "tmp_idx_on_user_id_where_bio_is_filled" btree (id) WHERE COALESCE(bio, ''::character varying)::text IS DISTINCT FROM ''::text +``` + +Here we can see there is no index on the `twitter` column, which means +PostgreSQL has to perform a sequential scan in this case. Let's try to fix this +by adding the following index: + +```sql +CREATE INDEX CONCURRENTLY twitter_test ON users (twitter); +``` + +If we now re-run our query using `EXPLAIN (ANALYZE, BUFFERS)` we get the +following plan: + +```sql +Aggregate (cost=61002.82..61002.83 rows=1 width=8) (actual time=297.311..297.312 rows=1 loops=1) + Buffers: shared hit=51854 dirtied=19 + -> Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1) + Filter: ((twitter)::text <> ''::text) + Rows Removed by Filter: 2487830 + Heap Fetches: 26037 + Buffers: shared hit=51854 dirtied=19 +Planning time: 0.191 ms +Execution time: 297.334 ms +``` + +Now it takes just under 300 milliseconds to get our data, instead of 1.2 +seconds. However, we still use 51,854 buffers, which is about 400 MB of memory. +300 milliseconds is also quite slow for such a simple query. To understand why +this query is still expensive, let's take a look at the following: + +```sql +Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1) + Filter: ((twitter)::text <> ''::text) + Rows Removed by Filter: 2487830 +``` + +We start with an index only scan on our index, but we somehow still apply a +`Filter` that filters out 2,487,830 rows. Why is that? Well, let's look at how +we created the index: + +```sql +CREATE INDEX CONCURRENTLY twitter_test ON users (twitter); +``` + +We told PostgreSQL to index all possible values of the `twitter` column, +even empty strings. Our query in turn uses `WHERE twitter != ''`. 
This means
+that the index does improve things, as we don't need to do a sequential scan,
+but we may still encounter empty strings. As a result, PostgreSQL _has_ to apply a
+`Filter` on the index results to get rid of those values.
+
+Fortunately, we can improve this even further using "partial indexes". Partial
+indexes are indexes with a `WHERE` condition that is applied when indexing data.
+For example:
+
+```sql
+CREATE INDEX CONCURRENTLY some_index ON users (email) WHERE id < 100;
+```
+
+This index would only index the `email` value of rows that match `WHERE id <
+100`. We can use partial indexes to change our Twitter index to the following:
+
+```sql
+CREATE INDEX CONCURRENTLY twitter_test ON users (twitter) WHERE twitter != '';
+```
+
+After being created, if we run our query again we are given the following plan:
+
+```sql
+Aggregate  (cost=1608.26..1608.27 rows=1 width=8) (actual time=19.821..19.821 rows=1 loops=1)
+  Buffers: shared hit=44036
+  ->  Index Only Scan using twitter_test on users  (cost=0.41..1479.71 rows=51420 width=0) (actual time=0.023..15.514 rows=51833 loops=1)
+        Heap Fetches: 1208
+        Buffers: shared hit=44036
+Planning time: 0.123 ms
+Execution time: 19.848 ms
+```
+
+That's _a lot_ better! Now it only takes 20 milliseconds to get the data, and we
+only use about 344 MB of buffers (instead of the original 1.58 GB). The reason
+this works is that now PostgreSQL no longer needs to apply a `Filter`, as the
+index only contains `twitter` values that are not empty.
+
+Keep in mind that you shouldn't just add partial indexes every time you want to
+optimise a query. Every index has to be updated for every write, and they may
+require quite a bit of space, depending on the amount of indexed data. As a
+result, first check if there are any existing indexes you may be able to reuse;
+a query for listing them is shown at the end of this section. If there aren't
+any, check if you can perhaps slightly change an existing one to fit both the
+existing and new queries. Only add a new index if none of the existing indexes
+can be used in any way.
+
+When comparing execution plans, don't take timing as the only important metric.
+Good timing is the main goal of any optimization, but it can be too volatile to
+be used for comparison (for example, it depends a lot on the state of cache).
+When optimizing a query, we usually need to reduce the amount of data we're
+dealing with. Indexes are the way to work with fewer pages (buffers) to get the
+result, so, during optimization, look at the number of buffers used (read and hit),
+and work on reducing these numbers. Reduced timing is the consequence of reduced
+buffer numbers. [Database Lab Engine](#database-lab-engine) guarantees that the plan is structurally
+identical to production (and the overall number of buffers is the same as on production),
+but differences in cache state and I/O speed may lead to different timings. 
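+
+If you don't have a `psql` prompt handy for `\d users`, you can also list a
+table's indexes with an ordinary query against the `pg_indexes` view. A minimal
+sketch, assuming the table lives in the `public` schema:
+
+```sql
+-- List the indexes on the users table, so we can check whether an
+-- existing index already covers the column we want to filter on.
+SELECT indexname, indexdef
+FROM pg_indexes
+WHERE schemaname = 'public'
+  AND tablename = 'users'
+ORDER BY indexname;
+```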
+
+## Queries that can't be optimised
+
+Now that we have seen how to optimise a query, let's look at another query that
+we might not be able to optimise:
+
+```sql
+EXPLAIN (ANALYZE, BUFFERS)
+SELECT COUNT(*)
+FROM projects
+WHERE visibility_level IN (0, 20);
+```
+
+The output of `EXPLAIN (ANALYZE, BUFFERS)` is as follows:
+
+```sql
+Aggregate  (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
+  Buffers: shared hit=208846
+  ->  Seq Scan on projects  (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
+        Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+        Rows Removed by Filter: 65677
+        Buffers: shared hit=208846
+Planning time: 2.861 ms
+Execution time: 3428.596 ms
+```
+
+Looking at the output we see the following Filter:
+
+```sql
+Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+Rows Removed by Filter: 65677
+```
+
+Looking at the number of rows removed by the filter, we may be tempted to add an
+index on `projects.visibility_level` to somehow turn this Sequential scan +
+filter into an index-only scan.
+
+Unfortunately, doing so is unlikely to improve anything. Contrary to what some
+might believe, an index being present _does not guarantee_ that PostgreSQL
+actually uses it. For example, when doing a `SELECT * FROM projects` it is much
+cheaper to just scan the entire table, instead of using an index and then
+fetching data from the table. In such cases PostgreSQL may decide to not use an
+index.
+
+Second, let's think for a moment about what our query does: it gets all projects
+with visibility level 0 or 20. In the above plan we can see this produces quite
+a lot of rows (5,746,940), but how much is that relative to the total? Let's
+find out by running the following query:
+
+```sql
+SELECT visibility_level, count(*) AS amount
+FROM projects
+GROUP BY visibility_level
+ORDER BY visibility_level ASC;
+```
+
+For GitLab.com this produces:
+
+```sql
+ visibility_level | amount
+------------------+---------
+                0 | 5071325
+               10 |   65678
+               20 |  674801
+```
+
+Here the total number of projects is 5,811,804, and 5,746,126 of those are of
+level 0 or 20. That's 98% of the entire table!
+
+So no matter what we do, this query retrieves 98% of the entire table. Since
+most time is spent doing exactly that, there isn't really much we can do to
+improve this query, other than _not_ running it at all.
+
+What is important here is that while some may recommend adding an index the
+moment you see a sequential scan, it is _much more important_ to first
+understand what your query does, how much data it retrieves, and so on. After
+all, you cannot optimise something you do not understand.
+
+### Cardinality and selectivity
+
+Earlier we saw that our query had to retrieve 98% of the rows in the table.
+There are two terms commonly used for databases: cardinality and selectivity.
+Cardinality refers to the number of unique values in a particular column in a
+table.
+
+Selectivity is the number of unique values produced by an operation (for example, an
+index scan or filter), relative to the total number of rows. The higher the
+selectivity, the more likely PostgreSQL is able to use an index.
+
+In the above example, there are only 3 unique values: 0, 10, and 20. This means
+the cardinality is 3. The selectivity in turn is also very low: 0.0000003 (2 /
+5,811,804), because our `Filter` only filters using two values (`0` and `20`).
+You can inspect the statistics behind these numbers yourself, as sketched below. 
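+
+PostgreSQL keeps these statistics in the `pg_stats` view, so you can check a
+column's estimated cardinality without counting rows. A minimal sketch; the
+numbers depend on when the table was last analyzed:
+
+```sql
+-- n_distinct approximates the cardinality of the column, while
+-- most_common_vals and most_common_freqs show how much of the
+-- table the most common values cover.
+SELECT n_distinct, most_common_vals, most_common_freqs
+FROM pg_stats
+WHERE tablename = 'projects'
+  AND attname = 'visibility_level';
+```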
+
+With such a low selectivity value it's not surprising that PostgreSQL decides
+using an index is not worth it, because it would produce almost no unique rows.
+
+## Rewriting queries
+
+So the above query can't really be optimised as-is, or at least not much. But
+what if we slightly change its purpose? What if instead of retrieving all
+projects with `visibility_level` 0 or 20, we retrieve those that a user
+interacted with somehow?
+
+Fortunately, GitLab has an answer for this, and it's a table called
+`user_interacted_projects`. This table has the following schema:
+
+```sql
+Table "public.user_interacted_projects"
+   Column   |  Type   | Modifiers
+------------+---------+-----------
+ user_id    | integer | not null
+ project_id | integer | not null
+Indexes:
+    "index_user_interacted_projects_on_project_id_and_user_id" UNIQUE, btree (project_id, user_id)
+    "index_user_interacted_projects_on_user_id" btree (user_id)
+Foreign-key constraints:
+    "fk_rails_0894651f08" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
+    "fk_rails_722ceba4f7" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
+```
+
+Let's rewrite our query to `JOIN` this table onto our projects, and get the
+projects for a specific user:
+
+```sql
+EXPLAIN ANALYZE
+SELECT COUNT(*)
+FROM projects
+INNER JOIN user_interacted_projects ON user_interacted_projects.project_id = projects.id
+WHERE projects.visibility_level IN (0, 20)
+AND user_interacted_projects.user_id = 1;
+```
+
+What we do here is the following:
+
+1. Get our projects.
+1. `INNER JOIN` `user_interacted_projects`, meaning we're only left with rows in
+   `projects` that have a corresponding row in `user_interacted_projects`.
+1. Limit this to the projects with `visibility_level` of 0 or 20, and to
+   projects that the user with ID 1 interacted with.
+
+If we run this query we get the following plan:
+
+```sql
+ Aggregate  (cost=871.03..871.04 rows=1 width=8) (actual time=9.763..9.763 rows=1 loops=1)
+   ->  Nested Loop  (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
+         ->  Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects  (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
+               Index Cond: (user_id = 1)
+         ->  Index Scan using projects_pkey on projects  (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
+               Index Cond: (id = user_interacted_projects.project_id)
+               Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+               Rows Removed by Filter: 0
+ Planning time: 2.614 ms
+ Execution time: 9.809 ms
+```
+
+Here it takes just under 10 milliseconds to get the data. We can also see
+we're retrieving far fewer projects:
+
+```sql
+Index Scan using projects_pkey on projects  (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
+  Index Cond: (id = user_interacted_projects.project_id)
+  Filter: (visibility_level = ANY ('{0,20}'::integer[]))
+  Rows Removed by Filter: 0
+```
+
+Here we see we perform 145 loops (`loops=145`), with every loop producing 1 row
+(`rows=1`). This is much less than before, and our query performs much better!
+
+If we look at the plan we also see our costs are very low:
+
+```sql
+Index Scan using projects_pkey on projects  (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
+```
+
+Here our cost is only 3.45, and the 145 loops together take about 7.25 milliseconds (0.05 * 145). 
+The next index scan is a bit more expensive:
+
+```sql
+Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects  (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
+```
+
+Here the cost is 160.71 (`cost=0.43..160.71`), taking about 2.5 milliseconds
+(based on the output of `actual time=....`).
+
+The most expensive part here is the "Nested Loop" that acts upon the result of
+these two index scans:
+
+```sql
+Nested Loop  (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
+```
+
+Here the planner estimated a total cost of 870.52 for 203 rows, while in
+practice the node ran in a single loop, taking 9.748 milliseconds and producing
+143 rows.
+
+The key takeaway here is that sometimes you have to rewrite (parts of) a query
+to make it better. Sometimes that means having to slightly change your feature
+to accommodate for better performance.
+
+## What makes a bad plan
+
+This is a bit of a difficult question to answer, because the definition of "bad"
+is relative to the problem you are trying to solve. However, some patterns are
+best avoided in most cases, such as:
+
+- Sequential scans on large tables
+- Filters that remove a lot of rows
+- Performing a certain step that requires _a lot_ of
+  buffers (for example, an index scan for GitLab.com that requires more than 512 MB).
+
+As a general guideline, aim for a query that:
+
+1. Takes no more than 10 milliseconds. Our target time spent in SQL per request
+   is around 100 milliseconds, so every query should be as fast as possible.
+1. Does not use an excessive number of buffers, relative to the workload. For
+   example, retrieving ten rows shouldn't require 1 GB of buffers.
+1. Does not spend a long amount of time performing disk IO operations. The
+   setting `track_io_timing` must be enabled for this data to be included in the
+   output of `EXPLAIN ANALYZE`.
+1. Applies a `LIMIT` when retrieving rows without aggregating them, such as
+   `SELECT * FROM users`.
+1. Doesn't use a `Filter` to filter out too many rows, especially if the query
+   does not use a `LIMIT` to limit the number of returned rows. Filters can
+   usually be removed by adding a (partial) index.
+
+These are _guidelines_ and not hard requirements, as different needs may require
+different queries. The only _rule_ is that you _must always measure_ your query
+(preferably using a production-like database) using `EXPLAIN (ANALYZE, BUFFERS)`
+and related tools such as:
+
+- [`explain.depesz.com`](https://explain.depesz.com/).
+- [`explain.dalibo.com`](https://explain.dalibo.com/).
+
+## Producing query plans
+
+There are a few ways to get the output of a query plan. Of course you
+can directly run the `EXPLAIN` query in the `psql` console, or you can
+follow one of the other options below.
+
+### Database Lab Engine
+
+GitLab team members can use [Database Lab Engine](https://gitlab.com/postgres-ai/database-lab), and the companion
+SQL optimization tool, [Joe Bot](https://gitlab.com/postgres-ai/joe).
+
+Database Lab Engine provides developers with their own clone of the production database, while Joe Bot helps with exploring execution plans.
+
+Joe Bot is available in the [`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C) channel on Slack,
+and through its [web interface](https://console.postgres.ai/gitlab/joe-instances).
+
+With Joe Bot you can execute DDL statements (like creating indexes, tables, and columns) and get query plans for `SELECT`, `UPDATE`, and `DELETE` statements. 
+
+For example, to test a new index on a column that does not exist on production
+yet, you can do the following:
+
+Create the column:
+
+```sql
+exec ALTER TABLE projects ADD COLUMN last_at timestamp without time zone
+```
+
+Create the index:
+
+```sql
+exec CREATE INDEX index_projects_on_last_at ON projects (last_at) WHERE last_at IS NOT NULL
+```
+
+Analyze the table to update its statistics:
+
+```sql
+exec ANALYZE projects
+```
+
+Get the query plan:
+
+```sql
+explain SELECT * FROM projects WHERE last_at < CURRENT_DATE
+```
+
+Once done, you can roll back your changes:
+
+```sql
+reset
+```
+
+For more information about the available options, run:
+
+```sql
+help
+```
+
+The web interface comes with the following execution plan visualizers included:
+
+- [Depesz](https://explain.depesz.com/)
+- [PEV2](https://github.com/dalibo/pev2)
+- [FlameGraph](https://github.com/mgartner/pg_flame)
+
+#### Tips & Tricks
+
+The database connection is maintained during your whole session, so you can use
+`exec set ...` for any session variables (such as `enable_seqscan` or
+`work_mem`). These settings are applied to all subsequent commands until you
+reset them. For example, you can disable parallel queries with:
+
+```sql
+exec SET max_parallel_workers_per_gather = 0
+```
+
+### Rails console
+
+Using the [`activerecord-explain-analyze`](https://github.com/6/activerecord-explain-analyze)
+gem, you can directly generate the query plan from the Rails console:
+
+```ruby
+pry(main)> require 'activerecord-explain-analyze'
+=> true
+pry(main)> Project.where('build_timeout > ?', 3600).explain(analyze: true)
+  Project Load (1.9ms)  SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
+  ↳ (pry):12
+=> EXPLAIN for: SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
+Seq Scan on public.projects  (cost=0.00..2.17 rows=1 width=742) (actual time=0.040..0.041 rows=0 loops=1)
+  Output: id, name, path, description, created_at, updated_at, creator_id, namespace_id, ...
+  Filter: (projects.build_timeout > 3600)
+  Rows Removed by Filter: 14
+  Buffers: shared hit=2
+Planning time: 0.411 ms
+Execution time: 0.113 ms
+```
+
+### ChatOps
+
+[GitLab team members can also use our ChatOps solution, available in Slack using the
+`/chatops` slash command](../chatops_on_gitlabcom.md).
+
+NOTE:
+While ChatOps is still available, the recommended way to generate execution plans is to use [Database Lab Engine](#database-lab-engine).
+
+You can use ChatOps to get a query plan by running the following:
+
+```sql
+/chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
+```
+
+Visualising the plan using <https://explain.depesz.com/> is also supported:
+
+```sql
+/chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
+```
+
+Quoting the query is not necessary.
+
+For more information about the available options, run:
+
+```sql
+/chatops run explain --help
+```
+
+## Further reading
+
+A more extensive guide on understanding query plans can be found in
+the [presentation](https://public.dalibo.com/exports/conferences/_archives/_2012/201211_explain/understanding_explain.pdf)
+from [Dalibo.org](https://www.dalibo.com/en/).
+
+Depesz's blog also has a good [section](https://www.depesz.com/tag/unexplainable/) dedicated to query plans. 
diff --git a/doc/development/database_review.md b/doc/development/database_review.md index 37643f04db7a..720be082f9d7 100644 --- a/doc/development/database_review.md +++ b/doc/development/database_review.md @@ -179,7 +179,7 @@ Include in the MR description: - [explain.depesz.com](https://explain.depesz.com) or [explain.dalibo.com](https://explain.dalibo.com): Paste both the plan and the query used in the form. - When providing query plans, make sure it hits enough data: - You can use a GitLab production replica to test your queries on a large scale, - through the `#database-lab` Slack channel or through [ChatOps](understanding_explain_plans.md#chatops). + through the `#database-lab` Slack channel or through [ChatOps](database/understanding_explain_plans.md#chatops). - Usually, the `gitlab-org` namespace (`namespace_id = 9970`) and the `gitlab-org/gitlab-foss` (`project_id = 13083`) or the `gitlab-org/gitlab` (`project_id = 278964`) projects provide enough data to serve as a good example. @@ -187,7 +187,7 @@ Include in the MR description: - If your queries belong to a new feature in GitLab.com and thus they don't return data in production: - You may analyze the query and to provide the plan from a local environment. - `#database-lab` and [postgres.ai](https://postgres.ai/) both allow updates to data (`exec UPDATE issues SET ...`) and creation of new tables and columns (`exec ALTER TABLE issues ADD COLUMN ...`). - - More information on how to find the number of actual returned records in [Understanding EXPLAIN plans](understanding_explain_plans.md) + - More information on how to find the number of actual returned records in [Understanding EXPLAIN plans](database/understanding_explain_plans.md) - For query changes, it is best to provide both the SQL queries along with the plan _before_ and _after_ the change. This helps spot differences quickly. - Include data that shows the performance improvement, preferably in @@ -267,10 +267,10 @@ Include in the MR description: - Check for any overly complex queries and queries the author specifically points out for review (if any) - If not present, ask the author to provide SQL queries and query plans - (for example, by using [ChatOps](understanding_explain_plans.md#chatops) or direct + (for example, by using [ChatOps](database/understanding_explain_plans.md#chatops) or direct database access) - For given queries, review parameters regarding data distribution - - [Check query plans](understanding_explain_plans.md) and suggest improvements + - [Check query plans](database/understanding_explain_plans.md) and suggest improvements to queries (changing the query, schema or adding indexes and similar) - General guideline is for queries to come in below [100ms execution time](query_performance.md#timing-guidelines-for-queries) - Avoid N+1 problems and minimize the [query count](merge_request_performance_guidelines.md#query-counts). diff --git a/doc/development/query_performance.md b/doc/development/query_performance.md index 4fe27d42c388..139dc025190a 100644 --- a/doc/development/query_performance.md +++ b/doc/development/query_performance.md @@ -36,9 +36,9 @@ The first time a query is made, it is made on a "cold cache". Meaning it needs to read from disk. If you run the query again, the data can be read from the cache, or what PostgreSQL calls shared buffers. This is the "warm cache" query. 
-When analyzing an [`EXPLAIN` plan](understanding_explain_plans.md), you can see +When analyzing an [`EXPLAIN` plan](database/understanding_explain_plans.md), you can see the difference not only in the timing, but by looking at the output for `Buffers` -by running your explain with `EXPLAIN(analyze, buffers)`. [Database Lab](understanding_explain_plans.md#database-lab-engine) +by running your explain with `EXPLAIN(analyze, buffers)`. [Database Lab](database/understanding_explain_plans.md#database-lab-engine) automatically includes these options. If you are making a warm cache query, you see only the `shared hits`. diff --git a/doc/development/understanding_explain_plans.md b/doc/development/understanding_explain_plans.md index 17fcd5b3e885..72c3df11a96f 100644 --- a/doc/development/understanding_explain_plans.md +++ b/doc/development/understanding_explain_plans.md @@ -1,829 +1,11 @@ --- -stage: Data Stores -group: Database -info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +redirect_to: 'database/understanding_explain_plans.md' +remove_date: '2022-11-04' --- -# Understanding EXPLAIN plans +This document was moved to [another location](database/understanding_explain_plans.md). -PostgreSQL allows you to obtain query plans using the `EXPLAIN` command. This -command can be invaluable when trying to determine how a query performs. -You can use this command directly in your SQL query, as long as the query starts -with it: - -```sql -EXPLAIN -SELECT COUNT(*) -FROM projects -WHERE visibility_level IN (0, 20); -``` - -When running this on GitLab.com, we are presented with the following output: - -```sql -Aggregate (cost=922411.76..922411.77 rows=1 width=8) - -> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0) - Filter: (visibility_level = ANY ('{0,20}'::integer[])) -``` - -When using _just_ `EXPLAIN`, PostgreSQL does not actually execute our query, -instead it produces an _estimated_ execution plan based on the available -statistics. This means the actual plan can differ quite a bit. Fortunately, -PostgreSQL provides us with the option to execute the query as well. To do so, -we need to use `EXPLAIN ANALYZE` instead of just `EXPLAIN`: - -```sql -EXPLAIN ANALYZE -SELECT COUNT(*) -FROM projects -WHERE visibility_level IN (0, 20); -``` - -This produces: - -```sql -Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1) - -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1) - Filter: (visibility_level = ANY ('{0,20}'::integer[])) - Rows Removed by Filter: 65677 -Planning time: 2.861 ms -Execution time: 3428.596 ms -``` - -As we can see this plan is quite different, and includes a lot more data. Let's -discuss this step by step. - -Because `EXPLAIN ANALYZE` executes the query, care should be taken when using a -query that writes data or might time out. 
If the query modifies data, -consider wrapping it in a transaction that rolls back automatically like so: - -```sql -BEGIN; -EXPLAIN ANALYZE -DELETE FROM users WHERE id = 1; -ROLLBACK; -``` - -The `EXPLAIN` command also takes additional options, such as `BUFFERS`: - -```sql -EXPLAIN (ANALYZE, BUFFERS) -SELECT COUNT(*) -FROM projects -WHERE visibility_level IN (0, 20); -``` - -This then produces: - -```sql -Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1) - Buffers: shared hit=208846 - -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1) - Filter: (visibility_level = ANY ('{0,20}'::integer[])) - Rows Removed by Filter: 65677 - Buffers: shared hit=208846 -Planning time: 2.861 ms -Execution time: 3428.596 ms -``` - -For more information, refer to the official -[`EXPLAIN` documentation](https://www.postgresql.org/docs/current/sql-explain.html) -and [using `EXPLAIN` guide](https://www.postgresql.org/docs/current/using-explain.html). - -## Nodes - -Every query plan consists of nodes. Nodes can be nested, and are executed from -the inside out. This means that the innermost node is executed before an outer -node. This can be best thought of as nested function calls, returning their -results as they unwind. For example, a plan starting with an `Aggregate` -followed by a `Nested Loop`, followed by an `Index Only scan` can be thought of -as the following Ruby code: - -```ruby -aggregate( - nested_loop( - index_only_scan() - index_only_scan() - ) -) -``` - -Nodes are indicated using a `->` followed by the type of node taken. For -example: - -```sql -Aggregate (cost=922411.76..922411.77 rows=1 width=8) - -> Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0) - Filter: (visibility_level = ANY ('{0,20}'::integer[])) -``` - -Here the first node executed is `Seq scan on projects`. The `Filter:` is an -additional filter applied to the results of the node. A filter is very similar -to Ruby's `Array#select`: it takes the input rows, applies the filter, and -produces a new list of rows. After the node is done, we perform the `Aggregate` -above it. - -Nested nodes look like this: - -```sql -Aggregate (cost=176.97..176.98 rows=1 width=8) (actual time=0.252..0.252 rows=1 loops=1) - Buffers: shared hit=155 - -> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1) - Buffers: shared hit=155 - -> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1) - Index Cond: (id < 100) - Heap Fetches: 0 - -> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36) - Index Cond: (id = users_1.id) - Heap Fetches: 0 -Planning time: 2.585 ms -Execution time: 0.310 ms -``` - -Here we first perform two separate "Index Only" scans, followed by performing a -"Nested Loop" on the result of these two scans. - -## Node statistics - -Each node in a plan has a set of associated statistics, such as the cost, the -number of rows produced, the number of loops performed, and more. For example: - -```sql -Seq Scan on projects (cost=0.00..908044.47 rows=5746914 width=0) -``` - -Here we can see that our cost ranges from `0.00..908044.47` (we cover this in -a moment), and we estimate (since we're using `EXPLAIN` and not `EXPLAIN -ANALYZE`) a total of 5,746,914 rows to be produced by this node. 
The `width` -statistics describes the estimated width of each row, in bytes. - -The `costs` field specifies how expensive a node was. The cost is measured in -arbitrary units determined by the query planner's cost parameters. What -influences the costs depends on a variety of settings, such as `seq_page_cost`, -`cpu_tuple_cost`, and various others. -The format of the costs field is as follows: - -```sql -STARTUP COST..TOTAL COST -``` - -The startup cost states how expensive it was to start the node, with the total -cost describing how expensive the entire node was. In general: the greater the -values, the more expensive the node. - -When using `EXPLAIN ANALYZE`, these statistics also include the actual time -(in milliseconds) spent, and other runtime statistics (for example, the actual number of -produced rows): - -```sql -Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1) -``` - -Here we can see we estimated 5,746,969 rows to be returned, but in reality we -returned 5,746,940 rows. We can also see that _just_ this sequential scan took -2.98 seconds to run. - -Using `EXPLAIN (ANALYZE, BUFFERS)` also gives us information about the -number of rows removed by a filter, the number of buffers used, and more. For -example: - -```sql -Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1) - Filter: (visibility_level = ANY ('{0,20}'::integer[])) - Rows Removed by Filter: 65677 - Buffers: shared hit=208846 -``` - -Here we can see that our filter has to remove 65,677 rows, and that we use -208,846 buffers. Each buffer in PostgreSQL is 8 KB (8192 bytes), meaning our -above node uses *1.6 GB of buffers*. That's a lot! - -Keep in mind that some statistics are per-loop averages, while others are total values: - -| Field name | Value type | -| --- | --- | -| Actual Total Time | per-loop average | -| Actual Rows | per-loop average | -| Buffers Shared Hit | total value | -| Buffers Shared Read | total value | -| Buffers Shared Dirtied | total value | -| Buffers Shared Written | total value | -| I/O Read Time | total value | -| I/O Read Write | total value | - -For example: - -```sql - -> Index Scan using users_pkey on public.users (cost=0.43..3.44 rows=1 width=1318) (actual time=0.025..0.025 rows=1 loops=888) - Index Cond: (users.id = issues.author_id) - Buffers: shared hit=3543 read=9 - I/O Timings: read=17.760 write=0.000 -``` - -Here we can see that this node used 3552 buffers (3543 + 9), returned 888 rows (`888 * 1`), and the actual duration was 22.2 milliseconds (`888 * 0.025`). -17.76 milliseconds of the total duration was spent in reading from disk, to retrieve data that was not in the cache. - -## Node types - -There are quite a few different types of nodes, so we only cover some of the -more common ones here. - -A full list of all the available nodes and their descriptions can be found in -the [PostgreSQL source file `plannodes.h`](https://gitlab.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h). -pgMustard's [EXPLAIN docs](https://www.pgmustard.com/docs/explain) also offer detailed look into nodes and their fields. - -### Seq Scan - -A sequential scan over (a chunk of) a database table. This is like using -`Array#each`, but on a database table. Sequential scans can be quite slow when -retrieving lots of rows, so it's best to avoid these for large tables. - -### Index Only Scan - -A scan on an index that did not require fetching anything from the table. 
In -certain cases an index only scan may still fetch data from the table, in this -case the node includes a `Heap Fetches:` statistic. - -### Index Scan - -A scan on an index that required retrieving some data from the table. - -### Bitmap Index Scan and Bitmap Heap scan - -Bitmap scans fall between sequential scans and index scans. These are typically -used when we would read too much data from an index scan, but too little to -perform a sequential scan. A bitmap scan uses what is known as a [bitmap -index](https://en.wikipedia.org/wiki/Bitmap_index) to perform its work. - -The [source code of PostgreSQL](https://gitlab.com/postgres/postgres/blob/REL_11_STABLE/src/include/nodes/plannodes.h#L441) -states the following on bitmap scans: - -> Bitmap Index Scan delivers a bitmap of potential tuple locations; it does not -> access the heap itself. The bitmap is used by an ancestor Bitmap Heap Scan -> node, possibly after passing through intermediate Bitmap And and/or Bitmap Or -> nodes to combine it with the results of other Bitmap Index Scans. - -### Limit - -Applies a `LIMIT` on the input rows. - -### Sort - -Sorts the input rows as specified using an `ORDER BY` statement. - -### Nested Loop - -A nested loop executes its child nodes for every row produced by a node that -precedes it. For example: - -```sql --> Nested Loop (cost=0.86..176.75 rows=87 width=0) (actual time=0.035..0.249 rows=36 loops=1) - Buffers: shared hit=155 - -> Index Only Scan using users_pkey on users users_1 (cost=0.43..4.95 rows=87 width=4) (actual time=0.029..0.123 rows=36 loops=1) - Index Cond: (id < 100) - Heap Fetches: 0 - -> Index Only Scan using users_pkey on users (cost=0.43..1.96 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=36) - Index Cond: (id = users_1.id) - Heap Fetches: 0 -``` - -Here the first child node (`Index Only Scan using users_pkey on users users_1`) -produces 36 rows, and is executed once (`rows=36 loops=1`). The next node -produces 1 row (`rows=1`), but is repeated 36 times (`loops=36`). This is -because the previous node produced 36 rows. - -This means that nested loops can quickly slow the query down if the various -child nodes keep producing many rows. - -## Optimising queries - -With that out of the way, let's see how we can optimise a query. Let's use the -following query as an example: - -```sql -SELECT COUNT(*) -FROM users -WHERE twitter != ''; -``` - -This query counts the number of users that have a Twitter profile set. -Let's run this using `EXPLAIN (ANALYZE, BUFFERS)`: - -```sql -EXPLAIN (ANALYZE, BUFFERS) -SELECT COUNT(*) -FROM users -WHERE twitter != ''; -``` - -This produces the following plan: - -```sql -Aggregate (cost=845110.21..845110.22 rows=1 width=8) (actual time=1271.157..1271.158 rows=1 loops=1) - Buffers: shared hit=202662 - -> Seq Scan on users (cost=0.00..844969.99 rows=56087 width=0) (actual time=0.019..1265.883 rows=51833 loops=1) - Filter: ((twitter)::text <> ''::text) - Rows Removed by Filter: 2487813 - Buffers: shared hit=202662 -Planning time: 0.390 ms -Execution time: 1271.180 ms -``` - -From this query plan we can see the following: - -1. We need to perform a sequential scan on the `users` table. -1. This sequential scan filters out 2,487,813 rows using a `Filter`. -1. We use 202,622 buffers, which equals 1.58 GB of memory. -1. It takes us 1.2 seconds to do all of this. - -Considering we are just counting users, that's quite expensive! 
- -Before we start making any changes, let's see if there are any existing indexes -on the `users` table that we might be able to use. We can obtain this -information by running `\d users` in a `psql` console, then scrolling down to -the `Indexes:` section: - -```sql -Indexes: - "users_pkey" PRIMARY KEY, btree (id) - "index_users_on_confirmation_token" UNIQUE, btree (confirmation_token) - "index_users_on_email" UNIQUE, btree (email) - "index_users_on_reset_password_token" UNIQUE, btree (reset_password_token) - "index_users_on_static_object_token" UNIQUE, btree (static_object_token) - "index_users_on_unlock_token" UNIQUE, btree (unlock_token) - "index_on_users_name_lower" btree (lower(name::text)) - "index_users_on_accepted_term_id" btree (accepted_term_id) - "index_users_on_admin" btree (admin) - "index_users_on_created_at" btree (created_at) - "index_users_on_email_trigram" gin (email gin_trgm_ops) - "index_users_on_feed_token" btree (feed_token) - "index_users_on_group_view" btree (group_view) - "index_users_on_incoming_email_token" btree (incoming_email_token) - "index_users_on_managing_group_id" btree (managing_group_id) - "index_users_on_name" btree (name) - "index_users_on_name_trigram" gin (name gin_trgm_ops) - "index_users_on_public_email" btree (public_email) WHERE public_email::text <> ''::text - "index_users_on_state" btree (state) - "index_users_on_state_and_user_type" btree (state, user_type) - "index_users_on_unconfirmed_email" btree (unconfirmed_email) WHERE unconfirmed_email IS NOT NULL - "index_users_on_user_type" btree (user_type) - "index_users_on_username" btree (username) - "index_users_on_username_trigram" gin (username gin_trgm_ops) - "tmp_idx_on_user_id_where_bio_is_filled" btree (id) WHERE COALESCE(bio, ''::character varying)::text IS DISTINCT FROM ''::text -``` - -Here we can see there is no index on the `twitter` column, which means -PostgreSQL has to perform a sequential scan in this case. Let's try to fix this -by adding the following index: - -```sql -CREATE INDEX CONCURRENTLY twitter_test ON users (twitter); -``` - -If we now re-run our query using `EXPLAIN (ANALYZE, BUFFERS)` we get the -following plan: - -```sql -Aggregate (cost=61002.82..61002.83 rows=1 width=8) (actual time=297.311..297.312 rows=1 loops=1) - Buffers: shared hit=51854 dirtied=19 - -> Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1) - Filter: ((twitter)::text <> ''::text) - Rows Removed by Filter: 2487830 - Heap Fetches: 26037 - Buffers: shared hit=51854 dirtied=19 -Planning time: 0.191 ms -Execution time: 297.334 ms -``` - -Now it takes just under 300 milliseconds to get our data, instead of 1.2 -seconds. However, we still use 51,854 buffers, which is about 400 MB of memory. -300 milliseconds is also quite slow for such a simple query. To understand why -this query is still expensive, let's take a look at the following: - -```sql -Index Only Scan using twitter_test on users (cost=0.43..60873.13 rows=51877 width=0) (actual time=279.184..293.532 rows=51833 loops=1) - Filter: ((twitter)::text <> ''::text) - Rows Removed by Filter: 2487830 -``` - -We start with an index only scan on our index, but we somehow still apply a -`Filter` that filters out 2,487,830 rows. Why is that? Well, let's look at how -we created the index: - -```sql -CREATE INDEX CONCURRENTLY twitter_test ON users (twitter); -``` - -We told PostgreSQL to index all possible values of the `twitter` column, -even empty strings. 
Fortunately, we can improve this even further using "partial indexes". Partial
indexes are indexes with a `WHERE` condition that is applied when indexing
data. For example:

```sql
CREATE INDEX CONCURRENTLY some_index ON users (email) WHERE id < 100
```

This index would only index the `email` value of rows that match `WHERE id <
100`. We can use partial indexes to change our Twitter index to the following:

```sql
CREATE INDEX CONCURRENTLY twitter_test ON users (twitter) WHERE twitter != '';
```

Once this index is created, running our query again gives us the following
plan:

```sql
Aggregate (cost=1608.26..1608.27 rows=1 width=8) (actual time=19.821..19.821 rows=1 loops=1)
  Buffers: shared hit=44036
  -> Index Only Scan using twitter_test on users (cost=0.41..1479.71 rows=51420 width=0) (actual time=0.023..15.514 rows=51833 loops=1)
        Heap Fetches: 1208
        Buffers: shared hit=44036
Planning time: 0.123 ms
Execution time: 19.848 ms
```

That's _a lot_ better! Now it only takes 20 milliseconds to get the data, and
we only use about 344 MB of buffers (instead of the original 1.58 GB). This
works because PostgreSQL no longer needs to apply a `Filter`, as the index
only contains `twitter` values that are not empty.

Keep in mind that you shouldn't just add partial indexes every time you want
to optimise a query. Every index has to be updated for every write, and they
may require quite a bit of space, depending on the amount of indexed data. As
a result, first check if there are any existing indexes you may be able to
reuse. If there aren't any, check if you can perhaps slightly change an
existing one to fit both the existing and new queries. Only add a new index if
none of the existing indexes can be used in any way.

When comparing execution plans, don't take timing as the only important
metric. Good timing is the main goal of any optimization, but it can be too
volatile to be used for comparison: for example, it depends a lot on the state
of the cache. When optimizing a query, we usually need to reduce the amount of
data we're dealing with. Indexes are the way to work with fewer pages
(buffers) to get the result, so, during optimization, look at the number of
buffers used (read and hit), and work on reducing these numbers. Reduced
timing is the consequence of reduced buffer numbers.
[Database Lab Engine](#database-lab-engine) guarantees that the plan is
structurally identical to production (and the overall number of buffers is the
same as on production), but differences in cache state and I/O speed may lead
to different timings.
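Buffer counts measure reads, but the partial index also wins on storage,
because it only stores the roughly 2% of rows that match its `WHERE` clause.
If you're curious, a small sketch of how you could compare sizes, assuming the
`twitter_test` index created above still exists:

```sql
-- Size of a single index:
SELECT pg_size_pretty(pg_relation_size('twitter_test'));

-- Combined size of all indexes on the table, for context:
SELECT pg_size_pretty(pg_indexes_size('users'));
```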
## Queries that can't be optimised

Now that we have seen how to optimise a query, let's look at another query that
we might not be able to optimise:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM projects
WHERE visibility_level IN (0, 20);
```

The output of `EXPLAIN (ANALYZE, BUFFERS)` is as follows:

```sql
Aggregate (cost=922420.60..922420.61 rows=1 width=8) (actual time=3428.535..3428.535 rows=1 loops=1)
  Buffers: shared hit=208846
  -> Seq Scan on projects (cost=0.00..908053.18 rows=5746969 width=0) (actual time=0.041..2987.606 rows=5746940 loops=1)
        Filter: (visibility_level = ANY ('{0,20}'::integer[]))
        Rows Removed by Filter: 65677
        Buffers: shared hit=208846
Planning time: 2.861 ms
Execution time: 3428.596 ms
```

Looking at the output we see the following Filter:

```sql
Filter: (visibility_level = ANY ('{0,20}'::integer[]))
Rows Removed by Filter: 65677
```

Looking at the number of rows removed by the filter, we may be tempted to add
an index on `projects.visibility_level` to somehow turn this sequential scan
plus filter into an index-only scan.

Unfortunately, doing so is unlikely to improve anything. Contrary to what some
might believe, an index being present _does not guarantee_ that PostgreSQL
actually uses it. For example, when doing a `SELECT * FROM projects` it is much
cheaper to just scan the entire table, instead of using an index and then
fetching data from the table. In such cases PostgreSQL may decide to not use an
index.

Second, let's think for a moment about what our query does: it gets all
projects with visibility level 0 or 20. In the above plan we can see this
produces quite a lot of rows (5,746,940), but how much is that relative to the
total? Let's find out by running the following query:

```sql
SELECT visibility_level, count(*) AS amount
FROM projects
GROUP BY visibility_level
ORDER BY visibility_level ASC;
```

For GitLab.com this produces:

```sql
 visibility_level | amount
------------------+---------
                0 | 5071325
               10 |   65678
               20 |  674801
```

Here the total number of projects is 5,811,804, and 5,746,126 of those are of
level 0 or 20. That's 98% of the entire table!

So no matter what we do, this query retrieves 98% of the entire table. Since
most time is spent doing exactly that, there isn't really much we can do to
improve this query, other than _not_ running it at all.

What is important here is that while some may recommend adding an index the
moment you see a sequential scan, it is _much more important_ to first
understand what your query does, how much data it retrieves, and so on. After
all, you cannot optimise something you do not understand.

### Cardinality and selectivity

Earlier we saw that our query had to retrieve 98% of the rows in the table.
There are two terms commonly used here: cardinality and selectivity.
Cardinality refers to the number of unique values in a particular column of a
table.

Selectivity is the number of unique values produced by an operation (for
example, an index scan or filter), relative to the total number of rows. The
higher the selectivity, the more likely PostgreSQL is able to use an index.

In the above example, there are only 3 unique values: 0, 10, and 20. This means
the cardinality is 3. The selectivity in turn is also very low: 2 / 5,811,804,
or roughly 0.00003%, because our `Filter` only filters using two values (`0`
and `20`). With such a low selectivity it's not surprising that PostgreSQL
decides using an index is not worth it, because it would produce almost no
unique rows.
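PostgreSQL keeps column statistics itself, so you don't have to compute
cardinality with a full `GROUP BY`. A sketch of how you could inspect the
planner's view of the column instead (the `pg_stats` view is standard; the
values returned depend on when the table was last analyzed):

```sql
-- n_distinct is the planner's cardinality estimate for the column;
-- most_common_vals and most_common_freqs show how skewed the data is.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'projects'
  AND attname = 'visibility_level';
```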
## Rewriting queries

So the above query can't really be optimised as-is, or at least not much. But
what if we slightly change the purpose of it? What if instead of retrieving all
projects with `visibility_level` 0 or 20, we retrieve those that a user
interacted with somehow?

Fortunately, GitLab has an answer for this, and it's a table called
`user_interacted_projects`. This table has the following schema:

```sql
Table "public.user_interacted_projects"
   Column   |  Type   | Modifiers
------------+---------+-----------
 user_id    | integer | not null
 project_id | integer | not null
Indexes:
    "index_user_interacted_projects_on_project_id_and_user_id" UNIQUE, btree (project_id, user_id)
    "index_user_interacted_projects_on_user_id" btree (user_id)
Foreign-key constraints:
    "fk_rails_0894651f08" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
    "fk_rails_722ceba4f7" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
```

Let's rewrite our query to `JOIN` this table onto our projects, and get the
projects for a specific user:

```sql
EXPLAIN ANALYZE
SELECT COUNT(*)
FROM projects
INNER JOIN user_interacted_projects ON user_interacted_projects.project_id = projects.id
WHERE projects.visibility_level IN (0, 20)
AND user_interacted_projects.user_id = 1;
```

What we do here is the following:

1. Get our projects.
1. `INNER JOIN` `user_interacted_projects`, meaning we're only left with rows
   in `projects` that have a corresponding row in `user_interacted_projects`.
1. Limit this to the projects with `visibility_level` of 0 or 20, and to
   projects that the user with ID 1 interacted with.

If we run this query we get the following plan:

```sql
Aggregate (cost=871.03..871.04 rows=1 width=8) (actual time=9.763..9.763 rows=1 loops=1)
  -> Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
        -> Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
              Index Cond: (user_id = 1)
        -> Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
              Index Cond: (id = user_interacted_projects.project_id)
              Filter: (visibility_level = ANY ('{0,20}'::integer[]))
              Rows Removed by Filter: 0
Planning time: 2.614 ms
Execution time: 9.809 ms
```

Here it took us just under 10 milliseconds to get the data. We can also see
we're retrieving far fewer projects:

```sql
Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
  Index Cond: (id = user_interacted_projects.project_id)
  Filter: (visibility_level = ANY ('{0,20}'::integer[]))
  Rows Removed by Filter: 0
```

Here we see we perform 145 loops (`loops=145`), with every loop producing 1 row
(`rows=1`). This is much less than before, and our query performs much better!

If we look at the plan we also see our costs are very low:

```sql
Index Scan using projects_pkey on projects (cost=0.43..3.45 rows=1 width=4) (actual time=0.049..0.050 rows=1 loops=145)
```

Here the cost is only 3.45, and the total time spent is about 7.25 milliseconds
(0.05 * 145 loops).

The next index scan is a bit more expensive:

```sql
Index Scan using index_user_interacted_projects_on_user_id on user_interacted_projects (cost=0.43..160.71 rows=205 width=4) (actual time=0.939..2.508 rows=145 loops=1)
```

Here the cost is 160.71 (`cost=0.43..160.71`), taking about 2.5 milliseconds
(based on the `actual time=...` output).

The most expensive part here is the "Nested Loop" that acts upon the result of
these two index scans:

```sql
Nested Loop (cost=0.86..870.52 rows=203 width=0) (actual time=1.072..9.748 rows=143 loops=1)
```

Here the planner estimated a cost of 870.52 (expressed in units roughly
equivalent to page fetches) for an estimated 203 rows; in practice the node
took 9.748 milliseconds and produced 143 rows in a single loop.

The key takeaway here is that sometimes you have to rewrite (parts of) a query
to make it better. Sometimes that means having to slightly change your feature
to accommodate better performance.
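Earlier we noted that buffer counts are a more reliable comparison metric than
timings. The rewritten query above was measured with plain `EXPLAIN ANALYZE`,
so if you want buffer numbers to compare against the original query, you could
re-run it with the `BUFFERS` option:

```sql
-- Same query as above, but also reporting buffer usage per plan node:
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM projects
INNER JOIN user_interacted_projects ON user_interacted_projects.project_id = projects.id
WHERE projects.visibility_level IN (0, 20)
AND user_interacted_projects.user_id = 1;
```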
## What makes a bad plan

This is a bit of a difficult question to answer, because the definition of
"bad" is relative to the problem you are trying to solve. However, some
patterns are best avoided in most cases, such as:

- Sequential scans on large tables
- Filters that remove a lot of rows
- Performing a certain step that requires _a lot_ of buffers (for example, an
  index scan for GitLab.com that requires more than 512 MB)

As a general guideline, aim for a query that:

1. Takes no more than 10 milliseconds. Our target time spent in SQL per request
   is around 100 milliseconds, so every query should be as fast as possible.
1. Does not use an excessive number of buffers, relative to the workload. For
   example, retrieving ten rows shouldn't require 1 GB of buffers.
1. Does not spend a long amount of time performing disk IO operations. The
   setting `track_io_timing` must be enabled for this data to be included in
   the output of `EXPLAIN ANALYZE`.
1. Applies a `LIMIT` when retrieving rows without aggregating them, such as
   `SELECT * FROM users`.
1. Doesn't use a `Filter` to filter out too many rows, especially if the query
   does not use a `LIMIT` to limit the number of returned rows. Filters can
   usually be removed by adding a (partial) index.

These are _guidelines_ and not hard requirements, as different needs may
require different queries. The only _rule_ is that you _must always measure_
your query (preferably using a production-like database) using
`EXPLAIN (ANALYZE, BUFFERS)` and related tools such as:

- [`explain.depesz.com`](https://explain.depesz.com/).
- [`explain.dalibo.com`](https://explain.dalibo.com/).

## Producing query plans

There are a few ways to get the output of a query plan. Of course you can
directly run the `EXPLAIN` query in the `psql` console, or you can follow one
of the other options below.

### Database Lab Engine

GitLab team members can use [Database Lab Engine](https://gitlab.com/postgres-ai/database-lab),
and the companion SQL optimization tool [Joe Bot](https://gitlab.com/postgres-ai/joe).

Database Lab Engine provides developers with their own clone of the production
database, while Joe Bot helps with exploring execution plans.

Joe Bot is available in the [`#database-lab`](https://gitlab.slack.com/archives/CLJMDRD8C)
channel on Slack, and through its [web interface](https://console.postgres.ai/gitlab/joe-instances).

With Joe Bot you can execute DDL statements (like creating indexes, tables,
and columns) and get query plans for `SELECT`, `UPDATE`, and `DELETE`
statements.
For example, to test a new index on a column that does not yet exist on
production, you can do the following:

Create the column:

```sql
exec ALTER TABLE projects ADD COLUMN last_at timestamp without time zone
```

Create the index:

```sql
exec CREATE INDEX index_projects_last_at ON projects (last_at) WHERE last_at IS NOT NULL
```

Analyze the table to update its statistics:

```sql
exec ANALYZE projects
```

Get the query plan:

```sql
explain SELECT * FROM projects WHERE last_at < CURRENT_DATE
```

Once done, you can roll back your changes:

```sql
reset
```

For more information about the available options, run:

```sql
help
```

The web interface comes with the following execution plan visualizers included:

- [Depesz](https://explain.depesz.com/)
- [PEV2](https://github.com/dalibo/pev2)
- [FlameGraph](https://github.com/mgartner/pg_flame)

#### Tips and tricks

The database connection is maintained during your whole session, so you can
use `exec set ...` for any session variables (such as `enable_seqscan` or
`work_mem`). These settings are applied to all subsequent commands until you
reset them. For example, you can disable parallel queries with:

```sql
exec SET max_parallel_workers_per_gather = 0
```

### Rails console

Using the [`activerecord-explain-analyze`](https://github.com/6/activerecord-explain-analyze)
gem you can directly generate the query plan from the Rails console:

```ruby
pry(main)> require 'activerecord-explain-analyze'
=> true
pry(main)> Project.where('build_timeout > ?', 3600).explain(analyze: true)
  Project Load (1.9ms)  SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
  ↳ (pry):12
=> EXPLAIN for: SELECT "projects".* FROM "projects" WHERE (build_timeout > 3600)
Seq Scan on public.projects  (cost=0.00..2.17 rows=1 width=742) (actual time=0.040..0.041 rows=0 loops=1)
  Output: id, name, path, description, created_at, updated_at, creator_id, namespace_id, ...
  Filter: (projects.build_timeout > 3600)
  Rows Removed by Filter: 14
  Buffers: shared hit=2
Planning time: 0.411 ms
Execution time: 0.113 ms
```

### ChatOps

GitLab team members can also use our ChatOps solution, available in Slack
using the [`/chatops` slash command](chatops_on_gitlabcom.md).

NOTE:
While ChatOps is still available, the recommended way to generate execution
plans is to use [Database Lab Engine](#database-lab-engine).

You can use ChatOps to get a query plan by running the following:

```sql
/chatops run explain SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
```

Visualising the plan using <https://explain.depesz.com/> is also supported:

```sql
/chatops run explain --visual SELECT COUNT(*) FROM projects WHERE visibility_level IN (0, 20)
```

Quoting the query is not necessary.

For more information about the available options, run:

```sql
/chatops run explain --help
```

## Further reading

A more extensive guide on understanding query plans can be found in
the [presentation](https://public.dalibo.com/exports/conferences/_archives/_2012/201211_explain/understanding_explain.pdf)
from [Dalibo.org](https://www.dalibo.com/en/).

Depesz's blog also has a good [section](https://www.depesz.com/tag/unexplainable/)
dedicated to query plans.
<!-- This redirect file can be deleted after <2022-11-04>. -->
<!-- Redirects that point to other docs in the same project expire in three months. -->
<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
--
GitLab