From 4b82d592e5fa4aba23373c1a62341f1ff1b971ef Mon Sep 17 00:00:00 2001
From: Brett Walker <bwalker@gitlab.com>
Date: Fri, 21 Aug 2020 15:36:53 +0000
Subject: [PATCH] GraphQL pagination implementation and details

---
 doc/development/api_graphql_styleguide.md   | 11 ++-
 doc/development/graphql_guide/pagination.md | 89 +++++++++++++++++++++
 2 files changed, 98 insertions(+), 2 deletions(-)
 create mode 100644 doc/development/graphql_guide/pagination.md

diff --git a/doc/development/api_graphql_styleguide.md b/doc/development/api_graphql_styleguide.md
index 65ee46ac804c1..1b0a69a46444e 100644
--- a/doc/development/api_graphql_styleguide.md
+++ b/doc/development/api_graphql_styleguide.md
@@ -142,7 +142,10 @@ def reply_id
 end
 ```
 
-### Connection Types
+### Connection types
+
+TIP: **Tip:**
+For specifics on implementation, see [Pagination implementation](#pagination-implementation).
 
 GraphQL uses [cursor based
 pagination](https://graphql.org/learn/pagination/#pagination-and-edges)
@@ -1168,6 +1171,10 @@ tested for within the unit test of `Types::MutationType`. The merge request
 can be referred to as an example of this, including the method of testing
 deprecated aliased mutations.
 
+## Pagination implementation
+
+To learn more, visit [GraphQL pagination](graphql_guide/pagination.md).
+
 ## Validating arguments
 
 For validations of single arguments, use the
@@ -1323,7 +1330,7 @@ end
 More about complexity:
 [GraphQL Ruby documentation](https://graphql-ruby.org/queries/complexity_and_depth.html).
 
-## Documentation and Schema
+## Documentation and schema
 
 Our schema is located at `app/graphql/gitlab_schema.rb`.
 See the [schema reference](../api/graphql/reference/index.md) for details.
diff --git a/doc/development/graphql_guide/pagination.md b/doc/development/graphql_guide/pagination.md
new file mode 100644
index 0000000000000..f5947bca8915d
--- /dev/null
+++ b/doc/development/graphql_guide/pagination.md
@@ -0,0 +1,89 @@
+# GraphQL pagination
+
+## Types of pagination
+
+GitLab uses two primary types of pagination: **offset** and **keyset**
+(sometimes called cursor-based) pagination.
+The GraphQL API mainly uses keyset pagination, falling back to offset pagination when needed.
+
+### Offset pagination
+
+This is the traditional, page-by-page pagination that is most common
+and used across much of GitLab. You can recognize it by
+a list of page numbers near the bottom of a page, which, when clicked,
+take you to that page of results.
+
+For example, when you click **Page 100**, we send `100` to the
+backend. If each page has, say, 20 items, the
+backend calculates `20 * 100 = 2000`,
+and it queries the database by offsetting (skipping) the first 2000
+records and pulls the next 20.
+
+```plaintext
+page number * page size = where to find my records
+```
+
+There are a couple of problems with this:
+
+- Performance. When we query for page 100 (which gives an offset of
+  2000), the database has to scan through the table to that
+  specific offset, and then pick up the next 20 records. As the offset
+  increases, the performance degrades quickly.
+  Read more in
+  [The SQL I Love <3. Efficient pagination of a table with 100M records](http://allyouneedisbackend.com/blog/2017/09/24/the-sql-i-love-part-1-scanning-large-table/).
+
+- Data stability. When you get the 20 items for page 100 (at
+  offset 2000), GitLab shows those 20 items. If someone then
+  deletes or adds records in page 99 or before, the items at
+  offset 2000 become a different set of items. You can even get into a
+  situation where, when paginating, you skip over items
+  because the list keeps changing.
+  Read more in
+  [Pagination: You're (Probably) Doing It Wrong](https://coderwall.com/p/lkcaag/pagination-you-re-probably-doing-it-wrong).
+
+### Keyset pagination
+
+Given any specific record, if you know how to calculate what comes
+after it, you can query the database for those specific records.
+
+For example, suppose you have a list of issues sorted by creation date.
+If you know the first item on a page has a specific date (say Jan 1), you can ask
+for all records that were created after that date and take the first 20.
+It no longer matters if many are deleted or added, as you always ask for
+the ones after that date, and so get the correct items.
+
+Unfortunately, there is no easy way to know if the issue created
+on Jan 1 is on page 20 or page 100.
+
+Some of the benefits and tradeoffs of keyset pagination are:
+
+- Performance is much better.
+
+- Data stability is greater since you're not going to miss records due to
+  deletions or insertions.
+
+- It's the best way to do infinite scrolling.
+
+- It's more difficult to program and maintain. It's easy for `updated_at` and
+  `sort_order`, but complicated (or impossible) for complex sorting scenarios.
+
+## Implementation
+
+When pagination is supported for a query, GitLab defaults to using
+keyset pagination. You can see where this is configured in
+[`pagination/connections.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/graphql/pagination/connections.rb).
+If a query returns `ActiveRecord::Relation`, keyset pagination is automatically used.
+
+This was a conscious decision to support performance and data stability.
+
+However, there are some cases where we have to use the offset
+pagination connection, `OffsetActiveRecordRelationConnection`, such as when
+sorting by label priority in issues, due to the complexity of the sort.
+
+<!-- ### Keyset pagination -->
+
+<!-- ### Offset pagination -->
+
+<!-- ### External pagination -->
+
+<!-- ### Pagination testing -->
-- 
GitLab
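
Review note: the data-stability difference that `pagination.md` describes between offset and keyset pagination can be sketched in plain Ruby. This is a minimal illustration over an in-memory array, not GitLab's connection code. `Issue`, `offset_page`, and `keyset_page` are hypothetical names, and `created_at` is an integer stand-in for a timestamp.

```ruby
# Hypothetical record type for illustration only; not a GitLab model.
Issue = Struct.new(:id, :created_at)

# Offset pagination, following the doc's simplified formula:
# skip (page * per_page) records, then take the next per_page.
def offset_page(records, page:, per_page:)
  records.sort_by(&:created_at).drop(page * per_page).first(per_page)
end

# Keyset pagination: take the first per_page records created after the cursor.
def keyset_page(records, after:, per_page:)
  records.sort_by(&:created_at)
         .select { |record| record.created_at > after }
         .first(per_page)
end

issues = (1..10).map { |i| Issue.new(i, i * 10) }

# Both approaches initially return the same records.
offset_before = offset_page(issues, page: 1, per_page: 3).map(&:id)    # [4, 5, 6]
keyset_before = keyset_page(issues, after: 30, per_page: 3).map(&:id)  # [4, 5, 6]

# Someone deletes a record on an earlier page.
shrunk = issues.reject { |issue| issue.id == 1 }

# The offset page shifts by one record; the keyset page is unchanged.
offset_after = offset_page(shrunk, page: 1, per_page: 3).map(&:id)     # [5, 6, 7]
keyset_after = keyset_page(shrunk, after: 30, per_page: 3).map(&:id)   # [4, 5, 6]
```

After the deletion, a client paging by offset never sees issue 4, while the keyset query still returns it because the cursor depends on the record's value and not on its position.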