Unverified commit 62df5606, authored by Krasimir Angelov and committed by GitLab

Merge branch 'document-risk-of-scope-to' into 'master'

Make it clearer that `scope_to` needs an index to work well

See merge request https://gitlab.com/gitlab-org/gitlab/-/merge_requests/169532



Merged-by: Krasimir Angelov <kangelov@gitlab.com>
Approved-by: Krasimir Angelov <kangelov@gitlab.com>
Reviewed-by: Krasimir Angelov <kangelov@gitlab.com>
Co-authored-by: Dylan Griffith <dyl.griffith@gmail.com>
@@ -367,17 +367,30 @@ Namespace.each_batch(of: 100) do |relation|
 end
 ```

-In some cases, only a subset of records must be examined. If only 10% of the 1000 records
-need examination, apply a filter to the initial relation when the jobs are created:
+#### Using a composite or partial index to iterate a subset of the table
+
+When applying additional filters, it is important to ensure they are properly
+covered by an index to optimize `EachBatch` performance.
+In the below examples we need an index on `(type, id)` or `id WHERE type IS NULL`
+to support the filters. See
+[the `EachBatch` documentation for more information](iterating_tables_in_batches.md).
+
+If you have a suitable index and you want to iterate only a subset of the table
+you can apply a `where` clause before the `each_batch` like:

 ```ruby
+# Works well if there is an index like either of:
+# - `id WHERE type IS NULL`
+# - `(type, id)`
+# Does not work well otherwise.
 Namespace.where(type: nil).each_batch(of: 100) do |relation|
   relation.update_all(type: 'User')
 end
 ```

-In the first example, we don't know how many records will be updated in each batch.
-In the second (filtered) example, we know exactly 100 will be updated with each batch.
+An advantage of this approach is that you get consistent batch sizes. But it is
+only suitable where there is an index that matches the `where` clauses as well
+as the batching strategy.

 `BatchedMigrationJob` provides a `scope_to` helper method to apply additional filters and achieve this:
@@ -385,6 +398,11 @@ In the second (filtered) example, we know exactly 100 will be updated with each
 ```ruby
 class BackfillNamespaceType < BatchedMigrationJob
+  # Works well if there is an index like either of:
+  # - `id WHERE type IS NULL`
+  # - `(type, id)`
+  # Does not work well otherwise.
   scope_to ->(relation) { relation.where(type: nil) }
   operation_name :update_all
   feature_category :source_code_management
@@ -425,10 +443,6 @@ In the second (filtered) example, we know exactly 100 will be updated with each
 end
 ```

-NOTE:
-When applying additional filters, it is important to ensure they are properly covered by an index to optimize `EachBatch` performance.
-In the example above we need an index on `(type, id)` to support the filters. See [the `EachBatch` documentation for more information](iterating_tables_in_batches.md).
-
 ### Access data for multiple databases

 Background migration contrary to regular migrations does have access to multiple databases
......
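The batch-size point in this change (filtering before batching gives consistent batch sizes) can be illustrated with a small pure-Ruby sketch. It does not use GitLab's `EachBatch` or a database: `Row`, `ROWS`, and `each_batch_sketch` are hypothetical stand-ins invented for this illustration, and the data mirrors the removed sentence's scenario of 1000 records of which 10% need updating. The sketch also does not model indexes; in a real database the filtered variant is only cheap when an index such as `(type, id)` or `id WHERE type IS NULL` exists, as the diff states.

```ruby
# Hypothetical in-memory stand-in for a table row.
Row = Struct.new(:id, :type)

# 1000 rows; every 10th row has a nil type (10% need updating).
ROWS = (1..1000).map { |id| Row.new(id, (id % 10).zero? ? nil : 'Group') }

# Hypothetical helper, NOT GitLab's EachBatch: walks rows in id order
# and yields slices of at most `of` rows.
def each_batch_sketch(rows, of:)
  rows.sort_by(&:id).each_slice(of) { |batch| yield batch }
end

# Variant 1: batch over the whole table, filter inside each batch.
# The number of rows actually touched per batch depends on the data.
unfiltered_sizes = []
each_batch_sketch(ROWS, of: 100) do |batch|
  unfiltered_sizes << batch.count { |row| row.type.nil? }
end

# Variant 2: filter first, then batch. Every batch except possibly
# the last contains exactly `of` matching rows.
filtered_sizes = []
each_batch_sketch(ROWS.select { |row| row.type.nil? }, of: 100) do |batch|
  filtered_sizes << batch.size
end

puts "filter inside batch: #{unfiltered_sizes.inspect}" # ten batches of 10
puts "filter before batch: #{filtered_sizes.inspect}"   # one batch of 100
```

In this toy data the matching rows are evenly spread, so the first variant touches 10 rows in each of its ten batches; with real data that count would vary from batch to batch, which is the inconsistency the filtered approach avoids.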