Skip to content
代码片段 群组 项目
提交 129c5205 编辑于 作者: Clayton Cornell's avatar Clayton Cornell
浏览文件

Merge branch 'background_migration_strategy_doc' into 'master'

Add batching strategy section to background migration docs

See merge request gitlab-org/gitlab!92248
No related branches found
No related tags found
无相关合并请求
......@@ -374,6 +374,77 @@ module Gitlab
end
```
### Adding filters to the initial batching
By default, when creating background jobs to perform the migration, batched background migrations will iterate over the full specified table. This is done using the [`PrimaryKeyBatchingStrategy`](https://gitlab.com/gitlab-org/gitlab/-/blob/c9dabd1f4b8058eece6d8cb4af95e9560da9a2ee/lib/gitlab/database/migrations/batched_background_migration_helpers.rb#L17). This means if there are 1000 records in the table and the batch size is 100, there will be 10 jobs. For illustrative purposes, `EachBatch` is used like this:
```ruby
# PrimaryKeyBatchingStrategy
Projects.all.each_batch(of: 100) do |relation|
relation.where(foo: nil).update_all(foo: 'bar') # this happens in each background job
end
```
There are cases where we only need to look at a subset of records. Perhaps we only need to update 1 out of every 10 of those 1000 records. It would be best if we could apply a filter to the initial relation when the jobs are created:
```ruby
Projects.where(foo: nil).each_batch(of: 100) do |relation|
relation.update_all(foo: 'bar')
end
```
In the `PrimaryKeyBatchingStrategy` example, we do not know how many records will be updated in each batch. In the filtered example, we know exactly 100 will be updated with each batch.
The `PrimaryKeyBatchingStrategy` contains [a method that can be overwritten](https://gitlab.com/gitlab-org/gitlab/-/blob/dd1e70d3676891025534dc4a1e89ca9383178fe7/lib/gitlab/background_migration/batching_strategies/primary_key_batching_strategy.rb#L38-52) to apply additional filtering on the initial `EachBatch`.
We can accomplish this by:
1. Create a new class that inherits from `PrimaryKeyBatchingStrategy` and overrides the method using the desired filter (this may be the same filter used in the sub-batch):
```ruby
# frozen_string_literal: true
module GitLab
module BackgroundMigration
module BatchingStrategies
class FooStrategy < PrimaryKeyBatchingStrategy
def apply_additional_filters(relation, job_arguments: [], job_class: nil)
relation.where(foo: nil)
end
end
end
end
end
```
1. In the post-deployment migration that queues the batched background migration, specify the new batching strategy using the `batch_class_name` parameter:
```ruby
class BackfillProjectsFoo < Gitlab::Database::Migration[2.0]
MIGRATION = 'BackfillProjectsFoo'
DELAY_INTERVAL = 2.minutes
BATCH_CLASS_NAME = 'FooStrategy'
restrict_gitlab_migration gitlab_schema: :gitlab_main
def up
queue_batched_background_migration(
MIGRATION,
:routes,
:id,
job_interval: DELAY_INTERVAL,
batch_class_name: BATCH_CLASS_NAME
)
end
def down
delete_batched_background_migration(MIGRATION, :routes, :id, [])
end
end
```
When applying a batching strategy, it is important to ensure the filter properly covered by an index to optimize `EachBatch` performance. See [the `EachBatch` docs for more information](../iterating_tables_in_batches.md).
## Testing
Writing tests is required for:
......
0% 加载中 .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册