stage: Data Stores
group: Database
info: "See the Technical Writers assigned to Development Guidelines: https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments-to-development-guidelines"

Batched background migrations

Batched background migrations should be used to perform data migrations whenever a migration exceeds the time limits in our guidelines. For example, you can use batched background migrations to migrate data that's stored in a single JSON column to a separate table instead.

NOTE: Batched background migrations replaced the legacy background migrations framework. Refer to that documentation for any changes involving that framework.

NOTE: The batched background migrations framework has ChatOps support. Using ChatOps, GitLab engineers can interact with the batched background migrations present in the system.

When to use batched background migrations

Use a batched background migration when you migrate data in tables containing so many rows that the process would exceed the time limits in our guidelines if performed using a regular Rails migration.

  • Batched background migrations should be used when migrating data in high-traffic tables.
  • Batched background migrations may also be used when executing numerous single-row queries for every item in a large dataset. Typically, for single-record patterns, runtime is largely dependent on the size of the dataset. Split the dataset accordingly, and put it into background migrations.
  • Don't use batched background migrations to perform schema migrations.

Background migrations can help when:

  • Migrating events from one table to multiple separate tables.
  • Populating one column based on JSON stored in another column.
  • Migrating data that depends on the output of external services. (For example, an API.)

Notes

  • If the batched background migration is part of an important upgrade, it must be announced in the release post. Discuss with your Project Manager if you're unsure whether the migration falls into this category.
  • You should use the generator to create batched background migrations, so that required files are created by default.

How batched background migrations work

Batched background migrations (BBM) are subclasses of Gitlab::BackgroundMigration::BatchedMigrationJob that define a perform method. As the first step, a regular migration creates a batched_background_migrations record with the BBM class and the required arguments. By default, the batched_background_migrations record is in an active state, and active records are picked up by the Sidekiq worker to execute the actual batched migration.

All migration classes must be defined in the namespace Gitlab::BackgroundMigration. Place the files in the directory lib/gitlab/background_migration/.
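
To illustrate the shape described above (a subclass that defines a perform method), here is a minimal, self-contained sketch. It does not use the real framework: SketchBackgroundMigration, its stand-in BatchedMigrationJob, the rows: and target: arguments, and the ExtractThemeFromSettings class are all hypothetical and exist only so the example runs on its own. A real migration inherits from Gitlab::BackgroundMigration::BatchedMigrationJob and lives in lib/gitlab/background_migration/.

```ruby
require 'json'

# Stand-in namespace and base class, only so this sketch runs on its own.
# The real base class is Gitlab::BackgroundMigration::BatchedMigrationJob.
module SketchBackgroundMigration
  class BatchedMigrationJob
    def initialize(rows:, target:)
      @rows = rows      # hypothetical: the source rows belonging to this batch
      @target = target  # hypothetical: in-memory stand-in for the new table
    end

    def perform
      raise NotImplementedError, "#{self.class} must define #perform"
    end
  end

  # Hypothetical migration: copy a value out of a JSON column into a separate table.
  class ExtractThemeFromSettings < BatchedMigrationJob
    def perform
      @rows.each do |row|
        theme = JSON.parse(row[:settings_json])['theme']
        @target[row[:id]] = theme unless theme.nil?
      end
    end
  end
end

rows = [
  { id: 1, settings_json: '{"theme":"dark"}' },
  { id: 2, settings_json: '{}' }
]
target = {}
SketchBackgroundMigration::ExtractThemeFromSettings.new(rows: rows, target: target).perform
```

Only the row that actually carries a theme produces an entry in the stand-in target table.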

Execution mechanism

Batched background migrations are picked from the queue in the order they are enqueued. Multiple migrations are fetched and executed in parallel, as long as they are in an active state and do not target the same database table. The default number of migrations processed in parallel is 2. For GitLab.com, this limit is configured to 4. Once a migration is picked for execution, a job is created for the specific batch. After each job execution, the migration's batch size may be increased or decreased, based on the performance of the last 20 jobs.

@startuml
hide empty description
skinparam ConditionEndStyle hline
left to right direction
rectangle "Batched background migration queue" as migrations {
  rectangle "Migration N (active)" as migrationn
  rectangle "Migration 1 (completed)" as migration1
  rectangle "Migration 2 (active)" as migration2
  rectangle "Migration 3 (on hold)" as migration3
  rectangle "Migration 4 (active)" as migration4
  migration1 -[hidden]> migration2
  migration2 -[hidden]> migration3
  migration3 -[hidden]> migration4
  migration4 -[hidden]> migrationn
}
rectangle "Execution Workers" as workers {
 rectangle "Execution Worker 1 (busy)" as worker1
 rectangle "Execution Worker 2 (available)" as worker2
 worker1 -[hidden]> worker2
}
migration2 --> [Scheduling Worker]
migration4 --> [Scheduling Worker]
[Scheduling Worker] --> worker2
@enduml
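
The selection rule described in this section (queue order, active state only, no two migrations on the same table, a parallelism limit) can be sketched in plain Ruby. This is an illustration under assumptions, not the framework's scheduler: QueuedMigration and pick_runnable are hypothetical names, and the table names are made up.

```ruby
# Hypothetical record; the real framework stores these in batched_background_migrations.
QueuedMigration = Struct.new(:id, :status, :table_name)

# Walks the queue in enqueue order and picks active migrations,
# skipping any that target a table already claimed, up to `limit`.
def pick_runnable(queue, limit: 2)
  claimed_tables = []
  queue.each_with_object([]) do |migration, picked|
    break picked if picked.size >= limit
    next unless migration.status == :active
    next if claimed_tables.include?(migration.table_name)

    claimed_tables << migration.table_name
    picked << migration
  end
end

queue = [
  QueuedMigration.new(1, :completed, 'projects'),
  QueuedMigration.new(2, :active, 'events'),
  QueuedMigration.new(3, :on_hold, 'users'),
  QueuedMigration.new(4, :active, 'events'), # same table as migration 2
  QueuedMigration.new(5, :active, 'issues')
]

pick_runnable(queue).map(&:id) # => [2, 5]
```

Migration 4 is skipped because it targets the same table as migration 2, which was enqueued earlier.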

As soon as a worker is available, the BBM is processed by the runner.

@startuml
hide empty description
start
rectangle Runner {
  :Migration;
  if (Have reached batching bounds?) then (Yes)
    if (Have jobs to retry?) then (Yes)
      :Fetch the batched job;
    else (No)
      :Finish active migration;
      stop
    endif
  else (No)
    :Create a batched job;
  endif
  :Execute batched job;
  :Evaluate DB health;
  note right: Checks for table autovacuum, Patroni Apdex, Write-ahead logging
  if (Evaluation signs to stop?) then (Yes)
    :Put migration on hold;
  else (No)
    :Optimize migration;
  endif
}
@enduml

Idempotence

Batched background migrations are executed in the context of a Sidekiq process. The usual Sidekiq rules apply, especially the rule that jobs should be small and idempotent. Ensure that data integrity is guaranteed if your migration job is retried.

See Sidekiq best practices guidelines for more details.
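
As a rough illustration of what idempotent means for a batch job, the sketch below uses plain Ruby hashes in place of database rows. The backfill_normalized_email method and the column names are hypothetical. The job only touches rows that still need the change, and the value it writes depends only on the row itself, so a retry of the same batch leaves the data unchanged.

```ruby
# Hypothetical batch job: backfills `normalized_email` from `email`.
# Rows that were already processed are skipped, so a retry is harmless.
def backfill_normalized_email(rows)
  rows.each do |row|
    next unless row[:normalized_email].nil?

    row[:normalized_email] = row[:email].strip.downcase
  end
end

rows = [
  { id: 1, email: ' Alice@Example.com ', normalized_email: nil },
  { id: 2, email: 'bob@example.com', normalized_email: 'bob@example.com' }
]

backfill_normalized_email(rows)
after_first_run = rows.map(&:dup)

backfill_normalized_email(rows) # simulated retry of the same batch
rows == after_first_run # => true
```

By contrast, a job that appends to or increments a value on every run would produce different data after a retry.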

Migration optimization

After each job execution, a verification takes place to check if the migration can be optimized. The underlying optimization mechanism is based on the concept of time efficiency. It calculates the exponential moving average of time efficiencies for the last N jobs and updates the batch size of the batched background migration to its optimal value.

This mechanism, however, makes it hard for us to provide an accurate estimation for total execution time of the migration when using the database migration pipeline.

We are discussing ways to fix this problem in this issue.
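
The optimization step described in this section can be sketched as follows. This is a simplified model, not the framework's optimizer. It assumes that a job's time efficiency is its duration divided by the migration's interval, and every default (target, alpha, max_multiplier, bounds) is an illustrative value chosen for this example.

```ruby
# Exponential moving average over the values, oldest first; newer values weigh more.
def exponential_moving_average(values, alpha)
  values.drop(1).reduce(values.first) { |ema, value| alpha * value + (1 - alpha) * ema }
end

# Scales the batch size so that the averaged efficiency moves toward `target`.
# All defaults are assumptions for illustration, not the framework's constants.
def optimized_batch_size(current_size, efficiencies, target: 0.95, alpha: 0.4,
                         max_multiplier: 1.2, bounds: 1_000..2_000_000)
  return current_size if efficiencies.empty?

  ema = exponential_moving_average(efficiencies, alpha)
  return current_size unless ema.positive?

  # Grow slowly (capped multiplier), shrink as much as needed.
  multiplier = [target / ema, max_multiplier].min
  (current_size * multiplier).round.clamp(bounds.min, bounds.max)
end

# Jobs that use only half of their interval: the batch grows by the capped factor.
optimized_batch_size(10_000, [0.5, 0.5, 0.5]) # => 12000
```

Because the batch size changes while the migration runs, the number of remaining jobs is not fixed up front, which is why the total execution time is hard to estimate.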

Job retry mechanism

The batched background migrations retry mechanism ensures that a job is executed again in case of failure. The following diagram shows the different stages of our retry mechanism:

@startuml
hide empty description
note as N1
  can_split?:
  the failure is due to a query timeout
end note
[*] --> Running
Running --> Failed
note on link
  if number of retries <= MAX_ATTEMPTS
end note
Running --> Succeeded
Failed --> Running
note on link
  if number of retries > MAX_ATTEMPTS
  and can_split? == true
  then two jobs with smaller
  batch size will be created
end note
Failed --> [*]
Succeeded --> [*]
@enduml
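
One possible reading of the diagram above, expressed as a decision function: a failed job is retried while its retry count is within MAX_ATTEMPTS, is split into two smaller jobs once the limit is exceeded and the failure was a query timeout, and otherwise stays failed. RetrySketch, next_step, and the MAX_ATTEMPTS value are assumptions made for this sketch, not the framework's code.

```ruby
module RetrySketch
  MAX_ATTEMPTS = 3 # assumed value, for illustration only

  module_function

  # Decides what happens to a failed job. `query_timeout` plays the role of
  # can_split? in the diagram.
  def next_step(retries:, query_timeout:, batch_size:)
    if retries <= MAX_ATTEMPTS
      { action: :retry, batch_sizes: [batch_size] }
    elsif query_timeout && batch_size > 1
      half = (batch_size / 2.0).ceil
      { action: :split, batch_sizes: [half, batch_size - half] }
    else
      { action: :fail, batch_sizes: [] }
    end
  end
end

RetrySketch.next_step(retries: 4, query_timeout: true, batch_size: 1_001)
# => {:action=>:split, :batch_sizes=>[501, 500]}
```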

Failed batched background migrations

The whole batched background migration is marked as failed (/chatops run batched_background_migrations status MIGRATION_ID shows the migration as failed) if any of the following is true:

Throttling batched migrations

Because batched migrations are update-heavy and there have been incidents due to the heavy load from these migrations while the database was underperforming, a throttling mechanism exists to mitigate future incidents.

These database indicators are checked to throttle a migration. Upon receiving a stop signal, the migration is paused for a set time (10 minutes):

  • WAL queue pending archival crossing the threshold.
  • Active autovacuum on the tables on which the migration works.
  • Patroni apdex SLI dropping below the SLO.
  • WAL rate crossing the threshold.
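
The throttling behavior described above (check each indicator, pause for 10 minutes on any stop signal) can be modeled with a small sketch. ThrottleSketch, evaluate, and the indicator names are hypothetical. Each indicator is represented by a callable that returns :stop or :normal, which is an assumption about the interface rather than the framework's actual health check API.

```ruby
module ThrottleSketch
  HOLD_SECONDS = 10 * 60 # the 10-minute pause described above

  module_function

  # Runs every indicator; any :stop signal puts the migration on hold.
  def evaluate(indicators, now: Time.now)
    stopped_by = indicators.select { |_name, check| check.call == :stop }.keys
    return { status: :active, on_hold_until: nil, stopped_by: [] } if stopped_by.empty?

    { status: :on_hold, on_hold_until: now + HOLD_SECONDS, stopped_by: stopped_by }
  end
end

# Hypothetical indicator results; real checks would query the database.
indicators = {
  wal_archive_queue: -> { :normal },
  autovacuum: -> { :stop },
  patroni_apdex: -> { :normal },
  wal_rate: -> { :normal }
}

result = ThrottleSketch.evaluate(indicators)
result[:status]     # => :on_hold
result[:stopped_by] # => [:autovacuum]
```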

There is an ongoing effort to add more indicators to further enhance the database health check framework. For more details, see epic 7594.

Isolation