Authored by Vijay Hawoldar:
Updates the docs for deleting batched migrations to mention re-queuing.
stage: Data Stores
group: Database
info: "See the Technical Writers assigned to Development Guidelines: https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments-to-development-guidelines"
Batched background migrations
Batched background migrations should be used to perform data migrations whenever a migration exceeds the time limits in our guidelines. For example, you can use batched background migrations to migrate data that's stored in a single JSON column to a separate table instead.
NOTE: Batched background migrations replaced the legacy background migrations framework. Check that documentation in reference to any changes involving that framework.
NOTE: The batched background migrations framework has ChatOps support. Using ChatOps, GitLab engineers can interact with the batched background migrations present in the system.
When to use batched background migrations
Use a batched background migration when you migrate data in tables containing so many rows that the process would exceed the time limits in our guidelines if performed using a regular Rails migration.
- Batched background migrations should be used when migrating data in high-traffic tables.
- Batched background migrations may also be used when executing numerous single-row queries for every item in a large dataset. Typically, for single-record patterns, runtime is largely dependent on the size of the dataset. Split the dataset accordingly, and put it into background migrations.
- Don't use batched background migrations to perform schema migrations.
Background migrations can help when:
- Migrating events from one table to multiple separate tables.
- Populating one column based on JSON stored in another column.
- Migrating data that depends on the output of external services. (For example, an API.)
Notes
- If the batched background migration is part of an important upgrade, it must be announced in the release post. Discuss with your Project Manager if you're unsure whether the migration falls into this category.
- You should use the generator to create batched background migrations, so that required files are created by default.
How batched background migrations work
Batched background migrations (BBM) are subclasses of Gitlab::BackgroundMigration::BatchedMigrationJob that define a perform method.
As the first step, a regular migration creates a batched_background_migrations record with the BBM class and the required arguments. By default, the batched_background_migrations record is in an active state, and active records are picked up by the Sidekiq worker to execute the actual batched migration.
All migration classes must be defined in the namespace Gitlab::BackgroundMigration. Place the files in the directory lib/gitlab/background_migration/.
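As a rough illustration of the shape described above, the following sketch defines a subclass with a perform method inside the required namespace. The base class here is a tiny stand-in so the example runs on its own; it is not GitLab's implementation. The class name BackfillExampleColumn, the constructor arguments, and the in-memory rows are all hypothetical.

```ruby
# Stand-in for the real base class so this sketch is self-contained.
# GitLab's actual Gitlab::BackgroundMigration::BatchedMigrationJob does far more.
module Gitlab
  module BackgroundMigration
    class BatchedMigrationJob
      def initialize(start_id:, end_id:)
        @start_id = start_id
        @end_id = end_id
      end

      def perform
        raise NotImplementedError, "#{self.class} must implement perform"
      end
    end

    # Hypothetical migration. By the convention above it would live in
    # lib/gitlab/background_migration/backfill_example_column.rb
    class BackfillExampleColumn < BatchedMigrationJob
      # Rows are plain hashes here (assumption); a real job would query the database.
      def initialize(rows:, **bounds)
        super(**bounds)
        @rows = rows
      end

      # Fills :example for rows inside the batch bounds that do not have it yet.
      def perform
        @rows.each do |row|
          next unless (@start_id..@end_id).cover?(row[:id])

          row[:example] ||= row[:legacy_value].to_s.upcase
        end
      end
    end
  end
end

rows = [{ id: 1, legacy_value: "a" }, { id: 2, legacy_value: "b" }, { id: 9, legacy_value: "c" }]
Gitlab::BackgroundMigration::BackfillExampleColumn.new(rows: rows, start_id: 1, end_id: 2).perform
```

In practice, the generator mentioned earlier creates the required files, so a hand-written skeleton like this is only useful for understanding the structure.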
Execution mechanism
Batched background migrations are picked from the queue in the order they are enqueued. Multiple migrations are fetched and executed in parallel, as long as they are in an active state and do not target the same database table. The default number of migrations processed in parallel is 2. For GitLab.com, this limit is configured to 4. Once a migration is picked for execution, a job is created for the specific batch. After each job execution, the migration's batch size may be increased or decreased, based on the performance of the last 20 jobs.
@startuml
hide empty description
skinparam ConditionEndStyle hline
left to right direction
rectangle "Batched background migration queue" as migrations {
rectangle "Migration N (active)" as migrationn
rectangle "Migration 1 (completed)" as migration1
rectangle "Migration 2 (active)" as migration2
rectangle "Migration 3 (on hold)" as migration3
rectangle "Migration 4 (active)" as migration4
migration1 -[hidden]> migration2
migration2 -[hidden]> migration3
migration3 -[hidden]> migration4
migration4 -[hidden]> migrationn
}
rectangle "Execution Workers" as workers {
rectangle "Execution Worker 1 (busy)" as worker1
rectangle "Execution Worker 2 (available)" as worker2
worker1 -[hidden]> worker2
}
migration2 --> [Scheduling Worker]
migration4 --> [Scheduling Worker]
[Scheduling Worker] --> worker2
@enduml
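The selection rule described above (queue order, active state only, no two migrations on the same table, a parallel limit) can be sketched as follows. This is a simplified in-memory model under stated assumptions: the Migration struct, the status symbols, and the pick_executable method are hypothetical names, not part of the framework.

```ruby
# Hypothetical in-memory representation of a queued migration.
Migration = Struct.new(:id, :status, :table_name, keyword_init: true)

# Default taken from the text above; GitLab.com is described as using 4.
DEFAULT_PARALLEL_LIMIT = 2

# Returns up to `limit` active migrations in queue order, skipping any
# migration whose table is already targeted by an earlier pick.
def pick_executable(queue, limit: DEFAULT_PARALLEL_LIMIT)
  picked = []
  queue.each do |migration|
    break if picked.size >= limit
    next unless migration.status == :active
    next if picked.any? { |m| m.table_name == migration.table_name }

    picked << migration
  end
  picked
end
```

With a queue holding a completed migration, an active one on the events table, one on hold, a second active one on events, and an active one on users, this sketch picks the first active events migration and the users migration.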
As soon as a worker is available, the BBM is processed by the runner.
@startuml
hide empty description
start
rectangle Runner {
:Migration;
if (Have reached batching bounds?) then (Yes)
if (Have jobs to retry?) then (Yes)
:Fetch the batched job;
else (No)
:Finish active migration;
stop
endif
else (No)
:Create a batched job;
endif
:Execute batched job;
:Evaluate DB health;
note right: Checks for table autovacuum, Patroni Apdex, Write-ahead logging
if (Evaluation signs to stop?) then (Yes)
:Put migration on hold;
else (No)
:Optimize migration;
endif
}
@enduml
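One pass through the runner flow in the diagram above might look like the following sketch. It is an in-memory approximation only: RunnerMigration, run_step, the status symbols, and the healthy flag (standing in for the database health evaluation) are hypothetical, and executing the job is left to the caller's block.

```ruby
# Hypothetical in-memory model of a migration as seen by the runner.
RunnerMigration = Struct.new(:status, :next_id, :max_id, :batch_size, :retryable_jobs,
                             keyword_init: true)

# Performs one runner pass and returns :finished, :on_hold, or :optimize.
def run_step(migration, healthy:)
  if migration.next_id > migration.max_id
    # Batching bounds reached: retry a failed job if one exists, otherwise finish.
    job = migration.retryable_jobs.shift
    if job.nil?
      migration.status = :finished
      return :finished
    end
  else
    # Bounds not reached: create the next batched job.
    last_id = [migration.next_id + migration.batch_size - 1, migration.max_id].min
    job = { range: (migration.next_id..last_id) }
    migration.next_id = last_id + 1
  end

  yield job # execute the batched job

  if healthy
    :optimize
  else
    migration.status = :on_hold
    :on_hold
  end
end
```

Returning a symbol rather than acting on it keeps the sketch small; a caller would run the optimizer on :optimize and pause the migration on :on_hold.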
Idempotence
Batched background migrations are executed in the context of a Sidekiq process. The usual Sidekiq rules apply, especially the rule that jobs should be small and idempotent. Ensure that data integrity is guaranteed if your migration job is retried.
See Sidekiq best practices guidelines for more details.
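To make the idempotence requirement concrete, here is a minimal sketch of a backfill step that is safe to retry. The rows are plain hashes and the column names are hypothetical; the point is only that rows already migrated are skipped, so running the job a second time changes nothing.

```ruby
# Copies :legacy_name into :name only where :name is still unset (hypothetical columns).
def backfill_names(rows)
  rows.each do |row|
    next unless row[:name].nil? # already migrated or set elsewhere: leave untouched

    row[:name] = row[:legacy_name].strip
  end
end

rows = [
  { id: 1, legacy_name: " alice ", name: nil },
  { id: 2, legacy_name: "bob", name: "Robert" } # value written after the migration began
]
backfill_names(rows)
backfill_names(rows) # a retry leaves the data exactly as the first run did
```

A job that instead overwrote name unconditionally, or appended to it, could corrupt data when Sidekiq retries it.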
Migration optimization
After each job execution, a verification takes place to check if the migration can be optimized. The underlying optimization mechanism is based on the concept of time efficiency. It calculates the exponential moving average of time efficiencies for the last N jobs and updates the batch size of the batched background migration to its optimal value.
This mechanism, however, makes it hard for us to provide an accurate estimation for total execution time of the migration when using the database migration pipeline.
We are discussing ways to fix this problem in this issue.
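The optimization idea can be sketched as follows, assuming that time efficiency is the ratio of a job's duration to a target interval, and that the new batch size is the current size divided by the smoothed efficiency. The constants, the smoothing factor, and the method names are illustrative assumptions, not the framework's actual values or API.

```ruby
# All constants below are illustrative assumptions, not GitLab's actual settings.
TARGET_INTERVAL = 120.0 # seconds a job is expected to take
SMOOTHING = 0.4         # weight given to the newest job in the moving average
MIN_BATCH_SIZE = 1_000
MAX_BATCH_SIZE = 2_000_000

# Time efficiency of one job: 1.0 means the job used exactly the target interval.
def time_efficiency(duration)
  duration / TARGET_INTERVAL
end

# Exponential moving average of efficiencies; durations are ordered oldest first.
def smoothed_efficiency(durations)
  efficiencies = durations.map { |d| time_efficiency(d) }
  efficiencies.drop(1).reduce(efficiencies.first) do |ema, value|
    SMOOTHING * value + (1 - SMOOTHING) * ema
  end
end

# Scales the batch size so the next job should land near the target interval.
def optimized_batch_size(current_size, durations)
  return current_size if durations.empty?

  efficiency = smoothed_efficiency(durations)
  return current_size unless efficiency.positive?

  (current_size / efficiency).round.clamp(MIN_BATCH_SIZE, MAX_BATCH_SIZE)
end
```

Under these assumptions, jobs that finish in half the target interval double the batch size, and jobs that take twice as long halve it. Because the batch size keeps changing this way, the total run time is hard to predict up front, which is the estimation problem noted above.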
Job retry mechanism
The batched background migrations retry mechanism ensures that a job is executed again in case of failure. The following diagram shows the different stages of our retry mechanism:
@startuml
hide empty description
note as N1
can_split?:
the failure is due to a query timeout
end note
[*] --> Running
Running --> Failed
note on link
if number of retries <= MAX_ATTEMPTS
end note
Running --> Succeeded
Failed --> Running
note on link
if number of retries > MAX_ATTEMPTS
and can_split? == true
then two jobs with smaller
batch size will be created
end note
Failed --> [*]
Succeeded --> [*]
@enduml
- MAX_ATTEMPTS is defined in the Gitlab::Database::BackgroundMigration class.
- can_split? is defined in the Gitlab::Database::BatchedJob class.
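The retry and split behavior shown in the diagram can be approximated with the sketch below. It follows one reading of the diagram notes: a failed job is retried while its retry count is at most MAX_ATTEMPTS, and beyond that it is split in two only if the failure was a query timeout. The value of MAX_ATTEMPTS, the BatchedJob struct, the :query_timeout symbol, and jobs_after_failure are assumptions for illustration.

```ruby
MAX_ATTEMPTS = 3 # assumed value for illustration only

# Hypothetical in-memory stand-in for a batched job record.
BatchedJob = Struct.new(:min_id, :max_id, :attempts, :failure, keyword_init: true) do
  # Per the diagram note: a job can be split when it failed due to a query timeout.
  def can_split?
    failure == :query_timeout
  end

  def batch_size
    max_id - min_id + 1
  end
end

# Returns the jobs to run after `job` failed; an empty list means it stays failed.
def jobs_after_failure(job)
  return [job] if job.attempts <= MAX_ATTEMPTS
  return [] unless job.can_split? && job.batch_size > 1

  midpoint = job.min_id + job.batch_size / 2
  [
    BatchedJob.new(min_id: job.min_id, max_id: midpoint - 1, attempts: 0, failure: nil),
    BatchedJob.new(min_id: midpoint, max_id: job.max_id, attempts: 0, failure: nil)
  ]
end
```

Splitting halves the ID range, so each new job has a smaller batch and a better chance of finishing before the query times out.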
Failed batched background migrations
The whole batched background migration is marked as failed (/chatops run batched_background_migrations status MIGRATION_ID shows the migration as failed) if any of the following is true:
- There are no more jobs to consume, and there are failed jobs.
- More than half of the jobs failed since the background migration was started.
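The two conditions above can be expressed as a small predicate. This is a sketch only: the method name and its inputs (counts of pending, failed, and total jobs created since the migration started) are hypothetical.

```ruby
# pending: jobs still to consume; failed: failed jobs; total: jobs created so far.
def migration_failed?(pending:, failed:, total:)
  no_jobs_left_with_failures = pending.zero? && failed.positive?
  majority_failed = failed > total / 2.0
  no_jobs_left_with_failures || majority_failed
end
```

Note that exactly half of the jobs failing does not trip the second condition in this sketch, because the text says more than half.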
Throttling batched migrations
Because batched migrations are update-heavy, and there have been incidents caused by the heavy load from these migrations while the database was underperforming, a throttling mechanism exists to mitigate future incidents.
Upon receiving a stop signal, the migration is paused for a set time (10 minutes). The following database indicators are checked to throttle a migration:
- WAL queue pending archival crossing the threshold.
- Active autovacuum on the tables that the migration works on.
- Patroni apdex SLI dropping below the SLO.
- WAL rate crossing the threshold.
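A minimal sketch of this throttling decision follows, assuming each indicator can be modeled as a callable that returns :stop or :normal. The indicator names, the signal symbols, and both method names are hypothetical; only the 10-minute pause comes from the text above.

```ruby
HOLD_DURATION = 10 * 60 # seconds, matching the 10-minute pause described above

# `indicators` maps a name to a callable returning :stop or :normal (assumed interface).
def evaluate_health(indicators)
  indicators.any? { |_name, check| check.call == :stop } ? :stop : :normal
end

# Returns the time until which the migration stays on hold, or nil to keep running.
def on_hold_until(indicators, now: Time.now)
  evaluate_health(indicators) == :stop ? now + HOLD_DURATION : nil
end

indicators = {
  wal_archive_queue: -> { :normal },
  autovacuum_on_table: -> { :stop },
  patroni_apdex: -> { :normal },
  wal_rate: -> { :normal }
}
on_hold_until(indicators) # a Time ten minutes from now, because one indicator says stop
```

In this model a single stop signal is enough to pause the migration, which keeps the check conservative while the database is under pressure.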
There is an ongoing effort to add more indicators to further enhance the database health check framework. For more details, see epic 7594.