From 0632d9ef0ba7d1732c49052e1d345be99be805bd Mon Sep 17 00:00:00 2001 From: Terri Chu <tchu@gitlab.com> Date: Tue, 28 Feb 2023 15:45:46 +0000 Subject: [PATCH] Create Advanced Search Migrations styleguide --- .gitlab/CODEOWNERS | 3 +- .../14-10-old-search-migration-removal.yml | 2 +- .../{elasticsearch.md => advanced_search.md} | 305 +---------------- doc/development/changelog.md | 2 +- doc/development/feature_development.md | 2 +- .../advanced_search_migration_styleguide.md | 306 ++++++++++++++++++ 6 files changed, 314 insertions(+), 306 deletions(-) rename doc/development/{elasticsearch.md => advanced_search.md} (61%) create mode 100644 doc/development/search/advanced_search_migration_styleguide.md diff --git a/.gitlab/CODEOWNERS b/.gitlab/CODEOWNERS index 72cdaf77f525..67bd749d3e06 100644 --- a/.gitlab/CODEOWNERS +++ b/.gitlab/CODEOWNERS @@ -715,6 +715,7 @@ lib/gitlab/checks/** @proglottis @toon @zj-gitlab /doc/development/distributed_tracing.md @msedlakjakubowski /doc/development/documentation/ @sselhorn /doc/development/elasticsearch.md @ashrafkhamis +/doc/development/search/advanced_search_migration_styleguide.md @ashrafkhamis /doc/development/experiment_guide/ @phillipwells /doc/development/export_csv.md @eread /doc/development/fe_guide/content_editor.md @ashrafkhamis @@ -1431,4 +1432,4 @@ ee/lib/ee/api/entities/project.rb @gitlab-org/manage/manage-workspace/backend-ap [Manage::Foundations] /lib/sidebars/ @gitlab/ @gitlab-org/manage/foundations/engineering -/ee/lib/sidebars/ @gitlab-org/manage/foundations/engineering \ No newline at end of file +/ee/lib/sidebars/ @gitlab-org/manage/foundations/engineering diff --git a/data/deprecations/14-10-old-search-migration-removal.yml b/data/deprecations/14-10-old-search-migration-removal.yml index 1991c0ef1775..4700063e68f6 100644 --- a/data/deprecations/14-10-old-search-migration-removal.yml +++ b/data/deprecations/14-10-old-search-migration-removal.yml @@ -7,4 +7,4 @@ stage: enablement tiers: premium, ultimate issue_url: https://gitlab.com/gitlab-org/gitlab/-/issues/359133 - documentation_url: https://docs.gitlab.com/ee/development/elasticsearch.html#deleting-advanced-search-migrations-in-a-major-version-upgrade + documentation_url: https://docs.gitlab.com/ee/development/search/advanced_search_migration_styleguide.html#deleting-advanced-search-migrations-in-a-major-version-upgrade diff --git a/doc/development/elasticsearch.md b/doc/development/advanced_search.md similarity index 61% rename from doc/development/elasticsearch.md rename to doc/development/advanced_search.md index 935964a9a90b..dd05c1475ece 100644 --- a/doc/development/elasticsearch.md +++ b/doc/development/advanced_search.md @@ -4,16 +4,16 @@ group: Global Search info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments --- -# Elasticsearch knowledge +# Advanced Search development -This area is to maintain a compendium of useful information when working with Elasticsearch. +This page includes information about developing and working with Elasticsearch. Information on how to enable Elasticsearch and perform the initial indexing is in the [Elasticsearch integration documentation](../integration/advanced_search/elasticsearch.md#enable-advanced-search). ## Deep Dive -In June 2019, Mario de la Ossa hosted a Deep Dive (GitLab team members only: `https://gitlab.com/gitlab-org/create-stage/issues/1`) on the GitLab [Elasticsearch integration](../integration/advanced_search/elasticsearch.md) to share his domain specific knowledge with anyone who may work in this part of the codebase in the future. You can find the <i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [recording on YouTube](https://www.youtube.com/watch?v=vrvl-tN2EaA), and the slides on [Google Slides](https://docs.google.com/presentation/d/1H-pCzI_LNrgrL5pJAIQgvLX8Ji0-jIKOg1QeJQzChug/edit) and in [PDF](https://gitlab.com/gitlab-org/create-stage/uploads/c5aa32b6b07476fa8b597004899ec538/Elasticsearch_Deep_Dive.pdf). Everything covered in this deep dive was accurate as of GitLab 12.0, and while specific details may have changed since then, it should still serve as a good introduction. +In June 2019, Mario de la Ossa hosted a Deep Dive (GitLab team members only: `https://gitlab.com/gitlab-org/create-stage/issues/1`) on the GitLab [Elasticsearch integration](../integration/advanced_search/elasticsearch.md) to share his domain specific knowledge with anyone who may work in this part of the codebase in the future. You can find the <i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [recording on YouTube](https://www.youtube.com/watch?v=vrvl-tN2EaA), and the slides on [Google Slides](https://docs.google.com/presentation/d/1H-pCzI_LNrgrL5pJAIQgvLX8Ji0-jIKOg1QeJQzChug/edit) and in [PDF](https://gitlab.com/gitlab-org/create-stage/uploads/c5aa32b6b07476fa8b597004899ec538/Elasticsearch_Deep_Dive.pdf). Everything covered in this deep dive was accurate as of GitLab 12.0, and while specific details might have changed, it should still serve as a good introduction. In August 2020, a second Deep Dive was hosted, focusing on [GitLab-specific architecture for multi-indices support](#zero-downtime-reindexing-with-multiple-indices). The <i class="fa fa-youtube-play youtube" aria-hidden="true"></i> [recording on YouTube](https://www.youtube.com/watch?v=0WdPR9oB2fg) and the [slides](https://lulalala.gitlab.io/gitlab-elasticsearch-deepdive/) are available. Everything covered in this deep dive was accurate as of GitLab 13.3. @@ -184,305 +184,6 @@ If the current version is `v12p1`, and we need to create a new version for `v12p 1. Change the namespace for files under `v12p1` folder from `Latest` to `V12p1` 1. Make changes to files under the `latest` folder as needed -## Creating a new Advanced Search migration - -> This functionality was introduced by [#234046](https://gitlab.com/gitlab-org/gitlab/-/issues/234046). - -NOTE: -This only supported for indices created with GitLab 13.0 or greater. - -In the [`ee/elastic/migrate/`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/ee/elastic/migrate) folder, create a new file with the filename format `YYYYMMDDHHMMSS_migration_name.rb`. This format is the same for Rails database migrations. - -```ruby -# frozen_string_literal: true - -class MigrationName < Elastic::Migration - # Important: Any updates to the Elastic index mappings must be replicated in the respective - # configuration files: - # - `Elastic::Latest::Config`, for the main index. - # - `Elastic::Latest::<Type>Config`, for standalone indices. - - def migrate - end - - # Check if the migration has completed - # Return true if completed, otherwise return false - def completed? - end -end -``` - -Applied migrations are stored in `gitlab-#{RAILS_ENV}-migrations` index. All migrations not executed -are applied by the [`Elastic::MigrationWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic/migration_worker.rb) -cron worker sequentially. - -To update Elastic index mappings, apply the configuration to the respective files: - -- For the main index: [`Elastic::Latest::Config`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/elastic/latest/config.rb). -- For standalone indices: `Elastic::Latest::<Type>Config`. - -Migrations can be built with a retry limit and have the ability to be [failed and marked as halted](https://gitlab.com/gitlab-org/gitlab/-/blob/66e899b6637372a4faf61cfd2f254cbdd2fb9f6d/ee/lib/elastic/migration.rb#L40). -Any data or index cleanup needed to support migration retries should be handled within the migration. - -### Migration helpers - -The following migration helpers are available in `ee/app/workers/concerns/elastic/`: - -#### `Elastic::MigrationBackfillHelper` - -Backfills a specific field in an index. In most cases, the mapping for the field should already be added. - -Requires the `index_name` and `field_name` methods. - -```ruby -class MigrationName < Elastic::Migration - include Elastic::MigrationBackfillHelper - - private - - def index_name - Issue.__elasticsearch__.index_name - end - - def field_name - :schema_version - end -end -``` - -#### `Elastic::MigrationUpdateMappingsHelper` - -Updates a mapping in an index by calling `put_mapping` with the mapping specified. - -Requires the `index_name` and `new_mappings` methods. - -```ruby -class MigrationName < Elastic::Migration - include Elastic::MigrationUpdateMappingsHelper - - private - - def index_name - Issue.__elasticsearch__.index_name - end - - def new_mappings - { - schema_version: { - type: 'short' - } - } - end -end -``` - -#### `Elastic::MigrationRemoveFieldsHelper` - -Removes specified fields from an index. - -Requires the `index_name`, `document_type` methods. If there is one field to remove, add the `field_to_remove` method, otherwise add `fields_to_remove` with an array of fields. - -Checks in batches if any documents that match `document_type` have the fields specified in Elasticsearch. If documents exist, uses a Painless script to perform `update_by_query`. - -```ruby -class MigrationName < Elastic::Migration - include Elastic::MigrationRemoveFieldsHelper - - batched! - throttle_delay 1.minute - - private - - def index_name - User.__elasticsearch__.index_name - end - - def document_type - 'user' - end - - def fields_to_remove - %w[two_factor_enabled has_projects] - end -end -``` - -The default batch size is `10_000`. You can override this value by specifying `BATCH_SIZE`: - -```ruby -class MigrationName < Elastic::Migration - include Elastic::MigrationRemoveFieldsHelper - - batched! - BATCH_SIZE = 100 - - ... -end -``` - -#### `Elastic::MigrationObsolete` - -Marks a migration as obsolete when it's no longer required. - -```ruby -class MigrationName < Elastic::Migration - include Elastic::MigrationObsolete -end -``` - -#### `Elastic::MigrationHelper` - -Contains methods you can use when a migration doesn't fit the previous examples. - -```ruby -class MigrationName < Elastic::Migration - include Elastic::MigrationHelper - - def migrate - ... - end - - def completed? - ... - end -end -``` - -### Migration options supported by the `Elastic::MigrationWorker` - -[`Elastic::MigrationWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic/migration_worker.rb) supports the following migration options: - -- `batched!` - Allow the migration to run in batches. If set, the [`Elastic::MigrationWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic/migration_worker.rb) -will re-enqueue itself with a delay which is set using the `throttle_delay` option described below. The batching -must be handled within the `migrate` method, this setting controls the re-enqueuing only. - -- `batch_size` - Sets the number of documents modified during a `batched!` migration run. This size should be set to a value which allows the updates -enough time to finish. This can be tuned in combination with the `throttle_delay` option described below. The batching -must be handled within a custom `migrate` method or by using the [`Elastic::MigrationBackfillHelper`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/concerns/elastic/migration_backfill_helper.rb) -`migrate` method which uses this setting. Default value is 1000 documents. - -- `throttle_delay` - Sets the wait time in between batch runs. This time should be set high enough to allow each migration batch -enough time to finish. Additionally, the time should be less than 30 minutes since that is how often the -[`Elastic::MigrationWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic/migration_worker.rb) -cron worker runs. Default value is 5 minutes. - -- `pause_indexing!` - Pause indexing while the migration runs. This setting will record the indexing setting before -the migration runs and set it back to that value when the migration is completed. - -- `space_requirements!` - Verify that enough free space is available in the cluster when the migration runs. This setting - will halt the migration if the storage required is not available when the migration runs. The migration must provide - the space required in bytes by defining a `space_required_bytes` method. - -- `retry_on_failure` - Enable the retry on failure feature. By default, it retries - the migration 30 times. After it runs out of retries, the migration is marked as halted. - To customize the number of retries, pass the `max_attempts` argument: - `retry_on_failure max_attempts: 10` - -```ruby -# frozen_string_literal: true - -class BatchedMigrationName < Elastic::Migration - # Declares a migration should be run in batches - batched! - throttle_delay 10.minutes - pause_indexing! - space_requirements! - retry_on_failure - - # ... -end -``` - -### Multi-version compatibility - -These Advanced Search migrations, like any other GitLab changes, need to support the case where -[multiple versions of the application are running at the same time](multi_version_compatibility.md). - -Depending on the order of deployment, it's possible that the migration -has started or finished and there's still a server running the application code from before the -migration. We need to take this into consideration until we can -[ensure all Advanced Search migrations start after the deployment has finished](https://gitlab.com/gitlab-org/gitlab/-/issues/321619). - -### Reverting a migration - -Because Elasticsearch does not support transactions, we always need to design our -migrations to accommodate a situation where the application -code is reverted after the migration has started or after it is finished. - -For this reason we generally defer destructive actions (for example, deletions after -some data is moved) to a later merge request after the migrations have -completed successfully. To be safe, for self-managed customers we should also -defer it to another release if there is risk of important data loss. - -### Best practices for Advanced Search migrations - -Follow these best practices for best results: - -- When working in batches, keep the batch size under 9,000 documents - and `throttle_delay` for at least 3 minutes. The bulk indexer is set to run - every 1 minute and process a batch of 10,000 documents. These limits - allow the bulk indexer time to process records before another migration - batch is attempted. -- To ensure that document counts are up to date, it is recommended to refresh - the index before checking if a migration is completed. -- Add logging statements to each migration when the migration starts, when a - completion check occurs, and when the migration is completed. These logs - are helpful when debugging issues with migrations. -- Pause indexing if you're using any Elasticsearch Reindex API operations. -- Consider adding a retry limit if there is potential for the migration to fail. - This ensures that migrations can be halted if an issue occurs. - -## Deleting Advanced Search migrations in a major version upgrade - -Since our Advanced Search migrations usually require us to support multiple -code paths for a long period of time, it's important to clean those up when we -safely can. - -We choose to use GitLab major version upgrades as a safe time to remove -backwards compatibility for indices that have not been fully migrated. We -[document this in our upgrade documentation](../update/index.md#upgrading-to-a-new-major-version). -We also choose to replace the migration code with the halted migration -and remove tests so that: - -- We don't need to maintain any code that is called from our Advanced Search - migrations. -- We don't waste CI time running tests for migrations that we don't support - anymore. -- Operators who have not run this migration and who upgrade directly to the - target version will see a message prompting them to reindex from scratch. - -To be extra safe, we will not delete migrations that were created in the last -minor version before the major upgrade. So, if we are upgrading to `%14.0`, -we should not delete migrations that were only added in `%13.12`. This is an -extra safety net as we expect there are migrations that get merged that may -take multiple weeks to finish on GitLab.com. It would be bad if we upgraded -GitLab.com to `%14.0` before the migrations in `%13.12` were finished. Since -our deployments to GitLab.com are automated and we currently don't have -automated checks to prevent this, the extra precaution is warranted. -Additionally, even if we did have automated checks to prevent it, we wouldn't -actually want to hold up GitLab.com deployments on Advanced Search migrations, -as they may still have another week to go, and that's too long to block -deployments. - -### Process for removing migrations - -For every migration that was created 2 minor versions before the major version -being upgraded to, we do the following: - -1. Confirm the migration has actually completed successfully for GitLab.com. -1. Replace the content of the migration with: - - ```ruby - include Elastic::MigrationObsolete - ``` - -1. Delete any spec files to support this migration. -1. Remove any logic handling backwards compatibility for this migration. You - can find this by looking for - `Elastic::DataMigrationService.migration_has_finished?(:migration_name_in_lowercase)`. -1. Create a merge request with these changes. Noting that we should not - accidentally merge this before the major release is started. - ## Performance Monitoring ### Prometheus diff --git a/doc/development/changelog.md b/doc/development/changelog.md index 27bcffe75602..5469d3fc2a82 100644 --- a/doc/development/changelog.md +++ b/doc/development/changelog.md @@ -94,7 +94,7 @@ EE: true uses system fonts for all text." - Any client-facing change to our REST and GraphQL APIs **must** have a changelog entry. See the [complete list what comprises a GraphQL breaking change](api_graphql_styleguide.md#breaking-changes). -- Any change that introduces an [Advanced Search migration](elasticsearch.md#creating-a-new-advanced-search-migration) +- Any change that introduces an [Advanced Search migration](search/advanced_search_migration_styleguide.md#creating-a-new-advanced-search-migration) **must** have a changelog entry. - A fix for a regression introduced and then fixed in the same release (such as fixing a bug introduced during a monthly release candidate) **should not** diff --git a/doc/development/feature_development.md b/doc/development/feature_development.md index e24f105250f6..4c1d01e07522 100644 --- a/doc/development/feature_development.md +++ b/doc/development/feature_development.md @@ -79,7 +79,7 @@ Consult these topics for information on contributing to specific GitLab features - [Adding a new Redis instance](redis/new_redis_instance.md) - [Sidekiq guidelines](sidekiq/index.md) for working with Sidekiq workers - [Working with Gitaly](gitaly.md) -- [Elasticsearch integration docs](elasticsearch.md) +- [Advanced Search integration docs](advanced_search.md) - [Working with merge request diffs](diffs.md) - [Approval Rules](merge_request_concepts/approval_rules.md) - [Repository mirroring](repository_mirroring.md) diff --git a/doc/development/search/advanced_search_migration_styleguide.md b/doc/development/search/advanced_search_migration_styleguide.md new file mode 100644 index 000000000000..1676e666fcc5 --- /dev/null +++ b/doc/development/search/advanced_search_migration_styleguide.md @@ -0,0 +1,306 @@ +--- +stage: Data Stores +group: Global Search +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments +--- + +# Advanced Search migration style guide + +## Creating a new Advanced Search migration + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/234046) in GitLab 13.6. + +NOTE: +This functionality is only supported for indices created in GitLab 13.0 and later. + +In the [`ee/elastic/migrate/`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/ee/elastic/migrate) folder, create a new file with the filename format `YYYYMMDDHHMMSS_migration_name.rb`. This format is the same for Rails database migrations. + +```ruby +# frozen_string_literal: true + +class MigrationName < Elastic::Migration + # Important: Any updates to the Elastic index mappings must be replicated in the respective + # configuration files: + # - `Elastic::Latest::Config`, for the main index. + # - `Elastic::Latest::<Type>Config`, for standalone indices. + + def migrate + end + + # Check if the migration has completed + # Return true if completed, otherwise return false + def completed? + end +end +``` + +Applied migrations are stored in `gitlab-#{RAILS_ENV}-migrations` index. All migrations not executed +are applied by the [`Elastic::MigrationWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic/migration_worker.rb) +cron worker sequentially. + +To update Elastic index mappings, apply the configuration to the respective files: + +- For the main index: [`Elastic::Latest::Config`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/elastic/latest/config.rb). +- For standalone indices: `Elastic::Latest::<Type>Config`. + +Migrations can be built with a retry limit and have the ability to be [failed and marked as halted](https://gitlab.com/gitlab-org/gitlab/-/blob/66e899b6637372a4faf61cfd2f254cbdd2fb9f6d/ee/lib/elastic/migration.rb#L40). +Any data or index cleanup needed to support migration retries should be handled in the migration. + +### Migration helpers + +The following migration helpers are available in `ee/app/workers/concerns/elastic/`: + +#### `Elastic::MigrationBackfillHelper` + +Backfills a specific field in an index. In most cases, the mapping for the field should already be added. + +Requires the `index_name` and `field_name` methods. + +```ruby +class MigrationName < Elastic::Migration + include Elastic::MigrationBackfillHelper + + private + + def index_name + Issue.__elasticsearch__.index_name + end + + def field_name + :schema_version + end +end +``` + +#### `Elastic::MigrationUpdateMappingsHelper` + +Updates a mapping in an index by calling `put_mapping` with the mapping specified. + +Requires the `index_name` and `new_mappings` methods. + +```ruby +class MigrationName < Elastic::Migration + include Elastic::MigrationUpdateMappingsHelper + + private + + def index_name + Issue.__elasticsearch__.index_name + end + + def new_mappings + { + schema_version: { + type: 'short' + } + } + end +end +``` + +#### `Elastic::MigrationRemoveFieldsHelper` + +Removes specified fields from an index. + +Requires the `index_name`, `document_type` methods. If there is one field to remove, add the `field_to_remove` method, otherwise add `fields_to_remove` with an array of fields. + +Checks in batches if any documents that match `document_type` have the fields specified in Elasticsearch. If documents exist, uses a Painless script to perform `update_by_query`. + +```ruby +class MigrationName < Elastic::Migration + include Elastic::MigrationRemoveFieldsHelper + + batched! + throttle_delay 1.minute + + private + + def index_name + User.__elasticsearch__.index_name + end + + def document_type + 'user' + end + + def fields_to_remove + %w[two_factor_enabled has_projects] + end +end +``` + +The default batch size is `10_000`. You can override this value by specifying `BATCH_SIZE`: + +```ruby +class MigrationName < Elastic::Migration + include Elastic::MigrationRemoveFieldsHelper + + batched! + BATCH_SIZE = 100 + + ... +end +``` + +#### `Elastic::MigrationObsolete` + +Marks a migration as obsolete when it's no longer required. + +```ruby +class MigrationName < Elastic::Migration + include Elastic::MigrationObsolete +end +``` + +#### `Elastic::MigrationHelper` + +Contains methods you can use when a migration doesn't fit the previous examples. + +```ruby +class MigrationName < Elastic::Migration + include Elastic::MigrationHelper + + def migrate + ... + end + + def completed? + ... + end +end +``` + +### Migration options supported by the `Elastic::MigrationWorker` + +[`Elastic::MigrationWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic/migration_worker.rb) supports the following migration options: + +- `batched!` - Allow the migration to run in batches. If set, [`Elastic::MigrationWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic/migration_worker.rb) + re-enqueues itself with a delay which is set using the `throttle_delay` option described below. The batching + must be handled in the `migrate` method. This setting controls the re-enqueuing only. + +- `batch_size` - Sets the number of documents modified during a `batched!` migration run. This size should be set to a value which allows the updates + enough time to finish. This can be tuned in combination with the `throttle_delay` option described below. The batching + must be handled in a custom `migrate` method or by using the [`Elastic::MigrationBackfillHelper`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/concerns/elastic/migration_backfill_helper.rb) + `migrate` method which uses this setting. Default value is 1000 documents. + +- `throttle_delay` - Sets the wait time in between batch runs. This time should be set high enough to allow each migration batch + enough time to finish. Additionally, the time should be less than 30 minutes because that is how often the + [`Elastic::MigrationWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic/migration_worker.rb) + cron worker runs. Default value is 5 minutes. + +- `pause_indexing!` - Pause indexing while the migration runs. This setting records the indexing setting before + the migration runs and set it back to that value when the migration is completed. + +- `space_requirements!` - Verify that enough free space is available in the cluster when the migration runs. This setting + halts the migration if the storage required is not available when the migration runs. The migration must provide + the space required in bytes by defining a `space_required_bytes` method. + +- `retry_on_failure` - Enable the retry on failure feature. By default, it retries + the migration 30 times. After it runs out of retries, the migration is marked as halted. + To customize the number of retries, pass the `max_attempts` argument: + `retry_on_failure max_attempts: 10` + +```ruby +# frozen_string_literal: true + +class BatchedMigrationName < Elastic::Migration + # Declares a migration should be run in batches + batched! + throttle_delay 10.minutes + pause_indexing! + space_requirements! + retry_on_failure + + # ... +end +``` + +### Multi-version compatibility + +These Advanced Search migrations, like any other GitLab changes, need to support the case where +[multiple versions of the application are running at the same time](../multi_version_compatibility.md). + +Depending on the order of deployment, it's possible that the migration +has started or finished and there's still a server running the application code from before the +migration. We need to take this into consideration until we can +[ensure all Advanced Search migrations start after the deployment has finished](https://gitlab.com/gitlab-org/gitlab/-/issues/321619). + +### Reverting a migration + +Because Elasticsearch does not support transactions, we always need to design our +migrations to accommodate a situation where the application +code is reverted after the migration has started or after it is finished. + +For this reason we generally defer destructive actions (for example, deletions after +some data is moved) to a later merge request after the migrations have +completed successfully. To be safe, for self-managed customers we should also +defer it to another release if there is risk of important data loss. + +### Best practices for Advanced Search migrations + +Follow these best practices for best results: + +- When working in batches, keep the batch size under 9,000 documents + and `throttle_delay` for at least 3 minutes. The bulk indexer is set to run + every 1 minute and process a batch of 10,000 documents. These limits + allow the bulk indexer time to process records before another migration + batch is attempted. +- To ensure that document counts are up to date, you should refresh + the index before checking if a migration is completed. +- Add logging statements to each migration when the migration starts, when a + completion check occurs, and when the migration is completed. These logs + are helpful when debugging issues with migrations. +- Pause indexing if you're using any Elasticsearch Reindex API operations. +- Consider adding a retry limit if there is potential for the migration to fail. + This ensures that migrations can be halted if an issue occurs. + +## Deleting Advanced Search migrations in a major version upgrade + +Because our Advanced Search migrations usually require us to support multiple +code paths for a long period of time, it's important to clean those up when we +safely can. + +We choose to use GitLab major version upgrades as a safe time to remove +backwards compatibility for indices that have not been fully migrated. We +[document this in our upgrade documentation](../../update/index.md#upgrading-to-a-new-major-version). +We also choose to replace the migration code with the halted migration +and remove tests so that: + +- We don't need to maintain any code that is called from our Advanced Search + migrations. +- We don't waste CI time running tests for migrations that we don't support + anymore. +- Operators who have not run this migration and who upgrade directly to the + target version see a message prompting them to reindex from scratch. + +To be extra safe, we do not delete migrations that were created in the last +minor version before the major upgrade. So, if we are upgrading to `%14.0`, +we should not delete migrations that were only added in `%13.12`. This +extra safety net allows for migrations that might +take multiple weeks to finish on GitLab.com. It would be bad if we upgraded +GitLab.com to `%14.0` before the migrations in `%13.12` were finished. Because +our deployments to GitLab.com are automated and we don't have +automated checks to prevent this, the extra precaution is warranted. +Additionally, even if we did have automated checks to prevent it, we wouldn't +actually want to hold up GitLab.com deployments on Advanced Search migrations, +as they may still have another week to go, and that's too long to block +deployments. + +### Process for removing migrations + +For every migration that was created 2 minor versions before the major version +being upgraded to, we do the following: + +1. Confirm the migration has actually completed successfully for GitLab.com. +1. Replace the content of the migration with: + + ```ruby + include Elastic::MigrationObsolete + ``` + +1. Delete any spec files to support this migration. +1. Remove any logic handling backwards compatibility for this migration. You + can find this by looking for + `Elastic::DataMigrationService.migration_has_finished?(:migration_name_in_lowercase)`. +1. Create a merge request with these changes. Noting that we should not + accidentally merge this before the major release is started. -- GitLab