diff --git a/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md index dce9883be4b624bf79b63f21ffef51fa297f3e18..942195879ca64fab43904c1d405e740c9d98f083 100644 --- a/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md +++ b/doc/architecture/blueprints/ci_data_decay/pipeline_partitioning.md @@ -323,9 +323,116 @@ scope block takes an argument). Preloading instance dependent scopes is not supported. ``` -We also need to build a proof of concept for removing data on the PostgreSQL -side (using foreign keys with `ON DELETE CASCADE`) and removing data through -Rails associations, as this might be an important area of uncertainty. +### Foreign keys + +Foreign keys must reference columns that either are a primary key or form a +unique constraint. We can define them using these strategies: + +#### Between routing tables sharing partition ID + +For relations that are part of the same pipeline hierarchy it is possible to +share the `partition_id` column to define the foreign key constraint: + +```plaintext +p_ci_pipelines: + - id + - partition_id + +p_ci_builds: + - id + - partition_id + - pipeline_id +``` + +In this case, `p_ci_builds.partition_id` indicates the partition for the build +and also for the pipeline. We can add a FK on the routing table using: + +```sql +ALTER TABLE ONLY p_ci_builds + ADD CONSTRAINT fk_on_pipeline_and_partition + FOREIGN KEY (pipeline_id, partition_id) + REFERENCES p_ci_pipelines(id, partition_id) ON DELETE CASCADE; +``` + +#### Between routing tables with different partition IDs + +It's not possible to reuse the `partition_id` for all relations in the CI domain, +so in this case we'll need to store the value as a different attribute. For +example, when canceling redundant pipelines we store on the old pipeline row +the ID of the new pipeline that cancelled it as `auto_canceled_by_id`: + +```plaintext +p_ci_pipelines: + - id + - partition_id + - auto_canceled_by_id + - auto_canceled_by_partition_id +``` + +In this case we can't ensure that the canceling pipeline is part of the same +hierarchy as the canceled pipelines, so we need an extra attribute to store its +partition, `auto_canceled_by_partition_id`, and the FK becomes: + +```sql +ALTER TABLE ONLY p_ci_pipelines + ADD CONSTRAINT fk_cancel_redundant_pieplines + FOREIGN KEY (auto_canceled_by_id, auto_canceled_by_partition_id) + REFERENCES p_ci_pipelines(id, partition_id) ON DELETE SET NULL; +``` + +#### Between routing tables and regular tables + +Not all of the tables in the CI domain will be partitioned, so we'll have routing +tables that will reference non-partitioned tables, for example we reference +`external_pull_requests` from `ci_pipelines`: + +```sql +FOREIGN KEY (external_pull_request_id) +REFERENCES external_pull_requests(id) +ON DELETE SET NULL +``` + +In this case we only need to move the FK definition from the partition level +to the routing table so that new pipeline partitions may use it: + +```sql +ALTER TABLE p_ci_pipelines + ADD CONSTRAINT fk_external_request + FOREIGN KEY (external_pull_request_id) + REFERENCES external_pull_requests(id) ON DELETE SET NULL; +``` + +#### Between regular tables and routing tables + +Most of the tables from the CI domain reference at least one table that will be +turned into a routing tables, for example `ci_pipeline_messages` references +`ci_pipelines`. These definitions will need to be updated to use the routing +tables and for this they will need a `partition_id` column: + +```plaintext +p_ci_pipelines: + - id + - partition_id + +ci_pipeline_messages: + - id + - pipeline_id + - pipeline_partition_id +``` + +The foreign key can be defined by using: + +```sql +ALTER TABLE ci_pipeline_messages ADD CONSTRAINT fk_pipeline_partitioned + FOREIGN KEY (pipeline_id, pipeline_partition_id) + REFERENCES p_ci_pipelines(id, partition_id) ON DELETE CASCADE; +``` + +The old FK definition will need to be removed, otherwise new inserts in the +`ci_pipeline_messages` with pipeline IDs from non-zero partition will fail with +reference errors. + +### Indexes We [learned](https://gitlab.com/gitlab-org/gitlab/-/issues/360148) that `PostgreSQL` does not allow to create a single index (unique or otherwise) across all partitions of a table.