From 8c7f09f1548f12898195d4c5b4a34cda77cc845b Mon Sep 17 00:00:00 2001
From: Thong Kuah <tkuah@gitlab.com>
Date: Wed, 21 Dec 2022 19:36:37 +0000
Subject: [PATCH] Describe impact of Pods on backups

Add cluster-metadata concerns
---
 doc/architecture/blueprints/pods/index.md     |  9 +++
 .../blueprints/pods/pods-feature-backups.md   | 61 +++++++++++++++++++
 2 files changed, 70 insertions(+)
 create mode 100644 doc/architecture/blueprints/pods/pods-feature-backups.md

diff --git a/doc/architecture/blueprints/pods/index.md b/doc/architecture/blueprints/pods/index.md
index 7f7725351d130..077303be30ffa 100644
--- a/doc/architecture/blueprints/pods/index.md
+++ b/doc/architecture/blueprints/pods/index.md
@@ -150,6 +150,14 @@ At this moment, GitLab.com has "social-network"-like capabilities that may not f
 
 We should evaluate if the SMB and mid market segment is interested in these features, or if not having them is acceptable in most cases.
 
+### Self-managed
+
+For reasons of consistency, it is expected that self-managed instances will
+adopt the pods architecture as well. To expand, self-managed instances can
+continue with just a single Pod while supporting the option of adding additional
+Pods. Organizations, and possibly User decomposition, will also be adopted for
+self-managed instances.
+
 ## High-level architecture problems to solve
 
 A number of technical issues need to be resolved to implement Pods (in no particular order). This section will be expanded.
@@ -325,6 +333,7 @@ This is the list of known affected features with the proposed solutions.
 - [Pods: Organizations](pods-feature-organizations.md)
 - [Pods: Router Endpoints Classification](pods-feature-router-endpoints-classification.md)
 - [Pods: Schema changes (Postgres and Elasticsearch migrations)](pods-feature-schema-changes.md)
+- [Pods: Backups](pods-feature-backups.md)
 - [Pods: Global Search](pods-feature-global-search.md)
 - [Pods: CI Runners](pods-feature-ci-runners.md)
 - [Pods: Admin Area](pods-feature-admin-area.md)
diff --git a/doc/architecture/blueprints/pods/pods-feature-backups.md b/doc/architecture/blueprints/pods/pods-feature-backups.md
new file mode 100644
index 0000000000000..5e4de42f47326
--- /dev/null
+++ b/doc/architecture/blueprints/pods/pods-feature-backups.md
@@ -0,0 +1,61 @@
+---
+stage: enablement
+group: pods
+comments: false
+description: 'Pods: Backups'
+---
+
+This document is a work-in-progress and represents a very early state of the
+Pods design. Significant aspects are not documented, though we expect to add
+them in the future. This is one possible architecture for Pods, and we intend to
+contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that
+we can document the reasons for not choosing this approach.
+
+# Pods: Backups
+
+Each pod will take its own backups, and consequently have its own isolated
+backup / restore procedure.
+
+## 1. Definition
+
+GitLab Backup takes a backup of the PostgreSQL database used by the application,
+and also Git repository data.
+
+## 2. Data flow
+
+Each pod has a number of application databases to back up (e.g. `main` and `ci`).
+
+Additionally, there may be cluster-wide metadata tables (e.g. the `users` table)
+which are directly accessible via PostgreSQL.
+
+## 3. Proposal
+
+### 3.1. Cluster-wide metadata
+
+It is currently unknown how cluster-wide metadata tables will be accessible. We
+may choose to have cluster-wide metadata tables backed up separately, or have
+each pod back up its copy of cluster-wide metadata tables.
+
+### 3.2. Consistency
+
+#### 3.2.1. Take backups independently
+
+As pods will communicate with each other via API, and there will be no joins
+to the users table, it should be acceptable for each pod to take a backup
+independently of the others.
+
+#### 3.2.2. Enforce snapshots
+
+We can require that each pod take a snapshot of its PostgreSQL databases at
+around the same time to allow for a consistent-enough backup.
+
+## 4. Evaluation
+
+As the number of pods increases, it will likely not be feasible to take a
+snapshot at the same time for all pods. Hence taking backups independently is
+the better option.
+
+### 4.1. Pros
+
+### 4.2. Cons
-- 
GitLab