@@ -23,25 +23,23 @@ contribution model in a cellular architecture.
...
@@ -23,25 +23,23 @@ contribution model in a cellular architecture.
Cells 1.0 is meant to target enterprise customers that have the following expectations:
Cells 1.0 is meant to target enterprise customers that have the following expectations:
1. They want to use our multi-tenant SaaS solution GitLab.com to serve their Organization.
1. They want to use our multi-tenant SaaS solution (GitLab.com) to serve their Organization.
1. They may receive updates later than the rest of GitLab.com.
1. They accept that they may receive updates later than the rest of GitLab.com.
1. They want use environment with higher degree of isolation to rest of the system.
1. They want to use an environment with a higher degree of isolation from the rest of the system.
1. They want to control all users that contribute to their Organization.
1. They want to control all users that contribute to their Organization.
1. Their groups and projects are meant to be private.
1. Their groups and projects are meant to be private.
1. Their users don't need to interact with many Organizations, or contribute to public projects with their account.
1. Their users don't need to interact with other Organizations, or contribute to public projects with their account.
For example, these authenticated users would receive a 404 if they navigated to any public project like `gitlab.com/gitlab-org/gitlab`.
1. Are OK with being unable to switch Organizations with their account.
1. Are OK with not being able to switch Organizations with their account.
From a development and infrastructure perspective we want to achieve the following goals:
From a development and infrastructure perspective we want to achieve the following goals:
1. All Cells are accessible under a single domain.
1. All Cells are accessible under a single domain.
1. Cells are mostly independent with minimal data sharing. All stateful data is segregated, and minimal data sharing is needed initially. This includes any database and cloud storage buckets.
1. Cells are mostly independent with minimal data sharing. All stateful data is segregated, and minimal data sharing is needed initially. This includes any database and cloud storage buckets.
1. Cells needs to be able to run independently with different versions.
1. Cells need to be able to run independently with different versions.
1. The proposed architecture allows us to achieve a cluster-wide data sharing later.
1. An architecture that allows for eventual cluster-wide data sharing.
1. We have a lightweight routing solution that is robust, but simple.
1. A routing solution that is robust, but simple.
1. All identifiers (primary keys, user, group and project names) are unique across the cluster, so that we can perform logical re-balancing at a later time. This includes all database tables, except ones using schemas `gitlab_internal`, or `gitlab_shared`.
1. All identifiers (primary keys, user, group, and project names) are unique across the cluster, so that we can perform logical re-balancing at a later time. This includes all database tables, except ones using schemas `gitlab_internal`, or `gitlab_shared`.
1. Since all users and groups are unique across the cluster, the same user will be able to access other Organizations and groups at GitLab.com in [Cells 2.0](cells-2.0.md).
1. Because all users and groups are unique across the cluster, the same user can access other Organizations and groups at GitLab.com in [Cells 2.0](cells-2.0.md).
1. The overhead of managing and upgrading Cells is minimal and similar to managing a GitLab Dedicated instance. Secondary Cells should not be a linear increase in operational burden.
1. The overhead of managing and upgrading Cells is minimal and similar to managing a GitLab Dedicated instance. Secondary Cells should not be a linear increase in operational burden.
1. The Cell should be deployed using the same tooling as GitLab Dedicated.
1. The Cell should be deployed using the same tooling as GitLab Dedicated.
...
@@ -108,7 +106,7 @@ The following statements describe a low-level development proposal to achieve th
...
@@ -108,7 +106,7 @@ The following statements describe a low-level development proposal to achieve th
1. The routing service is implemented as a Cloudflare Worker and is run on edge. The routing service is run with a static list of Cells. Each Cell is described by a proxy URL, and a prefix.
1. The routing service is implemented as a Cloudflare Worker and is run on edge. The routing service is run with a static list of Cells. Each Cell is described by a proxy URL, and a prefix.
1. Cells are exposed over the public internet, but might be guarded with Zero Trust.
1. Cells are exposed over the public internet, but might be guarded with Zero Trust.
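As a rough illustration of the routing statements above, a prefix-based lookup over a static list of Cells could be sketched as follows. This is a minimal Python sketch of the logic only, not the Cloudflare Worker itself. The `Cell` type, the `route` function, the `_` separator between prefix and secret, the example URLs, and the fallback to the first Cell in the list are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    proxy_url: str  # where requests for this Cell are proxied
    prefix: str     # prefix carried by secrets issued by this Cell


# Static list of Cells (hypothetical values).
# The first entry is treated as the default Cell in this sketch.
CELLS: List[Cell] = [
    Cell(proxy_url="https://cell1.example.com", prefix="cell1"),
    Cell(proxy_url="https://cell2.example.com", prefix="kPptz"),
]


def route(secret: Optional[str], cells: List[Cell] = CELLS) -> Cell:
    """Pick a Cell by the prefix of a secret; fall back to the default Cell."""
    if secret:
        for cell in cells:
            # Assumed format: "<prefix>_<random part>"
            if secret.startswith(cell.prefix + "_"):
                return cell
    return cells[0]
```

Because the list is static and the decision depends only on the prefix, this kind of lookup needs no call to any Cell at request time.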
### Overview (Architecture)
### Architecture overview
```plantuml
```plantuml
@startuml
@startuml
...
@@ -125,21 +123,21 @@ node "GitLab Inc. Infrastructure" {
...
@@ -125,21 +123,21 @@ node "GitLab Inc. Infrastructure" {
}
}
[Container Registry] as PC_Registry
[Container Registry] as PC_Registry
database DB as PC_DB {
database DB as PC_DB {
frame "PostgreSQL Cluster" as PC_PSQL {
frame "PostgreSQL Cluster" as PC_PSQL {
package "ci" as PC_PSQL_ci {
package "ci" as PC_PSQL_ci {
[gitlab_ci] as PC_PSQL_gitlab_ci
[gitlab_ci] as PC_PSQL_gitlab_ci
}
}
package "main" as PC_PSQL_main {
package "main" as PC_PSQL_main {
[gitlab_main_clusterwide] as PC_PSQL_gitlab_main_clusterwide
[gitlab_main_clusterwide] as PC_PSQL_gitlab_main_clusterwide
[gitlab_main_cell] as PC_PSQL_gitlab_main_cell
[gitlab_main_cell] as PC_PSQL_gitlab_main_cell
}
}
PC_PSQL_main -[hidden]-> PC_PSQL_ci
PC_PSQL_main -[hidden]-> PC_PSQL_ci
}
}
frame "Redis Cluster" as PC_Redis {
frame "Redis Cluster" as PC_Redis {
[Redis (many)] as PC_Redis_many
[Redis (many)] as PC_Redis_many
}
}
...
@@ -148,10 +146,10 @@ node "GitLab Inc. Infrastructure" {
...
@@ -148,10 +146,10 @@ node "GitLab Inc. Infrastructure" {
[Gitaly Nodes (many)] as PC_Gitaly_many
[Gitaly Nodes (many)] as PC_Gitaly_many
}
}
}
}
PC_Rails -[hidden]-> PC_DB
PC_Rails -[hidden]-> PC_DB
}
}
node "Secondary Cell" as SC {
node "Secondary Cell" as SC {
frame "GitLab Rails" as SC_Rails {
frame "GitLab Rails" as SC_Rails {
[Puma + Workhorse + LB] as SC_Puma
[Puma + Workhorse + LB] as SC_Puma
...
@@ -159,21 +157,21 @@ node "GitLab Inc. Infrastructure" {
...
@@ -159,21 +157,21 @@ node "GitLab Inc. Infrastructure" {
}
}
[Container Registry] as SC_Registry
[Container Registry] as SC_Registry
database DB as SC_DB {
database DB as SC_DB {
frame "PostgreSQL Cluster" as SC_PSQL {
frame "PostgreSQL Cluster" as SC_PSQL {
package "ci" as SC_PSQL_ci {
package "ci" as SC_PSQL_ci {
[gitlab_ci] as SC_PSQL_gitlab_ci
[gitlab_ci] as SC_PSQL_gitlab_ci
}
}
package "main" as SC_PSQL_main {
package "main" as SC_PSQL_main {
[gitlab_main_clusterwide] as SC_PSQL_gitlab_main_clusterwide
[gitlab_main_clusterwide] as SC_PSQL_gitlab_main_clusterwide
[gitlab_main_cell] as SC_PSQL_gitlab_main_cell
[gitlab_main_cell] as SC_PSQL_gitlab_main_cell
}
}
SC_PSQL_main -[hidden]-> SC_PSQL_ci
SC_PSQL_main -[hidden]-> SC_PSQL_ci
}
}
frame "Redis Cluster" as SC_Redis {
frame "Redis Cluster" as SC_Redis {
[Redis (many)] as SC_Redis_many
[Redis (many)] as SC_Redis_many
}
}
...
@@ -182,7 +180,7 @@ node "GitLab Inc. Infrastructure" {
...
@@ -182,7 +180,7 @@ node "GitLab Inc. Infrastructure" {
[Gitaly Nodes (many)] as SC_Gitaly_many
[Gitaly Nodes (many)] as SC_Gitaly_many
}
}
}
}
SC_Rails -[hidden]-> SC_DB
SC_Rails -[hidden]-> SC_DB
}
}
}
}
...
@@ -195,7 +193,7 @@ CF_RSW --> SC_Registry
...
@@ -195,7 +193,7 @@ CF_RSW --> SC_Registry
@enduml
@enduml
```
```
### Overview (API)
### API overview
```plantuml
```plantuml
@startuml
@startuml
...
@@ -209,7 +207,7 @@ node "GitLab Inc. Infrastructure" {
...
@@ -209,7 +207,7 @@ node "GitLab Inc. Infrastructure" {
[Sidekiq] as PC_Sidekiq
[Sidekiq] as PC_Sidekiq
}
}
}
}
node "Secondary Cell" as SC {
node "Secondary Cell" as SC {
frame SC_Rails [
frame SC_Rails [
{{
{{
...
@@ -235,7 +233,7 @@ The following technical problems have to be addressed:
...
@@ -235,7 +233,7 @@ The following technical problems have to be addressed:
### GitLab Configuration
### GitLab Configuration
The GitLab configuration in `gitlab.yml` will be extended with the following parameters to:
The GitLab configuration in `gitlab.yml` is extended with the following parameters to:
```yaml
```yaml
production:
production:
...
@@ -246,7 +244,7 @@ production:
...
@@ -246,7 +244,7 @@ production:
secrets_prefix: kPptz
secrets_prefix: kPptz
```
```
1. `primary_cell:` will be configured on Secondary Cells, and will indicate the URL endpoint to access the Primary Cell API.
1. `primary_cell:` is configured on Secondary Cells, and indicates the URL endpoint to access the Primary Cell API.
1. `secrets_prefix:` can be used on all Cells, and indicates that each secret and session cookie is prefixed with this identifier.
1. `secrets_prefix:` can be used on all Cells, and indicates that each secret and session cookie is prefixed with this identifier.
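To make the effect of `secrets_prefix:` more concrete, the following minimal Python sketch shows one way a secret could carry the prefix. The function names, the `_` separator, and the secret length are hypothetical and are not taken from the proposal. The sketch also assumes the prefix itself contains no `_`.

```python
import secrets


def generate_prefixed_secret(secrets_prefix: str, nbytes: int = 20) -> str:
    """Return a random URL-safe secret that starts with the Cell's prefix."""
    return f"{secrets_prefix}_{secrets.token_urlsafe(nbytes)}"


def prefix_of(secret: str) -> str:
    """Extract the prefix (text before the first "_") from a secret."""
    return secret.split("_", 1)[0]
```

With the example value above, `generate_prefixed_secret("kPptz")` yields a secret that starts with `kPptz_`, and `prefix_of` recovers `kPptz` from it.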
### Primary Cell
### Primary Cell
...
@@ -317,11 +315,11 @@ The API is considered internal, and is guarded with a secret that is shared with
...
@@ -317,11 +315,11 @@ The API is considered internal, and is guarded with a secret that is shared with
The Secondary Cell does not expose any specific API at this point.
The Secondary Cell does not expose any specific API at this point.
The Secondary Cell implements a solution to guarantee uniqueness of primary database keys.
The Secondary Cell implements a solution to guarantee uniqueness of primary database keys.
1. `ReplenishDatabaseSequencesWorker`: this worker will run periodically, check all sequences, and replenish them.
1. `ReplenishDatabaseSequencesWorker`: this worker runs periodically, checks all sequences, and replenishes them.
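The periodic check described above could look roughly like the sketch below. It is written in Python for brevity and is not the worker's actual implementation. The `SequenceState` type, the `min_remaining` threshold, and the `claim_range` callable (standing in for a request to the Primary Cell) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SequenceState:
    next_value: int  # next ID the sequence would hand out
    range_end: int   # last ID of the currently claimed range (inclusive)


def replenish(
    sequences: Dict[str, SequenceState],
    claim_range: Callable[[str], Tuple[int, int]],
    min_remaining: int = 1000,
) -> List[str]:
    """Claim a new range for every sequence that is running low.

    `claim_range` is a hypothetical stand-in for a call to the Primary Cell.
    It returns an inclusive (start, end) range that no other Cell holds.
    """
    replenished = []
    for name, state in sequences.items():
        remaining = state.range_end - state.next_value + 1
        if remaining < min_remaining:
            start, end = claim_range(name)
            # Simplification: jump straight to the new range and
            # discard any IDs left over in the old one.
            state.next_value, state.range_end = start, end
            replenished.append(name)
    return replenished
```

Discarding leftover IDs keeps the sketch short. A real implementation might instead queue the next range and switch only when the current one is exhausted.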
#### Simple uniqueness of Database Sequences
#### Simple uniqueness of Database Sequences
Currently our DDL schema uses ID generation in the form: `id bigint DEFAULT nextval('product_analytics_events_experimental_id_seq'::regclass) NOT NULL`.
Our DDL schema uses ID generation in the form: `id bigint DEFAULT nextval('product_analytics_events_experimental_id_seq'::regclass) NOT NULL`.
The `/api/v4/internal/cells/database/claim` would execute the following sequence to claim a range:
The `/api/v4/internal/cells/database/claim` would execute the following sequence to claim a range:
...
@@ -410,7 +408,7 @@ $$;
...
@@ -410,7 +408,7 @@ $$;
- Secondary Cells would not be able to enforce unique constraints: create group, project, or user.
- Secondary Cells would not be able to enforce unique constraints: create group, project, or user.
- Other functionality of Secondary Cells would continue working as is: push, run CI.
- Other functionality of Secondary Cells would continue working as is: push, run CI.
- The routing layer makes this service very simple, because it is secret-based and uses prefix.
- The routing layer makes this service very simple, because it is secret-based and uses prefix.
- Reliability of the service is not dependent on Cell availability, since at this stage no dynamic classification is required.
- Reliability of the service is not dependent on Cell availability, because at this stage no dynamic classification is required.
- We anticipate that the routing layer will evolve to perform regular classification at a later point.
- We anticipate that the routing layer will evolve to perform regular classification at a later point.
- Mixed-deployment compatible by design.
- Mixed-deployment compatible by design.
- We do not share database connections. We expose APIs to interact with cluster-wide data.
- We do not share database connections. We expose APIs to interact with cluster-wide data.
...
@@ -463,14 +461,14 @@ We would have to ensure that the JWT token signed by GitLab is in a form that ca
...
@@ -463,14 +461,14 @@ We would have to ensure that the JWT token signed by GitLab is in a form that ca
If GitLab Pages are meant to support the `.gitlab.io` domain:
If GitLab Pages are meant to support the `.gitlab.io` domain:
- GitLab Pages need to be run as a single service that is not run as part of a Cell.
- GitLab Pages need to be run as a single service that is not run as part of a Cell.
- Since GitLab Pages use the API we need to make them routable.
- Because GitLab Pages use the API, we need to make them routable.
- Similar to `routes`, claim `pages_domain` on the Primary Cell
- Similar to `routes`, claim `pages_domain` on the Primary Cell
- Implement dynamic classification in the routing service, based on a sharding key.
- Implement dynamic classification in the routing service, based on a sharding key.
- Cons: This adds another table that has to be kept unique cluster-wide.
- Cons: This adds another table that has to be kept unique cluster-wide.
Alternatively:
Alternatively:
- Run GitLab Pages within a Cell, but provide separate a domain.
- Run GitLab Pages in a Cell, but provide a separate domain.
- Custom domains would use the separate domain.
- Custom domains would use the separate domain.
- Cons: This creates a problem with having to manage a domain per Cell.
- Cons: This creates a problem with having to manage a domain per Cell.
- Cons: We expose Cells to users.
- Cons: We expose Cells to users.
...
@@ -541,7 +539,7 @@ We would have to ensure that the JWT token signed by GitLab is in a form that ca
...
@@ -541,7 +539,7 @@ We would have to ensure that the JWT token signed by GitLab is in a form that ca
that we would migrate.
that we would migrate.
- Making all existing routes routable would be a significant effort to fix
- Making all existing routes routable would be a significant effort to fix
for routes like `/-/autocomplete/users` and likely a multi-year effort. Some preliminary analysis
for routes like `/-/autocomplete/users` and likely a multi-year effort. Some preliminary analysis
how many routes are already classified can be found [here](https://gitlab.com/gitlab-org/gitlab/-/issues/430330#note_1633125914).
of how many routes are already classified can be found [in this comment](https://gitlab.com/gitlab-org/gitlab/-/issues/430330#note_1633125914).
- In each case the routing service needs to be able to dynamically classify existing routes
- In each case the routing service needs to be able to dynamically classify existing routes
based on some defined criteria, requiring significant development effort, and increasing the
based on some defined criteria, requiring significant development effort, and increasing the
dependency on the Primary Cell or Cluster-wide service.
dependency on the Primary Cell or Cluster-wide service.
...
@@ -573,7 +571,7 @@ We would have to ensure that the JWT token signed by GitLab is in a form that ca
...
@@ -573,7 +571,7 @@ We would have to ensure that the JWT token signed by GitLab is in a form that ca
1. How would we synchronize `users` across Cells?
1. How would we synchronize `users` across Cells?
We build out-of-band replication of tables marked as `main_clusterwide`. We have yet to define
We build out-of-band replication of tables marked as `main_clusterwide`. We have yet to define
if this would be better to do via an `API` that is part of Rails, or using the Dedicated service.
if this would be better to do with an `API` that is part of Rails, or using the Dedicated service.
However, using Rails would likely be the simplest and most reliable solution, because the
However, using Rails would likely be the simplest and most reliable solution, because the
application knows the expected data structure.
application knows the expected data structure.
...
@@ -600,7 +598,7 @@ We would have to ensure that the JWT token signed by GitLab is in a form that ca
...
@@ -600,7 +598,7 @@ We would have to ensure that the JWT token signed by GitLab is in a form that ca
1. Run the Primary Cell API as an additional node that has a dedicated database replica and can work while the main Cell is down.
1. Run the Primary Cell API as an additional node that has a dedicated database replica and can work while the main Cell is down.
1. Implement a Primary Cell API on top of another highly available database in a different technology than Rails. Forward the API write calls to this storage (claim), but make read API calls (classify) to use this storage.
1. Implement a Primary Cell API on top of another highly available database in a different technology than Rails. Forward the API write calls to this storage (claim), but make read API calls (classify) to use this storage.
1. How will instancewide CI runners be configured on the new cells?
1. How can instance-wide CI runners be configured on the new cells?