From 3d61da1f43ce9b0766679bbf4a30d5d4d201b080 Mon Sep 17 00:00:00 2001
From: Christina Lohr <clohr@gitlab.com>
Date: Mon, 18 Sep 2023 13:38:51 +0000
Subject: [PATCH] Update impact of Cells on forking

---
 .../impacted_features/contributions-forks.md  | 137 +++++++++++++-----
 1 file changed, 97 insertions(+), 40 deletions(-)

diff --git a/doc/architecture/blueprints/cells/impacted_features/contributions-forks.md b/doc/architecture/blueprints/cells/impacted_features/contributions-forks.md
index cf32485e5c43..2053b87b1252 100644
--- a/doc/architecture/blueprints/cells/impacted_features/contributions-forks.md
+++ b/doc/architecture/blueprints/cells/impacted_features/contributions-forks.md
@@ -6,27 +6,25 @@ description: 'Cells: Contributions: Forks'
 
 <!-- vale gitlab.FutureTense = NO -->
 
-This document is a work-in-progress and represents a very early state of the
-Cells design. Significant aspects are not documented, though we expect to add
-them in the future. This is one possible architecture for Cells, and we intend to
-contrast this with alternatives before deciding which approach to implement.
-This documentation will be kept even if we decide not to implement this so that
-we can document the reasons for not choosing this approach.
+This document is a work-in-progress and represents a very early state of the Cells design.
+Significant aspects are not documented, though we expect to add them in the future.
+This is one possible architecture for Cells, and we intend to contrast this with alternatives before deciding which approach to implement.
+This documentation will be kept even if we decide not to implement this so that we can document the reasons for not choosing this approach.
 
 # Cells: Contributions: Forks
 
-The [Forking workflow](../../../../user/project/repository/forking_workflow.md) allows users to copy existing Project sources into their own namespace of choice (Personal or Group).
+The [forking workflow](../../../../user/project/repository/forking_workflow.md) allows users to copy existing Project sources into their own namespace of choice (personal or Group).
 
 ## 1. Definition
 
-The [Forking workflow](../../../../user/project/repository/forking_workflow.md) is a common workflow with various usage patterns:
+The [forking workflow](../../../../user/project/repository/forking_workflow.md) is a common workflow with various usage patterns:
 
-- It allows users to contribute back to upstream Project.
-- It persists repositories into their Personal Namespace.
-- Users can copy to make changes and release as modified Project.
+- It allows users to contribute back to an upstream Project.
+- It persists repositories into their personal namespace.
+- Users can copy a Project to make changes and release it as a modified Project.
 
 Forks allow users not having write access to a parent Project to make changes.
-The forking workflow is especially important for the open source community to contribute back to public Projects.
+The forking workflow is especially important for the open-source community to contribute back to public Projects.
 However, it is equally important in some companies that prefer a strong split of responsibilities and tighter access control.
 The access to a Project is restricted to a designated list of developers.
 
@@ -40,14 +38,45 @@ Forks enable:
 The forking model is problematic in a Cells architecture for the following reasons:
 
 - Forks are clones of existing repositories. Forks could be created across different Organizations, Cells and Gitaly shards.
-- Users can create merge requests and contribute back to an upstream Project. This upstream Project might in a different Organization and Cell.
+- Users can create merge requests and contribute back to an upstream Project. This upstream Project might be in a different Organization and Cell.
 - The merge request CI pipeline is executed in the context of the source Project, but presented in the context of the target Project.
 
-## 2. Data flow
+## 2. Data exploration
 
-## 3. Proposals
+From a [data exploration](https://gitlab.com/gitlab-data/product-analytics/-/issues/1380), we retrieved the following information about existing forks:
 
-### 3.1. Intra-Cluster forks
+- Roughly 1.8 million forks exist on GitLab.com at the moment.
+- The majority of forks are under a personal namespace (82%).
+- We expected minimal use of forks within the same top-level Group or organization. Forking is only necessary for users who don't have permissions to access a Project. Inside companies, we wouldn't expect teams to use forking workflows much unless team members, for some reason, have different permissions. The data showed that only 9% of fork relationships have matching ultimate parent namespace identifiers (top-level Groups and personal namespaces). The other 91% of fork relationships are forked across different top-level namespaces. When trying to match top-level Groups to an identifiable company, we saw that:
+  - 3% of forked Projects are forked from an upstream Project in the same organization.
+  - 83% of forked Projects do not have an identifiable organization related to either the upstream or the downstream Project.
+  - The remaining 14% are forked from a source Project within a different company.
+- 9% of top-level Groups (95k) with activity in the last 12 months have a project with a fork relationship, compared to 5% of top-level Groups (91k) with no activity in the last 12 months. We expect these top-level Groups to be impacted by Cells.
+
+## 3. Proposal - Forks are created in a dedicated contribution space of the current Organization
+
+Instead of creating Projects across Organizations, forks are created in a contribution space tied to the Organization.
+A contribution space is similar to a personal namespace, but rather than existing in the default Organization, it exists within the Organization someone is trying to contribute to.
+Example:
+
+- Any User that can view an Organization (all Users for public Organizations) can create a contribution space in the Organization. This is a dedicated namespace where they can create forks of Projects in that Organization. For example, for `Produce Inc.` it could be `gitlab.com/organization/produce-inc/@ayufan`.
+- To create a contribution space, we do not require membership of an Organization, as this would prevent open-source workflows where contributors are able to fork and create a merge request without ever being invited to a Group or Project. We strictly respect visibility, so Users would not be able to create a fork in a private Organization without first being invited.
+- When creating a fork for a Project, Users will only be presented with the option to create forks in Groups that are part of the Organization. We will also give Users the option to create a contribution space and put the fork there. Today there is also a "Create a group" option when creating a fork. This functionality would also be limited to creating a new Group in the Organization to store the new fork.
+- To support Users that want to fork without contributing back, we might consider an option to create [an unlinked fork](../../../../user/project/repository/forking_workflow.md#unlink-a-fork) in any namespace they have permission to write to.
+- The User has as many contribution spaces as Organizations they contribute to.
+- The User cannot create additional personal Projects within contribution spaces. Personal Projects can continue to be created in their personal namespace.
+- The Organization can prevent or disable usage of contribution spaces. This would disable forking by anyone that does not belong to a Group within the Organization.
+- All current forks are migrated into the contribution space of the User in an Organization. Because this may result in data loss when the fork also has links to data outside of the upstream Project, we will also keep the personal Project around as archived and remove the fork relationship.
+- All forks are part of the Organization.
+- Forks are not federated features.
+- The contribution space and forked Project do not share configuration with the parent Project.
+- If the Organization is deleted, either the Projects containing forks will be moved to the default Organization, or we'll create a new Organization to house them, which is essentially a ghost Organization of the former Organization.
+- Data in contribution spaces does not count toward customer usage from a billing perspective.
+- Today we do not have Organization-scoped runners, but if we do implement them, they will likely need special settings for how or if they can be used by contribution space Projects.
+
+## 4. Alternative proposals considered
+
+### 4.1. Intra-cluster forks
 
 This proposal implements forks as intra-Cluster forks where communication is done via API between all trusted Cells of a cluster:
 
@@ -59,48 +88,76 @@ This proposal implements forks as intra-Cluster forks where communication is don
 - CI pipeline is fetched in the context of the source Project as it is today, the result is fetched into the merge request of the target Project.
 - The Cell holding the target Project internally uses GraphQL to fetch the status of the source Project and includes in context of the information for merge request.
 
-Upsides:
+Pros:
 
 - All existing forks continue to work as they are, as they are treated as intra-Cluster forks.
 
-Downsides:
+Cons:
 
 - The purpose of Organizations is to provide strong isolation between Organizations. Allowing to fork across does break security boundaries.
 - However, this is no different to the ability of users today to clone a repository to a local computer and push it to any repository of choice.
-- Access control of source Project can be lower than those of target Project. Today, the system requires that in order to contribute back, the access level needs to be the same for fork and upstream.
-
-### 3.2. Forks are created in a Personal Namespace of the current Organization
-
-Instead of creating Projects across Organizations, forks are created in a user's Personal Namespace tied to the Organization. Example:
-
-- Each user that is part of an Organization receives their Personal Namespace. For example for `GitLab Inc.` it could be `gitlab.com/organization/gitlab-inc/@ayufan`.
-- The user has to fork into their own Personal Namespace of the Organization.
-- The user has as many Personal Namespaces as Organizations they belongs to.
-- The Personal Namespace behaves similar to the currently offered Personal Namespace.
-- The user can manage and create Projects within a Personal Namespace.
-- The Organization can prevent or disable usage of Personal Namespaces, disallowing forks.
-- All current forks are migrated into the Personal Namespace of user in an Organization.
-- All forks are part of the Organization.
-- Forks are not federated features.
-- The Personal Namespace and forked Project do not share configuration with the parent Project.
+- Access control of the source Project can be lower than that of the target Project. Today, the system requires that in order to contribute back, the access level needs to be the same for fork and upstream Project.
 
-### 3.3. Forks are created as internal Projects under current Projects
+### 4.2. Forks are created as internal Projects under current Projects
 
 Instead of creating Projects across Organizations, forks are attachments to existing Projects.
-Each user forking a Project receives their unique Project. Example:
+Each user forking a Project receives their unique Project.
+Example:
 
 - For Project: `gitlab.com/gitlab-org/gitlab`, forks would be created in `gitlab.com/gitlab-org/gitlab/@kamil-gitlab`.
 - Forks are created in the context of the current Organization, they do not cross Organization boundaries and are managed by the Organization.
 - Tied to the user (or any other user-provided name of the fork).
 - Forks are not federated features.
 
-Downsides:
+Cons:
 
 - Does not answer how to handle and migrate all existing forks.
 - Might share current Group/Project settings, which could be breaking some security boundaries.
 
-## 4. Evaluation
+## 5. Evaluation
+
+### 5.1. Pros
+
+### 5.2. Cons
+
+## 6. Example
+
+As an example, we will demonstrate the impact of this proposal for the case that we move `gitlab-org/gitlab` to a different Organization.
+`gitlab-org/gitlab` has [over 8K forks](https://gitlab.com/gitlab-org/gitlab/-/forks).
+
+### Does this direction impact the canonical URLs of those forks?
+
+Yes, canonical URLs will change for forks.
+Existing users that have forks in personal namespaces and want to continue contributing merge requests will be required to migrate their fork to a new fork in a contribution space.
+For example, a personal namespace fork at `https://gitlab.com/DylanGriffith/gitlab` will
+need to be migrated to `https://gitlab.com/-/contributions/gitlab-inc/@DylanGriffith/gitlab`.
+We may offer automated ways to move this, but manually the process would involve:
+
+1. Create the contribution space fork.
+1. Push your local branch from your original fork to the new fork.
+1. Recreate any merge request that was still open and that you wanted to merge.
+
+### Does it impact the Git URL of the repositories themselves?
+
+Yes.
+In the example above, the Git URL would change from
+`gitlab.com:DylanGriffith/gitlab.git` to `gitlab.com:/-/contributions/gitlab-inc/@DylanGriffith/gitlab.git`.
+
+### Would there be any user action required to accept their fork being moved within an Organization or towards a contribution space?
+
+If we offer an automated process, we'd present this as an option for the user, as they will become the new owner of the contribution space.
+
+### Can we make promises that we will not break the existing forks of public Projects hosted on GitLab.com?
 
-## 4.1. Pros
+Existing fork Projects will not be deleted, but their fork relationship will be
+removed when the source Project is moved to another Organization.
+The owner of the open-source Project will be made aware that they will disconnect their
+forks when they move the Project, which will require them to close all existing
+merge requests from those forks.
+There will need to be some process for keeping the history from these merge requests while effectively losing the ability to
+collaborate on them or merge them.
 
-## 4.2. Cons
+In the case of `gitlab-org/gitlab`, we will attempt to give as much notice of this process as possible and make this process as transparent as possible.
+When we make the decision to move this Project to an Organization, we will seek additional
+feedback about what would be the minimum amount of automated migrations necessary to be acceptable here.
+However, the workflow for contributors will change after the move, so this will be a punctuated event regardless.
-- 
GitLab