From 3fd605fb5f3542b1517ff9d9c014dcedd7d76734 Mon Sep 17 00:00:00 2001
From: John Cai <jcai@gitlab.com>
Date: Thu, 5 May 2022 23:52:51 +0000
Subject: [PATCH] Gitaly docs: Add Cgroups

Cgroups in Gitaly has not been documented. Add documentation for
configuration, as well as the metrics for observability.
---
 doc/administration/gitaly/configure_gitaly.md | 132 ++++++++++++++++++
 doc/administration/gitaly/monitoring.md       |  11 ++
 2 files changed, 143 insertions(+)

diff --git a/doc/administration/gitaly/configure_gitaly.md b/doc/administration/gitaly/configure_gitaly.md
index e128682af754f..8c78423237151 100644
--- a/doc/administration/gitaly/configure_gitaly.md
+++ b/doc/administration/gitaly/configure_gitaly.md
@@ -817,6 +817,138 @@ repository. In the example above:
 You can observe the behavior of this queue using the Gitaly logs and Prometheus. For more
 information, see the [relevant documentation](monitoring.md#monitor-gitaly-concurrency-limiting).
 
+## Control groups
+
+> - Introduced in GitLab 13.10.
+> - New version of the configuration was introduced in GitLab 15.0.
+
+Gitaly shells out to Git for many of its operations. Git can consume a lot of resources for certain operations,
+especially for large repositories.
+
+Control groups (cgroups) in Linux allow limits to be imposed on how much memory and CPU can be consumed.
+See the [`cgroups` Linux man page](https://man7.org/linux/man-pages/man7/cgroups.7.html) for more information.
+cgroups can be useful for protecting the system against resource exhaustion because of overconsumption of memory and CPU.
+
+Gitaly has built-in cgroups control. When configured, Gitaly assigns Git
+processes to a cgroup based on the repository the Git command is operating in.
+Each cgroup has a memory and CPU limit. When a cgroup reaches its:
+
+- Memory limit, the kernel looks through the processes for a candidate to kill.
+- CPU limit, processes are not killed, but the processes are prevented from consuming more CPU than allowed.
+
+The main reason to configure cgroups for your GitLab installation is that it
+protects against system resource starvation due to a few large repositories or
+bad actors.
+
+Some Git operations are expensive by nature. `git clone`, for instance,
+spawns a `git-upload-pack` process on the server that can consume a lot of memory
+for large repositories. For example, a client might keep cloning a
+large repository over and over again. This situation could potentially use up all of the
+memory on a server, causing other operations to fail for other users.
+
+There are many ways someone can create a repository that can consume large amounts of memory when cloned or downloaded.
+Using cgroups allows the kernel to kill these operations before they hog all system resources.
+
+### Configure cgroups in Gitaly
+
+How you configure cgroups in Gitaly depends on what version of GitLab you use.
+
+#### GitLab 13.10 to GitLab 14.10
+
+To configure cgroups in Gitaly for GitLab versions 13.10 to 14.10, add the `gitaly['cgroups_*']` settings to `/etc/gitlab/gitlab.rb`. For
+example:
+
+```ruby
+# in /etc/gitlab/gitlab.rb
+gitaly['cgroups_count'] = 1000
+gitaly['cgroups_mountpoint'] = "/sys/fs/cgroup"
+gitaly['cgroups_hierarchy_root'] = "gitaly"
+gitaly['cgroups_memory_limit'] = 32212254720
+gitaly['cgroups_memory_enabled'] = true
+gitaly['cgroups_cpu_shares'] = 1024
+gitaly['cgroups_cpu_enabled'] = true
+
+```
+
+- `cgroups_count` is the number of cgroups created. Each time a new
+  command is spawned, Gitaly assigns it to one of these cgroups based
+  on the command line arguments of the command. A circular hashing algorithm assigns
+  commands to these cgroups.
+- `cgroups_mountpoint` is where the parent cgroup directory is mounted. Defaults to `/sys/fs/cgroup`.
+- `cgroups_hierarchy_root` is the parent cgroup under which Gitaly creates groups, and
+  is expected to be owned by the user and group Gitaly runs as.
+  Omnibus GitLab creates the set of directories `mountpoint/<cpu|memory>/hierarchy_root`
+  when Gitaly starts.
+- `cgroups_memory_enabled` enables or disables the memory limit on cgroups.
+- `cgroups_memory_limit` is the total memory limit each cgroup imposes on the processes added to it.
+- `cgroups_cpu_enabled` enables or disables the CPU limit on cgroups.
+- `cgroups_cpu_shares` is the CPU limit each cgroup imposes on the processes added to it.
+  The maximum is 1024 shares,
+  which represents 100% of CPU.
+
+#### GitLab 15.0 and later
+
+To configure cgroups in Gitaly for GitLab versions 15.0 and later, add the `gitaly['cgroups_*']` settings to `/etc/gitlab/gitlab.rb`. For
+example:
+
+```ruby
+# in /etc/gitlab/gitlab.rb
+gitaly['cgroups_mountpoint'] = "/sys/fs/cgroup"
+gitaly['cgroups_hierarchy_root'] = "gitaly"
+gitaly['cgroups_memory_bytes'] = 64424509440 # 60 GB
+gitaly['cgroups_cpu_shares'] = 1024
+gitaly['cgroups_repositories_count'] = 1000
+gitaly['cgroups_repositories_memory_bytes'] = 21474836480 # 20 GB
+gitaly['cgroups_repositories_cpu_shares'] = 512
+```
+
+- `cgroups_mountpoint` is where the parent cgroup directory is mounted. Defaults to `/sys/fs/cgroup`.
+- `cgroups_hierarchy_root` is the parent cgroup under which Gitaly creates groups, and
+  is expected to be owned by the user and group Gitaly runs as. Omnibus GitLab
+  creates the set of directories `mountpoint/<cpu|memory>/hierarchy_root`
+  when Gitaly starts.
+- `cgroups_memory_bytes` is the total memory limit that is imposed collectively on all
+  Git processes that Gitaly spawns. 0 implies no limit.
+- `cgroups_cpu_shares` is the CPU limit that is imposed collectively on all Git
+  processes that Gitaly spawns. 0 implies no limit. The maximum is 1024 shares,
+  which represents 100% of CPU.
+- `cgroups_repositories_count` is the number of cgroups in the cgroups pool.
+  Each time a new Git command is spawned, Gitaly assigns it to one of these cgroups based
+  on the repository the command is for. A circular hashing algorithm assigns
+  Git commands to these cgroups, so a Git command for a repository is
+  always assigned to the same cgroup.
+- `cgroups_repositories_memory_bytes` is the total memory limit imposed on all Git processes
+  contained in a repository cgroup. 0 implies no limit. This value cannot exceed
+  that of the top level `cgroups_memory_bytes`.
+- `cgroups_repositories_cpu_shares` is the CPU limit imposed on all Git processes
+  contained in a repository cgroup. 0 implies no limit. The maximum is 1024 shares,
+  which represents 100% of CPU. This value cannot exceed that of the top
+  level `cgroups_cpu_shares`.
+
+The difference in GitLab 15.0 and later is that Gitaly creates a pool of cgroups and places each Git command
+under one of them based on the repository the command operates on.
+
+### Configuring oversubscription
+
+In the previous example configuration for GitLab 15.0 and later:
+
+- The top level memory limit is capped at 60 GB.
+- Each of the 1000 cgroups in the repositories pool is capped at 20 GB.
+
+This is called "oversubscription". Each cgroup in the pool has a much larger capacity than 1/1000th
+of the top-level memory limit.
+
+This strategy has two main benefits:
+
+- It gives the host protection from overall memory starvation (OOM), because the top-level
+  cgroup's memory limit can be set to a threshold smaller than the host's
+  capacity. Processes outside of that cgroup are not at risk of OOM.
+- It allows each individual cgroup in the pool to burst up to a generous upper
+  bound (in this example 20 GB) that is smaller than the parent cgroup's limit,
+  but substantially larger than 1/N of the parent's limit. In this example, up
+  to three child cgroups can concurrently burst up to their maximum. In general, each of the
+  1000 cgroups uses much less than 20 GB.
+
 ## Background Repository Optimization
 
 Empty directories and unneeded configuration settings may accumulate in a repository and
diff --git a/doc/administration/gitaly/monitoring.md b/doc/administration/gitaly/monitoring.md
index 7a4f2026f3d25..17f94f912ee06 100644
--- a/doc/administration/gitaly/monitoring.md
+++ b/doc/administration/gitaly/monitoring.md
@@ -44,6 +44,17 @@ the Gitaly logs and Prometheus:
 - `gitaly_concurrency_limiting_acquiring_seconds` indicates how long a request has to
   wait due to concurrency limits before being processed.
 
+## Monitor Gitaly cgroups
+
+You can observe the status of [control groups (cgroups)](configure_gitaly.md#control-groups) using Prometheus:
+
+- `gitaly_cgroups_memory_failed_total`, a gauge for the total number of times
+  the memory limit has been hit. This number resets each time a server is
+  restarted.
+- `gitaly_cgroups_cpu_usage`, a gauge that measures CPU usage per cgroup.
+- `gitaly_cgroup_procs_total`, a gauge that measures the total number of
+  processes Gitaly has spawned under the control of cgroups.
+
 ## `pack-objects` cache
 
 The following [`pack-objects` cache](configure_gitaly.md#pack-objects-cache) metrics are available:
-- 
GitLab
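
Note on the oversubscription example in this patch: the relationship between the top-level limit and the per-repository limit can be sanity-checked with a few lines of plain Ruby. The sketch below is only an illustration and is not part of Gitaly or Omnibus GitLab. The helper name `oversubscription_summary` and its keyword arguments are hypothetical, and it assumes all three values are non-zero (the docs say `0` means "no limit", which this sketch does not model).

```ruby
# Hypothetical helper for reasoning about the example values; not a Gitaly API.
GIB = 1024**3

# parent_bytes: value of cgroups_memory_bytes (top-level limit)
# repo_bytes:   value of cgroups_repositories_memory_bytes (per-cgroup limit)
# repo_count:   value of cgroups_repositories_count (pool size)
def oversubscription_summary(parent_bytes:, repo_bytes:, repo_count:)
  unless [parent_bytes, repo_bytes, repo_count].all?(&:positive?)
    raise ArgumentError, "this sketch assumes non-zero limits and count"
  end
  # The docs state that the per-repository limit cannot exceed the top-level limit.
  raise ArgumentError, "repo_bytes must not exceed parent_bytes" if repo_bytes > parent_bytes

  {
    # What each cgroup would get if the parent limit were split evenly.
    fair_share_bytes: parent_bytes / repo_count,
    # How many cgroups can sit at their maximum at the same time.
    concurrent_full_bursts: parent_bytes / repo_bytes,
    # Sum of all child limits relative to the parent limit.
    oversubscription_ratio: (repo_bytes * repo_count).to_f / parent_bytes
  }
end

summary = oversubscription_summary(
  parent_bytes: 60 * GIB,
  repo_bytes: 20 * GIB,
  repo_count: 1000
)
puts summary
```

With a 60 GB parent and 20 GB children, `concurrent_full_bursts` comes out as 3, which matches the statement in the docs that up to three child cgroups can burst to their maximum at once. The even split per cgroup is far smaller than the 20 GB burst limit, which is the point of oversubscription.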