Unverified commit e1296ede, authored by Evan Read, committed by GitLab

Merge branch 'gitaly-5527-cc-monorepo-git-performance' into 'master'

doc: Add monorepo and large blob info

See merge request https://gitlab.com/gitlab-org/gitlab/-/merge_requests/148241



Merged-by: Evan Read <eread@gitlab.com>
Approved-by: John Cai <jcai@gitlab.com>
Approved-by: Evan Read <eread@gitlab.com>
Co-authored-by: Christian Couder <christian.couder@gmail.com>
......@@ -22,15 +22,28 @@ Monorepos can be large for [many reasons](https://about.gitlab.com/blog/2022/09/
Large repositories pose a performance risk when used in GitLab, especially if a large monorepo receives many clones or pushes a day, which is common for monorepos.
Git itself has performance limitations when it comes to handling
monorepos.
### Git performance issues with large repositories
Git uses [packfiles](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
to store its objects so that they take up as little space as
possible. Packfiles are also used to transfer objects when cloning,
fetching, or pushing between a Git client and a Git server. Using packfiles is
usually good because it reduces the amount of disk space and network
bandwidth required.
However, creating packfiles requires a lot of CPU and memory to compress object
content. So when repositories are large, every Git operation
that requires creating packfiles becomes expensive and slow as more
and bigger objects need to be processed and transferred.
### Consequences for GitLab
[Gitaly](https://gitlab.com/gitlab-org/gitaly) is our Git storage service built
on top of [Git](https://git-scm.com/). This means that any limitations of
Git are experienced in Gitaly, and in turn by end users of GitLab.
Monorepos can also have a notable impact on hardware, in some cases hitting limits on vertical scaling and on network or disk bandwidth.
## Optimize GitLab settings
You should use as many of the following strategies as possible to minimize
......@@ -39,9 +52,9 @@ fetches on the Gitaly server.
### Rationale
The most resource-intensive operation in Git is the
[`git-pack-objects`](https://git-scm.com/docs/git-pack-objects)
process, which is responsible for creating packfiles after figuring out
all of the commit history and files to send back to the client.
The larger the repository, the more commits, files, branches, and tags that a
repository has and the more expensive this operation is. Both memory and CPU
......@@ -332,10 +345,26 @@ when doing an object graph walk.
### Large blobs
Blobs are the [Git objects](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects)
that are used to store and manage the content of the files that users
have committed into Git repositories.
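To make the idea of a blob more concrete, the following minimal Python sketch computes the object ID that Git assigns to a blob. It assumes a repository using the default SHA-1 object format, where the ID is the hash of a `blob <size>\0` header followed by the raw file content. The function name `git_blob_id` is hypothetical and used only for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID of a blob holding `content`.

    Assumes the default SHA-1 object format; repositories created
    with the SHA-256 object format use a different hash.
    """
    # Git hashes a short header ("blob", the size in bytes, a NUL byte)
    # followed by the unmodified file content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should match the output of `git hash-object` on an empty file.
    print(git_blob_id(b""))
```

Because the ID depends only on the content, two files with identical content share one blob, regardless of their names or paths.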
#### Issues with large blobs
Large blobs can be problematic for Git because Git does not handle
large binary data efficiently. Blobs over 10 MB in the `git-sizer` output
probably mean that there is large binary data in your repository.
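As a rough complement to `git-sizer`, the following Python sketch lists files in a checked-out working tree that exceed a size threshold. It is not a replacement for `git-sizer`: it only inspects files currently on disk, not blobs in the repository history. The function name `find_large_files` and the interpretation of 10 MB as 10 × 1024 × 1024 bytes are assumptions made for this example.

```python
import os

# Assumption: treat "10 MB" as 10 * 1024 * 1024 bytes.
TEN_MB = 10 * 1024 * 1024


def find_large_files(root: str, threshold: int = TEN_MB) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for files larger than `threshold`.

    Only files present in the working tree are checked; the `.git`
    directory is skipped, so blobs that exist only in history are missed.
    """
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own storage directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # Skip symbolic links to avoid misleading sizes.
            size = os.path.getsize(path)
            if size > threshold:
                results.append((os.path.relpath(path, root), size))
    # Largest files first.
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for rel_path, size in find_large_files("."):
        print(f"{size:>12}  {rel_path}")
```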
While source code can usually be efficiently compressed, binary data
is often already compressed. This means that Git is unlikely to be
successful when it tries to compress large blobs when creating packfiles.
This results in larger packfiles and higher CPU, memory, and bandwidth
usage on both Git clients and servers.
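The difference in compressibility can be illustrated with a short Python sketch. Git compresses object content with zlib, so the sketch uses Python's `zlib` module on two inputs: repetitive text standing in for source code, and random bytes standing in for binary data that is already compressed. Both inputs are synthetic assumptions chosen for the example, and the sketch ignores the delta compression that Git also applies in packfiles.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Synthetic stand-in for source code: highly repetitive text.
source_like = b"def handler(request):\n    return respond(request)\n" * 2000

# Synthetic stand-in for already-compressed binary data: random bytes,
# which have no redundancy left for zlib to remove.
binary_like = os.urandom(len(source_like))

if __name__ == "__main__":
    print(f"source-like ratio: {compression_ratio(source_like):.4f}")
    print(f"binary-like ratio: {compression_ratio(binary_like):.4f}")
```

The repetitive text shrinks to a small fraction of its original size, while the random bytes stay at roughly their original size, so the CPU time spent compressing them yields almost no savings.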
On the client side, because Git stores blob content in both packfiles
(usually under `.git/objects/pack/`) and regular files (in
[worktrees](https://git-scm.com/docs/git-worktree)), much more disk
space is usually required than for source code.
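To see how this plays out for a particular clone, the following Python sketch compares the disk space used by packfiles with the space used by the checked-out files. It assumes a standard non-bare repository layout with a single worktree and packfiles under `.git/objects/pack/`. The helper names `dir_size` and `repo_disk_usage` are hypothetical.

```python
import os


def dir_size(path: str, skip: tuple[str, ...] = ()) -> int:
    """Return the total size in bytes of regular files under `path`.

    Directories whose name appears in `skip` are not descended into.
    A missing `path` yields 0.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def repo_disk_usage(repo_root: str) -> dict[str, int]:
    """Return bytes used by packfiles and by worktree files in a clone."""
    # Assumption: standard layout with packfiles in .git/objects/pack/.
    pack_dir = os.path.join(repo_root, ".git", "objects", "pack")
    return {
        "packfiles": dir_size(pack_dir),
        "worktree": dir_size(repo_root, skip=(".git",)),
    }


if __name__ == "__main__":
    usage = repo_disk_usage(".")
    print(f"packfiles: {usage['packfiles']} bytes")
    print(f"worktree:  {usage['worktree']} bytes")
```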
#### Use LFS for large blobs
......