gitlab-workhorse
Gitlab-workhorse is a smart reverse proxy for GitLab. It handles "large" HTTP requests such as file downloads, file uploads, Git push/pull and Git archive downloads.
Features that rely on Workhorse
Workhorse itself is not a feature, but there are several features in GitLab that would not work efficiently without Workhorse.
To put the efficiency benefit in context, consider that in 2020Q3 on GitLab.com we see Rails application threads using on average about 200MB of RSS vs about 200KB for Workhorse goroutines.
Examples of features that rely on Workhorse:
git clone
and git push
over HTTP
1. Git clone, pull and push are slow because they transfer large amounts of data and because each is CPU intensive on the GitLab side. Without workhorse, HTTP access to Git repositories would compete with regular web access to the application, requiring us to run way more Rails application servers.
2. CI runner long polling
GitLab CI runners fetch new CI jobs by polling the GitLab server. Workhorse acts as a kind of "waiting room" where CI runners can sit and wait for new CI jobs. Because of Go's efficiency we can fit a lot of runners in the waiting room at little cost. Without this waiting room mechanism we would have to add a lot more Rails server capacity.
3. File uploads and downloads
File uploads and downloads may be slow either because the file is large or because the user's connection is slow. Workhorse can handle the slow part for Rails. This improves the efficiency of features such as CI artifacts, package repositories, LFS objects, etc.
4. Websocket proxying
Features such as the web terminal require a long lived connection between the user's web browser and a container inside GitLab that is not directly accessible from the internet. Dedicating a Rails application thread to proxying such a connection would cost much more memory than it costs to have Workhorse look after it.
Quick facts (how does Workhorse work)
- Workhorse can handle some requests without involving Rails at all: for example, JavaScript files and CSS files are served straight from disk.
- Workhorse can modify responses sent by Rails: for example if you use
send_file
in Rails then gitlab-workhorse will open the file on disk and send its contents as the response body to the client. - Workhorse can take over requests after asking permission from Rails.
Example: handling
git clone
. - Workhorse can modify requests before passing them to Rails. Example: when handling a Git LFS upload Workhorse first asks permission from Rails, then it stores the request body in a tempfile, then it sends a modified request containing the tempfile path to Rails.
- Workhorse can manage long-lived WebSocket connections for Rails. Example: handling the terminal websocket for environments.
- Workhorse does not connect to Postgres, only to Rails and (optionally) Redis.
- We assume that all requests that reach Workhorse pass through an upstream proxy such as NGINX or Apache first.
- Workhorse does not accept HTTPS connections.
- Workhorse does not clean up idle client connections.
- We assume that all requests to Rails pass through Workhorse.
For more information see 'A brief history of gitlab-workhorse'.
Usage
gitlab-workhorse [OPTIONS]
Options:
-apiCiLongPollingDuration duration
Long polling duration for job requesting for runners (default 50s - enabled) (default 50ns)
-apiLimit uint
Number of API requests allowed at single time
-apiQueueDuration duration
Maximum queueing duration of requests (default 30s)
-apiQueueLimit uint
Number of API requests allowed to be queued
-authBackend string
Authentication/authorization backend (default "http://localhost:8080")
-authSocket string
Optional: Unix domain socket to dial authBackend at
-cableBackend string
Optional: ActionCable backend (default authBackend)
-cableSocket string
Optional: Unix domain socket to dial cableBackend at (default authSocket)
-config string
TOML file to load config from
-developmentMode
Allow the assets to be served from Rails app
-documentRoot string
Path to static files content (default "public")
-listenAddr string
Listen address for HTTP server (default "localhost:8181")
-listenNetwork string
Listen 'network' (tcp, tcp4, tcp6, unix) (default "tcp")
-listenUmask int
Umask for Unix socket
-logFile string
Log file location
-logFormat string
Log format to use defaults to text (text, json, structured, none) (default "text")
-pprofListenAddr string
pprof listening address, e.g. 'localhost:6060'
-prometheusListenAddr string
Prometheus listening address, e.g. 'localhost:9229'
-proxyHeadersTimeout duration
How long to wait for response headers when proxying the request (default 5m0s)
-secretPath string
File with secret key to authenticate with authBackend (default "./.gitlab_workhorse_secret")
-version
Print version and exit
The 'auth backend' refers to the GitLab Rails application. The name is a holdover from when gitlab-workhorse only handled Git push/pull over HTTP.
Gitlab-workhorse can listen on either a TCP or a Unix domain socket. It can also open a second listening TCP listening socket with the Go net/http/pprof profiler server.
Gitlab-workhorse can listen on redis events (currently only builds/register
for runners). This requires you to pass a valid TOML config file via
-config
flag.
For regular setups it only requires the following (replacing the string
with the actual socket)
Redis
Gitlab-workhorse integrates with Redis to do long polling for CI build requests. This is configured via two things:
- Redis settings in the TOML config file
- The
-apiCiLongPollingDuration
command line flag to control polling behavior for CI build requests
It is OK to enable Redis in the config file but to leave CI polling disabled; this just results in an idle Redis pubsub connection. The opposite is not possible: CI long polling requires a correct Redis configuration.
Below we discuss the options for the [redis]
section in the config
file.
[redis]
URL = "unix:///var/run/gitlab/redis.sock"
Password = "my_awesome_password"
Sentinel = [ "tcp://sentinel1:23456", "tcp://sentinel2:23456" ]
SentinelMaster = "mymaster"
-
URL
takes a string in the formatunix://path/to/redis.sock
ortcp://host:port
. -
Password
is only required if your redis instance is password-protected -
Sentinel
is used if you are using Sentinel. NOTE that if bothSentinel
andURL
are given, onlySentinel
will be used
Optional fields are as follows:
[redis]
DB = 0
ReadTimeout = "1s"
KeepAlivePeriod = "5m"
MaxIdle = 1
MaxActive = 1
-
DB
is the Database to connect to. Defaults to0
-
ReadTimeout
is how long a redis read-command can take. Defaults to1s
-
KeepAlivePeriod
is how long the redis connection is to be kept alive without anything flowing through it. Defaults to5m
-
MaxIdle
is how many idle connections can be in the redis-pool at once. Defaults to 1 -
MaxActive
is how many connections the pool can keep. Defaults to 1
Relative URL support
If you are mounting GitLab at a relative URL, e.g.
example.com/gitlab
, then you should also use this relative URL in
the authBackend
setting:
gitlab-workhorse -authBackend http://localhost:8080/gitlab
Interaction of authBackend and authSocket
The interaction between authBackend
and authSocket
can be a bit
confusing. It comes down to: if authSocket
is set it overrides the
host part of authBackend
but not the relative path.
In table form:
authBackend | authSocket | Workhorse connects to? | Rails relative URL |
---|---|---|---|
unset | unset | localhost:8080 |
/ |
http://localhost:3000 |
unset | localhost:3000 |
/ |
http://localhost:3000/gitlab |
unset | localhost:3000 |
/gitlab |
unset | /path/to/socket |
/path/to/socket |
/ |
http://localhost:3000 |
/path/to/socket |
/path/to/socket |
/ |
http://localhost:3000/gitlab |
/path/to/socket |
/path/to/socket |
/gitlab |
The same applies to cableBackend
and cableSocket
.
Installation
To install gitlab-workhorse you need Go 1.13 or newer and GNU Make.
To install into /usr/local/bin
run make install
.
make install
To install into /foo/bin
set the PREFIX variable.
make install PREFIX=/foo
On some operating systems, such as FreeBSD, you may have to use
gmake
instead of make
.
Dependencies
Exiftool
Workhorse uses exiftool for removing EXIF data (which may contain sensitive information) from uploaded images. If you installed GitLab:
-
Using the Omnibus package, you're all set. NOTE that if you are using CentOS Minimal, you may need to install
perl
package:yum install perl
-
From source, make sure
exiftool
is installed:# Debian/Ubuntu sudo apt-get install libimage-exiftool-perl # RHEL/CentOS sudo yum install perl-Image-ExifTool
Error tracking
GitLab-Workhorse supports remote error tracking with
Sentry. To enable this feature set the
GITLAB_WORKHORSE_SENTRY_DSN
environment variable.
You can also set the GITLAB_WORKHORSE_SENTRY_ENVIRONMENT
environment variable to
use the Sentry environment functionality to separate staging, production and
development.
Omnibus (/etc/gitlab/gitlab.rb
):
gitlab_workhorse['env'] = {
'GITLAB_WORKHORSE_SENTRY_DSN' => 'https://foobar'
'GITLAB_WORKHORSE_SENTRY_ENVIRONMENT' => 'production'
}
Source installations (/etc/default/gitlab
):
export GITLAB_WORKHORSE_SENTRY_DSN='https://foobar'
export GITLAB_WORKHORSE_SENTRY_ENVIRONMENT='production'
Tests
Run the tests with:
make clean test
Coverage / what to test
Each feature in gitlab-workhorse should have an integration test that verifies that the feature 'kicks in' on the right requests and leaves other requests unaffected. It is better to also have package-level tests for specific behavior but the high-level integration tests should have the first priority during development.
It is OK if a feature is only covered by integration tests.
Distributed Tracing
Workhorse supports distributed tracing through LabKit using OpenTracing APIs.
By default, no tracing implementation is linked into the binary, but different OpenTracing providers can be linked in using build tags/build constraints. This can be done by setting the BUILD_TAGS
make variable.
For more details of the supported providers, see LabKit, but as an example, for Jaeger tracing support, include the tags: BUILD_TAGS="tracer_static tracer_static_jaeger"
.
make BUILD_TAGS="tracer_static tracer_static_jaeger"
Once Workhorse is compiled with an opentracing provider, the tracing configuration is configured via the GITLAB_TRACING
environment variable.
For example:
GITLAB_TRACING=opentracing://jaeger ./gitlab-workhorse
Continuous Profiling
Workhorse supports continuous profiling through LabKit using Stackdriver Profiler.
By default, the Stackdriver Profiler implementation is linked in the binary using build tags, though it's not required and can be skipped.
For example:
make BUILD_TAGS=""
Once Workhorse is compiled with Continuous Profiling, the profiler configuration can be set via GITLAB_CONTINUOUS_PROFILING
environment variable.
For example:
GITLAB_CONTINUOUS_PROFILING="stackdriver?service=workhorse&service_version=1.0.1&project_id=test-123 ./gitlab-workhorse"
More information about see the LabKit monitoring docs.
License
This code is distributed under the MIT license, see the LICENSE file.