diff --git a/doc/ci/caching/index.md b/doc/ci/caching/index.md
index 6b8e7fa2ad555e2cb5558fc8fe2c965d7859ac98..b6518c87e13bc051deff93cb97e7d0ffe89c648d 100644
--- a/doc/ci/caching/index.md
+++ b/doc/ci/caching/index.md
@@ -23,61 +23,55 @@ how it is defined in `.gitlab-ci.yml`.
 
 NOTE: **Note:**
 Be careful if you use cache and artifacts to store the same path in your jobs
-as **caches are restored before artifacts** and the content would be overwritten.
-
-Don't mix the caching with passing artifacts between stages. Caching is not
-designed to pass artifacts between stages. Cache is for runtime dependencies
-needed to compile the project:
-
-- `cache`: **Use for temporary storage for project dependencies.** Not useful
-  for keeping intermediate build results, like `jar` or `apk` files.
-  Cache was designed to be used to speed up invocations of subsequent runs of a
-  given job, by keeping things like dependencies (e.g., npm packages, Go vendor
-  packages, etc.) so they don't have to be re-fetched from the public internet.
-  While the cache can be abused to pass intermediate build results between
-  stages, there may be cases where artifacts are a better fit.
+as **caches are restored before artifacts** and the content could be overwritten.
+
+Don't use caching for passing artifacts between stages, as it is designed to store
+runtime dependencies needed to compile the project:
+
+- `cache`: **For storing project dependencies**
+
+  Caches are used to speed up runs of a given job in **subsequent pipelines**, by
+  storing downloaded dependencies so that they don't have to be fetched from the
+  internet again (like npm packages, Go vendor packages, etc.). While the cache could
+  be configured to pass intermediate build results between stages, this should be
+  done with artifacts instead.
+
 - `artifacts`: **Use for stage results that will be passed between stages.**
-  Artifacts were designed to upload some compiled/generated bits of the build,
-  and they can be fetched by any number of concurrent Runners. They are
-  guaranteed to be available and are there to pass data between jobs. They are
-  also exposed to be downloaded from the UI. **Artifacts can only exist in
-  directories relative to the build directory** and specifying paths which don't
-  comply to this rule trigger an unintuitive and illogical error message (an
-  enhancement is discussed at
-  [https://gitlab.com/gitlab-org/gitlab-foss/issues/15530](https://gitlab.com/gitlab-org/gitlab-foss/issues/15530)
-  ). Artifacts need to be uploaded to the GitLab instance (not only the GitLab
-  runner) before the next stage job(s) can start, so you need to evaluate
-  carefully whether your bandwidth allows you to profit from parallelization
-  with stages and shared artifacts before investing time in changes to the
-  setup.
-
-It's sometimes confusing because the name artifact sounds like something that
-is only useful outside of the job, like for downloading a final image. But
-artifacts are also available in between stages within a pipeline. So if you
-build your application by downloading all the required modules, you might want
-to declare them as artifacts so that each subsequent stage can depend on them
-being there. There are some optimizations like declaring an
-[expiry time](../yaml/README.md#artifactsexpire_in) so you don't keep artifacts
-around too long, and using [dependencies](../yaml/README.md#dependencies) to
-control exactly where artifacts are passed around.
-
-In summary:
-
-- Caches are disabled if not defined globally or per job (using `cache:`).
-- Caches are available for all jobs in your `.gitlab-ci.yml` if enabled globally.
-- Caches can be used by subsequent pipelines of that same job (a script in
-  a stage) in which the cache was created (if not defined globally).
-- Caches are stored where the Runner is installed **and** uploaded to S3 if
-  [distributed cache is enabled](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching).
-- Caches defined per job are only used, either:
-  - For the next pipeline of that job.
-  - If that same cache is also defined in a subsequent job of the same pipeline.
-- Artifacts are disabled if not defined per job (using `artifacts:`).
-- Artifacts can only be enabled per job, not globally.
-- Artifacts are created during a pipeline and can be used by the subsequent
-  jobs of that currently active pipeline.
-- Artifacts are always uploaded to GitLab (known as coordinator).
-- Artifacts can have an expiration value for controlling disk usage (30 days by default).
+
+  Artifacts are files generated by a job which are stored and uploaded, and can then
+  be fetched and used by jobs in later stages of the **same pipeline**. This data
+  will not be available in different pipelines, but is available to be downloaded
+  from the UI.
+
+The name `artifacts` sounds like it's only useful outside of the job, like for downloading
+a final image, but artifacts are also available in later stages within a pipeline.
+So if you build your application by downloading all the required modules, you might
+want to declare them as artifacts so that subsequent stages can use them. There are
+some optimizations like declaring an [expiry time](../yaml/README.md#artifactsexpire_in)
+so you don't keep artifacts around too long, or using [dependencies](../yaml/README.md#dependencies)
+to control which jobs fetch the artifacts.
+
+Caches:
+
+- Are disabled if not defined globally or per job (using `cache:`).
+- Are available for all jobs in your `.gitlab-ci.yml` if enabled globally.
+- Can be used in subsequent pipelines by the same job in which the cache was created (if not defined globally).
+- Are stored where the Runner is installed **and** uploaded to S3 if [distributed cache is enabled](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching).
+- If defined per job, are used:
+  - By the same job in a subsequent pipeline.
+  - By subsequent jobs in the same pipeline, if they have identical dependencies.
+
+Artifacts:
+
+- Are disabled if not defined per job (using `artifacts:`).
+- Can only be enabled per job, not globally.
+- Are created during a pipeline and can be used by the subsequent jobs of that currently active pipeline.
+- Are always uploaded to GitLab (known as coordinator).
+- Can have an expiration value for controlling disk usage (30 days by default).
+
+NOTE: **Note:**
+Both artifacts and caches define their paths relative to the project directory, and
+can't link to files outside it.
 
 ## Good caching practices
 
diff --git a/doc/ci/yaml/README.md b/doc/ci/yaml/README.md
index b4516be4d1347dbd5a1fed4dbd51e96b2f7a9cb5..700aa37dd0c8e2bde00ce6754cb5e54633b8c092 100644
--- a/doc/ci/yaml/README.md
+++ b/doc/ci/yaml/README.md
@@ -1514,8 +1514,10 @@ globally and all jobs will use that definition.
 
 #### `cache:paths`
 
-Use the `paths` directive to choose which files or directories will be cached. You can only specify paths within your `$CI_PROJECT_DIR`.
-Wildcards can be used that follow the [glob](https://en.wikipedia.org/wiki/Glob_(programming)) patterns and [filepath.Match](https://golang.org/pkg/path/filepath/#Match).
+Use the `paths` directive to choose which files or directories will be cached. Paths
+are relative to the project directory (`$CI_PROJECT_DIR`) and cannot directly link outside it.
+Wildcards can be used that follow the [glob](https://en.wikipedia.org/wiki/Glob_(programming))
+patterns and [filepath.Match](https://golang.org/pkg/path/filepath/#Match).
 
 Cache all files in `binaries` that end in `.apk` and the `.config` file:
 
@@ -1744,8 +1746,9 @@ be available for download in the GitLab UI.
 
 #### `artifacts:paths`
 
-You can only use paths that are within the local working copy.
-Wildcards can be used that follow the [glob](https://en.wikipedia.org/wiki/Glob_(programming)) patterns and [filepath.Match](https://golang.org/pkg/path/filepath/#Match).
+Paths are relative to the project directory (`$CI_PROJECT_DIR`) and cannot directly
+link outside it. Wildcards can be used that follow the [glob](https://en.wikipedia.org/wiki/Glob_(programming))
+patterns and [filepath.Match](https://golang.org/pkg/path/filepath/#Match).
 
 To restrict which jobs a specific job will fetch artifacts from, see [dependencies](#dependencies).
 