From 7c06312745110574282e274af51270781ea0d9d8 Mon Sep 17 00:00:00 2001
From: Igor <iwiedler@gitlab.com>
Date: Mon, 17 Feb 2025 16:37:46 +0000
Subject: [PATCH] Fix assets caching in scheduled cache-assets:production job

One of the optimizations for the build and deploy process is to cache
assets as a generic package that can then be consumed by the build
process.

Assets in this context refers to frontend assets built by the
gitlab:assets:compile rake task, which calls out to yarn. We compute a
cached-assets-hash over all frontend files. If none of these source
files changed, the build can reuse the previously compiled assets and
save approximately 40 minutes of build time.

The way this process is intended to work is via a scheduled pipeline on
gitlab-org/gitlab that runs every 2 hours. It computes the
cached-assets-hash; if no package exists for that hash, it builds an
assets package and publishes it to the package registry on
gitlab-org/gitlab.

This logic was introduced by https://gitlab.com/gitlab-org/gitlab/-/merge_requests/96297. It was most recently updated by https://gitlab.com/gitlab-org/gitlab/-/merge_requests/179950.

That MR introduced a subtle bug: by changing the order of setting
`$GITLAB_ASSETS_HASH` and including
`scripts/gitlab_component_helpers.sh`, it left that helper library
unable to consume `$GITLAB_ASSETS_HASH`, so the library instead
defaults to the string `"NO_HASH"`.

There is no logic to fail when no hash is supplied, so we compute a
package URL containing the string `NO_HASH`. The job then publishes a
package to that URL, and on the next run it skips re-compiling assets
because a package is already present under `NO_HASH`.
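
The ordering problem can be reproduced in isolation. The sketch below is
illustrative only: `load_helpers` is a hypothetical stand-in for
`source scripts/gitlab_component_helpers.sh`, assumed to derive the
package name from `$GITLAB_ASSETS_HASH` at load time, and `abc123` is a
made-up hash value.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for sourcing scripts/gitlab_component_helpers.sh.
# Assumption: the real helper resolves the hash (with a NO_HASH fallback)
# and builds the package name once, at the moment it is loaded.
load_helpers() {
  export GITLAB_ASSETS_HASH="${GITLAB_ASSETS_HASH:-NO_HASH}"
  export ASSETS_PACKAGE="assets-production-ee-${GITLAB_ASSETS_HASH}-v2.tar.gz"
}

# Buggy order: helper is loaded before the hash is set. Runs in a
# subshell (parentheses body) so it does not leak variables.
buggy_package() (
  unset GITLAB_ASSETS_HASH
  load_helpers
  export GITLAB_ASSETS_HASH="abc123"   # too late, name already computed
  echo "$ASSETS_PACKAGE"
)

# Fixed order: hash is set first, then the helper is loaded.
fixed_package() (
  export GITLAB_ASSETS_HASH="abc123"
  load_helpers
  echo "$ASSETS_PACKAGE"
)

echo "buggy: $(buggy_package)"   # buggy: assets-production-ee-NO_HASH-v2.tar.gz
echo "fixed: $(fixed_package)"   # fixed: assets-production-ee-abc123-v2.tar.gz
```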

The current cached assets package is 9 days old:

```
$ curl -I https://gitlab.com/api/v4/projects/278964/packages/generic/assets/production-ee-NO_HASH/assets-production-ee-NO_HASH-v2.tar.gz

last-modified: Wed, 05 Feb 2025 22:06:11 GMT
```

The saving grace is that this bug was only introduced for the scheduled
job, and not for the jobs consuming that cache. Thus we avoid building
and deploying omnibus packages or CNG images which contain a stale
cache. We got lucky here.

The only real consequence is that we no longer get any cache hits, so
the build process will always need to rebuild assets, even if none
changed. This was surfaced as part of
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/19280.

This patch fixes the bug by restoring the original order. The
cache-assets:production job can then produce valid assets cache
packages again, which speeds up builds and deploys when no assets
changed. That is crucial for rolling forward urgent fixes, as it cuts
approximately 40 minutes from time-to-production.

Additional measures we should consider for more safety:

- Check for NO_HASH and bail out.
- After downloading an assets archive, validate the contained
  cached-assets-hash against the one from the filesystem.
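
A minimal sketch of what those two measures might look like. Both
function names are hypothetical and not part of this patch; the hash
file is assumed to be the `cached-assets-hash.txt` written by the job.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical guard: refuse to continue when the hash is missing or is
# the NO_HASH placeholder.
require_assets_hash() {
  local hash="${1:-}"
  if [[ -z "$hash" || "$hash" == "NO_HASH" ]]; then
    echo "ERROR: assets hash is unset or NO_HASH; refusing to continue." >&2
    return 1
  fi
}

# Hypothetical check: compare the hash recorded in an extracted archive
# (first argument, a file path) with the freshly computed hash (second).
validate_cached_hash() {
  local hash_file="$1" expected="$2" actual
  if [[ ! -f "$hash_file" ]]; then
    echo "ERROR: ${hash_file} not found." >&2
    return 1
  fi
  actual="$(cat "$hash_file")"
  if [[ "$actual" != "$expected" ]]; then
    echo "ERROR: cached hash '${actual}' does not match '${expected}'." >&2
    return 1
  fi
}
```

In `cache_assets`, the guard could run as
`require_assets_hash "${GITLAB_ASSETS_HASH:-}"` right after the helper
library is loaded, so a bad ordering fails the job instead of silently
publishing under `NO_HASH`.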
---
 .gitlab/ci/caching.gitlab-ci.yml | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/.gitlab/ci/caching.gitlab-ci.yml b/.gitlab/ci/caching.gitlab-ci.yml
index 65b7571350511..381ce7735c951 100644
--- a/.gitlab/ci/caching.gitlab-ci.yml
+++ b/.gitlab/ci/caching.gitlab-ci.yml
@@ -38,8 +38,11 @@ cache-workhorse:
     - |
       function cache_assets() {
         yarn_install_script
-        source scripts/gitlab_component_helpers.sh
+
+        # GITLAB_ASSETS_HASH must be defined before loading scripts/gitlab_component_helpers.sh
         export GITLAB_ASSETS_HASH=$(bundle exec rake gitlab:assets:hash_sum)
+        source scripts/gitlab_component_helpers.sh
+
         gitlab_assets_archive_doesnt_exist || { echoinfo "INFO: Exiting early as package exists."; exit 0; }
         assets_compile_script
         echo -n "${GITLAB_ASSETS_HASH}" > "cached-assets-hash.txt"
-- 
GitLab