Fix assets caching in scheduled cache-assets:production job
One of the optimizations for the build and deploy process is to cache assets as a generic package that can then be consumed by the build process. Assets in this context refers to frontend assets built by the gitlab:assets:compile rake task, which calls out to yarn. We compute a cached-assets-hash over all frontend files. If none of these source files changed, the build can reuse the previously compiled assets and save approximately 40 minutes of build time.

The way this process is intended to work is via a scheduled pipeline on gitlab-org/gitlab that runs every 2 hours. It checks the cached-assets-hash. If no package exists for that hash, it builds an assets package and publishes it to the package registry on gitlab-org/gitlab.

This logic was introduced by https://gitlab.com/gitlab-org/gitlab/-/merge_requests/96297. It was most recently updated by https://gitlab.com/gitlab-org/gitlab/-/merge_requests/179950.

That MR introduced a subtle bug: by changing the order of setting `$GITLAB_ASSETS_HASH` and including `scripts/gitlab_component_helpers.sh`, it left the helper library unable to consume `$GITLAB_ASSETS_HASH`, so the library instead defaults to the string `"NO_HASH"`. There is no logic to fail when no hash is supplied, and so we compute a package URL containing the string `NO_HASH`. The job then publishes a package to that URL, and on the next run it skips re-compiling assets because a package is already present under `NO_HASH`.

The current cached assets package is 9 days old:

```
$ curl -I https://gitlab.com/api/v4/projects/278964/packages/generic/assets/production-ee-NO_HASH/assets-production-ee-NO_HASH-v2.tar.gz
last-modified: Wed, 05 Feb 2025 22:06:11 GMT
```

The saving grace is that this bug was only introduced for the scheduled job, and not for the jobs consuming that cache. Thus we avoid building and deploying omnibus packages or CNG images which contain a stale cache. We got lucky here.
The only real consequence is that we no longer get any cache hits, so the build process always needs to rebuild assets, even if none changed.

This was surfaced as part of https://gitlab.com/gitlab-com/gl-infra/production/-/issues/19280.

This patch fixes the bug by re-introducing the original order. This allows the cache-assets:production job to produce valid assets cache packages again, which will speed up builds and deploys in cases where no assets were changed. That is crucial for rolling forward urgent fixes, as it cuts approximately 40 minutes from time-to-production.

Additional measures we should consider for more safety:

- Check for `NO_HASH` and bail out.
- After downloading an assets archive, validate the contained cached-assets-hash against the one from the filesystem.
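
To illustrate the ordering problem and the first of the additional measures, here is a minimal sketch. It does not reproduce the real `scripts/gitlab_component_helpers.sh`. It assumes the helper resolves `$GITLAB_ASSETS_HASH` with a `NO_HASH` default at the moment it is sourced. The stand-in helper file, the function names, and the hash value `abc123` are all hypothetical.

```shell
#!/bin/sh
# Sketch only: helper_lib is a hypothetical stand-in for
# scripts/gitlab_component_helpers.sh, written to a temporary file.
helper_lib="$(mktemp)"
cat > "$helper_lib" <<'EOF'
# Assumption: the real helper applies a default like this when it is sourced.
GITLAB_ASSETS_HASH="${GITLAB_ASSETS_HASH:-NO_HASH}"
ASSETS_PACKAGE="assets-production-ee-${GITLAB_ASSETS_HASH}-v2.tar.gz"
EOF

# Buggy order: source the helper first, set the hash afterwards.
# The body runs in a subshell so variables do not leak between calls.
package_when_sourced_first() (
  unset GITLAB_ASSETS_HASH
  . "$helper_lib"
  GITLAB_ASSETS_HASH="$1"   # too late: ASSETS_PACKAGE is already computed
  echo "$ASSETS_PACKAGE"
)

# Original order: set the hash first, then source the helper.
package_when_hash_set_first() (
  GITLAB_ASSETS_HASH="$1"
  . "$helper_lib"
  echo "$ASSETS_PACKAGE"
)

# Guard: refuse to continue when the hash is empty or is the placeholder.
require_assets_hash() {
  if [ -z "${1:-}" ] || [ "$1" = "NO_HASH" ]; then
    echo "error: no valid cached-assets-hash, refusing to publish" >&2
    return 1
  fi
}

package_when_sourced_first abc123    # prints assets-production-ee-NO_HASH-v2.tar.gz
package_when_hash_set_first abc123   # prints assets-production-ee-abc123-v2.tar.gz
require_assets_hash "NO_HASH" || echo "guard tripped, job would stop here"
```

With a guard like `require_assets_hash` called before the package URL is computed, the scheduled job would fail loudly instead of publishing under `NO_HASH` and then treating that package as a cache hit on later runs.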