Commit 25df6ae8 (unverified), authored by Lucas Charles, committed by GitLab

Merge branch 'epss-documentation-data-to-new-bucket' into 'master'

Document publishing EPSS data to new bucket rather than new dir

See merge request https://gitlab.com/gitlab-org/gitlab/-/merge_requests/158592



Merged-by: Lucas Charles <me@lucascharles.me>
Approved-by: Nick Ilieskou <nilieskou@gitlab.com>
Approved-by: Lucas Charles <me@lucascharles.me>
Reviewed-by: Lucas Charles <me@lucascharles.me>
Co-authored-by: Yasha Rise <yrise@gitlab.com>
---
owning-stage: "~devops::secure"
description: 'EPSS Support ADR 002: Use a new bucket for EPSS data'
---
# EPSS Support ADR 002: Use a new bucket for EPSS data
## Context
PMDB exports data to GCP buckets, and GitLab instances later pull that data. Advisory data and license data are stored in different buckets. This is sensible because advisory and license data are not directly related to each other; each provides additional information about packages. Data is updated based on deltas, that is, changes from the previous state of the data. Only those changes are saved with each addition to the database.
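As a rough illustration of delta-based updates, the following minimal Python sketch computes the changes between two snapshots and applies them. The record shape (a mapping from an identifier to a value) and the names `compute_delta` and `apply_delta` are assumptions made for this example; they do not describe the actual PMDB export format.

```python
from typing import Dict, List, Tuple


def compute_delta(
    previous: Dict[str, str], current: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (upserts, deletions) needed to turn `previous` into `current`."""
    # Entries that are new, or whose value changed since the previous state.
    upserts = {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }
    # Entries that existed before but are no longer present.
    deletions = sorted(key for key in previous if key not in current)
    return upserts, deletions


def apply_delta(
    state: Dict[str, str], upserts: Dict[str, str], deletions: List[str]
) -> Dict[str, str]:
    """Apply a delta to a copy of `state` and return the new state."""
    new_state = dict(state)
    new_state.update(upserts)
    for key in deletions:
        new_state.pop(key, None)
    return new_state
```

Only the `upserts` and `deletions` need to be stored or transferred; a consumer holding the previous state can rebuild the current one with `apply_delta`.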
EPSS data is directly associated with advisories, so it feels natural to add it to the existing advisories bucket. However, the current advisories bucket is structured by `purl_type`. Adding an `epss` data type would couple `epss` with `purl_type`, which is a faulty pairing. Given the tight coupling between `purl_type` and the existing advisories bucket, adding `epss` to it would be difficult and convoluted.
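To make the coupling concrete, here is a minimal Python sketch of a consumer that assumes every top-level directory in the bucket is a package type. The path layout `<purl_type>/<sequence>/<chunk>`, the `KNOWN_PURL_TYPES` subset, and the function name are all hypothetical and chosen only for illustration; the real bucket layout and sync code may differ.

```python
from typing import Tuple

# Hypothetical subset of package types, for illustration only.
KNOWN_PURL_TYPES = {"npm", "maven", "pypi", "gem"}


def parse_advisory_object_path(path: str) -> Tuple[str, str, str]:
    """Split an assumed `<purl_type>/<sequence>/<chunk>` object path."""
    parts = path.split("/")
    if len(parts) != 3:
        raise ValueError(f"unexpected object path: {path!r}")
    purl_type, sequence, chunk = parts
    # Every consumer of this layout treats the first segment as a package type.
    if purl_type not in KNOWN_PURL_TYPES:
        raise ValueError(f"unknown purl_type: {purl_type!r}")
    return purl_type, sequence, chunk
```

Under these assumptions, an object such as `epss/1719878400/0000.ndjson` would be rejected unless `epss` were registered as a package type, which is the pairing this ADR wants to avoid.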
Following [extensive discussions on the EPSS epic](https://gitlab.com/groups/gitlab-org/-/epics/11544#note_1952695268) and [discussion](https://gitlab.com/gitlab-org/gitlab/-/issues/468131#note_1961344123) during the refinement of PMDB issues, the initial decision was to use the existing bucket, as this felt most intuitive and, at the time, the healthier approach. [Further discussion](https://gitlab.com/gitlab-org/gitlab/-/issues/467672#note_1980715240) during the refinement of the GitLab backend effort led to the decision to use a new bucket, because `purl_type` is tightly coupled with other, unrelated areas in the monolith. Adding `epss` to `purl_type` would impact other components, and we want to avoid having to work around that. We may want to simplify these areas and reconsider the bucket structure at a later stage.
## Decision
Export EPSS data to a new bucket, rather than exporting it into the existing PMDB advisories bucket.
## Consequences
The implementation is simpler than adding a directory to the existing advisories bucket, but may feel less intuitive.
This change requires Terraform changes to provision the new bucket.
The new bucket must also be accounted for in the exporter and in the GitLab `package_metadata` sync configuration.
## Alternatives
The alternative is to add EPSS data to the advisories bucket, since the two are directly related. This was the [initial decision](https://gitlab.com/gitlab-org/gitlab/-/issues/468131#note_1980366323). It would allow us to reuse existing mechanisms and keep related data close together. However, EPSS data doesn't fit into the current structure of the advisories bucket. An ideal solution would restructure the buckets to accommodate this approach, but that would be a large effort and is not critical enough.
@@ -169,11 +169,11 @@ Following the discussions in the [EPSS epic](https://gitlab.com/groups/gitlab-or
 1. PMDB database is extended with a new table to store EPSS scores.
 1. PMDB infrastructure runs the feeder daily in order to pull and process EPSS data.
 1. The advisory-processor receives the EPSS data and stores them to the PMDB DB.
-1. PMDB exports EPSS data to existing PMDB advisories bucket.
-   - Create a new directory in the existing bucket to store EPSS data.
+1. PMDB exports EPSS data to a new PMDB EPSS bucket.
+   - Create a new bucket to store EPSS data.
    - Delete former EPSS data once new data is uploaded, as the old data is no longer needed.
    - Truncate EPSS scores to two digits after the dot.
-1. GitLab instances pull data from the PMDB bucket.
+1. GitLab instances pull data from the PMDB EPSS bucket.
    - Create a new table in rails DB to store EPSS data.
 1. GitLab instances expose EPSS data through GraphQL API and present data in vulnerability report and details pages.
@@ -201,6 +201,10 @@ compared with the pros and cons of alternatives.
 ## Design and implementation details
+### Decisions
+- [002: Use a new bucket for EPSS data](decisions/002_use_new_bucket.md)
 ### Important notes
 - All EPSS scores get updated on a daily basis. This is pivotal to this feature's design.
@@ -211,7 +215,7 @@ compared with the pros and cons of alternatives.
 - Create a new EPSS table in [PMDB](https://gitlab.com/gitlab-org/security-products/license-db) with an advisory identifier and the EPSS score. This includes changing the [schema](https://gitlab.com/gitlab-org/security-products/license-db/schema) and any necessary migrations.
 - Ingest EPSS data into new PMDB table. We want to keep the EPSS data structure as close as possible to the origin so all of the data may be available to the exporter, and the exporter may choose how to process it. Therefore we will save scores and percentiles with their complete values.
-- Export EPSS scores in separate directory in the advisories bucket.
+- Export EPSS scores in separate bucket.
 - Delete the previous day's export as it is no longer needed after the new one is added.
 - Add new pubsub topics to deployment to be used by PMDB components, using existing terraform modules.
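The export steps described in the hunks above (truncate scores to two digits after the dot, upload the new export, then delete the former one) can be sketched in Python as follows. This is only an illustration under stated assumptions: the bucket is modeled as an in-memory dictionary rather than a real GCP client, and the object name `epss.csv`, the date-based prefix, the CSV row shape, and both function names are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN
from typing import Dict, Iterable, Tuple


def truncate_score(score: str) -> Decimal:
    """Truncate (not round) an EPSS score to two digits after the dot."""
    return Decimal(score).quantize(Decimal("0.01"), rounding=ROUND_DOWN)


def publish_export(
    bucket: Dict[str, str],
    export_date: str,
    rows: Iterable[Tuple[str, str]],
) -> None:
    """Upload the new export, then delete objects left by earlier exports.

    `bucket` maps object names to contents and stands in for a real bucket.
    """
    new_prefix = f"{export_date}/"
    lines = [f"{cve},{truncate_score(score)}" for cve, score in rows]
    # Upload first, so the bucket is never left without any EPSS data.
    bucket[f"{new_prefix}epss.csv"] = "\n".join(lines)
    # Only after the new data is in place is the former data removed.
    stale = [name for name in bucket if not name.startswith(new_prefix)]
    for name in stale:
        del bucket[name]
```

Using `Decimal` with `ROUND_DOWN` avoids binary floating-point surprises and makes the truncation explicit: a score of `0.97589` is exported as `0.97`, not rounded up to `0.98`.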