diff --git a/doc/development/sec/vulnerability_tracking.md b/doc/development/sec/vulnerability_tracking.md new file mode 100644 index 0000000000000000000000000000000000000000..8fd99d8e80c57148b8c085cb886fb649032c0296 --- /dev/null +++ b/doc/development/sec/vulnerability_tracking.md @@ -0,0 +1,209 @@ +--- +stage: Security Risk Management +group: Security Insights +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments +--- + +# Vulnerability tracking overview + +At GitLab we run Git combined with automated security testing in Continuous +Integration and Continuous Delivery (CI/CD) processes. These processes +continuously monitor code changes to detect security vulnerabilities as early +as possible. Security testing often involves multiple Static Application +Security Testing (SAST) tools, each specialized in detecting specific +vulnerabilities, such as hardcoded passwords or insecure data flows. A +heterogeneous SAST setup, using multiple tools, helps minimize the software's +attack surface. The security findings from these tools undergo Vulnerability +Management, a semi-manual process of understanding, categorizing, storing, and +acting on them. + +Code volatility (the constant change of the project's source code) and double reporting +(the overlap of findings reported by multiple tools) are potential sources of duplication, +imposing futile auditing effort on the analyst. + +Vulnerability tracking is an automated process that helps deduplicate and +track vulnerabilities throughout the lifetime of a software project. + +Our Vulnerability tracking method is based on [Scope+Offset](https://gitlab.com/gitlab-org/security-products/post-analyzers/tracking-calculator/-/blob/master/README.md) (internal). 
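
To illustrate why a scope-relative signature can survive code changes that break a fingerprint tied to an absolute line number, here is a minimal sketch. It assumes that the offset is the line distance from the start of the enclosing scope. The helper names (`scope_start`, `line_fingerprint`, `scope_offset_signature`), the text-based scope lookup, and the simplified signature format are hypothetical and are not taken from the Tracking Calculator.

```ruby
# Hypothetical helper: returns the zero-based index of the line that opens the
# given scope. A real implementation would parse the source code instead of
# matching text.
def scope_start(lines, scope)
  lines.index { |line| line.include?(scope) }
end

# Fingerprint tied to the absolute (one-based) line number.
def line_fingerprint(file, line_number)
  "#{file}:#{line_number}"
end

# Signature tied to the distance from the start of the enclosing scope.
# The output format is simplified for illustration.
def scope_offset_signature(file, lines, scope, line_number)
  offset = (line_number - 1) - scope_start(lines, scope)
  "#{file}|#{scope}:#{offset}"
end

before = ['int main() {', '  char buf[8];', '  gets(buf);', '}']
after = ['#include <stdio.h>', ''] + before # two lines added above main()

# The flagged call moves from line 3 to line 5 of the file.
puts line_fingerprint('test.c', 3) == line_fingerprint('test.c', 5)
puts scope_offset_signature('test.c', before, 'main()', 3) ==
     scope_offset_signature('test.c', after, 'main()', 5)
```

In this sketch the absolute line number changes when lines are added above `main()`, while the distance between the start of `main()` and the flagged line stays the same.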
+ +The predecessor to the `Scope+Offset` method was line-based fingerprinting, which is more +fragile and caused many already detected vulnerabilities to be reported again. +Avoiding duplication was the motivation for implementing the `Scope+Offset` method. +[See the corresponding research issue for more background](https://gitlab.com/groups/gitlab-org/-/epics/4626) (internal). + +## Components + +At a high level, the vulnerability tracking flow is depicted below. For the remainder of this section, we assume that the SAST analyzer and the Tracking Calculator represent the tracking signature *producer* component and the Rails backend represents the tracking signature *consumer* component for the purposes of vulnerability tracking. The components are explained in more detail below. + +``` mermaid +flowchart LR + R["Repository"] + S("SAST Analyzer [CI]") + T("tracking-calculator [CI]") + B("Rails backend") + + R --code--> S --gl-sast-report.json--> T --augmented gl-sast-report.json--> B + R --code --> T +``` + +### Tracking signature producer + +The SAST Analyzer runs in a CI context, analyzes the source code, and produces a `gl-sast-report.json` file. The [Tracking Calculator](https://gitlab.com/gitlab-org/security-products/post-analyzers/tracking-calculator) computes scopes from the source code and matches them with the vulnerabilities listed in `gl-sast-report.json`. If there is a match, Tracking Calculator computes signatures (by means of Scope+Offset) and adds them to the original report (augmenting `gl-sast-report.json`) in the `tracking` object (depicted below). 
+ +``` json + "tracking": { + "type": "source", + "items": [ + { + "file": "test.c", + "line_start": 12, + "line_end": 12, + "signatures": [ + { + "algorithm": "scope_offset_compressed", + "value": "test.c|main()[0]:5" + }, + { + "algorithm": "scope_offset", + "value": "test.c|main()[0]:8" + } + ] + } + ] + } +``` + +Tracking Calculator is directly embedded into the [Docker image of the SAST Analyzer](https://gitlab.com/gitlab-org/security-products/analyzers/semgrep/-/blob/52bedd15745ddb6124662e0dcda331e2e64b000b/Dockerfile#L5) (internal) +and invoked by means of [this script](https://gitlab.com/gitlab-org/security-products/post-analyzers/scripts/-/blob/474cfd78054d97291155045eaef66aa3b7919368/start.sh). + +It is important to note that Tracking Calculator already [performs deduplication](https://gitlab.com/gitlab-org/security-products/post-analyzers/tracking-calculator/-/blob/c7b6f255ad030e6b9da58c12fa87204b8df71129/trackinginfo/sast.go#L127) +that is enabled by default. In the example above we have two different +algorithms `scope_offset_compressed` and `scope_offset` where +`scope_offset_compressed` is considered an improvement of `scope_offset` so +that `scope_offset_compressed` is assigned a higher priority. + +If `scope_offset` and `scope_offset_compressed` agree on the same fingerprint, +only the result from `scope_offset_compressed` would be added as it is +considered the algorithm with the higher priority. + +The report is then ingested into the consumer component where these signatures +are used to generate vulnerability fingerprints by means of the vulnerability +UUID. + +--- + +### Tracking signature consumer + +In the Rails code we differentiate between security findings (findings that +originate from the report) and vulnerability findings (persisted in the DB). 
+Security findings are generated when the [report is parsed](https://gitlab.com/gitlab-org/gitlab/-/blob/e2f0c25d56d7ee5e85e00093331e55197fe66151/lib/gitlab/ci/parsers/security/common.rb#L98); +this is also the place where the [UUID is generated](https://gitlab.com/gitlab-org/gitlab/-/blob/415453f3bf788579f47fb8b471629beb1e063d56/app/services/security/vulnerability_uuid.rb#L6). + +#### Scenario 1: Storing security findings temporarily + +The diagram below depicts the flow that is executed on all pipelines for +storing security findings temporarily. One of the most interesting components +from the vulnerability tracking perspective is the `OverrideUuidsService`. +The `OverrideUuidsService` matches security findings against vulnerability findings on the signature level. If +there is a match, the UUID of the security finding is overwritten +with the UUID of the matching vulnerability finding. The `StoreFindingsService` stores the re-calibrated findings in +the `security_findings` table. Detailed documentation about how +vulnerabilities are created, starting from the security report, is available +in the [security report ingestion overview](security_report_ingestion_overview.md#vulnerability-creation-from-security-reports). 
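
The signature-level matching described above can be sketched as follows. This is a simplified illustration, not the code of `OverrideUuidsService`: the `SecurityFinding` and `VulnerabilityFinding` structs, the `override_uuids` method, and the representation of a signature as an `[algorithm, value]` pair are assumptions made for this example.

```ruby
# Hypothetical, simplified value objects; the real Rails models differ.
SecurityFinding = Struct.new(:uuid, :overridden_uuid, :signatures)
VulnerabilityFinding = Struct.new(:uuid, :signatures)

# Overwrites the UUID of every security finding that shares at least one
# signature (algorithm and value) with a persisted vulnerability finding.
def override_uuids(security_findings, vulnerability_findings)
  # Index the persisted findings by signature for constant-time lookups.
  uuid_by_signature = {}
  vulnerability_findings.each do |vulnerability_finding|
    vulnerability_finding.signatures.each do |signature|
      uuid_by_signature[signature] ||= vulnerability_finding.uuid
    end
  end

  security_findings.each do |finding|
    matched = finding.signatures.find { |signature| uuid_by_signature.key?(signature) }
    next unless matched

    new_uuid = uuid_by_signature[matched]
    next if new_uuid == finding.uuid

    finding.overridden_uuid = finding.uuid # keep the report-derived UUID
    finding.uuid = new_uuid
  end
end

persisted = [VulnerabilityFinding.new('uuid-persisted', [['scope_offset', 'test.c|main()[0]:8']])]
report = [
  SecurityFinding.new('uuid-new', nil, [['scope_offset', 'test.c|main()[0]:8']]),
  SecurityFinding.new('uuid-other', nil, [['scope_offset', 'test.c|helper()[0]:2']])
]

override_uuids(report, persisted)
report.each { |finding| puts "#{finding.uuid} (overridden: #{finding.overridden_uuid.inspect})" }
```

Only the first finding shares a signature with a persisted vulnerability finding, so only its UUID is replaced; the second finding keeps the UUID derived from the report.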
+ +Source Code References: + +- [StoreScansWorker](https://gitlab.com/gitlab-org/gitlab/-/blob/308529403c2d5ec0049b223cf444163bede4672e/ee/app/workers/security/store_scans_worker.rb#L19) +- [StoreScansService](https://gitlab.com/gitlab-org/gitlab/-/blob/308529403c2d5ec0049b223cf444163bede4672e/ee/app/services/security/store_scans_service.rb#L19) +- [StoreGroupedScansService](https://gitlab.com/gitlab-org/gitlab/-/blob/308529403c2d5ec0049b223cf444163bede4672e/ee/app/services/security/store_grouped_scans_service.rb#L60) +- [StoreScanService](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/services/security/store_scan_service.rb#L47) +- [OverrideUuidsService](https://gitlab.com/gitlab-org/gitlab/-/blob/1b2cc434e43b533c0b393b8c319797e69745498e/ee/app/services/security/override_uuids_service.rb) +- [StoreFindingsService](https://gitlab.com/gitlab-org/gitlab/-/blob/308529403c2d5ec0049b223cf444163bede4672e/ee/app/services/security/store_findings_service.rb) + +``` mermaid +sequenceDiagram + Producer->>Sidekiq: gl-sast-report.json + Sidekiq->>StoreScansWorker: <<start>> + StoreScansWorker->>StoreScansService: pipeline id + loop for all artifacts in "grouped" artifacts + StoreScansService->>StoreGroupedScansService: artifacts + + loop for every artifact in artifacts + StoreGroupedScansService->>StoreScanService: artifact + StoreScanService->>OverrideUuidsService: security-report + + StoreScanService->>StoreFindingsService: store findings + end + end +``` + +#### Scenario 2: Merge request security widget + +The second scenario relates to the merge request security widget. 
+ +Source code references: + +- [MergeRequest](https://gitlab.com/gitlab-org/gitlab/-/blob/1172e63f2485b8f3690895a3798f067429d98732/app/models/merge_request.rb?page=2#L1975) +- [CompareSecurityReportsService](https://gitlab.com/gitlab-org/gitlab/-/blob/1172e63f2485b8f3690895a3798f067429d98732/ee/app/services/ci/compare_security_reports_service.rb#L10) +- [VulnerabilityReportsComparer](https://gitlab.com/gitlab-org/gitlab/-/blob/da6e2037cd494ac8b73bc3ee9e69009c4cdcf124/ee/lib/gitlab/ci/reports/security/vulnerability_reports_comparer.rb#L96) + +The `VulnerabilityReportsComparer` computes the number of newly added or fixed +findings. It compares the security findings between the default and +non-default branches to compute the number of added and fixed findings. This +component avoids re-displaying security findings that +correspond to existing vulnerability findings by [recalibrating the security finding UUIDs](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/gitlab/ci/reports/security/vulnerability_reports_comparer.rb#L70). +The logic implemented in the +[`UUIDOverrider`](https://gitlab.com/gitlab-org/gitlab/-/blob/1172e63f2485b8f3690895a3798f067429d98732/ee/lib/gitlab/ci/reports/security/vulnerability_reports_comparer.rb#L161) +is very similar to +[OverrideUuidsService](https://gitlab.com/gitlab-org/gitlab/-/blob/308529403c2d5ec0049b223cf444163bede4672e/ee/app/services/security/store_scan_service.rb#L47). + +``` mermaid +sequenceDiagram + MergeRequestModel->>CompareSecurityReportsService: compare_sast_reports + CompareSecurityReportsService->>VulnerabilityReportsComparer: calculate_changes +``` + +#### Scenario 3: Report ingestion + +This is the point where either a security finding becomes a vulnerability or the +vulnerability that corresponds to a security finding is updated. This scenario +becomes relevant when a pipeline is triggered on the default branch upon merging a +non-default branch into the default branch. 
In our context, we are most +interested in those cases where we have security findings with +`overridden_uuid` set, which implies that there was a clash with an already +existing vulnerability; `overridden_uuid` holds the UUID of the security +finding that was overridden by the corresponding vulnerability UUID. + +The sequence below is executed to update the UUID of a vulnerability +(fingerprint). The recomputation takes place in +`UpdateVulnerabilityUuids`, ultimately invoking a database update by means of the +[`UpdateVulnerabilityUuidsVulnerabilityFinding` class](https://gitlab.com/gitlab-org/gitlab/-/blob/1b2cc434e43b533c0b393b8c319797e69745498e/ee/app/services/security/ingestion/tasks/update_vulnerability_uuids/vulnerability_findings.rb). + +Source code references: + +- [IngestReportsService](https://gitlab.com/gitlab-org/gitlab/-/blob/1b2cc434e43b533c0b393b8c319797e69745498e/ee/app/services/security/ingestion/ingest_reports_service.rb#L55) +- [IngestReportService](https://gitlab.com/gitlab-org/gitlab/-/blob/1b2cc434e43b533c0b393b8c319797e69745498e/ee/app/services/security/ingestion/ingest_report_service.rb#L41) +- [IngestReportSliceService](https://gitlab.com/gitlab-org/gitlab/-/blob/1b2cc434e43b533c0b393b8c319797e69745498e/ee/app/services/security/ingestion/ingest_report_slice_service.rb#L37) +- [UpdateVulnerabilityUuids](https://gitlab.com/gitlab-org/gitlab/-/blob/1b2cc434e43b533c0b393b8c319797e69745498e/ee/app/services/security/ingestion/tasks/update_vulnerability_uuids.rb#L67) +- [FindingMap](https://gitlab.com/gitlab-org/gitlab/-/blob/1b2cc434e43b533c0b393b8c319797e69745498e/ee/app/services/security/ingestion/finding_map.rb) + +``` mermaid +sequenceDiagram + IngestReportsService->>IngestReportService: security_scan + IngestReportService->>IngestReportSliceService: sliced security_scan + IngestReportSliceService->>UpdateVulnerabilityUuids: findings map +``` + +## Hierarchy: Why are algorithms prioritized and what is the impact of this prioritization? 
+ +The supported algorithms are defined in [`VulnerabilityFindingSignatureHelpers`](https://gitlab.com/gitlab-org/gitlab/-/blob/1172e63f2485b8f3690895a3798f067429d98732/app/models/concerns/vulnerability_finding_signature_helpers.rb). Algorithms are assigned priorities (the integer values in the map below). A higher priority indicates that an algorithm is considered better than a lower-priority algorithm. In other words, going from a lower-priority algorithm to a higher-priority algorithm corresponds to `coarsening` (better deduplication performance), and going from a higher-priority algorithm to a lower-priority algorithm corresponds to a `refinement` (weaker deduplication performance). + +``` ruby + ALGORITHM_TYPES = { + hash: 1, + location: 2, + scope_offset: 3, + scope_offset_compressed: 4, + rule_value: 5 + }.with_indifferent_access.freeze +```
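
As a rough illustration of how such a priority map can be used, the sketch below picks the signature with the highest-priority algorithm from a list. The `ALGORITHM_PRIORITIES` constant and the `highest_priority_signature` method are hypothetical names introduced for this example; the map uses plain string keys so that the snippet runs without ActiveSupport, and it is not the code used in the Rails backend.

```ruby
# Hypothetical stand-in for the ALGORITHM_TYPES map above, written with string
# keys so that the sketch runs without ActiveSupport.
ALGORITHM_PRIORITIES = {
  'hash' => 1,
  'location' => 2,
  'scope_offset' => 3,
  'scope_offset_compressed' => 4,
  'rule_value' => 5
}.freeze

# Returns the signature whose algorithm has the highest priority, or nil if
# no signature uses a known algorithm.
def highest_priority_signature(signatures)
  signatures
    .select { |signature| ALGORITHM_PRIORITIES.key?(signature['algorithm']) }
    .max_by { |signature| ALGORITHM_PRIORITIES[signature['algorithm']] }
end

signatures = [
  { 'algorithm' => 'scope_offset', 'value' => 'test.c|main()[0]:8' },
  { 'algorithm' => 'scope_offset_compressed', 'value' => 'test.c|main()[0]:5' }
]

best = highest_priority_signature(signatures)
puts "#{best['algorithm']}: #{best['value']}"
```

With the two signatures from the sample `tracking` object, the sketch selects the `scope_offset_compressed` signature because its priority (4) is higher than that of `scope_offset` (3).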