From 4d1a85ce3853e70280b78c94dc1f59a2cddb04d4 Mon Sep 17 00:00:00 2001
From: Sebastian Rehm <srehm@gitlab.com>
Date: Wed, 31 Jan 2024 08:26:21 +0000
Subject: [PATCH] Change sisense references for internal analytics to tableau

---
 doc/development/internal_analytics/index.md  | 72 ++++++++++++++-----
 .../metrics/metrics_dictionary.md            |  4 --
 .../metrics/metrics_lifecycle.md             |  4 +-
 .../internal_analytics/service_ping/index.md |  6 +-
 4 files changed, 59 insertions(+), 27 deletions(-)

diff --git a/doc/development/internal_analytics/index.md b/doc/development/internal_analytics/index.md
index 13ecaddaf705f..a36180e703c7d 100644
--- a/doc/development/internal_analytics/index.md
+++ b/doc/development/internal_analytics/index.md
@@ -64,28 +64,48 @@ On our SaaS instance both individual events and pre-computed metrics are availab
 Additionally for SaaS page views are automatically instrumented.
 For self-managed only the metrics instrumented on the version installed on the instance are available.
 
+### Events
+
+Events are collected in real time but processed asynchronously.
+In general, events are available in the data warehouse within 48 hours of being fired, but they may be available earlier.
+
+### Metrics
+
+Metrics are computed and sent once per week for every instance. On GitLab.com this happens on Sunday, and the newest values become available throughout Monday.
+On self-managed instances, the schedule depends on the particular instance. In general, only the metrics instrumented for the installed GitLab version are sent.
+
 ## Data discovery
 
-The data visualization tools [Sisense](https://about.gitlab.com/handbook/business-technology/data-team/platform/sisensecdt/) and [Tableau](https://about.gitlab.com/handbook/business-technology/data-team/platform/tableau/),
-which have access to our Data Warehouse, can be used to query the internal analytics data.
+Event and metrics data is ultimately stored in our [Snowflake data warehouse](https://handbook.gitlab.com/handbook/business-technology/data-team/platform/snowflake/).
+It can either be accessed directly via SQL in Snowflake for [ad-hoc analyses](https://handbook.gitlab.com/handbook/business-technology/data-team/platform/#snowflake-analyst) or visualized in our data visualization tool
+[Tableau](https://about.gitlab.com/handbook/business-technology/data-team/platform/tableau/), which has access to Snowflake.
+Both platforms require an access request ([Snowflake](https://handbook.gitlab.com/handbook/business-technology/data-team/platform/#warehouse-access), [Tableau](https://handbook.gitlab.com/handbook/business-technology/data-team/platform/tableau/#tableau-online-access)).
 
-### Querying metrics
+### Tableau
 
-The following example query returns all values reported for `count_distinct_user_id_from_feature_used_7d` within the last six months and the according `instance_id`:
+Tableau is a data visualization platform that supports building dashboards and GUI-based discovery of events and metrics.
+This method of discovery is best suited for users familiar with business intelligence tooling, for basic verifications,
+and for creating persisted, shareable dashboards and visualizations.
+Access to Tableau requires an [access request](https://handbook.gitlab.com/handbook/business-technology/data-team/platform/tableau/#tableau-online-access).
 
-```sql
-SELECT
-  date_trunc('week', ping_created_at),
-  dim_instance_id,
-  metric_value
-FROM common.fct_ping_instance_metric_rolling_6_months --model limited to last 6 months for performance
-WHERE metrics_path = 'counts.users_visiting_dashboard_weekly' --set to metric of interest
-ORDER BY ping_created_at DESC
-```
+#### Checking events
 
-For a list of other metrics tables refer to the [Data Models Cheat Sheet](https://handbook.gitlab.com/handbook/product/product-analysis/data-model-cheat-sheet/#commonly-used-data-models).
+Visit the [Snowplow event exploration dashboard](https://10az.online.tableau.com/#/site/gitlab/views/SnowplowEventExplorationLast30Days/SnowplowEventExplorationLast30D?:iid=1).
+This dashboard shows event counts and the most frequently fired events.
+You can scroll down to the "Structured Events Firing in Production Last 30 Days" chart and filter for your specific event action. The filter only works with exact names.
+
+#### Checking metrics
+
+You can visit the [Metrics exploration dashboard](https://10az.online.tableau.com/#/site/gitlab/views/PDServicePingExplorationDashboard/MetricsExploration).
+The side panel has a filter for the metric path, which is the `key_path` of your metric, and a filter for the installation ID, including guidance on how to filter for GitLab.com.
 
-### Querying events
+### Snowflake
+
+Snowflake allows direct querying of relevant tables in the warehouse within its UI, using the [Snowflake SQL dialect](https://docs.snowflake.com/en/sql-reference-commands).
+This method of discovery is best suited for users familiar with SQL, and for quick, flexible checks of whether data is propagated correctly.
+Access to Snowflake requires an [access request](https://handbook.gitlab.com/handbook/business-technology/data-team/platform/#warehouse-access).
+
+#### Querying events
 
 The following example query returns the number of daily event occurrences for the `feature_used` event.
 
@@ -100,7 +120,23 @@ AND app_id='gitlab' -- use gitlab for production events and gitlab-staging for e
 GROUP BY 1
 ORDER BY 1 desc
 ```
-For a list of other event tables refer to the [Data Models Cheat Sheet](https://handbook.gitlab.com/handbook/product/product-analysis/data-model-cheat-sheet/#commonly-used-data-models-2).
+For a list of other metrics tables refer to the [Data Models Cheat Sheet](https://handbook.gitlab.com/handbook/product/product-analysis/data-model-cheat-sheet/#commonly-used-data-models).
+
+#### Querying metrics
+
+The following example query returns all values reported for `counts.users_visiting_dashboard_weekly` within the last six months and the corresponding `dim_instance_id`:
+
+```sql
+SELECT
+  date_trunc('week', ping_created_at),
+  dim_instance_id,
+  metric_value
+FROM common.fct_ping_instance_metric_rolling_6_months --model limited to last 6 months for performance
+WHERE metrics_path = 'counts.users_visiting_dashboard_weekly' --set to metric of interest
+ORDER BY ping_created_at DESC
+```
+
+For a list of other metrics tables refer to the [Data Models Cheat Sheet](https://handbook.gitlab.com/handbook/product/product-analysis/data-model-cheat-sheet/#commonly-used-data-models).
 
 ## Data flow
 
@@ -131,8 +167,8 @@ flowchart LR;
     end
   end
   snowplow[\Snowplow Pipeline\]
-  snowflake[(Data Warehouse)]
-  vis[Dashboards in Sisense/Tableau]
+  snowflake[(Snowflake Data Warehouse)]
+  vis[Dashboards in Tableau]
 ```
 
 ## Data Privacy
diff --git a/doc/development/internal_analytics/metrics/metrics_dictionary.md b/doc/development/internal_analytics/metrics/metrics_dictionary.md
index c88479b60e188..a6db13f5b88a5 100644
--- a/doc/development/internal_analytics/metrics/metrics_dictionary.md
+++ b/doc/development/internal_analytics/metrics/metrics_dictionary.md
@@ -162,7 +162,3 @@ To use a metric definition to manage [performance indicator](https://about.gitla
 [Metrics Dictionary is a separate application](https://gitlab.com/gitlab-org/analytics-section/analytics-instrumentation/metric-dictionary).
 
 All metrics available in Service Ping are in the [Metrics Dictionary](https://metrics.gitlab.com/).
-
-### Copy query to clipboard
-
-To check if a metric has data in Sisense, use the copy query to clipboard feature. This copies a query that's ready to use in Sisense. The query gets the last five service ping data for GitLab.com for a given metric. For information about how to check if a Service Ping metric has data in Sisense, see this [demo](https://www.youtube.com/watch?v=n4o65ivta48).
diff --git a/doc/development/internal_analytics/metrics/metrics_lifecycle.md b/doc/development/internal_analytics/metrics/metrics_lifecycle.md
index 681992b43797f..4402013b784e4 100644
--- a/doc/development/internal_analytics/metrics/metrics_lifecycle.md
+++ b/doc/development/internal_analytics/metrics/metrics_lifecycle.md
@@ -33,12 +33,12 @@ Currently, the [Metrics Dictionary](https://metrics.gitlab.com/) is built automa
 ## Remove a metric
 
 WARNING:
-If a metric is not used in Sisense or any other system after 6 months, the
+If a metric is not used in Tableau or any other system after 6 months, the
 Analytics Instrumentation team marks it as inactive and assigns it to the group owner for review.
 
 We are working on automating this process. See [this epic](https://gitlab.com/groups/gitlab-org/-/epics/8988) for details.
 
-Analytics Instrumentation removes metrics from Service Ping if they are not used in any Sisense dashboard.
+Analytics Instrumentation removes metrics from Service Ping if they are not used in any Tableau dashboard.
 
 For an example of the metric removal process, see this [example issue](https://gitlab.com/gitlab-org/gitlab/-/issues/388236).
 
diff --git a/doc/development/internal_analytics/service_ping/index.md b/doc/development/internal_analytics/service_ping/index.md
index eb0e384b10df1..5ae7594dbea8e 100644
--- a/doc/development/internal_analytics/service_ping/index.md
+++ b/doc/development/internal_analytics/service_ping/index.md
@@ -43,7 +43,7 @@ We use the following terminology to describe the Service Ping components:
 
 ## Service Ping request flow
 
-The following example shows a basic request/response flow between a GitLab instance, the Versions Application, the License Application, Salesforce, the GitLab S3 Bucket, the GitLab Snowflake Data Warehouse, and Sisense:
+The following example shows a basic request/response flow between a GitLab instance, the Versions Application, the License Application, Salesforce, the GitLab S3 Bucket, the GitLab Snowflake Data Warehouse, and Tableau:
 
 ```mermaid
 sequenceDiagram
@@ -53,7 +53,7 @@ sequenceDiagram
     participant Salesforce
     participant S3 Bucket
     participant Snowflake DW
-    participant Sisense Dashboards
+    participant Tableau Dashboards
     GitLab Instance->>Versions Application: Send Service Ping
     loop Process usage data
         Versions Application->>Versions Application: Parse usage data
@@ -70,7 +70,7 @@ sequenceDiagram
     Versions Application->>S3 Bucket: Export Versions database
     S3 Bucket->>Snowflake DW: Import data
     Snowflake DW->>Snowflake DW: Transform data using dbt
-    Snowflake DW->>Sisense Dashboards: Data available for querying
+    Snowflake DW->>Tableau Dashboards: Data available for querying
     Versions Application->>GitLab Instance: DevOps Score (Conversational Development Index)
 ```
 
-- 
GitLab
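
The "Querying metrics" example in the patch hard-codes the metric path into the SQL. As a minimal sketch, assuming you have Snowflake access and a Python driver such as `snowflake-connector-python`, the same query can be run through any DB-API 2.0 cursor with the metric path bound as a parameter. The helper name `fetch_metric_values` and the `ping_week` alias are hypothetical, and the `%s` placeholder assumes the driver's `pyformat`/`format` paramstyle.

```python
"""Sketch: run the "Querying metrics" example through a DB-API 2.0 cursor."""

from typing import Any, List, Sequence

# Same table and columns as the "Querying metrics" example, except that the
# metric path is a bound parameter instead of a string literal.
# Assumption: the driver uses the "pyformat"/"format" paramstyle, so %s is
# the placeholder (snowflake-connector-python defaults to "pyformat").
METRIC_QUERY = """
SELECT
  date_trunc('week', ping_created_at) AS ping_week,
  dim_instance_id,
  metric_value
FROM common.fct_ping_instance_metric_rolling_6_months -- last 6 months only
WHERE metrics_path = %s
ORDER BY ping_created_at DESC
"""


def fetch_metric_values(cursor: Any, metrics_path: str) -> List[Sequence[Any]]:
    """Return (ping_week, dim_instance_id, metric_value) rows for one metric.

    Hypothetical helper: `cursor` is any DB-API 2.0 cursor, for example
    one obtained from snowflake.connector.connect(...).cursor().
    """
    # The driver quotes the bound value, so the key_path needs no manual escaping.
    cursor.execute(METRIC_QUERY, (metrics_path,))
    return cursor.fetchall()
```

For example, `fetch_metric_values(cur, "counts.users_visiting_dashboard_weekly")` returns the same rows as the literal query in the patch. Binding the path instead of editing the SQL string keeps the query reusable for any `key_path` and avoids quoting mistakes.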