From 20f46a4f7bf0d82fe39b1ab960241d4bfef77830 Mon Sep 17 00:00:00 2001
From: Sebastian Rehm <srehm@gitlab.com>
Date: Thu, 5 Oct 2023 15:49:23 +0000
Subject: [PATCH] Add internal analytics fundamentals docs to index page

---
 doc/development/internal_analytics/index.md | 79 ++++++++++++++++++++-
 1 file changed, 76 insertions(+), 3 deletions(-)

diff --git a/doc/development/internal_analytics/index.md b/doc/development/internal_analytics/index.md
index d24ecf5a99cce..a3e065d775fdc 100644
--- a/doc/development/internal_analytics/index.md
+++ b/doc/development/internal_analytics/index.md
@@ -6,7 +6,80 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 
 # Internal analytics
 
-Learn how to instrument your features on GitLab using:
+The internal analytics system provides the ability to track user behavior and system status for a GitLab instance
+to inform customer success services and further product development.
 
-- [Service Ping](service_ping/index.md)
-- [Snowplow](snowplow/index.md)
+These doc pages provide guides and information on how to leverage internal analytics capabilities of GitLab
+when developing new features or instrumenting existing ones.
+
+## Fundamental concepts
+
+Events and metrics are the foundation of the internal analytics system.
+Understanding the difference between the two concepts is vital to using the system.
+
+### Event
+
+An event is a record of an action that happened within the GitLab instance.
+An example action would be a user interaction like visiting the issue page or hovering the mouse cursor over the top navigation search.
+Other actions can result from background system processing, like a scheduled pipeline succeeding or receiving API calls from third-party systems.
+Not every action is tracked and thereby turned into a recorded event automatically.
+Instead, if an action helps draw out product insights and helps to make more educated business decisions, we can track an event when the action happens.
+The produced event record, at the minimum, holds information that the action occurred,
+but it can also contain additional details about the context that accompanied this action.
+An example of context can be information about who performed the action or the state of the system at the time of the action.
+
+### Metric
+
+A single event record is not informative enough and might be caused by a coincidence.
+We need to look for sets of events sharing common traits to have a foundation for analysis.
+This is where metrics come into play. A metric is a calculation performed on pieces of information.
+For example, a single event documenting a paid user visiting the feature's page after a new feature was released tells us nothing about the success of this new feature.
+However, if we count the number of page view events happening in the week before the new feature release
+and then compare it with the number of events for the week following the feature release,
+we can derive insights about the increase in interest due to the release of the new feature.
+
+This process leads to what we call a metric. An event-based metric always counts events for a specified time frame, like a week.
+The same event can be used across different metrics, and a metric can count either one or multiple events.
+The count can, but does not have to, be based on a uniqueness criterion, such as only counting distinct users who performed an event.
+
+Metrics do not have to be based on events. Metrics can also be observations about the state of a GitLab instance itself,
+such as the value of a setting or the count of rows in a database table.
+
+## Data flow
+
+For GitLab, there is an essential difference in analytics setup between SaaS and self-managed or GitLab Dedicated instances.
+On SaaS, event records are sent directly to a collection system, called Snowplow, and imported into our data warehouse.
+Self-managed and GitLab Dedicated instances record event counts locally. Every week, a process called Service Ping sends the current
+values for all pre-defined and active metrics to our data warehouse. For GitLab.com, metrics are calculated directly in the data warehouse.
+
+The following chart aims to illustrate this data flow:
+
+```mermaid
+flowchart LR;
+    feature-->track
+    track-->|send event record - only on gitlab.com|snowplow
+    track-->|increase metric counts|redis
+    database-->service_ping
+    redis-->service_ping
+    service_ping-->|json with metric values - weekly export|snowflake
+    snowplow-->|event records - continuous import|snowflake
+    snowflake-->vis
+
+    subgraph glb[GitLab Application]
+        feature[Feature Code]
+        subgraph events[Internal Analytics Code]
+            track[track_event / trackEvent]
+            redis[(Redis)]
+            database[(Database)]
+            service_ping[\Service Ping Process\]
+        end
+    end
+    snowplow[\Snowplow Pipeline\]
+    snowflake[(Data Warehouse)]
+    vis[Dashboards in Sisense/Tableau]
+```
+
+## Instrumentation
+
+- To instrument an event-based metric, please look into the [internal event tracking quick start guide](internal_event_instrumentation/quick_start.md).
+- To instrument a metric that observes the GitLab instance's state, please start with [the service ping implementation](service_ping/implement.md).
-- 
GitLab