From 20f46a4f7bf0d82fe39b1ab960241d4bfef77830 Mon Sep 17 00:00:00 2001
From: Sebastian Rehm <srehm@gitlab.com>
Date: Thu, 5 Oct 2023 15:49:23 +0000
Subject: [PATCH] Add internal analytics fundamentals docs to index page

---
 doc/development/internal_analytics/index.md | 79 ++++++++++++++++++++-
 1 file changed, 76 insertions(+), 3 deletions(-)

diff --git a/doc/development/internal_analytics/index.md b/doc/development/internal_analytics/index.md
index d24ecf5a99cce..a3e065d775fdc 100644
--- a/doc/development/internal_analytics/index.md
+++ b/doc/development/internal_analytics/index.md
@@ -6,7 +6,80 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 
 # Internal analytics
 
-Learn how to instrument your features on GitLab using:
+The internal analytics system provides the ability to track user behavior and system status for a GitLab instance
+to inform customer success services and further product development.
 
-- [Service Ping](service_ping/index.md)
-- [Snowplow](snowplow/index.md)
+These doc pages provide guides and information on how to leverage internal analytics capabilities of GitLab
+when developing new features or instrumenting existing ones.
+
+## Fundamental concepts
+
+Events and metrics are the foundation of the internal analytics system.
+Understanding the difference between the two concepts is vital to using the system.
+
+### Event
+
+An event is a record of an action that happened within the GitLab instance.
+An example action would be a user interaction like visiting the issue page or hovering the mouse cursor over the top navigation search.
+Other actions can result from background system processing, like a scheduled pipeline succeeding or receiving API calls from a third-party system.
+Not every action is tracked and thereby turned into a recorded event automatically.
+Instead, if an action helps draw out product insights and supports better-informed business decisions, we can track an event when the action happens.
+The produced event record, at the minimum, holds information that the action occurred,
+but it can also contain additional details about the context that accompanied this action.
+An example of context can be information about who performed the action or the state of the system at the time of the action.
+
+### Metric
+
+A single event record is not informative enough and might be caused by a coincidence.
+We need to look for sets of events sharing common traits to have a foundation for analysis.
+This is where metrics come into play. A metric is a calculation performed on pieces of information.
+For example, a single event documenting a paid user visiting the feature's page after a new feature was released tells us nothing about the success of this new feature.
+However, if we count the number of page view events happening in the week before the new feature release
+and then compare it with the number of events for the week following the feature release,
+we can derive insights about the increase in interest due to the release of the new feature.
+
+This process leads to what we call a metric. An event-based metric always looks at events and counts them for a specified time frame, like a week.
+The same event can be used across different metrics and a metric can count either one or multiple events.
+The count can, but does not have to, be based on a uniqueness criterion, such as only counting distinct users who performed an event.
+
+Metrics do not have to be based on events. Metrics can also be observations about the state of a GitLab instance itself,
+such as the value of a setting or the count of rows in a database table.
+
+## Data flow
+
+For GitLab, there is an essential difference in analytics setup between SaaS and self-managed or GitLab Dedicated instances.
+On SaaS, event records are sent directly to a collection system, called Snowplow, and imported into our data warehouse.
+Self-managed and GitLab Dedicated instances record event counts locally. Every week, a process called Service Ping sends the current
+values for all pre-defined and active metrics to our data warehouse. For GitLab.com, metrics are calculated directly in the data warehouse.
+
+The following chart aims to illustrate this data flow:
+
+```mermaid
+flowchart LR;
+    feature-->track
+    track-->|send event record - only on GitLab.com|snowplow
+    track-->|increase metric counts|redis
+    database-->service_ping
+    redis-->service_ping
+    service_ping-->|json with metric values - weekly export|snowflake
+    snowplow-->|event records - continuous import|snowflake
+    snowflake-->vis
+    
+    subgraph glb[GitLab Application]
+        feature[Feature Code]
+        subgraph events[Internal Analytics Code]
+            track[track_event / trackEvent]
+            redis[(Redis)]
+            database[(Database)]
+            service_ping[\Service Ping Process\]
+        end
+    end
+    snowplow[\Snowplow Pipeline\]
+    snowflake[(Data Warehouse)]
+    vis[Dashboards in Sisense/Tableau]
+```
+
+## Instrumentation
+
+- To instrument an event-based metric, see the [internal event tracking quick start guide](internal_event_instrumentation/quick_start.md).
+- To instrument a metric that observes the GitLab instance's state, start with [the service ping implementation](service_ping/implement.md).
-- 
GitLab