From 636dd736714880591fe7c54a79dff975780ab7c8 Mon Sep 17 00:00:00 2001
From: Amy Qualls <aqualls@gitlab.com>
Date: Thu, 4 Aug 2022 11:21:15 +0000
Subject: [PATCH] Move order-table-columns page to database dir

This file belongs in doc/development/database/ with its friends.
---
 doc/development/database/index.md             |   2 +-
 .../database/ordering_table_columns.md        | 152 +++++++++++++++++
 doc/development/database_review.md            |   6 +-
 doc/development/ordering_table_columns.md     | 155 +-----------------
 4 files changed, 163 insertions(+), 152 deletions(-)
 create mode 100644 doc/development/database/ordering_table_columns.md

diff --git a/doc/development/database/index.md b/doc/development/database/index.md
index b427f54ff3cb9..0183ccde98581 100644
--- a/doc/development/database/index.md
+++ b/doc/development/database/index.md
@@ -53,7 +53,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 - [Storing SHA1 hashes as binary](../sha1_as_binary.md)
 - [Iterating tables in batches](../iterating_tables_in_batches.md)
 - [Insert into tables in batches](../insert_into_tables_in_batches.md)
-- [Ordering table columns](../ordering_table_columns.md)
+- [Ordering table columns](ordering_table_columns.md)
 - [Verifying database capabilities](../verifying_database_capabilities.md)
 - [Database Debugging and Troubleshooting](../database_debugging.md)
 - [Query Count Limits](../query_count_limits.md)
diff --git a/doc/development/database/ordering_table_columns.md b/doc/development/database/ordering_table_columns.md
new file mode 100644
index 0000000000000..7cd3d4fb208d4
--- /dev/null
+++ b/doc/development/database/ordering_table_columns.md
@@ -0,0 +1,152 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Ordering Table Columns in PostgreSQL
+
+For GitLab we require that columns of new tables are ordered to use the
+least amount of space. An easy way of doing this is to order them based on the
+type size in descending order with variable sizes (`text`, `varchar`, arrays,
+`json`, `jsonb`, and so on) at the end.
+
+Similar to C structures the space of a table is influenced by the order of
+columns. This is because the size of columns is aligned depending on the type of
+the following column. Let's consider an example:
+
+- `id` (integer, 4 bytes)
+- `name` (text, variable)
+- `user_id` (integer, 4 bytes)
+
+The first column is a 4-byte integer. The next is text of variable length. The
+`text` data type requires 1-word alignment, and on 64-bit platform, 1 word is 8
+bytes. To meet the alignment requirements, four zeros are to be added right
+after the first column, so `id` occupies 4 bytes, then 4 bytes of alignment
+padding, and only next `name` is being stored. Therefore, in this case, 8 bytes
+are spent for storing a 4-byte integer.
+
+The space between rows is also subject to alignment padding. The `user_id`
+column takes only 4 bytes, and on 64-bit platform, 4 zeroes are added for
+alignment padding, to allow storing the next row beginning with the "clear" word.
+
+As a result, the actual size of each column would be (omitting variable length
+data and 24-byte tuple header): 8 bytes, variable, 8 bytes. This means that
+each row requires at least 16 bytes for the two 4-byte integers. If a table
+has a few rows this is not an issue. However, once you start storing millions of
+rows you can save space by using a different order. For the above example, the
+ideal column order would be the following:
+
+- `id` (integer, 4 bytes)
+- `user_id` (integer, 4 bytes)
+- `name` (text, variable)
+
+or
+
+- `name` (text, variable)
+- `id` (integer, 4 bytes)
+- `user_id` (integer, 4 bytes)
+
+In these examples, the `id` and `user_id` columns are packed together, which
+means we only need 8 bytes to store _both_ of them. This in turn means each row
+requires 8 bytes less space.
+
+Since Ruby on Rails 5.1, the default data type for IDs is `bigint`, which uses 8 bytes.
+We are using `integer` in the examples to showcase a more realistic reordering scenario.
+
+## Type Sizes
+
+While the [PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype.html) contains plenty
+of information we list the sizes of common types here so it's easier to
+look them up. Here "word" refers to the word size, which is 4 bytes for a 32
+bits platform and 8 bytes for a 64 bits platform.
+
+| Type             | Size                                 | Alignment needed |
+|:-----------------|:-------------------------------------|:-----------|
+| `smallint`         | 2 bytes                              | 1 word     |
+| `integer`          | 4 bytes                              | 1 word     |
+| `bigint`           | 8 bytes                              | 8 bytes    |
+| `real`             | 4 bytes                              | 1 word     |
+| `double precision` | 8 bytes                              | 8 bytes    |
+| `boolean`          | 1 byte                               | not needed |
+| `text` / `string`  | variable, 1 byte plus the data       | 1 word     |
+| `bytea`            | variable, 1 or 4 bytes plus the data | 1 word     |
+| `timestamp`        | 8 bytes                              | 8 bytes    |
+| `timestamptz`      | 8 bytes                              | 8 bytes    |
+| `date`             | 4 bytes                              | 1 word     |
+
+A "variable" size means the actual size depends on the value being stored. If
+PostgreSQL determines this can be embedded directly into a row it may do so, but
+for very large values it stores the data externally and store a pointer (of
+1 word in size) in the column. Because of this variable sized columns should
+always be at the end of a table.
+
+## Real Example
+
+Let's use the `events` table as an example, which currently has the following
+layout:
+
+| Column        | Type                        | Size     |
+|:--------------|:----------------------------|:---------|
+| `id`          | integer                     | 4 bytes  |
+| `target_type` | character varying           | variable |
+| `target_id`   | integer                     | 4 bytes  |
+| `title`       | character varying           | variable |
+| `data`        | text                        | variable |
+| `project_id`  | integer                     | 4 bytes  |
+| `created_at`  | timestamp without time zone | 8 bytes  |
+| `updated_at`  | timestamp without time zone | 8 bytes  |
+| `action`      | integer                     | 4 bytes  |
+| `author_id`   | integer                     | 4 bytes  |
+
+After adding padding to align the columns this would translate to columns being
+divided into fixed size chunks as follows:
+
+| Chunk Size | Columns               |
+|:-----------|:----------------------|
+| 8 bytes    | `id`                  |
+| variable   | `target_type`         |
+| 8 bytes    | `target_id`           |
+| variable   | `title`               |
+| variable   | `data`                |
+| 8 bytes    | `project_id`          |
+| 8 bytes    | `created_at`          |
+| 8 bytes    | `updated_at`          |
+| 8 bytes    | `action`, `author_id` |
+
+This means that excluding the variable sized data and tuple header, we need at
+least 8 * 6 = 48 bytes per row.
+
+We can optimise this by using the following column order instead:
+
+| Column        | Type                        | Size     |
+|:--------------|:----------------------------|:---------|
+| `created_at`  | timestamp without time zone | 8 bytes  |
+| `updated_at`  | timestamp without time zone | 8 bytes  |
+| `id`          | integer                     | 4 bytes  |
+| `target_id`   | integer                     | 4 bytes  |
+| `project_id`  | integer                     | 4 bytes  |
+| `action`      | integer                     | 4 bytes  |
+| `author_id`   | integer                     | 4 bytes  |
+| `target_type` | character varying           | variable |
+| `title`       | character varying           | variable |
+| `data`        | text                        | variable |
+
+This would produce the following chunks:
+
+| Chunk Size | Columns                |
+|:-----------|:-----------------------|
+| 8 bytes    | `created_at`           |
+| 8 bytes    | `updated_at`           |
+| 8 bytes    | `id`, `target_id`      |
+| 8 bytes    | `project_id`, `action` |
+| 8 bytes    | `author_id`            |
+| variable   | `target_type`          |
+| variable   | `title`                |
+| variable   | `data`                 |
+
+Here we only need 40 bytes per row excluding the variable sized data and 24-byte
+tuple header. 8 bytes being saved may not sound like much, but for tables as
+large as the `events` table it does begin to matter. For example, when storing
+80 000 000 rows this translates to a space saving of at least 610 MB, all by
+just changing the order of a few columns.
diff --git a/doc/development/database_review.md b/doc/development/database_review.md
index 37643f04db7ab..df589e849a3b3 100644
--- a/doc/development/database_review.md
+++ b/doc/development/database_review.md
@@ -200,7 +200,7 @@ Include in the MR description:
 
 #### Preparation when adding tables
 
-- Order columns based on the [Ordering Table Columns](ordering_table_columns.md) guidelines.
+- Order columns based on the [Ordering Table Columns](database/ordering_table_columns.md) guidelines.
 - Add foreign keys to any columns pointing to data in other tables, including [an index](migration_style_guide.md#adding-foreign-key-constraints).
 - Add indexes for fields that are used in statements such as `WHERE`, `ORDER BY`, `GROUP BY`, and `JOIN`s.
 - New tables and columns are not necessarily risky, but over time some access patterns are inherently
@@ -225,7 +225,7 @@ Include in the MR description:
     - Consider [access patterns and data layout](database/layout_and_access_patterns.md) if new tables or columns are added.
   - Review migrations follow [database migration style guide](migration_style_guide.md),
     for example
-    - [Check ordering of columns](ordering_table_columns.md)
+    - [Check ordering of columns](database/ordering_table_columns.md)
     - [Check indexes are present for foreign keys](migration_style_guide.md#adding-foreign-key-constraints)
   - Ensure that migrations execute in a transaction or only contain
     concurrent index/foreign key helpers (with transactions disabled)
@@ -256,7 +256,7 @@ Include in the MR description:
 - Check migrations are reversible and implement a `#down` method
 - Check new table migrations:
   - Are the stated access patterns and volume reasonable? Do the assumptions they're based on seem sound? Do these patterns pose risks to stability?
-  - Are the columns [ordered to conserve space](ordering_table_columns.md)?
+  - Are the columns [ordered to conserve space](database/ordering_table_columns.md)?
   - Are there foreign keys for references to other tables?
 - Check data migrations:
   - Establish a time estimate for execution on GitLab.com.
diff --git a/doc/development/ordering_table_columns.md b/doc/development/ordering_table_columns.md
index 7cd3d4fb208d4..b665cb0d4c79c 100644
--- a/doc/development/ordering_table_columns.md
+++ b/doc/development/ordering_table_columns.md
@@ -1,152 +1,11 @@
 ---
-stage: Data Stores
-group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/ordering_table_columns.md'
+remove_date: '2022-11-04'
 ---
 
-# Ordering Table Columns in PostgreSQL
+This document was moved to [another location](database/ordering_table_columns.md).
 
-For GitLab we require that columns of new tables are ordered to use the
-least amount of space. An easy way of doing this is to order them based on the
-type size in descending order with variable sizes (`text`, `varchar`, arrays,
-`json`, `jsonb`, and so on) at the end.
-
-Similar to C structures the space of a table is influenced by the order of
-columns. This is because the size of columns is aligned depending on the type of
-the following column. Let's consider an example:
-
-- `id` (integer, 4 bytes)
-- `name` (text, variable)
-- `user_id` (integer, 4 bytes)
-
-The first column is a 4-byte integer. The next is text of variable length. The
-`text` data type requires 1-word alignment, and on 64-bit platform, 1 word is 8
-bytes. To meet the alignment requirements, four zeros are to be added right
-after the first column, so `id` occupies 4 bytes, then 4 bytes of alignment
-padding, and only next `name` is being stored. Therefore, in this case, 8 bytes
-are spent for storing a 4-byte integer.
-
-The space between rows is also subject to alignment padding. The `user_id`
-column takes only 4 bytes, and on 64-bit platform, 4 zeroes are added for
-alignment padding, to allow storing the next row beginning with the "clear" word.
-
-As a result, the actual size of each column would be (omitting variable length
-data and 24-byte tuple header): 8 bytes, variable, 8 bytes. This means that
-each row requires at least 16 bytes for the two 4-byte integers. If a table
-has a few rows this is not an issue. However, once you start storing millions of
-rows you can save space by using a different order. For the above example, the
-ideal column order would be the following:
-
-- `id` (integer, 4 bytes)
-- `user_id` (integer, 4 bytes)
-- `name` (text, variable)
-
-or
-
-- `name` (text, variable)
-- `id` (integer, 4 bytes)
-- `user_id` (integer, 4 bytes)
-
-In these examples, the `id` and `user_id` columns are packed together, which
-means we only need 8 bytes to store _both_ of them. This in turn means each row
-requires 8 bytes less space.
-
-Since Ruby on Rails 5.1, the default data type for IDs is `bigint`, which uses 8 bytes.
-We are using `integer` in the examples to showcase a more realistic reordering scenario.
-
-## Type Sizes
-
-While the [PostgreSQL documentation](https://www.postgresql.org/docs/current/datatype.html) contains plenty
-of information we list the sizes of common types here so it's easier to
-look them up. Here "word" refers to the word size, which is 4 bytes for a 32
-bits platform and 8 bytes for a 64 bits platform.
-
-| Type             | Size                                 | Alignment needed |
-|:-----------------|:-------------------------------------|:-----------|
-| `smallint`         | 2 bytes                              | 1 word     |
-| `integer`          | 4 bytes                              | 1 word     |
-| `bigint`           | 8 bytes                              | 8 bytes    |
-| `real`             | 4 bytes                              | 1 word     |
-| `double precision` | 8 bytes                              | 8 bytes    |
-| `boolean`          | 1 byte                               | not needed |
-| `text` / `string`  | variable, 1 byte plus the data       | 1 word     |
-| `bytea`            | variable, 1 or 4 bytes plus the data | 1 word     |
-| `timestamp`        | 8 bytes                              | 8 bytes    |
-| `timestamptz`      | 8 bytes                              | 8 bytes    |
-| `date`             | 4 bytes                              | 1 word     |
-
-A "variable" size means the actual size depends on the value being stored. If
-PostgreSQL determines this can be embedded directly into a row it may do so, but
-for very large values it stores the data externally and store a pointer (of
-1 word in size) in the column. Because of this variable sized columns should
-always be at the end of a table.
-
-## Real Example
-
-Let's use the `events` table as an example, which currently has the following
-layout:
-
-| Column        | Type                        | Size     |
-|:--------------|:----------------------------|:---------|
-| `id`          | integer                     | 4 bytes  |
-| `target_type` | character varying           | variable |
-| `target_id`   | integer                     | 4 bytes  |
-| `title`       | character varying           | variable |
-| `data`        | text                        | variable |
-| `project_id`  | integer                     | 4 bytes  |
-| `created_at`  | timestamp without time zone | 8 bytes  |
-| `updated_at`  | timestamp without time zone | 8 bytes  |
-| `action`      | integer                     | 4 bytes  |
-| `author_id`   | integer                     | 4 bytes  |
-
-After adding padding to align the columns this would translate to columns being
-divided into fixed size chunks as follows:
-
-| Chunk Size | Columns               |
-|:-----------|:----------------------|
-| 8 bytes    | `id`                  |
-| variable   | `target_type`         |
-| 8 bytes    | `target_id`           |
-| variable   | `title`               |
-| variable   | `data`                |
-| 8 bytes    | `project_id`          |
-| 8 bytes    | `created_at`          |
-| 8 bytes    | `updated_at`          |
-| 8 bytes    | `action`, `author_id` |
-
-This means that excluding the variable sized data and tuple header, we need at
-least 8 * 6 = 48 bytes per row.
-
-We can optimise this by using the following column order instead:
-
-| Column        | Type                        | Size     |
-|:--------------|:----------------------------|:---------|
-| `created_at`  | timestamp without time zone | 8 bytes  |
-| `updated_at`  | timestamp without time zone | 8 bytes  |
-| `id`          | integer                     | 4 bytes  |
-| `target_id`   | integer                     | 4 bytes  |
-| `project_id`  | integer                     | 4 bytes  |
-| `action`      | integer                     | 4 bytes  |
-| `author_id`   | integer                     | 4 bytes  |
-| `target_type` | character varying           | variable |
-| `title`       | character varying           | variable |
-| `data`        | text                        | variable |
-
-This would produce the following chunks:
-
-| Chunk Size | Columns                |
-|:-----------|:-----------------------|
-| 8 bytes    | `created_at`           |
-| 8 bytes    | `updated_at`           |
-| 8 bytes    | `id`, `target_id`      |
-| 8 bytes    | `project_id`, `action` |
-| 8 bytes    | `author_id`            |
-| variable   | `target_type`          |
-| variable   | `title`                |
-| variable   | `data`                 |
-
-Here we only need 40 bytes per row excluding the variable sized data and 24-byte
-tuple header. 8 bytes being saved may not sound like much, but for tables as
-large as the `events` table it does begin to matter. For example, when storing
-80 000 000 rows this translates to a space saving of at least 610 MB, all by
-just changing the order of a few columns.
+<!-- This redirect file can be deleted after <2022-11-04>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->
-- 
GitLab