## Best Practices
Best practices and guidelines for developing performant and scalable features using ClickHouse are located in the [ClickHouse developer documentation](../../../development/database/clickhouse/index.md).
## Cost and maintenance analysis
The cost and maintenance analysis for ClickHouse components is located in [ClickHouse Self-Managed component costs and maintenance requirements](self_managed_costs_and_requirements/index.md).
# ClickHouse Self-Managed component costs and maintenance requirements
## Summary
[ClickHouse](https://clickhouse.com/) requires additional cost and maintenance for self-managed customers:
- **Resource allocation cost**: ClickHouse requires a considerable amount of resources to run optimally.
- [Minimum cost estimation](#minimum-self-managed-component-costs) shows that setting up ClickHouse is likely worthwhile only for very large Reference Architectures (25k users and up).
- **High availability**: ClickHouse SaaS supports HA, but there is currently no documented HA configuration for self-managed deployments.
- **Geo setups**: Syncing and replicating ClickHouse data adds complexity to GitLab Geo setups.
- **Upgrades**: ClickHouse is an additional database to maintain and upgrade alongside the existing PostgreSQL database. This includes mapping each GitLab version to a compatible ClickHouse version and keeping both up to date.
- **Backup and restore:** Self-managed customers need an engineer who is familiar with ClickHouse backup strategies and disaster recovery processes, or they must switch to ClickHouse SaaS.
- **Monitoring**: ClickHouse is an additional component to monitor and troubleshoot. It can be monitored with Prometheus.
- **Limitations**: Azure object storage is not supported. GitLab does not have the documentation or support expertise to assist customers with deploying and operating self-managed ClickHouse.
- **ClickHouse SaaS**: Customers running a self-managed GitLab instance with regulatory or compliance requirements, or with latency concerns, likely cannot use ClickHouse SaaS.
### Minimum self-managed component costs
Based on an analysis of [ClickHouse spec requirements](https://gitlab.com/gitlab-com/www-gitlab-com/-/issues/14384#note_1307456092)
and collaboration with the ClickHouse team, we identified the following minimal configuration for self-managed ClickHouse:
1. ClickHouse High Availability (HA)
   - ClickHouse - 2 machines, each with at least 16 cores, at least 64 GB RAM, SSD storage, and a 10 Gbit/s network. Each machine also runs Keeper.
   - [Keeper](https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper) - 1 machine with 2 CPUs, 4 GB RAM, and an SSD with high IOPS.
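The minimum HA configuration above can be tallied with a short sketch. It assumes the listed specs are taken at their minimums and that ClickHouse Keeper, being Raft-based, needs a majority of its nodes to stay available. Because Keeper runs on both ClickHouse machines plus one dedicated machine, there are three Keeper nodes, so the cluster can lose one node and still keep a quorum. The `Machine` class and helper functions are hypothetical names used only for illustration.

```python
"""Hypothetical tally of the minimum self-managed ClickHouse HA topology.

Specs come from the minimal configuration list; names are illustrative only.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    role: str
    cores: int
    ram_gb: int
    runs_keeper: bool


def minimum_ha_topology() -> list[Machine]:
    # Two ClickHouse servers at the minimum spec, each also running Keeper.
    servers = [Machine("clickhouse", cores=16, ram_gb=64, runs_keeper=True) for _ in range(2)]
    # One dedicated Keeper machine.
    keeper = [Machine("keeper", cores=2, ram_gb=4, runs_keeper=True)]
    return servers + keeper


def keeper_fault_tolerance(keeper_nodes: int) -> int:
    """Return how many Keeper nodes can fail while a majority quorum remains."""
    quorum = keeper_nodes // 2 + 1
    return keeper_nodes - quorum


machines = minimum_ha_topology()
keeper_nodes = sum(m.runs_keeper for m in machines)  # True counts as 1
total_cores = sum(m.cores for m in machines)
total_ram_gb = sum(m.ram_gb for m in machines)
print(keeper_nodes, keeper_fault_tolerance(keeper_nodes), total_cores, total_ram_gb)
# → 3 1 34 132
```

Under these assumptions the minimum HA setup needs at least 34 cores and 132 GB RAM in total, before disk, network, and any scaling with user count.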
A [cost table](https://gitlab.com/gitlab-com/www-gitlab-com/-/issues/14384#note_1324085466) was compiled by taking the machine CPU and memory requirements for ClickHouse and comparing them to the
GitLab Reference Architecture sizes and their [costs](../../../../administration/reference_architectures/index.md#cost-to-run) from the GCP calculator.
This evaluation of the ClickHouse Self-Managed component is a minimum cost estimate
based on a simplified architecture.
The following factors increase the cost and were not included in the minimum calculation:
- Disk size - depends on the volume of data, which is hard to estimate.
- Disk types - ClickHouse recommends [fast SSDs](https://clickhouse.com/docs/ru/operations/tips#storage-subsystem).
- Network usage - ClickHouse recommends using a [10 Gbit/s network, if possible](https://clickhouse.com/docs/en/operations/tips#network).
- For HA, we assumed the same minimum cost for all reference architectures from 3k to 50k users, but HA specs tend to increase with user count.
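The cost comparison can be sketched as expressing the ClickHouse minimum cost as a share of each Reference Architecture's cost. Because the same minimum is applied from 3k to 50k users, its relative share shrinks as the architecture grows, which is why ClickHouse looks more reasonable for very large Reference Architectures. This is a hypothetical illustration: `ReferenceArchitecture`, `clickhouse_cost_share`, and `cost_shares` are invented names, and prices are caller-supplied inputs (for example, from the GCP calculator), not real figures.

```python
"""Hypothetical sketch: ClickHouse minimum cost as a share of Reference Architecture cost.

No real prices are embedded; callers supply monthly costs from their own estimates.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceArchitecture:
    name: str            # for example "3k" or "50k"
    monthly_cost: float  # total monthly cost of the GitLab Reference Architecture


def clickhouse_cost_share(clickhouse_monthly: float, ra: ReferenceArchitecture) -> float:
    """Return the ClickHouse minimum cost as a percentage of the architecture's cost."""
    if ra.monthly_cost <= 0:
        raise ValueError("Reference Architecture cost must be positive")
    return 100.0 * clickhouse_monthly / ra.monthly_cost


def cost_shares(
    clickhouse_monthly: float, architectures: list[ReferenceArchitecture]
) -> dict[str, float]:
    """Apply the same fixed ClickHouse minimum to every architecture."""
    return {ra.name: clickhouse_cost_share(clickhouse_monthly, ra) for ra in architectures}
```

Keeping the ClickHouse cost fixed mirrors the simplification in the minimum calculation; a fuller model would let HA specs, disk size, and network cost grow with user count.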
### Resources
- [Research and understand component costs and maintenance requirements of running a ClickHouse instance with GitLab](https://gitlab.com/gitlab-com/www-gitlab-com/-/issues/14384)
- [ClickHouse for Error Tracking on GitLab.com](https://gitlab.com/gitlab-com/gl-infra/readiness/-/blob/master/library/database/clickhouse/index.md)