Unverified commit 944f0bb2, authored by Ian Baum, committed by GitLab

Adding documentation for geo failover with secondary runners

* Adds steps for what may need to be done in order to avoid any
  potential load issues upon failover
Parent 00afb5aa
@@ -186,6 +186,10 @@ On the **primary** site:
takes to finish syncing.
1. Select **Add broadcast message**.
### Runner failover
If you have any runners connected to your current secondary, see [how to handle them](../secondary_proxy/runners.md#handling-a-planned-failover-with-secondary-runners) during the failover.
## Prevent updates to the **primary** site
To ensure that all data is replicated to a secondary site, updates (write requests) need to
......
@@ -43,3 +43,44 @@ Using separate secondary URLs, the runners should be:
1. Registered with the secondary external URL.
1. Configured with [`clone_url`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html#how-clone_url-works) set to the `external_url` of the secondary instance.
## Handling a planned failover with secondary runners
When executing [a planned failover](../disaster_recovery/planned_failover.md), secondary runners try to keep talking to their local instance. This can reduce runner capacity during the failover, which you may need to plan for.
### With Location Aware public URL
When using the [Location Aware public URL](location_aware_external_url.md), all runners automatically connect to the closest Geo site.
When failing over to a new primary:
- While the old primary is still in the DNS record, any runners previously connected to it still attempt to pick up jobs from it. If the old primary is unreachable, the runners [detect this](https://docs.gitlab.com/runner/configuration/advanced-configuration.html#how-unhealthy_requests_limit-and-unhealthy_interval-works) and stop requesting jobs for an extended period, which can continue after the instance returns.
- If you have [multiple secondary nodes](../disaster_recovery/index.md#promoting-secondary-geo-replica-in-multi-secondary-configurations), after the initial failover the remaining secondaries are in an unhealthy state until they are [replicated](../disaster_recovery/index.md#step-2-initiate-the-replication-process) with the new primary. The runners attached to them are then unable to check in, and their health check also kicks in.
- If you remove any of the unhealthy nodes from the Geo DNS entry, the runners pick the next closest instance. Depending on your architecture, this may not be what you want, as you could overwhelm your site in its reduced state.
To alleviate any of these issues, you can [pause](#pausing-runners) or shut down some of the runners until the site is back at full capacity.
If you are not concerned about these issues, there is nothing to do here.
### With separate URLs
- If you are returning the old primary to service, you can pause the runners attached to the old primary until it is back online. This prevents their health check from kicking in.
- If the old primary is not returning, or you want to avoid temporarily reduced runner capacity, reconfigure the primary runners to connect to the new primary.
- If you use multiple secondaries, [pause](#pausing-runners), shut down, or reconfigure their runners to connect to the new primary while the secondaries are being replicated to the new primary.
### Pausing runners
You must have administrator access to use any of the following methods:
- Through the Admin Area:
  1. On the left sidebar, at the bottom, select **Admin Area**.
  1. Select **Settings > Runners**.
  1. Identify the runners you would like to pause.
  1. Select the `pause` button next to each runner you would like to pause.
  1. After the failover is complete, unpause the runners you paused in the previous step.
- Use the [Runners API](../../../api/runners.md):
  1. Fetch or create a [Personal Access Token](../../../user/profile/personal_access_tokens.md) with administrator access.
  1. Get the list of runners. You can filter the list [using the API](../../../api/runners.md#list-all-runners).
  1. Identify the runners you would like to pause, and make note of their `id`.
  1. [Follow the API documentation](../../../api/runners.md#pause-a-runner) to pause each runner.
  1. After the failover is complete, unpause the list of runners using the API by setting `paused=false`.
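
The API steps above could be scripted roughly as follows. This is a minimal sketch, not an official tool: it assumes the `GET /api/v4/runners/all` and `PUT /api/v4/runners/:id` endpoints described in the Runners API documentation, and the host `gitlab.example.com`, the token value, and the helper names (`build_list_request`, `build_pause_request`, `set_paused`) are hypothetical. Only the first page of runners is requested, so larger installations would need pagination.

```python
import json
import urllib.parse
import urllib.request


def build_list_request(base_url, token, per_page=100):
    """Build a GET /api/v4/runners/all request (administrator token assumed)."""
    query = urllib.parse.urlencode({"per_page": per_page})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v4/runners/all?{query}",
        headers={"PRIVATE-TOKEN": token},
    )


def build_pause_request(base_url, token, runner_id, paused):
    """Build a PUT /api/v4/runners/:id request that sets the `paused` attribute."""
    data = urllib.parse.urlencode({"paused": "true" if paused else "false"}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v4/runners/{runner_id}",
        data=data,
        method="PUT",
        headers={"PRIVATE-TOKEN": token},
    )


def set_paused(base_url, token, runner_ids, paused, send=urllib.request.urlopen):
    """Pause (paused=True) or unpause (paused=False) each runner by id.

    `send` is injectable so the function can be exercised without a network.
    Returns the parsed JSON response for each runner.
    """
    results = []
    for runner_id in runner_ids:
        request = build_pause_request(base_url, token, runner_id, paused)
        with send(request) as response:
            results.append(json.load(response))
    return results
```

Before the failover, you might call `set_paused("https://gitlab.example.com", token, [12, 15], True)` with the runner ids you noted, and call it again with `False` once the failover is complete.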