Support GPU monitoring installation
Created by: zhu733756
Signed-off-by: zhu733756 <talonzhu@yunify.com>
This PR adds support for installing GPU monitoring. Fixes https://github.com/kubesphere/kubesphere/issues/4082
The notable changes are:
- Add the following definition to the cluster-configuration YAML:
  ```yaml
  monitoring:
    gpu:
      nvidia_dcgm_exporter:
        enabled: true
  ```
- Integrate a GPU monitoring task into the monitoring section. Its steps are defined in `gpu-monitoring.yaml`:
  - Getting GPU monitoring installation files.
  - Creating GPU monitoring manifests.
  - Labeling GPU nodes.
  - Installing the NVIDIA DCGM exporter (from NVIDIA/gpu-monitoring-tools).
  - Installing custom GPU dashboards (from `nvidia-gpu-dcgm-exporter-clusterdashboard.yaml`).
- Update the cluster role with the following rule:
  ```yaml
  - apiGroups:
      - monitoring.kubesphere.io
    resources:
      - '*'
    verbs:
      - '*'
  ```
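The new `enabled` flag sits three levels below `monitoring`, so whatever reads the configuration must decide what a missing level means. The Python sketch below is illustrative only. It assumes the cluster configuration has already been parsed into nested dicts, and it treats any missing or malformed level as "disabled", which is an assumption rather than documented installer behavior. The helper name `gpu_monitoring_enabled` is hypothetical.

```python
from typing import Any, Dict


def gpu_monitoring_enabled(config: Dict[str, Any]) -> bool:
    """Hypothetical helper: True only if monitoring.gpu.nvidia_dcgm_exporter.enabled is true.

    Assumption: a missing key or a non-dict level at any depth means "disabled".
    """
    node: Any = config
    # Walk down the nested keys from the cluster-configuration definition.
    for key in ("monitoring", "gpu", "nvidia_dcgm_exporter"):
        if not isinstance(node, dict):
            return False
        node = node.get(key)
    if not isinstance(node, dict):
        return False
    # YAML `enabled: true` parses to the boolean True; strings like "true" are rejected.
    return node.get("enabled") is True


if __name__ == "__main__":
    cfg = {"monitoring": {"gpu": {"nvidia_dcgm_exporter": {"enabled": True}}}}
    print(gpu_monitoring_enabled(cfg))                  # True
    print(gpu_monitoring_enabled({"monitoring": {}}))   # False
```

Defaulting to "disabled" keeps clusters without GPUs unaffected unless the flag is set explicitly.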
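The cluster role rule uses `'*'` for both `resources` and `verbs`. It therefore grants every verb on every resource in the `monitoring.kubesphere.io` API group, but nothing in other groups. The Python sketch below illustrates that wildcard matching on a simplified model of a Kubernetes RBAC `PolicyRule`. It ignores `resourceNames`, subresources and `nonResourceURLs`. The helper `rule_allows` is hypothetical, and `clusterdashboards` is used only as an illustrative resource name.

```python
from typing import Dict, List

# Simplified model of one Kubernetes RBAC PolicyRule (resource rules only).
Rule = Dict[str, List[str]]

# The rule added to the cluster role in this PR.
MONITORING_RULE: Rule = {
    "apiGroups": ["monitoring.kubesphere.io"],
    "resources": ["*"],
    "verbs": ["*"],
}


def _matches(allowed: List[str], requested: str) -> bool:
    # "*" acts as a wildcard; otherwise the requested value must be listed exactly.
    return "*" in allowed or requested in allowed


def rule_allows(rule: Rule, api_group: str, resource: str, verb: str) -> bool:
    """Hypothetical helper: does `rule` grant `verb` on `resource` in `api_group`?"""
    return (
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
    )


if __name__ == "__main__":
    # Any resource and verb in monitoring.kubesphere.io is allowed...
    print(rule_allows(MONITORING_RULE, "monitoring.kubesphere.io", "clusterdashboards", "create"))  # True
    # ...but the rule does not reach other API groups.
    print(rule_allows(MONITORING_RULE, "apps", "deployments", "get"))  # False
```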
/cc @benjaminhuo @pixiake