# GPUs on Kubernetes

## Motivation
I wanted to add a GPU node to my cluster to run and serve some local LLMs as something fun and new to play with. Is the Kubernetes part of this strictly necessary? No. Am I going to do it anyway? Hell yeah! After all, overkill is underrated!
## Problems Encountered
There's actually a fair amount of incomplete (or inconsistent) information out there on getting GPUs to work in Kubernetes. Unfortunately, it's fragmented, and at least for a number of "home lab" setups it's out of date or has been tainted by drifting configurations and vendor installation instructions over time.
## Procedure
- Install the container toolkit.

```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
    | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
  && sudo apt-get update \
  && sudo apt-get install -y nvidia-container-toolkit
```
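A quick sanity check that the toolkit actually landed (not part of NVIDIA's instructions, just a smoke test):

```bash
# Prints the toolkit CLI version if the install succeeded.
nvidia-ctk --version
```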
- Install your drivers.

```bash
sudo apt install -y nvidia-container-runtime cuda-drivers-fabricmanager-550 nvidia-headless-550-server
```
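After a reboot (the kernel modules usually want one), you can verify the driver directly on the host before involving any containers:

```bash
# Should report driver version 550.x and list the GPU.
nvidia-smi
```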
- Configure your runtimes. (Only containerd is needed, but I'm doing Docker as well because the GPU computer is a desktop.)

```bash
sudo nvidia-ctk runtime configure --runtime=docker && sudo nvidia-ctk runtime configure --runtime=containerd
```
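These commands write the `nvidia` runtime into `/etc/docker/daemon.json` and `/etc/containerd/config.toml` respectively; the daemons have to be restarted to pick the change up (assuming systemd-managed services):

```bash
sudo systemctl restart docker
# Only needed if a standalone containerd service is running on this machine.
sudo systemctl restart containerd
```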
- Test that Docker can reach the GPU.

```bash
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```
This should return something like:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:0B:00.0  On |                  N/A |
|  0%   53C    P3             32W /  170W |     316MiB /  12288MiB |     36%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
```
- Install k3s on your node now.

```bash
curl -sfL https://get.k3s.io | K3S_URL=https://${PRIMARY_K3S_IP}:6443 K3S_TOKEN=${K3S_CLUSTER_TOKEN} sh -s -
```
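k3s ships its own bundled containerd and will auto-detect the NVIDIA runtime when the container toolkit is already installed. You can confirm this by grepping the config k3s generates (path per the k3s docs):

```bash
# An [plugins...nvidia] runtime block should show up here after the agent starts.
sudo grep -A3 nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml
```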
- Label the node as having a GPU.

```bash
kubectl label nodes <your-node-name> gpu=true
```
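Worth double-checking that the label stuck before relying on it as a node selector:

```bash
# Should list the freshly labelled GPU node.
kubectl get nodes -l gpu=true
```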
- Create a `RuntimeClass` object on the cluster.
```yaml
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
```
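Apply it and confirm it registered (the file name here is just whatever you saved the manifest as):

```bash
kubectl apply -f runtimeclass.yaml
kubectl get runtimeclass nvidia
```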
- Create a DaemonSet for the NVIDIA device plugin. (Notice we set the node selector to `gpu: "true"`, since we only need the DaemonSet to run on GPU nodes.)
```yaml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      nodeSelector:
        gpu: "true"
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      runtimeClassName: nvidia
      containers:
        - image: nvcr.io/nvidia/k8s-device-plugin:v0.14.1
          name: nvidia-device-plugin-ctr
          env:
            - name: FAIL_ON_INIT_ERROR
              value: "false"
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
```
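Once applied, the plugin pod should land on the GPU node, and the node should start advertising an `nvidia.com/gpu` resource. A quick check (the manifest file name and node name are placeholders):

```bash
kubectl apply -f nvidia-device-plugin.yaml
kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds
# Should print the number of schedulable GPUs on the node, e.g. "1".
kubectl get node <your-node-name> -o jsonpath="{.status.allocatable['nvidia\.com/gpu']}"
```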
- Test your new toy with the nbody benchmark.
```bash
cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-benchmark
  namespace: default
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:nbody
      args: ["nbody", "-gpu", "-benchmark"]
      resources:
        limits:
          nvidia.com/gpu: 1
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: all
EOF
```
Then check the logs once the pod has completed:

```bash
kubectl logs nbody-gpu-benchmark -n default
```
```
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Ampere" with compute capability 8.6

> Compute 8.6 CUDA device: [NVIDIA GeForce RTX 3060]
28672 bodies, total time for 10 iterations: 22.099 ms
= 372.001 billion interactions per second
= 7440.026 single-precision GFLOP/s at 20 flops per interaction
```
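When you're done admiring the numbers, the benchmark pod can be cleaned up:

```bash
kubectl delete pod nbody-gpu-benchmark -n default
```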