No need to run device-plugin manually anymore. #25

rohitagarwal003 · 2017-11-14T02:55:02Z

The device plugin is an addon in GCP now. The manifest for that is here:
https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml

In place of the device plugin container, a pause container is now run, so that the installer can remain an init container.
Also added node affinity so that the pods only run on nodes that have a label with key 'cloud.google.com/gke-accelerator'.
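
For readers who want to picture the resulting layout, here is a minimal sketch of a DaemonSet with that shape: the installer as an init container, a pause container as the only long-running container, and node affinity on the accelerator label. It is not the manifest from this PR. The API version, namespace, metadata names, labels, the `pause` container name, and both image references are placeholders assumed for illustration. Only the `nvidia-driver-installer` container name, the `cpu: 0.15` request, and the label key come from this page. The security context and host mounts that a real installer would need are omitted.

```yaml
# Illustrative sketch only; NOT the manifest added in this PR.
apiVersion: apps/v1                  # assumption: older clusters may need a different API version
kind: DaemonSet
metadata:
  name: nvidia-driver-installer      # hypothetical name
  namespace: kube-system             # assumption
spec:
  selector:
    matchLabels:
      name: nvidia-driver-installer  # hypothetical label
  template:
    metadata:
      labels:
        name: nvidia-driver-installer
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # Schedule only onto nodes that carry this label key, whatever its value.
              - key: cloud.google.com/gke-accelerator
                operator: Exists
      initContainers:
      # The installer runs to completion before the main container starts.
      - name: nvidia-driver-installer
        image: example.invalid/nvidia-driver-installer:placeholder  # placeholder image
        resources:
          requests:
            cpu: 0.15
      containers:
      # The pause container does no work; it keeps the pod running after the
      # init container finishes, so the DaemonSet does not rerun the installer.
      - name: pause                  # hypothetical container name
        image: example.invalid/pause:placeholder                    # placeholder image
```

A DaemonSet pod needs at least one regular container, which is why something like a pause container has to take the slot the device plugin used to occupy.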

There are three files now:

1) device-plugin-daemonset.yaml
2) daemonset.yaml
3) daemonset-preloaded.yaml

(1) already existed, but its name no longer makes sense. (2) is a copy of (1).
(3) is a copy of (2), except that the installer image is assumed to be present on
the node. (1) will be deleted once tests in kubernetes/kubernetes are updated
to point to (2) or (3).

jiayingz · 2017-11-14T20:06:04Z

daemonset-preloaded.yaml

+        name: nvidia-driver-installer
+        resources:
+          requests:
+            cpu: 0.15


Nit: should we increase this based on what we measured in PR kubernetes/kubernetes#53541, where the installer could sometimes take up to 2 cores?

I am not sure. That would in effect reduce the capacity of all GPU nodes by 2 cores.

I have a similar concern. What I am worried about is that the installer may take much longer when the node is short of CPU. Perhaps we can discuss this later, outside this PR.

/lgtm

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Update URLs for nvidia gpu device plugin and nvidia driver installer.

Device plugin is now an addon and its manifest is now in kubernetes/kubernetes. The manifest on GoogleCloudPlatform/container-engine-accelerators no longer contains device plugin.

This is needed after #54826 and GoogleCloudPlatform/container-engine-accelerators#25

**Release note**: NONE

/sig scheduling

rohitagarwal003 requested review from vishh and jiayingz November 14, 2017 02:55

googlebot added the cla: yes label Nov 14, 2017

jiayingz reviewed Nov 14, 2017

rohitagarwal003 merged commit 8dcc32e into GoogleCloudPlatform:master Nov 14, 2017

rohitagarwal003 mentioned this pull request Nov 14, 2017

Update URLs for nvidia gpu device plugin and nvidia driver installer. kubernetes/kubernetes#55737

Merged
