-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached #54826
Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached #54826
Conversation
/assign @vishh @jiayingz @roberthbailey |
I'll review for approval once this has an lgtm from @vishh |
@@ -1,25 +1,25 @@ | |||
### Addon-manager | |||
|
|||
addon-manager manages two classes of addons with given template files. | |||
addon-manager manages two classes of addons with given template files in `$ADDON_PATH`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to introduce $ADDON_PATH earlier here, maybe we should also move its default setting note "(default /etc/kubernetes/addons/
)" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Just a small nit.
8f0e152
to
1663212
Compare
/lgtm |
requiredDuringSchedulingIgnoredDuringExecution: | ||
nodeSelectorTerms: | ||
- matchExpressions: | ||
- key: cloud.google.com/gke-accelerator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious how is this different from using nodeSelector
field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nodeSelector
can only do key=value
checks. I wanted to do key exists
check (because I want to run it on nodes that have nvidia-tesla-k80
as value or nvidia-tesla-p100
as value or any later value we may add.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, good to know :)
1663212
to
05b5b04
Compare
^ Fixed a typo. |
/lgtm |
|
||
Notes: | ||
- Label `kubernetes.io/cluster-service=true` is deprecated (only for Addon Manager). | ||
In future release (after one year), Addon Manager may not respect it anymore. Addons | ||
have this label but without `addonmanager.kubernetes.io/mode=EnsureExists` will be | ||
treated as "reconcile class addons" for now. | ||
- Resources under $ADDON_PATH (default `/etc/kubernetes/addons/`) needs to have either one | ||
of these two labels. Meanwhile namespaced resources need to be in `kube-system` namespace. | ||
- Resources under `$ADDON_PATH` needs to have either one of these two labels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/needs/need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
@@ -182,7 +182,10 @@ RUNTIME_CONFIG="${KUBE_RUNTIME_CONFIG:-}" | |||
FEATURE_GATES="${KUBE_FEATURE_GATES:-ExperimentalCriticalPodAnnotation=true}" | |||
|
|||
if [[ ! -z "${NODE_ACCELERATORS}" ]]; then | |||
FEATURE_GATES="${FEATURE_GATES},Accelerators=true" | |||
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does switching this work across upgrades? Or if it doesn't that should be mentioned as part of the release note for this pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accelerators
is an alpha feature. DevicePlugins
is also an alpha feature (and the replacement for Accelerators
).
I had added the following release note to the PR:
GCE nodes with NVIDIA GPUs attached now expose `nvidia.com/gpu` as a resource instead of `alpha.kubernetes.io/nvidia-gpu`.
which captures the difference between clusters created using the old script and clusters created using the new script. But I haven't actually tried upgrading a cluster that had the old flag to a cluster with the new flag. In GKE, we don't have to worry about this because we don't allow alpha cluster upgrade.
What should be the right release note here?
cc - @vishh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upgrades are not really supported when alpha features are turned on. So I don't see much value in thinking about the upgrade/downgrade scenario. We recommend users to stick to specific versions and our workflow has changed considerably over releases while in alpha.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the release note as it is LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense; I wasn't sure where this was on the alpha -> ga slider. Even with alpha features it's nice to have a release note so that people using them will know how we've changed things and I agree that the release note looks good.
It would be pretty easy to run ./cluster/upgrade.sh
to upgrade a GCE cluster and see what happens to the nodes w.r.t. labels. I'm guessing that they would change to the new label, but not entirely sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@roberthbailey , I tried the following:
# Last commit from master on this branch
git checkout 55e216f56eac0082acc6be655d9ae09cf9ba38a8
go run hack/e2e.go -- -v --build
export NODE_ACCELERATORS=type=nvidia-tesla-k80,count=2; export KUBE_NODE_OS_DISTRIBUTION=gci; export KUBE_GCE_ZONE=us-west1-b; export KUBE_GCE_NODE_IMAGE=gke-1-8-2-gke-0-cos-stable-60-9592-90-0-v171103-pre-nvda-gpu; export KUBE_GCE_NODE_PROJECT=gke-node-images;
cluster/kube-up.sh
# Latest commit on this branch
git checkout cf292754ba423aa6782564ea83fe48cc1ed677d4
go run hack/e2e.go -- -v --build
export NODE_ACCELERATORS=type=nvidia-tesla-k80,count=2; export KUBE_NODE_OS_DISTRIBUTION=gci; export KUBE_GCE_ZONE=us-west1-b; export KUBE_GCE_NODE_IMAGE=gke-1-8-2-gke-0-cos-stable-60-9592-90-0-v171103-pre-nvda-gpu; export KUBE_GCE_NODE_PROJECT=gke-node-images;
cluster/gce/upgrade.sh -l
Master got the new label cloud.google.com/gke-accelerator=nvidia-tesla-k80
. Master also had the correct feature-gate DevicePlugins=true
set. However, not sure how to check node upgrade because upgrade.sh doesn't support upgrading nodes to local binaries.
FEATURE_GATES="${FEATURE_GATES},Accelerators=true" | ||
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true" | ||
if [[ "${NODE_ACCELERATORS}" =~ .*type=([a-zA-Z0-9-]+).* ]]; then | ||
NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-accelerator=${BASH_REMATCH[1]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the cases where we want to enable deviceplugins but not set node labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, we would enable device plugins by default (there would be no feature gate). So line 185 would go away.
Ideally, we would also like that each node that has special device add a node label. Lines 186-188 are doing that, they see a special device (accelerator) and are adding a label to the node for that. As long as GCE APIs follow the convention of specifying accelerators as type=TYPE,count=COUNT, this line would continue to work. Once, GCE adds devices that are not accelerators we would have to add more logic here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That didn't really answer my question but looking at the code again, is the reason for the conditional here to capture the device type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
FEATURE_GATES="${FEATURE_GATES},Accelerators=true" | ||
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true" | ||
if [[ "${NODE_ACCELERATORS}" =~ .*type=([a-zA-Z0-9-]+).* ]]; then | ||
NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-accelerator=${BASH_REMATCH[1]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is BASH_REMATCH portable across at least linux & mac (and ideally cygwin)? This runs client side so it needs to work on places where we run kube-up.sh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to http://git.savannah.gnu.org/cgit/bash.git/tree/CHANGES?h=bash-4.4#n4839, BASH_REMATCH was added in bash-3.0 which was released in 2004
I tested it on my mac and it works there. I don't have access to a Windows machine but will ask someone with windows to test it on cygwin/mingw. Thanks for pointing this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I ran a Windows VM on GCP, installed cygwin and tested this if
block. It works as expected. Fun experience! :D
Also please squash your commits. |
The comment is also present in lines 143-145 where it makes more sense.
Instead of the old Accelerators feature that added alpha.kubernetes.io/nvidia-gpu resource, use the new DevicePlugins feature that adds vendor specific resources. (In case of nvidia it will add nvidia.com/gpu resource.)
This node label is the same as what GKE attaches to node pools with accelerators attached. This will help us target accelerator specific daemonsets etc. to these nodes.
…have nvidia GPUs attached.
05b5b04
to
cf29275
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please squash your commits.
The four commits are doing four different things:
- improving documentation of addon-manager.
- enabling DevicePlugins instead of Accelerators for GPU nodes.
- adding labels to nodes that have accelerators attached
- making nvidia-gpu-device-plugin an addon.
I would prefer to keep them separate unless you feel strongly otherwise.
|
||
Notes: | ||
- Label `kubernetes.io/cluster-service=true` is deprecated (only for Addon Manager). | ||
In future release (after one year), Addon Manager may not respect it anymore. Addons | ||
have this label but without `addonmanager.kubernetes.io/mode=EnsureExists` will be | ||
treated as "reconcile class addons" for now. | ||
- Resources under $ADDON_PATH (default `/etc/kubernetes/addons/`) needs to have either one | ||
of these two labels. Meanwhile namespaced resources need to be in `kube-system` namespace. | ||
- Resources under `$ADDON_PATH` needs to have either one of these two labels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
FEATURE_GATES="${FEATURE_GATES},Accelerators=true" | ||
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true" | ||
if [[ "${NODE_ACCELERATORS}" =~ .*type=([a-zA-Z0-9-]+).* ]]; then | ||
NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-accelerator=${BASH_REMATCH[1]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to http://git.savannah.gnu.org/cgit/bash.git/tree/CHANGES?h=bash-4.4#n4839, BASH_REMATCH was added in bash-3.0 which was released in 2004
I tested it on my mac and it works there. I don't have access to a Windows machine but will ask someone with windows to test it on cygwin/mingw. Thanks for pointing this out.
@@ -182,7 +182,10 @@ RUNTIME_CONFIG="${KUBE_RUNTIME_CONFIG:-}" | |||
FEATURE_GATES="${KUBE_FEATURE_GATES:-ExperimentalCriticalPodAnnotation=true}" | |||
|
|||
if [[ ! -z "${NODE_ACCELERATORS}" ]]; then | |||
FEATURE_GATES="${FEATURE_GATES},Accelerators=true" | |||
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accelerators
is an alpha feature. DevicePlugins
is also an alpha feature (and the replacement for Accelerators
).
I had added the following release note to the PR:
GCE nodes with NVIDIA GPUs attached now expose `nvidia.com/gpu` as a resource instead of `alpha.kubernetes.io/nvidia-gpu`.
which captures the difference between clusters created using the old script and clusters created using the new script. But I haven't actually tried upgrading a cluster that had the old flag to a cluster with the new flag. In GKE, we don't have to worry about this because we don't allow alpha cluster upgrade.
What should be the right release note here?
cc - @vishh
FEATURE_GATES="${FEATURE_GATES},Accelerators=true" | ||
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true" | ||
if [[ "${NODE_ACCELERATORS}" =~ .*type=([a-zA-Z0-9-]+).* ]]; then | ||
NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-accelerator=${BASH_REMATCH[1]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, we would enable device plugins by default (there would be no feature gate). So line 185 would go away.
Ideally, we would also like that each node that has special device add a node label. Lines 186-188 are doing that, they see a special device (accelerator) and are adding a label to the node for that. As long as GCE APIs follow the convention of specifying accelerators as type=TYPE,count=COUNT, this line would continue to work. Once, GCE adds devices that are not accelerators we would have to add more logic here.
Are the commits really separable? Would we roll one back but not the others? I can see an argument for the addon manager documentation, but the other three seem like they are the same change and need to be applied or rolled back atomically. |
They are separable: we can rollback just the last one which adds the add-on, or we can rollback the last two that add the add-on and node-labels or we can rollback all three. However, the last one depends on the other two, so if we rollback one of them, we should roll the last one back as well. |
/cc @mikedanese |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mindprince, roberthbailey, vishh Associated issue: 368 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/test all Tests are more than 96 hours old. Re-running tests. |
Automatic merge from submit-queue (batch tested with PRs 54826, 53576, 55591, 54946, 54825). If you want to cherry-pick this change to another branch, please follow the instructions here. |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Update URLs for nvidia gpu device plugin and nvidia driver installer. Device plugin is now an addon and its manifest is now in kubernetes/kubernetes. The manifest on GoogleCloudPlatform/container-engine-accelerators no longer contains device plugin. This is needed after #54826 and GoogleCloudPlatform/container-engine-accelerators#25 **Release note**: ```release-note NONE ``` /sig scheduling
Instead of the old
Accelerators
feature that addedalpha.kubernetes.io/nvidia-gpu
resource, use the newDevicePlugins
feature that adds vendor specific resources. (In case of nvidia GPUs it willadd
nvidia.com/gpu
resource.)Add node label to GCE nodes with accelerators attached. This node label is the same as what GKE attaches to node pools with accelerators attached. (For example, for nvidia-tesla-p100 GPU, the label would be
cloud.google.com/gke-accelerator=nvidia-tesla-p100
) This will help us target accelerator specificdaemonsets etc. to these nodes.
Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached.
Some minor documentation improvements in addon manager.
Release note:
/sig cluster-lifecycle
/sig scheduling
/area hw-accelerators
kubernetes/enhancements#368