Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached #54826

Merged
merged 4 commits into from
Nov 13, 2017

Conversation

rohitagarwal003
Copy link
Member

@rohitagarwal003 rohitagarwal003 commented Oct 30, 2017

  • Instead of the old Accelerators feature that added alpha.kubernetes.io/nvidia-gpu resource, use the new DevicePlugins feature that adds vendor specific resources. (In case of nvidia GPUs it will
    add nvidia.com/gpu resource.)

  • Add node label to GCE nodes with accelerators attached. This node label is the same as what GKE attaches to node pools with accelerators attached. (For example, for nvidia-tesla-p100 GPU, the label would be cloud.google.com/gke-accelerator=nvidia-tesla-p100) This will help us target accelerator specific
    daemonsets etc. to these nodes.

  • Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached.

  • Some minor documentation improvements in addon manager.

Release note:

GCE nodes with NVIDIA GPUs attached now expose `nvidia.com/gpu` as a resource instead of `alpha.kubernetes.io/nvidia-gpu`.

/sig cluster-lifecycle
/sig scheduling
/area hw-accelerators

kubernetes/enhancements#368

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. area/hw-accelerators cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 30, 2017
@rohitagarwal003
Copy link
Member Author

/assign @vishh @jiayingz @roberthbailey

@roberthbailey
Copy link
Contributor

I'll review for approval once this has an lgtm from @vishh

@@ -1,25 +1,25 @@
### Addon-manager

addon-manager manages two classes of addons with given template files.
addon-manager manages two classes of addons with given template files in `$ADDON_PATH`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to introduce $ADDON_PATH earlier here, maybe we should also move its default setting note "(default /etc/kubernetes/addons/)" here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@jiayingz jiayingz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Just a small nit.

@vishh
Copy link
Contributor

vishh commented Nov 1, 2017

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 1, 2017
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: cloud.google.com/gke-accelerator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious how is this different from using nodeSelector field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodeSelector can only do key=value checks. I wanted to do key exists check (because I want to run it on nodes that have nvidia-tesla-k80 as value or nvidia-tesla-p100 as value or any later value we may add.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good to know :)

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 1, 2017
@rohitagarwal003
Copy link
Member Author

^ Fixed a typo.

@vishh
Copy link
Contributor

vishh commented Nov 1, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 1, 2017

Notes:
- Label `kubernetes.io/cluster-service=true` is deprecated (only for Addon Manager).
In future release (after one year), Addon Manager may not respect it anymore. Addons
have this label but without `addonmanager.kubernetes.io/mode=EnsureExists` will be
treated as "reconcile class addons" for now.
- Resources under $ADDON_PATH (default `/etc/kubernetes/addons/`) needs to have either one
of these two labels. Meanwhile namespaced resources need to be in `kube-system` namespace.
- Resources under `$ADDON_PATH` needs to have either one of these two labels.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/needs/need

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed.

@@ -182,7 +182,10 @@ RUNTIME_CONFIG="${KUBE_RUNTIME_CONFIG:-}"
FEATURE_GATES="${KUBE_FEATURE_GATES:-ExperimentalCriticalPodAnnotation=true}"

if [[ ! -z "${NODE_ACCELERATORS}" ]]; then
FEATURE_GATES="${FEATURE_GATES},Accelerators=true"
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does switching this work across upgrades? Or if it doesn't that should be mentioned as part of the release note for this pr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accelerators is an alpha feature. DevicePlugins is also an alpha feature (and the replacement for Accelerators).

I had added the following release note to the PR:

GCE nodes with NVIDIA GPUs attached now expose `nvidia.com/gpu` as a resource instead of `alpha.kubernetes.io/nvidia-gpu`.

which captures the difference between clusters created using the old script and clusters created using the new script. But I haven't actually tried upgrading a cluster that had the old flag to a cluster with the new flag. In GKE, we don't have to worry about this because we don't allow alpha cluster upgrade.

What should be the right release note here?
cc - @vishh

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgrades are not really supported when alpha features are turned on. So I don't see much value in thinking about the upgrade/downgrade scenario. We recommend users to stick to specific versions and our workflow has changed considerably over releases while in alpha.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the release note as it is LGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense; I wasn't sure where this was on the alpha -> ga slider. Even with alpha features it's nice to have a release note so that people using them will know how we've changed things and I agree that the release note looks good.

It would be pretty easy to run ./cluster/upgrade.sh to upgrade a GCE cluster and see what happens to the nodes w.r.t. labels. I'm guessing that they would change to the new label, but not entirely sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roberthbailey , I tried the following:

# Last commit from master on this branch
git checkout 55e216f56eac0082acc6be655d9ae09cf9ba38a8

go run hack/e2e.go -- -v --build

export NODE_ACCELERATORS=type=nvidia-tesla-k80,count=2; export KUBE_NODE_OS_DISTRIBUTION=gci; export KUBE_GCE_ZONE=us-west1-b; export KUBE_GCE_NODE_IMAGE=gke-1-8-2-gke-0-cos-stable-60-9592-90-0-v171103-pre-nvda-gpu; export KUBE_GCE_NODE_PROJECT=gke-node-images;
cluster/kube-up.sh

# Latest commit on this branch
git checkout cf292754ba423aa6782564ea83fe48cc1ed677d4

go run hack/e2e.go -- -v --build

export NODE_ACCELERATORS=type=nvidia-tesla-k80,count=2; export KUBE_NODE_OS_DISTRIBUTION=gci; export KUBE_GCE_ZONE=us-west1-b; export KUBE_GCE_NODE_IMAGE=gke-1-8-2-gke-0-cos-stable-60-9592-90-0-v171103-pre-nvda-gpu; export KUBE_GCE_NODE_PROJECT=gke-node-images;
cluster/gce/upgrade.sh -l

Master got the new label cloud.google.com/gke-accelerator=nvidia-tesla-k80. Master also had the correct feature-gate DevicePlugins=true set. However, not sure how to check node upgrade because upgrade.sh doesn't support upgrading nodes to local binaries.

FEATURE_GATES="${FEATURE_GATES},Accelerators=true"
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true"
if [[ "${NODE_ACCELERATORS}" =~ .*type=([a-zA-Z0-9-]+).* ]]; then
NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-accelerator=${BASH_REMATCH[1]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the cases where we want to enable deviceplugins but not set node labels?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, we would enable device plugins by default (there would be no feature gate). So line 185 would go away.

Ideally, we would also like that each node that has special device add a node label. Lines 186-188 are doing that, they see a special device (accelerator) and are adding a label to the node for that. As long as GCE APIs follow the convention of specifying accelerators as type=TYPE,count=COUNT, this line would continue to work. Once, GCE adds devices that are not accelerators we would have to add more logic here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That didn't really answer my question but looking at the code again, is the reason for the conditional here to capture the device type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

FEATURE_GATES="${FEATURE_GATES},Accelerators=true"
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true"
if [[ "${NODE_ACCELERATORS}" =~ .*type=([a-zA-Z0-9-]+).* ]]; then
NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-accelerator=${BASH_REMATCH[1]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is BASH_REMATCH portable across at least linux & mac (and ideally cygwin)? This runs client side so it needs to work on places where we run kube-up.sh.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to http://git.savannah.gnu.org/cgit/bash.git/tree/CHANGES?h=bash-4.4#n4839, BASH_REMATCH was added in bash-3.0 which was released in 2004

I tested it on my mac and it works there. I don't have access to a Windows machine but will ask someone with windows to test it on cygwin/mingw. Thanks for pointing this out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I ran a Windows VM on GCP, installed cygwin and tested this if block. It works as expected. Fun experience! :D

@roberthbailey
Copy link
Contributor

Also please squash your commits.

The comment is also present in lines 143-145 where it makes more sense.
Instead of the old Accelerators feature that added
alpha.kubernetes.io/nvidia-gpu resource, use the new DevicePlugins
feature that adds vendor specific resources. (In case of nvidia it will
add nvidia.com/gpu resource.)
This node label is the same as what GKE attaches to node pools with
accelerators attached. This will help us target accelerator specific
daemonsets etc. to these nodes.
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 2, 2017
Copy link
Member Author

@rohitagarwal003 rohitagarwal003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please squash your commits.

The four commits are doing four different things:

  • improving documentation of addon-manager.
  • enabling DevicePlugins instead of Accelerators for GPU nodes.
  • adding labels to nodes that have accelerators attached
  • making nvidia-gpu-device-plugin an addon.

I would prefer to keep them separate unless you feel strongly otherwise.


Notes:
- Label `kubernetes.io/cluster-service=true` is deprecated (only for Addon Manager).
In future release (after one year), Addon Manager may not respect it anymore. Addons
have this label but without `addonmanager.kubernetes.io/mode=EnsureExists` will be
treated as "reconcile class addons" for now.
- Resources under $ADDON_PATH (default `/etc/kubernetes/addons/`) needs to have either one
of these two labels. Meanwhile namespaced resources need to be in `kube-system` namespace.
- Resources under `$ADDON_PATH` needs to have either one of these two labels.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed.

FEATURE_GATES="${FEATURE_GATES},Accelerators=true"
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true"
if [[ "${NODE_ACCELERATORS}" =~ .*type=([a-zA-Z0-9-]+).* ]]; then
NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-accelerator=${BASH_REMATCH[1]}"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to http://git.savannah.gnu.org/cgit/bash.git/tree/CHANGES?h=bash-4.4#n4839, BASH_REMATCH was added in bash-3.0 which was released in 2004

I tested it on my mac and it works there. I don't have access to a Windows machine but will ask someone with windows to test it on cygwin/mingw. Thanks for pointing this out.

@@ -182,7 +182,10 @@ RUNTIME_CONFIG="${KUBE_RUNTIME_CONFIG:-}"
FEATURE_GATES="${KUBE_FEATURE_GATES:-ExperimentalCriticalPodAnnotation=true}"

if [[ ! -z "${NODE_ACCELERATORS}" ]]; then
FEATURE_GATES="${FEATURE_GATES},Accelerators=true"
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accelerators is an alpha feature. DevicePlugins is also an alpha feature (and the replacement for Accelerators).

I had added the following release note to the PR:

GCE nodes with NVIDIA GPUs attached now expose `nvidia.com/gpu` as a resource instead of `alpha.kubernetes.io/nvidia-gpu`.

which captures the difference between clusters created using the old script and clusters created using the new script. But I haven't actually tried upgrading a cluster that had the old flag to a cluster with the new flag. In GKE, we don't have to worry about this because we don't allow alpha cluster upgrade.

What should be the right release note here?
cc - @vishh

FEATURE_GATES="${FEATURE_GATES},Accelerators=true"
FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true"
if [[ "${NODE_ACCELERATORS}" =~ .*type=([a-zA-Z0-9-]+).* ]]; then
NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-accelerator=${BASH_REMATCH[1]}"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, we would enable device plugins by default (there would be no feature gate). So line 185 would go away.

Ideally, we would also like that each node that has special device add a node label. Lines 186-188 are doing that, they see a special device (accelerator) and are adding a label to the node for that. As long as GCE APIs follow the convention of specifying accelerators as type=TYPE,count=COUNT, this line would continue to work. Once, GCE adds devices that are not accelerators we would have to add more logic here.

@roberthbailey
Copy link
Contributor

Are the commits really separable? Would we roll one back but not the others? I can see an argument for the addon manager documentation, but the other three seem like they are the same change and need to be applied or rolled back atomically.

@rohitagarwal003
Copy link
Member Author

They are separable: we can rollback just the last one which adds the add-on, or we can rollback the last two that add the add-on and node-labels or we can rollback all three. However, the last one depends on the other two, so if we rollback one of them, we should roll the last one back as well.

@rohitagarwal003
Copy link
Member Author

/cc @mikedanese

@roberthbailey
Copy link
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 13, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mindprince, roberthbailey, vishh

Associated issue: 368

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 13, 2017
@k8s-github-robot
Copy link

/test all

Tests are more than 96 hours old. Re-running tests.

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 54826, 53576, 55591, 54946, 54825). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 4f91113 into kubernetes:master Nov 13, 2017
k8s-github-robot pushed a commit that referenced this pull request Nov 18, 2017
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update URLs for nvidia gpu device plugin and nvidia driver installer.

Device plugin is now an addon and its manifest is now in kubernetes/kubernetes. The manifest on
GoogleCloudPlatform/container-engine-accelerators no longer contains device plugin.

This is needed after #54826 and GoogleCloudPlatform/container-engine-accelerators#25

**Release note**:
```release-note
NONE
```

/sig scheduling
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/hw-accelerators cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants