Make healthchecks skippable, and check masters only #56130

anguslees · 2017-11-21T05:23:12Z

What this PR does / why we need it:

Previously, `kubeadm upgrade` would abort if any node was not Ready. This is obviously infeasible in any non-trivial (especially bare-metal) cluster.

This PR makes two changes:

1. Allow kubeadm health checks to be selectively skipped (made non-fatal) with --ignore-checks-errors.
2. Check only that the master nodes are Ready.

Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#539

Special notes for your reviewer:

Builds on #56072

Release note:

kubeadm health checks can also be skipped with `--ignore-preflight-errors`

anguslees · 2017-11-21T06:24:59Z

/assign @luxas @kad

kad · 2017-11-21T13:40:15Z

cmd/kubeadm/app/phases/upgrade/health.go

@@ -106,16 +107,25 @@ func apiServerHealthy(client clientset.Interface) error {
 	return nil
 }

-// nodesHealthy checks whether all Nodes in the cluster are in the Running state
-func nodesHealthy(client clientset.Interface) error {


I would actually keep this check and produce a warning in case some nodes are not in a healthy state.

jbeda · 2017-11-22T01:00:52Z

Agree if possible to plumb through a warning -- users may want a heads-up if their cluster isn't at full strength.

anguslees · 2017-11-22

Technically, it is easy to add a warning, of course, and I'm happy to do so if you insist.

I strongly disagree in principle, however: kubeadm is not a monitoring service. If admins want to be told that a node is down, they shouldn't have to run kubeadm to discover that. Furthermore, I think the general posture of "we should protect the kubeadm user from unspecified problems that don't actually prevent an upgrade" carries a real danger of making kubeadm fragile and not useful for real work (e.g. kubernetes/kubeadm#539 made kubeadm unusable for me). IMO kubeadm must be a do-what-I-say tool.
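The change under discussion in this thread replaces the all-nodes `nodesHealthy` check with one that looks only at masters. The following is a minimal sketch under these assumptions:

- The `node` struct is a simplified, hypothetical stand-in for the Kubernetes Node object.
- Masters are identified by the `node-role.kubernetes.io/master` label that kubeadm applied to control-plane nodes at the time.
- The function name `masterNodesReady` is illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// masterLabel is the label kubeadm put on control-plane nodes at the time.
const masterLabel = "node-role.kubernetes.io/master"

// node is a simplified, hypothetical stand-in for a Kubernetes Node object.
type node struct {
	name   string
	labels map[string]string
	ready  bool // true when the node's Ready condition is True
}

// masterNodesReady returns an error naming every master that is not Ready.
// Nodes without the master label are skipped, whatever their state.
func masterNodesReady(nodes []node) error {
	masters := 0
	var notReady []string
	for _, n := range nodes {
		if _, isMaster := n.labels[masterLabel]; !isMaster {
			continue // non-master node: deliberately not checked
		}
		masters++
		if !n.ready {
			notReady = append(notReady, n.name)
		}
	}
	// Treating "no masters found" as an error is a choice made for this sketch.
	if masters == 0 {
		return fmt.Errorf("no nodes labelled %q found", masterLabel)
	}
	if len(notReady) > 0 {
		return fmt.Errorf("there are NotReady masters in the cluster: %s", strings.Join(notReady, ", "))
	}
	return nil
}

func main() {
	nodes := []node{
		{name: "master-1", labels: map[string]string{masterLabel: ""}, ready: true},
		{name: "worker-1", ready: false}, // a NotReady worker no longer blocks the check
	}
	fmt.Println(masterNodesReady(nodes)) // prints: <nil>
}
```

The warning kad and jbeda suggest would amount to collecting the skipped NotReady workers and printing them, instead of silently discarding them.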

jbeda · 2017-11-22T01:05:06Z

/approve

anguslees · 2017-11-22T02:27:04Z

/kind cleanup
/priority important-soon
/sig cluster-lifecycle
/area kubeadm

luxas · 2017-11-22T23:48:29Z

LGTM overall after a rebase

luxas

/lgtm

fejta-bot · 2017-11-23T20:01:14Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

cblecker · 2017-11-23T20:18:44Z

/lgtm cancel
Removing LGTM to stop the retest loop. This is not building:

W1123 17:36:02.397] # k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade
W1123 17:36:02.398] cmd/kubeadm/app/cmd/upgrade/apply.go:122:178: flags.parent.ignoreChecksErrorsSet undefined (type *cmdUpgradeFlags has no field or method ignoreChecksErrorsSet)
W1123 17:36:02.398] cmd/kubeadm/app/cmd/upgrade/plan.go:58:166: parentFlags.ignoreChecksErrorsSet undefined (type *cmdUpgradeFlags has no field or method ignoreChecksErrorsSet)

@anguslees This needs attention ASAP.

k8s-github-robot · 2017-11-23T20:19:08Z

[MILESTONENOTIFIER] Milestone Pull Request Current

@anguslees @cblecker @justinsb @kad @luxas @mikedanese

Note: This pull request is marked as priority/critical-urgent, and must be updated every 1 day during code freeze.

Example update:

ACK.  In progress
ETA: DD/MM/YYYY
Risks: Complicated fix required

Pull Request Labels

sig/cluster-lifecycle: Pull Request will be escalated to these SIGs if needed.
priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
kind/cleanup: Adding tests, refactoring, fixing old bugs.


cblecker · 2017-11-24T00:21:48Z

/retest

anguslees · 2017-11-24T01:11:47Z

/retest

luxas

/lgtm

k8s-github-robot · 2017-11-24T10:21:23Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anguslees, jbeda, luxas

Associated issue: 539

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

~~cmd/kubeadm/OWNERS~~ [jbeda,luxas]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

k8s-github-robot · 2017-11-24T11:30:51Z

/test all [submit-queue is verifying that this PR is safe to merge]

k8s-github-robot · 2017-11-24T12:20:25Z

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

xiangpengzhao · 2017-12-07T02:57:00Z

@anguslees since we changed the flag name to --ignore-preflight-errors in #56208, can you please change the release note in this PR?

I see the release note "kubeadm health checks can also be skipped with --ignore-checks-errors" in CHANGELOG-1.9.md. This will mislead users :)

luxas · 2017-12-11T18:45:19Z

@xiangpengzhao I fixed it here and in release notes

k8s-ci-robot added the release-note, size/L and cncf-cla: yes labels Nov 21, 2017

k8s-github-robot assigned mikedanese and justinsb Nov 21, 2017

k8s-ci-robot added size/XL and removed size/L labels Nov 21, 2017

anguslees changed the title from "Kubeadm nodehealth" to "Make healthchecks skippable, and check masters only" Nov 21, 2017

k8s-ci-robot assigned kad and luxas Nov 21, 2017

anguslees force-pushed the kubeadm-nodehealth branch from 51aa112 to 7dd3c1d November 21, 2017 06:33

kad reviewed Nov 21, 2017

jbeda approved these changes Nov 22, 2017

k8s-github-robot added the approved label Nov 22, 2017

jbeda added this to the v1.9 milestone Nov 22, 2017

k8s-github-robot added the milestone/incomplete-labels label Nov 22, 2017

k8s-github-robot added milestone/needs-approval and needs-rebase and removed milestone/incomplete-labels labels Nov 22, 2017

anguslees force-pushed the kubeadm-nodehealth branch from 7dd3c1d to 1db7eed November 23, 2017 00:01

k8s-ci-robot added size/L and removed size/XL labels Nov 23, 2017

k8s-github-robot added needs-rebase and removed needs-rebase labels Nov 23, 2017

anguslees force-pushed the kubeadm-nodehealth branch from 1db7eed to 8dc77a5 November 23, 2017 11:20

k8s-github-robot removed the lgtm and needs-rebase labels Nov 23, 2017

k8s-ci-robot added the lgtm label Nov 23, 2017

luxas approved these changes Nov 23, 2017

k8s-ci-robot assigned cblecker Nov 23, 2017

k8s-ci-robot removed the lgtm label Nov 23, 2017

anguslees added 2 commits November 24, 2017 10:27

Allow healthchecks to be skipped with --ignore-checks-errors too

68ea48b

Only check Readiness of masters, not every node

3da5985

anguslees force-pushed the kubeadm-nodehealth branch from 8dc77a5 to 3da5985 November 23, 2017 23:27

luxas approved these changes Nov 24, 2017

k8s-ci-robot added the lgtm label Nov 24, 2017

k8s-github-robot merged commit 58fca39 into kubernetes:master Nov 24, 2017
