kubeadm refuses to upgrade if a node is unhealthy #539

Closed
anguslees opened this issue Nov 13, 2017 · 12 comments · Fixed by kubernetes/kubernetes#56130

@anguslees (Member)

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):

kubeadm version: &version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.4-beta.0.30+7c3a25cf07b734-dirty", GitCommit:"7c3a25cf07b734006bafe7267d3ca7e90d376bee", GitTreeState:"dirty", BuildDate:"2017-11-13T06:23:45Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/arm"}

Environment:

  • Kubernetes version (use kubectl version):

Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.0", GitCommit:"6e937839ac04a38cac63e6a7a306c5d035fe7b0a", GitTreeState:"clean", BuildDate:"2017-09-28T22:46:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/arm"}

  • Cloud provider or hardware configuration: baremetal (arm + amd64)

What happened?

./kubeadm upgrade plan --skip-preflight-checks
[preflight] Skipping pre-flight checks
[upgrade] Making sure the cluster is healthy:
[upgrade/health] Checking API Server health: Healthy
[upgrade/health] Checking Node health: More than one Node unhealthy
[upgrade/health] FATAL: The cluster is not in an upgradeable state due to: there are NotReady Nodes in the cluster: [<node name>]

... and yes, I currently have a (non-master) node offline and drained.

What you expected to happen?

Some method of upgrading the cluster, despite a node being unhealthy. In a large cluster, it is almost inconceivable that every node will be healthy, and that's just fine.

I suggest removing this check. The existing MaxUnavailable and PodDisruptionBudget rules should provide sufficient safety, assuming jobs are updated using normal mechanisms and nodes are cleanly drained where necessary. If those things aren't happening, then we should fix that problem instead.

How to reproduce it (as minimally and precisely as possible)?

Drain a node, or otherwise have one unhealthy. Try to upgrade.

Anything else we need to know?

@luxas (Member) commented Nov 13, 2017

If we meet in the middle, would you be happy with making the health checks skippable with -f, @anguslees?
I don't want to remove this check completely.

luxas added this to the v1.9 milestone on Nov 13, 2017
@anguslees (Member, Author) commented Nov 13, 2017

Yep, happy to do so. So that I know what warning to give, can you outline an issue that this health check protects us from?

@kad (Member) commented Nov 14, 2017

Side suggestion: maybe make this check compatible with the interfaces of the preflight checks, so it can be run with RunChecks()? That would allow us later on (soonish...) to unify the mechanism for individually controlling checks.
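
A minimal sketch of the kind of unified interface being proposed here; the names (Checker, runChecks, APIServerHealthCheck) are hypothetical and are not kubeadm's actual preflight API:

```go
package upgrade

import (
	"errors"
	"fmt"
	"io"
)

// Checker mirrors the shape of a preflight-style check: it returns
// warnings (non-fatal) and errors (fatal unless the check is ignored).
// The names here are illustrative, not kubeadm's real interface.
type Checker interface {
	Name() string
	Check() (warnings, errorList []error)
}

// APIServerHealthCheck is a hypothetical health check wrapped in the same
// interface so that a single RunChecks-style runner can drive it.
type APIServerHealthCheck struct {
	Healthz func() error // e.g. a closure that GETs /healthz
}

func (c APIServerHealthCheck) Name() string { return "APIServerHealth" }

func (c APIServerHealthCheck) Check() (warnings, errorList []error) {
	if err := c.Healthz(); err != nil {
		return nil, []error{fmt.Errorf("API server is not healthy: %v", err)}
	}
	return nil, nil
}

// runChecks runs every check, prints warnings, and treats errors as
// fatal unless the check's name is in the ignore set.
func runChecks(checks []Checker, out io.Writer, ignore map[string]bool) error {
	var fatal []error
	for _, c := range checks {
		warnings, errs := c.Check()
		for _, w := range warnings {
			fmt.Fprintf(out, "[check/%s] WARNING: %v\n", c.Name(), w)
		}
		if ignore[c.Name()] {
			continue // this check's errors are demoted from fatal to ignored
		}
		fatal = append(fatal, errs...)
	}
	return errors.Join(fatal...) // nil when there were no fatal errors (Go 1.20+)
}
```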

@luxas (Member) commented Nov 14, 2017

@kad sounds reasonable to me, but it doesn't have to be done in the same PR.
@anguslees I'm happy to see a PR to make it skippable with -f.

We currently have the following checks:

  • Make sure the API server answers OK on /healthz -- kinda required.
  • Make sure all nodes are Ready. We shouldn't upgrade, for example, a cluster where the master is unhealthy; instead, let the admin fix the problem first. This also ties into the CNI config: for the cluster to be functional you have to install a CNI network, so this guards against, for instance, upgrading a cluster without a CNI network installed (see the sketch after this list).
  • Make sure the required Static Pod manifests are there if Static Pods are used...
  • ...or make sure the right control plane DaemonSets are healthy if self-hosting.
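
For reference, the node-readiness part of such a check boils down to inspecting each Node's Ready condition. A minimal sketch using the core/v1 API types (illustrative, not the actual kubeadm code):

```go
package upgrade

import (
	corev1 "k8s.io/api/core/v1"
)

// notReadyNodes returns the names of nodes whose Ready condition is not True.
// A node with no Ready condition at all is also treated as not ready.
func notReadyNodes(nodes []corev1.Node) []string {
	var notReady []string
	for _, node := range nodes {
		ready := false
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
				ready = true
				break
			}
		}
		if !ready {
			notReady = append(notReady, node.Name)
		}
	}
	return notReady
}
```

The node list itself would come from the API server via client-go (e.g. clientset.CoreV1().Nodes().List), which is why the /healthz check is the first thing verified.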

@anguslees (Member, Author) commented Nov 15, 2017

> for example, a cluster where the master is unhealthy

Isn't this (better) covered by the other apiserver /healthz check?
Why can't I upgrade the control plane with a non-functional CNI network (assuming the apiserver is otherwise reachable and responding)?

This is what I'm trying to get at: I can trivially construct hypothetical situations where every worker node is down and - assuming the admin is otherwise OK with this situation - there's no reason for kubeadm to prevent upgrading the control plane. For the master nodes themselves, I can just as easily construct situations where (e.g.) a minority of master nodes are down and - again, assuming the admin is otherwise OK with this - there's no reason for kubeadm to prevent upgrading the masters. More importantly, I can easily construct hypothetical situations where upgrading the control plane is required to deploy a bug fix that is a critical part of recovering the cluster from a partial outage.

I can't think of any situation where verifying that all the nodes are healthy is useful. Adding a global -f flag makes me uncomfortable, since anyone with more than a toy cluster is going to have to use that flag every time they use kubeadm - and I wouldn't want to disable other (actually useful) checks just because of this one. So if I'm going to implement an ignore flag, it'll be something more specific, like --ignore=nodehealth.

I guess what I'm trying to say is: kubeadm is not a monitoring tool, and it shouldn't be doing monitoring "just because". The only checks it should perform are the ones required as part of the upgrade method (e.g. if the apiserver isn't responding at all, then that's a problem for a self-hosted upgrade). I know I'm well into rant territory now, but there is also no acknowledgement in the above discussion that a naive "all healthy" criterion is simply unachievable in the real world, and this makes me feel uncertain about whether I'm going to be able to use kubeadm in practical situations. As a concerned contributor, I'm not sure how I can best help if there's going to be pushback when these sorts of well-intentioned-but-wrong issues are identified :(

@kad (Member) commented Nov 15, 2017

@anguslees before you start introducing new flags, please wait a couple of days for me to finalize the solution for #480. We will have a flag that is initially meant to cover pre-flight checks, but if we use the same interface for all kinds of checks everywhere else, it will become possible to individually control checks even in situations like this.

@anguslees (Member, Author)

@kad: hah, that's basically exactly what I implemented yesterday, only I added it just for the healthChecks in health.go. I'll hold off on sending the PR for a few days.

@luxas (Member) commented Nov 18, 2017

@anguslees Would you be OK with converting this to a preflight check, making it skippable using @kad's ignore-specific-preflight-check mechanism, and only requiring that the master nodes are healthy?

@anguslees (Member, Author) commented Nov 19, 2017

@luxas yep, sure.

Just to continue beating this dead horse, however: If the apiserver is responding healthy, does it matter whether all nodes tagged with the "master" role are healthy?

I suspect the actual requirement is that we don't take the masters below the etcd majority quorum (assuming we're in some future kubeadm-HA world). I presume the size of the etcd cluster is (going to be) stored in the kubeadm config, or do we have some other plan here?

With the current upgrade approach (single master, and in-place master upgrades), do we just need a single healthy master? (In particular, with a single master we are going to go below the majority quorum, 1 -> 0, during the upgrade, since the upgrade process is so different from a rolling-update HA approach.)
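
For context on the quorum arithmetic: an n-member etcd cluster needs floor(n/2) + 1 members up to retain a majority, so a 3-member cluster tolerates one member down while a single-member cluster tolerates none. A tiny illustrative helper (not part of kubeadm):

```go
package upgrade

// etcdQuorum returns the minimum number of members that must stay up
// for an n-member etcd cluster to keep a majority: floor(n/2) + 1.
// n=1 -> 1 (no member may go down), n=3 -> 2 (one may go down),
// n=5 -> 3 (two may go down).
func etcdQuorum(n int) int {
	return n/2 + 1
}

// maxUnavailableMasters is how many members can be lost while still
// keeping quorum.
func maxUnavailableMasters(n int) int {
	return n - etcdQuorum(n)
}
```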

@luxas (Member) commented Nov 20, 2017

We need the master to be Ready so that we know the CNI network is set up there. Otherwise we can't rely on normal cluster functionality when upgrading. Especially once self-hosting comes around, we need this requirement.

@anguslees (Member, Author)

OK, rewritten on top of @kad's (unmerged) ignore-specific-preflight-check PR; it now only requires the master nodes to be Ready.
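
A sketch of what checking only the master nodes could look like, assuming the node-role.kubernetes.io/master label that kubeadm applied to masters at the time (illustrative, not the merged code):

```go
package upgrade

import (
	corev1 "k8s.io/api/core/v1"
)

// masterNodes filters a node list down to the nodes carrying the master
// role label; readiness is then checked only on these nodes.
func masterNodes(nodes []corev1.Node) []corev1.Node {
	// Label kubeadm set on master nodes in this era; an assumption here.
	const masterLabel = "node-role.kubernetes.io/master"
	var masters []corev1.Node
	for _, node := range nodes {
		if _, ok := node.Labels[masterLabel]; ok {
			masters = append(masters, node)
		}
	}
	return masters
}
```

Combined with a readiness helper like the notReadyNodes sketch earlier in the thread, the check would reduce to notReadyNodes(masterNodes(nodes)).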

@luxas (Member) commented Nov 22, 2017

@kad's PR is now merged :)

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Nov 24, 2017
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Make healthchecks skippable, and check masters only

**What this PR does / why we need it**:

Previously kubeadm would abort if _any_ node was not Ready.  This is obviously infeasible in a non-trivial (esp. baremetal) cluster.

This PR makes two changes:
- Allows kubeadm healthchecks to be selectively skipped (made non-fatal) with --ignore-checks-errors.
- Check only that the *master* nodes are Ready.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/kubeadm#539

**Special notes for your reviewer**:

Builds on #56072

**Release note**:

```release-note
kubeadm health checks can also be skipped with `--ignore-checks-errors`
```