Change default --cert-dir for kubelet to a non-transient location #53317
/lgtm
/assign @derekwaynecarr @dchen1107
/test pull-kubernetes-e2e-kubeadm-gce
1a953eb to 540516b
540516b to 8c25265
@kubernetes/sig-cluster-lifecycle-bugs @luxas PTAL at the kubeadm pre-init check change
/approve
/assign @mikedanese @luxas
@@ -650,7 +650,6 @@ func RunInitMasterChecks(cfg *kubeadmapi.MasterConfiguration) error {
 PortOpenCheck{port: 10252},
 HTTPProxyCheck{Proto: "https", Host: cfg.API.AdvertiseAddress, Port: int(cfg.API.BindPort)},
 DirAvailableCheck{Path: filepath.Join(kubeadmconstants.KubernetesDir, kubeadmconstants.ManifestsSubDirName)},
-DirAvailableCheck{Path: "/var/lib/kubelet"},
I think it would be a lot better if DirAvailableCheck were enhanced to do filepath.Walk() and see whether any files are present. It should fail the check only if the directory contains files, but should tolerate empty sub-dirs.
> It should fail the check only if the directory contains files, but should tolerate empty sub-dirs.
not really... the kubelet is running in the background, so kubeadm should not expect its runtime dir to be empty.
Preflight checks are executed at a time when the kubelet has no valid configuration, so it should not really be running. And even if it is running, there should be nothing in /var/lib/kubelet/pods; anything there might be leftovers from a previous configuration if kubeadm reset was not properly executed.
As an alternative, we can add a check that /var/lib/kubelet/pods and /var/lib/kubelet/pki are empty.
@luxas asked that rather than remove these, I change these to check the /var/lib/kubelet/pods subfolder specifically
> @luxas asked that rather than remove these, I change these to check the /var/lib/kubelet/pods subfolder specifically
actually, I don't think that's a good idea... the kubelet owns the runtime dir. kubeadm can check the paths kubeadm writes to (pod manifests, etc), but shouldn't make assumptions about what the kubelet writes to its runtime dir. I think this is good as-is.
What would happen on nodes where there are no static manifests, but the kubelet was not properly reset and is still running some pods and holding certs from a previous node setup?
> As alternative, we can add check that /var/lib/kubelet/pods and /var/lib/kubelet/pki are empty.
/var/lib/kubelet/pki will not be empty if the kubelet is crashlooping in the background... it generates serving certificates there
I'm OK with removing that directory check for now, as the files are generated by the crashlooping kubelet. In 1.9, I think we need a separate issue for handling kubelet start/stop in a better way in the init/join commands. Right now it seems that kubeadm has even more issues; for example, the kubelet will not be started if the user ran kubeadm reset and then kubeadm join --skip-preflight-checks.
/lgtm but @luxas for approve.
/approve
/hold for lucas
Discussed with @luxas on Slack about his suggestion to narrow the preflight check. I think that introspecting the kubelet's runtime dir in a preflight is a fragile check for kubeadm to make and will lead to similar broken assumptions in the future. I would like feedback on that from representatives of sig-node (@derekwaynecarr) and sig-cluster-lifecycle (@mikedanese / @timothysc). If the consensus is to remove the check, @luxas said he is fine with that. If the consensus is to continue introspecting the runtime dir and just narrow the check, I can make that change as well. We await your feedback.
I agree that broad preflight checks like this are bound to break. Kubelet fs layout is not a stable API and there are many reasons this directory wouldn't be empty. If we want to prevent kubeadm from installing over an already init'd node we should write a file that is deleted during |
It is hard to check if the kubelet is running right now as it crash loops. For 1.9 to really address this we'd have to have more formal communications between the kubelet and kubeadm. That comes with Kubelet Dynamic Config. We should also have a way to reliably "clean" the kubelet config to reset it on reinstall or when things are busted. If we can't just delete a directory then there should be some other mechanism to do that. It feels like we need a design doc on the ideal workflow for kubelet/kubeadm for 1.9 that is shared across SIG-cluster-lifecycle and SIG-node to help avoid these types of issues in the future.
Talked with @derekwaynecarr as well and he agreed it didn't make sense for kubeadm to introspect the runtime dir. Sounds like there is consensus to remove this preflight check.
That's already done...
The "API" boundary here should be the config the kubelet exposes (via flags, as today, or via config objects/files, as in dynamic config)
/unhold
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: derekwaynecarr, jcbsmpsn, liggitt, mikedanese, timothysc Associated issue: 53288 The full list of commands accepted by this bot can be found here.
Automatic merge from submit-queue (batch tested with PRs 53317, 52186). If you want to cherry-pick this change to another branch, please follow the instructions here.
Commit found in the "release-1.8" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.
The default kubelet --cert-dir location is /var/run/kubernetes, which is automatically erased on reboot on many platforms. As of 1.8.0, kubelet TLS bootstrapping and cert rotation persist files in --cert-dir, so it should default to a non-transient location. Default it to the pki subfolder of the default --root-dir.
Fixes #53288
Additionally, since kubeadm expects a running (albeit crashlooping) kubelet prior to running kubeadm init or kubeadm join, and that kubelet was using the default --root-dir of /var/lib/kubelet, kubeadm should not expect that folder to be empty as a pre-init check.
Fixes #53356