kubeadm reset doesn't work properly with containerd #926
Comments
crictl is required for non-docker runtimes. This should throw an error if crisocket is set to something other than the default and crictl is not found.
I already have crictl on the path - it is returning exit code 1. "Not found" is usually exit code 127.
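For illustration, here's a rough sketch (not the actual kubeadm code; the function name and error wording are made up) of a preflight-style check that distinguishes "crictl missing from PATH" from "crictl present but failing", which is the difference between exit codes 127 and 1 discussed above. The `crictl -r <endpoint> info` call is assumed to be a cheap way to probe the runtime:

```go
package main

import (
	"fmt"
	"os/exec"
)

// checkCrictl is a hypothetical preflight helper: it verifies that crictl
// is on PATH before a non-docker criSocket is accepted, and separately
// reports a failing invocation (exit code 1) versus a missing binary
// (which a shell would report as 127 / "command not found").
func checkCrictl(criSocket string) error {
	path, err := exec.LookPath("crictl")
	if err != nil {
		return fmt.Errorf("crictl is required for CRI socket %q but was not found in PATH", criSocket)
	}
	// crictl exists; a failure here is a runtime/connection problem, not "not found".
	out, err := exec.Command(path, "-r", criSocket, "info").CombinedOutput()
	if err != nil {
		return fmt.Errorf("crictl found at %s but failed: %v\noutput: %s", path, err, out)
	}
	return nil
}

func main() {
	if err := checkCrictl("unix:///var/run/containerd/containerd.sock"); err != nil {
		fmt.Println(err)
	}
}
```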
😖 Missed that line, my fault.
@luxas I'm thinking we have two choices here:
wdyt?
@NeilW Can you run this command and show the output here?
I suspect that crictl complains about the CRI socket format and that causes the next crictl command (crictl stopp) to fail. It would also help if you add -v 1 to the 'kubeadm reset' command line; it will show you the full crictl command lines used. @chuckha, @luxas: would it make sense to proceed with PR #64611 further?
Looks like stderr and stdout are combined
It fails with the path for the socket, but switching to using the URL works fine.
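As an illustration of why the bare path can trip things up while the URL form works, here is a small sketch (assumed behaviour, not kubeadm's actual endpoint parsing) showing what url.Parse makes of the two forms; any code that switches on the scheme will mishandle the bare path:

```go
package main

import (
	"fmt"
	"net/url"
)

// parseEndpoint shows the difference between passing a bare filesystem path
// and a full unix:// URL: url.Parse leaves the scheme empty for the bare
// path, so scheme-based dispatch code sees "" instead of "unix".
func parseEndpoint(endpoint string) (scheme, addr string, err error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", "", err
	}
	return u.Scheme, u.Path, nil
}

func main() {
	for _, ep := range []string{
		"/var/run/containerd/containerd.sock",        // bare path: scheme is ""
		"unix:///var/run/containerd/containerd.sock", // full URL: scheme is "unix"
	} {
		scheme, addr, _ := parseEndpoint(ep)
		fmt.Printf("endpoint=%q scheme=%q path=%q\n", ep, scheme, addr)
	}
}
```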
It seems to be related to this issue.
Yes, they are. The reason seems to be that the code is bent to work around the absence of fakeexec.Output in k8s.io/utils/exec/testing/fake_exec.go.
@luxas What would be a good way to change DefaultCRISocket to "unix:///var/run/dockershim.sock"? We can't just change it in app/apis/kubeadm/, can we?
Proposed solution:
This solution would avoid breaking older configs (with a proper deprecation policy) and unblock the crictl users. We would also have to add this change to the upgrade path as well 🤔
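To make the idea concrete, here's a rough sketch (the function and constant names are made up, not the actual kubeadm API) of how a legacy path-style criSocket value could be normalised to the URL form while older configs keep working:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical new default, expressed as a URL rather than a bare path.
const defaultCRISocket = "unix:///var/run/dockershim.sock"

// normalizeCRISocket accepts both the legacy bare-path form and the new
// URL form, upgrading the former so the rest of the code only ever sees
// URLs. A real implementation would also emit a deprecation warning.
func normalizeCRISocket(socket string) string {
	if socket == "" {
		return defaultCRISocket
	}
	if strings.Contains(socket, "://") {
		return socket // already a URL
	}
	return "unix://" + socket // legacy path-style value
}

func main() {
	fmt.Println(normalizeCRISocket("/var/run/containerd/containerd.sock"))
	fmt.Println(normalizeCRISocket("unix:///var/run/crio/crio.sock"))
	fmt.Println(normalizeCRISocket(""))
}
```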
+1
yep and we need to
Wouldn't it be OK to just start using the full URIs (
So I'm firmly in the camp that we can address this in 1.12, but let's be clear. Folks are advertising 1st class CRIs with absolutely 0 upstream CI-signal. Until that core issue is addressed, multiple CRIs are best effort on break-fix. /cc @runcom @countspongebob @kubernetes/sig-node-feature-requests @Random-Liu @BenTheElder
We can temporarily filter out stderr output as @NeilW showed above. +1 for continuing the loop. Even if we can't stop one pod, it's worth trying to stop the rest.
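A minimal sketch of the "keep going and warn" behaviour being discussed (the crictl subcommands stopp/rmp and `pods -q` are the ones already used in this thread; the helper name and messages are illustrative, not the actual kubeadm reset code):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// removeAllPods lists pod sandbox IDs via crictl and tries to stop and
// remove each one, continuing past failures with a warning instead of
// aborting the whole reset on the first error.
func removeAllPods(criSocket string) error {
	out, err := exec.Command("crictl", "-r", criSocket, "pods", "-q").Output()
	if err != nil {
		return fmt.Errorf("failed to list pods: %v", err)
	}
	for _, id := range strings.Fields(string(out)) {
		if err := exec.Command("crictl", "-r", criSocket, "stopp", id).Run(); err != nil {
			fmt.Printf("[reset] WARNING: failed to stop pod %s: %v\n", id, err)
		}
		if err := exec.Command("crictl", "-r", criSocket, "rmp", id).Run(); err != nil {
			fmt.Printf("[reset] WARNING: failed to remove pod %s: %v\n", id, err)
		}
	}
	return nil
}

func main() {
	if err := removeAllPods("unix:///var/run/containerd/containerd.sock"); err != nil {
		fmt.Println(err)
	}
}
```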
+1 for removing the fallback to docker. Not sure about ignoring errors; we should at least show warnings if a pod can't be stopped or removed. BTW, some of this is done in this PR, so I'd suggest continuing with it, especially if this work is scheduled for 1.12.
I'm late to the discussion, so excuse me if I missed details in the long thread. The issue seems like a generic problem with switching to CRI to replace the docker CLI in kubeadm. I think we can add a general kubeadm test to cover this code path by configuring criSocket to
I'm guessing the "0 upstream CI-signal" refers to the kubeadm integration testing specifically... I am not aware of any presubmit, blocking kubeadm test. If there is one (or sig-cluster-lifecycle wants to add one), we should make sure it has sufficient test coverage for generic CRI runtimes. A little bit of clarification on CRI testing overall: there are no classes for CRI implementations today. Each CRI implementation is maintained independently. sig-node has been working on defining a better testing policy to track conformance & features for these runtimes/OSs. Some runtimes also run more tests than the minimal requirements (available in the sig-node test dashboard).
So, where are we with this issue? @NeilW Can you check if current master works for you? There have been quite a few changes in this area recently, so this issue may already be fixed.
Using 1.12.0-alpha1 kubeadm, I still have to specify the socket to use on the reset command.
Without it, the command uses docker. Reset doesn't remove the containers at all now; it doesn't even try. Additionally, reset doesn't seem to put the kubelet back into a condition where it can be bootstrapped again. Running init after a reset gives
That's after removing the pods by hand using
@NeilW thank you for testing it. That's very strange that
Reconfirmed the original fault this morning with 1.11.0.
Version details
Crictl output after reset
Running again gets rid of some more containers
Repeated re-running doesn't get rid of any more, by the look of it.
@NeilW Very interesting! Looks like kubeadm either can't get the list of running pods, or crictl doesn't return an error when it fails to remove pods. Can you show the output of
Sure
The normal view is
There's a magic glyph in the pod filters for both docker and crictl. That looks out of sync with the way the creation side names and stamps pods. N.B. crictl has a '--namespace' filter now, which may allow a bit of refactoring here (getting 'kube-system' from whatever universal constant holds the system namespace name).
The same magic name glyph is on the docker side as well.
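To illustrate the refactoring suggested above, a sketch (assumed wiring, not the actual kubeadm code) that selects kube-system pod sandboxes with crictl's --namespace filter instead of matching a "k8s_" name prefix:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// listSystemPods uses crictl's --namespace filter (mentioned above) rather
// than a "k8s_" name-prefix match, so it works regardless of how the
// runtime names its sandboxes. In real code, "kube-system" would come from
// the shared constant that holds the system namespace name.
func listSystemPods(criSocket string) ([]string, error) {
	out, err := exec.Command("crictl", "-r", criSocket,
		"pods", "-q", "--namespace", "kube-system").Output()
	if err != nil {
		return nil, fmt.Errorf("crictl pods failed: %v", err)
	}
	return strings.Fields(string(out)), nil
}

func main() {
	pods, err := listSystemPods("unix:///var/run/containerd/containerd.sock")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("kube-system pod sandboxes:", pods)
}
```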
@NeilW Thank you for pointing that out! I've submitted a PR to fix that.
It works for docker, but doesn't work for CRI as CRI pod names don't start with k8s_:
The current implementation of this API always checks the output of 'crictl pods -q' and filters out everything that doesn't start with k8s_. 'crictl pods -q' returns only pod ids, so everything is always filtered out. Removing the filtering by name prefix should fix this. Fixes: kubernetes/kubeadm#926
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

kubeadm: fix CRI ListKubeContainers API

**What this PR does / why we need it**: The current implementation of this API always checks the output of 'crictl pods -q' and filters out everything that doesn't start with k8s_. 'crictl pods -q' returns only pod ids, so everything is always filtered out. Removing the filtering by name prefix should fix this.

**Which issue this PR fixes**: kubernetes/kubeadm#926

**Release note**:
```release-note
NONE
```
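In essence the fix treats each line of 'crictl pods -q' as a pod sandbox ID and drops the k8s_ prefix filter entirely. A simplified sketch of that behaviour (not the exact kubeadm source; the function name and error format are illustrative):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// listKubeContainers mirrors the fixed behaviour: 'crictl pods -q' already
// returns only pod sandbox IDs, so every non-empty line is kept as-is.
// The old code filtered out anything not starting with "k8s_", which
// matched nothing and always left the pod list empty.
func listKubeContainers(criSocket string) ([]string, error) {
	out, err := exec.Command("crictl", "-r", criSocket, "pods", "-q").Output()
	if err != nil {
		return nil, fmt.Errorf("crictl pods failed: %v", err)
	}
	pods := []string{}
	for _, id := range strings.Fields(string(out)) {
		pods = append(pods, id) // no name-prefix filtering
	}
	return pods, nil
}

func main() {
	pods, err := listKubeContainers("unix:///var/run/containerd/containerd.sock")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(pods)
}
```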
BUG REPORT
Versions
kubeadm version (use `kubeadm version`):
kubeadm version: &version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0-beta.2", GitCommit:"be2cfcf9e44b5162a294e977329d6c8194748c4e", GitTreeState:"clean", BuildDate:"2018-06-07T16:19:15Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Kubernetes version (use `kubectl version`):
  Client Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0-beta.2", GitCommit:"be2cfcf9e44b5162a294e977329d6c8194748c4e", GitTreeState:"clean", BuildDate:"2018-06-07T16:21:58Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0-beta.2", GitCommit:"be2cfcf9e44b5162a294e977329d6c8194748c4e", GitTreeState:"clean", BuildDate:"2018-06-07T16:13:01Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: Brightbox
- OS: Ubuntu 18.04 LTS
- Kernel (e.g. `uname -a`): Linux srv-4pp5n 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- Container runtime: containerd github.com/containerd/containerd v1.1.1-rc.0 395068d2b7256518259816ae19e45824b15da071
What happened?
The containerd logs show requests for some oddly named pods.
What you expected to happen?
kubeadm reset ought to read the config file initially used with init (so it picks up the cri-socket) and then correctly remove the pods.
How to reproduce it (as minimally and precisely as possible)?
kubeadm init --config=kubeadm.conf
with the config file, then
kubeadm reset --cri-socket=/var/run/containerd/containerd.sock
Anything else we need to know?
If you run kubeadm reset without the cri-socket option, it still tries to talk to docker and misses the containers completely. Isn't the choice to use containerd recorded in the k8s databases?