kube-dns-anti-affinity: kube-dns never-co-located-in-the-same-node #52193
Conversation
Hi @StevenACoffman. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Adding do-not-merge/release-note-label-needed because the release note process has not been followed.
/ok-to-test
/retest
Not sure why this would have caused the test failure... perhaps in the test there's a single node?
@justinsb @chrislovecnm I'm not familiar enough with the test suites to understand why this change would cause these errors:
@csbell @kubernetes/sig-federation-misc for federation e2e failures
@@ -44,7 +44,38 @@ spec:
        k8s-app: kube-dns
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        # For 1.6, we keep the old tolerations in case of a downgrade to 1.5
        scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
        # For 1.6, we keep the old affinity annotation in case of a downgrade to 1.5
We might not need these alpha annotations anymore. This PR will likely go in for 1.9, and 1.5 is already 4 major versions away.
The main question that was raised on the PR into kops was whether to use required or preferred. @thockin et al., any thoughts?
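For context, the strict option under discussion would look roughly like the sketch below. This is not the exact PR diff; the label selector and topologyKey are assumed to match the k8s-app: kube-dns label in the manifest above. With requiredDuringSchedulingIgnoredDuringExecution, a kube-dns pod stays Pending rather than share a node with another kube-dns pod.

# Sketch of the "required" (hard) anti-affinity variant; labels assumed from the manifest above.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values: ["kube-dns"]
      topologyKey: kubernetes.io/hostname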
Force-pushed from a88c158 to adee6fa
Any updates on this? We also just lost cluster DNS service because all DNS pods were located on the same node. Anything I could help with?
There is a discussion on the kops repo about using "preferred" vs "required". kubernetes/kops#2705 It seems to me we should go with "preferred" for now, as it is strictly better than what we had before and does not result in kube-dns instances not being scheduled due to constraints. If the PR is updated to preferred, I can lgtm.
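For reference, the soft option looks roughly like the sketch below (again not the exact diff; each entry in the preferred list is a weighted term, unlike the required form, and the labels are assumed from the manifest). The scheduler tries to spread the pods across nodes but will still place them on a shared node if nothing else fits.

# Sketch of the "preferred" (soft) anti-affinity variant; labels assumed from the manifest above.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values: ["kube-dns"]
        topologyKey: kubernetes.io/hostname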
Force-pushed from adee6fa to ad586e1
@bowei Done!
/lgtm
Force-pushed from 29b7a4c to e6540d4
Thanks! Could you update the release note as well? Ref PULL_REQUEST_TEMPLATE.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bowei, MrHohn, StevenACoffman. Associated issue: 2705. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing /approve in a comment.
Automatic merge from submit-queue (batch tested with PRs 53106, 52193, 51250, 52449, 53861). If you want to cherry-pick this change to another branch, please follow the instructions here.
YAY! Thanks! I just noticed that one of our clusters was running both its instances of kube-dns on a single machine today, then found this issue while searching around for bugs or mentions. I was going to attempt this, but this is even better!
…affinity Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Revert "kube-dns-anti-affinity: kube-dns never-co-located-in-the-same-node". Reverts #52193, as this has slowed down scheduling of kube-dns pods significantly (fixes #54164). /cc @bowei @MrHohn @StevenACoffman
This got reverted now, but please do keep kubeadm in sync at all times. The code lives in
I just opened a ticket on this very issue (in the wrong project, unfortunately), as a result of the described behavior causing problems in one of our production clusters. We have three small control nodes which service critical infrastructure pods, and a slew of preemptible nodes (on GKE). Without this patch we end up with poor scheduling decisions, where the scheduler stacks kube-dns pods on top of each other on the same nodes, causing our own workload pods to fail scheduling because kube-dns has stolen the allocatable resources.

We had hoped to use the merged patch together with node anti-affinity to keep kube-dns away from the preemptible worker nodes, to avoid the problem faced and described in #41125 (spotty DNS service). Without this patch we'll end up with dozens of kube-dns pods (created by kube-dns-autoscaler), all stacked onto 3 x 2-core machines, stealing all allocatable CPU and preventing our own critical backplane pods from being scheduled.

Right now it seems our only option is to buy more machines for the sole purpose of hosting kube-dns. The pod(s) must run somewhere in the cluster, but we don't want them on preemptible nodes, and we don't want more than at most one on each control node. Since the reverted merge prevents us from telling the scheduler "not more than one per node!", it seems we'll need a new set of machines dedicated to kube-dns hosting. This will increase our cost and add more moving parts to manage (an additional node group).
Unfortunately the scheduling feature (anti-affinity) does not currently scale very well, so it cannot be used for a system service such as this. When the scheduling team improves the performance of the feature, we can re-examine the scheduler constraint...
Bowei, do you have a proposal / suggestion for how to cope with kube-dns until such a fix is out? Edit: If we had self-hosted k8s, we could of course have disabled the kube-dns autoscaler, set the deployment replica count manually, and revised the deployment to enable anti-affinity ourselves. Unfortunately we're using Google, and they have a reconciliation process that tends to overwrite / revert customer-made changes in kube-system.
It is possible to run a customized version of kube-dns on the cluster. NOTE: after following these steps, you will be responsible for maintaining the configuration of the DNS system. Changes can always be reverted to return to the original configuration.
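A minimal sketch of what such a takeover might look like, assuming the cluster's addon manager only reconciles objects labeled addonmanager.kubernetes.io/mode: Reconcile. The name, replica count, and placeholder container are illustrative; the real container spec, volumes, and so on would be copied from the stock kube-dns manifest rather than taken from here.

# Hypothetical self-managed copy of the kube-dns Deployment with soft anti-affinity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-dns-custom          # hypothetical name for a self-managed copy
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    # Deliberately no addonmanager.kubernetes.io/mode label, so the addon
    # manager no longer overwrites local edits to this object.
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  k8s-app: kube-dns
              topologyKey: kubernetes.io/hostname
      containers:
      - name: kubedns
        image: "<copy image, args, ports, and probes from the stock kube-dns manifest>"
      # volumes, serviceAccountName, tolerations, etc. also copied unchanged.

Whether this is enough depends on how the platform's reconciliation is configured, which is exactly the caveat raised above for managed offerings like GKE.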
Excellent. Just one question.
Yes, anything related to the addon manager will need to be removed. |
Perfect, thanks!
@bowei Is there an issue tracking the anti-affinity performance/scaling problem we can link to this PR to remind us to revisit it? |
I believe that issue is #54189. |
What this PR does / why we need it:
This is upstreaming the kubernetes/kops#2705 pull request by @jamesbucher that was originally against kops.
Please see kubernetes/kops#2705 for more details, including a lengthy discussion.
Briefly, given the constraints of how the system works today:
preferredDuringSchedulingIgnoredDuringExecution makes sense because it will allow the DNS pods to schedule even if they can't be spread across nodes
Which issue this PR fixes
fixes kubernetes/kops#2693
Release note: