
sunset and remove https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/kubeadm/0004-bootstrap-checkpointing.md #1103

Closed
timothysc opened this issue Jun 12, 2019 · 13 comments
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.
sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@timothysc
Member

During earlier phases of kubeadm's life we wanted to pursue the option of a fully self-hosted control plane. This idea breaks down under certain corner conditions, namely a single-control-plane-node cluster during a full DC outage. The idea was explored but eventually abandoned, due to the security constraints involved in checkpointing but also because other ideas obviated the need to manage k8s in this manner.

/cc @yastij
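
For context, the KEP had the kubelet checkpoint opted-in static pods: a pod carried a bootstrap-checkpoint annotation, and the kubelet persisted it under a directory given by the `--bootstrap-checkpoint-path` flag so it could be restored from disk after a cold start, before the API server is reachable. A minimal sketch, assuming the annotation and flag names from the KEP text (image tag and checkpoint path are illustrative placeholders):

```yaml
# Sketch only: annotation name per the bootstrap-checkpointing KEP;
# the image tag is an illustrative placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
  annotations:
    # Opt this static pod into kubelet bootstrap checkpointing.
    node.kubernetes.io/bootstrap-checkpoint: "true"
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver:v1.15.0
    command:
    - kube-apiserver
```

The node's kubelet would then run with something like `--bootstrap-checkpoint-path=/etc/kubernetes/checkpoints` (flag name from kubernetes/kubernetes#50984; the path is a placeholder).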

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 12, 2019
@timothysc timothysc self-assigned this Jun 12, 2019
@yastij
Member

yastij commented Jun 12, 2019

/sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 12, 2019
@yastij
Member

yastij commented Jun 13, 2019

/assign

@smarterclayton
Contributor

And in the future we might want checkpointing to survive a DC outage, but the current design was somewhat weak (since it was limited). This allows us to take more time to come back and do checkpointing in a broader sense, without the urgency of making it work for kubeadm (even though it may be valuable there).

@kacole2

kacole2 commented Jul 9, 2019

@timothysc @smarterclayton @yastij I'm collecting enhancements to be tracked for the 1.16 release cycle. Can someone shed some light on whether this is an actual enhancement that needs to be tracked? KEP? etc. Thanks!

@neolit123
Member

@kacole2
hi, no. this is more of a proposal to remove an existing KEP.

@kfox1111

kfox1111 commented Sep 3, 2019

This would be a step back IMO. I've wanted to use it to checkpoint a few things like haproxy or keepalived, to provide the load-balancing service kubeadm is currently lacking. Unfortunately, issue #72202 is still blocking that. It's a simple change to get things working, and it can be iterated on from there. Please don't rip it out because you don't want to use it. Others do.
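
To make the use case concrete, a sketch of the kind of manifest this implies, under the same annotation assumption as above (image, version, capabilities, and paths are hypothetical placeholders):

```yaml
# Hypothetical sketch: a keepalived static pod opted into bootstrap
# checkpointing so a control-plane VIP can survive a cold start.
# Image, version, capabilities, and paths are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: keepalived
  namespace: kube-system
  annotations:
    node.kubernetes.io/bootstrap-checkpoint: "true"
spec:
  hostNetwork: true
  containers:
  - name: keepalived
    image: osixia/keepalived:2.0.17
    securityContext:
      capabilities:
        # keepalived needs raw network access to manage the VIP.
        add: ["NET_ADMIN", "NET_BROADCAST", "NET_RAW"]
    volumeMounts:
    - name: config
      mountPath: /usr/local/etc/keepalived/keepalived.conf
  volumes:
  - name: config
    hostPath:
      path: /etc/keepalived/keepalived.conf
      type: File
```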

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2019
@yastij
Member

yastij commented Dec 2, 2019

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2019
@anguslees
Member

anguslees commented Jan 8, 2020

This idea breaks down under certain corner conditions, namely a single-control-plane-node cluster during a full DC outage.

Just putting it out there: Shouldn't we instead drop support for single-node control planes?

At least for non-test/toy scenarios, i.e. just don't attempt to provide automatic recovery from a full DC outage for single-node clusters. Ditto for any other feature that falls outside a simple "kind" test use case, like upgrading existing clusters. It seems peculiar to make HA meet the requirements of non-HA rather than the other way around.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 7, 2020
@palnabarun
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 9, 2020
@neolit123 neolit123 added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Apr 9, 2020
@neolit123
Member

neolit123 commented Jul 2, 2020

going back to this topic. overall, self-hosting is mostly out of scope for SIG Cluster Lifecycle at this point. one exception is bootkube, but they have not shown interest in this proposal.

given the feature was not successful, we should revert the changes in the kubelet introduced by kubernetes/kubernetes#50984 and let another party take over this topic if they want to.

the discussion here was already suggesting that:
kubernetes/kubernetes#72202 (comment)

i have logged this to get help from contributors with the cleanup:
kubernetes/kubernetes#92763

@anguslees

Just putting it out there: Shouldn't we instead drop support for single-node control planes?

statistically, from what i've seen in recent user reports, there are still single-node production clusters out there, due to a variety of factors including budget. i have not seen any recent requests for checkpointing or self-hosting, though.

@kfox1111

I've wanted to use it to checkpoint a few things like haproxy or keepalived, to provide the load-balancing service kubeadm is currently lacking

kubeadm has no plans to provide a load balancer by default. we do have some updated documentation, though:
https://github.com/kubernetes/kubeadm/blob/master/docs/ha-considerations.md

it still feels right to remove the existing partial feature as-is, and let another SIG / different contributors take over by writing a new KEP.

@neolit123
Member

actually... looks like someone already sent the PR as part of the kubelet flag cleanup:
kubernetes/kubernetes#91577

we have too many tickets for this at this point, so i'm going to close this one.
