
sunset and remove https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/kubeadm/0004-bootstrap-checkpointing.md #1103

Closed
timothysc opened this issue Jun 12, 2019 · 13 comments
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt.
sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@timothysc
Member

During earlier phases of kubeadm's life we wanted to pursue the option of a fully self-hosted control plane. This idea breaks down under certain corner conditions, namely a single-control-plane-node cluster during a full DC outage. The idea was explored but eventually abandoned, due to the security constraints involved in checkpointing but also because other ideas obviated the need to manage k8s in this manner.

/cc @yastij
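
For context, the KEP had the kubelet checkpoint opted-in static pods: a pod carried a bootstrap-checkpoint annotation, and the kubelet persisted it under a directory given by the `--bootstrap-checkpoint-path` flag so it could be restored from disk after a cold start, before the API server is reachable. A minimal sketch, assuming the annotation and flag names from the KEP text (image tag and checkpoint path are illustrative placeholders):

```yaml
# Sketch only: annotation name per the bootstrap-checkpointing KEP;
# the image tag is an illustrative placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
  annotations:
    # Opt this static pod into kubelet bootstrap checkpointing.
    node.kubernetes.io/bootstrap-checkpoint: "true"
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver:v1.15.0
    command:
    - kube-apiserver
```

The node's kubelet would then run with something like `--bootstrap-checkpoint-path=/etc/kubernetes/checkpoints` (flag name from kubernetes/kubernetes#50984; the path is a placeholder).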

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 12, 2019
@timothysc timothysc self-assigned this Jun 12, 2019
@yastij
Member

yastij commented Jun 12, 2019

/sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 12, 2019
@yastij
Member

yastij commented Jun 13, 2019

/assign

@smarterclayton
Contributor

And in the future we might want checkpointing to survive a DC outage, but the current design was somewhat weak (since it was limited). This allows us to take more time to come back and do checkpointing in a broader sense, without the urgency of making it work for kubeadm (even though it may be valuable there).

@kacole2

kacole2 commented Jul 9, 2019

@timothysc @smarterclayton @yastij I'm collecting enhancements to be tracked for the 1.16 release cycle. Can someone shed some light on whether this is an actual enhancement that needs to be tracked? KEP? etc. Thanks!

@neolit123
Member

@kacole2
hi, no. this is more of a proposal to remove an existing KEP.

@kfox1111

kfox1111 commented Sep 3, 2019

This would be a step back IMO. I've wanted to use it to checkpoint a few things like haproxy or keepalived, to provide the load-balancing service kubeadm is currently lacking. Unfortunately, issue #72202 is still blocking that. It's a simple change to get things working, and it can be iterated on from there. Please don't rip it out because you don't want to use it. Others do.
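
To make the use case concrete, a sketch of the kind of manifest this implies, under the same annotation assumption as above (image, version, capabilities, and paths are hypothetical placeholders):

```yaml
# Hypothetical sketch: a keepalived static pod opted into bootstrap
# checkpointing so a control-plane VIP can survive a cold start.
# Image, version, capabilities, and paths are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: keepalived
  namespace: kube-system
  annotations:
    node.kubernetes.io/bootstrap-checkpoint: "true"
spec:
  hostNetwork: true
  containers:
  - name: keepalived
    image: osixia/keepalived:2.0.17
    securityContext:
      capabilities:
        # keepalived needs raw network access to manage the VIP.
        add: ["NET_ADMIN", "NET_BROADCAST", "NET_RAW"]
    volumeMounts:
    - name: config
      mountPath: /usr/local/etc/keepalived/keepalived.conf
  volumes:
  - name: config
    hostPath:
      path: /etc/keepalived/keepalived.conf
      type: File
```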

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2019
@yastij
Member

yastij commented Dec 2, 2019

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 2, 2019
@anguslees
Member

anguslees commented Jan 8, 2020

This idea breaks down under certain corner conditions, namely a single-control-plane-node cluster during a full DC outage.

Just putting it out there: Shouldn't we instead drop support for single-node control planes?

At least for non-test/toy scenarios, i.e. just don't attempt to provide automatic recovery from a full DC outage for single-node clusters. Ditto for any other feature that falls outside a simple "kind" test use case, like upgrading existing clusters. It seems peculiar to make HA meet the requirements of non-HA rather than the other way around.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 7, 2020
@palnabarun
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 9, 2020
@neolit123 neolit123 added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Apr 9, 2020
@neolit123
Member

neolit123 commented Jul 2, 2020

going back to this topic. overall, self-hosting is mostly out of scope for SIG Cluster Lifecycle at this point. one exception is bootkube, but they have not shown interest in this proposal.

given the feature was not successful, we should revert the changes in the kubelet introduced by kubernetes/kubernetes#50984 and let another party take over this topic if they want to.

the discussion here was already suggesting that:
kubernetes/kubernetes#72202 (comment)

i have logged this to get help from contributors with the cleanup:
kubernetes/kubernetes#92763

@anguslees

Just putting it out there: Shouldn't we instead drop support for single-node control planes?

statistically, from what i've seen in recent user reports, there are still single-node production clusters out there, due to a variety of factors including budget. i have not seen any recent requests for checkpointing or self-hosting, though.

@kfox1111

I've wanted to use it to checkpoint a few things like haproxy or keepalived, to provide the load-balancing service kubeadm is currently lacking

kubeadm has no plans to provide a load balancer by default. we do have some updated documentation, though:
https://github.com/kubernetes/kubeadm/blob/master/docs/ha-considerations.md

it still feels right to remove the existing partial feature as-is, and let another SIG / different contributors take over by writing a new KEP.

@neolit123
Member

actually... looks like someone already sent the PR as part of the kubelet flag cleanup:
kubernetes/kubernetes#91577

we have too many tickets for this at this point, so i'm going to close this one.
