"etcdctl auth enable" command breaks Kubernetes cluster #8458
Comments
Hi @struz, thanks for reporting the problem.
This suggestion seems reasonable to me. Adding another special role for maintenance purposes would be useful for avoiding multiple user accounts or certs on the client side. What do you think? @xiang90 @heyitsanthony If it doesn't conflict with other etcd design policies, I'll work on it.
@mitake special roles probably won't be flexible enough. Could roles encode the rpc permissions? Something like:
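A purely hypothetical sketch of what such RPC-level grants could look like; none of the `grant-rpc` syntax below exists in etcdctl today (only `role add`, `user add`, and `user grant-role` are real commands), it is just an illustration of the idea being proposed:

```bash
# HYPOTHETICAL: RPC-level grants are a proposal, not an existing etcdctl feature.
etcdctl role add maintenance
etcdctl role grant-rpc maintenance Compact      # hypothetical subcommand
etcdctl role grant-rpc maintenance Defragment   # hypothetical subcommand

# These parts exist today: attach the role to a dedicated user.
etcdctl user add compactor
etcdctl user grant-role compactor maintenance
```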
The service permission for
When you want to do application-driven compaction, you essentially assume that the application actually owns etcd. That is true for Kubernetes: we suggest people view k8s as the owner of the etcd cluster.
/cc @jpbetz
I believe @xiang90 is correct. For additional details on the kubernetes side, please direct questions to sig-auth: https://github.com/kubernetes/community/tree/master/sig-auth
@heyitsanthony @xiang90 @jpbetz thanks for your comments. It seems that more consideration is required before working on this. @struz could you share more detail about your use case? Do you have to store the information in your k8s cluster's etcd? Is having multiple etcd clusters not acceptable?
@jpbetz I'm not sure why directing questions at sig-auth would be relevant here. This is an inability of the app to do something: it either can or can't be done with the auth configuration inside etcd. Kubernetes itself has nothing to do with it, and was merely affected by it.

@mitake sure thing. We could run a separate etcd cluster for the Calico data, but the overhead of monitoring and maintaining a separate etcd cluster is not one we want to take on as a default option. Also, because we run some untrusted code, we would prefer not to give our k8s nodes a certificate that has root access to etcd, so that we mitigate risk in the rare case of container breakouts. To play devil's advocate to myself, no pods should ever be running on the nodes that hold this certificate anyway (the k8s apiserver nodes), but it seems to undermine the overall effect of auth to not be able to secure an etcd cluster like this. Given that we can lock the apiserver down to writing only '/registry/*' keys, the fact that this also breaks the cluster by breaking auto compaction is not ideal.

Overall, it feels wrong to me to require a client of a database to run the compaction itself. This is like asking a Java app to run its own garbage collection, or a postgres user to vacuum its own database. These might happen in specific circumstances to cope with specific load patterns, but they should not be the norm, and in both cases there are ways to use automatic cleanup. The automatic compaction built into etcd is insufficient because a) it can only run hourly, and b) compaction requests from a non-root source are not possible under auth. To draw another parallel with postgres, this would be like needing to give all users of your database administrator access so they can save you, the administrator, space and prevent the cluster eventually falling over. Of course, I am not an expert on the etcd architecture and I may be wrong with these comparisons. If so, I am keen to hear why things were done a certain way for etcd.

At the very least, the fact that the recommendation is to give the root certificate to an owning application, e.g. Kubernetes, should be clearly documented somewhere. This could be in the etcd docs, the k8s docs, or both. I did not once encounter anything that mentioned this, and I only discovered that auto-compaction was even a thing in k8s after the cluster broke.

To fix this problem, I think that either auto compaction needs to become much more granular and configurable, auth to the cluster needs to be more granular, or both.
@struz thanks for sharing the detail. I understand the motivation. How about disabling compaction requests from the apiserver and issuing them from external management tooling (e.g. a cron job)?
@mitake yes, we ended up making a cron job (although unfortunately, making sure this cron job only runs once in an HA system is a bit harder than we would like), but it works just fine. Thanks for raising that PR.
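For reference, a minimal sketch of what such an external compaction job might look like when run directly on an etcd host, so that no root credentials cross the network. The endpoint, the `ETCD_ROOT_PASSWORD` variable, and the use of `jq` are assumptions for illustration, and the JSON field path from `etcdctl endpoint status` can differ between etcd versions:

```bash
#!/usr/bin/env bash
# Sketch of a periodic compaction job intended to run from cron on an etcd node.
# Assumes etcdctl v3 with root credentials available locally and jq installed.
set -euo pipefail

export ETCDCTL_API=3
ENDPOINT="https://127.0.0.1:2379"

# Look up the current revision of the keyspace.
REV=$(etcdctl --endpoints="$ENDPOINT" --user "root:${ETCD_ROOT_PASSWORD}" \
      endpoint status -w json | jq -r '.[0].Status.header.revision')

# Compact away history older than the current revision.
etcdctl --endpoints="$ENDPOINT" --user "root:${ETCD_ROOT_PASSWORD}" \
  compaction "$REV"
```

A crontab entry such as `*/5 * * * * /usr/local/bin/compact-etcd.sh` (path illustrative) would roughly match the apiserver's default five-minute cadence.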
@struz I added a new option
I think there's some confusion here over what this ticket is about (I work in the same team as @struz, for context). We understand that the Kubernetes apiserver manages the compaction; that is not the problem. The service that we're using allows us to perform the same compaction from the etcd node, so that we don't have to grant root access to our single persistent datastore across the network.

I'd like to hear from @xiang90 and @jpbetz: is the decision to require root access, and root access alone, for common database maintenance operations an architectural decision or a resourcing one? If it's an architectural one, I disagree, but it's your call to make. If it's a resourcing decision, that's fine, but right now, reading this ticket, the only answer we have is "Kubernetes should manage the compaction", which doesn't feel like an answer to the question we're actually asking. To be clear, here is what I would like to know:

I understand that the consumer of the database (in this case Kubernetes) is the one that should manage the compaction, but I'd like the tools to be able to do it securely. Regardless, I think documentation updates to the authentication and maintenance docs would be helpful to highlight that compaction is only possible with the root user/role right now. Having the process be "I'm going to enable authentication for etcd. Hmm, why is my Kubernetes cluster now dying?" is not good for anyone; we need to make this information more discoverable. I'm having a look through the documentation now and will prepare a PR.
@youngnick authorizing RPCs like compaction with non-root roles would be fine to have, but it's not urgent
What I said is that Kubernetes assumes it owns etcd as it is today. And I wanted to understand why you want to enable auth when using etcd with Kubernetes. If there is a valid reason, then we can prioritize this issue. Neither @struz nor you provided the answer. If you want a feature, it is better to tell us about the general use case and why it is important instead of telling us
Thanks @xiang90 - that wasn't clear from your answer before.

The proximal reason that we enabled auth right now is that we wanted to have only one etcd cluster, storing both the main Kubernetes bits and the Calico Network Policy bits (which need to be accessible from all nodes, not just the ones running the apiserver and controller-manager). I understand this is possibly inadvisable, but we were willing to try it out. (Calico uses the v2 backend, Kubernetes the v3, so it kind of works.)

However, the secondary, and more long-term, reason for enabling auth is that I believe auth should be enabled everywhere. I think that granting a specific set of privileges rather than root is better security practice and leads to fewer reliability problems in the long term. I agree that Kubernetes does assume that it runs with root on etcd right now. As I just said though, I don't think that's desirable in the longer term. I had hoped that, by starting the process of teasing out the bits that require root authentication right now, we could begin the process of making Kubernetes not require root, and of documenting the permissions it does require.

I have no objection whatsoever to you saying "this is a problem, but we cannot prioritise it right now". That's completely acceptable; I acknowledge that we are running in an unusual configuration right now. However, getting this onto your backlog now will mean that it can be done sooner rather than later, when you have the resources available. I'm not in any way trying to throw stones at you about it. I just want some acknowledgement that this is an issue that will need to be looked at at some point. Alternatively, if you believe that remote consumers of the database will always be the owner and require root access for maintenance tasks, that's your call; I just need to know if that is the case so I can prioritise downstream resources for us.
Thanks for the explanation. I agree with what you said above. From my understanding, this is not super urgent for you either. @mitake and @heyitsanthony already have this on their list. We will prioritize this accordingly.
Yes, this is not urgent for us right now; we have a workaround. I'll put this on our 'tickets to watch' list and we can all move on until the work can be prioritised. Thanks!
Automatic merge from submit-queue (batch tested with PRs 51765, 53053, 52771, 52860, 53284). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add an option for turning on/off compaction from apiserver in etcd3 mode

**What this PR does / why we need it**: This commit adds an option for controlling whether the apiserver issues compaction requests to etcd3. There are situations where the apiserver cannot fully own its etcd cluster (e.g. sharing it with canal). In such a case, the apiserver should have limited access in terms of etcd's auth functionality, so it doesn't have the privilege to issue compaction requests. That means the compaction requests should be issued by another component, and the apiserver's own compaction requests are needless. For such use cases, this commit adds a new flag, storagebackend.Config.DoCompaction. If the flag is true (the default), the apiserver issues compaction requests as it does today; if it is false, it doesn't issue them.

**Related issue (etcd)**: etcd-io/etcd#8458 /cc @xiang90 @struz

**Release note:**
```release-note
Add --etcd-compaction-interval to apiserver for controlling request of compaction to etcd3 from apiserver.
```

Kubernetes-commit: 5dfea9e6091a818f90e8df5afcc750b5f01fa9b7
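Assuming an apiserver build that includes this flag, handing compaction off to an external job (such as the cron sketch above) might look like the following; the etcd endpoint is a placeholder and all other apiserver flags are omitted:

```bash
# Run the apiserver without issuing its own compaction requests to etcd3
# (an interval of 0 disables them; the default is a 5-minute interval).
kube-apiserver \
  --etcd-servers=https://etcd.example.internal:2379 \
  --etcd-compaction-interval=0
```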
Hey @gyuho or @xiang90, it looks like the original cause of this ticket is even less urgent now. With the addition of revision-based auto-compaction in 3.3 (from #8098, it seems), once Kubernetes supports 3.3 we'll just be able to have the Kubernetes docs updated to say: if you turn on etcd auth, turn on these etcd options. That is, our original request to have a role available that can do compaction across the network is no longer required, as we (or anyone else) can now trigger compactions on a desired interval using etcd itself. I'll leave it to you to determine whether you want to do anything further here; it's probably good to have more finely-grained control of the permissions, but for our use case it is no longer required.
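As a sketch of that server-side configuration (the retention values are illustrative; consult the etcd 3.3 docs for the exact semantics of each mode):

```bash
# etcd 3.3+ can compact on its own, with no client role required.
# Pick one mode:

# Periodic mode: keep roughly the last 30 minutes of history.
etcd --auto-compaction-mode=periodic --auto-compaction-retention=30m

# Revision mode: keep only the most recent 1000 revisions.
etcd --auto-compaction-mode=revision --auto-compaction-retention=1000
```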
We just ran into this issue.
This commit adds new etcd client certificates to be generated in testing environments, since with etcd RBAC enabled each certificate's CN will represent a separate user. The intention is to have the following users:
- root - a fully privileged user for administrative actions
- kube-apiserver - a user dedicated to kube-apiserver, also fully privileged for the time being because of etcd-io/etcd#8458
- prometheus - a user for Prometheus to scrape etcd metrics

For local testing, we also add rendering of a few scripts, which can be used for testing with etcdctl and for manually enabling RBAC on the etcd cluster.

Signed-off-by: Mateusz Gozdek <[email protected]>
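As a sketch of how a per-component etcd user can be tied to a certificate, assuming etcd runs with client cert auth so that the certificate CN is mapped to the etcd user name; the file names and CA material below are illustrative:

```bash
# Issue a client certificate whose CN matches the etcd user it should act as.
# With --client-cert-auth on the etcd server and auth enabled, etcd treats
# the certificate CN as the authenticated user name.
openssl genrsa -out kube-apiserver.key 2048
openssl req -new -key kube-apiserver.key \
  -subj "/CN=kube-apiserver" -out kube-apiserver.csr
openssl x509 -req -in kube-apiserver.csr \
  -CA etcd-ca.crt -CAkey etcd-ca.key -CAcreateserial \
  -out kube-apiserver.crt -days 365
```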
Change the metrics port of etcd from `https` to `http` because:
- When you keep the metrics port on https you need a certificate to scrape that endpoint. You can't simply skip the TLS check and expect to get the data; a client cert is needed.
- Providing the apiserver client cert to the Prometheus operator is counterproductive for security, so it is not a very viable option, because this cert has root permissions on the etcd cluster.
- We could create another user that has permissions to scrape the metrics endpoint only, but it is not trivial. See the upstream issue, which describes how certificate-based access to etcd is either access to everything or nothing. Issue: etcd-io/etcd#8458.

Signed-off-by: Suraj Deshmukh <[email protected]>
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Problem
Enabling auth on etcd v3, when it is used by any Kubernetes cluster, will force the cluster into read-only mode within a few days.
The database slowly (or rapidly, on a high-load cluster) grows as extra revisions accumulate, until it maxes out and goes into read-only mode.
This is because the Kubernetes API server no longer has permission to perform its compaction every five minutes. Enabling etcd auth restricts all administration functions to users granted the "root" role only.
Reproduction
System info
etcd version:
Kubernetes version: 1.7.2
Reproduction steps
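The original steps were not captured here; the following is a reconstructed sketch of the scenario, with the user names, password, and revision number as placeholders, and assuming a root user already exists (required before enabling auth):

```bash
# 1. Create a non-root user for the apiserver, limited to /registry/ keys.
etcdctl user add apiserver                 # prompts for a password
etcdctl role add k8s-apiserver
etcdctl role grant-permission k8s-apiserver readwrite /registry/ --prefix
etcdctl user grant-role apiserver k8s-apiserver

# 2. Turn authentication on.
etcdctl auth enable

# 3. Point kube-apiserver at etcd using the non-root identity. Its periodic
#    compaction requests are now denied, so revisions pile up until the
#    database maxes out and the cluster goes read-only. The denial can be
#    reproduced directly:
etcdctl --user apiserver:PASSWORD compaction 12345   # placeholder revision; fails
```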
No error message is logged on the etcd server side when this access is denied.
Resolution
It seems to us that auth is fundamentally broken right now when used in conjunction with Kubernetes.
Not being able to delegate some of the less impactful administration commands means that there is an all-or-nothing approach to auth. You either have the root account/role and can modify the entire auth system, or you don't and can't even compact the database. Obviously, giving the root cert out to anything but the etcd nodes themselves would be silly from a security perspective, since it would then be possible for other places to reconfigure your authn/z setup.
The best fix for this, in my opinion, is to allow access privileges to be granted to individual administration commands. Specifically, defrag and compaction should be delegable to other roles in the auth system.