"etcdctl auth enable" command breaks Kubernetes cluster #8458

Closed

struz opened this issue Aug 28, 2017 · 19 comments

@struz

struz commented Aug 28, 2017

Problem

Enabling auth on etcd v3 when it is used by any Kubernetes cluster will force the cluster into read-only mode within a few days.

The database slowly (or rapidly, on a high-load cluster) accumulates extra revisions and grows until it maxes out its storage quota and goes into read-only mode.

This is because the Kubernetes API server no longer has access to perform compaction every five minutes (see here for the source). Enabling etcd auth will restrict all administration functions to users granted the "root" role only.

Reproduction

System info

etcd version:

etcd-0 ~ # etcd --version
etcd Version: 3.2.1
Git SHA: 61fc123
Go Version: go1.8.3
Go OS/Arch: linux/amd64

Kubernetes version: 1.7.2

Reproduction steps

  1. Create a certificate, etcd-server, with CN=etcd-server. This is used to allow Kubernetes to authenticate with the etcd cluster.
  2. Run etcd with the following parameters:
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://etcd-0.<domain>:2379"
Environment="ETCD_CERT_FILE=/etc/ssl/etcd/etcd-server.pem"
Environment="ETCD_CLIENT_CERT_AUTH=true"
Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=https://etcd-0.<url>:2380"
Environment="ETCD_INITIAL_CLUSTER=etcd-0.<domain>=https://etcd-0.<domain>:2380,etcd-1.<domain>=https://etcd-1.<domain>:2380,etcd-2.<domain>=https://etcd-2.<domain>:2380"
Environment="ETCD_KEY_FILE=/etc/ssl/etcd/etcd-server-key.pem"
Environment="ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379"
Environment="ETCD_LISTEN_PEER_URLS=https://<ip_address>:2380"
Environment="ETCD_NAME=etcd-0.<domain>"
# Environment="ETCD_DATA_DIR=/media/data"
Environment="ETCD_PEER_CERT_FILE=/etc/ssl/etcd/etcd-peer.pem"
Environment="ETCD_PEER_KEY_FILE=/etc/ssl/etcd/etcd-peer-key.pem"
Environment="ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ca.pem"
Environment="ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ca.pem"
Environment="ETCD_METRICS=extensive"
  3. Enable auth on the cluster with the following commands:
$ etcdctl role add apiserver
$ etcdctl role grant-permission apiserver --prefix=true readwrite /registry/
$ etcdctl user add etcd-client:password
$ etcdctl user grant-role etcd-client apiserver
$ etcdctl auth enable
  4. Set up Kubernetes (we used v1.7.2) using the etcd cluster as a backend
  5. Kubernetes can now read and write everything in its keyspace fine, but it will not be able to run compaction commands. The error provided in the Kube API server logs is:
kube-system/<hostname>[kube-apiserver]: E0822 03:53:13.000640       1 compact.go:123] etcd: endpoint ([https://etcd-0.<domain>:2379 https://etcd-1.<domain>:2379 https://etcd-2.<domain>:2379]) compact failed: etcdserver: permission denied

No error message is logged on the etcd server side when this access is denied.
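
For reference, the denial can be reproduced directly with etcdctl using the non-root user (a sketch; the endpoint, CA path, and revision number are placeholders, and the output is abbreviated):

$ export ETCDCTL_API=3
$ etcdctl --endpoints https://etcd-0.<domain>:2379 --cacert /etc/ssl/etcd/ca.pem \
    --user etcd-client:password put /registry/test ok    # key writes succeed
OK
$ etcdctl --endpoints https://etcd-0.<domain>:2379 --cacert /etc/ssl/etcd/ca.pem \
    --user etcd-client:password compaction 1000          # compaction is denied
Error: etcdserver: permission denied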

Resolution

It seems to us that auth is fundamentally broken right now when used in conjunction with Kubernetes.

Not being able to delegate some of the less impactful administration commands means that there is an all-or-nothing approach to auth. You either have the root account/role and can modify the entire auth system, or you don't and can't even compact the database. Obviously, giving the root cert out to anything but the etcd nodes themselves would be unwise from a security perspective, since it would then be possible for other parties to reconfigure your authn/authz settings.

The best fix for this, in my opinion, is to allow modular access privileges to be granted for different administration commands. Specifically, defrag and compaction should be delegable to other roles in the auth system.

@struz struz changed the title etcdctl auth enable command breaks Kubernetes cluster "etcdctl auth enable" command breaks Kubernetes cluster Aug 28, 2017
@mitake
Contributor

mitake commented Aug 29, 2017

Hi @struz , thanks for reporting the problem.

The best fix for this, in my opinion, is to allow modular access privileges to be granted to different administration commands. Specifically defrag and compaction should be able to be delegated to other roles in the auth system.

This opinion seems reasonable to me. Adding another special role for maintenance purposes would be useful for avoiding multiple user accounts or certs on the client side. What do you think? @xiang90 @heyitsanthony If it is ok with etcd's other design policies, I'll work on it.

@heyitsanthony
Contributor

@mitake special roles probably won't be flexible enough. Could roles encode the RPC permissions? Something like:

message ServicePermission {
  string service = 1;    // gRPC service name, e.g. "KV"
  string procedure = 2;  // RPC method name, e.g. "Compact"
}

message Role {
  bytes name = 1;

  repeated Permission keyPermission = 2;             // existing key-range permissions
  repeated ServicePermission servicePermission = 3;  // proposed per-RPC permissions
}

The service permission for Compact would be {Service: "KV", Procedure: "Compact"}

@xiang90
Contributor

xiang90 commented Aug 29, 2017

It seems to us that auth is fundamentally broken right now when used in conjunction with Kubernetes.

When you want to do application-driven compaction, you almost assume that the application actually owns etcd. That is true for Kubernetes. We suggest people view k8s as the owner of the etcd cluster.

@xiang90
Contributor

xiang90 commented Aug 29, 2017

/cc @jpbetz

@jpbetz
Contributor

jpbetz commented Aug 29, 2017

I believe @xiang90 is correct. For additional details on the kubernetes side, please direct questions to sig-auth: https://github.com/kubernetes/community/tree/master/sig-auth

@mitake
Contributor

mitake commented Aug 30, 2017

@heyitsanthony @xiang90 @jpbetz thanks for your comments. It seems that more consideration is required before working on it.

@struz could you share more detail about your use case? Do you have to store information in your k8s' etcd? Is having multiple etcd clusters not acceptable?

@struz
Author

struz commented Aug 31, 2017

@jpbetz I'm not sure why directing questions at sig-auth would be relevant here. This is an inability of the app to do something. It either can or can't be done with the auth configuration inside etcd; Kubernetes itself has nothing to do with it, and was merely affected by it.


@mitake sure thing.
Our k8s cluster has to run Calico with the etcdv2 backend to get the level of network access control we need for our environments. Some of our clusters run untrusted code on Pods and so we need the ability to limit what those Pods can talk to at the network level.

We could run a separate etcd cluster for the Calico data, but the overhead of monitoring and maintaining a separate etcd cluster is not something we want to take on as a default option.

Also, because of the nature of running some untrusted code, we would prefer not to give our k8s nodes a certificate that has root access to etcd, so that we mitigate risk in the rare case of a container breakout.

To play devil's advocate to myself, no pods should ever be running on the nodes with this certificate anyway (the k8s apiserver nodes), but not being able to secure an etcd cluster like this seems to undermine the overall effect of auth. Given that we can lock the apiserver down to writing only '/registry/*' keys, the fact that doing so also breaks the cluster by breaking auto compaction is not ideal.


Overall, it feels wrong to me to require a client of a database to run the compaction itself. This is like asking a Java app to run its own garbage collection, or a postgres user to vacuum its own database. These might happen in specific circumstances to cope with specific load patterns in the apps, but it should not be the norm, and in both cases there are ways to use automatic cleanup. The automatic compaction built into etcd is insufficient because a) it can only run hourly, and b) compaction requests from a non-root source are not possible due to auth.

To draw another parallel with postgres, this would be like needing to give all users of your database administrator access so that they can save you, the administrator, space and prevent the cluster from eventually falling over. Of course, I am not an expert on the etcd architecture and I may be wrong with these comparisons. If so, I am keen to hear why things were done this way for etcd.

At the very least, the fact that the recommendation is to give the root certificate to an owning application, e.g. Kubernetes, should be clearly documented somewhere. This could be in the etcd docs, the k8s docs, or both. I did not once encounter anything that mentioned this, and I only discovered that auto-compaction was even a thing in k8s after the cluster broke.

To fix this problem, I think that either auto compaction needs to become much more granular and configurable, auth to the cluster needs to be more granular, or both.

@mitake
Contributor

mitake commented Sep 1, 2017

@struz thanks for sharing the detail. I understand the motivation. How about disabling compaction requests from the apiserver and issuing the requests from an external management component (e.g. a cron job running etcdctl with root privileges)? I created an experimental change of k8s for disabling the compaction here: kubernetes/kubernetes#51765
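
A minimal sketch of such a cron-driven compaction job, assuming the root user authenticates with a password and jq is available (the endpoint, credentials, and cert path are placeholders, and the exact JSON field path may differ between etcdctl versions):

#!/bin/sh
# Periodically compact etcd using the root user (run from cron on an etcd node).
export ETCDCTL_API=3
ENDPOINT="https://etcd-0.<domain>:2379"
AUTH="--user root:<root-password> --cacert /etc/ssl/etcd/ca.pem"

# Look up the current revision from the endpoint status JSON output.
REV=$(etcdctl --endpoints "$ENDPOINT" $AUTH endpoint status -w json \
      | jq '.[0].Status.header.revision')

# Compact the keyspace history up to that revision.
etcdctl --endpoints "$ENDPOINT" $AUTH compaction "$REV"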

@struz
Author

struz commented Sep 4, 2017

@mitake yes, we ended up making a cron job (although, unfortunately, making sure this cron job only runs once in an HA system is a bit harder than we would like), but it works just fine. Thanks for raising that PR.

@mitake
Contributor

mitake commented Sep 15, 2017

@struz I added a new option, --etcd-compaction-interval, on the kube-apiserver side: kubernetes/kubernetes#51765. If you pass 0 to the option, the apiserver won't issue compaction requests. The PR is already approved, so it will be merged in the near future. Could you try it?
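
Usage would look roughly like this (a sketch; all other apiserver flags are omitted and the etcd endpoints are placeholders):

$ kube-apiserver \
    --etcd-servers=https://etcd-0.<domain>:2379,https://etcd-1.<domain>:2379 \
    --etcd-compaction-interval=0 \
    ...
# --etcd-compaction-interval=0 disables compaction requests from the apiserver;
# compaction then has to be triggered externally (e.g. by a root-privileged cron job).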

@youngnick

youngnick commented Sep 17, 2017

I think there's some confusion here over what this ticket is about. (For context, I work on the same team as @struz.) We understand that the Kubernetes apiserver manages the compaction - that is not the problem. The service that we're using allows us to perform the same compaction from the etcd node, so that we don't have to grant root access to our single persistent datastore across the network.

I'd like to hear from @xiang90 and @jpbetz - is the decision to require root access, and root access alone, for common database maintenance operations an architectural decision or a resourcing one? If it's an architectural one, I disagree, but it's your call to make. If it's a resourcing decision, that's fine, but right now, reading this ticket, it seems like the only answer we have is "kubernetes should manage the compaction" - which doesn't feel like an answer to the actual question we're asking.

To be clear - here is what I would like to know:

  • Is the idea of creating an assignable role that can trigger a compaction (and only a compaction) a non-starter? As in, something that the project is architecturally opposed to? If so, why?

I understand that the consumer of the database (in this case Kubernetes) is the one that should manage the compaction. But, I'd like the tools to be able to do it securely.

Regardless, I think documentation updates to the authentication and maintenance docs would be helpful to highlight that compaction is only possible with the root user/role right now. Having the process be "I'm going to enable authentication for etcd. Hmm, why is my kubernetes cluster now dying?" is not good for anyone - we need to make this information more discoverable. I'm having a look through the documentation now and will prepare a PR.

@heyitsanthony
Contributor

@youngnick authorizing RPCs like compaction with non-root roles would be fine to have, but it's not urgent

@xiang90
Contributor

xiang90 commented Sep 18, 2017

@youngnick

What I said is that Kubernetes assumes it owns etcd as it is today. And I wanted to understand why you want to enable auth when using etcd with Kubernetes. If there is a valid reason, then we can prioritize this issue. Neither @struz nor you provided the answer. If you want a feature, it is better to tell us about the general use case and why it is important, instead of telling us x is fundamentally broken for you or your company.

@youngnick

Thanks @xiang90 - that wasn't clear from your answer before.

The proximate reason that we enabled auth right now is that we wanted to have only one etcd cluster to store the main Kubernetes bits and also the Calico Network Policy bits (which need to be accessible from all Nodes, not just the ones running the apiserver and controller-manager). I understand this is possibly inadvisable, but we were willing to try it out. (Calico uses the v2 backend, Kubernetes the v3, so it kind of works.)

However, the secondary, and more long-term, reason for enabling auth is that I believe auth should be enabled everywhere. I think that granting a specific set of privileges rather than root is better security practice and leads to fewer reliability problems in the long term.

I agree that Kubernetes does assume that it runs with root on etcd right now. As I just said, though, I don't think that's desirable in the longer term. I had hoped that, by starting the process of teasing out the bits that require root authentication right now, we could begin the process of making Kubernetes not require root, and document the permissions it does require.

I have no objection whatsoever to you saying "this is a problem, but we cannot prioritise it right now". That's completely acceptable, I acknowledge that we are running in an unusual configuration right now. However, getting this onto your backlog now will mean that it can be done sooner rather than later, when you have the resources available. I'm not in any way trying to throw stones at you about it. I just want some acknowledgement that this is an issue that will need to be looked at at some point.

Alternatively, if you believe that remote consumers of the database will always be the owner and require root access for maintenance tasks, that's your call. I just need to know if that is the case so I can prioritise downstream resources for us.

@xiang90
Contributor

xiang90 commented Sep 18, 2017

@youngnick

Thanks for the explanation. I agree with what you said above.

From my understanding this is not super urgent for you either. @mitake and @heyitsanthony already have this on their list. We will prioritize this accordingly.

@youngnick

Yes, this is not urgent for us right now, we have a workaround.

I'll put this on our 'tickets to watch' list and we can all move on until the work can be prioritised. Thanks!

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Oct 3, 2017
Automatic merge from submit-queue (batch tested with PRs 51765, 53053, 52771, 52860, 53284). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add an option for turning on/off compaction from apiserver in etcd3 mode


**What this PR does / why we need it**:

This commit adds an option for controlling compaction requests to
etcd3 from the apiserver. There are situations where the apiserver cannot
fully own its etcd cluster (e.g. sharing it with canal). In such a case, the
apiserver should have limited access in terms of etcd's auth functionality,
so it doesn't have the privilege to issue compaction requests. That means
the compaction requests should be issued by another component, and the
apiserver's own compaction requests are needless.

For such use cases, this commit adds a new flag,
storagebackend.Config.DoCompaction. If the flag is true (the default), the
apiserver issues compaction requests as it does today. If it is false, the
apiserver doesn't issue the requests.

**Related issue (etcd)**
etcd-io/etcd#8458
/cc @xiang90 @struz

**Release note:**
```release-note
Add --etcd-compaction-interval to apiserver for controlling request of compaction to etcd3 from apiserver.
```
sttts pushed a commit to sttts/apiserver that referenced this issue Oct 4, 2017
Add an option for turning on/off compaction from apiserver in etcd3 mode (Kubernetes-commit: 5dfea9e6091a818f90e8df5afcc750b5f01fa9b7)

sttts pushed a commit to sttts/apiserver that referenced this issue Oct 13, 2017
Add an option for turning on/off compaction from apiserver in etcd3 mode (Kubernetes-commit: 5dfea9e6091a818f90e8df5afcc750b5f01fa9b7)

sttts pushed a commit to sttts/apiserver that referenced this issue Oct 14, 2017
Add an option for turning on/off compaction from apiserver in etcd3 mode (Kubernetes-commit: 5dfea9e6091a818f90e8df5afcc750b5f01fa9b7)

sttts pushed a commit to sttts/apiserver that referenced this issue Oct 16, 2017
Add an option for turning on/off compaction from apiserver in etcd3 mode (Kubernetes-commit: 5dfea9e6091a818f90e8df5afcc750b5f01fa9b7)
@youngnick

Hey @gyuho or @xiang90, it looks like the original cause of this ticket is even less urgent now. With the addition of revision-based auto-compaction in 3.3 (from #8098, it seems), once Kubernetes supports 3.3 we'll just be able to have the Kubernetes docs updated to say: if you turn on etcd auth, turn on these etcd options.

That is, our original request to have a role available that can do compaction across the network is no longer required, as we (or anyone else) can now trigger compactions on a desired interval using etcd itself.
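
For example, something along these lines should cover it once 3.3 is in use (a sketch; the retention value is illustrative and the other etcd flags are omitted):

# revision mode (new in 3.3): keep the most recent 1000 revisions and
# automatically compact everything older, without any client involvement
$ etcd --auto-compaction-mode=revision --auto-compaction-retention=1000 ...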

I'll leave it to you to determine if you want to do anything further here; it's probably good to have more finely-grained control of the permissions, but for our use case it is no longer required.

@Quentin-M
Contributor

We just ran into this issue.

invidian added a commit to invidian/libflexkube that referenced this issue Apr 18, 2020
This commit adds new etcd client certificates to be generated in testing
environments, as with etcd RBAC enabled, each certificate's CN will
represent a separate user. The intention is to have following users:
- root - fully-privileged user for administrative actions
- kube-apiserver - user dedicated for kube-apiserver, also fully
  privileged for the time being because of
  etcd-io/etcd#8458.
- prometheus - user for Prometheus to scrape etcd metrics

For local testing, we also add rendering of a few scripts, which can be
used for testing with etcdctl and to manually enable RBAC on the etcd
cluster.

Signed-off-by: Mateusz Gozdek <[email protected]>
surajssd added a commit to kinvolk/lokomotive that referenced this issue May 29, 2020
Change the metrics port of etcd from `https` to `http` because:

- When you keep the metrics port on https you need a certificate to scrape
that endpoint. You can't simply skip the TLS check and expect to get the
data; a client cert is needed.

- Providing the apiserver client cert to the prometheus operator is
counterproductive to security, because this cert has root permissions on
the etcd cluster, so it is not a very viable option.

- We can create another user that has permission to scrape only the metrics
endpoint, but it is not trivial. See the upstream issue, which explains how
cert-based access to etcd is either access to everything or nothing.
Issue: etcd-io/etcd#8458.

Signed-off-by: Suraj Deshmukh <[email protected]>
@stale

stale bot commented Sep 21, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 21, 2022
@stale stale bot closed this as completed Oct 16, 2022

9 participants