Is workingSetBytes of 0 really an indication of a terminated process? #1330
Comments
This is how working set bytes are calculated: https://github.com/google/cadvisor/blob/fbd519ba03978d54cb54ea7ed8ab9d6e3dd64590/container/libcontainer/handler.go#L831-L844
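For reference, a small sketch of what that calculation amounts to (paraphrased from the linked handler, not the exact cAdvisor code; `total_inactive_file` is the cgroup v1 stat key, cgroup v2 uses `inactive_file`):

```go
package main

import "fmt"

// workingSet paraphrases the linked cAdvisor logic: the working set is the
// memory usage minus the inactive file cache, clamped at zero.
func workingSet(usage uint64, memStats map[string]uint64) uint64 {
	ws := usage
	if inactive, ok := memStats["total_inactive_file"]; ok {
		if ws < inactive {
			// When inactive file pages exceed total usage, the result is
			// clamped to 0 -- the value metrics-server later interprets as
			// "container terminated".
			ws = 0
		} else {
			ws -= inactive
		}
	}
	return ws
}

func main() {
	// A mostly idle container whose page cache is largely inactive can end
	// up with inactive_file >= usage and therefore a working set of 0.
	fmt.Println(workingSet(2<<20, map[string]uint64{"total_inactive_file": 3 << 20})) // 0
	fmt.Println(workingSet(8<<20, map[string]uint64{"total_inactive_file": 3 << 20})) // 5242880
}
```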
/assign @dgrisonnet

Thanks @SergeyKanzhelev
I'd add the question of whether metrics-server should be the one deciding when a pod is terminated.
I think it is because the experience is based on parsing prometheus metrics. I'm not sure if pod status is available. We had a short discussion with @mrunalp. We still need to understand exactly how this happens. Once we know, and we confirm that this is expected, we may need to expose additional metrics to help understand that.
/cc @serathius
Metrics server treats `workingSetBytes=0` as an indication that the container has terminated.
@serathius I think in principle that makes sense, however in practice we have seen a case where a single container in a pod reported `workingSetBytes=0` while still running. I'm curious if any progress has been made on the investigation here, or if we have a theory as to how this can happen? Maybe a middle ground here is to just drop the single bad container's metrics?
It's just the status quo. If someone can bring a nicely documented case showing that the kernel can report workingSetBytes=0 and that it's a correct state, we can switch it. We could also consider leaving the decision up to the user. On the other hand, what will the HPA do if there are 10 pods under it and one of them has this issue? Can the HPA make a decision based on the remaining 9 pods?
This issue has not been updated in over 1 year, and should be re-triaged. You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted
FWIW this was solved by migrating to cgroup v2 and we haven't seen the issue since. I don't think there is anything else to do here. /close
@raywainman: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
I see the problem happened with cgroup v2 today.
On the node:
kubectl to the pod:
According to https://github.com/google/cadvisor/blob/master/container/libcontainer/handler.go#L842-L851, the working set is computed as usage minus inactive_file and is clamped to 0 whenever inactive_file is greater than or equal to usage. Could someone shed light on how to fix this issue?
Interesting, we noticed the issue with cgroup v1. Did you notice anything else happening in the node itself? Was it under significant memory pressure?
The node is packed in terms of CPU, but actual usage is not high.
I also have a node where logrotate has a similar problem. Both are very easy to reproduce. @raywainman do you want to reopen this issue, or should I create a new one?
Re-opening this issue seems totally reasonable. Do you mind sharing how to reproduce? That was something we really struggled with on our end.
@dgrisonnet please reopen this issue
Correction. I'm using cgroup v1. @raywainman do you see this problem only on v1 or v2 as well?
We only saw this with cgroup v1, and it seemed to go away when upgrading to cgroup v2.
We also observe this issue with cgroups v1. The container in question is very light (2 MB RSS) and is mostly sleeping, but its working set is sometimes reported as 0. Please reopen this issue, @dgrisonnet!
Any chance anyone could paste some instructions to reproduce this? Would be great to capture that here.
See if you can reproduce it with this. It sometimes triggers 36+ hours after deployment.
What happened:
We observe that sometimes metrics stop flowing for a Pod when one of its containers starts reporting `0` as `working_set_bytes`. The container in question does nothing but sleep for hours and then checks on some files to do some work. We do not have direct access to the repro.
The check that discards Pod metrics in this case is here:
metrics-server/pkg/scraper/client/resource/decode.go
Lines 195 to 198 in ffbdb6f
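Roughly, the guard amounts to something like the sketch below (illustrative types and names, not the actual metrics-server code at those lines):

```go
package sketch

// containerUsage and podMetrics are illustrative stand-ins for the real
// metrics-server types.
type containerUsage struct {
	Name              string
	CPUUsageNanoCores uint64
	WorkingSetBytes   uint64
}

type podMetrics struct {
	Containers []containerUsage
}

// decodePodStats sketches the discard behavior discussed in this issue: a
// single container reporting zero CPU or zero working-set memory is treated
// as terminated, and the whole Pod's sample is dropped.
func decodePodStats(containers []containerUsage) (podMetrics, bool) {
	for _, c := range containers {
		if c.CPUUsageNanoCores == 0 || c.WorkingSetBytes == 0 {
			return podMetrics{}, false // one "terminated-looking" container drops the entire Pod
		}
	}
	return podMetrics{Containers: containers}, true
}
```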
This code was introduced by #759 to filter out situations when the container is already terminated. I would question the reasoning for throwing the entire Pod's stats away when a container is terminated. How would it work in the case of Jobs with two containers that are not restartable and one of them has already terminated?
I also found similar logic and this comment in the heapster repo: kubernetes-retired/heapster#1708 (comment), which says (as expected) that CPU alone is a bad indicator of a bad container. Both memory and CPU need to be checked, while PR #759 ignores the container if either is zero.
What you expected to happen:
A `0` workingSetBytes should not result in missing Pod measurements.
Anything else we need to know?:
One question I still cannot confirm and repro with confidence is whether a `0` workingSetBytes is a legitimate situation. I tried to create a small container under memory pressure, but wasn't able to drive `workingSetBytes = usage - total_inactive_memory` to `0`. It went quite low, but not zero. If we have a repro for this, then the behavior in this repo is definitely incorrect. If this is impossible, then the logic may be legit and there is a bug somewhere else. I am opening this issue for discussion in case community wisdom can help understand this better.
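For anyone trying to confirm this on a node, a rough way to inspect the raw cgroup v1 numbers for a suspect container is sketched below (the cgroup path is a placeholder; substitute the Pod's actual memory cgroup — note the stat that cAdvisor actually subtracts is `total_inactive_file`):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	// Placeholder path -- point this at the container's memory cgroup.
	cg := "/sys/fs/cgroup/memory/kubepods/burstable/pod<uid>/<container-id>"

	usageRaw, err := os.ReadFile(cg + "/memory.usage_in_bytes")
	if err != nil {
		panic(err)
	}
	usage, _ := strconv.ParseUint(strings.TrimSpace(string(usageRaw)), 10, 64)

	f, err := os.Open(cg + "/memory.stat")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Find total_inactive_file, the value subtracted from usage.
	var inactiveFile uint64
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[0] == "total_inactive_file" {
			inactiveFile, _ = strconv.ParseUint(fields[1], 10, 64)
		}
	}

	// Same subtraction and clamping as the cAdvisor handler.
	workingSet := uint64(0)
	if usage > inactiveFile {
		workingSet = usage - inactiveFile
	}
	fmt.Printf("usage=%d total_inactive_file=%d workingSet=%d\n", usage, inactiveFile, workingSet)
}
```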
/sig node
Environment:
Kubernetes distribution (GKE, EKS, Kubeadm, the hard way, etc.):
Container Network Setup (flannel, calico, etc.):
Kubernetes version (use `kubectl version`):
Metrics Server manifest
spoiler for Metrics Server manifest:
spoiler for Kubelet config:
spoiler for Metrics Server logs:
spoiler for Status of Metrics API:
/kind bug