prometheus metric for container healthcheck status #2166

Open
replicajune opened this issue Feb 6, 2019 · 17 comments

Comments

@replicajune

Hi,

As far as I know, no metric is available for the healthcheck status of a container.

I see a metric for the "up" state of a container (container_last_seen), but nothing for what Docker reports in State.Health.Status.

This status isn't really a metric because it returns a string, but I would guess that a boolean for each possible value would be useful (running, healthy, unhealthy are the ones I know of).
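
For illustration, here is a minimal sketch of the "one boolean per possible value" idea, written as one 0/1 gauge per state. The metric name container_health_status and its status label are assumptions made up for this example, not an existing cAdvisor metric. As far as I know, the values Docker reports in State.Health.Status are starting, healthy and unhealthy, while "running" belongs to State.Status.

```python
# Hypothetical metric name; cAdvisor does not expose this metric.
METRIC = "container_health_status"

# Health states assumed to be reported in State.Health.Status.
HEALTH_STATES = ("starting", "healthy", "unhealthy")


def health_samples(name: str, status: str) -> list[str]:
    """Return one exposition-format line per state: 1 for the current state, 0 otherwise."""
    return [
        f'{METRIC}{{name="{name}",status="{state}"}} {int(state == status)}'
        for state in HEALTH_STATES
    ]


if __name__ == "__main__":
    for line in health_samples("foo", "unhealthy"):
        print(line)
```

With this shape, an alert can match on status="unhealthy" without having to interpret a string value.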

@dashpole
Collaborator

dashpole commented Feb 6, 2019

Does an equivalent exist for all container runtimes cAdvisor supports (mesos, containerd, rkt, docker)?

We usually try and stay away from spec-based metrics, as they tend to be runtime-specific, and generate large numbers of metric streams for each container.

@replicajune
Author

I'm not familiar with all the specifications that exist at this time. I'm under the impression (and could be wrong) that the OCI has proposed or will propose something standard for this.

So unfortunately I have no idea.

What I need is a metric about the work a container actually does, rather than the state of a process (container_tasks_state) or whether the container is up.

Docker's HEALTHCHECK instruction and the status it produces help to figure out whether a container is actually doing what it should, and I don't see any metrics for that right now.
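
To make the source of that status concrete, here is a rough sketch that pulls State.Health.Status out of `docker inspect` JSON. The sample document is hand-written and trimmed for illustration, and the function name is hypothetical. It assumes the Health key is missing (or null) when the image defines no healthcheck.

```python
import json
from typing import Optional


def health_status(inspect_output: str) -> Optional[str]:
    """Return State.Health.Status from `docker inspect` JSON, or None without a healthcheck."""
    containers = json.loads(inspect_output)  # `docker inspect` prints a JSON array
    state = containers[0].get("State", {})
    health = state.get("Health")  # assumed absent or null when no HEALTHCHECK is configured
    return health.get("Status") if health else None


# Trimmed, hand-written sample shaped like `docker inspect foo` output (illustrative data).
SAMPLE = (
    '[{"Name": "/foo", "State": {"Status": "running", '
    '"Health": {"Status": "healthy", "FailingStreak": 0}}}]'
)

if __name__ == "__main__":
    print(health_status(SAMPLE))
```

Note that State.Status ("running") and State.Health.Status ("healthy") are separate fields, which is the distinction between "up" and "doing its job" described above.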

@xavs

xavs commented May 29, 2019

This would be one very useful addition.

@BulatSaif

Has anyone found a workaround?

@fontanacalifornia

I am also looking to accomplish this.

@dashpole
Collaborator

The kubelet does have this kind of metric: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/prober/prober_manager.go#L38.
Those metrics are registered at /metrics/probes on the kubelet's port.

But that doesn't help anyone not using kubernetes...

I'm not sure if cAdvisor should take on metrics collection on probes, as it isn't performing them. I believe we currently only fetch the container from docker at container creation time, so this would require us to poll the runtime for the information. I'm not sure we can provide accurate cumulative probe metrics based on sampling the state. It seems like we are bound to miss probe failures.
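
A toy simulation of that concern, under made-up numbers: one probe result per second, and a poller that only samples the state every tenth second. The timeline, interval and function name are all hypothetical. The point is only that a failure falling between two polls is never counted.

```python
def failures_seen_by_sampling(probe_results: list[bool], every: int) -> int:
    """Count failed probes visible to a poller that only looks at every `every`-th result."""
    return sum(1 for ok in probe_results[::every] if not ok)


# Hypothetical timeline: one probe result per second, True means the probe passed.
timeline = [True] * 30
timeline[7] = False   # brief failure that falls between two polls
timeline[20] = False  # failure that happens to coincide with a poll

actual = timeline.count(False)
sampled = failures_seen_by_sampling(timeline, every=10)

if __name__ == "__main__":
    print(f"actual failures: {actual}, failures seen by polling: {sampled}")
```

A gauge of the current health state would not have this problem in the same way, since it only claims to describe the state at sample time rather than a cumulative count.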

@anil4u-04

Hi Team,

Any advice, update, or workaround here would be very helpful for everyone. We need this "health_check" metric very badly.

@serhiiromaniuk

serhiiromaniuk commented Feb 18, 2020

Hi everybody! sum(time() - container_last_seen) by (name) is a workaround for me, but sometimes it works really badly.
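
For anyone unsure what that expression measures, here is a small sketch of the same logic outside PromQL: the age of each container's last-seen timestamp, compared against a threshold. The timestamps, the 30-second threshold and the function name are made up for illustration. Note that this detects containers that stopped being seen, not containers that are running but unhealthy.

```python
def stale_containers(last_seen: dict[str, float], now: float, max_age: float) -> list[str]:
    """Return names of containers not seen for more than max_age seconds."""
    return sorted(name for name, ts in last_seen.items() if now - ts > max_age)


if __name__ == "__main__":
    # Hypothetical last-seen timestamps (Unix seconds), standing in for container_last_seen.
    seen = {"foo": 1000.0, "docker_exporter": 1058.0}
    print(stale_containers(seen, now=1060.0, max_age=30.0))
```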

@serhiiromaniuk

Also, for alerts, sum(rate(container_last_seen{name=~".+"}[5m])) by (container_label_com_docker_compose_service) < 1 with 15s scrapes helps me to stop crying all day.

@mbigras

mbigras commented Apr 3, 2020

It's hard to create alerts based on metrics that disappear, and it also goes against Prometheus best practices. I still don't understand why we can't just use absent() and move on, but you can read more about it here:

https://www.robustperception.io/existential-issues-with-metrics

Recently, a coworker discovered this exporter:

https://github.com/prometheus-net/docker_exporter

It exposes a very valuable metric, docker_container_running_state, which won't disappear when the container stops!

Here's an example:

$ sudo docker run \
	--name docker_exporter \
	--detach \
	--restart always \
	--volume /var/run/docker.sock:/var/run/docker.sock \
	--publish 9417:9417 \
	prometheusnet/docker_exporter
$ sudo docker create --name foo -it ubuntu sleep 10
$ sudo docker start foo
$ curl -s localhost:9417/metrics | grep state
docker_container_running_state{name="foo"} 1
docker_container_running_state{name="docker_exporter"} 1
# wait ten seconds
$ curl -s localhost:9417/metrics | grep state
docker_container_running_state{name="foo"} 0
docker_container_running_state{name="docker_exporter"} 1

@Cobertos

Cobertos commented Aug 28, 2020

Healthchecking should be added to the above repo when prometheus-net/docker_exporter#11 is merged.

@karugaru

karugaru commented Mar 9, 2021

To solve this issue, I wrote an application in Go that exports the state of containers.

https://github.com/karugaru/docker_state_exporter
https://hub.docker.com/r/karugaru/docker_state_exporter

Try it if you like!

@mariusleu

Can someone add this metric for monitoring the HEALTHCHECK of a container? (https://docs.docker.com/engine/reference/builder/#healthcheck)

@Pandede

Pandede commented Nov 16, 2023

Is there any progress on this issue?

@luanpaschoal

Up!

@skast96

skast96 commented Dec 14, 2023

Is there a way to do that in 2023?

@mateuszdrab

mateuszdrab commented May 28, 2024

I just had a play with the source code and put together a somewhat working PoC, which I committed to my fork.

The metric is called container_health_state, but I noticed it reports 0 when there is no health check, so a better way to present this is probably needed.

container_health_state{container_label_org_opencontainers_image_created="",container_label_org_opencontainers_image_description="",container_label_org_opencontainers_image_licenses="",container_label_org_opencontainers_image_revision="",container_label_org_opencontainers_image_source="",container_label_org_opencontainers_image_title="",container_label_org_opencontainers_image_url="",container_label_org_opencontainers_image_version="",id="/user.slice/user-0.slice/user@0.service/init.scope",image="",name=""} 0 1716855349446

container_health_state{container_label_org_opencontainers_image_created="2024-05-08T20:07:29.227Z",container_label_org_opencontainers_image_description="Pi-hole in a docker container",container_label_org_opencontainers_image_licenses="NOASSERTION",container_label_org_opencontainers_image_revision="c2887aeffe4ac7d4d0730e739c4cd5a4ad40e958",container_label_org_opencontainers_image_source="https://github.com/pi-hole/docker-pi-hole",container_label_org_opencontainers_image_title="docker-pi-hole",container_label_org_opencontainers_image_url="https://github.com/pi-hole/docker-pi-hole",container_label_org_opencontainers_image_version="2024.05.0",id="/system.slice/docker-3dc08d059431db016cf7bf1065b11f600a8acd9b7b654ad59fd00596b891d9b1.scope",image="pihole/pihole:latest",name="Pi-hole-Redirect"} 1 1716855349134
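
One possible way to avoid the ambiguous 0, sketched under assumptions: skip the series entirely for containers without a health check, so that 0 only ever means "has a check and is not healthy". The function below is hypothetical and is not taken from the fork. It also assumes a missing check shows up as no status at all or as the string "none".

```python
from typing import Optional


def health_gauge(status: Optional[str]) -> Optional[int]:
    """Map a health status to a gauge value, or None when no series should be emitted."""
    if status in (None, "", "none"):
        return None  # no healthcheck: emit nothing rather than a misleading 0
    # Assumed encoding: 1 for healthy, 0 for any other checked state (starting, unhealthy).
    return 1 if status == "healthy" else 0


if __name__ == "__main__":
    for status in (None, "starting", "healthy", "unhealthy"):
        print(status, "->", health_gauge(status))
```

The trade-off is that alerts then need to tolerate missing series for containers that never define a check.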

Does anyone want to give it a try?

You'll have to compile cAdvisor from source, or I can provide a compiled binary.
