ELK support: understanding metrics, displaying graphs in Kibana 4, tuning #951

gregbkr · 2015-11-04T13:09:04Z

Hello everyone,

First, thank you, cAdvisor team, for making this product available for Docker infrastructure. It's a great monitoring solution and so easy to deploy! :-)

I am running cAdvisor (latest version) with ELK support (Elasticsearch 1.7 | Logstash 1.5.3 | Kibana 4) on Docker.
So far it works, and I can get some graphs in Kibana 4. Please excuse me if this is the wrong place for these questions :-O. I need help on several points:

MACHINE_NAME ID
The machine_name field contains the ID of the cAdvisor container. It would be great to have the name of the container instead (as you did with the container_Name field).
LIMITS
How can I find the available limits (CPU, RAM, network) so that I can set the graph limits or display percentages? I can't find these values in the logs, but they are shown on the cAdvisor live web page.
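Once the limits are known, turning raw values into percentages is a simple division. A minimal sketch, assuming the limits are read from cAdvisor's machine-info endpoint (for example `/api/v1.3/machine` on the cAdvisor web port), whose JSON, as far as I know, includes `num_cores` and `memory_capacity` (in bytes); please check those field names against your cAdvisor version. The helper `to_percentages` is hypothetical.

```python
import json


def to_percentages(machine_info: dict, memory_usage_bytes: int, cpu_cores_used: float) -> dict:
    """Express memory and CPU usage as percentages of the machine's capacity.

    machine_info is assumed to be the parsed JSON of cAdvisor's machine
    endpoint, with 'num_cores' and 'memory_capacity' (bytes) fields.
    cpu_cores_used is a rate in cores (e.g. 1.5 = one and a half cores busy).
    """
    num_cores = machine_info["num_cores"]
    memory_capacity = machine_info["memory_capacity"]
    return {
        "memory_pct": 100.0 * memory_usage_bytes / memory_capacity,
        "cpu_pct": 100.0 * cpu_cores_used / num_cores,
    }


if __name__ == "__main__":
    # Hypothetical machine info: 2 cores, 8 GiB of RAM.
    info = json.loads('{"num_cores": 2, "memory_capacity": 8589934592}')
    print(to_percentages(info, memory_usage_bytes=4 * 1024**3, cpu_cores_used=0.5))
    # → {'memory_pct': 50.0, 'cpu_pct': 25.0}
```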
GRAPH VALUES LOGIC
Regarding container_Name and the graphs, can you confirm the following?
"/" = root = /docker + /user + other processes running locally
"/docker" = container1 + container2 + etc
"/user" = user session
For CPU, this seems to hold.

For RAM, /docker is nearly zero while some containers use 4 GB of RAM, so my assumptions must be wrong.

For page faults and network, the cAdvisor container exceeds the root "/", so the root does not represent the sum of all containers :-(
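One way to test the hierarchy hypothesis is to compare, for one metric at one timestamp, the value reported for a parent path such as /docker with the sum over its direct children. This is a sketch over plain (container_Name, value) pairs, for example exported from a single timestamp in Elasticsearch; the function name `children_sum_vs_parent` is hypothetical. For cumulative counters (CPU time, page faults, network bytes), both sides must come from the same sample time.

```python
def children_sum_vs_parent(samples: dict, parent: str) -> tuple:
    """Return (parent_value, sum_of_direct_children) for one metric.

    samples maps a cgroup path (the container_Name value, e.g. "/docker/abc")
    to that path's value for a single metric at a single timestamp.
    """
    prefix = parent.rstrip("/") + "/"  # "/" stays "/", "/docker" becomes "/docker/"
    children_total = sum(
        value
        for name, value in samples.items()
        # a direct child starts with the prefix and has no further "/" after it
        if name != parent and name.startswith(prefix) and "/" not in name[len(prefix):]
    )
    return samples[parent], children_total


if __name__ == "__main__":
    # Hypothetical values for a single metric at a single timestamp.
    samples = {"/": 100, "/docker": 60, "/docker/a": 40, "/docker/b": 20, "/user": 30}
    print(children_sum_vs_parent(samples, "/docker"))  # → (60, 60)
    print(children_sum_vs_parent(samples, "/"))        # → (100, 90)
```

If the children's sum is larger than the parent's value, then for that metric the parent is not an aggregate of its children.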
CPU METRIC
How should the CPU usage metric be interpreted?
I can see the cAdvisor container's stats.cpu.usage.total = 30,000,000,000,000.
"/" (root) seems to show cpu.usage.total = 180,000,000,000,000 (and still growing).
I found this definition of cpu.usage: "The usage value is the delta of cumulative CPU usage from the beginning of the minute to the end of the minute."
On my server (cat /proc/cpuinfo):
bogomips: 5200.18 x 2 processors = 10,400.36 instructions per second, i.e. at most 624,021.6 instructions per minute.
So I don't understand how a container can report such a large number...
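As far as I know, cAdvisor's v1 API documents cpu.usage.total as cumulative CPU time in nanoseconds since the container started, not as a count of instructions. That would explain the large values: 30,000,000,000,000 ns is 30,000 s (about 8.3 CPU-hours), and 180,000,000,000,000 ns is 50 CPU-hours. Assuming that unit, the sketch below turns two samples into the average number of busy cores over the interval, then into a percentage of the machine. The function names are hypothetical.

```python
def cpu_cores_used(t0_s: float, usage0_ns: int, t1_s: float, usage1_ns: int) -> float:
    """Average number of cores busy between two samples.

    Assumes usage*_ns is a cumulative CPU-time counter in nanoseconds
    (like cpu.usage.total) and t*_s are the sample timestamps in seconds.
    """
    wall_ns = (t1_s - t0_s) * 1e9
    if wall_ns <= 0:
        raise ValueError("samples must be in increasing time order")
    return (usage1_ns - usage0_ns) / wall_ns


def cpu_percent(cores_used: float, num_cores: int) -> float:
    """Share of the whole machine, from 0 to 100 %."""
    return 100.0 * cores_used / num_cores


if __name__ == "__main__":
    # Hypothetical samples 60 s apart, with 90 s of CPU time accumulated.
    cores = cpu_cores_used(0.0, 30_000_000_000_000, 60.0, 30_090_000_000_000)
    print(cores, cpu_percent(cores, 2))  # → 1.5 75.0
```

Working on deltas rather than raw totals is what makes the graph meaningful: the raw counter only grows, which matches the "still growing" value seen for "/".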
PAGE FAULTS & TUNING
cAdvisor generates lots of page faults and usually crashes after a few days. Is there any tuning you would recommend?
DATA IN ARRAYS
I see the fields for eth0, but nothing for eth1. I am not sure what I am doing wrong.

The same goes for other metrics (I/O, filesystem): I can see the fields inside an array, but I can't build any visualization with them, and Kibana says "array is not well supported".
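A likely reason, if I read cAdvisor's v1 NetworkStats struct correctly, is that the default interface's counters are embedded directly under network, while every interface (eth1 included) only appears inside the network.interfaces array; filesystem is likewise an array. Because Kibana 4 handles arrays of objects poorly, one workaround is to turn each array into an object keyed by the item's name or device field wherever you can intercept the documents before indexing (for instance in a re-indexing script). The key names "name" and "device" are assumptions, and `arrays_to_objects` is a hypothetical helper, not part of cAdvisor or Logstash.

```python
def arrays_to_objects(value, key_fields=("name", "device")):
    """Recursively replace arrays of objects with objects keyed by each
    item's name/device field (falling back to the list index), so that
    every interface or filesystem gets its own field path.

    Note: items sharing the same label overwrite each other.
    """
    if isinstance(value, dict):
        return {k: arrays_to_objects(v, key_fields) for k, v in value.items()}
    if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        keyed = {}
        for index, item in enumerate(value):
            label = next((str(item[k]) for k in key_fields if k in item), str(index))
            keyed[label] = arrays_to_objects(item, key_fields)
        return keyed
    return value  # scalars, empty lists and lists of scalars are left unchanged


if __name__ == "__main__":
    # Hypothetical, trimmed-down stats document.
    doc = {
        "network": {
            "name": "eth0", "rx_bytes": 10,
            "interfaces": [{"name": "eth0", "rx_bytes": 10}, {"name": "eth1", "rx_bytes": 5}],
        },
        "filesystem": [{"device": "/dev/sda1", "usage": 100}],
    }
    out = arrays_to_objects(doc)
    print(out["network"]["interfaces"]["eth1"]["rx_bytes"])  # → 5
```

With this shape, a Kibana field path such as network.interfaces.eth1.rx_bytes is a plain numeric field that can be graphed like any other.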