This document describes the API part of the MVP version of the Resource Metrics API effort in Kubernetes. Once agreement is reached, the document will be extended to also cover implementation details. The shape of the effort may also change once we have more well-defined use cases.
The goal of the effort is to provide resource usage metrics for pods and nodes through the API server. This will be a stable, versioned API that core Kubernetes components can rely on. In the first version, only the well-defined use cases will be handled, although the API should be easily extensible for potential future use cases.
This section describes the well-defined use cases that should be handled in the first version. Use cases not listed below are out of scope for the MVP version of the Resource Metrics API.
HPA uses the latest value of CPU usage, averaged over a 1 minute window (the window may change in the future). For performance reasons, the data for a given set of pods (defined either by a pod list or a label selector) should be accessible in a single request.
In order to schedule best-effort pods, the scheduler requires node-level resource usage metrics, averaged over a 1 minute window (the window may change in the future). The metrics should be available for all resources supported by the scheduler. Currently the scheduler does not need this information, because it schedules best-effort pods without considering node usage. However, making the metrics available in the API server is a prerequisite for adding the ability to take node usage into account when scheduling best-effort pods.
This section describes other use cases that were considered and explains why they are out of scope for the MVP version.
HPA requires the latest value of application-level metrics.
The design of the pipeline for collecting application-level metrics should be revisited, and it is not clear whether application-level metrics should be available in the API server at all, so this use case will not be supported initially.
The Cluster Federation control system might want to consider cluster-level usage (in addition to cluster-level request) of running pods when choosing where to schedule new pods. Although Cluster Federation is still in design, we expect the metrics API described here to be sufficient. Cluster-level usage can be obtained by summing the usage of all nodes in the cluster, as sketched below.
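A minimal sketch of that aggregation in Go, assuming the `NodeMetrics` type proposed later in this document is in scope; the helper name `clusterCPUMilli` is hypothetical:

```go
// clusterCPUMilli sums the mean CPU usage (in millicores) reported for each
// node, yielding a rough cluster-level CPU usage figure.
func clusterCPUMilli(nodes []NodeMetrics) int64 {
	var total int64
	for _, n := range nodes {
		if cpu, ok := n.Usage[v1.ResourceCPU]; ok {
			total += cpu.MilliValue()
		}
	}
	return total
}
```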
This feature is not yet specified or implemented, although it seems reasonable to give users information about resource usage at the pod/node level.
Since the feature has not been fully specified yet, it will not be supported initially in the API, although a reasonable implementation of the feature will probably be possible anyway.
In order to draw graphs, the Kubernetes dashboard requires resource usage in time-series format covering a relatively long period of time. Aggregations should also be possible at various levels, including replication controllers, deployments, services, etc.
Since this use case is complicated, it will not be supported initially in the API; the dashboard will query Heapster directly, using some custom API there.
Initially the metrics API will be in a separate API group called `metrics`. Later, if we decide to have `Node` and `Pod` in different API groups, `NodeMetrics` and `PodMetrics` should also be in different API groups.
The proposed schema is as follows. Each top-level object has `TypeMeta` and `ObjectMeta` fields to be compatible with Kubernetes API standards.
```go
type NodeMetrics struct {
	unversioned.TypeMeta
	v1.ObjectMeta

	// The following fields define the time interval from which metrics were
	// collected in the following format: [Timestamp-Window, Timestamp].
	Timestamp unversioned.Time
	Window    unversioned.Duration

	// The memory usage is the memory working set.
	Usage v1.ResourceList
}

type PodMetrics struct {
	unversioned.TypeMeta
	v1.ObjectMeta

	// The following fields define the time interval from which metrics were
	// collected in the following format: [Timestamp-Window, Timestamp].
	Timestamp unversioned.Time
	Window    unversioned.Duration

	// Metrics for all containers are collected within the same time window.
	Containers []ContainerMetrics
}

type ContainerMetrics struct {
	// Container name corresponding to the one from v1.Pod.Spec.Containers.
	Name string

	// The memory usage is the memory working set.
	Usage v1.ResourceList
}
```
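For illustration, here is a minimal sketch of a populated `NodeMetrics` object, assuming the type definitions above are in scope and the `k8s.io/kubernetes/pkg/api` packages of the time; the node name and usage values are hypothetical.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/kubernetes/pkg/api/resource"
	"k8s.io/kubernetes/pkg/api/unversioned"
	"k8s.io/kubernetes/pkg/api/v1"
)

func main() {
	// Hypothetical metrics for one node over the default 1 minute window.
	m := NodeMetrics{
		TypeMeta:   unversioned.TypeMeta{Kind: "NodeMetrics", APIVersion: "metrics/v1alpha1"},
		ObjectMeta: v1.ObjectMeta{Name: "node-1"},
		Timestamp:  unversioned.Now(),
		Window:     unversioned.Duration{Duration: time.Minute},
		Usage: v1.ResourceList{
			v1.ResourceCPU:    resource.MustParse("250m"), // mean CPU usage over the window
			v1.ResourceMemory: resource.MustParse("1Gi"),  // memory working set
		},
	}

	cpu := m.Usage[v1.ResourceCPU]
	fmt.Printf("%s used %s CPU over the last %s\n", m.Name, cpu.String(), m.Window.Duration)
}
```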
By default, `Usage` is the mean of the samples collected within the returned time window. The default time window is 1 minute.
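For example, if the underlying collector samples usage every 10 seconds (a hypothetical rate), the CPU usage reported for a 1 minute window is the mean of those 6 samples.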
All endpoints are GET endpoints, rooted at `/apis/metrics/v1alpha1/`. There won't be support for the other REST methods.
The list of supported endpoints:

- `/nodes` - all node metrics; type `[]NodeMetrics`
- `/nodes/{node}` - metrics for a specified node; type `NodeMetrics`
- `/namespaces/{namespace}/pods` - all pod metrics within a namespace, with support for `all-namespaces`; type `[]PodMetrics`
- `/namespaces/{namespace}/pods/{pod}` - metrics for a specified pod; type `PodMetrics`
The following query parameters are supported:

- `labelSelector` - restrict the list of returned objects by labels (list endpoints only)
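As an illustration (not a prescribed client), the sketch below queries the pod list endpoint with a label selector, assuming `kubectl proxy` is serving the API on localhost:8001 and that pods labeled `app=frontend` exist:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"net/url"
)

func main() {
	// Restrict the returned list of PodMetrics by label (list endpoints only).
	selector := url.QueryEscape("app=frontend")
	resp, err := http.Get(
		"http://localhost:8001/apis/metrics/v1alpha1/namespaces/default/pods?labelSelector=" + selector)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // raw JSON encoding of []PodMetrics
}
```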
In the future we may want to introduce the following params: `aggregator` (`max`, `min`, `95th`, etc.) and `window` (`1h`, `1d`, `1w`, etc.), which will allow getting other aggregates over custom time windows.
Depending on further requirements, the following features may be added:
- support for more metrics
- support for application level metrics
- watch for metrics
- possibility to query for window sizes and aggregation functions (though single window size/aggregation function per request)
- cluster level metrics