Run ListMetrics calls with goroutines. #53
base: live
Conversation
Eliminate global labelMap and build a labelSet for metrics in ensureLabelConsistencyForMetrics
…tch to structured logging (#7)
* Pass logger to the different structs
* Use a builder for the AWS services
* Use job-specific loggers to propagate properties
* Merge latest release 0.34.0-alpha
* Add missing metric
* Move AWS metrics back to prometheus.go and expose them as an array in update.go
* The go workflow should run on PRs for live
* Remove unused field
Execute ListMetrics calls in separate goroutines (one per metric), similar to how GetMetricData requests are handled. This change also removes the semaphore (it's not used in GetMetricData either) for the sake of making the code easier to reason about; it can easily be added back if we hit issues around e.g. rate limits. In local tests, the speedup is particularly noticeable when requesting 4 or more metrics in parallel (especially with EC2/EBS).
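In rough terms, the fan-out looks like the sketch below (illustrative only; it reuses the getFullMetricsList, svc, and clientCloudwatch names from the diff, while the Metric type and result aggregation are assumptions):

```go
var wg sync.WaitGroup
for _, m := range svc.Metrics {
	wg.Add(1)
	go func(m *Metric) { // Metric is a stand-in for the real config type
		defer wg.Done()
		metricsList, err := getFullMetricsList(ctx, svc.Namespace, m, clientCloudwatch)
		if err != nil {
			level.Error(logger).Log("msg", "Failed to get full metric list",
				"err", err, "metric_name", m.Name, "namespace", svc.Namespace)
			return
		}
		// ... append metricsList to a shared slice, guarded by a mutex ...
		_ = metricsList
	}(m)
}
wg.Wait() // block until every ListMetrics goroutine has finished
```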
> This change also removes the semaphore (it's not used in GetMetricData either) for the sake of making the code easier to reason about; it can easily be added back if we hit issues around e.g. rate limits.

Slight concern: errors aren't propagated out of the parallel sections of the code at the moment, so we might want to monitor errors a little more closely after this lands.
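If we do want errors to surface from the goroutines, one option would be golang.org/x/sync/errgroup, which cancels the shared context and returns the first error from Wait (a sketch under the same assumed names as the diff, not what this PR implements):

```go
g, gctx := errgroup.WithContext(ctx)
for _, m := range svc.Metrics {
	m := m // capture the loop variable (required before Go 1.22)
	g.Go(func() error {
		_, err := getFullMetricsList(gctx, svc.Namespace, m, clientCloudwatch)
		return err // the first non-nil error cancels gctx
	})
}
if err := g.Wait(); err != nil {
	return err // propagated out of the parallel section
}
```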
Do we need to worry about rate limiting from AWS in the case of large numbers of metrics?
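If throttling does become a problem, the semaphore can be reintroduced with a plain buffered channel, e.g. (a hypothetical sketch; the limit of 5 is arbitrary):

```go
sem := make(chan struct{}, 5) // at most 5 ListMetrics calls in flight
var wg sync.WaitGroup
for _, m := range svc.Metrics {
	wg.Add(1)
	go func(m *Metric) {
		defer wg.Done()
		sem <- struct{}{}        // acquire a slot
		defer func() { <-sem }() // release it on return
		// ... issue the ListMetrics call as before ...
	}(m)
}
wg.Wait()
```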
metricsList, err := getFullMetricsList(ctx, svc.Namespace, m, clientCloudwatch)
if err != nil {
	level.Error(logger).Log("msg", "Failed to get full metric list", "err", err, "metric_name", m.Name, "namespace", svc.Namespace)
Maybe we could add a Prometheus metric to track both these errors and the zero-resources scenario below?
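Something like the following, perhaps (a sketch using the standard prometheus Go client; the metric name and label are made up):

```go
var listMetricsErrors = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "cloudwatch_list_metrics_errors_total",
		Help: "Number of failed ListMetrics API calls.",
	},
	[]string{"namespace"},
)

func init() {
	prometheus.MustRegister(listMetricsErrors)
}

// in the error branch:
listMetricsErrors.WithLabelValues(svc.Namespace).Inc()
```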
Can we roll this out to a single cluster first to observe any impact?
LGTM, given that we're going to do so.
Force-pushed from f35cc9e to b15fdd9.