[FEATURE] Start GetMetricData calls before all ListMetrics calls finish #1094

kgeckhart · 2023-08-06T15:03:40Z

Is there an existing issue for this?

I have searched the existing issues

Feature description

After the most recent batch of performance fixes, discovery jobs which pull metrics from large AWS environments have very reasonable resource utilization 🎉. The current limitation that can still cause longer scrape times is that all ListMetrics calls must complete before any GetMetricData calls are made. Since the APIs have independent rate limits (ListMetrics is 25 TPS and GetMetricData is ~50 TPS), we can safely start calling GetMetricData before ListMetrics completes.

I think the most idiomatic Go way of going about this is to introduce channels to runDiscoveryJob. The challenge is that the current code is really complex and relatively untestable. I think it needs to be refactored first to dramatically reduce the risk of such an impactful change. I would like to start by decomposing the main steps of runDiscoveryJob into smaller composable/testable "dataflows", listed below:

GetResources
ListMetrics
AssociateMetricsToResources
GetMetricData
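
To make the decomposition a bit more concrete, here is a rough sketch of what the four dataflows could look like as swappable function values composed sequentially (i.e. today's ordering). Every type and function name here (`Flows`, `RunSequential`, `AssociateByARN`, `demoFlows`, and the simplified `Resource`/`Metric` shapes) is made up for illustration and is not the exporter's actual code.

```go
package main

import "fmt"

// Hypothetical stand-ins for the exporter's real types.
type Resource struct{ ARN string }
type Metric struct{ Name, ResourceARN string }
type Associated struct {
	Metric   Metric
	Resource Resource
}
type Datapoint struct {
	Metric string
	Value  float64
}

// Flows holds each step as a function value so tests can swap in fakes.
type Flows struct {
	GetResources  func() ([]Resource, error)
	ListMetrics   func() ([]Metric, error)
	Associate     func([]Resource, []Metric) []Associated
	GetMetricData func([]Associated) ([]Datapoint, error)
}

// RunSequential composes the four steps strictly in order.
func RunSequential(f Flows) ([]Datapoint, error) {
	resources, err := f.GetResources()
	if err != nil {
		return nil, fmt.Errorf("get resources: %w", err)
	}
	metrics, err := f.ListMetrics()
	if err != nil {
		return nil, fmt.Errorf("list metrics: %w", err)
	}
	return f.GetMetricData(f.Associate(resources, metrics))
}

// AssociateByARN keeps only metrics whose ResourceARN matches a discovered resource.
func AssociateByARN(resources []Resource, metrics []Metric) []Associated {
	byARN := make(map[string]Resource, len(resources))
	for _, r := range resources {
		byARN[r.ARN] = r
	}
	var out []Associated
	for _, m := range metrics {
		if r, ok := byARN[m.ResourceARN]; ok {
			out = append(out, Associated{Metric: m, Resource: r})
		}
	}
	return out
}

// demoFlows returns in-memory fakes: two resources, three metrics, one of them unmatched.
func demoFlows() Flows {
	return Flows{
		GetResources: func() ([]Resource, error) {
			return []Resource{{ARN: "arn:a"}, {ARN: "arn:b"}}, nil
		},
		ListMetrics: func() ([]Metric, error) {
			return []Metric{{"CPU", "arn:a"}, {"CPU", "arn:b"}, {"CPU", "arn:gone"}}, nil
		},
		Associate: AssociateByARN,
		GetMetricData: func(in []Associated) ([]Datapoint, error) {
			out := make([]Datapoint, len(in))
			for i, a := range in {
				out[i] = Datapoint{Metric: a.Metric.Name + "@" + a.Resource.ARN, Value: 1}
			}
			return out, nil
		},
	}
}

func main() {
	dps, err := RunSequential(demoFlows())
	fmt.Println(len(dps), err)
}
```

The point of the shape is that each step can be unit tested on its own, and the composition can be tested with fakes without touching AWS.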

At this point we should have solid test coverage across the complex logic used by each flow, and confidence that runDiscoveryJob flows the data appropriately. After this, introducing channels should hopefully be as simple as introducing a new strategy for how runDiscoveryJob composes the flow of data, which can be gated behind a feature flag. This level of decoupling will make it much easier to test the complex cases channels require, like shutdown and error propagation.
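
As a rough illustration of the channel-based strategy, the sketch below pipes listed metrics through a channel so that data fetching starts as soon as a full batch is available, with a shared context handling shutdown and error propagation. It uses only the standard library. All names (`RunPipelined`, `ListPageFunc`, `GetDataFunc`, `runDemo`) and the paging/batching shape are assumptions for the sake of the example, not the exporter's API, and rate limiting is left out entirely.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Metric and Datapoint are hypothetical stand-ins for the exporter's types.
type Metric struct{ Name string }
type Datapoint struct{ Metric string }

// ListPageFunc returns one page of metrics and whether more pages remain.
// GetDataFunc fetches datapoints for one batch of metrics.
type ListPageFunc func(ctx context.Context, page int) ([]Metric, bool, error)
type GetDataFunc func(ctx context.Context, batch []Metric) ([]Datapoint, error)

// RunPipelined starts fetching data as soon as a full batch is listed (batchSize >= 1 assumed).
func RunPipelined(ctx context.Context, list ListPageFunc, get GetDataFunc, batchSize int) ([]Datapoint, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	metrics := make(chan Metric)
	errs := make(chan error, 2) // one slot per stage, so reporting never blocks
	fail := func(err error) {
		errs <- err
		cancel() // tell the other stage to stop
	}
	var results []Datapoint
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // producer: walks ListMetrics pages
		defer wg.Done()
		defer close(metrics)
		for page := 0; ctx.Err() == nil; page++ {
			found, more, err := list(ctx, page)
			if err != nil {
				fail(err)
				return
			}
			for _, m := range found {
				select {
				case metrics <- m:
				case <-ctx.Done():
					return
				}
			}
			if !more {
				return
			}
		}
	}()
	go func() { // consumer: batches metrics into GetMetricData calls
		defer wg.Done()
		batch := make([]Metric, 0, batchSize)
		flush := func() {
			if len(batch) == 0 || ctx.Err() != nil {
				return // nothing to send, or the pipeline is shutting down
			}
			dps, err := get(ctx, batch)
			if err != nil {
				fail(err)
				return
			}
			results = append(results, dps...)
			batch = make([]Metric, 0, batchSize)
		}
		for m := range metrics { // keeps draining after a failure so the producer can exit
			if batch = append(batch, m); len(batch) == batchSize {
				flush()
			}
		}
		flush() // final partial batch
	}()
	wg.Wait()
	close(errs)
	if err := <-errs; err != nil { // first reported stage error, if any
		return nil, err
	}
	return results, ctx.Err() // non-nil only if the caller's context was cancelled
}

// runDemo uses in-memory fakes: `pages` pages of 3 metrics, failing on page failOn (-1 = never).
func runDemo(pages, batchSize, failOn int) (datapoints, getCalls int, err error) {
	list := func(_ context.Context, page int) ([]Metric, bool, error) {
		if page == failOn {
			return nil, false, errors.New("throttled")
		}
		return make([]Metric, 3), page < pages-1, nil
	}
	get := func(_ context.Context, batch []Metric) ([]Datapoint, error) {
		getCalls++
		return make([]Datapoint, len(batch)), nil
	}
	dps, err := RunPipelined(context.Background(), list, get, batchSize)
	return len(dps), getCalls, err
}

func main() { fmt.Println(runDemo(4, 5, -1)) }
```

Because the stages only meet at the channel, the shutdown and error cases can be exercised with fakes like the ones in `runDemo`, which is the kind of test the refactor is meant to enable. A real version would likely run several producers and consumers and apply per-API rate limits.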

If this pattern works out well, I think it should be adaptable to reduce the amount of code duplication CustomNamespace requires. A CustomNamespace job should be a composition of the ListMetrics and GetMetricData dataflows.

What might the configuration look like?

Ideally, no configuration changes are required


kgeckhart added the enhancement New feature or request label Aug 6, 2023

kgeckhart mentioned this issue Aug 14, 2023

move duplicated fields from CloudwatchData to a new JobContext #1106

Merged

kgeckhart mentioned this issue Feb 2, 2024

Add abstraction for GetMetricsData processing kgeckhart/yet-another-cloudwatch-exporter#2

Closed

kgeckhart mentioned this issue Mar 4, 2024

Add abstraction for GetMetricsData processing #1325

Merged

kgeckhart mentioned this issue Apr 5, 2024

getmetricdata: introduce an iterator #1368

Closed

This was referenced Apr 18, 2024

getmetricdata: move window calculator to processor #1388

Merged

getmetricdata: Move batching to an iterator #1389

Merged

[Feature] Add Support for Historical Data Points #986

Open

kgeckhart mentioned this issue Jun 3, 2024

Start a unified scraper #1432

Merged
