After the most recent batch of performance fixes, discovery jobs that pull metrics from large AWS environments have very reasonable resource utilization 🎉. The current limitation that can still cause longer scrape times is that all ListMetrics calls must complete before any GetMetricData calls are made. Since the APIs have independent rate limits (ListMetrics is 25 TPS and GetMetricData is ~50 TPS), we can safely start calling GetMetricData before ListMetrics completes.
I think the most idiomatic Go way of going about this is to introduce channels to runDiscoveryJob. The challenge is that the current code is really complex and relatively untestable. I think it needs to be refactored first, to dramatically reduce the risk of such an impactful change. I would like to start by decomposing the main steps of runDiscoveryJob into smaller composable/testable "dataflows", listed below:
GetResources
ListMetrics
AssociateMetricsToResources
GetMetricData
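To make the decomposition concrete, here is a rough sketch of how the four dataflows could look as small, independently testable pieces. Every type and function name in it (`Resource`, `Metric`, `runSequential`, the `*Func` types) is a hypothetical stand-in rather than the project's actual code; the flows that call AWS are modelled as function values so tests can substitute fakes.

```go
package main

import (
	"context"
	"fmt"
)

// All types and names below are hypothetical stand-ins, not the real ones.
type Resource struct{ ARN string }

type Metric struct {
	Name        string
	ResourceARN string // dimension value used to match a resource
}

type AssociatedMetric struct {
	Metric   Metric
	Resource Resource
}

type Datapoint struct {
	Metric AssociatedMetric
	Value  float64
}

// Dataflows that call AWS are plain function types so tests can pass fakes.
type GetResourcesFunc func(ctx context.Context) ([]Resource, error)
type ListMetricsFunc func(ctx context.Context) ([]Metric, error)
type GetMetricDataFunc func(ctx context.Context, ms []AssociatedMetric) ([]Datapoint, error)

// AssociateMetricsToResources is pure logic: it keeps only the metrics whose
// ResourceARN matches a discovered resource.
func AssociateMetricsToResources(rs []Resource, ms []Metric) []AssociatedMetric {
	byARN := make(map[string]Resource, len(rs))
	for _, r := range rs {
		byARN[r.ARN] = r
	}
	var out []AssociatedMetric
	for _, m := range ms {
		if r, ok := byARN[m.ResourceARN]; ok {
			out = append(out, AssociatedMetric{Metric: m, Resource: r})
		}
	}
	return out
}

// runSequential composes the four dataflows one after another, which is the
// ordering the issue describes as the current behavior.
func runSequential(ctx context.Context, getResources GetResourcesFunc, listMetrics ListMetricsFunc, getMetricData GetMetricDataFunc) ([]Datapoint, error) {
	rs, err := getResources(ctx)
	if err != nil {
		return nil, fmt.Errorf("get resources: %w", err)
	}
	ms, err := listMetrics(ctx)
	if err != nil {
		return nil, fmt.Errorf("list metrics: %w", err)
	}
	return getMetricData(ctx, AssociateMetricsToResources(rs, ms))
}

// demo wires the composition with in-memory fakes instead of AWS clients.
func demo() ([]Datapoint, error) {
	getResources := func(context.Context) ([]Resource, error) {
		return []Resource{{ARN: "arn:a"}, {ARN: "arn:b"}}, nil
	}
	listMetrics := func(context.Context) ([]Metric, error) {
		return []Metric{{"CPU", "arn:a"}, {"CPU", "arn:b"}, {"CPU", "arn:unknown"}}, nil
	}
	getMetricData := func(_ context.Context, ms []AssociatedMetric) ([]Datapoint, error) {
		out := make([]Datapoint, 0, len(ms))
		for _, m := range ms {
			out = append(out, Datapoint{Metric: m, Value: 1})
		}
		return out, nil
	}
	return runSequential(context.Background(), getResources, listMetrics, getMetricData)
}

func main() {
	dps, err := demo()
	fmt.Println(len(dps), err) // prints "2 <nil>": the unmatched metric is dropped
}
```

With this shape, the association logic can be covered by table-driven tests with no AWS client at all, and the composition itself can be tested with fakes like the ones in `demo`.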
At this point we should be able to have solid test coverage of the complex logic used by each flow, and confidence that runDiscoveryJob flows the data appropriately. After this, introducing channels should hopefully be as simple as introducing a new strategy for how runDiscoveryJob composes the flow of data, which can be gated behind a feature flag. This level of decoupling will make it much easier to test the complex cases channels require, like shutdown and error propagation.
If this pattern works out well, I think it should be adaptable to reduce the amount of code copying CustomNamespace requires. A CustomNamespace job should be a composition of the ListMetrics and GetMetricData dataflows.
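As a sketch of what a channel-based strategy behind a feature flag might look like, the example below streams each ListMetrics page to GetMetricData as soon as it arrives, with context cancellation for shutdown and a buffered error channel for error propagation. It uses only the standard library; `ListPageFunc`, `Strategy`, `pickStrategy`, and the flag parameter are hypothetical names chosen for illustration, and resource association is left out to keep it short.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical types; the real dataflow signatures may differ.
type Metric struct{ Name string }
type Datapoint struct{ Metric string }

// ListPageFunc returns one page of metrics and whether more pages remain.
type ListPageFunc func(ctx context.Context, page int) (ms []Metric, more bool, err error)
type GetMetricDataFunc func(ctx context.Context, ms []Metric) ([]Datapoint, error)

// Strategy decides how the dataflows are composed.
type Strategy func(ctx context.Context, list ListPageFunc, get GetMetricDataFunc) ([]Datapoint, error)

// runBatch mirrors the current behavior: list everything, then fetch data.
func runBatch(ctx context.Context, list ListPageFunc, get GetMetricDataFunc) ([]Datapoint, error) {
	var all []Metric
	for page := 0; ; page++ {
		ms, more, err := list(ctx, page)
		if err != nil {
			return nil, err
		}
		all = append(all, ms...)
		if !more {
			return get(ctx, all)
		}
	}
}

// runStreaming hands each page to GetMetricData as soon as it is listed.
func runStreaming(ctx context.Context, list ListPageFunc, get GetMetricDataFunc) ([]Datapoint, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // shutdown: unblocks the producer if we return early
	pages := make(chan []Metric)
	errc := make(chan error, 1) // buffered so the producer never blocks on it
	go func() {
		defer close(pages)
		defer close(errc)
		for page := 0; ; page++ {
			ms, more, err := list(ctx, page)
			if err != nil {
				errc <- err
				return
			}
			select {
			case pages <- ms:
			case <-ctx.Done():
				errc <- ctx.Err()
				return
			}
			if !more {
				return
			}
		}
	}()
	var out []Datapoint
	for ms := range pages {
		dps, err := get(ctx, ms)
		if err != nil {
			return nil, err
		}
		out = append(out, dps...)
	}
	if err := <-errc; err != nil { // nil once errc is closed without an error
		return nil, err
	}
	return out, nil
}

// pickStrategy gates the new composition behind a feature flag.
func pickStrategy(streamingEnabled bool) Strategy {
	if streamingEnabled {
		return runStreaming
	}
	return runBatch
}

// demo runs a strategy against fakes; failAt < 0 means listing never fails.
func demo(streamingEnabled bool, failAt int) ([]Datapoint, error) {
	pages := [][]Metric{{{Name: "a"}, {Name: "b"}}, {{Name: "c"}}}
	list := func(_ context.Context, page int) ([]Metric, bool, error) {
		if page == failAt {
			return nil, false, errors.New("ListMetrics failed")
		}
		return pages[page], page < len(pages)-1, nil
	}
	get := func(_ context.Context, ms []Metric) ([]Datapoint, error) {
		return make([]Datapoint, len(ms)), nil
	}
	return pickStrategy(streamingEnabled)(context.Background(), list, get)
}

func main() {
	dps, err := demo(true, -1)
	fmt.Println(len(dps), err)
}
```

Because both strategies share one signature, the same test table can run against each of them, and the failure cases (a ListMetrics error mid-stream, a GetMetricData error that must stop the producer) can be exercised with fakes rather than real AWS calls.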
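A CustomNamespace job built this way might reduce to a few lines, as in the following sketch. The function types and `runCustomNamespaceJob` are hypothetical; the point is only that the job reuses two of the dataflows and leaves out GetResources and AssociateMetricsToResources.

```go
package main

import (
	"context"
	"fmt"
)

// Hypothetical types and names, for illustration only.
type Metric struct{ Namespace, Name string }

type Datapoint struct {
	Metric Metric
	Value  float64
}

type ListMetricsFunc func(ctx context.Context, namespace string) ([]Metric, error)
type GetMetricDataFunc func(ctx context.Context, ms []Metric) ([]Datapoint, error)

// runCustomNamespaceJob composes only the ListMetrics and GetMetricData
// dataflows; there is no resource discovery or association step.
func runCustomNamespaceJob(ctx context.Context, namespace string, list ListMetricsFunc, get GetMetricDataFunc) ([]Datapoint, error) {
	ms, err := list(ctx, namespace)
	if err != nil {
		return nil, fmt.Errorf("list metrics for %s: %w", namespace, err)
	}
	if len(ms) == 0 {
		return nil, nil // nothing to fetch
	}
	return get(ctx, ms)
}

// demo runs the job against in-memory fakes.
func demo(namespace string) ([]Datapoint, error) {
	list := func(_ context.Context, ns string) ([]Metric, error) {
		if ns != "MyApp" {
			return nil, nil
		}
		return []Metric{{ns, "requests"}, {ns, "errors"}}, nil
	}
	get := func(_ context.Context, ms []Metric) ([]Datapoint, error) {
		out := make([]Datapoint, 0, len(ms))
		for _, m := range ms {
			out = append(out, Datapoint{Metric: m, Value: 1})
		}
		return out, nil
	}
	return runCustomNamespaceJob(context.Background(), namespace, list, get)
}

func main() {
	dps, err := demo("MyApp")
	fmt.Println(len(dps), err) // prints "2 <nil>"
}
```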
What might the configuration look like?
Ideally, no configuration changes are required