How is `sum_over_time() / count_over_time()` different than `avg_over_time()`? #354

mmazur · 2022-07-20T15:02:11Z

The (default) optimized `slo:sli_error:ratio_rate30d` uses an expression of `sum_over_time() / count_over_time()`. This follows 9cd3177, which changed it from `avg_over_time()`.

I'm very confused about what the difference is. The definition of an arithmetic average (mean) is `sum() / count()`, so unless there's something unusual in Prometheus's implementation of these functions, I would expect the two expressions to be equivalent.
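To make the expectation concrete, here is a minimal Python sketch of the arithmetic identity only. It does not model how Prometheus implements `sum_over_time()`, `count_over_time()`, or `avg_over_time()`; the function names and the sample values are hypothetical stand-ins for the per-window error ratios in a range vector.

```python
import math
from statistics import fmean


def sum_over_count(samples):
    """Stand-in for sum_over_time() / count_over_time() on one series."""
    return sum(samples) / len(samples)


def mean(samples):
    """Stand-in for avg_over_time() on the same series."""
    return fmean(samples)


# Hypothetical per-window error ratios (illustrative values only).
ratios = [0.10, 0.02, 0.00, 0.05]

# Both forms compute the same unweighted mean of the stored ratios,
# up to floating-point rounding.
print(math.isclose(sum_over_count(ratios), mean(ratios)))
```

If the two PromQL forms ever disagree over the same samples, that would have to come from implementation details rather than from the math.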

Prometheus's best practices on recording rules do mention:

> When aggregating up ratios, aggregate up the numerator and denominator separately and then divide. Do not take the average of a ratio or average of an average as that is not statistically valid.

But Sloth does not preserve either the numerator or the denominator, so doing that is not possible.

ThomWright · 2024-02-27T15:39:29Z

Agreed. This seems to be just a different way of averaging ratios, as far as I can tell.

The missing information is the number of requests in each 5m period. Without that, a 5-minute period with 1 error in 10 requests (10% error rate) will be treated equally to a 5-minute period with 1,000 errors in 10,000 requests (also a 10% error rate). But the 1,000 errors should contribute significantly more to the overall 30-day error rate than the 1 error.
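A rough Python sketch of why the weighting matters, under some assumptions: each window is a hypothetical `(errors, requests)` pair, and the function names are made up for illustration. When every window has the same rate, as in the example above, both methods agree. They diverge once the rates differ, so the second data set below (a low-traffic window at 10% and a high-traffic window at 0.1%) is an invented case to show that.

```python
import math


def average_of_ratios(windows):
    """Unweighted mean of per-window error ratios (what averaging stored ratios does)."""
    ratios = [errors / requests for errors, requests in windows]
    return sum(ratios) / len(ratios)


def ratio_of_sums(windows):
    """Total errors over total requests (numerator and denominator aggregated separately)."""
    total_errors = sum(errors for errors, _ in windows)
    total_requests = sum(requests for _, requests in windows)
    return total_errors / total_requests


# The two windows from the comment above: same 10% rate, very different traffic.
same_rate = [(1, 10), (1000, 10000)]

# Hypothetical windows with different rates: 10% on 10 requests, 0.1% on 10,000.
different_rate = [(1, 10), (10, 10000)]

for name, windows in [("same_rate", same_rate), ("different_rate", different_rate)]:
    print(name, average_of_ratios(windows), ratio_of_sums(windows))
```

For `different_rate`, the average of ratios comes out at about 5.05%, while the ratio of sums is about 0.11%, because the low-traffic window gets the same weight as the high-traffic one in the first method.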
