Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement PrometheusMetric to use slices for label pairs #1528

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

cristiangreco
Copy link
Contributor

This refactoring stems from an attempt to optimise memory usage in BuildMetrics and createPrometheusLabels, where labels are copied across various maps. The new PrometheusMetric uses slices to store label pairs and is implemented to guarantee that labels are always sorted by key. The rationale is that slices might be more memory efficient than maps for large preallocation sizes. Moreover, the fact that label keys are promptly available (no need to iterate over the map) comes handy in a bunch of places where we save additional allocations. Lastly, while we spend cycles to do explicit sorting in yace now, it should save us some comparisons when prometheus sorts labels internally.

The refactoring also comes with a reimplementation of signature for labels, since the prometheus models only work with maps.

I've added a bunch of benchmarks of specific methods. They show that sometimes the change is noticeable, sometimes it's not (but the overall impact is hard to judge in synthetic benchs due to the variety of input one can get at runtime fromcoming from large aws responses).

Benchmark_EnsureLabelConsistencyAndRemoveDuplicates:

                                              │  before.txt  │              after.txt              │
                                              │    sec/op    │   sec/op     vs base                │
_EnsureLabelConsistencyAndRemoveDuplicates-12   14.203µ ± 2%   9.115µ ± 1%  -35.82% (p=0.000 n=10)

                                              │ before.txt │             after.txt              │
                                              │    B/op    │    B/op     vs base                │
_EnsureLabelConsistencyAndRemoveDuplicates-12   448.0 ± 0%   256.0 ± 0%  -42.86% (p=0.000 n=10)

                                              │ before.txt  │             after.txt              │
                                              │  allocs/op  │ allocs/op   vs base                │
_EnsureLabelConsistencyAndRemoveDuplicates-12   17.000 ± 0%   9.000 ± 0%  -47.06% (p=0.000 n=10)

Benchmark_createPrometheusLabels:

                           │ before.txt  │           after.txt           │
                           │   sec/op    │   sec/op     vs base          │
_createPrometheusLabels-12   41.86m ± 5%   41.40m ± 9%  ~ (p=0.481 n=10)

                           │  before.txt  │              after.txt               │
                           │     B/op     │     B/op      vs base                │
_createPrometheusLabels-12   2.867Mi ± 0%   1.531Mi ± 0%  -46.59% (p=0.000 n=10)

                           │ before.txt  │             after.txt              │
                           │  allocs/op  │  allocs/op   vs base               │
_createPrometheusLabels-12   40.00k ± 0%   40.00k ± 0%  -0.00% (p=0.000 n=10)

Benchmark_BuildMetrics:

                 │ before.txt  │             after.txt              │
                 │   sec/op    │   sec/op     vs base               │
_BuildMetrics-12   110.4µ ± 1%   114.1µ ± 1%  +3.35% (p=0.000 n=10)

                 │  before.txt  │              after.txt               │
                 │     B/op     │     B/op      vs base                │
_BuildMetrics-12   4.344Ki ± 0%   3.797Ki ± 0%  -12.59% (p=0.000 n=10)

                 │ before.txt │             after.txt             │
                 │ allocs/op  │ allocs/op   vs base               │
_BuildMetrics-12   95.00 ± 0%   99.00 ± 0%  +4.21% (p=0.000 n=10)

Benchmark_NewPrometheusCollector:

                           │ before.txt  │             after.txt              │
                           │   sec/op    │   sec/op     vs base               │
_NewPrometheusCollector-12   154.8µ ± 1%   143.5µ ± 1%  -7.26% (p=0.000 n=10)

                           │  before.txt  │              after.txt              │
                           │     B/op     │     B/op      vs base               │
_NewPrometheusCollector-12   4.516Ki ± 0%   4.281Ki ± 0%  -5.19% (p=0.000 n=10)

                           │ before.txt │             after.txt              │
                           │ allocs/op  │ allocs/op   vs base                │
_NewPrometheusCollector-12   142.0 ± 0%   127.0 ± 0%  -10.56% (p=0.000 n=10)

@cristiangreco cristiangreco force-pushed the cristian/metric-label-pair branch 2 times, most recently from f8a888c to 7abad63 Compare September 23, 2024 08:29
This refactoring stems from an attempt to optimise memory usage in
BuildMetrics and createPrometheusLabels, where labels are copied across
various maps. The new PrometheusMetric uses slices to store label pairs
and is implemented to guarantee that labels are always sorted by key.
The rationale is that slices _might_ be more memory efficient than maps
for large preallocation sizes. Moreover, the fact that label keys are
promptly available (no need to iterate over the map) comes handy in a
bunch of places where we save additional allocations. Lastly, while we
spend cycles to do explicit sorting in yace now, it should save us some
comparisons when prometheus sorts labels internally.

The refactoring also comes with a reimplementation of signature for
labels, since the prometheus models only work with maps.

I've added a bunch of benchmarks of specific methods. They show that
sometimes the change is noticeable, sometimes it's not (but the overall
impact is hard to judge in synthetic benchs due to the variety of input
one can get at runtime fromcoming from large aws responses).

Benchmark_EnsureLabelConsistencyAndRemoveDuplicates:

```
                                              │  before.txt  │              after.txt              │
                                              │    sec/op    │   sec/op     vs base                │
_EnsureLabelConsistencyAndRemoveDuplicates-12   14.203µ ± 2%   9.115µ ± 1%  -35.82% (p=0.000 n=10)

                                              │ before.txt │             after.txt              │
                                              │    B/op    │    B/op     vs base                │
_EnsureLabelConsistencyAndRemoveDuplicates-12   448.0 ± 0%   256.0 ± 0%  -42.86% (p=0.000 n=10)

                                              │ before.txt  │             after.txt              │
                                              │  allocs/op  │ allocs/op   vs base                │
_EnsureLabelConsistencyAndRemoveDuplicates-12   17.000 ± 0%   9.000 ± 0%  -47.06% (p=0.000 n=10)
```

Benchmark_createPrometheusLabels:

```
                           │ before.txt  │           after.txt           │
                           │   sec/op    │   sec/op     vs base          │
_createPrometheusLabels-12   41.86m ± 5%   41.40m ± 9%  ~ (p=0.481 n=10)

                           │  before.txt  │              after.txt               │
                           │     B/op     │     B/op      vs base                │
_createPrometheusLabels-12   2.867Mi ± 0%   1.531Mi ± 0%  -46.59% (p=0.000 n=10)

                           │ before.txt  │             after.txt              │
                           │  allocs/op  │  allocs/op   vs base               │
_createPrometheusLabels-12   40.00k ± 0%   40.00k ± 0%  -0.00% (p=0.000 n=10)
```

Benchmark_BuildMetrics:

```
                 │ before.txt  │             after.txt              │
                 │   sec/op    │   sec/op     vs base               │
_BuildMetrics-12   110.4µ ± 1%   114.1µ ± 1%  +3.35% (p=0.000 n=10)

                 │  before.txt  │              after.txt               │
                 │     B/op     │     B/op      vs base                │
_BuildMetrics-12   4.344Ki ± 0%   3.797Ki ± 0%  -12.59% (p=0.000 n=10)

                 │ before.txt │             after.txt             │
                 │ allocs/op  │ allocs/op   vs base               │
_BuildMetrics-12   95.00 ± 0%   99.00 ± 0%  +4.21% (p=0.000 n=10)
```

Benchmark_NewPrometheusCollector:

```
                           │ before.txt  │             after.txt              │
                           │   sec/op    │   sec/op     vs base               │
_NewPrometheusCollector-12   154.8µ ± 1%   143.5µ ± 1%  -7.26% (p=0.000 n=10)

                           │  before.txt  │              after.txt              │
                           │     B/op     │     B/op      vs base               │
_NewPrometheusCollector-12   4.516Ki ± 0%   4.281Ki ± 0%  -5.19% (p=0.000 n=10)

                           │ before.txt │             after.txt              │
                           │ allocs/op  │ allocs/op   vs base                │
_NewPrometheusCollector-12   142.0 ± 0%   127.0 ± 0%  -10.56% (p=0.000 n=10)
```
@cristiangreco cristiangreco force-pushed the cristian/metric-label-pair branch from 7abad63 to 93048ed Compare September 24, 2024 07:43
Copy link
Contributor

@thepalbi thepalbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some not super big Qs, but changes look good. Also, code looks much more cleaner!

Comment on lines 98 to 116
// labelPair joins two slices of keys and values
// and allows simultaneous sorting.
type labelPair struct {
keys []string
vals []string
}

func (p labelPair) Len() int {
return len(p.keys)
}

func (p labelPair) Swap(i, j int) {
p.keys[i], p.keys[j] = p.keys[j], p.keys[i]
p.vals[i], p.vals[j] = p.vals[j], p.vals[i]
}

func (p labelPair) Less(i, j int) bool {
return p.keys[i] < p.keys[j]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to implement Sortable or sth like that right? Can we add a comment noting that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this implements sort.Interface, will add a note in the comment above about labelPair.

Comment on lines 176 to 189
// LabelsSignature returns a hash of the labels. It emulates
// prometheus' LabelsToSignature implementation but works on
// labelPair instead of map[string]string. Assumes that
// the labels are sorted.
func (p *PrometheusMetric) LabelsSignature() uint64 {
xxh := xxhash.New()
for i, key := range p.labels.keys {
_, _ = xxh.WriteString(key)
_, _ = xxh.Write(separatorByteSlice)
_, _ = xxh.WriteString(p.labels.vals[i])
_, _ = xxh.Write(separatorByteSlice)
}
return xxh.Sum64()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't know if this is super important, but maybe add some test to check that the hasing is the same as implemented on prometheus?

Copy link
Contributor Author

@cristiangreco cristiangreco Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, this is not the same as model.LabelsToSignature, because that implementation uses a custom version of fnv which is not exposed externally. I'll change the comment as "emulate" is not clear enough.

xxhash is used in NewDesc for similar purpose of deriving a signature of labels names and values. The two hash functions are different though.

It shouldn't really matter, but I'm happy to do a little copy from LabelsToSignature's implementation if we think that's better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yhea I've seen the xxhash based implementation, so let's keep that one

if context == nil {
return map[string]string{}
return []string{}, []string{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe nil, nil to not even allocate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd guess this is optimised away by the compiler, let me look how easily we could change it though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nahh nvm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah actually on a second thought it's more consistent to keep return empty collections as we do in other places.

Copy link
Contributor

@thepalbi thepalbi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@kgeckhart kgeckhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally worth a shot :shipit: two things I was thinking about,

  1. How it scales to larger label sets. I have to imagine there's a break even point between slices and maps where for a small number of elements a slice is going to be faster than a map. IMO this is what the benchmarks are showing as the test data has a small number of labels. I think this transitions well to real data sets though as the number of labels is not often large enough to matter.
  2. Does this approach leave us open to duplicate labels that the map based implementation was hiding and will prometheus error/panic for duplicate labels?

@cristiangreco cristiangreco force-pushed the cristian/metric-label-pair branch from 51dc599 to 1d2077a Compare September 25, 2024 08:36
@cristiangreco
Copy link
Contributor Author

cristiangreco commented Sep 25, 2024

  1. Does this approach leave us open to duplicate labels that the map based implementation was hiding and will prometheus error/panic for duplicate labels?

Yes this edge case is very possible, e.g. if cloudwatch returns any duplicate dimension. I've pushed a change that validates the correct length of labels within EnsureLabelConsistencyAndRemoveDuplicates.

@cristiangreco cristiangreco marked this pull request as draft September 26, 2024 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants