Skip to content

Split metrics (cached) and tracing (uncached) label decoding#363

Merged
bobrik merged 1 commit intocloudflare:masterfrom
bobrik:ivan/decode-no-cache
Mar 25, 2024
Merged

Split metrics (cached) and tracing (uncached) label decoding#363
bobrik merged 1 commit intocloudflare:masterfrom
bobrik:ivan/decode-no-cache

Conversation

@bobrik
Copy link
Copy Markdown
Contributor

@bobrik bobrik commented Mar 23, 2024

In tracing there is no concern with cardinality and one can have PID as a label. With caching enabled, this means that we grow the cache map indefinitely, leaking memory until OOM stops us. Let's disable caching for tracing instead.

In tracing there is no concern with cardinality and one can have PID as a label.
With caching enabled, this means that we grow the cache map indefinitely,
leaking memory until OOM stops us. Let's disable caching for tracing instead.
@bobrik
Copy link
Copy Markdown
Contributor Author

bobrik commented Mar 23, 2024

Perhaps an LRU with 1k entries per config is a better option here.

@bobrik
Copy link
Copy Markdown
Contributor Author

bobrik commented Mar 23, 2024

With LRU there's overhead in terms of CPU for the metrics case:

ivan@vm:~/projects/ebpf_exporter$ ~/go/bin/benchcmp /tmp/old.txt /tmp/new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                   old ns/op     new ns/op     delta
BenchmarkCache/direct-8     382           395           +3.30%
BenchmarkCache/cached-8     20.9          35.7          +70.30%

benchmark                   old allocs     new allocs     delta
BenchmarkCache/direct-8     13             13             +0.00%
BenchmarkCache/cached-8     0              0              +0.00%

benchmark                   old bytes     new bytes     delta
BenchmarkCache/direct-8     216           216           +0.00%
BenchmarkCache/cached-8     0             0             +0.00%

There is no overhead for metrics with separate methods.

For tracing we have a set of nearly unique labels that are not worth caching:

        - name: trace_id
          size: 16
          decoders:
            - name: hex
        - name: parent_span_id
          size: 8
          decoders:
            - name: hex
        - name: span_id
          size: 8
          decoders:
            - name: hex
        - name: span_monotonic_timestamp_ns
          size: 8
          decoders:
            - name: uint
        - name: span_duration_ns
          size: 8
          decoders:
            - name: uint

Given that we cache the whole input with unique and non-unique fields, there's zero chance of actually caching anything for tracing. There doesn't seem to be any point in LRU then.

@bobrik bobrik merged commit 5b4a43f into cloudflare:master Mar 25, 2024
@bobrik bobrik deleted the ivan/decode-no-cache branch March 25, 2024 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants