Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parentheses to sanitize list to prevent invalid metric name generation #1563

Merged

Conversation

nixargh
Copy link
Contributor

@nixargh nixargh commented Nov 14, 2024

Using PR(:1) type of statistics and others, that have brackets, leads to an error. Clodwatch types reference

Config example:

discovery:
  jobs:
    - type: AWS/ELB
      regions:
        - eu-central-1
      period: 60
      length: 60
      dimensionNameRequirements:
        - AvailabilityZone
        - LoadBalancerName
      metrics:
        - name: Latency
          statistics: ["PR(:10)"]

Error:

 error(s) occurred:
* "aws_elb_latency_pr(_10)" is not a valid metric name

After fix it works like this:

aws_elb_latency_pr__10_{account_id="***...

Looks ugly from my POV but works and requires minimal changes 😃

@SuperQ SuperQ changed the title Add brackets to sanitize list to prevent invalid metric name generation Add parentheses to sanitize list to prevent invalid metric name generation Nov 20, 2024
@SuperQ
Copy link
Contributor

SuperQ commented Nov 20, 2024

Interesting, PR metrics seem to follow Prometheus histogram bucket format. We could probably add real support for this. Do you have a link to docs of what values are available?

The ELB metrics doc doesn't seem to include it. :(

@SuperQ SuperQ merged commit 3d12bc9 into prometheus-community:master Nov 20, 2024
3 checks passed
@nixargh
Copy link
Contributor Author

nixargh commented Nov 21, 2024

Interesting, PR metrics seem to follow Prometheus histogram bucket format. We could probably add real support for this. Do you have a link to docs of what values are available?

The ELB metrics doc doesn't seem to include it. :(

Actually my initial post contains the link to 'statistics' description: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Statistics-definitions.html

And I agree that some of them looks like a good source to build a prometheus histogram:

Trimmed count (TC) is the number of data points in the chosen range for a trimmed mean statistic. 

probably together with:

SampleCount is the number of data points during the period.

So my plan is to collect a number of TCs, then relabel them and use for latency SLO with sloth framework the same way as I use other metrics. Something like configs below (but I haven't add SampleCount).
YACE config:

---
apiVersion: v1alpha1
sts-region: eu-central-1
discovery:
  jobs:
    - type: AWS/ELB
      regions:
        - eu-central-1
      period: 600
      length: 600
      dimensionNameRequirements:
        - AvailabilityZone
        - LoadBalancerName
      metrics:
        - name: Latency
          statistics:
            - p50
            - p90
            - p99
            - "TC(:0.2)"
            - "TC(:0.4)"
            - "TC(:0.8)"
            - "TC(:1)"
            - "TC(:2.5)"
            - "TC(:5)"
            - "TC(:10)"
            - "TC(:20)"
            - "TC(:40)"
            - "TC(:80)"
            - "TC(:160)"
            - "TC(:320)"

Relabel config:

---
scrape_configs:
  - job_name: yace
    static_configs:
      - targets:
        - localhost:5000
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'aws_elb_latency_tc__([0-9]+)_([0-9]+)_'
        replacement: "$1.$2"
        target_label: le

      - source_labels: [__name__]
        regex: 'aws_elb_latency_tc__([0-9]+)_'
        replacement: "$1"
        target_label: le

      - source_labels: [__name__]
        regex: 'aws_elb_latency_tc__[0-9_]+_'
        replacement: aws_elb_request_time_seconds_bucket
        target_label: __name__

Result:
2024-11-21_10-07-38

@SuperQ
Copy link
Contributor

SuperQ commented Nov 21, 2024

I read the docs, but it wasn't clear how buckets are managed on the cloudwatch side. It says it "creates" buckets but with no information about how this really works.

@nixargh nixargh deleted the add_brackets_to_sanitize_lists branch November 21, 2024 08:40
@cristiangreco
Copy link
Contributor

@nixargh thanks for this contribution! If you have time and interest, would be cool to add a test case for this change! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants