Aggregate by label when querying instance CPU usage in Prometheus

Currently, when the Prometheus datasource queries the ceilometer_cpu metric for instance CPU usage, it aggregates by the instance label and filters by the label containing the instance UUID. This works fine in real scenarios, where each instance provides a single metric, but in some cases, such as CI jobs where metrics are injected directly, it leads to an incorrect metric calculation.

We applied a similar fix for the host metrics in [1] but did not implement it for instance CPU. This change also converts the query formatting to the dict format to improve readability.

[1] https://review.opendev.org/c/openstack/watcher/+/946049

Closes-Bug: #2113936
Change-Id: I3038dec20612162c411fc77446e86a47e0354423
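
To illustrate the change, here is a minimal sketch of how the new dict-style format string expands into a PromQL expression. The helper name build_instance_cpu_query and the sample values (the 'resource' label key, the UUID, the 300 second period, 2 vCPUs) are assumptions for illustration only and are not taken from the Watcher code. Only the format string itself is copied from the diff.

```python
def build_instance_cpu_query(aggregate, meter, uuid_label_key,
                             instance_label, period, vcpus):
    """Hypothetical helper wrapping the format string from this change."""
    # The same label key is used both in the aggregation clause
    # ("by (<label>)") and in the selector ("{<label>='<value>'}").
    return (
        "clamp_max((%(agg)s by (%(label)s)"
        "(rate(%(meter)s{%(label)s='%(label_value)s'}[%(period)ss]))"
        "/10e+8) *(100/%(vcpus)s), 100)"
        % {'label': uuid_label_key, 'label_value': instance_label,
           'agg': aggregate, 'meter': meter, 'period': period,
           'vcpus': vcpus}
    )


# Sample values are illustrative assumptions, not real deployment data.
query = build_instance_cpu_query(
    aggregate='avg',
    meter='ceilometer_cpu',
    uuid_label_key='resource',
    instance_label='uuid-1234',
    period=300,
    vcpus=2,
)
print(query)
```

With these sample values the aggregation clause becomes "by (resource)" rather than the previously hard-coded "by (instance)", so series are grouped by the same label that the selector filters on.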
2025-06-11 14:40:23 +02:00
parent 15981117ee
commit 3860de0b1e
2 changed files with 10 additions and 8 deletions
--- a/watcher/decision_engine/datasources/prometheus.py
+++ b/watcher/decision_engine/datasources/prometheus.py
@@ -342,10 +342,12 @@ class PrometheusHelper(base.DataSourceBase):
                )
                vcpus = 1
            query_args = (
-                "clamp_max((%s by (instance)(rate(%s{%s='%s'}[%ss]))/10e+8) "
-                "*(100/%s), 100)" %
-                (aggregate, meter, uuid_label_key, instance_label, period,
-                 vcpus)
+                "clamp_max((%(agg)s by (%(label)s)"
+                "(rate(%(meter)s{%(label)s='%(label_value)s'}[%(period)ss]))"
+                "/10e+8) *(100/%(vcpus)s), 100)"
+                % {'label': uuid_label_key, 'label_value': instance_label,
+                   'agg': aggregate, 'meter': meter, 'period': period,
+                   'vcpus': vcpus}
            )
        else:
            raise exception.InvalidParameter(