Aggregate by label when querying instance CPU usage in Prometheus

Currently, when the Prometheus datasource queries the ceilometer_cpu
metric for instance CPU usage, it aggregates by the "instance" label and
filters by the label containing the instance UUID. This works fine in
real deployments, where a single metric is provided for each instance,
but in some cases, such as CI jobs where metrics are injected directly,
it leads to incorrect metric calculation.
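To illustrate why the grouping label matters, the following Python sketch models a PromQL "sum by (...)" over a few in-memory series. It is a toy model, not Watcher code: the data, the helper name sum_by and the UUID label key "resource" are all hypothetical, and it assumes that one VM's samples were injected under two different "instance" label values. The real query takes the aggregate (for example avg) as a parameter; sum is used here for simplicity.

```python
from collections import defaultdict

# Hypothetical label key holding the VM UUID (uuid_label_key in the diff).
UUID_LABEL = "resource"

# Hypothetical (labels, rate) pairs: vm-1's samples were injected under two
# different `instance` values; vm-2 is included to show the filter at work.
SERIES = [
    ({"instance": "injector-a", UUID_LABEL: "vm-1"}, 1.0e9),
    ({"instance": "injector-b", UUID_LABEL: "vm-1"}, 1.0e9),
    ({"instance": "injector-a", UUID_LABEL: "vm-2"}, 5.0e8),
]


def sum_by(series, group_label, match_label, match_value):
    """Rough model of PromQL `sum by (group_label)(m{match_label='match_value'})`."""
    groups = defaultdict(float)
    for labels, value in series:
        if labels.get(match_label) != match_value:
            continue  # the label matcher drops series for other VMs
        groups[labels[group_label]] += value
    return dict(groups)


# Old grouping: one result per `instance` value, so vm-1 yields two series.
old = sum_by(SERIES, "instance", UUID_LABEL, "vm-1")
# New grouping: exactly one result for the VM, whatever `instance` says.
new = sum_by(SERIES, UUID_LABEL, UUID_LABEL, "vm-1")

print(old)  # {'injector-a': 1000000000.0, 'injector-b': 1000000000.0}
print(new)  # {'vm-1': 2000000000.0}
```

Grouping by the same label used in the matcher guarantees a single output series per VM, which is the property the change relies on.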

A similar fix was applied to the host metrics in [1], but it was not
implemented for instance CPU usage.

I am also switching the query string formatting to the dict-based format
to improve readability.
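A minimal sketch of the dict-based formatting, reusing the template from the new side of the diff. The function name build_instance_cpu_query and the sample arguments ("avg", "resource", "vm-1", 300, 2) are hypothetical; this is a standalone illustration, not Watcher's helper.

```python
# Template copied from the new side of the diff; placeholders are looked up
# by name in the mapping, so argument order no longer matters.
QUERY_TEMPLATE = (
    "clamp_max((%(agg)s by (%(label)s)"
    "(rate(%(meter)s{%(label)s='%(label_value)s'}[%(period)ss]))"
    "/10e+8) *(100/%(vcpus)s), 100)"
)


def build_instance_cpu_query(aggregate, meter, uuid_label_key, instance_uuid,
                             period, vcpus):
    """Render the instance CPU query with named placeholders (hypothetical helper)."""
    return QUERY_TEMPLATE % {
        'label': uuid_label_key, 'label_value': instance_uuid,
        'agg': aggregate, 'meter': meter, 'period': period, 'vcpus': vcpus,
    }


query = build_instance_cpu_query(
    'avg', 'ceilometer_cpu', 'resource', 'vm-1', 300, 2)
print(query)
# clamp_max((avg by (resource)(rate(ceilometer_cpu{resource='vm-1'}[300s]))/10e+8) *(100/2), 100)
```

Because %(label)s appears twice, the mapping supplies one value to both the "by (...)" clause and the label matcher; with positional %s formatting the same argument would have to be passed twice, in the right positions.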

[1] https://review.opendev.org/c/openstack/watcher/+/946049

Closes-Bug: #2113936
Change-Id: I3038dec20612162c411fc77446e86a47e0354423
Alfredo Moralejo
2025-06-11 14:40:23 +02:00
parent 15981117ee
commit 3860de0b1e
2 changed files with 10 additions and 8 deletions

@@ -342,10 +342,12 @@ class PrometheusHelper(base.DataSourceBase):
 )
 vcpus = 1
 query_args = (
-    "clamp_max((%s by (instance)(rate(%s{%s='%s'}[%ss]))/10e+8) "
-    "*(100/%s), 100)" %
-    (aggregate, meter, uuid_label_key, instance_label, period,
-     vcpus)
+    "clamp_max((%(agg)s by (%(label)s)"
+    "(rate(%(meter)s{%(label)s='%(label_value)s'}[%(period)ss]))"
+    "/10e+8) *(100/%(vcpus)s), 100)"
+    % {'label': uuid_label_key, 'label_value': instance_label,
+       'agg': aggregate, 'meter': meter, 'period': period,
+       'vcpus': vcpus}
 )
 else:
 raise exception.InvalidParameter(