Commit Graph

26 Commits

Author SHA1 Message Date
Zuul
fe8d8c8839 Merge "Use KiB as unit for host_ram_usage when using prometheus datasource" 2025-06-20 16:19:50 +00:00
Alfredo Moralejo
6ea362da0b Use KiB as unit for host_ram_usage when using prometheus datasource
The prometheus datasource was reporting host_ram_usage in MiB as
described in the docstring for the base datasource interface
definition [1].

However, the gnocchi datasource is reporting it in KiB following
ceilometer metric `hardware.memory.used` [2] and the strategies
using that metric expect it to be in KiB so the best approach is
to change the unit in the prometheus datasource and update the
docstring to avoid missunderstandings in future. So, this patch
is fixing the prometheus datasource to return host_ram_usage
in KiB instead of MiB.

Additionally, it is adding more unit tests for the check_threshold
method so that it covers the memory based strategy execution, validates
the calculated standard deviation and adds the cases where it is below
the threshold.

[1] 15981117ee/watcher/decision_engine/datasources/base.py (L177-L183)
[2] https://docs.openstack.org/ceilometer/train/admin/telemetry-measurements.html#snmp-based-meters

Closes-Bug: #2113776
Change-Id: Idc060d1e709c0265c64ada16062c3a206c6b04fa
2025-06-19 16:25:27 +02:00
Alfredo Moralejo
3860de0b1e Aggregate by label when querying instance cpu usage in prometheus
Currently, when the prometheus datasource query ceilometer_cpu metric
for instance cpu usage, it aggregates by instance and filter by the
label containing the instance uuid. While this works fine in real
scenarios, where a single metric is provided in a single instance, in
some cases as the CI jobs where metrics are directly injected, leads to
incorrect metric calculation.

We applied a similar fix for the host metrics in [1] but we did not
implement it for instance cpu.

I am also converting the query formatting to the dict format to improve
understability.

[1] https://review.opendev.org/c/openstack/watcher/+/946049

Closes-Bug: #2113936
Change-Id: I3038dec20612162c411fc77446e86a47e0354423
2025-06-11 14:49:56 +02:00
Alfredo Moralejo
c7158b08d1 Aggregate by fqdn label instead instance in host cpu metrics
While in a regular case a specific metric for a specific host will be
provider by a single instance (exporter) so aggregating by label and by
intances should be the same, it is more correct to aggregate by the same
label that the one we use to filter the metrics.

This is follow up of https://review.opendev.org/c/openstack/watcher/+/944795

Related-Bug: #2103451

Change-Id: Ia61f051547ddc51e0d1ccd5a56485ab49ce84c2e
2025-04-02 15:36:17 +02:00
Alfredo Moralejo
a65e7e9b59 Query by fqdn_label instead of instance for host metrics
Currently we are using `instance` label to query about host metrics to
prometheus. This label is assigned to the url of each endpoint being
scrapped.

While this work fine in one-exporter-per-compute cases as the driver is
mapping the fqdn_label value to the `instance` label value, it fails
when there are more that one target with the same value for the fqdn
label. This is a valid case, to be able to query by fqdn and do not
care about what exporter in the host is providing the metric.

This patch is changing the queries we use for hosts to be based on the
fqdn_label instead of the instance one. To implement it, we are also
simplifying the way we check the metric exist for the host by converting
prometheus_fqdn_instance_map into a prometheus_fqdn_labels set
which stores the list of fqdn found in  prometheus.

Closes-Bug: #2103451
Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e
2025-03-19 15:25:24 +01:00
Zuul
f2ee231f14 Merge "pre-commit: Integrate bandit" 2025-03-11 09:58:29 +00:00
Takashi Kajinami
977f014cba Deprecate Monasca data source
The Monasca project was marked inactive during 2023.1. Although we have
seen multiple people showing interest to keep the project, we haven't
seen any real progress.

Because the project is likely retired soon, let's deprecate the feature
dependent on Monasca so that we can remove it in a future release.

Change-Id: Ifd64f5ba59bbac238ff62302ec36a3e36954d6d0
2025-02-16 18:45:31 +09:00
Takashi Kajinami
dd0082c343 pre-commit: Integrate bandit
Run bandit check from per-commit so that the check is executed in pep8
job.

Also remove requirements installed automatically by pre-commit from
test-requirements.

Change-Id: I45af8c47afb262882ebbee74ae52446fed741e26
2025-02-10 22:50:34 +09:00
Zuul
4527f89d8d Merge "Add support for instance metrics to prometheus datasource" 2025-02-03 13:22:28 +00:00
Zuul
e535177bc0 Merge "Remove ceilometer datasource" 2025-01-29 13:22:46 +00:00
Alfredo Moralejo
136e5d927c Add support for instance metrics to prometheus datasource
In order to support vm_workload_consolidation, workload_balance and
workload_stabilization strategis some instance metrics are required.
This patch is adding support for them.

Implementation is based on a prometheus store populated using sg-core
from ceilometer metrics with Pollster source.

- instance_ram_usage: rely on ceilometer_memory_usage metrics created from
  ceilometer memory.usage meter.
- instance_ram_allocated: rely on the memory value provided by the
  inventory created from nova and placement APIs.
- instance_cpu_usage: rely on ceilometer_cpu metric created from
  ceilometer cpu meter. A max value of 100 is set in the query.
- instance_root_disk_size: rely on the `disk` value provided by the
  inventory created from nova and placement APIs.

A new parameterer `instance_uuid_label` has been added to the prometheus
datasource configuration to identify the label used to store the value of the
OpenStack instance uuid for eache instance metric in prometheus. Default
value is `resource`.

Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2
2025-01-23 13:23:04 +01:00
m
3f26dc47f2 Add prometheus data source for watcher decision engine
This adds a new data source for the Watcher decision engine that
implements the watcher.decision_engine.datasources.DataSourceBase.

related spec was merged at [1].

Implements: blueprint prometheus-datasource

[1] https://review.opendev.org/c/openstack/watcher-specs/+/933300

Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906
2025-01-10 15:20:37 +02:00
Takashi Kajinami
da23fdc621 Remove ceilometer datasource
This datasource requires Ceilometer API which was already removed some
years ago. The implementation should have been removed when dependency
on ceilometerclient was removed by [1].

Also remove some job definitions which are not actually used.

[1] 01d74d0a87

Change-Id: I29c3865dc1207f1bbbb266e4217cf8888afebfb6
2024-12-16 23:51:27 +09:00
Takashi Natsume
61a7dd85ca Replace deprecated datetime.utcnow()
The datetime.utcnow() is deprecated in Python 3.12.
Replace datetime.utcnow() with oslo_utils.timeutils.utcnow().
This bumps oslo.utils to 7.0.0.

Change-Id: Icccbb0549add686a744a72b354932471cbf91c92
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2024-10-02 22:24:47 +09:00
Takashi Kajinami
566a830f64 Bump hacking
hacking 3.0.x is quite old. Bump it to the current latest version.

Change-Id: I8d87fed6afe5988678c64090af261266d1ca20e6
2024-09-22 23:54:36 +09:00
Zuul
40e93407c7 Merge "Handle deprecated "cpu_util" metric" 2023-10-27 09:47:38 +00:00
Lucian Petrut
00fea975e2 Handle deprecated "cpu_util" metric
The "cpu_util" metric has been deprecated a few years ago.
We'll obtain the same result by converting the cumulative cpu
time to a percentage, leveraging the rate of change aggregation.

Change-Id: I18fe0de6f74c785e674faceea0c48f44055818fe
2023-10-24 10:47:23 +00:00
Lucian Petrut
fd6562382e Avoid performing retries in case of missing resources
There may be no available metrics for instances that are stopped
or were recently spawned. This makes retries unnecessary and time
consuming.

For this reason, we'll ignore gnocchi MetricNotFound errors.

Change-Id: I79cd03bf04db634b931d6dfd32d5150f58e82044
2023-10-23 14:14:21 +00:00
BubaVV
0610070e59 Add timeout option for Grafana request
Implemented config option to setup Grafana API request timeout

Change-Id: I8cbf8ce22f199fe22c0b162ba1f419169881f193
2023-08-23 17:46:19 +03:00
sue
c28756c48b use HTTPStatus instead of direct code
Python introduced http.HTTPStatus since version 3.5,
and Wallaby has targeted a minimum version of python 3.6.

Change-Id: I45f732f0f59b8fae831bb6c07f4fdd98cdd7409a
2021-07-09 11:02:36 +02:00
Dantali0n
cca0d9f7d7 Implements base method for time series metrics
Implements base method as well as some basic implementations to
retrieve time series metrics. Ceilometer can not be supported
as API documentation has been unavailable. Grafana will be
supported in follow-up patch.

Partially Implements: blueprint time-series-framework

Change-Id: I55414093324c8cff379b28f5b855f41a9265c2d3
2020-08-26 16:01:15 +02:00
chenke
0ef0f165cb Remove six[7]
Since our code will only support py3. So remove six is necessary.

Change-Id: I3738118b1898421ee41e9e2902c255ead73f3915
2020-04-22 15:59:15 +08:00
Andreas Jaeger
1bb2aefec3 Update hacking for Python3
The repo is Python 3 now, so update hacking to version 3.0 which
supports Python 3.

Fix problems found.

Update local hacking checks for new flake8.

Remove hacking and friends from lower-constraints, they are not needed
to be installed at run-time.

Change-Id: Ia6af344ec8441dc98a0820176373dcff3a8c80d5
2020-04-02 07:50:02 +02:00
licanwei
f685bf62ab Don't throw exception when missing metrics
When querying data from datasource, it's possible to miss some data.
In this case if we throw an exception, Audit will failed because of
the exception. We should remove the exception and give the decision
to the strategy.

Change-Id: I1b0e6b78b3bba4df9ba16e093b3910aab1de922e
Closes-Bug: #1847434
2019-10-16 21:01:39 -07:00
licanwei
4b83bf33e2 remove id field from CDM
There are 3 related fields(id, uuid and hostname) in ComputeNode[1].
according to [2], after nova api 2.53, the id of the hypervisor as a UUID.
and service.host is equal to hypervisor name for compute node.
so we can remove id and only keep uuid then set uuid to node.id

[1]:https://github.com/openstack/watcher/blob/master/watcher/decision_engine/model/collector/nova.py#L306
[2]:https://developer.openstack.org/api-ref/compute/?expanded=list-hypervisors-details-detail#list-hypervisors-details

Change-Id: Ie1d1ad56808270d936ec25186061f7f12cc49fdc
Closes-Bug: #1835192
Depends-on: I752fbfa560313e28e87d83e46431c283b4db4f23
Depends-on: I0975500f359de92b6d6fdea2e01614cf0ba73f05
2019-07-23 10:28:47 +08:00
Dantali0n
433eabb8d1 Move datasources folder into decision_engine
The datasources are only used by the decision_engine, however, they
are placed in a directory one level higher. This patch moves the
datasources code into the decision_engine folder.

Change-Id: Ia54531fb899b79a59bb77adea079ff27c0d518fa
2019-07-12 08:54:09 +02:00