watcher

Author	SHA1	Message	Date
Zuul	fe8d8c8839	Merge "Use KiB as unit for host_ram_usage when using prometheus datasource"	2025-06-20 16:19:50 +00:00
Alfredo Moralejo	6ea362da0b	Use KiB as unit for host_ram_usage when using prometheus datasource The prometheus datasource was reporting host_ram_usage in MiB as described in the docstring for the base datasource interface definition [1]. However, the gnocchi datasource is reporting it in KiB following ceilometer metric `hardware.memory.used` [2], and the strategies using that metric expect it to be in KiB, so the best approach is to change the unit in the prometheus datasource and update the docstring to avoid misunderstandings in the future. This patch therefore fixes the prometheus datasource to return host_ram_usage in KiB instead of MiB. Additionally, it adds more unit tests for the check_threshold method so that they cover the memory-based strategy execution, validate the calculated standard deviation and add the cases where it is below the threshold. [1] `15981117ee/watcher/decision_engine/datasources/base.py (L177-L183)` [2] https://docs.openstack.org/ceilometer/train/admin/telemetry-measurements.html#snmp-based-meters Closes-Bug: #2113776 Change-Id: Idc060d1e709c0265c64ada16062c3a206c6b04fa	2025-06-19 16:25:27 +02:00
Alfredo Moralejo	3860de0b1e	Aggregate by label when querying instance cpu usage in prometheus Currently, when the prometheus datasource queries the ceilometer_cpu metric for instance cpu usage, it aggregates by instance and filters by the label containing the instance uuid. While this works fine in real scenarios, where a single metric is provided by a single instance, in some cases, such as CI jobs where metrics are injected directly, it leads to incorrect metric calculation. We applied a similar fix for the host metrics in [1] but did not implement it for instance cpu. I am also converting the query formatting to the dict format to improve understandability. [1] https://review.opendev.org/c/openstack/watcher/+/946049 Closes-Bug: #2113936 Change-Id: I3038dec20612162c411fc77446e86a47e0354423	2025-06-11 14:49:56 +02:00
Alfredo Moralejo	c7158b08d1	Aggregate by fqdn label instead instance in host cpu metrics While in the regular case a specific metric for a specific host is provided by a single instance (exporter), so aggregating by label or by instance should give the same result, it is more correct to aggregate by the same label as the one we use to filter the metrics. This is a follow-up of https://review.opendev.org/c/openstack/watcher/+/944795 Related-Bug: #2103451 Change-Id: Ia61f051547ddc51e0d1ccd5a56485ab49ce84c2e	2025-04-02 15:36:17 +02:00
Alfredo Moralejo	a65e7e9b59	Query by fqdn_label instead of instance for host metrics Currently we are using the `instance` label to query prometheus for host metrics. This label is assigned to the url of each endpoint being scraped. While this works fine in one-exporter-per-compute cases, as the driver maps the fqdn_label value to the `instance` label value, it fails when there is more than one target with the same value for the fqdn label. This is a valid case: we want to be able to query by fqdn without caring which exporter on the host is providing the metric. This patch changes the queries we use for hosts to be based on the fqdn_label instead of the instance one. To implement it, we are also simplifying the way we check that the metric exists for the host by converting prometheus_fqdn_instance_map into a prometheus_fqdn_labels set which stores the list of fqdns found in prometheus. Closes-Bug: #2103451 Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e	2025-03-19 15:25:24 +01:00
Zuul	f2ee231f14	Merge "pre-commit: Integrate bandit"	2025-03-11 09:58:29 +00:00
Takashi Kajinami	977f014cba	Deprecate Monasca data source The Monasca project was marked inactive during 2023.1. Although we have seen multiple people showing interest in keeping the project, we haven't seen any real progress. Because the project is likely to be retired soon, let's deprecate the feature dependent on Monasca so that we can remove it in a future release. Change-Id: Ifd64f5ba59bbac238ff62302ec36a3e36954d6d0	2025-02-16 18:45:31 +09:00
Takashi Kajinami	dd0082c343	pre-commit: Integrate bandit Run bandit check from pre-commit so that the check is executed in the pep8 job. Also remove requirements installed automatically by pre-commit from test-requirements. Change-Id: I45af8c47afb262882ebbee74ae52446fed741e26	2025-02-10 22:50:34 +09:00
Zuul	4527f89d8d	Merge "Add support for instance metrics to prometheus datasource"	2025-02-03 13:22:28 +00:00
Zuul	e535177bc0	Merge "Remove ceilometer datasource"	2025-01-29 13:22:46 +00:00
Alfredo Moralejo	136e5d927c	Add support for instance metrics to prometheus datasource In order to support the vm_workload_consolidation, workload_balance and workload_stabilization strategies, some instance metrics are required. This patch adds support for them. The implementation is based on a prometheus store populated using sg-core from ceilometer metrics with Pollster source. - instance_ram_usage: rely on ceilometer_memory_usage metrics created from ceilometer memory.usage meter. - instance_ram_allocated: rely on the memory value provided by the inventory created from nova and placement APIs. - instance_cpu_usage: rely on ceilometer_cpu metric created from ceilometer cpu meter. A max value of 100 is set in the query. - instance_root_disk_size: rely on the `disk` value provided by the inventory created from nova and placement APIs. A new parameter `instance_uuid_label` has been added to the prometheus datasource configuration to identify the label used to store the value of the OpenStack instance uuid for each instance metric in prometheus. The default value is `resource`. Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2	2025-01-23 13:23:04 +01:00
m	3f26dc47f2	Add prometheus data source for watcher decision engine This adds a new data source for the Watcher decision engine that implements the watcher.decision_engine.datasources.DataSourceBase. The related spec was merged at [1]. Implements: blueprint prometheus-datasource [1] https://review.opendev.org/c/openstack/watcher-specs/+/933300 Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906	2025-01-10 15:20:37 +02:00
Takashi Kajinami	da23fdc621	Remove ceilometer datasource This datasource requires Ceilometer API which was already removed some years ago. The implementation should have been removed when dependency on ceilometerclient was removed by [1]. Also remove some job definitions which are not actually used. [1] `01d74d0a87` Change-Id: I29c3865dc1207f1bbbb266e4217cf8888afebfb6	2024-12-16 23:51:27 +09:00
Takashi Natsume	61a7dd85ca	Replace deprecated datetime.utcnow() The datetime.utcnow() is deprecated in Python 3.12. Replace datetime.utcnow() with oslo_utils.timeutils.utcnow(). This bumps oslo.utils to 7.0.0. Change-Id: Icccbb0549add686a744a72b354932471cbf91c92 Signed-off-by: Takashi Natsume <takanattie@gmail.com>	2024-10-02 22:24:47 +09:00
Takashi Kajinami	566a830f64	Bump hacking hacking 3.0.x is quite old. Bump it to the current latest version. Change-Id: I8d87fed6afe5988678c64090af261266d1ca20e6	2024-09-22 23:54:36 +09:00
Zuul	40e93407c7	Merge "Handle deprecated "cpu_util" metric"	2023-10-27 09:47:38 +00:00
Lucian Petrut	00fea975e2	Handle deprecated "cpu_util" metric The "cpu_util" metric was deprecated a few years ago. We'll obtain the same result by converting the cumulative cpu time to a percentage, leveraging the rate of change aggregation. Change-Id: I18fe0de6f74c785e674faceea0c48f44055818fe	2023-10-24 10:47:23 +00:00
Lucian Petrut	fd6562382e	Avoid performing retries in case of missing resources There may be no available metrics for instances that are stopped or were recently spawned. This makes retries unnecessary and time consuming. For this reason, we'll ignore gnocchi MetricNotFound errors. Change-Id: I79cd03bf04db634b931d6dfd32d5150f58e82044	2023-10-23 14:14:21 +00:00
BubaVV	0610070e59	Add timeout option for Grafana request Implemented a config option to set up the Grafana API request timeout Change-Id: I8cbf8ce22f199fe22c0b162ba1f419169881f193	2023-08-23 17:46:19 +03:00
sue	c28756c48b	use HTTPStatus instead of direct code Python has provided http.HTTPStatus since version 3.5, and Wallaby has targeted a minimum version of python 3.6. Change-Id: I45f732f0f59b8fae831bb6c07f4fdd98cdd7409a	2021-07-09 11:02:36 +02:00
Dantali0n	cca0d9f7d7	Implements base method for time series metrics Implements base method as well as some basic implementations to retrieve time series metrics. Ceilometer can not be supported as API documentation has been unavailable. Grafana will be supported in follow-up patch. Partially Implements: blueprint time-series-framework Change-Id: I55414093324c8cff379b28f5b855f41a9265c2d3	2020-08-26 16:01:15 +02:00
chenke	0ef0f165cb	Remove six[7] Since our code only supports py3, removing six is necessary. Change-Id: I3738118b1898421ee41e9e2902c255ead73f3915	2020-04-22 15:59:15 +08:00
Andreas Jaeger	1bb2aefec3	Update hacking for Python3 The repo is Python 3 now, so update hacking to version 3.0 which supports Python 3. Fix problems found. Update local hacking checks for new flake8. Remove hacking and friends from lower-constraints; they do not need to be installed at run-time. Change-Id: Ia6af344ec8441dc98a0820176373dcff3a8c80d5	2020-04-02 07:50:02 +02:00
licanwei	f685bf62ab	Don't throw exception when missing metrics When querying data from a datasource, some data may be missing. In this case, if we throw an exception, the audit will fail because of the exception. We should remove the exception and leave the decision to the strategy. Change-Id: I1b0e6b78b3bba4df9ba16e093b3910aab1de922e Closes-Bug: #1847434	2019-10-16 21:01:39 -07:00
licanwei	4b83bf33e2	remove id field from CDM There are 3 related fields (id, uuid and hostname) in ComputeNode [1]. According to [2], since nova API 2.53 the id of the hypervisor is a UUID, and service.host is equal to the hypervisor name for a compute node. So we can remove id, keep only uuid and set uuid to node.id. [1]: https://github.com/openstack/watcher/blob/master/watcher/decision_engine/model/collector/nova.py#L306 [2]: https://developer.openstack.org/api-ref/compute/?expanded=list-hypervisors-details-detail#list-hypervisors-details Change-Id: Ie1d1ad56808270d936ec25186061f7f12cc49fdc Closes-Bug: #1835192 Depends-on: I752fbfa560313e28e87d83e46431c283b4db4f23 Depends-on: I0975500f359de92b6d6fdea2e01614cf0ba73f05	2019-07-23 10:28:47 +08:00
Dantali0n	433eabb8d1	Move datasources folder into decision_engine The datasources are only used by the decision_engine, however, they are placed in a directory one level higher. This patch moves the datasources code into the decision_engine folder. Change-Id: Ia54531fb899b79a59bb77adea079ff27c0d518fa	2019-07-12 08:54:09 +02:00

26 Commits
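Commit 00fea975e2 replaces the deprecated "cpu_util" metric by converting cumulative CPU time into a percentage through a rate-of-change aggregation. The arithmetic behind that conversion can be sketched in Python. This is a minimal illustration, not Watcher's implementation, and it rests on several assumptions:

- The samples are cumulative CPU time in nanoseconds, as the ceilometer `cpu` meter reports.
- The result is normalized by a vCPU count.
- The helper name `cpu_usage_percent` is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: cumulative CPU time is expressed in nanoseconds.
NS_PER_SECOND = 10**9


def cpu_usage_percent(samples, vcpus=1):
    """Hypothetical helper: derive CPU-usage percentages from cumulative CPU time.

    samples: list of (datetime, cumulative_cpu_ns) tuples sorted by time.
    vcpus:   number of vCPUs used to normalize the result (assumption).
    Returns one percentage per pair of consecutive samples.
    """
    usage = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed_s = (t1 - t0).total_seconds()
        delta_ns = c1 - c0
        # Skip non-increasing timestamps and counter resets (e.g. after a
        # reboot), where the cumulative value goes backwards.
        if elapsed_s <= 0 or delta_ns < 0:
            continue
        # Rate of change in CPU-ns per wall-clock second, expressed as a share
        # of the CPU time available across all vCPUs in that second.
        rate_ns_per_s = delta_ns / elapsed_s
        usage.append(100 * rate_ns_per_s / (NS_PER_SECOND * vcpus))
    return usage


if __name__ == "__main__":
    start = datetime(2023, 10, 24, 10, 0, 0)
    samples = [
        (start, 0),
        (start + timedelta(seconds=60), 30 * NS_PER_SECOND),
        (start + timedelta(seconds=120), 90 * NS_PER_SECOND),
    ]
    print(cpu_usage_percent(samples))           # [50.0, 100.0]
    print(cpu_usage_percent(samples, vcpus=2))  # [25.0, 50.0]
```

Consider 30 s of CPU time consumed over a 60 s window on one vCPU: the sketch reports 50%. In the commit, the datasource's rate-of-change aggregation produces the delta. Here the delta is computed locally, purely for illustration.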