Commit Graph

681 Commits

Author SHA1 Message Date
Alfredo Moralejo
c7158b08d1 Aggregate by fqdn label instead instance in host cpu metrics
In the regular case a specific metric for a specific host is provided
by a single instance (exporter), so aggregating by label and by
instance should give the same result. Still, it is more correct to
aggregate by the same label that we use to filter the metrics.

This is a follow-up to https://review.opendev.org/c/openstack/watcher/+/944795

Related-Bug: #2103451

Change-Id: Ia61f051547ddc51e0d1ccd5a56485ab49ce84c2e
2025-04-02 15:36:17 +02:00
Alfredo Moralejo
a65e7e9b59 Query by fqdn_label instead of instance for host metrics
Currently we use the `instance` label to query Prometheus for host
metrics. This label is set to the URL of each endpoint being
scraped.

While this works fine in one-exporter-per-compute cases, as the driver
maps the fqdn_label value to the `instance` label value, it fails
when there is more than one target with the same value for the fqdn
label. This is a valid case: it should be possible to query by fqdn
without caring which exporter on the host provides the metric.

This patch changes the queries we use for hosts to be based on the
fqdn_label instead of the instance one. To implement it, we also
simplify the way we check that the metric exists for the host by
converting prometheus_fqdn_instance_map into a prometheus_fqdn_labels
set which stores the fqdns found in Prometheus.

Closes-Bug: #2103451
Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e
2025-03-19 15:25:24 +01:00
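The label-based querying described in a65e7e9b59 and c7158b08d1 can be sketched roughly as follows. This is an illustration only, not the driver's code: the helper names `host_metric_query` and `host_has_metrics`, the `avg_over_time` aggregation, and the 300s window are all assumptions chosen for the example.

```python
def host_metric_query(metric, fqdn_label, fqdn, window="300s"):
    """Build a PromQL string that filters and aggregates on the same label.

    Hypothetical helper: the real queries used by the datasource may differ.
    """
    # Filter by the fqdn label rather than by the per-target `instance` label.
    selector = f'{metric}{{{fqdn_label}="{fqdn}"}}'
    # Aggregate by the same label used in the filter.
    return f"avg by ({fqdn_label})(avg_over_time({selector}[{window}]))"


def host_has_metrics(fqdn, known_fqdns):
    """Membership test against a set of fqdns found in Prometheus."""
    return fqdn in known_fqdns


if __name__ == "__main__":
    print(host_metric_query("node_load1", "fqdn", "c1.example.org"))
    print(host_has_metrics("c1.example.org", {"c1.example.org"}))
```

Keeping a plain set of fqdns makes the existence check independent of how many exporters run on a host.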
Zuul
f2ee231f14 Merge "pre-commit: Integrate bandit" 2025-03-11 09:58:29 +00:00
Takashi Kajinami
df3d67a4ed Replace deprecated abc.abstractproperty
It was deprecated in Python 3.3 [1].

[1] https://docs.python.org/3.13/whatsnew/3.3.html#abc

Change-Id: Ibd98cb93f697a6da6a6bc5a5030640a262c7a66b
2025-03-02 15:36:48 +09:00
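The replacement for `abc.abstractproperty` referred to in df3d67a4ed is to stack `property` on top of `abc.abstractmethod`. A minimal sketch, with hypothetical class names `DataSource` and `ExampleSource` that are not taken from the Watcher code base:

```python
import abc


class DataSource(abc.ABC):
    """Hypothetical base class used only to illustrate the pattern."""

    @property
    @abc.abstractmethod
    def name(self):
        """Subclasses must provide a read-only name."""


class ExampleSource(DataSource):
    @property
    def name(self):
        return "example"


if __name__ == "__main__":
    print(ExampleSource().name)
    try:
        DataSource()  # abstract property not overridden
    except TypeError as exc:
        print("cannot instantiate:", exc)
```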
Zuul
383751904c Merge "Further database refactoring" 2025-02-27 11:52:59 +00:00
Takashi Kajinami
977f014cba Deprecate Monasca data source
The Monasca project was marked inactive during 2023.1. Although we have
seen multiple people showing interest in keeping the project, we haven't
seen any real progress.

Because the project is likely to be retired soon, let's deprecate the
feature dependent on Monasca so that we can remove it in a future release.

Change-Id: Ifd64f5ba59bbac238ff62302ec36a3e36954d6d0
2025-02-16 18:45:31 +09:00
James Page
753c44b0c4 Further database refactoring
More refactoring of the SQLAlchemy database layer to improve
compatibility with eventlet on newer Pythons.

Inspired by 0ce2c41404

Related-Bug: 2067815
Change-Id: Ib5e9aa288232cc1b766bbf2a8ce2113d5a8e2f7d
2025-02-14 11:42:47 +00:00
Takashi Kajinami
dd0082c343 pre-commit: Integrate bandit
Run the bandit check from pre-commit so that the check is executed in
the pep8 job.

Also remove requirements installed automatically by pre-commit from
test-requirements.

Change-Id: I45af8c47afb262882ebbee74ae52446fed741e26
2025-02-10 22:50:34 +09:00
Zuul
4527f89d8d Merge "Add support for instance metrics to prometheus datasource" 2025-02-03 13:22:28 +00:00
Zuul
e535177bc0 Merge "Remove ceilometer datasource" 2025-01-29 13:22:46 +00:00
Alfredo Moralejo
136e5d927c Add support for instance metrics to prometheus datasource
In order to support the vm_workload_consolidation, workload_balance and
workload_stabilization strategies, some instance metrics are required.
This patch adds support for them.

Implementation is based on a prometheus store populated using sg-core
from ceilometer metrics with Pollster source.

- instance_ram_usage: rely on ceilometer_memory_usage metrics created from
  ceilometer memory.usage meter.
- instance_ram_allocated: rely on the memory value provided by the
  inventory created from nova and placement APIs.
- instance_cpu_usage: rely on ceilometer_cpu metric created from
  ceilometer cpu meter. A max value of 100 is set in the query.
- instance_root_disk_size: rely on the `disk` value provided by the
  inventory created from nova and placement APIs.

A new parameter `instance_uuid_label` has been added to the prometheus
datasource configuration to identify the label used to store the
OpenStack instance uuid for each instance metric in prometheus. The
default value is `resource`.

Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2
2025-01-23 13:23:04 +01:00
m
3f26dc47f2 Add prometheus data source for watcher decision engine
This adds a new data source for the Watcher decision engine that
implements the watcher.decision_engine.datasources.DataSourceBase.

The related spec was merged at [1].

Implements: blueprint prometheus-datasource

[1] https://review.opendev.org/c/openstack/watcher-specs/+/933300

Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906
2025-01-10 15:20:37 +02:00
Takashi Kajinami
da23fdc621 Remove ceilometer datasource
This datasource requires Ceilometer API which was already removed some
years ago. The implementation should have been removed when dependency
on ceilometerclient was removed by [1].

Also remove some job definitions which are not actually used.

[1] 01d74d0a87

Change-Id: I29c3865dc1207f1bbbb266e4217cf8888afebfb6
2024-12-16 23:51:27 +09:00
Sean Mooney
5fadd0de57 [pre-commit] Fix execute and shebang lines
This commit removes the execute bit from several files
and removes the shebang lines from the devstack plugin.

While the devstack plugin is written in bash, it is not an executable
script. The devstack plugin is sourced by devstack as needed, so it
is not executed in a subshell and the #!/bin/bash lines are not used
even when present.

Change-Id: I82ca22b7a47bf267fe6cf11f3e3519510108c146
2024-11-07 20:12:59 +00:00
Sean Mooney
5f79ab87c7 [pre-commit] fix typos and configure codespell
This change enables codespell in pre-commit and
fixes the existing typos.

A follow-up commit will enable this in tox and CI.

Change-Id: I0a11bcd5a88247a48d3437525fc8a3cb3cdd4e58
2024-11-07 19:50:21 +00:00
Sean Mooney
9d8b990fd1 [pre-commit] Add initial pre-commit config
This change adds configuration for the pre-commit tool.
Follow-up changes will address the remaining issues in a phased
approach to make the reviews simpler.

This is based on the pre-commit config used in nova
with some additional hooks.

Follow-up changes will address the FIXME comments
related to sphinx-lint and codespell, as well as update tox
to enforce these checks in ci.

Change-Id: I87681a19f7fa88366c2b0d310c8b3153aa6a137b
2024-10-22 20:12:53 +01:00
Takashi Natsume
61a7dd85ca Replace deprecated datetime.utcnow()
datetime.utcnow() is deprecated as of Python 3.12.
Replace datetime.utcnow() with oslo_utils.timeutils.utcnow().
This bumps oslo.utils to 7.0.0.

Change-Id: Icccbb0549add686a744a72b354932471cbf91c92
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2024-10-02 22:24:47 +09:00
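Commit 61a7dd85ca switches to `oslo_utils.timeutils.utcnow()`. For readers without oslo.utils at hand, the standard-library sketch below shows one non-deprecated way to obtain the same kind of naive UTC timestamp. The helper name `utcnow_naive` is made up for this example, and whether oslo.utils is implemented this way is not claimed here.

```python
import datetime


def utcnow_naive():
    """Return the current UTC time as a naive datetime.

    Uses the timezone-aware API and then drops tzinfo, which matches the
    shape of value that the deprecated datetime.utcnow() used to return.
    """
    aware = datetime.datetime.now(datetime.timezone.utc)
    return aware.replace(tzinfo=None)


if __name__ == "__main__":
    print(utcnow_naive().isoformat())
```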
Takashi Kajinami
566a830f64 Bump hacking
hacking 3.0.x is quite old. Bump it to the current latest version.

Change-Id: I8d87fed6afe5988678c64090af261266d1ca20e6
2024-09-22 23:54:36 +09:00
Lucian Petrut
c95ce4ec17 Add MAAS support
At the moment, Watcher can use a single bare metal provisioning
service: OpenStack Ironic.

We're now adding support for Canonical's MAAS service [1], which
is commonly used along with Juju [2] to deploy OpenStack.

In order to do so, we're building a metal client abstraction, with
concrete implementations for Ironic and MAAS. We'll pick the MAAS
client if the MAAS url is provided, otherwise defaulting to Ironic.

For now, we aren't updating the baremetal model collector since it
doesn't seem to be used by any of the existing Watcher strategy
implementations.

[1] https://maas.io/docs
[2] https://juju.is/docs

Implements: blueprint maas-support

Change-Id: I6861995598f6c542fa9c006131f10203f358e0a6
2023-12-11 10:21:33 +00:00
Lucian Petrut
424e9a76af vm workload consolidation: use actual host metrics
The "vm workload consolidation" strategy is summing up instance
usage in order to estimate host usage.

The problem is that some infrastructure services (e.g. OVS or Ceph
clients) may also use a significant amount of resources, which
would be ignored. This can impact Watcher's ability to detect
overloaded nodes and correctly rebalance the workload.

This commit will use the host metrics, if available. The proposed
implementation uses the maximum value between the host metric
and the sum of the instance metrics.

Note that we're holding a dict of host metric deltas in order to
account for planned migrations.

Change-Id: I82f474ee613f6c9a7c0a9d24a05cba41d2f68edb
2023-10-27 21:54:42 +03:00
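The estimate described in 424e9a76af (take the larger of the host metric and the sum of the instance metrics) can be sketched as below. The function name `estimate_host_usage` and the way `planned_delta` is applied are assumptions for illustration; the commit only states that a dict of host metric deltas is kept to account for planned migrations.

```python
def estimate_host_usage(host_metric, instance_usages, planned_delta=0.0):
    """Estimate host usage from a host metric and per-instance usage.

    host_metric may be None when no host-level metric is available.
    planned_delta is a hypothetical adjustment for planned migrations.
    """
    instance_sum = sum(instance_usages)
    if host_metric is None:
        # Fall back to the old behaviour: sum of instance usage only.
        return instance_sum
    # Infrastructure services show up in the host metric but not in the
    # instance sum, so take whichever value is larger.
    return max(host_metric + planned_delta, instance_sum)


if __name__ == "__main__":
    print(estimate_host_usage(70.0, [20.0, 30.0]))
    print(estimate_host_usage(None, [20.0, 30.0]))
```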
Zuul
40e93407c7 Merge "Handle deprecated "cpu_util" metric" 2023-10-27 09:47:38 +00:00
Zuul
721aec1cb6 Merge "vm workload consolidation: allow cold migrations" 2023-10-27 09:47:36 +00:00
Zuul
8a3ee8f931 Merge "Improve vm_consolidation logging" 2023-10-27 09:20:13 +00:00
Lucian Petrut
00fea975e2 Handle deprecated "cpu_util" metric
The "cpu_util" metric was deprecated a few years ago.
We'll obtain the same result by converting the cumulative cpu
time to a percentage, leveraging the rate of change aggregation.

Change-Id: I18fe0de6f74c785e674faceea0c48f44055818fe
2023-10-24 10:47:23 +00:00
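The conversion mentioned in 00fea975e2 amounts to dividing the growth of a cumulative CPU-time counter by the elapsed wall-clock time. A small sketch of that arithmetic, assuming the counter is in nanoseconds and normalising by the number of vCPUs (both are assumptions of this example, as is the function name):

```python
def cpu_util_percent(cpu_ns_start, cpu_ns_end, interval_s, vcpus=1):
    """Convert two cumulative CPU-time samples into a utilisation percentage."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    # Rate of change of the counter, expressed in CPU-seconds.
    used_s = (cpu_ns_end - cpu_ns_start) / 1e9
    # Share of the available CPU time that was actually used.
    return 100.0 * used_s / (interval_s * vcpus)


if __name__ == "__main__":
    # 30 CPU-seconds consumed during a 60 second window on one vCPU.
    print(cpu_util_percent(0, 30_000_000_000, 60))
```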
Lucian Petrut
fd6562382e Avoid performing retries in case of missing resources
There may be no available metrics for instances that are stopped
or were recently spawned. This makes retries unnecessary and time
consuming.

For this reason, we'll ignore gnocchi MetricNotFound errors.

Change-Id: I79cd03bf04db634b931d6dfd32d5150f58e82044
2023-10-23 14:14:21 +00:00
Lucian Petrut
ec90891636 Improve vm_consolidation logging
We're adding a few info log messages in order to trace the
"vm consolidation" strategy more easily.

Change-Id: I8ce1a9dd173733f1b801839d3ad0c1269c4306bb
2023-10-23 14:10:02 +00:00
Lucian Petrut
7336a48057 vm workload consolidation: allow cold migrations
Although Watcher supports cold migrations, the vm workload
consolidation workflow only allows live migrations to be
performed.

We'll remove this unnecessary limitation so that stopped instances
can be cold migrated.

Change-Id: I4b41550f2255560febf8586722a0e02045c3a486
2023-10-23 13:03:18 +00:00
Lucian Petrut
922478fbda Unblock the CI gate
The Nova collector json schema validation started failing [1][2] after
the jsonschema upper constraint was bumped from 4.17.3 to 4.19.1 [3].

The reason is that jsonschema v4.18.0a1 switched to a reference
resolving library [4], which treats the aggregate "id" as a jsonschema
id and expects it to be a string [5]. For this reason, we're now getting
AttributeError exceptions.

As a workaround, we'll rename the "id" ref element as "host_aggr_id".

Also, the watcher-tempest-multinode job is configured to use Focal,
which is no longer supported by Devstack [6]. That being considered,
we'll switch to Ubuntu Jammy (22.04).

While at it, we're disabling Cinder Backup, which isn't used while
testing Watcher. It currently causes Devstack failures since it
uses the Swift backend by default, which is disabled.

[1] https://paste.opendev.org/raw/bjQ1uIdbDMnmA1UEhxLL/
[2] https://paste.opendev.org/raw/bNgxqulBwBLYB7tNhrU4/
[3] ab0dcbdda2
[4] https://github.com/python-jsonschema/jsonschema/releases/tag/v4.18.0a1
[5] c23a5dc1c9/referencing/jsonschema.py (L54-L55C18)
[6] https://paste.openstack.org/raw/bSoSyXgbtmq6d9768HQn/

Change-Id: I300620c2ec4857b1e0d402a9b57a637f576eeb24
2023-10-23 09:21:55 +03:00
BubaVV
0610070e59 Add timeout option for Grafana request
Implemented a config option to set the Grafana API request timeout.

Change-Id: I8cbf8ce22f199fe22c0b162ba1f419169881f193
2023-08-23 17:46:19 +03:00
chenker
4ea3eada3e Fix watcher comment
Change-Id: I4512cf1032e08934886d5e3ca858b3e05c3da76c
2023-08-13 00:00:12 +00:00
chenker
52da088011 Modify saving_energy log info
Change-Id: I84879a453aa3ff78917d1136c62978b9d0e606de
2023-02-07 10:20:04 +00:00
chenker
6dd2f2a9c1 BugFix: Prevent float type variables from being passed to random
>>> random.sample([5,10], 1.3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/random.py", line 321, in sample
    result = [None] * k
TypeError: can't multiply sequence by non-int of type 'float'

Change-Id: Ifa5dca06f07220512579e4fe3c5c741aeffc71cc
2021-08-23 01:58:52 +00:00
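The traceback in 6dd2f2a9c1 comes from passing a float as the sample size. One way to guard against it is to convert the count to an integer before calling `random.sample`. This is a sketch of the idea, not the actual patch; `sample_by_ratio` is a hypothetical helper.

```python
import random


def sample_by_ratio(population, ratio):
    """Pick roughly ratio * len(population) items without a float count."""
    # int() truncates, so 2 * 0.65 = 1.3 becomes 1.
    count = int(len(population) * ratio)
    # Keep the count inside the range random.sample accepts.
    count = max(0, min(count, len(population)))
    return random.sample(population, count)


if __name__ == "__main__":
    print(sample_by_ratio([5, 10], 0.65))
```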
Takashi Kajinami
a993849928 Use Block Storage API v3 instead of API v2
Block Storage API v2 was deprecated during the Pike cycle and is being
removed during the Xena cycle, so the current v3 API should be used instead.

Change-Id: Ia5247742b31f5f07186ef908588f0972d3ac609f
2021-07-27 11:04:16 +09:00
sue
c28756c48b use HTTPStatus instead of direct code
Python has provided http.HTTPStatus since version 3.5,
and Wallaby targets a minimum version of Python 3.6.

Change-Id: I45f732f0f59b8fae831bb6c07f4fdd98cdd7409a
2021-07-09 11:02:36 +02:00
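The change in c28756c48b replaces literal status codes with `http.HTTPStatus` members. Because `HTTPStatus` is an `IntEnum`, its members compare equal to plain integers, as this short sketch shows. The helper `is_success` is hypothetical and exists only for the example.

```python
from http import HTTPStatus


def is_success(status_code):
    """Return True for 2xx codes, using named constants instead of 200/300."""
    return HTTPStatus.OK <= status_code < HTTPStatus.MULTIPLE_CHOICES


if __name__ == "__main__":
    print(HTTPStatus.NOT_FOUND == 404)
    print(is_success(HTTPStatus.CREATED))
```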
zhufl
204b276693 Fix missing self argument in instances_no_attached
instances_no_attached should take self as its first argument; this
change adds it.

Change-Id: I010d9d1e9ddb8790c398bcf06d0772a0d17f57ec
2020-11-27 17:01:52 +08:00
zhufl
af02bebca9 Fix parameter passed to IronicNodeNotFound exception
IronicNodeNotFound expects uuid parameter for the error message,
not name.

Change-Id: I9fefa98fa9fe6f6491e5f621190cac7d376db6c9
2020-11-02 15:48:27 +08:00
xuanyandong
16a0486655 Remove six
Replace the following items with Python 3 style code.

- six.string_types
- six.integer_types
- six.moves
- six.PY2

Implements: blueprint six-removal

Change-Id: I2a0624bd4b455c7e5a0617f1253efa05485dc673
2020-09-30 16:25:13 +08:00
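For the items listed in 16a0486655, the usual Python 3 equivalents are `str` for `six.string_types`, `int` for `six.integer_types`, and direct standard-library imports for `six.moves`. A brief sketch, with a hypothetical `describe` function that is not from the Watcher code:

```python
from urllib import parse  # was: from six.moves.urllib import parse


def describe(value):
    """Classify a value using built-in types instead of six aliases."""
    if isinstance(value, str):  # was: six.string_types
        return "string"
    if isinstance(value, int):  # was: six.integer_types
        return "integer"
    return "other"


if __name__ == "__main__":
    print(describe("audit"), describe(3), describe(1.5))
    print(parse.quote("a b"))
```

Checks on `six.PY2` simply disappear, since only the Python 3 branch remains.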
Dantali0n
cca0d9f7d7 Implements base method for time series metrics
Implements the base method as well as some basic implementations to
retrieve time series metrics. Ceilometer cannot be supported
because its API documentation has been unavailable. Grafana will be
supported in a follow-up patch.

Partially Implements: blueprint time-series-framework

Change-Id: I55414093324c8cff379b28f5b855f41a9265c2d3
2020-08-26 16:01:15 +02:00
licanwei
9f0138e1cf Check if scope is None
If scope is None, don't create the data model.

Change-Id: Icf611966c9b0a3882615d778ee6c72a8da73841d
Closed-Bug: #1881920
2020-06-18 00:58:16 +00:00
zhangbailin
f0f15f89c6 Remove future imports
These particular imports are no longer needed in a Python 3-only world.

Change-Id: I5e9e15556c04871c451f6363380f2a7ac026c968
2020-05-02 00:33:39 +00:00
chenke
0ef0f165cb Remove six[7]
Since our code only supports Python 3, removing six is necessary.

Change-Id: I3738118b1898421ee41e9e2902c255ead73f3915
2020-04-22 15:59:15 +08:00
Andreas Jaeger
1bb2aefec3 Update hacking for Python3
The repo is Python 3 now, so update hacking to version 3.0 which
supports Python 3.

Fix problems found.

Update local hacking checks for new flake8.

Remove hacking and friends from lower-constraints; they do not need
to be installed at run time.

Change-Id: Ia6af344ec8441dc98a0820176373dcff3a8c80d5
2020-04-02 07:50:02 +02:00
Zuul
8835576374 Merge "Add audit type: event" 2020-01-10 03:30:03 +00:00
zhufl
db709691be Fix duplicated words issue like "an active instance instance"
This is to fix the duplicated words issue like
"Pick up an active instance instance to migrate".

Change-Id: I74de4eb06aa1e462f0b499e3fd62a7cdc7570b31
2020-01-06 15:29:25 +08:00
licanwei
6a173a9161 Add audit type: event
This patchset adds a new audit type, event,
and the handler to execute event audits.

Partially Implements: blueprint event-driven-optimization-based

Change-Id: I287471ee4d1dcc42af7a6bcc15f8509d4ce73072
2019-12-13 15:14:41 +08:00
Zuul
b7baa88010 Merge "Use threadpool when building compute data model" 2019-11-30 02:23:51 +00:00
Zuul
65ec309050 Merge "General purpose threadpool for decision engine" 2019-11-30 02:22:13 +00:00
licanwei
4a269ba039 Change self.node to self.nodes in model_root
networkx removed G.node in version 2.4 [1].
G.node was replaced by G.nodes in version 2.0 [2],
and networkx supports Python 2.7, 3.5, 3.6 and 3.7 from version 2.2,
so the lower constraint version is 2.2.
The task_flow library also uses networkx, so the
task_flow version needs to be updated as well.
[1]: https://networkx.github.io/documentation/stable/release/release_2.4.html
[2]: https://networkx.github.io/documentation/stable/release/release_2.0.html
Change-Id: I268bcf57ec977bd8132a9f1573b28b681cb4ce1e
Closes-Bug: #1854132
2019-11-27 17:19:29 +08:00
licanwei
689ae25ef5 Refactoring the codes about getting used and free resources
Class ModelRoot now provides functions to get used and free
resources, so strategies can call these functions instead of
computing the values themselves.

Change-Id: I3c74d56539ac6c6eb16b0d254a76260bc791567c
2019-11-12 16:22:09 +08:00
Dantali0n
c644e23ca0 Use threadpool when building compute data model
Use the general purpose threadpool when building the nova compute
data model. Additionally, add a thorough explanation of the theory of
operation.

Updates related test cases to better ensure the correct operation
of add_physical_layer.

Partially Implements: blueprint general-purpose-decision-engine-threadpool

Change-Id: I53ed32a4b2a089b05d1ffede629c9f4c5cb720c8
2019-11-01 13:44:15 +01:00
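The idea in c644e23ca0 of fanning out per-host work to a shared thread pool can be illustrated with the standard library. This is a stand-in sketch: it uses `concurrent.futures` rather than whatever executor the decision engine actually uses, and `build_model` and `fetch_host` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def build_model(hosts, fetch_host, max_workers=4):
    """Fetch data for every host concurrently and collect it by host name.

    fetch_host is any callable taking a host name; in a real collector it
    would query an API for that host.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input hosts.
        results = pool.map(fetch_host, hosts)
        return dict(zip(hosts, results))


if __name__ == "__main__":
    print(build_model(["compute-1", "compute-2"], str.upper))
```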