Commit Graph

2553 Commits

Author SHA1 Message Date
Zuul
024815af71 Merge "use cinder migrate for swap volume" into stable/2025.1 14.1.0 2025-08-19 13:39:36 +00:00
Sean Mooney
ffec800f59 use cinder migrate for swap volume
This change removes watchers in tree functionality
for swapping instance volumes and defines swap as an alias
of cinder volume migrate.

The watcher native implementation was missing error handling
which could lead to irretrievable data loss.

The removed code also forged project user credentials to
perform admin request as if it was done by a member of a project.
this was unsafe an posses a security risk due to how it was
implemented. This code has been removed without replacement.

While some effort has been made to allow existing
audits that were defined to work, any reduction of functionality
as a result of this security hardening is intentional.

Closes-Bug: #2112187
Change-Id: Ic3b6bfd164e272d70fe86d7b182478dd962f8ac0
Signed-off-by: Sean Mooney <work@seanmooney.info>
(cherry picked from commit 3742e0a79c)
2025-08-18 16:36:38 +00:00
Douglas Viroel
defd3953d8 Configure watcher tempest's microversion in devstack
Adds a tempest configuration for min and max microversions supported
by watcher. This help us to define the correct range of microversion
to be tested on each stable branch.
New microversion proposals should also increase the default
max_microversion, in order to work with watcher-tempest-plugin
microversion testing.

Change-Id: I0b695ba4530eb89ed17b3935b87e938cadec84cc
(cherry picked from commit adfe3858aa)
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-04 12:18:53 +00:00
Zuul
de75a2a5b2 Merge "Return HTTP code 400 when creating an audit with wrong parameters" into stable/2025.1 2025-07-21 19:28:47 +00:00
Zuul
7c15812d68 Merge "Add a unit test to check the error when creating an audit with wrong parameters" into stable/2025.1 2025-07-21 19:28:45 +00:00
Zuul
c47f6fb66c Merge "Fix audit creation with no name and no goal or audit_template" into stable/2025.1 2025-07-21 19:28:43 +00:00
Zuul
a4ece6f084 Merge "Added unit test to validate audit creation with no goal and no name" into stable/2025.1 2025-07-21 19:25:04 +00:00
Zuul
758acdfb99 Merge "Set actionplan state to FAILED if any action has failed" into stable/2025.1 2025-07-08 19:46:06 +00:00
Zuul
b65cfc283a Merge "Add unit test to check action plan state when a nested action fails" into stable/2025.1 2025-07-08 19:46:05 +00:00
Zuul
f5a21ba43e Merge "Aggregate by label when querying instance cpu usage in prometheus" into stable/2025.1 2025-07-08 12:41:34 +00:00
Alfredo Moralejo
ba417b38bf Fix audit creation with no name and no goal or audit_template
Currently, in that case it was failing because watcher tried to create a
name based on a goal automatically and the goal is not defined.

This patch is moving the check for goal specification in the audit
creation call earlier, and if there is not goal defined, it returns an
invalid call error.

This patch is also modifying the existing error for this case to check
the expected behavior.

Closes-Bug: #2110947

Change-Id: I6f3d73b035e8081e86ce82c205498432f0e0fc33
(cherry picked from commit bf6a28bd1e)
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-07-02 08:16:12 +02:00
Alfredo Moralejo
38622442f2 Set actionplan state to FAILED if any action has failed
Currently, an actionplan state is set to SUCCEEDED once the execution
has finished, but that does not imply that all the actions finished
successfully.

This patch is checking the actual state of all the actions in the plan
after the execution has finished. If any action has status FAILED, it
will set the state of the action plan as FAILED and will apply the
appropiate notification parameters. This is the expected behavior according
to Watcher documentation.

The patch is also fixing the unit test for this to set the expected
action plan state to FAILED and notification parameters.

Closes-Bug: #2106407
Change-Id: I7bfc6759b51cd97c26ec13b3918bd8d3b7ac9d4e
(cherry picked from commit 88d81c104e)
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-07-02 08:12:45 +02:00
Alfredo Moralejo
c7fde92411 Add unit test to check action plan state when a nested action fails
This patch is adding a new unit test to check the behavior of the action
plan when one of the actions in it fails during execution.

Note this is to show a bug, and the expected state will be changed in
the fixing patch.

Related-Bug: #2106407
Change-Id: I2f3fe8f4da772a96db098066d253e5dee330101a
(cherry picked from commit b36ba8399e)
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-07-02 08:12:13 +02:00
Alfredo Moralejo
e5b5ff5d56 Return HTTP code 400 when creating an audit with wrong parameters
Currently, when trying to create an audit which misses a mandatory
parameter watcher returns error 500 instead of 400 which is the
documented error in the API [1] and the appropiate error code for
malformed requests.

This patch catch parameters validation errors according to the json
schema for each strategy and returns error 400. It also fixes the
unit test to validate the expected behavior.

[1] https://docs.openstack.org/api-ref/resource-optimization/#audits

Closes-Bug: #2110538
Change-Id: I23232b3b54421839bb01d54386d4e7b244f4e2a0
(cherry picked from commit 4629402f38)
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-07-02 08:07:26 +02:00
Zuul
3a923dbf16 Merge "Use KiB as unit for host_ram_usage when using prometheus datasource" into stable/2025.1 2025-06-27 16:59:14 +00:00
Alfredo Moralejo
fb85b27ae3 Use KiB as unit for host_ram_usage when using prometheus datasource
The prometheus datasource was reporting host_ram_usage in MiB as
described in the docstring for the base datasource interface
definition [1].

However, the gnocchi datasource is reporting it in KiB following
ceilometer metric `hardware.memory.used` [2] and the strategies
using that metric expect it to be in KiB so the best approach is
to change the unit in the prometheus datasource and update the
docstring to avoid missunderstandings in future. So, this patch
is fixing the prometheus datasource to return host_ram_usage
in KiB instead of MiB.

Additionally, it is adding more unit tests for the check_threshold
method so that it covers the memory based strategy execution, validates
the calculated standard deviation and adds the cases where it is below
the threshold.

[1] 15981117ee/watcher/decision_engine/datasources/base.py (L177-L183)
[2] https://docs.openstack.org/ceilometer/train/admin/telemetry-measurements.html#snmp-based-meters

Closes-Bug: #2113776
Change-Id: Idc060d1e709c0265c64ada16062c3a206c6b04fa
(cherry picked from commit 6ea362da0b)
2025-06-20 16:35:40 +00:00
Alfredo Moralejo
53872f9af2 Aggregate by label when querying instance cpu usage in prometheus
Currently, when the prometheus datasource query ceilometer_cpu metric
for instance cpu usage, it aggregates by instance and filter by the
label containing the instance uuid. While this works fine in real
scenarios, where a single metric is provided in a single instance, in
some cases as the CI jobs where metrics are directly injected, leads to
incorrect metric calculation.

We applied a similar fix for the host metrics in [1] but we did not
implement it for instance cpu.

I am also converting the query formatting to the dict format to improve
understability.

[1] https://review.opendev.org/c/openstack/watcher/+/946049

Closes-Bug: #2113936
Change-Id: I3038dec20612162c411fc77446e86a47e0354423
(cherry picked from commit 3860de0b1e)
2025-06-19 14:54:56 +00:00
Chandan Kumar (raukadah)
c0ebb8ddb3 Drop code from Host maintenance strategy migrating instance to disabled hosts
Currently host maintenance strategy also migrate instances from maintenance
node to watcher_disabled compute nodes.

watcher_disabled compute nodes might be disabled for some other purpose
by different strategy. If host maintenace use those compute nodes for
migration, It might affect customer workloads.

Host maintenance strategy should never touch disabled hosts unless the user
specify a disable host as backup node.

This cr drops the logic for using disabled compute node for maintenance.
Host maintaince is already using nova schedular for migrating the
instance, will use the same. If there is no available node, strategy
will fail.

Closes-Bug: #2109945

Change-Id: If9795fd06f684eb67d553405cebd8a30887c3997
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
(cherry picked from commit 9dea55bd64)
2025-06-09 19:40:41 +05:30
Alfredo Moralejo
1d7f163651 Added unit test to validate audit creation with no goal and no name
This patch is adding a new unit test to validate the behavior
of the API when trying to create an audit without a goal (whether using
a goal or audit template parameters) and no name is provided.

Related-Bug: https://bugs.launchpad.net/watcher/+bug/2110947
Change-Id: I04df10a8a0eea4509856f2f4b9d11bae24cd563a
(cherry picked from commit 0651fff910)
2025-06-04 13:06:14 +00:00
Alfredo Moralejo
c6ceaacf27 Add a unit test to check the error when creating an audit with wrong parameters
Currently, it is returning http error code 500 instead of 400, which
would be the appropiate code.

A follow-up patch will be sent with the vix and switching the error code
and message.

Related-Bug: #2110538
Change-Id: I35ccbb9cf29fc08e78c4d5f626a6518062efbed3
(cherry picked from commit 891119470c)
2025-06-04 12:52:40 +00:00
Chandan Kumar (raukadah)
f4bfb10525 [host_maintenance] Pass des hostname in add_action solution
Currently we are passing src_node and des_node uuid when we try to run
migrate action.

In the watcher-applier log, migration fails with following exception
```
Nova client exception occurred while live migrating instance <uuid>Exception: Compute host <uuid> could not be found
```
Based on 57f55190ff/watcher/applier/actions/migration.py (L122)
and
57f55190ff/watcher/common/nova_helper.py (L322),
live_migrate_instance expects destination hostname not uuid.

This cr replaces dest_node uuid to hostname.

Closes-Bug: #2109309

Change-Id: I3911ff24ea612f69dddae5eab15fabb4891f938d
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
(cherry picked from commit 278cb7e98c)
2025-05-05 02:53:03 +00:00
Sean Mooney
8a99d4c5c1 Add support for pyproject.toml and wsgi module paths
pip 23.1 removed the "setup.py install" fallback for projects that do
not have pyproject.toml and now uses a pyproject.toml which is vendored
in pip [1][2]. pip 24.2 has now deprecated a similar fallback to
"setup.py develop" and plans to fully remove this in pip 25.0 [3][4][5].
pbr supports editable installs since 6.0.0

pip 25.1 has now been released and the removal is complete.
by adding our own minimal pyproject.toml to ensure we are using the
correct build system.

This change also requires that we adapt how we generate our wsgi
entry point. when pyproject.toml is used the wsgi console script is
not generated in an editbale install such as is used in devstck

To adress this we need to refactor our usage of our wsgi applciation
to use a module path instead. This change does not remove
the declaration of our wsgi_scrtip entry point but it shoudl
be considered deprecated and it will be removed in the future.

To unblock the gate the devstack plugin is modifed to to deploy
using the wsgi module instead of the console script.

Finally supprot for the mod_wsgi wsgi mode is removed.
that was deprecated in devstack a few cycle ago and
support was removed in I8823e98809ed6b66c27dbcf21a00eea68ef403e8

[1] https://pip.pypa.io/en/stable/news/#v23-1
[2] https://github.com/pypa/pip/issues/8368
[3] https://pip.pypa.io/en/stable/news/#v24-2
[4] https://github.com/pypa/pip/issues/11457
[5] https://ichard26.github.io/blog/2024/08/whats-new-in-pip-24.2/
Closes-Bug: #2109608

Change-Id: Iad77939ab0403c5720c549f96edfc77d2b7d90ee
2025-05-01 00:19:23 +00:00
Alfredo Moralejo
ce9f0b4c1e Skip real-data tests in non-real-data jobs
I am excluding strategies execution with annotation `real_load` in
non-real-load jobs.

This is partial backport of [1].

[1] https://review.opendev.org/c/openstack/watcher/+/945627

Change-Id: I77d4c23ebc21693bba8ca0247b8954c6dc8eaba9
2025-04-24 17:02:21 +02:00
Alfredo Moralejo
e385ece629 Aggregate by fqdn label instead instance in host cpu metrics
While in a regular case a specific metric for a specific host will be
provider by a single instance (exporter) so aggregating by label and by
intances should be the same, it is more correct to aggregate by the same
label that the one we use to filter the metrics.

This is follow up of https://review.opendev.org/c/openstack/watcher/+/944795

Related-Bug: #2103451

Change-Id: Ia61f051547ddc51e0d1ccd5a56485ab49ce84c2e
(cherry picked from commit c7158b08d1)
2025-04-09 09:00:17 +02:00
Alfredo Moralejo
c6505ad06f Query by fqdn_label instead of instance for host metrics
Currently we are using `instance` label to query about host metrics to
prometheus. This label is assigned to the url of each endpoint being
scrapped.

While this work fine in one-exporter-per-compute cases as the driver is
mapping the fqdn_label value to the `instance` label value, it fails
when there are more that one target with the same value for the fqdn
label. This is a valid case, to be able to query by fqdn and do not
care about what exporter in the host is providing the metric.

This patch is changing the queries we use for hosts to be based on the
fqdn_label instead of the instance one. To implement it, we are also
simplifying the way we check the metric exist for the host by converting
prometheus_fqdn_instance_map into a prometheus_fqdn_labels set
which stores the list of fqdn found in  prometheus.

Closes-Bug: #2103451
Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e
(cherry picked from commit a65e7e9b59)
2025-04-09 08:59:52 +02:00
Chandan Kumar (raukadah)
64f70b948d Drop sg_core prometheus related vars
The depends-on pr removes the installation of promotheus[1] and node
exporter[2] from sg_core. We no longer need to define those vars in
the devstack config.

Links:
[1]. https://github.com/openstack-k8s-operators/sg-core/pull/21
[2]. https://github.com/openstack-k8s-operators/sg-core/pull/23

Note: We do not need to enable sg_core service on compute node,
so removing it's plugin call.

Change-Id: Ie8645813a360605635de4dff9e8d1ba0d7a0cdc3
Signed-off-by: Chandan Kumar (raukadah) <raukadah@gmail.com>
(cherry picked from commit 0702cb3869)
2025-04-09 08:58:28 +02:00
OpenStack Release Bot
68c9ce65d2 Update TOX_CONSTRAINTS_FILE for stable/2025.1
Update the URL to the upper-constraints file to point to the redirect
rule on releases.openstack.org so that anyone working on this branch
will switch to the correct upper-constraints list automatically when
the requirements repository branches.

Until the requirements repository has as stable/2025.1 branch, tests will
continue to use the upper-constraints list on master.

Change-Id: I29d11e287122d21e62bd6266a193db480dcc4a23
2025-03-13 13:51:53 +00:00
OpenStack Release Bot
5fa0926528 Update .gitreview for stable/2025.1
Change-Id: Ic5083f5a799b269aee36b2c83408f0ba7cbded0d
2025-03-13 13:51:51 +00:00
Zuul
f2ee231f14 Merge "pre-commit: Integrate bandit" 14.0.0 14.0.0.0rc1 2025-03-11 09:58:29 +00:00
Zuul
3861701f4a Merge "Replace deprecated abc.abstractproperty" 2025-03-11 09:47:31 +00:00
Zuul
d167134265 Merge "Drop implicit test dependency on iso8601" 2025-03-11 09:47:30 +00:00
Sean Mooney
bbf5c41cab Add epoxy prelude
This change added the prelude for the 2025.1 Expoxy release cycle.

Change-Id: I8223842a57491a91c565e47bd1819db4d142e628
2025-03-05 17:57:55 +00:00
Takashi Kajinami
df3d67a4ed Replace deprecated abc.abstractproperty
It was deprecated in Python 3.3 [1].

[1] https://docs.python.org/3.13/whatsnew/3.3.html#abc

Change-Id: Ibd98cb93f697a6da6a6bc5a5030640a262c7a66b
2025-03-02 15:36:48 +09:00
Takashi Kajinami
82f1c720dd Drop implicit test dependency on iso8601
The library has been missing from the test requirements although it is
directly used. Replace it by the built-in datetime module to get rid
of the unmaintained direct dependency.

Change-Id: I1d08b38862b54fee4c7c26161f59264fb3f2ce51
2025-03-01 15:23:15 +09:00
Zuul
77a30ef281 Merge "Enable prometheus datasource in watcher-prometheus-integration job" 2025-02-28 13:26:10 +00:00
Zuul
383751904c Merge "Further database refactoring" 2025-02-27 11:52:59 +00:00
Zuul
6a1f19d314 Merge "Deprecate Monasca data source" 2025-02-27 11:45:15 +00:00
Douglas Viroel
342fe8882a Enable prometheus datasource in watcher-prometheus-integration job
Enable prometheus as datasouce in tempest configuration,
to enable metric generation needed to run some scenario
tests. It is enabled on the watcher-prometheus-integration
job

Depends-On: https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942141
Depends-On: https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942308

Change-Id: I2b657782aedf61d89766fcd18bb453b62c0b0e3b
2025-02-22 10:46:01 -03:00
Chandan Kumar (raukadah)
7fcca0cc46 Enable prometheus and node_exporter from devstack-plugin-prometheus
https://opendev.org/openstack/devstack-plugin-prometheus is the new
devstack plugin providing functionality to install/configure
prometheus/node_exporter.

It will replace sg_core devstack plugin in future.

Depends-On: https://review.opendev.org/c/openstack/watcher/+/938893
Depends-On: https://review.opendev.org/c/openstack/devstack-plugin-prometheus/+/940426

Change-Id: Ia75e6597275b36c04cde653c16f7d45ed23bc261
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-02-19 08:49:53 -03:00
Takashi Kajinami
977f014cba Deprecate Monasca data source
The Monasca project was marked inactive during 2023.1. Although we have
seen multiple people showing interest to keep the project, we haven't
seen any real progress.

Because the project is likely retired soon, let's deprecate the feature
dependent on Monasca so that we can remove it in a future release.

Change-Id: Ifd64f5ba59bbac238ff62302ec36a3e36954d6d0
2025-02-16 18:45:31 +09:00
James Page
753c44b0c4 Further database refactoring
More refactoring of the SQLAlchemy database layer to improve
compatility with eventlet on newer Pythons.

Inspired by 0ce2c41404

Related-Bug: 2067815
Change-Id: Ib5e9aa288232cc1b766bbf2a8ce2113d5a8e2f7d
2025-02-14 11:42:47 +00:00
Takashi Kajinami
dd0082c343 pre-commit: Integrate bandit
Run bandit check from per-commit so that the check is executed in pep8
job.

Also remove requirements installed automatically by pre-commit from
test-requirements.

Change-Id: I45af8c47afb262882ebbee74ae52446fed741e26
2025-02-10 22:50:34 +09:00
Takashi Kajinami
5f6fbaea56 Remove unused os-api-ref from test requirements
It is used when building API reference but is not used in any testing.

Change-Id: I6af7c7b110b338acad10eccf42344a338afbc915
2025-02-09 08:14:17 +09:00
Takashi Kajinami
6b81b34b27 Drop import fallback for Python 2
cPickle no longer exists in Python 3 and pickle should be used always.

Change-Id: I5ddedb3e996d9a0679bab38ea94263886274ece4
2025-02-09 08:04:36 +09:00
Zuul
961bbb9460 Merge "Update master for stable/2024.2" 2025-02-06 08:07:22 +00:00
Zuul
d56e8ee65a Merge "X-Project-Name key in test code was duplicated" 2025-02-03 18:29:23 +00:00
Zuul
4527f89d8d Merge "Add support for instance metrics to prometheus datasource" 2025-02-03 13:22:28 +00:00
Zuul
e535177bc0 Merge "Remove ceilometer datasource" 2025-01-29 13:22:46 +00:00
Zuul
022d150d20 Merge "Add prometheus data source for watcher decision engine" 2025-01-24 13:46:32 +00:00
Alfredo Moralejo
136e5d927c Add support for instance metrics to prometheus datasource
In order to support vm_workload_consolidation, workload_balance and
workload_stabilization strategis some instance metrics are required.
This patch is adding support for them.

Implementation is based on a prometheus store populated using sg-core
from ceilometer metrics with Pollster source.

- instance_ram_usage: rely on ceilometer_memory_usage metrics created from
  ceilometer memory.usage meter.
- instance_ram_allocated: rely on the memory value provided by the
  inventory created from nova and placement APIs.
- instance_cpu_usage: rely on ceilometer_cpu metric created from
  ceilometer cpu meter. A max value of 100 is set in the query.
- instance_root_disk_size: rely on the `disk` value provided by the
  inventory created from nova and placement APIs.

A new parameterer `instance_uuid_label` has been added to the prometheus
datasource configuration to identify the label used to store the value of the
OpenStack instance uuid for eache instance metric in prometheus. Default
value is `resource`.

Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2
2025-01-23 13:23:04 +01:00