2678 Commits

Author SHA1 Message Date
Zuul
848cde3606 Merge "Rename confusing query timeout options" 2025-08-28 09:26:40 +00:00
Zuul
63cf35349c Merge "Extend compute model attributes" 2025-08-27 16:40:53 +00:00
Takashi Kajinami
7106a12251 Rename confusing query timeout options
These do not actually define a timeout but an interval. Rename the options
to reflect what they actually define. The existing deprecated options
in the [gnocchi_client] section are also removed, because these have been kept
for 6 years.

In addition, fix the inconsistent naming (query vs call).

Change-Id: Ib29115746a25b45bdff1c3da8df9d7167c2db662
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-27 23:22:45 +09:00
Douglas Viroel
03c09825f7 Extend compute model attributes
This patch extends the compute model attributes by
adding new fields to the Instance element. Values are
populated by the nova collector, using the same
nova list call, but this requires a more recent compute
API microversion.
A new config option was added to allow users to
enable or disable the extended attributes; it is
disabled by default.
Prometheus-based jobs are configured to run on a newer version
of the nova API (2.96) with the extended attributes
collection enabled.

Implements: bp/extend-compute-model-attributes

Assisted-By: Cursor (claude-4-sonnet)

Change-Id: Ibf31105d780dce510a59fc74241fa04e28529ade
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-26 11:35:18 -03:00
Douglas Viroel
2452c1e541 Follow up changes for skip-action blueprint
These are some of the changes requested in reviews
of the patch series for the add-skip-action blueprint.
Some of them may require a separate patch, since
they would touch more files that are not related to
this feature.

Change-Id: I9e30ca385e7b184ab19449a60db6f6d0f3c0e1b9
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-26 10:27:57 -03:00
Zuul
d91b550fc9 Merge "Fix missing watcher_workflow_engines.taskflow section" 2025-08-26 13:16:19 +00:00
Zuul
1668b9b9f8 Merge "API changes for skipped actions: patch actions and status_message" 2025-08-26 12:54:31 +00:00
Zuul
5e05b50048 Merge "Skip actions automatically based on pre_condition results" 2025-08-26 12:33:08 +00:00
Zuul
4d8f86b432 Merge "Fix NovaHelper microversion comparison" 2025-08-25 19:18:57 +00:00
Zuul
05d8f0e3c8 Merge "Validate endpoint_type option at loading" 2025-08-25 12:06:44 +00:00
Takashi Kajinami
1a87abc666 Fix missing watcher_workflow_engines.taskflow section
... caused by AttributeError.

Closes-Bug: #2121286
Change-Id: I52bab27afdc96d8ce2d9733316737c3aa505f5fe
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-24 22:58:28 +09:00
Zuul
fa4552b93f Merge "Fix type mismatch between option and its default" 2025-08-24 13:21:43 +00:00
Takashi Kajinami
a07bfa141d Fix type mismatch between option and its default
... to avoid the following warning.

```
UserWarning: converting '1' to a string
  warnings.warn('converting \'%s\' to a string' % str_val)
```

Change-Id: I852d63523d3582f00d4d7953199181e3d2b6a885
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-24 04:22:33 +09:00
Zuul
a6668a1b39 Merge "Update Overload standard deviation doc" 2025-08-22 15:22:04 +00:00
Zuul
534c340df1 Merge "Add new tests to validate GET /infra-optim/v1/data_model" 2025-08-22 14:16:05 +00:00
Zuul
a963e0ff85 Merge "Fix api-ref doc for GET /infra-optim/v1/data_model" 2025-08-22 14:03:15 +00:00
Ronelle Landy
457819072f Update Overload standard deviation doc
Bug #2113862 details a number of suggested
corrections and additions to the Workload
Stabilization doc. This patch adds those
suggested changes.

Closes-Bug: #2113862
Assisted-By: Cursor (claude-3.5-sonnet)
Change-Id: I4131a304c064d2ea397b2447025c7edf69a56e2a
Signed-off-by: Ronelle Landy <rlandy@redhat.com>
2025-08-21 11:09:46 -04:00
Zuul
6d155c4be6 Merge "Add status_message to objects and notifications" 2025-08-21 14:59:53 +00:00
Zuul
83fea206df Merge "Add status_message column to Actions, Audits and ActionPlans tables" 2025-08-21 14:50:46 +00:00
Zuul
00a3edeac6 Merge "Add parameters to force failures in nop action" 2025-08-21 14:32:37 +00:00
Zuul
b69642181b Merge "Add patch call validation based on allowed_attrs" 2025-08-21 14:24:09 +00:00
Zuul
616c8f4cc4 Merge "Add options to disable migration in host maintenance" 2025-08-21 14:11:22 +00:00
Quang Ngo
cc26b3b334 Add options to disable migration in host maintenance
This change enhances the Host Maintenance strategy by introducing
two new input parameters: `disable_live_migration` and
`disable_cold_migration`. These parameters allow cloud
administrators to control whether live or cold migration should be
considered during host maintenance operations.

If `disable_live_migration` is set, active instances will be cold
migrated if `disable_cold_migration` is not set, otherwise
active instances will be stopped. If `disable_cold_migration` is set,
inactive instances will not be cold migrated.
If both are set, only stop actions will be performed on instances.
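
The behavior described above can be sketched as a small decision function. This is an illustration only, not the strategy's actual code; `plan_action` and the returned action labels are hypothetical names, and the sketch assumes that an inactive instance gets no action when cold migration is disabled.

```python
from typing import Optional


def plan_action(instance_active: bool,
                disable_live_migration: bool = False,
                disable_cold_migration: bool = False) -> Optional[str]:
    """Pick an illustrative action label for one instance on the host."""
    if instance_active:
        if not disable_live_migration:
            return "live_migrate"
        # Live migration is disabled: fall back to cold migration if allowed.
        if not disable_cold_migration:
            return "cold_migrate"
        # Both migration types are disabled: stop the active instance.
        return "stop"
    # Inactive instances are only ever cold migrated, and only if allowed.
    if disable_cold_migration:
        return None
    return "cold_migrate"
```

With both flags set, active instances map to "stop" and inactive instances are left alone, which matches the "only stop actions" case.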

The strategy logic and action plan generation have been updated to
reflect these behaviors. A new "stop" action is introduced and
registered, and the weight planner is updated to handle the new action.

Documentation for the Host Maintenance strategy is updated to
describe the new parameters and their effects.

Test Plan:
- Unit tests for HostMaintenance strategy with new parameters
- Integration tests for action plan generation with stop action

This implements the specification:
Spec: https://review.opendev.org/c/openstack/watcher-specs/+/943873

Change-Id: I201b8e5c52e1bc1a74f3886a0e301e3c0fa5d351
Signed-off-by: Quang Ngo <quang.ngo@canonical.com>
2025-08-20 22:32:33 +10:00
Douglas Viroel
9003906bdc Fix NovaHelper microversion comparison
Fixes the microversion comparison in both enable and
disable nova-compute service methods in NovaHelper.
The previous implementation was incorrect and started to
fail for microversions greater than 2.99.
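
The message does not spell out the faulty comparison. One common way such checks break is by comparing microversions as strings or floats, where 2.100 sorts below 2.99. The sketch below shows a numeric comparison under that assumption; it is not the NovaHelper code, and `parse_microversion` and `supports` are hypothetical helper names.

```python
def parse_microversion(version: str) -> tuple[int, int]:
    """Split 'major.minor' into a tuple of ints so comparison is numeric."""
    major, minor = version.split(".")
    return int(major), int(minor)


def supports(current: str, required: str) -> bool:
    """Return True if `current` is at least `required`."""
    return parse_microversion(current) >= parse_microversion(required)


# String and float comparisons both misorder 2.100 and 2.99.
assert "2.100" < "2.99"
assert float("2.100") < float("2.99")
# Tuple comparison orders them correctly.
assert supports("2.100", "2.99")
```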

Closes-Bug: #2120586

Assisted-By: Cursor (claude-4-sonnet)

Change-Id: I69da7f10cd5b42f7d4613d8947bca3e382815c3f
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-20 08:35:18 -03:00
Alfredo Moralejo
e06f1b0475 API changes for skipped actions: patch actions and status_message
This patch implements the changes in the API required for the
skipped action blueprint. It includes:

- New field `status_message` is visible in API get calls for Audits,
  ActionPlans and Actions.
- New Patch call is added to `/actions/{action_id}` which allows
  manually moving actions in PENDING state to SKIPPED for ActionPlans
  which have not been started.
- A new API microversion 1.5 is added for these changes.

It also adds required tests and documentation.

Implements: blueprint add-skip-actions

Assisted-By: Cursor (claude-4-sonnet)

Change-Id: I71fb9af76085e5941a7fd3e9e4c89d6f3a3ada47
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-20 13:13:19 +02:00
Alfredo Moralejo
6d35be11ec Skip actions automatically based on pre_condition results
This patch implements automatic skipping of actions based on the
result of the action pre_condition method. This allows proper handling
of situations such as migration actions for VMs which no longer
exist. This patch includes:

- Adding a new state SKIPPED to the Action objects.
- Add a new Exception ActionSkipped. An action which raises it from the
  pre_condition execution is moved to SKIPPED state.
- pre_condition will not be executed for any action in SKIPPED state.
- execute will not be executed for any action in SKIPPED or FAILED state.
- post_condition will not be executed for any action in SKIPPED state.
- moving transition to ONGOING from pre_condition to execute. That means
  that actions raising ActionSkipped will move from PENDING to SKIPPED
  while actions raising any other Exception will move from PENDING to
  FAILED.
- Adding information on action failed or skipped state to the
  `status_message` field.
- Adding a new option to the testing action nop to simulate skipping on
  pre_condition, so that we can easily test it.
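
The state transitions listed above can be sketched roughly as follows. This is a toy model, not Watcher's workflow engine: the `Action` class, `run_action`, and the SUCCEEDED state name are assumptions made for illustration, and post_condition is run for every state except SKIPPED because that is the only exclusion the list names.

```python
PENDING, ONGOING, SUCCEEDED, FAILED, SKIPPED = (
    "PENDING", "ONGOING", "SUCCEEDED", "FAILED", "SKIPPED")


class ActionSkipped(Exception):
    """Raised by pre_condition to signal that the action should be skipped."""


class Action:
    """Toy action; the three callables stand in for the real methods."""

    def __init__(self, pre_condition, execute=lambda: None,
                 post_condition=lambda: None):
        self.state = PENDING
        self.status_message = None
        self.pre_condition = pre_condition
        self.execute = execute
        self.post_condition = post_condition


def run_action(action):
    # pre_condition is not run for actions already in SKIPPED state.
    if action.state != SKIPPED:
        try:
            action.pre_condition()
        except ActionSkipped as exc:
            action.state = SKIPPED  # PENDING -> SKIPPED
            action.status_message = str(exc)
        except Exception as exc:
            action.state = FAILED  # PENDING -> FAILED
            action.status_message = str(exc)

    # execute is not run for SKIPPED or FAILED actions; the transition
    # to ONGOING happens here rather than in pre_condition.
    if action.state not in (SKIPPED, FAILED):
        action.state = ONGOING
        try:
            action.execute()
            action.state = SUCCEEDED
        except Exception as exc:
            action.state = FAILED
            action.status_message = str(exc)

    # post_condition is not run for SKIPPED actions.
    if action.state != SKIPPED:
        action.post_condition()
    return action.state
```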

Implements: blueprint add-skip-actions

Assisted-By: Cursor (claude-4-sonnet)

Change-Id: I59cb4c7006c7c3bcc5ff2071886d3e2929800f9e
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-20 13:10:10 +02:00
Takashi Kajinami
1009c3781b Validate endpoint_type option at loading
... instead of documenting the supported values, so that a more explicit
error is presented to users.

Also drop the redundant description of the default values. The default
values are added to the generated sample config files, so they don't have to
be explained in help texts.

Change-Id: I12b201da3e742b55f6cfcf71bdd4413cbf3ee4e5
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-20 01:44:59 +09:00
Alfredo Moralejo
5048a6e3ba Add status_message to objects and notifications
This patch is part of the skipped action blueprint. It adds the
`status_message` field to the Audit, ActionPlan and Action objects and
all related notifications.

It bumps the versions of all the affected objects and notifications and
updates the tests to include the new fields.

Change-Id: I3b9467e7e37188e647379cd9c4cbbda8ed75383f
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-19 13:01:00 +02:00
Alfredo Moralejo
84742be8c2 Add status_message column to Actions, Audits and ActionPlans tables
This patch implements the changes in the database required for the
skipped action blueprint.

It just adds a new nullable column to the required tables and adds tests
for it.

Note that I am also introducing a fix in the tests for an earlier table, which
will be affected by the changes in the objects.

Implements: blueprint add-skip-actions

Change-Id: I027bc3861b589bd281a7216583a8c5c351a53c57
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-19 11:05:39 +02:00
Alfredo Moralejo
1fb89aeac3 Add parameters to force failures in nop action
In order to test the different code paths for action execution
it is very useful to be able to make the actions fail in the different
execution stages.

This patch adds three new options `fail_pre_condition`, `fail_execute`
and `fail_post_condition`. Setting any of them to True makes the action
fail in the specified step.
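
As a rough sketch of the idea, a test action can take one flag per stage and fail when that flag is set. The class below is hypothetical and does not reproduce Watcher's nop action; how the real action signals failure (exception or return value) is an assumption here.

```python
class NopAction:
    """Toy no-op action whose stages can be forced to fail."""

    def __init__(self, fail_pre_condition=False, fail_execute=False,
                 fail_post_condition=False):
        self.fail_pre_condition = fail_pre_condition
        self.fail_execute = fail_execute
        self.fail_post_condition = fail_post_condition

    def pre_condition(self):
        if self.fail_pre_condition:
            raise RuntimeError("Failed in pre_condition")

    def execute(self):
        if self.fail_execute:
            raise RuntimeError("Failed in execute")
        return True

    def post_condition(self):
        if self.fail_post_condition:
            raise RuntimeError("Failed in post_condition")
```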

Change-Id: Ied8c0bb767d9bb6bdfb9209365857a3b4d606b40
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-19 11:05:11 +02:00
Alfredo Moralejo
1a9f17748e Add patch call validation based on allowed_attrs
Currently, patch call field validation is done based on exclusion:
all fields can be patched unless they are included in the
`internal_attrs` list.

This patch adds a new validation rule based on field inclusion
in an `allowed_attrs` list. When that list is non-empty, only the fields
included in it can be patched. In order to keep the existing behavior
for the existing patch calls, I am defining the list as empty, so that
the rest of the validation rules are applied and the
current behavior is not affected.
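
The two rules can be sketched together as below. This is a minimal illustration, not the API code: `validate_patch_fields` is a hypothetical name, and applying the exclusion rule even when `allowed_attrs` is non-empty is an assumption.

```python
def validate_patch_fields(fields, internal_attrs=(), allowed_attrs=()):
    """Raise ValueError if any field in `fields` may not be patched."""
    for field in fields:
        # Inclusion rule: only enforced when allowed_attrs is non-empty.
        if allowed_attrs and field not in allowed_attrs:
            raise ValueError(f"'{field}' is not in the allowed attributes")
        # Exclusion rule: internal attributes can never be patched.
        if field in internal_attrs:
            raise ValueError(f"'{field}' is an internal attribute")
```

With an empty `allowed_attrs`, only the exclusion rule applies, so existing patch calls behave as before.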

Change-Id: I22010649332c8fb872446a9d0483a0303a4eba3b
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-19 11:01:20 +02:00
Zuul
90f0c2264c Merge "use cinder migrate for swap volume" 2025-08-18 20:32:42 +00:00
Sean Mooney
3742e0a79c use cinder migrate for swap volume
This change removes Watcher's in-tree functionality
for swapping instance volumes and defines swap as an alias
of cinder volume migrate.

The Watcher-native implementation was missing error handling,
which could lead to irretrievable data loss.

The removed code also forged project user credentials to
perform admin requests as if they were made by a member of a project.
This was unsafe and posed a security risk due to how it was
implemented. This code has been removed without replacement.

While some effort has been made to allow existing
audits that were defined to work, any reduction of functionality
as a result of this security hardening is intentional.

Closes-Bug: #2112187
Change-Id: Ic3b6bfd164e272d70fe86d7b182478dd962f8ac0
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-08-18 16:35:38 +00:00
Jaromir Wysoglad
8309d9848a Add Aetos datasource
Implement the spec for multi-tenancy support for metrics. This adds
a new 'Aetos' datasource very similar to the current Prometheus
datasource. Because of that, the original PrometheusHelper class
was split into two classes and the base class is used for
PrometheusHelper and for AetosHelper. Except for the split, there
is one more change to the original PrometheusHelper class code, which
is the addition and use of the _get_fqdn_label() and
_get_instance_uuid_label() methods.

As part of the change, I refactored the current prometheus datasource
unit tests. Most of them are now used to test the PrometheusBase class
with minimal changes. Changes I've made to the original tests:

- the ones that can be used to test the base class are moved into the
  TestPrometheusBase class
- the _setup_prometheus_client, _get_instance_uuid_label and
  _get_fqdn_label functions are mocked in the base class tests.
  Their concrete implementations are tested in each datasource tests
  separately.
- a self._create_helper() is used to instantiate the helper class with
  correct mocking.
- all config value modifications in the original tests were moved out;
  instead of modifying the config values, the _get_* methods are mocked
  to return the wanted values
- to keep similar test coverage, config retrieval is tested for each
  concrete class by testing the _get_* methods.

New watcher-aetos-integration and watcher-aetos-integration-realdata
zuul jobs are added to test the new datasource. These use the same set
of tempest tests as the current watcher-prometheus-integration jobs.
The only difference is the environment setup and the Watcher config,
so that the job deploys Aetos and Watcher uses it instead of accessing
Prometheus directly.

At first this was generated by asking cursor to implement the linked spec
with some additional prompts for some smaller changes. Afterwards I manually
went through the code doing some cleanups, ensuring it complies with
PEP8 and hacking and so on. Later on I manually adjusted the code to use
the latest observabilityclient changes.
The zuul job was also mostly generated by cursor.

Implements: https://blueprints.launchpad.net/watcher/+spec/prometheus-multitenancy-support

Generated-By: Cursor with claude-4-sonnet model
Change-Id: I72c2171f72819bbde6c9cbbf565ee895e5d2bd53
Signed-off-by: Jaromir Wysoglad <jwysogla@redhat.com>
2025-08-14 02:27:24 -04:00
Zuul
355671e979 Merge "Add a new tox environment to run unit tests in threading mode" 2025-08-13 21:37:19 +00:00
Douglas Viroel
9becb68495 Add new tests to validate GET /infra-optim/v1/data_model
The data_model list API response comes from the model to_list()
method, which generates both server_* and node_* attributes from
Instance and Node classes fields[1]. Any change on these classes
can break the data_model list API and require a new microversion.
These tests validate the current expected fields.

[1] 5ba086095c/watcher/decision_engine/model/model_root.py (L250-L270)

Change-Id: I77fac162101013aa923272aa99c7c6695cc5fdca
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-12 09:47:01 -03:00
Douglas Viroel
37faf614e2 Fix api-ref doc for GET /infra-optim/v1/data_model
Some response parameters from GET /infra-optim/v1/data_model
endpoint are missing from api-ref documentation. This patch
updates the doc to include them.
For more details, see LP #2117726

Closes-Bug: #2117726

Change-Id: Iaa775f56bb8167d9c6b458cd07f1ec3cefaf70fe
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-12 09:47:01 -03:00
Zuul
4080d5767d Merge "Disable real metrics on devstack injected data jobs" 2025-08-11 17:10:32 +00:00
Zuul
9925fd2cc9 Merge "Replace dateutils usage with datetime and oslo.utils" 2025-08-07 20:46:25 +00:00
Zuul
27baff5184 Merge "Extend decision engine to support threading mode" 2025-08-06 15:38:31 +00:00
Douglas Viroel
8ca794cdbb Add a new tox environment to run unit tests in threading mode
It is done by disabling the eventlet patching and configuring
oslo.service backend to threading. Once oslo.service backend is
configured, it can't be reverted to eventlet. This needs to be
done before including other modules, which may include oslo.service
library.
Adds a job that runs a subset of tests with eventlet patching disabled.

Change-Id: I9f8c2c5bbcf3192313cc3b309e8f2719a3bea18f
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-05 16:50:29 -03:00
Douglas Viroel
f879b10b05 Extend decision engine to support threading mode
With the ongoing eventlet removal, Watcher will need
to be adapted to support both modes, eventlet and threading, for
a couple of releases before removing all eventlet code.
This patch adds methods and classes that allow decision engine
modules to create futurist thread pools instead of green thread pools,
based on an environment variable that can be enabled per service.
It moves the continuous audit handler instance to the decision engine service,
so it can be started together with the main decision engine service.
It also adds an environment variable that allows the user to disable
eventlet monkey patching and to use the oslo.service threading backend.

Change-Id: I8a8be0a7cebdc44005fd77ec960543828c7da318
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-05 16:45:48 -03:00
Chandan Kumar (raukadah)
95d975f339 Replace dateutils usage with datetime and oslo.utils
This cr fixes:
* Replaced ``dateutil.tz.tzlocal()`` and ``dateutil.tz.tzutc()`` with
  ``datetime.timezone`` built-in classes in audit controllers and
  continuous audit scheduling.

* Replaced ``dateutil.parser.parse()`` with
  ``oslo_utils.timeutils.parse_isotime()`` in the zone migration
  strategy for parsing datetime strings.
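
A minimal, standard-library-only sketch of the same kind of replacement is shown below. It is not the patched Watcher code: `datetime.fromisoformat` is used here as a stand-in for `oslo_utils.timeutils.parse_isotime` so the example has no third-party dependency, and the sample timestamp is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Timezone-aware "now" in UTC, without dateutil.tz.tzutc().
utc_now = datetime.now(timezone.utc)

# Local timezone of the running process, without dateutil.tz.tzlocal().
local_tz = datetime.now().astimezone().tzinfo

# Parse an ISO 8601 string that carries a UTC offset.
parsed = datetime.fromisoformat("2025-08-05T23:09:50+05:30")
assert parsed.utcoffset() == timedelta(hours=5, minutes=30)
```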

Closes-Bug: #2118404

Change-Id: I6d8a345fa4339a688769b147413dcdf3016bf4a0
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-08-05 23:09:50 +05:30
morenod
0435200fb1 Disable real metrics on devstack injected data jobs
We need to disable real data metrics coming from hosts and
instances on injected data jobs, as they create wrong results
when they are mixed with the injected data.

We already did this on watcher-operator by disabling the ceilometer agent and
node_exporter in [1], so now we have to do it on devstack installations,
disabling meminfo on node_exporter for host metrics (cpu is already
disabled) and sg-core for instance metrics.

[1] https://github.com/openstack-k8s-operators/watcher-operator/pull/196

Change-Id: I4130ca6dd7cb52d96842e04e7720431ebc76efff
Signed-off-by: morenod <dsanzmor@redhat.com>
2025-08-04 12:41:54 +02:00
Douglas Viroel
adfe3858aa Configure watcher tempest's microversion in devstack
Adds a tempest configuration for the min and max microversions supported
by watcher. This helps us define the correct range of microversions
to be tested on each stable branch.
New microversion proposals should also increase the default
max_microversion, in order to work with watcher-tempest-plugin
microversion testing.

Change-Id: I0b695ba4530eb89ed17b3935b87e938cadec84cc
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-01 17:28:40 -03:00
Zuul
a1e7156c7e Merge "finalize python 3.9 support removal" 2025-07-30 15:54:12 +00:00
Zuul
71470dac73 Merge "Add comprehensive release liaison guide for DPL model" 2025-07-30 15:24:49 +00:00
Zuul
5ba086095c Merge "Fix release notes typo and extra information" 2025-07-21 18:41:57 +00:00
Sean Mooney
3e8392b8f1 finalize python 3.9 support removal
The last release of openstack to support python 3.9
was 2025.1 (epoxy). With this change watcher now requires
3.10; testing of 3.9 was removed in previous commits.

Change-Id: Ida53740293e93b0c20dec2e175b390fa18bed852
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-07-21 18:25:04 +01:00
Sean Mooney
20cd4a0394 Add comprehensive release liaison guide for DPL model
Transform Nova's PTL guide into Watcher-specific release liaison
documentation following the DPL governance model. This guide provides
chronological guidance for release liaisons managing Watcher's
cycle-with-intermediary release process.

Key features:
* DPL liaison coordination with proper precedence hierarchies
* Watcher-specific project context and repository references
* Enhanced FFE process with release liaison decision authority
* Proper RST formatting with code blocks and cross-references
* Comprehensive glossary of OpenStack release terminology
* Usage guidance for both new and experienced release liaisons

Adapts Nova's proven chronological structure while reflecting
Watcher's distributed leadership model and technical requirements.

Assisted-By: claude-code
Change-Id: I133bb06e47c14deaca162a2bf024210f68d78ab2
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-07-21 16:34:47 +01:00