Commit Graph

2714 Commits

Author SHA1 Message Date
Chandan Kumar (raukadah)
3c8bc6be62 Add user guide for continuous audits
Introduce a new user guide describing how to run continuous audits using
the dummy strategy. The guide covers:
- Overview and state machine
- Creating audits with interval and cron expressions
- Time window constraints (start/end time)
- Monitoring executions and action plan lifecycle
- Managing audits (stop/modify)
- Configuration reference and links to related specs

Closes-Bug: #2120437

Assisted-By: GPT-5 (Cursor)
Assisted-By: claude-sonnet-4 (Claude Code)
Change-Id: I842139271752cedb138e422027020488f22fe248
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-09-23 18:42:30 +05:30
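The continuous audit guide in 3c8bc6be62 covers interval-based audits constrained by an optional start/end time window. As a minimal sketch, assuming an interval audit simply fires every `interval` from its first run, that runs outside the window are skipped, and that the end time is inclusive (all assumptions, not Watcher's documented semantics), the scheduling arithmetic looks like this. `interval_schedule` is a hypothetical helper, not Watcher's scheduler, and cron expressions are not covered.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def interval_schedule(first_run: datetime, interval: timedelta,
                      start_time: Optional[datetime] = None,
                      end_time: Optional[datetime] = None,
                      limit: int = 10) -> List[datetime]:
    """Hypothetical helper: up to `limit` runs inside [start_time, end_time]."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    run = first_run
    if start_time is not None and run < start_time:
        # Ceiling division on timedeltas: first tick at or after start_time.
        steps = -((run - start_time) // interval)
        run += steps * interval
    runs = []
    while len(runs) < limit and (end_time is None or run <= end_time):
        runs.append(run)
        run += interval
    return runs


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
    for r in interval_schedule(t0, timedelta(hours=1),
                               start_time=t0 + timedelta(hours=2, minutes=30),
                               end_time=t0 + timedelta(hours=5)):
        print(r.isoformat())
```

Keeping the ticks aligned to the first run (rather than restarting at the window start) is a design choice of this sketch only.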
Zuul
635be7a009 Merge "Enable Continuous Audit tests in CI" 2025-09-22 13:30:10 +00:00
Zuul
fe50d270c3 Merge "Resolve deprecation warning from pecan" 2025-09-19 10:50:57 +00:00
Zuul
27961d8574 Merge "Add missing 1.6 API doc in rest version history" 2025-09-17 20:37:00 +00:00
Douglas Viroel
408abaee49 Enable Continuous Audit tests in CI
Continuous audit scenario tests are being added,
but they will not run by default, since not all stable
branches have the zone_migration fixes needed to
make the tests stable.

Depends-On: https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954264

Change-Id: I5c49b251a49ee439bad024a1cf2569fcbeb2eaf1
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-09-17 15:22:16 -03:00
Douglas Viroel
ed0f7457fb Add missing 1.6 API doc in rest version history
The version history was not updated in the patch that
bumped the API to 1.6[1]. This patch adds the missing doc
and also sets 1.6 as the maximum API version for the latest release.

[1] https://review.opendev.org/c/openstack/watcher/+/955827

Closes-Bug: #2124938

Change-Id: I62473e84415896387fda8ca6d0982f78d2a1a9f1
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-09-17 11:53:31 -03:00
Douglas Viroel
680518ad6d Fix zone migration instance not found issue
When retrieving the list of instances and volumes to propose a
solution, the zone migration strategy can raise an exception for
an instance or volume that is not found, which moves the audit to a
failure state. This fix keeps the logic of listing all elements
directly from the client (nova) but now checks whether the instance
is already in the model. The storage model check was already fixed
in another patch[1].

[1] cb6fb16097

Closes-Bug: #2098984
Assisted-By: Cursor (claude-3.5-sonnet)

Change-Id: I4c8993f051b797104172047eaae1fe1523eaf7eb
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-09-16 16:12:35 -03:00
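The fix in 680518ad6d keeps listing servers from nova but checks that each one exists in the compute model instead of letting a lookup failure abort the audit. A minimal sketch of that pattern follows; `InstanceNotFound`, `FakeComputeModel`, and `instances_in_model` are hypothetical stand-ins, not Watcher's actual classes or methods.

```python
from typing import Dict, Iterable, List


class InstanceNotFound(Exception):
    """Stand-in for the model's "instance not found" error (hypothetical)."""


class FakeComputeModel:
    """Tiny hypothetical model that only knows some instance UUIDs."""

    def __init__(self, uuids: Iterable[str]) -> None:
        self._instances: Dict[str, dict] = {u: {"uuid": u} for u in uuids}

    def get_instance_by_uuid(self, uuid: str) -> dict:
        try:
            return self._instances[uuid]
        except KeyError:
            raise InstanceNotFound(uuid) from None


def instances_in_model(nova_servers: Iterable[dict],
                       model: FakeComputeModel) -> List[dict]:
    """Keep only servers listed by nova that are also present in the model."""
    found = []
    for server in nova_servers:
        try:
            found.append(model.get_instance_by_uuid(server["id"]))
        except InstanceNotFound:
            # The model has not collected this server yet: skip it instead
            # of letting the exception fail the whole audit.
            continue
    return found
```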
Douglas Viroel
cada9acced Add unit tests for instance and volume not found in model
The Zone Migration strategy was implemented to list all
instances and volumes from clients (nova and cinder) and
check if they exist in the models. However, the code does not
properly handle model exceptions, moving the audit to a failure
state when the model doesn't have the requested element.
This patch adds unit tests to validate this scenario, which
should be fixed in a follow-up change.
The additional check for volumes in the model was recently
added in [1].

[1] cb6fb16097

Related-Bug: #2098984

Assisted-By: Cursor (claude-3.5-sonnet)

Change-Id: Icf1e5d4c83862c848d11dae994842ad0ee62ba12
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-09-16 15:56:13 -03:00
jgilaber
8211475478 Improve unit tests for zone migration strategy
The unit tests were mocking part of the Zone Migration strategy class,
which could hide possible bugs. This patch removes this mocking, leaving
mocked only the other classes used by the zone migration strategy.

Additionally, it includes improvements suggested as follow-ups in the
review of previous patches, such as more explicit comments and additional
asserts on mocked functions.

Assisted-By: Cursor (Claude-4-sonnet)
Change-Id: Ie1894311b0e384ab52b1b3dfe0eb50618eef6c9f
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-16 12:57:54 +02:00
jgilaber
c2ad4b28da Support zone migration audit without compute_nodes
When only running volume migrations, a zone migration
strategy audit without setting compute_nodes should work.

Before this change, an audit with storage_pools defined,
no compute_nodes parameter, and with_attached_volume set to True
would trigger the migration of the instances attached to the volumes
being migrated.

This patch decouples instance and volume migrations unless the user
explicitly asks for both. When migrating attached volumes, the zone
migration strategy checks which instances should be migrated
according to the audit parameters, and if the instance the volume is
attached to can be migrated, it is migrated right after the volume.

On the other hand, when the attached instances should not be migrated
according to user input, only the volumes will be migrated.

In an audit that migrates instances but not volumes, the
with_attached_volume parameter will continue doing nothing.

Closes-Bug: 2111429
Change-Id: If641af77ba368946398f9860c537a639d1053f69
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-16 12:20:18 +02:00
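The decoupling described in c2ad4b28da can be sketched as a per-volume planner: the volume is always migrated, and an attached instance is appended right after it only when `with_attached_volume` is set and the audit parameters select that instance. This is a hypothetical illustration; `plan_volume` and the `instance_selected` predicate are assumptions, and only the action type names mirror Watcher's.

```python
from typing import Callable, List, Tuple

Action = Tuple[str, str]  # (action_type, resource_id)


def plan_volume(volume: dict, instance_selected: Callable[[str], bool],
                with_attached_volume: bool) -> List[Action]:
    """Hypothetical planner for one volume of a zone migration audit."""
    actions: List[Action] = [("volume_migrate", volume["id"])]
    if not with_attached_volume:
        return actions
    for attachment in volume.get("attachments", []):
        server_id = attachment["server_id"]
        # Only migrate the attached instance when the audit parameters
        # (e.g. compute_nodes) select it, and do so right after its volume.
        if instance_selected(server_id):
            actions.append(("migrate", server_id))
    return actions
```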
Alfredo Moralejo
296856101f Allow volume and vm migrations in zone_migration
Currently, when an audit with strategy zone_migration has added at least
one volume_migration action, it will not process the instance
migrations according to the definition of the `compute_nodes` parameter.
This behavior is unexpected according to the documentation of the
strategy.

This patch fixes that behavior and makes sure that no duplicate
actions are added to the solution, to handle the case where instance
migration actions are created when analyzing the volumes if the
`with_attached_volume` parameter is enabled. The patch also removes
the method `instances_no_attached`, which is no longer used.

Finally, it adds some unit tests for the new method and fixes the
existing ones to cover the mixed instance and volume migration situation.

Closes-Bug: #2109722
Change-Id: Ief7386ab448c2711d0d8a94a77fa9ba189c8b7d2
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-16 12:20:18 +02:00
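Commit 296856101f avoids duplicate actions when an instance migration is produced both from `compute_nodes` and while analysing attached volumes. One simple way to express that, assuming actions are identified by `(action_type, resource_id)` (an assumption of this sketch, not necessarily Watcher's key), is:

```python
from typing import Dict, List


def add_unique_actions(solution: List[Dict[str, str]],
                       new_actions: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Append actions, skipping (action_type, resource_id) duplicates."""
    seen = {(a["action_type"], a["resource_id"]) for a in solution}
    for action in new_actions:
        key = (action["action_type"], action["resource_id"])
        if key in seen:
            # e.g. an instance already queued while analysing its attached volume
            continue
        solution.append(action)
        seen.add(key)
    return solution
```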
Alfredo Moralejo
2f2134fc7a Add test for zone_migration with instances and volumes
Currently, the unit tests for the zone_migration strategy do not include any
test that mixes instances and volumes, a case which is currently not working as
expected.

This patch adds two new tests which include both compute_nodes and
storage_pools in the audit configuration. One of them also sets the
with_attached_volume option.

These tests will be fixed to validate the expected behavior of the
strategy in the fixing patch.

Related-Bug: #2109722
Change-Id: I496ce3e1f21b7a4165aa47d5862cf0497be79487
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-16 12:20:18 +02:00
jgilaber
cb6fb16097 Use src_type to filter volumes in zone migration
Despite the src_type parameter of the storage_pool dictionary being
mandatory, its value is not used to filter the volumes to migrate;
only 'src_pool' is used.

This change makes 'src_type' optional. Since it was ignored until this
point, making it optional keeps the same behaviour by default. If
'src_type' is in the audit parameters, the strategy uses both 'src_pool' and
'src_type' to filter the volumes to migrate.

Closes-Bug: 2111507
Change-Id: Id83a96de85ada1ae6c0e25f8b7fcf54034604911
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-16 12:20:18 +02:00
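The filtering rule from cb6fb16097 (always match `src_pool`; also match `src_type` only when it is given) can be sketched as below. The volume dicts use Cinder's `os-vol-host-attr:host` and `volume_type` fields; `filter_volumes` itself is a hypothetical helper, not Watcher's code.

```python
from typing import Dict, List, Optional


def filter_volumes(volumes: List[dict],
                   storage_pools: List[Dict[str, str]]) -> List[dict]:
    """Select volumes matching a storage_pools entry (hypothetical sketch)."""
    selected = []
    for volume in volumes:
        host = volume.get("os-vol-host-attr:host")
        for pool in storage_pools:
            if host != pool["src_pool"]:
                continue
            src_type: Optional[str] = pool.get("src_type")
            # src_type is optional: when omitted, only src_pool is used,
            # which matches the previous default behaviour.
            if src_type is not None and volume.get("volume_type") != src_type:
                continue
            selected.append(volume)
            break
    return selected
```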
Zuul
cee72d2bda Merge "Fix missing CORS middleware" 2025-09-15 17:00:56 +00:00
Zuul
cd1154d09c Merge "Add capability to parse forward headers" 2025-09-15 16:23:46 +00:00
Zuul
90f6552c74 Merge "Fix missing X-OpenStack-Request-ID header" 2025-09-15 15:56:03 +00:00
OpenStack Release Bot
0368cea4c1 Update master for stable/2025.2
Add file to the reno documentation build to show release notes for
stable/2025.2.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.2.

Sem-Ver: feature
Change-Id: I21fd5f9a613e5e2ee81ae4fe34165f3f4a6ae479
Signed-off-by: OpenStack Release Bot <infra-root@openstack.org>
Generated-By: openstack/project-config:roles/copy-release-tools-scripts/files/release-tools/add_release_note_page.sh
2025-09-15 10:13:29 +00:00
Takashi Kajinami
e1c8961a7c Fix missing CORS middleware
CORS middleware needs to be added to the API pipeline to support
Cross-Origin Resource Sharing (CORS). CORS is supported globally by
multiple OpenStack services but not by watcher, due to the lack of
CORS middleware and of a mechanism to inject it into the API pipeline.

Closes-Bug: #2122347
Change-Id: I6b47abe4f08dc257e9156b254fa60005b82898d7
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-09-13 11:49:11 +09:00
Zuul
61cca16dcd Merge "Handle missing dst_pool parameter in zone_migration" 15.0.0 15.0.0.0rc1 2025-09-11 20:57:34 +00:00
Zuul
f3d0ec5869 Merge "Enable storage model collector by default" 2025-09-11 19:36:22 +00:00
Takashi Kajinami
17a4c96c66 Add capability to parse forward headers
When a standalone watcher-api runs behind forwarders (such as load
balancers), it should parse specific request headers to determine
the endpoint URL that clients actually use.

Add the http_proxy_to_wsgi middleware to the API pipeline to handle this.

Closes-Bug: #2122353
Change-Id: I27ade17f7ce1649295f92f3ea1af620df63ba1bc
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-09-11 15:50:04 +00:00
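To show what parsing forward headers means in 17a4c96c66, here is a simplified WSGI middleware that rewrites the scheme and host from `X-Forwarded-Proto` and `X-Forwarded-Host`. It is only an illustration, not oslo.middleware's `HTTPProxyToWSGI` (which also handles the standard `Forwarded` header); taking the first comma-separated value is an assumption about chained proxies.

```python
from wsgiref.util import application_uri


class ForwardedHeadersMiddleware:
    """Simplified illustration, not oslo.middleware's HTTPProxyToWSGI."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        proto = environ.get("HTTP_X_FORWARDED_PROTO")
        if proto:
            # With chained proxies the first value is the client-facing one.
            environ["wsgi.url_scheme"] = proto.split(",")[0].strip()
        host = environ.get("HTTP_X_FORWARDED_HOST")
        if host:
            environ["HTTP_HOST"] = host.split(",")[0].strip()
        return self.app(environ, start_response)


def echo_base_url(environ, start_response):
    """Toy WSGI app returning the base URL it believes clients use."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [application_uri(environ).encode()]
```

Without the middleware, `echo_base_url` would report the backend's own address (for example `http://10.0.0.5:9322/`) instead of the public endpoint.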
Takashi Kajinami
a562880b1c Fix missing X-OpenStack-Request-ID header
The request ID is essential in operating OpenStack services, especially
when troubleshooting API problems. It allows us to find
the log lines related to a specific request.

However, watcher-api has not returned it properly, so operators had no
way to determine the exact ID they should search for.

Add the RequestID middleware to return the ID in the X-OpenStack-Request-Id
header, which is used globally.

Closes-Bug: #2122350
Change-Id: Ie4a8307e8e7e981cedbeaf5fe731dbd47a50bade
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-09-11 15:46:21 +00:00
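The request ID flow from a562880b1c can be illustrated with a tiny WSGI middleware that generates an ID, exposes it to the application, and echoes it in the `X-OpenStack-Request-ID` response header. This class is a sketch only; the commit adds an existing RequestID middleware (presumably oslo.middleware's) rather than code like this.

```python
import uuid


class RequestIdMiddleware:
    """Minimal illustration, not the middleware the commit actually adds."""

    HEADER = "X-OpenStack-Request-ID"

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        request_id = "req-" + str(uuid.uuid4())
        # Expose the ID to the application (and its log context).
        environ["openstack.request_id"] = request_id

        def _start_response(status, headers, exc_info=None):
            # Echo the same ID back to the client in the response headers.
            return start_response(
                status, list(headers) + [(self.HEADER, request_id)], exc_info)

        return self.app(environ, _start_response)
```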
jgilaber
fe56660c44 Handle missing dst_pool parameter in zone_migration
Unlike Nova, Cinder does not support calling the 'os-migrate_volume'[1]
action without a host or a cluster. For volume migrations of type
'migrate' in watcher, the dst_pool is required, but it is not needed for
other migrations that move the volumes to different types. This
change checks whether dst_pool is defined and prevents the migrations
that need it when it is missing.

Adds testing for creating audits with the Zone Migration strategy,
validating the schema changes.

[1] https://docs.openstack.org/api-ref/block-storage/v3/index.html#migrate-a-volume

Closes-Bug: 2108988

Change-Id: I305c58e47093c4a884e86f1d91fdc15ef2a1cfba
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-10 15:58:24 +02:00
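The `dst_pool` requirement in fe56660c44 follows from the Cinder volume action body: `os-migrate_volume` needs a destination `host`. A hypothetical helper that builds that body and refuses a 'migrate' migration without `dst_pool` (here by raising, which is this sketch's choice) could look like:

```python
from typing import Dict, Optional


def migrate_volume_body(pool_cfg: Dict[str, str],
                        migration_type: str) -> Optional[dict]:
    """Cinder action body for a 'migrate' migration, or None if not applicable."""
    if migration_type != "migrate":
        # Retype-style migrations target a volume type, not a host.
        return None
    dst_pool = pool_cfg.get("dst_pool")
    if not dst_pool:
        # Unlike Nova, Cinder's os-migrate_volume needs a host or cluster.
        raise ValueError("dst_pool is required for 'migrate' volume migrations")
    return {"os-migrate_volume": {"host": dst_pool}}
```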
jgilaber
6cb4e2fa83 Enable storage model collector by default
By default, Watcher enables only the compute model collector [1]. This
change enables the storage collector as well; otherwise, during volume
migrations the model quickly becomes obsolete if new volumes are
created while an audit is running. The storage model is only
enabled if a cinder service is registered in keystone.

[1] https://docs.openstack.org/watcher/latest/configuration/watcher.html#collector.collector_plugins

Assisted-By: Cursor
Closes-Bug: 2111785
Change-Id: I864d3fc12d6364f1932cf5d2348a6b68169641e9
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-10 15:58:24 +02:00
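Commit 6cb4e2fa83 enables the storage collector by default but only when cinder is registered in keystone. A minimal sketch of that gating, assuming the catalog is available as a list of service types and that cinder appears under one of the types below (both assumptions), is:

```python
from typing import Iterable, List

# Assumed service types under which cinder may be registered in keystone.
CINDER_SERVICE_TYPES = frozenset({"block-storage", "volumev3", "volume"})


def active_collectors(collector_plugins: Iterable[str],
                      catalog_types: Iterable[str]) -> List[str]:
    """Drop the 'storage' collector when no cinder service is registered."""
    has_cinder = not CINDER_SERVICE_TYPES.isdisjoint(catalog_types)
    return [name for name in collector_plugins if name != "storage" or has_cinder]
```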
Sean Mooney
9b1adaa7c7 Add 2025.2 release notes prelude
The prelude provides a high-level overview of the
security improvements, operational enhancements,
and new monitoring capabilities for operators.

Assisted-By: claude-code
Change-Id: Ia2c1409d26aca0eddfb1685e9009305215c2405a
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-09-09 17:43:53 +01:00
Douglas Viroel
f21df7ce1e Update prometheus-threading parent job
Updates the watcher-prometheus-integration-threading job
parent, so that every new config option added to the
watcher-prometheus-integration job is also added/tested
in the threading job.

Change-Id: I38c95f638f748fd5c051c312817e9123d6037ab5
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-09-03 13:44:06 -03:00
Zuul
b1aad46209 Merge "Check result of retype action based on type and status" 2025-09-02 12:44:05 +00:00
Alfredo Moralejo
90009aac84 Check result of retype action based on type and status
Currently, when there is a volume_migrate action and migration_type is
`retype`, watcher assumes that the retype always triggers a migration
and checks the result of the retype based on the fields related to
the migration action (actually, it uses the same function to check the
result when `migration_type` is `retype` or `migrate`). This creates
problems in different scenarios:

- Actions stay in ONGOING status forever for volumes which have never
  been migrated, as the migration fields of the volume are empty.
- Volumes which were migrated at some earlier point still have the old
  values, so the status of the retype actions may be reported wrongly.

This patch implements an entirely new function to check the result
of a retype action based on the final type and the status field of the
volume. This should be valid for any kind of retype action, with or
without migration. The criteria for a successful retype are that the type
of the volume is the destination one in the action and the status is
available or in-use.

Closes-Bug: #2112100

Change-Id: I76e91ed99e7a814a43a6dd906b6bcc150d471624
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-01 16:59:38 +02:00
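The success criteria from 90009aac84 (destination type reached and status `available` or `in-use`) translate directly into a small predicate. `retype_succeeded` is a hypothetical name; the sketch only covers the success check, not how Watcher distinguishes ongoing from failed retypes.

```python
from typing import Mapping

# Statuses the commit lists as indicating a settled, usable volume.
SUCCESS_STATUSES = frozenset({"available", "in-use"})


def retype_succeeded(volume: Mapping[str, str], dst_type: str) -> bool:
    """True when the volume has the destination type and a settled status.

    Unlike the migration-field check, this works whether or not the
    retype actually triggered a migration.
    """
    return (volume.get("volume_type") == dst_type
            and volume.get("status") in SUCCESS_STATUSES)
```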
Zuul
e5b18afa01 Merge "Fix doc section to enable cinder notifications" 2025-09-01 14:15:29 +00:00
Zuul
fedc74a5b0 Merge "Update aetos fake data job to disable real metrics" 2025-09-01 12:06:53 +00:00
jgilaber
a4b785e4f1 Fix doc section to enable cinder notifications
The section in the Watcher docs that describes how to enable cinder
notifications incorrectly tells the user to change the cinder config to
send notifications to the watcher.watcher_notifications exchange and
topic. Instead, it should instruct the user to change the Watcher
configuration of the notification_topics [1] to listen to the
'openstack.notifications' topic, which is the one used by cinder by
default[2].

This patch also adds 'openstack.notifications' to the default value
for the 'notification_topics' parameter.

[1] https://docs.openstack.org/watcher/latest/configuration/watcher.html#watcher_decision_engine.notification_topics
[2] https://docs.openstack.org/cinder/latest/configuration/block-storage/samples/cinder.conf.html

Partial-Bug: 2121384
Change-Id: I4dc1a72af79a23c9ca07d2da5ff41bd7741e37d8
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-01 11:23:00 +02:00
Zuul
cdde0fb41e Merge "Allow status_message updates for actions in SKIPPED state" 2025-08-28 20:04:34 +00:00
Sean Mooney
ef0f35192d Make Monasca client optional and lazy-load
Monasca is deprecated for removal. This change makes the Monasca client
an optional dependency and ensures it is only imported and instantiated
when the Monasca datasource is explicitly selected. This reduces the
default footprint while preserving functionality for deployments that
still rely on Monasca.

What changed
============
- requirements.txt: remove python-monascaclient from hard deps
- setup.cfg: add [options.extras_require] monasca extra
- watcher/common/clients.py: lazy import with clear UnsupportedError
- watcher/decision_engine/datasources/monasca.py: lazy client property
  and deferred import of monascaclient.exc; reset on Unauthorized
- watcher/decision_engine/datasources/manager.py: unconditionally
  import Monasca helper and include in metric_map; helper is lazy
- tests: conditionally include Monasca based on availability; adjust
  expectations instead of skipping by default; avoid over-mocking
- tox.ini: enable optional extras via WATCHER_EXTRAS env var
- docs: datasources index notes Monasca is deprecated and optional
- releasenotes: upgrade note with install example and behavior

Why
===
- Allow deployments not using Monasca to run without the client
- Keep Monasca functional when explicitly installed via extras
- Provide clear operator guidance and smooth upgrades

Compatibility
=============
- No change for deployments that do not use Monasca
- Deployments using Monasca must install the optional extra:
  pip install watcher[monasca]

Testing
=======
- Default: tox -e py3
- With Monasca: WATCHER_EXTRAS=monasca tox -e py3

Assisted-By: GPT-5 (Cursor)
Closes-Bug: #2120192
Change-Id: I7c02b74e83d656083ce612727e6da58761200ae4
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-08-28 16:53:48 +01:00
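The lazy-load pattern in ef0f35192d (import the optional client only when first used, fail with a clear error pointing at the extra, and rebuild after `Unauthorized`) can be sketched generically. `LazyClient` and this `UnsupportedError` are hypothetical stand-ins, not the code in `watcher/common/clients.py`.

```python
import importlib
from typing import Any, Callable, Optional


class UnsupportedError(Exception):
    """Stand-in for the clear error raised when the optional extra is missing."""


class LazyClient:
    """Create a client from an optional module only when first accessed."""

    def __init__(self, module_name: str, build: Callable[[Any], Any],
                 extra: str) -> None:
        self._module_name = module_name
        self._build = build
        self._extra = extra
        self._client: Optional[Any] = None

    @property
    def client(self) -> Any:
        if self._client is None:
            try:
                module = importlib.import_module(self._module_name)
            except ImportError as exc:
                raise UnsupportedError(
                    f"{self._module_name} is not installed; install the optional "
                    f"extra, e.g. pip install watcher[{self._extra}]") from exc
            self._client = self._build(module)
        return self._client

    def reset(self) -> None:
        """Forget the cached client (e.g. after Unauthorized) so it is rebuilt."""
        self._client = None
```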
Sean Mooney
c9bfb763c2 Allow status_message updates for actions in SKIPPED state
Fixed action status_message update restrictions to allow updates when
the action is already in the SKIPPED state. Previously, users could only update
the status_message when initially transitioning to the SKIPPED state.

Changes include:
- Modified validation logic to allow status_message updates for SKIPPED actions
- Changed exception type from PatchError to Conflict for better semantics
- Added comprehensive test coverage for the new behavior
- Updated API documentation and samples
- Added release note documenting the fix

This enables administrators to fix typos, provide more detailed
explanations, or expand on reasons in action status messages after
the action has been skipped.

Generated-By: claude-code
Closes-Bug: #2121601
Change-Id: I64def708389a8ecd32080fba1638a4499ead349d
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-08-28 16:16:01 +01:00
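The relaxed validation in c9bfb763c2 can be sketched as: a `status_message` patch is accepted when the action is being moved to SKIPPED or is already SKIPPED, and otherwise rejected with a Conflict. The function and the `Conflict` class here are hypothetical; how Watcher handles other transitions out of SKIPPED is not covered.

```python
from typing import Optional


class Conflict(Exception):
    """Stand-in for the API's 409 Conflict error."""


def check_status_message_update(current_state: str,
                                requested_state: Optional[str] = None) -> None:
    """Allow status_message only if the action is, or is becoming, SKIPPED."""
    target_state = requested_state if requested_state is not None else current_state
    if target_state != "SKIPPED":
        raise Conflict(
            f"status_message can only be set on SKIPPED actions (state: {target_state})")
```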
morenod
eb3fdb1e97 Update aetos fake data job to disable real metrics
The watcher-aetos-integration job is failing because real metrics
coming from ceilometer are enabled.

We need to disable ceilometer-acompute and node_exporter so that only
injected data is considered when asking Prometheus to make
decisions.

Change-Id: If4f2c3f6f89527d768c48f1ca4967339837bb994
Signed-off-by: morenod <dsanzmor@redhat.com>
2025-08-28 10:51:08 +00:00
Zuul
848cde3606 Merge "Rename confusing query timeout options" 2025-08-28 09:26:40 +00:00
Zuul
63cf35349c Merge "Extend compute model attributes" 2025-08-27 16:40:53 +00:00
Takashi Kajinami
7106a12251 Rename confusing query timeout options
These options do not actually define a timeout but an interval. Rename
the options to reflect what they actually define. The existing deprecated
options in the [gnocchi_client] section are also removed, because they
have been kept for 6 years.

In addition, fix an inconsistent name (query vs call).

Change-Id: Ib29115746a25b45bdff1c3da8df9d7167c2db662
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-27 23:22:45 +09:00
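The distinction behind 7106a12251 is that the renamed options control the wait *between* attempts, not how long one query may run. A generic retry loop makes that visible; `call_with_retries`, `max_retries`, and `interval` are hypothetical names, not the renamed Watcher options.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], max_retries: int, interval: float,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call`, waiting `interval` seconds between attempts.

    `interval` says nothing about how long a single call may take, which is
    why naming it a "timeout" was misleading.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(1, max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(interval)
    raise AssertionError("unreachable")
```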
Douglas Viroel
03c09825f7 Extend compute model attributes
This patch extends compute model attributes by
adding new fields to the Instance element. Values are
populated by the nova collector, using the same
nova list call, but this requires a more recent compute
API microversion.
A new config option was added to allow users to
enable or disable the extended attributes; it is
disabled by default.
Configure prometheus-based jobs to run on a newer version
of the nova API (2.96) and enable the extended attributes
collection.

Implements: bp/extend-compute-model-attributes

Assisted-By: Cursor (claude-4-sonnet)

Change-Id: Ibf31105d780dce510a59fc74241fa04e28529ade
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-26 11:35:18 -03:00
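Collecting the extended attributes from 03c09825f7 depends on both the new config option and a sufficiently recent compute API microversion. A minimal gate is sketched below, assuming 2.96 is the minimum (the commit only says the jobs run with 2.96). Microversions are compared as integer tuples, since plain string comparison misorders them.

```python
from typing import Tuple

# Assumed minimum microversion; the commit configures the jobs with 2.96.
EXTENDED_ATTRS_MICROVERSION = "2.96"


def parse_microversion(version: str) -> Tuple[int, int]:
    major, minor = version.split(".")
    return int(major), int(minor)


def use_extended_attributes(enabled: bool, api_version: str) -> bool:
    """Hypothetical gate: option enabled and the API new enough."""
    return enabled and (parse_microversion(api_version)
                        >= parse_microversion(EXTENDED_ATTRS_MICROVERSION))
```

For example, `"2.100" < "2.96"` is true as strings, while `(2, 100) > (2, 96)` as tuples.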
Douglas Viroel
2452c1e541 Follow up changes for skip-action blueprint
These are some of the changes requested in reviews
of the series of patches for the add-skip-action blueprint.
Some of them may require a separate patch, since they
would touch more files that are not related to
this feature.

Change-Id: I9e30ca385e7b184ab19449a60db6f6d0f3c0e1b9
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-26 10:27:57 -03:00
Zuul
d91b550fc9 Merge "Fix missing watcher_workflow_engines.taskflow section" 2025-08-26 13:16:19 +00:00
Zuul
1668b9b9f8 Merge "API changes for skipped actions: patch actions and status_message" 2025-08-26 12:54:31 +00:00
Zuul
5e05b50048 Merge "Skip actions automatically based on pre_condition results" 2025-08-26 12:33:08 +00:00
Zuul
4d8f86b432 Merge "Fix NovaHelper microversion comparison" 2025-08-25 19:18:57 +00:00
Zuul
05d8f0e3c8 Merge "Validate endpoint_type option at loading" 2025-08-25 12:06:44 +00:00
Takashi Kajinami
1a87abc666 Fix missing watcher_workflow_engines.taskflow section
... caused by AttributeError.

Closes-Bug: #2121286
Change-Id: I52bab27afdc96d8ce2d9733316737c3aa505f5fe
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-24 22:58:28 +09:00
Zuul
fa4552b93f Merge "Fix type mismatch between option and its default" 2025-08-24 13:21:43 +00:00
Takashi Kajinami
a07bfa141d Fix type mismatch between option and its default
... to avoid the following warning.

```
UserWarning: converting '1' to a string
  warnings.warn('converting \'%s\' to a string' % str_val)
```

Change-Id: I852d63523d3582f00d4d7953199181e3d2b6a885
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-24 04:22:33 +09:00
Zuul
a6668a1b39 Merge "Update Overload standard deviation doc" 2025-08-22 15:22:04 +00:00
Zuul
534c340df1 Merge "Add new tests to validate GET /infra-optim/v1/data_model" 2025-08-22 14:16:05 +00:00