456 Commits

Author SHA1 Message Date
jgilaber
03073a1b0d Remove outdated zone migration documentation note
In a recent patch [1], a bug in the zone migration strategy was fixed,
which prevented audits using this strategy from creating action plans
with both instance and volume migrations. We documented this limitation,
but forgot to remove the note when fixing this bug.

[1] https://review.opendev.org/c/openstack/watcher/+/952115

Change-Id: I2074f2b911dfcbf44716ff30d8ea35a5046b8520
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-10-01 17:02:40 +02:00
Ronelle Landy
ced0d58d23 Remove last column from strategies table
Removed the "Can Be Triggered from Horizon (UI)"
column and adjusted remaining column widths to
be equal.

Assisted-By: claude-sonnet-4 (Claude Code)
Signed-off-by: Ronelle Landy <rlandy@redhat.com>
Change-Id: I50eef1dee9071eeb532378bd5abcd1d994d299b5
2025-09-26 15:17:56 -04:00
Chandan Kumar (raukadah)
3c8bc6be62 Add user guide for continuous audits
Introduce a new user guide describing how to run continuous audits using
the dummy strategy. The guide covers:
- Overview and state machine
- Creating audits with interval and cron expressions
- Time window constraints (start/end time)
- Monitoring executions and action plan lifecycle
- Managing audits (stop/modify)
- Configuration reference and links to related specs

Closes-Bug: #2120437

Assisted-By: GPT-5 (Cursor)
Assisted-By: claude-sonnet-4 (Claude Code)
Change-Id: I842139271752cedb138e422027020488f22fe248
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-09-23 18:42:30 +05:30
jgilaber
cb6fb16097 Use src_type to filter volumes in zone migration
Although src_type is a mandatory parameter of the storage_pool
dictionary, its value is not used to filter the volumes to migrate;
only 'src_pool' is used.

This change makes 'src_type' optional. Since it was ignored until this
point, making it optional keeps the same behaviour by default. If
'src_type' is in the audit parameters, the strategy uses both 'src_pool' and
'src_type' to filter the volumes to migrate.
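
The filtering rule described above can be sketched roughly as follows.
This is a minimal illustration, not the strategy's code: the Volume
class and the filter_volumes function are hypothetical names, and only
'src_pool' and 'src_type' come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Volume:
    # Hypothetical stand-in for a volume record, not Watcher's model class.
    name: str
    pool: str
    volume_type: str


def filter_volumes(volumes: List[Volume], src_pool: str,
                   src_type: Optional[str] = None) -> List[Volume]:
    """Select volumes on src_pool, narrowing by src_type only when given."""
    selected = []
    for vol in volumes:
        if vol.pool != src_pool:
            continue
        # src_type is optional: when omitted, only the pool is checked,
        # which matches the behaviour before the change.
        if src_type is not None and vol.volume_type != src_type:
            continue
        selected.append(vol)
    return selected
```

Calling filter_volumes(volumes, "pool1") keeps the old pool-only
behaviour, while passing a third argument applies both filters.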

Closes-Bug: 2111507
Change-Id: Id83a96de85ada1ae6c0e25f8b7fcf54034604911
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-16 12:20:18 +02:00
jgilaber
fe56660c44 Handle missing dst_pool parameter in zone_migration
Unlike Nova, Cinder does not support calling the 'os-migrate_volume'[1]
action without a host or a cluster. For volume migrations of type
'migrate' in Watcher, the dst_pool is required, but for other migrations
that move the volumes to a different type it is not needed. This
change checks whether the dst_pool is defined and prevents some
migrations when information is missing.
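
As a rough sketch of that check, assuming a hypothetical helper named
can_plan_volume_action and a second migration type called 'retype'
(only 'migrate' and dst_pool are named in this message):

```python
from typing import Optional


def can_plan_volume_action(action_type: str,
                           dst_pool: Optional[str] = None,
                           dst_type: Optional[str] = None) -> bool:
    """Return True when enough destination info exists for the action."""
    if action_type == "migrate":
        # Cinder needs a target host or cluster, so a pool is mandatory.
        return dst_pool is not None
    if action_type == "retype":
        # Assumed name: a type change only needs the destination type.
        return dst_type is not None
    return False
```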

Adds testing for creating audits with the Zone Migration status,
validating the schema changes.

[1] https://docs.openstack.org/api-ref/block-storage/v3/index.html#migrate-a-volume

Closes-Bug: 2108988

Change-Id: I305c58e47093c4a884e86f1d91fdc15ef2a1cfba
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-10 15:58:24 +02:00
Zuul
e5b18afa01 Merge "Fix doc section to enable cinder notifications" 2025-09-01 14:15:29 +00:00
jgilaber
a4b785e4f1 Fix doc section to enable cinder notifications
The section in the Watcher docs that describes how to enable cinder
notifications incorrectly tells the user to change the cinder config to
send notifications to the watcher.watcher_notifications exchange and
topic. Instead, it should instruct the user to change the Watcher
configuration of the notification_topics [1] to listen to
'openstack.notifications', which is the one cinder uses by
default [2].

This patch also adds 'openstack.notifications' to the default value
for the 'notification_topics' parameter.

[1] https://docs.openstack.org/watcher/latest/configuration/watcher.html#watcher_decision_engine.notification_topics
[2] https://docs.openstack.org/cinder/latest/configuration/block-storage/samples/cinder.conf.html

Partial-Bug: 2121384
Change-Id: I4dc1a72af79a23c9ca07d2da5ff41bd7741e37d8
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-01 11:23:00 +02:00
Sean Mooney
ef0f35192d Make Monasca client optional and lazy-load
Monasca is deprecated for removal. This change makes the Monasca client
an optional dependency and ensures it is only imported and instantiated
when the Monasca datasource is explicitly selected. This reduces the
default footprint while preserving functionality for deployments that
still rely on Monasca.

What changed
============
- requirements.txt: remove python-monascaclient from hard deps
- setup.cfg: add [options.extras_require] monasca extra
- watcher/common/clients.py: lazy import with clear UnsupportedError
- watcher/decision_engine/datasources/monasca.py: lazy client property
  and deferred import of monascaclient.exc; reset on Unauthorized
- watcher/decision_engine/datasources/manager.py: unconditionally
  import Monasca helper and include in metric_map; helper is lazy
- tests: conditionally include Monasca based on availability; adjust
  expectations instead of skipping by default; avoid over-mocking
- tox.ini: enable optional extras via WATCHER_EXTRAS env var
- docs: datasources index notes Monasca is deprecated and optional
- releasenotes: upgrade note with install example and behavior
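
The lazy-import pattern listed above can be sketched in a few lines.
This is an illustration under assumptions, not the code in
watcher/common/clients.py: LazyClientHolder is a hypothetical name, and
UnsupportedError is redefined locally so the example stands alone.

```python
import importlib


class UnsupportedError(Exception):
    """Raised when an optional dependency is not installed."""


class LazyClientHolder:
    """Import an optional module only on first use (hypothetical helper)."""

    def __init__(self, module_name, extra_name):
        self._module_name = module_name
        self._extra_name = extra_name
        self._module = None

    @property
    def module(self):
        if self._module is None:
            try:
                self._module = importlib.import_module(self._module_name)
            except ImportError as exc:
                # Turn the import failure into a clear operator-facing error.
                raise UnsupportedError(
                    f"{self._module_name} is not installed; "
                    f"install the '{self._extra_name}' extra to use it"
                ) from exc
        return self._module

    def reset(self):
        # Drop the cached module so the next access imports it again.
        self._module = None
```

Nothing is imported when the holder is constructed, so deployments that
never select the datasource never need the client package.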

Why
===
- Allow deployments not using Monasca to run without the client
- Keep Monasca functional when explicitly installed via extras
- Provide clear operator guidance and smooth upgrades

Compatibility
=============
- No change for deployments that do not use Monasca
- Deployments using Monasca must install the optional extra:
  pip install watcher[monasca]

Testing
=======
- Default: tox -e py3
- With Monasca: WATCHER_EXTRAS=monasca tox -e py3

Assisted-By: GPT-5 (Cursor)
Closes-Bug: #2120192
Change-Id: I7c02b74e83d656083ce612727e6da58761200ae4
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-08-28 16:53:48 +01:00
Douglas Viroel
2452c1e541 Follow up changes for skip-action blueprint
These are some of the changes requested in reviews
of the patch series for the add-skip-action blueprint.
Some of them may require a separate patch, since
they would touch more files that are not related to
this feature.

Change-Id: I9e30ca385e7b184ab19449a60db6f6d0f3c0e1b9
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-26 10:27:57 -03:00
Zuul
1668b9b9f8 Merge "API changes for skipped actions: patch actions and status_message" 2025-08-26 12:54:31 +00:00
Zuul
5e05b50048 Merge "Skip actions automatically based on pre_condition results" 2025-08-26 12:33:08 +00:00
Ronelle Landy
457819072f Update Overload standard deviation doc
Bug #2113862 details a number of suggested
corrections and additions to the Workload
Stabilization doc. This patch adds those
suggested changes.

Closes-Bug: #2113862
Assisted-By: Cursor (claude-3.5-sonnet)
Change-Id: I4131a304c064d2ea397b2447025c7edf69a56e2a
Signed-off-by: Ronelle Landy <rlandy@redhat.com>
2025-08-21 11:09:46 -04:00
Zuul
6d155c4be6 Merge "Add status_message to objects and notifications" 2025-08-21 14:59:53 +00:00
Zuul
616c8f4cc4 Merge "Add options to disable migration in host maintenance" 2025-08-21 14:11:22 +00:00
Quang Ngo
cc26b3b334 Add options to disable migration in host maintenance
This change enhances the Host Maintenance strategy by introducing
two new input parameters: `disable_live_migration` and
`disable_cold_migration`. These parameters allow cloud
administrators to control whether live or cold migration should be
considered during host maintenance operations.

If `disable_live_migration` is set, active instances will be cold
migrated if `disable_cold_migration` is not set, otherwise
active instances will be stopped. If `disable_cold_migration` is set,
inactive instances will not be cold migrated.
If both are set, only stop actions will be performed on instances.
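
The combinations above can be summarised as a small decision function.
This is a sketch only: plan_action and the returned action names are
hypothetical, and the default case (neither option set) is assumed to
live migrate active instances and cold migrate inactive ones.

```python
def plan_action(instance_active: bool,
                disable_live_migration: bool = False,
                disable_cold_migration: bool = False):
    """Return the action name for one instance, or None for no action."""
    if instance_active:
        if not disable_live_migration:
            return "live_migrate"
        if not disable_cold_migration:
            # Live migration is disabled, so fall back to cold migration.
            return "cold_migrate"
        # Both migration kinds are disabled: the instance is stopped.
        return "stop"
    if disable_cold_migration:
        # Inactive instances are not cold migrated.
        return None
    return "cold_migrate"
```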

The strategy logic and action plan generation have been updated to
reflect these behaviors. A new "stop" action is introduced and
registered, and the weight planner is updated to handle the new action.

Documentation for the Host Maintenance strategy is updated to
describe the new parameters and their effects.

Test Plan:
- Unit tests for HostMaintenance strategy with new parameters
- Integration tests for action plan generation with stop action

This implements the specification:
Spec: https://review.opendev.org/c/openstack/watcher-specs/+/943873

Change-Id: I201b8e5c52e1bc1a74f3886a0e301e3c0fa5d351
Signed-off-by: Quang Ngo <quang.ngo@canonical.com>
2025-08-20 22:32:33 +10:00
Alfredo Moralejo
e06f1b0475 API changes for skipped actions: patch actions and status_message
This patch implements the changes in the API required for the
skipped action blueprint. It includes:

- New field `status_message` is visible in API get calls for Audits,
  ActionPlans and Actions.
- New Patch call is added to `/actions/{action_id}` which allows
  manually moving actions in PENDING state to SKIPPED for ActionPlans
  which have not been started.
- A new API microversion 1.5 is added for these changes.
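
A request for the new call might be assembled as below. The JSON-patch
body shape and the `OpenStack-API-Version: infra-optim` header format
are assumptions based on common OpenStack conventions, not details
taken from this commit; build_skip_request is a hypothetical helper
that only builds the request and sends nothing.

```python
import json


def build_skip_request(action_id: str, microversion: str = "1.5"):
    """Return (method, path, headers, body) for skipping a PENDING action."""
    # Assumed body: a JSON-patch document replacing the action state.
    body = [{"op": "replace", "path": "/state", "value": "SKIPPED"}]
    headers = {
        "Content-Type": "application/json",
        # Assumed header format for requesting the new microversion.
        "OpenStack-API-Version": f"infra-optim {microversion}",
    }
    return "PATCH", f"/actions/{action_id}", headers, json.dumps(body)
```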

It also adds required tests and documentation.

Implements: blueprint add-skip-actions

Assisted-By: Cursor (claude-4-sonnet)

Change-Id: I71fb9af76085e5941a7fd3e9e4c89d6f3a3ada47
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-20 13:13:19 +02:00
Alfredo Moralejo
6d35be11ec Skip actions automatically based on pre_condition results
This patch implements automatic skipping of actions based on the
result of the action pre_condition method. This allows proper handling
of situations such as migration actions for VMs which no longer
exist. This patch includes:

- Adding a new state SKIPPED to the Action objects.
- Add a new Exception ActionSkipped. An action which raises it from the
  pre_condition execution is moved to SKIPPED state.
- pre_condition will not be executed for any action in SKIPPED state.
- execute will not be executed for any action in SKIPPED or FAILED state.
- post_condition will not be executed for any action in SKIPPED state.
- moving transition to ONGOING from pre_condition to execute. That means
  that actions raising ActionSkipped will move from PENDING to SKIPPED
  while actions raising any other Exception will move from PENDING to
  FAILED.
- Adding information on action failed or skipped state to the
  `status_message` field.
- Adding a new option to the testing action nop to simulate skipping on
  pre_condition, so that we can easily test it.
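
The transitions listed above can be sketched as a tiny state machine.
This is a simplified illustration, not Watcher's applier code: the
Action class and run_action are hypothetical, SUCCEEDED is an assumed
name for the success state, and the status messages are made up.

```python
PENDING, ONGOING, SUCCEEDED = "PENDING", "ONGOING", "SUCCEEDED"
FAILED, SKIPPED = "FAILED", "SKIPPED"


class ActionSkipped(Exception):
    """Raised from pre_condition when the action should be skipped."""


class Action:
    def __init__(self, pre_condition, execute=lambda: None,
                 post_condition=lambda: None):
        self.state = PENDING
        self.status_message = None
        self.pre_condition = pre_condition
        self.execute = execute
        self.post_condition = post_condition


def run_action(action):
    # pre_condition is not run for actions that are already SKIPPED.
    if action.state != SKIPPED:
        try:
            action.pre_condition()
        except ActionSkipped as exc:
            action.state = SKIPPED
            action.status_message = f"Action was skipped: {exc}"
        except Exception as exc:
            action.state = FAILED
            action.status_message = f"Action failed in pre_condition: {exc}"
    # The move to ONGOING happens at execute, not at pre_condition.
    if action.state not in (SKIPPED, FAILED):
        action.state = ONGOING
        try:
            action.execute()
            action.state = SUCCEEDED
        except Exception as exc:
            action.state = FAILED
            action.status_message = f"Action failed in execute: {exc}"
    # Only SKIPPED actions bypass post_condition, as the list states.
    if action.state != SKIPPED:
        action.post_condition()
    return action.state
```

An action whose pre_condition raises ActionSkipped goes straight from
PENDING to SKIPPED and never reaches ONGOING.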

Implements: blueprint add-skip-actions

Assisted-By: Cursor (claude-4-sonnet)

Change-Id: I59cb4c7006c7c3bcc5ff2071886d3e2929800f9e
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-20 13:10:10 +02:00
Alfredo Moralejo
5048a6e3ba Add status_message to objects and notifications
This patch is part of the skipped action blueprint. It adds the
`status_message` field to the Audit, ActionPlan and Action objects and
all related notifications.

It bumps the versions of all the affected objects and notifications and
updates the tests to include the new fields.

Change-Id: I3b9467e7e37188e647379cd9c4cbbda8ed75383f
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-19 13:01:00 +02:00
Jaromir Wysoglad
8309d9848a Add Aetos datasource
Implement the spec for multi-tenancy support for metrics. This adds
a new 'Aetos' datasource very similar to the current Prometheus
datasource. Because of that, the original PrometheusHelper class
was split into two classes and the base class is used for
PrometheusHelper and for AetosHelper. Except for the split, there
is one more change to the original PrometheusHelper class code, which
is the addition and use of the _get_fqdn_label() and
_get_instance_uuid_label() methods.
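
The shape of that split can be sketched as a base class with two label
hooks. The class names below are hypothetical stand-ins, the shared
method is invented for illustration, and the 'fqdn' default is an
assumption; only the two _get_*_label method names come from this
message.

```python
from abc import ABC, abstractmethod


class PrometheusBaseSketch(ABC):
    """Shared logic; concrete classes only say where labels come from."""

    @abstractmethod
    def _get_fqdn_label(self):
        """Return the label holding the host FQDN."""

    @abstractmethod
    def _get_instance_uuid_label(self):
        """Return the label holding the instance UUID."""

    def instance_matcher(self, uuid):
        # Shared code calls the hook instead of reading config directly.
        label = self._get_instance_uuid_label()
        return f'{label}="{uuid}"'


class PrometheusSketch(PrometheusBaseSketch):
    def __init__(self, fqdn_label="fqdn", instance_uuid_label="resource"):
        self._fqdn_label = fqdn_label
        self._instance_uuid_label = instance_uuid_label

    def _get_fqdn_label(self):
        return self._fqdn_label

    def _get_instance_uuid_label(self):
        return self._instance_uuid_label


class AetosSketch(PrometheusSketch):
    """Would differ only in client setup and config source (not shown)."""
```

Because the shared code goes through the hooks, tests for the base
class can mock the two methods rather than modify configuration values.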

As part of the change, I refactored the current prometheus datasource
unit tests. Most of them are now used to test the PrometheusBase class
with minimal changes. Changes I've made to the original tests:

- the ones that can be used to test the base class are moved into the
  TestPrometheusBase class
- the _setup_prometheus_client, _get_instance_uuid_label and
  _get_fqdn_label functions are mocked in the base class tests.
  Their concrete implementations are tested in each datasource tests
  separately.
- a self._create_helper() is used to instantiate the helper class with
  correct mocking.
- all config value modification in the original tests got moved out and
  instead of modifying the config values, the _get_* methods are mocked
  to return the wanted values
- to keep similar test coverage, config retrieval is tested for each
  concrete class by testing the _get_* methods.

New watcher-aetos-integration and watcher-aetos-integration-realdata
zuul jobs are added to test the new datasource. These use the same set
of tempest tests as the current watcher-prometheus-integration jobs.
The only difference is the environment setup and the Watcher config,
so that the job deploys Aetos and Watcher uses it instead of accessing
Prometheus directly.

At first this was generated by asking cursor to implement the linked spec
with some additional prompts for some smaller changes. Afterwards I manually
went through the code doing some cleanups, ensuring it complies with
PEP8 and hacking and so on. Later on I manually adjusted the code to use
the latest observabilityclient changes.
The zuul job was also mostly generated by cursor.

Implements: https://blueprints.launchpad.net/watcher/+spec/prometheus-multitenancy-support

Generated-By: Cursor with claude-4-sonnet model
Change-Id: I72c2171f72819bbde6c9cbbf565ee895e5d2bd53
Signed-off-by: Jaromir Wysoglad <jwysogla@redhat.com>
2025-08-14 02:27:24 -04:00
Zuul
27baff5184 Merge "Extend decision engine to support threading mode" 2025-08-06 15:38:31 +00:00
Douglas Viroel
f879b10b05 Extend decision engine to support threading mode
As part of the eventlet removal effort, Watcher will need
to be adapted to support both modes, eventlet and threading, for
a couple of releases before removing all eventlet code.
This patch adds methods and classes that allow decision engine
modules to create futurist thread pools instead of green thread pools,
based on an environment variable that can be enabled per service.
It moves the continuous audit handler instance to the decision engine
service, so it can be started together with the main decision engine
service. It also adds an environment variable that allows the user to
disable eventlet monkey patching and to use the oslo.service threading
backend.

Change-Id: I8a8be0a7cebdc44005fd77ec960543828c7da318
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-05 16:45:48 -03:00
Sean Mooney
20cd4a0394 Add comprehensive release liaison guide for DPL model
Transform Nova's PTL guide into Watcher-specific release liaison
documentation following the DPL governance model. This guide provides
chronological guidance for release liaisons managing Watcher's
cycle-with-intermediary release process.

Key features:
* DPL liaison coordination with proper precedence hierarchies
* Watcher-specific project context and repository references
* Enhanced FFE process with release liaison decision authority
* Proper RST formatting with code blocks and cross-references
* Comprehensive glossary of OpenStack release terminology
* Usage guidance for both new and experienced release liaisons

Adapts Nova's proven chronological structure while reflecting
Watcher's distributed leadership model and technical requirements.

Assisted-By: claude-code
Change-Id: I133bb06e47c14deaca162a2bf024210f68d78ab2
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-07-21 16:34:47 +01:00
Zuul
bbe30f93f2 Merge "Update workload balance doc per review comments" 2025-07-03 19:57:05 +00:00
Zuul
93366df264 Merge "Add crosslinks to strategies table" 2025-06-30 13:02:28 +00:00
Ronelle Landy
6f72e33de5 Add crosslinks to strategies table
These replace the full external links
used previously.

Change-Id: I9c79f7b7ddebaa25d243fdbe1eb422cba25de8f1
2025-06-27 16:54:38 -04:00
Ronelle Landy
56d0a0d6ea Update workload balance doc per review comments
The original documentation update review [1]
had some additional comments for improvements.
This commit adds the suggested changes.

[1] https://review.opendev.org/c/openstack/watcher/+/951025

Change-Id: I4b4624e2dbc4c6a5f888ec77d6a03b8f66ff0a23
2025-06-27 16:46:17 -04:00
Ronelle Landy
de9eb2cd80 Add doc clarifications for Zone Migration
Adds documentation clarifications on how the
strategy and associated parameters are used.

Closes-Bug: #2112480
Change-Id: Id42c280fc5744bebb01d50b52b834e5b3b76af73
2025-06-27 16:12:41 -04:00
Zuul
76de167171 Merge "Add Integrations doc page with support matrix" 2025-06-27 16:09:51 +00:00
Zuul
70032aa477 Merge "Add table - level of test/usage per strategy" 2025-06-27 16:01:31 +00:00
Zuul
16131e5cac Merge "Update Workload Balance strategy documentation" 2025-06-27 13:36:50 +00:00
Ronelle Landy
bfbd136f4b Update Host Maintenance strategy documentation
Add clarifications to the documentation to reflect
the actual strategy usage, including:
 - updating parameter descriptions
 - extending the 'How to Use' section

Closes-Bug: #2111810
Change-Id: Ifd2876056cd8819c50658fb9f213246dc1546d42
2025-06-23 06:36:42 -04:00
Ronelle Landy
0599618add Add table - level of test/usage per strategy
This patch adds a table to the strategies page to
show the level of qualification and where the
strategy can be triggered.

Change-Id: I6991566fd5fec3f8bbae06eefa63a8b83a87eed1
2025-06-11 14:19:42 -04:00
Ronelle Landy
f42cb8557b Update Workload Balance strategy documentation
Adds additional parameter and usage explanations
and a combined example.

Closes-Bug: #2111848
Change-Id: Id0de4d56fa7083388ad82c61596e7484431d465b
2025-06-06 15:51:23 -04:00
Douglas Viroel
b788a67c52 Add Integrations doc page with support matrix
Adds a new documentation section that describes which service
integrations are currently supported and their integration status.
This information is not clear today and will help to cover the lack
of testing and documentation about them.

Change-Id: I26b2a2ef5672b78a575a2bdaef3a08d5bbc063bd
2025-06-05 13:31:02 -03:00
jgilaber
2c76da2868 Make prometheus the default devstack example
Change the devstack local.conf samples and devstack multinode
contributor doc to demonstrate deploying watcher with prometheus as
datasource instead of gnocchi. Keep the gnocchi as an alternative
deployment example.

Depends-On: https://review.opendev.org/c/openstack/watcher/+/946230
Depends-On: https://review.opendev.org/c/openstack/devstack-plugin-prometheus/+/946254

Change-Id: I721b550a03f9e5350a3f1ab10292faa1c50049a7
2025-04-24 16:06:50 +02:00
Alfredo Moralejo
a65e7e9b59 Query by fqdn_label instead of instance for host metrics
Currently we are using the `instance` label to query prometheus for
host metrics. This label is assigned to the url of each endpoint being
scraped.

While this works fine in one-exporter-per-compute cases, as the driver
is mapping the fqdn_label value to the `instance` label value, it fails
when there is more than one target with the same value for the fqdn
label. This is a valid case: we want to be able to query by fqdn
without caring which exporter in the host is providing the metric.

This patch is changing the queries we use for hosts to be based on the
fqdn_label instead of the instance one. To implement it, we are also
simplifying the way we check the metric exist for the host by converting
prometheus_fqdn_instance_map into a prometheus_fqdn_labels set
which stores the list of fqdn found in prometheus.
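
The idea can be sketched as follows. The helper names, the 'fqdn' label
default, and the label-dict input format are assumptions made for the
example; the actual queries built by the datasource are not shown in
this message.

```python
def collect_fqdn_labels(series_labels, fqdn_label="fqdn"):
    """Build the set of FQDNs seen across all scraped targets."""
    # Two exporters on the same host collapse into one set entry.
    return {labels[fqdn_label] for labels in series_labels
            if fqdn_label in labels}


def host_metric_selector(metric, fqdn, known_fqdns, fqdn_label="fqdn"):
    """Return a selector matching the host by FQDN, whatever the exporter."""
    if fqdn not in known_fqdns:
        raise KeyError(f"no metrics found for host {fqdn}")
    return f'{metric}{{{fqdn_label}="{fqdn}"}}'
```

With a set there is no single `instance` value to pick per host, so a
host served by several exporters no longer breaks the lookup.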

Closes-Bug: #2103451
Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e
2025-03-19 15:25:24 +01:00
Zuul
4527f89d8d Merge "Add support for instance metrics to prometheus datasource" 2025-02-03 13:22:28 +00:00
Zuul
e535177bc0 Merge "Remove ceilometer datasource" 2025-01-29 13:22:46 +00:00
Alfredo Moralejo
136e5d927c Add support for instance metrics to prometheus datasource
In order to support the vm_workload_consolidation, workload_balance and
workload_stabilization strategies, some instance metrics are required.
This patch adds support for them.

Implementation is based on a prometheus store populated using sg-core
from ceilometer metrics with Pollster source.

- instance_ram_usage: rely on ceilometer_memory_usage metrics created from
  ceilometer memory.usage meter.
- instance_ram_allocated: rely on the memory value provided by the
  inventory created from nova and placement APIs.
- instance_cpu_usage: rely on ceilometer_cpu metric created from
  ceilometer cpu meter. A max value of 100 is set in the query.
- instance_root_disk_size: rely on the `disk` value provided by the
  inventory created from nova and placement APIs.

A new parameter `instance_uuid_label` has been added to the prometheus
datasource configuration to identify the label used to store the value of the
OpenStack instance uuid for each instance metric in prometheus. Default
value is `resource`.

Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2
2025-01-23 13:23:04 +01:00
m
3f26dc47f2 Add prometheus data source for watcher decision engine
This adds a new data source for the Watcher decision engine that
implements the watcher.decision_engine.datasources.DataSourceBase.

The related spec was merged at [1].

Implements: blueprint prometheus-datasource

[1] https://review.opendev.org/c/openstack/watcher-specs/+/933300

Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906
2025-01-10 15:20:37 +02:00
Takashi Kajinami
da23fdc621 Remove ceilometer datasource
This datasource requires Ceilometer API which was already removed some
years ago. The implementation should have been removed when dependency
on ceilometerclient was removed by [1].

Also remove some job definitions which are not actually used.

[1] 01d74d0a87

Change-Id: I29c3865dc1207f1bbbb266e4217cf8888afebfb6
2024-12-16 23:51:27 +09:00
Sean Mooney
1f8d06e075 [docs] apply sphinx-lint to docs
This change corrects the detected sphinx-lint issues in the existing
docs and updates the contributor devstack guide to call out
required and advanced.

Mostly the changes were simple fixes like replacing the configurable
default role with explicit literal syntax `term` -> ``term``

some inline Note: comments have been promoted to .. note:: blocks
and literal blocks ::  have been promoted to .. code-block:: <language>
directives.

Change-Id: I6320c313d22bf542ad407169e6538dc6acf79901
2024-11-19 00:43:36 +00:00
Sean Mooney
5f79ab87c7 [pre-commit] fix typos and configure codespell
This change enables codespell in pre-commit and
fixes the existing typos.

A followup commit will enable this in tox and ci.

Change-Id: I0a11bcd5a88247a48d3437525fc8a3cb3cdd4e58
2024-11-07 19:50:21 +00:00
Takashi Kajinami
b5e45b43b9 Drop unnecessary 'x' bit from doc config file
This file is not actually executable.

Trivial-Fix

Change-Id: I64352c3c5c6bfd5d08aa4cee873016e02d736a2e
2024-10-28 13:13:24 +00:00
Sean Mooney
9d8b990fd1 [pre-commit] Add initial pre-commit config
This change adds configuration for the pre-commit tool,
follow-up changes will address the remaining issues in a phased
approach to make the reviews simpler.

This is based on the pre-commit config used in nova
with some additional hooks.

Follow-up changes will address the FIXME comments
related to sphinx-lint and codespell, as well as update tox
to enforce these checks in ci.

Change-Id: I87681a19f7fa88366c2b0d310c8b3153aa6a137b
2024-10-22 20:12:53 +01:00
Takashi Kajinami
566a830f64 Bump hacking
hacking 3.0.x is quite old. Bump it to the current latest version.

Change-Id: I8d87fed6afe5988678c64090af261266d1ca20e6
2024-09-22 23:54:36 +09:00
Lucian Petrut
424e9a76af vm workload consolidation: use actual host metrics
The "vm workload consolidation" strategy is summing up instance
usage in order to estimate host usage.

The problem is that some infrastructure services (e.g. OVS or Ceph
clients) may also use a significant amount of resources, which
would be ignored. This can impact Watcher's ability to detect
overloaded nodes and correctly rebalance the workload.

This commit will use the host metrics, if available. The proposed
implementation uses the maximum value between the host metric
and the sum of the instance metrics.

Note that we're holding a dict of host metric deltas in order to
account for planned migrations.

Change-Id: I82f474ee613f6c9a7c0a9d24a05cba41d2f68edb
2023-10-27 21:54:42 +03:00
Lucian Petrut
00fea975e2 Handle deprecated "cpu_util" metric
The "cpu_util" metric was deprecated a few years ago.
We'll obtain the same result by converting the cumulative cpu
time to a percentage, leveraging the rate of change aggregation.
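
The conversion amounts to simple arithmetic, sketched here under the
assumptions that the cumulative cpu time is reported in nanoseconds and
that the result is normalised by the number of vCPUs; the function name
is hypothetical.

```python
def cpu_util_percent(cpu_ns_start, cpu_ns_end, interval_s, vcpus=1):
    """Convert a cumulative cpu time delta into a utilisation percentage."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    # Rate of change: cpu seconds consumed during the interval.
    used_s = (cpu_ns_end - cpu_ns_start) / 1e9
    return 100.0 * used_s / (interval_s * vcpus)
```

For example, 30 seconds of cpu time consumed over a 60 second window on
one vCPU corresponds to 50 percent utilisation.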

Change-Id: I18fe0de6f74c785e674faceea0c48f44055818fe
2023-10-24 10:47:23 +00:00
chenker
c7be34fbaa update saving_energy docs
Change-Id: I3b0c86911a8d32912c2de2e2392af9539b8d9be0
2023-02-07 10:27:54 +00:00
wangjiaqi07
c55143bc21 remove unicode from code
Change-Id: I747445d482a2fb40c2f39139c5fd2a0cb26c27bc
2022-08-19 14:17:10 +08:00