Unlike Nova, Cinder does not support calling the 'os-migrate_volume'[1]
action without a host or a cluster. For volume migrations of type
'migrate' in Watcher the dst_pool is required, but for other migrations
that move the volumes to a different type it is not needed. This
change checks whether the dst_pool is defined and prevents those
migrations when this information is missing.
Adds testing for creating audits with the Zone Migration status,
validating the schema changes.
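
A minimal sketch of the kind of check described above. The helper name, the exception and the parameter layout are hypothetical and are not taken from the Watcher code base:

```python
class InvalidMigration(ValueError):
    """Raised when a volume migration lacks required information."""


def validate_volume_migration(migration_type, dst_pool=None):
    """Reject a 'migrate' request that has no destination pool.

    Hypothetical helper: Cinder needs a host or cluster for the
    os-migrate_volume action, so only the 'migrate' type requires
    dst_pool. Other migration types are accepted without it.
    """
    if migration_type == "migrate" and not dst_pool:
        raise InvalidMigration("dst_pool is required for 'migrate'")
    return True
```

With this sketch, a 'migrate' request without dst_pool fails early instead of reaching Cinder with an incomplete request.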
[1] https://docs.openstack.org/api-ref/block-storage/v3/index.html#migrate-a-volume
Closes-Bug: 2108988
Change-Id: I305c58e47093c4a884e86f1d91fdc15ef2a1cfba
Signed-off-by: jgilaber <jgilaber@redhat.com>
Monasca is deprecated for removal. This change makes the Monasca client
an optional dependency and ensures it is only imported and instantiated
when the Monasca datasource is explicitly selected. This reduces the
default footprint while preserving functionality for deployments that
still rely on Monasca.
What changed
============
- requirements.txt: remove python-monascaclient from hard deps
- setup.cfg: add [options.extras_require] monasca extra
- watcher/common/clients.py: lazy import with clear UnsupportedError
- watcher/decision_engine/datasources/monasca.py: lazy client property
and deferred import of monascaclient.exc; reset on Unauthorized
- watcher/decision_engine/datasources/manager.py: unconditionally
import Monasca helper and include in metric_map; helper is lazy
- tests: conditionally include Monasca based on availability; adjust
expectations instead of skipping by default; avoid over-mocking
- tox.ini: enable optional extras via WATCHER_EXTRAS env var
- docs: datasources index notes Monasca is deprecated and optional
- releasenotes: upgrade note with install example and behavior
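
The lazy import pattern listed above can be sketched as follows. This is an illustration only: the class and function names are hypothetical, and the real change lives in watcher/common/clients.py and the Monasca datasource helper.

```python
import importlib


class UnsupportedError(Exception):
    """Stand-in for the error raised when an optional client is missing."""


def load_optional_client(module_name, extra):
    """Import an optional dependency only when it is actually needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise UnsupportedError(
            f"{module_name} is not installed; "
            f"install it with: pip install watcher[{extra}]"
        ) from exc


class LazyHelper:
    """Helper whose client module is resolved on first access."""

    def __init__(self, module_name, extra):
        self._module_name = module_name
        self._extra = extra
        self._client = None

    @property
    def client(self):
        # Nothing is imported until the property is first read.
        if self._client is None:
            self._client = load_optional_client(
                self._module_name, self._extra)
        return self._client
```

Constructing the helper never fails; the error only surfaces when a deployment that selected the datasource touches the client without the extra installed.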
Why
===
- Allow deployments not using Monasca to run without the client
- Keep Monasca functional when explicitly installed via extras
- Provide clear operator guidance and smooth upgrades
Compatibility
=============
- No change for deployments that do not use Monasca
- Deployments using Monasca must install the optional extra:
pip install watcher[monasca]
Testing
=======
- Default: tox -e py3
- With Monasca: WATCHER_EXTRAS=monasca tox -e py3
Assisted-By: GPT-5 (Cursor)
Closes-Bug: #2120192
Change-Id: I7c02b74e83d656083ce612727e6da58761200ae4
Signed-off-by: Sean Mooney <work@seanmooney.info>
These are some of the changes requested in reviews
of the series of patches for the add-skip-action blueprint.
Some of them may require a separate patch, since they
would touch more files that are not related to
this feature.
Change-Id: I9e30ca385e7b184ab19449a60db6f6d0f3c0e1b9
Signed-off-by: Douglas Viroel <viroel@gmail.com>
Bug #2113862 details a number of suggested
corrections and additions to the Workload
Stabilization doc. This patch adds those
suggested changes.
Closes-Bug: #2113862
Assisted-By: Cursor (claude-3.5-sonnet)
Change-Id: I4131a304c064d2ea397b2447025c7edf69a56e2a
Signed-off-by: Ronelle Landy <rlandy@redhat.com>
This change enhances the Host Maintenance strategy by introducing
two new input parameters: `disable_live_migration` and
`disable_cold_migration`. These parameters allow cloud
administrators to control whether live or cold migration should be
considered during host maintenance operations.
If `disable_live_migration` is set, active instances will be cold
migrated if `disable_cold_migration` is not set, otherwise
active instances will be stopped. If `disable_cold_migration` is set,
inactive instances will not be cold migrated.
If both are set, only stop actions will be performed on instances.
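
The parameter combinations above can be read as the following decision table. This is a sketch, not the strategy code: the function name and the returned action labels are hypothetical, and returning None for an inactive instance that may not be cold migrated is an assumption.

```python
def plan_instance_action(is_active, disable_live_migration=False,
                         disable_cold_migration=False):
    """Pick the action for one instance on the host under maintenance."""
    if is_active:
        if not disable_live_migration:
            return "live_migrate"
        if not disable_cold_migration:
            return "cold_migrate"
        # Both migration modes are disabled: stop the instance.
        return "stop"
    if not disable_cold_migration:
        return "cold_migrate"
    # Assumption: an inactive instance is left untouched.
    return None
```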
The strategy logic and action plan generation have been updated to
reflect these behaviors. A new "stop" action is introduced and
registered, and the weight planner is updated to handle the new action.
Documentation for the Host Maintenance strategy is updated to
describe the new parameters and their effects.
Test Plan:
- Unit tests for HostMaintenance strategy with new parameters
- Integration tests for action plan generation with stop action
This implements the specification:
Spec: https://review.opendev.org/c/openstack/watcher-specs/+/943873
Change-Id: I201b8e5c52e1bc1a74f3886a0e301e3c0fa5d351
Signed-off-by: Quang Ngo <quang.ngo@canonical.com>
This patch implements the API changes required for the
skipped action blueprint. It includes:
- A new field `status_message` is visible in API GET calls for Actions,
ActionPlans and Audits.
- A new PATCH call is added to `/actions/{action_id}`, which allows
manually moving actions in PENDING state to SKIPPED for ActionPlans
which have not been started.
- A new API microversion 1.5 is added for these changes.
It also adds the required tests and documentation.
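
A sketch of how a field can be gated on the requested microversion. The function names are hypothetical, and hiding `status_message` from requests below 1.5 is an assumption based on how microversions usually behave, not a description of the actual API code:

```python
def parse_microversion(value):
    """Turn a 'major.minor' string into a comparable tuple."""
    major, minor = value.split(".")
    return int(major), int(minor)


def render_action(action, requested_version):
    """Return the API view of an action for the requested microversion."""
    body = dict(action)
    if parse_microversion(requested_version) < (1, 5):
        # Assumption: older microversions do not expose the new field.
        body.pop("status_message", None)
    return body
```

Comparing tuples rather than strings keeps the ordering correct once the minor version reaches two digits, for example 1.10 versus 1.5.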
Implements: blueprint add-skip-actions
Assisted-By: Cursor (claude-4-sonnet)
Change-Id: I71fb9af76085e5941a7fd3e9e4c89d6f3a3ada47
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
This patch implements automatic skipping of actions based on the
result of the action pre_condition method. This allows proper
handling of situations such as migration actions for VMs which no
longer exist. This patch includes:
- Adding a new state SKIPPED to the Action objects.
- Add a new Exception ActionSkipped. An action which raises it from the
pre_condition execution is moved to SKIPPED state.
- pre_condition will not be executed for any action in SKIPPED state.
- execute will not be executed for any action in SKIPPED or FAILED state.
- post_condition will not be executed for any action in SKIPPED state.
- moving transition to ONGOING from pre_condition to execute. That means
that actions raising ActionSkipped will move from PENDING to SKIPPED
while actions raising any other Exception will move from PENDING to
FAILED.
- Adding information on action failed or skipped state to the
`status_message` field.
- Adding a new option to the testing action nop to simulate skipping on
pre_condition, so that we can easily test it.
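
The state transitions listed above can be sketched as a small runner. The state names and `ActionSkipped` come from this message; the `Action` class, `run_action` and the message strings are hypothetical simplifications of the applier code:

```python
PENDING, ONGOING, SUCCEEDED, FAILED, SKIPPED = (
    "PENDING", "ONGOING", "SUCCEEDED", "FAILED", "SKIPPED")


class ActionSkipped(Exception):
    """Raised from pre_condition when the action should be skipped."""


class Action:
    def __init__(self, pre_condition, execute):
        self.state = PENDING
        self.status_message = None
        self.pre_condition = pre_condition
        self.execute = execute


def run_action(action):
    """Run pre_condition and execute, recording why an action stopped."""
    if action.state != SKIPPED:
        try:
            action.pre_condition()
        except ActionSkipped as exc:
            action.state = SKIPPED
            action.status_message = f"Action skipped: {exc}"
        except Exception as exc:
            action.state = FAILED
            action.status_message = f"pre_condition failed: {exc}"
    if action.state not in (SKIPPED, FAILED):
        # The move to ONGOING happens here, not in pre_condition.
        action.state = ONGOING
        try:
            action.execute()
            action.state = SUCCEEDED
        except Exception as exc:
            action.state = FAILED
            action.status_message = f"execute failed: {exc}"
    return action.state
```

Because the action stays in PENDING during pre_condition, a skipped action never passes through ONGOING.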
Implements: blueprint add-skip-actions
Assisted-By: Cursor (claude-4-sonnet)
Change-Id: I59cb4c7006c7c3bcc5ff2071886d3e2929800f9e
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
This patch is part of the skipped action blueprint. It adds the
`status_message` field to the Audit, ActionPlan and Action objects and
all related notifications.
It bumps the versions of all the affected objects and notifications and
updates the tests to include the new fields.
Change-Id: I3b9467e7e37188e647379cd9c4cbbda8ed75383f
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
Implement the spec for multi-tenancy support for metrics. This adds
a new 'Aetos' datasource very similar to the current Prometheus
datasource. Because of that, the original PrometheusHelper class
was split into two classes and the base class is used for
PrometheusHelper and for AetosHelper. Except for the split, there
is one more change to the original PrometheusHelper class code, which
is the addition and use of the _get_fqdn_label() and
_get_instance_uuid_label() methods.
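
A sketch of the class split described above. The class and method names follow this message, but the bodies are illustrative: the selector methods, the dict-based configuration and the default label values are assumptions made for the example.

```python
import abc


class PrometheusBase(abc.ABC):
    """Shared query logic; label lookup is left to the subclasses."""

    @abc.abstractmethod
    def _get_fqdn_label(self):
        """Return the label that holds the host FQDN."""

    @abc.abstractmethod
    def _get_instance_uuid_label(self):
        """Return the label that holds the instance UUID."""

    def host_selector(self, fqdn):
        return f'{{{self._get_fqdn_label()}="{fqdn}"}}'

    def instance_selector(self, uuid):
        return f'{{{self._get_instance_uuid_label()}="{uuid}"}}'


class PrometheusHelper(PrometheusBase):
    def __init__(self, conf):
        self._conf = conf  # stand-in for the Prometheus config section

    def _get_fqdn_label(self):
        return self._conf.get("fqdn_label", "fqdn")

    def _get_instance_uuid_label(self):
        return self._conf.get("instance_uuid_label", "resource")


class AetosHelper(PrometheusBase):
    def __init__(self, conf):
        self._conf = conf  # stand-in for the Aetos config section

    def _get_fqdn_label(self):
        return self._conf.get("fqdn_label", "fqdn")

    def _get_instance_uuid_label(self):
        return self._conf.get("instance_uuid_label", "resource")
```

Keeping the label lookup behind two small methods is also what lets the base class tests mock them instead of modifying config values.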
As part of the change, I refactored the current prometheus datasource
unit tests. Most of them are now used to test the PrometheusBase class
with minimal changes. Changes I've made to the original tests:
- the ones that can be used to test the base class are moved into the
TestPrometheusBase class
- the _setup_prometheus_client, _get_instance_uuid_label and
_get_fqdn_label functions are mocked in the base class tests.
Their concrete implementations are tested in each datasource's tests
separately.
- a self._create_helper() is used to instantiate the helper class with
correct mocking.
- all config value modification in the original tests got moved out and
instead of modifying the config values, the _get_* methods are mocked
to return the wanted values
- to keep similar test coverage, config retrieval is tested for each
concrete class by testing the _get_* methods.
New watcher-aetos-integration and watcher-aetos-integration-realdata
zuul jobs are added to test the new datasource. These use the same set
of tempest tests as the current watcher-prometheus-integration jobs.
The only difference is the environment setup and the Watcher config,
so that the job deploys Aetos and Watcher uses it instead of accessing
Prometheus directly.
At first this was generated by asking cursor to implement the linked spec
with some additional prompts for some smaller changes. Afterwards I manually
went through the code doing some cleanups, ensuring it complies with
PEP8 and hacking and so on. Later on I manually adjusted the code to use
the latest observabilityclient changes.
The zuul job was also mostly generated by cursor.
Implements: https://blueprints.launchpad.net/watcher/+spec/prometheus-multitenancy-support
Generated-By: Cursor with claude-4-sonnet model
Change-Id: I72c2171f72819bbde6c9cbbf565ee895e5d2bd53
Signed-off-by: Jaromir Wysoglad <jwysogla@redhat.com>
As part of the eventlet removal effort, Watcher will need
to be adapted to support both modes, eventlet and threading, for
a couple of releases before removing all eventlet code.
This patch adds methods and classes that allow decision engine
modules to create futurist thread pools instead of green thread pools,
based on an environment variable that can be enabled per service.
It moves the continuous audit handler instance to the decision engine
service, so it can be started together with the main decision engine service.
Adds an environment variable that allows the user to disable
eventlet monkey patching and to use oslo.service threading backend.
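
A sketch of selecting the pool type from an environment variable. The variable name below is hypothetical, and the standard library ThreadPoolExecutor stands in for the futurist executor so that the example has no third-party dependency in threading mode:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical variable name, used only for this sketch.
DISABLE_EVENTLET_ENV = "WATCHER_SKETCH_DISABLE_EVENTLET"


def threading_enabled(environ=None):
    """Return True when the service should use native threads."""
    environ = os.environ if environ is None else environ
    value = environ.get(DISABLE_EVENTLET_ENV, "")
    return value.lower() in ("1", "true", "yes")


def create_pool(max_workers, environ=None):
    """Create a native thread pool or a green thread pool."""
    if threading_enabled(environ):
        return ThreadPoolExecutor(max_workers=max_workers)
    # Eventlet is imported lazily so threading mode does not need it.
    import eventlet
    return eventlet.GreenPool(max_workers)
```

Callers only deal with the returned pool, so the choice of backend stays in one place while both modes are supported.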
Change-Id: I8a8be0a7cebdc44005fd77ec960543828c7da318
Signed-off-by: Douglas Viroel <viroel@gmail.com>
The original documentation update review [1]
had some additional comments for improvements.
The commit adds the suggested changes.
[1] https://review.opendev.org/c/openstack/watcher/+/951025
Change-Id: I4b4624e2dbc4c6a5f888ec77d6a03b8f66ff0a23
Adds documentation clarifications on how the
strategy and associated parameters are used.
Closes-Bug: #2112480
Change-Id: Id42c280fc5744bebb01d50b52b834e5b3b76af73
Add clarifications to the documentation to reflect
the actual strategy usage, including:
- updating parameter descriptions
- extending the 'How to Use' section
Closes-Bug: #2111810
Change-Id: Ifd2876056cd8819c50658fb9f213246dc1546d42
This patch adds a table to the strategies page to
show the level of qualification and where the
strategy can be triggered.
Change-Id: I6991566fd5fec3f8bbae06eefa63a8b83a87eed1
Adds a new documentation section that describes which service
integrations are currently supported and their integration status.
This information is not clear today and will help to cover the lack
of testing and documentation about them.
Change-Id: I26b2a2ef5672b78a575a2bdaef3a08d5bbc063bd
Currently we are using the `instance` label to query Prometheus about
host metrics. This label is assigned to the URL of each endpoint being
scraped.
While this works fine in one-exporter-per-compute cases, as the driver is
mapping the fqdn_label value to the `instance` label value, it fails
when there is more than one target with the same value for the fqdn
label. This is a valid case: it should be possible to query by fqdn
without caring which exporter on the host is providing the metric.
This patch changes the queries we use for hosts to be based on the
fqdn_label instead of the instance one. To implement it, we are also
simplifying the way we check that the metric exists for the host by
converting prometheus_fqdn_instance_map into a prometheus_fqdn_labels set
which stores the fqdn values found in Prometheus.
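
A sketch of querying by the fqdn label. The function names, the shape of the targets list and the example metric name are assumptions for illustration and do not reproduce the datasource code:

```python
def collect_fqdn_labels(targets, fqdn_label="fqdn"):
    """Build the set of FQDN values seen across all scrape targets."""
    return {
        target["labels"][fqdn_label]
        for target in targets
        if fqdn_label in target.get("labels", {})
    }


def build_host_query(metric, fqdn, known_fqdns, fqdn_label="fqdn"):
    """Return a selector that matches the host by its FQDN label."""
    if fqdn not in known_fqdns:
        raise KeyError(f"no metrics found for host {fqdn}")
    return f'{metric}{{{fqdn_label}="{fqdn}"}}'


# Two exporters on the same host share one FQDN value.
targets = [
    {"labels": {"instance": "10.0.0.1:9100", "fqdn": "compute-1"}},
    {"labels": {"instance": "10.0.0.1:9200", "fqdn": "compute-1"}},
    {"labels": {"instance": "10.0.0.2:9100", "fqdn": "compute-2"}},
]
known = collect_fqdn_labels(targets)
query = build_host_query("node_load1", "compute-1", known)
```

Because a set is used, several targets with the same FQDN collapse into a single entry, which is the case a one-to-one fqdn to instance map cannot represent.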
Closes-Bug: #2103451
Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e
In order to support the vm_workload_consolidation, workload_balance and
workload_stabilization strategies, some instance metrics are required.
This patch adds support for them.
Implementation is based on a prometheus store populated using sg-core
from ceilometer metrics with Pollster source.
- instance_ram_usage: rely on ceilometer_memory_usage metrics created from
ceilometer memory.usage meter.
- instance_ram_allocated: rely on the memory value provided by the
inventory created from nova and placement APIs.
- instance_cpu_usage: rely on ceilometer_cpu metric created from
ceilometer cpu meter. A max value of 100 is set in the query.
- instance_root_disk_size: rely on the `disk` value provided by the
inventory created from nova and placement APIs.
A new parameter `instance_uuid_label` has been added to the prometheus
datasource configuration to identify the label used to store the value of the
OpenStack instance uuid for each instance metric in prometheus. Default
value is `resource`.
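
A sketch of how a capped CPU usage query might be assembled using the configured instance label. The exact expression is an assumption: it supposes that ceilometer_cpu is a cumulative counter in nanoseconds and uses the PromQL rate() and clamp_max() functions. The helper name is hypothetical.

```python
def instance_cpu_usage_query(uuid, vcpus, period=300,
                             uuid_label="resource"):
    """Build a PromQL expression for CPU usage in percent, capped at 100."""
    # CPU nanoseconds consumed per second over the period.
    rate = f"rate(ceilometer_cpu{{{uuid_label}='{uuid}'}}[{period}s])"
    # Convert to cores, then to a percentage of the allocated vCPUs.
    return f"clamp_max(({rate} / 1e9) * (100 / {vcpus}), 100)"
```

Passing a different `uuid_label` is all that is needed when the instance UUID is stored under another label in a given deployment.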
Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2
This adds a new data source for the Watcher decision engine that
implements the watcher.decision_engine.datasources.DataSourceBase.
The related spec was merged at [1].
Implements: blueprint prometheus-datasource
[1] https://review.opendev.org/c/openstack/watcher-specs/+/933300
Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906
This datasource requires Ceilometer API which was already removed some
years ago. The implementation should have been removed when dependency
on ceilometerclient was removed by [1].
Also remove some job definitions which are not actually used.
[1] 01d74d0a87
Change-Id: I29c3865dc1207f1bbbb266e4217cf8888afebfb6
This change corrects the detected sphinx-lint issues in the existing
docs and updates the contributor devstack guide to call out
required and advanced.
Mostly the changes were simple fixes like replacing the configurable
default role with explicit literal syntax `term` -> ``term``.
Some inline Note: comments have been promoted to .. note:: blocks
and literal blocks :: have been promoted to .. code-block:: <language>
directives.
Change-Id: I6320c313d22bf542ad407169e6538dc6acf79901
This change enables codespell in pre-commit and
fixes the existing typos.
A follow-up commit will enable this in tox and CI.
Change-Id: I0a11bcd5a88247a48d3437525fc8a3cb3cdd4e58
This change adds configuration for the pre-commit tool,
follow-up changes will address the remaining issues in a phased
approach to make the reviews simpler.
This is based on the pre-commit config used in nova
with some additional hooks.
Follow-up changes will address the FIXME comments
related to sphinx-lint and codespell, as well as update tox
to enforce these checks in ci.
Change-Id: I87681a19f7fa88366c2b0d310c8b3153aa6a137b
The "vm workload consolidation" strategy is summing up instance
usage in order to estimate host usage.
The problem is that some infrastructure services (e.g. OVS or Ceph
clients) may also use a significant amount of resources, which
would be ignored. This can impact Watcher's ability to detect
overloaded nodes and correctly rebalance the workload.
This commit will use the host metrics, if available. The proposed
implementation uses the maximum value between the host metric
and the sum of the instance metrics.
Note that we're holding a dict of host metric deltas in order to
account for planned migrations.
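
A sketch of the estimate described above. The function name is hypothetical, and applying the planned migration delta to the host metric before taking the maximum is an assumption about how the deltas are used:

```python
def estimate_host_usage(instance_usages, host_metric=None,
                        planned_delta=0):
    """Estimate host usage from instance metrics and a host metric.

    instance_usages: usage of each instance placed on the host
    host_metric: measured host usage, or None when unavailable
    planned_delta: change expected from migrations already planned
    """
    instance_sum = sum(instance_usages)
    if host_metric is None:
        # Fall back to the sum of the instance metrics.
        return instance_sum
    # Infrastructure services only show up in the host metric.
    return max(host_metric + planned_delta, instance_sum)
```

Taking the maximum means the estimate never drops below what the instances alone account for, while still reflecting overhead from services that run outside any instance.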
Change-Id: I82f474ee613f6c9a7c0a9d24a05cba41d2f68edb
The "cpu_util" metric was deprecated a few years ago.
We'll obtain the same result by converting the cumulative cpu
time to a percentage, leveraging the rate of change aggregation.
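
The conversion can be illustrated with plain arithmetic. This sketch assumes the cumulative CPU time is reported in nanoseconds and that the result is normalized by the number of vCPUs; the function name is hypothetical.

```python
def cpu_percent(cpu_ns_start, cpu_ns_end, interval_s, vcpus):
    """Convert a cumulative CPU time delta into a usage percentage."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    used_s = (cpu_ns_end - cpu_ns_start) / 1e9
    # Fraction of the available CPU time that was used, capped at 100.
    return min(100.0, 100.0 * used_s / (interval_s * vcpus))
```

For example, 30 seconds of CPU time consumed over a 60 second interval on one vCPU corresponds to 50 percent.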
Change-Id: I18fe0de6f74c785e674faceea0c48f44055818fe
Since installing the watcher dashboard is fixed in devstack deployments,
we can update the documentation so it recommends installing the dashboard
plugin.
Change-Id: I284a1ec31536ea258cc1979ffd46b22d3e1ac18b
As per the community goal of migrating the policy file
format from JSON to YAML[1], we need to do two things:
1. Change the default value of the '[oslo_policy] policy_file'
config option from 'policy.json' to 'policy.yaml' with
upgrade checks.
2. Deprecate the JSON formatted policy file on the project side
via warning in doc and releasenotes.
Also replace policy.json references with policy.yaml in docs and tests.
[1] https://governance.openstack.org/tc/goals/selected/wallaby/migrate-policy-format-from-json-to-yaml.html
Change-Id: I207c02ba71fe60635fd3406c9c9364c11f259bae
Switch to openstackdocstheme 2.2.1 and reno 3.1.0 versions. Using
these versions will allow especially:
* Linking from HTML to PDF document
* Allow parallel building of documents
* Fix some rendering problems
Update Sphinx version as well.
Set openstackdocs_pdf_link to link to PDF file. Note that
the link to the published document only works on docs.openstack.org
where the PDF file is placed in the top-level html directory. The
site-preview places the PDF in a pdf directory.
Set openstackdocs_auto_name to False to use 'project' variable as name.
Change pygments_style to 'native' since old theme version always used
'native' and the theme now respects the setting and using 'sphinx' can
lead to some strange rendering.
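
As a sketch, the settings named above could appear in the Sphinx conf.py as follows. The file location and the project value are assumptions made for the example:

```python
# doc/source/conf.py (fragment, location assumed)

# Name shown by the theme when openstackdocs_auto_name is disabled.
project = 'Watcher'

# Add a link from the HTML pages to the PDF build.
openstackdocs_pdf_link = True

# Use the 'project' variable above instead of an auto-detected name.
openstackdocs_auto_name = False

# Match the style the old theme version always used.
pygments_style = 'native'
```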
Remove docs requirements from lower-constraints, they are not needed
during install or test but only for docs building.
openstackdocstheme renames some variables, so follow the renames
before the next release removes them. A couple of variables are also
not needed anymore, remove them.
See also
http://lists.openstack.org/pipermail/openstack-discuss/2020-May/014971.html
Change-Id: Ia9a3fb804fb59bb70edc150a3eb20c07a279170b