watcher

Author	SHA1	Message	Date
Zuul	b1aad46209	Merge "Check result of retype action based on type and status"	2025-09-02 12:44:05 +00:00
Alfredo Moralejo	90009aac84	Check result of retype action based on type and status Currently, when there is a volume_migrate action and migration_type is `retype`, Watcher assumes that the retype always triggers a migration and checks the result of the retype based on the fields related to the migration action (in fact, it uses the same function to check the result when `migration_type` is `retype` or `migrate`). This creates problems in several scenarios: - Actions remain in ONGOING status forever for volumes that have never been migrated, as the migration fields of the volume are empty. - Volumes that were migrated at some earlier time still have the old values, so the status of the retype action may be reported wrongly. This patch implements an entirely new function to check the result of a retype action based on the final type and the status field of the volume. This should be valid for any kind of retype action, with or without migration. The criteria for a successful retype are that the volume's type is the destination type in the action and its status is available or in-use. Closes-Bug: #2112100 Change-Id: I76e91ed99e7a814a43a6dd906b6bcc150d471624 Signed-off-by: jgilaber <jgilaber@redhat.com>	2025-09-01 16:59:38 +02:00
Zuul	e5b18afa01	Merge "Fix doc section to enable cinder notifications"	2025-09-01 14:15:29 +00:00
Zuul	fedc74a5b0	Merge "Update aetos fake data job to disable real metrics"	2025-09-01 12:06:53 +00:00
jgilaber	a4b785e4f1	Fix doc section to enable cinder notifications The section in the Watcher docs that describes how to enable cinder notifications incorrectly tells the user to change the cinder config to send notifications to the watcher.watcher_notifications exchange and topic. Instead, it should instruct the user to change the Watcher notification_topics configuration [1] to listen to 'openstack.notifications', which is the topic cinder uses by default [2]. This patch also adds 'openstack.notifications' to the default value of the 'notification_topics' parameter. [1] https://docs.openstack.org/watcher/latest/configuration/watcher.html#watcher_decision_engine.notification_topics [2] https://docs.openstack.org/cinder/latest/configuration/block-storage/samples/cinder.conf.html Partial-Bug: 2121384 Change-Id: I4dc1a72af79a23c9ca07d2da5ff41bd7741e37d8 Signed-off-by: jgilaber <jgilaber@redhat.com>	2025-09-01 11:23:00 +02:00
Zuul	cdde0fb41e	Merge "Allow status_message updates for actions in SKIPPED state"	2025-08-28 20:04:34 +00:00
Sean Mooney	ef0f35192d	Make Monasca client optional and lazy-load Monasca is deprecated for removal. This change makes the Monasca client an optional dependency and ensures it is only imported and instantiated when the Monasca datasource is explicitly selected. This reduces the default footprint while preserving functionality for deployments that still rely on Monasca. What changed ============ - requirements.txt: remove python-monascaclient from hard deps - setup.cfg: add [options.extras_require] monasca extra - watcher/common/clients.py: lazy import with clear UnsupportedError - watcher/decision_engine/datasources/monasca.py: lazy client property and deferred import of monascaclient.exc; reset on Unauthorized - watcher/decision_engine/datasources/manager.py: unconditionally import Monasca helper and include in metric_map; helper is lazy - tests: conditionally include Monasca based on availability; adjust expectations instead of skipping by default; avoid over-mocking - tox.ini: enable optional extras via WATCHER_EXTRAS env var - docs: datasources index notes Monasca is deprecated and optional - releasenotes: upgrade note with install example and behavior Why === - Allow deployments not using Monasca to run without the client - Keep Monasca functional when explicitly installed via extras - Provide clear operator guidance and smooth upgrades Compatibility ============= - No change for deployments that do not use Monasca - Deployments using Monasca must install the optional extra: pip install watcher[monasca] Testing ======= - Default: tox -e py3 - With Monasca: WATCHER_EXTRAS=monasca tox -e py3 Assisted-By: GPT-5 (Cursor) Closes-Bug: #2120192 Change-Id: I7c02b74e83d656083ce612727e6da58761200ae4 Signed-off-by: Sean Mooney <work@seanmooney.info>	2025-08-28 16:53:48 +01:00
Sean Mooney	c9bfb763c2	Allow status_message updates for actions in SKIPPED state Fixed action status_message update restrictions to allow updates when action is already in SKIPPED state. Previously, users could only update the status_message when initially transitioning to SKIPPED state. Changes include: - Modified validation logic to allow status_message updates for SKIPPED actions - Changed exception type from PatchError to Conflict for better semantics - Added comprehensive test coverage for the new behavior - Updated API documentation and samples - Added release note documenting the fix This enables administrators to fix typos, provide more detailed explanations, or expand on reasons in action status messages after the action has been skipped. Generated-By: claude-code Closes-Bug: #2121601 Change-Id: I64def708389a8ecd32080fba1638a4499ead349d Signed-off-by: Sean Mooney <work@seanmooney.info>	2025-08-28 16:16:01 +01:00
morenod	eb3fdb1e97	Update aetos fake data job to disable real metrics Job watcher-aetos-integration is failing because real metrics coming from ceilometer are enabled. We need to disable ceilometer-acompute and node_exporter so that only injected data is considered when Watcher queries Prometheus to make decisions. Change-Id: If4f2c3f6f89527d768c48f1ca4967339837bb994 Signed-off-by: morenod <dsanzmor@redhat.com>	2025-08-28 10:51:08 +00:00
Zuul	848cde3606	Merge "Rename confusing query timeout options"	2025-08-28 09:26:40 +00:00
Zuul	63cf35349c	Merge "Extend compute model attributes"	2025-08-27 16:40:53 +00:00
Takashi Kajinami	7106a12251	Rename confusing query timeout options These options do not actually define a timeout but an interval. Rename the options to reflect what they actually define. The existing deprecated options in the [gnocchi_client] section are also removed, because they have been kept for 6 years. In addition, fix an inconsistent name (query vs call). Change-Id: Ib29115746a25b45bdff1c3da8df9d7167c2db662 Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>	2025-08-27 23:22:45 +09:00
Douglas Viroel	03c09825f7	Extend compute model attributes This patch extends compute model attributes by adding new fields to the Instance element. Values are populated by the nova collector, using the same nova list call, but this requires a more recent compute API microversion. A new config option was added to allow users to enable or disable the extended attributes, and it is disabled by default. Configure prometheus-based jobs to run on a newer version of the nova API (2.96) and enable the extended attributes collection. Implements: bp/extend-compute-model-attributes Assisted-By: Cursor (claude-4-sonnet) Change-Id: Ibf31105d780dce510a59fc74241fa04e28529ade Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-26 11:35:18 -03:00
Douglas Viroel	2452c1e541	Follow up changes for skip-action blueprint These are some of the changes requested in reviews of the series of patches for the add-skip-action blueprint. Some of them may require a separate patch, since they would touch more files that are not related to this feature. Change-Id: I9e30ca385e7b184ab19449a60db6f6d0f3c0e1b9 Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-26 10:27:57 -03:00
Zuul	d91b550fc9	Merge "Fix missing watcher_workflow_engines.taskflow section"	2025-08-26 13:16:19 +00:00
Zuul	1668b9b9f8	Merge "API changes for skipped actions: patch actions and status_message"	2025-08-26 12:54:31 +00:00
Zuul	5e05b50048	Merge "Skip actions automatically based on pre_condition results"	2025-08-26 12:33:08 +00:00
Zuul	4d8f86b432	Merge "Fix NovaHelper microversion comparison"	2025-08-25 19:18:57 +00:00
Zuul	05d8f0e3c8	Merge "Validate endpoint_type option at loading"	2025-08-25 12:06:44 +00:00
Takashi Kajinami	1a87abc666	Fix missing watcher_workflow_engines.taskflow section ... caused by AttributeError. Closes-Bug: #2121286 Change-Id: I52bab27afdc96d8ce2d9733316737c3aa505f5fe Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>	2025-08-24 22:58:28 +09:00
Zuul	fa4552b93f	Merge "Fix type mismatch between option and its default"	2025-08-24 13:21:43 +00:00
Takashi Kajinami	a07bfa141d	Fix type mismatch between option and its default ... to avoid the following warning. ``` UserWarning: converting '1' to a string warnings.warn('converting \'%s\' to a string' % str_val) ``` Change-Id: I852d63523d3582f00d4d7953199181e3d2b6a885 Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>	2025-08-24 04:22:33 +09:00
Zuul	a6668a1b39	Merge "Update Overload standard deviation doc"	2025-08-22 15:22:04 +00:00
Zuul	534c340df1	Merge "Add new tests to validate GET /infra-optim/v1/data_model"	2025-08-22 14:16:05 +00:00
Zuul	a963e0ff85	Merge "Fix api-ref doc for GET /infra-optim/v1/data_model"	2025-08-22 14:03:15 +00:00
Ronelle Landy	457819072f	Update Overload standard deviation doc Bug #2113862 details a number of suggested corrections and additions to the Workload Stabilization doc. This patch adds those suggested changes. Closes-Bug: #2113862 Assisted-By: Cursor (claude-3.5-sonnet) Change-Id: I4131a304c064d2ea397b2447025c7edf69a56e2a Signed-off-by: Ronelle Landy <rlandy@redhat.com>	2025-08-21 11:09:46 -04:00
Zuul	6d155c4be6	Merge "Add `status_message` to objects and notifications"	2025-08-21 14:59:53 +00:00
Zuul	83fea206df	Merge "Add `status_message` column to Actions, Audits and ActionPlans tables"	2025-08-21 14:50:46 +00:00
Zuul	00a3edeac6	Merge "Add parameters to force failures in nop action"	2025-08-21 14:32:37 +00:00
Zuul	b69642181b	Merge "Add patch call validation based on allowed_attrs"	2025-08-21 14:24:09 +00:00
Zuul	616c8f4cc4	Merge "Add options to disable migration in host maintenance"	2025-08-21 14:11:22 +00:00
Quang Ngo	cc26b3b334	Add options to disable migration in host maintenance This change enhances the Host Maintenance strategy by introducing two new input parameters: `disable_live_migration` and `disable_cold_migration`. These parameters allow cloud administrators to control whether live or cold migration should be considered during host maintenance operations. If `disable_live_migration` is set, active instances will be cold migrated if `disable_cold_migration` is not set, otherwise active instances will be stopped. If `disable_cold_migration` is set, inactive instances will not be cold migrated. If both are set, only stop actions will be performed on instances. The strategy logic and action plan generation have been updated to reflect these behaviors. A new "stop" action is introduced and registered, and the weight planner is updated to handle the new action. Documentation for the Host Maintenance strategy is updated to describe the new parameters and their effects. Test Plan: - Unit tests for HostMaintenance strategy with new parameters - Integration tests for action plan generation with stop action This implements the specification: Spec: https://review.opendev.org/c/openstack/watcher-specs/+/943873 Change-Id: I201b8e5c52e1bc1a74f3886a0e301e3c0fa5d351 Signed-off-by: Quang Ngo <quang.ngo@canonical.com>	2025-08-20 22:32:33 +10:00
Douglas Viroel	9003906bdc	Fix NovaHelper microversion comparison Fixes the microversion comparison in both enable and disable nova-compute service methods in NovaHelper. The previous implementation was incorrect and started to fail for microversions greater than 2.99. Closes-Bug: #2120586 Assisted-By: Cursor (claude-4-sonnet) Change-Id: I69da7f10cd5b42f7d4613d8947bca3e382815c3f Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-20 08:35:18 -03:00
Alfredo Moralejo	e06f1b0475	API changes for skipped actions: patch actions and status_message This patch implements the changes in the API required for the skipped action blueprint. It includes: - New field `status_message` is visible in API get calls for Audits, ActionPlans and Actions. - New Patch call is added to `/actions/{action_id}` which allows to manually move actions in PENDING state to SKIPPED for ActionPlans which have not been started. - A new API microversion 1.5 is added for these changes. It also adds the required tests and documentation. Implements: blueprint add-skip-actions Assisted-By: Cursor (claude-4-sonnet) Change-Id: I71fb9af76085e5941a7fd3e9e4c89d6f3a3ada47 Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>	2025-08-20 13:13:19 +02:00
Alfredo Moralejo	6d35be11ec	Skip actions automatically based on pre_condition results This patch implements automatically skipping actions based on the result of the action pre_condition method. This allows properly handling situations such as migration actions for VMs that no longer exist. This patch includes: - Adding a new state SKIPPED to the Action objects. - Add a new Exception ActionSkipped. An action which raises it from the pre_condition execution is moved to SKIPPED state. - pre_condition will not be executed for any action in SKIPPED state. - execute will not be executed for any action in SKIPPED or FAILED state. - post_condition will not be executed for any action in SKIPPED state. - moving the transition to ONGOING from pre_condition to execute. That means that actions raising ActionSkipped will move from PENDING to SKIPPED while actions raising any other Exception will move from PENDING to FAILED. - Adding information on action failed or skipped state to the `status_message` field. - Adding a new option to the testing action nop to simulate skipping on pre_condition, so that we can easily test it. Implements: blueprint add-skip-actions Assisted-By: Cursor (claude-4-sonnet) Change-Id: I59cb4c7006c7c3bcc5ff2071886d3e2929800f9e Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>	2025-08-20 13:10:10 +02:00
Takashi Kajinami	1009c3781b	Validate endpoint_type option at loading ... instead of documenting the supported values, so that a more explicit error is presented to users. Also drop redundant descriptions of the default values. The default values are added to the generated sample config files, so they don't have to be explained in help texts. Change-Id: I12b201da3e742b55f6cfcf71bdd4413cbf3ee4e5 Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>	2025-08-20 01:44:59 +09:00
Alfredo Moralejo	5048a6e3ba	Add `status_message` to objects and notifications This patch is part of the skipped action blueprint. It adds the `status_message` field to the Audit, ActionPlan and Action objects and all related notifications. It bumps the versions of all the affected objects and notifications and updates the tests to include the new fields. Change-Id: I3b9467e7e37188e647379cd9c4cbbda8ed75383f Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>	2025-08-19 13:01:00 +02:00
Alfredo Moralejo	84742be8c2	Add `status_message` column to Actions, Audits and ActionPlans tables This patch implements the changes in the database required for the skipped action blueprint. It just adds a new nullable column to the required tables and adds tests for it. Note that I am also introducing a fix to existing table tests which are affected by the changes in the objects. Implements: blueprint add-skip-actions Change-Id: I027bc3861b589bd281a7216583a8c5c351a53c57 Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>	2025-08-19 11:05:39 +02:00
Alfredo Moralejo	1fb89aeac3	Add parameters to force failures in nop action In order to test the different code paths for action execution it is very useful to be able to make the actions fail in the different execution stages. This patch adds three new options `fail_pre_condition`, `fail_execute` and `fail_post_condition`. Setting any of them to True makes the action fail in the specified step. Change-Id: Ied8c0bb767d9bb6bdfb9209365857a3b4d606b40 Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>	2025-08-19 11:05:11 +02:00
Alfredo Moralejo	1a9f17748e	Add patch call validation based on allowed_attrs Currently, patch call field validations are done based on exclusion: all the fields can be patched unless included in the `internal_attrs` list. This patch adds a new validation rule based on field inclusion in an `allowed_attrs` list. When that list is non-empty, only the fields included in it can be patched. In order to keep the existing behavior for the existing patch calls, I am defining the list as empty, so that the remaining validation rules are applied and the current behavior is not affected. Change-Id: I22010649332c8fb872446a9d0483a0303a4eba3b Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>	2025-08-19 11:01:20 +02:00
Zuul	90f0c2264c	Merge "use cinder migrate for swap volume"	2025-08-18 20:32:42 +00:00
Sean Mooney	3742e0a79c	use cinder migrate for swap volume This change removes Watcher's in-tree functionality for swapping instance volumes and defines swap as an alias of cinder volume migrate. The Watcher native implementation was missing error handling, which could lead to irretrievable data loss. The removed code also forged project user credentials to perform admin requests as if they were made by a member of a project. This was unsafe and posed a security risk due to how it was implemented. This code has been removed without replacement. While some effort has been made to allow existing audits that were defined to work, any reduction of functionality as a result of this security hardening is intentional. Closes-Bug: #2112187 Change-Id: Ic3b6bfd164e272d70fe86d7b182478dd962f8ac0 Signed-off-by: Sean Mooney <work@seanmooney.info>	2025-08-18 16:35:38 +00:00
Jaromir Wysoglad	8309d9848a	Add Aetos datasource Implement the spec for multi-tenancy support for metrics. This adds a new 'Aetos' datasource very similar to the current Prometheus datasource. Because of that, the original PrometheusHelper class was split into two classes and the base class is used for PrometheusHelper and for AetosHelper. Apart from the split, there is one more change to the original PrometheusHelper class code: the addition and use of the _get_fqdn_label() and _get_instance_uuid_label() methods. As part of the change, I refactored the current prometheus datasource unit tests. Most of them are now used to test the PrometheusBase class with minimal changes. Changes I've made to the original tests: - the ones that can be used to test the base class are moved into the TestPrometheusBase class - the _setup_prometheus_client, _get_instance_uuid_label and _get_fqdn_label functions are mocked in the base class tests. Their concrete implementations are tested in each datasource's tests separately. - a self._create_helper() is used to instantiate the helper class with correct mocking. - all config value modifications in the original tests were moved out; instead of modifying the config values, the _get_* methods are mocked to return the wanted values - to keep similar test coverage, config retrieval is tested for each concrete class by testing the _get_* methods. New watcher-aetos-integration and watcher-aetos-integration-realdata zuul jobs are added to test the new datasource. These use the same set of tempest tests as the current watcher-prometheus-integration jobs. The only difference is the environment setup and the Watcher config, so that the job deploys Aetos and Watcher uses it instead of accessing Prometheus directly. At first this was generated by asking cursor to implement the linked spec with some additional prompts for some smaller changes. Afterwards I manually went through the code doing some cleanups, ensuring it complies with PEP8 and hacking and so on. Later on I manually adjusted the code to use the latest observabilityclient changes. The zuul job was also mostly generated by cursor. Implements: https://blueprints.launchpad.net/watcher/+spec/prometheus-multitenancy-support Generated-By: Cursor with claude-4-sonnet model Change-Id: I72c2171f72819bbde6c9cbbf565ee895e5d2bd53 Signed-off-by: Jaromir Wysoglad <jwysogla@redhat.com>	2025-08-14 02:27:24 -04:00
Zuul	355671e979	Merge "Add a new tox environment to run unit tests in threading mode"	2025-08-13 21:37:19 +00:00
Douglas Viroel	9becb68495	Add new tests to validate GET /infra-optim/v1/data_model The data_model list API response comes from the model to_list() method, which generates both server_* and node_* attributes from Instance and Node classes fields[1]. Any change on these classes can break the data_model list API and require a new microversion. These tests validate the current expected fields. [1] `5ba086095c/watcher/decision_engine/model/model_root.py (L250-L270)` Change-Id: I77fac162101013aa923272aa99c7c6695cc5fdca Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-12 09:47:01 -03:00
Douglas Viroel	37faf614e2	Fix api-ref doc for GET /infra-optim/v1/data_model Some response parameters from GET /infra-optim/v1/data_model endpoint are missing from api-ref documentation. This patch updates the doc to include them. For more details, see LP #2117726 Closes-Bug: #2117726 Change-Id: Iaa775f56bb8167d9c6b458cd07f1ec3cefaf70fe Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-12 09:47:01 -03:00
Zuul	4080d5767d	Merge "Disable real metrics on devstack injected data jobs"	2025-08-11 17:10:32 +00:00
Zuul	9925fd2cc9	Merge "Replace dateutils usage with datetime and oslo.utils"	2025-08-07 20:46:25 +00:00
Zuul	27baff5184	Merge "Extend decision engine to support threading mode"	2025-08-06 15:38:31 +00:00
Douglas Viroel	8ca794cdbb	Add a new tox environment to run unit tests in threading mode It is done by disabling the eventlet patching and configuring the oslo.service backend to threading. Once the oslo.service backend is configured, it can't be reverted to eventlet. This needs to be done before importing other modules that may import the oslo.service library. Adds a job that runs a subset of tests with eventlet patching disabled. Change-Id: I9f8c2c5bbcf3192313cc3b309e8f2719a3bea18f Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-05 16:50:29 -03:00


2687 Commits
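Commit 90009aac84 replaces the migration-field check for retypes with a rule based only on the volume's final type and status. The sketch below illustrates that rule under stated assumptions: the `Volume` dataclass and `retype_succeeded` function are hypothetical, and the real Watcher code reads these fields from Cinder rather than from a local object.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical stand-in for the Cinder volume fields the check needs."""
    volume_type: str
    status: str


# Statuses treated as a finished, usable volume by the rule in 90009aac84.
OK_STATUSES = frozenset({"available", "in-use"})


def retype_succeeded(volume: Volume, destination_type: str) -> bool:
    """Return True if the retype reached its target type and a usable status.

    Migration fields are deliberately ignored, so the same check works whether
    or not Cinder had to migrate the volume to perform the retype.
    """
    return volume.volume_type == destination_type and volume.status in OK_STATUSES


print(retype_succeeded(Volume("fast-ssd", "in-use"), "fast-ssd"))    # True
print(retype_succeeded(Volume("slow-hdd", "retyping"), "fast-ssd"))  # False
```

A `False` result here does not distinguish "still in progress" from "failed"; the commit message does not describe how Watcher polls or times out, so that part is left out of the sketch.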
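Commit ef0f35192d makes the Monasca client optional and imports it only when the Monasca datasource is selected. The following is a minimal sketch of that lazy-import pattern using only `importlib` from the standard library; `UnsupportedError`, `import_optional` and `LazyDatasourceHelper` are hypothetical names, not Watcher's actual classes. The `watcher[monasca]` install hint is taken from the commit message.

```python
import importlib


class UnsupportedError(Exception):
    """Hypothetical stand-in for the error raised when an extra is missing."""


def import_optional(module_name: str, extra: str):
    """Import an optional dependency, or explain how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise UnsupportedError(
            f"'{module_name}' is not installed; install the optional extra, "
            f"e.g. pip install watcher[{extra}]"
        ) from exc


class LazyDatasourceHelper:
    """Hypothetical helper whose client module is imported on first use."""

    def __init__(self, module_name: str = "monascaclient", extra: str = "monasca"):
        self._module_name = module_name
        self._extra = extra
        self._client = None  # nothing is imported at construction time

    @property
    def client(self):
        if self._client is None:
            module = import_optional(self._module_name, self._extra)
            # A real helper would build a client object from `module` here;
            # this sketch simply caches the imported module.
            self._client = module
        return self._client

    def reset_client(self):
        """Drop the cached client, e.g. after an Unauthorized error."""
        self._client = None


helper = LazyDatasourceHelper("json", "demo")  # stdlib module stands in for the client
print(helper.client.dumps({"ok": True}))       # {"ok": true}
```

Deferring the import to the property means deployments that never select the datasource never pay for, or fail on, the missing package.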
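Commit c9bfb763c2 relaxes the `status_message` rule so it can be edited both when an action is moved to SKIPPED and when it is already SKIPPED, raising a Conflict otherwise. A minimal sketch of that validation, assuming the patch is represented as a plain dict of field names to values; `Conflict` and `check_status_message_patch` are hypothetical, and the real API applies further checks (such as the action plan state) that are omitted here.

```python
class Conflict(Exception):
    """Hypothetical stand-in for an HTTP 409 Conflict raised by the API."""


def check_status_message_patch(current_state: str, patch: dict) -> None:
    """Reject status_message changes unless the action is (or becomes) SKIPPED.

    `patch` maps field names to new values; the target state is the patched
    state if present, otherwise the action's current state.
    """
    if "status_message" not in patch:
        return
    target_state = patch.get("state", current_state)
    if target_state != "SKIPPED":
        raise Conflict(
            f"status_message can only be updated for SKIPPED actions "
            f"(target state: {target_state})"
        )


# Allowed: moving PENDING -> SKIPPED with a reason.
check_status_message_patch("PENDING", {"state": "SKIPPED", "status_message": "not needed"})
# Allowed after c9bfb763c2: fixing the message of an already skipped action.
check_status_message_patch("SKIPPED", {"status_message": "not needed: host drained manually"})
```

Deriving a single target state covers both the initial transition and later edits with one comparison.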
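The per-instance rules from cc26b3b334 (Host Maintenance with `disable_live_migration` and `disable_cold_migration`) can be written as a small decision function. This is a sketch, not the strategy's code: it assumes the default behaviour, implied by the commit, is live migration for active instances and cold migration for inactive ones, and the returned strings are illustrative labels rather than Watcher action names.

```python
from typing import Optional


def choose_instance_action(is_active: bool,
                           disable_live_migration: bool = False,
                           disable_cold_migration: bool = False) -> Optional[str]:
    """Pick an action for one instance on a host entering maintenance.

    Returns an illustrative label, or None when no action is planned.
    """
    if is_active:
        if not disable_live_migration:
            return "live_migrate"
        if not disable_cold_migration:
            return "cold_migrate"  # live migration disabled: fall back to cold
        return "stop"              # both disabled: only stop is left
    # Inactive instances are only candidates for cold migration.
    if not disable_cold_migration:
        return "cold_migrate"
    return None


for active in (True, False):
    for no_live in (False, True):
        for no_cold in (False, True):
            print(active, no_live, no_cold,
                  choose_instance_action(active, no_live, no_cold))
```

With both flags set, the only non-None result is "stop", matching the commit's statement that only stop actions are performed in that case.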
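Commit 9003906bdc does not show how the old NovaHelper comparison was written. A common way for a version check to break exactly past 2.99 is comparing microversions as floats or strings, so the sketch below contrasts that pitfall with a comparison on integer tuples. The function names are hypothetical and 2.53 is only an example threshold.

```python
from typing import Tuple


def parse_microversion(version: str) -> Tuple[int, int]:
    """Split 'MAJOR.MINOR' into integers so that 2.100 > 2.99."""
    major, minor = version.split(".")
    return int(major), int(minor)


def microversion_at_least(current: str, required: str) -> bool:
    return parse_microversion(current) >= parse_microversion(required)


# Naive comparisons break once the minor number reaches three digits:
print(float("2.100") >= float("2.53"))         # False, because 2.100 == 2.1
print("2.100" >= "2.53")                       # False, because '1' < '5'
print(microversion_at_least("2.100", "2.53"))  # True
```

Tuple comparison orders by major version first and then by minor version, which is how microversions are meant to be ordered.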
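Commit 6d35be11ec changes how an action moves through its states: ActionSkipped from pre_condition leads to SKIPPED, any other pre_condition error leads to FAILED, and ONGOING is only entered before execute. The sketch below models that flow together with the nop failure options from 1fb89aeac3. `NopLikeAction` and `run_action` are hypothetical, the `skip` flag is a made-up name (the commit does not name the nop skip option), and the status messages are invented; the state names follow Watcher's action states.

```python
class ActionSkipped(Exception):
    """Raised from pre_condition to request that the action be skipped."""


class NopLikeAction:
    """Hypothetical action with the three execution stages.

    The fail_* flags mirror the nop options from 1fb89aeac3; `skip` is a
    made-up name for the "simulate skipping" option.
    """

    def __init__(self, skip=False, fail_pre_condition=False,
                 fail_execute=False, fail_post_condition=False):
        self.state = "PENDING"
        self.status_message = None
        self.skip = skip
        self.fail_pre_condition = fail_pre_condition
        self.fail_execute = fail_execute
        self.fail_post_condition = fail_post_condition

    def pre_condition(self):
        if self.skip:
            raise ActionSkipped("target no longer exists")
        if self.fail_pre_condition:
            raise RuntimeError("pre_condition failed")

    def execute(self):
        if self.fail_execute:
            raise RuntimeError("execute failed")

    def post_condition(self):
        if self.fail_post_condition:
            raise RuntimeError("post_condition failed")


def run_action(action: NopLikeAction) -> str:
    """Drive one action through its stages and return the final state."""
    if action.state in ("SKIPPED", "FAILED"):
        return action.state  # e.g. skipped manually through the API: run nothing
    try:
        action.pre_condition()
    except ActionSkipped as exc:
        action.state = "SKIPPED"  # PENDING -> SKIPPED
        action.status_message = f"Action skipped: {exc}"
        return action.state
    except Exception as exc:
        action.state = "FAILED"   # PENDING -> FAILED, never ONGOING
        action.status_message = f"Action failed in pre_condition: {exc}"
        return action.state
    action.state = "ONGOING"      # only reached once pre_condition has passed
    try:
        action.execute()
        action.post_condition()
    except Exception as exc:
        action.state = "FAILED"
        action.status_message = f"Action failed: {exc}"
        return action.state
    action.state = "SUCCEEDED"
    return action.state
```

In this sketch post_condition is not reached when execute fails; the commit message does not specify that case, so treat it as a simplification.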
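Commit 1a9f17748e adds an allow-list (`allowed_attrs`) next to the existing deny-list (`internal_attrs`) for PATCH validation. A minimal sketch, assuming the deny-list keeps applying and the allow-list is an additional rule only when non-empty (the commit calls it "a new validation rule"); the function name and field names are hypothetical.

```python
from typing import Iterable, Set


def rejected_patch_fields(patched_fields: Iterable[str],
                          internal_attrs: Set[str],
                          allowed_attrs: Set[str] = frozenset()) -> Set[str]:
    """Return the fields a PATCH request is not allowed to change.

    internal_attrs: deny-list that always applies (the pre-existing rule).
    allowed_attrs: allow-list; when non-empty, anything outside it is rejected.
    An empty allow-list preserves the old, deny-list-only behaviour.
    """
    fields = set(patched_fields)
    rejected = fields & internal_attrs
    if allowed_attrs:
        rejected |= fields - allowed_attrs
    return rejected


# Old behaviour (empty allow-list): only internal fields are refused.
print(rejected_patch_fields({"state", "uuid"}, internal_attrs={"uuid"}))
# Allow-list in use: only state and status_message may be patched.
print(sorted(rejected_patch_fields({"state", "parameters"}, {"uuid"},
                                   allowed_attrs={"state", "status_message"})))
```

Defaulting `allowed_attrs` to empty is what keeps existing PATCH endpoints unchanged, as the commit describes.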