watcher

Author	SHA1	Message	Date
Sean Mooney	9b1adaa7c7	Add 2025.2 release notes prelude. The prelude provides a high-level overview of the security improvements, operational enhancements, and new monitoring capabilities for operators. Assisted-By: claude-code Change-Id: Ia2c1409d26aca0eddfb1685e9009305215c2405a Signed-off-by: Sean Mooney <work@seanmooney.info>	2025-09-09 17:43:53 +01:00
Zuul	b1aad46209	Merge "Check result of retype action based on type and status"	2025-09-02 12:44:05 +00:00
Alfredo Moralejo	90009aac84	Check result of retype action based on type and status. Currently, when there is a volume_migrate action and migration_type is `retype`, watcher assumes that the retype always triggers a migration and checks the result of the retype based on the fields related to the migration action (in fact, it uses the same function to check the result when `migration_type` is `retype` or `migrate`). This causes problems in different scenarios: - Actions remain in ONGOING status forever for volumes that have never been migrated, as the migration fields of the volume are empty. - For volumes that were migrated at some point before, those fields still hold the old values, so the status of the retype action may be reported wrongly. This patch implements an entirely new function to check the result of a retype action based on the final type and the status field of the volume. This should be valid for any kind of retype action, with or without migration. The criterion for a successful retype is that the volume's type is the destination type in the action and its status is available or in-use. Closes-Bug: #2112100 Change-Id: I76e91ed99e7a814a43a6dd906b6bcc150d471624 Signed-off-by: jgilaber <jgilaber@redhat.com>	2025-09-01 16:59:38 +02:00
Zuul	e5b18afa01	Merge "Fix doc section to enable cinder notifications"	2025-09-01 14:15:29 +00:00
jgilaber	a4b785e4f1	Fix doc section to enable cinder notifications. The section in the Watcher docs that describes how to enable cinder notifications incorrectly tells the user to change the cinder config to send notifications to the watcher.watcher_notifications exchange and topic. Instead, it should instruct the user to change the Watcher notification_topics option [1] so that it listens to 'openstack.notifications', which is the topic cinder uses by default [2]. This patch also adds 'openstack.notifications' to the default value of the 'notification_topics' parameter. [1] https://docs.openstack.org/watcher/latest/configuration/watcher.html#watcher_decision_engine.notification_topics [2] https://docs.openstack.org/cinder/latest/configuration/block-storage/samples/cinder.conf.html Partial-Bug: 2121384 Change-Id: I4dc1a72af79a23c9ca07d2da5ff41bd7741e37d8 Signed-off-by: jgilaber <jgilaber@redhat.com>	2025-09-01 11:23:00 +02:00
Zuul	cdde0fb41e	Merge "Allow status_message updates for actions in SKIPPED state"	2025-08-28 20:04:34 +00:00
Sean Mooney	ef0f35192d	Make Monasca client optional and lazy-load. Monasca is deprecated for removal. This change makes the Monasca client an optional dependency and ensures it is only imported and instantiated when the Monasca datasource is explicitly selected. This reduces the default footprint while preserving functionality for deployments that still rely on Monasca. What changed: - requirements.txt: remove python-monascaclient from hard deps - setup.cfg: add [options.extras_require] monasca extra - watcher/common/clients.py: lazy import with clear UnsupportedError - watcher/decision_engine/datasources/monasca.py: lazy client property and deferred import of monascaclient.exc; reset on Unauthorized - watcher/decision_engine/datasources/manager.py: unconditionally import Monasca helper and include in metric_map; helper is lazy - tests: conditionally include Monasca based on availability; adjust expectations instead of skipping by default; avoid over-mocking - tox.ini: enable optional extras via WATCHER_EXTRAS env var - docs: datasources index notes Monasca is deprecated and optional - releasenotes: upgrade note with install example and behavior Why: - Allow deployments not using Monasca to run without the client - Keep Monasca functional when explicitly installed via extras - Provide clear operator guidance and smooth upgrades Compatibility: - No change for deployments that do not use Monasca - Deployments using Monasca must install the optional extra: pip install watcher[monasca] Testing: - Default: tox -e py3 - With Monasca: WATCHER_EXTRAS=monasca tox -e py3 Assisted-By: GPT-5 (Cursor) Closes-Bug: #2120192 Change-Id: I7c02b74e83d656083ce612727e6da58761200ae4 Signed-off-by: Sean Mooney <work@seanmooney.info>	2025-08-28 16:53:48 +01:00
Sean Mooney	c9bfb763c2	Allow status_message updates for actions in SKIPPED state. Fixed action status_message update restrictions to allow updates when the action is already in SKIPPED state. Previously, users could only update the status_message when initially transitioning to SKIPPED state. Changes include: - Modified validation logic to allow status_message updates for SKIPPED actions - Changed exception type from PatchError to Conflict for better semantics - Added comprehensive test coverage for the new behavior - Updated API documentation and samples - Added release note documenting the fix This enables administrators to fix typos, provide more detailed explanations, or expand on reasons in action status messages after the action has been skipped. Generated-By: claude-code Closes-Bug: #2121601 Change-Id: I64def708389a8ecd32080fba1638a4499ead349d Signed-off-by: Sean Mooney <work@seanmooney.info>	2025-08-28 16:16:01 +01:00
Zuul	848cde3606	Merge "Rename confusing query timeout options"	2025-08-28 09:26:40 +00:00
Takashi Kajinami	7106a12251	Rename confusing query timeout options. These options do not actually define a timeout but an interval. Rename the options to reflect what they actually define. The existing deprecated options in the [gnocchi_client] section are also removed, because they have been kept for 6 years. In addition, fix an inconsistent name (query vs call). Change-Id: Ib29115746a25b45bdff1c3da8df9d7167c2db662 Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>	2025-08-27 23:22:45 +09:00
Douglas Viroel	03c09825f7	Extend compute model attributes. This patch extends the compute model attributes by adding new fields to the Instance element. Values are populated by the nova collector using the same nova list call, but they require a more recent compute API microversion. A new config option was added to allow users to enable or disable the extended attributes; it is disabled by default. The Prometheus-based jobs are configured to run on a newer version of the nova API (2.96) and enable the extended attributes collection. Implements: bp/extend-compute-model-attributes Assisted-By: Cursor (claude-4-sonnet) Change-Id: Ibf31105d780dce510a59fc74241fa04e28529ade Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-26 11:35:18 -03:00
Zuul	1668b9b9f8	Merge "API changes for skipped actions: patch actions and status_message"	2025-08-26 12:54:31 +00:00
Zuul	4d8f86b432	Merge "Fix NovaHelper microversion comparison"	2025-08-25 19:18:57 +00:00
Zuul	a6668a1b39	Merge "Update Overload standard deviation doc"	2025-08-22 15:22:04 +00:00
Zuul	a963e0ff85	Merge "Fix api-ref doc for GET /infra-optim/v1/data_model"	2025-08-22 14:03:15 +00:00
Ronelle Landy	457819072f	Update Overload standard deviation doc. Bug #2113862 details a number of suggested corrections and additions to the Workload Stabilization doc. This patch adds those suggested changes. Closes-Bug: #2113862 Assisted-By: Cursor (claude-3.5-sonnet) Change-Id: I4131a304c064d2ea397b2447025c7edf69a56e2a Signed-off-by: Ronelle Landy <rlandy@redhat.com>	2025-08-21 11:09:46 -04:00
Zuul	00a3edeac6	Merge "Add parameters to force failures in nop action"	2025-08-21 14:32:37 +00:00
Zuul	616c8f4cc4	Merge "Add options to disable migration in host maintenance"	2025-08-21 14:11:22 +00:00
Quang Ngo	cc26b3b334	Add options to disable migration in host maintenance. This change enhances the Host Maintenance strategy by introducing two new input parameters: `disable_live_migration` and `disable_cold_migration`. These parameters allow cloud administrators to control whether live or cold migration should be considered during host maintenance operations. If `disable_live_migration` is set, active instances will be cold migrated if `disable_cold_migration` is not set; otherwise, active instances will be stopped. If `disable_cold_migration` is set, inactive instances will not be cold migrated. If both are set, only stop actions will be performed on instances. The strategy logic and action plan generation have been updated to reflect these behaviors. A new "stop" action is introduced and registered, and the weight planner is updated to handle the new action. Documentation for the Host Maintenance strategy is updated to describe the new parameters and their effects. Test Plan: - Unit tests for HostMaintenance strategy with new parameters - Integration tests for action plan generation with stop action This implements the specification: Spec: https://review.opendev.org/c/openstack/watcher-specs/+/943873 Change-Id: I201b8e5c52e1bc1a74f3886a0e301e3c0fa5d351 Signed-off-by: Quang Ngo <quang.ngo@canonical.com>	2025-08-20 22:32:33 +10:00
Douglas Viroel	9003906bdc	Fix NovaHelper microversion comparison. Fixes the microversion comparison in both the enable and disable nova-compute service methods in NovaHelper. The previous implementation was incorrect and started to fail for microversions greater than 2.99. Closes-Bug: #2120586 Assisted-By: Cursor (claude-4-sonnet) Change-Id: I69da7f10cd5b42f7d4613d8947bca3e382815c3f Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-20 08:35:18 -03:00
Alfredo Moralejo	e06f1b0475	API changes for skipped actions: patch actions and status_message. This patch implements the changes in the API required for the skipped actions blueprint. It includes: - A new field `status_message`, visible in API GET calls for Audits, ActionPlans and Actions. - A new PATCH call on `/actions/{action_id}`, which allows manually moving actions in PENDING state to SKIPPED for ActionPlans that have not been started. - A new API microversion 1.5 for these changes. It also adds the required tests and documentation. Implements: blueprint add-skip-actions Assisted-By: Cursor (claude-4-sonnet) Change-Id: I71fb9af76085e5941a7fd3e9e4c89d6f3a3ada47 Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>	2025-08-20 13:13:19 +02:00
Alfredo Moralejo	1fb89aeac3	Add parameters to force failures in nop action. To test the different code paths for action execution, it is very useful to be able to make actions fail at the different execution stages. This patch adds three new options: `fail_pre_condition`, `fail_execute` and `fail_post_condition`. Setting any of them to True makes the action fail at the corresponding step. Change-Id: Ied8c0bb767d9bb6bdfb9209365857a3b4d606b40 Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>	2025-08-19 11:05:11 +02:00
Zuul	90f0c2264c	Merge "use cinder migrate for swap volume"	2025-08-18 20:32:42 +00:00
Sean Mooney	3742e0a79c	use cinder migrate for swap volume. This change removes watcher's in-tree functionality for swapping instance volumes and defines swap as an alias of cinder volume migrate. The watcher native implementation was missing error handling, which could lead to irretrievable data loss. The removed code also forged project user credentials to perform admin requests as if they were made by a member of a project. This was unsafe and posed a security risk due to how it was implemented. This code has been removed without replacement. While some effort has been made to allow existing audits to keep working, any reduction of functionality as a result of this security hardening is intentional. Closes-Bug: #2112187 Change-Id: Ic3b6bfd164e272d70fe86d7b182478dd962f8ac0 Signed-off-by: Sean Mooney <work@seanmooney.info>	2025-08-18 16:35:38 +00:00
Jaromir Wysoglad	8309d9848a	Add Aetos datasource. Implement the spec for multi-tenancy support for metrics. This adds a new 'Aetos' datasource very similar to the current Prometheus datasource. Because of that, the original PrometheusHelper class was split into two classes, and the base class is used for both PrometheusHelper and AetosHelper. Apart from the split, there is one more change to the original PrometheusHelper class code: the addition and use of the _get_fqdn_label() and _get_instance_uuid_label() methods. As part of the change, I refactored the current prometheus datasource unit tests. Most of them are now used to test the PrometheusBase class with minimal changes. Changes I've made to the original tests: - the ones that can be used to test the base class are moved into the TestPrometheusBase class - the _setup_prometheus_client, _get_instance_uuid_label and _get_fqdn_label functions are mocked in the base class tests. Their concrete implementations are tested in each datasource's tests separately. - a self._create_helper() is used to instantiate the helper class with correct mocking. - all config value modifications in the original tests were moved out; instead of modifying the config values, the _get_* methods are mocked to return the wanted values - to keep similar test coverage, config retrieval is tested for each concrete class by testing the _get_* methods. New watcher-aetos-integration and watcher-aetos-integration-realdata zuul jobs are added to test the new datasource. These use the same set of tempest tests as the current watcher-prometheus-integration jobs. The only difference is the environment setup and the Watcher config, so that the job deploys Aetos and Watcher uses it instead of accessing Prometheus directly. At first this was generated by asking cursor to implement the linked spec, with some additional prompts for some smaller changes. Afterwards I manually went through the code doing some cleanups, ensuring it complies with PEP8 and hacking and so on. Later on I manually adjusted the code to use the latest observabilityclient changes. The zuul job was also mostly generated by cursor. Implements: https://blueprints.launchpad.net/watcher/+spec/prometheus-multitenancy-support Generated-By: Cursor with claude-4-sonnet model Change-Id: I72c2171f72819bbde6c9cbbf565ee895e5d2bd53 Signed-off-by: Jaromir Wysoglad <jwysogla@redhat.com>	2025-08-14 02:27:24 -04:00
Douglas Viroel	37faf614e2	Fix api-ref doc for GET /infra-optim/v1/data_model. Some response parameters of the GET /infra-optim/v1/data_model endpoint are missing from the api-ref documentation. This patch updates the doc to include them. For more details, see LP #2117726. Closes-Bug: #2117726 Change-Id: Iaa775f56bb8167d9c6b458cd07f1ec3cefaf70fe Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-12 09:47:01 -03:00
Zuul	9925fd2cc9	Merge "Replace dateutils usage with datetime and oslo.utils"	2025-08-07 20:46:25 +00:00
Zuul	27baff5184	Merge "Extend decision engine to support threading mode"	2025-08-06 15:38:31 +00:00
Douglas Viroel	f879b10b05	Extend decision engine to support threading mode. With eventlet being removed, Watcher needs to support both modes, eventlet and threading, for a couple of releases before all eventlet code is removed. This patch adds methods and classes that allow decision engine modules to create futurist thread pools instead of green thread pools, based on an environment variable that can be enabled per service. It moves the continuous audit handler instance to the decision engine service, so it can be started together with the main decision engine service. It also adds an environment variable that allows the user to disable eventlet monkey patching and to use the oslo.service threading backend. Change-Id: I8a8be0a7cebdc44005fd77ec960543828c7da318 Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-05 16:45:48 -03:00
Chandan Kumar (raukadah)	95d975f339	Replace dateutils usage with datetime and oslo.utils. This cr fixes: * Replaced ``dateutil.tz.tzlocal()`` and ``dateutil.tz.tzutc()`` with ``datetime.timezone`` built-in classes in audit controllers and continuous audit scheduling. * Replaced ``dateutil.parser.parse()`` with ``oslo_utils.timeutils.parse_isotime()`` in the zone migration strategy for parsing datetime strings. Closes-Bug: #2118404 Change-Id: I6d8a345fa4339a688769b147413dcdf3016bf4a0 Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>	2025-08-05 23:09:50 +05:30
Zuul	a1e7156c7e	Merge "finalize python 3.9 support removal"	2025-07-30 15:54:12 +00:00
Sean Mooney	3e8392b8f1	finalize python 3.9 support removal. The last release of OpenStack to support Python 3.9 was 2025.1 (Epoxy). With this change, watcher now requires Python 3.10 or newer; testing of 3.9 was removed in previous commits. Change-Id: Ida53740293e93b0c20dec2e175b390fa18bed852 Signed-off-by: Sean Mooney <work@seanmooney.info>	2025-07-21 18:25:04 +01:00
Chandan Kumar	2fe3b0cdbe	Fix release notes typo and extra information. This cr fixes the release notes for https://review.opendev.org/c/openstack/watcher/+/954120/ and https://review.opendev.org/c/openstack/watcher/+/954120/ Related-Bug: #2110895 Related-Bug: #2115968 Change-Id: I1f3fc06549c2d5d7ba9debee424429a25a651070 Signed-off-by: Chandan Kumar <chkumar@redhat.com>	2025-07-09 15:44:20 +05:30
Chandan Kumar (raukadah)	e3b813e27e	Drop Code related to OperationNotPermitted exception. The following exception was added in the initial import of the watcher code base [1]. In each of the controller REST APIs, it was called with a flag stating whether the request was coming from top-level resource APIs. However, this exception and code were not used anywhere in the REST API. It appears to be dead code, so it needs to be cleaned up. Note: in audit_template, under patchapi, this exception was used to prevent removing the goal from an audit template. Since this cr drops this exception, it is replaced there with the NotAuthorized exception, keeping the same status code. Links: [1]. `d14e057da1 (diff-6d510a275605e20ba8b435157062da2b749265a88a3cfd6d90abb7e8e5feac2aR235)` Closes-Bug: #2115968 Change-Id: I82a5e4a7a51726b3a89257c84a75157fbfcb82eb Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>	2025-07-04 19:07:13 +05:30
Chandan Kumar (raukadah)	c0a5abe29c	Drops forbidden patch/delete/post action apis. These APIs are not implemented within the watcher code base and were marked as forbidden. It does not make sense to keep these APIs, as they are not implemented. This cr drops the related code to make the action APIs cleaner. Closes-Bug: #2110895 Change-Id: I0f465157e6cd481b27665ca6016db68c198cebeb Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>	2025-07-04 11:51:40 +05:30
Zuul	e64709ea08	Merge "Add warning message for experimental integrations"	2025-07-03 17:27:39 +00:00
Alfredo Moralejo	6ea362da0b	Use KiB as unit for host_ram_usage when using prometheus datasource. The prometheus datasource was reporting host_ram_usage in MiB, as described in the docstring for the base datasource interface definition [1]. However, the gnocchi datasource reports it in KiB, following the ceilometer metric `hardware.memory.used` [2], and the strategies using that metric expect it in KiB, so the best approach is to change the unit in the prometheus datasource and update the docstring to avoid misunderstandings in the future. This patch therefore fixes the prometheus datasource to return host_ram_usage in KiB instead of MiB. Additionally, it adds more unit tests for the check_threshold method so that they cover the memory-based strategy execution, validate the calculated standard deviation, and add the cases where it is below the threshold. [1] `15981117ee/watcher/decision_engine/datasources/base.py (L177-L183)` [2] https://docs.openstack.org/ceilometer/train/admin/telemetry-measurements.html#snmp-based-meters Closes-Bug: #2113776 Change-Id: Idc060d1e709c0265c64ada16062c3a206c6b04fa	2025-06-19 16:25:27 +02:00
Douglas Viroel	520ec0b79b	Add warning message for experimental integrations. Some service integrations are now classified as experimental, and a warning message now appears when a client is created for them. These integrations are not fully tested in CI and lack documentation on how they work or should be used. A release note was added to inform users about the status of these integrations and related features. Change-Id: Ib7d0ac0b3e187ae239dfa075fb53a6c0107dff29	2025-06-07 11:33:28 -03:00
Zuul	73f8728d22	Merge "Fix audit creation with no name and no goal or audit_template"	2025-06-05 13:39:38 +00:00
Alfredo Moralejo	bf6a28bd1e	Fix audit creation with no name and no goal or audit_template. Currently, audit creation fails in this case because watcher tries to automatically create a name based on the goal, which is not defined. This patch moves the check for the goal specification earlier in the audit creation call and, if no goal is defined, returns an invalid call error. This patch also modifies the existing error for this case to check the expected behavior. Closes-Bug: #2110947 Change-Id: I6f3d73b035e8081e86ce82c205498432f0e0fc33	2025-06-04 14:46:36 +02:00
Zuul	58b25101e6	Merge "Return HTTP code 400 when creating an audit with wrong parameters"	2025-05-27 19:23:25 +00:00
Zuul	20f231054a	Merge "Set actionplan state to FAILED if any action has failed"	2025-05-26 14:44:37 +00:00
Alfredo Moralejo	88d81c104e	Set actionplan state to FAILED if any action has failed. Currently, an actionplan state is set to SUCCEEDED once the execution has finished, but that does not imply that all the actions finished successfully. This patch checks the actual state of all the actions in the plan after the execution has finished. If any action has status FAILED, it sets the state of the action plan to FAILED and applies the appropriate notification parameters. This is the expected behavior according to the Watcher documentation. The patch also fixes the unit test for this to set the expected action plan state to FAILED and the expected notification parameters. Closes-Bug: #2106407 Change-Id: I7bfc6759b51cd97c26ec13b3918bd8d3b7ac9d4e	2025-05-26 14:58:03 +02:00
Zuul	26e36e1620	Merge "Handle missing dst_node parameter in zone_migration"	2025-05-20 17:14:29 +00:00
Zuul	3585e0cc3e	Merge "Drop code from Host maintenance strategy migrating instance to disabled hosts"	2025-05-16 18:18:26 +00:00
jgilaber	c6302edeca	Handle missing dst_node parameter in zone_migration. For compute nodes, nova works fine if a destination node is not specified, so this change makes sure that, when the user does not set one, we do not pass None, which would cause an error. Partial-Bug: 2108988 Change-Id: Ida1f18b97697c041819e29f935aa5e232848226a	2025-05-16 13:51:47 +02:00
Alfredo Moralejo	4629402f38	Return HTTP code 400 when creating an audit with wrong parameters. Currently, when trying to create an audit that is missing a mandatory parameter, watcher returns error 500 instead of 400, which is the documented error in the API [1] and the appropriate error code for malformed requests. This patch catches parameter validation errors against the JSON schema of each strategy and returns error 400. It also fixes the unit test to validate the expected behavior. [1] https://docs.openstack.org/api-ref/resource-optimization/#audits Closes-Bug: #2110538 Change-Id: I23232b3b54421839bb01d54386d4e7b244f4e2a0	2025-05-16 09:35:50 +02:00
Zuul	86a260a2c7	Merge "Set keystone_client default interface to public"	2025-05-15 12:45:52 +00:00
Chandan Kumar (raukadah)	9dea55bd64	Drop code from Host maintenance strategy migrating instance to disabled hosts. Currently, the host maintenance strategy also migrates instances from the maintenance node to watcher_disabled compute nodes. watcher_disabled compute nodes might have been disabled for some other purpose by a different strategy. If host maintenance uses those compute nodes for migration, it might affect customer workloads. The host maintenance strategy should never touch disabled hosts unless the user specifies a disabled host as the backup node. This cr drops the logic for using disabled compute nodes for maintenance. Host maintenance already uses the nova scheduler for migrating instances and will continue to do so. If there is no available node, the strategy will fail. Closes-Bug: #2109945 Change-Id: If9795fd06f684eb67d553405cebd8a30887c3997 Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>	2025-05-14 09:24:25 +05:30
Douglas Viroel	17d1cf535a	Deprecated Noisy Neighbor strategy. The Noisy Neighbor strategy is a proof-of-concept strategy built on the LLC metric, which has not been available in Nova since the Victoria release [1]. This patch marks this strategy as deprecated, to be removed in a future release. [1] https://docs.openstack.org/releasenotes/nova/victoria.html#relnotes-22-0-0-unmaintained-victoria-upgrade-notes Change-Id: I940b88555007312c76a86706bd44a38fbcf7701e	2025-05-12 15:44:39 -03:00
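
Commit 90009aac84 states the success criterion for a retype in prose: the volume must end up with the destination type and be available or in-use. A minimal Python sketch of that criterion follows. The `Volume` dataclass and `retype_succeeded` function are hypothetical illustrations, not Watcher's actual code; a real check would read the volume from Cinder and keep polling while the condition does not yet hold.

```python
from dataclasses import dataclass

# Final statuses that the commit message accepts for a successful retype.
SUCCESS_STATUSES = frozenset({"available", "in-use"})


@dataclass
class Volume:
    """Hypothetical stand-in for a Cinder volume, with only the fields used here."""
    volume_type: str
    status: str


def retype_succeeded(volume: Volume, destination_type: str) -> bool:
    """Return True when the volume has the destination type and a usable status.

    Migration-related fields are deliberately ignored, so the same check
    applies whether or not the retype triggered a migration.
    """
    return (volume.volume_type == destination_type
            and volume.status in SUCCESS_STATUSES)


print(retype_succeeded(Volume("fast-ssd", "in-use"), "fast-ssd"))    # True
print(retype_succeeded(Volume("fast-ssd", "retyping"), "fast-ssd"))  # False
```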
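
Commit ef0f35192d describes importing the Monasca client only when the Monasca datasource is selected, and raising a clear UnsupportedError otherwise. The sketch below shows the general lazy-import pattern with the standard library's `importlib`. `LazyClientLoader` and this `UnsupportedError` class are hypothetical stand-ins, not the classes in watcher/common/clients.py.

```python
import importlib
from types import ModuleType
from typing import Optional


class UnsupportedError(Exception):
    """Hypothetical error raised when an optional client is not installed."""


class LazyClientLoader:
    """Import an optional client module on first use only (hypothetical helper)."""

    def __init__(self, module_name: str, extra: str) -> None:
        self._module_name = module_name
        self._extra = extra
        self._module: Optional[ModuleType] = None

    @property
    def module(self) -> ModuleType:
        # Nothing is imported until a caller actually needs the client.
        if self._module is None:
            try:
                self._module = importlib.import_module(self._module_name)
            except ImportError as exc:
                raise UnsupportedError(
                    f"{self._module_name} is not installed; "
                    f"install the optional extra: pip install watcher[{self._extra}]"
                ) from exc
        return self._module
```

Constructing `LazyClientLoader("monascaclient", "monasca")` imports nothing; only accessing `.module` does, and on a deployment without the extra that access raises `UnsupportedError` with an install hint.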
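
The parameter combinations described in commit cc26b3b334 reduce to a small decision table per instance. The following is a hedged sketch, assuming the labels "live_migrate", "cold_migrate" and "stop" (hypothetical strings, not Watcher's registered action names) and that live migration remains the default for active instances when neither option is set.

```python
from typing import Optional


def choose_action(active: bool,
                  disable_live_migration: bool,
                  disable_cold_migration: bool) -> Optional[str]:
    """Pick an action for one instance on a host under maintenance.

    The returned strings are hypothetical labels, not Watcher action names.
    """
    if active:
        if not disable_live_migration:
            return "live_migrate"
        if not disable_cold_migration:
            # Live migration disabled: fall back to cold migration.
            return "cold_migrate"
        # Both disabled: the only remaining option is to stop the instance.
        return "stop"
    if not disable_cold_migration:
        return "cold_migrate"
    # Inactive instance with cold migration disabled: it is not migrated.
    return None
```

With both options set, every active instance maps to "stop" and inactive instances get no action, matching the commit's statement that only stop actions are performed.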
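
Commit 9003906bdc does not say how the old comparison went wrong, only that it failed above 2.99. One common cause of exactly that symptom is comparing microversions as floats or strings. The sketch below, assuming plain "MAJOR.MINOR" strings, shows both pitfalls and a component-wise integer comparison that avoids them; it is an illustration, not the NovaHelper fix itself.

```python
def parse_microversion(version: str) -> tuple[int, int]:
    """Split a 'MAJOR.MINOR' microversion string into two integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def microversion_at_least(current: str, required: str) -> bool:
    """Compare microversions numerically, major first, then minor."""
    return parse_microversion(current) >= parse_microversion(required)


# Both naive comparisons wrongly treat 2.100 as older than 2.99:
print(float("2.100") >= float("2.99"))         # False, because 2.100 parses as 2.1
print("2.100" >= "2.99")                       # False, because '1' sorts before '9'
print(microversion_at_least("2.100", "2.99"))  # True
```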
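
The rule introduced by commit 88d81c104e, where a finished action plan is FAILED if any of its actions failed and SUCCEEDED otherwise, can be sketched as follows. The `State` enum and `final_action_plan_state` function are hypothetical and cover only the two outcomes named in that commit message.

```python
from enum import Enum
from typing import Iterable


class State(str, Enum):
    """Only the two outcomes named in the commit message (hypothetical enum)."""
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


def final_action_plan_state(action_states: Iterable[State]) -> State:
    """After execution, report FAILED if any action failed, else SUCCEEDED."""
    if any(state is State.FAILED for state in action_states):
        return State.FAILED
    return State.SUCCEEDED


print(final_action_plan_state([State.SUCCEEDED, State.FAILED]).value)  # FAILED
```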

221 Commits