watcher

Author	SHA1	Message	Date
Zuul	616c8f4cc4	Merge "Add options to disable migration in host maintenance"	2025-08-21 14:11:22 +00:00
Quang Ngo	cc26b3b334	Add options to disable migration in host maintenance This change enhances the Host Maintenance strategy by introducing two new input parameters: `disable_live_migration` and `disable_cold_migration`. These parameters allow cloud administrators to control whether live or cold migration should be considered during host maintenance operations. If `disable_live_migration` is set, active instances will be cold migrated if `disable_cold_migration` is not set; otherwise active instances will be stopped. If `disable_cold_migration` is set, inactive instances will not be cold migrated. If both are set, only stop actions will be performed on instances. The strategy logic and action plan generation have been updated to reflect these behaviors. A new "stop" action is introduced and registered, and the weight planner is updated to handle the new action. Documentation for the Host Maintenance strategy is updated to describe the new parameters and their effects. Test Plan: - Unit tests for HostMaintenance strategy with new parameters - Integration tests for action plan generation with stop action This implements the specification: Spec: https://review.opendev.org/c/openstack/watcher-specs/+/943873 Change-Id: I201b8e5c52e1bc1a74f3886a0e301e3c0fa5d351 Signed-off-by: Quang Ngo <quang.ngo@canonical.com>	2025-08-20 22:32:33 +10:00
Zuul	90f0c2264c	Merge "use cinder migrate for swap volume"	2025-08-18 20:32:42 +00:00
Sean Mooney	3742e0a79c	use cinder migrate for swap volume This change removes watcher's in-tree functionality for swapping instance volumes and defines swap as an alias of cinder volume migrate. The watcher-native implementation was missing error handling, which could lead to irretrievable data loss. The removed code also forged project user credentials to perform admin requests as if they were made by a member of a project. This was unsafe and posed a security risk due to how it was implemented. This code has been removed without replacement. While some effort has been made to allow previously defined audits to keep working, any reduction of functionality as a result of this security hardening is intentional. Closes-Bug: #2112187 Change-Id: Ic3b6bfd164e272d70fe86d7b182478dd962f8ac0 Signed-off-by: Sean Mooney <work@seanmooney.info>	2025-08-18 16:35:38 +00:00
Jaromir Wysoglad	8309d9848a	Add Aetos datasource Implement the spec for multi-tenancy support for metrics. This adds a new 'Aetos' datasource very similar to the current Prometheus datasource. Because of that, the original PrometheusHelper class was split into two classes and the base class is used for PrometheusHelper and for AetosHelper. Except for the split, there is one more change to the original PrometheusHelper class code, which is the addition and use of the _get_fqdn_label() and _get_instance_uuid_label() methods. As part of the change, I refactored the current prometheus datasource unit tests. Most of them are now used to test the PrometheusBase class with minimal changes. Changes I've made to the original tests: - the ones that can be used to test the base class are moved into the TestPrometheusBase class - the _setup_prometheus_client, _get_instance_uuid_label and _get_fqdn_label functions are mocked in the base class tests. Their concrete implementations are tested separately in each datasource's tests. - a self._create_helper() is used to instantiate the helper class with correct mocking. - all config value modifications in the original tests were moved out and, instead of modifying the config values, the _get_* methods are mocked to return the wanted values - to keep similar test coverage, config retrieval is tested for each concrete class by testing the _get_* methods. New watcher-aetos-integration and watcher-aetos-integration-realdata zuul jobs are added to test the new datasource. These use the same set of tempest tests as the current watcher-prometheus-integration jobs. The only difference is the environment setup and the Watcher config, so that the job deploys Aetos and Watcher uses it instead of accessing Prometheus directly. At first this was generated by asking cursor to implement the linked spec with some additional prompts for some smaller changes. Afterwards I manually went through the code doing some cleanups, ensuring it complies with PEP8 and hacking and so on. Later on I manually adjusted the code to use the latest observabilityclient changes. The zuul job was also mostly generated by cursor. Implements: https://blueprints.launchpad.net/watcher/+spec/prometheus-multitenancy-support Generated-By: Cursor with claude-4-sonnet model Change-Id: I72c2171f72819bbde6c9cbbf565ee895e5d2bd53 Signed-off-by: Jaromir Wysoglad <jwysogla@redhat.com>	2025-08-14 02:27:24 -04:00
Zuul	355671e979	Merge "Add a new tox environment to run unit tests in threading mode"	2025-08-13 21:37:19 +00:00
Zuul	4080d5767d	Merge "Disable real metrics on devstack injected data jobs"	2025-08-11 17:10:32 +00:00
Zuul	9925fd2cc9	Merge "Replace dateutils usage with datetime and oslo.utils"	2025-08-07 20:46:25 +00:00
Zuul	27baff5184	Merge "Extend decision engine to support threading mode"	2025-08-06 15:38:31 +00:00
Douglas Viroel	8ca794cdbb	Add a new tox environment to run unit tests in threading mode It is done by disabling the eventlet patching and configuring the oslo.service backend to threading. Once the oslo.service backend is configured, it can't be reverted to eventlet. This needs to be done before importing other modules that may import the oslo.service library. Adds a job that runs a subset of tests with eventlet patching disabled. Change-Id: I9f8c2c5bbcf3192313cc3b309e8f2719a3bea18f Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-05 16:50:29 -03:00
Douglas Viroel	f879b10b05	Extend decision engine to support threading mode With the ongoing eventlet removal, Watcher will need to be adapted to support both modes, eventlet and threading, for a couple of releases before removing all eventlet code. This patch adds methods and classes that allow decision engine modules to create futurist thread pools instead of green thread pools, based on an environment variable that can be enabled per service. It moves the continuous audit handler instance to the decision engine service, so it can be started together with the main decision engine service. Adds an environment variable that allows the user to disable eventlet monkey patching and to use the oslo.service threading backend. Change-Id: I8a8be0a7cebdc44005fd77ec960543828c7da318 Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-05 16:45:48 -03:00
Chandan Kumar (raukadah)	95d975f339	Replace dateutils usage with datetime and oslo.utils This cr fixes: * Replaced ``dateutil.tz.tzlocal()`` and ``dateutil.tz.tzutc()`` with ``datetime.timezone`` built-in classes in audit controllers and continuous audit scheduling. * Replaced ``dateutil.parser.parse()`` with ``oslo_utils.timeutils.parse_isotime()`` in the zone migration strategy for parsing datetime strings. Closes-Bug: #2118404 Change-Id: I6d8a345fa4339a688769b147413dcdf3016bf4a0 Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>	2025-08-05 23:09:50 +05:30
morenod	0435200fb1	Disable real metrics on devstack injected data jobs We need to disable real data metrics coming from hosts and instances on injected data jobs, as they create wrong results when they are mixed with the injected data. We already did this on watcher-operator by disabling the ceilometer agent and node_exporter in [1], so now we have to do it on devstack installations: disabling meminfo on node_exporter for host metrics (cpu is already disabled) and sg-core for instance metrics. [1] https://github.com/openstack-k8s-operators/watcher-operator/pull/196 Change-Id: I4130ca6dd7cb52d96842e04e7720431ebc76efff Signed-off-by: morenod <dsanzmor@redhat.com>	2025-08-04 12:41:54 +02:00
Douglas Viroel	adfe3858aa	Configure watcher tempest's microversion in devstack Adds a tempest configuration for min and max microversions supported by watcher. This helps us to define the correct range of microversions to be tested on each stable branch. New microversion proposals should also increase the default max_microversion, in order to work with watcher-tempest-plugin microversion testing. Change-Id: I0b695ba4530eb89ed17b3935b87e938cadec84cc Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-08-01 17:28:40 -03:00
Zuul	a1e7156c7e	Merge "finalize python 3.9 support removal"	2025-07-30 15:54:12 +00:00
Zuul	71470dac73	Merge "Add comprehensive release liaison guide for DPL model"	2025-07-30 15:24:49 +00:00
Zuul	5ba086095c	Merge "Fix release notes typo and extra information"	2025-07-21 18:41:57 +00:00
Sean Mooney	3e8392b8f1	finalize python 3.9 support removal The last release of OpenStack to support Python 3.9 was 2025.1 (Epoxy). With this change, watcher now requires Python 3.10; testing of 3.9 was removed in previous commits. Change-Id: Ida53740293e93b0c20dec2e175b390fa18bed852 Signed-off-by: Sean Mooney <work@seanmooney.info>	2025-07-21 18:25:04 +01:00
Sean Mooney	20cd4a0394	Add comprehensive release liaison guide for DPL model Transform Nova's PTL guide into Watcher-specific release liaison documentation following the DPL governance model. This guide provides chronological guidance for release liaisons managing Watcher's cycle-with-intermediary release process. Key features: * DPL liaison coordination with proper precedence hierarchies * Watcher-specific project context and repository references * Enhanced FFE process with release liaison decision authority * Proper RST formatting with code blocks and cross-references * Comprehensive glossary of OpenStack release terminology * Usage guidance for both new and experienced release liaisons Adapts Nova's proven chronological structure while reflecting Watcher's distributed leadership model and technical requirements. Assisted-By: claude-code Change-Id: I133bb06e47c14deaca162a2bf024210f68d78ab2 Signed-off-by: Sean Mooney <work@seanmooney.info>	2025-07-21 16:34:47 +01:00
Zuul	374750847f	Merge "Merge decision engine services into a single one"	2025-07-17 13:09:11 +00:00
Chandan Kumar	2fe3b0cdbe	Fix release notes typo and extra information This cr fixes the release notes for https://review.opendev.org/c/openstack/watcher/+/954120/ and https://review.opendev.org/c/openstack/watcher/+/954120/ Related-Bug: #2110895 Related-Bug: #2115968 Change-Id: I1f3fc06549c2d5d7ba9debee424429a25a651070 Signed-off-by: Chandan Kumar <chkumar@redhat.com>	2025-07-09 15:44:20 +05:30
Zuul	9b9965265a	Merge "Drop Code related to OperationNotPermitted exception"	2025-07-08 19:31:11 +00:00
Zuul	98b56b66ac	Merge "Drops forbidden patch/delete/post action apis"	2025-07-08 18:38:40 +00:00
Douglas Viroel	081cd5fae9	Merge decision engine services into a single one The decision engine process was built based on 2 services: a service that handles rpc requests and a scheduler to trigger watcher periodic tasks. With the new version of oslo.service, a new threading backend was added, based on the cotyledon service manager, which starts a new process for each service that it manages. These two services can't run in different processes since they need access to a shared in-memory representation of the cluster (cluster data models). This patch proposes creating a Decision Engine Service which includes everything in a single main service. Change-Id: I335a97ca14b6e023fef055978a56aefebf22d433 Signed-off-by: Douglas Viroel <viroel@gmail.com>	2025-07-08 09:55:32 -03:00
Zuul	1ab5babbb6	Merge "Move eventlet command scripts to a different dir"	2025-07-08 12:41:35 +00:00
Zuul	d771d00c5a	Merge "sqlalchemy: Use built-in declarative"	2025-07-08 12:41:32 +00:00
Chandan Kumar (raukadah)	e3b813e27e	Drop Code related to OperationNotPermitted exception The following exception was added in the initial import of the watcher code base[1]. In each of the controller REST APIs, it was called with a flag stating the request was coming from top level resources apis. However, this exception and the related code were not used anywhere in the rest api; it appears to be dead code, so it needs to be cleaned up. Note: In audit_template, under patchapi, this exception was used to prevent removal of the goal from an audit template. Since this cr drops this exception, it is replaced there with the NotAuthorized exception, keeping the status code the same. Links: [1]. `d14e057da1 (diff-6d510a275605e20ba8b435157062da2b749265a88a3cfd6d90abb7e8e5feac2aR235)` Closes-Bug: #2115968 Change-Id: I82a5e4a7a51726b3a89257c84a75157fbfcb82eb Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>	2025-07-04 19:07:13 +05:30
Chandan Kumar (raukadah)	c0a5abe29c	Drops forbidden patch/delete/post action apis These apis are not implemented within the watcher code base and were marked as forbidden to use. It does not make sense to keep these apis as they are not implemented. This cr drops the code around them to make the action apis cleaner. Closes-Bug: #2110895 Change-Id: I0f465157e6cd481b27665ca6016db68c198cebeb Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>	2025-07-04 11:51:40 +05:30
Zuul	bbe30f93f2	Merge "Update workload balance doc per review comments"	2025-07-03 19:57:05 +00:00
Zuul	3bc5c72039	Merge "resolve fixme comments in RequestContext"	2025-07-03 17:28:54 +00:00
Zuul	203b926be0	Merge "Drop unused fake class"	2025-07-03 17:28:52 +00:00
Zuul	e64709ea08	Merge "Add warning message for experimental integrations"	2025-07-03 17:27:39 +00:00
Zuul	94d8676db8	Merge "add missing bindeps for docs"	2025-07-03 16:03:47 +00:00
Takashi Kajinami	828bcadf6a	sqlalchemy: Use built-in declarative sqlalchemy.ext.declarative was deprecated in sqlalchemy 1.4.0, due to the built-in implementations[1]. [1] https://github.com/sqlalchemy/sqlalchemy/commit/450f5c0d6519a439f40 Change-Id: Idb4a361d4d65ff53ecf33b8a2a6aa0d6f6ae1979	2025-06-30 22:14:33 +09:00
Zuul	93366df264	Merge "Add crosslinks to strategies table"	2025-06-30 13:02:28 +00:00
Takashi Kajinami	aa67096fe8	Drop unused fake class It was a left-over from removal of ceilometer datasource[1]. [1] `da23fdc621` Change-Id: I17ef33d6f70e2cc601add721661347d0bf210008	2025-06-28 20:35:09 +09:00
Ronelle Landy	6f72e33de5	Add crosslinks to strategies table These replace the full external links used previously. Change-Id: I9c79f7b7ddebaa25d243fdbe1eb422cba25de8f1	2025-06-27 16:54:38 -04:00
Ronelle Landy	56d0a0d6ea	Update workload balance doc per review comments The original documentation update review [1] had some additional comments for improvements. The commit adds the suggested changes. [1] https://review.opendev.org/c/openstack/watcher/+/951025 Change-Id: I4b4624e2dbc4c6a5f888ec77d6a03b8f66ff0a23	2025-06-27 16:46:17 -04:00
Ronelle Landy	de9eb2cd80	Add doc clarifications for Zone Migration Adds documentation clarifications on how the strategy and associated parameters are used. Closes-Bug: #2112480 Change-Id: Id42c280fc5744bebb01d50b52b834e5b3b76af73	2025-06-27 16:12:41 -04:00
Zuul	76de167171	Merge "Add Integrations doc page with support matrix"	2025-06-27 16:09:51 +00:00
Zuul	70032aa477	Merge "Add table - level of test/usage per strategy"	2025-06-27 16:01:31 +00:00
Zuul	16131e5cac	Merge "Update Workload Balance strategy documentation"	2025-06-27 13:36:50 +00:00
Ronelle Landy	bfbd136f4b	Update Host Maintenance strategy documentation Add clarifications to the documentation to reflect the actual strategy usage, including: - updating parameter descriptions - extending the 'How to Use' section Closes-Bug: #2111810 Change-Id: Ifd2876056cd8819c50658fb9f213246dc1546d42	2025-06-23 06:36:42 -04:00
Zuul	fe8d8c8839	Merge "Use KiB as unit for host_ram_usage when using prometheus datasource"	2025-06-20 16:19:50 +00:00
Zuul	b8e0e6b01c	Merge "Aggregate by label when querying instance cpu usage in prometheus"	2025-06-19 14:46:07 +00:00
Alfredo Moralejo	6ea362da0b	Use KiB as unit for host_ram_usage when using prometheus datasource The prometheus datasource was reporting host_ram_usage in MiB as described in the docstring for the base datasource interface definition [1]. However, the gnocchi datasource is reporting it in KiB following ceilometer metric `hardware.memory.used` [2], and the strategies using that metric expect it to be in KiB, so the best approach is to change the unit in the prometheus datasource and update the docstring to avoid misunderstandings in the future. This patch therefore fixes the prometheus datasource to return host_ram_usage in KiB instead of MiB. Additionally, it adds more unit tests for the check_threshold method so that they cover the memory-based strategy execution, validate the calculated standard deviation and add the cases where it is below the threshold. [1] `15981117ee/watcher/decision_engine/datasources/base.py (L177-L183)` [2] https://docs.openstack.org/ceilometer/train/admin/telemetry-measurements.html#snmp-based-meters Closes-Bug: #2113776 Change-Id: Idc060d1e709c0265c64ada16062c3a206c6b04fa	2025-06-19 16:25:27 +02:00
Zuul	0f78386462	Merge "Add debug message to report calculated metric for workload_balance"	2025-06-18 12:26:24 +00:00
Alfredo Moralejo	1529e3fadd	Add debug message to report calculated metric for workload_balance The workload_balance strategy calculates host metrics based on the instance metrics and those are the ones used to compare with the threshold. Currently, the strategy does not report the calculated values, which sometimes makes troubleshooting difficult. This patch adds a debug message to log those values. This patch also adds a new unit test for filter_destination_hosts based on ram instead of cpu and adds assertions for the new debug messages. To properly implement the new test, I had to slightly modify the ram usage fixtures used for the workload_balance tests. Change-Id: Ief5e167afcf346ff53471f26adc70795c4b69f68	2025-06-17 19:11:48 +02:00
Zuul	31879d26f4	Merge "Add unit test zone migration with_attached_volume"	2025-06-13 12:17:52 +00:00
Zuul	efbae9321e	Merge "devstack: Drop template for mod_wsgi"	2025-06-13 10:44:48 +00:00
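The per-instance behavior described in commit cc26b3b334 (Host Maintenance with `disable_live_migration` and `disable_cold_migration`) reduces to a small decision table. The following is a minimal sketch, not Watcher's actual strategy code. The `Action` enum and `choose_action` function are hypothetical names. The sketch also assumes that, with neither option set, active instances are live migrated and inactive instances are cold migrated, which the commit message implies but does not state outright.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical action labels; Watcher's real action names may differ."""
    LIVE_MIGRATE = "live_migrate"
    COLD_MIGRATE = "cold_migrate"
    STOP = "stop"


def choose_action(instance_active: bool,
                  disable_live_migration: bool = False,
                  disable_cold_migration: bool = False) -> Optional[Action]:
    """Pick the action for one instance on a host entering maintenance.

    Returns None when the instance is left where it is.
    """
    if instance_active:
        if not disable_live_migration:
            return Action.LIVE_MIGRATE
        # Live migration disabled: fall back to cold migration if allowed,
        # otherwise stop the instance.
        if not disable_cold_migration:
            return Action.COLD_MIGRATE
        return Action.STOP
    # Inactive instances are only ever cold migrated.
    if not disable_cold_migration:
        return Action.COLD_MIGRATE
    return None


if __name__ == "__main__":
    # Print the full decision table: (live off, cold off, active, inactive).
    for live_off in (False, True):
        for cold_off in (False, True):
            print(live_off, cold_off,
                  choose_action(True, live_off, cold_off),
                  choose_action(False, live_off, cold_off))
```

With both options set, active instances receive only a stop action and inactive instances are left untouched. This matches the commit's statement that "only stop actions will be performed".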

2647 Commits
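Commit 95d975f339 replaces `dateutil` time zone helpers with the standard library's `datetime.timezone`. The sketch below shows stdlib equivalents; it is not the code from that patch, and the helper names `utc_now`, `local_now` and `parse_iso` are hypothetical. For parsing, the patch itself uses `oslo_utils.timeutils.parse_isotime()`. That call is left out here to keep the example dependency-free, and `datetime.fromisoformat()` is shown instead as a stdlib alternative for ISO 8601 strings that carry an explicit offset.

```python
import datetime


def utc_now() -> datetime.datetime:
    # Stdlib replacement for datetime.datetime.now(dateutil.tz.tzutc()).
    return datetime.datetime.now(datetime.timezone.utc)


def local_now() -> datetime.datetime:
    # Stdlib stand-in for datetime.datetime.now(dateutil.tz.tzlocal()).
    # astimezone() with no argument attaches the system's current UTC
    # offset as a fixed-offset datetime.timezone instance (unlike tzlocal,
    # it does not track later DST changes).
    return datetime.datetime.now().astimezone()


def parse_iso(value: str) -> datetime.datetime:
    # Stdlib parsing of ISO 8601 strings with an explicit offset.
    # Python 3.10 does not accept a trailing "Z"; use "+00:00" instead.
    return datetime.datetime.fromisoformat(value)


if __name__ == "__main__":
    print(utc_now().isoformat())
    print(local_now().isoformat())
    parsed = parse_iso("2025-08-05T23:09:50+05:30")
    print(parsed.astimezone(datetime.timezone.utc))
```

Both helpers return timezone-aware datetimes, so values produced by either one can be compared or subtracted directly.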