watcher

Author	SHA1	Message	Date
Zuul	fe8d8c8839	Merge "Use KiB as unit for host_ram_usage when using prometheus datasource"	2025-06-20 16:19:50 +00:00
Zuul	b8e0e6b01c	Merge "Aggregate by label when querying instance cpu usage in prometheus"	2025-06-19 14:46:07 +00:00
Alfredo Moralejo	6ea362da0b	Use KiB as unit for host_ram_usage when using prometheus datasource The prometheus datasource was reporting host_ram_usage in MiB as described in the docstring for the base datasource interface definition [1]. However, the gnocchi datasource is reporting it in KiB following ceilometer metric `hardware.memory.used` [2] and the strategies using that metric expect it to be in KiB, so the best approach is to change the unit in the prometheus datasource and update the docstring to avoid misunderstandings in the future. So, this patch is fixing the prometheus datasource to return host_ram_usage in KiB instead of MiB. Additionally, it is adding more unit tests for the check_threshold method so that it covers the memory based strategy execution, validates the calculated standard deviation and adds the cases where it is below the threshold. [1] `15981117ee/watcher/decision_engine/datasources/base.py (L177-L183)` [2] https://docs.openstack.org/ceilometer/train/admin/telemetry-measurements.html#snmp-based-meters Closes-Bug: #2113776 Change-Id: Idc060d1e709c0265c64ada16062c3a206c6b04fa	2025-06-19 16:25:27 +02:00
Zuul	0f78386462	Merge "Add debug message to report calculated metric for workload_balance"	2025-06-18 12:26:24 +00:00
Alfredo Moralejo	1529e3fadd	Add debug message to report calculated metric for workload_balance The workload_balance strategy calculates host metrics based on the instance metrics and those are the ones used to compare with the threshold. Currently, the strategy does not report the calculated values, which sometimes makes troubleshooting difficult. This patch is adding a debug message to log those values. This patch is also adding a new unit test for filter_destination_hosts based on ram instead of cpu and adding assertions for the new debug messages. To implement the new test properly, I had to slightly modify the ram usage fixtures used for the workload_balance tests. Change-Id: Ief5e167afcf346ff53471f26adc70795c4b69f68	2025-06-17 19:11:48 +02:00
Zuul	31879d26f4	Merge "Add unit test zone migration with_attached_volume"	2025-06-13 12:17:52 +00:00
Zuul	efbae9321e	Merge "devstack: Drop template for mod_wsgi"	2025-06-13 10:44:48 +00:00
Ronelle Landy	0599618add	Add table - level of test/usage per strategy This patch adds a table to the strategies page to show the level of qualification and where the strategy can be triggered. Change-Id: I6991566fd5fec3f8bbae06eefa63a8b83a87eed1	2025-06-11 14:19:42 -04:00
Zuul	1d50c12e15	Merge "Adapt zuul.yaml strategies jobs to include tests with tag 'strategy'"	2025-06-11 13:47:34 +00:00
Alfredo Moralejo	3860de0b1e	Aggregate by label when querying instance cpu usage in prometheus Currently, when the prometheus datasource queries the ceilometer_cpu metric for instance cpu usage, it aggregates by instance and filters by the label containing the instance uuid. While this works fine in real scenarios, where a single metric is provided in a single instance, in some cases, such as the CI jobs where metrics are directly injected, it leads to incorrect metric calculation. We applied a similar fix for the host metrics in [1] but we did not implement it for instance cpu. I am also converting the query formatting to the dict format to improve understandability. [1] https://review.opendev.org/c/openstack/watcher/+/946049 Closes-Bug: #2113936 Change-Id: I3038dec20612162c411fc77446e86a47e0354423	2025-06-11 14:49:56 +02:00
Chandan Kumar (raukadah)	15981117ee	Drop unused method get_disabled_compute_nodes_with_reason get_disabled_compute_nodes_with_reason defined in host_maintenance strategy is not used anywhere. This cr drops the unused method. Change-Id: I07c0d0b63e00d476511aa8b03c0feab8ec4db95b Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>	2025-06-09 10:51:45 +05:30
Douglas Viroel	4f8c14646d	Move eventlet command scripts to a different dir This is an initial patch towards the eventlet removal in watcher. It moves cmd scripts that depend on eventlet to an eventlet dir, where it is always monkey patched. Change-Id: Ie23caab018fbf68f8c29a0f748c0708b97933b4b	2025-06-08 09:05:56 -03:00
Douglas Viroel	520ec0b79b	Add warning message for experimental integrations Some service integrations are now classified as experimental and a warning message will now appear once a client is created for them. These integrations are not fully tested in CI and lack documentation on how they work or should be used. A release note was added to inform users about the status of these integrations and related features. Change-Id: Ib7d0ac0b3e187ae239dfa075fb53a6c0107dff29	2025-06-07 11:33:28 -03:00
Ronelle Landy	f42cb8557b	Update Workload Balance strategy documentation Adds additional parameter and usage explanations and combined example. Closes-Bug: #2111848 Change-Id: Id0de4d56fa7083388ad82c61596e7484431d465b	2025-06-06 15:51:23 -04:00
Douglas Viroel	b788a67c52	Add Integrations doc page with support matrix Adds a new documentation section that describes which service integrations are currently supported and their integration status. This information is not clear today and will help to cover the lack of testing and documentation about them. Change-Id: I26b2a2ef5672b78a575a2bdaef3a08d5bbc063bd	2025-06-05 13:31:02 -03:00
Zuul	73f8728d22	Merge "Fix audit creation with no name and no goal or audit_template"	2025-06-05 13:39:38 +00:00
Alfredo Moralejo	bf6a28bd1e	Fix audit creation with no name and no goal or audit_template Currently, in that case it was failing because watcher tried to create a name based on a goal automatically and the goal is not defined. This patch is moving the check for goal specification in the audit creation call earlier, and if there is no goal defined, it returns an invalid call error. This patch is also modifying the existing error for this case to check the expected behavior. Closes-Bug: #2110947 Change-Id: I6f3d73b035e8081e86ce82c205498432f0e0fc33	2025-06-04 14:46:36 +02:00
morenod	1256b24133	Adapt zuul.yaml strategies jobs to include tests with tag 'strategy' The idea is to adapt zuul.yaml to future test structure where every strategy will be on its own file so now we keep executing everything inside test_execute_strategies but also any other test on any file with tag 'strategy' Change-Id: I304c858078d35beb1f7b4f1fad4ea8bedde674af	2025-06-04 09:50:35 +00:00
Takashi Kajinami	a559c0505e	devstack: Drop template for mod_wsgi ... because mod_wsgi support was already removed by [1]. [1] `57b248f9fe` Change-Id: I100169b3fb7ed68d9b01abb4fc91bdd16eb68aa9	2025-06-04 00:14:07 +09:00
Zuul	59757249bb	Merge "Added unit test to validate audit creation with no goal and no name"	2025-05-27 19:32:07 +00:00
Zuul	58b25101e6	Merge "Return HTTP code 400 when creating an audit with wrong parameters"	2025-05-27 19:23:25 +00:00
Zuul	690a389369	Merge "Add a unit test to check the error when creating an audit with wrong parameters"	2025-05-27 19:23:23 +00:00
Zuul	1cdd392f96	Merge "Remove deprecated executor in message handling servers"	2025-05-26 14:44:39 +00:00
Zuul	20f231054a	Merge "Set actionplan state to FAILED if any action has failed"	2025-05-26 14:44:37 +00:00
Zuul	077c36be8a	Merge "Add unit test to check action plan state when a nested action fails"	2025-05-26 14:27:08 +00:00
Alfredo Moralejo	88d81c104e	Set actionplan state to FAILED if any action has failed Currently, an actionplan state is set to SUCCEEDED once the execution has finished, but that does not imply that all the actions finished successfully. This patch is checking the actual state of all the actions in the plan after the execution has finished. If any action has status FAILED, it will set the state of the action plan as FAILED and will apply the appropriate notification parameters. This is the expected behavior according to Watcher documentation. The patch is also fixing the unit test for this to set the expected action plan state to FAILED and notification parameters. Closes-Bug: #2106407 Change-Id: I7bfc6759b51cd97c26ec13b3918bd8d3b7ac9d4e	2025-05-26 14:58:03 +02:00
Zuul	8ac8a29fda	Merge "Fix incorrect logging format"	2025-05-26 11:47:26 +00:00
Zuul	cd2910b0e9	Merge "Check logs in some cinder and nova helper tests"	2025-05-26 11:45:12 +00:00
jgilaber	167fb61b4e	Add unit test zone migration with_attached_volume Add a test for the zone migration strategy using the with_attached_volume parameter, setting storage_pools but not compute_nodes. With volumes attached to instances, with these inputs, the strategy should propose an action plan to migrate volumes and the instances they are attached to, since Nova, even without the user passing a destination node for the instances, is able to find one. However, the execution results in an error, since the strategy assumes that the compute_nodes dict will always be there. Change-Id: Ifac28b1aab8a0caf77d97e4c19d051e764256674	2025-05-22 17:09:13 +02:00
Chandan Kumar (raukadah)	188e583dcb	Drop sg_core related prometheus var https://review.opendev.org/c/openstack/devstack-plugin-prometheus/+/950476 adds the support for passing custom scrape target and https://github.com/openstack-k8s-operators/sg-core/pull/25 drops sg_core prometheus related vars. So we also need to drop sg_core related prometheus vars from our job. This cr achieves the same. Depends-On: https://github.com/openstack-k8s-operators/sg-core/pull/25 Depends-On: https://review.opendev.org/c/openstack/devstack-plugin-prometheus/+/950476 Change-Id: I6c8f54f8749e81b532c88e9224022294c4a1d331 Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>	2025-05-21 16:52:36 +05:30
Zuul	26e36e1620	Merge "Handle missing dst_node parameter in zone_migration"	2025-05-20 17:14:29 +00:00
Sean Mooney	a016b3f4ea	add missing bindeps for docs This adds two more bindep targets to encode the doc and pdf-docs deps. Change-Id: Ide54be172c485025e567ede39c238b39b01c89e0	2025-05-19 23:55:20 +00:00
Sean Mooney	9f6c8725ed	resolve fixme comments in RequestContext This change removes all the duplicate fields from the watcher RequestContext. It also removes several fields like quota_class and remote_address that were cargo culted from nova but never used in watcher when notification support was added. Change-Id: Ibf8739d6cd2d4557df6f8de6c780b6f4280b774f	2025-05-19 20:19:52 +01:00
Sean Mooney	040a7f5c41	update tests for new oslo.context release context.user has been deprecated for years and renamed to user_id. The deprecated field has now been removed, so this change updates our test cases to reflect that. Change-Id: I120441fb9392c370c57dc63d8c115d8993d25f62	2025-05-19 19:11:23 +01:00
Zuul	3585e0cc3e	Merge "Drop code from Host maintenance strategy migrating instance to disabled hosts"	2025-05-16 18:18:26 +00:00
Zuul	ba8370e1ad	Merge "Migrate value column of efficacy indicator on load"	2025-05-16 18:16:23 +00:00
Zuul	97c4e70847	Merge "Add test for missing destination in zone migration"	2025-05-16 17:10:18 +00:00
jgilaber	c6302edeca	Handle missing dst_node parameter in zone_migration For compute nodes, nova works fine if a destination node is not specified, so this change makes sure we're not passing None when the user does not set one to avoid an error. Partial-Bug: 2108988 Change-Id: Ida1f18b97697c041819e29f935aa5e232848226a	2025-05-16 13:51:47 +02:00
Alfredo Moralejo	0651fff910	Added unit test to validate audit creation with no goal and no name This patch is adding a new unit test to validate the behavior of the API when trying to create an audit without a goal (whether using a goal or audit template parameters) and no name is provided. Related-Bug: https://bugs.launchpad.net/watcher/+bug/2110947 Change-Id: I04df10a8a0eea4509856f2f4b9d11bae24cd563a	2025-05-16 11:13:52 +02:00
Alfredo Moralejo	b36ba8399e	Add unit test to check action plan state when a nested action fails This patch is adding a new unit test to check the behavior of the action plan when one of the actions in it fails during execution. Note this is to show a bug, and the expected state will be changed in the fixing patch. Related-Bug: #2106407 Change-Id: I2f3fe8f4da772a96db098066d253e5dee330101a	2025-05-16 09:52:28 +02:00
Alfredo Moralejo	4629402f38	Return HTTP code 400 when creating an audit with wrong parameters Currently, when trying to create an audit which misses a mandatory parameter, watcher returns error 500 instead of 400, which is the documented error in the API [1] and the appropriate error code for malformed requests. This patch catches parameter validation errors according to the json schema for each strategy and returns error 400. It also fixes the unit test to validate the expected behavior. [1] https://docs.openstack.org/api-ref/resource-optimization/#audits Closes-Bug: #2110538 Change-Id: I23232b3b54421839bb01d54386d4e7b244f4e2a0	2025-05-16 09:35:50 +02:00
Zuul	86a260a2c7	Merge "Set keystone_client default interface to public"	2025-05-15 12:45:52 +00:00
jgilaber	63626d6fc3	Add test for missing destination in zone migration Add some tests to show that the zone migration strategy generates problematic input parameters for actions in some cases when destination parameters are not passed for instances or volumes. Change-Id: Idc3af0e6d9d2d5388ff3d152d81e63364758607b	2025-05-15 13:00:39 +02:00
afanasev.s	0f5b6a07d0	Fix incorrect logging format Fix incorrect logging format for multiple variables, because of which this functionality didn't work correctly and some log messages were skipped. The logging calls require two arguments, but they are passed in a tuple, so it's interpreted as one argument only and it fails as it is missing the second argument. Closes-Bug: 2110149 Change-Id: I74ed44134b50782c105a0e82f3af34a5fa45d119	2025-05-15 12:55:18 +02:00
jgilaber	7d90a079b0	Check logs in some cinder and nova helper tests Check the debug logs for some methods in the cinder and nova helpers to reproduce the errors described in bug [1]. The logger is disabled by default, so the error was being ignored; in order to show the error, the logger needs to be enabled for the tests in question. The logging was disabled by alembic configuring logging in [2], so this patch also removes that logging config to expose the errors. [1] https://bugs.launchpad.net/watcher/+bug/2110149. [2] https://github.com/openstack/watcher/blob/master/watcher/db/sqlalchemy/alembic/env.py#L26 Change-Id: I3598ca1d08d260602c392f8a8098821faa53f570	2025-05-15 12:55:18 +02:00
Alfredo Moralejo	891119470c	Add a unit test to check the error when creating an audit with wrong parameters Currently, it is returning http error code 500 instead of 400, which would be the appropriate code. A follow-up patch will be sent with the fix and switching the error code and message. Related-Bug: #2110538 Change-Id: I35ccbb9cf29fc08e78c4d5f626a6518062efbed3	2025-05-14 17:01:59 +02:00
Chandan Kumar (raukadah)	9dea55bd64	Drop code from Host maintenance strategy migrating instance to disabled hosts Currently host maintenance strategy also migrates instances from maintenance node to watcher_disabled compute nodes. watcher_disabled compute nodes might be disabled for some other purpose by a different strategy. If host maintenance uses those compute nodes for migration, it might affect customer workloads. Host maintenance strategy should never touch disabled hosts unless the user specifies a disabled host as backup node. This cr drops the logic for using disabled compute node for maintenance. Host maintenance is already using nova scheduler for migrating the instance, will use the same. If there is no available node, strategy will fail. Closes-Bug: #2109945 Change-Id: If9795fd06f684eb67d553405cebd8a30887c3997 Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>	2025-05-14 09:24:25 +05:30
Douglas Viroel	b4ef969eec	Remove deprecated executor in message handling servers Removes the deprecated message executor when creating both RPC and notification server instances. This parameter is deprecated[1], as well as the eventlet option. When not defined, the server will get the one that best fits the current context (monkey patched or not)[2] [1] `27d833e374` [2] `412ab4de92/oslo_messaging/_utils.py (L87)` Change-Id: I784407aa7db10bddcec5dc663e1cec65174631e0	2025-05-13 14:10:18 -03:00
jgilaber	322c89d982	Migrate value column of efficacy indicator on load In a recent change [1] we modified the database schema for efficacy indicators to use a 'data' column. However, that patch only contained the schema migration and a fallback to be able to read from older databases, and not any kind of data migration. This change introduces a migration on load, so whenever an efficacy indicator without a 'data' column is loaded, the column is populated in the database. The change also modifies the migration test to verify the procedure works well. [1] https://review.opendev.org/c/openstack/watcher/+/945199 Change-Id: Ib0621b0e03451faca803018d6a2f3ad657a25fb5	2025-05-13 16:36:59 +02:00
Zuul	59607f616a	Merge "Drop nova command reference from the code"	2025-05-13 12:39:25 +00:00


2704 Commits