Commit Graph

110 Commits

Author SHA1 Message Date
Douglas Viroel
f879b10b05 Extend decision engine to support threading mode
With the events of eventlet removal, Watcher will need
to be adapted to support both modes, eventlet and threading, for
a couple of releases before removing all eventlet code.
This patch adds methods and classes that allow decision engine
modules to create futurist thread pools instead of green thread pools,
based on a environment variable that can be enabled by service.
It moves continuous audit handler instance to decison engine service,
so it can be started together with the main decision engine service.
Adds an environment variable that allows the user to disable
eventlet monkey patching and to use oslo.service threading backend.

Change-Id: I8a8be0a7cebdc44005fd77ec960543828c7da318
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-05 16:45:48 -03:00
Zuul
e64709ea08 Merge "Add warning message for experimental integrations" 2025-07-03 17:27:39 +00:00
Alfredo Moralejo
6ea362da0b Use KiB as unit for host_ram_usage when using prometheus datasource
The prometheus datasource was reporting host_ram_usage in MiB as
described in the docstring for the base datasource interface
definition [1].

However, the gnocchi datasource is reporting it in KiB following
ceilometer metric `hardware.memory.used` [2] and the strategies
using that metric expect it to be in KiB so the best approach is
to change the unit in the prometheus datasource and update the
docstring to avoid missunderstandings in future. So, this patch
is fixing the prometheus datasource to return host_ram_usage
in KiB instead of MiB.

Additionally, it is adding more unit tests for the check_threshold
method so that it covers the memory based strategy execution, validates
the calculated standard deviation and adds the cases where it is below
the threshold.

[1] 15981117ee/watcher/decision_engine/datasources/base.py (L177-L183)
[2] https://docs.openstack.org/ceilometer/train/admin/telemetry-measurements.html#snmp-based-meters

Closes-Bug: #2113776
Change-Id: Idc060d1e709c0265c64ada16062c3a206c6b04fa
2025-06-19 16:25:27 +02:00
Douglas Viroel
520ec0b79b Add warning message for experimental integrations
Some services integrations are now classified as experimental
and a warning message will now appear once a client is created
for them. These integrations are not fully tested in CI and
miss a documentation on how they work or should be used.
A release note was added to inform users about the status of
these integrations and related features.

Change-Id: Ib7d0ac0b3e187ae239dfa075fb53a6c0107dff29
2025-06-07 11:33:28 -03:00
Zuul
73f8728d22 Merge "Fix audit creation with no name and no goal or audit_template" 2025-06-05 13:39:38 +00:00
Alfredo Moralejo
bf6a28bd1e Fix audit creation with no name and no goal or audit_template
Currently, in that case it was failing because watcher tried to create a
name based on a goal automatically and the goal is not defined.

This patch is moving the check for goal specification in the audit
creation call earlier, and if there is not goal defined, it returns an
invalid call error.

This patch is also modifying the existing error for this case to check
the expected behavior.

Closes-Bug: #2110947

Change-Id: I6f3d73b035e8081e86ce82c205498432f0e0fc33
2025-06-04 14:46:36 +02:00
Zuul
58b25101e6 Merge "Return HTTP code 400 when creating an audit with wrong parameters" 2025-05-27 19:23:25 +00:00
Zuul
20f231054a Merge "Set actionplan state to FAILED if any action has failed" 2025-05-26 14:44:37 +00:00
Alfredo Moralejo
88d81c104e Set actionplan state to FAILED if any action has failed
Currently, an actionplan state is set to SUCCEEDED once the execution
has finished, but that does not imply that all the actions finished
successfully.

This patch is checking the actual state of all the actions in the plan
after the execution has finished. If any action has status FAILED, it
will set the state of the action plan as FAILED and will apply the
appropiate notification parameters. This is the expected behavior according
to Watcher documentation.

The patch is also fixing the unit test for this to set the expected
action plan state to FAILED and notification parameters.

Closes-Bug: #2106407
Change-Id: I7bfc6759b51cd97c26ec13b3918bd8d3b7ac9d4e
2025-05-26 14:58:03 +02:00
Zuul
26e36e1620 Merge "Handle missing dst_node parameter in zone_migration" 2025-05-20 17:14:29 +00:00
Zuul
3585e0cc3e Merge "Drop code from Host maintenance strategy migrating instance to disabled hosts" 2025-05-16 18:18:26 +00:00
jgilaber
c6302edeca Handle missing dst_node parameter in zone_migration
For compute nodes, nova works fine if a destination node is not
specified, so this change makes sure we're not passing None when the
user does not set one to avoid an error.

Partial-Bug: 2108988

Change-Id: Ida1f18b97697c041819e29f935aa5e232848226a
2025-05-16 13:51:47 +02:00
Alfredo Moralejo
4629402f38 Return HTTP code 400 when creating an audit with wrong parameters
Currently, when trying to create an audit which misses a mandatory
parameter watcher returns error 500 instead of 400 which is the
documented error in the API [1] and the appropiate error code for
malformed requests.

This patch catch parameters validation errors according to the json
schema for each strategy and returns error 400. It also fixes the
unit test to validate the expected behavior.

[1] https://docs.openstack.org/api-ref/resource-optimization/#audits

Closes-Bug: #2110538
Change-Id: I23232b3b54421839bb01d54386d4e7b244f4e2a0
2025-05-16 09:35:50 +02:00
Zuul
86a260a2c7 Merge "Set keystone_client default interface to public" 2025-05-15 12:45:52 +00:00
Chandan Kumar (raukadah)
9dea55bd64 Drop code from Host maintenance strategy migrating instance to disabled hosts
Currently host maintenance strategy also migrate instances from maintenance
node to watcher_disabled compute nodes.

watcher_disabled compute nodes might be disabled for some other purpose
by different strategy. If host maintenace use those compute nodes for
migration, It might affect customer workloads.

Host maintenance strategy should never touch disabled hosts unless the user
specify a disable host as backup node.

This cr drops the logic for using disabled compute node for maintenance.
Host maintaince is already using nova schedular for migrating the
instance, will use the same. If there is no available node, strategy
will fail.

Closes-Bug: #2109945

Change-Id: If9795fd06f684eb67d553405cebd8a30887c3997
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-05-14 09:24:25 +05:30
Douglas Viroel
17d1cf535a Deprecated Noisy Neighbor strategy
Noisy neighbor strategy is a proof of concept strategy that was
built based on LLC metric, which is not available in Nova since
Victoria release[1].
This patch marks this strategy as deprecated, to be removed in
future releases.

[1] https://docs.openstack.org/releasenotes/nova/victoria.html#relnotes-22-0-0-unmaintained-victoria-upgrade-notes

Change-Id: I940b88555007312c76a86706bd44a38fbcf7701e
2025-05-12 15:44:39 -03:00
jgilaber
ae48f65f20 Set keystone_client default interface to public
Set the default interface for keystone_client to public in the watcher
conf instead of admin.

Closes-Bug: 2109494

Change-Id: I9e0289249981ca965190df6dbdc37e09fd0951d7
2025-05-09 08:16:51 +02:00
Sean Mooney
57b248f9fe Add support for pyproject.toml and wsgi module paths
pip 23.1 removed the "setup.py install" fallback for projects that do
not have pyproject.toml and now uses a pyproject.toml which is vendored
in pip [1][2]. pip 24.2 has now deprecated a similar fallback to
"setup.py develop" and plans to fully remove this in pip 25.0 [3][4][5].
pbr supports editable installs since 6.0.0

pip 25.1 has now been released and the removal is complete.
by adding our own minimal pyproject.toml to ensure we are using the
correct build system.

This change also requires that we adapt how we generate our wsgi
entry point. when pyproject.toml is used the wsgi console script is
not generated in an editbale install such as is used in devstck

To adress this we need to refactor our usage of our wsgi applciation
to use a module path instead. This change does not remove
the declaration of our wsgi_scrtip entry point but it shoudl
be considered deprecated and it will be removed in the future.

To unblock the gate the devstack plugin is modifed to to deploy
using the wsgi module instead of the console script.

Finally supprot for the mod_wsgi wsgi mode is removed.
that was deprecated in devstack a few cycle ago and
support was removed in I8823e98809ed6b66c27dbcf21a00eea68ef403e8

[1] https://pip.pypa.io/en/stable/news/#v23-1
[2] https://github.com/pypa/pip/issues/8368
[3] https://pip.pypa.io/en/stable/news/#v24-2
[4] https://github.com/pypa/pip/issues/11457
[5] https://ichard26.github.io/blog/2024/08/whats-new-in-pip-24.2/
Closes-Bug: #2109608

Depends-on: https://review.opendev.org/c/openstack/watcher/+/948502
Change-Id: Iad77939ab0403c5720c549f96edfc77d2b7d90ee
2025-05-01 00:19:59 +00:00
Alfredo Moralejo
a65e7e9b59 Query by fqdn_label instead of instance for host metrics
Currently we are using `instance` label to query about host metrics to
prometheus. This label is assigned to the url of each endpoint being
scrapped.

While this work fine in one-exporter-per-compute cases as the driver is
mapping the fqdn_label value to the `instance` label value, it fails
when there are more that one target with the same value for the fqdn
label. This is a valid case, to be able to query by fqdn and do not
care about what exporter in the host is providing the metric.

This patch is changing the queries we use for hosts to be based on the
fqdn_label instead of the instance one. To implement it, we are also
simplifying the way we check the metric exist for the host by converting
prometheus_fqdn_instance_map into a prometheus_fqdn_labels set
which stores the list of fqdn found in  prometheus.

Closes-Bug: #2103451
Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e
2025-03-19 15:25:24 +01:00
Sean Mooney
bbf5c41cab Add epoxy prelude
This change added the prelude for the 2025.1 Expoxy release cycle.

Change-Id: I8223842a57491a91c565e47bd1819db4d142e628
2025-03-05 17:57:55 +00:00
Takashi Kajinami
977f014cba Deprecate Monasca data source
The Monasca project was marked inactive during 2023.1. Although we have
seen multiple people showing interest to keep the project, we haven't
seen any real progress.

Because the project is likely retired soon, let's deprecate the feature
dependent on Monasca so that we can remove it in a future release.

Change-Id: Ifd64f5ba59bbac238ff62302ec36a3e36954d6d0
2025-02-16 18:45:31 +09:00
Zuul
4527f89d8d Merge "Add support for instance metrics to prometheus datasource" 2025-02-03 13:22:28 +00:00
Zuul
e535177bc0 Merge "Remove ceilometer datasource" 2025-01-29 13:22:46 +00:00
Alfredo Moralejo
136e5d927c Add support for instance metrics to prometheus datasource
In order to support vm_workload_consolidation, workload_balance and
workload_stabilization strategis some instance metrics are required.
This patch is adding support for them.

Implementation is based on a prometheus store populated using sg-core
from ceilometer metrics with Pollster source.

- instance_ram_usage: rely on ceilometer_memory_usage metrics created from
  ceilometer memory.usage meter.
- instance_ram_allocated: rely on the memory value provided by the
  inventory created from nova and placement APIs.
- instance_cpu_usage: rely on ceilometer_cpu metric created from
  ceilometer cpu meter. A max value of 100 is set in the query.
- instance_root_disk_size: rely on the `disk` value provided by the
  inventory created from nova and placement APIs.

A new parameterer `instance_uuid_label` has been added to the prometheus
datasource configuration to identify the label used to store the value of the
OpenStack instance uuid for eache instance metric in prometheus. Default
value is `resource`.

Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2
2025-01-23 13:23:04 +01:00
m
3f26dc47f2 Add prometheus data source for watcher decision engine
This adds a new data source for the Watcher decision engine that
implements the watcher.decision_engine.datasources.DataSourceBase.

related spec was merged at [1].

Implements: blueprint prometheus-datasource

[1] https://review.opendev.org/c/openstack/watcher-specs/+/933300

Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906
2025-01-10 15:20:37 +02:00
Zuul
70ba13ca6d Merge "Update python versions, drop py3.8" 2024-12-21 01:58:27 +00:00
Takashi Kajinami
da23fdc621 Remove ceilometer datasource
This datasource requires Ceilometer API which was already removed some
years ago. The implementation should have been removed when dependency
on ceilometerclient was removed by [1].

Also remove some job definitions which are not actually used.

[1] 01d74d0a87

Change-Id: I29c3865dc1207f1bbbb266e4217cf8888afebfb6
2024-12-16 23:51:27 +09:00
Sean Mooney
5f79ab87c7 [pre-commit] fix typos and configure codespell
This chanage enabled codespell in precommit and
fixes the existing typos.

A followup commit will enable this in tox and ci.

Change-Id: I0a11bcd5a88247a48d3437525fc8a3cb3cdd4e58
2024-11-07 19:50:21 +00:00
Martin Kopec
6adaedf696 Update python versions, drop py3.8
The current testing runtime [1] states testing from py3.9
to 3.12. The patch updates setup.cfg to reflect the correct
python versions.

The patch also drops python 3.8 support following [2].

[1] https://governance.openstack.org/tc/reference/runtimes/2025.1.html
[2] https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/FOWV4UQZTH4DPDA67QDEROAESYU5Z3LE/

Change-Id: I2d13409c9bfffc866e31af52611a26f6037021cc
2024-11-06 16:00:11 +01:00
Sean Mooney
9d8b990fd1 [pre-commit] Add initial pre-commit config
This change adds configuration for the pre-commit tool,
follow-up changes will address the remaining issues in a phased
approach to make the reviews simpler.

This is based on the pre-commit config used in nova
with some additional hooks.

Follow-up changes will address the FIXME comments
related to sphinx-lint and codespell, as well as update tox
to enforce these checks in ci.

Change-Id: I87681a19f7fa88366c2b0d310c8b3153aa6a137b
2024-10-22 20:12:53 +01:00
Ghanshyam Mann
863815153e [goal] Deprecate the JSON formatted policy file
As per the community goal of migrating the policy file
the format from JSON to YAML[1], we need to do two things:

1. Change the default value of '[oslo_policy] policy_file''
config option from 'policy.json' to 'policy.yaml' with
upgrade checks.

2. Deprecate the JSON formatted policy file on the project side
via warning in doc and releasenotes.

Also replace policy.json to policy.yaml ref from doc and tests.

[1]https://governance.openstack.org/tc/goals/selected/wallaby/migrate-policy-format-from-json-to-yaml.html

Change-Id: I207c02ba71fe60635fd3406c9c9364c11f259bae
2021-02-12 19:59:27 +00:00
Zuul
2591b03625 Merge "Add releasenote for event-driven-optimization-based" 2020-02-13 07:05:04 +00:00
licanwei
58083bb67b releasenotes: Fix reference url
Change-Id: I0da6021f6d39cb7d6e79e8f637046d8dd0285647
2020-02-05 16:48:49 +08:00
licanwei
f79321ceeb Add releasenote for event-driven-optimization-based
Change-Id: If8fa82dab2e7f0ae359805eb68cc8562cfc641e3
Implements: blueprint event-driven-optimization-based
2020-02-04 03:46:32 +00:00
Dantali0n
ba43f766b8 Releasenote for decision engine threadpool
Add the releasenote for the general purpose decision engine threadpool.
Including config parameters and how contributors can find relevant
documentation.

Implements: blueprint general-purpose-decision-engine-threadpool

Change-Id: I3560069b4e34f13305950559a0f05f7921f7867e
2019-11-30 03:13:15 +00:00
Ghanshyam Mann
17f5a65a62 [ussuri][goal] Drop python 2.7 support and testing
OpenStack is dropping the py2.7 support in ussuri cycle.

Watcher is ready with python 3 and ok to drop the
python 2.7 support.

Complete discussion & schedule can be found in
- http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010142.html
- https://etherpad.openstack.org/p/drop-python2-support

Ussuri Communtiy-wide goal:
https://governance.openstack.org/tc/goals/selected/ussuri/drop-py27.html

Depends-On: https://review.opendev.org/#/c/693631/

Change-Id: I603c6d2c22779e8ef2e70eb6369fc521a77c9c3a
2019-11-16 14:55:01 +00:00
licanwei
a88e076646 Watcher planner slector releasenote
Change-Id: I632a59d9e3cb6f5d0dad8987b1b01934d9ce0b42
Implements: bp watcher-planner-selector
2019-09-18 01:59:01 -07:00
Zuul
67e9e16d62 Merge "node resource consolidation" 2019-09-16 14:44:50 +00:00
chenke
03a6216da0 Add releasenote about bp show-datamodel-api
Partially Implements:blueprint show-datamodel-api

Change-Id: I2f8a41cd8f9f805bd3796cbd639bec233546b521
2019-09-10 09:39:10 +08:00
licanwei
f1fe4b6c62 node resource consolidation
This strategy is used to centralize VMs to as few nodes as possible
by VM migration. User can set a input parameter to decide how to
select the destination node.

Implements: blueprint node-resource-consolidation
Closes-Bug: #1843016
Change-Id: I104c864d532c2092f5dc6f0c8f756ebeae12f09e
2019-09-06 18:03:43 -07:00
licanwei
4b2238f9a5 add releasenote for bp improve-compute-data-model
Change-Id: I19780be28912cb0ea1cad49c7c0f43ab3ba8f6e7
Implements: blueprint improve-compute-data-model
2019-08-09 03:06:43 +00:00
Dantali0n
cadc000f32 Add call_retry for ModelBuilder for error recovery
Add call_retry method for ModelBuilder classes along with configuration
options. This allows ModelBuilder classes to reattempt any failed calls
to external services such as Nova or Ironic.

Change-Id: Ided697adebed957e5ff13b4c6b5b06c816f81c4a
2019-07-19 16:09:18 +02:00
Dantali0n
a45f5abe48 Releasenote for grafana datasource
This is the releasenote for the new grafana datasource it refers to
the documentation on configuring grafana.

Depends-on: Ib12b6a7882703e84a27c301e821c1a034b192508
Change-Id: Icb3939d772f06ad2d66eeba9a59fa8b60822ece0
2019-07-02 10:20:49 +02:00
licanwei
c1a5e443fe Add uWSGI support
This patch implements uWSGI support for Watcher API service.
Because mod_wsgi is deprecated, using uwsgi to replace of mod_wsgi.
Most of Openstack projects have finished it.

Closes-Bug: #1834392
Change-Id: I3fad8d30a15aba493fb91da9337c2515ddea5167
2019-06-27 14:56:52 +08:00
Zuul
667d2d661a Merge "Move datasource query_retry into baseclass." 2019-06-14 09:04:15 +00:00
Zuul
b2111baf91 Merge "Backwards compatibility for node parameter" 2019-06-14 07:42:09 +00:00
Zuul
5f126cffe0 Merge "Add Placement helper" 2019-06-14 02:21:44 +00:00
Dantali0n
584eeefdc8 Move datasource query_retry into baseclass.
Moves the query_retry method into the baseclass and makes the query
retry and timeout options part of the watcher_datasources config group.
This makes the query_retry behavior uniform across all datasources.

A new baseclass method named query_retry_reset is added so datasources
can define operations to perform when recovering from a query error.
Test cases are added to verify the behavior of query_retry.

The query_max_retries and query_timeout config parameters are
deprecated in the gnocchi_client group and will be removed in a future
release.

Change-Id: I33e9dc2d1f5ba8f83fcf1488ff583ca5be5529cc
2019-06-13 15:52:53 +02:00
Dantali0n
dd119ca1f8 Backwards compatibility for node parameter
Adds backwards compatibility for node parameter used by strategies. If
the node value is set by the user configuration it will override the
value for compute_node which is the value used by the strategies now.

This change was introduced in: https://review.opendev.org/#/c/656622/
Resolution discussed in the meeting on the 5th of June 2019
https://eavesdrop.openstack.org/meetings/watcher/2019/watcher.2019-06-05-08.00.log.html

Change-Id: Idaea062789a6b169e64f556fecc34cfbaaee5076
2019-06-12 16:23:58 +00:00
licanwei
b57feba5e8 Add Placement helper
This patch added Placement to Watcher
We plan to improve the data model and strategies in
the future specs.

Change-Id: I7141459eef66557cd5d525b5887bd2a381cdac3f
Implements: blueprint support-placement-api
2019-06-12 11:11:13 +08:00