With the events of eventlet removal, Watcher will need
to be adapted to support both modes, eventlet and threading, for
a couple of releases before removing all eventlet code.
This patch adds methods and classes that allow decision engine
modules to create futurist thread pools instead of green thread pools,
based on a environment variable that can be enabled by service.
It moves continuous audit handler instance to decison engine service,
so it can be started together with the main decision engine service.
Adds an environment variable that allows the user to disable
eventlet monkey patching and to use oslo.service threading backend.
Change-Id: I8a8be0a7cebdc44005fd77ec960543828c7da318
Signed-off-by: Douglas Viroel <viroel@gmail.com>
The prometheus datasource was reporting host_ram_usage in MiB as
described in the docstring for the base datasource interface
definition [1].
However, the gnocchi datasource is reporting it in KiB following
ceilometer metric `hardware.memory.used` [2] and the strategies
using that metric expect it to be in KiB so the best approach is
to change the unit in the prometheus datasource and update the
docstring to avoid missunderstandings in future. So, this patch
is fixing the prometheus datasource to return host_ram_usage
in KiB instead of MiB.
Additionally, it is adding more unit tests for the check_threshold
method so that it covers the memory based strategy execution, validates
the calculated standard deviation and adds the cases where it is below
the threshold.
[1] 15981117ee/watcher/decision_engine/datasources/base.py (L177-L183)
[2] https://docs.openstack.org/ceilometer/train/admin/telemetry-measurements.html#snmp-based-meters
Closes-Bug: #2113776
Change-Id: Idc060d1e709c0265c64ada16062c3a206c6b04fa
Some services integrations are now classified as experimental
and a warning message will now appear once a client is created
for them. These integrations are not fully tested in CI and
miss a documentation on how they work or should be used.
A release note was added to inform users about the status of
these integrations and related features.
Change-Id: Ib7d0ac0b3e187ae239dfa075fb53a6c0107dff29
Currently, in that case it was failing because watcher tried to create a
name based on a goal automatically and the goal is not defined.
This patch is moving the check for goal specification in the audit
creation call earlier, and if there is not goal defined, it returns an
invalid call error.
This patch is also modifying the existing error for this case to check
the expected behavior.
Closes-Bug: #2110947
Change-Id: I6f3d73b035e8081e86ce82c205498432f0e0fc33
Currently, an actionplan state is set to SUCCEEDED once the execution
has finished, but that does not imply that all the actions finished
successfully.
This patch is checking the actual state of all the actions in the plan
after the execution has finished. If any action has status FAILED, it
will set the state of the action plan as FAILED and will apply the
appropiate notification parameters. This is the expected behavior according
to Watcher documentation.
The patch is also fixing the unit test for this to set the expected
action plan state to FAILED and notification parameters.
Closes-Bug: #2106407
Change-Id: I7bfc6759b51cd97c26ec13b3918bd8d3b7ac9d4e
For compute nodes, nova works fine if a destination node is not
specified, so this change makes sure we're not passing None when the
user does not set one to avoid an error.
Partial-Bug: 2108988
Change-Id: Ida1f18b97697c041819e29f935aa5e232848226a
Currently, when trying to create an audit which misses a mandatory
parameter watcher returns error 500 instead of 400 which is the
documented error in the API [1] and the appropiate error code for
malformed requests.
This patch catch parameters validation errors according to the json
schema for each strategy and returns error 400. It also fixes the
unit test to validate the expected behavior.
[1] https://docs.openstack.org/api-ref/resource-optimization/#audits
Closes-Bug: #2110538
Change-Id: I23232b3b54421839bb01d54386d4e7b244f4e2a0
Currently host maintenance strategy also migrate instances from maintenance
node to watcher_disabled compute nodes.
watcher_disabled compute nodes might be disabled for some other purpose
by different strategy. If host maintenace use those compute nodes for
migration, It might affect customer workloads.
Host maintenance strategy should never touch disabled hosts unless the user
specify a disable host as backup node.
This cr drops the logic for using disabled compute node for maintenance.
Host maintaince is already using nova schedular for migrating the
instance, will use the same. If there is no available node, strategy
will fail.
Closes-Bug: #2109945
Change-Id: If9795fd06f684eb67d553405cebd8a30887c3997
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
Set the default interface for keystone_client to public in the watcher
conf instead of admin.
Closes-Bug: 2109494
Change-Id: I9e0289249981ca965190df6dbdc37e09fd0951d7
pip 23.1 removed the "setup.py install" fallback for projects that do
not have pyproject.toml and now uses a pyproject.toml which is vendored
in pip [1][2]. pip 24.2 has now deprecated a similar fallback to
"setup.py develop" and plans to fully remove this in pip 25.0 [3][4][5].
pbr supports editable installs since 6.0.0
pip 25.1 has now been released and the removal is complete.
by adding our own minimal pyproject.toml to ensure we are using the
correct build system.
This change also requires that we adapt how we generate our wsgi
entry point. when pyproject.toml is used the wsgi console script is
not generated in an editbale install such as is used in devstck
To adress this we need to refactor our usage of our wsgi applciation
to use a module path instead. This change does not remove
the declaration of our wsgi_scrtip entry point but it shoudl
be considered deprecated and it will be removed in the future.
To unblock the gate the devstack plugin is modifed to to deploy
using the wsgi module instead of the console script.
Finally supprot for the mod_wsgi wsgi mode is removed.
that was deprecated in devstack a few cycle ago and
support was removed in I8823e98809ed6b66c27dbcf21a00eea68ef403e8
[1] https://pip.pypa.io/en/stable/news/#v23-1
[2] https://github.com/pypa/pip/issues/8368
[3] https://pip.pypa.io/en/stable/news/#v24-2
[4] https://github.com/pypa/pip/issues/11457
[5] https://ichard26.github.io/blog/2024/08/whats-new-in-pip-24.2/
Closes-Bug: #2109608
Depends-on: https://review.opendev.org/c/openstack/watcher/+/948502
Change-Id: Iad77939ab0403c5720c549f96edfc77d2b7d90ee
Currently we are using `instance` label to query about host metrics to
prometheus. This label is assigned to the url of each endpoint being
scrapped.
While this work fine in one-exporter-per-compute cases as the driver is
mapping the fqdn_label value to the `instance` label value, it fails
when there are more that one target with the same value for the fqdn
label. This is a valid case, to be able to query by fqdn and do not
care about what exporter in the host is providing the metric.
This patch is changing the queries we use for hosts to be based on the
fqdn_label instead of the instance one. To implement it, we are also
simplifying the way we check the metric exist for the host by converting
prometheus_fqdn_instance_map into a prometheus_fqdn_labels set
which stores the list of fqdn found in prometheus.
Closes-Bug: #2103451
Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e
The Monasca project was marked inactive during 2023.1. Although we have
seen multiple people showing interest to keep the project, we haven't
seen any real progress.
Because the project is likely retired soon, let's deprecate the feature
dependent on Monasca so that we can remove it in a future release.
Change-Id: Ifd64f5ba59bbac238ff62302ec36a3e36954d6d0
In order to support vm_workload_consolidation, workload_balance and
workload_stabilization strategis some instance metrics are required.
This patch is adding support for them.
Implementation is based on a prometheus store populated using sg-core
from ceilometer metrics with Pollster source.
- instance_ram_usage: rely on ceilometer_memory_usage metrics created from
ceilometer memory.usage meter.
- instance_ram_allocated: rely on the memory value provided by the
inventory created from nova and placement APIs.
- instance_cpu_usage: rely on ceilometer_cpu metric created from
ceilometer cpu meter. A max value of 100 is set in the query.
- instance_root_disk_size: rely on the `disk` value provided by the
inventory created from nova and placement APIs.
A new parameterer `instance_uuid_label` has been added to the prometheus
datasource configuration to identify the label used to store the value of the
OpenStack instance uuid for eache instance metric in prometheus. Default
value is `resource`.
Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2
This adds a new data source for the Watcher decision engine that
implements the watcher.decision_engine.datasources.DataSourceBase.
related spec was merged at [1].
Implements: blueprint prometheus-datasource
[1] https://review.opendev.org/c/openstack/watcher-specs/+/933300
Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906
This datasource requires Ceilometer API which was already removed some
years ago. The implementation should have been removed when dependency
on ceilometerclient was removed by [1].
Also remove some job definitions which are not actually used.
[1] 01d74d0a87
Change-Id: I29c3865dc1207f1bbbb266e4217cf8888afebfb6
This chanage enabled codespell in precommit and
fixes the existing typos.
A followup commit will enable this in tox and ci.
Change-Id: I0a11bcd5a88247a48d3437525fc8a3cb3cdd4e58
This change adds configuration for the pre-commit tool,
follow-up changes will address the remaining issues in a phased
approach to make the reviews simpler.
This is based on the pre-commit config used in nova
with some additional hooks.
Follow-up changes will address the FIXME comments
related to sphinx-lint and codespell, as well as update tox
to enforce these checks in ci.
Change-Id: I87681a19f7fa88366c2b0d310c8b3153aa6a137b
As per the community goal of migrating the policy file
the format from JSON to YAML[1], we need to do two things:
1. Change the default value of '[oslo_policy] policy_file''
config option from 'policy.json' to 'policy.yaml' with
upgrade checks.
2. Deprecate the JSON formatted policy file on the project side
via warning in doc and releasenotes.
Also replace policy.json to policy.yaml ref from doc and tests.
[1]https://governance.openstack.org/tc/goals/selected/wallaby/migrate-policy-format-from-json-to-yaml.html
Change-Id: I207c02ba71fe60635fd3406c9c9364c11f259bae
Add the releasenote for the general purpose decision engine threadpool.
Including config parameters and how contributors can find relevant
documentation.
Implements: blueprint general-purpose-decision-engine-threadpool
Change-Id: I3560069b4e34f13305950559a0f05f7921f7867e
This strategy is used to centralize VMs to as few nodes as possible
by VM migration. User can set a input parameter to decide how to
select the destination node.
Implements: blueprint node-resource-consolidation
Closes-Bug: #1843016
Change-Id: I104c864d532c2092f5dc6f0c8f756ebeae12f09e
Add call_retry method for ModelBuilder classes along with configuration
options. This allows ModelBuilder classes to reattempt any failed calls
to external services such as Nova or Ironic.
Change-Id: Ided697adebed957e5ff13b4c6b5b06c816f81c4a
This is the releasenote for the new grafana datasource it refers to
the documentation on configuring grafana.
Depends-on: Ib12b6a7882703e84a27c301e821c1a034b192508
Change-Id: Icb3939d772f06ad2d66eeba9a59fa8b60822ece0
This patch implements uWSGI support for Watcher API service.
Because mod_wsgi is deprecated, using uwsgi to replace of mod_wsgi.
Most of Openstack projects have finished it.
Closes-Bug: #1834392
Change-Id: I3fad8d30a15aba493fb91da9337c2515ddea5167
Moves the query_retry method into the baseclass and makes the query
retry and timeout options part of the watcher_datasources config group.
This makes the query_retry behavior uniform across all datasources.
A new baseclass method named query_retry_reset is added so datasources
can define operations to perform when recovering from a query error.
Test cases are added to verify the behavior of query_retry.
The query_max_retries and query_timeout config parameters are
deprecated in the gnocchi_client group and will be removed in a future
release.
Change-Id: I33e9dc2d1f5ba8f83fcf1488ff583ca5be5529cc
This patch added Placement to Watcher
We plan to improve the data model and strategies in
the future specs.
Change-Id: I7141459eef66557cd5d525b5887bd2a381cdac3f
Implements: blueprint support-placement-api