In the regular case, a specific metric for a specific host is provided
by a single instance (exporter), so aggregating by label and by
instance should give the same result. Still, it is more correct to
aggregate by the same label that we use to filter the metrics.
This is a follow-up to https://review.opendev.org/c/openstack/watcher/+/944795
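As a rough illustration only (this is not the driver code), a query that
filters and aggregates on the same label could be built as follows. The
helper name, the metric name and the `fqdn` label name are assumptions
made for the example.

```python
def build_host_query(metric, label, host, aggregate="avg"):
    """Build a PromQL query that filters and aggregates on the same label."""
    # Filtering on `label` and grouping by `label` yields one series per
    # host, even if several exporters report the metric for that host.
    return f'{aggregate} by ({label}) ({metric}{{{label}="{host}"}})'


query = build_host_query(
    "node_memory_MemAvailable_bytes", "fqdn", "compute-0.example.com")
```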
Related-Bug: #2103451
Change-Id: Ia61f051547ddc51e0d1ccd5a56485ab49ce84c2e
Currently we use the `instance` label to query Prometheus for host
metrics. This label is set to the url of each endpoint being
scraped.
While this works fine in one-exporter-per-compute cases, as the driver
maps the fqdn_label value to the `instance` label value, it fails
when there is more than one target with the same value for the fqdn
label. This is a valid case: it should be possible to query by fqdn
without caring which exporter on the host provides the metric.
This patch changes the queries we use for hosts to be based on the
fqdn_label instead of the instance one. To implement it, we also
simplify the way we check that the metric exists for the host by
converting prometheus_fqdn_instance_map into a prometheus_fqdn_labels
set, which stores the fqdn values found in Prometheus.
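A minimal sketch of the idea, not the actual implementation: the target
structure below is only loosely modelled on what a Prometheus targets
listing contains, and the helper name and default label name are
hypothetical.

```python
def collect_fqdn_labels(targets, fqdn_label="fqdn"):
    """Return the set of fqdn label values found in a list of targets."""
    return {
        target["labels"][fqdn_label]
        for target in targets
        if fqdn_label in target.get("labels", {})
    }


# Two exporters on the same host share one fqdn value.
targets = [
    {"labels": {"instance": "10.0.0.1:9100", "fqdn": "compute-0.example.com"}},
    {"labels": {"instance": "10.0.0.1:9101", "fqdn": "compute-0.example.com"}},
    {"labels": {"instance": "10.0.0.2:9100", "fqdn": "compute-1.example.com"}},
]
fqdn_labels = collect_fqdn_labels(targets)

# Checking whether a host is known becomes a plain set membership test.
host_is_known = "compute-0.example.com" in fqdn_labels
```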
Closes-Bug: #2103451
Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e
The Monasca project was marked inactive during 2023.1. Although we have
seen multiple people showing interest in keeping the project, we haven't
seen any real progress.
Because the project is likely to be retired soon, let's deprecate the
feature dependent on Monasca so that we can remove it in a future release.
Change-Id: Ifd64f5ba59bbac238ff62302ec36a3e36954d6d0
More refactoring of the SQLAlchemy database layer to improve
compatibility with eventlet on newer Pythons.
Inspired by 0ce2c41404
Related-Bug: 2067815
Change-Id: Ib5e9aa288232cc1b766bbf2a8ce2113d5a8e2f7d
Run the bandit check from pre-commit so that the check is executed in
the pep8 job.
Also remove requirements installed automatically by pre-commit from
test-requirements.
Change-Id: I45af8c47afb262882ebbee74ae52446fed741e26
In order to support the vm_workload_consolidation, workload_balance and
workload_stabilization strategies, some instance metrics are required.
This patch adds support for them.
The implementation is based on a Prometheus store populated using sg-core
from ceilometer metrics with Pollster source.
- instance_ram_usage: rely on ceilometer_memory_usage metrics created from
ceilometer memory.usage meter.
- instance_ram_allocated: rely on the memory value provided by the
inventory created from nova and placement APIs.
- instance_cpu_usage: rely on ceilometer_cpu metric created from
ceilometer cpu meter. A max value of 100 is set in the query.
- instance_root_disk_size: rely on the `disk` value provided by the
inventory created from nova and placement APIs.
A new parameter `instance_uuid_label` has been added to the prometheus
datasource configuration to identify the label used to store the value of the
OpenStack instance uuid for each instance metric in prometheus. Default
value is `resource`.
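For illustration, a capped CPU usage query keyed on the instance uuid
label could be assembled like this. This is a sketch under assumptions:
the exact query used by the datasource may differ, ceilometer_cpu is
treated as cumulative CPU time in nanoseconds, and the helper name, the
`vcpus` normalization and the 300 second window are hypothetical.

```python
def build_instance_cpu_query(uuid, vcpus, uuid_label="resource", period=300):
    """Build a PromQL query for instance CPU usage, capped at 100 percent."""
    # Select the series of one instance through the configurable uuid label.
    selector = f'ceilometer_cpu{{{uuid_label}="{uuid}"}}'
    # Assumption: nanoseconds of CPU time per second, normalized by vCPUs.
    usage = f"rate({selector}[{period}s]) / 1e9 / {vcpus} * 100"
    # clamp_max is the PromQL function that enforces the upper bound.
    return f"clamp_max({usage}, 100)"


query = build_instance_cpu_query("abc", vcpus=2)
```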
Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2
This adds a new data source for the Watcher decision engine that
implements watcher.decision_engine.datasources.DataSourceBase.
The related spec was merged at [1].
Implements: blueprint prometheus-datasource
[1] https://review.opendev.org/c/openstack/watcher-specs/+/933300
Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906
This datasource requires Ceilometer API which was already removed some
years ago. The implementation should have been removed when dependency
on ceilometerclient was removed by [1].
Also remove some job definitions which are not actually used.
[1] 01d74d0a87
Change-Id: I29c3865dc1207f1bbbb266e4217cf8888afebfb6
This commit removes the execute bit from several files
and removes the shebang lines from the devstack plugin.
While the devstack plugin is written in bash, it is not an executable
script. The devstack plugin is sourced by devstack as needed,
as such it is not executed in a subshell and the #!/bin/bash
lines are not used even when present.
Change-Id: I82ca22b7a47bf267fe6cf11f3e3519510108c146
This change enables codespell in pre-commit and
fixes the existing typos.
A follow-up commit will enable this in tox and CI.
Change-Id: I0a11bcd5a88247a48d3437525fc8a3cb3cdd4e58
This change adds configuration for the pre-commit tool.
Follow-up changes will address the remaining issues in a phased
approach to make the reviews simpler.
This is based on the pre-commit config used in nova
with some additional hooks.
Follow-up changes will address the FIXME comments
related to sphinx-lint and codespell, as well as update tox
to enforce these checks in ci.
Change-Id: I87681a19f7fa88366c2b0d310c8b3153aa6a137b
datetime.utcnow() is deprecated in Python 3.12.
Replace datetime.utcnow() with oslo_utils.timeutils.utcnow().
This bumps oslo.utils to 7.0.0.
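For reference, a standard-library sketch of obtaining a naive UTC
timestamp without the deprecated call. This is not the oslo.utils
implementation, and the helper name is hypothetical.

```python
from datetime import datetime, timezone


def utcnow_naive():
    """Return the current UTC time as a naive datetime."""
    # Take an aware UTC timestamp, then drop tzinfo so the value has the
    # same shape that datetime.utcnow() used to return.
    return datetime.now(timezone.utc).replace(tzinfo=None)


now = utcnow_naive()
```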
Change-Id: Icccbb0549add686a744a72b354932471cbf91c92
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
At the moment, Watcher can use a single bare metal provisioning
service: OpenStack Ironic.
We're now adding support for Canonical's MAAS service [1], which
is commonly used along with Juju [2] to deploy OpenStack.
In order to do so, we're building a metal client abstraction, with
concrete implementations for Ironic and MAAS. We'll pick the MAAS
client if the MAAS url is provided, otherwise defaulting to Ironic.
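The selection logic can be sketched roughly as follows. All class and
function names here are hypothetical placeholders, and the clients
return empty lists instead of calling the real services.

```python
import abc


class BaseMetalClient(abc.ABC):
    """Common interface for bare metal provisioning clients."""

    @abc.abstractmethod
    def list_nodes(self):
        """Return the nodes known to the provisioning service."""


class IronicClient(BaseMetalClient):
    def list_nodes(self):
        return []  # placeholder: a real client would query Ironic


class MaasClient(BaseMetalClient):
    def __init__(self, url):
        self.url = url

    def list_nodes(self):
        return []  # placeholder: a real client would query MAAS


def get_metal_client(maas_url=None):
    """Pick the MAAS client if a MAAS url is set, else default to Ironic."""
    if maas_url:
        return MaasClient(maas_url)
    return IronicClient()
```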
For now, we aren't updating the baremetal model collector since it
doesn't seem to be used by any of the existing Watcher strategy
implementations.
[1] https://maas.io/docs
[2] https://juju.is/docs
Implements: blueprint maas-support
Change-Id: I6861995598f6c542fa9c006131f10203f358e0a6
The "vm workload consolidation" strategy is summing up instance
usage in order to estimate host usage.
The problem is that some infrastructure services (e.g. OVS or Ceph
clients) may also use a significant amount of resources, which
would be ignored. This can impact Watcher's ability to detect
overloaded nodes and correctly rebalance the workload.
This commit will use the host metrics, if available. The proposed
implementation uses the maximum value between the host metric
and the sum of the instance metrics.
Note that we're holding a dict of host metric deltas in order to
account for planned migrations.
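A simplified sketch of the estimate. The function name is hypothetical,
and applying the planned-migration delta directly to the host metric is
an assumption about how the deltas are used.

```python
def estimate_host_usage(instance_usages, host_metric=None, planned_delta=0.0):
    """Estimate host usage from instance metrics and an optional host metric."""
    instance_total = sum(instance_usages)
    if host_metric is None:
        # No host metric available: fall back to the instance sum.
        return instance_total
    # Assumption: the delta adjusts the measured host value for migrations
    # that are planned but not yet reflected in the metric.
    return max(host_metric + planned_delta, instance_total)
```

For example, with instances using 10 and 20 units and a host metric of
50, the estimate is 50, since the host metric also covers
infrastructure services running on the node.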
Change-Id: I82f474ee613f6c9a7c0a9d24a05cba41d2f68edb
The "cpu_util" metric was deprecated a few years ago.
We'll obtain the same result by converting the cumulative cpu
time to a percentage, leveraging the rate of change aggregation.
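The conversion amounts to the following arithmetic. This is a sketch
that assumes the cumulative cpu time is expressed in nanoseconds; the
function name and the normalization by vCPU count are assumptions.

```python
def cpu_util_percent(cpu_ns_start, cpu_ns_end, elapsed_seconds, vcpus=1):
    """Convert a change in cumulative CPU time into a utilization percentage."""
    # CPU seconds consumed during the sampling window.
    used_seconds = (cpu_ns_end - cpu_ns_start) / 1e9
    # Fraction of the CPU time that was available during the window.
    return 100.0 * used_seconds / (elapsed_seconds * vcpus)
```

With 30 seconds of CPU time consumed over a 60 second window on one
vCPU, this yields 50 percent.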
Change-Id: I18fe0de6f74c785e674faceea0c48f44055818fe
There may be no available metrics for instances that are stopped
or were recently spawned. This makes retries unnecessary and time
consuming.
For this reason, we'll ignore gnocchi MetricNotFound errors.
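The behaviour can be sketched as a retry loop that gives up immediately
on a missing metric. The exception class is a local stand-in for the
gnocchi one, and the helper name and retry count are hypothetical.

```python
class MetricNotFound(Exception):
    """Local stand-in for the gnocchi MetricNotFound error."""


def query_with_retries(func, attempts=3):
    """Call func, retrying on errors except for missing metrics."""
    for attempt in range(attempts):
        try:
            return func()
        except MetricNotFound:
            # A missing metric is expected for stopped or new instances;
            # retrying would only waste time.
            return None
        except Exception:
            if attempt == attempts - 1:
                raise
    return None
```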
Change-Id: I79cd03bf04db634b931d6dfd32d5150f58e82044
We're adding a few info log messages in order to trace the
"vm consolidation" strategy more easily.
Change-Id: I8ce1a9dd173733f1b801839d3ad0c1269c4306bb
Although Watcher supports cold migrations, the vm workload
consolidation workflow only allows live migrations to be
performed.
We'll remove this unnecessary limitation so that stopped instances
could be cold migrated.
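One possible shape of the decision, shown only as a sketch. The state
names and the helper are assumptions, not the strategy's actual code.

```python
def choose_migration_type(instance_state):
    """Pick a migration type based on the instance state."""
    # Assumption: only running instances can be live migrated; anything
    # else, such as a stopped instance, is cold migrated.
    return "live" if instance_state == "active" else "cold"
```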
Change-Id: I4b41550f2255560febf8586722a0e02045c3a486
The Nova collector json schema validation started failing [1][2] after
the jsonschema upper constraint was bumped from 4.17.3 to 4.19.1 [3].
The reason is that jsonschema v4.18.0a1 switched to a reference
resolving library [4], which treats the aggregate "id" as a jsonschema
id and expects it to be a string [5]. For this reason, we're now getting
AttributeError exceptions.
As a workaround, we'll rename the "id" ref element as "host_aggr_id".
Also, the watcher-tempest-multinode job is configured to use Focal,
which is no longer supported by Devstack [6]. That being considered,
we'll switch to Ubuntu Jammy (22.04).
While at it, we're disabling Cinder Backup, which isn't used while
testing Watcher. It currently causes Devstack failures since it
uses the Swift backend by default, which is disabled.
[1] https://paste.opendev.org/raw/bjQ1uIdbDMnmA1UEhxLL/
[2] https://paste.opendev.org/raw/bNgxqulBwBLYB7tNhrU4/
[3] ab0dcbdda2
[4] https://github.com/python-jsonschema/jsonschema/releases/tag/v4.18.0a1
[5] c23a5dc1c9/referencing/jsonschema.py (L54-L55C18)
[6] https://paste.openstack.org/raw/bSoSyXgbtmq6d9768HQn/
Change-Id: I300620c2ec4857b1e0d402a9b57a637f576eeb24
>>> random.sample([5,10], 1.3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/random.py", line 321, in sample
    result = [None] * k
TypeError: can't multiply sequence by non-int of type 'float'
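The traceback shows random.sample() rejecting a non-integer sample
size. A minimal sketch of one way to avoid it, assuming the size comes
from a float computation and that truncating it is acceptable:

```python
import random

population = [5, 10]
requested = 1.3  # e.g. the result of a float computation

# random.sample() needs an int that is no larger than the population.
k = min(len(population), int(requested))
picked = random.sample(population, k)
```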
Change-Id: Ifa5dca06f07220512579e4fe3c5c741aeffc71cc
Block Storage API v2 was deprecated during the Pike cycle and is being
removed during the Xena cycle, so the current v3 API should be used instead.
Change-Id: Ia5247742b31f5f07186ef908588f0972d3ac609f
Python has provided http.HTTPStatus since version 3.5,
and Wallaby targets a minimum version of Python 3.6.
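A short example of what the enum offers in place of bare integer
status codes. The `is_success` helper is hypothetical.

```python
from http import HTTPStatus


def is_success(status_code):
    """Return True for 2xx status codes."""
    # HTTPStatus members are integers, so they compare with plain ints.
    return HTTPStatus.OK <= status_code < HTTPStatus.MULTIPLE_CHOICES


not_found = HTTPStatus.NOT_FOUND
```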
Change-Id: I45f732f0f59b8fae831bb6c07f4fdd98cdd7409a
Implements a base method as well as some basic implementations to
retrieve time series metrics. Ceilometer cannot be supported,
as its API documentation has been unavailable. Grafana will be
supported in a follow-up patch.
Partially Implements: blueprint time-series-framework
Change-Id: I55414093324c8cff379b28f5b855f41a9265c2d3
The repo is Python 3 now, so update hacking to version 3.0 which
supports Python 3.
Fix problems found.
Update local hacking checks for new flake8.
Remove hacking and friends from lower-constraints, as they do not need
to be installed at run-time.
Change-Id: Ia6af344ec8441dc98a0820176373dcff3a8c80d5
This patchset adds a new audit type, event,
and the handler to execute event audits.
Partially Implements: blueprint event-driven-optimization-based
Change-Id: I287471ee4d1dcc42af7a6bcc15f8509d4ce73072
We have provided functions to get used and free resources in
class ModelRoot, so strategies can invoke these functions
instead of computing the values themselves.
Change-Id: I3c74d56539ac6c6eb16b0d254a76260bc791567c
Use the general purpose threadpool when building the nova compute
data model. Additionally, add a thorough explanation of the theory of
operation.
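The general idea can be sketched with the standard library. Here
concurrent.futures stands in for the decision engine's general purpose
threadpool, and the function names and returned structure are
hypothetical placeholders, not the collector's code.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_node_info(hostname):
    """Placeholder for the per-node API calls made while building the model."""
    return {"hostname": hostname, "instances": []}


def build_physical_layer(hostnames, max_workers=4):
    """Fetch information for all nodes concurrently and index it by hostname."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs the calls in worker threads and keeps the input order.
        results = list(pool.map(fetch_node_info, hostnames))
    return {info["hostname"]: info for info in results}


model = build_physical_layer(["compute-0", "compute-1"])
```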
Updates related test cases to better ensure the correct operation
of add_physical_layer.
Partially Implements: blueprint general-purpose-decision-engine-threadpool
Change-Id: I53ed32a4b2a089b05d1ffede629c9f4c5cb720c8