Currently we are using `instance` label to query about host metrics to
prometheus. This label is assigned to the url of each endpoint being
scrapped.
While this work fine in one-exporter-per-compute cases as the driver is
mapping the fqdn_label value to the `instance` label value, it fails
when there are more that one target with the same value for the fqdn
label. This is a valid case, to be able to query by fqdn and do not
care about what exporter in the host is providing the metric.
This patch is changing the queries we use for hosts to be based on the
fqdn_label instead of the instance one. To implement it, we are also
simplifying the way we check the metric exist for the host by converting
prometheus_fqdn_instance_map into a prometheus_fqdn_labels set
which stores the list of fqdn found in prometheus.
Closes-Bug: #2103451
Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e
In order to support vm_workload_consolidation, workload_balance and
workload_stabilization strategis some instance metrics are required.
This patch is adding support for them.
Implementation is based on a prometheus store populated using sg-core
from ceilometer metrics with Pollster source.
- instance_ram_usage: rely on ceilometer_memory_usage metrics created from
ceilometer memory.usage meter.
- instance_ram_allocated: rely on the memory value provided by the
inventory created from nova and placement APIs.
- instance_cpu_usage: rely on ceilometer_cpu metric created from
ceilometer cpu meter. A max value of 100 is set in the query.
- instance_root_disk_size: rely on the `disk` value provided by the
inventory created from nova and placement APIs.
A new parameterer `instance_uuid_label` has been added to the prometheus
datasource configuration to identify the label used to store the value of the
OpenStack instance uuid for eache instance metric in prometheus. Default
value is `resource`.
Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2
This adds a new data source for the Watcher decision engine that
implements the watcher.decision_engine.datasources.DataSourceBase.
related spec was merged at [1].
Implements: blueprint prometheus-datasource
[1] https://review.opendev.org/c/openstack/watcher-specs/+/933300
Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906
This datasource requires Ceilometer API which was already removed some
years ago. The implementation should have been removed when dependency
on ceilometerclient was removed by [1].
Also remove some job definitions which are not actually used.
[1] 01d74d0a87
Change-Id: I29c3865dc1207f1bbbb266e4217cf8888afebfb6
This change corrects the detected sphinx-linit issue in the existing
docs and updates the contributor devstack guide to call out
required and advanced.
mostly the changes were simple fixes like replacing the configurable
default rule with explict literal syntax `term` -> ``term``
some inline Note: comments have been promoted to .. note:: blocks
and literal blocks :: have been promoted to .. code-block:: <language>
directives.
Change-Id: I6320c313d22bf542ad407169e6538dc6acf79901
This chanage enabled codespell in precommit and
fixes the existing typos.
A followup commit will enable this in tox and ci.
Change-Id: I0a11bcd5a88247a48d3437525fc8a3cb3cdd4e58
This change adds configuration for the pre-commit tool,
follow-up changes will address the remaining issues in a phased
approach to make the reviews simpler.
This is based on the pre-commit config used in nova
with some additional hooks.
Follow-up changes will address the FIXME comments
related to sphinx-lint and codespell, as well as update tox
to enforce these checks in ci.
Change-Id: I87681a19f7fa88366c2b0d310c8b3153aa6a137b
The "vm workload consolidation" strategy is summing up instance
usage in order to estimate host usage.
The problem is that some infrastructure services (e.g. OVS or Ceph
clients) may also use a significant amount of resources, which
would be ignored. This can impact Watcher's ability to detect
overloaded nodes and correctly rebalance the workload.
This commit will use the host metrics, if available. The proposed
implementation uses the maximum value between the host metric
and the sum of the instance metrics.
Note that we're holding a dict of host metric deltas in order to
account for planned migrations.
Change-Id: I82f474ee613f6c9a7c0a9d24a05cba41d2f68edb
The "cpu_util" metric has been deprecated a few years ago.
We'll obtain the same result by converting the cumulative cpu
time to a percentage, leveraging the rate of change aggregation.
Change-Id: I18fe0de6f74c785e674faceea0c48f44055818fe
Since installing watcher dashboard is fixed in devstack deployments
we can update documentation so it recommends to install dashboard
plugin.
Change-Id: I284a1ec31536ea258cc1979ffd46b22d3e1ac18b
As per the community goal of migrating the policy file
the format from JSON to YAML[1], we need to do two things:
1. Change the default value of '[oslo_policy] policy_file''
config option from 'policy.json' to 'policy.yaml' with
upgrade checks.
2. Deprecate the JSON formatted policy file on the project side
via warning in doc and releasenotes.
Also replace policy.json to policy.yaml ref from doc and tests.
[1]https://governance.openstack.org/tc/goals/selected/wallaby/migrate-policy-format-from-json-to-yaml.html
Change-Id: I207c02ba71fe60635fd3406c9c9364c11f259bae
Switch to openstackdocstheme 2.2.1 and reno 3.1.0 versions. Using
these versions will allow especially:
* Linking from HTML to PDF document
* Allow parallel building of documents
* Fix some rendering problems
Update Sphinx version as well.
Set openstackdocs_pdf_link to link to PDF file. Note that
the link to the published document only works on docs.openstack.org
where the PDF file is placed in the top-level html directory. The
site-preview places the PDF in a pdf directory.
Set openstackdocs_auto_name to False to use 'project' variable as name.
Change pygments_style to 'native' since old theme version always used
'native' and the theme now respects the setting and using 'sphinx' can
lead to some strange rendering.
Remove docs requirements from lower-constraints, they are not needed
during install or test but only for docs building.
openstackdocstheme renames some variables, so follow the renames
before the next release removes them. A couple of variables are also
not needed anymore, remove them.
See also
http://lists.openstack.org/pipermail/openstack-discuss/2020-May/014971.html
Change-Id: Ia9a3fb804fb59bb70edc150a3eb20c07a279170b
Error when importing wsmeext.sphinxext
Could not import extension wsmeext.sphinxext
(exception: cannot import name 'l_')
Change-Id: Id23c9c1fd35153d67d4ffb50dc1cd40f30b7ab41
Sphinx 3.0.0 breaks the building here, block it for now.
Depends-On: https://review.opendev.org/#/c/717949/
Change-Id: Ibf0c93ea79fec647fbf749257835f1fa99d5f59d
Documentation with details on general concurrency as well as OpenStack
specific libraries.
Describes how different libraries are effectively used across different
Watcher services. This includes describing how futurist is used in the
Decision Engine and how taskflow is used in the Applier.
Finally, this documentation describes how contributors can use the
new DecisionEngineThreadpool effectively and includes examples.
https://docs.openstack.org/futurist/latest/https://docs.openstack.org/taskflow/latest/
Change-Id: Ic1cd1f3733a0e9a239c9b8d49951e1e4ece49f3a
Partially Implements: blueprint general-purpose-decision-engine-threadpool
Sphinx 1.8 introduced [1] the '--keep-going' argument which, as its name
suggests, keeps the build running when it encounters non-fatal errors.
This is exceptionally useful in avoiding a continuous edit-build loop
when undertaking large doc reworks where multiple errors may be
introduced.
[1] https://github.com/sphinx-doc/sphinx/commit/e3483e9b045
Change-Id: If2bbfd8ae6d1fc75cbc494578310c1dc03c367e6
Add a new pdf-docs environment to enable PDF build.
sphinxcontrib-svg2pdfconverter is used to handle SVG properly.
Change-Id: I1563579486da8912ba8a220bb08a5331e7df910b
The api documentation is now published on docs.openstack.org instead
of developer.openstack.org. Update all links that are changed to the
new location.
Note that redirects will be set up as well but let's point now to the
new location.
For details, see:
http://lists.openstack.org/pipermail/openstack-discuss/2019-July/007828.html
Change-Id: I4101eced9c4bd26741f760e5651204f5d2dfea0f
This patch does two things:
1. replace instance's human_id with name.
2. remove ComputeNode human_id.
Now name field in Watcher Compute Data Model is availible.
Use name is better than human_id. For the reason, please see[1].
[1]. https://bugs.launchpad.net/watcher/+bug/1833665
Change-Id: I04f40e7d2a2bda48e9a362f9d0b23f449c40324e
Ceilometer removed cpu_util metric in [1].
Another metric compute.node.cpu.percent need to set
compute_monitors option to cpu.virt_driver in the
nova.conf, we should remind user about these.
[1]: https://review.opendev.org/#/c/580709/
Change-Id: I89306ef7c26fa2927945bd4f3ee88b670511d147
Now there are some errors when running apidoc,
actually we don't need apidoc, so remove it.
Closes-Bug: #1831515
Change-Id: I3b91a2c05ed62ae7bbd30a29e9db51d0e021410f
The [nova_client]/api_version defaults to 2.56 since
change Idd6ebc94f81ad5d65256c80885f2addc1aaeaae1. There
is compatibility code for that change but if 2.56 is
not available watcher_non_live_migrate_instance will
still fail if a destination host is used.
Since 2.56 has been available since the Queens version of
nova it should be reasonable to require at least that
version of nova is running for using Watcher.
This adds code which enforces the minimum version along
with a release note and "watcher-status upgrade check"
check method.
Note that it's kind of weird for watcher to have a config
option like nova_client.api_version since compute API
microversions are per API request even though novaclient
is constructed with the single configured version. It should
really be something the client (watcher in this case) determines
using version discovery and gracefully enables features if
the required nova API version is available, but that's a bigger
change.
Change-Id: Id34938c7bb8a5ca934d997e52cac3b365414c006