Commit Graph

669 Commits

Author SHA1 Message Date
m
3f26dc47f2 Add prometheus data source for watcher decision engine
This adds a new data source for the Watcher decision engine that
implements the watcher.decision_engine.datasources.DataSourceBase.

related spec was merged at [1].

Implements: blueprint prometheus-datasource

[1] https://review.opendev.org/c/openstack/watcher-specs/+/933300

Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906
2025-01-10 15:20:37 +02:00
Sean Mooney
5fadd0de57 [pre-commit] Fix execute and shebang lines
This commit removes the execute bit from several files
and remove the shebang lines from the devstack plugin.

While the devstack plugin is written in bash, it is not an executable
script. The devstack plugin is sourced by devstack as needed,
as such it is not executed in a subshell and the #!/bin/bash
lines are not used even when present.

Change-Id: I82ca22b7a47bf267fe6cf11f3e3519510108c146
2024-11-07 20:12:59 +00:00
Sean Mooney
5f79ab87c7 [pre-commit] fix typos and configure codespell
This chanage enabled codespell in precommit and
fixes the existing typos.

A followup commit will enable this in tox and ci.

Change-Id: I0a11bcd5a88247a48d3437525fc8a3cb3cdd4e58
2024-11-07 19:50:21 +00:00
Sean Mooney
9d8b990fd1 [pre-commit] Add initial pre-commit config
This change adds configuration for the pre-commit tool,
follow-up changes will address the remaining issues in a phased
approach to make the reviews simpler.

This is based on the pre-commit config used in nova
with some additional hooks.

Follow-up changes will address the FIXME comments
related to sphinx-lint and codespell, as well as update tox
to enforce these checks in ci.

Change-Id: I87681a19f7fa88366c2b0d310c8b3153aa6a137b
2024-10-22 20:12:53 +01:00
Takashi Natsume
61a7dd85ca Replace deprecated datetime.utcnow()
The datetime.utcnow() is deprecated in Python 3.12.
Replace datetime.utcnow() with oslo_utils.timeutils.utcnow().
This bumps oslo.utils to 7.0.0.

Change-Id: Icccbb0549add686a744a72b354932471cbf91c92
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2024-10-02 22:24:47 +09:00
Takashi Kajinami
566a830f64 Bump hacking
hacking 3.0.x is quite old. Bump it to the current latest version.

Change-Id: I8d87fed6afe5988678c64090af261266d1ca20e6
2024-09-22 23:54:36 +09:00
Lucian Petrut
c95ce4ec17 Add MAAS support
At the moment, Watcher can use a single bare metal provisioning
service: Openstack Ironic.

We're now adding support for Canonical's MAAS service [1], which
is commonly used along with Juju [2] to deploy Openstack.

In order to do so, we're building a metal client abstraction, with
concrete implementations for Ironic and MAAS. We'll pick the MAAS
client if the MAAS url is provided, otherwise defaulting to Ironic.

For now, we aren't updating the baremetal model collector since it
doesn't seem to be used by any of the existing Watcher strategy
implementations.

[1] https://maas.io/docs
[2] https://juju.is/docs

Implements: blueprint maas-support

Change-Id: I6861995598f6c542fa9c006131f10203f358e0a6
2023-12-11 10:21:33 +00:00
Lucian Petrut
424e9a76af vm workload consolidation: use actual host metrics
The "vm workload consolidation" strategy is summing up instance
usage in order to estimate host usage.

The problem is that some infrastructure services (e.g. OVS or Ceph
clients) may also use a significant amount of resources, which
would be ignored. This can impact Watcher's ability to detect
overloaded nodes and correctly rebalance the workload.

This commit will use the host metrics, if available. The proposed
implementation uses the maximum value between the host metric
and the sum of the instance metrics.

Note that we're holding a dict of host metric deltas in order to
account for planned migrations.

Change-Id: I82f474ee613f6c9a7c0a9d24a05cba41d2f68edb
2023-10-27 21:54:42 +03:00
Zuul
40e93407c7 Merge "Handle deprecated "cpu_util" metric" 2023-10-27 09:47:38 +00:00
Zuul
721aec1cb6 Merge "vm workload consolidation: allow cold migrations" 2023-10-27 09:47:36 +00:00
Zuul
8a3ee8f931 Merge "Improve vm_consolidation logging" 2023-10-27 09:20:13 +00:00
Lucian Petrut
00fea975e2 Handle deprecated "cpu_util" metric
The "cpu_util" metric has been deprecated a few years ago.
We'll obtain the same result by converting the cumulative cpu
time to a percentage, leveraging the rate of change aggregation.

Change-Id: I18fe0de6f74c785e674faceea0c48f44055818fe
2023-10-24 10:47:23 +00:00
Lucian Petrut
fd6562382e Avoid performing retries in case of missing resources
There may be no available metrics for instances that are stopped
or were recently spawned. This makes retries unnecessary and time
consuming.

For this reason, we'll ignore gnocchi MetricNotFound errors.

Change-Id: I79cd03bf04db634b931d6dfd32d5150f58e82044
2023-10-23 14:14:21 +00:00
Lucian Petrut
ec90891636 Improve vm_consolidation logging
We're adding a few info log messages in order to trace the
"vm consolidation" strategy more easily.

Change-Id: I8ce1a9dd173733f1b801839d3ad0c1269c4306bb
2023-10-23 14:10:02 +00:00
Lucian Petrut
7336a48057 vm workload consolidation: allow cold migrations
Although Watcher supports cold migrations, the vm workload
consolidation workflow only allows live migrations to be
performed.

We'll remove this unnecessary limitation so that stopped instances
could be cold migrated.

Change-Id: I4b41550f2255560febf8586722a0e02045c3a486
2023-10-23 13:03:18 +00:00
Lucian Petrut
922478fbda Unblock the CI gate
The Nova collector json schema validation started [1][2] failing after
the jsonschema upper constraint was bumped from 4.17.3 to 4.19.1 [3].

The reason is that jsonschema v4.18.0a1 switched to a reference
resolving library [4], which treats the aggregate "id" as a jsonschema
id and expects it to be a string [5]. For this reason, we're now getting
AttributeError exceptions.

As a workaround, we'll rename the "id" ref element as "host_aggr_id".

Also, the watcher-tempest-multinode job is configured to use Focal,
which is no longer supported by Devstack [6]. That being considered,
we'll switch to Ubuntu Jammy (22.04).

While at it, we're disabling Cinder Backup, which isn't used while
testing Watched. It currently causes Devstack failures since it
uses the Swift backend by default, which is disabled.

[1] https://paste.opendev.org/raw/bjQ1uIdbDMnmA1UEhxLL/
[2] https://paste.opendev.org/raw/bNgxqulBwBLYB7tNhrU4/
[3] ab0dcbdda2
[4] https://github.com/python-jsonschema/jsonschema/releases/tag/v4.18.0a1
[5] c23a5dc1c9/referencing/jsonschema.py (L54-L55C18)
[6] https://paste.openstack.org/raw/bSoSyXgbtmq6d9768HQn/

Change-Id: I300620c2ec4857b1e0d402a9b57a637f576eeb24
2023-10-23 09:21:55 +03:00
BubaVV
0610070e59 Add timeout option for Grafana request
Implemented config option to setup Grafana API request timeout

Change-Id: I8cbf8ce22f199fe22c0b162ba1f419169881f193
2023-08-23 17:46:19 +03:00
chenker
4ea3eada3e Fix watcher comment
Change-Id: I4512cf1032e08934886d5e3ca858b3e05c3da76c
2023-08-13 00:00:12 +00:00
chenker
52da088011 Modify saving_energy log info
Change-Id: I84879a453aa3ff78917d1136c62978b9d0e606de
2023-02-07 10:20:04 +00:00
chenker
6dd2f2a9c1 BugFix: Prevent float type variables from being passed to random
>>> random.sample([5,10], 1.3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/random.py", line 321, in sample
    result = [None] * k
TypeError: can't multiply sequence by non-int of type 'float'

Change-Id: Ifa5dca06f07220512579e4fe3c5c741aeffc71cc
2021-08-23 01:58:52 +00:00
Takashi Kajinami
a993849928 Use Block Storage API v3 instead of API v2
Block Storage API v2 was deprecated during Pike cycle and is being
removed during Xena cycle, and current v3 API should be used instead.

Change-Id: Ia5247742b31f5f07186ef908588f0972d3ac609f
2021-07-27 11:04:16 +09:00
sue
c28756c48b use HTTPStatus instead of direct code
Python introduced http.HTTPStatus since version 3.5,
and Wallaby has targeted a minimum version of python 3.6.

Change-Id: I45f732f0f59b8fae831bb6c07f4fdd98cdd7409a
2021-07-09 11:02:36 +02:00
zhufl
204b276693 Fix missing self argument in instances_no_attached
instances_no_attached should have self as the first argument, this is
to add it.

Change-Id: I010d9d1e9ddb8790c398bcf06d0772a0d17f57ec
2020-11-27 17:01:52 +08:00
zhufl
af02bebca9 Fix parameter passed to IronicNodeNotFound exception
IronicNodeNotFound expects uuid parameter for the error message,
not name.

Change-Id: I9fefa98fa9fe6f6491e5f621190cac7d376db6c9
2020-11-02 15:48:27 +08:00
xuanyandong
16a0486655 Remove six
Replace the following items with Python 3 style code.

- six.string_types
- six.integer_types
- six.moves
- six.PY2

Implements: blueprint six-removal

Change-Id: I2a0624bd4b455c7e5a0617f1253efa05485dc673
2020-09-30 16:25:13 +08:00
Dantali0n
cca0d9f7d7 Implements base method for time series metrics
Implements base method as well as some basic implementations to
retrieve time series metrics. Ceilometer can not be supported
as API documentation has been unavailable. Grafana will be
supported in follow-up patch.

Partially Implements: blueprint time-series-framework

Change-Id: I55414093324c8cff379b28f5b855f41a9265c2d3
2020-08-26 16:01:15 +02:00
licanwei
9f0138e1cf Check if scope is None
if scope is None, don't create data model

Change-Id: Icf611966c9b0a3882615d778ee6c72a8da73841d
Closed-Bug: #1881920
2020-06-18 00:58:16 +00:00
zhangbailin
f0f15f89c6 Remove future imports
These particular imports are no longer needed in a Python 3-only world.

Change-Id: I5e9e15556c04871c451f6363380f2a7ac026c968
2020-05-02 00:33:39 +00:00
chenke
0ef0f165cb Remove six[7]
Since our code will only support py3. So remove six is necessary.

Change-Id: I3738118b1898421ee41e9e2902c255ead73f3915
2020-04-22 15:59:15 +08:00
Andreas Jaeger
1bb2aefec3 Update hacking for Python3
The repo is Python 3 now, so update hacking to version 3.0 which
supports Python 3.

Fix problems found.

Update local hacking checks for new flake8.

Remove hacking and friends from lower-constraints, they are not needed
to be installed at run-time.

Change-Id: Ia6af344ec8441dc98a0820176373dcff3a8c80d5
2020-04-02 07:50:02 +02:00
Zuul
8835576374 Merge "Add audit type: event" 2020-01-10 03:30:03 +00:00
zhufl
db709691be Fix duplicated words issue like "an active instance instance"
This is to fix the duplicated words issue like
"Pick up an active instance instance to migrate".

Change-Id: I74de4eb06aa1e462f0b499e3fd62a7cdc7570b31
2020-01-06 15:29:25 +08:00
licanwei
6a173a9161 Add audit type: event
This patchset added a new audit type: event,
and the handler to execute event audit.

Partially Implements: blueprint event-driven-optimization-based

Change-Id: I287471ee4d1dcc42af7a6bcc15f8509d4ce73072
2019-12-13 15:14:41 +08:00
Zuul
b7baa88010 Merge "Use threadpool when building compute data model" 2019-11-30 02:23:51 +00:00
Zuul
65ec309050 Merge "General purpose threadpool for decision engine" 2019-11-30 02:22:13 +00:00
licanwei
4a269ba039 Change self.node to self.nodes in model_root
networkx removed G.node in version 2.4[1]
G.node was replaced by G.nodes since version 2.0[2],
and supports Python 2.7, 3.5, 3.6 and 3.7 from 2.2
so the lower constraint version is 2.2.
lib task_flow also invokes lib networkx,
task_flow version is also needed to be updated.
[1]: https://networkx.github.io/documentation/stable/release/release_2.4.html
[2]: https://networkx.github.io/documentation/stable/release/release_2.0.html
Change-Id: I268bcf57ec977bd8132a9f1573b28b681cb4ce1e
Closes-Bug: #1854132
2019-11-27 17:19:29 +08:00
licanwei
689ae25ef5 Refactoring the codes about getting used and free resources
We have provided functions to get used and free resources in
class ModelRoot. So strategies can invoke the functions to
get used and free resources.

Change-Id: I3c74d56539ac6c6eb16b0d254a76260bc791567c
2019-11-12 16:22:09 +08:00
Dantali0n
c644e23ca0 Use threadpool when building compute data model
Use the general purpose threadpool when building the nova compute
data model. Additionally, adds thorough explanation about theory of
operation.

Updates related test cases to better ensure the correct operation
of add_physical_layer.

Partially Implements: blueprint general-purpose-decision-engine-threadpool

Change-Id: I53ed32a4b2a089b05d1ffede629c9f4c5cb720c8
2019-11-01 13:44:15 +01:00
Dantali0n
2b6ee38327 General purpose threadpool for decision engine
Implements the singleton general purpose threadpool for the decision
engine and associated tests.

A threadpool is a collection of one or more threads typically called
'workers' to which tasks can be submitted. These submitted tasks will
be scheduled by the threadpool and subsequently executed. How many
tasks will be executed concurrently is managed by the underlying
threadpool and its configuration. In Python the submission of tasks
to a threadpool returns an object called a 'future'. Futures provide
a method to interface with the task being executed that allows to
retrieve information about its state. Such as if it currently is being
executed, if it is waiting on a condition and if it has completed
succesfully. Finally, futures allow to retrieve what has been returned
by the submitted task.

In the case of most OpenStack projects instead of interfacing with native
Python concurrency the futurist library is used. This library provides
very similar interfaces to native concurrency with some extras such as
the wait_for_any method.

For more information about futurist or Python concurrency the following
references can be consulted:
https://docs.python.org/3/library/concurrent.futures.html
https://docs.openstack.org/futurist/latest/reference/index.html#executors

Partially Implements: blueprint general-purpose-decision-engine-threadpool

Change-Id: I94bd9a17290967f011762f2b9c787ee7c46ff930
2019-11-01 11:33:59 +01:00
licanwei
f685bf62ab Don't throw exception when missing metrics
When querying data from datasource, it's possible to miss some data.
In this case if we throw an exception, Audit will failed because of
the exception. We should remove the exception and give the decision
to the strategy.

Change-Id: I1b0e6b78b3bba4df9ba16e093b3910aab1de922e
Closes-Bug: #1847434
2019-10-16 21:01:39 -07:00
Zuul
d13adb4e3c Merge "Fix damodel list return None error When has a compute node" 2019-09-23 08:46:56 +00:00
Zuul
d11f43638c Merge "Fix misspelling" 2019-09-23 06:52:51 +00:00
chenke
2b63a35e86 Fix damodel list return None error When has a compute node
Reason:
When there is a compute node but no virtual machine,
the command 'watcher datamodel list' should display
the information of the compute node instead of return None.

Change-Id: Id5ff7f08ac8a9883af9f0313785b756d813ed5a2
Closes-Bug: #1844948
2019-09-23 14:04:16 +08:00
licanwei
4869449211 Fix misspelling
Depends-on: Ifa21645e19b8311de35cb9b0b8f39371b8713a2f
Depends-on:Ic2ae4cb758eba32f1b1529a24d12a57ca93a2a82
Change-Id: Ia1ea9814b8293e2bed989d700c7a813a97052f04
2019-09-23 04:29:47 +00:00
licanwei
42c1babfa4 skip deleted instance when creating datamodel
Change-Id: Ic2ae4cb758eba32f1b1529a24d12a57ca93a2a82
Closes-Bug: #1844949
2019-09-22 21:26:31 -07:00
Zuul
511a24e011 Merge "Set strategy planner" 2019-09-19 02:25:22 +00:00
licanwei
0559cd7a04 Set strategy planner
The default planner can not create actions with right order,
The node_resouce_consolidation strategy needs to use its
own planner.
Partially Implements: blueprint node-resource-consolidation
Depends-on: I586e67f782e2965234826634ba3ff51681af4df8
Change-Id: I05b02905a3335a73b6926966de6331c632842293
2019-09-17 19:42:32 -07:00
licanwei
0c191a2da9 Get planner from solution
support Strategy select different planner
Implements: bp watcher-planner-selector

Change-Id: I586e67f782e2965234826634ba3ff51681af4df8
2019-09-17 19:36:07 -07:00
Zuul
62020cac30 Merge "Watcher Planner Selector" 2019-09-17 07:42:44 +00:00
Zuul
004a46a98a Merge "Add node resource consolidation planner" 2019-09-17 04:44:21 +00:00