Nova changed the default notification_format from "both" to
"unversioned" in Train [1]. Without configuring nova in the
grenade job we are not testing the nova versioned notification
handler code during upgrades.
Note that grenade only runs stack.sh on the base (old) side so
this change has to depend on a devstack stable/stein change to
add the NOVA_NOTIFICATION_FORMAT variable that we override.
Closes-Bug: #1831917
Depends-On: Ied9d50b07c368d5c2be658c744f340a8d1ee41e0
[1] https://review.opendev.org/603079/
Change-Id: I94c2d14477da185310e0fec596a1ad6436b802f1
Nova used to emit versioned and unversioned notiifcations
by default but that changed in https://review.opendev.org/603079/
so now nova emits only unversioned notifications by default.
Watcher listens for versioned notifications so we need to configure
nova to emit both versioned (for Watcher) and unversioned
(for Ceilometer) notifications explicitly.
This adds an override-defaults file so devstack will load up
the nova devstack variable to set the notification_format before
importing and stacking the nova lib script.
Note that this only fixes the non-grenade CI jobs since grenade
requires separate handling for overriding defaults which is proving
hard to do and will be addressed in a separate change.
Partial-Bug: #1831917
Change-Id: I7e441608b38338eecd80e663ed3abe66a89e504f
Ceilometer removed cpu_util metric in [1].
Another metric compute.node.cpu.percent need to set
compute_monitors option to cpu.virt_driver in the
nova.conf, we should remind user about these.
[1]: https://review.opendev.org/#/c/580709/
Change-Id: I89306ef7c26fa2927945bd4f3ee88b670511d147
This implements the configuration parameters to implement
Grafana as a datasource including the influxdb translator
Change-Id: If1f27dc01e853c5b24bdb21f1e810f64eaee2e5c
Partially-implements: blueprint grafana-proxy-datasource
Replaces the NoSuchMetric exception that was replaced. The exception
is replaced with MetricNotAvailable and test cases are added to prevent
regression.
The changes in the exceptions were introduced in:
https://review.opendev.org/#/c/658127/
Change-Id: Id0f872e916aaa5dec59ed1ae6c0f653553b3fe46
In get_node_by_instance_uuid, an exception ComputeNodeNotFound
will be thrown if can't find a node through instance uuid.
But the exception information replaces the node name with
instance uuid, which is misleading, so we define a new exception.
Closes-Bug: #1832156
Change-Id: Ic6c44ae44da7c3b9a1c20e9b24a036063af266ba
Moves the query_retry method into the baseclass and makes the query
retry and timeout options part of the watcher_datasources config group.
This makes the query_retry behavior uniform across all datasources.
A new baseclass method named query_retry_reset is added so datasources
can define operations to perform when recovering from a query error.
Test cases are added to verify the behavior of query_retry.
The query_max_retries and query_timeout config parameters are
deprecated in the gnocchi_client group and will be removed in a future
release.
Change-Id: I33e9dc2d1f5ba8f83fcf1488ff583ca5be5529cc
We should be starting from stable/stein on the "old" side
of grenade runs now. Rather than hard-code the branch, just
use the BASE_DEVSTACK_BRANCH variable.
Change-Id: I1b0406f870ed0ae5622cfa7421a6cca00d0f891c
When receiving Nova notification instance.create.end,
map instance to its node after adding instance to datamodel.
Related-Bug: #1832156
Change-Id: I6f39e8d935195c611f668f71590e1d9ff52ced0d
In Python, when we use @property, the method will be
decorated by property.
When we call method self.strategy.datasource_backend()[1],
Actually it did two things:
1. call self.strategy.datasource_backend()
2. according to the method's return value[2], call self._datasource_backend()
[1]. https://github.com/openstack/watcher/blob/bd8636f3f/watcher/tests/decision_engine/strategy/strategies/test_base.py#L87
[2]. https://github.com/openstack/watcher/blob/bd8636f3f/watcher/decision_engine/strategy/strategies/base.py#L368
But in this part, we just want it to perform the first step.
So we have to use self.strategy.datasource_backend instead of
self.strategy.datasource_backend()
The reason why the unittest does not report an error is
because the returned value is a mock object, and the second step
is executed without error, for example:
python -m unittest watcher.tests.decision_engine.strategy.strategies.test_base
(Pdb) x=self.strategy.datasource_backend
(Pdb) type(x)
<class 'mock.mock.MagicMock'>
(Pdb) x
<MagicMock name='DataSourceManager().get_backend()' id='139740418102608'>
(Pdb) x()
<MagicMock name='DataSourceManager().get_backend()()' id='139740410824976'>
(Pdb) self.strategy.datasource_backend()
<MagicMock name='DataSourceManager().get_backend()()' id='139740410824976'>
To make the tests more robust, the underlying backend function
is mocked to be not callable.
Co-Authored-By: Matt Riedemann <mriedem.os@gmail.com>
Change-Id: I3305d9afe8ed79e1dc3affe02ba067ac06cece42
This patch added Placement to Watcher
We plan to improve the data model and strategies in
the future specs.
Change-Id: I7141459eef66557cd5d525b5887bd2a381cdac3f
Implements: blueprint support-placement-api
This makes the ConfFixture extend the Config fixture from
oslo.config which handles cleanup for us. The module level
import_opt calls are also removed since they are no longer
needed.
Change-Id: I869e89c53284c8da45e0b1293f2d35011f5bfbf9
In the process of creating an instance, Nova will emit an
instance.update notification with 'building' state.
This will cause a KeyError exception because this instance
isn't in Watcher datamodel.
So we should ignore the notification instance.update with
'building' state.
Closes-Bug: #1832154
Change-Id: I950eec50d2cee38bd22c47a70ae6f88bbf049080
Now there are some errors when running apidoc,
actually we don't need apidoc, so remove it.
Closes-Bug: #1831515
Change-Id: I3b91a2c05ed62ae7bbd30a29e9db51d0e021410f
The get_compute_node_by_hostname method is given a
compute service hostname and then does two queries to
find the matching hypervisor (compute node) with details:
1. List hypervisors with details and find the one that
matches the given compute service hostname.
2. Using that node, search for hypervisors with the
matching hypervisor_hostname.
There are two issues here:
1. The first query is inefficient in that it has to list
all hypervisors in the deployment to try and match the
one with the compute service hostname client side.
2. The second query is a fuzzy match on the server side [1]
so even though we have matched on the node we want,
get_compute_node_by_name can still return more than
one hypervisor which will result in the helper method
raising ComputeNodeNotFound. Consider having compute
hosts with names compute1, compute10, compute11, compute100,
and so on. The fuzzy match on compute1 would return all of
those hypervisors.
For non-ironic nodes in nova, the compute service host and
hypervisor should be 1:1, meaning the hypervisor.service['host']
should be the same as hypervisor.hypervisor_hostname. Knowing
this, we can simplify the code to search just on the given
compute service hostname and if we get more than one result, it
is because of the fuzzy match and we can then do our client-side
filtering on the compute service hostname.
[1] https://github.com/openstack/nova/blob/d4f58f5eb/nova/db/sqlalchemy/api.py#L676
Change-Id: I84f387982f665d7cc11bffe8ec390cc7e7ed5278
The nova CDM builder code and notification handling
code had some inefficiencies when it came to looking
up a hypevisor to get details. The general pattern
used before was:
1. get the minimal hypervisor information by hypervisor_hostname
2. make another query to get the hypervisor details by id
In the notifications case, it was actually three calls because
the first is listing hyprvisors to filter client-side by service
host.
This change collapses 1 and 2 above into a single API call
to get the hypervisor by hypervisor_hostname with details
which will include the service (compute) host information
which is what get_compute_node_by_id() was being used for.
Now that nothing is using get_compute_node_by_id it is removed.
There is more work we could do in get_compute_node_by_hostname
if the compute API allowed filtering hypervisors by service
host so a TODO is left for that.
One final thing: the TODO in get_compute_node_by_hostname about
there being more than one hypervisor per compute service host
for vmware vcenter is not accurate - nova's vcenter driver
hasn't supported a host:node 1:M topology like that since the
Liberty release [1]. The only in-tree driver in nova that supports
1:M is the ironic baremetal driver, so the comment is updated.
[1] Ifc17c5049e3ed29c8dd130339207907b00433960
Depends-On: https://review.opendev.org/661785/
Change-Id: I5e0e88d7b2dd1a69117ab03e0e66851c687606da
Some of the methods for retrieving data about instances was placed
at the bottom of nova_helper instead of being close to the other
instance based methods.
Change-Id: I68475883529610e514aa82f1881105ab0cf24ec3