The prometheus datasource was reporting host_ram_usage in MiB as
described in the docstring for the base datasource interface
definition [1].
However, the gnocchi datasource reports it in KiB, following the
ceilometer metric `hardware.memory.used` [2], and the strategies
using that metric expect it to be in KiB. The best approach is
therefore to change the unit in the prometheus datasource and update
the docstring to avoid misunderstandings in the future. This patch
fixes the prometheus datasource to return host_ram_usage in KiB
instead of MiB.
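The unit change amounts to a factor of 1024. A minimal sketch of the
conversion, assuming a hypothetical helper named `mib_to_kib` (it is
not part of the Watcher code base and only illustrates the arithmetic):

```python
KIB_PER_MIB = 1024


def mib_to_kib(value_mib):
    """Convert a memory amount from MiB to KiB.

    Hypothetical helper for illustration only; not a Watcher function.
    """
    return value_mib * KIB_PER_MIB


# A host using 2048 MiB of RAM is reported as 2097152 KiB.
print(mib_to_kib(2048))
```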
Additionally, it is adding more unit tests for the check_threshold
method so that it covers the memory based strategy execution, validates
the calculated standard deviation and adds the cases where it is below
the threshold.
[1] 15981117ee/watcher/decision_engine/datasources/base.py (L177-L183)
[2] https://docs.openstack.org/ceilometer/train/admin/telemetry-measurements.html#snmp-based-meters
Closes-Bug: #2113776
Change-Id: Idc060d1e709c0265c64ada16062c3a206c6b04fa
The workload_balance strategy calculates host metrics based on the
instance metrics and those are the ones used to compare with the
threshold.
Currently, the strategy does not report the calculated values, which
sometimes makes troubleshooting difficult. This patch adds a debug
message to log those values.
This patch also adds a new unit test for filter_destination_hosts
based on ram instead of cpu, and adds assertions for the new debug
messages. To implement the new test properly, I had to slightly modify
the ram usage fixtures used for the workload_balance tests.
Change-Id: Ief5e167afcf346ff53471f26adc70795c4b69f68
This patch adds a table to the strategies page to
show the level of qualification and where the
strategy can be triggered.
Change-Id: I6991566fd5fec3f8bbae06eefa63a8b83a87eed1
Currently, when the prometheus datasource queries the ceilometer_cpu
metric for instance cpu usage, it aggregates by instance and filters by
the label containing the instance uuid. While this works fine in real
scenarios, where a single metric is provided for a single instance, in
some cases, such as the CI jobs where metrics are directly injected, it
leads to incorrect metric calculation.
We applied a similar fix for the host metrics in [1] but we did not
implement it for instance cpu.
I am also converting the query formatting to the dict format to improve
readability.
[1] https://review.opendev.org/c/openstack/watcher/+/946049
Closes-Bug: #2113936
Change-Id: I3038dec20612162c411fc77446e86a47e0354423
get_disabled_compute_nodes_with_reason defined in host_maintenance
strategy is not used anywhere.
This cr drops the unused method.
Change-Id: I07c0d0b63e00d476511aa8b03c0feab8ec4db95b
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
This is an initial patch towards the eventlet removal in watcher.
It moves the cmd scripts that depend on eventlet to an eventlet dir,
where they are always monkey patched.
Change-Id: Ie23caab018fbf68f8c29a0f748c0708b97933b4b
Some service integrations are now classified as experimental,
and a warning message will now appear once a client is created
for them. These integrations are not fully tested in CI and
lack documentation on how they work or should be used.
A release note was added to inform users about the status of
these integrations and related features.
Change-Id: Ib7d0ac0b3e187ae239dfa075fb53a6c0107dff29
Adds a new documentation section that describes which service
integrations are currently supported and their integration status.
This information is not clear today, and the new section will help to
cover the lack of testing and documentation about them.
Change-Id: I26b2a2ef5672b78a575a2bdaef3a08d5bbc063bd
Currently, in that case, the call fails because watcher tries to create
a name based on a goal automatically and the goal is not defined.
This patch moves the check for goal specification in the audit
creation call earlier, and if no goal is defined, it returns an
invalid call error.
This patch is also modifying the existing error for this case to check
the expected behavior.
Closes-Bug: #2110947
Change-Id: I6f3d73b035e8081e86ce82c205498432f0e0fc33
The idea is to adapt zuul.yaml to the future test structure, where
every strategy will be in its own file. For now we keep executing
everything inside test_execute_strategies, but also any other test in
any file with the tag 'strategy'.
Change-Id: I304c858078d35beb1f7b4f1fad4ea8bedde674af
Currently, an actionplan state is set to SUCCEEDED once the execution
has finished, but that does not imply that all the actions finished
successfully.
This patch checks the actual state of all the actions in the plan
after the execution has finished. If any action has status FAILED, it
sets the state of the action plan to FAILED and applies the
appropriate notification parameters. This is the expected behavior
according to the Watcher documentation.
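The rule described above can be sketched as follows. This is a minimal
illustration, not the Watcher implementation: the function name
`final_plan_state` and the plain string constants are assumptions made
for the example.

```python
SUCCEEDED = "SUCCEEDED"
FAILED = "FAILED"


def final_plan_state(action_states):
    """Return FAILED if any action failed, otherwise SUCCEEDED.

    Hypothetical helper: action_states is an iterable of state strings
    collected after the execution of the plan has finished.
    """
    if any(state == FAILED for state in action_states):
        return FAILED
    return SUCCEEDED


# One failed action is enough to mark the whole plan as FAILED.
print(final_plan_state([SUCCEEDED, FAILED, SUCCEEDED]))
```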
The patch is also fixing the unit test for this to set the expected
action plan state to FAILED and notification parameters.
Closes-Bug: #2106407
Change-Id: I7bfc6759b51cd97c26ec13b3918bd8d3b7ac9d4e
Add a test for the zone migration strategy using the
with_attached_volume parameter, setting storage_pools but not
compute_nodes. With volumes attached to instances, with these inputs,
the strategy should propose an action plan to migrate volumes and the
instances they are attached to, since Nova, even without the user
passing a destination node for the instances, is able to find one.
However, the execution results in an error, since the strategy assumes
that the compute_nodes dict will always be there.
Change-Id: Ifac28b1aab8a0caf77d97e4c19d051e764256674
This change removes all the duplicate fields from the
watcher RequestContext.
It also removes several fields, like quota_class and
remote_address, that were cargo culted from nova
when notification support was added but were never
used in watcher.
Change-Id: Ibf8739d6cd2d4557df6f8de6c780b6f4280b774f
context.user has been deprecated for years
and renamed to user_id.
The deprecated field has now been removed, so this
change updates our test cases to reflect that.
Change-Id: I120441fb9392c370c57dc63d8c115d8993d25f62
For compute nodes, nova works fine if a destination node is not
specified, so this change makes sure we're not passing None when the
user does not set one to avoid an error.
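A minimal sketch of the idea, assuming a hypothetical helper
`build_migrate_params` and hypothetical parameter names (the real
strategy code is not reproduced here): the destination key is only
added when the user actually provided one, so None is never passed on.

```python
def build_migrate_params(instance_id, destination_node=None):
    """Build migration parameters, omitting the destination when unset.

    Hypothetical helper for illustration; names are assumptions.
    """
    params = {"instance_id": instance_id}
    if destination_node is not None:
        # Only forward a destination the user explicitly asked for.
        params["destination_node"] = destination_node
    return params


# Without a destination the key is absent, so nova picks the host.
print(build_migrate_params("instance-1"))
print(build_migrate_params("instance-1", "compute-2"))
```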
Partial-Bug: 2108988
Change-Id: Ida1f18b97697c041819e29f935aa5e232848226a
This patch is adding a new unit test to validate the behavior
of the API when trying to create an audit without a goal (whether using
a goal or audit template parameters) and no name is provided.
Related-Bug: https://bugs.launchpad.net/watcher/+bug/2110947
Change-Id: I04df10a8a0eea4509856f2f4b9d11bae24cd563a
This patch is adding a new unit test to check the behavior of the action
plan when one of the actions in it fails during execution.
Note this is to show a bug, and the expected state will be changed in
the fixing patch.
Related-Bug: #2106407
Change-Id: I2f3fe8f4da772a96db098066d253e5dee330101a
Currently, when trying to create an audit that is missing a mandatory
parameter, watcher returns error 500 instead of 400, which is the
documented error in the API [1] and the appropriate error code for
malformed requests.
This patch catches parameter validation errors according to the json
schema for each strategy and returns error 400. It also fixes the
unit test to validate the expected behavior.
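The pattern of turning a validation failure into a 400 response can be
sketched as below. Everything here is an assumption for illustration:
the exception class, the function names, and the simple required-key
check, which stands in for the real json schema validation.

```python
class InvalidParameters(Exception):
    """Hypothetical error raised when audit parameters are invalid."""


def validate_parameters(parameters, required):
    """Simplified stand-in for json schema validation of a strategy."""
    missing = sorted(key for key in required if key not in parameters)
    if missing:
        raise InvalidParameters(
            "missing required parameters: " + ", ".join(missing))


def create_audit(parameters, required):
    """Return an (http_status, message) pair for an audit request."""
    try:
        validate_parameters(parameters, required)
    except InvalidParameters as exc:
        # A malformed request is a client error, not a server error.
        return 400, str(exc)
    return 201, "created"


print(create_audit({}, ["threshold"]))
print(create_audit({"threshold": 45}, ["threshold"]))
```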
[1] https://docs.openstack.org/api-ref/resource-optimization/#audits
Closes-Bug: #2110538
Change-Id: I23232b3b54421839bb01d54386d4e7b244f4e2a0
Add some tests to show that the zone migration strategy generates
problematic input parameters for actions in some cases when destination
parameters are not passed for instances or volumes.
Change-Id: Idc3af0e6d9d2d5388ff3d152d81e63364758607b
Fix the incorrect logging format for multiple variables, which caused
this functionality to not work correctly and some log messages to be
skipped.
The logging calls require two arguments, but they are passed as a tuple,
so they are interpreted as a single argument and the call fails because
the second argument is missing.
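The failure mode can be reproduced with the standard library alone. In
this sketch the helper `render` and the message text are made up for
the example; it builds a LogRecord directly so that the formatting
error is visible instead of being swallowed by a handler.

```python
import logging


def render(msg, *args):
    """Format msg the way a logging call would, via getMessage()."""
    record = logging.LogRecord(
        "demo", logging.DEBUG, "demo.py", 0, msg, args, None)
    return record.getMessage()


# Correct: each value is its own positional argument.
good = render("volume %s moved to %s", "vol-1", "pool-b")

# Buggy: the tuple counts as one argument, so the second %s has no value.
try:
    render("volume %s moved to %s", ("vol-1", "pool-b"))
    bad = None
except TypeError as exc:
    bad = str(exc)

print(good)
print(bad)
```

With a real logger the handler reports this error instead of raising
it, so the message is lost rather than crashing the caller.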
Closes-Bug: 2110149
Change-Id: I74ed44134b50782c105a0e82f3af34a5fa45d119
Check the debug logs for some methods in the cinder and nova helpers to
reproduce the errors described in bug [1]. The logger is disabled by
default, so the error was being ignored. In order to show the error, the
logger needs to be enabled for the tests in question. The logging was
disabled by alembic configuring logging in [2], so this patch also
removes that logging config to expose the errors.
[1] https://bugs.launchpad.net/watcher/+bug/2110149
[2] https://github.com/openstack/watcher/blob/master/watcher/db/sqlalchemy/alembic/env.py#L26
Change-Id: I3598ca1d08d260602c392f8a8098821faa53f570
Currently, it is returning http error code 500 instead of 400, which
would be the appropriate code.
A follow-up patch will be sent with the fix, switching the error code
and message.
Related-Bug: #2110538
Change-Id: I35ccbb9cf29fc08e78c4d5f626a6518062efbed3
Currently, the host maintenance strategy also migrates instances from
the maintenance node to watcher_disabled compute nodes.
watcher_disabled compute nodes might be disabled for some other purpose
by a different strategy. If host maintenance uses those compute nodes
for migration, it might affect customer workloads.
The host maintenance strategy should never touch disabled hosts unless
the user specifies a disabled host as the backup node.
This change drops the logic for using disabled compute nodes for
maintenance. Host maintenance already uses the nova scheduler for
migrating instances and will keep doing so. If there is no available
node, the strategy will fail.
Closes-Bug: #2109945
Change-Id: If9795fd06f684eb67d553405cebd8a30887c3997
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
Removes the deprecated message executor when creating both RPC
and notification server instances. This parameter is deprecated [1],
as is the eventlet option.
When not defined, the server will pick the one that best fits the
current context (monkey patched or not) [2].
[1] 27d833e374
[2] 412ab4de92/oslo_messaging/_utils.py (L87)
Change-Id: I784407aa7db10bddcec5dc663e1cec65174631e0
In a recent change [1] we modified the database schema for efficacy
indicators to use a 'data' column. However, that patch only contained
the schema migration and a fallback to be able to read from older
databases, and not any kind of data migration. This change introduces
a migration on load, so whenever an efficacy indicator without a 'data'
column is loaded, the column is populated in the database. The change
also modifies the migration test to verify the procedure works well.
[1] https://review.opendev.org/c/openstack/watcher/+/945199
Change-Id: Ib0621b0e03451faca803018d6a2f3ad657a25fb5