In a recent patch [1], a bug in the zone migration strategy was fixed,
which prevented audits using this strategy from creating action plans
with both instance and volume migrations. We documented this limitation,
but forgot to remove the note when fixing this bug.
[1] https://review.opendev.org/c/openstack/watcher/+/952115
Change-Id: I2074f2b911dfcbf44716ff30d8ea35a5046b8520
Signed-off-by: jgilaber <jgilaber@redhat.com>
Removed the "Can Be Triggered from Horizon (UI)"
column and adjusted remaining column widths to
be equal.
Assisted-By: claude-sonnet-4 (Claude Code)
Signed-off-by: Ronelle Landy <rlandy@redhat.com>
Change-Id: I50eef1dee9071eeb532378bd5abcd1d994d299b5
Introduce a new user guide describing how to run continuous audits using
the dummy strategy. The guide covers:
- Overview and state machine
- Creating audits with interval and cron expressions
- Time window constraints (start/end time)
- Monitoring executions and action plan lifecycle
- Managing audits (stop/modify)
- Configuration reference and links to related specs
Closes-Bug: #2120437
Assisted-By: GPT-5 (Cursor)
Assisted-By: claude-sonnet-4 (Claude Code)
Change-Id: I842139271752cedb138e422027020488f22fe248
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
Scenario continuous audit tests are being added
but will not run by default, since not all stable
branches have the zone_migration fixes needed to
make tests stable.
Depends-On: https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954264
Change-Id: I5c49b251a49ee439bad024a1cf2569fcbeb2eaf1
Signed-off-by: Douglas Viroel <viroel@gmail.com>
The version history was not updated in the patch that
bumped the API to 1.6[1]. This patch adds the missing doc
and also sets 1.6 as the maximum API version for the latest release.
[1] https://review.opendev.org/c/openstack/watcher/+/955827
Closes-Bug: #2124938
Change-Id: I62473e84415896387fda8ca6d0982f78d2a1a9f1
Signed-off-by: Douglas Viroel <viroel@gmail.com>
When retrieving the list of instances and volumes to propose a
solution, the zone migration strategy can raise an exception for
instance or volume not found, which will make the audit go to a
failure state. This fix maintains the logic of listing all elements
directly from the client (nova) but now checks if the instance
is already in the model. The storage model check was already fixed
in another patch[1].
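
The check described above can be sketched as follows. This is only a
minimal illustration, not the strategy's actual code: `FakeComputeModel`,
`InstanceNotFound` and `instances_in_model` are hypothetical names, and
instances are represented by their UUID strings.

```python
class InstanceNotFound(Exception):
    """Hypothetical stand-in for the model's 'instance not found' error."""


class FakeComputeModel:
    """Hypothetical, minimal stand-in for the compute data model."""

    def __init__(self, known_uuids):
        self._known = set(known_uuids)

    def get_instance_by_uuid(self, uuid):
        if uuid not in self._known:
            raise InstanceNotFound(uuid)
        return uuid


def instances_in_model(nova_instance_uuids, model):
    """Keep only the instances listed by nova that the model also knows."""
    kept = []
    for uuid in nova_instance_uuids:
        try:
            model.get_instance_by_uuid(uuid)
        except InstanceNotFound:
            # Skip the instance instead of letting the error fail the audit.
            continue
        kept.append(uuid)
    return kept
```

With this shape, an instance that nova lists but the model does not yet
contain is simply left out of the proposed solution.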
[1] cb6fb16097
Closes-Bug: #2098984
Assisted-By: Cursor (claude-3.5-sonnet)
Change-Id: I4c8993f051b797104172047eaae1fe1523eaf7eb
Signed-off-by: Douglas Viroel <viroel@gmail.com>
The Zone Migration strategy was implemented to list all
instances and volumes from clients (nova and cinder) and
check if they exist in the models. But the code is not
properly treating model exceptions, taking audit to a failure
state when the model doesn't have the requested element.
This patch adds unit tests to validate this scenario, which
should be fixed in a follow up change.
The additional check for volumes in the model was recently
added in [1]
[1] cb6fb16097
Related-Bug: #2098984
Assisted-By: Cursor (claude-3.5-sonnet)
Change-Id: Icf1e5d4c83862c848d11dae994842ad0ee62ba12
Signed-off-by: Douglas Viroel <viroel@gmail.com>
The unit tests were mocking part of the Zone Migration strategy class,
which could hide possible bugs. This patch removes this mocking, leaving
mocked only other classes that are used by the zone migration one.
Additionally, it includes improved suggestions as follow-up from the
review of previous patches, like more explicit comments and additional
asserts of mocked functions.
Assisted-By: Cursor (Claude-4-sonnet)
Change-Id: Ie1894311b0e384ab52b1b3dfe0eb50618eef6c9f
Signed-off-by: jgilaber <jgilaber@redhat.com>
When only running volume migrations, a zone migration
strategy audit without setting compute_nodes should work.
Before this change, an audit with defined storage_pools,
no compute_nodes parameters, and with_attached_volume is set to True
would trigger the migration of the instances attached to the volumes
being migrated.
This patch decouples instance and volume migrations unless the user
explicitly asks for both. When migrating attached volumes, the zone
migration strategy will check which instances should be migrated
according to the audit parameters, and if the instance the volume is
attached to can be migrated, it will be migrated just after the volume.
On the other hand, when the attached instances should not be migrated
according to user input, only the volumes will be migrated.
In an audit that migrates instances but not volumes, the
with_attached_volume parameter will continue doing nothing.
Closes-Bug: 2111429
Change-Id: If641af77ba368946398f9860c537a639d1053f69
Signed-off-by: jgilaber <jgilaber@redhat.com>
Currently, when an audit with strategy zone_migration has added at least
one volume_migration action, it will not process the instances
migrations according to the definition of the `compute_nodes` parameter.
This behavior is unexpected according to the documentation of the
strategy.
This patch fixes that behavior and makes sure that no duplicated
actions are added to the solution, to handle the case where instance
migration actions are created when analyzing the volumes if the
`with_attached_volume` parameter is enabled. The patch also removes
the method `instances_no_attached`, which is no longer used.
Finally, it adds some unit tests for the new method and fixes the
existing ones to cover the mixed instances and volumes migration situation.
Closes-Bug: #2109722
Change-Id: Ief7386ab448c2711d0d8a94a77fa9ba189c8b7d2
Signed-off-by: jgilaber <jgilaber@redhat.com>
Currently, unit tests for zone_migration strategy do not include any
test for instances and volumes mixed, which is currently not working as
expected.
This patch is adding two new tests which include both compute_nodes and
storage_pools in audit configuration. One of them is also setting
with_attached_volume option.
These tests will be fixed to validate the expected behavior of the
strategy in the fixing patch.
Related-Bug: #2109722
Change-Id: I496ce3e1f21b7a4165aa47d5862cf0497be79487
Signed-off-by: jgilaber <jgilaber@redhat.com>
Despite the src_type parameter of the storage_pool dictionary being
mandatory, its value is not used to filter the volumes to migrate;
only 'src_pool' is used.
This change makes 'src_type' optional. Since it was ignored until this
point, making it optional keeps the same behaviour by default. If
'src_type' is in the audit parameters, the strategy uses both 'src_pool' and
'src_type' to filter the volumes to migrate.
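
A minimal sketch of the filtering rule, assuming volumes are plain
dictionaries with hypothetical `pool` and `type` keys (the real strategy
works on Cinder volume objects, and `filter_volumes` is an illustrative
name):

```python
def filter_volumes(volumes, src_pool, src_type=None):
    """Select volumes by pool and, only when given, by volume type."""
    selected = []
    for vol in volumes:
        if vol["pool"] != src_pool:
            continue
        # src_type is optional: when omitted, only src_pool is applied.
        if src_type is not None and vol["type"] != src_type:
            continue
        selected.append(vol)
    return selected
```

Leaving `src_type` out reproduces the previous behaviour, where only the
pool was considered.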
Closes-Bug: 2111507
Change-Id: Id83a96de85ada1ae6c0e25f8b7fcf54034604911
Signed-off-by: jgilaber <jgilaber@redhat.com>
Add file to the reno documentation build to show release notes for
stable/2025.2.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.2.
Sem-Ver: feature
Change-Id: I21fd5f9a613e5e2ee81ae4fe34165f3f4a6ae479
Signed-off-by: OpenStack Release Bot <infra-root@openstack.org>
Generated-By: openstack/project-config:roles/copy-release-tools-scripts/files/release-tools/add_release_note_page.sh
CORS middleware needs to be added to api pipeline to support
Cross-Origin Resource Sharing(CORS). CORS is supported globally by
multiple OpenStack services but is not by watcher, due to lack of
CORS middleware and no mechanism to inject it into api pipeline.
Closes-Bug: #2122347
Change-Id: I6b47abe4f08dc257e9156b254fa60005b82898d7
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
In case standalone watcher-api runs behind forwarders (like load
balancers), it should parse specific request headers to determine
the endpoint url clients actually use.
Add http_proxy_to_wsgi middleware to api pipeline to handle this.
Closes-Bug: #2122353
Change-Id: I27ade17f7ce1649295f92f3ea1af620df63ba1bc
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Request ID is essential in operating OpenStack services, especially
when troubleshooting some API problems. It allows us to find out
the log lines actually related to a specific request.
However watcher api hasn't returned it properly, so operators had no
way to determine the exact ID they should search.
Add RequestID middleware to return the id in X-OpenStack-Request-Id
header, which is globally used.
Closes-Bug: #2122350
Change-Id: Ie4a8307e8e7e981cedbeaf5fe731dbd47a50bade
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Unlike Nova, Cinder does not support calling the 'os-migrate_volume'[1]
action without a host or a cluster. For volume migrations of type
'migrate' in watcher the dst_pool is required, but for other migrations
that move the volumes to a different type it is not needed. This
change checks if the dst_pool is defined and prevents some migrations
when it is missing.
Adds testing for creating audits with the Zone Migration strategy,
validating the schema changes.
[1] https://docs.openstack.org/api-ref/block-storage/v3/index.html#migrate-a-volume
Closes-Bug: 2108988
Change-Id: I305c58e47093c4a884e86f1d91fdc15ef2a1cfba
Signed-off-by: jgilaber <jgilaber@redhat.com>
By default Watcher enables only the compute model collector [1]. This
change enables the storage one as well, since otherwise when doing
volume migration the model quickly becomes obsolete if there are new
volumes created while an audit is running. The storage model is only
enabled if a cinder service is registered in keystone.
[1] https://docs.openstack.org/watcher/latest/configuration/watcher.html#collector.collector_plugins
Assisted-By: Cursor
Closes-Bug: 2111785
Change-Id: I864d3fc12d6364f1932cf5d2348a6b68169641e9
Signed-off-by: jgilaber <jgilaber@redhat.com>
The prelude provides a high-level overview of the
security improvements, operational enhancements,
and new monitoring capabilities for operators.
Assisted-By: claude-code
Change-Id: Ia2c1409d26aca0eddfb1685e9009305215c2405a
Signed-off-by: Sean Mooney <work@seanmooney.info>
Updates watcher-prometheus-integration-threading job
parent, so every new config option added to
watcher-prometheus-integration job is also added/tested
in the threading job.
Change-Id: I38c95f638f748fd5c051c312817e9123d6037ab5
Signed-off-by: Douglas Viroel <viroel@gmail.com>
Currently, when there is a volume_migrate action and migration_type is
`retype`, watcher assumes that the retype always triggers a migration
and checks the result of the retype based on the fields related to
the migration action (actually, it uses the same function to check the
result when `migration_type` is `retype` or `migrate`). This creates
problems in different scenarios:
- Actions stay in ONGOING status forever for volumes which have never
been migrated, as the migration fields of the volume are empty.
- Volumes which were migrated at any time before still have the old
values, so the status of the retype actions may be reported wrongly.
This patch is implementing an entirely new function to check the result
of a retype action based on the final type and the status field of the
volume. This should be valid for any kind of retype action, with or
without migration. The criterion for a successful retype is that the type
for the volume is the destination one in the action and the status is
available or in-use.
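
The success criterion can be written down as a small predicate. This is a
sketch under the assumption that a volume is a dictionary with
`volume_type` and `status` keys; `retype_succeeded` is a hypothetical
name, not the function added by the patch.

```python
def retype_succeeded(volume, dest_type):
    """Return True when the volume has the destination type and is usable."""
    return (
        volume["volume_type"] == dest_type
        and volume["status"] in ("available", "in-use")
    )
```

Note that the check does not look at any migration-related field, so it
applies whether or not the retype involved a migration.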
Closes-Bug: #2112100
Change-Id: I76e91ed99e7a814a43a6dd906b6bcc150d471624
Signed-off-by: jgilaber <jgilaber@redhat.com>
Monasca is deprecated for removal. This change makes the Monasca client
an optional dependency and ensures it is only imported and instantiated
when the Monasca datasource is explicitly selected. This reduces the
default footprint while preserving functionality for deployments that
still rely on Monasca.
What changed
============
- requirements.txt: remove python-monascaclient from hard deps
- setup.cfg: add [options.extras_require] monasca extra
- watcher/common/clients.py: lazy import with clear UnsupportedError
- watcher/decision_engine/datasources/monasca.py: lazy client property
and deferred import of monascaclient.exc; reset on Unauthorized
- watcher/decision_engine/datasources/manager.py: unconditionally
import Monasca helper and include in metric_map; helper is lazy
- tests: conditionally include Monasca based on availability; adjust
expectations instead of skipping by default; avoid over-mocking
- tox.ini: enable optional extras via WATCHER_EXTRAS env var
- docs: datasources index notes Monasca is deprecated and optional
- releasenotes: upgrade note with install example and behavior
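
The lazy import pattern listed above can be illustrated roughly as
follows. This is a generic sketch, not the code in
watcher/common/clients.py: `load_optional_client` is a hypothetical
helper and the `UnsupportedError` class here is a local stand-in.

```python
import importlib


class UnsupportedError(Exception):
    """Local stand-in for the error raised when an optional dep is missing."""


def load_optional_client(module_name, extra):
    """Import an optional module only when it is actually requested."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Turn the bare ImportError into guidance for the operator.
        raise UnsupportedError(
            f"{module_name} is not installed; install the '{extra}' extra "
            "to use this datasource"
        ) from exc
```

Because the import happens inside the function, deployments that never
select the datasource never need the client library.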
Why
===
- Allow deployments not using Monasca to run without the client
- Keep Monasca functional when explicitly installed via extras
- Provide clear operator guidance and smooth upgrades
Compatibility
=============
- No change for deployments that do not use Monasca
- Deployments using Monasca must install the optional extra:
pip install watcher[monasca]
Testing
=======
- Default: tox -e py3
- With Monasca: WATCHER_EXTRAS=monasca tox -e py3
Assisted-By: GPT-5 (Cursor)
Closes-Bug: #2120192
Change-Id: I7c02b74e83d656083ce612727e6da58761200ae4
Signed-off-by: Sean Mooney <work@seanmooney.info>
Fixed action status_message update restrictions to allow updates when
action is already in SKIPPED state. Previously, users could only update
the status_message when initially transitioning to SKIPPED state.
Changes include:
- Modified validation logic to allow status_message updates for SKIPPED actions
- Changed exception type from PatchError to Conflict for better semantics
- Added comprehensive test coverage for the new behavior
- Updated API documentation and samples
- Added release note documenting the fix
This enables administrators to fix typos, provide more detailed
explanations, or expand on reasons in action status messages after
the action has been skipped.
Generated-By: claude-code
Closes-Bug: #2121601
Change-Id: I64def708389a8ecd32080fba1638a4499ead349d
Signed-off-by: Sean Mooney <work@seanmooney.info>
Job watcher-aetos-integration is failing because of
having real metrics enabled coming from ceilometer.
We need to disable ceilometer-acompute and node_exporter so only
injected data will be considered when asking prometheus to take
decisions
Change-Id: If4f2c3f6f89527d768c48f1ca4967339837bb994
Signed-off-by: morenod <dsanzmor@redhat.com>
These do not actually define timeout but interval. Rename the options
to reflect what they actually define. The existing deprecated options
in the [gnocchi_client] are also removed, because these have been kept
for 6 years.
In addition, fix inconsistent name (query vs call).
Change-Id: Ib29115746a25b45bdff1c3da8df9d7167c2db662
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
This patch extends compute model attributes by
adding new fields to the Instance element. Values are
populated by the nova collector, using the same
nova list call, but this requires a more recent compute
API microversion.
A new config option was added to allow users to
enable or disable the extended attributes and it is
disabled by default.
Configure prometheus-based jobs to run on a newer version
of the nova api (2.96) and enable the extended attributes
collection.
Implements: bp/extend-compute-model-attributes
Assisted-By: Cursor (claude-4-sonnet)
Change-Id: Ibf31105d780dce510a59fc74241fa04e28529ade
Signed-off-by: Douglas Viroel <viroel@gmail.com>
These are some of the requested changes from reviews
in the series of patches for the add-skip-action blueprint.
Some of them may require another specific patch since
they would touch more files that are not related to
this feature.
Change-Id: I9e30ca385e7b184ab19449a60db6f6d0f3c0e1b9
Signed-off-by: Douglas Viroel <viroel@gmail.com>
... to avoid the following warning.
```
UserWarning: converting '1' to a string
warnings.warn('converting \'%s\' to a string' % str_val)
```
Change-Id: I852d63523d3582f00d4d7953199181e3d2b6a885
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Bug #2113862 details a number of suggested
corrections and additions to the Workload
Stabilization doc. This patch adds those
suggested changes.
Closes-Bug: #2113862
Assisted-By: Cursor (claude-3.5-sonnet)
Change-Id: I4131a304c064d2ea397b2447025c7edf69a56e2a
Signed-off-by: Ronelle Landy <rlandy@redhat.com>
This change enhances the Host Maintenance strategy by introducing
two new input parameters: `disable_live_migration` and
`disable_cold_migration`. These parameters allow cloud
administrators to control whether live or cold migration should be
considered during host maintenance operations.
If `disable_live_migration` is set, active instances will be cold
migrated if `disable_cold_migration` is not set, otherwise
active instances will be stopped. If `disable_cold_migration` is set,
inactive instances will not be cold migrated.
If both are set, only stop actions will be performed on instances.
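
The parameter combinations above can be summarised in a small decision
function. This is only a sketch: the returned strings are illustrative
labels rather than the registered action names, and the behaviour when
neither parameter is set (live migrate active instances, cold migrate
inactive ones) is an assumption.

```python
def plan_action(instance_is_active, disable_live_migration=False,
                disable_cold_migration=False):
    """Pick the maintenance action for one instance, or None for no action."""
    if instance_is_active:
        if not disable_live_migration:
            return "live_migrate"
        if not disable_cold_migration:
            return "cold_migrate"
        # Both migration types are disabled: stop the active instance.
        return "stop"
    # Inactive instances are cold migrated unless that is disabled.
    if not disable_cold_migration:
        return "cold_migrate"
    return None
```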
The strategy logic and action plan generation have been updated to
reflect these behaviors. A new "stop" action is introduced and
registered, and the weight planner is updated to handle the new action.
Documentation for the Host Maintenance strategy is updated to
describe the new parameters and their effects.
Test Plan:
- Unit tests for HostMaintenance strategy with new parameters
- Integration tests for action plan generation with stop action
This implements the specification:
Spec: https://review.opendev.org/c/openstack/watcher-specs/+/943873
Change-Id: I201b8e5c52e1bc1a74f3886a0e301e3c0fa5d351
Signed-off-by: Quang Ngo <quang.ngo@canonical.com>
Fixes the microversion comparison in both enable and
disable nova-compute service methods in NovaHelper.
The previous implementation was incorrect and started to
fail for microversions greater than 2.99.
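
The message does not say how the old comparison was written. One common
pitfall with microversions is comparing them as strings or floats, which
breaks once the minor part reaches three digits. A hedged sketch of a
comparison that avoids this, with hypothetical helper names:

```python
def parse_microversion(version):
    """Split 'major.minor' into a tuple of integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def microversion_at_least(current, minimum):
    """Compare microversions numerically, component by component."""
    return parse_microversion(current) >= parse_microversion(minimum)
```

For example, `microversion_at_least("2.100", "2.53")` is true, while the
plain string comparison `"2.100" >= "2.53"` is false.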
Closes-Bug: #2120586
Assisted-By: Cursor (claude-4-sonnet)
Change-Id: I69da7f10cd5b42f7d4613d8947bca3e382815c3f
Signed-off-by: Douglas Viroel <viroel@gmail.com>
This patch implements the changes in the API required for the
skipped action blueprint. It includes:
- New field `status_message` is visible in API get calls for Audits,
ActionPlans and Actions.
- New Patch call is added to `/actions/{action_id}` which allows to
manually move actions in PENDING state to SKIPPED for ActionPlans
which have not been started.
- A new API microversion 1.5 is added for these changes.
It also adds required tests and documentation.
Implements: blueprint add-skip-actions
Assisted-By: Cursor (claude-4-sonnet)
Change-Id: I71fb9af76085e5941a7fd3e9e4c89d6f3a3ada47
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
This patch implements automatic skipping of actions based on the
result of the action pre_condition method. This allows situations such
as migration actions for VMs which no longer exist to be handled
properly. This patch includes:
- Adding a new state SKIPPED to the Action objects.
- Add a new Exception ActionSkipped. An action which raises it from the
pre_condition execution is moved to SKIPPED state.
- pre_condition will not be executed for any action in SKIPPED state.
- execute will not be executed for any action in SKIPPED or FAILED state.
- post_condition will not be executed for any action in SKIPPED state.
- moving transition to ONGOING from pre_condition to execute. That means
that actions raising ActionSkipped will move from PENDING to SKIPPED
while actions raising any other Exception will move from PENDING to
FAILED.
- Adding information on action failed or skipped state to the
`status_message` field.
- Adding a new option to the testing action nop to simulate skipping on
pre_condition, so that we can easily test it.
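
The transitions listed above can be sketched as a tiny runner. This is an
illustration only, not the applier's code: `run_action` is a hypothetical
function, the `ActionSkipped` class is a local stand-in, and the final
SUCCEEDED state is assumed rather than taken from this message.

```python
class ActionSkipped(Exception):
    """Local stand-in for the new ActionSkipped exception."""


def run_action(pre_condition, execute):
    """Return the list of states an action goes through."""
    states = ["PENDING"]
    try:
        pre_condition()
    except ActionSkipped:
        # Skipped in pre_condition: PENDING -> SKIPPED, execute never runs.
        states.append("SKIPPED")
        return states
    except Exception:
        # Any other error in pre_condition: PENDING -> FAILED.
        states.append("FAILED")
        return states
    # The move to ONGOING happens only when execute is about to run.
    states.append("ONGOING")
    try:
        execute()
    except Exception:
        states.append("FAILED")
        return states
    states.append("SUCCEEDED")
    return states
```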
Implements: blueprint add-skip-actions
Assisted-By: Cursor (claude-4-sonnet)
Change-Id: I59cb4c7006c7c3bcc5ff2071886d3e2929800f9e
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
... instead of documenting the supported values, so that more explicit
error is presented to users.
Also drop redundant description about the default values. The default
values are added to sample config files generated, so don't have to
be explained in help texts.
Change-Id: I12b201da3e742b55f6cfcf71bdd4413cbf3ee4e5
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
This patch is part of the skipped action blueprint. It adds the
`status_message` field to the Audit, ActionPlan and Action objects and
all related notifications.
It bumps the versions of all the affected objects and notifications and
updates the tests to include the new fields.
Change-Id: I3b9467e7e37188e647379cd9c4cbbda8ed75383f
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
This patch implements the changes in the database required for the
skipped action blueprint.
It just adds a new nullable column to the required tables and add tests
for it.
Note that I am also introducing a fix in a previous tables tests which
will be affected by the changes in the objects.
Implements: blueprint add-skip-actions
Change-Id: I027bc3861b589bd281a7216583a8c5c351a53c57
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
In order to test the different code paths for action execution
it is very useful to be able to make the actions fail in the different
execution stages.
This patch adds three new options `fail_pre_condition`, `fail_execute`
and `fail_post_condition`. Setting any of them to True makes the action
fail in the specified step.
Change-Id: Ied8c0bb767d9bb6bdfb9209365857a3b4d606b40
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
Currently, patch call field validations are done based on exclusion:
all the fields can be patched unless included in a list
`internal_attrs`.
This patch adds a new validation rule based on field inclusion
in a list `allowed_attrs`. When that list is non-empty, only the fields
included in it can be patched. In order to keep the existing behavior
for the existing patch calls, I am defining the list as empty, so that
the rest of the validation rules are applied and the current behavior
is not affected.
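
A minimal sketch of the two rules together. The function name
`is_patchable` is hypothetical, and applying the `internal_attrs`
exclusion even when `allowed_attrs` is set is an assumption of this
sketch.

```python
def is_patchable(field, internal_attrs, allowed_attrs=()):
    """Decide whether a field may be changed through a patch call."""
    # Inclusion rule: only enforced when the allow list is non-empty.
    if allowed_attrs and field not in allowed_attrs:
        return False
    # Exclusion rule: internal fields can never be patched.
    return field not in internal_attrs
```

With an empty `allowed_attrs`, the function reduces to the original
exclusion-only check.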
Change-Id: I22010649332c8fb872446a9d0483a0303a4eba3b
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
This change removes Watcher's in-tree functionality
for swapping instance volumes and defines swap as an alias
of cinder volume migrate.
The Watcher native implementation was missing error handling,
which could lead to irretrievable data loss.
The removed code also forged project user credentials to
perform admin requests as if they were done by a member of a project.
This was unsafe and posed a security risk due to how it was
implemented. This code has been removed without replacement.
While some effort has been made to allow existing
audits that were defined to work, any reduction of functionality
as a result of this security hardening is intentional.
Closes-Bug: #2112187
Change-Id: Ic3b6bfd164e272d70fe86d7b182478dd962f8ac0
Signed-off-by: Sean Mooney <work@seanmooney.info>
Resolve the following warning raised from pecan.
```
DeprecationWarning: The function signature for
watcher.api.controllers.root.RootController._route is changing in
the next version of pecan.
Please update to: `def _route(self, args, request)`.
```
Change-Id: I7081cf956a8baa05cd70ced0496ca8192fff979e
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
Implement the spec for multi-tenancy support for metrics. This adds
a new 'Aetos' datasource very similar to the current Prometheus
datasource. Because of that, the original PrometheusHelper class
was split into two classes and the base class is used for
PrometheusHelper and for AetosHelper. Except for the split, there
is one more change to the original PrometheusHelper class code, which
is the addition and use of the _get_fqdn_label() and
_get_instance_uuid_label() methods.
As part of the change, I refactored the current prometheus datasource
unit tests. Most of them are now used to test the PrometheusBase class
with minimal changes. Changes I've made to the original tests:
- the ones that can be used to test the base class are moved into the
TestPrometheusBase class
- the _setup_prometheus_client, _get_instance_uuid_label and
_get_fqdn_label functions are mocked in the base class tests.
Their concrete implementations are tested in each datasource tests
separately.
- a self._create_helper() is used to instantiate the helper class with
correct mocking.
- all config value modification in the original tests got moved out and
instead of modifying the config values, the _get_* methods are mocked
to return the wanted values
- to keep similar test coverage, config retrieval is tested for each
concrete class by testing the _get_* methods.
New watcher-aetos-integration and watcher-aetos-integration-realdata
zuul jobs are added to test the new datasource. These use the same set
of tempest tests as the current watcher-prometheus-integration jobs.
The only difference is the environment setup and the Watcher config,
so that the job deploys Aetos and Watcher uses it instead of accessing
Prometheus directly.
At first this was generated by asking cursor to implement the linked spec
with some additional prompts for some smaller changes. Afterwards I manually
went through the code doing some cleanups, ensuring it complies with
PEP8 and hacking and so on. Later on I manually adjusted the code to use
the latest observabilityclient changes.
The zuul job was also mostly generated by cursor.
Implements: https://blueprints.launchpad.net/watcher/+spec/prometheus-multitenancy-support
Generated-By: Cursor with claude-4-sonnet model
Change-Id: I72c2171f72819bbde6c9cbbf565ee895e5d2bd53
Signed-off-by: Jaromir Wysoglad <jwysogla@redhat.com>
The data_model list API response comes from the model to_list()
method, which generates both server_* and node_* attributes from
Instance and Node classes fields[1]. Any change on these classes
can break the data_model list API and require a new microversion.
These tests validate the current expected fields.
[1] 5ba086095c/watcher/decision_engine/model/model_root.py (L250-L270)
Change-Id: I77fac162101013aa923272aa99c7c6695cc5fdca
Signed-off-by: Douglas Viroel <viroel@gmail.com>
Some response parameters from GET /infra-optim/v1/data_model
endpoint are missing from api-ref documentation. This patch
updates the doc to include them.
For more details see, LP #2117726
Closes-Bug: #2117726
Change-Id: Iaa775f56bb8167d9c6b458cd07f1ec3cefaf70fe
Signed-off-by: Douglas Viroel <viroel@gmail.com>
It is done by disabling the eventlet patching and configuring
oslo.service backend to threading. Once oslo.service backend is
configured, it can't be reverted to eventlet. This needs to be
done before including other modules, which may include oslo.service
library.
Adds a job that runs a subset of tests with eventlet patching disabled.
Change-Id: I9f8c2c5bbcf3192313cc3b309e8f2719a3bea18f
Signed-off-by: Douglas Viroel <viroel@gmail.com>
With the upcoming eventlet removal, Watcher will need
to be adapted to support both modes, eventlet and threading, for
a couple of releases before removing all eventlet code.
This patch adds methods and classes that allow decision engine
modules to create futurist thread pools instead of green thread pools,
based on an environment variable that can be enabled per service.
It moves the continuous audit handler instance to the decision engine service,
so it can be started together with the main decision engine service.
Adds an environment variable that allows the user to disable
eventlet monkey patching and to use oslo.service threading backend.
Change-Id: I8a8be0a7cebdc44005fd77ec960543828c7da318
Signed-off-by: Douglas Viroel <viroel@gmail.com>
This cr fixes:
* Replaced ``dateutil.tz.tzlocal()`` and ``dateutil.tz.tzutc()`` with
``datetime.timezone`` built-in classes in audit controllers and
continuous audit scheduling.
* Replaced ``dateutil.parser.parse()`` with
``oslo_utils.timeutils.parse_isotime()`` in the zone migration
strategy for parsing datetime strings.
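
For the timezone part, a standard-library-only sketch of what the
replacements can look like. The helper names are hypothetical and the
exact expressions used in the patch may differ.

```python
from datetime import datetime, timezone


def utc_now():
    """Timezone-aware current time in UTC, without dateutil.tz.tzutc()."""
    return datetime.now(timezone.utc)


def local_timezone():
    """Local timezone as a datetime.timezone, without dateutil.tz.tzlocal()."""
    return datetime.now().astimezone().tzinfo
```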
Closes-Bug: #2118404
Change-Id: I6d8a345fa4339a688769b147413dcdf3016bf4a0
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
We need to disable real data metrics coming from hosts and
instances on injected data jobs, as they create wrong results
when they are mixed with the injected data.
We already did this on watcher-operator by disabling the ceilometer agent and
node_exporter in [1], so now we have to do it on devstack installations,
disabling meminfo on node_exporter for host metrics (cpu is already
disabled) and sg-core for instance metrics.
[1] https://github.com/openstack-k8s-operators/watcher-operator/pull/196
Change-Id: I4130ca6dd7cb52d96842e04e7720431ebc76efff
Signed-off-by: morenod <dsanzmor@redhat.com>
Adds a tempest configuration for min and max microversions supported
by watcher. This helps us define the correct range of microversions
to be tested on each stable branch.
New microversion proposals should also increase the default
max_microversion, in order to work with watcher-tempest-plugin
microversion testing.
Change-Id: I0b695ba4530eb89ed17b3935b87e938cadec84cc
Signed-off-by: Douglas Viroel <viroel@gmail.com>
The last release of openstack to support python 3.9
was 2025.1 (epoxy), with this change watcher now requires
3.10, testing of 3.9 was removed in previous commits.
Change-Id: Ida53740293e93b0c20dec2e175b390fa18bed852
Signed-off-by: Sean Mooney <work@seanmooney.info>
The decision engine process was built based on 2
services: a service that handles rpc requests and a
scheduler to trigger watcher periodic tasks.
With the new version of oslo.service, a new threading
backend was added, based on the cotyledon service manager,
which starts a new process for each service that it
manages. These two services can't run in different
processes since they need access to a shared in-memory
representation of the cluster (cluster data models).
This patch proposes creating a Decision Engine Service
which includes everything in a single main service.
Change-Id: I335a97ca14b6e023fef055978a56aefebf22d433
Signed-off-by: Douglas Viroel <viroel@gmail.com>
The following exception was added in the initial import of the watcher
code base[1].
In each of the controller REST APIs, it was called with a flag
stating that the request was coming from the top level resource apis.
But this exception and code was not used anywhere in the
rest api. It seems to be dead code, so it needs to be
cleaned up.
Note: In audit_template, under the patch api, this exception
was used to prevent removal of the goal from an audit template.
Since this cr drops this exception, it replaces it
with the NotAuthorized exception, keeping the status code the same.
Links:
[1]. d14e057da1 (diff-6d510a275605e20ba8b435157062da2b749265a88a3cfd6d90abb7e8e5feac2aR235)
Closes-Bug: #2115968
Change-Id: I82a5e4a7a51726b3a89257c84a75157fbfcb82eb
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
These apis are not implemented within the watcher code base and
were marked as forbidden to use.
It does not make sense to keep these apis as they are not implemented.
This cr drops the code around them to make the action apis cleaner.
Closes-Bug: #2110895
Change-Id: I0f465157e6cd481b27665ca6016db68c198cebeb
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
The original documentation update review [1]
had some additional comments for improvements.
The commit adds the suggested changes.
[1] https://review.opendev.org/c/openstack/watcher/+/951025
Change-Id: I4b4624e2dbc4c6a5f888ec77d6a03b8f66ff0a23
Adds documentation clarifications on how the
strategy and associated parameters are used.
Closes-Bug: #2112480
Change-Id: Id42c280fc5744bebb01d50b52b834e5b3b76af73
Add clarifications to the documentation to reflect
the actual strategy usage, including:
- updating parameter descriptions
- extending the 'How to Use' section
Closes-Bug: #2111810
Change-Id: Ifd2876056cd8819c50658fb9f213246dc1546d42
The prometheus datasource was reporting host_ram_usage in MiB as
described in the docstring for the base datasource interface
definition [1].
However, the gnocchi datasource is reporting it in KiB following
ceilometer metric `hardware.memory.used` [2] and the strategies
using that metric expect it to be in KiB so the best approach is
to change the unit in the prometheus datasource and update the
docstring to avoid misunderstandings in the future. So, this patch
is fixing the prometheus datasource to return host_ram_usage
in KiB instead of MiB.
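As an illustration of the unit relationship only, a minimal sketch follows.
The helper name mib_to_kib is hypothetical and is not the datasource code;
it just shows that a value reported in MiB is 1024 times smaller than the
same value in KiB.

```python
KIB_PER_MIB = 1024  # 1 MiB = 1024 KiB


def mib_to_kib(value_mib):
    """Convert a memory value from MiB to KiB (hypothetical helper)."""
    return value_mib * KIB_PER_MIB
```

A strategy that compares thresholds in KiB would misread a MiB value by a
factor of 1024, which is why both datasources need to agree on the unit.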
Additionally, it is adding more unit tests for the check_threshold
method so that it covers the memory based strategy execution, validates
the calculated standard deviation and adds the cases where it is below
the threshold.
[1] 15981117ee/watcher/decision_engine/datasources/base.py (L177-L183)
[2] https://docs.openstack.org/ceilometer/train/admin/telemetry-measurements.html#snmp-based-meters
Closes-Bug: #2113776
Change-Id: Idc060d1e709c0265c64ada16062c3a206c6b04fa
The workload_balance strategy calculates host metrics based on the
instance metrics and those are the ones used to compare with the
threshold.
Currently, the strategy does not report the calculated values, which
sometimes makes troubleshooting difficult. This patch is adding a debug
message to log those values.
This patch is also adding a new unit test for filter_destination_hosts
based on ram instead of cpu and adding assertions for the new debug
messages. To implement the new test properly, I had to slightly modify
the ram usage fixtures used for the workload_balance tests.
Change-Id: Ief5e167afcf346ff53471f26adc70795c4b69f68
This patch adds a table to the strategies page to
show the level of qualification and where the
strategy can be triggered.
Change-Id: I6991566fd5fec3f8bbae06eefa63a8b83a87eed1
Currently, when the prometheus datasource queries the ceilometer_cpu metric
for instance cpu usage, it aggregates by instance and filters by the
label containing the instance uuid. While this works fine in real
scenarios, where a single metric is provided by a single instance, in
some cases, such as CI jobs where metrics are directly injected, it leads to
incorrect metric calculation.
We applied a similar fix for the host metrics in [1] but we did not
implement it for instance cpu.
I am also converting the query formatting to the dict format to improve
readability.
[1] https://review.opendev.org/c/openstack/watcher/+/946049
Closes-Bug: #2113936
Change-Id: I3038dec20612162c411fc77446e86a47e0354423
get_disabled_compute_nodes_with_reason defined in host_maintenance
strategy is not used anywhere.
This cr drops the unused method.
Change-Id: I07c0d0b63e00d476511aa8b03c0feab8ec4db95b
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
This is an initial patch towards the eventlet removal in watcher.
It moves the cmd scripts that depend on eventlet to an eventlet dir,
where they are always monkey patched.
Change-Id: Ie23caab018fbf68f8c29a0f748c0708b97933b4b
Some services integrations are now classified as experimental
and a warning message will now appear once a client is created
for them. These integrations are not fully tested in CI and
lack documentation on how they work or should be used.
A release note was added to inform users about the status of
these integrations and related features.
Change-Id: Ib7d0ac0b3e187ae239dfa075fb53a6c0107dff29
Adds a new documentation section that describes which service
integrations are currently supported and their integration status.
This information is not clear today and will help to cover the lack
of testing and documentation about them.
Change-Id: I26b2a2ef5672b78a575a2bdaef3a08d5bbc063bd
Currently, in that case it was failing because watcher tried to create a
name based on a goal automatically and the goal is not defined.
This patch is moving the check for goal specification in the audit
creation call earlier, and if there is no goal defined, it returns an
invalid call error.
This patch is also modifying the existing error for this case to check
the expected behavior.
Closes-Bug: #2110947
Change-Id: I6f3d73b035e8081e86ce82c205498432f0e0fc33
The idea is to adapt zuul.yaml to the future test structure, where every
strategy will be in its own file. For now we keep executing everything
inside test_execute_strategies, but also any other test in any file with
the tag 'strategy'.
Change-Id: I304c858078d35beb1f7b4f1fad4ea8bedde674af
Currently, an actionplan state is set to SUCCEEDED once the execution
has finished, but that does not imply that all the actions finished
successfully.
This patch is checking the actual state of all the actions in the plan
after the execution has finished. If any action has status FAILED, it
will set the state of the action plan as FAILED and will apply the
appropriate notification parameters. This is the expected behavior according
to Watcher documentation.
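The rule described above can be sketched as follows. This is a minimal
illustration, not the applier code: final_plan_state is a hypothetical
helper and the states are plain strings.

```python
SUCCEEDED = "SUCCEEDED"
FAILED = "FAILED"


def final_plan_state(action_states):
    """Return the action plan state once execution has finished.

    The plan is FAILED as soon as any of its actions is FAILED;
    otherwise it is SUCCEEDED.
    """
    if any(state == FAILED for state in action_states):
        return FAILED
    return SUCCEEDED
```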
The patch is also fixing the unit test for this to set the expected
action plan state to FAILED and notification parameters.
Closes-Bug: #2106407
Change-Id: I7bfc6759b51cd97c26ec13b3918bd8d3b7ac9d4e
Add a test for the zone migration strategy using the
with_attached_volume parameter, setting storage_pools but not
compute_nodes. With volumes attached to instances, with these inputs,
the strategy should propose an action plan to migrate volumes and the
instances they are attached to, since Nova is able to find a destination
node for the instances even when the user does not pass one.
However, the execution results in an error, since the strategy assumes
that the compute_nodes dict will always be there.
Change-Id: Ifac28b1aab8a0caf77d97e4c19d051e764256674
This change removes all the duplicate fields from the
watcher RequestContext.
It also removes several fields like quota_class and
remote_address that were cargo-culted from nova
when notification support was added but were never
used in watcher.
Change-Id: Ibf8739d6cd2d4557df6f8de6c780b6f4280b774f
context.user has been deprecated for years
and renamed to user_id.
The deprecated field has now been removed, so this
change updates our test cases to reflect that.
Change-Id: I120441fb9392c370c57dc63d8c115d8993d25f62
For compute nodes, nova works fine if a destination node is not
specified, so this change makes sure we're not passing None when the
user does not set one to avoid an error.
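A minimal sketch of the idea, assuming the action input is a plain dict.
The function name and parameter keys below are illustrative and are not
taken from the strategy code.

```python
def build_migration_params(migration_type, source_node,
                           destination_node=None):
    """Build action input parameters, omitting an unset destination.

    When destination_node is None the key is left out entirely, so the
    scheduler is free to pick a host instead of receiving a None value.
    """
    params = {
        "migration_type": migration_type,
        "source_node": source_node,
    }
    if destination_node is not None:
        params["destination_node"] = destination_node
    return params
```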
Partial-Bug: 2108988
Change-Id: Ida1f18b97697c041819e29f935aa5e232848226a
This patch is adding a new unit test to validate the behavior
of the API when trying to create an audit without a goal (whether using
a goal or audit template parameters) and no name is provided.
Related-Bug: https://bugs.launchpad.net/watcher/+bug/2110947
Change-Id: I04df10a8a0eea4509856f2f4b9d11bae24cd563a
This patch is adding a new unit test to check the behavior of the action
plan when one of the actions in it fails during execution.
Note this is to show a bug, and the expected state will be changed in
the fixing patch.
Related-Bug: #2106407
Change-Id: I2f3fe8f4da772a96db098066d253e5dee330101a
Currently, when trying to create an audit which misses a mandatory
parameter, watcher returns error 500 instead of 400, which is the
documented error in the API [1] and the appropriate error code for
malformed requests.
This patch catches parameter validation errors according to the json
schema for each strategy and returns error 400. It also fixes the
unit test to validate the expected behavior.
[1] https://docs.openstack.org/api-ref/resource-optimization/#audits
Closes-Bug: #2110538
Change-Id: I23232b3b54421839bb01d54386d4e7b244f4e2a0
Add some tests to show that the zone migration strategy generates
problematic input parameters for actions in some cases when destination
parameters are not passed for instances or volumes.
Change-Id: Idc3af0e6d9d2d5388ff3d152d81e63364758607b
Fix the incorrect logging format for calls with multiple variables, which
caused logging to fail and some log messages to be skipped.
The logging calls require two arguments, but they are passed in a tuple,
so it is interpreted as a single argument and the call fails because
the second argument is missing.
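The failure mode can be reproduced with the standard library alone. The
sketch below builds a LogRecord directly so the formatting error is
visible. render_log_message is a hypothetical helper and the message text
is made up for illustration.

```python
import logging


def render_log_message(msg, *args):
    """Format a message the same way the logging module does."""
    record = logging.LogRecord(
        "demo", logging.DEBUG, "demo.py", 0, msg, args, None)
    return record.getMessage()


# Correct: two format arguments passed separately.
#   render_log_message("moved %s to %s", "vol-1", "pool-b")
# Broken: a single tuple argument raises TypeError when formatted,
# because only one argument is supplied for two placeholders.
#   render_log_message("moved %s to %s", ("vol-1", "pool-b"))
```

With a real logger the TypeError is handled inside the logging module, so
the message is dropped rather than the call raising in the caller.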
Closes-Bug: 2110149
Change-Id: I74ed44134b50782c105a0e82f3af34a5fa45d119
Check the debug logs for some methods in the cinder and nova helpers to
reproduce the errors described in bug [1]. The logger is disabled by default,
so the error was being ignored. In order to show the error, the logger
needs to be enabled for the tests in question. The logging was disabled
by alembic configuring logging in [2], so this patch also removes that
logging config to expose the errors.
[1] https://bugs.launchpad.net/watcher/+bug/2110149
[2] https://github.com/openstack/watcher/blob/master/watcher/db/sqlalchemy/alembic/env.py#L26
Change-Id: I3598ca1d08d260602c392f8a8098821faa53f570
Currently, it is returning http error code 500 instead of 400, which
would be the appropriate code.
A follow-up patch will be sent with the fix, switching the error code
and message.
Related-Bug: #2110538
Change-Id: I35ccbb9cf29fc08e78c4d5f626a6518062efbed3
Currently, the host maintenance strategy also migrates instances from the
maintenance node to watcher_disabled compute nodes.
watcher_disabled compute nodes might be disabled for some other purpose
by a different strategy. If host maintenance uses those compute nodes for
migration, it might affect customer workloads.
The host maintenance strategy should never touch disabled hosts unless the
user specifies a disabled host as the backup node.
This CR drops the logic for using disabled compute nodes for maintenance.
Host maintenance already uses the nova scheduler for migrating
instances and will keep doing so. If there is no available node, the
strategy will fail.
Closes-Bug: #2109945
Change-Id: If9795fd06f684eb67d553405cebd8a30887c3997
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
Removes the deprecated message executor when creating both RPC
and notification server instances. This parameter is deprecated[1],
as is the eventlet option.
When not defined, the server will get the one that best fits the
current context (monkey patched or not)[2].
[1] 27d833e374
[2] 412ab4de92/oslo_messaging/_utils.py (L87)
Change-Id: I784407aa7db10bddcec5dc663e1cec65174631e0
In a recent change [1] we modified the database schema for efficacy
indicators to use a 'data' column. However, that patch only contained
the schema migration and a fallback to be able to read from older
databases, and not any kind of data migration. This change introduces
a migration on load, so whenever an efficacy indicator without a 'data'
column is loaded, the column is populated in the database. The change
also modifies the migration test to verify the procedure works well.
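The migrate-on-load idea can be sketched as below. This is an
illustration under stated assumptions: the row is a plain dict standing in
for a database record and save is a hypothetical persistence callback.
None of these names come from the Watcher object layer.

```python
def load_indicator(row, save):
    """Return the row, backfilling 'data' from 'value' when it is unset."""
    if row.get("data") is None and row.get("value") is not None:
        row["data"] = float(row["value"])
        save(row)  # persist so the backfill only happens once
    return row
```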
[1] https://review.opendev.org/c/openstack/watcher/+/945199
Change-Id: Ib0621b0e03451faca803018d6a2f3ad657a25fb5
In a DevStack environment, the nova service-list command does not
exist. The distro suggests installing python-novaclient from a package.
In the Strategies documentation, we generate the docs from the following
code.[1]
```
* - ``migration``
- .. watcher-term:: watcher.applier.actions.migration.Migrate
* - ``change_nova_service_state``
- .. watcher-term:: watcher.applier.actions.change_nova_service_state.ChangeNovaServiceState
```
Within the code, we use the nova python binding to list services[2]
and we do not call the openstack cli.
Documenting the equivalent openstack command does not seem to be useful
in the help text as we are using the python binding.
Links:
[1]. c4acce91d6/doc/source/strategies/host_maintenance.rst (L45)
[2]. c4acce91d6/watcher/common/nova_helper.py (L150-L152)
Change-Id: I0c663c9741fae94bdb9c30f46d3d396325a33948
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
Set the default interface for keystone_client to public in the watcher
conf instead of admin.
Closes-Bug: 2109494
Change-Id: I9e0289249981ca965190df6dbdc37e09fd0951d7
Configure the numeric type of the EfficacyIndicator value to use Float.
Add a new column named data and deprecate the existing value column.
With the current model, value will use the default scale of the
Decimal type of mysql, which in some environments is 0.
This change also adds a test with mysql as backend to reproduce the
issue, since the existing tests using sqlite do not reproduce the
problem, as well as some simple migration tests.
Closes-Bug: #2103458
Change-Id: Ib281fa32e902d2181449091f493d6506b5199094
Add a test with mysql as backend to show that the current
EfficacyIndicator model does not store any decimal digit for the value.
Change-Id: I0cdbd7d87cd6869a10b48eda3d59558831c8dd36
Ubuntu Jammy is no longer part of the required
testing runtime, so this change simply removes
the jammy jobs.
Change-Id: I1e3bbb14cea5b856e8146f3a32d60c3a4ffdcfcc
SUSE has not been a testing runtime for a few releases
and we have no jobs currently validating that it still works.
This change just removes the SUSE-specific logic.
Change-Id: I357fa71704af7aa6239054ede29d0fdcdc3fb8b5
pip 23.1 removed the "setup.py install" fallback for projects that do
not have pyproject.toml and now uses a pyproject.toml which is vendored
in pip [1][2]. pip 24.2 has now deprecated a similar fallback to
"setup.py develop" and plans to fully remove this in pip 25.0 [3][4][5].
pbr supports editable installs since 6.0.0.
pip 25.1 has now been released and the removal is complete.
This change addresses that by adding our own minimal pyproject.toml
to ensure we are using the correct build system.
This change also requires that we adapt how we generate our wsgi
entry point. When pyproject.toml is used, the wsgi console script is
not generated in an editable install such as is used in devstack.
To address this we need to refactor our usage of our wsgi application
to use a module path instead. This change does not remove
the declaration of our wsgi_script entry point but it should
be considered deprecated and it will be removed in the future.
To unblock the gate, the devstack plugin is modified to deploy
using the wsgi module instead of the console script.
Finally, support for the mod_wsgi wsgi mode is removed.
That was deprecated in devstack a few cycles ago and
support was removed in I8823e98809ed6b66c27dbcf21a00eea68ef403e8
[1] https://pip.pypa.io/en/stable/news/#v23-1
[2] https://github.com/pypa/pip/issues/8368
[3] https://pip.pypa.io/en/stable/news/#v24-2
[4] https://github.com/pypa/pip/issues/11457
[5] https://ichard26.github.io/blog/2024/08/whats-new-in-pip-24.2/
Closes-Bug: #2109608
Depends-on: https://review.opendev.org/c/openstack/watcher/+/948502
Change-Id: Iad77939ab0403c5720c549f96edfc77d2b7d90ee
Currently we are passing src_node and des_node uuid when we try to run
migrate action.
In the watcher-applier log, migration fails with following exception
```
Nova client exception occurred while live migrating instance <uuid>Exception: Compute host <uuid> could not be found
```
Based on 57f55190ff/watcher/applier/actions/migration.py (L122)
and
57f55190ff/watcher/common/nova_helper.py (L322),
live_migrate_instance expects destination hostname not uuid.
This CR replaces the dest_node uuid with the hostname.
Closes-Bug: #2109309
Change-Id: I3911ff24ea612f69dddae5eab15fabb4891f938d
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
This patch is adding a new job using the prometheus datasource and real
workload data to the experimental pipeline so that we can run it
on-demand.
Also, it is adding it to the weekly periodic pipeline as agreed on
Watcher meeting.
Also I am excluding strategies execution with annotation `real_load` in
non-real-load jobs.
Finally, I'm moving the project configuration to the end of the file
as requested in the comments, as it's the usual location by convention.
Change-Id: Id41efda2f0dd8b1521df3f6179c3504f298e0e59
While in the regular case a specific metric for a specific host will be
provided by a single instance (exporter), so aggregating by label and by
instance should be the same, it is more correct to aggregate by the same
label as the one we use to filter the metrics.
This is a follow-up to https://review.opendev.org/c/openstack/watcher/+/944795
Related-Bug: #2103451
Change-Id: Ia61f051547ddc51e0d1ccd5a56485ab49ce84c2e
Currently we are using the `instance` label to query prometheus about host
metrics. This label is assigned to the url of each endpoint being
scraped.
While this works fine in one-exporter-per-compute cases, as the driver is
mapping the fqdn_label value to the `instance` label value, it fails
when there is more than one target with the same value for the fqdn
label. This is a valid case: it should be possible to query by fqdn without
caring about which exporter in the host is providing the metric.
This patch is changing the queries we use for hosts to be based on the
fqdn_label instead of the instance one. To implement it, we are also
simplifying the way we check the metric exist for the host by converting
prometheus_fqdn_instance_map into a prometheus_fqdn_labels set
which stores the list of fqdn found in prometheus.
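A rough sketch of the approach: filter and aggregate by the same fqdn
label, and check host existence against a set of known fqdns. The function
name, the avg aggregation and the example metric are assumptions for
illustration. They are not the queries the datasource actually builds.

```python
def build_host_query(metric, fqdn_label, fqdn, known_fqdns):
    """Build a PromQL query for one host, or None if the host is unknown.

    known_fqdns plays the role of the prometheus_fqdn_labels set: the
    fqdn values found in prometheus.
    """
    if fqdn not in known_fqdns:
        return None
    # Aggregate by the same label that is used to filter the series.
    return "avg by ({label})({metric}{{{label}='{fqdn}'}})".format(
        label=fqdn_label, metric=metric, fqdn=fqdn)
```

For example, build_host_query("node_load1", "fqdn", "compute-1",
{"compute-1"}) gives avg by (fqdn)(node_load1{fqdn='compute-1'}), which
does not depend on which exporter on the host provides the metric.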
Closes-Bug: #2103451
Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e
Add file to the reno documentation build to show release notes for
stable/2025.1.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.1.
Sem-Ver: feature
Change-Id: Ie7a1845d7b02b852e776ed8ec73598caab2fb5c6
The library has been missing from the test requirements although it is
directly used. Replace it by the built-in datetime module to get rid
of the unmaintained direct dependency.
Change-Id: I1d08b38862b54fee4c7c26161f59264fb3f2ce51
The Monasca project was marked inactive during 2023.1. Although we have
seen multiple people showing interest to keep the project, we haven't
seen any real progress.
Because the project is likely to be retired soon, let's deprecate the feature
dependent on Monasca so that we can remove it in a future release.
Change-Id: Ifd64f5ba59bbac238ff62302ec36a3e36954d6d0
More refactoring of the SQLAlchemy database layer to improve
compatibility with eventlet on newer Pythons.
Inspired by 0ce2c41404
Related-Bug: 2067815
Change-Id: Ib5e9aa288232cc1b766bbf2a8ce2113d5a8e2f7d
Run the bandit check from pre-commit so that the check is executed in the
pep8 job.
Also remove requirements installed automatically by pre-commit from
test-requirements.
Change-Id: I45af8c47afb262882ebbee74ae52446fed741e26
In order to support the vm_workload_consolidation, workload_balance and
workload_stabilization strategies, some instance metrics are required.
This patch is adding support for them.
Implementation is based on a prometheus store populated using sg-core
from ceilometer metrics with Pollster source.
- instance_ram_usage: rely on ceilometer_memory_usage metrics created from
ceilometer memory.usage meter.
- instance_ram_allocated: rely on the memory value provided by the
inventory created from nova and placement APIs.
- instance_cpu_usage: rely on ceilometer_cpu metric created from
ceilometer cpu meter. A max value of 100 is set in the query.
- instance_root_disk_size: rely on the `disk` value provided by the
inventory created from nova and placement APIs.
A new parameter `instance_uuid_label` has been added to the prometheus
datasource configuration to identify the label used to store the value of the
OpenStack instance uuid for each instance metric in prometheus. Default
value is `resource`.
Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2
This review adds a base job to test Watcher
(via devstack/tempest installation) and the
interaction with the newly added
Prometheus data source.
Related change:
https://review.opendev.org/c/openstack/watcher/+/934423
Change-Id: Id9d7d2ded1aae160a97a5f0aa0f7048a9c38e87d
This adds a new data source for the Watcher decision engine that
implements watcher.decision_engine.datasources.DataSourceBase.
The related spec was merged at [1].
Implements: blueprint prometheus-datasource
[1] https://review.opendev.org/c/openstack/watcher-specs/+/933300
Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906
This datasource requires Ceilometer API which was already removed some
years ago. The implementation should have been removed when dependency
on ceilometerclient was removed by [1].
Also remove some job definitions which are not actually used.
[1] 01d74d0a87
Change-Id: I29c3865dc1207f1bbbb266e4217cf8888afebfb6
"test_create_continuous_audit_with_wrong_interval" is failing
to validate the expected error message when creating a continuous
audit with a wrong interval. The error message is now slightly
different, since "croniter" was bumped to latest version in openstack
requirements[1].
Closes-Bug: #2089866
[1] 868e0ae644
Change-Id: I33029d224577bd1d5124947f1e6150fe2dbc9456
The apscheduler background scheduler spawns a native thread
which is not monkey patched and which interacts with shared module
level objects like the module level LOG instances and sqlalchemy
engine facades.
This is unsafe and leads to mixing patched and unpatched
code in the same thread.
This manifests in 2 ways:
1.) https://paste.opendev.org/show/bGPgfURx1cZYOsgmtDyw/
sqlalchemy calls can fail due to a time.sleep(0) in oslo.db being invoked
using the unpatched time module in an eventlet greenthread.
2.) https://paste.opendev.org/show/b5C2Zz4A4BFIGbKLKrQU/
over time that caused the sqlalchemy connection queuepool to fill up, preventing
background tasks, like reconciling audits, from running.
This change addresses this by overloading the background scheduler _main_loop
to monkey patch the main loop if the calling thread was monkey patched.
Closes-Bug: #2086710
Change-Id: I672c183274b0a17cb40d7b5ab8c313197760b5a0
This change moves all style checks to be run via pre-commit.
To enable this in existing ci and preserve the standard developer flow
the tox pep8 target is updated to run all checks via pre-commit.
Developers can optionally install pre-commit and/or the pre-commit
commit hook to automatically or manually run the pre-commit hooks.
Change-Id: I6ee6ed853dbf60339e7bf3da66b2e5914c218f76
This change corrects the detected sphinx-lint issues in the existing
docs and updates the contributor devstack guide to call out
required and advanced.
Mostly the changes were simple fixes like replacing the configurable
default role with explicit literal syntax `term` -> ``term``.
Some inline Note: comments have been promoted to .. note:: blocks
and literal blocks :: have been promoted to .. code-block:: <language>
directives.
Change-Id: I6320c313d22bf542ad407169e6538dc6acf79901
oslo.policy 4.5.0[1] changed the default value of the policy_file
config option to 'policy.yaml', which means it is changed
for all the OpenStack services and they do not need to
override the default anymore.
NOTE: There is no change in behaviour here, oslo.policy provides
the same configuration that services have overridden till now.
[1] https://review.opendev.org/c/openstack/releases/+/934012
[2] https://review.opendev.org/c/openstack/requirements/+/934295
Change-Id: I46cc9e05fbc8f6c95c0b2d50093ecfb070a4170f
This commit removes the execute bit from several files
and removes the shebang lines from the devstack plugin.
While the devstack plugin is written in bash, it is not an executable
script. The devstack plugin is sourced by devstack as needed,
as such it is not executed in a subshell and the #!/bin/bash
lines are not used even when present.
Change-Id: I82ca22b7a47bf267fe6cf11f3e3519510108c146
This change refactors how watcher manages monkey_patching
modules to achieve 2 goals.
First, we want to ensure the watcher code is tested as it is used
in production. While many tests can run without eventlet,
the existing unit tests depend on eventlet monkey patching
indirectly by importing watcher code that uses eventlet.spawn and
greenthread executors. While that mostly functions today it has
incorrect and inconsistent behaviour on Python 3.9 vs Python 3.12.
Second, the unit tests that test the cmd module were indirectly
monkey patching the test executor during the execution of the tests
as a side effect of importing watcher.cmd. As such the order the tests
execute in and how they are distributed across test workers changed
if the test was monkey-patched or not.
This change makes all tests run with monkey_patching by adding
monkey patching in the watcher/tests/__init__.py
This change also splits the monkey patching from the import
in preparation for an eventual removal of eventlet in a future
release.
Change-Id: I967f3469bd66e69c00863d553bc859343afbb3ff
This change adds support for OS_DEBUG and also configures
default testing timeouts and log capture.
Change-Id: I685fee4081cdee82c508b6d25c534483f2caf09b
This change enables codespell in pre-commit and
fixes the existing typos.
A follow-up commit will enable this in tox and ci.
Change-Id: I0a11bcd5a88247a48d3437525fc8a3cb3cdd4e58
Add file to the reno documentation build to show release notes for
stable/2024.2.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2024.2.
Sem-Ver: feature
Change-Id: I84f9b0b1aa9749fee8ac174ae6d15c62a934d641
This change adds configuration for the pre-commit tool,
follow-up changes will address the remaining issues in a phased
approach to make the reviews simpler.
This is based on the pre-commit config used in nova
with some additional hooks.
Follow-up changes will address the FIXME comments
related to sphinx-lint and codespell, as well as update tox
to enforce these checks in ci.
Change-Id: I87681a19f7fa88366c2b0d310c8b3153aa6a137b
tox now always recreates an env although the env is shared using envdir
options.
~~~
$ tox -e genpolicy
genpolicy: recreate env because env type changed from
{'name': 'genconfig', 'type': 'VirtualEnvRunner'} to
{'name': 'genpolicy', 'type': 'VirtualEnvRunner'}
~~~
According to the maintainer of tox, this functionality is not intended
to be supported.
https://github.com/tox-dev/tox/issues/425#issuecomment-1011944293
Change-Id: I9c1f574c6d45a7be808a023f01dee13c3ac2c72e
The datetime.utcnow() is deprecated in Python 3.12.
Replace datetime.utcnow() with oslo_utils.timeutils.utcnow().
This bumps oslo.utils to 7.0.0.
Change-Id: Icccbb0549add686a744a72b354932471cbf91c92
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
This code worked around a bug in eventlet[1] that has been fixed in
115103d5608cbe8f15df10e27eba1644f5364e95. The fix has been available in
every eventlet release since v0.27.0.
[1] https://github.com/eventlet/eventlet/issues/592
Co-Authored-By: Cyril Roelandt <cyril@redhat.com>
Change-Id: Ifc0b9c1d7f022db54c34c48c903a1719f9404d04
This was originally five patches, but they are all needed to pass
any of the test jobs now, so they have been squashed into one:
Co-Authored-By: Dan Smith (dms@danplanet.com)
First:
The autoload argument was removed[1] in SQLAlchemy and only
the autoload_with argument should be passed.
The autoload argument is set according to the autoload_with argument
automatically even in SQLAlchemy 1.x[2] so is not at all needed.
[1] c932123bac
[2] ad8f921e96
Second:
Remove _warn_on_bytestring for newer SA, AFAICT, this flag has been
removed from SQLAlchemy and that is why watcher-db-manage fails to
initialize the DB for me on jammy. This migration was passing the
default value (=False) anyway, so I assume this is the right "fix".
Third:
Fix joinedload passing string attribute names
Fourth:
Fix engine.select pattern to use begin() per the migration guide.
Fifth:
Override the apscheduler get_next_run_time() which appears to be
trivially not compatible with SQLAlchemy 2.0 because of a return type
from scalar().
Change-Id: I000e5e78f97f82ed4ea64d42f1c38354c3252e08
Minimal refactor of SQLAlchemy api module to be compatible with
oslo.db >= 15.0.0 where autocommit behaviour was dropped.
Closes-Bug: #2056181
Change-Id: I33be53f647faae2aad30a43c10980df950d5d7c2
Add file to the reno documentation build to show release notes for
stable/2024.1.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2024.1.
Sem-Ver: feature
Change-Id: I9eb6462199bedb3bbc24ba853ebf52ac7d93353f
As per the current release tested runtime, we test
python versions from 3.8 to 3.11, so update the
python classifiers in setup.cfg accordingly.
Change-Id: Ie010eea38eb0861699b60f16dfd3e2e95ae33709
At the moment, Watcher can use a single bare metal provisioning
service: Openstack Ironic.
We're now adding support for Canonical's MAAS service [1], which
is commonly used along with Juju [2] to deploy Openstack.
In order to do so, we're building a metal client abstraction, with
concrete implementations for Ironic and MAAS. We'll pick the MAAS
client if the MAAS url is provided, otherwise defaulting to Ironic.
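The selection logic can be sketched as follows. All class and function
names here are hypothetical stand-ins for the abstraction described above.
They are not the names used in the Watcher code base.

```python
import abc


class BaseMetalClient(abc.ABC):
    """Hypothetical common interface for bare metal backends."""

    @abc.abstractmethod
    def backend_name(self):
        """Return a short name identifying the backend."""


class IronicClient(BaseMetalClient):
    def backend_name(self):
        return "ironic"


class MaasClient(BaseMetalClient):
    def __init__(self, url):
        self.url = url

    def backend_name(self):
        return "maas"


def get_metal_client(maas_url=None):
    """Pick MAAS when a MAAS url is configured, else default to Ironic."""
    if maas_url:
        return MaasClient(maas_url)
    return IronicClient()
```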
For now, we aren't updating the baremetal model collector since it
doesn't seem to be used by any of the existing Watcher strategy
implementations.
[1] https://maas.io/docs
[2] https://juju.is/docs
Implements: blueprint maas-support
Change-Id: I6861995598f6c542fa9c006131f10203f358e0a6
Power-off actions created by the energy saving strategy include
a resource name property, which currently isn't part of the
action json schema. For this reason, json schema validation fails.
Additional properties are not allowed ('resource_name' was unexpected)
We'll update the json schema, including the resource name property.
Change-Id: I924d36732a917c0be98b08c2f4128e9136356215
A couple of object tests are failing, probably after a dependency
bump.
watcher.objects.base.objects is mocked, so the registered object
version isn't properly retrieved, leading to a type error:
File "/mnt/data/workspace/watcher/watcher/tests/objects/test_objects.py",
line 535, in test_hook_chooses_newer_properly
reg.registration_hook(MyObj, 0)
File "/mnt/data/workspace/watcher/watcher/objects/base.py",
line 46, in registration_hook
cur_version = versionutils.convert_version_to_tuple(
File "/home/ubuntu/openstack_venv/lib/python3.10/site-packages/oslo_utils/versionutils.py",
line 91, in convert_version_to_tuple
version_str = re.sub(r'(\d+)(a|alpha|b|beta|rc)\d+$', '\\1', version_str)
File "/usr/lib/python3.10/re.py", line 209, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
We'll solve the issue by setting the VERSION attribute against
the mock object.
Change-Id: Ifeb38b98f1d702908531de5fc5c846bd1c53de4b
The "vm workload consolidation" strategy is summing up instance
usage in order to estimate host usage.
The problem is that some infrastructure services (e.g. OVS or Ceph
clients) may also use a significant amount of resources, which
would be ignored. This can impact Watcher's ability to detect
overloaded nodes and correctly rebalance the workload.
This commit will use the host metrics, if available. The proposed
implementation uses the maximum value between the host metric
and the sum of the instance metrics.
Note that we're holding a dict of host metric deltas in order to
account for planned migrations.
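A minimal sketch of the maximum-value rule, ignoring the migration deltas.
estimate_host_usage is a hypothetical helper, not the strategy code.

```python
def estimate_host_usage(instance_usages, host_usage=None):
    """Estimate host usage from instance metrics and an optional host metric.

    Falls back to the sum of the instance metrics when no host metric is
    available; otherwise returns the larger of the two values.
    """
    summed = sum(instance_usages)
    if host_usage is None:
        return summed
    return max(host_usage, summed)
```

Taking the maximum means resources used by infrastructure services on the
host are counted when the host metric exceeds the instance total.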
Change-Id: I82f474ee613f6c9a7c0a9d24a05cba41d2f68edb
The "cpu_util" metric was deprecated a few years ago.
We'll obtain the same result by converting the cumulative cpu
time to a percentage, leveraging the rate of change aggregation.
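As an illustration of the conversion, assuming the cumulative cpu time is
expressed in nanoseconds and the result is normalized per vcpu (both are
assumptions, as is the helper name):

```python
NS_PER_SECOND = 1_000_000_000


def cpu_util_percent(cpu_time_delta_ns, interval_seconds, vcpus=1):
    """Convert a cpu time increase over an interval to a percentage.

    cpu_time_delta_ns is the increase of the cumulative counter over
    interval_seconds, i.e. what a rate-of-change aggregation provides.
    """
    available_ns = interval_seconds * NS_PER_SECOND * vcpus
    return 100.0 * cpu_time_delta_ns / available_ns
```

For example, 30 seconds of cpu time consumed during a 60 second interval
on a single vcpu corresponds to 50 percent utilization.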
Change-Id: I18fe0de6f74c785e674faceea0c48f44055818fe
There may be no available metrics for instances that are stopped
or were recently spawned. This makes retries unnecessary and time
consuming.
For this reason, we'll ignore gnocchi MetricNotFound errors.
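The behaviour can be sketched like this. MetricNotFound below is a local
stand-in for the gnocchi error and query_with_retries is a hypothetical
helper. The real retry logic in the datasource may differ.

```python
class MetricNotFound(Exception):
    """Local stand-in for the gnocchi 'metric not found' error."""


def query_with_retries(query, retries=3):
    """Call query(), retrying transient errors but not missing metrics."""
    for _ in range(retries):
        try:
            return query()
        except MetricNotFound:
            # Retrying will not make the metric appear, so give up now.
            return None
        except Exception:
            # Any other error is treated as transient and retried.
            continue
    return None
```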
Change-Id: I79cd03bf04db634b931d6dfd32d5150f58e82044
We're adding a few info log messages in order to trace the
"vm consolidation" strategy more easily.
Change-Id: I8ce1a9dd173733f1b801839d3ad0c1269c4306bb
Although Watcher supports cold migrations, the vm workload
consolidation workflow only allows live migrations to be
performed.
We'll remove this unnecessary limitation so that stopped instances
could be cold migrated.
Change-Id: I4b41550f2255560febf8586722a0e02045c3a486
The Nova collector json schema validation started [1][2] failing after
the jsonschema upper constraint was bumped from 4.17.3 to 4.19.1 [3].
The reason is that jsonschema v4.18.0a1 switched to a reference
resolving library [4], which treats the aggregate "id" as a jsonschema
id and expects it to be a string [5]. For this reason, we're now getting
AttributeError exceptions.
As a workaround, we'll rename the "id" ref element as "host_aggr_id".
Also, the watcher-tempest-multinode job is configured to use Focal,
which is no longer supported by Devstack [6]. That being considered,
we'll switch to Ubuntu Jammy (22.04).
While at it, we're disabling Cinder Backup, which isn't used while
testing Watcher. It currently causes Devstack failures since it
uses the Swift backend by default, which is disabled.
[1] https://paste.opendev.org/raw/bjQ1uIdbDMnmA1UEhxLL/
[2] https://paste.opendev.org/raw/bNgxqulBwBLYB7tNhrU4/
[3] ab0dcbdda2
[4] https://github.com/python-jsonschema/jsonschema/releases/tag/v4.18.0a1
[5] c23a5dc1c9/referencing/jsonschema.py (L54-L55C18)
[6] https://paste.openstack.org/raw/bSoSyXgbtmq6d9768HQn/
Change-Id: I300620c2ec4857b1e0d402a9b57a637f576eeb24
Add file to the reno documentation build to show release notes for
stable/2023.2.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.2.
Sem-Ver: feature
Change-Id: I8a0c75ce5a4e5ae5cccd8eb1cb0325747a619122
Add file to the reno documentation build to show release notes for
stable/2023.1.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.1.
Sem-Ver: feature
Change-Id: Ia585893e7fef42e9991a2b81f604d1ff28c0a5ad
This moves the watcher queue declaration from the pipeline level
(where it is no longer valid) to the project level.
https://lists.openstack.org/pipermail/openstack-discuss/2022-May/028603.html
Change-Id: I06923abb00f7eecd59587f44cd1f6a069e88a9fc
oslo.db 12.1.0 has changed the default value for the 'autocommit'
parameter of 'LegacyEngineFacade' from 'True' to 'False'. This is a
necessary step to ensure compatibility with SQLAlchemy 2.0. However, we
are currently relying on the autocommit behavior and need changes to
explicitly manage sessions. Until that happens, we need to override the
default.
Co-Authored-By: Stephen Finucane <stephenfin@redhat.com>
Change-Id: I7db39d958d087322bfa0aad70dfbd04de9228dd7
This is an automatically generated patch to ensure unit testing
is in place for all of the tested runtimes for antelope. Also,
update the template name to the generic one.
See also the PTI in governance [1].
[1]: https://governance.openstack.org/tc/reference/project-testing-interface.html
Change-Id: Ide6c6c398f8e6cdd590c6620a752ad802a1f5cf8
Add file to the reno documentation build to show release notes for
stable/zed.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/zed.
Sem-Ver: feature
Change-Id: I1726e33a14038712dbb9fd5e5c0cddf8ad872e69
Add WebTest to test-requirements which used to be imported as a
transitive requirement via pecan, but the latest release of
pecan dropped this dependency. So make this requirement explicit.
Related-Bug: #1982110
Change-Id: I4852be23b489257aaa56d3fa22d27f72bcabf919
Add file to the reno documentation build to show release notes for
stable/yoga.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/yoga.
Sem-Ver: feature
Change-Id: Ic7c275b38fef9afc29577f81fe92546bb94b2930
Add file to the reno documentation build to show release notes for
stable/xena.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/xena.
Sem-Ver: feature
Change-Id: If1c02305a153575c6a550844b0c6f45b74ea5ef3
>>> random.sample([5,10], 1.3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/random.py", line 321, in sample
    result = [None] * k
TypeError: can't multiply sequence by non-int of type 'float'
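
One way to avoid this error is to convert the sample size to an integer before calling `random.sample`. The helper below is a sketch under that assumption; its name and the clamping behavior are illustrative, and the actual fix in this commit is not shown here:

```python
import random


def safe_sample(population, k):
    """Sample from population with a size that may arrive as a float."""
    # random.sample() requires an integer sample size, so truncate first.
    count = int(k)
    # Clamp to a valid range so the call cannot raise ValueError.
    count = max(0, min(count, len(population)))
    return random.sample(population, count)
```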
Change-Id: Ifa5dca06f07220512579e4fe3c5c741aeffc71cc
Block Storage API v2 was deprecated during Pike cycle and is being
removed during Xena cycle, and current v3 API should be used instead.
Change-Id: Ia5247742b31f5f07186ef908588f0972d3ac609f
Python introduced http.HTTPStatus in version 3.5,
and Wallaby targets a minimum Python version of 3.6.
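
For illustration, `http.HTTPStatus` members are integer enums, so they compare equal to the numeric codes they replace. The helper `is_success` is a hypothetical example, not part of this change:

```python
from http import HTTPStatus


def is_success(status_code):
    """Return True for 2xx status codes."""
    # HTTPStatus members are IntEnum values and compare like integers.
    return HTTPStatus.OK <= status_code < HTTPStatus.MULTIPLE_CHOICES
```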
Change-Id: I45f732f0f59b8fae831bb6c07f4fdd98cdd7409a
Now that installing the Watcher dashboard is fixed in devstack
deployments, we can update the documentation to recommend installing
the dashboard plugin.
Change-Id: I284a1ec31536ea258cc1979ffd46b22d3e1ac18b
Setuptools v54.1.0 introduces a warning that the use of dash-separated
options in 'setup.cfg' will not be supported in a future version [1].
Get ahead of the issue by replacing the dashes with underscores. Without
this, we see 'UserWarning' messages like the following on new enough
versions of setuptools:
UserWarning: Usage of dash-separated 'description-file' will not be
supported in future versions. Please use the underscore name
'description_file' instead
[1] https://github.com/pypa/setuptools/commit/a2e9ae4cb
Change-Id: Ide4d650a78829a6bc16d86b620e6b3fbed0bba06
The Query.with_lockmode() method is deprecated since version 0.9.0
and will be removed in a future release. [1]
This patch replaces it with Query.with_for_update().
The 'faultstring' has been modified to 'Exactly 5 or 6 columns has to be
specified for iterator expression', so add one space between "iterator"
and "expression" in 'expected_error_msg'.
Also use upper-constraints in doc build to avoid issues in pdf build.
[1]
https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.with_lockmode
Closes-Bug: #1933226
Change-Id: I0ad514da647bb08790259fd27e56a41f6dbbbaa0
Add file to the reno documentation build to show release notes for
stable/wallaby.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/wallaby.
Sem-Ver: feature
Change-Id: Ic38b5071799ca733545381e79b956d7f82db2a87
As per the community goal of migrating the policy file
format from JSON to YAML [1], we need to do two things:
1. Change the default value of the '[oslo_policy] policy_file'
config option from 'policy.json' to 'policy.yaml' with
upgrade checks.
2. Deprecate the JSON formatted policy file on the project side
via a warning in the docs and release notes.
Also replace policy.json references with policy.yaml in docs and tests.
[1] https://governance.openstack.org/tc/goals/selected/wallaby/migrate-policy-format-from-json-to-yaml.html
Change-Id: I207c02ba71fe60635fd3406c9c9364c11f259bae
The requirements-check job is currently failing with
the error below:
ERROR: Requirement for package PrettyTable excludes a version not excluded in the global list.
Local settings : {'<0.8'}
Global settings: set()
Unexpected : set()
Validating test-requirements.txt
Keep PrettyTable the same as in the openstack/requirements repo.
Change-Id: I63633d2932757ca23bcea69fd655a2499a5b6d31
There is a commonly shared and proven rpc pattern used
across most OpenStack services that is already implemented
in watcher, but the functions are not used.
This patch basically makes use of the existing
rpc classes and removes some unnecessary code.
Change-Id: I57424561e0675a836d10b712ef1579a334f72018
The check for this call to input() has been removed.
The input method in Python 2 will read from standard input, evaluate and
run the resulting string as python source code. This is similar, though
in many ways worse, than using eval. On Python 2, use raw_input instead,
input is safe in Python 3.
Change-Id: I8654f0c197bfe88796b56e9d85f563cdded6e8a8
Python modules related to coding style checks (listed in blacklist.txt in
the openstack/requirements repo) are dropped from lower-constraints.txt
as they are not needed during installation.
Change-Id: Iadf4581646131f87803c2cebbc66bd55fdb56685
Add file to the reno documentation build to show release notes for
stable/victoria.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/victoria.
Change-Id: I311548732398a680ba50a72273fb98bb16009be4
Sem-Ver: feature
Implements base method as well as some basic implementations to
retrieve time series metrics. Ceilometer can not be supported
as API documentation has been unavailable. Grafana will be
supported in follow-up patch.
Partially Implements: blueprint time-series-framework
Change-Id: I55414093324c8cff379b28f5b855f41a9265c2d3
Create a native Zuul v3 grenade job. It matches the existing job,
even though it doesn't call any local hook as the current legacy
job does (because no local hook exists and it should be rewritten
as zuul configuration if it did).
The new job reuses the variable definition of the devstack watcher
job, so clean up that job as well:
- do not depend on devstack-gate, which is not needed and will be
deprecated soon anyway;
- use the new way (tempest_plugins) to define which tempest plugin
should be installed;
- remove the definition of USE_PYTHON3: true and simply inherit
the value set by devstack;
- remove the definition of PYTHONUNBUFFERED, not really set
anywhere else and only useful back in the days in Jenkins.
Change-Id: Ib0ed3c0f395e1b85b8f25f6e438c414165baab32
Rolling back an action_plan has a cost.
So give users an option to choose whether to roll it back
when the action_plan fails.
Change-Id: I20c0afded795eda7fb1b57ffdd2ae1ca36c45301
When directly using the `curl` command to create an audit template,
a strategy name can be accepted.
Closes-Bug: #1884174
Change-Id: I7c0ca760a7fa414faca03c5293df34a84aad6fac
Whether to revert a migrate action when the action_plan fails is
determined by the 'rollback_actionplan' option.
This reverts commit c522e881b1.
Change-Id: I5379018b7838dff4caf0ee0ce06cfa32e7b37b12
The mock third party library was needed for mock support in py2
runtimes. Since we now only support py36 and later, we can use the
standard lib unittest.mock module instead.
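
A short sketch of the standard library module in use. The class `UsageClient` and the function `host_is_overloaded` are hypothetical stand-ins invented for this example:

```python
from unittest import mock


class UsageClient:
    """Hypothetical client whose real implementation is not needed here."""

    def get_usage(self, host):
        raise NotImplementedError


def host_is_overloaded(client, host, threshold=80.0):
    return client.get_usage(host) > threshold


def check_overload_with_mock():
    # spec= restricts the mock to attributes that exist on UsageClient.
    client = mock.Mock(spec=UsageClient)
    client.get_usage.return_value = 95.0
    overloaded = host_is_overloaded(client, "node-1")
    # Verify the collaborator was called exactly once as expected.
    client.get_usage.assert_called_once_with("node-1")
    return overloaded
```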
Change-Id: I4ee01710d04d650a3ad5ae069015255d3f674c74
Now that we no longer support py27, we can use the standard library
unittest.mock module instead of the third party mock lib.
Change-Id: I6cdd4c35a52a014ba3c4dfe4cc2bd4d670c96bc3
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
Switch to openstackdocstheme 2.2.1 and reno 3.1.0 versions. Using
these versions will allow especially:
* Linking from HTML to PDF document
* Allow parallel building of documents
* Fix some rendering problems
Update Sphinx version as well.
Set openstackdocs_pdf_link to link to PDF file. Note that
the link to the published document only works on docs.openstack.org
where the PDF file is placed in the top-level html directory. The
site-preview places the PDF in a pdf directory.
Set openstackdocs_auto_name to False to use 'project' variable as name.
Change pygments_style to 'native' since old theme version always used
'native' and the theme now respects the setting and using 'sphinx' can
lead to some strange rendering.
Remove docs requirements from lower-constraints, they are not needed
during install or test but only for docs building.
openstackdocstheme renames some variables, so follow the renames
before the next release removes them. A couple of variables are also
not needed anymore, remove them.
See also
http://lists.openstack.org/pipermail/openstack-discuss/2020-May/014971.html
Change-Id: Ia9a3fb804fb59bb70edc150a3eb20c07a279170b
These translation sections are not needed anymore, Babel can
generate translation files without them.
Change-Id: I95bde8575638511449edaa1e546e3399bf0e6451
Since we dropped support for python 2 [1], we no longer need to use the
mock library, which existed to backport py3 functionality into py2.
The standard library mock must be imported by saying::

    from unittest import mock

...because if you say::

    import mock

...you definitely will not be getting the standard library mock.
That will always import the third party mock library.
This commit adds hacking check N366 to enforce the former.
This check can be removed in the future (and we can start saying
``import mock`` again) if we manage to purge these transitive
dependencies. I'm not holding my breath.
[1]https://review.opendev.org/#/c/717540
Change-Id: I8c8c99024e8de61d9151480d70543f809a100998
Now that we no longer support py27, we can use the standard library
unittest.mock module instead of the third party mock lib.
The remainder was auto-generated with the following (hacky) script, with
one or two manual tweaks after the fact:
    import glob

    for path in glob.glob('watcher/tests/**/*.py', recursive=True):
        with open(path) as fh:
            lines = fh.readlines()
        if 'import mock\n' not in lines:
            continue
        import_group_found = False
        create_first_party_group = False
        for num, line in enumerate(lines):
            line = line.strip()
            if line.startswith('import ') or line.startswith('from '):
                tokens = line.split()
                for lib in (
                    'ddt', 'six', 'webob', 'fixtures', 'testtools'
                    'neutron', 'cinder', 'ironic', 'keystone', 'oslo',
                ):
                    if lib in tokens[1]:
                        create_first_party_group = True
                        break
                if create_first_party_group:
                    break
                import_group_found = True
            if not import_group_found:
                continue
            if line.startswith('import ') or line.startswith('from '):
                tokens = line.split()
                if tokens[1] > 'unittest':
                    break
                elif tokens[1] == 'unittest' and (
                    len(tokens) == 2 or tokens[4] > 'mock'
                ):
                    break
            elif not line:
                break
        if create_first_party_group:
            lines.insert(num, 'from unittest import mock\n\n')
        else:
            lines.insert(num, 'from unittest import mock\n')
        del lines[lines.index('import mock\n')]
        with open(path, 'w+') as fh:
            fh.writelines(lines)
Co-Authored-By: Sean McGinnis <sean.mcginnis@gmail.com>
Change-Id: Icf35d3a6c10c529e07d1a4edaa36f504e5bf553a
The new flake8 release 3.8.0 added new checks, and the gate pep8
job started failing. hacking 3.0.1 fixes the pinning of flake8 to
avoid bringing in a new version with new checks.
Although this is fixed in the latest hacking, 2.0 and 3.0 cap
flake8 at <4.0.0, which means a new flake8 version such as 3.9.0 can
also break the pep8 job if new checks are added.
To avoid similar gate breakage in the future, we need to bump the
hacking minimum version.
- http://lists.openstack.org/pipermail/openstack-discuss/2020-May/014828.html
Change-Id: I1fe394ebd1f161eb73f53bfa17d2ccc860b9f51b
Monkey patch the original current_thread to use the up-to-date _active
global variable. This solution is based on that documented at:
https://github.com/eventlet/eventlet/issues/592
Change-Id: I194eedd505d45137963eb40d1b1d5da2309caeac
Closes-Bug: #1863021
Now that we are running the Victoria tests that include a
voting py38, we can now add the Python 3.8 metadata to the
package information to reflect that support.
Change-Id: Icf85483ff64055d16d35f189755e5fb01fabf574
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
Add file to the reno documentation build to show release notes for
stable/ussuri.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/ussuri.
Change-Id: I63fc3e49802f89ac2d967ee089a9dd9dffbe9c78
Sem-Ver: feature
EfficacyIndicator.value is Decimal type, it's
not JSON serializable. So we convert value type
before serialization.
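
A minimal sketch of the conversion, assuming the value is turned into a float. The target type and the function name `serialize_indicator` are assumptions; the commit only states that the value type is converted:

```python
import decimal
import json


def serialize_indicator(name, value):
    """Serialize an indicator whose value may be a Decimal."""
    # json.dumps() raises TypeError for Decimal, so convert it first.
    if isinstance(value, decimal.Decimal):
        value = float(value)
    return json.dumps({"name": name, "value": value})
```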
Closed-Bug: #1873377
Change-Id: Id38969775c446bece71f7a85c5c5d3efee9befa0
Error when importing wsmeext.sphinxext
Could not import extension wsmeext.sphinxext
(exception: cannot import name 'l_')
Change-Id: Id23c9c1fd35153d67d4ffb50dc1cd40f30b7ab41
This commit cleans up the following:
- Remove python 2.7 stanza from setup.py
- Add requires on python >= 3.6 to setup.cfg so that pypi and pip
know about the requirement
- Add "ignore_basepython_conflict=True" to tox.ini
Change-Id: Ic4fcc1fb15f214ca4204f56ee1ea15dc6a782fc2
Sphinx 3.0.0 breaks the building here, block it for now.
Depends-On: https://review.opendev.org/#/c/717949/
Change-Id: Ibf0c93ea79fec647fbf749257835f1fa99d5f59d
The repo is Python 3 now, so update hacking to version 3.0 which
supports Python 3.
Fix problems found.
Update local hacking checks for new flake8.
Remove hacking and friends from lower-constraints, they are not needed
to be installed at run-time.
Change-Id: Ia6af344ec8441dc98a0820176373dcff3a8c80d5
Code in grenade and elsewhere relies on the process/service name
when one runs "ps auxw" and greps, for example, "grep -e watcher-api"
to check if the service is running. With uwsgi, let us make sure
we use a process name prefix so it is easier to spot the services
and be compatible with code elsewhere that relies on this.
Reference:
https://review.opendev.org/#/c/494531/
Change-Id: I69dbe8840e87a8cb0b2720caa95fb17fb7a30848
There are many warnings in the Watcher API log file.
The reason is that keystonemiddleware only needs the
keystone_authtoken config section.
Refer to https://docs.openstack.org/keystonemiddleware/latest/
Closes-Bug: #1864129
Change-Id: Ie790277d55b3a2d93c26781f7e5e8f66b87227d8
Add a new webhook API; its microversion is 1.4.
Partially Implements: blueprint event-driven-optimization-based
Change-Id: I50f7c824e52f3c5fc775d5064898ed422e375a99
This patchset added a new audit type: event,
and the handler to execute event audit.
Partially Implements: blueprint event-driven-optimization-based
Change-Id: I287471ee4d1dcc42af7a6bcc15f8509d4ce73072
Add the releasenote for the general purpose decision engine threadpool.
Including config parameters and how contributors can find relevant
documentation.
Implements: blueprint general-purpose-decision-engine-threadpool
Change-Id: I3560069b4e34f13305950559a0f05f7921f7867e
Now that we are using gitea the contents of our README.rst are
more prominently displayed. Starting it with a "Team and repository
tags" title is a bit confusing. This change makes it start with the
name of the project instead.
Change-Id: Icfce3764aa9e1aabf5e78443cf7ce102de63a052
Documentation with details on general concurrency as well as OpenStack
specific libraries.
Describes how different libraries are effectively used across different
Watcher services. This includes describing how futurist is used in the
Decision Engine and how taskflow is used in the Applier.
Finally, this documentation describes how contributors can use the
new DecisionEngineThreadpool effectively and includes examples.
https://docs.openstack.org/futurist/latest/
https://docs.openstack.org/taskflow/latest/
Change-Id: Ic1cd1f3733a0e9a239c9b8d49951e1e4ece49f3a
Partially Implements: blueprint general-purpose-decision-engine-threadpool
We have provided functions to get used and free resources in
class ModelRoot. So strategies can invoke the functions to
get used and free resources.
Change-Id: I3c74d56539ac6c6eb16b0d254a76260bc791567c
Use the general purpose threadpool when building the nova compute
data model. Additionally, adds thorough explanation about theory of
operation.
Updates related test cases to better ensure the correct operation
of add_physical_layer.
Partially Implements: blueprint general-purpose-decision-engine-threadpool
Change-Id: I53ed32a4b2a089b05d1ffede629c9f4c5cb720c8
Implements the singleton general purpose threadpool for the decision
engine and associated tests.
A threadpool is a collection of one or more threads typically called
'workers' to which tasks can be submitted. These submitted tasks will
be scheduled by the threadpool and subsequently executed. How many
tasks will be executed concurrently is managed by the underlying
threadpool and its configuration. In Python the submission of tasks
to a threadpool returns an object called a 'future'. Futures provide
an interface to the task being executed that allows retrieving
information about its state, such as whether it is currently being
executed, whether it is waiting on a condition, and whether it has
completed successfully. Finally, futures allow retrieving what has
been returned by the submitted task.
In the case of most OpenStack projects instead of interfacing with native
Python concurrency the futurist library is used. This library provides
very similar interfaces to native concurrency with some extras such as
the wait_for_any method.
For more information about futurist or Python concurrency the following
references can be consulted:
https://docs.python.org/3/library/concurrent.futures.html
https://docs.openstack.org/futurist/latest/reference/index.html#executors
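
The submit/future pattern described above can be sketched with native Python concurrency. This example uses `concurrent.futures` rather than futurist or the Watcher threadpool, and `collect` and `run_collectors` are hypothetical names:

```python
from concurrent import futures


def collect(node_name):
    """Stand-in for a task, such as collecting data for one node."""
    return node_name.upper()


def run_collectors(node_names, workers=2):
    with futures.ThreadPoolExecutor(max_workers=workers) as pool:
        # submit() schedules the task and returns a future immediately.
        submitted = [pool.submit(collect, name) for name in node_names]
        # wait() blocks until all futures have finished.
        done, _not_done = futures.wait(
            submitted, return_when=futures.ALL_COMPLETED)
        # result() returns whatever the submitted task returned.
        return sorted(future.result() for future in done)
```

The number of tasks running at once is bounded by `max_workers`; the remaining tasks wait in the pool's queue.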
Partially Implements: blueprint general-purpose-decision-engine-threadpool
Change-Id: I94bd9a17290967f011762f2b9c787ee7c46ff930
Sphinx 1.8 introduced [1] the '--keep-going' argument which, as its name
suggests, keeps the build running when it encounters non-fatal errors.
This is exceptionally useful in avoiding a continuous edit-build loop
when undertaking large doc reworks where multiple errors may be
introduced.
[1] https://github.com/sphinx-doc/sphinx/commit/e3483e9b045
Change-Id: If2bbfd8ae6d1fc75cbc494578310c1dc03c367e6
When querying data from a datasource, some data may be missing.
If we throw an exception in this case, the audit will fail because
of the exception. We should remove the exception and leave the
decision to the strategy.
Change-Id: I1b0e6b78b3bba4df9ba16e093b3910aab1de922e
Closes-Bug: #1847434
Add file to the reno documentation build to show release notes for
stable/train.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/train.
Change-Id: I0d5ae49a33583514925ad966de067afaa8881ff3
Sem-Ver: feature
Reason:
When there is a compute node but no virtual machine,
the command 'watcher datamodel list' should display
the information of the compute node instead of returning None.
Change-Id: Id5ff7f08ac8a9883af9f0313785b756d813ed5a2
Closes-Bug: #1844948
The default planner cannot create actions in the right order.
The node_resource_consolidation strategy needs to use its
own planner.
Partially Implements: blueprint node-resource-consolidation
Depends-on: I586e67f782e2965234826634ba3ff51681af4df8
Change-Id: I05b02905a3335a73b6926966de6331c632842293
Add a new pdf-docs environment to enable PDF build.
sphinxcontrib-svg2pdfconverter is used to handle SVG properly.
Change-Id: I1563579486da8912ba8a220bb08a5331e7df910b
It should've been "watcher" instead of "python-watcher" as the
config files are expected to be in /etc/watcher/. Though this is
unlikely to cause problems as this patch corrected the default
config dir.
https://review.opendev.org/#/c/658348/
Nevertheless, we should be using the correct name.
Change-Id: If6b58133eecf2fcc37e11d8c45eaa58f238ea2a8
This component is responsible for selecting an appropriate Planner based
on predefined property value passed to concrete Strategy.
Change-Id: I86de95886df5d7e9558512569601e9ea3babb0e9
Implements: bp watcher-planner-selector
Co-Authored-By: Canwei Li <li.canwei2@zte.com.cn>
This strategy is used to centralize VMs to as few nodes as possible
by VM migration. The user can set an input parameter to decide how to
select the destination node.
Implements: blueprint node-resource-consolidation
Closes-Bug: #1843016
Change-Id: I104c864d532c2092f5dc6f0c8f756ebeae12f09e
Many strategies need to get the used or free resources of a node,
so we define two new methods for this purpose in the ModelRoot class.
Change-Id: I8cb41fd560dbac9a78d25bfdba51799533db83c2
1. Add datamodel api and policy_enforce file.
2. Add related unit tests for data_model api and policy.
Partially Implements: blueprint show-datamodel-api
Change-Id: I1654685d8cf04db5dd132d43a8640ddf91893cad
1. Add datamodel list endpoint and rpc process.
2. Add parsing and returning of the datamodel list.
3. Add related unit tests.
Partially Implements: blueprint show-datamodel-api
Change-Id: I758b7ca2bc3d8d596d3457277744336c6629bc4e
The new bp needs to get the audit type from the audit,
so we need to add an audit parameter to do_execute.
Partially Implements: blueprint node-resource-consolidation
Change-Id: Ia979781b32202c1821aa1cb91d24253fe6d7bd2d
watcher-tempest-strategies includes the tempest tests for all
strategies, so we add it and remove all other individual strategy
tempest entries.
Depends-on: I3e45d4a66a6e1bf55499def8550da38ddf01b638
Change-Id: I182bf0ddc528099f5115098b825e9bddae3b187a
As part of Train community goal 'Support IPv6-Only Deployments and Testing'[1],
Tempest has defined the base job 'devstack-tempest-ipv6' which will
deploy services on IPv6.
This commit adds the new job 'watcher-tempest-functional-ipv6-only'
run on gate which is derived from 'devstack-tempest-ipv6'.
Verification structure will be:
- 'devstack-IPv6' deploy the service on IPv6
- 'devstack-tempest-ipv6' run will verify the IPv6-only setting and listen address
- 'watcher-tempest-functional-ipv6-only' will run the tests.
Story: #2005477
Task: #35939
[1] https://governance.openstack.org/tc/goals/train/ipv6-support-and-testing.html
Change-Id: I42b7e5ff5fd64a21bdb8a32f319759a18c173601
The fields disk and disk_capacity have the same value,
we just need one, so remove disk_capacity field.
Partially Implements: blueprint improve-compute-data-model
Change-Id: If3d385c5e61713bbdc85e22f10cd75e161ff79f0
For Compute node, we can use the new property to calculate
resource(VCPU, memory and disk).
Partially Implements: blueprint improve-compute-data-model
Depends-on: I3f9a3279a26f3df444117d9265e74cca57b38d6e
Change-Id: I9fe58603692a9850e86a2c36ad7a31c473070100
For Compute node, when calculating resource(VCPU, memory and disk)
capacity, we need to consider reserved resource and allocation ratio.
Partially Implements: blueprint improve-compute-data-model
Depends-on: I3f9a3279a26f3df444117d9265e74cca57b38d6e
Change-Id: I70257dd5fb342a67a3ffda1055eddc54b8360ca3
For Compute node, we can use the new property to calculate
resource(VCPU, memory and disk).
Partially Implements: blueprint improve-compute-data-model
Depends-on: I3f9a3279a26f3df444117d9265e74cca57b38d6e
Change-Id: I2bb230b5f5a573fb3045261dfdee73f1a8434e0d
For Compute node, we can use the new property to calculate
resource(VCPU, memory and disk).
Partially Implements: blueprint improve-compute-data-model
Depends-on: I3f9a3279a26f3df444117d9265e74cca57b38d6e
Change-Id: I4f041ad25353d575c276fce87fe13c5e6705754f
For Compute node, we can use the new property to calculate
resource(VCPU, memory and disk).
Partially Implements: blueprint improve-compute-data-model
Depends-on: I3f9a3279a26f3df444117d9265e74cca57b38d6e
Change-Id: Id113b4c19792946329e9ff448bfe636cc8eca057
For Compute node, we can use the new property to calculate
resource(VCPU, memory and disk).
Partially Implements: blueprint improve-compute-data-model
Depends-on: I3f9a3279a26f3df444117d9265e74cca57b38d6e
Change-Id: I7872265b2378e5dc37aa2e086ff1f7fb9071db0b
The node resource (vcpu, memory and disk) usage information needs
to change when creating or deleting instances. Placement currently
does not send notifications, so there is no good way to capture
the change. We remove these fields and leave the process to the strategy.
Partially Implements: blueprint improve-compute-data-model
Change-Id: I3f9a3279a26f3df444117d9265e74cca57b38d6e
This error was discovered by tool coverity. If we don't
initialize this var src_extra_specs, line 225 may sometimes
raise an error.
Change-Id: I992b56b64d56f35c8355b22707c3db5112964b31
The code associated with virtual has been removed before,
and the relevant comments should be removed here.
Change-Id: I7104c1a6752ad0b8c9837a643e51b0a13194a81b
Resource (VCPU, memory and disk) capacity needs to be calculated
through the formula: capacity = (total - reserved) * ratio.
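
The formula translates directly into code. The function name and the sample numbers below are illustrative only:

```python
def capacity(total, reserved, ratio):
    """Usable capacity after removing reserved resources and applying
    the allocation (overcommit) ratio."""
    return (total - reserved) * ratio


# Example: 16 physical cores, none reserved, CPU allocation ratio 4.0.
vcpu_capacity = capacity(16, 0, 4.0)
# Example: 32768 MB of RAM, 512 MB reserved, RAM allocation ratio 1.5.
memory_capacity = capacity(32768, 512, 1.5)
```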
Partially Implements: blueprint improve-compute-data-model
Change-Id: I15ca66dd2c3a21c5acfebf6f04fa6601aff7918f
We have some new fields(vcpus_ratio, vcpus_used, ...)
in the Watcher ComputeNode. During the process of updating
data model by notifications, we need to get data from
placement.
Partially Implements: blueprint improve-compute-data-model
Change-Id: I10587e93bb3e7be6af78bb3a50509d82d8228f78
The node.free_disk_gb does not take allocation ratios used
for overcommit into account so this value may be negative.
We do not need this field and plan to set disk to total disk
capacity and then remove disk_capacity.
Partially Implements: blueprint improve-compute-data-model
Change-Id: I72c4490f5a8d0fbd1039f70ff20f07b743b6bb2d
Check whether the resource classes (VCPU, memory, disk) are in the
returned dictionary. If they are, we don't need to use dict.get() with
a default value because the parameters are required.
Partially Implements: blueprint improve-compute-data-model
Change-Id: Icb8c672d0e87e6e5f030a2222f928d1bbd069e3c
The api documentation is now published on docs.openstack.org instead
of developer.openstack.org. Update all links that are changed to the
new location.
Note that redirects will be set up as well but let's point now to the
new location.
For details, see:
http://lists.openstack.org/pipermail/openstack-discuss/2019-July/007828.html
Change-Id: I4101eced9c4bd26741f760e5651204f5d2dfea0f
The fields(vcpus, memory and disk_capacity) in the Watcher ComputeNode
do not take allocation ratios used for overcommit into account so there
may be disparity between this and the used count.
This patch added some new fields to solve this problem.
Partially Implements: blueprint improve-compute-data-model
Change-Id: Id33496f368fb23cb8e744c7e8451e1cd1397866b
Add call_retry method for ModelBuilder classes along with configuration
options. This allows ModelBuilder classes to reattempt any failed calls
to external services such as Nova or Ironic.
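
A generic retry helper along these lines could look as follows. This is a sketch and not the Watcher implementation: the signature, the `attempts` and `interval` parameters, and the choice to re-raise the last error are all assumptions:

```python
import time


def call_retry(func, *args, attempts=3, interval=0.0, **kwargs):
    """Call func, retrying on any exception up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            last_error = exc
            # Sleep between attempts, but not after the final one.
            if attempt < attempts - 1:
                time.sleep(interval)
    # All attempts failed; surface the most recent error.
    raise last_error
```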
Change-Id: Ided697adebed957e5ff13b4c6b5b06c816f81c4a
list_opts() actually returns a list like [1], so we don't need to
convert the list to a dict and then back to a list [2].
The conversion was originally needed to put the configuration
objects of the same group together, but we don't actually need
that.
Now the result of list_opts() looks like [3].
Reference:
[1]. [(Group1,[cfgObj1,cfgObj2....]),(Group2,[cfgObj3,cfgObj3....])..]
[2]. 375ae32fad/watcher/conf/opts.py (L51-L52)
[3]. [(Group1,[cfgObj1]),(Group1,[cfgObj2]),(Group2,[cfgObj3,cfgObj3....])..]
Change-Id: I50fcc5f812be42038852662639fb10c6dd2f6f72
This lets all the ModelBuilder classes use one baseclass and forces
ClusterDataModelCollector's to pass the scope.
The scopes are still unused in the case of Ironic and Cinder.
The idea is to do several follow ups to this and in the end have a
similar method to query_retry in the datasources baseclass.
Change-Id: Ibbdedd3087fef5298d7f4c9d3abdba05d1fbb2f0
The datasources are only used by the decision_engine, however, they
are placed in a directory one level higher. This patch moves the
datasources code into the decision_engine folder.
Change-Id: Ia54531fb899b79a59bb77adea079ff27c0d518fa
We want to set the value of the uuid field of the Watcher ComputeNode
to the hypervisor id (as a uuid). So we need to get hypervisor
information by uuid.
Change-Id: I752fbfa560313e28e87d83e46431c283b4db4f23
Related-Bug: #1835192
This error is caused because the condition "is not '':" is not always
true. Sometimes self.aggregation_method['node'] is u'' instead of ''.
This patch ensures that in both cases the behavior is the same.
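
The difference between identity and equality can be sketched as follows. The helper name is hypothetical and the actual fix may differ; the point is only that comparing by value treats every empty string the same way:

```python
def has_aggregation_method(method):
    """Return True when an aggregation method is actually set."""
    # `is not ''` checks object identity, which depends on how the
    # string object was created. Comparing by value avoids that.
    return method is not None and method != ""
```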
Change-Id: I7453678cc76892ebeacca23c3501a10a08725d1d
Closes-bug: #1836195
This patch does two things:
1. Replace the instance's human_id with name.
2. Remove the ComputeNode human_id.
The name field in the Watcher Compute Data Model is now available.
Using name is better than human_id. For the reason, please see [1].
[1]. https://bugs.launchpad.net/watcher/+bug/1833665
Change-Id: I04f40e7d2a2bda48e9a362f9d0b23f449c40324e
The aggregate list and availability_zone list may return ironic type
compute nodes. When building the compute data model we should check
the hypervisor_type and remove ironic compute nodes.
Change-Id: Idf404c104c30368baf95ef7d05ad8fc3e7adca38
Related-Bug: #1835183
New datasource to retrieve metrics that can be configured in a
flexible way depending on the deployment. The current implementation only
works with InfluxDB. Slight changes to datasource manager were
necessary because grafana metric_map can only be built at runtime.
The yaml configuration file can still be used to define metrics
but will require that five different attributes are specified per
metric.
Specific databases accessible through grafana can be defined by
creating 'translators' for these specific databases. This patch
introduces a base class for these translators and their methods.
In addition the first translator specific for InfluxDB is
created.
Depends-on: I68475883529610e514aa82f1881105ab0cf24ec3
Depends-on: If1f27dc01e853c5b24bdb21f1e810f64eaee2e5c
Implements: blueprint grafana-proxy-datasource
Change-Id: Ib12b6a7882703e84a27c301e821c1a034b192508
We want to set the value of the uuid field of Watcher ComputeNode
to the hypervisor id (as uuid). We need a method to get a compute
node by name.
Change-Id: I0975500f359de92b6d6fdea2e01614cf0ba73f05
Related-Bug: #1835192
The problem is that watcher is passing limit=-1 to novaclient when
listing servers which will always make at least two API calls to be
sure it's done paging:
https://github.com/openstack/python-novaclient/blob/13.0.1/novaclient/v2/servers.py#L896
If we can determine before listing servers that only a known number
exist, and that number is less than 1000 (for example, 4), we should
just pass limit=len(servers) to novaclient and avoid the second
paging call, which takes extra time and yields no results.
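A rough sketch of the limit selection, assuming a page size of 1000.
server_list_limit and PAGE_SIZE are hypothetical names used only for
illustration; the real change works on the server list it already has.

```python
PAGE_SIZE = 1000  # assumed maximum number of servers returned per page


def server_list_limit(expected_count, page_size=PAGE_SIZE):
    """Pick the ``limit`` to pass when listing servers.

    If the number of servers is known and fits in one page, ask for
    exactly that many so no extra paging request is needed. Otherwise
    fall back to -1, which means "page until everything is listed".
    """
    if expected_count is not None and 0 < expected_count < page_size:
        return expected_count
    return -1


print(server_list_limit(4))
print(server_list_limit(None))
```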
Change-Id: I797ad934a0f8496dbcbf65798e28b0443f238137
Closes-Bug: #1834679
The openstack hypervisor list contains ironic nodes. We should
filter out baremetal nodes when getting the compute node list.
Change-Id: I4ab3e1a63dc6f61cdc3e99fa2cae749a711459cc
Closes-Bug: #1835183
According to https://review.opendev.org/#/c/251791/,
watcher_messaging group and notifier_driver option
were deprecated.
Change-Id: I2cd114060d1960f77dfa8f4fe0a6d0fc05de5d4c
This is the release note for the new grafana datasource. It refers to
the documentation on configuring grafana.
Depends-on: Ib12b6a7882703e84a27c301e821c1a034b192508
Change-Id: Icb3939d772f06ad2d66eeba9a59fa8b60822ece0
This is a follow-up to: https://review.opendev.org/#/c/666897/
and makes sure titles and help information get rendered in
the configuration documentation and configuration samples.
The options for the placement_client group are already changed
and left untouched as a result. The changes to grafana_client
are already done in another patch and also untouched.
Change-Id: Ia33cd4576e4b55e651f3f3779a01f2867126138d
"self.assertTrue(action.state, objects.action.State.SUCCEEDED)"
and "self.assertTrue(action.state, objects.action.State.FAILED)"
should use assertEqual.
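The difference can be shown with a small standalone test. The string
values below are placeholders for the action states; the point is
that the second argument of assertTrue is only the failure message.

```python
import unittest


class AssertDemo(unittest.TestCase):
    def test_assert_true_hides_mismatch(self):
        # The second argument of assertTrue is only the failure message,
        # so this passes for any truthy first argument.
        self.assertTrue('FAILED', 'SUCCEEDED')

    def test_assert_equal_detects_mismatch(self):
        # assertEqual compares both values and fails on a mismatch.
        with self.assertRaises(AssertionError):
            self.assertEqual('FAILED', 'SUCCEEDED')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertDemo)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```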
Co-Authored-By: Canwei Li <li.canwei2@zte.com.cn>
Change-Id: I8e28d651938ca6ed8d12e8a6f5ecf775cf01a39c
This patch implements uWSGI support for Watcher API service.
Because mod_wsgi is deprecated, uwsgi is used to replace mod_wsgi.
Most OpenStack projects have already finished this.
Closes-Bug: #1834392
Change-Id: I3fad8d30a15aba493fb91da9337c2515ddea5167
Nova changed the default notification_format from "both" to
"unversioned" in Train [1]. Without configuring nova in the
grenade job we are not testing the nova versioned notification
handler code during upgrades.
Note that grenade only runs stack.sh on the base (old) side so
this change has to depend on a devstack stable/stein change to
add the NOVA_NOTIFICATION_FORMAT variable that we override.
Closes-Bug: #1831917
Depends-On: Ied9d50b07c368d5c2be658c744f340a8d1ee41e0
[1] https://review.opendev.org/603079/
Change-Id: I94c2d14477da185310e0fec596a1ad6436b802f1
This improves the documentation on configuration parameters for the
Grafana datasource.
Follow-up: If1f27dc01e853c5b24bdb21f1e810f64eaee2e5c
Depends-on: I5d1d3129b5d225f0f2fc86d149c046f9aab94d47
Change-Id: Ifd8be7491669c429482d880fdf0219be5ef03163
Nova used to emit versioned and unversioned notifications
by default but that changed in https://review.opendev.org/603079/
so now nova emits only unversioned notifications by default.
Watcher listens for versioned notifications so we need to configure
nova to emit both versioned (for Watcher) and unversioned
(for Ceilometer) notifications explicitly.
This adds an override-defaults file so devstack will load up
the nova devstack variable to set the notification_format before
importing and stacking the nova lib script.
Note that this only fixes the non-grenade CI jobs since grenade
requires separate handling for overriding defaults which is proving
hard to do and will be addressed in a separate change.
Partial-Bug: #1831917
Change-Id: I7e441608b38338eecd80e663ed3abe66a89e504f
While handling instance_created.end, a KeyError exception is
logged. This is because get_instance_by_uuid is invoked before
the instance is created in the data model.
During the review of https://review.opendev.org/#/c/663489/,
reviewers suggested that it is better to remove the KeyError exception.
This patch separates the handling of instance_created.end from
other Nova notifications and removes the call to get_instance_by_uuid.
Change-Id: Ie9e2d4f5b32ee7a5b52bbcd50abfa81dcabab7bb
Ceilometer removed the cpu_util metric in [1].
Another metric, compute.node.cpu.percent, requires setting the
compute_monitors option to cpu.virt_driver in
nova.conf. We should remind users about both.
[1]: https://review.opendev.org/#/c/580709/
Change-Id: I89306ef7c26fa2927945bd4f3ee88b670511d147
This implements the configuration parameters to implement
Grafana as a datasource including the influxdb translator
Change-Id: If1f27dc01e853c5b24bdb21f1e810f64eaee2e5c
Partially-implements: blueprint grafana-proxy-datasource
Replaces the remaining uses of the NoSuchMetric exception, which had
already been replaced elsewhere, with MetricNotAvailable. Test cases
are added to prevent regression.
The changes in the exceptions were introduced in:
https://review.opendev.org/#/c/658127/
Change-Id: Id0f872e916aaa5dec59ed1ae6c0f653553b3fe46
In get_node_by_instance_uuid, a ComputeNodeNotFound exception
is raised if a node cannot be found through the instance uuid.
But the exception message puts the instance uuid where the node name
belongs, which is misleading, so we define a new exception.
Closes-Bug: #1832156
Change-Id: Ic6c44ae44da7c3b9a1c20e9b24a036063af266ba
Moves the query_retry method into the baseclass and makes the query
retry and timeout options part of the watcher_datasources config group.
This makes the query_retry behavior uniform across all datasources.
A new baseclass method named query_retry_reset is added so datasources
can define operations to perform when recovering from a query error.
Test cases are added to verify the behavior of query_retry.
The query_max_retries and query_timeout config parameters are
deprecated in the gnocchi_client group and will be removed in a future
release.
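A simplified sketch of how a shared query_retry with a
query_retry_reset hook could look. The class attributes stand in for
the config options, the class names are hypothetical, and returning
None after the last attempt is an assumption rather than the
behavior confirmed by this change.

```python
import time


class DataSourceBase(object):
    """Illustrative base class; only the retry logic is sketched."""

    query_max_retries = 3   # stand-in for the config option
    query_timeout = 0       # seconds to wait between attempts

    def query_retry_reset(self, exc):
        """Hook for subclasses, called after each failed attempt."""

    def query_retry(self, f, *args, **kwargs):
        """Call f until it succeeds or the retries are exhausted."""
        for _ in range(self.query_max_retries):
            try:
                return f(*args, **kwargs)
            except Exception as exc:
                self.query_retry_reset(exc)
                time.sleep(self.query_timeout)
        return None  # assumption: give up quietly after the last attempt


class FlakySource(DataSourceBase):
    """Hypothetical datasource that fails twice before succeeding."""

    def __init__(self):
        self.resets = 0
        self.calls = 0

    def query_retry_reset(self, exc):
        self.resets += 1

    def fetch(self):
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError('temporary failure')
        return 'ok'


source = FlakySource()
outcome = source.query_retry(source.fetch)
```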
Change-Id: I33e9dc2d1f5ba8f83fcf1488ff583ca5be5529cc
We should be starting from stable/stein on the "old" side
of grenade runs now. Rather than hard-code the branch, just
use the BASE_DEVSTACK_BRANCH variable.
Change-Id: I1b0406f870ed0ae5622cfa7421a6cca00d0f891c
When receiving Nova notification instance.create.end,
map instance to its node after adding instance to datamodel.
Related-Bug: #1832156
Change-Id: I6f39e8d935195c611f668f71590e1d9ff52ced0d
In Python, when we use @property, the method will be
decorated by property.
When we call the method self.strategy.datasource_backend()[1],
it actually does two things:
1. call self.strategy.datasource_backend()
2. according to the method's return value[2], call self._datasource_backend()
[1]. https://github.com/openstack/watcher/blob/bd8636f3f/watcher/tests/decision_engine/strategy/strategies/test_base.py#L87
[2]. https://github.com/openstack/watcher/blob/bd8636f3f/watcher/decision_engine/strategy/strategies/base.py#L368
But in this part, we just want it to perform the first step.
So we have to use self.strategy.datasource_backend instead of
self.strategy.datasource_backend()
The reason why the unittest does not report an error is
because the returned value is a mock object, and the second step
is executed without error, for example:
python -m unittest watcher.tests.decision_engine.strategy.strategies.test_base
(Pdb) x=self.strategy.datasource_backend
(Pdb) type(x)
<class 'mock.mock.MagicMock'>
(Pdb) x
<MagicMock name='DataSourceManager().get_backend()' id='139740418102608'>
(Pdb) x()
<MagicMock name='DataSourceManager().get_backend()()' id='139740410824976'>
(Pdb) self.strategy.datasource_backend()
<MagicMock name='DataSourceManager().get_backend()()' id='139740410824976'>
To make the tests more robust, the underlying backend function
is mocked to be not callable.
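The effect can be reproduced outside of Watcher with a small example.
The Strategy class below is a stand-in, and using
NonCallableMagicMock from unittest.mock is one way to make the
backend non-callable; the actual test may do this differently.

```python
from unittest import mock


class Strategy(object):
    """Stand-in class with a property, not the real BaseStrategy."""

    def __init__(self, backend):
        self._backend = backend

    @property
    def datasource_backend(self):
        # Accessed as an attribute: strategy.datasource_backend
        return self._backend


# A MagicMock is callable, so a stray "()" silently succeeds.
lenient = Strategy(mock.MagicMock())
lenient.datasource_backend()

# A NonCallableMagicMock turns the same mistake into a TypeError.
strict = Strategy(mock.NonCallableMagicMock())
try:
    strict.datasource_backend()
    call_failed = False
except TypeError:
    call_failed = True
```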
Co-Authored-By: Matt Riedemann <mriedem.os@gmail.com>
Change-Id: I3305d9afe8ed79e1dc3affe02ba067ac06cece42
This patch adds Placement to Watcher.
We plan to improve the data model and strategies in
the future specs.
Change-Id: I7141459eef66557cd5d525b5887bd2a381cdac3f
Implements: blueprint support-placement-api
This makes the ConfFixture extend the Config fixture from
oslo.config which handles cleanup for us. The module level
import_opt calls are also removed since they are no longer
needed.
Change-Id: I869e89c53284c8da45e0b1293f2d35011f5bfbf9
In the process of creating an instance, Nova will emit an
instance.update notification with 'building' state.
This will cause a KeyError exception because this instance
isn't in Watcher datamodel.
So we should ignore the notification instance.update with
'building' state.
Closes-Bug: #1832154
Change-Id: I950eec50d2cee38bd22c47a70ae6f88bbf049080
There are currently some errors when running apidoc.
We don't actually need apidoc, so remove it.
Closes-Bug: #1831515
Change-Id: I3b91a2c05ed62ae7bbd30a29e9db51d0e021410f
The get_compute_node_by_hostname method is given a
compute service hostname and then does two queries to
find the matching hypervisor (compute node) with details:
1. List hypervisors with details and find the one that
matches the given compute service hostname.
2. Using that node, search for hypervisors with the
matching hypervisor_hostname.
There are two issues here:
1. The first query is inefficient in that it has to list
all hypervisors in the deployment to try and match the
one with the compute service hostname client side.
2. The second query is a fuzzy match on the server side [1]
so even though we have matched on the node we want,
get_compute_node_by_name can still return more than
one hypervisor which will result in the helper method
raising ComputeNodeNotFound. Consider having compute
hosts with names compute1, compute10, compute11, compute100,
and so on. The fuzzy match on compute1 would return all of
those hypervisors.
For non-ironic nodes in nova, the compute service host and
hypervisor should be 1:1, meaning the hypervisor.service['host']
should be the same as hypervisor.hypervisor_hostname. Knowing
this, we can simplify the code to search just on the given
compute service hostname and if we get more than one result, it
is because of the fuzzy match and we can then do our client-side
filtering on the compute service hostname.
[1] https://github.com/openstack/nova/blob/d4f58f5eb/nova/db/sqlalchemy/api.py#L676
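The client-side filtering after a fuzzy match can be sketched as
below. Hypervisors are modelled as plain dicts, pick_compute_node is
a hypothetical name, and LookupError stands in for
ComputeNodeNotFound.

```python
def pick_compute_node(candidates, service_host):
    """Select the hypervisor whose service host matches exactly.

    ``candidates`` stands in for the result of a fuzzy server-side
    search on hypervisor_hostname, which may contain extra entries.
    """
    matches = [c for c in candidates
               if c['service']['host'] == service_host]
    if len(matches) != 1:
        raise LookupError('compute node %s not found' % service_host)
    return matches[0]


# A search for "compute1" also returns compute10, compute11, ...
fuzzy_result = [
    {'hypervisor_hostname': name, 'service': {'host': name}}
    for name in ('compute1', 'compute10', 'compute11', 'compute100')
]
node = pick_compute_node(fuzzy_result, 'compute1')
print(node['hypervisor_hostname'])
```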
Change-Id: I84f387982f665d7cc11bffe8ec390cc7e7ed5278
The nova CDM builder code and notification handling
code had some inefficiencies when it came to looking
up a hypervisor to get details. The general pattern
used before was:
1. get the minimal hypervisor information by hypervisor_hostname
2. make another query to get the hypervisor details by id
In the notifications case, it was actually three calls because
the first is listing hypervisors to filter client-side by service
host.
This change collapses 1 and 2 above into a single API call
to get the hypervisor by hypervisor_hostname with details
which will include the service (compute) host information
which is what get_compute_node_by_id() was being used for.
Now that nothing is using get_compute_node_by_id it is removed.
There is more work we could do in get_compute_node_by_hostname
if the compute API allowed filtering hypervisors by service
host so a TODO is left for that.
One final thing: the TODO in get_compute_node_by_hostname about
there being more than one hypervisor per compute service host
for vmware vcenter is not accurate - nova's vcenter driver
hasn't supported a host:node 1:M topology like that since the
Liberty release [1]. The only in-tree driver in nova that supports
1:M is the ironic baremetal driver, so the comment is updated.
[1] Ifc17c5049e3ed29c8dd130339207907b00433960
Depends-On: https://review.opendev.org/661785/
Change-Id: I5e0e88d7b2dd1a69117ab03e0e66851c687606da
Some of the methods for retrieving data about instances were placed
at the bottom of nova_helper instead of being close to the other
instance based methods.
Change-Id: I68475883529610e514aa82f1881105ab0cf24ec3
This does two things:
1. Rather than make an API call per server on the host,
get all of the servers in a single API call by
filtering on the host. Using the os-hypervisors API results
makes this require a bit of refactoring since
get_compute_node_by_name does not have the service
entry in it and get_compute_node_by_id does not have the
servers entry in it. A TODO is added to clean that up
with a single call to os-hypervisors once we have the
support in python-novaclient.
2. Pulls get_node_by_uuid() out of the loop.
A test is added for the nova_helper get_instance_list method
since one did not exist before.
The fake compute node mocks in test_nova_cdmc_execute are
also cleaned up since, as noted above, get_compute_node_by_name
and get_compute_node_by_id don't both return all the details.
Change-Id: Ifd9f83c2f399d4c1765b0c520f4d5a62ad0f5fbd
Fix the list of required metrics from a datasource when testing the
existence of this metric in the metric map.
Change-Id: I19b7408a98893bc942c32edb09f1b3798ec8dc79
With change Id34938c7bb8a5ca934d997e52cac3b365414c006
we require nova API version 2.56 or greater so we can
remove the compatibility check in the
watcher_non_live_migrate_instance method.
The _check_nova_api_version method is left in place
for future compatibility checks.
Change-Id: I69040fbc13b03d90b9687c0d11104d4a5bae51d3
The [nova_client]/api_version defaults to 2.56 since
change Idd6ebc94f81ad5d65256c80885f2addc1aaeaae1. There
is compatibility code for that change but if 2.56 is
not available watcher_non_live_migrate_instance will
still fail if a destination host is used.
Since 2.56 has been available since the Queens version of
nova it should be reasonable to require at least that
version of nova is running for using Watcher.
This adds code which enforces the minimum version along
with a release note and "watcher-status upgrade check"
check method.
Note that it's kind of weird for watcher to have a config
option like nova_client.api_version since compute API
microversions are per API request even though novaclient
is constructed with the single configured version. It should
really be something the client (watcher in this case) determines
using version discovery and gracefully enables features if
the required nova API version is available, but that's a bigger
change.
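A minimal sketch of enforcing a minimum microversion. The function
names and the use of ValueError are assumptions for illustration.
Microversions are compared as integer tuples because comparing them
as strings or floats would rank 2.9 above 2.56.

```python
MIN_NOVA_API_VERSION = '2.56'


def parse_microversion(version):
    """Turn 'X.Y' into a tuple of ints so comparison is numeric."""
    major, minor = version.split('.')
    return int(major), int(minor)


def check_nova_api_version(configured):
    """Raise if the configured version is below the required minimum."""
    if parse_microversion(configured) < parse_microversion(
            MIN_NOVA_API_VERSION):
        raise ValueError(
            'nova API version %s is lower than the required %s'
            % (configured, MIN_NOVA_API_VERSION))


check_nova_api_version('2.60')  # accepted, no exception
```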
Change-Id: Id34938c7bb8a5ca934d997e52cac3b365414c006
MetricNotAvailable and NoDatasourceAvailable make it possible to differentiate
between having no datasources configured and a required metric being
unavailable from the datasource. Both exceptions have comments so
that the use case is clear.
The input validation of the get_backend method in the datasource
manager is improved.
Additional logging information makes it possible to identify which metric caused
the available datasource to be discarded.
Tests are updated to validate the correct functionality of the new
exceptions.
Change-Id: I512976cce2401dbcd249d42686b78843e111a0e7
Changes to the baseclass for datasources so strategies can be made
compatible with every datasource. Baseclass methods clearly describe
expected values and types for both parameters and for method returns.
query_retry has been added as base method since every current
datasource implements it.
Ceilometer is updated to work with the new baseclass. Several methods
which are not part of the baseclass and are not used by any strategies
are removed. The signature of these methods would have to be changed
to fit with the new base class while it would limit strategies to
only work with Ceilometer.
Gnocchi is updated to work with the new baseclass.
Gnocchi and Ceilometer will perform a transformation for the
host_airflow metric as it retrieves 1/10th of the actual CFM.
Monasca is updated to work with the new baseclass.
FakeMetrics for Gnocchi, Monasca and Ceilometer are updated to work
with the new method signatures of the baseclass.
FakeClusterAndMetrics for Ceilometer and Gnocchi are updated to work
with the new method signatures of the baseclass.
The strategies workload_balance, vm_workload_consolidation,
workload_stabilization, basic_consolidation, noisy_neighbour,
outlet_temp_control and uniform_airflow are updated to work with the
new datasource baseclass.
This patch will break compatibility with plugin strategies and
datasources due to the changes in signatures.
Depends-on: I7aa52a9b82f4aa849f2378d4d1c03453e45c0c78
Change-Id: Ie30ca3dbf01062cbb20d3be5d514ec6b5155cd7c
Implements: blueprint formal-datasource-interface
As a follow up to the recent test improvements for Ceilometer this
patch ensures that the same test pattern is used for Gnocchi and
Monasca as well. This ensures that the mocked functions will be called
with matching signatures.
Change-Id: Ic14a4c087f3961a4b4f373e2e3d792aba71868f6
Override the metric map of each datasource as soon as it is created by
the manager. This override comes from a file whose path is provided by
a setting in config file.
Loading at creation time allows the correct datasource to be used when
get_backend is called. This allows loading a datasource whose metric
names get updated outside of watcher's codebase.
The function 'load_metric_map' returns an empty dict in any error case.
Also, when the file is empty, safe_load is unable to find any
yaml documents and will return None. [1]
Some minor refactoring in the test_manager file for readability and
added tests for file load and metric override.
1 - https://pyyaml.org/wiki/PyYAMLDocumentation
Change-Id: I1df16245f4c7dfd34066f3ab0553cd67154faa58
Implements: blueprint file-based-metric-map
Some users may want to create keystoneclient by specifying the
type of endpoint and region name, so we need to supply the option
for user to choose.
Implements: blueprint support-keystoneclient-option
Change-Id: I49b33a69ec99d2a91568ce27ef89dc80b75e7091
Change I25b4cb0e1b85379ff0c4da9d0c1474380d75ce3a in
Queens refactored the statistic_aggregation method
and renamed the "aggregate" kwarg to "aggregation",
presumably to match the signature of the GnocchiHelper
statistic_aggregation method (the commit message does
not give details) so a base method could be added to
the parent class for all datasource helpers.
As a result, the CeilometerHelper calls to its
statistic_aggregation started passing the new
"granularity" kwarg but failed to match the rename
to the "aggregation" kwarg, which breaks the
CeilometerHelper. This was missed by the unit tests
because the tests were just asserting the erroneous
call that the runtime code made.
This change fixes the kwarg typo and makes the
tests more robust by using the mock spec kwarg
to define a spec for the statistic_aggregation
mock so that it must be called with the correct
parameters defined in the method signature. The
test is refactored to reduce duplicate mocking.
The same test hardening can and should be done
in the gnocchi and monasca helper tests but that
should be done in a separate change.
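The kind of hardening described here can be illustrated with
mock.create_autospec, which is one way to give a mock the real
signature; the change itself may set the spec differently. The Helper
class and its parameter list are illustrative, with only the
"aggregation" and "granularity" names taken from the text above.

```python
from unittest import mock


class Helper(object):
    """Stand-in for a datasource helper, not the real CeilometerHelper."""

    def statistic_aggregation(self, resource_id=None, meter_name=None,
                              period=300, granularity=300,
                              aggregation='mean'):
        return 0.0


# create_autospec copies the real signature onto the mock.
spec_mock = mock.create_autospec(Helper().statistic_aggregation)

# Correct keyword: accepted and recorded.
spec_mock(meter_name='cpu_util', aggregation='mean')

# Old keyword "aggregate": rejected just like the real method would.
try:
    spec_mock(meter_name='cpu_util', aggregate='mean')
    rejected = False
except TypeError:
    rejected = True
```

A plain MagicMock would have accepted both calls, which is how a test
can end up asserting an erroneous call.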
Co-Authored-By: Matt Riedemann <mriedem.os@gmail.com>
Closes-Bug: #1829542
Change-Id: Idfd099f718873d9056fdc35a97954771c9ae5762
As of change Ic4659d1f18af181203439a8bf1b38805ff34c309 the
nova CDM will not be built until an audit is performed.
Instances and services (compute hosts) can be created and
deleted before an audit is performed which will attempt
to use the notification callback function which relies
on the CDM being built already, and if not results in
an AttributeError.
This change side-steps that issue by checking to see that the
nova CDM exists before trying to call the notification
callback function.
An alternative to this is to forcefully create the nova CDM when
notifications are received before an audit, which is what happened
before change Ic4659d1f18af181203439a8bf1b38805ff34c309.
Change-Id: I16990afb82019821c443c9df26d3e515e52efa69
Closes-Bug: #1828582
_post_live_migration[1] runs on the source host and calls
post_live_migration_at_destination on the dest host which
emits the instance.live_migration_post_dest.end notification:[2]
But it is not the last notification for the live migration operation,
so we should use instance.live_migration_post.end instead of the
instance.live_migration_post_dest.end notification.
[1]daa2ac2287/nova/compute/manager.py (L6907)
[2]daa2ac2287/nova/compute/manager.py (L7035)
Change-Id: Id1e2d98f56d5a95d49e32f98d2910660b9f48ce6
The version of bandit in lower-constraints (1.4.0) does
not match the version in test-requirements (1.6.0) however
bandit is a test-only dependency and there is no test coverage
for bandit in the lower-constraints tox job target, so there
is really no good reason to have bandit in lower-constraints.
As such, this change simply removes it from lower-constraints.
Co-Authored-By: Matt Riedemann <mriedem.os@gmail.com>
Change-Id: I35f66994e9a3a334b342232587d84491542da755
Sphinx 2.0 no longer works on python 2.7, so we need to start capping
it there as well.
The errors are as follows:
Requirement(package='sphinx', location='', specifiers='!=1.6.6,!=1.6.7,>=1.6.5'
does not match "python_version>='3.4'"
Requirement(package='sphinx', location='', specifiers='!=1.6.6,!=1.6.7,>=1.6.5'
does not match "python_version=='2.7'"
Could not find a global requirements entry to match package sphinx. If the package
is already included in the global list, the name or platform markers there may not
match the local settings.
Change-Id: I6dad56ffbb9e85e36cacea1a89565c2fc8248fbf
The final Stein version of Watcher was 2.0.0
so this fixes the version mentioned in the
watcher-status man page docs.
Change-Id: I7fce35471cf31222f9cdafc35e5a7b287bc4598e
The _add_virtual_layer and _add_virtual_servers methods
have not been used since Ic4659d1f18af181203439a8bf1b38805ff34c309
in Stein so this change removes them.
Change-Id: I8c05f29c3c03aa5897cb182bb492948771c42881
This enhances the [collector]/collector_plugins
config option help text to mention the storage
and baremetal in-tree collectors and the ability
to load out-of-tree collectors via extension point.
While doing this, the help text is formatted for
prettier rst rendering in the docs.
Change-Id: Ifd32c95c664c4e9586c250e6bceaeaba2e2df417
CeilometerClient has been deprecated and is no longer available for
master. Without ceilometer client installed docs fail to build with
an exception [1].
This patch marks the import optional.
1 -
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/sphinx/config.py", line 368, in
eval_config_file
execfile_(filename, namespace)
File "/usr/lib/python2.7/site-packages/sphinx/util/pycompat.py", line
150, in execfile_
exec_(code, _globals)
File "/usr/lib/python2.7/site-packages/six.py", line 709, in exec_
exec("""exec _code_ in _globs_, _locs_""")
File "<string>", line 1, in <module>
File
"/home/abuild/rpmbuild/BUILD/python-watcher-2.1.0.dev45/doc/source/conf.py",
line 20, in <module>
objects.register_all()
File
"/home/abuild/rpmbuild/BUILD/python-watcher-2.1.0.dev45/watcher/objects/__init__.py",
line 31, in register_all
__import__('watcher.objects.action_plan')
File
"/home/abuild/rpmbuild/BUILD/python-watcher-2.1.0.dev45/watcher/objects/action_plan.py",
line 78, in <module>
from watcher import conf
File
"/home/abuild/rpmbuild/BUILD/python-watcher-2.1.0.dev45/watcher/conf/__init__.py",
line 28, in <module>
from watcher.conf import datasources
File
"/home/abuild/rpmbuild/BUILD/python-watcher-2.1.0.dev45/watcher/conf/datasources.py",
line 21, in <module>
from watcher.datasources import manager
File
"/home/abuild/rpmbuild/BUILD/python-watcher-2.1.0.dev45/watcher/datasources/manager.py",
line 19, in <module>
from watcher.datasources import ceilometer as ceil
File
"/home/abuild/rpmbuild/BUILD/python-watcher-2.1.0.dev45/watcher/datasources/ceilometer.py",
line 21, in <module>
from ceilometerclient import exc
ImportError: No module named ceilometerclient
)
Change-Id: Idcf582c2495aab39aacf691b687759405bb94dca
Currently default config files are being used for initialization of CONF from
oslo_config. However, default config dirs are not being passed. As a
result, watcher components (e.g. decision-engine) are unable to load
files from default directories (e.g. /etc/watcher/watcher.conf.d)
supported by oslo_config. This is a shortcoming on watcher's side.
It also forces the user to have multiple configs, one for each component.
Without this default set, oslo_config will search for conf with the string
'python-watcher' in it, e.g. /etc/python-watcher/..., because
project=python-watcher is set a couple of lines below.
This patch adds the option after evaluating it using project as 'watcher',
which is similar to the evaluation of default_config_files, and also allows
it to be passed in as a function parameter.
Change-Id: I013f9d03978f8716847f8d1ee6888629faf5779b
This fixes the wrong installation guide link from the
user guide which was pointing to the watcherclient docs
for some reason, maybe it was just a copy/paste error.
Change-Id: I38f536e187245523ac37d70054a2df8cdfcbe4b2
Closes-Bug: #1828584
Hard-coding watcher.openstack.common to warning level logging
only makes it hard to debug watcher's interactions with other
services, like when it's triggering and monitoring a server live
migration.
Since debug logging is controlled via the "debug" configuration
option, we can just rely on that to filter out debug logs within
watcher itself.
Note this has been this way since change
I699e0ab082657880998d8618fe29eb7f56c6c661 back in 2015 and there
was no explanation why the watcher.openstack.common logging
was set to WARN level.
Change-Id: I939403a4ae36d1aa9ea83badb9404bc37d18a1a6
Related-Bug: #1828598
The -x option for bandit changed in 1.6.0 and now
supports glob patterns so use that to correctly
exclude test code from bandit scans.
Since this change requires bandit>=1.6.0, we have
to also fix the networkx requirement to pass the
requirements-check job so that the networkx requirement
matches what is in global-requirements from change
I0a9700926c9a0db93e782c853c33f1aaee3d4876.
Change-Id: I4fc1166daee5d8739296419216d11d684be27c0a
Closes-Bug: #1828419
Allows defining a global preference for metric datasources with the
ability for strategy-specific overrides. In addition, strategies which
do not require datasources have the config options removed. This is
done to prevent confusion.
Some documentation that details the inner workings of selecting
datasources is updated.
Imports for some files in watcher/common have been changed to resolve
circular dependencies and now match the overall method to import
configuration.
Additional datasources will be retrieved by the manager if the
datasource throws an error.
Implements: blueprint global-datasource-preference
Change-Id: I6fc455b288e338c20d2c4cfec5a0c95350bebc36
To get log formatting like the other openstack projects
running in devstack the setup_logging function should be
used. This will also give us the "Display level" formatting
in the logs via the os-loganalyze package used by infra.
Needed by: https://review.opendev.org/657652
Change-Id: I5e9bd5a142e45804e8d915b370746a2142243088
Exceptions should be reraised with just "raise" and not "raise e" to
preserve the traceback. This also addresses a couple cases where the
catching and reraising of the exception was not actually doing anything.
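A small illustration of the bare "raise" idiom. The function names
are made up for the example. Under Python 2, "raise e" restarted the
traceback at the re-raise point, while a bare "raise" keeps the
original one.

```python
import traceback


def inner():
    raise ValueError('boom')


def wrapper():
    try:
        inner()
    except ValueError:
        # Bare "raise" re-raises the active exception unchanged,
        # keeping the traceback that points at inner().
        raise


try:
    wrapper()
except ValueError as exc:
    # Collect the function names recorded in the traceback.
    frames = [f.name for f in traceback.extract_tb(exc.__traceback__)]
```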
Change-Id: I94ba193f728ee7ca6f689f70fc08317a1dd50c92
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
We have a bp to add resource_name in action input parameter field.
Before doing this, one of this method's parameter should be node
instead of resource_id.
Change-Id: I4ce5ae97efce98d80a460fd6003df3cc5cacab82
This resolves problems with the audit scope, such as the scope being
ignored and the scope not merging due to a typo in .append. It also changes
update to the .add method when adding single elements to a set and makes
accessing dict keys and values as lists work in python 3.7.
All these methods from the model builder now have tests to prevent
regressions.
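Two of the pitfalls mentioned above can be shown in isolation. The
variable names and sample data are illustrative only.

```python
# set.update() iterates its argument, so a single string is split
# into characters; set.add() inserts the element as a whole.
split_chars = set()
split_chars.update('node1')

whole_name = set()
whole_name.add('node1')

# In Python 3, dict.keys() and dict.values() return views that do not
# support indexing; wrap them in list() first.
scope = {'host_aggregates': [{'id': 1}]}
first_key = list(scope.keys())[0]
first_value = list(scope.values())[0]
```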
Co-Authored-By: Canwei Li <li.canwei2@zte.com.cn>
Change-Id: I287763d5e426ff860aefabc4a1f3fe3f51accd76
Since commit I8df8921337ea3f4e751c0c822d823e64e3ca7e1c
the check for hardware.cpu.util was removed.
But it can be still used in workload stabilization.
Change-Id: I301487837aac2e1e63bce16a79d0f8136452c313
Our cgit instance will be going away and opendev.org is the new
preferred URL for browsing our git repos. Redirects will exist for the
foreseeable future, but it's more efficient to just go directly to the
new locations.
Change-Id: I7dd9d454da63167832bab02c89be98a2ce03b72a
Now there is only one scheduler for launching audit tasks and
executing audit jobs. We have found a case where the scheduler
stops for some reason when executing an audit.
To keep audit tasks launching normally, we need to split it into
two schedulers.
Change-Id: I45dccaf062290cfc7d7fcfc27fe11d6f87f38afa
When the lower-constraints tox target was added, it was assumed the
install_command was just running the install and that the dependencies
and constraints were being set using "deps = ".
This fixes the install_command and deps to follow the expected pattern
so the lower-constraints job actually does install the lower constraints.
This also raises the oslo.context minimum as
Ic96c1f1e1a80099d9dafa95a014fc47f05b88e42 added a dependency on a newer
versions kwarg.
Depends-On: https://review.openstack.org/#/c/647726/
Change-Id: I4cc2c3ac158a607b22295c50f83896969a4007ee
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
When running tox -e lower-constraints, we hit these errors:
1. line 417, in test_clients_ironic:
b"AssertionError: Expected call: Client('1', endpoint_override...
b"Actual call: Client('1', 'http://localhost:6385/'...
2. line 39, in test_wrong_major_version:
b"KeyError: 'HTTP_ACCEPT'"
3. RUN END RESULT_TIMED_OUT:
[untrusted : git.openstack.org/openstack-infra/
zuul-jobs/playbooks/tox/run.yaml@master]
For the first error, the reason is that the unittest for the
ironicclient is too strict and must be adapted to the latest code.
In fact, the watcher can use the previous ironicclient version.
Therefore, we modified the unittest so that the watcher does not
have to rely on the latest ironicclient version.
For the second error, the reason is that we need to update the minimum
version of pecan and WebOb.
For the third error, the reason is that the version of the oslo_messaging
is too low.
Change-Id: Icb3eda3d27fa4452e13e2dcd3c016cc76fc2c7c7
Metrics for datasources now match the name of their corresponding
abstract methods. This ensures that developers know how the method
is named if they know the name of the metric and vice versa.
Change-Id: I0f9d400432d8182b3f10a0da97155e6cb786690e
This is a mechanically generated change to replace openstack.org
git:// URLs with https:// equivalents.
This is in aid of a planned future move of the git hosting
infrastructure to a self-hosted instance of gitea (https://gitea.io),
which does not support the git wire protocol at this stage.
This update should result in no functional change.
For more information see the thread at
http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003825.html
Change-Id: I3452a7802dde00d8be32c833d714b2974be58e16
Add file to the reno documentation build to show release notes for
stable/stein.
Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/stein.
Change-Id: I25931207ed6066f905fe66ca504fa230e40d12dc
Sem-Ver: feature
Although this method does not report an error, this type check 'int'
is redundant and may be misleading.
Reference code url: https://github.com/openstack/wsme/blob/master/wsme/api.py
Change-Id: I631b5f9901790666e7f20275e8c8b99f06f06f0a
Many strategies execute very similar statements, especially in
pre_execute, and some might raise errors that others might not. This
same pattern of many similar statements can also be observed in
the strategies' tests.
This patch addresses these issues. Firstly, the BaseStrategy class gets
1 additional method _pre_execute which allows for general logic that
most strategies perform at that stage. This method can be executed
before the similarly named method of the superclass. A notable change
is that _pre_execute now handles common exception handling for
ClusterStateStale & ClusterStateNotDefined exceptions.
A similar pattern is applied to the test classes of the strategies
each of these classes now inherits from the TestBaseStrategy class.
This class provides the common attributes almost every test class for
the strategies requires such as: The mocked compute_model, mocked
audit_scope and an instance of FakerModelCollector.
Finally, some minor changes were required in test_strategy_context
& test_audit_handlers and exceptions around 0 nodes in cluster or
storage are removed.
Change-Id: Ia7154376b2448aac65cf17999cc8c3e1c8309b5b
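The shared _pre_execute step described in the commit above can be pictured roughly as follows. This is a minimal sketch, not the patch itself: the exception classes are local stand-ins for the Watcher exceptions of the same name, and `FakeModel`, `DummyStrategy` and the `stale` attribute are assumptions made for illustration.

```python
class ClusterStateNotDefined(Exception):
    """Stand-in for the Watcher exception of the same name."""


class ClusterStateStale(Exception):
    """Stand-in for the Watcher exception of the same name."""


class FakeModel:
    """Hypothetical cluster data model carrying only a staleness flag."""

    def __init__(self, stale=False):
        self.stale = stale


class BaseStrategy:
    def __init__(self, compute_model):
        self.compute_model = compute_model

    def _pre_execute(self):
        """Checks that most strategies share before they run."""
        if self.compute_model is None:
            raise ClusterStateNotDefined()
        if self.compute_model.stale:
            raise ClusterStateStale()


class DummyStrategy(BaseStrategy):
    def pre_execute(self):
        # Run the shared checks first, then any strategy-specific setup.
        self._pre_execute()
        return "ready"
```

With this layout, a strategy only calls `self._pre_execute()` and no longer repeats the two checks in its own pre_execute.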
This patch adds a scope to the datamodel, which only gets the VMs
of the specified nodes, and no longer gets all VMs from nova.
Implements: blueprint scope-for-watcher-datamodel
Change-Id: Ic4659d1f18af181203439a8bf1b38805ff34c309
An audit will only fail if an exception occurred.
The situation where no solution is found will not cause the audit to fail.
Change-Id: Ib9c3c3505f31c14500926ec13aa865dc8f7aa310
On ImportError, set HAS_CEILCLIENT to False.
Without this, none of the Watcher components can be started on master
or rocky, because ceilometerclient was deprecated.
Using this variable, support for ceilometer can be gradually removed
from master.
A backport to rocky will allow using Watcher without ceilometerclient.
Change-Id: I3beb0fb8f0a8e8e0a22acaf6bdeca492836bbee2
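The HAS_CEILCLIENT guard mentioned in the commit above follows the usual optional-import pattern. A minimal sketch, assuming the flag lives at module level; the helper `ceilometer_supported` is hypothetical and not part of the patch.

```python
# Optional-dependency guard: record whether ceilometerclient is importable
# instead of letting the ImportError stop the service from starting.
try:
    import ceilometerclient  # noqa: F401
    HAS_CEILCLIENT = True
except ImportError:
    HAS_CEILCLIENT = False


def ceilometer_supported():
    """Return True only when the optional client library is installed."""
    return HAS_CEILCLIENT
```

Code paths that need Ceilometer can then check the flag before touching the client, so the services still start when the library is absent.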
Moved the metric mappings for Ceilometer, Gnocchi & Monasca out of
base.py. The datasources manager now uses the NAME attribute of the
classes extending base.py as the key in the dictionary of all available
mappings and datasources. base.py still contains a template
definition of all available mappings so that anyone extending the
base class can identify all the possible endpoints they can map to.
Change-Id: I6a826423031b5a6a60c4cd5fe24f74b8400f6b55
Closes-Bug: #1815769
Prevent workload_stabilization strategy from failing in a network with
0 hosts.
Change-Id: I9f1a9524923c14d958eb50a70dad379a6021b884
Closes-Bug: #1815059
Small cleanups:
* Use openstack-lower-constraints-jobs template, remove individual
jobs.
* Sort list of templates
Change-Id: I63bfcd9bc21011b446fd1c54cb64c5568c601687
Needed-By: https://review.openstack.org/623229
This patch set removes the "observable" and "synchronization"
modules because they aren't used by any Watcher modules so far.
Change-Id: If23cdf0d3d09087919d48f50ab38b0d355c36481
Ceilometer Datasource has been deprecated since its API has been
deprecated in Ocata cycle. Watcher has supported Ceilometer for some
releases after Ocata to let users migrate to Gnocchi/Monasca datasources.
Since U-release, Ceilometer support will be removed.
Change-Id: I944a5a562ab09a36961eb9b75e9a5144ba0b9ca4
file host_maintenance.py
This is to fix spelling error and unsuitable punctuation
in file host_maintenance.py
Change-Id: I9c535059c3a02277be4c7329693db34fb7674b4e
The bare metal cluster data model was introduced in the Queens cycle.
Since the model is different from the compute data model, we
need to add a CDM scoper for the bare metal cluster data model.
Change-Id: Idd041cefb692085d4545252d229ebe8602926b58
Implements: blueprint audit-scoper-for-baremetal-data-model
vm_workload_consolidation.py
Extend the tests of the execute method, which contains
the pre_execute(), do_execute() and post_execute() methods.
Increase coverage from 82% to 87%.
Change-Id: Ibde67d7b7d7945657ad0b674e06b1edc9eb24a9f
When tls-proxy is enabled, first start the tls-proxy and then wait
for the API to come up.
Without this, the API comes up on the internal port and, as a result,
the subsequent curl fails, killing the deployment.
- create a zuul job to test with tls
- fix apache ports accordingly
Depends-On: Ie665240b53df92b8e5ca509e998e95d859bd5282
Change-Id: I610a7a24daab68c7ab0e30977e3cabd62cdb56a5
Actually, the metrics "cpu_util" and "memory.resident" are required
in vm_workload_consolidation.py, according to line 75. So update
the documentation for this part accordingly.
Change-Id: I648f341184a0b42d88dcb4c934af989997fe3fea
This patch updates response header
from OpenStack-API-Version: [VERSION_STRING]
to OpenStack-API-Version: [SERVICE_TYPE] [VERSION_STRING]
Change-Id: I10577ff1123ef781bd4aa0b26577574a3f7e9c39
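The header format change in the commit above can be illustrated with two small helpers. This is a sketch only: the function names are hypothetical, and the service type `infra-optim` and version `1.0` are illustrative values that do not come from the patch.

```python
def format_api_version_header(service_type, version):
    """Build the value in the '[SERVICE_TYPE] [VERSION_STRING]' form."""
    return "{} {}".format(service_type, version)


def parse_api_version_header(value):
    """Split a header value into (service_type, version_string)."""
    service_type, version = value.split(" ", 1)
    return service_type, version


# Illustrative values, assumed for this example.
header = format_api_version_header("infra-optim", "1.0")
print("OpenStack-API-Version:", header)
```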
Add a new config option 'action_execution_rule', which is a dict type.
Its key is the strategy name and its value is 'ALWAYS' or 'ANY'.
'ALWAYS' means the callback function returns True as usual.
'ANY' means the return value depends on the result of the previous
action execution. The callback returns True if the previous action
failed, and the engine continues to run the next action. If the previous
action succeeded, the callback returns False and the next action
is ignored.
For strategies that aren't in 'action_execution_rule', the callback
always returns True.
If an exception is thrown during action execution, taskflow triggers
a revert. To continue executing the next action, we return False
instead of throwing an exception.
Change-Id: Ib5afa214d8d097d739aad35d18b3fe5c8e4de8fc
Implements: blueprint enhance-watcher-applier-engine
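The decision rule that the commit above describes for 'action_execution_rule' can be sketched as a small function. The function name, its parameters and the strategy names in the sample dict are hypothetical; only the 'ALWAYS' and 'ANY' semantics are taken from the commit message.

```python
def should_run_next_action(strategy_name, previous_succeeded, rules):
    """Return True if the engine should run the next action.

    rules maps a strategy name to 'ALWAYS' or 'ANY'. Strategies missing
    from the mapping behave like 'ALWAYS'.
    """
    rule = rules.get(strategy_name, "ALWAYS")
    if rule == "ANY":
        # Continue only while the previous action failed; one success
        # is enough, so the remaining actions are skipped.
        return not previous_succeeded
    return True


# Hypothetical strategy names, used only to exercise the rule.
rules = {"strategy_a": "ANY", "strategy_b": "ALWAYS"}
print(should_run_next_action("strategy_a", True, rules))
print(should_run_next_action("strategy_a", False, rules))
```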
With these brackets, the statement 'raise Exception("Live migration
execution.....")' in watcher/common/nova_helper.py, line 379, will never
be executed. So remove the brackets so that the statement makes sense.
Change-Id: I42a2fa0c8ffa9c84a918d432c5093470dbd80f82
The method is quite simple and it doesn't need a docstring.
Also, the existing docstring was incorrect. The name of the expected
parameter is 'name', not 'node', and it cannot be an object
of the type node.StorageNode.
Change-Id: I94124d327c490d45eae4d2ded218beadfbc33ad7
The commands used by constraints need at least tox 2.0.
Update to reflect reality, which should help with local running of
constraints targets.
Change-Id: I0eb9af735f34ad259c7099729d7d465a1276fc5f
The correct type of parameter 'pool' in method build_storage_pool is
<class 'cinderclient.v2.pools.Pool'>
Change-Id: I986f707e4e740ebec94a46c6ee413f9a70197dad
This patch set adds API microversion support along
with the first API microversion: start/end time for
CONTINUOUS audits.
APIImpact
Implements: blueprint api-microversioning
Depends-On: I6bb838d777b2c7aa799a70485980e5dc87838456
Change-Id: I17309d80b637f02bc5e6d33294472e02add88f86
Now that we have removed nova legacy notifications in Watcher
and only consume nova versioned notifications,
we don't need the notification config in nova.conf.
Change-Id: I1c9c141d98d858c36ad8bb7be0b95c38ff1d5752
This commit adds the functionality of watcher-status CLI for performing
upgrade checks as part of the Stein cycle upgrade-checkers goal.
It only includes a sample check, which must be replaced by real checks in
the future.
Change-Id: Ic3d066af439797d6f705e805334f729b52ce3aac
Story: 2003657
Task: 26164
Add new start_time and end_time fields in the audit table
Partially Implements: blueprint add-start-end-time-for-continuous-audit
Change-Id: I6bb838d777b2c7aa799a70485980e5dc87838456
As the rpc_backend config option has been removed from
oslo_config [1], projects should not use it.
Current uses of it cause Watcher to crash when installing
via devstack.
[1] https://review.openstack.org/#/c/580910/
Change-Id: Iba7471e87e8935f1ea02b363f269e9debdc7cc71
Quotes around {posargs} cause the entire string to be combined into one
arg that gets passed to stestr. This prevents passing multiple args
(e.g. '--concurrency=16 some-regex')
Change-Id: I0371fc2c0878a177c0a9e9c9313ca5b8470bfd98
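The effect of the quoting described in the commit above can be reproduced with shell-style word splitting. This is an analogy using Python's `shlex` module rather than tox itself, and the `stestr run` command line is assumed here for illustration.

```python
import shlex

# The arguments a developer might pass after "--" on the tox command line.
posargs = "--concurrency=16 some-regex"

# Quoted substitution: everything collapses into a single argument.
quoted = shlex.split('stestr run "{}"'.format(posargs))

# Unquoted substitution: each word becomes its own argument.
unquoted = shlex.split("stestr run {}".format(posargs))

print(quoted)    # ['stestr', 'run', '--concurrency=16 some-regex']
print(unquoted)  # ['stestr', 'run', '--concurrency=16', 'some-regex']
```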
This patch set fixes the process of audit creation and
allows creating an audit without an Audit Template, using only
the names of the Goal and Strategy. It also provides some additional
unit tests to improve test coverage.
Change-Id: I89a9c7661616f49639151869055d8f5ebe723d5f
Closes-Bug: #1794233
This patch set adds efficacy indicators for workload_balancing
goal (that includes workload_stabilization and workload_balance
strategies so far).
Change-Id: I5b04d084ace7c661001c62f07b8308e5763e144d
oslo_context may add new fields to the request context; there is no
need to warn about these fields.
Closes-Bug: #1790577
Change-Id: Ic96c1f1e1a80099d9dafa95a014fc47f05b88e42
This is a mechanically generated patch to complete step 1 of moving
the zuul job settings out of project-config and into each project
repository.
Because there will be a separate patch on each branch, the branch
specifiers for branch-specific jobs have been removed.
Because this patch is generated by a script, there may be some
cosmetic changes to the layout of the YAML file(s) as the contents are
normalized.
See the python3-first goal document for details:
https://governance.openstack.org/tc/goals/stein/python3-first.html
Change-Id: I5e75f2ea7dd02065bc18793d974f56fef2daa2c4
Story: #2002586
Task: #24344
As the OpenStack installation guides suggest running mysql as the root
shell user, mysql will not ask for a password, so in
controller-install-*.rst "-u root -p" is useless.
Change-Id: I511f39d734702ab3d1a209f7d868f52fb184f1fc
Related-Bug: #1785025
This patch set refactors logs of workload stabilization
strategy to make them more readable and sensible.
Change-Id: I408988712bb7560728157f3b4e4f2b37572128c4
Having both Prometheus and Aetos datasources configured at the same time
is not supported and will result in a configuration error. Allowing this
can be investigated in the future if a need or a proper use case is
identified.
The watcher.conf configuration file is also used to set the parameter values
required by the Watcher Aetos data source. The configuration can be
added under the ``[aetos_client]`` section and the available options are
duplicated below from the code as they are self documenting:

.. code-block::

    cfg.StrOpt('interface',
               default='public',
               choices=['internal', 'public', 'admin'],
               help="Type of endpoint to use in keystoneclient."),
    cfg.StrOpt('region_name',
               help="Region in Identity service catalog to use for "
                    "communication with the OpenStack service."),
    cfg.StrOpt('fqdn_label',
               default='fqdn',
               help="The label that Prometheus uses to store the fqdn of "
                    "exporters. Defaults to 'fqdn'."),
    cfg.StrOpt('instance_uuid_label',
               default='resource',
               help="The label that Prometheus uses to store the uuid of "
                    "OpenStack instances. Defaults to 'resource'."),

Authentication and Service Discovery
------------------------------------
Unlike the Prometheus datasource which requires explicit host and port
configuration, the Aetos datasource uses Keystone service discovery to
automatically locate the Aetos endpoint. The datasource:
1. Uses the configured Keystone credentials to authenticate
2. Searches the service catalog for a service with type 'metric-storage'
3. Uses the discovered endpoint URL to connect to Aetos
4. Attaches a Keystone token to each request for authentication
If the Aetos service is not registered in Keystone, the datasource will
fail to initialize and prevent the decision engine from starting.
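
The catalog lookup in step 2 above can be sketched in plain Python. This is only an illustration of the search, not the datasource's implementation, which relies on the Keystone client libraries: the function name, the sample catalog and its URLs are hypothetical, and the entry layout assumes the Keystone v3 catalog format.

```python
def find_endpoint(catalog, service_type="metric-storage",
                  interface="public", region_name=None):
    """Return the first endpoint URL matching type, interface and region."""
    for service in catalog:
        if service.get("type") != service_type:
            continue
        for endpoint in service.get("endpoints", []):
            if endpoint.get("interface") != interface:
                continue
            if region_name and endpoint.get("region") != region_name:
                continue
            return endpoint["url"]
    raise LookupError("no %s endpoint in the catalog" % service_type)


# Hypothetical catalog with made-up URLs.
SAMPLE_CATALOG = [
    {"type": "compute", "endpoints": [
        {"interface": "public", "region": "RegionOne",
         "url": "http://controller.example/compute"}]},
    {"type": "metric-storage", "endpoints": [
        {"interface": "internal", "region": "RegionOne",
         "url": "http://aetos-internal.example/prometheus"},
        {"interface": "public", "region": "RegionOne",
         "url": "http://aetos.example/prometheus"}]},
]

print(find_endpoint(SAMPLE_CATALOG, region_name="RegionOne"))
```

An empty or incomplete catalog raises `LookupError` in this sketch, which mirrors the initialization failure described above when the service is not registered.
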
So a sample watcher.conf configured to use the Aetos datasource would look
like the following:

.. code-block::

    [watcher_datasources]
    datasources = aetos

    [aetos_client]
    interface = public
    region_name = RegionOne
    fqdn_label = fqdn