Compare commits

315 Commits

Author SHA1 Message Date
jgilaber
03073a1b0d Remove outdated zone migration documentation note
In a recent patch [1], a bug in the zone migration strategy was fixed,
which prevented audits using this strategy from creating action plans
with both instance and volume migrations. We documented this limitation,
but forgot to remove the note when fixing this bug.

[1] https://review.opendev.org/c/openstack/watcher/+/952115

Change-Id: I2074f2b911dfcbf44716ff30d8ea35a5046b8520
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-10-01 17:02:40 +02:00
Ronelle Landy
ced0d58d23 Remove last column from strategies table
Removed the "Can Be Triggered from Horizon (UI)"
column and adjusted remaining column widths to
be equal.

Assisted-By: claude-sonnet-4 (Claude Code)
Signed-off-by: Ronelle Landy <rlandy@redhat.com>
Change-Id: I50eef1dee9071eeb532378bd5abcd1d994d299b5
2025-09-26 15:17:56 -04:00
Chandan Kumar (raukadah)
3c8bc6be62 Add user guide for continuous audits
Introduce a new user guide describing how to run continuous audits using
the dummy strategy. The guide covers:
- Overview and state machine
- Creating audits with interval and cron expressions
- Time window constraints (start/end time)
- Monitoring executions and action plan lifecycle
- Managing audits (stop/modify)
- Configuration reference and links to related specs

Closes-Bug: #2120437

Assisted-By: GPT-5 (Cursor)
Assisted-By: claude-sonnet-4 (Claude Code)
Change-Id: I842139271752cedb138e422027020488f22fe248
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-09-23 18:42:30 +05:30
Zuul
635be7a009 Merge "Enable Continuous Audit tests in CI" 2025-09-22 13:30:10 +00:00
Zuul
fe50d270c3 Merge "Resolve deprecation warning from pecan" 2025-09-19 10:50:57 +00:00
Zuul
27961d8574 Merge "Add missing 1.6 API doc in rest version history" 2025-09-17 20:37:00 +00:00
Douglas Viroel
408abaee49 Enable Continuous Audit tests in CI
Scenario continuous audit tests are being added
but will not run by default, since not all stable
branches have the zone_migration fixes needed to
make tests stable.

Depends-On: https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/954264

Change-Id: I5c49b251a49ee439bad024a1cf2569fcbeb2eaf1
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-09-17 15:22:16 -03:00
Douglas Viroel
ed0f7457fb Add missing 1.6 API doc in rest version history
The version history was not updated in the patch that
bumped the API to 1.6[1]. This patch adds the missing doc
and also sets 1.6 as the maximum API version for the latest release.

[1] https://review.opendev.org/c/openstack/watcher/+/955827

Closes-Bug: #2124938

Change-Id: I62473e84415896387fda8ca6d0982f78d2a1a9f1
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-09-17 11:53:31 -03:00
Douglas Viroel
680518ad6d Fix zone migration instance not found issue
When retrieving the list of instances and volumes to propose a
solution, the zone migration strategy can raise an exception for
instance or volume not found, which will make the audit go to a
failure state. This fix maintains the logic of listing all elements
directly from the client (nova) but now checks if the instance
is already in the model. The storage model check was already fixed
in another patch[1].

[1] cb6fb16097

Closes-Bug: #2098984
Assisted-By: Cursor (claude-3.5-sonnet)

Change-Id: I4c8993f051b797104172047eaae1fe1523eaf7eb
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-09-16 16:12:35 -03:00
Douglas Viroel
cada9acced Add unit tests for instance and volume not found in model
The Zone Migration strategy was implemented to list all
instances and volumes from clients (nova and cinder) and
check if they exist in the models. But the code is not
properly treating model exceptions, taking audit to a failure
state when the model doesn't have the requested element.
This patch adds unit tests to validate this scenario, which
should be fixed in a follow up change.
The additional check for volumes in the model was recently
added in [1]

[1] cb6fb16097

Related-Bug: #2098984

Assisted-By: Cursor (claude-3.5-sonnet)

Change-Id: Icf1e5d4c83862c848d11dae994842ad0ee62ba12
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-09-16 15:56:13 -03:00
jgilaber
8211475478 Improve unit tests for zone migration strategy
The unit tests were mocking part of the Zone Migration strategy class,
which could hide possible bugs. This patch removes this mocking, leaving
mocked only other classes that are used by the zone migration one.

Additionally, it includes improved suggestions as follow-up from the
review of previous patches, like more explicit comments and additional
asserts of mocked functions.

Assisted-By: Cursor (Claude-4-sonnet)
Change-Id: Ie1894311b0e384ab52b1b3dfe0eb50618eef6c9f
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-16 12:57:54 +02:00
jgilaber
c2ad4b28da Support zone migration audit without compute_nodes
When only running volume migrations, a zone migration
strategy audit without setting compute_nodes should work.

Before this change, an audit with defined storage_pools,
no compute_nodes parameters, and with_attached_volume is set to True
would trigger the migration of the instances attached to the volumes
being migrated.

This patch decouples instance and volume migrations unless the user
explicitly asks for both. When migrating attached volumes, the zone
migration strategy will check for which instances should be migrated
according to the audit parameters, and if the instance the volume is
attached to can be migrated, it will be just after the volume.

On the other hand, when the attached instances should not be migrated
according to user input, only the volumes will be migrated.

In an audit that migrates instances but not volumes, the
with_attached_volume parameter will continue doing nothing.

Closes-Bug: 2111429
Change-Id: If641af77ba368946398f9860c537a639d1053f69
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-16 12:20:18 +02:00
Alfredo Moralejo
296856101f Allow volume and vm migrations in zone_migration
Currently, when an audit with strategy zone_migration has added at least
one volume_migration action, it will not process the instances
migrations according to the definition of the `compute_nodes` parameter.
This behavior is unexpected according to the documentation of the
strategy.

This patch is fixing that behavior and making sure that no duplicated
actions are added to the solution, to handle the case where instance
migration actions are created when analyzing the volumes if the
`with_attached_volume` parameter is enabled. The patch is also removing
the method `instances_no_attached`, which is no longer used.

Finally, it's adding some unit tests for the new method and fixing the
ones to cover the mixed instances and volumes migration situation.

Closes-Bug: #2109722
Change-Id: Ief7386ab448c2711d0d8a94a77fa9ba189c8b7d2
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-16 12:20:18 +02:00
Alfredo Moralejo
2f2134fc7a Add test for zone_migration with instances and volumes
Currently, unit tests for zone_migration strategy do not include any
test for instances and volumes mixed, which is currently not working as
expected.

This patch is adding two new tests which include both compute_nodes and
storage_pools in audit configuration. One of them is also setting
with_attached_volume option.

These tests will be fixed to validate the expected behavior of the
strategy in the fixing patch.

Related-Bug: #2109722
Change-Id: I496ce3e1f21b7a4165aa47d5862cf0497be79487
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-16 12:20:18 +02:00
jgilaber
cb6fb16097 Use src_type to filter volumes in zone migration
Despite having the src_type parameter for the storage_pool dictionary as
a mandatory parameter, the value is not being used to filter the volumes
to migrate, using only 'src_pool'.

This change makes 'src_type' optional. Since it was ignored until this
point, making it optional keeps the same behaviour by default. If
'src_type' is in the audit parameters, the strategy uses both 'src_pool' and
'src_type' to filter the volumes to migrate.

Closes-Bug: 2111507
Change-Id: Id83a96de85ada1ae6c0e25f8b7fcf54034604911
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-16 12:20:18 +02:00
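The filtering rule described in cb6fb16097 can be sketched as follows. This is only an illustration under stated assumptions: the `Volume` class and the `filter_volumes` function are hypothetical stand-ins, not the names used in the Watcher strategy code.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical stand-in for a volume element in the storage model."""
    name: str
    pool: str
    volume_type: str


def filter_volumes(volumes, src_pool, src_type=None):
    """Select volumes in src_pool, also matching src_type when it is given."""
    selected = []
    for vol in volumes:
        if vol.pool != src_pool:
            continue
        # src_type is optional: when it is omitted only the pool is checked,
        # which keeps the behaviour from before the change.
        if src_type is not None and vol.volume_type != src_type:
            continue
        selected.append(vol)
    return selected


volumes = [
    Volume("vol1", "pool-a", "fast"),
    Volume("vol2", "pool-a", "slow"),
    Volume("vol3", "pool-b", "fast"),
]
by_pool = [v.name for v in filter_volumes(volumes, "pool-a")]
by_pool_and_type = [v.name for v in filter_volumes(volumes, "pool-a", "fast")]
```

With only `src_pool` set, both volumes in `pool-a` are selected; adding `src_type` narrows the selection to the matching type.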
Zuul
cee72d2bda Merge "Fix missing CORS middleware" 2025-09-15 17:00:56 +00:00
Zuul
cd1154d09c Merge "Add capability to parse forward headers" 2025-09-15 16:23:46 +00:00
Zuul
90f6552c74 Merge "Fix missing X-OpenStack-Request-ID header" 2025-09-15 15:56:03 +00:00
OpenStack Release Bot
0368cea4c1 Update master for stable/2025.2
Add file to the reno documentation build to show release notes for
stable/2025.2.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.2.

Sem-Ver: feature
Change-Id: I21fd5f9a613e5e2ee81ae4fe34165f3f4a6ae479
Signed-off-by: OpenStack Release Bot <infra-root@openstack.org>
Generated-By: openstack/project-config:roles/copy-release-tools-scripts/files/release-tools/add_release_note_page.sh
2025-09-15 10:13:29 +00:00
Takashi Kajinami
e1c8961a7c Fix missing CORS middleware
CORS middleware needs to be added to the api pipeline to support
Cross-Origin Resource Sharing (CORS). CORS is supported globally by
multiple OpenStack services but not by watcher, due to the lack of
CORS middleware and of a mechanism to inject it into the api pipeline.

Closes-Bug: #2122347
Change-Id: I6b47abe4f08dc257e9156b254fa60005b82898d7
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-09-13 11:49:11 +09:00
Zuul
61cca16dcd Merge "Handle missing dst_pool parameter in zone_migration" 2025-09-11 20:57:34 +00:00
Zuul
f3d0ec5869 Merge "Enable storage model collector by default" 2025-09-11 19:36:22 +00:00
Takashi Kajinami
17a4c96c66 Add capability to parse forward headers
In case standalone watcher-api runs behind forwarders (like load
balancers), it should parse specific request headers to determine
the endpoint url clients actually use.

Add http_proxy_to_wsgi middleware to api pipeline to handle this.

Closes-Bug: #2122353
Change-Id: I27ade17f7ce1649295f92f3ea1af620df63ba1bc
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-09-11 15:50:04 +00:00
Takashi Kajinami
a562880b1c Fix missing X-OpenStack-Request-ID header
Request ID is essential in operating OpenStack services, especially
when troubleshooting some API problems. It allows us to find out
the log lines actually related to a specific request.

However watcher api hasn't returned it properly, so operators had no
way to determine the exact ID they should search.

Add RequestID middleware to return the id in X-OpenStack-Request-Id
header, which is globally used.

Closes-Bug: #2122350
Change-Id: Ie4a8307e8e7e981cedbeaf5fe731dbd47a50bade
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-09-11 15:46:21 +00:00
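To illustrate what a request ID middleware does in a WSGI pipeline (a562880b1c), here is a minimal generic sketch. It is not the oslo.middleware implementation that the commit adds; the class name and the `example.request_id` environ key are assumptions made for this example.

```python
import uuid


class RequestIdMiddleware:
    """Generic WSGI middleware that tags each response with a request ID."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        request_id = "req-" + str(uuid.uuid4())
        # Hypothetical environ key so the wrapped app can log the same ID.
        environ["example.request_id"] = request_id

        def _start_response(status, headers, exc_info=None):
            # Append the ID header to whatever the application returned.
            headers = list(headers) + [("X-OpenStack-Request-Id", request_id)]
            return start_response(status, headers, exc_info)

        return self.app(environ, _start_response)


def demo_app(environ, start_response):
    """Trivial WSGI application used only for this sketch."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]
```

An operator can then match the header value returned to the client against the ID written to the service logs.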
jgilaber
fe56660c44 Handle missing dst_pool parameter in zone_migration
Unlike Nova, Cinder does not support calling the 'os-migrate_volume'[1]
action without a host or a cluster. For volume migrations of type
'migrate' in watcher the dst_pool is required, but for other migrations
that move the volumes to a different type it is not needed. This
change checks if the dst_pool is defined and prevents some migrations
when that information is missing.

Adds testing for creating audits with the Zone Migration strategy,
validating the schema changes.

[1] https://docs.openstack.org/api-ref/block-storage/v3/index.html#migrate-a-volume

Closes-Bug: 2108988

Change-Id: I305c58e47093c4a884e86f1d91fdc15ef2a1cfba
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-10 15:58:24 +02:00
jgilaber
6cb4e2fa83 Enable storage model collector by default
By default Watcher enables only the compute model collector [1]. This
change enables the storage one as well, since otherwise when doing
volume migration the model quickly becomes obsolete if there are new
volumes created while an audit is running. The storage model is only
enabled if a cinder service is registered in keystone.

[1] https://docs.openstack.org/watcher/latest/configuration/watcher.html#collector.collector_plugins

Assisted-By: Cursor
Closes-Bug: 2111785
Change-Id: I864d3fc12d6364f1932cf5d2348a6b68169641e9
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-10 15:58:24 +02:00
Sean Mooney
9b1adaa7c7 Add 2025.2 release notes prelude
The prelude provides a high-level overview of the
security improvements, operational enhancements,
and new monitoring capabilities for operators.

Assisted-By: claude-code
Change-Id: Ia2c1409d26aca0eddfb1685e9009305215c2405a
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-09-09 17:43:53 +01:00
Douglas Viroel
f21df7ce1e Update prometheus-threading parent job
Updates watcher-prometheus-integration-threading job
parent, so every new config option added to
watcher-prometheus-integration job is also added/tested
in the threading job.

Change-Id: I38c95f638f748fd5c051c312817e9123d6037ab5
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-09-03 13:44:06 -03:00
Zuul
b1aad46209 Merge "Check result of retype action based on type and status" 2025-09-02 12:44:05 +00:00
Alfredo Moralejo
90009aac84 Check result of retype action based on type and status
Currently, when there is a volume_migrate action and migration_type is
`retype`, watcher assumes that the retype always triggers a migration
and checks the result of the retype based on the fields related to
the migration action (actually, it uses the same function to check the
result when `migration_type` is `retype` or `migrate`). This creates
problems in different scenarios:

- Actions stay in ONGOING status forever for volumes which have never
  been migrated, as the migration fields of the volume are empty.
- Volumes which were migrated at any time before still have the old
  values, so the status of the retype actions may be reported wrongly.

This patch is implementing an entirely new function to check the result
of a retype action based on the final type and the status field of the
volume. This should be valid for any kind of retype action, with or
without migration. The criterion for a successful retype is that the type
for the volume is the destination one in the action and the status is
available or in-use.

Closes-Bug: #2112100

Change-Id: I76e91ed99e7a814a43a6dd906b6bcc150d471624
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-01 16:59:38 +02:00
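The success criterion stated in 90009aac84 can be expressed as a small predicate. This is a sketch only: the function name and the dictionary keys `volume_type` and `status` are assumptions for illustration and do not reproduce the Watcher or Cinder client code.

```python
def retype_succeeded(volume, dest_type):
    """Return True when the volume has the destination type and a settled status.

    `volume` is a hypothetical dict holding the volume's current fields.
    """
    return (
        volume.get("volume_type") == dest_type
        and volume.get("status") in ("available", "in-use")
    )


done = retype_succeeded({"volume_type": "fast", "status": "in-use"}, "fast")
still_running = retype_succeeded({"volume_type": "slow", "status": "retyping"}, "fast")
```

Because the check looks only at the final type and status, it does not depend on whether the retype involved a migration.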
Zuul
e5b18afa01 Merge "Fix doc section to enable cinder notifications" 2025-09-01 14:15:29 +00:00
Zuul
fedc74a5b0 Merge "Update aetos fake data job to disable real metrics" 2025-09-01 12:06:53 +00:00
jgilaber
a4b785e4f1 Fix doc section to enable cinder notifications
The section in the Watcher docs that describes how to enable cinder
notifications incorrectly tells the user to change the cinder config to
send notification to the watcher.watcher_notifications exchange and
topic. Instead, it should instruct the user to change the Watcher
configuration of the notification_topics [1] to listen to the
'openstack.notifications', which is the one used by cinder by
default[2].

This patch also adds 'openstack.notifications' to the default value
for the 'notification_topics' parameter.

[1] https://docs.openstack.org/watcher/latest/configuration/watcher.html#watcher_decision_engine.notification_topics
[2] https://docs.openstack.org/cinder/latest/configuration/block-storage/samples/cinder.conf.html

Partial-Bug: 2121384
Change-Id: I4dc1a72af79a23c9ca07d2da5ff41bd7741e37d8
Signed-off-by: jgilaber <jgilaber@redhat.com>
2025-09-01 11:23:00 +02:00
Zuul
cdde0fb41e Merge "Allow status_message updates for actions in SKIPPED state" 2025-08-28 20:04:34 +00:00
Sean Mooney
ef0f35192d Make Monasca client optional and lazy-load
Monasca is deprecated for removal. This change makes the Monasca client
an optional dependency and ensures it is only imported and instantiated
when the Monasca datasource is explicitly selected. This reduces the
default footprint while preserving functionality for deployments that
still rely on Monasca.

What changed
============
- requirements.txt: remove python-monascaclient from hard deps
- setup.cfg: add [options.extras_require] monasca extra
- watcher/common/clients.py: lazy import with clear UnsupportedError
- watcher/decision_engine/datasources/monasca.py: lazy client property
  and deferred import of monascaclient.exc; reset on Unauthorized
- watcher/decision_engine/datasources/manager.py: unconditionally
  import Monasca helper and include in metric_map; helper is lazy
- tests: conditionally include Monasca based on availability; adjust
  expectations instead of skipping by default; avoid over-mocking
- tox.ini: enable optional extras via WATCHER_EXTRAS env var
- docs: datasources index notes Monasca is deprecated and optional
- releasenotes: upgrade note with install example and behavior

Why
===
- Allow deployments not using Monasca to run without the client
- Keep Monasca functional when explicitly installed via extras
- Provide clear operator guidance and smooth upgrades

Compatibility
=============
- No change for deployments that do not use Monasca
- Deployments using Monasca must install the optional extra:
  pip install watcher[monasca]

Testing
=======
- Default: tox -e py3
- With Monasca: WATCHER_EXTRAS=monasca tox -e py3

Assisted-By: GPT-5 (Cursor)
Closes-Bug: #2120192
Change-Id: I7c02b74e83d656083ce612727e6da58761200ae4
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-08-28 16:53:48 +01:00
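The lazy-import approach described in ef0f35192d follows a common pattern for optional dependencies, sketched below. The `LazyClient` class, its arguments, and this local `UnsupportedError` are hypothetical names for illustration; they are not the code in watcher/common/clients.py.

```python
import importlib


class UnsupportedError(Exception):
    """Hypothetical error raised when an optional dependency is missing."""


class LazyClient:
    """Defer importing an optional module until it is first needed."""

    def __init__(self, module_name, extra):
        self._module_name = module_name
        self._extra = extra
        self._module = None

    @property
    def module(self):
        if self._module is None:
            try:
                # The import only happens on first access, so deployments
                # that never touch this client do not need the package.
                self._module = importlib.import_module(self._module_name)
            except ImportError as exc:
                raise UnsupportedError(
                    f"{self._module_name} is not installed; "
                    f"install the '{self._extra}' extra to use it"
                ) from exc
        return self._module
```

Constructing the object never fails; the clear error appears only when the missing dependency is actually used.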
Sean Mooney
c9bfb763c2 Allow status_message updates for actions in SKIPPED state
Fixed action status_message update restrictions to allow updates when
action is already in SKIPPED state. Previously, users could only update
the status_message when initially transitioning to SKIPPED state.

Changes include:
- Modified validation logic to allow status_message updates for SKIPPED actions
- Changed exception type from PatchError to Conflict for better semantics
- Added comprehensive test coverage for the new behavior
- Updated API documentation and samples
- Added release note documenting the fix

This enables administrators to fix typos, provide more detailed
explanations, or expand on reasons in action status messages after
the action has been skipped.

Generated-By: claude-code
Closes-Bug: #2121601
Change-Id: I64def708389a8ecd32080fba1638a4499ead349d
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-08-28 16:16:01 +01:00
morenod
eb3fdb1e97 Update aetos fake data job to disable real metrics
Job watcher-aetos-integration is failing because of
having real metrics enabled coming from ceilometer.

We need to disable ceilometer-acompute and node_exporter so only
injected data will be considered when asking prometheus to take
decisions


Change-Id: If4f2c3f6f89527d768c48f1ca4967339837bb994
Signed-off-by: morenod <dsanzmor@redhat.com>
2025-08-28 10:51:08 +00:00
Zuul
848cde3606 Merge "Rename confusing query timeout options" 2025-08-28 09:26:40 +00:00
Zuul
63cf35349c Merge "Extend compute model attributes" 2025-08-27 16:40:53 +00:00
Takashi Kajinami
7106a12251 Rename confusing query timeout options
These do not actually define timeout but interval. Rename the options
to reflect what they actually define. The existing deprecated options
in the [gnocchi_client] are also removed, because these have been kept
for 6 years.

In addition, fix inconsistent name (query vs call).

Change-Id: Ib29115746a25b45bdff1c3da8df9d7167c2db662
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-27 23:22:45 +09:00
Douglas Viroel
03c09825f7 Extend compute model attributes
This patch extends compute model attributes by
adding new fields to the Instance element. Values are
populated by the nova collector, using the same
nova list call, but this requires a more recent compute
API microversion.
A new config option was added to allow users to
enable or disable the extended attributes, and it is
disabled by default.
This patch also configures prometheus-based jobs to run on a
newer version of the nova API (2.96) and enables the extended
attributes collection.

Implements: bp/extend-compute-model-attributes

Assisted-By: Cursor (claude-4-sonnet)

Change-Id: Ibf31105d780dce510a59fc74241fa04e28529ade
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-26 11:35:18 -03:00
Douglas Viroel
2452c1e541 Follow up changes for skip-action blueprint
These are some of the requested changes from reviews
in the series of patches for the add-skip-action blueprint.
Some of them may require another specific patch since
they would touch more files that are not related to
this feature.

Change-Id: I9e30ca385e7b184ab19449a60db6f6d0f3c0e1b9
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-26 10:27:57 -03:00
Zuul
d91b550fc9 Merge "Fix missing watcher_workflow_engines.taskflow section" 2025-08-26 13:16:19 +00:00
Zuul
1668b9b9f8 Merge "API changes for skipped actions: patch actions and status_message" 2025-08-26 12:54:31 +00:00
Zuul
5e05b50048 Merge "Skip actions automatically based on pre_condition results" 2025-08-26 12:33:08 +00:00
Zuul
4d8f86b432 Merge "Fix NovaHelper microversion comparison" 2025-08-25 19:18:57 +00:00
Zuul
05d8f0e3c8 Merge "Validate endpoint_type option at loading" 2025-08-25 12:06:44 +00:00
Takashi Kajinami
1a87abc666 Fix missing watcher_workflow_engines.taskflow section
... caused by AttributeError.

Closes-Bug: #2121286
Change-Id: I52bab27afdc96d8ce2d9733316737c3aa505f5fe
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-24 22:58:28 +09:00
Zuul
fa4552b93f Merge "Fix type mismatch between option and its default" 2025-08-24 13:21:43 +00:00
Takashi Kajinami
a07bfa141d Fix type mismatch between option and its default
... to avoid the following warning.

```
UserWarning: converting '1' to a string
  warnings.warn('converting \'%s\' to a string' % str_val)
```

Change-Id: I852d63523d3582f00d4d7953199181e3d2b6a885
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-24 04:22:33 +09:00
Zuul
a6668a1b39 Merge "Update Overload standard deviation doc" 2025-08-22 15:22:04 +00:00
Zuul
534c340df1 Merge "Add new tests to validate GET /infra-optim/v1/data_model" 2025-08-22 14:16:05 +00:00
Zuul
a963e0ff85 Merge "Fix api-ref doc for GET /infra-optim/v1/data_model" 2025-08-22 14:03:15 +00:00
Ronelle Landy
457819072f Update Overload standard deviation doc
Bug #2113862 details a number of suggested
corrections and additions to the Workload
Stabilization doc. This patch adds those
suggested changes.

Closes-Bug: #2113862
Assisted-By: Cursor (claude-3.5-sonnet)
Change-Id: I4131a304c064d2ea397b2447025c7edf69a56e2a
Signed-off-by: Ronelle Landy <rlandy@redhat.com>
2025-08-21 11:09:46 -04:00
Zuul
6d155c4be6 Merge "Add status_message to objects and notifications" 2025-08-21 14:59:53 +00:00
Zuul
83fea206df Merge "Add status_message column to Actions, Audits and ActionPlans tables" 2025-08-21 14:50:46 +00:00
Zuul
00a3edeac6 Merge "Add parameters to force failures in nop action" 2025-08-21 14:32:37 +00:00
Zuul
b69642181b Merge "Add patch call validation based on allowed_attrs" 2025-08-21 14:24:09 +00:00
Zuul
616c8f4cc4 Merge "Add options to disable migration in host maintenance" 2025-08-21 14:11:22 +00:00
Quang Ngo
cc26b3b334 Add options to disable migration in host maintenance
This change enhances the Host Maintenance strategy by introducing
two new input parameters: `disable_live_migration` and
`disable_cold_migration`. These parameters allow cloud
administrators to control whether live or cold migration should be
considered during host maintenance operations.

If `disable_live_migration` is set, active instances will be cold
migrated if `disable_cold_migration` is not set, otherwise
active instances will be stopped. If `disable_cold_migration` is set,
inactive instances will not be cold migrated.
If both are set, only stop actions will be performed on instances.

The strategy logic and action plan generation have been updated to
reflect these behaviors. A new "stop" action is introduced and
registered, and the weight planner is updated to handle new action.

Documentation for the Host Maintenance strategy is updated to
describe the new parameters and their effects.

Test Plan:
- Unit tests for HostMaintenance strategy with new parameters
- Integration tests for action plan generation with stop action

This implements the specification:
Spec: https://review.opendev.org/c/openstack/watcher-specs/+/943873

Change-Id: I201b8e5c52e1bc1a74f3886a0e301e3c0fa5d351
Signed-off-by: Quang Ngo <quang.ngo@canonical.com>
2025-08-20 22:32:33 +10:00
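The parameter combinations described in cc26b3b334 can be summarised as a small decision function. This is a sketch under assumptions: the function name and the returned action labels are illustrative, and the default case (live migration for active instances, cold migration for inactive ones when neither option is set) is assumed rather than stated in the commit message.

```python
def plan_action(instance_active,
                disable_live_migration=False,
                disable_cold_migration=False):
    """Pick an action label for one instance on the host under maintenance."""
    if instance_active:
        if not disable_live_migration:
            return "live_migrate"  # assumed default for active instances
        if not disable_cold_migration:
            return "cold_migrate"
        return "stop"
    # Inactive instances are only cold migrated when that is still allowed.
    if not disable_cold_migration:
        return "cold_migrate"
    return None


active_live_disabled = plan_action(True, disable_live_migration=True)
active_both_disabled = plan_action(True, True, True)
inactive_cold_disabled = plan_action(False, disable_cold_migration=True)
```

In this sketch `None` means no action is generated for the instance.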
Douglas Viroel
9003906bdc Fix NovaHelper microversion comparison
Fixes the microversion comparison in both enable and
disable nova-compute service methods in NovaHelper.
The previous implementation was incorrect and started to
fail for microversions greater than 2.99.

Closes-Bug: #2120586

Assisted-By: Cursor (claude-4-sonnet)

Change-Id: I69da7f10cd5b42f7d4613d8947bca3e382815c3f
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-20 08:35:18 -03:00
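For 9003906bdc, a robust way to compare microversions is to compare (major, minor) integer tuples. The sketch below uses hypothetical helper names and is not the NovaHelper code. The float comparison is shown only as one common pitfall that breaks once the minor version reaches three digits; the commit message does not say what the previous implementation looked like.

```python
def parse_microversion(version):
    """Split a 'major.minor' string into a tuple of integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def supports(current, required):
    """Return True when `current` is at least `required`."""
    return parse_microversion(current) >= parse_microversion(required)


# Pitfall: as a float, "2.100" becomes 2.1, which sorts below 2.53.
float_result = float("2.100") >= float("2.53")   # False
tuple_result = supports("2.100", "2.53")         # True
```

Tuple comparison treats the minor number as an integer, so 2.100 correctly ranks above 2.99.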
Alfredo Moralejo
e06f1b0475 API changes for skipped actions: patch actions and status_message
This patch implements the changes in the API required for the
skipped action blueprint. It includes:

- New field `status_message` is visible in API get calls for Audits,
  ActionPlans and Actions.
- New Patch call is added to `/actions/{action_id}` which allows
  manually moving actions in PENDING state to SKIPPED for ActionPlans
  which have not been started.
- A new API microversion 1.5 is added for these changes.

It also adds required tests and documentation.

Implements: blueprint add-skip-actions

Assisted-By: Cursor (claude-4-sonnet)

Change-Id: I71fb9af76085e5941a7fd3e9e4c89d6f3a3ada47
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-20 13:13:19 +02:00
Alfredo Moralejo
6d35be11ec Skip actions automatically based on pre_condition results
This patch implements automatic skipping of actions based on the
result of the action pre_condition method. This allows properly
handling situations such as migration actions for VMs which no longer
exist. This patch includes:

- Adding a new state SKIPPED to the Action objects.
- Add a new Exception ActionSkipped. An action which raises it from the
  pre_condition execution is moved to SKIPPED state.
- pre_condition will not be executed for any action in SKIPPED state.
- execute will not be executed for any action in SKIPPED or FAILED state.
- post_condition will not be executed for any action in SKIPPED state.
- moving transition to ONGOING from pre_condition to execute. That means
  that actions raising ActionSkipped will move from PENDING to SKIPPED
  while actions raising any other Exception will move from PENDING to
  FAILED.
- Adding information on action failed or skipped state to the
  `status_message` field.
- Adding a new option to the testing action nop to simulate skipping on
  pre_condition, so that we can easily test it.

Implements: blueprint add-skip-actions

Assisted-By: Cursor (claude-4-sonnet)

Change-Id: I59cb4c7006c7c3bcc5ff2071886d3e2929800f9e
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-20 13:10:10 +02:00
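The state transitions listed in 6d35be11ec can be sketched as below. This is a simplified model, not the Watcher applier: `run_action` is a hypothetical function, this `ActionSkipped` is a local stand-in for the exception the commit adds, post_condition handling is left out, and the SUCCEEDED end state is assumed for the success path.

```python
class ActionSkipped(Exception):
    """Local stand-in for the exception raised to skip an action."""


def run_action(pre_condition, execute):
    """Return (states visited, status_message) for an action starting PENDING."""
    states = ["PENDING"]
    try:
        pre_condition()
    except ActionSkipped as exc:
        # Skipped in pre_condition: PENDING -> SKIPPED, execute never runs.
        return states + ["SKIPPED"], str(exc)
    except Exception as exc:
        # Any other error in pre_condition: PENDING -> FAILED.
        return states + ["FAILED"], str(exc)
    # The move to ONGOING happens only when execution actually starts.
    states.append("ONGOING")
    try:
        execute()
    except Exception as exc:
        return states + ["FAILED"], str(exc)
    return states + ["SUCCEEDED"], None


def missing_vm():
    raise ActionSkipped("instance no longer exists")


skipped_states, skipped_message = run_action(missing_vm, lambda: None)
```

In the skipped case the action never enters ONGOING, and the reason is kept as the status message.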
Takashi Kajinami
1009c3781b Validate endpoint_type option at loading
... instead of documenting the supported values, so that more explicit
error is presented to users.

Also drop redundant description about the default values. The default
values are added to sample config files generated, so don't have to
be explained in help texts.

Change-Id: I12b201da3e742b55f6cfcf71bdd4413cbf3ee4e5
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-20 01:44:59 +09:00
Alfredo Moralejo
5048a6e3ba Add status_message to objects and notifications
This patch is part of the skipped action blueprint. It adds the
`status_message` field to the Audit, ActionPlan and Action objects and
all related notifications.

It bumps the versions of all the affected objects and notifications and
update the tests to include the new fields.

Change-Id: I3b9467e7e37188e647379cd9c4cbbda8ed75383f
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-19 13:01:00 +02:00
Alfredo Moralejo
84742be8c2 Add status_message column to Actions, Audits and ActionPlans tables
This patch implements the changes in the database required for the
skipped action blueprint.

It just adds a new nullable column to the required tables and add tests
for it.

Note that I am also introducing a fix in previous tables tests which
will be affected by the changes in the objects.

Implements: blueprint add-skip-actions

Change-Id: I027bc3861b589bd281a7216583a8c5c351a53c57
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-19 11:05:39 +02:00
Alfredo Moralejo
1fb89aeac3 Add parameters to force failures in nop action
In order to test the different code paths for action execution
it is very useful to be able to make the actions fail in the different
execution stages.

This patch adds three new options `fail_pre_condition`, `fail_execute`
and `fail_post_condition`. Setting any of them to True makes the action
fail in the specified step.

Change-Id: Ied8c0bb767d9bb6bdfb9209365857a3b4d606b40
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-19 11:05:11 +02:00
Alfredo Moralejo
1a9f17748e Add patch call validation based on allowed_attrs
Currently, patch call field validations are done based on exclusion:
all the fields can be patched unless included in a list
`internal_attrs`.

This patch is adding a new validation rule based on field inclusion
in a list `allowed_attrs`. When that list is non-empty, only the fields
included in it can be patched. In order to keep the existing behavior
for the existing patch calls, I am defining the list as empty, so that
the rest of the validation rules are applied and the current behavior
is not affected.

Change-Id: I22010649332c8fb872446a9d0483a0303a4eba3b
Signed-off-by: Alfredo Moralejo <amoralej@redhat.com>
2025-08-19 11:01:20 +02:00
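The two validation rules from 1a9f17748e can be combined as in the sketch below. The `validate_patch` function and its return value are hypothetical and only illustrate the inclusion and exclusion logic; they are not the Watcher API code.

```python
def validate_patch(fields, internal_attrs=(), allowed_attrs=()):
    """Return the fields of a patch request that must be rejected."""
    rejected = []
    for field in fields:
        # Inclusion rule: it only applies when allowed_attrs is non-empty.
        if allowed_attrs and field not in allowed_attrs:
            rejected.append(field)
        # Exclusion rule: internal fields can never be patched.
        elif field in internal_attrs:
            rejected.append(field)
    return rejected


# An empty allowed_attrs keeps the old exclusion-only behaviour.
old_behaviour = validate_patch(["state", "uuid"], internal_attrs=["uuid"])
# A non-empty allowed_attrs restricts patches to the listed fields.
new_behaviour = validate_patch(
    ["state", "status_message", "parameters"],
    internal_attrs=["uuid"],
    allowed_attrs=["state", "status_message"],
)
```

Leaving `allowed_attrs` empty by default is what keeps existing patch calls unaffected.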
Zuul
90f0c2264c Merge "use cinder migrate for swap volume" 2025-08-18 20:32:42 +00:00
Sean Mooney
3742e0a79c use cinder migrate for swap volume
This change removes Watcher's in-tree functionality
for swapping instance volumes and defines swap as an alias
of cinder volume migrate.

The watcher native implementation was missing error handling
which could lead to irretrievable data loss.

The removed code also forged project user credentials to
perform admin requests as if they were made by a member of a project.
This was unsafe and posed a security risk due to how it was
implemented. This code has been removed without replacement.

While some effort has been made to allow existing
audits that were defined to work, any reduction of functionality
as a result of this security hardening is intentional.

Closes-Bug: #2112187
Change-Id: Ic3b6bfd164e272d70fe86d7b182478dd962f8ac0
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-08-18 16:35:38 +00:00
Takashi Kajinami
7b5243fa8d Resolve deprecation warning from pecan
Resolve the following warning raised from pecan.

```
DeprecationWarning: The function signature for
watcher.api.controllers.root.RootController._route is changing in
the next version of pecan.
Please update to: `def _route(self, args, request)`.
```

Change-Id: I7081cf956a8baa05cd70ced0496ca8192fff979e
Signed-off-by: Takashi Kajinami <kajinamit@oss.nttdata.com>
2025-08-18 22:06:20 +09:00
Jaromir Wysoglad
8309d9848a Add Aetos datasource
Implement the spec for multi-tenancy support for metrics. This adds
a new 'Aetos' datasource very similar to the current Prometheus
datasource. Because of that, the original PrometheusHelper class
was split into two classes and the base class is used for
PrometheusHelper and for AetosHelper. Except for the split, there
is one more change to the original PrometheusHelper class code, which
is the addition and use of the _get_fqdn_label() and
_get_instance_uuid_label() methods.

As part of the change, I refactored the current prometheus datasource
unit tests. Most of them are now used to test the PrometheusBase class
with minimal changes. Changes I've made to the original tests:

- the ones that can be used to test the base class are moved into the
  TestPrometheusBase class
- the _setup_prometheus_client, _get_instance_uuid_label and
  _get_fqdn_label functions are mocked in the base class tests.
  Their concrete implementations are tested separately in each
  datasource's tests.
- a self._create_helper() is used to instantiate the helper class with
  correct mocking.
- all config value modification in the original tests got moved out and
  instead of modifying the config values, the _get_* methods are mocked
  to return the wanted values
- to keep similar test coverage, config retrieval is tested for each
  concrete class by testing the _get_* methods.

New watcher-aetos-integration and watcher-aetos-integration-realdata
zuul jobs are added to test the new datasource. These use the same set
of tempest tests as the current watcher-prometheus-integration jobs.
The only difference is the environment setup and the Watcher config,
so that the job deploys Aetos and Watcher uses it instead of accessing
Prometheus directly.

At first this was generated by asking cursor to implement the linked spec
with some additional prompts for some smaller changes. Afterwards I manually
went through the code doing some cleanups, ensuring it complies with
PEP8 and hacking and so on. Later on I manually adjusted the code to use
the latest observabilityclient changes.
The zuul job was also mostly generated by cursor.

Implements: https://blueprints.launchpad.net/watcher/+spec/prometheus-multitenancy-support

Generated-By: Cursor with claude-4-sonnet model
Change-Id: I72c2171f72819bbde6c9cbbf565ee895e5d2bd53
Signed-off-by: Jaromir Wysoglad <jwysogla@redhat.com>
2025-08-14 02:27:24 -04:00
Zuul
355671e979 Merge "Add a new tox environment to run unit tests in threading mode" 2025-08-13 21:37:19 +00:00
Douglas Viroel
9becb68495 Add new tests to validate GET /infra-optim/v1/data_model
The data_model list API response comes from the model to_list()
method, which generates both server_* and node_* attributes from
Instance and Node classes fields[1]. Any change on these classes
can break the data_model list API and require a new microversion.
These tests validate the current expected fields.

[1] 5ba086095c/watcher/decision_engine/model/model_root.py (L250-L270)

Change-Id: I77fac162101013aa923272aa99c7c6695cc5fdca
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-12 09:47:01 -03:00
Douglas Viroel
37faf614e2 Fix api-ref doc for GET /infra-optim/v1/data_model
Some response parameters from GET /infra-optim/v1/data_model
endpoint are missing from api-ref documentation. This patch
updates the doc to include them.
For more details, see LP #2117726

Closes-Bug: #2117726

Change-Id: Iaa775f56bb8167d9c6b458cd07f1ec3cefaf70fe
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-12 09:47:01 -03:00
Zuul
4080d5767d Merge "Disable real metrics on devstack injected data jobs" 2025-08-11 17:10:32 +00:00
Zuul
9925fd2cc9 Merge "Replace dateutils usage with datetime and oslo.utils" 2025-08-07 20:46:25 +00:00
Zuul
27baff5184 Merge "Extend decision engine to support threading mode" 2025-08-06 15:38:31 +00:00
Douglas Viroel
8ca794cdbb Add a new tox environment to run unit tests in threading mode
It is done by disabling the eventlet patching and configuring
oslo.service backend to threading. Once oslo.service backend is
configured, it can't be reverted to eventlet. This needs to be
done before including other modules, which may include oslo.service
library.
Adds a job that runs a subset of tests with eventlet patching disabled.

Change-Id: I9f8c2c5bbcf3192313cc3b309e8f2719a3bea18f
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-05 16:50:29 -03:00
Douglas Viroel
f879b10b05 Extend decision engine to support threading mode
With eventlet removal under way, Watcher will need
to be adapted to support both modes, eventlet and threading, for
a couple of releases before removing all eventlet code.
This patch adds methods and classes that allow decision engine
modules to create futurist thread pools instead of green thread pools,
based on an environment variable that can be enabled per service.
It moves the continuous audit handler instance to the decision engine
service, so it can be started together with the main decision engine
service. It also adds an environment variable that allows the user to
disable eventlet monkey patching and use the oslo.service threading
backend.
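
A rough sketch of the backend selection described above, using only the
standard library. The variable name EXAMPLE_DISABLE_EVENTLET_PATCHING and
the helper names are hypothetical (the message does not name the real
variable), and concurrent.futures.ThreadPoolExecutor stands in for the
futurist pools the patch mentions, so the real code may look different.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical variable name; the commit message does not spell out the real one.
ENV_VAR = "EXAMPLE_DISABLE_EVENTLET_PATCHING"


def is_patching_disabled(environ=None):
    """Return True when the environment asks for threading mode."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "").strip().lower() in ("1", "true", "yes")


def select_backend(environ=None):
    """Pick the concurrency backend name for the service."""
    return "threading" if is_patching_disabled(environ) else "eventlet"


def make_pool(max_workers, environ=None):
    """Create a native thread pool in threading mode, otherwise None.

    In eventlet mode the real service would build a green thread pool
    instead; that branch is left out to keep the sketch dependency-free.
    """
    if select_backend(environ) == "threading":
        return ThreadPoolExecutor(max_workers=max_workers)
    return None
```

The decision is taken from the environment rather than from the service
configuration because, as the related tox change notes, the backend has
to be chosen before other modules import oslo.service.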

Change-Id: I8a8be0a7cebdc44005fd77ec960543828c7da318
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-05 16:45:48 -03:00
Chandan Kumar (raukadah)
95d975f339 Replace dateutils usage with datetime and oslo.utils
This cr fixes:
* Replaced ``dateutil.tz.tzlocal()`` and ``dateutil.tz.tzutc()`` with
  ``datetime.timezone`` built-in classes in audit controllers and
  continuous audit scheduling.

* Replaced ``dateutil.parser.parse()`` with
  ``oslo_utils.timeutils.parse_isotime()`` in the zone migration
  strategy for parsing datetime strings.
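
As an illustration of the first bullet, the timezone helpers from
dateutil can be expressed with the standard library alone. The function
names below are made up for this sketch and are not taken from the patch;
``oslo_utils.timeutils.parse_isotime()`` is not shown because it needs
oslo.utils installed.

```python
import datetime


def utc_now():
    """Timezone-aware current time in UTC, without dateutil.tz.tzutc()."""
    return datetime.datetime.now(datetime.timezone.utc)


def local_timezone():
    """Local timezone as a tzinfo object, without dateutil.tz.tzlocal()."""
    return datetime.datetime.now().astimezone().tzinfo


def to_utc(moment):
    """Convert a datetime to UTC; naive values are assumed to be local."""
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=local_timezone())
    return moment.astimezone(datetime.timezone.utc)
```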

Closes-Bug: #2118404

Change-Id: I6d8a345fa4339a688769b147413dcdf3016bf4a0
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-08-05 23:09:50 +05:30
morenod
0435200fb1 Disable real metrics on devstack injected data jobs
We need to disable real data metrics coming from hosts and
instances on injected data jobs as they create wrong results
when they are mixed with the injected data.

We already did this on watcher-operator by disabling the ceilometer agent
and node_exporter in [1], so now we have to do it on devstack
installations, disabling meminfo on node_exporter for host metrics (cpu
is already disabled) and sg-core for instance metrics.

[1] https://github.com/openstack-k8s-operators/watcher-operator/pull/196

Change-Id: I4130ca6dd7cb52d96842e04e7720431ebc76efff
Signed-off-by: morenod <dsanzmor@redhat.com>
2025-08-04 12:41:54 +02:00
Douglas Viroel
adfe3858aa Configure watcher tempest's microversion in devstack
Adds a tempest configuration for min and max microversions supported
by watcher. This helps us define the correct range of microversions
to be tested on each stable branch.
New microversion proposals should also increase the default
max_microversion, in order to work with watcher-tempest-plugin
microversion testing.

Change-Id: I0b695ba4530eb89ed17b3935b87e938cadec84cc
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-08-01 17:28:40 -03:00
Zuul
a1e7156c7e Merge "finalize python 3.9 support removal" 2025-07-30 15:54:12 +00:00
Zuul
71470dac73 Merge "Add comprehensive release liaison guide for DPL model" 2025-07-30 15:24:49 +00:00
Zuul
5ba086095c Merge "Fix release notes typo and extra information" 2025-07-21 18:41:57 +00:00
Sean Mooney
3e8392b8f1 finalize python 3.9 support removal
The last release of OpenStack to support Python 3.9
was 2025.1 (Epoxy). With this change, Watcher now requires
3.10; testing of 3.9 was removed in previous commits.

Change-Id: Ida53740293e93b0c20dec2e175b390fa18bed852
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-07-21 18:25:04 +01:00
Sean Mooney
20cd4a0394 Add comprehensive release liaison guide for DPL model
Transform Nova's PTL guide into Watcher-specific release liaison
documentation following the DPL governance model. This guide provides
chronological guidance for release liaisons managing Watcher's
cycle-with-intermediary release process.

Key features:
* DPL liaison coordination with proper precedence hierarchies
* Watcher-specific project context and repository references
* Enhanced FFE process with release liaison decision authority
* Proper RST formatting with code blocks and cross-references
* Comprehensive glossary of OpenStack release terminology
* Usage guidance for both new and experienced release liaisons

Adapts Nova's proven chronological structure while reflecting
Watcher's distributed leadership model and technical requirements.

Assisted-By: claude-code
Change-Id: I133bb06e47c14deaca162a2bf024210f68d78ab2
Signed-off-by: Sean Mooney <work@seanmooney.info>
2025-07-21 16:34:47 +01:00
Zuul
374750847f Merge "Merge decision engine services into a single one" 2025-07-17 13:09:11 +00:00
Chandan Kumar
2fe3b0cdbe Fix release notes typo and extra information
This cr fixes the release notes for
https://review.opendev.org/c/openstack/watcher/+/954120/ and
https://review.opendev.org/c/openstack/watcher/+/954120/

Related-Bug: #2110895
Related-Bug: #2115968

Change-Id: I1f3fc06549c2d5d7ba9debee424429a25a651070
Signed-off-by: Chandan Kumar <chkumar@redhat.com>
2025-07-09 15:44:20 +05:30
Zuul
9b9965265a Merge "Drop Code related to OperationNotPermitted exception" 2025-07-08 19:31:11 +00:00
Zuul
98b56b66ac Merge "Drops forbidden patch/delete/post action apis" 2025-07-08 18:38:40 +00:00
Douglas Viroel
081cd5fae9 Merge decision engine services into a single one
The decision engine process was built based on 2
services: a service that handles rpc requests and a
scheduler to trigger watcher periodic tasks.
With the new version of oslo.service, a new threading
backend was added, based on cotyledon service manager,
which starts a new process for each service that it
manages. These two services can't run in different
processes since they need access to a shared in-memory
representation of the cluster (cluster data models).
This patch proposes creating a Decision Engine Service
which includes everything in a single main service.

Change-Id: I335a97ca14b6e023fef055978a56aefebf22d433
Signed-off-by: Douglas Viroel <viroel@gmail.com>
2025-07-08 09:55:32 -03:00
Zuul
1ab5babbb6 Merge "Move eventlet command scripts to a different dir" 2025-07-08 12:41:35 +00:00
Zuul
d771d00c5a Merge "sqlalchemy: Use built-in declarative" 2025-07-08 12:41:32 +00:00
Chandan Kumar (raukadah)
e3b813e27e Drop Code related to OperationNotPermitted exception
The following exception was added in the initial import of the watcher
code base[1].

In each of the controller REST APIs, it was called with a flag
stating that the request was coming from top-level resource APIs.

But this exception and code were not used anywhere in the
REST API. It appears to be dead code, so it needs to be
cleaned up.

Note: In audit_template, under the patch API, this exception
was used to prevent removing the goal from an audit template.

Since this cr drops this exception, it replaces it
with the NotAuthorized exception, keeping the status code the same.

Links:
[1]. d14e057da1 (diff-6d510a275605e20ba8b435157062da2b749265a88a3cfd6d90abb7e8e5feac2aR235)

Closes-Bug: #2115968

Change-Id: I82a5e4a7a51726b3a89257c84a75157fbfcb82eb
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-07-04 19:07:13 +05:30
Chandan Kumar (raukadah)
c0a5abe29c Drops forbidden patch/delete/post action apis
These APIs are not implemented within the watcher code base and
were marked as forbidden to use.

It does not make sense to keep these APIs as they are not implemented.
This cr drops the code around them to make the action APIs cleaner.

Closes-Bug: #2110895

Change-Id: I0f465157e6cd481b27665ca6016db68c198cebeb
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-07-04 11:51:40 +05:30
Zuul
bbe30f93f2 Merge "Update workload balance doc per review comments" 2025-07-03 19:57:05 +00:00
Zuul
3bc5c72039 Merge "resolve fixme comments in RequestContext" 2025-07-03 17:28:54 +00:00
Zuul
203b926be0 Merge "Drop unused fake class" 2025-07-03 17:28:52 +00:00
Zuul
e64709ea08 Merge "Add warning message for experimental integrations" 2025-07-03 17:27:39 +00:00
Zuul
94d8676db8 Merge "add missing bindeps for docs" 2025-07-03 16:03:47 +00:00
Takashi Kajinami
828bcadf6a sqlalchemy: Use built-in declarative
sqlalchemy.ext.declarative was deprecated in sqlalchemy 1.4.0, due to
the built-in implementations[1].

[1] https://github.com/sqlalchemy/sqlalchemy/commit/450f5c0d6519a439f40

Change-Id: Idb4a361d4d65ff53ecf33b8a2a6aa0d6f6ae1979
2025-06-30 22:14:33 +09:00
Zuul
93366df264 Merge "Add crosslinks to strategies table" 2025-06-30 13:02:28 +00:00
Takashi Kajinami
aa67096fe8 Drop unused fake class
It was a left-over from removal of ceilometer datasource[1].

[1] da23fdc621

Change-Id: I17ef33d6f70e2cc601add721661347d0bf210008
2025-06-28 20:35:09 +09:00
Ronelle Landy
6f72e33de5 Add crosslinks to strategies table
These replace the full external links
used previously.

Change-Id: I9c79f7b7ddebaa25d243fdbe1eb422cba25de8f1
2025-06-27 16:54:38 -04:00
Ronelle Landy
56d0a0d6ea Update workload balance doc per review comments
The original documentation update review [1]
had some additional comments for improvements.
The commit adds the suggested changes.

[1] https://review.opendev.org/c/openstack/watcher/+/951025

Change-Id: I4b4624e2dbc4c6a5f888ec77d6a03b8f66ff0a23
2025-06-27 16:46:17 -04:00
Ronelle Landy
de9eb2cd80 Add doc clarifications for Zone Migration
Adds documentation clarifications on how the
strategy and associated parameters are used.

Closes-Bug: #2112480
Change-Id: Id42c280fc5744bebb01d50b52b834e5b3b76af73
2025-06-27 16:12:41 -04:00
Zuul
76de167171 Merge "Add Integrations doc page with support matrix" 2025-06-27 16:09:51 +00:00
Zuul
70032aa477 Merge "Add table - level of test/usage per strategy" 2025-06-27 16:01:31 +00:00
Zuul
16131e5cac Merge "Update Workload Balance strategy documentation" 2025-06-27 13:36:50 +00:00
Ronelle Landy
bfbd136f4b Update Host Maintenance strategy documentation
Add clarifications to the documentation to reflect
the actual strategy usage, including:
 - updating parameter descriptions
 - extending the 'How to Use' section

Closes-Bug: #2111810
Change-Id: Ifd2876056cd8819c50658fb9f213246dc1546d42
2025-06-23 06:36:42 -04:00
Zuul
fe8d8c8839 Merge "Use KiB as unit for host_ram_usage when using prometheus datasource" 2025-06-20 16:19:50 +00:00
Zuul
b8e0e6b01c Merge "Aggregate by label when querying instance cpu usage in prometheus" 2025-06-19 14:46:07 +00:00
Alfredo Moralejo
6ea362da0b Use KiB as unit for host_ram_usage when using prometheus datasource
The prometheus datasource was reporting host_ram_usage in MiB as
described in the docstring for the base datasource interface
definition [1].

However, the gnocchi datasource is reporting it in KiB following
ceilometer metric `hardware.memory.used` [2] and the strategies
using that metric expect it to be in KiB so the best approach is
to change the unit in the prometheus datasource and update the
docstring to avoid misunderstandings in the future. So, this patch
is fixing the prometheus datasource to return host_ram_usage
in KiB instead of MiB.
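
To make the size of the mismatch concrete, here is a small sketch of the
unit conversion. It assumes the raw sample arrives in bytes, which the
message does not state, and the helper names are illustrative only.

```python
BYTES_PER_KIB = 1024
KIB_PER_MIB = 1024


def bytes_to_kib(value_bytes):
    """Convert a memory sample in bytes to KiB, the unit strategies expect."""
    return value_bytes / BYTES_PER_KIB


def kib_to_mib(value_kib):
    """Convert KiB to MiB, the unit that was reported before this fix."""
    return value_kib / KIB_PER_MIB
```

A value reported in MiB where KiB is expected is too small by a factor
of 1024, which is enough to keep a memory-based threshold from ever
being reached.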

Additionally, it is adding more unit tests for the check_threshold
method so that it covers the memory based strategy execution, validates
the calculated standard deviation and adds the cases where it is below
the threshold.

[1] 15981117ee/watcher/decision_engine/datasources/base.py (L177-L183)
[2] https://docs.openstack.org/ceilometer/train/admin/telemetry-measurements.html#snmp-based-meters

Closes-Bug: #2113776
Change-Id: Idc060d1e709c0265c64ada16062c3a206c6b04fa
2025-06-19 16:25:27 +02:00
Zuul
0f78386462 Merge "Add debug message to report calculated metric for workload_balance" 2025-06-18 12:26:24 +00:00
Alfredo Moralejo
1529e3fadd Add debug message to report calculated metric for workload_balance
The workload_balance strategy calculates host metrics based on the
instance metrics and those are the ones used to compare with the
threshold.

Currently, the strategy does not report the calculated values, which
sometimes makes troubleshooting difficult. This patch adds a debug
message to log those values.

This patch also adds a new unit test for filter_destination_hosts
based on ram instead of cpu and adds assertions for the new debug
messages. To implement the new test properly, I had to slightly modify
the ram usage fixtures used for the workload_balance tests.

Change-Id: Ief5e167afcf346ff53471f26adc70795c4b69f68
2025-06-17 19:11:48 +02:00
Zuul
31879d26f4 Merge "Add unit test zone migration with_attached_volume" 2025-06-13 12:17:52 +00:00
Zuul
efbae9321e Merge "devstack: Drop template for mod_wsgi" 2025-06-13 10:44:48 +00:00
Ronelle Landy
0599618add Add table - level of test/usage per strategy
This patch adds a table to the strategies page to
show the level of qualification and where the
strategy can be triggered.

Change-Id: I6991566fd5fec3f8bbae06eefa63a8b83a87eed1
2025-06-11 14:19:42 -04:00
Zuul
1d50c12e15 Merge "Adapt zuul.yaml strategies jobs to include tests with tag 'strategy'" 2025-06-11 13:47:34 +00:00
Alfredo Moralejo
3860de0b1e Aggregate by label when querying instance cpu usage in prometheus
Currently, when the prometheus datasource queries the ceilometer_cpu
metric for instance cpu usage, it aggregates by instance and filters by
the label containing the instance uuid. While this works fine in real
scenarios, where a single metric is provided in a single instance, in
some cases, such as the CI jobs where metrics are directly injected, it
leads to incorrect metric calculation.

We applied a similar fix for the host metrics in [1] but we did not
implement it for instance cpu.

I am also converting the query formatting to the dict format to improve
readability.
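
The dict formatting mentioned above could look roughly like the sketch
below. The query shape, the default label name "resource" and the
function name are assumptions made for illustration; the real query in
the datasource may contain more terms.

```python
def build_instance_cpu_query(instance_uuid, period_seconds,
                             uuid_label="resource", aggregate="avg"):
    """Build a PromQL string that aggregates by the instance uuid label.

    Named placeholders filled from a dict keep the template readable
    compared with a long list of positional arguments.
    """
    template = (
        "%(agg)s by (%(label)s)"
        "(rate(ceilometer_cpu{%(label)s='%(uuid)s'}[%(period)ss]))"
    )
    return template % {
        "agg": aggregate,
        "label": uuid_label,
        "uuid": instance_uuid,
        "period": period_seconds,
    }
```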

[1] https://review.opendev.org/c/openstack/watcher/+/946049

Closes-Bug: #2113936
Change-Id: I3038dec20612162c411fc77446e86a47e0354423
2025-06-11 14:49:56 +02:00
Chandan Kumar (raukadah)
15981117ee Drop unused method get_disabled_compute_nodes_with_reason
get_disabled_compute_nodes_with_reason defined in host_maintenance
strategy is not used anywhere.

This cr drops the unused method.

Change-Id: I07c0d0b63e00d476511aa8b03c0feab8ec4db95b
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-06-09 10:51:45 +05:30
Douglas Viroel
4f8c14646d Move eventlet command scripts to a different dir
This is an initial patch towards the eventlet removal in watcher.
It moves cmd scripts that depend on eventlet to an eventlet dir,
where they are always monkey patched.

Change-Id: Ie23caab018fbf68f8c29a0f748c0708b97933b4b
2025-06-08 09:05:56 -03:00
Douglas Viroel
520ec0b79b Add warning message for experimental integrations
Some service integrations are now classified as experimental
and a warning message will now appear once a client is created
for them. These integrations are not fully tested in CI and
lack documentation on how they work or should be used.
A release note was added to inform users about the status of
these integrations and related features.

Change-Id: Ib7d0ac0b3e187ae239dfa075fb53a6c0107dff29
2025-06-07 11:33:28 -03:00
Ronelle Landy
f42cb8557b Update Workload Balance strategy documentation
Adds additional parameter and usage explanations
and combined example.

Closes-Bug: #2111848
Change-Id: Id0de4d56fa7083388ad82c61596e7484431d465b
2025-06-06 15:51:23 -04:00
Douglas Viroel
b788a67c52 Add Integrations doc page with support matrix
Adds a new documentation section that describes which service
integrations are currently supported and their integration status.
This information is not clear today and will help to cover the lack
of testing and documentation about them.

Change-Id: I26b2a2ef5672b78a575a2bdaef3a08d5bbc063bd
2025-06-05 13:31:02 -03:00
Zuul
73f8728d22 Merge "Fix audit creation with no name and no goal or audit_template" 2025-06-05 13:39:38 +00:00
Alfredo Moralejo
bf6a28bd1e Fix audit creation with no name and no goal or audit_template
Currently, in that case it was failing because watcher tried to create a
name based on a goal automatically and the goal is not defined.

This patch moves the check for goal specification in the audit
creation call earlier, and if there is no goal defined, it returns an
invalid call error.

This patch is also modifying the existing error for this case to check
the expected behavior.

Closes-Bug: #2110947

Change-Id: I6f3d73b035e8081e86ce82c205498432f0e0fc33
2025-06-04 14:46:36 +02:00
morenod
1256b24133 Adapt zuul.yaml strategies jobs to include tests with tag 'strategy'
The idea is to adapt zuul.yaml to future test structure where every strategy will be on its own file so now we keep executing everything inside test_execute_strategies but also any other test on any file with tag 'strategy'

Change-Id: I304c858078d35beb1f7b4f1fad4ea8bedde674af
2025-06-04 09:50:35 +00:00
Takashi Kajinami
a559c0505e devstack: Drop template for mod_wsgi
... because mod_wsgi support was already removed by [1].

[1] 57b248f9fe

Change-Id: I100169b3fb7ed68d9b01abb4fc91bdd16eb68aa9
2025-06-04 00:14:07 +09:00
Zuul
59757249bb Merge "Added unit test to validate audit creation with no goal and no name" 2025-05-27 19:32:07 +00:00
Zuul
58b25101e6 Merge "Return HTTP code 400 when creating an audit with wrong parameters" 2025-05-27 19:23:25 +00:00
Zuul
690a389369 Merge "Add a unit test to check the error when creating an audit with wrong parameters" 2025-05-27 19:23:23 +00:00
Zuul
1cdd392f96 Merge "Remove deprecated executor in message handling servers" 2025-05-26 14:44:39 +00:00
Zuul
20f231054a Merge "Set actionplan state to FAILED if any action has failed" 2025-05-26 14:44:37 +00:00
Zuul
077c36be8a Merge "Add unit test to check action plan state when a nested action fails" 2025-05-26 14:27:08 +00:00
Alfredo Moralejo
88d81c104e Set actionplan state to FAILED if any action has failed
Currently, an actionplan state is set to SUCCEEDED once the execution
has finished, but that does not imply that all the actions finished
successfully.

This patch checks the actual state of all the actions in the plan
after the execution has finished. If any action has status FAILED, it
sets the state of the action plan to FAILED and applies the
appropriate notification parameters. This is the expected behavior according
to Watcher documentation.
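
The rule described above can be sketched as a small function. The state
names come from the message; the function name and the plain-string
representation of states are simplifications for this example.

```python
SUCCEEDED = "SUCCEEDED"
FAILED = "FAILED"


def final_action_plan_state(action_states):
    """Return FAILED if any action failed, otherwise SUCCEEDED.

    `action_states` is an iterable with the state of every action in the
    plan, inspected after the execution has finished.
    """
    if any(state == FAILED for state in action_states):
        return FAILED
    return SUCCEEDED
```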

The patch is also fixing the unit test for this to set the expected
action plan state to FAILED and notification parameters.

Closes-Bug: #2106407
Change-Id: I7bfc6759b51cd97c26ec13b3918bd8d3b7ac9d4e
2025-05-26 14:58:03 +02:00
Zuul
8ac8a29fda Merge "Fix incorrect logging format" 2025-05-26 11:47:26 +00:00
Zuul
cd2910b0e9 Merge "Check logs in some cinder and nova helper tests" 2025-05-26 11:45:12 +00:00
jgilaber
167fb61b4e Add unit test zone migration with_attached_volume
Add a test for the zone migration strategy using the
with_attached_volume parameter, setting storage_pools but not
compute_nodes. With volumes attached to instances, with these inputs,
the strategy should propose an action plan to migrate volumes and the
instances they are attached to, since Nova, even without the user
passing a destination node for the instances, is able to find one.
However, the execution results in an error, since the strategy assumes
that the compute_nodes dict will always be there.

Change-Id: Ifac28b1aab8a0caf77d97e4c19d051e764256674
2025-05-22 17:09:13 +02:00
Chandan Kumar (raukadah)
188e583dcb Drop sg_core related prometheus var
https://review.opendev.org/c/openstack/devstack-plugin-prometheus/+/950476
adds the support for passing custom scrape target and
https://github.com/openstack-k8s-operators/sg-core/pull/25
drops sg_core prometheus related vars.

So we also need to drop the sg_core related prometheus vars from our job.
This cr achieves the same.

Depends-On: https://github.com/openstack-k8s-operators/sg-core/pull/25
Depends-On: https://review.opendev.org/c/openstack/devstack-plugin-prometheus/+/950476

Change-Id: I6c8f54f8749e81b532c88e9224022294c4a1d331
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-05-21 16:52:36 +05:30
Zuul
26e36e1620 Merge "Handle missing dst_node parameter in zone_migration" 2025-05-20 17:14:29 +00:00
Sean Mooney
a016b3f4ea add missing bindeps for docs
This adds two more bindep targets to encode the
doc and pdf-docs deps.

Change-Id: Ide54be172c485025e567ede39c238b39b01c89e0
2025-05-19 23:55:20 +00:00
Sean Mooney
9f6c8725ed resolve fixme comments in RequestContext
This change removes all the duplicate fields from the
watcher RequestContext.

It also removes several fields like quota_class and
remote_address that were cargo culted from nova
but never used in watcher when notification support was
added.

Change-Id: Ibf8739d6cd2d4557df6f8de6c780b6f4280b774f
2025-05-19 20:19:52 +01:00
Sean Mooney
040a7f5c41 update tests for new oslo.context release
context.user has been deprecated for years
and renamed to user_id

the deprecated field has now been removed so this
change updates our test cases to reflect that.

Change-Id: I120441fb9392c370c57dc63d8c115d8993d25f62
2025-05-19 19:11:23 +01:00
Zuul
3585e0cc3e Merge "Drop code from Host maintenance strategy migrating instance to disabled hosts" 2025-05-16 18:18:26 +00:00
Zuul
ba8370e1ad Merge "Migrate value column of efficacy indicator on load" 2025-05-16 18:16:23 +00:00
Zuul
97c4e70847 Merge "Add test for missing destination in zone migration" 2025-05-16 17:10:18 +00:00
jgilaber
c6302edeca Handle missing dst_node parameter in zone_migration
For compute nodes, nova works fine if a destination node is not
specified, so this change makes sure we're not passing None when the
user does not set one to avoid an error.
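
A minimal sketch of the idea, with hypothetical function and key names
rather than the strategy's real ones: the destination is only added to
the action parameters when the user actually provided it.

```python
def build_migration_parameters(migration_type, source_node,
                               destination_node=None):
    """Build action input parameters, omitting an unset destination.

    Leaving the key out (instead of passing None) lets the scheduler
    pick a destination on its own.
    """
    params = {
        "migration_type": migration_type,
        "source_node": source_node,
    }
    if destination_node is not None:
        params["destination_node"] = destination_node
    return params
```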

Partial-Bug: 2108988

Change-Id: Ida1f18b97697c041819e29f935aa5e232848226a
2025-05-16 13:51:47 +02:00
Alfredo Moralejo
0651fff910 Added unit test to validate audit creation with no goal and no name
This patch is adding a new unit test to validate the behavior
of the API when trying to create an audit without a goal (whether using
a goal or audit template parameters) and no name is provided.

Related-Bug: https://bugs.launchpad.net/watcher/+bug/2110947
Change-Id: I04df10a8a0eea4509856f2f4b9d11bae24cd563a
2025-05-16 11:13:52 +02:00
Alfredo Moralejo
b36ba8399e Add unit test to check action plan state when a nested action fails
This patch is adding a new unit test to check the behavior of the action
plan when one of the actions in it fails during execution.

Note this is to show a bug, and the expected state will be changed in
the fixing patch.

Related-Bug: #2106407
Change-Id: I2f3fe8f4da772a96db098066d253e5dee330101a
2025-05-16 09:52:28 +02:00
Alfredo Moralejo
4629402f38 Return HTTP code 400 when creating an audit with wrong parameters
Currently, when trying to create an audit which misses a mandatory
parameter, watcher returns error 500 instead of 400, which is the
documented error in the API [1] and the appropriate error code for
malformed requests.

This patch catches parameter validation errors according to the json
schema for each strategy and returns error 400. It also fixes the
unit test to validate the expected behavior.

[1] https://docs.openstack.org/api-ref/resource-optimization/#audits

Closes-Bug: #2110538
Change-Id: I23232b3b54421839bb01d54386d4e7b244f4e2a0
2025-05-16 09:35:50 +02:00
Zuul
86a260a2c7 Merge "Set keystone_client default interface to public" 2025-05-15 12:45:52 +00:00
jgilaber
63626d6fc3 Add test for missing destination in zone migration
Add some tests to show that the zone migration strategy generates
problematic input parameters for actions in some cases when destination
parameters are not passed for instances or volumes.

Change-Id: Idc3af0e6d9d2d5388ff3d152d81e63364758607b
2025-05-15 13:00:39 +02:00
afanasev.s
0f5b6a07d0 Fix incorrect logging format
Fix the logging format for calls with multiple variables. These calls
did not work correctly and some log messages were skipped.
The format strings require two arguments, but the values were passed as
a tuple, which is interpreted as a single argument, so formatting failed
because the second argument was missing.
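
The failure mode can be reproduced with the standard logging module. The
logger name, message text and collecting handler below are invented for
this sketch and are not taken from the Watcher code.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted messages and formatting errors for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []
        self.errors = []

    def emit(self, record):
        try:
            self.messages.append(record.getMessage())
        except TypeError as exc:
            # Formatting failed, so the message is lost.
            self.errors.append(str(exc))


def demo():
    log = logging.getLogger("demo_logging_format")
    log.setLevel(logging.DEBUG)
    log.propagate = False
    handler = ListHandler()
    log.addHandler(handler)
    volume, state = "vol-1", "available"
    # Broken: the tuple counts as one argument for two placeholders.
    log.debug("Volume %s is in state %s", (volume, state))
    # Fixed: each value is passed as its own argument.
    log.debug("Volume %s is in state %s", volume, state)
    log.removeHandler(handler)
    return handler.messages, handler.errors
```

With a regular handler the formatting error is reported on stderr and
the message is dropped, which matches the skipped log messages described
above.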

Closes-Bug: 2110149

Change-Id: I74ed44134b50782c105a0e82f3af34a5fa45d119
2025-05-15 12:55:18 +02:00
jgilaber
7d90a079b0 Check logs in some cinder and nova helper tests
Check the debug logs for some methods in the cinder and nova helpers to
reproduce the errors described in bug [1]. The logger is disabled by default,
so the error was being ignored; in order to show the error, the logger
needs to be enabled for the tests in question. The logging was disabled
by alembic configuring logging in [2], so this patch also removes that
logging config to expose the errors.

[1] https://bugs.launchpad.net/watcher/+bug/2110149.
[2] https://github.com/openstack/watcher/blob/master/watcher/db/sqlalchemy/alembic/env.py#L26

Change-Id: I3598ca1d08d260602c392f8a8098821faa53f570
2025-05-15 12:55:18 +02:00
Alfredo Moralejo
891119470c Add a unit test to check the error when creating an audit with wrong parameters
Currently, it is returning http error code 500 instead of 400, which
would be the appropriate code.

A follow-up patch will be sent with the fix, switching the error code
and message.

Related-Bug: #2110538
Change-Id: I35ccbb9cf29fc08e78c4d5f626a6518062efbed3
2025-05-14 17:01:59 +02:00
Chandan Kumar (raukadah)
9dea55bd64 Drop code from Host maintenance strategy migrating instance to disabled hosts
Currently, the host maintenance strategy also migrates instances from the
maintenance node to watcher_disabled compute nodes.

watcher_disabled compute nodes might be disabled for some other purpose
by a different strategy. If host maintenance uses those compute nodes for
migration, it might affect customer workloads.

The host maintenance strategy should never touch disabled hosts unless
the user specifies a disabled host as the backup node.

This cr drops the logic for using disabled compute nodes for maintenance.
Host maintenance already uses the nova scheduler for migrating
instances and will keep doing so. If there is no available node, the
strategy will fail.

Closes-Bug: #2109945

Change-Id: If9795fd06f684eb67d553405cebd8a30887c3997
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-05-14 09:24:25 +05:30
Douglas Viroel
b4ef969eec Remove deprecated executor in message handling servers
Removes the deprecated message executor when creating both RPC
and notification server instances. This parameter is deprecated[1],
as is the eventlet option.
When not defined, the server will pick the one that best fits the
current context (monkey patched or not)[2].

[1] 27d833e374
[2] 412ab4de92/oslo_messaging/_utils.py (L87)

Change-Id: I784407aa7db10bddcec5dc663e1cec65174631e0
2025-05-13 14:10:18 -03:00
jgilaber
322c89d982 Migrate value column of efficacy indicator on load
In a recent change [1] we modified the database schema for efficacy
indicators to use a 'data' column. However, that patch only contained
the schema migration and a fallback to be able to read from older
databases, and not any kind of data migration. This change introduces
a migration on load, so whenever an efficacy indicator without a 'data'
column is loaded, the column is populated in the database. The change
also modifies the migration test to verify the procedure works well.
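
A migration on load could be sketched as follows, using sqlite3 from the
standard library. The column names value and data come from the message;
the table name, the function name and the returned dict are assumptions
for this example, and the real change works through Watcher's own
database layer.

```python
import sqlite3


def load_indicator(conn, indicator_id):
    """Load an indicator, filling the 'data' column when it is empty."""
    row = conn.execute(
        "SELECT id, value, data FROM efficacy_indicators WHERE id = ?",
        (indicator_id,),
    ).fetchone()
    if row is None:
        return None
    ind_id, value, data = row
    if data is None and value is not None:
        # Row written before the schema change: copy the old value into
        # the new column and persist it, so this happens only once.
        data = float(value)
        conn.execute(
            "UPDATE efficacy_indicators SET data = ? WHERE id = ?",
            (data, ind_id),
        )
        conn.commit()
    return {"id": ind_id, "value": data}
```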

[1] https://review.opendev.org/c/openstack/watcher/+/945199

Change-Id: Ib0621b0e03451faca803018d6a2f3ad657a25fb5
2025-05-13 16:36:59 +02:00
Zuul
59607f616a Merge "Drop nova command reference from the code" 2025-05-13 12:39:25 +00:00
Chandan Kumar (raukadah)
3f6c7e406a Drop nova command reference from the code
In a DevStack environment, the nova service-list command does not
exist. The distro suggests installing python-novaclient from a package.

In the Strategies documentation, we generate the docs from the following
code[1].
```
       * - ``migration``
         - .. watcher-term:: watcher.applier.actions.migration.Migrate
       * - ``change_nova_service_state``
         - .. watcher-term:: watcher.applier.actions.change_nova_service_state.ChangeNovaServiceState
```
and within the code, we use the nova python binding to list services[2];
we are not calling the openstack cli within the code.

Documenting the equivalent openstack command does not seem to be useful
in the help text as we are using the python binding.

Links:
[1]. c4acce91d6/doc/source/strategies/host_maintenance.rst (L45)
[2]. c4acce91d6/watcher/common/nova_helper.py (L150-L152)

Change-Id: I0c663c9741fae94bdb9c30f46d3d396325a33948
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-05-13 08:49:54 +05:30
Zuul
fd3d8b67ff Merge "Set number of decimal digits in efficacy indicator" 2025-05-13 00:06:07 +00:00
Zuul
c73f126b15 Merge "Deprecated Noisy Neighbor strategy" 2025-05-12 23:50:12 +00:00
Douglas Viroel
17d1cf535a Deprecated Noisy Neighbor strategy
The noisy neighbor strategy is a proof-of-concept strategy that was
built on the LLC metric, which has not been available in Nova since the
Victoria release [1].
This patch marks this strategy as deprecated, to be removed in
future releases.

[1] https://docs.openstack.org/releasenotes/nova/victoria.html#relnotes-22-0-0-unmaintained-victoria-upgrade-notes

Change-Id: I940b88555007312c76a86706bd44a38fbcf7701e
2025-05-12 15:44:39 -03:00
jgilaber
ae48f65f20 Set keystone_client default interface to public
Set the default interface for keystone_client to public in the watcher
conf instead of admin.

Closes-Bug: 2109494

Change-Id: I9e0289249981ca965190df6dbdc37e09fd0951d7
2025-05-09 08:16:51 +02:00
jgilaber
0ed3d4de83 Set number of decimal digits in efficacy indicator
Configure the numeric type of the EfficacyIndicator value to use Float.
Add a new column named data and deprecate the existing value column.
With the current model, value will use the default scale of the
Decimal type of mysql, which in some environments is 0.

This change also adds a test with mysql as backend to reproduce the
issue, since the existing tests using sqlite do not reproduce the
problem, as well as some simple migration tests.

Closes-Bug: #2103458
Change-Id: Ib281fa32e902d2181449091f493d6506b5199094
2025-05-07 16:20:31 +02:00
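The effect of a scale of 0 described in the commit above can be illustrated with Python's `decimal` module. This is a sketch under assumptions: `store_with_scale` is a hypothetical helper, and half-up rounding is assumed as an approximation of how a fixed-scale decimal column stores a value.

```python
from decimal import Decimal, ROUND_HALF_UP


def store_with_scale(value, scale):
    """Round `value` roughly the way a fixed-scale decimal column would.

    Hypothetical helper for illustration; the rounding mode is an assumption.
    """
    quantum = Decimal(1).scaleb(-scale)  # scale=0 gives 1, scale=2 gives 0.01
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


truncated = store_with_scale(0.42, 0)  # all decimal digits are lost
kept = store_with_scale(0.42, 2)       # two decimal digits survive
```

With a scale of 0 an efficacy value such as 0.42 is stored as 0, which is why the commit moves the value to a Float column.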
jgilaber
6c5845721b Add test for EfficacyIndicator value in mysql
Add a test with mysql as backend to show that the current
EfficacyIndicator model does not store any decimal digit for the value.

Change-Id: I0cdbd7d87cd6869a10b48eda3d59558831c8dd36
2025-05-07 16:20:03 +02:00
Sean Mooney
77e7e4ef7b drop jammy jobs
ubuntu jammy is no longer part of the required
testing runtime so this change simply removes
the jammy jobs.

Change-Id: I1e3bbb14cea5b856e8146f3a32d60c3a4ffdcfcc
2025-05-02 17:41:06 +00:00
Sean Mooney
f38ab70ba4 drop suse support in the devstack plugin
suse has not been a testing runtime for a few releases
and we have no jobs currently validating it still works.

this change just removes the suse specific logic

Change-Id: I357fa71704af7aa6239054ede29d0fdcdc3fb8b5
2025-05-02 17:41:00 +00:00
Sean Mooney
7aabd6dd5a update pre-commit hook versions
This updates all hooks to their latest versions;
notably, this adds python 3.13 support to autopep8

Change-Id: Ia67ed74c9942ff26bb1f8c1d72bf57aedfcd3846
2025-05-02 17:40:50 +00:00
Zuul
1b12e80882 Merge "Make prometheus the default devstack example" 2025-05-02 13:50:50 +00:00
Zuul
9f685a8cf1 Merge "[host_maintenance] Pass des hostname in add_action solution" 2025-05-02 13:45:57 +00:00
Sean Mooney
57b248f9fe Add support for pyproject.toml and wsgi module paths
pip 23.1 removed the "setup.py install" fallback for projects that do
not have pyproject.toml and now uses a pyproject.toml which is vendored
in pip [1][2]. pip 24.2 has now deprecated a similar fallback to
"setup.py develop" and plans to fully remove this in pip 25.0 [3][4][5].
pbr supports editable installs since 6.0.0

pip 25.1 has now been released and the removal is complete.
This change adds our own minimal pyproject.toml to ensure we are using
the correct build system.

This change also requires that we adapt how we generate our wsgi
entry point. When pyproject.toml is used, the wsgi console script is
not generated in an editable install such as is used in devstack.

To address this we need to refactor our usage of our wsgi application
to use a module path instead. This change does not remove
the declaration of our wsgi_script entry point but it should
be considered deprecated and it will be removed in the future.

To unblock the gate the devstack plugin is modified to deploy
using the wsgi module instead of the console script.

Finally, support for the mod_wsgi wsgi mode is removed.
That was deprecated in devstack a few cycles ago and
support was removed in I8823e98809ed6b66c27dbcf21a00eea68ef403e8

[1] https://pip.pypa.io/en/stable/news/#v23-1
[2] https://github.com/pypa/pip/issues/8368
[3] https://pip.pypa.io/en/stable/news/#v24-2
[4] https://github.com/pypa/pip/issues/11457
[5] https://ichard26.github.io/blog/2024/08/whats-new-in-pip-24.2/
Closes-Bug: #2109608

Depends-on: https://review.opendev.org/c/openstack/watcher/+/948502
Change-Id: Iad77939ab0403c5720c549f96edfc77d2b7d90ee
2025-05-01 00:19:59 +00:00
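As a generic illustration of what deploying from a "module path" means in the commit above: a WSGI server is pointed at an importable module exposing a callable, rather than at a generated console script. The sketch below is not Watcher's wsgi module; the `application` callable and its response are hypothetical and use only the standard library.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Hypothetical module-level WSGI callable that a server can import."""
    body = b"ok"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


# Call the application directly, the way a WSGI server would.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status


environ = {}
setup_testing_defaults(environ)
response = application(environ, fake_start_response)
```

Because the callable lives in an ordinary module, it is available in an editable install even when no console script has been generated.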
Chandan Kumar (raukadah)
278cb7e98c [host_maintenance] Pass des hostname in add_action solution
Currently we are passing src_node and des_node uuid when we try to run
migrate action.

In the watcher-applier log, migration fails with the following exception:
```
Nova client exception occurred while live migrating instance <uuid>Exception: Compute host <uuid> could not be found
```
Based on 57f55190ff/watcher/applier/actions/migration.py (L122)
and
57f55190ff/watcher/common/nova_helper.py (L322),
live_migrate_instance expects destination hostname not uuid.

This change replaces the dest_node uuid with the hostname.

Closes-Bug: #2109309

Change-Id: I3911ff24ea612f69dddae5eab15fabb4891f938d
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-04-25 15:51:20 +05:30
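The fix described in the commit above amounts to passing the destination's hostname rather than its uuid. A minimal sketch, assuming a hypothetical dict-based node model and hypothetical parameter names (the real code uses Watcher's model objects):

```python
def build_migrate_parameters(dest_node):
    """Build the parameters for a live-migration action.

    `dest_node` is a hypothetical dict standing in for a compute node in
    the cluster model; the parameter names are illustrative assumptions.
    """
    return {
        "migration_type": "live",
        # Pass the hostname, not the uuid: the live migration call
        # expects the destination hostname.
        "destination_node": dest_node["hostname"],
    }


node = {"uuid": "hypothetical-uuid", "hostname": "compute-1.example.com"}
params = build_migrate_parameters(node)
```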
jgilaber
2c76da2868 Make prometheus the default devstack example
Change the devstack local.conf samples and devstack multinode
contributor doc to demonstrate deploying watcher with prometheus as
datasource instead of gnocchi. Keep the gnocchi as an alternative
deployment example.

Depends-On: https://review.opendev.org/c/openstack/watcher/+/946230
Depends-On: https://review.opendev.org/c/openstack/devstack-plugin-prometheus/+/946254

Change-Id: I721b550a03f9e5350a3f1ab10292faa1c50049a7
2025-04-24 16:06:50 +02:00
Alfredo Moralejo
c4acce91d6 Add real-data based tests to experimental and weekly pipelines
This change adds a new job using the prometheus datastore and real
workload data to the experimental pipeline so that we can run it
on-demand.

Also, it is adding it to the weekly periodic pipeline as agreed on
Watcher meeting.

Also I am excluding strategies execution with annotation `real_load` in
non-real-load jobs.

Finally, I'm moving the project configuration to the end of the file
as requested in the comments, as it's the usual location by convention.

Change-Id: Id41efda2f0dd8b1521df3f6179c3504f298e0e59
2025-04-15 16:11:21 +02:00
Zuul
adbcac9319 Merge "Replace watcherclient functional job with python-watcherclient-functional" 2025-04-15 13:32:34 +00:00
Zuul
c9a1d06e7c Merge "Aggregate by fqdn label instead instance in host cpu metrics" 2025-04-08 17:37:10 +00:00
Zuul
25c1a8207f Merge "Drop sg_core prometheus related vars" 2025-04-08 11:55:39 +00:00
Chandan Kumar (raukadah)
0702cb3869 Drop sg_core prometheus related vars
The depends-on PRs remove the installation of prometheus[1] and node
exporter[2] from sg_core. We no longer need to define those vars in
the devstack config.

Links:
[1]. https://github.com/openstack-k8s-operators/sg-core/pull/21
[2]. https://github.com/openstack-k8s-operators/sg-core/pull/23

Note: We do not need to enable the sg_core service on the compute node,
so its plugin call is removed.

Change-Id: Ie8645813a360605635de4dff9e8d1ba0d7a0cdc3
Signed-off-by: Chandan Kumar (raukadah) <raukadah@gmail.com>
2025-04-04 19:36:54 +05:30
Zuul
03c107a4ce Merge "Imported Translations from Zanata" 2025-04-03 18:49:08 +00:00
Alfredo Moralejo
c7158b08d1 Aggregate by fqdn label instead instance in host cpu metrics
While in the regular case a specific metric for a specific host will be
provided by a single instance (exporter), so aggregating by label and by
instance should give the same result, it is more correct to aggregate by
the same label that we use to filter the metrics.

This is a follow-up to https://review.opendev.org/c/openstack/watcher/+/944795

Related-Bug: #2103451

Change-Id: Ia61f051547ddc51e0d1ccd5a56485ab49ce84c2e
2025-04-02 15:36:17 +02:00
Zuul
035e6584c7 Merge "Query by fqdn_label instead of instance for host metrics" 2025-03-20 12:50:28 +00:00
Chandan Kumar (raukadah)
253e97678c Replace watcherclient functional job with python-watcherclient-functional
https://review.opendev.org/c/openstack/python-watcherclient/+/943132
moves the functional tests from watcher_tempest_plugin to watcherclient and
adds a new zuul job based on devstack-tox-functional to run them.

This change replaces the existing zuul job using a tempest regex with the
devstack tox functional job. The new job will run only on watcher/api
changes.

Closes-Bug: #2100741

Depends-On: https://review.opendev.org/c/openstack/python-watcherclient/+/943132

Change-Id: Ic2371745fe8aaf6f283151111fec4f92ea6bdf69
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-03-20 10:04:13 +00:00
OpenStack Proposal Bot
c7bb1fe52d Imported Translations from Zanata
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

Change-Id: Ifbc199a502734046ea57f5190c328447d66013ce
2025-03-20 03:49:09 +00:00
Alfredo Moralejo
a65e7e9b59 Query by fqdn_label instead of instance for host metrics
Currently we are using the `instance` label to query prometheus for host
metrics. This label is assigned to the url of each endpoint being
scraped.

While this works fine in one-exporter-per-compute cases, as the driver
maps the fqdn_label value to the `instance` label value, it fails
when there is more than one target with the same value for the fqdn
label. This is a valid case: one should be able to query by fqdn without
caring which exporter on the host provides the metric.

This patch changes the queries we use for hosts to be based on the
fqdn_label instead of the instance one. To implement it, we also
simplify the way we check that the metric exists for the host by converting
prometheus_fqdn_instance_map into a prometheus_fqdn_labels set
which stores the fqdns found in prometheus.

Closes-Bug: #2103451
Change-Id: I3bcc317441b73da5c876e53edd4622370c6d575e
2025-03-19 15:25:24 +01:00
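The label-based lookup described in the commit above can be sketched as follows. The function names, the metric name and the label name `fqdn` are hypothetical; only the idea of filtering by the fqdn label and checking membership in a set of known fqdns comes from the commit message.

```python
def host_metric_query(metric, fqdn_label, fqdn):
    """Build a PromQL selector that filters a metric by the fqdn label."""
    return f'{metric}{{{fqdn_label}="{fqdn}"}}'


def query_for_host(metric, fqdn_label, fqdn, known_fqdns):
    """Return a query for `fqdn`, or None if prometheus has no such host.

    `known_fqdns` plays the role of the prometheus_fqdn_labels set.
    """
    if fqdn not in known_fqdns:
        return None
    return host_metric_query(metric, fqdn_label, fqdn)


known = {"compute-0.example.com", "compute-1.example.com"}
query = query_for_host("node_cpu_seconds_total", "fqdn",
                       "compute-0.example.com", known)
missing = query_for_host("node_cpu_seconds_total", "fqdn",
                         "compute-9.example.com", known)
```

Since the selector no longer mentions `instance`, it matches the host's series regardless of which exporter on that host reports them.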
OpenStack Release Bot
b671550c91 Update master for stable/2025.1
Add file to the reno documentation build to show release notes for
stable/2025.1.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2025.1.

Sem-Ver: feature
Change-Id: Ie7a1845d7b02b852e776ed8ec73598caab2fb5c6
2025-03-13 13:51:55 +00:00
Zuul
52bba70fec Merge "Do not collect node_exporter cpu metrics in prometheus job" 2025-03-12 14:38:24 +00:00
Zuul
f2ee231f14 Merge "pre-commit: Integrate bandit" 2025-03-11 09:58:29 +00:00
Zuul
3861701f4a Merge "Replace deprecated abc.abstractproperty" 2025-03-11 09:47:31 +00:00
Zuul
d167134265 Merge "Drop implicit test dependency on iso8601" 2025-03-11 09:47:30 +00:00
Douglas Viroel
539be503f0 Do not collect node_exporter cpu metrics in prometheus job
Prometheus job already injects fake metrics for hosts and
instances. This patch disables node_exporter cpu metric
collector to avoid mixing both real and fake values in
test execution.

Depends-On: https://review.opendev.org/c/openstack/devstack-plugin-prometheus/+/942181
Depends-On: https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/943825

Change-Id: Ie2b7269ab15af6190ce72ba2b149e84375f4419d
2025-03-08 10:57:31 -03:00
Sean Mooney
bbf5c41cab Add epoxy prelude
This change adds the prelude for the 2025.1 Epoxy release cycle.

Change-Id: I8223842a57491a91c565e47bd1819db4d142e628
2025-03-05 17:57:55 +00:00
Takashi Kajinami
df3d67a4ed Replace deprecated abc.abstractproperty
It was deprecated in Python 3.3 [1].

[1] https://docs.python.org/3.13/whatsnew/3.3.html#abc

Change-Id: Ibd98cb93f697a6da6a6bc5a5030640a262c7a66b
2025-03-02 15:36:48 +09:00
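The replacement for `abc.abstractproperty` referred to in the commit above is to stack `property` on top of `abc.abstractmethod`. A minimal sketch with hypothetical class names:

```python
import abc


class Base(abc.ABC):
    # Replaces the deprecated @abc.abstractproperty decorator.
    @property
    @abc.abstractmethod
    def name(self):
        """Subclasses must provide a name."""


class Concrete(Base):
    @property
    def name(self):
        return "concrete"


try:
    Base()
    instantiated = True
except TypeError:
    # The abstract property still prevents instantiation of the base class.
    instantiated = False
```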
Takashi Kajinami
82f1c720dd Drop implicit test dependency on iso8601
The library has been missing from the test requirements although it is
directly used. Replace it by the built-in datetime module to get rid
of the unmaintained direct dependency.

Change-Id: I1d08b38862b54fee4c7c26161f59264fb3f2ce51
2025-03-01 15:23:15 +09:00
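Parsing an ISO 8601 timestamp with the built-in datetime module, as the commit above suggests, might look like the sketch below. The commit message does not say which function the tests use, so `fromisoformat` is an assumption; note that before Python 3.11 it does not accept a trailing `Z`.

```python
import datetime

# Parse a timestamp with an explicit UTC offset using only the stdlib.
parsed = datetime.datetime.fromisoformat("2025-03-01T15:23:15+09:00")
offset = parsed.utcoffset()
```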
Zuul
77a30ef281 Merge "Enable prometheus datasource in watcher-prometheus-integration job" 2025-02-28 13:26:10 +00:00
Zuul
383751904c Merge "Further database refactoring" 2025-02-27 11:52:59 +00:00
Zuul
6a1f19d314 Merge "Deprecate Monasca data source" 2025-02-27 11:45:15 +00:00
Douglas Viroel
342fe8882a Enable prometheus datasource in watcher-prometheus-integration job
Enable prometheus as the datasource in the tempest configuration
to enable the metric generation needed to run some scenario
tests. It is enabled in the watcher-prometheus-integration
job.

Depends-On: https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942141
Depends-On: https://review.opendev.org/c/openstack/watcher-tempest-plugin/+/942308

Change-Id: I2b657782aedf61d89766fcd18bb453b62c0b0e3b
2025-02-22 10:46:01 -03:00
Chandan Kumar (raukadah)
7fcca0cc46 Enable prometheus and node_exporter from devstack-plugin-prometheus
https://opendev.org/openstack/devstack-plugin-prometheus is the new
devstack plugin providing functionality to install/configure
prometheus/node_exporter.

It will replace sg_core devstack plugin in future.

Depends-On: https://review.opendev.org/c/openstack/watcher/+/938893
Depends-On: https://review.opendev.org/c/openstack/devstack-plugin-prometheus/+/940426

Change-Id: Ia75e6597275b36c04cde653c16f7d45ed23bc261
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-02-19 08:49:53 -03:00
Takashi Kajinami
977f014cba Deprecate Monasca data source
The Monasca project was marked inactive during 2023.1. Although we have
seen multiple people showing interest to keep the project, we haven't
seen any real progress.

Because the project is likely to be retired soon, let's deprecate the feature
dependent on Monasca so that we can remove it in a future release.

Change-Id: Ifd64f5ba59bbac238ff62302ec36a3e36954d6d0
2025-02-16 18:45:31 +09:00
James Page
753c44b0c4 Further database refactoring
More refactoring of the SQLAlchemy database layer to improve
compatibility with eventlet on newer Pythons.

Inspired by 0ce2c41404

Related-Bug: 2067815
Change-Id: Ib5e9aa288232cc1b766bbf2a8ce2113d5a8e2f7d
2025-02-14 11:42:47 +00:00
Takashi Kajinami
dd0082c343 pre-commit: Integrate bandit
Run bandit check from pre-commit so that the check is executed in pep8
job.

Also remove requirements installed automatically by pre-commit from
test-requirements.

Change-Id: I45af8c47afb262882ebbee74ae52446fed741e26
2025-02-10 22:50:34 +09:00
Takashi Kajinami
5f6fbaea56 Remove unused os-api-ref from test requirements
It is used when building API reference but is not used in any testing.

Change-Id: I6af7c7b110b338acad10eccf42344a338afbc915
2025-02-09 08:14:17 +09:00
Takashi Kajinami
6b81b34b27 Drop import fallback for Python 2
cPickle no longer exists in Python 3 and pickle should be used always.

Change-Id: I5ddedb3e996d9a0679bab38ea94263886274ece4
2025-02-09 08:04:36 +09:00
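The kind of fallback the commit above removes typically looked like a `try: import cPickle as pickle` guarded by `except ImportError: import pickle`. On Python 3 the plain import is enough, as in this small sketch (the payload is made up for illustration):

```python
# Python 3 only needs the plain import; the C accelerator is used
# automatically when it is available.
import pickle

payload = pickle.dumps({"audit": 1})
restored = pickle.loads(payload)
```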
Zuul
961bbb9460 Merge "Update master for stable/2024.2" 2025-02-06 08:07:22 +00:00
Zuul
d56e8ee65a Merge "X-Project-Name key in test code was duplicated" 2025-02-03 18:29:23 +00:00
Zuul
4527f89d8d Merge "Add support for instance metrics to prometheus datasource" 2025-02-03 13:22:28 +00:00
Zuul
e535177bc0 Merge "Remove ceilometer datasource" 2025-01-29 13:22:46 +00:00
Zuul
022d150d20 Merge "Add prometheus data source for watcher decision engine" 2025-01-24 13:46:32 +00:00
Alfredo Moralejo
136e5d927c Add support for instance metrics to prometheus datasource
In order to support the vm_workload_consolidation, workload_balance and
workload_stabilization strategies, some instance metrics are required.
This patch adds support for them.

Implementation is based on a prometheus store populated using sg-core
from ceilometer metrics with Pollster source.

- instance_ram_usage: rely on ceilometer_memory_usage metrics created from
  ceilometer memory.usage meter.
- instance_ram_allocated: rely on the memory value provided by the
  inventory created from nova and placement APIs.
- instance_cpu_usage: rely on ceilometer_cpu metric created from
  ceilometer cpu meter. A max value of 100 is set in the query.
- instance_root_disk_size: rely on the `disk` value provided by the
  inventory created from nova and placement APIs.

A new parameter `instance_uuid_label` has been added to the prometheus
datasource configuration to identify the label used to store the value of the
OpenStack instance uuid for each instance metric in prometheus. The default
value is `resource`.

Change-Id: I2f2b56aa002014e511a5e48398ef1da43fc4f5e2
2025-01-23 13:23:04 +01:00
Chandan Kumar (raukadah)
1968334b29 Drop bandit B320 profile to fix tox -e bandit interface
e4da0b351f
drops the B320 profile from the blacklist. Bandit no longer identifies this
profile, leading to tox -e bandit failure.

This profile is not listed here
https://bandit.readthedocs.io/en/latest/plugins/index.html#complete-test-plugin-listing.
so dropping it fixes the issue.

Closes-Bug: #2094789

Change-Id: I8543a507757a22b69d9b8fda500910d2246028c4
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
2025-01-14 16:05:19 +05:30
Zuul
0b78f31e3a Merge "Add Tempest test for Prometheus integration" 2025-01-10 17:04:02 +00:00
Ronelle Landy
56b8c1211a Add Tempest test for Prometheus integration
This review adds a base job to test Watcher
(via devstack/tempest installation) and the
interaction with the newly added
Prometheus data source.

Related change:
https://review.opendev.org/c/openstack/watcher/+/934423

Change-Id: Id9d7d2ded1aae160a97a5f0aa0f7048a9c38e87d
2025-01-10 08:50:04 -05:00
m
3f26dc47f2 Add prometheus data source for watcher decision engine
This adds a new data source for the Watcher decision engine that
implements the watcher.decision_engine.datasources.DataSourceBase.

The related spec was merged at [1].

Implements: blueprint prometheus-datasource

[1] https://review.opendev.org/c/openstack/watcher-specs/+/933300

Change-Id: I6a70c4acc70a864c418cf347f5f6951cb92ec906
2025-01-10 15:20:37 +02:00
OpenStack Proposal Bot
1b6f723cc3 Imported Translations from Zanata
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

Change-Id: I2f92bde2f6eb0d479d0b56742c530a747fa64a70
2025-01-10 04:28:40 +00:00
Zuul
d6cb38289e Merge "reno: Update master for unmaintained/2023.1" 2025-01-10 00:25:12 +00:00
Zuul
406be36c45 Merge "reno: Update master for unmaintained/zed" 2025-01-10 00:25:11 +00:00
Zuul
6bb761a803 Merge "reno: Update master for unmaintained/yoga" 2025-01-10 00:25:09 +00:00
Zuul
a169d42b1f Merge "reno: Update master for unmaintained/xena" 2025-01-10 00:25:08 +00:00
Zuul
4827d6e766 Merge "reno: Update master for unmaintained/victoria" 2025-01-10 00:25:07 +00:00
Zuul
2a2db362e3 Merge "Replace deprecated LegacyEngineFacade" 2025-01-10 00:19:56 +00:00
Zuul
32756dc7b4 Merge "Replace deprecated configure_auth_token_middleware" 2025-01-10 00:06:14 +00:00
Zuul
ee447a2281 Merge "Remove default override for config options policy_file" 2025-01-09 23:35:04 +00:00
Zuul
4d8bb57c8d Merge "tox: Drop envdir" 2025-01-09 23:32:26 +00:00
Zuul
70ba13ca6d Merge "Update python versions, drop py3.8" 2024-12-21 01:58:27 +00:00
Takashi Kajinami
da23fdc621 Remove ceilometer datasource
This datasource requires Ceilometer API which was already removed some
years ago. The implementation should have been removed when dependency
on ceilometerclient was removed by [1].

Also remove some job definitions which are not actually used.

[1] 01d74d0a87

Change-Id: I29c3865dc1207f1bbbb266e4217cf8888afebfb6
2024-12-16 23:51:27 +09:00
Jiri Podivin
2ab27c0dfe X-Project-Name key in test code was duplicated
Change-Id: Ie4938edd4b606c7b84c09c191508b72b8bc8fa52
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2024-12-04 14:05:12 +01:00
Zuul
811a704f80 Merge "Update gate jobs as per the 2025.1 cycle testing runtime" 2024-12-02 19:52:04 +00:00
Zuul
99fea33fac Merge "Fix incompatiablity between apscheduler and eventlet" 2024-12-02 19:13:24 +00:00
Zuul
9d37d705e4 Merge "[pre-commit] enforce pre-commit checks in ci" 2024-12-02 18:37:40 +00:00
Douglas Viroel
fbb290b223 Fix create_continuous_audit_with_wrong_interval test assert
"test_create_continuous_audit_with_wrong_interval" is failing
to validate the expected error message when creating a continuous
audit with a wrong interval. The error message is now slightly
different, since "croniter" was bumped to latest version in openstack
requirements[1].

Closes-Bug: #2089866

[1] 868e0ae644

Change-Id: I33029d224577bd1d5124947f1e6150fe2dbc9456
2024-11-29 10:09:14 -03:00
Ghanshyam Mann
c80c940a4f Update gate jobs as per the 2025.1 cycle testing runtime
As per 2025.1 testing runtime[1], we need to test on Ubuntu
Noble (which will be taken care by depends-on tempest and devstack
patches to move base jobs to Noble) and at least single job to run on
Ubuntu Jammy (for smooth upgrade from previous releases).

This commit adds a new job to run on Jammy which can be removed
in a future cycle when the testing runtime tests the next version of Ubuntu
as default.

Depends-On: https://review.opendev.org/c/openstack/tempest/+/932156
Depends-On: https://review.opendev.org/c/openstack/watcher/+/933062

[1] https://governance.openstack.org/tc/reference/runtimes/2025.1.html

Change-Id: I1bc11633f4739bc87c7741496a2972ab99c9b08b
2024-11-25 18:27:25 +00:00
Sean Mooney
f07694ba6c Fix incompatiablity between apscheduler and eventlet
The apscheduler background scheduler spawns a native thread
which is not monkey patched and which interacts with shared module
level objects like the module level LOG instances and SQLAlchemy
engine facades.

This is unsafe and leads to mixing patched and unpatched
code in the same thread.

This manifests in 2 ways:
1.) https://paste.opendev.org/show/bGPgfURx1cZYOsgmtDyw/
SQLAlchemy calls can fail due to a time.sleep(0) in oslo.db being invoked
using the unpatched time module in an eventlet greenthread.
2.) https://paste.opendev.org/show/b5C2Zz4A4BFIGbKLKrQU/
Over time that caused the SQLAlchemy connection queuepool to fill up, preventing
background tasks such as reconciling audits from running.

This change addresses this by overloading the background scheduler _main_loop
to monkey patch the main loop if the calling thread was monkey patched.

Closes-Bug: #2086710
Change-Id: I672c183274b0a17cb40d7b5ab8c313197760b5a0
2024-11-25 18:27:18 +00:00
Sean Mooney
9abec18c8b [pre-commit] enforce pre-commit checks in ci
This change moves all style checks to be run via pre-commit.

To enable this in existing ci and preserve the standard developer flow
the tox pep8 target is updated to run all checks via pre-commit.

developers can optionally install pre-commit and/or the pre-commit
commit hook to automatically or manually run the precommit hooks.

Change-Id: I6ee6ed853dbf60339e7bf3da66b2e5914c218f76
2024-11-19 00:43:39 +00:00
Sean Mooney
1f8d06e075 [docs] apply sphinx-lint to docs
This change corrects the detected sphinx-lint issues in the existing
docs and updates the contributor devstack guide to call out
required and advanced.

mostly the changes were simple fixes like replacing the configurable
default rule with explicit literal syntax `term` -> ``term``

some inline Note: comments have been promoted to .. note:: blocks
and literal blocks ::  have been promoted to .. code-block:: <language>
directives.

Change-Id: I6320c313d22bf542ad407169e6538dc6acf79901
2024-11-19 00:43:36 +00:00
Takashi Kajinami
29c94c102b Replace deprecated configure_auth_token_middleware
It was deprecated some years ago by [1].

[1] https://review.opendev.org/628651

Change-Id: Id5bb081a745a0698ce0d297c098394bfd1ad6788
2024-11-15 17:36:18 +09:00
OpenStack Release Bot
3f3e660367 reno: Update master for unmaintained/2023.1
Update the 2023.1 release notes configuration to build from
unmaintained/2023.1.

Change-Id: I99964d16a09d6b24505eda8444f074a38ce4a2d7
2024-11-12 16:44:04 +00:00
Ghanshyam Mann
2eefaeed14 Remove default override for config options policy_file
oslo.policy 4.5.0[1] changed the config options policy_file
default value to 'policy.yaml', which means it is changed
for all the OpenStack services and they do not need to
override the default anymore.

NOTE: There is no change in behaviour here, oslo.policy provides
the same configuration that services have overridden till now.

[1] https://review.opendev.org/c/openstack/releases/+/934012
[2] https://review.opendev.org/c/openstack/requirements/+/934295

Change-Id: I46cc9e05fbc8f6c95c0b2d50093ecfb070a4170f
2024-11-10 21:36:55 -08:00
Sean Mooney
5fadd0de57 [pre-commit] Fix execute and shebang lines
This commit removes the execute bit from several files
and removes the shebang lines from the devstack plugin.

While the devstack plugin is written in bash, it is not an executable
script. The devstack plugin is sourced by devstack as needed,
as such it is not executed in a subshell and the #!/bin/bash
lines are not used even when present.

Change-Id: I82ca22b7a47bf267fe6cf11f3e3519510108c146
2024-11-07 20:12:59 +00:00
Sean Mooney
c5edad2246 [eventlet] Ensure unit tests are monkey patched
This change refactors how watcher manages monkey_patching
modules to achieve 2 goals.

First, we want to ensure the watcher code is tested as it is used
in production. While many tests can run without eventlet,
the existing unit tests depend on eventlet monkey patching
indirectly by importing watcher code that uses eventlet.spawn and
greenthread executors. While that mostly functions today it has
incorrect and inconsistent behaviour on Python 3.9 vs Python 3.12.

Second, the unit tests that test the cmd module were indirectly
monkey patching the test executor during the execution of the tests
as a side effect of importing watcher.cmd. As such the order the tests
execute in and how they are distributed across test workers changed
if the test was monkey-patched or not.

This change makes all tests run with monkey_patching by adding
monkey patching in the watcher/tests/__init__.py
This change also splits the monkey patching from the import
in preparation for an eventual removal of eventlet in a future
release.

Change-Id: I967f3469bd66e69c00863d553bc859343afbb3ff
2024-11-07 19:50:59 +00:00
Sean Mooney
405bb93030 [tox] update tox.ini to enable debugging
This change adds support for OS_DEBUG and also configures
default testing timeouts and log capture.

Change-Id: I685fee4081cdee82c508b6d25c534483f2caf09b
2024-11-07 19:50:33 +00:00
Sean Mooney
5f79ab87c7 [pre-commit] fix typos and configure codespell
This change enables codespell in pre-commit and
fixes the existing typos.

A followup commit will enable this in tox and ci.

Change-Id: I0a11bcd5a88247a48d3437525fc8a3cb3cdd4e58
2024-11-07 19:50:21 +00:00
Zuul
4d5022ab94 Merge "reno: Update master for unmaintained/wallaby" 2024-11-07 18:08:05 +00:00
Martin Kopec
6adaedf696 Update python versions, drop py3.8
The current testing runtime [1] states testing from py3.9
to 3.12. The patch updates setup.cfg to reflect the correct
python versions.

The patch also drops python 3.8 support following [2].

[1] https://governance.openstack.org/tc/reference/runtimes/2025.1.html
[2] https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/FOWV4UQZTH4DPDA67QDEROAESYU5Z3LE/

Change-Id: I2d13409c9bfffc866e31af52611a26f6037021cc
2024-11-06 16:00:11 +01:00
OpenStack Release Bot
f3ff65f233 Update master for stable/2024.2
Add file to the reno documentation build to show release notes for
stable/2024.2.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2024.2.

Sem-Ver: feature
Change-Id: I84f9b0b1aa9749fee8ac174ae6d15c62a934d641
2024-11-01 13:48:05 +01:00
Takashi Kajinami
b5e45b43b9 Drop unnecessary 'x' bit from doc config file
This file is not actually executable.

Trivial-Fix

Change-Id: I64352c3c5c6bfd5d08aa4cee873016e02d736a2e
2024-10-28 13:13:24 +00:00
Zuul
61afdd3df7 Merge "Update master for stable/2024.1" 2024-10-28 01:16:55 +00:00
Zuul
e8f9e31541 Merge "[pre-commit] Add initial pre-commit config" 2024-10-24 04:23:49 +00:00
Ghanshyam Mann
38288dd9c8 Run watcher-db-manage in grenade testing from venv
grenade installs and runs everything from a virtual env

- https://review.opendev.org/c/openstack/grenade/+/930507

watcher-db-manage in watcher grenade job needs to be run accordingly
and not from system level. Otherwise it will fail with below error
- https://zuul.opendev.org/t/openstack/build/02c3bd4814ea4d0580f7dfd346416425/log/controller/logs/grenade.sh_log.txt

Depends-On: https://review.opendev.org/c/openstack/watcher/+/933062

Change-Id: I73e94222c89c6a12a6006d42637cd194a09005ac
2024-10-23 18:34:43 +00:00
Sean Mooney
9d8b990fd1 [pre-commit] Add initial pre-commit config
This change adds configuration for the pre-commit tool,
follow-up changes will address the remaining issues in a phased
approach to make the reviews simpler.

This is based on the pre-commit config used in nova
with some additional hooks.

Follow-up changes will address the FIXME comments
related to sphinx-lint and codespell, as well as update tox
to enforce these checks in ci.

Change-Id: I87681a19f7fa88366c2b0d310c8b3153aa6a137b
2024-10-22 20:12:53 +01:00
Zuul
0f96f99404 Merge "Convert CRLF to LF" 2024-10-17 03:11:48 +00:00
Zuul
57177aebb2 Merge "Replace deprecated datetime.utcnow()" 2024-10-17 02:40:39 +00:00
Takashi Kajinami
2c4fb7a990 tox: Drop envdir
tox now always recreates an env although the env is shared using envdir
options.
~~~
$ tox -e genpolicy
genpolicy: recreate env because env type changed from
{'name': 'genconfig', 'type': 'VirtualEnvRunner'} to
{'name': 'genpolicy', 'type': 'VirtualEnvRunner'}
~~~

According to the maintainer of tox, this functionality is not intended
to be supported.
https://github.com/tox-dev/tox/issues/425#issuecomment-1011944293

Change-Id: I9c1f574c6d45a7be808a023f01dee13c3ac2c72e
2024-10-13 01:31:25 +09:00
Takashi Natsume
61a7dd85ca Replace deprecated datetime.utcnow()
The datetime.utcnow() is deprecated in Python 3.12.
Replace datetime.utcnow() with oslo_utils.timeutils.utcnow().
This bumps oslo.utils to 7.0.0.

Change-Id: Icccbb0549add686a744a72b354932471cbf91c92
Signed-off-by: Takashi Natsume <takanattie@gmail.com>
2024-10-02 22:24:47 +09:00
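The commit above switches to `oslo_utils.timeutils.utcnow()`. For readers without oslo.utils at hand, a standard-library stand-in that avoids the deprecated call could look like this sketch. The helper name is hypothetical, and returning a naive datetime is a choice made here to mirror what `datetime.utcnow()` returned.

```python
import datetime


def naive_utcnow():
    """Hypothetical stdlib stand-in for the deprecated datetime.utcnow()."""
    aware = datetime.datetime.now(datetime.timezone.utc)
    # Drop tzinfo so callers that compare against naive datetimes keep working.
    return aware.replace(tzinfo=None)


now = naive_utcnow()
```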
Takashi Kajinami
a7dd51390c Remove workaround for eventlet < 0.27.0
This code worked around a bug in eventlet[1] that has been fixed in
115103d5608cbe8f15df10e27eba1644f5364e95. The fix has been available in
every eventlet release since v0.27.0.

[1] https://github.com/eventlet/eventlet/issues/592

Co-Authored-By: Cyril Roelandt <cyril@redhat.com>
Change-Id: Ifc0b9c1d7f022db54c34c48c903a1719f9404d04
2024-09-28 14:53:04 +09:00
Takashi Kajinami
a47cedecfa Convert CRLF to LF
LF is commonly used as newline code.

Change-Id: I9b40461bdb67ba3e650c694da3c3bc9ac0335dd7
2024-09-28 00:30:14 +09:00
Takashi Kajinami
566a830f64 Bump hacking
hacking 3.0.x is quite old. Bump it to the current latest version.

Change-Id: I8d87fed6afe5988678c64090af261266d1ca20e6
2024-09-22 23:54:36 +09:00
Tobias Urdin
5c627a3aa3 Replace deprecated LegacyEngineFacade
LegacyEngineFacade was deprecated in oslo.db 1.12.0 which was released
in 2015.

Change-Id: I5570698262617eae3f48cf29aacf2e23ad541e5f
2024-08-27 16:45:15 +02:00
OpenStack Proposal Bot
a9dc3794a6 Imported Translations from Zanata
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

Change-Id: I2b2afb0c0e590b737871bf4c43293df2ed88e534
2024-06-01 02:47:52 +00:00
Takashi Kajinami
d6f169197e SQLAlchemy 2.0: Omnibus fixes patch
This was originally five patches, but they are all needed to pass
any of the test jobs now, so they have been squashed into one:

Co-Authored-By: Dan Smith (dms@danplanet.com)

First:

The autoload argument was removed[1] in SQLAlchemy and only
the autoload_with argument should be passed.

The autoload argument is set according to the autoload_with argument
automatically even in SQLAlchemy 1.x[2] so is not at all needed.

[1] c932123bac
[2] ad8f921e96

Second:

Remove _warn_on_bytestring for newer SA, AFAICT, this flag has been
removed from SQLAlchemy and that is why watcher-db-manage fails to
initialize the DB for me on jammy. This migration was passing the
default value (=False) anyway, so I assume this is the right "fix".

Third:

Fix joinedload passing string attribute names

Fourth:

Fix engine.select pattern to use begin() per the migration guide.

Fifth:

Override the apscheduler get_next_run_time() which appears to be
trivially not compatible with SQLAlchemy 2.0 because of a return type
from scalar().

Change-Id: I000e5e78f97f82ed4ea64d42f1c38354c3252e08
2024-05-29 06:49:32 -07:00
OpenStack Release Bot
2bc49149b3 reno: Update master for unmaintained/zed
Update the zed release notes configuration to build from
unmaintained/zed.

Change-Id: Ie3eac64cc4cf48aa761a4f7c9d7ba06fbab28686
2024-05-08 11:00:31 +00:00
James Page
bc5922c684 Fix oslo.db >= 15.0.0 compatibility
Minimal refactor of SQLAlchemy api module to be compatible with
oslo.db >= 15.0.0 where autocommit behaviour was dropped.

Closes-Bug: #2056181
Change-Id: I33be53f647faae2aad30a43c10980df950d5d7c2
2024-03-27 09:41:23 +00:00
OpenStack Release Bot
f0935fb3e1 Update master for stable/2024.1
Add file to the reno documentation build to show release notes for
stable/2024.1.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2024.1.

Sem-Ver: feature
Change-Id: I9eb6462199bedb3bbc24ba853ebf52ac7d93353f
2024-03-15 14:16:10 +00:00
OpenStack Release Bot
762686e99e reno: Update master for unmaintained/xena
Update the xena release notes configuration to build from
unmaintained/xena.

Change-Id: I87f52f55d6997be166dd307df327ae38b9049791
2024-03-05 20:13:43 +00:00
OpenStack Release Bot
0f0527abc1 reno: Update master for unmaintained/wallaby
Update the wallaby release notes configuration to build from
unmaintained/wallaby.

Change-Id: I4ad95adf3f67786c97c0dbe562df3ae2a85fa1c8
2024-03-05 20:13:20 +00:00
OpenStack Release Bot
6e26e41519 reno: Update master for unmaintained/victoria
Update the victoria release notes configuration to build from
unmaintained/victoria.

Change-Id: Ib81d05e74ff5514a9abf0c68ac8c0494bcb93bc9
2024-03-05 20:12:57 +00:00
OpenStack Release Bot
954fc282ee reno: Update master for unmaintained/yoga
Update the yoga release notes configuration to build from
unmaintained/yoga.

Change-Id: Ic586db766f6af33c2e70c3f6fd2fd313b44a6ab8
2024-02-05 16:00:40 +00:00
Ghanshyam Mann
9d58a6d457 Update python classifier in setup.cfg
As per the current release tested runtime, we test
python version from 3.8 to 3.11 so updating the
same in python classifier in setup.cfg

Change-Id: Ie010eea38eb0861699b60f16dfd3e2e95ae33709
2024-01-09 19:22:04 -08:00
Lucian Petrut
c95ce4ec17 Add MAAS support
At the moment, Watcher can use a single bare metal provisioning
service: Openstack Ironic.

We're now adding support for Canonical's MAAS service [1], which
is commonly used along with Juju [2] to deploy Openstack.

In order to do so, we're building a metal client abstraction, with
concrete implementations for Ironic and MAAS. We'll pick the MAAS
client if the MAAS url is provided, otherwise defaulting to Ironic.

For now, we aren't updating the baremetal model collector since it
doesn't seem to be used by any of the existing Watcher strategy
implementations.

[1] https://maas.io/docs
[2] https://juju.is/docs

Implements: blueprint maas-support

Change-Id: I6861995598f6c542fa9c006131f10203f358e0a6
2023-12-11 10:21:33 +00:00
Zuul
9492c2190e Merge "vm workload consolidation: use actual host metrics" 2023-12-01 01:51:39 +00:00
Lucian Petrut
808f1bcee3 Update action json schema
Power-off actions created by the energy saving strategy include
a resource name property, which currently isn't part of the
action json schema. For this reason, json schema validation fails.

  Additional properties are not allowed ('resource_name' was unexpected)

We'll update the json schema, including the resource name property.

Change-Id: I924d36732a917c0be98b08c2f4128e9136356215
2023-11-15 01:11:56 +00:00
Lucian Petrut
3b224b5629 Fix object tests
A couple of object tests are failing, probably after a dependency
bump.

watcher.objects.base.objects is mocked, so the registered object
version isn't properly retrieved, leading to a type error:

    File "/mnt/data/workspace/watcher/watcher/tests/objects/test_objects.py",
    line 535, in test_hook_chooses_newer_properly
      reg.registration_hook(MyObj, 0)
    File "/mnt/data/workspace/watcher/watcher/objects/base.py",
    line 46, in registration_hook
      cur_version = versionutils.convert_version_to_tuple(
    File "/home/ubuntu/openstack_venv/lib/python3.10/site-packages/oslo_utils/versionutils.py",
    line 91, in convert_version_to_tuple
      version_str = re.sub(r'(\d+)(a|alpha|b|beta|rc)\d+$', '\\1', version_str)
    File "/usr/lib/python3.10/re.py", line 209, in sub
      return _compile(pattern, flags).sub(repl, string, count)
  TypeError: expected string or bytes-like object

We'll solve the issue by setting the VERSION attribute against
the mock object.

Change-Id: Ifeb38b98f1d702908531de5fc5c846bd1c53de4b
2023-11-14 10:38:40 +00:00
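The failure described in this commit can be reproduced in isolation. The sketch below is only an illustration: `convert_version_to_tuple` is a simplified stand-in for the oslo.utils helper named in the traceback, and `parse_registered_version` is a hypothetical wrapper, not Watcher code. It shows that an auto-created mock attribute is not a string, so `re.sub` raises `TypeError` until `VERSION` is set explicitly on the mock.

```python
import re
from unittest import mock


def convert_version_to_tuple(version_str):
    """Simplified stand-in for the oslo.utils helper seen in the traceback."""
    version_str = re.sub(r'(\d+)(a|alpha|b|beta|rc)\d+$', '\\1', version_str)
    return tuple(int(part) for part in version_str.split('.'))


def parse_registered_version(obj_cls):
    """Return the parsed VERSION, or None when VERSION is not a string."""
    try:
        return convert_version_to_tuple(obj_cls.VERSION)
    except TypeError:
        # A MagicMock attribute is itself a MagicMock, which re.sub rejects.
        return None


mocked_cls = mock.MagicMock()
before = parse_registered_version(mocked_cls)   # VERSION is an auto-created mock

mocked_cls.VERSION = '1.5'                      # the fix: set a real string
after = parse_registered_version(mocked_cls)
```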
Lucian Petrut
424e9a76af vm workload consolidation: use actual host metrics
The "vm workload consolidation" strategy is summing up instance
usage in order to estimate host usage.

The problem is that some infrastructure services (e.g. OVS or Ceph
clients) may also use a significant amount of resources, which
would be ignored. This can impact Watcher's ability to detect
overloaded nodes and correctly rebalance the workload.

This commit will use the host metrics, if available. The proposed
implementation uses the maximum value between the host metric
and the sum of the instance metrics.

Note that we're holding a dict of host metric deltas in order to
account for planned migrations.

Change-Id: I82f474ee613f6c9a7c0a9d24a05cba41d2f68edb
2023-10-27 21:54:42 +03:00
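A minimal sketch of the estimate described in this commit, under stated assumptions: the function and argument names are hypothetical, and the way the planned-migration delta is added to the host metric is an illustrative guess rather than the strategy's actual code. The core idea is taken from the message itself: use the larger of the host metric and the sum of the instance metrics.

```python
from collections import defaultdict


def record_planned_migration(deltas, source, destination, usage):
    """Track how a planned migration shifts usage between two hosts."""
    deltas[source] -= usage
    deltas[destination] += usage


def estimate_host_usage(host_metric, instance_metrics, planned_delta=0.0):
    """Estimate usage for one host.

    host_metric: measured host usage, or None if no host metric is available.
    instance_metrics: usage values of the instances placed on the host.
    planned_delta: net change from migrations that are planned but not done.
    """
    instance_total = sum(instance_metrics)
    if host_metric is None:
        # Fall back to the previous behaviour: sum of instance usage only.
        return instance_total
    # Infrastructure services (e.g. OVS, Ceph clients) show up only in the
    # host metric, so take whichever value is larger.
    return max(host_metric + planned_delta, instance_total)


deltas = defaultdict(float)
record_planned_migration(deltas, "node-a", "node-b", 10.0)
```

With these hypothetical numbers, `node-b` measured at 50.0 with two instances of 10.0 each is estimated at 60.0 once the planned migration is taken into account.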
Zuul
40e93407c7 Merge "Handle deprecated "cpu_util" metric" 2023-10-27 09:47:38 +00:00
Zuul
721aec1cb6 Merge "vm workload consolidation: allow cold migrations" 2023-10-27 09:47:36 +00:00
Zuul
8a3ee8f931 Merge "Improve vm_consolidation logging" 2023-10-27 09:20:13 +00:00
Lucian Petrut
00fea975e2 Handle deprecated "cpu_util" metric
The "cpu_util" metric was deprecated a few years ago.
We'll obtain the same result by converting the cumulative cpu
time to a percentage, leveraging the rate of change aggregation.

Change-Id: I18fe0de6f74c785e674faceea0c48f44055818fe
2023-10-24 10:47:23 +00:00
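The conversion mentioned in this commit amounts to simple arithmetic. The sketch below assumes the cumulative CPU time is reported in nanoseconds and that the rate-of-change aggregation yields the CPU time consumed during one sampling interval. The function name and parameters are hypothetical.

```python
def cpu_util_percent(cpu_time_delta_ns, interval_seconds, vcpus):
    """Convert CPU time consumed in one interval into a utilisation percentage.

    cpu_time_delta_ns: CPU time used during the interval, in nanoseconds
                       (the rate of change of the cumulative counter).
    interval_seconds:  length of the sampling interval.
    vcpus:             number of vCPUs available to the instance.
    """
    if interval_seconds <= 0 or vcpus <= 0:
        raise ValueError("interval_seconds and vcpus must be positive")
    # Total CPU time that was available during the interval.
    available_ns = interval_seconds * 1e9 * vcpus
    return 100.0 * cpu_time_delta_ns / available_ns
```

For example, 30 seconds of CPU time consumed over a 60-second interval on a single vCPU corresponds to 50 percent utilisation.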
Lucian Petrut
fd6562382e Avoid performing retries in case of missing resources
There may be no available metrics for instances that are stopped
or were recently spawned. This makes retries unnecessary and time
consuming.

For this reason, we'll ignore gnocchi MetricNotFound errors.

Change-Id: I79cd03bf04db634b931d6dfd32d5150f58e82044
2023-10-23 14:14:21 +00:00
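The retry behaviour described in this commit can be sketched as follows. This is not Watcher's datasource helper: `MetricNotFound` is a local stand-in for the gnocchi client exception, and `query_with_retries` is a hypothetical function that only illustrates the early exit.

```python
import time


class MetricNotFound(Exception):
    """Stand-in for the 'metric not found' error raised by the datasource."""


def query_with_retries(query, attempts=3, delay=0.0):
    """Call query() up to `attempts` times, but never retry a missing metric."""
    for _ in range(attempts):
        try:
            return query()
        except MetricNotFound:
            # Stopped or freshly spawned instances have no metrics yet;
            # retrying cannot help, so give up immediately.
            return None
        except Exception:
            # Other failures may be transient: wait, then try again.
            time.sleep(delay)
    return None
```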
Lucian Petrut
ec90891636 Improve vm_consolidation logging
We're adding a few info log messages in order to trace the
"vm consolidation" strategy more easily.

Change-Id: I8ce1a9dd173733f1b801839d3ad0c1269c4306bb
2023-10-23 14:10:02 +00:00
Lucian Petrut
7336a48057 vm workload consolidation: allow cold migrations
Although Watcher supports cold migrations, the vm workload
consolidation workflow only allows live migrations to be
performed.

We'll remove this unnecessary limitation so that stopped instances
could be cold migrated.

Change-Id: I4b41550f2255560febf8586722a0e02045c3a486
2023-10-23 13:03:18 +00:00
Lucian Petrut
922478fbda Unblock the CI gate
The Nova collector json schema validation started [1][2] failing after
the jsonschema upper constraint was bumped from 4.17.3 to 4.19.1 [3].

The reason is that jsonschema v4.18.0a1 switched to a reference
resolving library [4], which treats the aggregate "id" as a jsonschema
id and expects it to be a string [5]. For this reason, we're now getting
AttributeError exceptions.

As a workaround, we'll rename the "id" ref element as "host_aggr_id".

Also, the watcher-tempest-multinode job is configured to use Focal,
which is no longer supported by Devstack [6]. That being considered,
we'll switch to Ubuntu Jammy (22.04).

While at it, we're disabling Cinder Backup, which isn't used while
testing Watcher. It currently causes Devstack failures since it
uses the Swift backend by default, which is disabled.

[1] https://paste.opendev.org/raw/bjQ1uIdbDMnmA1UEhxLL/
[2] https://paste.opendev.org/raw/bNgxqulBwBLYB7tNhrU4/
[3] ab0dcbdda2
[4] https://github.com/python-jsonschema/jsonschema/releases/tag/v4.18.0a1
[5] c23a5dc1c9/referencing/jsonschema.py (L54-L55C18)
[6] https://paste.openstack.org/raw/bSoSyXgbtmq6d9768HQn/

Change-Id: I300620c2ec4857b1e0d402a9b57a637f576eeb24
2023-10-23 09:21:55 +03:00
OpenStack Release Bot
9f0eca2343 Update master for stable/2023.2
Add file to the reno documentation build to show release notes for
stable/2023.2.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.2.

Sem-Ver: feature
Change-Id: I8a0c75ce5a4e5ae5cccd8eb1cb0325747a619122
2023-09-14 01:24:43 +00:00
Zuul
1e11c490a7 Merge "Add timeout option for Grafana request" 2023-08-29 11:21:46 +00:00
Zuul
8a7a8db661 Merge "Imported Translations from Zanata" 2023-08-28 06:21:40 +00:00
BubaVV
0610070e59 Add timeout option for Grafana request
Implemented a config option to set the Grafana API request timeout.

Change-Id: I8cbf8ce22f199fe22c0b162ba1f419169881f193
2023-08-23 17:46:19 +03:00
OpenStack Proposal Bot
a0997a0423 Imported Translations from Zanata
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

Change-Id: I37201577bd8d9c53db8ce6700f47d911359da6d2
2023-08-14 04:24:29 +00:00
chenker
4ea3eada3e Fix watcher comment
Change-Id: I4512cf1032e08934886d5e3ca858b3e05c3da76c
2023-08-13 00:00:12 +00:00
Zuul
cd1c0f3054 Merge "Imported Translations from Zanata" 2023-03-08 07:04:33 +00:00
OpenStack Proposal Bot
684350977d Imported Translations from Zanata
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

Change-Id: I4ee251e6d37a1b955c22dc6fdc04c1a08c9ae9b8
2023-03-02 03:28:31 +00:00
OpenStack Release Bot
d28630b759 Update master for stable/2023.1
Add file to the reno documentation build to show release notes for
stable/2023.1.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/2023.1.

Sem-Ver: feature
Change-Id: Ia585893e7fef42e9991a2b81f604d1ff28c0a5ad
2023-02-28 13:31:08 +00:00
Zuul
f7fbaf46a2 Merge "Use new get_rpc_client API from oslo.messaging" 2023-02-09 01:25:15 +00:00
Zuul
e7cda537e7 Merge "Modify saving_energy log info" 2023-02-07 12:18:58 +00:00
chenker
c7be34fbaa update saving_energy docs
Change-Id: I3b0c86911a8d32912c2de2e2392af9539b8d9be0
2023-02-07 10:27:54 +00:00
chenker
52da088011 Modify saving_energy log info
Change-Id: I84879a453aa3ff78917d1136c62978b9d0e606de
2023-02-07 10:20:04 +00:00
Tobias Urdin
6ac3a6febf Fix passenv in tox.ini
Change-Id: If1ddb1d48eeb96191bcbfadd1a5e14f4350a02e4
2023-02-07 08:02:20 +00:00
Tobias Urdin
e36b77ad6d Use new get_rpc_client API from oslo.messaging
Use the new API that is consistent with
the existing API instead of instantiating the client
class directly.

This was introduced in release 14.1.0 here [1] and
added into oslo.messaging here [2]

[1] https://review.opendev.org/c/openstack/requirements/+/869340
[2] https://review.opendev.org/c/openstack/oslo.messaging/+/862419

Change-Id: I43c399a0c68473e40b8b71e9617c8334a439e675
2023-01-19 20:50:26 +00:00
Thierry Carrez
6003322711 Move queue declaration to project level
This moves the watcher queue declaration from the pipeline level
(where it is no longer valid) to the project level.

https://lists.openstack.org/pipermail/openstack-discuss/2022-May/028603.html
Change-Id: I06923abb00f7eecd59587f44cd1f6a069e88a9fc
2022-09-26 14:19:58 +02:00
Zuul
f4ffca01b8 Merge "Switch to 2023.1 Python3 unit tests and generic template name" 2022-09-16 06:36:21 +00:00
Alfredo Moralejo
5d70c207cd Fix compatibility with oslo.db 12.1.0
oslo.db 12.1.0 has changed the default value for the 'autocommit'
parameter of 'LegacyEngineFacade' from 'True' to 'False'. This is a
necessary step to ensure compatibility with SQLAlchemy 2.0. However, we
are currently relying on the autocommit behavior and need changes to
explicitly manage sessions. Until that happens, we need to override the
default.

Co-Authored-By: Stephen Finucane <stephenfin@redhat.com>
Change-Id: I7db39d958d087322bfa0aad70dfbd04de9228dd7
2022-09-15 16:52:41 +02:00
OpenStack Release Bot
0b2e641d00 Switch to 2023.1 Python3 unit tests and generic template name
This is an automatically generated patch to ensure unit testing
is in place for all of the tested runtimes for antelope. It also
updates the template name to the generic one.

See also the PTI in governance [1].

[1]: https://governance.openstack.org/tc/reference/project-testing-interface.html

Change-Id: Ide6c6c398f8e6cdd590c6620a752ad802a1f5cf8
2022-09-13 12:30:33 +00:00
OpenStack Release Bot
ff84b052a5 Update master for stable/zed
Add file to the reno documentation build to show release notes for
stable/zed.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/zed.

Sem-Ver: feature
Change-Id: I1726e33a14038712dbb9fd5e5c0cddf8ad872e69
2022-09-13 12:30:32 +00:00
Zuul
a43b040ebc Merge "Imported Translations from Zanata" 2022-08-30 10:44:52 +00:00
Zuul
749fa2507a Merge "Tests: fix requirements for unit tests" 2022-08-30 08:15:05 +00:00
OpenStack Proposal Bot
76d61362ee Imported Translations from Zanata
For more information about this automatic import see:
https://docs.openstack.org/i18n/latest/reviewing-translation-import.html

Change-Id: I95133dece6fdaf931dfed64015806430ba8d04f0
2022-08-29 04:12:15 +00:00
wangjiaqi07
c55143bc21 remove unicode from code
Change-Id: I747445d482a2fb40c2f39139c5fd2a0cb26c27bc
2022-08-19 14:17:10 +08:00
suzhengwei
7609df3370 Tests: fix requirements for unit tests
Add WebTest to test-requirements which used to be imported as a
transitive requirement via pecan, but the latest release of
pecan dropped this dependency. So make this requirement explicit.

Related-Bug: #1982110
Change-Id: I4852be23b489257aaa56d3fa22d27f72bcabf919
2022-07-28 16:14:13 +08:00
chenker
b57eac12cb Watcher DB upgrade compatibility consideration for add_apscheduler_jobs
Change-Id: I8896ff5731bb8c1bf88a5d7b926bd2a884100ea8
2022-04-28 02:21:06 +00:00
OpenStack Release Bot
ac6911d3c4 Add Python3 zed unit tests
This is an automatically generated patch to ensure unit testing
is in place for all of the tested runtimes for zed.

See also the PTI in governance [1].

[1]: https://governance.openstack.org/tc/reference/project-testing-interface.html

Change-Id: I5cf874842550de18ff777b909fd28e2c32e6d530
2022-03-10 12:14:06 +00:00
OpenStack Release Bot
23c2010681 Update master for stable/yoga
Add file to the reno documentation build to show release notes for
stable/yoga.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/yoga.

Sem-Ver: feature
Change-Id: Ic7c275b38fef9afc29577f81fe92546bb94b2930
2022-03-10 12:14:04 +00:00
zhurong
01d74d0a87 Remove ceilometerclient dependency
Change-Id: Ifa0f2493aa8414a29dc2722b6636a33bc5808be6
2022-01-07 05:48:22 +00:00
OpenStack Release Bot
e4fab0ce7f Add Python3 yoga unit tests
This is an automatically generated patch to ensure unit testing
is in place for all of the tested runtimes for yoga.

See also the PTI in governance [1].

[1]: https://governance.openstack.org/tc/reference/project-testing-interface.html

Change-Id: I328b3ccb76153fa0dbb4d174dd976412be049200
2021-09-15 17:14:09 +00:00
OpenStack Release Bot
76ecaaeb3a Update master for stable/xena
Add file to the reno documentation build to show release notes for
stable/xena.

Use pbr instruction to increment the minor version number
automatically so that master versions are higher than the versions on
stable/xena.

Sem-Ver: feature
Change-Id: If1c02305a153575c6a550844b0c6f45b74ea5ef3
2021-09-15 17:14:07 +00:00
403 changed files with 13864 additions and 3540 deletions

.pre-commit-config.yaml

@@ -0,0 +1,62 @@
---
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
# whitespace
- id: trailing-whitespace
- id: mixed-line-ending
args: ['--fix', 'lf']
exclude: '.*\.(svg)$'
- id: check-byte-order-marker
# file format and permissions
- id: check-ast
- id: debug-statements
- id: check-json
files: .*\.json$
- id: check-yaml
files: .*\.(yaml|yml)$
- id: check-executables-have-shebangs
- id: check-shebang-scripts-are-executable
# git
- id: check-added-large-files
- id: check-case-conflict
- id: detect-private-key
- id: check-merge-conflict
- repo: https://github.com/Lucas-C/pre-commit-hooks
rev: v1.5.5
hooks:
- id: remove-tabs
exclude: '.*\.(svg)$'
- repo: https://opendev.org/openstack/hacking
rev: 7.0.0
hooks:
- id: hacking
additional_dependencies: []
exclude: '^(doc|releasenotes|tools)/.*$'
- repo: https://github.com/PyCQA/bandit
rev: 1.8.3
hooks:
- id: bandit
args: ['-x', 'tests', '-s', 'B101,B311,B320']
- repo: https://github.com/hhatto/autopep8
rev: v2.3.2
hooks:
- id: autopep8
files: '^.*\.py$'
- repo: https://github.com/codespell-project/codespell
rev: v2.4.1
hooks:
- id: codespell
args: ['--ignore-words=doc/dictionary.txt']
- repo: https://github.com/sphinx-contrib/sphinx-lint
rev: v1.0.0
hooks:
- id: sphinx-lint
args: [--enable=default-role]
files: ^doc/|^releasenotes/|^api-guide/
types: [rst]
- repo: https://github.com/PyCQA/doc8
rev: v1.1.2
hooks:
- id: doc8

@@ -1,95 +1,24 @@
- project:
templates:
- check-requirements
- openstack-cover-jobs
- openstack-python3-xena-jobs
- publish-openstack-docs-pti
- release-notes-jobs-python3
check:
jobs:
- watcher-tempest-functional
- watcher-grenade
- watcher-tempest-strategies
- watcher-tempest-actuator
- watcherclient-tempest-functional
- watcher-tempest-functional-ipv6-only
gate:
queue: watcher
jobs:
- watcher-tempest-functional
- watcher-tempest-functional-ipv6-only
- job:
name: watcher-tempest-dummy_optim
parent: watcher-tempest-multinode
vars:
tempest_test_regex: watcher_tempest_plugin.tests.scenario.test_execute_dummy_optim
- job:
name: watcher-tempest-actuator
parent: watcher-tempest-multinode
vars:
tempest_test_regex: watcher_tempest_plugin.tests.scenario.test_execute_actuator
- job:
name: watcher-tempest-basic_optim
parent: watcher-tempest-multinode
vars:
tempest_test_regex: watcher_tempest_plugin.tests.scenario.test_execute_basic_optim
- job:
name: watcher-tempest-vm_workload_consolidation
parent: watcher-tempest-multinode
vars:
tempest_test_regex: watcher_tempest_plugin.tests.scenario.test_execute_vm_workload_consolidation
devstack_local_conf:
test-config:
$WATCHER_CONFIG:
watcher_strategies.vm_workload_consolidation:
datasource: ceilometer
- job:
name: watcher-tempest-workload_balancing
parent: watcher-tempest-multinode
vars:
tempest_test_regex: watcher_tempest_plugin.tests.scenario.test_execute_workload_balancing
- job:
name: watcher-tempest-zone_migration
parent: watcher-tempest-multinode
vars:
tempest_test_regex: watcher_tempest_plugin.tests.scenario.test_execute_zone_migration
- job:
name: watcher-tempest-host_maintenance
parent: watcher-tempest-multinode
vars:
tempest_test_regex: watcher_tempest_plugin.tests.scenario.test_execute_host_maintenance
- job:
name: watcher-tempest-storage_balance
parent: watcher-tempest-multinode
vars:
tempest_test_regex: watcher_tempest_plugin.tests.scenario.test_execute_storage_balance
devstack_local_conf:
test-config:
$TEMPEST_CONFIG:
volume:
backend_names: ['BACKEND_1', 'BACKEND_2']
volume-feature-enabled:
multi_backend: true
- job:
name: watcher-tempest-strategies
parent: watcher-tempest-multinode
vars:
tempest_concurrency: 1
tempest_test_regex: watcher_tempest_plugin.tests.scenario.test_execute_strategies
# All tests inside watcher_tempest_plugin.tests.scenario with tag "strategy"
# or test_execute_strategies file
# excluding tests with tag "real_load"
tempest_test_regex: (^watcher_tempest_plugin.tests.scenario)(.*\[.*\bstrategy\b.*\].*)|(^watcher_tempest_plugin.tests.scenario.test_execute_strategies)
tempest_exclude_regex: .*\[.*\breal_load\b.*\].*
- job:
name: watcher-tempest-multinode
parent: watcher-tempest-functional
nodeset: openstack-two-node-focal
nodeset: openstack-two-node-noble
roles:
- zuul: openstack/tempest
group-vars:
@@ -103,10 +32,16 @@
period: 120
watcher_cluster_data_model_collectors.storage:
period: 120
$CINDER_CONF:
# enable notifications in compute node, by default they are only
# configured in the controller
oslo_messaging_notifications:
driver: messagingv2
devstack_services:
watcher-api: false
watcher-decision-engine: true
watcher-applier: false
c-bak: false
ceilometer: false
ceilometer-acompute: false
ceilometer-acentral: false
@@ -117,6 +52,13 @@
rabbit: false
mysql: false
vars:
devstack_localrc:
GNOCCHI_ARCHIVE_POLICY_TEMPEST: "ceilometer-low-rate"
CEILOMETER_PIPELINE_INTERVAL: 15
devstack_services:
ceilometer-acompute: false
ceilometer-acentral: true
ceilometer-anotification: true
devstack_local_conf:
post-config:
$WATCHER_CONF:
@@ -126,6 +68,11 @@
period: 120
watcher_cluster_data_model_collectors.storage:
period: 120
$CINDER_CONF:
# enable notifications in compute node, by default they are only
# configured in the controller
oslo_messaging_notifications:
driver: messagingv2
test-config:
$TEMPEST_CONFIG:
compute:
@@ -136,6 +83,10 @@
block_migration_for_live_migration: true
placement:
min_microversion: 1.29
telemetry:
ceilometer_polling_interval: 15
optimize:
run_continuous_audit_tests: true
devstack_plugins:
ceilometer: https://opendev.org/openstack/ceilometer
@@ -185,7 +136,7 @@
- openstack/python-watcherclient
- openstack/watcher-tempest-plugin
vars: *base_vars
irrelevant-files:
irrelevant-files: &irrelevent_files
- ^(test-|)requirements.txt$
- ^.*\.rst$
- ^api-ref/.*$
@@ -198,10 +149,257 @@
- ^tox.ini$
- job:
# This job is used in python-watcherclient repo
name: watcherclient-tempest-functional
parent: watcher-tempest-functional
timeout: 4200
name: watcher-sg-core-tempest-base
parent: devstack-tempest
nodeset: openstack-two-node-noble
description: |
This job is for testing watcher and sg-core/prometheus installation
abstract: true
pre-run:
- playbooks/generate_prometheus_config.yml
irrelevant-files: *irrelevent_files
timeout: 7800
required-projects: &base_sg_required_projects
- openstack/aodh
- openstack/ceilometer
- openstack/tempest
- openstack-k8s-operators/sg-core
- openstack/watcher
- openstack/python-watcherclient
- openstack/watcher-tempest-plugin
- openstack/devstack-plugin-prometheus
vars:
configure_swap_size: 8192
devstack_plugins:
ceilometer: https://opendev.org/openstack/ceilometer
aodh: https://opendev.org/openstack/aodh
sg-core: https://github.com/openstack-k8s-operators/sg-core
watcher: https://opendev.org/openstack/watcher
devstack-plugin-prometheus: https://opendev.org/openstack/devstack-plugin-prometheus
devstack_services:
ceilometer-acompute: true
watcher-api: true
watcher-decision-engine: true
watcher-applier: true
tempest: true
# We do not need Swift in this job so disable it for speed
# Swift services
s-account: false
s-container: false
s-object: false
s-proxy: false
# Prometheus related service
prometheus: true
node_exporter: true
devstack_localrc:
CEILOMETER_BACKENDS: "sg-core"
CEILOMETER_PIPELINE_INTERVAL: 15
CEILOMETER_ALARM_THRESHOLD: 6000000000
PROMETHEUS_CONFIG_FILE: "/home/zuul/prometheus.yml"
devstack_local_conf:
post-config:
$WATCHER_CONF:
watcher_datasources:
datasources: prometheus
prometheus_client:
host: 127.0.0.1
port: 9090
watcher_cluster_data_model_collectors.compute:
period: 120
watcher_cluster_data_model_collectors.baremetal:
period: 120
watcher_cluster_data_model_collectors.storage:
period: 120
compute_model:
enable_extended_attributes: true
nova_client:
api_version: "2.96"
test-config:
$TEMPEST_CONFIG:
compute:
min_compute_nodes: 2
min_microversion: 2.56
compute-feature-enabled:
live_migration: true
block_migration_for_live_migration: true
placement:
min_microversion: 1.29
service_available:
sg_core: True
telemetry_services:
metric_backends: prometheus
telemetry:
disable_ssl_certificate_validation: True
ceilometer_polling_interval: 15
optimize:
datasource: prometheus
extended_attributes_nova_microversion: "2.96"
data_model_collectors_period: 120
run_continuous_audit_tests: true
tempest_plugins:
- watcher-tempest-plugin
# All tests inside watcher_tempest_plugin.tests.scenario with tag "strategy"
# and test_execute_strategies, test_data_model files
# excluding tests with tag "real_load"
tempest_test_regex: (watcher_tempest_plugin.tests.scenario)(.*\[.*\bstrategy\b.*\].*)|(watcher_tempest_plugin.tests.scenario.(test_execute_strategies|test_data_model))
tempest_exclude_regex: .*\[.*\breal_load\b.*\].*
tempest_concurrency: 1
tempest_test_regex: watcher_tempest_plugin.tests.client_functional
tox_envlist: all
zuul_copy_output:
/etc/prometheus/prometheus.yml: logs
group-vars:
subnode:
devstack_plugins:
ceilometer: https://opendev.org/openstack/ceilometer
devstack-plugin-prometheus: https://opendev.org/openstack/devstack-plugin-prometheus
devstack_services:
ceilometer-acompute: true
sg-core: false
prometheus: false
node_exporter: true
devstack_localrc:
CEILOMETER_BACKEND: "none"
CEILOMETER_BACKENDS: "none"
devstack_local_conf:
post-config:
$WATCHER_CONF:
watcher_cluster_data_model_collectors.compute:
period: 120
watcher_cluster_data_model_collectors.baremetal:
period: 120
watcher_cluster_data_model_collectors.storage:
period: 120
- job:
name: watcher-prometheus-integration
parent: watcher-sg-core-tempest-base
vars:
devstack_services:
ceilometer-acompute: false
node_exporter: false
group-vars:
subnode:
devstack_services:
ceilometer-acompute: false
node_exporter: false
- job:
name: watcher-aetos-integration
parent: watcher-sg-core-tempest-base
description: |
This job tests Watcher with Aetos reverse-proxy for Prometheus
using Keystone authentication instead of direct Prometheus access.
required-projects:
- openstack/python-observabilityclient
- openstack/aetos
vars: &aetos_vars
devstack_services:
ceilometer-acompute: false
node_exporter: false
devstack_plugins:
ceilometer: https://opendev.org/openstack/ceilometer
sg-core: https://github.com/openstack-k8s-operators/sg-core
watcher: https://opendev.org/openstack/watcher
devstack-plugin-prometheus: https://opendev.org/openstack/devstack-plugin-prometheus
aetos: https://opendev.org/openstack/aetos
devstack_local_conf:
post-config:
$WATCHER_CONF:
watcher_datasources:
datasources: aetos
aetos_client:
interface: public
region_name: RegionOne
fqdn_label: fqdn
instance_uuid_label: resource
test-config:
$TEMPEST_CONFIG:
optimize:
datasource: prometheus
group-vars:
subnode:
devstack_services:
ceilometer-acompute: false
node_exporter: false
- job:
name: watcher-prometheus-integration-realdata
parent: watcher-sg-core-tempest-base
vars: &realdata_vars
devstack_services:
ceilometer-acompute: true
node_exporter: true
devstack_localrc:
NODE_EXPORTER_COLLECTOR_EXCLUDE: ""
devstack_local_conf:
test-config:
$TEMPEST_CONFIG:
optimize:
datasource: ""
real_workload_period: 300
# All tests inside watcher_tempest_plugin.tests.scenario with tag "real_load"
tempest_test_regex: (^watcher_tempest_plugin.tests.scenario)(.*\[.*\breal_load\b.*\].*)
tempest_exclude_regex: ""
group-vars: &realdata_group_vars
subnode:
devstack_services:
ceilometer-acompute: true
node_exporter: true
devstack_localrc:
NODE_EXPORTER_COLLECTOR_EXCLUDE: ""
- job:
name: watcher-prometheus-integration-threading
parent: watcher-prometheus-integration
vars:
devstack_localrc:
'SYSTEMD_ENV_VARS["watcher-decision-engine"]': OS_WATCHER_DISABLE_EVENTLET_PATCHING=true
- job:
name: openstack-tox-py312-threading
parent: openstack-tox-py312
description: |
Run tox with the py3-threading environment.
vars:
tox_envlist: py3-threading
- job:
name: watcher-aetos-integration-realdata
parent: watcher-aetos-integration
vars: *realdata_vars
group-vars: *realdata_group_vars
- project:
queue: watcher
templates:
- check-requirements
- openstack-cover-jobs
- openstack-python3-jobs
- publish-openstack-docs-pti
- release-notes-jobs-python3
check:
jobs:
- openstack-tox-py312-threading
- watcher-tempest-functional
- watcher-grenade
- watcher-tempest-strategies
- watcher-tempest-actuator
- python-watcherclient-functional:
files:
- ^watcher/api/*
- watcher-tempest-functional-ipv6-only
- watcher-prometheus-integration
- watcher-prometheus-integration-threading
- watcher-aetos-integration
gate:
jobs:
- watcher-tempest-functional
- watcher-tempest-functional-ipv6-only
experimental:
jobs:
- watcher-prometheus-integration-realdata
- watcher-aetos-integration-realdata
periodic-weekly:
jobs:
- watcher-prometheus-integration-realdata
- watcher-aetos-integration-realdata
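The tag-based selection used by the `watcher-tempest-strategies` job above can be checked with a short script. This is a sketch under assumptions: it assumes the test runner applies both patterns with search semantics, and the test IDs used below are hypothetical examples, not names taken from watcher-tempest-plugin.

```python
import re

# Patterns copied from the watcher-tempest-strategies job definition.
INCLUDE = re.compile(
    r"(^watcher_tempest_plugin.tests.scenario)(.*\[.*\bstrategy\b.*\].*)"
    r"|(^watcher_tempest_plugin.tests.scenario.test_execute_strategies)"
)
EXCLUDE = re.compile(r".*\[.*\breal_load\b.*\].*")


def is_selected(test_id):
    """Return True if a test ID is included and not excluded."""
    return bool(INCLUDE.search(test_id)) and not EXCLUDE.search(test_id)


# Hypothetical test IDs, written in the usual "module.Class.test[tags]" form.
TAGGED = ("watcher_tempest_plugin.tests.scenario.test_execute_zone_migration"
          ".TestZoneMigration.test_example[id-1,strategy]")
BY_FILE = ("watcher_tempest_plugin.tests.scenario.test_execute_strategies"
           ".TestExecuteStrategies.test_example[id-2]")
REAL_LOAD = ("watcher_tempest_plugin.tests.scenario.test_execute_workload_balancing"
             ".TestWorkloadBalancing.test_example[id-3,real_load,strategy]")
API_TEST = "watcher_tempest_plugin.tests.api.admin.test_audit.TestAudit.test_example[id-4]"
```

Under these assumptions, the first two IDs are selected, the third is dropped by the `real_load` exclusion, and the fourth never matches the include pattern.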

@@ -189,6 +189,16 @@ action_state:
in: body
required: true
type: string
action_status_message:
description: |
Message with additional information about the Action state.
This field can be set when transitioning an action to SKIPPED state,
or updated for actions that are already in SKIPPED state to provide
more detailed explanations, fix typos, or expand on initial reasons.
in: body
required: false
type: string
min_version: 1.5
action_type:
description: |
Action type based on specific API action. Actions in Watcher are
@@ -230,6 +240,13 @@ actionplan_state:
in: body
required: false
type: string
actionplan_status_message:
description: |
Message with additional information about the Action Plan state.
in: body
required: false
type: string
min_version: 1.5
# Audit
audit_autotrigger:
@@ -320,6 +337,13 @@ audit_state:
in: body
required: true
type: string
audit_status_message:
description: |
Message with additional information about the Audit state.
in: body
required: false
type: string
min_version: 1.5
audit_strategy:
description: |
The UUID or name of the Strategy.
@@ -420,12 +444,24 @@ links:
type: array
# Data Model Node
node_disabled_reason:
description: |
The Disabled Reason of the node.
in: body
required: true
type: string
node_disk:
description: |
The Disk of the node(in GiB).
in: body
required: true
type: integer
node_disk_gb_reserved:
description: |
The Disk Reserved of the node (in GiB).
in: body
required: true
type: integer
node_disk_ratio:
description: |
The Disk Ratio of the node.
@@ -444,6 +480,12 @@ node_memory:
in: body
required: true
type: integer
node_memory_mb_reserved:
description: |
The Memory Reserved of the node(in MiB).
in: body
required: true
type: integer
node_memory_ratio:
description: |
The Memory Ratio of the node.
@@ -456,6 +498,12 @@ node_state:
in: body
required: true
type: string
node_status:
description: |
The Status of the node.
in: body
required: true
type: string
node_uuid:
description: |
The Unique UUID of the node.
@@ -468,13 +516,18 @@ node_vcpu_ratio:
in: body
required: true
type: float
node_vcpu_reserved:
description: |
The Vcpu Reserved of the node.
in: body
required: true
type: integer
node_vcpus:
description: |
The Vcpu of the node.
in: body
required: true
type: integer
# Scoring Engine
scoring_engine_description:
description: |
@@ -502,18 +555,50 @@ server_disk:
in: body
required: true
type: integer
server_flavor_extra_specs:
description: |
The flavor extra specs of the server.
in: body
required: true
type: JSON
min_version: 1.6
server_locked:
description: |
Whether the server is locked.
in: body
required: true
type: boolean
server_memory:
description: |
The Memory of server.
in: body
required: true
type: integer
server_metadata:
description: |
The metadata associated with the server.
in: body
required: true
type: JSON
server_name:
description: |
The Name of the server.
in: body
required: true
type: string
server_pinned_az:
description: |
The pinned availability zone of the server.
in: body
required: true
type: string
min_version: 1.6
server_project_id:
description: |
The project ID of the server.
in: body
required: true
type: string
server_state:
description: |
The State of the server.
@@ -532,6 +617,12 @@ server_vcpus:
in: body
required: true
type: integer
server_watcher_exclude:
description: |
Whether the server is excluded from the scope.
in: body
required: true
type: boolean
# Service
service_host:
description: |

View File

@@ -0,0 +1,12 @@
[
{
"op": "replace",
"value": "SKIPPED",
"path": "/state"
},
{
"op": "replace",
"value": "Skipping due to maintenance window",
"path": "/status_message"
}
]

View File

@@ -0,0 +1,7 @@
[
{
"op": "replace",
"value": "SKIPPED",
"path": "/state"
}
]
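The two request samples above are JSON Patch documents: one sets only `/state`, the other also sets `/status_message`. A minimal sketch of how a client might build either body follows. The helper name `build_skip_patch` is hypothetical and is not part of python-watcherclient.

```python
import json


def build_skip_patch(reason=None):
    """Return a JSON Patch list that moves an Action to SKIPPED.

    Hypothetical helper for illustration; it only builds the request
    body and does not talk to the Watcher API.
    """
    patch = [{"op": "replace", "value": "SKIPPED", "path": "/state"}]
    if reason is not None:
        # Optional free-text reason, as in the "with message" sample.
        patch.append(
            {"op": "replace", "value": reason, "path": "/status_message"}
        )
    return patch


if __name__ == "__main__":
    body = build_skip_patch("Skipping due to maintenance window")
    print(json.dumps(body, indent=2))
```

Calling `build_skip_patch()` with no argument yields the shorter sample; passing a reason yields the longer one.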

View File

@@ -0,0 +1,29 @@
{
"state": "SKIPPED",
"description": "Migrate instance to another compute node",
"parents": [
"b4529294-1de6-4302-b57a-9b5d5dc363c6"
],
"links": [
{
"rel": "self",
"href": "http://controller:9322/v1/actions/54acc7a0-91b0-46ea-a5f7-4ae2b9df0b0a"
},
{
"rel": "bookmark",
"href": "http://controller:9322/actions/54acc7a0-91b0-46ea-a5f7-4ae2b9df0b0a"
}
],
"action_plan_uuid": "4cbc4ede-0d25-481b-b86e-998dbbd4f8bf",
"uuid": "54acc7a0-91b0-46ea-a5f7-4ae2b9df0b0a",
"deleted_at": null,
"updated_at": "2018-04-10T12:15:44.026973+00:00",
"input_parameters": {
"migration_type": "live",
"destination_node": "compute-2",
"resource_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef"
},
"action_type": "migrate",
"created_at": "2018-04-10T11:59:12.725147+00:00",
"status_message": "Action skipped by user. Reason: Skipping due to maintenance window"
}

View File

@@ -0,0 +1,7 @@
[
{
"op": "replace",
"value": "Action skipped due to scheduled maintenance window",
"path": "/status_message"
}
]

View File

@@ -0,0 +1,29 @@
{
"state": "SKIPPED",
"description": "Migrate instance to another compute node",
"parents": [
"b4529294-1de6-4302-b57a-9b5d5dc363c6"
],
"links": [
{
"rel": "self",
"href": "http://controller:9322/v1/actions/54acc7a0-91b0-46ea-a5f7-4ae2b9df0b0a"
},
{
"rel": "bookmark",
"href": "http://controller:9322/actions/54acc7a0-91b0-46ea-a5f7-4ae2b9df0b0a"
}
],
"action_plan_uuid": "4cbc4ede-0d25-481b-b86e-998dbbd4f8bf",
"uuid": "54acc7a0-91b0-46ea-a5f7-4ae2b9df0b0a",
"deleted_at": null,
"updated_at": "2018-04-10T12:20:15.123456+00:00",
"input_parameters": {
"migration_type": "live",
"destination_node": "compute-2",
"resource_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef"
},
"action_type": "migrate",
"created_at": "2018-04-10T11:59:12.725147+00:00",
"status_message": "Action skipped by user. Reason: Action skipped due to scheduled maintenance window"
}

View File

@@ -21,7 +21,8 @@
"uuid": "4cbc4ede-0d25-481b-b86e-998dbbd4f8bf",
"audit_uuid": "7d100b05-0a86-491f-98a7-f93da19b272a",
"created_at": "2018-04-10T11:59:52.640067+00:00",
"hostname": "controller"
"hostname": "controller",
"status_message": null
}
]
}

View File

@@ -17,5 +17,6 @@
"strategy_name": "dummy_with_resize",
"uuid": "4cbc4ede-0d25-481b-b86e-998dbbd4f8bf",
"audit_uuid": "7d100b05-0a86-491f-98a7-f93da19b272a",
"hostname": "controller"
}
"hostname": "controller",
"status_message": null
}

View File

@@ -24,7 +24,8 @@
"duration": 3.2
},
"action_type": "sleep",
"created_at": "2018-03-26T11:56:08.235226+00:00"
"created_at": "2018-03-26T11:56:08.235226+00:00",
"status_message": null
}
]
}
}

View File

@@ -22,5 +22,6 @@
"message": "Welcome"
},
"action_type": "nop",
"created_at": "2018-04-10T11:59:12.725147+00:00"
}
"created_at": "2018-04-10T11:59:12.725147+00:00",
"status_message": null
}

View File

@@ -51,5 +51,6 @@
"updated_at": null,
"hostname": null,
"start_time": null,
"end_time": null
"end_time": null,
"status_message": null
}

View File

@@ -30,7 +30,7 @@
}
},
"auto_trigger": false,
"force": false,
"force": false,
"uuid": "65a5da84-5819-4aea-8278-a28d2b489028",
"goal_name": "workload_balancing",
"scope": [],
@@ -53,7 +53,8 @@
"updated_at": "2018-04-06T09:44:01.604146+00:00",
"hostname": "controller",
"start_time": null,
"end_time": null
"end_time": null,
"status_message": null
}
]
}

View File

@@ -51,5 +51,6 @@
"updated_at": "2018-04-06T11:54:01.266447+00:00",
"hostname": "controller",
"start_time": null,
"end_time": null
"end_time": null,
"status_message": null
}

View File

@@ -1,38 +1,62 @@
{
"context": [
{
"server_uuid": "1bf91464-9b41-428d-a11e-af691e5563bb",
"server_watcher_exclude": false,
"server_name": "chenke-test1",
"server_vcpus": "1",
"server_state": "active",
"server_memory": "512",
"server_disk": "1",
"server_state": "active",
"node_uuid": "253e5dd0-9384-41ab-af13-4f2c2ce26112",
"server_vcpus": "1",
"server_metadata": {},
"server_project_id": "baea342fc74b4a1785b4a40c69a8d958",
"server_locked": false,
"server_uuid": "1bf91464-9b41-428d-a11e-af691e5563bb",
"server_pinned_az": "nova",
"server_flavor_extra_specs": {
"hw_rng:allowed": true
},
"node_hostname": "localhost.localdomain",
"node_vcpus": "4",
"node_vcpu_ratio": "16.0",
"node_memory": "16383",
"node_memory_ratio": "1.5",
"node_disk": "37"
"node_disk_ratio": "1.0",
"node_status": "enabled",
"node_disabled_reason": null,
"node_state": "up",
"node_memory": "16383",
"node_memory_mb_reserved": "512",
"node_disk": "37",
"node_disk_gb_reserved": "0",
"node_vcpus": "4",
"node_vcpu_reserved": "0",
"node_memory_ratio": "1.5",
"node_vcpu_ratio": "16.0",
"node_disk_ratio": "1.0",
"node_uuid": "253e5dd0-9384-41ab-af13-4f2c2ce26112"
},
{
"server_uuid": "e2cb5f6f-fa1d-4ba2-be1e-0bf02fa86ba4",
"server_watcher_exclude": false,
"server_name": "chenke-test2",
"server_vcpus": "1",
"server_state": "active",
"server_memory": "512",
"server_disk": "1",
"server_state": "active",
"node_uuid": "253e5dd0-9384-41ab-af13-4f2c2ce26112",
"server_vcpus": "1",
"server_metadata": {},
"server_project_id": "baea342fc74b4a1785b4a40c69a8d958",
"server_locked": false,
"server_uuid": "e2cb5f6f-fa1d-4ba2-be1e-0bf02fa86ba4",
"server_pinned_az": "nova",
"server_flavor_extra_specs": {},
"node_hostname": "localhost.localdomain",
"node_vcpus": "4",
"node_vcpu_ratio": "16.0",
"node_memory": "16383",
"node_memory_ratio": "1.5",
"node_disk": "37"
"node_disk_ratio": "1.0",
"node_status": "enabled",
"node_disabled_reason": null,
"node_state": "up",
"node_memory": "16383",
"node_memory_mb_reserved": "512",
"node_disk": "37",
"node_disk_gb_reserved": "0",
"node_vcpus": "4",
"node_vcpu_reserved": "0",
"node_memory_ratio": "1.5",
"node_vcpu_ratio": "16.0",
"node_disk_ratio": "1.0",
"node_uuid": "253e5dd0-9384-41ab-af13-4f2c2ce26112"
}
]
}
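The new `*_reserved` fields sit next to the existing totals and ratios, and all numeric values in the sample are serialized as strings. As a rough sketch, the snippet below combines them using the Placement-style formula `(total - reserved) * ratio`. That formula is an assumption here; this change does not state how Watcher strategies combine these fields. The function name `usable_capacity` is hypothetical.

```python
def usable_capacity(total, reserved, ratio):
    """Capacity after reservation and overcommit: (total - reserved) * ratio.

    Inputs may be strings, as in the data model sample.
    """
    return (int(total) - int(reserved)) * float(ratio)


# Node fields copied from the sample response; note the string values.
node = {
    "node_vcpus": "4",
    "node_vcpu_reserved": "0",
    "node_vcpu_ratio": "16.0",
    "node_memory": "16383",
    "node_memory_mb_reserved": "512",
    "node_memory_ratio": "1.5",
    "node_disk": "37",
    "node_disk_gb_reserved": "0",
    "node_disk_ratio": "1.0",
}

vcpus = usable_capacity(
    node["node_vcpus"], node["node_vcpu_reserved"], node["node_vcpu_ratio"])
memory_mb = usable_capacity(
    node["node_memory"], node["node_memory_mb_reserved"],
    node["node_memory_ratio"])
disk_gb = usable_capacity(
    node["node_disk"], node["node_disk_gb_reserved"], node["node_disk_ratio"])

if __name__ == "__main__":
    print(vcpus, memory_mb, disk_gb)
```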

View File

@@ -139,6 +139,7 @@ Response
- global_efficacy: actionplan_global_efficacy
- links: links
- hostname: actionplan_hostname
- status_message: actionplan_status_message
**Example JSON representation of an Action Plan:**
@@ -177,6 +178,7 @@ Response
- global_efficacy: actionplan_global_efficacy
- links: links
- hostname: actionplan_hostname
- status_message: actionplan_status_message
**Example JSON representation of an Audit:**
@@ -233,6 +235,7 @@ version 1:
- global_efficacy: actionplan_global_efficacy
- links: links
- hostname: actionplan_hostname
- status_message: actionplan_status_message
**Example JSON representation of an Action Plan:**

View File

@@ -23,6 +23,9 @@ following:
- **PENDING** : the ``Action`` has not been executed yet by the
``Watcher Applier``.
- **SKIPPED** : the ``Action`` will not be executed because a predefined
skipping condition is found by ``Watcher Applier`` or is explicitly
skipped by the ``Administrator``.
- **ONGOING** : the ``Action`` is currently being processed by the
``Watcher Applier``.
- **SUCCEEDED** : the ``Action`` has been executed successfully
@@ -111,6 +114,7 @@ Response
- description: action_description
- input_parameters: action_input_parameters
- links: links
- status_message: action_status_message
**Example JSON representation of an Action:**
@@ -148,8 +152,111 @@ Response
- description: action_description
- input_parameters: action_input_parameters
- links: links
- status_message: action_status_message
**Example JSON representation of an Action:**
.. literalinclude:: samples/actions-show-response.json
:language: javascript
Skip Action
===========
.. rest_method:: PATCH /v1/actions/{action_ident}
Skips an Action resource by changing its state to SKIPPED.
.. note::
Only Actions in PENDING state can be skipped. The Action must belong to
an Action Plan in RECOMMENDED or PENDING state. This operation requires
API microversion 1.5 or later.
Normal response codes: 200
Error codes: 400,404,403,409
Request
-------
.. rest_parameters:: parameters.yaml
- action_ident: action_ident
**Example Action skip request:**
.. literalinclude:: samples/action-skip-request.json
:language: javascript
**Example Action skip request with custom status message:**
.. literalinclude:: samples/action-skip-request-with-message.json
:language: javascript
Response
--------
.. rest_parameters:: parameters.yaml
- uuid: uuid
- action_type: action_type
- state: action_state
- action_plan_uuid: action_action_plan_uuid
- parents: action_parents
- description: action_description
- input_parameters: action_input_parameters
- links: links
- status_message: action_status_message
**Example JSON representation of a skipped Action:**
.. literalinclude:: samples/action-skip-response.json
:language: javascript
Update Action Status Message
============================
.. rest_method:: PATCH /v1/actions/{action_ident}
Updates the status_message of an Action that is already in SKIPPED state.
.. note::
The status_message field can only be updated for Actions that are currently
in SKIPPED state. This allows administrators to fix typos, provide more
detailed explanations, or expand on reasons that were initially omitted.
This operation requires API microversion 1.5 or later.
Normal response codes: 200
Error codes: 400,404,403,409
Request
-------
.. rest_parameters:: parameters.yaml
- action_ident: action_ident
**Example status_message update request for a SKIPPED action:**
.. literalinclude:: samples/action-update-status-message-request.json
:language: javascript
Response
--------
.. rest_parameters:: parameters.yaml
- uuid: uuid
- action_type: action_type
- state: action_state
- action_plan_uuid: action_action_plan_uuid
- parents: action_parents
- description: action_description
- input_parameters: action_input_parameters
- links: links
- status_message: action_status_message
**Example JSON representation of an Action with updated status_message:**
.. literalinclude:: samples/action-update-status-message-response.json
:language: javascript
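The notes in the Skip Action and Update Action Status Message sections describe two preconditions. A minimal sketch of those rules as client-side checks follows, assuming nothing beyond what the notes state. The function names are hypothetical, and the server remains the authority on what is accepted.

```python
# States taken from the notes in the API reference text.
SKIPPABLE_ACTION_STATE = "PENDING"
SKIPPABLE_PLAN_STATES = {"RECOMMENDED", "PENDING"}


def can_skip(action_state, action_plan_state):
    """An Action can be skipped only while PENDING, and only if its
    Action Plan is RECOMMENDED or PENDING."""
    return (action_state == SKIPPABLE_ACTION_STATE
            and action_plan_state in SKIPPABLE_PLAN_STATES)


def can_update_status_message(action_state):
    """status_message may be edited only on an already SKIPPED Action."""
    return action_state == "SKIPPED"


if __name__ == "__main__":
    print(can_skip("PENDING", "RECOMMENDED"))
    print(can_update_status_message("PENDING"))
```

A check like this only avoids an obviously invalid request; both operations also require API microversion 1.5 or later.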

View File

@@ -85,6 +85,7 @@ version 1:
- start_time: audit_starttime_resp
- end_time: audit_endtime_resp
- force: audit_force
- status_message: audit_status_message
**Example JSON representation of an Audit:**
@@ -184,6 +185,7 @@ Response
- start_time: audit_starttime_resp
- end_time: audit_endtime_resp
- force: audit_force
- status_message: audit_status_message
**Example JSON representation of an Audit:**
@@ -231,6 +233,7 @@ Response
- start_time: audit_starttime_resp
- end_time: audit_endtime_resp
- force: audit_force
- status_message: audit_status_message
**Example JSON representation of an Audit:**
@@ -286,6 +289,7 @@ version 1:
- start_time: audit_starttime_resp
- end_time: audit_endtime_resp
- force: audit_force
- status_message: audit_status_message
**Example JSON representation of an Audit:**
@@ -341,6 +345,7 @@ Response
- start_time: audit_starttime_resp
- end_time: audit_endtime_resp
- force: audit_force
- status_message: audit_status_message
**Example JSON representation of an Audit:**

View File

@@ -35,21 +35,32 @@ Response
.. rest_parameters:: parameters.yaml
- server_uuid: server_uuid
- server_watcher_exclude: server_watcher_exclude
- server_name: server_name
- server_vcpus: server_vcpus
- server_state: server_state
- server_memory: server_memory
- server_disk: server_disk
- server_state: server_state
- node_uuid: node_uuid
- server_vcpus: server_vcpus
- server_metadata: server_metadata
- server_project_id: server_project_id
- server_locked: server_locked
- server_uuid: server_uuid
- server_pinned_az: server_pinned_az
- server_flavor_extra_specs: server_flavor_extra_specs
- node_hostname: node_hostname
- node_vcpus: node_vcpus
- node_vcpu_ratio: node_vcpu_ratio
- node_memory: node_memory
- node_memory_ratio: node_memory_ratio
- node_disk: node_disk
- node_disk_ratio: node_disk_ratio
- node_status: node_status
- node_disabled_reason: node_disabled_reason
- node_state: node_state
- node_memory: node_memory
- node_memory_mb_reserved: node_memory_mb_reserved
- node_disk: node_disk
- node_disk_gb_reserved: node_disk_gb_reserved
- node_vcpus: node_vcpus
- node_vcpu_reserved: node_vcpu_reserved
- node_memory_ratio: node_memory_ratio
- node_vcpu_ratio: node_vcpu_ratio
- node_disk_ratio: node_disk_ratio
- node_uuid: node_uuid
**Example JSON representation of a Data Model:**

View File

@@ -12,7 +12,7 @@ Here are some examples of ``Goals``:
- minimize the energy consumption
- minimize the number of compute nodes (consolidation)
- balance the workload among compute nodes
- minimize the license cost (some softwares have a licensing model which is
- minimize the license cost (some software has a licensing model which is
based on the number of sockets or cores where the software is deployed)
- find the most appropriate moment for a planned maintenance on a
given group of hosts (which may be an entire availability zone):
@@ -123,4 +123,4 @@ Response
**Example JSON representation of a Goal:**
.. literalinclude:: samples/goal-show-response.json
:language: javascript
:language: javascript

bindep.txt Normal file
View File

@@ -0,0 +1,23 @@
# This is a cross-platform list tracking distribution packages needed for install and tests;
# see https://docs.openstack.org/infra/bindep/ for additional information.
mysql [platform:rpm !platform:redhat test]
mysql-client [platform:dpkg !platform:debian test]
mysql-devel [platform:rpm !platform:redhat test]
mysql-server [!platform:redhat !platform:debian test]
mariadb-devel [platform:rpm platform:redhat test]
mariadb-server [platform:rpm platform:redhat platform:debian test]
python3-all [platform:dpkg test]
python3-all-dev [platform:dpkg test]
python3 [platform:rpm test]
python3-devel [platform:rpm test]
sqlite-devel [platform:rpm test]
# gettext and graphviz are needed by doc builds only.
gettext [doc]
graphviz [doc]
# fonts-freefont-otf is needed for pdf docs builds with the 'xelatex' engine
fonts-freefont-otf [pdf-docs]
texlive [pdf-docs]
texlive-latex-recommended [pdf-docs]
texlive-xetex [pdf-docs]
latexmk [pdf-docs]

View File

@@ -1,42 +0,0 @@
# Licensed under the Apache License, Version 2.0 (the "License"); you may
# not use this file except in compliance with the License. You may obtain
# a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
# This is an example Apache2 configuration file for using the
# Watcher API through mod_wsgi. This version assumes you are
# running devstack to configure the software.
Listen %WATCHER_SERVICE_PORT%
<VirtualHost *:%WATCHER_SERVICE_PORT%>
WSGIDaemonProcess watcher-api user=%USER% processes=%APIWORKERS% threads=1 display-name=%{GROUP}
WSGIScriptAlias / %WATCHER_WSGI_DIR%/app.wsgi
WSGIApplicationGroup %{GLOBAL}
WSGIProcessGroup watcher-api
WSGIPassAuthorization On
ErrorLogFormat "%M"
ErrorLog /var/log/%APACHE_NAME%/watcher-api.log
CustomLog /var/log/%APACHE_NAME%/watcher-api-access.log combined
<Directory %WATCHER_WSGI_DIR%>
WSGIProcessGroup watcher-api
WSGIApplicationGroup %{GLOBAL}
<IfVersion >= 2.4>
Require all granted
</IfVersion>
<IfVersion < 2.4>
Order allow,deny
Allow from all
</IfVersion>
</Directory>
</VirtualHost>

View File

@@ -1,5 +1,3 @@
#!/bin/bash
#
# lib/watcher
# Functions to control the configuration and operation of the watcher services
@@ -38,7 +36,6 @@ GITBRANCH["python-watcherclient"]=${WATCHERCLIENT_BRANCH:-master}
GITDIR["python-watcherclient"]=$DEST/python-watcherclient
WATCHER_STATE_PATH=${WATCHER_STATE_PATH:=$DATA_DIR/watcher}
WATCHER_AUTH_CACHE_DIR=${WATCHER_AUTH_CACHE_DIR:-/var/cache/watcher}
WATCHER_CONF_DIR=/etc/watcher
WATCHER_CONF=$WATCHER_CONF_DIR/watcher.conf
@@ -58,29 +55,16 @@ else
WATCHER_BIN_DIR=$(get_python_exec_prefix)
fi
# There are 2 modes, which is "uwsgi" which runs with an apache
# proxy uwsgi in front of it, or "mod_wsgi", which runs in
# apache. mod_wsgi is deprecated, don't use it.
WATCHER_USE_WSGI_MODE=${WATCHER_USE_WSGI_MODE:-$WSGI_MODE}
WATCHER_UWSGI=$WATCHER_BIN_DIR/watcher-api-wsgi
WATCHER_UWSGI=watcher.wsgi.api:application
WATCHER_UWSGI_CONF=$WATCHER_CONF_DIR/watcher-uwsgi.ini
if is_suse; then
WATCHER_WSGI_DIR=${WATCHER_WSGI_DIR:-/srv/www/htdocs/watcher}
else
WATCHER_WSGI_DIR=${WATCHER_WSGI_DIR:-/var/www/watcher}
fi
WATCHER_WSGI_DIR=${WATCHER_WSGI_DIR:-/var/www/watcher}
# Public facing bits
WATCHER_SERVICE_HOST=${WATCHER_SERVICE_HOST:-$SERVICE_HOST}
WATCHER_SERVICE_PORT=${WATCHER_SERVICE_PORT:-9322}
WATCHER_SERVICE_PORT_INT=${WATCHER_SERVICE_PORT_INT:-19322}
WATCHER_SERVICE_PROTOCOL=${WATCHER_SERVICE_PROTOCOL:-$SERVICE_PROTOCOL}
if [[ "$WATCHER_USE_WSGI_MODE" == "uwsgi" ]]; then
WATCHER_API_URL="$WATCHER_SERVICE_PROTOCOL://$WATCHER_SERVICE_HOST/infra-optim"
else
WATCHER_API_URL="$WATCHER_SERVICE_PROTOCOL://$WATCHER_SERVICE_HOST:$WATCHER_SERVICE_PORT"
fi
WATCHER_API_URL="$WATCHER_SERVICE_PROTOCOL://$WATCHER_SERVICE_HOST/infra-optim"
# Entry Points
# ------------
@@ -103,12 +87,8 @@ function _cleanup_watcher_apache_wsgi {
# cleanup_watcher() - Remove residual data files, anything left over from previous
# runs that a clean run would need to clean up
function cleanup_watcher {
sudo rm -rf $WATCHER_STATE_PATH $WATCHER_AUTH_CACHE_DIR
if [[ "$WATCHER_USE_WSGI_MODE" == "uwsgi" ]]; then
remove_uwsgi_config "$WATCHER_UWSGI_CONF" "$WATCHER_UWSGI"
else
_cleanup_watcher_apache_wsgi
fi
sudo rm -rf $WATCHER_STATE_PATH
remove_uwsgi_config "$WATCHER_UWSGI_CONF" "$WATCHER_UWSGI"
}
# configure_watcher() - Set config files, create data dirs, etc
@@ -157,31 +137,6 @@ function create_watcher_accounts {
"$WATCHER_API_URL"
}
# _config_watcher_apache_wsgi() - Set WSGI config files of watcher
function _config_watcher_apache_wsgi {
local watcher_apache_conf
if [[ "$WATCHER_USE_WSGI_MODE" == "mod_wsgi" ]]; then
local service_port=$WATCHER_SERVICE_PORT
if is_service_enabled tls-proxy; then
service_port=$WATCHER_SERVICE_PORT_INT
service_protocol="http"
fi
sudo mkdir -p $WATCHER_WSGI_DIR
sudo cp $WATCHER_DIR/watcher/api/app.wsgi $WATCHER_WSGI_DIR/app.wsgi
watcher_apache_conf=$(apache_site_config_for watcher-api)
sudo cp $WATCHER_DEVSTACK_FILES_DIR/apache-watcher-api.template $watcher_apache_conf
sudo sed -e "
s|%WATCHER_SERVICE_PORT%|$service_port|g;
s|%WATCHER_WSGI_DIR%|$WATCHER_WSGI_DIR|g;
s|%USER%|$STACK_USER|g;
s|%APIWORKERS%|$API_WORKERS|g;
s|%APACHE_NAME%|$APACHE_NAME|g;
" -i $watcher_apache_conf
enable_apache_site watcher-api
fi
}
# create_watcher_conf() - Create a new watcher.conf file
function create_watcher_conf {
# (Re)create ``watcher.conf``
@@ -199,21 +154,16 @@ function create_watcher_conf {
iniset $WATCHER_CONF api host "$(ipv6_unquote $WATCHER_SERVICE_HOST)"
iniset $WATCHER_CONF api port "$WATCHER_SERVICE_PORT_INT"
# iniset $WATCHER_CONF api enable_ssl_api "True"
else
if [[ "$WATCHER_USE_WSGI_MODE" == "mod_wsgi" ]]; then
iniset $WATCHER_CONF api host "$(ipv6_unquote $WATCHER_SERVICE_HOST)"
iniset $WATCHER_CONF api port "$WATCHER_SERVICE_PORT"
fi
fi
iniset $WATCHER_CONF oslo_policy policy_file $WATCHER_POLICY_YAML
iniset $WATCHER_CONF oslo_messaging_notifications driver "messagingv2"
configure_auth_token_middleware $WATCHER_CONF watcher $WATCHER_AUTH_CACHE_DIR
configure_auth_token_middleware $WATCHER_CONF watcher $WATCHER_AUTH_CACHE_DIR "watcher_clients_auth"
configure_keystone_authtoken_middleware $WATCHER_CONF watcher
configure_keystone_authtoken_middleware $WATCHER_CONF watcher "watcher_clients_auth"
if is_fedora || is_suse; then
if is_fedora; then
# watcher defaults to /usr/local/bin, but fedora and suse pip like to
# install things in /usr/bin
iniset $WATCHER_CONF DEFAULT bindir "/usr/bin"
@@ -231,12 +181,8 @@ function create_watcher_conf {
# Format logging
setup_logging $WATCHER_CONF
#config apache files
if [[ "$WATCHER_USE_WSGI_MODE" == "uwsgi" ]]; then
write_uwsgi_config "$WATCHER_UWSGI_CONF" "$WATCHER_UWSGI" "/infra-optim"
else
_config_watcher_apache_wsgi
fi
write_uwsgi_config "$WATCHER_UWSGI_CONF" "$WATCHER_UWSGI" "/infra-optim" "" "watcher-api"
# Register SSL certificates if provided
if is_ssl_enabled_service watcher; then
ensure_certificates WATCHER
@@ -248,13 +194,6 @@ function create_watcher_conf {
fi
}
# create_watcher_cache_dir() - Part of the init_watcher() process
function create_watcher_cache_dir {
# Create cache dir
sudo install -d -o $STACK_USER $WATCHER_AUTH_CACHE_DIR
rm -rf $WATCHER_AUTH_CACHE_DIR/*
}
# init_watcher() - Initialize databases, etc.
function init_watcher {
# clean up from previous (possibly aborted) runs
@@ -266,7 +205,6 @@ function init_watcher {
# Create watcher schema
$WATCHER_BIN_DIR/watcher-db-manage --config-file $WATCHER_CONF upgrade
fi
create_watcher_cache_dir
}
# install_watcherclient() - Collect source and prepare
@@ -275,15 +213,15 @@ function install_watcherclient {
git_clone_by_name "python-watcherclient"
setup_dev_lib "python-watcherclient"
fi
if [[ "$GLOBAL_VENV" == "True" ]]; then
sudo ln -sf /opt/stack/data/venv/bin/watcher /usr/local/bin
fi
}
# install_watcher() - Collect source and prepare
function install_watcher {
git_clone $WATCHER_REPO $WATCHER_DIR $WATCHER_BRANCH
setup_develop $WATCHER_DIR
if [[ "$WATCHER_USE_WSGI_MODE" == "mod_wsgi" ]]; then
install_apache_wsgi
fi
}
# start_watcher_api() - Start the API process ahead of other things
@@ -297,19 +235,10 @@ function start_watcher_api {
service_port=$WATCHER_SERVICE_PORT_INT
service_protocol="http"
fi
if [[ "$WATCHER_USE_WSGI_MODE" == "uwsgi" ]]; then
run_process "watcher-api" "$(which uwsgi) --procname-prefix watcher-api --ini $WATCHER_UWSGI_CONF"
watcher_url=$service_protocol://$SERVICE_HOST/infra-optim
else
watcher_url=$service_protocol://$SERVICE_HOST:$service_port
enable_apache_site watcher-api
restart_apache_server
# Start proxies if enabled
if is_service_enabled tls-proxy; then
start_tls_proxy watcher '*' $WATCHER_SERVICE_PORT $WATCHER_SERVICE_HOST $WATCHER_SERVICE_PORT_INT
fi
fi
run_process "watcher-api" "$(which uwsgi) --procname-prefix watcher-api --ini $WATCHER_UWSGI_CONF"
watcher_url=$service_protocol://$SERVICE_HOST/infra-optim
# TODO(sean-k-mooney): we should probably check that we can hit
# the microversion endpoint and get a valid response.
echo "Waiting for watcher-api to start..."
if ! wait_for_service $SERVICE_TIMEOUT $watcher_url; then
die $LINENO "watcher-api did not start"
@@ -327,17 +256,25 @@ function start_watcher {
# stop_watcher() - Stop running processes (non-screen)
function stop_watcher {
if [[ "$WATCHER_USE_WSGI_MODE" == "uwsgi" ]]; then
stop_process watcher-api
else
disable_apache_site watcher-api
restart_apache_server
fi
stop_process watcher-api
for serv in watcher-decision-engine watcher-applier; do
stop_process $serv
done
}
# configure_tempest_for_watcher() - Configure Tempest for watcher
function configure_tempest_for_watcher {
# Set default microversion for watcher-tempest-plugin
# Please make sure to update this when the microversion is updated, otherwise
# new tests may be skipped.
TEMPEST_WATCHER_MIN_MICROVERSION=${TEMPEST_WATCHER_MIN_MICROVERSION:-"1.0"}
TEMPEST_WATCHER_MAX_MICROVERSION=${TEMPEST_WATCHER_MAX_MICROVERSION:-"1.6"}
# Set microversion options in tempest.conf
iniset $TEMPEST_CONFIG optimize min_microversion $TEMPEST_WATCHER_MIN_MICROVERSION
iniset $TEMPEST_CONFIG optimize max_microversion $TEMPEST_WATCHER_MAX_MICROVERSION
}
# Restore xtrace
$_XTRACE_WATCHER
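`configure_tempest_for_watcher` writes a minimum and maximum microversion into `tempest.conf`, and its comment warns that new tests may be skipped if the maximum is not bumped. The sketch below illustrates that kind of range check. It is not the actual tempest or watcher-tempest-plugin logic, and the function names are hypothetical. Versions are compared as integer pairs because comparing them as strings or floats would order `1.10` before `1.9`.

```python
def parse_microversion(value):
    """Turn "1.6" into (1, 6) so versions compare numerically."""
    major, minor = value.split(".")
    return int(major), int(minor)


def in_configured_range(required, minimum="1.0", maximum="1.6"):
    """True if a test's required microversion lies within [minimum, maximum].

    The defaults mirror the values set in configure_tempest_for_watcher.
    """
    low = parse_microversion(minimum)
    high = parse_microversion(maximum)
    return low <= parse_microversion(required) <= high


if __name__ == "__main__":
    for version in ("1.5", "1.6", "1.7"):
        print(version, in_configured_range(version))
```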

View File

@@ -26,7 +26,7 @@ GLANCE_HOSTPORT=${SERVICE_HOST}:9292
DATABASE_TYPE=mysql
# Enable services (including neutron)
ENABLED_SERVICES=n-cpu,n-api-meta,c-vol,q-agt,placement-client
ENABLED_SERVICES=n-cpu,n-api-meta,c-vol,q-agt,placement-client,node-exporter
NOVA_VNC_ENABLED=True
NOVNCPROXY_URL="http://$SERVICE_HOST:6080/vnc_auto.html"
@@ -42,6 +42,10 @@ disable_service ceilometer-acentral,ceilometer-collector,ceilometer-api
LOGFILE=$DEST/logs/stack.sh.log
LOGDAYS=2
CEILOMETER_BACKEND="none"
CEILOMETER_BACKENDS="none"
enable_plugin devstack-plugin-prometheus https://opendev.org/openstack/devstack-plugin-prometheus
[[post-config|$NOVA_CONF]]
[DEFAULT]
compute_monitors=cpu.virt_driver

View File

@@ -18,6 +18,10 @@ NETWORK_GATEWAY=10.254.1.1 # Change this for your network
MULTI_HOST=1
CEILOMETER_ALARM_THRESHOLD="6000000000"
CEILOMETER_BACKENDS="sg-core"
CEILOMETER_PIPELINE_INTERVAL="15"
#Set this to FALSE if you do not want to run watcher-api behind mod-wsgi
#WATCHER_USE_MOD_WSGI=TRUE
@@ -40,8 +44,10 @@ disable_service ceilometer-acompute
# Enable the ceilometer api explicitly(bug:1667678)
enable_service ceilometer-api
# Enable the Gnocchi plugin
enable_plugin gnocchi https://github.com/gnocchixyz/gnocchi
enable_service prometheus
enable_plugin aodh https://opendev.org/openstack/aodh
enable_plugin devstack-plugin-prometheus https://opendev.org/openstack/devstack-plugin-prometheus
enable_plugin sg-core https://github.com/openstack-k8s-operators/sg-core main
LOGFILE=$DEST/logs/stack.sh.log
LOGDAYS=2
@@ -55,3 +61,42 @@ compute_monitors=cpu.virt_driver
# can change this to just versioned when ceilometer handles versioned
# notifications from nova: https://bugs.launchpad.net/ceilometer/+bug/1665449
notification_format=both
[[post-config|$WATCHER_CONF]]
[prometheus_client]
host = 127.0.0.1
port = 9090
[watcher_cluster_data_model_collectors.baremetal]
period = 120
[watcher_cluster_data_model_collectors.compute]
period = 120
[watcher_cluster_data_model_collectors.storage]
period = 120
[watcher_datasources]
datasources = prometheus
[[test-config|$TEMPEST_CONFIG]]
[optimize]
datasource = prometheus
[service_available]
sg_core = True
[telemetry]
ceilometer_polling_interval = 15
disable_ssl_certificate_validation = True
[telemetry_services]
metric_backends = prometheus
[compute]
min_compute_nodes = 2
min_microversion = 2.56
[compute-feature-enabled]
block_migration_for_live_migration = True
live_migration = True
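The `[prometheus_client]` section above points Watcher at a Prometheus server on `127.0.0.1:9090`. As a quick way to reason about that endpoint, the sketch below builds an instant-query URL for the standard Prometheus HTTP API path `/api/v1/query`. The helper name is hypothetical, the `http` scheme is an assumption, and nothing here reflects how Watcher's own datasource code issues queries.

```python
from urllib.parse import urlencode


def prometheus_query_url(host, port, promql, scheme="http"):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{scheme}://{host}:{port}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Host and port as configured in the [prometheus_client] section.
    print(prometheus_query_url("127.0.0.1", 9090, "up"))
```

Fetching that URL with any HTTP client is a simple smoke test that the server configured for Watcher is reachable.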

View File

@@ -0,0 +1,53 @@
# Sample ``local.conf`` for compute node for Watcher development
# NOTE: Copy this file to the root DevStack directory for it to work properly.
[[local|localrc]]
ADMIN_PASSWORD=nomoresecrete
DATABASE_PASSWORD=stackdb
RABBIT_PASSWORD=stackqueue
SERVICE_PASSWORD=$ADMIN_PASSWORD
SERVICE_TOKEN=azertytoken
HOST_IP=192.168.42.2 # Change this to this compute node's IP address
#HOST_IPV6=2001:db8::7
FLAT_INTERFACE=eth0
FIXED_RANGE=10.254.1.0/24 # Change this to whatever your network is
NETWORK_GATEWAY=10.254.1.1 # Change this for your network
MULTI_HOST=1
SERVICE_HOST=192.168.42.1 # Change this to the IP of your controller node
MYSQL_HOST=$SERVICE_HOST
RABBIT_HOST=$SERVICE_HOST
GLANCE_HOSTPORT=${SERVICE_HOST}:9292
DATABASE_TYPE=mysql
# Enable services (including neutron)
ENABLED_SERVICES=n-cpu,n-api-meta,c-vol,q-agt,placement-client
NOVA_VNC_ENABLED=True
NOVNCPROXY_URL="http://$SERVICE_HOST:6080/vnc_auto.html"
VNCSERVER_LISTEN=0.0.0.0
VNCSERVER_PROXYCLIENT_ADDRESS=$HOST_IP # or HOST_IPV6
NOVA_INSTANCES_PATH=/opt/stack/data/instances
# Enable the Ceilometer plugin for the compute agent
enable_plugin ceilometer https://opendev.org/openstack/ceilometer
disable_service ceilometer-acentral,ceilometer-collector,ceilometer-api
LOGFILE=$DEST/logs/stack.sh.log
LOGDAYS=2
[[post-config|$NOVA_CONF]]
[DEFAULT]
compute_monitors=cpu.virt_driver
[notifications]
# Enable both versioned and unversioned notifications. Watcher only
# uses versioned notifications but ceilometer uses unversioned. We
# can change this to just versioned when ceilometer handles versioned
# notifications from nova: https://bugs.launchpad.net/ceilometer/+bug/1665449
notification_format=both

View File

@@ -0,0 +1,57 @@
# Sample ``local.conf`` for controller node for Watcher development
# NOTE: Copy this file to the root DevStack directory for it to work properly.
[[local|localrc]]
ADMIN_PASSWORD=nomoresecrete
DATABASE_PASSWORD=stackdb
RABBIT_PASSWORD=stackqueue
SERVICE_PASSWORD=$ADMIN_PASSWORD
SERVICE_TOKEN=azertytoken
HOST_IP=192.168.42.1 # Change this to your controller node IP address
#HOST_IPV6=2001:db8::7
FLAT_INTERFACE=eth0
FIXED_RANGE=10.254.1.0/24 # Change this to whatever your network is
NETWORK_GATEWAY=10.254.1.1 # Change this for your network
MULTI_HOST=1
#Set this to FALSE if you do not want to run watcher-api behind mod-wsgi
#WATCHER_USE_MOD_WSGI=TRUE
# This is the controller node, so disable nova-compute
disable_service n-cpu
# Enable the Watcher Dashboard plugin
enable_plugin watcher-dashboard https://opendev.org/openstack/watcher-dashboard
# Enable the Watcher plugin
enable_plugin watcher https://opendev.org/openstack/watcher
# Enable the Ceilometer plugin
enable_plugin ceilometer https://opendev.org/openstack/ceilometer
# This is the controller node, so disable the ceilometer compute agent
disable_service ceilometer-acompute
# Enable the ceilometer api explicitly(bug:1667678)
enable_service ceilometer-api
# Enable the Gnocchi plugin
enable_plugin gnocchi https://github.com/gnocchixyz/gnocchi
LOGFILE=$DEST/logs/stack.sh.log
LOGDAYS=2
[[post-config|$NOVA_CONF]]
[DEFAULT]
compute_monitors=cpu.virt_driver
[notifications]
# Enable both versioned and unversioned notifications. Watcher only
# uses versioned notifications but ceilometer uses unversioned. We
# can change this to just versioned when ceilometer handles versioned
# notifications from nova: https://bugs.launchpad.net/ceilometer/+bug/1665449
notification_format=both

View File

@@ -1,5 +1,3 @@
#!/bin/bash
#
# plugin.sh - DevStack plugin script to install watcher
# Save trace setting
@@ -38,6 +36,9 @@ if is_service_enabled watcher-api watcher-decision-engine watcher-applier; then
# Start the watcher components
echo_summary "Starting watcher"
start_watcher
elif [[ "$1" == "stack" && "$2" == "test-config" ]]; then
echo_summary "Configuring tempest for watcher"
configure_tempest_for_watcher
fi
if [[ "$1" == "unstack" ]]; then

devstack/prometheus.yml Normal file
View File

@@ -0,0 +1,16 @@
global:
scrape_interval: 10s
scrape_configs:
- job_name: "node"
static_configs:
- targets: ["controller:3000"]
- targets: ["controller:9100"]
labels:
fqdn: "controller" # change the hostname here to your controller hostname
- targets: ["compute-1:9100"]
labels:
fqdn: "compute-1" # change the hostname here to your first compute hostname
- targets: ["compute-2:9100"]
labels:
fqdn: "compute-2" # change the hostname here to your second compute hostname
# add as many blocks as compute nodes you have
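The file ends by asking the reader to add one target block per compute node. For larger deployments, a small generator can produce those blocks. The sketch below is an illustration only: the function name is hypothetical, it emits just the node-exporter targets on port 9100 with their `fqdn` labels, and it omits the extra `controller:3000` target that the sample file also lists.

```python
def node_scrape_config(hostnames, port=9100, interval="10s"):
    """Render a prometheus.yml with one labelled target per hostname."""
    lines = [
        "global:",
        f"  scrape_interval: {interval}",
        "scrape_configs:",
        '  - job_name: "node"',
        "    static_configs:",
    ]
    for host in hostnames:
        lines += [
            f'      - targets: ["{host}:{port}"]',
            "        labels:",
            f'          fqdn: "{host}"',
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(node_scrape_config(["controller", "compute-1", "compute-2"]), end="")
```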

View File

@@ -1,5 +1,3 @@
#!/usr/bin/env bash
# ``upgrade-watcher``
function configure_watcher_upgrade {

View File

@@ -70,7 +70,7 @@ then write_uwsgi_config "$WATCHER_UWSGI_CONF" "$WATCHER_UWSGI" "/infra-optim"
fi
# Migrate the database
watcher-db-manage upgrade || die $LINO "DB migration error"
$WATCHER_BIN_DIR/watcher-db-manage upgrade || die $LINENO "DB migration error"
start_watcher

doc/dictionary.txt Normal file
View File

@@ -0,0 +1,4 @@
thirdparty
assertin
notin

View File

@@ -52,7 +52,7 @@ class BaseWatcherDirective(rst.Directive):
obj_raw_docstring = obj.__init__.__doc__
if not obj_raw_docstring:
# Raise a warning to make the tests fail wit doc8
# Raise a warning to make the tests fail with doc8
raise self.error("No docstring available for %s!" % obj)
obj_docstring = inspect.cleandoc(obj_raw_docstring)

View File

@@ -14,6 +14,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "CANCELLED",
"status_message": null,
"action_plan": {
"watcher_object.namespace": "watcher",
"watcher_object.version": "1.0",
@@ -24,6 +25,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "CANCELLING",
"status_message": null,
"audit_uuid": "10a47dd1-4874-4298-91cf-eff046dbdb8d",
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"deleted_at": null

View File

@@ -24,6 +24,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "FAILED",
"status_message": null,
"action_plan": {
"watcher_object.namespace": "watcher",
"watcher_object.version": "1.0",
@@ -34,6 +35,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "CANCELLING",
"status_message": null,
"audit_uuid": "10a47dd1-4874-4298-91cf-eff046dbdb8d",
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"deleted_at": null

View File

@@ -14,6 +14,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "CANCELLING",
"status_message": null,
"action_plan": {
"watcher_object.namespace": "watcher",
"watcher_object.version": "1.0",
@@ -24,6 +25,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "CANCELLING",
"status_message": null,
"audit_uuid": "10a47dd1-4874-4298-91cf-eff046dbdb8d",
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"deleted_at": null

View File

@@ -13,6 +13,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "PENDING",
"status_message": null,
"action_plan": {
"watcher_object.namespace": "watcher",
"watcher_object.version": "1.0",
@@ -23,6 +24,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "ONGOING",
"status_message": null,
"audit_uuid": "10a47dd1-4874-4298-91cf-eff046dbdb8d",
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"deleted_at": null

View File

@@ -13,6 +13,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "DELETED",
"status_message": null,
"action_plan": {
"watcher_object.namespace": "watcher",
"watcher_object.version": "1.0",
@@ -23,6 +24,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "ONGOING",
"status_message": null,
"audit_uuid": "10a47dd1-4874-4298-91cf-eff046dbdb8d",
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"deleted_at": null

View File

@@ -14,6 +14,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "SUCCEEDED",
"status_message": null,
"action_plan": {
"watcher_object.namespace": "watcher",
"watcher_object.version": "1.0",
@@ -24,6 +25,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "ONGOING",
"status_message": null,
"audit_uuid": "10a47dd1-4874-4298-91cf-eff046dbdb8d",
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"deleted_at": null

View File

@@ -24,6 +24,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "FAILED",
"status_message": "Action execution failed",
"action_plan": {
"watcher_object.namespace": "watcher",
"watcher_object.version": "1.0",
@@ -34,6 +35,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "ONGOING",
"status_message": null,
"audit_uuid": "10a47dd1-4874-4298-91cf-eff046dbdb8d",
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"deleted_at": null

View File

@@ -14,6 +14,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "ONGOING",
"status_message": null,
"action_plan": {
"watcher_object.namespace": "watcher",
"watcher_object.version": "1.0",
@@ -24,6 +25,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "ONGOING",
"status_message": null,
"audit_uuid": "10a47dd1-4874-4298-91cf-eff046dbdb8d",
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"deleted_at": null

View File

@@ -18,10 +18,12 @@
"watcher_object.name": "ActionStateUpdatePayload",
"watcher_object.data": {
"old_state": "PENDING",
"state": "ONGOING"
"state": "ONGOING",
"status_message": null
}
},
"state": "ONGOING",
"status_message": null,
"action_plan": {
"watcher_object.namespace": "watcher",
"watcher_object.version": "1.0",
@@ -32,6 +34,7 @@
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null,
"state": "ONGOING",
"status_message": null,
"audit_uuid": "10a47dd1-4874-4298-91cf-eff046dbdb8d",
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"deleted_at": null

View File

@@ -21,6 +21,7 @@
"scope": [],
"audit_type": "ONESHOT",
"state": "SUCCEEDED",
"status_message": null,
"parameters": {},
"interval": null,
"updated_at": null
@@ -29,6 +30,7 @@
"uuid": "76be87bd-3422-43f9-93a0-e85a577e3061",
"fault": null,
"state": "CANCELLED",
"status_message": null,
"global_efficacy": [],
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"strategy": {

View File

@@ -52,13 +52,15 @@
"scope": [],
"updated_at": null,
"audit_type": "ONESHOT",
"status_message": null,
"interval": null,
"deleted_at": null,
"state": "SUCCEEDED"
}
},
"global_efficacy": [],
"state": "CANCELLING"
"state": "CANCELLING",
"status_message": null
}
},
"timestamp": "2016-10-18 09:52:05.219414"

View File

@@ -21,6 +21,7 @@
"scope": [],
"audit_type": "ONESHOT",
"state": "SUCCEEDED",
"status_message": null,
"parameters": {},
"interval": null,
"updated_at": null
@@ -29,6 +30,7 @@
"uuid": "76be87bd-3422-43f9-93a0-e85a577e3061",
"fault": null,
"state": "CANCELLING",
"status_message": null,
"global_efficacy": [],
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"strategy": {

View File

@@ -33,6 +33,7 @@
"interval": null,
"deleted_at": null,
"state": "PENDING",
"status_message": null,
"created_at": "2016-10-18T09:52:05Z",
"updated_at": null
},
@@ -43,6 +44,7 @@
"global_efficacy": {},
"deleted_at": null,
"state": "RECOMMENDED",
"status_message": null,
"updated_at": null
},
"watcher_object.namespace": "watcher",

View File

@@ -18,6 +18,7 @@
"updated_at": null,
"deleted_at": null,
"state": "PENDING",
"status_message": null,
"created_at": "2016-10-18T09:52:05Z",
"parameters": {}
},
@@ -43,7 +44,8 @@
"watcher_object.name": "StrategyPayload",
"watcher_object.namespace": "watcher"
},
"state": "DELETED"
"state": "DELETED",
"status_message": null
},
"watcher_object.version": "1.0",
"watcher_object.name": "ActionPlanDeletePayload",

View File

@@ -22,6 +22,7 @@
"scope": [],
"audit_type": "ONESHOT",
"state": "SUCCEEDED",
"status_message": null,
"parameters": {},
"interval": null,
"updated_at": null
@@ -30,6 +31,7 @@
"uuid": "76be87bd-3422-43f9-93a0-e85a577e3061",
"fault": null,
"state": "ONGOING",
"status_message": null,
"global_efficacy": [],
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"strategy": {

View File

@@ -55,11 +55,13 @@
"audit_type": "ONESHOT",
"interval": null,
"deleted_at": null,
"state": "PENDING"
"state": "PENDING",
"status_message": null
}
},
"global_efficacy": [],
"state": "ONGOING"
"state": "ONGOING",
"status_message": null
}
},
"timestamp": "2016-10-18 09:52:05.219414"

View File

@@ -22,6 +22,7 @@
"scope": [],
"audit_type": "ONESHOT",
"state": "PENDING",
"status_message": null,
"parameters": {},
"interval": null,
"updated_at": null
@@ -30,6 +31,7 @@
"uuid": "76be87bd-3422-43f9-93a0-e85a577e3061",
"fault": null,
"state": "ONGOING",
"status_message": null,
"global_efficacy": [],
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"strategy": {

View File

@@ -16,6 +16,7 @@
"interval": null,
"updated_at": null,
"state": "PENDING",
"status_message": null,
"deleted_at": null,
"parameters": {}
},
@@ -35,6 +36,7 @@
"watcher_object.name": "ActionPlanStateUpdatePayload"
},
"state": "ONGOING",
"status_message": null,
"deleted_at": null,
"strategy_uuid": "cb3d0b58-4415-4d90-b75b-1e96878730e3",
"strategy": {

View File

@@ -9,6 +9,7 @@
"para1": 3.2
},
"state": "PENDING",
"status_message": null,
"updated_at": null,
"deleted_at": null,
"goal_uuid": "bc830f84-8ae3-4fc6-8bc6-e3dd15e8b49a",

View File

@@ -9,6 +9,7 @@
"para1": 3.2
},
"state": "DELETED",
"status_message": null,
"updated_at": null,
"deleted_at": null,
"goal_uuid": "bc830f84-8ae3-4fc6-8bc6-e3dd15e8b49a",

View File

@@ -9,6 +9,7 @@
"para1": 3.2
},
"state": "ONGOING",
"status_message": null,
"updated_at": null,
"deleted_at": null,
"fault": null,

View File

@@ -9,6 +9,7 @@
"para1": 3.2
},
"state": "ONGOING",
"status_message": null,
"updated_at": null,
"deleted_at": null,
"fault": {

View File

@@ -9,6 +9,7 @@
"para1": 3.2
},
"state": "ONGOING",
"status_message": null,
"updated_at": null,
"deleted_at": null,
"fault": null,

View File

@@ -9,6 +9,7 @@
"para1": 3.2
},
"state": "ONGOING",
"status_message": null,
"updated_at": null,
"deleted_at": null,
"fault": null,

View File

@@ -9,6 +9,7 @@
"para1": 3.2
},
"state": "ONGOING",
"status_message": null,
"updated_at": null,
"deleted_at": null,
"fault": {

View File

@@ -9,6 +9,7 @@
"para1": 3.2
},
"state": "ONGOING",
"status_message": null,
"updated_at": null,
"deleted_at": null,
"fault": null,

View File

@@ -70,6 +70,7 @@
"interval": null,
"updated_at": null,
"state": "ONGOING",
"status_message": null,
"audit_type": "ONESHOT"
},
"watcher_object.namespace": "watcher",

View File

@@ -1,10 +1,10 @@
# The order of packages is significant, because pip processes them in the order
# of appearance. Changing the order has an impact on the overall integration
# process, which may cause wedges in the gate later.
openstackdocstheme>=2.2.1 # Apache-2.0
sphinx>=2.0.0,!=2.1.0 # BSD
sphinxcontrib-pecanwsme>=0.8.0 # Apache-2.0
sphinx>=2.1.1 # BSD
sphinxcontrib-svg2pdfconverter>=0.1.0 # BSD
reno>=3.1.0 # Apache-2.0
sphinxcontrib-pecanwsme>=0.8.0 # Apache-2.0
sphinxcontrib-apidoc>=0.2.0 # BSD
# openstack
os-api-ref>=1.4.0 # Apache-2.0
openstackdocstheme>=2.2.1 # Apache-2.0
# releasenotes
reno>=3.1.0 # Apache-2.0

View File

@@ -34,7 +34,7 @@ own sections. However, the base *GMR* consists of several sections:
Package
Shows information about the package to which this process belongs, including
version informations.
version information.
Threads
Shows stack traces and thread ids for each of the threads within this

View File

@@ -285,7 +285,7 @@ Audit and interval (in case of CONTINUOUS type). There is three types of Audit:
ONESHOT, CONTINUOUS and EVENT. ONESHOT Audit is launched once and if it
succeeded executed new action plan list will be provided; CONTINUOUS Audit
creates action plans with specified interval (in seconds or cron format, cron
inteval can be used like: `*/5 * * * *`), if action plan
interval can be used like: ``*/5 * * * *``), if action plan
has been created, all previous action plans get CANCELLED state;
EVENT audit is launched when receiving webhooks API.
@@ -384,7 +384,9 @@ following methods of the :ref:`Action <action_definition>` handler:
- **preconditions()**: this method will make sure that all conditions are met
before executing the action (for example, it makes sure that an instance
still exists before trying to migrate it).
still exists before trying to migrate it). If action specific preconditions
are not met in this phase, the Action is set to **SKIPPED** state and will
not be executed.
- **execute()**: this method is what triggers real commands on other
OpenStack services (such as Nova, ...) in order to change target resource
state. If the action is successfully executed, a notification message is
@@ -479,6 +481,39 @@ change to a new value:
.. image:: ./images/action_plan_state_machine.png
:width: 100%
.. _action_state_machine:
Action State Machine
-------------------------
An :ref:`Action <action_definition>` has a life-cycle and its current state may
be one of the following:
- **PENDING** : the :ref:`Action <action_definition>` has not been executed
yet by the :ref:`Watcher Applier <watcher_applier_definition>`
- **SKIPPED** : the :ref:`Action <action_definition>` will not be executed
because a predefined skipping condition is found by
:ref:`Watcher Applier <watcher_applier_definition>` or is explicitly
skipped by the :ref:`Administrator <administrator_definition>`.
- **ONGOING** : the :ref:`Action <action_definition>` is currently being
processed by the :ref:`Watcher Applier <watcher_applier_definition>`
- **SUCCEEDED** : the :ref:`Action <action_definition>` has been executed
successfully
- **FAILED** : an error occurred while trying to execute the
:ref:`Action <action_definition>`
- **DELETED** : the :ref:`Action <action_definition>` is still stored in the
:ref:`Watcher database <watcher_database_definition>` but is not returned
any more through the Watcher APIs.
- **CANCELLED** : the :ref:`Action <action_definition>` was in **PENDING** or
**ONGOING** state and was cancelled by the
:ref:`Administrator <administrator_definition>`
The following diagram shows the different possible states of an
:ref:`Action <action_definition>` and what event makes the state
change to a new value:
.. image:: ./images/action_state_machine.png
:width: 100%
.. _Watcher API: https://docs.openstack.org/api-ref/resource-optimization/
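
The Action states listed above can be modelled as a small enumeration. This is an illustrative sketch only: `ActionState`, `CANCELLABLE` and `can_cancel` are hypothetical names, not Watcher's own classes, and the only rule encoded is the one stated in the text (an Action can be cancelled from **PENDING** or **ONGOING**). The full set of transitions is defined by the state machine diagram and is not reproduced here.

```python
import enum


class ActionState(enum.Enum):
    """The seven Action states named in the documentation above."""
    PENDING = "PENDING"
    SKIPPED = "SKIPPED"
    ONGOING = "ONGOING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    DELETED = "DELETED"
    CANCELLED = "CANCELLED"


# Per the text, CANCELLED is reached only from PENDING or ONGOING.
CANCELLABLE = frozenset({ActionState.PENDING, ActionState.ONGOING})


def can_cancel(state):
    """Return True if an Action in `state` may still be cancelled."""
    return state in CANCELLABLE
```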

22
doc/source/conf.py Executable file → Normal file
View File

@@ -56,8 +56,8 @@ source_suffix = '.rst'
master_doc = 'index'
# General information about the project.
project = u'Watcher'
copyright = u'OpenStack Foundation'
project = 'Watcher'
copyright = 'OpenStack Foundation'
# A list of ignored prefixes for module index sorting.
modindex_common_prefix = ['watcher.']
@@ -91,14 +91,14 @@ pygments_style = 'native'
# List of tuples 'sourcefile', 'target', u'title', u'Authors name', 'manual'
man_pages = [
('man/watcher-api', 'watcher-api', u'Watcher API Server',
[u'OpenStack'], 1),
('man/watcher-applier', 'watcher-applier', u'Watcher Applier',
[u'OpenStack'], 1),
('man/watcher-api', 'watcher-api', 'Watcher API Server',
['OpenStack'], 1),
('man/watcher-applier', 'watcher-applier', 'Watcher Applier',
['OpenStack'], 1),
('man/watcher-db-manage', 'watcher-db-manage',
u'Watcher Db Management Utility', [u'OpenStack'], 1),
'Watcher Db Management Utility', ['OpenStack'], 1),
('man/watcher-decision-engine', 'watcher-decision-engine',
u'Watcher Decision Engine', [u'OpenStack'], 1),
'Watcher Decision Engine', ['OpenStack'], 1),
]
# -- Options for HTML output --------------------------------------------------
@@ -115,7 +115,7 @@ html_theme = 'openstackdocs'
htmlhelp_basename = '%sdoc' % project
#openstackdocstheme options
# openstackdocstheme options
openstackdocs_repo_name = 'openstack/watcher'
openstackdocs_pdf_link = True
openstackdocs_auto_name = False
@@ -128,8 +128,8 @@ openstackdocs_bug_tag = ''
latex_documents = [
('index',
'doc-watcher.tex',
u'Watcher Documentation',
u'OpenStack Foundation', 'manual'),
'Watcher Documentation',
'OpenStack Foundation', 'manual'),
]
# If false, no module index is generated.

View File

@@ -194,11 +194,14 @@ The configuration file is organized into the following sections:
* ``[watcher_applier]`` - Watcher Applier module configuration
* ``[watcher_decision_engine]`` - Watcher Decision Engine module configuration
* ``[oslo_messaging_rabbit]`` - Oslo Messaging RabbitMQ driver configuration
* ``[ceilometer_client]`` - Ceilometer client configuration
* ``[cinder_client]`` - Cinder client configuration
* ``[glance_client]`` - Glance client configuration
* ``[gnocchi_client]`` - Gnocchi client configuration
* ``[ironic_client]`` - Ironic client configuration
* ``[keystone_client]`` - Keystone client configuration
* ``[nova_client]`` - Nova client configuration
* ``[neutron_client]`` - Neutron client configuration
* ``[placement_client]`` - Placement client configuration
The Watcher configuration file is expected to be named
``watcher.conf``. When starting Watcher, you can specify a different
@@ -372,7 +375,7 @@ You can configure and install Ceilometer by following the documentation below :
#. https://docs.openstack.org/ceilometer/latest
The built-in strategy 'basic_consolidation' provided by watcher requires
"**compute.node.cpu.percent**" and "**cpu_util**" measurements to be collected
"**compute.node.cpu.percent**" and "**cpu**" measurements to be collected
by Ceilometer.
The measurements available depend on the hypervisors that OpenStack manages on
the specific implementation.
@@ -426,20 +429,38 @@ Configure Cinder Notifications
Watcher can also consume notifications generated by the Cinder services, in
order to build or update, in real time, its cluster data model related to
storage resources. To do so, you have to update the Cinder configuration
file on controller and volume nodes, in order to let Watcher receive Cinder
notifications in a dedicated ``watcher_notifications`` channel.
storage resources.
* In the file ``/etc/cinder/cinder.conf``, update the section
``[oslo_messaging_notifications]``, by redefining the list of topics
into which Cinder services will publish events ::
Cinder emits notifications on the ``notifications`` topic, in the openstack
control exchange (as can be seen in the `Cinder conf`_).
* In the file ``/etc/cinder/cinder.conf``, the value of driver in the section
``[oslo_messaging_notifications]`` can't be noop.
[oslo_messaging_notifications]
driver = messagingv2
topics = notifications,watcher_notifications
* Restart the Cinder services.
.. _`Cinder conf`: https://docs.openstack.org/cinder/latest/configuration/block-storage/samples/cinder.conf.html
Configure Watcher listening to the Notifications
================================================
To consume Cinder or Nova notifications (or both), Watcher must be
configured to listen to the notifications topics that Cinder and Nova emit.
Use the `notification_topics`_ config option to indicate to Watcher that it
should listen to the correct topics. By default, Cinder emits notifications
on ``openstack.notifications``, while Nova emits notifications on
``nova.versioned_notifications``. The Watcher conf should list the topics for
the desired notifications. Below is an example for both Cinder and Nova::
[watcher_decision_engine]
...
notification_topics = nova.versioned_notifications,openstack.notifications
.. _`notification_topics`: https://docs.openstack.org/watcher/latest/configuration/watcher.html#watcher_decision_engine.notification_topics
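
A quick way to check that a ``watcher.conf`` lists the expected topics is to parse it and compare. The sketch below is an assumption-laden illustration: it uses Python's standard ``configparser`` as a stand-in for oslo.config, and the helper names `configured_topics` and `missing_topics` are hypothetical.

```python
import configparser

# Sample content mirroring the example above.
SAMPLE = """
[watcher_decision_engine]
notification_topics = nova.versioned_notifications,openstack.notifications
"""


def configured_topics(conf_text):
    """Return the topics listed in [watcher_decision_engine]."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get(
        "watcher_decision_engine", "notification_topics", fallback="")
    return [topic.strip() for topic in raw.split(",") if topic.strip()]


def missing_topics(conf_text, wanted):
    """Return the wanted topics that the configuration does not list."""
    have = set(configured_topics(conf_text))
    return [topic for topic in wanted if topic not in have]


if __name__ == "__main__":
    print(missing_topics(SAMPLE, ["openstack.notifications"]))
```

An empty result means every wanted topic is configured. The real service reads its options through oslo.config, so this check only approximates how the file is interpreted.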
Workers
=======

View File

@@ -52,18 +52,43 @@ types of concurrency used in various services of Watcher.
.. _wait_for_any: https://docs.openstack.org/futurist/latest/reference/index.html#waiters
Concurrency modes
#################
Eventlet has been the main concurrency library within the OpenStack community
for the last 10 years since the removal of twisted. Over the last few years,
the maintenance of eventlet has decreased, and the efforts to remove the GIL
from Python (PEP 703) have fundamentally changed how concurrency works, making
eventlet no longer viable. While transitioning to a new native thread
solution, Watcher services will be supporting both modes, with the usage of
native threading mode initially classified as ``experimental``.
It is possible to enable the new native threading mode by setting the following
environment variable in the corresponding service configuration:
.. code:: bash
OS_WATCHER_DISABLE_EVENTLET_PATCHING=true
.. note::
The only service that supports two different concurrency modes is the
``decision engine``.
Decision engine concurrency
***************************
The concurrency in the decision engine is governed by two independent
threadpools. Both of these threadpools are GreenThreadPoolExecutor_ from the
futurist_ library. One of these is used automatically and most contributors
threadpools. These threadpools can be configured as GreenThreadPoolExecutor_
or ThreadPoolExecutor_, both from the futurist_ library, depending on the
service configuration. One of these is used automatically and most contributors
will not interact with it while developing new features. The other threadpool
can frequently be used while developing new features or updating existing ones.
It is known as the DecisionEngineThreadpool and allows to achieve performance
improvements in network or I/O bound operations.
.. _GreenThreadPoolExecutor: https://docs.openstack.org/futurist/latest/reference/index.html#executors
.. _GreenThreadPoolExecutor: https://docs.openstack.org/futurist/latest/reference/index.html#futurist.GreenThreadPoolExecutor
.. _ThreadPoolExecutor: https://docs.openstack.org/futurist/latest/reference/index.html#futurist.ThreadPoolExecutor
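
To illustrate why a thread pool helps with network or I/O bound operations, here is a minimal sketch using native threads. It relies on the standard library's ``concurrent.futures.ThreadPoolExecutor`` as a stand-in for the futurist executors, and ``fetch_metric`` and ``collect`` are hypothetical helpers that simulate a slow datasource call. It is not the DecisionEngineThreadpool API.

```python
import concurrent.futures
import time


def fetch_metric(host):
    """Simulate an I/O bound datasource query for one host."""
    time.sleep(0.05)  # placeholder for network latency
    return host, 42.0  # made-up metric value, for illustration only


def collect(hosts, max_workers=4):
    """Query all hosts concurrently and return {host: value}."""
    with concurrent.futures.ThreadPoolExecutor(
            max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_metric, host) for host in hosts]
        # result() blocks until each submitted call has finished.
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    print(collect(["compute-1", "compute-2", "compute-3"]))
```

Because the simulated calls spend their time waiting, the queries overlap instead of running back to back, which is the kind of gain described above.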
AuditEndpoint
#############
@@ -221,7 +246,7 @@ workflow engine can halt or take other actions while the action plan is being
executed based on the success or failure of individual actions. However, the
base workflow engine simply uses these notifies to store the result of
individual actions in the database. Additionally, since taskflow uses a graph
flow if any of the tasks would fail all childs of this tasks not be executed
flow, if any task fails, the children of that task will not be executed
while ``do_revert`` will be triggered for all parents.
.. code-block:: python

View File

@@ -16,7 +16,7 @@ multinode environment to use.
You can set up the Watcher services quickly and easily using a Watcher
DevStack plugin. See `PluginModelDocs`_ for information on DevStack's plugin
model. To enable the Watcher plugin with DevStack, add the following to the
`[[local|localrc]]` section of your controller's `local.conf` to enable the
``[[local|localrc]]`` section of your controller's ``local.conf`` to enable the
Watcher plugin::
enable_plugin watcher https://opendev.org/openstack/watcher
@@ -31,66 +31,104 @@ Quick Devstack Instructions with Datasources
============================================
Watcher requires a datasource to collect metrics from compute nodes and
instances in order to execute most strategies. To enable this a
`[[local|localrc]]` to setup DevStack for some of the supported datasources
is provided. These examples specify the minimal configuration parameters to
get both Watcher and the datasource working but can be expanded is desired.
instances in order to execute most strategies. To enable this, two example
``[[local|localrc]]`` sections that set up DevStack for some of the
supported datasources are provided. These examples specify the minimal
configuration parameters to get both Watcher and the datasource working
but can be expanded if desired.
The first example configures watcher to use prometheus as a datasource, while
the second example shows how to use gnocchi as the datasource. The procedure is
equivalent, it just requires using the ``local.conf.controller`` and
``local.conf.compute`` in the first example and
``local_gnocchi.conf.controller`` and ``local_gnocchi.conf.compute`` in the
second.
Prometheus
----------
With the Prometheus datasource most of the metrics for compute nodes and
instances will work with the provided configuration but metrics that
require Ironic such as ``host_airflow`` and ``host_power`` will still be
unavailable as well as ``instance_l3_cpu_cache``.
.. code-block:: ini
[[local|localrc]]
enable_plugin watcher https://opendev.org/openstack/watcher
enable_plugin watcher-dashboard https://opendev.org/openstack/watcher-dashboard
enable_plugin ceilometer https://opendev.org/openstack/ceilometer.git
enable_plugin aodh https://opendev.org/openstack/aodh
enable_plugin devstack-plugin-prometheus https://opendev.org/openstack/devstack-plugin-prometheus
enable_plugin sg-core https://github.com/openstack-k8s-operators/sg-core main
CEILOMETER_BACKEND=sg-core
[[post-config|$NOVA_CONF]]
[DEFAULT]
compute_monitors=cpu.virt_driver
Gnocchi
-------
With the Gnocchi datasource most of the metrics for compute nodes and
instances will work with the provided configuration but metrics that
require Ironic such as `host_airflow and` `host_power` will still be
unavailable as well as `instance_l3_cpu_cache`::
require Ironic such as ``host_airflow`` and ``host_power`` will still be
unavailable as well as ``instance_l3_cpu_cache``.
[[local|localrc]]
enable_plugin watcher https://opendev.org/openstack/watcher
.. code-block:: ini
enable_plugin watcher-dashboard https://opendev.org/openstack/watcher-dashboard
[[local|localrc]]
enable_plugin ceilometer https://opendev.org/openstack/ceilometer.git
CEILOMETER_BACKEND=gnocchi
enable_plugin watcher https://opendev.org/openstack/watcher
enable_plugin watcher-dashboard https://opendev.org/openstack/watcher-dashboard
enable_plugin ceilometer https://opendev.org/openstack/ceilometer.git
enable_plugin aodh https://opendev.org/openstack/aodh
enable_plugin panko https://opendev.org/openstack/panko
enable_plugin aodh https://opendev.org/openstack/aodh
enable_plugin panko https://opendev.org/openstack/panko
[[post-config|$NOVA_CONF]]
[DEFAULT]
compute_monitors=cpu.virt_driver
CEILOMETER_BACKEND=gnocchi
[[post-config|$NOVA_CONF]]
[DEFAULT]
compute_monitors=cpu.virt_driver
Detailed DevStack Instructions
==============================
#. Obtain N (where N >= 1) servers (virtual machines preferred for DevStack).
One of these servers will be the controller node while the others will be
compute nodes. N is preferably >= 3 so that you have at least 2 compute
nodes, but in order to stand up the Watcher services only 1 server is
needed (i.e., no computes are needed if you want to just experiment with
the Watcher services). These servers can be VMs running on your local
machine via VirtualBox if you prefer. DevStack currently recommends that
you use Ubuntu 16.04 LTS. The servers should also have connections to the
same network such that they are all able to communicate with one another.
#. Obtain N (where N >= 1) servers (virtual machines preferred for DevStack).
One of these servers will be the controller node while the others will be
compute nodes. N is preferably >= 3 so that you have at least 2 compute
nodes, but in order to stand up the Watcher services only 1 server is
needed (i.e., no computes are needed if you want to just experiment with
the Watcher services). These servers can be VMs running on your local
machine via VirtualBox if you prefer. DevStack currently recommends that
you use Ubuntu 16.04 LTS. The servers should also have connections to the
same network such that they are all able to communicate with one another.
#. For each server, clone the DevStack repository and create the stack user::
#. For each server, clone the DevStack repository and create the stack user
sudo apt-get update
sudo apt-get install git
git clone https://opendev.org/openstack/devstack.git
sudo ./devstack/tools/create-stack-user.sh
.. code-block:: bash
sudo apt-get update
sudo apt-get install git
git clone https://opendev.org/openstack/devstack.git
sudo ./devstack/tools/create-stack-user.sh
Now you have a stack user that is used to run the DevStack processes. You
may want to give your stack user a password to allow SSH via a password::
may want to give your stack user a password to allow SSH via a password
sudo passwd stack
.. code-block:: bash
#. Switch to the stack user and clone the DevStack repo again::
sudo passwd stack
sudo su stack
cd ~
git clone https://opendev.org/openstack/devstack.git
#. Switch to the stack user and clone the DevStack repo again
#. For each compute node, copy the provided `local.conf.compute`_ example file
.. code-block:: bash
sudo su stack
cd ~
git clone https://opendev.org/openstack/devstack.git
#. For each compute node, copy the provided `local.conf.compute`_
(`local_gnocchi.conf.compute`_ if deploying with gnocchi) example file
to the compute node's system at ~/devstack/local.conf. Make sure the
HOST_IP and SERVICE_HOST values are changed appropriately - i.e., HOST_IP
is set to the IP address of the compute node and SERVICE_HOST is set to the
@@ -106,29 +144,47 @@ Detailed DevStack Instructions
to configure similar configuration options for the projects providing those
metrics.
#. For the controller node, copy the provided `local.conf.controller`_ example
#. For the controller node, copy the provided `local.conf.controller`_
(`local_gnocchi.conf.controller`_ if deploying with gnocchi) example
file to the controller node's system at ~/devstack/local.conf. Make sure
the HOST_IP value is changed appropriately - i.e., HOST_IP is set to the IP
address of the controller node.
Note: if you want to use another Watcher git repository (such as a local
one), then change the enable plugin line::
.. NOTE::
if you want to use another Watcher git repository (such as a local
one), then change the enable plugin line
.. code-block:: bash
enable_plugin watcher <your_local_git_repo> [optional_branch]
enable_plugin watcher <your_local_git_repo> [optional_branch]
If you do this, then the Watcher DevStack plugin will try to pull the
python-watcherclient repo from <your_local_git_repo>/../, so either make
sure that is also available or specify WATCHERCLIENT_REPO in the local.conf
python-watcherclient repo from ``<your_local_git_repo>/../``, so either make
sure that is also available or specify WATCHERCLIENT_REPO in the ``local.conf``
file.
Note: if you want to use a specific branch, specify WATCHER_BRANCH in the
local.conf file. By default it will use the master branch.
.. NOTE::
if you want to use a specific branch, specify WATCHER_BRANCH in the
local.conf file. By default it will use the master branch.
Note: watcher-api will default run under apache/httpd, set the variable
WATCHER_USE_MOD_WSGI=FALSE if you do not wish to run under apache/httpd.
For development environment it is suggested to set WATHCER_USE_MOD_WSGI
to FALSE. For Production environment it is suggested to keep it at the
default TRUE value.
.. Note::
By default, watcher-api runs under apache/httpd. Set the variable
WATCHER_USE_MOD_WSGI=FALSE if you do not wish to run under apache/httpd.
For development environments it is suggested to set WATCHER_USE_MOD_WSGI
to FALSE. For production environments it is suggested to keep it at the
default TRUE value.
#. If you want to use prometheus as a datasource, you need to provide a
Prometheus configuration with the compute nodes set as targets, so
it can consume their node-exporter metrics (if you are deploying watcher
with gnocchi as datasource you can skip this step altogether). Copy the
provided `prometheus.yml`_ example file and set the appropriate hostnames
for all the compute nodes (the example configures 2 of them plus the
controller, but you should add all of them if using more than 2 compute
nodes). In the local.conf file, set ``PROMETHEUS_CONFIG_FILE`` to the path
of the file you created (the sample local.conf file uses ``$DEST`` as the
default value for the prometheus config path).
#. Start stacking from the controller node::
@@ -136,11 +192,15 @@ Detailed DevStack Instructions
#. Start stacking on each of the compute nodes using the same command.
#. Configure the environment for live migration via NFS. See the
`Multi-Node DevStack Environment`_ section for more details.
.. seealso::
Configure the environment for live migration via NFS. See the
`Multi-Node DevStack Environment`_ section for more details.
.. _local.conf.controller: https://github.com/openstack/watcher/tree/master/devstack/local.conf.controller
.. _local.conf.compute: https://github.com/openstack/watcher/tree/master/devstack/local.conf.compute
.. _local_gnocchi.conf.controller: https://github.com/openstack/watcher/tree/master/devstack/local_gnocchi.conf.controller
.. _local_gnocchi.conf.compute: https://github.com/openstack/watcher/tree/master/devstack/local_gnocchi.conf.compute
.. _prometheus.yml: https://github.com/openstack/watcher/tree/master/devstack/prometheus.yml
Multi-Node DevStack Environment
===============================
@@ -149,60 +209,19 @@ Since deploying Watcher with only a single compute node is not very useful, a
few tips are given here for enabling a multi-node environment with live
migration.
Configuring NFS Server
----------------------
.. NOTE::
If you would like to use live migration for shared storage, then the controller
can serve as the NFS server if needed::
Nova supports live migration with local block storage so by default NFS
is not required and is considered an advanced configuration.
The minimum requirements for live migration are:
sudo apt-get install nfs-kernel-server
sudo mkdir -p /nfs/instances
sudo chown stack:stack /nfs/instances
- all hostnames are resolvable on each host
- all hosts have a passwordless ssh key that is trusted by the other hosts
- all hosts have a known_hosts file that lists each host
Add an entry to `/etc/exports` with the appropriate gateway and netmask
information::
/nfs/instances <gateway>/<netmask>(rw,fsid=0,insecure,no_subtree_check,async,no_root_squash)
Export the NFS directories::
sudo exportfs -ra
Make sure the NFS server is running::
sudo service nfs-kernel-server status
If the server is not running, then start it::
sudo service nfs-kernel-server start
Configuring NFS on Compute Node
-------------------------------
Each compute node needs to use the NFS server to hold the instance data::
sudo apt-get install rpcbind nfs-common
mkdir -p /opt/stack/data/instances
sudo mount <nfs-server-ip>:/nfs/instances /opt/stack/data/instances
If you would like to have the NFS directory automatically mounted on reboot,
then add the following to `/etc/fstab`::
<nfs-server-ip>:/nfs/instances /opt/stack/data/instances nfs auto 0 0
Edit `/etc/libvirt/libvirtd.conf` to make sure the following values are set::
listen_tls = 0
listen_tcp = 1
auth_tcp = "none"
Edit `/etc/default/libvirt-bin`::
libvirtd_opts="-d -l"
Restart the libvirt service::
sudo service libvirt-bin restart
If these requirements are met, live migration will be possible.
Shared storage such as Ceph or NFS, or booting from a Cinder volume, is
recommended when testing evacuate if you want to preserve VM data.
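The first requirement listed above (every hostname resolves on every host) can be checked with a short script before attempting a migration. The following is only a minimal sketch: the function name ``unresolved_hosts`` and the injectable ``resolver`` argument are illustrative assumptions, and the check covers name resolution only, not the SSH key or known_hosts requirements.

```python
import socket


def unresolved_hosts(hostnames, resolver=socket.getaddrinfo):
    """Return the hostnames that cannot be resolved on this machine."""
    missing = []
    for name in hostnames:
        try:
            # The port is irrelevant here; None asks only for an address lookup.
            resolver(name, None)
        except socket.gaierror:
            missing.append(name)
    return missing
```

Run it on each node with the full list of controller and compute hostnames; an empty list means every name resolved.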
Setting up SSH keys between compute nodes to enable live migration
------------------------------------------------------------------
@@ -231,22 +250,91 @@ must exist in every other compute node's stack user's authorized_keys file and
every compute node's public ECDSA key needs to be in every other compute
node's root user's known_hosts file.
Disable serial console
----------------------
Configuring NFS Server (ADVANCED)
---------------------------------
Serial console needs to be disabled for live migration to work.
If you would like to use live migration for shared storage, then the controller
can serve as the NFS server if needed
On both the controller and compute node, in /etc/nova/nova.conf
.. code-block:: bash
[serial_console]
enabled = False
sudo apt-get install nfs-kernel-server
sudo mkdir -p /nfs/instances
sudo chown stack:stack /nfs/instances
Alternatively, in devstack's local.conf:
Add an entry to ``/etc/exports`` with the appropriate gateway and netmask
information
[[post-config|$NOVA_CONF]]
[serial_console]
#enabled=false
.. code-block:: bash
/nfs/instances <gateway>/<netmask>(rw,fsid=0,insecure,no_subtree_check,async,no_root_squash)
Export the NFS directories
.. code-block:: bash
sudo exportfs -ra
Make sure the NFS server is running
.. code-block:: bash
sudo service nfs-kernel-server status
If the server is not running, then start it
.. code-block:: bash
sudo service nfs-kernel-server start
Configuring NFS on Compute Node (ADVANCED)
------------------------------------------
Each compute node needs to use the NFS server to hold the instance data
.. code-block:: bash
sudo apt-get install rpcbind nfs-common
mkdir -p /opt/stack/data/instances
sudo mount <nfs-server-ip>:/nfs/instances /opt/stack/data/instances
If you would like to have the NFS directory automatically mounted on reboot,
then add the following to ``/etc/fstab``
.. code-block:: bash
<nfs-server-ip>:/nfs/instances /opt/stack/data/instances nfs auto 0 0
Configuring libvirt to listen on tcp (ADVANCED)
-----------------------------------------------
.. NOTE::
By default, nova uses SSH as the transport for live migration.
If you have a low-bandwidth connection you can use TCP instead;
however, this is generally not recommended.
Edit ``/etc/libvirt/libvirtd.conf`` to make sure the following values are set
.. code-block:: ini
listen_tls = 0
listen_tcp = 1
auth_tcp = "none"
Edit ``/etc/default/libvirt-bin``
.. code-block:: ini
libvirtd_opts="-d -l"
Restart the libvirt service
.. code-block:: bash
sudo service libvirt-bin restart
VNC server configuration
------------------------
@@ -254,13 +342,18 @@ VNC server configuration
The VNC server listening parameter needs to be set to any address so
that the server can accept connections from all of the compute nodes.
On both the controller and compute node, in /etc/nova/nova.conf
On both the controller and compute node, in ``/etc/nova/nova.conf``
vncserver_listen = 0.0.0.0
.. code-block:: ini
Alternatively, in devstack's local.conf:
[vnc]
server_listen = "0.0.0.0"
VNCSERVER_LISTEN=0.0.0.0
Alternatively, in devstack's ``local.conf``:
.. code-block:: bash
VNCSERVER_LISTEN="0.0.0.0"
Environment final checkup


@@ -43,7 +43,7 @@ different version of the above, please document your configuration here!
Getting the latest code
=======================
Make a clone of the code from our `Git repository`:
Make a clone of the code from our ``Git repository``:
.. code-block:: bash
@@ -72,9 +72,9 @@ These dependencies can be installed from PyPi_ using the Python tool pip_.
.. _PyPi: https://pypi.org/
.. _pip: https://pypi.org/project/pip
However, your system *may* need additional dependencies that `pip` (and by
However, your system *may* need additional dependencies that ``pip`` (and by
extension, PyPi) cannot satisfy. These dependencies should be installed
prior to using `pip`, and the installation method may vary depending on
prior to using ``pip``, and the installation method may vary depending on
your platform.
* Ubuntu 16.04::
@@ -141,7 +141,7 @@ forget to activate it:
$ workon watcher
You should then be able to `import watcher` using Python without issue:
You should then be able to ``import watcher`` using Python without issue:
.. code-block:: bash


@@ -10,3 +10,4 @@ Contribution Guide
devstack
testing
rally_link
release-guide


@@ -300,6 +300,6 @@ Using that you can now query the values for that specific metric:
.. code-block:: py
avg_meter = self.datasource_backend.statistic_aggregation(
instance.uuid, 'cpu_util', self.periods['instance'],
instance.uuid, 'instance_cpu_usage', self.periods['instance'],
self.granularity,
aggregation=self.aggregation_method['instance'])


@@ -0,0 +1,462 @@
..
Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.
Chronological Release Liaison Guide
====================================
This is a reference guide that a release liaison may use as an aid, if
they choose.
Watcher uses the `Distributed Project Leadership (DPL)`__ model where
traditional release liaison responsibilities are distributed among various
liaisons. The release liaison is responsible for requesting releases,
reviewing Feature Freeze Exception (FFE) requests, and coordinating
release-related activities with the team.
.. __: https://governance.openstack.org/tc/reference/distributed-project-leadership.html
How to Use This Guide
---------------------
This guide is organized chronologically to follow the OpenStack release
cycle from PTG planning through post-release activities. You can use it
in two ways:
**For New Release Liaisons**
Read through the entire guide to understand the full release cycle,
then bookmark it for reference during your term.
**For Experienced Release Liaisons**
Jump directly to the relevant section for your current phase in the
release cycle. Each major section corresponds to a specific time period.
**Key Navigation Tips**
* The :ref:`glossary` defines all acronyms and terminology used
* Time-sensitive activities are clearly marked by milestone phases
* DPL coordination notes indicate when team collaboration is required
DPL Liaison Coordination
-------------------------
Under the DPL model, the release liaison coordinates with other project
liaisons and the broader team for effective release management. The release
liaison has authority for release-specific decisions (FFE approvals, release
timing, etc.) while major process changes and strategic decisions require
team consensus.
This coordination approach ensures that:
* Release activities are properly managed by a dedicated liaison
* Team input is gathered for significant decisions
* Other liaisons are informed of release-related developments that may
affect their areas
* Release processes remain responsive while maintaining team alignment
Project Context
---------------
* Coordinate with the watcher meeting (chair rotates each meeting, with
volunteers requested at the end of each meeting)
* Meeting etherpad: https://etherpad.opendev.org/p/openstack-watcher-irc-meeting
* IRC channel: #openstack-watcher
* Get acquainted with the release schedule
* Example: https://releases.openstack.org/<current-release>/schedule.html
* Familiarize yourself with Watcher project repositories and tracking:
Watcher Main Repository
`Primary codebase for the Watcher service <https://opendev.org/openstack/watcher>`__
Watcher Dashboard
`Horizon plugin for Watcher UI <https://opendev.org/openstack/watcher-dashboard>`__
Watcher Tempest Plugin
`Integration tests <https://opendev.org/openstack/watcher-tempest-plugin>`__ (follows tempest cycle)
Python Watcher Client
`Command-line client and Python library <https://opendev.org/openstack/python-watcherclient>`__
Watcher Specifications
`Design specifications <https://opendev.org/openstack/watcher-specs>`__ (not released)
Watcher Launchpad (Main)
`Primary bug and feature tracking <https://launchpad.net/watcher>`__
Watcher Dashboard Launchpad
`Dashboard-specific tracking <https://launchpad.net/watcher-dashboard/>`__
Watcher Tempest Plugin Launchpad
`Test plugin tracking <https://launchpad.net/watcher-tempest-plugin>`__
Python Watcher Client Launchpad
`Client library tracking <https://launchpad.net/python-watcherclient>`__
Project Team Gathering
----------------------
Event Liaison Coordination
~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Work with the project team to select an event liaison for PTG coordination.
The event liaison is responsible for:
* Reserving sufficient space at PTG for the project team's meetings
* Putting out an agenda for team meetings
* Ensuring meetings are organized and facilitated
* Documenting meeting results
* If no event liaison is selected, these duties revert to the release liaison.
* Monitor for OpenStack Events team queries on the mailing list requesting
event liaison volunteers - teams not responding may lose event
representation.
PTG Planning and Execution
~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Create the PTG planning etherpad and the retrospective etherpad, and
announce them in the watcher meeting and on the dev mailing list
* Example: https://etherpad.opendev.org/p/apr2025-ptg-watcher
* Run sessions at the PTG (if no event liaison is selected)
* Do a retro of the previous cycle
* Coordinate with team to establish agreement on the agenda for this release:
Review Days Planning
Determine number of review days allocated for specs and implementation work
Freeze Dates Coordination
Define Spec approval and Feature freeze dates through team collaboration
Release Schedule Modifications
Modify the OpenStack release schedule if needed by proposing new dates
(Example: https://review.opendev.org/c/openstack/releases/+/877094)
* Discuss the implications of the current release being `SLURP or non-SLURP`__
.. __: https://governance.openstack.org/tc/resolutions/20220210-release-cadence-adjustment.html
* Sign up for group photo at the PTG (if applicable)
After PTG
---------
* Send PTG session summaries to the dev mailing list
* Add `RFE bugs`__ if you have action items that are simple to do but
do not have an owner yet.
* Update IRC #openstack-watcher channel topic to point to new
development-planning etherpad.
.. __: https://bugs.launchpad.net/watcher/+bugs?field.tag=rfe
A few weeks before milestone 1
------------------------------
* Plan a spec review day
* Periodically check the series goals others have proposed in the “Set series
goals” link:
* Example: https://blueprints.launchpad.net/watcher/<current-release>/+setgoals
Milestone 1
-----------
* Release watcher and python-watcherclient via the openstack/releases repo.
Watcher follows the `cycle-with-intermediary`__ release model:
.. __: https://releases.openstack.org/reference/release_models.html#cycle-with-intermediary
* Create actual releases (not just launchpad bookkeeping) at milestone points
* No launchpad milestone releases are created for intermediary releases
* When releasing the first version of a library for the cycle, bump
the minor version to leave room for future stable branch releases
* Release stable branches of watcher
Stable Branch Release Process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prepare the stable branch for evaluation:
.. code-block:: bash
git checkout <stable branch>
git log --no-merges <last tag>..
Analyze commits to determine version bump according to semantic versioning.
Semantic Versioning Guidelines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Choose version bump based on changes since last release:
Major Version (X)
Backward-incompatible changes that break existing APIs
Minor Version (Y)
New features that maintain backward compatibility
Patch Version (Z)
Bug fixes that maintain backward compatibility
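The guidelines above map onto a simple rule for computing the next version string. The sketch below is illustrative only: ``bump_version`` is a hypothetical helper, not part of the OpenStack release tooling, and it assumes a plain ``x.y.z`` version with no pre-release suffix.

```python
def bump_version(version, change):
    """Return the next x.y.z version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Backward-incompatible change: reset minor and patch.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backward-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backward-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For a stable branch that only received bug fixes since the last tag, ``bump_version("<last tag>", "patch")`` gives the version to propose.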
Release Command Usage
~~~~~~~~~~~~~~~~~~~~~
Generate the release using OpenStack tooling:
* Use the `new-release command
<https://releases.openstack.org/reference/using.html#using-new-release-command>`__
* Propose the release with version according to chosen semver format
(x.y.z)
Summit
------
``Responsibility Precedence for Summit Activities:``
1. ``Project Update/Onboarding Liaisons`` (if appointed):
* ``Project Update Liaison``: responsible for giving the project update
showcasing team's achievements for the cycle to the community
* ``Project Onboarding Liaison``: responsible for giving/facilitating
onboarding sessions during events for the project's community
2. ``Event Liaison`` (if no Project Update/Onboarding liaisons exist):
* Coordinates all Summit activities including project updates and onboarding
3. ``Release Liaison`` (if no Event Liaison is appointed):
* Work with the team to ensure Summit activities are properly handled:
* Prepare the project update presentation
* Prepare the on-boarding session materials
* Prepare the operator meet-and-greet session
.. note::
The team can choose to not have a Summit presence if desired.
A few weeks before milestone 2
------------------------------
* Plan a spec review day (optional)
Milestone 2
-----------
* Spec freeze (unless changed by team agreement at PTG)
* Release watcher and python-watcherclient (if needed)
* Stable branch releases of watcher
Shortly after spec freeze
-------------------------
* Create a blueprint status etherpad to help track blueprint work, especially
non-priority blueprints, so that things get done by Feature Freeze (FF). Example:
* https://etherpad.opendev.org/p/watcher-<release>-blueprint-status
* Create or review a patch to add the next release's specs directory so people
can propose specs for the next release after spec freeze for the current release
Milestone 3
-----------
* Feature freeze day
* Client library freeze, release python-watcherclient
* Close out all blueprints, including “catch all” blueprints like mox,
versioned notifications
* Stable branch releases of watcher
* Start writing the `cycle highlights
<https://docs.openstack.org/project-team-guide/release-management.html#cycle-highlights>`__
Week following milestone 3
--------------------------
* If warranted, announce the FFE (feature freeze exception process) to
have people propose FFE requests to a special etherpad where they will
be reviewed.
FFE requests should first be discussed in the IRC meeting with the
requester present.
The release liaison has final decision on granting exceptions.
.. note::
If there is only a short time between FF and RC1 (lately it's been 2
weeks), then the only likely candidates will be low-risk things that are
almost done. In general, Feature Freeze exceptions should not be granted;
instead, features should be deferred and reproposed for the next
development cycle. FFEs never extend beyond RC1.
* Mark the max microversion for the release in the
:doc:`/contributor/api_microversion_history`
A few weeks before RC
---------------------
* Update the release status etherpad with RC1 todos and keep track
of them in meetings
* Go through the bug list and identify any rc-potential bugs and tag them
RC
--
* Follow the standard OpenStack release checklist process
* If we want to drop backward-compat RPC code, we have to do a major RPC
version bump and coordinate it just before the major release:
* https://wiki.openstack.org/wiki/RpcMajorVersionUpdates
* Example: https://review.opendev.org/541035
* “Merge latest translations” means translation patches
* Check for translations with:
* https://review.opendev.org/#/q/status:open+project:openstack/watcher+branch:master+topic:zanata/translations
* Do NOT plan to have more than one RC if possible. RC2 should only happen
if there was a mistake and something was missed for RC, or a new regression
was discovered
* Write the reno prelude for the release GA
* Example: https://review.opendev.org/644412
* Push the cycle-highlights in marketing-friendly sentences and propose to the
openstack/releases repo. Usually based on reno prelude but made more readable
and friendly
* Example: https://review.opendev.org/644697
Immediately after RC
--------------------
* Look for bot proposed changes to reno and stable/<cycle>
* Create the launchpad series for the next cycle
* Set the development focus of the project to the new cycle series
* Set the status of the new series to “active development”
* Set the last series status to “current stable branch release”
* Set the previous to last series status to “supported”
* Repeat launchpad steps ^ for all watcher deliverables.
* Make sure the specs directory for the next cycle gets created so people can
start proposing new specs
* Make sure to move implemented specs from the previous release
* Move implemented specs manually (TODO: add tox command in future)
* Remove template files:
.. code-block:: bash
rm doc/source/specs/<release>/index.rst
rm doc/source/specs/<release>/template.rst
* Ensure liaison handoff: either transition to new release liaison or confirm
reappointment for next cycle
.. _glossary:
Glossary
--------
DPL
Distributed Project Leadership - A governance model where traditional PTL
responsibilities are distributed among various specialized liaisons.
FFE
Feature Freeze Exception - A request to add a feature after the feature
freeze deadline. Should be used sparingly for low-risk, nearly
complete features.
GA
General Availability - The final release of a software version for
production use.
PTG
Project Team Gathering - A collaborative event where OpenStack project
teams meet to plan and coordinate development activities.
RC
Release Candidate - A pre-release version that is potentially the final
version, pending testing and bug fixes.
RFE
Request for Enhancement - A type of bug report requesting a new feature
or enhancement to existing functionality.
SLURP
Skip Level Upgrade Release Process - An extended maintenance release
that allows skipping intermediate versions during upgrades.
Summit
OpenStack Summit - A conference where the OpenStack community gathers
for presentations, discussions, and project updates.
Miscellaneous Notes
-------------------
How to track launchpad blueprint approvals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Core team approves blueprints through team consensus. The release liaison
ensures launchpad status is updated correctly after core team approval:
* Set the approver as the core team member who approved the spec
* Set the Direction => Approved and Definition => Approved and make sure the
Series goal is set to the current release. If code is already proposed, set
Implementation => Needs Code Review
* Optional: add a comment to the Whiteboard explaining the approval, with a
date (launchpad does not record approval dates). For example: “We discussed
this in the team meeting and agreed to approve this for <release>. -- <nick>
<YYYYMMDD>”
How to complete a launchpad blueprint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Set Implementation => Implemented. The completion date will be recorded by
launchpad


@@ -0,0 +1,157 @@
================
Aetos datasource
================
Synopsis
--------
The Aetos datasource allows Watcher to use an Aetos reverse proxy server as the
source for collected metrics used by the Watcher decision engine. Aetos is a
multi-tenant aware reverse proxy that sits in front of a Prometheus server and
provides Keystone authentication and role-based access control. The Aetos
datasource uses Keystone service discovery to locate the Aetos endpoint and
requires authentication via Keystone tokens.
Requirements
-------------
The Aetos datasource has the following requirements:
* An Aetos reverse proxy server deployed in front of Prometheus
* Aetos service registered in Keystone with service type 'metric-storage'
* Valid Keystone credentials for Watcher with admin or service role
* Prometheus metrics with appropriate labels (same as direct Prometheus access)
Like the Prometheus datasource, it is required that Prometheus metrics contain
a label to identify the hostname of the exporter from which the metric was
collected. This is used to match against the Watcher cluster model
``ComputeNode.hostname``. The default for this label is ``fqdn`` and in the
prometheus scrape configs would look like:
.. code-block::
scrape_configs:
- job_name: node
static_configs:
- targets: ['10.1.2.3:9100']
labels:
fqdn: "testbox.controlplane.domain"
This default can be overridden when a deployer uses a different label to
identify the exporter host (for example ``hostname`` or ``host``, or any other
label, as long as it identifies the host).
Internally this label is used in creating ``fqdn_instance_labels``, containing
the list of values assigned to the label in the Prometheus targets.
The elements of the resulting fqdn_instance_labels are expected to match the
``ComputeNode.hostname`` used in the Watcher decision engine cluster model.
An example ``fqdn_instance_labels`` is the following:
.. code-block::
[
'ena.controlplane.domain',
'dio.controlplane.domain',
'tria.controlplane.domain',
]
For instance metrics, it is required that Prometheus contains a label
with the uuid of the OpenStack instance in each relevant metric. By default,
the datasource will look for the label ``resource``. The
``instance_uuid_label`` config option in watcher.conf allows deployers to
override this default to any other label name that stores the ``uuid``.
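The label matching described above can be illustrated with a small sketch. It is not the datasource's actual code: the helper names and the shape of the ``targets`` list (one dictionary per scrape target, each with a ``labels`` dictionary) are assumptions made for the example.

```python
def fqdn_instance_labels(targets, fqdn_label="fqdn"):
    """Collect the distinct values of ``fqdn_label`` across scrape targets."""
    values = []
    for target in targets:
        value = target.get("labels", {}).get(fqdn_label)
        # Skip targets without the label and avoid duplicates.
        if value is not None and value not in values:
            values.append(value)
    return values


def unmatched_hostnames(compute_hostnames, labels):
    """Return compute node hostnames that have no matching label value."""
    return [name for name in compute_hostnames if name not in labels]
```

Any hostname returned by ``unmatched_hostnames`` is a compute node for which no metrics could be matched, which usually points at a missing or differently named label in the scrape configuration.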
Limitations
-----------
The Aetos datasource shares the same limitations as the Prometheus datasource:
The current implementation doesn't support the ``statistic_series`` function of
the Watcher ``class DataSourceBase``. It is expected that the
``statistic_aggregation`` function (which is implemented) is sufficient in
providing the **current** state of the managed resources in the cluster.
The ``statistic_aggregation`` function defaults to querying back 300 seconds,
starting from the present time (the time period is a function parameter and
can be set to a value as required). Implementing the ``statistic_series`` can
always be re-visited if the requisite interest and work cycles are volunteered
by the interested parties.
One further limitation concerns the implemented ``statistic_aggregation``
function. This function is defined with a ``granularity`` parameter, to be
used when querying any of the Watcher ``DataSourceBase`` metrics providers.
In the case of Aetos (like Prometheus), we do not fetch and then process
individual metrics across the specified time period. Instead we use the
PromQL querying operators and functions, so that the server itself will
process the request across the specified parameters and then return the
result. The ``granularity`` parameter is therefore redundant and remains
unused in the Aetos implementation of ``statistic_aggregation``. The
granularity of the data fetched by the Prometheus server is specified in
configuration as the server ``scrape_interval`` (current default 15 seconds).
Additionally, there is a slight performance impact compared to direct
Prometheus access. Since Aetos acts as a reverse proxy in front of Prometheus,
there is an additional step for each request, resulting in slightly longer
delays.
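To make the point about server-side aggregation concrete, the sketch below builds a PromQL expression that lets the server aggregate a metric over the requested period. It is an illustration only and not necessarily the query Watcher builds: ``build_aggregate_query`` is a hypothetical helper, while ``avg_over_time``, ``min_over_time`` and ``max_over_time`` are standard PromQL functions. Note that no ``granularity`` value appears anywhere in the query.

```python
# Map Watcher-style aggregate names to PromQL range-vector functions.
PROMQL_FUNCTIONS = {
    "avg": "avg_over_time",
    "min": "min_over_time",
    "max": "max_over_time",
}


def build_aggregate_query(metric, fqdn, period=300, aggregate="avg",
                          fqdn_label="fqdn"):
    """Return a PromQL expression aggregating ``metric`` over ``period`` seconds."""
    if aggregate not in PROMQL_FUNCTIONS:
        raise ValueError("unsupported aggregate: %r" % aggregate)
    # Select only the series whose host label matches the compute node.
    selector = '%s{%s="%s"}' % (metric, fqdn_label, fqdn)
    return "%s(%s[%ds])" % (PROMQL_FUNCTIONS[aggregate], selector, period)
```

The default ``period`` of 300 mirrors the 300 second look-back mentioned above.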
Configuration
-------------
A deployer must set the ``datasources`` parameter to include ``aetos``
under the ``[watcher_datasources]`` section of watcher.conf (or add ``aetos``
in datasources for a specific strategy if preferred, e.g. under the
``[watcher_strategies.workload_stabilization]`` section).
.. note::
Having both Prometheus and Aetos datasources configured at the same time
is not supported and will result in a configuration error. Allowing this
can be investigated in the future if a need or a proper use case is
identified.
The watcher.conf configuration file is also used to set the parameter values
required by the Watcher Aetos data source. The configuration can be
added under the ``[aetos_client]`` section and the available options are
duplicated below from the code as they are self documenting:
.. code-block::
cfg.StrOpt('interface',
default='public',
choices=['internal', 'public', 'admin'],
help="Type of endpoint to use in keystoneclient."),
cfg.StrOpt('region_name',
help="Region in Identity service catalog to use for "
"communication with the OpenStack service."),
cfg.StrOpt('fqdn_label',
default='fqdn',
help="The label that Prometheus uses to store the fqdn of "
"exporters. Defaults to 'fqdn'."),
cfg.StrOpt('instance_uuid_label',
default='resource',
help="The label that Prometheus uses to store the uuid of "
"OpenStack instances. Defaults to 'resource'."),
Authentication and Service Discovery
------------------------------------
Unlike the Prometheus datasource which requires explicit host and port
configuration, the Aetos datasource uses Keystone service discovery to
automatically locate the Aetos endpoint. The datasource:
1. Uses the configured Keystone credentials to authenticate
2. Searches the service catalog for a service with type 'metric-storage'
3. Uses the discovered endpoint URL to connect to Aetos
4. Attaches a Keystone token to each request for authentication
If the Aetos service is not registered in Keystone, the datasource will
fail to initialize and prevent the decision engine from starting.
So a sample watcher.conf configured to use the Aetos datasource would look
like the following:
.. code-block::
[watcher_datasources]
datasources = aetos
[aetos_client]
interface = public
region_name = RegionOne
fqdn_label = fqdn
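The service discovery steps described in this section can be sketched as a lookup over a Keystone service catalog. This is a simplified illustration, not the datasource's implementation: ``find_endpoint`` is a hypothetical helper, and the catalog is assumed to be a list of services, each with a ``type`` and a list of ``endpoints`` carrying ``interface``, ``region`` and ``url`` keys.

```python
def find_endpoint(catalog, service_type="metric-storage",
                  interface="public", region_name=None):
    """Return the URL of the first endpoint matching the given criteria."""
    for service in catalog:
        if service.get("type") != service_type:
            continue
        for endpoint in service.get("endpoints", []):
            if endpoint.get("interface") != interface:
                continue
            # Only filter on region when one was configured.
            if region_name and endpoint.get("region") != region_name:
                continue
            return endpoint["url"]
    raise LookupError(
        "no %s endpoint of type %r found" % (interface, service_type))
```

The ``interface`` and ``region_name`` arguments correspond to the options of the same name in the ``[aetos_client]`` section; the ``LookupError`` stands in for the initialization failure that occurs when Aetos is not registered in Keystone.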


@@ -90,15 +90,15 @@ parameter will need to specify the type of http protocol and the use of
plain text http is strongly discouraged due to the transmission of the access
token. Additionally the path to the proxy interface needs to be supplied as
well in case Grafana is placed in a sub directory of the web server. An example
would be: `https://mygrafana.org/api/datasource/proxy/` were
`/api/datasource/proxy` is the default path without any subdirectories.
would be: ``https://mygrafana.org/api/datasource/proxy/`` where
``/api/datasource/proxy`` is the default path without any subdirectories.
Likewise, this parameter can not be placed in the yaml.
To prevent many errors from occurring and potentially filling the log files it
is advised to specify the desired datasource in the configuration as it would
prevent the datasource manager from having to iterate and try possible
datasources with the launch of each audit. To do this specify `datasources` in
the `[watcher_datasources]` group.
datasources with the launch of each audit. To do this specify
``datasources`` in the ``[watcher_datasources]`` group.
The current configuration that is required to be placed in the traditional
configuration file would look like the following:
@@ -120,7 +120,7 @@ traditional configuration file or in the yaml, however, it is not advised to
mix and match but in the case it does occur the yaml would override the
settings from the traditional configuration file. All five of these parameters
are dictionaries mapping specific metrics to a configuration parameter. For
instance the `project_id_map` will specify the specific project id in Grafana
instance the ``project_id_map`` will specify the specific project id in Grafana
to be used. The parameters are named as follows:
* project_id_map
@@ -149,10 +149,10 @@ project_id
The project id's can only be determined by someone with the admin role in
Grafana as that role is required to open the list of projects. The list of
projects can be found on `/datasources` in the web interface but
projects can be found on ``/datasources`` in the web interface but
unfortunately it does not immediately display the project id. To display
the id one can best hover the mouse over the projects and the url will show the
project id's for example `/datasources/edit/7563`. Alternatively the entire
project id's for example ``/datasources/edit/7563``. Alternatively the entire
list of projects can be retrieved using the `REST api`_. To easily make
requests to the REST api a tool such as Postman can be used.
@@ -239,18 +239,24 @@ conversion from bytes to megabytes.
SELECT value/1000000 FROM memory...
Queries will be formatted using the .format string method within Python. This
format will currently have give attributes exposed to it labeled `{0}` to
`{4}`. Every occurrence of these characters within the string will be replaced
Queries will be formatted using the .format string method within Python.
This format will currently have five attributes exposed to it labeled
``{0}`` through ``{4}``.
Every occurrence of these characters within the string will be replaced
with the specific attribute.
- {0} is the aggregate typically `mean`, `min`, `max` but `count` is also
supported.
- {1} is the attribute as specified in the attribute parameter.
- {2} is the period of time to aggregate data over in seconds.
- {3} is the granularity or the interval between data points in seconds.
- {4} is translator specific and in the case of InfluxDB it will be used for
retention_periods.
{0}
is the aggregate typically ``mean``, ``min``, ``max`` but ``count``
is also supported.
{1}
is the attribute as specified in the attribute parameter.
{2}
is the period of time to aggregate data over in seconds.
{3}
is the granularity or the interval between data points in seconds.
{4}
is translator specific and in the case of InfluxDB it will be used for
retention_periods.
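As an illustration of how the positional attributes are substituted, the sketch below renders a query template with Python's ``str.format``. The template text, the measurement name ``cpu_percent`` and the helper ``render_query`` are hypothetical and only show the mechanics; they are not taken from the Watcher source.

```python
# Hypothetical InfluxDB-style template using the five positional attributes.
QUERY_TEMPLATE = (
    'SELECT {0}("{1}") FROM {4}.cpu_percent '
    "WHERE time > now() - {2}s GROUP BY time({3}s)"
)


def render_query(template, aggregate, attribute, period, granularity,
                 translator_specific):
    """Fill {0}..{4} in ``template`` in the order documented above."""
    return template.format(
        aggregate, attribute, period, granularity, translator_specific)
```

Because ``str.format`` treats braces as placeholders, a literal ``{`` or ``}`` in a query has to be written as ``{{`` or ``}}``.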
**InfluxDB**


@@ -1,6 +1,11 @@
Datasources
===========
.. note::
The Monasca datasource is deprecated for removal and optional. To use it, install the optional extra:
``pip install watcher[monasca]``. If Monasca is configured without installing the extra, Watcher will raise
an error guiding you to install the client.
.. toctree::
:glob:
:maxdepth: 1


@@ -0,0 +1,140 @@
=====================
Prometheus datasource
=====================
Synopsis
--------
The Prometheus datasource allows Watcher to use a Prometheus server as the
source for collected metrics used by the Watcher decision engine. At minimum
deployers must configure the ``host`` and ``port`` at which the Prometheus
server is listening.
Requirements
-------------
It is required that Prometheus metrics contain a label to identify the hostname
of the exporter from which the metric was collected. This is used to match
against the Watcher cluster model ``ComputeNode.hostname``. The default for
this label is ``fqdn`` and in the prometheus scrape configs would look like:
.. code-block::
scrape_configs:
- job_name: node
static_configs:
- targets: ['10.1.2.3:9100']
labels:
fqdn: "testbox.controlplane.domain"
This default can be overridden when a deployer uses a different label to
identify the exporter host (for example ``hostname`` or ``host``, or any other
label, as long as it identifies the host).
Internally this label is used in creating ``fqdn_instance_labels``, containing
the list of values assigned to the label in the Prometheus targets.
The elements of the resulting fqdn_instance_labels are expected to match the
``ComputeNode.hostname`` used in the Watcher decision engine cluster model.
An example ``fqdn_instance_labels`` is the following:
.. code-block::
[
'ena.controlplane.domain',
'dio.controlplane.domain',
'tria.controlplane.domain',
]
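The matching described above can be sketched as follows. This is a simplified illustration, not Watcher's implementation: the ``targets`` structure (a list of dicts with a ``labels`` mapping) and both helper names are assumptions made for the example.

```python
def fqdn_instance_labels(targets, fqdn_label="fqdn"):
    """Collect the distinct values of ``fqdn_label`` across all targets.

    ``targets`` is an assumed, simplified structure: a list of dicts that
    each carry a ``labels`` mapping.
    """
    values = {
        target["labels"][fqdn_label]
        for target in targets
        if fqdn_label in target.get("labels", {})
    }
    return sorted(values)


def unmatched_compute_nodes(hostnames, labels):
    """Return the compute node hostnames that have no matching label value."""
    known = set(labels)
    return [name for name in hostnames if name not in known]


targets = [
    {"labels": {"fqdn": "ena.controlplane.domain", "job": "node"}},
    {"labels": {"fqdn": "dio.controlplane.domain", "job": "node"}},
    {"labels": {"job": "node"}},  # target without the fqdn label is skipped
]
labels = fqdn_instance_labels(targets)
missing = unmatched_compute_nodes(
    ["ena.controlplane.domain", "pente.controlplane.domain"], labels
)
print(labels, missing)
```

A compute node that ends up in ``missing`` would have no metrics that can be matched to it, which is why the label values need to line up with ``ComputeNode.hostname``.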
For instance metrics, it is required that Prometheus contains a label
with the uuid of the OpenStack instance in each relevant metric. By default,
the datasource will look for the label ``resource``. The
``instance_uuid_label`` config option in watcher.conf allows deployers to
override this default to any other label name that stores the ``uuid``.
Limitations
-----------
The current implementation doesn't support the ``statistic_series`` function of
the Watcher ``class DataSourceBase``. It is expected that the
``statistic_aggregation`` function (which is implemented) is sufficient in
providing the **current** state of the managed resources in the cluster.
The ``statistic_aggregation`` function defaults to querying back 300 seconds,
starting from the present time (the time period is a function parameter and
can be set to a value as required). Implementing the ``statistic_series`` can
always be re-visited if the requisite interest and work cycles are volunteered
by the interested parties.
One further limitation concerns the implemented
``statistic_aggregation`` function. This function is defined with a
``granularity`` parameter, to be used when querying any of the Watcher
``DataSourceBase`` metrics providers. In the case of Prometheus, we do not
fetch and then process individual metrics across the specified time period.
Instead we use the PromQL querying operators and functions, so that the
server itself will process the request across the specified parameters and
then return the result. The ``granularity`` parameter is therefore redundant
and remains unused for the Prometheus implementation of
``statistic_aggregation``. The granularity of the data collected by the
Prometheus server is specified in its configuration as the server
``scrape_interval`` (current default 15 seconds).
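To make the server-side aggregation more concrete, here is a minimal sketch of how an aggregate over a period can be expressed as a single PromQL expression. The helper name, the metric name and the mapping are illustrative assumptions; the actual queries built by the Watcher Prometheus datasource may differ.

```python
# Mapping from aggregate names to standard PromQL "*_over_time" functions.
PROMQL_OVER_TIME = {
    "mean": "avg_over_time",
    "min": "min_over_time",
    "max": "max_over_time",
}


def build_promql(metric, label, value, aggregate="mean", period=300):
    """Build a PromQL expression aggregating ``metric`` over ``period`` seconds.

    No granularity argument is needed: the server evaluates the range
    selector over whatever samples it scraped in that window.
    """
    func = PROMQL_OVER_TIME[aggregate]
    return f'{func}({metric}{{{label}="{value}"}}[{period}s])'


expr = build_promql(
    "node_memory_MemAvailable_bytes", "fqdn", "ena.controlplane.domain"
)
print(expr)
```

The 300 second default mirrors the look-back period mentioned above.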
Configuration
-------------
A deployer must set the ``datasources`` parameter to include ``prometheus``
under the watcher_datasources section of watcher.conf (or add ``prometheus`` in
datasources for a specific strategy if preferred, e.g. under the
``[watcher_strategies.workload_stabilization]`` section).
The watcher.conf configuration file is also used to set the parameter values
required by the Watcher Prometheus data source. The configuration can be
added under the ``[prometheus_client]`` section and the available options are
duplicated below from the code as they are self-documenting:
.. code-block::
cfg.StrOpt('host',
help="The hostname or IP address for the prometheus server."),
cfg.StrOpt('port',
help="The port number used by the prometheus server."),
cfg.StrOpt('fqdn_label',
default="fqdn",
help="The label that Prometheus uses to store the fqdn of "
"exporters. Defaults to 'fqdn'."),
cfg.StrOpt('instance_uuid_label',
default="resource",
help="The label that Prometheus uses to store the uuid of "
"OpenStack instances. Defaults to 'resource'."),
cfg.StrOpt('username',
help="The basic_auth username to use to authenticate with the "
"Prometheus server."),
cfg.StrOpt('password',
secret=True,
help="The basic_auth password to use to authenticate with the "
"Prometheus server."),
cfg.StrOpt('cafile',
help="Path to the CA certificate for establishing a TLS "
"connection with the Prometheus server."),
cfg.StrOpt('certfile',
help="Path to the client certificate for establishing a TLS "
"connection with the Prometheus server."),
cfg.StrOpt('keyfile',
help="Path to the client key for establishing a TLS "
"connection with the Prometheus server."),
The ``host`` and ``port`` are **required** configuration options which have
no set default. These specify the hostname (or IP) and port at which
the Prometheus server is listening. The ``fqdn_label`` allows deployers to
override the required metric label used to match Prometheus node exporters
against the Watcher ComputeNodes in the Watcher decision engine cluster data
model. The default is ``fqdn`` and deployers can specify any other value
(e.g. if they have an equivalent but different label such as ``host``).
So a sample watcher.conf configured to use the Prometheus server at
``10.2.3.4:9090`` would look like the following:
.. code-block::
[watcher_datasources]
datasources = prometheus
[prometheus_client]
host = 10.2.3.4
port = 9090
fqdn_label = fqdn
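Since ``host`` and ``port`` have no defaults, a quick sanity check of a configuration file can catch a missing value early. The sketch below uses Python's standard ``configparser`` purely for illustration; Watcher itself loads its options through ``oslo.config``, and the helper name is an assumption.

```python
import configparser

SAMPLE_CONF = """
[watcher_datasources]
datasources = prometheus

[prometheus_client]
host = 10.2.3.4
port = 9090
fqdn_label = fqdn
"""


def missing_required_options(conf_text, required=("host", "port")):
    """Return the required [prometheus_client] options that are unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("prometheus_client"):
        return list(required)
    return [
        opt
        for opt in required
        if not parser.get("prometheus_client", opt, fallback="").strip()
    ]


print(missing_required_options(SAMPLE_CONF))
```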


@@ -0,0 +1,23 @@
@startuml
skinparam ArrowColor DarkRed
skinparam StateBorderColor DarkRed
skinparam StateBackgroundColor LightYellow
skinparam Shadowing true
[*] --> PENDING: The Watcher Planner\ncreates the Action
PENDING --> SKIPPED: The Action detects skipping condition\n in pre_condition or was\n skipped by cloud Admin.
PENDING --> FAILED: The Action fails unexpectedly\n in pre_condition.
PENDING --> ONGOING: The Watcher Applier starts executing\n the action.
ONGOING --> FAILED: Something failed while executing\nthe Action in the Watcher Applier
ONGOING --> SUCCEEDED: The Watcher Applier executed\nthe Action successfully
FAILED --> DELETED : Administrator removes\nAction Plan
SUCCEEDED --> DELETED : Administrator removes\n the Action
ONGOING --> CANCELLED : The Action was cancelled\n as part of an Action Plan cancellation.
PENDING --> CANCELLED : The Action was cancelled\n as part of an Action Plan cancellation.
CANCELLED --> DELETED
FAILED --> DELETED
SKIPPED --> DELETED
DELETED --> [*]
@enduml
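The transitions drawn in the diagram above can also be written down as a small lookup table. This is only a reading aid derived from the diagram; the names ``ACTION_TRANSITIONS`` and ``can_transition`` are hypothetical and do not come from the Watcher code base.

```python
# Allowed Action state transitions, transcribed from the diagram above.
ACTION_TRANSITIONS = {
    "PENDING": {"SKIPPED", "FAILED", "ONGOING", "CANCELLED"},
    "ONGOING": {"FAILED", "SUCCEEDED", "CANCELLED"},
    "SUCCEEDED": {"DELETED"},
    "FAILED": {"DELETED"},
    "CANCELLED": {"DELETED"},
    "SKIPPED": {"DELETED"},
    "DELETED": set(),
}


def can_transition(current, target):
    """Return True if the diagram contains an edge from current to target."""
    return target in ACTION_TRANSITIONS.get(current, set())


print(can_transition("PENDING", "SKIPPED"))
print(can_transition("SKIPPED", "ONGOING"))
```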

Binary file not shown (image, 75 KiB).


@@ -42,6 +42,7 @@ specific prior release.
user/index
configuration/index
contributor/plugin/index
integrations/index
man/index
.. toctree::


@@ -9,7 +9,7 @@
...
connection = mysql+pymysql://watcher:WATCHER_DBPASS@controller/watcher?charset=utf8
* In the `[DEFAULT]` section, configure the transport url for RabbitMQ message broker.
* In the ``[DEFAULT]`` section, configure the transport url for RabbitMQ message broker.
.. code-block:: ini
@@ -20,7 +20,7 @@
Replace the RABBIT_PASS with the password you chose for OpenStack user in RabbitMQ.
* In the `[keystone_authtoken]` section, configure Identity service access.
* In the ``[keystone_authtoken]`` section, configure Identity service access.
.. code-block:: ini
@@ -39,7 +39,7 @@
Replace WATCHER_PASS with the password you chose for the watcher user in the Identity service.
* Watcher interacts with other OpenStack projects via project clients, in order to instantiate these
clients, Watcher requests new session from Identity service. In the `[watcher_clients_auth]` section,
clients, Watcher requests new session from Identity service. In the ``[watcher_clients_auth]`` section,
configure the identity service access to interact with other OpenStack project clients.
.. code-block:: ini
@@ -56,7 +56,7 @@
Replace WATCHER_PASS with the password you chose for the watcher user in the Identity service.
* In the `[api]` section, configure host option.
* In the ``[api]`` section, configure host option.
.. code-block:: ini
@@ -66,7 +66,7 @@
Replace controller with the IP address of the management network interface on your controller node, typically 10.0.0.11 for the first node in the example architecture.
* In the `[oslo_messaging_notifications]` section, configure the messaging driver.
* In the ``[oslo_messaging_notifications]`` section, configure the messaging driver.
.. code-block:: ini


@@ -0,0 +1,126 @@
============
Integrations
============
The following table provides the integration status of the different services
which Watcher interacts with. Some integrations are marked as Supported,
while others are marked as Experimental due to the lack of testing and proper
documentation.
Integration Status Matrix
-------------------------
.. list-table::
:widths: 20 20 20 20
:header-rows: 1
* - Service Name
- Integration Status
- Documentation
- Testing
* - :ref:`Cinder <cinder_integration>`
- Supported
- Minimal
- Unit
* - :ref:`Glance <glance_integration>`
- Experimental
- Missing
- None
* - :ref:`Ironic <ironic_integration>`
- Experimental
- Minimal
- Unit
* - :ref:`Keystone <keystone_integration>`
- Supported
- Minimal
- Integration
* - :ref:`MAAS <maas_integration>`
- Experimental
- Missing
- Unit
* - :ref:`Neutron <neutron_integration>`
- Experimental
- Missing
- Unit
* - :ref:`Nova <nova_integration>`
- Supported
- Minimal
- Unit and Integration
* - :ref:`Placement <placement_integration>`
- Supported
- Minimal
- Unit and Integration
.. note::
Minimal documentation covers only basic configuration and, if available,
how to enable notifications.
.. _cinder_integration:
Cinder
^^^^^^
The OpenStack Block Storage service integration includes a cluster data
model collector that creates an in-memory representation of the storage
resources, strategies that propose solutions based on storage capacity
and Actions that perform volume migration.
.. _glance_integration:
Glance
^^^^^^
The Image service integration is consumed by Nova Helper to create instances
from images, which was used by older releases of Watcher to cold migrate
instances. This procedure is not used by Watcher anymore and this integration
is classified as Experimental and may be removed in future releases.
.. _ironic_integration:
Ironic
^^^^^^
The Bare Metal service integration includes a data model collector that
creates an in-memory representation of Ironic resources and Actions that
allow the management of the power state of nodes. This integration is
classified as Experimental and may be removed in future releases.
.. _keystone_integration:
Keystone
^^^^^^^^
The Identity service integration includes authentication with other services
and retrieving information about domains, projects and users.
.. _maas_integration:
MAAS (Metal As A Service)
^^^^^^^^^^^^^^^^^^^^^^^^^
This integration allows managing bare metal servers of a MAAS service,
which includes Actions that manage the power state of nodes. This
integration is classified as Experimental and may be removed in future
releases.
.. _neutron_integration:
Neutron
^^^^^^^
Neutron integration is currently consumed by Nova Helper to create instances,
which was used by older releases of Watcher to cold migrate instances. This
procedure is not used by Watcher anymore and this integration is classified
as Experimental and may be removed in future releases.
.. _nova_integration:
Nova
^^^^
Nova service integration includes a cluster data model collector that creates
an in-memory representation of the compute resources available in the cloud,
strategies that propose solutions based on available resources and Actions
that perform instance migrations.
.. _placement_integration:
Placement
^^^^^^^^^
Placement integration allows Watcher to track resource provider inventories
and usage information, building an in-memory representation of those resources
that can be used by strategies when calculating new solutions.


@@ -48,7 +48,7 @@
logging configuration to any other existing logging
options. Please see the Python logging module documentation
for details on logging configuration files. The log-config
name for this option is depcrecated.
name for this option is deprecated.
**--log-format FORMAT**
A logging.Formatter log message format string which may use any


@@ -26,8 +26,7 @@ metric service name plugins comment
``compute_monitors`` option
to ``cpu.virt_driver`` in
the nova.conf.
``cpu_util`` ceilometer_ none cpu_util has been removed
since Stein.
``cpu`` ceilometer_ none
============================ ============ ======= ===========================
.. _ceilometer: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#openstack-compute


@@ -11,10 +11,6 @@ Synopsis
.. watcher-term:: watcher.decision_engine.strategy.strategies.host_maintenance.HostMaintenance
Requirements
------------
None.
Metrics
*******
@@ -56,15 +52,29 @@ Configuration
Strategy parameters are:
==================== ====== ====================================
parameter type default Value description
==================== ====== ====================================
``maintenance_node`` String The name of the compute node which
need maintenance. Required.
``backup_node`` String The name of the compute node which
will backup the maintenance node.
Optional.
==================== ====== ====================================
========================== ======== ========================== ==========
parameter type description required
========================== ======== ========================== ==========
``maintenance_node`` String The name of the Required
compute node
which needs maintenance.
``backup_node`` String The name of the compute Optional
node which will backup
the maintenance node.
``disable_live_migration`` Boolean False: Active instances Optional
will be live migrated.
True: Active instances
will be cold migrated
if cold migration is
not disabled. Otherwise,
they will be stopped.
False by default.
``disable_cold_migration`` Boolean False: Inactive instances Optional
will be cold migrated.
True: Inactive instances
will not be cold migrated.
False by default.
========================== ======== ========================== ==========
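The interaction between ``disable_live_migration`` and ``disable_cold_migration`` described in the table above can be summarised as a small decision function. This is a sketch of the documented behaviour only; the function name and the returned strings are hypothetical and do not mirror the strategy's code.

```python
def planned_action(instance_active,
                   disable_live_migration=False,
                   disable_cold_migration=False):
    """Return the action the table above describes for one instance.

    Returns "live_migrate", "cold_migrate", "stop", or None when the
    instance is left untouched.
    """
    if instance_active:
        if not disable_live_migration:
            return "live_migrate"
        if not disable_cold_migration:
            return "cold_migrate"
        return "stop"
    # Inactive instances are cold migrated unless that is disabled.
    if not disable_cold_migration:
        return "cold_migrate"
    return None


for active in (True, False):
    for no_live in (False, True):
        for no_cold in (False, True):
            print(active, no_live, no_cold,
                  planned_action(active, no_live, no_cold))
```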
Efficacy Indicator
------------------
@@ -80,13 +90,46 @@ to: https://specs.openstack.org/openstack/watcher-specs/specs/queens/approved/cl
How to use it ?
---------------
Run an audit using Host Maintenance strategy.
Executing the actions will move the servers from compute01 host
to a host determined by the Nova scheduler service.
.. code-block:: shell
$ openstack optimize audit create \
-g cluster_maintaining -s host_maintenance \
-p maintenance_node=compute01
Run an audit using Host Maintenance strategy with a backup node specified.
Executing the actions will move the servers from compute01 host
to compute02 host.
.. code-block:: shell
$ openstack optimize audit create \
-g cluster_maintaining -s host_maintenance \
-p maintenance_node=compute01 \
-p backup_node=compute02 \
--auto-trigger
-p backup_node=compute02
Run an audit using Host Maintenance strategy with migration disabled.
This will only stop active instances on compute01, useful for maintenance
scenarios where operators do not want to migrate workloads to other hosts.
.. code-block:: shell
$ openstack optimize audit create \
-g cluster_maintaining -s host_maintenance \
-p maintenance_node=compute01 \
-p disable_live_migration=True \
-p disable_cold_migration=True
Note that after executing this strategy, the *maintenance_node* will be
marked as disabled, with the reason set to ``watcher_maintaining``.
To enable the node again:
.. code-block:: shell
$ openstack compute service set --enable compute01
External Links
--------------


@@ -6,3 +6,53 @@ Strategies
:maxdepth: 1
./*
Strategies status matrix
------------------------
.. list-table::
:widths: 33 33 34
:header-rows: 1
* - Strategy Name
- Status
- Testing
* - :doc:`actuation`
- Experimental
- Unit, Integration
* - :doc:`basic-server-consolidation`
- Experimental
- Missing
* - :doc:`host_maintenance`
- Supported
- Unit, Integration
* - :doc:`node_resource_consolidation`
- Supported
- Unit, Integration
* - :doc:`noisy_neighbor`
- Deprecated
- Unit
* - :doc:`outlet_temp_control`
- Experimental
- Unit
* - :doc:`saving_energy`
- Experimental
- Unit
* - :doc:`storage_capacity_balance`
- Experimental
- Unit
* - :doc:`uniform_airflow`
- Experimental
- Unit
* - :doc:`vm_workload_consolidation`
- Supported
- Unit, Integration
* - :doc:`workload-stabilization`
- Experimental
- Missing
* - :doc:`workload_balance`
- Supported
- Unit, Integration
* - :doc:`zone_migration`
- Supported (Instance migrations), Experimental (Volume migration)
- Unit, Some Integration


@@ -89,9 +89,9 @@ step 2: Create audit to do optimization
.. code-block:: shell
$ openstack optimize audittemplate create \
at1 saving_energy --strategy saving_energy
saving_energy_template1 saving_energy --strategy saving_energy
$ openstack optimize audit create -a at1 \
$ openstack optimize audit create -a saving_energy_audit1 \
-p free_used_percent=20.0
External Links


@@ -35,6 +35,11 @@ power ceilometer_ kwapi_ one point every 60s
.. _ceilometer: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#openstack-compute
.. _monasca: https://github.com/openstack/monasca-agent/blob/master/docs/Libvirt.md
.. note::
The Monasca datasource is deprecated for removal and optional. If a strategy requires Monasca metrics,
ensure the Monasca optional extra is installed: ``pip install watcher[monasca]``.
.. _kwapi: https://kwapi.readthedocs.io/en/latest/index.html


@@ -22,14 +22,19 @@ The *vm_workload_consolidation* strategy requires the following metrics:
============================ ============ ======= =========================
metric service name plugins comment
============================ ============ ======= =========================
``cpu_util`` ceilometer_ none cpu_util has been removed
since Stein.
``cpu`` ceilometer_ none
``memory.resident`` ceilometer_ none
``memory`` ceilometer_ none
``disk.root.size`` ceilometer_ none
``compute.node.cpu.percent`` ceilometer_ none (optional) need to set the
``compute_monitors`` option
to ``cpu.virt_driver`` in the
nova.conf.
``hardware.memory.used`` ceilometer_ SNMP_ (optional)
============================ ============ ======= =========================
.. _ceilometer: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#openstack-compute
.. _SNMP: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#snmp-based-meters
Cluster data model
******************


@@ -1,6 +1,6 @@
=============================================
Watcher Overload standard deviation algorithm
=============================================
===============================
Workload Stabilization Strategy
===============================
Synopsis
--------
@@ -19,21 +19,20 @@ Metrics
The *workload_stabilization* strategy requires the following metrics:
============================ ============ ======= =============================
metric service name plugins comment
============================ ============ ======= =============================
``compute.node.cpu.percent`` ceilometer_ none need to set the
``compute_monitors`` option
to ``cpu.virt_driver`` in the
nova.conf.
``hardware.memory.used`` ceilometer_ SNMP_
``cpu_util`` ceilometer_ none cpu_util has been removed
since Stein.
``memory.resident`` ceilometer_ none
============================ ============ ======= =============================
.. _ceilometer: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#openstack-compute
.. _SNMP: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#snmp-based-meters
============================ ==================================================
metric description
============================ ==================================================
``instance_ram_usage`` ram memory usage in an instance as float in
megabytes
``instance_cpu_usage`` cpu usage in an instance as float ranging between
0 and 100 representing the total cpu usage as
percentage
``host_ram_usage`` ram memory usage in a compute node as float in
megabytes
``host_cpu_usage`` cpu usage in a compute node as float ranging
between 0 and 100 representing the total cpu
usage as percentage
============================ ==================================================
Cluster data model
******************
@@ -69,23 +68,49 @@ Configuration
Strategy parameters are:
==================== ====== ===================== =============================
parameter type default Value description
==================== ====== ===================== =============================
``metrics`` array |metrics| Metrics used as rates of
====================== ====== =================== =============================
parameter type default Value description
====================== ====== =================== =============================
``metrics`` array |metrics| Metrics used as rates of
cluster loads.
``thresholds`` object |thresholds| Dict where key is a metric
``thresholds`` object |thresholds| Dict where key is a metric
and value is a trigger value.
``weights`` object |weights| These weights used to
The strategy will only
look for an action plan when
the standard deviation for
the usage of one of the
resources included in the
metrics, taken as a
normalized usage between
0 and 1 among the hosts is
higher than the threshold.
The value of a perfectly
balanced cluster for the
standard deviation would be
0, while in a totally
unbalanced one would be 0.5,
which should be the maximum
value.
``weights`` object |weights| These weights are used to
calculate common standard
deviation. Name of weight
contains meter name and
_weight suffix.
``instance_metrics`` object |instance_metrics| Mapping to get hardware
statistics using instance
metrics.
``host_choice`` string retry Method of host's choice.
deviation when optimizing
the resources usage.
Name of weight contains meter
name and _weight suffix.
Higher values imply the
metric will be prioritized
when calculating an optimal
resulting cluster
distribution.
``instance_metrics`` object |instance_metrics| This parameter represents
the compute node metrics
representing compute resource
usage for the instances
resource indicated in the
metrics parameter.
``host_choice`` string retry Method of hosts choice when
analyzing destination for
instances.
There are cycle, retry and
fullsearch methods. Cycle
will iterate hosts in cycle.
@@ -94,32 +119,49 @@ parameter type default Value description
retry_count option).
Fullsearch will return each
host from list.
``retry_count`` number 1 Count of random returned
``retry_count`` number 1 Count of random returned
hosts.
``periods`` object |periods| These periods are used to get
statistic aggregation for
instance and host metrics.
The period is simply a
repeating interval of time
into which the samples are
grouped for aggregation.
Watcher uses only the last
period of all received ones.
==================== ====== ===================== =============================
``periods`` object |periods| Time, in seconds, to get
statistical values for
resources usage for instance
and host metrics.
Watcher will use the last
period to calculate resource
usage.
``granularity`` number 300 NOT RECOMMENDED TO MODIFY:
The time between two measures
in an aggregated timeseries
of a metric.
``aggregation_method`` object |aggn_method| NOT RECOMMENDED TO MODIFY:
Function used to aggregate
multiple measures into an
aggregated value.
====================== ====== =================== =============================
.. |metrics| replace:: ["cpu_util", "memory.resident"]
.. |thresholds| replace:: {"cpu_util": 0.2, "memory.resident": 0.2}
.. |weights| replace:: {"cpu_util_weight": 1.0, "memory.resident_weight": 1.0}
.. |instance_metrics| replace:: {"cpu_util": "compute.node.cpu.percent", "memory.resident": "hardware.memory.used"}
.. |metrics| replace:: ["instance_cpu_usage", "instance_ram_usage"]
.. |thresholds| replace:: {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}
.. |weights| replace:: {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}
.. |instance_metrics| replace:: {"instance_cpu_usage": "host_cpu_usage", "instance_ram_usage": "host_ram_usage"}
.. |periods| replace:: {"instance": 720, "node": 600}
.. |aggn_method| replace:: {"instance": 'mean', "compute_node": 'mean'}
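The ``thresholds`` description states that the standard deviation of the normalized usage is 0 for a perfectly balanced cluster and at most 0.5 for a totally unbalanced one. The worked example below checks those two extremes using the population standard deviation from Python's ``statistics`` module. Whether the strategy uses exactly this formula is an assumption, and the helper name is hypothetical.

```python
from statistics import pstdev


def exceeds_threshold(normalized_usages, threshold):
    """True when the spread of per-host usage (0..1) is above the threshold."""
    return pstdev(normalized_usages) > threshold


# Every host at 50% usage: perfectly balanced.
balanced = [0.5, 0.5, 0.5, 0.5]
# Half of the hosts idle, half fully loaded: totally unbalanced.
unbalanced = [0.0, 1.0, 0.0, 1.0]

print(pstdev(balanced), pstdev(unbalanced))
print(exceeds_threshold(balanced, 0.2), exceeds_threshold(unbalanced, 0.2))
```

With the default threshold of 0.2, only the second cluster would lead the strategy to look for an action plan.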
Efficacy Indicator
------------------
Global efficacy indicator:
.. watcher-func::
:format: literal_block
watcher.decision_engine.goal.efficacy.specs.ServerConsolidation.get_global_efficacy_indicator
watcher.decision_engine.goal.efficacy.specs.WorkloadBalancing.get_global_efficacy_indicator
Other efficacy indicators of the goal are:
- ``instance_migrations_count``: The number of VM migrations to be performed
- ``instances_count``: The total number of audited instances in strategy
- ``standard_deviation_after_audit``: The resulting standard deviation value
- ``standard_deviation_before_audit``: The original standard deviation value
Algorithm
---------
@@ -136,10 +178,10 @@ How to use it ?
at1 workload_balancing --strategy workload_stabilization
$ openstack optimize audit create -a at1 \
-p thresholds='{"memory.resident": 0.05}' \
-p metrics='["memory.resident"]'
-p thresholds='{"instance_ram_usage": 0.05}' \
-p metrics='["instance_ram_usage"]'
External Links
--------------
- `Watcher Overload standard deviation algorithm spec <https://specs.openstack.org/openstack/watcher-specs/specs/newton/implemented/sd-strategy.html>`_
None


@@ -11,26 +11,35 @@ Synopsis
.. watcher-term:: watcher.decision_engine.strategy.strategies.workload_balance.WorkloadBalance
Requirements
------------
None.
Metrics
*******
The *workload_balance* strategy requires the following metrics:
The ``workload_balance`` strategy requires the following metrics:
======================= ============ ======= =========================
metric service name plugins comment
======================= ============ ======= =========================
``cpu_util`` ceilometer_ none cpu_util has been removed
since Stein.
``memory.resident`` ceilometer_ none
======================= ============ ======= =========================
======================= ============ ======= =========== ======================
metric service name plugins unit comment
======================= ============ ======= =========== ======================
``cpu`` ceilometer_ none percentage CPU of the instance.
Used to calculate the
threshold
``memory.resident`` ceilometer_ none MB RAM of the instance.
Used to calculate the
threshold
======================= ============ ======= =========== ======================
.. _ceilometer: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#openstack-compute
.. note::
* The parameters above reference the instance CPU or RAM usage, but
the threshold calculation is based on the CPU/RAM usage on the
hypervisor.
* The RAM usage can be calculated based on the RAM consumed by the instance,
and the available RAM on the hypervisor.
* The CPU percentage calculation relies on the CPU load, but also on the
number of CPUs on the hypervisor.
* The host memory metric is calculated by summing the RAM usage of each
instance on the host. This measure is close to the real usage, but is
not the exact usage on the host.
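As a rough sketch of the RAM case in the note above: the host figure is obtained by summing the RAM of the instances on the host and relating it to the RAM available on the hypervisor, and the result is compared with the percentage threshold. The exact formula is an assumption made for illustration, and both function names are hypothetical.

```python
def host_ram_usage_percent(instance_ram_mb, host_ram_mb):
    """Approximate host RAM usage (%) from per-instance RAM usage in MB."""
    return 100.0 * sum(instance_ram_mb) / host_ram_mb


def is_over_threshold(instance_ram_mb, host_ram_mb, threshold=25.0):
    """Compare the approximated host usage with a percentage threshold."""
    return host_ram_usage_percent(instance_ram_mb, host_ram_mb) > threshold


usage = host_ram_usage_percent([2048, 4096, 1024], 16384)
print(usage, is_over_threshold([2048, 4096, 1024], 16384))
```

Because only instance usage is summed, memory consumed by the hypervisor itself is not included, which is why the note calls the value close to, but not exactly, the real usage.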
Cluster data model
******************
@@ -65,15 +74,28 @@ Configuration
Strategy parameters are:
============== ====== ============= ====================================
parameter type default Value description
============== ====== ============= ====================================
``metrics`` String 'cpu_util' Workload balance base on cpu or ram
utilization. choice: ['cpu_util',
'memory.resident']
``threshold`` Number 25.0 Workload threshold for migration
``period`` Number 300 Aggregate time period of ceilometer
============== ====== ============= ====================================
================ ====== ==================== ==================================
parameter type default value description
================ ====== ==================== ==================================
``metrics`` String instance_cpu_usage Workload balance based on cpu or
ram utilization. Choices:
['instance_cpu_usage',
'instance_ram_usage']
``threshold`` Number 25.0 Workload threshold for migration.
Used for both the source and the
destination calculations.
Threshold is always a percentage.
``period`` Number 300 Aggregate time period of
ceilometer
``granularity`` Number 300 The time between two measures in
an aggregated timeseries of a
metric.
This parameter is only used
with the Gnocchi data source,
and it must match any of the
valid archive policies for the
metric.
================ ====== ==================== ==================================
Efficacy Indicator
------------------
@@ -89,13 +111,35 @@ to: https://specs.openstack.org/openstack/watcher-specs/specs/mitaka/implemented
How to use it ?
---------------
Create an audit template using the Workload Balancing strategy.
.. code-block:: shell
$ openstack optimize audittemplate create \
at1 workload_balancing --strategy workload_balance
Run an audit using the Workload Balance strategy. The result of
the audit should be an action plan to move VMs from any host
where the CPU usage is over the threshold of 26%, to a host
where the utilization of CPU is under the threshold.
The measurements of CPU utilization are taken from the configured
datasource plugin with an aggregate period of 310.
.. code-block:: shell
$ openstack optimize audit create -a at1 -p threshold=26.0 \
-p period=310 -p metrics=cpu_util
-p period=310 -p metrics=instance_cpu_usage
Run an audit using the Workload Balance strategy to
obtain a plan to balance VMs over hosts with a threshold of 20%.
In this case, the stipulation of the CPU utilization metric
measurement is a combination of period and granularity.
.. code-block:: shell
$ openstack optimize audit create -a at1 \
-p granularity=30 -p threshold=20 -p period=300 \
-p metrics=instance_cpu_usage --auto-trigger
External Links
--------------


@@ -11,6 +11,13 @@ Synopsis
.. watcher-term:: watcher.decision_engine.strategy.strategies.zone_migration.ZoneMigration
.. note::
The term ``Zone`` in the strategy name is not a reference to
`OpenStack availability zones <https://docs.openstack.org/nova/latest/admin/availability-zones.html>`_
but rather a user-defined set of Compute nodes and storage pools.
Currently, migrations across actual availability zones are not fully tested
and might not work in all cluster configurations.
Requirements
------------
@@ -59,66 +66,83 @@ Configuration
Strategy parameters are:
======================== ======== ============= ==============================
parameter type default Value description
======================== ======== ============= ==============================
``compute_nodes`` array None Compute nodes to migrate.
``storage_pools`` array None Storage pools to migrate.
``parallel_total`` integer 6 The number of actions to be
run in parallel in total.
``parallel_per_node`` integer 2 The number of actions to be
run in parallel per compute
node.
``parallel_per_pool`` integer 2 The number of actions to be
run in parallel per storage
pool.
``priority`` object None List prioritizes instances
and volumes.
``with_attached_volume`` boolean False False: Instances will migrate
after all volumes migrate.
True: An instance will migrate
after the attached volumes
migrate.
======================== ======== ============= ==============================
======================== ======== ======== ========= ==========================
parameter type default required description
======================== ======== ======== ========= ==========================
``compute_nodes`` array None Optional Compute nodes to migrate.
``storage_pools`` array None Optional Storage pools to migrate.
``parallel_total`` integer 6 Optional The number of actions to
be run in parallel in
total.
``parallel_per_node`` integer 2 Optional The number of actions to
be run in parallel per
compute node in one
action plan.
``parallel_per_pool`` integer 2 Optional The number of actions to
be run in parallel per
storage pool.
``priority`` object None Optional List prioritizes instances
and volumes.
``with_attached_volume`` boolean False Optional False: Instances will
migrate after all volumes
migrate.
True: An instance will
migrate after the
attached volumes migrate.
======================== ======== ======== ========= ==========================
.. note::

   * All parameters in the table above have defaults, so an audit can be
     created without specifying any value. However, if **only** default
     values are used, there will be nothing actionable for the audit.
   * The ``parallel_*`` parameters do not control concurrency; they limit
     the number of actions added to the action plan.
   * ``compute_nodes``, ``storage_pools``, and ``priority`` are optional
     parameters. However, if they are passed, they **require** the
     parameters in the tables below:

The elements of compute_nodes array are:
============= ======= =============== =============================
parameter     type    default Value   description
============= ======= =============== =============================
``src_node``  string  None            Compute node from which
                                      instances migrate(mandatory).
``dst_node``  string  None            Compute node to which
                                      instances migrate.
============= ======= =============== =============================
============= ======= ======== ========= ========================
parameter     type    default  required  description
============= ======= ======== ========= ========================
``src_node``  string  None     Required  Compute node from which
                                         instances migrate.
``dst_node``  string  None     Optional  Compute node to which
                                         instances migrate.
                                         If omitted, nova will
                                         choose the destination
                                         node automatically.
============= ======= ======== ========= ========================
The elements of storage_pools array are:
============= ======= =============== ==============================
parameter     type    default Value   description
============= ======= =============== ==============================
``src_pool``  string  None            Storage pool from which
                                      volumes migrate(mandatory).
``dst_pool``  string  None            Storage pool to which
                                      volumes migrate.
``src_type``  string  None            Source volume type(mandatory).
``dst_type``  string  None            Destination volume type
                                      (mandatory).
============= ======= =============== ==============================
============= ======= ======== ========= ========================
parameter     type    default  required  description
============= ======= ======== ========= ========================
``src_pool``  string  None     Required  Storage pool from which
                                         volumes migrate.
``dst_pool``  string  None     Optional  Storage pool to which
                                         volumes migrate.
``src_type``  string  None     Optional  Source volume type.
``dst_type``  string  None     Required  Destination volume type
============= ======= ======== ========= ========================
The elements of priority object are:
================ ======= =============== ======================
parameter        type    default Value   description
================ ======= =============== ======================
``project``      array   None            Project names.
``compute_node`` array   None            Compute node names.
``storage_pool`` array   None            Storage pool names.
``compute``      enum    None            Instance attributes.
                                         |compute|
``storage``      enum    None            Volume attributes.
                                         |storage|
================ ======= =============== ======================
================ ======= ======== ========= =====================
parameter        type    default  Required  description
================ ======= ======== ========= =====================
``project``      array   None     Optional  Project names.
``compute_node`` array   None     Optional  Compute node names.
``storage_pool`` array   None     Optional  Storage pool names.
``compute``      enum    None     Optional  Instance attributes.
                                            |compute|
``storage``      enum    None     Optional  Volume attributes.
                                            |storage|
================ ======= ======== ========= =====================
.. |compute| replace:: ["vcpu_num", "mem_size", "disk_size", "created_at"]
.. |storage| replace:: ["size", "created_at"]
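The "required" columns in the tables above can be checked before an audit is created. The following is a minimal sketch, not part of Watcher: the function names ``validate_zone_migration_params`` and ``to_cli_args`` are hypothetical, and the rule that an audit needs at least one of ``compute_nodes`` or ``storage_pools`` to be actionable is an assumption drawn from the note above.

```python
import json

# Keys marked "Required" in the element tables above.
REQUIRED_KEYS = {
    "compute_nodes": {"src_node"},
    "storage_pools": {"src_pool", "dst_type"},
}


def validate_zone_migration_params(params):
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    for name, required in REQUIRED_KEYS.items():
        for index, element in enumerate(params.get(name) or []):
            missing = sorted(required - set(element))
            if missing:
                problems.append(f"{name}[{index}] is missing {', '.join(missing)}")
    # Assumption: with defaults only, nothing is actionable (see the note above).
    if not params.get("compute_nodes") and not params.get("storage_pools"):
        problems.append("no compute_nodes or storage_pools: nothing to migrate")
    return problems


def to_cli_args(params):
    """Render each parameter as a -p name='<json>' argument for the CLI."""
    return [f"-p {name}='{json.dumps(value)}'" for name, value in params.items()]


params = {"compute_nodes": [{"src_node": "s01", "dst_node": "d01"}]}
print(validate_zone_migration_params(params))
print(to_cli_args(params)[0])
```

The rendered argument has the same shape as the ``-p compute_nodes=...`` example in the "How to use it ?" section of this document.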
@@ -126,11 +150,26 @@ parameter type default Value description
Efficacy Indicator
------------------
The efficacy indicators for action plans built from the command line
are:
.. watcher-func::
:format: literal_block
watcher.decision_engine.goal.efficacy.specs.HardwareMaintenance.get_global_efficacy_indicator
In **Horizon**, these indicators are shown with alternative text.
* ``live_migrate_instance_count`` is shown as
  ``The number of instances actually live migrated`` in Horizon.
* ``planned_live_migrate_instance_count`` is shown as
  ``The number of instances planned to live migrate`` in Horizon.
* ``planned_live_migrate_instance_count`` refers to the instances planned
  to live migrate in the action plan.
* ``live_migrate_instance_count`` tracks all the instances that could be
  migrated according to the audit input.
Algorithm
---------
@@ -148,6 +187,19 @@ How to use it ?
$ openstack optimize audit create -a at1 \
-p compute_nodes='[{"src_node": "s01", "dst_node": "d01"}]'
.. note::

   * The Cinder model collector is not enabled by default.
     If the Cinder model collector is not enabled while deploying Watcher,
     the model will become outdated and cause errors eventually.
     See the `Configuration option to enable the storage collector <https://docs.openstack.org/watcher/latest/configuration/watcher.html#collector.collector_plugins>`_ documentation.

Support caveats
---------------
This strategy offers the option to perform both Instance migrations and
Volume migrations. Currently, Instance migrations are ready for production
use while Volume migrations remain experimental.
External Links
--------------


@@ -0,0 +1,430 @@
..
Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.
=======================
Using Continuous Audit
=======================
Continuous audits allow Watcher to continuously monitor and optimize your
OpenStack infrastructure based on predefined schedules or intervals. This guide
demonstrates how to set up and use continuous audits with the dummy strategy,
which is useful for testing, development, and understanding the continuous
audit workflow. The same steps apply to any other combination of
strategy and goal.
Overview
========
A continuous audit differs from a oneshot audit in that it runs repeatedly
at specified intervals. It supports both time-based intervals
(in seconds) and cron-like expressions for more complex scheduling patterns.
The dummy strategy is a test strategy that doesn't perform actual optimization
but creates sample actions (nop and sleep) to demonstrate the complete audit
workflow. It's ideal for:
- Testing continuous audit functionality
- Development and debugging
- Learning how Watcher works
Prerequisites
=============
Before setting up continuous audits, ensure:
1. Watcher services are running and configured properly
2. You have administrator access to OpenStack
You can verify the services are running:
.. code-block:: bash

   $ openstack optimize service list
   +----+-------------------------+------------+--------+
   | ID | Name                    | Host       | Status |
   +----+-------------------------+------------+--------+
   | 1  | watcher-decision-engine | controller | ACTIVE |
   | 2  | watcher-applier         | controller | ACTIVE |
   +----+-------------------------+------------+--------+

Continuous Audit State Machine
==============================
You can view the Audit state machine diagram in the Watcher documentation:
`Audit State Machine`_
.. _Audit State Machine: https://docs.openstack.org/watcher/latest/architecture.html#audit-state-machine
Transitions:
- An audit is created and enters the **PENDING** state.
- When the scheduled time arrives, a **PENDING** audit becomes **ONGOING**.
- A continuous audit remains in the **ONGOING** state across executions.
It does not switch to **SUCCEEDED** after each run.
- If an execution fails, the audit transitions to **FAILED** and is no longer
executed.
- Each execution produces a new action plan. When a new action plan is created
by the same continuous audit, previous **RECOMMENDED** action plans are moved
to **CANCELLED**. Only the latest action plan remains in **RECOMMENDED**.
- An administrator can **CANCEL** an audit that is **PENDING** or **ONGOING**.
- An administrator can **SUSPEND** an **ONGOING** audit.
- A **SUSPENDED** audit can be resumed by an administrator, at which point it
becomes **ONGOING** again.
- An administrator can **DELETE** an audit only when its state is
**SUCCEEDED**, **FAILED**, or **CANCELLED**.
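The transitions listed above can be summarized as a small lookup table. This is only an illustrative sketch covering the transitions named in this guide. It is not Watcher's implementation, and the names ``ALLOWED_TRANSITIONS``, ``can_transition`` and ``can_delete`` are hypothetical.

```python
# Transitions described in the list above; the full Watcher state machine
# (see the linked diagram) may contain more states and edges.
ALLOWED_TRANSITIONS = {
    "PENDING": {"ONGOING", "CANCELLED"},
    "ONGOING": {"FAILED", "CANCELLED", "SUSPENDED"},
    "SUSPENDED": {"ONGOING"},
}

# States in which an administrator may delete the audit.
DELETABLE_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def can_transition(current, target):
    """Return True if this guide lists a transition from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def can_delete(state):
    """Return True if an audit in this state can be deleted."""
    return state in DELETABLE_STATES


print(can_transition("PENDING", "ONGOING"))
print(can_delete("ONGOING"))
```

The second call returns ``False``, which is why an ongoing audit has to be moved to **CANCELLED** before it can be deleted.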
.. note::

   You can enable the auto-trigger option if you want to automatically apply action
   plans generated by continuous audits as soon as they are created.
   Depending on the environment, continuous audits are often good candidates for
   auto-trigger.

Create a Continuous Audit
--------------------------
Create a continuous audit that will run at regular intervals. You can specify
the interval in seconds or use cron-like expressions.
Using Time Interval (seconds)
------------------------------
This example creates a continuous audit that runs every 5 minutes
(300 seconds) indefinitely:
.. code-block:: bash

   $ openstack optimize audit create \
     --goal dummy \
     --strategy dummy \
     --audit_type CONTINUOUS \
     --interval 300 \
     --name "continuous-dummy-5min"
   +---------------+--------------------------------------+
   | Field         | Value                                |
   +---------------+--------------------------------------+
   | UUID          | 7607cf57-ea05-4e1a-b8d7-34e570f95132 |
   | Name          | continuous-dummy-5min                |
   | Created At    | 2025-08-12T07:26:18.496536+00:00     |
   | Updated At    | None                                 |
   | Deleted At    | None                                 |
   | State         | PENDING                              |
   | Audit Type    | CONTINUOUS                           |
   | Parameters    | {'para1': 3.2, 'para2': 'hello'}     |
   | Interval      | 300                                  |
   | Goal          | dummy                                |
   | Strategy      | dummy                                |
   | Audit Scope   | []                                   |
   | Auto Trigger  | False                                |
   | Next Run Time | None                                 |
   | Hostname      | None                                 |
   | Start Time    | None                                 |
   | End Time      | None                                 |
   | Force         | False                                |
   +---------------+--------------------------------------+

Using Cron Expression
----------------------
For more complex scheduling, you can use cron-like expressions. This example
runs the audit every hour at the 15-minute mark:
.. code-block:: bash

   $ openstack optimize audit create \
     --goal dummy \
     --strategy dummy \
     --audit_type CONTINUOUS \
     --interval "15 * * * *" \
     --name "continuous-dummy-hourly"
   +---------------+--------------------------------------+
   | Field         | Value                                |
   +---------------+--------------------------------------+
   | UUID          | 9cbce4f1-eb75-405a-8f4e-108eb08fdd0a |
   | Name          | continuous-dummy-hourly              |
   | Created At    | 2025-08-12T07:32:31.469309+00:00     |
   | Updated At    | None                                 |
   | Deleted At    | None                                 |
   | State         | PENDING                              |
   | Audit Type    | CONTINUOUS                           |
   | Parameters    | {'para1': 3.2, 'para2': 'hello'}     |
   | Interval      | 15 * * * *                           |
   | Goal          | dummy                                |
   | Strategy      | dummy                                |
   | Audit Scope   | []                                   |
   | Auto Trigger  | False                                |
   | Next Run Time | None                                 |
   | Hostname      | None                                 |
   | Start Time    | None                                 |
   | End Time      | None                                 |
   | Force         | False                                |
   +---------------+--------------------------------------+

Time Constraints via start_time and end_time
--------------------------------------------
You can limit when a continuous audit runs by setting start and end times
in addition to the interval. The interval can be given in seconds or as a
cron expression.

.. note::

   Start and end times are interpreted in the timezone configured on the host where the
   Watcher Decision Engine service is running. Provide ``start_time`` and
   ``end_time`` in ISO 8601 format, for example ``'2025-08-13T14:30:00'``.

The example below creates a continuous audit that runs from 12:00 to 13:00
with a 5-minute interval.
.. code-block:: bash

   $ openstack optimize audit create \
     --goal dummy \
     --strategy dummy \
     --audit_type CONTINUOUS \
     --interval 300 \
     --start-time "$(date -d 'today 12:00' +%Y-%m-%dT%H:%M:%S)" \
     --end-time "$(date -d 'today 13:00' +%Y-%m-%dT%H:%M:%S)" \
     --name "continuous-dummy-5min"
   +---------------+--------------------------------------+
   | Field         | Value                                |
   +---------------+--------------------------------------+
   | UUID          | dadd279b-1e3d-4c38-aba6-4a730a78589b |
   | Name          | continuous-dummy-5min                |
   | Created At    | 2025-08-12T08:36:42.924460+00:00     |
   | Updated At    | None                                 |
   | Deleted At    | None                                 |
   | State         | PENDING                              |
   | Audit Type    | CONTINUOUS                           |
   | Parameters    | {'para1': 3.2, 'para2': 'hello'}     |
   | Interval      | 300                                  |
   | Goal          | dummy                                |
   | Strategy      | dummy                                |
   | Audit Scope   | []                                   |
   | Auto Trigger  | False                                |
   | Next Run Time | None                                 |
   | Hostname      | None                                 |
   | Start Time    | 2025-08-12T12:00:00                  |
   | End Time      | 2025-08-12T13:00:00                  |
   | Force         | False                                |
   +---------------+--------------------------------------+

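The ``date -d`` calls in the command above depend on GNU ``date``. As a portable alternative, the same ISO 8601 strings can be produced with the Python standard library. This is a small sketch, and ``window_for`` is a hypothetical helper name. It produces naive local times, matching the note that the values are interpreted in the Decision Engine host's timezone.

```python
from datetime import date, datetime, time


def window_for(day, start_hour, end_hour):
    """Return (start_time, end_time) as ISO 8601 strings without a UTC offset."""
    start = datetime.combine(day, time(start_hour))
    end = datetime.combine(day, time(end_hour))
    return start.isoformat(), end.isoformat()


# Equivalent of 'today 12:00' and 'today 13:00' in the shell example.
start_time, end_time = window_for(date.today(), 12, 13)
print(start_time, end_time)
```

For 12 August 2025 this yields ``2025-08-12T12:00:00`` and ``2025-08-12T13:00:00``, the values shown in the ``Start Time`` and ``End Time`` fields above.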
Monitoring Continuous Audit Execution
======================================
Create a continuous audit
--------------------------
Create a continuous audit with a 5-second interval:
.. code-block:: bash

   $ openstack optimize audit create \
     --goal dummy \
     --strategy dummy \
     --audit_type CONTINUOUS \
     --interval 5 \
     --name "continuous-dummy-5sec"
   +---------------+--------------------------------------+
   | Field         | Value                                |
   +---------------+--------------------------------------+
   | UUID          | 7d1f1961-41a6-47ae-a94a-cf5e43174fbd |
   | Name          | continuous-dummy-5sec                |
   | Created At    | 2025-08-12T09:27:33.592575+00:00     |
   | Updated At    | None                                 |
   | Deleted At    | None                                 |
   | State         | PENDING                              |
   | Audit Type    | CONTINUOUS                           |
   | Parameters    | {'para1': 3.2, 'para2': 'hello'}     |
   | Interval      | 5                                    |
   | Goal          | dummy                                |
   | Strategy      | dummy                                |
   | Audit Scope   | []                                   |
   | Auto Trigger  | False                                |
   | Next Run Time | None                                 |
   | Hostname      | None                                 |
   | Start Time    | None                                 |
   | End Time      | None                                 |
   | Force         | False                                |
   +---------------+--------------------------------------+

Once created, the continuous audit will be automatically scheduled and executed
by the Watcher Decision Engine. You can monitor its progress:
Check Audit Status
------------------
.. code-block:: bash

   $ openstack optimize audit show 7d1f1961-41a6-47ae-a94a-cf5e43174fbd
   +---------------+--------------------------------------+
   | Field         | Value                                |
   +---------------+--------------------------------------+
   | UUID          | 7d1f1961-41a6-47ae-a94a-cf5e43174fbd |
   | Name          | continuous-dummy-5sec                |
   | Created At    | 2025-08-12T09:27:34+00:00            |
   | Updated At    | 2025-08-12T09:28:28+00:00            |
   | Deleted At    | None                                 |
   | State         | ONGOING                              |
   | Audit Type    | CONTINUOUS                           |
   | Parameters    | {'para1': 3.2, 'para2': 'hello'}     |
   | Interval      | 5                                    |
   | Goal          | dummy                                |
   | Strategy      | dummy                                |
   | Audit Scope   | []                                   |
   | Auto Trigger  | False                                |
   | Next Run Time | 2025-08-12T09:28:33                  |
   | Hostname      | chkumar-devstack-1                   |
   | Start Time    | None                                 |
   | End Time      | None                                 |
   | Force         | False                                |
   +---------------+--------------------------------------+

.. note::

   The *Next Run Time* is the next time the audit will run. It is calculated based on the
   interval and the start and end times.

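One way to picture how a next run time could follow from the interval and the time window is sketched below. This is an assumption for illustration only, not Watcher's scheduler code. The helper ``next_run_time`` is hypothetical and handles numeric intervals only, not cron expressions.

```python
from datetime import datetime, timedelta


def next_run_time(last_run, interval_seconds, start_time=None, end_time=None):
    """Return the next run as a datetime, or None once the window has closed."""
    candidate = last_run + timedelta(seconds=interval_seconds)
    # Assumption: never run before the configured start of the window.
    if start_time is not None and candidate < start_time:
        candidate = start_time
    # Assumption: no further runs after the configured end of the window.
    if end_time is not None and candidate > end_time:
        return None
    return candidate


last_run = datetime.fromisoformat("2025-08-12T09:28:28")
print(next_run_time(last_run, 5).isoformat())
```

With the values from the output above (last update at 09:28:28 and an interval of 5 seconds), the sketch gives ``2025-08-12T09:28:33``, which matches the ``Next Run Time`` field shown.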
List Generated Action Plans
---------------------------
Each execution of the continuous audit generates a new action plan:
.. code-block:: bash

   $ openstack optimize actionplan list --audit 7d1f1961-41a6-47ae-a94a-cf5e43174fbd
   +--------------------------------------+--------------------------------------+-------------+
   | UUID                                 | Audit                                | State       |
   +--------------------------------------+--------------------------------------+-------------+
   | b301dd17-a139-4a45-ade2-b2c2ddf006ef | 7d1f1961-41a6-47ae-a94a-cf5e43174fbd | CANCELLED   |
   | 22a5bc60-adef-447a-aa27-731b4f5f7ee3 | 7d1f1961-41a6-47ae-a94a-cf5e43174fbd | RECOMMENDED |
   +--------------------------------------+--------------------------------------+-------------+

.. note::

   In continuous audits, when a new action plan is generated, previous
   RECOMMENDED action plans are automatically set to CANCELLED state to
   avoid conflicts.

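The superseding behaviour described in the note can be modelled in a few lines. This is a simplified sketch with plain dictionaries standing in for action plans. ``add_action_plan`` is a hypothetical name and does not correspond to a Watcher function.

```python
def add_action_plan(plans, new_uuid):
    """Return a new list in which older RECOMMENDED plans are CANCELLED."""
    updated = [
        {**plan, "state": "CANCELLED"} if plan["state"] == "RECOMMENDED" else plan
        for plan in plans
    ]
    # Only the newest plan of the audit stays RECOMMENDED.
    updated.append({"uuid": new_uuid, "state": "RECOMMENDED"})
    return updated


plans = add_action_plan([], "b301dd17-a139-4a45-ade2-b2c2ddf006ef")
plans = add_action_plan(plans, "22a5bc60-adef-447a-aa27-731b4f5f7ee3")
for plan in plans:
    print(plan["uuid"], plan["state"])
```

After the second call the first plan is CANCELLED and the second is RECOMMENDED, the same pattern as in the listing above. Plans in any other state are left untouched by this sketch.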
Manage Continuous Audits
========================
Stop a Continuous Audit
------------------------
To stop a continuous audit, update its state:
.. code-block:: bash

   $ openstack optimize audit update 550e8400-e29b-41d4-a716-446655440000 replace state=CANCELLED

Modify Audit Interval
---------------------
You can change the interval of a running continuous audit:
.. code-block:: bash

   $ openstack optimize audit update 550e8400-e29b-41d4-a716-446655440000 replace interval=900

The Decision Engine will automatically reschedule the audit with the new
interval.
Modify End Time
---------------
You can change the end time of a running continuous audit:
.. code-block:: bash

   $ openstack optimize audit update 550e8400-e29b-41d4-a716-446655440000 replace end_time=2025-08-12T14:00:00

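The ``replace name=value`` arguments used in the update commands above resemble JSON Patch operations. The sketch below shows how such arguments could be turned into a patch document. It assumes that the request body is a JSON Patch list, which should be checked against the Watcher API reference. ``to_json_patch`` is a hypothetical helper, and values are kept as strings here without any type conversion.

```python
import json


def to_json_patch(op, assignments):
    """Convert ['state=CANCELLED'] style arguments into JSON Patch operations."""
    patch = []
    for item in assignments:
        path, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {item!r}")
        patch.append({"op": op, "path": "/" + path, "value": value})
    return patch


print(json.dumps(to_json_patch("replace", ["end_time=2025-08-12T14:00:00"])))
```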
Delete a Continuous Audit
--------------------------
In order to delete a continuous audit, the audit state must be
SUCCEEDED, FAILED, or CANCELLED.
An audit with PENDING or ONGOING state cannot be deleted.
To delete an ongoing or pending continuous audit, update its state to
CANCELLED:
.. code-block:: bash

   $ openstack optimize audit update 550e8400-e29b-41d4-a716-446655440000 replace state=CANCELLED

Then, delete the audit:
.. code-block:: bash

   $ openstack optimize audit delete 550e8400-e29b-41d4-a716-446655440000

Configuration Reference
========================
Continuous Audit Intervals
---------------------------
**Numeric Intervals (seconds):**
- Minimum recommended: 60 seconds
- Common values: 300 (5 min), 600 (10 min), 1800 (30 min), 3600 (1 hour)
**Cron Expressions (5 format fields):**
See the `POSIX crontab specification <https://pubs.opengroup.org/onlinepubs/9799919799/utilities/crontab.html>`_.
- ``0 * * * *``: Every hour at minute 0
- ``*/15 * * * *``: Every 15 minutes
- ``0 9-17 * * 1-5``: Every hour during business hours (9 AM - 5 PM, Mon-Fri)
- ``30 2 * * *``: Daily at 2:30 AM
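To make the two interval forms concrete, the sketch below tells a numeric interval from a five-field cron expression and expands a single cron field into the values it matches. It is an illustration only and not how Watcher parses intervals. The names ``classify_interval`` and ``expand_field`` are hypothetical, and the ``*/15`` step syntax is treated as the common cron extension it is rather than as part of the POSIX format.

```python
def classify_interval(interval):
    """Return ('seconds', int) or ('cron', [five fields]) for an interval value."""
    text = str(interval).strip()
    if text.isdigit():
        return "seconds", int(text)
    fields = text.split()
    if len(fields) == 5:
        return "cron", fields
    raise ValueError(f"not a number of seconds or a 5-field cron expression: {text!r}")


def expand_field(field, low, high):
    """Expand one cron field such as '*/15', '9-17' or '30' into sorted values."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return sorted(values)


kind, fields = classify_interval("*/15 * * * *")
print(kind, expand_field(fields[0], 0, 59))
```

For ``*/15`` in the minute field the expansion is ``[0, 15, 30, 45]``, that is, every 15 minutes.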
Decision Engine Configuration
-----------------------------
The continuous audit polling interval is configured in ``watcher.conf``:
.. code-block:: ini

   [watcher_decision_engine]
   # Interval for checking continuous audits (seconds)
   continuous_audit_interval = 30

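To check which polling interval a given ``watcher.conf`` sets, the option can be read with the Python standard library. This is a convenience sketch. Watcher itself loads its configuration through oslo.config, and ``read_polling_interval`` is a hypothetical helper name.

```python
import configparser

SAMPLE_CONF = """\
[watcher_decision_engine]
# Interval for checking continuous audits (seconds)
continuous_audit_interval = 30
"""


def read_polling_interval(conf_text, default=None):
    """Return continuous_audit_interval as an int, or default if it is unset."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return parser.getint(
        "watcher_decision_engine", "continuous_audit_interval", fallback=default
    )


print(read_polling_interval(SAMPLE_CONF))
```

To inspect a real deployment, read the file contents first, for example with ``open(path).read()``, and pass the text to the function.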
Spec Linked with Continuous Audit
=================================
- `Watcher Continuous Optimization <https://specs.openstack.org/openstack/watcher-specs/specs/newton/implemented/continuously-optimization.html>`_
- `Cron-based continuous audits <https://specs.openstack.org/openstack/watcher-specs/specs/pike/implemented/cron-based-continuous-audits.html>`_
- `Add the start and end time for CONTINUOUS audit <https://specs.openstack.org/openstack/watcher-specs/specs/stein/implemented/add-start-end-time-for-continuous-audit.html>`_


@@ -8,3 +8,4 @@ User Guide
ways-to-install
user-guide
event_type_audit
continuous_type_audit


@@ -132,8 +132,8 @@ audit) that you want to use.
$ openstack optimize audit create -a <your_audit_template>
If your_audit_template was created by --strategy <your_strategy>, and it
defines some parameters (command `watcher strategy show` to check parameters
format), you can append `-p` to input required parameters:
defines some parameters (command ``watcher strategy show`` to check parameters
format), you can append ``-p`` to input required parameters:
.. code:: bash


@@ -9,6 +9,8 @@ namespace = oslo.concurrency
namespace = oslo.db
namespace = oslo.log
namespace = oslo.messaging
namespace = oslo.middleware.cors
namespace = oslo.middleware.http_proxy_to_wsgi
namespace = oslo.policy
namespace = oslo.reports
namespace = oslo.service.periodic_task


@@ -0,0 +1,9 @@
---
- hosts: all
  tasks:
    - name: Generate prometheus.yml config file
      delegate_to: controller
      template:
        src: "templates/prometheus.yml.j2"
        dest: "/home/zuul/prometheus.yml"
        mode: "0644"


@@ -0,0 +1,13 @@
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:3000"]
{% if 'compute' in groups %}
{% for host in groups['compute'] %}
      - targets: ["{{ hostvars[host]['ansible_fqdn'] }}:9100"]
        labels:
          fqdn: "{{ hostvars[host]['ansible_fqdn'] }}"
{% endfor %}
{% endif %}
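To see what the template above is expected to expand to, the same structure can be built in plain Python for a list of compute host FQDNs. This is only a sketch of the rendered result, under the assumption that the per-host targets belong to the ``node`` job's ``static_configs`` list. ``scrape_config`` and the example hostnames are hypothetical.

```python
def scrape_config(compute_fqdns):
    """Build the dictionary equivalent of the rendered prometheus.yml."""
    static_configs = [{"targets": ["localhost:3000"]}]
    # One extra target per compute host, labelled with its FQDN.
    for fqdn in compute_fqdns:
        static_configs.append(
            {"targets": [f"{fqdn}:9100"], "labels": {"fqdn": fqdn}}
        )
    return {
        "global": {"scrape_interval": "10s"},
        "scrape_configs": [{"job_name": "node", "static_configs": static_configs}],
    }


config = scrape_config(["compute-0.example.org", "compute-1.example.org"])
for entry in config["scrape_configs"][0]["static_configs"]:
    print(entry)
```

With an empty host list only the ``localhost:3000`` target remains, which corresponds to the case where the ``compute`` group is absent from the inventory.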

pyproject.toml Normal file

@@ -0,0 +1,3 @@
[build-system]
requires = ["pbr>=6.0.0", "setuptools>=64.0.0"]
build-backend = "pbr.build"


@@ -1,7 +1,8 @@
Rally job
=========
We provide, with Watcher, a Rally plugin you can use to benchmark the optimization service.
We provide, with Watcher, a Rally plugin you can use to benchmark
the optimization service.
To launch this task with configured Rally you just need to run:


@@ -0,0 +1,33 @@
---
prelude: |
  The ``OpenStack 2025.1`` (``Watcher 14.0.0``) release includes several new features,
  deprecations, and removals. After a period of inactivity, the Watcher
  project moved to the Distributed leadership model in ``2025.1`` with
  several new contributors working to modernize the code base.
  Activity this cycle was mainly focused on paying down technical debt
  related to supporting newer testing runtimes. With this release,
  ``Ubuntu 24.04`` is now officially tested and supported.
  ``Ubuntu 24.04`` brings a new default Python runtime ``3.12`` and with it
  improvements to eventlet and SQLAlchemy 2.0 compatibility where required.
  ``2025.1`` is the last release to officially support and test with ``Ubuntu 22.04``.
  ``2025.1`` is the second official skip-level upgrade release supporting
  upgrades from either ``2024.1`` or ``2024.2``.
  Another area of focus in this cycle was the data sources supported by Watcher.
  The long obsolete `Ceilometer` API data source has been removed, the untested
  `Monasca` data source has been deprecated, and a new `Prometheus` data source
  has been added.
  https://specs.openstack.org/openstack/watcher-specs/specs/2025.1/approved/prometheus-datasource.html
fixes:
  - https://bugs.launchpad.net/watcher/+bug/2086710 watcher compatibility between
    eventlet, apscheduler, and python 3.12
  - https://bugs.launchpad.net/watcher/+bug/2067815 refactoring of the SQLAlchemy
    database layer to improve compatibility with eventlet on newer Pythons
  - A number of linting issues were addressed with the introduction
    of pre-commit. The issues include, but are not limited to, spelling and grammar
    fixes across all documentation and code, numerous sphinx documentation build
    warnings, and incorrect file permissions such as files having the execute bit
    set when not required.
    While none of these changes should affect the runtime behavior of Watcher, they
    generally improve the maintainability and quality of the codebase.

Some files were not shown because too many files have changed in this diff.