Merge "Update Overload standard deviation doc"

This commit is contained in:
Zuul
2025-08-22 15:22:04 +00:00
committed by Gerrit Code Review
3 changed files with 108 additions and 48 deletions

View File

@@ -1,6 +1,6 @@
===============================
Workload Stabilization Strategy
===============================

Synopsis
--------
@@ -19,20 +19,20 @@ Metrics
The *workload_stabilization* strategy requires the following metrics:

============================ ==================================================
metric                       description
============================ ==================================================
``instance_ram_usage``       ram memory usage in an instance as float in
                             megabytes
``instance_cpu_usage``       cpu usage in an instance as float ranging between
                             0 and 100 representing the total cpu usage as
                             percentage
``host_ram_usage``           ram memory usage in a compute node as float in
                             megabytes
``host_cpu_usage``           cpu usage in a compute node as float ranging
                             between 0 and 100 representing the total cpu
                             usage as percentage
============================ ==================================================
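The descriptions above imply a normalization step before hosts are compared: percentages are scaled into a 0..1 range, and megabyte values are scaled by node capacity. A minimal sketch of that idea (the helper names and capacity figures are hypothetical, not Watcher's actual code):

```python
# Hypothetical helpers illustrating the normalization implied above;
# the capacity figure is made up for the example.

def normalize_cpu(cpu_usage_percent):
    # cpu usage arrives as a float between 0 and 100 (a percentage).
    return cpu_usage_percent / 100.0

def normalize_ram(ram_usage_mb, ram_capacity_mb):
    # ram usage arrives in megabytes; scale by the node's total memory.
    return ram_usage_mb / ram_capacity_mb

print(normalize_cpu(75.0))         # 0.75
print(normalize_ram(8192, 32768))  # 0.25
```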
Cluster data model
******************
@@ -68,23 +68,49 @@ Configuration
Strategy parameters are:

====================== ====== =================== =============================
parameter              type   default Value       description
====================== ====== =================== =============================
``metrics``            array  |metrics|           Metrics used as rates of
                                                  cluster loads.
``thresholds``         object |thresholds|        Dict where key is a metric
                                                  and value is a trigger value.
                                                  The strategy will only
                                                  look for an action plan when
                                                  the standard deviation for
                                                  the usage of one of the
                                                  resources included in the
                                                  metrics, taken as a
                                                  normalized usage between
                                                  0 and 1 among the hosts, is
                                                  higher than the threshold.
                                                  The value of a perfectly
                                                  balanced cluster for the
                                                  standard deviation would be
                                                  0, while in a totally
                                                  unbalanced one it would be
                                                  0.5, which should be the
                                                  maximum value.
``weights``            object |weights|           These weights are used to
                                                  calculate the common standard
                                                  deviation when optimizing
                                                  resource usage.
                                                  The name of a weight is the
                                                  meter name with a _weight
                                                  suffix.
                                                  Higher values imply the
                                                  metric will be prioritized
                                                  when calculating an optimal
                                                  resulting cluster
                                                  distribution.
``instance_metrics``   object |instance_metrics|  Mapping of each instance
                                                  metric in the metrics
                                                  parameter to the compute
                                                  node metric representing
                                                  the same resource usage.
``host_choice``        string retry               Method of host choice when
                                                  analyzing destinations for
                                                  instances.
                                                  There are cycle, retry and
                                                  fullsearch methods. Cycle
                                                  will iterate hosts in cycle.
@@ -93,32 +119,49 @@ parameter type default Value description
                                                  retry_count option).
                                                  Fullsearch will return each
                                                  host from list.
``retry_count``        number 1                   Count of random returned
                                                  hosts.
``periods``            object |periods|           Time window, in seconds, used
                                                  to get statistical values for
                                                  instance and host resource
                                                  usage.
                                                  Watcher will use the last
                                                  period to calculate resource
                                                  usage.
``granularity``        number 300                 NOT RECOMMENDED TO MODIFY:
                                                  The time between two measures
                                                  in an aggregated timeseries
                                                  of a metric.
``aggregation_method`` object |aggn_method|       NOT RECOMMENDED TO MODIFY:
                                                  Function used to aggregate
                                                  multiple measures into an
                                                  aggregated value.
====================== ====== =================== =============================

.. |metrics| replace:: ["instance_cpu_usage", "instance_ram_usage"]
.. |thresholds| replace:: {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}
.. |weights| replace:: {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}
.. |instance_metrics| replace:: {"instance_cpu_usage": "host_cpu_usage", "instance_ram_usage": "host_ram_usage"}
.. |periods| replace:: {"instance": 720, "node": 600}
.. |aggn_method| replace:: {"instance": 'mean', "compute_node": 'mean'}
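As a rough illustration of how ``thresholds`` and ``weights`` interact (a sketch with made-up usage numbers, not Watcher's implementation): compute the standard deviation of the normalized per-host usage for each metric, trigger an audit when any metric exceeds its threshold, and combine the per-metric deviations with their weights into the common value being minimized:

```python
import statistics

# Hypothetical normalized (0..1) usage per host for each metric.
usage = {
    "instance_cpu_usage": [0.9, 0.1, 0.5, 0.5],
    "instance_ram_usage": [0.45, 0.55, 0.5, 0.5],
}
thresholds = {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}
weights = {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}

def needs_action(usage, thresholds):
    # An action plan is only sought when some metric's standard
    # deviation exceeds its configured trigger value.
    return any(statistics.pstdev(hosts) > thresholds[metric]
               for metric, hosts in usage.items())

def common_sd(usage, weights):
    # Weighted combination of per-metric deviations; higher-weighted
    # metrics dominate the value being minimized.
    return sum(weights[metric + "_weight"] * statistics.pstdev(hosts)
               for metric, hosts in usage.items())

print(needs_action(usage, thresholds))      # True (cpu stdev ~0.283 > 0.2)
print(round(common_sd(usage, weights), 3))  # 0.318
```

Note that a perfectly balanced metric (all hosts at the same usage) contributes 0 to the common value, matching the 0 lower bound described above.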
Efficacy Indicator
------------------

Global efficacy indicator:

.. watcher-func::
   :format: literal_block

   watcher.decision_engine.goal.efficacy.specs.WorkloadBalancing.get_global_efficacy_indicator

Other efficacy indicators of the goal are:

- ``instance_migrations_count``: The number of VM migrations to be performed
- ``instances_count``: The total number of audited instances in the strategy
- ``standard_deviation_after_audit``: The resulting standard deviation value
- ``standard_deviation_before_audit``: The original standard deviation value
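As a toy illustration of what these indicators report (the numbers are hypothetical, not the output of a real audit):

```python
# Hypothetical outcome of one audit, used only to illustrate the
# indicator names listed above.
planned_migrations = ["vm1 -> node2", "vm3 -> node1"]

indicators = {
    "instance_migrations_count": len(planned_migrations),
    "instances_count": 24,
    "standard_deviation_before_audit": 0.28,
    "standard_deviation_after_audit": 0.05,
}

# A successful audit lowers the standard deviation.
assert (indicators["standard_deviation_after_audit"]
        < indicators["standard_deviation_before_audit"])
print(indicators["instance_migrations_count"])  # 2
```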
Algorithm
---------
@@ -141,4 +184,4 @@ How to use it ?
External Links
--------------

None

View File

@@ -0,0 +1,7 @@
---
other:
  - |
    The Watcher Overload Standard Deviation algorithm is now referred to in the
    documentation as the Workload Stabilization Strategy. The documentation of
    this strategy has been enhanced to clarify and better explain the usage of
    parameters.

View File

@@ -48,9 +48,19 @@ def _set_memoize(conf):
class WorkloadStabilization(base.WorkloadStabilizationBaseStrategy):
    """Workload Stabilization control using live migration

    This workload stabilization strategy is based on the standard deviation
    algorithm, as a measure of cluster resource usage balance. The goal is to
    determine if there is an overload in a cluster and respond to it by
    migrating VMs to stabilize the cluster.

    The standard deviation is determined using normalized CPU and/or memory
    usage values, which are scaled to a range between 0 and 1 based on the
    usage metrics in the data sources.

    A standard deviation of 0 means that your cluster's resources are
    perfectly balanced, with all usage values being identical. However, a
    standard deviation of 0.5 indicates completely unbalanced resource usage,
    where some resources are heavily utilized and others are not at all.

    This strategy has been tested in a small (32 nodes) cluster.
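The 0 and 0.5 bounds described in the docstring can be checked directly; a quick sketch using the population standard deviation over normalized 0..1 usage values:

```python
import statistics

# Every host reports identical normalized usage: perfectly balanced.
balanced = [0.5, 0.5, 0.5, 0.5]

# Half the hosts fully loaded, half idle: maximally unbalanced.
unbalanced = [1.0, 0.0, 1.0, 0.0]

print(statistics.pstdev(balanced))    # 0.0
print(statistics.pstdev(unbalanced))  # 0.5
```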