Merge "Update Overload standard deviation doc"

This commit is contained in:
Zuul
2025-08-22 15:22:04 +00:00
committed by Gerrit Code Review
3 changed files with 108 additions and 48 deletions

View File

@@ -1,6 +1,6 @@
===============================
Workload Stabilization Strategy
===============================

Synopsis
--------
@@ -19,20 +19,20 @@ Metrics
The *workload_stabilization* strategy requires the following metrics:

============================ ==================================================
metric                       description
============================ ==================================================
``instance_ram_usage``       ram memory usage in an instance as float in
                             megabytes
``instance_cpu_usage``       cpu usage in an instance as float ranging between
                             0 and 100 representing the total cpu usage as
                             percentage
``host_ram_usage``           ram memory usage in a compute node as float in
                             megabytes
``host_cpu_usage``           cpu usage in a compute node as float ranging
                             between 0 and 100 representing the total cpu
                             usage as percentage
============================ ==================================================
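The descriptions above imply a normalization step before hosts are compared: percentages are scaled into a 0..1 range, and megabyte values are scaled by node capacity. A minimal sketch of that idea (the helper names and capacity figures are hypothetical, not Watcher's actual code):

```python
# Hypothetical helpers illustrating the normalization implied above;
# the capacity figure is made up for the example.

def normalize_cpu(cpu_usage_percent):
    # cpu usage arrives as a float between 0 and 100 (a percentage).
    return cpu_usage_percent / 100.0

def normalize_ram(ram_usage_mb, ram_capacity_mb):
    # ram usage arrives in megabytes; scale by the node's total memory.
    return ram_usage_mb / ram_capacity_mb

print(normalize_cpu(75.0))         # 0.75
print(normalize_ram(8192, 32768))  # 0.25
```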
Cluster data model
******************
@@ -68,23 +68,49 @@ Configuration
Strategy parameters are:

====================== ====== =================== =============================
parameter              type   default Value       description
====================== ====== =================== =============================
``metrics``            array  |metrics|           Metrics used as rates of
                                                  cluster loads.
``thresholds``         object |thresholds|        Dict where key is a metric
                                                  and value is a trigger value.
                                                  The strategy will only
                                                  look for an action plan when
                                                  the standard deviation for
                                                  the usage of one of the
                                                  resources included in the
                                                  metrics, taken as a
                                                  normalized usage between
                                                  0 and 1 among the hosts, is
                                                  higher than the threshold.
                                                  The value of a perfectly
                                                  balanced cluster for the
                                                  standard deviation would be
                                                  0, while in a totally
                                                  unbalanced one it would be
                                                  0.5, which should be the
                                                  maximum value.
``weights``            object |weights|           These weights are used to
                                                  calculate the common standard
                                                  deviation when optimizing
                                                  resource usage.
                                                  The name of a weight is the
                                                  meter name with a _weight
                                                  suffix.
                                                  Higher values imply the
                                                  metric will be prioritized
                                                  when calculating an optimal
                                                  resulting cluster
                                                  distribution.
``instance_metrics``   object |instance_metrics|  Mapping of each instance
                                                  metric in the metrics
                                                  parameter to the compute
                                                  node metric representing
                                                  the same resource usage.
``host_choice``        string retry               Method of host choice when
                                                  analyzing destinations for
                                                  instances.
                                                  There are cycle, retry and
                                                  fullsearch methods. Cycle
                                                  will iterate hosts in cycle.
@@ -93,32 +119,49 @@ parameter type default Value description
                                                  retry_count option).
                                                  Fullsearch will return each
                                                  host from list.
``retry_count``        number 1                   Count of random returned
                                                  hosts.
``periods``            object |periods|           Time window, in seconds, used
                                                  to get statistical values for
                                                  instance and host resource
                                                  usage.
                                                  Watcher will use the last
                                                  period to calculate resource
                                                  usage.
``granularity``        number 300                 NOT RECOMMENDED TO MODIFY:
                                                  The time between two measures
                                                  in an aggregated timeseries
                                                  of a metric.
``aggregation_method`` object |aggn_method|       NOT RECOMMENDED TO MODIFY:
                                                  Function used to aggregate
                                                  multiple measures into an
                                                  aggregated value.
====================== ====== =================== =============================

.. |metrics| replace:: ["instance_cpu_usage", "instance_ram_usage"]
.. |thresholds| replace:: {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}
.. |weights| replace:: {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}
.. |instance_metrics| replace:: {"instance_cpu_usage": "host_cpu_usage", "instance_ram_usage": "host_ram_usage"}
.. |periods| replace:: {"instance": 720, "node": 600}
.. |aggn_method| replace:: {"instance": 'mean', "compute_node": 'mean'}
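As a rough illustration of how ``thresholds`` and ``weights`` interact (a sketch with made-up usage numbers, not Watcher's implementation): compute the standard deviation of the normalized per-host usage for each metric, trigger an audit when any metric exceeds its threshold, and combine the per-metric deviations with their weights into the common value being minimized:

```python
import statistics

# Hypothetical normalized (0..1) usage per host for each metric.
usage = {
    "instance_cpu_usage": [0.9, 0.1, 0.5, 0.5],
    "instance_ram_usage": [0.45, 0.55, 0.5, 0.5],
}
thresholds = {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}
weights = {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}

def needs_action(usage, thresholds):
    # An action plan is only sought when some metric's standard
    # deviation exceeds its configured trigger value.
    return any(statistics.pstdev(hosts) > thresholds[metric]
               for metric, hosts in usage.items())

def common_sd(usage, weights):
    # Weighted combination of per-metric deviations; higher-weighted
    # metrics dominate the value being minimized.
    return sum(weights[metric + "_weight"] * statistics.pstdev(hosts)
               for metric, hosts in usage.items())

print(needs_action(usage, thresholds))      # True (cpu stdev ~0.283 > 0.2)
print(round(common_sd(usage, weights), 3))  # 0.318
```

Note that a perfectly balanced metric (all hosts at the same usage) contributes 0 to the common value, matching the 0 lower bound described above.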
Efficacy Indicator
------------------

Global efficacy indicator:

.. watcher-func::
   :format: literal_block

   watcher.decision_engine.goal.efficacy.specs.WorkloadBalancing.get_global_efficacy_indicator

Other efficacy indicators of the goal are:

- ``instance_migrations_count``: The number of VM migrations to be performed
- ``instances_count``: The total number of audited instances in the strategy
- ``standard_deviation_after_audit``: The resulting standard deviation value
- ``standard_deviation_before_audit``: The original standard deviation value
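As a toy illustration of what these indicators report (the numbers are hypothetical, not the output of a real audit):

```python
# Hypothetical outcome of one audit, used only to illustrate the
# indicator names listed above.
planned_migrations = ["vm1 -> node2", "vm3 -> node1"]

indicators = {
    "instance_migrations_count": len(planned_migrations),
    "instances_count": 24,
    "standard_deviation_before_audit": 0.28,
    "standard_deviation_after_audit": 0.05,
}

# A successful audit lowers the standard deviation.
assert (indicators["standard_deviation_after_audit"]
        < indicators["standard_deviation_before_audit"])
print(indicators["instance_migrations_count"])  # 2
```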
Algorithm
---------
@@ -141,4 +184,4 @@ How to use it ?
External Links
--------------

None

View File

@@ -0,0 +1,7 @@
---
other:
  - |
    The Watcher Overload Standard Deviation algorithm is now referred to in the
    documentation as the Workload Stabilization Strategy. The documentation of
    this strategy has been enhanced to clarify and better explain the usage of
    parameters.

View File

@@ -48,9 +48,19 @@ def _set_memoize(conf):
class WorkloadStabilization(base.WorkloadStabilizationBaseStrategy):
    """Workload Stabilization control using live migration

    This workload stabilization strategy is based on the standard deviation
    algorithm, as a measure of cluster resource usage balance. The goal is to
    determine if there is an overload in a cluster and respond to it by
    migrating VMs to stabilize the cluster.

    The standard deviation is determined using normalized CPU and/or memory
    usage values, which are scaled to a range between 0 and 1 based on the
    usage metrics in the data sources.

    A standard deviation of 0 means that your cluster's resources are
    perfectly balanced, with all usage values being identical. However, a
    standard deviation of 0.5 indicates completely unbalanced resource usage,
    where some resources are heavily utilized and others are not at all.

    This strategy has been tested in a small (32 nodes) cluster.
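The 0 and 0.5 bounds described in the docstring can be checked directly; a quick sketch using the population standard deviation over normalized 0..1 usage values:

```python
import statistics

# Every host reports identical normalized usage: perfectly balanced.
balanced = [0.5, 0.5, 0.5, 0.5]

# Half the hosts fully loaded, half idle: maximally unbalanced.
unbalanced = [1.0, 0.0, 1.0, 0.0]

print(statistics.pstdev(balanced))    # 0.0
print(statistics.pstdev(unbalanced))  # 0.5
```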