From 457819072fcde259878121691293751698655ea5 Mon Sep 17 00:00:00 2001
From: Ronelle Landy
Date: Thu, 3 Jul 2025 16:51:09 -0400
Subject: [PATCH] Update Overload standard deviation doc

Bug #2113862 details a number of suggested corrections and
additions to the Workload Stabilization doc.

This patch adds those suggested changes.

Closes-Bug: #2113862
Assisted-By: Cursor (claude-3.5-sonnet)
Change-Id: I4131a304c064d2ea397b2447025c7edf69a56e2a
Signed-off-by: Ronelle Landy
---
 .../strategies/workload-stabilization.rst     | 133 ++++++++++++------
 ...zation-strategy-name-9988e554ac2655a2.yaml |   7 +
 .../strategies/workload_stabilization.py      |  16 ++-
 3 files changed, 108 insertions(+), 48 deletions(-)
 create mode 100644 releasenotes/notes/wokload-stablization-strategy-name-9988e554ac2655a2.yaml

diff --git a/doc/source/strategies/workload-stabilization.rst b/doc/source/strategies/workload-stabilization.rst
index c2c341b67..e6f588da6 100644
--- a/doc/source/strategies/workload-stabilization.rst
+++ b/doc/source/strategies/workload-stabilization.rst
@@ -1,6 +1,6 @@
-=============================================
-Watcher Overload standard deviation algorithm
-=============================================
+===============================
+Workload Stabilization Strategy
+===============================
 
 Synopsis
 --------
@@ -19,20 +19,20 @@ Metrics
 
 The *workload_stabilization* strategy requires the following metrics:
 
-============================ ============ ======= =============================
-metric                       service name plugins comment
-============================ ============ ======= =============================
-``compute.node.cpu.percent`` ceilometer_  none    need to set the
-                                                  ``compute_monitors`` option
-                                                  to ``cpu.virt_driver`` in the
-                                                  nova.conf.
-``hardware.memory.used``     ceilometer_  SNMP_
-``cpu``                      ceilometer_  none
-``instance_ram_usage``       ceilometer_  none
-============================ ============ ======= =============================
-
-.. _ceilometer: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#openstack-compute
-.. _SNMP: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#snmp-based-meters
+============================ ==================================================
+metric                       description
+============================ ==================================================
+``instance_ram_usage``       ram memory usage in an instance as float in
+                             megabytes
+``instance_cpu_usage``       cpu usage in an instance as float ranging between
+                             0 and 100 representing the total cpu usage as
+                             percentage
+``host_ram_usage``           ram memory usage in a compute node as float in
+                             megabytes
+``host_cpu_usage``           cpu usage in a compute node as float ranging
+                             between 0 and 100 representing the total cpu
+                             usage as percentage
+============================ ==================================================
 
 Cluster data model
 ******************
@@ -68,23 +68,49 @@ Configuration
 
 Strategy parameters are:
 
-==================== ====== ===================== =============================
-parameter            type   default Value         description
-==================== ====== ===================== =============================
-``metrics``          array  |metrics|             Metrics used as rates of
+====================== ====== =================== =============================
+parameter              type   default Value       description
+====================== ====== =================== =============================
+``metrics``            array  |metrics|           Metrics used as rates of
                                                   cluster loads.
-``thresholds``       object |thresholds|          Dict where key is a metric
+``thresholds``         object |thresholds|        Dict where key is a metric
                                                   and value is a trigger value.
-
-``weights``          object |weights|             These weights used to
+                                                  The strategy will only
+                                                  look for an action plan when
+                                                  the standard deviation for
+                                                  the usage of one of the
+                                                  resources included in the
+                                                  metrics, taken as a
+                                                  normalized usage between
+                                                  0 and 1 among the hosts is
+                                                  higher than the threshold.
+                                                  The value of a perfectly
+                                                  balanced cluster for the
+                                                  standard deviation would be
+                                                  0, while in a totally
+                                                  unbalanced one would be 0.5,
+                                                  which should be the maximum
+                                                  value.
+``weights``            object |weights|           These weights are used to
                                                   calculate common standard
-                                                  deviation. Name of weight
-                                                  contains meter name and
-                                                  _weight suffix.
-``instance_metrics`` object |instance_metrics|    Mapping to get hardware
-                                                  statistics using instance
-                                                  metrics.
-``host_choice``      string retry                 Method of host's choice.
+                                                  deviation when optimizing
+                                                  the resources usage.
+                                                  Name of weight contains meter
+                                                  name and _weight suffix.
+                                                  Higher values imply the
+                                                  metric will be prioritized
+                                                  when calculating an optimal
+                                                  resulting cluster
+                                                  distribution.
+``instance_metrics``   object |instance_metrics|  This parameter represents
+                                                  the compute node metrics
+                                                  representing compute resource
+                                                  usage for the instances
+                                                  resource indicated in the
+                                                  metrics parameter.
+``host_choice``        string retry               Method of host’s choice when
+                                                  analyzing destination for
+                                                  instances.
                                                   There are cycle, retry and
                                                   fullsearch methods. Cycle
                                                   will iterate hosts in cycle.
@@ -93,32 +119,49 @@ parameter            type   default Value         description
                                                   retry_count option).
                                                   Fullsearch will return each
                                                   host from list.
-``retry_count``      number 1                     Count of random returned
+``retry_count``        number 1                   Count of random returned
                                                   hosts.
-``periods``          object |periods|             These periods are used to get
-                                                  statistic aggregation for
-                                                  instance and host metrics.
-                                                  The period is simply a
-                                                  repeating interval of time
-                                                  into which the samples are
-                                                  grouped for aggregation.
-                                                  Watcher uses only the last
-                                                  period of all received ones.
-==================== ====== ===================== =============================
+``periods``            object |periods|           Time, in seconds, to get
+                                                  statistical values for
+                                                  resources usage for instance
+                                                  and host metrics.
+                                                  Watcher will use the last
+                                                  period to calculate resource
+                                                  usage.
+``granularity``        number 300                 NOT RECOMMENDED TO MODIFY:
+                                                  The time between two measures
+                                                  in an aggregated timeseries
+                                                  of a metric.
+``aggregation_method`` object |aggn_method|       NOT RECOMMENDED TO MODIFY:
+                                                  Function used to aggregate
+                                                  multiple measures into an
+                                                  aggregated value.
+====================== ====== =================== =============================
 
 .. |metrics| replace:: ["instance_cpu_usage", "instance_ram_usage"]
 .. |thresholds| replace:: {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}
 .. |weights| replace:: {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}
-.. |instance_metrics| replace:: {"instance_cpu_usage": "compute.node.cpu.percent", "instance_ram_usage": "hardware.memory.used"}
+.. |instance_metrics| replace:: {"instance_cpu_usage": "host_cpu_usage", "instance_ram_usage": "host_ram_usage"}
 .. |periods| replace:: {"instance": 720, "node": 600}
+.. |aggn_method| replace:: {"instance": 'mean', "compute_node": 'mean'}
+
 
 Efficacy Indicator
 ------------------
 
+Global efficacy indicator:
+
 .. watcher-func::
   :format: literal_block
 
-  watcher.decision_engine.goal.efficacy.specs.ServerConsolidation.get_global_efficacy_indicator
+  watcher.decision_engine.goal.efficacy.specs.WorkloadBalancing.get_global_efficacy_indicator
+
+Other efficacy indicators of the goal are:
+
+- ``instance_migrations_count``: The number of VM migrations to be performed
+- ``instances_count``: The total number of audited instances in strategy
+- ``standard_deviation_after_audit``: The value of resulted standard deviation
+- ``standard_deviation_before_audit``: The value of original standard deviation
 
 Algorithm
 ---------
@@ -141,4 +184,4 @@ How to use it ?
 External Links
 --------------
 
-- `Watcher Overload standard deviation algorithm spec `_
+None
diff --git a/releasenotes/notes/wokload-stablization-strategy-name-9988e554ac2655a2.yaml b/releasenotes/notes/wokload-stablization-strategy-name-9988e554ac2655a2.yaml
new file mode 100644
index 000000000..0adda89c5
--- /dev/null
+++ b/releasenotes/notes/wokload-stablization-strategy-name-9988e554ac2655a2.yaml
@@ -0,0 +1,7 @@
+---
+other:
+  - |
+    The Watcher Overload Standard Deviation algorithm is now referred to in the
+    documentation as the Workload Stabilization Strategy. The documentation of
+    this strategy has been enhanced to clarify and better explain the usage of
+    parameters.
diff --git a/watcher/decision_engine/strategy/strategies/workload_stabilization.py b/watcher/decision_engine/strategy/strategies/workload_stabilization.py
index c083370ff..4913ecb62 100644
--- a/watcher/decision_engine/strategy/strategies/workload_stabilization.py
+++ b/watcher/decision_engine/strategy/strategies/workload_stabilization.py
@@ -48,9 +48,19 @@ def _set_memoize(conf):
 class WorkloadStabilization(base.WorkloadStabilizationBaseStrategy):
     """Workload Stabilization control using live migration
 
-    This is workload stabilization strategy based on standard deviation
-    algorithm. The goal is to determine if there is an overload in a cluster
-    and respond to it by migrating VMs to stabilize the cluster.
+    This workload stabilization strategy is based on the standard deviation
+    algorithm, as a measure of cluster resource usage balance. The goal is to
+    determine if there is an overload in a cluster and respond to it by
+    migrating VMs to stabilize the cluster.
+
+    The standard deviation is determined using normalized CPU and/or memory
+    usage values, which are scaled to a range between 0 and 1 based on the
+    usage metrics in the data sources.
+
+    A standard deviation of 0 means that your cluster's resources are
+    perfectly balanced, with all usage values being identical. However, a
+    standard deviation of 0.5 indicates completely unbalanced resource usage,
+    where some resources are heavily utilized and others are not at all.
 
     This strategy has been tested in a small (32 nodes) cluster.
 
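
Review note: the 0 and 0.5 bounds quoted in the new docstring and in the ``thresholds`` description can be checked with a small sketch. This is not Watcher code. The names ``normalized_sd`` and ``exceeds_threshold`` are hypothetical, and the sketch assumes a population standard deviation taken over per-host usage that has already been scaled to the range 0 to 1. The 0.2 value is the documented default threshold.

```python
from statistics import pstdev


def normalized_sd(usages, capacities):
    """Population standard deviation of per-host usage scaled to [0, 1].

    Hypothetical helper for illustration only; it assumes each host's
    usage is divided by that host's capacity before the deviation is taken.
    """
    normalized = [used / cap for used, cap in zip(usages, capacities)]
    return pstdev(normalized)


def exceeds_threshold(sd, threshold=0.2):
    """Return True when the deviation is above the trigger value."""
    return sd > threshold


capacities = [16, 16, 16, 16]

# Every host is half full, so all normalized values are identical.
balanced = normalized_sd([8, 8, 8, 8], capacities)

# Half of the hosts are full and half are empty: the worst case.
unbalanced = normalized_sd([16, 16, 0, 0], capacities)

print(balanced, exceeds_threshold(balanced))      # 0.0 False
print(unbalanced, exceeds_threshold(unbalanced))  # 0.5 True
```

For values confined to the range 0 to 1, the population standard deviation cannot exceed 0.5, which is consistent with the documented maximum. A threshold of 0.5 or higher would therefore never trigger under these assumptions.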