Update Overload standard deviation doc

Bug #2113862 details a number of suggested corrections and additions to
the Workload Stabilization doc. This patch adds those suggested changes.

Closes-Bug: #2113862
Assisted-By: Cursor (claude-3.5-sonnet)
Change-Id: I4131a304c064d2ea397b2447025c7edf69a56e2a
Signed-off-by: Ronelle Landy <rlandy@redhat.com>
@@ -1,6 +1,6 @@
-=============================================
-Watcher Overload standard deviation algorithm
-=============================================
+===============================
+Workload Stabilization Strategy
+===============================
 
 Synopsis
 --------
@@ -19,20 +19,20 @@ Metrics
 
 The *workload_stabilization* strategy requires the following metrics:
 
-============================ ============ ======= =============================
-metric                       service name plugins comment
-============================ ============ ======= =============================
-``compute.node.cpu.percent`` ceilometer_  none    need to set the
-                                                  ``compute_monitors`` option
-                                                  to ``cpu.virt_driver`` in the
-                                                  nova.conf.
-``hardware.memory.used``     ceilometer_  SNMP_
-``cpu``                      ceilometer_  none
-``instance_ram_usage``       ceilometer_  none
-============================ ============ ======= =============================
-
-.. _ceilometer: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#openstack-compute
-.. _SNMP: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#snmp-based-meters
+============================ ==================================================
+metric                       description
+============================ ==================================================
+``instance_ram_usage``       ram memory usage in an instance as float in
+                             megabytes
+``instance_cpu_usage``       cpu usage in an instance as float ranging between
+                             0 and 100 representing the total cpu usage as
+                             percentage
+``host_ram_usage``           ram memory usage in a compute node as float in
+                             megabytes
+``host_cpu_usage``           cpu usage in a compute node as float ranging
+                             between 0 and 100 representing the total cpu
+                             usage as percentage
+============================ ==================================================
 
 Cluster data model
 ******************
@@ -68,23 +68,49 @@ Configuration
 
 Strategy parameters are:
 
-==================== ====== ===================== =============================
-parameter            type   default Value         description
-==================== ====== ===================== =============================
-``metrics``          array  |metrics|             Metrics used as rates of
+====================== ====== =================== =============================
+parameter              type   default Value       description
+====================== ====== =================== =============================
+``metrics``            array  |metrics|           Metrics used as rates of
                                                   cluster loads.
-``thresholds``       object |thresholds|          Dict where key is a metric
+``thresholds``         object |thresholds|        Dict where key is a metric
                                                   and value is a trigger value.
-
-``weights``          object |weights|             These weights used to
+                                                  The strategy will only
+                                                  look for an action plan when
+                                                  the standard deviation for
+                                                  the usage of one of the
+                                                  resources included in the
+                                                  metrics, taken as a
+                                                  normalized usage between
+                                                  0 and 1 among the hosts is
+                                                  higher than the threshold.
+                                                  The value of a perfectly
+                                                  balanced cluster for the
+                                                  standard deviation would be
+                                                  0, while in a totally
+                                                  unbalanced one would be 0.5,
+                                                  which should be the maximum
+                                                  value.
+``weights``            object |weights|           These weights are used to
                                                   calculate common standard
-                                                  deviation. Name of weight
-                                                  contains meter name and
-                                                  _weight suffix.
-``instance_metrics`` object |instance_metrics|    Mapping to get hardware
-                                                  statistics using instance
-                                                  metrics.
-``host_choice``      string retry                 Method of host's choice.
+                                                  deviation when optimizing
+                                                  the resources usage.
+                                                  Name of weight contains meter
+                                                  name and _weight suffix.
+                                                  Higher values imply the
+                                                  metric will be prioritized
+                                                  when calculating an optimal
+                                                  resulting cluster
+                                                  distribution.
+``instance_metrics``   object |instance_metrics|  This parameter represents
+                                                  the compute node metrics
+                                                  representing compute resource
+                                                  usage for the instances
+                                                  resource indicated in the
+                                                  metrics parameter.
+``host_choice``        string retry               Method of host’s choice when
+                                                  analyzing destination for
+                                                  instances.
                                                   There are cycle, retry and
                                                   fullsearch methods. Cycle
                                                   will iterate hosts in cycle.
@@ -93,32 +119,49 @@ parameter type default Value description
                                                   retry_count option).
                                                   Fullsearch will return each
                                                   host from list.
-``retry_count``      number 1                     Count of random returned
+``retry_count``        number 1                   Count of random returned
                                                   hosts.
-``periods``          object |periods|             These periods are used to get
-                                                  statistic aggregation for
-                                                  instance and host metrics.
-                                                  The period is simply a
-                                                  repeating interval of time
-                                                  into which the samples are
-                                                  grouped for aggregation.
-                                                  Watcher uses only the last
-                                                  period of all received ones.
-==================== ====== ===================== =============================
+``periods``            object |periods|           Time, in seconds, to get
+                                                  statistical values for
+                                                  resources usage for instance
+                                                  and host metrics.
+                                                  Watcher will use the last
+                                                  period to calculate resource
+                                                  usage.
+``granularity``        number 300                 NOT RECOMMENDED TO MODIFY:
+                                                  The time between two measures
+                                                  in an aggregated timeseries
+                                                  of a metric.
+``aggregation_method`` object |aggn_method|       NOT RECOMMENDED TO MODIFY:
+                                                  Function used to aggregate
+                                                  multiple measures into an
+                                                  aggregated value.
+====================== ====== =================== =============================
 
 .. |metrics| replace:: ["instance_cpu_usage", "instance_ram_usage"]
 .. |thresholds| replace:: {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}
 .. |weights| replace:: {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}
-.. |instance_metrics| replace:: {"instance_cpu_usage": "compute.node.cpu.percent", "instance_ram_usage": "hardware.memory.used"}
+.. |instance_metrics| replace:: {"instance_cpu_usage": "host_cpu_usage", "instance_ram_usage": "host_ram_usage"}
 .. |periods| replace:: {"instance": 720, "node": 600}
+.. |aggn_method| replace:: {"instance": 'mean', "compute_node": 'mean'}
+
 
 Efficacy Indicator
 ------------------
 
+Global efficacy indicator:
+
 .. watcher-func::
   :format: literal_block
 
-  watcher.decision_engine.goal.efficacy.specs.ServerConsolidation.get_global_efficacy_indicator
+  watcher.decision_engine.goal.efficacy.specs.WorkloadBalancing.get_global_efficacy_indicator
 
+Other efficacy indicators of the goal are:
+
+- ``instance_migrations_count``: The number of VM migrations to be performed
+- ``instances_count``: The total number of audited instances in strategy
+- ``standard_deviation_after_audit``: The value of resulted standard deviation
+- ``standard_deviation_before_audit``: The value of original standard deviation
+
 Algorithm
 ---------
@@ -141,4 +184,4 @@ How to use it ?
 External Links
 --------------
 
-- `Watcher Overload standard deviation algorithm spec <https://specs.openstack.org/openstack/watcher-specs/specs/newton/implemented/sd-strategy.html>`_
+None
@@ -0,0 +1,7 @@
+---
+other:
+  - |
+    The Watcher Overload Standard Deviation algorithm is now referred to in the
+    documentation as the Workload Stabilization Strategy. The documentation of
+    this strategy has been enhanced to clarify and better explain the usage of
+    parameters.
@@ -48,9 +48,19 @@ def _set_memoize(conf):
 class WorkloadStabilization(base.WorkloadStabilizationBaseStrategy):
     """Workload Stabilization control using live migration
 
-    This is workload stabilization strategy based on standard deviation
-    algorithm. The goal is to determine if there is an overload in a cluster
-    and respond to it by migrating VMs to stabilize the cluster.
+    This workload stabilization strategy is based on the standard deviation
+    algorithm, as a measure of cluster resource usage balance. The goal is to
+    determine if there is an overload in a cluster and respond to it by
+    migrating VMs to stabilize the cluster.
+
+    The standard deviation is determined using normalized CPU and/or memory
+    usage values, which are scaled to a range between 0 and 1 based on the
+    usage metrics in the data sources.
+
+    A standard deviation of 0 means that your cluster's resources are
+    perfectly balanced, with all usage values being identical. However, a
+    standard deviation of 0.5 indicates completely unbalanced resource usage,
+    where some resources are heavily utilized and others are not at all.
 
     This strategy has been tested in a small (32 nodes) cluster.
 
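
Reviewer note: the threshold and weight semantics described in the updated table and docstring can be illustrated with a small standalone sketch. This is not Watcher's implementation. The helper names (normalized_sd, needs_rebalancing, weighted_sd) are hypothetical, the use of the population standard deviation is an assumption (it is the variant whose maximum for values in [0, 1] is 0.5, matching the documented bound), and combining metrics as a weighted sum is likewise an assumption based only on the "_weight" naming in the parameter table.

```python
from statistics import pstdev


def normalized_sd(usages):
    """Population standard deviation of per-host usage values in [0, 1]."""
    return pstdev(usages)


def needs_rebalancing(host_usage, thresholds):
    """True if any metric's deviation across hosts exceeds its threshold."""
    return any(
        normalized_sd(host_usage[metric]) > limit
        for metric, limit in thresholds.items()
    )


def weighted_sd(host_usage, weights):
    """Sum of per-metric deviations, each scaled by its '<metric>_weight'.

    Assumption: a plain weighted sum; the real strategy may combine
    the per-metric deviations differently.
    """
    return sum(
        normalized_sd(values) * weights[metric + "_weight"]
        for metric, values in host_usage.items()
    )


# Default values taken from the |thresholds| and |weights| substitutions.
thresholds = {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}
weights = {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}

# Illustrative normalized usage per host (made-up numbers, four hosts).
balanced = {
    "instance_cpu_usage": [0.5, 0.5, 0.5, 0.5],
    "instance_ram_usage": [0.3, 0.3, 0.3, 0.3],
}
skewed = {
    "instance_cpu_usage": [0.0, 0.0, 1.0, 1.0],
    "instance_ram_usage": [0.3, 0.3, 0.3, 0.3],
}

print(needs_rebalancing(balanced, thresholds))  # deviation is 0 for both metrics
print(needs_rebalancing(skewed, thresholds))    # CPU deviation exceeds 0.2
```

With half of the hosts idle and half saturated, the CPU deviation reaches the documented maximum of 0.5, so the default threshold of 0.2 would trigger a search for an action plan, while the evenly loaded cluster would not.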