Merge "Update Overload standard deviation doc"
@@ -1,6 +1,6 @@
-=============================================
-Watcher Overload standard deviation algorithm
-=============================================
+===============================
+Workload Stabilization Strategy
+===============================
 
 Synopsis
 --------
@@ -19,20 +19,20 @@ Metrics
 
 The *workload_stabilization* strategy requires the following metrics:
 
-============================ ============ ======= =============================
-metric                       service name plugins comment
-============================ ============ ======= =============================
-``compute.node.cpu.percent`` ceilometer_  none    need to set the
-                                                  ``compute_monitors`` option
-                                                  to ``cpu.virt_driver`` in the
-                                                  nova.conf.
-``hardware.memory.used``     ceilometer_  SNMP_
-``cpu``                      ceilometer_  none
-``instance_ram_usage``       ceilometer_  none
-============================ ============ ======= =============================
-
-.. _ceilometer: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#openstack-compute
-.. _SNMP: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#snmp-based-meters
+============================ ==================================================
+metric                       description
+============================ ==================================================
+``instance_ram_usage``       ram memory usage in an instance as float in
+                             megabytes
+``instance_cpu_usage``       cpu usage in an instance as float ranging between
+                             0 and 100 representing the total cpu usage as
+                             percentage
+``host_ram_usage``           ram memory usage in a compute node as float in
+                             megabytes
+``host_cpu_usage``           cpu usage in a compute node as float ranging
+                             between 0 and 100 representing the total cpu
+                             usage as percentage
+============================ ==================================================
 
 Cluster data model
 ******************
@@ -68,23 +68,49 @@ Configuration
 
 Strategy parameters are:
 
-==================== ====== ===================== =============================
-parameter            type   default Value         description
-==================== ====== ===================== =============================
-``metrics``          array  |metrics|             Metrics used as rates of
+====================== ====== =================== =============================
+parameter              type   default Value       description
+====================== ====== =================== =============================
+``metrics``            array  |metrics|           Metrics used as rates of
                                                   cluster loads.
-``thresholds``       object |thresholds|          Dict where key is a metric
+``thresholds``         object |thresholds|        Dict where key is a metric
                                                   and value is a trigger value.
-
-``weights``          object |weights|             These weights used to
+                                                  The strategy will only
+                                                  look for an action plan when
+                                                  the standard deviation for
+                                                  the usage of one of the
+                                                  resources included in the
+                                                  metrics, taken as a
+                                                  normalized usage between
+                                                  0 and 1 among the hosts is
+                                                  higher than the threshold.
+                                                  The value of a perfectly
+                                                  balanced cluster for the
+                                                  standard deviation would be
+                                                  0, while in a totally
+                                                  unbalanced one would be 0.5,
+                                                  which should be the maximum
+                                                  value.
+``weights``            object |weights|           These weights are used to
                                                   calculate common standard
-                                                  deviation. Name of weight
-                                                  contains meter name and
-                                                  _weight suffix.
-``instance_metrics`` object |instance_metrics|    Mapping to get hardware
-                                                  statistics using instance
-                                                  metrics.
-``host_choice``      string retry                 Method of host's choice.
+                                                  deviation when optimizing
+                                                  the resources usage.
+                                                  Name of weight contains meter
+                                                  name and _weight suffix.
+                                                  Higher values imply the
+                                                  metric will be prioritized
+                                                  when calculating an optimal
+                                                  resulting cluster
+                                                  distribution.
+``instance_metrics``   object |instance_metrics|  This parameter represents
+                                                  the compute node metrics
+                                                  representing compute resource
+                                                  usage for the instances
+                                                  resource indicated in the
+                                                  metrics parameter.
+``host_choice``        string retry               Method of host’s choice when
+                                                  analyzing destination for
+                                                  instances.
                                                   There are cycle, retry and
                                                   fullsearch methods. Cycle
                                                   will iterate hosts in cycle.
@@ -93,32 +119,49 @@ parameter type default Value description
                                                   retry_count option).
                                                   Fullsearch will return each
                                                   host from list.
-``retry_count``      number 1                     Count of random returned
+``retry_count``        number 1                   Count of random returned
                                                   hosts.
-``periods``          object |periods|             These periods are used to get
-                                                  statistic aggregation for
-                                                  instance and host metrics.
-                                                  The period is simply a
-                                                  repeating interval of time
-                                                  into which the samples are
-                                                  grouped for aggregation.
-                                                  Watcher uses only the last
-                                                  period of all received ones.
-==================== ====== ===================== =============================
+``periods``            object |periods|           Time, in seconds, to get
+                                                  statistical values for
+                                                  resources usage for instance
+                                                  and host metrics.
+                                                  Watcher will use the last
+                                                  period to calculate resource
+                                                  usage.
+``granularity``        number 300                 NOT RECOMMENDED TO MODIFY:
+                                                  The time between two measures
+                                                  in an aggregated timeseries
+                                                  of a metric.
+``aggregation_method`` object |aggn_method|       NOT RECOMMENDED TO MODIFY:
+                                                  Function used to aggregate
+                                                  multiple measures into an
+                                                  aggregated value.
+====================== ====== =================== =============================
 
 .. |metrics| replace:: ["instance_cpu_usage", "instance_ram_usage"]
 .. |thresholds| replace:: {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2}
 .. |weights| replace:: {"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}
-.. |instance_metrics| replace:: {"instance_cpu_usage": "compute.node.cpu.percent", "instance_ram_usage": "hardware.memory.used"}
+.. |instance_metrics| replace:: {"instance_cpu_usage": "host_cpu_usage", "instance_ram_usage": "host_ram_usage"}
 .. |periods| replace:: {"instance": 720, "node": 600}
+.. |aggn_method| replace:: {"instance": 'mean', "compute_node": 'mean'}
+
 
 Efficacy Indicator
 ------------------
 
+Global efficacy indicator:
+
 .. watcher-func::
   :format: literal_block
 
-  watcher.decision_engine.goal.efficacy.specs.ServerConsolidation.get_global_efficacy_indicator
+  watcher.decision_engine.goal.efficacy.specs.WorkloadBalancing.get_global_efficacy_indicator
 
+Other efficacy indicators of the goal are:
+
+- ``instance_migrations_count``: The number of VM migrations to be performed
+- ``instances_count``: The total number of audited instances in strategy
+- ``standard_deviation_after_audit``: The value of resulted standard deviation
+- ``standard_deviation_before_audit``: The value of original standard deviation
+
 Algorithm
 ---------
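Review note: the parameters table in this hunk ties three dictionaries to the ``metrics`` list. Each metric needs a threshold, a weight whose key is the metric name plus the ``_weight`` suffix, and an ``instance_metrics`` mapping. A minimal sketch of a consistency check over the documented defaults follows. ``check_parameters`` is a hypothetical helper written for illustration and is not part of Watcher; the 0 to 0.5 bound on thresholds is taken from the table's description of the maximum standard deviation.

```python
# Documented default values, copied from the substitutions in the doc.
DEFAULTS = {
    "metrics": ["instance_cpu_usage", "instance_ram_usage"],
    "thresholds": {"instance_cpu_usage": 0.2, "instance_ram_usage": 0.2},
    "weights": {
        "instance_cpu_usage_weight": 1.0,
        "instance_ram_usage_weight": 1.0,
    },
    "instance_metrics": {
        "instance_cpu_usage": "host_cpu_usage",
        "instance_ram_usage": "host_ram_usage",
    },
}


def check_parameters(params):
    """Return a list of human-readable problems (hypothetical helper).

    An empty list means every metric has a threshold in [0, 0.5],
    a '<metric>_weight' entry, and an instance-to-host metric mapping.
    """
    problems = []
    for metric in params["metrics"]:
        threshold = params["thresholds"].get(metric)
        if threshold is None:
            problems.append(f"missing threshold for {metric}")
        elif not 0.0 <= threshold <= 0.5:
            problems.append(f"threshold for {metric} outside [0, 0.5]")
        if f"{metric}_weight" not in params["weights"]:
            problems.append(f"missing weight {metric}_weight")
        if metric not in params["instance_metrics"]:
            problems.append(f"missing instance_metrics entry for {metric}")
    return problems


print(check_parameters(DEFAULTS))
```

With the defaults above the returned list is empty. Dropping a weight or setting a threshold above 0.5 produces one message per problem, which is easier to act on than a failed audit.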
@@ -141,4 +184,4 @@ How to use it ?
 External Links
 --------------
 
-- `Watcher Overload standard deviation algorithm spec <https://specs.openstack.org/openstack/watcher-specs/specs/newton/implemented/sd-strategy.html>`_
+None
@@ -0,0 +1,7 @@
+---
+other:
+  - |
+    The Watcher Overload Standard Deviation algorithm is now referred to in the
+    documentation as the Workload Stabilization Strategy. The documentation of
+    this strategy has been enhanced to clarify and better explain the usage of
+    parameters.
@@ -48,9 +48,19 @@ def _set_memoize(conf):
 class WorkloadStabilization(base.WorkloadStabilizationBaseStrategy):
     """Workload Stabilization control using live migration
 
-    This is workload stabilization strategy based on standard deviation
-    algorithm. The goal is to determine if there is an overload in a cluster
-    and respond to it by migrating VMs to stabilize the cluster.
+    This workload stabilization strategy is based on the standard deviation
+    algorithm, as a measure of cluster resource usage balance. The goal is to
+    determine if there is an overload in a cluster and respond to it by
+    migrating VMs to stabilize the cluster.
+
+    The standard deviation is determined using normalized CPU and/or memory
+    usage values, which are scaled to a range between 0 and 1 based on the
+    usage metrics in the data sources.
+
+    A standard deviation of 0 means that your cluster's resources are
+    perfectly balanced, with all usage values being identical. However, a
+    standard deviation of 0.5 indicates completely unbalanced resource usage,
+    where some resources are heavily utilized and others are not at all.
 
     This strategy has been tested in a small (32 nodes) cluster.
 
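Review note: the two reference points in the new docstring (0 for a perfectly balanced cluster, 0.5 for a completely unbalanced one) can be checked with a few lines of Python. This is a minimal sketch under stated assumptions, not Watcher's implementation: it assumes the population standard deviation over per-host usage already normalized to the range 0 to 1, and the names ``usage_deviation`` and ``needs_rebalancing`` are hypothetical. The default threshold of 0.2 is the documented default for both metrics.

```python
import statistics


def usage_deviation(normalized_usage):
    """Population standard deviation of per-host usage values in [0, 1]."""
    if not normalized_usage:
        raise ValueError("at least one host is required")
    if any(not 0.0 <= u <= 1.0 for u in normalized_usage):
        raise ValueError("usage values must be normalized to [0, 1]")
    return statistics.pstdev(normalized_usage)


def needs_rebalancing(normalized_usage, threshold=0.2):
    """True when the deviation is higher than the threshold."""
    return usage_deviation(normalized_usage) > threshold


# Every host carries the same load: nothing to stabilize.
balanced = [0.4, 0.4, 0.4, 0.4]
# Half of the hosts are saturated and the other half are idle.
unbalanced = [1.0, 1.0, 0.0, 0.0]

print(usage_deviation(balanced), needs_rebalancing(balanced))
print(usage_deviation(unbalanced), needs_rebalancing(unbalanced))
```

The 0.5 ceiling follows from the normalization: for values confined to an interval of width 1, the variance cannot exceed 1/4, and that bound is reached when half of the hosts sit at 0 and half at 1.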