Update Workload Balance strategy documentation

Adds additional parameter and usage explanations and a combined example.

Closes-Bug: #2111848
Change-Id: Id0de4d56fa7083388ad82c61596e7484431d465b
2025-05-27 14:59:59 -04:00
parent 59757249bb
commit f42cb8557b
2 changed files with 72 additions and 24 deletions
--- a/doc/source/strategies/workload_balance.rst
+++ b/doc/source/strategies/workload_balance.rst
@@ -11,25 +11,35 @@ Synopsis

    .. watcher-term:: watcher.decision_engine.strategy.strategies.workload_balance.WorkloadBalance

-Requirements
------------
-
-None.
-
 Metrics
 *******

 The *workload_balance* strategy requires the following metrics:

-======================= ============ ======= =========================
-metric                  service name plugins comment
-======================= ============ ======= =========================
-``cpu``                 ceilometer_  none
-``memory.resident``     ceilometer_  none
-======================= ============ ======= =========================
+======================= ============ ======= =========== ======================
+metric                  service name plugins unit        comment
+======================= ============ ======= =========== ======================
+``cpu``                 ceilometer_  none    percentage  CPU of the instance.
+                                                         Used to calculate the
+                                                         threshold
+``memory.resident``     ceilometer_  none    MB          RAM of the instance.
+                                                         Used to calculate the
+                                                         threshold
+======================= ============ ======= =========== ======================

 .. _ceilometer: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html#openstack-compute

+**Notes**
+
+* The parameters above reference the instance CPU or RAM usage, but
+  the threshold calculation is based on the CPU/RAM usage on the hypervisor.
+* The RAM usage can be calculated based on the RAM consumed by the instance,
+  and the available RAM on the hypervisor.
+* The CPU percentage calculation relies on the CPU load, but also on the number
+  of CPUs on the hypervisor.
+* The memory host metric is calculated by summing the RAM usage of each
+  instance on the host. This measure is close to the real usage, but is not
+  the exact usage on the host.

 Cluster data model
 ******************
@@ -64,16 +74,28 @@ Configuration

 Strategy parameters are:

-============== ====== ==================== ====================================
-parameter      type   default Value        description
-============== ====== ==================== ====================================
-``metrics``    String 'instance_cpu_usage' Workload balance base on cpu or ram
-                                           utilization. Choices:
-                                           ['instance_cpu_usage',
-                                           'instance_ram_usage']
-``threshold``  Number 25.0                 Workload threshold for migration
-``period``     Number 300                  Aggregate time period of ceilometer
-============== ====== ==================== ====================================
+================ ====== ==================== ==================================
+parameter        type   default value        description
+================ ====== ==================== ==================================
+``metrics``      String 'instance_cpu_usage' Workload balance based on CPU or
+                                             ram utilization. Choices:
+                                             ['instance_cpu_usage',
+                                             'instance_ram_usage']
+``threshold``    Number 25.0                 Workload threshold for migration.
+                                             Used for both the source and the
+                                             destination calculations.
+                                             Threshold is always a percentage.
+``period``       Number 300                  Aggregate time period of
+                                             ceilometer
+``granularity``  Number 300                  The time between two measures in
+                                             an aggregated timeseries of a
+                                             metric.
+                                             This parameter is only used
+                                             with the Gnocchi data source,
+                                             and it must match any of the
+                                             valid archive policies for the
+                                             metric.
+================ ====== ==================== ==================================

 Efficacy Indicator
 ------------------
@@ -89,14 +111,36 @@ to: https://specs.openstack.org/openstack/watcher-specs/specs/mitaka/implemented
 How to use it ?
 ---------------

+Create an audit template using the Workload Balancing strategy.
+
 .. code-block:: shell

    $ openstack optimize audittemplate create \
      at1 workload_balancing --strategy workload_balance

+Run an audit using the Workload Balance strategy where
+the aim is to get a plan to move VMs from any host where the
+CPU usage is over the threshold of 26%, to a host where the
+utilization of CPU is under the threshold.
+The measurements of CPU utilization are taken from Ceilometer
+with an aggregate period of 310.
+
+.. code-block:: shell
+
    $ openstack optimize audit create -a at1 -p threshold=26.0 \
            -p period=310 -p metrics=instance_cpu_usage

+Run an audit using the Workload Balance strategy to
+obtain a plan to balance VMs over hosts with a threshold of 20%.
+In this case, the Ceilometer CPU utilization measurements are
+selected by a combination of period and granularity.
+
+.. code-block:: shell
+
+    $ openstack optimize audit create -a at1 \
+           -p granularity=30 -p threshold=20 -p period=300 \
+           -p metrics=instance_cpu_usage --auto-trigger
+
 External Links
 --------------

--- a/watcher/decision_engine/strategy/strategies/workload_balance.py
+++ b/watcher/decision_engine/strategy/strategies/workload_balance.py
@@ -28,13 +28,16 @@ LOG = log.getLogger(__name__)


 class WorkloadBalance(base.WorkloadStabilizationBaseStrategy):
-    """[PoC]Workload balance using live migration
+    """Workload balance using live migration

    *Description*

        It is a migration strategy based on the VM workload of physical
        servers. It generates solutions to move a workload whenever a server's
        CPU or RAM utilization % is higher than the specified threshold.
+        The threshold specified is used to trigger a migration,
+        but it is also used to determine if there is an available host,
+        with low enough utilization, to migrate the instance.
        The VM to be moved should make the host close to average workload
        of all compute nodes.

@@ -48,7 +51,6 @@ class WorkloadBalance(base.WorkloadStabilizationBaseStrategy):

    *Limitations*

-       - This is a proof of concept that is not meant to be used in production
       - We cannot forecast how many servers should be migrated. This is the
         reason why we only plan a single virtual machine migration at a time.
         So it's better to use this algorithm with `CONTINUOUS` audits.
@@ -105,7 +107,9 @@ class WorkloadBalance(base.WorkloadStabilizationBaseStrategy):
                    "default": "instance_cpu_usage"
                },
                "threshold": {
-                    "description": "workload threshold for migration",
+                    "description": "Workload threshold for migration - "
+                                   "used for source and destination hosts. "
+                                   "It is always a percentage value.",
                    "type": "number",
                    "default": 25.0
                },