..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.

==========
Glossary
==========

.. glossary::
   :sorted:

This page explains the different terms used in the Watcher system.

They are sorted in alphabetical order.

.. _action_definition:

Action
======

An :ref:`Action ` is what enables Watcher to transform the current state of a
:ref:`Cluster ` after an :ref:`Audit `.

An :ref:`Action ` is an atomic task which changes the current state of a
target :ref:`Managed resource ` of the OpenStack :ref:`Cluster ` such as:

- Live migration of an instance from one compute node to another compute
  node with Nova
- Changing the power level of a compute node (ACPI level, ...)
- Changing the current state of a hypervisor (enable or disable) with Nova

In most cases, an :ref:`Action ` triggers some concrete commands on an
existing OpenStack module (Nova, Neutron, Cinder, Ironic, etc.).

An :ref:`Action ` has a life-cycle and its current state may be one of the
following:

- **PENDING** : the :ref:`Action ` has not been executed yet by the
  :ref:`Watcher Applier `
- **ONGOING** : the :ref:`Action ` is currently being processed by the
  :ref:`Watcher Applier `
- **SUCCEEDED** : the :ref:`Action ` has been executed successfully
- **FAILED** : an error occurred while trying to execute the :ref:`Action `
- **DELETED** : the :ref:`Action ` is still stored in the
  :ref:`Watcher database ` but is not returned any more through the Watcher
  APIs.
- **CANCELLED** : the :ref:`Action ` was in **PENDING** or **ONGOING** state
  and was cancelled by the :ref:`Administrator `

.. _action_plan_definition:

Action Plan
===========

An :ref:`Action Plan ` is a flow of :ref:`Actions ` that should be executed
in order to satisfy a given :ref:`Goal `.

An :ref:`Action Plan ` is generated by Watcher when an :ref:`Audit ` is
successful, which implies that the :ref:`Strategy ` which was used has found
a :ref:`Solution ` to achieve the :ref:`Goal ` of this :ref:`Audit `.

In the default implementation of Watcher, an :ref:`Action Plan ` is only
composed of successive :ref:`Actions ` (i.e., a Workflow of :ref:`Actions `
belonging to a unique branch).

However, Watcher provides abstract interfaces for many of its components,
allowing other implementations to generate and handle more complex
:ref:`Action Plan(s) ` composed of two types of Action Item(s):

- simple :ref:`Actions `: atomic tasks, which means they cannot be split
  into smaller tasks or commands from an OpenStack point of view.
- composite Actions: which are composed of several simple :ref:`Actions `
  ordered in sequential and/or parallel flows.

An :ref:`Action Plan ` may be described using standard workflow model
description formats such as
`Business Process Model and Notation 2.0 (BPMN 2.0) `_ or
`Unified Modeling Language (UML) `_.

An :ref:`Action Plan ` has a life-cycle and its current state may be one of
the following:

- **RECOMMENDED** : the :ref:`Action Plan ` is waiting for a validation from
  the :ref:`Administrator `
- **ONGOING** : the :ref:`Action Plan ` is currently being processed by the
  :ref:`Watcher Applier `
- **SUCCEEDED** : the :ref:`Action Plan ` has been executed successfully
  (i.e. all :ref:`Actions ` that it contains have been executed successfully)
- **FAILED** : an error occurred while executing the :ref:`Action Plan `
- **DELETED** : the :ref:`Action Plan ` is still stored in the
  :ref:`Watcher database ` but is not returned any more through the Watcher
  APIs.
- **CANCELLED** : the :ref:`Action Plan ` was in **PENDING** or **ONGOING**
  state and was cancelled by the :ref:`Administrator `

.. _administrator_definition:

Administrator
=============

The :ref:`Administrator ` is any user who has admin access on the OpenStack
cluster. This user is allowed to create new projects for tenants, create new
users and assign roles to each user.

The :ref:`Administrator ` usually has remote access to any host of the
cluster in order to change the configuration and restart any OpenStack
service, including Watcher.

In the context of Watcher, the :ref:`Administrator ` is a role for users
which allows them to run any Watcher commands, such as:

- Create/Delete an :ref:`Audit Template `
- Launch an :ref:`Audit `
- Get the :ref:`Action Plan `
- Launch a recommended :ref:`Action Plan ` manually
- Archive previous :ref:`Audits ` and :ref:`Action Plans `

The :ref:`Administrator ` is also allowed to modify any Watcher configuration
files and to restart Watcher services.

.. _audit_definition:

Audit
=====

In the Watcher system, an :ref:`Audit ` is a request for optimizing a
:ref:`Cluster `.

The optimization is done in order to satisfy one :ref:`Goal ` on a given
:ref:`Cluster `.

For each :ref:`Audit `, the Watcher system generates an :ref:`Action Plan `.

An :ref:`Audit ` has a life-cycle and its current state may be one of the
following:

- **PENDING** : a request for an :ref:`Audit ` has been submitted (either
  manually by the :ref:`Administrator ` or automatically via some event
  handling mechanism) and is in the queue for being processed by the
  :ref:`Watcher Decision Engine `
- **ONGOING** : the :ref:`Audit ` is currently being processed by the
  :ref:`Watcher Decision Engine `
- **SUCCEEDED** : the :ref:`Audit ` has been executed successfully (note that
  it may not necessarily produce a :ref:`Solution `).
- **FAILED** : an error occurred while executing the :ref:`Audit `
- **DELETED** : the :ref:`Audit ` is still stored in the
  :ref:`Watcher database ` but is not returned any more through the Watcher
  APIs.
- **CANCELLED** : the :ref:`Audit ` was in **PENDING** or **ONGOING** state
  and was cancelled by the :ref:`Administrator `

.. _audit_template_definition:

Audit Template
==============

An :ref:`Audit ` may be launched several times with the same settings
(:ref:`Goal `, thresholds, ...). Therefore it makes sense to save those
settings in some sort of Audit preset object, which is known as an
:ref:`Audit Template `.

An :ref:`Audit Template ` contains at least the :ref:`Goal ` of the
:ref:`Audit `.

It may also contain some error handling settings indicating whether:

- :ref:`Watcher Applier ` stops the entire operation
- :ref:`Watcher Applier ` performs a rollback

and how many retries should be attempted before failure occurs (the latter
can be complex: for example, the scenario in which there are many first-time
failures on ultimately successful :ref:`Actions `).

Moreover, an :ref:`Audit Template ` may contain some settings related to the
level of automation for the :ref:`Action Plan ` that will be generated by the
:ref:`Audit `. A flag will indicate whether the :ref:`Action Plan ` will be
launched automatically or will need a manual confirmation from the
:ref:`Administrator `.
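
The settings described so far can be pictured as a small preset object. The
following is only a minimal sketch: the class names, field names and default
values are hypothetical and are not taken from the Watcher code base.

```python
from dataclasses import dataclass
from enum import Enum


class OnFailure(Enum):
    """Hypothetical error-handling policies for the Watcher Applier."""
    STOP = "stop"          # stop the entire operation
    ROLLBACK = "rollback"  # undo what has already been applied


@dataclass(frozen=True)
class AuditTemplateSketch:
    """Illustrative Audit preset; every field name here is an assumption."""
    goal: str                               # the only mandatory setting
    on_failure: OnFailure = OnFailure.STOP  # error handling policy
    max_retries: int = 0                    # retries before failure occurs
    auto_trigger: bool = False              # False: manual confirmation needed


# The same preset can be reused to launch several Audits.
template = AuditTemplateSketch(goal="minimize the energy consumption")
needs_confirmation = not template.auto_trigger
```

With the defaults assumed here, the generated Action Plan would wait for a
manual confirmation from the Administrator.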

Last but not least, an :ref:`Audit Template ` may contain a list of extra
parameters related to the :ref:`Strategy ` configuration. These parameters
can be provided as a list of key-value pairs.

.. _availability_zone_definition:

Availability Zone
=================

Please read `the official OpenStack definition of an Availability Zone `_.

.. _cluster_definition:

Cluster
=======

A :ref:`Cluster ` is a set of physical machines which provide compute,
storage and networking resources and are managed by the same OpenStack
Controller node.
A :ref:`Cluster ` represents a set of resources that a cloud provider is able
to offer to its :ref:`customers `.

A data center may contain several clusters.

The :ref:`Cluster ` may be divided into one or several
:ref:`Availability Zone(s) `.

.. _cluster_data_model_definition:

Cluster Data Model
==================

A :ref:`Cluster Data Model ` is a logical representation of the current state
and topology of the :ref:`Cluster ` :ref:`Managed resources `.

It is represented as a set of :ref:`Managed resources ` (which may be a
simple tree or a flat list of key-value pairs) which enables Watcher
:ref:`Strategies ` to know the current relationships between the different
:ref:`resources ` of the :ref:`Cluster ` during an :ref:`Audit ` and enables
the :ref:`Strategy ` to request information such as:

- What compute nodes are in a given :ref:`Availability Zone ` or a given
  :ref:`Host Aggregate `?
- What :ref:`Instances ` are hosted on a given compute node?
- What is the current load of a compute node?
- What is the current free memory of a compute node?
- What is the network link between two compute nodes?
- What is the available bandwidth on a given network link?
- What is the current space available on a given virtual disk of a given
  :ref:`Instance `?
- What is the current state of a given :ref:`Instance `?
- ...
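
A few of the questions above can be illustrated with a toy in-memory model.
This is a minimal sketch under stated assumptions: ``ComputeNode``,
``ClusterModelSketch`` and their methods are hypothetical names chosen for
the example and do not describe the helper classes shipped with Watcher.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    """Hypothetical compute node holding just enough data for the example."""
    name: str
    availability_zone: str
    memory_mb: int
    # instance name -> memory used by that instance, in MB
    instances: dict = field(default_factory=dict)

    def free_memory_mb(self) -> int:
        return self.memory_mb - sum(self.instances.values())


class ClusterModelSketch:
    """Flat, in-memory stand-in for a Cluster Data Model."""

    def __init__(self, nodes):
        self._nodes = {node.name: node for node in nodes}

    def nodes_in_zone(self, zone):
        """Which compute nodes are in a given Availability Zone?"""
        return sorted(n.name for n in self._nodes.values()
                      if n.availability_zone == zone)

    def instances_on(self, node_name):
        """Which Instances are hosted on a given compute node?"""
        return sorted(self._nodes[node_name].instances)

    def free_memory_mb(self, node_name):
        """What is the current free memory of a compute node?"""
        return self._nodes[node_name].free_memory_mb()


model = ClusterModelSketch([
    ComputeNode("node-1", "az-1", 8192, {"vm-a": 2048, "vm-b": 1024}),
    ComputeNode("node-2", "az-1", 8192),
    ComputeNode("node-3", "az-2", 4096, {"vm-c": 4096}),
])
```

A Strategy could then call, for instance, ``model.instances_on("node-1")``
or ``model.free_memory_mb("node-1")`` while searching for a Solution.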

In short, this data model enables the :ref:`Strategy ` to know:

- the current topology of the :ref:`Cluster `
- the current capacity for each :ref:`Managed resource `
- the current amount of used/free space for each :ref:`Managed resource `
- the current state of each :ref:`Managed resource `

In the Watcher project, we aim at providing a generic and very basic
:ref:`Cluster Data Model ` for each :ref:`Goal `, usable in the associated
:ref:`Strategies ` through some helper classes in order to:

- simplify the development of a new :ref:`Strategy ` for a given
  :ref:`Goal ` when there already are some existing :ref:`Strategies `
  associated to the same :ref:`Goal `
- avoid duplicating the same code in several :ref:`Strategies ` associated to
  the same :ref:`Goal `
- improve consistency between the different :ref:`Strategies ` for a given
  :ref:`Goal `
- avoid any strong coupling with any external :ref:`Cluster Data Model `
  (the proposed data model acts as a pivot data model)

There may be various :ref:`generic and basic Cluster Data Models ` proposed
in Watcher helpers, each of them being adapted to achieving a given
:ref:`Goal `:

- For example, for a :ref:`Goal ` which aims at optimizing the network
  :ref:`resources `, the :ref:`Strategy ` may need to know which
  :ref:`resources ` are communicating together.
- Whereas for a :ref:`Goal ` which aims at optimizing thermal and power
  conditions, the :ref:`Strategy ` may need to know the location of each
  compute node in the racks and the location of each rack in the room.

Note however that a developer can use his/her own :ref:`Cluster Data Model `
if the proposed data model does not fit his/her needs, as long as the
:ref:`Strategy ` is able to produce a :ref:`Solution ` for the requested
:ref:`Goal `. For example, a developer could rely on the Nova Data Model to
optimize some compute resources.

The :ref:`Cluster Data Model ` may be persisted in any appropriate storage
system (SQL database, NoSQL database, JSON file, XML File, In Memory
Database, ...).

.. _cluster_history_definition:

Cluster History
===============

The :ref:`Cluster History ` contains all the previously collected timestamped
data such as metrics and events associated to any :ref:`managed resource ` of
the :ref:`Cluster `.

Just like the :ref:`Cluster Data Model `, this history may be used by any
:ref:`Strategy ` in order to find the best :ref:`Solution ` during an
:ref:`Audit `.

In the Watcher project, a generic :ref:`Cluster History ` API is proposed
with some helper classes in order to:

- share a common measurement (events or metrics) naming based on what is
  defined in Ceilometer.
  See `the full list of available measurements `_
- share common meter types (Cumulative, Delta, Gauge) based on what is
  defined in Ceilometer.
  See `the full list of meter types `_
- simplify the development of a new :ref:`Strategy `
- avoid duplicating the same code in several :ref:`Strategies `
- improve consistency between the different :ref:`Strategies `
- avoid any strong coupling with any external metrics/events storage system
  (the proposed API and measurement naming system acts as a pivot format)

Note however that a developer can use his/her own history management system
if the Ceilometer system does not fit his/her needs, as long as the
:ref:`Strategy ` is able to produce a :ref:`Solution ` for the requested
:ref:`Goal `.

The :ref:`Cluster History ` data may be persisted in any appropriate storage
system (InfluxDB, OpenTSDB, MongoDB, ...).

.. _controller_node_definition:

Controller Node
===============

A controller node is a machine that typically runs the following core
OpenStack services:

- Keystone: for identity and service management
- Cinder scheduler: for volume management
- Glance controller: for image management
- Neutron controller: for network management
- Nova controller: for global compute resources management with services
  such as nova-scheduler, nova-conductor and nova-network.

In many configurations, Watcher will reside on a controller node even if it
can potentially be hosted on a dedicated machine.

.. _compute_node_definition:

Compute node
============

Please read `the official OpenStack definition of a Compute Node `_.

.. _customer_definition:

Customer
========

A :ref:`Customer ` is the person or company which subscribes to the cloud
provider offering. A customer may have several :ref:`Project(s) ` hosted on
the same :ref:`Cluster ` or dispatched on different clusters.

In the private cloud context, the :ref:`Customers ` are different groups
within the same organization (different departments, project teams, branch
offices and so on). Cloud infrastructure includes the ability to precisely
track each customer's service usage so that it can be charged back to them,
or at least reported to them.

.. _goal_definition:

Goal
====

A :ref:`Goal ` is a human readable, observable and measurable end result
having one objective to be achieved.

Here are some examples of :ref:`Goals `:

- minimize the energy consumption
- minimize the number of compute nodes (consolidation)
- balance the workload among compute nodes
- minimize the license cost (some software has a licensing model which is
  based on the number of sockets or cores where the software is deployed)
- find the most appropriate moment for a planned maintenance on a given
  group of hosts (which may be an entire availability zone): power supply
  replacement, cooling system replacement, hardware modification, ...

.. _host_aggregates_definition:

Host Aggregate
==============

Please read `the official OpenStack definition of a Host Aggregate `_.

.. _instance_definition:

Instance
========

A running virtual machine, or a virtual machine in a known state such as
suspended, that can be used like a hardware server.

.. _managed_resource_definition:

Managed resource
================

A :ref:`Managed resource ` is one instance of :ref:`Managed resource type `
in a topology with particular properties and dependencies on other
:ref:`Managed resources ` (relationships).

For example, a :ref:`Managed resource ` can be one virtual machine (i.e., an
:ref:`instance `) hosted on a :ref:`compute node ` and connected to another
virtual machine through a network link (represented also as a
:ref:`Managed resource ` in the :ref:`Cluster Data Model `).

.. _managed_resource_type_definition:

Managed resource type
=====================

A :ref:`Managed resource type ` is a type of hardware or software element of
the :ref:`Cluster ` that the Watcher system can act on.

Here are some examples of :ref:`Managed resource types `:

- `Nova Host Aggregates `_
- `Nova Servers `_
- `Cinder Volumes `_
- `Neutron Routers `_
- `Neutron Networks `_
- `Neutron load-balancers `_
- `Sahara Hadoop Cluster `_
- ...

It can be any of
`the official list of available resource types defined in OpenStack for HEAT `_.

.. _efficiency_definition:

Optimization Efficiency
=======================

The :ref:`Optimization Efficiency ` is the objective measure of how much of
the :ref:`Goal ` has been achieved with respect to the constraints and
:ref:`SLAs ` defined by the :ref:`Customer `.

The way efficiency is evaluated will depend on the :ref:`Goal ` to achieve.

Of course, the efficiency will be relevant only as long as the
:ref:`Action Plan ` is relevant (i.e., the current state of the
:ref:`Cluster ` has not changed in a way that a new :ref:`Audit ` would need
to be launched).

For example, if the :ref:`Goal ` is to lower the energy consumption, the
:ref:`Efficiency ` will be computed using several indicators (KPIs):

- the percentage of energy gain (which must be the highest possible)
- the number of :ref:`SLA violations ` (which must be the lowest possible)
- the number of virtual machine migrations (which must be the lowest
  possible)

All those indicators (KPIs) are computed within a given timeframe, which is
the time taken to execute the whole :ref:`Action Plan `.

The efficiency also enables the :ref:`Administrator ` to objectively compare
different :ref:`Strategies ` for the same goal and same workload of the
:ref:`Cluster `.

.. _project_definition:

Project
=======

:ref:`Projects ` represent the base unit of “ownership” in OpenStack, in that
all :ref:`resources ` in OpenStack should be owned by a specific
:ref:`project `. In OpenStack Identity, a :ref:`project ` must be owned by a
specific domain.

Please read `the official OpenStack definition of a Project `_.

.. _sla_definition:

SLA
===

:ref:`SLA ` means Service Level Agreement.

The resources are negotiated between the :ref:`Customer ` and the Cloud
Provider in a contract.

Most of the time, this contract is composed of two documents:

- :ref:`SLA ` : Service Level Agreement
- :ref:`SLO ` : Service Level Objectives

Note that the :ref:`SLA ` is more general than the :ref:`SLO ` in the sense
that the former specifies what service is to be provided, how it is
supported, times, locations, costs, performance, and responsibilities of the
parties involved, while the :ref:`SLO ` focuses on more measurable
characteristics such as availability, throughput, frequency, response time
or quality.

You can also read `the Wikipedia page for SLA `_ which provides a good
definition.

.. _sla_violation_definition:

SLA violation
=============

An :ref:`SLA violation ` happens when an :ref:`SLA ` defined with a given
:ref:`Customer ` could not be respected by the cloud provider within the
timeframe defined by the official contract document.

.. _slo_definition:

SLO
===

A Service Level Objective (SLO) is a key element of an :ref:`SLA ` between a
service provider and a :ref:`Customer `. SLOs are agreed as a means of
measuring the performance of the Service Provider and are outlined as a way
of avoiding disputes between the two parties based on misunderstanding.

You can also read `the Wikipedia page for SLO `_ which provides a good
definition.

.. _solution_definition:

Solution
========

A :ref:`Solution ` is a set of :ref:`Actions ` generated by a
:ref:`Strategy ` (i.e., an algorithm) in order to achieve the :ref:`Goal ` of
an :ref:`Audit `.

A :ref:`Solution ` is different from an :ref:`Action Plan ` because it
contains the non-scheduled list of :ref:`Actions ` which is produced by a
:ref:`Strategy `. In other words, the list of Actions in a :ref:`Solution `
has not yet been re-ordered by the :ref:`Watcher Planner `.

Note that some algorithms (i.e. :ref:`Strategies `) may generate several
:ref:`Solutions `. This gives rise to the problem of determining which
:ref:`Solution ` should be applied.

Two approaches to dealing with this can be envisaged:

- **fully automated mode**: only the :ref:`Solution ` with the highest
  ranking (i.e., the highest :ref:`Optimization Efficiency `) will be sent to
  the :ref:`Watcher Planner ` and translated into concrete :ref:`Actions `.
- **manual mode**: several :ref:`Solutions ` are proposed to the
  :ref:`Administrator ` with a detailed measurement of the estimated
  :ref:`Optimization Efficiency `, and the :ref:`Administrator ` decides
  which one will be launched.

.. _strategy_definition:

Strategy
========

A :ref:`Strategy ` is an algorithm implementation which is able to find a
:ref:`Solution ` for a given :ref:`Goal `.
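
As an illustration only, a Strategy can be pictured as an object that takes a
view of the Cluster and returns candidate Solutions, from which the one with
the highest estimated efficiency may be kept. The sketch below is built on
assumptions: ``StrategySketch``, ``SolutionSketch``, ``EmptyLeastLoadedHost``
and ``pick_solution`` are hypothetical names, and the efficiency figure is a
toy value, not the way Watcher computes it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SolutionSketch:
    actions: list      # non-scheduled list of Actions
    efficiency: float  # toy efficiency estimate, higher is better


class StrategySketch(ABC):
    """Hypothetical interface: one algorithm serving one Goal."""
    goal: str

    @abstractmethod
    def execute(self, hosts: dict) -> list:
        """Return candidate solutions; ``hosts`` maps host -> instance names."""


class EmptyLeastLoadedHost(StrategySketch):
    """Toy consolidation: move everything off the least loaded host."""
    goal = "minimize the number of compute nodes"

    def execute(self, hosts):
        if len(hosts) < 2:
            return []  # nothing to consolidate
        # Order hosts by number of instances, then by name for determinism.
        ordered = sorted(hosts, key=lambda h: (len(hosts[h]), h))
        source, target = ordered[0], ordered[-1]
        actions = [("migrate", vm, source, target) for vm in hosts[source]]
        # Toy estimate: the fraction of hosts that would be freed.
        return [SolutionSketch(actions, efficiency=1.0 / len(hosts))]


def pick_solution(solutions):
    """Keep the solution with the highest efficiency, or None if empty."""
    return max(solutions, key=lambda s: s.efficiency, default=None)


hosts = {"node-1": ["vm-a"], "node-2": ["vm-b", "vm-c"]}
best = pick_solution(EmptyLeastLoadedHost().execute(hosts))
```

Keeping the selection step separate from the algorithm means that several
algorithms serving the same Goal can be compared on the same input.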

There may be several potential strategies which are able to achieve the same
:ref:`Goal `. This is why it is possible to configure which specific
:ref:`Strategy ` should be used for each :ref:`Goal `.

Some strategies may provide better optimization results but may take more
time to find an optimal :ref:`Solution `.

When a new :ref:`Goal ` is added to the Watcher configuration, at least one
default associated :ref:`Strategy ` should be provided as well.

.. _watcher_applier_definition:

Watcher Applier
===============

This component is in charge of executing the :ref:`Action Plan ` built by the
:ref:`Watcher Decision Engine `.

See :doc:`architecture` for more details on this component.

.. _watcher_database_definition:

Watcher Database
================

This database stores all the Watcher domain objects which can be requested
by the Watcher API or the Watcher CLI:

- Audit templates
- Audits
- Action plans
- Actions
- Goals

Here, the Watcher domain is "*optimization of some resources provided by an
OpenStack system*".

See :doc:`architecture` for more details on this component.

.. _watcher_decision_engine_definition:

Watcher Decision Engine
=======================

This component is responsible for computing a set of potential optimization
:ref:`Actions ` in order to fulfill the :ref:`Goal ` of an :ref:`Audit `.

It first reads the parameters of the :ref:`Audit ` from the associated
:ref:`Audit Template ` and knows the :ref:`Goal ` to achieve.

It then selects the most appropriate :ref:`Strategy ` depending on how
Watcher was configured for this :ref:`Goal `.

The :ref:`Strategy ` is then executed and generates a set of :ref:`Actions `
which are scheduled in time by the :ref:`Watcher Planner ` (i.e., it
generates an :ref:`Action Plan `).

See :doc:`architecture` for more details on this component.

.. _watcher_planner_definition:

Watcher Planner
===============

The :ref:`Watcher Planner ` is part of the :ref:`Watcher Decision Engine `.

This module takes the set of :ref:`Actions ` generated by a :ref:`Strategy `
and builds the design of a workflow which defines how to schedule those
:ref:`Actions ` in time and what the prerequisite conditions are for each
:ref:`Action `.

It is important to schedule :ref:`Actions ` in time in order to prevent
overload of the :ref:`Cluster ` while applying the :ref:`Action Plan `. For
example, it is important not to migrate too many instances at the same time
in order to avoid network congestion which may degrade the :ref:`SLA ` for
:ref:`Customers `.

It is also important to schedule :ref:`Actions ` in order to avoid security
issues such as denial of service on core OpenStack services.

See :doc:`architecture` for more details on this component.
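
One simple way to picture this kind of scheduling is to cap how many Actions
may run at the same time. The following is a minimal sketch, not the Watcher
Planner itself: the function name ``schedule_in_batches`` and the
``max_parallel`` parameter are hypothetical.

```python
def schedule_in_batches(actions, max_parallel):
    """Split ``actions`` into successive batches of at most ``max_parallel``.

    Batches are meant to run one after another, so that no more than
    ``max_parallel`` Actions (e.g. live migrations) are in flight at once.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    return [actions[i:i + max_parallel]
            for i in range(0, len(actions), max_parallel)]


migrations = [f"migrate vm-{n}" for n in range(5)]
plan = schedule_in_batches(migrations, max_parallel=2)
```

With ``max_parallel=1`` the result is a purely sequential flow, similar in
shape to the single-branch workflow of the default implementation.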