diff --git a/doc/source/dev/glossary.rst b/doc/source/dev/glossary.rst new file mode 100644 index 000000000..c57765449 --- /dev/null +++ b/doc/source/dev/glossary.rst @@ -0,0 +1,703 @@ +.. + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + +========== + Glossary +========== + +.. glossary:: + :sorted: + +This page explains the different terms used in the Watcher system. + +They are sorted in alphabetical order. + +.. _action_definition: + +Action +====== + +An :ref:`Action ` is what enables Watcher to transform the +current state of a :ref:`Cluster ` after an +:ref:`Audit `. + +An :ref:`Action ` is an atomic task which changes the +current state of a target :ref:`Managed resource ` +of the OpenStack :ref:`Cluster ` such as: + +- Live migration of an instance from one compute node to another compute + node with Nova +- Changing the power level of a compute node (ACPI level, ...) +- Changing the current state of a hypervisor (enable or disable) with Nova + +In most cases, an :ref:`Action ` triggers some concrete +commands on an existing OpenStack module (Nova, Neutron, Cinder, Ironic, etc.). 
+ +An :ref:`Action ` has a life-cycle and its current state may +be one of the following: + +- **PENDING** : the :ref:`Action ` has not been executed + yet by the :ref:`Watcher Applier ` +- **ONGOING** : the :ref:`Action ` is currently being + processed by the :ref:`Watcher Applier ` +- **SUCCEEDED** : the :ref:`Action ` has been executed + successfully +- **FAILED** : an error occurred while trying to execute the + :ref:`Action ` +- **DELETED** : the :ref:`Action ` is still stored in the + :ref:`Watcher database ` but is not returned + any more through the Watcher APIs. +- **CANCELLED** : the :ref:`Action ` was in **PENDING** or + **ONGOING** state and was cancelled by the + :ref:`Administrator ` + +.. _action_plan_definition: + +Action Plan +=========== + +An :ref:`Action Plan ` is a flow of +:ref:`Actions ` that should be executed in order to satisfy +a given :ref:`Goal `. + +An :ref:`Action Plan ` is generated by Watcher when an +:ref:`Audit ` is successful which implies that the :ref:`Strategy ` +which was used has found a :ref:`Solution ` to achieve the +:ref:`Goal ` of this :ref:`Audit `. + +In the default implementation of Watcher, an :ref:`Action Plan ` +is only composed of successive :ref:`Actions ` +(i.e., a Workflow of :ref:`Actions ` belonging to a unique +branch). + +However, Watcher provides abstract interfaces for many of its components, +allowing other implementations to generate and handle more complex :ref:`Action Plan(s) ` +composed of two types of Action Item(s): + +- simple :ref:`Actions `: atomic tasks, which cannot be + split into smaller tasks or commands from an OpenStack point of + view. +- composite Actions: which are composed of several simple :ref:`Actions ` + ordered in sequential and/or parallel flows. + +An :ref:`Action Plan ` may be described using +standard workflow model description formats such as +`Business Process Model and Notation 2.0 (BPMN 2.0) `_ +or `Unified Modeling Language (UML) `_. 
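The default single-branch case described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the class names ``ActionState``, ``Action`` and ``ActionPlan`` and the ``run`` callable are hypothetical and are not taken from the Watcher code base, and only four of the Action states listed above are modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class ActionState(Enum):
    """Subset of the Action states listed above (hypothetical enum)."""
    PENDING = "PENDING"
    ONGOING = "ONGOING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class Action:
    """Hypothetical atomic task; `run` stands in for a real OpenStack call."""
    name: str
    run: Callable[[], None]
    state: ActionState = ActionState.PENDING


@dataclass
class ActionPlan:
    """Single-branch plan: actions are executed strictly one after another."""
    actions: List[Action] = field(default_factory=list)

    def execute(self) -> ActionState:
        for action in self.actions:
            action.state = ActionState.ONGOING
            try:
                action.run()
            except Exception:
                # Stop at the first failure; later actions stay PENDING.
                action.state = ActionState.FAILED
                return ActionState.FAILED
            action.state = ActionState.SUCCEEDED
        return ActionState.SUCCEEDED
```

In this sketch a failing action stops the plan and leaves the remaining actions untouched. That is one possible policy chosen for brevity, not a statement about how the Watcher Applier handles errors.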
+ +An :ref:`Action Plan ` has a life-cycle and its current +state may be one of the following: + +- **RECOMMENDED** : the :ref:`Action Plan ` is waiting + for a validation from the :ref:`Administrator ` +- **ONGOING** : the :ref:`Action Plan ` is currently + being processed by the :ref:`Watcher Applier ` +- **SUCCEEDED** : the :ref:`Action Plan ` has been + executed successfully (i.e. all :ref:`Actions ` that it + contains have been executed successfully) +- **FAILED** : an error occurred while executing the + :ref:`Action Plan ` +- **DELETED** : the :ref:`Action Plan ` is still + stored in the :ref:`Watcher database ` but is + not returned any more through the Watcher APIs. +- **CANCELLED** : the :ref:`Action Plan ` was in + **PENDING** or **ONGOING** state and was cancelled by the + :ref:`Administrator ` + +.. _administrator_definition: + +Administrator +============= + +The :ref:`Administrator ` is any user who has admin +access on the OpenStack cluster. This user is allowed to create new projects +for tenants, create new users and assign roles to each user. + +The :ref:`Administrator ` usually has remote access +to any host of the cluster in order to change the configuration and restart any +OpenStack service, including Watcher. + +In the context of Watcher, the :ref:`Administrator ` +is a role for users which allows them to run any Watcher commands, such as: + +- Create/Delete an :ref:`Audit Template ` +- Launch an :ref:`Audit ` +- Get the :ref:`Action Plan ` +- Launch a recommended :ref:`Action Plan ` manually +- Archive previous :ref:`Audits ` and :ref:`Action Plans ` + + +The :ref:`Administrator ` is also allowed to modify +any Watcher configuration files and to restart Watcher services. + +.. _audit_definition: + +Audit +===== + +In the Watcher system, an :ref:`Audit ` is a request for +optimizing a :ref:`Cluster `. + +The optimization is done in order to satisfy one :ref:`Goal ` +on a given :ref:`Cluster `. 
+ +For each :ref:`Audit `, the Watcher system generates an +:ref:`Action Plan `. + +An :ref:`Audit ` has a life-cycle and its current state may +be one of the following: + +- **PENDING** : a request for an :ref:`Audit ` has been + submitted (either manually by the + :ref:`Administrator ` or automatically via some + event handling mechanism) and is in the queue for being processed by the + :ref:`Watcher Decision Engine ` +- **ONGOING** : the :ref:`Audit ` is currently being + processed by the :ref:`Watcher Decision Engine ` +- **SUCCEEDED** : the :ref:`Audit ` has been executed + successfully (note that it may not necessarily produce a + :ref:`Solution `). +- **FAILED** : an error occurred while executing the + :ref:`Audit ` +- **DELETED** : the :ref:`Audit ` is still stored in the + :ref:`Watcher database ` but is not returned + any more through the Watcher APIs. +- **CANCELLED** : the :ref:`Audit ` was in **PENDING** or + **ONGOING** state and was cancelled by the + :ref:`Administrator ` + +.. _audit_template_definition: + +Audit Template +============== + +An :ref:`Audit ` may be launched several times with the same +settings (:ref:`Goal `, thresholds, ...). Therefore it makes +sense to save those settings in some sort of Audit preset object, which is +known as an :ref:`Audit Template `. + +An :ref:`Audit Template ` contains at least the +:ref:`Goal ` of the :ref:`Audit `. + +It may also contain some error handling settings indicating whether: + +- :ref:`Watcher Applier ` stops the entire operation +- :ref:`Watcher Applier ` performs a rollback + +and how many retries should be attempted before failure occurs (the latter +can itself be complex: for example, the scenario in which there are many first-time +failures on ultimately successful :ref:`Actions `). + +Moreover, an :ref:`Audit Template ` may contain some +settings related to the level of automation for the +:ref:`Action Plan ` that will be generated by the +:ref:`Audit `. 
+A flag will indicate whether the :ref:`Action Plan ` +will be launched automatically or will need a manual confirmation from the +:ref:`Administrator `. + +Last but not least, an :ref:`Audit Template ` may +contain a list of extra parameters related to the +:ref:`Strategy ` configuration. These parameters can be +provided as a list of key-value pairs. + +.. _availability_zone_definition: + +Availability Zone +================= + +Please, read `the official OpenStack definition of an Availability Zone `_. + +.. _cluster_definition: + +Cluster +======= + +A :ref:`Cluster ` is a set of physical machines which +provide compute, storage and networking resources and are managed by the same +OpenStack Controller node. +A :ref:`Cluster ` represents a set of resources that a +cloud provider is able to offer to his/her +:ref:`customers `. + +A data center may contain several clusters. + +The :ref:`Cluster ` may be divided into one or several +:ref:`Availability Zone(s) `. + +.. _cluster_data_model_definition: + +Cluster Data Model +================== + +A :ref:`Cluster Data Model ` is a logical +representation of the current state and topology of the :ref:`Cluster ` +:ref:`Managed resources `. + +It is represented as a set of :ref:`Managed resources ` +(which may be a simple tree or a flat list of key-value pairs) +which enables Watcher :ref:`Strategies ` to know the +current relationships between the different +:ref:`resources ` of the +:ref:`Cluster ` during an :ref:`Audit ` +and enables the :ref:`Strategy ` to request information +such as: + +- What compute nodes are in a given :ref:`Availability Zone ` + or a given :ref:`Host Aggregate ` ? +- What :ref:`Instances ` are hosted on a given compute + node ? +- What is the current load of a compute node ? +- What is the current free memory of a compute node ? +- What is the network link between two compute nodes ? +- What is the available bandwidth on a given network link ? 
+- What is the current space available on a given virtual disk of a given + :ref:`Instance ` ? +- What is the current state of a given :ref:`Instance ` ? +- ... + +In short, this data model enables the :ref:`Strategy ` +to know: + +- the current topology of the :ref:`Cluster ` +- the current capacity for each :ref:`Managed resource ` +- the current amount of used/free space for each :ref:`Managed resource ` +- the current state of each :ref:`Managed resource ` + +In the Watcher project, we aim at providing a generic and very basic +:ref:`Cluster Data Model ` for each +:ref:`Goal `, usable in the associated +:ref:`Strategies ` through some helper classes in order +to: + +- simplify the development of a new + :ref:`Strategy ` for a given + :ref:`Goal ` when there already are some existing + :ref:`Strategies ` associated with the same + :ref:`Goal ` +- avoid duplicating the same code in several + :ref:`Strategies ` associated with the same + :ref:`Goal ` +- have a better consistency between the different + :ref:`Strategies ` for a given + :ref:`Goal ` +- avoid any strong coupling with any external + :ref:`Cluster Data Model ` + (the proposed data model acts as a pivot data model) + +There may be various :ref:`generic and basic Cluster Data Models ` +proposed in Watcher helpers, each of them being adapted to achieving a given +:ref:`Goal `: + +- For example, for a + :ref:`Goal ` which aims at optimizing the network + :ref:`resources ` the + :ref:`Strategy ` may need to know which + :ref:`resources ` are communicating together. +- Whereas for a :ref:`Goal ` which aims at optimizing thermal + and power conditions, the :ref:`Strategy ` may need to + know the location of each compute node in the racks and the location of each + rack in the room. 
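To make the kind of questions listed earlier in this section more concrete, here is a minimal sketch of a very basic in-memory data model. Every name in it (``Instance``, ``ComputeNode``, ``ClusterDataModel`` and their attributes) is an assumption made for illustration and does not describe the helper classes actually shipped with Watcher.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    """Hypothetical view of a virtual machine."""
    uuid: str
    memory_mb: int
    state: str = "active"


@dataclass
class ComputeNode:
    """Hypothetical view of a compute node and the instances it hosts."""
    hostname: str
    availability_zone: str
    memory_mb: int
    instances: List[Instance] = field(default_factory=list)

    def free_memory_mb(self) -> int:
        # Capacity minus the memory reserved by hosted instances.
        return self.memory_mb - sum(i.memory_mb for i in self.instances)


@dataclass
class ClusterDataModel:
    """Flat mapping of hostname to compute node (one possible topology)."""
    nodes: Dict[str, ComputeNode] = field(default_factory=dict)

    def add_node(self, node: ComputeNode) -> None:
        self.nodes[node.hostname] = node

    def nodes_in_zone(self, zone: str) -> List[ComputeNode]:
        return [n for n in self.nodes.values()
                if n.availability_zone == zone]

    def instances_on(self, hostname: str) -> List[Instance]:
        return list(self.nodes[hostname].instances)
```

A strategy written against such a model only depends on these query methods, which is the sense in which a pivot data model avoids coupling to any external representation.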
+ +Note however that a developer can use his/her own +:ref:`Cluster Data Model ` if the proposed data +model does not fit his/her needs as long as the :ref:`Strategy ` +is able to produce a :ref:`Solution ` for the requested :ref:`Goal `. +For example, a developer could rely on the Nova Data Model to optimize some +compute resources. + +The :ref:`Cluster Data Model ` may be persisted +in any appropriate storage system (SQL database, NoSQL database, JSON file, +XML File, In Memory Database, ...). + +.. _cluster_history_definition: + +Cluster History +=============== + +The :ref:`Cluster History ` contains all the +previously collected timestamped data such as metrics and events associated +to any :ref:`managed resource ` of the +:ref:`Cluster `. + +Just like the :ref:`Cluster Data Model `, this +history may be used by any :ref:`Strategy ` in order to +find the most optimal :ref:`Solution ` during an +:ref:`Audit `. + +In the Watcher project, a generic :ref:`Cluster History ` +API is proposed with some helper classes in order to : + +- share a common measurement (events or metrics) naming based on what is + defined in Ceilometer. See `the full list of available measurements `_ +- share common meter types (Cumulative, Delta, Gauge) based on what is + defined in Ceilometer. See `the full list of meter types `_ +- simplify the development of a new :ref:`Strategy ` +- avoid duplicating the same code in several :ref:`Strategies ` +- have a better consistency between the different :ref:`Strategies ` +- avoid any strong coupling with any external metrics/events storage system + (the proposed API and measurement naming system acts as a pivot format) + +Note however that a developer can use his/her own history management system if +the Ceilometer system does not fit his/her needs as long as the +:ref:`Strategy ` is able to produce a +:ref:`Solution ` for the requested +:ref:`Goal `. 
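As a rough illustration of what timestamped history data with the three meter types mentioned above could look like, consider the following sketch. The ``Sample`` record, the ``MeterType`` enum and the ``average_gauge`` helper are hypothetical names chosen for this example and do not reflect the actual Watcher or Ceilometer APIs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class MeterType(Enum):
    """The three meter types named in the text above."""
    CUMULATIVE = "cumulative"
    DELTA = "delta"
    GAUGE = "gauge"


@dataclass(frozen=True)
class Sample:
    """Hypothetical timestamped measurement for one managed resource."""
    resource_id: str
    meter: str
    meter_type: MeterType
    timestamp: float
    value: float


def average_gauge(samples: Iterable[Sample], resource_id: str, meter: str,
                  start: float, end: float) -> Optional[float]:
    """Average a gauge meter over [start, end]; None if no sample matches."""
    values = [
        s.value for s in samples
        if s.resource_id == resource_id
        and s.meter == meter
        and s.meter_type is MeterType.GAUGE
        and start <= s.timestamp <= end
    ]
    if not values:
        return None
    return sum(values) / len(values)
```

A strategy could call such a helper to estimate the recent load of a compute node without knowing which storage system holds the samples.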
+ +The :ref:`Cluster History ` data may be persisted +in any appropriate storage system (InfluxDB, OpenTSDB, MongoDB,...). + +.. _controller_node_definition: + +Controller Node +=============== + +A controller node is a machine that typically runs the following core OpenStack +services: + +- Keystone: for identity and service management +- Cinder scheduler: for volumes management +- Glance controller: for image management +- Neutron controller: for network management +- Nova controller: for global compute resources management with services such as + nova-scheduler, nova-conductor and nova-network + +In many configurations, Watcher will reside on a controller node even if it +can potentially be hosted on a dedicated machine. + +.. _compute_node_definition: + +Compute node +============ + +Please, read `the official OpenStack definition of a Compute Node `_. + +.. _customer_definition: + +Customer +======== + +A :ref:`Customer ` is the person or company which +subscribes to the cloud provider offering. A customer may have several :ref:`Project(s) ` +hosted on the same :ref:`Cluster ` or dispatched on +different clusters. + +In the private cloud context, the :ref:`Customers ` are +different groups within the same organization (different departments, project +teams, branch offices and so on). Cloud infrastructure includes the ability to +precisely track each customer's service usage so that it can be charged back to +them, or at least reported to them. + +.. _goal_definition: + +Goal +==== + +A :ref:`Goal ` is a human readable, observable and measurable +end result having one objective to be achieved. 
+ +Here are some examples of :ref:`Goals `: + +- minimize the energy consumption +- minimize the number of compute nodes (consolidation) +- balance the workload among compute nodes +- minimize the license cost (some software has a licensing model which is + based on the number of sockets or cores where the software is deployed) +- find the most appropriate moment for a planned maintenance on a + given group of hosts (which may be an entire availability zone): + power supply replacement, cooling system replacement, hardware + modification, ... + + +.. _host_aggregates_definition: + +Host Aggregate +============== + +Please, read `the official OpenStack definition of a Host Aggregate `_. + +.. _instance_definition: + +Instance +======== + +A running virtual machine, or a virtual machine in a known state such as +suspended, that can be used like a hardware server. + +.. _managed_resource_definition: + +Managed resource +================ + +A :ref:`Managed resource ` is one instance of +:ref:`Managed resource type ` in a topology +with particular properties and dependencies on other +:ref:`Managed resources ` (relationships). + +For example, a :ref:`Managed resource ` can be one +virtual machine (i.e., an :ref:`instance `) hosted on a +:ref:`compute node ` and connected to another virtual +machine through a network link (represented also as a +:ref:`Managed resource ` in the +:ref:`Cluster Data Model `). + +.. _managed_resource_type_definition: + +Managed resource type +===================== + +A :ref:`Managed resource type ` is a type of +hardware or software element of the :ref:`Cluster ` that +the Watcher system can act on. + +Here are some examples of +:ref:`Managed resource types `: + +- `Nova Host Aggregates `_ +- `Nova Servers `_ +- `Cinder Volumes `_ +- `Neutron Routers `_ +- `Neutron Networks `_ +- `Neutron load-balancers `_ +- `Sahara Hadoop Cluster `_ +- ... 
+ +It can be any of `the official list of available resource types defined in OpenStack for HEAT `_. + +.. _efficiency_definition: + +Optimization Efficiency +======================= + +The :ref:`Optimization Efficiency ` is the objective +measure of how much of the :ref:`Goal ` has been achieved with +respect to the constraints and :ref:`SLAs ` defined by the +:ref:`Customer `. + +The way efficiency is evaluated will depend on the :ref:`Goal ` +to achieve. + +Of course, the efficiency will be relevant only as long as the :ref:`Action Plan ` +is relevant (i.e., the current state of the :ref:`Cluster ` +has not changed in a way that a new :ref:`Audit ` would need +to be launched). + +For example, if the :ref:`Goal ` is to lower the energy +consumption, the :ref:`Efficiency ` will be computed +using several indicators (KPIs): + +- the percentage of energy gain (which must be the highest possible) +- the number of :ref:`SLA violations ` + (which must be the lowest possible) +- the number of virtual machine migrations (which must be the lowest possible) + +All those indicators (KPIs) are computed within a given timeframe, which is the +time taken to execute the whole :ref:`Action Plan `. + +The efficiency also enables the :ref:`Administrator ` +to objectively compare different :ref:`Strategies ` for +the same goal and same workload of the :ref:`Cluster `. + +.. _project_definition: + +Project +======= + +:ref:`Projects ` represent the base unit of “ownership” +in OpenStack, in that all :ref:`resources ` in +OpenStack should be owned by a specific :ref:`project `. +In OpenStack Identity, a :ref:`project ` must be owned by a +specific domain. + +Please, read `the official OpenStack definition of a Project `_. + +.. _sla_definition: + +SLA +=== + +:ref:`SLA ` means Service Level Agreement. + +The resources are negotiated between the :ref:`Customer ` +and the Cloud Provider in a contract. 
+ +Most of the time, this contract is composed of two documents: + +- :ref:`SLA ` : Service Level Agreement +- :ref:`SLO ` : Service Level Objectives + +Note that the :ref:`SLA ` is more general than the +:ref:`SLO ` in the sense that the former specifies what service +is to be provided, how it is supported, times, locations, costs, performance, +and responsibilities of the parties involved while the +:ref:`SLO ` focuses on more measurable characteristics such as +availability, throughput, frequency, response time or quality. + +You can also read `the Wikipedia page for SLA `_ +which provides a good definition. + +.. _sla_violation_definition: + +SLA violation +============= + +An :ref:`SLA violation ` happens when an :ref:`SLA ` +defined with a given :ref:`Customer ` could not be +respected by the cloud provider within the timeframe defined by the official +contract document. + +.. _slo_definition: + +SLO +=== + +A Service Level Objective (SLO) is a key element of an :ref:`SLA ` +between a service provider and a :ref:`Customer `. SLOs +are agreed as a means of measuring the performance of the Service Provider and +are outlined as a way of avoiding disputes between the two parties based on +misunderstanding. + +You can also read `the Wikipedia page for SLO `_ +which provides a good definition. + +.. _solution_definition: + +Solution +======== + +A :ref:`Solution ` is a set of :ref:`Actions ` +generated by a :ref:`Strategy ` (i.e., an algorithm) in +order to achieve the :ref:`Goal ` of an :ref:`Audit `. + +A :ref:`Solution ` is different from an +:ref:`Action Plan ` because it contains the +non-scheduled list of :ref:`Actions ` which is produced by a +:ref:`Strategy `. In other words, the list of Actions in +a :ref:`Solution ` has not yet been re-ordered by the +:ref:`Watcher Planner `. + +Note that some algorithms (i.e. :ref:`Strategies `) may +generate several :ref:`Solutions `. This gives rise to the +problem of determining which :ref:`Solution ` should be +applied. 
+ +Two approaches to dealing with this can be envisaged: + +- **fully automated mode**: only the :ref:`Solution ` with + the highest ranking (i.e., the highest + :ref:`Optimization Efficiency `) + will be sent to the :ref:`Watcher Planner ` and + translated into concrete :ref:`Actions `. +- **manual mode**: several :ref:`Solutions ` are proposed + to the :ref:`Administrator ` with a detailed + measurement of the estimated + :ref:`Optimization Efficiency ` and he/she decides + which one will be launched. + +.. _strategy_definition: + +Strategy +======== + +A :ref:`Strategy ` is an algorithm implementation which is +able to find a :ref:`Solution ` for a given :ref:`Goal `. + +There may be several potential strategies which are able to achieve the same +:ref:`Goal `. This is why it is possible to configure which +specific :ref:`Strategy ` should be used for each :ref:`Goal `. + +Some strategies may provide better optimization results but may take more time +to find an optimal :ref:`Solution `. + +When a new :ref:`Goal ` is added to the Watcher configuration, +at least one default associated :ref:`Strategy ` should be +provided as well. + +.. _watcher_applier_definition: + +Watcher Applier +=============== + +This component is in charge of executing the :ref:`Action Plan ` +built by the :ref:`Watcher Decision Engine `. + +See :doc:`architecture` for more details on this component. + +.. _watcher_database_definition: + +Watcher Database +================ + +This database stores all the Watcher domain objects which can be requested +by the Watcher API or the Watcher CLI: + +- Audit templates +- Audits +- Action plans +- Actions +- Goals + +The Watcher domain being here "*optimization of some resources provided by an +OpenStack system*". + +See :doc:`architecture` for more details on this component. + +.. 
_watcher_decision_engine_definition: + +Watcher Decision Engine +======================= + +This component is responsible for computing a set of potential optimization +:ref:`Actions ` in order to fulfill the :ref:`Goal ` +of an :ref:`Audit `. + +It first reads the parameters of the :ref:`Audit ` from the +associated :ref:`Audit Template ` and knows the +:ref:`Goal ` to achieve. + +It then selects the most appropriate :ref:`Strategy ` +depending on how Watcher was configured for this :ref:`Goal `. + +The :ref:`Strategy ` is then executed and generates a set +of :ref:`Actions ` which are scheduled in time by the +:ref:`Watcher Planner ` (i.e., it generates an +:ref:`Action Plan `). + +See :doc:`architecture` for more details on this component. + +.. _watcher_planner_definition: + +Watcher Planner +=============== + +The :ref:`Watcher Planner ` is part of the +:ref:`Watcher Decision Engine `. + +This module takes the set of :ref:`Actions ` generated by a +:ref:`Strategy ` and builds the design of a workflow which +defines how to schedule those different +:ref:`Actions ` in time and, for each +:ref:`Action `, what the prerequisite conditions are. + +It is important to schedule :ref:`Actions ` in time in order +to prevent overload of the :ref:`Cluster ` while applying +the :ref:`Action Plan `. For example, it is important +not to migrate too many instances at the same time in order to avoid network +congestion which may decrease the :ref:`SLA ` for +:ref:`Customers `. + +It is also important to schedule :ref:`Actions ` in order to +avoid security issues such as denial of service on core OpenStack services. + +See :doc:`architecture` for more details on this component. + diff --git a/doc/source/index.rst b/doc/source/index.rst index beced85dd..2c4baf728 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -21,6 +21,7 @@ Introduction .. 
toctree:: :maxdepth: 1 + dev/glossary dev/architecture dev/environment dev/contributing @@ -55,6 +56,7 @@ Commands cmds/watcher-db-manage + Indices and tables ==================