Monitoring OpenStack – The Relationship Between Nagios and Ceilometer
Konstantin Benz, Researcher @ Zurich University of Applied Sciences
benn@zhaw.ch

Introduction & Agenda

• About me
  • Working as researcher @ Zurich University of Applied Sciences
  • OpenStack / Cloud Computing
  • Engaged in monitoring and High Availability systems
  • Currently working on a Europe-wide cloud federation:
    • XIFI – eXtensible Infrastructure for Future Internet, http://www.fi-xifi.eu
    • 17 nodes / OpenStack clouds
    • Test environment for Future Internet (FI-WARE) applications
    • Infrastructure for smart cities, public healthcare, traffic management…
    • European-wide L2-connected backbone network
    • Nagios as main monitoring tool of that project
• What are you talking about in this presentation?
  • How to use Nagios to monitor an OpenStack cloud environment
  • Integrate Nagios with OpenStack
• Anything else?
  • Cloud monitoring requirements
  • OpenStack cloud management software and Ceilometer
  • Comparison between Nagios and Ceilometer:
    • Technological paradigms
    • Commonalities and differences
  • How to integrate Nagios with Ceilometer
• Can't wait!

Cloud Monitoring Requirements

Cloud ≈ virtualization + elasticity
• Types of clouds:
  • IaaS: virtual VMs and network devices, elasticity in number/size of devices
  • PaaS: virtual, elastically sized platform
  • SaaS: software provided by employing virtual, elastic resources
• Cloud is a collection of virtual resources provided in physical infrastructure
• Cloud provides resources elastically

Why should someone use clouds?
• Cloud consumer can outsource IT infrastructure
  • No fixed costs for cloud consumer
  • Pay for resource utilization
  • Cloud provider responsible for building and maintaining physical infrastructure
• Cloud provider can rent out unused IT infrastructure
  • Eliminate waste
  • Get money back for overcapacity

Monitoring OpenStack

OpenStack Architecture
• Open source cloud computing software
• Consists of multiple services:
  • Keystone: OpenStack identity services (authentication, authorization, accounting)
  • Cinder: management of block storage volumes
  • Nova: management and provision of virtual resources (VM instances)
  • Glance: management of VM images
  • Swift: management of object storage
  • Neutron: management of network resources (IPs, routing, connectivity)
  • Horizon: GUI dashboard for end users
  • Heat: orchestration of virtualized environments (important for providing elasticity)
  • Ceilometer: monitoring of virtual resources

Things to monitor
• Operation of OpenStack itself:
  • Services: Cinder, Glance, Nova, Swift ...
  • Infrastructure: hardware and operating system on which the OpenStack services are running
• Operation of virtual resources provided by OpenStack:
  • Resource availability: VMs, virtual network devices
  • Resource utilization: VM uptime, CPU / memory usage
→ Virtual resources are commonly monitored by Ceilometer
→ Ceilometer gathers data through the API of OpenStack services

Why is Ceilometer not enough?
→ Ceilometer monitors virtual resources through APIs of OpenStack components, BUT NOT operation of the OpenStack components

Comparison Nagios / Ceilometer

Nagios operational model
• Configuration:
  • Check interval (and retry interval) to poll system status and update frontend GUI
  • Remote execution of monitoring clients (usually Nagios plugins)
  • Thresholds that result in "Okay", "Warning", "Critical" status messages which are sent back to Nagios server (and "Unknown" if status not measurable)
• Main usage:
  • Effective monitoring solution for physical servers
  • System administration console that allows for fast reaction in case of problems
• Strength: extensibility and customizability
• Nagios must be extended in order to monitor virtual resources inside administrated systems

Ceilometer operational model
• Configuration:
  • Polling services check metrics
  • OpenStack objects generate event notifications automatically
  • All events and metrics collected in a database
• Main usage:
  • OpenStack integrated metrics collector and database
  • Temporal database that can be used for rating, charging and billing of virtual resource utilization
• Strength: fully integrated in OpenStack, collecting most important metrics and storing their change history
• Weakness: Does not monitor physical hosts

Nagios / OpenStack Integration

Alternative 1: Ceilometer Plugin in Nagios
Use Nagios server as frontend for Ceilometer:
• Nagios plugin that queries Ceilometer database
• Virtual resource utilization data collected by Ceilometer
• Nagios server responsible for monitoring non-virtual resources
Benefits:
• Simple and easy to implement
• No extra Nagios plugins required to monitor virtual devices that are managed within OpenStack
• Ceilometer tool can be left unchanged
Drawbacks:
• Monitoring data is stored at 2 different places: Nagios flat file and Ceilometer database

Implementation:
Nagios plugin on client which hosts the Ceilometer API (code sample below)
• Initialization with default values, OpenStack authentication:

    #!/bin/bash
    # initialization with default values
    SERVICE='cpu_util'
    THRESHOLD='50.0'
    CRITICAL_THRESHOLD='80.0'
    # get OpenStack token to access ceilometer-api
    export OS_USERNAME="youruser"
    export OS_TENANT_NAME="yourtenant"
    export OS_PASSWORD="yourpassword"
    export OS_AUTH_URL=http://yourkeystoneurl:35357/v2.0/

The plugin should receive parameters for:
• Resource to be monitored (VM)
• Service (Ceilometer metric)
• Warning threshold
• Critical threshold

    # printusage is the plugin's usage helper (not shown on the slides)
    while getopts ":hr:s:t:T:" opt
    do
      case $opt in
        h ) printusage;;
        r ) RESOURCE=${OPTARG};;
        s ) SERVICE=${OPTARG};;
        t ) THRESHOLD=${OPTARG};;
        T ) CRITICAL_THRESHOLD=${OPTARG};;
        * ) printusage;;
      esac
    done

• Query Nova API to get resource to monitor (VM to be monitored):

    RESOURCE=$(nova list | grep "$RESOURCE" | tail -2 | head -1 | awk -F '|' '{print $2}')
    # unquoted echo strips the padding spaces of the table cell
    RESOURCE=$(echo $RESOURCE)

• Query metric on that resource (multiple entries possible, requires an iterator):

    ITERATOR=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w "$SERVICE" | awk 'END{print NR}')

• Initialize with return code 0 (no warning or error):

    RETURNCODE=0

• Iterate through metric:

    for (( C=1; C<=ITERATOR; C++ ))
    do
      # row C of the matching meters: name ($2), unit ($4), resource ID ($5)
      METER_NAME=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w "$SERVICE" | awk -F '|' -v var="$C" 'NR == var {print $2}')
      METER_UNIT=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w "$SERVICE" | awk -F '|' -v var="$C" 'NR == var {print $4}')
      RESOURCE_ID=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w "$SERVICE" | awk -F '|' -v var="$C" 'NR == var {print $5}')
      # $METER_NAME and $RESOURCE_ID stay unquoted so their padding spaces are dropped
      ACTUAL_VALUE=$(ceilometer sample-list -m $METER_NAME -q "resource_id=$RESOURCE" -l 1 | grep $RESOURCE_ID | head -4 | tail -1 | awk -F '|' '{print $5}')

• Update return code if value of one metric is above a threshold:

      if [ $(echo "$ACTUAL_VALUE > $THRESHOLD" | bc) -eq 1 ]
      then
        if (( RETURNCODE < 1 ))
        then
          RETURNCODE=1
        fi
        if [ $(echo "$ACTUAL_VALUE > $CRITICAL_THRESHOLD" | bc) -eq 1 ]
        then
          if (( RETURNCODE < 2 ))
          then
            RETURNCODE=2
          fi
        fi
      fi

• Output return code:

      STATUS=$(echo "$METER_NAME on $RESOURCE_ID is: $ACTUAL_VALUE $METER_UNIT")
      echo $STATUS
    done
    # Nagios reads the plugin state from the exit status (0 OK, 1 WARNING, 2 CRITICAL)
    exit $RETURNCODE

• Plugin can be downloaded from GitHub:
  • https://github.com/kobe6661/nagios_ceilometer_plugin.git
• Additionally:
  • NRPE plugin: remote execution of Nagios calls to Ceilometer
  • Install NRPE on Nagios Core server and server that hosts Ceilometer API
  • Change nrpe.cfg to include call to VM metric

Alternative 1: Implementation
• OpenStack installed on 3 nodes:
  • Management node: responsible for monitoring other OpenStack nodes
  • Controller node: responsible for management and configuration of cloud resources (VMs, network)
  • Compute node: provisions virtual resources

Alternative 2: Nagios OpenStack Plugins
Nagios as a tool to monitor OpenStack services and VMs:
• Plugins to monitor health of OpenStack services
• As soon as new VMs are created, Nagios should monitor them
• Requires elastic reconfiguration of Nagios
Benefits:
• No data duplication, Nagios is the only monitoring tool required to monitor OpenStack
Drawbacks:
• Elastic reconfiguration
• Rather complex Nagios configuration

Problem:
• Dynamic provisioning of resources (Virtual Machines)
• Dynamic configuration of hosts in
Nagios Server required
[Diagram: Nagios Server, OpenStack Controller Node, VM Image, Virtual Machine, OpenStack Compute Node; arrows labeled MONITORS and PROVIDES]

Problem:
• What happens if VM is terminated by end user?
• Nagios assumes a host failure and produces a critical warning
[Diagram: same components; arrows labeled MONITORS and PROVIDES]

Solution:
• Nova-API triggers reconfiguration of Nagios if VMs are created or terminated
[Diagram: same components; arrows labeled RECONFIGURES and PROVIDES]

Another problem:
• VMs must have Nagios plugins installed when they are created
Solution:
• Use only VM Images that contain Nagios plugins for VM creation
OR
• Use configuration management tools like Puppet, Chef…
[Diagram: same components, with NRPE Plugins attached to the VM Image and the Virtual Machine; arrow labeled PROVIDES]

Trigger for dynamic Nagios configuration:
• Find available resources via nova-api (requires name of host and IP address)

    #!/bin/bash
    # nova list prints three header lines before the VM rows and one closing
    # border line after them; keep the rows plus the closing border
    NUMLINES=$(nova list | wc -l)
    NUMLINES=$(( NUMLINES - 3 ))
    # C < NUMLINES skips the closing border line
    for (( C=1; C<NUMLINES; C++ ))
    do
      VM_NAME=$(nova list | tail -n "$NUMLINES" | awk -F'|' -v var="$C" 'NR == var {print $3}')
      IP_ADDRESS=$(nova list | tail -n "$NUMLINES" | awk -F'|' -v var="$C" 'NR == var {print $7}' | sed 's/[a-zA-Z0-9]*[=|-]//g')
      # unquoted echo strips the padding spaces of the table cells
      VM_NAME=$(echo $VM_NAME)
      IP_ADDRESS=$(echo $IP_ADDRESS)

• Create a config file including VM name and IP address from a template (e.g. vm_template.cfg)

      CONFIG_FILE=$VM_NAME.cfg
      sed "s/<vm_name>/$VM_NAME/g" vm_template.cfg > named_template.cfg
      sed "s/<ip_address>/$IP_ADDRESS/g" named_template.cfg > "$CONFIG_FILE"

• Set Nagios as owner of the file and move file to Nagios configuration directory

      chown nagios:nagios "$CONFIG_FILE"
      chmod 644 "$CONFIG_FILE"
      mv "$CONFIG_FILE" "/usr/local/nagios/etc/objects/$CONFIG_FILE"

• Add config file to nagios.cfg

      echo "cfg_file=/usr/local/nagios/etc/objects/$CONFIG_FILE" >> /usr/local/nagios/etc/nagios.cfg
    # end of the per-VM loop (the closing "done" is not shown on the slides)
    done

• Restart nagios

    service nagios restart

Why restart Nagios?
• Nagios must know that a new VM is present or that an old VM has been terminated
• Reconfigure and restart Nagios (!)

Trigger for dynamic Nagios configuration:
• Add trigger to Nova-API:
  • Nagios Event Broker module:
    • Check_MK: http://mathias-kettner.de/checkmk_livestatus.html
• Reconfigure Nagios dynamically:
  • Edit nagios.cfg and restart Nagios: bad idea (!!) in a cloud environment
  • Autoconfiguration tools:
    • NagioSQL: http://www.nagiosql.org/documentation.html

What other ways exist to dynamically reconfigure Nagios?
• Puppet master that triggers:
  • VMs to install Nagios NRPE plugins and
  • Nagios Server to update its configuration
• Same can be done with Chef, Ansible…
• Drawback: Puppet scalability if thousands of servers have to be (de-)commissioned dynamically
• Python fabric with Cuisine to trigger:
  • VMs to install Nagios NRPE plugins and
  • Nagios Server to update its configuration
• Get list of VMs

    from novaclient.client import Client
    nova = Client(VERSION, USERNAME, PASSWORD, PROJECT_ID, AUTH_URL)
    servers = nova.servers.list()

• Write VM list to file

    # one VM name per line (assumes the VM names are usable as SSH host names)
    with open('servers', 'w') as f:
        for server in servers:
            f.write(server.name + '\n')

• Create fabfile.py and define which servers should be configured

    from fabric.api import *
    from . import vm_recipe, nagios_recipe
    env.use_ssh_config = True
    # read the VM list back, dropping the trailing newlines
    with open('servers') as servers:
        serverlist = [line.strip() for line in servers]
    # role values are host lists; xx.xx.xx.xx is the Nagios server address
    env.roledefs = {'vm': serverlist, 'nagios_server': ['xx.xx.xx.xx']}

• Assign recipes

    @roles('vm')
    def configure_vm():
        vm_recipe.ensure()

    # role name must match the key used in env.roledefs
    @roles('nagios_server')
    def configure_nagios():
        nagios_recipe.ensure()

• Create vm_recipe.py and nagios_recipe.py

    from fabric.api import *
    import cuisine

    # is_installed() and install() are not shown on the slides
    def ensure():
        if not is_installed():
            puts("Installing NRPE...")
            install()
        else:
            puts("NRPE already installed")

    def install_prerequisites():
        cuisine.package_ensure("nrpe")

Choice of Alternatives

Which option should we choose?
• Implementation advantages and drawbacks

A1: Ceilometer collects data
  Advantages:
  • Very easy solution
  • Scales well
  Drawbacks:
  • Data duplication
  • Two monitoring systems working in parallel

A2: Shell script
  Advantages:
  • No data duplication
  • Easy solution
  Drawbacks:
  • Difficult to maintain
  • Possibly insecure
  • Nagios is forced to restart

A2: Puppet
  Advantages:
  • Automatic VM and Nagios configuration
  • Allows for elastic reconfiguration of Nagios
  Drawbacks:
  • Heavyweight
  • Bad scalability for large IaaS clusters

A2: Python fabric & cuisine
  Advantages:
  • Lightweight
  • Automatic VM and Nagios configuration
  • Allows for elastic reconfiguration of Nagios
  Drawbacks:
  • Bigger configuration effort for package management with strong dependencies between packages

Conclusion

What did you talk about?
• How to use Nagios to monitor an OpenStack cloud environment
  • Cloud monitoring requirements:
    • Elasticity, dynamic provisioning of virtual machines
  • OpenStack monitoring tools Nagios and Ceilometer
    • Nagios as extensible monitoring system
    • Ceilometer captures data through Nova-API
• Nagios/OpenStack integration
  • Alternative 1:
    • Ceilometer monitors VMs with Nagios as graphical frontend
  • Alternative 2:
    • Nagios monitors VMs and is automatically reconfigured
  • Discovered need for dynamic reloading of Nagios configuration
  • Discussed advantages/drawbacks of different implementations

Questions?
Any questions? Thanks!

The End
Konstantin Benz
benn@zhaw.ch
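
Appendix sketch: the threshold logic of the Alternative 1 plugin (warning above 50.0, critical above 80.0, worst state across all samples wins) could be expressed in Python roughly as follows. This is only an illustration under assumptions: the function names classify and worst_state are hypothetical and do not come from the plugin, and the defaults merely mirror the THRESHOLD and CRITICAL_THRESHOLD values shown on the slides.

```python
# Nagios plugin states, numbered like the plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(value, warning=50.0, critical=80.0):
    """Map one metric sample to a Nagios state (hypothetical helper).

    Uses strict "greater than" comparisons, like the bc checks in the
    bash plugin.
    """
    if value > critical:
        return CRITICAL
    if value > warning:
        return WARNING
    return OK


def worst_state(values, warning=50.0, critical=80.0):
    """Return the most severe state over all samples.

    An empty sample list yields OK, matching RETURNCODE=0 as the
    starting value in the bash plugin.
    """
    state = OK
    for value in values:
        state = max(state, classify(value, warning, critical))
    return state
```

A plugin built on this would finish with sys.exit(worst_state(samples)), since Nagios takes the state from the exit status of the plugin process.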