NSClient++: Distributed Monitoring
Adventures into the unknown

Disclaimer!
These slides represent the work and opinions of the author and do not constitute official positions of any organization sponsoring the author’s work.
This material has not been peer reviewed and is presented here as-is with the permission of the author.
The author assumes no liability for any content or opinion expressed in this presentation and/or use of content herein.

My Background
Developer (not system manager)
◦ Not working with Nagios
Accidentally ended up in our NOC
◦ Hated BB so we migrated to Nagios
2003: The birth of NSClient++
◦ NSClient/NRPE_NT was not to my liking
2004: The open-sourcing of NSClient++
◦ “just for fun”
2007: The rebirth of NSClient++
◦ Got a lot of emails and hits on the webpage
2011: The Present
◦ 0.3.9 out last May
◦ 0.4.0 RC out

Agenda
About
◦ NSClient++
Distributed monitoring
◦ About
◦ Concepts
◦ Protocols
Distributed monitoring with NSClient++
◦ Configuration Changes
◦ Configuration Concepts
◦ Scenarios
Q/A

NSClient++
Quick Introduction

About NSClient++
Internals:
◦ C++
◦ Around 75,000 lines of code
◦ Actively developed (unfortunately only by me)
◦ Modularized design (use what you need)
Runs on:
◦ Windows: NT4, w2k, XP, w2k3, Vista, w2k8, X64, X86 …
◦ Unix: Linux/Debian (probably many/most others as well)
Current Version:
◦ 0.3.9 with 0.4.0 in beta
Most features require NRPE or NSCA (or NSCP)
Documentation online (WIKI)
◦ http://nsclient.org

About NSClient++ (cont.)
Not supported by a commercial entity
◦ Donations welcome
◦ Sponsoring available (contact me for details)
Used by a lot of people (I think)
◦ Impossible to estimate any figures
Please, Help out!
◦ Add documentation
◦ Report problems
◦ Bring ideas, thoughts, etc…
◦ Tell me what sucks!

Why should you use NSClient++
NSClient++
◦ Freedom!
    Custom scripts
    Decentralized or centralized
    Active or Passive
    Can monitor “anything” (including your application)
    Can perform “tasks” (fix your problems)
Other options:
◦ SNMP
    Generally complex to use and limited on “standard” hardware
◦ pNSClient/NRPE_NT/OpMonAgent/*
    Old, outdated and usually limited functionality
◦ “Agentless” WMI
    Limited functionality
    Enforces centralized and active monitoring
But...
◦ I am biased, so you might not want to take my word for it...

Thank you!

Using NSClient++ (0.4.0)
New command line syntax!
◦ nscp <context> [<options>]
◦ nscp service --start
◦ nscp help
Testing
◦ nscp test
Configuration:
◦ nscp settings --help
◦ nscp settings --migrate-to ini
◦ nscp settings --set …
Run scripts:
◦ nscp python --exec run --script test.py
◦ nscp nrpe --query command --host 192.168.0.1

Roadmap (rough)
0.3.9
◦ Last 0.3.x
0.4.0
◦ Core switch
◦ Linux support
◦ Distributed Monitoring (v1)
0.4.1 to 0.4.3
◦ Monitoring Kits
◦ Bugfixes
◦ New Windows check subsystem
◦ True passive checks
◦ Distributed Monitoring (v2)
◦ Bugfixes

What’s new 0.4.0
Brand new core based upon libraries
◦ Things should *work* not just “work”
◦ Even more modular and extensible
Unix support
◦ Both as a client and server
New settings subsystem
◦ Registry, improved ini support, http, etc
New protocol
◦ NSCP (HTTP(s), MQ, Native)
◦ Built for distributed monitoring
Distributed monitoring
Python scripting
◦ Allows (for me) unit testing
Updated installer
◦ Wix 3.5, more customizable
Many many more things

Distributed Monitoring
Introduction

Distributed monitoring
What is it?
◦ Passive checks?
◦ Clustered Nagios?
◦ Nagios front ends?
◦ Distributed Nagios?
◦ Distributed Check “thingies”?
◦ …
For me:
◦ The ability to distribute check metrics across the network!

So…uhmm…
Should be simple, right?
(Diagram: Internet)

Distributed Monitoring
Concepts

Requirements
Support
◦ All paradigms
◦ Any size payload
◦ Multiple commands
◦ Multiple arguments
Security
◦ Encryption
◦ Strong authentication
Extensible
◦ No transport restrictions
◦ Allow customization
Firewall friendly (HTTP?)
…

3 main paradigms
Query
◦ Ask the status of something
◦ (Sometimes called Active)
Notification
◦ Send a notification to someone
◦ (Sometimes called Passive, Message or Channel)
Command
◦ Tell someone what to do
But also:
◦ Configuration
◦ Installation
◦ Upgrade
◦ Push/pull file (scripts)

Query
Modeled on NRPE
The normal (active) scenario in Nagios/Icinga
Are you ok?
◦ Yes / no (or, OK, WARN, CRIT, UNKNOWN)

Query
(Diagram: check_nrpe, Network, NSClient++ with NRPEServer forwarding the request to CheckEventlog)

Submission
Modeled on NSCA
The passive scenario in Nagios/Icinga
I am (not) ok!
◦ (or, OK, WARN, CRIT, UNKNOWN)

Submission
(Diagram: send_nsca, Network, NSClient++ with NSCAServer, channel, CheckEventlog, NSCAClient, nsca, Network)

Command
Event handlers (but more)
Normally implemented via NRPE (ish)
Restart service X, run script Y
Technically similar to Query

Command
(Diagram: check_nrpe, Network, NSClient++ with NRPEServer forwarding the request to CheckEventlog)

Distributed Monitoring
Protocols

Distributed via NRPE?
Does not support:
◦ Passive
◦ Encryption (for real)
◦ Authentication
◦ Big payloads
◦ Multiple commands
◦ Firewall friendly protocol (i.e. HTTP)
◦ …
But we can still use it
◦ Inside the network
◦ Where big payloads are not required
◦ When we only need active monitoring

Distributed via NSCA?
Does not support
◦ Active
◦ Big payloads
◦ Firewall friendly protocol (i.e. HTTP)
◦ …
But we can still use it
◦ Inside the network
◦ Where big payloads are not required
◦ When we only need passive monitoring

Distributed via NRDP?
Does not support
◦ Active
◦ Strong authentication
◦ …
But we can still use it
◦ When we only need passive monitoring

Distributed via SSH?
Does not support
◦ Passive (?)
◦ Not very Windows friendly
◦ Not very firewall friendly
◦ Cumbersome to manage certificates
◦ …
But we can still use it
◦ On our *nix machines

Distributed via NSCP?
Does not support:
◦ Firewall friendly protocol (i.e. HTTP)
    Will come in next major release
◦ Experimental
But we can still use it
◦ When we want to play around
◦ Inside the network (currently)

Other options
Distributed NSCP
◦ No encryption support
◦ Not firewall friendly protocol (i.e. HTTP)
Syslog
◦ Not really good for metrics/status
◦ No support for active checks
◦ Not firewall friendly protocol (i.e. HTTP)
SMTP
◦ Not practical
◦ No support for active checks
    Will come in next major release
◦ Not real time
◦ Firewall friendly?

Summary of protocols
Values are listed in the order: NSClient, NRPE, NSCA, NSCP, D-NSCP, Syslog, SMTP, check_mk, NRDP
Paradigm: Active, Active, Passive, All, MQ, ?, ?, Active, Passive
Encryption: No, No, Yes, Yes, No, No, Yes, ?, Yes
Auth, Payload (as extracted): Yes No Yes No 1024 512 ∞ ∞ 1024 ∞ ∞ Yes Yes No ? No Yes ∞
Multiple arguments: Yes, Yes, Yes, Yes, Yes, N/A, Yes, No, Yes
Multiple commands: No, No, Yes, Yes, Yes, N/A, Yes, Yes, Yes
HTTP: No, No, No, Yes, No, No, No, No, Yes

Distributed monitoring with NSClient++
Configuration changes

NRPE
Main configuration change
◦ Allow multiple modules
◦ Allow more configuration
◦ No “NRPE Handler” support
    Replaced by CheckExternalScripts
Compatible
◦ Except for NRPE Handlers
Upgradable
◦ nscp settings --migrate-to ini

Old configuration: NRPE
    [modules]
    NRPEListener.dll
    CheckExternalScripts.dll

    [NRPE]
    port=5666
    allow_arguments=0
    allow_nasty_meta_chars=0
    allowed_hosts=192.168.0.1

    [External Script]
    allow_arguments=0
    allow_nasty_meta_chars=0

    [External Scripts]
    check_es_ok=scripts\ok.bat

    [External Alias]
    alias_cpu=checkCPU warn=80 time=1m

New configuration: NRPE
    [/modules]
    NRPEListener=
    CheckExternalScripts=

    [/settings/NRPE/server]
    port=5666
    allow arguments=0
    allow nasty characters=0
    use ssl=true
    allowed hosts=192.168.0.1
    certificate=${certificate-path}/nrpe_dh_512.pem

    [/settings/external scripts]
    allow arguments=0
    allow nasty characters=0

    [/settings/external scripts/scripts]
    check_es_ok=scripts\ok.bat

    [/settings/external scripts/alias]
    alias_cpu=checkCPU warn=80 time=1m

NSCA
Main changes
◦ Scheduling is a separate module
Main Configuration changes
◦ Schedules are much more configurable
◦ Supports multiple NSCA servers
Compatible
◦ Should be
Upgradable
◦ nscp settings --migrate-to ini

Old configuration: NSCA
    [modules]
    NSCAAgent.dll
    CheckExternalScripts.dll

    [NSCA Agent]
    interval=5
    encryption_method=14
    password=foobar
    nsca_host=192.168.0.1
    nsca_port=5667

    [NSCA Commands]
    cpu=checkCPU warn=80 time=1m
    host_check=check_ok

New configuration: NSCA
    [/modules]
    NSCAClient=

    [/settings/NSCA/client/targets/default]
    host=192.168.0.1
    port=5667
    password=secret
    encryption=none
    time offset=-1h

    [/modules]
    Scheduler=

    [/settings/scheduler/schedules/default]
    channel=NSCA
    interval=5s

    [/settings/scheduler/schedules]
    cpu=checkCPU warn=80 time=1m
    host_check=check_ok

Distributed monitoring with NSClient++
New configuration concepts

Targets
A target defines a “target” host
There is usually a “default” target
There can be any number of targets
Targets can be either local or global
Targets consist of:
◦ host
◦ port
◦ address (=host:port)
◦ alias
◦ parent
◦ And any arbitrary strings required

Targets (sample)
    [/settings/NRPE/client/targets]
    test=192.168.0.1:5666

    [/settings/NRPE/client/targets/foobar]
    address=192.168.0.1:5666
    ssl=false

    [/targets]
    foobar=192.168.0.1:5666

Command Handlers
(Command) handlers define commands
A list of command handlers
◦ <name> = <command line>
Syntax is the “same” (as for nscp.exe)
◦ In the future you will be able to configure these more
    [/settings/NRPE/client/handlers]
    test=query --host 192.168.0.1 --command $ARG1$

Distributed monitoring with NSClient++
Scenarios

NRPE to NSCA proxy
Purpose
◦ Set up checking by proxy
Required components
◦ Scheduler
    Running our checks
◦ NRPEClient
    Execute checks
◦ NSCAClient
    Forward results
Experimentalness
◦ Low

The Concept
NSClient++
(Diagram: Scheduler; NRPEClient sending nrpe over the Network and forwarding the request; NSCAClient sending nsca over the Network)

Config: Schedule commands
    [/modules]
    Scheduler=1

    [/settings/scheduler/schedules/sample]
    channel=NSCA
    alias=system_x_ok
    command=check_r_ok x
    interval=5s

Config: Execute Commands
    [/modules]
    NRPEClient=1

    [/settings/nrpe/client/targets]
    x=nrpe://192.168.0.1:5666

    [/settings/nrpe/client/handlers]
    check_r_ok=query --command check_ok --target $ARG1$

Config: Forward results
    [/modules]
    NSCAClient=1

    [/settings/nsca/client/targets/default]
    host=192.168.0.1
    password=secret
    encryption=none
    time offset=-1h

Testing
    nscp test

Demo?

Eventlog to syslog forwarder
Purpose
◦ Forward eventlog errors to a syslog server
Required components
◦ CheckEventlog
    Running in “active mode”
◦ SyslogClient
    Set up to forward notifications
Experimentalness
◦ Medium

The Concept
(Diagram: NSClient++ with CheckEventlog passing syslog messages to SyslogClient, which sends syslog over the Network)

Config: Listening for events
    [/modules]
    CheckEventLog=1

    [/settings/eventlog/real-time]
    enabled=true
    destination=syslog
    filter=type NOT IN ('success', 'info', 'auditSuccess')
    log=application,system

Config: Listening for events (Short)
    [/modules]
    CheckEventLog=1

    [/settings/eventlog/real-time]
    enabled=true
    destination=syslog

Config: Forward to syslog
    [/modules]
    SyslogClient=1

    [/settings/syslog/client/targets/default]
    facility=kernel
    tag template=NSClient
    message template=%message%
    host=192.168.0.1

Config: Forward to syslog (short)
    [/modules]
    SyslogClient=1

    [/settings/syslog/client/targets]
    default=192.168.0.1

Testing
    nscp eventlog --exec insert
        --source SQLBrowser
        --id 3
        --type warning
        --event-argument a
        --event-argument b
        --facility 78
        --severity error

    49230 = 1100 0000 0100 1110 (error + 78)

    <Event>
      <System>
        <Provider Name="SQLBrowser" />
        <EventID Qualifiers="49230">3</EventID>
        <Level>3</Level>
        <!-- ... -->
      </System>
      <EventData>
        <Data>a</Data>
        <Data>b</Data>
      </EventData>
    </Event>

Demo?
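The annotation on the eventlog testing slide (49230 = 1100 0000 0100 1110, "error + 78") can be checked with a few lines of Python. This is only a sketch: it assumes the Qualifiers value is the upper 16 bits of a standard Windows event identifier, with the severity in the top two bits and the facility in the low twelve. The names SEVERITY, split_qualifiers and build_qualifiers are made up for illustration and are not part of NSClient++.

```python
# Severity codes stored in the top two bits of a Windows event identifier.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def split_qualifiers(qualifiers):
    """Split a 16-bit Qualifiers value into (severity, customer_bit, facility)."""
    severity = (qualifiers >> 14) & 0x3   # bits 15-14
    customer = (qualifiers >> 13) & 0x1   # bit 13
    facility = qualifiers & 0x0FFF        # bits 11-0
    return SEVERITY[severity], customer, facility


def build_qualifiers(severity_code, facility, customer=0):
    """Inverse of split_qualifiers, taking the numeric severity code."""
    return (severity_code << 14) | (customer << 13) | (facility & 0x0FFF)


if __name__ == "__main__":
    print(format(49230, "016b"))     # the bit pattern shown on the slide
    print(split_qualifiers(49230))   # severity "error", facility 78
    print(build_qualifiers(3, 78))   # back to 49230
```

Under that assumption, --severity error and --facility 78 on the command line account for the whole Qualifiers attribute in the resulting event XML, while --id 3 ends up as the EventID value itself.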
Scripting
Purpose
◦ Automatically add Nagios configuration
Required components
◦ PythonScript
    Running the script
◦ NSCAServer
    Responds to submissions
◦ NSCAClient
    Forwards modified submissions
Experimentalness
◦ High

The Concept
(Diagram: send_nsca over the Network to NSCAServer inside NSClient++, then Channel 1 to PythonScript, then Channel 2 to NSCAClient, which sends nsca over the Network)

Configuration: Receive Results
    [/modules]
    NSCAServer=1

    [/settings/nsca/server]
    port=5668
    inbox=channel_1
    encryption=none
    password=secret
    allowed hosts=192.168.0.1,127.0.0.1

Config: Forward results
    [/modules]
    NSCAClient=1

    [/settings/nsca/client]
    channel=channel_2

    [/settings/nsca/client/targets/default]
    host=192.168.0.1
    password=secret
    encryption=none
    time offset=-1h

Configuration: Setup Python
    [/modules]
    PythonScript=1

    [/settings/python/scripts]
    f=forward.py

Writing a Script
    from NSCP import Registry, Core, log, log_error
    import unicodedata

    core = Core.get()

    def process(channel, source, command, code, message, perf):
        # Strip non-ASCII characters before forwarding
        message = unicodedata.normalize('NFKD', message).encode('ascii', 'ignore')
        core.simple_submit('channel_2', command, code, 'PythonEnhanced: %s' % message, perf)

    def init(plugin_id, plugin_alias, script_alias):
        reg = Registry.get(plugin_id)
        reg.simple_subscription('channel_1', process)

    def shutdown():
        pass

Writing a Script
    from NSCP import Registry, Core

    core = Core.get()

    def process(channel, src, cmd, code, msg, perf):
        core.simple_submit('channel_2', cmd, code, 'PythonEnhanced: %s' % msg, perf)

    def init(plugin_id, plugin_alias, script_alias):
        reg = Registry.get(plugin_id)
        reg.simple_subscription('channel_1', process)

    def shutdown():
        pass

Testing
    nscp nsca --exec submit
        --message "Hello World"
        --host 192.168.0.1
        --password secret
        --encryption none

Distributed monitoring with NSClient++
Summary

My vision for the future…
Should be simple, right?
Distributed Monitoring Network!
(Diagram: Internet)

Questions?

Thank You!
michael@medin.name
http://www. .com/in/mickem
http://blog.medin.name/
http://nsclient.org
facebook.com/nsclient
http://nsclient.org/nscp/conferances/osmc/2011/
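Appendix sketch: the forwarding script in the Scripting scenario imports the NSCP module, which is only available inside NSClient++. To follow the channel flow without an agent, the sketch below replaces it with FakeBus, a hypothetical in-memory stand-in that mimics only the two calls used on the slides (simple_subscription and simple_submit). The source name "local", the integer status code and the delivered list are assumptions for illustration; the real objects do much more and are not modelled here.

```python
class FakeBus:
    """Hypothetical in-memory channel bus; NOT part of NSClient++."""

    def __init__(self):
        self.subscribers = {}  # channel name -> list of callbacks
        self.delivered = []    # submissions on channels nobody subscribed to

    def simple_subscription(self, channel, callback):
        # Register a callback for everything submitted to this channel.
        self.subscribers.setdefault(channel, []).append(callback)

    def simple_submit(self, channel, command, code, message, perf):
        handlers = self.subscribers.get(channel, [])
        if not handlers:
            # No subscriber: record it, as if an outgoing client picked it up.
            self.delivered.append((channel, command, code, message, perf))
        for handler in handlers:
            handler(channel, "local", command, code, message, perf)


bus = FakeBus()


def process(channel, source, command, code, message, perf):
    # Same shape as the slide's handler: tag the message, pass it to channel_2.
    bus.simple_submit('channel_2', command, code, 'PythonEnhanced: %s' % message, perf)


def init():
    bus.simple_subscription('channel_1', process)


init()
# Simulate NSCAServer dropping an incoming result into its inbox channel.
bus.simple_submit('channel_1', 'host_check', 0, 'Hello World', '')
print(bus.delivered)
```

The point of the two channels is that the script sits between the receiving side (inbox=channel_1) and the sending side (channel=channel_2), so it can rewrite every result on its way through.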