5 years of vaporware

These slides represent the work and opinions of the author and do not constitute official positions of any organization sponsoring the author’s work. This material has not been peer reviewed and is presented here as-is with the permission of the author. The author assumes no liability for any content or opinion expressed in this presentation and/or use of content herein.
It is not their fault! It is not my fault! It is your fault!

Developer (not manager)
Accidentally ended up in our NOC

2003: The birth of NSClient++
◦ Not working with Nagios
◦ Hated BB, so we migrated to Nagios
◦ NSClient sucked (broke Exchange)
◦ NRPE_NT was too much work
2004: NSClient++ goes open source
◦ “Just for fun”
2007: The rebirth of NSClient++
◦ Got a lot of emails and hits on the web page
2011: The present
◦ 0.3.9 out last May
◦ 0.4.0 out as alpha

Windows Monitoring and NSClient++
◦ Quick Introduction
What’s new in 0.3.9
◦ Disk/File/*
◦ Scheduled Tasks
◦ Aliases
◦ Crash Handling
What’s new in 0.4.0
◦ New core
◦ Unix support
◦ New settings subsystem
◦ New protocol
◦ Python Scripting
The end of NSClient++!
Q/A

Quick Introduction
What is NSClient?
◦ A (pretty old) program: pNSClient
◦ A (pretty limited) protocol: check_nt
◦ A (pretty incorrect) concept: “Windows monitoring”
What is it not?
◦ NSClient++!
NSClient++ was written as a replacement for pNSClient, but it has evolved a lot since then.

NSClient++
◦ Freedom!
  ◦ Custom scripts
  ◦ Decentralized or centralized
  ◦ Active or passive
  ◦ Can monitor “anything” (including your application)
  ◦ Can perform “tasks” (fix your problems)
Other options:
◦ SNMP: generally complex to use and limited on “standard” hardware
◦ pNSClient/NRPE_NT/OpMonAgent/*: old, outdated and usually limited functionality
◦ “Agentless” WMI: limited functionality; enforces centralized and active monitoring
But...
◦ I am biased, so you might not want to take my word for it...

[Table: protocol comparison. Columns: Protocol, Method (active/passive), Encryption, Auth, Payload, Multiple args, Multiple commands, Configuration, Commands, Extensible. Protocols: NSClient, NRPE, NSCA, NRDP, NSCP, MQ, HTTP, DNSCP, check_mk.]

Internals:
◦ C++
◦ Around 75,000 lines of code
◦ Actively developed (unfortunately only by me)
◦ Modularized design (use what you need)
Runs on:
◦ Windows: NT4, W2K, XP, W2K3, Vista, W2K8, x64, x86 …
◦ Unix: Linux/Debian (probably many/most others as well)
Current version:
◦ 0.3.9, with 0.4.0 in beta

Most features require NRPE or NSCA (or NSCP)
Documentation online (wiki)
◦ http://nsclient.org
Not supported by a commercial entity
◦ Donations welcome
◦ Sponsoring available (contact me for details)
Used by a lot of people (I think)
◦ Impossible to estimate any figures
Please help out!
◦ Add documentation
◦ Report problems
◦ Come with ideas, thoughts, etc…

Using NSClient++
NSClient++ is a command line program!
◦ nsclient++ -start (net start nsclientpp)
◦ nsclient++ -stop (net stop nsclientpp)
◦ nsclient++ -test
nsclient++ -test is your friend!
Configuration:
◦ notepad nsc.ini
Testing:
1. Local (nsclient++ -test)
2. From the CLI (check_nrpe ...)
3. From Nagios (add a command)
Works with “anything”
◦ Including many non-Nagios-based systems

New command line syntax!
◦ nscp --service --start
◦ nscp --service --stop
◦ nscp --help
Testing: nscp --test is your friend!
◦ nscp --test
Configuration:
◦ nscp --settings-help
◦ nscp --settings --migrate-to ini
◦ nscp --settings --set …
◦ …
Run scripts:
◦ nscp --client --module PythonScript --command execute-and-load-python --script test.py --install

Overview
Major simplification of the disk/file checker
◦ CheckFile (removed)
◦ CheckFile2 (deprecated)
◦ CheckFiles (replaces the above)
Volume support (for real this time)
Aliases
NSCA/NRPE enhancements
Scheduled task checks
Crash handling
A bunch of new commands
Bug fixes and many more things…
We have recruited a new member to the team!
◦ A girl, actually…
◦ …still a bit wet behind the ears…

CheckFile(1,2,s,…)
The good:
◦ Powerful interface!
◦ Simple to use!
◦ Out-of-the-box solution! (on which you can expand)
The bad:
◦ Nothing! Really, I mean it!
…and then… yesterday…
◦ …in the bar…
◦ …all hopes shattered…
◦ …apparently it is still too complicated…

Same filter language as was introduced for the event log last year, based on SQL WHERE clauses:
◦ generated > -2d AND severity = 'error'
◦ size > 5k
◦ size > 5k OR size < 1k
◦ size > 5k AND written > -2d
◦ (size > 5k OR size < 1k) AND written > -2d
◦ …

Keyword       Description
filename      Name of the file
path          Path of the file
size          Size of the file
accessed      When the file was last accessed
written       When the file was last written
creation      When the file was created
version       The exe file version (slow)
line_count    Number of lines in the file (slow)

Operator   Safe form   Meaning
=          eq          Equality
!=         ne          Not equal
>          gt          Greater than
<          lt          Less than
=>         ge          Greater than or equal
=<         le          Less than or equal
like                   String similarity (substring matching)
not like               Opposite of like
regexp                 Regular expression matching

Option          Description
path            The root path to use
pattern         The file pattern to use
filter          The filter to apply (there can only be one)
warn            How many hits constitute a warning state, e.g. warn=>5, warn==5, warn=!=5
crit            How many hits constitute a critical state
truncate        Length of the returned data; important because NRPE/NSCA have a limited capacity
                (truncate will be deprecated in 0.4.0)
syntax          How to format the returned data
master-syntax   How to format the “message string”
debug=true      Displays a lot more information in the log file/console

CheckDriveSize … CheckAll=volumes …
Other new features
◦ Added a new option to ignore drives that are not readable (like the Office 2010 Q: drive): ignore-unreadable
◦ Added magic modifiers (from check_mk): magic=0.7

Scheduled Tasks
Works the “same” as CheckEventLog
◦ “filter=exit_code ne 0”
Two modules:
◦ CheckTaskSched.dll
  ◦ Works on Windows NT4 and beyond
  ◦ But cannot check “new” tasks (from Vista and beyond)
◦ CheckTaskSched2.dll
  ◦ Works on Windows Vista and beyond
  ◦ Has fewer filter keywords

Keyword               Description
title                 Task name
application           The application
comment               The comment for the work item
parameters            The command-line parameters of the task
working_directory     The working directory of the task
exit_code             The last exit code returned by the executable associated with the work item on its last run
max_run_time          The maximum length of time the task can run
status                The status of the work item; possible values include ready, running, not_scheduled, has_not_run, disabled, has_more_runs, no_valid_triggers
most_recent_run_time  The most recent time the work item began running

CheckTaskSched "filter=exit_code ne 0" "syntax=%title%: %exit_code%" warn=>0
WARNING:test.job (1)
CheckTaskSched "filter=status = 'running' AND most_recent_run_time < -30m" "syntax=%title% (%most_recent_run_time%)" warn=>0
WARNING:test.job (2011-02-10 23:14:35)

Aliases
System
◦ alias_cpu: CPU load over the past 5 minutes, 80/90% bounds
◦ alias_cpu_ex: CPU load over the past 5 minutes, custom bounds
◦ alias_mem: Memory utilization (all), 80/90% bounds
◦ alias_mem_ex: Memory utilization (all), custom bounds
◦ alias_up: System uptime
Disk/Drive
◦ alias_disk: All fixed drives
◦ alias_disk_loose: All fixed drives, ignoring any problematic drives
◦ alias_volumes: All volumes
◦ alias_volumes_loose: All volumes, ignoring any problematic drives
◦ alias_file_size: Check the size of a given file (filename, size)
◦ alias_file_age: Check the age of a given file
Eventlog
◦ alias_event_log: Check for errors in the event log
Scheduled Tasks
◦ alias_sched_all: No scheduled jobs have failed
◦ alias_sched_long: No task has been running for longer than a given time
◦ alias_sched_task: Check whether a given task succeeded
Misc
◦ alias_updates: Check that updates are applied
Processes
◦ alias_service: All services in a “sensible state”
◦ alias_service_ex: All services in a “sensible state” (excluding various services)
◦ alias_process: A process must be running
◦ alias_process_stopped: A process must not be running
◦ alias_process_count: A process must not have more than X instances
◦ alias_process_hung: A process must not be hung

Crash Handling
Using Google Breakpad
◦ Same as Google Chrome, Mozilla Firefox, etc.
Three options (not mutually exclusive):
1. Send crash dumps to crash.nsclient.org
   ◦ The server can be changed if you want to use an internal server or a proxy server.
2. Store crash dumps for analysis
   ◦ These will also be checked by check_nscp
3. Restart the service

[crash]
restart=1
service_name=nsclientpp
submit=0
url=http://crash.nsclient.org/submit
archive=1
#folder=<appfolder>/dumps

Miscellaneous Fixes
NSCA
◦ Fixed problems with sending “many” results back
NRPE
◦ Added support for large payloads
Checks
◦ Added check_nscp to check the health of NSClient++
◦ Added a new check for running other checks “with a timeout”
◦ Added a new negate check (to negate the result of another check)
All filters (read CheckEventLog et al.)
◦ Many fixes and additions (regular expressions)
Process checks
◦ Added support for checking whether processes have “hung”
Performance data
◦ Added it to many places where it was intermittently missing before

What’s to come?
0.3.9
◦ Last 0.3.x
0.4.0
◦ Core switch
◦ Linux support
◦ Distributed monitoring (v1)
0.4.1
◦ Monitoring Kits
◦ Bugfixes
0.4.2
◦ New Windows check subsystem
◦ True passive checks
0.4.3
◦ Distributed monitoring (v2)
◦ Bugfixes

Overview
Brand new core based upon libraries
◦ Things should *work*, not just “work”
◦ More modular and extensible
Unix support
◦ Both as a client and as a server
New settings subsystem
◦ Registry, improved ini support, http, etc.
New protocol
◦ NSCP (HTTP(S), MQ, native)
Distributed monitoring
◦ Many new things in this area (including MQ)
Python scripting
◦ Primary goal (for me) is to create unit tests
Updated installer
◦ WiX 3.5, more customizable

“Monitoring Kits”
◦ Monitoring solutions for “standard things”
New Windows check subsystem
◦ More modern and less arcane (no NT4 support)
◦ Remote checking
.Net plugin support
◦ Possibly internal VBA scripting support
Metrics cache and aggregation
◦ Lightweight version of CEP (complex event processing)
◦ “crit=cpu > 80% AND transactions_per_sec < 10”
Filter-like API (in addition to options)
◦ “warn=any drive > 90% OR c: > 80%”
Remote updates/upgrades
◦ Allow NSCP to upgrade itself
“Port” of the “standard plugins”?
◦ Run your favorite check_xxx from inside NSClient++
Unix plugins?
◦ Run CheckCPU on Unix machines?
Client/web interface?
◦ A nice little program (systray)
Let me know what you would like to see!

Brand new core
This is why it took so long to make
◦ Merging each new version took forever!
New internal protocol
◦ Removed all internal “limits” (think buffer sizes)
◦ Allows many new features
◦ Allows much more advanced internal scripts
◦ Allows for “non-NRPE-based checks”
A lot of new bugs?
◦ This is the scary part (for me), but my testing has shown it to be very stable

Unix support
Good question…
◦ Since no one seems to like programming on Windows, I brought NSClient++ to “Unix”
◦ Because I can
With the new core comes portability
So perhaps the better question is: why not?
It will NOT be supported for some time, though
◦ Unless someone wants to help out

New Settings
Hierarchical settings subsystem
◦ [/settings/NRPE/server]
◦ allow arguments=false
Instead of
◦ [NRPE Server]
◦ allow_arguments=false
Why did I do this?
◦ Because it was fun
◦ The number of options has started to explode
◦ Simpler to use the registry (as well as XML?)
Since settings have “URLs”:
◦ old://${exe-path}/nsc.ini
◦ ini://${base-path}/nsclient.ini
◦ registry://HKEY_LOCAL_MACHINE/software/NSClient++
◦ http://my.central.server/config/${hostname}.ini
Allows extensions (not via plugins, though)
◦ Maybe in the future: lua://${base-path}/config.lua, python://${base-path}/config.py
You can mix and match:
◦ ini://${base-path}/nsclient.ini
  ◦ can “include” registry://HKEY_LOCAL_MACHINE/software/NSClient++
    ◦ which in turn includes http://conf.server/${hostname}.conf

Ability to load the same plugin twice
Normal (the default alias is python):
    [/modules]
    PythonScript=
    [/settings/python/scripts]
    test.py
Multiple modules (define two aliases, foo and bar):
    [/modules]
    foo=PythonScript
    bar=PythonScript
    [/settings/foo/scripts]
    test1.py
    [/settings/bar/scripts]
    test2.py

It depends…
◦ If you are “still” using check_nt: probably not
◦ If you are using NSCA: maybe not
◦ If you want to use all the new features: yes
How do I change?
◦ It is pretty simple…
◦ nscp --settings --migrate-to ini
◦ (or) nscp --settings --migrate-to registry

New protocol
[Diagram: a Nagios server and a Windows computer separated by a firewall. With NRPE, every check (CPU, Disk, Mem, ...) forks its own check_nrpe; with NSCP, a single check_nscp carries them all to NSClient++.]
◦ Allows more than one command to be sent
◦ Used internally for plugins
◦ Supports both passive and active checks
◦ Supports configuration, management, etc…
◦ Extensible
But will also support:
◦ Multiple locales (based on UTF)
◦ Unlimited payloads (soft configurable)
◦ Real performance data (not strings)

Distributed monitoring
[Diagram: agents (NSCA, SYSLOG, XXX) and servers (NSCA, NRPE, SysLog, XXX) connected through command brokers and event brokers, together with a scheduler, real-time plugins, check_nrpe, and checks such as CheckCPU and CheckEventLog.]
An extension of the passive checks
◦ “Something” can send notification events
◦ “Something” can receive notification events
◦ Agents can forward notification events
◦ Replaces the NSCAListener module
Supports routing
Not a one-to-one mapping
◦ Multiple consumers
◦ Multiple producers
Allows
◦ Passive plugins (other than the built-in NSCA)
◦ Script- and rule-based routing

Python scripting
Built-in Python scripting
Has full API support
◦ Can build “modules” in Python
◦ Can access settings
◦ Can do “anything”
Primarily used by me for unit testing
Requires a working Python install

The King is dead, long live the King! (Le Roi est mort, vive le Roi!)
0.4.x (ish) will be the last “Windows” monitoring agent
The idea is to make it more:
◦ A platform/client/server for distributed monitoring
  ◦ Regardless of OS/system
  ◦ Regardless of monitoring solution
Don’t worry…
◦ It will still work just fine as a “Windows monitoring agent”
◦ But in addition to this, you will be able to do more.

Questions?
michael@medin.name
http://www.linkedin.com/in/mickem
http://nsclient.org
Facebook: facebook.com/nsclient
http://nsclient.org/nscp/conferances/2011/nwcna/
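The CheckFiles filter language (SQL WHERE-style clauses) and the warn/crit hit-count options described earlier can be illustrated with a small sketch. This is a Python model of the semantics, not NSClient++'s own C++ implementation, and it rests on several assumptions: k/m/g sizes are powers of 1024, a relative time such as -2d means “now minus two days”, a bound such as warn=>5 means “more than 5 hits”, AND binds tighter than OR, and parentheses are not supported. FileInfo, check_files and the other names are hypothetical.

```python
import operator
import re
import time
from dataclasses import dataclass


# Hypothetical file record; field names mirror the CheckFiles keywords.
@dataclass
class FileInfo:
    filename: str
    size: int        # bytes
    written: float   # last write time, seconds since the epoch


# Assumed unit multipliers (the real agent may use different ones).
SIZE_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
TIME_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# Symbolic operators and their "safe" word forms from the operator table.
OPS = {
    "=": operator.eq, "eq": operator.eq,
    "!=": operator.ne, "ne": operator.ne,
    ">": operator.gt, "gt": operator.gt,
    "<": operator.lt, "lt": operator.lt,
    "=>": operator.ge, "ge": operator.ge,
    "=<": operator.le, "le": operator.le,
}

# One clause: keyword, operator, value, e.g. "size > 5k" or "size gt 5k".
CLAUSE = re.compile(r"(\w+)\s*(!=|=>|=<|=|>|<|eq|ne|gt|lt|ge|le)\s*(\S+)")


def parse_value(key, text, now):
    """Convert a literal such as '5k' or '-2d' into a comparable value."""
    if key == "size":
        m = re.fullmatch(r"(\d+)([kmg]?)", text.lower())
        if m is None:
            raise ValueError(f"bad size: {text!r}")
        return int(m.group(1)) * SIZE_UNITS[m.group(2)]
    if key == "written":
        # Assumed relative time: "-2d" becomes the timestamp two days ago.
        m = re.fullmatch(r"(-?\d+)([smhd])", text.lower())
        if m is None:
            raise ValueError(f"bad time: {text!r}")
        return now + int(m.group(1)) * TIME_UNITS[m.group(2)]
    return text.strip("'")  # plain string, e.g. a filename


def clause_matches(info, clause, now):
    m = CLAUSE.fullmatch(clause.strip())
    if m is None:
        raise ValueError(f"cannot parse clause: {clause!r}")
    key, op, raw = m.groups()
    return OPS[op](getattr(info, key), parse_value(key, raw, now))


def filter_matches(info, expr, now):
    """OR of AND-groups: AND binds tighter; no parentheses in this sketch."""
    return any(
        all(clause_matches(info, c, now) for c in re.split(r"\s+AND\s+", group))
        for group in re.split(r"\s+OR\s+", expr.strip())
    )


def bound_hit(hits, bound):
    """Test a bound such as '>5', '=5' or '!=5' (the text after 'warn=')."""
    m = re.fullmatch(r"(!=|=|>|<)(\d+)", bound)
    if m is None:
        raise ValueError(f"bad bound: {bound!r}")
    return OPS[m.group(1)](hits, int(m.group(2)))


def check_files(files, expr, warn=None, crit=None, now=None):
    """Return (Nagios state, matching files): 0=OK, 1=WARNING, 2=CRITICAL."""
    now = time.time() if now is None else now
    hits = [f for f in files if filter_matches(f, expr, now)]
    if crit and bound_hit(len(hits), crit):
        return 2, hits
    if warn and bound_hit(len(hits), warn):
        return 1, hits
    return 0, hits
```

For example, check_files(files, "size > 5k AND written > -2d", warn=">0") returns state 1 (WARNING) as soon as one file larger than 5 KiB was written within the last two days, in the same spirit as warn=>0 in the CheckTaskSched examples. Counting hits first and only then comparing the count against the bounds keeps the filter and the thresholds independent of each other.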