Distributed Monitoring

NSClient++: Distributed Monitoring
Adventures into the unknown
Disclaimer!

These slides represent the work and opinions of the author and do not constitute official positions of any organization sponsoring the author’s work.
This material has not been peer reviewed and is presented here as-is with the permission of the author.
The author assumes no liability for any content or opinion expressed in this presentation and/or use of content herein.
My Background

Developer (not system manager)
◦ Not working with Nagios

Accidentally ended up in our NOC
◦ Hated BB (Big Brother), so we migrated to Nagios

2003: The birth of NSClient++
◦ NSClient/NRPE_NT was not to my liking

2004: NSClient++ goes open source
◦ “just for fun”

2007: The rebirth of NSClient++
◦ Got a lot of emails and hits on the webpage

2011: The Present
◦ 0.3.9 out last May
◦ 0.4.0 RC out
Agenda

About
◦ NSClient++

Distributed monitoring
◦ About
◦ Concepts
◦ Protocols

Distributed monitoring with NSClient++
◦ Configuration Changes
◦ Configuration Concepts
◦ Scenarios

Q/A
NSClient++
Quick Introduction
About NSClient++

Internals:
◦ C++
◦ Around 75,000 lines of code
◦ Actively developed (unfortunately only by me)
◦ Modularized design (use what you need)

Runs on:
◦ Windows: NT4, w2k, XP, w2k3, Vista, w2k8, X64, X86 …
◦ Unix: Linux/Debian (probably many/most others as well)

Current Version:
◦ 0.3.9 with 0.4.0 in beta

Most features require NRPE or NSCA (or NSCP)

Documentation online (WIKI)
◦ http://nsclient.org
About NSClient++ (cont.)

Not supported by a commercial entity
◦ Donations welcome
◦ Sponsoring available (contact me for details)

Used by a lot of people (I think)
◦ Impossible to estimate any figures

Please help out!
◦ Add documentation
◦ Report problems
◦ Share ideas, thoughts, etc.
◦ Tell me what sucks!
Why should you use NSClient++

NSClient++
◦ Freedom!
   ▪ Custom scripts
   ▪ Decentralized or centralized
   ▪ Active or passive
   ▪ Can monitor “anything” (including your application)
   ▪ Can perform “tasks” (fix your problems)

Other options:
◦ SNMP
   ▪ Generally complex to use and limited on “standard” hardware
◦ pNSClient/NRPE_NT/OpMonAgent/*
   ▪ Old, outdated and usually limited functionality
◦ “Agentless” WMI
   ▪ Limited functionality
   ▪ Enforces centralized and active monitoring

But...
◦ I am biased, so you might not want to take my word for it...
Thank you!
Using NSClient++ (0.4.0)

New command line syntax!
◦ nscp <context> [<options>]
◦ nscp service --start
◦ nscp help

Testing
◦ nscp test

Configuration:
◦ nscp settings --help
◦ nscp settings --migrate-to ini
◦ nscp settings --set …

Run scripts:
◦ nscp python --exec run --script test.py
◦ nscp nrpe --query command --host 192.168.0.1
Roadmap (rough)
0.3.9
• Last 0.3.x
0.4.0
• Core switch
• Linux support
• Distributed Monitoring (v1)
0.4.1
• Monitoring Kits
• Bugfixes
0.4.2
• New windows check subsystem
• True passive checks
0.4.3
• Distributed Monitoring (v2)
• Bugfixes
What’s new 0.4.0

Brand new core based upon libraries
◦ Things should *work*, not just “work”
◦ Even more modular and extensible
◦ Both as a client and server

Unix support

New settings subsystem
◦ Registry, improved ini support, http, etc.

New protocol
◦ NSCP (HTTP(s), MQ, Native)
◦ Built for distributed monitoring

Distributed monitoring

Python scripting
◦ Allows (for me) unit testing

Updated installer
◦ Wix 3.5, more customizable

Many many more things
Distributed Monitoring
Introduction
Distributed monitoring

What is it?
◦ Passive checks?
◦ Clustered Nagios?
◦ Nagios front ends?
◦ Distributed Nagios?
◦ Distributed Check “thingies”?
◦ …

For me:
◦ The ability to distribute check metrics across the network!

So… uhmm…
Should be simple, right?
(Diagram: Internet)
Distributed Monitoring
Concepts
Requirements

Support
◦ All paradigms
◦ Any size payload
◦ Multiple commands
◦ Multiple arguments

Security
◦ Encryption
◦ Strong authentication

Extensible
◦ No transport restrictions
◦ Allow customization

Firewall friendly (HTTP?)

…
3 main paradigms

Query
◦ Ask the status of something
◦ (Sometimes called Active)

Notification
◦ Send a notification to someone
◦ (Sometimes called Passive, Message or Channel)

Command
◦ Tell someone what to do

But also:
◦ Configuration
◦ Installation
◦ Upgrade
◦ Push/pull file (scripts)
Query

Modeled on NRPE
The normal (active) scenario in Nagios/Icinga

Are you ok?

◦ Yes / no (or rather: OK, WARN, CRIT, UNKNOWN)
Query
(Diagram: check_nrpe → Network → NRPEServer in NSClient++ → forward request → CheckEventlog)
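The Query paradigm is modeled on NRPE, and NRPE's fixed-size command buffer is why big payloads are listed as unsupported later on. As an illustration, here is a sketch that builds and checks an NRPE v2 query packet in Python. The layout (big-endian version, type, CRC32, result code, 1024-byte buffer, 2 padding bytes; arguments joined with `!`) is my understanding of the classic NRPE v2 wire format, not something these slides define. `build_query` and `parse_packet` are hypothetical helper names, and the SSL layer NRPE normally wraps around the packet is left out.

```python
import struct
import zlib

# Assumed NRPE v2 packet layout, big-endian ("network byte order"):
#   int16 version | int16 type | uint32 CRC32 | int16 result code |
#   char buffer[1024] | 2 bytes of struct padding
NRPE_FORMAT = "!hhIh1024s2x"
NRPE_PACKET_SIZE = struct.calcsize(NRPE_FORMAT)  # 1036 bytes
NRPE_VERSION_2 = 2
QUERY_PACKET = 1
RESPONSE_PACKET = 2


def _crc_with_zeroed_field(packet):
    """CRC32 over the whole packet with the CRC field (bytes 4..7) set to zero."""
    return zlib.crc32(packet[:4] + b"\0\0\0\0" + packet[8:]) & 0xFFFFFFFF


def build_query(command, *args):
    """Build a query packet; command and arguments are joined with '!'."""
    payload = "!".join((command,) + args).encode("ascii")
    if len(payload) >= 1024:
        # The fixed, NUL-terminated 1024-byte buffer is the payload limit.
        raise ValueError("command line does not fit in the 1024-byte buffer")
    unsigned = struct.pack(NRPE_FORMAT, NRPE_VERSION_2, QUERY_PACKET, 0, 0, payload)
    crc = _crc_with_zeroed_field(unsigned)
    return unsigned[:4] + struct.pack("!I", crc) + unsigned[8:]


def parse_packet(packet):
    """Verify size and CRC, then return (packet_type, result_code, text)."""
    if len(packet) != NRPE_PACKET_SIZE:
        raise ValueError("unexpected packet size")
    _version, ptype, crc, code, buf = struct.unpack(NRPE_FORMAT, packet)
    if crc != _crc_with_zeroed_field(packet):
        raise ValueError("CRC mismatch")
    return ptype, code, buf.split(b"\0", 1)[0].decode("ascii")


packet = build_query("check_cpu", "80", "90")
print(len(packet))            # 1036
print(parse_packet(packet))   # (1, 0, 'check_cpu!80!90')
```

Every query costs a full 1036-byte packet regardless of how short the command is, and nothing longer than 1023 bytes fits.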
Submission

Modeled on NSCA
The passive scenario in Nagios/Icinga

I am (not) ok!

◦ (or rather: OK, WARN, CRIT, UNKNOWN)
Submission
(Diagram: send_nsca → Network → NSCAServer in NSClient++; CheckEventlog → channel → NSCAClient → nsca → Network)
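Submission is modeled on NSCA. In the same spirit, here is a sketch of an NSCA v3 data packet with encryption set to none: the host name, service description and plugin output are fixed 64/128/512-byte buffers, which is where the small payload comes from. The field layout, and the rule that the timestamp is taken from the server's initial handshake packet, reflect my understanding of NSCA 2.x rather than anything in these slides. `build_submission` is a hypothetical name; encryption, the handshake and the socket I/O are omitted.

```python
import struct
import zlib

# Assumed NSCA v3 data packet layout, big-endian:
#   int16 version | 2 pad bytes | uint32 CRC32 | uint32 timestamp | int16 return code |
#   char host_name[64] | char svc_description[128] | char plugin_output[512] | 2 pad bytes
NSCA_FORMAT = "!h2xIIh64s128s512s2x"
NSCA_PACKET_SIZE = struct.calcsize(NSCA_FORMAT)  # 720 bytes
NSCA_VERSION_3 = 3
FIELD_LIMITS = (64, 128, 512)  # host, service, plugin output


def build_submission(host, service, return_code, output, timestamp):
    """Build an unencrypted ("encryption=none") NSCA v3 data packet.

    `timestamp` should be the value the server sent in its handshake packet;
    servers may reject packets that are too old.
    """
    fields = [text.encode("ascii") for text in (host, service, output)]
    for value, limit in zip(fields, FIELD_LIMITS):
        if len(value) >= limit:  # each field is a fixed, NUL-terminated buffer
            raise ValueError("field longer than %d bytes" % (limit - 1))
    unsigned = struct.pack(NSCA_FORMAT, NSCA_VERSION_3, 0, timestamp, return_code, *fields)
    # CRC32 over the whole packet while the CRC field (bytes 4..7) is still zero
    crc = zlib.crc32(unsigned) & 0xFFFFFFFF
    return unsigned[:4] + struct.pack("!I", crc) + unsigned[8:]


packet = build_submission("web01", "host_check", 0, "OK: all good", 1318000000)
print(len(packet))  # 720
```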
Command

Event handlers (but more)
Normally implemented via NRPE (ish)
Restart service X, run script Y
Technically similar to Query
Command
(Diagram: check_nrpe → Network → NRPEServer in NSClient++ → forward request → CheckEventlog)
Distributed Monitoring
Protocols
Distributed via NRPE?

Does not support:
◦ Passive
◦ Encryption (for real)
◦ Authentication
◦ Big payloads
◦ Multiple commands
◦ Firewall friendly protocol (i.e. HTTP)
◦ …

But we can still use it
◦ Inside the network
◦ Where big payloads are not required
◦ When we only need active monitoring
Distributed via NSCA?

Does not support:
◦ Active
◦ Big payloads
◦ Firewall friendly protocol (i.e. HTTP)
◦ …

But we can still use it
◦ Inside the network
◦ Where big payloads are not required
◦ When we only need passive monitoring
Distributed via NRDP?

Does not support
◦ Active
◦ Strong authentication
◦ …

But we can still use it
◦ When we only need passive monitoring
Distributed via SSH?

Does not support:
◦ Passive (?)
◦ Not very Windows friendly
◦ Not very firewall friendly
◦ Cumbersome to manage certificates
◦ …

But we can still use it
◦ On our *nix machines
Distributed via NSCP?

Does not support:
◦ Firewall friendly protocol (i.e. HTTP)
   ▪ Will come in next major release
◦ Experimental

But we can still use it
◦ When we want to play around
◦ Inside the network (currently)
Other options

Distributed NSCP
◦ No encryption support
◦ Not firewall friendly protocol (i.e. HTTP)

Syslog
◦ Not really good for metrics/status
◦ No support for active checks
◦ Not firewall friendly protocol (i.e. HTTP)

SMTP
◦ Not practical
◦ No support for active checks
   ▪ Will come in next major release
◦ Not real time
◦ Firewall friendly?
Summary of protocols
Protocol   Paradigm  Encryption  Multiple arguments  Multiple commands  HTTP
NSClient   Active    No          Yes                 No                 No
NRPE       Active    No          Yes                 No                 No
NSCA       Passive   Yes         Yes                 Yes                No
NSCP       All       Yes         Yes                 Yes                Yes
D-NSCP     MQ        No          Yes                 Yes                No
Syslog     ?         No          N/A                 N/A                No
SMTP       ?         Yes         Yes                 Yes                No
check_mk   Active    ?           No                  Yes                No
NRDP       Passive   Yes         Yes                 Yes                Yes
Auth / Payload (row order unclear in the source): Yes, No, Yes, No, 1024, 512, ∞, ∞, 1024, ∞, ∞, Yes, Yes, No, ?, No, Yes, ∞
Distributed monitoring
with NSClient++
Configuration changes
NRPE

Main configuration changes
◦ Allow multiple modules
◦ Allow more configuration
◦ No “NRPE Handler” support
   ▪ Replaced by CheckExternalScripts

Compatible
◦ Except for NRPE Handlers

Upgradable
◦ nscp settings --migrate-to ini
Old configuration: NRPE
[modules]
NRPEListener.dll
CheckExternalScripts.dll
[NRPE]
port=5666
allow_arguments=0
allow_nasty_meta_chars=0
allowed_hosts=192.168.0.1
[External Script]
allow_arguments=0
allow_nasty_meta_chars=0
[External Scripts]
check_es_ok=scripts\ok.bat
[External Alias]
alias_cpu=checkCPU warn=80 time=1m
New configuration: NRPE
[/modules]
NRPEListener=
CheckExternalScripts=
[/settings/NRPE/server]
port=5666
allow arguments=0
allow nasty characters=0
use ssl=true
allowed hosts=192.168.0.1
certificate=${certificate-path}/nrpe_dh_512.pem
[/settings/external scripts]
allow arguments=0
allow nasty characters=0
[/settings/external scripts/scripts]
check_es_ok=scripts\ok.bat
[/settings/external scripts/alias]
alias_cpu=checkCPU warn=80 time=1m
NSCA

Main changes
◦ Scheduling is a separate module

Main Configuration changes
◦ Schedules are much more configurable
◦ Supports multiple NSCA servers

Compatible
◦ Should be

Upgradable
◦ nscp settings --migrate-to ini
Old configuration: NSCA
[modules]
NSCAAgent.dll
CheckExternalScripts.dll
[NSCA Agent]
interval=5
encryption_method=14
password=foobar
nsca_host=192.168.0.1
nsca_port=5667
[NSCA Commands]
cpu=checkCPU warn=80 time=1m
host_check=check_ok
New configuration: NSCA
[/modules]
NSCAClient=
Scheduler=
[/settings/NSCA/client/targets/default]
host=192.168.0.1
port=5667
password=secret
encryption=none
time offset=-1h
[/settings/scheduler/schedules/default]
channel=NSCA
interval=5s
[/settings/scheduler/schedules]
cpu=checkCPU warn=80 time=1m
host_check=check_ok
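In the new NSCA configuration, the entries under `[/settings/scheduler/schedules]` carry only a command, while `channel` and `interval` sit in the `default` schedule. A sketch of how such defaults could be applied, assuming (my reading of the slide) that every schedule inherits the `default` settings and that durations use `s`/`m`/`h`/`d` suffixes. `parse_duration`, `expand_schedules` and `run_times` are hypothetical helpers, not NSClient++ code.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Turn strings such as "5s", "1m" or "-1h" into seconds (bare number = seconds)."""
    match = re.fullmatch(r"\s*(-?\d+)\s*([smhd]?)\s*", text)
    if match is None:
        raise ValueError("unrecognised duration: %r" % text)
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit or "s"]


def expand_schedules(default, schedules):
    """Combine each alias=command entry with the settings of the default schedule."""
    expanded = {}
    for alias, command in schedules.items():
        entry = dict(default)          # inherit channel, interval, ...
        entry["command"] = command
        entry["interval"] = parse_duration(entry["interval"])
        expanded[alias] = entry
    return expanded


def run_times(interval, horizon):
    """Seconds (from start) at which a schedule with this interval fires within `horizon`."""
    return list(range(0, horizon, interval))


jobs = expand_schedules({"channel": "NSCA", "interval": "5s"},
                        {"cpu": "checkCPU warn=80 time=1m", "host_check": "check_ok"})
print(jobs["cpu"])
print(run_times(jobs["cpu"]["interval"], 15))  # [0, 5, 10]
```

The same duration parser also covers values such as `time offset=-1h`.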
Distributed monitoring
with NSClient++
New configuration concepts
Targets

A target defines a “target” host
There is usually a “default” target
There can be any number of targets
Targets can be either local or global
Targets consist of:
◦ Host
◦ Port
◦ Address (=host:port)
◦ Alias
◦ Parent
◦ Any other arbitrary strings required
Targets (sample)
[/settings/NRPE/client/targets]
test=192.168.0.1:5666
[/settings/NRPE/client/targets/foobar]
address=192.168.0.1:5666
ssl=false
[/targets]
foobar=192.168.0.1:5666
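The sample mixes three ways of writing a target: the short `name=host:port` form, a long-form section with extra keys such as `ssl`, and the global `[/targets]` list; a later slide also uses `nrpe://host:port`. The sketch below shows one way such targets could be normalized. The lookup order (module-local targets before global ones) and the meaning of `parent` (inherit the parent's keys, then override) are assumptions, and `parse_address`/`resolve_target` are hypothetical names.

```python
def parse_address(address):
    """Split "host:port" or "scheme://host:port" into (scheme, host, port).

    IPv6 literals are not handled in this sketch.
    """
    scheme = None
    if "://" in address:
        scheme, address = address.split("://", 1)
    host, sep, port = address.rpartition(":")
    if not sep:                       # no port given
        return scheme, address, None
    return scheme, host, int(port)


def _lookup(name, local_targets, global_targets):
    """Find a target, preferring module-local definitions (assumption)."""
    for table in (local_targets, global_targets):
        if name in table:
            raw = table[name]
            # Short form "name=host:port" vs. long form (a section of key/values)
            return {"address": raw} if isinstance(raw, str) else dict(raw)
    raise KeyError("unknown target: %s" % name)


def _raw_options(name, local_targets, global_targets):
    """Merge a target's keys over its parent's keys (assumed meaning of "parent")."""
    options = _lookup(name, local_targets, global_targets)
    parent = options.pop("parent", None)
    if parent:                         # note: no cycle detection
        merged = _raw_options(parent, local_targets, global_targets)
        merged.update(options)
        return merged
    return options


def resolve_target(name, local_targets, global_targets):
    """Return the target's options with scheme/host/port split out of "address"."""
    options = _raw_options(name, local_targets, global_targets)
    if "address" in options:           # "address" wins over separate host/port keys
        scheme, host, port = parse_address(options["address"])
        options.update(scheme=scheme, host=host, port=port)
    return options


nrpe_targets = {"foobar": {"address": "192.168.0.1:5666", "ssl": "false"}}
global_targets = {"x": "nrpe://192.168.0.1:5666"}
print(resolve_target("foobar", nrpe_targets, global_targets)["port"])  # 5666
```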
Command Handlers

(Command) handlers define commands
A list of command handlers
◦ <name> = <command line>

Syntax is the “same” as for nscp.exe
◦ In the future you will be able to configure these further
[/settings/NRPE/client/handlers]
test=query --host 192.168.0.1 --command $ARG1$
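A handler turns a short command plus `$ARGn$` placeholders into a full nscp-style command line. A minimal sketch of that expansion, assuming Nagios-style `$ARG1$`, `$ARG2$` numbering and shell-like word splitting (NSClient++'s own parsing may differ); `expand_handler` is a hypothetical name. Quoting each substituted argument keeps it a single token even if it contains spaces.

```python
import re
import shlex

_ARG_PATTERN = re.compile(r"\$ARG(\d+)\$")


def expand_handler(template, args):
    """Substitute $ARG1$, $ARG2$, ... and split the result into an argument list."""
    def substitute(match):
        index = int(match.group(1)) - 1
        if 0 <= index < len(args):
            return shlex.quote(args[index])  # keep each argument a single token
        return ""                             # missing arguments expand to nothing
    return shlex.split(_ARG_PATTERN.sub(substitute, template))


handlers = {"test": "query --host 192.168.0.1 --command $ARG1$"}
print(expand_handler(handlers["test"], ["check_ok"]))
# ['query', '--host', '192.168.0.1', '--command', 'check_ok']
```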
Distributed monitoring
with NSClient++
Scenarios
NRPE to NSCA proxy

Purpose
◦ Set up checking by proxy

Required components
◦ Scheduler
   ▪ Running our checks
◦ NRPEClient
   ▪ Execute checks
◦ NSCAClient
   ▪ Forward results

Experimentalness
◦ Low
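The Concept
(Diagram: inside NSClient++, the Scheduler triggers NRPEClient, which forwards the request over the network via nrpe; results travel on the nsca channel to NSCAClient, which submits them over the network via nsca)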
Config: Schedule commands
[/modules]
Scheduler=1
[/settings/scheduler/schedules/sample]
channel=NSCA
alias=system_x_ok
command=check_r_ok x
interval=5s
Config: Execute Commands
[/modules]
NRPEClient=1
[/settings/nrpe/client/targets]
x=nrpe://192.168.0.1:5666
[/settings/nrpe/client/handlers]
check_r_ok=query --command check_ok --target $ARG1$
Config: Forward results
[/modules]
NSCAClient=1
[/settings/nsca/client/targets/default]
host=192.168.0.1
password=secret
encryption=none
time offset=-1h
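The proxy wires three modules together through a channel: the Scheduler publishes each result to the channel named in its schedule (`channel=NSCA`), and NSCAClient picks it up from there. A toy publish/subscribe router illustrating that idea; `ChannelRouter` and the lambda subscriber are hypothetical stand-ins for the real modules, and the network side (NRPE query, NSCA submission) is stubbed out.

```python
from collections import defaultdict


class ChannelRouter:
    """Toy publish/subscribe hub: producers publish results to a named channel,
    and every callback subscribed to that channel receives them."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, alias, code, message):
        """Deliver a result to all subscribers; returns how many received it."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(channel, alias, code, message)
        return len(callbacks)


# Stand-in for NSCAClient: collect what would be sent to the NSCA server
outbox = []
router = ChannelRouter()
router.subscribe("NSCA", lambda channel, alias, code, message: outbox.append((alias, code, message)))

# Stand-in for the Scheduler: the result of "check_r_ok x", published under its alias
router.publish("NSCA", "system_x_ok", 0, "OK: everything is fine")
print(outbox)  # [('system_x_ok', 0, 'OK: everything is fine')]
```

Decoupling producer and consumer through a named channel is what lets the same Scheduler feed NSCA, syslog or a script without changing the check itself.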
Testing
nscp test
Demo?
Eventlog to syslog forwarder

Purpose
◦ Forward eventlog errors to a syslog server

Required components
◦ CheckEventlog
   ▪ Running in “active mode”
◦ SyslogClient
   ▪ Set up to forward notifications

Experimentalness
◦ Medium
The Concept
(Diagram: inside NSClient++, CheckEventlog sends events on the syslog channel to SyslogClient, which forwards them over the network via syslog)
Config: Listening for events
[/modules]
CheckEventLog=1
[/settings/eventlog/real-time]
enabled=true
destination=syslog
filter=type NOT IN ('success', 'info', 'auditSuccess')
log=application,system
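The real-time filter forwards only events from the listed logs whose type is not in the excluded set. The same decision written out in Python, as a sketch: `Event` and `should_forward` are hypothetical names, and case-insensitive matching of types such as `auditSuccess` is an assumption rather than documented NSClient++ behavior.

```python
from dataclasses import dataclass


@dataclass
class Event:
    log: str          # e.g. "application", "system"
    event_type: str   # e.g. "error", "warning", "info", "success", "auditSuccess"
    source: str
    message: str


# Mirrors log=application,system and
# filter=type NOT IN ('success', 'info', 'auditSuccess')
WATCHED_LOGS = {"application", "system"}
IGNORED_TYPES = {"success", "info", "auditsuccess"}  # compared case-insensitively (assumption)


def should_forward(event):
    """True if the event comes from a watched log and its type is not filtered out."""
    return (event.log.lower() in WATCHED_LOGS
            and event.event_type.lower() not in IGNORED_TYPES)


print(should_forward(Event("application", "error", "SQLBrowser", "disk full")))  # True
```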
Config: Listening for events (Short)
[/modules]
CheckEventLog=1
[/settings/eventlog/real-time]
enabled=true
destination=syslog
Config: Forward to syslog
[/modules]
SyslogClient=1
[/settings/syslog/client/targets/default]
facility=kernel
tag template=NSClient
message template=%message%
host=192.168.0.1
Config: Forward to syslog (short)
[/modules]
SyslogClient=1
[/settings/syslog/client/targets]
default=192.168.0.1
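SyslogClient ends up emitting a syslog line built from the target's facility, the tag template and the message template. The sketch below formats a classic RFC 3164 line, where the priority is `facility * 8 + severity`, so facility `kernel` (0) at severity `error` (3) gives `<3>`. The exact line NSClient++ produces may differ; `render` and `format_syslog` are hypothetical helpers, and the hostname is passed in explicitly.

```python
from datetime import datetime

# Standard syslog codes (RFC 3164 / RFC 5424); only a few facilities listed
FACILITIES = {"kernel": 0, "user": 1, "mail": 2, "daemon": 3}
SEVERITIES = {"emergency": 0, "alert": 1, "critical": 2, "error": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def render(template, fields):
    """Fill %name% placeholders, e.g. message template=%message%."""
    for name, value in fields.items():
        template = template.replace("%" + name + "%", value)
    return template


def format_syslog(facility, severity, hostname, tag, message, when):
    """Build an RFC 3164 style line: <PRI>Mmm dd hh:mm:ss HOST TAG: MSG."""
    pri = FACILITIES[facility] * 8 + SEVERITIES[severity]
    # RFC 3164 pads single-digit days with a space, not a zero
    stamp = "%s %2d %02d:%02d:%02d" % (MONTHS[when.month - 1], when.day,
                                       when.hour, when.minute, when.second)
    return "<%d>%s %s %s: %s" % (pri, stamp, hostname, tag, message)


line = format_syslog("kernel", "error", "winhost", "NSClient",
                     render("%message%", {"message": "Disk is full"}),
                     datetime(2011, 10, 5, 14, 3, 9))
print(line)  # <3>Oct  5 14:03:09 winhost NSClient: Disk is full
```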
Testing
nscp eventlog --exec insert --source SQLBrowser --id 3 --type warning --event-argument a --event-argument b --facility 78 --severity error
Qualifiers: 49230 = 1100 0000 0100 1110 (error + 78)
<Event>
<System>
<Provider Name="SQLBrowser" />
<EventID Qualifiers="49230">3</EventID>
<Level>3</Level>
<!-- ... -->
</System>
<EventData>
<Data>a</Data>
<Data>b</Data>
</EventData>
</Event>
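The annotation `49230 = 1100 0000 0100 1110 (error + 78)` follows the standard Windows message-ID layout: Qualifiers is the high 16 bits of the 32-bit event ID, with severity in bits 15-14, a customer flag in bit 13 and the facility in bits 11-0. The sketch below checks that arithmetic. It is plain bit manipulation, not an NSClient++ API, and the helper names are hypothetical.

```python
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def split_qualifiers(qualifiers):
    """Split the 16-bit Qualifiers value (high word of the event ID) into its fields."""
    severity = (qualifiers >> 14) & 0x3   # bits 15-14
    customer = (qualifiers >> 13) & 0x1   # bit 13: customer-defined code
    facility = qualifiers & 0x0FFF        # bits 11-0
    return SEVERITY_NAMES[severity], customer, facility


def full_event_id(qualifiers, event_id):
    """Recombine Qualifiers and the 16-bit EventID into the 32-bit message ID."""
    return (qualifiers << 16) | event_id


print(format(49230, "016b"))           # 1100000001001110
print(split_qualifiers(49230))         # ('error', 0, 78)
print(hex(full_event_id(49230, 3)))    # 0xc04e0003
```

Note that the `<Level>3</Level>` in the XML reflects `--type warning`, which is a separate field from the severity bits packed into Qualifiers.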
Demo?
Scripting

Purpose
◦ Automatically add Nagios configuration

Required components
◦ PythonScript
   ▪ Running the script
◦ NSCAServer
   ▪ Responds to submissions
◦ NSCAClient
   ▪ Forwards modified submissions

Experimentalness
◦ High
The Concept
(Diagram: send_nsca → Network → NSCAServer in NSClient++ → Channel 1 → PythonScript → Channel 2 → NSCAClient → nsca → Network)
Configuration: Receive Results
[/modules]
NSCAServer=1
[/settings/nsca/server]
port=5668
inbox=channel_1
encryption=none
password=secret
allowed hosts=192.168.0.1,127.0.0.1
Config: Forward results
[/modules]
NSCAClient=1
[/settings/nsca/client]
channel=channel_2
[/settings/nsca/client/targets/default]
host=192.168.0.1
password=secret
encryption=none
time offset=-1h
Configuration: Setup Python
[/modules]
PythonScript=1
[/settings/python/scripts]
f=forward.py
Writing a Script
from NSCP import Registry, Core, log, log_error
import unicodedata

core = Core.get()

def process(channel, source, command, code, message, perf):
    # Normalize the message (e.g. strip accents) down to plain ASCII
    message = unicodedata.normalize('NFKD', message).encode('ascii', 'ignore')
    # Re-submit the modified result on channel_2, which NSCAClient forwards
    core.simple_submit('channel_2', command, code,
                       'PythonEnhanced: %s' % message, perf)

def init(plugin_id, plugin_alias, script_alias):
    # Call process() for every submission arriving on channel_1 (the NSCAServer inbox)
    reg = Registry.get(plugin_id)
    reg.simple_subscription('channel_1', process)

def shutdown():
    pass
Writing a Script
from NSCP import Registry, Core

core = Core.get()

def process(channel, src, cmd, code, msg, perf):
    # Called for each result on channel_1; re-submits it on channel_2 with a prefix
    core.simple_submit('channel_2', cmd, code,
                       'PythonEnhanced: %s' % msg, perf)

def init(plugin_id, plugin_alias, script_alias):
    reg = Registry.get(plugin_id)
    reg.simple_subscription('channel_1', process)

def shutdown():
    pass
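The script imports the `NSCP` module, which exists only inside NSClient++, so it cannot run on its own. One way to try the `process()` logic outside NSClient++ is a fake core that records submissions; `FakeCore` is hypothetical and only mimics the `simple_submit` call used on the slide. The added `.decode("ascii")` keeps the message a `str` under Python 3, where `.encode()` returns bytes.

```python
import unicodedata


class FakeCore:
    """Hypothetical stand-in for NSCP's Core: records submissions instead of routing them."""

    def __init__(self):
        self.submitted = []

    def simple_submit(self, channel, command, code, message, perf):
        self.submitted.append((channel, command, code, message, perf))


core = FakeCore()


def process(channel, source, command, code, message, perf):
    # Same steps as the script above; .decode() keeps the message a str on Python 3
    message = unicodedata.normalize("NFKD", message).encode("ascii", "ignore").decode("ascii")
    core.simple_submit("channel_2", command, code, "PythonEnhanced: %s" % message, perf)


# Simulate NSCAServer delivering a submission on channel_1
process("channel_1", "nsca", "host_check", 0, "Hello Wörld", "")
print(core.submitted)
# [('channel_2', 'host_check', 0, 'PythonEnhanced: Hello World', '')]
```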
Testing
nscp nsca --exec submit --message "Hello World" --host 192.168.0.1 --password secret --encryption none
Distributed monitoring
with NSClient++
Summary
My vision for the future…
Should be simple, right?
Distributed Monitoring Network!
(Diagram: Internet)
Questions?
Thank You!
michael@medin.name
http://www.
.com/in/mickem
http://blog.medin.name/
http://nsclient.org
facebook.com/nsclient
http://nsclient.org/nscp/conferances/osmc/2011/
Download