ourmon - Portland State University

Intro to Ourmon
Jim Binkley
jrb@cs.pdx.edu
Portland State University
Computer Science
Outline


intro to ourmon, a network monitoring system
network control and anomaly detection
 theory
 pictures at an exhibition of attacks
 port signatures

Gigabit Ethernet - flow measurement
 what happens when you receive 1,488,000 64 byte
packets a second?
 answer: not enough

more information
the network security problem

you used IE and downloaded a file
 and executed it perhaps automagically

you used outlook and got email
 and executed it. just click here to solve the password problem




you were infected by a worm/virus thingee trolling for
holes which tries to spread itself with a known set of
network-based attacks
result: your box is now controlled by a 16-year old boy
in Romania, along with 1000 others
and may be a porn server, or is simply running an IRC
client or server, or has illegal music or “warez” on it
and your box was just used to attack Microsoft or the
White House
how do the worm/virus thingees
behave on the network?

1. case the joint: try to find other points to infect:




via TCP syn scanning of remote IPs and remote ports
this is just L4, TCP SYNS, no data at L7
ports 135-139, 445, 80, 443, 1433, 1434 are popular
ports like 9898 are also popular
• and represent worms looking for worms (backdoors)

2. once they find you
 they can launch known and unknown attacks at you
 now we have L7 data


3. this is a random process, although targeting of
certain institutions/networks is certainly common
4. a tool like agobot can be aimed at dest X, and then
dest X can be “DOSSED/scanned” with a set of attacks
motivation: what the heck is going on
out there?


Vern Paxson said: the Internet is not something you
can simulate
Comer defined the term Martian
 means a packet on your network that is unexpected




The notion of well-known ports so you can state that
port 80 is used *only* for web traffic is defunct
media vendors “may” have had something to do with
this
John Gilmore said: The Internet routes around
censorship
We have more and more P2P apps where 1. you don’t
know the ports used, 2. things are less server-centric,
and 3. crypto may be in use
a lot of this work was motivated
by this problem
gee, where’s the “anomaly”?
now tell me what caused it …
and what caused the drops too
ourmon introduction


ourmon is a network monitoring system
with some similarities/differences to
 traditional SNMP RMON II
• name is a take off on this (ourmon is not rmon)
 Linux ntop is somewhat similar, Cisco netflow

we deployed it in the PSU DMZ a number of
years ago (2001)
 first emphasis on network stats
• how many packets, how much TCP vs UDP, etc.
• what the heck is going on out there?
 recent emphasis on detection of network anomalies
ourmon architectural overview

a simple 2-system distributed architecture
 front-end probe computer – gather stats
 back-end graphics/report processor

front-end depends on Ethernet switch port mirroring
 like Snort


does NOT use ASN.1/SNMP
summarizes/condenses data for back-end
 data is ASCII

cp summary file via out of band technique
 micro_httpd/wget, or scp, or rsync, or whatever
the Tao of ourmon

the way of ourmon:
 aggregate the right stuff - network engineers don’t
have time (aka too much info, not enough data)
 try to do something with the data, don’t just pile it up
and expect someone to “look for the pony in all of the poo”
 minimize the data in support of aggregation - as
opposed to: 1 sample out of 2k
 try to not lose stuff (again not 1 sample out of 2k)
 it’s ok to make new FLOW TUPLES
 it’s ok to look at layer 7
 it’s ok to try and make the probe smarter, it’s not a
router (separate those functions)
Lao Tzu says:

the way that can be told is not the constant way
the network will change in unexpected ways
the name that can be named is not the constant name
most of your signatures will be useless
jrb says: anomaly detection is easy if you pay
attention to your outputs every day
PSU network

Gigabit Ethernet backbone including Ge
connection to Inet1 and Inet2
 PREN(L3)/NWAX(L2) exchange in Pittock
 Lambda Rail/PNW/NERO etc.

300+ Ethernet switches at PSU
 10000 live ports, 5-6k hosts
 4 logical networks: resnet, OIT, CECS, 802.11
 10+ Cisco routers in DMZ

2 ourmon probes, 60k pps at peaks, 2G
packets per day
PSU network DMZ simplified overview
[diagram: skaldi, the Pittock PSU router at the NWAX exchange, on a gig E link to the DMZ net; a local DMZ switch with instrumentation including the ourmon front-end; MCECS, PUBNET, OIT, and RESNET attached through L3 routers, all of them often redundant]
Firewall ACLs exist for the most part on DMZ interior routers
ourmon.cat.pdx.edu/ourmon is within MCECS network
ourmon current deployment in PSU DMZ
[diagram: a Cisco gE switch carries three links: A. gigabit connection to exchange router (Inet), B. to Engineering, C. to dorms, OIT, etc.; a port-mirror of A, B, C feeds 2 ourmon probe boxes (P4), alongside an ourmon graphics display box]
ourmon architectural breakdown
probe box/FreeBSD
 input: pkts from NIC/kernel BPF buffer
 config file: ourmon.conf
 runtime:
• 1. N BPF expressions
• 2. + topn (hash table) of flows and other things (lists)
• 3. some hardwired C filters (scalars of interest)
 report files: mon.lite, tcpworm.txt
 automated tcpdump outputs (blacklist, triggers)
graphics box/BSD or linux
 outputs:
• 1. RRDTOOL strip charts
• 2. histogram graphs
• 3. various ASCII reports, hourly summaries
the front-end probe


written in C
input file: ourmon.conf (filter spec)
 1-6 BPF expressions may be grouped in a named
graph, and count either packets or bytes
 some hardwired filters written in C
 topn filters (generates lists, #1, #2, ... #N)
 kinds of topn tuples: flow, TCP syn, icmp/udp error

output files: mon.lite, tcpworm.txt, etc
 summarization of stats by named filter
 ASCII, but usually small (currently 5k)
the front-end probe


typically use 7-8 megabyte kernel BPF buffer
we may look at traditional 68 byte snap size
 a la tcpdump
 meaning L2-L4 HEADERS only, not data
 if you want irc, then defaults to 256 bytes

at this point due to hash tuning we rarely drop
packets
 barring massive syn attacks

front-end basically is 2-stage
 gather packets and count according to filter type
 write report at 30-second alarm period
running the probe







# ourmon.sh start
or more like:
# ourmon -c conffile -i em0 -a 30 -D directory
(optional bpf expression)
So the name of the probe is ourmon and it is
found in
/home/mrourmon/bin/ourmon
or somewhere … depending on config process
conffile -> mrourmon/etc/ourmon.conf
(supplied)
ourmon.conf filter types










1. hardwired filters are specified as:
fixed_ipproto # tcp/udp/icmp/other pkts
packet capture filter cannot be removed
2. 1 user-mode bpf filter (configurable)
bpf "ports" "ssh" "tcp port 22"
bpf-next "p2p" "port 1214 or port 6881 or ..."
bpf-next "web" "tcp port 80 or tcp port 443"
bpf-next "ftp" "tcp port 20 or tcp port 21"
3. topN filter (ip flows) is just
topn_ip 60 meaning top 60 flows
mon.lite output file roughly like
this:





pkts: caught:670744 : drops:0:
fixed_ipproto: tcp:363040008 : udp:18202658 : icmp:191109 : xtra:1230670:
bpf:ports:0:5:ssh:6063805:p2p:75721940:web:102989812:ftp:7948:email:1175965:xtra:0
topn_ip : 55216 : 131.252.117.82.3112>193.189.190.96.1540(tcp): 10338270 : etc
useful for debug, but not intended for human use
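A minimal sketch of how a script could pull the counters out of the simple mon.lite lines shown on this slide. It is written in Python for illustration only (the real back-end is perl), the function name parse_counters is hypothetical, and the "name: key:value : key:value :" layout is inferred from the sample lines alone. The bpf and topn_ip lines carry extra fields and are not handled here.

```python
def parse_counters(line):
    """Split 'name: key:value : key:value :' into (name, {key: int}).

    Assumption: the layout is inferred from the sample mon.lite lines on
    the slide; this is not ourmon's own parser.
    """
    name, _, rest = line.partition(":")
    # Drop the empty pieces left by the spacing and the trailing colon.
    tokens = [t.strip() for t in rest.split(":") if t.strip()]
    # The remaining tokens alternate key, value, key, value, ...
    pairs = zip(tokens[0::2], tokens[1::2])
    return name.strip(), {key: int(value) for key, value in pairs}


sample = "fixed_ipproto: tcp:363040008 : udp:18202658 : icmp:191109 : xtra:1230670:"
name, counters = parse_counters(sample)
tcp_share = counters["tcp"] / sum(counters.values())
print(name, counters, round(tcp_share, 3))
```

On the sample line the TCP share comes out near 95%, which fits the later remark that PSU traffic is mostly TCP.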
back-end does graphics


written in perl, one main program but there are
now a lot of helper programs
uses Tobias Oetiker’s RRDTOOL for some
graphs (hardwired and BPF): integers
 as used in cricket/mrtg, other apps popular with
network engineers
 easy to baseline 1-year of data
 logs (rrd databases) have a fixed size at creation


top N uses histogram (our program)
plus perl reports for topn data and other things
 hourly/daily summarization of interesting things
backend run from crontab

every 30 seconds runs omupdate.sh script
 runs various perl scripts, makes rrd databases,
graphics (NOT html files which are static)
 except does make 2nd-level html files for top N

hourly run for ASCII reports
 top N daily aggregation


midnight run for rollover of last daily/hourly
report into previous day
thus some reports have one week of backups,
today, yesterday, etc.
hardwired-filter #1: bpf
counts/drops
this happens to be yet another SQL slammer attack.
front-end stressed as it lost packets due to the attack.
2. bpf filter output example
note: xtra means any remainder and is turned off in this graph.
note: 5 bpf filters mapped to one graph
3. OLD topN example
(histogram)
config process





after unpack, run configure.pl (and hope for the
best)
front-end in src/ourmon (may need to hack
makefile)
front-end depends on libpcap and libpcre (may
need to get from web and build on own)
back-end depends on RRDTOOL perl (get from
web and install on own possibly)
RRDTOOL depends on libpng (sourceforge)
dependency overview

front-end needs
 libpcap
 libpcre

back-end needs
 rrdtool (libpng)


and a web server (apache)
and C plus perl of course
reasonable install strategy

run configure.pl
 if ourmon front-end fails, goto src/ourmon and build with
Makefile until you get it right
 but first install libpcap and libpcre if needed




run the front-end, you should get output files (mon.lite,
etc.) with something in them (assuming you have net
traffic!!!)
now run the back-end and see if you get .png pictures
if not, debug RRDTOOL by hand - it has some sample
perlscript for making pics
if RRDTOOL can make pics, so can ourmon
2 ways to run ourmon

1. probe/back-end on same box
 web server access
 2 interfaces, one for control and one for sniffing

2. probe/back-end on different boxes
 use scp or wget with small web server on front-end
probe box so back-end can get files


method #1 is supported by default
method #2 can be easily hacked into the omupdate.sh
shell script (the back-end driver), say by using wget to
get the files
probe output files







mon.lite - nearly all RRDTOOL, top N, etc.
tcpworm.txt - tcp port report (syn tuple filter)
syndump.txt - local home IP version of tcp port
report
p2pdump.txt - p2p apps (needs work)
emaildump.txt - port 25 users from tcp syn tuple
irc - irc tuple file
various automated tcpdump output files that do
not go to backend (blacklist, trigger files)
ourmon has taught us a few hard
facts about the PSU net

P2P never sleeps (although it does go up in the
evening)
 Internet2 wanted “new” apps. It got bittorrent.


PSU traffic is mostly TCP traffic (ditto worms)
web and P2P are the top apps
 bittorrent/gnutella

PSU’s security officer spends a great deal of time
chasing multimedia violations ...
 Sorry students: Disney doesn’t like it when you serve up Shrek
 It’s interesting that you have 1000 episodes of Friends on a
server, but …
current PSU DMZ ourmon probe

has about 60 BPF expressions grouped in 16 graphs
 many are subnet specific (e.g., watch the dorms)
 some are not (watch tcp control expressions)

about 7 hardwired graphs
 including a count of flows IP/TCP/UDP/ICMP, and a wormcounter …

topn graphs include :
 TCP syn’ners, IP flows (TCP/UDP/ICMP), top ports,
 ICMP error generators, UDP weighted errors
 topn scans, ip src to ip dst, ip src to L4 ports

important reports:
 IRC tuples
 TCP port report
ourmon and intrusion detection


obviously it can be an anomaly detector
McHugh/Gates paraphrase: Locality is a
paradigm for thinking about normal behavior
and “Outsider” threat
 or insider threat if you are at a university with dorms

thesis: anomaly detection focused on
 1. network control packets; e.g., TCP syns/fins/rsts
 2. errors such as ICMP packets (UDP)
 3. meta-data such as flow counts, # of hash inserts

this is useful for scanner/worm finding
inspired by noticing this ...
mon.lite file (reconstructed), Oct 1, 2003:
note 30 second flow counts (how many tuples)
topn_ip: 163000:
topn_tcp: 50000
topn_udp: 13000
topn_icmp: 100000
oops ...
normal icmp flow count: 1000/30 seconds
We should have been graphing the meta-data (the flow counts).
Widespread Nachi/Welchia worm infection in PSU dorms
actions taken as a result:

we use the BPF/RRDTOOL to graph:
 1. network “errors” TCP resets and ICMP errors
 2. we graph TCP syns/resets/fins
 3. we graph ICMP unreachables (admin prohibit, host
unreachable etc).

we have RRDTOOL graphs for flow meta-data:
 topN flow counts
 tcp SYN tuple weighted scanner numbers

topn syn graph now shows more info
 based on ip src, sorts by a work weight, shows
SYNS/FINS/RESETS

and the portreport …based on SYNs/2 functions
more new anomaly detectors

the TCP syn tuple is a rich vein of information
 top ip src syn’ners + front-end generates 2nd output
 tcpworm.txt file (which is processed by the backend)
 port signature report, top wormy ports, stats
 work metric is a measure of TCP efficiency
 worm metric is a set of IP srcs deemed to have too
many syns

topN ICMP/UDP error
 weighted metric for catching UDP scanners
daily topn reports are useful

top N syn reports show us the cumulative
 synners over time (so far today, and days/week)
 if many syns, few fins, few resets
• almost certainly a scanner/worm (or trinity?)
 many syns, same amount of fins, may be a P2P app

ICMP error stats
 show up both top TCP and UDP scanning hosts
 especially in cumulative report logs

both of the above reports show MANY infected
systems (and a few that are not)
front-end syn tuple



TCP Syn Tuple produced by the front-end
(ip src, syns sent, fins received, resets
received, icmp errors received, total tcp pkts
sent, total tcp pkts received, set of sampled
ports)
There are three underlying concepts in this
tuple:
 1. 2-way info exchange
 2. control data plane
 3. attempt to sample TCP dst ports, with counts
and these metrics used in
particular places

TCP work weight: (0..100%)
IP src: (S(s) + F(s) + R(r)) / total pkts

TCP worm weight: low-pass filter
IP src: S(s) - F(r) > N
TCP error weight:
IP src: (S - F) * (R + ICMP_errors)
UDP work metric:
IP src: (U(s) - U(r)) * (ICMP_errors)
where (s) means sent by the IP src and (r) means received by it
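The four metrics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the probe's C code: the SynTuple class and its field names are hypothetical, "total pkts" is taken to be TCP packets sent plus received, and the unsubscripted S, F, and R in the error weight are taken to be syns sent, fins received, and resets received.

```python
from dataclasses import dataclass


@dataclass
class SynTuple:
    """Per-IP-source counters; field names are hypothetical."""
    syns_sent: int
    fins_sent: int
    fins_recv: int
    resets_recv: int
    icmp_errors: int
    tcp_sent: int
    tcp_recv: int


def work_weight(t):
    """Percentage of TCP packets that are control packets (0..100)."""
    total = t.tcp_sent + t.tcp_recv  # assumption: total = sent + received
    if total == 0:
        return 0.0
    return 100.0 * (t.syns_sent + t.fins_sent + t.resets_recv) / total


def is_wormy(t, n):
    """Worm weight: many more syns sent than fins returned."""
    return t.syns_sent - t.fins_recv > n


def error_weight(t):
    """Unanswered syns scaled by the errors that came back."""
    return (t.syns_sent - t.fins_recv) * (t.resets_recv + t.icmp_errors)


def udp_work(udp_sent, udp_recv, icmp_errors):
    """UDP work metric: one-way UDP traffic scaled by ICMP errors."""
    return (udp_sent - udp_recv) * icmp_errors


scanner = SynTuple(1000, 0, 0, 20, 5, 1000, 20)
print(work_weight(scanner), is_wormy(scanner, 20), error_weight(scanner))
```

A host that sends 1000 syns and gets back only 20 resets scores a work weight of 100, while a host doing ordinary two-way transfers lands in the low single digits.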
EWORM flag system







TCP syn tuple only:
E, if error weight > 100000
W, if the work weight is greater than 90%
O (ouch), if there are few FINS returned and
some SYNS, no FINS more or less
R (reset), if there are RESETS
M (keeping mum flag), if no packets are sent
back, therefore not 2-way
what does that spell? (often it spells WOM)
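The flag rules above can be sketched as a small function. The E and W thresholds come from the slide; the slide gives no number for "few FINS", so the few_fins_ratio parameter below is a hypothetical stand-in, and the function name and argument names are illustrative only.

```python
def eworm_flags(syns_sent, fins_recv, resets_recv, pkts_recv,
                work_weight, error_weight, few_fins_ratio=0.1):
    """Return the EWORM flag string for one IP source.

    Assumption: 'few FINS' is modeled as fins_recv <= few_fins_ratio *
    syns_sent; the slide does not state the real cutoff.
    """
    flags = ""
    if error_weight > 100000:
        flags += "E"
    if work_weight > 90:
        flags += "W"
    if syns_sent > 0 and fins_recv <= few_fins_ratio * syns_sent:
        flags += "O"
    if resets_recv > 0:
        flags += "R"
    if pkts_recv == 0:
        flags += "M"  # nothing came back, so the exchange is not 2-way
    return flags


# A scanner that sends syns into the void and hears nothing back.
print(eworm_flags(1000, 0, 0, 0, 100.0, 0))
```

With those inputs the sketch prints WOM, the pattern the slide says shows up most often.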
6:00 am TCP attack - BPF net
errors
topn RRD flow count graph
bpf TCP control
6 am TCP top syn (this filter is
useful...)
topn syn syslog sort
start log time: instances: DNS/ip: syns/fins/resets total counts
end log time
--------------------------------------------------------------------------------
Wed Mar 3 00:01:04 2004: 777: host-78-50.dhcp.pdx.edu:401550:2131:2983
Wed Mar 3 07:32:36 2004
Wed Mar 3 00:01:04 2004: 890: host-206-144.resnet.pdx.edu:378865:1356:4755
Wed Mar 3 08:01:03 2004
Wed Mar 3 00:01:04 2004: 876: host-245-190.resnet.pdx.edu:376983:1919:8041
Wed Mar 3 08:01:03 2004
Wed Mar 3 00:01:04 2004: 674: host-244-157.resnet.pdx.edu:348895:8468:29627
Wed Mar 3 08:01:03 2004
1st graph you see in the
morning:
BPF: in or out?
BPF ICMP unreachables
hmm... size is 100.500 bytes
flow picture: UDP counts up
bpf subnet graph:
OK, it came from the dorms (this is rare ..., it takes a strong signal)
top ICMP shows the culprit’s IP
8 out of 9 slots taken by ICMP errors back to 1 host
the worm graph (worm metric)
port signatures in tcpworm.txt








port signature report simplified:
ip src: flags work: port-tuples
131.252.208.59 () 39: (25,93)(2703,2) ...
131.252.243.162 () 22: (6881,32)(6882,34)(6883,5)...
211.73.30.187 (WOM) 100: (80,11)(135,11)(139,11)(445,11)(1025,11)(2745,11)(3127,11)(5000,11)
211.73.30.200 (WOM) 100: port sig same as above
212.16.220.10 (WOM) 100: (135,100)
219.150.64.11 (WOM) 100: (5554,81)(9898,18)
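A minimal sketch of reading lines in the simplified report format above, for example to list only the hosts flagged W. The regular expressions and the function name parse_port_signature are assumptions built from these sample lines; the full tcpworm.txt report has more fields than the simplified view, and the second number in each tuple is treated here simply as the count attached to the sampled port.

```python
import re

# ip, flags in parentheses (possibly empty), work weight, then port tuples
LINE_RE = re.compile(r"^(\S+)\s+\((\w*)\)\s+(\d+):\s*(.*)$")
TUPLE_RE = re.compile(r"\((\d+),(\d+)\)")


def parse_port_signature(line):
    """Parse one simplified report line; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ip, flags, work, rest = m.groups()
    ports = [(int(port), int(count)) for port, count in TUPLE_RE.findall(rest)]
    return {"ip": ip, "flags": flags, "work": int(work), "ports": ports}


report = [
    "131.252.208.59 () 39: (25,93)(2703,2) ...",
    "219.150.64.11 (WOM) 100: (5554,81) (9898,18)",
]
for line in report:
    entry = parse_port_signature(line)
    if entry and "W" in entry["flags"]:
        print(entry["ip"], [port for port, _ in entry["ports"]])
```

Only the second sample line carries the W flag, so the loop prints that address with ports 5554 and 9898.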
tcpworm.txt stats show:







on May 20, 2004
top 20 ports for “W” metric are:
445, 135, 5554, 9898, 80, 2745, 6667,
1025, 139, 6129, 3127, 44444, 2125, 6666,
7000, 9900, 5000, 1433
www.incidents.org reports at this time:
port 135 traffic increase due to bobax.c
ports mentioned are 135,445,5000
Gigabit Ethernet speed testing


test questions: what happens when we hit
ourmon and its various kinds of filters
1. with max MTU packets at gig E rate?
 can we do a reasonable amount of work?
 can we capture all the packets?

2. with min-sized packets (64 bytes)
 same questions

3. is any filter kind better/worse than any other
 topn in particular (answer is it is worse)
 and by the way roll the IP addresses (insert-only)
Gigabit Ethernet - Baseline


according to TR-645-2, Princeton University, Karlin,
Peterson, “Maximum Packet Rates for Full-Duplex Ethernet”:
3 numbers of interest for gE
 min-packet theoretical rate: 1488 Kpps (64 bytes)
 max-packet theoretical rate: 81.3 Kpps (1518 bytes)
 min-packet end-end time: 672 ns


note: the min-pkt inter-frame gap for gigabit is
96 ns (not a lot of time between packets ...)
an IXIA 1600 packet generator can basically
send min/max at those rates
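The three baseline numbers follow from simple arithmetic on the wire format. The sketch below assumes standard Ethernet framing overhead, an 8-byte preamble and a 12-byte inter-frame gap around each frame, and a line rate of exactly 1 Gbit/s, so one bit takes one nanosecond; the constant and function names are illustrative.

```python
LINE_RATE_BPS = 1_000_000_000  # gigabit Ethernet: 1 bit per nanosecond
PREAMBLE_BYTES = 8             # preamble plus start-of-frame delimiter
IFG_BYTES = 12                 # minimum inter-frame gap


def wire_time_ns(frame_bytes):
    """Time one frame occupies the wire, including preamble and gap."""
    bits = (frame_bytes + PREAMBLE_BYTES + IFG_BYTES) * 8
    return bits * 1e9 / LINE_RATE_BPS


def max_pps(frame_bytes):
    """Theoretical back-to-back packet rate for a given frame size."""
    return 1e9 / wire_time_ns(frame_bytes)


print(wire_time_ns(64))        # 672.0 ns per min-sized packet
print(round(max_pps(64)))      # 1488095 pps, i.e. about 1488 Kpps
print(round(max_pps(1518)))    # 81274 pps, i.e. about 81.3 Kpps
print(IFG_BYTES * 8)           # 96 ns inter-frame gap
```

The 672 ns figure is the whole per-packet time budget at line rate, which is why any per-packet work in the monitor matters so much for min-sized packets.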
test setup for ourmon/bpf measurement
[diagram: an ixia 1600 gE packet generator sends/recvs packets (1. min-sized pkts, 2. max-sized pkts) through a Packet Engines line-speed gigabit ethernet switch; a port-mirror of the IXIA send port feeds a UNIX pc (1.7 ghz AMD 2000, UNIX/FreeBSD, 64-bit PCI/syskonnect) running ourmon/bpf]
test notes

keep in mind that ourmon snap length is 68
bytes (includes L4 headers, not data)
 The kernel BPF is not capturing all of a max-sized
packet

An IDS like snort must capture all of the pkt
 it must run an arbitrary set of signatures over an
individual packet
 reconstruct flows
 undo fragmentation
 shove it in a database? (may I say… GAAAA)
maximum packets results



with work including 32 bpf expressions, top N,
and hardwired filters:
we can always capture all the packets
but we need a N megabyte BPF buffer in the
kernel
 add bpfs, add kernel buffer size
 8 megabytes (true for Linux too)

this was about the level of real work we were
doing at the time in the real world
minimum packet results

using only the basic count/drop filter
 NO OTHER WORK!





using any-size of kernel buffer (didn’t matter)
we start dropping packets at around 80 mbit
speed (10% of the line rate with overhead)
this is only with the drop/count filter!
if you want to do real work, 30-50 mbits
more like it
challenged by opponent that has 100mbit
NIC card ... (or distributed zombie)
why is min performance so poor?

Two points of view that are complementary.
1. there is not enough time to do any real work (you
have 500 ns or so)
2. the bottom-half of the os is at HW priority, interrupts
prevent the top-half from running (enough) to avoid
drops.
note that growing the kernel buffer doesn’t help

research question: what is to be done?
btw: this is why switch/router vendors usually publish
performance stats on min-sized pkts.
also top N has a problem

random inserts means bucket lookup always
fails
 followed by memory allocation/cache churn

random IP src and/or random IP dst
 how to deal with this?
 one obvious answer: make sure hash algorithm is
optimized as much as possible

improved lookup certainly does help
 insert is logically: lookup to find correct bucket +
insert (node allocation/setup/chaining)
 our hash bucket size was way too small ...
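The effect of bucket count on random inserts can be seen in a toy chained hash table. This is a sketch in Python rather than the probe's C, and the FlowTable class, its compares counter, and the helper function are hypothetical names for illustration; keys are random 32-bit integers standing in for rolled IP addresses.

```python
import random


class FlowTable:
    """Chained hash table keyed by flow tuple; the value is a byte count."""

    def __init__(self, nbuckets):
        self.buckets = [[] for _ in range(nbuckets)]
        self.compares = 0  # key comparisons made while walking chains

    def count(self, key, nbytes):
        # Lookup: walk the chain of the bucket this key hashes to.
        chain = self.buckets[hash(key) % len(self.buckets)]
        for node in chain:
            self.compares += 1
            if node[0] == key:
                node[1] += nbytes  # hit: just bump the counter
                return
        # Miss: allocate a node and chain it (the expensive path).
        chain.append([key, nbytes])

    def topn(self, n):
        nodes = [tuple(node) for chain in self.buckets for node in chain]
        return sorted(nodes, key=lambda kv: kv[1], reverse=True)[:n]


def compares_for_random_sources(nbuckets, npackets=2000, seed=1):
    """Feed packets with random 32-bit sources, as an insert-only test would."""
    rng = random.Random(seed)
    table = FlowTable(nbuckets)
    for _ in range(npackets):
        table.count(rng.getrandbits(32), 64)
    return table.compares


print(compares_for_random_sources(16), compares_for_random_sources(4096))
```

With random sources nearly every lookup walks its whole chain and then misses, so a table with too few buckets does far more comparisons per packet than a generously sized one, before the allocation cost is even counted.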
this explains my long standing
question of

why does ourmon sometimes drop packets?
 in the drop/count graph



but you couldn’t find any reason for it
reason: looking at BIG things like flows,
not small things like TCP syn attacks
 which do not add up to anything in the way of a
MegaByte flow
 and may be distributed

small packets are evil...
ourmon gigabit test conclusions

min packets are a problem
 no filters and still overflows
 topn needed optimization (now bpf needs optimization)
 does topn problem apply to route caching in routers?


a different parallel architecture is needed for min pkts
consequences for IDS snort system are terrible




easy to construct a DOS attack that can sneak packets by snort
80 mbits of 64-byte packets will likely clog it.
to say nothing of a concerted zombie attack
less could always do the trick depending on the exact
circumstances and the amount of work done in the monitor
google 06/SOC result





experiment
4 hw thread (dual dual-core AMD) box with Intel
NICs
FBSD/Linux recent versions
threaded ourmon probe. -T threads
threading means:
 rfork - BSD. clone - linux
 per thread BPF buffer when it makes sense (BSD)
 x86 architecture spinlocks although low granularity
BSD results - gbits on Ge










pkt size     fbsd   fbsdt   max
min/64       415    415     1500
work/64      150    255     1500
min/750      162    162     162
work/750     130    162     162
min/1500     82     82      82
work/1500    82     82      82
work - means ourmon has load (reasonable amt of work
to do)
BSD is not polled, has 8k BPF buffer
415 at min64 due to Intel NIC losing packets
Linux results







pkt size     linux(def)   mmap   linux/t   max
min/64       230          315    270       1500
work/64      90           160    140       1500
min/750      162          162    162       162
work/750     70           120    140       162
min/1500     82           82     82        82
work/1500    70           82     82        82
current work

event list - interesting anomaly info is put here in one
place (events come from front-end in mon.lite file or
from backend and all put in event file)
 e.g., hey stupid wake up and look at IRC channel N
 or irc channel Y has 50000 hosts in it!!!
 host X was talking to known evil host Y



DNS top N work.
thread-based front-end (done but needs integration)
IRC-mesh collection sans Layer 7 payloads
more ourmon information




ourmon.cat.pdx.edu/ourmon - PSU stats and
download page
ourmon.cat.pdx.edu/ourmon/info.html - how it
works (not totally out of date)
some papers on http://www.cs.pdx.edu/~jrb
Botnet book (www.syngress.com). 4 chapters