What's Under Your Hood? Implementing a Network Monitoring System
4/9/2015 jonschipp@gmail.com

Who am I?
  Jon Schipp
  Unix Admin
  Linux & Unix User Group
  Southern Indiana Computer Klub

and...
  I like computers a lot

What's Network Monitoring?
  Monitoring? Monitoring your network
  Collecting data, i.e. network traffic
  Interpreting the data

Why?
  Network issues
  Attack detection
  Record keeping
  Fun

Focus
  Small/medium-size business
  Basement endeavors
  Cheap goods
  Working with what you have

where the magic happens

gimme the data
  hubs
  monitor/SPAN ports, port mirroring
  taps
  ip forwarding/relaying/tunneling, whatev

Forwarding/Relaying
  Wireshark remote feature
  NetworkMiner Pro: Pcap-over-IP
    tcpdump -nni eth0 -s0 -w - | nc 192.168.1.254 33246
  SSL/encryption: ssh, socat, ncat, cryptcat, stunnel
  Netfilter's iptables
    iptables -t mangle -A PREROUTING -i eth0 -p tcp -m multiport --dports 80,443,22,20,21 -j TEE --gateway 192.168.1.254
    iptables -t mangle -A POSTROUTING -o eth0 -p tcp -m multiport --dports 80,443,22,20,21 -j TEE --gateway 192.168.1.254
  OpenBSD's PF
    pass out on em0 dup-to (em1 192.168.1.254) proto tcp from any to any port { 80, 443, 22, 20, 21 }
    pass in on em0 dup-to (em1 192.168.1.254) proto tcp from any to any port { 80, 443, 22, 20, 21 }

Architecture

High Speed Packet Capture
  High-end equipment is expensive
  DIY: tuning and compiling
  Hardware is pretty fast nowadays but...
  We are using software that isn't designed for efficient packet capture

NICs
  Get a quality card
  NAPI is good
  DMA is good
  Intel PRO/1000 MT Gigabit models are generally good, $30 on eBay

PCI buses
  (bus speed in MHz) * (bus width in bits) / 8 = speed in megabytes/second
  PCI-X and PCIe results are shown after subtracting the listed overhead.

  Bus          Speed (MHz)  Width             Result       Overhead
  PCI                   66  32 bit              264 MB/s   none
  PCI-X                 66  64 bit              400 MB/s   20%
  PCI-X                133  64 bit              850 MB/s   20%
  PCI-X                266  64 bit             1700 MB/s   20%
  PCI-X                533  64 bit             3400 MB/s   20%
  PCIe v1 x1          2500  1 1-bit lane        250 MB/s   20%
  PCIe v2 x1          5000  1 1-bit lane        500 MB/s   20%
  PCIe v2 x2          5000  2 1-bit lanes      1000 MB/s   20%
  PCIe v2 x4          5000  4 1-bit lanes      2000 MB/s   20%
  PCIe v2 x8          5000  8 1-bit lanes      4000 MB/s   20%
  PCIe v2 x16         5000  16 1-bit lanes     8000 MB/s   20%
  PCIe v2 x32         5000  32 1-bit lanes    16000 MB/s   20%
  PCIe v3 x32         8000  32 1-bit lanes    31500 MB/s   1.5%

  What the NIC needs:
   1 Gb/s link:   1000 / 8 =  125 megabytes/second
  10 Gb/s link:  10000 / 8 = 1250 megabytes/second

Other things
  Decent commodity CPU, e.g. Opteron whoops Xeon in capture
  SMP is good
  If you plan on storing the data, writing to disk will be a bottleneck
  RAID Striping, SATA? for sure SSD (maybe ?)
  nah

Typical Frame Processing
  Frame reaches NIC
  Ethernet preamble is removed
  FCS is calculated; if bad, the frame is dropped
  If interface is set in promiscuous mode, capture all
  Else, only process when dst MAC is me (unicast), or broadcast, or multicast (if on)
  FIFO to kernel ring buffer, by CPU or DMA
  NIC generates an interrupt, interrupt handler is called
  Passed to host stack → ip_input module → tcp/udp module → userspace

Frame Processing

Specimen
  FreeBSD 8.2-RELEASE
  Ubuntu Server 10.04

mbuf kernel structure
  FreeBSD - data and headers are stored in mbufs and mbuf clusters
    $ netstat -m | head -n 3
    82/653/735 mbufs in use (current/cache/total)
    0/648/648/25600 mbuf clusters in use (current/cache/total/max)
    0/256 mbuf+clusters out of packet secondary zone in use (current/cache)
  man mbuf: The total size of an mbuf, MSIZE, is a constant defined in <sys/param.h>.
    $ grep -H -n MSIZE /sys/sys/param.h
    sys/sys/param.h:145:#define MSIZE 256 /* size of an mbuf */
  sysctl kern.ipc.nmbclusters=25600 (default)
    $ vmstat -z | grep mbuf_cluster
    mbuf_cluster: 2048,   25600
                  ^size^  ^limit^

sk_buff kernel structure
  Linux - data and headers are stored in sk_buffs
  /usr/include/linux/skbuff.h

Problems
  Each packet generates an interrupt; this can lead to receive livelock/interrupt storm
  Context switches
  System calls

Solutions
  Device polling
  NAPI
  Shared memory, mmap(), and zero copy
  Bypassing host stack

Solutions, less so
  Checksum offloading
  Large Receive Offload (LRO)
  Larger on-board memory size
  More data descriptors

Capture Mechanisms/Subsystems
  Berkeley Packet Filter (BPF)
    Filter packets before they get to user space
  Linux Socket Filter (LSF)
    Extended BPF (kinda)
  and PF_RING (Linux)
  Others: CSPF, NDIS, xPF, MPF, DPF, Swift
  and so on...

libpcap
  C library for packet capture
  Provides link-layer access to data available on the network through interfaces attached to the system
  Runs on almost all modern Unices
  WinPcap for Windows
  When data reaches user space, it's stored in the libpcap buffer; applications read from it

FreeBSD Frame Processing

FreeBSD Processing cont.
  3 copies due to double buffer
  Deals with smaller buffers compared to Linux
  Half of the double buffer is copied to user space
  Packet is passed to each BPF device, /dev/bpf[0-9] (which the application binds to via libpcap)
  App reads from HOLD buffer, data is copied from the STORE buffer into the HOLD buffer

Linux Frame Processing

Linux Processing cont.
  2 copies
  Deals with larger buffers compared to FreeBSD
  Smart queue, pointers
  Packets copied individually, not whole buffers full of packets
  If packets are available, wake up user space (libpcap) to grab data from LSF

Tuning: Interrupt Livelock
  Interrupt usage high?
  Most modern Linux kernels are compiled with device polling
  FreeBSD does not have it on by default
    options DEVICE_POLLING
    options HZ=1000
    make buildkernel KERNCONF=NEWKERN
    make installkernel KERNCONF=NEWKERN
    ifconfig em0 polling
  Get a New API (NAPI) card

Tuning: Buffers
  Kernel dropping lots of packets?
  Increase the size of your kernel buffers
  FreeBSD
    sysctl net.bpf.bufsize=4096
    sysctl net.bpf.maxbufsize=524288
  Linux
    sysctl net.core.rmem_default=114688
    sysctl net.core.rmem_max=131071
    sysctl net.core.netdev_max_backlog=1000
  Increase kernel virtual memory size

Tuning: Drivers
  Bad NIC performance?
  FreeBSD: man driver, e.g. man em:
    hw.em.rxd
    Number of receive descriptors allocated by the driver. The default value is 256.
    The 82542 and 82543-based adapters can handle up to 256 descriptors, while others can have up to 4096.
    echo hw.em.rxd=4096 >> /boot/loader.conf
  Linux: ethtool, find driver README file (/usr/src/linux/)
    ethtool -g eth0
    ethtool -G eth0 rx 4096

tcpdump tests
  Average 6,000,000 packets in 60 seconds using iperf
  Packet loss with OS defaults; hardware: Dell PowerEdge 2850, Xeon (Quad), 4GB RAM

  Command                                   FreeBSD   Linux
  tcpdump -nni em0 -w test96.pcap           0%        8%
  tcpdump -nni em0 -w /dev/null             0%        0%
  tcpdump -nni em0 -s0 -w test65535.pcap    1.6%      22%
  tcpdump -nni em0 -s0 -w /dev/null         0%        0.02%

libpcap buffers
  libpcap initializes its buffer to 32 KB if the BPF value is less than 32 KB
    if ((ioctl(fd, BIOCGBLEN, (caddr_t)&v) < 0) || v < 32768)
        v = 32768;
  Linux initializes its buffer size at 512 KB
  Increase BPF buffer size globally, all apps, remember?
    net.bpf.bufsize, net.bpf.maxbufsize
  libpcap will initialize its buffer to the size in net.bpf.bufsize
  Set buffer for tcpdump only: use -B 512 (tcpdump takes the size in KiB, so this is 512 KB)

FreeBSD, interface drop counts
  netstat
    $ netstat -dI em0
    Name   Mtu Network   Address             Ipkts Ierrs Idrop   Opkts Oerrs Coll Drop
    em0   1500 <Link#2>  00:02:b3:9a:c2:03 2083316     0     0 1043607     0    0    0

    $ netstat -B
      Pid Netif   Flags     Recv Drop   Match Sblen Hblen Command
    90460   em0 p--s---      103    0     103   632     0 tcpdump
    43960   em0 p--s---  3803363    0 3803363   712     0 ntop

    $ sysctl dev.em.0.dropped
    dev.em.0.dropped: 0

    $ grep -R -H -n if_iqdrops /usr/src/
    sys/dev/e1000/if_lem.c:3470: ifp->if_iqdrops++;
    usr.bin/netstat/if.c:289: idrops = ifnet.if_iqdrops

Linux, interface drop counts
    $ ifconfig -a | egrep -e "(^eth|drop)"
    $ ethtool -S eth0
    $ awk '{ print $1, $5 }' /proc/net/dev
    Inter-|
    face drop
    lo: 0
    br0: 3354
    eth0: 0
    eth1: 0
    eth2: 0
    eth3: 14
    eth4: 0
    eth5: 103395
  ifconfig
    static int get_dev_fields(char *bp, struct interface *ife)
    {
        switch (procnetdev_vsn) {
        case 3:
            sscanf(bp,
                   "%llu %llu %lu %lu %lu %lu %lu",
                   &ife->stats.rx_bytes,
                   &ife->stats.rx_packets,
                   &ife->stats.rx_errors,
                   &ife->stats.rx_dropped,
                   ...

tcpdump/libpcap drops
  "Packets captured" - packets processed by tcpdump
  "Received by filter" - passed the filter (LSF, BPF)
  "Dropped by kernel" - not enough space in kernel buffer
  FreeBSD (kernel drops):
    libpcap gets its drop count from the kernel (BPF)
    ps_drop from pcap_stats() is bs_drop from BIOCGSTATS
  Linux (kernel drops):
    libpcap gets its drop count from PF_PACKET's PACKET_STATISTICS
    ps_drop from pcap_stats()
    ps_ifdrop - Ubuntu addendum/patch (Linux, Tru64 Unix only), from /proc/net/dev

PF_RING for Linux
  Creates a new socket type called PF_RING
  Works with existing PF_PACKET apps
  Shared memory
  Can bypass host stack, sniffing only
  PF_RING-aware drivers for faster capture: e1000, igb, ixgbe

PF_RING for Linux
  Compile PF_RING
  Compile PF_RING-aware libpcap and tcpdump
  Load PF_RING kernel module
    modprobe pf_ring transparent_mode=2 enable_debug=0 enable_tx_capture=0 enable_ip_defrag=0 quick_mode=0
  Recompile all apps to use the new shared libraries, libpcap and PF_RING
    ./configure CPPFLAGS="-I/usr/local/include" LDFLAGS="-L/usr/local/lib -lpfring -lpcap" \
      && make && make install

PF_RING DNA
  Direct NIC Access, pure speed
  Map NIC memory and registers to user land
  Packet copy from the NIC to the DMA ring is done by the NIC's NPU
  One application at a time can use the DMA ring
  Requires DNA driver

PF_RING TNAPI
  Threaded NAPI

vPF_RING
  Virtual PF_RING
  Hypervisor bypass
  Zero-copy

netmap
  FreeBSD
  mmap() shared memory
  Uses fewer system calls
  Creates new device, /dev/netmap
  1 GHz CPU can generate the 14.8 Mpps that can saturate a 10GigE interface
  Supports ixgbe, e1000, re

Others to check out
  Ringmap - FreeBSD - code.google.com/p/ringmap/
  Zero-copy sockets - FreeBSD: man zero_copy
    Requires specific NICs
    Recompile kernel with "options ZERO_COPY_SOCKETS"
    The zero copy send and zero copy receive code can be individually turned off via the kern.ipc.zero_copy.send and kern.ipc.zero_copy.receive sysctl variables respectively.
  MMAP() libpcap - Linux - http://public.lanl.gov/cpw/

Interface Configuration
  Linux /etc/network/interfaces
    auto eth0
    iface eth0 inet manual
      up ifconfig eth0 0.0.0.0 -arp up
      up ip link set eth0 promisc on
      up ip link set eth0 multicast on
      up ip link set eth0 mtu 1514
      down ip link set eth0 promisc off
      down ifconfig eth0 down

    auto eth1
    iface eth1 inet manual
      up ifconfig eth1 0.0.0.0 -arp up
      up ip link set eth1 promisc on
      up ip link set eth1 multicast on
      up ip link set eth1 mtu 1514
      down ip link set eth1 promisc off
      down ifconfig eth1 down
  FreeBSD /etc/rc.conf
    ifconfig_em0="inet 0.0.0.0 -arp promisc multicast mtu 1514 polling"
    ifconfig_em1="inet 0.0.0.0 -arp promisc multicast mtu 1514 polling"
  Bridging two interfaces (Linux)
    brctl addbr br0
    brctl addif br0 eth0 eth1
    ifconfig br0 up

Useful Applications
  snort, ntop, tcpdump, iftop
  trafshow, wireshark, tshark, tcpick
  tcpflow, etherape, ngrep, tcptrack
  suricata, bro-ids, ttt
  xplico, ifstat
  iptraf, bmon, bwm-ng, slurm
  dsniff, p0f, tcptrace, tcpreplay
  ipsumdump, speedometer

ntop
    ntop -d -L -u ntop --access-log-file=/var/log/ntop/access.log -b -C \
      --output-packet-path=/var/log/ntopsuspicious.log \
      --local-subnets 192.168.1.0/24,192.168.2.0/24,192.168.3.0/24 \
      -o -M -p /etc/ntop/protocol.list \
      -i br0,eth0,eth1,eth2,eth3,eth4,eth5 -o /var/log/ntop

netsniff-ng
  Linux, libpcap independent, zero-copy mechanism
  Kernel compiled with CONFIG_PACKET_MMAP

Daemonlogger
  Packet Logger & Soft Tap
  This is a libpcap-based program.
  It has two runtime modes:
    1) It sniffs packets and spools them straight to the disk and can daemonize itself for background packet logging.
    2) It sniffs packets and rewrites them to a second interface, essentially acting as a soft tap. It can also do this in daemon mode.

etherape

iftop

IPTraf

Trafshow

tcpick

tcpstat

speedometer

bmon

Contact
  Questions, comments, criticism: jonschipp@gmail.com
  More info: sickbits.networklabs.org/other/packetcapt
  dclinux.org
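
Backup: Linux drop counts without awk
  The awk one-liner on the "Linux, interface drop counts" slide prints the receive drop column of /proc/net/dev. Below is a minimal Python sketch of the same idea, assuming the usual layout where each interface line is "name: bytes packets errs drop ...". The names parse_rx_drops and SAMPLE are made up for illustration, and the byte/packet counters in SAMPLE are invented; only the drop values (0, 3354, 14) come from the slide.

```python
def parse_rx_drops(proc_net_dev_text):
    """Return {interface: receive drop count} from /proc/net/dev-style text.

    Hypothetical helper for illustration; not part of libpcap or net-tools.
    """
    drops = {}
    for line in proc_net_dev_text.splitlines():
        # The two header lines have no colon; interface lines do.
        if ":" not in line:
            continue
        name, _, counters = line.partition(":")
        fields = counters.split()
        if len(fields) < 4:
            continue
        # Receive columns are: bytes, packets, errs, drop, ...
        drops[name.strip()] = int(fields[3])
    return drops


# Illustrative sample. The eth3 line has no space after the colon, which can
# shift awk's whitespace-based $5 but is handled by splitting on ":" first.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
   br0: 5000000   40000    0 3354    0     0          0         0   200000    2000    0    0    0     0       0          0
  eth3:9000000   70000    0   14    0     0          0         0        0       0    0    0    0     0       0          0
"""

if __name__ == "__main__":
    for iface, dropped in parse_rx_drops(SAMPLE).items():
        print(iface, dropped)
```

  On a live Linux box, pass in the contents of /proc/net/dev instead of SAMPLE and poll it during a capture to see whether the interface counter moves along with tcpdump's "dropped by kernel" number.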