Nfsen + Hadoop
Vytautas Krakauskas
LITNET CERT
Swedbank SIRT

Problems
• Limited storage capacity
• Large data set processing time

Storage capacity
• Steadily increasing network traffic
• Up to six months of history for incident handling
• I/O is the major bottleneck

Processing time
• Currently no SMP support in nfdump
• Important if I/O bottleneck is resolved

Processing with Nfdump

Distributed processing

The idea
1. Distribute nfcap files between multiple nodes
2. Process the files using nfdump
3. Combine the output and return to nfsen
4. Nfsen and nfdump usage should feel the same

1. File distribution
• nfcapd stores files on a temporary file system
  – due to "random" write of stat header
  – copy to HDFS at the end of each interval
  – bonus: limited backup while system is being tested
• Redundant copies on multiple nodes
  – higher redundancy for faster processing and better reliability
  – lower redundancy for larger storage capacity

Modified architecture

2. Processing
• Process using nfdump
  – I/O through stdin/stdout
• Each node works only with locally stored files
  – Currently based on the first block
• Aggregate when possible based on:
  – stats type, aggregation options, filters
• Copy the results back to the HDFS for the combiner

3. Combining
• Combine the results as a single stream
  – a custom tool (nfcat)
  – some information is lost (e.g. ident)
• nfdump does the final processing
  – single instance (a bottleneck)
• Displays the results

Modified architecture

Comparison
• Limited to nfdump
  – Additional delays when using nfsen
• Original
  – single nfdump instance
  – files on a local file system
• Distributed
  – Two nodes
  – processes per node: 2
  – HDFS replication factor: 2

Comparison
• Top10 IPs, ordered by flows
• 1-18 files (5-90 minute period)
• Filter “proto icmp”

Comparison
[Figure: processing time (seconds) vs. netflow period (minutes, 5 to 90), Original vs. Distributed. Filter: proto icmp]

Conclusions
• Overhead has a significant impact for short periods
  – Initialization
  – Job scheduling
  – Combining and re-processing
• Limited speed gains due to aggregation
• Filtering is essential for achieving good speed gains
• Still needs some issues to be addressed

Thank you!
• The code
  – https://github.com/vytautas/nfdist
• Patches (nfdist branch)
  – https://github.com/vytautas/nfdump
  – https://github.com/vytautas/nfsen

Comparison: bad case
[Figure: processing time (seconds) vs. netflow period (minutes, 5 to 90), Original vs. Distributed. No filter]
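
Sketch: process on each node, then combine

The "2. Processing" and "3. Combining" steps can be illustrated with a small Python sketch. This is not the nfdist code and does not call nfdump or nfcat. The function names (`node_partial`, `combine_top_n`), the `(source IP, protocol)` record layout and the sample data are all hypothetical. It only shows the general pattern: each node filters and counts its local records, and the top-N cut is applied once, after the partial results are merged.

```python
from collections import Counter
from typing import Iterable, Mapping


def node_partial(flows: Iterable[tuple[str, str]], proto: str) -> Counter:
    """Per-node step: keep records matching the protocol filter and
    count flows per source IP. Records are (src_ip, proto) pairs,
    a made-up layout standing in for real netflow records."""
    counts: Counter = Counter()
    for src_ip, flow_proto in flows:
        if flow_proto == proto:
            counts[src_ip] += 1
    return counts


def combine_top_n(partials: Iterable[Mapping[str, int]], n: int = 10) -> list[tuple[str, int]]:
    """Combiner step: sum the per-node counts, then take the top N.
    The cut happens only here, because an IP that ranks low on every
    single node can still rank high in the combined total."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    # Sort by flow count (descending), then by IP so ties are deterministic.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical records held locally by two nodes.
node_a = [("10.0.0.1", "icmp"), ("10.0.0.2", "icmp"),
          ("10.0.0.1", "tcp"), ("10.0.0.1", "icmp")]
node_b = [("10.0.0.2", "icmp"), ("10.0.0.3", "icmp"),
          ("10.0.0.2", "icmp")]

partials = [node_partial(node_a, "icmp"), node_partial(node_b, "icmp")]
top = combine_top_n(partials, n=2)
print(top)  # [('10.0.0.2', 3), ('10.0.0.1', 2)]
```

In this toy data the filter discards the TCP record on the node before anything is sent to the combiner, so a selective filter reduces the amount of data that the single final step has to handle.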