Nfsen + Hadoop

Vytautas Krakauskas
LITNET CERT
Swedbank SIRT
Problems
• Limited storage capacity
• Large data set processing time
Storage capacity
• Steadily increasing network traffic
• Up to six months of history for incident handling
• I/O is the major bottleneck
Processing time
• nfdump currently has no SMP (multi-core) support
• This becomes important once the I/O bottleneck is resolved
Processing with nfdump
Distributed processing
The idea
1. Distribute nfcapd files between multiple nodes
2. Process the files using nfdump
3. Combine the output and return it to nfsen
4. Nfsen and nfdump usage should feel the same
1. File distribution
• nfcapd stores files on a temporary file system
– needed because the stat header is written "randomly" (rewritten in place), which HDFS does not support
– each file is copied to HDFS at the end of its interval
– bonus: a limited backup while the system is being tested
• Redundant copies on multiple nodes
– higher redundancy: faster processing (more nodes hold a local copy) and better reliability
– lower redundancy: larger storage capacity
Modified architecture
2. Processing
• Process using nfdump
– I/O through stdin/stdout
• Each node works only with locally stored files
– locality is currently determined by where the file's first HDFS block is stored
• Aggregate on the nodes when possible, depending on:
– stats type, aggregation options, filters
• Copy the results back to HDFS for the combiner
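The locality rule can be sketched as a small scheduler. This assumes a mapping from each file to the hosts holding its first HDFS block is already available (for instance collected with `hdfs fsck <path> -files -blocks -locations`); the function name `assign_files`, the least-loaded tie-breaking and the remote-read fallback are assumptions for illustration, since the slides only state that locality is based on the first block.

```python
def assign_files(first_block_hosts: dict[str, list[str]],
                 nodes: list[str]) -> dict[str, list[str]]:
    """Hypothetical scheduler: give each file to one node holding its first block."""
    rank = {node: i for i, node in enumerate(nodes)}
    plan: dict[str, list[str]] = {node: [] for node in nodes}
    for path in sorted(first_block_hosts):
        # Prefer nodes that store the file's first block locally.
        local = [h for h in first_block_hosts[path] if h in plan]
        candidates = local or nodes  # no local copy: fall back to a remote read
        # Least-loaded candidate wins; ties go to the node listed first.
        target = min(candidates, key=lambda n: (len(plan[n]), rank[n]))
        plan[target].append(path)
    return plan


if __name__ == "__main__":
    hosts = {
        "nfcapd.201301010000": ["node1", "node2"],
        "nfcapd.201301010005": ["node1", "node2"],
        "nfcapd.201301010010": ["node1"],
        "nfcapd.201301010015": ["node3"],  # not a known worker: remote read
    }
    print(assign_files(hosts, ["node1", "node2"]))
```

With replication factor 2 on two nodes, every file is local to both nodes, so the rule reduces to load balancing; a lower replication factor leaves fewer local choices and can unbalance the work.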
3. Combining
• Combine the results into a single stream
– using a custom tool (nfcat)
– some information is lost (e.g. ident)
• nfdump does the final processing
– a single instance (a bottleneck)
• Display the results
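Why the single final pass exists can be shown with a simplified model of the combine step. This sketch is not nfcat, which concatenates nfdump record streams; `combine_top_n` is a hypothetical function working on made-up per-node flow counts per IP, and it assumes each node returns its complete aggregate rather than only its own top N.

```python
from collections import Counter
from typing import Iterable, Mapping


def combine_top_n(partials: Iterable[Mapping[str, int]], n: int = 10) -> list[tuple[str, int]]:
    """Sum per-node flow counts per IP, then rank globally (illustrative only)."""
    total = Counter()
    for part in partials:
        total.update(part)  # adds counts for IPs seen on several nodes
    # Highest count first; ties broken by IP string for a stable result.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    node_a = {"10.0.0.1": 5, "10.0.0.2": 4, "10.0.0.3": 1}  # hypothetical counts
    node_b = {"10.0.0.3": 6, "10.0.0.2": 1}
    print(combine_top_n([node_a, node_b], n=2))  # [('10.0.0.3', 7), ('10.0.0.1', 5)]
```

Merging per-node top-1 lists instead would report 10.0.0.3 with 6 flows (node B alone) and miss the flow node A saw, which is why partial results have to be re-aggregated in one place.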
Modified architecture
Comparison
• Measurements cover nfdump only
– nfsen adds further delays
• Original
– single nfdump instance
– files on a local file system
• Distributed
– Two nodes
– processes per node: 2
– HDFS replication factor: 2
Comparison
• Top 10 IPs, ordered by flows
• 1-18 files (5-minute files, covering 5-90 minutes)
• Filter “proto icmp”
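The benchmark query can be approximated as below. The slides do not show the exact command line; this sketch assumes nfcapd's default 5-minute rotation (consistent with 18 files for 90 minutes) and nfdump's `-r`/`-R` input options, `-s ip/flows` statistics and `-n` top-N option. The function names, directory and start time are hypothetical.

```python
from datetime import datetime, timedelta

INTERVAL_MIN = 5  # assumed nfcapd rotation interval in minutes (nfcapd default)


def files_for_period(start: datetime, minutes: int) -> list[str]:
    """Hypothetical helper: nfcapd file names covering `minutes`, one per interval."""
    if minutes <= 0 or minutes % INTERVAL_MIN:
        raise ValueError("period must be a positive multiple of the interval")
    names = []
    for i in range(minutes // INTERVAL_MIN):
        ts = start + timedelta(minutes=INTERVAL_MIN * i)
        names.append(f"nfcapd.{ts:%Y%m%d%H%M}")
    return names


def top_ips_command(data_dir: str, files: list[str], flow_filter: str = "proto icmp") -> list[str]:
    """Hypothetical helper: nfdump argv for the top 10 IPs ordered by flows."""
    if not files:
        raise ValueError("no files selected")
    base = data_dir.rstrip("/")
    if len(files) == 1:
        source = ["-r", f"{base}/{files[0]}"]
    else:
        # -R dir/first:last reads the whole sequence of files in one directory
        source = ["-R", f"{base}/{files[0]}:{files[-1]}"]
    return ["nfdump", *source, "-s", "ip/flows", "-n", "10", flow_filter]


if __name__ == "__main__":
    files = files_for_period(datetime(2013, 1, 1), 90)
    print(len(files))  # 18
    print(top_ips_command("/data/nfcapd", files))
```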
Comparison
[Figure: processing time (seconds, 0-40) vs. NetFlow period (5-90 minutes), Original vs. Distributed; filter: proto icmp]
Conclusions
• Overhead has a significant impact on short periods
– Initialization
– Job scheduling
– Combining and re-processing
• Aggregation limits the speed gains
• Filtering is essential for achieving good speed gains
• Some issues still need to be addressed
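The overhead conclusion can be made concrete with a rough cost model. This is a simplification for illustration, not a fit to the measurements: $t$ is the nfdump time per file, $n$ the number of files, $p$ the number of parallel nfdump processes (2 nodes × 2 processes = 4 in the test setup), $c_0$ the fixed overhead of initialization and job scheduling, and $c_m(n)$ the cost of combining and re-processing.

```latex
\begin{align}
T_{\mathrm{orig}}(n) &\approx t\,n \\
T_{\mathrm{dist}}(n) &\approx c_0 + \frac{t\,n}{p} + c_m(n) \\
T_{\mathrm{dist}}(n) < T_{\mathrm{orig}}(n)
  &\iff t\,n\left(1 - \frac{1}{p}\right) > c_0 + c_m(n)
\end{align}
```

If $c_m(n)$ is small, the distributed setup only wins for $n > c_0 / \bigl(t(1 - 1/p)\bigr)$, i.e. $n > 4c_0/(3t)$ for $p = 4$. Filtering shrinks the output passed to the single final nfdump and therefore $c_m(n)$, which fits the observation that filtering matters for speed gains.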
Thank you!
• The code
– https://github.com/vytautas/nfdist
• Patches (nfdist branch)
– https://github.com/vytautas/nfdump
– https://github.com/vytautas/nfsen
Comparison: bad case
[Figure: processing time (seconds, 0-100) vs. NetFlow period (5-90 minutes), Original vs. Distributed; no filter]