Autograph Toward Automated, Distributed Worm Signature Detection Hyang-Ah Kim Brad Karp Carnegie Mellon University Intel Research & Carnegie Mellon University Usenix Security 2004 Internet Worm Quarantine Internet Worm Quarantine Techniques Destination port blocking Infected source host IP blocking Content-based blocking [Moore et al., 2003] Worm Signature 05:45:31.912454 90.196.22.196.1716 > 209.78.235.128.80: . 0:1460(1460) ack 1 Signature for CodeRed II win 8760 (DF) 0x0000 4500 05dc 84af 4000 6f06 5315 5ac4 16c4 E.....@.o.S.Z... 0x0010 d14e eb80 06b4 0050 5e86 fe57 440b 7c3b .N.....P^..WD.|; 0x0020 5010 2238 6c8f 0000 4745 5420 2f64 6566 P."8l...GET./def 0x0030 6175 6c74 2e69 6461 3f58 5858 5858 5858 ault.ida?XXXXXXX 0x0040 5858 5858 5858 5858 5858 5858 5858 5858 XXXXXXXXXXXXXXXX . . . . . 0x00e0 5858 5858 5858 5858 5858 5858 5858 5858 XXXXXXXXXXXXXXXX 0x00f0 5858 5858 5858 5858 5858 5858 5858 5858 XXXXXXXXXXXXXXXX : A Payload Content String Specific To A Worm 0x0100 5858 5858 5858 5858 5858 5858 5858 5858 XXXXXXXXXXXXXXXX 0x0110 5858 5858 5858 5858 5825 7539 3039 3025 XXXXXXXXX%u9090% 0x01a0 303d 6120 4854 5450 2f31 2e30 0d0a 436f 0=a.HTTP/1.0..Co . Signature Usenix Security 2004 2 Content-based Blocking Signature for CodeRed II Internet Traffic Filtering X Our network Can be used by Bro, Snort, Cisco’s NBAR, ... Usenix Security 2004 3 Signature derivation is too slow Current Signature Derivation Process New worm outbreak Report of anomalies from people via phone/email/newsgroup Worm trace is captured Manual analysis by security experts Signature generation Labor-intensive, Human-mediated Usenix Security 2004 4 Goal Automatically generate signatures of previously unknown Internet worms as accurately as possible Content-Based Analysis as quickly as possible Automation, Distributed Monitoring Usenix Security 2004 5 Assumptions We focus on TCP worms that propagate via scanning Actually, any transport in which spoofed sources cannot communicate successfully in which transport framing is known to monitor Worm’s payloads share a common substring Vulnerability exploit part is not easily mutable Not polymorphic Usenix Security 2004 6 Outline Problem and Motivation Automated Signature Detection Distributed Signature Detection Desiderata Technique Evaluation Tattler Evaluation Related Work Conclusion Usenix Security 2004 7 Desiderata Automation: Minimal manual intervention Signature quality: Sensitive & specific Sensitive: match all worms low false negative rate Specific: match only worms low false positive rate Timeliness: Early detection Application neutrality Broad applicability Usenix Security 2004 8 Automated Signature Generation Internet Traffic Filtering Autograph Monitor Our network X Signature Step 1: Select suspicious flows using heuristics Step 2: Generate signature using contentprevalence analysis Usenix Security 2004 9 S1: Suspicious Flow Selection Reduce the work by filtering out vast amount of innocuous flows Heuristic: Flows from scanners are suspicious Focus on the successful flows from IPs who made unsuccessful connections to more than s destinations for last 24hours Suitable heuristic for TCP worm that scans network Suspicious Flow Pool Autograph (s = 2) Holds reassembled, suspicious flows captured during the last Non-existent time period t Triggers signature generation if there areNon-existent more than flows This flow will be selected Usenix Security 2004 10 S1: Suspicious Flow Selection Reduce the work by filtering out vast amount of innocuous flows Heuristic: Flows from scanners are suspicious Focus on the successful flows from IPs who made unsuccessful connections to more than s destinations for last 24hours Suitable heuristic for TCP worm that scans network Suspicious Flow Pool Holds reassembled, suspicious flows captured during the last time period t Triggers signature generation if there are more than flows Usenix Security 2004 11 S2: Signature Generation Use the most frequent byte sequences across suspicious flows as signatures All instances of a worm have a common byte pattern specific to the worm Rationale Worms propagate by duplicating themselves Worms propagate using vulnerability of a service How to find the most frequent byte sequences? Usenix Security 2004 12 Worm-specific Pattern Detection Use the entire payload Brittle to byte insertion, deletion, reordering Flow 1 GARBAGEEABCDEFGHIJKABCDXXXX Flow 2 GARBAGEABCDEFGHIJKABCDXXXXX Usenix Security 2004 13 Worm-specific Pattern Detection Partition flows into non-overlapping small blocks and count the number of occurrences Fixed-length Partition Still brittle to byte insertion, deletion, reordering Flow 1 GARBAGEEABCDEFGHIJKABCDXXXX Flow 2 GARBAGEABCDEFGHIJKABCDXXXXX Usenix Security 2004 14 Worm-specific Pattern Detection Content-based Payload Partitioning (COPP) Partition if Rabin fingerprint of a sliding window matches Breakmark Content Blocks Configurable parameters: content block size (minimum, average, maximum), breakmark, sliding window Flow 1 GARBAGEEABCDEFGHIJKABCDXXXX Flow 2 GARBAGEABCDEFGHIJKABCDXXXXX Breakmark = last 8 bits of fingerprint (ABCD) Usenix Security 2004 15 Why Prevalence? Prevalence Distribution in Suspicious Flow Pool - From 24-hr http traffic trace Nimda CodeRed2 Nimda (16 different payloads) WebDAV exploit Innocuous, misclassified Worm flows dominate in the suspicious flow pool Content-blocks from worms are highly ranked Usenix Security 2004 16 Select Most Frequent Content Block f0 C f1 C D G f2 A B D f3 A C E f4 A B E f5 A B D f6 H I f7 I H J f8 G F I J J Usenix Security 2004 17 Select Most Frequent Content Block f0 f0 C F f1 C D G f2 A B D f3 A C E E f4 A B E B E B D f5 f6 A B D H I J f7 I H J f8 G I J C F f1 C D G f2 A B D f3 A C f4 A f5 A A f6 B HC DI I J J A A H IJ J f7 B IC D A f8 B GC DI I J J E G H E G H F Usenix Security 2004 18 Select Most Frequent Content Block Signature: W≥90% W: target coverage in suspicious flow pool P: minimum occurrence to be selected A P≥3 A B C D I J A B C D I J E G H A B C D I J E f0 C F f1 C D G f2 A B D f3 A C E f4 A B E f5 f6 A B D H I J f7 I H J f8 G I J G H F Usenix Security 2004 19 Select Most Frequent Content Block Signature: A W≥90% W: target coverage in suspicious flow pool P: minimum occurrence to be selected A P≥3 A B C D I J A B C D I J E G H A B C D I J E f0 C F f1 C D G f2 A B D f3 A C E f4 A B E f5 f6 A B D H I J f7 I H J f8 G I J G H F Usenix Security 2004 20 Select Most Frequent Content Block Signature: A W≥90% W: target coverage in suspicious flow pool P: minimum occurrence to be selected A P≥3 A B C D I J A B C D I J E G H A B C D I J E f0 C F f1 C D G f2 A B D f3 A C E f4 A B E f5 f6 A B D H I J f7 I H J f8 G I J G H F Usenix Security 2004 21 Select Most Frequent Content Block Signature: A I W≥90% W: target coverage in suspicious flow pool P: minimum occurrence to be selected P≥3 I J I J C G H I J C G H D F Usenix Security 2004 f0 C F f1 C D G f2 A B D f3 A C E f4 A B E f5 f6 A B D H I J f7 I H J f8 G I J 22 Select Most Frequent Content Block Signature: A I W≥90% W: target coverage in suspicious flow pool P: minimum occurrence to be selected P≥3 f0 C F f1 C D G f2 A B D f3 A C E f4 A B E f5 f6 A B D H I J f7 I H J f8 G I J C C G D F Usenix Security 2004 23 Outline Problem and Motivation Automated Signature Detection Distributed Signature Detection Desiderata Technique Evaluation Tattler Evaluation Related Work Conclusion Usenix Security 2004 24 Behavior of Signature Generation Objectives Metrics Effect of COPP parameters on signature quality Sensitivity = # of true alarms / total # of worm flows false negatives Efficiency = # of true alarms / # of alarms false positives Trace Contains 24-hour http traffic Includes 17 different types of worm payloads Usenix Security 2004 25 Signature Quality Larger block sizes generate more specific signatures A range of w (90-95%, workload dependent) produces a good signature Usenix Security 2004 26 Outline Problem and Motivation Automated Signature Detection Distributed Signature Detection Desiderata Technique Evaluation Tattler Evaluation Related Work Conclusion Usenix Security 2004 27 Signature Generation Speed Bounded by worm payload accumulation speed Aggressiveness of scanner detection heuristic s: # of failed connection peers to detect a scanner # of payloads enough for content analysis : suspicious flow pool size to trigger signature generation Single Autograph Worm payload accumulation is slow Distributed Autograph Share scanner IP list Tattler: limit bandwidth consumption within a predefined cap A A A A tattler A Internet A A Usenix Security 2004 28 Benefit from tattler Worm payload accumulation (time to catch 5 worms) Many innocuous misclassified Fraction of Infected Hosts Info Autograph Aggressive flows Conservative Sharing Monitor None Luckiest Median (s = 1) 2% 25% (s = 4) 60% -- Tattler All <1% 15% Signature generation More aggressive scanner detection (s) and signature generation trigger () faster signature generation, more false positives With s=2 and =15, Autograph generates the good worm signature before < 2% hosts get infected Usenix Security 2004 29 Related Work Automated Worm Signature Detection EarlyBird HoneyComb [Singh et al. 2003] [Kreibich et al. 2003] Signature Generation Content prevalence Address Dispersion Honeypot + Pairwise LCS Suspicious flow selection Content prevalence Deployment Network Host Network Flow Reassembly No Yes Yes Distributed Monitoring No No Yes Autograph Distributed Monitoring Honeyd[Provos2003], DOMINO[Yegneswaran et al. 2004] Corroborate faster accumulation of worm payloads/scanner IPs Usenix Security 2004 30 Future Work Attacks Online evaluation with diverse traces & deployment on distributed sites Broader set of suspicious flow selection heuristics Overload Autograph Abuse Autograph for DoS attacks Non-scanning worms (ex. hit-list worms, topological worms, email worms) UDP worms Egress detection Distributed agreement for signature quality testing Trusted aggregation Usenix Security 2004 31 Conclusion Stopping spread of novel worms requires early generation of signatures Autograph: automated signature detection system Automated suspicious flow selection→ Automated content prevalence analysis COPP: robustness against payload variability Distributed monitoring: faster signature generation Autograph finds sensitive & specific signatures early in real network traces Usenix Security 2004 32 For more information, visit http://www.cs.cmu.edu/~hakim/autograph Usenix Security 2004 Attacks Overload due to flow reassembly Solutions Multiple instances of Autograph on separate HW (port-disjoint) Suspicious flow sampling under heavy load Abuse Autograph for DoS: pollute suspicious flow pool Port scan and then send innocuous traffic Solution Distributed verification of signatures at many monitors Source-address-spoofed port scan Solution Reply with SYN/ACK on behalf of non-existent hosts/services Usenix Security 2004 34 Number of Signatures Smaller block sizes generate small # of signatures Usenix Security 2004 35 tattler A modified RTCP (RTP Control Protocol) Limit the total bandwidth of announcements sent to the group within a predetermined cap Usenix Security 2004 36 Simulation Setup About 340,000 vulnerable hosts from about 6400 ASes Took small size edge networks (/16s) based on BGP table of 19th of July, 2001. Service deployment Scanning 50% of address space within the vulnerable ASes is reachable 25% of reachable hosts run web server 340,000 vulnerable hosts are randomly placed. 10probes per second Scanning the entire non-class-D IP address space Network/processing delays Randomly chosen in [0.5, 1.5] seconds Usenix Security 2004 37