Hash-Based IP Traceback
Alex C. Snoeren+, Craig Partridge, Luis A. Sanchez++, Christine E. Jones, Fabrice Tchakountio, Stephen T. Kent, and W. Timothy Strayer
BBN Technologies; +MIT Laboratories; ++Megisto Systems
Published SIGCOMM 2001

Who is attacking?

IP Traceback
• Trace the path of IP packet(s) to their source
• Why is this difficult?
  • IP networks are stateless
  • Spoofed source addresses
  • Many administration layers

Approach: Log-Based Traceback
[Figure: network of routers (R, R1 to R7) connecting attacker A to victim V]

Logging Challenges
• Attack path reconstruction is difficult
  • Packet may be transformed as it moves through the network
• Full packet storage is problematic
  • Memory requirements are prohibitive at high line speeds (OC-192 is ~10 Mpkt/sec)
• Extensive packet logs are a privacy risk
  • Traffic repositories may aid eavesdroppers

Source Path Isolation Engine (SPIE)
Goals
• Trace a single IP packet back to its source
  • Asymmetric attacks (e.g. Fraggle, Teardrop, ping-of-death)
• Minimal cost (resource usage)
• Maintain privacy (prevent eavesdropping)
• Robustness (min. false pos., no false neg.)

Assumptions
• Network:
  • Packets can be addressed to 1+ hosts (multicast, broadcast)
  • Duplicate packets may exist in the network
  • Router infrastructure is unstable
• Attacker:
  • Aware of Traceback mechanisms
  • Routers may be subverted
• Mechanism:
  • Packet size should not grow due to Traceback

Goals
• Find attack graph for single packet
• Minimal cost (resource usage)
• Maintain privacy (prevent eavesdropping)
• Robustness (min. false pos., no false neg.)

SPIE Architecture
• DGA: Data Generation Agent. Computes and stores digests of each packet on the forwarding path.
  • Deploy one DGA per router
• SCAR: SPIE Collection and Reduction agent
  • Long-term storage for needed packet digests
  • Assembles attack graph for local topology
• STM: SPIE Traceback Manager
  • Interfaces with IDS
  • Verifies integrity and authenticity of Traceback call
  • Sends requests to SCAR for local graphs
  • Assembles attack graph from SCAR input

[Figure: SPIE message flow among the IDS, the STM, and the DGAs/routers, with numbered steps]
1: IDS identifies attack packet
2: Sends Packet, Time, Last Hop
3: Authenticates and verifies IDS request
4: Provisions SCARs to collect local DGA digests
5: Collect digest tables, time intervals, hash functions
6: Identify routers with Packet's digest and construct graph
7: Collect SCAR local graphs
8: Assemble local graphs, query for missing info
9: Send attack graph to IDS

Data Generation Agents
• Compute "packet digest"
• Store in Bloom filter
• Flush filter every time interval, t

Packet Digests
• Compute hash(p)
  • Invariant fields of p only
  • 28 bytes hash input, 0.00092% WAN collision rate
  • Fixed-size hash output, n bits
• Compute k independent digests
  • Increased robustness
  • Reduced collisions, reduced false positive rate

Hash input: Invariant Content
[Figure: IP packet layout showing Ver, HLen, TOS, Total Length, Identification, DF/MF flags, Fragment Offset, TTL, Protocol, Checksum, Source Address, Destination Address, Options, First 8 bytes of Payload, Remainder of Payload; the hash input is marked as 28 bytes]

Hashing Properties
• Each hash function
  • Uniform distribution of input -> output
  • H1(x) = H1(y) for some x, y -> unlikely
• Use k independent hash functions
  • Collisions among the k functions are independent
  • H1(x) = H2(y) for some x, y -> unlikely
• Cycle k functions every time interval, t

Digest Storage: Bloom Filters
• Insertion
  • Use each n-bit digest as an index into the bit array
  • Set the indexed bit to '1'
• Membership
  • Compute k digests, d1, d2, etc.
  • If filter[di] = 1 for all i, the router forwarded the packet
[Figure: packet P hashed by H1(P), H2(P), ... into n-bit digests, each setting one bit in a 2^n-bit array]
• Fixed structure size
  • Uses a 2^n-bit array
  • Initialized to zeros

[Figure: Hash-Based IP Traceback overview. The 28-byte hash input from the packet header and first 8 bytes of payload is hashed by each DGA with H1(P), H2(P), H3(P), ..., Hk(P) into n-bit digests that set bits in a 2^n-bit Bloom filter; the DGAs report to a SCAR]

SPIE Collection and Reduction Agent
• Polls DGAs for digest tables, hash functions, time intervals
  • Time-critical operation
• Constructs local attack graph
  • Reverse Path Flooding
  • For each router,
    • Compute the k hashes of p with the router's local hash functions
    • Membership test (table[hi(p)] == 1 for all i)
• Sends result to STM

SPIE Traceback Manager
• Interface to IDS
  • Receives attack signature for p
  • Returns attack graph
• Authenticates/Verifies (no details)
• Provisions SCARs
  • Send(packet, last hop router, arrival time)
• Assembles local graphs
  • Fills holes in graph

Memory utilization
• A Bloom filter is described in terms of:
  • Number of digest/hash functions (k)
  • The ratio of data items to be stored (n) to memory capacity (m)
• The effective false positive rate (P) for a Bloom filter that uses m bits of memory to store n packets with k digest functions is given by:

\[ P = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k} \]

SPIE Performance
• Local false positive rate (n, k, b)
• Length of time digests are stored (t)
  • IDS -> STM -> SCAR -> DGA
• Accuracy of attack graphs
  • Derived from local false positive rates
  • Network topology
    • Why?

Conclusion
• Find attack graph for single packet
  • Log every packet at every router
• Minimal cost (resource usage)
  • Store fixed-size hash(p), not p
  • 0.05% link bandwidth per unit time
  • Distribute graph creation (attack sub-graphs)
• Maintain privacy (prevent eavesdropping)
  • Authenticate Traceback (IDS -> STM call)
  • No header fields stored
• Robustness (min. false pos., no false neg.)?

Packet Marking vs. Packet Logging

Basic method
• Packet marking: routers write their IDs (digests or addresses) in the forwarded packets (deterministic/probabilistic)
• Packet logging: IP packet information (signatures) is written into the router's buffer (deterministic/probabilistic)

Number of attack packets needed to infer an attack path
• Packet marking: a large number of attack packets (probabilistic); a single attack packet (deterministic)
• Packet logging: same as packet marking

Overhead
• Packet marking: no buffer overhead at routers, but high packet overhead; router CPU overhead for marking
• Packet logging: high buffer overhead at routers, but no packet overhead; router CPU overhead for logging

Collecting path information
• Packet marking: not a big issue, i.e., can be done using the attack packets
• Packet logging: coordination among routers required

Examples
• Packet marking: Probabilistic Packet Marking
• Packet logging: Hash-based Traceback
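
Sketch: a DGA digest table

The digest storage described in the "Digest Storage: Bloom Filters" and "Memory utilization" slides can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPIE implementation: the class name DigestTable, the use of salted BLAKE2b as the k independent hash functions, and the helper false_positive_rate are all hypothetical choices made here. The caller is assumed to pass the 28-byte invariant hash input already extracted from the packet.

```python
import hashlib
import math
import os


class DigestTable:
    """Bloom filter over packet digests, as one DGA might keep per time interval."""

    def __init__(self, n_bits: int, k: int):
        # n_bits is the digest width n, so the bit array holds 2**n bits.
        self.size = 1 << n_bits
        self.bits = bytearray((self.size + 7) // 8)  # initialized to zeros
        # Assumption: k independent hash functions are modeled as one hash
        # (BLAKE2b) with k random salts. Building a new table draws new
        # salts, which stands in for cycling the functions every interval t.
        self.salts = [os.urandom(16) for _ in range(k)]

    def _digests(self, invariant: bytes):
        """Yield the k n-bit digests of the invariant packet content."""
        for salt in self.salts:
            h = hashlib.blake2b(invariant, digest_size=8, salt=salt).digest()
            yield int.from_bytes(h, "big") & (self.size - 1)

    def insert(self, invariant: bytes) -> None:
        """Insertion: use each digest as an index and set that bit to 1."""
        for d in self._digests(invariant):
            self.bits[d >> 3] |= 1 << (d & 7)

    def contains(self, invariant: bytes) -> bool:
        """Membership: the packet was (probably) forwarded if all k bits are 1."""
        return all(self.bits[d >> 3] & (1 << (d & 7))
                   for d in self._digests(invariant))


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate rate (1 - e^(-kn/m))^k for m bits, n packets, k digests."""
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    table = DigestTable(n_bits=16, k=3)
    packet = bytes(28)  # placeholder for a 28-byte invariant hash input
    table.insert(packet)
    print(table.contains(packet))  # True: a Bloom filter has no false negatives
    print(false_positive_rate(m=2**16, n=1000, k=3))
```

A stored packet is always reported as present, which matches the "no false negatives" goal; false positives become more likely as the number of stored packets n grows relative to the table size m, which is why the filter is flushed every interval.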