Defending Against Sybil Attacks
Paul Parker
Advisor: Shouhuai Xu

Talk Outline
- Intro and Motivation
- Problem Definition
- Existing Work
- Intended Approach
- Results So Far

P2P and Other Self-Organizing Networks
- Backup
- File Sharing
- Distributed Computation
- Distributed File Systems
  - Farsite
  - GFS
- Organic GRID

Sybil Attack

Why Use a Sybil Attack?
- disruption
- for-profit motives: RIAA [drop?]
- disproportionate access to resources (computation, storage)
- control of the network

Problem Definition
- Detect creation of multiple node identities from a single physical node, without a central certifying authority

Existing Work: Is Preventing Sybil Attacks Possible?
- John Douceur, Microsoft Research
- “The Sybil Attack”, IPTPS '02 (First International Workshop on Peer-to-Peer Systems; revised paper 2002)
- named and introduced the problem
- strong negative theoretical results for networks without a centralized authority

Douceur’s Assumptions
- set of entities (i.e., nodes)
- synchronous broadcast cloud
- message:
  - received by all entities within bounded time
  - finite-length bit string
- no direct links between entities (“form of centrally supplied authentication”)
- identity: abstraction that persists across multiple communication events
- Assumptions meant to be extremely general

Douceur’s Model
- Entity behavior:
  - correct entities will present 1 legitimate identity
  - faulty entities will present 1 legitimate identity and ≥ 1 counterfeit identity
- How could we possibly verify identities?
  - Assume attacker has limited resources
  - Distinguish identities via resource-consumption challenge:
    - CPU
    - storage
    - network bandwidth
  - Example: simultaneously issue to all claimed identities a puzzle that takes 1 second for a 1 GHz computer to solve

Douceur’s Lemmas
- Direct validation:
  1. Any faulty entity f can present as many distinct identities as the ratio of its power to the minimal power
     e.g., a 3 GHz CPU could present 3 identities at a 1 GHz minimum
  2. If an entity l accepts identities that are not validated simultaneously, a single f can present arbitrarily many distinct identities to l
     e.g., a 1 GHz computer could present 3 identities over 3 seconds
- Indirect validation:
  3. If an entity l accepts identities vouched for by q accepted identities, then a set F of faulty entities can present arbitrarily many identities to l if |F| > q or F has at least q + |F| resources
  4. Without simultaneous challenges, even a minimally capable entity f can present |C|/q distinct identities to l (C: set of correct entities)
- Possibly not actual proofs, but very closely reasoned

Douceur’s Conclusion
- “attacks always possible except under extreme and unrealistic assumptions of resource parity and coordination among entities”
- i.e., to prevent attacks we must assume:
  - all entities have nearly identical capabilities
  - all presented identities are simultaneously checked by all entities across the entire system
- therefore, in heterogeneous real systems such as the Internet, Sybil attacks are always possible

Existing Work: New Ideas
- On the Establishment of Distinct Identities in Overlay Networks, Bazzi & Konjevod, PODC 2005
- establishing pairwise distinctness is often helpful
- distinctness test yields true or unknown
- Douceur abstracted out potentially helpful details
- real networks are physically embedded in geometric spaces

BK2005 Assumptions
- actual distance between 2 entities approximately satisfies metric properties:
  - symmetry (bc = cb)
  - definiteness (ab exists)
  - triangle inequality (ab + bc ≥ ac)
- sending a message to and from 2 entities (Round-Trip Time) is no faster than a function of the actual distance
[Figure: triangle with vertices a, b, c; edge labels 5, 4, 3, 3]

BK2005 Example: Using Latency to Distinguish Nodes
[Figure: trusted beacons A and B; unverified nodes C and D, each marked “?”; RTT labels 100 ms, 30 ms, 30 ms]
- A and B sign certificates for C and D
- Practical technique
- Assumptions:
  - triangle inequality holds (c ≤ a + b)
  - occasional network quiescence

More BK2005 Assumptions
- Euclidean or spherical geometry can model RTT distances:
  - i.e., nodes can be embedded into Euclidean space R^d or spherical space S^d with little or no error on RTT distance
  - Hence RTT distances have metric properties
  - Note: similar to assuming efficient routing
- Limited number of corrupt beacons
- Asynchronous unreliable network
  - over long periods of time, occasional quiescence will allow synchrony and reliability
  - these allow computing the distance between beacons
- Broadcast or point-to-point message models

BK2005 Theorems
Can certify distinctness in the presence of:
- trusted beacons:
  - corrupt applicant (in convex hull in R^d, or anywhere in S^d)
  - multiple colluding entities for broadcast
  - up to d multiple colluding entities for point-to-point (d = dimensionality of space)
- up to f corrupt beacons, with at least f+d+1 correct ones:
  - one corrupt applicant or multiple colluding corrupt applicants

BK2005 Conclusions
- can prevent Sybil attacks via geometric distinctness certification (given assumptions)
- nice theoretical results
- translation to the real world requires significant investigation
  - “a lot more work” to make this of “more practical value”
  - generalization of the first example “has a good chance of leading to solutions that can be used in practice”

Existing Work: Another Idea
- Remote physical device fingerprinting
  - Kohno, Broido, and claffy, UCSD, IEEE S&P 2005 (“Oakland”)
- Computers have clocks
  - quartz crystal resonant frequency is a function of size
  - frequency varies slightly between typical crystals
- Skew is the first derivative of clock offset (“fast” or “slow”ness of the clock)
- Time reported by the OS varies with hardware skew and OS factors
- Thus, a particular skew distinguishes a computer

Kohno et al Details
- TCP spec includes the TCP Timestamps option (TSOpt)
- TCP stack inserts a timestamp when sending a packet
- Clock skew can be estimated by observing these timestamps over time
- Thus, fingerprint a remote physical device by observing its TCP streams
- 5-6 bits of entropy (“distinctness”)
- TSOpt field can be disabled or scrubbed

Our Intended Approach
- Provisional: Do BK experiments
- Combine multiple approaches (intelligently) (some of mine are new proposals)
  - BK2005 approach
  - Kohno et al approach
  - neighborhood
  - memory latency
  - computational puzzles
  - OS fingerprinting

PlanetLab (to use for BK testbed)
- Worldwide research overlay network
- More than 600 nodes at 300 sites
- www.planet-lab.org

Planned BK Experiments (so far)
- Data-based experiments:
  - Test triangle inequality (to within a margin)
  - Test technique applicability
- Actual experiments:
  - Testbed for trying technique

Results So Far: Analyzing Triangle Inequality with PlanetLab
- Nodes: 481
- Theoretically possible triangles: 110,591,520
- Number with 3 sides: 50,963,180

                               % of theoretical   % of 3-sided
  Obeying triangle inequality        8.94             19.50
  Almost obeying (within 15%)       23.73             51.49
  Approximately obeying             32.67             70.89

Results So Far: Timestamping and OS Fingerprinting

                        Streams with   w/ Timestamp   w/ Round     w/ Reasonable
                        Timestamp      and p0f        clock skew   skew and p0f
  Web trace             31636          23151          8858         4649
  Abilene III dataset   14050          480            3360         110

- p0f: passive OS fingerprinting tool
- Issues:
  - p0f ran in less accurate SYN+ACK mode for the Web trace, because the host initiated all connections
  - intersection of p0f result and reasonable clock skew is low (15% even on the Web trace)
  - good timestamping data is hard to obtain
    - most traces truncate the TCP header before TSOpt
- Implications:
  - OS fingerprinting is probably a secondary technique (also because it can be faked)
  - Timestamping didn’t work well on brief streams (not enough data?)

Directions for Future Work
- Can a self-organizing network automatically defend against Sybil attacks, without starting from a set of trusted nodes?
- Can we provide identities for nodes, or merely distinguish them?
- If not, how much distinguishability can we provide?

Questions?
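
Backup: sketch of the triangle-inequality test. The data-based experiment on the PlanetLab slide counts node triples, keeps those with all three RTTs measured, and checks the triangle inequality to within a margin. The sketch below is one plausible way to do that, not the code behind the reported numbers. The names `classify_triangle` and `tally`, the symmetric `rtt` dictionary keyed by node pairs, and the reading of "within 15%" as "longest side at most 1.15 times the sum of the other two" are all assumptions. Triples are counted as ordered, which is consistent with 481 × 480 × 479 = 110,591,520 on the slide.

```python
from itertools import permutations


def classify_triangle(ab, bc, ac, margin=0.15):
    """Classify one triple of RTTs against the triangle inequality.

    Assumption: "almost obeying" means the longest side exceeds the sum of
    the other two by no more than `margin` (15% by default).
    """
    longest = max(ab, bc, ac)
    others = ab + bc + ac - longest
    if longest <= others:
        return "obeys"
    if longest <= others * (1 + margin):
        return "almost"
    return "violates"


def tally(rtt, nodes, margin=0.15):
    """Count ordered node triples by how well their RTTs obey the inequality.

    `rtt` maps frozenset({x, y}) to a measured round-trip time (hypothetical
    layout; RTT is assumed symmetric). Pairs with no measurement are absent.
    """
    counts = {"possible": 0, "three_sided": 0,
              "obeys": 0, "almost": 0, "violates": 0}
    for a, b, c in permutations(nodes, 3):
        counts["possible"] += 1
        sides = [rtt.get(frozenset(pair)) for pair in ((a, b), (b, c), (a, c))]
        if None in sides:
            continue  # at least one side was never measured
        counts["three_sided"] += 1
        counts[classify_triangle(*sides, margin=margin)] += 1
    return counts


if __name__ == "__main__":
    # Toy data in milliseconds; the C-D pair is deliberately unmeasured.
    rtt = {
        frozenset("AB"): 100, frozenset("AC"): 30, frozenset("BC"): 80,
        frozenset("AD"): 30, frozenset("BD"): 60,
    }
    print(tally(rtt, "ABCD"))
```

Dividing the "obeys" and "almost" counts by "possible" or by "three_sided" gives the two percentage columns used in the results table.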
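
Backup: sketch of clock-skew estimation from TCP timestamps. The Kohno et al slides say skew can be estimated by observing TSOpt values over time. The sketch below illustrates the idea under stated assumptions: each sample is a pair (receiver capture time in seconds, sender TSval in ticks), the sender's timestamp clock frequency `hz` is known, and the slope is fitted with ordinary least squares. Least squares is swapped in here for brevity; the paper itself uses a linear-programming fit that is more robust to network delay. The function name `estimate_skew_ppm` is hypothetical.

```python
def estimate_skew_ppm(samples, hz):
    """Estimate a sender's clock skew in parts per million.

    samples: list of (recv_time_seconds, tsval_ticks), in capture order.
    hz: nominal frequency of the sender's TCP timestamp clock (assumed known).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0, ts0 = samples[0]
    # x: elapsed time at the observer; y: observed offset of the sender's clock.
    xs = [t - t0 for t, _ in samples]
    ys = [(ts - ts0) / hz - x for (_, ts), x in zip(samples, xs)]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples span no time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Slope of offset versus time is the skew; scale to ppm.
    return sxy / sxx * 1e6


if __name__ == "__main__":
    # Synthetic stream: a 100 Hz timestamp clock running 50 ppm fast.
    samples = [(0.0, 0), (1000.0, 100005), (2000.0, 200010), (3000.0, 300015)]
    print(round(estimate_skew_ppm(samples, hz=100), 3))
```

A fit like this needs samples spread over enough time for the offset to grow measurably, which may be one reason brief streams gave poor results.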