Leveraging Social Networks to Defend against Sybil attacks
Krishna Gummadi
Networked Systems Research Group
Max Planck Institute for Software Systems
Germany

Sybil attack
• Fundamental problem in distributed systems
• Attacker creates many fake identities (Sybils)
  – Used to manipulate the system
• Many online services vulnerable
  – Webmail, social networks, p2p
• Several observed instances of Sybil attacks
  – Ex. Content voting tampered on YouTube, Digg

Sybil defense approaches
• Tie identities to resources that are hard to forge or obtain
• RESOURCE 1: Certification from trusted authorities
  – Ex. Passport, social security numbers
  – Users tend to resist such techniques
• RESOURCE 2: Resource challenges (e.g., crypto-puzzles)
  – Vulnerable to attackers with significant resources
  – Ex. Botnets, renting cloud computing resources
• RESOURCE 3: Links in a social network?

Using social networks to detect Sybils
• Assumption: Links to good users hard to form and maintain
  – Users mostly link to others they recognize
• Attacker can only create limited links to non-Sybil users
Leverage the topological feature introduced by sparse set of links

Social network-based Sybil detection
• Very active area of research
  – Many schemes proposed over past five years
• Examples:
  – SybilGuard [SIGCOMM’06]
  – SybilLimit [Oakland S&P ’08]
  – SybilInfer [NDSS’08]
  – SumUp [NSDI’09]
  – Whanau [NSDI’10]
  – MOBID [INFOCOM’10]

But, many unanswered questions
• All schemes make two common assumptions
  – Honest nodes: they are fast mixing
  – Sybils: they do not mix quickly with honest nodes
• But, each uses a different graph analysis algorithm
  – Unclear relationship between schemes
• Is there a common insight across the schemes?
  – Is there a common structural property these schemes rely on?
• Such an insight is necessary to understand
  – How well would these schemes work in practice?
  – Are there any fundamental limitations of Sybil detection?
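
The two mixing assumptions above can be illustrated with a toy calculation. The sketch below is not any of the listed schemes; it only shows why a sparse set of attack links matters. The graph (two 8-node cliques joined by one attack edge), the node names, and the helper functions are all assumptions made for illustration. It computes the exact distribution of a short random walk that starts at a trusted honest node and reports how much probability lands in the Sybil region.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def walk_distribution(adj, start, steps):
    """Exact distribution of a simple random walk after `steps` hops."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, prob in dist.items():
            share = prob / len(adj[node])  # uniform choice among neighbors
            for neighbor in adj[node]:
                nxt[neighbor] += share
        dist = dict(nxt)
    return dist


def clique(nodes):
    """All pairwise edges among `nodes` (a toy 'well-mixed' region)."""
    return [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]


# Hypothetical toy network: honest and Sybil regions are each densely
# connected, but only ONE attack edge joins them.
honest = [f"h{i}" for i in range(8)]
sybils = [f"s{i}" for i in range(8)]
edges = clique(honest) + clique(sybils) + [("h7", "s0")]
adj = build_graph(edges)

# Short walk from a trusted honest node.
dist = walk_distribution(adj, "h0", steps=3)
sybil_mass = sum(p for node, p in dist.items() if node.startswith("s"))
print(f"probability of ending in the Sybil region: {sybil_mass:.4f}")
```

A short walk rarely crosses the single attack edge, so very little probability reaches the Sybil region, even though both regions have the same size. Adding more attack edges to `edges` raises `sybil_mass`, which is the sense in which detection depends on the attacker having only limited links to non-Sybil users.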
Common insight across schemes
• All schemes find local communities around trusted nodes
  – Roughly, set of nodes more tightly knit than surrounding graph
• Accept service from those within the community
  – Block service from the rest of the nodes

Are certain network structures more vulnerable?
[Figure; labels: Trusted Node, Trusted Node]
• When honest nodes divide themselves into multiple communities
  – Cannot tell apart Sybils & non-Sybils in a distant community
• How often do social networks exhibit such community structures?

How often do non-Sybils form one cohesive community?
• Not often!
• Many real-world social networks have high modularity
  – They exhibit multiple well-defined community structures

Facebook Rice undergraduates’ network
• Exhibits densely connected user communities within the graph
• Other social networks have even higher modularity

How often do non-Sybils form one cohesive community?
• Traditional methodology:
  – Analyze several real-world social network graphs
  – Generalize the results to the universe of social networks
• A more scientific method:
  – Leverage insights from sociological theories on communities
  – Test if their predictions hold in online social networks
  – And then generalize the findings

Group attachment theory
• Explains how humans join and relate to groups
• Common-identity based groups
  – Membership based on self interest or ideology
  – E.g., NRA, Greenpeace, and PETA
  – Tend to be loosely-knit and less cohesive
• Common-bond based groups
  – Membership based on inter-personal ties, e.g., family or kinship
  – Tend to form tightly-knit communities within the network

Dunbar’s theory
• Limits the # of stable social relationships a user can have
  – To fewer than a couple of hundred
  – Linked to size of neo-cortex region of the brain
• Observed throughout history since hunter-gatherer societies
• Also observed repeatedly in studies of OSN user activity
  – Users might have a large number of contacts
  – But regularly interact with fewer than a couple of hundred of them
• Limits the size of cohesive common-bond based groups

Prediction and implication
• Strongly cohesive communities in real-world social networks will necessarily be small
  – No larger than a few hundred nodes!
• If true, it imposes a limit on the number of non-Sybils we can detect with high accuracy
  – Will be problematic as social networks grow large

Verifying the prediction
• In all networks, groups larger than a few hundred nodes do not remain cohesive

Real-world data sets analyzed
• Small cohesive groups tend to be family and alumni groups
• Large groups are often on abstract topics like music or politics

Implications
• Fundamental limits on social network-based Sybil detection
• Can reliably identify only a limited number of honest nodes
• In large networks, limits interactions to a small subset of honest nodes
  – Might still be useful in certain scenarios, e.g., white listing email from friends
• But, what to do with nodes not in the honest node subset?

One way forward: Sybil tolerance
• Rather than detect bad nodes, let's limit bad behavior
• Sybil detection: Use network to find Sybil nodes
  – Accept / receive unlimited service from non-Sybils
  – Refuse to interact with Sybils
• Sybil tolerance: Use network to limit nodes’ privileges
  – Interact with all nodes, but monitor their behavior
  – Limit bad behavior from any node, Sybil or non-Sybil

Illustrative example: Applying Sybil tolerance to email spam
[Figure; labels: Source, Destination, Destination]
• Key idea: Link privileges to credit on network links
  – Once the credit is exhausted, the node stops receiving service
  – Does not matter if the node is a Sybil or not

Illustrative example: Applying Sybil tolerance to email spam
[Figure; label: Multiple Identities]
• Creating multiple node identities does not help
  – So long as they cannot create links to arbitrary honest nodes
• No assumption about connectivity between non-Sybils

Such Sybil tolerant systems already exist
• Ostra [NSDI’08]: Limiting unwanted communication
• SumUp [NSDI’09]: Sybil-resilient voting
• Their properties were not well understood before

Sybil detection versus tolerance
• Sybil detection
  – Assumes network of honest nodes is fast mixing
  – Does not require anything beyond network topology
• Sybil tolerance
  – No assumption about connectivity between honest nodes
  – Requires user behavior to be monitored and labeled

Summary: A comprehensive approach to social network-based Sybil defense
• Think beyond good and evil
• Sybil tolerance complements Sybil detection
  – Use Sybil detection to white list nodes in local communities of trusted nodes
  – Use Sybil tolerance when interacting with nodes outside the local communities
• Currently exploring applications of the approach
  – E.g., to deter site crawlers

Thank you! Questions?
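
Backup sketch: credit on network links
The credit idea from the email spam example can be sketched in a few lines of Python. This is a simplified illustration, not the Ostra or SumUp algorithm: the `CreditNetwork` class, the fixed `credit_per_link` value, the breadth-first path search, and the node names are all assumptions, and credit is never replenished here (a real system would restore it based on how users label the messages they receive). A message is delivered only if some path of links with remaining credit connects source to destination, and delivery consumes one unit on each link of that path.

```python
from collections import deque


class CreditNetwork:
    """Toy credit-on-links model (hypothetical, for illustration only)."""

    def __init__(self, edges, credit_per_link):
        self.credit = {}  # (u, v) -> remaining credit in direction u -> v
        self.adj = {}     # node -> list of neighbors
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                self.credit[(a, b)] = credit_per_link
                self.adj.setdefault(a, []).append(b)

    def _find_path(self, src, dst):
        """Breadth-first search that only uses links with credit left."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while parent[node] is not None:
                    path.append((parent[node], node))
                    node = parent[node]
                return path[::-1]
            for neighbor in self.adj.get(node, []):
                if neighbor not in parent and self.credit[(node, neighbor)] > 0:
                    parent[neighbor] = node
                    queue.append(neighbor)
        return None

    def send(self, src, dst):
        """Deliver one message if a path with credit exists; charge the path."""
        path = self._find_path(src, dst)
        if path is None:
            return False  # credit exhausted: service refused
        for link in path:
            self.credit[link] -= 1
        return True


# Honest users a, b, c know each other. Attacker x has ONE link to an honest
# user (a) and creates two extra identities, s1 and s2, behind it.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "a"), ("s1", "x"), ("s2", "x")]
net = CreditNetwork(edges, credit_per_link=2)

results = [
    net.send("s1", "b"),  # uses one unit on the attack link x -> a
    net.send("s2", "c"),  # uses the second unit on x -> a
    net.send("s1", "c"),  # attack link exhausted, whichever identity sends
    net.send("b", "c"),   # honest users are unaffected by the attacker's link
]
print(results)
```

All of the attacker's identities share the credit on the single link into the honest region, so creating more identities does not buy more service. Nothing in the sketch decides who is a Sybil; it only bounds what any node can do.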