Overview Social Goal. Explain why information and disease spread so quickly in social networks. Mathematical Approach. Model social networks as random graphs and argue that they are likely to have low diameter. The Science of Networks 6.1 Definition: The clustering coefficient of a node v is the fraction of pairs of v’s friends that are connected to each other by edges. Clustering Coefficient = 1/2 The higher the clustering coefficient of a node, the more strongly triadic closure is acting on it The Science of Networks 6.2 Erdos-Renyi Graph models Randomly choose How well does the E-R model characterize real networks? high school friendship networks road networks peer-to-peer filesharing networks product co-purchase networks (people who bought X also bought Y) other? The Science of Networks 6.3 Random graph diameter Technique. Grow trees to bound path lengths. 1. 2. 3. Show when trees are small enough (< √n), # of leaves doubles. Grow small trees (< √n) around a pair of nodes. Use birthday paradox to argue trees prob. intersect. Conclude diameter is 2 log √n = log n. The Science of Networks 6.4 Small world phenomenon Milgram’s experiment (1960s). Ask someone to pass a letter to another person via friends knowing only the name, address, and occupation of the target. The Science of Networks 6.5 Small world phenomenon Bernard, David’s cousin who went to college with David, mayor of Bob’s town Bob, a farmer in Nebraska Maya, who grew up in Boston With Lashawn The Science of Networks 6.6 Last time: short paths exist. (argument: flood the network) The Science of Networks 6.7 This time: and people can find short paths! (without flooding the network) The Science of Networks 6.8 The Milgram experiment Experiment: We choose a random “source” and “target”. Your goal: pass ball from “source” to “target” by throwing it to people you know on a first-name basis. The Science of Networks 6.9 How did you find short paths? The Science of Networks 6.10 What information did you use to choose the next hop? The Science of Networks 6.11 Social network models Random rolodex model Random wiring model The Science of Networks 6.12 Random wiring model People have a predictable structure of local links (e.g., neighbors, colleagues) And a few random long-range links (e.g., someone you meet on a trip) The Science of Networks 6.13 Random wiring model Local links have grid-like structure (you know the person on your left/right and front/back) n n The Science of Networks 6.14 Random wiring model Local links represent homophily, the idea that we know people similar to us n n The Science of Networks 6.15 Random wiring model Long-range links are random – each node chooses one long-range link n n The Science of Networks 6.16 Random wiring model Long-range links represent weak ties, the links to acquantainces that would otherwise be far away n n The Science of Networks 6.17 Random wiring model What do you expect the diameter to be? Do you expect people to find short paths? If so, how short? The Science of Networks 6.18 Short paths? Problem: What if longWe’ll end up navigational clues range links are overlost repeatedly inuniformly long-range shooting target. links. random? The Science of Networks 6.19 Short paths? Problem: increases What We’ll end if longup path-length. taking range forever links areto very get anywhere. localized? The Science of Networks 6.20 Decentralized search Idea: Suppose long-range links are just slightly more likely to be to close nodes. Result: Then decentralized search finds short paths. The Science of Networks 6.21 Tradeoff Discovere d path length Actual path length path lengt h uniform The Science of Networks highly local 6.22 Optimal tradeoff Suppose links are proportional to (1/distance)2, i.e., inverse square. Inverse square?? WTF? The Science of Networks 6.23 Inverse square intuition “Scales of resolution”: Building Street City The Science of Networks Room number LSRC D235 450 Research Dr. Durham, NC USA State Countr y 6.24 Scales of resolution Street: location within 2 miles The Science of Networks 6.25 Scales of resolution City: location within 4 miles The Science of Networks 6.26 Scales of resolution County: location within 8 miles The Science of Networks 6.27 Scales of resolution Each new scale doubles distance from the center. The Science of Networks 6.28 Scales of resolution Long-range links equally likely to connect to each different scale of resolution! (allows people to make progress towards destination no matter how far away they are) The Science of Networks 6.29 How many people do you know? The Science of Networks 6.30 How many people do you know? The Science of Networks 6.31 How many people do you know? The Science of Networks 6.32 How many people do you know? The Science of Networks 6.33 How many people do you know? The Science of Networks 6.34 How many people do you know? The Science of Networks 6.35 How many people do you know? The Science of Networks 6.36 How well did this work? For me: Neighborhood Region (RTP) State (NC) Southeast Eastern US United States World The Science of Networks (Trinity Heights) 6.37 Next topic decentralized search The Science of Networks 6.38 How to route (sub) Problem. How can I get this message from me to the far-away target? Solution. Pass message to a friend. closer The Science of Networks 6.39 Scales of resolution Each new scale doubles distance from the center. The Science of Networks 6.40 Long-range links Suppose each person has a long-range friend in each scale of resolution. The Science of Networks 6.41 How to route Algorithm. Pass the message to your farthest friend that is to the left of the target. The Science of Networks 6.42 Trace of route The Science of Networks 6.43 Analysis new dist. 1 2 4 2j 2j+1 old dist. The Science of Networks 6.44 Distance is cut in half every step! The Science of Networks 6.45 Analysis 1. 2. 3. Original distance is ? at most n. Distance is cut in half every step (at least). Number of steps is ? at most log n. The Science of Networks 6.46 And in real life … The Science of Networks 6.47 Finding the Short Paths Milgram’s experiment, Columbia Small Worlds, E-R, amodel… How do individuals find short paths? in an incremental, next-step fashion using purely local information about the NW and location of target note: shortest path might require taking steps “away” from the target! This is not (only) a structural question, but an algorithmic one all emphasize existence of short paths between pairs statics vs. dynamics Navigability may impose additional restrictions on formation model! Briefly investigate two alternatives: a local/long-distance mixture model [Kleinberg] a “social identity” model [Watts, Dodd, Newman] The Science of Networks 6.48 Kleinberg’s Model Start with an n by n grid of vertices (so N = n^2) add some long-distance connections to each vertex: • k additional connections • probability of connection to grid distance d: ~ (1/d)^r – c.f. dollar bill migration paper Kleinberg’s question: what value of r permits effective navigation? # hops << N, e.g. log(N) Assume parties know only: so full model given by choice of k and r large r: heavy bias towards “more local” long-distance connections small r: approach uniformly random grid address of target addresses of their own immediate neighbors Algorithm: pass message to nbr closest to target in grid The Science of Networks 6.49 Kleinberg’s Result Intuition: if r is too large (strong local bias), then “long-distance” connections never help much; short paths may not even exist (remember, grid has large diameter, ~ sqrt(N)) if r is too small (no local bias), we may quickly get close to the target; but then we’ll have to use grid links to finish • think of a transport system with only long-haul jets or donkey carts effective search requires a delicate mixture of link distances The result (informally): r = 2 is the only value that permits rapid navigation (~log(N) steps) any other value of r will result in time ~ N^c for 0 < c <= 1 • N^c >> log(N) for large N a critical value phenomenon or “knife’s edge”; very sensitive Note: locality of information crucial to this argument centralized, “birds-eye” algorithm can still compute short paths at r < 2! can recognize when “backwards” steps are beneficial The Science of Networks 6.50 Navigation via Identity Watts et al.: Represent individuals by a vector of attributes profession, religion, hobbies, education, background, etc… attribute values have distances between them (tree-structured) distance between individuals: minimum distance in any attribute all jobs only need one thing in common to be close! Algorithm: we don’t navigate social networks by purely “geographic” information we don’t use any single criterion; recall Dodds et al. on Columbia SW different criteria used at different points in the chain CS given attribute vector of target forward message to neighbor closest to target scientists chemistry Permits fast navigation under broad conditions athletes track soccer not as sensitive as Kleinberg’s model The Science of Networks 6.51