Lecture 4: Connect (4/4)
How do the friendships we form connect us? Why are we within a few clicks of each other on Facebook?
COMS 4995-1: Introduction to Social Networks
Tuesday, September 18th

Some announcements
• This course is now officially “sexy [kinda]”: congratulations!
• 1st assignment due Thursday, 4:10pm
  Parts A+C on paper!
  Part B (+ raw results of C) on dropbox
  Sign the cover sheet
• 1 late day: 5% (you have 3 free during the semester)

Outline
• Milgram’s “small world” experiment
• It’s a “combinatorial small world”
• It’s a “complex small world”
• It’s an “algorithmic small world”

Small-world model
Main idea: social networks follow a structure with a random perturbation
Formal construction:
1. Connect all nodes within a given distance in a regular lattice
2. Rewire each edge uniformly with probability p
   (variant: connect each node to q neighbors, chosen uniformly)
Collective dynamics of ‘small-world’ networks. D. Watts, S. Strogatz, Nature (1998)

Where are we so far?
Analogy with a cosmological principle
− Are you ready to accept a cosmological theory that does not predict life?
In other words, let’s perform a simple sanity check

A thought experiment
1. Consider a randomly augmented lattice (N nodes)
2. Perform Milgram’s “small world” experiment
Can you tell what will happen?
(a) The folder arrives in 6 hops
(b) The folder arrives in O(ln(N)) hops
(c) The folder never arrives
(d) I need more information

A thought experiment
(a) The folder arrives in 6 hops
NOT TRUE
It actually looks like a naive answer.
More precisely: by the previous result we know that shortest paths are of the order of ln(N), which contradicts this statement.

A thought experiment
(b) The folder arrives in O(ln(N)) hops
ACCORDING TO OUR PRINCIPLE, OUGHT TO BE TRUE BECAUSE IT WAS OBSERVED BY MILGRAM
A sufficient condition for this to be true is: Milgram’s procedure extracts shortest paths.
Answering this critical question boils down to an algorithmic problem.

A thought experiment
(c) The folder never arrives
SEEMS UNLIKELY
unless the procedure is badly designed (cycles), or we model people dropping out, or the grid contains holes.

A thought experiment
(d) I need more information
In particular, how to model Milgram’s procedure:
“If you do not know the target, forward the folder to your friend or acquaintance that is most likely to know her.”

What is Greedy Routing?
A mathematical model of what Milgram measured
• Participants know where the target is located
• They use grid information + shortcuts “incidentally”
N.B.: Grid “dimensions” can describe geography or other sociological properties (occupation, language)
Example: [figure]

How does greedy routing perform?
Does it extract the shortest path?
Not necessarily, this is why we need to analyze it!
We first analyze a simple case, the randomly augmented lattice with k = 1.
Proposition 3. In a randomly augmented lattice of dimension k = 1 with N nodes, greedy routing uses at least Ω(√N) steps.
Case study: k = 1, target t, starting from u0
• We introduce the interval $I_l = \{u \in V : |u - t| \le l\}$; it contains t and about 2l nodes.
• Greedy routing constructs a path $U_0, U_1, U_2, \ldots$ from the initial point s = u0; at each step it uses either a local edge of the grid or a shortcut.
• We denote by $X_i$ the end-point of the shortcut rooted at $U_i$.
• Shortcuts are chosen independently of each other, so we can apply the principle of deferred decisions: random choices not yet used by the algorithm do not impact the outcome of the random choices it uses next.
• Hence $(X_i)_{i \ge 0}$ is an i.i.d. sequence of uniformly chosen points in the grid.

Analysis of greedy routing
Each $X_i$ lies in $I_l$ with probability about $2l/N$. The probability that one of the first n elements of $(X_i)$ lies in $I_l$ is then upper bounded by the union bound:
$$P\Big[\bigcup_{i=1,\ldots,n} \{X_i \in I_l\}\Big] \le \sum_{i=1,\ldots,n} P[X_i \in I_l] \le \frac{2nl}{N}.$$
Hence, for $n = l = \frac{1}{2}\sqrt{N}$, this event has probability at most 1/2. We conclude that, with probability at least 1/2, the first n shortcuts encountered $\{X_1, X_2, \ldots, X_n\}$ lie outside $I_l$.

CLAIM: If none of $X_1, X_2, \ldots, X_n$ is in $I_l$ and we start from u0 outside $I_l$, then greedy routing needs at least min(n, l) steps.
On this event, the greedy procedure needs either
• to make n steps,
• or, to reach t in fewer than n steps, to reach t from the boundary of $I_l$ using l local edges.
In both cases, $\frac{1}{2}\sqrt{N}$ steps are required.
Hence the expected number of steps of greedy routing is lower bounded by a constant times the square root of N.

A thought experiment
On a line, Milgram’s procedure uses Ω(√N) steps; a square root is not satisfying for a small world.
Not much better when k > 1!
Proposition 4. In a randomly augmented lattice of dimension k ≥ 1 containing N nodes, greedy routing uses in expectation at least $\Omega(N^{1/(k+1)})$ steps.
Proving this is a good practice exercise using the same argument.
Note that it may seem that shortcut augmentation connects the lattice better as k increases. But in a lattice of dimension k with N nodes, the distance in the lattice is of the order of $O(N^{1/k})$, so the relative gain obtained with shortcut augmentation actually becomes worse as k increases.
Even worse, the proof applies to any distributed algorithm.
Our sanity check test failed!
“Small world” results explain that short paths exist … finding them remains a daunting algorithmic task.

Outline
• Milgram’s “small world” experiment
• It’s a “combinatorial small world”
• It’s a “complex small world”
• It’s an “algorithmic small world”
  Beyond uniform random augmentation

Autopsy of “Small-world” failure
The failure of Milgram’s experiment under uniform augmentation may be intuitively explained as follows.
• In a uniformly augmented lattice, shortcuts do exist: about √N shortcuts lead to the interval $I_l$ when l = √N.
• But these shortcuts are uniformly distributed among all N nodes, so it takes about N/√N = √N trials to find one of them starting from an arbitrary point.
• Moreover, previous steps do not lead to progress: since shortcuts are uniformly distributed, no algorithm can hope to make progress by following other shortcuts, because moving anywhere does not improve its chance of finding one.
Is there another augmentation?
The following result is a spectacular illustration that algorithms can exploit information in a surprising manner: augmenting the lattice with a bias (Kleinberg).

The 10 papers that will make you a social expert

10 sociological must-reads
1. S. Milgram, “The small world problem,” Psychology Today, 1967.
2. M. Granovetter, “The strength of weak ties: A network theory revisited,” Sociological Theory, vol. 1, pp. 201–233, 1983.
3. M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a Feather: Homophily in Social Networks,” Annual Review of Sociology, vol. 27, pp. 415–444, Jan. 2001.
4. M. O. Lorenz, “Methods of measuring the concentration of wealth,” Publications of the American Statistical Association, vol. 9, no.
70, pp. 209–219, 1905.
   + H. Simon, “On a Class of Skew Distribution Functions,” Biometrika, vol. 42, no. 3, pp. 425–440, 1955.
5. R. I. M. Dunbar, “Coevolution of Neocortical Size, Group-Size and Language in Humans,” Behav Brain Sci, vol. 16, no. 4, pp. 681–694, 1993.
6. D. Cartwright and F. Harary, “Structural balance: a generalization of Heider's theory,” Psychological Review, vol. 63, no. 5, pp. 277–293, 1956.
7. M. Granovetter, “Threshold Models of Collective Behavior,” The American Journal of Sociology, vol. 83, no. 6, pp. 1420–1443, May 1978.
8. B. Ryan and N. C. Gross, “The diffusion of hybrid seed corn in two Iowa communities,” Rural Sociology, vol. 8, no. 1, pp. 15–24, 1943.
   + S. Asch, “Opinions and social pressure,” Scientific American, 1955.
9. R. S. Burt, Structural Holes: The Social Structure of Competition. Harvard University Press, 1992.
10. F. Galton, “Vox Populi,” Nature, vol. 75, no. 1949, pp. 450–451, Mar. 1907.

Homophily
People “love those who are like themselves”, “Similarity begets friendship”
Nicomachean Ethics, Aristotle & Phaedrus, Plato
Do you think homophily produces or hinders small worlds?

Homophily in Online Dating: When Do You Like Someone Like Yourself?
Andrew T. Fiore and Judith S. Donath
MIT Media Laboratory, 20 Ames St., Cambridge, Mass., USA
{fiore, judith}@media.mit.edu

ABSTRACT
Psychologists have found that actual and perceived similarity between potential romantic partners in demographics, attitudes, values, and attractiveness correlate positively with attraction and, later, relationship satisfaction. Online dating systems provide a new way for users to identify and communicate with potential partners, but the information they provide differs dramatically from what a person might glean from face-to-face interaction. An analysis of dyadic interactions of approximately 65,000 heterosexual users of an online dating system in the U.S. showed that, despite these differences, users of the system sought people like them much more often than chance would predict, just as in the offline world. The users’ preferences were most strongly same-seeking for attributes related to the life course, like marital history and whether one wants children, but they also demonstrated significant homophily in self-reported physical build, physical attractiveness, and smoking habits.

Author Keywords
Online personals, attraction, computer-mediated communication, online dating, relationships

ACM Classification Keywords
H5.3. Group and Organization Interfaces; Asynchronous …

… psychological and sociological perspectives (Lea & Spears 1995, Walther 1996, McKenna et al. 2002), and they have examined the personals ads that appear in print publications (Bolig et al. 1984, Ahuvia & Adelman 1992). This paper describes a quantitative examination of the characteristics for which online dating users seek others like them.

NATURE OF ONLINE PERSONALS DATA
We analyzed data from one online dating system in particular. Through an agreement brokered by the Media Laboratory with an online dating Web site (the “Site”), we obtained access to a snapshot of activity on the Site over an eight-month period, from June 2002 through February 2003. The data included users’ personal profile information, their self-reported preferences for a mate, and their communications via the site’s private message system with other users. Anonymous ID numbers distinguished unique users. Table 1 indicates which profile characteristics users could specify about themselves and about the partners they would like to meet. Data about private messages exchanged by the users included the sender, recipient, subject, text, date and time of delivery, and whether the recipient had read the message.
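
Before turning to biased augmentation, the square-root behaviour of greedy routing under uniform augmentation, established earlier in this lecture, can be checked empirically. The following is a minimal sketch and not the lecture's own code: it assumes a line of n nodes (k = 1), one uniformly chosen shortcut per node, and draws each shortcut only when its node is visited, in the spirit of the deferred-decision argument. The function names `greedy_hops_uniform` and `mean_hops` are illustrative.

```python
import math
import random


def greedy_hops_uniform(n, source, target, rng):
    """Count greedy-routing steps on a line of n nodes with uniform shortcuts.

    Assumption: each node owns exactly one shortcut, drawn uniformly at random.
    Greedy routing never revisits a node (the distance to the target strictly
    decreases), so drawing the shortcut on arrival is equivalent to drawing
    all shortcuts in advance.
    """
    u, hops = source, 0
    while u != target:
        shortcut = rng.randrange(n)              # end-point of u's shortcut
        local = u + 1 if target > u else u - 1   # local edge toward the target
        # Move to whichever neighbor is closer to the target (ties: local edge).
        u = min((local, shortcut), key=lambda v: abs(v - target))
        hops += 1
    return hops


def mean_hops(n, trials, seed=0):
    """Average number of greedy steps over random source/target pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += greedy_hops_uniform(n, rng.randrange(n), rng.randrange(n), rng)
    return total / trials


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        m = mean_hops(n, trials=200)
        # Columns: n, mean hops, mean hops / sqrt(n), ln(n)
        print(n, round(m, 1), round(m / math.sqrt(n), 2), round(math.log(n), 1))
```

Under these assumptions, the ratio of mean hops to √n should stay roughly constant as n grows, while ln(n) grows far more slowly; the exact constant depends on the modelling choices made here.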

Augmenting lattice with a bias
What if the augmentation exhibits a bias?
• Most of the people you know are near
• Occasionally, you know someone outside
Does this break the lower bound proof?
Does finding a neighborhood of t become easier?

How to model augmentation bias
Formal construction:
1. Connect nodes at distance p in a regular lattice
2. Connect each node to q other nodes, chosen with a biased probability
3. p = q = 1 to simplify
The small-world phenomenon: An algorithmic perspective. J. Kleinberg, Proc. of ACM STOC (2000)

We assume $V = \{(i_1, \ldots, i_k) \in \{1, 2, \ldots, L\}^k\}$ (note that $N = L^k$). Nodes are connected to all other nodes whose distance in the lattice is at most p (i.e. $v = (i_1, \ldots, i_k)$ and $v' = (i'_1, \ldots, i'_k)$ are connected if $|i_1 - i'_1| + \ldots + |i_k - i'_k| \le p$).
In addition, each node is connected to q other nodes chosen such that
$$P[u \to v] = \frac{1 / \|u - v\|^{r}}{\sum_{v' \neq u} 1 / \|u - v'\|^{r}}.$$
The parameter r, which may be called the clustering coefficient, is critical, as it determines how biased the augmentation of the grid is. A node that is twice as far may still be chosen, but only with a probability $2^r$ times smaller; as r approaches 0, the ratio between these probabilities approaches 1. The distance p and the number of random shortcuts q are two parameters of the model that have little effect on the performance of distributed algorithms. Note that, in the probability describing the chance to connect, the denominator only plays the role of a normalizing constant.

Impact of clustering coefficient
• Small values of r: approaches uniform augmentation
• Large values of r: approaches original lattice

Can we break the lower bound?
(a) Yes, finding a neighborhood of t becomes easier
A PRIORI NOT TRUE
• It is easier only if you are already near the target
• In general, it can take a larger number of steps

Can we break the lower bound?
(b) Yes, for another reason
• Not all positions are equal, hence progress is possible
• As shortcuts are used recursively, the probability increases
• So we need to study the sequence of progress

The critical case
Assume r = k (dimension of the grid)
A neighborhood of t of radius d/2
• contains about $(d/2)^k$ nodes
• each may be chosen with probability roughly $1/(3d/2)^k$
Growth of the ball compensates for the decrease in probability!
Harmonic distribution.
The small-world phenomenon: An algorithmic perspective. J. Kleinberg, Proc. of ACM STOC (2000)

Augmented lattice (regimes along the r axis, starting from r = 0)
• r < k: combinatorial small world (short paths exist); distributed algorithms need $N^{(k-r)/(k+1)}$ steps
• r = k: navigable small world; distributed algorithms need $O(\log^2 N)$ steps
• r > k: not a small world (short paths do not exist); algorithms need $N^{(r-k)/(r-(k-1))}$ steps
The small-world phenomenon: An algorithmic perspective. J. Kleinberg, Proc. of ACM STOC (2000)

Theoretical follow ups
Is the analysis of greedy routing tight?
• Yes, greedy routing performs in $\Omega(\log^2 n)$
Can we find paths as short as log(n) (shortest path)?
• Yes, with extra information on neighboring nodes
• Or another augmentation
Can we build an augmentation for an infinite lattice?
• See homework exercise (check tomorrow night)

Theoretical follow ups (cont’d)
Can we augment other graphs?
• G = (V, E) (e.g. a lattice) with distance known
• Random augmentation adds one shortcut per node
• Is routing on G + shortcuts used incidentally efficient?
Indeed all these graphs are polylog augmentable:
• Bounded ball growth, doubling dimension
• Bounded “width” (trees, bounded treewidth graphs)
What about all graphs?
• Lower bound $\Omega(n^{1/\sqrt{\ln n}})$

Practical follow up
Can we observe the harmonic distribution?
• Yes, using closeness rank instead of distance
Can we prove it emerges?
• Recent results
• Through rewiring, mobility
Geographic routing in social networks. D. Liben-Nowell et al., PNAS (2005)

Summary
Milgram’s experiment proves that social networks are navigable:
• individuals can take advantage of short paths with basic information
• this is at odds with uniform random graphs
The key ingredients to explain navigability:
• A space that is easy to route in (e.g. grid, trees, etc.)
• A subtle harmonic augmentation (e.g. ball radius)
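
The second ingredient can be illustrated numerically. The sketch below is an illustration under stated assumptions, not Kleinberg's construction verbatim: it uses a ring of n nodes (k = 1, chosen here to avoid boundary effects), one shortcut per node drawn with probability proportional to distance^(-r), and computes exactly the probability that a single shortcut at least halves the distance to the target. The names `ring_dist` and `halving_probability` are hypothetical.

```python
import math


def ring_dist(a, b, n):
    """Distance between positions a and b on a ring of n nodes."""
    diff = abs(a - b) % n
    return min(diff, n - diff)


def halving_probability(n, d, r):
    """Exact probability that the shortcut of a node at distance d from the
    target lands within distance d // 2 of the target, when the shortcut
    end-point v is drawn with probability proportional to dist(u, v) ** (-r).

    Assumes 1 <= d <= n // 2. By symmetry the target sits at position 0
    and the current node at position d.
    """
    target, u = 0, d
    total = 0.0   # normalizing constant (sum of all weights)
    good = 0.0    # weight of end-points inside the ball of radius d // 2
    for v in range(n):
        if v == u:
            continue
        w = ring_dist(u, v, n) ** (-r)
        total += w
        if ring_dist(v, target, n) <= d // 2:
            good += w
    return good / total


if __name__ == "__main__":
    n = 4096
    print("d, harmonic (r=1), uniform (r=0), 1/(3 ln n)")
    d = 2
    while d <= n // 2:
        print(d,
              halving_probability(n, d, 1),
              halving_probability(n, d, 0),
              1 / (3 * math.log(n)))
        d *= 2
```

In this setting with r = k = 1, the halving probability stays above roughly 1/(3 ln n) at every scale d, so each halving takes O(log n) steps in expectation and about log2(n) halvings suffice, which is consistent with the O(log² N) regime of the lecture. With r = 0 the same probability is (d + 1)/(n − 1), which vanishes as the folder gets close to the target.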