CSE 522 – Algorithmic and Economic Aspects of the Internet
Instructors: Nicole Immorlica, Mohammad Mahdian

Topics covered in the course
  Structure and modeling of social networks
    Power law graphs; Small world phenomenon; High clustering coefficient; Probabilistic and game theoretic models
  Algorithms for link analysis
    Crawling the web; HITS; PageRank; Web spam; Rank aggregation; Spectral clustering
  Economic aspects of the Internet
    Peering relations; Alternative mechanisms for routing; P2P networks
  Topics motivated by e-commerce
    Reputation mechanisms; Recommendation systems; Ad auctions

Logistics
  Course web page: http://www.cs.washington.edu/education/courses/cse522/05au/
  Course work:
    reading papers (1/week on avg)
    possibly a few problem sets
  How to contact us: {nickle,mahdian}@microsoft.com

Social Networks
  A social network is a graph that represents relationships between independent entities.
    Graph of friendships (or in the virtual world, networks like Orkut)
    Web of sexual contact
    Graph of scientific collaborations
    Cross-posts in newsgroups
    Web graph (links between webpages)
    Internet: Inter/Intra-domain graph

Scientific Collaboration Network
  400,000 nodes, authors in Mathematical Reviews database
  An edge between two authors if they have a joint paper
  Just 676,000 edges
  Picture from orgnet.com

Scientific Collaboration Network
  Average degree 3.36
  A few high-degree nodes:
    Paul Erdős, 509
    Frank Harary, 268
    Yuri Alekseevich Mitropolskii, 244
  Many low-degree nodes (100,000 of degree 1)
  Picture from orgnet.com

Scientific Collaboration Network
  Short paths
    Maximum Erdős number is 13
    Any two authors are connected by a path of length at most 23
    Average distance between two authors is 7.64
    e.g.: John Nash → Shapley → Fulkerson → Hoffman → Paul Erdős
  Many triangles …
  Picture from orgnet.com

9/11 Terrorist Network
  Picture from orgnet.com

Newsgroup Cross-Post Graph
  Nodes are newsgroups, essentially archived email lists
  Edges are cross-posts, i.e.
  there is an edge between two newsgroups to which an identical email is posted
  Example nodes: alt.microsoft.sucks, alt.linux.sucks

Internet Graphs
  Inter-domain graphs
    Nodes are autonomous systems or domains
    Edges are inter-domain connections
    Example nodes: SPRINT, AOL
  Inter-domain graph
  Picture from caida.org

Internet Graphs
  Intra-domain graphs
    Nodes are routers
    Edges are links between routers
    Example nodes: 199.45.130.13, 199.45.143.14
  Intra-domain graph, colored by AS number
  Picture from lumeta.com

World Wide Web
  Nodes are webpages
  Arcs (i.e., directed edges) are hyperlinks
  Example nodes: http://research.microsoft.com/~mahdian, http://theory.csail.mit.edu
  Web graph, Chicago Tribune Page
  Picture generated by Nicheworks

Social Networks

Why Study These Networks
  Understand the creation of these networks
  Understand viral epidemics
  Help design crawling strategies for the web
  Analyze behavior of algorithms (web/internet)
  Predict evolution of the network and emergence of new phenomena

In this lecture
  Common properties of social networks
    Power law degree distribution
    Small world phenomenon
    High clustering coefficient
  Structure of the web graph

Power Laws
  Two quantities x and y are related by a power law if y is proportional to x^(-c) for a constant c:
    y = a·x^(-c)   (a is the constant of proportionality)
  If x and y are related by a power law, then the graph of log(y) versus log(x) is a straight line:
    log(y) = -c·log(x) + log(a)
  The slope of the log-log plot is -c, where c is the power law exponent

Power Law Distributions
  A random variable X has a power law distribution if Pr[X=k] is proportional to k^(-c) for a constant c
  The cumulative distribution, Pr[X>k], of a power law distribution is proportional to k^(-c+1), and is called the Pareto law
  Similar to a power law, the Zipf law relates the rank r of X to its size: the r'th largest instance of X is proportional to r^(-c')

Example: City Populations
  Rank  City           Population
   1.   New York       7,322,564
   2.   Los Angeles    3,485,398
   3.   Chicago        2,783,726
   4.   Houston        1,630,553
   5.   Philadelphia   1,585,577
   6.   San Diego      1,110,549
   7.   Detroit        1,027,974
   8.   Dallas         1,006,877
   9.   Phoenix          983,403
  10.   San Antonio      935,933

Example: City Populations
  Rank  City              Population
    1.  New York          7,322,564
    2.  Los Angeles       3,485,398
    3.  Chicago           2,783,726
   21.  Seattle             516,259
   94.  Spokane, WA         177,196
   95.  Tacoma, WA          176,664
   96.  Little Rock, AR     175,795
   97.  Bakersfield, CA     174,820
   98.  Fremont, CA         173,339
   99.  Fort Wayne, IN      173,072
  100.  Arlington, VA       170,936

Example: City Populations
  Power law exponent: c = 0.74

Power Laws in Networks
  Degree distribution often satisfies a power law: the fraction of nodes f_d of degree d is proportional to d^(-c)
  Degree d   Fraction f_d = 1/(2d)
     1         1/2
     2         1/4
     3         1/6
     4        ~1/8

Example: Collaboration Graph
  Power law exponent: c = 2.97
  With exponential decay factor, c = 2.46

Example: Cross-Post Graph
  Power law exponent: c = 1.3

Example: Inter-Domain Internet
  Power law exponent: 2.15 < c < 2.2

Example: Intra-Domain Internet
  Power law exponent: c = 2.48

Example: Web Graph In-Degree
  Power law exponent: c = 2.09

Example: Web Graph Out-Degree
  Power law exponent: c = 2.72

Small World Phenomenon
  Six degrees of separation:
  “Everybody on this planet is separated by only six other people. Six degrees of separation between us and everyone else on this planet.
  The President of the United States, a gondolier in Venice, just fill in the names.”

Small World Phenomenon
  Milgram's famous experiment (1960s):
    Choose a random person in Nebraska, Bob
    Ask Bob to deliver a letter to a random person in Massachusetts, Lashawn
    Tell Bob the target's name, address, and occupation
    Instruct Bob to send the letter only to people he knows on a first-name basis

Small World Phenomenon
  Example chain:
    Bob, a farmer in Nebraska
    → David, mayor of Bob's town
    → Bernard, David's cousin who went to college with
    → Maya, who grew up in Boston with
    → Lashawn

Small World Phenomenon in Graphs
  The diameter of a graph is the maximum distance (number of edges) between any pair of nodes
  The average distance of a graph is the average distance between any pair of nodes
  The average connected distance of a graph is the average distance between any pair of connected nodes

Small World Phenomenon in Graphs
  A graph exhibits a small world phenomenon if it has low diameter or low average (connected) distance
  Typically, the average distance of a small world graph is on the order of log n (where n is the number of nodes)

Examples
  Collaboration graph
    401,000 nodes, 676,000 edges (average degree 3.37)
    Diameter: 23, Average distance: 7.64
  Cross-post graph, giant component
    30,000 nodes, 800,000 edges (average degree 53.3)
    Diameter: 13, Average distance: 3.8
  Web graph
    200 million nodes, 1.5 billion edges (average degree 15)
    Average connected distance: 16
  Inter-domain Internet
    3500 nodes, 6500 edges (average degree 3.71)
    95% of pairs of nodes within distance 5

High Clustering Coefficient
  The clustering coefficient of a graph is the fraction of triangles among connected triples of nodes
  Intuitively, the clustering coefficient reflects the probability that your friends are themselves friends
  We expect social networks to have a high clustering coefficient

Examples
  Collaboration graph
    Clustering coefficient is 0.14
    Density of edges is 0.000008
  Cross-post graph
    Clustering coefficient is 0.4492
    Density of edges is 0.0016

Assignment
  READ: A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, Graph structure in the web, WWW, 2000.

Graph Structure of the Web
  Breadth-first search from randomly chosen start nodes
  Follow both forward and backward links
  Reveal directed and undirected graph structure
    Over 90% of nodes reachable if links are treated as undirected
    Directed graph reveals complex bow-tie structure

Bow-Tie Structure of Web Graph
  Picture from the Nature journal

Next Time
  Probabilistic models for social networks
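
Sketch: computing the quantities from this lecture
  The quantities used in this lecture to characterize social networks (average degree, clustering coefficient, diameter, average connected distance, and the slope of a log-log plot) can be computed directly on a small graph. The following is a minimal sketch, assuming an undirected simple graph stored as a dict of neighbor sets. The function names and the five-node toy_graph are hypothetical illustrations, not taken from the course or from any of the networks cited in the slides, and the all-pairs breadth-first search would not scale to graphs of the sizes quoted there.

```python
import math
from collections import deque
from itertools import combinations


def average_degree(adj):
    """Average degree of an undirected graph, i.e. 2 * (#edges) / (#nodes)."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def clustering_coefficient(adj):
    """Fraction of connected triples (u - v - w, centered at v) closed by an edge u - w."""
    closed, triples = 0, 0
    for v, nbrs in adj.items():
        for u, w in combinations(nbrs, 2):
            triples += 1
            if w in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


def bfs_distances(adj, source):
    """Number of edges on a shortest path from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter_and_average_distance(adj):
    """Maximum and mean distance over all connected pairs of distinct nodes."""
    longest, total, pairs = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, (total / pairs if pairs else 0.0)


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x); equals -c for y = a * x**(-c)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Hypothetical toy graph: a triangle a-b-c with a tail c-d-e.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

if __name__ == "__main__":
    print("average degree:", average_degree(toy_graph))
    print("clustering coefficient:", clustering_coefficient(toy_graph))
    print("diameter, average distance:", diameter_and_average_distance(toy_graph))
    # Degree fractions f_d = 1/(2d) from the "Power Laws in Networks" table.
    degrees = [1, 2, 3, 4]
    print("log-log slope:", loglog_slope(degrees, [1 / (2 * d) for d in degrees]))
```

  On the toy graph, one triangle closes three of the six connected triples, so the clustering coefficient is 0.5. For the table f_d = 1/(2d), the fitted slope is -1, corresponding to exponent c = 1.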