Welcome!
CompSci 96: The Science of Networks
SocSci 119
M,W 1:15-2:30
Professor: Jeffrey Forbes
http://www.cs.duke.edu/courses/spring11/cps096
The Science of Networks 1.1

Today’s topics
  What is a network?
  Why are they important?
  The Oracle of Bacon
  Network construction
Acknowledgements
  Notes taken from Michael Kearns, Lada Adamic, and Nicole Immorlica
Upcoming
  Network Structure: Graph Theory
  GUESS

Course Information
  “The structure and interconnectivity of social, technological, and natural networks. Network structure: graph theory, economic, social, physical, and natural networks. Network behavior: game theory, markets and strategic interaction, aggregate and emergent functions, and dynamics. Information networks: search and integration. Applications in sociology, economics, public policy, and computing.”
  No background assumed, but we will interpret and work with models both quantitatively and qualitatively
Grading Breakdown
  Assessment             Weight (approx)
  Assignments (5)        30%
  Blog Posts (3)         15%
  Classwork/Community    15%
  Midterm                15%
  Final                  25%
Important Dates
  Midterm 2/23
  Projects due 4/21
  Final 5/5 9am-Noon
  Let me know ASAP if you have any concerns

A Future for Computer Science?

Emerging science of networks
  Examining apparent similarities between many human and technological systems & organizations
  Importance of network effects in such systems
  How things are connected matters greatly
  Structure, asymmetry and heterogeneity
  Details of interaction matter greatly
  The metaphor of viral spread
  Dynamics of economic and strategic interaction
  Qualitative and quantitative; can be very subtle
  A revolution of measurement, theory, and breadth of vision (M. Kearns)

What is a network?
  A collection of individual or atomic entities
  Links can represent any pairwise relationship
  Network: entire collection of nodes and links
  Might sometimes be annotated by other info (weights, etc.)
  For us, a network is an abstract object (list of pairs) and is separate from its visual layout; that is, we will be interested in properties that are layout-invariant
  Links can be directed or undirected
  We will be interested in properties of networks
    often structural properties
    often statistical properties of families of networks

Representing networks
  Networks are collections of points joined by lines.
  What kinds of questions might we ask?
  “Network” ≡ “Graph”
  [Figure: a small graph with a node and an edge labeled]

  points      lines
  vertices    edges, arcs        math
  nodes       links              computer science
  sites       bonds              physics
  actors      ties, relations    sociology

Definitions
  [Figure: example graph with nodes numbered 1-8]
  Path: a sequence of nodes (v_1, …, v_k) such that for any adjacent pair v_i and v_{i+1}, there’s an edge e_{i,i+1} between them.
  Distance: the length of the shortest path between two nodes
  Diameter: the maximum shortest-path distance between any two nodes

Network Definitions
  Network size: total number of vertices (denoted n)
  If the distance between all pairs is finite, we say the network is connected; else it has multiple components
  Attributes of edges
    Weight or cost
    Direction
  Degree of a node v = number of edges connected to v
    Directed versions (in-degree and out-degree)
  Maximum possible number of edges (m)?
  What else might we want to model beyond just the connections?

Issues
  Why model networks?
  Structure & dynamics
  Models (structure): who is linked to whom?
    • How does position within a network (dis)advantage an agent?
    • What are the factors that lead people to trust each other?
    • Graph theoretic models
  Implications (dynamics): individual behavior can have global consequences
    • Diffusion of disease and information
    • Search by navigating the network
    • Resilience
    • Population, structural, and aggregate effects
    • Game theoretic models

Social networks
  Example: Acquaintanceship networks
    vertices: people in the world
    links: have met in person and know last names
    hard to measure
  Example: scientific collaboration
    vertices: math and computer science researchers
    links: between coauthors on a published paper
    Erdos numbers: distance to Paul Erdos
    Erdos was definitely a hub or connector; had 507 coauthors
  How do we navigate in such networks?

Acquaintanceship & more

Six Degrees of Bacon
  Background
    Stanley Milgram’s Six Degrees of Separation?
    Craig Fass, Mike Ginelli, and Brian Turtle invented it as a drinking game at Albright College
    Brett Tjaden, Glenn Wasson, Patrick Reynolds have run the online website from UVa and beyond
    Instance of Small-World phenomenon
  http://oracleofbacon.org handles 2 kinds of requests
    1. Find the links from Actor A to Actor B.
    2. How good a center is a given actor?
  How does it answer these requests?

How does the Oracle work?
  Not using Oracle™
  Queries require traversal of the graph
  [Figure: actor-movie graph around Kevin Bacon (BN = 0), with Sean Penn labeled BN = 1. Actors shown: Sean Penn, Tim Robbins, Tom Hanks, Bill Paxton, Sarah Jessica Parker, John Lithgow. Movies shown: Mystic River, Apollo 13, Footloose.]

How does the Oracle Work?
  BN = Bacon Number
  Queries require traversal of the graph
  [Figure: the actor-movie graph extended outward, with Kevin Bacon labeled BN = 0, Sean Penn BN = 1, and Woody Allen BN = 2. Actors shown: Woody Allen, Sean Penn, Judge Reinhold, Miranda Otto, Tim Robbins, Morgan Freeman, Helen Hunt, Tom Hanks, Bill Paxton, Sarah Jessica Parker, Sally Field, John Lithgow, Val Kilmer, Billy Bob Thornton. Movies shown: Sweet and Lowdown, Fast Times at Ridgemont High, War of the Worlds, Mystic River, The Shawshank Redemption, Cast Away, Apollo 13, Footloose, Forrest Gump, Tombstone, A Simple Plan.]

How does the Oracle work?
  How do we choose which movie or actor to explore next?
  Queries require traversal of the graph
  [Figure: the same actor-movie graph as on the previous slide]

Center of the Hollywood Universe?
  1,246,221 people can be connected to Bacon
  Is he the center of the Hollywood Universe?
    Who is?
    Who are other good centers?
    What makes them good centers?
  Centrality
    Closeness: the inverse average distance of a node to all other nodes
    Degree: the degree of a node
    Betweenness: a measure of how much a vertex is between other nodes

Oracle of Bacon
  Name someone who is 4 degrees or more away from Kevin Bacon
  1 2 3 4 5 6
  What characteristics make someone farther away?
  What makes someone a good center?
  Is Kevin Bacon a good center?

Sample Blog Post
I'm Related to Kevin Bacon?
Overview of the Oracle of Bacon: In class we have talked a lot about social and computer networks and all of their component parts. We have learned many important aspects of networks and what makes them operate.
One of the most interesting and complex notions is that of centrality and how one can go about calculating centrality within a social network. The Oracle of Bacon is one of the best examples of a project that has created an elaborate social network around the central figure of Kevin Bacon. However, it is interesting that the site shows Kevin Bacon is actually not the center of the Hollywood network; in fact, there are 1,048 actors who would make better centers than Bacon. Here is a breakdown of the best and worst centers of the Hollywood network. Although the only other actor mentioned who would make a better center is Sean Connery, one can speculate about what makes a great center. A good center would have to be an older actor, have appeared in many movies and many varieties of movies, have appeared in large productions with many actors, and have worked overseas. Alternatively, a bad center would be young, have appeared in only one type of movie, or in only one movie in general!

Why is the Oracle of Bacon Interesting to us?
• In reality, the game is an example of the small world phenomenon. The small world phenomenon was researched by Stanley Milgram as he examined the average path length for social networks of people in the United States. The phenomenon is that paths between nodes are often much shorter than expected, which the game illustrates. This Oracle of Bacon game was designed by computer scientists at the University of Virginia in order to create an engaging way of dealing with the small world phenomenon. The program for calculating a Bacon number was developed by mapping networks from http://imdb.com/ (the database for movies and actors information).

Other related points
• Here is the original paper by Stanley Milgram, upon which all of this information is based. The game works to find links between different actors and find the degree of separation from Bacon.
It is amazing that almost any actor, no matter how obscure, can be linked to Bacon within six degrees, and the average is under three links (2.960).
• It is also interesting to look at the earlier examples of the small world phenomenon, which inspired the Oracle of Bacon. Erdos numbers refer to the number of links separating a mathematician from Paul Erdos, a Hungarian mathematician famous for collaboration. The Erdos number project gives details similar to the Oracle of Bacon about the amount of connectivity within the network of mathematicians. In this network the median Erdos number is 5; the mean is 4.65, and the standard deviation is 1.21. This shows that there is slightly less connectivity, but a high degree of centrality. Here is a visualization of the Erdos Network.

More recent centrality work
• There are many examples of computer scientists who have dealt with the six degrees theory in their analysis of the small-world phenomenon, including Jon Kleinberg. His paper, Could it be a Big World After All? The ‘Six Degrees of Separation’ Myth (Society, April 2002), deals with a lot of the important ideas discussed above. Kleinberg argues that the initial data used to create the notion of the small-world phenomenon was actually skewed and that the data shows there might actually be less connectivity between people than was previously believed. This paper was published in 2002, and it does not seem to have garnered a large amount of debate amongst the scholarly community. It seems that more work and experimentation needs to be done in this field before claims can be made about the connectedness of the actual world. Although Kleinberg and others made some really interesting points initially, unfortunately the computer science world seems focused on novelty, not finishing work on a phenomenon, so it may be a while before all of our questions are answered!
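The Bacon numbers and the average separation discussed in the blog post can be computed with a breadth-first search outward from Kevin Bacon. The following is a minimal sketch, not the Oracle's actual implementation: the function name bacon_numbers and the movies mapping are assumptions made for illustration, and the toy data uses only partial casts for a few of the movies named on the earlier slides.

```python
from collections import deque

def bacon_numbers(movies, center):
    """Breadth-first search over a hypothetical {movie: [actors]} mapping.

    Returns {actor: number of co-starring links to `center`} for every
    actor reachable from `center`.
    """
    # Build an actor -> set of co-stars adjacency map from the cast lists.
    costars = {}
    for cast in movies.values():
        for actor in cast:
            costars.setdefault(actor, set()).update(a for a in cast if a != actor)

    numbers = {center: 0}
    queue = deque([center])
    while queue:
        actor = queue.popleft()
        for other in costars.get(actor, ()):
            if other not in numbers:          # first visit is the shortest path
                numbers[other] = numbers[actor] + 1
                queue.append(other)
    return numbers

# Toy data (illustrative, partial casts only).
movies = {
    "Apollo 13": ["Kevin Bacon", "Tom Hanks", "Bill Paxton"],
    "Mystic River": ["Kevin Bacon", "Sean Penn", "Tim Robbins"],
    "Sweet and Lowdown": ["Sean Penn", "Woody Allen"],
    "Cast Away": ["Tom Hanks", "Helen Hunt"],
}

bn = bacon_numbers(movies, "Kevin Bacon")
# Average Bacon number over everyone except Bacon himself.
average = sum(bn.values()) / (len(bn) - 1)
print(bn["Woody Allen"], round(average, 3))
```

Because breadth-first search visits actors in order of increasing distance, the first time an actor is reached gives the shortest chain of links, which is exactly the Bacon number.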
Physical Networks
  The Internet
    Vertices: Routers
    Edges: Physical connections
  Another layer of abstraction
    Vertices: Autonomous systems
    Edges: peering agreements
    Both a physical and business network
  Other examples
    US Power Grid
    Interdependence and August 2003 blackout

What does the Internet look like?

US Power Grid

Business & Economic Networks
  Example: eBay bidding
    vertices: eBay users
    links: represent bidder-seller or buyer-seller
    fraud detection: bidding rings
  Example: corporate boards
    vertices: corporations
    links: between companies that share a board member
  Example: corporate partnerships
    vertices: corporations
    links: represent formal joint ventures
  Example: goods exchange networks
    vertices: buyers and sellers of commodities
    links: represent “permissible” transactions

Enron

Content Networks
  Example: Document similarity
    Vertices: documents on web
    Edges: Weights defined by similarity
    See TouchGraph GoogleBrowser
  Conceptual network: thesaurus
    Vertices: words
    Edges: synonym relationships

Wordnet
  Source: http://wordnet.princeton.edu/man/wnlicens.7WN

Biological Networks
  Example: the human brain
    Vertices: neuronal cells
    Edges: axons connecting cells
    links carry action potentials
    computation: threshold behavior
    N ~ 100 billion

Gene regulatory networks
  Humans have only 30,000 genes, 98% shared with chimps
  The complexity is in the interaction of genes
  Can we predict what the result of the inhibition of one gene will be?
  Source: http://www.zaik.uni-koeln.de/bioinformatik/regulatorynets.html.en

Types of networks
  Pick a class of network:
  Give a real-world example of such a network:
  What are the vertices (nodes)?
  What are the edges (links)?
  How is the network formed?
  Is it decentralized or centralized?
  Is the communication or interaction local or global?
  What is the network's topology? For example, is it connected? What is its size? What is the degree distribution?

Graph properties
  Max Degree?
  Center?

Wrap up
  Networks are everywhere and can be used to describe many, many systems.
  By modeling networks, we can start to understand their properties and the implications those properties have for processes occurring on the network
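As a recap of the definitions from this lecture (degree, distance, diameter, closeness centrality, and the maximum possible number of edges), here is a minimal sketch on a small made-up graph. The graph, the function names, and the choice to represent an undirected network as a dictionary of neighbor lists are all assumptions for illustration; the code also assumes the network is connected, so every distance is finite.

```python
from collections import deque

def distances_from(graph, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree(graph, v):
    """Number of edges connected to v."""
    return len(graph[v])

def diameter(graph):
    """Maximum shortest-path distance between any two nodes (connected graph)."""
    return max(max(distances_from(graph, v).values()) for v in graph)

def closeness(graph, v):
    """Inverse of the average distance from v to all other nodes."""
    dist = distances_from(graph, v)
    return (len(graph) - 1) / sum(dist.values())

def max_edges(n):
    """Most edges a simple undirected network on n nodes can have."""
    return n * (n - 1) // 2

# Hypothetical undirected network: the path 1-2-3-4 plus an extra edge 2-5.
graph = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}

max_degree_node = max(graph, key=lambda v: degree(graph, v))
center = max(graph, key=lambda v: closeness(graph, v))
print(diameter(graph), max_degree_node, center, max_edges(len(graph)))
```

In this toy graph the node with the largest degree is also the best center by closeness, but the two measures need not agree in general, which is one reason the "Max Degree?" and "Center?" questions are asked separately.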