lect01.pptx

Welcome!
CompSci 96: The Science of Networks
SocSci 119
M,W 1:15-2:30
Professor: Jeffrey Forbes
http://www.cs.duke.edu/courses/spring11/cps096
The Science of Networks
1.1
Today’s topics

What is a network? Why are they important?
The Oracle of Bacon
Network construction

Acknowledgements




Notes taken from Michael Kearns, Lada Adamic, and Nicole Immorlica
Upcoming


Network Structure: Graph Theory
GUESS
Course Information
“The structure and interconnectivity of social, technological, and natural networks.
Network structure: graph theory, economic, social, physical, and natural networks.
Network behavior: game theory, markets and strategic interaction, aggregate and
emergent functions, and dynamics. Information networks: search and integration.
Applications in sociology, economics, public policy, and computing.”

Grading Breakdown

Assessment              Weight (approx.)
Assignments (5)         30%
Blog Posts (3)          15%
Classwork/Community     15%
Midterm                 15%
Final                   25%


No background assumed, but we will
• Interpret and work with models both quantitatively and qualitatively
Important Dates
• Midterm 2/23
• Projects due 4/21
• Final 5/5 9am-Noon
Let me know ASAP if you have any concerns
A Future for Computer Science?
Emerging science of networks




Examining apparent similarities between many human and technological systems & organizations
• Importance of network effects in such systems
How things are connected matters greatly
• Structure, asymmetry and heterogeneity
Details of interaction matter greatly
• The metaphor of viral spread
• Dynamics of economic and strategic interaction
• Qualitative and quantitative; can be very subtle
A revolution of
• measurement
• theory
• breadth of vision
(M. Kearns)
What is a network?


A collection of individual or atomic entities (the nodes)
Links can represent any pairwise relationship
• Links can be directed or undirected
Network: entire collection of nodes and links
• For us, a network is an abstract object (list of pairs) and is separate from its visual layout
• that is, we will be interested in properties that are layout-invariant
• might sometimes be annotated by other info (weights, etc.)
We will be interested in properties of networks
• often structural properties
• often statistical properties of families of networks
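
As a rough illustration of the "list of pairs" view, the sketch below stores a small made-up network as pairs and builds neighbor sets from it. The node names and the helper `neighbors_from_pairs` are assumptions for this example only. No drawing positions are stored, so anything computed from this object is layout-invariant.

```python
from collections import defaultdict

def neighbors_from_pairs(pairs, directed=False):
    """Build a node -> set-of-neighbors map from a list of (u, v) links."""
    nbrs = defaultdict(set)
    for u, v in pairs:
        nbrs[u].add(v)
        nbrs[v]  # touch v so nodes with no outgoing link still appear
        if not directed:
            nbrs[v].add(u)
    return dict(nbrs)

# Hypothetical network: the same abstract object however it is drawn.
links = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]

undirected = neighbors_from_pairs(links)
directed = neighbors_from_pairs(links, directed=True)

print(sorted(undirected["c"]))  # neighbors of c when links are undirected
print(sorted(directed["c"]))    # only links that start at c
```

Listing the pairs in a different order produces the same neighbor sets, which is one way to see that the network is separate from any particular presentation of it.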
Representing networks

Networks are collections of points joined by lines.
What kinds of questions might we ask?
“Network” ≡ “Graph”
[Figure: a small graph with one node and one edge labeled]

points      lines
vertices    edges, arcs        math
nodes       links              computer science
sites       bonds              physics
actors      ties, relations    sociology
Definitions
[Figure: example graph with nodes labeled 1 through 8]

Path: a sequence of nodes (v_1, …, v_k) such that for any adjacent pair v_i and v_{i+1}, there’s an edge e_{i,i+1} between them.
Distance: the length of the shortest path between
two nodes
Diameter: the maximum shortest-path distance
between any two nodes
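
The distance and diameter definitions can be sketched with breadth-first search, as below. This is a minimal illustration and not taken from the slides: the five-node graph and the function names are invented, and `diameter` assumes the graph is connected.

```python
from collections import deque

def distances_from(graph, source):
    """Breadth-first search: shortest-path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(graph):
    """Largest shortest-path distance over all pairs (assumes a connected graph)."""
    return max(max(distances_from(graph, s).values()) for s in graph)

# Hypothetical undirected graph on nodes 1..5 (edges invented for illustration).
graph = {
    1: {2, 3},
    2: {1, 4},
    3: {1, 4},
    4: {2, 3, 5},
    5: {4},
}

print(distances_from(graph, 1))
print(diameter(graph))
```

Breadth-first search visits nodes in order of increasing distance, so the first time a node is reached its recorded distance is already the shortest one.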
Network Definitions

Network size: total number of vertices (denoted n)
• Maximum possible number of edges (m)?
If the distance between all pairs is finite, we say the network is connected; else it has multiple components
Degree of a node v = number of edges connected to v
• Directed versions (in-degree and out-degree)
Attributes of edges
• Weight or cost
• Direction
What else might we want to model beyond just the connections?
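
The quantities on this slide can be computed directly from an edge list. The sketch below is one possible way to do it, assuming a simple network (no self-loops, no repeated edges). The function names and the six-node example are hypothetical.

```python
def max_edges(n, directed=False):
    """Most edges a simple network on n vertices can have (no self-loops)."""
    return n * (n - 1) if directed else n * (n - 1) // 2

def degrees(edges):
    """Degree of each node that appears in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def in_out_degrees(arcs):
    """(in-degree, out-degree) of each node in a directed edge list."""
    deg = {}
    for u, v in arcs:
        i, o = deg.get(u, (0, 0))
        deg[u] = (i, o + 1)      # arc leaves u
        i, o = deg.get(v, (0, 0))
        deg[v] = (i + 1, o)      # arc enters v
    return deg

def components(nodes, edges):
    """Group nodes into connected components of an undirected network."""
    nbrs = {x: set() for x in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    seen, parts = set(), []
    for start in nodes:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x not in part:
                part.add(x)
                stack.extend(nbrs[x] - part)
        seen |= part
        parts.append(part)
    return parts

# Hypothetical network: a triangle, a separate pair, and an isolated node.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (1, 3), (4, 5)]

print(max_edges(len(nodes)))       # n(n-1)/2 for an undirected network
print(degrees(edges))
print(in_out_degrees(edges))       # same pairs read as directed arcs
print(components(nodes, edges))    # more than one part means not connected
```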
Issues

Why model networks? Structure & dynamics

Models (structure): who is linked to whom?
• How does position within a network (dis)advantage an agent?
• What are the factors that lead people to trust each other?
• Graph theoretic models

Implications (dynamics): individual behavior can have global consequences
• Diffusion of disease and information
• Search by navigating the network
• Resilience
• Population, structural, and aggregate effects
• Game theoretic models
Social networks

Example: Acquaintanceship networks
• vertices: people in the world
• links: have met in person and know last names
• hard to measure

Example: scientific collaboration
• vertices: math and computer science researchers
• links: between coauthors on a published paper
• Erdos numbers: distance to Paul Erdos
• Erdos was definitely a hub or connector; had 507 coauthors
How do we navigate in such networks?

Acquaintanceship & more
Six Degrees of Bacon

Background
• Stanley Milgram’s Six Degrees of Separation?
• Craig Fass, Mike Ginelli, and Brian Turtle invented it as a drinking game at Albright College
• Brett Tjaden, Glenn Wasson, and Patrick Reynolds have run the online website from UVa and beyond
• Instance of Small-World phenomenon

http://oracleofbacon.org handles 2 kinds of requests
1. Find the links from Actor A to Actor B.
2. How good a center is a given actor?
How does it answer these requests?
How does the Oracle work?


Not using Oracle™
Queries require traversal of the graph
[Figure: actor-movie graph. Kevin Bacon (BN = 0) is linked through Mystic River, Apollo 13, and Footloose to the BN = 1 actors Sean Penn, Tim Robbins, Tom Hanks, Bill Paxton, Sarah Jessica Parker, and John Lithgow.]
How does the Oracle Work?
BN = Bacon Number
Queries require traversal of the graph


[Figure: the actor-movie graph extended to BN = 2.
BN = 0: Kevin Bacon
Films from Bacon: Mystic River, Apollo 13, Footloose
BN = 1: Sean Penn, Tim Robbins, Tom Hanks, Bill Paxton, Sarah Jessica Parker, John Lithgow
Films from BN = 1 actors: Sweet and Lowdown, Fast Times at Ridgemont High, War of the Worlds, The Shawshank Redemption, Cast Away, Forrest Gump, Tombstone, A Simple Plan
BN = 2: Woody Allen, Judge Reinhold, Miranda Otto, Morgan Freeman, Helen Hunt, Sally Field, Val Kilmer, Billy Bob Thornton]
How does the Oracle work?

How do we choose which movie or actor to explore next?
• Queries require traversal of the graph
[Figure: the same actor-movie graph, from Kevin Bacon (BN = 0) out to the BN = 2 actors]
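
One plausible answer to "which movie or actor to explore next" is breadth-first search: explore everyone at distance 1 before anyone at distance 2. The sketch below is an illustration under that assumption, not a description of how oracleofbacon.org is actually implemented. The cast lists contain only the names shown in the figure (not full credits), the pairing of actors to films is a reading of the figure labels, and `find_link` and `bacon_number` are hypothetical helper names.

```python
from collections import deque

# Partial cast lists: only the actors named on the slide.
CASTS = {
    "Mystic River": ["Kevin Bacon", "Sean Penn", "Tim Robbins"],
    "Apollo 13": ["Kevin Bacon", "Tom Hanks", "Bill Paxton"],
    "Footloose": ["Kevin Bacon", "Sarah Jessica Parker", "John Lithgow"],
    "Sweet and Lowdown": ["Sean Penn", "Woody Allen"],
    "Fast Times at Ridgemont High": ["Sean Penn", "Judge Reinhold"],
    "War of the Worlds": ["Tim Robbins", "Miranda Otto"],
    "The Shawshank Redemption": ["Tim Robbins", "Morgan Freeman"],
    "Cast Away": ["Tom Hanks", "Helen Hunt"],
    "Forrest Gump": ["Tom Hanks", "Sally Field"],
    "Tombstone": ["Bill Paxton", "Val Kilmer"],
    "A Simple Plan": ["Bill Paxton", "Billy Bob Thornton"],
}

def find_link(casts, start, goal):
    """Shortest chain [actor, movie, actor, ...] from start to goal, or None."""
    movies_of = {}
    for movie, actors in casts.items():
        for actor in actors:
            movies_of.setdefault(actor, []).append(movie)
    if start not in movies_of or goal not in movies_of:
        return None
    parent = {start: None}  # actor -> (previous actor, shared movie)
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        if actor == goal:
            break
        for movie in movies_of[actor]:
            for costar in casts[movie]:
                if costar not in parent:
                    parent[costar] = (actor, movie)
                    queue.append(costar)
    if goal not in parent:
        return None
    chain = [goal]
    while parent[chain[0]] is not None:
        prev_actor, movie = parent[chain[0]]
        chain = [prev_actor, movie] + chain
    return chain

def bacon_number(casts, actor, center="Kevin Bacon"):
    """Number of movie links on a shortest chain from center to actor."""
    chain = find_link(casts, center, actor)
    return None if chain is None else len(chain) // 2

print(" -> ".join(find_link(CASTS, "Kevin Bacon", "Woody Allen")))
print(bacon_number(CASTS, "Val Kilmer"))
```

The same function answers the first kind of Oracle request for any pair of actors, since `start` does not have to be Kevin Bacon.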
Center of the Hollywood Universe?


1,246,221 people can be connected to Bacon
Is he the center of the Hollywood Universe?




Who is?
Who are other good centers?
What makes them good centers?
Centrality



Closeness: the inverse average distance of a node to all
other nodes
Degree: the degree of a node
Betweenness: a measure of how much a vertex is between
other nodes
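
Two of the centrality measures above, degree and closeness, can be sketched in a few lines. This is a minimal illustration on an invented five-node graph, and it assumes a connected undirected network with at least two nodes. Betweenness is left out because it requires counting shortest paths through each vertex.

```python
from collections import deque

def bfs_distances(graph, source):
    """Shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(graph):
    """Degree of each node: how many neighbors it has."""
    return {v: len(nbrs) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """Inverse of the average distance from each node to all other nodes."""
    result = {}
    for v in graph:
        total = sum(bfs_distances(graph, v).values())  # distance to self is 0
        result[v] = (len(graph) - 1) / total
    return result

# Hypothetical network: hub "h" with three neighbors, plus a tail c - d.
graph = {
    "h": {"a", "b", "c"},
    "a": {"h"},
    "b": {"h"},
    "c": {"h", "d"},
    "d": {"c"},
}

closeness = closeness_centrality(graph)
degree = degree_centrality(graph)
best = max(closeness, key=closeness.get)
print(best, closeness[best], degree[best])
```

Under the closeness measure, asking how good a center an actor is amounts to asking how small that actor's average distance to everyone else is.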
Oracle of Bacon

Name someone who is 4 degrees or more away
from Kevin Bacon
1
2
3
4
5
6

What characteristics make someone farther away?

What makes someone a good center? Is Kevin
Bacon a good center?
Sample Blog Post

I'm Related to Kevin Bacon?
Overview of the Oracle of Bacon: In class we have talked a lot about social
and computer networks and all of their component parts. We have learned
many important aspects of networks and what makes them operate. One
of the most interesting and complex notions is that of centrality and how
one can go about calculating centrality within a social network. The
Oracle of Bacon is one of the best examples of a project that has created an
elaborate social network around the central figure of Kevin Bacon.
However, it is interesting that the site shows that Kevin Bacon is actually
not the center of the Hollywood network; in fact, there are 1,048
actors who would make better centers than Bacon. Here is a breakdown of
the best and worst centers of the Hollywood network. Although the only
other actor mentioned who would make a better center is Sean Connery, one
can speculate about what makes a great center. A good center would
have to be an older actor, have appeared in many movies and many
varieties of movies, have appeared in large productions with many actors,
and have worked overseas. Alternatively, a bad center would be young and
have appeared in only one type of movie, or in only one movie at all!
Why is the Oracle of Bacon Interesting to us?
• In reality, the game is an example of the small world phenomenon. The
small world phenomenon was researched by Stanley Milgram as he
examined the average path length for social networks of people in the
United States. The phenomenon shows that paths between nodes are
often much shorter than expected, which the game illustrates. The Oracle
of Bacon was designed by computer scientists at the University
of Virginia in order to create an engaging way of dealing with the
small world phenomenon. The program for calculating a Bacon
number was developed by mapping networks from http://imdb.com/
(the database for movies and actors information).
Other related points
• Here is the original paper by Stanley Milgram, upon which all of this
information is based. The game works to find links between different
actors and find the degree of separation from Bacon. It is amazing that
almost any actor, no matter how obscure, can be linked to Bacon within
six degrees and the average is under three links (2.960).
• It is also interesting to look at the earlier examples of the small world
phenomenon, which inspired the Oracle of Bacon. Erdos numbers refer
to the number of coauthorship links separating a mathematician from Paul
Erdos, a Hungarian mathematician famous for collaboration. The Erdos Number
Project gives details similar to the Oracle of Bacon about the amount of
connectivity within the network of mathematicians. In this network
the median Erdos number is 5; the mean is 4.65, and the standard
deviation is 1.21. This shows that there is slightly less connectivity, but
a high degree of centrality.


Here is a visualization of the Erdos Network.

More recent centrality work
• There are many examples of researchers who have dealt with
the six degrees theory in their analysis of the small-world phenomenon,
including Judith Kleinfeld. Her paper, Could it be a Big World After All?
The `Six Degrees of Separation’ Myth (Society, April 2002), deals with a
lot of the important ideas discussed above. Kleinfeld argues that the
initial data used to create the notion of the small-world phenomenon
was actually skewed and that the data shows there might actually be less
connectivity between people than was previously believed. This paper
was published in 2002, and it does not seem to have garnered a large
amount of debate amongst the scholarly community. It seems that
more work and experimentation needs to be done in this field in order
to make claims about the connectedness of the actual world.
Although Kleinfeld and others made some really interesting points
initially, unfortunately the computer science world seems focused on
novelty, not finishing work on a phenomenon, so it may be a while
before all of our questions are answered!
Physical Networks

The Internet
• Vertices: Routers
• Edges: Physical connections

Another layer of abstraction
• Vertices: Autonomous systems
• Edges: peering agreements
• Both a physical and business network
Other examples



US Power Grid
Interdependence and August 2003 blackout
What does the Internet look like?
US Power Grid
Business & Economic Networks




Example: eBay bidding
• vertices: eBay users
• links: represent bidder-seller or buyer-seller
• fraud detection: bidding rings
Example: corporate boards
• vertices: corporations
• links: between companies that share a board member
Example: corporate partnerships
• vertices: corporations
• links: represent formal joint ventures
Example: goods exchange networks
• vertices: buyers and sellers of commodities
• links: represent “permissible” transactions
Enron
Content Networks

Example: Document similarity




Vertices: documents on web
Edges: Weights defined by similarity
See TouchGraph GoogleBrowser
Conceptual network: thesaurus


Vertices: words
Edges: synonym relationships
Wordnet
Source: http://wordnet.princeton.edu/man/wnlicens.7WN
Biological Networks

Example: the human brain





Vertices: neuronal cells
Edges: axons connecting cells
links carry action potentials
computation: threshold behavior
N ~ 100 billion
Gene regulatory networks

Humans have only 30,000 genes, 98% shared with chimps
The complexity is in the interaction of genes

Can we predict what the result of inhibiting one gene will be?
Source: http://www.zaik.uni-koeln.de/bioinformatik/regulatorynets.html.en
Types of networks


Pick a class of network:
Give a real-world example of such a network:

What are the vertices (nodes)?

What are the edges (links)?

How is the network formed? Is it decentralized or
centralized? Is the communication or interaction local or
global?

What is the network's topology? For example, is it
connected? What is its size? What is the degree
distribution?
Graph properties

Max Degree?

Center?
Wrap up

Networks are everywhere and can be used to
describe many, many systems.

By modeling networks, we can start to understand
their properties and the implications those
properties have for processes occurring on the
network