CSE 522 – Algorithmic and
Economic Aspects of the
Internet
Instructors:
Nicole Immorlica
Mohammad Mahdian
Topics covered in the course

Structure and modeling of social
networks
Power law graphs; Small world phenomenon; High clustering coefficient;
Probabilistic and game theoretic models

Algorithms for link analysis
Crawling the web; HITS; PageRank; Web spam; Rank aggregation;
Spectral clustering

Economic aspects of the Internet
Peering relations; Alternative mechanisms for routing; P2P networks

Topics motivated by e-commerce
Reputation mechanisms; Recommendation systems; Ad auctions
Logistics

Course web page:
http://www.cs.washington.edu/education/courses/cse522/05au/

Course work:



reading papers (1/week on avg)
possibly a few problem sets
How to contact us:
{nickle,mahdian}@microsoft.com
Social Networks

A social network is a graph that represents
relationships between independent entities.






Graph of friendships (or in the virtual world,
networks like orkut)
Web of sexual contact
Graph of scientific collaborations
Cross-posts in newsgroups
Web graph (links between webpages)
Internet: Inter/Intra-domain graph
Scientific Collaboration Network

400,000 nodes, authors
in Mathematical
Reviews database

An edge between two
authors if they have a
joint paper

Just 676,000 edges
Picture from orgnet.com
Scientific Collaboration Network


Average degree 3.36
A few high-degrees:




Paul Erdős, 509
Frank Harary, 268
Yuri Alekseevich
Mitropolskii, 244
Many low-degrees:
(100,000 of degree 1)
Picture from orgnet.com
Scientific Collaboration Network

Short paths




Maximum Erdős number is 13
Any two authors connected
by path of length at most 23
Average distance between
two authors is 7.64
e.g.: John Nash → Shapley → Fulkerson → Hoffman → Paul Erdős

Many triangles …
Picture from orgnet.com
9/11 Terrorist Network
Picture from orgnet.com
Newsgroup Cross-Post Graph


Nodes are newsgroups, essentially archived
email lists
Edges are cross-posts, i.e. there is an edge
between two newsgroups to which an
identical email is posted
Example nodes: alt.microsoft.sucks, alt.linux.sucks
Internet Graphs

Inter-domain graphs


Nodes are autonomous systems or domains
Edges are inter-domain connections
Example nodes: SPRINT, AOL
Inter-domain graph
Picture from caida.org
Internet Graphs

Intra-domain graphs


Nodes are routers
Edges are links between routers
Example nodes: 199.45.130.13, 199.45.143.14
Intra-domain graph
Colored by AS number
Picture from lumeta.com
World Wide Web


Nodes are webpages
Arcs (i.e., directed edges) are hyperlinks
Example nodes: http://research.microsoft.com/~mahdian, http://theory.csail.mit.edu
Web graph,
Chicago Tribune Page
Picture generated by Nicheworks
Social Networks
Why Study These Networks





Understand the creation of these networks
Understand viral epidemics
Help design crawling strategies for the web
Analyze behavior of algorithms (web/internet)
Predict evolution of the network and
emergence of new phenomena
In this lecture

Common properties of social networks




Power law degree distribution
Small world phenomenon
High clustering coefficient
Structure of the web graph
Power Laws



Two quantities x and y are related by a power
law if y is proportional to x^{-c} for a constant c:
y = a x^{-c}, where a > 0 is the constant of proportionality
If x and y are related by a power law, then the
graph of log(y) versus log(x) is a straight line:
log(y) = -c log(x) + log(a)
The slope of the log-log plot is -c, the negative
of the power law exponent c
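The log-log relationship suggests a simple way to estimate the exponent: fit a straight line through the points (log x, log y) and read c off the slope. The sketch below is a minimal illustration under that assumption; the function name `fit_power_law` and the symbol `a` for the proportionality constant are introduced here and do not come from the slides.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares line through (log x, log y).

    Returns (c, a) for the model y = a * x**(-c).
    All inputs must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx                    # slope of the log-log line, equals -c
    intercept = mean_y - slope * mean_x  # equals log(a)
    return -slope, math.exp(intercept)

# Synthetic data that follows y = 3 * x**(-2) exactly.
xs = [1, 2, 4, 8, 16]
ys = [3.0 * x ** -2 for x in xs]
c, a = fit_power_law(xs, ys)
print(c, a)
```

On noisy real data a least-squares fit in log-log space is only a rough estimate, so treat the recovered exponent as indicative.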
Power Law Distributions



A random variable X has a power law
distribution if Pr[X=k] is proportional to k^{-c} for
a constant c
The cumulative distribution, Pr[X>k], of a
power law distribution is proportional to k^{-c+1},
and is called the Pareto law
Similar to a power law, the Zipf law relates
the rank r of X to its size: the r'th largest
instance of X is proportional to r^{-c'}
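A short heuristic derivation shows where the Pareto exponent comes from and how the Zipf exponent c' relates to c. It approximates the sum by an integral and assumes c > 1; the symbols n (number of samples) and x_r (the r'th largest value) are introduced here for illustration.

```latex
\Pr[X > k] \;=\; \sum_{j > k} \Pr[X = j]
\;\propto\; \sum_{j > k} j^{-c}
\;\approx\; \int_{k}^{\infty} t^{-c}\, dt
\;=\; \frac{k^{-(c-1)}}{c-1}
\qquad (c > 1).

% Among n samples, about r of them are at least as large as x_r:
r \;\approx\; n \Pr[X \ge x_r] \;\propto\; x_r^{-(c-1)}
\quad\Longrightarrow\quad
x_r \;\propto\; r^{-1/(c-1)},
\qquad\text{so}\qquad c' = \frac{1}{c-1}.
```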
Example: City Populations
Rank  City           Population
1.    New York       7,322,564
2.    Los Angeles    3,485,398
3.    Chicago        2,783,726
4.    Houston        1,630,553
5.    Philadelphia   1,585,577
6.    San Diego      1,110,549
7.    Detroit        1,027,974
8.    Dallas         1,006,877
9.    Phoenix        983,403
10.   San Antonio    935,933
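As a rough illustration of a rank-size (Zipf) fit, the sketch below regresses log(population) on log(rank) for the ten cities listed above. The helper name `zipf_exponent` is hypothetical. Ten points are a very small sample, and the exponent quoted on a later slide was presumably fitted to a much larger list of cities, so the two values need not agree.

```python
import math

# Populations of the ten largest cities, as listed on the slide.
populations = [7322564, 3485398, 2783726, 1630553, 1585577,
               1110549, 1027974, 1006877, 983403, 935933]

def zipf_exponent(sizes):
    """Estimate c' in size ~ rank**(-c') by least squares on log-log data."""
    ordered = sorted(sizes, reverse=True)
    lx = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ly = [math.log(size) for size in ordered]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    return -(sxy / sxx)  # the slope is negative, the exponent is positive

estimate = zipf_exponent(populations)
print(round(estimate, 2))
```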
Example: City Populations
Rank  City              Population
1.    New York          7,322,564
2.    Los Angeles       3,485,398
3.    Chicago           2,783,726
...
21.   Seattle           516,259
...
94.   Spokane, WA       177,196
95.   Tacoma, WA        176,664
96.   Little Rock, AR   175,795
97.   Bakersfield, CA   174,820
98.   Fremont, CA       173,339
99.   Fort Wayne, IN    173,072
100.  Arlington, VA     170,936
Example: City Populations

Power law exponent: c = 0.74
Power Laws in Networks

Degree distribution often satisfies a power law:
the fraction f_d of nodes of degree d is proportional to d^{-c}
Degree d   Fraction f_d = 1/(2d)
1          1/2
2          1/4
3          1/6
4          ~1/8
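To make the degree distribution concrete, the sketch below computes the fraction of nodes of each degree from an undirected edge list. The function name `degree_distribution` and the toy edge list are illustrative assumptions. Nodes that appear in no edge are not counted.

```python
from collections import Counter

def degree_distribution(edges):
    """Map each degree d to the fraction of nodes that have degree d.

    `edges` is an iterable of (u, v) pairs of an undirected graph.
    Isolated nodes (in no edge) are invisible to this function.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    return {d: counts[d] / n for d in sorted(counts)}

# Toy graph: a has degree 3, d has degree 2, and b, c, e have degree 1.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
dist = degree_distribution(edges)
print(dist)
```

Plotting log(f_d) against log(d) for a real network and checking for a roughly straight line is the usual first test for a power law.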
Example: Collaboration Graph

Power law exp:
c = 2.97

With exponential
decay factor,
c = 2.46
Example: Cross-Post Graph

Power law exponent: c = 1.3
Example: Inter-Domain Internet

Power law exponent: 2.15 < c < 2.2
Example: Intra-Domain Internet

Power law exponent: c = 2.48
Example: Web Graph In-Degree

Power law exponent: c = 2.09
Example: Web Graph Out-Degree

Power law exponent: c = 2.72
Small World Phenomenon
Six degrees of separation:
“Everybody on this planet is
separated by only six other
people. Six degrees of separation
between us and everyone else on
this planet. The President of the
United States, a gondolier in
Venice, just fill in the names.”
Small World Phenomenon

Milgram’s famous experiment (1960s):




Choose a random person in Nebraska, Bob
Ask Bob to deliver a letter to a random person in
Massachusetts, Lashawn
Tell Bob target’s name, address, and occupation
Instruct Bob to only send letter to people he
knows on a first-name basis
Small World Phenomenon
Bob, a farmer in Nebraska
→ David, mayor of Bob's town
→ Bernard, David's cousin, who went to college with
→ Maya, who grew up in Boston with
→ Lashawn
Small World Phenomenon in Graphs



The diameter of a graph is the maximum
distance (number of edges on a shortest path)
between any pair of nodes
The average distance of a graph is the
distance averaged over all pairs of nodes
The average connected distance of a graph
is the distance averaged over all pairs of
connected nodes
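These quantities can be computed by running a breadth-first search from every node. The sketch below assumes an unweighted, undirected graph stored as an adjacency dict, a representation chosen here for illustration. The names `bfs_distances` and `distance_stats` are hypothetical. Unreachable pairs are skipped, so the first returned value is the largest finite distance, which equals the diameter only when the graph is connected.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_stats(adj):
    """Return (largest finite distance, average connected distance).

    Assumes at least one pair of distinct connected nodes exists.
    """
    total = 0
    pairs = 0
    longest = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                longest = max(longest, d)
    return longest, total / pairs

# Path a-b-c-d plus an isolated node e.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}
longest, average = distance_stats(adj)
print(longest, average)
```

One BFS per node costs time proportional to nodes times edges, which is why studies of very large graphs usually sample start nodes instead.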
Small World Phenomenon in Graphs

A graph exhibits the small world phenomenon
if it has low diameter or low average (connected)
distance

Typically, the average distance of a small
world graph is on the order of log n (where n
is the number of nodes)
Examples

Collaboration graph
401,000 nodes, 676,000 edges (average degree 3.37)
Diameter: 23, Average distance: 7.64

Cross-post graph, giant component
30,000 nodes, 800,000 edges (average degree 53.3)
Diameter: 13, Average distance: 3.8

Web graph
200 million nodes, 1.5 billion edges (average degree 15)
Average connected distance: 16

Inter-domain Internet
3500 nodes, 6500 edges (average degree 3.71)
95% of pairs of nodes within distance 5
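The average degrees quoted here follow from the node and edge counts: each edge contributes to the degree of two nodes, so the average degree is 2m/n for n nodes and m edges. The check below is a small sketch with a hypothetical helper name. For the web graph, the figure of 15 corresponds to counting every hyperlink at both of its endpoints.

```python
def average_degree(nodes, edges):
    """Average degree of a graph in which every edge touches two nodes."""
    return 2 * edges / nodes

# (nodes, edges) as quoted on the slide.
examples = {
    "collaboration": (401_000, 676_000),
    "cross-post": (30_000, 800_000),
    "web": (200_000_000, 1_500_000_000),
    "inter-domain": (3_500, 6_500),
}

for name, (n, m) in examples.items():
    print(name, round(average_degree(n, m), 2))
```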
High Clustering Coefficient



The clustering coefficient of a graph is the
fraction of triangles among connected triples
of nodes
Intuitively, the clustering coefficient reflects
the probability that your friends are
themselves friends
We expect social networks to have a high
clustering coefficient
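One common way to make this definition precise is to count connected triples (paths u-center-v) and ask what fraction of them are closed by an edge between u and v. Each triangle closes three such triples, one per choice of center. The sketch below follows that reading. The function name and the adjacency-set representation are assumptions made for illustration.

```python
from itertools import combinations

def clustering_coefficient(adj):
    """Fraction of connected triples that are closed into triangles.

    `adj` maps each node to the set of its neighbours
    (undirected graph, no self-loops).
    """
    closed = 0
    triples = 0
    for center, neighbours in adj.items():
        # Every pair of neighbours of `center` forms a connected triple.
        for u, v in combinations(neighbours, 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0

# Triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
coefficient = clustering_coefficient(adj)
print(coefficient)
```

In the toy graph, three of the five connected triples are closed, so the coefficient is 0.6.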
Examples

Collaboration graph



Clustering coefficient is 0.14
Density of edges is 0.000008
Cross-post graph


Clustering coefficient is 0.4492
Density of edges is 0.0016
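The edge density is useful as a baseline: in a graph with the same density but edges placed uniformly at random, the clustering coefficient would be close to the density itself. The sketch below assumes density means the number of edges divided by the number of node pairs, n(n-1)/2, and uses node and edge counts quoted elsewhere in these slides. The slide's figures may rest on slightly different counts, so only rough agreement should be expected.

```python
def edge_density(nodes, edges):
    """Edges divided by the number of unordered node pairs."""
    return edges / (nodes * (nodes - 1) / 2)

collaboration = edge_density(401_000, 676_000)
cross_post = edge_density(30_000, 800_000)
print(collaboration, cross_post)
```

Both clustering coefficients on the slide (0.14 and 0.4492) are several orders of magnitude above the corresponding densities.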
Assignment
READ:
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S.
Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, Graph
structure in the web, WWW, 2000.
Graph Structure of the Web

Breadth-first search from randomly chosen
start nodes




Follow both forward and backward links
Reveal directed and undirected graph structure
Over 90% of nodes reachable if links are
treated as undirected
Directed graph reveals complex bow-tie
structure
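The three kinds of search described above can be sketched on a toy directed graph: follow arcs forward, follow them backward, or ignore direction. This is a minimal illustration and not the procedure of the assigned paper. The function names and the four-node example are hypothetical.

```python
from collections import deque

def reachable(start, neighbours):
    """Set of nodes reached by breadth-first search from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbours.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def link_views(arcs):
    """Build forward, backward, and undirected adjacency maps from arcs."""
    forward, backward, undirected = {}, {}, {}
    for u, v in arcs:
        forward.setdefault(u, set()).add(v)
        backward.setdefault(v, set()).add(u)
        undirected.setdefault(u, set()).add(v)
        undirected.setdefault(v, set()).add(u)
    return forward, backward, undirected

# Toy web: "in" links into a two-page cycle, which links on to "out".
arcs = [("in", "core1"), ("core1", "core2"),
        ("core2", "core1"), ("core2", "out")]
forward, backward, undirected = link_views(arcs)

print(sorted(reachable("core1", forward)))     # pages core1 can reach
print(sorted(reachable("core1", backward)))    # pages that can reach core1
print(sorted(reachable("core1", undirected)))  # direction ignored
```

Even in this tiny example the forward and backward searches see different parts of the graph, while the undirected search reaches everything.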
Bow-Tie Structure of Web Graph
Picture from the Nature journal
Next Time
Probabilistic models for
social networks