Long Tails and Navigation
Networked Life
CIS 112
Spring 2010
Prof. Michael Kearns
One More (Structural) Property…
• A properly tuned α-model can simultaneously explain
– small diameter
– high clustering coefficient
– other models can, too (e.g. cycle+random rewirings)
• But what about connectors and heavy-tailed degree distributions?
– α-model and simple variants will not explain this
– intuitively, no “bias” towards large degree evolves
– “all vertices are created equal”
• As usual, we want a “natural” model to explain this
Quantifying Connectors:
Heavy-Tailed Distributions
Heavy-tailed Distributions
• Pareto or power law distributions:
– for random variables assuming integer values > 0
– probability of value x ~ 1/x^a
– typically 0 < a < 2; smaller a gives heavier tail
– here are some examples
– sometimes also referred to as being scale-free
• For binomial, normal, and Poisson distributions the tail probabilities
approach 0 exponentially fast
• Inverse polynomial decay vs. inverse exponential decay
• What kind of phenomena does this distribution model?
• What kind of process would generate it?
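To make the contrast between inverse polynomial and inverse exponential decay concrete, here is a minimal sketch that evaluates both kinds of tail at a few values of x. The exponent a = 1.5, the Poisson mean of 10, and the truncation of the power law at x_max = 10,000 (needed so the probabilities can be normalized) are all illustrative assumptions, not values from the lecture.

```python
import math

def power_law_pmf(x, a, x_max=10_000):
    """Pr[value = x] proportional to 1/x^a on the integers 1..x_max (truncated)."""
    norm = sum(k ** -a for k in range(1, x_max + 1))
    return x ** -a / norm

def poisson_pmf(x, lam):
    """Pr[value = x] = exp(-lam) * lam^x / x!, computed in log space to avoid overflow."""
    log_p = -lam + x * math.log(lam) - math.lgamma(x + 1)
    return math.exp(log_p)

# Far from the mean, the Poisson probability collapses while the power law
# shrinks only polynomially.
for x in (1, 10, 100, 1000):
    print(x, power_law_pmf(x, a=1.5), poisson_pmf(x, lam=10.0))
```

At x = 100 the power-law probability is still on the order of 1e-4, while the Poisson probability is smaller by dozens of orders of magnitude.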
Distributions vs. Data
• All these distributions are idealized models
• In practice, we do not see distributions, but data
• Thus, there will be some largest value we observe
• Also, it can be difficult to “eyeball” data and choose a model
• So how do we distinguish between Poisson, power law, etc.?
• Typical procedure:
– we might restrict our attention to a range of values of interest
– accumulate counts of observed data into equal-sized bins
– look at counts on a log-log plot
– note that
• power law:
log(Pr[value = x]) = log(1/x^a) = -a log(x)
linear, slope –a
• Normal/Gaussian:
log(Pr[value = x]) = log(a exp(-x^2/b)) = log(a) – x^2/b
non-linear, concave near mean
• Poisson:
log(Pr[value = x]) = log(exp(-λ) λ^x/x!)
also non-linear
Let’s look at the paper on dollar bill migration
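The log-log test described above can be sketched as a least-squares fit of log(count) against log(x). This is only an illustration: the counts below are generated from exact formulas rather than from binned observations, and the function name `loglog_slope` and all parameter values are assumptions made for the example.

```python
import math

def loglog_slope(xs, counts):
    """Least-squares slope of log(count) against log(x), skipping empty bins."""
    pts = [(math.log(x), math.log(c)) for x, c in zip(xs, counts) if x > 0 and c > 0]
    n = len(pts)
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    return sxy / sxx

xs = list(range(1, 101))
power_counts = [1e6 * x ** -1.5 for x in xs]                        # power law, a = 1.5
gauss_counts = [1e6 * math.exp(-(x - 50) ** 2 / 200) for x in xs]   # bell curve around 50

# A power law gives one slope (about -a) across the whole range.
print(loglog_slope(xs, power_counts))

# A Gaussian does not: the fitted slope changes sign between the two halves.
print(loglog_slope(xs[:50], gauss_counts[:50]), loglog_slope(xs[50:], gauss_counts[50:]))
```

With real data the fit is noisy, especially in the sparse right-hand bins, so a roughly straight line is evidence for a heavy tail rather than proof of a specific exponent.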
Heavy Tails Recap
• We plot the distribution or histogram of some “resource”
– on the x-axis, we put the amount or quantity of this resource (e.g. degrees)
– on the y-axis, we put the number or fraction of the population with the corresponding amount
• By “heavy-tailed”, we broadly mean that the rate of decay is “slow” as we move to the right on the x-axis
• In mathland, we can write explicit equations for a slow rate of decay
– e.g. Pareto or power-law distributions
• When confronted with data, the empirical test for heavy tails is:
– plot the histogram in log-log form (e.g. x = log(degree), y = log(# with that degree))
– log-log plot should look roughly linear, especially towards the right
• A related concept: ranking by some quantity
– e.g. look at most popular iPhone app, second most popular, third most popular
– x-axis: rank by popularity
– y-axis: how popular is it (# downloads)
• Again, heavy-tailed means rate of decay is slow as we move to the right
– and again, signature is linearity of log(rank) vs. log(popularity)
• Claim: many interesting quantities have heavy-tailed distributions/rankings
Long Tail of iPhone App Popularity
Zipf’s Law
• Look at the frequency of English words:
– “the” is the most common, followed by “of”, “to”, etc.
– claim: frequency of the n-th most common ~ 1/n (power law, a ~ 1)
• General theme:
– rank events by their frequency of occurrence
– resulting distribution often is a power law!
• Other examples:
– North America city sizes
– personal income
– file sizes
– genus sizes (number of species)
– the “long tail of search” (on which more later…)
– let’s look at log-log plots of these
• People seem to dither over exact form of these distributions
– e.g. value of a
– but not over heavy tails
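A rank-frequency table of the kind Zipf's law describes can be built in a few lines. The sketch below is illustrative only: the sample sentence is made up and far too short to show a power law, and `rank_frequency` is a hypothetical helper name. Under a 1/n law, the product rank × count would stay roughly constant on a large corpus.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]

sample = "the cat and the dog and the bird saw the cat"
for rank, word, count in rank_frequency(sample):
    # Under Zipf's law with a ~ 1, rank * count is roughly the same for every row.
    print(rank, word, count, rank * count)
```

Feeding the (rank, count) pairs into a log-log plot gives the rank version of the heavy-tail test.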
Generating Heavy-Tailed Degrees:
(Just) One Model
Preferential Attachment
• Let’s warm up with a little Matlab demo…
– …just like the crowded nightclub
• Start with (say) two vertices connected by an edge
• At each step, add one new vertex v with one edge back to previous vertices
• Probability a previously added vertex u receives the new edge from v is proportional to the (current) degree of u
– more precisely, probability u gets the edge = (current degree of u)/(sum of all current degrees)
• Vertices with high degree are likely to get even more links!
• “Rich Get Richer” or “Matthew Effect”
• Natural model for many processes:
– hyperlinks on the web
– new business and social contacts
– technology adoption (e.g. online social networks)
• Generates a power law distribution of degrees
• Let’s look at the NetLogo simulation
• Variation: each new vertex initially gets k edges
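The process described above (one new vertex per step, one edge, chosen with probability proportional to current degree) can be simulated directly. This is a minimal sketch, not the Matlab or NetLogo demo from the lecture; the function names, the seed, and the network size are assumptions for illustration.

```python
import random
from collections import Counter

def preferential_attachment(n_vertices, seed=0):
    """Grow a network one vertex at a time; each new vertex adds one edge."""
    rng = random.Random(seed)
    edges = [(0, 1)]  # start with two vertices joined by an edge
    # Each vertex appears in `endpoints` once per incident edge, so a uniform
    # draw picks vertex u with probability degree(u) / (sum of all degrees).
    endpoints = [0, 1]
    for v in range(2, n_vertices):
        u = rng.choice(endpoints)
        edges.append((v, u))
        endpoints.extend((v, u))
    return edges

def degrees(edges):
    """Map each vertex to its degree."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

edges = preferential_attachment(10_000)
deg = degrees(edges)
histogram = Counter(deg.values())  # degree -> number of vertices with that degree
print("max degree:", max(deg.values()))
print("vertices with degree 1, 2, 4:", histogram[1], histogram[2], histogram[4])
```

Most vertices end up with degree 1, while a handful of early vertices become hubs. Plotting `histogram` on log-log axes is the natural next step.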
Two Out of Three Isn’t Bad…
• Preferential attachment explains
– heavy-tailed degree distributions
– small diameter (~log(N), via “hubs”)
• Will not generate very high clustering coefficients
– no bias towards local connectivity, but towards hubs
• Can we simultaneously capture all three properties?
– probably, but we’ll stop here
– soon there will be a fourth property anyway…
Navigation (Search) Revisited
Finding the Short Paths
• Milgram’s experiment, Columbia Small Worlds, E-R, α-model, P.A.…
– all emphasize existence of short paths between pairs (small diameter)
• How do individuals find short paths?
– in an incremental, next-step fashion
– using purely local information about the network and location of target
– note: shortest path might require taking steps “away” from the target!
• This is not (only) a structural question, but an algorithmic one
– statics vs. dynamics
• Navigability may impose additional restrictions on formation model!
• Briefly investigate two alternatives:
– a local/long-distance mixture model [Kleinberg]
– a “social identity” model [Watts, Dodds, Newman]
Kleinberg’s Model
• Start with an n by n grid of vertices (so N = n^2)
– add some long-distance connections to each vertex:
• k additional connections
• probability of connection to grid distance d: ~ (1/d)^r
– cf. the dollar bill migration paper, where r ~ 1.6
– so full model given by choice of k and r
– large r: heavy bias towards “more local” long-distance connections
– small r: approach uniformly random
• Kleinberg’s question:
– what value of r permits effective navigation?
– # hops << N, e.g. log(N)
• Assume parties know only:
– grid address of target
– addresses of their own immediate neighbors
• Algorithm: pass message to the neighbor closest to the target in the grid
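The model and the greedy forwarding rule can be sketched as follows. This is an illustration under several assumptions: grid distance is taken to be Manhattan distance, long-distance links are directed and drawn with replacement, and the names `build_kleinberg_grid` and `greedy_route` are made up for this example. The grid is kept small because the sketch computes every pairwise distance.

```python
import random

def grid_distance(u, v):
    """Manhattan distance between two grid addresses (assumed metric)."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def build_kleinberg_grid(n, k, r, seed=0):
    """n-by-n grid; each vertex gets k long-distance links with Pr ~ (1/d)^r."""
    rng = random.Random(seed)
    vertices = [(i, j) for i in range(n) for j in range(n)]
    neighbors = {}
    for u in vertices:
        i, j = u
        local = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= i + di < n and 0 <= j + dj < n]
        others = [v for v in vertices if v != u]
        weights = [grid_distance(u, v) ** -r for v in others]
        neighbors[u] = local + rng.choices(others, weights=weights, k=k)
    return neighbors

def greedy_route(neighbors, source, target):
    """Forward to the neighbor closest to the target; return the hop count."""
    current, hops = source, 0
    while current != target:
        # A grid neighbor is always one step closer, so this loop terminates.
        current = min(neighbors[current], key=lambda v: grid_distance(v, target))
        hops += 1
    return hops

net = build_kleinberg_grid(n=20, k=1, r=2.0)
print(greedy_route(net, (0, 0), (19, 19)))
```

Each vertex uses only the target's address and its own neighbors' addresses, matching the information assumptions above. Averaging hop counts over many source/target pairs for several values of r is the experiment to try, though a 20 by 20 grid is too small to show the asymptotic behavior clearly.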
Kleinberg’s Result
– if r is too large (strong local bias), then “long-distance” connections never help much;
short paths may not even exist (remember, grid has large diameter, ~ sqrt(N))
– if r is too small (no local bias), we may quickly get close to the target; but then we’ll
have to use grid links to finish
• think of a transport system with only long-haul jets or donkey carts
– effective search requires a delicate mixture of link distances
• The result (informally):
– r = 2 is the only value that permits rapid navigation (~log(N) steps)
– any other value of r will result in time ~ N^c for 0 < c <= 1
• N^c >> log(N) for large N
– a critical value phenomenon or “knife’s edge”; very sensitive
– contrast with 1/d^(1.59) from dollar bill migration paper
• Note: locality of information crucial to this argument
– centralized, “bird’s-eye” algorithm can still compute short paths at r < 2!
– can recognize when “backwards” steps are beneficial
• Later in the course: What happens when distance-d edges cost d^r?
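To get a feel for how large the gap between N^c and log(N) becomes, the short sketch below tabulates both for a few network sizes. The exponent c = 0.5 is an arbitrary illustrative choice, not the exponent Kleinberg's result gives for any particular r, and log here is the natural logarithm.

```python
import math

def growth_table(sizes, c):
    """Return (N, log N, N^c) rows for each network size N."""
    return [(N, math.log(N), N ** c) for N in sizes]

for N, log_n, poly_n in growth_table([10**2, 10**4, 10**6, 10**8], c=0.5):
    print(f"N={N:>9}  log(N)={log_n:5.1f}  N^c={poly_n:8.0f}")
```

Even for a modest exponent, the polynomial hop count outgrows the logarithmic one by orders of magnitude once N reaches the millions.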
Navigation via Identity
• Watts et al.:
– we don’t navigate social networks by purely “geographic” information
– we don’t use any single criterion; recall Dodds et al. on Columbia SW
– different criteria used at different points in the chain
• Represent individuals by a vector of attributes
– profession, religion, hobbies, education, background, etc…
– attribute values have distances between them (tree-structured)
– distance between individuals: minimum distance in any attribute
– only need one thing in common to be close!
• Algorithm:
– given attribute vector of target
– forward message to neighbor closest to target
• Let’s look a bit at the paper
• Permits fast navigation under broad conditions
– not as sensitive as Kleinberg’s model
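The identity-based distance and forwarding rule can be sketched as below. Several details are assumptions made for illustration: each attribute value is written as a root-to-leaf path in its tree, all paths for an attribute have the same length, tree distance is the number of levels up to the lowest common ancestor, and the people and categories are invented.

```python
def tree_distance(path_a, path_b):
    """Levels to climb from a leaf to the lowest common ancestor of two leaves."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return len(path_a) - common

def social_distance(person_a, person_b):
    """Minimum tree distance over all attributes: one close attribute is enough."""
    return min(tree_distance(person_a[attr], person_b[attr]) for attr in person_a)

def next_hop(neighbors, target):
    """Name of the neighbor whose attribute vector is closest to the target's."""
    return min(neighbors, key=lambda name: social_distance(neighbors[name], target))

bob = {"job": ("all jobs", "arts", "painting"),
       "hobby": ("all hobbies", "sports", "tennis")}
carol = {"job": ("all jobs", "science", "biology"),
         "hobby": ("all hobbies", "music", "piano")}
target = {"job": ("all jobs", "science", "physics"),
          "hobby": ("all hobbies", "music", "guitar")}

print(social_distance(bob, target), social_distance(carol, target))
print(next_hop({"bob": bob, "carol": carol}, target))
```

Here carol shares a branch with the target in both trees, so the message goes to her; bob shares only the roots.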