Plotting Power-laws and the Degree Exponent, Xiaoyan Lu

advertisement
red
red
red
red
red
red
red
red
red
red
red
red
red
red
red
red
red
Plotting Power-laws and the Degree Exponent
Presenter: Xiaoyan Lu
RPI
2015
red
red
red
Overview
Plotting Power-Laws
Estimating the Degree Exponent
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
2 / 18
Plotting Degree Distribution
From Nk , the number of nodes with degree k, we calculate
pk =
Nk
N
Using linear k-axis to plot the distribution?
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
3 / 18
Linear-Linear Scale
A degree distribution of the form pk ∼ (k + k0 )−γ , with k0 = 10
and γ = 2.5
It is impossible to see the distribution on a lin-lin scale.
⇒ log-log plot for scale-free networks.
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
4 / 18
Log-Log Scale, Linear Binning
In the high-k region, we either have Nk = 0 (no node with degree
k) or Nk = 1 (a single node with degree k).
linear binning, each bin has the same size ∆k = 1.
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
5 / 18
Log-Log Scale, Log Binning
n-th bin contains all nodes with degrees from 2n−1 to 2n − 1.
p<kn > =
Nn
bn
Nn is number of
nodes in bin n
< kn > is average
degree of nodes in
bin n
With log-binning the plateau disappears. Show linear binning as
light grey the data.
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
6 / 18
Log-Log Scale, Cumulative Binning
Plot the complementary cumulative distribution Pk =
If pk ∼ k −γ , then
Pk ∼ k −γ+1
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
P∞
q=k+1 pq
2015
7 / 18
Estimating the Degree Exponent
Power-law distribution obeys
ln pk = −γ ln k + constant
We need to determine the value of γ.
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
8 / 18
Estimating the Degree Exponent
Power-law distribution obeys
ln pk = −γ ln k + constant
We need to determine the value of γ.
However,
Low-degree saturation:
fewer small degree nodes
than expected
High-degree cutoff: fewer
high-degree nodes than
expected
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
8 / 18
Maximum Likelihood Estimate
Assume data follows a power law exactly for k ≥ Kmin
MLE maximizes the probability generating observed data {ki }
with the provided model:
max
n
Y
i=1
Presenter: Xiaoyan Lu (RPI)
p(ki |γ) = max
n
Y
Cpkγ
i
i=1
Plotting Power-laws and the Degree Exponent
2015
9 / 18
Maximum Likelihood Estimate
Assume data follows a power law exactly for k ≥ Kmin
MLE maximizes the probability generating observed data {ki }
with the provided model:
max
n
Y
p(ki |γ) = max
i=1
n
Y
Cpkγ
i
i=1
The maximum likelihood estimate for the scaling parameter:
γ =1+N
N
X
i=1
ln
ki
Kmin −
1
2
Kmin is the minimum value at which power-law behavior holds.
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
9 / 18
Maximum Likelihood Estimate
Assume data follows a power law exactly for k ≥ Kmin
MLE maximizes the probability generating observed data {ki }
with the provided model:
max
n
Y
p(ki |γ) = max
i=1
n
Y
Cpkγ
i
i=1
The maximum likelihood estimate for the scaling parameter:
γ =1+N
N
X
i=1
ln
ki
Kmin −
1
2
Kmin is the minimum value at which power-law behavior holds.
How to find best Kmin ?
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
9 / 18
Kolmogorov-Smirnov Test (KS statistic)
Quantifying the distance between two probability distributions:
D = maxk≥Kmin |S(k) − Pk |
S(k): Cumulative Distribution Function (CDF) of the data
Pk : CDF provided by the fitted model
Pk = 1 −
∞
X
Cx −γ
x=k
Scanning the whole Kmin range from kmin to kmax to identify the
Kmin value for mininal D.
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
10 / 18
Citation Network
2,353,984 citations among 384,362 research papers published in
journals published by the American Physical Society
γ = 2.79
Kmin = 49
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
11 / 18
Goodness-of-Fit Test
Is power law itself is a good model for the studied distribution?
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
12 / 18
Goodness-of-Fit Test
Is power law itself is a good model for the studied distribution?
Hypothesis : power law is the right model
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
12 / 18
p-value
1. Obtain Dreal , KS distance between the real data and the
best fit
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
13 / 18
p-value
1. Obtain Dreal , KS distance between the real data and the
best fit
2. Generate a sequence of degrees according to obtained
power law, and obtain Dsynthetic for this hypothetical degree
sequence
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
13 / 18
p-value
1. Obtain Dreal , KS distance between the real data and the
best fit
2. Generate a sequence of degrees according to obtained
power law, and obtain Dsynthetic for this hypothetical degree
sequence
3. Repeat step2 M times (M >> 1) and obtain the
distribution of Dsynthetic
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
13 / 18
p-value
1. Obtain Dreal , KS distance between the real data and the
best fit
2. Generate a sequence of degrees according to obtained
power law, and obtain Dsynthetic for this hypothetical degree
sequence
3. Repeat step2 M times (M >> 1) and obtain the
distribution of Dsynthetic
4. p-value is
Z
∞
p=
p(Dsynthetic )dDsynthetic
Dreal
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
13 / 18
p-value
1. Obtain Dreal , KS distance between the real data and the
best fit
2. Generate a sequence of degrees according to obtained
power law, and obtain Dsynthetic for this hypothetical degree
sequence
3. Repeat step2 M times (M >> 1) and obtain the
distribution of Dsynthetic
4. p-value is
Z
∞
p=
p(Dsynthetic )dDsynthetic
Dreal
If p is very small, the model is not a plausible fit to the data.
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
13 / 18
Goodness-of-Fit in Citation Network
For the citation network, p < 10−4
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
14 / 18
Goodness-of-Fit in Citation Network
For the citation network, p < 10−4
Choosing Kmin = 49 forces us to discard over 96% of the data
points
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
14 / 18
Not a Pure Power Law
Use a function that offers a better fit
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
15 / 18
Not a Pure Power Law
Use a function that offers a better fit
Degree distribution of many real networks, like the citation
network, does not follow a pure power law
pk = P∞
k
0
1
0
(k + ksat )−γ e−k
=1
0
/kcut
(k + ksat )−γ e−k/kcut
ksat low-k saturation
kcut large-k cutoff
Procedure:
1. For fixed (ksat , kcut ), estimate γ using MLE
2. Identify (ksat , kcut ) values for which KS distance is minimal.
3. Obtain p-value using the Goodness-of-Fit test
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
15 / 18
Optimal Fit of Citation Network
ksat = 12, kcut = 5, 691, γ = 3.028, p-value = 69%
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
16 / 18
Systematic Fitting Issues
1. A pure power law is an idealized distribution. In reality, if
pk does not follow a pure power law, methods described
above will inevitably fail to detect statistical significance
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
17 / 18
Systematic Fitting Issues
1. A pure power law is an idealized distribution. In reality, if
pk does not follow a pure power law, methods described
above will inevitably fail to detect statistical significance
2. Kolmogorov-Smirnov criteria: a single point deviating from
the curve will affect the fit’s statistical significance
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
17 / 18
Systematic Fitting Issues
1. A pure power law is an idealized distribution. In reality, if
pk does not follow a pure power law, methods described
above will inevitably fail to detect statistical significance
2. Kolmogorov-Smirnov criteria: a single point deviating from
the curve will affect the fit’s statistical significance
3. Remove a huge fraction of the nodes to obtain a
statistically significant fit ?
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
17 / 18
Thanks
Thanks
Presenter: Xiaoyan Lu (RPI)
Plotting Power-laws and the Degree Exponent
2015
18 / 18
Download