Plotting Power-laws and the Degree Exponent
Presenter: Xiaoyan Lu (RPI), 2015

Overview
Plotting Power-Laws
Estimating the Degree Exponent

Plotting Degree Distribution
From $N_k$, the number of nodes with degree $k$, we calculate $p_k = N_k / N$.
Should we use a linear $k$-axis to plot the distribution?

Linear-Linear Scale
A degree distribution of the form $p_k \sim (k + k_0)^{-\gamma}$, with $k_0 = 10$ and $\gamma = 2.5$.
It is impossible to see the distribution on a lin-lin scale.
$\Rightarrow$ Use a log-log plot for scale-free networks.

Log-Log Scale, Linear Binning
With linear binning, each bin has the same size $\Delta k = 1$.
In the high-$k$ region, we either have $N_k = 0$ (no node with degree $k$) or $N_k = 1$ (a single node with degree $k$).

Log-Log Scale, Log Binning
The $n$-th bin contains all nodes with degrees from $2^{n-1}$ to $2^n - 1$, so its size is $b_n = 2^{n-1}$.
$p_{\langle k_n \rangle} = N_n / b_n$
$N_n$ is the number of nodes in bin $n$.
$\langle k_n \rangle$ is the average degree of the nodes in bin $n$.
With log binning the plateau disappears. The linearly binned data are shown in light grey.

Log-Log Scale, Cumulative Binning
Plot the complementary cumulative distribution
$P_k = \sum_{q=k+1}^{\infty} p_q$
If $p_k \sim k^{-\gamma}$, then $P_k \sim k^{-\gamma+1}$.

Estimating the Degree Exponent
A power-law distribution obeys $\ln p_k = -\gamma \ln k + \text{constant}$.
We need to determine the value of $\gamma$.
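The plotting recipes above, and the relation $\ln p_k = -\gamma \ln k + \text{constant}$, can be tried on a degree list with a minimal sketch like the one below. It assumes degrees are plain positive integers. The function names (`log_binned_distribution`, `complementary_cumulative`, `loglog_slope`) are made up for illustration. Dividing by the total node count $N$ and fitting a least-squares line are assumptions added here for a first look, not the estimator used in this talk.

```python
import math
from collections import Counter


def log_binned_distribution(degrees):
    """Return (average degree, density) pairs, one per non-empty logarithmic bin.

    Bin n (n = 1, 2, ...) holds the degrees 2**(n-1) .. 2**n - 1, so its size
    is b_n = 2**(n-1). The density is N_n / (b_n * N); the division by the
    total node count N is an added normalisation (assumption).
    """
    total = len(degrees)
    bins = {}
    for k in degrees:
        if k < 1:
            continue  # degree-0 nodes cannot be placed on a logarithmic axis
        n = k.bit_length()  # satisfies 2**(n-1) <= k <= 2**n - 1
        bins.setdefault(n, []).append(k)
    points = []
    for n in sorted(bins):
        members = bins[n]
        size = 2 ** (n - 1)
        points.append((sum(members) / len(members), len(members) / (size * total)))
    return points


def complementary_cumulative(degrees):
    """Return {k: P_k}, where P_k is the fraction of nodes with degree above k."""
    total = len(degrees)
    counts = Counter(degrees)
    above = total
    result = {}
    for k in sorted(counts):
        above -= counts[k]  # nodes with degree strictly greater than k
        result[k] = above / total
    return result


def loglog_slope(points):
    """Least-squares slope of ln(y) against ln(x), using points with x, y > 0."""
    logs = [(math.log(x), math.log(y)) for x, y in points if x > 0 and y > 0]
    mean_x = sum(x for x, _ in logs) / len(logs)
    mean_y = sum(y for _, y in logs) / len(logs)
    sxx = sum((x - mean_x) ** 2 for x, _ in logs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in logs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical degree list whose counts fall off roughly as k**-2.5.
    example = [k for k in range(1, 64) for _ in range(max(1, round(5000 * k ** -2.5)))]
    print(loglog_slope(log_binned_distribution(example)))
```

On an ideal power law the slope of the fitted line would equal $-\gamma$ for the log-binned $p_k$ and $-\gamma + 1$ for the cumulative $P_k$.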
However, two effects distort this relation in real data:
Low-degree saturation: fewer small-degree nodes than expected
High-degree cutoff: fewer high-degree nodes than expected

Maximum Likelihood Estimate
Assume the data follow a power law exactly for $k \ge K_{min}$.
MLE maximizes the probability of generating the observed data $\{k_i\}$ with the provided model:
$\max_{\gamma} \prod_{i=1}^{N} p(k_i \mid \gamma) = \max_{\gamma} \prod_{i=1}^{N} C k_i^{-\gamma}$
The maximum likelihood estimate for the scaling parameter:
$\gamma = 1 + N \left[ \sum_{i=1}^{N} \ln \frac{k_i}{K_{min} - \frac{1}{2}} \right]^{-1}$
$K_{min}$ is the minimum value at which power-law behavior holds; the sum runs over the $N$ data points with $k_i \ge K_{min}$.
How do we find the best $K_{min}$?

Kolmogorov-Smirnov Test (KS statistic)
Quantifying the distance between two probability distributions:
$D = \max_{k \ge K_{min}} |S(k) - P_k|$
$S(k)$: cumulative distribution function (CDF) of the data
$P_k$: CDF provided by the fitted model,
$P_k = 1 - \sum_{x=k}^{\infty} C x^{-\gamma}$
Scan the whole $K_{min}$ range from $k_{min}$ to $k_{max}$ to identify the $K_{min}$ value for which $D$ is minimal.
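The MLE formula and the KS scan over $K_{min}$ can be combined as in the rough sketch below. Several things are assumptions of this sketch and not from the slides: the normalisation $C = 1/\sum_{x \ge K_{min}} x^{-\gamma}$ is evaluated with a hand-written approximation (`hurwitz_zeta`, a hypothetical helper), only observed degree values are tried as $K_{min}$, and the scan stops once fewer than `min_tail` points remain. Both $S(k)$ and $P_k$ are read as the probability of a degree below $k$, matching the definition of $P_k$ above.

```python
import math
from collections import Counter


def hurwitz_zeta(gamma, q, terms=1000):
    """Approximate sum_{x=q}^inf x**(-gamma) for gamma > 1.

    The first `terms` terms are summed directly; the remainder is replaced by
    the leading Euler-Maclaurin terms (integral plus half the first omitted term).
    """
    head = sum((q + j) ** -gamma for j in range(terms))
    a = q + terms
    return head + a ** (1.0 - gamma) / (gamma - 1.0) + 0.5 * a ** -gamma


def mle_gamma(degrees, kmin):
    """gamma = 1 + N / sum(ln(k_i / (kmin - 1/2))) over the N degrees >= kmin."""
    tail = [k for k in degrees if k >= kmin]
    return 1.0 + len(tail) / sum(math.log(k / (kmin - 0.5)) for k in tail)


def ks_distance(degrees, kmin, gamma):
    """D = max over k >= kmin of |S(k) - P_k|, both read as P(degree < k)."""
    tail = [k for k in degrees if k >= kmin]
    counts = Counter(tail)
    norm = hurwitz_zeta(gamma, kmin)  # equals 1/C for the tail k >= kmin

    def model_cdf(k):
        return 1.0 - hurwitz_zeta(gamma, k) / norm

    below = 0  # number of tail points with degree < k
    distance = 0.0
    for k in sorted(counts):
        # S(k) is flat between observed degrees, so the largest gap is found
        # just before or just after one of its jumps.
        distance = max(distance, abs(below / len(tail) - model_cdf(k)))
        below += counts[k]
        distance = max(distance, abs(below / len(tail) - model_cdf(k + 1)))
    return distance


def best_kmin(degrees, min_tail=10):
    """Return (D, Kmin, gamma) for the candidate Kmin with the smallest D."""
    best = None
    for kmin in sorted(set(k for k in degrees if k >= 1)):
        if sum(1 for k in degrees if k >= kmin) < min_tail:
            break  # too few points left for a meaningful fit (assumed threshold)
        gamma = mle_gamma(degrees, kmin)
        distance = ks_distance(degrees, kmin, gamma)
        if best is None or distance < best[0]:
            best = (distance, kmin, gamma)
    return best
```

Calling `best_kmin(degrees)` on a list of integer degrees returns the smallest KS distance found together with the corresponding $K_{min}$ and $\gamma$.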

Citation Network
2,353,984 citations among 384,362 research papers published in journals of the American Physical Society.
$\gamma = 2.79$, $K_{min} = 49$

Goodness-of-Fit Test
Is the power law itself a good model for the studied distribution?
Hypothesis: the power law is the right model.

p-value
1. Obtain $D_{real}$, the KS distance between the real data and the best fit.
2. Generate a sequence of degrees according to the obtained power law, and obtain $D_{synthetic}$ for this hypothetical degree sequence.
3. Repeat step 2 $M$ times ($M \gg 1$) and obtain the distribution of $D_{synthetic}$.
4. The p-value is
$p = \int_{D_{real}}^{\infty} p(D_{synthetic}) \, dD_{synthetic}$
If $p$ is very small, the model is not a plausible fit to the data.

Goodness-of-Fit in Citation Network
For the citation network, $p < 10^{-4}$.
Choosing $K_{min} = 49$ forces us to discard over 96% of the data points.

Not a Pure Power Law
The degree distribution of many real networks, like the citation network, does not follow a pure power law.
Use a function that offers a better fit:
$p_k = \frac{1}{\sum_{k'=1}^{\infty} (k' + k_{sat})^{-\gamma} e^{-k'/k_{cut}}} (k + k_{sat})^{-\gamma} e^{-k/k_{cut}}$
$k_{sat}$: low-$k$ saturation
$k_{cut}$: large-$k$ cutoff
Procedure:
1. For fixed $(k_{sat}, k_{cut})$, estimate $\gamma$ using MLE.
2. Identify the $(k_{sat}, k_{cut})$ values for which the KS distance is minimal.
3. Obtain the p-value using the goodness-of-fit test.

Optimal Fit of Citation Network
$k_{sat} = 12$, $k_{cut} = 5{,}691$, $\gamma = 3.028$, p-value = 69%

Systematic Fitting Issues
1. A pure power law is an idealized distribution.
In reality, if $p_k$ does not follow a pure power law, the methods described above will inevitably fail to detect statistical significance.
2. Kolmogorov-Smirnov criterion: a single point deviating from the curve will affect the fit's statistical significance.
3. Do we have to remove a huge fraction of the nodes to obtain a statistically significant fit?

Thanks
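As a supplementary sketch, the four-step p-value procedure from the goodness-of-fit part of the talk might be coded as follows. All function names are hypothetical. Three simplifications are assumptions of this sketch and not part of the slides: synthetic degrees come from a continuous power law rounded to integers, the model CDF is the one matching that sampler, and $K_{min}$ is held fixed while $\gamma$ is re-estimated for every synthetic sequence.

```python
import math
import random
from collections import Counter


def mle_gamma(tail, kmin):
    """Approximate MLE: 1 + N / sum(ln(k_i / (kmin - 1/2))) over the tail."""
    return 1.0 + len(tail) / sum(math.log(k / (kmin - 0.5)) for k in tail)


def model_cdf_below(k, kmin, gamma):
    """P(degree < k) for a continuous power law rounded to integers (approximation)."""
    return 1.0 - ((k - 0.5) / (kmin - 0.5)) ** (1.0 - gamma)


def ks_distance(tail, kmin, gamma):
    """Largest gap between the empirical and model CDF, both as P(degree < k)."""
    counts = Counter(tail)
    below = 0
    distance = 0.0
    for k in sorted(counts):
        # The empirical CDF only jumps at observed degrees, so check both sides of each jump.
        distance = max(distance, abs(below / len(tail) - model_cdf_below(k, kmin, gamma)))
        below += counts[k]
        distance = max(distance, abs(below / len(tail) - model_cdf_below(k + 1, kmin, gamma)))
    return distance


def sample_degree(kmin, gamma, rng):
    """Draw one degree >= kmin by rounding a continuous power-law variate."""
    u = rng.random()  # uniform in [0, 1)
    return int((kmin - 0.5) * (1.0 - u) ** (-1.0 / (gamma - 1.0)) + 0.5)


def goodness_of_fit_p_value(degrees, kmin, repetitions=200, seed=0):
    """Fraction of synthetic sequences whose KS distance is at least D_real."""
    rng = random.Random(seed)
    tail = [k for k in degrees if k >= kmin]
    gamma = mle_gamma(tail, kmin)
    d_real = ks_distance(tail, kmin, gamma)  # step 1
    exceed = 0
    for _ in range(repetitions):  # step 3: repeat M times
        synthetic = [sample_degree(kmin, gamma, rng) for _ in tail]  # step 2
        d_synthetic = ks_distance(synthetic, kmin, mle_gamma(synthetic, kmin))
        if d_synthetic >= d_real:
            exceed += 1
    return exceed / repetitions  # step 4: empirical tail probability
```

The returned fraction is a Monte Carlo estimate of the integral in step 4, so its resolution is limited to $1/M$ for `repetitions = M`.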