Power Laws

By Cameron Megaw
3/11/2013
What is a Power Law?
A power law is a distribution of the form:
$p(x) = C x^{-\alpha}$, or equivalently $\ln p(x) = -\alpha \ln x + c$, where $c = \ln C$
Example: The size of cities in the US (population 1000 or more)
• Highly right skewed
• The largest city has 8 million people
• Most cities have far fewer people
Measuring Power Laws
Sampling Errors
• 1 million random numbers from a power law distribution
• Exponent 𝛼 = 2.5
• Data are binned in intervals of width 0.1
• On linear scales the histogram looks like a smooth curve
• On log-log scales the tail of the histogram is noisy
• The noise is a result of sampling errors
• Bins in the tail contain few samples (if any)
• Fractional fluctuations in those bin counts are therefore large
Measuring Power Laws
Sampling errors
Solution 1: Throw out the data in the tail of the curve
• Statistically significant information lost
• Some distributions only follow a power law distribution in their tail
• Not recommended
Measuring Power Laws
Sampling errors
Solution 2: Vary the width of the bins
• Normalize the data: divide each bin count by the bin width
• Results in a count per unit interval of x
• Grow the bin width by a fixed multiplier (for example 2)
• Bins become: 1 to 1.1, 1.1 to 1.3, 1.3 to 1.7 and so on
• Called logarithmic binning
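Logarithmic binning as described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the slides: the helper names `sample_power_law` and `log_binned_density`, the sample size, and the seed are all assumptions made for the example.

```python
import bisect
import random

def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n values from p(x) = C x^(-alpha), x >= x_min, by inverse transform."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power below is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def log_binned_density(samples, x_min=1.0, first_width=0.1, multiplier=2.0):
    """Return (left, right, density) per bin; bin widths grow by `multiplier`."""
    x_max = max(samples)
    edges = [x_min]
    width = first_width
    while edges[-1] <= x_max:          # keep adding bins until the last edge exceeds x_max
        edges.append(edges[-1] + width)
        width *= multiplier
    counts = [0] * (len(edges) - 1)
    for x in samples:
        # index i such that edges[i] <= x < edges[i + 1]
        counts[bisect.bisect_right(edges, x) - 1] += 1
    n = len(samples)
    # Normalize: count per sample per unit interval of x.
    return [(edges[i], edges[i + 1], counts[i] / (n * (edges[i + 1] - edges[i])))
            for i in range(len(counts))]

samples = sample_power_law(20000, alpha=2.5)   # hypothetical sample size
bins = log_binned_density(samples)
```

Plotting the density against the bin position on log-log axes should give a much less noisy tail than fixed-width bins, because the wide bins in the tail collect more samples.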
Measuring Power Laws
Sampling errors
Solution 3: Calculate the cumulative distribution function (aka Zipf’s Law or a Pareto distribution)
$P(x) = \int_x^{\infty} C x'^{-\alpha}\, dx' = \frac{C}{\alpha - 1}\, x^{1 - \alpha}$
• No need to bin the data
• Information on individual values is preserved
• Eliminates the noise in the tail
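One way to build this cumulative plot without any binning is to sort the data and use each value's rank. The sketch below assumes continuous data with no ties; the function name `empirical_ccdf` is made up for the example.

```python
def empirical_ccdf(samples):
    """Return (x, P(x)) pairs, where P(x) is the fraction of samples >= x."""
    xs = sorted(samples, reverse=True)
    n = len(xs)
    # The k-th largest value (k = 1..n) has k samples greater than or equal to it.
    return [(x, k / n) for k, x in enumerate(xs, start=1)]

points = empirical_ccdf([3.0, 1.0, 2.0, 4.0])
```

For power-law data the points should fall on a straight line of slope $1 - \alpha$ on log-log axes, one point per observation.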
Measuring Power Laws
Unknown exponent
1. Method of least squares:
• Most common method
• Fits a straight line to the data on log-log scales
• Introduces systematic biases in the value of the exponent
• Estimated 𝛼 = 2.26 ± .02 (actual 2.5)
2. Use maximum likelihood formula
• A non-biased estimator
• Calculate an error estimate
• standard bootstrap resampling
• jackknife resampling
• Estimated 𝛼 = 2.500 ± .002
$\alpha = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1}$
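The maximum likelihood formula is short enough to try directly. The sketch below is an illustration under stated assumptions: the helper names, sample size, and seed are invented, and the error estimate uses the large-n approximation $\sigma \approx (\alpha - 1)/\sqrt{n}$ in place of the bootstrap or jackknife resampling mentioned in the slides.

```python
import math
import random

def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n values from p(x) = C x^(-alpha), x >= x_min, by inverse transform."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def mle_exponent(samples, x_min):
    """Estimate alpha as 1 + n / sum(ln(x_i / x_min)) over samples >= x_min."""
    tail = [x for x in samples if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha_hat = 1.0 + n / log_sum
    # Assumption: large-n approximation of the statistical error, not a resampling estimate.
    sigma = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, sigma

data = sample_power_law(100000, alpha=2.5, seed=1)   # hypothetical sample
alpha_hat, sigma = mle_exponent(data, x_min=1.0)
```

With a sample of this size the estimate should land close to the true exponent of 2.5, and the error shrinks as $1/\sqrt{n}$ for larger samples.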
Mathematics of Power Laws
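Calculating C
$1 = \int_{x_{\min}}^{\infty} p(x)\, dx = C \int_{x_{\min}}^{\infty} x^{-\alpha}\, dx = \frac{C}{1 - \alpha} \left[ x^{1 - \alpha} \right]_{x_{\min}}^{\infty}$
For $\alpha > 1$ the upper limit vanishes, which gives:
$C = (\alpha - 1)\, x_{\min}^{\alpha - 1}$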
Mathematics of Power Laws
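Moments
$\langle x^m \rangle = \int_{x_{\min}}^{\infty} x^m p(x)\, dx = \frac{\alpha - 1}{\alpha - 1 - m}\, x_{\min}^{m}$
• All moments $\langle x^m \rangle$ exist for $m < \alpha - 1$ and diverge otherwise:
• Mean: $\langle x \rangle = \frac{\alpha - 1}{\alpha - 2}\, x_{\min}$ (finite only for $\alpha > 2$)
• Second moment, which sets the variance: $\langle x^2 \rangle = \frac{\alpha - 1}{\alpha - 3}\, x_{\min}^{2}$ (finite only for $\alpha > 3$)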
• Solar flare intensities have an exponent of 1.4: is the average intensity infinite?
• Every real data set has a finite upper bound, so its sample mean is finite
• A larger sample has a non-negligible chance of raising that upper bound, so the sample mean does not settle down
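The closed form for the moments of a pure power law, $\langle x^m \rangle = \frac{\alpha - 1}{\alpha - 1 - m} x_{\min}^m$ for $m < \alpha - 1$, can be sketched as a small function. The name `power_law_moment` and the convention of returning infinity for divergent moments are choices made for this example.

```python
import math

def power_law_moment(m, alpha, x_min=1.0):
    """m-th moment of p(x) = C x^(-alpha) on [x_min, inf); inf if it diverges."""
    if m >= alpha - 1.0:
        return math.inf
    return (alpha - 1.0) / (alpha - 1.0 - m) * x_min ** m

mean_alpha_2_5 = power_law_moment(1, 2.5)      # finite, since alpha > 2
second_alpha_2_5 = power_law_moment(2, 2.5)    # diverges, since alpha <= 3
mean_solar_flares = power_law_moment(1, 1.4)   # diverges, since alpha <= 2
```

For the solar flare exponent quoted above, the mean of the idealized distribution diverges; any finite data set still has a finite sample mean.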
Mathematics of Power Laws
Largest Value
For a sample of size n we can estimate the largest value in the sample:
$\langle x_{\max} \rangle = n\, x_{\min}\, B\!\left(n, \frac{\alpha - 2}{\alpha - 1}\right) \sim n^{1/(\alpha - 1)}$ as $n \to \infty$
where B is the beta function
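The beta-function estimate can be evaluated numerically with log-gamma functions, which avoids overflow for large n. This is a sketch assuming $\alpha > 2$, so that the second argument of B is positive; the function name is hypothetical.

```python
import math

def expected_largest_value(n, alpha, x_min=1.0):
    """Evaluate n * x_min * B(n, (alpha - 2) / (alpha - 1)) via log-gamma."""
    if alpha <= 2.0:
        raise ValueError("this sketch assumes alpha > 2")
    b = (alpha - 2.0) / (alpha - 1.0)
    # B(n, b) = Gamma(n) Gamma(b) / Gamma(n + b), computed in log space.
    log_beta = math.lgamma(n) + math.lgamma(b) - math.lgamma(n + b)
    return n * x_min * math.exp(log_beta)

single = expected_largest_value(1, 2.5)        # one sample: reduces to the mean
large = expected_largest_value(10**6, 2.5)     # grows like n^(1/(alpha-1))
```

For n = 1 the estimate reduces to the mean $\frac{\alpha - 1}{\alpha - 2} x_{\min}$, which is a quick sanity check on the formula.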
This estimate enables the calculation of moments for data sets whose moments would otherwise diverge:
$\langle x^m \rangle = \int_{x_{\min}}^{x_{\max}} x^m p(x)\, dx$
Mathematics of Power Laws
Scale Free Distribution
• A distribution is said to be scale free if, for any b:
$p(bx) = g(b)\, p(x)$
• The unit of measure does not affect the shape of the distribution
• If 2 kB files are 1/4 as common as 1 kB files, then 2 MB files are 1/4 as common as 1 MB files
• Scale free distribution is unique to Power Law distributions
• Scale free implies power law and vice versa
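The scale-free property is easy to check numerically: for a power law the ratio $p(bx)/p(x)$ equals $b^{-\alpha}$ whatever x is. The sketch below assumes an exponent of 2, which reproduces the "1/4 as common" file-size example; the function name and the test points are illustrative.

```python
def power_law_pdf(x, alpha, x_min=1.0):
    """p(x) = C x^(-alpha) with C = (alpha - 1) * x_min^(alpha - 1)."""
    c = (alpha - 1.0) * x_min ** (alpha - 1.0)
    return c * x ** (-alpha)

alpha = 2.0   # assumption: chosen so that doubling x makes a value 1/4 as common
ratios = [power_law_pdf(2.0 * x, alpha) / power_law_pdf(x, alpha)
          for x in (1.0, 10.0, 1000.0, 1e6)]
```

Every entry of `ratios` is $2^{-\alpha} = 0.25$ up to rounding, independent of the scale at which it is measured.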
Mechanisms for Generating Power Laws
Some examples:
• Combinations of exponentials
• Inverses of quantities
• Random walks
• The Yule process
• Critical phenomena
The Topology of the Internet
Some Key Questions
What does the internet look like?
Are there any topological properties that stay constant in time?
How can I generate Internet-like graphs for simulation?
Internet Instances
• Three Inter-domain topologies
• November 1997, April and December 1998
• One Router topology from 1995
Metrics
Outdegree of a Node and its Rank
Power Law 1: The outdegree $d_v$ of a node v is proportional to the rank of the node, $r_v$, to the power of a constant R:
$d_v = C\, r_v^{R}$
By setting $d_N = 1$ it can be shown that $C = \frac{1}{N^{R}}$
Outdegree of a Node and its Rank
Inter domain topologies
• Correlation coefficient above .974
• Exponents -.81, -.82, -.74
Router
• Correlation coefficient .948
• Exponent -.48
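Exponents and correlation coefficients like these are typically obtained from a straight-line fit on log-log axes. The slides do not say how the fit was done, so the least-squares sketch below is an assumption, and the rank and outdegree data are synthetic.

```python
import math

def loglog_fit(xs, ys):
    """Least-squares line through (log10 x, log10 y): slope, intercept, correlation."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)   # negative when the slope is negative
    return slope, intercept, r

# Synthetic data following d = 100 * rank^(-0.8) exactly (hypothetical values).
ranks = list(range(1, 51))
degrees = [100.0 * r ** -0.8 for r in ranks]
slope, intercept, r = loglog_fit(ranks, degrees)
```

The slope plays the role of the rank exponent R, and the magnitude of `r` corresponds to the correlation coefficients quoted in the slides.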
Outdegree and its Rank
Power Law Analysis
The exponent is relatively fixed for the three inter-domain topologies
• Topological property is fixed in time
• Can be used to generate models or test authenticity
Significant difference in exponent value for the router topology
• Can characterize different families of graphs
The rank exponent R can be used to estimate the number of edges:
$E = \frac{1}{2(R + 1)} \left( 1 - \frac{1}{N^{R + 1}} \right) N$
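The edge estimate is a one-line computation once N and R are known. In the sketch below the function name is made up and the node count of 3000 is a placeholder, not a value taken from the slides.

```python
def estimate_edges(num_nodes, rank_exponent):
    """E = N * (1 - 1 / N^(R + 1)) / (2 * (R + 1)); assumes R != -1."""
    r1 = rank_exponent + 1.0
    return num_nodes * (1.0 - num_nodes ** (-r1)) / (2.0 * r1)

# Hypothetical graph size combined with one of the inter-domain exponents above.
edges_guess = estimate_edges(3000, -0.81)
```

Because only N and R enter, the same function can be used to ask how the edge count would change if the number of nodes grew while the exponent stayed fixed.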
Frequency of the Outdegree
Power Law 2: The frequency, $f_d$, of an outdegree, d, is proportional to the outdegree to the power of a constant $\sigma$:
$f_d = C\, d^{\sigma}$
Frequency of the Outdegree
Inter domain topologies
• Correlation coefficient above .968
• Exponents -2.15, -2.16, and -2.2
Router
• Correlation coefficient .966
• Exponent -2.48
Frequency of the Outdegree
Power Law Analysis
The exponent is relatively fixed for the three inter-domain topologies
• Topological property is fixed in time
• Could be used to generate models or test authenticity
Similar exponent value for the router topology
• Could suggest a fundamental property of the network
Eigenvalues and their Ordering
Power Law 3: The eigenvalues, $\lambda_i$, of a graph are proportional to the order, i, to the power of a constant $\varepsilon$:
$\lambda_i = C\, i^{\varepsilon}$
Eigenvalues and their Ordering
Inter domain topologies
• Correlation coefficient .99
• Exponents -.47, -.50, and -.48
Router
• Correlation coefficient .99
• Exponent -.1777
Eigenvalues and their Ordering
Power Law analysis
Eigenvalues are closely related to many topological properties
• Graph diameter
• Number of edges
• Number of spanning trees…
The exponent is relatively fixed for the three inter-domain topologies
• Topological property seems fixed in time
• Can be used to generate models
Significant difference in the exponent value for the router topology
• Can characterize different families of graphs
Hop Plot Exponent
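Approximation 1: The total number of pairs of nodes, $P(h)$, within h hops can be approximated by:
$P(h) = \begin{cases} c\, h^{H}, & h \ll \delta \\ N^2, & h \geq \delta \end{cases}$
where $c = N + 2E$, H is the hop-plot exponent and $\delta$ is the diameter of the graph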
Hop Plot Exponent
Inter domain topologies
• First 4 hops
• Correlation coefficient above .96
• Exponents 4.6, 4.7, 4.86
Router
• First 12 hops
• Correlation coefficient .98
• Exponent 2.8
Hop Plot Exponent
Power Law analysis
• The exponent is relatively fixed for the three inter-domain topologies
• Topological property seems fixed in time
• Can be used to generate models
• Significant difference in the exponent value for the router topology
• Can characterize different families of graphs
The Effective Diameter
How many hops to reach a “sufficiently large” part of the
network?
• Too small a broadcast will not reach the target
• Too large a broadcast can clog the network
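• A good guess is the hop count at which the hop-plot power law $c\, h^{H}$ reaches $N^2$
The effective diameter
$\delta_{ef} = \left( \frac{N^2}{N + 2E} \right)^{1/H}$
For the interdomain instances
• 80% of node pairs were within $\delta_{ef}$ hops
• 90% were within $\lceil \delta_{ef} \rceil$ hops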
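The effective diameter $\delta_{ef} = \left( N^2 / (N + 2E) \right)^{1/H}$ can be computed directly from N, E and the hop-plot exponent H. The sketch below uses a hypothetical function name and made-up graph sizes; the complete-graph case is included as a sanity check, since every pair is then one hop apart.

```python
def effective_diameter(num_nodes, num_edges, hop_exponent):
    """delta_ef = (N^2 / (N + 2E))^(1 / H)."""
    return (num_nodes ** 2 / (num_nodes + 2 * num_edges)) ** (1.0 / hop_exponent)

# Complete graph on 10 nodes: E = 45, so N + 2E = N^2 and delta_ef is 1.
complete = effective_diameter(10, 45, 2.0)
# Hypothetical inter-domain-like graph (N and E are placeholders).
guess = effective_diameter(3000, 5000, 4.6)
```

Rounding `guess` up gives a hop limit that, by the figures quoted above, should cover most node pairs without flooding the whole network.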
Average Neighborhood Size
Estimate from the average outdegree $\bar{d}$:
$NN'(h) = \bar{d}\, (\bar{d} - 1)^{h - 1}$
Estimate from the hop-plot exponent:
$NN(h) = \frac{P(h)}{N} - 1 = \frac{c}{N}\, h^{H} - 1$
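The two neighborhood-size estimates, $\bar{d}(\bar{d} - 1)^{h-1}$ from the average outdegree and $\frac{c}{N} h^{H} - 1$ from the hop-plot exponent with $c = N + 2E$, can be compared side by side. The function names and the graph sizes below are assumptions for illustration, and the hop-plot form is only meant for h well below the diameter.

```python
def neighborhood_by_outdegree(avg_outdegree, hops):
    """NN'(h) = d * (d - 1)^(h - 1), using only the average outdegree d."""
    return avg_outdegree * (avg_outdegree - 1.0) ** (hops - 1)

def neighborhood_by_hop_plot(num_nodes, num_edges, hop_exponent, hops):
    """NN(h) = (c / N) * h^H - 1 with c = N + 2E."""
    c = num_nodes + 2 * num_edges
    return (c / num_nodes) * hops ** hop_exponent - 1.0

# Hypothetical graph: N = 100 nodes, E = 150 edges, so the average degree is 2E / N = 3.
one_hop_degree = neighborhood_by_outdegree(3.0, 1)
one_hop_plot = neighborhood_by_hop_plot(100, 150, 4.0, 1)
two_hop_degree = neighborhood_by_outdegree(3.0, 2)
two_hop_plot = neighborhood_by_hop_plot(100, 150, 4.0, 2)
```

Both estimates agree at one hop, where each reduces to the average degree, and diverge from each other as h grows.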
Conclusions
Power Law and Internet topology
• Can assess realism of synthetic graphs
• Provide important parameters for graph generators
• Help with network protocols
• Help answer “what if” questions
• What would the diameter be if the number of nodes doubles?
• What would the average neighborhood size be?
Questions?