Lecture 9: Moments of distributions

Body size distribution of European Collembola

Species                                                       Body weight [mg]   ln body weight
Tetrodontophora bielanensis (Waga 1842)                       13.471729          2.6006
Orchesella chiantica Frati & Szeptycki 1990                   13.471729          2.6006
Disparrhopalites tergestinus Fanciulli, Colla, Dallai 2005    12.924837          2.5592
Orchesella dallaii Frati & Szeptycki 1990                     9.4503028          2.246
Seira pini Jordana & Arbea 1989                               9.4503028          2.246
Isotomurus pentodon (Kos, 1937)                               7.1044808          1.9607
Heteromurus (V.) longicornis (Absolon 1900)                   7.1044808          1.9607
Pogonognathellus flavescens (Tullberg 1871)                   6.9512714          1.9389
Orchesella hoffmanni Stomp 1968                               6.9512714          1.9389
Heteromurus (H) constantinellus Lučić, Ćurčić & Mitić 2007    6.3862223          1.8541
Pogonognathellus longicornis (Müller 1776)                    6.2133935          1.8267
Orchesella devergens Handschin 1924                           6.2133935          1.8267
Orchesella flavescens (Bourlet 1839)                          6.2133935          1.8267
Orchesella quinquefasciata (Bourlet 1841)                     6.2133935          1.8267

The histogram of raw data

[Figure: histogram of the number of Collembola species (0 to 500) per ln body weight class; the modus (mode) is marked.]

ln body weight class mean   Number of species
-4.71511                    7
-4.018377                   53
-3.321643                   133
-2.624909                   224
-1.928176                   353
-1.231442                   395
-0.534708                   325
0.162025                    126
0.858759                    45
1.555493                    24
2.252226                    9

Three Collembolan weight classes

Values are ln body weights [mg]; only the distinct values listed on the slide are shown, repeats omitted.

Class   N    Mean        Listed values
1       25   1.8169079   2.6005933, 2.5591508, 2.2460468, 1.9607257, 1.9389246, 1.8541429, 1.8267072, 1.584378, 1.5326904, 1.5064044, 1.4529137
2       31   1.032923    1.313477, 1.301948, 1.225568, 1.165038, 1.006355, 0.939683, 0.871022, 0.835906, 0.800247, 0.764026, 0.756712, 0.727225
3       43   0.531059    0.651808, 0.613152, 0.573835, 0.533834, 0.493125, 0.489014, 0.451682, 0.409479

What is the average body weight?

Population mean: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Weighted mean of the three classes:
$\bar{x} = \frac{25}{99}\cdot 1.817 + \frac{31}{99}\cdot 1.033 + \frac{43}{99}\cdot 0.531 = 1.013$

In general, for k classes with n_i members each:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \sum_{i=1}^{k} x_i \frac{n_i}{n} = \sum_{i=1}^{k} x_i f(x_i)$

Discrete distributions

[Figure: relative frequency f(x_i) = n_i/n (0 to 0.25) of Collembola species per ln body weight class.]

Spreadsheet layout: Frequency =B2/B14, mean term =A2*C2.

ln body weight class mean   Number of species   Frequency f(x)   x f(x)
-4.72                       7                   0.004132231      -0.019483926
-4.02                       53                  0.031286895      -0.125722539
-3.32                       133                 0.078512397      -0.260790153
-2.62                       224                 0.132231405      -0.347095405
-1.93                       353                 0.208382527      -0.401798187
-1.23                       395                 0.233175915      -0.287142615
-0.53                       325                 0.191853601      -0.102585655
0.16                        126                 0.074380165      0.012051446
0.86                        45                  0.026564345      0.02281237
1.56                        24                  0.014167651      0.022037681
2.25                        9                   0.005312869      0.011965782
Sum                         1694                1                -1.475751 (arithmetic mean)

The average European springtail has a body weight of e^{-1.476} = 0.23 mg. The weight most often encountered is around e^{-1.23} = 0.29 mg.

Continuous distributions

$\mu = \int_{min}^{max} x f(x)\,dx$

Why did we use log transformed values?

Species                                                       Average body length [mm]   Body weight [mg]
Tetrodontophora bielanensis (Waga 1842)                       7                          13.472
Orchesella chiantica Frati & Szeptycki 1990                   7                          13.472
Disparrhopalites tergestinus Fanciulli, Colla, Dallai 2005    6.875                      12.925
Orchesella dallaii Frati & Szeptycki 1990                     6                          9.4503
Seira pini Jordana & Arbea 1989                               6                          9.4503
Isotomurus pentodon (Kos, 1937)                               5.3                        7.1045
Heteromurus (V.) longicornis (Absolon 1900)                   5.3                        7.1045
Pogonognathellus flavescens (Tullberg 1871)                   5.25                       6.9513
Orchesella hoffmanni Stomp 1968                               5.25                       6.9513
Heteromurus (H) constantinellus Lučić, Ćurčić & Mitić 2007    5.06                       6.3862
Pogonognathellus longicornis (Müller 1776)                    5                          6.2134
Orchesella devergens Handschin 1924                           5                          6.2134
Orchesella flavescens (Bourlet 1839)                          5                          6.2134
Orchesella quinquefasciata (Bourlet 1841)                     5                          6.2134

Body weights were calculated from body lengths with the spreadsheet formula =IF(B86=0;0;EXP(-1.875+LN(B86)*2.3)) (IF is JEŻELI in the Polish Excel locale):

$W\,[\mathrm{mg}] = e^{-1.875}\,[W/L]\; L\,[\mathrm{mm}]^{2.3}$

This is an allometric power function:
$W = W_0 L^z \;\Rightarrow\; \ln W = \ln W_0 + z \ln L$

[Figure: two histograms of the number of Collembola species (0 to 500). "Log transformed data": ln body weight class from -6 to 4. "Linear data": body weight class from 0 to 10.]

The distribution is skewed on the linear scale.

lb scaled weight classes

Body weight class mean [mg]   Number of species   Frequency f(x)   x f(x)      ln(x) f(x)
0.01                          7                   0.004132231      3.702E-05   -0.019483926
0.02                          53                  0.031286895      0.0005626   -0.125722539
0.04                          133                 0.078512397      0.0028338   -0.260790153
0.07                          224                 0.132231405      0.0095797   -0.347095405
0.15                          353                 0.208382527      0.0303016   -0.401798187
0.29                          395                 0.233175915      0.0680574   -0.287142615
0.59                          325                 0.191853601      0.1123956   -0.102585655
1.18                          126                 0.074380165      0.0874629   0.012051446
2.36                          45                  0.026564345      0.062698    0.02281237
4.74                          24                  0.014167651      0.0671181   0.022037681
9.51                          9                   0.005312869      0.0505194   0.011965782
Sum                           1694                                 0.491566    -1.4757512

Arithmetic mean = 0.491566 mg; geometric mean = Exp(-1.4757512) = 0.228606933 mg.

Geometric mean

$\bar{x}_g = \sqrt[n]{\prod_{i=1}^{n} x_i} = e^{\frac{1}{n}\sum_{i=1}^{n} \ln x_i}$

For exponentially scaled data, such as body weights that span several orders of magnitude, we have to use the geometric mean. To make things easier we first log-transform the data, take the arithmetic mean of the logarithms, and back-transform the result.

How to use geometric means

A tropical forest is logged during three years: in the first year 0.1%, in the second year 1%, and in the third year 10% of the area.
Hence about 11% of the forest area has been logged during the three years. What is the mean logging rate per year?

$A_3 = (1 - 0.001)(1 - 0.01)(1 - 0.1)\,A_0 = 0.890\,A_0$

Arithmetic mean: $\frac{0.999 + 0.99 + 0.9}{3} = 0.963$, which gives $A_3 = 0.963^3 A_0 = 0.893\,A_0$
Geometric mean: $(0.999 \cdot 0.99 \cdot 0.9)^{1/3} = 0.962$, which gives $A_3 = 0.962^3 A_0 = 0.890\,A_0$

Only the geometric mean reproduces the observed loss; it corresponds to a mean logging rate of about 3.8% per year. In multiplicative processes we should use the geometric mean.

Variance and standard deviation

Spreadsheet layout: Frequency =B2/B14, mean term =A2*C2, variance term =(A2-D14)^2*C2.

ln body weight class mean   Number of species   Frequency f(x)   x f(x)      (x - mean)^2 f(x)
-4.72                       7                   0.004132231      -0.019484   0.043361
-4.02                       53                  0.031286895      -0.125723   0.202268085
-3.32                       133                 0.078512397      -0.26079    0.267516588
-2.62                       224                 0.132231405      -0.347095   0.174619987
-1.93                       353                 0.208382527      -0.401798   0.042653444
-1.23                       395                 0.233175915      -0.287143   0.013917567
-0.53                       325                 0.191853601      -0.102586   0.169898317
0.16                        126                 0.074380165      0.0120514   0.199510727
0.86                        45                  0.026564345      0.0228124   0.144774029
1.56                        24                  0.014167651      0.0220377   0.130178627
2.25                        9                   0.005312869      0.0119658   0.073837264
Sum                         1694                1                -1.475751   1.462535979

Mean = -1.475751; variance = 1.462535979; standard deviation = 1.209353538.

[Figure: relative frequency f(x_i) = n_i/n (0 to 0.25) of Collembola species per ln body weight class.]

Mean: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$

Variance of a population: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$

Variance of a sample, with n - 1 degrees of freedom: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

Variance of a discrete frequency distribution: $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 f(x_i)$

Continuous distributions: $s^2 = \int_{min}^{max} (x - \bar{x})^2 f(x)\,dx$

Standard deviation: $SD = s = \sqrt{s^2}$

The standard deviation is a measure of the width of the statistical distribution that has the same dimension as the mean.
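The frequency-weighted mean, variance and standard deviation of the Collembola classes can be checked with a short script. This is a minimal sketch, not the spreadsheet used in the lecture: the function name `weighted_moments` and the variable names are illustrative assumptions, and the class means are the unrounded ln values from the histogram table.

```python
import math

# ln body-weight class means and species counts from the Collembola table
ln_weight = [-4.71511, -4.018377, -3.321643, -2.624909, -1.928176,
             -1.231442, -0.534708, 0.162025, 0.858759, 1.555493, 2.252226]
n_species = [7, 53, 133, 224, 353, 395, 325, 126, 45, 24, 9]


def weighted_moments(values, counts):
    """Return (mean, variance, standard deviation) of class values
    weighted by their relative frequencies f(x_i) = n_i / n."""
    n = sum(counts)
    freq = [c / n for c in counts]
    mean = sum(x * f for x, f in zip(values, freq))
    variance = sum((x - mean) ** 2 * f for x, f in zip(values, freq))
    return mean, variance, math.sqrt(variance)


mean_ln, var_ln, sd_ln = weighted_moments(ln_weight, n_species)

# Back-transforming the mean of the logarithms gives the geometric mean in mg
geometric_mean_mg = math.exp(mean_ln)

print(f"mean ln weight  = {mean_ln:.4f}")
print(f"variance        = {var_ln:.4f}")
print(f"std deviation   = {sd_ln:.4f}")
print(f"geometric mean  = {geometric_mean_mg:.3f} mg")
```

Because the mean is taken on the ln scale, the back-transformed value is a geometric mean, which is why it lies well below the arithmetic mean of the linear weights.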
The standard deviation as a measure of errors

Distance [km]   Average NOx concentration   Standard deviation
1               9.53                        1.70
2               7.37                        1.18
3               5.24                        0.86
4               3.15                        0.26
5               2.17                        0.18
6               1.05                        0.09
7               0.84                        0.14
8               0.63                        0.10
9               0.32                        0.03
10              0.21                        0.02

The precision of derived metrics should always match the precision of the raw data.

Environmental pollution

Station   NOx [ppm]
1         8.49
2         1.12
3         9.11
4         7.75
5         0.75
6         8.23
7         0.97
8         6.06
9         8.48
10        5.88
11        8.51
12        9.62
13        3.35
14        7.74
15        2.03
16        5.06
17        7.61
18        0.99
19        2.55
20        8.91

Mean = 5.66; variance = 10.45; standard deviation = 3.23.

± 1 standard deviation is the most often used estimator of error. For normally distributed errors, about 68% of the observations lie within ± 1 standard deviation of the mean and about 95% within ± 2 standard deviations.

[Figure: NOx concentration (0 to 14) against distance [km] (1 to 10), with error bars of ± 1 standard deviation.]

Standard deviation and standard error

Mean and standard deviation (values in slide order): 5.44 4.15 4.49 5.29 5.55 3.39 5.56 3.13

The standard deviation does not change systematically with sample size. The precision of the estimate of the mean should increase with sample size n. The standard error is a measure of this precision.
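The summary statistics of the 20 station readings can be reproduced as follows. This is a minimal sketch using Python's standard library; it assumes the usual definition of the standard error of the mean, SE = SD / sqrt(n), and the sample standard deviation with n - 1 degrees of freedom. The variable names are illustrative.

```python
import math
import statistics

# NOx concentrations [ppm] at the 20 stations listed above
nox_ppm = [8.49, 1.12, 9.11, 7.75, 0.75, 8.23, 0.97, 6.06, 8.48, 5.88,
           8.51, 9.62, 3.35, 7.74, 2.03, 5.06, 7.61, 0.99, 2.55, 8.91]

mean_nox = statistics.mean(nox_ppm)
var_nox = statistics.variance(nox_ppm)   # sample variance, divides by n - 1
sd_nox = statistics.stdev(nox_ppm)       # square root of the sample variance

# Standard error of the mean: shrinks with sqrt(n), unlike the SD
se_nox = sd_nox / math.sqrt(len(nox_ppm))

print(f"mean = {mean_nox:.2f}, variance = {var_nox:.2f}, "
      f"SD = {sd_nox:.2f}, SE = {se_nox:.2f}")
```

Quadrupling the number of stations would leave the standard deviation roughly unchanged but halve the standard error.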
$SE = \frac{SD}{\sqrt{n}}$

Distance [km]   Average NOx concentration   Standard deviation   Standard error (n = 20)
1               9.53                        3.32                 0.74
2               7.37                        2.45                 0.55
3               5.24                        1.24                 0.28
4               3.15                        0.67                 0.15
5               2.17                        0.87                 0.19
6               1.05                        0.34                 0.08
7               0.84                        0.14                 0.03
8               0.63                        0.10                 0.02
9               0.32                        0.03                 0.01
10              0.21                        0.02                 0.01

[Figure: NOx concentration (0 to 12) against distance [km] (1 to 10), with error bars.]

Central moments

Expanding the squared deviation, and using $\sum f(x_i) = 1$ and $\sum x_i f(x_i) = \bar{x}$:

$s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 f(x_i) = \sum_{i=1}^{n} x_i^2 f(x_i) - 2\bar{x}\sum_{i=1}^{n} x_i f(x_i) + \bar{x}^2 \sum_{i=1}^{n} f(x_i)$

$s^2 = \sum_{i=1}^{n} x_i^2 f(x_i) - 2\bar{x}\,\bar{x} + \bar{x}^2 \cdot 1 = \sum_{i=1}^{n} x_i^2 f(x_i) - \bar{x}^2$

For a sample of n values:

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1}$

In terms of expectations:

$\sigma^2 = E(x^2) - [E(x)]^2$

E(x) is the mathematical expectation, the first moment and the measure of central tendency. The variance is the difference between the mean of the squared values and the squared mean.

k-th central moment

$E((X-\mu)^2) = E(x^2) - E(x)^2$

$E((X-\mu)^k) = \sum_{i=1}^{n} (x_i - \mu)^k f(x_i)$

$E((X-\mu)^k) = \int (x - \mu)^k f(x)\,dx$

$\sigma^2 = \sum_{i=1}^{n} (X_i - \mu)^2 f(X_i) = E((X-\mu)^2)$

Frequency distributions of resource use or wealth in a population can be described by a power law (the famous Pareto-Zipf law) with exponents that often have values around -5/2. What are the mean and the variance of such a power function distribution?
Discrete distribution

$f(x) = a x^{-z}$ with $z = 2.5$, normalised so that $\sum_{min}^{max} f(x) = 1$. For wealth classes 1 to 10 the sum of $x^{-5/2}$ is 1.32, hence

$f(x) = \frac{x^{-5/2}}{1.32}$

x     x^(-5/2)   f(x)       x f(x)     (x - m)^2 f(x)
1     1          0.756475   0.756475   0.1963
2     0.176777   0.133727   0.267454   0.0322
3     0.06415    0.048528   0.145584   0.1078
4     0.03125    0.02364    0.094559   0.1466
5     0.017889   0.013532   0.067661   0.1649
6     0.01134    0.008579   0.051472   0.1730
7     0.007714   0.005835   0.040846   0.1759
8     0.005524   0.004179   0.033432   0.1761
9     0.004115   0.003113   0.028018   0.1747
10    0.003162   0.002392   0.023922   0.1724
Sum   1.32192    1          1.50942    1.5199

Mean = 1.50942; variance = 1.52; standard deviation = 1.23.

[Figure: frequency (0.001 to 1, log scale) against wealth class (1 to 10).]

Most people are in the lowest income class, and the average lies about halfway between the first and the second class.

Continuous approximation

[Figure: frequency (0.001 to 1) against wealth class (0 to 10). Note that the y-axis is at log scale.]

The integration runs from 0.5 to 10.5, because an upper bound of ten would only cover half of the last column.

$\int_{0.5}^{10.5} a x^{-5/2}\,dx = \left[-\frac{2a}{3} x^{-3/2}\right]_{0.5}^{10.5} = \frac{2a}{3}\left(0.5^{-3/2} - 10.5^{-3/2}\right) \approx 1.87a = 1 \;\Rightarrow\; a \approx 0.54$

The estimate of a is imprecise: the discrete normalisation gives a = 1/1.32 ≈ 0.76.

$\mu = \int_{0.5}^{10.5} x \cdot a x^{-5/2}\,dx = \left[-2a\,x^{-1/2}\right]_{0.5}^{10.5} = 2a\left(0.5^{-1/2} - 10.5^{-1/2}\right) \approx 2.21a$

With a ≈ 0.54 the mean is about 1.19; with a ≈ 0.76 it is about 1.68. Both differ from the value of 1.51 obtained from the discrete sum.

The Arrhenius probability model assumes the same probability of an event irrespective of the time that has elapsed since the start. What are the mean and the variance of such a distribution?
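Before turning to the Arrhenius model, the discrete power-law calculation above can be checked numerically. This is a minimal sketch that applies the definitions of mean and variance directly to $f(x) \propto x^{-5/2}$ on the wealth classes 1 to 10; the variable names are illustrative assumptions.

```python
import math

wealth_classes = range(1, 11)          # wealth classes x = 1 ... 10
z = 2.5                                # power-law exponent

# Unnormalised frequencies x^(-z) and their sum (the normalising constant)
raw = [x ** -z for x in wealth_classes]
norm = sum(raw)

# Normalised discrete probability function f(x), sums to 1
f = [r / norm for r in raw]

# First moment and second central moment from their definitions
mean_pl = sum(x * fx for x, fx in zip(wealth_classes, f))
var_pl = sum((x - mean_pl) ** 2 * fx for x, fx in zip(wealth_classes, f))
sd_pl = math.sqrt(var_pl)

print(f"normalising sum = {norm:.5f}")
print(f"mean = {mean_pl:.5f}, variance = {var_pl:.4f}, SD = {sd_pl:.4f}")
```

Note that the variance term is (x - mean)^2 weighted by f(x); using x f(x) in place of x inside the square would inflate the result considerably.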
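The probability density is $f(t) = a e^{-\lambda t}$. A density has to integrate to one:

$\int_{min}^{max} f(x)\,dx = 1 \;\Rightarrow\; \int_0^{\infty} a e^{-\lambda t}\,dt = \frac{a}{\lambda} = 1 \;\Rightarrow\; a = \lambda$

$f(t) = \lambda e^{-\lambda t}$

Cumulative density function: $F(x) = \int_0^{x} \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}$

Mean: $\mu = \int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt = \left[-\frac{e^{-\lambda t}}{\lambda}(\lambda t + 1)\right]_0^{\infty} = \frac{1}{\lambda}$

$E[x^2] = \int_0^{\infty} t^2\,\lambda e^{-\lambda t}\,dt = \left[-\frac{e^{-\lambda t}}{\lambda^2}(\lambda^2 t^2 + 2\lambda t + 2)\right]_0^{\infty} = \frac{2}{\lambda^2}$

Variance: $\sigma^2 = E[x^2] - \mu^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$

Third central moment

$E((X-\mu)^3) = E(X^3) - 3\mu E(X^2) + 3\mu^2 E(X) - \mu^3 = E(X^3) - 3\mu E(X^2) + 2\mu^3$

Skewness: $\gamma = \frac{E((X-\mu)^3)}{\sigma^3}$

Kurtosis: $\delta = \frac{E((X-\mu)^4)}{\sigma^4} - 3$

[Figure: example densities f(x). Skewness panels: symmetric distribution (γ = 0), right skewed distribution (γ > 0), left skewed distribution (γ < 0). Kurtosis panels: δ = 0 and δ > 0.]

How to get the modus?

[Figure: f(x) = x e^{-x} for x from 0 to 8, with mode and mean marked.]

$y = x e^{-x}$

This is a probability distribution because

$\int_0^{\infty} x e^{-x}\,dx = \left[-e^{-x}(x + 1)\right]_0^{\infty} = 1$

For the mode we need the maximum of the pdf:

$\frac{d(x e^{-x})}{dx} = e^{-x} - x e^{-x} = 0 \;\Rightarrow\; x = 1$

Arithmetic mean:

$E(x) = \int_0^{\infty} x \cdot x e^{-x}\,dx = \int_0^{\infty} x^2 e^{-x}\,dx = \left[-e^{-x}(x^2 + 2x + 2)\right]_0^{\infty} = 2$

The mode (1) is smaller than the mean (2), as expected for a right skewed distribution.

Body volumes are estimated from measures of height * length * width. Assume you estimated the thorax volume of insects and used this volume to infer the body weight.

$V = c \cdot Length \cdot Height \cdot Width$

$W\,[\mathrm{mg}] = a\,(Length \cdot Height \cdot Width)^z$

How to get the parameters a and z?

Body weights are estimated from a regression of species dry weights against thorax volume.

[Figure: dry weight (0 to 3) against thorax volume (0 to 2) with the fitted power function y = 1.7754 x^{0.6072}.]

The body weight of a new species is estimated from the regression function.

Height, length and width could each be measured with an accuracy of ± 2%. The standard deviation is a measure of accuracy (error). For independent measurements the variances add:

$\sigma_{total}^2 = \sum_{i=1}^{n} \sigma_i^2; \quad \sigma_{total} = \sqrt{\sum_{i=1}^{n} \sigma_i^2}$

$\sigma_{total}^2 = 0.02^2 + 0.02^2 + 0.02^2 = 0.0012; \quad \sigma_{total} = \sqrt{0.0012} = 0.035$

The error of the thorax volume estimate is 3.5%.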
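The error calculation above can be written as a small helper. This is a minimal sketch under the stated assumption that the three measurements are independent, so that their errors add in quadrature; the function name `combined_error` is hypothetical and not part of the lecture.

```python
import math


def combined_error(errors):
    """Combine independent errors in quadrature:
    sigma_total = sqrt(sum of sigma_i squared)."""
    return math.sqrt(sum(e ** 2 for e in errors))


# Height, length and width are each measured with an accuracy of 2%
measurement_errors = [0.02, 0.02, 0.02]
volume_error = combined_error(measurement_errors)

print(f"error of the thorax volume estimate: {volume_error:.1%}")
```

If the three measurement errors were perfectly correlated instead of independent, they would add linearly and the total would be 6% rather than about 3.5%.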
Homework and literature

Refresh:
• Arithmetic, geometric, harmonic mean
• Cauchy inequality
• Statistical distribution
• Probability distribution
• Moments of distributions
• Error law of Gauß
• Bootstrap

Prepare for the next lecture:
• Binomial distribution
• Mean and variance of the binomial distribution
• Poisson distribution
• Mean and variance of the Poisson distribution
• Moments of distributions
• DNA mutations
• Transition matrix

Literature:
Łomnicki: Statystyka dla biologów
Binomial distribution: http://www.stat.yale.edu/Courses/199798/101/binom.htm
Poisson distribution: http://en.wikipedia.org/wiki/Poisson_distribution