1 THIS PAGE IS TO BE REMOVED FROM THE BLINDED VERSION 2 Random sampling of skewed distributions and Taylor's power law of fluctuation scaling 3 Joel E. Cohen and Meng Xu 4 Authors' footnote 5 Joel E. Cohen is Professor, and Meng Xu was Postdoctoral Associate, at The Rockefeller 6 University, 1230 York Avenue, Box 20, New York, NY, 10065-6399 (e-mail: 7 cohen@rockefeller.edu, xumeng04@gmail.com). Xu is now Lecturer, Department of 8 Mathematics and Physics, University of New Haven, West Haven, CT 06516. This work was 9 partially supported by National Science Foundation grants EF-1038337 and DMS-1225529. The 10 authors thank Richard Chandler, Russell Millar and Andrew Wood for helpful comments and 11 Priscilla K. Rogerson for assistance. 12 THIS PAGE IS TO BE REMOVED FROM THE BLINDED VERSION 13 0 14 Random sampling of skewed distributions and Taylor's power law of fluctuation scaling 15 Abstract. Taylor’s law (TL), a widely verified quantitative pattern in ecology, describes the 16 variance in population density as a power-law function of the average population density: 17 variance = a(mean)b, a > 0. In the past half-century, multiple mechanisms have been proposed to 18 explain and interpret TL. Here we show that TL arises when data are randomly sampled in 19 blocks from any frequency distribution with four finite moments, and that b is positive whenever 20 the frequency distribution has positive skewness. We give approximate formulas for the sample 21 estimate bΜ of b and the variance of πΜ: if the distribution has mean M, variance V, coefficient of 22 variation CV = V1/2/M, skewness γ1 = μ3/V3/2, and kurtosis κ = μ4/V2, then as the number N of 23 blocks and sample sizes in all blocks become large, bΜ converges in probability to πΎ1⁄πΆπ and 24 (π − 2)π 2 (πΜ) converges in probability to [π − 1 − πΎ12 ]⁄(πΆπ)2. In simulations and an empirical 25 example using counts of oak trees from Black Rock Forest, the formula for bΜ agrees with the 26 estimate of b obtained by least-squares regression. Our null model provides a baseline against 27 which more complex explanations of TL can be compared. 28 Key words: delta method; least-square regression; skewness; variance function 29 1. Introduction 30 Taylor's law (TL), sometimes called Taylor's power law of fluctuation scaling after Taylor 31 (1961), is the subject of more than a thousand papers in ecology, epidemiology, biomedical 32 sciences and other applied statistical areas (Eisler et al. 2008, Jørgensen et al. Draft), but is not 33 vary widely known among some statisticians. For example, Davidian and Carroll (1987, p. 1079) 34 "model the variance as proportional to a power of the mean response" as their first example of 35 "variance functions." This model is precisely TL, variance = a(mean)b, a > 0, but Davidian and 1 36 Carroll do not cite the large prior ecological literature (e.g., review by Taylor 1984) on precisely 37 that model, substantial related prior statistical research (e.g., Bartlett 1936, 1947, Tweedie 1946, 38 1947, 1984 ) and practical applications to the design of sampling plans for the control of insect 39 pests of soybeans (Kogan et al. 1974, Bechinski and Pedigo 1981) and cotton (review by Wilson 40 et al. 1989). 41 In recent decades, Jørgensen, Kendal and colleagues (Jørgensen 1997, Kendal and Jørgensen 42 2011, Jørgensen et al. Draft) have done much to develop a statistical theory of TL but awareness 43 in the statistical community of TL and its associated literature remains limited. 44 In applications, TL arises when observations of a nonnegative real-valued random variable 45 (originally, counts of the size of a local population of insects) are grouped into N blocks, j = 1, 46 …, N. When the N points (log of the sample mean of observations in block j, log of the sample 47 variance of observations in block j) are plotted, TL predicts that the points will approximate a 48 straight line (Taylor 1961). 49 A linear relationship between the logarithm of the sample variance and the logarithm of the 50 sample mean of population density is one of the most frequently confirmed empirical patterns in 51 ecology and has been verified for hundreds of species, recently including bacteria (Ramsayer et 52 al. 2012; Kaltz et al. 2012), forest trees (Cohen, Xu, and Schuster 2012, 2013a), and people 53 (Cohen, Xu, and Brunborg 2013b). Examples of TL have also been found in diverse other fields 54 (see reviews by Eisler et al. 2008, Jørgensen et al., Draft), including cell populations within 55 organisms, epidemiology of measles and whooping cough, cancer metastases, single nucleotide 56 polymorphisms and genes on chromosomes, and non-biological measurements such as 57 precipitation, packet switching on the Internet, stock market trading, and number theory. 2 58 Theories of the mechanistic or stochastic origins of TL and interpretations of the intercept log a 59 and slope b of the linear relationship log variance = log a + b×log mean (mathematically 60 equivalent to its power-law form) are as diverse as the empirical applications (again see reviews 61 of Eisler et al. 2008, Jørgensen et al. Draft). Taylor (1961) proposed that b was an index of 62 aggregation in populations (Taylor 1961). Kilpatrick and Ives (2003) theorized that negative 63 interactions among species leads to b < 2. Ballantyne (2005) proposed that b = 2, which 64 corresponds to a constant coefficient of variation, is a consequence of deterministic population 65 growth. Ballantyne and Kerkhoff (2007) suggested that individuals’ reproductive correlation 66 determines the size of b. Cohen et al. (2013a) showed that the Lewontin-Cohen stochastic 67 multiplicative population model (a geometric random walk) implied TL and calculated log a and 68 b explicitly. Cohen (2013) showed that exponentially growing, non-interacting clones would 69 asymptotically satisfy TL with b = 2. 70 As TL has been a subject of investigation for more than half a century, it is somewhat surprising 71 that a simple null hypothesis to explain the origins of TL under certain conditions has so far 72 apparently not been considered. Here we show that TL can arise as a consequence of randomly 73 blocking observations arising from simple random sampling of a single underlying nonnegative- 74 valued probability distribution. No other mechanisms are required to generate TL. We derive 75 explicit approximate formulae for the TL slope b and its variance when random samples from 76 any distribution with four finite moments are randomly grouped into blocks. We give simulations 77 to illustrate how well these theoretical formulae approximate the slope, and we give an empirical 78 example using published data on counts of oak trees. 79 Our formulae imply that b > 0 arises from random sampling in blocks of any right-skewed 80 distribution, and b < 0 arises from random sampling in blocks of any left-skewed distribution. 3 81 This derivation provides a purely statistical "null-model" of the slope b. This null model does not 82 purport to be a universal explanation of TL in all or most circumstances. However, the 83 availability of this null model will require that future more elaborate explanations of TL, in terms 84 of specific mechanisms, show first why this null model is not a sufficient explanation. 85 2. Results 86 Suppose X is a nonnegative real-valued random variable with cumulative distribution function F, 87 mean E(X) = M > 0, variance var(X) = V > 0, and finite central moments E([X - M]h) = μh, h = 3, 88 4. Consider N > 2 "blocks" or sets of independently and identically distributed (iid) observations 89 (random samples) of X. Let xij denote observation i in block j, i = 1, …, nj, assuming the sample 90 size of block j satisfies nj > 3, j = 1, …, N. The total number of observations is O = n1+ n2+β― + 91 nN. For block j the sample mean and its expectation and variance are ππ = (π₯1π + β― + π₯πππ )⁄ππ , 92 πΈ(ππ ) = π, π£ππ(ππ ) = π ⁄ππ . The unbiased estimator of the sample variance and its 93 expectation and variance are ππ 94 ππ ππ − 3 2 1 1 2 π£π = ∑ π₯ππ − ππ2 , πΈ(π£π ) = π, π£ππ(π£π ) = (π4 − π ). ππ − 1 ππ − 1 ππ ππ − 1 π=1 95 The formula for π£ππ(π£π ) is from Neter et al. (1990). 96 In this model, the variation between blocks in the sample mean is small because it arises only 97 from differences due to random sampling of the same distribution for every block. Variation 98 between blocks in the sample variance is also small for the same reason. Precisely how small the 99 differences are depends on the sample size per block, as shown in the above formulas for 100 π£ππ(ππ ) and π£ππ(π£π ). Since any two smoothly varying functions can be locally linearly related, 4 101 the logarithm of the sample variance per block can be approximated as a linear function of the 102 logarithm of the sample mean per block, which is just TL: log π£π = log π + π log ππ , across 103 blocks j = 1, …, N. 104 It is customary in ecological applications and elsewhere to estimate the intercept log a and the 105 slope b of TL using ordinary least-squares regression of log(vj) as a linear function of log(mj), 106 ignoring the variability in the x-coordinate log(mj). This practice may be defensible for large nj 107 because, at least in this model, π£ππ(log ππ ) is inversely proportional to nj (see Appendix for 108 proof). 109 In the following statement of our main analytical result, the symbol ''→π " means convergence in 110 probability. By definition, the coefficient of variation of the underlying distribution of X is CV = 111 V1/2/M, the skewness is γ1 = μ3/V3/2, and the kurtosis is κ = μ4/V2. 112 Theorem. Let πΜ and π 2 (πΜ) denote the least-squares estimator of b and the unbiased estimator of 113 its sample variance when all blocks are weighted equally. Then the limits in probability of πΜ and 114 π 2 (πΜ) satisfy, for large N and ππ , π = 1, 2, … , π, 115 116 πΜ →π πππ£(ππ , π£π ) π£ππ(ππ ) ⁄ = π3 π⁄π 2 = πΎ1⁄πΆπ . ππ π2 (π − 2)π 2 (πΜ) →π π2 (π4 π − π 3 − π32 )⁄π 4 = (π − 1 − πΎ12 )⁄(πΆπ)2 . 117 We prove this Theorem in the Appendix. Since CV > 0, the first formula shows that random 118 sampling of any right-skewed distribution (one with πΎ1 > 0) generates TL with positive slope. 119 From the second formula on the extreme right, since π 2 (πΜ) ≥ 0 intrinsically, we conclude that 5 120 π − 1 − πΎ12 ≥ 0 for any distribution with finite fourth moment. Rohatgi and Székely (1989) 121 proved independently that π − 1 − πΎ12 ≥ 0 for any distribution with finite fourth moment. 122 3. Simulations 123 If X is gamma distributed with probability density function π(π₯|πΌ, π½) = 124 π₯ πΌ−1 exp(− π₯⁄π½ )⁄[π½ πΌ Γ(πΌ)] , πΌ ≥ 1, π½ > 0, where Γ(πΌ) is the gamma function evaluated at πΌ, 125 then M = αβ, V = αβ2, μ3 = 2αβ3, and μ4 = 3αβ4(α + 2). Consequently, the estimator of the TL 126 slope is πΜ = 2 and π 2 (πΜ) = (2πΌ + 2)/(π − 2). In the special case when α = 1 and β= 1/λ, X is 127 exponentially distributed with parameter λ > 0 and probability density function f(λ) = λ⋅exp(-λx). 128 In this case, M = 1/λ, V = 1/λ2, μ3= 2/λ3, μ4 = 9/λ4 and πΜ = 2, s2(πΜ) = 4/(N-2). 129 To illustrate this model numerically, we created four matrices with n = 100 rows and N = 100 130 columns. In each matrix, the elements were iid from, respectively, an exponential, gamma, 131 lognormal, and normal distribution (Figure 1A, B, C, D). For each matrix, we plotted the log 132 sample variance vj of each column j on the ordinate against the log sample mean mj on the 133 abscissa, j = 1, …, N. For the three positively skewed distributions (A, B, C), a positive 134 approximately linear relationship was observed. For the symmetric normal distribution with zero 135 skewness (D), no relationship between the log sample variance and the log sample mean was 136 observed. 137 We investigated the exponential distribution with λ = 1 in greater detail (Table 1). The median of 138 πΜ estimated from the Theorem was always less than, but became closer to, the corresponding 139 value estimated from linear regression, as both n and N increased. For each set of n and N, the 140 95% confidence interval (CI) of πΜ from each method contained the theoretically calculated 141 asymptotic value 2. On the other hand, the median of (N - 2)s2(πΜ) from the Theorem was bigger 6 142 than its corresponding median estimated from linear regression, although the 95% CIs from both 143 methods overlapped and included the theoretically calculated asymptotic value 4, except when n 144 = 10, N = 100 under linear regression. It is not surprising that the delta method should yield a 145 quite accurate approximation to b in this situation as all variation is due to random sampling 146 alone. 147 Figure 1. Taylor's law in random sampling from (A) an exponential (λ = 1), (B) a gamma (α = 4, 148 β = 1), (C) a lognormal (μ = 1, σ = 1), and (D) a shifted normal (5 + π©(0,1)) distribution, i.e., a 149 π©(0,1) distribution with 5 added to each value to make each block's mean positive with high 150 probability. For each panel, 10,000 independent and identically distributed observations from the 151 selected distribution were arranged in a single matrix with n = 100 rows and N = 100 columns. 152 For each column j, the sample mean mj and the sample variance vj were calculated and plotted 153 here on log-log coordinates, j = 1, …, N. For the three positively skewed distributions (A, with 154 skewness 2; B, with skewness 1; and C, with skewness (e+2)(e-1)1/2 ≈ 6.1849), a positive 155 approximately linear relationship was observed. The solid line is the least-squares linear 156 regression log10 vj = log10 a + b log10 mj. The parameter values are (A) exponential slope 1.8501, 157 intercept -0.0116; predicted slope bΜ ± [s2(bΜ)]1/2 = 2 ± [4/98]1/2 = 2 ± 0.2020; (B) slope 2.1643, 158 intercept -0.6969; predicted slope bΜ ± [s2(bΜ)]1/2 = 2 ± [10/98]1/2 = 2 ± 0.3194; (C) slope 4.0138, 159 intercept -1.1455; predicted slope bΜ ± [s2(bΜ)]1/2 = 4.7183 ± (0.6660) (D) slope 0.0233, intercept - 160 0.0243; predicted slope bΜ ± [s2(bΜ)]1/2 = 0 ± 0.1429. The slope estimated by linear regression fell 161 within one estimated standard deviation of the theoretically predicted slope except for the 162 lognormal observations (panel (C)), where the linear regression slope was 1.06 standard 163 deviation below the theoretically predicted slope. 7 164 8 165 Table 1. Estimating the slope b of Taylor's law (known to have slope 2 in this case) and its sample variance (known to be 4) by 166 two methods: linear regression, and the formulae in the Theorem using sample moments. We generated 10,000 n × N matrices, 167 where each element was iid from an exponential distribution with parameter λ = 1. For each n × N matrix, for each column j = 168 1, …, N, we calculated the column mean mj and column variance vj over elements in column j. The conventional least-squares 169 estimator (column (3)) of the linear regression slope b of the log vj as a linear function of the log mj and (N-2) times the sample 170 variance (column (5)) of the least-squares regression slope were calculated from MATLAB function “LinearModel.fit” 171 (Snedecor and Cochran 1980, pp. 155). For the same n × N matrix, we calculated πΜ (column (4)) and its sample variance s2(πΜ) 172 Μ Μ2 Μ 2 Μ4 − πΜ 2 )⁄πΜ 3 − π Μ 2 ⁄πΜ 4 } respectively, where times (N-2) (column (6)) from the approximate formulae π Μπ Μ3 2 π 3 ⁄π and {π (π 173 Μ , πΜ , π π Μ, Μ4 were the sample moments estimated from all nN elements of the matrix. For every combination of n = 10, 3 and π 174 100, and N = 10, 100, the table shows the 2.5%, 50%, and 97.5% quantiles of the 10,000 estimates of the slope and (N-2) times 175 the variance of the slope of each method separately. The random matrices were simulated by using function “exprnd” 176 (MATLAB R2012b). Quantiles were calculated using the “Distributions” platform in JMP 10 (SAS Institute 2012). 177 9 (1) (2) N N (3) Regression πΜ (4) Approximate πΜ 95% CI median 95% CI median (5) (N-2)×Regression s2(πΜ ) (6) (N-2)×Approximate s2(πΜ) 95% CI median 95% CI median 10 10 (0.8206, 3.1690) 2.0076 (0.5438, 4.2160) 1.7473 (0.4852, 9.3439) 2.1504 (0.3408, 15.8448) 2.2072 10 100 (1.6828, 2.3115) 1.9995 (1.4116, 2.7970) 1.9400 (1.5868, 3.6929) 2.4250 (1.7934, 9.0552) 3.6162 (0.5583, 3.4533) 1.9960 (0.5315, 3.6667) 1.9594 (0.7088, 13.5331) 3.1270 (0.6696, 15.7344) 3.2208 100 100 (1.6262, 2.3751) 1.9999 (1.5808, 2.4530) 1.9916 (2.3203, 5.3334) 3.5280 (2.4108, 6.5366) 3.9004 100 10 10 178 4. Empirical example: counts of oaks in Black Rock Forest 179 We report here a new analysis of data previously published and analyzed differently by Cohen et 180 al. (2012). In the Black Rock Forest, Cornwall, New York, 12 contiguous study plots, each 181 approximately 75 m by 75 m, were laid out in a rectangular array with three rows labeled A, B, C 182 and four columns labeled 1, 2, 3, 4. Each plot was identified by its row label and its column 183 label. For example, plot A1 was in the northwest corner of the study area while plot C4 was in 184 the southeast corner. Cohen et al. (2012, p. 15833, Figure 7) gave a map and contour diagram of 185 the study area and plots. Each plot was divided into nine approximately 25 m × 25 m subplots. 186 The number of oaks with diameter not less than 1 inch (2.54 cm) at breast height (approximately 187 1.5 m above ground) was recorded for each of the 108 subplots (Table 2). Here we consider only 188 the data from 2007. Cohen et al. (2012) give additional data and details. 189 At first sight, it seems plausible to propose that the data from this study design might be modeled 190 by the null model, since the plots (blocks) and subplots (observations within a block) are square 191 areas arbitrarily imposed on a contiguous rectangle of hillside forest. However, a one-way 192 analysis of variance (JMP 10, SAS Institute 2012) rejected (with P < 0.0001) the null hypothesis 193 that the variability in oak tree counts between plots could be accounted for by random sampling 194 from a single underlying distribution. Plots B4 and C4 in the southeast corner of the study area had 195 far more oaks per subplot than the other plots (Figure 2A). Plot A1 may have had fewer than average 196 oaks per subplot. 11 197 Table 2. Counts of oak trees in 2007 in Black Rock Forest plots A1, A2, …, C4, by subplot C, E, …, W. C = central, E = east 198 of central, N = north of central, S = south of central, W = west of central. This contingency table was derived from data 199 originally published as an online supplement by Cohen et al. (2012). Plot A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4 C 3 11 15 7 4 12 24 38 10 6 8 23 E 8 6 12 8 5 10 15 22 14 9 11 38 N 8 4 7 14 2 13 11 49 5 12 11 23 NE 8 16 8 9 9 12 20 17 6 8 8 51 NW 6 9 14 6 7 4 13 35 3 4 16 10 S 7 15 14 13 8 9 11 19 6 10 10 12 SE 2 18 15 10 7 14 9 40 12 13 9 19 SW 9 5 15 19 8 12 5 18 8 7 6 4 W 6 5 8 15 12 4 11 23 6 6 17 11 Plot 57 89 108 101 62 90 119 261 70 75 96 191 6.33 9.89 12.00 11.22 6.89 7.78 8.33 10.67 21.22 Subplot total Mean Variance 5.75 28.61 11.50 18.44 8.61 10.00 13.22 29.00 13.75 33.19 136.00 12.69 8.75 12 13.50 223.94 200 Figure 2. Oak trees in Black Rock Forest, 2007. (A) Number of oaks per subplot in each of 12plots separately. The horizontal 201 line is the mean number of oaks per subplot, 12.21. A pooled estimate of the error variance (within plots) was 2.18. (B) Mean 202 and variance (plotted on log-log coordinates) of the number of oak trees per subplot in each of 12 plots. The straight (red) line 203 is the least-squares fitted linear model. The slightly convex (blue) curve is the least-squares fitted quadratic model. The 204 rightmost data point corresponds to plot B4, and the next-to-rightmost point corresponds to plot C4. The relationship between 205 variance and mean in these two outlying plots is consistent with that observed in the remaining 10 plots. (C) Frequency 206 histogram of the number of oak trees per subplot. A B C 13 207 A least-squares linear model, TL, had coefficients Log(Variance(NTrees)) = -2.501856 + 208 2.3041749*Log(Mean(NTrees)). To test for linearity, we also fitted by least-squares a quadratic 209 model, obtaining the coefficients Log(Variance(NTrees)) = -0.936401 + 210 1.058334*Log(Mean(NTrees)) + 0.2398604*[Log(Mean(NTrees))]2. In the latter model, a t-test 211 to test the null hypothesis that the coefficient of the quadratic term did not differ significantly 212 from zero had P > 0.736, so there was no statistically significant evidence of nonlinearity. As 213 Figure 2B suggests, the quadratic term did not significantly improve the fit to the data. In the 214 former model, which is TL, the linear coefficient, 2.30417, was approximately 7.22 times its 215 standard error, so there was statistically significant evidence (P < 0.0001) that the slope was 216 positive, as is obvious from Figure 2B. The frequency histogram of oak trees per subplot was 217 right-skewed (Figure 2C). 218 The 108 counts of oak trees per subplot have the following moments (rounded to two decimal 219 places): mean 12.21, standard deviation 8.86, skewness 2.34, kurtosis 9.59, and CV 0.73. From 220 the Theorem (again rounding to two decimal places), bΜ ± [s2(bΜ)]1/2 = 3.23 ± 0.77. The least- 221 squares estimate of the slope, 2.30, differed from the estimate of the slope derived from the 222 (counterfactual) model of iid sampling by 1.20 estimated standard deviations. Despite the 223 inhomogeneity among plots in the mean number of oaks per subplot revealed by the ANOVA, 224 the null model predicted a slope of TL that was in reasonable agreement with that observed. The 225 covariation between the mean and variance of samples from a skewed distribution in this 226 example contributed substantially to the relationship described by TL. 14 227 5. Conclusions 228 The Theorem makes it possible to compare the parameters of Taylor's law estimated from block- 229 structured data with the parameters that would arise from random sampling and random blocking 230 of the underlying frequency distribution. In ecological applications, this comparison makes it 231 possible to compare the effects of biologically based blocking with random sampling of a 232 frequency distribution. It remains to be seen how generally this null model will approximate well 233 the slope of TL estimated in empirical examples. To demonstrate mechanisms other than random 234 sampling that generate TL, it will be necessary to design studies in which the variation in means 235 between blocks is too large to be accounted for by random sampling, or in which the estimate of 236 b differs significantly from that predicted by the Theorem. 237 6. Appendix 238 If X is a real-valued random variable with finite mean E(X) and finite variance var(X), and if a 239 real-valued function f of real x is twice differentiable at E(X), then the delta method (Oehlert 240 1992; Hosmer, Lemeshow and May 2008, pp. 355-358) gives the approximations 241 π(π) ≈ π(πΈ(π)) + (π − πΈ(π)){(π ′ (π₯))|π₯=πΈ(π) }, 242 πΈ(π(π)) ≈ π(πΈ(π)) + { 243 π ′′ (π₯) 2 | π₯=πΈ(π) } ⋅ π£ππ(π), 2 π£ππ(π(π)) ≈ {(π ′ (π₯))|π₯=πΈ(π) } π£ππ(π). 244 In practice, we compute sample moments from observations of X, plug them in to replace the 245 population moments, and accept the result as approximations to the left sides. 15 246 Lemma 1. If x > 0 and f(x) = log(x), then π ′ (π₯) = 1⁄π₯ , π ′′ (π₯) = −π₯ −2. Assume the sample size 247 in block π is ππ (π = 1, 2, … , π) and N is the number of blocks. Assume mj is the sample mean in 248 block j and E(mj) = M > 0. Then the approximations given by the delta method are log ππ ≈ 249 log π + (ππ − π)⁄π, π£ππ(log ππ ) ≈ π ⁄(ππ π2 ), πΈ(log ππ ) ≈ log π − π ⁄(2ππ π2 ). 250 Proof. In the delta method, we set X = mj , f(x) = log(x). From Loève (1977, p. 276, Exercise 5), 251 Oehlert (1992) showed essentially that for π ≥ 0, πΈ {|ππ − π| 252 use this bound with q = 0, 1/2, and 1 separately. Applying Taylor’s expansion to log ππ yields 253 2 2(π+1) } = π (ππ−(π+1) ). We shall 3 log ππ = log π + (ππ − π)/π − (ππ − π) /(2π2 ) + π ((ππ − π) ). 254 Following Oehlert’s notation, we define π(ππ ) = log ππ , and π΄2 (ππ ) = log π + (ππ − π)/ 255 π − (ππ − π) /(2π2 ). Because M > 0 and because the logarithmic function is infinitely 256 differentiable in any open interval that contains M, by Taylor’s theorem, there exists a finite 257 constant C > 0, such that |π(ππ ) − π΄2 (ππ )| ≤ πΆ |(ππ − π) |. From Oehlert (1992) with q = 258 1/2, we have πΈ {πΆ |(ππ − π) |} = π(ππ−3 2 ). Therefore, as ππ → ∞, for 1 < π < 2, ππ π ⋅ 259 πΈ{|π(ππ ) − π΄2 (ππ )|} = π (ππ π−2 ) → 0. Here “→” denotes point-wise convergence. By triangle 260 inequality, πΈ (π(ππ )) = πΈ (π΄2 (ππ )) + π(ππ −π ). After substitution, πΈ(log ππ ) = log π + 261 πΈ(ππ − π)/π − πΈ {(ππ − π) } /(2π2 ) + π(ππ −π ) = log π − π ⁄(2π2 ππ ) + π(ππ −π ). 262 Hence πΈ(log ππ ) ≈ log π − π ⁄(2π2 ππ ). As ππ → ∞, this leads to the first-order approximation 263 πΈ(log ππ ) → log π. 2 3 3 3 ⁄ 3 2 16 264 Now we estimate π£ππ(log ππ ) using the first-order Taylor expansion of log ππ , namely, 265 log ππ = log π + (ππ − π)/π + π ((ππ − π) ). Denote π΄1 (ππ ) = log π + (ππ − π)/π. 266 By Taylor’s theorem, there exists a finite constant πΆ1 > 0, such that |π(ππ ) − π΄1 (ππ )| ≤ 267 πΆ1 |(ππ − π) |. From Oehlert (1992) with q = 0, we have πΈ {πΆ1 |(ππ − π) |} = π(ππ−1 ). We 268 now approximate πΈ {(log ππ ) } using the delta method. 269 2 2 2 2 2 {π(ππ )} = {π(ππ ) − π΄1 (ππ ) + π΄1 (ππ )} 2 272 273 2 = {π(ππ ) − π΄1 (ππ )} + {π΄1 (ππ )} + 2{π΄1 (ππ )} ⋅ {π(ππ ) − π΄1 (ππ )}. 270 271 2 In other words, 2 2 2 {π(ππ )} − {π΄1 (ππ )} = {π(ππ ) − π΄1 (ππ )} + 2{π΄1 (ππ )} ⋅ {π(ππ ) − π΄1 (ππ )}. 2 2 4 Since |π(ππ ) − π΄1 (ππ )| ≤ πΆ1 |(ππ − π) |, |π(ππ ) − π΄1 (ππ )| ≤ πΆ12 |(ππ − π) |. So 2 πΆ1 3 |(ππ − π) |. π 274 {π΄1 (ππ )} ⋅ {π(ππ ) − π΄1 (ππ )} ≤ πΆ1 log π ⋅ |(ππ − π) | + 275 |{π(ππ )} − {π΄1 (ππ )} | ≤ πΆ12 |(ππ − π) | + 2πΆ1 log π ⋅ |(ππ − π) | + 2 2 4 2 2πΆ1 3 |(ππ − π) |. π 276 From Oehlert (1992) using q = 1 for the first term on the right side, q = 0 for the second term, 277 and q = 1/2 for the third term, the expectation of the right side of the above inequality is 278 π(ππ −1 ). As ππ → +∞, for 0 < πΎ < 1, ππΎ πΈ |{π(ππ )} − {π΄1 (ππ )} | ≤ π(ππ πΎ−1 ) → 0. From 279 the triangle inequality, πΈ [{π(ππ )} ] = πΈ [{π΄1 (ππ )} ] + π(ππ −πΎ ). Thus the approximate mean 2 2 2 17 2 2 2 2 280 of (log ππ ) is πΈ {(log ππ ) } ≈ πΈ [{log π + (ππ − π)} /π] = πΈ {(log π)2 + 2 (log π)(ππ − 281 π)/π + (ππ − π) ⁄π2 } = (log π)2 + π ⁄(π2 ππ ). 282 Overall, the estimated variance of log ππ from the delta method using the first-order Taylor 283 expansion of log ππ is π£ππ(log ππ ) = πΈ {(log ππ ) } − {πΈ(log ππ )} ≈ (log π)2 + 284 π ⁄(π2 ππ ) − (log π)2 = π ⁄(π2 ππ ). This proves Lemma 1. 285 Lemma 2. Under the assumptions of Lemma 1, also assume vj is the sample variance in block j 286 and E(vj) = V > 0. Then the approximations given by the delta method are log π£π ≈ log π + 287 (π£π − π)⁄π , π£ππ(log π£π ) ≈ (π4 − ππ−1 π 2 ) /(ππ π 2 ), πΈ(log π£π ) ≈ log π − 2π (π42 − ππ −1). 288 Proof. Setting X = vj and following the same arguments as in the proof of Lemma 1 gives the 289 results. 290 Lemma 3. Under the assumptions of Lemma's 1 and 2, the covariance of the sample mean and 291 sample variance is πππ£(π£π , ππ ) = π3 ⁄ππ ,where μ3 is the third central moment. 292 Zhang (2007) gives a proof of this classical formula, which has been known at least since 1903 293 (Editorial 1903, p. 279, equation (xiii); Editorial 1913, p. 7, equation (xxvi); Neyman 1925, p. 294 479, equation (67); Neyman 1926). 295 Proof of Theorem. When all blocks are weighted equally, the least-squares estimator of b and the 296 unbiased estimator of its sample variance are respectively (Snedecor and Cochran 1980, pp. 155) 297 πΜ = πππ£+ (log π£π , log ππ )⁄π£ππ+ (log ππ ), 2 2 2 π −3 1 π π π 18 π −3 π 298 2 2 π 2 (πΜ) = [π£ππ+ (log π£π )⁄π£ππ+ (log ππ ) − {πππ£+ (log π£π , log ππ )} ⁄{π£ππ+ (log ππ )} ] /(π − 2). 299 The notations πππ£+ (log π£π , log ππ ) and π£ππ+ (log ππ ) are to be read as the covariance and 300 variance across all blocks and not as referring to any single block j. Explicitly, the sample 301 estimators are defined by 302 π£ππ+ (log ππ ) = 1 π−1 2 ∑π π=1(log ππ ) − 2 1 1 π(π−1) 2 (∑π π=1 log ππ ) , 2 1 303 π π£ππ+ (log π£π ) = π−1 ∑π π=1(log π£π ) − π(π−1) (∑π=1 log π£π ) , 304 π π πππ£+ (log π£π , log ππ ) = π−1 ∑π π=1(log ππ ⋅ log π£π ) − π(π−1) (∑π=1 log ππ )(∑π=1 log π£π ). 305 They are all consistent by law of large numbers: as π → ∞, π£ππ+ (log ππ ) →π π£ππ(log ππ ), 306 π£ππ+ (log π£π ) →π π£ππ(log π£π ), and πππ£+ (log π£π , log ππ ) →π πππ£(log π£π , log ππ ). 307 To find the limits in probability of πΜ and π 2 (πΜ), we approximate the above estimators by the 308 delta method using Lemmas 1, 2, and 3. We first approximate the numerator and the 309 denominator of πΜ separately. For the numerator of bΜ, namely, πππ£+ (log π£π , log ππ ), The first 310 term is approximately 1 1 19 π π π=1 π=1 1 1 1 1 ∑(log ππ ⋅ log π£π ) ≈ ∑ {log π + (ππ − π)} ⋅ {log π + (π£π − π)} π−1 π−1 π π 311 π π π=1 π=1 π log π log π = ⋅ log π ⋅ log π + ∑(ππ − π) + ∑(π£π − π) (π − 1)π (π − 1)π π−1 312 π 1 + ∑(ππ − π)(π£π − π). (π − 1)ππ 313 π=1 314 315 316 317 The second term of the numerator of bΜ is approximately 1 1 π(π−1) 1 π 1 π π π (∑π π=1 log ππ )(∑π=1 log π£π ) ≈ π(π−1) ∑π=1 {log π + π (ππ − π)} ⋅ ∑π=1 {log π + π log π log π π (π£π − π)} = π−1 ⋅ log π ⋅ log π + (π−1)π ∑π π=1(ππ − π) + (π−1)π ∑π=1(π£π − π) + 1 π(π−1)ππ π ∑π π=1(ππ − π) ∑π=1(π£π − π). 1 1 318 π Therefore πππ£+ (log π£π , log ππ ) ≈ (π−1)ππ ∑π π=1(ππ − π)(π£π − π) − π(π−1)ππ ∑π=1(ππ − 319 π π π π) ∑π π=1(π£π − π) = (π−1)ππ ∑π=1 ππ π£π − π(π−1)ππ ∑π=1 ππ ∑π=1 π£π = 320 2 1 1 1 2 π denominator of πΜ is approximately π£ππ+ (log ππ ) ≈ π2 {(π−1) ∑π π=1 ππ − π(π−1) (∑π=1 ππ ) } = 321 πππ£+ (ππ ,π£π ) π£ππ+ (ππ ) π£ππ+ (ππ )⁄π2 . Consequently, for large nj, π = 1, 2, … , π, πΜ ≈ ⁄ π2 . By ππ 322 πππ£(ππ ,π£π ) π£ππ(ππ ) consistency, for large N, using Lemma 3 in the numerator, πΜ ≈ ⁄ π2 = ππ 323 1 π3 ⁄ π ππ ππ ππ π2 1 = π3 π⁄π 2 = πΎ1⁄πΆπ . 20 πππ£+ (ππ ,π£π ) ππ . Similarly, the 324 The derivation of π£ππ+ (log π£π ) is the same as that of π£ππ+ (log ππ ). Replacing ππ with π£π and π 325 with π yields π£ππ+ (log π£π ) ≈ π£ππ+ (π£π )⁄π 2. For large N and ππ , π = 1, 2, … , π, substituting into 326 the formula for π 2 (πΜ) the estimators corresponding to π£ππ+ (ππ ), π£ππ+ (π£π ), and πΜ yields 327 (π − 2)π (πΜ) ≈ ( 2 π4 π π2 (π4 π − π 3 − π32 ) π − 1 − πΎ12 2 2 − 1)⁄ 2 − (π3 π⁄π ) = = , (π − 2)π 4 (π − 2)(πΆπ)2 π2 π 328 where κ = μ4/V2 is the kurtosis. This completes the proof. 329 References 330 Ballantyne, F. and Kerkhoff, A. J. (2007). The observed range for temporal mean-variance 331 scaling exponents can be explained by reproductive correlation. Oikos 116, 174-180. 332 Ballantyne, F. (2005). The upper limit for the exponent of Taylor’s power law is a consequence 333 of deterministic population growth. Evolutionary Ecology Research 7(8), 1213-1220. 334 Bartlett, M. S. 1936 Some notes on insecticide tests in the laboratory and in the field. 335 Supplement to the Journal of the Royal Statistical Society 3(2):185-194. Stable URL: 336 http://www.jstor.org/stable/2983670 337 M. S. Bartlett 1947 The use of transformations. Biometrics 3(1):39-52 Stable URL: 338 http://www.jstor.org/stable/3001536 339 Bechinski, E. J. and Pedigo, L. P. (1981). Population dispersion and development of sampling 340 plans for Orius insidiosus and Nabis spp. in soybeans. Environmental Entomology 10(6), 956- 341 959. 21 342 Cohen, J. E. (2013). Taylor's power law of fluctuation scaling and the growth-rate theorem. 343 Theoretical Population Biology doi: 10.1016/j.tpb.2013.04.002. 344 Cohen, J. E., Xu, M. and Schuster, W. S. F. (2013a). Stochastic multiplicative population growth 345 predicts and interprets Taylor’s power law of fluctuation scaling. Proceedings of the Royal 346 Society B, Biological Sciences 280, 20122955. 347 Cohen, J. E., Xu, M. and Brunborg, H. (2013b). Taylor's law applies to spatial variation in a 348 human population. Genus 69(1), 25-60. 349 Cohen, J. E., Xu, M. and Schuster, W. S. F. (2012). Allometric scaling of population variance 350 with mean body size is predicted from Taylor’s law and density-mass allometry. Proceedings of 351 the National Academy of Sciences, USA 109(39), 15829–15834. 352 Davidian, M. and Carroll, R. J. 1987 Variance function estimation. Journal of the American 353 Statistical Association 82(400):1079-1091, December. Stable URL: 354 http://www.jstor.org/stable/2289384 he 355 Editorial [probably K. Pearson, then the editor] (1903). On the probable errors of frequency 356 constants. Biometrika. 2(3), 273-281. Stable URL: http://www.jstor.org/stable/2331603. 357 Editorial [probably K. Pearson, then the editor] (1913). On the probable errors of frequency 358 constants. Biometrika. 9(1/2), 1-10. Stable URL: http://www.jstor.org/stable/2331796. 359 Eisler, Z., Bartos, I. and Kertész, J. (2008). Fluctuation scaling in complex systems: Taylor’s law 360 and beyond. Advances in Physics 57, 89-142. 22 361 Hosmer, D. W., Lemeshow, S. and May S. (2008). Applied Survival Analysis: Regression 362 Modeling of Time-to-Event Data, 2nd Edition, John Wiley & Sons, Inc. 363 Jørgensen, B., Kendal, W. S., Demétrio, C. G. B. and Holst, R. (Draft 2012). The ecological 364 footprint of Taylor’s universal power law. Draft. 365 Jørgensen, B. (1997). The Theory of Dispersion Models. Chapman & Hall, London. 366 Kaltz, O., Escobar-Páramo, P., Hochberg, M. E. and Cohen, J. E. (2012). Bacterial microcosms 367 obey Taylor's law: effects of abiotic and biotic stress and genetics on mean and variance of 368 population density. Ecological Processes, 1:5. 369 Kendal, W.S. and Jørgensen, B. (2011). Taylor's power law and fluctuation scaling explained by 370 a central-limit-like convergence. Physical Review E 83, 066115. 371 Kilpatrick, A. M. and Ives, A. R. (2003). Species interactions can explain Taylor’s power law for 372 ecological time series. Nature 422, 65-68. 373 Kogan, M., Ruesink, W. G. and McDowell, K. (1974). Spatial and temporal distribution patterns 374 of the bean leaf beetle, Cerotoma trifurcata (Forster), on soybeans in Illinois. Environmental 375 Entomology 3(4), 607-617. 376 Loève, M. (1977). Probability Theory I, 4th edition, New York, Springer-Verlag. 377 Neter, J., Wasserman, W. and Kutner, M. H. (1990). Applied Linear Statistical Models, 3rd 378 edition, 622-623.CRC Press. 23 379 Neyman, J. (1925). Contributions to the theory of small samples drawn from a finite population. 380 Biometrika 17(3/4), 472-479. Stable URL: http://www.jstor.org/stable/2332092. 381 Neyman, J. (1926). On the correlation of the mean and the variance in samples drawn from an 382 "infinite" population. Biometrika 18(3/4), 401-413. Stable URL: 383 http://www.jstor.org/stable/2331958. 384 Oehlert, G. W. (1992). A note on the delta method. The American Statistician 46(1), 27–29. 385 Ramsayer, J., Fellous, S., Cohen, J. E. and Hochberg, M. E. (2012). Taylor’s law holds in 386 experimental bacterial populations but competition does not influence the slope. Biology Letters 387 8(2), 316-319. 388 Rohatgi, V. K. and Székely, G. J. (1989). Sharp inequalities between skewness and kurtosis. 389 Statistics & Probability Letters 8, 297-299. 390 SAS Institute Inc. 2012. JMP® 10 Basic Analysis and Graphing. Cary, NC: SAS Institute Inc. 391 Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods. 7th ed. Iowa State University 392 Press, Ames. 393 Taylor, L. R. (1961). Aggregation, variance and the mean. Nature 189(4766), 732–735. 394 Taylor, L. R. 1984 Assessing and interpreting the spatial distributions of insect populations. 395 Annual Review of Entomology 29: 321-357. doi:10.1146/annurev.en.29.010184.001541 396 Tweedie, M. C. K. 1946 The regression of the sample variance on the sample mean. J. Lond. 397 Math. Soc. 21:22–28. doi:10.1112/jlms/s1-21.1.22 24 398 Tweedie, M. C. K. 1947 Functions of a statistical variate with given means, with special 399 reference to Laplacian distributions. Proc. Cambridge Philos. Soc. 43:41–49. 400 doi:10.1017/S0305004100023185 401 Tweedie, M. C. K. 1984 An index which distinguishes between some important exponential 402 families. In: Statistics: Applications and New Directions. Proceedings of the Indian Statistical 403 Institute Golden Jubilee International Conference, J. K. Ghosh and J. Roy, eds. Indian Statistical 404 Institute, Calcutta. Pp. 579-604. 405 Wilson, L. T., Sterling, W. L., Rummel, D. R. and DeVay, J. E. (1989). Quantitative sampling 406 principles in cotton. Ch. 5 In Integrated Pest Management Systems and Cotton Production (eds 407 R. E. Frisbie, K. M. El-Zik, and L. T. Wilson) pp. 85-119. John Wiley & Sons, Inc. 408 Zhang, L. (2007). Sample mean and sample variance: their covariance and their (in)dependence. 409 The American Statistician 61(2), 159-160. 25