SAMPLE STATISTICS

A random sample $x_1, x_2, \ldots, x_n$ from a distribution $f(x)$ is a set of independently and identically distributed variables with $x_i \sim f(x)$ for all $i$. Their joint p.d.f. is
\[ f(x_1, x_2, \ldots, x_n) = f(x_1)f(x_2)\cdots f(x_n) = \prod_{i=1}^{n} f(x_i). \]
The sample moments provide estimates of the moments of $f(x)$. We need to know how they are distributed.

The mean $\bar{x}$ of a random sample is an unbiased estimate of the population moment $\mu = E(x)$, since
\[ E(\bar{x}) = E\Bigl(\frac{1}{n}\sum_i x_i\Bigr) = \frac{1}{n}\sum_i E(x_i) = \frac{n\mu}{n} = \mu. \]
The variance of a sum of independent variables is the sum of their variances, since the covariances are zero. Therefore
\[ V(\bar{x}) = V\Bigl(\frac{1}{n}\sum_i x_i\Bigr) = \frac{1}{n^2}\sum_i V(x_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}. \]
Observe that $V(\bar{x}) \to 0$ as $n \to \infty$. Since $E(\bar{x}) = \mu$, the estimates become increasingly concentrated around the true population parameter. Such an estimate is said to be consistent.

The sample variance $s^2 = n^{-1}\sum_i(x_i - \bar{x})^2$ is not an unbiased estimate of $\sigma^2 = V(x)$, since
\begin{align*}
E(s^2) &= E\Bigl\{\frac{1}{n}\sum_i (x_i - \bar{x})^2\Bigr\} \\
&= E\Bigl[\frac{1}{n}\sum_i \bigl\{(x_i - \mu) + (\mu - \bar{x})\bigr\}^2\Bigr] \\
&= E\Bigl[\frac{1}{n}\sum_i \bigl\{(x_i - \mu)^2 + 2(x_i - \mu)(\mu - \bar{x}) + (\mu - \bar{x})^2\bigr\}\Bigr] \\
&= V(x) - 2E\{(\bar{x} - \mu)^2\} + E\{(\bar{x} - \mu)^2\} \\
&= V(x) - V(\bar{x}).
\end{align*}
Here, we have used the result that
\[ E\Bigl\{\frac{1}{n}\sum_i (x_i - \mu)(\mu - \bar{x})\Bigr\} = -E\{(\mu - \bar{x})^2\} = -V(\bar{x}). \]
It follows that
\[ E(s^2) = V(x) - V(\bar{x}) = \sigma^2 - \frac{\sigma^2}{n} = \sigma^2\frac{(n-1)}{n}. \]
Therefore, $s^2$ is a biased estimator of the population variance. For an unbiased estimate, we should use
\[ \hat{\sigma}^2 = s^2\frac{n}{n-1} = \frac{\sum_i(x_i - \bar{x})^2}{n-1}. \]
However, $s^2$ is still a consistent estimator, since $E(s^2) \to \sigma^2$ as $n \to \infty$ and also $V(s^2) \to 0$. The value of $V(s^2)$ depends on the distribution of the underlying population, which is often assumed to be normal.

Theorem. Let $x_1, x_2, \ldots, x_n$ be a random sample from the normal population $N(\mu, \sigma^2)$. Then $y = \sum_i a_i x_i$ is normally distributed with
\[ E(y) = \sum_i a_i E(x_i) = \mu\sum_i a_i \quad\text{and}\quad V(y) = \sum_i a_i^2 V(x_i) = \sigma^2\sum_i a_i^2. \]
Any linear function of a set of normally distributed variables is normally distributed. If $x_i \sim N(\mu, \sigma^2)$; $i = 1, \ldots, n$ is a normal random sample, then $\bar{x} \sim N(\mu, \sigma^2/n)$.

Let $\mu = [\mu_1, \mu_2, \ldots, \mu_n]' = E(x)$ be the expected value of $x = [x_1, x_2, \ldots, x_n]'$ and let $\Sigma = [\sigma_{ij};\, i, j = 1, 2, \ldots, n]$ be the variance--covariance matrix. If $a = [a_1, a_2, \ldots, a_n]'$ is a constant vector, then $a'x \sim N(a'\mu, a'\Sigma a)$ is normally distributed with a mean of
\[ E(a'x) = a'\mu = \sum_i a_i\mu_i \]
and a variance of
\[ V(a'x) = a'\Sigma a = \sum_i\sum_j a_i a_j \sigma_{ij} = \sum_i a_i^2\sigma_{ii} + \sum_i\sum_{j \neq i} a_i a_j \sigma_{ij}. \]

Let $\iota = [1, 1, \ldots, 1]'$. Then, if $x = [x_1, x_2, \ldots, x_n]'$ has $x_i \sim N(\mu, \sigma^2)$ for all $i$, there is $x \sim N(\mu\iota, \sigma^2 I_n)$, where $\mu\iota = [\mu, \mu, \ldots, \mu]'$ and $I_n$ is an identity matrix of order $n$. Writing this explicitly, we have
\[
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\sim N\left( \begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{bmatrix},
\begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} \right).
\]
Then, there is
\[ \bar{x} = (\iota'\iota)^{-1}\iota'x = \frac{1}{n}\iota'x \sim N(\mu, \sigma^2/n) \]
and
\[ V(\bar{x}) = \frac{1}{n^2}\iota'\{\sigma^2 I\}\iota = \sigma^2\frac{\iota'\iota}{n^2} = \frac{\sigma^2}{n}, \]
where we have used repeatedly the result that $\iota'\iota = n$.

If we do not know the form of the distribution from which the sample has been taken, we can still say that, under very general conditions, the distribution of $\bar{x}$ tends to normality as $n \to \infty$:

The Central Limit Theorem states that, if $x_1, x_2, \ldots, x_n$ is a random sample from a distribution with mean $\mu$ and variance $\sigma^2$, then the distribution of $\bar{x}$ tends to the normal distribution $N(\mu, \sigma^2/n)$ as $n \to \infty$. Equivalently, $(\bar{x} - \mu)/(\sigma/\sqrt{n})$ tends in distribution to the standard normal $N(0, 1)$ distribution.
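These properties of $\bar{x}$ and $s^2$ are easy to check by simulation. The following Python sketch (the values of $\mu$, $\sigma$, $n$ and the number of trials are illustrative assumptions, not part of the notes) draws many samples and compares the averaged estimators with the theoretical values $\mu$, $\sigma^2/n$, $\sigma^2(n-1)/n$ and $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 5.0, 2.0, 10, 100_000  # illustrative settings

samples = rng.normal(mu, sigma, size=(trials, n))
xbar = samples.mean(axis=1)              # sample means, one per trial
s2 = samples.var(axis=1, ddof=0)         # biased estimator: divides by n
sigma2_hat = samples.var(axis=1, ddof=1) # unbiased estimator: divides by n - 1

print(xbar.mean())        # ~ mu: the sample mean is unbiased
print(xbar.var())         # ~ sigma**2 / n: variance of the sample mean
print(s2.mean())          # ~ sigma**2 * (n - 1) / n: bias of s^2
print(sigma2_hat.mean())  # ~ sigma**2: the corrected estimator is unbiased
```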
Distribution of the sample variance

To describe the distribution of the sample variance, we need to define the chi-square distribution:

Definition. If $z \sim N(0, 1)$ is distributed as a standard normal variable, then $z^2 \sim \chi^2(1)$ is distributed as a chi-square variate with one degree of freedom. If $z_i \sim N(0, 1)$; $i = 1, 2, \ldots, n$ are a set of independent and identically distributed standard normal variates, then the sum of their squares is a chi-square variate of $n$ degrees of freedom:
\[ u = \sum_{i=1}^{n} z_i^2 \sim \chi^2(n). \]
The mean and the variance of the $\chi^2(n)$ variate are $E(u) = n$ and $V(u) = 2n$ respectively.

Theorem. The sum of two independent chi-square variates is a chi-square variate with degrees of freedom equal to the sum of the degrees of freedom of its additive components. If $x \sim \chi^2(n)$ and $y \sim \chi^2(m)$ are independent, then $(x + y) \sim \chi^2(m + n)$.

If $x_1, x_2, \ldots, x_n$ is a random sample from a standard normal $N(0, 1)$ distribution, then $\sum_i x_i^2 \sim \chi^2(n)$. If $x_1, x_2, \ldots, x_n$ is a random sample from an $N(\mu, \sigma^2)$ distribution, then $(x_i - \mu)/\sigma \sim N(0, 1)$ and, therefore,
\[ \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2} \sim \chi^2(n). \]

Consider the identity
\begin{align*}
\sum_i (x_i - \mu)^2 &= \sum_i \bigl(\{x_i - \bar{x}\} + \{\bar{x} - \mu\}\bigr)^2 \\
&= \sum_i \{x_i - \bar{x}\}^2 + 2\{\bar{x} - \mu\}\sum_i \{x_i - \bar{x}\} + n\{\bar{x} - \mu\}^2 \\
&= \sum_i \{x_i - \bar{x}\}^2 + n\{\bar{x} - \mu\}^2,
\end{align*}
which follows from the fact that the cross-product term vanishes: $\sum_i \{x_i - \bar{x}\} = 0$. This decomposition of a sum of squares features in the following result:

The Decomposition of a Chi-square Statistic. If $x_1, x_2, \ldots, x_n$ is a random sample from a normal $N(\mu, \sigma^2)$ distribution, then
\[ \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma^2} + n\frac{(\bar{x} - \mu)^2}{\sigma^2}, \]
with
\[ (1)\quad \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2} \sim \chi^2(n), \qquad
(2)\quad \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma^2} \sim \chi^2(n - 1), \qquad
(3)\quad n\frac{(\bar{x} - \mu)^2}{\sigma^2} \sim \chi^2(1), \]
where the statistics under (2) and (3) are independently distributed.

Sampling Distributions

(1) If $u \sim \chi^2(m)$ and $v \sim \chi^2(n)$ are independent chi-square variates with $m$ and $n$ degrees of freedom respectively, then the ratio of the chi-squares divided by their respective degrees of freedom,
\[ F = \frac{u/m}{v/n} \sim F(m, n), \]
has an $F$ distribution of $m$ and $n$ degrees of freedom, denoted by $F(m, n)$.

(2) If $z \sim N(0, 1)$ is a standard normal variate, if $v \sim \chi^2(n)$ is a chi-square variate of $n$ degrees of freedom, and if the two variates are distributed independently, then the ratio
\[ t = \frac{z}{\sqrt{v/n}} \sim t(n) \]
has a $t$ distribution of $n$ degrees of freedom, denoted $t(n)$. Notice that
\[ t^2 = \frac{z^2}{v/n} \sim \frac{\chi^2(1)/1}{\chi^2(n)/n} = F(1, n). \]

CONFIDENCE INTERVALS

Let $z \sim N(0, 1)$. From the tables of the standard normal, we can find numbers $a, b$ such that, for any $Q \in (0, 1)$, there is $P(a \le z \le b) = Q$. The interval $[a, b]$ is called a $Q \times 100\%$ confidence interval for $z$. The length of the interval is minimised when it is centred on $E(z) = 0$.

A confidence interval for the mean. Let $x_i \sim N(\mu, \sigma^2)$; $i = 1, \ldots, n$ be a random sample. Then
\[ \bar{x} \sim N\Bigl(\mu, \frac{\sigma^2}{n}\Bigr) \quad\text{and}\quad \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1). \]
Therefore, we can find numbers $\pm\beta$ such that
\[ P\Bigl(-\beta \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le \beta\Bigr) = Q. \]
But the following events are equivalent:
\[ -\beta \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le \beta \iff \bar{x} - \beta\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \beta\frac{\sigma}{\sqrt{n}}. \]
The probability that the interval $[\bar{x} - \beta\sigma/\sqrt{n},\, \bar{x} + \beta\sigma/\sqrt{n}]$ falls over $\mu$ is
\[ P\Bigl(\bar{x} - \beta\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \beta\frac{\sigma}{\sqrt{n}}\Bigr) = Q, \]
which means that we are $Q \times 100\%$ confident that $\mu$ lies in the interval.
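As a concrete illustration of the interval just derived, the sketch below computes the $Q \times 100\%$ interval $[\bar{x} - \beta\sigma/\sqrt{n},\, \bar{x} + \beta\sigma/\sqrt{n}]$, assuming $\sigma$ is known. The helper name `mean_ci_known_sigma` and the sample settings are hypothetical; the quantile $\beta$ is taken from the standard normal via `scipy.stats.norm`.

```python
import numpy as np
from scipy.stats import norm

def mean_ci_known_sigma(x, sigma, Q=0.95):
    """Q*100% confidence interval for mu when sigma is known."""
    n = len(x)
    xbar = np.mean(x)
    beta = norm.ppf((1 + Q) / 2)       # so that P(-beta <= z <= beta) = Q
    half = beta * sigma / np.sqrt(n)   # half-width of the interval
    return xbar - half, xbar + half

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, size=25)          # illustrative sample: mu=5, sigma=2
print(mean_ci_known_sigma(x, sigma=2.0))   # covers mu with probability Q
```

Over repeated samples, the interval returned by this function covers the true $\mu$ in a fraction $Q$ of cases, which is exactly the probability statement above.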
A confidence interval for $\mu$ when $\sigma^2$ is unknown. The unbiased estimate of $\sigma^2$ is $\hat{\sigma}^2 = \sum(x_i - \bar{x})^2/(n - 1)$. When $\sigma^2$ is replaced by $\hat{\sigma}^2$,
\[ \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \sim N(0, 1) \quad\text{is replaced by}\quad \frac{\sqrt{n}(\bar{x} - \mu)}{\hat{\sigma}} \sim t(n - 1). \]
To demonstrate this result, consider writing
\[ \frac{\sqrt{n}(\bar{x} - \mu)}{\hat{\sigma}} = \frac{\sqrt{n}(\bar{x} - \mu)}{\sigma}\Bigg/\sqrt{\frac{\sum(x_i - \bar{x})^2}{\sigma^2(n - 1)}}, \]
and observe that $\sigma$ is cancelled from the numerator and the denominator. The denominator contains $\sum(x_i - \bar{x})^2/\sigma^2 \sim \chi^2(n - 1)$. The numerator is a standard normal variate. Therefore
\[ \frac{N(0, 1)}{\sqrt{\chi^2(n - 1)/(n - 1)}} \sim t(n - 1). \]
To construct a confidence interval, the numbers $\pm\beta$ from the table of the $N(0, 1)$ distribution are replaced by corresponding numbers $\pm b$ from the $t(n - 1)$ table. Then
\[ P\Bigl(\bar{x} - b\frac{\hat{\sigma}}{\sqrt{n}} \le \mu \le \bar{x} + b\frac{\hat{\sigma}}{\sqrt{n}}\Bigr) = Q. \]

A confidence interval for the difference between two means. Imagine a treatment that affects the mean of a normal population without affecting its variance. To establish a confidence interval for the change in the mean, we take samples before and after the treatment. Before treatment, there is
\[ x_i \sim N(\mu_x, \sigma^2); \quad i = 1, \ldots, n \quad\text{and}\quad \bar{x} \sim N\Bigl(\mu_x, \frac{\sigma^2}{n}\Bigr), \]
and, after treatment, there is
\[ y_j \sim N(\mu_y, \sigma^2); \quad j = 1, \ldots, m \quad\text{and}\quad \bar{y} \sim N\Bigl(\mu_y, \frac{\sigma^2}{m}\Bigr). \]
Assuming that the samples are mutually independent, the difference between their means is
\[ (\bar{x} - \bar{y}) \sim N\Bigl(\mu_x - \mu_y, \frac{\sigma^2}{n} + \frac{\sigma^2}{m}\Bigr). \]
Hence
\[ \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{m}}} \sim N(0, 1). \]
If $\sigma^2$ were known, then, for any given value of $Q \in (0, 1)$, a number $\beta$ could be found from the $N(0, 1)$ table such that
\[ P\Biggl\{(\bar{x} - \bar{y}) - \beta\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}} \le \mu_x - \mu_y \le (\bar{x} - \bar{y}) + \beta\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}}\Biggr\} = Q, \]
giving a confidence interval for $\mu_x - \mu_y$.

Usually, $\sigma^2$ has to be estimated from the sample information. There are
\[ \frac{\sum(x_i - \bar{x})^2}{\sigma^2} \sim \chi^2(n - 1) \quad\text{and}\quad \frac{\sum(y_j - \bar{y})^2}{\sigma^2} \sim \chi^2(m - 1), \]
which are independent variates with expectations equal to the numbers of their degrees of freedom. The sum of independent chi-squares is itself a chi-square with degrees of freedom equal to the sum of those of its constituent parts. Therefore,
\[ \frac{\sum(x_i - \bar{x})^2 + \sum(y_j - \bar{y})^2}{\sigma^2} \sim \chi^2(n + m - 2) \]
has an expected value of $n + m - 2$, whence the unbiased estimate of the variance is
\[ \hat{\sigma}^2 = \frac{\sum(x_i - \bar{x})^2 + \sum(y_j - \bar{y})^2}{n + m - 2}. \]
If this estimate is used in place of the unknown value of $\sigma^2$, then we get
\[ \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\hat{\sigma}^2}{n} + \dfrac{\hat{\sigma}^2}{m}}}
= \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{m}}}\Bigg/\sqrt{\frac{\sum(x_i - \bar{x})^2 + \sum(y_j - \bar{y})^2}{\sigma^2(n + m - 2)}}
\sim \frac{N(0, 1)}{\sqrt{\dfrac{\chi^2(n + m - 2)}{n + m - 2}}} = t(n + m - 2), \]
which is the basis for a confidence interval.

A confidence interval for the variance. If $x_i \sim N(\mu, \sigma^2)$; $i = 1, \ldots, n$ is a random sample, then $\sum(x_i - \bar{x})^2/(n - 1)$ is an unbiased estimate of the variance and $\sum(x_i - \bar{x})^2/\sigma^2 \sim \chi^2(n - 1)$. Therefore, from the appropriate chi-square table, we can find numbers $\alpha$ and $\beta$ such that
\[ P\Bigl\{\alpha \le \frac{\sum(x_i - \bar{x})^2}{\sigma^2} \le \beta\Bigr\} = Q \]
for some chosen $Q \in (0, 1)$. From this, it follows that
\[ P\Bigl\{\frac{1}{\alpha} \ge \frac{\sigma^2}{\sum(x_i - \bar{x})^2} \ge \frac{1}{\beta}\Bigr\} = Q
\iff P\Bigl\{\frac{\sum(x_i - \bar{x})^2}{\beta} \le \sigma^2 \le \frac{\sum(x_i - \bar{x})^2}{\alpha}\Bigr\} = Q, \]
and the latter provides a confidence interval for $\sigma^2$.

A confidence interval for the ratio of two variances. Imagine a treatment that affects the variance of a normal population. It is possible that the mean is also affected. Let $x_i \sim N(\mu_x, \sigma_x^2)$; $i = 1, \ldots, n$ be a random sample taken from the population before treatment and let $y_j \sim N(\mu_y, \sigma_y^2)$; $j = 1, \ldots, m$ be a random sample taken after treatment. Then
\[ \frac{\sum(x_i - \bar{x})^2}{\sigma_x^2} \sim \chi^2(n - 1) \quad\text{and}\quad \frac{\sum(y_j - \bar{y})^2}{\sigma_y^2} \sim \chi^2(m - 1) \]
are independent chi-square variates, and hence
\[ F = \frac{\sum(x_i - \bar{x})^2}{\sigma_x^2(n - 1)}\Bigg/\frac{\sum(y_j - \bar{y})^2}{\sigma_y^2(m - 1)} \sim F(n - 1, m - 1). \]
It is possible to find numbers $\alpha$ and $\beta$ such that $P(\alpha \le F \le \beta) = Q$, where $Q \in (0, 1)$ is some chosen probability value. Given such values, we may make the following probability statement:
\[ P\Biggl\{\alpha\frac{(n - 1)\sum(y_j - \bar{y})^2}{(m - 1)\sum(x_i - \bar{x})^2} \le \frac{\sigma_y^2}{\sigma_x^2} \le \beta\frac{(n - 1)\sum(y_j - \bar{y})^2}{(m - 1)\sum(x_i - \bar{x})^2}\Biggr\} = Q. \]
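The chi-square and $F$ intervals above translate directly into code. In the sketch below, the helper names `variance_ci` and `variance_ratio_ci` and the data settings are hypothetical; $\alpha$ and $\beta$ are taken as equal-tail quantiles, a conventional but not unique choice consistent with the tables used in the notes.

```python
import numpy as np
from scipy.stats import chi2, f

def variance_ci(x, Q=0.95):
    """Q*100% equal-tail confidence interval for sigma^2."""
    n = len(x)
    ss = np.sum((x - np.mean(x)) ** 2)       # sum of squared deviations
    alpha = chi2.ppf((1 - Q) / 2, df=n - 1)  # lower chi-square quantile
    beta = chi2.ppf((1 + Q) / 2, df=n - 1)   # upper chi-square quantile
    return ss / beta, ss / alpha             # [ss/beta, ss/alpha] covers sigma^2

def variance_ratio_ci(x, y, Q=0.95):
    """Q*100% equal-tail confidence interval for sigma_y^2 / sigma_x^2."""
    n, m = len(x), len(y)
    ratio = (np.sum((y - np.mean(y)) ** 2) / (m - 1)) / \
            (np.sum((x - np.mean(x)) ** 2) / (n - 1))
    alpha = f.ppf((1 - Q) / 2, dfn=n - 1, dfd=m - 1)
    beta = f.ppf((1 + Q) / 2, dfn=n - 1, dfd=m - 1)
    return alpha * ratio, beta * ratio

rng = np.random.default_rng(2)
x = rng.normal(0.0, 2.0, size=20)  # illustrative pre-treatment sample
y = rng.normal(0.0, 3.0, size=25)  # illustrative post-treatment sample
print(variance_ci(x))              # interval for sigma_x^2
print(variance_ratio_ci(x, y))     # interval for sigma_y^2 / sigma_x^2
```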