Chapter 5: Some Elementary Statistical Inferences

5.1 Sampling and Statistics

Definition
• (Random Sample) The random variables X_1, ..., X_n constitute a random sample on a random variable X if X_1, ..., X_n are iid with the same distribution as that of X.
• (Statistic) Any function of the observations is called a statistic.

If X_1, ..., X_n is a random sample with common PDF (or PMF) f(x) and CDF F(x), then the joint PDF (or PMF) is

    f_{X_1,...,X_n}(x_1, ..., x_n) = ∏_{i=1}^n f(x_i)

and the joint CDF is

    F_{X_1,...,X_n}(x_1, ..., x_n) = ∏_{i=1}^n F(x_i).

In addition, if a parameter θ is contained in f(x) so that we can write f(x) = f_θ(x), then the likelihood function is defined through the joint PDF (or PMF) as

    L(θ) = ∏_{i=1}^n f_θ(x_i).

Example: Let X_1, ..., X_n be a random sample with PDF f_θ(x). Assume f_θ(x) is the PDF of N(µ, σ²), where θ = (µ, σ²). Consider the quantities

    3,  n,  X̄,  S²,  ∑_{i=1}^n (X_i − µ)²/n,  X̄/σ,  ∑_{i=1}^n X_i⁴.

(a) If both µ and σ² are unknown, determine which of them are statistics.
Answer: 3, n, X̄, S², and ∑_{i=1}^n X_i⁴ are statistics.
(b) If µ is known but σ² is not, which of those in part (a) are statistics?
Answer: 3, n, X̄, S², ∑_{i=1}^n (X_i − µ)²/n, ∑_{i=1}^n X_i⁴.
(c) If σ is known but µ is not, which of those in part (a) are statistics?
Answer: 3, n, X̄, S², X̄/σ, ∑_{i=1}^n X_i⁴.

5.2 Order Statistics

• Let X_1, ..., X_n be an iid random sample from a distribution with PDF (or PMF) f(x), and let X_(i) be the i-th smallest value among X_1, ..., X_n. If the distribution is continuous, then the joint PDF of X_(1), ..., X_(n) is

    g(y_1, ..., y_n) = n! ∏_{i=1}^n f(y_i),   if y_1 ≤ y_2 ≤ ... ≤ y_n.

• Let F(x) be the CDF of X. Then F^{−1}(p) for 0 < p < 1 is the quantile function of X; F^{−1}(0.5) is the median.
• Sample quantile: X_([pn]) is the p-th sample quantile, or 100p-th percentile, of the sample.
• X_([n/2]) is the sample median.

Examples:
• Example 5.2.4.
• If f(x) is the density of Uniform[0, 1], then the joint PDF of the order statistics X_(1), ..., X_(n) is

    g(y_1, ..., y_n) = n!,   if 0 ≤ y_1 ≤ y_2 ≤ ... ≤ y_n ≤ 1.

This is called the Dirichlet distribution. In addition, the marginal PDF of X_(k) is

    g_{X_(k)}(x) = [n! / ((k − 1)!(n − k)!)] x^{k−1}(1 − x)^{n−k},

which is the density of Beta(k, n − k + 1).
• Let W_1, ..., W_{n+1} be iid Exp(1). In the above example it can further be seen that the joint density of (X_(k1), X_(k2)) with 1 ≤ k1 < k2 ≤ n is the same as the joint density of

    ( ∑_{i=1}^{k1} W_i / ∑_{i=1}^{n+1} W_i ,  ∑_{i=1}^{k2} W_i / ∑_{i=1}^{n+1} W_i ).

We can use this to derive the asymptotic joint distribution of order statistics.
• In general, if f(x) is an arbitrary density with CDF F, we can use the transformation Y_i = F^{−1}(X_i) and Y_(i) = F^{−1}(X_(i)), where X_1, ..., X_n are the iid Uniform[0, 1] samples above. From this we can also analytically derive the joint asymptotic distributions of order statistics.
• Find the asymptotic distribution of the sample median. Derive the formula for a confidence interval.
• Find the limiting distribution of the p-th sample quantile. Derive the formula for a confidence interval.

5.4 More on Confidence Intervals

One sample case.

Independent normal sample. If X_1, ..., X_n are iid N(µ, σ²), then the (1 − α)100% confidence interval for µ is

    x̄ ± t_{α/2, n−1} S/√n

and the (1 − α)100% confidence interval for σ² is

    [ (n − 1)S² / χ²_{α/2, n−1} ,  (n − 1)S² / χ²_{1−α/2, n−1} ].

If σ is known, then the (1 − α)100% confidence interval for µ is

    x̄ ± z_{α/2} σ/√n.
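To make the one-sample formulas concrete, here is a minimal computational sketch in Python (assuming numpy and scipy are available; the helper name normal_cis is ours, not from the notes). Note that χ²_{α/2, n−1} above denotes the upper α/2 critical value, so the larger chi-square quantile appears in the lower endpoint for σ².

import numpy as np
from scipy import stats

def normal_cis(x, alpha=0.05, sigma=None):
    # Returns (CI for mu, CI for sigma^2) from an iid N(mu, sigma^2) sample x.
    x = np.asarray(x, dtype=float)
    n, xbar, s2 = len(x), x.mean(), x.var(ddof=1)
    if sigma is None:
        # sigma unknown: xbar +/- t_{alpha/2, n-1} * S / sqrt(n)
        half = stats.t.ppf(1 - alpha / 2, df=n - 1) * np.sqrt(s2 / n)
    else:
        # sigma known: xbar +/- z_{alpha/2} * sigma / sqrt(n)
        half = stats.norm.ppf(1 - alpha / 2) * sigma / np.sqrt(n)
    mu_ci = (xbar - half, xbar + half)
    # sigma^2: [(n-1)S^2 / chi2_{alpha/2, n-1}, (n-1)S^2 / chi2_{1-alpha/2, n-1}],
    # where chi2_{alpha/2, n-1} is the upper alpha/2 critical value
    var_ci = ((n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1),
              (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1))
    return mu_ci, var_ci

rng = np.random.default_rng(0)
print(normal_cis(rng.normal(loc=5.0, scale=2.0, size=30)))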
iid sample when n is large. If X_1, ..., X_n are iid with common mean µ and variance σ², and n is large (e.g. n > 40), then the (1 − α)100% confidence interval for µ is

    x̄ ± z_{α/2} S/√n

and the confidence interval for σ² is the same as in the normal-sample case.

Binomial proportions. If we observe X ~ Bin(n, p), then the (1 − α)100% confidence interval for p is

    p̂ ± z_{α/2} √( p̂(1 − p̂)/n ).

Two sample case.

Large sample case. Suppose we observe X_1, ..., X_{n1} iid with common mean µ1 and variance σ1², and Y_1, ..., Y_{n2} iid with common mean µ2 and variance σ2². If both n1 and n2 are large, then the (1 − α)100% (large sample) confidence interval for µ1 − µ2 is

    x̄ − ȳ ± z_{α/2} √( s1²/n1 + s2²/n2 ).

Pooled t case. Suppose we observe X_1, ..., X_{n1} ~iid N(µ1, σ²) and Y_1, ..., Y_{n2} ~iid N(µ2, σ²). Let

    S_p² = [ (n1 − 1)S1² + (n2 − 1)S2² ] / (n1 + n2 − 2).

Then the (1 − α)100% pooled t confidence interval for µ1 − µ2 is

    x̄ − ȳ ± t_{α/2, n1+n2−2} s_p √( 1/n1 + 1/n2 ).

Q: What is the answer when σ1² ≠ σ2²?
Q: What is the confidence interval for σ1²/σ2² if we have two-sample normal data?

Two-sample binomial for the difference in proportions. If we observe X ~ Bin(n1, p1) and Y ~ Bin(n2, p2), then the (1 − α)100% confidence interval for p1 − p2 is

    p̂1 − p̂2 ± z_{α/2} √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 ).

5.5 Introduction to Hypothesis Testing

Assume the PDF (or PMF) is f(x; θ), θ ∈ Ω. Assume Ω0 ∪ Ω1 = Ω and Ω0 ∩ Ω1 = ∅. Suppose we consider the hypotheses

    H0: θ ∈ Ω0  versus  H1: θ ∈ Ω1.

We will draw conclusions based on observations. Look at the following 2 × 2 table.

                            Truth
    Conclusion        H0                H1
    Accept H0         Correct           Type II Error
    Reject H0         Type I Error      Correct

We call P(Reject H0 | H0) the type I error probability and P(Accept H0 | H1) the type II error probability. The maximum type I error probability is called the significance level, usually denoted by α. That is,

    α = max_{θ ∈ Ω0} P(Reject H0 | θ).

The power function of a test is defined by P(Reject H0 | θ), which is a function of θ. Usually, when we construct a test, we require that α be controlled at a given value. Typically, a test is based on a rejection region C for the observed value of a statistic T; that is, we reject H0 if T ∈ C and accept H0 if T ∉ C.

Please understand the above concepts through the following examples:
• Suppose X_1, ..., X_n are iid N(µ, 1). The test is (a) H0: µ ≤ 0 ↔ H1: µ > 0, or (b) H0: µ ≥ 0 ↔ H1: µ < 0, or (c) H0: µ ∈ [µ1, µ2] ↔ H1: µ ∉ [µ1, µ2].
• Suppose X ~ Bin(n, p). The test is (a) H0: p ≤ p0 ↔ H1: p > p0, or (b) H0: p ≥ p0 ↔ H1: p < p0, or (c) H0: p ∈ [p1, p2] ↔ H1: p ∉ [p1, p2].

Connection between confidence interval and test. We can reject H0: θ = θ0 if θ0 is not contained in the confidence interval for θ.

5.6 Additional Comments About Statistical Tests

We will focus on the following examples:

Example 5.6.1: Let X_1, ..., X_n be an iid sample with mean µ and variance σ². Test H0: µ = µ0 ↔ H1: µ ≠ µ0. Derive the power function under (a) n large; (b) X_i normal samples.

Example 5.6.2: Assume X_1, ..., X_{n1} are iid N(µ1, σ²) and Y_1, ..., Y_{n2} are iid N(µ2, σ²). Test H0: µ1 = µ2 ↔ H1: µ1 ≠ µ2. Derive the power function.

Example 5.6.3: Suppose X_1, ..., X_n are iid Bernoulli(p). Test H0: p = p0 ↔ H1: p ≠ p0 by (a) the exact binomial method, and (b) the normal approximation method. Approximately derive the power function of (b); a numerical sketch follows.
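As a sketch for part (b) of Example 5.6.3: the test rejects when |p̂ − p0| / √(p0(1 − p0)/n) ≥ z_{α/2}, and under the true p we may approximate p̂ ≈ N(p, p(1 − p)/n). The code below evaluates the resulting approximate power; the function name approx_power and the chosen p0, n values are illustrative, not from the notes.

import numpy as np
from scipy import stats

def approx_power(p, p0=0.5, n=100, alpha=0.05):
    # gamma(p) = P_p(reject H0) for the two-sided normal-approximation test.
    z = stats.norm.ppf(1 - alpha / 2)
    se0 = np.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    se = np.sqrt(p * (1 - p) / n)      # standard error under the true p
    lower = stats.norm.cdf((p0 - z * se0 - p) / se)   # P(phat below lower cutoff)
    upper = stats.norm.sf((p0 + z * se0 - p) / se)    # P(phat above upper cutoff)
    return lower + upper

for p in [0.5, 0.55, 0.6, 0.7]:
    print(f"p = {p:.2f}: power ~ {approx_power(p):.3f}")

At p = p0 the function returns approximately α, as the power function should at the null value.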
Example 5.6.4: Suppose X_1, ..., X_{10} are an iid sample from Poisson(θ). Suppose we reject H0: θ ≤ 0.1 in favor of H1: θ > 0.1 if

    Y = ∑_{i=1}^{10} X_i ≥ 3.

Find the type I error probability, the type II error probability, and the significance level.

Example 5.6.5: Let X_1, ..., X_{25} be an iid sample from N(µ, 4). Consider the test H0: µ ≥ 77 ↔ H1: µ < 77. Suppose we observe x̄ = 76.1. Then the p-value is

    P_{µ=77}(X̄ ≤ 76.1) = Φ( (76.1 − 77) / √(4/25) ) = Φ(−2.25) = 0.012.

Remark: The observed significance level is called the p-value; it is the probability, under the null hypothesis, that the test statistic is at least as extreme as the observed value. For example, in Example 5.6.4 we have Y ~ Poisson(10θ), so the p-value is P(Poisson(1) ≥ y).

5.7 Chi-Square Tests

Consider a test H0: θ ∈ Θ0 ↔ H1: θ ∈ Θ1. Suppose under H0 we estimate µi = E(Xi) by µ̂i and we estimate σi² = V(Xi) by σ̂i².

Pearson χ² statistic. The Pearson χ² statistic for independent random samples is generally defined by

    X² = ∑_{i=1}^n (Xi − µ̂i)² / σ̂i².

The idea comes from independent normal samples: if X_1, ..., X_n are independent with Xi ~ N(µi, σi²) respectively, then

    ∑_{i=1}^n (Xi − µi)² / σi² ~ χ²_n.

Loglikelihood ratio statistic. Let L(θ) be the likelihood function. Then the loglikelihood ratio statistic is defined by

    Λ = 2 log[ sup_{θ∈Θ} L(θ) / sup_{θ∈Θ0} L(θ) ] = 2[ log sup_{θ∈Θ} L(θ) − log sup_{θ∈Θ0} L(θ) ].

We can show that both X² and Λ are approximately chi-square distributed. In general, we call X² the Pearson goodness-of-fit statistic and Λ the deviance goodness-of-fit statistic. In particular, their degrees of freedom equal the difference between the degrees of freedom of Θ and Θ0. Let us try to understand X² in the following examples; we will look at Λ in detail in Chapter 6.

Example 5.7.1: Suppose we roll a die n times. Let Xi be the number observed on the i-th roll. Find the Pearson χ² statistic X².

Example 5.7.2: Suppose we have samples X_1, ..., X_n from a distribution taking values in [0, 1]. How do we find the Pearson χ² statistic X² to test whether the distribution is uniform? Suppose we partition [0, 1] into the four intervals [0, 1/4], (1/4, 1/2], (1/2, 3/4], and (3/4, 1]; a computational sketch follows.
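A minimal sketch of the computation in Example 5.7.2, assuming scipy is available (variable names are ours): bin the sample into the four intervals, compare observed and expected counts, and refer X² to a χ² distribution with 4 − 1 = 3 degrees of freedom, since no parameters are estimated under H0.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(size=200)            # sample whose uniformity we test

edges = [0, 0.25, 0.5, 0.75, 1.0]    # the four-interval partition of [0, 1]
observed, _ = np.histogram(x, bins=edges)
expected = np.full(4, len(x) / 4)    # under uniformity, n/4 per cell

# X^2 = sum_j (O_j - E_j)^2 / E_j, approximately chi-square(3) under H0
x2 = ((observed - expected) ** 2 / expected).sum()
pval = stats.chi2.sf(x2, df=3)
print(x2, pval)

# scipy computes the same statistic and p-value directly:
print(stats.chisquare(observed, expected))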