MATH2411 Applied Statistics Tutorial Notes 3

Warm-up (Sample Mean and Sample Variance)
Referring to Example 2, eight students are randomly selected (with replacement) from MATH2411 T1A. Their midterm scores are recorded below:
34, 29, 23, 31, 34, 28, 32, 29

(a) Find the sample mean \(\bar{x}\).
\[ \bar{x} = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{34 + 29 + 23 + 31 + 34 + 28 + 32 + 29}{8} = \frac{240}{8} = 30 \]

(b) Find the sample variance \(s^2\).
\[ s^2 = \frac{\sum_{i=1}^{8} (x_i - \bar{x})^2}{8 - 1} = \frac{4^2 + (-1)^2 + (-7)^2 + 1^2 + 4^2 + (-2)^2 + 2^2 + (-1)^2}{7} = \frac{92}{7} \]

(c) Find the sample standard deviation \(s\).
\[ s = \sqrt{s^2} = \sqrt{\frac{92}{7}} = \frac{2\sqrt{161}}{7} \approx 3.6253 \]

Example 1 (Sample Mean and Sample Variance)
In a bag there are 3 balls of the same size, all made of the same material. One ball is white and the other two are black. Billy randomly chooses a ball from the bag. If the ball chosen is white, he gains $10; if it is black, he loses $2. Let $X denote the payoff of the game (single round).

(a) Find \(E(X)\).
\[ E(X) = 10\left(\frac{1}{3}\right) + (-2)\left(\frac{2}{3}\right) = \frac{6}{3} = 2 \]

(b) Find \(\operatorname{Var}(X)\).
\[ \operatorname{Var}(X) = E(X^2) - (E(X))^2 = 10^2\left(\frac{1}{3}\right) + (-2)^2\left(\frac{2}{3}\right) - 2^2 = \frac{108}{3} - 4 = 32 \]

Suppose Billy plays the game twice (choosing one ball with replacement each time). Let \(\bar{X}\) denote the average payoff of the two rounds.

(c) Complete the following table.

Outcome                                                                WW     WB     BW     BB
Prob. of outcome                                                       1/9    2/9    2/9    4/9
Round 1 payoff \(x_1\)                                                 10     10     −2     −2
Round 2 payoff \(x_2\)                                                 10     −2     10     −2
Sample mean \(\bar{x} = \frac{x_1 + x_2}{2}\)                          10     4      4      −2
Sample variance \(s^2 = \frac{\sum_{i=1}^{2}(x_i - \bar{x})^2}{2 - 1}\)    0      72     72     0

(d) Find \(E(\bar{X})\).
\[ E(\bar{X}) = \frac{1}{9}(10) + \frac{2}{9}(4) + \frac{2}{9}(4) + \frac{4}{9}(-2) = \frac{18}{9} = 2 \]

(e) Find \(\operatorname{Var}(\bar{X})\).
\[ \operatorname{Var}(\bar{X}) = E(\bar{X}^2) - (E(\bar{X}))^2 = \left[\frac{1}{9}(10^2) + \frac{2}{9}(4^2) + \frac{2}{9}(4^2) + \frac{4}{9}(-2)^2\right] - 2^2 = \frac{100 + 32 + 32 + 16}{9} - 4 = 20 - 4 = 16 \]

(f) Find \(E(s^2)\).
\[ E(s^2) = \frac{1}{9}(0) + \frac{2}{9}(72) + \frac{2}{9}(72) + \frac{4}{9}(0) = 32 \]

Exercise 1
In Example 1, suppose Billy plays the game 3 times (choosing one ball with replacement each time). Let M denote the average payoff of the 3 rounds.
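As a quick check of the warm-up arithmetic, here is a minimal sketch using Python's standard `statistics` module (the notes themselves do not use software). `statistics.variance` and `statistics.stdev` divide by n − 1, which matches the definition of \(s^2\) used above.

```python
import statistics

# Midterm scores of the eight sampled students (warm-up)
scores = [34, 29, 23, 31, 34, 28, 32, 29]

x_bar = statistics.mean(scores)   # sample mean: sum / n
s2 = statistics.variance(scores)  # sample variance: divides by n - 1
s = statistics.stdev(scores)      # sample standard deviation: sqrt(s2)

print(x_bar, s2, round(s, 4))
```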
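The table in Example 1(c) lists every two-round outcome with its probability. The sketch below automates that enumeration with exact fractions so that \(E(\bar{X})\), \(\operatorname{Var}(\bar{X})\) and \(E(s^2)\) can be checked without rounding. The names `PAYOFF` and `sampling_moments` are illustrative, not from the notes.

```python
from fractions import Fraction
from itertools import product

# Single-round payoff and probability from Example 1:
# white ball -> +10 with prob 1/3, black ball -> -2 with prob 2/3
PAYOFF = {"W": (10, Fraction(1, 3)), "B": (-2, Fraction(2, 3))}

def sampling_moments(n_rounds):
    """Exact E(X-bar), Var(X-bar) and E(s^2) for n_rounds independent draws
    with replacement, found by enumerating every outcome such as "WB"."""
    if n_rounds < 2:
        raise ValueError("s^2 needs at least two rounds")
    e_mean = e_mean_sq = e_s2 = Fraction(0)
    for outcome in product("WB", repeat=n_rounds):
        values = [PAYOFF[ball][0] for ball in outcome]
        prob = Fraction(1)
        for ball in outcome:
            prob *= PAYOFF[ball][1]          # independent rounds: multiply
        mean = Fraction(sum(values), n_rounds)
        s2 = sum((v - mean) ** 2 for v in values) / (n_rounds - 1)
        e_mean += prob * mean
        e_mean_sq += prob * mean ** 2
        e_s2 += prob * s2
    return e_mean, e_mean_sq - e_mean ** 2, e_s2

e_mean, var_mean, e_s2 = sampling_moments(2)
print(e_mean, var_mean, e_s2)
```

The results agree with parts (d) to (f): \(\operatorname{Var}(\bar{X}) = 16 = 32/2 = \operatorname{Var}(X)/n\) and \(E(s^2) = 32 = \operatorname{Var}(X)\).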
(g) Complete the following table.

Outcome                                                                  WWW    WWB    WBW    BWW    WBB    BWB    BBW    BBB
Prob. of outcome                                                         1/27   2/27   2/27   2/27   4/27   4/27   4/27   8/27
Round 1 payoff \(m_1\)                                                   10     10     10     −2     10     −2     −2     −2
Round 2 payoff \(m_2\)                                                   10     10     −2     10     −2     10     −2     −2
Round 3 payoff \(m_3\)                                                   10     −2     10     10     −2     −2     10     −2
Sample mean \(\bar{m} = \frac{m_1 + m_2 + m_3}{3}\)                      10     6      6      6      2      2      2      −2
Sample variance \(s^2 = \frac{\sum_{i=1}^{3}(m_i - \bar{m})^2}{3 - 1}\)     0      48     48     48     48     48     48     0

(h) Find \(E(M)\).
\[ E(M) = \frac{1}{27}(10) + 3\left[\frac{2}{27}(6)\right] + 3\left[\frac{4}{27}(2)\right] + \frac{8}{27}(-2) = \frac{10 + 36 + 24 - 16}{27} = \frac{54}{27} = 2 \]

(i) Find \(\operatorname{Var}(M)\).
\[ \operatorname{Var}(M) = E(M^2) - (E(M))^2 = \left\{\frac{1}{27}(10^2) + 3\left[\frac{2}{27}(6^2)\right] + 3\left[\frac{4}{27}(2^2)\right] + \frac{8}{27}(-2)^2\right\} - 2^2 = \frac{100 + 216 + 48 + 32}{27} - 4 = \frac{44}{3} - \frac{12}{3} = \frac{32}{3} \]

(j) Find \(E(s^2)\).
\[ E(s^2) = \frac{1}{27}(0) + 3\left[\frac{2}{27}(48)\right] + 3\left[\frac{4}{27}(48)\right] + \frac{8}{27}(0) = \frac{3(2 + 4)(48)}{27} = 32 \]

Example 2 (Central Limit Theorem)
Suppose that the amount of time a Hang Seng Bank HKUST Branch staff member takes to serve a customer is a random variable X with expectation E(X) = 3.2 minutes and standard deviation \(\sigma_X\) = 1.6 minutes. If a random sample of 64 customers is observed, find the probability that their mean time spent at the bank counter is at least 3.2 minutes but no more than 3.4 minutes.

\[ E(\bar{X}) = E(X) = 3.2, \qquad \sigma_{\bar{X}} = \sqrt{\operatorname{Var}(\bar{X})} = \sqrt{\frac{\operatorname{Var}(X)}{n}} = \sqrt{\frac{1.6^2}{64}} = \frac{1.6}{\sqrt{64}} = \frac{1}{5} \]
\[ \therefore P(3.2 \le \bar{X} \le 3.4) = P\left(\frac{3.2 - 3.2}{1/5} \le \frac{\bar{X} - 3.2}{1/5} \le \frac{3.4 - 3.2}{1/5}\right) \approx P(0 \le Z \le 1) = P(Z \le 1) - P(Z \le 0) \approx 0.8413 - 0.5 = 0.3413 \]

Exercise 2 (Central Limit Theorem)
If a certain machine makes electrical resistors with a mean resistance of 40 ohms and a standard deviation of 2 ohms, what is the probability that a random sample of 36 of these resistors will have a combined resistance of more than 1458 ohms?

Let X (in ohms) be the resistance of a resistor. Then
\[ E(\bar{X}) = E(X) = 40 \quad\text{and}\quad \sigma_{\bar{X}} = \sqrt{\frac{\operatorname{Var}(X)}{n}} = \sqrt{\frac{2^2}{36}} = \frac{1}{3} \]
\[ \therefore P(36\bar{X} > 1458) = P\left(\frac{\bar{X} - 40}{1/3} > \frac{1458/36 - 40}{1/3}\right) \approx P\big(Z > 3(40.5 - 40)\big) = 1 - P(Z \le 1.5) \approx 1 - 0.9332 = 0.0668 \]

A brief summary of course materials:
Consider a random sample of size n, \(X_1, X_2, \ldots, X_n\), from a common distribution X with mean \(E(X) = \mu\) and variance \(\operatorname{Var}(X) = \sigma^2\).
(1) If \(\theta\) is the underlying population parameter, then any function \(\hat{\theta}(X_1, X_2, \ldots, X_n)\) can be considered an estimator of \(\theta\). \(\hat{\theta}\) is called an unbiased estimator of \(\theta\) if \(E(\hat{\theta}) = \theta\).
(2) The sample mean \(\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\) is an unbiased estimator of the population mean \(\mu\). That is, \(E(\bar{X}) = \mu\).
(3) The sample variance \(S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2\) is an unbiased estimator of the population variance \(\sigma^2\). That is, \(E(S^2) = \sigma^2\).
(4) For the sample mean \(\bar{X}\): \(E(\bar{X}) = \mu\) and \(\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}\).
(5) If the samples are drawn from a normal population with mean \(\mu\) and variance \(\sigma^2\), then the sample mean also follows a normal distribution: \(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\).
(6) Central Limit Theorem (CLT): for a random sample \(X_1, X_2, \ldots, X_n\) of size n from any common distribution X with mean \(\mu\) and variance \(\sigma^2\), \(\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \to N(0, 1)\) as \(n \to \infty\) (in practice, for \(n \ge 30\)).

Example 3 (Distribution of Sample Mean)
The weight of a randomly selected can of soft drink STA is known to have a normal distribution with mean 304 g and standard deviation 2 g. Let X (in g) be the weight of a can.

(a) What is the probability that a randomly selected can weighs between 300 g and 308 g?
\[ \text{Required probability} = P(300 < X < 308) = P\left(\frac{300 - 304}{2} < \frac{X - 304}{2} < \frac{308 - 304}{2}\right) = P(-2 < Z < 2) = 0.9772 - 0.0228 = 0.9544 \]

(b) The cans are randomly packed and sold in 6-can packages. What is the largest weight (in integer grams) that can be printed on a single can so that the probability of a customer getting an underweight 6-can pack is no more than 1%?

Let y (in g) be the printable weight per can. A pack is underweight when its total weight is below 6y, that is, when the mean weight \(\bar{X}\) of its 6 cans is below y. With \(\bar{X} \sim N\left(304, \frac{2^2}{6}\right) = N\left(304, \frac{2}{3}\right)\), we need the largest integer y satisfying \(P(\bar{X} < y) \le 1\%\).
\[ \because \frac{\bar{X} - 304}{2/\sqrt{6}} \sim Z, \quad \therefore P\left(\frac{\bar{X} - 304}{2/\sqrt{6}} < \frac{y - 304}{2/\sqrt{6}}\right) \le 0.01 = P(Z < -2.33) \]
\[ \sqrt{1.5}\,(y - 304) \le -2.33 \implies y \le 304 - \frac{2.33}{\sqrt{1.5}} \approx 302.10 \]
\(\therefore\) the largest integer required is 302.

* Amendments made at noon on 9th August are highlighted in red.

Example 4 (Unbiasedness of Estimator, 2008 Fall Final)
Let \(X_1, X_2, \ldots, X_n\) be a random sample of size n drawn from a normal distribution \(N(0, \sigma^2)\). Define \(W = \frac{1}{n}\sum_{i=1}^{n} X_i^2\). Prove that W is an unbiased estimator of \(\sigma^2\).
\[ E(W) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i^2) = \frac{1}{n}\sum_{i=1}^{n}\left[\operatorname{Var}(X_i) + (E(X_i))^2\right] = \frac{1}{n}\sum_{i=1}^{n}\left[\sigma^2 + 0^2\right] = \frac{1}{n}(n\sigma^2) = \sigma^2 \]
\(\therefore\) W is an unbiased estimator of \(\sigma^2\).

Exercise 3 (Unbiasedness of Estimator, 2012 Spring Final)
Consider a random variable \(X \sim \operatorname{Bin}(n, \theta)\), where n is known but \(\theta\) is unknown. For the function \(g(t) = nt(1 - t)\), is \(g\left(\frac{X}{n}\right)\) an unbiased estimator of \(g(\theta)\)?
\[ E\left[g\left(\frac{X}{n}\right)\right] = E\left[n \cdot \frac{X}{n}\left(1 - \frac{X}{n}\right)\right] = E(X) - \frac{1}{n}E(X^2) = n\theta - \frac{1}{n}\left[\operatorname{Var}(X) + (E(X))^2\right] = n\theta - \frac{1}{n}\left[n\theta(1 - \theta) + (n\theta)^2\right] = (n - 1)\theta(1 - \theta) \]
while \(g(\theta) = n\theta(1 - \theta) \ne E\left[g\left(\frac{X}{n}\right)\right]\). Hence \(g\left(\frac{X}{n}\right)\) is a BIASED estimator of \(g(\theta)\).

A brief summary of course materials:
(1) The \(\chi^2\) and T distributions are defined as follows.
\(\chi^2\) distribution \(\chi^2_k\)
  Definition: if \(Z_1, Z_2, \ldots, Z_k \overset{iid}{\sim} N(0, 1)\), then \(X^2 = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k\).
  Example: \(X^2 = \frac{(n - 1)S_X^2}{\sigma_X^2} \sim \chi^2_{n-1}\).
T distribution \(t_k\)
  Definition: if \(Z \sim N(0, 1)\), \(V \sim \chi^2_k\), and Z and V are independent, then \(T = \frac{Z}{\sqrt{V/k}} \sim t_k\).
  Example: \(T_{n-1} = \frac{\sqrt{n}(\bar{X} - \mu_X)}{S_X} \sim t_{n-1}\).
(2) Let \(\alpha\) be a number with \(0 < \alpha < 1\). For \(Z \sim N(0, 1)\), \(T \sim t_k\) and \(X^2 \sim \chi^2_k\), define the quantities \(z_\alpha\), \(t_{k,\alpha}\) and \(\chi^2_{k,\alpha}\) to be the numbers such that \(\alpha = P(Z > z_\alpha) = P(T > t_{k,\alpha}) = P(X^2 > \chi^2_{k,\alpha})\).
(3) Each \(0 < \alpha < 1\) also gives the symmetric \(1 - \alpha\) confidence intervals \([\hat{\theta}_L, \hat{\theta}_U]\):
  \(\sigma_X\) known; parameter \(\mu_X\); estimator \(\bar{X}\): \(\bar{x} \mp z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}}\)
  \(\sigma_X\) unknown; parameter \(\mu_X\); estimator \(\bar{X}\): \(\bar{x} \mp t_{n-1,\alpha/2}\frac{s_X}{\sqrt{n}}\)
  \(\mu_X\) unknown; parameter \(\sigma_X^2\); estimator \(S_X^2\): \(\left[\frac{(n - 1)s_X^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n - 1)s_X^2}{\chi^2_{n-1,1-\alpha/2}}\right]\)

Example 5 (C.I. for Mean with Known Variance)
Many cardiac patients wear implanted pacemakers to control their heartbeat. A plastic connector module mounts on the top of the pacemaker. Assuming a standard deviation of 0.0015 inch and a normal distribution, find a 95% confidence interval for the mean length of all connector modules made by a certain manufacturing company, if a random sample of 75 modules has an average length of 0.310 inch.

Let X be the length of connector modules made by the company. \(\bar{X}\), the sample mean of n = 75 modules, is used to estimate the population mean \(\mu_X\). The population standard deviation is known to be \(\sigma_X = 0.0015\). A 95% confidence interval is required, so take \(\alpha = 1 - 0.95 = 0.05\).
\(\because \bar{x} = 0.31\), \(z_{\alpha/2} = z_{0.025} = 1.960\), \(\sigma_X = 0.0015\), \(n = 75\), the required interval is
\[ \left[\bar{x} - z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}}\right] = \left[0.31 - 1.96\frac{0.0015}{\sqrt{75}},\ 0.31 + 1.96\frac{0.0015}{\sqrt{75}}\right] = [0.3097, 0.3103] \]

Example 6 (C.I. for Mean with Unknown Variance)
Regular consumption of pre-sweetened cereals contributes to tooth decay, heart disease, and other degenerative diseases, according to studies conducted by Dr. W. H. Bowen of the National Institutes of Health and Dr. J. Yudben, Professor of Nutrition and Dietetics at the University of London. In a random sample of 20 similar single servings of Alpha-Bits, the average sugar content was 11.3 grams with a standard deviation of 2.45 grams. Assuming that the sugar contents are normally distributed, construct a 95% confidence interval for the mean sugar content of single servings of Alpha-Bits.

Let X be the sugar content of a single serving. \(\bar{X}\), the sample mean of n = 20 servings, is used to estimate the population mean \(\mu_X\). The population standard deviation \(\sigma_X\) is unknown; the sample standard deviation is \(s_X = 2.45\). A 95% confidence interval is required, so take \(\alpha = 1 - 0.95 = 0.05\).
\(\bar{x} = 11.3\), \(t_{n-1,\alpha/2} = t_{19,0.025} = 2.093\), so the required interval is
\[ \left[\bar{x} - t_{n-1,\alpha/2}\frac{s_X}{\sqrt{n}},\ \bar{x} + t_{n-1,\alpha/2}\frac{s_X}{\sqrt{n}}\right] = \left[11.3 - 2.093\frac{2.45}{\sqrt{20}},\ 11.3 + 2.093\frac{2.45}{\sqrt{20}}\right] = [10.153, 12.447] \]

Example 7 (C.I. for Variance with Unknown Mean)
A sample of 7 boxes of a certain type of cereal with a nominal weight of 750 g had the following weights (in g):
775, 780, 781, 795, 803, 810, 823
Find a 95% confidence interval for \(\sigma^2\).

\(n = 7\), \(\bar{x} = 795.3\), \(s_X^2 = 315.5714\), \(\alpha = 1 - 95\% = 0.05\)
\(\chi^2_{n-1,\alpha/2} = \chi^2_{6,0.025} \approx 14.449\), \(\chi^2_{n-1,1-\alpha/2} = \chi^2_{6,0.975} \approx 1.2373\)
\[ \therefore \text{Required interval} = \left[\frac{(n - 1)s_X^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n - 1)s_X^2}{\chi^2_{n-1,1-\alpha/2}}\right] = \left[\frac{(7 - 1)(315.5714)}{14.449},\ \frac{(7 - 1)(315.5714)}{1.2373}\right] \approx (131.04, 1530.29) \]
So the 95% confidence interval for \(\sigma\) is (11.4, 39.1).

Exercise 4
Based on the information given in Example 6, 30 MORE similar single servings are now randomly drawn. These 30 new samples, which are assumed to be normally distributed, have an average sugar content of 11.3 grams with a standard deviation of 1.96 grams. Using all the information available, construct a 95% confidence interval for the mean sugar content of single servings of Alpha-Bits.

Let Y be the sugar content of a single serving. \(\bar{Y}\), the sample mean of m = 50 servings, is used to estimate the population mean \(\mu_Y\). The population standard deviation \(\sigma_Y\) is unknown. Both samples have mean 11.3, so the combined sample mean is \(\bar{y} = 11.3\) and the combined sum of squared deviations is \((20 - 1)(2.45)^2 + (30 - 1)(1.96)^2\). The sample standard deviation of all 50 servings is therefore
\[ s_Y = \sqrt{\frac{19(2.45)^2 + 29(1.96)^2}{(20 + 30) - 1}} = \sqrt{\frac{225.4539}{49}} \approx 2.145 \]
A 95% confidence interval is required, so take \(\alpha = 1 - 0.95 = 0.05\).
\(\bar{y} = 11.3\), \(t_{m-1,\alpha/2} = t_{49,0.025} \approx 2.01\), so the required interval is
\[ \left[\bar{y} - t_{m-1,\alpha/2}\frac{s_Y}{\sqrt{m}},\ \bar{y} + t_{m-1,\alpha/2}\frac{s_Y}{\sqrt{m}}\right] = \left[11.3 - 2.01\frac{2.145}{\sqrt{50}},\ 11.3 + 2.01\frac{2.145}{\sqrt{50}}\right] \approx [10.6903, 11.9097] \]

Exercise 5
Each year in a university, 200 students are randomly invited to complete a feedback questionnaire when they finish their first year of studies. However, only 5% of the forms were finally collected. From the responses, the mean score for teaching staff is 88 with a standard deviation of 5, while the mean score for facilities is 70 with a standard deviation of 10.
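Let X be the scores for teaching staff and Y the scores for facilities. Then
n = 200 × 5% = 10, \(\bar{x} = 88\), \(s_X = 5\), \(\bar{y} = 70\), \(s_Y = 10\).
Assume that both scores are normally distributed. Find a 90% confidence interval for the quantities below (\(\alpha = 1 - 90\% = 0.1\), \(t_{n-1,\alpha/2} = t_{9,0.05} = 1.833\)):

(a) the mean score of teaching staff;
\[ \text{Required interval} = \left[\bar{x} - t_{n-1,\alpha/2}\frac{s_X}{\sqrt{n}},\ \bar{x} + t_{n-1,\alpha/2}\frac{s_X}{\sqrt{n}}\right] = \left[88 - 1.833\frac{5}{\sqrt{10}},\ 88 + 1.833\frac{5}{\sqrt{10}}\right] = [85.1018, 90.8982] \]

(b) the mean score of facilities;
\[ \text{Required interval} = \left[\bar{y} - t_{n-1,\alpha/2}\frac{s_Y}{\sqrt{n}},\ \bar{y} + t_{n-1,\alpha/2}\frac{s_Y}{\sqrt{n}}\right] = \left[70 - 1.833\frac{10}{\sqrt{10}},\ 70 + 1.833\frac{10}{\sqrt{10}}\right] = [64.2035, 75.7965] \]

(c) the mean score of facilities, given that the standard deviation in previous years was also 10 (so \(\sigma_Y = 10\) is treated as known);
\[ \text{Required interval} = \left[\bar{y} - z_{\alpha/2}\frac{\sigma_Y}{\sqrt{n}},\ \bar{y} + z_{\alpha/2}\frac{\sigma_Y}{\sqrt{n}}\right] = \left[70 - z_{0.05}\frac{10}{\sqrt{10}},\ 70 + z_{0.05}\frac{10}{\sqrt{10}}\right] = \left[70 - (1.645)\sqrt{10},\ 70 + (1.645)\sqrt{10}\right] = [64.7981, 75.2019] \]

(d) the mean score of teaching staff, given that the standard deviation in previous years was 6 instead of 5 (so \(\sigma_X = 6\) is treated as known);
\[ \text{Required interval} = \left[\bar{x} - z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}}\right] = \left[88 - z_{0.05}\frac{6}{\sqrt{10}},\ 88 + z_{0.05}\frac{6}{\sqrt{10}}\right] = \left[88 - (1.645)\frac{6}{\sqrt{10}},\ 88 + (1.645)\frac{6}{\sqrt{10}}\right] = [84.8788, 91.1212] \]

(e) the variance of the scores of teaching staff; and
\[ \text{Required interval} = \left[\frac{(n - 1)s_X^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n - 1)s_X^2}{\chi^2_{n-1,1-\alpha/2}}\right] = \left[\frac{(10 - 1)(5)^2}{\chi^2_{9,0.05}},\ \frac{(10 - 1)(5)^2}{\chi^2_{9,0.95}}\right] = \left[\frac{225}{16.919},\ \frac{225}{3.325}\right] = [13.2987, 67.6692] \]

(f) the variance of the scores of facilities.
\[ \text{Required interval} = \left[\frac{(n - 1)s_Y^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n - 1)s_Y^2}{\chi^2_{n-1,1-\alpha/2}}\right] = \left[\frac{(10 - 1)(10)^2}{\chi^2_{9,0.05}},\ \frac{(10 - 1)(10)^2}{\chi^2_{9,0.95}}\right] = \left[\frac{900}{16.919},\ \frac{900}{3.325}\right] = [53.1946, 270.6767] \]

(Answers will be available at http://ihome.ust.hk/~makittylee)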
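For Exercise 1 (three rounds), the worked solution groups outcomes by the number of white balls, for example "3[(2/27)(6)]". The sketch below does the same grouping with a binomial count w of white balls and exact fractions. It assumes only the Example 1 payoffs (+10 for white, −2 for black) and P(white) = 1/3.

```python
from fractions import Fraction
from math import comb

p_white = Fraction(1, 3)   # P(white) in one draw, from Example 1
n = 3                      # number of rounds in Exercise 1

e_m = e_m_sq = e_s2 = Fraction(0)
for w in range(n + 1):     # w = number of white balls drawn
    # comb(n, w) orderings share the same sample mean and sample variance
    prob = comb(n, w) * p_white**w * (1 - p_white) ** (n - w)
    values = [10] * w + [-2] * (n - w)
    m = Fraction(sum(values), n)
    s2 = sum((v - m) ** 2 for v in values) / (n - 1)
    e_m += prob * m
    e_m_sq += prob * m**2
    e_s2 += prob * s2

var_m = e_m_sq - e_m**2
print(e_m, var_m, e_s2)
```

The three results match (h), (i) and (j); note that \(\operatorname{Var}(M) = 32/3 = \operatorname{Var}(X)/3\).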
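Example 2 and Exercise 2 both standardise \(\bar{X}\) with \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\) and then read a standard normal probability. A minimal sketch using `statistics.NormalDist` from Python's standard library in place of the printed Z table:

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# Example 2: service time with mu = 3.2, sigma = 1.6, sample of n = 64
mu, sigma, n = 3.2, 1.6, 64
se = sigma / sqrt(n)                                   # sd of the sample mean = 0.2
p_bank = Z.cdf((3.4 - mu) / se) - Z.cdf((3.2 - mu) / se)

# Exercise 2: resistors with mu = 40, sigma = 2, n = 36;
# total > 1458 is the same event as sample mean > 1458 / 36 = 40.5
mu_r, sigma_r, n_r = 40, 2, 36
se_r = sigma_r / sqrt(n_r)                             # = 1/3
p_resistors = 1 - Z.cdf((1458 / n_r - mu_r) / se_r)

print(round(p_bank, 4), round(p_resistors, 4))
```

Because `NormalDist` evaluates \(\Phi\) directly, its answers can differ from table lookups in the fourth decimal place.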
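For Example 3, part (a) uses the distribution of a single can, while part (b) uses \(\bar{X} \sim N(304, 2^2/6)\) for the mean of a 6-can pack. The sketch below finds the 1% quantile of that pack-mean distribution with `NormalDist.inv_cdf` instead of the rounded table value −2.33. It assumes, as in the solution, that a pack is underweight exactly when its mean weight is below the printed weight y.

```python
from math import floor, sqrt
from statistics import NormalDist

mu, sigma, k = 304, 2, 6        # can weight ~ N(304, 2^2); k cans per pack

# (a) single can: P(300 < X < 308)
can = NormalDist(mu, sigma)
p_single = can.cdf(308) - can.cdf(300)

# (b) pack mean of k cans ~ N(304, 2^2 / k); the label y must satisfy P(mean < y) <= 0.01
pack_mean = NormalDist(mu, sigma / sqrt(k))
y_boundary = pack_mean.inv_cdf(0.01)   # any y at or below this keeps the risk <= 1%
largest_label = floor(y_boundary)      # largest integer gram satisfying the condition

print(round(p_single, 4), round(y_boundary, 2), largest_label)
```

The exact quantile (about −2.326) and the table value −2.33 both give the same integer answer.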
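Example 4 proves \(E(W) = \sigma^2\) algebraically. A Monte Carlo run gives a quick numerical sanity check of that claim. This is a sketch only: the values of sigma, n and the number of repetitions below are illustrative assumptions, not taken from the notes, and the helper name `w_statistic` is hypothetical.

```python
import random
import statistics

# Illustrative settings (assumed, not from the notes)
random.seed(2411)               # fixed seed for a reproducible run
sigma, n, reps = 3.0, 5, 100_000

def w_statistic(sample):
    """W = (1/n) * sum of X_i^2, the estimator from Example 4 (population mean 0)."""
    return sum(x * x for x in sample) / len(sample)

# Draw many samples of size n from N(0, sigma^2) and average their W values
ws = [w_statistic([random.gauss(0, sigma) for _ in range(n)]) for _ in range(reps)]
estimate = statistics.fmean(ws)
print(estimate, sigma**2)       # the average of W should be close to sigma^2 = 9
```

A simulation cannot prove unbiasedness; it only shows that the average of W settles near \(\sigma^2\), as the proof predicts.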
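Exercise 3 concludes that \(E[g(X/n)] = (n - 1)\theta(1 - \theta)\) rather than \(g(\theta) = n\theta(1 - \theta)\). The sketch below checks this by summing \(g(x/n)\) against the binomial pmf. The choice n = 10, \(\theta\) = 0.3 is an illustrative assumption, and `expected_g` is a hypothetical helper name.

```python
from math import comb

def expected_g(n, theta):
    """Exact E[g(X/n)] for X ~ Bin(n, theta) and g(t) = n t (1 - t),
    computed by summing over the binomial pmf."""
    total = 0.0
    for x in range(n + 1):
        pmf = comb(n, x) * theta**x * (1 - theta) ** (n - x)
        t = x / n
        total += pmf * n * t * (1 - t)
    return total

n, theta = 10, 0.3                      # illustrative values, not from the notes
print(expected_g(n, theta))             # E[g(X/n)]
print((n - 1) * theta * (1 - theta))    # closed form from the solution
print(n * theta * (1 - theta))          # g(theta), which differs: the estimator is biased
```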
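Examples 5 and 6 both use the symmetric interval \(\bar{x} \mp (\text{critical value}) \cdot (\text{spread})/\sqrt{n}\) from item (3) of the summary. A minimal sketch with a hypothetical helper `mean_ci`: the z value comes from `NormalDist.inv_cdf`, while \(t_{19,0.025} = 2.093\) is the table value quoted in Example 6, because Python's standard library has no t quantile function.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(x_bar, spread, n, critical):
    """Symmetric interval x_bar -/+ critical * spread / sqrt(n).
    spread is sigma when it is known, otherwise the sample standard deviation s."""
    half = critical * spread / sqrt(n)
    return x_bar - half, x_bar + half

# Example 5: sigma known, so the critical value is z_{0.025}
z_025 = NormalDist().inv_cdf(0.975)     # about 1.960
ex5 = mean_ci(0.310, 0.0015, 75, z_025)

# Example 6: sigma unknown, so the critical value is t_{19, 0.025} (table value)
ex6 = mean_ci(11.3, 2.45, 20, 2.093)

print([round(b, 4) for b in ex5], [round(b, 3) for b in ex6])
```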
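For Example 7, the variance interval divides \((n - 1)s^2\) by the two \(\chi^2\) quantiles, and the interval for \(\sigma\) takes square roots of both ends. In the sketch below, the quantiles \(\chi^2_{6,0.025} \approx 14.449\) and \(\chi^2_{6,0.975} \approx 1.2373\) are assumed standard table values (14.45 and 1.237 to fewer digits), since the standard library has no \(\chi^2\) quantile function.

```python
import statistics
from math import sqrt

weights = [775, 780, 781, 795, 803, 810, 823]   # box weights in g (Example 7)
n = len(weights)
s2 = statistics.variance(weights)               # sample variance, about 315.5714

# chi^2_{6,0.025} and chi^2_{6,0.975}: assumed standard table values
chi2_upper, chi2_lower = 14.449, 1.2373
var_ci = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
sd_ci = tuple(sqrt(v) for v in var_ci)          # square-root both ends for sigma

print([round(v, 2) for v in var_ci], [round(v, 1) for v in sd_ci])
```

Any small difference from the interval printed in the notes comes from how the \(\chi^2\) quantiles are rounded.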
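In Exercise 4, the two samples share the same mean (11.3), so the combined sum of squared deviations is just \((n_1 - 1)s_1^2 + (n_2 - 1)s_2^2\), and dividing by \(n_1 + n_2 - 1\) gives the sample variance of all 50 servings. A sketch of that calculation, using the rounded \(t_{49,0.025} \approx 2.01\) from the notes as an assumed table value:

```python
from math import sqrt

# Example 6 sample and the 30 new servings from Exercise 4 (both means are 11.3)
n1, s1 = 20, 2.45
n2, s2 = 30, 1.96
mean_all = 11.3
n = n1 + n2

# Equal sub-sample means: no between-group term, so combine the sums of squares directly
s_all = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n - 1))

t_49 = 2.01                      # t_{49, 0.025}, rounded table value
half = t_49 * s_all / sqrt(n)
print(round(s_all, 3), round(mean_all - half, 4), round(mean_all + half, 4))
```

If the two sample means differed, the combined sum of squares would also need the terms \(n_j(\bar{y}_j - \bar{y})^2\) for each sample.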
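All six parts of Exercise 5 reuse two interval shapes: a mean interval with a t or z critical value, and a variance interval with two \(\chi^2\) quantiles. The sketch below uses the table values from the solution (\(t_{9,0.05} = 1.833\), \(z_{0.05} = 1.645\), \(\chi^2_{9,0.05} = 16.919\), \(\chi^2_{9,0.95} = 3.325\)); the helper names `mean_ci` and `var_ci` are illustrative.

```python
from math import sqrt

n = 10                                   # 5% of 200 forms returned
x_bar, s_x = 88, 5                       # teaching staff
y_bar, s_y = 70, 10                      # facilities
t_9 = 1.833                              # t_{9, 0.05}
z_05 = 1.645                             # z_{0.05}
chi2_upper, chi2_lower = 16.919, 3.325   # chi^2_{9,0.05}, chi^2_{9,0.95}

def mean_ci(center, spread, crit):
    """center -/+ crit * spread / sqrt(n), rounded to 4 decimals."""
    half = crit * spread / sqrt(n)
    return round(center - half, 4), round(center + half, 4)

def var_ci(s):
    """[(n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower], rounded to 4 decimals."""
    ss = (n - 1) * s**2
    return round(ss / chi2_upper, 4), round(ss / chi2_lower, 4)

results = {
    "a": mean_ci(x_bar, s_x, t_9),
    "b": mean_ci(y_bar, s_y, t_9),
    "c": mean_ci(y_bar, 10, z_05),       # sigma_Y = 10 treated as known
    "d": mean_ci(x_bar, 6, z_05),        # sigma_X = 6 treated as known
    "e": var_ci(s_x),
    "f": var_ci(s_y),
}
for part, interval in results.items():
    print(part, interval)
```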