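Probability & Statistics
August 7th
Professor Wei Zhu

Hypothesis Test on One Population Mean (continued)

Scenario 1: The population is normal, and the population variance is known.
Scenario 2: Any population (usually not normal), but the sample size is large (\( n \ge 30 \)).
Scenario 3: The population is normal, but the population variance is unknown.

"A Student of Statistics" was the pen name of William Sealy Gosset (June 13, 1876–October 16, 1937).
http://en.wikipedia.org/wiki/William_Sealy_Gosset

"The Student's t-distribution": the pivotal quantity (P.Q.) is

\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1} \]

(exact t-distribution with \( n-1 \) degrees of freedom).

Review: Theorem (Sampling from the normal population)

Let \( X_1, X_2, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2) \). Then

1) \( \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \)

2) \( W = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \)

\( \bar{X} \) and \( S^2 \) (and thus \( W \)) are independent. Thus we have:

3) \( T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1} \)

Wrong test for a 2-sided alternative hypothesis (use Z): Reject \( H_0 \) if \( |z_0| \ge Z_{\alpha/2} \).

Right test for a 2-sided alternative hypothesis (use T): Reject \( H_0 \) if \( |t_0| \ge t_{n-1,\alpha/2} \). (Because the t-distribution has heavier tails than the normal distribution.)

Right Test

\[ H_0: \mu = \mu_0 \quad \text{versus} \quad H_a: \mu \ne \mu_0 \]

* Test statistic: \( T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \overset{H_0}{\sim} t_{n-1} \)
* Rejection region: Reject \( H_0 \) at level \( \alpha \) if the observed test statistic value satisfies \( |t_0| \ge t_{n-1,\alpha/2} \).
* p-value: p-value = shaded area × 2, that is, twice the area under the \( t_{n-1} \) pdf to the right of \( |t_0| \).

Further Review:

1. Definition: t-distribution

\[ T = \frac{Z}{\sqrt{W/k}} \sim t_k \]

where \( Z \sim N(0,1) \), \( W \sim \chi^2_k \) (chi-square distribution with \( k \) degrees of freedom), and \( Z \) and \( W \) are independent.

2. Def 1: chi-square distribution, from the definition of the gamma distribution: gamma(\( \alpha = k/2 \), \( \beta = 2 \)).

MGF: \( M(t) = \left(\frac{1}{1-2t}\right)^{k/2} \) for \( t < 1/2 \)

Mean and variance: \( E(W) = k \); \( Var(W) = 2k \)

Def 2: chi-square distribution: Let \( Z_1, Z_2, \ldots, Z_k \overset{\text{i.i.d.}}{\sim} N(0,1) \); then

\[ W = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k \]

3. Now we prove part (3) of sampling from the normal population.

Proof) Let

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1), \qquad W = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}. \]

Since \( \bar{X} \) and \( S^2 \) are independent, \( Z \) and \( W \) are independent, and by the definition of the t-distribution

\[ T = \frac{Z}{\sqrt{W/(n-1)}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2} \Big/ (n-1)}} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}. \]

The derivation of the one-sample t-test based on the pivotal quantity method follows the same procedure as the derivation of the one-sample Z-test.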
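To illustrate why the Z cutoff is the "wrong test" when the variance is estimated from a small normal sample, here is a minimal simulation sketch. It is written in Python with the standard library only (the notes themselves point to R). The function names, the sample size n = 9, the number of replications, and the seed are illustrative assumptions, not part of the lecture. Under a true null hypothesis, a two-sided 5% test should reject about 5% of the time; because the t-distribution has heavier tails, comparing the t statistic against 1.96 rejects more often than that.

```python
import math
import random


def t_statistic(sample, mu0):
    """One-sample statistic (x_bar - mu0) / (s / sqrt(n)), with s the sample SD."""
    n = len(sample)
    x_bar = sum(sample) / n
    s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)  # sample variance S^2
    return (x_bar - mu0) / math.sqrt(s2 / n)


def z_cutoff_rejection_rate(n=9, reps=20000, z_crit=1.96, seed=0):
    """Fraction of N(0, 1) samples (H0: mu = 0 is true) with |T| >= z_crit.

    All defaults are hypothetical choices made for this illustration.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) >= z_crit:
            rejections += 1
    return rejections / reps


rate = z_cutoff_rejection_rate()
print(f"Rejection rate using the Z cutoff 1.96 with n = 9: {rate:.3f}")
```

The simulated rate should come out noticeably above the nominal 0.05, which is the practical meaning of the heavier tails: the correct cutoff for n = 9 is the larger value \( t_{8,0.025} \), not \( Z_{0.025} \).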
The following is a summary of the decision rules for the t-test, using either the rejection region approach or the p-value approach. In all three cases the observed value of the test statistic is

\[ T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \overset{H_0}{\sim} t_{n-1} \]

(1) \( H_0: \mu = \mu_0 \) versus \( H_a: \mu > \mu_0 \)
    Rejection region: we reject \( H_0 \) in favor of \( H_a \) at the significance level \( \alpha \) if \( T_0 \ge t_{n-1,\alpha} \).
    p-value \( = P(T_0 \ge t_0 \mid H_0) \): the area under the \( t_{n-1} \) pdf to the right of \( t_0 \).

(2) \( H_0: \mu = \mu_0 \) versus \( H_a: \mu < \mu_0 \)
    Rejection region: we reject \( H_0 \) in favor of \( H_a \) at the significance level \( \alpha \) if \( T_0 \le -t_{n-1,\alpha} \).
    p-value \( = P(T_0 \le t_0 \mid H_0) \): the area under the \( t_{n-1} \) pdf to the left of \( t_0 \).

(3) \( H_0: \mu = \mu_0 \) versus \( H_a: \mu \ne \mu_0 \)
    Rejection region: we reject \( H_0 \) in favor of \( H_a \) at the significance level \( \alpha \) if \( |T_0| \ge t_{n-1,\alpha/2} \).
    p-value \( = P(|T_0| \ge |t_0| \mid H_0) = 2\,P(T_0 \ge |t_0| \mid H_0) \): twice the area under the \( t_{n-1} \) pdf to the right of \( |t_0| \).

[Figure: p-value versus rejection region for a one-sided test; source: http://images.frompo.com/]

The figure depicts the equivalence of the rejection region method and the p-value method for decision making for the one-sided test \( H_0: \mu = \mu_0 \) versus \( H_a: \mu > \mu_0 \), in that:

p-value \( < \alpha \) ⇔ \( t_0 > t_{n-1,\alpha} \) (the test statistic falls inside the rejection region); and
p-value \( > \alpha \) ⇔ \( t_0 < t_{n-1,\alpha} \) (the test statistic falls outside the rejection region).

Here \( t_0 \) is referred to as the observed test statistic value, and \( t_{n-1,\alpha} \) is referred to as the critical value. Such equivalence between the two methods holds for all three pairs of hypotheses.

Example. Jerry is planning to purchase a sporting goods store. He calculated that in order to cover basic expenses, the average daily sales must be at least $525.

Scenario A. He checked the daily sales of 36 randomly selected business days, and found the average daily sales to be $565 with a standard deviation of $150.

Scenario B. Now suppose he is only allowed to sample 9 days, and the 9 days' sales are $510, 537, 548, 592, 503, 490, 601, 499, 640.

For A and B, please determine whether Jerry can conclude the daily sales to be at least $525 at the significance level of \( \alpha = 0.05 \). What is the p-value for each scenario?

Solution A (large sample)

\( n = 36 \), \( \bar{x} = 565 \), \( s = 150 \)

\( H_0: \mu = 525 \) versus \( H_a: \mu > 525 \)

*** First perform the Shapiro-Wilk test to check for normality. If normal, use the exact t-test. If not normal, use the large-sample Z-test.
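The test-selection note above, together with the three scenarios listed at the start of the lecture and the small-sample remark in Solution B, can be sketched as a small helper. This is only an illustration in Python: the function name `choose_test`, its arguments, and the returned labels are hypothetical, and the normality flag is assumed to come from a separate Shapiro-Wilk test that is not performed here.

```python
def choose_test(is_normal, n, variance_known=False, large_n=30):
    """Pick a one-sample test for the mean following the lecture's scenarios.

    is_normal      -- outcome of a normality check (e.g. Shapiro-Wilk), assumed given
    n              -- sample size
    variance_known -- True only if the population variance is known
    large_n        -- threshold for "large sample" (the notes use n >= 30)
    """
    if is_normal and variance_known:
        return "exact Z-test"                      # Scenario 1
    if is_normal:
        return "exact t-test"                      # Scenario 3
    if n >= large_n:
        return "approximate Z-test"                # Scenario 2 (CLT)
    return "nonparametric test (e.g. Wilcoxon signed-rank)"


print(choose_test(is_normal=False, n=36))  # Scenario A, if normality is rejected
print(choose_test(is_normal=True, n=9))    # Scenario B, if normality holds
```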
In the following, we assume the population is found not normal; since the sample size is large, we use the approximate Z-test based on the CLT and Slutsky's Theorem.

Test statistic:

\[ z_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{565 - 525}{150/\sqrt{36}} = 1.6 \]

At the significance level \( \alpha = 0.05 \), we will reject \( H_0 \) if \( z_0 \ge Z_{0.05} = 1.645 \). Since \( 1.6 < 1.645 \), we cannot reject \( H_0 \).

p-value \( = P(Z \ge 1.6) = 0.0548 \)

*** Alternatively, if you can show the population is normal using the Shapiro-Wilk test, it is better that you perform the exact t-test.

Solution B (small sample)

First perform the Shapiro-Wilk test. If the population is normal, the t-test is suitable. (*If the population is not normal and the sample size is small, we shall use a nonparametric test such as the Wilcoxon signed-rank test.)

In the following, we assume the population is found normal.

\( \bar{x} = 546.67 \), \( s = 53.09 \), \( n = 9 \)

\( H_0: \mu = 525 \) versus \( H_a: \mu > 525 \)

Test statistic:

\[ t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{546.67 - 525}{53.09/\sqrt{9}} = 1.22 \]

At the significance level \( \alpha = 0.05 \), we will reject \( H_0 \) if \( t_0 \ge t_{8,0.05} = 1.86 \). Since \( 1.22 < 1.86 \), we cannot reject \( H_0 \).

p-value: What is the p-value when \( t_0 = 1.22 \)? (Hint: use the 1-pt(1.22,8) command in R to obtain the p-value.)
(Review: use the -qt(0.05,8) command in R to obtain the critical value \( t_{8,0.05} = 1.86 \).)

Learning R: Please study the following links on how to perform the Shapiro-Wilk normality test, and how to perform the one-sample t-test in R. Enjoy!

https://stat.ethz.ch/R-manual/R-patched/library/stats/html/shapiro.test.html
http://www.r-bloggers.com/one-sample-students-t-test/
http://ww2.coastal.edu/kingw/statistics/R-tutorials/singlesample.html
http://www.stat.columbia.edu/~martin/W2024/R2.pdf
https://www.youtube.com/watch?v=kvmSAXhX9Hs

Topics in next lecture
* Power of the test
* Likelihood ratio test (for one population mean)
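As a numerical cross-check of Solutions A and B, the arithmetic can be reproduced with a short Python sketch that uses only the standard library. The helper names `t_pdf` and `t_upper_tail` are hypothetical. The upper-tail probability of the t-distribution is obtained here by numerically integrating the density with Simpson's rule, which is a stand-in for R's `1 - pt(t0, df)` rather than the method used in the lecture.

```python
import math
from statistics import NormalDist, mean, stdev


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t0, df, steps=2000):
    """P(T >= t0): Simpson's rule on [0, |t0|] plus symmetry about 0.

    steps must be even; 2000 is an arbitrary illustrative choice.
    """
    a = abs(t0)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t0|)
    return 0.5 - central if t0 >= 0 else 0.5 + central


mu0 = 525

# Scenario A: large-sample Z-test from the summary statistics in the notes.
z0 = (565 - mu0) / (150 / math.sqrt(36))
p_a = 1 - NormalDist().cdf(z0)
print(f"Scenario A: z0 = {z0:.2f}, p-value = {p_a:.4f}")

# Scenario B: exact t-test from the nine daily sales figures.
sales = [510, 537, 548, 592, 503, 490, 601, 499, 640]
n = len(sales)
x_bar, s = mean(sales), stdev(sales)  # stdev uses the n - 1 divisor
t0 = (x_bar - mu0) / (s / math.sqrt(n))
p_b = t_upper_tail(t0, n - 1)
print(f"Scenario B: x_bar = {x_bar:.2f}, s = {s:.2f}, t0 = {t0:.2f}, p-value = {p_b:.3f}")
```

The Scenario A values agree with the 1.6 and 0.0548 given in Solution A, and the Scenario B summary statistics agree with the 546.67, 53.09 and 1.22 in Solution B. The Scenario B p-value comes out at roughly 0.13, which is above 0.05 and so consistent with not rejecting \( H_0 \).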