Tsinghua Math Camp 2015

Probability & Statistics
August 7th
Professor Wei Zhu
Hypothesis Test on One Population Mean (continued)
Scenario 1: The population is normal, and the population variance is known.
Scenario 2: Any population (usually not normal), but the sample size is large ($n \ge 30$).
Scenario 3: The population is normal, but the population variance is unknown.
“Student”: pen name of William Sealy Gosset (June 13, 1876 – October 16, 1937)
http://en.wikipedia.org/wiki/William_Sealy_Gosset
“The Student’s t-distribution”
$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ (exact t-distribution with $n-1$ degrees of freedom)
P.Q. (pivotal quantity): $T$
Review: Theorem (sampling from the normal population). Let $X_1, X_2, \ldots, X_n \overset{i.i.d.}{\sim} N(\mu, \sigma^2)$; then
1) $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
2) $W = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
$\bar{X}$ and $S^2$ (and thus $W$) are independent. Thus we have:
3) $T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$
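Wrong test for a two-sided alternative hypothesis (using Z): reject $H_0$ if $|z_0| \ge Z_{\alpha/2}$.
Right test for a two-sided alternative hypothesis (using T): reject $H_0$ if $|t_0| \ge t_{n-1,\alpha/2}$.
(Because the t distribution has heavier tails than the normal distribution, $t_{n-1,\alpha/2} > Z_{\alpha/2}$; using $Z_{\alpha/2}$ would reject a true $H_0$ more often than the nominal rate $\alpha$.)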
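A quick numerical check of the parenthetical remark, written in Python purely for illustration: `t_sf` is a hypothetical helper (not a library function) that approximates R's `1 - pt(t, k)` by applying Simpson's rule to the $t_k$ density. The sketch computes the true Type I error of the wrong rule $|t_0| \ge Z_{\alpha/2}$ when the statistic actually follows $t_k$.

```python
import math
from statistics import NormalDist


def t_sf(t, k, steps=1000):
    """Hypothetical helper: P(T >= t) for T ~ t_k (numerical stand-in for 1 - pt(t, k) in R).

    Integrates the t_k density over [0, |t|] with Simpson's rule (steps must be even)
    and uses the symmetry of the distribution about 0.
    """
    log_c = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
             - 0.5 * math.log(k * math.pi))

    def pdf(x):
        return math.exp(log_c - (k + 1) / 2 * math.log1p(x * x / k))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def wrong_rule_type1_error(k, alpha=0.05):
    """True P(|T| >= Z_{alpha/2}) when T ~ t_k, i.e. the real size of the wrong Z rule."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}, about 1.96 for alpha = 0.05
    return 2 * t_sf(z, k)


for k in (2, 5, 8, 30, 100):
    print(f"df = {k:3d}: actual Type I error = {wrong_rule_type1_error(k):.4f} (nominal 0.05)")
```

The actual rate exceeds 0.05 for every $k$ and shrinks toward 0.05 as $k$ grows, which is why the Z cutoff is only acceptable for large samples.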
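Right Test
* Hypotheses: $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$
* Test statistic: $T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \overset{H_0}{\sim} t_{n-1}$
* Rejection region: reject $H_0$ at significance level $\alpha$ if the observed test statistic value satisfies $|t_0| \ge t_{n-1,\alpha/2}$
* p-value: $2\,P(T_0 \ge |t_0| \mid H_0)$, i.e., the shaded tail area under the $t_{n-1}$ pdf to the right of $|t_0|$, times 2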
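A minimal sketch of the two-sided right test from summary statistics, assuming Python. `t_sf` and `t_crit` are hypothetical numerical stand-ins for R's `pt` and `qt` (Simpson's rule plus bisection), not library functions.

```python
import math


def t_sf(t, k, steps=1000):
    """Hypothetical helper: P(T >= t) for T ~ t_k via Simpson's rule on [0, |t|] and symmetry."""
    log_c = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
             - 0.5 * math.log(k * math.pi))

    def pdf(x):
        return math.exp(log_c - (k + 1) / 2 * math.log1p(x * x / k))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_crit(alpha, k):
    """Hypothetical helper: upper critical value t_{k,alpha}, i.e. P(T >= t_{k,alpha}) = alpha.

    Solved by bisection on t_sf; assumes 0 < alpha < 0.5 and a critical value below 50.
    """
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_sf(mid, k) > alpha:
            lo = mid  # P(T >= mid) > alpha, so mid is left of the critical value
        else:
            hi = mid
    return (lo + hi) / 2


def two_sided_t_test(xbar, s, n, mu0, alpha=0.05):
    """Two-sided one-sample t-test of H0: mu = mu0 versus Ha: mu != mu0 from summary statistics."""
    k = n - 1
    t0 = (xbar - mu0) / (s / math.sqrt(n))  # observed test statistic
    crit = t_crit(alpha / 2, k)             # t_{n-1, alpha/2}
    p_value = 2 * t_sf(abs(t0), k)          # twice the tail area to the right of |t0|
    return t0, crit, p_value, abs(t0) >= crit


t0, crit, p_value, reject = two_sided_t_test(546.67, 53.09, 9, 525)
print(f"t0 = {t0:.3f}, t_(8, 0.025) = {crit:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

For illustration only, the call runs the two-sided version on the Scenario B summary statistics from the example below; the example itself uses a one-sided alternative.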
Further Review:
1. Definition: t-distribution
$T = \frac{Z}{\sqrt{W/k}} \sim t_k$, where
$Z \sim N(0,1)$;
$W \sim \chi^2_k$ (chi-square distribution with $k$ degrees of freedom);
$Z$ and $W$ are independent.
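The definition can be checked by simulation. This Python sketch (the seed and the number of simulations are arbitrary choices) builds $T = Z/\sqrt{W/k}$ from independent draws, forming $W$ as a sum of $k$ squared standard normals, and estimates $P(T > 1.86)$ for $k = 8$, which should be near 0.05 because $t_{8,0.05} = 1.86$.

```python
import math
import random


def simulate_t_tail(k, c, n_sims=50_000, seed=2015):
    """Estimate P(T > c) where T = Z / sqrt(W / k) is built from independent Z and W."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        z = rng.gauss(0.0, 1.0)                               # Z ~ N(0, 1)
        w = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))  # W ~ chi-square(k), independent of Z
        if z / math.sqrt(w / k) > c:
            hits += 1
    return hits / n_sims


print(simulate_t_tail(k=8, c=1.86))  # should be close to 0.05
```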
2. Def 1: chi-square distribution, from the definition of the gamma distribution: $W \sim \text{gamma}(\alpha = k/2, \beta = 2)$
MGF: $M(t) = \left(\frac{1}{1-2t}\right)^{k/2}$, for $t < 1/2$
Mean and variance: $E(W) = k$; $Var(W) = 2k$
Def 2: chi-square distribution: Let $Z_1, Z_2, \ldots, Z_k \overset{i.i.d.}{\sim} N(0,1)$; then $W = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k$
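As an illustrative check that Def 1 and Def 2 describe the same distribution, the sketch below draws from both and compares the sample mean and variance with $k$ and $2k$. It relies on Python's `random.gammavariate(alpha, beta)`, whose `beta` is a scale parameter (mean $\alpha\beta$), matching gamma($\alpha = k/2$, $\beta = 2$); the choice $k = 5$ and the seed are arbitrary.

```python
import random
from statistics import fmean, variance


def chi2_def1(k, rng):
    """Def 1: gamma with alpha = k/2 and scale beta = 2."""
    return rng.gammavariate(k / 2, 2.0)


def chi2_def2(k, rng):
    """Def 2: sum of k squared i.i.d. N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(7)
k, n_sims = 5, 50_000
results = {}
for name, draw in (("Def 1", chi2_def1), ("Def 2", chi2_def2)):
    sample = [draw(k, rng) for _ in range(n_sims)]
    results[name] = (fmean(sample), variance(sample))
    # Both should be near E(W) = k = 5 and Var(W) = 2k = 10.
    print(f"{name}: mean = {results[name][0]:.3f}, variance = {results[name][1]:.3f}")
```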
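3. Now we prove part (3) of the theorem on sampling from the normal population:
Proof) $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$;
Let $W = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$. Since $\bar{X}$ and $S^2$ are independent, $Z$ and $W$ are independent, so by the definition of the t-distribution with $k = n-1$:
$T = \frac{Z}{\sqrt{W/(n-1)}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n-1)}} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$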
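Part (3) can also be checked empirically. Assuming a hypothetical normal population with $\mu = 525$ and $\sigma = 50$ (values chosen only for illustration), the sketch draws samples of size $n = 9$, computes $T = (\bar{X} - \mu)/(S/\sqrt{n})$, and estimates $P(T > 1.86)$; if $T \sim t_8$, the estimate should be close to 0.05. Note that $\sigma$ is used only to generate the data, never inside $T$.

```python
import math
import random


def studentized_mean_tail(n, c, mu=525.0, sigma=50.0, n_sims=50_000, seed=1908):
    """Estimate P(T > c) for T = (Xbar - mu) / (S / sqrt(n)) computed from N(mu, sigma^2) samples.

    mu, sigma and seed are arbitrary illustrative choices; sigma is used only to generate data.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # sample SD, divisor n - 1
        if (xbar - mu) / (s / math.sqrt(n)) > c:
            hits += 1
    return hits / n_sims


print(studentized_mean_tail(n=9, c=1.86))  # close to 0.05 if T ~ t_8, since t_{8,0.05} = 1.86
```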
The derivation of the one-sample t-test based on the pivotal quantity method follows
the same procedure as the derivation of the one-sample Z-test. The following is a
summary of the decision rules for the T-test, using either the rejection region
approach or the p-value approach:
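Observed value of the test statistic: $t_0$, the realized value of $T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \overset{H_0}{\sim} t_{n-1}$.
Rejection region: we reject $H_0$ in favor of $H_a$ at the significance level $\alpha$ if the condition below holds.

(1) $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$
    Rejection region: $T_0 \ge t_{n-1,\alpha}$
    p-value: $P(T_0 \ge t_0 \mid H_0)$, the area under the $t_{n-1}$ pdf to the right of $t_0$
(2) $H_0: \mu = \mu_0$ versus $H_a: \mu < \mu_0$
    Rejection region: $T_0 \le -t_{n-1,\alpha}$
    p-value: $P(T_0 \le t_0 \mid H_0)$, the area under the $t_{n-1}$ pdf to the left of $t_0$
(3) $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$
    Rejection region: $|T_0| \ge t_{n-1,\alpha/2}$
    p-value: $P(|T_0| \ge |t_0| \mid H_0) = 2\,P(T_0 \ge |t_0| \mid H_0)$, twice the area under the $t_{n-1}$ pdf to the right of $|t_0|$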
The figure above depicts the equivalence of the rejection region method and the p-value method for decision making in the one-sided test $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$ (figure source: http://images.frompo.com/), in that:
p-value < α ⇔ $t_0 > t_{n-1,\alpha}$ (the test statistic falls inside the rejection region); and
p-value > α ⇔ $t_0 < t_{n-1,\alpha}$ (the test statistic falls outside the rejection region).
Here $t_0$ is referred to as the observed test statistic value, and $t_{n-1,\alpha}$ is referred to as the critical value.
Such equivalence between the two methods holds for all three pairs of hypotheses.
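The three decision rules, and the equivalence of the two methods, can be sketched in Python as follows. `t_sf` and `t_crit` are hypothetical helpers (Simpson's rule and bisection) standing in for R's `pt` and `qt`; the sample means in the loop are made up, while $s = 53.09$, $n = 9$ and $\mu_0 = 525$ are borrowed from the example below.

```python
import math


def t_sf(t, k, steps=1000):
    """Hypothetical helper: P(T >= t) for T ~ t_k via Simpson's rule on [0, |t|] and symmetry."""
    log_c = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
             - 0.5 * math.log(k * math.pi))

    def pdf(x):
        return math.exp(log_c - (k + 1) / 2 * math.log1p(x * x / k))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_crit(alpha, k):
    """Hypothetical helper: t_{k,alpha} by bisection (assumes 0 < alpha < 0.5, value below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_sf(mid, k) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t_test(xbar, s, n, mu0, alternative, alpha=0.05):
    """Return (t0, reject_by_region, p_value, reject_by_p) for H0: mu = mu0.

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0) or "two-sided".
    """
    k = n - 1
    t0 = (xbar - mu0) / (s / math.sqrt(n))
    if alternative == "greater":
        region = t0 >= t_crit(alpha, k)
        p_value = t_sf(t0, k)                 # area to the right of t0
    elif alternative == "less":
        region = t0 <= -t_crit(alpha, k)
        p_value = t_sf(-t0, k)                # area to the left of t0, by symmetry
    elif alternative == "two-sided":
        region = abs(t0) >= t_crit(alpha / 2, k)
        p_value = 2 * t_sf(abs(t0), k)        # twice the area to the right of |t0|
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return t0, region, p_value, p_value < alpha


results = []
for xbar in (500.0, 525.0, 546.67, 560.0, 580.0):
    for alt in ("greater", "less", "two-sided"):
        t0, by_region, p, by_p = one_sample_t_test(xbar, 53.09, 9, 525.0, alt)
        results.append((xbar, alt, t0, by_region, p, by_p))
        print(f"xbar={xbar:7.2f} {alt:>9}: t0={t0:6.3f}, p={p:.4f}, "
              f"reject (region)={by_region}, reject (p-value)={by_p}")
```

In every row the two decisions coincide; they could only disagree if $t_0$ landed exactly on the critical value.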
Example. Jerry is planning to purchase a sporting goods store. He calculated that, in order to cover basic expenses, the average daily sales must be at least $525.
Scenario A. He checked the daily sales of 36 randomly selected business days and found the average daily sales to be $565, with a standard deviation of $150.
Scenario B. Now suppose he is only allowed to sample 9 days, and the sales on those 9 days are $510, 537, 548, 592, 503, 490, 601, 499, 640.
For A and B, please determine whether Jerry can conclude that the daily sales are at least $525 at the significance level $\alpha = 0.05$. What is the p-value for each scenario?
Solution A. Large sample: $n = 36 \ge 30$, $\bar{x} = 565$, $s = 150$
$H_0: \mu = 525$ versus $H_a: \mu > 525$
*** First perform the Shapiro-Wilk test to check for normality. If the population is normal, use the exact t-test; if not, use the large-sample Z-test. In the following, we assume the population is found not to be normal, but since the sample size is large, we use the approximate Z-test based on the CLT and Slutsky's theorem.
Test statistic: $z_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{565 - 525}{150/\sqrt{36}} = 1.6$
At the significance level $\alpha = 0.05$, we will reject $H_0$ if $z_0 \ge Z_{0.05} = 1.645$.
⇒ Since $z_0 = 1.6 < 1.645$, we cannot reject $H_0$.
p-value $= P(Z \ge 1.6) = 0.0548$
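The Scenario A arithmetic can be reproduced with Python's standard `statistics.NormalDist`. This is only a sketch of the large-sample Z-test computation; it does not perform the normality check described above.

```python
import math
from statistics import NormalDist

n, xbar, s, mu0, alpha = 36, 565.0, 150.0, 525.0, 0.05
z0 = (xbar - mu0) / (s / math.sqrt(n))    # (565 - 525) / (150 / 6) = 1.6
z_crit = NormalDist().inv_cdf(1 - alpha)  # Z_{0.05}, about 1.645
p_value = 1 - NormalDist().cdf(z0)        # upper-tail area, since Ha: mu > 525
reject = z0 >= z_crit
print(f"z0 = {z0:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```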
*** Alternatively, if the Shapiro-Wilk test supports normality of the population, it is better to perform the exact t-test.
Solution B. Small sample ⇒ Shapiro-Wilk test
⇒ If the population is normal, the t-test is suitable.
(*If the population is not normal and the sample size is small, we should use a nonparametric test such as the Wilcoxon signed-rank test.)
In the following, we assume the population is found to be normal.
$\bar{x} = 546.67$, $s = 53.09$, $n = 9$
$H_0: \mu = 525$ versus $H_a: \mu > 525$
Test statistic: $t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{546.67 - 525}{53.09/\sqrt{9}} = 1.22$
At the significance level $\alpha = 0.05$, we will reject $H_0$ if $t_0 \ge t_{8,0.05} = 1.86$.
⇒ Since $t_0 = 1.22 < 1.86$, we cannot reject $H_0$.
p-value: What is the p-value when $t_0 = 1.22$?
(Hint: use the 1-pt(1.22,8) command in R to obtain the p-value.)
(Review: use the -qt(0.05,8) command in R to obtain the critical value $t_{8,0.05} = 1.86$.)
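A sketch of the full Scenario B computation from the raw data, assuming Python. Here `t_sf(t, k)` plays the role of R's `1 - pt(t, k)` and `t_crit(alpha, k)` the role of `-qt(alpha, k)`; both are hypothetical helpers implemented numerically (Simpson's rule and bisection). The Shapiro-Wilk step is not reproduced.

```python
import math
from statistics import mean, stdev


def t_sf(t, k, steps=1000):
    """Hypothetical helper: P(T >= t) for T ~ t_k via Simpson's rule on [0, |t|] and symmetry."""
    log_c = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
             - 0.5 * math.log(k * math.pi))

    def pdf(x):
        return math.exp(log_c - (k + 1) / 2 * math.log1p(x * x / k))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_crit(alpha, k):
    """Hypothetical helper: t_{k,alpha} by bisection (assumes 0 < alpha < 0.5, value below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_sf(mid, k) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


sales = [510, 537, 548, 592, 503, 490, 601, 499, 640]
mu0, alpha = 525, 0.05
n = len(sales)
xbar, s = mean(sales), stdev(sales)     # sample mean and SD (divisor n - 1)
t0 = (xbar - mu0) / (s / math.sqrt(n))  # observed test statistic
crit = t_crit(alpha, n - 1)             # analogue of -qt(0.05, 8)
p_value = t_sf(t0, n - 1)               # analogue of 1 - pt(t0, 8), since Ha: mu > 525
print(f"xbar = {xbar:.2f}, s = {s:.2f}, t0 = {t0:.2f}, "
      f"critical value = {crit:.2f}, p-value = {p_value:.4f}, reject H0: {t0 >= crit}")
```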
Learning R:
Please study the following links on how to perform the Shapiro-Wilk normality test, and
how to perform the one-sample t-test in R. Enjoy!
https://stat.ethz.ch/R-manual/R-patched/library/stats/html/shapiro.test.html
http://www.r-bloggers.com/one-sample-students-t-test/
http://ww2.coastal.edu/kingw/statistics/R-tutorials/singlesample.html
http://www.stat.columbia.edu/~martin/W2024/R2.pdf
https://www.youtube.com/watch?v=kvmSAXhX9Hs
Topics in next lecture
* Power of the test
* Likelihood ratio test (for one population mean)