Statistical Model

Tools of Inference
3 major tools of inference
- point estimate (+ standard error)
- confidence interval (CI), for when a point estimate alone isn't enough
- hypothesis testing
We don't just want an estimate: we suspect the parameter has a particular value and want to test that.
Ex: average life span is 75 years, but scientists figure medicine and quality of life have improved
so perhaps it is 85 years now. So they want to test this hypothesis, by taking a random sample
and seeing whether the data supports that. We are testing a claim about the population; the claim
should be made before collecting the data.
New objective – we don’t just want to estimate parameters, but want to decide which of 2
contradictory hypotheses about the parameter is true.
A statistical hypothesis is a claim about the value of a parameter of the underlying
probability distribution.
Examples
1. Claim about mean running time of a program.
e.g. μ = 0.75
2. Claim p < 0.1 where p is the proportion of defective items in a manufacturing facility.
3. Claim 2 programs have same mean running time, i.e. μ1 = μ2
4. Claim data come from a particular distribution
In hypothesis testing there are two contradictory hypotheses being considered
Null hypothesis = assumed true, the current stance, without changing anything
Alternative hypothesis = what we are trying to prove
e.g. μ = 0.75 vs μ ≠ 0.75
p < 0.1 vs p ≥ 0.1
etc…
The objective is to decide which hypothesis is correct (supported by the data)
The logic is analogous to proof by contradiction.
Assume a hypothesis is true (the null hypothesis), H0. It usually represents the current situation
that we want to disprove, “no change”.
We look for evidence against the null hypothesis in favour of the alternative hypothesis, Ha or
H1.
The conclusion is either to reject H0 or to fail to reject it (often loosely called "accepting" H0).
So we set up the hypothesis so that Ha is what we want to show.
The evidence is given by the data and we will have evidence against H0 if the probability of
observing the data is small under the assumption that H0 is true (i.e. P(data | H0) is small).
Example – Manufacturing
Suppose we want to show that a new manufacturing process lowers the rate of defectives, which
was 0.1:
H0: p = 0.1
Ha: p < 0.1
where p is the rate of defectives using the new process
Can take random sample of products produced with the new process of size n = 200.
Let X be the number of defective items in the sample.
If H0 is true, then X ~ Bin(200,0.1) and E[X] = 20 (expected # of defectives)
If Ha is true, we would expect fewer than 20 defective items in the sample.
Let x be the observed value of X; we’d reject H0 if x is substantially less than 20.
Say reject H0 if x ≤ 15 (we will look at a more formal definition of the cut-off later)
Components to this test procedure:
1. Comparing hypotheses, H0 and Ha
2. A Test Statistic: a function of the observed data (in the manufacturing example, the test
statistic is x)
3. A Rejection Region: the set of all values of the test statistic for which H0 will be rejected
(in the example, x ≤ 15)
4. Decision/Conclusion
How to choose a rejection region
To determine the rejection region, we need to consider errors we might make.
1. Type I Error: reject H0 when it is true
In example, it is possible that we observe 13 defective items even if p = 0.1, in which case H0
would be erroneously rejected.
2. Type II Error: accept H0 when it is false
In example, it is possible that we observe 18 defective items even if p < 0.1, in which case H0
would be erroneously accepted.
We want to minimize the probability of these two errors. A Type I error is treated as the more
serious of the two, since changing the current state costs a lot of time/energy/money, whereas if
we wrongly don't change anything, we are no worse off than before.
Notation:
α = P(type I error)
β = P(type II error)
β depends on the particular value of the parameter in Ha
1 − β is the power of the test = P(H0 rejected when it is false, the ideal situation)
A powerful test has a large probability of rejecting H0 when we should
090305 R
              H0 is true      H0 is false (Ha is true)
Reject H0     Type I Error    ideal situation
FTR H0        no change       Type II Error
FTR = Fail to Reject
Problem: if we adjust the rejection region so that α is small, β is going to increase.
Solution: the Neyman-Pearson approach
fix α and choose a test (rejection region) with tolerable β
Back to Manufacturing Example
H0: p = 0.1
Ha: p < 0.1 (“one-sided alternative” or “left-sided alternative”)
n = 200
Test statistic: X ~ Bin(200,p) = # of defective items in the sample
x – observed value of the test statistic
Rejection region: x ≤ 15 (what we chose arbitrarily before)
Calculate α and β
α = P(type I error) = P(reject H0 when H0 is true)
= P(X ≤ 15 when X ~ Bin(200, 0.1))
= pbinom(15, 200, 0.1) [in R]
= 0.143
So when H0 is true, 14.3% of samples of size 200 will lead to a type I error.
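The type I error probability above is just a binomial cdf value, so it can be checked by summing the pmf directly. The sketch below uses Python's standard library rather than the R call quoted in the notes; `binom_cdf` is an illustrative helper name, not something from the notes.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p), summed directly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p0, cutoff = 200, 0.1, 15  # sample size, null value of p, reject H0 when x <= cutoff

# alpha = P(reject H0 | H0 true) = P(X <= 15) with X ~ Bin(200, 0.1)
alpha = binom_cdf(cutoff, n, p0)
print(f"alpha = P(X <= {cutoff} | p = {p0}) = {alpha:.3f}")
```

This plays the same role as pbinom(15, 200, 0.1) in R and should agree with the 0.143 quoted above.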
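We can decrease α by decreasing the size of the rejection region.
Suppose we wish to have α = 0.05; the rejection region x ≤ k can be found by solving the equation
P(X ≤ k when X ~ Bin(200, 0.1)) = 0.05
k = qbinom(0.05, 200, 0.1) = 13
Because X is discrete, no k gives exactly 0.05: qbinom returns the smallest k with P(X ≤ k) ≥ 0.05.
Here P(X ≤ 13) ≈ 0.057 and P(X ≤ 12) ≈ 0.032, so x ≤ 13 has α slightly above 0.05; to guarantee
α ≤ 0.05 use x ≤ 12.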
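Since the binomial is discrete, it is worth looking at the cdf on both sides of the target level when choosing the cut-off. The following sketch (Python standard library; `binom_cdf`, `k_quantile` and `k_conservative` are illustrative names) finds both the quantile that R's qbinom reports and the largest cut-off whose type I error stays at or below the target.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p), summed directly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p0, target = 200, 0.1, 0.05

# Smallest k with P(X <= k) >= target (the definition of a quantile used by qbinom).
k_quantile = next(k for k in range(n + 1) if binom_cdf(k, n, p0) >= target)

# Largest k with P(X <= k) <= target: rejecting when x <= k keeps alpha within the target.
k_conservative = max(k for k in range(n + 1) if binom_cdf(k, n, p0) <= target)

for k in (k_conservative, k_quantile):
    print(f"reject when x <= {k}: alpha = {binom_cdf(k, n, p0):.4f}")
```

Which of the two cut-offs to use is a design choice: the quantile gives a level close to the target but slightly above it, while the conservative cut-off never exceeds it.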
To calculate β, we need to fix a value for p which is less than 0.1
β(0.05) = P(type II error when p = 0.05) = P(not reject H0 when p = 0.05)
= P(X > 15 when X ~ Bin(200, 0.05))
= 0.044
Decreasing the size of the rejection region increases β.
Evaluating β at a smaller value of p makes type II errors less likely.
In general, when p is chosen further away from the null hypothesis' value, there is more
power because it is less likely that the two hypotheses are mixed up by the data (i.e. the
probability of the ideal situation, rejecting a false H0, goes up).
When p = 0.05, power = 1 − β = 0.956 (lots of power)
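To see how β and the power change with the true defective rate, the same binomial cdf can be evaluated at several values of p below 0.1. This is a sketch in Python's standard library; the grid of p values other than 0.05 is chosen here purely for illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p), summed directly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, cutoff = 200, 15  # reject H0 when x <= 15

def power(p):
    """P(reject H0) = P(X <= cutoff) when the true defective rate is p."""
    return binom_cdf(cutoff, n, p)

# Illustrative grid of true rates; only p = 0.05 appears in the notes.
for p in (0.09, 0.07, 0.05, 0.03):
    print(f"p = {p:.2f}: power = {power(p):.3f}, beta = {1 - power(p):.3f}")
```

The power should rise as p moves away from the null value 0.1, and evaluating `power(0.1)` gives back α for this rejection region.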
Neyman-Pearson Approach
Fix α at the largest value that can be tolerated and choose the rejection region accordingly.
Typical values of α: 0.01, 0.05, 0.1
α is called the "significance level" of the test
Tests for μ
H0: μ = μ0
Ha: three possibilities:
μ < μ0    one-sided alternative
μ > μ0    one-sided alternative
μ ≠ μ0    two-sided alternative
090310 T
Significance Tests
Tests for the Mean
Assume X1, …, Xn ~ N(μ, σ²) with known σ
If H0 is true, then X̄ ~ N(μ0, σ²/n), so
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$$
Example: α = 0.05, μ0 = 10
Ha: μ < μ0
If X̄ > 10, don't reject H0
If X̄ << 10 we do want to reject
Observed score Zobs ≤ −1.645 => reject
Ha: μ > μ0: reject if observed score Zobs ≥ 1.645
Ha: μ ≠ μ0: reject if |Zobs| ≥ 1.96 (two-sided test)
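The cut-offs for the one-sided and two-sided tests are quantiles of the standard normal distribution, so they can be looked up rather than memorized. A minimal sketch using `statistics.NormalDist` from Python's standard library (variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)
alpha = 0.05

# One-sided: all of alpha sits in one tail (use -z_one_sided for Ha: mu < mu0).
z_one_sided = std_normal.inv_cdf(1 - alpha)
# Two-sided: alpha is split equally between the two tails.
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)

print(f"one-sided cut-off: {z_one_sided:.3f}")
print(f"two-sided cut-off: {z_two_sided:.3f}")
```

In R the same values come from qnorm(0.95) and qnorm(0.975).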
Example – Laptop Batteries
Stated operating temp is 40°C
A sample of 9 batteries gives X̄ = 41.08
Assume σ = 1.5
Does this contradict the claim? (H0: μ = 40 vs Ha: μ ≠ 40)
$$Z_{obs} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{41.08 - 40}{1.5/\sqrt{9}} = 2.16$$
If α = 0.05, then we reject H0 since 2.16 > 1.96 = Z0.025 (i.e. a value this extreme is less than 5% likely under H0)
If α = 0.01, then we don't reject H0 since 2.16 < 2.576 = Z0.005 (i.e. a value this extreme is more than 1% likely under H0)
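The laptop battery test can be reproduced in a few lines, which also makes it easy to recheck the arithmetic of the test statistic. This is a sketch in Python's standard library; `z_statistic` is an illustrative helper name, and the data values are the ones given in the example.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(xbar, mu0, sigma, n):
    """Z = (xbar - mu0) / (sigma / sqrt(n)) for a mean with known sigma."""
    return (xbar - mu0) / (sigma / sqrt(n))

z_obs = z_statistic(xbar=41.08, mu0=40, sigma=1.5, n=9)
print(f"z_obs = {z_obs:.3f}")

decisions = {}
for alpha in (0.05, 0.01):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cut-off
    decisions[alpha] = "reject H0" if abs(z_obs) >= z_crit else "fail to reject H0"
    print(f"alpha = {alpha}: cut-off {z_crit:.3f} -> {decisions[alpha]}")
```

With these inputs the statistic is (41.08 − 40) / 0.5, so the decision flips between the two significance levels.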

P-value
Assuming H0 is true, the p-value is the probability of observing a value of the test statistic "as or
more extreme" than the one actually observed.
Laptop example:
Zobs = 2.16
If H0 is true, this is an observation from the N(0,1) distribution.
p-value = P(Z ≥ 2.16) + P(Z ≤ −2.16) = 2 × 0.0154 = 0.03
Smaller p-value => more evidence against H0
p-value > 0.1            => no evidence
0.05 < p-value < 0.1     => weak evidence
0.01 < p-value < 0.05    => moderate evidence
p-value < 0.01           => strong evidence
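The two-sided p-value and the rough evidence scale can be put together in a short sketch. It uses Python's standard library; the function names are illustrative, the thresholds are the ones listed above, and how to treat a p-value exactly on a boundary is an assumption made here.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(z_obs):
    """P(|Z| >= |z_obs|) for Z ~ N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))

def evidence(p_value):
    """Map a p-value to the rough evidence scale (boundary handling is an assumption)."""
    if p_value < 0.01:
        return "strong evidence"
    if p_value < 0.05:
        return "moderate evidence"
    if p_value < 0.1:
        return "weak evidence"
    return "no evidence"

# Laptop battery data: xbar = 41.08, mu0 = 40, sigma = 1.5, n = 9
z_obs = (41.08 - 40) / (1.5 / sqrt(9))
p_value = two_sided_p_value(z_obs)
print(f"z_obs = {z_obs:.2f}, p-value = {p_value:.4f}: {evidence(p_value)} against H0")
```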
Power
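Power = 1 − β = 1 − P(type II error) = P(reject H0 when it is false)
First consider an upper-tailed test for the mean with iid, normally distributed data, σ known,
significance level α
H0: μ = μ0 against Ha: μ > μ0
Reject if Zobs ≥ Zα, i.e.
$$\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge Z_\alpha \iff \bar{X} \ge \mu_0 + Z_\alpha \frac{\sigma}{\sqrt{n}}$$
μ' is the true mean of the population
$$
\begin{aligned}
\text{Power}(\mu') &= P(H_0 \text{ is rejected when the mean is } \mu') \\
&= P\left(\bar{X} \ge \mu_0 + Z_\alpha \frac{\sigma}{\sqrt{n}}\right), \quad \text{where } \bar{X} \sim N(\mu', \sigma^2/n) \text{ under } H_a \\
&= P\left(\frac{\bar{X} - \mu'}{\sigma/\sqrt{n}} \ge Z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) \\
&= P\left(Z \ge Z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) \\
&= 1 - \Phi\left(Z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)
\end{aligned}
$$
where Φ is the standard normal cdf.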
How to increase power?
- increase α, which enlarges the rejection region
- increase n
- decrease σ
- evaluate at values of μ' further away from μ0
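Each of these four levers can be checked numerically with the power formula for the upper-tailed test, 1 − Φ(Zα + (μ0 − μ')/(σ/√n)). The sketch below uses Python's standard library; the baseline numbers (μ0 = 40, σ = 1.5, n = 9, μ' = 41, α = 0.05) are assumptions picked only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tailed(mu_true, mu0, sigma, n, alpha):
    """Power of the test of H0: mu = mu0 vs Ha: mu > mu0 with known sigma."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    shift = (mu0 - mu_true) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z_alpha + shift)

# Illustrative baseline (assumed values, not taken from a worked example).
base = dict(mu_true=41, mu0=40, sigma=1.5, n=9, alpha=0.05)
variants = {
    "baseline": base,
    "larger alpha": {**base, "alpha": 0.10},
    "larger n": {**base, "n": 36},
    "smaller sigma": {**base, "sigma": 1.0},
    "mu' further away": {**base, "mu_true": 42},
}
for label, args in variants.items():
    print(f"{label:>18}: power = {power_upper_tailed(**args):.3f}")
```

Setting `mu_true` equal to `mu0` returns α, which is a quick sanity check on the formula.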
Exercise:
Show that the power is
$$\text{Power}(\mu') = \Phi\left(-Z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)$$
when Ha: μ < μ0, and
$$\text{Power}(\mu') = 1 - \Phi\left(Z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) + \Phi\left(-Z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)$$
when Ha: μ ≠ μ0.
Laptop Example
H0: μ = 40
Ha: μ ≠ 40
σ = 1.5, n = 9, don't need to know X-bar
What is the power when α = 0.01, μ' = 41?
Z0.005 = 2.576
$$
\begin{aligned}
\text{Power}(41) &= 1 - \Phi\left(2.576 + \frac{40 - 41}{1.5/\sqrt{9}}\right) + \Phi\left(-2.576 + \frac{40 - 41}{1.5/\sqrt{9}}\right) \\
&= 1 - \Phi(0.576) + \Phi(-4.576) \\
&\approx 0.28
\end{aligned}
$$
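The power of the two-sided laptop test can be evaluated directly from the two-sided formula, 1 − Φ(Zα/2 + d) + Φ(−Zα/2 + d) with d = (μ0 − μ')/(σ/√n). A minimal sketch with Python's standard library (`power_two_sided` is an illustrative name):

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided(mu_true, mu0, sigma, n, alpha):
    """Power of the test of H0: mu = mu0 vs Ha: mu != mu0 with known sigma."""
    phi = NormalDist().cdf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    d = (mu0 - mu_true) / (sigma / sqrt(n))
    # P(Z >= z + d) is the upper rejection tail, P(Z <= -z + d) the lower one.
    return (1 - phi(z + d)) + phi(-z + d)

laptop_power = power_two_sided(mu_true=41, mu0=40, sigma=1.5, n=9, alpha=0.01)
print(f"Power(41) = {laptop_power:.3f}")
```

Almost all of the power here comes from the upper tail; the lower-tail term is negligible because the true mean lies above 40.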
What if σ is not known?
Assume data are realizations from X1, …, Xn, iid N(μ, σ²). Want to test H0: μ = μ0.
Estimate σ by the sample standard deviation
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$
Test statistic
$$t_{obs} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
is, if H0 is true, an observation from the t-distribution with n−1 degrees of freedom
By CLT, works well for large n even if X1, …, Xn are not normally distributed. (works well => p-value
is approximately accurate)
090312 R
Review
H0: μ = μ0
Assume data are normally distributed (if not, by CLT X-bar is still approximately normal for large n,
so the test still works well).
Test statistic is
$$t_{obs} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
where, if H0 is correct, tobs ~ t(n−1)
Calculate p-value from t distribution with n−1 degrees of freedom.
Example
In 1996, the US increased the speed limit on interstate highways.
Data: percent change in traffic fatalities in the year 1996 in 32 states on these highways.
Let μ be the mean of the percent change in fatalities
Test H0: μ = 0 vs Ha: μ ≠ 0
Observed X-bar = 13.75, n = 32, s = 21.33
Test statistic is
$$t_{obs} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{13.75 - 0}{21.33/\sqrt{32}} = 3.65$$
which is compared with the t31 distribution
p-value = P(|T| ≥ 3.65), where T ~ t31
from tables: p-value < 2 × 0.0005 = 0.001

from R:
> 1 - pt(3.65, 31) + pt(-3.65, 31)    # = 0.0009
> t.test(fatalities.change.increase)
[Figure: the p-value shown as the area in the two tails beyond −3.65 and 3.65]
Conclusion: Strong evidence that the mean of the change in traffic fatalities was not 0.
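For readers without R, the speed-limit test can be reproduced from the summary statistics alone. Python's standard library has no t distribution, so the sketch below integrates the t density numerically with Simpson's rule; that numerical shortcut and the helper names `t_pdf` and `t_upper_tail` are choices made here, not part of the notes.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus Simpson's-rule integral of the density on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the speed-limit example.
xbar, mu0, s, n = 13.75, 0.0, 21.33, 32
t_obs = (xbar - mu0) / (s / sqrt(n))
p_value = 2 * t_upper_tail(abs(t_obs), df=n - 1)  # two-sided alternative
print(f"t_obs = {t_obs:.2f} on {n - 1} df, two-sided p-value = {p_value:.4f}")
```

The result should be in line with the table bound (below 0.001) and with the pt() calculation in R.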
Test for μ in any population when n is large
We wish to test H0: μ = μ0
The data X1, …, Xn are a random sample from some distribution with mean μ.
When n is large we use the Z statistic,
$$Z_{obs} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
since for large n, the t distribution approaches the standard normal.

Example: Test for p in Bernoulli distribution
Suppose X1, …, Xn, iid Bernoulli(p), n is large
E(Xi) = μ = p
Var(Xi) = p(1−p)
Want to test hypothesis about p, e.g. H0: p = p0
Natural estimate of p is
$$\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$
Further, Var(p̂) = p(1−p)/n
Under H0, p̂ ~ N(p0, p0(1 − p0)/n) approximately, by CLT
Test Statistic:
$$Z_{obs} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \sim N(0,1)$$
Calculate p-value using the N(0,1) distribution
Example: H0: p = p0 vs Ha: p > p0
p-value = P(Z > Zobs)
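A large-sample test for a proportion follows the same recipe as the Z test for a mean, with the standard error computed from p0. The sketch below uses Python's standard library; the function name and the data (28 successes in 200 trials, p0 = 0.1) are hypothetical, chosen only to show the mechanics.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0, alternative="greater"):
    """Large-sample Z test of H0: p = p0 from x successes in n Bernoulli trials."""
    p_hat = x / n
    z_obs = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0 under H0
    phi = NormalDist().cdf
    if alternative == "greater":      # Ha: p > p0
        p_value = 1 - phi(z_obs)
    elif alternative == "less":       # Ha: p < p0
        p_value = phi(z_obs)
    elif alternative == "two-sided":  # Ha: p != p0
        p_value = 2 * (1 - phi(abs(z_obs)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z_obs, p_value

# Hypothetical data: 28 successes out of 200, testing H0: p = 0.1 vs Ha: p > 0.1
z_obs, p_value = proportion_z_test(x=28, n=200, p0=0.1, alternative="greater")
print(f"z_obs = {z_obs:.3f}, p-value = {p_value:.4f}")
```

The normal approximation is only as good as the CLT allows, so this is meant for large n with p0 not too close to 0 or 1.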
Example
Do one on your own.