hypothesis testing

HYPOTHESIS TESTING
Hypothesis tests concern the parameters of a distribution.
A statistical test is based on:
- parameter estimates computed from sample data,
- a theoretical statistical distribution, determined by the hypothesis to be tested.
The null hypothesis $H_0$ is an assumption concerning one or more parameters characterising a phenomenon:
$$H_0: \theta = \theta_0$$
The alternative hypothesis $H_1$ is an assumption opposed to the null hypothesis; it is the one supported when $H_0$ is not true.
$K$ is a subset of the sampling space supporting $H_0$,
$W$ is a subset of the sampling space supporting $H_1$ (with $K \cap W = \emptyset$).
(The sampling space gives the range of all possible values of a random variable generated by sampling.)
Given a sample and a sample statistic, the question is:
how likely is the observed value of the statistic if $H_0$ is true?
If that probability is too low, we are inclined to reject $H_0$.
Because an alternative hypothesis exists, errors are possible both when rejecting and when not rejecting $H_0$.
The probability of wrongly rejecting $H_0$ is kept small: it equals the probability of the rejection region under $H_0$ ($\alpha$, the significance level).
In summary:
- choose $H_0$;
- choose the test statistic $T = t(X_1, X_2, \ldots, X_n)$, whose probability distribution is known under $H_0$;
- identify the region of $T$ values that, under $H_0$, fall close to the hypothesised parameter value with high probability;
- compare the $T$ observed in the sample, $t_{obs} = t(x_1, x_2, \ldots, x_n)$, with that region and decide according to where it falls;
- define a threshold $t$ and, as a consequence, a region for rejecting the hypothesis:
a) one-tail test:
$H_0: \theta = \theta_0$
$H_1: \theta > \theta_0$
(the $T$ values leading to rejection are those far enough above $\theta_0$, where "far enough" depends on the variability of the distribution):
$$P(T \ge t_\alpha \mid H_0) = \alpha$$
rejection region: $[t_\alpha, +\infty)$
b) one-tail test:
$H_0: \theta = \theta_0$
$H_1: \theta < \theta_0$
(the $T$ values leading to rejection are those far enough below $\theta_0$, where "far enough" depends on the variability of the distribution):
$$P(T \le -t_\alpha \mid H_0) = \alpha$$
rejection region: $(-\infty, -t_\alpha]$
c) two-tail test:
$H_0: \theta = \theta_0$
$H_1: \theta \ne \theta_0$
(the $T$ values leading to rejection are those far enough from $\theta_0$ on either side, where "far enough" depends on the variability of the distribution):
$$P(T \le -t_{\alpha/2} \mid H_0) = P(T \ge t_{\alpha/2} \mid H_0) = \alpha/2$$
rejection region: $(-\infty, -t_{\alpha/2}] \cup [t_{\alpha/2}, +\infty)$
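The three rejection regions can be sketched numerically. The example below assumes, purely for illustration, that $T$ is standard normal under $H_0$; the function name `rejection_region` and the labels "right", "left" and "two" are hypothetical and do not come from the notes.

```python
from statistics import NormalDist


def rejection_region(alpha, tail):
    """Rejection region for a statistic T assumed to be N(0, 1) under H0.

    tail: "right" for H1: theta > theta0,
          "left"  for H1: theta < theta0,
          "two"   for H1: theta != theta0.
    Returns a list of (low, high) intervals.
    """
    std_normal = NormalDist()  # assumption: T ~ N(0, 1) under H0
    inf = float("inf")
    if tail == "right":
        t = std_normal.inv_cdf(1 - alpha)      # threshold t_alpha
        return [(t, inf)]
    if tail == "left":
        t = std_normal.inv_cdf(1 - alpha)      # same t_alpha, mirrored
        return [(-inf, -t)]
    if tail == "two":
        t = std_normal.inv_cdf(1 - alpha / 2)  # threshold t_{alpha/2}
        return [(-inf, -t), (t, inf)]
    raise ValueError("tail must be 'right', 'left' or 'two'")


for tail in ("right", "left", "two"):
    print(tail, rejection_region(0.05, tail))
```

For the same $\alpha$, the two-tail threshold is larger in absolute value than the one-tail threshold, because $\alpha$ is split between the two tails.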
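In summary, once the value of $T$ in the sample ($t_{obs}$) is known:
a) one-tail test: if $t_{obs} \ge t_\alpha$, reject $H_0$
b) one-tail test: if $t_{obs} \le -t_\alpha$, reject $H_0$
c) two-tail test: if $t_{obs} \le -t_{\alpha/2}$ or $t_{obs} \ge t_{\alpha/2}$ (that is, $|t_{obs}| \ge t_{\alpha/2}$), reject $H_0$
Observed significance level (p-value):
$$p_{obs} = P(|T| \ge |t_{obs}| \mid H_0)$$
For a one-tail test, $p_{obs}$ is computed on the relevant tail only.
If $p_{obs} \le \alpha$, reject $H_0$.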
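As a rough illustration of the p-value rule, the sketch below computes $p_{obs}$ for a statistic assumed to be standard normal under $H_0$. The function `p_value`, its `tail` labels and the observed value 2.1 are hypothetical choices made only for this example.

```python
from statistics import NormalDist


def p_value(t_obs, tail="two"):
    """Observed significance level for T assumed N(0, 1) under H0."""
    std_normal = NormalDist()
    if tail == "right":   # H1: theta > theta0, upper tail only
        return 1 - std_normal.cdf(t_obs)
    if tail == "left":    # H1: theta < theta0, lower tail only
        return std_normal.cdf(t_obs)
    if tail == "two":     # H1: theta != theta0, P(|T| >= |t_obs|)
        return 2 * (1 - std_normal.cdf(abs(t_obs)))
    raise ValueError("tail must be 'right', 'left' or 'two'")


alpha = 0.05
p_obs = p_value(2.1, tail="two")  # hypothetical observed statistic
print(round(p_obs, 4), "reject H0" if p_obs <= alpha else "do not reject H0")
```

Comparing $p_{obs}$ with $\alpha$ gives the same decision as comparing $t_{obs}$ with the threshold, without having to look the threshold up first.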
Type I error
$H_0$ is true and we reject it.
Type II error
$H_0$ is not true and we do not reject it.
The two kinds of error must be balanced against each other.
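The balance between the two errors can be seen in a small numerical sketch. It assumes a right-tail test on a normal mean with known standard deviation and a single alternative value `mu1`; the function name and all the numbers are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def error_probabilities(alpha, mu0, mu1, sigma, n):
    """Type I and type II error probabilities for a right-tail test of
    H0: mu = mu0 against the single alternative mu = mu1 > mu0,
    assuming a normal population with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under H0
    # If mu = mu1, the standardised statistic is N(shift, 1).
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    beta = std_normal.cdf(z_crit - shift)    # P(do not reject H0 | mu = mu1)
    return alpha, beta


# Hypothetical values: mu0 = 100, alternative 103, sigma = 10, n = 25.
for alpha in (0.10, 0.05, 0.01):
    a, b = error_probabilities(alpha, mu0=100, mu1=103, sigma=10, n=25)
    print(f"alpha = {a:.2f}  beta = {b:.3f}")
```

With the sample size fixed, lowering the type I error probability raises the type II error probability, which is why the two have to be traded off.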
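TEST ON THE MEAN WHEN THE VARIANCE IS KNOWN
$$H_0: \mu = \mu_0$$
Test statistic: $\bar{X}$
We need to standardise it (so that probabilities are easy to calculate), taking $H_0$ into account:
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$$
The observed value $z_{obs}$ is:
$$z_{obs} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
$$p_{obs} = P(Z \ge z_{obs} \mid H_0)$$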
We reject when:
1) alternative hypothesis $H_1: \mu > \mu_0$ (one tail)
Given $\alpha$, look up in the $N(0,1)$ table the value $z^*_\alpha$ such that:
$$P(Z \ge z^*_\alpha \mid H_0) = \alpha$$
If $z_{obs} \ge z^*_\alpha$: reject $H_0$
If $z_{obs} < z^*_\alpha$: do not reject $H_0$
or
If $p_{obs} \le \alpha$: reject $H_0$
If $p_{obs} > \alpha$: do not reject $H_0$
Note:
if $\bar{x} \ge \mu_0 + z^*_\alpha \, \sigma/\sqrt{n}$: reject $H_0$
($\bar{x}$ far from $\mu_0$, that is from $H_0$: reject $H_0$)
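The three ways of stating the right-tail rule (on $z_{obs}$, on $p_{obs}$ and directly on $\bar{x}$) lead to the same decision. A minimal sketch, with a hypothetical function name and made-up sample values, checks this:

```python
from math import sqrt
from statistics import NormalDist


def right_tail_rules(x_bar, mu0, sigma, n, alpha):
    """Evaluate the right-tail test (H1: mu > mu0) in its three forms."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)                     # standard error of the mean
    z_star = std_normal.inv_cdf(1 - alpha)   # table value z*_alpha
    z_obs = (x_bar - mu0) / se
    p_obs = 1 - std_normal.cdf(z_obs)
    return {
        "z rule": z_obs >= z_star,
        "p rule": p_obs <= alpha,
        "x_bar rule": x_bar >= mu0 + z_star * se,
    }


# Hypothetical data: mu0 = 50, sigma = 6, n = 36, so sigma/sqrt(n) = 1.
print(right_tail_rules(52.5, mu0=50, sigma=6, n=36, alpha=0.05))
print(right_tail_rules(51.0, mu0=50, sigma=6, n=36, alpha=0.05))
```

The first call has $z_{obs} = 2.5$ and rejects under all three rules; the second has $z_{obs} = 1$ and rejects under none.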
2) alternative hypothesis $H_1: \mu < \mu_0$ (one tail)
Given $\alpha$, look up in the $N(0,1)$ table the value $z^*_\alpha$ such that:
$$P(Z \ge z^*_\alpha \mid H_0) = \alpha$$
Owing to the symmetry of $N(0,1)$:
$$P(Z \ge z^*_\alpha \mid H_0) = P(Z \le -z^*_\alpha \mid H_0) = \alpha$$
If $z_{obs} \le -z^*_\alpha$: reject $H_0$
If $z_{obs} > -z^*_\alpha$: do not reject $H_0$
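3) alternative hypothesis $H_1: \mu \ne \mu_0$ (two tails)
Given $\alpha$, to be split equally between the two tails, look up in the $N(0,1)$ table the value $z^*_{\alpha/2}$ such that:
$$P(Z \ge z^*_{\alpha/2} \mid H_0) = \alpha/2, \quad \text{corresponding to} \quad P(Z \le -z^*_{\alpha/2} \mid H_0) = \alpha/2$$
If $z_{obs} \ge z^*_{\alpha/2}$ or $z_{obs} \le -z^*_{\alpha/2}$: reject $H_0$
If $-z^*_{\alpha/2} < z_{obs} < z^*_{\alpha/2}$: do not reject $H_0$
The observed significance level $p_{obs}$ is:
$$p_{obs} = P(|Z| \ge |z_{obs}| \mid H_0) = 2\,P(Z \ge |z_{obs}| \mid H_0)$$
Then:
If $p_{obs} \le \alpha$: reject $H_0$
If $p_{obs} > \alpha$: do not reject $H_0$
Note:
$z_{obs} \ge z^*_{\alpha/2}$ or $z_{obs} \le -z^*_{\alpha/2}$ corresponds to the statement:
if $\bar{x} \ge \mu_0 + z^*_{\alpha/2} \, \sigma/\sqrt{n}$: reject $H_0$
or if $\bar{x} \le \mu_0 - z^*_{\alpha/2} \, \sigma/\sqrt{n}$: reject $H_0$
otherwise: do not reject $H_0$
(a formal statement equivalent to the informal one: "when $\bar{x}$ is far from $\mu_0$, that is from $H_0$, reject $H_0$")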
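The three cases can be gathered into one small function. This is only a sketch: the name `z_test_mean`, the `alternative` labels and the sample values are hypothetical, and the population is assumed normal with known $\sigma$, as in the notes.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, alpha, alternative):
    """Test on the mean with known variance; returns (z_obs, reject)."""
    std_normal = NormalDist()
    z_obs = (x_bar - mu0) / (sigma / sqrt(n))
    if alternative == "greater":        # case 1, H1: mu > mu0
        reject = z_obs >= std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":         # case 2, H1: mu < mu0
        reject = z_obs <= -std_normal.inv_cdf(1 - alpha)
    elif alternative == "two-sided":    # case 3, H1: mu != mu0
        reject = abs(z_obs) >= std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z_obs, reject


# Hypothetical data giving z_obs of about 1.8.
for alt in ("greater", "less", "two-sided"):
    z_obs, reject = z_test_mean(51.8, mu0=50, sigma=6, n=36,
                                alpha=0.05, alternative=alt)
    print(alt, round(z_obs, 2), "reject H0" if reject else "do not reject H0")
```

With these numbers the one-tail test against $\mu > \mu_0$ rejects $H_0$ while the two-tail test does not, since the two-tail threshold is larger for the same $\alpha$.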
Testing The Difference Between Two Means - Variances Are Known
Let $X_1, X_2, \ldots, X_n$ be a sample selected from a population $N(\mu_x, \sigma_x^2)$.
Let $Y_1, Y_2, \ldots, Y_m$ be a sample selected from a population $N(\mu_y, \sigma_y^2)$.
We want to test:
$$H_0: \mu_x = \mu_y \quad \text{vs} \quad H_1: \mu_x \ne \mu_y$$
Knowing that
$$\frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\sigma_x^2/n + \sigma_y^2/m}} \sim N(0,1)$$ [1]
and that $\mu_x - \mu_y = 0$ under $H_0$, then:
if $\bar{x} - \bar{y} \ge z_{\alpha/2} \sqrt{\sigma_x^2/n + \sigma_y^2/m}$
or
if $\bar{x} - \bar{y} \le -z_{\alpha/2} \sqrt{\sigma_x^2/n + \sigma_y^2/m}$
then $H_0$ is rejected;
if $-z_{\alpha/2} \sqrt{\sigma_x^2/n + \sigma_y^2/m} < \bar{x} - \bar{y} < z_{\alpha/2} \sqrt{\sigma_x^2/n + \sigma_y^2/m}$
then $H_0$ is not rejected.
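The decision rule based on [1] can be sketched as follows. The function name `two_sample_z` and the numerical inputs are hypothetical, chosen only to show the calculation.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x_bar, y_bar, var_x, var_y, n, m, alpha=0.05):
    """Two-tail test of H0: mu_x = mu_y with known variances.

    Returns (z_obs, z_crit, reject)."""
    se = sqrt(var_x / n + var_y / m)   # std. deviation of X_bar - Y_bar
    z_obs = (x_bar - y_bar) / se       # mu_x - mu_y = 0 under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_obs, z_crit, abs(z_obs) >= z_crit


# Hypothetical summary data for the two samples.
z_obs, z_crit, reject = two_sample_z(10.5, 10.0, var_x=4.0, var_y=9.0,
                                     n=100, m=100)
print(round(z_obs, 3), round(z_crit, 3),
      "reject H0" if reject else "do not reject H0")
```

Comparing $|z_{obs}|$ with $z_{\alpha/2}$ is the same as comparing $\bar{x} - \bar{y}$ with $\pm z_{\alpha/2}\sqrt{\sigma_x^2/n + \sigma_y^2/m}$.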
TESTING DIFFERENCE BETWEEN TWO MEANS - VARIANCES ARE UNKNOWN
If $\sigma_x^2$ and $\sigma_y^2$ are not known but it is possible to assume $\sigma_x^2 = \sigma_y^2$, then a common variance can be estimated using:
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2 + \sum_{i=1}^{m} (Y_i - \bar{Y})^2}{n + m - 2}$$
The non-rejection region is built considering that, under $H_0$,
$$\frac{\bar{X} - \bar{Y}}{s \sqrt{1/n + 1/m}} \sim t_{n+m-2}$$ [2]
If $\bar{x} - \bar{y} \ge t_{\alpha/2,\,n+m-2} \; s \sqrt{1/n + 1/m}$
or
if $\bar{x} - \bar{y} \le -t_{\alpha/2,\,n+m-2} \; s \sqrt{1/n + 1/m}$
then $H_0$ is rejected.
If $-t_{\alpha/2,\,n+m-2} \; s \sqrt{1/n + 1/m} < \bar{x} - \bar{y} < t_{\alpha/2,\,n+m-2} \; s \sqrt{1/n + 1/m}$
then $H_0$ is not rejected.
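A short sketch of the pooled-variance calculation follows. The function names and the two small samples are hypothetical. The critical value is passed in by hand, as it would be read from a Student $t$ table with $n + m - 2$ degrees of freedom; the value 2.306 used below is the tabulated two-tail 5% point for 8 degrees of freedom.

```python
from math import sqrt
from statistics import fmean


def pooled_t_statistic(xs, ys):
    """Return (t_obs, df, s2) for two samples assumed to share one variance."""
    n, m = len(xs), len(ys)
    x_bar, y_bar = fmean(xs), fmean(ys)
    ss_x = sum((x - x_bar) ** 2 for x in xs)   # sum of squares around x_bar
    ss_y = sum((y - y_bar) ** 2 for y in ys)   # sum of squares around y_bar
    df = n + m - 2
    s2 = (ss_x + ss_y) / df                    # pooled variance estimate
    t_obs = (x_bar - y_bar) / (sqrt(s2) * sqrt(1 / n + 1 / m))
    return t_obs, df, s2


def reject_h0(t_obs, t_crit):
    """Two-tail rule: reject when |t_obs| reaches the table value."""
    return abs(t_obs) >= t_crit


# Hypothetical samples of size 5 each, so df = 8.
xs = [5.1, 4.9, 5.6, 5.8, 6.0]
ys = [4.2, 4.8, 5.0, 4.4, 4.6]
t_obs, df, s2 = pooled_t_statistic(xs, ys)
print(round(t_obs, 3), df, round(s2, 4), reject_h0(t_obs, t_crit=2.306))
```

Only the standard library is used here, which is why the table value is supplied by hand rather than computed.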
Exercise 6
From a census survey we know that 70% of households shop in big stores.
After 3 years we run a survey on a sample of 600 households and find that 406 shop in big stores. Do we have enough evidence to say that households behave as they did in the census year? (Choose a high confidence level.)
Solution
- The sample is large, so a normal distribution can be assumed for the sample proportion.
- The null hypothesis is $H_0: p = p_0 = 0.7$; the alternative hypothesis is $H_1: p \ne p_0 = 0.7$.
- From the sample data we have $\hat{p} = 406/600 \approx 0.68$.
- Under the null hypothesis the sample proportion has a normal distribution with mean $p_0 = 0.7$ and variance $p_0(1 - p_0)/n = 0.00035$.
- We can refer to the $Z$ test; in the sample $z_c = (0.68 - 0.7)/\sqrt{0.00035} \approx -1.07$. (With the unrounded $\hat{p} \approx 0.677$, $z_c \approx -1.25$; the conclusion is unchanged.)
- With $\alpha = 0.01$, the rejection (unlikely) area is $z \le z_{0.005} = -2.576$ and $z \ge z_{0.995} = 2.576$.
- $-1.07$ lies in the likely region ($> -2.576$), so the null hypothesis is not rejected.
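The exercise can be checked numerically with a few lines. The variable names are hypothetical, and $\alpha = 0.01$ is assumed to match the threshold 2.576 used in the solution. Note that the script keeps $\hat{p} = 406/600$ unrounded, so it gives a $z$ value of about $-1.25$ rather than the $-1.07$ obtained with $\hat{p}$ rounded to 0.68; the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.7          # proportion from the census
n = 600           # sample size
successes = 406   # households shopping in big stores
alpha = 0.01      # assumption: matches the 2.576 threshold of the solution

p_hat = successes / n                     # sample proportion, about 0.677
se = sqrt(p0 * (1 - p0) / n)              # std. deviation under H0
z_obs = (p_hat - p0) / se
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject = abs(z_obs) >= z_crit

print(f"p_hat = {p_hat:.4f}, z_obs = {z_obs:.3f}, z_crit = {z_crit:.3f}")
print("reject H0" if reject else "do not reject H0")
```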