HYPOTHESIS TESTING

A statistical test concerns parameters. It is based on:
- statistics computed from sample data;
- a theoretical statistical distribution, derived from the hypothesis to be tested.

The null hypothesis $H_0$ is an assumption concerning one or more parameters characterising a phenomenon: $H_0: \theta = \theta_0$.

The alternative hypothesis $H_1$ is an assumption opposed to the null hypothesis: $H_0$ is not true.

The sampling space is split into two subsets $K$ and $W$ (with $K \cap W = \emptyset$): one is the subset of the sampling space supporting $H_0$, the other is the subset supporting $H_1$. (The sampling space is the range of all possible values of a random variable generated by sampling.)

Given a sample and a sample statistic, the question is: how likely is this value of the statistic if $H_0$ is true? If the probability is too low, we are inclined to reject $H_0$.

Because an alternative hypothesis exists, it is possible to make an error both when rejecting and when not rejecting $H_0$. The probability of rejecting $H_0$ when it is true is kept small: it is exactly the probability of the rejection region under $H_0$ ($\alpha$, the significance level).

In synthesis:
- choose $H_0$;
- choose the test statistic $T = t(X_1, X_2, \ldots, X_n)$, whose probability distribution is known given $H_0$;
- looking at the values of $T$, select a region of values that, under $H_0$, lie close to the hypothesised parameter value with high probability;
- compare the value of $T$ observed in the sample, $t_{obs} = t(x_1, x_2, \ldots, x_n)$, with that region and take a decision according to where it falls;
- define a threshold $t_\alpha$ and, as a consequence, a region for rejecting the hypothesis.

a) One-tail test: $H_0: \theta = \theta_0$, $H_1: \theta > \theta_0$
(the values of $T$ leading to rejection are far from $\theta_0$ and higher than $\theta_0$, depending on the variability of the distribution):
$P(T \ge t_\alpha \mid H_0) = \alpha$
rejection region: $[t_\alpha, +\infty)$

b) One-tail test: $H_0: \theta = \theta_0$, $H_1: \theta < \theta_0$
(the values of $T$ leading to rejection are far from $\theta_0$ and lower than $\theta_0$, depending on the variability of the distribution):
$P(T \le -t_\alpha \mid H_0) = \alpha$
rejection region: $(-\infty, -t_\alpha]$

c) Two-tail test: $H_0: \theta = \theta_0$, $H_1: \theta \ne \theta_0$
(the values of $T$ leading to rejection are far from $\theta_0$, either higher or lower than $\theta_0$, depending on the variability of the distribution):
$P(T \le -t_{\alpha/2} \mid H_0) = P(T \ge t_{\alpha/2} \mid H_0) = \alpha/2$
rejection region: $(-\infty, -t_{\alpha/2}]$ and $[t_{\alpha/2}, +\infty)$

In synthesis, once the value of $T$ in the sample ($t_{obs}$) is known:
a) one-tail test: if $t_{obs} \ge t_\alpha$, reject $H_0$;
b) one-tail test: if $t_{obs} \le -t_\alpha$, reject $H_0$;
c) two-tail test: if $t_{obs} \le -t_{\alpha/2}$ or $t_{obs} \ge t_{\alpha/2}$ (that is, $|t_{obs}| \ge t_{\alpha/2}$), reject $H_0$.

Observed significance level (p-value), two-tail test:
$p_{obs} = P(|T| \ge |t_{obs}| \mid H_0)$
For a one-tail test the probability is computed on the relevant tail only. In every case: if $p_{obs} \le \alpha$, reject $H_0$.

Type I error: $H_0$ is true and we reject it.
Type II error: $H_0$ is not true and we do not reject it.
It is necessary to strike a balance between the two kinds of error.

TEST ON THE MEAN WHEN THE VARIANCE IS KNOWN

$H_0: \mu = \mu_0$
Test statistic: $\bar{X}$

We need to standardise (so that probabilities are easy to calculate), taking $H_0$ into account. Under $H_0$:
$\mu_{\bar{X}} = \mu_0$, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$

The observed value is:
$z_{obs} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
$p_{obs} = P(Z \ge z_{obs} \mid H_0)$

We reject as follows.

1) Alternative hypothesis $H_1: \mu > \mu_0$ (one tail)
Given $\alpha$, look up in the table of $N(0,1)$ the value $z^*_\alpha$ such that $P(Z \ge z^*_\alpha \mid H_0) = \alpha$.
If $z_{obs} \ge z^*_\alpha$, reject $H_0$.
If $z_{obs} < z^*_\alpha$, do not reject $H_0$.
Equivalently: if $p_{obs} \le \alpha$, reject $H_0$; if $p_{obs} > \alpha$, do not reject $H_0$.
Note: if $\bar{x} \ge \mu_0 + z^*_\alpha \, \sigma/\sqrt{n}$, reject $H_0$ (when $\bar{x}$ is far from $\mu_0$, that is from $H_0$, reject $H_0$).

2) Alternative hypothesis $H_1: \mu < \mu_0$ (one tail)
Given $\alpha$, look up in the table of $N(0,1)$ the value $z^*_\alpha$ such that $P(Z \ge z^*_\alpha \mid H_0) = \alpha$.
Owing to the symmetry of $N(0,1)$: $P(Z \ge z^*_\alpha \mid H_0) = P(Z \le -z^*_\alpha \mid H_0) = \alpha$.
If $z_{obs} \le -z^*_\alpha$, reject $H_0$.
If $z_{obs} > -z^*_\alpha$, do not reject $H_0$.

3) Alternative hypothesis $H_1: \mu \ne \mu_0$ (two tails)
Given $\alpha$, to be split equally between the two tails, look up in the table of $N(0,1)$ the value $z^*_{\alpha/2}$ such that $P(Z \ge z^*_{\alpha/2} \mid H_0) = \alpha/2$, which corresponds to $P(Z \le -z^*_{\alpha/2} \mid H_0) = \alpha/2$.
If $z_{obs} \ge z^*_{\alpha/2}$ or $z_{obs} \le -z^*_{\alpha/2}$, reject $H_0$.
If $-z^*_{\alpha/2} < z_{obs} < z^*_{\alpha/2}$, do not reject $H_0$.
The observed significance level is:
$p_{obs} = P(|Z| \ge |z_{obs}| \mid H_0) = 2\,P(Z \ge |z_{obs}| \mid H_0)$
If $p_{obs} \le \alpha$, reject $H_0$; if $p_{obs} > \alpha$, do not reject $H_0$.
Note: $z_{obs} \ge z^*_{\alpha/2}$ or $z_{obs} \le -z^*_{\alpha/2}$ corresponds to the statement:
if $\bar{x} \ge \mu_0 + z^*_{\alpha/2} \, \sigma/\sqrt{n}$ or $\bar{x} \le \mu_0 - z^*_{\alpha/2} \, \sigma/\sqrt{n}$, reject $H_0$; otherwise do not reject $H_0$
(a formal statement equivalent to the informal one: "when $\bar{x}$ is far from $\mu_0$, that is from $H_0$, reject $H_0$").

TESTING THE DIFFERENCE BETWEEN TWO MEANS - VARIANCES ARE KNOWN

Let $X_1, X_2, \ldots, X_n$ be a sample selected from a population $N(\mu_x, \sigma_x^2)$.
Let $Y_1, Y_2, \ldots, Y_m$ be a sample selected from a population $N(\mu_y, \sigma_y^2)$.
We want to test:
$H_0: \mu_x = \mu_y$ vs $H_1: \mu_x \ne \mu_y$

Knowing that
$\frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{\sqrt{\sigma_x^2/n + \sigma_y^2/m}} \sim N(0,1)$   [1]
then:
if $\bar{x} - \bar{y} < -z_{\alpha/2} \sqrt{\sigma_x^2/n + \sigma_y^2/m}$ or $\bar{x} - \bar{y} > z_{\alpha/2} \sqrt{\sigma_x^2/n + \sigma_y^2/m}$, then $H_0$ is rejected;
if $-z_{\alpha/2} \sqrt{\sigma_x^2/n + \sigma_y^2/m} \le \bar{x} - \bar{y} \le z_{\alpha/2} \sqrt{\sigma_x^2/n + \sigma_y^2/m}$, then $H_0$ is not rejected.

TESTING THE DIFFERENCE BETWEEN TWO MEANS - VARIANCES ARE UNKNOWN

If $\sigma_x^2$ and $\sigma_y^2$ are not known but it is possible to assume $\sigma_x^2 = \sigma_y^2$, then a common variance can be estimated using:
$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2 + \sum_{i=1}^{m} (Y_i - \bar{Y})^2}{n + m - 2}$

The non-rejection region is built considering that, under $H_0$,
$\frac{\bar{X} - \bar{Y}}{s \sqrt{1/n + 1/m}} \sim t_{n+m-2}$   [2]
If $\bar{x} - \bar{y} < -t_{\alpha/2,\,n+m-2} \, s \sqrt{1/n + 1/m}$ or $\bar{x} - \bar{y} > t_{\alpha/2,\,n+m-2} \, s \sqrt{1/n + 1/m}$, then $H_0$ is rejected.
If $-t_{\alpha/2,\,n+m-2} \, s \sqrt{1/n + 1/m} \le \bar{x} - \bar{y} \le t_{\alpha/2,\,n+m-2} \, s \sqrt{1/n + 1/m}$, then $H_0$ is not rejected.

Exercise 6

From a census survey we know that 70% of households do their shopping in big stores. After 3 years we run a survey on a sample of 600 households and find that 406 of them shop in big stores. Do we have enough evidence to say that households behave as they did in the census year? (Choose a high confidence level.)

Solution
- The sample is large, so a normal distribution can be assumed for the sample proportion.
- The null hypothesis is $H_0: p = p_0 = 0.7$; the alternative hypothesis is $H_1: p \ne p_0 = 0.7$.
- From the sample data we have $\hat{p} = 406/600 \approx 0.68$.
- Under the null hypothesis the sample proportion has a normal distribution with mean $p_0 = 0.7$ and variance $p_0(1 - p_0)/n = 0.00035$.
- We can refer to the $Z$ test. In the sample, $z_c = \frac{0.68 - 0.7}{\sqrt{0.00035}} = -1.07$.
- With $\alpha = 0.01$, the rejection (unlikely) region is $z \le -z_{0.005} = -2.576$ or $z \ge z_{0.005} = 2.576$.
- Since $-1.07 > -2.576$, the observed value falls in the likely region, so the null hypothesis is not rejected. (With the unrounded $\hat{p} = 0.6767$ the statistic is about $-1.25$ and the conclusion is unchanged.)
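The decision rules for the Z test can be checked numerically. The following is a minimal sketch in Python, not part of the original notes: the function name `z_test` and its parameters are hypothetical, and it relies only on `statistics.NormalDist` from the standard library. It is applied to the data of Exercise 6, assuming $\alpha = 0.01$. Because it uses the unrounded proportion 406/600, it returns a statistic of about -1.25 rather than the -1.07 obtained from the rounded value 0.68.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def z_test(estimate, null_value, std_error, alpha=0.01, alternative="two-sided"):
    """Hypothetical helper: Z test of H0: parameter = null_value.

    Returns (z_obs, critical_value, p_obs, reject).
    `alternative` is one of "greater", "less", "two-sided".
    """
    z_obs = (estimate - null_value) / std_error
    if alternative == "greater":        # H1: parameter > null_value
        critical = STD_NORMAL.inv_cdf(1.0 - alpha)
        p_obs = 1.0 - STD_NORMAL.cdf(z_obs)
        reject = z_obs >= critical
    elif alternative == "less":         # H1: parameter < null_value
        critical = STD_NORMAL.inv_cdf(1.0 - alpha)
        p_obs = STD_NORMAL.cdf(z_obs)
        reject = z_obs <= -critical
    elif alternative == "two-sided":    # H1: parameter != null_value
        critical = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
        p_obs = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z_obs)))
        reject = abs(z_obs) >= critical
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z_obs, critical, p_obs, reject


# Exercise 6: 406 households out of 600, H0: p = 0.7, alpha = 0.01 (assumed).
n, successes, p0 = 600, 406, 0.7
p_hat = successes / n
std_error = sqrt(p0 * (1.0 - p0) / n)  # standard deviation of p_hat under H0

z_obs, critical, p_obs, reject = z_test(p_hat, p0, std_error, alpha=0.01)
print(f"z_obs = {z_obs:.3f}, critical = {critical:.3f}, "
      f"p_obs = {p_obs:.3f}, reject H0: {reject}")
```

The critical-value rule and the p-value rule always give the same decision, so returning both is only a convenience for comparing the output with the table lookup.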