MATHEMATICAL STATISTICS 214
CHAPTER 10: Hypothesis Testing

Introduction

Aim
+ Infer about an unknown parameter from an estimate obtained from a random sample from a population.

Approach
+ Posit a hypothesis about the "behaviour" or "value" of a parameter.
+ Investigate the validity of the hypothesis using evidence available from a random sample from the population of interest.
+ Understand how to identify and generate an appropriate function of the sample variable that will ensure that a valid decision is reached.
+ Quantify the possibility of making the incorrect decision.

Elements of Statistical Tests

Question
+ What is the population mean ($\mu$) height of the Khoisan?
Hypothesis
+ On average, the Khoisan are 50cm tall. That is, $\mu = 50$.
Data
+ A random sample of 20 Khoisan was collected.
Evidence
+ The estimate of average height from the sample is 25cm. That is, $\hat{\mu} = \bar{x} = 25$.

Statistical Analyses
+ Case I: The difference between the hypothesised value ($\mu = 50$) and the sample estimate ($\bar{x} = 25$) is negligible and/or only due to random sampling error. In other words, the data do not contain enough evidence for the claim to be rejected.
+ Case II: The difference between the hypothesis and the statistic reflects a true discrepancy between the values. Therefore, the claim is rejected based on evidence from the sample.

[Figure: density $f(X)$ against $X$, with the sample estimate 25 and the hypothesised mean $\mu = 50$ marked.]

Statistical tests of hypothesis comprise the following essential elements.
(a.) Null Hypothesis ($H_0$):
  - Assumption: associated with an "equal" ($=$) sign.
(b.) Alternative Hypothesis ($H_1$):
  - Assumption: associated with one of the "not-equal" ($\neq$), "less-than" ($<$) or "greater-than" ($>$) signs.
(c.) Significance Level ($\alpha$): corresponds to P(Type I Error).
  - $z_\alpha$: threshold implied by the significance level. Also referred to as the "critical value".
(d.) Test Statistic: magnitude of the evidence present in the data.
(e.) Rejection Region (RR): area under the sampling distribution beyond the critical value.
(f.) Conclusion: statement about inference from the statistical test.

[Figure: two-tail rejection regions, each of area $\alpha/2$, beyond $-z_{\alpha/2}$ and $z_{\alpha/2}$.]

+ Ideally, the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$) should be stated such that they are mutually exclusive and collectively exhaustive.
  - Note: $H_0$ can only be expressed in terms of one of the "$=$", "$\leq$" or "$\geq$" signs.
+ The null hypothesis is stated in terms of the "status quo".
  - Note: NEVER accept $H_0$.
+ The research aim is usually to show support for $H_1$.
+ Rejection Region (RR): comprises the set of values of the test statistic for which $H_0$ will be rejected.

For any fixed rejection region, two types of errors can be made when a conclusion is reached.
+ Type I Error:
  - Occurs with a probability equal to $\alpha$.
  - It is the significance level of the test.
  - $\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})$.
+ Type II Error:
  - Occurs with a probability equal to $\beta$.
  - Power of the test is calculated as $(1 - \beta)$.
  - $\beta = P(\text{not reject } H_0 \mid H_0 \text{ false})$.

                     | Actual Situation: H0 is True | Actual Situation: H0 is False
Decision             |                              |
Do not reject H0     | Correct decision             | Type II Error
                     | Confidence = 1 − α           | P(Type II Error) = β
Reject H0            | Type I Error                 | Correct decision
                     | P(Type I Error) = α          | Power = 1 − β

+ An inverse relationship exists between $\alpha$ and $\beta$.
+ Increasing the sample size allows both $\alpha$ and $\beta$ to be decreased.
+ The value of $\beta$ depends on the true value of the parameter.
  - When the difference between the true and hypothesised values of the parameter increases, $\beta$ decreases, and vice versa.

[Figure: four panels showing the sampling distributions centred at $\mu_{H_0}$ and $\mu_{true}$ with $\alpha$ and $\beta$ shaded. Panels: Benchmark; $\mu_{true} - \mu_{H_0}$ decreases, $\beta$ increases; $\alpha$ decreases, $\beta$ increases; $\sigma^2$ increases, $\beta$ increases.]

Example
A manufacturer of automatic washers offers a model in one of two colors, A or B.
A random sample of 20 customers, each of whom purchased one of the two washers, was observed. Assume that it was decided a priori that a washer will be considered inferior if not more than 5 of the sampled customers bought it.
(a.) Calculate $\alpha$ = P(Type I Error) with respect to the null hypothesis that both washers perform similarly against an alternative that Washer A is inferior.
(b.) If the true population proportion of customers that prefer Washer A is 0.40, calculate $\beta$ = P(Type II Error).

Solution
$\pi$ = population proportion of customers that prefer Washer A.
$X$ = number of customers that purchase Washer A.
$H_0: \pi = 0.50$
$H_1: \pi < 0.50$

(a.) Under $H_0$, $X \sim \text{Binomial}(n = 20, \pi = 0.50)$:
$$\alpha = P(X \leq 5 \mid n = 20, \pi = 0.50) = \sum_{i=0}^{5} \binom{20}{i} 0.50^{i} (1 - 0.50)^{20-i} \approx 0.021.$$

(b.)
$$\beta = P(X > 5 \mid n = 20, \pi = 0.40) = 1 - \sum_{j=0}^{5} \binom{20}{j} 0.40^{j} (1 - 0.40)^{20-j} \approx 1 - 0.126 = 0.874.$$

One Sample Tests

Large Sample Tests: $\mu$
$\bar{X} \sim N(\mu, \sigma_{\bar{x}}^2)$
+ Null Hypothesis $H_0$: $\mu = \mu_0$
+ Alternative Hypothesis $H_1$:
  - $\mu < \mu_0$ lower-tail
  - $\mu \neq \mu_0$ two-tail
  - $\mu > \mu_0$ upper-tail
+ Significance Level: P(Type I Error) = $\alpha$
+ Test Statistic:
$$Z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}, \qquad Z \sim N(0, 1)$$
+ Rejection Region (RR):
  - $\{Z < -z_{\alpha}\}$ lower-tail
  - $\{|Z| > z_{\alpha/2}\}$ two-tail
  - $\{Z > z_{\alpha}\}$ upper-tail
+ Condition: the population variance ($\sigma^2$) is known, or the sample size is large enough ($n \geq 30$) that it may be estimated accurately.

[Figure: rejection regions for the two-tail test (area $\alpha/2$ beyond $-z_{\alpha/2}$ and $z_{\alpha/2}$), the lower-tail test (area $\alpha$ below $-z_{\alpha}$) and the upper-tail test (area $\alpha$ above $z_{\alpha}$).]

Large Sample Tests: $\pi$
$p \sim N(\pi, \sigma_p^2)$
+ Null Hypothesis $H_0$: $\pi = \pi_0$
+ Alternative Hypothesis $H_1$:
  - $\pi < \pi_0$ lower-tail
  - $\pi \neq \pi_0$ two-tail
  - $\pi > \pi_0$ upper-tail
+ Significance Level: P(Type I Error) = $\alpha$
+ Test Statistic:
$$Z = \frac{p - \pi_0}{\sigma_p} = \frac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}, \qquad Z \sim N(0, 1)$$
+ Rejection Region (RR):
  - $\{Z < -z_{\alpha}\}$ lower-tail
  - $\{|Z| > z_{\alpha/2}\}$ two-tail
  - $\{Z > z_{\alpha}\}$ upper-tail
+ Conditions: $n \times \pi > 5$ and $n \times (1 - \pi) > 5$.

Large Sample Tests: Example
The output voltage of an electric circuit is specified by the manufacturer as 130.
A sample of 25 independent readings on the voltage for this circuit produced an average of 128.6. Suppose that it is known that the standard deviation of output voltage is 4.0. If output voltage is assumed to be normally distributed, at the 5% significance level,
(a.) verify the claim made by the manufacturer.
(b.) A competing company claims that the output voltage from the manufacturer's batteries is greater than what is advertised and that it damages appliances. Is the competitor's claim valid?

Solution:
$X$ = output voltage
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(\mu, \frac{4^2}{25}\right)$
Significance Level = P(Type I Error) = $\alpha$ = 5% = 0.05

(a.)
$H_0: \mu = 130$
$H_1: \mu \neq 130$
$$z_{stat} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{128.6 - 130}{4/\sqrt{25}} = -1.75$$
$z_{crit} = z_{\alpha/2} = z_{0.025} = 1.96$
Conclusion: $|z_{stat}| = 1.75 < 1.96 = z_{crit}$. Therefore, fail to reject the null hypothesis. There is insufficient evidence in the observed data to claim that the output voltage is different from 130V.

(b.)
$H_0: \mu \geq 130$
$H_1: \mu < 130$
$$z_{stat} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{128.6 - 130}{4/\sqrt{25}} = -1.75$$
$z_{crit} = -z_{\alpha} = -z_{0.05} = -1.645$
Conclusion: $z_{stat} = -1.75 < -1.645 = z_{crit}$. Therefore, reject the null hypothesis. The output voltage is statistically significantly less than the advertised 130V, so the competitor's claim is not supported.

[Figure: standard normal density for (a.) with $z_{stat} \approx -1.75$ between the critical values $-1.96$ and $1.96$, and for (b.) with $z_{stat} \approx -1.75$ below the critical value $-1.645$.]

Type II Error: Not rejecting a false $H_0$
$$\beta = P(\text{Type II Error}) = P(\text{Not reject } H_0 \mid \text{False } H_0)$$
Example: The output voltage of an electric circuit is specified by the manufacturer as 130. An officer from the consumer protection unit intends to verify the claim of the manufacturer against that of some dissatisfied consumers that the output voltage is less than what was advertised. A sample of 25 independent readings on the voltage for this circuit was collected for the assessment. Suppose that it is public knowledge that output voltage is normally distributed with a standard deviation of 4.0. What is the value of $\beta$ if the true mean output voltage is 128? Use a 5% significance level.
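
The quantity asked for in this example can also be sketched numerically. The snippet below is a minimal illustration and not part of the original notes: it relies on Python's standard-library `statistics.NormalDist`, and the function name `type_ii_error_lower_tail` and its parameters are assumptions introduced here. It first finds the critical sample mean for a lower-tail test with known $\sigma$, then evaluates $\beta$ at the stated true mean.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error_lower_tail(mu0, mu_true, sigma, n, alpha):
    """Return (x_crit, beta) for H0: mu >= mu0 vs H1: mu < mu0 with sigma known.

    Hypothetical helper for illustration only.
    """
    std_normal = NormalDist()            # Z ~ N(0, 1)
    se = sigma / sqrt(n)                 # standard error of the sample mean
    z_crit = std_normal.inv_cdf(alpha)   # lower-tail critical value, i.e. -z_alpha
    x_crit = mu0 + z_crit * se           # H0 is rejected when x_bar < x_crit
    # beta = P(do not reject H0 | mu = mu_true) = P(X_bar > x_crit | mu = mu_true)
    beta = 1.0 - std_normal.cdf((x_crit - mu_true) / se)
    return x_crit, beta


x_crit, beta = type_ii_error_lower_tail(mu0=130, mu_true=128, sigma=4.0, n=25, alpha=0.05)
print(f"x_crit = {x_crit:.3f}, beta = {beta:.4f}")
```

Because the code uses an unrounded $z_{0.05}$ rather than the tabulated 1.645, its result may differ from a hand calculation in the fourth decimal place.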
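
Type II Error: Solution
$X$ = output voltage
$\bar{X} \sim N\left(\mu, \frac{4^2}{25}\right)$
$\alpha = 0.05$
$H_0: \mu \geq 130$
$H_1: \mu < 130$
$$\alpha = 0.05 = P(Z < z_{crit} \mid \mu = 130) = P(Z < -z_{0.05} \mid \mu = 130) = P(Z < -1.645 \mid \mu = 130) = P\left(Z < \frac{\bar{x}_{crit} - 130}{4/\sqrt{25}}\right)$$
Therefore,
$$\frac{\bar{x}_{crit} - 130}{4/\sqrt{25}} = -1.645 \;\Rightarrow\; \bar{x}_{crit} = 130 - 1.645 \times 0.80 \approx 128.684$$
So that,
$$\beta = P(Z > z_{crit} \mid \mu = 128) = P\left(Z > \frac{128.684 - 128}{4/\sqrt{25}}\right) = P(Z > 0.855) \approx 0.1963$$

[Figure: sampling distributions of $\bar{X}$ under the NULL mean (130) and the TRUE mean (128), with $\bar{x}_{crit}$ between them; $\alpha$ is shaded under the NULL curve and $\beta$ under the TRUE curve.]

Sample Size Estimation: Derivation
Imagine that you are interested in verifying the following hypotheses:
$H_0: \mu \leq \mu_0$ vs. $H_1: \mu > \mu_0$.
Assume that a similar assessment has been executed by a different researcher who had inferred that $\mu > \mu_T$. What sample size is required if you plan to ensure that P(Type I Error) = $\alpha$ and that P(Type II Error) = $\beta$? Assume further that the variable of interest $X \sim N(\mu, \sigma^2)$ and that $\mu_0 < \mu_T$.

[Figure: sampling distributions under the NULL mean $\mu_0$ and the PROPOSED mean $\mu_T$, with $\bar{x}_{crit}$ between them; $\alpha$ is shaded under the NULL curve and $\beta$ under the PROPOSED curve.]

$H_0: \mu \leq \mu_0$
$H_1: \mu > \mu_0$

Under the null mean:
$$\alpha = P(Z > z_{\alpha} \mid \mu = \mu_0) = P\left(Z > \frac{\bar{x}_{crit} - \mu_0}{\sigma/\sqrt{n}}\right) \;\Rightarrow\; z_{\alpha} = \frac{\bar{x}_{crit} - \mu_0}{\sigma/\sqrt{n}} \;\Rightarrow\; \bar{x}_{crit} = \mu_0 + z_{\alpha} \frac{\sigma}{\sqrt{n}} \qquad (1)$$
Under the proposed mean:
$$\beta = P(Z < -z_{\beta} \mid \mu = \mu_T) = P\left(Z < \frac{\bar{x}_{crit} - \mu_T}{\sigma/\sqrt{n}}\right) \;\Rightarrow\; -z_{\beta} = \frac{\bar{x}_{crit} - \mu_T}{\sigma/\sqrt{n}} \;\Rightarrow\; \bar{x}_{crit} = \mu_T - z_{\beta} \frac{\sigma}{\sqrt{n}} \qquad (2)$$
Equate Equation (1) and Equation (2):
$$\bar{x}_{crit} = \mu_0 + z_{\alpha} \frac{\sigma}{\sqrt{n}} = \mu_T - z_{\beta} \frac{\sigma}{\sqrt{n}}$$
Solve for $n$:
$$n = \left\lceil \frac{(z_{\alpha} + z_{\beta})^2 \times \sigma^2}{(\mu_T - \mu_0)^2} \right\rceil$$

Sample Size Estimation: Example
How "clean" is green energy? Cobalt, a natural resource abundant in the war-torn Democratic Republic of Congo (DRC), is a major component of the batteries used to power renewable energy. A European battery manufacturing company that has a cobalt mine in the DRC was recently accused of over-working its mine workers. The company claims that, on average, its employees work below 45H00 weekly. A prominent activist disagrees.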
If the activist posits that the average work duration per week at the mine is 45H32, what sample size is required to verify the company's claim against the activist's such that the probabilities of Type I and Type II errors are restricted to about 5% and 9%, respectively? Assume that a standard deviation of 2H12 was obtained from a pilot study.

Solution (follows directly from the preceding derivation):
$H_0: \mu \leq 45$
$H_1: \mu > 45$
$\alpha = 5\%$, $\beta = 9\%$
$\mu_T = 45 + \frac{32}{60} = 45.5\dot{3}$
$\hat{\sigma} = 2 + \frac{12}{60} = 2.2$
$$n = \left\lceil \frac{(z_{\alpha} + z_{\beta})^2 \times \hat{\sigma}^2}{(\mu_T - \mu_0)^2} \right\rceil = \left\lceil \frac{(z_{0.05} + z_{0.09})^2 \times 2.2^2}{(45.5\dot{3} - 45)^2} \right\rceil = \left\lceil \frac{(1.645 + 1.341)^2 \times 2.2^2}{0.5\dot{3}^2} \right\rceil \approx \lceil 151.7 \rceil = 152$$
Note:
(1.) Always round up to the nearest integer.
(2.) Calculations are the same when $s^2$ is used.

Hypothesis Testing vs. Confidence Interval
$\hat{\theta} \sim N(\theta, \sigma_{\hat{\theta}}^2)$
+ Null Hypothesis $H_0$: $\theta = \theta_0$
+ Alternative Hypothesis $H_1$: $\theta \neq \theta_0$ (two-tail)
+ Significance Level: P(Type I Error) = $\alpha$
+ Confidence Interval:
$$-z_{\alpha/2} < \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} < z_{\alpha/2} \;\Rightarrow\; \hat{\theta} - z_{\alpha/2} \times \sigma_{\hat{\theta}} < \theta < \hat{\theta} + z_{\alpha/2} \times \sigma_{\hat{\theta}}$$
+ Do not reject $H_0$ in favour of $H_1$ if the value of $\theta_0$ lies inside the $(1 - \alpha) \times 100\%$ confidence interval for $\theta$.

[Figure: two-tail rejection regions, each of area $\alpha/2$, beyond $-z_{\alpha/2}$ and $z_{\alpha/2}$.]

Example: Bleue-Blanche-Rouge (BBR), a cobalt mining company in the DRC, was recently accused of paying below the minimum 500 FRANCS hourly wage. The accusation was based on the salary of 12 of its employees that yielded an average of 493.15 FRANCS. Verify the authenticity of the accusation with an appropriate confidence interval and an 8% significance level. Assume that wages in the DRC have been shown to be $N(\mu, 150.63)$ distributed.

$X$ = wage at BBR
$\bar{X} \sim N\left(\mu, \frac{150.63}{12}\right)$
$H_0: \mu \leq 500$
$H_1: \mu > 500$
$\alpha = 0.08$
$$P\left(\frac{\bar{x} - \mu}{\sqrt{\sigma^2/n}} < z_{\alpha}\right) = 1 - \alpha$$
$$P\left(\frac{493.15 - \mu}{\sqrt{150.63/12}} < z_{0.08}\right) = 1 - 0.08$$
$$\therefore \text{92\% C.I.}: \; \mu > 493.15 - 1.4051 \times \sqrt{\frac{150.63}{12}}, \quad \text{that is, } (488.1719 < \mu < \infty)$$

[Figure: one-sided 92% confidence region for $\mu$ with lower limit 488.17; the excluded lower tail has area $\alpha = 0.08$.]

Conclusion: 500 falls within the interval. Therefore, fail to reject the null hypothesis.
There is insufficient evidence in the observed data to claim that, on average, BBR does not pay below the minimum wage.

p-value
+ Represents a different way to report the results of hypothesis tests.
+ It is the smallest value of $\alpha$ for which the null hypothesis can be rejected.
+ Offers an opportunity to evaluate the extent to which the evidence in the data disagrees with the null hypothesis ($H_0$).

Example: Bleue-Blanche-Rouge (BBR), a cobalt mining company in the DRC, was recently accused of paying below the minimum 500 FRANCS hourly wage. The accusation was based on the salary of 12 of its employees that yielded an average of 493.15 FRANCS. Estimate the p-value for the appropriate hypothesis test to verify the authenticity of the accusation. Adopt an 8% significance level for your inference and assume that wages in the DRC have been shown to be $N(\mu, 150.63)$ distributed.

p-value: Example
$X$ = wage at BBR
$\bar{X} \sim N\left(\mu, \frac{150.63}{12}\right)$
$H_0: \mu \leq 500$
$H_1: \mu > 500$
$\alpha = 0.08$
$$z_{stat} = \frac{\bar{x} - \mu}{\sqrt{\sigma^2/n}} = \frac{493.15 - 500}{\sqrt{150.63/12}} \approx -1.9334$$
$$\text{p-value} = P(Z > z_{stat}) \approx P(Z > -1.9334) \approx 0.9734$$

[Figure: standard normal density with $z_{stat} \approx -1.9334$ and $z_{0.08}$ marked; the area to the right of $z_{stat}$ is the p-value ($\approx 0.9734$) and the area to the right of $z_{0.08}$ is $\alpha = 0.08$.]

Interpretation: If the true mean wage at BBR were 500 FRANCS (the boundary of $H_0$), the probability of observing 12 of its employees with a mean wage of 493.15 FRANCS or more is about 97.34%.
Conclusion: p-value $\approx 0.9734 > 0.08 = \alpha$. Therefore, fail to reject the null hypothesis. The claim that, on average, BBR pays below the minimum wage may not be invalidated based on the salary data collected from the sampled employees.

Small Sample Test for $\mu$
$\bar{X} \sim N(\mu, \sigma_{\bar{x}}^2)$
+ Null Hypothesis $H_0$: $\mu = \mu_0$
+ Alternative Hypothesis $H_1$:
  - $\mu < \mu_0$ lower-tail
  - $\mu \neq \mu_0$ two-tail
  - $\mu > \mu_0$ upper-tail
+ Test Statistic:
$$T = \frac{\bar{x} - \mu_0}{\hat{\sigma}_{\bar{x}}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad T \sim t_{(n-1)}.$$
+ Rejection Region (RR):
  - $\{T < -t_{(n-1),\alpha}\}$ lower-tail
  - $\{|T| > t_{(n-1),\alpha/2}\}$ two-tail
  - $\{T > t_{(n-1),\alpha}\}$ upper-tail
+ Significance Level: P(Type I Error) = $\alpha$
+ Condition: $\sigma^2$ is unknown and the sample size is not large enough ($n < 100$) for an accurate estimate to be obtained.

Small Sample Test for $\mu$: Example
Alcohol abuse is prevalent among South African teens. In a recent broadcast, the ruling party claimed that some expensive and heavily criticised measures that were implemented were fruitful. It was said that the average age (in years) of alcohol users is no longer less than 18. You set out to validate this statement and decided a priori on a 5% significance level. A random sample of 25 alcohol users yielded a mean and standard deviation of 16.3 yr and 4.17 yr respectively.

$X$ = age (in years) of South African alcohol user
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{25}\right)$
$H_0: \mu \geq 18$
$H_1: \mu < 18$
$\alpha = 0.05$
Deg. of Fr. $= 25 - 1 = 24$
$$t_{stat} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{16.3 - 18}{4.17/\sqrt{25}} \approx -2.0384$$
Critical value approach:
$$t_{crit} = -t_{(24),0.05} \approx -1.7109 > -2.0384 \approx t_{stat}$$
p-value approach:
$$P(T_{(24)} < t_{stat}) \approx P(T_{(24)} < -2.0384) \approx 0.0263 < 0.05 = \alpha$$
Confidence approach:
$$P\left(\frac{\bar{x} - \mu}{s/\sqrt{n}} > -t_{(24),\alpha}\right) = 1 - \alpha$$
$$P\left(\frac{16.3 - \mu}{4.17/\sqrt{25}} > -t_{(24),0.05}\right) = 1 - 0.05$$
$$\Rightarrow \text{95\% C.I.}: \; \mu < 16.3 + 1.7109 \times \frac{4.17}{\sqrt{25}}, \quad \text{that is, } (-\infty < \mu < 17.7269), \quad 18 \notin (-\infty, 17.7269)$$

[Figure: $t_{(24)}$ density with $t_{crit} \approx -1.7109$ ($\alpha = 0.05$) and $t_{stat} \approx -2.0384$ (p-value $\approx 0.0263$) marked.]
[Figure: one-sided 95% confidence region for $\mu$ with upper limit $\approx 17.7269$; the hypothesised value 18 lies outside it.]

Decision: Reject the null hypothesis.
Conclusion: The average age of alcohol users in South Africa is statistically significantly less than 18 years.

Hypothesis Testing w.r.t. $\sigma^2$
$X \sim N(\mu, \sigma^2)$
+ Null Hypothesis $H_0$: $\sigma^2 = \sigma_0^2$
+ Alternative Hypothesis $H_1$:
  - $\sigma^2 < \sigma_0^2$ lower-tail
  - $\sigma^2 \neq \sigma_0^2$ two-tail
  - $\sigma^2 > \sigma_0^2$ upper-tail
+ Test Statistic:
$$C = \frac{(n - 1)s^2}{\sigma_0^2}, \qquad C \sim \chi^2_{(n-1)}.$$
+ Rejection Region (RR):
  - $\{C < \chi^2_{(n-1),1-\alpha}\}$ lower-tail
  - $\{C < \chi^2_{(n-1),1-\alpha/2}\}$ or $\{C > \chi^2_{(n-1),\alpha/2}\}$ two-tail
  - $\{C > \chi^2_{(n-1),\alpha}\}$ upper-tail
+ Significance Level: P(Type I Error) = $\alpha$
+ Notation (subscripts denote upper-tail areas):
  - $\chi^2_{(n-1),\alpha} = C_{crit} \Rightarrow P(\chi^2_{(n-1)} > C_{crit}) = \alpha$.
  - $\chi^2_{(n-1),1-\alpha} = C_{crit} \Rightarrow P(\chi^2_{(n-1)} > C_{crit}) = 1 - \alpha$.

Hypothesis Testing w.r.t. $\sigma^2$: Rejection Regions
[Figure: $\chi^2_{(n-1)}$ density with rejection regions of area $\alpha$ in the lower tail and in the upper tail.]

Hypothesis Testing w.r.t. $\sigma^2$: Example
A fund manager is considering migrating some of her traditionally stocked portfolio to a crypto-based portfolio. She was told by a crypto enthusiast that the daily change in value of an individual cryptocurrency has a normal distribution with variance (or volatility) not more than 100% and an average of 0%. So, she decided to switch if she can validate the claim. She opts to test the claim with the sum of daily volatilities. Subsequently, she collected data on the daily volatility of 12 representative and independent cryptocurrencies. See the table below for the crypto volatility data collected by the manager. Conduct an appropriate hypothesis test. Report the corresponding p-value (as accurately as possible) and advise, based on a 5% statistical significance level, whether the manager should migrate her portfolio.

111.70%   280.80%   314.10%   45.00%    246.40%   81.10%
186.80%   70.30%    177.40%   43.00%    116.70%   226.30%

Claim:
  - Let $C_i$ = daily change in the value of cryptocurrency $i$.
  - Further, let $s_i^2$ = observed daily variability of cryptocurrency $i$.
$C_i \sim N(0, 1)$
$$Y = \sum_{i=1}^{12} C_i \sim N\left(\sum_{i=1}^{12} 0, \; \sum_{i=1}^{12} 1\right) = N(0, 12)$$
Data:
$$s_y^2 = 111.70\% + 314.10\% + \dots + 43.00\% + 226.30\% = 1899.60\% = 18.996$$
Analysis:
$H_0: \sigma_y^2 \leq 12$
$H_1: \sigma_y^2 > 12$
$\alpha = 0.05$
$$X_{stat} = \frac{(n - 1)s_y^2}{\sigma_y^2} = \frac{11 \times 18.996}{12} \approx 17.413$$
p-value:
$\chi^2_{(11),0.05} \approx 19.6751$
$\chi^2_{(11),0.10} \approx 17.2750$
$$\therefore \; 0.05 < P\left(\chi^2_{(11)} > X_{stat}\right) < 0.10$$
Conclusion: Fail to reject the null hypothesis.
There is not enough information in the available data to suggest that, on average, the variability of a crypto asset exceeds 100%. So, the manager may migrate her portfolio.

[Figure: $\chi^2_{(11)}$ density with the critical values $\approx 17.275$ ($\alpha = 0.10$) and $\approx 19.6751$ ($\alpha = 0.05$) and the test statistic $\approx 17.4$ marked.]

Two-Sample Tests

Large Sample Tests: $\mu_1 - \mu_2$
$x_1 = \{x_{1,1}, \dots, x_{n_1,1}\} \sim N(\mu_1, \sigma_1^2)$ and $x_2 = \{x_{1,2}, \dots, x_{n_2,2}\} \sim N(\mu_2, \sigma_2^2)$
+ Null Hypothesis $H_0$: $\mu_1 - \mu_2 = D_0$
+ Alternative Hypothesis $H_1$:
  - $\mu_1 - \mu_2 < D_0$ lower-tail
  - $\mu_1 - \mu_2 \neq D_0$ two-tail
  - $\mu_1 - \mu_2 > D_0$ upper-tail
+ Significance Level: P(Type I Error) = $\alpha$
+ Test Statistic:
$$Z = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$
+ Rejection Region (RR):
  - $\{Z < -z_{\alpha}\}$ lower-tail
  - $\{|Z| > z_{\alpha/2}\}$ two-tail
  - $\{Z > z_{\alpha}\}$ upper-tail
+ Conditions:
  - $x_1$ and $x_2$ are independent.
  - $\sigma_1^2$ and $\sigma_2^2$ are known, or the sample sizes are large enough (i.e., $n_1, n_2 > 100$) that they can be accurately estimated.

Tests of Proportions: $\pi_1 - \pi_2$
$p_1 = (y_1/n_1) \sim N(\pi_1, \pi_1[1 - \pi_1]/n_1)$ and $p_2 = (y_2/n_2) \sim N(\pi_2, \pi_2[1 - \pi_2]/n_2)$
+ Null Hypothesis $H_0$: $\pi_1 - \pi_2 = D_0$
+ Alternative Hypothesis $H_1$:
  - $\pi_1 - \pi_2 < D_0$ lower-tail
  - $\pi_1 - \pi_2 \neq D_0$ two-tail
  - $\pi_1 - \pi_2 > D_0$ upper-tail
+ Significance Level: P(Type I Error) = $\alpha$
+ Test Statistic:
$$Z = \frac{p_1 - p_2 - D_0}{\sqrt{\tilde{p}(1 - \tilde{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad Z \sim N(0, 1)$$
+ Rejection Region (RR):
  - $\{Z < -z_{\alpha}\}$ lower-tail
  - $\{|Z| > z_{\alpha/2}\}$ two-tail
  - $\{Z > z_{\alpha}\}$ upper-tail
+ Note:
$$\tilde{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

Small Sample Tests: $\mu_1 - \mu_2$ (equal variances)
$x_1 = \{x_{1,1}, \dots, x_{n_1,1}\} \sim N(\mu_1, \sigma_1^2)$ and $x_2 = \{x_{1,2}, \dots, x_{n_2,2}\} \sim N(\mu_2, \sigma_2^2)$
+ Null Hypothesis $H_0$: $\mu_1 - \mu_2 = D_0$
+ Alternative Hypothesis $H_1$:
  - $\mu_1 - \mu_2 < D_0$ lower-tail
  - $\mu_1 - \mu_2 \neq D_0$ two-tail
  - $\mu_1 - \mu_2 > D_0$ upper-tail
+ Significance Level: P(Type I Error) = $\alpha$
+ Test Statistic:
$$T = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t_{(n_1 + n_2 - 2)}$$
+ Rejection Region (RR):
  - $\{T < -t_{(n_1 + n_2 - 2),\alpha}\}$ lower-tail
  - $\{|T| > t_{(n_1 + n_2 - 2),\alpha/2}\}$ two-tail
  - $\{T > t_{(n_1 + n_2 - 2),\alpha}\}$ upper-tail
+ Note: $\sigma_1^2$ and $\sigma_2^2$ are unknown and may not be accurately estimated due to small sample sizes. It is assumed that $\sigma_1^2 = \sigma_2^2$.
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Small Sample Tests: $\mu_1 - \mu_2$ (unequal variances)
$x_1 = \{x_{1,1}, \dots, x_{n_1,1}\} \sim N(\mu_1, \sigma_1^2)$ and $x_2 = \{x_{1,2}, \dots, x_{n_2,2}\} \sim N(\mu_2, \sigma_2^2)$
+ Null Hypothesis $H_0$: $\mu_1 - \mu_2 = D_0$
+ Alternative Hypothesis $H_1$:
  - $\mu_1 - \mu_2 < D_0$ lower-tail
  - $\mu_1 - \mu_2 \neq D_0$ two-tail
  - $\mu_1 - \mu_2 > D_0$ upper-tail
+ Significance Level: P(Type I Error) = $\alpha$
+ Test Statistic:
$$T = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t_{(\nu)}$$
+ Rejection Region (RR):
  - $\{T < -t_{(\nu),\alpha}\}$ lower-tail
  - $\{|T| > t_{(\nu),\alpha/2}\}$ two-tail
  - $\{T > t_{(\nu),\alpha}\}$ upper-tail
+ Note: $\sigma_1^2$ and $\sigma_2^2$ are unknown and may not be accurately estimated due to small sample sizes. It is assumed that $\sigma_1^2 \neq \sigma_2^2$.
$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 + 1} + \frac{(s_2^2/n_2)^2}{n_2 + 1}} - 2$$

Paired Sample Tests: $\mu_d$
$x_1 = \{x_{1,1}, \dots, x_{n,1}\}$ and $x_2 = \{x_{1,2}, \dots, x_{n,2}\}$
$d = \{x_{1,1} - x_{1,2}, \dots, x_{n,1} - x_{n,2}\} = \{d_1, \dots, d_n\} \sim N(\mu_d, \sigma_d^2)$
+ Null Hypothesis $H_0$: $\mu_d = \mu_0$
+ Alternative Hypothesis $H_1$:
  - $\mu_d < \mu_0$ lower-tail
  - $\mu_d \neq \mu_0$ two-tail
  - $\mu_d > \mu_0$ upper-tail
+ Significance Level: P(Type I Error) = $\alpha$
+ Test Statistic:
$$T = \frac{\bar{x}_d - \mu_0}{s_d/\sqrt{n}} \sim t_{(n-1)}$$
+ Rejection Region (RR):
  - $\{T < -t_{(n-1),\alpha}\}$ lower-tail
  - $\{|T| > t_{(n-1),\alpha/2}\}$ two-tail
  - $\{T > t_{(n-1),\alpha}\}$ upper-tail
+ Note: $\sigma_d^2$ is unknown and may not be accurately estimated due to small sample size.

Test of Variances: $\sigma_1^2 \div \sigma_2^2$
$x_1 = \{x_{1,1}, \dots, x_{n_1,1}\} \sim N(\mu_1, \sigma_1^2)$ and $x_2 = \{x_{1,2}, \dots, x_{n_2,2}\} \sim N(\mu_2, \sigma_2^2)$
+ Null Hypothesis $H_0$: $\sigma_1^2 \div \sigma_2^2 = 1$
+ Alternative Hypothesis $H_1$:
  - $\sigma_1^2 \div \sigma_2^2 \neq 1$ two-tail
  - $\sigma_1^2 \div \sigma_2^2 > 1$ upper-tail
+ Significance Level: P(Type I Error) = $\alpha$
+ Test Statistic (the last equality holds under $H_0$):
$$F = \frac{(n_1 - 1)s_1^2 \,/\, \left[(n_1 - 1)\sigma_1^2\right]}{(n_2 - 1)s_2^2 \,/\, \left[(n_2 - 1)\sigma_2^2\right]} = \frac{s_1^2}{s_2^2} \sim F_{(n_1 - 1, n_2 - 1)}$$
+ Rejection Region (RR):
  - $\{F > F_{(n_1 - 1, n_2 - 1),\alpha}\}$ upper-tail
  - $\{F > F_{(n_1 - 1, n_2 - 1),\alpha/2}\}$ two-tail
+ Note:
  - $x_1$ and $x_2$ are independent.
  - $\sigma_1^2 \geq \sigma_2^2$.

Rejection Region for F Test
[Figure: $F_{(n_1 - 1, n_2 - 1)}$ density with the upper-tail rejection region of area $\alpha$ beyond $F_{(n_1 - 1, n_2 - 1),\alpha}$.]

Two-Sample Test: Example
You intend to compare the quality of products between two electric circuit producing companies. You collected independent sample products from the two companies. The corresponding summary statistics with respect to the output voltages from the samples are presented below. Based on a 5% probability of Type I error, which of the manufacturers has better (higher output) products? Assume that output voltage is normally distributed.

          A       B
n         29      25
x̄         20.3    22.5
s         3.7     5.4

Solution:
$X_a, X_b$ = voltage from company A and company B products respectively
$X_a \sim N(\mu_a, \sigma_a^2)$ and $X_b \sim N(\mu_b, \sigma_b^2)$
$\bar{X}_a \sim N(\mu_a, \sigma_a^2/29)$ and $\bar{X}_b \sim N(\mu_b, \sigma_b^2/25)$
$\alpha = 0.05$

Preamble: verify if the variances may be pooled
$H_0: \sigma_b^2/\sigma_a^2 = 1$
$H_1: \sigma_b^2/\sigma_a^2 \neq 1$
$$F_{stat} = s_b^2/s_a^2 = 5.4^2/3.7^2 \approx 2.13$$
Critical value: $F_{stat} \approx 2.13 < 2.17 \approx F_{(24,28),0.025}$
Implication: Fail to reject the null hypothesis. There is not enough evidence to suggest that output voltage variability differs between the products of the two companies.

[Figure: $F_{(24,28)}$ density with the critical value $F_{0.025} \approx 2.17$ ($\alpha/2 = 0.025$) and $F_{stat} \approx 2.13$ marked.]

Two-Sample Test: Example (cont'd)
$H_0: \mu_a - \mu_b = 0$
$H_1: \mu_a - \mu_b \neq 0$
$$s_p = \sqrt{\frac{(n_a - 1)s_a^2 + (n_b - 1)s_b^2}{n_a + n_b - 2}} = \sqrt{\frac{(28 \times 3.7^2) + (24 \times 5.4^2)}{29 + 25 - 2}} \approx 4.5640$$
$$t_{stat} = \frac{(\bar{x}_a - \bar{x}_b) - (\mu_a - \mu_b)}{s_p\sqrt{\frac{1}{n_a} + \frac{1}{n_b}}} = \frac{(20.3 - 22.5) - 0}{4.5640\sqrt{\frac{1}{29} + \frac{1}{25}}} \approx -1.7662$$
p-value:
$t_{(52),0.025} \approx 2.007$
$t_{(52),0.050} \approx 1.675$
Since $1.675 < |t_{stat}| < 2.007$,
$$0.05 = 2 \times 0.025 < 2 \times P\left(T_{(52)} > |t_{stat}|\right) < 2 \times 0.050 = 0.10$$
Conclusion: Fail to reject the null hypothesis.
There is insufficient evidence in the data to claim that there is a statistically significant difference between the average voltage output generated by the circuits from the two companies.

[Figure: $t_{(52)}$ density with $t_{stat} \approx -1.7662$ lying between the lower critical values for $\alpha/2 = 0.05$ and $\alpha/2 = 0.025$.]

End of Chapter 10.
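
As a closing sketch, the test statistics in the two-sample example can be recomputed from the summary statistics alone. This is an illustrative check rather than part of the original notes: the helper names `f_statistic` and `pooled_t_statistic` are assumptions introduced here, and only Python's standard library is used, so critical values and p-values still have to be read from tables.

```python
from math import sqrt


def f_statistic(s_num, s_den):
    """Ratio of sample variances, with the larger standard deviation passed first."""
    return (s_num / s_den) ** 2


def pooled_t_statistic(n1, mean1, s1, n2, mean2, s2, d0=0.0):
    """Return (s_p, t_stat, df) for the equal-variance two-sample t test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    sp = sqrt(sp2)
    t_stat = (mean1 - mean2 - d0) / (sp * sqrt(1 / n1 + 1 / n2))
    return sp, t_stat, df


# Summary statistics from the example: company A (n=29) and company B (n=25)
f_stat = f_statistic(5.4, 3.7)
sp, t_stat, df = pooled_t_statistic(29, 20.3, 3.7, 25, 22.5, 5.4)
print(f"F_stat = {f_stat:.2f}, s_p = {sp:.4f}, t_stat = {t_stat:.4f}, df = {df}")
```

Comparing `f_stat` and `t_stat` with tabulated critical values then proceeds exactly as in the worked solution.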