Module 6
Simple comparative experiments: two-sample inference
©2020 Karen Lawrence, McMaster University, ENGTECH 2ES3/3ES3

Difference of two Proportions - Test

Fifty-seven percent of 248 sampled boys aged 15-17 have online profiles, while 70% of 256 sampled girls aged 15-17 have online profiles. If the boys and girls were selected at random, is there a statistical difference between the two groups? Use $\alpha = 0.05$.

Framework: Is the difference between the proportions significantly different from 0?

1) Hypotheses:
$H_0: p_1 - p_2 = 0$ or $p_1 = p_2$
$H_A: p_1 - p_2 \neq 0$ or $p_1 \neq p_2$

2) Model and Assumptions:
• Random and independent data
• Sample is less than 10% of the population
• Independent samples (yes/no)
• Enough successes and failures (check both groups: 57% of 248 and 70% of 256)

Two-sample proportions (z) test

Sampling distribution of $\hat{p}_1 - \hat{p}_2$: use z (Normal) with a centre at $p_1 - p_2 = 0$ and standard error of:

$$se(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}\hat{q}}{n_1} + \frac{\hat{p}\hat{q}}{n_2}}$$

where $\hat{p}$ is the pooled proportion and $\hat{q} = 1 - \hat{p}$.

The test statistic is:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\hat{p}\hat{q}}{n_1} + \dfrac{\hat{p}\hat{q}}{n_2}}}$$

$H_0: p_1 = p_2$. This is the "status quo". Under this assumption (condition) the two samples come from populations with the same proportion of successes. Therefore we can pool their estimates and get a better estimate:

$$\text{pooled } \hat{p} = \frac{\text{success}_1 + \text{success}_2}{n_1 + n_2} = \frac{141 + 179}{248 + 256} = 0.635$$

3) Mechanics:
$\hat{p}_1 = 0.57$ (boys), $\hat{p}_2 = 0.70$ (girls), $\hat{p}_1 - \hat{p}_2 = -0.13$

$$z = \frac{(-0.13) - 0}{\sqrt{\dfrac{(0.635)(0.365)}{248} + \dfrac{(0.635)(0.365)}{256}}} = -3.05$$

(The value -3.05 is obtained with the unrounded sample proportions 141/248 and 179/256; using the rounded difference -0.13 gives about -3.03.)

[Figure: Distribution plot, Normal, mean = 0, StDev = 1; two-tailed test with area 0.001144 in each tail beyond z = ±3.05]

P-value = 2 x 0.001144 = 0.002

4) Decision and Conclusion
P-value < $\alpha$: 0.002 < 0.05, therefore we reject $H_0$.

Conclusion: We have sufficient evidence at the 0.05 significance level that there is a difference between the proportion of girls who have an online profile and the proportion of boys who have an online profile.

Okay, we have shown there is a difference. Now, estimate what the difference could be: calculate a C.I.
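Before moving on to the interval, the test mechanics above can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function name `two_proportion_z_test` is a hypothetical helper introduced here for illustration, and the success counts 141 and 179 are the ones used in the pooled estimate above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes1, n1, successes2, n2):
    """Pooled two-sample z-test of H0: p1 - p2 = 0 (two-sided).

    Hypothetical helper for illustration; not part of the course software.
    """
    p1_hat = successes1 / n1
    p2_hat = successes2 / n2
    # Pool the successes because H0 assumes a common proportion.
    p_pooled = (successes1 + successes2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    se = sqrt(p_pooled * q_pooled / n1 + p_pooled * q_pooled / n2)
    z = (p1_hat - p2_hat - 0) / se
    # Two-sided P-value: area in both tails beyond |z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return p_pooled, z, p_value


p_pooled, z, p_value = two_proportion_z_test(141, 248, 179, 256)
print(f"pooled p-hat = {p_pooled:.3f}, z = {z:.2f}, P-value = {p_value:.4f}")
```

With the handout's counts this should give a pooled proportion of about 0.635, a z statistic close to -3.05, and a P-value close to 0.002, in line with the hand calculation.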
Difference of two Proportions - Confidence Intervals

Remember:
Estimate +/- margin of error
Estimate +/- (model critical value*) x (standard error of the estimate)

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \qquad \text{(not pooled)}$$

$$-0.13 \pm (1.96)\sqrt{\frac{(0.57)(0.43)}{248} + \frac{(0.70)(0.30)}{256}}$$

$$-0.13 \pm 0.083 \qquad (-0.213 \text{ to } -0.047)$$

For a C.I., there are no hypotheses and no hypothesized parameter values, just sample statistics. Without $H_0: p_1 = p_2$ there is no reason to treat the two proportions as equal, so we cannot pool the sample estimates.

Note that the interval calculated here does not contain 0. How does this relate to the test? It means 0 is not one of the plausible values of the difference in proportions, so $H_0$ would be rejected.

This interval, calculated from our sample, gives the range of plausible values for the true difference $p_1 - p_2$ at 95% confidence.

We are 95% confident that the interval from -21.3% to -4.7% captures the true difference in proportions between boys and girls who have online profiles.

Difference of two Means - Hypothesis Tests (3 tests)

An engineer is interested in determining if the addition of a polymer latex emulsion during the mixing process impacts the bonding strength of cement ($\alpha = 0.05$). Ten samples of the original and 10 samples of the modified formulations were prepared (two treatments/levels of the factor formulation). The response, strength, is tabled below.

[Table: strength data for the 10 modified and 10 unmodified samples]

Summary Statistics (estimates of parameters) from the sample data:

                     Formulation 1 (Modified)    Formulation 2 (Unmodified)
Sample mean          $\bar{y}_1 = 16.76$         $\bar{y}_2 = 17.04$
Sample variance      $s_1^2 = 0.100$             $s_2^2 = 0.061$
Sample std. dev.     $s_1 = 0.316$               $s_2 = 0.248$
Sample size          $n_1 = 10$                  $n_2 = 10$

How the Two-Sample t-Test Works:
• Is there a difference in the sample means?
• Is this difference significantly different from 0?
  ($H_0$)
• We need to test based on the sampling distribution of $\bar{y}_1 - \bar{y}_2$:

$$\bar{y}_1 - \bar{y}_2 = 16.76 - 17.04 = -0.28$$

1) Hypotheses
$H_0: \mu_1 - \mu_2 = 0$ or $\mu_1 = \mu_2$
$H_A: \mu_1 - \mu_2 \neq 0$ or $\mu_1 \neq \mu_2$

2) Model and assumptions
• Random and independent data
• Sample is less than 10% of the population
• Nearly normal population {small samples!}
• Independent samples?* (yes/no)
• Equal population variances? {$\sigma_1^2 = \sigma_2^2$}** (yes/no)
*If answer is no, Test 3.
**If answer is no, Test 2.

1. Two-sample pooled t-test - independent samples and equal variances

For the sampling distribution of $\bar{y}_1 - \bar{y}_2$ when $\sigma_1$ and $\sigma_2$ are unknown, use a t-distribution with a centre at $\mu_1 - \mu_2 = 0$ and standard error of:

$$se(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Because we are assuming equal population variances {$\sigma_1^2 = \sigma_2^2$}, we "pool" the two sample standard deviations ($s_1$ and $s_2$) together to get one estimate of $\sigma$, called $s_p$:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

The pooled standard deviation is a weighted average of the two sample standard deviations: the calculation is made with the variances (weighted by their degrees of freedom) and then square rooted. Keep this idea in your mind... it occurs repeatedly!!

Replace $s_1$ and $s_2$ with $s_p$ and the standard error then becomes:

$$se(\bar{y}_1 - \bar{y}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

The test statistic is therefore:

$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

The degrees of freedom (df) are determined by the denominator of $s_p^2$: df = $n_1 + n_2 - 2$.

3) Mechanics, using the summary statistics from the sample:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{9(0.100) + 9(0.061)}{10 + 10 - 2} = 0.081, \qquad s_p = 0.284$$

$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{16.76 - 17.04}{0.284\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -2.20$$

[Figure: Distribution plot, t with df = 18; two-tailed test with area 0.02055 in each tail]

The two sample means are a little over two standard errors apart. Is this a "large" difference?
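The pooled calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `pooled_t_statistic` is a hypothetical helper name, it works from the summary statistics rather than the raw data, and it stops at the test statistic because the Python standard library has no t-distribution (the P-value would come from Minitab or a statistics package).

```python
from math import sqrt


def pooled_t_statistic(mean1, var1, n1, mean2, var2, n2):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = 0.

    Hypothetical helper for illustration; inputs are summary statistics.
    """
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    sp = sqrt(sp2)
    # Standard error of the difference in means under equal variances.
    se = sp * sqrt(1 / n1 + 1 / n2)
    t0 = (mean1 - mean2) / se
    return sp2, sp, t0, df


sp2, sp, t0, df = pooled_t_statistic(16.76, 0.100, 10, 17.04, 0.061, 10)
print(f"sp^2 = {sp2:.4f}, sp = {sp:.3f}, t0 = {t0:.2f}, df = {df}")
```

Carrying full precision gives a test statistic of about -2.21 rather than -2.20; the small gap comes from rounding the pooled standard deviation to 0.284 in the hand calculation.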
P-value = 0.042 {two-sided test}

4) Decision and conclusion
P-value < $\alpha$: 0.042 < 0.05. Reject $H_0$.

Conclusion: We have sufficient evidence at the 0.05 significance level that there is a difference in the average bonding strength of the two mixes.

2. The Two-Sample t-Test - not pooled

For independent samples, but do not assume $\sigma_1^2 = \sigma_2^2$:

1) Hypotheses ($\alpha = 0.05$):
$H_0: \mu_1 - \mu_2 = 0$ or $\mu_1 = \mu_2$
$H_A: \mu_1 - \mu_2 \neq 0$ or $\mu_1 \neq \mu_2$

2) Model and Assumptions:
• Random and independent data
• Sample is less than 10% of the population
• Nearly normal population {small samples!}
• Independent samples?* (yes/no)
• Equal population variances? {$\sigma_1^2 = \sigma_2^2$}** (yes/no)

Two-sample t-test

For the sampling distribution of $\bar{y}_1 - \bar{y}_2$ when $\sigma_1$ and $\sigma_2$ are unknown, use a t-distribution with a centre at $\mu_1 - \mu_2 = 0$ and standard error of:

$$se(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Because we are not assuming equal population variances, we cannot pool the sample standard deviations. The test statistic uses the standard error as stated above.

The test statistic is therefore:

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

with df = yuck!! {we'll use Minitab}

3) Mechanics

$$t = \frac{(-0.278) - 0}{\sqrt{\dfrac{0.316^2}{10} + \dfrac{0.248^2}{10}}} = -2.19$$

4) Decision and Conclusion
P-value < $\alpha$: 0.043 < 0.05, therefore we reject $H_0$.

Conclusion: We have sufficient evidence at the 0.05 significance level that there is a difference in the average bonding strength of the two mixes.

Difference of two Means (independent samples) - Confidence Intervals

General form of a CI: estimate +/- margin of error

3. Paired t-test

Two different machines were used to measure the tensile strength of synthetic fiber. Do the two machines yield the same average strength values?
Eight (8) specimens of fiber are randomly selected and one measurement is made using each machine on each specimen. These data are paired to prevent the difference in specimens from affecting the difference in machines.

When samples are paired (dependent), we look at the difference in strength between the two machines for each individual (specimen). The differences (d) are analysed with a one-sample t-test.

$\bar{d} = -1.38$, $s_d^2 = 7.13$ ($s_d = 2.67$)

Framework: On average, are the individual differences significantly different from 0?

1) Hypotheses ($\alpha = 0.05$):
$H_0: \mu_d = 0$
$H_A: \mu_d \neq 0$

2) Model and Assumptions:
• Random and independent data
• Sample is less than 10% of the population
• Nearly normal population {small samples!}
• Independent samples* (yes/no): no, the samples are dependent
• Equal population variances (not applicable: only the differences are analysed)

Paired t-test

The sampling distribution of $\bar{d}$ follows a t-distribution with a centre at $\mu_d = 0$ and standard error of:

$$se(\bar{d}) = \frac{s_d}{\sqrt{n}}$$

The test statistic is therefore:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}} \qquad \text{with df} = n - 1$$

3) Mechanics

$$t = \frac{-1.38}{2.67 / \sqrt{8}} = -1.46$$

4) Decision and conclusion
P-value > $\alpha$: 0.187 > 0.05, therefore we fail to reject $H_0$.

Conclusion: At the 5% significance level there is insufficient evidence to indicate that the two machines differ in their mean tensile strength measurements.

Means (paired samples) - Confidence Interval

A few text questions:
Chapter 19 {Comparing Means}
Chapter 20 {Paired Samples}
Chapter 21 {Two Proportions}
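As a closing check on the paired t-test mechanics earlier in this module, the test statistic can be reproduced from the summary statistics of the differences. This is a minimal sketch, not the course's Minitab workflow: `paired_t_statistic` is a hypothetical helper name, and the P-value is not computed because the Python standard library has no t-distribution.

```python
from math import sqrt


def paired_t_statistic(mean_diff, sd_diff, n):
    """One-sample t statistic on paired differences for H0: mu_d = 0.

    Hypothetical helper for illustration; inputs are summary statistics
    of the n within-pair differences.
    """
    # Standard error of the mean difference.
    se = sd_diff / sqrt(n)
    t = (mean_diff - 0) / se
    df = n - 1
    return se, t, df


se_d, t, df = paired_t_statistic(-1.38, 2.67, 8)
print(f"se = {se_d:.3f}, t = {t:.2f}, df = {df}")
```

With the fiber data summary (mean difference -1.38, standard deviation 2.67, 8 specimens) this should give a t statistic close to -1.46 on 7 degrees of freedom, matching the hand calculation.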