STA 101 Final Review
Statistics 101
Thomas Leininger
June 24, 2013

Announcements
- All work (besides projects) should be returned to you and should be entered on Sakai.
- Office hour: 2–3pm today (Old Chem 114).
- Final exam: 9am–12pm, HERE.
- Bring a calculator; no cell phones, laptops, tablets, etc.
- You are allowed one 8.5 × 11 inch cheat sheet with notes on both sides. You must create it yourself.

Topics for today
- Different conditions and hypotheses in each test (and a list of tests): when to use Z, T, and F statistics; two-sample vs. one-sample tests; degrees of freedom
- Pooled variance and pooled proportion (when to use each)
- Confidence intervals: is it always two-sided? How to interpret a CI for the difference between two means
- Chi-square: what we are testing, when to use it, how to approach the hypotheses
- ANOVA (filling in the table)
- Regression: interpreting and writing linear equations, correlation, residuals
- Type I and Type II errors
- Bayesian probability: won't be on the exam (but conditional probability might)
- MLR: won't be on the exam

What you need to know about HTs
Format for answering a hypothesis test question:
1. State the null and alternative hypotheses.
2. Check conditions.
3. Calculate the test statistic (T, Z, etc.) and the standard error (if needed).
4. Calculate the p-value (double it if the hypothesis is two-sided).
5. Reject or fail to reject the null hypothesis.
6. Interpret your decision in the context of the problem.
Know how to interpret a p-value in context.

What you need to know about CIs
Format for answering a confidence interval question:
1. Check conditions.
2. Find and state the critical value ($z^\star$ or $t^\star_{df}$).
3. Calculate the standard error.
4. Calculate the confidence interval.
5. Interpret your confidence interval in the context of the problem.

Different types of tests
General conditions for HTs/CIs with means or proportions:
- Independence: random samples (<10% of the population sampled).
- Nearly normal data: either we know the population is normal or we have to use the CLT.
Conditions for the CLT:
- The sample looks nearly normal: no large skew, no major outliers. If not met, use randomization.
- Sample size ≥ 30. Otherwise, use a t-distribution.

One-sample test for a mean
Conditions for a one-sample test:
- Independence: random samples (<10% of the population sampled).
- The sample looks nearly normal: no large skew, no major outliers. If not met, use randomization.
- Sample size ≥ 30. Otherwise, use a t-distribution.
Note: for paired data, take the differences and run a one-sample test on them.
Conditions for a two-sample test:
- Independence: random samples (<10% of each population sampled).
- Each sample looks nearly normal: no large skew, no major outliers. If not met, use randomization.
- Both sample sizes ≥ 30. Otherwise (if either or both are smaller than 30), use a t-distribution.
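The HT and CI recipes above can be sketched in Python for a one-sample mean. This is a minimal sketch assuming the large-sample case (n ≥ 30), where the slides allow the normal distribution; it uses the standard library's `statistics.NormalDist`, and the function names and example numbers are hypothetical. For n < 30 you would swap in a t-distribution with df = n − 1, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def z_test_mean(xbar, mu0, s, n):
    """HT steps 3-4: test statistic and two-sided p-value.

    Assumes n >= 30 so the sampling distribution of xbar is nearly normal.
    """
    se = s / sqrt(n)                      # standard error of the sample mean
    z = (xbar - mu0) / se                 # test statistic
    one_tail = 1 - STD_NORMAL.cdf(abs(z))
    p_value = 2 * one_tail                # double it for a two-sided HA
    return z, p_value


def z_ci_mean(xbar, s, n, conf=0.95):
    """CI steps 2-4: critical value, standard error, and interval."""
    z_star = STD_NORMAL.inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    se = s / sqrt(n)
    return xbar - z_star * se, xbar + z_star * se


# Hypothetical example: xbar = 102, s = 10, n = 64, testing H0: mu = 100.
z, p = z_test_mean(102, 100, 10, 64)
low, high = z_ci_mean(102, 10, 64)
print(f"z = {z:.2f}, p-value = {p:.4f}")    # z = 1.60, p-value = 0.1096
print(f"95% CI: ({low:.2f}, {high:.2f})")   # 95% CI: (99.55, 104.45)
```

Here the p-value is above 0.05 and the 95% CI contains 100, so both approaches lead to failing to reject H0.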
Recap: inference for one proportion
Population parameter: $p$; point estimate: $\hat{p}$
Conditions:
- Independence: random sample.
- At least 10 successes and 10 failures; if not, use randomization.
Standard error:
$$SE = \sqrt{\frac{p(1-p)}{n}}$$
- For a CI: use $\hat{p}$.
- For a HT: use $p_0$.

Recap: comparing two proportions
Population parameter: $(p_1 - p_2)$; point estimate: $(\hat{p}_1 - \hat{p}_2)$
Conditions:
- Independence within groups: random sample and 10% condition met for both groups.
- Independence between groups.
- At least 10 successes and 10 failures in each group; if not, use randomization.
$$SE_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
- For a CI: use $\hat{p}_1$ and $\hat{p}_2$.
- For a HT:
  - when $H_0: p_1 = p_2$, use $\hat{p}_{pool} = \frac{\#suc_1 + \#suc_2}{n_1 + n_2}$
  - when $H_0: p_1 - p_2 =$ (some value other than 0), use $\hat{p}_1$ and $\hat{p}_2$ (this is pretty rare).

Reference: standard error calculations
- One-sample mean: $SE = \frac{s}{\sqrt{n}}$
- Two-sample mean: $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
- One-sample proportion: $SE = \sqrt{\frac{p(1-p)}{n}}$
- Two-sample proportion: $SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

When working with means, it's very rare that $\sigma$ is known, so we usually use $s$.
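The proportion recipes above differ mainly in which proportion goes into the SE. A minimal Python sketch, with hypothetical function names and made-up counts (45 of 100 vs. 30 of 100), shows the success-failure check, the pooled SE for a HT of $H_0: p_1 = p_2$, and the unpooled SE for a CI. It relies on the normal model via `statistics.NormalDist`, which is only appropriate when the success-failure condition holds.

```python
from math import sqrt
from statistics import NormalDist


def success_failure_ok(successes, n, threshold=10):
    """At least 10 successes and 10 failures; otherwise use randomization."""
    return successes >= threshold and (n - successes) >= threshold


def two_prop_test(suc1, n1, suc2, n2):
    """HT for H0: p1 = p2, using the pooled proportion in the SE."""
    p1_hat, p2_hat = suc1 / n1, suc2 / n2
    p_pool = (suc1 + suc2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
    z = (p1_hat - p2_hat - 0) / se        # null value of p1 - p2 is 0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_pool, se, z, p_value


def two_prop_ci(suc1, n1, suc2, n2, conf=0.95):
    """CI for p1 - p2, plugging in p1_hat and p2_hat (no pooling)."""
    p1_hat, p2_hat = suc1 / n1, suc2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Hypothetical counts: 45/100 successes in group 1, 30/100 in group 2.
assert all(success_failure_ok(s, 100) for s in (45, 30))
p_pool, se, z, p = two_prop_test(45, 100, 30, 100)
print(f"p_pool = {p_pool:.3f}, SE = {se:.4f}, z = {z:.2f}, p-value = {p:.4f}")
print("95% CI for p1 - p2:", two_prop_ci(45, 100, 30, 100))
```

Keeping the HT and CI in separate functions mirrors the slide's rule: the HT's SE is built from the null hypothesis (the pooled value), while the CI's SE is built from the sample proportions.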
When working with proportions: if doing a hypothesis test, $p$ comes from the null hypothesis; if constructing a confidence interval, use $\hat{p}$ instead.

When to use pooled standard error
For a two-sample HT/CI for a difference of means, we can pool our information about the variance if $s_1$ and $s_2$ can be assumed to be roughly equal. If we can do this, we replace $s_1^2$ and $s_2^2$ with $s_{pool}^2$, where
$$s_{pool}^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}$$
The degrees of freedom for the t-distribution are then $df = n_1 + n_2 - 2$.

When to use pooled proportion
When doing a HT for a difference of two proportions, the null hypothesis is $p_1 = p_2$ (equivalently, $p_1 - p_2 = 0$). Since we assume $H_0$ is true in a HT, we assume $p_{pool} = p_1 = p_2$, which means our standard error is
$$SE = \sqrt{\frac{p_{pool}(1 - p_{pool})}{n_1} + \frac{p_{pool}(1 - p_{pool})}{n_2}}$$
We estimate $p_{pool}$ with
$$\hat{p}_{pool} = \frac{\#\text{ of successes}_1 + \#\text{ of successes}_2}{n_1 + n_2}$$

More on confidence intervals
- How to interpret a CI for the difference between two means: "We are 95% confident that $\mu_1 - \mu_2$ is between $a$ and $b$." An interval that contains 0 is consistent with no difference between the two means.

ANOVA: filling in the table (Exercise 5.40)

Educational attainment:
              Less than HS    HS      Jr Coll   Bachelor's   Graduate   Total
Mean          38.67           39.6    41.39     42.55        40.85      40.45
SD            15.81           14.97   18.1      13.62        15.51      15.17
n             121             546     97        253          155        1,172

ANOVA table:
              Df      Sum Sq     Mean Sq   F value   Pr(>F)
degree        XXXXX   XXXXX      501.54    XXXXX     0.0682
Residuals     XXXXX   267,382    XXXXX
Total         XXXXX   XXXXX

Chi-square tests
Goodness of fit (1 variable):
- $H_0$: There is no inconsistency between the observed and the expected counts.
- $H_A$: There is an inconsistency between the observed and the expected counts.
Test of independence (2 variables):
- $H_0$: Variable 1 and Variable 2 are independent.
- $H_A$: Variable 1 and Variable 2 are not independent.

Regression (Exercise 7.28)
Coefficients:
              Estimate    Std. Error   t value   Pr(>|t|)
(Intercept)   -0.012701   0.012638     -1.005    0.332
bac$Beers      0.017964   0.002402      7.480    2.97e-06

Residual standard error: 0.02044 on 14 degrees of freedom
Multiple R-squared: 0.7998, Adjusted R-squared: 0.7855
F-statistic: 55.94 on 1 and 14 DF, p-value: 2.969e-06

[Figure: scatterplot of BAC (grams per deciliter) vs. cans of beer]

- Write the equation of the regression line.
- Interpret the slope and intercept in context.
- Do the data provide strong evidence that drinking more cans of beer is associated with an increase in blood alcohol? State the null and alternative hypotheses, report the p-value, and state your conclusion.
- What is $R^2$? Interpret $R^2$ in context.
- What is the correlation?

Regression (continued)
Conditions for regression:
- linearity
- nearly normal residuals
- constant variability (of residuals)
Residuals: predict the BAC for someone who has had 5 cans of beer:
$$\widehat{BAC} = -0.012701 + 0.017964 \times 5 = 0.077119$$
The observed BAC is 0.10, so the residual is $y_i - \hat{y}_i = 0.10 - 0.077 = 0.023$.

Type I and Type II errors
                          Decision
Truth         fail to reject $H_0$     reject $H_0$
$H_0$ true    correct decision         Type 1 Error
$H_A$ true    Type 2 Error             correct decision

- A Type 1 error is rejecting $H_0$ when you shouldn't have ($H_0$ is true); the probability of doing so is $\alpha$ (the significance level).
- A Type 2 error is failing to reject $H_0$ when you should have ($H_A$ is true); the probability of doing so is $\beta$ (more complicated to calculate).
- The power of a test is the probability of correctly rejecting $H_0$ when $H_A$ is true, which is $1 - \beta$.
In hypothesis testing, we want to keep both $\alpha$ and $\beta$ low, but there are inherent trade-offs: for a fixed sample size, lowering $\alpha$ raises $\beta$.
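To make $\alpha$ and power concrete, here is a small Monte Carlo sketch (an illustration, not part of the original slides). It repeatedly draws samples, runs a two-sided z-test of $H_0: \mu = 0$ at $\alpha = 0.05$ with a known $\sigma = 1$ and $n = 30$ (all assumed values), and records how often $H_0$ is rejected. When the data come from $H_0$, the rejection rate estimates $\alpha$; when they come from an assumed alternative $\mu = 0.5$, it estimates the power $1 - \beta$. The function name `rejection_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=30, alpha=0.05,
                   reps=10_000, seed=1):
    """Fraction of simulated two-sided z-tests that reject H0: mu = mu0."""
    rng = random.Random(seed)                      # seeded for reproducibility
    z_star = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se = sigma / sqrt(n)                           # sigma treated as known
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if abs(z) > z_star:
            rejections += 1
    return rejections / reps


type1 = rejection_rate(true_mu=0.0)   # H0 true: every rejection is a Type 1 error
power = rejection_rate(true_mu=0.5)   # HA true: every rejection is correct
print(f"estimated alpha = {type1:.3f}")
print(f"estimated power = {power:.3f}, estimated beta = {1 - power:.3f}")
```

The estimated $\alpha$ should land near 0.05 and the estimated power near 0.78, the value a normal calculation gives for this alternative. Rerunning with `alpha=0.01` lowers the power (raises $\beta$), which is the trade-off described above.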
Calculating sample sizes
For our CIs, we can calculate the minimum sample size needed to provide a certain margin of error. For a desired (maximum) ME, call it $m$, we start with
$$m \ge ME = \text{critical value} \times SE \quad (\text{the SE is a function of } n)$$
Solve for $n$; the result should always look like $n \ge 123.45$. Then round $n$ up to the next whole number ($123.45 \to 124$). For a two-sample CI, you would need both sample sizes to be at least this large.
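Solving $m \ge z^\star \times SE$ for $n$ gives $n \ge (z^\star \sigma / m)^2$ for a mean and $n \ge (z^\star / m)^2 \, p(1-p)$ for a proportion. The Python sketch below applies those two rearrangements, assuming a z critical value and a guessed $\sigma$ or $p$ ($p = 0.5$ is the conservative choice, since it maximizes $p(1-p)$). The function names and inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def z_star(conf):
    """Two-sided critical value, about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def min_n_mean(sigma, m, conf=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= m."""
    n = (z_star(conf) * sigma / m) ** 2
    return ceil(n)            # always round up, never to the nearest integer


def min_n_proportion(m, p=0.5, conf=0.95):
    """Smallest n with z* * sqrt(p(1 - p) / n) <= m; p = 0.5 is the worst case."""
    n = (z_star(conf) / m) ** 2 * p * (1 - p)
    return ceil(n)


print(min_n_mean(sigma=15, m=2))    # 217  (n >= 216.08...)
print(min_n_proportion(m=0.03))     # 1068 (n >= 1067.07...)
```

Rounding with `ceil` rather than `round` matters: rounding 216.08 to the nearest integer would give 216, which is just short of the required margin of error.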