INF397C Introduction to Research in Information Studies
Fall, 2009
Day 11
R. G. Bias | School of Information | UTA 5.424 | Phone: 512 471 7046 | rbias@ischool.utexas.edu

Context (cont’d.)
• Where we’re going:
  – Inferential statistics
    • Confidence intervals
    • Hypothesis testing, Type I and II errors, significance level
    • t-tests
    • ANOVA
  – More descriptive statistics
    • Correlation
  – Which method when?
  – Cumulative final

Standard Error of the Mean
SE = S_M = S/SQRT(N)

A research question
1. Does an iSchool IT-provided online tutorial lead to better learning than a face-to-face class?

Two methods of making statistical inferences
• Null hypothesis testing
  – Assume IV has no effect on DV; differences we obtain are just by chance (error variance)
  – If the difference is unlikely enough to happen by chance (and “enough” tends to be p < .05), then we say there’s a true difference.
• Confidence intervals
  – We compute a confidence interval for the “true” population mean, from sample data. (95% level, usually.)
  – If two groups’ confidence intervals don’t overlap, we say (we INFER) there’s a true difference.

Remember . . .
• Earlier I said that there are two ways for us to be confident that something is true:
  – Statistical inference
  – Replicability
• Now I’m saying there are two avenues of statistical inference:
  – Hypothesis testing
  – Confidence intervals
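
To get a feel for the Standard Error of the Mean formula given earlier (SE = S/SQRT(N)), here is a minimal Python sketch. The function name `standard_error` and the numbers are made up for illustration; they do not come from Hinton or S, Z, & Z.

```python
import math


def standard_error(s, n):
    """Standard error of the mean: SE = s / sqrt(N)."""
    if n < 1:
        raise ValueError("N must be at least 1")
    return s / math.sqrt(n)


# Hypothetical numbers: the same s with a growing N gives a shrinking SE.
for n in (4, 25, 100):
    print(n, standard_error(10.0, n))  # prints 4 5.0, then 25 2.0, then 100 1.0
```

The point of the loop: quadrupling N only halves SE, because N sits under a square root.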

t-tests
• Remember the z scores:
  – z = (X - µ)/σ
  – It is often the case that we want to know “What percentage of the scores are above (or below) a certain other score?”
  – Asked another way, “What is the area under the curve, beyond a certain point?”
  – THIS is why we calculate a z score, and the way we do it is with the z table, on p. 362 of Hinton.
• Problem: We RARELY truly know µ or σ.

t-tests (cont’d.)
• So, typically what we do is use M to estimate µ and s to estimate σ. (Duh.) (Note: When we estimate σ with s, we divide by N-1, which is degrees of freedom.)
• Then, instead of z, we calculate t.
• Hinton’s example on p. 66 is for a t-test when you have a null hypothesis population mean (µ0). (That is, you want to test if your observed sample mean is different from some value.)
• Hinton then offers examples in Chapter 8 of related (dependent, within-subjects) and independent (unrelated, between-subjects) t-tests.
• S, Z, & Z’s example on p. 447 is for a t-test to compare independent means.

Formulae
- For a single mean (compared with µ0):
  - t = (M - µ0)/(s/SQRT(n))
- For related (within-subjects) groups:
  - t = (M1 - M2)/se(M1 - M2)
  - Where se(M1 - M2) = s(X1 - X2)/SQRT(n), and s(X1 - X2) is the standard deviation of the difference scores
  - See Hinton, p. 86
- For independent groups:
  - From S, Z, & Z, p. 447, and Hinton, p. 90
  - t = (M1 - M2)/se(M1 - M2)
  - Where se(M1 - M2) = SQRT[(s1^2/n1) + (s2^2/n2)]
  - See Hinton, p. 90

Steps
• For a t test for a single sample
  – Restate the question as a research hypothesis and a null hypothesis about the populations.
  – Determine the characteristics of the comparison distribution.
    • The mean is the known population mean.
    • Compute the standard deviation by:
      – Calculate the estimated population variance (S^2 = SS/df)
      – Calculate the variance of the distribution of means (S^2/n)
      – Take the square root, to get SE.
      – Note, we’re calculating t with N-1 df.
  – Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
    • Decide on an alpha and one-tailed vs. two-tailed
    • Look up the critical value in the table
  – Determine your sample’s t score: t = (M - µ)/SE
  – Decide whether to reject or not reject the null hypothesis. (If the observed value of t exceeds the table value, reject; for a two-tailed test, compare the absolute value of t.)

Steps
• For a t test for dependent means
  – Restate the question as a research hypothesis and a null hypothesis about the populations.
  – Determine the characteristics of the comparison distribution.
    • Make each person’s score into a difference score. From here on out, use difference scores.
    • Compute the mean of the difference scores.
    • Assume a population mean of 0: µ = 0.
    • Compute the standard deviation of the difference scores:
      – Calculate the estimated population variance (S^2 = SS/df)
      – Calculate the variance of the distribution of means (S^2/n)
      – Take the square root, to get SE.
      – Note, we’re calculating t with N-1 df.
  – Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
    • Decide on an alpha, and one-tailed vs. two-tailed
    • Look up the critical value in the table
  – Determine your sample’s t score: t = (M - µ)/SE
  – Decide whether to reject or not reject the null hypothesis. (If the observed value of t exceeds the table value, reject; for a two-tailed test, compare the absolute value of t.)
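
The steps for the single-sample and dependent-means t tests can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not Hinton’s or S, Z, & Z’s worked example: the function names (`one_sample_t`, `dependent_t`, `reject_null`) and the before/after scores are hypothetical, and the critical value is typed in by hand from a standard t table (2.776 for a two-tailed test at .05 with 4 df) rather than computed.

```python
import math


def one_sample_t(scores, mu):
    """Return (t, df) for a single-sample t test against a known mean mu."""
    n = len(scores)
    df = n - 1                               # one df used up by the mean
    m = sum(scores) / n                      # sample mean M
    ss = sum((x - m) ** 2 for x in scores)   # sum of squared deviations
    s2 = ss / df                             # estimated population variance
    se = math.sqrt(s2 / n)                   # SE of the distribution of means
    return (m - mu) / se, df


def dependent_t(first, second):
    """Return (t, df) for dependent means, working from difference scores."""
    if len(first) != len(second):
        raise ValueError("each person needs a score in both conditions")
    diffs = [b - a for a, b in zip(first, second)]
    return one_sample_t(diffs, 0.0)          # null hypothesis: mean difference is 0


def reject_null(t, critical, two_tailed=True):
    """Compare the observed t with the table value.

    The one-tailed branch assumes the predicted direction is positive.
    """
    return abs(t) > critical if two_tailed else t > critical


# Hypothetical scores for five people measured twice.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]

t, df = dependent_t(before, after)
critical = 2.776  # looked up by hand: two-tailed, alpha = .05, df = 4
print(round(t, 3), df, reject_null(t, critical))  # prints 4.472 4 True
```

Note that the dependent-means test is just the single-sample test run on the difference scores with µ = 0, which is why `dependent_t` reuses `one_sample_t`.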

Steps
• For a t test for independent means
  – Same as for dependent means, except the value for SE is that squirrely formula on Hinton, p. 90.
  – Basically, here’s the point. When you’re comparing DEPENDENT (within-subject, related) means, you can assume both sets of scores come from the same distribution, thus have the same standard deviation.
    • But when you’re comparing independent (between-subject, unrelated) means, you gotta basically average the variability of each of the two distributions.

Three points
• df
  – Four people, take your choice of candy.
  – One df used up calculating the mean.
• One or two tails
  – Must be VERY careful, choosing to do a one-tailed test.
• Comparing the z and t tables
  – Check out the .05 t table values for infinity df (1.96 for two-tailed test, 1.645 for one-tailed).
  – Now find the commensurate values in the z table.

Type I and Type II Errors
                                     World
Our decision                         Null hypothesis is false    Null hypothesis is true
Reject the null hypothesis           Correct decision (1-β)      Type I error (α)
Fail to reject the null hypothesis   Type II error (β)           Correct decision

Confidence Intervals
• We calculate a confidence interval for a population parameter.
• The mean of a random sample from a population is a point estimate of the population mean.
• But there’s variability! (SE tells us how much.)
• What is the range of scores between which we’re 95% confident that the population mean falls?
• Think about it: the larger the interval we select, the larger the likelihood it will “capture” the true (population) mean.
• CI = M +/- (t.05)(SE)
• See Box 12.2 on “margin of error.” NOTE: In the box they arrive at a 95% confidence that the poll has a margin of error of 5%. It is just coincidence that these two numbers add up to 100%.

CI about a mean -- example
• CI = M +/- (t.05)(SE)
• Establish the level of α (two-tailed) for the CI. (.05)
• M=15.0 s=5.0 N=25
• Use Table A.2 to find the critical value associated with the df.
  – t.05(24) = 2.064
• CI = 15.0 +/- 2.064(5.0/SQRT 25) = 15.0 +/- 2.064 = 12.936 to 17.064
“The odds are 95 out of 100 that the population mean falls between 12.936 and 17.064.”
(NOTE: This is NOT the same as “95% of the scores fall within this range”!!!)

Another CI example
• Hinton, p. 89.
• t test not sig.
• What if we did this via confidence intervals?
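
The CI arithmetic can be checked with a few lines of Python. This is a minimal sketch: it plugs in the numbers from the CI example (M = 15.0, s = 5.0, N = 25, t.05(24) = 2.064), with the critical value typed in from the table rather than computed. The function names `confidence_interval` and `intervals_overlap` are hypothetical, and `intervals_overlap` is only there to illustrate the "do the two groups’ intervals overlap?" rule; no data from Hinton, p. 89, is used.

```python
import math


def confidence_interval(m, s, n, t_crit):
    """Return (low, high) for CI = M +/- t * SE, where SE = s / sqrt(N)."""
    se = s / math.sqrt(n)
    margin = t_crit * se
    return m - margin, m + margin


def intervals_overlap(a, b):
    """True if intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


low, high = confidence_interval(15.0, 5.0, 25, 2.064)
print(f"{low:.3f} to {high:.3f}")  # prints 12.936 to 17.064
```

To compare two groups this way, compute one interval per group and pass both to `intervals_overlap`; non-overlapping intervals are what we would take as a true difference.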