INF397C Introduction to Research in Information Studies
Fall, 2009
Day 10
R. G. Bias | School of Information | UTA 5.424 | Phone: 512 471 7046 | rbias@ischool.utexas.edu

Where we’ve been:
• Descriptive statistics
  – Frequency distributions
  – Graphs
  – Types of scales
  – Probability
  – Measures of central tendency and spread
  – z scores
• Experimental design
  – The scientific method
  – Operational definitions
  – IV, DV, controls, counterbalancing, confounds
  – Validity, reliability
  – Within- and between-subject designs
• Qualitative research
  – Gracy, Rice Lively

Context (cont’d.)
• Where we’re going:
  – More descriptive statistics
    • Correlation (demo)
  – Inferential statistics
    • Confidence intervals
    • Hypothesis testing, Type I and II errors, significance level
    • t-tests
    • ANOVA (VERY high level)
    • Chi square (demo)
  – Which method when?
  – Cumulative final

First, correcting a lie
• Mean
  – Parameter (for populations): µ = ΣX/N
  – Statistic (for samples): M (or “X bar”) = ΣX/N
• Standard deviation
  – Parameter (for populations): σ = SQRT((ΣX² – (ΣX)²/N) / N)
  – Statistic (for samples): s = SQRT((ΣX² – (ΣX)²/N) / (N – 1))

Degrees of Freedom
• Demo

Standard Error of the Mean
• So far, we’ve computed a sample mean (M, X bar), and used it to estimate the population mean (µ).
• One thing we’ve gotten convinced of (I hope) is . . . larger sample sizes are better.
  – Think about it – what if I asked ONE of you, what School are you a student in? Versus asking 10 of you?

Standard Error (cont’d.)
• Well, instead of picking ONE sample, and using that mean to estimate the population mean, what if we sampled a BUNCH of samples?
• If we sampled ALL possible samples, the mean of the means would equal the population mean. (“µM”)
• Here are some other things we know:
  – As we get more samples, the mean of the sample means gets closer to the population mean.
  – Distribution of sample means tends to be normal.
  – We can use the z table to find the probability of a mean of a certain value.
  – And most importantly . . .

Standard Error (cont’d.)
• We can easily work out the standard deviation of the distribution of sample means: SE = sM = s/SQRT(N)
• So, the standard error of the mean is the standard distance that a sample mean is from the population mean.
• Thus, the SE tells us how good an estimate our sample mean is of the population mean.
• Note, as N gets larger, the SE gets smaller, and the better the sample mean estimates the population mean.
• Hold on – we’ll use SE later.

A research question
1. Does an iSchool IT-provided online tutorial lead to better learning than a face-to-face class?

Two methods of making statistical inferences
• Null hypothesis testing
  – Assume IV has no effect on DV; differences we obtain are just by chance (error variance)
  – If the difference is unlikely enough to happen by chance (and “enough” tends to be p < .05), then we say there’s a true difference.
• Confidence intervals
  – We compute a confidence interval for the “true” population mean, from sample data. (95% level, usually.)
  – If two groups’ confidence intervals don’t overlap, we say (we INFER) there’s a true difference.
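Going back to the standard error slides for a moment, here is a minimal Python sketch of SE = s/SQRT(N). It uses the N – 1 formula for s from the “correcting a lie” slide. The function names and the list of scores are made up for illustration; they are not from the course materials.

```python
import math


def sample_sd(scores):
    """Sample standard deviation s: SQRT((sum X^2 - (sum X)^2 / N) / (N - 1))."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


def se_from_sd(s, n):
    """Standard error of the mean, given a sample SD s and sample size n."""
    return s / math.sqrt(n)


def standard_error(scores):
    """Standard error of the mean computed directly from raw scores."""
    return se_from_sd(sample_sd(scores), len(scores))


# Hypothetical scores, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_sd(scores), 3))
print(round(standard_error(scores), 3))

# Same s, larger N: the standard error shrinks.
print(se_from_sd(5.0, 25))   # 1.0
print(se_from_sd(5.0, 100))  # 0.5
```

Quadrupling N from 25 to 100 only halves the SE, because N sits under the square root.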

Remember . . .
• Earlier I said that there are two ways for us to be confident that something is true:
  – Statistical inference
  – Replicability
• Now I’m saying there are two avenues of statistical inference:
  – Hypothesis testing
  – Confidence intervals

Confidence Intervals
• We calculate a confidence interval for a population parameter.
• The mean of a random sample from a population is a point estimate of the population mean.
• But there’s variability! (SE tells us how much.)
• What is the range of scores between which we’re 95% confident that the population mean falls?
• Think about it – the larger the interval we select, the larger the likelihood it will “capture” the true (population) mean.
• CI = M +/- (t.05)(SE)
• See Box 12.2 on “margin of error.” NOTE: In the box they arrive at a 95% confidence that the poll has a margin of error of 5%. It is just coincidence that these two numbers add up to 100%.

CI about a mean -- example
• CI = M +/- (t.05)(SE)
• Establish the level of α (two-tailed) for the CI. (.05)
• M=15.0 s=5.0 N=25
• Use Table A.2 to find the critical value associated with the df.
  – t.05(24) = 2.064
• CI = 15.0 +/- 2.064(5.0/SQRT 25) = 15.0 +/- 2.064(1.0) = 15.0 +/- 2.064 = 12.936 – 17.064

“The odds are 95 out of 100 that the population mean falls between 12.936 and 17.064.”
(NOTE: This is NOT the same as “95% of the scores fall within this range”!!!)

Type I and Type II Errors
• Our decision: reject the null hypothesis
  – World: null hypothesis is false: correct decision (1-β)
  – World: null hypothesis is true: Type I error (α)
• Our decision: fail to reject the null hypothesis
  – World: null hypothesis is false: Type II error (β)
  – World: null hypothesis is true: correct decision
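
The confidence interval example (M=15.0, s=5.0, N=25) can be sketched in Python as below. This is only an illustration: the critical value 2.064 is typed in from the t table for df = 24 rather than computed, and the function names and the second group’s interval are hypothetical.

```python
import math


def confidence_interval(mean, sd, n, t_crit):
    """CI = M +/- (t critical value)(SE), where SE = s / SQRT(N)."""
    se = sd / math.sqrt(n)
    margin = t_crit * se
    return mean - margin, mean + margin


def intervals_overlap(ci_a, ci_b):
    """True if two (low, high) confidence intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Values from the slide; t.05(24) = 2.064 is looked up in a t table, not computed.
low, high = confidence_interval(15.0, 5.0, 25, 2.064)
print(f"{low:.3f} to {high:.3f}")  # 12.936 to 17.064

# Hypothetical second group, to illustrate the "do the CIs overlap?" inference.
other_group = (18.0, 21.5)
print(intervals_overlap((low, high), other_group))  # False
```

Because the two intervals do not overlap here, we would infer a true difference between the groups, per the “two methods” slide.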