INF397C
Introduction to Research in Information Studies
Fall, 2009
Day 11
R. G. Bias | School of Information | UTA 5.424 | Phone: 512 471 7046 | rbias@ischool.utexas.edu
Context (cont’d.)
• Where we’re going:
  – Inferential statistics
    • Confidence intervals
    • Hypothesis testing, Type I and II errors, significance level
    • t-tests
    • ANOVA
  – More descriptive statistics
    • Correlation
  – Which method when?
  – Cumulative final
Standard Error of the Mean
$SE = s_M = s/\sqrt{N}$
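To make the formula concrete, here is a minimal Python sketch (standard library only) that computes SE from raw scores. The function name `standard_error` and the five scores are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
import math
import statistics


def standard_error(scores):
    """Standard error of the mean, SE = s / sqrt(N).

    statistics.stdev divides the sum of squares by N - 1, so s here is the
    sample-based estimate of sigma.
    """
    s = statistics.stdev(scores)
    return s / math.sqrt(len(scores))


scores = [12, 15, 9, 18, 16]  # hypothetical sample: M = 14, SS = 50
print(round(standard_error(scores), 3))  # 1.581
```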
A research question
1. Does an iSchool IT-provided online tutorial lead to better learning than a face-to-face class?
Two methods of making statistical inferences
• Null hypothesis testing
  – Assume the independent variable (IV) has no effect on the dependent variable (DV); any differences we obtain are just due to chance (error variance).
  – If the difference is unlikely enough to happen by chance (and “enough” tends to mean p < .05), then we say there’s a true difference.
• Confidence intervals
  – We compute a confidence interval for the “true” population mean from sample data (usually at the 95% level).
  – If two groups’ confidence intervals don’t overlap, we say (we INFER) there’s a true difference. (The reverse doesn’t hold: intervals that overlap slightly can still go with a significant difference.)
Remember . . .
• Earlier I said that there are two ways for us to be confident that something is true:
  – Statistical inference
  – Replicability
• Now I’m saying there are two avenues of statistical inference:
  – Hypothesis testing
  – Confidence intervals
t-tests
• Remember the z scores:
  – z = (X - µ)/σ
  – It is often the case that we want to know, “What percentage of the scores are above (or below) a certain other score?”
  – Asked another way, “What is the area under the curve beyond a certain point?”
  – THIS is why we calculate a z score: we then look up that area in the z table, on p. 362 of Hinton.
• Problem: We RARELY truly know µ or σ.
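The lookup described above can be sketched in Python with `statistics.NormalDist` (Python 3.8+), which plays the role of the z table here. The helper names and the values X = 130, µ = 100, σ = 15 are hypothetical, for illustration only.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """z = (X - mu) / sigma."""
    return (x - mu) / sigma


def area_above(z):
    """Proportion of a standard normal distribution above z (what the z table gives)."""
    return 1 - NormalDist().cdf(z)


z = z_score(130, 100, 15)       # hypothetical: X = 130, mu = 100, sigma = 15
print(z)                        # 2.0
print(round(area_above(z), 4))  # 0.0228
```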
t-tests (cont’d.)
• So, typically what we do is use M to estimate µ and s to estimate σ. (Duh.) (Note: When we estimate σ with s, we divide the sum of squares by N-1, which is the degrees of freedom.)
• Then, instead of z, we calculate t.
• Hinton’s example on p. 66 is for a t-test when you have a null hypothesis population mean (µ0). (That is, you want to test whether your observed sample mean is different from some value.)
• Hinton then offers examples in Chapter 8 of related (dependent, within-subjects) and independent (unrelated, between-subjects) t-tests.
• S, Z, & Z’s example on p. 447 is for a t-test to compare independent means.
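A small sketch of the N - 1 point: dividing the sum of squares by df gives the same s that Python's `statistics.stdev` returns, while `statistics.pstdev` divides by N instead. The scores are hypothetical.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, N = 8
n = len(scores)
m = sum(scores) / n                     # M, our estimate of mu (5.0 here)
ss = sum((x - m) ** 2 for x in scores)  # sum of squares, SS = 32.0
s = math.sqrt(ss / (n - 1))             # divide by df = N - 1 to estimate sigma

print(round(s, 3))                                # 2.138
print(math.isclose(s, statistics.stdev(scores)))  # True
print(statistics.pstdev(scores))                  # 2.0 (divides by N instead)
```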
Formulae
• For a single mean (compared with µ0):
  – $t = \frac{M - \mu_0}{s/\sqrt{n}}$
• For related (within-subjects) groups:
  – $t = \frac{M_1 - M_2}{SE_{M_1 - M_2}}$
  – Where $SE_{M_1 - M_2} = s_{X_1 - X_2}/\sqrt{n}$ (the standard deviation of the difference scores divided by the square root of the number of pairs)
  – See Hinton, p. 86
• For independent groups:
  – From S, Z, & Z, p. 447, and Hinton, p. 90
  – $t = \frac{M_1 - M_2}{SE_{M_1 - M_2}}$
  – Where $SE_{M_1 - M_2} = \sqrt{s_1^2/n_1 + s_2^2/n_2}$
  – See Hinton, p. 90
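The three formulas translate directly into code. This is a minimal sketch, assuming plain lists of scores; the function names are hypothetical. For independent groups it uses the slide's unpooled SE; many textbooks pool the two variances instead when group sizes differ, and the two versions give the same SE when n1 = n2.

```python
import math
import statistics


def t_single(sample, mu0):
    """Single mean vs. a null-hypothesis population mean: t = (M - mu0) / (s / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))


def t_related(x1, x2):
    """Related groups: t = (M1 - M2) / SE, with SE = s of the difference scores / sqrt(n)."""
    if len(x1) != len(x2):
        raise ValueError("related samples need one pair of scores per person")
    diffs = [a - b for a, b in zip(x1, x2)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return (statistics.mean(x1) - statistics.mean(x2)) / se


def t_independent(x1, x2):
    """Independent groups: t = (M1 - M2) / SE, with SE = sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(statistics.variance(x1) / len(x1) + statistics.variance(x2) / len(x2))
    return (statistics.mean(x1) - statistics.mean(x2)) / se


print(round(t_single([4, 6, 8], mu0=4), 3))           # 1.732
print(round(t_related([5, 7, 9], [4, 5, 6]), 3))      # 3.464
print(round(t_independent([1, 2, 3], [4, 5, 6]), 3))  # -3.674
```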
Steps
• For a t test for a single sample
  – Restate the question as a research hypothesis and a null hypothesis about the populations.
  – Determine the characteristics of the comparison distribution.
    • The mean is the known population mean.
    • Compute its standard deviation (the standard error, SE):
      – Calculate the estimated population variance ($S^2 = SS/df$, where df = N - 1).
      – Calculate the variance of the distribution of means ($S^2/N$).
      – Take the square root, to get SE.
      – Note: we’re calculating t with N - 1 df.
  – Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
    • Decide on an alpha and one-tailed vs. two-tailed.
    • Look up the critical value in the table.
  – Determine your sample’s t score: $t = (M - \mu)/SE$.
  – Decide whether to reject or not reject the null hypothesis. (If the observed value of t is more extreme than the table value, reject; for a two-tailed test, compare the absolute value of t.)
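The single-sample steps can be followed line by line in Python. This is a sketch, assuming a two-tailed test; the critical value is passed in after looking it up in a t table (2.776 is the two-tailed .05 value for df = 4), and the function name and data are hypothetical.

```python
import math
import statistics


def one_sample_t_test(sample, mu, t_critical):
    """Single-sample t test following the slide's steps (two-tailed decision)."""
    n = len(sample)
    m = statistics.mean(sample)
    ss = sum((x - m) ** 2 for x in sample)  # sum of squares
    df = n - 1
    s2 = ss / df                  # estimated population variance, S^2 = SS / df
    var_of_means = s2 / n         # variance of the distribution of means, S^2 / N
    se = math.sqrt(var_of_means)  # standard error
    t = (m - mu) / se
    reject = abs(t) > t_critical  # two-tailed: compare |t| with the table value
    return t, df, reject


# Hypothetical data; 2.776 is the two-tailed .05 critical t for df = 4.
t, df, reject = one_sample_t_test([12, 15, 9, 18, 16], mu=10, t_critical=2.776)
print(round(t, 3), df, reject)  # 2.53 4 False
```

Here t falls short of the cutoff, so we would fail to reject the null hypothesis.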
Steps
• For a t test for dependent means
  – Restate the question as a research hypothesis and a null hypothesis about the populations.
  – Determine the characteristics of the comparison distribution.
    • Make each person’s score into a difference score. From here on out, use difference scores.
    • Compute the mean of the difference scores.
    • Assume a population mean of 0: µ = 0.
    • Compute the standard deviation of the distribution of mean differences (SE) from the difference scores:
      – Calculate the estimated population variance of the difference scores ($S^2 = SS/df$, where df = N - 1).
      – Calculate the variance of the distribution of means ($S^2/N$).
      – Take the square root, to get SE.
      – Note: we’re calculating t with N - 1 df, where N is the number of difference scores (people).
  – Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
    • Decide on an alpha, and one-tailed vs. two-tailed.
    • Look up the critical value in the table.
  – Determine your sample’s t score: $t = (M - \mu)/SE$, where M is the mean of the difference scores.
  – Decide whether to reject or not reject the null hypothesis. (If the observed value of t is more extreme than the table value, reject; for a two-tailed test, compare the absolute value of t.)
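The dependent-means steps reduce to a single-sample test on difference scores with µ = 0. A minimal sketch, assuming two equal-length lists in which position i holds the same person's two scores; the function name and data are hypothetical.

```python
import math
import statistics


def dependent_t(scores_1, scores_2):
    """t test for dependent means, built on difference scores (null: mu = 0)."""
    if len(scores_1) != len(scores_2):
        raise ValueError("need exactly one pair of scores per person")
    diffs = [b - a for a, b in zip(scores_1, scores_2)]  # one difference score per person
    n = len(diffs)
    m = statistics.mean(diffs)       # mean of the difference scores
    s2 = statistics.variance(diffs)  # estimated population variance, SS / (N - 1)
    se = math.sqrt(s2 / n)           # SE of the mean difference
    mu = 0                           # null hypothesis: no change on average
    return (m - mu) / se, n - 1


# Hypothetical scores for five people measured twice
first = [10, 12, 9, 11, 13]
second = [12, 15, 10, 13, 16]
t, df = dependent_t(first, second)
print(round(t, 2), df)  # 5.88 4
```

With a two-tailed .05 cutoff of 2.776 for df = 4, this t would lead us to reject the null hypothesis.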
Steps
• For a t test for independent means
  – Same steps as for dependent means, except that there are no difference scores, and the value for SE is that squirrely formula on Hinton, p. 90.
  – Basically, here’s the point. When you’re comparing DEPENDENT (within-subject, related) means, each person contributes a single difference score, so there is just one distribution, with one standard deviation, to estimate.
    • But when you’re comparing independent (between-subject, unrelated) means, there are two separate distributions, so you gotta basically combine the variability of each of the two.
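Here is a sketch of "combining the variability" for independent groups, framed around the online-tutorial vs. face-to-face question from earlier. The scores are hypothetical. `se_unpooled` follows the slide's formula; `se_pooled` is a standard textbook alternative (not from the slide) that takes a df-weighted average of the two variances. With equal group sizes the two agree.

```python
import math
import statistics


def se_unpooled(g1, g2):
    """SE of M1 - M2 from the slide's formula: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(statistics.variance(g1) / len(g1) + statistics.variance(g2) / len(g2))


def se_pooled(g1, g2):
    """Alternative: pool (df-weighted average) the two variances, then scale by 1/n1 + 1/n2."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * statistics.variance(g1) + (n2 - 1) * statistics.variance(g2)) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


online = [78, 85, 90, 72, 88]    # hypothetical scores, online tutorial group
in_class = [70, 80, 75, 68, 77]  # hypothetical scores, face-to-face group

se = se_unpooled(online, in_class)
t = (statistics.mean(online) - statistics.mean(in_class)) / se
print(round(se, 3), round(t, 2))                      # 4.007 2.15
print(math.isclose(se, se_pooled(online, in_class)))  # True (equal group sizes)
```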
Three points
• df
  – Four people, take your choice of candy.
  – One df is used up calculating the mean.
• One or two tails
  – Be VERY careful about choosing to do a one-tailed test.
• Comparing the z and t tables
  – Check out the .05 t table values for infinite (∞) df: 1.96 for a two-tailed test, 1.645 for a one-tailed test.
  – Now find the commensurate values in the z table.
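Both points can be checked in a few lines of Python. The first part uses `statistics.NormalDist` as a stand-in for the z table to recover the infinite-df t cutoffs; the second part is a hypothetical numeric version of the candy example, showing that once the mean of four scores is fixed, only three scores are free to vary.

```python
from statistics import NormalDist

# Comparing the tables: with infinite df, t critical values equal z critical values.
z = NormalDist()                      # standard normal distribution (mean 0, SD 1)
two_tailed = z.inv_cdf(1 - 0.05 / 2)  # leaves .025 in each tail
one_tailed = z.inv_cdf(1 - 0.05)      # leaves .05 in one tail
print(round(two_tailed, 3), round(one_tailed, 3))  # 1.96 1.645

# df and the candy: once the mean of four scores is fixed, only three are free.
mean_of_four = 6
first_three = [3, 5, 8]                        # freely chosen (hypothetical)
fourth = 4 * mean_of_four - sum(first_three)   # forced: 24 - 16 = 8
print(fourth)                                  # 8
```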
Type I and Type II Errors

                                      World
Our decision                          Null hypothesis is false     Null hypothesis is true
Reject the null hypothesis            Correct decision (1-β)       Type I error (α)
Fail to reject the null hypothesis    Type II error (β)            Correct decision
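The α cell can be checked by simulation. This is a sketch, assuming σ is known (so a z test is appropriate) and the population is normal; all parameter values and the function name are hypothetical. Because the null hypothesis is true in every simulated sample, the proportion of rejections estimates the Type I error rate.

```python
import random
from statistics import NormalDist


def type_i_error_rate(trials=10_000, n=25, mu=100.0, sigma=15.0, alpha=0.05, seed=1):
    """Estimate how often a two-tailed z test rejects a null hypothesis that is TRUE."""
    rng = random.Random(seed)                     # seeded so the run is repeatable
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # The null is true by construction: every sample comes from a population with mean mu.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu) / se
        if abs(z) > cutoff:
            rejections += 1                       # each rejection here is a Type I error
    return rejections / trials


print(type_i_error_rate())  # should land close to alpha = .05
```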
Confidence Intervals
• We calculate a confidence interval for a population parameter.
• The mean of a random sample from a population is a point estimate of the population mean.
• But there’s variability! (SE tells us how much.)
• What is the range of values within which we’re 95% confident that the population mean falls?
• Think about it: the larger the interval we select, the greater the likelihood that it will “capture” the true (population) mean.
• $CI = M \pm (t_{.05})(SE)$, where $t_{.05}$ is the two-tailed critical value of t for N - 1 df.
• See Box 12.2 on “margin of error.” NOTE: In the box they arrive at 95% confidence that the poll has a margin of error of 5%. It is just a coincidence that these two numbers add up to 100%.
CI about a mean -- example
• $CI = M \pm (t_{.05})(SE)$
• Establish the level of α (two-tailed) for the CI: .05.
• M = 15.0, s = 5.0, N = 25
• Use Table A.2 to find the critical value associated with the df (N - 1 = 24).
  – $t_{.05}(24) = 2.064$
• $CI = 15.0 \pm 2.064(5.0/\sqrt{25}) = 15.0 \pm 2.064(1.0) = 15.0 \pm 2.064$, i.e., 12.936 to 17.064
• “We can be 95% confident that the population mean falls between 12.936 and 17.064.”
• (NOTE: This is NOT the same as “95% of the scores fall within this range”!!!)
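The same arithmetic as a short Python sketch. The function name is hypothetical, and the critical value still comes from the t table (Table A.2), since Python's standard library has no t distribution.

```python
import math


def confidence_interval(m, s, n, t_crit):
    """CI = M +/- t_crit * SE, where SE = s / sqrt(n) and t_crit comes from a t table (df = n - 1)."""
    se = s / math.sqrt(n)
    margin = t_crit * se
    return m - margin, m + margin


low, high = confidence_interval(15.0, 5.0, 25, 2.064)  # values from the slide; t.05(24) = 2.064
print(f"{low:.3f} to {high:.3f}")  # 12.936 to 17.064
```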
Another CI example
• Hinton, p. 89.
• The t test is not significant.
• What if we did this via confidence intervals?