Comparing 2 Conditions With a T Test

advertisement
Statistics for Everyone Workshop
Fall 2010
Part 3
Comparing 2 Conditions With a T Test
Workshop by Linda Henkel and Laura McSweeney of Fairfield University
Funded by the Core Integration Initiative and the Center for Academic Excellence at
Fairfield University
Statistics as a Tool in Scientific Research:
Comparing 2 Conditions With a T Test
Teaching Tips
The research will excite them, but the statistics part
might frighten them (“Oh, I’m not good at that…”) so
ALWAYS emphasize statistics as a tool for
researchers to use to answer research questions
Ask exciting and interesting research questions so
that they want to find out the answer
Be sure to cover the science basics (e.g., terminology);
If they don’t hear it from you, where do they hear
it? Don’t assume they already know
Statistics as a Tool in Scientific Research:
Comparing 2 Conditions With a T Test
Types of Research Questions
• Descriptive (What does X look like?)
• Correlational (Is there an association
between X and Y? As X increases, what
does Y do?)
• Experimental (Do changes in X cause
changes in Y?)
Statistics as a Tool in Scientific Research
 Start with the science and use statistics as a
tool to answer the research question
Get your students to formulate a research
question:
• How often does this happen?
• Did all plants/people/chemicals act the
same?
• What happens when I add more sunlight,
give more praise, pour in more water?
Types of Statistical Procedures
Descriptive: Organize and summarize
data
Inferential: Draw inferences about the
relations between variables;
use samples to generalize to the
population
Hypothesis Testing in Experiments
Independent variable (IV) (manipulated): Has different levels
or conditions
• Presence vs. absence (Drug, Placebo)
• Amount (5mg, 10 mg, 20 mg)
• Type (Drug A, Drug B, Drug C)
Quasi-Independent variable (not experimentally controlled)
e.g., Gender
Dependent variable (DV) (measured variable):
• Number of white blood cells, temperature, heart rate
Do changes in the levels/conditions of the IV cause changes
in the DV?
If graphing, put IV on x-axis and DV on y-axis
Hypothesis Testing in Experiments
Statistical test: the t test
Used for: Comparing average score in one
condition to average score in another
condition
Use when: The IV is categorical (with 2
levels) and the DV is numerical (interval or
ratio scale), e.g., weight as a function of gender,
# of white blood cells as a function of organism,
mpg as a function of foreign/domestic
Types of T Tests: One Sample T
Use when you have one sample and you want to
compare it to a known population
•
•
•
Participants get Drug B. Clinical trials with Drug A have
already shown how much it reduced pain. Does Drug B
provide even better pain relief, on average?
Plants are exposed to 3 hrs artificial light. You already
know how many blooms, on average, plants will make
when they have no artificial light. Does the artificial light
cause more blooms on average than in the population at
large?
Hybrid cars are driven on a test course to determine
their mpg. You already know the average mpg gaspowered cars get on this course. Do the hybrids get
better or worse mpg on average than the population at
large?
Types of T Tests: Independent Samples T
Use when you have a between-subjects design -comparing if there is a difference between two
separate (independent) groups
•
•
•
Some people get the drug, and other people get
the placebo. Which group on average has less
pain?
Some plants are exposed to 0 hrs of artificial
light, and some are exposed to 3 hours. Which
group has more blooms on average?
Some cars are hybrid and some are gaspowered. Which type of car gets better fuel
mileage on average?
Types of T Tests: Paired Samples T
Use when you have a within-subjects design –
each subject experiences all levels/conditions of
the IV; observations are paired/dependent/matched
•
•
•
Each person gets both the drug and the placebo at
different times. Which relieves pain better on average?
Each plant is exposed to 0 hrs of artificial light at one
time and 3 hours at another time. Which exposure time
causes more blooms on average?
Cars are filled with Fuel A at one time and Fuel B at
another time. Which fuel gets better mpg on average?
Hypothesis Testing Using T Tests
The t test allows a scientist to determine
whether their research hypothesis is
supported
Null hypothesis H0:
• The IV does not influence the DV
• Any differences in average scores between
the different conditions are probably just due
to chance (measurement error, random
sampling error)
Hypothesis Testing Using T Tests
Research hypothesis HA:
• The IV does influence the DV
• The differences in average scores
between the different conditions are
probably not due to chance but show a
real effect of the IV on the DV
Teaching tip: Very important for students to
be able to understand and state the
research question so that they see that
statistics is a tool to answer that question
Hypothesis Testing Using T Tests
Null hypothesis: Average pain relief is the same whether people
have Drug A or a placebo
Research hypothesis: Drug A brings better pain relief on average
than the placebo (Upper tail test)
Null hypothesis: Plants exposed to 3 hours of artificial light have
the same number of blooms on average as plants not exposed
to artificial light
Research hypothesis: Plants exposed to 3 hours of artificial light
have a different number of blooms on average than plants not
exposed to artificial light (Two tailed test)
Null hypothesis: Cars filled with Fuel A get the same mpg as Fuel
B on average
Research hypothesis: Cars filled with Fuel A get less mpg than
Fuel B on average (Lower tail test)
Hypothesis Testing Using T Tests
Do the data/evidence support the research
hypothesis or not?
Did the IV really influence the DV, or are the
obtained differences in averages between
conditions just due to chance?
Teaching tip: To increase your students’
understanding, you should be explicit about
what researchers mean by “just by
chance”…
Hypothesis Testing Using T Tests
p value = probability of results being due to chance
When the p value is high (p > .05), the obtained
difference is probably due to chance
.99 .75 .55 .25 .15 .10 .07
When the p value is low (p < .05), the obtained
difference is probably NOT due to chance and more
likely reflects a real influence of the IV on DV
.04 .03 .02 .01 .001
Hypothesis Testing Using T Tests
p value = probability of results being due to chance
[Probability of observing your data (or more severe) if H0 were
true]
When the p value is high (p > .05), the obtained difference is
probably due to chance
[Data likely if H0 were true]
.99 .75 .55 .25 .15 .10 .07
When the p value is low (p < .05), the obtained difference is
probably NOT due to chance and more likely reflects a real
influence of the IV on DV
[Data unlikely if H0 were true, so data support HA]
.04 .03 .02 .01 .001
Hypothesis Testing Using T Tests
In science, a p value of .05 is a conventionally
accepted cutoff point for saying when a result is
more likely due to chance or more likely due to a
real effect
Not significant = the obtained difference is probably
due to chance; the IV does not appear to have a
real influence on the DV; p > .05
Statistically significant = the obtained difference is
probably NOT due to chance and is likely due to a
real influence of the IV on DV; p < .05
Types of T Tests
All 3 t tests in essence answer the same
question:
• Is the research hypothesis supported?
• In other words, did the IV really influence
the DV, or are the obtained differences
between conditions just due to chance?
One sample t test
Independent samples t test
Paired samples t test
The Essence of a T Test
They answer this question by calculating a t score
The t score basically examines how large the
difference between the average score in each
condition is, relative to how far spread out you
would expect scores to be just based on chance
(i.e., if there really was no effect of the IV on the
DV)
It is of course more sophisticated than that but
students will have to wait for statistics courses to
really get the low down on how it all works
The Essence of a T Test
This can be taught without formulas but for the sake
of thoroughness…
All formulas take the basic format:
Difference between means
Standard Error (SE)
Remember: When you sample from a population,
you won’t get the same exact mean every time.
SE = how far you would expect sample means
from the same population to vary due to chance
Formulas for T Tests
One-sample t test: difference between
sample mean and population mean
divided by estimated standard error of
the mean
Independent samples t test: difference
between sample means divided by
standard error of the difference
between means
Related samples t test: mean of the
difference scores – expected mean
difference divided by standard error of
the mean difference scores
t
MX  
SM
t
(M X  M Y )  ( X   Y )
SD M X  M Y
t
MD  D
SD M D
The Essence of a T Test
Each t test gives you a t score, which can be positive or
negative; It’s the absolute value that matters
The bigger the |t| score, the less likely the difference
between conditions is just due to chance
The bigger the |t| score, the more likely the difference
between conditions is due to a real effect of the IV on
the DV
So big values of |t| will be associated with small p values that
indicate the differences are significant (p < .05)
Little values of |t| (i.e., close to 0) will be associated with
larger p values that indicate the differences are not
significant (p > .05)
Each t test will also have a particular value for degrees of freedom (df), which
is based on the sample size (N)
Excel will calculate this for you, based on the sample size. The test statistic
and df are both needed to calculate the p-value.
Test
df formula
What it measures
One sample t
N–1
total number of subjects minus 1
Independent
samples t
(Pooled)
(N1 – 1)+(N2 – 1)
(number of subjects in Cond. 1)
minus 1 + (number of subjects in
Cond. 2) minus 1
Paired samples t
N–1
total number of subjects minus 1
* Remember, they are the same
subjects in both conditions so do not
count them twice
What Software to Use to Analyze Your Data?
If you have raw data, you can use Excel or SPSS
If the data has already been summarized (e.g.,
you do not have the raw data but just have
group means and SDs), you need to use Excel
Running the One-Sample T Test Using Excel
One Sample t Test: Use when you have one sample
and you want to compare it to a known population
Need:
• Descriptive statistics: Sample M and SD
• Known population mean (to compare sample to)
To run: Open Excel file “SFE Statistical Tests” and go
to page called One-sample t test
Enter population mean (mu or ), sample size (N),
sample mean (M), and standard deviation (SD)
Output: Computer calculates t value, df, and p value
Running the Independent Samples T Test Using Excel
Independent samples t Test: Use when you have a betweensubjects design
Need:
• Descriptive statistics: M, SD, and N for each condition
To run: Open Excel file “SFE Statistical Tests” and go to page
called Independent samples t test
Pooled: If population SDs are equal
Not Pooled: If population SDs are not equal
Enter for each condition the sample size (N), Mean (M), and
standard deviation (SD)
Output: Computer calculates t value, df, and p value
Running the Paired Samples T Test Using Excel
Paired samples t Test: Use when you have a withinsubjects design
Need:
Descriptive statistics: Difference scores for the paired
data (X-Y); M and SD for the difference scores
To run: Open Excel file “SFE Statistical Tests” and go
to page called Paired samples t test
Enter the sample size (N), Mean (M) and standard
deviation (SD) of the difference scores
Output: Computer calculates t value, df, and p value
Running the One-Sample T Test Using SPSS
One Sample t Test: Use when you have one sample
and you want to compare it to a known population
Need:
• Raw numerical data in a column
• Known population mean (to compare sample to)
To run: Open SPSS; Analyze  Compare means 
One sample t test; send the column with your raw
data into the “Test variable” box; Under “Test value”
enter the known population mean, hit Ok
Output: Computer calculates t value, df, and p value
Running the Independent Samples T Test Using SPSS
Independent samples t Test: Use when you have a betweensubjects design
Need:
Raw numerical data set up in two columns (IV; DV); make sure
you use value labels under IV to indicate what the #s stand for (e.g.,
1=male, 2=female)
To run:
Analyze  compare means  independent samples t test
Test var: DV
Grouping var: IV (condition)
Define groups Group 1: 1 Group 2: 2, ok
Output: Computer calculates t value, df, and p value
Running the Paired Samples T Test Using SPSS
Paired samples t Test: Use when you have a withinsubjects design
Need:
Set up data file as two columns listing raw numerical data:
Cond 1, Cond 2 [but give meaningful names to]
To run:
Analyze  compare means  paired samples t test; send
pair over
Output: Computer calculates t value, df, and p value
Interpreting T Tests
Cardinal rule: Scientists do not say “prove”! Conclusions are
based on probability (likely due to chance, likely a real
effect…). Be explicit about this to your students.
Based on p value, determine whether you have evidence to
conclude the difference was probably real or was probably
due to chance: Is the research hypothesis supported?
p < .05: Significant
•
Reject null hypothesis and support research hypothesis (the
difference was probably real; the IV likely influences the DV)
p > .05: Not significant
•
Retain null hypothesis and reject research hypothesis (any
difference was probably due to chance; the IV did not influence
the DV)
Teaching Tips
Students have trouble understanding what is less than .05 and what is
greater, so a little redundancy will go a long way!
Whenever you say “p is less than point oh-five” also say, “so the probability
that this is due to chance is less than 5%, so it’s probably a real effect.”
Whenever you say “p is greater than point oh-five” also say, “so the
probability that this is due to chance is greater than 5%, so there’s just not
enough evidence to conclude that it’s a real effect – these 2 conditions are
not really different”
In other words, read the p value as a percentage, as odds, “the odds that this
difference is due to chance are 1%, so it’s probably not chance…”
Relate your phrasing back to the IV and DV: “So the IV likely caused changes
in the DV…”; “The IV worked – these 2 conditions are different”
Deciding to Use One-tailed or Two-tailed T Tests
Different disciplines have different conventions
The conservative “rule” in some disciplines is to use a
two-tailed test, regardless of whether you had a
directional or nondirectional hypothesis
• Directional: You expect scores in one condition to be higher than the other
• Nondirectional: You expect a difference between conditions but do not
specify ahead of time whether A should be greater than B or less than B
If using two-tailed, instruct your students to look for the
p value on the line that says “two tailed” in the Excel
worksheet
Deciding to Use One-tailed or Two-tailed T Tests
Example: For the two sample tests (both paired and
independent tests)
Null Hypothesis: 1 = 2
Research Hypothesis:
1  2 (Two tail test) or
1 > 2 (Upper tail test) or
1 < 2 (Lower tail test)
Where 1 is the mean of the first population and2 is
the mean of the second population
Reporting T Tests Results
State key findings in understandable
sentences
Use descriptive and inferential statistics to
supplement verbal description by putting
them in parentheses and at the end of
the sentence
Use a table and/or figure to illustrate
findings
Reporting T Tests Results
Step 1: Write a sentence that clearly indicates what statistical
analysis you used
A [type of t test] t test was conducted to determine whether
[name of DV] varied as a function of [name of IV or name of
conditions]
A one-sample t test was conducted to determine whether the pollen count
varied as a function of location (on campus or in the state of CT in general)
An independent samples t test was conducted to determine whether people’s
pulse rates varied as a function of their weight (obese people or
underweight).
A paired samples t test was conducted to determine whether calories
consumed by rats varied as a function of group size (rats tested alone,
tested in groups).
Reporting T Tests Results
Step 2: Write a sentence that clearly indicates what pattern you saw in your
data analysis – Did the conditions differ, and if so, how (i.e., which
condition scored higher or lower?)
Average [Name of DV] was significantly higher/lower for [name of Condition
1] than the average for [name of Condition 2]
[Name of DV] was significantly higher/lower on average for [name of
Condition 1] than for [name of Condition 2]
[Name of DV] did not significantly differ between [name of Condition 1] and
[name of Condition 2] on average
The pollen count was significantly lower on average on college campuses
than in the state of CT in general. (Lower tailed test)
Pulse rates did not significantly differ on average whether people were
obese or underweight. (Two tailed test)
The number of calories on average consumed by rats was significantly
higher when the rats ate alone than when they ate in groups. (Upper
tailed test)
Teaching Tip
The key is to emphasize that these should be nice,
easy-to-understand grammatical sentences that do
not sound like “Me Tarzan, you Jane!”
You may want your students to explicitly note whether
the research hypothesis was supported or not.
“Results supported the hypothesis that increased
dosages of the drug would reduce the average
number of white blood cells…”
Reporting T Tests Results
Step 3: Tack the descriptive and inferential statistics onto the
sentence
• Put Ms and SDs in parentheses after the name of the
condition. (Possibly include Ns for two sample tests.)
• Put the t test results at the end of the sentence using this
format: t(df) = x.xx, p = .xx
The pollen count was significantly lower on average on college campuses
(M=8.35, SD=2.25) than in the state of CT in general (=9.85), t(12)=
-2.40, p = .02.
Pulse rates did not significantly differ on average whether people were obese
(M=95 beats per minute, SD=21, N = 24) or underweight (M=91 beats per
minute, SD=18, N = 23), t(45)= 0.70, p = .49.
The number of calories on average consumed by rats was significantly higher
when the rats ate alone (M=100.34, SD = 12.64) than when they ate in
groups (M=87.65, SD = 13.43), t(22) = 2.38, p = .01.
Reporting T Tests Results for Nonsignificant Difference
Teaching tip: Make sure your students understand that when the
difference is not significant, they should NOT word their
sentences to imply that there is a difference
Not significant = no difference
One mean will no doubt be higher than the other, but if it’s not a
significant difference, then the difference is probably not real,
so do not interpret a direction (Saying “this is higher but no it’s
really not” is silly)
Option 1: Simply say there was no significant difference
Pulse rates did not significantly differ on average whether people were obese (M=95
beats per minute, SD=21, N = 24) or underweight (M=91 beats per minute, SD=18, N
= 23), t(45)= 0.70, p = .49.
Option 2: Word the sentence so that it is clear that the
hypothesized difference was not obtained
Counter to hypotheses, pulse rates were not significantly higher on average for people
who were obese (M=95 beats per minute, SD=21, N = 24) than for people who were
underweight (M=91 beats per minute, SD=18, N = 23), t(45)= 0.70, p = .49.
Reporting T Tests Results
•
Be sure that you note the unit of measure for the DV
(miles per gallon, volts, seconds, #, %). Be very
specific
• If using a Table or Figure showing M & SDs or SEs,
you do not necessarily have to include those
descriptive statistics in your sentences
• If you have more than 2 conditions, run t tests for
each pair of conditions you want to compare (e.g.,
Cond. 1 vs. 2, Cond 1 vs. 3, Cond. 2 vs. 3)
More Teaching Tips
You can ask your students to report either:
• the exact p value (p = .03, p = .45)
• the cutoff: say either p < .05 (significant) or p > .05 (not
significant)
You should specify which style you expect. Ambiguity confuses
them!
Tell students they can only use the word “significant” only when
they mean it (i.e., the probability the results are due to chance
is less than 5%) and to not use it with adjectives (i.e., they
often mistakenly think one test can be “more significant” or
“less significant” than another). Emphasize that “significant” is
a cutoff that is either met or not met -- Just like you are either
found guilty or not guilty, pregnant or not pregnant. There are
no gradients. Lower p values = less likelihood result is due to
chance, not “more significant”
Effect Size for T Test
When a result is found to be significant
(p < .05), many researchers report the
effect size as well
Effect size = How large the difference in
scores was
Significant = Was there a real difference or
not?
Effect Size for T Test
Effect size: How much did the IV influence the DV?
Cohen’s d = |mean difference|
SD
Small
0 - .20
Medium
.21 - .80
Large
> .80
Reporting T Tests Results
****THIS STEP IS OPTIONAL***
Step 4: Report the effect size if the t test was significant
After the t test results are reported, say whether it was a small,
medium, or large effect size, and report Cohen’s d
The pollen count was significantly lower on average on college campuses
(M=8.35, SD=2.25) than in the state of CT in general (=9.85), t(12)= -2.40,
p = .02, and the effect size was medium, Cohen’s d = .67
Pulse rates did not significantly differ on average whether people were obese
(M=95 beats per minute, SD=21, N = 24) or underweight (M=91 beats per
minute, SD=18, N = 23), t(45)=.70, p = .49. No reason to report effect size
because t test was not significant.
The number of calories on average consumed by rats was significantly higher
when the rats ate alone (M=100.34, SD = 12.64) than when they ate in
groups (M=87.65, SD = 13.43), t(22) = 2.38, p = .01, and the effect size was
large, Cohen’s d = .97.
Check Assumptions for T Test
• Numerical scale (interval or ratio) for DV
• The distribution of scores in each sample is
approximately symmetrical (normal) thus
the mean is an appropriate measure of
central tendency
• If a distribution is somewhat skewed (not
symmetric) it is still acceptable to run the
the test as long as the sample size is not
too small (say, N > 30)
Time to Practice
• Running and reporting t tests
Download