Correlations

Correlations and T-tests
Matching level of measurement to
statistical procedures
We can match statistical methods to the level of
measurement of the two variables that we want to
assess:
Level of
Measurement   Nominal          Ordinal      Interval                  Ratio
Nominal       Chi-square       Chi-square   T-test, ANOVA             T-test, ANOVA
Ordinal       Chi-square       Chi-square   ANOVA                     ANOVA
Interval      T-test, ANOVA    ANOVA        Correlation, Regression   Correlation, Regression
Ratio         T-test, ANOVA    ANOVA        Correlation, Regression   Correlation, Regression
However, we should only use these tests when:
The interval or ratio level variable has a normal distribution.
The dependent variable (for Correlation, T-test, ANOVA, and Regression) is interval or ratio.
Our sample has been randomly selected or the data cover an entire population.
Interpreting a Correlation from an SPSS Printout
Correlations
                                                  Educational      Beginning
                                                  Level (years)    Salary
Educational Level (years)   Pearson Correlation   1                .633**
                            Sig. (2-tailed)       .                .000
                            N                     474              474
Beginning Salary            Pearson Correlation   .633**           1
                            Sig. (2-tailed)       .000             .
                            N                     474              474
**. Correlation is significant at the 0.01 level (2-tailed).
A correlation:
Is an association between two interval or ratio variables.
Can be positive or negative.
Measures the strength of the association between the two variables and whether it is large enough to be statistically significant.
Can range from -1.00 to +1.00, with 0.00 indicating no association.
Example: Types of Relationships
Positive                    Negative                    No Relationship
Income ($)  Education (yrs) Income ($)  Education (yrs) Income ($)  Education (yrs)
20,000      10              20,000      18              20,000      14
30,000      12              30,000      16              30,000      18
40,000      14              40,000      14              40,000      10
50,000      16              50,000      12              50,000      12
75,000      18              75,000      10              75,000      16
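As a rough check on the three example data sets above, the Pearson correlation can be computed by hand. The sketch below is not part of the original slides; the function name pearson_r and the variable names are illustrative choices, and the calculation uses the standard definition of r (sum of cross-products divided by the square root of the product of the sums of squares).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Income is the same in all three examples; only education changes
income = [20000, 30000, 40000, 50000, 75000]
education_positive = [10, 12, 14, 16, 18]
education_negative = [18, 16, 14, 12, 10]
education_none = [14, 18, 10, 12, 16]

r_positive = pearson_r(income, education_positive)
r_negative = pearson_r(income, education_negative)
r_none = pearson_r(income, education_none)

print(f"positive: {r_positive:.2f}")
print(f"negative: {r_negative:.2f}")
print(f"no relationship: {r_none:.2f}")
```

With these five cases per example, r comes out at roughly .97 for the positive example, roughly -.97 for the negative example, and close to zero (about .04) for the no-relationship example.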
The stronger the correlation, the closer it will be to 1.00 or -1.00. Weak correlations, whether positive or negative, will be close to 0.00.
You can see the degree of correlation (association) by using a scatterplot graph
[Scatterplot; horizontal axis: Current Salary]
Looking at a scatterplot of current and beginning salary from the same data set, we can see a stronger correlation
[Scatterplot of beginning salary against current salary; horizontal axis: Current Salary]
If we run the correlation between these two variables in SPSS, we find
Correlations
                                         Beginning    Current
                                         Salary       Salary
Beginning Salary   Pearson Correlation   1            .880**
                   Sig. (2-tailed)       .            .000
                   N                     474          474
Current Salary     Pearson Correlation   .880**       1
                   Sig. (2-tailed)       .000         .
                   N                     474          474
**. Correlation is significant at the 0.01 level (2-tailed).
For these two variables, suppose we test a hypothesis at a significance level of .01.
Alternative Hypothesis:
There is a positive association between beginning and current salary.
Null Hypothesis:
There is no association between beginning and current salary.
Decision: r (correlation) = .88, p = .000 (as printed by SPSS, meaning p is less than .001).
.000 is less than .01.
We reject the null hypothesis and accept the alternative hypothesis!
(Bonus Question): Why would we expect the previous correlation to be statistically significant below the p = .01 level?
Answer: This is a large data set (N = 474), which makes it likely that if there is a correlation, it will be statistically significant at a low significance (p) level.
Larger data sets are less likely to be affected by sampling or random error!
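To illustrate why sample size matters, a correlation can be converted to a t value with the standard formula t = r * sqrt(N - 2) / sqrt(1 - r^2), tested on N - 2 degrees of freedom. This formula does not appear in the slides; the sketch below assumes it, and the function name t_from_r and the comparison values (r = .20, N = 10) are illustrative.

```python
import math

def t_from_r(r, n):
    """t value for testing a Pearson correlation r from a sample of size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# The salary correlation from the printout: r = .88, N = 474
t_salary = t_from_r(0.88, 474)

# A hypothetical weak correlation at two different sample sizes
t_weak_large_sample = t_from_r(0.20, 474)
t_weak_small_sample = t_from_r(0.20, 10)

print(f"r = .88, N = 474: t = {t_salary:.2f}")
print(f"r = .20, N = 474: t = {t_weak_large_sample:.2f}")
print(f"r = .20, N = 10:  t = {t_weak_small_sample:.2f}")
```

The same r produces a much larger t value when N is large, which is why even a weak correlation can reach significance in a data set of 474 cases while the same correlation in 10 cases would not.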
Other important information on correlation
Correlation does not tell us if one variable “causes” the other, so there really isn’t an independent or dependent variable.
With correlation, you should be able to draw a straight line between the highest and lowest point in the distribution. Points that are off the “best fit” line indicate that the correlation is less than perfect (-1 or +1).
Regression is the statistical method that allows us to determine whether the value of one interval/ratio level variable can be used to predict or determine the value of another.
Another measure of association is a t-test.
T-tests
Measure the association between a nominal level variable and an interval or ratio level variable.
A t-test looks at whether the nominal level variable “causes” a change in the interval/ratio variable.
Therefore the nominal level variable is always the independent variable and the interval/ratio variable is always the dependent variable.
Example of t-test: Self-Esteem Scores
        Men       Women
        32        34
        44        18
        56        52
        18        16
        21        33
        39        26
        25        35
        28        20
Mean    32.875    29.25
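The self-esteem scores above can be run through an independent samples t-test by hand. The sketch below is not from the slides: it assumes the equal-variances (pooled) form of the test, and the function name pooled_t is an illustrative choice.

```python
import math
from statistics import mean, variance

men = [32, 44, 56, 18, 21, 39, 25, 28]
women = [34, 18, 52, 16, 33, 26, 35, 20]

def pooled_t(a, b):
    """Independent samples t value and degrees of freedom, equal variances assumed."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_error, df

t_value, df = pooled_t(men, women)

print(f"mean (men) = {mean(men)}, mean (women) = {mean(women)}")
print(f"t = {t_value:.2f}, df = {df}")
```

The means match the slide (32.875 and 29.25). The t value is about 0.59 on 14 degrees of freedom, well below the two-tailed .05 critical value of about 2.145 for 14 degrees of freedom, so a difference of this size in such a small sample would not be statistically significant.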
Important things to know about an independent samples t-test
It can only be used when the nominal variable has only two categories.
Most often the nominal variable pertains to membership in a specific demographic group or a sample.
The association examined by the independent samples t-test is whether the mean of the interval/ratio variable differs significantly between the two groups. If it does, that means that group membership “causes” the change or difference in the mean score.
Looking at the difference in means between the two groups, can we tell if the difference is large enough to be statistically significant?
Group Statistics
                   Gender   N     Mean       Std. Deviation   Std. Error Mean
Beginning Salary   Male     258   $20301.4   *********        $567.275
                   Female   216   $13092.0   *********        $199.742
T-test results
Independent Samples Test
Beginning Salary
Levene's Test for Equality of Variances: F = 105.969, Sig. = .000
t-test for Equality of Means
                                                Sig.         Mean         Std. Error   95% Confidence Interval of the Difference
                              t        df       (2-tailed)   Difference   Difference   Lower      Upper
Equal variances assumed       11.152   472      .000         $7,209.43    $646.447     $5939.16   $8479.70
Equal variances not assumed   11.987   318.818  .000         $7,209.43    $601.413     $6026.19   $8392.67
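The "equal variances not assumed" row can be reproduced from the Group Statistics values alone, which shows how the pieces of the printout fit together. This is a sketch rather than SPSS's own procedure: it assumes the standard unequal-variances calculation, in which the standard error of the difference is the square root of the sum of the two squared standard errors. The variable names are illustrative.

```python
import math

# Values read from the Group Statistics printout
mean_male, se_male = 20301.4, 567.275
mean_female, se_female = 13092.0, 199.742

# Difference in group means
mean_diff = mean_male - mean_female

# Standard error of the difference when equal variances are not assumed
se_diff = math.sqrt(se_male ** 2 + se_female ** 2)

# t is the mean difference expressed in standard-error units
t_unequal = mean_diff / se_diff

print(f"mean difference = {mean_diff:.1f}")
print(f"std. error of difference = {se_diff:.3f}")
print(f"t = {t_unequal:.3f}")
```

The results agree with the printout (standard error of about $601.41 and t of about 11.99), apart from rounding in the printed means.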
Positive and Negative t-tests
Your t value will be positive when the lowest value category (the 1 in 1,2 or the 0 in 0,1) is entered into the grouping menu first and the mean of that first group is higher than the mean of the second group.
Your t value will be negative when the lowest value category is entered into the grouping menu first and the mean of the second group is higher than the mean of the first group.
Paired Samples T-Test
Used when respondents have taken both a pre-test and a post-test using the same measurement tool (usually a standardized test).
Supplements the result obtained when the mean post-test score for all the respondents is subtracted from the mean pre-test score. If there is a change in the scores from the pre-test to the post-test, it usually means that the intervention is effective.
A statistically significant paired samples t-test usually means that the change between pre-test and post-test scores is large enough that it cannot simply be due to random or sampling error.
An important condition here is that the change in pre-test and post-test scores must be in the direction (positive/negative) specified in the hypothesis.
Paired samples t-test (continued)
For example, if our hypothesis states that participation in the welfare reform experiment is associated with a positive change in welfare recipient wages from work, and participation in the experiment actually decreased wages, then our hypothesis would not be confirmed. We would accept the null hypothesis and reject the alternative hypothesis.
Pre-test wages: Mean = $400 per month for each participant.
Post-test wages: Mean = $350 per month for each participant.
However, we need to know the t-test value to know if the difference in means is large enough to be statistically significant.
What are the alternative and null hypotheses for this study?
Let’s test a hypothesis for an independent t-test
We want to know if women have higher scores on a test of exam-related anxiety than men.
The researcher has set the significance level for this study at p = .05.
On the SPSS printout, t = 2.6, p = .03.
What are the alternative and null hypotheses?
Do we accept or reject the null hypothesis?
Answer
Alternative hypothesis: Women have higher levels of exam-related anxiety than men as measured by a standardized test.
Null hypothesis: There will be no difference between men and women on the standardized test of exam-related anxiety.
Reject the null hypothesis (p = .03 is less than the significance level of .05). Accept the alternative hypothesis. There is a relationship.
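The decision rule used in the worked examples (compare the p value on the printout with the cutoff set by the researcher) can be written as a small function. This is a minimal sketch; the function name decide and the wording of the returned strings are illustrative and not from the slides.

```python
def decide(p_value, alpha):
    """Decision about the null hypothesis given a p value and a cutoff alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Exam anxiety example: p = .03 against a cutoff of .05
anxiety_decision = decide(0.03, 0.05)

# The same p value would not pass a stricter cutoff of .01
strict_decision = decide(0.03, 0.01)

print(anxiety_decision)
print(strict_decision)
```

The second call shows that the same printout can lead to a different decision when the researcher sets a stricter cutoff in advance.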
Computing a Correlation
Select Analyze
Select Correlate
Select two or more variables and click add
Click OK
Computing an independent t-test
Select Analyze
Select Compare Means
Select Independent T-test
Select Test Variable (dependent variable; must be interval or ratio)
Select Grouping Variable (must be nominal, with only two categories)
Select numerical category for each group (usually group 1 = 1, group 2 = 2)
Click OK

Computing a paired sample t-test
Select Analyze
Select Compare Means
Select Paired Samples T-test
Highlight two interval/ratio variables (should be from pre-test and post-test)
Click on arrow
Click OK
Data from Paired Sample T-test
Paired Samples Statistics
                            Mean       N     Std. Deviation   Std. Error Mean
Pair 1   Current Salary     $34419.6   474   *********        $784.311
         Beginning Salary   $17016.1   474   *********        $361.510
More data from paired samples t-test
Paired Samples Test
Pair 1: Current Salary - Beginning Salary
Paired Differences
  Mean                                                 $17403.5
  Std. Deviation                                       *********
  Std. Error Mean                                      $496.732
  95% Confidence Interval of the Difference: Lower     $16427.4
  95% Confidence Interval of the Difference: Upper     $18379.6
t                                                      35.036
df                                                     473
Sig. (2-tailed)                                        .000
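The paired samples results above are internally consistent, and checking them by hand shows what each figure means. The sketch below uses only numbers from the two paired samples printouts; the variable names are illustrative, and the calculation assumes the usual definition of the paired t value as the mean difference divided by its standard error.

```python
# Values read from the paired samples printouts
mean_current = 34419.6
mean_beginning = 17016.1
se_of_difference = 496.732
ci_lower, ci_upper = 16427.4, 18379.6

# Mean of the paired differences equals the difference of the two means
mean_difference = mean_current - mean_beginning

# t is the mean difference expressed in standard-error units
t_paired = mean_difference / se_of_difference

# The confidence interval is the mean difference plus or minus a multiple
# of the standard error; recover that multiple from the printed limits
half_width = (ci_upper - ci_lower) / 2
multiplier = half_width / se_of_difference

print(f"mean difference = {mean_difference:.1f}")
print(f"t = {t_paired:.3f}")
print(f"CI multiplier = {multiplier:.3f}")
```

The mean difference of $17,403.5 and t of about 35.04 match the printout, and the multiplier of roughly 1.965 is the familiar critical t value for a 95% interval with 473 degrees of freedom.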
Analysis of Variance (ANOVA)
Is used when you want to compare means for three or more groups.
Assumes you have a normal distribution (random sample or population).
It can be used to determine causation.
It contains an independent variable that is nominal and a dependent variable that is interval/ratio.