Topic 1. Introduction to correlational techniques
PSYC5000
Copyright © Anna Brown 2023
Correlational research
• We cannot do without it
– Experiments cannot be conducted for all research
questions in psychology
• For ethical, practical or ecological validity reasons
• Types of correlational studies
– Survey research
• Attitude to school and attendance
– Observational research
• Attendance and school grades
– Archival research
• Grades in primary school, 11+ exam, and performance in
secondary school
A correlational study
• Tests for relationships between variables
– Collect data from subjects and see if the variables are
related to each other
– See if the observed relationship is statistically significant
Scatter diagram
• Relationship between height and weight for
N=507 adults
– Linear
– Positive
– Strong
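Covariance
$$\mathrm{cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$$
• The degree to which two variables vary together
– Positive when high values of X tend to go with high values of Y; negative when high values of X tend to go with low values of Y
• For body data,
– Height: X̅ = 171
– Weight: Y̅ = 69
– N = 507, cov = 89.9
(Figure: scatter diagram with the deviations (X – X̅) and (Y – Y̅) marked)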
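The covariance formula can be sketched in a few lines of Python (used here only for illustration; the course itself uses R). This minimal version divides by N, as the formula above does; note that R's `cov()` divides by N − 1, so it gives a slightly larger value for small samples. The data are made up, not the N = 507 body data.

```python
def covariance(xs, ys):
    """Covariance as defined above: sum of cross-products of deviations, divided by N."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Each term (X - mean_X)(Y - mean_Y) is positive when both deviations share a sign
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Made-up heights and weights, for illustration only
heights = [160, 165, 170, 175, 180]
weights = [55, 60, 68, 72, 80]
print(covariance(heights, weights))
```

Reversing the order of one variable (high X paired with low Y) makes the covariance negative, which is the sign interpretation given above.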
Correlation
• The standard deviation (SD) of X is the standard unit of measurement for X
– For body data,
• Height SD = 9.4
• Weight SD = 13.3
$$SD_X = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
• Dividing the covariance by the units of measurement of both variables (the product of their SDs) gives a standardised measure of the degree of relationship: the correlation
$$r_{XY} = \frac{\mathrm{cov}_{XY}}{SD_X \, SD_Y}$$
• Pearson product-moment correlation coefficient
– For body data, r = 89.9 / (9.4 × 13.3) = 0.72
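A minimal Python sketch of the same computation, assuming N in every denominator as in the formulas above. Because the N's cancel, using N − 1 throughout would give exactly the same r; and because r is unit-free, rescaling a variable (for example cm to m) leaves it unchanged. The data are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = cov_XY / (SD_X * SD_Y), with N in every denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Made-up data, for illustration only
heights = [160, 165, 170, 175, 180]
weights = [55, 60, 68, 72, 80]
print(round(pearson_r(heights, weights), 3))

# Changing the unit of height (cm -> m) does not change r
print(round(pearson_r([h / 100 for h in heights], weights), 3))
```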
Interpreting a correlation coefficient
• Reflects the degree of linear relationship
between two variables
• Ranges from -1 to 1
– The sign (+ or - ) reflects the direction of the
relationship
– The absolute value reflects the strength of the
relationship
Significance of correlation
• The correlation coefficient is a descriptive
statistic
• We may want to test if rXY is statistically
significant
– To test if it is significantly different from zero
– Null hypothesis H0: The correlation between X
and Y in the population is zero
– We want to see how likely it is to observe a correlation at least as far from zero as rXY when we sample from that population
Sampling from the population
• We randomly sample from population with the
correlation zero, drawing samples of size N
• We compute and record rXY for each sample
(Figure: two example samples from this population, with rXY = .8 and rXY = .2)
Sampling distribution of r
• The sampled correlations form a distribution
("sampling distribution")
– If H0 is true, the mean of this distribution will be close to 0
– If N is small, the values will be widely spread (and narrowly spread if N is large)
• The SD of the sampling distribution is its "Standard Error"
• When X and Y are normally distributed and H0 is true,
– rXY divided by its estimated Standard Error, SE = √((1 − rXY²)/(N − 2)), follows Student's t distribution with N − 2 degrees of freedom
$$t = \frac{r_{XY}}{SE} = \frac{r_{XY}\sqrt{N-2}}{\sqrt{1 - r_{XY}^2}}$$
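The conversion from r to t can be sketched directly from the formula above; `r_to_t` is a hypothetical helper name, and the example uses the body-data values r = .72 and N = 507.

```python
import math


def r_to_t(r, n):
    """t statistic for H0: population correlation = 0, with df = n - 2.

    SE of r is estimated as sqrt((1 - r^2) / (n - 2)), so
    t = r / SE = r * sqrt(n - 2) / sqrt(1 - r^2).
    """
    if n < 3:
        raise ValueError("need at least 3 cases (df = n - 2 must be positive)")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    se = math.sqrt((1 - r ** 2) / (n - 2))
    return r / se, n - 2


t, df = r_to_t(0.72, 507)  # body data from the slides
print(f"t({df}) = {t:.2f}")
```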
Testing the significance of correlation
• The t-test gives the probability (p value) of obtaining a correlation at least as far from zero as rXY from a population with correlation zero
– If this probability is small enough, conventionally less than .05, we reject the Null hypothesis
– We then conclude that the correlation is significantly different from 0
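Python's standard library has no Student's t distribution, so this sketch obtains the two-tailed p value by integrating the t density numerically (Simpson's rule). It is an illustration of the logic only; in practice a statistics package does this, and in R `cor.test()` reports t, df and the p value for a Pearson correlation. The helper names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Log scale, because math.gamma overflows for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) under H0, via Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)  # t is symmetric, so both tails = 1 - 2 * area


def correlation_test(r, n, alpha=0.05):
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    p = two_tailed_p(t, df)
    return t, df, p, p < alpha


t, df, p, reject = correlation_test(0.72, 507)
print(f"t({df}) = {t:.2f}, p = {p:.3g}, reject H0: {reject}")
```

For the body data p is far below .001; values that tiny are beyond the numerical precision of this sketch and are simply reported as p < .001.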
Computing correlation in R
• Testing significance
Effect size
• The correlation coefficient might be significant,
but is it important?
• Effect size for correlation (Jacob Cohen, 1988)
– Small: ±.1
– Medium: ±.3
– Large: ±.5
• Large effect sizes in psychology are rare
• Small effect sizes might still be important
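A small hypothetical helper can attach Cohen's labels to a coefficient. Treating .1, .3 and .5 as lower cut-offs is an informal convention assumed for this sketch; Cohen offered them as rough reference points, not strict bins.

```python
def cohen_label(r):
    """Label |r| by Cohen's (1988) benchmarks, used here as lower cut-offs (an assumption)."""
    size = abs(r)  # the sign gives the direction, not the strength
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "below small"


for r in (0.72, -0.35, 0.12, 0.05):
    print(r, cohen_label(r))
```

A label says nothing about significance: a "small" r can be significant in a large sample, and a "large" r can fail to reach significance in a small one.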
Reporting correlation
• “Weight was significantly and positively related to height, r(505) = .72, p < .001.”
• You can report the degrees of freedom used for significance testing (df = N − 2): r(505). Otherwise, make it very clear what your sample size was.
• Always report the exact p value unless it is very small; below .001 you can write p < .001
• If reporting in APA style
– Do not put a 0 before the decimal point for statistics that cannot exceed 1, such as r and p
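The reporting rules above can be captured in a small hypothetical formatter; it assumes two decimals for r, three for an exact p, and df = N − 2.

```python
def apa_r(r, n, p):
    """Format a correlation as r(df) = .xx, p = .xxx (or p < .001), with df = n - 2."""

    def no_leading_zero(value, decimals):
        # APA: omit the 0 before the decimal point for statistics that cannot exceed 1
        return f"{value:.{decimals}f}".replace("0.", ".", 1)

    r_text = no_leading_zero(r, 2)
    p_text = "p < .001" if p < 0.001 else f"p = {no_leading_zero(p, 3)}"
    return f"r({n - 2}) = {r_text}, {p_text}"


print(apa_r(0.72, 507, 1e-80))
print(apa_r(-0.25, 52, 0.0734))
```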
Correlation with dichotomous variable
• Point-biserial correlation: Pearson's correlation
between an interval variable and a dichotomy
– The category codes determine the sign of correlation!
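Because the point-biserial correlation is just Pearson's formula applied to 0/1 codes, reversing the codes flips the sign but not the size. The scores and groups below are hypothetical.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: an interval score and a two-group variable
scores = [12, 15, 11, 18, 20, 17, 22, 19]
group_a = [0, 0, 0, 0, 1, 1, 1, 1]   # higher-scoring group coded 1
group_b = [1 - g for g in group_a]   # same groups, codes reversed

print(round(pearson_r(group_a, scores), 3))
print(round(pearson_r(group_b, scores), 3))  # same size, opposite sign
```

When reporting a point-biserial correlation, state which group was coded higher so the sign can be interpreted.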
Rank-order transformations
• So far, we assumed that our X and Y are interval
measures
• When at least one of X and Y is not a true equal-interval measure, we may consider ranks instead
– We rank order all values of X
– We take average rank of all tied values
X       Rank(X)
45      1
56      2.5
56      2.5
73      4
115     5
– Distribution of rank orders is uniform (no need to
assume normal distribution for X)
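The ranking rule (smallest value gets rank 1, ties share their average rank) can be sketched as below; it reproduces the Rank(X) column of the table. R's `rank()` uses the same averaging for ties by default.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upwards; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at sorted position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) cover ranks i+1..j+1; give each their mean
        shared = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


print(average_ranks([45, 56, 56, 73, 115]))
```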
Non-parametric correlations
• Correlation computed from the ranks of X and Y,
not the actual data
• Spearman’s rho (ρ)
– ‘Non-parametric’ correlation
– Use for data that are not equal-interval
– Less sensitive to outliers (next slide), so may be used for interval data with outliers too
• Kendall’s tau-b (τb)
– Should be used when there are many tied ranks and the sample is small (more accurate than Spearman’s rho in these conditions)
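A sketch of both coefficients, assuming the usual definitions: Spearman's rho as Pearson's r on average ranks, and Kendall's tau-b as (concordant − discordant pairs) divided by the geometric mean of the numbers of pairs not tied on X and not tied on Y. In R both are available through the `method` argument of `cor()` and `cor.test()`; the data here are made up.

```python
import math


def average_ranks(values):
    """Ranks from 1 upwards; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks instead of the raw values."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


def kendall_tau_b(xs, ys):
    """Kendall's tau-b: (concordant - discordant pairs), adjusted for ties."""
    def sign(v):
        return (v > 0) - (v < 0)

    num = untied_x = untied_y = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = sign(xs[i] - xs[j])
            dy = sign(ys[i] - ys[j])
            num += dx * dy       # +1 concordant, -1 discordant, 0 if tied on either
            untied_x += dx * dx  # counts pairs not tied on X
            untied_y += dy * dy  # counts pairs not tied on Y
    return num / math.sqrt(untied_x * untied_y)


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # perfectly monotone, but not linear
print(f"Pearson {pearson_r(x, y):.3f}, Spearman {spearman_rho(x, y):.3f}, "
      f"Kendall tau-b {kendall_tau_b(x, y):.3f}")
```

For a perfectly monotone but curved relationship both rank-based coefficients equal 1, while Pearson's r stays below 1 because the relationship is not linear.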
Influence of Outliers
• Subjects with very unusual values on one or both
measures, or unusual combination of measures
– may unduly influence the correlation coefficient
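A small hypothetical example of one case with an unusual combination of values (highest X, very low Y) added to ten well-behaved cases, comparing Pearson's r with Spearman's rho.

```python
import math


def average_ranks(values):
    """Ranks from 1 upwards; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: ten cases with a clear positive trend...
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
# ...plus one case with an unusual combination: highest X, very low Y
x_out = x + [11]
y_out = y + [-20]

print(f"without outlier: Pearson {pearson_r(x, y):.2f}, Spearman {spearman_rho(x, y):.2f}")
print(f"with outlier:    Pearson {pearson_r(x_out, y_out):.2f}, "
      f"Spearman {spearman_rho(x_out, y_out):.2f}")
```

Here a single case turns Pearson's r from about .94 to slightly negative, while Spearman's rho falls to about .45: still affected, but much less, because a rank cannot be more extreme than "last".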
Correlation and Restriction of range
• Verbal and numerical reasoning test scores correlate
positively
• If we pre-selected only employees with high test scores, we would restrict the range of values on which the correlation is computed, and the correlation would typically be weaker
– (Figure: all employees, r = .43; high scorers only, r = .28)
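Restriction of range can be simulated. This sketch assumes simulated verbal and numerical scores that are bivariate normal with a population correlation of .5 (not the data behind the figure), and keeps only cases scoring more than 1 SD above the mean on the verbal test; the seed and sample size are arbitrary.

```python
import math
import random


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(19)  # arbitrary fixed seed
rho = 0.5                # assumed population correlation for the simulation
verbal, numerical = [], []
for _ in range(5000):
    v = rng.gauss(0, 1)
    # numerical = rho * verbal + independent noise, so corr(verbal, numerical) = rho
    num_score = rho * v + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    verbal.append(v)
    numerical.append(num_score)

full_r = pearson_r(verbal, numerical)

# Pre-selection: keep only cases more than 1 SD above the mean on the verbal test
kept = [(v, s) for v, s in zip(verbal, numerical) if v > 1]
restricted_r = pearson_r([v for v, _ in kept], [s for _, s in kept])

print(f"full range: r = {full_r:.2f} (N = {len(verbal)})")
print(f"restricted: r = {restricted_r:.2f} (N = {len(kept)})")
```

The relationship in the population is unchanged; the restricted sample simply has much less variation on the selection variable, so the correlation computed within it is smaller.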
Recommended reading
• Field, Miles & Field (2012)
– Chapter 1 (section 1.6.1 – correlational methods);
– Chapter 6 “Correlation” (sections 6.1-6.5; 6.8; 6.9)
• Additional / optional
– Howell 8th edition - chapter 9 (sections 5 and 12);
chapter 10 (section 3)
Revision guide
• Understand the advantages and challenges of correlational research
• What is covariance; understand how to interpret the sign of the
covariance
• Pearson product-moment correlation coefficient; its range, sign and
interpretation
• Testing significance of the correlation coefficient (the Null hypothesis; the
sampling distribution; the degrees of freedom)
• Understand the difference between statistical significance and effect size
of correlation. Know correlation effect sizes (small, medium, large)
• Understand what the point-biserial correlation is and how it is computed.
• Understand the basic idea of Spearman’s rho and when it is used instead
of Pearson correlation. Also Kendall’s tau-b.
• Problems in interpretation of correlation: restriction of range, influence of
outliers.