Topic 1. Introduction to correlational techniques
PSYC5000 Copyright © Anna Brown 2023

Correlational research
• We cannot do without it
  – Experiments cannot be conducted for all research questions in psychology
    • For ethical, practical or ecological validity reasons
• Types of correlational studies
  – Survey research
    • Attitude to school and attendance
  – Observational research
    • Attendance and school grades
  – Archival research
    • Grades in primary school, 11+ exam, and performance in secondary school

A correlational study
• Tests for relationships between variables
  – Collect data from subjects and see if the variables are related to each other
  – See if the observed relationship is statistically significant

Scatter diagram
• Relationship between height and weight for N = 507 adults
  – Linear
  – Positive
  – Strong

Covariance
  $\mathrm{cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$
• The degree to which two variables vary together
• For body data, Height X̅ = 171, Weight Y̅ = 69, N = 507, cov = 89.9

Correlation
• Standard deviation of X is the standard unit of measurement
  $SD_X = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$
  – For body data,
    • Height SD = 13.3
    • Weight SD = 9.4
• Covariance divided by the units of measurement for both variables gives a standardised measure of the degree of relationship: correlation
  $r_{XY} = \frac{\mathrm{cov}_{XY}}{SD_X \, SD_Y}$
• Pearson product-moment correlation coefficient
  – For body data, r = 0.72

Interpreting a correlation coefficient
• Reflects the degree of linear relationship between two variables
• Ranges from -1 to 1
  – The sign (+ or -) reflects the direction of the relationship
  – The absolute value reflects the strength of the relationship

Significance of correlation
• The correlation coefficient is a descriptive statistic
• We may want to test if rXY is statistically significant
  – To test if it is significantly different from zero
  – Null hypothesis H0: The correlation between X and Y in the population is zero
  – We want to see how likely it is to observe correlation rXY if we sample from that population
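The covariance and correlation formulas from the Covariance and Correlation slides can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function names and the five height/weight pairs are hypothetical, and the denominator N is assumed from the slide formulas (using N - 1 throughout would give the same r, because the denominators cancel).

```python
import math

def covariance(x, y):
    """Mean cross-product of deviations from the means (denominator N)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def sd(x):
    """Standard deviation with denominator N, matching covariance() above."""
    n = len(x)
    mx = sum(x) / n
    return math.sqrt(sum((a - mx) ** 2 for a in x) / n)

def pearson_r(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / (sd(x) * sd(y))

# Hypothetical heights (cm) and weights (kg) for five people; NOT the body data.
height = [160, 165, 170, 175, 180]
weight = [55, 62, 66, 70, 80]

r = pearson_r(height, weight)
print(round(r, 3))

# Check using the summary values quoted for the body data:
# cov = 89.9, SDs = 13.3 and 9.4.
r_body = 89.9 / (13.3 * 9.4)
print(round(r_body, 2))  # 0.72
```

The last two lines only re-use the summary statistics quoted on the slides, so they confirm that the reported r = 0.72 follows from the reported covariance and standard deviations.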
Sampling from the population
• We randomly sample from a population with correlation zero, drawing samples of size N
• We compute and record rXY for each sample
[Figure: two example samples, with rXY = .8 and rXY = .2]

Sampling distribution of r
• The sampled correlations form a distribution ("sampling distribution")
  – If H0 is true, the mean of this distribution must be around 0
  – If N is small, we may have a wide spread of values (and a narrow spread if N is large)
• The SD of the sampling distribution is its "Standard Error"
• When X and Y are normally distributed,
  – rXY divided by its Standard Error follows Student’s t distribution with N-2 degrees of freedom

Testing the significance of correlation
• The t-test will provide the probability for rXY to emerge from a population with correlation zero
  – If this probability is too small, say less than p = .05, we have to reject the Null hypothesis
  – We then conclude that the correlation is significantly different from 0

Computing correlation in R
• Testing significance

Effect size
• The correlation coefficient might be significant, but is it important?
• Effect size for correlation (Jacob Cohen, 1988)
  – Small: +/- .1
  – Medium: +/- .3
  – Large: +/- .5
• Large effect sizes in psychology are rare
• Small effect sizes might still be important

Reporting correlation
• “Weight was significantly and positively related to height, r(505) = .72; p < .001.”
• You can report the degrees of freedom used for significance testing: r(505). Otherwise, make it very clear what your sample size was.
• Always report the exact p value if it is not too small; otherwise you can write p < .001
• If reporting in the APA style
  – Do not put 0 before the decimal point

Correlation with dichotomous variable
• Point-biserial correlation: Pearson's correlation between an interval variable and a dichotomy
  – The category codes determine the sign of correlation!
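The point about category codes can be illustrated with a short Python sketch. The scores and the 0/1 group codes below are hypothetical, and the helper names are assumptions made for this example. The sketch also computes the t statistic described earlier, taking the Standard Error of r to be sqrt((1 - r^2) / (N - 2)), the usual form under the zero-correlation null hypothesis; it stops short of a p value, which would need a t-distribution routine.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """r divided by its standard error, sqrt((1 - r^2) / (N - 2)); df = N - 2."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Hypothetical test scores for two groups coded 0 and 1.
score = [12, 15, 11, 18, 20, 17, 22, 19]
group = [0, 0, 0, 0, 1, 1, 1, 1]
flipped = [1 - g for g in group]  # same groups, codes swapped

r_pb = pearson_r(group, score)
r_flipped = pearson_r(flipped, score)

# Same magnitude, opposite sign: the coding decides the direction.
print(round(r_pb, 3), round(r_flipped, 3))
print(round(t_statistic(r_pb, len(score)), 2), "on", len(score) - 2, "df")
```

Because the sign depends only on which category receives the higher code, a report of a point-biserial correlation should state how the dichotomy was coded.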
Rank-order transformations
• So far, we assumed that our X and Y are interval measures
• When at least one, X or Y, is not a true equal-interval measure, we may consider ranks instead
  – We rank order all values of X
  – We take the average rank of all tied values

    X      Rank(X)
    45     1
    56     2.5
    56     2.5
    73     4
    115    5

  – Distribution of rank orders is uniform (no need to assume normal distribution for X)

Non-parametric correlations
• Correlation computed from the ranks of X and Y, not the actual data
• Spearman’s rho
  – ‘Non-parametric’ correlation
  – Use for data that is not equal-interval
  – Less sensitive to outliers* (next slide), so may be used for interval data with outliers too
• Kendall’s tau-b
  – Should be used when there are many tied ranks, and the sample is small (more accurate than Spearman’s rho in these conditions)

Influence of Outliers
• Subjects with very unusual values on one or both measures, or an unusual combination of measures
  – May unduly influence the correlation coefficient

Correlation and Restriction of range
• Verbal and numerical reasoning test scores correlate positively
• If we pre-selected employees with only high test scores, we would restrict the range of values on which correlation is computed...
[Figure: scatter plots with r = .43 and r = .28]

Recommended reading
• Field, Miles & Field (2012)
  – Chapter 1 (section 1.6.1: correlational methods);
  – Chapter 6 “Correlation” (sections 6.1-6.5; 6.8; 6.9)
• Additional / optional
  – Howell 8th edition: chapter 9 (sections 5 and 12); chapter 10 (section 3)

Revision guide
• Understand the advantages and challenges of correlational research
• What is covariance; understand how to interpret the sign of the covariance
• Pearson product-moment correlation coefficient; its range, sign and interpretation
• Testing significance of the correlation coefficient (the Null hypothesis; the sampling distribution; the degrees of freedom)
• Understand the difference between statistical significance and effect size of correlation.
  Know correlation effect sizes (small, medium, large)
• Understand what the point-biserial correlation is and how it is computed.
• Understand the basic idea of Spearman’s rho and when it is used instead of Pearson correlation. Also Kendall’s tau-b.
• Problems in interpretation of correlation: restriction of range, influence of outliers.
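As a revision aid for the rank-order transformation and Spearman’s rho, the ranking rule (tied values receive the average of their ranks) can be sketched in Python. The function names and the Y values are hypothetical; the X values are the five from the Rank-order transformations slide. Spearman’s rho is computed here simply as Pearson’s correlation applied to the ranks.

```python
import math

def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [45, 56, 56, 73, 115]   # values from the rank-order slide
y = [2, 5, 4, 9, 30]        # hypothetical second variable

print(average_ranks(x))     # [1.0, 2.5, 2.5, 4.0, 5.0]
print(round(spearman_rho(x, y), 3))
```

The extreme value 115 contributes only its rank (5) to rho, which is why rank-based correlations are less sensitive to outliers than Pearson’s r on the raw data.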