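Module 15: Correlation
Assignment: 15.2, 15.7
March 13, 2009

1 Correlation

Today we study the correlation between two random variables. The (Pearson) correlation coefficient is a number between -1 and 1 that measures how strongly two variables are linearly related and in which direction. Values near 1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little linear relationship.

PROC CORR produces simple descriptive statistics for each variable listed in the VAR statement, together with the correlation coefficient for every pair of those variables. For each pair it also reports a p-value for testing the null hypothesis that the true population correlation ρ equals 0.

Example 15.1 We want to see whether the test grades in the file Grades.dat are related to one another.

    filename datain 'Grades.dat';          /* point the fileref DATAIN at the raw data file */

    data one;
      infile datain;                       /* read records from Grades.dat */
      input id $ gender $ class quiz exam1 exam2 lab final;
    run;

    proc sort data=one;                    /* BY-group processing needs data sorted by gender */
      by gender;
    run;

    proc corr data=one;
      var exam1 exam2 final;               /* correlate the two exams and the final */
      by gender;                           /* separate analysis for each gender */
    run;

The output shows descriptive statistics for each variable, followed by the correlation coefficient (with its p-value) for each pair of variables, separately for each gender. The correlations between the exam scores appear to be positive.

Notice that we first used PROC SORT to sort the data by gender, because a BY statement requires the data to be sorted by the BY variable. We then added a BY gender statement after the VAR statement in PROC CORR so that the correlations are computed separately for each gender.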
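For reference, here are the standard formulas behind what PROC CORR reports by default: the Pearson correlation coefficient r for n paired observations (x_i, y_i), and the t statistic used for the p-value when testing H0: ρ = 0. The test assumes the data come from a bivariate normal population. The numbers in the last line (r = 0.6, n = 27) are hypothetical and are included only to show the arithmetic; they are not taken from the Grades.dat output.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2} \quad \text{under } H_0\colon \rho = 0

% hypothetical illustration: r = 0.6, n = 27
t = \frac{0.6\sqrt{25}}{\sqrt{1-0.36}} = \frac{0.6 \cdot 5}{0.8} = 3.75 \quad \text{on } 25 \text{ degrees of freedom}
```

A large |t| (equivalently, a small p-value) is evidence against ρ = 0. With a BY statement, PROC CORR computes r, t, and the p-value separately within each BY group.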