rxy
• When two variables are correlated, we can predict a score on one variable from a score on the other
• The stronger the correlation, the more accurate our prediction will be
• We need a measure of the “strength” of a correlation
• We need a number that gets bigger when big numbers are paired with big numbers and small numbers are paired with small numbers
• We need a number that gets smaller when big numbers are paired with small numbers and small numbers are paired with big numbers

rxy
• Remember the height/weight example:
• A big number indicates this (strong positive correlation):
[Figure: height (5' to 5'10") plotted against weight (100 to 150) for six people, a to f; the order by height matches the order by weight]
• A small (large negative) number indicates this (strong negative correlation):
[Figure: the same six people, a to f; the order by height is the reverse of the order by weight]

rxy
• Two sets of scores, xi and yi
• What could we do? Multiply each pair and sum the products:
$$\sum_{i=1}^{n} x_i y_i$$
• When pairs are multiplied and the products are summed, the total is:
  – Greatest when big numbers are paired with big numbers and small numbers with small numbers
  – Least when small numbers are paired with big numbers and big numbers with small numbers

rxy
• Analogy: pair the counts 1, 2, and 3 with pennies, quarters, and loonies (Canadian $1 coins)
  – Most money: 3 loonies, 2 quarters, and 1 penny (big count with big coin)
  – Least money: 1 loonie, 2 quarters, and 3 pennies (big count with small coin)
• Because 3 × $1 + 2 × $0.25 + 1 × $0.01 = $3.51, which is more than 1 × $1 + 2 × $0.25 + 3 × $0.01 = $1.53

rxy
• But there's a problem: the sum of products is not a good measure, because its value depends on n AND on the size of the numbers
• Try dividing by n:
$$\frac{1}{n}\sum_{i=1}^{n} x_i y_i$$
• Still not so good: it no longer depends on n, but it still depends on the size of the x's and y's
• How about multiplying deviation scores, comparing each variable to its own mean:
$$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
• Better, but now the value depends on the spread of the data
• So standardize the scores, dividing each deviation by its standard deviation:
$$\frac{1}{n}\sum_{i=1}^{n} \frac{(x_i - \bar{x})}{S_x}\,\frac{(y_i - \bar{y})}{S_y}$$

rxy
• This measures the strength of the correlation:
$$r_{xy} = \frac{1}{n}\sum_{i=1}^{n} \frac{(x_i - \bar{x})}{S_x}\,\frac{(y_i - \bar{y})}{S_y} = \frac{1}{n}\sum_{i=1}^{n} z_{x_i} z_{y_i}$$
• Here Sx and Sy are standard deviations computed with n in the denominator, as in the other formulas
• rxy ranges from -1.0, indicating a perfect negative correlation, to +1.0, indicating a perfect positive correlation
• An rxy of zero indicates no linear correlation: the scores show no straight-line tendency with respect to each other (they could still be related in a curved way)

rxy
• rxy also has a geometric meaning
• Recall that the mean of the zx and zy distributions is zero and each z-score is a deviation from the mean
• Plotted on zx and zy axes, each point (zx, zy) lands in one of four quadrants:
  – I: zx and zy are both positive, so the cross-product zx·zy is positive
  – II: zx is negative and zy is positive, so the cross-product is negative
  – III: zx and zy are both negative, so the cross-product is positive
  – IV: zx is positive and zy is negative, so the cross-product is negative
• Thus if most points fall around a line with a positive (45 degree) slope (I and III), the cross-products will tend to be positive and so will rxy
• If most points fall around a line with a negative slope (II and IV), the cross-products will tend to be negative and so will rxy
• If the points are randomly scattered, the positive and negative cross-products cancel and rxy is near zero

Covariance
• A related measure of the relationship between scores on two different variables is the covariance:
$$S_{xy} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
• Notice that the variance ($S_x^2$) is the covariance between a variable and itself!
• Dividing the covariance by both standard deviations gives the correlation: $r_{xy} = S_{xy} / (S_x S_y)$
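As a rough illustration of the steps that lead to rxy, here is a minimal Python sketch that computes each candidate measure in turn. The height and weight numbers are hypothetical, chosen only to echo the height/weight example, and the function names are illustrative; standard deviations use n in the denominator to match Sx and Sy above. Rescaling the heights from inches to centimetres changes the sum of products and the covariance but leaves rxy unchanged.

```python
from math import sqrt

# Hypothetical data echoing the height/weight example (not from the slides).
heights = [60, 62, 64, 66, 68, 70]          # inches: 5' to 5'10"
weights = [100, 110, 120, 130, 140, 150]    # taller people are heavier
weights_reversed = weights[::-1]            # taller people are lighter


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation with n in the denominator, matching S_x."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def sum_of_products(xs, ys):
    """First attempt: depends on n and on the size of the numbers."""
    return sum(x * y for x, y in zip(xs, ys))


def mean_product(xs, ys):
    """Second attempt: divides out n, still depends on the size of the numbers."""
    return sum_of_products(xs, ys) / len(xs)


def covariance(xs, ys):
    """Mean cross-product of deviation scores (S_xy); depends on the spread."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """r_xy: mean cross-product of z-scores."""
    mx, my, sx, sy = mean(xs), mean(ys), sd(xs), sd(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / len(xs)


print(sum_of_products(heights, weights))            # 49100 (big with big)
print(sum_of_products(heights, weights_reversed))   # 48400 (big with small)
print(round(correlation(heights, weights), 6))           # 1.0
print(round(correlation(heights, weights_reversed), 6))  # -1.0

# Rescaling heights to centimetres changes the other measures, but not r_xy.
heights_cm = [h * 2.54 for h in heights]
print(mean_product(heights, weights), mean_product(heights_cm, weights))
print(covariance(heights, weights), covariance(heights_cm, weights))
print(round(correlation(heights_cm, weights), 6))        # 1.0
```

The two sums of products differ by only about 1.4% here because all the numbers are large and positive, which is the "size of the numbers" problem that the deviation and z-score steps remove.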
Regression
• If two variables are perfectly correlated (r = +1.0 or -1.0), then one can exactly predict a score on one variable given a score on the other
• For example: a university charges a $250 registration fee plus $100 per credit
• tuition = $100(x) + $250
  – where x is the number of credits
• Notice this is a linear relationship, an equation of the form y = ax + b:
  – a = $100/credit
  – b = $250
  – x = number of credits
• Tuition as a function of credit hours is a straight line
• There is a perfect correlation between credit hours and tuition
• You could predict the tuition required perfectly, given the number of credit hours

Next Time
• Regression: read chapter 8
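The tuition example from the Regression slides can be sketched in a few lines of Python. The $250 fee and $100/credit rate come from the slides; the list of credit loads and the helper names (`tuition`, `credits_for`, `correlation`) are hypothetical. Because tuition is an exact linear function of credits, the prediction works in both directions and rxy comes out as 1.0.

```python
from math import sqrt

FEE = 250         # registration fee in dollars (from the slide)
PER_CREDIT = 100  # dollars per credit (from the slide)


def tuition(credits):
    """y = a*x + b with a = $100/credit and b = $250."""
    return PER_CREDIT * credits + FEE


def credits_for(amount):
    """Invert the line: predict the number of credits from the tuition."""
    return (amount - FEE) / PER_CREDIT


def correlation(xs, ys):
    """r_xy as the mean cross-product of z-scores (n in the denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n


credits = list(range(3, 16, 3))   # hypothetical loads: 3, 6, 9, 12, 15
costs = [tuition(c) for c in credits]
print(costs)                                  # [550, 850, 1150, 1450, 1750]
print(credits_for(1150))                      # 9.0
print(round(correlation(credits, costs), 6))  # 1.0
```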