r xy

rxy
• When two variables are correlated, we
can predict a score on one variable from
a score on the other
• The stronger the correlation, the more
accurate our prediction will be
rxy
• We need a measure of the “strength” of
a correlation
rxy
• We need a number that gets bigger
when big numbers are paired with big
numbers and small numbers are paired
with small numbers
• We need a number that gets smaller
when big numbers are paired with small
numbers and small numbers are paired
with big numbers
rxy
• Remember the height/weight example:
• Big number indicates this (strong positive correlation)
[Figure: weight scale (100 to 150) above height scale (5’ to 5’10); individuals a to f appear in the same order (c, d, a, b/e, f) on both scales]
rxy
• Remember the height/weight example:
• Small number indicates this (strong negative
correlation)
[Figure: weight scale (100 to 150) above height scale (5’ to 5’10); the order c, d, a, b/e, f on the weight scale is reversed on the height scale]
rxy
• Two sets of scores, xi and yi
• What could we do?
rxy
• What could we do?
$$\sum_{i=1}^{n} x_i y_i$$
rxy
• What could we do?
• When pairs are multiplied and the
products are summed up:
– Greatest when big numbers paired with
big numbers and small numbers with small
numbers
– Least when small numbers are paired with
big numbers and big numbers are paired
with small numbers
rxy
• analogy: this pairing gets you the most money
[Figure: 1 penny, 2 quarters, 3 loonies]
rxy
• analogy: this pairing gets you the least
[Figure: 3 pennies, 2 quarters, 1 loonie]
rxy
• analogy:
Because:
3 x $1 plus 2 x $0.25 plus 1 x $0.01 (= $3.51)
is more than
1 x $1 plus 2 x $0.25 plus 3 x $0.01 (= $1.53)
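The coin analogy can be checked with a small sketch. The function name `sum_of_products` is made up for illustration, and the coin values are written in cents so the arithmetic stays in whole numbers.

```python
# Coin values in cents (integers avoid floating-point rounding).
LOONIE, QUARTER, PENNY = 100, 25, 1

def sum_of_products(counts, values):
    """Multiply each count by its paired value and add up the products."""
    return sum(c * v for c, v in zip(counts, values))

values = [LOONIE, QUARTER, PENNY]

most = sum_of_products([3, 2, 1], values)   # big counts paired with big values
least = sum_of_products([1, 2, 3], values)  # big counts paired with small values

print(most, least)  # 351 153
```

The same counts and the same coins give very different totals; only the pairing changed.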
rxy
• But there’s a problem
$$\sum_{i=1}^{n} x_i y_i$$
Not a good measure because
the value ultimately depends
on n AND the size of the
numbers
rxy
• Try this
$$\frac{\sum_{i=1}^{n} x_i y_i}{n}$$
Still not so good - doesn’t
depend on n anymore, but
does depend on size of x’s
and y’s
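A rough sketch of why dividing by n helps but is not enough. The scores below are invented for illustration and the helper names are not from the slides.

```python
def sum_products(xs, ys):
    """Sum of the paired products x_i * y_i."""
    return sum(x * y for x, y in zip(xs, ys))

def mean_product(xs, ys):
    """Sum of the paired products divided by n."""
    return sum_products(xs, ys) / len(xs)

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

# Repeating the data doubles n: the raw sum doubles, the mean product does not.
xs2, ys2 = xs * 2, ys * 2
print(sum_products(xs, ys), sum_products(xs2, ys2))  # 60 120
print(mean_product(xs, ys), mean_product(xs2, ys2))  # 15.0 15.0

# Same pattern of scores, but every x is 100 larger: the mean product jumps.
shifted_xs = [x + 100 for x in xs]
print(mean_product(shifted_xs, ys))  # 515.0
```

The relationship between x and y is identical in the last case, yet the number changes because the x's themselves are bigger.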
rxy
• How about multiplying deviation scores?
– each score is expressed relative to the mean of its own variable
$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$
rxy
• Multiply deviation scores
$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$
Now value depends on the
spread of the data
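As an illustration with invented scores, the mean product of deviation scores ignores a constant added to every score but still grows when the scores are stretched out. The function name `mean_deviation_product` is hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def mean_deviation_product(xs, ys):
    """Average of (x_i - mean x)(y_i - mean y), dividing by n as on the slide."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

base = mean_deviation_product(xs, ys)
shifted = mean_deviation_product([x + 100 for x in xs], ys)   # bigger numbers
stretched = mean_deviation_product([x * 10 for x in xs], ys)  # wider spread

print(base, shifted, stretched)  # 2.5 2.5 25.0
```

Shifting no longer matters, but multiplying the x's by 10 multiplies the result by 10, which is the dependence on spread noted above.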
rxy
• So standardize the scores
$$\frac{\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{S_x}\right)\left(\frac{y_i - \bar{y}}{S_y}\right)}{n}$$
rxy
• This measures strength of correlation:
$$r_{xy} = \frac{\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{S_x}\right)\left(\frac{y_i - \bar{y}}{S_y}\right)}{n} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n}$$
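A minimal sketch of the formula, assuming the standard deviations divide by n (not n - 1) so that they match the n in the denominator of rxy. The height and weight values are hypothetical, loosely modelled on the earlier example, and the helper names are not from the slides.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    """Standard deviation dividing by n, to match the slides' formula."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def z_scores(values):
    """Convert raw scores to z-scores: (score - mean) / SD."""
    m, s = mean(values), pop_sd(values)
    return [(v - m) / s for v in values]

def r_xy(xs, ys):
    """Mean of the products of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

heights = [60, 62, 64, 66, 68, 70]        # hypothetical heights in inches
weights = [100, 110, 120, 130, 140, 150]  # hypothetical weights

r_pos = r_xy(heights, weights)                         # big paired with big
r_neg = r_xy(heights, weights[::-1])                   # big paired with small
r_cm = r_xy([h * 2.54 for h in heights], weights)      # heights rescaled to cm

print(round(r_pos, 6), round(r_neg, 6), round(r_cm, 6))
```

Because every score is standardized first, changing the units of height leaves the result essentially unchanged (up to floating-point rounding).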
rxy
• rxy ranges from -1.0 indicating a perfect
negative correlation to +1.0 indicating a
perfect positive correlation
• an rxy of zero indicates no linear relationship:
knowing a score on one variable does not help
predict, with a straight line, the score on the other.
rxy
• rxy also has a geometric meaning
• Recall that the mean of the zx and zy
distributions is zero and each z-score is
a deviation from the mean
rxy
• Each point lands in one of four
quadrants
[Figure: scatterplot with zx on the horizontal axis and zy on the vertical axis; a sample point (zx, zy) is marked]
rxy
• notice that:
$$r_{xy} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n}$$
Quadrant I: both zx and zy are positive, so the cross-product is positive
rxy
• notice that:
$$r_{xy} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n}$$
Quadrant II: zx is negative and zy is positive, so the cross-product is negative
rxy
• notice that:
$$r_{xy} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n}$$
Quadrant III: zx is negative and zy is negative, so the cross-product is positive
rxy
• notice that:
$$r_{xy} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n}$$
Quadrant IV: zx is positive and zy is negative, so the cross-product is negative
rxy
• So
[Figure: the zx, zy plane divided into quadrants I (upper right), II (upper left), III (lower left) and IV (lower right)]
Thus if most points tend to fall
around a line with a positive (45
degree) slope (I and III), the
cross-products will tend to be
positive
If most points tend to fall around
a line with a negative slope (II
and IV), the cross-products will
tend to be negative
rxy
• So
If the points were randomly
scattered about, the negative
and positive cross-products
tend to cancel and rxy is close to zero
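The quadrant argument can be sketched with one made-up point per quadrant. The z-score pairs below are invented, and the `quadrant` helper is a hypothetical name using the usual counter-clockwise numbering.

```python
def quadrant(zx, zy):
    """Return the quadrant label for a (zx, zy) point that is not on an axis."""
    if zx > 0 and zy > 0:
        return "I"
    if zx < 0 and zy > 0:
        return "II"
    if zx < 0 and zy < 0:
        return "III"
    if zx > 0 and zy < 0:
        return "IV"
    return "on an axis"

# One invented point in each quadrant.
points = [(1.2, 0.8), (-0.5, 1.1), (-1.0, -0.9), (0.7, -1.3)]

labels = [quadrant(zx, zy) for zx, zy in points]
signs = ["positive" if zx * zy > 0 else "negative" for zx, zy in points]

for label, sign in zip(labels, signs):
    print(label, sign)
# I positive
# II negative
# III positive
# IV negative
```

Points in quadrants I and III push the sum of cross-products up, and points in II and IV pull it down.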
Covariance
• a related measure of the relationship
between scores on two different
variables is the covariance
$$S_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$
Covariance
• notice that the variance ($S_x^2$) is the
covariance between a variable and itself!
$$S_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$
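A short sketch of the covariance with invented scores, again dividing by n. It also shows the variance as a self-covariance and, as follows from the z-score definition of rxy, that rxy equals the covariance divided by the product of the two standard deviations. The function name `covariance` is chosen for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """S_xy with n in the denominator, as on the slide."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

s_xy = covariance(xs, ys)
var_x = covariance(xs, xs)  # variance of x as the covariance of x with itself
var_y = covariance(ys, ys)

# r_xy = S_xy / (S_x * S_y), where S_x and S_y are the square roots of the variances
r = s_xy / math.sqrt(var_x * var_y)

print(s_xy, var_x, var_y, r)  # 2.5 1.25 5.0 1.0
```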
Regression
• If two variables are perfectly correlated
(r = + or - 1.0) then one can exactly
predict a score on one variable given a
score on another
Regression
• For example: a university charges $250
registration fee plus $100 / credit
Regression
• tuition = $100(X) + $250
– where X is the number of credits
• Notice this is a linear relationship (an
equation of the form y = ax + b)
– a = $100/credit
– b = $250
– x = number of credits
Regression
• Tuition as a function of credit hours is a
straight line
• There is a perfect
correlation between
credit hours and tuition
• You could predict
perfectly the tuition
required given the
number of credit hours
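The tuition example can be sketched as follows. The course loads are hypothetical, and the `r_xy` helper is an illustrative implementation that divides by n throughout.

```python
import math

def tuition(credits):
    """Tuition in dollars: $100 per credit plus a $250 registration fee."""
    return 100 * credits + 250

def r_xy(xs, ys):
    """Correlation as covariance over the product of SDs (all dividing by n)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return s_xy / (s_x * s_y)

credit_loads = [3, 6, 9, 12, 15]  # hypothetical course loads
fees = [tuition(c) for c in credit_loads]

print(fees)  # [550, 850, 1150, 1450, 1750]

r = r_xy(credit_loads, fees)
print(round(r, 6))  # 1.0
```

Every (credits, tuition) pair lies exactly on the line y = 100x + 250, so the correlation comes out as 1 up to floating-point rounding.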
Next Time
• Regression - read chapter 8