Inference for correlation
Definition of covariance
For a pair of random variables x and y, one can define the covariance of the variables as the average (expected value) product of deviations from the mean, or $\sigma_{x,y} = \mathrm{cov}(x,y) = E\left[(x - \mu_x)(y - \mu_y)\right]$.

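As a quick numerical illustration (a sketch in Python, not part of the original text; the data vectors are made up for the demonstration), the sample analogue of this definition can be computed directly and checked against numpy's built-in covariance:

    import numpy as np

    # hypothetical example data, chosen only for illustration
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # average product of deviations from the means (dividing by n)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

    # np.cov divides by n-1 by default; bias=True uses the n divisor instead
    print(cov_xy, np.cov(x, y, bias=True)[0, 1])  # the two values agree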

Properties of covariance
The covariance of a variable with itself is just its variance, because when x is identical to y ($x = y$), then $\mathrm{cov}(x,x) = E\left[(x - \mu_x)^2\right] = \sigma_x^2$.

For given variances $\sigma_x^2$ and $\sigma_y^2$, the greatest covariance, $\sigma_{x,y} = \sigma_x \sigma_y$, occurs when y and x are directly linearly related, and the least, $\sigma_{x,y} = -\sigma_x \sigma_y$, when they are inversely linearly related.

For the covariance to equal zero, it is sufficient, but not necessary, that the variables be statistically independent, because then $E\left[(x - \mu_x)(y - \mu_y)\right] = E\left[x - \mu_x\right] E\left[y - \mu_y\right] = 0 \cdot 0 = 0$. A numerical illustration of the last two properties follows below.
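The following sketch (simulated data; an illustration, not part of the original) checks that an exact linear relation attains the covariance bound, while a dependent but nonlinear relation can still have covariance zero:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100_000)

    # exact direct linear relation: covariance reaches sigma_x * sigma_y
    y_lin = 3.0 * x + 1.0
    print(np.cov(x, y_lin, bias=True)[0, 1], x.std() * y_lin.std())  # equal

    # dependent but uncorrelated: y = x^2 with x symmetric about zero
    y_sq = x ** 2
    print(np.cov(x, y_sq, bias=True)[0, 1])  # near 0, yet y_sq depends on x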

Definition of correlation
The covariance, then, is a measure of the extent of linear relation, but it depends on the scale of measurement of the variables. A scale-free measure is had by dividing by the extreme value, and is called the correlation, or
$$ \rho_{x,y} = \mathrm{corr}(x,y) = \frac{\sigma_{x,y}}{\sigma_x \sigma_y}. $$
It follows that the correlation ranges between the values $-1$ and $+1$, the extreme values corresponding to exact linear relation (inverse and direct, respectively).
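A short simulation (a sketch; the coefficients are assumptions for the demo) confirms that this measure is unaffected by changes of scale:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=50_000)
    y = 0.8 * x + rng.normal(size=50_000)  # positively related by construction

    r1 = np.corrcoef(x, y)[0, 1]
    r2 = np.corrcoef(100 * x, 0.01 * y + 7)[0, 1]  # rescale and shift both
    print(r1, r2)  # identical: correlation is free of the units of measurement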
Point estimation
An estimate of the correlation can be constructed from sample analogues of the population parameters:
$$ r_{x,y} = \frac{s_{x,y}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} $$
(the term $n-1$ dividing the numerator and denominator cancels out). This estimator has a bias in small samples, but can be used for inference about correlation in the population.
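A direct implementation (a sketch with made-up data) shows the cancellation of the $n-1$ terms and agreement with numpy's built-in estimator:

    import numpy as np

    def r_xy(x, y):
        # sum of cross-products over the root of the product of sums of
        # squares; the n-1 divisors of s_xy, s_x, and s_y have cancelled
        dx, dy = x - x.mean(), y - y.mean()
        return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

    x = np.array([1.2, 2.4, 3.1, 4.8, 5.0, 6.7])
    y = np.array([2.0, 3.1, 3.9, 5.2, 4.8, 7.1])
    print(r_xy(x, y), np.corrcoef(x, y)[0, 1])  # the two agree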
Testing
For testing the null hypothesis of no correlation, one can transform r to obtain a test statistic, namely
$$ t = \frac{r \sqrt{n-2}}{\sqrt{1 - r^2}} \sim t(n-2), $$
which follows a t-distribution with $n-2$ degrees of freedom under the null hypothesis. As usual, the null hypothesis is rejected for large enough absolute values of t, according to the desired probability of type I error. This t statistic turns out to be identical to the statistic used for testing for zero slope in the regression model (either x or y can play the role of the independent variable!). Thus, regression software can be used to perform this test even though it is designed for a different statistical model.
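The sketch below (with hypothetical data) computes the statistic and its two-sided p-value, and checks the result against scipy.stats.pearsonr, which carries out the same test:

    import numpy as np
    from scipy import stats

    x = np.array([1.2, 2.4, 3.1, 4.8, 5.0, 6.7, 7.3, 8.1])
    y = np.array([2.0, 3.1, 3.9, 5.2, 4.8, 7.1, 6.9, 8.4])
    n = len(x)

    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    p = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided p-value

    r_sp, p_sp = stats.pearsonr(x, y)  # built-in version of the same test
    print(t, p, p_sp)  # the two p-values agree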
If hypotheses other than the value zero are to be tested, say $\rho = \rho_0$, another transformation of r yields an approximately normal variable in moderately sized samples: $w = \frac{1}{2} \ln\left(\frac{1+r}{1-r}\right)$. If we let $\omega = \frac{1}{2} \ln\left(\frac{1+\rho}{1-\rho}\right)$ be the correspondingly transformed population correlation, then $w \sim N\left(\omega, \frac{1}{n-3}\right)$ approximately, i.e. approximately unbiased with variance $\frac{1}{n-3}$, and so a standard normal test statistic is formed as
$$ z = \frac{w - \omega_0}{\sqrt{1/(n-3)}}, $$
where $\omega_0$ is the transformed null value $\rho_0$. The critical (rejection) region of the test is determined as usual according to the direction of the test (alternative) and the probability of type I error.
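A sketch of this test for, say, $H_0\colon \rho = 0.5$ (the data and the null value are assumptions for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.normal(size=200)
    y = 0.6 * x + 0.8 * rng.normal(size=200)  # true correlation is 0.6
    n = len(x)

    r = np.corrcoef(x, y)[0, 1]
    w = 0.5 * np.log((1 + r) / (1 - r))           # transform of r
    omega0 = 0.5 * np.log((1 + 0.5) / (1 - 0.5))  # transformed null rho_0

    z = (w - omega0) / np.sqrt(1 / (n - 3))
    p = 2 * stats.norm.sf(abs(z))  # two-sided p-value
    print(z, p)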
Confidence interval for ρ
Confidence intervals for $\rho$ are obtained by back-transforming the corresponding interval for $\omega$. This latter interval is constructed as $w \pm z_{1-\alpha/2} \sqrt{1/(n-3)}$. To get the inverse transformation, we solve for $\rho$ in terms of $\omega$, which gives $\rho = \frac{e^{2\omega} - 1}{e^{2\omega} + 1}$, and apply this transformation to the upper and lower confidence limits, respectively; that is, the upper confidence limit for $\rho$ is obtained by substituting $w + z_{1-\alpha/2} \sqrt{1/(n-3)}$ for $\omega$, and similarly for the lower limit. The resulting interval for $\rho$ is not symmetrical unless w happens by chance to be zero.
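Putting this together, a sketch (simulated data; the 95% level and the helper name corr_ci are choices for the demo) of the back-transformed interval:

    import numpy as np
    from scipy import stats

    def corr_ci(x, y, level=0.95):
        # interval for omega, then back-transform each limit to the rho scale
        n = len(x)
        r = np.corrcoef(x, y)[0, 1]
        w = 0.5 * np.log((1 + r) / (1 - r))
        half = stats.norm.ppf(1 - (1 - level) / 2) * np.sqrt(1 / (n - 3))
        # (e^{2w} - 1) / (e^{2w} + 1) is exactly tanh(w)
        return np.tanh(w - half), np.tanh(w + half)

    rng = np.random.default_rng(3)
    x = rng.normal(size=100)
    y = 0.6 * x + 0.8 * rng.normal(size=100)
    print(corr_ci(x, y))  # not symmetric about r unless w happens to be 0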