Correlation and Linear Regression

1001-CorrLinReg.doc
Bivariate data – a collection of pairs of sample data (two variables per observation).
Correlation – a relationship between two variables: one variable is related to the other in some way.
Assumptions
1. The sample of paired (x, y) data is a random sample.
2. The pairs (x, y) have a bivariate normal distribution; in particular, the values of
both x and y come from normal distributions.
Linear correlation coefficient, r, shows the strength of the linear relationship between
paired (x, y) values in a sample.
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
r = sample correlation coefficient
ρ (rho) = population correlation coefficient
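As an illustration, the formula for r can be evaluated directly from the five sums it contains. The sketch below is only one way to do this; the function name pearson_r and the data are hypothetical and are not part of the original notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample linear correlation coefficient, computed from the sum formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical paired data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 4))  # 0.7746
```

For these made-up values the sums are Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so r = 30 / (√50 · √30) ≈ 0.7746.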
Some characteristics of correlation coefficient:
1. –1 ≤ r ≤ 1
2. Conversion of all values of either variable to a different scale does not change the value of r.
3. r is not affected by the choice of which variable is x and which is y; interchanging x and y leaves r unchanged.
4. r measures the strength of a linear relationship only.
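Characteristics 2 and 3 can be checked numerically. The sketch below assumes a hypothetical change of units with a positive scale factor (x replaced by 2.54·x + 10) and made-up data; the helper pearson_r is an illustrative name, not something defined in the notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample linear correlation coefficient, computed from the sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / (sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2))

# Hypothetical paired data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_original = pearson_r(xs, ys)

# Characteristic 2: convert x to a different scale (assumed positive factor).
xs_rescaled = [2.54 * x + 10 for x in xs]
r_rescaled = pearson_r(xs_rescaled, ys)

# Characteristic 3: interchange the roles of x and y.
r_swapped = pearson_r(ys, xs)

# Compare with a small tolerance to allow for floating-point rounding.
print(abs(r_original - r_rescaled) < 1e-9)  # True
print(abs(r_original - r_swapped) < 1e-9)   # True
```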
Coefficient of Determination: r² is the proportion of the variation in y that is explained by
the linear relationship between x and y.
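One way to see what "proportion of variation explained" means is to compare r² with the ratio of explained variation to total variation in y. The sketch below uses the least-squares line from the Regression Analysis part of these notes. The data and variable names are hypothetical.

```python
# Hypothetical paired data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# r squared, from the square of the correlation coefficient formula.
r_squared = (n * sum_xy - sum_x * sum_y) ** 2 / (
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Least-squares slope and intercept.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Total variation of y about its mean, and the part captured by the fitted line.
mean_y = sum_y / n
total_variation = sum((y - mean_y) ** 2 for y in ys)
explained_variation = sum((b0 + b1 * x - mean_y) ** 2 for x in xs)
proportion_explained = explained_variation / total_variation

print(round(r_squared, 4), round(proportion_explained, 4))  # 0.6 0.6
```

For this made-up sample, about 60% of the variation in y is accounted for by the linear relationship with x.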
Common errors:
1. Concluding that correlation implies causality.
Lurking variable – one that affects variables being studied, but is not
included in the study.
2. Using data based on averages – suppresses individual variation and may inflate
correlation coefficient.
3. Ignoring the property of linearity – the linear correlation coefficient may be zero
even when a non-linear relationship is very strong.
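The third error can be illustrated with a small made-up data set in which y is completely determined by x (y = x²) and yet the linear correlation coefficient is zero. The helper name pearson_r and the data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample linear correlation coefficient, computed from the sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / (sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2))

# Hypothetical data: x is symmetric about zero and y is an exact function of x.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # [4, 1, 0, 1, 4]

r_parabola = pearson_r(xs, ys)
print(r_parabola)  # 0.0
```

Here Σx = 0 and Σxy = 0, so the numerator of r vanishes even though the relationship between x and y is exact. A scatter diagram would reveal the pattern that r misses.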
Testing significance of r, correlation coefficient.
H0: ρ = 0 and H1: ρ ≠ 0
Use the test statistic
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
with degrees of freedom, df = n − 2, or use Table A-6 for r-values.
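A minimal sketch of the test statistic, assuming a hypothetical sample with r = 0.6 and n = 27. The critical value 2.060 is the standard two-tailed t-table value for df = 25 at the 0.05 significance level; the function name t_statistic is illustrative only.

```python
from math import sqrt

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / sqrt((1 - r ** 2) / (n - 2))

# Hypothetical sample results, made up for illustration only.
r = 0.6
n = 27

t = t_statistic(r, n)
df = n - 2

# Two-tailed critical value from a standard t table (df = 25, alpha = 0.05).
t_critical = 2.060
reject_h0 = abs(t) > t_critical

print(round(t, 3), df)  # 3.75 25
print(reject_h0)        # True
```

Because |t| exceeds the critical value in this made-up case, H0: ρ = 0 would be rejected in favor of H1: ρ ≠ 0 at the 0.05 level.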
Regression Analysis
• Purpose: to determine the regression equation; it is used to predict the value of the
dependent variable (Y) based on the independent variable (X).
• Procedure: select a sample from the population and list the paired data for each
observation; draw a scatter diagram to give a visual portrayal of the relationship;
determine the regression equation.
• The regression equation: ŷ = b0 + b1 x, where:
• ŷ (y hat) is the predicted value of Y for any X.
• b0 is the Y-intercept, or the estimated Y value when X = 0.
• b1 is the slope of the line, or the average change in ŷ for each change of one
unit in X.
• The least squares principle is used to obtain b1 and b0:
$$b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
and
$$b_0 = \bar{y} - b_1\bar{x} \quad\text{or}\quad b_0 = \frac{\sum y}{n} - b_1\,\frac{\sum x}{n}$$
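The least-squares formulas for b1 and b0 translate into only a few lines of code. The sketch below uses made-up data; the names least_squares and predict are hypothetical and chosen for illustration.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the regression equation y_hat = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

def predict(b0, b1, x):
    """Predicted value y_hat for a given x."""
    return b0 + b1 * x

# Hypothetical paired data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
print(round(b0, 4), round(b1, 4))      # 2.2 0.6
print(round(predict(b0, b1, 6), 4))    # 5.8
```

For this sample the regression equation is ŷ = 2.2 + 0.6x. The prediction at x = 6 lies just outside the observed x-values, so it should be treated with caution.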
Centroid: For a collection of paired (x, y) data, the centroid is the point (x̄, ȳ), whose
coordinates are the mean of the x-values and the mean of the y-values.
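Because b0 = ȳ − b1 x̄, the least-squares line always passes through the centroid. The short check below uses hypothetical data and variable names to illustrate this.

```python
from statistics import mean

# Hypothetical paired data, made up for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Centroid: the point (mean of x-values, mean of y-values).
x_bar = mean(xs)
y_bar = mean(ys)

# Least-squares slope and intercept.
n = len(xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
b1 = (n * sum_xy - sum(xs) * sum(ys)) / (n * sum_x2 - sum(xs) ** 2)
b0 = y_bar - b1 * x_bar

# Evaluate the regression equation at x_bar; the result should equal y_bar.
y_hat_at_centroid = b0 + b1 * x_bar

print((x_bar, y_bar))                       # (3, 4)
print(abs(y_hat_at_centroid - y_bar) < 1e-9)  # True
```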