Regression Basics

R Programming
Linear Regression Basics
Correlation and Linear Regression
Dependent variable | Independent (predictor) variable | Statistical test | Comments
Quantitative | Categorical | t-test (one-sample, two-sample, or paired) | Determines if the categorical variable (factor) affects the dependent variable; typically used for experimental or planned change studies
Quantitative | Quantitative | Correlation / regression analysis | Establishes a regression model; used to explain, predict, or control the dependent variable
Categorical | Categorical | Chi-square | Tests whether the variables are statistically independent (i.e., are they related or not?)
Correlation
▪ Correlation coefficients assess the strength of the linear relationship between two
quantitative variables.
• The correlation measure ranges from -1 to +1.
• A negative correlation means that X and Y are inversely related.
• A positive correlation means that X and Y are directly related.
• A zero correlation means that X and Y are not linearly related.
• A correlation of +1 indicates X and Y are directly related and that all
the points fall on the same straight line.
• A correlation of -1 indicates X and Y are inversely related and that all
the points fall on the same straight line.
▪ Plot a scatter diagram of each predictor variable against the dependent variable
• Look for departures from linearity
• Look for extreme data points (Outliers)
▪ Examine partial correlations
• Can’t determine causality, but can help isolate confounding variables
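The correlation properties and the partial correlation idea above can be sketched in a few lines of Python. This is only an illustration: the function names pearson_r and partial_r are hypothetical, the numbers are made up rather than taken from any dataset used in these slides, and the partial correlation uses the standard first-order formula for a single control variable z.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

def partial_r(x, y, z):
    """Correlation between x and y after controlling for one variable z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return float((r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2)))

# Made-up illustrative values (assumption: not from stats98 or UCDAVIS2).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2.0, 4.0, 6.0, 8.0, 10.0]    # all points on one rising line
falling = [10.0, 8.0, 6.0, 4.0, 2.0]   # all points on one falling line
y = [2.0, 1.0, 4.0, 3.0, 5.0]
z = [1.0, 3.0, 2.0, 5.0, 4.0]

print(pearson_r(x, rising))    # directly related, perfectly linear
print(pearson_r(x, falling))   # inversely related, perfectly linear
print(partial_r(x, y, z))      # x-y correlation with z held fixed
```

A scatter plot should still be inspected first, since a single coefficient can hide outliers or a curved relationship.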
Correlation
For example, let’s take two variables and evaluate their
correlation. Open the stats98 dataset in Excel.
What would you expect the correlation of the Verbal SAT
scores and the Math SAT scores to be? Why?
What would you expect the correlation of the Math SAT
scores and the percent taking the test to be? Why?
Correlation
What would you expect the correlation of the Verbal SAT scores and the
Math SAT scores to be? Why?
[Figure: scatter plot titled “Math versus Verbal”; both axes run from 450 to 610.]
Correlation
What would you expect the correlation of the Math SAT scores and the
percent of high school students who took the test to be? Why?
[Figure: scatter plot titled “Math versus Percent Took”; Math SAT score axis runs from 450 to 610, percent-took axis runs from 0 to 90.]
Correlation
Let’s pull up the UCDAVIS2 dataset in Excel and plot Ideal Height versus
Actual Height. What would you expect the correlation value to be?
Can you explain someone’s Ideal Height using their Actual Height?
[Figure: scatter plot titled “IdealHt Versus Actual Height”; axis ticks span 55.00 to 85.00.]
Linear Regression
[Figure: scatter plot titled “IdealHt Versus Actual Height” with the fitted regression line y = 0.8174x + 14.271, R² = 0.7372.]
Linear Regression
On the previous slide, the “regression line” has been superimposed on the
scatter plot of ideal height versus actual height.
The equation of this line takes the general form of y=mx+b, where:
• y is the dependent variable (ideal height)
• m is the slope of the line
• x is the independent variable (actual height)
• b is the y-intercept.
When we discuss regression models, we rewrite this equation as:
y = b₀ + b₁x₁ + … + bₙxₙ
where b₀ is the y-intercept and b₁ is the slope for x₁ (with a single
predictor, the slope of the line). The “slope” is also the effect of a
one-unit change in x on y.
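As a rough sketch of how b₀ and b₁ are obtained for a single predictor, the least-squares formulas can be written out directly. The data below are hypothetical height pairs invented for illustration (not the UCDAVIS2 values), and fit_line is an assumed helper name, not something from the slides or from Excel.

```python
import numpy as np

# Hypothetical (actual height, ideal height) pairs in inches; made up for illustration.
actual = np.array([62.0, 64.0, 66.0, 68.0, 70.0, 72.0])
ideal = np.array([64.0, 66.5, 67.0, 70.0, 71.0, 73.5])

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    b1 = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    # Intercept: forces the line through the point (x_mean, y_mean).
    b0 = y_mean - b1 * x_mean
    return float(b0), float(b1)

b0, b1 = fit_line(actual, ideal)
print(f"ideal = {b0:.3f} + {b1:.4f} * actual")
```

The result should match what Excel’s trendline reports for the same data, since both use ordinary least squares.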
Linear Regression
On the previous slide, the model is presented as the equation of a line:
y = 0.8174x + 14.271.
From this, we would say:
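1. For every 1-inch change in someone’s actual height, there is, on average,
a 0.8174-inch change in their predicted ideal height.
2. Everyone “starts” with 14.271 inches: the intercept is the predicted ideal
height at an actual height of 0, which lies far outside the data and has no
practical meaning on its own.
3. If someone has an actual height of 68 inches, their predicted ideal height
is 0.8174 × 68 + 14.271 ≈ 69.85 inches.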
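The arithmetic behind these statements can be checked with a short sketch. The coefficients are the ones read off the slide’s fitted line; the function name predict_ideal_height is a hypothetical helper introduced only for illustration.

```python
# Coefficients read off the fitted line on the slide: y = 0.8174x + 14.271
INTERCEPT = 14.271   # b0, in inches
SLOPE = 0.8174       # b1, inches of ideal height per inch of actual height

def predict_ideal_height(actual_height):
    """Predicted ideal height (inches) for a given actual height (inches)."""
    return INTERCEPT + SLOPE * actual_height

# Prediction for an actual height of 68 inches.
print(round(predict_ideal_height(68), 2))  # 69.85

# A one-inch increase in actual height changes the prediction by the slope.
print(round(predict_ideal_height(69) - predict_ideal_height(68), 4))  # 0.8174
```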
That R² value of 0.7372 is interpreted as “73.72% of the variation in ideal
height can be explained by a linear model with actual height as the
only predictor”.
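To make the “explained variation” reading concrete, here is a minimal sketch that computes R² as one minus the ratio of residual variation to total variation. The data are hypothetical (not the UCDAVIS2 values) and r_squared is an assumed helper name. With a single predictor, R² also equals the square of the correlation coefficient, which the last lines compare.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of the variation in y accounted for by the predictions y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = ((y - y_hat) ** 2).sum()      # variation left unexplained
    ss_tot = ((y - y.mean()) ** 2).sum()   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Hypothetical (actual height, ideal height) pairs in inches; made up for illustration.
actual = np.array([62.0, 64.0, 66.0, 68.0, 70.0, 72.0])
ideal = np.array([64.0, 66.5, 67.0, 70.0, 71.0, 73.5])

# np.polyfit with degree 1 returns [slope, intercept].
slope, intercept = np.polyfit(actual, ideal, 1)
r2 = r_squared(ideal, intercept + slope * actual)

# For simple linear regression, R² is the squared correlation coefficient.
r = np.corrcoef(actual, ideal)[0, 1]
print(r2, r ** 2)
```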