Notes/Practice

Quarter 4: Part I Simple Linear Regression
Section 4: Notes - Coefficient of Determination ($r^2$) – Strength of a Linear Relationship
A fitted value $\hat{y}$ is simply another name for a predicted value: it describes where a
particular x-value falls on the line of best fit. It is found by substituting a given value of x into
the regression equation $\hat{y} = b_0 + b_1 x$.
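As a quick illustration of that substitution, here is a minimal Python sketch. The coefficients are copied from the regression output printed with practice problem 2 on this sheet (grade = 49.2756 + 0.997420 pages); the function name `fitted_value` is only an illustrative choice.

```python
# Coefficients copied from the regression output printed for practice problem 2.
b0 = 49.2756    # intercept
b1 = 0.997420   # slope


def fitted_value(x):
    """Return the fitted (predicted) value y-hat = b0 + b1 * x."""
    return b0 + b1 * x


# Fitted grade for a 25-page paper.
y_hat = fitted_value(25)
print(round(y_hat, 4))  # 74.2111
```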
A residual, denoted e, is the difference (the error) between an observed value and its predicted
or fitted value. Graphically, it is the vertical distance between a point and the line of best
fit. It is found by subtracting the fitted $\hat{y}$-value from the observed y-value: $e = y - \hat{y}$.
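The subtraction can be sketched in Python for a whole data set at once. The data and the line are taken from practice problem 2 on this sheet (pages versus grade, with the printed line grade = 49.2756 + 0.997420 pages); the variable names are illustrative.

```python
# Data from practice problem 2: paper length (x) and grade (y).
pages = [25, 32, 20, 28, 15, 34, 29, 30, 45, 35]
grades = [69, 81, 72, 75, 64, 89, 84, 73, 92, 86]

# Coefficients copied from the regression output printed for that problem.
b0, b1 = 49.2756, 0.997420

# residual = observed y minus fitted y-hat
fitted = [b0 + b1 * x for x in pages]
residuals = [y - y_hat for y, y_hat in zip(grades, fitted)]

for x, y, e in zip(pages, grades, residuals):
    print(x, y, round(e, 2))
```

A positive residual means the point lies above the line; a negative residual means it lies below. Because the line is a least squares line, the residuals add up to roughly zero (not exactly here, since the printed coefficients are rounded).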
An outlier, in the regression sense, is a point whose residual is large in absolute value.
Graphically, it is a point that falls far from the regression line, not following the pattern apparent
in the other points.
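One way to look for a candidate outlier is to find the point with the largest absolute residual. The sketch below does this for the practice problem 2 data, using the line printed with that problem. These notes give no cutoff for "large", so whether the point found really counts as an outlier remains a judgment call.

```python
# Data and printed regression line from practice problem 2.
pages = [25, 32, 20, 28, 15, 34, 29, 30, 45, 35]
grades = [69, 81, 72, 75, 64, 89, 84, 73, 92, 86]
b0, b1 = 49.2756, 0.997420

residuals = [y - (b0 + b1 * x) for x, y in zip(pages, grades)]

# Index of the point that sits farthest (vertically) from the line.
largest = max(range(len(residuals)), key=lambda i: abs(residuals[i]))

print(pages[largest], grades[largest], round(residuals[largest], 2))  # 30 73 -6.2
```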
In regression, the residuals represent the natural or unexplained variation (the natural error) as
they describe the deviations about the regression line.
The coefficient of correlation r measures the strength of a linear relationship and also indicates
its direction. r satisfies $-1 \le r \le 1$, where r = 1 indicates a perfect positive
relationship and r = −1 indicates a perfect negative relationship. r = 0 indicates no relationship,
or at least no linear relationship when employing the linear model.
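These notes do not state a formula for r. The sketch below assumes the usual Pearson formula, r = Sxy / sqrt(Sxx · Syy), and applies it to the practice problem 2 data as well as to two tiny made-up data sets that lie exactly on a line.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy) (standard formula, assumed here)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Data from practice problem 2.
pages = [25, 32, 20, 28, 15, 34, 29, 30, 45, 35]
grades = [69, 81, 72, 75, 64, 89, 84, 73, 92, 86]

r = correlation(pages, grades)
print(round(r, 3))        # 0.895
print(round(r ** 2, 3))   # 0.801

# Made-up points lying exactly on a rising line and on a falling line.
r_up = correlation([1, 2, 3], [2, 4, 6])
r_down = correlation([1, 2, 3], [6, 4, 2])
```

Squaring r gives about 0.801, which agrees with the R-Sq = 80.1 % printed with that problem. The two exact-line examples come out at 1 and −1 (up to floating-point rounding).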
The coefficient of determination $r^2$, like the coefficient of correlation, describes the strength of
a relationship, but has a more concrete interpretation.
$$r^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST}, \qquad 0 \le r^2 \le 1$$

SST is the total sum of squared deviations about the mean $\bar{y}$: $SST = \sum (y - \bar{y})^2$.
SSE is the total sum of squared deviations about the regression line $\hat{y}$:
$SSE = \sum (y - \hat{y})^2 = \sum (\text{residual})^2$.
SSR is the total sum of squared deviations due to regression, i.e. $SSR = \sum (\hat{y} - \bar{y})^2$.
SSR, however, is most easily found by computing the difference $SSR = SST - SSE$.
Interpreting $r^2$: ____ percent of the variation in the y-variable is explained by the regression line.
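The three sums of squares and $r^2$ can be computed directly from their definitions. The sketch below uses the practice problem 2 data. It fits the line with the standard least squares formulas b1 = Sxy / Sxx and b0 = ȳ − b1·x̄, which are an assumption here because these notes rely on the calculator instead.

```python
# Data from practice problem 2 on this sheet.
pages = [25, 32, 20, 28, 15, 34, 29, 30, 45, 35]    # x
grades = [69, 81, 72, 75, 64, 89, 84, 73, 92, 86]   # y

n = len(pages)
x_bar = sum(pages) / n
y_bar = sum(grades) / n

# Least squares slope and intercept (standard formulas, assumed here).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, grades))
sxx = sum((x - x_bar) ** 2 for x in pages)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in pages]

sst = sum((y - y_bar) ** 2 for y in grades)                      # about the mean
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(grades, fitted))  # about the line
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)              # due to regression

r_squared = ssr / sst

print(round(sst, 1), round(sse, 1), round(ssr, 1))  # 770.5 153.6 616.9
print(round(r_squared, 3))                          # 0.801
print(f"About {100 * r_squared:.1f} percent of the variation in grade "
      "is explained by the regression line.")
```

Here SSR computed from its definition matches SST − SSE, and the resulting $r^2$ agrees with the R-Sq = 80.1 % printed with that problem.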
A line of best fit or regression line is also called a least squares regression line because this line
fits the points in the way that makes the error or residual terms as small as possible overall:
it minimizes SSE, the sum of the squared error terms.
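The "least squares" property can be checked numerically: any line other than the least squares line should give a larger SSE. The sketch below assumes the standard formulas b1 = Sxy / Sxx and b0 = ȳ − b1·x̄ (not stated in these notes) and uses the practice problem 2 data. The helper names `sse` and `least_squares` and the nudge sizes are illustrative choices.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Intercept and slope from the standard least squares formulas (assumed here)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


pages = [25, 32, 20, 28, 15, 34, 29, 30, 45, 35]
grades = [69, 81, 72, 75, 64, 89, 84, 73, 92, 86]

b0, b1 = least_squares(pages, grades)
best = sse(b0, b1, pages, grades)

# Nudge the intercept and/or slope a little and recompute SSE each time.
nudges = [(d0, d1) for d0 in (-1.0, 0.0, 1.0) for d1 in (-0.05, 0.0, 0.05)
          if (d0, d1) != (0.0, 0.0)]
others = [sse(b0 + d0, b1 + d1, pages, grades) for d0, d1 in nudges]

print(round(best, 1), all(s > best for s in others))  # 153.6 True
```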
1. A researcher would like to know whether the gestation period of an animal can be used to predict
its life expectancy. She collects the following data.
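                                                             fitted   residual
Animal     Gestation     Life expectancy   ȳ      (y − ȳ)²   ŷ        (y − ŷ)    (y − ŷ)²
           (days) (x)    (years) (y)
Cat        63            11                ____   ____       ____     ____       ____
Chicken    22            7.5               ____   ____       ____     ____       ____
Dog        63            11                ____   ____       ____     ____       ____
Duck       28            10                ____   ____       ____     ____       ____
Goat       151           12                ____   ____       ____     ____       ____
Lion       108           10                ____   ____       ____     ____       ____
Parakeet   18            8                 ____   ____       ____     ____       ____
Pig        115           10                ____   ____       ____     ____       ____
Rabbit     31            7                 ____   ____       ____     ____       ____
Squirrel   44            9                 ____   ____       ____     ____       ____
TOTAL                                             SST = ____                     SSE = ____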
(1) Fill in the chart to find SST, SSE, SSR, and ultimately the coefficient of determination $r^2$.
Check your $r^2$ against the value given by the calculator.
(2) Interpret the coefficient of determination $r^2$.
2. Following are the lengths and grades of ten research papers for a sociology professor’s class.
Length (pages), x:  25  32  20  28  15  34  29  30  45  35
Grade, y:           69  81  72  75  64  89  84  73  92  86
(1) On the graph below, draw each residual (the vertical distance between the point and the line
of best fit).
(2) Use your calculator (LinReg) to find the least squares regression line along with the
coefficient of determination $r^2$.
(3) Interpret the coefficient of determination.
Regression Plot (scatterplot of grade on the vertical axis versus pages on the horizontal axis, with the fitted line)
grade = 49.2756 + 0.997420 pages
S = 4.38172
R-Sq = 80.1 %
R-Sq(adj) = 77.6 %
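The printed regression output can be reproduced from the ten data points. The sketch below assumes the standard least squares formulas for the slope and intercept. It also assumes the usual textbook definitions S = sqrt(SSE / (n − 2)) and R-Sq(adj) = 1 − [SSE / (n − 2)] / [SST / (n − 1)], which these notes do not state.

```python
import math

# Data from practice problem 2.
pages = [25, 32, 20, 28, 15, 34, 29, 30, 45, 35]    # x
grades = [69, 81, 72, 75, 64, 89, 84, 73, 92, 86]   # y

n = len(pages)
x_bar = sum(pages) / n
y_bar = sum(grades) / n

# Least squares slope and intercept (standard formulas, assumed here).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, grades))
sxx = sum((x - x_bar) ** 2 for x in pages)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sst = sum((y - y_bar) ** 2 for y in grades)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(pages, grades))

r_sq = (sst - sse) / sst                            # coefficient of determination
s = math.sqrt(sse / (n - 2))                        # assumed definition of S
r_sq_adj = 1 - (sse / (n - 2)) / (sst / (n - 1))    # assumed definition of R-Sq(adj)

print(f"grade = {b0:.4f} + {b1:.6f} pages")   # grade = 49.2756 + 0.997420 pages
print(f"S = {s:.5f}")                         # S = 4.38172
print(f"R-Sq = {100 * r_sq:.1f} %")           # R-Sq = 80.1 %
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f} %")  # R-Sq(adj) = 77.6 %
```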