Derivation of the Ordinary Least Squares Estimator

Simple Linear Regression Case
As briefly discussed in the previous reading assignment, the most commonly used
estimation procedure is the minimization of the sum of squared deviations. This procedure is
known as the ordinary least squares (OLS) estimator. In this chapter, this estimator is derived for
the simple linear case. The simple linear case means only one x variable is associated with each
y value.
Simple Linear Regression
Error Defined
The simple linear regression problem is given by the following equation
(1)   $y_i = a + b x_i + u_i, \qquad i = 1, 2, \ldots, n$
where yi and xi represent paired observations, a and b are unknown parameters, ui is the error
term associated with observation i, and n is the total number of observations. The terms
deviation, residual, and error term are often used interchangeably in econometric analysis. More
correct usage, however, is to use error term to represent an unknown value, and residual, deviation,
or estimated error term to represent a calculated value for the error term. The term “a” represents
the intercept of the line and “b” represents the slope of the line. One key assumption is the
equation is linear. Here, the equation is linear in y and x, but the equation is also linear in a and
b. Linear simply means nonlinear terms such as squared or logarithmic values for x, y, a, and b
are not included in the equation. The linear in x and y assumption will be relaxed later, but the
equation must remain linear in a and b. Experience suggests this linear requirement is an
obstacle for students’ understanding of ordinary least squares (see linear equation review box).
You have three paired data (x, y) points (3, 40), (1, 5), and (2, 10). Using this data, you
wish to obtain an equation of the following form
(2)   $y_i = a + b x_i + u_i$
where i denotes the observation, a and b are unknown parameters to be estimated, xi is the
independent variable, yi is the dependent variable, and ui is the error term associated with
observation i.
Simple linear regression uses the ordinary least squares procedure. As briefly discussed
in the previous chapter, the objective is to minimize the sum of the squared residuals, $\sum_{i=1}^{n} \hat{u}_i^2$. The
idea of residuals is developed in the previous chapter; however, a brief review of this concept is
presented here. Residuals are how far the estimated y (using the estimated y-intercept and slope
parameters) for a given x is from an observed y associated with the given x. By this definition,
residuals are the estimated error between the observed y value and the estimated y value.
Included in this error term is everything affecting y that is not included in the equation. In the
simple linear regression case, only x is included in the equation; the error term includes all other
variables affecting y. Graphically, residuals are the vertical distance (x is given or held constant)
between the observed y and the estimated equation. Residuals are shown graphically in figure 1 (as
before, figures are at the end of this reading assignment). In this figure, residuals one and two are
positive, whereas residual three is negative. For residual one, the observed y value is 40, but the
estimated y value given by the equation is 35.83. The residual associated with this data point is
4.17. Recall, residuals are calculated by subtracting the estimated value from the observed value
(40 - 35.83). In general terms, residuals are given by the equation $\hat{u}_i = y_i - \hat{y}_i$, where the hat
symbol denotes an estimated y value (the value given by the equation). Note the subscript on $\hat{u}_i$;
as before, this subscript denotes observation i. An estimated error is calculated for each
observation. This allows the information in each observation to be used in the estimation
procedure. Recall, from earlier readings, this is an important property of OLS.
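To make the residual definition concrete, here is a minimal Python sketch (not part of the original notes) that computes $\hat{y}_i$ and $\hat{u}_i$ for the three example data points; the intercept and slope values used are the OLS estimates derived later in this chapter.

```python
# Minimal sketch: residuals u_hat_i = y_i - y_hat_i for given intercept and
# slope estimates (the estimates below are derived later in the chapter).
x = [3, 1, 2]
y = [40, 5, 10]
a_hat, b_hat = -16.67, 17.5

y_hat = [a_hat + b_hat * xi for xi in x]                 # estimated y for each observation
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]      # observed minus estimated

print(y_hat)       # approximately [35.83, 0.83, 18.33]
print(residuals)   # approximately [4.17, 4.17, -8.33]
```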
To recap, using the above example, we have the following information: 1) paired
observations on y and x, 2) the total number of observations is n, 3) a simple functional form
given by yi = a + bxi + ui, 4) a and b are unknown parameters, but are fixed (that is, they do not
vary by observation), 5) a definition for an error term, 6) the objective of minimizing the sum of
squared residuals, 7) the goal of using all observations in the estimation procedure, and 8) we will
never know the true values for a and b.
Minimize Sum of Squared Errors
To minimize the sum of squared residuals, it is necessary to obtain $\hat{u}_i$ for each
observation. Estimated residuals are calculated using the estimated values for a and b. First,
using the estimated equation $\hat{y}_i = \hat{a} + \hat{b} x_i$ (recall the hat denotes estimated values), an estimated
value for y is obtained for each observation. Unfortunately, at this point we do not have values
for $\hat{a}$ and $\hat{b}$. As noted earlier, residuals are calculated as $\hat{u}_i = y_i - \hat{y}_i$. At this point, we have
values for all of the $y_i$'s.
The sum of squared residuals can be written mathematically as

(3)   $\hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \cdots + \hat{u}_n^2 = \sum_{i=1}^{n} \hat{u}_i^2$
where n is the total number of observations and ∑ is the summation operator. The above
equation is known as the sum of squared residuals (sum of squared errors) and denoted SSE.
Using the definitions of $\hat{u}_i$ and $\hat{y}_i$, the SSE becomes

(4)   $SSE = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} [y_i - (\hat{a} + \hat{b} x_i)]^2 = \sum_{i=1}^{n} (y_i - \hat{a} - \hat{b} x_i)^2$

where first the definition for $\hat{u}_i$ is used, then the definition for $\hat{y}_i$, and finally some algebra.
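Equation (4) can be read directly as a function of the candidate estimates. The short Python sketch below (the function name sse and the trial values are illustrative, not from the original notes) evaluates the SSE for any pair of estimates.

```python
# Minimal sketch of equation (4): SSE as a function of candidate estimates
# a_hat and b_hat, given paired observations (x_i, y_i).
def sse(a_hat, b_hat, x, y):
    return sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))

x = [3, 1, 2]        # example data from the text
y = [40, 5, 10]
print(sse(0.0, 0.0, x, y))       # trial estimates of zero give the sum of y_i^2 = 1725
print(sse(-16.67, 17.5, x, y))   # the OLS estimates give roughly 104.2, the minimum
```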
Up to this point, only algebra has been used to provide a definition for residuals and
provide an equation for the sum of squared residuals. Because the objective is to minimize the
sum of squared residuals, a procedure is necessary to achieve this objective. Recall from the
calculus review, a well-behaved function can be maximized or minimized by taking the first
derivatives and setting them equal to zero. Second order conditions are then checked to
determine if a maximum or minimum has been found.
An equation of the objective function for the OLS estimation procedure is:

(5)   $\min \; SSE = \sum_{i=1}^{n} (y_i - \hat{a} - \hat{b} x_i)^2$

with respect to (w.r.t.) $\hat{a}$ and $\hat{b}$.
KEY POINT: Students at this point often face a mental block. Minimizing this equation w.r.t.
â and b̂ is no different than minimizing equations seen in the review of calculus. The mental
block occurs because earlier, the minimization was w.r.t. x and not the parameters, â and b̂ .
Calculus does not depend on what the variable of interest is called. Do not change the procedure
and ideas, just because the variables of interest are now called â and b̂ and not x. This is simply
a nomenclature issue.
Estimates for a and b are obtained by minimizing the SSE as follows 1) take the first
order partial derivatives w.r.t. â and b̂ , 2) set the resulting equations equal to zero, 3) solve the
first order conditions (FOC) for â and b̂ , and 4) check the second order conditions for a
maximum or minimum. Notice this procedure is no different than the procedure reviewed
earlier. Taking the first order partial derivatives w.r.t. â and b̂ and setting them equal to zero, the
following two equations are obtained:
(6)   $\dfrac{\partial SSE}{\partial \hat{a}} = \sum_{i=1}^{n} [2 (y_i - \hat{a} - \hat{b} x_i)(-1)] = -2 \sum_{i=1}^{n} (y_i - \hat{a} - \hat{b} x_i) = 0$, and

      $\dfrac{\partial SSE}{\partial \hat{b}} = \sum_{i=1}^{n} [2 (y_i - \hat{a} - \hat{b} x_i)(-x_i)] = -2 \sum_{i=1}^{n} x_i (y_i - \hat{a} - \hat{b} x_i) = 0.$
Two equations with two unknowns, â and b̂ , are obtained from the first order conditions.
Solving these equations for â and b̂ one obtains the following expressions for the values of
â and b̂ that minimize the SSE. From the first order conditions (equation 6), the following
general expressions are obtained for the OLS estimators (see tables 1 and 2 for steps involved):
(7)   $\hat{a} = \bar{y} - \hat{b}\bar{x}$

      $\hat{b} = \dfrac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}.$
Table 1. Algebraic Steps Involved in Obtaining the OLS Estimator, $\hat{a}$, from the FOC
Mathematical Derivation                                              Step involves
$-\sum 2(y_i - \hat{a} - \hat{b}x_i) = 0$                            Original FOC
$\sum (y_i - \hat{a} - \hat{b}x_i) = 0$                              Divide both sides by -2
$\sum y_i - \sum \hat{a} - \sum \hat{b}x_i = 0$                      Distribute the summation operator
$\sum y_i - n\hat{a} - \hat{b}\sum x_i = 0$                          Summation over a constant
$n\hat{a} = \sum y_i - \hat{b}\sum x_i$                              Subtraction
$\hat{a} = \frac{1}{n}\sum y_i - \hat{b}\frac{1}{n}\sum x_i$         Divide by n
$\hat{a} = \bar{y} - \hat{b}\bar{x}$                                 Definition of a mean
Table 2. Algebraic Steps Involved in Obtaining the OLS Estimator, $\hat{b}$, from the FOC
Mathematical Derivation                                                                     Step involves
$-\sum 2x_i(y_i - \hat{a} - \hat{b}x_i) = 0$                                                Original FOC
$\sum x_i(y_i - \hat{a} - \hat{b}x_i) = 0$                                                  Divide both sides by -2
$\sum x_i y_i - \sum \hat{a}x_i - \sum \hat{b}x_i x_i = 0$                                  Distribute the summation operator and $x_i$
$\sum x_i y_i - \hat{a}\sum x_i - \hat{b}\sum x_i^2 = 0$                                    Summation over a constant
$\sum x_i y_i - [\frac{1}{n}\sum y_i - \hat{b}\frac{1}{n}\sum x_i]\sum x_i - \hat{b}\sum x_i^2 = 0$     Substitute the definition for $\hat{a}$
$\sum x_i y_i - \frac{1}{n}\sum y_i \sum x_i - \hat{b}[\sum x_i^2 - \frac{1}{n}\sum x_i \sum x_i] = 0$  Use the distributive law and then factor out $\hat{b}$; careful with the negative and positive signs
$\hat{b} = \dfrac{\sum x_i y_i - \frac{1}{n}\sum y_i \sum x_i}{\sum x_i^2 - \frac{1}{n}\sum x_i \sum x_i}$   Solve for $\hat{b}$
$\hat{b} = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$           Simplify
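As a cross-check on equation (7) and Tables 1 and 2, here is a minimal Python sketch (the helper name ols_estimates is mine, not from the notes) that computes $\hat{a}$ and $\hat{b}$ directly from the closed-form formulas.

```python
# Minimal sketch of equation (7): closed-form OLS estimates for simple
# linear regression, built only from sums of the data.
def ols_estimates(x, y):
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)

    b_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a_hat = sum_y / n - b_hat * (sum_x / n)    # a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

# The example data from the text give approximately (-16.67, 17.5).
print(ols_estimates([3, 1, 2], [40, 5, 10]))
```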
Example. At this point, returning to the example will help clarify the procedure. There are three
paired observations on x and y, (3, 40), (1, 5), and (2, 10). Using these data points, the SSE can
be written as
(8)   $SSE = \sum_{i=1}^{3} \hat{u}_i^2 = \sum_{i=1}^{3} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{3} (y_i - \hat{a} - \hat{b} x_i)^2$

      $\phantom{SSE} = (40 - \hat{a} - \hat{b}(3))^2 + (5 - \hat{a} - \hat{b}(1))^2 + (10 - \hat{a} - \hat{b}(2))^2$
Taking the partial derivative of equation (8) w.r.t. $\hat{a}$ and $\hat{b}$ and setting the resulting equations
equal to zero, the following first order conditions are obtained:

(9)   $\dfrac{\partial SSE}{\partial \hat{a}} = -2(40 - \hat{a} - \hat{b}(3)) - 2(5 - \hat{a} - \hat{b}(1)) - 2(10 - \hat{a} - \hat{b}(2)) = 0$

      $\dfrac{\partial SSE}{\partial \hat{b}} = -6(40 - \hat{a} - \hat{b}(3)) - 2(5 - \hat{a} - \hat{b}(1)) - 4(10 - \hat{a} - \hat{b}(2)) = 0.$
Solving these two equations for the two unknowns, â and b̂ , one obtains the OLS estimates for a
and b given the three paired observations. The first equation in equation 9 can be simplified as:
(10)   $-80 + 2\hat{a} + 6\hat{b} - 10 + 2\hat{a} + 2\hat{b} - 20 + 2\hat{a} + 4\hat{b} = 0$
       $-110 + 6\hat{a} + 12\hat{b} = 0.$
Similarly, the second equation in equation (9) can be simplified as:

(11)   $-240 + 6\hat{a} + 18\hat{b} - 10 + 2\hat{a} + 2\hat{b} - 40 + 4\hat{a} + 8\hat{b} = 0$
       $-290 + 12\hat{a} + 28\hat{b} = 0.$
One way of solving these two equations is to multiply equation (10) by 2 and then set
equations (10) and (11) equal to each other. Another way is to solve one equation for $\hat{b}$ or $\hat{a}$
and substitute it into the other equation. Both methods will give the same answer. Solving the
equations, one obtains the OLS estimate for b (note both equations are equal to zero, and
multiplying both sides by two will not affect the equality):

$-290 + 12\hat{a} + 28\hat{b} = 0 = 2(-110 + 6\hat{a} + 12\hat{b}) = -220 + 12\hat{a} + 24\hat{b}$
$-70 + 4\hat{b} = 0$
$\hat{b} = \dfrac{70}{4} = 17.5$
Substituting this result into either equation (10) or (11), the OLS estimate for a is obtained:

$-110 + 6\hat{a} + 12\left(\dfrac{70}{4}\right) = 0$
$\hat{a} = \dfrac{110 - 3(70)}{6} = -100/6 = -16.67.$
The OLS estimates for this example are, therefore, $\hat{a} = -16.67$ and $\hat{b} = 17.5$. Given the original
simple linear equation and the three data points, an intercept of -16.67 and a slope of 17.5 are
obtained, giving the following estimated line: $\hat{y} = -16.67 + 17.5x$.
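The two first order conditions, equations (10) and (11), are simply a pair of linear equations in $\hat{a}$ and $\hat{b}$, so they can also be solved numerically. A minimal sketch using numpy (an assumed dependency, not part of the original notes):

```python
# Minimal sketch: equations (10) and (11) rearranged as a 2x2 linear system,
#   6*a + 12*b = 110   and   12*a + 28*b = 290,
# solved with numpy.linalg.solve.
import numpy as np

A = np.array([[6.0, 12.0],
              [12.0, 28.0]])
rhs = np.array([110.0, 290.0])

a_hat, b_hat = np.linalg.solve(A, rhs)
print(a_hat, b_hat)    # approximately -16.67 and 17.5
```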
The following information can be used in the general formulas to obtain the OLS estimates.
These general formulas avoid having to solve the FOC using the data observations.
            y_i     x_i     x_i*y_i     x_i^2
             40       3         120         9
              5       1           5         1
             10       2          20         4
Summation    55       6         145        14
Mean     18.333       2
Using this information, the following equations are obtained:

(12)   $\hat{b} = \dfrac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2} = \dfrac{3(145) - 55(6)}{3(14) - (6)^2} = \dfrac{435 - 330}{42 - 36} = \dfrac{105}{6} = 17.5$

       $\hat{a} = \bar{y} - \hat{b}\bar{x} = 18.333 - 17.5(2) = 18.333 - 35 = -16.67.$
Using the estimated equation, estimated errors for each observation are:

$\hat{u}_1 = 40 - (-16.67 + 17.5(3)) = 40 - 35.83 = 4.17$
$\hat{u}_2 = 5 - (-16.67 + 17.5(1)) = 5 - 0.83 = 4.17$
$\hat{u}_3 = 10 - (-16.67 + 17.5(2)) = 10 - 18.33 = -8.33.$
Second Order Conditions
As with all maximization and minimization problems, the second order conditions (SOC)
need to be checked to determine if the point given by the FOC conditions is a maximum or
minimum. The OLS problem will provide a minimum. Intuitively, a minimum is achieved
because the problem is to minimize the sum of squares. Squaring each residual forces each term
to be nonnegative. The problem becomes then minimizing the sum of nonnegative numbers.
Because all numbers are nonnegative, the smallest possible sum is zero. A zero sum occurs
when all residuals equal zero. This would be a perfect fit between the line and the data points. In
empirical studies, a perfect fit will not occur. Any nonzero residual will add to the sum of
squares. Thus, the sum of squared residuals must equal zero or a positive number. This
intuitive explanation indicates the problem will provide a minimum.
In figure 2, the general minimization problem is shown graphically (note, in the figure the
intercept is b1 instead of a and the slope parameter is b2; do not let the change in notation confuse
you, as an unknown variable can be called anything; this is the only good figure I could find).
For different estimates of a and b, the SSE is graphed. Notice the SSE varies as the estimates
for the intercept and slope change. Of importance here is the shape of the minimization problem.
Notice the bowl-shaped function. Such a shape assures the SOC will be satisfied.
Mathematically, the FOC for minimization of the SSE finds the lowest point on the SSE graph.
The SOC confirm the point is a minimum.
To use the simple second order condition test from the calculus review section of the
class, the second and cross partial derivatives are necessary. Recall, the incomplete test required,
for a minimum, the second order partials to be greater than zero and the two second order partials
multiplied together to be greater than the cross partial squared. The necessary partial
derivatives are:

(13)   $\dfrac{\partial^2 SSE}{\partial \hat{a}^2} = \sum_{i=1}^{n} 2 = 2n > 0$

       $\dfrac{\partial^2 SSE}{\partial \hat{b}^2} = \sum_{i=1}^{n} 2x_i^2 = 2\sum_{i=1}^{n} x_i^2 > 0$

       $\dfrac{\partial^2 SSE}{\partial \hat{a} \partial \hat{b}} = \sum_{i=1}^{n} 2x_i = 2\sum_{i=1}^{n} x_i.$

The second order partials are greater than zero, because only nonnegative numbers are involved.
An observation on the variable, x, may be negative, but squaring x results in a nonnegative number.
Further, it can be shown (although not intuitively) that $2n(2\sum x_i^2) > (2\sum x_i)^2$. This inequality
holds in part because one side of the equation is multiplied by the number of observations.
Example Continued. Continuing the example, it is necessary to check the second order
conditions. Recall, the three paired observations on x and y are (3, 40), (1, 5), and (2, 10).
Using these data points, the SOC are:

$\dfrac{\partial^2 SSE}{\partial \hat{a}^2} = \sum_{i=1}^{n} 2 = 2n = 2(3) = 6 > 0$

$\dfrac{\partial^2 SSE}{\partial \hat{b}^2} = \sum_{i=1}^{n} 2x_i^2 = 2\sum_{i=1}^{n} x_i^2 = 2(3^2 + 1^2 + 2^2) = 28 > 0$

$\left(\dfrac{\partial^2 SSE}{\partial \hat{a}^2}\right)\left(\dfrac{\partial^2 SSE}{\partial \hat{b}^2}\right) > \left(\dfrac{\partial^2 SSE}{\partial \hat{a} \partial \hat{b}}\right)^2: \quad 6(28) = 168 > (2(3 + 1 + 2))^2 = 144.$

Therefore, the second order conditions hold and the OLS estimates minimize the SSE w.r.t.
$\hat{a}$ and $\hat{b}$.
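The second order condition check can also be carried out numerically by forming the Hessian of the SSE. A minimal sketch (numpy assumed, not part of the original notes):

```python
# Minimal sketch: the Hessian of the SSE is [[2n, 2*sum(x)], [2*sum(x), 2*sum(x^2)]].
# A minimum requires the leading diagonal entry and the determinant to be positive.
import numpy as np

x = np.array([3.0, 1.0, 2.0])
n = len(x)

hessian = np.array([[2 * n,       2 * x.sum()],
                    [2 * x.sum(), 2 * (x ** 2).sum()]])

print(hessian[0, 0] > 0)              # 6 > 0
print(np.linalg.det(hessian) > 0)     # 6(28) - 12^2 = 24 > 0, so the SSE is minimized
```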
In figure 3, the minimization problem is shown graphically for our simple three-observation
example. For different estimates of a and b, the SSE is graphed. Notice the SSE
varies as the estimates for a and b change. As the estimates for a and b move away from the
OLS estimates of -16.67 and 17.5, the SSE increases.
KEY POINT: although often seen as using new ideas, the derivation of the OLS estimator uses
only simple algebra and the idea of minimization of a quadratic function. This is material that
was covered in the prerequisites for this class and reviewed in previous lectures. What is new is
the combining of the subject matters of economics (for the problem), algebra (model set up), and
calculus (minimization) to obtain the OLS estimates for a simple linear equation. Another point
often seen as problematic is that in previous classes, algebra was used to find x and y with a
and b as given constants. In econometrics, the x's and y's are given constants and estimates for a
and b are obtained.
KEY POINT: two assumptions are implicit in the previous derivation of the OLS estimator.
First, the original problem assumed the equation to be estimated was linear in a and b. Second, it
was assumed the FOC could be solved. Neither assumption is particularly restrictive. Notice,
the assumptions say nothing about the statistical distribution of the estimates, just that we can get
the estimates. One reason OLS is so powerful is that estimates can be obtained under these fairly
unrestrictive assumptions. Because the OLS estimates can be obtained easily, this also results in
OLS being misused. The discussion will return to these assumptions and additional assumptions
as we continue deriving the OLS estimator.
Algebraic Properties of the OLS Estimator
Several algebraic properties of the OLS estimator are shown here. The importance of
these properties is they are used in deriving goodness-of-fit measures and statistical properties of
the OLS estimator.
Algebraic Property 1. Using the FOC w.r.t. $\hat{a}$, it can be shown the sum of the residuals is equal
to zero:

(14)   $-\sum_{i} 2(y_i - \hat{a} - \hat{b}x_i) = 0$

       $-2\sum_{i=1}^{n} \hat{u}_i = 0 \;\Rightarrow\; \sum_{i=1}^{n} \hat{u}_i = 0.$

Here, the first equation is the FOC derived earlier. The definition for the residuals
($\hat{u}_i = y_i - \hat{a} - \hat{b}x_i$) is substituted into the FOC equation. The constant, -2, is then removed from
the summation. Because the constant -2 does not equal zero, the only way the FOC can equal
zero is for the sum of the residuals to equal zero. The sum of the residuals equaling zero implies the
mean of the residuals must also equal zero.
Algebraic Property 2. The point $(\bar{x}, \bar{y})$ will always be on the estimated line. Using the FOC
w.r.t. $\hat{a}$, this property can be shown by dividing both sides by -2, distributing the summation
operator through the equation, dividing both sides by the number of observations, and then
simplifying by using the definition of a mean. Mathematically, this property is derived as
follows,

(15)   $-\sum_{i} 2(y_i - \hat{a} - \hat{b}x_i) = 0$

       $\sum_{i} y_i - \sum_{i} \hat{a} - \sum_{i} \hat{b}x_i = \sum_{i} y_i - n\hat{a} - \hat{b}\sum_{i} x_i = 0$

       $\dfrac{\sum_{i} y_i}{n} - \hat{a} - \hat{b}\dfrac{\sum_{i} x_i}{n} = 0$

       $\bar{y} - \hat{a} - \hat{b}\bar{x} = 0$

       $\bar{y} = \hat{a} + \hat{b}\bar{x}.$
Algebraic Property 3. The sample covariance between $x_i$ and $\hat{u}_i$ is equal to zero. Recall, the
formula for calculating the covariance between any two variables, z and q, is
$cov = \dfrac{\sum_{i=1}^{n} (z_i - \bar{z})(q_i - \bar{q})}{n - 1}$. To show this property, the FOC w.r.t. $\hat{b}$ is used. By substituting in the
definition for the residuals, the FOC can be simplified to:

(16)   $-\sum_{i} 2(y_i - \hat{a} - \hat{b}x_i)x_i = 0 \;\Rightarrow\; \sum_{i} \hat{u}_i x_i = 0.$

Substituting this result, along with algebraic property 1 (the sum and mean of the residuals
equal zero), into the covariance formula, algebraic property 3 is derived:

(17)   $cov(x, \hat{u}) = \dfrac{\sum (x_i - \bar{x})(\hat{u}_i - \bar{\hat{u}})}{n - 1} = \dfrac{\sum x_i \hat{u}_i - \bar{\hat{u}} \sum x_i - \bar{x} \sum \hat{u}_i + n\bar{x}\bar{\hat{u}}}{n - 1} = \dfrac{0 - 0\sum x_i - \bar{x}(0) + n\bar{x}(0)}{n - 1} = 0.$

This shows the covariance between the estimated residual and the independent variable is equal
to zero.
Algebraic Property 4. The mean of the variable y will equal the mean of $\hat{y}$. Distributing
the summation operator through the definition of the estimated residual, using algebraic property
1, and using the definition of a mean shows this property. Mathematically, this property is
derived as:

(18)   $\hat{u}_i = y_i - \hat{y}_i$

       $y_i = \hat{y}_i + \hat{u}_i$

       $\sum_{i} y_i = \sum_{i} \hat{y}_i + \sum_{i} \hat{u}_i = \sum_{i} \hat{y}_i$

       $\dfrac{\sum_{i} y_i}{n} = \dfrac{\sum_{i} \hat{y}_i}{n}$

       $\bar{y} = \bar{\hat{y}}.$
Example Continued. Using the previous three-observation example, it can be shown the four
algebraic properties hold.

Algebraic Property 1. Using the residuals calculated earlier, the sum of the residuals is
4.17 + 4.17 - 8.33 = 0.01, which within rounding error equals zero.

Algebraic Property 2. Substituting the mean of the x's, 2, into the estimated equation, one
obtains y = -16.67 + 17.5(2) = 18.33, which equals the mean of the y variable, 18.33 = (40 + 5
+ 10)/3.

Algebraic Property 3. The covariance between x and $\hat{u}$ is given by

$\dfrac{(3 - 2)(4.17 - 0) + (1 - 2)(4.17 - 0) + (2 - 2)(-8.33 - 0)}{3 - 1} = \dfrac{1(4.17) - 1(4.17) + 0(-8.33)}{2} = 0.$

Thus, property 3 holds.

Algebraic Property 4. Using the earlier calculations for the estimated y's, the mean of the
estimated y's is (35.83 + 0.83 + 18.33)/3 = 18.33. Thus, the mean of y and the mean of $\hat{y}$ are equal.
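The four algebraic properties can also be verified numerically. A minimal Python sketch (numpy assumed; the estimates used are the exact values -100/6 and 17.5 from the example):

```python
# Minimal sketch: numerically checking the four algebraic properties of OLS.
import numpy as np

x = np.array([3.0, 1.0, 2.0])
y = np.array([40.0, 5.0, 10.0])
a_hat, b_hat = -100.0 / 6.0, 17.5                # exact OLS estimates for this data

y_hat = a_hat + b_hat * x
u_hat = y - y_hat

print(np.isclose(u_hat.sum(), 0))                        # Property 1: residuals sum to zero
print(np.isclose(a_hat + b_hat * x.mean(), y.mean()))    # Property 2: (x_bar, y_bar) is on the line
print(np.isclose(np.cov(x, u_hat)[0, 1], 0))             # Property 3: cov(x, u_hat) = 0
print(np.isclose(y_hat.mean(), y.mean()))                # Property 4: mean of y_hat equals mean of y
```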
KEY POINT: in deriving these algebraic properties, subject matter (definitions of mean and
covariance) that pertains to statistics has been added to the mix of economics, calculus, and
algebra.
Goodness-of-Fit
Up to this point, nothing has been stated about how “good” the estimated equation fits the
observed data. In this section, one measure of the goodness-of-fit is presented. This measure is
called the coefficient of determination or R2. The coefficient of determination measures the
amount of the sample variation in y that is explained by x.
To derive the coefficient of determination, three definitions are necessary. First, the total
sum of squares (SST) is defined as the total variation in y around its mean. This definition is
very similar to that of a variance. SST is defined as
(19)   $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2.$
Notice, this formula is the same as the formula for a variance, except the variation is not divided
by the degrees of freedom. We will return to this point later in the lectures. The second definition
needed is the explained sum of squares (SSR) or sum of squares of the regression (where the R comes
from). SSR is simply the variation of the estimated y's around their mean. SSR is defined as:

(20)   $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$
Recall, the actual values for y and the estimated values for y have the same mean. Therefore,
equations (19) and (20) are the same except that actual or estimated y's are used. The means are
the same. The third definition is the residual (error) sum of squares (SSE), which was earlier
defined as:

(21)   $SSE = \sum_{i=1}^{n} \hat{u}_i^2.$
SSE is the amount of variation not explained by the regression equation.
It can be shown that the total variation in y around its mean is equal to the amount of
variation explained by the regression plus the amount of variation not explained. Mathematically,
this statement is SST = SSR + SSE. To show this equation holds, algebraic properties (1) and (3)
derived earlier must be used. Using these two properties, and expanding the SST equation, the
necessary steps to show this equation holds are shown in Table 3.
Table 3. Algebraic Steps Involved in Showing the Equation SST = SSR + SSE Holds
Mathematical Derivation                                                                     Step involves
$SST = \sum (y_i - \bar{y})^2$                                                              Original equation
$= \sum [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2$                                      Add and subtract $\hat{y}_i$
$= \sum [\hat{u}_i + (\hat{y}_i - \bar{y})]^2$                                              Use the definition $\hat{u}_i = y_i - \hat{y}_i$
$= \sum [\hat{u}_i^2 + 2\hat{u}_i(\hat{y}_i - \bar{y}) + (\hat{y}_i - \bar{y})^2]$          Expand
$= \sum \hat{u}_i^2 + 2\sum \hat{u}_i(\hat{y}_i - \bar{y}) + \sum (\hat{y}_i - \bar{y})^2$  Distribute the summation operator
$= SSE + 2\sum \hat{u}_i(\hat{y}_i - \bar{y}) + SSR$                                        Using sum of squares definitions
$2\sum \hat{u}_i(\hat{y}_i - \bar{y}) \stackrel{?}{=} 0$                                    Need to show the middle term equals zero
$2\sum [\hat{u}_i(\hat{a} + \hat{b}x_i - \bar{y})] = 2\sum [\hat{u}_i\hat{a} + \hat{u}_i\hat{b}x_i - \hat{u}_i\bar{y}] \stackrel{?}{=} 0$   Use definition $\hat{y}_i = \hat{a} + \hat{b}x_i$
$\hat{a}\sum \hat{u}_i + \hat{b}\sum \hat{u}_i x_i - \bar{y}\sum \hat{u}_i \stackrel{?}{=} 0$   Divide by 2 and distribute the summation operator
$\hat{a}(0) + \hat{b}(0) - \bar{y}(0) = 0$                                                  Using algebraic properties 1 and 3
SST = SSR + SSE                                                                             Equation shown
From the equation SST = SSR + SSE, the coefficient of determination can be derived.
Taking this equation and dividing both sides by SST one obtains:
(22)   $\dfrac{SST}{SST} = \dfrac{SSR}{SST} + \dfrac{SSE}{SST} = 1.$
This equation is equal to one because any number divided by itself equals one. Rearranging this
equation the coefficient of determination is obtained:
(23)   $R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}.$
As shown in this equation, the coefficient of determination, R2, is the ratio of the amount
of variation explained to the total variation in y around its mean, or equivalently, one minus
the ratio of the amount of variation not explained to the total variation. Thus, R2 measures the
amount of sample variation in y that is explained by x. R2 can range from zero (no fit) to one
(perfect fit). If x explains no variation in y, SSR will equal zero. Looking at equation (23), a
zero for SSR gives a value of zero for R2. On the other hand, if x explains all the variation in y,
SSR will equal SST. In this case, R2 equals one. The values of zero and one are just the
theoretical limits of the range for the coefficient of determination. One will not usually see either
of these values when running a regression.
The coefficient of determination (and its adjusted value discussed later) is the most
common measure of the fit of an estimated equation to the observed data. Although the
coefficient of determination is the most common measure, it is not the only measure of the fit of
an equation. One needs to look at other measures of fit; that is, don't use R2 as your only gauge
of the fit of an estimated equation. Unfortunately, there is no cutoff value for R2 that signals a
good fit. Further, in economic data it is not uncommon to have low R2 values. This
is a fact of using socio-economic cross-sectional data. We will continue the discussion on R2
later in this class, when model specification is discussed.
Example Concluded. To finish our example, we need to calculate SST, SSR, and SSE, along
with the coefficient of determination. Using the residuals calculated earlier, SSE equals
SSE = (4.17)^2 + (4.17)^2 + (-8.33)^2 = 104.17.
The estimated y values and means calculated earlier are used to obtain SSR:
SSR = (35.83 - 18.33)^2 + (0.83 - 18.33)^2 + (18.33 - 18.33)^2 = 612.5.
Using the actual observations on y, SST equals:
SST = (40 - 18.33)^2 + (5 - 18.33)^2 + (10 - 18.33)^2 = 716.67.
Thus, the coefficient of determination is
R2 = 612.5 / 716.67 = 0.85 or
R2 = 1 - (104.17 / 716.67) = 0.85.
In our example, the amount of variation in y around its mean explained by the estimated equation
is 85%.
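The same calculations can be reproduced in a few lines of Python (numpy assumed, not part of the original notes); note that SST = SSR + SSE also checks out numerically (716.67 = 612.5 + 104.17).

```python
# Minimal sketch: SST, SSR, SSE, and the coefficient of determination R^2.
import numpy as np

x = np.array([3.0, 1.0, 2.0])
y = np.array([40.0, 5.0, 10.0])
a_hat, b_hat = -100.0 / 6.0, 17.5

y_hat = a_hat + b_hat * x
sse = ((y - y_hat) ** 2).sum()          # unexplained variation, about 104.17
ssr = ((y_hat - y.mean()) ** 2).sum()   # explained variation, 612.5
sst = ((y - y.mean()) ** 2).sum()       # total variation, about 716.67

r_squared = ssr / sst                   # equivalently 1 - sse / sst
print(sse, ssr, sst, r_squared)         # R^2 is about 0.85
```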
Important Terms / Concepts
Error Term
Residual
Deviation
Estimated error term
Hat symbol
Sum of squares
SSR
SST
SSE
Why OLS is powerful?
Why OLS is misused?
Four Algebraic properties
Goodness-of-fit
R2 - Coefficient of Determination
Range of R2
Linear Equation Review
Linear equations are simply equations for a line. Recall, from algebra, an equation
given by y = α + βx has a y-intercept equal to α and a slope equal to β. The y-intercept
gives the point where the line crosses the y-axis. If the intercept is 10, the line will cross
the y-axis at this value. Crossing the y-axis indicates an x-value of zero; therefore, the (x, y)
point (0, 10) is associated with this line. The slope indicates the rise and run of the line. A
slope equal to five indicates y increases by five units (the line rises five units) for each
one-unit increase in x (the line runs one unit). A positive slope indicates the line is upward
sloping, whereas a negative slope indicates a downward sloping line. Which of the
following equations are linear in x and y?

$y = 4 + 3x$
$y = a + 6x + 3x^2$
$y = a + 5 \ln x$

Only the first equation is linear in y and x. The second equation contains an x-squared
term. This equation is a quadratic equation. The third equation is also not a linear
equation in x and y; here the natural logarithm of x is in the equation. This equation is
commonly known as a semi-log equation. However, each equation remains linear in the
y-intercept and slope parameters.
Of the following equations, which one(s) are linear in the y-intercept and slope parameters?
$y = 5 + \beta^2 x$
$y = \alpha x^{\beta}$
$y = \ln \alpha + \beta \ln x$

Here, none of the equations is linear in the y-intercept and slope parameters. Each equation
contains a nonlinear component associated with at least one of the parameters. Linear in
parameters is given by

$y = \alpha + \beta x.$

As a final example, the following two equations are linear in the y-intercept and slope
parameters (α and β), but nonlinear in x:

$y = \alpha + \beta \ln x$
$y = \alpha + \beta x^2.$
Figure 1. Relationship Between Estimated Equation and Observed Value, Residuals
(The figure plots the estimated line and the three observed data points; residuals u1, u2, and u3 are the vertical distances between each observed value and the estimated line, with u1 and u2 positive and u3 negative.)
Figure 2. General simple linear regression problem of minimizing the sum of squared
errors w.r.t. the intercept (b1) and slope (b2) parameters. Note that in the notation of the text, a
denotes the intercept and b denotes the slope.
Figure 3. Graph showing the minimization of the sum of squared errors for the simple
linear regression example
(The figure graphs SSE on the vertical axis against candidate values of the intercept a and the slope b; the SSE increases as the estimates move away from -16.67 and 17.5.)