Linear Regression
Module - 12
Kiran Sultana
© Islamic Online University
Least Squares Method
The regression equation of X on Y is:
$\hat{X}_i = a + bY_i$
The regression equation of Y on X is:
$\hat{Y}_i = a + bX_i$
The values of a and b are found with the help of the normal equations given below:
(I) For the regression of X on Y:
$\sum X = na + b\sum Y$
$\sum XY = a\sum Y + b\sum Y^2$
(II) For the regression of Y on X:
$\sum Y = na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$
Example
From the following data obtain the two regression equations using the method of Least Squares.

    X      Y      XY     X^2    Y^2
    3      6      18      9     36
    2      1       2      4      1
    7      8      56     49     64
    4      5      20     16     25
    8      9      72     64     81
   ---    ---    ----   ----   ----
   24     29     168    142    207

$\sum X = 24$, $\sum Y = 29$, $\sum XY = 168$, $\sum X^2 = 142$, $\sum Y^2 = 207$
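Regression Equation of Y on X
$\hat{Y}_i = a + bX_i$
The normal equations are:
$\sum Y = na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$
with $n = 5$, $\sum X = 24$, $\sum Y = 29$, $\sum XY = 168$, $\sum X^2 = 142$, $\sum Y^2 = 207$.
Substituting the values into the normal equations:
29 = 5a + 24b ........................ (i)
168 = 24a + 142b
Dividing the second equation by 2:
84 = 12a + 71b ....................... (ii)
Multiplying equation (i) by 12 and equation (ii) by 5:
348 = 60a + 288b ..................... (iii)
420 = 60a + 355b ..................... (iv)
Subtracting (iii) from (iv) gives 72 = 67b, so b = 72/67 ≈ 1.07.
Substituting the unrounded b into (i) gives a = (29 − 24b)/5 ≈ 0.64 (back-substituting the rounded value b = 1.07 gives 0.66 instead).
$\hat{Y} = 0.64 + 1.07X$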
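The elimination can be cross-checked numerically. The following is a minimal Python sketch, assuming only the column totals from the example; the function name `solve_normal_equations` is hypothetical, and it uses Cramer's rule rather than the elimination steps of the slide. Because it keeps full precision, it gives a ≈ 0.64 and b ≈ 1.07.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_xx):
    """Solve the two normal equations of the regression of Y on X for (a, b).

        sum_y  = n * a     + b * sum_x
        sum_xy = a * sum_x + b * sum_xx
    """
    det = n * sum_xx - sum_x ** 2  # determinant of the 2x2 coefficient matrix
    if det == 0:
        raise ValueError("All X values are identical; the slope is undefined.")
    a = (sum_y * sum_xx - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Column totals taken from the worked example.
a, b = solve_normal_equations(n=5, sum_x=24, sum_y=29, sum_xy=168, sum_xx=142)
print(f"Y-hat = {a:.2f} + {b:.2f} X")
```

Swapping the roles of the two variables, that is, calling `solve_normal_equations(5, 29, 24, 168, 207)`, solves the normal equations for the regression of X on Y in the same way.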
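Regression Equation of X on Y
$\hat{X}_i = a + bY_i$
The normal equations are:
$\sum X = na + b\sum Y$
$\sum XY = a\sum Y + b\sum Y^2$
with $n = 5$, $\sum X = 24$, $\sum Y = 29$, $\sum XY = 168$, $\sum X^2 = 142$, $\sum Y^2 = 207$.
Substituting the values into the normal equations:
24 = 5a + 29b ........................ (i)
168 = 29a + 207b ..................... (ii)
Multiplying equation (i) by 29 and equation (ii) by 5, we get
696 = 145a + 841b .................... (iii)
840 = 145a + 1035b ................... (iv)
Subtracting (iii) from (iv) gives 144 = 194b, so b = 144/194 ≈ 0.74, and from (i) a = (24 − 29b)/5 ≈ 0.49.
$\hat{X} = 0.49 + 0.74Y$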
Simple Linear Regression Equation
The simple linear regression equation provides an estimate of the population regression line:
$\hat{y}_i = a + bx_i$
where
• $\hat{y}_i$ is the estimated (or predicted) y value for observation i
• $a$ is the estimate of the regression intercept
• $b$ is the estimate of the regression slope
• $x_i$ is the value of x for observation i
The individual random error terms $e_i$ have a mean of zero:
$\sum e_i = 0$
Assumption Required for a Linear Regression Analysis
1. The distribution of ε at any particular x value has mean value 0.
$E(\varepsilon_i) = 0$
2. The variance of ε is the same for any particular value of x.
$\mathrm{var}(\varepsilon_i) = E(\varepsilon_i^2) = \sigma^2$
3. The random errors are independent of one another.
$E(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$
4. The distribution of ε at any particular x value is normal.
$\varepsilon_i \sim N(0, \sigma^2)$
Probability Distribution of Y
For any fixed x, y has a normal distribution with mean $\alpha + \beta x$ and standard deviation $\sigma$.
[Figure: the regression line $E(Y) = \alpha + \beta X$, with positive and negative errors marked about the line.]
Error or Ceteris Paribus
Law of Demand:
“The quantity demanded increases (decreases) with a decrease (increase) in price, other things remaining the same, or ceteris paribus.”
Mathematically,
$Q_d = f(P) + \text{ceteris paribus}$
$Q_d = \alpha - \beta P + \text{ceteris paribus}$
where
$\alpha$ = autonomous demand
$\beta$ = slope of the demand curve
Replacing the ceteris paribus clause with an error term:
$Q_d = \alpha - \beta P + \varepsilon$
This means the error term ε plays the same role in Statistics as ceteris paribus does in Economics.
Regression Equation of Y on X (Alternative Approach)
The calculations by the least squares method are quite cumbersome when the values of X and Y are large, so the work can be simplified by using this method.
The formulas for calculating the regression equation by this method are:
$\hat{Y}_i = a + bX_i$
Slope
$b_{yx} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
or
$b_{yx} = \dfrac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$
Intercept
$a_{yx} = \bar{Y} - b_{yx}\bar{X}$
Regression Equation of X on Y (Alternative Approach)
When X is regressed on Y, we have the following equation:
$\hat{X}_i = a + bY_i$
Slope
$b_{xy} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}$
or
$b_{xy} = \dfrac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2}$
Intercept
$a_{xy} = \bar{X} - b_{xy}\bar{Y}$
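The two slope formulas of the alternative approach (deviation form and computational form) can be compared in code. This is a small Python sketch, assuming the five (X, Y) pairs of the worked example; the helper names `slope_deviation`, `slope_computational` and `intercept` are hypothetical. Passing the variables in the opposite order gives the regression of X on Y.

```python
def slope_deviation(u, v):
    """Slope of the regression of v on u: S(u - u_bar)(v - v_bar) / S(u - u_bar)^2."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    num = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    den = sum((ui - u_bar) ** 2 for ui in u)
    return num / den


def slope_computational(u, v):
    """Same slope from raw totals: (n*Suv - Su*Sv) / (n*Suu - (Su)^2)."""
    n = len(u)
    num = n * sum(ui * vi for ui, vi in zip(u, v)) - sum(u) * sum(v)
    den = n * sum(ui ** 2 for ui in u) - sum(u) ** 2
    return num / den


def intercept(u, v, b):
    """Intercept of the regression of v on u: v_bar - b * u_bar."""
    return sum(v) / len(v) - b * sum(u) / len(u)


X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]

b_yx = slope_deviation(X, Y)      # Y regressed on X
a_yx = intercept(X, Y, b_yx)
b_xy = slope_deviation(Y, X)      # X regressed on Y
a_xy = intercept(Y, X, b_xy)

print(f"Y on X: Y-hat = {a_yx:.2f} + {b_yx:.2f} X")
print(f"X on Y: X-hat = {a_xy:.2f} + {b_xy:.2f} Y")
```

Both forms are algebraically identical; the computational form avoids computing the means first, which is why it is convenient for hand calculation.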
Example
From the following data obtain the two regression equations using the method of Least Squares.

    X      Y
    3      6
    2      1
    7      8
    4      5
    8      9
For Solution, move to MS Excel
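The same example can also be worked by hand with the alternative-approach formulas. The following worked sketch assumes the totals computed earlier for this data ($n = 5$, $\sum X = 24$, $\sum Y = 29$, $\sum XY = 168$, $\sum X^2 = 142$, $\sum Y^2 = 207$, so $\bar{X} = 4.8$ and $\bar{Y} = 5.8$); intercepts are computed from the unrounded slopes.

```latex
\begin{align*}
b_{yx} &= \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}
        = \frac{5(168) - (24)(29)}{5(142) - (24)^2}
        = \frac{144}{134} \approx 1.07 \\
a_{yx} &= \bar{Y} - b_{yx}\bar{X}
        = 5.8 - \frac{144}{134}(4.8) \approx 0.64 \\[6pt]
b_{xy} &= \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2}
        = \frac{5(168) - (24)(29)}{5(207) - (29)^2}
        = \frac{144}{194} \approx 0.74 \\
a_{xy} &= \bar{X} - b_{xy}\bar{Y}
        = 4.8 - \frac{144}{194}(5.8) \approx 0.49
\end{align*}
```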
Properties of Least Squares
• The least squares regression line passes through the point of means $(\bar{X}, \bar{Y})$, the center of the observed data.
• The sum of the residuals is equal to zero: $\sum e_i = 0$.
• The sum of the observed values $Y_i$ and the sum of the fitted values $\hat{Y}_i$ are equal: $\sum Y_i = \sum \hat{Y}_i$.
• The mean of the observed values $Y_i$ and the mean of the fitted values $\hat{Y}_i$ are equal: $\bar{Y} = \bar{\hat{Y}}$.
• The sum of squares of the residuals is a minimum: $\sum e_i^2$ is minimum.
Move to MS Excel
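These properties can be checked numerically outside Excel as well. The sketch below is a Python illustration, assuming the five (X, Y) pairs of the worked example; the minimum property is only spot-checked by comparing the fitted line with four slightly perturbed lines, which is not a proof.

```python
X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Least squares slope and intercept for the regression of Y on X.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
a = y_bar - b * x_bar


def sse(a0, b0):
    """Sum of squared residuals for the candidate line a0 + b0 * X."""
    return sum((y - (a0 + b0 * x)) ** 2 for x, y in zip(X, Y))


fitted = [a + b * x for x in X]
residuals = [y - f for y, f in zip(Y, fitted)]

checks = {
    "line passes through (x_bar, y_bar)": abs((a + b * x_bar) - y_bar) < 1e-9,
    "sum of residuals is zero": abs(sum(residuals)) < 1e-9,
    "sum of observed equals sum of fitted": abs(sum(Y) - sum(fitted)) < 1e-9,
    "mean of observed equals mean of fitted": abs(y_bar - sum(fitted) / n) < 1e-9,
    "nearby lines have a larger SSE": all(
        sse(a + da, b + db) > sse(a, b)
        for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]
    ),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```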
Standard Error of Estimate
“The degree of scatter of the observed values about the regression line is measured by the standard deviation of regression, or standard error of estimate.”
For population data, the standard error of the observations about the true regression line $E(Y) = \alpha + \beta X$, denoted by $\sigma_{y.x}$, is given by
$\sigma_{y.x} = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{N}}$
For sample data:
$s_{y.x} = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n - 2}}$
or
$s_{y.x} = \sqrt{\dfrac{\sum Y_i^2 - a\sum Y_i - b\sum X_i Y_i}{n - 2}}$
Move to MS Excel
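The two sample formulas for the standard error of estimate can be compared directly. This is a minimal Python sketch, assuming the five (X, Y) pairs of the worked example and the regression of Y on X; the variable names are illustrative only.

```python
import math

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Least squares line of Y on X.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
a = y_bar - b * x_bar

# Definition: sum of squared deviations of Y from the fitted values.
sse_direct = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

# Shortcut: S(Y^2) - a*S(Y) - b*S(XY), using only column totals.
sum_y = sum(Y)
sum_y2 = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sse_shortcut = sum_y2 - a * sum_y - b * sum_xy

# Sample standard error of estimate with n - 2 degrees of freedom.
s_yx = math.sqrt(sse_direct / (n - 2))
print(f"s_y.x = {s_yx:.3f}")
```

The shortcut form is sensitive to rounding: it should be evaluated with unrounded a and b, since rounded coefficients multiplied by large totals can change the result noticeably.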
Total Variation, Explained Variation and Unexplained Variation
Total Variation or T.S.S (total sum of squares):
“The variability among the values of the dependent variable Y, coming both from the explanatory variables and from everything captured in the error term.”
$\text{T.S.S} = \sum (Y_i - \bar{Y})^2$
Explained Variation or R.S.S (regression sum of squares):
The variation coming from the explanatory variables, e.g., the change in demand due to a change in price.
$\text{R.S.S} = \sum (\hat{Y}_i - \bar{Y})^2$
It is the sum of squares of the deviations of the predicted values $\hat{Y}_i$ from the sample mean $\bar{Y}$.
Unexplained Variation or E.S.S (error sum of squares):
The variation coming from variables that are not in the model, e.g., the change in demand due to a change in income.
$\text{E.S.S} = \sum (Y_i - \hat{Y}_i)^2$
It is the sum of squares of the errors of prediction.
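Total Variation, Explained Variation and Unexplained Variation
$\text{T.S.S} = \text{R.S.S} + \text{E.S.S}$
[Figure: the total variation split into $\text{E.S.S} = \sum (Y_i - \hat{Y}_i)^2$ and $\text{R.S.S} = \sum (\hat{Y}_i - \bar{Y})^2$.]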
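The decomposition of the total variation can be verified on a small data set. The following Python sketch assumes the five (X, Y) pairs of the worked example and the least squares line of Y on X; the names `tss`, `rss` and `ess` follow the abbreviations used here (R.S.S for explained, E.S.S for unexplained variation).

```python
X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Least squares line of Y on X.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
a = y_bar - b * x_bar
fitted = [a + b * x for x in X]

tss = sum((y - y_bar) ** 2 for y in Y)                # total variation
rss = sum((f - y_bar) ** 2 for f in fitted)           # explained variation
ess = sum((y - f) ** 2 for y, f in zip(Y, fitted))    # unexplained variation

print(f"T.S.S = {tss:.3f}")
print(f"R.S.S + E.S.S = {rss:.3f} + {ess:.3f} = {rss + ess:.3f}")
```

The identity holds exactly only for the least squares line with an intercept; for any other line the cross-product term does not vanish.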
Coefficient of Determination
“The ratio of the explained variation to the total variation is called the coefficient of determination and is denoted by $r^2$.”
$r^2 = \dfrac{\text{Explained Variation}}{\text{Total Variation}} = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$
$r^2 = 1 - \dfrac{\text{E.S.S}}{\text{T.S.S}} = 1 - \dfrac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$
Coefficient of Determination
Simplifying the ratio,
$r^2 = \dfrac{a\sum Y + b\sum XY - \dfrac{(\sum Y)^2}{n}}{\sum Y^2 - \dfrac{(\sum Y)^2}{n}}$
• $r^2$ is the square of the correlation coefficient.
• $r^2$ is a number between zero and one, and a value close to zero suggests a poor model: $0 \le r^2 \le 1$.
• It gives the proportion of variation in y that can be attributed to an approximate linear relationship between x and y.
Move to MS Excel
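The different expressions for the coefficient of determination should all give the same number. This Python sketch, assuming the five (X, Y) pairs of the worked example and the regression of Y on X, computes it as explained over total variation, as one minus unexplained over total variation, with the simplified totals formula, and as the square of the correlation coefficient.

```python
import math

X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

sxx = sum((x - x_bar) ** 2 for x in X)
syy = sum((y - y_bar) ** 2 for y in Y)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

# Least squares line of Y on X.
b = sxy / sxx
a = y_bar - b * x_bar
fitted = [a + b * x for x in X]

tss = syy
rss = sum((f - y_bar) ** 2 for f in fitted)
ess = sum((y - f) ** 2 for y, f in zip(Y, fitted))

r2_ratio = rss / tss                       # explained / total
r2_complement = 1 - ess / tss              # 1 - unexplained / total

sum_y = sum(Y)
sum_y2 = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
r2_shortcut = (a * sum_y + b * sum_xy - sum_y ** 2 / n) / (sum_y2 - sum_y ** 2 / n)

r = sxy / math.sqrt(sxx * syy)             # correlation coefficient
print(f"r^2 = {r2_ratio:.4f}, r = {r:.4f}")
```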
Coefficient of Non-Determination
$\hat{Y}_i = a + bX_i$
Logically, it is the proportion of the total variation in Y that is not explained by the variation in X.
Mathematically,
$1 - r^2$
For Example:
When $r^2 = 0.75$, then $1 - r^2 = 0.25$.
This means 25% of the variation in Y is not accounted for by the variation in the independent variable.
Prediction in Regression Analysis
“The regression line is used to predict the response Y for a specific value of the explanatory variable X.”
Accuracy of prediction:
The accuracy of a prediction from a regression line depends on how much scatter the data show about the line.
• If the data points are very close to the line, we can be confident that our prediction is accurate; if they are widely scattered, the prediction is less reliable.
Interpolation and Extrapolation
Extrapolation and interpolation are both used to estimate hypothetical values for a variable based upon other observations.
To tell the difference between extrapolation and interpolation, we need to look at the prefixes “extra” and “inter.”
• The prefix “extra” means “outside” or “in addition to.”
• The prefix “inter” means “in between” or “among.”
Just knowing these meanings (from their origins in Latin) goes a long way toward distinguishing between the two methods.
Interpolation and Extrapolation
Interpolation is estimating data points that fall within the range of the data you have, i.e. between your existing data points.
For Example:
• Suppose that data with x between 0 and 10 is used to produce a regression line y = 2x + 5. We can use this line of best fit to estimate the y value corresponding to x = 6: y = 2(6) + 5 = 17.
• Because our x value is within the range of values used to make the line of best fit, this is an example of interpolation.
Extrapolation is estimating data points from beyond the range of your data set.
For Example:
• Suppose as before that data with x between 0 and 10 is used to produce a regression line y = 2x + 5. We can use this line of best fit to estimate the y value corresponding to x = 20: y = 2(20) + 5 = 45.
• Because our x value is not within the range of values used to make the line of best fit, this is an example of extrapolation.
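The distinction can be made explicit in code by checking whether the requested x lies inside the observed range. This is a small Python sketch using the line y = 2x + 5 and the range 0 to 10 from the examples; the function name `predict` and its parameters are hypothetical.

```python
def predict(x, a=5.0, b=2.0, x_min=0.0, x_max=10.0):
    """Predict y = a + b*x and label the prediction by where x falls.

    x_min and x_max are the smallest and largest x values that were
    used to fit the line.
    """
    y_hat = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind


for x in (6, 20):
    y_hat, kind = predict(x)
    print(f"x = {x}: y-hat = {y_hat} ({kind})")
```

Flagging extrapolated predictions is useful because the fitted line is only supported by data inside the observed range, so predictions outside it deserve extra caution.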
Advantages of Regression Analysis
• Predicting the Future
• Supporting Decision
• Correcting Errors
• New Insights