final ppt regression

The British biometrician Sir Francis Galton
first used the term "regression" in the latter
part of the 19th century.

Regression means the estimation or prediction of the
unknown value of a dependent variable from the known value
of an independent variable. In other words, regression analysis
is a mathematical measure of the average relationship
between two or more variables.
Definition of Regression
M.M. Blair: Regression is the measure of the
average relationship between two or more variables.
Hirsch: Regression analysis measures the nature and
extent of the relation between two or more variables, and thus
enables us to make predictions.
1. Regression analysis is essential for
planning and policy making: It is widely used
to estimate the coefficients of economic
relationships. These coefficients help in the
formulation of the economic policy of the government.
2. Testing economic theory: Regression analysis
is one of the basic tools for testing the accuracy of
economic theory.
3. Prediction: By regression analysis, the value of a
dependent variable can be predicted on the basis of
the value of an independent variable. For example,
if the price of a commodity rises, the probable fall
in demand can be predicted by regression.
4. Regression is used to study functional
relationships: In agriculture, for example, it is
important to study the response to fertilizer.
Basis | Correlation | Regression
1. Degree and nature of relationship | Correlation is a measure of the degree of relationship between X and Y. | Regression studies the relationship between two variables so that one may be able to predict the value of one variable on the basis of another.
2. Cause and effect relationship | Correlation does not always assume a cause and effect relationship. | Regression analysis expresses the cause and effect relationship.
3. Prediction | Correlation does not make any prediction. | Regression analysis enables us to make predictions.
4. Nonsense correlation | In correlation analysis there is sometimes a nonsense correlation, such as between a rise in income and a rise in weight. | In regression analysis there is nothing like nonsense regression.
1. Simple and Multiple Regression: In simple regression
analysis, we study only two variables at a time, of which one
is dependent and the other independent. The relationship
between income and expenditure is an example of simple
regression. In multiple regression analysis, we study more
than two variables at a time, of which one is the dependent
variable and the others are independent variables. The study
of the effect of rain and irrigation on the yield of wheat is
an example of multiple regression.
2. Linear and Non-linear Regression: When one variable
changes with the other in a fixed ratio, the regression is
called linear. When one variable varies with the other in a
changing ratio, it is called non-linear regression.
3. Partial and Total Regression: When two or more variables
are studied for a functional relationship, but the relationship
between only two variables is studied at a time while the other
variables are held constant, it is known as partial regression.
When all the variables are studied simultaneously for the
relationship among them, it is called total regression.
Regression Lines
Regression Equations
Regression Coefficients

The regression line shows the average relationship
between two variables. It is also known as the
line of best fit. If two variables X and Y are given,
then there are two regression lines related to them:
1. Regression Line of X on Y: The regression
line of X on Y gives the best estimate of the value
of X for any given value of Y.
2. Regression Line of Y on X: The regression
line of Y on X gives the best estimate of the value
of Y for any given value of X.
Regression lines can be constructed by two methods:
1. Scatter diagram method
2. Least squares method

The scatter diagram method is the simplest method of
constructing regression lines. In this method, the values of
the related variables are plotted on a graph, and a line is
drawn freehand through the plotted points. The regression
line obtained can be linear (a straight line) or non-linear
(a curve).

Regression lines can also be constructed by the least squares
method. Under this method a regression line is fitted through
the plotted points in such a way that the sum of squares of the
deviations of the observed values from the fitted line is the
least possible. The line drawn by this method is called the
line of best fit. Both regression lines are drawn in such a
way that the sum of squared deviations is a minimum.
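
The least squares criterion can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the helper names fit_y_on_x and sse are hypothetical, and the five (X, Y) pairs are the ones used in the least squares worked example of this deck. It fits the line of Y on X and confirms that nudging the intercept or the slope only increases the sum of squared deviations.

```python
def fit_y_on_x(xs, ys):
    """Return (a, b) of the least squares line Y = a + bX."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared vertical deviations of the points from Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
a, b = fit_y_on_x(xs, ys)
best = sse(a, b, xs, ys)

# Any small change to the intercept or slope makes the fit worse.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(a + da, b + db, xs, ys) > best

print(f"Y = {a:.2f} + {b:.2f}X, sum of squared deviations = {best:.2f}")
```

Only vertical deviations (in Y) are measured here, which is why this gives the line of Y on X; the line of X on Y minimises horizontal deviations instead.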

Regression equations are the algebraic formulation
of regression lines. There are two regression equations:
1. Regression equation of Y on X
2. Regression equation of X on Y





Regression Equation of Y on X
This equation is used to estimate the probable value
of Y on the basis of the given values of X. This
equation can be expressed in the following way:
Y = a + bX
Here a and b are constants.
The regression equation of Y on X can also be
presented in another way:
Y − Y‾ = byx(X − X‾)





Regression Equation of X on Y
This equation is used to estimate the probable value
of X on the basis of the given values of Y. This
equation is expressed in the following way:
X = a + bY
Here a and b are constants.
The regression equation of X on Y can also be written
in the following way:
X − X‾ = bxy(Y − Y‾)

Regression coefficients measure the average change in the
value of one variable for a unit change in the value
of another variable. A regression coefficient
represents the slope of a regression line. There are
two regression coefficients:
1. Regression coefficient of Y on X
2. Regression coefficient of X on Y






Regression coefficient of Y on X
This coefficient shows the average change in the value of
the Y variable for a unit change in the value of the X
variable. It is represented by byx. Its formula is as
follows:
byx = r × σy ÷ σx
Regression coefficient of X on Y
This coefficient shows the average change in the value of
the X variable for a unit change in the value of the Y
variable. It is represented by bxy. Its formula is as
follows:
bxy = r × σx ÷ σy

Following are the properties of regression coefficients:

1. The coefficient of correlation is the geometric
mean of the regression coefficients:
r = ±√(bxy × byx)
2. Both regression coefficients must have the same
sign. Either both are positive or both are negative.
When one regression coefficient is negative, the other
is also negative; it is never possible for one
regression coefficient to be negative while the other
is positive.

3. The coefficient of correlation has the same sign as
the regression coefficients. If both regression
coefficients are negative, then r is negative. If byx and
bxy are positive, then r is also positive.

4. Both regression coefficients cannot be greater than
unity. If the regression coefficient of Y on X is greater
than unity, then the regression coefficient of X on Y must
be less than unity. This is because
r² = byx × bxy, and r² cannot exceed 1.
5. The arithmetic mean of the two regression
coefficients is equal to or greater than the
correlation coefficient. In terms of the formula:
(byx + bxy) ÷ 2 ≥ r
6. A shift of origin does not affect the regression
coefficients, but a change of scale does. Regression
coefficients are independent of the change of origin
but not of scale.
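
The properties above can be verified on a small data set. The sketch below is illustrative only: the function name coefficients is hypothetical, r is computed with the standard product-moment formula (which these slides do not state), and the data are the five pairs from the actual-values example in this deck.

```python
from math import sqrt


def coefficients(xs, ys):
    """Return (byx, bxy, r) computed from raw values."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den_x = n * sxx - sx ** 2
    den_y = n * syy - sy ** 2
    return num / den_x, num / den_y, num / sqrt(den_x * den_y)


xs = [2, 4, 6, 8, 10]
ys = [5, 7, 9, 8, 11]
byx, bxy, r = coefficients(xs, ys)

# Property 1: |r| is the geometric mean of the two coefficients.
assert abs(abs(r) - sqrt(byx * bxy)) < 1e-12
# Properties 2 and 3: byx, bxy and r all carry the same sign.
assert (byx > 0) == (bxy > 0) == (r > 0)
# Property 4: the product is r squared, so it cannot exceed 1.
assert byx * bxy <= 1
# Property 5: the arithmetic mean is at least r.
assert (byx + bxy) / 2 >= r
# Property 6: a shift of origin changes nothing ...
shifted = coefficients([x - 6 for x in xs], [y - 8 for y in ys])
assert abs(shifted[0] - byx) < 1e-12 and abs(shifted[1] - bxy) < 1e-12
# ... but multiplying X by 10 divides byx by 10 and multiplies bxy by 10.
scaled = coefficients([10 * x for x in xs], ys)
assert abs(scaled[0] - byx / 10) < 1e-12 and abs(scaled[1] - bxy * 10) < 1e-12
```

Here bxy is greater than 1 while byx is less than 1, which is the situation property 4 describes.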
Regression Equations in case of Individual Series
Regression Equations in case of Grouped Data

In individual series, regression equations can be
worked out by two methods:
1. Using normal equations
2. Using regression coefficients











The normal equations method is also called the least squares method.
Under this method, regression equations are computed by
solving two normal equations.
Regression Equation of Y on X
Y = a + bX
Under the least squares method, the values of a and b are obtained
from the following two normal equations:
∑Y = Na + b∑X
∑XY = a∑X + b∑X²
Solving these equations, we get the following values of a and
b:
b = (N × ∑XY − ∑X × ∑Y) ÷ (N × ∑X² − (∑X)²)
a = Y‾ − bX‾
Finally, a and b are put in the equation:
Y = a + bX






The regression equation of X on Y is expressed as follows:
X = a + bY
Under the least squares method, the values of a and b are
obtained from the following normal equations:
∑X = Na + b∑Y
∑XY = a∑Y + b∑Y²
Solving these equations, we get the values
of a and b.
The calculated values of a and b are put in the equation:
X = a + bY

Calculate the regression equation of X on Y from
the following data by the least squares method:
  X     Y     X²    Y²    XY
  1     2      1     4     2
  2     5      4    25    10
  3     3      9     9     9
  4     8     16    64    32
  5     7     25    49    35
N=5  ∑X=15  ∑Y=25  ∑X²=55  ∑Y²=151  ∑XY=88














Regression Equation of X on Y
X = a + bY
The two normal equations are:
∑X = Na + b∑Y
∑XY = a∑Y + b∑Y²
Substituting the values, we get:
15 = 5a + 25b ... (1)
88 = 25a + 151b ... (2)
Multiplying equation (1) by 5 and subtracting it from equation (2):
88 = 25a + 151b
75 = 25a + 125b
13 = 26b
b = 13 ÷ 26 = 0.5






Putting the value of b in the first equation:
15 = 5a + 25 × 0.5
15 = 5a + 12.5
5a = 2.5
a = 0.5
Therefore: X = 0.5 + 0.5Y
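
The elimination above can be cross-checked by solving the two normal equations directly. This is a minimal Python sketch, not the slides' own procedure: the function name solve_normal_equations and its parameter names are hypothetical, and it uses Cramer's rule instead of elimination. The data are the five pairs from the table in this example.

```python
def solve_normal_equations(n, sum_dep, sum_ind, sum_prod, sum_ind_sq):
    """Solve the two normal equations for (a, b) by Cramer's rule:

        sum_dep  = n * a       + b * sum_ind
        sum_prod = a * sum_ind + b * sum_ind_sq

    'dep' is the variable being estimated, 'ind' the given variable.
    """
    det = n * sum_ind_sq - sum_ind ** 2
    a = (sum_dep * sum_ind_sq - sum_ind * sum_prod) / det
    b = (n * sum_prod - sum_ind * sum_dep) / det
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]

# X on Y: X is the estimated variable, Y is the given variable.
a, b = solve_normal_equations(
    n=len(xs),
    sum_dep=sum(xs),
    sum_ind=sum(ys),
    sum_prod=sum(x * y for x, y in zip(xs, ys)),
    sum_ind_sq=sum(y * y for y in ys),
)
print(f"X = {a} + {b}Y")  # prints: X = 0.5 + 0.5Y
```

Swapping the roles of the two lists in the call would give the regression equation of Y on X instead.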





Regression equations can also be computed with the
help of regression coefficients. For this we have
to find X‾, Y‾, byx and bxy from the data.
Regression equations can be computed from the
regression coefficients by the following methods:
1. Using actual values of X and Y
2. Using deviations from actual means
3. Using deviations from assumed means
4. Using r, σx, σy, X‾ and Y‾







In this method, the actual values of X and Y are used to
determine the regression equations. The regression equations
are written in the following way:
Regression Equation of Y on X
Y − Y‾ = byx(X − X‾)
Using the actual values, the value of byx can be calculated
as:
byx = (N × ∑XY − ∑X × ∑Y) ÷ (N × ∑X² − (∑X)²)
Regression Equation of X on Y
X − X‾ = bxy(Y − Y‾)
Using the actual values, bxy can be calculated as:
bxy = (N × ∑XY − ∑X × ∑Y) ÷ (N × ∑Y² − (∑Y)²)

Obtain the two regression equations from the following data:
  X     Y     XY     X²     Y²
  2     5     10      4     25
  4     7     28     16     49
  6     9     54     36     81
  8     8     64     64     64
 10    11    110    100    121
N=5  ∑X=30  ∑Y=40  ∑XY=266  ∑X²=220  ∑Y²=340















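Regression coefficient of Y on X
byx = (N × ∑XY − ∑X × ∑Y) ÷ (N × ∑X² − (∑X)²)
= (5 × 266 − 30 × 40) ÷ (5 × 220 − (30)²)
= (1330 − 1200) ÷ (1100 − 900)
= 130 ÷ 200
= 0.65
Regression coefficient of X on Y
bxy = (N × ∑XY − ∑X × ∑Y) ÷ (N × ∑Y² − (∑Y)²)
= (5 × 266 − 30 × 40) ÷ (5 × 340 − (40)²)
= (1330 − 1200) ÷ (1700 − 1600)
= 130 ÷ 100
= 1.30
To use Y − Y‾ = byx(X − X‾), the means are also needed:
Y‾ = ∑Y ÷ N
= 40 ÷ 5
= 8
X‾ = ∑X ÷ N
= 30 ÷ 5
= 6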












Regression equation of Y on X
Y−Y‾=byx(X−X‾)
Y−8=0.65(X− 6)
Y=4.1+0.65X
Regression equation of X on Y
X−X‾=bxy(Y−Y‾)
X−6=1.30(Y−8)
X=−4.40+1.30Y
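
The whole actual-values method fits in a few lines of code. The following Python sketch is an illustration under stated assumptions: the function name regression_lines is hypothetical, and each equation is returned in intercept-and-slope form by using the fact that both lines pass through (X‾, Y‾).

```python
def regression_lines(xs, ys):
    """Return ((a_yx, byx), (a_xy, bxy)) for the two lines
    Y = a_yx + byx * X  and  X = a_xy + bxy * Y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    byx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    bxy = (n * sxy - sx * sy) / (n * syy - sy ** 2)
    x_bar, y_bar = sx / n, sy / n
    # Each line passes through the point of means, which fixes its intercept.
    return (y_bar - byx * x_bar, byx), (x_bar - bxy * y_bar, bxy)


(a_yx, byx), (a_xy, bxy) = regression_lines([2, 4, 6, 8, 10], [5, 7, 9, 8, 11])
print(f"Y = {a_yx:.2f} + {byx:.2f}X")
print(f"X = {a_xy:.2f} + {bxy:.2f}Y")
```

With the data of this example the two printed lines agree with the equations worked out by hand above.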










When the values of X and Y are very large, the method
using actual values becomes very difficult to use. In such cases,
deviations taken from the arithmetic means are used.
Regression equation of Y on X
Y − Y‾ = byx(X − X‾)
Using deviations from actual means:
byx = ∑xy ÷ ∑x²
where x = X − X‾ and y = Y − Y‾
Regression equation of X on Y
X − X‾ = bxy(Y − Y‾)
Using deviations from actual means:
bxy = ∑xy ÷ ∑y²

Obtain the two regression equations from the
following data:
  X     Y    x = X−X‾    x²    y = Y−Y‾    y²    xy
  2     4      −5        25      −1         1     5
  4     2      −3         9      −3         9     9
  6     5      −1         1       0         0     0
  8    10       1         1       5        25     5
 10     3       3         9      −2         4    −6
 12     6       5        25       1         1     5
N=6  ∑X=42  ∑Y=30  ∑x=0  ∑x²=70  ∑y=0  ∑y²=40  ∑xy=18
X‾ = 42 ÷ 6 = 7, Y‾ = 30 ÷ 6 = 5
Since the actual means of X and Y are whole numbers, we
should take deviations from X‾ and Y‾.

















byx = ∑xy ÷ ∑x²
= 18 ÷ 70
= 0.257
bxy = ∑xy ÷ ∑y²
= 18 ÷ 40
= 0.45
Regression equation of Y on X
Y − Y‾ = byx(X − X‾)
Y − 5 = 0.257(X − 7)
Y − 5 = 0.257X − 1.799
Y = 0.257X + 3.201
Regression equation of X on Y
X − X‾ = bxy(Y − Y‾)
X − 7 = 0.45(Y − 5)
X − 7 = 0.45Y − 2.25
X = 0.45Y + 4.75
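
The deviation method is easy to reproduce in code. The sketch below is a minimal Python illustration; the function name coefficients_from_mean_deviations is hypothetical, and the data are the six pairs from the table of this example.

```python
def coefficients_from_mean_deviations(xs, ys):
    """Return (byx, bxy, x_bar, y_bar) using deviations from the actual means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dev_x = [x - x_bar for x in xs]   # x = X - mean of X
    dev_y = [y - y_bar for y in ys]   # y = Y - mean of Y
    sum_xy = sum(p * q for p, q in zip(dev_x, dev_y))
    sum_x2 = sum(p * p for p in dev_x)
    sum_y2 = sum(q * q for q in dev_y)
    return sum_xy / sum_x2, sum_xy / sum_y2, x_bar, y_bar


byx, bxy, x_bar, y_bar = coefficients_from_mean_deviations(
    [2, 4, 6, 8, 10, 12], [4, 2, 5, 10, 3, 6]
)
print(f"Y = {byx:.3f}X + {y_bar - byx * x_bar:.3f}")
print(f"X = {bxy:.2f}Y + {x_bar - bxy * y_bar:.2f}")
```

Without intermediate rounding the intercept of Y on X is 3.2; the hand calculation gives 3.201 only because byx is rounded to 0.257 before multiplying.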





When the actual means turn out to be fractions rather
than whole numbers (for example 24.14 or 56.89), deviations
are taken from assumed means rather than from the actual
means. The regression equations are as follows:
Regression equation of Y on X
Y − Y‾ = byx(X − X‾)
Using deviations from the assumed means, the value of
byx can be calculated as:
byx = (N × ∑dxdy − ∑dx × ∑dy) ÷ (N × ∑dx² − (∑dx)²)

Regression equation of X on Y
X − X‾ = bxy(Y − Y‾)
Using deviations from the assumed means, the value of bxy
can be calculated as:
bxy = (N × ∑dxdy − ∑dx × ∑dy) ÷ (N × ∑dy² − (∑dy)²)
where dx = X − Ax and dy = Y − Ay, with Ax and Ay the assumed
means of X and Y.

Obtain the two regression equations for the
following data (assumed means: A = 69 for X, A = 112 for Y):
  X     Y    dx = X−69    dx²    dy = Y−112    dy²    dxdy
 78   125        9         81       13         169     117
 89   137       20        400       25         625     500
 97   156       28        784       44        1936    1232
 69   112        0          0        0           0       0
 59   107      −10        100       −5          25      50
 79   136       10        100       24         576     240
 68   124       −1          1       12         144     −12
 61   108       −8         64       −4          16      32
N=8  ∑X=600  ∑Y=1005  ∑dx=48  ∑dx²=1530  ∑dy=109  ∑dy²=3491  ∑dxdy=2159










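byx = (N × ∑dxdy − ∑dx × ∑dy) ÷ (N × ∑dx² − (∑dx)²)
= (8 × 2159 − 48 × 109) ÷ (8 × 1530 − (48)²)
= (17272 − 5232) ÷ (12240 − 2304)
= 12040 ÷ 9936
= 1.212
Regression equation of Y on X
Y − Y‾ = byx(X − X‾)
Here X‾ = 600 ÷ 8 = 75 and Y‾ = 1005 ÷ 8 = 125.625
Y − 125.625 = 1.212(X − 75)
Y − 125.625 = 1.212X − 90.9
Y = 1.212X + 34.725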










bxy = (N × ∑dxdy − ∑dx × ∑dy) ÷ (N × ∑dy² − (∑dy)²)
= (8 × 2159 − 48 × 109) ÷ (8 × 3491 − (109)²)
= (17272 − 5232) ÷ (27928 − 11881)
= 12040 ÷ 16047
= 0.750
Regression equation of X on Y
X − X‾ = bxy(Y − Y‾)
X − 75 = 0.750(Y − 125.625)
X − 75 = 0.750Y − 94.219
X = 0.750Y − 19.219
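
The arithmetic of the assumed-mean method can be checked with a short script. This is a sketch under stated assumptions, not part of the original slides: the function name coefficients_from_assumed_means is hypothetical, and the data and assumed means (69 and 112) come from the table of this example. Note that the denominator for bxy uses (∑dy)², that is, the square of the sum of the dy column.

```python
def coefficients_from_assumed_means(xs, ys, ax, ay):
    """Return (byx, bxy) using deviations dx = X - ax and dy = Y - ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(p * q for p, q in zip(dx, dy))
    s_dx2 = sum(p * p for p in dx)
    s_dy2 = sum(q * q for q in dy)
    num = n * s_dxdy - s_dx * s_dy
    byx = num / (n * s_dx2 - s_dx ** 2)
    bxy = num / (n * s_dy2 - s_dy ** 2)
    return byx, bxy


xs = [78, 89, 97, 69, 59, 79, 68, 61]
ys = [125, 137, 156, 112, 107, 136, 124, 108]
byx, bxy = coefficients_from_assumed_means(xs, ys, ax=69, ay=112)

# The choice of assumed means is arbitrary: any other origin gives the same result.
other = coefficients_from_assumed_means(xs, ys, ax=0, ay=0)
assert abs(other[0] - byx) < 1e-9 and abs(other[1] - bxy) < 1e-9

print(round(byx, 3), round(bxy, 3))
```

A quick sanity check on any hand calculation is that byx × bxy must lie between 0 and 1, since the product equals r².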







When the values of X‾, Y‾, σx, σy and r of the X and Y
series are given, the regression equations are
expressed in the following way:
1. Regression equation of Y on X
Y − Y‾ = byx(X − X‾)
where byx = r × σy ÷ σx
2. Regression equation of X on Y
X − X‾ = bxy(Y − Y‾)
where bxy = r × σx ÷ σy


You are given the following information.
Obtain the two regression equations:
                        X       Y
Arithmetic mean         5      12
Standard deviation      2.6     3.6
Correlation coefficient: r = 0.7







1. Regression equation of X on Y:
X − X‾ = r × (σx ÷ σy) × (Y − Y‾)
Putting the values in the equation, we get:
X − 5 = (0.7 × 2.6 ÷ 3.6)(Y − 12)
X − 5 = 0.51(Y − 12)
X − 5 = 0.51Y − 6.12
X = 0.51Y − 1.12
2. Regression equation of Y on X:
Y − Y‾ = r × (σy ÷ σx) × (X − X‾)
Putting the values in the equation, we get:
Y − 12 = (0.7 × 3.6 ÷ 2.6)(X − 5)
Y − 12 = 0.97(X − 5)
Y − 12 = 0.97X − 4.85
Y = 0.97X + 7.15
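
When only summary figures are given, the two equations follow directly from the formulas for byx and bxy. The Python sketch below is illustrative; the function name lines_from_summary and its parameter names are hypothetical, and the inputs are the means, standard deviations and r from this example.

```python
def lines_from_summary(x_bar, y_bar, sd_x, sd_y, r):
    """Return ((intercept, slope) of Y on X, (intercept, slope) of X on Y)."""
    byx = r * sd_y / sd_x
    bxy = r * sd_x / sd_y
    return (y_bar - byx * x_bar, byx), (x_bar - bxy * y_bar, bxy)


(y_int, byx), (x_int, bxy) = lines_from_summary(
    x_bar=5, y_bar=12, sd_x=2.6, sd_y=3.6, r=0.7
)
print(f"Y = {byx:.2f}X {y_int:+.2f}")
print(f"X = {bxy:.2f}Y {x_int:+.2f}")
```

The script carries full precision, so its intercept for X on Y comes out near −1.07, whereas the hand calculation gives −1.12 because bxy is rounded to 0.51 before multiplying by 12.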
To obtain regression equations from grouped data, we first
construct a correlation table. Special adjustments must be
made while calculating the regression coefficients, because
regression coefficients are independent of a change of origin
but not of scale. In grouped data the regression coefficients
are computed using the formulas:
1. bxy = [(N × ∑fdxdy − ∑fdx × ∑fdy) ÷ (N × ∑fdy² − (∑fdy)²)] × (ix ÷ iy)
2. byx = [(N × ∑fdxdy − ∑fdx × ∑fdy) ÷ (N × ∑fdx² − (∑fdx)²)] × (iy ÷ ix)
where dx and dy are step deviations, f is the cell frequency,
and ix and iy are the class intervals of X and Y.
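
The role of the ix ÷ iy adjustment can be seen in a small sketch. Everything here is an assumption for illustration: the function name grouped_coefficients is hypothetical, and instead of a real correlation table the five pairs from the actual-values example are treated as cells with frequency 1, using step deviations dx = (X − 6) ÷ 2 and dy = (Y − 8) ÷ 1. If the adjustment is right, the result should match the coefficients obtained from the raw values.

```python
def grouped_coefficients(cells, ix, iy):
    """cells: list of (dx, dy, f) with step deviations dx, dy and frequency f.
    ix, iy: class intervals used to form the step deviations.
    Returns (byx, bxy) expressed in the original units of X and Y."""
    n = sum(f for _, _, f in cells)
    s_fdx = sum(f * dx for dx, _, f in cells)
    s_fdy = sum(f * dy for _, dy, f in cells)
    s_fdxdy = sum(f * dx * dy for dx, dy, f in cells)
    s_fdx2 = sum(f * dx * dx for dx, _, f in cells)
    s_fdy2 = sum(f * dy * dy for _, dy, f in cells)
    num = n * s_fdxdy - s_fdx * s_fdy
    bxy = num / (n * s_fdy2 - s_fdy ** 2) * ix / iy
    byx = num / (n * s_fdx2 - s_fdx ** 2) * iy / ix
    return byx, bxy


# X = 2, 4, 6, 8, 10 -> dx = (X - 6) / 2 ; Y = 5, 7, 9, 8, 11 -> dy = (Y - 8) / 1
cells = [(-2, -3, 1), (-1, -1, 1), (0, 1, 1), (1, 0, 1), (2, 3, 1)]
byx, bxy = grouped_coefficients(cells, ix=2, iy=1)
print(byx, bxy)
```

Leaving out the ix ÷ iy and iy ÷ ix factors would give coefficients in step-deviation units rather than in the units of X and Y.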



1. To find the mean values from the
regression equations: The two regression lines
intersect each other at the point of the mean values
(X‾, Y‾), so solving the two equations simultaneously
gives X‾ and Y‾.
2. To find the coefficient of
correlation from the two regression
equations: The correlation coefficient can be
worked out from the regression coefficients bxy and
byx:
r = ±√(byx × bxy)









From the following two regression equations,
identify which one is X on Y and which one is Y on
X:
2X + 3Y = 42 ... (1)
X + 2Y = 26 ... (2)
Solution
In the absence of any clear-cut indication, let us
assume equation (1) to be Y on X and equation
(2) to be X on Y.
Taking equation (1) as the regression equation of Y on X:
2X + 3Y = 42
3Y = 42 − 2X
Y = 42÷3 − (2÷3)X
From this it follows that
byx = coefficient of X in (1) = −2÷3
Taking equation (2) as the regression equation of X on Y:
X + 2Y = 26
X = 26 − 2Y
From this it follows that
bxy = coefficient of Y in (2) = −2
Calculating r² from these two regression
coefficients, we get:
r² = byx × bxy
= (−2÷3) × (−2)
= 4÷3 > 1














Here r² > 1, which is impossible because r² ≤ 1. So
our assumption is wrong. We now take the first
equation as the regression equation of X on Y and the
second as the regression equation of Y on X.
Taking the first equation as X on Y, we have
2X + 3Y = 42
2X = 42 − 3Y
X = 21 − (3÷2)Y
From this it follows that
bxy = coefficient of Y in the above equation = −3÷2
Now, taking the second equation as Y on X, we
have
X + 2Y = 26
2Y = 26 − X
Y = 13 − (1÷2)X
From this it follows that
byx = coefficient of X in the above equation = −1÷2






Now, r² = bxy × byx
= (−3÷2) × (−1÷2)
= 3÷4
= 0.75
Here r² < 1, which is possible, since r² lies within the
limit r² ≤ 1.
Hence the first equation is the regression
equation of X on Y and the second
equation is that of Y on X.
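
The trial-and-check reasoning of this example can be written as a short routine. The sketch below is illustrative only: the function names identify and means and the (p, q, c) representation of an equation pX + qY = c are hypothetical. It tries both assignments, keeps the one whose r² lies between 0 and 1, and also finds the mean values as the intersection of the two lines.

```python
def identify(eq1, eq2):
    """Each equation is a tuple (p, q, c) meaning p*X + q*Y = c.
    Return (role of eq1, byx, bxy, r squared) for the valid assignment."""
    for first_is in ("Y on X", "X on Y"):
        y_on_x, x_on_y = (eq1, eq2) if first_is == "Y on X" else (eq2, eq1)
        byx = -y_on_x[0] / y_on_x[1]   # solve for Y: coefficient of X
        bxy = -x_on_y[1] / x_on_y[0]   # solve for X: coefficient of Y
        r_sq = byx * bxy
        if 0 <= r_sq <= 1:
            return first_is, byx, bxy, r_sq
    raise ValueError("neither assignment gives a valid r squared")


def means(eq1, eq2):
    """Intersection of the two lines, which is (mean of X, mean of Y)."""
    (p1, q1, c1), (p2, q2, c2) = eq1, eq2
    det = p1 * q2 - p2 * q1
    return (c1 * q2 - c2 * q1) / det, (p1 * c2 - p2 * c1) / det


eq1 = (2, 3, 42)   # 2X + 3Y = 42
eq2 = (1, 2, 26)   # X + 2Y = 26
role, byx, bxy, r_sq = identify(eq1, eq2)
x_bar, y_bar = means(eq1, eq2)
print(f"Equation 1 is {role}: byx = {byx}, bxy = {bxy}, r squared = {r_sq}")
print(f"Mean of X = {x_bar}, mean of Y = {y_bar}")
```

For these two equations the routine reports the first equation as X on Y with r² = 0.75, and the lines intersect at X‾ = 6, Y‾ = 10. Because both coefficients are negative, r itself is negative.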