UNIT V
Regression: relationship between two variables
The variable being predicted is called the dependent variable.
The variable or variables used to predict the value of the dependent variable are called the
independent variables.
The simplest type of regression analysis involves one independent variable and one dependent
variable, with the relationship between them approximated by a straight line. It is
called simple linear regression.
Regression analysis involving two or more independent variables is called multiple regression
analysis.
Line of Regression of Y on X
Y = a + bX
Here X = independent variable, Y = dependent variable,
a = intercept, b = slope of the line
The regression equation of Y on X can also be expressed as
(Y − Ȳ) = byx (X − X̄)
byx = r (σy / σx)
where X̄ = mean of the X series
Ȳ = mean of the Y series
σx = standard deviation of the X series
σy = standard deviation of the Y series
r = coefficient of correlation between the two variables X and Y
Calculation of byx:
byx = [n ∑XiYi − (∑Xi)(∑Yi)] / [n ∑Xi² − (∑Xi)²]
Line of Regression of X on Y
X = c + dY
Here Y = independent variable, X = dependent variable,
c = intercept, d = slope of the line
The regression equation of X on Y can also be expressed as
(X − X̄) = bxy (Y − Ȳ)
bxy = r (σx / σy)
where X̄ = mean of the X series
Ȳ = mean of the Y series
σx = standard deviation of the X series
σy = standard deviation of the Y series
r = coefficient of correlation between the two variables X and Y
Calculation of bxy:
bxy = [n ∑XiYi − (∑Xi)(∑Yi)] / [n ∑Yi² − (∑Yi)²]
Coefficient of correlation r = ±√(bxy × byx), where r takes the common sign of bxy and byx
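The formulas above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the function name regression_coefficients and the five data points are hypothetical, chosen only to illustrate that byx and bxy share a numerator and that r is the signed square root of their product.

```python
import math

def regression_coefficients(xs, ys):
    """Return (byx, bxy, r) from the raw-sum formulas for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y          # shared by both slopes
    byx = numerator / (n * sum_xx - sum_x ** 2)     # slope of Y on X
    bxy = numerator / (n * sum_yy - sum_y ** 2)     # slope of X on Y
    # r carries the common sign of the two regression coefficients
    r = math.copysign(math.sqrt(byx * bxy), numerator)
    return byx, bxy, r

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
byx, bxy, r = regression_coefficients(xs, ys)
print(byx, bxy, r)
```

Because both slopes share the numerator n ∑XiYi − (∑Xi)(∑Yi), they always have the same sign, which is why the sign of r can be read from either one.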
Question: A panel of two judges, P and Q, graded seven dramatic performances by independently
awarding marks as follows:
Performance    1    2    3    4    5    6    7
Marks by P    46   42   44   40   43   41   45
Marks by Q    40   38   36   35   39   37   41
The eighth performance, which judge Q could not attend, was awarded 37 marks by judge P. If
judge Q had also been present, how many marks would he be expected to have awarded to the
eighth performance?
Solution: Let us denote the marks awarded by judge P as X and the marks awarded by judge Q as
Y. Since we have to estimate the marks that would have been awarded by judge Q, we fit a line
of regression of Y on X to the given data.
X     Y     X²      Y²      XY
46    40    2116    1600    1840
42    38    1764    1444    1596
44    36    1936    1296    1584
40    35    1600    1225    1400
43    39    1849    1521    1677
41    37    1681    1369    1517
45    41    2025    1681    1845
∑X = 301    ∑Y = 266    ∑X² = 12971    ∑Y² = 10136    ∑XY = 11459
Mean of X: X̄ = 301/7 = 43
Mean of Y: Ȳ = 266/7 = 38
Calculation of byx:
byx = [n ∑XiYi − (∑Xi)(∑Yi)] / [n ∑Xi² − (∑Xi)²]
= (7 × 11459 − 301 × 266) / (7 × 12971 − 301²)
= (80213 − 80066) / (90797 − 90601)
= 147 / 196 = 0.75
Regression line of Y on X:
(Y − Ȳ) = byx (X − X̄)
(Y − 38) = 0.75 (X − 43)
Y = 5.75 + 0.75 X
Estimate of Y when X = 37:
Y = 5.75 + 0.75 × 37 = 33.5 marks
It is expected that judge Q would have awarded 33.5 marks to the eighth
performance.
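The arithmetic of this solution can be reproduced with a short script. This is only a sketch of the calculation described above; the variable names (marks_p, marks_q, predicted_q and so on) are illustrative and not part of the original notes.

```python
# Marks from the question: judge P is X, judge Q is Y
marks_p = [46, 42, 44, 40, 43, 41, 45]
marks_q = [40, 38, 36, 35, 39, 37, 41]

n = len(marks_p)
sum_x = sum(marks_p)                                   # 301
sum_y = sum(marks_q)                                   # 266
sum_xy = sum(x * y for x, y in zip(marks_p, marks_q))  # 11459
sum_xx = sum(x * x for x in marks_p)                   # 12971

mean_x = sum_x / n
mean_y = sum_y / n

# Slope of the regression line of Y on X
byx = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
# The line passes through (mean_x, mean_y), which fixes the intercept
intercept = mean_y - byx * mean_x

# Expected marks from judge Q when judge P awards 37
predicted_q = intercept + byx * 37
print(f"Y = {intercept} + {byx} X, estimate at X = 37: {predicted_q}")
```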
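Question: Find the means of the X and Y variables and the coefficient of correlation between them
from the following two regression equations:
3Y − 2X − 10 = 0
2Y − X − 50 = 0
Solution: Means of X and Y. Both regression lines pass through the point (X̄, Ȳ), so solving the
two equations simultaneously gives the means. From the second equation X = 2Y − 50; substituting
into the first gives 3Y − 2(2Y − 50) − 10 = 0, so Y = 90 and X = 130.
Mean of X = 130, mean of Y = 90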
Correlation coefficient
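Let us assume that the first equation is the regression of X on Y:
X = −10/2 + (3/2) Y
Then bxy = 3/2
Let us assume that the second equation is the regression of Y on X:
Y = 50/2 + (1/2) X
Then byx = 1/2
r² = bxy × byx = (3/2) × (1/2) = 3/4
r = √(3/4) = 0.87
The assumption is valid because the product does not exceed 1. The opposite assignment gives
byx = 2/3 and bxy = 2, whose product 4/3 would make r greater than 1. Both slopes are positive,
so r is positive.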
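The solution above can be sketched in code. The representation of each line as a tuple (a, b, c) meaning aX + bY + c = 0, and the helper names intersection, slope_x_on_y, slope_y_on_x and correlation, are assumptions made for this illustration. The sketch tries both assignments of the two equations and keeps the one whose slope product lies between 0 and 1.

```python
from fractions import Fraction
import math

# Each line is stored as (a, b, c), meaning a*X + b*Y + c = 0
eq1 = (-2, 3, -10)   # 3Y - 2X - 10 = 0
eq2 = (-1, 2, -50)   # 2Y - X - 50 = 0

def intersection(e1, e2):
    """Solve the two lines simultaneously (Cramer's rule): (mean of X, mean of Y)."""
    a1, b1, c1 = e1
    a2, b2, c2 = e2
    det = a1 * b2 - a2 * b1
    return Fraction(b1 * c2 - b2 * c1, det), Fraction(a2 * c1 - a1 * c2, det)

def slope_x_on_y(eq):
    """Coefficient of Y when the line is solved for X."""
    a, b, _ = eq
    return Fraction(-b, a)

def slope_y_on_x(eq):
    """Coefficient of X when the line is solved for Y."""
    a, b, _ = eq
    return Fraction(-a, b)

def correlation(e1, e2):
    """Try both assignments; keep the one with 0 <= bxy * byx <= 1."""
    for x_on_y, y_on_x in ((e1, e2), (e2, e1)):
        bxy, byx = slope_x_on_y(x_on_y), slope_y_on_x(y_on_x)
        product = bxy * byx
        if 0 <= product <= 1:
            sign = 1 if bxy > 0 else -1
            return sign * math.sqrt(product)
    raise ValueError("neither assignment gives a valid correlation")

mean_x, mean_y = intersection(eq1, eq2)
r = correlation(eq1, eq2)
print(mean_x, mean_y, round(r, 2))
```

Exact fractions are used for the slopes so that the validity check on the product is not affected by rounding.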
Coefficient of Determination r² = SSR / SST
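Question: Calculate SSE, SST and SSR for the following data on 10 restaurants.
X     y
2     58
6     105
8     88
8     118
12    117
16    137
20    157
20    169
22    149
26    202
Solution: First calculate the estimated regression equation of y on X, ŷ = a + bX. Then
SSE = ∑(yi − ŷi)²  (sum of squares due to error)
SST = ∑(yi − ȳ)²  (total sum of squares)
SSR = SST − SSE  (sum of squares due to regression)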
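The steps of this solution can be carried out as follows. This is a sketch under the usual least-squares definitions (SSE as the sum of squared residuals, SST as the sum of squared deviations of y from its mean, SSR as the sum of squared deviations of the fitted values from the mean of y); the variable names are illustrative.

```python
# Data for the 10 restaurants from the question
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept of the regression of y on x
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fitted values from the estimated regression equation
y_hat = [intercept + slope * xi for xi in x]

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variation
sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
ssr = sum((fi - mean_y) ** 2 for fi in y_hat)           # explained variation
r_squared = ssr / sst

print(f"y_hat = {intercept} + {slope} x")
print(f"SSE = {sse}, SST = {sst}, SSR = {ssr}, r^2 = {r_squared:.4f}")
```

Computing SSR directly from the fitted values, rather than as SST − SSE, gives an independent check that SST = SSR + SSE.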
Using the estimated regression equation for estimation and prediction
If a significant relationship exists between x and y, and the coefficient of determination shows
that the fit is good, the estimated regression equation should be useful for estimation and
prediction.
Point Estimation
We can use the estimated regression equation to develop a point estimate of the mean value of y
for a particular value of x or to predict an individual value of y corresponding to a given value of
x.
Interval Estimation
Point estimates do not provide any information about the precision associated with an estimate.
For that we must develop interval estimates. The first type of interval estimate, a confidence
interval, is an interval estimate of the mean value of y for a given value of x. The second type of
interval estimate, a prediction interval, is used whenever we want an interval estimate of an
individual value of y for a given value of x. The point estimate of the mean value of y is the same
as the point estimate of an individual value of y, but the interval estimates obtained for the two
cases are different. The margin of error is larger for a prediction interval, because an individual
value varies about the mean in addition to the uncertainty in estimating the mean itself.
Prediction Interval for an Individual Value of y
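The difference between the two interval estimates can be illustrated with the restaurant data from the previous question. This is a minimal sketch using the standard simple-linear-regression formulas, which are not spelled out in these notes: the margin for the mean value is t × s × √(1/n + (x* − x̄)² / ∑(xi − x̄)²), and the margin for an individual value adds 1 under the square root. The value x* = 10 is a hypothetical choice, and t = 2.306 is assumed as the two-sided 95% critical value for n − 2 = 8 degrees of freedom.

```python
import math

x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Standard error of the estimate, s = sqrt(SSE / (n - 2))
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

x_star = 10      # hypothetical value of x at which to estimate
t_crit = 2.306   # assumed: t value for 95% confidence, 8 degrees of freedom

y_hat = intercept + slope * x_star   # the same point estimate serves both intervals
spread = 1 / n + (x_star - mean_x) ** 2 / sxx

margin_mean = t_crit * s * math.sqrt(spread)       # confidence interval margin
margin_pred = t_crit * s * math.sqrt(1 + spread)   # prediction interval margin

confidence_interval = (y_hat - margin_mean, y_hat + margin_mean)
prediction_interval = (y_hat - margin_pred, y_hat + margin_pred)
print(confidence_interval)
print(prediction_interval)
```

Both intervals are centred on the same point estimate; only the margin differs, and the extra 1 under the square root is what makes the prediction interval wider.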
Residual Plot Against x
A residual plot against the independent variable x is a graph in which the values of the
independent variable are represented by the horizontal axis and the corresponding residual values
are represented by the vertical axis. A point is plotted for each residual. The first coordinate for
each point is given by the value of xi and the second coordinate is given by the corresponding
value of the residual yi − ŷi. For a residual plot against x
Residual Plot Against ŷ
Another residual plot represents the predicted value of the dependent variable ŷ on the horizontal
axis and the residual values on the vertical axis. A point is plotted for each residual. The first
coordinate for each point is given by ŷi and the second coordinate is given by the corresponding
value of the ith residual yi − ŷi.
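The coordinates used in the two residual plots can be computed directly. The sketch below reuses the restaurant data from the earlier question and only builds the lists of points; the names points_vs_x and points_vs_y_hat are illustrative, and drawing the plots is left to whatever plotting tool is at hand.

```python
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

y_hat = [intercept + slope * xi for xi in x]          # predicted values
residuals = [yi - fi for yi, fi in zip(y, y_hat)]     # yi - y_hat_i

# Residual plot against x: (xi, residual_i)
points_vs_x = list(zip(x, residuals))
# Residual plot against predicted values: (y_hat_i, residual_i)
points_vs_y_hat = list(zip(y_hat, residuals))

for point in points_vs_x:
    print(point)
```

In simple linear regression the predicted values are a linear function of x, so the two plots show the same pattern of points with a rescaled horizontal axis.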