Chapter 6 - WordPress.com

Chapter 6
Intercept, A, and Gradient of the Regression Line, B
1
y = A + Bx
[Figure: the regression line drawn with the dependent variable (DV) on the y-axis and the independent variable (IV) on the x-axis. A is the y-intercept or constant term; B is the gradient or slope of the regression line.]
2
y = A + Bx is sometimes called a deterministic model; it gives an exact relationship between y and x.
In reality, the observed value yobs differs slightly from the predicted value ypre, so
y = A + Bx + e
where e is a random error term that accounts for the difference (see slide 5 if you do not understand this concept).
A and B are population parameters. The regression line is called the population regression line, and the values of A and B in the population are called the true values of the y-intercept and slope.
Population data are difficult to obtain, so we use sample data to estimate the population parameters. Values calculated from sample data are estimates, so the y-intercept and slope for the sample data are denoted ‘a’ and ‘b’, and ŷ denotes the predicted or estimated value of y for a given x.
ŷ = a + bx
This equation is called the estimated regression model; it gives the regression of y on x based on sample data.
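The difference between the population line and the estimated line can be illustrated with a small simulation. This is only a sketch: the population values A = 1.0 and B = 0.25, the error spread SIGMA, the sample size and the x range are all hypothetical choices made for illustration, and the estimates use the least squares formulas introduced later in this chapter.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
A, B = 1.0, 0.25
SIGMA = 1.0  # assumed spread of the random error term e


def least_squares(xs, ys):
    """Return the sample estimates (a, b) of the intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    return a, b


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(15, 50) for _ in range(200)]
# Each observed y is the deterministic part A + Bx plus a random error e.
ys = [A + B * x + rng.gauss(0, SIGMA) for x in xs]

a, b = least_squares(xs, ys)
print(f"population: A = {A}, B = {B}")
print(f"sample:     a = {a:.4f}, b = {b:.4f}")
```

The sample estimates a and b should land close to, but not exactly on, the population values A and B, which is the point of calling them estimates.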
3
Example 1
Income (hundreds of RM)    Food Expenditure (hundreds of RM)
35                         9
49                         15
21                         7
39                         11
15                         5
28                         8
25                         9
4
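Scatter Plot for Example 1
[Figure: scatter plot of the Example 1 data with the regression line. At a point x1, the vertical gap between the observed value yobs and the value ypre on the regression line is the error e.]
e (error) = yobs − ypre
yobs: actual y value obtained
ypre: y value predicted by the regression line (the best straight line)
Or
e = y − ŷ, where ŷ is the predicted value of y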
5
Error Sum of Squares, SSE
The sum of errors is always zero for the best straight line, or least squares line, i.e.
Σe = Σ(y − ŷ) = 0
So to find the line that best fits the points, we cannot minimize the sum of errors, since it will always be zero. Instead we minimize the error sum of squares, SSE:
SSE = Σe² = Σ(y − ŷ)²
The values of ‘a’ and ‘b’ that give the minimum SSE are called the least squares estimates of A and B, and the regression line obtained with these estimates is called the least squares regression line.
For the least squares regression line, ŷ = a + bx, where
b = SSxy / SSxx
and a = ȳ − b x̄
ȳ = mean of y scores
x̄ = mean of x scores
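The claims on this slide can be checked numerically on the Example 1 data. The following is a minimal sketch, not a prescribed method: the function names least_squares and sse are illustrative, and the size of the nudges applied to a and b is an arbitrary choice.

```python
# Example 1 data: income (x) and food expenditure (y), in hundreds of RM.
xs = [35, 49, 21, 39, 15, 28, 25]
ys = [9, 15, 7, 11, 5, 8, 9]


def least_squares(xs, ys):
    """Least squares estimates: b = SSxy / SSxx and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b = ss_xy / ss_xx
    a = y_bar - b * x_bar
    return a, b


def sse(a, b, xs, ys):
    """Error sum of squares for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = least_squares(xs, ys)

# Errors e = y - (a + bx); their sum is zero up to floating-point rounding.
errors = [y - (a + b * x) for x, y in zip(xs, ys)]
print("sum of errors:", sum(errors))

# Nudging either estimate away from the least squares values raises SSE.
best = sse(a, b, xs, ys)
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.05), (0.0, -0.05)]
worse = [sse(a + da, b + db, xs, ys) for da, db in nudges]
print("every nudged line has a larger SSE:", all(w > best for w in worse))
```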
6
SSxy = Σ(x − x̄)(y − ȳ) = Σxy − (Σx)(Σy) / n    (can be positive or negative)
SSxx = Σ(x − x̄)² = Σx² − (Σx)² / n    (is always positive)
ȳ = mean of y scores
x̄ = mean of x scores
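Each quantity above is written in two forms: a definitional form using deviations from the means, and a shortcut form using raw sums. A short sketch on the Example 1 data suggests the two forms agree; the variable names are illustrative only.

```python
# Example 1 data: income (x) and food expenditure (y), in hundreds of RM.
xs = [35, 49, 21, 39, 15, 28, 25]
ys = [9, 15, 7, 11, 5, 8, 9]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Definitional forms: sums of deviations from the means.
ss_xy_def = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_xx_def = sum((x - x_bar) ** 2 for x in xs)

# Shortcut forms: raw sums only, which is easier by hand.
ss_xy_short = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
ss_xx_short = sum(x * x for x in xs) - sum(xs) ** 2 / n

print(round(ss_xy_def, 4), round(ss_xy_short, 4))  # 211.7143 211.7143
print(round(ss_xx_def, 4), round(ss_xx_short, 4))  # 801.4286 801.4286
```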
7
Example 1
Income, x    Food Expenditure, y    xy            x²
35           9                      315           1225
49           15                     735           2401
21           7                      147           441
39           11                     429           1521
15           5                      75            225
28           8                      224           784
25           9                      225           625
Σx = 212     Σy = 64                Σxy = 2150    Σx² = 7222
Step 1: Compute Σx, Σy, x̄ and ȳ.
Σx = 212
Σy = 64
x̄ = Σx / n = 212 / 7 = 30.2857
ȳ = Σy / n = 64 / 7 = 9.1429
Step 2: Compute Σxy and Σx².
Σxy = 2150
Σx² = 7222
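Steps 1 and 2 amount to building the xy and x² columns of the table and summing each column. A minimal sketch, with illustrative variable names:

```python
# Example 1 data: income (x) and food expenditure (y), in hundreds of RM.
xs = [35, 49, 21, 39, 15, 28, 25]
ys = [9, 15, 7, 11, 5, 8, 9]
n = len(xs)

# Extra table columns.
xy = [x * y for x, y in zip(xs, ys)]
x_sq = [x * x for x in xs]

# Step 1: sums and means.
sum_x, sum_y = sum(xs), sum(ys)
x_bar, y_bar = sum_x / n, sum_y / n

# Step 2: sums of the extra columns.
sum_xy, sum_x_sq = sum(xy), sum(x_sq)

print("sum x =", sum_x, " sum y =", sum_y)            # 212 and 64
print("x mean =", round(x_bar, 4), " y mean =", round(y_bar, 4))
print("sum xy =", sum_xy, " sum x^2 =", sum_x_sq)     # 2150 and 7222
```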
8
Step 3: Compute SSxy and SSxx
SSxy = Σxy − (Σx)(Σy) / n = 2150 − (212)(64) / 7 = 211.7143
SSxx = Σx² − (Σx)² / n = 7222 − (212)² / 7 = 801.4286
Step 4: Compute ‘a’ and ‘b’
b = SSxy / SSxx = 211.7143 / 801.4286 = .2642
a = ȳ − b x̄ = 9.1429 − (.2642)(30.2857) = 1.1414
The estimated regression model ypre = a + bx is
ypre = 1.1414 + .2642x
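Steps 3 and 4 can be reproduced from the four sums alone. The sketch below computes the estimates twice: once at full precision, and once rounding the intermediate values to four decimals as the hand calculation does. The variable names are illustrative.

```python
# Sums from the Example 1 table.
n, sum_x, sum_y, sum_xy, sum_x_sq = 7, 212, 64, 2150, 7222

# Step 3: shortcut formulas for SSxy and SSxx.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x_sq - sum_x ** 2 / n

# Step 4 at full precision.
b = ss_xy / ss_xx
a = sum_y / n - b * sum_x / n

# Step 4 as done by hand: round the means and b to four decimals first.
x_bar_r = round(sum_x / n, 4)
y_bar_r = round(sum_y / n, 4)
b_rounded = round(b, 4)
a_rounded = round(y_bar_r - b_rounded * x_bar_r, 4)

print("SSxy =", round(ss_xy, 4), " SSxx =", round(ss_xx, 4))
print("hand calculation: a =", a_rounded, " b =", b_rounded)
print("full precision:   a =", round(a, 4), " b =", round(b, 4))
```

Carrying full precision gives a ≈ 1.1422 rather than 1.1414. The gap comes only from rounding b before computing a, so either value is acceptable as long as the rounding is applied consistently.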
9
This gives the regression of food expenditure on income.
Using this estimated regression model, we can find the predicted value of y for any specific value of x.
E.g. if the monthly income is RM3500, then x = 35 (in hundreds of RM) and
ypre = 1.1414 + (.2642)(35) = 10.3884 hundred = RM1038.84
But the actual y value when x = 35 is RM900.
The prediction error is RM900 − RM1038.84 = −RM138.84. This negative error indicates that the predicted value of y is greater than the actual value of y. Thus, if we use the regression model, the household food expenditure is overestimated by RM138.84.
What does the model predict when income = RM0?
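The prediction and its error can be checked with a few lines of code. This sketch uses the rounded estimates from the worked example; the function name predict is illustrative.

```python
# Estimates from the worked example (x and y in hundreds of RM).
a, b = 1.1414, 0.2642


def predict(x):
    """Predicted food expenditure (hundreds of RM) for income x."""
    return a + b * x


x, y_obs = 35, 9                 # household with income RM3500, spending RM900
y_pre = predict(x)
error = y_obs - y_pre            # e = observed - predicted

print(f"predicted: RM{y_pre * 100:.2f}")   # RM1038.84
print(f"error:     RM{error * 100:.2f}")   # RM-138.84

# At x = 0 the prediction is just the intercept a.
print(f"income RM0: RM{predict(0) * 100:.2f}")
```

The incomes in the sample run from RM1500 to RM4900, so the prediction at RM0 lies outside the observed range and should be read with caution.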
10
Exercise 1
Calculate the regression equation for the Maths (x scores) and Science (y scores) marks.
Maths    Science
32       45
67       56
23       12
86       79
65       73
55       65
32       40
67       77
90       87
31       40
56       49
77       82
10       13
75       76
67       68
77       79
34       45
28       31
44       49
11
Exercise 2
Calculate the regression equation for the Maths (x scores) and History (y scores) marks.
Maths    History
32       70
67       30
23       80
86       45
65       35
55       65
32       65
67       35
90       10
31       70
56       49
77       42
10       90
75       40
67       59
77       51
18       81
28       55
44       49
12
Exercise 3
Calculate the regression equation for the graph between IQ ranges (x-axis) and the correlation coefficients (r) between Overall Creativity (OC) and Overall Achievement (OA) (y-axis), based on the data on page 77 (Graph 7.1) (Palaniappan, 2006).
13