FURTHER TOPICS IN REGRESSION ANALYSIS
(MULTIPLE LINEAR REGRESSION MODEL)
Dr. E. N. Aidoo
Department of Statistics and Actuarial Science
en.aidoo@yahoo.com
0202901980
September, 2020
Dr Eric Nimako Aidoo
General Linear Regression Models
September, 2020
1 / 39
OUTLINE
1. INTRODUCTION
2. LEAST SQUARE ESTIMATION OF THE PARAMETERS
3. MATRIX APPROACH TO MULTIPLE LINEAR REGRESSION
4. REAL DATA LAB SESSION
5. INFERENCE ON THE PARAMETERS
INTRODUCTION
Many applications of regression analysis involve situations in which
there is more than one regressor variable.
A regression model that contains more than one regressor variable is
called a multiple regression model.
For example, suppose that the effective life of a cutting tool depends
on the cutting speed and the tool angle. A possible multiple
regression model could be
Y = β0 + β1 x1 + β2 x2 + ε
where:
Y is the tool life (the dependent variable)
x1 is the cutting speed and x2 is the tool angle (the independent variables)
Y = β0 + β1 x1 + β2 x2 + ε
β1 and β2 are partial regression coefficients:
β1 measures the expected change in Y per unit change in x1 when x2
is held constant,
β2 measures the expected change in Y per unit change in x2 when x1
is held constant.
ε denotes the error term (the errors are assumed to be
normally distributed with mean 0 and constant variance σ²)
LEAST SQUARE ESTIMATION OF THE PARAMETERS

The least square function is given by

L = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} (yi − β0 − Σ_{j=1}^{k} βj xij)²

The least square estimates β̂0, β̂1, ..., β̂k must satisfy

∂L/∂β0 |_{β̂0,β̂1,...,β̂k} = −2 Σ_{i=1}^{n} (yi − β̂0 − Σ_{j=1}^{k} β̂j xij) = 0    (1)

and

∂L/∂βj |_{β̂0,β̂1,...,β̂k} = −2 Σ_{i=1}^{n} (yi − β̂0 − Σ_{j=1}^{k} β̂j xij) xij = 0;   j = 1, 2, ..., k    (2)
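Conditions (1) and (2) say that, at the least squares solution, both derivative expressions vanish. A minimal numerical check, sketched here in Python/NumPy (my substitution; the slides themselves use R), with the small data set from the worked example later in the slides:

```python
import numpy as np

# Data from the worked example later in the slides (k = 2 regressors).
y = np.array([3.0, 2, 4, 5, 8])
X = np.column_stack([np.ones(5), [2.0, 3, 5, 7, 8], [1.0, 5, 3, 6, 7]])

# Least squares solution.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Equation (1): -2 * sum of residuals = 0 at the minimum.
grad_b0 = -2 * resid.sum()
# Equation (2): -2 * sum(residual * xij) = 0 for each j = 1, ..., k.
grad_bj = -2 * (X[:, 1:].T @ resid)

print(grad_b0, grad_bj)  # all ~0 up to floating-point error
```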
Simplifying Equations (1) and (2), we obtain the least squares normal
equations

n β̂0       + β̂1 Σ xi1     + β̂2 Σ xi2     + · · · + β̂k Σ xik     = Σ yi
β̂0 Σ xi1  + β̂1 Σ xi1²    + β̂2 Σ xi1 xi2 + · · · + β̂k Σ xi1 xik = Σ xi1 yi
    ⋮
β̂0 Σ xik  + β̂1 Σ xik xi1 + β̂2 Σ xik xi2 + · · · + β̂k Σ xik²    = Σ xik yi

(all sums run over i = 1, ..., n)

The solutions to the normal equations are the least squares
estimators of the regression coefficients.
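The normal-equation system can be assembled directly from the raw sums and solved. A sketch in Python/NumPy (my choice of language; the slides use R), with the data from the worked example that follows:

```python
import numpy as np

# Data from the worked example later in the slides (k = 2 regressors).
x1 = np.array([2.0, 3, 5, 7, 8])
x2 = np.array([1.0, 5, 3, 6, 7])
y = np.array([3.0, 2, 4, 5, 8])
n = len(y)

# Left-hand side: the matrix of sums in the normal equations.
A = np.array([
    [n,        x1.sum(),       x2.sum()],
    [x1.sum(), (x1**2).sum(),  (x1*x2).sum()],
    [x2.sum(), (x1*x2).sum(),  (x2**2).sum()],
])
# Right-hand side: the sums involving y.
b = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])

beta_hat = np.linalg.solve(A, b)
print(np.round(beta_hat, 2))  # ≈ [ 0.5   1.  -0.25]
```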
MATRIX APPROACH TO MULTIPLE LINEAR
REGRESSION
Suppose the model relating the regressors to the response is
yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + εi ;  i = 1, 2, ..., n
In matrix notation this model can be written as
y = Xβ + ε
y = Xβ + ε

where

    ⎡ y1 ⎤       ⎡ 1  x11  x12  · · ·  x1k ⎤       ⎡ β0 ⎤         ⎡ ε1 ⎤
y = ⎢ y2 ⎥   X = ⎢ 1  x21  x22  · · ·  x2k ⎥   β = ⎢ β1 ⎥  and ε = ⎢ ε2 ⎥
    ⎢  ⋮ ⎥       ⎢ ⋮    ⋮    ⋮           ⋮ ⎥       ⎢  ⋮ ⎥         ⎢  ⋮ ⎥
    ⎣ yn ⎦       ⎣ 1  xn1  xn2  · · ·  xnk ⎦       ⎣ βk ⎦         ⎣ εn ⎦
The least square function:

S(β) = Σ_{i=1}^{n} εi² = ε′ε
     = (y − Xβ)′(y − Xβ)
     = y′y − 2β′X′y + β′X′Xβ

Setting the derivative to zero at β̂:

∂S/∂β |_{β̂} = −2X′y + 2X′X β̂ = 0

X′X β̂ = X′y   (normal equations)

β̂ = (X′X)⁻¹ X′y
The fitted model corresponding to the levels of the regressor variables,
x:
ŷ = X β̂
ŷ = X(X′X)⁻¹X′y = Hy
H: Hat matrix
The hat matrix, H, is an idempotent matrix and a symmetric
matrix, i.e. H² = H and H′ = H.
H is an orthogonal projection matrix.
Residuals:
e = y − ŷ = y − Hy = (I − H)y
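Both hat-matrix properties, and the residual identity, can be verified numerically. A small Python/NumPy sketch (NumPy is my substitution for the slides' R), using the design matrix from the worked example on the later slides:

```python
import numpy as np

# Design matrix and response from the worked example later in the slides.
X = np.column_stack([np.ones(5), [2.0, 3, 5, 7, 8], [1.0, 5, 3, 6, 7]])
y = np.array([3.0, 2, 4, 5, 8])

# Hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H @ H, H))  # idempotent: H^2 = H
print(np.allclose(H, H.T))    # symmetric: H' = H

# Residuals via (I - H) y agree with y - H y.
e = (np.eye(5) - H) @ y
```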
Estimating σ²
An unbiased estimator of σ² is

σ̂² = (Σ_{i=1}^{n} ei²) / (n − p) = SSE / (n − p)

where:
e represents the estimated residuals from the model
p represents the number of regression coefficients
n represents the number of observations used
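The estimator is just the residual sum of squares divided by the residual degrees of freedom. A Python/NumPy sketch (my choice; the slides use R), again with the worked example's data, where n = 5 and p = 3 coefficients (β0, β1, β2):

```python
import numpy as np

# Worked-example data: n = 5 observations, p = 3 regression coefficients.
X = np.column_stack([np.ones(5), [2.0, 3, 5, 7, 8], [1.0, 5, 3, 6, 7]])
y = np.array([3.0, 2, 4, 5, 8])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat        # estimated residuals
n, p = X.shape              # n = 5, p = 3

SSE = (e**2).sum()
sigma2_hat = SSE / (n - p)  # unbiased estimator of sigma^2
print(sigma2_hat)
```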
Example
Fit a multiple regression to the data below using the matrix approach

 y   x1   x2
 3    2    1
 2    3    5
 4    5    3
 5    7    6
 8    8    7
y = Xβ + ε

⎡3⎤   ⎡1  2  1⎤        ⎡ε1⎤
⎢2⎥   ⎢1  3  5⎥ ⎡β̂0⎤   ⎢ε2⎥
⎢4⎥ = ⎢1  5  3⎥ ⎢β̂1⎥ + ⎢ε3⎥
⎢5⎥   ⎢1  7  6⎥ ⎣β̂2⎦   ⎢ε4⎥
⎣8⎦   ⎣1  8  7⎦        ⎣ε5⎦


 1
1 1 1 1 1 
1

X’X = 2 3 5 7 8 
1
1 5 3 6 7 1
1
Dr Eric Nimako Aidoo
2
3
5
7
8

1


5 25 22
5



3
 = 25 151 130
22 130 120
6
7


P
P
PN
P x12 P x2

= P x1 P x1
Px1 x22
x2
x2 x1
x2
General Linear Regression Models
September, 2020
16 / 39
−1 

5 25 22
1.201 −0.138 −0.071
25 151 130 = −1.138 0.114 −0.098
22 130 120
−0.071 −0.098 0.128

XX
inverse
Dr Eric Nimako Aidoo
General Linear Regression Models
September, 2020
17 / 39


                  ⎡3⎤
      ⎡1 1 1 1 1⎤ ⎢2⎥   ⎡ 22⎤   ⎡ Σy  ⎤
X′y = ⎢2 3 5 7 8⎥ ⎢4⎥ = ⎢131⎥ = ⎢Σx1y⎥
      ⎣1 5 3 6 7⎦ ⎢5⎥   ⎣111⎦   ⎣Σx2y⎦
                  ⎣8⎦
β̂ = (X′X)⁻¹ X′y

⎡ 1.201  −0.138  −0.071⎤ ⎡ 22⎤   ⎡ 0.50⎤   ⎡β̂0⎤
⎢−0.138   0.114  −0.098⎥ ⎢131⎥ = ⎢ 1.00⎥ = ⎢β̂1⎥
⎣−0.071  −0.098   0.128⎦ ⎣111⎦   ⎣−0.25⎦   ⎣β̂2⎦

ŷi = 0.50 + 1.00 xi1 − 0.25 xi2
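The whole matrix calculation above can be reproduced in a few lines. A Python/NumPy sketch (the slides use R; NumPy is my substitution) that recomputes X′X, X′y, and β̂ for this example:

```python
import numpy as np

# The slide's worked example, reproduced step by step.
X = np.column_stack([np.ones(5), [2.0, 3, 5, 7, 8], [1.0, 5, 3, 6, 7]])
y = np.array([3.0, 2, 4, 5, 8])

XtX = X.T @ X                # [[5,25,22],[25,151,130],[22,130,120]]
XtX_inv = np.linalg.inv(XtX)
Xty = X.T @ y                # [22, 131, 111]

beta_hat = XtX_inv @ Xty
print(np.round(beta_hat, 2))  # ≈ [ 0.5   1.  -0.25]
```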
R codes
> y = c(3,2,4,5,8)
> x1 = c(2,3,5,7,8)
> x2 = c(1,5,3,6,7)
>
> ExpA = data.frame(y,x1,x2)
> ExpA
  y x1 x2
1 3  2  1
2 2  3  5
3 4  5  3
4 5  7  6
5 8  8  7
>
R output
> cor(ExpA)
       y    x1    x2
y  1.000 0.894 0.640
x1 0.894 1.000 0.814
x2 0.640 0.814 1.000
> modelA = lm(y ~ x1 + x2, data = ExpA)
> summary(modelA)
R output
Interpretation of model parameters
The equation is: ŷ = 0.5 + 1.0x1 − 0.25x2
ŷ is expected to increase by 1 for a unit increase in x1 whilst
keeping x2 constant
ŷ is expected to decrease by 0.25 for a unit increase in x2 whilst
keeping x1 constant
ŷ is expected to be 0.5 on average when x1 and x2 are zero
REAL DATA LAB SESSION
Example: Actuaries Salary Survey
An insurance firm collected data for a sample of 20 actuaries. A suggestion
was made that regression analysis could be used to determine if salary was
related to the years of experience and the score on the firm’s aptitude test.
The years of experience, score on the aptitude test, and corresponding
annual salary (GHc 1,000) for a sample of 20 actuaries are shown on the
next slide.
Exper.  Score  Salary    Exper.  Score  Salary
  4      78     24.0       9      88     38.0
  7     100     43.0       2      73     26.6
  1      86     23.7      10      75     36.2
  5      82     34.3       5      81     31.6
  8      86     35.8       6      74     29.0
 10      84     38.0       8      87     34.0
  0      75     22.2       4      79     30.1
  1      80     23.1       6      94     33.9
  6      83     30.0       3      70     28.2
  6      91     33.0       3      89     30.0
Example: Actuaries Salary Survey
Suppose we believe that salary (y) is related to the years of experience
(x1) and the score on the actuaries' aptitude test (x2) by the following
regression model:
y = β0 + β1 x1 + β2 x2 + ε
where
y = annual salary (GHc 1000)
x1 = years of experience
x2 = score on aptitude test
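Before the R session, the fit can be checked with the same matrix computation used earlier. A Python/NumPy sketch (my choice of language; the lab itself uses R) on the 20 observations:

```python
import numpy as np

# Salary data from the slides: experience (x1), aptitude score (x2),
# and annual salary in GHc 1,000 (y).
x1 = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
               9, 2, 10, 5, 6, 8, 4, 6, 3, 3], dtype=float)
x2 = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
               88, 73, 75, 81, 74, 87, 79, 94, 70, 89], dtype=float)
y = np.array([24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
              38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0])

X = np.column_stack([np.ones(len(y)), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(beta_hat, 3))  # ≈ [3.174 1.404 0.251]
```

The rounded coefficients match the model interpreted a few slides later.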
R code
> cor(salary1)
       Salary Exper Score
Salary  1.000 0.855 0.589
Exper   0.855 1.000 0.336
Score   0.589 0.336 1.000
> modelc = lm(Salary ~ Exper + Score, data = salary1)
> summary(modelc)
R output
Interpreting the Coefficients
Model
Salary = 3.174 + 1.404(Exper) + 0.251(Score)
Note: Predicted salary will be in thousands of cedis
Salary is expected to increase by GHc 1,404 for each additional year of
experience (when the variable score on aptitude test is held constant).
Salary is expected to increase by GHc 251 for each additional point
scored on the aptitude test (when the variable years of experience is
held constant).
Multiple Linear Regression with three predictors

y1 = β̂0 + β̂1 X11 + β̂2 X12 + β̂3 X13 + ε1
y2 = β̂0 + β̂1 X21 + β̂2 X22 + β̂3 X23 + ε2
y3 = β̂0 + β̂1 X31 + β̂2 X32 + β̂3 X33 + ε3
 ⋮
yn = β̂0 + β̂1 Xn1 + β̂2 Xn2 + β̂3 Xn3 + εn

In matrix notation this is briefly expressed as:

y = X β̂ + ε
Multiple Linear Regression with three predictors

⎡y1⎤   ⎡1  X11  X12  X13⎤        ⎡ε1⎤
⎢y2⎥   ⎢1  X21  X22  X23⎥ ⎡β̂0⎤   ⎢ε2⎥
⎢y3⎥ = ⎢1  X31  X32  X33⎥ ⎢β̂1⎥ + ⎢ε3⎥
⎢ ⋮⎥   ⎢⋮    ⋮    ⋮    ⋮⎥ ⎢β̂2⎥   ⎢ ⋮⎥
⎣yn⎦   ⎣1  Xn1  Xn2  Xn3⎦ ⎣β̂3⎦   ⎣εn⎦
yi = β̂0 + β̂1 Xi1 + β̂2 Xi2 + β̂3 Xi3 + εi

Try it yourself
Use R to fit a multiple regression model to this data
ANOVA IN MULTIPLE REGRESSION

Sum of Squares
The least squares method allows one to verify the following equality:

ε′ε = (y − X β̂)′(y − X β̂)
    = y′y − 2β̂′X′y + β̂′X′X β̂
    = y′y − 2β̂′X′y + β̂′X′X[X′X]⁻¹X′y
    = y′y − 2β̂′X′y + β̂′X′y
    = y′y − β̂′X′y
Partition of Sum of Squares
Since in general ε′ε = y′y − β̂′X′y,
it is possible to show that the sum of squares of the deviations of y from its
average decomposes into the sum of squares due to regression and
the sum of squares due to error:

y′y − nȳ² = (β̂′X′y − nȳ²) + ε′ε
y′y − nȳ² = (β̂′X′y − nȳ²) + (y′y − β̂′X′y)
In summary:

SSreg = β̂′X′y − (Σy)²/n
SSres = y′y − β̂′X′y
SStot = y′y − nȳ²

SStot = SSres + SSreg
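The decomposition can be confirmed numerically on the small worked example from the earlier slides. A Python/NumPy sketch (NumPy is my substitution for the slides' R):

```python
import numpy as np

# Worked-example data from the earlier slides.
X = np.column_stack([np.ones(5), [2.0, 3, 5, 7, 8], [1.0, 5, 3, 6, 7]])
y = np.array([3.0, 2, 4, 5, 8])
n = len(y)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

SStot = y @ y - n * y.mean()**2               # y'y - n*ybar^2
SSreg = beta_hat @ (X.T @ y) - n * y.mean()**2
SSres = y @ y - beta_hat @ (X.T @ y)

print(SStot, SSreg + SSres)  # equal: SStot = SSreg + SSres
```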
ANOVA Table for Salary Example
Coefficient of Determination R²
R² = SSR/SST
R² = 500.3285/599.7855 = 0.83418
For the salary data, we find that R² = 0.83
Thus, the model accounts for about 83% of the variability in the
salary response
Adjusted Coefficient of Determination Ra²
Because the coefficient of determination depends on both the number of
observations (n) and the number of independent variables (p), it is
convenient to correct it for degrees of freedom. Hence the use of the
adjusted coefficient of determination.

Adjusted Coefficient of Determination Ra²

Ra² = 1 − (1 − R²) × (n − 1)/(n − p − 1)

Ra² = 1 − (1 − 0.834179) × (20 − 1)/(20 − 2 − 1) = 0.814671
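Plugging the slide's salary-example numbers into the two formulas gives the same values (plain Python, no extra libraries):

```python
# Sums of squares quoted on the earlier slide for the salary example.
SSR, SST = 500.3285, 599.7855
n, p = 20, 2  # 20 actuaries, 2 independent variables

R2 = SSR / SST
R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1)

print(round(R2, 5), round(R2_adj, 5))  # ≈ 0.83418 0.81467
```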
R² and Ra² (The output from R software)