Multiple Regression Analysis

Module H8 Practical 6
Objectives:
By the end of this practical you should be able to:
• conduct a multiple regression analysis and write down the fitted equation
• explain what hypotheses are being tested through the t-tests
• interpret the results of such tests of hypotheses
• interpret the meaning of R² with greater confidence
• conduct a residual analysis and interpret its results with greater confidence
• appreciate how t-probabilities change when x-variables are dropped from the model
1. The data for this practical concern rural female-headed households from the Kigoma
region of Tanzania. They are found in the Excel sheet Kigoma_RuralWomen in file H8_data.xls,
and also in the Stata file named Kigoma_RuralWomen.dta. A listing of the variables is
given after part (g) of this question.
In this practical, you will be considering the relationship between log consumption
expenditure (in variable lnexpdf) and the following four variables:
• Household size (in variable hhsize)
• Age of household head (in variable age)
• Number of employed adults in household (in variable empl)
• Usual number of meals per day (in variable num_meal)
The aim is to investigate the extent to which hhsize, age, empl and num_meal explain
the variability in income poverty so that the resulting multiple regression equation could be
used to predict poverty levels of households not in the data set.
SADC Course in Statistics
(a) Start by producing a matrix plot between the dependent variable, lnexpdf, and the
four explanatory variables. What conclusions can you make from these plots? Write these
down below.
(b) Fit a multiple linear regression model with log consumption expenditure (lnexpdf) as the
dependent variable and the variables hhsize, age, empl and num_meal as the explanatory variables.
Note down the parameter estimates, their standard errors and the t-probabilities in the
table below.
Variable name          Parameter estimate    Standard error    t-probability
-----------------------------------------------------------------------------
household size
age
number employed
no of meals per day
constant
What conclusions may be drawn from the results above? Write them down with respect to
the contribution that each of the four explanatory variables makes to the model.
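The hypotheses behind the t-probabilities in the table can be written out as below. This is a sketch assuming the usual linear model with independent normal errors; the symbols β_j, b_j, ε_i and n (the number of households) are notation introduced here and do not appear in the practical itself.

```latex
\begin{align*}
\texttt{lnexpdf}_i &= \beta_0 + \beta_1\,\texttt{hhsize}_i + \beta_2\,\texttt{age}_i
   + \beta_3\,\texttt{empl}_i + \beta_4\,\texttt{num\_meal}_i + \varepsilon_i,
   \qquad \varepsilon_i \sim N(0, \sigma^2) \\
H_0 &: \beta_j = 0 \quad \text{versus} \quad H_1 : \beta_j \neq 0
   \quad \text{(with the other three variables kept in the model)} \\
t_j &= \frac{b_j}{\operatorname{s.e.}(b_j)} \sim t_{n-5} \quad \text{under } H_0
\end{align*}
```

Each t-probability therefore measures the contribution of one variable after allowing for the other three, which is why it can change when a variable is dropped.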
(c) How much of the variability in log(consumption expenditure) is explained by the four
variables included in the regression?
How much of the variation has been left unexplained?
Give an estimate for the measure of unexplained variability:
What are the degrees of freedom associated with this estimate?
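The quantities asked for in part (c) are related as sketched below. Here y_i stands for the observed lnexpdf values, ŷ_i for the fitted values and n for the number of households; RSS and TSS are shorthand introduced here for the residual and total sums of squares.

```latex
\begin{align*}
R^2 &= 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \qquad
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
\text{proportion unexplained} &= 1 - R^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}} \\
s^2 &= \frac{\mathrm{RSS}}{n - 5}
   \quad \text{(residual mean square, on } n - 5 \text{ degrees of freedom)}
\end{align*}
```

The divisor n − 5 arises because five parameters are estimated: four slopes and the constant.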
(d) Would you consider dropping any of the variables in your model and re-fitting the
model with the remaining variables? If so, which variable would you consider dropping,
and why?
(e) Re-fit the model with variables hhsize, age, and empl. Are you happy with the
resulting model? How much of the variation in lnexpdf is now left unexplained?
Note: Although this seems to suggest that dropping num_meal was the best choice in (d), this is not
necessarily the case as you will see in the next session.
(f) Fit a model relating lnexpdf to hhsize and empl and write down answers to the
following questions.
• What is the equation describing your multiple regression model?
• Are both hhsize and empl important in explaining the variability in lnexpdf? Give
reasons for your answer.
• What percentage of the variability in lnexpdf is explained by hhsize and empl?
(g) Finally, conduct a residual analysis to examine possible departures from model
assumptions and to identify any extreme observations. What are your conclusions?
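One simple way to screen for extreme observations is to scale each residual by the residual standard deviation and look for values larger than about 2 in absolute size. The sketch below shows the idea in plain Python. The function names, the cutoff of 2 and the eight observed/fitted values are all assumptions made up for illustration; they are not taken from the Kigoma data, and the scaling ignores leverage, so it is only a rough screen.

```python
import math


def standardised_residuals(observed, fitted, n_params):
    """Residuals divided by s = sqrt(RSS / (n - n_params))."""
    resid = [o - f for o, f in zip(observed, fitted)]
    df = len(resid) - n_params          # residual degrees of freedom
    s = math.sqrt(sum(r * r for r in resid) / df)
    return [r / s for r in resid]


def flag_extreme(std_resid, cutoff=2.0):
    """Indices (0-based) of observations with |standardised residual| > cutoff."""
    return [i for i, r in enumerate(std_resid) if abs(r) > cutoff]


# Made-up values purely for illustration; NOT from the Kigoma data set.
observed = [9.1, 9.4, 8.8, 9.9, 9.0, 9.3, 11.2, 9.2]
fitted = [9.2, 9.3, 9.0, 9.6, 9.1, 9.3, 9.5, 9.3]

# n_params=3 corresponds to a model with two x-variables plus a constant.
std = standardised_residuals(observed, fitted, n_params=3)
print(flag_extreme(std))
```

In practice the standardised residuals would also be plotted against the fitted values and against each x-variable to check for curvature or non-constant variance.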
Listing of data on Kigoma rural female headed households
-----------------------------------------------------------------------------
                storage  display   value
variable name   type     format    label      variable label
-----------------------------------------------------------------------------
hhid            float    %9.0g                household id
urb_rur         float    %9.0g     urb_rur    urban or rural
region          float    %9.0g     region     region
zone            float    %9.0g                agro-ecological zone
stratum         float    %9.0g     stratum    division of tanzania into 3
                                              groups
hh_wt           float    %9.0g                final household weight
expen           float    %9.0g                expenditure per adult equivalent
lnexpdf         float    %9.0g                log (to base e) of expenditure
                                              per adult equivalent
hhsize          float    %9.0g                household size
hhsize2         float    %9.0g
age             float    %9.0g                age of household head
agesq           float    %9.0g
sexhead         float    %9.0g     sexhead    sex
edu             float    %9.0g     edu        education level of hh head
act1            float    %9.0g     act1       primary activity of household
                                              head
act2            float    %9.0g     act2       secondary activity of household
                                              head
empl            float    %9.0g     empl       number of adults employed (inc.
                                              self-empl)
depratio        float    %9.0g                dependency ratio
pprm            float    %9.0g                continuous variable for persons
                                              per room
p_room          float    %9.0g     p_room     persons per sleeping room
walls           float    %9.0g     walls      status of walls
water           float    %9.0g     water      source of water supply
fuelight        float    %9.0g     fuelight   source of fuel for lighting
fuelght2        float    %9.0g     fuelght2   source of fuel for lighting
                                              (detailed)
toilet          float    %9.0g     toilet     toilet facilities
qmeat           float    %9.0g                in past wk, days meat eaten
qmilk           float    %9.0g                in past wk, days milk taken
num_meal        float    %9.0g     num_meal   no. of meals per day
radio           float    %9.0g     radio      radio or radio cassette owned?
bicycle         float    %9.0g     bicycle    bicycle owned?
watch           float    %9.0g     watch      watch owned?
iron            float    %9.0g     iron       iron owned?
mosqnet         float    %9.0g     mosqtnet   mosquito net owned?
table           float    %9.0g     table      table owned?
sofa            float    %9.0g     sofa       sofa owned?
lamp            float    %9.0g     lamp       lamp owned?
soap            float    %9.0g     soap       paid for soap (either bar or
                                              powder)?
wheatf          float    %9.0g     wheatf     paid for wheat flour?
anyland         float    %9.0g     anyland    household owns any land for
                                              farming/ pastoralism
landarea        float    %9.0g                acres of land owned by hh for
                                              farming/pastoralism
cashinc         float    %9.0g     cashinc    households main source of cash
-----------------------------------------------------------------------------
2. If you have time, try also the exercise below.
The average annual rainfall (mm) is given below for 16 stations in the southern Pennines,
UK, together with the elevation of each rain gauge and the maximum altitude within 2 km of it.
Of interest is whether annual rainfall is related to the elevation, the altitude, or both.
The data is also available in the worksheet named penrain in the Excel file H8_data.xls.
It is also available in the Stata file named penrain.dta.
Station              Average Annual    Gauge Elevation    Maximum Altitude (m)
                     Rainfall (mm)     (metres)           within 2 km
------------------------------------------------------------------------------
Wessenden            1273              366                518
Blackmoorfoot        1094              244                328
Huddersfield          956              235                290
Yateholme            1616              308                547
Harden Moss          1345              369                475
Wakefield             670               35                 76
Langsett             1059              250                370
Underbank             949              184                355
Cannon Hall           738              113                210
Barnsley              640               40                118
Chew                 1536              532                541
Bottoms Reservoir    1074              153                385
Yeoman Hey           1329              239                503
Dunford              1442              329                484
Broomhead Manor      1246              418                490
Moor Hall             856              124                340
(a) Carry out a suitable analysis to investigate whether elevation or altitude or both
contribute significantly to the model. You may wish to conduct two simple linear
regressions first, one with elevation alone as the explanatory variable, and then with altitude
alone as the explanatory variable. Finally fit both elevation and altitude. Note down the
key results of these three regressions in the tables below and give your comments.
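The three regressions in part (a) can be sketched in plain Python as below, using the rainfall, elevation and altitude values from the table in this question. The helper names `invert` and `fit_ols` are hypothetical, written for this illustration only; the practical itself expects the fits to be done in Stata or Excel. The sketch reports estimates, standard errors, t-statistics and adjusted R², but not t-probabilities, which would need the t-distribution on the stated residual degrees of freedom.

```python
import math

# (rainfall mm, gauge elevation m, maximum altitude within 2 km m), in table order.
PENRAIN = [
    (1273, 366, 518), (1094, 244, 328), (956, 235, 290), (1616, 308, 547),
    (1345, 369, 475), (670, 35, 76), (1059, 250, 370), (949, 184, 355),
    (738, 113, 210), (640, 40, 118), (1536, 532, 541), (1074, 153, 385),
    (1329, 239, 503), (1442, 329, 484), (1246, 418, 490), (856, 124, 340),
]


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_ols(y, columns):
    """Least-squares fit of y on a constant plus the given x-variable columns.

    Coefficients are returned with the constant first, then one per column.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)
    df = n - p                               # residual degrees of freedom
    s2 = rss / df                            # residual mean square
    se = [math.sqrt(s2 * inv[a][a]) for a in range(p)]
    return {
        "coef": coef, "se": se, "t": [c / s for c, s in zip(coef, se)],
        "df": df, "rss": rss, "adj_r2": 1 - (rss / df) / (tss / (n - 1)),
    }


rain = [row[0] for row in PENRAIN]
elev = [row[1] for row in PENRAIN]
alt = [row[2] for row in PENRAIN]

models = {
    "elevation": fit_ols(rain, [elev]),
    "altitude": fit_ols(rain, [alt]),
    "elevation + altitude": fit_ols(rain, [elev, alt]),
}
for name, m in models.items():
    print(name,
          "| coef:", [round(c, 3) for c in m["coef"]],
          "| s.e.:", [round(s, 3) for s in m["se"]],
          "| t:", [round(t, 2) for t in m["t"]],
          "| residual df:", m["df"],
          "| adj R2:", round(m["adj_r2"], 3))
```

Comparing the t-statistic of each variable in the two-variable fit with its value in the single-variable fit shows how the apparent importance of a variable depends on what else is in the model.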
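(i) Regression of annual rainfall on elevation:

Variable name          Parameter estimate    Standard error    t-probability
-----------------------------------------------------------------------------
elevation
constant

Adj. R²:

Comments:

(ii) Regression of annual rainfall on altitude:

Variable name          Parameter estimate    Standard error    t-probability
-----------------------------------------------------------------------------
altitude
constant

Adj. R²:

Comments:

(iii) Regression of annual rainfall on elevation and altitude:

Variable name          Parameter estimate    Standard error    t-probability
-----------------------------------------------------------------------------
elevation
altitude
constant

Adj. R²:

Comments: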
(iv) Overall conclusions from the above analyses:
(b) Once you have chosen the model that best describes the variation in annual rainfall
amounts, carry out an analysis of residuals to determine the validity of your conclusions. If
there are doubts about the model assumptions, what remedial action might you consider?
Take these actions and comment on whether you think they have been effective.