Last Computer Problem

252cp404a 4/19/04 (Open this document in 'Page Layout' view!)
This is an internet project. You are trying to answer the question, ‘How well does manufacturing explain
differences in income?’ Use some measure of income per person or per family in each state as your
dependent variable and try to explain it as a function of (to start with) the percent of output or of the labor force in
manufacturing. This should start out as a simple regression. Then see whether
other variables explain the differences as well. One possibility is the percent of the adult population
with college or high school diplomas. Possible sources of data are listed below, but think about what you use, and
try to find some other sources. Total income of a state, for example, is a very poor choice compared with a
per capita measure, because it will simply be high for places with a lot of people without indicating
how well off they are. Similarly, the fraction of the workforce with a certain education level is far better than
the number.
http://www.nam.org/s_nam/sec.asp?CID=5&DID=3 Manufacturing share in state economies
(http://www.nam.org/Docs/IEA/26767_20002001ManufacturingShareandChangeinStateEconomies.pdf?DocTypeID=9&TrackID=&Param=@CategoryID=1156@TPT=2002-2001+Manufacturing+Share+and+Change+in+State+Economics)
http://www.nemw.org/data.htm Per capita income by state.
http://www.nemw.org/data.htm State personal income per capita.
http://www.bea.doc.gov/bea/regional/data.htm Personal income per capita by state.
http://www.census.gov/statab/www/ Many state statistics, including persons with bachelor’s degrees.
http://www.epinet.org/content.cfm/datazone_index Income inequality, median income, unemployment rates.
Doing a regression
Assume that your dependent variable (‘response’ in Minitab talk) is in c2 and your independent variables
(‘predictors’ in Minitab) are in c1 and c3. You can always refer to them by their names if the columns are
labeled.
The basic commands for a regression with 2 independent variables are
Regress c2 2 c1 c3
and
Stepwise c2 c1 c3
There are examples of commands like these in problems such as 14.35, written up on pages such as pp. 9-11 of
252SolnK1. It is often easier to use the pull-down menus in Minitab. In the examples below I ran data from
the disk provided with your text, downloaded with the labels supplied by the data set. To run
‘regress,’ I used the ‘stat’ pull-down menu and picked ‘regression’ twice. The only option I picked was
Variance Inflation Factor.
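For anyone curious about what ‘Regress c2 2 c1 c3’ is computing, here is a minimal Python sketch of least squares with a constant and two predictors. It is not Minitab's own code. The function names fit_ols and solve_linear_system are made up for illustration, and the data are toy numbers chosen so that the right answer is known in advance; they are not the Warecost data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation acts on both sides at once.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(y, predictors):
    """Return [constant, b1, b2, ...] that minimize the sum of squared residuals."""
    # Each row is (1, x1, x2, ...); the leading 1 produces the constant term.
    rows = [[1.0] + list(vals) for vals in zip(*predictors)]
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up toy data (NOT from Warecost.MTW): c2 is exactly 1 + 2*c1 + 3*c3.
c1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
c3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
c2 = [1 + 2 * a + 3 * b for a, b in zip(c1, c3)]

coefficients = fit_ols(c2, [c1, c3])
print([round(b, 4) for b in coefficients])
```

Because the toy response was built without any error term, the fitted constant and slopes should come back as essentially 1, 2 and 3. With real state data the fit will not be exact, which is why the standard errors, t-ratios and p-values in the Minitab output matter.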
————— 4/19/2004 4:53:41 PM ————————————————————
Welcome to Minitab, press F1 for help.
252cp403 4/19/04
MTB > Retrieve "C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive
D\MINITAB\Warecost.MTW". #I used a pull-down menu and ‘open a new worksheet’
Retrieving worksheet from file: C:\Documents and Settings\RBOVE.WCUPANET\My
Documents\Drive D\MINITAB\Warecost.MTW
# Worksheet was saved on Tue Dec 02 2003
Results for: Warecost.MTW
MTB > Regress c2 2 c1 c3;
SUBC>   Constant;
#Don’t bother with this option; it is put in automatically.
SUBC>   VIF;
#This is the Variance Inflation Factor. For interpretation, see the text or the outline.
SUBC>   Brief 2.
#This is a default value too. You can play around with brief 3; these control the amount of detail in the output and can be used to give you predicted values for Y.
Regression Analysis: Sales versus DistCost, Orders

The regression equation is
Sales = 65.6 + 4.32 DistCost + 0.0188 Orders

Predictor       Coef   SE Coef      T      P   VIF
Constant       65.64     57.51   1.14  0.267
DistCost       4.323     1.865   2.32  0.031   6.4
Orders       0.01884   0.03272   0.58  0.571   6.4

S = 45.66   R-Sq = 71.4%   R-Sq(adj) = 68.6%

#Note that only the coeff of DistCost is significant.

Analysis of Variance
Source           DF      SS     MS      F      P
Regression        2  109116  54558  26.17  0.000
Residual Error   21   43776   2085
Total            23  152892

Source     DF  Seq SS
DistCost    1  108425
Orders      1     691

Unusual Observations
Obs  DistCost   Sales     Fit  SE Fit  Residual  St Resid
 14      72.3  328.00  461.65    9.37   -133.65    -2.99R

R denotes an observation with a large standardized residual
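The summary numbers in the regression output all follow from the sums of squares and degrees of freedom in the Analysis of Variance table. The short Python sketch below redoes that arithmetic as a check. The variable names are my own, and the last line assumes the usual definition VIF = 1/(1 - R-Sq of one predictor regressed on the other), so the implied R-Sq is only as accurate as the rounded 6.4.

```python
import math

# Sums of squares and degrees of freedom copied from the Analysis of Variance table.
ss_regression, df_regression = 109116.0, 2
ss_error, df_error = 43776.0, 21
ss_total, df_total = 152892.0, 23

ms_regression = ss_regression / df_regression   # mean square for regression
ms_error = ss_error / df_error                   # mean square error

r_sq = ss_regression / ss_total                  # share of variation explained
r_sq_adj = 1 - ms_error / (ss_total / df_total)  # adjusted for degrees of freedom
f_stat = ms_regression / ms_error                # overall F test
s = math.sqrt(ms_error)                          # standard error of the estimate

# Assumption: VIF = 1 / (1 - R-Sq between the two predictors), using the rounded 6.4.
implied_predictor_r_sq = 1 - 1 / 6.4

print(f"R-Sq      = {100 * r_sq:.1f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F         = {f_stat:.2f}")
print(f"S         = {s:.2f}")
print(f"R-Sq between DistCost and Orders implied by VIF: {implied_predictor_r_sq:.2f}")
```

These reproduce the 71.4%, 68.6%, 26.17 and 45.66 shown by Minitab. The implied R-Sq of roughly 0.84 between DistCost and Orders says the two predictors overlap heavily, which helps explain why Orders adds so little once DistCost is in the equation.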
Here I repeated the analysis using the ‘stat’ pull-down menu, picking ‘regression’ and then ‘stepwise.’ The
subcommands were all generated by Minitab and are the ones it would have used if I had just typed
the first line as a command.
MTB > Stepwise c2 c1 c3;
SUBC>   AEnter 0.15;
SUBC>   ARemove 0.15;
SUBC>   Constant.

Stepwise Regression: Sales versus DistCost, Orders

Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15

Response is Sales on 2 predictors, with N = 24

Step            1
Constant    78.09

DistCost     5.31
T-Value      7.32
P-Value     0.000

S            45.0
R-Sq        70.92
R-Sq(adj)   69.59
C-p           1.3

# The computer came up with ‘Sales’ = 78.09 + 5.31 ‘DistCost’ and quit. It
#decided that ‘Orders’ had very weak explanatory power.

More? (Yes, No, Subcommand, or Help)
SUBC> y
No variables entered or removed
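The stepwise summary can be checked by hand against the sequential sums of squares from the two-predictor run. The Python sketch below is only an arithmetic check, not a description of Minitab's internal algorithm. The variable names are my own, the C-p line assumes the usual Mallows formula SSE_p/MSE_full - (n - 2p), and the p-value for Orders is simply copied from the earlier output rather than recomputed.

```python
import math

n = 24
alpha_to_enter = 0.15

# Figures copied from the two-predictor regression output.
ss_total = 152892.0
seq_ss_distcost = 108425.0      # explained by DistCost alone
seq_ss_orders = 691.0           # extra explained by adding Orders
ms_error_full = 43776.0 / 21    # mean square error with both predictors
p_orders_reported = 0.571       # p-value for Orders as printed by Minitab

# Step 1 model: constant plus DistCost, so p = 2 parameters.
p = 2
sse_step1 = ss_total - seq_ss_distcost
r_sq_step1 = seq_ss_distcost / ss_total
r_sq_adj_step1 = 1 - (sse_step1 / (n - p)) / (ss_total / (n - 1))
s_step1 = math.sqrt(sse_step1 / (n - p))
cp_step1 = sse_step1 / ms_error_full - (n - 2 * p)  # assumed Mallows C-p formula

# Partial F for adding Orders; its square root matches the t-ratio for Orders.
partial_f_orders = seq_ss_orders / ms_error_full
t_orders = math.sqrt(partial_f_orders)
orders_enters = p_orders_reported < alpha_to_enter

print(f"R-Sq = {100 * r_sq_step1:.2f}  R-Sq(adj) = {100 * r_sq_adj_step1:.2f}")
print(f"S = {s_step1:.1f}  C-p = {cp_step1:.1f}")
print(f"t for Orders = {t_orders:.2f}, enters at alpha 0.15: {orders_enters}")
```

The results agree with the 70.92, 69.59, 45.0 and 1.3 in the stepwise table, and the t-ratio of about 0.58 for Orders corresponds to a p-value well above the 0.15 needed to enter.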
Anyway, your job is to add whatever variables you think ought to explain your income measure. Consider all
50 states your sample. Your report should tell what numbers you used, where they came from, and what years they cover.
Which coefficients were significant, and do you think, on the basis of your results, that manufacturing is an
important predictor of a state’s prosperity?
Of course, if you don’t like this assignment, get approval to research something else on the internet. For
example, does the percent of the population in prison affect the crime rate (maybe with a few years’ lag)?
Or are there better predictors? And get out the Durbin-Watson statistic: prison vs. crime rate is a time series project.
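For a time series project, the Durbin-Watson statistic is computed from the regression residuals taken in time order: the sum of squared changes between successive residuals, divided by the sum of squared residuals. Here is a minimal Python sketch. The function name durbin_watson and both residual series are made up for illustration; with real data the residuals come from your own regression, and the result has to be compared with the table bounds in the text.

```python
def durbin_watson(residuals):
    """Durbin-Watson d for residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    # Squared change from each residual to the next one in time.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return numerator / denominator


# Made-up residuals, for illustration only.
drifting = [1.0, 0.8, 0.6, 0.2, -0.1, -0.5, -0.8, -1.0]       # neighbors look alike
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]    # neighbors flip sign

d_drifting = durbin_watson(drifting)
d_alternating = durbin_watson(alternating)
print(round(d_drifting, 2), round(d_alternating, 2))
```

The statistic always lies between 0 and 4. Values well below 2, as with the drifting series, point toward positive autocorrelation; values well above 2, as with the alternating series, point toward negative autocorrelation; values near 2 suggest little first-order autocorrelation.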