252cp404a 4/19/04

Last Computer Problem

This is an internet project. You are trying to answer the question, ‘how well does manufacturing explain differences in income?’ You should use some measure of income per person or per family in each state as your dependent variable and try to explain it as a function of (to start with) the percent of output or of the labor force in manufacturing. This should start out as a simple regression. Then you should try to see whether there are other variables that explain the differences as well. One possibility is the percent of the adult population with college or high school diplomas.

Possible sources of data are below, but think about what you use, and try to find some other sources. Total income of a state, for example, is a very poor choice compared with a per capita measure, because it is simply going to be high for places with a lot of people without indicating how well off they are. Similarly, the fraction of the workforce with a certain education level is far better than the number.

http://www.nam.org/s_nam/sec.asp?CID=5&DID=3
    Manufacturing share in state economies
    (http://www.nam.org/Docs/IEA/26767_20002001ManufacturingShareandChangeinStateEconomies.pdf?DocTypeID=9&TrackID=&Param=@CategoryID=1156@TPT=2002-2001+Manufacturing+Share+and+Change+in+State+Economics)
http://www.nemw.org/data.htm
    Per capita income by state.
http://www.nemw.org/data.htm
    State personal income per capita.
http://www.bea.doc.gov/bea/regional/data.htm
    Personal income per capita by state.
http://www.census.gov/statab/www/
    Many state statistics, including persons with bachelor’s degrees.
http://www.epinet.org/content.cfm/datazone_index
    Income inequality, median income, unemployment rates.

Doing a regression

Assume that your dependent variable (‘response’ in Minitab talk) is in c2 and your independent variables (‘predictors’ in Minitab) are in c1 and c3. You can always refer to them by their names if the columns are labeled.
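To see what a regression with this column layout actually computes, here is a minimal sketch in Python rather than Minitab. It assumes NumPy is installed. The helper name fit_ols and the numbers are hypothetical and are not data from the course disk; the response c2 is built exactly as 2 + 3*c1 + 0.5*c3 so that the fitted coefficients are easy to recognize.

```python
import numpy as np

# Hypothetical toy data, NOT from the course disk.
# c1 and c3 play the role of the predictors, c2 the response.
c1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
c3 = np.array([10.0, 8.0, 13.0, 9.0, 11.0, 14.0])
c2 = 2.0 + 3.0 * c1 + 0.5 * c3  # built with no noise, for illustration only


def fit_ols(y, *predictors):
    """Return least-squares coefficients [constant, b1, b2, ...]."""
    # Design matrix: a column of ones for the constant, then each predictor.
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return coef


coef = fit_ols(c2, c1, c3)
print(coef)  # constant first, then the coefficients of c1 and c3
```

With real state data the fit will not be exact, which is why the standard errors, t-ratios and p-values in the Minitab output matter.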
The basic commands for a regression with 2 independent variables are

    Regress c2 2 c1 c3

and

    Stepwise c2 c1 c3

There are examples of commands like these in problems such as 14.35, written up on pages such as pp. 9-11 of 252SolnK1. It is often easier to use the pull-down menus in Minitab.

In the examples below I ran data from the disk provided with your text, after loading it with the column labels supplied by the data set. To run ‘regress,’ I used the ‘stat’ pull-down menu and picked ‘regression’ twice. The only option I picked was Variance Inflation Factor.

----- 4/19/2004 4:53:41 PM -----
Welcome to Minitab, press F1 for help.

MTB > Retrieve "C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\Warecost.MTW".
    #I used a pull-down menu and ‘open a new worksheet’
Retrieving worksheet from file: C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\Warecost.MTW
# Worksheet was saved on Tue Dec 02 2003

Results for: Warecost.MTW

MTB > Regress c2 2 c1 c3;
SUBC>   Constant;     #Don’t bother with this option; it is put in automatically.
SUBC>   VIF;          #This is the Variance Inflation Factor. For interpretation, see the text or the outline.
SUBC>   Brief 2.      #This is a default value too. You can play around with Brief 3. These settings control
                      #the amount of detail in the output and can be used to give you predicted values for Y.

Regression Analysis: Sales versus DistCost, Orders

The regression equation is
Sales = 65.6 + 4.32 DistCost + 0.0188 Orders

Predictor       Coef   SE Coef      T      P    VIF
Constant       65.64     57.51   1.14  0.267
DistCost       4.323     1.865   2.32  0.031    6.4
Orders       0.01884   0.03272   0.58  0.571    6.4

#Note that only the coefficient of DistCost is significant.

S = 45.66     R-Sq = 71.4%     R-Sq(adj) = 68.6%

Analysis of Variance

Source            DF       SS      MS      F      P
Regression         2   109116   54558  26.17  0.000
Residual Error    21    43776    2085
Total             23   152892

Source      DF   Seq SS
DistCost     1   108425
Orders       1      691

Unusual Observations
Obs   DistCost    Sales      Fit   SE Fit   Residual   St Resid
 14       72.3   328.00   461.65     9.37    -133.65     -2.99R

R denotes an observation with a large standardized residual

Here I repeated the analysis using the ‘stat’ pull-down menu, picking ‘regression’ and then ‘stepwise.’ The subcommands were all generated by Minitab and are the ones that would have been used if I had just typed in the first line as a command.

MTB > Stepwise c2 c1 c3;
SUBC>   AEnter 0.15;
SUBC>   ARemove 0.15;
SUBC>   Constant.

Stepwise Regression: Sales versus DistCost, Orders

  Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15

Response is Sales on 2 predictors, with N = 24

Step              1
Constant      78.09

DistCost       5.31
T-Value        7.32
P-Value       0.000

S              45.0
R-Sq          70.92
R-Sq(adj)     69.59
C-p             1.3

#The computer came up with ‘Sales’ = 78.09 + 5.31 ‘DistCost’ and quit. It
#decided that ‘Orders’ had very weak explanatory power.

More? (Yes, No, Subcommand, or Help)
SUBC> y
No variables entered or removed

Anyway, your job is to add whatever variables you think ought to explain your income measure. Consider all 50 states your sample. Your report should state what numbers you used, where they came from and what years they cover. Which coefficients were significant, and do you think, on the basis of your results, that manufacturing is an important predictor of a state’s prosperity?

Of course, if you don’t like this assignment, get approval to research something else on the internet. For example, does the percent of the population in prison affect the crime rate (maybe with a few years’ lag)? Or are there better predictors? And get out the Durbin-Watson: prison vs. crime rate is a time series project.
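As a reminder of what the Durbin-Watson statistic measures, here is a minimal sketch in Python rather than Minitab, assuming NumPy is installed. It uses the standard definition, the sum of squared differences between successive residuals divided by the sum of squared residuals. The function name durbin_watson and the residual values are hypothetical, chosen only to show the two extremes.

```python
import numpy as np


def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    e = np.asarray(residuals, dtype=float)
    # Numerator: squared changes between successive residuals.
    # Denominator: squared residuals.
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


# Hypothetical residuals, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips every period
persistent = [1.0, 1.0, 1.0, 1.0]     # same sign and size every period

print(durbin_watson(alternating))  # 3.0
print(durbin_watson(persistent))   # 0.0
```

Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. Compare your statistic with the tabled bounds before drawing a conclusion.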