Statistics 103 Probability and Statistical Inference
Instructions for lab 9
Lab Objective
Practice with multiple linear regression, residual diagnostics, and dummy variables.
Lab Procedures
Many macroeconomic studies use cross-sectional data (i.e., data from the same time frame) from
countries around the world. Of particular interest are the factors related to Gross National Product
(GNP), which is essentially the total value of everything the country produces from all sources.
Open the data set countries.JMP. It contains economic data for 97 countries from around the
world. All monetary values are expressed in U.S. dollars. The variables include:
GNP (per capita)            the GNP divided by the number of people in the country.
Birth Rate (per 1000)       the number of births per 1000 people in the country.
Death Rate (per 1000)       the number of deaths per 1000 people in the country.
Infant Deaths (per 1000)    the number of infant deaths per 1000 people in the country.
Life Expectancy (Males)     average age at death for men.
Life Expectancy (Females)   average age at death for women.
Region                      Eastern European and former Soviet Union countries = 1
                            South American and Central American countries = 2
                            "Western" countries (e.g., France, Japan, USA) = 3
                            Middle Eastern countries = 4
                            South Asian countries = 5
                            African countries = 6
Country                     name of country.
Fitting multiple linear regression model
1. Analyze > Fit Model
2. Select the response variable and add it to the Y box.
3. Select each predictor and add it to the Construct Model Effects box.
4. To add an interaction, highlight a variable in the Select Columns list and a variable in the
Construct Model Effects box, then click Cross. The interaction term should appear in the
Construct Model Effects box.
5. Click Run Model.
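For readers who want to see what the fit is doing behind the dialog, here is a minimal Python sketch of least squares with an interaction term. It is not JMP's implementation. The helper names (`solve_linear_system`, `fit_least_squares`) and all data values are made up for illustration, and the interaction column is the plain product of the two predictors, whereas JMP may center the variables before multiplying, depending on its settings.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the fit is not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(columns, y):
    """Return [intercept, coef_1, ...] minimizing the sum of squared residuals."""
    # Each row is (1, x_1, ..., x_p); the leading 1 gives the intercept.
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up illustrative values, NOT taken from countries.JMP.
x1 = [10.0, 20.0, 30.0, 40.0, 15.0, 35.0]
x2 = [5.0, 12.0, 7.0, 9.0, 11.0, 6.0]
cross = [a * b for a, b in zip(x1, x2)]  # uncentered interaction column
# Response built from known coefficients so the fit can be checked by eye.
y = [2.0 + 0.5 * a - 1.0 * b + 0.1 * c for a, b, c in zip(x1, x2, cross)]

coefficients = fit_least_squares([x1, x2, cross], y)
print([round(c, 4) for c in coefficients])
```

Because the toy response has no noise, the printed coefficients should match the ones used to build `y` (2.0, 0.5, -1.0, 0.1) up to rounding error. With real data the estimates will not be exact, and JMP also reports standard errors and tests that this sketch omits.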
Questions
1. Does a normal curve describe the distribution of per capita GNP well?
2. What is the regression equation for predicting per capita GNP (Y) from birth rate (X)?
3. How far, on average, do the observed GNPs deviate from the regression line?
4. Click on the red arrow beside Linear Fit and select Plot Residuals. Does the plot of residuals
versus the predictor suggest any violations of the regression assumptions? If so, what are
they?
5. Let's do the regression using the (natural) logarithm of per capita GNP as the dependent
variable.
a. What is the regression equation for predicting the logarithm of per capita GNP (Y) from
birth rate (X)?
b. Also plot the residuals versus the predictor. Do the linear model assumptions look more
reasonable for this model?
6. Interpret the effect of birth rate on LogGNP and give a 90% confidence interval for the true
regression slope.
7. Fit a linear model for LogGNP and include both birth rate and death rate simultaneously.
a. Interpret the slope coefficient for birth rate.
b. How does the slope for birth rate change between the models with and without death rate
as a predictor?
c. Describe the relationship between birth rate and death rate.
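The quantities asked for in questions 2 through 6 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a replacement for the JMP output. The function names and every data value below are invented for the example. The interval uses the usual form slope ± t* × SE(slope) with n − 2 degrees of freedom, and the critical value t* has to be looked up separately.

```python
import math


def simple_regression(x, y):
    """Least-squares line y = b0 + b1*x, with RMSE and slope standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)
    rmse = math.sqrt(sse / (n - 2))   # typical deviation from the line
    slope_se = rmse / math.sqrt(sxx)  # standard error of the slope
    return {
        "intercept": intercept,
        "slope": slope,
        "rmse": rmse,
        "slope_se": slope_se,
        "residuals": residuals,
    }


def slope_interval(fit, t_critical):
    """Confidence interval for the slope: estimate +/- t* times its SE."""
    half_width = t_critical * fit["slope_se"]
    return fit["slope"] - half_width, fit["slope"] + half_width


# Made-up illustrative values, NOT taken from countries.JMP.
birth_rate = [12.0, 15.0, 22.0, 30.0, 38.0, 45.0]
gnp = [21000.0, 9500.0, 3100.0, 1200.0, 480.0, 260.0]

raw_fit = simple_regression(birth_rate, gnp)
log_fit = simple_regression(birth_rate, [math.log(g) for g in gnp])

# 90% interval with n - 2 = 4 degrees of freedom: t* is the 0.95 quantile
# of the t distribution with 4 df, about 2.132 (from a t table).
low, high = slope_interval(log_fit, t_critical=2.132)
print(raw_fit["slope"], raw_fit["rmse"])
print(log_fit["slope"], (low, high))
```

Plotting `raw_fit["residuals"]` and `log_fit["residuals"]` against `birth_rate` mirrors the two residual plots the questions ask for. For the 97 countries in the lab data the degrees of freedom, and therefore t*, will be different from the toy value used here.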
For the following questions, use the indicator (dummy) variables for country region to fit an
appropriate model. Write down the model and the estimated regression coefficients. Justify your
conclusion by either reporting a confidence interval or carrying out a hypothesis test.
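As a rough sketch of what an indicator variable looks like, the Python below turns the Region codes into a 0/1 column for Western countries (code 3 in the variable list) and multiplies it by a predictor to form an interaction column. The function names and the data values are hypothetical. In JMP the same columns would be built with a column formula rather than code.

```python
WESTERN_CODE = 3  # Region code for "Western" countries in the variable list


def western_indicator(region_codes):
    """Return 1 for Western countries and 0 for every other region."""
    return [1 if code == WESTERN_CODE else 0 for code in region_codes]


def interaction(indicator, predictor):
    """Multiply a 0/1 indicator by a predictor, element by element."""
    return [d * x for d, x in zip(indicator, predictor)]


# Made-up illustrative values, NOT taken from countries.JMP.
region = [1, 3, 6, 3, 2, 5]
birth_rate = [14.0, 13.0, 46.0, 11.0, 29.0, 33.0]

western = western_indicator(region)
birth_x_western = interaction(western, birth_rate)

print(western)          # [0, 1, 0, 1, 0, 0]
print(birth_x_western)  # [0.0, 13.0, 0.0, 11.0, 0.0, 0.0]
```

The interaction column equals the birth rate for Western countries and zero elsewhere, which is what lets a model estimate a separate birth rate slope for the Western group.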
8. After controlling for birth rate and death rate, is there evidence that Western countries have
higher log GNP than the other countries?
9. After controlling for death rate, is there evidence that the association between birth rate and
log GNP differs between Western countries and the other countries?
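One possible way to write the models for questions 8 and 9 is sketched below. This is a suggested setup under assumed notation, not the only acceptable answer. Here W_i is a hypothetical indicator equal to 1 if country i is Western (Region = 3) and 0 otherwise, Birth_i and Death_i are the birth and death rates, and the errors ε_i are assumed to satisfy the usual regression assumptions.

```latex
\begin{align*}
\text{(Question 8)}\quad \log(\mathrm{GNP}_i) &= \beta_0 + \beta_1\,\mathrm{Birth}_i
    + \beta_2\,\mathrm{Death}_i + \beta_3\,W_i + \varepsilon_i \\
\text{(Question 9)}\quad \log(\mathrm{GNP}_i) &= \beta_0 + \beta_1\,\mathrm{Birth}_i
    + \beta_2\,\mathrm{Death}_i + \beta_3\,W_i
    + \beta_4\,(\mathrm{Birth}_i \times W_i) + \varepsilon_i
\end{align*}
```

Under this setup, β_3 in the first model is the difference in mean log GNP between Western and other countries with the same birth and death rates, so question 8 concerns whether β_3 is greater than zero. In the second model the birth rate slope is β_1 for the other countries and β_1 + β_4 for Western countries, so question 9 concerns whether β_4 differs from zero.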