4/1/05

advertisement
SENIC-(Applied Linear Regression Models
by John Neter, Michael H Kutner, William Wasserman, Christopher J. Nachtsheim)
Data: This data consists of a random sample of 113 hospitals selected from the
original 338 hospitals surveyed.
Each line of the data set has an identification number and provides information on 11
other variables for a single hospital. The data presented here are for the 1975-76
study period. The 12 variables are:
Variable Variable name
number
1
Identification
number
2
Length of stay
3
4
Age
Infection risk
5
Routine culturing
ratio
6
Routine chest Xray ratio
7
8
Number of beds
Medical school
affiliation
Region
Average daily
census
Number of nurses
9
10
11
12
Available facilities
and services
Description
1-113
Average length of all patients in
Hospital (in days)
Average age of patients (in years)
Average estimated probability of
Acquiring infection in hospital
(in percent)
Ratio of number of cultures performed
To number of patients without signs or symptoms of
hospital-acquired infection, times 100.
Ratio of number of X-rays performed to number of
patients without signs or symptoms of pneumonia, times
100
Average number of beds in hospital during study period
1=Yes, 2=No
Geographic region, where: 1=NE, 2=NC, 3=S, 4=W
Average number of patients in hospital per day during
study period
Average number of full-time equivalent registered and
licensed practical nurses during study period (number
full time plus one half the number part time)
Percent of 35 potential facilities and services that are
provided by the hospital
The data is in
http://www.stat.uiowa.edu/~stramer/S150/SENIC.MTW
Do not open the file. Save it first.
1
1. Two models have been proposed for predicting the average length of patient stay in a
hospital (Y). Model I utilizes as predictor variables age, infection risk, and available
facilities and services. Model II uses as predictor variables number of beds, infection risk,
and available facilities and services.
a. For each of the two proposed models, fit first-order regression model with three
predictor variables.
2
b. Calculate R for each model. Is the model clearly preferable in terms of this
measure?
c. For each model, obtain the residuals. In terms of the residuals, is one model
clearly more appropriate than the other?
2. For each geographic region regress infection risk against the predictor variables age,
routine culturing ratio, average daily census, and available facilities and services.
a. Use first-order regression model with four predictor variables. State the estimated
regression functions.
b. Are the estimated regression functions similar for the four regressions? Discuss.
2
c. Calculate MSE and R for each region. Are these measures similar for the four
regions? Discuss
d. Obtain the residuals for each fitted model. State your finding.
3. For predicting the average length of patient stay in a hospital (Y), it has been decided
to include age and infection risk as predictor variables.
a. For each of the following variables, calculate the coefficient of partial
determination given that age and infection risk are included in the model: routine
culturing ratio, average daily census, number of nurses, and available facilities
and services.
b. Using the F test statistic, test whether or not the variable determined to be best in
part (a) is helpful in the regression model when age and infection risk are
included in the model; use   0.05 .
2
4. Length of stay is to be predicted, and the pool of potential predictor variables includes
all other variables in the data set except medical school affiliation and region. It is
believed that a model with log Y as the response variable and the predictor variables in
the first-order terms with no interaction terms will be appropriate. Consider cases 57-113
to constitute the mode-building data set to be used for the following analysis.
a. Obtain the correlation matrix of the X variables. Is there evidence of strong linear
pair wise association among the predictor variables here?
b. Find the best subset according to the R-square-adjusted criterion.
3
Consumer Expenditure and Money Stock
The following table gives quarterly data from 1952 to 1956 on consumer expenditure (Y)
and the stock money (X), both measured in billions of current dollars for the United
States.
Year Quarter Expenditure
1952
1
214.6
1952
2
217.7
1952
3
219.6
1952
4
227.2
1953
1
230.9
1953
2
233.3
1953
3
234.1
1953
4
232.3
1954
1
233.7
1954
2
236.5
1954
3
238.7
1954
4
243.2
1955
1
249.4
1955
2
254.3
1955
3
260.9
1955
4
263.3
1956
1
265.6
1956
2
268.2
1956
3
270.4
1956
4
275.6
Stock
159.3
161.2
162.8
164.6
165.9
167.9
168.3
169.7
170.5
171.6
173.9
176.1
178.0
179.1
180.2
181.2
181.6
182.5
183.3
184.3
a. Regress Y on X and summarize the results.
b. Compute the Durbin-Watson statistic D. What conclusion regarding the presence
of autocorrelation would you draw from D?
c. Apply the first iteration of the Cochrane -Orcutt procedure to the data. What is
your conclusion?
4
Download