SENIC-(Applied Linear Regression Models by John Neter, Michael H Kutner, William Wasserman, Christopher J. Nachtsheim) Data: This data consists of a random sample of 113 hospitals selected from the original 338 hospitals surveyed. Each line of the data set has an identification number and provides information on 11 other variables for a single hospital. The data presented here are for the 1975-76 study period. The 12 variables are: Variable Variable name number 1 Identification number 2 Length of stay 3 4 Age Infection risk 5 Routine culturing ratio 6 Routine chest Xray ratio 7 8 Number of beds Medical school affiliation Region Average daily census Number of nurses 9 10 11 12 Available facilities and services Description 1-113 Average length of all patients in Hospital (in days) Average age of patients (in years) Average estimated probability of Acquiring infection in hospital (in percent) Ratio of number of cultures performed To number of patients without signs or symptoms of hospital-acquired infection, times 100. Ratio of number of X-rays performed to number of patients without signs or symptoms of pneumonia, times 100 Average number of beds in hospital during study period 1=Yes, 2=No Geographic region, where: 1=NE, 2=NC, 3=S, 4=W Average number of patients in hospital per day during study period Average number of full-time equivalent registered and licensed practical nurses during study period (number full time plus one half the number part time) Percent of 35 potential facilities and services that are provided by the hospital The data is in http://www.stat.uiowa.edu/~stramer/S150/SENIC.MTW Do not open the file. Save it first. 1 1. Two models have been proposed for predicting the average length of patient stay in a hospital (Y). Model I utilizes as predictor variables age, infection risk, and available facilities and services. Model II uses as predictor variables number of beds, infection risk, and available facilities and services. a. For each of the two proposed models, fit first-order regression model with three predictor variables. 2 b. Calculate R for each model. Is the model clearly preferable in terms of this measure? c. For each model, obtain the residuals. In terms of the residuals, is one model clearly more appropriate than the other? 2. For each geographic region regress infection risk against the predictor variables age, routine culturing ratio, average daily census, and available facilities and services. a. Use first-order regression model with four predictor variables. State the estimated regression functions. b. Are the estimated regression functions similar for the four regressions? Discuss. 2 c. Calculate MSE and R for each region. Are these measures similar for the four regions? Discuss d. Obtain the residuals for each fitted model. State your finding. 3. For predicting the average length of patient stay in a hospital (Y), it has been decided to include age and infection risk as predictor variables. a. For each of the following variables, calculate the coefficient of partial determination given that age and infection risk are included in the model: routine culturing ratio, average daily census, number of nurses, and available facilities and services. b. Using the F test statistic, test whether or not the variable determined to be best in part (a) is helpful in the regression model when age and infection risk are included in the model; use 0.05 . 2 4. Length of stay is to be predicted, and the pool of potential predictor variables includes all other variables in the data set except medical school affiliation and region. It is believed that a model with log Y as the response variable and the predictor variables in the first-order terms with no interaction terms will be appropriate. Consider cases 57-113 to constitute the mode-building data set to be used for the following analysis. a. Obtain the correlation matrix of the X variables. Is there evidence of strong linear pair wise association among the predictor variables here? b. Find the best subset according to the R-square-adjusted criterion. 3 Consumer Expenditure and Money Stock The following table gives quarterly data from 1952 to 1956 on consumer expenditure (Y) and the stock money (X), both measured in billions of current dollars for the United States. Year Quarter Expenditure 1952 1 214.6 1952 2 217.7 1952 3 219.6 1952 4 227.2 1953 1 230.9 1953 2 233.3 1953 3 234.1 1953 4 232.3 1954 1 233.7 1954 2 236.5 1954 3 238.7 1954 4 243.2 1955 1 249.4 1955 2 254.3 1955 3 260.9 1955 4 263.3 1956 1 265.6 1956 2 268.2 1956 3 270.4 1956 4 275.6 Stock 159.3 161.2 162.8 164.6 165.9 167.9 168.3 169.7 170.5 171.6 173.9 176.1 178.0 179.1 180.2 181.2 181.6 182.5 183.3 184.3 a. Regress Y on X and summarize the results. b. Compute the Durbin-Watson statistic D. What conclusion regarding the presence of autocorrelation would you draw from D? c. Apply the first iteration of the Cochrane -Orcutt procedure to the data. What is your conclusion? 4