A. Multiple choice question

Question 1: Discrete Components, Inc. manufactures a line of electrical resistors. Presently, the carbon composition line is producing 100 ohm resistors. The population variance of these resistors "must not exceed 4" to conform to industry standards. Periodically, the quality control inspectors check for conformity by randomly selecting 10 resistors from the line and calculating the sample variance. The last sample had a variance of 4.36. Assume that the population is normally distributed. Using 𝛼 = 0.05, the null hypothesis is _________________.
a) 𝜎² = 100
b) 𝜎 = 10
c) 𝑠² = 4
d) 𝜎² = 4

Question 2: David Desreumaux, VP of Human Resources of American First Banks (AFB), is reviewing the employee training programs of AFB banks. Based on a recent census of personnel, David knows that the variance of teller training time in the Southeast region is 8, and he wonders if the variance in the Southwest region is the same number. His staff randomly selected personnel files for 15 tellers in the Southwest region, and determined that their mean training time was 25 hours and that the standard deviation was 4 hours. Assume that teller training time is normally distributed. Using 𝛼 = 0.10, the critical values of chi-square are ________.
a) 7.96 and 26.30
b) 6.57 and 23.68
c) -1.96 and 1.96
d) -1.645 and 1.645

Question 3: Suppose the fat content of a hotdog follows a normal distribution. Ten random measurements give a mean of 21.77 and a standard deviation of 3.69. The 90% confidence interval for the population variance of the fat content of a hotdog is ________
a) 5.2 to 21.3
b) 7.3 to 36.9
c) 19.63 to 23.91
d) 19.85 to 23.69

Question 4: According to the following graphic, x and y have ____________
[Figure: scatter plot of y (vertical axis, 160 to 240) against x (horizontal axis, 10000 to 50000)]
a) strong negative correlation
b) virtually no correlation
c) strong positive correlation
d) moderate negative correlation

Question 5: One of the assumptions made in simple regression is that ______________.
a) the error terms are exponentially distributed
b) the error terms have unequal variances
c) the model is linear
d) the error terms are dependent

Question 6: A manager wishes to predict the annual cost (y) of an automobile based on the number of miles (x) driven. The following model was developed: y = 1,550 + 0.36x. If a car is driven 15,000 miles, the predicted cost is ____________.
a) 2090
b) 3850
c) 7400
d) 6950

Question 7: A researcher believes that a variable is Poisson distributed across six categories. To test this, the following random sample of observations is collected:

Category    0    1    2    3    4    5
Observed   47   56   39   22   18   10

Using 𝛼 = 0.10, the observed chi-square value for this goodness-of-fit test is ____.
a) 2.28
b) 14.56
c) 17.43
d) 1.68

Question 8: Use the following set of observed frequencies to test the independence of the two variables. Variable one has values of 'A' and 'B'; variable two has values of 'C', 'D', and 'E'.

      C    D    E
A    12   10    8
B    20   24   26

Using 𝛼 = 0.05, the critical chi-square value is _______.
a) 9.488
b) 1.386
c) 8.991
d) 5.991

Question 9: Sam Hill, Director of Media Research, is analyzing subscribers to the Life West of the Saline magazine. He wonders whether subscriptions are influenced by the head of household’s employment classification. His staff prepared the following contingency table from a random sample of 300 households.

                  Head of Household Classification
                  Clerical   Managerial   Professional
Subscribes  Yes      10          90            60
            No       60          60            20

Using 𝛼 = .05, the appropriate decision is ______________.
a) reject the null hypothesis and conclude the two variables are independent
b) do not reject the null hypothesis and conclude the two variables are independent
c) reject the null hypothesis and conclude the two variables are not independent
d) do not reject the null hypothesis and conclude the two variables are not independent

Question 10: A multiple regression analysis produced the following tables.

Predictor   Coefficients   Standard Error   t Statistic   p-value
Intercept   752.0833       336.3158         2.236241      0.042132
x1          11.87375       5.32047          2.231711      0.042493
x2          1.908183       0.662742         2.879226      0.01213

Source       df   SS         MS         F          p-value
Regression    2   203693.3   101846.7   6.745406   0.010884
Residual     12   181184.1   15098.67
Total        14   384877.4

These results indicate that ____________.
a) none of the predictor variables are significant at the 5% level
b) each predictor variable is significant at the 5% level
c) x1 is the only predictor variable significant at the 5% level
d) x2 is the only predictor variable significant at the 5% level

Question 11: Among the three following figures, which residual plots show that the fitted models should be revised?
[Figure: three panels titled Residual plot 1, Residual plot 2, and Residual plot 3, each plotting residuals against fitted values]
a) The first and the second plots
b) The first plot only
c) The first and the third plots
d) The third plot only

Question 12: Large correlations between two or more independent variables in a multiple regression model could result in the problem of ________.
a) multicollinearity
b) autocorrelation
c) zero mean
d) non-normality

B. Practice question:

Question 1:
Data: Medgpa.csv
a. Write the logistic regression equation relating x (GPA) to y (Acceptance).
b. Use Stata to compute the estimated logit.
c. What is the interpretation of P(Acceptance = 1) when GPA = 3.67?
d. What is the estimate of the odds ratio? What is its interpretation?

Question 2: Consumer Reports provided extensive testing and ratings for 24 treadmills. An overall score, based primarily on ease of use, ergonomics, exercise range, and quality, was developed for each treadmill tested. In general, a higher overall score indicates better performance.
The following data show the price, the quality rating, and overall score for the 24 treadmills (Consumer Reports, February 2006). To incorporate the effect of quality, a categorical variable with three levels, we used two dummy variables: Quality-E and Quality-VG. Each variable was coded 0 or 1 as follows.
Data: Treadmills.xlsx
a. Develop an estimated regression equation that could be used to estimate the overall score given the price and the quality rating.
b. For the estimated regression equation developed in part (a), test for overall significance using 𝛼 = 0.10.
c. For the estimated regression equation developed in part (a), use the t test to determine the significance of each independent variable. Use 𝛼 = 0.10.
d. Check the four regression assumptions, where stdres is the standardized residual.
e. Estimate the overall score for a treadmill with a price of $2000 and a good quality rating. How much would the estimate change if the quality rating were very good? Explain.
f. Find a 95% confidence interval and a 95% prediction interval for a treadmill with a price of $2000 and a good quality rating.

Question 3: A study investigated the relationship between audit delay (Delay), the length of time from a company’s fiscal year-end to the date of the auditor’s report, and variables that describe the client and the auditor. Some of the independent variables that were included in this study follow.
Industry: A dummy variable coded 1 if the firm was an industrial company or 0 if the firm was a bank, savings and loan, or insurance company.
Public: A dummy variable coded 1 if the company was traded on an organized exchange or over the counter; otherwise coded 0.
Quality: A measure of overall quality of internal controls, as judged by the auditor, on a five-point scale ranging from “virtually none” (1) to “excellent” (5).
Finished: A measure ranging from 1 to 4, as judged by the auditor, where 1 indicates “all work performed subsequent to year-end” and 4 indicates “most work performed prior to year-end.”
Data: Audit.csv
a. Develop the estimated regression equation using all of the independent variables.
b. Did the estimated regression equation developed in part (a) provide a good fit? Explain.
c. On the basis of your observations about the relationship between Delay and Finished, develop an alternative estimated regression equation to the one developed in (a) to explain as much of the variability in Delay as possible.
d. Consider a model in which only Industry is used to predict Delay. At a .01 level of significance, test for any positive autocorrelation in the data.
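
Note on Question 3, part (d): the question does not name a test, but a common choice for detecting positive first-order autocorrelation is the Durbin-Watson statistic, d = Σ(e_t − e_{t−1})² / Σ e_t², computed from the residuals e_t of the fitted model. The following is a minimal Python sketch of that calculation, assuming the residuals are already available as a list. The function name durbin_watson and the residual values are hypothetical and for illustration only; they are not taken from Audit.csv.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic d for a sequence of residuals."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    # Numerator: sum of squared differences between successive residuals.
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n))
    # Denominator: sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


# Hypothetical residuals (assumption: NOT from Audit.csv), in observation order.
example_residuals = [1.0, 2.0, 1.0, -1.0, -2.0, -1.0]
print(round(durbin_watson(example_residuals), 3))  # prints 0.667
```

Values of d near 2 suggest no first-order autocorrelation, and values well below 2 suggest positive autocorrelation. For the decision at the .01 level, d is compared with the tabulated bounds dL and dU for the sample size and number of independent variables: conclude positive autocorrelation if d < dL, treat the test as inconclusive if dL ≤ d ≤ dU, and find no evidence of positive autocorrelation if d > dU.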