Practice Exam

Here are some practice problems that we (Paul Ferguson and I) have compiled. They are very similar to those I will test you on. In fact, many of these problems are straight off of my old exams. There are approximately two tests' worth of problems here, but I wanted to give you plenty of practice. Problems are not worth equal points on the final exam. Regression problems are typically longer, have more parts, and therefore receive more points. Good luck!

1) In an attempt to reduce the number of person-hours lost as a result of industrial accidents, a large production plant installed new safety equipment. In a test of the effectiveness of the equipment, a random sample of 50 departments was chosen. The number of person-hours lost in the month prior to and the month after the installation of the safety equipment was recorded. The percentage change was calculated for the sample and resulted in a mean of -1.2% and a sample standard deviation of s = 5%.
a) What conclusion can you draw using a 10% significance level?
b) Calculate the p-value for the test in part (a).

2) An automatic machine in a manufacturing process is operating properly if the lengths of an important subcomponent are normally distributed, with mean = 117 cm and std. dev = 5.2 cm.
a) Find the probability that one randomly selected unit has a length greater than 120 cm.
b) Find the probability that, if four units are randomly selected, their mean length exceeds 120 cm.

3) A lawn and garden retailer operates 4 stores in the DFW metroplex. One of their most popular items is a lawn tractor. Weekly customer demand is N(10,25) at each store. Each store replenishes its stock to 15 lawn tractors at the start of each week. Note: Assume weekly demands at each store are independent.
a) What is the probability of a stockout in a single store?
b) Suppose the 4 stores decide to pool their stock.
Specifically, they decide to pool their weekly allocations (4 × 15 = 60) in a centrally located warehouse and draw from it as needed to satisfy their demand. How often will a store experience a stockout now?
c) Comment on whether it is better for each store to have its own inventory or whether a central distribution center is better.

4) Baseball fans marvel at the home runs Barry Bonds hits. To get an idea of how far his home runs travel, a random sample of 6 home runs is used. The distance of each is given below:

{440, 460, 420, 470, 460, 450}

The descriptive statistics for this data set are:

Barry’s HR Distance
Mean                  450
Standard Error        7.302967
Median                455
Mode                  460
Standard Deviation    17.88854
Sample Variance       320
Kurtosis              0.585937
Skewness              -0.94334
Range                 50
Minimum               420
Maximum               470
Sum                   2700
Count                 6

a) Assuming distances are normally distributed, construct a 98% confidence interval for the mean distance traveled.
b) A number of teams are interested in acquiring Barry as their ‘franchise player’. In particular, the Baltimore Orioles are seriously interested in pursuing him to bolster their forever-sagging offense. The manager of the O’s thinks that the stadium in Baltimore, Camden Yards, might hurt Barry’s home run production. He estimates that Barry would need to average better than 430 ft per home run to maintain his offensive production in Camden Yards. Is there sufficient information at the .05 level to put the manager’s mind at ease?
c) Calculate the p-value for the test in part (b).

5) REI, an outdoor retailer, operates 125 stores in the western region of the US. One of their most popular items is ‘3-season’ tents (independent of retailers: e.g., North Face, Sierra Designs, Mountain Hardware, etc.). Monthly customer demand at each store for ‘3-season’ tents follows a uniform distribution between 5 and 9 (i.e., U[5,9] at each store; mean = 7 and variance = 1.33). Note: Assume monthly customer demands at each store are independent.
a) Assume each store replenishes its stock to 8 units at the start of each month. This stock must last the entire month. What is the probability of a stockout in a single store?
b) Suppose REI decides to hold its inventory of ‘3-season’ tents in a centralized warehouse facility in Denver, CO. Each store in the western region draws stock from the warehouse to satisfy its demand. The retailer will replenish the inventory of tents to a level of 900 ‘3-season’ tents at the start of each month in the warehouse (7.2 tents per store). This supply, again, must last for the entire month. What is the probability that these 900 ‘3-season’ tents will satisfy the demand for all 125 stores?
c) Suppose REI decides to set a minimum service level of 95% using the inventory from its centralized warehouse. In other words, the retailer wishes to meet monthly demand at all of its western region stores 95% of the time. How many tents need to be stored in the warehouse at the start of each month to meet this minimum service level (and, when stating your answer, answer in the context of actually ordering tents from vendors)?
d) Comment on whether it is better for each store to have its own inventory or whether a central distribution center is better.

6) (Yes, this is real data!) In the Excel worksheet “Problem 2 – Portfolio,” you are presented with monthly return data for two investment funds: one Growth fund and one Real Estate Investment Trust (REIT) fund. Please answer/address the following questions:
a) Compute the sample means.
b) Compute the sample variances.
c) Compute the sample standard deviations.
d) Calculate the sample covariance for the funds.
e) Calculate the sample correlation for the funds.
f) Determine the allocations that minimize the variance of the portfolio (you need only determine the allocation to the nearest tenth). Also, estimate the expected return and the standard deviation of this portfolio.
Hint: Using Excel, first “guess” the fractions for each investment.
Then make systematic adjustments until the minimum variance portfolio is discovered (alternatively, build a table and select the appropriate allocations). Finally, calculate the expected return and standard deviation associated with this portfolio.

7) Profitable banks are ones that make good decisions on loan applications. Credit scoring is a statistical technique that helps banks make that decision. However, many branches overturn credit scoring recommendations, while still others do not use the technique at all. In an attempt to determine the factors that affect loan decisions, a statistician surveyed 100 banks and recorded the percentage of bad loans (any loan that is not completely repaid), the average loan size, and whether a scorecard (a credit scoring method) is used, and if so, whether the scorecard recommendations are overturned more than 10% of the time. The worksheet “Problem 3 – Banks” contains the data. Column 1 contains the percentage of bad loans, column 2 contains the average loan amount, and column 3 contains the credit scoring code (1 = no scorecard, 2 = scorecard overturned more than 10% of the time, and 3 = scorecard overturned less than 10% of the time).
a) Perform a regression analysis, state the regression equation, and assess the fit.
b) Interpret and test the coefficients. What does this tell you?
c) Predict with 95% confidence the percentage of bad loans for a bank whose average loan is $10,000 and which does not use a scorecard.

8) Physicians have been recommending more exercise for their patients, particularly those who are overweight. One benefit of regular exercise appears to be a reduction in cholesterol, a substance associated with heart disease. To study the relationship more carefully, a physician took a random sample of 50 patients who do not exercise. She measured their cholesterol levels. She then started them on regular exercise programs.
After 4 months, she asked each patient how many minutes per week (on average) he or she exercised and also measured their cholesterol levels. The worksheet “Problem 4 – Cholesterol” contains the data. Column 1 contains weekly exercise in minutes, column 2 contains cholesterol levels before the exercise program, and column 3 contains cholesterol levels after the exercise program.
a) Determine the regression equation that relates exercise time with cholesterol reduction. Also, interpret the coefficient(s).
b) Discuss the fit of the model.
c) Can we conclude at the 5% significance level that the amount of exercise is linearly related to cholesterol reduction?
d) Predict with 95% confidence the reduction in cholesterol level of an individual who plans to exercise for 5 hours per week for a total of 4 months.

9) (Yes, this is real data!) Suppose we are interested in buying or selling products through online auctions. What situations are good for buying? What situations are good for selling? To investigate this problem more rigorously, a researcher collected data on winning bid prices for used computers purchased through online auctions. Over an approximately three-month interval beginning in May 2002, 488 purchases of Dell’s Latitude CPXH 500GT 500MHz 128MB laptop on eBay were recorded. Data included (1) the winning bid for a particular auction, (2) the day of the week the auction closed, (3) the number of bids in the auction, (4) the number of auctions that closed that day for the same laptop, and (5) the rank of the auction within a day (the order it closed among auctions for the same item). The worksheet “Problem 5 – Ebay” contains the actual data. The day of the week was coded with dummy variables: SUN = 1 if it was a Sunday (0 otherwise), MON = 1 if it was a Monday (0 otherwise), etc. The Excel output for the model is given below. (NOTE: As additional practice, you should be able to recreate this output!)

Regression Statistics
Multiple R           0.471383199
R Square             0.222202121
Adjusted R Square    0.207557391
Standard Error       31.55046086
Observations         488

ANOVA
              df     SS             MS             F           Significance F
Regression    9      135931.7025    15103.5225     15.17284    8.43639E-22
Residual      478    475816.2955    995.4315806
Total         487    611747.998

              Coefficients    Standard Error   t Stat          P-value     Lower 95%       Upper 95%
Intercept     558.693216      5.726602353      97.56102861     0           547.4407827     569.9456493
SUN           -4.294706396    5.260381458      -0.816424898    0.414664    -14.63104331    6.041630523
MON           9.906281109     5.670000132      1.747139485     0.081255    -1.234932159    21.04749438
TUES          17.39920411     5.252984387      3.312251251     0.000996    7.077401992     27.72100622
WED           15.38320751     5.471611839      2.811458115     0.005134    4.631815447     26.13459957
THUR          16.90919123     5.397031356      3.133054103     0.001836    6.304345388     27.51403708
FRI           10.42141417     5.090961399      2.047042465     0.0412      0.417977601     20.42485074
#Bids         1.52278322      0.291589103      5.222359836     2.64E-07    0.949827963     2.095738477
#AUCTIONS     -0.839917408    0.356722205      -2.354541982    0.018949    -1.54085534     -0.138979476
Rank-in-Day   -1.761991465    0.411566571      -4.281182168    2.25E-05    -2.570695316    -0.953287614

CORRELATION MATRIX
              SUN            MON            TUES           WED         THUR           FRI            #Bids       #AUCTIONS   Rank-in-Day
SUN           1              -0.148563898   -0.16885815    -0.160313   -0.166031678   -0.185440699   0.026809    0.005221    0.003082611
MON           -0.148563898   1              -0.147337406   -0.139881   -0.144871165   -0.161806532   0.089396    -0.054888   -0.032406068
TUES          -0.16885815    -0.147337406   1              -0.158989   -0.164660978   -0.183909765   0.048373    0.135914    0.080244533
WED           -0.160312802   -0.139881152   -0.158989315   1           -0.156328035   -0.174602705   0.018363    -0.111259   -0.065688364
THUR          -0.166031678   -0.144871165   -0.164660978   -0.156328   1              -0.180831348   -0.009552   -0.138577   -0.081817125
FRI           -0.185440699   -0.161806532   -0.183909765   -0.174603   -0.180831348   1              -0.015572   -0.048709   -0.028758044
#Bids         0.026809086    0.08939618     0.048372866    0.018363    -0.009551683   -0.015571909   1           -0.073642   -0.119201966
#AUCTIONS     0.005221148    -0.054887516   0.135913531    -0.111259   -0.138577096   -0.048708705   -0.073642   1           0.590408718
Rank-in-Day   0.003082611    -0.032406068   0.080244533    -0.065688   -0.081817125   -0.028758044   -0.119202   0.590409    1

INVERSE OF CORRELATION MATRIX
              SUN            MON            TUES           WED         THUR           FRI            #Bids       #AUCTIONS   Rank-in-Day
SUN           1.686543874    0.768556569    0.811538059    0.812097    0.833052957    0.884575206    -0.135624   0.171282    -0.015765442
MON           0.768556569    1.601056562    0.744962263    0.754937    0.774085283    0.817335382    -0.180339   0.212787    -0.020963297
TUES          0.811538059    0.744962263    1.662092941    0.779789    0.796794509    0.856283502    -0.16076    0.04883     -0.018687329
WED           0.812097352    0.754937389    0.779789225    1.672419    0.826821526    0.868783747    -0.11775    0.27359     -0.013687649
THUR          0.833052957    0.774085283    0.796794509    0.826822    1.712524719    0.893256151    -0.094042   0.302189    -0.010931806
FRI           0.884575206    0.817335382    0.856283502    0.868784    0.893256151    1.77626704     -0.102422   0.230314    -0.011905958
#Bids         -0.135624146   -0.180339334   -0.16076004    -0.11775    -0.094042202   -0.102422467   1.040647    -0.013248   0.1209686
#AUCTIONS     0.171281717    0.212786718    0.048830138    0.27359     0.302189257    0.230313932    -0.013248   1.62273     -0.907884524
Rank-in-Day   -0.015765442   -0.020963297   -0.018687329   -0.013688   -0.010931806   -0.011905958   0.120969    -0.907885   1.549175529

(a) Do the variables included in the model collectively explain a significant amount of the variation in winning bids? Cite the appropriate test, your test statistic, and your conclusion at the .05 level. What is your p-value?
(b) What is the price difference between a Dell laptop auctioned on Saturday and one auctioned on Sunday (all other things held equal)? Is this difference statistically significant? Cite your null and alternative hypotheses, the relevant test statistic, and your conclusion at the .05 level. What is the p-value?
(c) Suppose you are interested in whether it is better to auction laptops on weekdays or weekends (all other things being equal). What is your general conclusion based on this model? What day would you auction your Dell laptop on (ceteris paribus)? What day would you buy one on (ceteris paribus)?
(d) Is there a relationship between the winning bid price and the auction’s position (rank-in-day) in this particular model?
Cite an appropriate test, your test statistic, and your conclusion (at the .05 level). What is the p-value for this test?
(e) Suppose you wanted to test whether selling laptops on a Tuesday is significantly different (at the .05 level) from selling on a Monday (ceteris paribus). Construct an appropriate model using the data in the attached Excel file. Write down the formal test and your results/conclusions.

10) (Yes, this is real data!) Relief pitchers are baseball’s equivalent to place kickers in the NFL. You bring in some poor sap with the game on the line and he’s either a forgotten hero or a memorable goat. Many great relief pitchers do not have much on their record in the way of wins or losses since their role is to save games, i.e., protect a lead in the late innings. If you want to know what the “experts” think, CBS Sportsline.Com (September 24, 2002) posts ratings for the majority of MLB relief pitchers. Along with the pitchers’ ratings they post assorted “hard data” on performance. The site does not include any information on how they arrive at their expert ratings. The worksheet “Problem 6 – Relief Pitcher” contains the actual data.
(a) Are “wins” related to ratings? Build a simple linear regression model and discuss your results.
(b) There appears to be a lot of residual “noise” in the data. Suppose we include other variables to account for this noise. Are wins related to ratings in this new model?
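
Practice note for Problem 9: the problem statement says you should be able to recreate the Excel output. As a rough check, the sketch below rebuilds the summary figures (Multiple R, R Square, Adjusted R Square, Standard Error, MS, and F) from nothing but the ANOVA sums of squares and the observation count printed in the output, and recomputes one t Stat as coefficient divided by standard error. The function names summary_stats and t_stat are made up for this illustration and are not part of the worksheet. It uses only the Python standard library, so it does not compute p-values.

```python
import math

# Values copied from the Problem 9 Excel output.
N_OBS = 488                  # Observations
N_PREDICTORS = 9             # Regression df (6 day dummies + 3 other variables)
SS_REGRESSION = 135931.7025
SS_RESIDUAL = 475816.2955


def summary_stats(ss_reg, ss_res, n, k):
    """Rebuild the regression summary from the ANOVA sums of squares.

    Illustrative helper (hypothetical name), using the standard
    definitions for a least-squares model with an intercept.
    """
    df_res = n - k - 1
    ss_total = ss_reg + ss_res
    ms_reg = ss_reg / k
    ms_res = ss_res / df_res
    r_square = ss_reg / ss_total
    return {
        "df_residual": df_res,
        "ss_total": ss_total,
        "ms_regression": ms_reg,
        "ms_residual": ms_res,
        "f_stat": ms_reg / ms_res,
        "r_square": r_square,
        "multiple_r": math.sqrt(r_square),
        "adj_r_square": 1 - (1 - r_square) * (n - 1) / df_res,
        "standard_error": math.sqrt(ms_res),
    }


def t_stat(coefficient, std_error):
    """t Stat for testing a single coefficient against zero."""
    return coefficient / std_error


stats = summary_stats(SS_REGRESSION, SS_RESIDUAL, N_OBS, N_PREDICTORS)
for name, value in stats.items():
    print(f"{name:>15}: {value:.6f}")

# Intercept row of the coefficient table: Coefficient / Standard Error.
intercept_t = t_stat(558.693216, 5.726602353)
print(f"{'intercept t':>15}: {intercept_t:.6f}")
```

Comparing the printed values against the Regression Statistics and ANOVA blocks is a quick way to confirm you are reading the output correctly before answering parts (a) through (d).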