Homework Assignment 5 Carlos M. Carvalho Statistics – Texas MBA McCombs School of Business Problem 1 The data file is available in class website. It contains information on 18 projects developed at a firm. The variables included in the file are: • Profit: profit of the project in thousands of dollars • RD: expenditure on research and development for the project in thousands of dollars • Risk: a measure of risk assigned at the outset of the project (a) Regress Profit on RD and Risk. Does there appear to be a relationship between Profit and RD after the risk of the project has been controlled for? (b) Plot residuals versus fitted values. Do any of the regression assumptions appear to be violated? If yes, state which assumptions and justify your answer. (c) Plot residuals versus RD and residuals versus Risk. Based on these plots suggest a correction for any violation detected in (b)? Try implementing your suggested correction. Does your new model appear to be an improvement over the original model? Justify your answer. (d) What does your new model suggest is the expected change in Profit when RD changes? What is the expected change in Profit when Risk changes? 1 Problem 2 This question is based on the Compass Maritime Services Case Study. The dataset is available in the file class website. Read the case study before answering the questions below. Put your self into Basil Karatzas shoes and decide, based on a regression model, how much to bid for the Bet Performer. Write a short and professional report (two pages max) describing your analysis and justifying your final bid. 2 Problem 3 Read the “Oakland As (A)” case in the course pack. The data is available in the course website. The tab in the spreadsheet labeled Full Data Set contains the data in Exhibit 1 of the case while the tab labeled Nobel Data contains the attendance figures for the games Nobel pitched in and those he did not pitch in. (a) Compute the descriptive statistics for the attendance at the games Nobel pitched in and those he did not pitch in. What is the difference in the average attendance for these two sets of games? Does this provide meaningful evidence that Nobel should be paid more because attendance was higher in the games he pitched in? (b) Plot Ticket against Time (i.e. create a time series plot of Ticket). Do you see any patterns in the data? (c) Run the regression T ickett = β0 + β1 N obelt + t where N obel is a dummy variable that takes the value 1 when Nobel starts on day t. What are the estimates of β0 and β1 ? How do these relate to the average attendance figures computed in part (a)? (d) Do the residuals from the regression in part (c) appear to be independent? Why or why not? If they are not independent, what factors might explain the pattern? (e) Run the regression T ickett = β0 + β1 P ost + β2 GBt + β3 T empt + β4 P rect + β5 T OGt + β6 T Vt + +β7 P romot + β8 N obelt + β9 Y ankst + β10 W eekendt + β11 ODt + β12 DHt + t Do the residuals from this regression appear to be independent? (It is a close call but assume they are independent.) Why would these residuals be independent while the residuals from the model in part (c) are dependent? (f) What evidence is there about Nobel pitching in a game being related to the attendance at the game? Do you have more confidence in drawing a conclusion from the model in part (c) or the model in part (e) to answer this question? Why? (g) Do you think Nobels agent has a legitimate case that Nobel should be paid more because he brings fans to the games? 3 Problem 4 Read the “Oakland As (B)” case in the course packet. The data file is available in the class website. (a) Run a regression of Attendance against Wins. What is the interpretation of the coefficient associated with Wins? What is the interpretation of R2 in this regression? What is the practical problem associated with using this model to forecast Attendance for the next season (i.e. to forecast attendance in the 1981 season)? (b) Now run a regression of Attendance against Roddeys forecast of the number of wins for that season. Why is the R2 value obtained from this regression so much lower than the R2 obtained from the regression in part (a)? (c) Why is it more appropriate to use the model in part (b) for forecasting Attendance than the model in part (a)? (d) Before the 1981 season starts Roddey forecasts 95 wins for the season. Using the model from part (b), what is the prediction for attendance in the 1981 season? What is the standard deviation associated with the prediction? (e) Using the prediction and standard deviation for the prediction from the model in part (b), what is the probability associated with a bonus to Nobel of $0, $50,000, $100,000 and $150,000? What is the mean of this distribution? (e) Using the probability distribution from part (d), what is the expected cost if the lump-sum incentive plan is used? 4 Problem 5: Electricity Demand (Optional) To operate effectively, power companies must be able to predict daily peak demand for electricity. Demand is defined as the rate (measured in megawatts) at which energy is delivered to customers. Power companies are always trying to improve their statistical models for daily peak demand as these models provide a short-term forecast that helps in their economic and infrastructural planning. You have been given the task to help create a new model for peak electricity demand for Gulf Energy, an Alabama-based utilities Company. The data made available to you consist of daily observations, for an entire year (starting in January), on peak demand recorded in a representative area of Gulf Energy activity. You also have available data on the observed temperature in each day as well as whether it is a weekday, Saturday or Sunday. The data collected is available in the class website. Using regression models you should provide a model for daily electricity demand. Write a short report (max two pages) explaining your analysis and your proposed model. In addition, provide a ballpark estimate of peak demand for each month of the next year. 5 Problem 6: Wine Sales (Optional) Read the case “Northern Napa Valley Winery, Inc.” in the course packet. The data file is available in the course website. The file contains the monthly wine sales for the Northern Napa Valley Winery for the period January, 1988 through August, 1996. The goal of this case is to provide monthly forecasts for wine sales in the next twelve months, i.e. September, 1996 through August, 1997, and to combine the monthly forecasts to provide a forecast of annual sales for these twelve months. (a) Using the appropriate time series plots decide whether sales or the percentage change in sales, denoted PctChange, should be analyzed. Fit a model to the appropriate variable (Sales or PctChange) that accounts for the trend and seasonality in the data. (b) Are the residuals from this model independent? If not, modify the model so the residuals are independent. (c) Using the final model obtained above, provide numerical forecasts of wine sales in September, 1996 and October, 1996. (d) Given the forecasts for the months September, 1996 through August, 1997, explain briefly how you could get a forecast of annual sales for the year encompassing these twelve months. You do not have to do the actual calculations, just provide a brief explanation of what you would do if the twelve monthly forecasts were available. (e) Provide a 68% prediction interval for the forecast of sales in September, 1996. 6