Homework Assignment 5

advertisement
Homework Assignment 5
Carlos M. Carvalho
Statistics – Texas MBA
McCombs School of Business
Problem 1
The data file is available in class website. It contains information on 18 projects developed
at a firm. The variables included in the file are:
• Profit: profit of the project in thousands of dollars
• RD: expenditure on research and development for the project in thousands of dollars
• Risk: a measure of risk assigned at the outset of the project
(a) Regress Profit on RD and Risk. Does there appear to be a relationship between
Profit and RD after the risk of the project has been controlled for?
(b) Plot residuals versus fitted values. Do any of the regression assumptions appear to be
violated? If yes, state which assumptions and justify your answer.
(c) Plot residuals versus RD and residuals versus Risk. Based on these plots suggest
a correction for any violation detected in (b)? Try implementing your suggested
correction. Does your new model appear to be an improvement over the original
model? Justify your answer.
(d) What does your new model suggest is the expected change in Profit when RD changes?
What is the expected change in Profit when Risk changes?
1
Problem 2
This question is based on the Compass Maritime Services Case Study. The dataset
is available in the file class website. Read the case study before answering the questions
below. Put your self into Basil Karatzas shoes and decide, based on a regression model,
how much to bid for the Bet Performer. Write a short and professional report (two pages
max) describing your analysis and justifying your final bid.
2
Problem 3
Read the “Oakland As (A)” case in the course pack. The data is available in the course
website. The tab in the spreadsheet labeled Full Data Set contains the data in Exhibit 1
of the case while the tab labeled Nobel Data contains the attendance figures for the games
Nobel pitched in and those he did not pitch in.
(a) Compute the descriptive statistics for the attendance at the games Nobel pitched in
and those he did not pitch in. What is the difference in the average attendance for
these two sets of games? Does this provide meaningful evidence that Nobel should be
paid more because attendance was higher in the games he pitched in?
(b) Plot Ticket against Time (i.e. create a time series plot of Ticket). Do you see any
patterns in the data?
(c) Run the regression
T ickett = β0 + β1 N obelt + t
where N obel is a dummy variable that takes the value 1 when Nobel starts on day t.
What are the estimates of β0 and β1 ? How do these relate to the average attendance
figures computed in part (a)?
(d) Do the residuals from the regression in part (c) appear to be independent? Why or
why not? If they are not independent, what factors might explain the pattern?
(e) Run the regression
T ickett = β0 + β1 P ost + β2 GBt + β3 T empt + β4 P rect + β5 T OGt + β6 T Vt +
+β7 P romot + β8 N obelt + β9 Y ankst + β10 W eekendt + β11 ODt + β12 DHt + t
Do the residuals from this regression appear to be independent? (It is a close call but
assume they are independent.) Why would these residuals be independent while the
residuals from the model in part (c) are dependent?
(f) What evidence is there about Nobel pitching in a game being related to the attendance
at the game? Do you have more confidence in drawing a conclusion from the model
in part (c) or the model in part (e) to answer this question? Why?
(g) Do you think Nobels agent has a legitimate case that Nobel should be paid more
because he brings fans to the games?
3
Problem 4
Read the “Oakland As (B)” case in the course packet. The data file is available in the class
website.
(a) Run a regression of Attendance against Wins. What is the interpretation of the
coefficient associated with Wins? What is the interpretation of R2 in this regression?
What is the practical problem associated with using this model to forecast Attendance
for the next season (i.e. to forecast attendance in the 1981 season)?
(b) Now run a regression of Attendance against Roddeys forecast of the number of wins
for that season. Why is the R2 value obtained from this regression so much lower
than the R2 obtained from the regression in part (a)?
(c) Why is it more appropriate to use the model in part (b) for forecasting Attendance
than the model in part (a)?
(d) Before the 1981 season starts Roddey forecasts 95 wins for the season. Using the
model from part (b), what is the prediction for attendance in the 1981 season? What
is the standard deviation associated with the prediction?
(e) Using the prediction and standard deviation for the prediction from the model in part
(b), what is the probability associated with a bonus to Nobel of $0, $50,000, $100,000
and $150,000? What is the mean of this distribution?
(e) Using the probability distribution from part (d), what is the expected cost if the
lump-sum incentive plan is used?
4
Problem 5: Electricity Demand (Optional)
To operate effectively, power companies must be able to predict daily peak demand for
electricity. Demand is defined as the rate (measured in megawatts) at which energy is
delivered to customers. Power companies are always trying to improve their statistical
models for daily peak demand as these models provide a short-term forecast that helps
in their economic and infrastructural planning. You have been given the task to help
create a new model for peak electricity demand for Gulf Energy, an Alabama-based utilities
Company. The data made available to you consist of daily observations, for an entire year
(starting in January), on peak demand recorded in a representative area of Gulf Energy
activity. You also have available data on the observed temperature in each day as well as
whether it is a weekday, Saturday or Sunday.
The data collected is available in the class website. Using regression models you should provide a model for daily electricity demand. Write a short report (max two pages) explaining
your analysis and your proposed model. In addition, provide a ballpark estimate of peak
demand for each month of the next year.
5
Problem 6: Wine Sales (Optional)
Read the case “Northern Napa Valley Winery, Inc.” in the course packet. The data file is
available in the course website. The file contains the monthly wine sales for the Northern
Napa Valley Winery for the period January, 1988 through August, 1996.
The goal of this case is to provide monthly forecasts for wine sales in the next twelve
months, i.e. September, 1996 through August, 1997, and to combine the monthly forecasts
to provide a forecast of annual sales for these twelve months.
(a) Using the appropriate time series plots decide whether sales or the percentage change
in sales, denoted PctChange, should be analyzed. Fit a model to the appropriate
variable (Sales or PctChange) that accounts for the trend and seasonality in the data.
(b) Are the residuals from this model independent? If not, modify the model so the
residuals are independent.
(c) Using the final model obtained above, provide numerical forecasts of wine sales in
September, 1996 and October, 1996.
(d) Given the forecasts for the months September, 1996 through August, 1997, explain
briefly how you could get a forecast of annual sales for the year encompassing these
twelve months. You do not have to do the actual calculations, just provide a brief
explanation of what you would do if the twelve monthly forecasts were available.
(e) Provide a 68% prediction interval for the forecast of sales in September, 1996.
6
Download