ECOM90009 Quantitative Methods for Business
Second Semester, 2018
Third Assignment
Due by: 5 pm on Friday, October 19, 2018

This assignment must be submitted by 5 pm on the above due date. Any assignments not submitted by the due date and time will be given a mark of zero. This assignment is marked out of 100 and is worth 10 per cent of the final grade for QMB.

The purpose of this assignment is to give you practice working with the underlying concepts of quantitative methods, and to give you feedback on your understanding of these concepts.

A group of two, three or four students (but no more than four students) may work together and submit one set of assignment answers for the group. All members of the group, however, MUST be enrolled in the same workshop. For assignments submitted as a group, all valid group members will receive the same mark for the assignment. Students who attempt to submit an assignment with a group that is not in their own workshop, or in a group with more than four members, will not receive any credit for the assignment. Students will form their own groups. The group will allocate one member to submit the answers on behalf of the group. Individuals may work alone if they wish and submit their own assignment answers, but I would urge students to work in groups.

All assignments must be submitted with Turnitin. The information that you need to do this can be found at the following link: Turnitin student guide: https://www.lms.unimelb.edu.au/user_guides/turnitin_stu_guide.pdf

Students MUST copy and paste the template provided below into the top of the first page of their assignment answers, and complete the template, before submitting answers.

Subject Code:
Subject Name:
Assignment Number:
Workshop Day and Time:
Tutor Name:

Student ID Number    Student Name
1.
2.
3.
4.
It is essential that you include the name of your tutor and your allocated workshop day and time at the top of your assignment answers in order for your assignment to be graded in a timely manner.

Assignment answers must be typed, in 12 point font, with 1.5 line spacing. For each question, the written part of your answers must be within the specified word limits provided below. Words in excess of those limits will be ignored during marking. You are not required to write as much as those word limits. Shorter answers, if well-written, concise and clear, may receive as many or more marks than longer answers. Answer the questions directly. Do not present unnecessary graphs or numerical measures, or discuss irrelevant matters. Marks will be deducted if you present inappropriate or unnecessary material.

You MUST submit your assignment answers in a Portable Document Format (pdf) file. You should also look closely at the pdf file you upload before confirming your submission, to ensure that all your answers are included as you expect.

Good luck.

Dr. John Shannon
Department of Economics
The University of Melbourne

ASSIGNMENT QUESTIONS

Question 1 (55 marks)

Background

The Great Place to Stay corporation operates a large chain of motels that provide a medium-priced service to families on holidays and to staff from small businesses on business trips. The corporation has hired an analyst to help it obtain a suitable multiple regression model that can be used when choosing possible new motels to purchase or to build. The corporation wants to ensure that it only adds extra motels to the chain that are likely to be profitable.
After her initial discussions with the CEO of the Great Place to Stay corporation, the analyst decides that an appropriate dependent variable to use in the model is Y, the operating margin of a motel expressed as a percentage (Margin). After further discussions, she and the CEO decide that there are 7 different independent variables that could possibly be included in the multiple regression model:

X1 Total number of motel and hotel rooms within 5 kilometres of a motel. (Number)
X2 Number of kilometres between a motel and its nearest competitor. (Nearest)
X3 Office space in thousands of square metres in offices close enough to the motel for people on business trips to want to use the motel. (OfficeSpace)
X4 University enrolment in thousands at the nearest university to the motel. (Enrolment)
X5 Median household income in thousands of dollars of households in the area in which the motel is located. (Income)
X6 Distance in kilometres from the motel to the Central Business District. (Distance)
X7 The Quality dummy variable, which is designed to measure the quality of service the motel can provide to customers. This variable has a value of 1 if a motel has been built or extensively renovated in the last 6 years and a value of 0 if it was built or renovated more than 6 years ago. (Quality)

In order to develop an appropriate multiple regression model, the analyst now selects a random sample of n = 120 Great Place to Stay motels. The analyst places the sample data in the Excel workfile Assignment 3 Motel Profitability Model Data.xlsx. The 120 values for the Margin (Y) variable are placed in cells A3:A122. The 120 values for each of the 7 independent variables can be found in cells B3:H122. The names or labels of the Y variable and the 7 independent X variables are in cells A1:H2.

Use a level of significance of $\alpha = 0.05$ in all questions in this assignment.
Answer the following questions.

a: Perform a preliminary analysis for your regression model in which you:

1. Obtain the descriptive statistics for the Margin (Y) variable and for the 7 independent variables Number (X1), Nearest (X2), OfficeSpace (X3), Enrolment (X4), Income (X5), Distance (X6) and Quality (X7). If we only know the values of the descriptive statistics, what will our estimate of the typical value of the Margin (Y) variable be equal to? Please note whether or not the value that you obtain has any limitations or problems. Comment on which, if any, of the 8 variables in your model do not have a symmetrical distribution. What does the Sum of the Quality (X7) dummy variable tell us?

2. Find the set of correlation coefficients for all possible combinations of the 8 variables, i.e. the Margin (Y) variable and the 7 independent variables Number (X1), Nearest (X2), OfficeSpace (X3), Enrolment (X4), Income (X5), Distance (X6) and Quality (X7). Use the hypothesis testing procedure where the null hypothesis is $H_0: \rho = 0$, the alternative hypothesis is $H_A: \rho \neq 0$ and the test statistic is $z = r\sqrt{n}$ to identify which pairs of the 8 variables have a significant linear relationship.

3. Obtain the 7 different scatter diagrams in which Margin (Y) is always the dependent variable and the seven variables Number (X1), Nearest (X2), OfficeSpace (X3), Enrolment (X4), Income (X5), Distance (X6) and Quality (X7) are the independent variables. Briefly explain what the scatter diagrams and the corresponding correlation coefficients for the Margin (Y) variable are telling us about the possible relationships between the Margin (Y) variable and the different possible independent variables. You should comment on whether these results are consistent with the relationships between the Margin variable and the different independent variables that you would expect. State what, if anything, the scatter diagram for the Margin (Y) variable and the Quality dummy variable shows us.
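A minimal Python sketch of the correlation test in part (a)(2), assuming the statistic $z = r\sqrt{n}$ is compared with the two-tailed critical value 1.96 for $\alpha = 0.05$. The helper names `pearson_r` and `correlation_z_test` are illustrative, and the correlations used below are hypothetical, not values from the motel workfile.

```python
import math

def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_z_test(r, n, z_crit=1.96):
    """Test H0: rho = 0 against HA: rho != 0 using z = r * sqrt(n).

    z_crit = 1.96 is the two-tailed critical value for alpha = 0.05.
    Returns (z, reject_H0).
    """
    z = r * math.sqrt(n)
    return z, abs(z) > z_crit

# With n = 120, H0 is rejected whenever |r| > 1.96 / sqrt(120)
critical_r = 1.96 / math.sqrt(120)
print(round(critical_r, 3))

# Hypothetical correlations, NOT taken from the assignment data
for r in (0.25, 0.15, -0.40):
    z, reject = correlation_z_test(r, 120)
    print(f"r = {r:+.2f}  z = {z:+.3f}  reject H0: {reject}")
```

With n = 120 the test reduces to asking whether |r| exceeds roughly 0.179, which gives a quick way to scan the whole correlation matrix.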
(10 marks)

N.B. When you use the Regression tool in Data Analysis in Question 1, you need to click the 4 choices in the Residuals section that appear at the bottom of the dialog box.

b: The CEO of the Great Place to Stay corporation tells the analyst that in the past the firm has used a multiple regression model of the Margin (Y) variable with the following three independent variables: Number (X1), OfficeSpace (X3) and Income (X5). Use Excel to estimate the multiple regression model in which the dependent variable Margin (Y) is a function of these three independent variables.

1. Use your Excel output to write down your estimated model. Briefly explain what the coefficients of the 2 independent variables Number (X1) and OfficeSpace (X3) are telling us about how Margin is affected by changes in these two independent variables.

2. Using the Predicted values for the Margin, i.e. the Ŷ values, and the Standard Residuals for all 120 motels (the Standard Residuals are the z scores for the Residuals), check whether the error terms for this model satisfy the relevant key assumptions about the error terms. Briefly explain why we look at these assumptions concerning the error terms before we use the Excel output to assess the quality of our estimated model. When you do this, make sure you briefly explain why in this study we only look at 2 of the 5 standard assumptions about the error term.

(10 marks)

c: Using the Excel output for this model, briefly explain what the values of the F statistic, R-squared and Adjusted R-squared are telling us about this estimated model. Using the Excel output for this model, briefly explain which of the independent variables have a significant impact on the Margin (Y) variable. Briefly discuss whether there is any evidence of a problem with multicollinearity in this model.
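The quantities referred to in parts (b) and (c) can be reproduced outside Excel as a cross-check. The sketch below uses a hypothetical helper, `fit_ols`, built on NumPy's least-squares solver and run on simulated data with n = 120 and three regressors; it does not use the assignment workfile. It computes R-squared = SSR/SST, Adjusted R-squared = 1 - (SSE/(n-k-1))/(SST/(n-1)) and F = (SSR/k)/(SSE/(n-k-1)), and treats the Standard Residuals as the z scores of the residuals, as described above. That this scaling matches Excel's output exactly is an assumption.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns coefficients (intercept first), fitted values, residuals,
    R-squared, adjusted R-squared, the overall F statistic and the
    residual z scores (residual / sample std dev of the residuals).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                                   # k = number of regressors
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    resid = y - fitted
    sse = float(resid @ resid)                       # error sum of squares
    sst = float(((y - y.mean()) ** 2).sum())         # total sum of squares
    ssr = sst - sse                                  # regression sum of squares
    r2 = ssr / sst
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    f_stat = (ssr / k) / (sse / (n - k - 1))
    std_resid = resid / resid.std(ddof=1)
    return {"coef": coef, "fitted": fitted, "resid": resid, "r2": r2,
            "adj_r2": adj_r2, "F": f_stat, "std_resid": std_resid}

# Simulated data shaped like the motel sample (NOT the Assignment 3 workfile):
# the third regressor has no effect on y by construction.
rng = np.random.default_rng(0)
n = 120
X = rng.normal(size=(n, 3))
y = 40 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=3.0, size=n)

res = fit_ols(X, y)
print("coefficients:", np.round(res["coef"], 3))
print(f"R2 = {res['r2']:.3f}, adj R2 = {res['adj_r2']:.3f}, F = {res['F']:.2f}")

# Residual checks: plot std_resid against fitted (constant spread, no pattern)
# and view their histogram (roughly normal). Under normality about 5% of
# standardised residuals should lie outside +/-2.
print("share |z| > 2:", np.mean(np.abs(res["std_resid"]) > 2))
```

Adjusted R-squared is always below R-squared here because it penalises each extra regressor, which is why it is the fairer measure when comparing models with different numbers of variables.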
(10 marks)

d: The analyst decides that it would be useful to obtain a parsimonious model which only includes independent variables for which the P-values of all the coefficients are less than 0.05. To obtain this parsimonious model, the analyst first estimates a model in which the Margin (Y) variable is a function of all 7 possible independent variables Number (X1), Nearest (X2), OfficeSpace (X3), Enrolment (X4), Income (X5), Distance (X6) and Quality (X7). Using the Excel output, write down the estimated model. Using the Predicted values for the Margin, i.e. the Ŷ values, and the Standard Residuals for all 120 motels, check whether the error terms for this model satisfy the key assumptions regarding the error terms.

Take the model with 7 independent variables and check whether any independent variable has a P-value greater than 0.05. If there are no P-values greater than 0.05, then this model is said to be the parsimonious model. If there are any independent variables whose coefficients have a P-value greater than 0.05, remove the variable with the largest P-value and estimate the model with the remaining 6 independent variables. Repeat this process until you have what is called the parsimonious model, in which the P-values for the coefficients of all the independent variables are less than 0.05.

Using the Excel output, write down the estimated Parsimonious model. Using the Predicted values for the Margin, i.e. the Ŷ values, and the Standard Residuals for all 120 motels, check whether the error terms for the Parsimonious model satisfy the key assumptions about the error terms. Using the relevant Excel output, briefly compare your estimated Parsimonious model with the original estimated model in part (c) of this question. (Make sure you comment on the estimated coefficient values in both models.)
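The backward-elimination loop in part (d) can be sketched as follows. This is an illustration on simulated data in which, by construction, only X1, X3 and the 0/1 dummy X7 affect Y; the names `ols_pvalues` and `backward_eliminate` are hypothetical. One assumption to flag: the P-values use a normal approximation to the t distribution (via `math.erfc`), which is close for about 112 degrees of freedom but will not match Excel's exact t-based P-values to every decimal place.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """OLS of y on the columns of X plus an intercept.

    Returns (coefficients, two-sided P-values), intercept first. The P-values
    use a normal approximation to the t distribution (an assumption; Excel
    uses the exact t distribution with n - k - 1 degrees of freedom).
    """
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    s2 = (resid @ resid) / (n - k - 1)                # estimated error variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    t_stats = coef / se
    p = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return coef, p

def backward_eliminate(X, y, names, alpha=0.05):
    """Repeatedly drop the regressor with the largest P-value until all are < alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        _, p = ols_pvalues(X[:, keep], y)
        slope_p = p[1:]                               # ignore the intercept
        worst = int(np.argmax(slope_p))
        if slope_p[worst] < alpha:                    # every P-value below alpha: stop
            break
        print(f"dropping {names[keep[worst]]} (P = {slope_p[worst]:.3f})")
        del keep[worst]
    return [names[j] for j in keep]

# Simulated data (NOT the motel workfile): n = 120, seven candidate regressors,
# of which only X1, X3 and the 0/1 dummy X7 affect Y by construction.
rng = np.random.default_rng(1)
n = 120
names = ["X1", "X2", "X3", "X4", "X5", "X6", "X7"]
X = rng.normal(size=(n, 7))
X[:, 6] = rng.integers(0, 2, size=n)
y = 30 + 3.0 * X[:, 0] + 2.0 * X[:, 2] - 2.5 * X[:, 6] + rng.normal(scale=2.0, size=n)

print("parsimonious model keeps:", backward_eliminate(X, y, names))
```

Only one variable is removed per pass because dropping a regressor changes the estimated coefficients and P-values of all the others, so a variable that looked insignificant in the 7-variable model can become significant once a correlated variable has been removed.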
(15 marks)

e: The analyst tells the CEO of the Great Place to Stay corporation that it would be useful to use both the original model with 3 independent variables and the Parsimonious model when forecasting what the Margin (Y) value will be for a particular motel. To see how well these models forecast the values of the Margin variable, the analyst selects 4 of the 120 motels in the random sample, namely motels 4, 8, 19 and 94 in the list of 120 motels, and obtains the values of the Margin (Y) variable and the 7 possible independent variables, which are shown in the following table.

Variable            Motel 4   Motel 8   Motel 19   Motel 94
Margin (Y)             31.9      50.2       62.8       34.8
Number (X1)            3422      3021       1613       2740
Nearest (X2)            3.3       1.7        1.7        0.6
OfficeSpace (X3)       43.4      57.2       68.6       16.9
Enrolment (X4)         15.5       8.5       21.5       17.2
Income (X5)              41        45         31         38
Distance (X6)          19.4       8.8        6.6        7.3
Quality (X7)            1.0       0.0        1.0        0.0

Using these values, obtain the forecasts or estimated Margin values for each motel from both models and compare your forecasts with the actual Margin values. Briefly discuss which model produces the best forecasts. Briefly discuss what features of a motel, i.e. what type of values for the different independent variables, seem to make a motel more likely to have a Margin value which is much greater than or much less than what our models indicate the Margin values should be.

(10 marks)

Total marks for Question 1: 55 = 10 + 10 + 10 + 15 + 10

Question 2 (45 marks)

Background

The research section of a large property development firm wants to develop a model which will help it to forecast the number of Housing Starts in Victoria in any month. This will help the firm to better predict how many new houses will come onto the market, i.e. the supply of houses, in any future monthly period. The analysts in the research section decide to develop 3 different types of models and use them to produce 12 monthly forecasts. The 3 types of models are an Exponential Smoothing model, the Seasonal Indices model and the Dummy Variables model.
The analysts obtain the n = 240 monthly values for Victorian Housing Starts from August 1998 to July 2018. These 240 values can be found in the workfile Assignment 3 Victorian Housing Starts Data.xlsx. This workfile contains three worksheets:

Data contains the dates, i.e. the years and months, the Victorian Housing Starts, i.e. the number of new houses on which building started in that month, and the trend variable t, which has values from 1 to 240.

Seasonal Indices Model contains these same Victorian Housing Starts values, but here they are arranged in the form of a table with a row for each year and a column for each month.

Dummy Variables Model contains these same Victorian Housing Starts values along with 12 dummy variables, one for each of the 12 months. It also contains the Trend variable.

To obtain a more realistic picture of how well each model forecasts unknown future monthly Housing Starts, the analysts decide to divide the data into two periods.

1. They call the 228 values from August 1998 to July 2017 the within-sample period. Only data from this period are used to estimate the models.
2. They call the 12 values in the period from August 2017 to July 2018 the out-of-sample period. The models estimated using data from the within-sample period are used to forecast the values in the out-of-sample period.

While the analysts actually know the values in the out-of-sample period, they pretend that these values are not known when they estimate the models. They then compare the forecasts from the models with the actual values to see which model works best in a practical situation.

Answer the following questions.

a: To perform a preliminary analysis, we examine the Line graph and the Histogram for all 240 monthly values of the Housing Starts in Victoria.
We obtain these charts by highlighting all of the 240 values, clicking Insert / Recommended Charts and then separately choosing the two appropriate charts. Using these two charts, briefly discuss what you think are the key features of the values of the Housing Starts in Victoria.

(4 marks)

b: Obtain the Exponential Smoothing or Averaging forecasts based on a smoothing parameter with a value of 0.10. In Excel, we go to the Data worksheet, where the Housing Starts in Victoria are stored in cells B1:B241, and click Data / Data Analysis / Exponential Smoothing. In this application we will set the smoothing parameter, or omega, to 0.1. In Excel you are asked to enter the Damping Factor, which is equal to 1 minus the smoothing parameter. In this case it will be equal to 0.9 (= 1 - 0.1). If we use the within-sample data from August 1998 to July 2017 in cells B1:B229 to obtain our forecasts, the dialog box should look like this:

[Figure: Excel Exponential Smoothing dialog box, Input Range B1:B229, Damping Factor 0.9]

Briefly explain how the size of the smoothing parameter affects the forecasts. Obtain the 12 forecasts for the 12 months in the out-of-sample period. (Note that the final value obtained with the 228 values will give us a forecast for the first month in the out-of-sample period, August 2017 or period 229. To obtain the forecasts for the remaining months in the out-of-sample period, i.e. for periods 230 to 240, you will need to repeat the above process, and each time you will add one extra month to your Input Range, e.g. for the next forecast the Input Range changes from B1:B229 to B1:B230.) Draw a single chart which shows both the line graph for these forecasts and the line graph for the actual values of the Housing Starts in Victoria variable. Obtain the Root Mean Square Error (RMSE) for these 12 forecasts.

(8 marks)

c: The manager now asks the consultant to develop a Multiplicative Time Series model similar to the one which is discussed in Seminar 11.
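The exponential smoothing procedure of part (b) can be sketched in Python, assuming the usual simple exponential smoothing recursion $F_{t+1} = \alpha y_t + (1-\alpha)F_t$ with the first forecast set equal to the first actual value; this is our understanding of what Excel's tool computes with a Damping Factor of $1-\alpha$, so check a few values against your worksheet. Because each forecast uses only earlier values, running the recursion once over all 240 months gives the same 12 out-of-sample forecasts as re-running the tool with an Input Range that grows by one month each time. The function names are hypothetical and the series is simulated, not the Housing Starts data.

```python
import math

def exp_smooth_forecasts(y, alpha):
    """One-step-ahead simple exponential smoothing forecasts.

    Element i is the forecast for period i + 2 (periods numbered from 1):
    the forecast for period 2 is y(1), and thereafter
    F(t+1) = alpha * y(t) + (1 - alpha) * F(t).
    The last element is therefore the forecast for period len(y) + 1.
    """
    forecasts = [y[0]]
    for value in y[1:]:
        forecasts.append(alpha * value + (1 - alpha) * forecasts[-1])
    return forecasts

def out_of_sample_forecasts(y, alpha, n_within):
    """Forecasts for periods n_within + 1, ..., len(y), each using only earlier data."""
    f = exp_smooth_forecasts(y, alpha)
    return f[n_within - 1:len(y) - 1]

def rmse(actual, predicted):
    """Root mean square error of the forecasts."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Simulated monthly series (NOT the Housing Starts data): 228 within-sample + 12 out-of-sample
y = [3000 + 2 * t + 300 * math.sin(2 * math.pi * t / 12) for t in range(1, 241)]
oos = out_of_sample_forecasts(y, alpha=0.1, n_within=228)   # damping factor 0.9 in Excel
print(len(oos))   # 12
print(f"RMSE: {rmse(y[228:], oos):.1f}")
```

A larger smoothing parameter puts more weight on the most recent month, so the forecasts react faster but are noisier; with 0.1 they adjust slowly and largely smooth out the month-to-month swings.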
The data which can be used to estimate this model can be found in the Seasonal Indices Model worksheet. You should use the data in the within-sample period when you obtain this model. When obtaining the seasonal indices for the 12 months, you are expected to use a 12 month centred moving average (CMA). Briefly explain what the values of these seasonal indices are telling us about the values of the Housing Starts in Victoria variable in the different months.

(8 marks)

d: Estimate both the linear and the quadratic trend models using the 228 within-sample values:

Linear model: $y_t = \beta_0 + \beta_1 t + \varepsilon_t$
Quadratic model: $y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon_t$

In these two models the t or Trend variable contains the values from 1 to 228. Briefly discuss which of these two models of the trend you think is more appropriate, and explain what type of trend (if any) is present in the values of the Housing Starts in Victoria variable. Using the more appropriate trend model and the values of the Seasonal Indices from part (c) of this question, obtain forecasts for the values of the Housing Starts in Victoria variable in the out-of-sample period from August 2017 to July 2018. Draw a single chart which shows both the line graph for these forecasts and the line graph for the actual values of the Housing Starts in Victoria variable. Obtain the Root Mean Square Error (RMSE) for these 12 forecasts.

(10 marks)

e: Starting with the more appropriate trend model from part (d) of this question, add to your model the 11 monthly seasonal dummy variables from February (M2) to December (M12). The values of the monthly seasonal dummy variables M1 to M12 are given in the Dummy Variables Model worksheet. Estimate this model and briefly discuss the quality of the estimated model. Explain how we interpret the value of the coefficient of the February or M2 variable.
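The seasonal-index and trend steps of parts (c) and (d) can be sketched as below. This assumes one common ratio-to-moving-average version of the multiplicative model: divide each value by its 12-month centred moving average, average the ratios for each month, and rescale so the 12 indices average 1. The trend is fitted to the raw values with `numpy.polyfit`, and each trend projection is multiplied by its month's index. Seminar 11 may differ in details such as the rescaling step. The function names are hypothetical and the series is simulated. Index 0 corresponds to the first month of the series, which is August for the Housing Starts data.

```python
import numpy as np

def centred_moving_average_12(y):
    """12-month centred moving average (a 2x12 MA); NaN where undefined."""
    y = np.asarray(y, dtype=float)
    cma = np.full(len(y), np.nan)
    for t in range(6, len(y) - 6):
        # Mean of the two 12-month averages centred half a month either side of t
        ma_left = y[t - 6:t + 6].mean()
        ma_right = y[t - 5:t + 7].mean()
        cma[t] = (ma_left + ma_right) / 2
    return cma

def seasonal_indices(y):
    """Average y / CMA for each position in the 12-month cycle, rescaled to average 1."""
    y = np.asarray(y, dtype=float)
    ratio = y / centred_moving_average_12(y)
    raw = np.array([np.nanmean(ratio[k::12]) for k in range(12)])
    return raw * 12 / raw.sum()

def multiplicative_forecasts(y_within, n_ahead, degree=1):
    """Trend (linear if degree=1, quadratic if degree=2) times seasonal index."""
    y_within = np.asarray(y_within, dtype=float)
    n = len(y_within)
    t = np.arange(1, n + 1)
    trend_coef = np.polyfit(t, y_within, degree)     # highest power first
    idx = seasonal_indices(y_within)
    t_future = np.arange(n + 1, n + n_ahead + 1)
    return np.polyval(trend_coef, t_future) * idx[(t_future - 1) % 12]

# Simulated series (NOT the Victorian Housing Starts data): linear trend x seasonal pattern
pattern = np.array([0.9, 0.95, 1.0, 1.05, 1.1, 1.2, 1.0, 0.9, 0.85, 0.95, 1.0, 1.1])
t_all = np.arange(1, 241)
y = (3000 + 2.0 * t_all) * pattern[(t_all - 1) % 12]

print(np.round(seasonal_indices(y[:228]), 3))
fc = multiplicative_forecasts(y[:228], n_ahead=12, degree=1)
print(f"RMSE: {np.sqrt(np.mean((y[228:] - fc) ** 2)):.1f}")
```

An index above 1 means that month typically sits above the underlying trend level (for example, 1.2 is about 20 per cent above), and an index below 1 means it typically sits below it.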
Using this model, obtain forecasts for the values of the Housing Starts in Victoria variable in the out-of-sample period from August 2017 to July 2018. Draw a single chart which shows both the line graph for these forecasts and the line graph for the actual values of the Housing Starts in Victoria variable. Obtain the Root Mean Square Error (RMSE) for these 12 forecasts.

(10 marks)

f: Briefly discuss which set of forecasts from the 3 models you think would be the most useful to the research section of the large property development firm.

(5 marks)

Total marks for Question 2: 45 = 4 + 8 + 8 + 10 + 10 + 5

END OF ASSIGNMENT 3