ECOM90009
Quantitative Methods for Business
Second Semester, 2018
Third Assignment
Due by: 5 pm on Friday, October 19, 2018
This assignment must be submitted by 5 pm on the above due date.
Any assignments not submitted by the due date and time will be given a mark of zero.
This assignment is marked out of 100 and is worth 10 per cent of the final grade for QMB.
The purpose of this assignment is to give you practice working with the underlying concepts
of quantitative methods, and to give you feedback on your understanding of these concepts.
A group of two, three or four students (but no more than four students) may work together and
submit one set of assignment answers for the group. All members of the group, however,
MUST be enrolled in the same workshop.
For assignments submitted as a group, all valid group members will receive the same mark for
the assignment. Students that attempt to submit an assignment with a group that is not in their
own workshop, or in a group with more than four members, will not receive any credit for the
assignment. Students will form their own groups. The group will allocate one member to submit
the answers on behalf of the group. Individuals may work alone if they wish and submit their
own assignment answers, but I would urge students to work in groups.
All assignments must be submitted with Turnitin. The information that you need to do this can
be found at the following link:
Turnitin student guide: https://www.lms.unimelb.edu.au/user_guides/turnitin_stu_guide.pdf
Students MUST copy and paste the template provided below into the top of the first page of
their assignment answers, and complete the template, before submitting answers.
Subject Code:
Subject Name:
Assignment Number:
Workshop Day and Time:
Tutor Name:
      Student ID Number      Student Name
1.
2.
3.
4.
It is essential that you include the name of your tutor and your allocated workshop day and
time at the top of your assignment answers in order for your assignment to be graded in a
timely manner.
Assignment answers must be typed, in 12 point font, with 1.5 line spacing. For each
question, the written part of your answers must be within the specified word limits provided
below. Words in excess of those limits will be ignored during marking. You are not required
to write as much as those word limits. Shorter answers, if well-written, concise and clear,
may receive as many or more marks than longer answers. Answer the questions directly. Do
not present unnecessary graphs or numerical measures, or discuss irrelevant matters.
Marks will be deducted if you present inappropriate or unnecessary material.
You MUST submit your assignment answers in a Portable Document Format (pdf) file. You
should also look closely at the pdf file you upload before confirming your submission, to ensure
that all your answers are included as you expect.
Good luck.
Dr. John Shannon
Department of Economics
The University of Melbourne
ASSIGNMENT QUESTIONS
Question 1 (55 marks)
Background
The Great Place to Stay corporation operates a large chain of motels that provide a medium
price service to families on holidays and staff from small businesses on business trips. The
corporation has hired an analyst to help them to obtain a suitable multiple regression that can
be used when choosing possible new motels to purchase or to build. The corporation wants to
ensure that they only add extra motels to the chain that are likely to be profitable.
After her initial discussions with the CEO of the Great Place to Stay corporation the analyst
decides that an appropriate dependent variable to use in the model is
Y The operating margin of a motel expressed in percentages. (Margin)
After further discussions she and the CEO decide that there are 7 different independent
variables that could possibly be included in the multiple regression model.
X1 Total number of motel and hotel rooms within 5 kilometres of a motel. (Number)
X2 Number of kilometres between a motel and its nearest competitor. (Nearest)
X3 Office space in thousands of square metres in offices close enough to the motel for
people on business trips to want to use the motel. (OfficeSpace)
X4 University enrolment in thousands at the nearest university to the motel. (Enrolment)
X5 Median household income in thousands of dollars of households in the area in which
the motel is located. (Income)
X6 Distance in kilometres from the motel to the Central Business District. (Distance)
X7 The Quality dummy variable which is designed to measure the quality of service the
motel can provide to customers. This variable has a value of 1 if a motel has been
built or extensively renovated in the last 6 years and a value of 0 if it was built or
renovated more than 6 years ago. (Quality)
In order to develop an appropriate multiple regression model the analyst now selects a random
sample of n = 120 Great Place to Stay motels. The analyst places the sample data in the Excel
workfile Assignment 3 Motel Profitability Model Data.xlsx.
The 120 values for the Margin (Y) variable are placed in cells A3:A122.
The 120 values for each of the 7 independent variables can be found in cells B3:H122.
The names or labels of the Y variable and the 7 independent X variables are in cells A1:H2.
Use a level of significance of $\alpha = 0.05$ in all questions in this assignment.
Answer the following questions
a: Perform a preliminary analysis for your regression model where you need to:
1. Obtain the descriptive statistics for the Margin (Y) variable and for the 7 independent
variables Number (X1), Nearest (X2), OfficeSpace (X3), Enrolment (X4), Income (X5),
Distance (X6) and Quality (X7). If we only know the values of the descriptive statistics,
what will our estimate of the typical value of the Margin (Y) variable be? Please
note whether or not the value that you obtain has any limitations or problems.
Comment on which, if any, of the 8 variables in your model do not have a symmetrical
distribution.
What does the Sum of the Quality (X7) dummy variable tell us?
2. Find the set of correlation coefficients for all possible combinations of the 8 variables
i.e. the Margin or Y variable and the 7 independent variables Number (X1), Nearest
(X2), OfficeSpace (X3), Enrolment (X4), Income (X5), Distance (X6) and Quality (X7).
Use the hypothesis testing procedure where the null hypothesis is $H_0: \rho = 0$, the
alternative hypothesis is $H_A: \rho \neq 0$ and the test statistic is $z = r\sqrt{n}$, to identify which
pairs of the 8 variables have a significant linear relationship.
3. Obtain the 7 different scatter diagrams in which Margin (Y) is always the dependent
variable and the seven variables
Number (X1), Nearest (X2), OfficeSpace (X3), Enrolment (X4), Income (X5),
Distance (X6) and Quality (X7)
are the independent variables.
Briefly explain what the scatter diagrams and the corresponding correlation coefficients
for the Margin (Y) variable are telling us about the possible relationships between the
Margin (Y) variable and the different possible independent variables. You should
comment on whether these results are consistent with the relationships between the
Margin variable and different independent variables you would expect.
State what, if anything, the scatter diagram for the Margin (Y) variable and the Quality
dummy variable shows us.
(10 marks)
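The correlation test in item 2 of part (a) can be cross-checked outside Excel. The following is a minimal Python sketch, assuming the test statistic is z = r√n and that it is compared with the two-sided standard normal critical value. The function names are illustrative and are not taken from the subject materials.

```python
import math
from statistics import NormalDist


def correlation(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_z_test(r, n, alpha=0.05):
    """Two-sided test of H0: rho = 0, assuming the statistic z = r * sqrt(n).

    Returns the z value and True when H0 is rejected at level alpha.
    """
    z = r * math.sqrt(n)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z, abs(z) > critical
```

Under these assumptions, with n = 120 a correlation coefficient is significant when its absolute value exceeds roughly 1.96/√120, which is about 0.179.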
N.B. When you use the Regression tool in Data Analysis in Question 1 you need to click the 4
choices in the Residuals section that appear on the bottom of the dialog box.
b: The CEO of the Great Place to Stay corporation tells the analyst that in the past the firm
has used a multiple regression model of the Margin (Y) variable with the following three
independent variables namely
Number (X1), OfficeSpace (X3) and Income (X5)
Use Excel to estimate the multiple regression model in which the dependent variable
Margin (Y) is a function of these three independent variables.
1. Use your Excel output to write down your estimated model.
Briefly explain what the coefficients of the 2 independent variables Number (X1) and
OfficeSpace (X3) are telling us about how Margin is affected by changes in these two
independent variables.
2. Using the Predicted values for Margin (the Ŷ values) and the Standard Residuals for
all 120 motels (the Standard Residuals are the z scores for the Residuals), check
whether the error terms for this model satisfy the relevant key assumptions about the
error terms.
Briefly explain why we look at these assumptions concerning the error terms before we
use the Excel output to assess the quality of our estimated model. When you do this
make sure you briefly explain why in this study we only look at 2 of the 5 standard
assumptions about the error term.
(10 marks)
c: Using the Excel output for this model briefly explain what the values of the F statistic,
R-squared and the Adjusted R-squared are telling us about this estimated model.
Using the Excel output for this model briefly explain which of the independent variables
has a significant impact on the Margin (Y) variable.
Briefly discuss whether there is any evidence that there is a problem with Multicollinearity
in this model.
(10 marks)
d: The analyst decides that it would be useful to obtain a parsimonious model which only
includes independent variables for which the P-values of all the coefficients are less than
0.05. To obtain this parsimonious model the analyst first estimates a model in which the
Margin (Y) variable is a function of all 7 possible independent variables
Number (X1), Nearest (X2), OfficeSpace (X3), Enrolment (X4), Income (X5),
Distance (X6) and Quality (X7)
Using the Excel output write the estimated model.

Using the Predicted values for the Margin or Ŷ values and the Standard Residuals
for all 120 motels check whether the error terms for this model satisfy the key
assumptions about the error terms.
Take the model with 7 independent variables and check whether any independent
variable has a P-value greater than 0.05. If there are no P-values greater than 0.05 then
this model is said to be the parsimonious model. If there are any independent variables whose
coefficients have a P-value greater than 0.05, remove the variable with the largest P-value
and estimate the model with the remaining 6 independent variables.
Repeat this process until you have what is called the parsimonious model in which the
P-values for the coefficients of all the independent variables are less than 0.05. Using
the Excel output write the estimated Parsimonious model.
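The elimination procedure described above can be summarised as a short loop. The Python sketch below is only an outline of the logic: `fit_pvalues` is a hypothetical callable standing in for re-running the regression (in Excel or elsewhere) and reading off the P-values of the remaining coefficients. A P-value exactly equal to 0.05 is treated here as not significant, which is an assumption.

```python
def backward_eliminate(variables, fit_pvalues, alpha=0.05):
    """Drop the variable with the largest P-value until all are below alpha.

    `fit_pvalues` is a hypothetical callable: given a list of variable
    names, it re-estimates the regression with those variables and returns
    a dict {name: p_value} for their coefficients.
    """
    remaining = list(variables)
    while remaining:
        pvalues = fit_pvalues(remaining)
        worst = max(remaining, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:
            break  # every remaining coefficient is significant
        remaining.remove(worst)  # remove one variable, then re-estimate
    return remaining
```

Only one variable is removed per pass because the P-values of the other coefficients can change once the model is re-estimated.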
Using the Predicted values for the Margin or Ŷ values and the Standard Residuals for all
120 motels check whether the error terms for the Parsimonious model satisfy the key
assumptions about the error terms.
Using the relevant Excel output briefly compare your estimated Parsimonious model with
the original estimated model in part (c) of this question.
(Make sure you comment on the estimated coefficient values in both models.) (15 marks)
e: The analyst tells the CEO of the Great Place to Stay corporation that it would be useful to
use both the original model with 3 independent variables and the Parsimonious model
when forecasting what the Margin (Y) value will be for a particular motel. To see how well
these models forecast the values of the Margin variable the analyst selects 4 of the 120
motels in the random sample namely motels 4, 8, 19 and 94 in the list of 120 motels and
obtains the values of the Margin (Y) variable and the 7 possible independent variables
which are shown in the following table.
Variables              Motel 4   Motel 8  Motel 19  Motel 94
Margin (Y)                31.9      50.2      62.8      34.8
Number (X1)               3422      3021      1613      2740
Nearest (X2)               3.3       1.7       1.7       0.6
OfficeSpace (X3)          43.4      57.2      68.6      16.9
Enrolment (X4)            15.5       8.5      21.5      17.2
Income (X5)                 41        45        31        38
Distance (X6)             19.4       8.8       6.6       7.3
Quality (X7)               1.0       0.0       1.0       0.0
Using these values obtain the forecasts or estimated Margin values for each motel from both
models and compare your forecasts with the actual Margin values. Briefly discuss which
model produces the best forecasts. Briefly discuss what features of a motel i.e. what type of
values for the different independent variables, seem to make a motel more likely to have a
Margin value which is much greater than or much less than what our models indicate the
Margin values should be.
(10 marks)
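The forecasts in part (e) are fitted values from an estimated regression equation, and the comparison with the actual Margin is a forecast error. A minimal Python sketch follows, assuming the coefficients are read from your own Excel output. The function names and the dictionary layout are illustrative, and no coefficient values are supplied here.

```python
def predict(intercept, coefficients, x):
    """Fitted value: intercept plus the sum of coefficient * variable value.

    `coefficients` maps variable names to estimated coefficients; only the
    variables named there are used, so `x` may hold extra variables.
    """
    return intercept + sum(coefficients[name] * x[name] for name in coefficients)


def forecast_error(actual, forecast):
    """Actual value minus forecast; positive means the model under-predicts."""
    return actual - forecast


# Values for motel 4 taken from the table above.
motel_4 = {"Number": 3422, "Nearest": 3.3, "OfficeSpace": 43.4,
           "Enrolment": 15.5, "Income": 41, "Distance": 19.4, "Quality": 1.0}
actual_margin_4 = 31.9
```

With the three-variable model, the call would pass a dictionary with keys "Number", "OfficeSpace" and "Income" holding the estimated coefficients. The parsimonious model uses whichever variables survive the elimination steps.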
Total marks for Question 1
55 = 10 + 10 + 10 + 15 + 10
Question 2 (45 marks)
Background
The research section of a large property development firm wants to develop a model which will
help them to forecast the number of Housing Starts in Victoria in any month. This will help their
firm to better predict how many new houses will come onto the market i.e. the supply of houses,
in any future monthly period. The analysts in the research section decide to develop 3 different
types of models and use them to produce 12 monthly forecasts. The 3 types of models are an
Exponential Smoothing model, the Seasonal Indices model and the Dummy Variables model.
The analysts obtain the n = 240 monthly values for Victorian Housing Starts from August 1998
to July 2018. These 240 values can be found in the workfile Assignment 3 Victorian Housing
Starts Data.xlsx. This workfile contains three different worksheets Data, Seasonal Indices
Model and Dummy Variables Model where
Data contains the Dates i.e. the Years and Months, the Victorian Housing Starts i.e. the
number of new houses where building started in that month and the trend variable t which
has values from 1 to 240.
Seasonal Indices Model contains these same Victorian Housing Starts values but here they
are arranged in the form of a table with a row for each year and a column for each month.
Dummy Variables Model contains these same Victorian Housing Starts values along with
12 different dummy variables for each of the 12 months. It also contains the Trend variable.
To obtain a more realistic picture of how well each model forecasts unknown future monthly
Housing Starts values the analysts decide to divide the data into two periods.
1. They call the 228 values from August 1998 to July 2017 the within-sample period.
Only data from this period is used to estimate the models.
2. They call the 12 values in the period from August 2017 to July 2018 the out-of-sample
period. The models estimated using data from the within-sample period are used to forecast
the values in the out-of-sample period.
While the analysts actually know these values in the out-of-sample period they pretend that
these values are not known when they estimate the models. They then compare the forecasts
from the models with the actual values to see which model works best in a practical situation.
Answer the following questions
a: To perform a preliminary analysis we examine the Line graph and the Histogram for all 240
monthly values of the Housing Starts in Victoria. We obtain these charts by highlighting all
of the 240 values, clicking
Insert / Recommended Charts
and then separately choosing the two appropriate charts.
Using these two charts briefly discuss what you think are the key features of the values of
the Housing Starts in Victoria.
(4 marks)
b: Obtain the Exponential Smoothing or Averaging forecasts based on a smoothing
parameter with a value of 0.10. In Excel we go to the Data worksheet where the Housing
Starts in Victoria are stored in cells B1:B241 and click
Data / Data Analysis / Exponential Smoothing
In this application we will set the smoothing parameter or omega to 0.1. In Excel you are
asked to enter the Damping Factor which is equal to 1 minus the smoothing parameter. In
this case it will be equal to 0.9 ( = 1 - 0.1). If we use the within-sample data from August
1998 to July 2017 in cells B1:B229 to obtain our forecasts the dialog box we should have
will look like this:
[Screenshot of the Exponential Smoothing dialog box: Input Range B1:B229, Damping factor 0.9]
Briefly explain how the size of the smoothing parameter affects the forecasts.
Obtain the 12 forecasts for the 12 months in the out-of-sample period.
(Note that the final value obtained with the 228 values will give us a forecast for the first
month in out-of-sample period August 2017 or period 229. To obtain the forecasts for the
remaining months in the out-of-sample period i.e. for periods 230 to 240 you will need to
repeat the above process and each time you will add one extra month to your Input Range
e.g. for the next forecast the Input Range changes from B1:B229 to B1:B230.)
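The recursion behind these forecasts can be written out directly. The sketch below is a minimal Python illustration, assuming the usual form F[t+1] = ω·y[t] + (1 − ω)·F[t] with the first forecast seeded by the first observation. The function names are illustrative, and the seeding and alignment may not match Excel's Data Analysis output cell for cell.

```python
def exponential_smoothing_forecast(values, smoothing=0.1):
    """One-step-ahead forecast for the period after the last value.

    Uses F[t+1] = smoothing * y[t] + damping * F[t], where the damping
    factor is 1 - smoothing. Assumption: the first forecast equals the
    first observation.
    """
    damping = 1 - smoothing
    forecast = values[0]
    for y in values[1:]:
        forecast = smoothing * y + damping * forecast
    return forecast


def rolling_forecasts(values, n_within, smoothing=0.1):
    """Forecast each out-of-sample period, adding one observed month per step."""
    return [exponential_smoothing_forecast(values[:k], smoothing)
            for k in range(n_within, len(values))]
```

Calling `rolling_forecasts(series, 228)` on the 240 monthly values would return 12 forecasts, mirroring the step-by-step extension of the Input Range described in the note above.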
Draw a single chart which shows both the line graph for these forecasts and the line graph
for the actual values of the Housing Starts in Victoria variable.
Obtain the Root Mean Square Error (RMSE) for these 12 forecasts.
(8 marks)
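The Root Mean Square Error used in parts (b), (d) and (e) is the square root of the average squared forecast error. A minimal Python sketch, assuming the divisor is the number of forecasts (12 here), is:

```python
import math


def rmse(actual, forecast):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A smaller RMSE indicates forecasts that are, on average, closer to the actual values, which makes it a convenient single number for comparing the three models.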
c: The manager now asks the consultant to develop a Multiplicative Time Series model
similar to the one which is discussed in Seminar 11. The data which can be used to
estimate this model can be found in the Seasonal Indices Model worksheet. You should
use the data in the within-sample period when you obtain this model.
When obtaining the seasonal indices for the 12 months you are expected to use a 12
month centred moving average (CMA).
Briefly explain what the values of these seasonal indices are telling us about the values of
the Housing Starts in Victoria variable in the different months.
(8 marks)
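The 12 month centred moving average and the resulting seasonal indices in part (c) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Seminar 11 procedure itself: the index for each month is taken as the average ratio of the actual value to the CMA, and the 12 indices are then rescaled to average 1. The function names are illustrative.

```python
def centred_moving_average(values, period=12):
    """Centred moving average for an even period; None where undefined.

    Each CMA uses period + 1 values, with half weight on the two end values.
    """
    half = period // 2
    cma = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half: t + half + 1]
        total = sum(window) - 0.5 * (window[0] + window[-1])
        cma[t] = total / period
    return cma


def seasonal_indices(values, first_month, period=12):
    """Average ratio-to-CMA for each month, rescaled to average 1.

    `first_month` is the calendar month (1-12) of values[0]. At least two
    full years of data are needed so that every month has a ratio.
    """
    cma = centred_moving_average(values, period)
    ratios = {m: [] for m in range(1, period + 1)}
    for t, (y, c) in enumerate(zip(values, cma)):
        if c is not None:
            month = (first_month - 1 + t) % period + 1
            ratios[month].append(y / c)
    raw = {m: sum(r) / len(r) for m, r in ratios.items()}
    scale = period / sum(raw.values())
    return {m: raw[m] * scale for m in raw}
```

For the within-sample data, which start in August 1998, the call would be `seasonal_indices(within_sample, first_month=8)`. An index above 1 marks a month that tends to sit above the moving average.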
d: Estimate both the linear and the quadratic trend models using the 228 within-sample
values. In these two models
Linear model:
$y_t = \beta_0 + \beta_1 t + \varepsilon_t$
Quadratic model:
$y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon_t$
the t or Trend variable contains the values from 1 to 228.
Briefly discuss which of these two models of the trend you think is the most appropriate
and explain what type of trend (if any) is present in the values of the Housing Starts in
Victoria variable.
Using the most appropriate trend model and values of the Seasonal Indices from part (c)
of the question obtain forecasts for the values of the Housing Starts in Victoria variable in
the out-of-sample period from August 2017 to July 2018.
Draw a single chart which shows both the line graph for these forecasts and the line graph
for the actual values of the Housing Starts in Victoria variable.
Obtain the Root Mean Square Error (RMSE) for these 12 forecasts.
(10 marks)
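In a multiplicative model, the forecast for a period is the trend value for that period multiplied by the seasonal index of its month. The Python sketch below illustrates that calculation for part (d). It assumes t = 1 corresponds to August 1998, as in the Data worksheet, and that the trend coefficients come from your own estimated model. The function names are illustrative.

```python
def month_of_period(t, first_month=8):
    """Calendar month (1-12) of trend period t, where t = 1 is `first_month`."""
    return (first_month - 1 + t - 1) % 12 + 1


def multiplicative_forecast(t, trend_coefficients, indices, first_month=8):
    """Trend value at period t times the seasonal index of that month.

    `trend_coefficients` is (b0, b1) for a linear trend or (b0, b1, b2)
    for a quadratic trend; `indices` maps month number to seasonal index.
    """
    trend = sum(b * t ** power for power, b in enumerate(trend_coefficients))
    return trend * indices[month_of_period(t, first_month)]
```

The out-of-sample forecasts correspond to t = 229 (August 2017) through t = 240 (July 2018).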
e: Starting with the most appropriate trend model from part (d) of this question add to your
model the 11 monthly seasonal dummy variables from February (M2) to December (M12).
The values of the monthly seasonal dummy variables M1 to M12 are given in the Dummy
Variables Model worksheet.
Estimate this model and briefly discuss the quality of the estimated model.
Explain how we interpret the value of the coefficient of the February or M2 variable.
Using this model obtain forecasts for the values of the Housing Starts in Victoria variable
in the out-of-sample period from August 2017 to July 2018.
Draw a single chart which shows both the line graph for these forecasts and the line graph
for the actual values of the Housing Starts in Victoria variable.
Obtain the Root Mean Square Error (RMSE) for these 12 forecasts.
(10 marks)
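The forecast from the dummy variables model in part (e) adds the coefficient of the relevant month's dummy to the trend value. The sketch below is a minimal Python illustration, assuming January is the omitted base month because only M2 to M12 enter the model. The function names are illustrative and the coefficients must come from your own estimated model.

```python
def seasonal_dummies(month):
    """Values of M2..M12 for a calendar month (1-12); January is the base."""
    return [1 if month == m else 0 for m in range(2, 13)]


def dummy_model_forecast(t, month, intercept, trend_coefficients,
                         dummy_coefficients):
    """Intercept + trend terms + coefficient of the month's dummy variable.

    `trend_coefficients` is (b1,) for a linear trend or (b1, b2) for a
    quadratic trend; `dummy_coefficients` lists the 11 estimates for
    M2..M12 in order.
    """
    trend = sum(b * t ** (k + 1) for k, b in enumerate(trend_coefficients))
    seasonal = sum(c * d for c, d in
                   zip(dummy_coefficients, seasonal_dummies(month)))
    return intercept + trend + seasonal
```

For a January period all 11 dummies are zero, so the forecast reduces to the intercept plus the trend terms.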
f: Briefly discuss which set of forecasts from the 3 models you think would be the most
useful to the research section of the large property development firm.
(5 marks)
Total marks for Question 2
45 = 4 + 8 + 8 + 10 + 10 + 5
END OF ASSIGNMENT 3