Time Series Forecasting Homework Problems

R: 2/3/16
The videos for this module walk through completing the first homework problem.
Follow along and complete the first homework problem. The tutorial videos listed on
the class schedule for the time series day demonstrate how to do time series
forecasting on the Amtrak example. Work through the tutorial before working on this
homework.
1. Department Store Problem
Data Description
The DeptStore.csv file contains the quarterly sales of a department store from 2009
through 2014. Forecast the sales for the year 2015.
The screen capture below shows the first several records in the downloaded raw
data.
Top of raw data
Prepare the Data
You will not use the data in columns A and B directly in your model building. Instead
you will use them to generate data that you will use to build the model. Create a
textual value for QtrName (1st, 2nd, 3rd, 4th), so that dummy variables will be
generated. The time component in this model is quarter, so create t_qtr = 1, 2, 3,
…27, 28.
Below, the first several records and the last records show how the data should
look when it is ready to be partitioned for use in the forecasting process.
Top of prepped data
Bottom of prepped data
The dataset contains values through the end of 2014. Create input values for the
year 2015 so that you will be able to forecast the sales for the quarters in 2015.
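The preparation steps above can be cross-checked with a short Python sketch. It is only an illustration, not part of the VisMiner or Excel workflow: the column name Year and the helper build_quarter_rows are assumptions, and the actual sales values still have to come from DeptStore.csv.

```python
# Sketch of the prepped layout: one record per quarter, 2009 through 2015.
# "Year" is an assumed column name; QtrName, t_qtr, and ySales follow the handout.
QTR_NAMES = {1: "1st", 2: "2nd", 3: "3rd", 4: "4th"}


def build_quarter_rows(first_year, last_year):
    """Return one dict per quarter with a textual QtrName and a running t_qtr."""
    rows = []
    t_qtr = 0
    for year in range(first_year, last_year + 1):
        for qtr in (1, 2, 3, 4):
            t_qtr += 1
            rows.append({"Year": year, "QtrName": QTR_NAMES[qtr], "t_qtr": t_qtr})
    return rows


rows = build_quarter_rows(2009, 2015)

# 2015 sales are unknown, so those rows carry a zero placeholder.
# None marks rows whose value must be filled in from DeptStore.csv.
for row in rows:
    row["ySales"] = 0 if row["Year"] == 2015 else None

print(len(rows), rows[0], rows[-1])
```

The textual QtrName values are what allow dummy variables to be generated later, while t_qtr stays numeric so it can serve as the trend term.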
Create Partitions
The table below summarizes which records to use for four datasets. Use VisMiner to
create the four datasets. The video tutorials walk you through this process.
                     Training           Validation         TrainPlusValid     Forecast (Future)
Begin                1st quarter 2009   1st quarter 2014   1st quarter 2009   1st quarter 2015
End                  4th quarter 2013   4th quarter 2014   4th quarter 2014   4th quarter 2015
Number of Records    20                 4                  24                 4
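As a rough check on the partition boundaries, the same split can be expressed in terms of t_qtr. This is a sketch outside VisMiner; the t_qtr cutoffs are derived from the table (1st quarter 2009 is t_qtr 1, 4th quarter 2015 is t_qtr 28), and the function name partition is hypothetical.

```python
# Inclusive t_qtr bounds implied by the partition table.
PARTITIONS = {
    "Training": (1, 20),        # 1st quarter 2009 to 4th quarter 2013
    "Validation": (21, 24),     # the four quarters of 2014
    "TrainPlusValid": (1, 24),  # 1st quarter 2009 to 4th quarter 2014
    "Forecast": (25, 28),       # the four quarters of 2015
}


def partition(rows, bounds=PARTITIONS):
    """Group records into the four datasets by their t_qtr value."""
    return {
        name: [r for r in rows if lo <= r["t_qtr"] <= hi]
        for name, (lo, hi) in bounds.items()
    }


# Stand-in records holding only the time index.
rows = [{"t_qtr": t} for t in range(1, 29)]
parts = partition(rows)
for name, subset in parts.items():
    print(name, len(subset))
```

The record counts printed here should agree with the last row of the table if the prepped data has one row per quarter.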
Visualize the Data Trend
a. Create an MS Excel chart based on the data from 2009 through 2014 to
visualize the trend and assess the strength of a linear and a
polynomial relationship between t_qtr and ySales. Note: Remember to plot
only the TrainPlusValid data. Do not include the records to be forecasted,
since you have placed zeros there as placeholders. Including them in
the trendline would distort the R² values.
Use Excel to calculate the R² for a linear relationship between t_qtr and
ySales. Record the R² for the linear relationship.
Use Excel to determine the R² for a polynomial relationship
between t_qtr and ySales. Record the R² for that relationship.
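For readers who want to see what the trendline R² measures, here is a minimal sketch in plain Python. It fits a least-squares polynomial through the normal equations and reports 1 - SSE/SST, which should correspond to what Excel shows for a trendline with an intercept. The function names and the sample values are made up for illustration; they are not the DeptStore data.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def poly_r2(t, y, degree):
    """R-squared of a least-squares polynomial trend of the given degree."""
    size = degree + 1
    # Normal equations: sum(t^(i+j)) * coef_j = sum(t^i * y)
    a = [[sum(ti ** (i + j) for ti in t) for j in range(size)] for i in range(size)]
    b = [sum((ti ** i) * yi for ti, yi in zip(t, y)) for i in range(size)]
    coef = solve(a, b)
    fitted = [sum(c * ti ** k for k, c in enumerate(coef)) for ti in t]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Invented sample values, NOT the homework data.
t_qtr = [1, 2, 3, 4, 5, 6, 7, 8]
y_sales = [50, 52, 47, 60, 55, 58, 53, 66]
print("linear R2:", round(poly_r2(t_qtr, y_sales, 1), 3))
print("2nd-order polynomial R2:", round(poly_r2(t_qtr, y_sales, 2), 3))
```

Because the linear model is a special case of the 2nd-order polynomial, the polynomial R² on the same data can never be lower than the linear one; the question is whether the gain is large enough to matter.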
Create and Evaluate Models
b. Complete the three models below. The table below shows which input
variables should be included in each model. Determine which of the models
produces the most accurate results.
Note: In VisMiner, the way to determine which input variables will be
included in a model is to create a derived dataset that only includes your
chosen input variables and the output variable. The t_qtr2 is not in your
prepped data because the Polynomial 2nd Order modeler in VisMiner will
create the squared term from the non-squared term automatically. Some
check figures are provided.
c. Fill in the quality metrics in the table below. Note: The mouse-over feature in
VisMiner shows the RMSE and R². We could also calculate MAE and MAPE in
MS Excel if we chose to, but it is not necessary for the purpose of this
exercise.
Model: 1.1, 1.2, 1.3
Input variables to include: Qtr, t_qtr, t_qtr2 (a √ marks each variable included in a model)
Model quality metrics for validation dataset: R², RMSE, MAE, MAPE
Check figures: 0.206; 14,785; 5,514; 4.4%
d. Save a screen capture of the coefficients of the most accurate model.
Update the Coefficients and Create the Forecast
e. Now that you know which model has the best fit, use the TrainPlusValid
dataset to re-estimate the coefficients for that model. Also save a screen
capture of the coefficients of the model.
f. With the resulting model, forecast the sales for the Forecast dataset. Save a
screen capture of the forecasted data. Check figure: The first forecasted
ySales value should be 72,406.
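To make the forecasting step concrete, the sketch below shows how a fitted model of this kind turns coefficients into a forecast. It assumes a model with quarter dummy variables plus linear and squared trend terms, with the 1st quarter as the baseline; the function name, the dictionary layout, and every coefficient value are invented, so the output will not reproduce the check figure. VisMiner may also code its dummy variables differently.

```python
def forecast_sales(t_qtr, qtr_name, coef):
    """Evaluate intercept + quarter effect + linear and squared trend terms."""
    return (
        coef["intercept"]
        + coef["qtr"].get(qtr_name, 0.0)  # the baseline quarter contributes 0
        + coef["t_qtr"] * t_qtr
        + coef["t_qtr2"] * t_qtr ** 2
    )


# Made-up coefficients for illustration only; substitute the values
# re-estimated on the TrainPlusValid dataset. Set a term to 0.0 if the
# chosen model does not include it.
coef = {
    "intercept": 50000.0,
    "qtr": {"2nd": -1000.0, "3rd": 2000.0, "4th": 15000.0},
    "t_qtr": 300.0,
    "t_qtr2": 10.0,
}

# The Forecast dataset covers t_qtr 25 through 28 (the quarters of 2015).
for t, name in zip(range(25, 29), ["1st", "2nd", "3rd", "4th"]):
    print(t, name, forecast_sales(t, name, coef))
```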
2. Berlin Tunnel Homework Problem
Data Description
This data reflects the number of vehicles that went through the Berlin tunnel each
day for about two years. Download the Tunnel2.csv dataset.
The dataset contains only the Date and yVehicles columns, so you will need to create
additional data columns so that you can use them in your analysis.
The last day for which we have historical data is 11/16/05. To make the process
easier, I have added the dates for which we want to create forecasts, 11/17/05
through 12/7/05, and inserted a zero as a placeholder for the number of
vehicles in each record that needs a forecast.
First Records
Records where known traffic ends and placeholders for the forecasted
values begin
Prepare the Data
In this problem, instead of using Excel, use the date manipulation functions in
VisMiner (Create derived dataset then Computed Columns) to create additional data
that correspond to the given dates. Specifically, use the following date manipulation
functions to create the following fields:
Date manipulation function in VisMiner: DateDiff
To create this field: t_day
Explanation: Time index for day. In this function, <later date> is the Date column
and <earlier date> is “10/31/03”, so that 11/1/03 is computed as day 1. As explained
in an earlier VisMiner reading, specific dates must be surrounded by quotation marks
in VisMiner when creating calculated fields.

Date manipulation function in VisMiner: MonthNbrName
To create this field: Month
Explanation: Dummy variables for months (e.g., 1Jan, 2Feb, 3Mar, etc.)

Date manipulation function in VisMiner: DayOfWeek
To create this field: Wday
Explanation: Dummy variables for day of week (e.g., 1Sun, 2Mon, 3Tue, etc.)
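The three derived fields can be spot-checked outside VisMiner with Python's standard datetime module. This sketch only imitates the behavior described above: the function derive_fields is hypothetical, and the label formats are inferred from the examples (1Jan, 1Sun), so the exact text VisMiner produces may differ.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
EARLIER_DATE = date(2003, 10, 31)  # chosen so that 11/1/03 becomes day 1


def derive_fields(d):
    """Return t_day, Month, and Wday for one date, mimicking the handout's fields."""
    t_day = (d - EARLIER_DATE).days
    month = f"{d.month}{MONTHS[d.month - 1]}"
    # date.weekday() counts Monday as 0; shift so that Sunday is 0.
    sunday_first = (d.weekday() + 1) % 7
    wday = f"{sunday_first + 1}{DAYS[sunday_first]}"
    return {"t_day": t_day, "Month": month, "Wday": wday}


print(derive_fields(date(2003, 11, 1)))    # first historical record
print(derive_fields(date(2005, 11, 16)))   # last historical record
print(derive_fields(date(2005, 12, 7)))    # last record to forecast
```

Keeping Month and Wday as text labels, rather than plain numbers, is what lets the modeler treat them as categories and generate dummy variables.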
Below, some records show how the data should look when it is ready to be
partitioned for use in the forecasting process.
First few records
Bottom records with known values
The table below shows the top and bottom records in the dataset to be forecasted.
First few records
Bottom records
Create Partitions
The table below shows which records should be in each partition. The videos show
how to create these partitions in VisMiner.
                     Training   Validation   TrainPlusValid   Forecast (Future)
Begin                11/1/03    10/1/05      11/1/03          11/17/05
End                  9/30/05    11/16/05     11/16/05         12/7/05
Records              1-700      701-747      1-747            748-768
Number of Records    700        47           747              21
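The record counts in this table follow from the dates, assuming one record per calendar day with no gaps. A small sketch of that arithmetic (the helper count_days is hypothetical):

```python
from datetime import date

# Begin and end dates (inclusive) taken from the partition table.
PARTITIONS = {
    "Training": (date(2003, 11, 1), date(2005, 9, 30)),
    "Validation": (date(2005, 10, 1), date(2005, 11, 16)),
    "TrainPlusValid": (date(2003, 11, 1), date(2005, 11, 16)),
    "Forecast": (date(2005, 11, 17), date(2005, 12, 7)),
}


def count_days(begin, end):
    """Number of daily records from begin through end, both included."""
    return (end - begin).days + 1


counts = {name: count_days(b, e) for name, (b, e) in PARTITIONS.items()}
for name, n in counts.items():
    print(name, n)
```

If the counts in your own partitions differ from these, the dataset likely has a missing or duplicated date.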
Visualize the Data Trend
a. Create an MS Excel line chart for traffic from 11/1/03 up through the
validation data (11/16/05) to visualize the trend and assess the
strength of a linear and a polynomial relationship between t_day and
yVehicles. Use Excel to calculate the R² for a linear relationship and a
polynomial relationship between tunnel traffic and t_day. Record the R² for
the linear relationship and the R² for the polynomial relationship.
Create and Evaluate Models
b. Complete the three models below. The table below shows which input
variables should be included in each model. Some check figures are
provided. Determine which of the models produces the most accurate results.
Use Excel to calculate the model quality metrics for the validation dataset for
each model. Fill in the quality metrics in the table below.
Model: 2.1, 2.2, 2.3
Input variables to include: Wday, Month, t_day, t_day2 (a √ marks each variable included in a model)
Quality metrics for validation dataset: R², RMSE, MAE, MAPE
Check figures: 4,402; 3.9%; 4,702
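The error metrics named above are normally computed in Excel for this problem; the sketch below shows the same arithmetic in Python so the formulas are explicit. It uses the common definitions with a divisor of n, which is an assumption, since VisMiner or the course materials may define them slightly differently. The function name and the sample numbers are invented.

```python
import math


def validation_metrics(actual, predicted):
    """Return RMSE, MAE, and MAPE for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual value, so actuals must be nonzero.
    mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape}


# Invented numbers, not tunnel data.
actual = [100, 200, 300, 400]
predicted = [110, 190, 330, 360]
metrics = validation_metrics(actual, predicted)
print(metrics)
```

Only validation records belong in this calculation. The forecast records hold zero placeholders rather than real traffic counts, so including them would make MAPE undefined and distort the other metrics.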
Update the Coefficients and Create the Forecast
c. Now that you know which model has the best fit, train a model with the
TrainPlusValid dataset using the same variables from the most accurate
model. Save a screen capture of the coefficients of the most accurate model.
d. With the resulting model, in VisMiner forecast the number of vehicles that
will go through the tunnel for the forecast period. Save the dataset that
includes the forecasted records and bring it to class.