Forecasting
Lecturer: Prof. Duane S. Boning
Rev 8
Regression – Review & Extensions
• Single model coefficient, linear dependence: y = β·x + ε
• Slope and intercept (or offset): y = β₀ + β₁·x + ε
• Polynomial and higher-order models: y = β₀ + β₁·x + β₂·x² + ε
• Multiple parameters: y = β₀ + β₁·x₁ + β₂·x₂ + ε
• Key point: “linear” regression can be used as long as the model is linear in the coefficients (the form of the dependence on the independent variables does not matter); see the sketch at the end of this list
• Time dependencies
– Explicit
– Implicit
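As an illustration of that key point, a quadratic model can still be fit by ordinary least squares because it is linear in its coefficients. A minimal sketch with made-up data (numpy here, rather than the Excel example from the agenda):

```python
import numpy as np

# Fit y = b0 + b1*x + b2*x^2 by least squares: quadratic in x,
# but linear in the coefficients, which is all "linear" regression requires.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.0 + 0.5 * x - 0.2 * x**2 + rng.standard_normal(50)

X = np.column_stack([np.ones_like(x), x, x**2])   # design matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # recovers approximately [1.0, 0.5, -0.2]
```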
Agenda
1. Regression
   • Polynomial regression
   • Example (using Excel)
2. Time Series Data & Time Series Regression
   • Autocorrelation – ACF
   • Example: white noise sequences
   • Example: autoregressive sequences
   • Example: moving average
   • ARIMA modeling and regression
3. Forecasting Examples
Time Series – Time as an Implicit Parameter
• Data is often collected in time order
• An underlying dynamic process (e.g. due to the physics of a manufacturing process) may create autocorrelation in the data

[Plots: an uncorrelated series and an autocorrelated series, each over time 0–50]
Intuition: Where Does Autocorrelation Come From?
• Consider a chamber with volume V, with gas flowing in and out at rate f. We are interested in the concentration x at the output, in relation to a known input concentration w. A mass balance gives V·dx/dt = f·(w − x): the output responds to the input with a first-order lag, so successive samples of x are correlated.
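A minimal simulation sketch of this intuition; the parameter values are assumed for illustration, and an Euler discretization of the mass balance is used:

```python
import numpy as np

# Assumed parameters (not from the lecture): volume V, flow rate f,
# Euler time step dt, and a noisy input concentration w.
V, f, dt, n = 10.0, 1.0, 0.5, 500
rng = np.random.default_rng(1)
w = 1.0 + 0.1 * rng.standard_normal(n)

x = np.zeros(n)
for i in range(1, n):
    # Euler step of the mass balance V*dx/dt = f*(w - x):
    x[i] = x[i-1] + (f * dt / V) * (w[i-1] - x[i-1])

# Rearranged: x_i = (1 - f*dt/V)*x_{i-1} + (f*dt/V)*w_{i-1} -- an AR(1)-style
# update, so the chamber dynamics themselves induce autocorrelation in x.
```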
Key Tool: Autocorrelation Function (ACF)
• Time series data: x_i, with time index i = 1, …, n

[Plot: example time series over time 0–100]

• ACF: auto-correlation function

  r_k = c_k / c_0, where c_k = (1/n) Σ_{i=1}^{n−k} (x_i − x̄)(x_{i+k} − x̄)

• CCF: cross-correlation function (between two different signals)

[Plot: sample ACF over lags 0–40]

⇒ ACF shows the “similarity” of a signal to a lagged version of the same signal
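A minimal sketch of the sample ACF as defined above:

```python
import numpy as np

def acf(x, nlags=40):
    """Sample ACF r_k = c_k / c_0 for k = 0..nlags, per the definition above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    c0 = np.dot(xc, xc) / n
    return np.array([np.dot(xc[:n - k], xc[k:]) / (n * c0)
                     for k in range(nlags + 1)])
```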
Stationary vs. Non-Stationary
Stationary series: the process has a fixed mean

[Plot: stationary series over time 0–500]

Non-stationary series: no fixed mean – the level wanders over time

[Plot: non-stationary series over time 0–500]
White Noise – An Uncorrelated Series
• Data drawn from an IID Gaussian distribution: x_i ~ N(0, σ²)

[Plot: white noise series over time 0–200]

• Sample mean: x̄ = (1/n) Σ x_i
• Sample variance: s² = (1/(n−1)) Σ (x_i − x̄)²
• ACF: we also plot the ±3σ significance limits – values within these limits are not significant
• Note that r(0) = 1 always (a signal is always equal to itself with zero lag – perfectly autocorrelated at k = 0)

[Plot: sample ACF over lags 0–40 – no significant autocorrelation for k > 0]
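A quick check of these properties, reusing the acf() sketch above (the 3σ limit for the ACF of an IID series is approximately 3/√n):

```python
rng = np.random.default_rng(2)
x = rng.standard_normal(200)            # IID Gaussian white noise
r = acf(x, nlags=40)

limit = 3 / np.sqrt(len(x))             # approximate 3-sigma limits
print(r[0])                             # always 1.0 at lag 0
print(np.sum(np.abs(r[1:]) > limit))    # typically 0: no significant lags
```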
Autoregressive Disturbances
• Generated by: x_i = a·x_{i−1} + ε_i, with ε_i ~ N(0, σ²) and |a| < 1
• Mean: E[x_i] = 0
• Variance: Var(x_i) = σ²/(1 − a²)

[Plot: AR(1) series over time 0–500]

[Plot: sample ACF over lags 0–40 – slow drop in ACF for large a, since r_k = a^k]

So AR (autoregressive) behavior increases the variance of the signal.
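A sketch of such a sequence, reusing acf() from above; for an AR(1) series the theoretical ACF decays geometrically as r_k = a^k:

```python
rng = np.random.default_rng(3)
a, n = 0.9, 5000
x = np.zeros(n)
for i in range(1, n):
    x[i] = a * x[i-1] + rng.standard_normal()   # x_i = a*x_{i-1} + eps_i

print(x.var())          # approaches sigma^2/(1 - a^2) ~= 5.26 for sigma = 1
print(acf(x, nlags=4))  # ~= [1, a, a^2, a^3, a^4]: geometric decay
```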
Another Autoregressive Series
• Generated by: x_i = a·x_{i−1} + ε_i, now with a < 0
• High negative autocorrelation: slow drop in the magnitude of the ACF for large |a|, but now the ACF alternates in sign (r_k = a^k with a < 0)

[Plot: AR(1) series with a < 0 over time 0–500]

[Plot: sample ACF over lags 0–40 – alternating sign, slowly decaying magnitude]
Random Walk Disturbances
• Generated by: x_i = x_{i−1} + ε_i (an autoregressive series with a = 1)
• Mean: E[x_i] = 0 (for x_0 = 0)
• Variance: Var(x_i) = i·σ² – grows with time, so the series is non-stationary

[Plot: random walk over time 0–500]

[Plot: sample ACF over lags 0–40 – very slow drop in ACF for a = 1]
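A sketch of the growing variance; a random walk is just cumulative white noise:

```python
rng = np.random.default_rng(4)
x = np.cumsum(rng.standard_normal(500))   # x_i = x_{i-1} + eps_i

# Variance grows roughly as i*sigma^2, so the series is non-stationary;
# estimate it across many independent replications.
walks = np.cumsum(rng.standard_normal((2000, 500)), axis=1)
print(walks[:, 99].var(), walks[:, 499].var())   # ~= 100 and ~= 500
```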
Moving Average Sequence
• Generated by: x_i = ε_i + b·ε_{i−1}, with ε_i ~ N(0, σ²)
• Mean: E[x_i] = 0
• Variance: Var(x_i) = (1 + b²)·σ²

[Plot: MA(1) series over time 0–500]

[Plot: sample ACF over lags 0–40 – jump in ACF at the specific lag k = 1, with r(1) = b/(1 + b²), and no significant autocorrelation beyond]

So MA (moving average) behavior also increases the variance of the signal.
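A sketch of an MA(1) sequence, reusing acf() from above:

```python
rng = np.random.default_rng(5)
b, n = 0.9, 5000
eps = rng.standard_normal(n + 1)
x = eps[1:] + b * eps[:-1]      # x_i = eps_i + b*eps_{i-1}

print(x.var())          # ~= (1 + b^2)*sigma^2 = 1.81 for sigma = 1
print(acf(x, nlags=3))  # r(1) ~= b/(1 + b^2) ~= 0.50, then r(k) ~= 0 for k > 1
```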
ARMA Sequence
• Generated by: x_i = a·x_{i−1} + ε_i + b·ε_{i−1}
• Both AR & MA behavior

[Plot: ARMA(1,1) series over time 0–500]

[Plot: sample ACF over lags 0–40 – slow drop in ACF for large a]
ARIMA Sequence – adds random walk (integrative) action
• Start with an ARMA sequence: w_i = a·w_{i−1} + ε_i + b·ε_{i−1}
• Add Integrated (I) behavior: x_i = x_{i−1} + w_i

[Plot: ARIMA series over time 0–500, wandering over a wide range]

[Plot: sample ACF over lags 0–40 – slow drop in ACF with the large effective autocorrelation]
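A sketch of building an ARIMA sequence by integrating an ARMA sequence, and of differencing to undo the integration:

```python
rng = np.random.default_rng(6)
a, b, n = 0.9, 0.5, 500
eps = rng.standard_normal(n)
w = np.zeros(n)
for i in range(1, n):
    w[i] = a * w[i-1] + eps[i] + b * eps[i-1]   # ARMA(1,1) sequence
x = np.cumsum(w)                                # integrated: x_i = x_{i-1} + w_i

# Differencing undoes the integration -- the usual first step in ARIMA modeling:
assert np.allclose(np.diff(x), w[1:])
```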
Periodic Signal with Autoregressive Noise
Original Signal

[Plot: original signal over time 0–400, with its sample ACF over lags 0–40]

After Differencing

[Plot: differenced signal over time 0–400, with its sample ACF over lags 0–40]

After differencing, see the underlying signal with period = 5.
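A sketch of the same idea, reusing acf() from above: differencing suppresses the slow autoregressive drift and exposes the period-5 component:

```python
rng = np.random.default_rng(7)
n = 400
t = np.arange(n)
periodic = 2.0 * np.sin(2 * np.pi * t / 5)   # hidden period-5 component

drift = np.zeros(n)                           # slow autoregressive noise
for i in range(1, n):
    drift[i] = 0.95 * drift[i-1] + rng.standard_normal()

y = periodic + drift
dy = np.diff(y)             # differencing suppresses the slow AR drift
print(acf(dy, nlags=15))    # peaks recur every 5 lags, exposing the period
```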
Cross-Correlation: A Leading Indicator
• Now we have two series:
– An “input” or explanatory variable x
– An “output” variable y
• CCF indicates both AR and lag:
[Plot: input series x over time 0–500]

[Plot: output series y over time 0–500]

[Plot: sample CCF over lags 0–40 – peak at the lag by which x leads y]
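A sketch of using the cross-correlation to find the lag; the ccf() helper below is hypothetical, mirroring the acf() sketch above:

```python
def ccf(x, y, nlags=40):
    """Sample cross-correlation between x and y lagged by k = 0..nlags."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xc, yc = x - x.mean(), y - y.mean()
    denom = n * x.std() * y.std()
    return np.array([np.dot(xc[:n - k], yc[k:]) / denom
                     for k in range(nlags + 1)])

rng = np.random.default_rng(8)
x = np.zeros(500)
for i in range(1, 500):
    x[i] = 0.9 * x[i-1] + rng.standard_normal()     # AR(1) input
y = np.roll(x, 5) + 0.5 * rng.standard_normal(500)  # y_i ~= x_{i-5} + noise
y[:5] = 0.0                                         # discard wrapped samples

print(np.argmax(ccf(x, y)))   # ~= 5: x is a leading indicator of y
```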
Regression & Time Series Modeling
• The ACF and CCF are helpful tools for selecting an appropriate model structure
– Autoregressive terms?
  • x_i regressed on x_{i−1}
– Lag terms?
  • y_i regressed on x_{i−k}
• One can structure the data and perform regressions (see the sketch after this list)
– Estimate model coefficient values, significance, and confidence intervals
– Determine confidence intervals on output
– Check residuals
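A least-squares sketch of such a structured regression, continuing with x, y, and acf() from the sketches above:

```python
# Regress y_i on a lag term x_{i-5} and an autoregressive term y_{i-1}.
k = 5
Y = y[k:]
X = np.column_stack([np.ones(len(Y)), x[:len(x) - k], y[k - 1:-1]])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

resid = Y - X @ beta
print(beta)                  # intercept, lag-k coefficient, AR(1) coefficient
print(acf(resid, nlags=10))  # residual ACF should show no remaining structure
```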
Statistical Modeling Summary
1. Statistical Fundamentals
   • Sampling distributions
   • Point and interval estimation
   • Hypothesis testing
2. Regression
   • ANOVA
   • Nominal data: modeling of treatment effects (mean differences)
   • Continuous data: least squares regression
3. Time Series Data & Forecasting
   • Autoregressive, moving average, and integrative behavior
   • Auto- and cross-correlation functions
   • Regression and time-series modeling
MIT OpenCourseWare http://ocw.mit.edu
2.854 / 2.853 Introduction to Manufacturing Systems
Fall 2010
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.