Math 419/592
Winter 2009
Prof. Andrew Ross
Eastern Michigan University
No Time
No Space Discrete
Ch 1, 2, 3
Discrete Ch 4 DTMC
Continuous Ch 5, 7 Ch 5, 6, 7, 8
Continuous
Ch 1, 2, 3
YOU ARE HERE
Ch 10
Or, if you prefer,transpose it:
No Time Discrete
No Space
Discrete Ch 1, 2, 3 Ch 4 DTMC
Continuous
Ch 5, 7
Ch 5, 6, 7, 8
Continuous Ch 1, 2, 3 YOU ARE HERE Ch 10
Take Math 560
(Optimization) this fall!
Sign up soon or it will disappear
Look at the data!
Common Models
Multivariate Data
Cycles/Seasonality
Filters
or else!
2
Years: 1958 to now; vertical scale 300 to 400ish
1. Look at the data
2. Quantify any pattern you see
3. Remove the pattern
4. Look at the residuals
5. Repeat at step 2 until no patterns left
Look at the data
Suck the life out of it
Spend hours poring over the noise
What should noise look like?
One of these things is not like the others
1
0.75
0.5
0.25
0
-0.25
-0.5
-0.75
-1
-1.25
2.5
2.25
2
1.75
1.5
1.25
0 10 20 30 40 50 60 70
4
3.5
3
2.5
2
1.5
1
0.5
0
7
6.5
6
5.5
5
4.5
0 10 20 30 40 50 60 70
0.3
0.25
0.2
0.15
0.1
0.05
0
-0.05
-0.1
-0.15
-0.2
-0.25
0 10 20 30 40 50 60 70
1.75
1.5
1.25
1
0.75
0.5
0.25
0
-0.25
-0.5
-0.75
-1
-1.25
0 10 20 30 40 50 60 70
The upper-right-corner plot is Stationary.
Mean doesn't change in time
no Trend
no Seasons (known frequency)
no Cycles (unknown frequency)
Variance doesn't change in time
Correlations don't change in time
Up to here, weakly stationary
Joint Distributions don't change in time
That makes it strongly stationary
Time is “t”, not “n”
even though it's discrete
State (value) is Y, not X
to avoid confusion with x-axis, which is time.
Value at time t is Y t
, not Y(t)
because time is discrete
Of course, other books do other things.
Fit a plain linear regression, then subtract it out:
Fit Y t
= m*t + b,
New data is Z t
= Y t
– m*t – b
Or use quadratic fit, exponential fit, etc.
Differencing
For linear trend, new data is Z t
= Y t
– Y
To remove quadratic trend, do it again: t-1
W t
= Z t
– Z t-1
=Y t
– 2Y
Like taking derivatives t-1
+ Y t-2
What’s the equivalent if you think the trend is exponential, not linear?
Hard to decide: regression or differencing?
Will get to it later.
For the next few slides, assume no cycles/seasons.
How do you compare two quantities?
Multiply them!
If they’re both positive, you’ll get a big, positive answer
If they’re both big and negative…
If one is positive and one is negative…
If one is big&positive and the other is small&positive…
Dot product of two vectors
Proportional to the cosine of the angle between them (do they point in the same direction?)
Inner product of two functions
Integral from a to b of f(x)*g(x) dx
Covariance of two data sets x_i, y_i
Sum_i (x_i * y_i)
How correlated is the series with itself at various lag values?
E.g. If you plot Y t+1 versus Y t and find the correlation, that's the correl. at lag 1
ACF lets you calculate all these correls. without plotting at each lag value.
ACF is a basic building block of time series analysis.
y = -0.4234x + 1.4167
R
2
= 0.1644
Lag-1 of bus IATs
1.8
1.6
1.4
1.2
1
0.8
0.6
0.4
0.2
0
0 0.2
0.4
0.6
0.8
1
ACF and PACF of bus IATs
1.2
1.2
1.4
1
0.8
0.6
0.4
0.2
0
-0.2
0
-0.4
-0.6
1.6
5
1.8
10 15 20 25 30 lag
35 acf
LCL(95%)
UCL(95%)
At lag 0, ACF=1
Symmetric around lag 0
Approx. confidence-interval bars around ACF=0
To help you decide when ACF drops to near-0
Less reliable at higher lags
Often assume ACF dies off fast enough so its absolute sum is finite.
If not, called “long-term memory”; e.g.
River flow data over many decades
Traffic on computer networks
R, Splus, SAS, SPSS, Matlab, Scilab will do it for you
Excel: download PopTools (free!)
http://www.cse.csiro.au/poptools/
Excel, etc: do it yourself.
First find avg. and std.dev. of data
Next, find AutoCoVariance Function (ACVF)
Then, divide by variance of data to get ACF
Y-bar is mean of whole data set
Not just mean of N-h data points
Left side: old way, can produce correl>1
Right side: new way
Difference is “End Effects”
Pg 30 of Peña, Tiao, Tsay
(if it makes a difference, you're up to no good?)
White Noise
AR
MA
ARMA
ARIMA
SARIMA
ARMAX
Kalman Filter
Exponential Smoothing, trend, seasons
Sequence of I.I.D. Variables e t mean=zero, Finite std.dev., often unknown
Often, but not always, Gaussian
12. 5
10
7. 5
5
2. 5
0
-2. 5
-5
-7. 5
-10
-12. 5
0 500 1000 1500 2000
Order 1: Y t
=a*Y t-1
+ e t
E.g.
Order 2: Y
New = (90% of old) + random fluctuation t
=a
1
*Y t-1
+a
2
*Y t-2
+ e t
Order p denoted AR(p)
p=1,2 common; >2 rare
AR(p) like p'th order ODE
AR(1) not stationary if |a|>=1
E[Y t
] = 0, can generalize
Find appropriate order
Estimate coefficients
via Yule-Walker eqn.
Estimate std.dev. of white noise
If estimated |a|>0.98, try differencing.
Order 1:
Y t
= b
0 e t
+b
1 e t-1
Order q: MA(q)
In real data, much less common than AR
But still important in theory of filters
Stationary regardless of b values
E[Y t
] = 0, can generalize
Drops to zero after lag=q
That's a good way to determine what q should be!
-0.4
-0.6
-0.8
-1
0.2
0
-0.2
0
1
0.8
0.6
0.4
5
ACF
10 15 20 25 30
ACF
Never completely dies off, not useful for finding order p.
AR(1) has exponential decay in
ACF
1
0.8
0.6
0.4
0.2
0
-0.2
0
-0.4
-0.6
-0.8
-1
Instead, use Partial
ACF=PACF, which dies after lag=p
PACF of MA never dies.
5 10 15 20 25 30
ARMA(p,q) combines AR and MA
Often p,q <= 1 or 2
AR-Integrated-MA
ARIMA(p,d,q)
d=order of differencing before applying
ARMA(p,q)
For nonstationary data w/stochastic trend
Seasonal ARIMA(p,d,q)-and-(P,D,Q)
S
Often S=
12 (monthly) or
4 (quarterly) or
52 (weekly)
Or, S=7 for daily data inside a week
ARMAX=ARMA with outside explanatory variables (halfway to multivariate time series)
Underlying process that we don't see
We get noisy observations of it
Like a Hidden Markov Model (HMM), but state is continuous rather than discrete.
AR/MA, etc. can be written in this form too.
State evolution (vector): S
Observations (scalar): Y t t
=
= F * S t-1
H * S t
+ e t
+ h t
(Generalized) AutoRegressive Conditional
Heteroskedastic (heteroscedastic?)
Like ARMA but variance changes randomly in time too.
Used for many financial models
More a method than a model.
Very common in practice
Forecasting w/o much modeling of the process.
A t
= forecast of series at time t
Pick some parameter a between 0 and 1
A
t
= a
Y or A t t
+ (1a
)A t-1
= A t-1
+ a*
(error in period t)
Why call it “Exponential”?
Weight on Y t at lag k is (1a
) k
Train the model: try various values of a
Pick the one that gives the lowest sum of absolute forecast errors
The larger a is, the more weight given to recent observations
Common values are 0.10, 0.30, 0.50
If best a is over 0.50, there's probably some trend or seasonality present
Exponential smoothing: no trend or seasonality
Excel/Analysis Toolpak can do it if you tell it a
Holt's method: accounts for trend.
Also known as double-exponential smoothing
Holt-Winters: accounts for trend & seasons
Also known as triple-exponential smoothing
Along with ACF, use Cross-Correlation
Cross-Correl is not 1 at lag=0
Cross-Correl is not symmetric around lag=0
Leading Indicator: one series' behavior helps predict another after a little lag
Leading means “coming before”, not “better than others”
Can also do cross-spectrum, aka coherence
Suppose a yearly cycle
Sample quarterly: 3-med, 6-hi, 9-med, 12-lo
Sample every 6 months: 3-med, 9-med
Or 6-hi, 12-lo
To see a cycle, must sample at twice its freq.
Demo spreadsheet
This is the Nyquist limit
Compact Disc: samples at 44.1 kHz, top of human hearing is 20 kHz
We have data, want to find
Cycle length (e.g. Business cycles), or
Strength of seasonal components
Idea: use sine waves as explanatory variables
If a sine wave at a certain frequency explains things well, then there's a lot of strength.
Could be our cycle's frequency
Or strength of known seasonal component
Explains=correlates
Ordinary covar:
At freq. Omega,
T t
1
0
( X t
X )( Y t
Y )
T t
1
0 sin(
t ) Y t
(means are zero)
Problem: what if that sine is out of phase with our cycle?
Also correlate with a cosine
90 degrees out of phase with sine
Why not also with a 180-out-of-phase?
Because if that had a strong correl, our original sine would have a strong correl of opposite sign.
Sines & Cosines, Oh My —combine using complex variables!
d (
)
T t
1
0 e
i
t
Y t
Often a scaling factor like 1/T, 1/sqrt(T), 1/2pi, etc. out front.
Some people use +i instead of -i
Often look only at the frequencies k
2
k / T k=0,...,T-1 d (
k
)
t
T
1
0 e
2
ik / T
Y t
That reminds me of matrix multiplication.
Define a matrix F whose j,k entry is
exp(-i*j*k*2pi/T)
Then d
F Y
Matrix multiplication takes T^2 operations
This matrix has a special structure, can do it in about T log T operations
That's the FFT=Fast Fourier Transform
Easiest if T is a power of 2
Take magnitude & argument of each DFT result
Plot squared magnitude vs. frequency
This is the “Periodogram”
Large value = that frequency is very strong
Often plotted on semilogy scale, “decibels”
Example spreadsheet
First, play with amplitudes:
(1,0) then (0,1) then (1,.5) then (1,.7)
Next, play with frequency1:
2*pi/8 then 2*pi/4
2*pi/6, 2*pi/7, 2*pi/9, 2*pi/10
2*pi/100, 2*pi/1000
Summarize your results for yourself. Write it down!
Reset to 2*pi/8 then play with phase2:
0, 1, 2, 3, 4, 5, 6...
Now add some noise to Yt
Value at k=0 is mean of data series
Called “DC” component
Area under periodogram is proportional to
Var(data series)
Height at each point=how much of variance is explained by that frequency
Plotting argument vs. frequency shows phase
Often need to smooth with moving avg.
Try it!
Why is it called white noise?
Pink noise, etc. (look up in Wikipedia)
Zero out the frequencies you don't want
Invert the FT
FT is its own inverse! Not like Laplace Transform.
This is “frequency-domain” filtering
MP3 files: filter out the freqs. you wouldn't hear
because they're overwhelmed by stronger frequencies
Time-domain filtering: example spreadsheet
Smoothing: moving average
Filters out high frequencies (noise is high-freq)
Low-pass filter
Detrending: differencing
Filters out trends and slow cycles (which look like trends, locally)
High-pass filter
Band-pass filter
Band-reject filter (esp. 12-month cycles)
Time-domain filter's freq. response comes from the FT of its averaging coefficients
Example spreadsheet
This curve is called the “Transfer Function”
Good audio speakers publish their frequency response curves
Ordinary theory assumes that ACF dies off faster than 1/h
But some time series don't satisfy that:
River flows
Packet amounts on data networks
Connected to chaos & fractals
Enders: Applied Econometric Time Series
Kedem & Fokianos: Regression Models for Time
Series Analysis
Pen~a, Tao, Tsay: A Course in Time Series Analysis
Brillinger: lecture notes for Stat 248 at UC Berkeley
Brillinger:Time Series: Data Analysis and Theory
Brockwell & Davis: Introduction to Time Series and
Forecasting
0
-0.05
-0.1
-0.15
-0.2
-0.25
0.3
0.25
0.2
0.15
0.1
0.05
0 10 20 30 40 50 60 70