Module 3
Descriptive Time Series Statistics
and Introduction to Time Series Models
Class notes for Statistics 451: Applied Time Series
Iowa State University
Copyright 2015 W. Q. Meeker.
November 11, 2015, 17h 20min
Populations

• A population is a collection of identifiable units (or a specified characteristic of these units).

• A frame is a listing of the units in the population.

• We take a sample from the frame and use the resulting data to make inferences about the population.

• Simple random sampling with replacement from a population implies independent and identically distributed (iid) observations.
Process, Realization, Model, Inference, and Applications

[Diagram: the process generates data, producing the realization Z1, Z2, ..., Zn. A statistical model for the process (a stochastic process model) is fit to the realization, yielding parameter estimates. The fitted model supports inferences: forecasts, control, and description.]

• Standard statistical methods use the iid assumption for observations or residuals (in regression).
Module 3
Segment 1
Populations, Processes, and
Descriptive Time Series Statistics
Processes

• A process is a system that transforms inputs into outputs, operating over time.

• A process generates a sequence of observations (data) over time.

• We can use a realization from the process to make inferences about the process.

• The iid model is rarely appropriate for processes (observations close together in time are typically correlated, and the process often changes with time).
Stationary Stochastic Processes

Zt is a stochastic process. Properties of Zt include the mean µt, the variance σt², and the autocorrelation ρ(Zt, Zt+k). In general, these can change over time.

• Strongly stationary (also strictly or completely stationary):

  F(zt1, ..., ztn) = F(zt1+k, ..., ztn+k)

  Difficult to check.

• Weakly stationary (also second-order or covariance stationary) requires only that µ = µt and σ² = σt² be constant and that ρk = ρ(Zt, Zt+k) depend only on k. Easy to check with sample statistics.

Generally, "stationary" is understood to mean weakly stationary.
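The "easy to check with sample statistics" remark can be made concrete by comparing sample means and variances across parts of a realization. A minimal pure-Python sketch (function names are my own, not from the notes; this is an informal check, not a formal test):

```python
# Informal check of weak stationarity: split a realization into halves
# and compare sample means and variances. Large disagreement suggests
# the mean or variance is changing over time.

def mean(x):
    return sum(x) / len(x)

def var(x):
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def halves_summary(z):
    h = len(z) // 2
    first, second = z[:h], z[h:]
    return (mean(first), var(first)), (mean(second), var(second))

# Example: a series with a trend is clearly not mean-stationary.
trending = [0.1 * t for t in range(100)]
(first_m, _), (second_m, _) = halves_summary(trending)
print(first_m < second_m)  # True: the mean drifts upward
```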
Change in Business Inventories 1955-1969

[Figure: two time series plots of the quarterly change in business inventories, 1955-1969, in billions of dollars. The first asks "Stationary?"; the second shows the series with z̄ and z̄ ± 3σ̂Z.]

Estimation of Stationary Process Parameters
Some Notation

Process parameter                            Estimate
Mean of Z:            µZ = E(Z)              µ̂Z = z̄ = (1/n) Σ_{t=1}^{n} zt
Variance of Z:        γ0 = σ²Z = Var(Z)      γ̂0 = (1/n) Σ_{t=1}^{n} (zt − z̄)²
Autocovariance:       γk                     γ̂k = (1/n) Σ_{t=1}^{n−k} (zt − z̄)(zt+k − z̄)
Autocorrelation:      ρk = γk/γ0             ρ̂k = γ̂k/γ̂0
Standard deviation:   σZ = √γ0               σ̂Z = √γ̂0
Variance of Z̄:       σ²Z̄ = Var(Z̄)         S²Z̄ = (γ̂0/n)[· · ·]

The Variance of Z̄

The variance of Z̄ with correlated data is more complicated than with iid data:

σ²Z̄ = Var(Z̄) = (γ0/n) Σ_{k=−(n−1)}^{n−1} (1 − |k|/n) ρk

With iid data this reduces to

σ²Z̄ = Var(Z̄) = γ0/n = σ²/n

Thus if one incorrectly uses the iid formula with positively autocorrelated data, the variance is underestimated, potentially leading to false conclusions.

Module 3
Segment 2
Correlation and Autocorrelation

Correlation [From "Statistics 101"]

Consider random data y and x (e.g., sales and advertising):

x     y
x1    y1
x2    y2
.     .
xn    yn

ρx,y denotes the "population" correlation between all values of x and y in the population. To estimate ρx,y, we use the sample correlation

ρ̂x,y = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ) / √( Σ_{i=1}^{n} (xi − x̄)² Σ_{i=1}^{n} (yi − ȳ)² )    (Stat 101)

Note: −1 ≤ ρ̂x,y ≤ 1
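The estimators in the table above translate directly into code. A minimal pure-Python sketch (helper names are my own; the ten data values are the opening of the Wolfer sunspot series, with the values after the five shown on the slide recalled from memory, so treat them as illustrative):

```python
# Sample estimates of the stationary-process parameters defined above:
# zbar, gamma_hat(k) (autocovariance with divisor n), rho_hat(k), and
# the correlated-data variance of zbar.

def zbar(z):
    return sum(z) / len(z)

def gamma_hat(z, k):
    n, m = len(z), zbar(z)
    return sum((z[t] - m) * (z[t + k] - m) for t in range(n - k)) / n

def rho_hat(z, k):
    return gamma_hat(z, k) / gamma_hat(z, 0)

def var_zbar(z):
    # Var(zbar) = (gamma0/n) * sum_{k=-(n-1)}^{n-1} (1 - |k|/n) rho_k,
    # with process quantities replaced by their sample estimates.
    n = len(z)
    total = sum((1 - abs(k) / n) * rho_hat(z, abs(k))
                for k in range(-(n - 1), n))
    return gamma_hat(z, 0) / n * total

z = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125]
print(rho_hat(z, 0))  # 1.0
print(rho_hat(z, 1))
```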
Sample Autocorrelation

Consider the random time series realization z1, z2, ..., zn and its lagged variables:

t      zt      zt+1    zt+2    zt+3    zt+4
1      z1      z2      z3      z4      z5
2      z2      z3      z4      z5      z6
3      z3      z4      z5      z6      z7
..     ..      ..      ..      ..      ..
n−2    zn−2    zn−1    zn      –       –
n−1    zn−1    zn      –       –       –
n      zn      –       –       –       –

Assuming weak (covariance) stationarity, let ρk denote the process correlation between observations separated by k time periods. To compute the "order k" sample autocorrelation (i.e., the correlation between zt and zt+k):

ρ̂k = γ̂k/γ̂0 = Σ_{t=1}^{n−k} (zt − z̄)(zt+k − z̄) / Σ_{t=1}^{n} (zt − z̄)²,   k = 0, 1, 2, ...

Note: −1 ≤ ρ̂k ≤ 1

[Figure: lag-one scatter plot for the Wolfer Sunspot Numbers 1770-1869, spot.ts[-1] versus spot.ts[-100].]
Sample Autocorrelation (alternative formula)

Consider the random time series realization z1, z2, ..., zn and its lagged variables:

t    zt    zt−1    zt−2    zt−3    zt−4
1    z1    –       –       –       –
2    z2    z1      –       –       –
3    z3    z2      z1      –       –
4    z4    z3      z2      z1      –
..   ..    ..      ..      ..      ..
n    zn    zn−1    zn−2    zn−3    zn−4

Assuming weak (covariance) stationarity, let ρk denote the process correlation between observations separated by k time periods. To compute the "order k" sample autocorrelation (i.e., the correlation between zt and zt−k):

ρ̂k = γ̂k/γ̂0 = Σ_{t=k+1}^{n} (zt − z̄)(zt−k − z̄) / Σ_{t=1}^{n} (zt − z̄)²,   k = 0, 1, 2, ...

Note: −1 ≤ ρ̂k ≤ 1

Lagged Sunspot Data

t      Spot   Spot1   Spot2   Spot3   Spot4
1      101    –       –       –       –
2      82     101     –       –       –
3      66     82      101     –       –
4      35     66      82      101     –
5      31     35      66      82      101
.      .      .       .       .       .
99     37     7       16      30      47
100    74     37      7       16      30
101    –      74      37      7       16
102    –      –       74      37      7
103    –      –       –       74      37
104    –      –       –       –       74
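Both versions give the same ρ̂k: the second numerator is just the first reindexed (t → t + k). A quick pure-Python check (helper names are my own; the data are the first ten annual sunspot numbers, the last five recalled from memory):

```python
# The two sample-autocorrelation formulas (summing over t = 1..n-k with
# z_{t+k}, or over t = k+1..n with z_{t-k}) are the same sum reindexed.

def rho_forward(z, k):
    n = len(z)
    m = sum(z) / n
    num = sum((z[t] - m) * (z[t + k] - m) for t in range(n - k))
    den = sum((v - m) ** 2 for v in z)
    return num / den

def rho_backward(z, k):
    n = len(z)
    m = sum(z) / n
    num = sum((z[t] - m) * (z[t - k] - m) for t in range(k, n))
    den = sum((v - m) ** 2 for v in z)
    return num / den

z = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125]
print(all(abs(rho_forward(z, k) - rho_backward(z, k)) < 1e-12
          for k in range(5)))  # True
```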
Wolfer Sunspot Numbers
Correlation Between Observations Separated by One Time Period

[Figure: scatter plot of the series versus its first lag. Correlation = 0.81708; Autocorrelation = 0.806244.]

Wolfer Sunspot Numbers
Correlation Between Observations Separated by k Time Periods [show.acf(spot.ts)]

[Figure: matrix of scatter plots of the series versus its lags 1 through 13 (Series: timeseries).]

Wolfer Sunspot Numbers 1770-1869

[Figure: time series plot of the sunspot numbers against year number.]

Wolfer Sunspot Numbers Sample ACF Function

[Figure: sample ACF of spot.ts for lags 0 through 20 (Series: spot.ts).]

Data Analysis Strategy

Tentative Identification (Time Series Plot, Range-Mean Plot, ACF and PACF)
→ Estimation (Least Squares or Maximum Likelihood)
→ Diagnostic Checking (Residual Analysis and Forecasts)
→ Model ok? If No, return to Tentative Identification; if Yes, Use the Model for Forecasting, Explanation, and Control.

Module 3
Segment 3
Autoregression, Autoregressive Modeling, and Introduction to Partial Autocorrelation

Autoregressive (AR) Models

• AR(0): zt = µ + at (White noise or "Trivial" model)
• AR(1): zt = θ0 + φ1 zt−1 + at
• AR(2): zt = θ0 + φ1 zt−1 + φ2 zt−2 + at
• AR(3): zt = θ0 + φ1 zt−1 + φ2 zt−2 + φ3 zt−3 + at
• AR(p): zt = θ0 + φ1 zt−1 + φ2 zt−2 + · · · + φp zt−p + at

at ∼ nid(0, σ²a)

AR(1) Model for the Wolfer Sunspot Data

[Figure: data and 1-step ahead predictions; residual time series; residual ACF (Series: residuals(spot.ar1.fit)); and normal Q-Q plot of the Sunspot Residuals, AR(1) Model.]
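Per the data analysis strategy, the AR coefficients above can be estimated by least squares, regressing zt on its lags. A minimal pure-Python sketch for AR(1) (function names are my own; the notes fit these models with S-Plus objects such as spot.ar1.fit, and the data values after the slide's first five are recalled from memory):

```python
# Fit an AR(1) model z_t = theta0 + phi1 * z_{t-1} + a_t by ordinary
# least squares: regress z_t on z_{t-1}.

def fit_ar1(z):
    x = z[:-1]          # z_{t-1}
    y = z[1:]           # z_t
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    phi1 = sxy / sxx
    theta0 = ybar - phi1 * xbar
    return theta0, phi1

def predict_one_step(z, theta0, phi1):
    # 1-step ahead predictions zhat_t = theta0 + phi1 * z_{t-1}
    return [theta0 + phi1 * prev for prev in z[:-1]]

z = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125]
theta0, phi1 = fit_ar1(z)
preds = predict_one_step(z, theta0, phi1)
print(len(preds) == len(z) - 1)  # True
```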
AR(2) Model for the Wolfer Sunspot Data

[Figure: data and 1-step ahead predictions; residual time series; residual ACF (Series: residuals(spot.ar2.fit)); and normal Q-Q plot of the Sunspot Residuals, AR(2) Model.]

AR(3) Model for the Wolfer Sunspot Data

[Figure: data and 1-step ahead predictions; residual time series; residual ACF (Series: residuals(spot.ar3.fit)); and normal Q-Q plot of the Sunspot Residuals, AR(3) Model.]

AR(4) Model for the Wolfer Sunspot Data

[Figure: data and 1-step ahead predictions; residual time series; residual ACF (Series: residuals(spot.ar4.fit)); and normal Q-Q plot of the Sunspot Residuals, AR(4) Model.]
Sample Partial Autocorrelation

The "true partial autocorrelation function," denoted by φkk for k = 1, 2, ..., is the process correlation between observations separated by k time periods (i.e., between zt and zt+k) with the effect of the intermediate zt+1, ..., zt+k−1 removed.

We can estimate φkk, k = 1, 2, ..., with φ̂kk, k = 1, 2, ...:

• φ̂1,1 = φ̂1 from the AR(1) model
• φ̂2,2 = φ̂2 from the AR(2) model
• φ̂kk = φ̂k from the AR(k) model

Sample Partial Autocorrelation Function

The sample PACF φ̂kk can be computed from the sample ACF ρ̂1, ρ̂2, ... using the recursive formula from Durbin (1960):

φ̂1,1 = ρ̂1

φ̂kk = ( ρ̂k − Σ_{j=1}^{k−1} φ̂k−1,j ρ̂k−j ) / ( 1 − Σ_{j=1}^{k−1} φ̂k−1,j ρ̂j ),   k = 2, 3, ...

where

φ̂kj = φ̂k−1,j − φ̂kk φ̂k−1,k−j   (k = 3, 4, ...; j = 1, 2, ..., k − 1)

This sample PACF formula is based on the solution of the Yule-Walker equations (covered in Module 5). The Durbin formula gives somewhat different answers than OLS due to the basis of the estimator (ordinary least squares versus solution of the Yule-Walker equations).

Thus the PACF contains no new information, just a different way of looking at the information in the ACF.
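The Durbin recursion is easy to implement directly from the sample ACF. A pure-Python sketch (variable names are my own). For an AR(1) process, ρk = φ1^k and the PACF should cut off after lag 1:

```python
# Sample PACF via the Durbin (1960) recursion, starting from the
# sample ACF rho[1], rho[2], ...:
#   phi_{1,1} = rho_1,
#   phi_{k,k} = (rho_k - sum_j phi_{k-1,j} rho_{k-j})
#             / (1 - sum_j phi_{k-1,j} rho_j),
#   phi_{k,j} = phi_{k-1,j} - phi_{k,k} * phi_{k-1,k-j}.

def pacf_durbin(rho, kmax):
    # rho[k] is the (sample) autocorrelation at lag k, with rho[0] == 1
    phi = {(1, 1): rho[1]}
    for k in range(2, kmax + 1):
        num = rho[k] - sum(phi[(k - 1, j)] * rho[k - j] for j in range(1, k))
        den = 1 - sum(phi[(k - 1, j)] * rho[j] for j in range(1, k))
        phi[(k, k)] = num / den
        for j in range(1, k):
            phi[(k, j)] = phi[(k - 1, j)] - phi[(k, k)] * phi[(k - 1, k - j)]
    return [phi[(k, k)] for k in range(1, kmax + 1)]

phi1 = 0.8
rho = [phi1 ** k for k in range(6)]   # AR(1) autocorrelations
print(pacf_durbin(rho, 5))  # approximately [0.8, 0.0, 0.0, 0.0, 0.0]
```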
Summary of Sunspot Autoregressions

                      Regression Output                   PACF
Order p   Model     φ̂p       tφ̂       S        R²       φ̂pp
1         AR(1)     .810     13.96    21.53    .6676    .8062
2         AR(2)    −.711     −9.81    15.32    .8336   −.6341
3         AR(3)     .208      2.04    15.12    .8407    .0805
4         AR(4)    −.147     −1.41    15.01    .8463   −.0611

φ̂p is from ordinary least squares (OLS). φ̂pp is from the Durbin formula.
Module 3
Segment 4
Introduction to ARMA Models and Time Series Modeling

Graphical Output from RTSERIES Function
iden(spot.tsd)

[Figure: identification plots for the Wolfer Sunspots (w = Number of Spots): time series plot (1780-1860), range-mean plot, sample ACF, and sample PACF.]

Standard Errors for Sample Autocorrelations and Sample Partial Autocorrelations

• Sample ACF standard error:

  Sρ̂k = √[ (1 + 2ρ̂1² + · · · + 2ρ̂²k−1) / n ]

  Also can compute the "t-like" statistics t = ρ̂k / Sρ̂k.

• Sample PACF standard error: Sφ̂kk = 1/√n. Use to compute the "t-like" statistics t = φ̂kk / Sφ̂kk and set decision bounds.

• In long realizations from a stationary process, if the true parameter is really 0, then the distribution of a "t-like" statistic can be approximated by N(0, 1).

• Values of ρ̂k and φ̂kk may be judged to be different from zero if the "t-like" statistics are outside specified limits (±2 is often suggested; might use ±1.5 as a "warning").
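The standard errors and "t-like" statistics above can be sketched as follows (pure Python, using divisor-n autocovariances as in the notes; function names are my own, and the data values after the slide's first five sunspot numbers are recalled from memory):

```python
# "t-like" statistics for the sample ACF:
#   S_rho_k = sqrt((1 + 2*rho_1^2 + ... + 2*rho_{k-1}^2) / n),
#   t_k     = rho_hat_k / S_rho_k.
# Lags with |t| > 2 may be judged different from zero.
import math

def acf(z, kmax):
    n = len(z)
    m = sum(z) / n
    g0 = sum((v - m) ** 2 for v in z) / n
    return [sum((z[t] - m) * (z[t + k] - m) for t in range(n - k)) / n / g0
            for k in range(1, kmax + 1)]

def acf_t_stats(z, kmax):
    n = len(z)
    rho = acf(z, kmax)
    stats = []
    for k in range(1, kmax + 1):
        se = math.sqrt((1 + 2 * sum(r ** 2 for r in rho[:k - 1])) / n)
        stats.append(rho[k - 1] / se)
    return stats

z = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125]
flagged = [k + 1 for k, t in enumerate(acf_t_stats(z, 5)) if abs(t) > 2]
print(flagged)
```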
Autoregressive-Moving Average (ARMA) Models
(a.k.a. Box-Jenkins Models)

• AR(0): zt = µ + at (White noise or "Trivial" model)
• AR(1): zt = θ0 + φ1 zt−1 + at
• AR(2): zt = θ0 + φ1 zt−1 + φ2 zt−2 + at
• AR(p): zt = θ0 + φ1 zt−1 + · · · + φp zt−p + at
• MA(1): zt = θ0 − θ1 at−1 + at
• MA(2): zt = θ0 − θ1 at−1 − θ2 at−2 + at
• MA(q): zt = θ0 − θ1 at−1 − · · · − θq at−q + at
• ARMA(1, 1): zt = θ0 + φ1 zt−1 − θ1 at−1 + at
• ARMA(p, q): zt = θ0 + φ1 zt−1 + · · · + φp zt−p − θ1 at−1 − · · · − θq at−q + at

at ∼ nid(0, σ²a)
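A realization from one of these models can be simulated directly from its equation. A pure-Python sketch for ARMA(1, 1) (the function name and parameter values are my own, arbitrary choices):

```python
# Simulate z_t = theta0 + phi1 * z_{t-1} - theta1 * a_{t-1} + a_t
# with a_t ~ nid(0, sigma_a^2), per the ARMA(1, 1) model above.
import random

def simulate_arma11(n, theta0, phi1, theta1, sigma_a, seed=451):
    rng = random.Random(seed)
    z = []
    a_prev = 0.0
    z_prev = theta0 / (1 - phi1)  # start at the process mean
    for _ in range(n):
        a = rng.gauss(0.0, sigma_a)
        z_t = theta0 + phi1 * z_prev - theta1 * a_prev + a
        z.append(z_t)
        z_prev, a_prev = z_t, a
    return z

z = simulate_arma11(200, theta0=10.0, phi1=0.7, theta1=0.4, sigma_a=1.0)
print(len(z))  # 200
```

The sample ACF of such a realization should die down geometrically, while the sample PACF also dies down, matching the ARMA cell of the identification table below.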
Tabular Output from RTSERIES Function
iden(spot.tsd)

Identification Output for Wolfer Sunspots
w= Number of Spots
[1] "Standard deviation of the working series= 37.36504"

ACF
  Lag        ACF           se         t-ratio
  1     0.806243956   0.1000000    8.06243956
  2     0.428105325   0.1516594    2.82280693
  3     0.069611110   0.1632975    0.42628402
  .          .             .            .

Partial ACF
  Lag   Partial ACF       se         t-ratio
  1     0.806243896      0.1        8.06243896
  2    -0.634121358      0.1       -6.34121358
Drawing Conclusions from Sample ACF's and PACF's

                            ACF Dies Down    ACF Cutoff after q ≥ 1
PACF Cutoff after p ≥ 1     AR(p)            Impossible
PACF Dies Down              ARMA             MA(q)

Technical details, background, and numerous examples will be presented in Modules 4 and 5.
Comments on Drawing Conclusions from Sample ACF's and PACF's

The "t-like" statistics should be used only as guidelines in model building because:

• An ARMA model is only an approximation to some true process.

• Sampling distributions are complicated; the "t-like" statistics do not really follow a standard normal distribution (only approximately, in large samples, with a stationary process).

• There are correlations among the ρ̂k values for different k (e.g., ρ̂1 and ρ̂2 may be correlated).

• There are problems relating to simultaneous inference (when looking at many different statistics simultaneously, confidence/significance levels have little meaning).