Time Series

Karanveer Mohan
Keegan Go
Stephen Boyd

EE103
Stanford University
November 15, 2016
Outline
Introduction
Linear operations
Least-squares
Prediction
Time series data
- represent a time series x_1, ..., x_T as the T-vector x
- x_t is the value of some quantity at time (period, epoch) t, t = 1, ..., T
- examples:
  – average temperature at some location on day t
  – closing price of some stock on (trading) day t
  – hourly number of users on a website
  – altitude of an airplane every 10 seconds
  – enrollment in a class every quarter
- vector time series: x_t is an n-vector; can represent as a T × n matrix
Types of time series
time series can be
- smoothly varying, or more wiggly and random
- roughly periodic (e.g., hourly temperature)
- growing or shrinking (or both)
- random but roughly continuous
(these are vague labels)
Melbourne temperature
- daily measurements, for 10 years
- you can see the seasonal (yearly) periodicity

[plot: daily temperature vs. day index, days 0–4000]
Melbourne temperature
- zoomed to one year

[plot: daily temperature vs. day index, over one year]
Apple stock price
- log10 of Apple's daily share price, over 30 years, 250 trading days/year
- you can see (not steady) growth

[plot: log10 share price vs. trading-day index]
Log price of Apple
- zoomed to one year

[plot: log10 share price vs. trading-day index, days 6000–6250]
Electricity usage in (one region of) Texas
- total usage in 15-minute intervals, over 1 year
- you can see the variation over the year

[plot: electricity usage vs. 15-minute interval index, over one year]
Electricity usage in (one region of) Texas
- zoomed to 1 month
- you can see daily periodicity and weekend/weekday variation

[plot: electricity usage vs. 15-minute interval index, over one month]
Outline
Introduction
Linear operations
Least-squares
Prediction
Down-sampling
- k× down-sampled time series selects every kth entry of x
- can be written as y = Ax; for 2× down-sampling, T even,

          [ 1  0  0  0  0  0  ...  0  0  0  0 ]
          [ 0  0  1  0  0  0  ...  0  0  0  0 ]
      A = [ 0  0  0  0  1  0  ...  0  0  0  0 ]
          [ :  :  :  :  :  :       :  :  :  : ]
          [ 0  0  0  0  0  0  ...  1  0  0  0 ]
          [ 0  0  0  0  0  0  ...  0  0  1  0 ]

- alternative: average consecutive k-long blocks of x (both variants are sketched below)
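A minimal NumPy sketch of both down-sampling variants (selecting every kth entry, and averaging consecutive k-long blocks); the helper names are illustrative, not from the course code:

    import numpy as np

    def downsample_select(x, k):
        """k-times down-sample by keeping every kth entry (x[0], x[k], ...)."""
        return x[::k]

    def downsample_average(x, k):
        """k-times down-sample by averaging consecutive k-long blocks.
        Assumes len(x) is a multiple of k."""
        return x.reshape(-1, k).mean(axis=1)

    x = np.arange(1.0, 9.0)           # T = 8
    print(downsample_select(x, 2))    # [1. 3. 5. 7.]
    print(downsample_average(x, 2))   # [1.5 3.5 5.5 7.5]

In practice the selection matrix A is rarely formed explicitly; slicing applies the same linear map.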
Up-sampling
- k× (linear) up-sampling interpolates between entries of x
- can be written as y = Ax
- for 2× up-sampling,

          [  1                     ]
          [ 1/2  1/2               ]
          [       1                ]
          [      1/2  1/2          ]
      A = [            1           ]
          [              .         ]
          [               .        ]
          [                 1      ]
          [               1/2  1/2 ]
          [                     1  ]

  (rows alternate between copying an entry of x and averaging two adjacent entries; a sketch follows below)
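A sketch of k× linear up-sampling using np.interp; for k = 2 this reproduces the action of the matrix A above (original samples copied, midpoints averaged). The helper name is an assumption:

    import numpy as np

    def upsample_linear(x, k):
        """k-times linear up-sampling: keeps the original samples and puts
        k - 1 linearly interpolated points between each adjacent pair.
        Output length is k*(T - 1) + 1 for an input of length T."""
        T = len(x)
        t_fine = np.linspace(0, T - 1, k * (T - 1) + 1)
        return np.interp(t_fine, np.arange(T), x)

    x = np.array([1.0, 3.0, 2.0])
    print(upsample_linear(x, 2))      # [1.  2.  3.  2.5 2. ]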
Up-sampling on Apple log price
- 4× up-sample

[plot: up-sampled log price vs. trading-day index, days 5100–5120]
Smoothing
- the k-long moving average y of x is given by

      y_i = (1/k) (x_i + x_{i+1} + ... + x_{i+k-1}),   i = 1, ..., T - k + 1

- can express as y = Ax, e.g., for k = 3,

          [ 1/3  1/3  1/3                     ]
          [      1/3  1/3  1/3                ]
      A = [           1/3  1/3  1/3           ]
          [                  ...              ]
          [                1/3  1/3  1/3      ]
          [                     1/3  1/3  1/3 ]

- can also have trailing or centered smoothing (a sketch follows below)
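A short sketch of the k-long moving average as a convolution with a length-k averaging kernel (one possible implementation, not the course code):

    import numpy as np

    def moving_average(x, k):
        """k-long moving average: y[i] = mean(x[i:i+k]), i = 0, ..., T-k.
        Output has length T - k + 1."""
        kernel = np.ones(k) / k
        return np.convolve(x, kernel, mode="valid")

    x = np.array([1.0, 2.0, 6.0, 2.0, 1.0, 3.0])
    print(moving_average(x, 3))       # [3.   3.3333...   3.   2.]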
Melbourne daily temperature smoothed
- centered smoothing with window size 41

[plot: smoothed daily temperature vs. day index, days 0–4000]
First-order differences
- (first-order) difference between adjacent entries
- discrete analog of the derivative
- express as y = Dx, where D is the (T − 1) × T difference matrix

          [ -1   1              ]
      D = [     -1   1          ]
          [          .    .     ]
          [           .    .    ]
          [             -1   1  ]

- ||Dx||^2 (the Laplacian) is a measure of the wiggliness of x:

      ||Dx||^2 = (x_2 − x_1)^2 + ... + (x_T − x_{T−1})^2
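A small sketch computing Dx and the wiggliness measure ||Dx||^2 with np.diff (the helper name is illustrative):

    import numpy as np

    def wiggliness(x):
        """||Dx||^2: the sum of squared first-order differences of x."""
        d = np.diff(x)            # Dx, the (T-1)-vector of adjacent differences
        return np.sum(d ** 2)

    x = np.array([0.0, 1.0, 0.0, 1.0])    # wiggly: ||Dx||^2 = 3
    y = np.array([0.0, 0.5, 1.0, 1.5])    # smooth ramp: ||Dx||^2 = 0.75
    print(wiggliness(x), wiggliness(y))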
Outline
Introduction
Linear operations
Least-squares
Prediction
De-meaning
- de-meaning a time series means subtracting its mean: x̃ = x − avg(x)
- rms(x̃) = std(x)
- avg(x) is the least-squares fit of x by a constant, so x̃ is the residual of that fit
Straight-line fit and de-trending
- fit the data (1, x_1), ..., (T, x_T) with the affine model x_t ≈ a + bt (also called a straight-line fit)
- b is called the trend
- a + bt is called the trend line
- de-trending a time series means subtracting its straight-line fit
- the de-trended time series shows variations above and below the straight-line fit (a sketch follows below)
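A sketch of the straight-line fit and de-trending via least squares (np.linalg.lstsq); dropping the time column gives the constant fit used for de-meaning. The helper name and the toy data are assumptions:

    import numpy as np

    def detrend(x):
        """Least-squares straight-line fit x_t ~ a + b*t and its residual.
        Returns (a, b, detrended) where detrended = x - (a + b*t)."""
        T = len(x)
        t = np.arange(1, T + 1)
        A = np.column_stack([np.ones(T), t])         # columns: constant, time
        (a, b), *_ = np.linalg.lstsq(A, x, rcond=None)
        return a, b, x - (a + b * t)

    # toy data: a noisy upward trend
    rng = np.random.default_rng(0)
    t = np.arange(1, 101)
    x = 0.5 + 0.03 * t + 0.1 * rng.standard_normal(100)
    a, b, x_detrended = detrend(x)
    print(round(a, 2), round(b, 3))                  # roughly 0.5 and 0.03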
Straight-line fit on Apple log price
[plot "Trend": log price and its straight-line fit vs. trading-day index]

[plot "Residual": de-trended log price vs. trading-day index]
Periodic time series
- let the P-vector z be one period of the periodic time series x^per = (z, z, ..., z) (we assume T is a multiple of P)
- express as x^per = Az with

          [ I_P ]
      A = [  :  ]
          [ I_P ]

  (T/P copies of the P × P identity matrix, stacked)
Extracting a periodic component
- given a (non-periodic) time series x, choose z to minimize ||x − Az||^2
- gives the best least-squares fit by a periodic time series
- simple solution: average the periods of the original (sketched below),

      ẑ = (1/k) A^T x,    k = T/P

- e.g., to get the entry of ẑ for January 9, average all x_i's with date January 9
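A sketch of the averaging formula ẑ = (1/k) A^T x: reshape x into k rows of length P and average down the columns. It assumes T is a multiple of P; the helper name is illustrative:

    import numpy as np

    def periodic_component(x, P):
        """Best least-squares periodic fit: average the k = T/P periods.
        Returns the P-vector z-hat."""
        return x.reshape(-1, P).mean(axis=0)

    # toy example: T = 12, P = 4
    x = np.array([1.0, 2, 3, 4,  1.2, 2.1, 2.9, 4.1,  0.8, 1.9, 3.1, 3.9])
    z_hat = periodic_component(x, 4)
    x_per = np.tile(z_hat, len(x) // 4)          # the fitted periodic series A z-hat
    print(z_hat)                                 # [1. 2. 3. 4.]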
Periodic component of Melbourne temperature
[plot: periodic component of daily temperature vs. day index, days 0–4000]
Extracting a periodic component with smoothing
- can add smoothing to the periodic fit by minimizing

      ||x − Az||^2 + λ ||Dz||^2

- λ > 0 is the smoothing parameter
- D is the P × P circular difference matrix

          [ -1   1              ]
          [     -1   1          ]
      D = [          .    .     ]
          [           .    .    ]
          [             -1   1  ]
          [  1              -1  ]

- λ is chosen visually or by validation (a sketch of the fit follows below)
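A sketch that solves the smoothed periodic fit as one stacked least-squares problem, minimizing ||x − Az||^2 + λ||Dz||^2; A and the circular D are built as on this slide, but the helper name and the toy usage are assumptions:

    import numpy as np

    def smoothed_periodic_fit(x, P, lam):
        """Minimize ||x - A z||^2 + lam * ||D z||^2, where A stacks T/P copies
        of the P x P identity and D is the P x P circular difference matrix."""
        T = len(x)
        k = T // P
        A = np.tile(np.eye(P), (k, 1))                      # T x P
        D = -np.eye(P) + np.roll(np.eye(P), 1, axis=1)      # circular differences
        A_stack = np.vstack([A, np.sqrt(lam) * D])
        b_stack = np.concatenate([x, np.zeros(P)])
        z, *_ = np.linalg.lstsq(A_stack, b_stack, rcond=None)
        return z

    # toy usage: noisy length-12 series with period 4
    rng = np.random.default_rng(1)
    x = np.tile([1.0, 2.0, 3.0, 2.0], 3) + 0.1 * rng.standard_normal(12)
    print(smoothed_periodic_fit(x, P=4, lam=1.0))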
Choosing smoothing via validation
- split the data into train and test sets, e.g., the test set is the last period (P entries)
- train the model on the train set, and test on the test set
- choose λ to (approximately) minimize the error on the test set (a sweep is sketched below)
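A sketch of this validation procedure, sweeping λ over a grid and measuring RMS error on a held-out final period; it reuses the smoothed_periodic_fit helper from the previous sketch and assumes the training length is still a multiple of P:

    import numpy as np

    def validate_lambda(x, P, lambdas, n_test_periods=1):
        """For each lambda, fit the smoothed periodic model on all but the last
        n_test_periods periods and compute the RMS error on the held-out part.
        Returns the best lambda and the list of test errors."""
        split = len(x) - n_test_periods * P
        x_train, x_test = x[:split], x[split:]
        errors = []
        for lam in lambdas:
            z = smoothed_periodic_fit(x_train, P, lam)
            pred = np.tile(z, n_test_periods)
            errors.append(np.sqrt(np.mean((pred - x_test) ** 2)))
        return lambdas[int(np.argmin(errors))], errors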
Validation of smoothing for Melbourne temperature
trained on first 8 years; tested on last two years
[plot: test RMS error vs. λ, with λ on a logarithmic scale from 10^−2 to 10^2]
Periodic component of temperature with smoothing
- zoomed on the test set, using λ = 30

[plot: smoothed periodic component of temperature on the test set, days 2900–3600]
Outline
Introduction
Linear operations
Least-squares
Prediction
Prediction
- goal: predict or guess x_{t+K} given x_1, ..., x_t
- K = 1 is one-step-ahead prediction
- the prediction is often denoted x̂_{t+K}, or more explicitly x̂(t+K|t) (the estimate of x_{t+K} made at time t)
- x̂_{t+K} − x_{t+K} is the prediction error
- applications: predict
  – asset price
  – product demand
  – electricity usage
  – economic activity
  – position of a vehicle
Some simple predictors
- constant: x̂_{t+K} = a
- current value: x̂_{t+K} = x_t
- linear (affine) extrapolation from the last two values: x̂_{t+K} = x_t + K(x_t − x_{t−1})
- average to date: x̂_{t+K} = avg(x_{1:t})
- (M + 1)-period rolling average: x̂_{t+K} = avg(x_{(t−M):t})
- straight-line fit to date (i.e., based on x_{1:t})

(a few of these are written out as code below)
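A few of the simple predictors as code, using a 0-indexed history x[0], ..., x[t]; the function names are illustrative:

    import numpy as np

    def predict_current_value(x, t, K):
        """x-hat_{t+K} = x_t."""
        return x[t]

    def predict_extrapolate(x, t, K):
        """Affine extrapolation from the last two values."""
        return x[t] + K * (x[t] - x[t - 1])

    def predict_rolling_average(x, t, K, M):
        """(M+1)-period rolling average of the most recent values."""
        return np.mean(x[t - M : t + 1])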
Auto-regressive predictor
- auto-regressive predictor:

      x̂_{t+K} = (x_t, x_{t−1}, ..., x_{t−M})^T β

  – M is the memory length
  – the (M + 1)-vector β gives the predictor weights
  – can add an offset v to x̂_{t+K}
- the prediction x̂_{t+K} is a linear function of the past window x_{(t−M):t}
- (which of the simple predictors above have this form?)
Least squares fitting of auto-regressive models
- choose the coefficients β via least squares (regression)
- regressors are the (M + 1)-vectors

      x_{1:(M+1)}, ..., x_{(N−M):N}

- outcomes are the numbers

      x_{M+K+1}, ..., x_{N+K}

- can add regularization on β (a fitting sketch follows below)
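A sketch of fitting an auto-regressive predictor (with an offset v) by least squares, and of using it for K-step-ahead prediction; the helper names, the 0-indexing, and the inclusion of the offset are illustrative choices:

    import numpy as np

    def fit_ar(x_train, M, K=1):
        """Fit x[t+K] ~ v + beta . (x[t], x[t-1], ..., x[t-M]) by least squares.
        Returns the offset v and the (M+1)-vector of weights beta."""
        N = len(x_train)
        rows, targets = [], []
        for t in range(M, N - K):
            window = x_train[t - M : t + 1][::-1]      # (x_t, x_{t-1}, ..., x_{t-M})
            rows.append(np.concatenate([[1.0], window]))
            targets.append(x_train[t + K])
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
        return theta[0], theta[1:]

    def predict_ar(x, t, v, beta, M):
        """K-step-ahead prediction from the history ending at index t."""
        window = x[t - M : t + 1][::-1]
        return v + beta @ window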
Evaluating predictions with validation
- for simple methods: evaluate the RMS prediction error
- for more sophisticated methods:
  – split the data into a training set and a test set (usually sequential)
  – train the predictor on the training data
  – test on the test data
Example
- predict Texas energy usage one step ahead (K = 1)
- train on the first 10 months, test on the last 2
Coefficients
- using M = 100
- coefficient 0 is the coefficient for today (it multiplies the current value x_t)

[plot: auto-regressive coefficients vs. index, indices 0–100]
Auto-regressive prediction results
[plot: one-step-ahead predictions of electricity usage over the test set]
Auto-regressive prediction results
showing the residual
[plot: prediction residual over the test set]
Auto-regressive prediction results
predictor                     RMS error
average (constant)            1.20
current value                 0.119
auto-regressive (M = 10)      0.073
auto-regressive (M = 100)     0.051
Autoregressive model on residuals
- fit a model to the time series, e.g., linear or periodic
- subtract this model from the original signal to compute the residuals
- apply an auto-regressive model to predict the residuals
- add the predicted residuals back to the model to obtain predictions (see the sketch below)
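A sketch combining the two steps, reusing the periodic_component and fit_ar helpers from the earlier sketches; aligning the period index with t (i.e., assuming the series starts at the beginning of a period) is an assumption of this toy setup:

    import numpy as np

    def residual_ar_predictor(x_train, P, M, K=1):
        """Fit a periodic baseline, then an AR model on its residuals.
        Returns a function predicting x[t+K] from the history x[0..t]."""
        z = periodic_component(x_train, P)             # periodic baseline (P-vector)
        baseline = np.tile(z, len(x_train) // P)
        v, beta = fit_ar(x_train - baseline, M, K)     # AR model on the residuals

        def predict(x, t):
            idx = np.arange(t - M, t + 1)
            r = x[idx] - z[idx % P]                    # recent residuals
            r_hat = v + beta @ r[::-1]                 # predicted residual at t+K
            return z[(t + K) % P] + r_hat              # add the baseline back
        return predict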
Example
- Melbourne temperature data residuals
- zoomed on 100 days in the test set

[plot: temperature residuals vs. day index, days 3000–3100]
Auto-regressive prediction of residuals
[plot: residuals and their auto-regressive predictions, days 3000–3100]
Prediction results for Melbourne temperature
- tested on the last two years

predictor                               RMS error
average                                 4.12
current value                           2.57
periodic (no smoothing)                 2.71
periodic (smoothing, λ = 30)            2.62
auto-regressive (M = 3)                 2.44
auto-regressive (M = 20)                2.27
auto-regressive on residual (M = 20)    2.22