Time Series Karanveer Mohan Keegan Go EE103 Stanford University November 12, 2015 Stephen Boyd Outline Introduction Linear operations Least-squares Prediction Introduction 2 Time series data I represent time series x1 , . . . , xT as T -vector x I xt is value of some quantity at time (period, epoch) t, t = 1, . . . , T examples: I – – – – – I average temperature at some location on day t closing price of some stock on (trading) day t hourly number of users on a website altitude of an airplane every 10 seconds enrollment in a class every quarter vector time series: xt is an n-vector; can represent as T × n matrix Introduction 3 Types of time series time series can be I smoothly varying or more wiggly and random I roughly periodic (e.g., hourly temperature) I growing or shrinking (or both) random but roughly continuous (these are vague labels) I Introduction 4 Melbourne temperature I daily measurements, for 10 years I you can see seasonal (yearly) periodicity 30 25 20 15 10 5 0 −5 0 Introduction 500 1000 1500 2000 2500 3000 3500 4000 5 Melbourne temperature I zoomed to one year 25 20 15 10 5 0 50 Introduction 100 150 200 250 300 350 6 Apple stock price I log10 of Apple daily share price, over 30 years, 250 trading days/year I you can see (not steady) growth 2.5 2 1.5 1 0.5 0 −0.5 −1 0 Introduction 1000 2000 3000 4000 5000 6000 7000 8000 9000 7 Log price of Apple I zoomed to one year 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 0.35 6000 Introduction 6050 6100 6150 6200 6250 8 Electricity usage in (one region of) Texas I total in 15 minute intervals, over 1 year I you can see variation over year 10 9 8 7 6 5 4 3 2 1 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4 x 10 Introduction 9 Electricity usage in (one region of) Texas I zoomed to 1 month I you can see daily periodicity and weekend/weekday variation 9 8 7 6 5 4 3 2 1 500 Introduction 1000 1500 2000 2500 10 Outline Introduction Linear operations Least-squares Prediction Linear operations 11 Down-sampling I k× down-sampled time series selects every kth entry of x I can be written as y = Ax I for 2× down-sampling, 1 0 0 0 0 0 A= . . .. .. 0 0 0 0 Linear operations T even, 0 0 0 .. . ··· ··· ··· 0 0 0 .. . 0 0 0 .. . 0 0 0 .. . 0 0 0 0 0 0 0 0 ··· ··· 1 0 0 0 0 1 0 1 0 .. . 0 0 0 .. . 0 0 1 .. . 0 0 0 .. . 0 0 12 Down-sampling on Apple log price 4× down-sample 0.17 0.16 0.15 0.14 0.13 0.12 0.11 0.1 0.09 5100 Linear operations 5102 5104 5106 5108 5110 5112 5114 5116 5118 5120 13 Up-sampling I k× (linear) up-sampling interpolates between entries of x I can be written as y = Ax I for 2× up-sampling 1 1/2 A= Linear operations 1/2 1 1/2 1/2 1 .. . 1 1/2 1/2 1 14 Up-sampling on Apple log price 4× up-sample 0.17 0.16 0.15 0.14 0.13 0.12 0.11 0.1 0.09 5100 Linear operations 5102 5104 5106 5108 5110 5112 5114 5116 5118 5120 15 Smoothing I k-long moving average y of x is given by yi = I 1 (xi + xi+1 + · · · + xi+k−1 ), k can express as y = Ax, 1/3 1/3 1/3 A= i = 1, . . . , T − k + 1 e.g., for k = 3, 1/3 1/3 1/3 1/3 1/3 1/3 .. . 1/3 I 1/3 1/3 can also have trailing or centered smoothing Linear operations 16 Melbourne daily temperature smoothed I centered smoothing with window size 41 30 25 20 15 10 5 0 −5 0 Linear operations 500 1000 1500 2000 2500 3000 3500 4000 17 First-order differences I (first-order) difference between adjacent entries I discrete analog of derivative I express as y = Dx, D is the (T − 1) × T difference matrix −1 1 ... −1 1 . . . D= . . .. .. . . . −1 1 I kDxk2 is a measure of the wiggliness of x kDxk2 = (x2 − x1 )2 + · · · + (xT − xT −1 )2 Linear operations 18 Outline Introduction Linear operations Least-squares Prediction Least-squares 19 De-meaning I de-meaning a time series means subtracting its mean: x̃ = x − avg(x) I rms(x̃) = std(x) I this is the least-squares fit with a constant Least-squares 20 Straight-line fit and de-trending I fit data (1, x1 ), . . . , (T, xT ) with affine model xt ≈ a + bt (also called straight-line fit) I b is called the trend I a + bt is called the trend line I de-trending a time series means subtracting its straight-line fit I de-trended time series shows variations above and below the straight-line fit Least-squares 21 Straight-line fit on Apple log price Trend 2.5 2 1.5 1 0.5 0 −0.5 −1 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 6000 7000 8000 9000 Residual 1 0.5 0 −0.5 −1 0 Least-squares 1000 2000 3000 4000 5000 22 Periodic time series I let P -vector z be one period of periodic time series xper = (z, z, . . . , z) (we assume T is a multiple of P ) I express as xper = Az with IP A = ... IP Least-squares 23 Extracting a periodic component I given (non-periodic) time series x, choose z to minimize kx − Azk2 I gives best least-squares fit with periodic time series I simple solution: average periods of original: ẑ = (1/k)AT x, I k = T /P e.g., to get ẑ for January 9, average all xi ’s with date January 9 Least-squares 24 Periodic component of Melbourne temperature 30 25 20 15 10 5 0 −5 0 Least-squares 500 1000 1500 2000 2500 3000 3500 4000 25 Extracting a periodic component with smoothing I can add smoothing to periodic fit by minimizing kx − Azk2 + λkDzk2 I λ > 0 is smoothing parameter I D is P × P circular difference matrix −1 1 −1 1 .. .. D= . . 1 I −1 1 −1 λ is chosen visually or by validation Least-squares 26 Choosing smoothing via validation I split data into train and test sets, e.g., test set is last period (P entries) I train model on train set, and test on the test set I choose λ to (approximately) minimize error on the test set Least-squares 27 Validation of smoothing for Melbourne temperature trained on first 8 years; tested on last two years 3 2.95 2.9 2.85 2.8 2.75 2.7 2.65 2.6 −2 10 Least-squares −1 10 0 10 1 10 2 10 28 Periodic component of temperature with smoothing I zoomed on test set, using λ = 30 25 20 15 10 5 0 2900 Least-squares 3000 3100 3200 3300 3400 3500 3600 29 Outline Introduction Linear operations Least-squares Prediction Prediction 30 Prediction I goal: predict or guess xt+K given x1 , . . . , xt I K = 1 is one-step-ahead prediction I prediction is often denoted x̂t+K , or more explicitly x̂(t+K|t) (estimate of xt+K at time t) I x̂t+K − xt+K is prediction error applications: predict I – – – – – Prediction asset price product demand electricity usage economic activity position of vehicle 31 Some simple predictors I constant: x̂t+K = a I current value: x̂t+K = xt I linear (affine) extrapolation from last two values: x̂t+K = xt + K(xt − xt−1 ) I average to date: x̂t+K = avg(x1:t ) I (M + 1)-period rolling average: x̂t+K = avg(x(t−M ):t ) I straight-line fit to date (i.e., based on x1:t ) Prediction 32 Auto-regressive predictor I auto-regressive predictor: x̂t+K = (xt , xt−1 , . . . , x(t−M ) )T β + v – M is memory length – (M + 1)-vector β gives predictor weights; v is offset I I prediction x̂t+K is affine function of past window xt−M :t (which of the simple predictors above have this form?) Prediction 33 Least-squares fitting of auto-regressive models I choose coefficients β, offset v via least-squares (regression) I regressors are (M + 1)-vectors x1:(M +1) , . . . , x(N −M ):N I outcomes are numbers x̂M +K+1 , . . . , x̂N +K I can add regularization on β Prediction 34 Evaluating predictions with validation I I for simple methods: evaluate RMS prediction error for more sophisticated methods: – split data into a training set and a test set (usually sequential) – train prediction on training data – test on test data Prediction 35 Example I predict Texas energy usage one step ahead (K = 1) I train on first 10 months, test on last 2 Prediction 36 Coefficients I I using M = 100 0 is the coefficient for today 2 1.5 1 0.5 0 −0.5 −1 0 Prediction 10 20 30 40 50 60 70 80 90 100 37 Auto-regressive prediction results 9 8 7 6 5 4 3 2 1 3 3.1 3.2 3.3 3.4 3.5 4 x 10 Prediction 38 Auto-regressive prediction results showing the residual 0.8 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 −0.8 −1 3 3.1 3.2 3.3 3.4 3.5 4 x 10 Prediction 39 Auto-regressive prediction results predictor average (constant) current value auto-regressive (M = 10) auto-regressive (M = 100) Prediction RMS error 1.20 0.119 0.073 0.051 40 Autoregressive model on residuals I fit a model to the time series, e.g., linear or periodic subtract this model from the original signal to compute residuals I apply auto-regressive model to predict residuals I can add predicted residuals back to model to obtain predictions I Prediction 41 Example I I Melbourne temperature data residuals zoomed on 100 days in test set 8 6 4 2 0 −2 −4 −6 −8 3000 Prediction 3010 3020 3030 3040 3050 3060 3070 3080 3090 3100 42 Auto-regressive prediction of residuals 10 8 6 4 2 0 −2 −4 −6 −8 −10 3000 Prediction 3010 3020 3030 3040 3050 3060 3070 3080 3090 3100 43 Prediction results for Melbourne temperature I tested on last two years predictor average current value periodic (no smoothing) periodic (smoothing, λ = 30) auto-regressive (M = 3) auto-regressive (M = 20) auto-regressive on residual (M = 20) Prediction RMS error 4.12 2.57 2.71 2.62 2.44 2.27 2.22 44