School of Electrical Engineering and Computer Science
Topics in Machine Learning, Fall 2011
Time Series Analysis

Time Series Discussions
• Overview
• Basic definitions
• Time domain
• Forecasting
• Frequency domain
• State space

Why Time Series Analysis?
• Sometimes the concept we want to learn is the relationship between points in time

What is a time series?
• A sequence of measurements over time
• A sequence of random variables $x_1, x_2, x_3, \ldots$

Time Series Examples
• Finance, social science, epidemiology, medicine, meteorology, speech, geophysics, seismology, robotics

Three Approaches
• Time domain approach
  – Analyze the dependence of the current value on past values
• Frequency domain approach
  – Analyze periodic sinusoidal variation
• State space models
  – Represent the state as a collection of variable values
  – Model the transitions between states

Sample Time Series Data
[figures: Johnson & Johnson quarterly earnings/share, 1960-1980; yearly average global temperature deviations; speech recording of "aaa…hhh", sampled at 10k points per second; NYSE daily weighted market returns]
• Not all time data exhibit strong patterns (LA annual rainfall), while in others the pattern is readily apparent (Canadian hare counts)

Basic Definitions

Definitions
• Mean and variance [figure: a series annotated with its mean and variance]
• Covariance
  $\mathrm{Cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$
• Correlation
  $\mathrm{Cor}(X, Y) = r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$

Correlation
[figure: scatterplots illustrating r = +1, r = 0, r = +0.3, r = -1, r = -0.6, and r = 0]

Redefined for Time
• Mean function
  $\mu_X(t) = E(X_t)$ for $t = 0, \pm 1, \pm 2, \ldots$  (ergodic?)
• Autocovariance at lag $h$
  $\gamma_X(h) = \mathrm{Cov}(X_{t+h}, X_t)$
• Autocorrelation
  $\rho_X(h) = \frac{\gamma_X(h)}{\gamma_X(0)} = \mathrm{Cor}(X_{t+h}, X_t)$

Autocorrelation Examples
[figure: a series with positive autocorrelation at a lag, and one with negative autocorrelation at a lag]

Stationarity – when there is no relationship with time itself
• $\{X_t\}$ is stationary if
  – $\mu_X(t)$ is independent of $t$
  – $\gamma_X(t+h, t)$ is independent of $t$ for each $h$
• In other words, the statistical properties of each section of the series are the same
• Special case: white noise

Time Domain and Forecasting

Linear Regression
• Fit a line $y = \beta x + \alpha$ to the data
• Ordinary least squares
  – Minimize the sum of squared distances between the points and the line
• Try this out at http://hspm.sph.sc.edu/courses/J716/demos/LeastSquares/LeastSquaresDemo.html

R²: Evaluating Goodness of Fit
• Least squares minimizes the combined residual, the residual sum of squares
  $RSS = \sum_i (Y_i - \hat{Y}_i)^2$
• The explained sum of squares is the difference between the fitted line and the mean
  $ESS = \sum_i (\hat{Y}_i - \bar{Y})^2$
• The total sum of squares is the total of these two
  $TSS = ESS + RSS = \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2$
• $R^2$, the coefficient of determination:
  $R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$, with $0 \le R^2 \le 1$
• Regression minimizes RSS and so maximizes $R^2$
[figures: three example fits evaluated with $R^2 = ESS/TSS = 1 - RSS/TSS$]

Linear Regression
• Can report:
  – Direction of trend (>0, <0, 0)
  – Steepness of trend (slope)
  – Goodness of fit to trend ($R^2$)
[figure: example linear trends fit to sample series]
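To make the fit-and-evaluate recipe concrete, here is a minimal numpy sketch (my own illustration, not from the slides; the toy data and noise levels are made up) that fits a line by ordinary least squares and computes $R^2$ exactly as defined above:

```python
import numpy as np

# Toy series: a noisy linear trend (illustrative data, not from the slides).
rng = np.random.default_rng(0)
t = np.arange(100.0)
y = 0.5 * t + 3.0 + rng.normal(scale=4.0, size=t.size)

# Ordinary least squares fit of y = beta*t + alpha.
beta, alpha = np.polyfit(t, y, deg=1)
y_hat = beta * t + alpha

# Goodness of fit: R^2 = ESS/TSS = 1 - RSS/TSS.
rss = np.sum((y - y_hat) ** 2)          # residual sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
r2 = ess / (ess + rss)

print(f"slope={beta:.3f}, intercept={alpha:.3f}, R^2={r2:.3f}")
```

With the noise level above, the recovered slope should be close to 0.5 and $R^2$ close to 1; shrinking the noise pushes $R^2$ toward 1, exactly because the regression minimizes RSS.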
What if a linear trend does not fit my data well?
• There could be no relationship
• There could be too much local variation
  – Want to look at the longer-term trend
  – Smooth the data
• There could be periodic or seasonality effects
  – Add seasonal components, e.g. quarterly indicators $Q_1, \ldots, Q_4$:
    $X_t = a + b_1 t + b_2 Q_1 + b_3 Q_2 + b_4 Q_3 + b_5 Q_4$
• The relationship could be nonlinear

Moving Average
• Compute an average of the last m consecutive data points
• The 4-point moving average is
  $\hat{x}_t^{MA(4)} = (x_t + x_{t-1} + x_{t-2} + x_{t-3})/4$
• Smooths out white noise
• Can apply a higher-order MA
• Exponential smoothing
• Kernel smoothing: $\hat{m}_t = \sum_j a_j(t)\, x_j$, where the weights $a_j(t)$ come from a kernel centered at $t$

Power Load Data
[figure: power load series smoothed with 53-week and 5-week moving averages]

Piecewise Aggregate Approximation
• Segment the data into linear pieces (an interesting paper on PAA is linked from the original slides)

Nonlinear Trends
[figures: nonlinear trend examples; nonlinear regression fits; fits of known distributions]

ARIMA: Putting the Pieces Together
• Autoregressive model of order p, AR(p):
  $x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t$
• Moving average model of order q, MA(q):
  $x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \cdots + \theta_q w_{t-q}$
• ARMA(p,q)
  – A time series is ARMA(p,q) if it is stationary and
    $x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}$
[figure: two simulated AR(1) sample paths, with $\phi = 0.9$ and $\phi = -0.9$]

ARIMA (AutoRegressive Integrated Moving Average)
• ARMA only applies to a stationary process
• Apply differencing to obtain stationarity
  – Replace each value by its incremental change from the last value

  series               x1    x2         x3                x4
  differenced once           x2 - x1    x3 - x2           x4 - x3
  differenced twice                     x3 - 2x2 + x1     x4 - 2x3 + x2

• A process $x_t$ is ARIMA(p,d,q) if it is
  – AR(p)
  – MA(q)
  – differenced d times
• Also known as the Box-Jenkins approach

A sketch of simulating these models appears below.
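As promised above, here is a minimal sketch of my own (not from the slides) that simulates AR, MA, and ARMA paths directly from the difference equations; `simulate_arma` is a hypothetical helper, and in practice one would typically reach for a library such as statsmodels instead:

```python
import numpy as np

def simulate_arma(phi, theta, n, rng):
    """Simulate an ARMA(p, q) path from its difference equation:
    x_t = sum_i phi_i * x_{t-i} + w_t + sum_j theta_j * w_{t-j}."""
    p, q = len(phi), len(theta)
    w = rng.normal(size=n + q)   # white noise w_t, padded for the MA lags
    x = np.zeros(n + p)          # padded with zeros for the AR lags
    for t in range(n):
        ar = sum(phi[i] * x[p + t - 1 - i] for i in range(p))
        ma = sum(theta[j] * w[q + t - 1 - j] for j in range(q))
        x[p + t] = ar + w[q + t] + ma
    return x[p:]

rng = np.random.default_rng(1)
x_pos = simulate_arma([0.9], [], 100, rng)       # AR(1), phi = 0.9: smooth path
x_neg = simulate_arma([-0.9], [], 100, rng)      # AR(1), phi = -0.9: choppy path
x_arma = simulate_arma([0.5], [0.4], 100, rng)   # ARMA(1,1)

# One round of differencing, the "I" step of ARIMA.
dx = np.diff(x_pos)
```

The two AR(1) settings reproduce the qualitative behavior in the figure above: a positive $\phi$ makes neighboring values move together, while a negative $\phi$ makes the series alternate from step to step.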
The Frequency Domain

Express Data as Fourier Frequencies
• Time domain: express the present as a function of the past
• Frequency domain: express the present as a function of oscillations, or sinusoids

Definitions
• Frequency $\omega$ is measured in cycles per time point
• J&J data: 1 cycle each year and 4 data points (time points) each cycle, so $\omega = 0.25$ cycles per data point
• Period of a time series: $T = 1/\omega$
  – J&J: $T = 1/0.25 = 4$ data points per cycle
  – Note: need at least 2 data points per cycle

Fourier Series
• A time series is a mixture of oscillations
  – Can describe each oscillation by amplitude, frequency, and phase
  – Can also describe the series as a sum of amplitudes at all time points, or magnitudes at all frequencies
• A single periodic component:
  $x_t = A\cos(2\pi\omega t) + B\sin(2\pi\omega t)$
• If we allow mixtures of q periodic series:
  $x_t = \sum_{i=1}^{q} \left[ A_i \cos(2\pi\omega_i t) + B_i \sin(2\pi\omega_i t) \right]$

Example
$x_{t1} = 2\cos(2\pi t \cdot 6/100) + 3\sin(2\pi t \cdot 6/100)$
$x_{t2} = 4\cos(2\pi t \cdot 10/100) + 5\sin(2\pi t \cdot 10/100)$
$x_{t3} = 6\cos(2\pi t \cdot 40/100) + 7\sin(2\pi t \cdot 40/100)$
$x_{t4} = x_{t1} + x_{t2} + x_{t3}$

How Do We Compute the Parameters?
• Express the series over the Fourier frequencies $j/n$:
  $x_t = \sum_{j=1}^{n/2} \left[ \beta_1(j/n)\cos(2\pi t j/n) + \beta_2(j/n)\sin(2\pi t j/n) \right]$
• Regression
• Discrete Fourier Transform:
  $d(j/n) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} x_t\, e^{-2\pi i t j/n}$
  – DFTs represent the amplitude and phase of each series component
  – Can exploit redundancies to speed up the computation (FFT)

Breaking Down a DFT
• Amplitude
  $A(j/n) = |d(j/n)| = \sqrt{\mathrm{Re}(d(j/n))^2 + \mathrm{Im}(d(j/n))^2}$
• Phase
  $\phi(j/n) = \tan^{-1}\!\left( \mathrm{Im}(d(j/n)) / \mathrm{Re}(d(j/n)) \right)$

Example
[figure: GBP exchange-rate series reconstructed from its largest 1, 2, 3, 5, 10, and 20 frequency components]

Periodogram
• A measure of the squared correlation between the data and sinusoids oscillating at frequency $j/n$:
  $P(j/n) = \left( \frac{2}{n}\sum_{t=1}^{n} x_t \cos(2\pi t j/n) \right)^2 + \left( \frac{2}{n}\sum_{t=1}^{n} x_t \sin(2\pi t j/n) \right)^2$
• Compute it quickly using the FFT
• For the mixture example above: P(6/100) = 13, P(10/100) = 41, P(40/100) = 85

Wavelets
• Can break a series up into segments, called wavelets
  – Analyze a window of time separately
  – Variable-sized windows

State Space

State Space Models
• The current situation is represented as a state
  – Estimate state variables from noisy observations over time
  – Estimate transitions between states
• Kalman filters
  – Similar to HMMs: an HMM models discrete state variables, while a Kalman filter models continuous ones

Conceptual Overview
• Lost on a 1-dimensional line; receive occasional, possibly incorrect, sextant position readings
• State: position $x(t)$ and velocity $x'(t)$
• The current location distribution is Gaussian
• The transition model is linear Gaussian
• The sensor model is linear Gaussian (noisy information)
• Sextant measurement at $t_i$: mean $\mu_i$ and variance $\sigma_i^2$
• Measured velocity at $t_i$: mean $\mu'_i$ and variance $\sigma'^2_i$
[figure: Gaussian belief over position on the line]

Kalman Filter Algorithm
• Start with the current location (Gaussian)
• Predict the next location
  – Use the current location and the transition function (linear Gaussian); the result is Gaussian
• Get the next sensor measurement (Gaussian)
• Correct the prediction
  – Weighted mean of the previous prediction and the measurement

Conceptual Overview
[figure: the Gaussian prediction for time $t_{i+1}$ next to the measurement at $t_{i+1}$ (mean $\mu_{i+1}$, variance $\sigma_{i+1}^2$); the two do not match, and the corrected estimate lies between them]
• The corrected mean is the new optimal estimate of position
• The new variance is smaller than either of the previous two variances

Updating Gaussian Distributions
• The one-step predicted distribution is Gaussian:
  $P(X_{t+1} \mid e_{1:t}) = \int_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})\, dx_t$  (transition model times prior)
• After new (linear Gaussian) evidence, the updated distribution is Gaussian:
  $P(X_{t+1} \mid e_{1:t+1}) \propto P(e_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid e_{1:t})$  (new measurement times previous step)

Why Is Kalman Great?
• The method, that is…
• With general continuous variables, the representation of a state-based series grows without bound; for linear Gaussian models the belief stays a Gaussian with a fixed number of parameters

Why Is Time Series Important?
• Time is an important component of many processes; do not ignore time in learning problems
• ML can benefit from, and in turn benefit, these techniques:
  – Dimensionality reduction of series
  – Rule discovery
  – Clustering series
  – Classifying series
  – Forecasting data points
  – Anomaly detection
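To close, here is a minimal 1-D sketch of the predict/correct cycle described in the Kalman filter section (my own illustration, not from the slides; the random-walk transition model and the noise variances `q` and `r` are assumptions for the demo):

```python
import numpy as np

def kalman_1d(zs, x0, p0, q=0.5, r=4.0):
    """Track a 1-D position from noisy measurements zs.

    x0, p0: initial belief mean and variance.
    q: process (transition) noise variance -- assumed, not from the slides.
    r: sensor noise variance -- assumed, not from the slides.
    Transition model: a random walk, x_{t+1} = x_t + noise.
    """
    x, p = x0, p0
    estimates = []
    for z in zs:
        # Predict: push the Gaussian belief through the transition model.
        x_pred, p_pred = x, p + q
        # Correct: weighted mean of the prediction and the measurement.
        k = p_pred / (p_pred + r)   # Kalman gain (the weight)
        x = x_pred + k * (z - x_pred)
        p = (1.0 - k) * p_pred      # smaller than both p_pred and r
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(2)
true_pos = np.cumsum(rng.normal(scale=0.7, size=50))   # wandering target
zs = true_pos + rng.normal(scale=2.0, size=50)         # noisy sextant-style readings
est = kalman_1d(zs, x0=0.0, p0=10.0)
print("filter MSE:", np.mean((est - true_pos) ** 2))
print("raw measurement MSE:", np.mean((zs - true_pos) ** 2))
```

The corrected variance `(1 - k) * p_pred` equals `p_pred * r / (p_pred + r)`, which is smaller than both the prediction variance and the sensor variance, matching the claim above that the new variance is smaller than either of the previous two; the filtered estimates should typically show a lower error than the raw readings.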