Mixing Dynamic Linear Models
Kevin R. Keane
krkeane@buffalo.edu
Computer Science & Engineering, University at Buffalo SUNY

Modeling label distributions on a lattice
- prior distribution of labels: p(L)
- likelihood of observed data given a label: p(D|L)
- posterior distribution of labels given data: p(L|D) ∝ p(D|L)p(L)
- obtain other distributions of interest:
  p(X|D) ∝ ∫ p(X|L) p(L|D) dL

A slight twist
- Usual setup: model the distribution of labels at lattice sites.
- This project: model the distribution of linear models at lattice sites.

Mixing Models
Why bother modeling distributions over models?
- To avoid the need to carefully specify model parameters in advance:
  - use simple components
  - use a grid approximation spanning likely parameter values
  - let the data figure it out . . .
- To obtain better models . . .

What's a better model?
A model with higher posterior density for quantities of interest. For example:
- parameters in a stock returns model
- parameters describing a plane
- ...

[Figure: posterior density p(Θ|D) versus the value of Θ; the "better" model shows a sharper, higher peak.]

Bayesian framework
- prior distribution of models: p(M)
- likelihood of observed data given a model: p(D|M)
- posterior distribution of models given data: p(M|D) ∝ p(D|M)p(M)
- obtain other distributions of interest:
  p(X|D) ∝ ∫ p(X|M) p(M|D) dM

Drilling down further . . .
Linear regression models: Y = F′θ + ε
- Yi is the observed response
- Fi is the regression vector / independent variables
- θ is the regression parameter vector
- εi is the error of fit
FYI, the ordinary least squares estimate of θ is
  θ̂ = (F′F)⁻¹F′Y

Examples of linear models
Return of a stock as a function of market, industry, and stock-specific return components:
  r = F′θ + ε,  F = (1, rM, rI)′,  θ = (α, βM, βI)′.
Points on a plane:
  0 = F′θ + ε,  F = (x, y, z, −1)′,  θ = (A, B, C, D)′.

Dynamic Linear Models
Ordinary least squares yields a single estimate θ̂ of the regression parameter vector θ for the entire data set. However:
- we may not have a finite data set, but rather an infinite data stream!
- we expect / permit θ to vary (slightly) as we traverse the lattice, θs ≈ θt.
So, our models are dynamic. Dynamic linear models (West & Harrison) are a generalized form of / able to represent:
- Kalman filters (Kalman)
- flexible least squares (Kalaba & Tesfatsion)
- linear dynamical systems (Bishop)
- ...

Specifying a dynamic linear model {Ft, G, V, W}
- Ft′ is a row in the design matrix: the vector of independent variables affecting Yt
- G is the evolution matrix: captures deterministic changes to θ, θt ≈ Gθt−1
- V is the observational variance, a.k.a. Var(ε) in ordinary least squares
- W is the evolution variance matrix: captures random changes to θ,
  θt = Gθt−1 + ωt,  ωt ∼ N(0, W)
G and W make the linear model dynamic.
- The observation equation is Yt = Ft′θt + νt,  νt ∼ N(0, V)
- The evolution equation is θt = Gθt−1 + ωt,  ωt ∼ N(0, W)
- The initial information is summarized (θ0|D0) ∼ N(m0, C0)
- Information at time t: Dt = {Yt, Dt−1}

Experiment #1
- Specify 3 DLMs:
  {F = 1, G = 1, W = 5.0000, V = 1}
  {F = 1, G = 1, W = 0.0500, V = 1}
  {F = 1, G = 1, W = 0.0005, V = 1}
- Generate 1000 observations YM:
  select a model, generate 200 observations;
  select a model, generate 200 observations;
  ...

[Figure: generated observations Yt from the 3 DLMs {1, 1, 1, W}, t = 0 . . . 1000.]
Local consistency is a function of Wm . . .
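Experiment #1's data generation can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not the code behind the slides: the `simulate_dlm` helper, the random seed, and the segment bookkeeping are all hypothetical choices.

```python
import numpy as np

def simulate_dlm(n, F=1.0, G=1.0, V=1.0, W=1.0, theta0=0.0, rng=None):
    """Simulate n observations from the scalar DLM {F, G, V, W}.

    Evolution:   theta_t = G * theta_{t-1} + w_t,  w_t ~ N(0, W)
    Observation: Y_t     = F * theta_t     + v_t,  v_t ~ N(0, V)
    """
    rng = rng or np.random.default_rng(0)
    theta, ys = theta0, []
    for _ in range(n):
        theta = G * theta + rng.normal(0.0, np.sqrt(W))    # state evolution
        ys.append(F * theta + rng.normal(0.0, np.sqrt(V)))  # noisy observation
    return np.array(ys), theta

# Experiment #1 style stream: every 200 steps, pick one of the 3 DLMs at
# random; the state theta is carried across segment boundaries.
rng = np.random.default_rng(42)
Ws = [5.0, 0.05, 0.0005]
theta, chunks = 0.0, []
for _ in range(5):
    ys, theta = simulate_dlm(200, W=rng.choice(Ws), theta0=theta, rng=rng)
    chunks.append(ys)
Y = np.concatenate(chunks)  # 1000 observations
```

Plotting `Y` against `t` reproduces the qualitative picture above: segments drawn with W = 5 wander violently, while W = 0.0005 segments are locally almost constant.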
Inference with a dynamic linear model {Ft, G, V, W}
- Posterior at t − 1: (θt−1|Dt−1) ∼ N(mt−1, Ct−1)
- Prior at t: (θt|Dt−1) ∼ N(at, Rt)
  at = Gmt−1,  Rt = GCt−1G′ + W
- One-step forecast at t: (Yt|Dt−1) ∼ N(ft, Qt)
  ft = Ft′at,  Qt = Ft′RtFt + V
- The one-step forecast is the model likelihood:
  p(Yt|Dt−1) = p(Yt|Dt−1, {Ft, G, V, W}) = p(D|M)
- Posterior at t: (θt|Dt) ∼ N(mt, Ct)
  et = Yt − ft,  At = RtFtQt⁻¹,  mt = at + Atet,  Ct = Rt − AtQtAt′

Experiment #1 continued / inference about θt
As before, specify the 3 DLMs ({F = 1, G = 1, V = 1} with W = 5.0000, 0.0500, 0.0005) and generate 1000 observations YM, selecting a model every 200 observations. Then:
- Compute the posteriors (θt−1|M = m, Dt−1).

[Figure: posterior mean mt from the 3 DLMs {1, 1, 1, W}, t = 0 . . . 1000; (θt|Dt) ∼ N(mt, Ct).]

Experiment #1 continued / inference about models
As before, specify the 3 DLMs, generate 1000 observations YM, and compute the posteriors (θt−1|M = m, Dt−1). Then:
- assume the uniform prior p(M = m) = 1/3
- compute the likelihoods (Dt|M = m)
- compute the posterior model probabilities p(m|Dt)

[Figure: trailing 10-day model log likelihoods for the 3 DLMs (W = 5, 0.05, 0.0005), t = 0 . . . 1000.]

[Figure: posterior model probabilities p(M|D) and ground truth, t = 0 . . . 1000.]

Modeling interpretation / tricks
- A large W permits the state vector θ to vary abruptly in a stochastic manner. When is this important? Examples:
  - stock price model: one company acquires another company
  - planar geometry model: a building's plane intersects the ground plane
- It is easy to obtain parameter estimates from a mixture model. For example, the evolution variance WM is obtained from the individual model variances, likelihoods, and posterior probabilities:
  WM = ∫M Wm p(Wm|m) p(m|D) dm = ∫M Wm p(m|D) dm

A statistical arbitrage application
Montana, Triantafyllopoulos, and Tsagaris. Flexible least squares for temporal data mining and statistical arbitrage. Expert Systems with Applications, 36(2):2819–2830, 2009.
- The parameter δ = W/(W + V) controls adaptiveness.
- W = 0 (δ = 0) is equivalent to ordinary least squares.
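The inference machinery used above — Kalman filtering per candidate DLM, one-step forecasts as model likelihoods, posterior model probabilities from a uniform prior — can be sketched in numpy. A minimal sketch, assuming scalar models; `filter_dlm` and the random-walk stand-in for the observation stream are illustrative, not the experiments' actual code or data.

```python
import numpy as np

def filter_dlm(Y, F=1.0, G=1.0, V=1.0, W=1.0, m0=0.0, C0=1e6):
    """Kalman filtering for the scalar DLM {F, G, V, W}.

    Returns the posterior means m_t and the per-step log likelihoods
    log p(Y_t | D_{t-1}, model), following the recursions above.
    """
    m, C = m0, C0
    ms, loglik = [], []
    for y in Y:
        a = G * m          # prior mean      a_t = G m_{t-1}
        R = G * C * G + W  # prior variance  R_t = G C_{t-1} G' + W
        f = F * a          # forecast mean   f_t = F' a_t
        Q = F * R * F + V  # forecast var.   Q_t = F' R_t F + V
        e = y - f          # forecast error  e_t = Y_t - f_t
        A = R * F / Q      # adaptive gain   A_t = R_t F_t Q_t^{-1}
        m = a + A * e      # posterior mean  m_t
        C = R - A * Q * A  # posterior var.  C_t
        ms.append(m)
        loglik.append(-0.5 * (np.log(2 * np.pi * Q) + e * e / Q))
    return np.array(ms), np.array(loglik)

# Posterior model probabilities p(m | D_t) for the 3 candidate DLMs,
# starting from the uniform prior p(M = m) = 1/3.
rng = np.random.default_rng(0)
Y = np.cumsum(rng.normal(0.0, 1.0, 300))  # stand-in observation stream
logliks = np.array([filter_dlm(Y, W=w)[1] for w in (5.0, 0.05, 0.0005)])
log_post = np.log(1 / 3) + np.cumsum(logliks, axis=1)  # log p(m) + Σ log p(Y_t|·)
post = np.exp(log_post - log_post.max(axis=0))         # normalize in log space
post /= post.sum(axis=0)                               # columns sum to 1
```

Normalizing in log space (subtracting the columnwise maximum before exponentiating) matters in practice: the cumulative log likelihoods grow without bound, and naive exponentiation underflows.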
- Published results for 11 constant-parameter models.
- Model the S&P 500 Index return as a function of the return (score) of the largest principal component of the underlying stocks:
  rs&p500 = F′θ + ε,  F = [rpca],  θ = [βpca].

Experiment #2
Data used is the log price return series for SPY and RSP:
- SPY is the capitalization-weighted S&P 500 ETF
- RSP is the equal-weighted S&P 500 ETF; our proxy for Montana's βpca
  rspy = F′θ + ε,  F = [rrsp],  θ = [βrsp].

[Figure: SPY and RSP price series, 2004–2010.]
[Figure: estimated sqrt(W) and sqrt(V), 2004–2010.]

Experiment #2 continued
Can a mixture model outperform the ex post best single DLM?

[Figure: cumulative performance, 2004–2010, of the mixture model, the ex post best DLM {F, 1, 1, W = 221}, and the individual DLMs {F, 1, 1, W}.]
[Figure: performance of the individual DLMs {F, 1, 1, W} as a function of W (10 to 1000, log scale), with the mixture model above all of them.]

Future work
Longer term (after this semester), generalize the one-dimensional DLM framework to permit application to images and video. For video, we would probably need to consider the variation W in different directions: δθ/δx, δθ/δy, and δθ/δt (image coordinates x and y, frame t).

THANK YOU.