Learning Wiener models – starting from the model

Thomas B. Schön
Division of Automatic Control, Linköping University, Sweden

Joint work with (alphabetical order): Michael I. Jordan (UC Berkeley), Fredrik Lindsten (Linköping University), Lennart Ljung (Linköping University), Brett Ninness (University of Newcastle, Australia) and Adrian Wills (University of Newcastle, Australia).

Block-oriented nonlinear models

Block-oriented nonlinear models are dynamic models consisting of interconnections of linear dynamic models and static nonlinear models.

[Block diagram: an example interconnection of linear blocks L1, L2, L3 and static nonlinearities h1(·), h2(·), driven by the input u_t and the noise sources v1, v2, v3, e1, e2, producing the outputs y1 and y2.]

The Wiener model

We will study one notable member of the class of block-oriented nonlinear models, the Wiener model.

[Block diagram: the input u_t and the process noise v_t drive the linear block L, whose output z_t passes through the static nonlinearity h(·); the measurement noise e_t is added to form the output y_t.]

A Wiener model is a linear dynamical model (L) followed by a static nonlinearity (h(·)).

Learning problem: Find L and h(·) based on {u_{1:T}, y_{1:T}}.

Wiener model – explicit state space equations

Linear Gaussian state space (LGSS) model:

  x_{t+1} = A x_t + B u_t + v_t,   v_t ∼ N(0, Q),
  z_t = C x_t.

Static nonlinearity:
• Parametric: y_t = h(z_t, β) + e_t,   e_t ∼ N(0, R).
• Non-parametric: y_t = h(z_t) + e_t,   e_t ∼ N(0, R).

Example of simplifying assumptions

Most of the existing work deals with special cases of the general problem. Typical restrictions imposed are:
• The nonlinearity h(·) is assumed to be invertible.
• The measurement noise e_t is absent.
• The LGSS model is deterministic (v_t is absent).
• The LGSS model is stochastic, but v_t is assumed white.

In the models and solutions provided here we do not have to make any of these assumptions.

Outline

1. Parametric model (y_t = h(z_t, β) + e_t)
2. Semiparametric models (y_t = h(z_t) + e_t)
   • The LGSS model is parametric, but the nonlinearity is modelled using a Bayesian nonparametric model (Gaussian process).
   • Data driven model: same as above, but a sparseness prior (ARD) is placed on the dynamics.

Parametric model

Linear Gaussian state space (LGSS) model:

  x_{t+1} = A x_t + B u_t + v_t,   v_t ∼ N(0, Q),
  z_t = C x_t,
  y_t = h(z_t, β) + e_t,   e_t ∼ N(0, R).

Maximum likelihood (ML) amounts to solving

  θ̂_ML = arg max_θ log p_θ(y_{1:T}),

where the log-likelihood function is given by

  log p_θ(y_{1:T}) = Σ_{t=1}^{T} log p_θ(y_t | y_{1:t−1}).

Where is the challenge?

Two challenges:
1. The one-step prediction pdf p_θ(y_t | y_{1:t−1}) has to be computed.
2. In solving the optimisation problem θ̂_ML = arg max_θ log p_θ(y_{1:T}), the derivatives ∂/∂θ p_θ(y_t | y_{1:t−1}) are useful.

The Expectation Maximisation (EM) algorithm together with a particle smoother (PS) provides a systematic way of dealing with both of these challenges. A particle-filter illustration of the first challenge is sketched below.
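To make the first challenge concrete, the following sketch estimates log p_θ(y_{1:T}) = Σ_t log p_θ(y_t | y_{1:t−1}) for a parametric Wiener model with a bootstrap particle filter. It is a minimal illustration, not the EM-PS implementation referenced later in the slides; the system matrices, the saturation used for h(·) and all tuning constants are assumptions chosen for the example.

```python
import numpy as np

def wiener_loglik_pf(y, u, A, B, C, Q, R, h, N=100, rng=None):
    """Bootstrap particle filter estimate of log p_theta(y_{1:T}) for a Wiener model."""
    rng = np.random.default_rng() if rng is None else rng
    T, nx = len(y), A.shape[0]
    chol_Q = np.linalg.cholesky(Q)
    x = np.zeros((N, nx))                       # particles; initial state assumed to be zero
    loglik = 0.0
    for t in range(T):
        # Weight particles by the measurement density N(y_t; h(C x_t), R)
        z = (x @ C.T)[:, 0]                     # z_t = C x_t for every particle
        logw = -0.5 * (y[t] - h(z)) ** 2 / R - 0.5 * np.log(2 * np.pi * R)
        c = logw.max()
        w = np.exp(logw - c)
        loglik += c + np.log(w.mean())          # log of the estimate of p(y_t | y_{1:t-1})
        # Resample and propagate: x_{t+1} = A x_t + B u_t + v_t, v_t ~ N(0, Q)
        idx = rng.choice(N, size=N, p=w / w.sum())
        x = x[idx] @ A.T + u[t] * B.T + rng.standard_normal((N, nx)) @ chol_Q.T
    return loglik

# Example: second order linear block with complex poles and a saturation nonlinearity
A = np.array([[1.0, 0.1], [-0.2, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), 0.1
h = lambda z: np.clip(z, -1.0, 1.0)             # saturation
rng = np.random.default_rng(0)
T = 200
u = rng.standard_normal(T)
x, y = np.zeros(2), np.zeros(T)                 # simulate data from the model itself
for t in range(T):
    y[t] = h(C @ x)[0] + np.sqrt(R) * rng.standard_normal()
    x = A @ x + B[:, 0] * u[t] + np.linalg.cholesky(Q) @ rng.standard_normal(2)
print(wiener_loglik_pf(y, u, A, B, C, Q, R, h, N=100, rng=rng))
```

The estimate is noisy but improves as N grows, and it is exactly this noisy, derivative-free quantity that makes direct optimisation of the likelihood awkward, which is what motivates the EM-PS route described next.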
EM-PS (I/II)

Rather than studying

  arg max_θ log p_θ(y_{1:T}),

the EM algorithm is concerned with

  arg max_θ log p_θ(y_{1:T}, x_{1:T}).

The key idea underlying EM is to consider the joint likelihood function of the observed variables y_{1:T} and the latent variables x_{1:T}.

EM-PS (II/II)

EM splits the original problem arg max_θ p_θ(y_{1:T}) into two manageable (and closely linked) subproblems:

1. Compute a conditional expectation

  Q(θ, θ_k) = E_{θ_k}{log p_θ(x_{1:T}, y_{1:T}) | y_{1:T}} = ∫ log p_θ(x_{1:T}, y_{1:T}) p_{θ_k}(x_{1:T} | y_{1:T}) dx_{1:T}.

   In EM-PS this expectation is approximated using a particle smoother; see R. Douc, A. Garivier, E. Moulines, and J. Olsson. Sequential Monte Carlo smoothing for general state space hidden Markov models. Annals of Applied Probability, 21(6):2109-2145, 2011.

2. Solve a maximisation problem

  θ_{k+1} = arg max_θ Q(θ, θ_k).

Parametric model – blind Wiener learning (I/IV)

[Block diagram: the linear block L driven by u_t produces z_t, which is fed through two static nonlinearities h1(z_t, β) and h2(z_t, β); with the additive noises e_{1,t} and e_{2,t} this gives the two outputs y_{1,t} and y_{2,t}.]

Learning problem: Find L and β, r1, r2 based on {y_{1,1:T}, y_{2,1:T}}.

Parametric model – blind Wiener learning (III/IV)

• Second order LGSS model with complex poles.
• Employ the EM-PS with N = 100 particles.
• EM-PS was terminated after just 100 iterations.
• Results obtained using T = 1000 samples.
• The plots are based on 100 realisations of data.
• Nonlinearities (dead zone and saturation) shown on the next slide.

[Figure: Bode plot (gain in dB versus normalised frequency) of the estimated mean (black), the true system (red) and the results for all 100 realisations (gray).]

Parametric model – blind Wiener learning (IV/IV)

[Figure: the two estimated static nonlinearities (output versus input). Estimated mean (black), true static nonlinearity (red) and the results for all 100 realisations (gray).]

Semiparametric model 1 – known model order

First step towards a fully data driven model: the order of the LGSS model is assumed known.

Parameters: θ = {Γ, Q, r, h(·)}.

Bayesian model specified by priors:
• Conjugate priors for Γ = [A B], Q and r:
  p(Γ, Q) = matrix-normal inverse-Wishart (MNIW),
  p(r) = inverse-Wishart.
• Gaussian process prior on h(·): h(·) ∼ GP(z, k(z, z′)).

[Graphical model relating u_{1:T}, Γ, Q, x_{1:T}, h(·), r and y_{1:T}.]

Semiparametric model 2 – fully data driven

Everything is learned from the data, by introducing the possibility to switch specific model components on and off.

Parameters: θ = {Γ, Q, δ̄, r, h(·)}.

Bayesian model specified by priors:
• Sparseness prior (ARD) on Γ = [A B]:
  p(Γ | δ̄) = ∏_{j=1}^{n_x+n_u} N(γ_j | 0, δ_j^{−1} I_{n_x}),   δ̄ = {δ_j}_{j=1}^{n_x+n_u},   p(δ_j) = Gam(δ_j; a, b).
• Inverse-Wishart prior on Q and r.
• Gaussian process prior on h(·): h(·) ∼ GP(z, k(z, z′)).

[Graphical model relating u_{1:T}, δ̄, Γ, Q, x_{1:T}, h(·), r and y_{1:T}.]

Learning algorithm

Aim: Devise a Gibbs sampler targeting p(θ, x_{1:T} | y_{1:T}).

PG-BS for Wiener system identification:
• Run a conditional PF targeting p(x_{1:T} | θ, y_{1:T});
• Run a backward simulator to sample x⋆_{1:T};
• Draw
  • {Γ⋆, Q⋆, r⋆} ∼ p(Γ, Q, r | h, x⋆_{1:T}, y_{1:T}), if MNIW prior, or
  • {Γ⋆, δ̄⋆, Q⋆, r⋆} ∼ p(Γ, δ̄, Q, r | h, x⋆_{1:T}, y_{1:T}), if ARD prior;
• Draw h⋆ ∼ p(h | r⋆, x⋆_{1:T}, y_{1:T}).

A sketch of the state-trajectory step is given below.
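The sketch below illustrates the state-trajectory step of PG-BS: a conditional particle filter followed by backward simulation, run here with the parameters held fixed. In the full sampler this step alternates with the conjugate draws of {Γ, Q, r} (or {Γ, δ̄, Q, r} under the ARD prior) and the GP draw of h(·), which are omitted. The model functions, the tanh stand-in for h(·) and all constants are assumptions made for the illustration; this is not the authors' MATLAB implementation.

```python
import numpy as np

def normalise(logw):
    w = np.exp(logw - logw.max())
    return w / w.sum()

def cpf_bs(y, u, x_ref, propagate, logf, logg, N, rng):
    """One PG-BS sweep: conditional PF targeting p(x_{1:T} | theta, y_{1:T}),
    then backward simulation of a new trajectory x*_{1:T}."""
    T, nx = x_ref.shape
    X = np.zeros((T, N, nx))
    logW = np.zeros((T, N))
    X[0] = rng.standard_normal((N, nx))          # assumed N(0, I) initial state
    X[0, N - 1] = x_ref[0]                       # particle N-1 carries the reference
    logW[0] = logg(y[0], X[0])
    for t in range(1, T):
        idx = rng.choice(N, size=N, p=normalise(logW[t - 1]))
        X[t] = propagate(X[t - 1][idx], u[t - 1], rng)   # bootstrap proposal
        X[t, N - 1] = x_ref[t]                   # condition on the reference trajectory
        logW[t] = logg(y[t], X[t])
    xs = np.zeros((T, nx))                       # backward simulation
    j = rng.choice(N, p=normalise(logW[-1]))
    xs[-1] = X[-1, j]
    for t in range(T - 2, -1, -1):
        logv = logW[t] + logf(xs[t + 1], X[t], u[t])
        j = rng.choice(N, p=normalise(logv))
        xs[t] = X[t, j]
    return xs

# Wiener model with assumed (fixed) parameters: x_{t+1} = A x_t + B u_t + v_t, z_t = C x_t
A = np.array([[0.9, 0.1], [-0.1, 0.8]])
B = np.array([1.0, 0.5])
C = np.array([1.0, 0.0])
Q, R = 0.01 * np.eye(2), 0.1
Qi = np.linalg.inv(Q)
h = np.tanh                                      # stand-in static nonlinearity

def propagate(x, ut, rng):
    return x @ A.T + ut * B + rng.multivariate_normal(np.zeros(2), Q, size=len(x))

def logg(yt, x):                                 # log N(y_t; h(C x_t), R), up to a constant
    return -0.5 * (yt - h(x @ C)) ** 2 / R

def logf(x_next, x, ut):                         # log N(x_{t+1}; A x_t + B u_t, Q), up to a constant
    d = x_next - (x @ A.T + ut * B)
    return -0.5 * np.sum(d @ Qi * d, axis=1)

rng = np.random.default_rng(1)
T = 100
u = rng.standard_normal(T)
y = rng.standard_normal(T)                       # placeholder data; use the recorded {u_{1:T}, y_{1:T}}
x_ref = np.zeros((T, 2))                         # arbitrary initial reference trajectory
for k in range(5):                               # a few sweeps with theta held fixed
    x_ref = cpf_bs(y, u, x_ref, propagate, logf, logg, N=15, rng=rng)
print(x_ref[:3])
```

Retaining the reference trajectory as particle N − 1 at every time step is what makes the sweep leave p(x_{1:T} | θ, y_{1:T}) invariant, so a small number of particles (N = 15 in the examples) suffices inside the Gibbs sampler.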
Snapshot – where are we?

We have introduced two Bayesian semiparametric models:
• Assume the model order of the LGSS model to be known and employ conjugate priors (MNIW).
• Fully data driven model where everything is learned from data, made possible via the ARD prior.

We have a learning algorithm (PG-BS) for each model.

Example – known model order (I/II)

• Bayesian semiparametric model with conjugate prior (MNIW).
• 6th order LGSS model and a saturation.
• Using T = 1000 measurements.
• Employ the PG-BS sampler with N = 15 particles.
• Run 15000 MCMC iterations, discard 5000 as burn-in.

[Figure: true Bode diagram of the linear system (black), estimated Bode diagram (dashed black) and 99% credibility interval (blue).]

Example – known model order (II/II)

[Figure: true static nonlinearity (black), estimated posterior mean (dashed black) and 99% credibility interval (blue).]

Data driven learning (I/II)

• Bayesian semiparametric model with ARD prior.
• 4th order LGSS model.
• Using T = 1000 measurements.
• Employ the PG-BS sampler with N = 15 particles.
• Run 15000 MCMC iterations, discard 5000 as burn-in.

[Figure: box plots of the ARD precision parameters δ_j, j = 1, ..., 11. The rightmost box plot corresponds to the input signal.]

Data driven learning (II/II)

[Figure, left: Bode diagram of the 4th order linear system; estimated mean, true system and 99% credibility intervals (blue). Figure, right: estimated mean (thick black), 99% credibility intervals (blue) and the true, non-monotonic static nonlinearity (black).]
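The credibility bands for h(z) in the figures above summarise the MCMC output over the full posterior. As a smaller, hedged illustration of the mechanics, the sketch below performs a single conditional GP update of h given pairs (z_t, y_t) and a noise variance r, using a GP(z, k(z, z′)) prior with a squared-exponential kernel, and forms a pointwise ~99% band from that conditional posterior; the kernel, its hyperparameters and the synthetic data are assumptions for the example.

```python
import numpy as np

def gp_update_h(z_train, y_train, z_test, r, ell=0.5, sf2=1.0):
    """Posterior mean and variance of h on a grid, under the prior h ~ GP(z, k(z, z'))
    (prior mean m(z) = z, as in the slides) with a squared-exponential kernel k."""
    k = lambda a, b: sf2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)
    K = k(z_train, z_train) + r * np.eye(len(z_train))
    Ks = k(z_test, z_train)
    alpha = np.linalg.solve(K, y_train - z_train)        # subtract the prior mean m(z) = z
    mean = z_test + Ks @ alpha
    var = sf2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var

rng = np.random.default_rng(2)
z = rng.uniform(-1.5, 1.5, 200)                          # stand-in for z_t = C x_t from one Gibbs sweep
y = np.clip(z, -1.0, 1.0) + 0.05 * rng.standard_normal(200)   # saturation plus measurement noise
z_grid = np.linspace(-1.5, 1.5, 100)
mean, var = gp_update_h(z, y, z_grid, r=0.05 ** 2)
lower, upper = mean - 2.58 * np.sqrt(var), mean + 2.58 * np.sqrt(var)   # pointwise ~99% band
print(mean[:3], lower[:3], upper[:3])
```

In the full sampler such a draw is made at every Gibbs iteration using the current x⋆_{1:T} and r⋆, and the plotted bands summarise the whole chain rather than a single conditional.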
Conclusions

[Block diagram: the Wiener model, with input u_t and process noise v_t entering the linear block L, the static nonlinearity h(·) acting on z_t, and measurement noise e_t added to form y_t.]

• The Wiener model offers a "controlled" venture into the realm of nonlinear dynamical models.
• Note the "characterization of the uncertainty".
• Three cases studied:
  • Fully parametric (ML – EM-PS)
  • Semiparametric (Bayes – PG-BS)
  • Data driven semiparametric (Bayes – PG-BS)

Learning Wiener models – references

• Maximum likelihood approach (EM-PS)
  Adrian Wills, Thomas B. Schön, Lennart Ljung and Brett Ninness. Identification of Hammerstein-Wiener Models. Automatica, 2012 (accepted for publication).
  Thomas B. Schön, Adrian Wills and Brett Ninness. System Identification of Nonlinear State-Space Models. Automatica, 47(1):39-49, January 2011.
• Bayesian approach (PG-BS)
  Fredrik Lindsten, Thomas B. Schön and Michael I. Jordan. Data driven Wiener system identification. Automatica, 2012 (submitted).
  Christophe Andrieu, Arnaud Doucet and Roman Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B, 72:269-342, 2010.

MATLAB code is available from www.control.isy.liu.se/~lindsten/code.html