Nonlinear Estimation
Professor David H. Staelin, Massachusetts Institute of Technology

Case I: Nonlinear Physics

p̂ = [D1 D2]·[1 d]ᵗ = D1 + D2·d

[Figure: parameter p versus data d, showing the scattered data, its probability distribution P{p}, the best-fit "linear regression" line with intercept D1 and slope D2, and the curved optimum estimator p̂(d).]

Minimum square error (MSE) is usually the "optimum" sought. It is important that the "training" data used for regression be similar to, but independent of, the "test" data used for performance evaluation.

Case II: Non-Gaussian Statistics

[Figure: p̂ versus data dA, with the distribution P{p|dA}. Shown are the best linear estimator, the maximum a posteriori probability ("MAP") estimator, and the best nonlinear estimator, which minimizes MSE given dA and avoids p̂ < 0.]

Note: The linear estimator can yield p̂ < 0 (which is non-physical); the nonlinear estimator approaches the correct asymptotes as d → ±∞.

Note: The physics here is linear. Polynomials etc. can be used for p̂(d), or recursive linear estimators that update D.

Linear Estimates for Nonlinear Problems

[Figure: p versus d1 and p versus d2 separately, each showing the curved truth (without noise) and the optimum linear estimate.]

Linear estimates are suboptimum for d1 or d2 alone.

Say d1 = a0 + a1p + a2p² and d2 = b0 + b1p + b2p².
Then p² = (d2 − b0 − b1p)/b2, so
   d1 = a0 + a1p + a2(d2 − b0 − b1p)/b2 = c0 + c1p + c2d2,
which defines a plane in (p, d1, d2)-space. Therefore
   p̂ = p = (−c0 + d1 − c2d2)/c1, where c1 = a1 − a2b1/b2 ≠ 0.
(A numerical check of this appears below, after the KLT slide.)

p̂(d1, d2) can be linear in d1 and d2 and (in the noiseless case) perfect, even though p(d1, d2) is nonlinear. In general, the more varied the data d, the smaller the performance gap between linear estimators and nonlinear ones, which are a superset.

[Figure: the curved surface p(d1, d2) with its conditional sections p(d1|d2) and p(d2|d1); the plane p̂(d1, d2) is the linear estimator, with error ∆p.]

The plane p̂(d1, d2) has two angular degrees of freedom to use for minimizing ∆p.

Generalization to Nth-Order Nonlinearities

Let
   d1 = c1 + a11p + a12p² + … + a1npⁿ
   d2 = c2 + a21p + a22p² + … + a2npⁿ
   ⋮
   dn = cn + an1p + an2p² + … + annpⁿ
where d1, d2, …, dn are observed noise-free data related to p by nth-order polynomials.

Claim: In non-singular cases there exists an exact linear estimator p̂ = Dd + constant.

Note: For perfect linear estimation the number of different observations must equal or exceed n, the maximum order of the polynomials characterizing the physics.

Nonlinear Estimators for Nonlinear Problems

Approaches:
1) p̂ = Ddaug, where d is augmented via polynomials
2) Rank reduction first via the KLT
3) Neural nets
4) Iterations with D(p̂)

Case (1): daug = [1, d1, d2, d1², d2², d1d2, d1³, …], for example.

Case (2): Use of the KLT to reduce rank.

Option A: d → KLT → d′ → drop low-variance components → nonlinear augmentation → daug → Ddaug → p̂. Useful if d is of high order.

"Karhunen-Loeve Transform" (KLT)

d′ = Kd transforms a noisy redundant data vector d into a [reduced-rank] vector d′ with the most variance in d1′ and the least in dn′, where E[di′dj′] = δijλi and λi ≥ λj for j > i:

   Cdd = E[ddᵗ] = Kᵗ·diag(λ1, λ2, …, λn)·K

[Figure: example atmospheric temperature profiles T(h) for h up to ~10 km, and the first few basis functions, which are the rows of K.]

The KLT is essentially the same as Principal Component Analysis (PCA) and Empirical Orthogonal Functions (EOF).
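To make the KLT/PCA connection concrete, here is a minimal numpy sketch (the synthetic ensemble, its dimensions, and all variable names are illustrative, not from the lecture): it estimates Cdd from zero-mean data, eigendecomposes it so the rows of K are the basis functions, and keeps only the high-variance components of d′ = Kd.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ensemble: 1000 redundant 6-element data vectors driven by
# 2 underlying degrees of freedom plus a little noise (illustrative only).
p = rng.standard_normal((1000, 2))
mixing = rng.standard_normal((2, 6))
d = p @ mixing + 0.05 * rng.standard_normal((1000, 6))
d -= d.mean(axis=0)                  # KLT assumes zero-mean data

# Cdd = E[d d^t]; eigendecompose so that Cdd = K^t diag(lambda_i) K.
Cdd = d.T @ d / len(d)
lam, V = np.linalg.eigh(Cdd)         # eigenvalues come out ascending
order = np.argsort(lam)[::-1]        # reorder so lambda_1 >= lambda_2 >= ...
lam, K = lam[order], V[:, order].T   # basis functions are the rows of K

d_prime = d @ K.T                    # d' = K d; component i has variance lambda_i
r = 2
d_reduced = d_prime[:, :r]           # drop the low-variance components

print(lam)                           # two large eigenvalues, four near the noise floor
```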
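Returning to the two-channel quadratic example above: a minimal numerical check (the coefficient values are made up for illustration) that p̂ = (−c0 + d1 − c2d2)/c1 is linear in (d1, d2) yet recovers p exactly in the noiseless case.

```python
import numpy as np

# Quadratic "physics" with made-up coefficients (illustrative only):
# d1 = a0 + a1 p + a2 p^2,   d2 = b0 + b1 p + b2 p^2
a0, a1, a2 = 1.0, 2.0, 0.5
b0, b1, b2 = -0.3, 1.5, 2.0

p = np.linspace(-3.0, 3.0, 7)           # true parameter values
d1 = a0 + a1 * p + a2 * p**2
d2 = b0 + b1 * p + b2 * p**2

# Eliminating p^2 gives d1 = c0 + c1 p + c2 d2, a plane in (p, d1, d2)-space:
c0 = a0 - a2 * b0 / b2
c1 = a1 - a2 * b1 / b2                  # must be nonzero (the non-singular case)
c2 = a2 / b2

p_hat = (-c0 + d1 - c2 * d2) / c1       # linear in (d1, d2), yet exact
print(np.allclose(p_hat, p))            # True: perfect in the noiseless case
```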
Nonlinear Estimators Using Rank Reduction

Option B: d → nonlinear augmentation → KLT → drop low-variance components → d′ → Dd′ → p̂. Useful if d is very nonlinear (a sketch appears below).

Option C: d → KLT → d′ → drop → nonlinear augmentation → daug → KLT → d′aug → drop → Dd′aug → p̂.

Neural Networks (NN)

[Figure: a feed-forward network. Inputs 1, d1, …, dN feed weighted sums Σ through weights W01, W11, …, WN1, …, WNM, producing outputs d1′, …, dM′; cascaded stages NN1, NN2, NN3 ("hidden layers") map d → d′ → d′′ → p̂.]

Neural nets compute complex polynomials with great efficiency and simplicity. NN coefficients are computed via relaxation optimization (back-propagation). A NN may be recursive or purely feed-forward (FFNN).

Neural Networks (NN), continued

Weights Wij are found by guessing, then perturbing to reduce a cost function on p̂ − p over a "training set" of known (d, p) pairs. Training can be tedious; commercial software tools are commonly used (a perturbation-training sketch appears below).

The degrees of freedom in the training set should exceed the number of weights by a modest factor of ~3-10. Risks of "overtraining" are reduced by monitoring NN performance on independent test data. More nonlinear problems need more layers and more internal nodes, as determined empirically. Neural nets can also perform recognition tasks.

Simple neural nets are easier to train correctly, so merge them with linear techniques, e.g., d → NN → linear estimator → p̂.

Accommodating Nonlinearity via Iteration

A. Library Retrievals

[Figure: p versus d; the nonlinear curve is covered by a library of local linear estimators D0, …, Di, …, Dk, and successive Di are chosen from the library as f(p̂i).]

B. Physics Recomputation

[Figure: block diagram. A "first guess" p̂0 starts the loop: the residual d − d̂i drives the linear operator D to produce a correction ∆p̂i; the running estimate p̂n = p̂0 + Σ(i=1..n) ∆p̂i is passed through the nonlinear physics W to recompute d̂(i+1), and the loop repeats.]

It is essential only that D always yield ∆p̂i in the right direction (so the errors shrink); a sketch appears below.

C. Spatial Iteration

[Figure: samples 4, 5, 6, 7, 8 along a spatial coordinate i; delays feed the previous d̂(i−1) and p̂(i−1) back so that D converts ∆di into ∆p̂i at the next point.]

Works well if the data are smooth. Alternate method: apply the update only if ∆di > threshold.

Multi-Dimensional Estimation

e.g., atmospheric temperature profiles:
1) T̂(h, x, y) = f(d(x, y))
2) T̂(h, x, y) = f(d(x, y), d(x ± ∆, y ± ∆), …)

The estimate is improved by the additional correlations and degrees of freedom.

Genetic Algorithms

1) Characterize the algorithm by a segmented character string, e.g., a binary number.
2) Define the significance of the string, e.g.:
   A. As weights in a linear estimator or neural net.
   B. As a characterization of the architecture of a neural net.
3) Test many algorithms (strings), evaluating the metrics for each over some ensemble of test cases.
4) Randomly combine elements of the best algorithms to form new ones; some random changes may be made too.
5) Iterate steps (3) and (4) until an asymptotic optimum is reached. This approach can be used for pattern recognition, signal detection, parameter estimation, and other purposes.
6) Proper choice of "gene" size accelerates convergence (genes are swapped as units).
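Returning to Option B above: a minimal sketch under made-up two-channel physics (the augmentation terms follow Case (1) on the earlier slide; the retained rank r and all names are illustrative). The data are augmented polynomially, the augmented vector is rank-reduced with a KLT, and a linear estimator D is fit to the retained components.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training ensemble from made-up nonlinear physics p -> (d1, d2).
p = rng.uniform(-1.0, 1.0, 500)
d = np.column_stack([np.tanh(2 * p), p + 0.3 * p**2]) \
    + 0.01 * rng.standard_normal((500, 2))

# Step 1: nonlinear augmentation, daug = [1, d1, d2, d1^2, d2^2, d1*d2, d1^3].
def augment(d):
    d1, d2 = d[:, 0], d[:, 1]
    return np.column_stack([np.ones_like(d1), d1, d2,
                            d1**2, d2**2, d1 * d2, d1**3])

daug = augment(d)

# Step 2: KLT on the augmented vector; drop the low-variance components.
mu = daug.mean(axis=0)
C = np.cov(daug - mu, rowvar=False)
lam, V = np.linalg.eigh(C)
K = V[:, np.argsort(lam)[::-1]].T        # rows of K ordered by variance
r = 4                                    # retained rank (chosen by eye here)
d_prime = (daug - mu) @ K[:r].T

# Step 3: least-squares linear estimator D on the reduced vector.
D, *_ = np.linalg.lstsq(d_prime, p, rcond=None)
p_hat = d_prime @ D
print(np.sqrt(np.mean((p_hat - p) ** 2)))  # training RMS error
```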
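Returning to the feed-forward network diagram: a minimal numpy sketch of the forward computation (the layer sizes, the tanh nonlinearity, and all names are assumptions, not the lecture's). Each layer forms weighted sums Σ of its inputs plus a bias weight, applies a nonlinearity, and passes the result to the next "hidden layer".

```python
import numpy as np

def ffnn(d, weights):
    """Feed-forward pass: x -> tanh(W x + b) per hidden layer, linear output."""
    x = d
    for W, b in weights[:-1]:
        x = np.tanh(W @ x + b)       # weighted sums plus bias, then nonlinearity
    W, b = weights[-1]
    return W @ x + b                 # output p_hat

rng = np.random.default_rng(2)
sizes = [2, 5, 5, 1]                 # N=2 inputs, two hidden layers, one output
weights = [(0.5 * rng.standard_normal((m, n)), np.zeros(m))
           for n, m in zip(sizes[:-1], sizes[1:])]

print(ffnn(np.array([0.3, -0.7]), weights))
```

Expanding each tanh in a power series shows how such a net builds up the complex polynomials mentioned on the slide with relatively few weights.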
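Returning to the "guess, then perturb" weight search described above: a minimal finite-difference stand-in (not the lecture's method or any commercial tool; the tiny net and all names are made up). The cost on p̂ − p is evaluated over a training set of known (d, p) pairs, each weight is perturbed to find the downhill direction, and performance is monitored on independent test data to detect overtraining.

```python
import numpy as np

rng = np.random.default_rng(3)

# Tiny net p_hat(d) = w2 . tanh(W1 d + b1) + b2, weights packed in one vector.
def unpack(w):
    W1 = w[:10].reshape(5, 2); b1 = w[10:15]
    w2 = w[15:20];             b2 = w[20]
    return W1, b1, w2, b2

def p_hat(w, d):
    W1, b1, w2, b2 = unpack(w)
    return np.tanh(d @ W1.T + b1) @ w2 + b2

def cost(w, d, p):
    return np.mean((p_hat(w, d) - p) ** 2)   # MSE on p_hat - p

# Training set and independent test set from made-up nonlinear physics.
def make_set(n):
    p = rng.uniform(-1, 1, n)
    d = np.column_stack([np.tanh(2 * p), p + 0.3 * p**2])
    return d, p

d_tr, p_tr = make_set(200)
d_te, p_te = make_set(200)

w = 0.5 * rng.standard_normal(21)            # initial guess
step, eps = 0.05, 1e-4
for it in range(2000):
    grad = np.zeros_like(w)
    for i in range(len(w)):                  # perturb each weight in turn
        dw = np.zeros_like(w); dw[i] = eps
        grad[i] = (cost(w + dw, d_tr, p_tr) - cost(w - dw, d_tr, p_tr)) / (2 * eps)
    w -= step * grad                         # move downhill on the training cost
    if it % 500 == 0:                        # monitor independent test data
        print(it, cost(w, d_tr, p_tr), cost(w, d_te, p_te))
```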
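Returning to iteration scheme B (physics recomputation): a minimal sketch with a made-up nonlinear physics W and a fixed scalar gain D, illustrating that the loop converges so long as D drives each ∆p̂i in the right direction.

```python
import numpy as np

def W(p):                       # made-up nonlinear "physics": p -> d
    return p + 0.3 * np.sin(p)

p_true = 2.0
d = W(p_true)                   # observed data (noise-free here)

D = 0.7                         # rough inverse slope; only sign and scale matter
p_hat = 0.0                     # "first guess" p_hat_0
for i in range(20):
    d_hat = W(p_hat)            # recompute the physics at the current estimate
    dp = D * (d - d_hat)        # correction: Delta p_hat_i = D (d - d_hat_i)
    p_hat += dp                 # p_hat_n = p_hat_0 + sum of the corrections
    if abs(dp) < 1e-10:
        break

print(p_hat, p_true)            # converges: D*(dW/dp) stays within (0, 2)
```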
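Finally, a minimal sketch of genetic-algorithm steps (1)-(6) (the population size, gene layout, bit depth, and fitness metric are all illustrative): binary strings encode the two weights of a linear estimator, fitness is evaluated over an ensemble of test cases, the best strings are recombined with genes swapped as whole units, and occasional random mutations are applied.

```python
import numpy as np

rng = np.random.default_rng(4)

# Steps 1-2: a string of 2 genes x 8 bits encodes weights [D1, D2] in [-2, 2)
# for the linear estimator p_hat = D1 + D2*d (encoding is illustrative).
GENES, BITS = 2, 8
def decode(s):
    vals = [int("".join(map(str, s[g*BITS:(g+1)*BITS])), 2) for g in range(GENES)]
    return np.array(vals) / 2**BITS * 4 - 2

# Step 3: the metric is negative MSE over an ensemble of test cases.
d = rng.uniform(-1, 1, 200)
p = 0.5 + 1.3 * d                            # made-up truth
def fitness(s):
    D1, D2 = decode(s)
    return -np.mean((D1 + D2 * d - p) ** 2)

pop = rng.integers(0, 2, (40, GENES * BITS)) # random initial strings
for gen in range(100):                       # step 5: iterate (3) and (4)
    scores = np.array([fitness(s) for s in pop])
    best = pop[np.argsort(scores)[-10:]]     # keep the fittest strings
    children = []
    for _ in range(len(pop)):
        ma, pa = best[rng.integers(10, size=2)]
        child = np.concatenate([             # step 6: swap whole genes as units
            (ma if rng.random() < 0.5 else pa)[g*BITS:(g+1)*BITS]
            for g in range(GENES)])
        flip = rng.random(child.size) < 0.01 # step 4: occasional random mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.array(children)

print(decode(max(pop, key=fitness)))         # should land near [0.5, 1.3]
```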