Convex Optimization — Boyd & Vandenberghe

6. Approximation and fitting

• norm approximation
• least-norm problems
• regularized approximation
• robust approximation

6–1

Norm approximation

minimize ‖Ax − b‖

(A ∈ R^{m×n} with m ≥ n, ‖·‖ is a norm on R^m)

interpretations of solution x⋆ = argmin_x ‖Ax − b‖:

• geometric: Ax⋆ is point in R(A) closest to b
• estimation: linear measurement model y = Ax + v; y are measurements, x is unknown, v is measurement error; given y = b, best guess of x is x⋆
• optimal design: x are design variables (input), Ax is result (output); x⋆ is design that best approximates desired result b

Approximation and fitting 6–2

examples

• least-squares approximation (‖·‖₂): solution satisfies normal equations

  AᵀAx = Aᵀb   (x⋆ = (AᵀA)⁻¹Aᵀb if rank A = n)

• Chebyshev approximation (‖·‖∞): can be solved as an LP

  minimize t
  subject to −t1 ⪯ Ax − b ⪯ t1

• sum of absolute residuals approximation (‖·‖₁): can be solved as an LP

  minimize 1ᵀy
  subject to −y ⪯ Ax − b ⪯ y

Approximation and fitting 6–3

Penalty function approximation

minimize φ(r₁) + · · · + φ(r_m)
subject to r = Ax − b

(A ∈ R^{m×n}, φ : R → R is a convex penalty function)

examples

• quadratic: φ(u) = u²
• deadzone-linear with width a: φ(u) = max{0, |u| − a}
• log-barrier with limit a:

  φ(u) = −a² log(1 − (u/a)²)   if |u| < a
         ∞                      otherwise

[figure: the quadratic, deadzone-linear, and log-barrier penalties φ(u) plotted versus u]

Approximation and fitting 6–4

example (m = 100, n = 30): histogram of residuals for penalties

φ(u) = u²,  φ(u) = |u|,  φ(u) = max{0, |u| − a},  φ(u) = −log(1 − u²)

[figure: histograms of the residuals r for the p = 1, p = 2, deadzone-linear, and log-barrier penalties]

shape of penalty function has large effect on distribution of residuals

Approximation and fitting 6–5

Huber penalty function (with parameter M)

φ_hub(u) = u²             if |u| ≤ M
           M(2|u| − M)    if |u| > M

linear growth for large u makes approximation less sensitive to outliers

[figure: left, Huber penalty φ_hub(u) for M = 1; right, affine function f(t) = α + βt fitted to 42 points (tᵢ, yᵢ) (circles) using quadratic (dashed) and Huber (solid) penalty]

Approximation and fitting 6–6

Least-norm problems

minimize ‖x‖
subject to Ax = b

(A ∈ R^{m×n} with m ≤ n, ‖·‖ is a norm on R^n)

interpretations of solution x⋆ = argmin_{Ax=b} ‖x‖:

• geometric: x⋆ is point in affine set {x | Ax = b} with minimum distance to 0
• estimation: b = Ax are (perfect) measurements of x; x⋆ is smallest ('most plausible') estimate consistent with measurements
• design: x are design variables (inputs); b are required results (outputs); x⋆ is smallest ('most efficient') design that satisfies requirements

Approximation and fitting 6–7

examples

• least-squares solution of linear equations (‖·‖₂): can be solved via optimality conditions

  2x + Aᵀν = 0,  Ax = b

• minimum sum of absolute values (‖·‖₁): can be solved as an LP

  minimize 1ᵀy
  subject to −y ⪯ x ⪯ y,  Ax = b

  tends to produce sparse solution x⋆ (a numerical sketch follows this page)

extension: least-penalty problem

  minimize φ(x₁) + · · · + φ(x_n)
  subject to Ax = b

φ : R → R is convex penalty function

Approximation and fitting 6–8
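The least-norm problems above are easy to try numerically. Below is a minimal sketch, not part of the original slides, assuming Python with NumPy and the CVXPY modeling package; the matrix A and vector b are made-up random data, and CVXPY handles the LP reformulation of the ℓ₁ problem internally. It typically illustrates the sparsity of the ℓ₁-norm solution noted on page 6–8.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 10, 30                      # wide A (m <= n), so Ax = b is underdetermined
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# least ell_2-norm solution: minimize ||x||_2 subject to Ax = b
x2 = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(x2, 2)), [A @ x2 == b]).solve()

# least ell_1-norm solution (solvable as an LP): tends to be sparse
x1 = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(x1, 1)), [A @ x1 == b]).solve()

print("nonzero entries (|x_i| > 1e-6):",
      int(np.sum(np.abs(x2.value) > 1e-6)), "for ell_2,",
      int(np.sum(np.abs(x1.value) > 1e-6)), "for ell_1")

With random data of this shape the ℓ₁ solution usually has on the order of m nonzero entries, while the ℓ₂ solution is dense.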
Regularized approximation

minimize (w.r.t. R²₊)  (‖Ax − b‖, ‖x‖)

A ∈ R^{m×n}, norms on R^m and R^n can be different

interpretation: find good approximation Ax ≈ b with small x

• estimation: linear measurement model y = Ax + v, with prior knowledge that ‖x‖ is small
• optimal design: small x is cheaper or more efficient, or the linear model y = Ax is only valid for small x
• robust approximation: good approximation Ax ≈ b with small x is less sensitive to errors in A than good approximation with large x

Approximation and fitting 6–9

Scalarized problem

minimize ‖Ax − b‖ + γ‖x‖

• solution for γ > 0 traces out optimal trade-off curve
• other common method: minimize ‖Ax − b‖² + δ‖x‖² with δ > 0

Tikhonov regularization

minimize ‖Ax − b‖₂² + δ‖x‖₂²

can be solved as a least-squares problem

  minimize ‖ [A; √δ I] x − [b; 0] ‖₂²

solution x⋆ = (AᵀA + δI)⁻¹Aᵀb

Approximation and fitting 6–10

Optimal input design

linear dynamical system with impulse response h:

y(t) = Σ_{τ=0}^{t} h(τ) u(t − τ),  t = 0, 1, . . . , N

input design problem: multicriterion problem with 3 objectives

1. tracking error with desired output y_des:  J_track = Σ_{t=0}^{N} (y(t) − y_des(t))²
2. input magnitude:  J_mag = Σ_{t=0}^{N} u(t)²
3. input variation:  J_der = Σ_{t=0}^{N−1} (u(t+1) − u(t))²

track desired output using a small and slowly varying input signal

regularized least-squares formulation

minimize J_track + δ J_der + η J_mag

for fixed δ, η, a least-squares problem in u(0), . . . , u(N)

Approximation and fitting 6–11

example: 3 solutions on optimal trade-off surface

(top) δ = 0, small η; (middle) δ = 0, larger η; (bottom) large δ

[figure: input u(t) and output y(t) versus t for the three solutions]

Approximation and fitting 6–12
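The Tikhonov problem on page 6–10 is easy to check numerically. Below is a minimal NumPy sketch, not part of the original slides, with made-up random data A, b and a made-up value of δ; it verifies that the stacked least-squares formulation and the closed-form solution x⋆ = (AᵀA + δI)⁻¹Aᵀb agree.

import numpy as np

rng = np.random.default_rng(1)
m, n, delta = 100, 30, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# stacked least-squares form: minimize || [A; sqrt(delta) I] x - [b; 0] ||_2^2
A_stacked = np.vstack([A, np.sqrt(delta) * np.eye(n)])
b_stacked = np.concatenate([b, np.zeros(n)])
x_stacked, *_ = np.linalg.lstsq(A_stacked, b_stacked, rcond=None)

# closed form x* = (A^T A + delta I)^{-1} A^T b
x_closed = np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)

assert np.allclose(x_stacked, x_closed)

The stacked form is usually preferred in practice since it avoids forming AᵀA explicitly.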
Signal reconstruction

minimize (w.r.t. R²₊)  (‖x̂ − x_cor‖₂, φ(x̂))

• x ∈ R^n is unknown signal
• x_cor = x + v is (known) corrupted version of x, with additive noise v
• variable x̂ (reconstructed signal) is estimate of x
• φ : R^n → R is regularization function or smoothing objective

examples: quadratic smoothing, total variation smoothing:

φ_quad(x̂) = Σ_{i=1}^{n−1} (x̂_{i+1} − x̂_i)²,   φ_tv(x̂) = Σ_{i=1}^{n−1} |x̂_{i+1} − x̂_i|

Approximation and fitting 6–13

quadratic smoothing example

[figure: left, original signal x and noisy signal x_cor; right, three solutions on trade-off curve ‖x̂ − x_cor‖₂ versus φ_quad(x̂)]

Approximation and fitting 6–14

total variation reconstruction example

[figure: left, original signal x and noisy signal x_cor; right, three solutions on trade-off curve ‖x̂ − x_cor‖₂ versus φ_quad(x̂)]

quadratic smoothing smooths out noise and sharp transitions in signal

Approximation and fitting 6–15

[figure: left, original signal x and noisy signal x_cor; right, three solutions on trade-off curve ‖x̂ − x_cor‖₂ versus φ_tv(x̂)]

total variation smoothing preserves sharp transitions in signal

Approximation and fitting 6–16

Robust approximation

minimize ‖Ax − b‖ with uncertain A

two approaches:

• stochastic: assume A is random, minimize E ‖Ax − b‖
• worst-case: set 𝒜 of possible values of A, minimize sup_{A∈𝒜} ‖Ax − b‖

tractable only in special cases (certain norms ‖·‖, distributions, sets 𝒜)

example: A(u) = A₀ + uA₁

• x_nom minimizes ‖A₀x − b‖₂²
• x_stoch minimizes E ‖A(u)x − b‖₂² with u uniform on [−1, 1]
• x_wc minimizes sup_{−1≤u≤1} ‖A(u)x − b‖₂²

[figure: r(u) = ‖A(u)x − b‖₂ versus u for x_nom, x_stoch, x_wc]

Approximation and fitting 6–17

stochastic robust LS with A = Ā + U, U random, E U = 0, E UᵀU = P

minimize E ‖(Ā + U)x − b‖₂²

• explicit expression for objective:

  E ‖Ax − b‖₂² = E ‖Āx − b + Ux‖₂²
               = ‖Āx − b‖₂² + E xᵀUᵀUx
               = ‖Āx − b‖₂² + xᵀPx

• hence, robust LS problem is equivalent to LS problem

  minimize ‖Āx − b‖₂² + ‖P^{1/2}x‖₂²

• for P = δI, get Tikhonov regularized problem

  minimize ‖Āx − b‖₂² + δ‖x‖₂²

Approximation and fitting 6–18

worst-case robust LS with 𝒜 = {Ā + u₁A₁ + · · · + u_pA_p | ‖u‖₂ ≤ 1}

minimize sup_{A∈𝒜} ‖Ax − b‖₂² = sup_{‖u‖₂≤1} ‖P(x)u + q(x)‖₂²

where P(x) = [A₁x  A₂x  · · ·  A_px],  q(x) = Āx − b

• from page 5–14, strong duality holds between the following problems

  maximize ‖Pu + q‖₂²
  subject to ‖u‖₂² ≤ 1

  minimize t + λ
  subject to  [ I    P    q ]
              [ Pᵀ   λI   0 ]  ⪰ 0
              [ qᵀ   0    t ]

• hence, robust LS problem is equivalent to SDP

  minimize t + λ
  subject to  [ I        P(x)   q(x) ]
              [ P(x)ᵀ    λI     0    ]  ⪰ 0
              [ q(x)ᵀ    0      t    ]

Approximation and fitting 6–19

example: histogram of residuals

r(u) = ‖(A₀ + u₁A₁ + u₂A₂)x − b‖₂

with u uniformly distributed on unit disk, for three values of x

[figure: histograms of r(u) for x_ls, x_tik, x_rls]

• x_ls minimizes ‖A₀x − b‖₂
• x_tik minimizes ‖A₀x − b‖₂² + δ‖x‖₂² (Tikhonov solution)
• x_rls minimizes sup_{A∈𝒜} ‖Ax − b‖₂² + ‖x‖₂²

Approximation and fitting 6–20
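The reduction of stochastic robust LS to an ordinary regularized LS problem (page 6–18) is straightforward to code. Below is a minimal NumPy sketch, not part of the original slides; Ā, b, and the covariance-like matrix P = E UᵀU are made-up data, and the variable names are chosen only for illustration.

import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 20
Abar = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# made-up positive semidefinite P = E U^T U
G = rng.standard_normal((n, n))
P = G.T @ G / n

# equivalent LS problem: minimize ||Abar x - b||_2^2 + ||P^{1/2} x||_2^2,
# with closed-form solution x* = (Abar^T Abar + P)^{-1} Abar^T b
x_rls = np.linalg.solve(Abar.T @ Abar + P, Abar.T @ b)

# special case P = delta*I recovers the Tikhonov-regularized solution
delta = 0.5
x_tik = np.linalg.solve(Abar.T @ Abar + delta * np.eye(n), Abar.T @ b)

In words: averaging the squared residual over the random perturbation U simply adds the quadratic term xᵀPx, so no stochastic optimization machinery is needed.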