Convex Optimization — Boyd & Vandenberghe
6. Approximation and fitting
• norm approximation
• least-norm problems
• regularized approximation
• robust approximation
6–1
Norm approximation
minimize ‖Ax − b‖
(A ∈ R^{m×n} with m ≥ n, ‖ · ‖ is a norm on R^m)
interpretations of solution x⋆ = argmin_x ‖Ax − b‖:
• geometric: Ax⋆ is point in R(A) closest to b
• estimation: linear measurement model
y = Ax + v
y are measurements, x is unknown, v is measurement error
given y = b, best guess of x is x⋆
• optimal design: x are design variables (input), Ax is result (output)
x⋆ is design that best approximates desired result b
Approximation and fitting
6–2
examples
• least-squares approximation (‖ · ‖₂): solution satisfies the normal equations
AᵀAx = Aᵀb
(x⋆ = (AᵀA)⁻¹Aᵀb if rank A = n)
• Chebyshev approximation (‖ · ‖∞): can be solved as an LP
minimize t
subject to −t1 ⪯ Ax − b ⪯ t1
• sum of absolute residuals approximation (‖ · ‖₁): can be solved as an LP
minimize 1ᵀy
subject to −y ⪯ Ax − b ⪯ y
Approximation and fitting
6–3
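A minimal sketch of the three examples above in NumPy/CVXPY; the matrix A, vector b, and problem sizes are random stand-ins rather than data from the slides, and CVXPY is just one modeling tool that could be used:

import numpy as np
import cvxpy as cp

np.random.seed(0)
m, n = 100, 30
A = np.random.randn(m, n)
b = np.random.randn(m)

# least-squares approximation: solve the normal equations A^T A x = A^T b
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

# Chebyshev approximation: minimize ||Ax - b||_inf, equivalent to the LP above
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(A @ x - b, "inf"))).solve()
x_cheb = x.value

# sum of absolute residuals: minimize ||Ax - b||_1, likewise solvable as an LP
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(A @ x - b, 1))).solve()
x_l1 = x.value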
Penalty function approximation
minimize φ(r1) + · · · + φ(rm)
subject to r = Ax − b
(A ∈ R^{m×n}, φ : R → R is a convex penalty function)
examples
• quadratic: φ(u) = u²
• deadzone-linear with width a: φ(u) = max{0, |u| − a}
• log-barrier with limit a: φ(u) = −a² log(1 − (u/a)²) for |u| < a, φ(u) = ∞ otherwise
(figure: the three penalty functions φ(u) for −1.5 ≤ u ≤ 1.5)
Approximation and fitting
6–4
example (m = 100, n = 30): histogram of residuals for penalties
φ(u) = u²,   φ(u) = |u|,   φ(u) = max{0, |u| − a},   φ(u) = −log(1 − u²)
(figure: four histograms of the residuals r, one per penalty: p = 1, p = 2, deadzone-linear, log barrier)
shape of penalty function has large effect on distribution of residuals
Approximation and fitting
6–5
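A sketch of penalty function approximation for the deadzone-linear and log-barrier penalties using CVXPY; the data generation, the deadzone width a = 0.5, and the barrier limit a = 1 are invented, chosen so that the log-barrier problem is feasible (some x keeps every residual inside (−1, 1)):

import numpy as np
import cvxpy as cp

np.random.seed(0)
m, n = 100, 30
A = np.random.randn(m, n)
b = A @ np.random.randn(n) + 0.25 * np.random.randn(m)   # keeps residuals small

x = cp.Variable(n)
r = A @ x - b

# deadzone-linear penalty with width a = 0.5: sum of max{0, |r_i| - a}
cp.Problem(cp.Minimize(cp.sum(cp.pos(cp.abs(r) - 0.5)))).solve()
r_deadzone = r.value

# log-barrier penalty with limit a = 1: sum of -log(1 - r_i^2),
# which implicitly constrains |r_i| < 1
cp.Problem(cp.Minimize(cp.sum(-cp.log(1 - cp.square(r))))).solve()
r_logbar = r.value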
Huber penalty function (with parameter M)
φhub(u) = u²             for |u| ≤ M
φhub(u) = M(2|u| − M)    for |u| > M
linear growth for large u makes approximation less sensitive to outliers
• left: Huber penalty for M = 1
• right: affine function f(t) = α + βt fitted to 42 points (ti, yi) (circles)
using quadratic (dashed) and Huber (solid) penalty
Approximation and fitting
6–6
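A sketch of the quadratic-versus-Huber comparison using CVXPY's huber atom; the 42 synthetic points and the injected outliers below are made up, not the dataset behind the figure:

import numpy as np
import cvxpy as cp

np.random.seed(0)
t = np.linspace(-10, 10, 42)
y = 1.0 + 0.5 * t + 0.3 * np.random.randn(42)
y[::10] += 8.0                               # a few large outliers

alpha, beta = cp.Variable(), cp.Variable()
r = alpha + beta * t - y

# quadratic penalty (ordinary least squares): pulled toward the outliers
cp.Problem(cp.Minimize(cp.sum_squares(r))).solve()
fit_quadratic = (alpha.value, beta.value)

# Huber penalty with M = 1: quadratic for small residuals, linear for large ones
cp.Problem(cp.Minimize(cp.sum(cp.huber(r, M=1)))).solve()
fit_huber = (alpha.value, beta.value)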
Least-norm problems
minimize ‖x‖
subject to Ax = b
(A ∈ R^{m×n} with m ≤ n, ‖ · ‖ is a norm on R^n)
interpretations of solution x⋆ = argmin_{Ax=b} ‖x‖:
• geometric: x⋆ is point in affine set {x | Ax = b} with minimum distance to 0
• estimation: b = Ax are (perfect) measurements of x; x⋆ is smallest ('most plausible') estimate consistent with measurements
• design: x are design variables (inputs); b are required results (outputs); x⋆ is smallest ('most efficient') design that satisfies requirements
Approximation and fitting
6–7
examples
• least-squares solution of linear equations (‖ · ‖₂):
can be solved via optimality conditions
2x + Aᵀν = 0,   Ax = b
• minimum sum of absolute values (‖ · ‖₁): can be solved as an LP
minimize 1ᵀy
subject to −y ⪯ x ⪯ y,   Ax = b
tends to produce sparse solution x⋆
extension: least-penalty problem
minimize φ(x1) + · · · + φ(xn)
subject to Ax = b
φ : R → R is convex penalty function
Approximation and fitting
6–8
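A sketch of the two least-norm examples above; A and b are random stand-ins with m ≤ n:

import numpy as np
import cvxpy as cp

np.random.seed(0)
m, n = 30, 100                    # underdetermined system, m <= n
A = np.random.randn(m, n)
b = np.random.randn(m)

# minimum l2-norm solution: from the optimality conditions, x = A^T (A A^T)^{-1} b
x_ln2 = A.T @ np.linalg.solve(A @ A.T, b)

# minimum l1-norm solution as an LP; tends to be sparse
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(x, 1)), [A @ x == b]).solve()
x_ln1 = x.value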
Regularized approximation
minimize (w.r.t. R^2_+) (‖Ax − b‖, ‖x‖)
A ∈ R^{m×n}, norms on R^m and R^n can be different
interpretation: find good approximation Ax ≈ b with small x
• estimation: linear measurement model y = Ax + v, with prior
knowledge that ‖x‖ is small
• optimal design: small x is cheaper or more efficient, or the linear
model y = Ax is only valid for small x
• robust approximation: good approximation Ax ≈ b with small x is
less sensitive to errors in A than good approximation with large x
Approximation and fitting
6–9
Scalarized problem
minimize ‖Ax − b‖ + γ‖x‖
• solution for γ > 0 traces out optimal trade-off curve
• other common method: minimize ‖Ax − b‖² + δ‖x‖² with δ > 0
Tikhonov regularization
minimize ‖Ax − b‖₂² + δ‖x‖₂²
can be solved as a least-squares problem
minimize ‖ [A; √δ I] x − [b; 0] ‖₂²   (A stacked over √δ I, b stacked over 0)
solution x⋆ = (AᵀA + δI)⁻¹Aᵀb
Approximation and fitting
6–10
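A sketch of Tikhonov regularization via the stacked least-squares formulation, checked against the closed form; A, b, and δ are made up:

import numpy as np

np.random.seed(0)
m, n, delta = 100, 30, 0.1
A = np.random.randn(m, n)
b = np.random.randn(m)

# stacked problem: minimize || [A; sqrt(delta) I] x - [b; 0] ||_2^2
A_stack = np.vstack([A, np.sqrt(delta) * np.eye(n)])
b_stack = np.concatenate([b, np.zeros(n)])
x_tik = np.linalg.lstsq(A_stack, b_stack, rcond=None)[0]

# closed-form solution x = (A^T A + delta I)^{-1} A^T b agrees
x_closed = np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)
assert np.allclose(x_tik, x_closed)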
Optimal input design
linear dynamical system with impulse response h:
y(t) = Σ_{τ=0}^{t} h(τ) u(t − τ),   t = 0, 1, . . . , N
input design problem: multicriterion problem with 3 objectives
1. tracking error with desired output ydes: Jtrack = Σ_{t=0}^{N} (y(t) − ydes(t))²
2. input magnitude: Jmag = Σ_{t=0}^{N} u(t)²
3. input variation: Jder = Σ_{t=0}^{N−1} (u(t + 1) − u(t))²
track desired output using a small and slowly varying input signal
regularized least-squares formulation
minimize Jtrack + δJder + ηJmag
for fixed δ, η, a least-squares problem in u(0), . . . , u(N )
Approximation and fitting
6–11
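A sketch of the regularized least-squares input design in NumPy; the impulse response h, desired output ydes, and weights δ, η below are invented for illustration, not the data behind the figures on the next slide:

import numpy as np

N = 200
t = np.arange(N + 1)
h = 0.9 ** t * np.cos(0.5 * t)                     # made-up impulse response
y_des = np.sign(np.sin(2 * np.pi * t / 100.0))     # made-up desired output
delta, eta = 0.1, 0.01

# convolution matrix H with y = H u, H[t, tau] = h(t - tau) for tau <= t
H = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    H[i, : i + 1] = h[: i + 1][::-1]

# D u gives the first differences u(t+1) - u(t), so Jder = ||D u||_2^2
D = np.diff(np.eye(N + 1), axis=0)

# minimize Jtrack + delta*Jder + eta*Jmag as one stacked least-squares problem
A_stack = np.vstack([H, np.sqrt(delta) * D, np.sqrt(eta) * np.eye(N + 1)])
b_stack = np.concatenate([y_des, np.zeros(N), np.zeros(N + 1)])
u = np.linalg.lstsq(A_stack, b_stack, rcond=None)[0]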
example: 3 solutions on optimal trade-off surface
(top) δ = 0, small η; (middle) δ = 0, larger η; (bottom) large δ
(figure: three pairs of plots of the input u(t) and output y(t), 0 ≤ t ≤ 200, one pair per solution)
Approximation and fitting
6–12
Signal reconstruction
minimize (w.r.t. R^2_+) (‖x̂ − xcor‖₂, φ(x̂))
• x ∈ R^n is unknown signal
• xcor = x + v is (known) corrupted version of x, with additive noise v
• variable x̂ (reconstructed signal) is estimate of x
• φ : R^n → R is regularization function or smoothing objective
examples: quadratic smoothing, total variation smoothing:
φquad(x̂) = Σ_{i=1}^{n−1} (x̂_{i+1} − x̂_i)²,   φtv(x̂) = Σ_{i=1}^{n−1} |x̂_{i+1} − x̂_i|
Approximation and fitting
6–13
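A sketch of quadratic and total-variation reconstruction with CVXPY (via the cp.diff and cp.tv atoms); the piecewise-constant test signal, noise level, and trade-off weights below are invented:

import numpy as np
import cvxpy as cp

np.random.seed(0)
n = 2000
x_true = np.zeros(n)
x_true[500:1000] = 1.0
x_true[1000:1500] = -1.0                     # sharp transitions
x_cor = x_true + 0.2 * np.random.randn(n)    # corrupted signal

x_hat = cp.Variable(n)
fit = cp.sum_squares(x_hat - x_cor)

# quadratic smoothing: phi_quad(xhat) = sum of squared differences
cp.Problem(cp.Minimize(fit + 10.0 * cp.sum_squares(cp.diff(x_hat)))).solve()
x_quad = x_hat.value

# total variation smoothing: phi_tv(xhat) = sum of absolute differences
cp.Problem(cp.Minimize(fit + 1.0 * cp.tv(x_hat))).solve()
x_tv = x_hat.value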
quadratic smoothing example
(figure, left: original signal x and noisy signal xcor, 0 ≤ i ≤ 4000; right: three solutions x̂ on the trade-off curve of ‖x̂ − xcor‖₂ versus φquad(x̂))
Approximation and fitting
6–14
total variation reconstruction example
(figure, left: original signal x and noisy signal xcor, 0 ≤ i ≤ 2000; right: three solutions x̂ on the trade-off curve of ‖x̂ − xcor‖₂ versus φquad(x̂))
quadratic smoothing smooths out noise and sharp transitions in signal
Approximation and fitting
6–15
(figure, left: original signal x and noisy signal xcor, 0 ≤ i ≤ 2000; right: three solutions x̂ on the trade-off curve of ‖x̂ − xcor‖₂ versus φtv(x̂))
total variation smoothing preserves sharp transitions in signal
Approximation and fitting
6–16
Robust approximation
minimize ‖Ax − b‖ with uncertain A
two approaches:
• stochastic: assume A is random, minimize E ‖Ax − b‖
• worst-case: set A of possible values of A, minimize sup_{A∈A} ‖Ax − b‖
tractable only in special cases (certain norms ‖ · ‖, distributions, sets A)
example: A(u) = A0 + uA1
• xnom minimizes ‖A0x − b‖₂²
• xstoch minimizes E ‖A(u)x − b‖₂² with u uniform on [−1, 1]
• xwc minimizes sup_{−1≤u≤1} ‖A(u)x − b‖₂²
(figure: r(u) = ‖A(u)x − b‖₂ versus u, −2 ≤ u ≤ 2, for xnom, xstoch, and xwc)
Approximation and fitting
6–17
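A sketch of the three solutions for A(u) = A0 + uA1; the matrices and vector below are random stand-ins, and the stochastic and worst-case objectives are rewritten using E u = 0, E u² = 1/3 and the fact that the supremum of a convex quadratic in u over [−1, 1] is attained at u = ±1:

import numpy as np
import cvxpy as cp

np.random.seed(0)
m, n = 20, 10
A0 = np.random.randn(m, n)
A1 = np.random.randn(m, n)
b = np.random.randn(m)

# nominal: ignore the uncertainty
x_nom = np.linalg.lstsq(A0, b, rcond=None)[0]

# stochastic: E ||(A0 + u A1)x - b||^2 = ||A0 x - b||^2 + (1/3)||A1 x||^2
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.sum_squares(A0 @ x - b)
                       + cp.sum_squares(A1 @ x) / 3.0)).solve()
x_stoch = x.value

# worst case over u in [-1, 1]: maximum is attained at u = 1 or u = -1
x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.maximum(cp.sum_squares((A0 + A1) @ x - b),
                                  cp.sum_squares((A0 - A1) @ x - b)))).solve()
x_wc = x.value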
stochastic robust LS with A = Ā + U, U random, E U = 0, E UᵀU = P
minimize E ‖(Ā + U)x − b‖₂²
• explicit expression for objective:
E ‖Ax − b‖₂² = E ‖Āx − b + Ux‖₂²
             = ‖Āx − b‖₂² + E xᵀUᵀUx
             = ‖Āx − b‖₂² + xᵀPx
• hence, robust LS problem is equivalent to LS problem
minimize ‖Āx − b‖₂² + ‖P^{1/2}x‖₂²
• for P = δI, get Tikhonov regularized problem
minimize ‖Āx − b‖₂² + δ‖x‖₂²
Approximation and fitting
6–18
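A sketch of stochastic robust LS as the equivalent ordinary LS problem; Ā, b, and P are invented, and any matrix square root of P works since ‖P^{1/2}x‖₂² = xᵀPx:

import numpy as np

np.random.seed(0)
m, n = 50, 20
Abar = np.random.randn(m, n)
b = np.random.randn(m)
S = np.random.randn(n, n)
P = S.T @ S / n                               # stand-in for P = E[U^T U]

# minimize ||Abar x - b||_2^2 + ||P^{1/2} x||_2^2 by stacking
L = np.linalg.cholesky(P)                     # P = L L^T, so x^T P x = ||L^T x||^2
A_stack = np.vstack([Abar, L.T])
b_stack = np.concatenate([b, np.zeros(n)])
x_rls = np.linalg.lstsq(A_stack, b_stack, rcond=None)[0]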
worst-case robust LS with A = {Ā + u1A1 + · · · + upAp | ‖u‖₂ ≤ 1}
minimize sup_{A∈A} ‖Ax − b‖₂² = sup_{‖u‖₂≤1} ‖P(x)u + q(x)‖₂²
where P(x) = [A1x  A2x  · · ·  Apx],   q(x) = Āx − b
• from page 5–14, strong duality holds between the problems
maximize ‖Pu + q‖₂²
subject to ‖u‖₂² ≤ 1
and
minimize t + λ
subject to [ I    P    q ]
           [ Pᵀ   λI   0 ] ⪰ 0
           [ qᵀ   0    t ]
• hence, robust LS problem is equivalent to SDP
minimize t + λ
subject to [ I       P(x)   q(x) ]
           [ P(x)ᵀ   λI     0    ] ⪰ 0
           [ q(x)ᵀ   0      t    ]
Approximation and fitting
6–19
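A sketch of the SDP above in CVXPY, assuming cp.bmat, cp.reshape, and a semidefinite-capable solver such as SCS are available; Ā, A1, A2, and b are random stand-ins and the block matrix mirrors the constraint on the slide:

import numpy as np
import cvxpy as cp

np.random.seed(0)
m, n, p = 20, 10, 2
Abar = np.random.randn(m, n)
As = [np.random.randn(m, n) for _ in range(p)]
b = np.random.randn(m)

x, t, lam = cp.Variable(n), cp.Variable(), cp.Variable()

# P(x) has columns A_i x; q(x) = Abar x - b
Px = cp.hstack([cp.reshape(Ai @ x, (m, 1)) for Ai in As])
qx = cp.reshape(Abar @ x - b, (m, 1))

# [[I, P(x), q(x)], [P(x)^T, lam I, 0], [q(x)^T, 0, t]] must be PSD
M = cp.bmat([
    [np.eye(m), Px,               qx],
    [Px.T,      lam * np.eye(p),  np.zeros((p, 1))],
    [qx.T,      np.zeros((1, p)), cp.reshape(t, (1, 1))],
])
cp.Problem(cp.Minimize(t + lam), [M >> 0]).solve(solver=cp.SCS)
x_wc = x.value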
example: histogram of residuals
r(u) = ‖(A0 + u1A1 + u2A2)x − b‖₂
with u uniformly distributed on unit disk, for three values of x
(figure: histograms of r(u) for xls, xtik, and xrls, frequency versus r(u) on 0 ≤ r(u) ≤ 5)
• xls minimizes ‖A0x − b‖₂
• xtik minimizes ‖A0x − b‖₂² + δ‖x‖₂² (Tikhonov solution)
• xrls minimizes sup_{A∈A} ‖Ax − b‖₂² + ‖x‖₂²
Approximation and fitting
6–20