HW3

advertisement
This assignment asks you to match a set of points P=(p1,…,pn) to another set of points Q=(q1,…,qn) by
finding an optimal transformation T. Similar problems are commonly faced in image processing, image
retrieval, computer vision, and robotics.
1. Consider the version of this problem in two dimensions, and consider finding a linear
transformation 𝑇(𝑥) = 𝐴𝑥, where A is a 2x2 matrix
𝑎11 𝑎12
𝐴 = [𝑎
]
21 𝑎22
Write out the expression f(A) that computes the sum of squared distances between pi and Aqi in
terms of the variable matrix A. Then, compute the partial derivatives of this expression with
respect to a11, a12, a21, and a22. You may perform the partial derivatives individually, or if you are
feeling up to working with matrix derivatives, use the facts that ∇𝐴 (𝑢𝑇 𝐴𝑣) = 𝑢𝑣 𝑇 and
∇𝐴 (𝑢𝑇 𝐴𝑇 𝐴 𝑢) = 2𝐴𝑢𝑢𝑇 for any vectors u, v.
2. Assemble the partial derivatives that you obtained in question 1 into matrix form:
𝜕𝑓
𝜕𝑓
𝜕𝑎11 𝜕𝑎12
(𝐴)
𝜕𝑓
𝜕𝑓
[𝜕𝑎21 𝜕𝑎22 ]
and use matrix algebra to solve for the value(s) of A that are critical points of f(A). What
matrices do you have to assume are invertible?
3. If you wished to solve this problem using Newton’s method, how fast would it converge? Think
about the form of the gradient, and give the tightest answer that you can for this particular
problem.
4. Now consider the constraint that the matrix must be symmetric (i.e., A=AT). Write down the
condition involving Lagrange multipliers that must be satisfied by a solution to this problem
(hint: there should be four variables, one constraint, and one Lagrange multiplier). Rearrange
the expression so that A is on the left hand side and an expression involving  (and no other
optimization variables) is on the right hand side. Consider how this expression differs from your
answer for question 2. In light of this difference and the symmetricity constraint, what is the
role of ? (You are not asked to solve for the optimal symmetric matrix)
5. Now consider the constraint that the matrix must be orthogonal (i.e., ATA=I). Write down the
condition involving Lagrange multipliers that must be satisfied by a solution to this problem
(hint: there should be four variables, four constraints, and four Lagrange multipliers).
6. Challenge question: solve the condition you wrote in question 5 to produce the optimal
orthogonal transformation matrix(ces). Hint: write out the derivatives of the constraints as
products of A with constant matrices. What is the geometric interpretation of the solution(s)?
1. f(A) = sumi ||pi – Aqi||2= sumi (piTpi + qiTATAqi – 2 piTAqi )
2. f(A) = sumi [2 A qiqiT – 2 piqiT ] = 0 implies A = (sumi piqiT) (sumj qjqjT)-1. Here we have to have
the matrix sumj qjqjT be non-singular, which essentially means that all of the qi’s need to span a
2-dimensional subspace of R2.
3. Newton’s method would converge in one step. Notice that the gradient is a linear function of
each element of A, and so the Hessian matrix is constant. Also recall that Newton’s method,
when applied to minimization, constructs a best-fit quadratic form to a function and solves for
the optimal step for the quadratic form. Since the quadratic form is exact in this case, the
optimum is reached in a single step.
4. The constraint A=AT can be written as g(A) = A12-A21=0 (because the constraints on A11 and A22
are vacuous). Taking the matrix derivative of g with respect to A, we have
g(A) = [0 -1]
[1 0]
So the Lagrange equation that needs to be solved is f(A*) + * g = 0.
We need to find A* = (sum piqiT ) (sum qiqiT)-1 + ½ * g (sum qiqiT)-1
so that A*12 = A*21.
To solve this equation, let Let B = (sum qiqiT)-1 and C= sum piqiT.
A* = C B + ½ * [0 -1;1 0] B
Look at off-diagonal elements of CB: C11 B12 + C12 B22 and C21 B11 + C22 B21
Look at off diagonal elements of [0 -1;1 0]B: -B22 and B11.
Solve for * so that the off diagonal elements are equal: C11 B12 + C12 B22 – ½ lambda*B22 = C21 B11 + C22
B21 + ½ *B11
=>
2(C11 B12 + C12 B22 - C21 B11 - C22 B21 ))/ (B22+B11) = *
A21*= A12* = [B11(C11 B12 + C12 B22)– B22(C21 B11 + C22 B21 ))]/ (B22+B11)
The role of the Lagrange multiplier lambda is to “resist” the breaking of symmetry by adding in an antisymmetric component to A, namely dg/dA B.
5.
The orthogonality constraint ATA=I gives the constraints
g1(A) = A112 + A212-1 = 0
g2(A) = A222 + A122-1 = 0
g3(A) = A11A12 + A21A22 = 0
g4(A) = A12A11 + A22A21 = 0
(note that constraints 3 and 4 are identical)
g1(A) = [2 A11 0] = A [ 2 0; 0 0]
[2 A21 0]
g2(A) = [0 2A12] = A [ 0 0; 0 2]
[0 2A22]
g3(A) = g4(A) = [A12 A11] = A [0 1; 1 0]
[A22 A21]
So we need to solve for Lagrange multipliers 1, 2, 3, 4 such that the condition
f(A*) + 1 g1 (A*)+ 2 g2(A*) + 3 g3(A*) + 4 g4 (A*) = 0
and all of the gi(A*) =0 are satisfied.
6.
A* (2 sumi qiqiT + [21, 3+4; 3+4, 22]) = 2 sumi piqiT
Since it doesn’t really matter what 3 and 4 are individually, let’s just say 3=4.
A* (sumi qiqiT + [1, 3; 3, 2]) = sumi piqiT
Let B = sumi qiqiT and C = sumi piqiT
Since A* is invertible and orthogonal, we can multiply both sides by A*T to get
B + [1, 3; 3, 2] = A*T C.
So [1, 3; 3, 2] = A*T C - B. Notice that the diagonal of this matrix can be any arbitrary value
and the condition is satisfied. The only curious component is the off-diagonal, which can be any
arbitrary value but needs to be equal on both the upper left and lower right. So,
3 = A*11 C12 + A*21 C22 - B12 = A*12 C11 + A*22 C21 – B21.
Note that B is symmetric, so the B term drops out.
3-B12 = A*11 C12 + A*21 C22 = A*12 C11 + A*22 C21.
Now combine this constraint with the three equalities
A*112 + A*212 = 1
A*222 + A*122 = 1
A*11A*12 + A*21A*22 = 0
Rearrange equality 3 to obtain A*11 = - A*21A*22 / A*12 and replace it in equality 1:
A*212 (A*222 + A*122) = A*122
Since A*222 + A*122 = 1, we get
A*212 = A*122
And hence A*112 = A*222 , which means either (a) A*21 = A*12 and A*11 = - A*22, or (b) A*21 = -A*12 and A*11 =
A*22.
Let’s take case (a). The 3-B12 equality states that -A*22 C12 + A*12 C22 = A*12 C11 + A*22 C21, which implies
that 0=A*12 (C11- C22) + A*22 (C21+ C12), which means that A*22/ A*12 = (C21+ C12)/ (C22- C11). This gives us a
ratio, for which we should choose a scale such that the magnitude of the vector (A*22, A*12) is 1. The
remaining elements of A* can then be determined using the case’s conditions. A similar equation is
given for case b). In fact, the resulting matrix A* in case (a) is a mirror+ rotation and in case (b) it is a
pure rotation.
Download