[Supplementary Material]

Jiajia Luo, Wei Wang, and Hairong Qi
The University of Tennessee, Knoxville
{jluo9, wwang34, hqi}@utk.edu

1. Details of Coefficients Calculation

The objective function of DL-GSGC is:

$$
\begin{aligned}
\min_{D,X}\ & \sum_{i=1}^{C} \Big\{ \|Y_i - D X_i\|_F^2 + \|Y_i - D_{\in i} X_i\|_F^2 + \|D_{\notin i} X_i\|_F^2 + \lambda_1 \sum_{j=1}^{N_i} |x_j^i|_1 + \lambda_2 \|X_i\|_F^2 \Big\} + \lambda_3 \sum_{i=1}^{J} \sum_{j=1}^{N} \|\alpha_i - x_j\|_2^2\, w_{ij} \\
\text{subject to}\ & \|d_k\|_2^2 = 1, \quad \forall k = 1, 2, \ldots, K
\end{aligned}
\tag{1}
$$

Eq. 1 can be rewritten as:

$$
\min_{D,X}\ \sum_{i=1}^{C} \Bigg\{ \left\| \begin{bmatrix} Y_i - D X_i \\ Y_i - D_{\in i} X_i \\ D_{\notin i} X_i \end{bmatrix} \right\|_F^2 + \lambda_1 \sum_{j=1}^{N_i} |x_j^i|_1 + \lambda_2 \|X_i\|_F^2 \Bigg\} + \lambda_3 \sum_{i=1}^{J} \sum_{j=1}^{N} \|\alpha_i - x_j\|_2^2\, w_{ij}
\tag{2}
$$

After fixing the dictionary $D$, the coefficients for the $j$-th feature in class $i$, $x_j^i$, can be calculated by solving the following convex problem:

$$
\min_{x_j^i}\ \left\| \begin{bmatrix} y_j^i - D x_j^i \\ y_j^i - D_{\in i} x_j^i \\ 0 - D_{\notin i} x_j^i \\ 0 - \sqrt{\lambda_2}\, I x_j^i \end{bmatrix} \right\|_2^2 + \lambda_1 |x_j^i|_1 + \lambda_3 \sum_{m=1}^{J} \|\alpha_m - x_j^i\|_2^2\, w_{mj}
\tag{3}
$$

where $I \in \mathbb{R}^{K \times K}$ is an identity matrix.

To remove the influence of shared features among classes, we use templates belonging to the same class as the input feature for similarity measure at this stage. Therefore, the objective function becomes:

$$
\min_{x_j^i}\ \Big\{ \|s_j^i - D'_i x_j^i\|_2^2 + \lambda_1 |x_j^i|_1 + \lambda_3 L(x_j^i) \Big\}
\tag{4}
$$

where

$$
s_j^i = [\, y_j^i;\ y_j^i;\ \underbrace{0; \ldots; 0}_{f+K} \,]
\tag{5}
$$

$$
D'_i = [\, D;\ D_{\in i};\ D_{\notin i};\ \sqrt{\lambda_2}\, I \,]
\tag{6}
$$

$$
L(x_j^i) = \sum_{m=1}^{A_i} \|\alpha_m - x_j^i\|_2^2\, w_{mj}
\tag{7}
$$

To use the feature-sign search for coefficients calculation, Eq. 4 can be rewritten as:

$$
\min_{\bar{x}_j^i}\ \Big\{ \|s_j^i - \bar{D}'_i \bar{x}_j^i\|_2^2 + \frac{\lambda_1}{\sqrt{2+\lambda_2}} |\bar{x}_j^i|_1 + \frac{\lambda_3}{2+\lambda_2} \sum_{m=1}^{A_i} \big\| \sqrt{2+\lambda_2}\, \alpha_m - \bar{x}_j^i \big\|_2^2\, w_{mj} \Big\}
\tag{8}
$$

where $\bar{x}_j^i = \sqrt{2+\lambda_2}\, x_j^i$ and $\bar{D}'_i$ is the $\ell_2$ column-wise normalized version of $D'_i$. To simplify notation, we present the following equivalent optimization problem:

$$
\min_{x}\ f(x) \equiv h(x) + \lambda |x|_1, \qquad
h(x) = \|s - Bx\|_2^2 + \gamma \sum_{m=1}^{A_i} \|\beta_m - x\|_2^2\, w_m
\tag{9}
$$

where $B = \bar{D}'_i$, $\lambda = \frac{\lambda_1}{\sqrt{2+\lambda_2}}$, $\gamma = \frac{\lambda_3}{2+\lambda_2}$, and $\beta_m = \sqrt{2+\lambda_2}\, \alpha_m$. Then the first derivative of $h(x)$ with respect to $x$ can be represented as:

$$
\nabla h(x) = -2 B^T s - 2\gamma \sum_{m=1}^{A_i} w_m \beta_m + 2 \Big( B^T B + \gamma \sum_{m=1}^{A_i} w_m I \Big) x
\tag{10}
$$

Details of coefficients calculation using the feature-sign search method [1] are provided in Algorithm 1. According to Algorithm 1, the solution of Eq. 8 is obtained as $\bar{x}_j^i$. The coefficients are then recovered as $x_j^i = \frac{1}{\sqrt{2+\lambda_2}}\, \bar{x}_j^i$.

References

[1] H. Lee, A. Battle, R. Raina, and A. Y. Ng. Efficient sparse coding algorithms. NIPS, 2007.

Algorithm 1 Coefficients Calculation

Input: $\beta = [\beta_1, \ldots, \beta_{A_i}]$, $W = [w_1, \ldots, w_{A_i}]$, $\lambda$, $\gamma$, $B$, $s$, and $h(x)$
Output: $x^T = [x_1, \ldots, x_K]$

Step 1: Initialization
    $x = 0$, $\theta = 0$, and active set $H = \{\}$, where $\theta_i \in \{1, 0, -1\}$ denotes $\mathrm{sign}(x_i)$.

Step 2: Active Set Update
    From the zero coefficients of $x$, choose the one with the maximum absolute value of the first derivative: $i = \arg\max_i |\nabla h(x_i)|$
    if $\nabla h(x_i) > \lambda$ then
        set $\theta_i = -1$, $H = \{i\} \cup H$
    end if
    if $\nabla h(x_i) < -\lambda$ then
        set $\theta_i = 1$, $H = \{i\} \cup H$
    end if

Step 3: Feature-sign step
    Let $\hat{B}$ be a submatrix of $B$ that contains only the columns corresponding to the active set $H$. Let $\hat{x}$, $\hat{\beta}_m$, and $\hat{\theta}$ be subvectors of $x$, $\beta_m$, and $\theta$ corresponding to the active set $H$.
    Compute the analytical solution according to Eqs. 9 and 10:
    $$
    \hat{x}_{new} = \Big( \hat{B}^T \hat{B} + \gamma \sum_{m=1}^{A_i} w_m I \Big)^{-1} \Big( \hat{B}^T s + \gamma \sum_{m=1}^{A_i} w_m \hat{\beta}_m - \frac{\lambda \hat{\theta}}{2} \Big)
    $$
    Perform a discrete line search on the closed line segment from $\hat{x}$ to $\hat{x}_{new}$: check the objective value at $\hat{x}_{new}$ and at all points where any coefficient changes sign, then update $\hat{x}$ to the point with the lowest objective value.
    Remove zero coefficients of $\hat{x}$ from the active set $H$ and update $\theta = \mathrm{sign}(x)$.

Step 4: Check the Optimality Conditions
    if $\nabla h(x_j) + \lambda\, \mathrm{sign}(x_j) = 0,\ \forall x_j \neq 0$ then
        if $|\nabla h(x_j)| \leq \lambda,\ \forall x_j = 0$ then
            Return $x$ as solution
        else
            Go to Step 2
        end if
    else
        Go to Step 3
    end if
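The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `feature_sign_search` and `solve_coefficients`, the argument layout (templates stored as columns of `alpha` and `beta`), the tolerance `tol`, and the iteration cap `max_iter` are all hypothetical. The wrapper divides $D'_i$ by $\sqrt{2+\lambda_2}$, which coincides with column-wise normalization only when every column of $D'_i$ has that norm, as the rescaling in Eq. 8 suggests.

```python
import numpy as np


def feature_sign_search(B, s, beta, w, lam, gamma, max_iter=1000, tol=1e-8):
    """Minimise ||s - Bx||^2 + gamma * sum_m w_m ||beta_m - x||^2 + lam * |x|_1 (Eq. 9).

    B: (n, K); s: (n,); beta: (K, A) with one template per column; w: (A,).
    """
    K = B.shape[1]
    # h(x) = x^T A x - 2 b^T x + const, hence grad h(x) = 2 (A x - b), i.e. Eq. 10.
    A = B.T @ B + gamma * w.sum() * np.eye(K)
    b = B.T @ s + gamma * (beta @ w)

    def objective(x_hat, H):
        # Objective restricted to the active set H (constant term dropped).
        return x_hat @ A[np.ix_(H, H)] @ x_hat - 2 * b[H] @ x_hat + lam * np.abs(x_hat).sum()

    x = np.zeros(K)      # Step 1: x = 0
    theta = np.zeros(K)  # signs; nonzero entries mark the active set
    for _ in range(max_iter):
        grad = 2 * (A @ x - b)
        nz = x != 0
        # Step 4, first condition: nonzero coefficients are stationary.
        if np.all(np.abs(grad[nz] + lam * np.sign(x[nz])) <= tol):
            # Step 4, second condition: zero coefficients cannot improve f.
            if np.all(np.abs(grad[~nz]) <= lam + tol):
                return x
            # Step 2: activate the zero coefficient with the largest |gradient|.
            zero_idx = np.flatnonzero(~nz)
            i = zero_idx[np.argmax(np.abs(grad[zero_idx]))]
            theta[i] = -np.sign(grad[i])
        # Step 3: closed-form solution on the active set for fixed signs.
        H = np.flatnonzero(theta)
        x_hat = x[H]
        x_new = np.linalg.solve(A[np.ix_(H, H)], b[H] - lam * theta[H] / 2)
        # Discrete line search: x_new plus every point where a sign flips.
        candidates = [x_new]
        for k in np.flatnonzero(x_hat * x_new < 0):
            t = x_hat[k] / (x_hat[k] - x_new[k])
            point = x_hat + t * (x_new - x_hat)
            point[k] = 0.0
            candidates.append(point)
        x[H] = min(candidates, key=lambda p: objective(p, H))
        theta = np.sign(x)  # zeroed coefficients leave the active set
    return x  # iteration cap reached (assumed safeguard, not part of Algorithm 1)


def solve_coefficients(D, D_in, D_out, y, alpha, w, lam1, lam2, lam3):
    """Build Eqs. 5-6, solve the rescaled problem (Eqs. 8-9), and undo the scaling."""
    f, K = D.shape
    s = np.concatenate([y, y, np.zeros(f + K)])                     # Eq. 5
    D_prime = np.vstack([D, D_in, D_out, np.sqrt(lam2) * np.eye(K)])  # Eq. 6
    scale = np.sqrt(2 + lam2)
    x_bar = feature_sign_search(D_prime / scale, s, scale * alpha, w,
                                lam1 / scale, lam3 / scale**2)
    return x_bar / scale
```

A quick sanity check on the solver: with `B` equal to the identity and `gamma = 0`, Eq. 9 reduces to soft-thresholding, so the result should equal `np.sign(s) * np.maximum(np.abs(s) - lam / 2, 0)`.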