Complex Support Vector Machines for Regression and Quaternary Classification
P. Bouboulis, S. Theodoridis, C. Mavroforakis, L. Dalla
Machine Learning(?), 2010(?) (I got it from arXiv)
Coffeetalk, 1 May 2014

Classification of complex data
• Classifying bird songs:
  [Figure: spectrogram of a bird song; axes: time (s) vs. frequency (Hz)]
• But... the spectrogram contains complex values!
• Can I classify a complex-valued image?

Complex support vector machine
• Inputs are complex vectors: x̃ = x^r + i x^i
• The output can be:
  • a complex number (regression)
  • a class label (classification)
• To output a label:
  • Real-valued SVM: f(x) = ⟨w, x⟩ + b; assign x to ω1 if f(x) ≥ 0 and to ω2 if f(x) < 0.
  • Complex SVM: f(x̃) = ⟨w̃, x̃⟩_H + b

Complex inner product
• You can define the inner product in different ways.
• Simple: ⟨w̃, x̃⟩_H = ⟨w^r, x^r⟩ + ⟨w^i, x^i⟩
  • Just a concatenation of the real and imaginary spaces.
  • You can use the same SVM as always...
• Complex: ⟨w̃, x̃⟩_H = ⟨w^r, x^r⟩ + ⟨w^i, x^i⟩ + i (⟨w^i, x^r⟩ − ⟨w^r, x^i⟩)
  (this is just the expansion of ⟨w̃, x̃⟩ = Σ_k w̃_k x̃_k* with w̃ = w^r + i w^i and x̃ = x^r + i x^i)
• But now the output is not so clear:
  sign(⟨w̃, x̃⟩_H + b): what is the sign of a complex number?

Not a 2-class, but a 4-class classifier!
[Figures from the paper. Fig. 4: "A complex couple of hyperplanes separates the space of complex numbers (i.e., H = C) into four parts." Fig. 5: "A complex couple of hyperplanes that separates the four given classes. The hyperplanes are chosen so as to maximize the margin between the classes."]
• Now the classifier can distinguish FOUR classes, using the pair of signs
  sign(Re(⟨w̃, x̃⟩_H + b))  and  sign(Im(⟨w̃, x̃⟩_H + b)).
• (The excerpt from the paper also shows the identity
  ‖w^r + v^r‖²_H + ‖w^i − v^i‖²_H + ‖w^i + v^i‖²_H + ‖w^r − v^r‖²_H = 2(‖w‖²_H + ‖v‖²_H),
  used in its margin derivation.)

Further things in the paper...
• To set up the quadratic program (i.e., to take derivatives of real-valued cost functions with respect to complex variables), they use Wirtinger's calculus; it appears the problem can be written as two independent quadratic programs.
• Everything is derived both for regression and for classification.
• They propose kernel mappings for complex data: Φ_C(x̃).

For your information: the SVM optimization problem
• For quaternary classification it looks like (the paper's eq. (28)):

    minimize    (1/2)(‖w‖²_H + ‖v‖²_H) + (C/N) Σ_{n=1}^N (ξ_n^r + ξ_n^i)
    subject to  d_n^r (⟨Φ_C*(z_n), v⟩_H + c) ≥ 1 − ξ_n^r,
                d_n^i (⟨Φ_C*(z_n), w⟩_H + c) ≥ 1 − ξ_n^i,
                ξ_n^r, ξ_n^i ≥ 0,    for n = 1, ..., N.

• The labels are d_n = ±1 ± i (four classes): the real part d_n^r of the label drives one set of constraints, the imaginary part d_n^i the other, so we get separate QPs for the real and the imaginary part of the label.
• Quoting the paper: "Following a similar procedure as in the complex SVR case, it turns out that the dual problem can be split into two separate maximization tasks:"

    maximize over a:  Σ_{n=1}^N a_n − (1/2) Σ_{n,m=1}^N a_n a_m d_n^r d_m^r κ_C^r(z_m, z_n)
    subject to        Σ_{n=1}^N a_n d_n^r = 0,  0 ≤ a_n ≤ C/N,  n = 1, ..., N

  and

    maximize over b:  Σ_{n=1}^N b_n − (1/2) Σ_{n,m=1}^N b_n b_m d_n^i d_m^i κ_C^r(z_m, z_n)
    subject to        Σ_{n=1}^N b_n d_n^i = 0,  0 ≤ b_n ≤ C/N,  n = 1, ..., N.

• "Observe that, similar to the regression case, these problems are equivalent with two distinct real SVM (dual) problems." (A small illustrative sketch of this view follows after the conclusion.)

Conclusion
• Treating the real and imaginary parts of x̃ = x^r + i x^i as independent features is OK!
• In the fully complex treatment, we get a four-class classifier.
• ... and we can define new kernels (see the figure and the sketch at the end).
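Below is a minimal, hypothetical sketch (not code from the paper) of the "two real SVMs" reading of the quaternary classifier: one standard SVM learns sign(Re(d)) and a second learns sign(Im(d)), both on stacked real/imaginary features, with scikit-learn's RBF kernel standing in for the paper's induced real kernel κ_C^r. The names QuaternaryComplexSVM and stack_complex, and the toy data, are illustrative assumptions.

```python
# Minimal sketch (assumptions: scikit-learn available; a plain RBF kernel is
# used as a stand-in for the paper's induced real kernel kappa_C^r).
import numpy as np
from sklearn.svm import SVC


def stack_complex(Z):
    """Concatenate real and imaginary parts: C^d -> R^(2d)."""
    return np.hstack([Z.real, Z.imag])


class QuaternaryComplexSVM:
    """Two independent real SVMs: one for Re(d), one for Im(d), d = ±1 ± i."""

    def __init__(self, C=1.0, gamma="scale"):
        self.clf_re = SVC(C=C, kernel="rbf", gamma=gamma)  # predicts sign(Re(d))
        self.clf_im = SVC(C=C, kernel="rbf", gamma=gamma)  # predicts sign(Im(d))

    def fit(self, Z, d):
        X = stack_complex(Z)
        self.clf_re.fit(X, np.sign(d.real))
        self.clf_im.fit(X, np.sign(d.imag))
        return self

    def predict(self, Z):
        X = stack_complex(Z)
        # Combine the two binary decisions into a quadrant label ±1 ± i.
        return self.clf_re.predict(X) + 1j * self.clf_im.predict(X)


# Toy usage: four Gaussian blobs in C, one per quadrant label.
rng = np.random.default_rng(0)
centers = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])
Z = np.vstack([c + 0.3 * (rng.standard_normal((50, 1))
                          + 1j * rng.standard_normal((50, 1)))
               for c in centers])
d = np.repeat(centers, 50)
model = QuaternaryComplexSVM().fit(Z, d)
print("training accuracy:", np.mean(model.predict(Z) == d))
```

The point of this view is that nothing complex-specific is needed at training time: the quadrant label d = ±1 ± i is recovered by combining the two independent binary decisions.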
[Figure from the paper. Fig. 1: "The element κ_C^r(·, (0, 0)^T) of the induced real feature space of the complex Gaussian kernel."]
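As a final illustration of "we can define new kernels": a hypothetical sketch of plugging a user-defined real-valued kernel on complex inputs (such as the paper's induced kernel κ_C^r, whose exact formula is not reproduced here) into a standard SVM solver through a precomputed Gram matrix. The placeholder kappa_rC below is just a plain Gaussian on the complex difference, not the paper's complex Gaussian kernel.

```python
# Hypothetical sketch: using a custom kernel on complex inputs with a standard
# SVM via scikit-learn's precomputed-kernel interface.
import numpy as np
from sklearn.svm import SVC


def kappa_rC(z, w, sigma=1.0):
    # Placeholder kernel: plain Gaussian on the complex difference.
    # The paper's complex Gaussian kernel induces a different real kernel.
    return np.exp(-np.sum(np.abs(z - w) ** 2) / sigma ** 2)


def gram(Z1, Z2, kernel=kappa_rC):
    """Gram matrix K[i, j] = kernel(Z1[i], Z2[j])."""
    return np.array([[kernel(z, w) for w in Z2] for z in Z1])


# Binary sub-problem for the real part of the label d = ±1 ± i.
rng = np.random.default_rng(1)
Z_train = np.repeat(np.array([[1 + 1j], [-1 + 1j], [-1 - 1j], [1 - 1j]]), 10, axis=0)
Z_train = Z_train + 0.3 * (rng.standard_normal(Z_train.shape)
                           + 1j * rng.standard_normal(Z_train.shape))
y_train = np.sign(Z_train[:, 0].real)

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(gram(Z_train, Z_train), y_train)
print("training accuracy:", np.mean(clf.predict(gram(Z_train, Z_train)) == y_train))
```

Swapping in a genuinely complex-aware kernel then only requires replacing kappa_rC; the rest of the pipeline stays a standard real SVM.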