C OLOR ATTRIBUTES FOR O BJECT D ETECTION FAHAD S HAHBAZ K HAN1,∗ , R AO M UHAMMAD A NWER2,∗ , J OOST VAN DE W EIJER2 , A NDREW D. B AGDANOV2,3 , M ARIA VANRELL2 , A NTONIO M. L ÓPEZ2 1 C OMPUTER V ISION L ABORATORY, L INKÖPING U NIVERSITY, S WEDEN 2 C OMPUTER V ISION C ENTER ,U NIVERSITAT A UTONOMA DE B ARCELONA ,S PAIN 3 M EDIA I NTEGRATION AND C OMMUNICATION C ENTER , U NIVERSITY OF F LORENCE , I TALY P ROBLEM C OLOR D ESCRIPTORS Goal: Augmenting existing intensity based detectors with color information. C OLOR D ESCRIPTOR S ELECTION E XPERIMENTAL VALIDATION Evaluation criteria: To avoid laborious cross validation, we use the KL-ratio to compare the discriminative power. A high KL-ratio reflects a more discriminative descriptor. PASCAL VOC2007: no early fusion method improves over standard HOG. Cartoon: Our approach provides a gain of 14% over standard HOG. KL-ratio = E ARLY VS L ATE HUE descriptor (HUE): cells are represented by a histogram: O1 hue = arctan O2 →s= √ O12 + O22 P minj ∈C / m KL(pj ,pk ) P k∈C m m k∈C m mini∈C ,i6=k KL(pj ,pk ) (5) where KL(pi , pj ) = PN x=1 pi (x) log pi (x) pj (x) (6) PASCAL VOC: Our approach improves on 15 out of 20 object categories compare to baseline. (1) Opponent descriptor (OPP): cells are represented by a histogram over the opponent angle: angxO It is known that late fusion obtains better results in higher level pyramid representations. = arctan O1x O2x →s= p O12x + O22x Method HOG Best 2007 UCI UOCTTI LEO Oxford-MKL LBP-HOG Our Approach (2) Color names: are linguistic color labels which human assign to colors in the world. In this work we use the mapping learned from Google images [Van de Weijer, TIP09]. CN = {p(cn1|R), p(cn2|R), ..., p(cn11|R)} with p(cni |R) = 1 p P x∈R p(cni |f(x)) (3) (4) mean AP(VOC07) 32.3 23.3 27.1 29.6 32.1 34.3 34.8 mean AP(VOC09) 28.0 27.9 27.7 21.9 29.4 Method mean AP (Cartoon) mean AP (VOC07) HOG 27.6 32.3 OPPHOG 34.2 30.6 RGBHOG 35.3 31.3 C-HOG 35.2 30.1 Our Approach 41.7 34.8 Cartoon Detection: Conclusion: We select color names since it has more discriminative power while being compact. C OLORING O BJECT D ETECTION Part-Based: We extend the HOG feature with color attributes within the part-based framework [Felzenswalb, PAMI11]. Ci = [HOGi , CNi ] Feature Dimension HOG 31 OPPHOG 93 RGBHOG 93 (7) C-HOG 93 CN-HOG 42 C ONCLUSIONS C ARTOON C HARACTER D ETECTION 1. We propose the use of color attributes as explicit color representation. 18 Classes: The Simpsons, Tom, Jerry, Fred, Barney, Sylvester, Mickymouse, Donaldduck, Tweety, Coyote, Roadrunner, Buggs, Daffy, Shaggy and Scooby. Images: 586(train 304, testing 282). Source: Google. Properties: Compactness: only an 11D histogram for each cell is computed. Invariance: possess a degree of photometric invariance. Discriminative power: separate bins represent the achromatic colors: black, grey, and white. ESS-Based: We add a separate color vocabulary to the ESS framework [Lampert CVPR08]. Color and shape are combined with late fusion. Cartoon Character Detection: mean AP SIFT 8.8 CN-SIFT 12.9 C-SIFT 10.3 OPPSIFT 9.3 2. We introduce a new dataset of cartoon character images. 3. Early fusion based approaches yield inferior results for object detection. Our approach achieves state-of-the-art on PASCAL VOC 2007, 2009 and cartoon datasets despite its simplicity.