Using Attributes to Describe What People Wear
Andy Gallagher
October 14, 2013
with Huizhong Chen and Bernd Girod

Objective
Attribute learning
List of attributes: Men’s, Black color, Sweater, Long sleeve, Solid pattern, Low skin exposure, …

Outline
Attributes
Describing Clothing with Attributes
Miscellaneous Topics

Attributes
Describing objects by their attributes, A. Farhadi, I. Endres, D. Hoiem, D. Forsyth, CVPR 2009
Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, S. Harmeling, CVPR 2009
Many others

Computer Vision
image → features → classification
[ .1 -.9 .1 .231 -.1] ?
What feature representation should we use?

Computer Vision
image → features → attributes → classification
[ .1 -.9 .1 .231 -.1]
Has hair, has skin, has ear, has eye, has arms
Now we can talk…

Attributes
Properties shared by many objects
Explicit semantics
Facilitate human-CPU communication
Materials (glass, fur, wood, etc.)
Parts (has wheel, has tail, etc.)
Shape (boxy, cylindrical, etc.)
Based on a slide by David Forsyth

Example Attributes
Face Tracer Image Search: “Smiling Asian Men With Glasses”
Kumar et al., 2008

Example Attributes
Farhadi et al. 2009

Example Attributes
Lampert et al. 2009
Slide credit: Devi Parikh

Example Attributes
Welinder et al. 2010
Slide credit: Devi Parikh

Attribute Models
Classifiers for binary attributes
Kumar et al. 2010
Slide credit: Devi Parikh

Why attributes?
How humans naturally describe visual concepts
Image search: “I want elegant silver sandals with high heels”
Slide credit: Devi Parikh

Example Attributes
Verification classifier: SAME
Kumar et al., 2010

Why attributes?
An okapi is a mammal with a reddish dark back, with striking horizontal white stripes on the front and back legs. (Wikipedia)

Zero-shot Learning
Aye-ayes:
Are nocturnal
Live in trees
Have large eyes
Have long middle fingers
Which one of these is an aye-aye?
Humans can learn from descriptions (zero examples).
Slide adapted from Christoph Lampert by Devi Parikh

Is this a giraffe? No.
Is this a giraffe? Yes.
Is this a giraffe? No.
Slide credit: Devi Parikh

Parkash and Parikh, 2012
Current belief, focused feedback, knowledge of the world
“I think this is a giraffe. What do you think?”
“No, its neck is too short for it to be a giraffe.”
[Animals with even shorter necks] “Ah! These must not be giraffes either then.”
Learner learns better from its mistakes
Accelerated discriminative learning with few examples
Feedback on one, transferred to many
Slide credit: Devi Parikh

Which Attributes to Describe?
[Figure: panels (a) to (f)]
“Please choose a person to the left of the person who is frowning”
Sadovnik et al. 2013

Describing Clothing with Attributes

Objective
Attribute learning
List of attributes: Men’s, Black color, Sweater, Long sleeve, Solid pattern, Low skin exposure, …

Recommend and Analyze
Recommendations: Formal, Sport

Related Work
Person identification with clothing
Bounding box under face [Anguelov, 2007]
Clothing segmentation [Gallagher, 2008]

Dataset Preparation
1856 people from the web.
Images are unconstrained.
$400 spent collecting 283,107 labels on Amazon Mechanical Turk (AMT).

Dataset Statistics
3 multiclass attributes, 23 binary attributes

The System
[Diagram: pose estimation → feature extraction & quantization → Feature 1 … Feature N, each with its own SVM (SVM 1 … SVM N) → combine features into one SVM per attribute (attribute classifiers 1 … M) → multi-attribute CRF inference over attribute nodes A1 … A4 and feature nodes F1 … F4 → predictions]
A: attribute, F: feature
Predictions: Blue, Solid pattern, Outerwear, Wear scarf, Long sleeve, …

Pose Estimation [Eichner et al., 2010]
Upper body detection combines the complementary results of a face detector and deformable part models.
Foreground highlighting is applied within the enlarged upper body bounding box.
The upper body is parsed into head, torso, and the upper and lower parts of the left and right arms.

Feature Extraction
SIFT descriptors are extracted over the sampling grid.
A similar procedure is used for the arm regions.

Feature Extraction
Maximum Response Filters [Varma 2005]
[Figure labels: RGB image, MRF bank, LAB color, skin probability]

Feature Extraction
Raw features are quantized using soft K-means (K=5 in our implementation).
Quantized features are aggregated over various body regions by max or average pooling.
Feature type: SIFT, Texture, Color, Skin probability
Region: Torso, Left upper arm, Right upper arm, Left lower arm, Right lower arm
Pooling method: Average, Max

Feature Fusion
SVM is a kernel-based classification technique.
Feature fusion solution: a combined SVM is trained using a weighted sum of the kernels.
Combining features consistently outperforms the single best feature.
[Diagram: each kernel K1 … KN trains its own SVM (SVM 1 … SVM N) and yields a prediction accuracy (accuracy 1 … accuracy N); the kernels K1 … KN also feed the combined SVM, which produces the attribute prediction]

Recap
Pose estimation, feature extraction & quantization, per-feature SVMs combined into attribute classifiers, multi-attribute CRF inference, predictions (Blue, Solid pattern, Outerwear, Wear scarf, Long sleeve, …).

Attribute Dependencies
Necktie and T-Shirt?
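
One way to put a number on a dependency like the necktie and T-shirt question is to count how often attribute values co-occur in labeled data. The sketch below is only an illustration: the function name `joint_probability`, the additive smoothing, and the eight labels are assumptions made up for this example, not the procedure or data used in the talk.

```python
import math
from collections import Counter


def joint_probability(labels_a, labels_b, smoothing=1.0):
    """Estimate P(A = a, B = b) from paired labels with additive smoothing."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must describe the same people")
    values_a = sorted(set(labels_a))
    values_b = sorted(set(labels_b))
    counts = Counter(zip(labels_a, labels_b))
    # Smoothing keeps unseen pairs (for example necktie with T-shirt) above zero,
    # so that their logarithm stays finite.
    total = len(labels_a) + smoothing * len(values_a) * len(values_b)
    return {
        (a, b): (counts[(a, b)] + smoothing) / total
        for a in values_a
        for b in values_b
    }


# Made-up labels for eight people (illustration only, not from the dataset).
necktie = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
category = ["suit", "suit", "suit", "t-shirt", "t-shirt", "t-shirt", "suit", "t-shirt"]

joint = joint_probability(necktie, category)
# Log joint probabilities, one value per pair of attribute values.
log_joint = {pair: math.log(p) for pair, p in joint.items()}

for pair in sorted(joint):
    print(pair, round(joint[pair], 3))
```

In this toy data the pair (necktie, T-shirt) never occurs, so its estimated probability is the smallest of the four; a pairwise model can use that to penalize implausible combinations.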

Attribute Inference with CRF
Each attribute is a node.
All nodes are pairwise connected.
The edge connecting two nodes corresponds to the joint probability of those two attributes.
[Diagram: fully connected graph over attribute nodes A1 … A6, each attached to its feature node F1 … F6]
Ai: Attribute i
Fi: Features for Ai

CRF for Attribute Learning
[Diagram: attribute nodes A1, A2, …, AM with feature nodes F1, F2, …, FM]
\[
\begin{aligned}
P(A_1, A_2 \mid F_1, F_2) &\propto P(F_1, F_2 \mid A_1, A_2)\, P(A_1, A_2) \\
&= P(F_1 \mid A_1)\, P(F_2 \mid A_2)\, P(A_1, A_2) \quad \text{[following the CRF model]} \\
&\propto \frac{P(A_1 \mid F_1)}{P(A_1)} \cdot \frac{P(A_2 \mid F_2)}{P(A_2)} \cdot P(A_1, A_2)
\end{aligned}
\]
\[
\log P(A_1, A_2 \mid F_1, F_2) =
\underbrace{\log \frac{P(A_1 \mid F_1)}{P(A_1)}}_{\text{node 1 potential } \Psi(A_1)}
+ \underbrace{\log \frac{P(A_2 \mid F_2)}{P(A_2)}}_{\text{node 2 potential } \Psi(A_2)}
+ \underbrace{\log P(A_1, A_2)}_{\text{edge potential } \Phi(A_1, A_2)}
+ C
\]
For a fully connected CRF, we maximize:
\[
\sum_{A_i \in S} \Psi(A_i) + \sum_{(A_i, A_j) \in E} \Phi(A_i, A_j)
\]
where the first sum collects the node potentials and the second the edge potentials.
The CRF potential is maximized using the standard belief propagation technique [Tappen et al., 2003].

[Figure: three example images with their predicted attributes; some entries carry a second value in parentheses]
Example 1: No necktie (Wear necktie), Has collar, Men’s, Has placket, Low exposure, No scarf, Solid pattern, Black, Short sleeve (Long sleeve), V-shape neckline, Dress (Suit)
Example 2: Wear necktie, Has collar, Men’s, Has placket, High exposure (Low exposure), No scarf, Solid pattern, Gray & black, Long sleeve, V-shape neckline, Suit
Example 3: No necktie, Has collar, Men’s, Has placket, Low exposure, Wear scarf, Solid pattern, Brown & black, No sleeve (long sleeve), V-shape neckline, Tank top (outerwear)

Experimental Results
Questions that we are interested in:
Does combining features improve performance?
Does the pose model help?
Does the CRF work?

Pose Vs No Pose - Experiment Setup
Positive and negative examples are balanced.
SVM classification
Chi-squared kernel
Leave-one-out cross validation
Comparison with attribute learning without the pose model:
Features are extracted within a scaled clothing mask under the face [Gallagher 2008].
Evaluation is performed under the same experiment settings.
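
The fully connected CRF objective described earlier (sum of node potentials plus sum of pairwise edge potentials) can be illustrated on a tiny problem. The sketch below is a minimal illustration under stated assumptions: it replaces the belief propagation cited on the slide with exhaustive enumeration, which is only feasible for a handful of attributes, and every probability in it is made up. The names `map_labels`, `posterior`, `prior`, and `joint` are hypothetical.

```python
import itertools
import math


def map_labels(node_potentials, edge_potentials):
    """Return the labeling that maximizes node plus pairwise edge potentials.

    node_potentials[i][a]         : potential of attribute i taking label a
    edge_potentials[(i, j)][a][b] : potential of attribute i = a and j = b, i < j
    Exhaustive search, not belief propagation: cost grows exponentially
    with the number of attributes.
    """
    n = len(node_potentials)
    best_labels, best_score = None, -math.inf
    label_ranges = [range(len(p)) for p in node_potentials]
    for labels in itertools.product(*label_ranges):
        score = sum(node_potentials[i][labels[i]] for i in range(n))
        for i, j in itertools.combinations(range(n), 2):
            score += edge_potentials[(i, j)][labels[i]][labels[j]]
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


# Made-up numbers for two attributes (illustration only).
# Attribute 0: necktie (0 = no, 1 = yes); attribute 1: category (0 = T-shirt, 1 = suit).
posterior = [[0.45, 0.55], [0.70, 0.30]]        # P(A_i | F_i) from per-attribute classifiers
prior = [[0.50, 0.50], [0.50, 0.50]]            # P(A_i)
joint = {(0, 1): [[0.48, 0.02], [0.02, 0.48]]}  # P(A_0, A_1)

# Node potential: log of posterior over prior. Edge potential: log joint probability.
node_potentials = [
    [math.log(p / q) for p, q in zip(post, pri)]
    for post, pri in zip(posterior, prior)
]
edge_potentials = {
    pair: [[math.log(p) for p in row] for row in table]
    for pair, table in joint.items()
}

# Labels picked by each classifier on its own, ignoring dependencies.
independent = tuple(max(range(len(p)), key=p.__getitem__) for p in posterior)
crf_labels, crf_score = map_labels(node_potentials, edge_potentials)
print("independent:", independent, "joint:", crf_labels)
```

With these made-up numbers the independent classifiers pick a necktie together with a T-shirt, while the joint maximization switches the necktie label to "no" because that combination is far more probable.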
The clothing mask [Gallagher 2008]

[Chart: per-attribute accuracy (binary-class) / MAP (multi-class), y-axis from 45% to 95%. Series: best feature (with pose), combined feature (with pose), combined feature (no pose). Attributes: Necktie, Collar, Gender, Placket presence, Skin exposure, Scarf, Pattern solid, Pattern floral, Pattern spot, Pattern graphics, Pattern plaid, Pattern stripe, Color red, Color yellow, Color green, Color cyan, Color blue, Color purple, Color brown, Color white, Color gray, Color black, >2 colors, sleeve length, neckline, category]

Multiclass Confusion Matrix

[Chart: per-attribute G-mean, before CRF and after CRF, y-axis from 45% to 95%, over the same attributes]

Steve Jobs: “solid pattern, men’s clothing, black color, long sleeves, round neckline, outerwear, wearing scarf”

The predicted dressing style of weddings:
Male: “solid pattern, suit, long sleeves, V-shape neckline, wearing necktie, wearing scarf, has collar, has placket”
Female: “high skin exposure, no sleeves, dress, other neckline shapes, white, >2 colors, floral pattern”

Gender Recognition
Face-based: Project faces into the Fisher space.
Clothing-based: The gender output of our system.
Better gender recognition is achieved by combining face and clothing.

Conclusions
Clothing attributes can be better learned with a human pose model.
The CRF offers improved performance by exploring attribute relations.
Proposed novel applications that exploit the predicted attributes.

Miscellaneous

What do you have?

AutoCropping
Auction Probability: 97%

AutoCropping
Eigenvector, Quantized Eigenvector

How do photos affect value?
Angled, high contrast: ~$115

How do photos affect value?
Frontal, flash reflection: ~$88

Thank You!

Future Work
Expect even better performance by using the (almost) ground truth pose estimated by Kinect sensors [Shotton et al., Best Paper CVPR 2011].
Incorporate clothing information in person identification.

The Loop
What we know about people
Images and Computer Vision

The Loop: This talk
Examples of how social data has helped understand images of people
Some things I’ve learned about people from computer vision

What is context?

Context

Which monster is larger?
Shepard RN (1990) Mind Sights: Original Visual Illusions, Ambiguities, and other Anomalies, New York: WH Freeman and Company

Your brain specializes in faces

Find The Face In the beans:
http://www.michaelbach.de/ot/sze_muelue/index.html

Understanding images of people