Local Affine Feature Tracking in Films/Sitcoms
Chunhui Gu
CS 294-6 Final Presentation
Dec. 13, 2006

Objective
• Automatically detect and track local affine features in film/sitcom frame sequences.
  – Current dataset: Sex and the City
  – Why sitcom?
    • Simple daily environment
    • Few or no special effects
    • Repeated scenes

Outline
• Preprocessing
• Tracking Algorithm
  – Pairwise local matching
  – Robust features
• Feature Matching across Shots
• Results
  – Feature matching vs. baseline color histogram
  – Time complexity
  – When does tracking fail

Preprocessing
[Diagram: processing stages applied to the frame sequence, which is split into the (i-1)'th shot and the i'th shot]
• Frame Extraction
• Shot Detection
• MSER Interest Point Detection
• SIFT Feature Extraction

Tracking Algorithm
• Basic: Pairwise Matching
  – Frame i: feature $f_m^i$ at position $(x_m^i, y_m^i)$
  – Frame j = i+1: candidate features $f_n^j$
  – Match by $\min_n d(f_m^i, f_n^j)$
  – Thresholding on both minimum distance and ratio

Tracking Algorithm
• Problem of Pairwise Matching
  – Sensitive to occlusion and feature misdetection
• Solutions:
  – Use multiple overlapping windows
  – Backward Matching
    • Match features in the current frame to features in all previous frames within the shot
    • Pruning process (reduces computation time)
• Select a proportion of the features with longer tracking length as robust features

Shot grouping/Scene Retrieval
[Figure: Scene 5, with Shots 53, 49, 56, and 60. Frame labels shown: 10746, 10747, 10772; 10933, 10934, 10968; 11393, 11394, 11435; 11533, 11534, 11560.]
• Each shot k is represented by its robust feature set $f_{rf}^{k} = \{x_1, x_2, \ldots, x_{m_k}\}$, shown for k = 53, 49, 56, 60.

Inter-Shot Matching
• Shot I: $f_1^I = \{x_1, x_2, \ldots, x_{m_1}\}$, $f_2^I = \{x_1, x_2, \ldots, x_{m_2}\}$, ..., $f_p^I = \{x_1, x_2, \ldots, x_{m_p}\}$
• Shot J: $f_1^J = \{x_1, x_2, \ldots, x_{n_1}\}$, $f_2^J = \{x_1, x_2, \ldots, x_{n_2}\}$, ..., $f_q^J = \{x_1, x_2, \ldots, x_{n_q}\}$
[Diagram: Shot I and Shot J, connected by a link labeled D]

"Confusion Table"
[Figure: three panels over shot indices 50 to 75: Ground Truth, Color Histograms, Feature Matching]

ROC
[Figure: ROC curve of Feature Matching; x-axis: False Alarm (0 to 1), y-axis: True Detection (0 to 1)]

When Does Tracking Fail?
• Tracking a feature outside the local window
  – Rare during continuous tracking
  – Happens when occlusion occurs
• Same feature splitting into two or more groups
  – Long occlusion
  – Multiple matches in a single frame

Computation Complexity
• Everything except the MSER and SIFT algorithms is implemented in Matlab (slow…)

Step                    Complexity     Time
Frame Extraction        O(N)           ~0.3 s/frame
Shot Detection          O(N*f(B))      ~0.07 s/frame (B=16)
MSER Detection          O(N)           ~0.3 s/frame
SIFT Detection          O(N)           ~0.9 s/frame
Feature Tracking        O(N*F*W*L)     ~0.5 s/frame
Matching across shots   O(S^2*T^2)     ~1 s/shot pair

N: # of frames (30,000)
B: # of bins for color histogram (16)
F: ave. # of features per frame (400)
W: local window size (15)
L: tracking length (20)
T: ave. # of robust trackers per shot (300)
S: # of shots (35)

Conclusion
• We successfully implemented local affine feature tracking on the sitcom "Sex and the City". The tracking method is robust to occlusion and feature misdetection.
• Although there is no quantitative precision/recall curve (ground truth is hard to obtain), the demonstration shows that precision is almost perfect, with good recall performance.
• We show one successful application of robust features: associating similar shots for scene retrieval.

Future Work
• Implement the algorithm in real time (C/C++)
• Search for unique shots in films/sitcoms
• Separate indoor scenes from outdoor scenes
• Determine the context of the scene

Acknowledgement
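
Appendix: sketch of pairwise matching

The pairwise matching step from the Tracking Algorithm slides (nearest descriptor inside a local window, then thresholding on both the minimum distance and the ratio) could be sketched roughly as below. This is a minimal illustration, not the presented Matlab implementation. The function name `match_pairwise`, the square window shape, the Euclidean distance for $d$, and the values of `max_dist` and `max_ratio` are all assumptions. Only the window size of 15 comes from the slides, and treating it as a pixel half-width is also an assumption.

```python
import math

def match_pairwise(feats_i, feats_j, window=15.0, max_dist=0.6, max_ratio=0.8):
    """Match features of frame i to features of frame j = i+1.

    feats_i, feats_j: lists of ((x, y), descriptor) tuples.
    window, max_dist, max_ratio: illustrative thresholds (assumed values).
    Returns a list of (m, n) index pairs, at most one per feature of frame i.
    """
    matches = []
    for m, ((xm, ym), desc_m) in enumerate(feats_i):
        # Candidates: features of frame j inside a square local window
        # centred on the position of feature m in frame i.
        candidates = []
        for n, ((xn, yn), desc_n) in enumerate(feats_j):
            if abs(xn - xm) <= window and abs(yn - ym) <= window:
                candidates.append((math.dist(desc_m, desc_n), n))
        if not candidates:
            continue
        candidates.sort()
        best_d, best_n = candidates[0]
        # Threshold on the minimum descriptor distance.
        if best_d > max_dist:
            continue
        # Threshold on the ratio of best to second-best distance,
        # which rejects ambiguous matches.
        if len(candidates) > 1 and best_d > max_ratio * candidates[1][0]:
            continue
        matches.append((m, best_n))
    return matches

# Usage with two tiny hand-made frames (2-D descriptors for readability).
frame_i = [((10.0, 10.0), (1.0, 0.0)), ((100.0, 100.0), (0.0, 1.0))]
frame_j = [((101.0, 99.0), (0.0, 1.0)), ((12.0, 11.0), (1.0, 0.05))]
print(match_pairwise(frame_i, frame_j))
```

Restricting candidates to a local window is what keeps the per-frame cost proportional to the window size W rather than to all features in the next frame. It is also why a feature that moves outside the window, for example after an occlusion, is lost.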