Research Activities at Florida State Vision Group
Florida State University

Xiuwen Liu
Department of Computer Science
Florida State University
http://www.cs.fsu.edu/~liux/courses/intro-seminar-09.ppt

Research Statement
My research goal is to create machines that can “see” with human-level or better-than-human performance, and to develop their applications
• This seems a trivial problem, as each of us can see without any effort
• Computer + Camera = “A Seeing Machine”?
9/28/2009 11:35:41 PM intro-seminar-09.ppt

Visual Pathway

Visual Illusion

Outline
Motivations
• Some applications of computer vision and pattern recognition techniques
Some of my research projects
Related courses
Contact information

3D Urban Models

Computational Photography

Face Detection for Auto Focusing
Some cameras now have built-in algorithms that automatically detect and focus on faces, as faces are the most important subject in everyday photography
Source: http://facedetection.fujifilmusa.com/

Human-Computer Interactions

Sign Language Recognition

CyberKnife

CyberKnife – Cont.

Image-Guided Neurosurgery

Intelligent Transportation Systems
http://dfwtraffic.dot.state.tx.us/dal-cam-nf.asp

Computer Vision Applications – cont.
Military applications
• Automated target recognition

Biometrics
Iris code can achieve zero false acceptance

Computer Vision in Sports
How was the yellow created?

Generic Image Modeling
How can we characterize all these images perceptually?

Spectral Histogram Representation
Spectral histogram
• Given a bank of filters $F^{(\alpha)}$, $\alpha = 1, \ldots, K$, a spectral histogram is defined as the marginal distribution of filter responses

$$I^{(\alpha)}(v) = F^{(\alpha)} * I(v)$$
$$H_I^{(\alpha)}(z) = \frac{1}{|I|} \sum_{v} \delta\big(z - I^{(\alpha)}(v)\big)$$
$$H_I = \big(H_I^{(1)}, H_I^{(2)}, \ldots, H_I^{(K)}\big)$$

Spectral Histogram Representation - continued
Choice of filters
• Laplacian of Gaussian filters
• Gabor filters
• Gradient filters
• Intensity filter
[Figures: LoG filter; Gabor filter]

Texture Synthesis Examples - continued
[Figure: an image with periodic structures; observed image and synthesized image]

Object Synthesis Examples - continued

Performance Comparison

Face Detection Based On Spectral Representations
Face detection aims to detect all instances of faces in a given image
Each image window is represented by its spectral histogram
• A support vector machine is trained on training faces
• The trained support vector machine is then used to classify each image window in an input image

Face detection - continued

Rotation Invariant Face Detection
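The spectral histogram defined earlier (filter each image, histogram each response, concatenate the histograms) can be sketched in a few lines. This is only an illustrative sketch, not the group's implementation: the three small filters in `FILTER_BANK`, the circular boundary handling, and the fixed bin range `lo`/`hi` are all assumptions made here to keep the example self-contained.

```python
from typing import Dict, List

Image = List[List[float]]

# Hypothetical small filter bank (not the filters used on the slides):
# an intensity (delta) filter plus horizontal and vertical gradient filters.
FILTER_BANK: Dict[str, Image] = {
    "intensity": [[1.0]],
    "grad_x": [[-1.0, 0.0, 1.0]],
    "grad_y": [[-1.0], [0.0], [1.0]],
}


def filter_response(image: Image, kernel: Image) -> Image:
    """Circular 2D convolution F * I, returning an image of the same size."""
    rows, cols = len(image), len(image[0])
    kr, kc = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i, kernel_row in enumerate(kernel):
                for j, weight in enumerate(kernel_row):
                    # Convolution flips the kernel: kernel offset (i - kr, j - kc)
                    # pairs with pixel (r - (i - kr), c - (j - kc)), wrapped around.
                    acc += weight * image[(r + kr - i) % rows][(c + kc - j) % cols]
            out[r][c] = acc
    return out


def histogram(response: Image, lo: float, hi: float, bins: int) -> List[float]:
    """Normalized histogram of all responses; out-of-range values are clamped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for row in response:
        for z in row:
            k = int((z - lo) / width)
            counts[min(max(k, 0), bins - 1)] += 1
    total = sum(counts)
    return [count / total for count in counts]


def spectral_histogram(image: Image, filters: Dict[str, Image],
                       lo: float, hi: float, bins: int) -> List[float]:
    """Concatenate the per-filter marginal histograms into one feature vector."""
    feature: List[float] = []
    for kernel in filters.values():
        feature.extend(histogram(filter_response(image, kernel), lo, hi, bins))
    return feature


if __name__ == "__main__":
    step_edge = [[0.0, 0.0, 10.0, 10.0] for _ in range(4)]
    print(spectral_histogram(step_edge, FILTER_BANK, lo=-16.0, hi=16.0, bins=4))
```

Each per-filter histogram sums to one, so the feature vector has length K × bins regardless of image size, which is what allows windows of an image to be compared or fed to a classifier such as the support vector machine mentioned for face detection.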
Rotation Invariant Face Detection - continued

Linear Representations
Linear representations are widely used in appearance-based object recognition and other applications
• Simple to implement and analyze
• Efficient to compute
• Effective for many applications

$$a(I, U) = U^T I \in \mathbb{R}^d$$

Standard Linear Representations
Principal Component Analysis
• Designed to minimize the reconstruction error on the training set
• Obtained by calculating eigenvectors of the covariance matrix
Fisher Discriminant Analysis
• Designed to maximize the separation between means of each class
• Obtained by solving a generalized eigenvalue problem
Independent Component Analysis
• Designed to maximize the statistical independence among coefficients along different directions
• Obtained by solving an optimization problem with some objective function such as mutual information or negentropy

Standard Linear Representations - continued
Standard linear representations are suboptimal for recognition applications
• Evidence in the literature
• A toy example
– Standard representations give the worst recognition performance
[Figure label: Optimal component analysis]

Performance Measure - continued
Suppose there are C classes to be recognized
• Each class has $k_{train}$ training images
• Each class has $k_{cross}$ cross-validation images
• We used $h(x) = 1/(1 + \exp(-2bx))$

Performance Measure - continued
F(U) depends on the span of U but is invariant to change of basis
• In other words, F(U) = F(UO) for any d × d orthogonal matrix O
• The search space of F(U) is the set of all d-dimensional subspaces, which is known as the Grassmann manifold
– It is not a flat vector space, and gradient flow must take the underlying geometry of the manifold into account

Deterministic Gradient Flow - continued
Gradient at [J]
(first d columns of n x n identity matrix)

Deterministic Gradient Flow - continued
Gradient at U: Compute Q such that QU = J
Deterministic gradient flow on Grassmann manifold

Stochastic Gradient and Updating Rules
Stochastic gradient is obtained by adding a stochastic component
Discrete updating rules

MCMC Simulated Annealing Optimization Algorithm
Let $X_0$ be any initial condition and t = 0
1. Calculate the gradient matrix $A(X_t)$
2. Generate d(n-d) independent realizations of the $w_{ij}$'s
3. Compute the candidate Y for $X_{t+1}$ according to the updating rules
4. Compute F(Y) and $F(X_t)$, and set $dF = F(Y) - F(X_t)$
5. Set $X_{t+1} = Y$ with probability $\min\{\exp(dF/D_t), 1\}$; otherwise keep $X_{t+1} = X_t$
6. Set $D_{t+1} = D_t / g$ and set t = t + 1
7. Go to step 1

ORL Face Dataset

Performance Comparison

Real-time Scene Interpretation

Problem Statement for Scene Interpretation
Object detection and recognition problem
• Given a set of images, find regions in these images which contain instances of relevant objects
• Here the number of relevant objects is assumed to be large
– For example, the system should be able to handle 30,000 different kinds of objects, an estimate of the human brain’s capacity for basic level visual categorization [I. Biederman, Psychological Review, vol. 94, pp.
115-147, 1987]
Goal
• Develop a system that can achieve real-time detection and recognition for images of size 640 x 480 with high accuracy
– Say, at a frame rate of 15 frames per second

Existing Approaches
Fast methods but low accuracy
• One can, for example, classify one pixel at a time
• However, it is difficult to identify airplanes with high accuracy this way, due to high false positive and false negative rates
Methods with good accuracy but slow
• One can in theory use deformable template matching to locate instances of airplanes
• It may need several hours to process one image

Proposed Framework

Specifications and Requirements
We want to detect and recognize at least 30,000 object classes in images
• At four different scales
• Using exhaustive search of local windows, that is, we do not assume segmentation or other pre-processing
• If we assume objects are in some (e.g. 21 x 21) windows, this means that there will be many (18,432,000 per second at 15 frames per second) local windows to be classified/processed
• We want to do this on a 3.6 GHz Dell Precision workstation with an estimated performance of 28,665.4 MIPS
• This leaves a budget of about 1555 instructions to process each 21 x 21 local window

Requirements – cont.
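As a quick check of the instruction budget quoted in the specifications, the sketch below redoes the arithmetic. It assumes one local window per pixel position, per scale, per frame (an assumption made here because it reproduces the 18,432,000 figure exactly); the function names are hypothetical. The last function shows how a cheap decision on most windows frees up budget for the rest.

```python
def windows_per_second(width: int, height: int, scales: int, fps: int) -> int:
    """Local windows to classify per second, one per pixel position per scale."""
    return width * height * scales * fps


def instructions_per_window(mips: float, windows: int) -> float:
    """Average instruction budget per window for a machine rated at `mips`."""
    return mips * 1_000_000 / windows


def remaining_budget(average: float, fast_fraction: float, fast_cost: float) -> float:
    """Budget per window left for the hard cases when a fraction of the
    windows is handled with only `fast_cost` instructions each."""
    return (average - fast_fraction * fast_cost) / (1.0 - fast_fraction)


if __name__ == "__main__":
    n_windows = windows_per_second(640, 480, scales=4, fps=15)
    budget = instructions_per_window(28_665.4, n_windows)
    print(n_windows)                               # windows per second
    print(int(budget))                             # instructions per window
    print(round(remaining_budget(1555, 0.9, 100)))  # budget for the hard 10%
```

Under these assumptions the numbers on the slides are mutually consistent: the window count follows from the image size, scales, and frame rate, and the per-window budget follows from dividing the machine's instruction rate by that count.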
To achieve the specifications, we need two critical components
• A classifier that can reduce the average classification time effectively
– Note that on average we have 1555 instructions per window; if we can process 90% of the windows using only 100 instructions each, we have on average 14,650 instructions for each of the remaining 10% of local windows
• Features that can discriminate a large number of objects and can be computed using a few instructions
– Do such features exist?

Topological Local Spectral Histograms
We introduce a new class of features, which we call TLSH features
• A TLSH feature is defined relative to a chosen set of filters
• For a given filter, it is defined as a histogram of a local window of the filtered image
• One bin of the histogram is given by [equation on slide]

Topological Local Spectral Histogram Example
Convolution is implemented using FPGAs

Local Spectral Histogram Features

Topological Local Spectral Histograms – cont.
Why TLSH features?
• They provide a very rich set of over-complete features
– For example, with 22 filters there are 1,173,942 different TLSH features within a 21 x 21 region, considering different windows and different filters (231² = 53,361 rectangular sub-windows × 22 filters)
– TLSH features are more effective than Haar features used by Viola and Jones [P. Viola and M. Jones, International Journal of Computer Vision, vol. 57, pp.
137-154, 2004]

Comparison Between Haar and TLSH Features
[Figures, one comparison per dataset: ORL Face Dataset; COIL Dataset; Texture Dataset; Mixed Dataset]

Classifier
To achieve the specification, we also need a classifier that takes only a few instructions to make a decision on average
• At the same time, we need to achieve high accuracy
We propose to use a look-up table tree classifier
• I.e., a decision tree classifier where each node is implemented by a look-up table

Look-up Table Tree Classifier

An Example Path in a Decision Tree

Constructing Look-up Table Decision Tree
Joint optimization of clustering, TLSH features, and optimal linear projections
• We want to maximize the separations between marginal distributions of different clusters
• We can do the optimization iteratively
– We can do clustering first using current TLSH features and projections to maximize the separations
– We can find optimal TLSH features given linear projections
– Then we can find optimal linear projections given updated TLSH features

Performance Comparison
RCT – Rapid Classification Tree, implemented by Keith Haynes

Detection and Recognition
Content-based Video Representation, Indexing and Retrieval
A video is an extrinsic 3D representation of a 4D volume
• 3D spatial space + 1D temporal space = 4D volume
• For video, 2D image space + 1D temporal space = 3D volume
Our group is working on an intrinsic 4D representation for video
• By first reconstructing the scene using SLAM (Simultaneous Localization and Mapping) and stereopsis

Computer Vision on Mobile Devices
Source: http://www.appliedmediaanalysis.com/MATES.htm

Computer Vision for Gerontechnology
As mobile devices become more powerful, they may serve as an efficient interface to compensate for visual, memory, and other deficiencies due to aging
• Society is aging
– For example, people aged 65 and older are 16.8% of Florida’s population (US Census Bureau, 2005)
• By modifying and enhancing environments, vision technology can be critical for helping people stay active and independent

Motivations
The goal is to organize a large number of images so that they can be retrieved efficiently and effectively based on content

Problems And Approach
Organization of large libraries of images for efficient content-based indexing and retrieval of images
Image Categorization - We assume that a training database containing labeled images representing various different classes is available
• The goal is to learn optimal low-dimensional features that can be used to assign a new query image to the correct class
Image Retrieval - The objective is to find the top ℓ matches in a database to a query image

Preliminary Experiments With SH-features
Dataset: COREL-1000 (100 images in each of 10 categories)
We utilize a bank of 5 filters and apply each filter to the R, G, and
B channels of the images to obtain a total of fifteen 11-bin histograms per image
• Thus, the SH-feature vector h(I, F) has dimension 15 × 11 = 165
For a query image I, we calculate the Euclidean distances between h(I, F) and h(J, F), for every J in the database, and rank the images according to increasing distances
• To quantify retrieval performance and compare the results with those reported for other methods, we use the weighted precision and the average rank

The Weighted Precision
For a query image I, the retrieval precision for the top ℓ returns is $n_\ell(I)/\ell$, where $n_\ell(I)$ is the number of correct returns
The weighted precision for COREL-1000 for I is defined as [equation on slide]

The Average Rank
For each query image, we rank all the images in the dataset
• The average rank is the mean rank of all images that belong to the same class as the query image
• For COREL-1000, the perfect value for a query image is 50.5 (the mean of ranks 1 through 100)

Average Precision: Comparison

Average Rank: Comparison

Content-based Image Retrieval
We now apply the image-categorization classifier learned with OFA to retrieve images
• Note that the low-dimensional image representation was optimized to categorize images correctly according to the nearest neighbor criterion, but not to rank matches to a query image correctly according to their distances in feature space
The goal is to exploit the image categorization method to retrieve images
• The idea is to use the distance in feature space to assign probabilities that measure the compatibility of the image with a given class.
Use these probabilities to rank classes and retrieve images using this ranking

Image Retrieval - continued
Given a query image I and a positive integer ℓ, the goal is to retrieve a ranked list of ℓ images from the database.
• We assume that all images in the database have been indexed according to content using the classifier learned with OFA.
• Rank the classes according to the probabilities p(i|I).
• Select as many images as possible from the most likely class.
• Within this class, images are retrieved and ranked according to their Euclidean distances to I as measured in the reduced feature space (several variants possible).
• Once that class is exhausted, we proceed similarly with the second most likely class and iterate the procedure until ℓ images are obtained.

Experimental Results
OFA was used to “learn” a 9-dimensional linear reduction of the 165-dimensional SH-features, with 400 training images, 40 from each class
• Leave-one-out was used for cross-validation
The entire database was indexed with the nearest neighbor classifier applied to the reduced features
All 1,000 images were used as query images
• For each class i, the average weighted precision and the average rank were calculated for comparison with SIMPLIcity and Spectral Histograms

Average Precision: Comparison

Average Rank: Comparison

Examples
The top-left image is the query image, which is also the top return

More Examples

Precision–Recall

Average Recall vs.
Average Precision

Invariant Content Characterization Based on Single View 3D Reconstruction
Many of the images share the following characteristics
• There is a ground plane, which is relatively flat
• On the ground plane, we have objects that can be approximated by planar sides
– Note that the images will change depending on the view angles

Single View 3D Reconstruction
We want to characterize the meaningful contents of images
• The ground itself is in general not interesting
• We want to “capture” the objects on the ground

Ground Plane Estimation Using Horizon Line
For a pinhole camera model, the points at infinity of each plane project to a line in the image

Pinhole Camera Model

Plane at Infinity
$$AX + BY + CZ + D = 0$$
$$A\,\frac{x_{cam} Z}{f} + B\,\frac{y_{cam} Z}{f} + CZ + D = 0$$
$$A\,\frac{x_{cam}}{f} + B\,\frac{y_{cam}}{f} + \frac{D}{Z} + C = 0$$
$$A\,\frac{x_{cam}}{f} + B\,\frac{y_{cam}}{f} + C = 0 \quad \text{if } Z \to \infty$$
That is, we can estimate the ground plane’s normal if we can estimate the horizon in the image

Vertical Planes
After we estimate the ground plane, we can then estimate in 3D any vertical plane if we know (at least) two pixels on the intersection line between the vertical plane and the ground plane
• We now approximate the scene using boxes in 3D

Invariant Representations of Content
For content-based retrieval, we derive a representation of an image that is scale and view independent
• By representing each side of each 3D box using a normalized view
– In other words, for each side of the box, we place a virtual camera at a fixed distance (which fixes the scale) and a fixed view, whose up vector is perpendicular to the ground plane (which fixes the view)

An Example

Shape Theory
We want to quantify the difference between two shapes in a principled way
• We do this by
constructing a shape space and then using the geodesic distance between two shapes on the shape manifold as the metric

Shape Clustering

Clustering Dendrogram

Surface Parametrization

Geodesic Interpolation Between Surfaces

Atlas for Hippocampus

Model for Blindness

Computer Vision for Computational Systems Biology
The goal of systems biology is to link molecular and cellular events and properties to physiological functions
Source: “Integration from proteins to organs: The Physiome Project”, Nature Review, Vol. 4, 2007.

Live Cell Imaging at Cellular Level

MRI / fMRI / CT / PET Imaging

FISHFinder@FSU
Source: Gilbert’s group at Biology Department, FSU

High Throughput Nanoscale Localization
In cellular and molecular biology, a typical problem is that biologists need to localize marked proteins in various areas

Location Aware Services
As ubiquitous computing becomes a reality, location aware services become a critical component
• An example is GPS-based services
• Currently, with Prof.
Zhang, we are studying a dramatically new way of localizing objects through RFID tags with 2 mm accuracy

Activity Monitoring for Elderly
With RFID tags, we can identify and localize many objects
• By integrating with built-in cameras in phones, we can estimate a three-dimensional model of the environment along with the states of the objects
• An envisioned program is one in which a person can remotely get a summary and other statistics of the daily activities of elderly people who live independently

Courses
Most Relevant Courses
• CAP 5638 Pattern Recognition – Offered Fall 2009
• CAP 5415 Principles and Algorithms of Computer Vision
• CAP 6417 Theoretical Foundations of Computer Vision
• STA 5106 Computational Methods in Statistics I
• STA 5107 Computational Methods in Statistics II
• ISC 5935-05/STA 5934-01 Applied Machine Learning
• Seminars and advanced studies
Related Courses
• CAP 5615 Artificial Neural Networks
• CAP 5600 Artificial Intelligence

Funding of the Group
National Science Foundation: DMS, CISE, IIS, ACT, CCF
National Institutes of Health
10/14/2008 10:24:04 AM intro-seminar-08.ppt

Summary
The Computer Vision Group offers interesting research topics/projects
• Efficient representations for generic images and videos
• Real-time detection and recognition of objects
• Computational models for object recognition and image classification
• Medical/biological image analysis
• Motion/video sequence analysis and modeling
• They are challenging, interesting, and exciting
• Now it is a productive and fruitful area to be in

Contact Information
• Name: Xiuwen Liu
• Web sites: http://cavis.fsu.edu and http://www.cs.fsu.edu/~liux
• Email: liux@cs.fsu.edu
• Offices: LOV 166 and Eppes 102
• Phones: 644-0050 and 645-2257

Thank you!
Any questions?