Research Activities at Center for Applied Vision and Imaging Sciences and Florida State Vision Group
Florida State University

Xiuwen Liu
Department of Computer Science
Florida State University
http://cavis.fsu.edu & http://fsvision.fsu.edu

Research Statement
My research goal is to create machines that can “see” with human-like performance
• This seems a trivial problem, as each of us can do this without any effort
• Computer + Camera = “A Seeing Machine”?

Visual Pathway

Visual Illusion

Outline
Motivations
• Some applications of computer vision and pattern recognition techniques
Some of the research projects
Related Courses
Contact information

Computer Vision Applications
No hands across America
• Sponsored by Delco Electronics, AssistWare Technology, and Carnegie Mellon University
• Navlab 5 drove from Pittsburgh, PA to San Diego, CA, using the RALPH computer program
• The trip was 2849 miles, of which 2797 miles (98.2%) were driven automatically with no hands

Computer Vision Applications – continued
Human-Computer Interactions
Sign Language Recognition
CyberKnife
CyberKnife – Cont.
Image-Guided Neurosurgery
Intelligent Transportation Systems
http://dfwtraffic.dot.state.tx.us/dal-cam-nf.asp

Computer Vision Applications – cont.
Military applications
• Automated target recognition

Computer Vision Applications – continued
Biometrics – cont.
Iris code can achieve zero false acceptance

Computer Vision in Sports
How was the yellow created?

Generic Image Modeling
How can we characterize all these images perceptually?

Spectral Histogram Representation
Spectral histogram
• Given a bank of filters $F^{(a)}$, $a = 1, \ldots, K$, a spectral histogram is defined as the marginal distributions of the filter responses

\[ I^{(a)}(v) = F^{(a)} * I(v) \]
\[ H_I^{(a)}(z) = \frac{1}{|I|} \sum_{v} \delta\big(z - I^{(a)}(v)\big) \]
\[ H_I = \big(H_I^{(1)}, H_I^{(2)}, \ldots, H_I^{(K)}\big) \]

Spectral Histogram Representation - continued
Choice of filters
• Laplacian of Gaussian filters
• Gabor filters
• Gradient filters
• Intensity filter
[Figure panels: LoG filter; Gabor filter]

Spectral Histogram Representation - continued

Texture Synthesis Examples - continued
[Figure panels: Observed image; Synthesized image; An image with periodic structures]

Object Synthesis Examples - continued

Performance Comparison

Face Detection Based On Spectral Representations
Face detection aims to find all instances of faces in a given image
Each image window is represented by its spectral histogram
• A support vector machine is trained on training faces
• The trained support vector machine is then used to classify each image window in an input image
More results at http://fsvision.fsu.edu/face-detection

Face detection - continued

Rotation Invariant Face Detection

Rotation Invariant Face Detection - continued

Linear Representations
Linear representations are widely used in appearance-based object recognition and other applications
• Simple to implement and analyze
• Efficient to compute
• Effective for many applications

\[ a(I, U) = U^{T} I \in \mathbb{R}^{d} \]

Standard Linear Representations
Principal Component Analysis
• Designed to minimize the reconstruction error on the training set
• Obtained by calculating eigenvectors of the covariance matrix
Fisher Discriminant Analysis
• Designed to maximize the separation between class means relative to the within-class scatter
• Obtained by solving a generalized eigenvalue problem
Independent Component Analysis
• Designed to maximize the statistical independence among coefficients along different directions
• Obtained by solving an optimization problem with some objective function such as
mutual information, negentropy, ...

Standard Linear Representations - continued
Standard linear representations are suboptimal for recognition applications
• Evidence in the literature
• A toy example
  – Standard representations give the worst recognition performance

Optimal Component Analysis

Performance Measure - continued
Suppose there are C classes to be recognized
• Each class has $k_{train}$ training images
• Each class has $k_{cross}$ cross-validation images
• We used $h(x) = 1/(1 + \exp(-2bx))$

Performance Measure - continued
F(U) depends on the span of U but is invariant to a change of basis
• In other words, F(U) = F(UO) for any d x d orthogonal matrix O
• The search space of F(U) is the set of all d-dimensional subspaces, which is known as the Grassmann manifold
  – It is not a flat vector space, so the gradient flow must take the underlying geometry of the manifold into account

Deterministic Gradient Flow - continued
Gradient at [J] (J is the first d columns of the n x n identity matrix)

Deterministic Gradient Flow - continued
Gradient at U: compute Q such that QU = J
Deterministic gradient flow on the Grassmann manifold

Stochastic Gradient and Updating Rules
The stochastic gradient is obtained by adding a stochastic component
Discrete updating rules

MCMC Simulated Annealing Optimization Algorithm
Let $X_0$ be any initial condition and set t = 0
1. Calculate the gradient matrix $A(X_t)$
2. Generate d(n-d) independent realizations of the $w_{ij}$'s
3. Compute the candidate Y (for $X_{t+1}$) according to the updating rules
4. Compute F(Y) and $F(X_t)$ and set $dF = F(Y) - F(X_t)$
5. Set $X_{t+1} = Y$ with probability $\min\{\exp(dF/D_t), 1\}$; otherwise set $X_{t+1} = X_t$
6. Set $D_{t+1} = D_t / g$ and set t = t + 1
7. Go to step 1

ORL Face Dataset

Performance Comparison

Performance Comparison – cont.

Brain Curve Classification

Brain Curve Classification – cont.
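
The MCMC simulated annealing steps listed under Optimal Component Analysis can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the group's implementation: the updating rules on the slides did not survive in the text, so the sketch substitutes a finite-difference gradient projected onto the tangent space and a QR-based retraction in place of a geodesic update, and it uses a toy performance function F(U) = trace(U^T S U) in place of the recognition-based F(U). The names retract, tangent_gradient, and anneal, and the parameters step, D0, and g, are illustrative only.

```python
import numpy as np


def retract(X):
    """Return an orthonormal basis for the column span of X (via QR)."""
    Q, _ = np.linalg.qr(X)
    return Q


def tangent_gradient(F, X, eps=1e-6):
    """Finite-difference gradient of F at X, projected onto the tangent space.

    Assumption: F depends only on the span of X, so it can be evaluated
    on any orthonormal basis of the perturbed subspace.
    """
    n, d = X.shape
    base = F(X)
    G = np.zeros((n, d))
    for i in range(n):
        for j in range(d):
            E = np.zeros((n, d))
            E[i, j] = eps
            G[i, j] = (F(retract(X + E)) - base) / eps
    return G - X @ (X.T @ G)  # remove components inside span(X)


def anneal(F, X0, steps=200, D0=1.0, g=1.05, step=0.1, seed=0):
    """Stochastic gradient search over subspaces with annealed temperature D."""
    rng = np.random.default_rng(seed)
    X, D = retract(X0), D0
    n, d = X.shape
    for _ in range(steps):
        A = tangent_gradient(F, X)                    # step 1: gradient
        W = rng.standard_normal((n, d))               # step 2: random component
        W -= X @ (X.T @ W)                            # keep it tangent: d(n-d) free directions
        Y = retract(X + step * A + np.sqrt(2 * step * D) * W)  # step 3 (assumed rule)
        dF = F(Y) - F(X)                              # step 4
        if dF >= 0 or rng.random() < np.exp(dF / D):  # step 5: accept w.p. min{exp(dF/D), 1}
            X = Y
        D /= g                                        # step 6: cool down
    return X


# Toy performance function (an assumption): favors the top eigenvectors of S.
S = np.diag([5.0, 4.0, 3.0, 2.0, 1.0, 0.0])


def F(U):
    return float(np.trace(U.T @ S @ U))


X0 = np.random.default_rng(1).standard_normal((6, 2))
X = anneal(F, X0)
print("F at start:", F(retract(X0)), "F after annealing:", F(X))
```

Because F depends only on the span of U, any orthonormal basis of the same subspace gives the same value, which is why the retraction can pick whichever basis QR returns.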

Real-time Scene Interpretation
Object detection and recognition problem
• Given a set of images, find regions in these images which contain instances of relevant objects
• Here the number of relevant objects is assumed to be large
  – For example, the system should be able to handle 30,000 different kinds of objects, an estimate of the human brain’s capacity for basic level visual categorization [I. Biederman, Psychological Review, vol. 94, pp. 115-147, 1987]

Global Monitoring Through High-resolution Satellite Images

Problem Statement for Scene Interpretation
Goal
• Develop a system that can achieve real-time detection and recognition for images of size 640 x 480 with high accuracy
  – Say, at a frame rate of 15 frames per second

Existing Approaches
Fast methods but low accuracy
• One can, for example, classify one pixel at a time
• However, it is difficult to identify airplanes with high accuracy this way because of the high rates of false positives and false negatives

Existing Approaches – cont.
Methods with good accuracy but slow
• One can in theory use deformable template matching to locate instances of airplanes
• It may need several hours to process one image

Proposed Framework

Specifications and Requirements
We want to detect and recognize at least 30,000 object classes in images
• At four different scales
• Using exhaustive search of local windows, that is, we do not assume segmentation or other pre-processing
• If we assume objects are in some (e.g.
21 x 21) windows, this means that there will be 18,432,000 local windows to classify per second (640 x 480 window positions x 4 scales x 15 frames per second)
• We want to do this on a 3.6 GHz Dell Precision workstation with an estimated performance of 28,665.4 MIPS
• This leaves about 1555 instructions (28,665.4 million instructions per second / 18,432,000 windows per second) to process each 21 x 21 local window

Requirements – cont.
To achieve the specifications, we need two critical components
• A classifier that can reduce the average classification time effectively
  – Note that on average we have 1555 instructions per window; if we can process 90% of the windows using only 100 instructions each, we have on average (1555 - 0.9 x 100) / 0.1 = 14,650 instructions for each of the remaining 10% of the local windows
• Features that can discriminate a large number of objects and can be computed using a few instructions
  – Do such features exist?

Topological Local Spectral Histograms
We introduce a new class of features, which we call TLSH features
• They are defined relative to a chosen set of filters
• For a given filter, a TLSH feature is defined as a histogram over a local window of the filtered image
• One bin of the histogram is given by

Topological Local Spectral Histogram Example
Convolution is implemented using FPGAs

Local Spectral Histogram Features

Field Programmable Gate Arrays
• Two primary methods for computation
  – Hard-wired Application Specific Integrated Circuit (ASIC)
  – Software-programmed microprocessors
• New approach
  – Programmable hardware
• Field Programmable Gate Arrays (FPGAs) represent a breakthrough in computing technology
  – Especially for intrinsically parallel applications

μP / ASIC / FPGA Comparison Summary
μP | ASIC | FPGA
Programmable (flexible) | Fixed design functionality (inflexible) | Programmable (flexible)
Relatively slow, serial computation | Very fast, highly parallelized computation | Fast, parallel computation
Floating and fixed point | Fixed point / floating | Fixed point / floating
Relatively inexpensive design cycle (software) | Expensive design cycle (requires chip design) | Relatively inexpensive design cycle
Limited bandwidth | Very high bandwidth | Near-ASIC bandwidth
Standard high-level languages (C/C++) or assembly | Hardware description language for design / simulation (VHDL / Verilog) | Hardware description language for design / simulation (VHDL / Verilog)

Hardware vs. Software
• Software implementation:

\[ y(n) = \sum_{k=0}^{L-1} x_k(n)\, h_k \]

    Sum = 0.0
    i = 0
    while (i < L)
        tmp = x(i) * h(i)
        Sum = Sum + tmp
        i = i + 1
    end

A typical software implementation takes 4*L instructions to compute one convolution (four instructions per loop iteration: compare, multiply, add, and increment)

Hardware vs. Software
A custom hardware implementation
• Multiply/accumulate operations are performed in parallel
• Can be done in one clock cycle

Convolution Timing Diagram
[Figure: convolution start signal and clock; all nine response values finish together, giving 9 new response values every 7 clock cycles]

Topological Local Spectral Histograms – cont.
Why TLSH features?
• They provide a very rich set of over-complete features
  – For example, with 22 filters there are 1,173,942 different TLSH features within a 21 x 21 region, considering different windows and different filters (231^2 = 53,361 rectangular windows x 22 filters)
  – TLSH features are more effective than the Haar features used by Viola and Jones [P. Viola and M. Jones, International Journal of Computer Vision, vol. 57, pp.
137-154, 2004]

ORL Face Dataset
Comparison Between Haar and TLSH Features
COIL Dataset
Comparison Between Haar and TLSH Features
Texture Dataset
Comparison Between Haar and TLSH Features
Mixed Dataset
Comparison Between Haar and TLSH Features

Classifier
To achieve the specification, we also need a classifier that takes only a few instructions on average to make a decision
• At the same time, we need to achieve high accuracy
We propose to use a look-up table tree classifier
• I.e., a decision tree classifier where each node is implemented by a look-up table

Look-up Table Tree Classifier

An Example Path in a Decision Tree

Constructing Look-up Table Decision Tree
Joint optimization of clustering, TLSH features, and optimal linear projections
• We want to maximize the separations between marginal distributions of different clusters
• We can do the optimization iteratively
  – First, cluster using the current TLSH features and projections to maximize the separations
  – Next, find optimal TLSH features given the linear projections
  – Then, find optimal linear projections given the updated TLSH features

Performance Comparison
RCT – Rapid Classification Tree, implemented by Keith Haynes

Detection and Recognition

Shape Theory
We want to quantify the difference between two shapes in a principled way
• We do this by constructing a shape space and then using the geodesic distance between two shapes on the shape manifold as the metric

Shape Clustering

Clustering Dendrogram

Sulcal Curves
Sulcal curves are important for characterizing brain functions

Clustering of Sulcal Curves

Modeling Mathematical Abilities and Disabilities
As it is possible to acquire detailed surfaces of the human brain, one may ask how characteristics of the brain structure affect mathematical abilities and disabilities
• The U.S. Department of Education wants to know, so that it can understand and address the difficulties young children have with mathematics
[Figure: corpus callosum examples of young children without mathematical disabilities (a) and with (b)]

SurfaVision – A Surface-based Vision System
One of the challenges is how to build a machine vision system that is robust
• This has proven to be very difficult after several decades of computer vision research
We may now have a solution for applications in an indoor environment

Multi-Camera Multi-Projector Scanning

Surface Parametrization

Geodesic Interpolation Between Surfaces

Robust Visual Inference
With a common domain for surface representations, we can pose visual inference in the Bayesian framework by building probability models

Human-Robot Collaborative Interaction
The goal is to make robots aware of the positions, poses, expressions, moods, and other factors of humans so that robots can interact with humans collaboratively
In collaboration with Prof. Emmanuel Collins at the College of Engineering

Automated 3D Phenotype Measurement
The central problem in biology is to understand the relationship between genotype and phenotype
• With the availability of genomes of humans and model organisms, the central problem becomes how to measure phenotype at a large scale

3D Urban Models

Courses
Most Relevant Courses
• CAP 5638 Pattern Recognition
• CAP 5415 Principles and Algorithms of Computer Vision
• CAP 6417 Theoretical Foundations of Computer Vision
• STA 5106 Computational Methods in Statistics I
• STA 5107 Computational Methods in Statistics II
• Seminars and advanced studies
Related Courses
• CAP 5615 Artificial Neural Networks
• CAP 5600 Artificial Intelligence
• CAP 5xxx Machine Learning

Funding of the Group
National Science Foundation
• DMS, CISE, IIS, FRG, ACT, CCF
NGA – National Geospatial-Intelligence Agency
Army Research Office
• DURIP
• Research grant
Companies
• Next Century and others under negotiation

Summary
The CAVIS group and the FSvision group offer interesting research topics/projects
• Efficient representations for generic images
• Real-time detection and recognition
• Computational models for object recognition and image classification
• Medical image analysis
• Motion/video sequence analysis and modeling
• They are challenging
• They are interesting
• They are exciting

Contact Information
• Name: Xiuwen Liu
• Web sites: http://cavis.fsu.edu, http://fsvision.fsu.edu, http://www.cs.fsu.edu/~liux
• Email: liux@cs.fsu.edu
• Offices: LOV 166 and 118 North Woodward Ave.
• Phones: 644-0050 and 645-2257

Thank you!
Any questions?