Research Activities at Florida State Vision Group
Xiuwen Liu
Department of Computer Science
Florida State University
http://www.cs.fsu.edu/~liux/courses/intro-seminar-10.ppt

Research Statement
My research goal is to create machines that can “see” with human-level and superhuman performance, and to develop their applications
  • This seems a trivial problem, as each of us can see without any effort
  • Computer + Camera = “A Seeing Machine”?
9/11/2010 11:50:27 PM intro-seminar-10.ppt

Visual Pathway

Visual Illusion

Outline
Why computer vision and pattern recognition
  • Motivating applications
Some of my research projects
Related courses
Contact information

What is the Passion?

Image-Guided Neurosurgery

Computer Vision Applications – cont.
Military applications
  • Automated target recognition

Biometrics
Iris code can achieve zero false acceptance

Computer Vision in Sports
How was the yellow created?

Social Health
The coming epidemic – Alzheimer’s
  • There is no cure, but early detection is the key
  • How to do it?

Smart Energy
U.S.
smart grid initiative

Cyber-Physical Systems
http://dpolyakov.com/images/design/smartplanet_040.jpg

Computational Biology
Life is fundamentally digital and so is biology

Research Projects
Image and shape representations
  • Image modeling
  • Video analysis
  • Medical image analysis
Media for all – Automatic video description generation
Cyber-physical systems – RFID Localization
Computational Biology
Classes

Generic Image Modeling
How can we characterize all these images perceptually?

Spectral Histogram Representation
Spectral histogram
  • Given a bank of filters $F^{(\alpha)}$, $\alpha = 1, \ldots, K$, a spectral histogram is defined as the marginal distribution of filter responses
      $I^{(\alpha)}(v) = F^{(\alpha)} * I(v)$
      $H_I^{(\alpha)}(z) = \frac{1}{|I|} \sum_v \delta(z - I^{(\alpha)}(v))$
      $H_I = (H_I^{(1)}, H_I^{(2)}, \ldots, H_I^{(K)})$

Spectral Histogram Representation - continued
Choice of filters
  • Laplacian of Gaussian filters
  • Gabor filters
  • Gradient filters
  • Intensity filter
Figure panels: LoG filter, Gabor filter

Face detection - continued

Rotation Invariant Face Detection

Rotation Invariant Face Detection - continued

Linear Representations
Linear representations are widely used in appearance-based object recognition and other applications
  • Simple to implement and analyze
  • Efficient to compute
  • Effective for many applications
      $\alpha(I, U) = U^T I \in \mathbb{R}^d$

Standard Linear
Representations
Principal Component Analysis
  • Designed to minimize the reconstruction error on the training set
  • Obtained by calculating eigenvectors of the covariance matrix
Fisher Discriminant Analysis
  • Designed to maximize the separation between class means relative to the within-class scatter
  • Obtained by solving a generalized eigenvalue problem
Independent Component Analysis
  • Designed to maximize the statistical independence among coefficients along different directions
  • Obtained by solving an optimization problem with an objective function such as mutual information, negentropy, ...

Optimal Component Analysis

ORL Face Dataset

Performance Comparison

Real-time Scene Interpretation

Problem Statement for Scene Interpretation
Object detection and recognition problem
  • Given a set of images, find regions in these images which contain instances of relevant objects
  • Here the number of relevant objects is assumed to be large
    – For example, the system should be able to handle 30,000 different kinds of objects, an estimate of the human brain’s capacity for basic-level visual categorization [I. Biederman, Psychological Review, vol. 94, pp.
115-147, 1987]
Goal
  • Develop a system that can achieve real-time detection and recognition for images of size 640 x 480 with high accuracy
    – Say, at a frame rate of 15 frames per second

Proposed Framework

Specifications and Requirements
We want to detect and recognize at least 30,000 object classes in images
  • At four different scales
  • Using exhaustive search of local windows, that is, we do not assume segmentation or other pre-processing
  • If we assume objects fit in fixed-size (e.g. 21 x 21) windows, there will be 640 x 480 x 4 scales x 15 frames = 18,432,000 local windows to classify each second
  • We want to do this on a 3.6 GHz Dell Precision workstation with an estimated performance of 28,665.4 MIPS
  • This leaves about 1555 instructions to process each 21 x 21 local window

Requirements – cont.
To achieve the specifications, we need two critical components
  • A classifier that can reduce the average classification time effectively
    – Note that on average we have 1555 instructions per window; if we can process 90% of the windows using only 100 instructions each, we have on average 14,650 instructions for each of the remaining 10% of local windows
  • Features that can discriminate a large number of objects and can be computed using a few instructions
    – Do such features exist?
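
The instruction budget quoted on the two requirements slides can be reproduced with a few lines of arithmetic. The sketch below is only an illustration. It assumes one local window is anchored at every pixel, at every scale, in every frame, and it ignores image borders, which is the reading that gives the 18,432,000 figure. The function names are hypothetical and do not come from the slides.

```python
# Back-of-the-envelope instruction budget from the requirements slides.
# Assumption (not stated on the slides): one local window is anchored at
# every pixel, at every scale, in every frame, and image borders are ignored.

def windows_per_second(width=640, height=480, scales=4, fps=15):
    """Number of local windows that must be classified each second."""
    return width * height * scales * fps


def instructions_per_window(mips, n_windows):
    """Average instruction budget per window for a machine rated at `mips`."""
    return mips * 1_000_000 / n_windows


def budget_for_hard_windows(avg_budget, easy_fraction, easy_cost):
    """Average budget left per window for those not handled cheaply."""
    hard_fraction = 1.0 - easy_fraction
    return (avg_budget - easy_fraction * easy_cost) / hard_fraction


n_windows = windows_per_second()
avg_budget = int(instructions_per_window(28_665.4, n_windows))
hard_budget = budget_for_hard_windows(avg_budget, easy_fraction=0.90, easy_cost=100)

print(n_windows)           # 18432000
print(avg_budget)          # 1555
print(round(hard_budget))  # 14650
```

The last function shows why early rejection matters: the cheaper the common case, the larger the budget left for the windows that need a careful decision.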

Local Spectral Histograms
We introduce a new class of features, which we call local spectral histogram (LSH) features
  • They are defined relative to a chosen set of filters
  • For a given filter, the feature is the histogram of a local window of the filtered image
  • One bin of the histogram is given by [equation shown on slide]

Local Spectral Histogram Example
Convolution is implemented using FPGAs

Local Spectral Histogram Features

ORL Face Dataset

Comparison Between Haar and LSH Features

COIL Dataset

Comparison Between Haar and LSH Features

Texture Dataset

Comparison Between Haar and LSH Features

Mixed Dataset

Comparison Between Haar and LSH Features

Classifier
To achieve the specification, we also need a classifier that takes only a few instructions to make a decision on average
  • At the same time, we need to achieve high accuracy
We propose to use a look-up table tree classifier
  • I.e., a decision tree classifier where each node is implemented by a look-up table

Look-up Table Tree Classifier

An Example Path in a Decision Tree

Performance Comparison
RCT – Rapid Classification Tree, implemented by Keith Haynes

Detection
and Recognition

Content-based Video Representation, Indexing and Retrieval
A video is an extrinsic 3D representation of a 4D volume
  • 3D spatial space + 1D temporal space = 4D volume
  • For video, 2D image space + 1D temporal space = 3D volume
Our group is working on an intrinsic 4D representation for video
  • By first reconstructing the scene using SLAM (simultaneous localization and mapping) and stereopsis

4D Video Representation Example

VeScene System
VeScene – Voiced Scene System
Illustration by Nan Zhao

Computer Vision for Gerotechnology
As mobile devices become more powerful, they may serve as an efficient interface to compensate for visual, memory, and other deficiencies due to aging
  • Society is aging
    – For example, people aged 65 and older are 16.8% of Florida’s population (US Census Bureau, 2005)
  • By modifying and enhancing environments, vision technology can be critical for helping people stay active and independent

Early Detection of Alzheimer’s Through Gait
Alzheimer’s can be detected reliably five years before it can be clinically detected otherwise
Other indicators include typing/writing and content analysis
  • These are controlled by cognitive functions that depend on brain areas affected by Alzheimer’s
  • Usage of words is also affected
Collaborating with Prof.
Tyson to do early detection using smartphones

Shape Theory
We want to quantify the difference between two shapes in a principled way
  • We do this by constructing a shape space and then using the geodesic distance between two shapes on the shape manifold as the metric

Surface Parametrization

Geodesic Interpolation Between Surfaces

Atlas for Hippocampus

Characterizing Alzheimer’s via Shape Change

Computer Vision for Computational Systems Biology
The goal of systems biology is to link molecular and cellular events and properties to physiological functions
Source: “Integration from proteins to organs: The Physiome Project”, Nature Review, Vol. 4, 2007.

FISHFinder@FSU
Source: Gilbert’s group at Biology Department, FSU

High Throughput Nanoscale Localization
In cellular and molecular biology, a typical problem is that biologists need to localize marked proteins in various areas

Live Cell Imaging at Cellular Level

QUEST Project
Quantitative Elastic Spatial-Temporal Atlases for Subcellular Structures

Atomic Tomography for Electron Microscopy
As life is digital, the ultimate necessary resolution for modeling biological processes is atomic resolution
  • As life is digital, the ultimate models will be discrete rather than continuous
  • It appears that atomic reconstruction is almost within reach using a two-stage tomography algorithm
    – To be named ATomo
    – Being developed by Chaity and others

Cyber Physical Systems
As ubiquitous computing becomes a reality, location-aware services become a critical
component
  • An example is GPS-based services
  • Currently, with Prof. Zhang, we are studying a dramatically new way of localizing objects through RFID tags with 2 mm accuracy

Fine Granularity Localization of RFIDs

Intelligent Human-Computer Interface
Activity Monitoring for Elderly
With RFID tags, we can identify and localize many objects
  • By integrating with the built-in cameras in phones, we can estimate a three-dimensional model of the environment along with the states of the objects
  • An envisioned application is one in which a person can remotely get a summary and other statistics of the daily activities of elderly people who live independently

Courses
Most Relevant Courses
  • CAP 5638 Pattern Recognition
    – Offered 2011-2012
  • CAP 5415 Principles and Algorithms of Computer Vision
    – Will be offered Spring 2010
  • CAP 6417 Theoretical Foundations of Computer Vision
  • STA 5106 Computational Methods in Statistics I
  • STA 5107 Computational Methods in Statistics II
  • ISC 5935-05/STA 5934-01 Applied Machine Learning
  • Seminars and advanced studies
Related Courses
  • CAP 5726 Computer Graphics
  • CAP 5600 Artificial Intelligence

Funding of the Group
National Science Foundation
  • DMS
  • CISE IIS
  • ACT
  • CCF
National Institutes of Health
Industry
  • Harris

Summary
Computer Vision Group offers interesting research topics/projects
  • Effective and intrinsic representations for images and videos
  • Real-time detection and recognition of objects
  • Computational models for object recognition and image classification
  • Medical/biological image analysis
  • Motion/video sequence analysis and modeling
  • They are challenging, interesting, and exciting
  • Now it is a productive and fruitful area to be in

Contact Information
  • Name: Xiuwen Liu
  • Web sites: http://cavis.fsu.edu and http://www.cs.fsu.edu/~liux
  • Email: liux@cs.fsu.edu
  • Offices: LOV 166 and Eppes 102
  • Phones: 644-0050 and 645-2257

Thank you! Any questions?