Research Activities at
Center for Applied Vision and Imaging Sciences and
Florida State Vision Group
Florida State University
Xiuwen Liu
Department of Computer Science
Florida State University
http://cavis.fsu.edu & http://fsvision.fsu.edu
Research Statement
My research goal is to create machines that can “see” with performance similar to that of humans
• This seems a trivial problem, as each of us can do it without any effort
• Computer + Camera = “A See Machine”?
Visual Pathway
Visual Illusion
Outline
Motivations
• Some applications of computer vision and pattern recognition techniques
Some of the research projects
Related courses
Contact information
Computer Vision Applications
No Hands Across America
• Sponsored by Delco Electronics, AssistWare Technology, and Carnegie Mellon University
• Navlab 5 drove from Pittsburgh, PA to San Diego, CA, using the RALPH computer program
• The trip was 2849 miles, of which 2797 miles were driven automatically with no hands
– That is, 98.2%
Computer Vision Applications – continued
Human-Computer Interaction
Sign Language Recognition
CyberKnife
CyberKnife – Cont.
Image-Guided Neurosurgery
Intelligent Transportation Systems
http://dfwtraffic.dot.state.tx.us/dal-cam-nf.asp
Computer Vision Applications – cont.
Military applications
• Automated target recognition
Computer Vision Applications – continued
Biometrics – cont.
Iris codes can achieve a zero false acceptance rate
Computer Vision in Sports
How was the yellow line created?
Generic Image Modeling
How can we characterize all these images perceptually?
Spectral Histogram Representation
Spectral histogram
• Given a bank of filters $F^{(\alpha)}$, $\alpha = 1, \ldots, K$, a spectral histogram is defined as the marginal distribution of filter responses:
$$I^{(\alpha)}(v) = F^{(\alpha)} * I(v)$$
$$H_I^{(\alpha)}(z) = \frac{1}{|I|} \sum_{v} \delta\left(z - I^{(\alpha)}(v)\right)$$
$$H_I = \left(H_I^{(1)}, H_I^{(2)}, \ldots, H_I^{(K)}\right)$$
Spectral Histogram Representation - continued
Choice of filters
• Laplacian of Gaussian filters
• Gabor filters
• Gradient filters
• Intensity filter
(Figure: examples of a LoG filter and a Gabor filter)
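For reference, LoG and Gabor kernels like those listed above can be generated from their textbook formulas. This sketch assumes an unnormalized Gaussian exp(-r^2/(2 sigma^2)), a cosine (even) Gabor with an isotropic envelope, and hypothetical parameter names (sigma, wavelength, theta, radius); the scales and orientations actually used in the filter bank are not stated here.

```python
import math


def log_kernel(sigma, radius):
    """Laplacian of exp(-r^2 / (2 sigma^2)), shifted to zero mean."""
    k = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            # d^2/dx^2 + d^2/dy^2 of the Gaussian: (r^2 - 2 sigma^2) / sigma^4 * G
            row.append((r2 - 2 * sigma ** 2) / sigma ** 4 * math.exp(-r2 / (2 * sigma ** 2)))
        k.append(row)
    # Subtract the mean so that a constant image gives a zero response
    mean = sum(map(sum, k)) / len(k) ** 2
    return [[v - mean for v in row] for row in k]


def gabor_kernel(sigma, wavelength, theta, radius):
    """Even (cosine) Gabor kernel: Gaussian envelope times an oriented sinusoid."""
    k = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)  # coordinate along orientation theta
            envelope = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        k.append(row)
    return k


log5 = log_kernel(sigma=1.0, radius=2)
gab = gabor_kernel(sigma=2.0, wavelength=4.0, theta=0.0, radius=3)
```

Varying sigma gives the LoG filters at different scales, and varying theta and wavelength gives the Gabor filters at different orientations and frequencies.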
Spectral Histogram Representation - continued
Texture Synthesis Examples - continued
An image with periodic structures (figure panels: observed image, synthesized image)
Object Synthesis Examples - continued
Performance Comparison
Face Detection Based On Spectral Representations
Face detection aims to find all instances of faces in a given image
Each image window is represented by its spectral histogram
• A support vector machine is trained on training faces
• The trained support vector machine is then used to classify each image window in an input image
More results at http://fsvision.fsu.edu/face-detection
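The detection procedure above amounts to a sliding-window scan. The sketch below is hypothetical in several ways: a plain intensity histogram stands in for the spectral histogram, the classifier is a linear decision function w . x + b with made-up weights rather than a trained (possibly kernel) SVM, and the window size and step are arbitrary.

```python
def windows(image, size, step):
    """Yield (row, col, window) for every size x size window, scanned with the given step."""
    for r in range(0, len(image) - size + 1, step):
        for c in range(0, len(image[0]) - size + 1, step):
            yield r, c, [row[c:c + size] for row in image[r:r + size]]


def histogram_feature(window, bins=4):
    """Stand-in feature: normalized intensity histogram of the window (values in [0, 1])."""
    counts = [0] * bins
    for row in window:
        for v in row:
            counts[min(int(v * bins), bins - 1)] += 1
    n = len(window) * len(window[0])
    return [k / n for k in counts]


def linear_svm_decision(feature, weights, bias):
    """Decision value w . x + b of an (assumed already trained) linear SVM."""
    return sum(w * x for w, x in zip(weights, feature)) + bias


def detect(image, size, step, weights, bias):
    """Top-left corners of windows classified as positive (decision value > 0)."""
    return [
        (r, c)
        for r, c, win in windows(image, size, step)
        if linear_svm_decision(histogram_feature(win), weights, bias) > 0
    ]


# Toy 6 x 6 image whose bottom-right 3 x 3 block is bright
image = [[1.0 if r >= 3 and c >= 3 else 0.0 for c in range(6)] for r in range(6)]
hits = detect(image, size=3, step=3, weights=[-1.0, 0.0, 0.0, 1.0], bias=-0.5)
```

Detecting faces of different sizes would require repeating the scan over rescaled copies of the image.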
Face detection - continued
Rotation Invariant Face Detection
Rotation Invariant Face Detection - continued
Linear Representations
Linear representations are widely used in appearance-based object recognition and other applications
• Simple to implement and analyze
• Efficient to compute
• Effective for many applications
$$a(I, U) = U^T I \in \mathbb{R}^d$$
where the image I is viewed as a vector in $\mathbb{R}^n$ and U is an n x d matrix
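A direct transcription of a(I, U) = U^T I, assuming the image has been flattened into a length-n vector and U is stored as n rows of d entries. The example basis (first d columns of the identity matrix) is illustrative and simply keeps the first d pixels.

```python
def project(image_vec, U):
    """Compute a(I, U) = U^T I for an n x d matrix U stored as n rows of length d."""
    n, d = len(U), len(U[0])
    assert len(image_vec) == n, "image vector must have n entries"
    return [sum(U[i][j] * image_vec[i] for i in range(n)) for j in range(d)]


# n = 4 pixels, d = 2 basis vectors: the first two columns of the 4 x 4 identity matrix
U = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0],
     [0.0, 0.0]]
I = [3.0, -2.0, 7.0, 5.0]
coeffs = project(I, U)  # [3.0, -2.0]
```

Each coefficient costs n multiply-adds, so the whole representation costs n*d operations, which is one reason linear representations are efficient to compute.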
Standard Linear Representations
Principal Component Analysis
• Designed to minimize the reconstruction error on the training set
• Obtained by calculating eigenvectors of the covariance matrix
Fisher Discriminant Analysis
• Designed to maximize the separation between the class means relative to the within-class scatter
• Obtained by solving a generalized eigenvalue problem
Independent Component Analysis
• Designed to maximize the statistical independence among coefficients along different directions
• Obtained by solving an optimization problem with an objective function such as mutual information, negentropy, …
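As a small illustration of the PCA bullet, the first principal direction can be found as the leading eigenvector of the sample covariance matrix. This sketch uses power iteration, which assumes the largest eigenvalue is strictly dominant and the starting vector is not orthogonal to its eigenvector; further components would need deflation or a full eigensolver, and Fisher discriminant analysis and ICA are not shown.

```python
import math


def covariance(samples):
    """Sample covariance matrix of a list of equal-length vectors."""
    m, n = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / m for i in range(n)]
    centered = [[s[i] - mean[i] for i in range(n)] for s in samples]
    return [[sum(c[i] * c[j] for c in centered) / (m - 1) for j in range(n)] for i in range(n)]


def leading_eigenvector(C, iters=200):
    """Power iteration: repeatedly apply C and renormalize."""
    n = len(C)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy 2-D data lying on the line y = 2x
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
pc1 = leading_eigenvector(covariance(data))
```

For this data the first principal direction is (1, 2)/sqrt(5), the direction of the line.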
Standard Linear Representations - continued
Standard linear representations are suboptimal for recognition applications
• Evidence in the literature
• A toy example
– Standard representations give the worst recognition performance
Optimal component analysis
Performance Measure - continued
Suppose there are C classes to be recognized
• Each class has $k_{train}$ training images
• Each class has $k_{cross}$ cross-validation images
• We use $h(x) = 1/(1 + \exp(-2bx))$
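The function h can be checked numerically: it is a logistic sigmoid whose transition at x = 0 becomes sharper as b grows, so it behaves like a smooth 0/1 step. How h enters F(U) is not restated in this section, so this sketch covers h alone; the split into two branches only avoids overflow in math.exp for large |bx|.

```python
import math


def h(x, b):
    """h(x) = 1 / (1 + exp(-2 b x)), evaluated without overflow."""
    z = 2.0 * b * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)  # algebraically equal to 1 / (1 + exp(-z))


for b in (1.0, 10.0):
    print(b, [round(h(x, b), 3) for x in (-0.5, 0.0, 0.5)])
```

A larger b makes h a closer approximation to a hard threshold while keeping it differentiable.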
Performance Measure - continued
F(U) depends on the span of U but is invariant to a change of basis
• In other words, F(U) = F(UO) for any d x d orthogonal matrix O
• The search space of F(U) is therefore the set of all d-dimensional subspaces of $\mathbb{R}^n$, which is known as the Grassmann manifold
– It is not a flat vector space, and gradient flow must take the underlying geometry of the manifold into account
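The invariance F(U) = F(UO) holds for any measure that depends on U only through the subspace it spans, for example through the projector UU^T. The sketch below (plain Python, hypothetical 3 x 2 example) checks that UU^T is unchanged when U is replaced by UO for a 2 x 2 rotation O, even though the basis itself changes.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def projector(U):
    """U U^T: orthogonal projection onto span(U); it depends only on the subspace."""
    return matmul(U, transpose(U))


# U: 3 x 2 with orthonormal columns; O: a 2 x 2 rotation (an orthogonal matrix)
s = 1 / math.sqrt(2)
U = [[s, 0.0], [s, 0.0], [0.0, 1.0]]
t = 0.7
O = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
UO = matmul(U, O)
P1, P2 = projector(U), projector(UO)
```

Since (UO)(UO)^T = U O O^T U^T = U U^T, every point of the Grassmann manifold corresponds to a whole family of matrices U, and optimizing over that family would be redundant.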
Deterministic Gradient Flow - continued
Gradient at [J] (J: the first d columns of the n x n identity matrix)
Deterministic Gradient Flow - continued
Gradient at U: compute Q such that QU = J
Deterministic gradient flow on the Grassmann manifold
Stochastic Gradient and Updating Rules
Stochastic gradient is obtained by adding a stochastic component
Discrete updating rules
MCMC Simulated Annealing Optimization Algorithm
Let X(0) be any initial condition and set t = 0
1. Calculate the gradient matrix A(X_t)
2. Generate d(n - d) independent realizations of the w_ij's
3. Compute Y (the candidate for X_{t+1}) according to the updating rules
4. Compute F(Y) and F(X_t) and set dF = F(Y) - F(X_t)
5. Set X_{t+1} = Y with probability min{exp(dF/D_t), 1}; otherwise set X_{t+1} = X_t
6. Set D_{t+1} = D_t / γ and set t = t + 1
7. Go to step 1
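The annealing loop above can be sketched as follows. This is a simplified stand-in, not the manifold algorithm: X lives in ordinary R^n rather than on the Grassmann manifold, the candidate is a gradient step plus Gaussian noise scaled by the temperature (an assumed Langevin-style rule in place of the updating rules, which are not reproduced here), and the noise has n components rather than d(n - d). The comments map each line to the numbered steps; eta, D0, gamma, and seed are hypothetical parameters.

```python
import math
import random


def anneal(F, grad, x0, steps=2000, eta=0.05, D0=1.0, gamma=1.002, seed=0):
    """Maximize F by noisy gradient steps with Metropolis acceptance and cooling."""
    rng = random.Random(seed)
    x, D = list(x0), D0
    fx = F(x)
    for _ in range(steps):
        g = grad(x)                                          # step 1: gradient at X_t
        w = [rng.gauss(0.0, 1.0) for _ in x]                 # step 2: independent noise
        y = [xi + eta * gi + math.sqrt(2 * eta * D) * wi     # step 3: candidate Y
             for xi, gi, wi in zip(x, g, w)]
        fy = F(y)
        dF = fy - fx                                         # step 4
        if dF >= 0 or rng.random() < math.exp(dF / D):       # step 5: accept w.p. min{exp(dF/D), 1}
            x, fx = y, fy
        D /= gamma                                           # step 6: lower the temperature
    return x, fx                                             # step 7 is the loop itself


def F(p):
    return -((p[0] - 3.0) ** 2) - (p[1] + 1.0) ** 2


def grad_F(p):
    return [-2.0 * (p[0] - 3.0), -2.0 * (p[1] + 1.0)]


best, value = anneal(F, grad_F, [0.0, 0.0])
```

Early on, the high temperature lets the chain accept some downhill moves; as D shrinks, the noise and the acceptance of worse candidates both fade, and the iterate settles near a maximum.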
ORL Face Dataset
Performance Comparison
Performance Comparison – cont.
Brain Curve Classification
Brain Curve Classification – cont.
Real-time Scene Interpretation
Object detection and recognition problem
• Given a set of images, find regions in these images that contain instances of relevant objects
• Here the number of relevant objects is assumed to be large
– For example, the system should be able to handle 30,000 different kinds of objects, an estimate of the human brain’s capacity for basic-level visual categorization [I. Biederman, Psychological Review, vol. 94, pp. 115-147, 1987]
Global Monitoring Through High-resolution Satellite Images
Problem Statement for Scene Interpretation
Goal
• Develop a system that can achieve real-time detection and recognition for images of size 640 x 480 with high accuracy
– Say, at a frame rate of 15 frames per second
Existing Approaches
Fast methods but low accuracy
• One can, for example, classify one pixel at a time
• However, it is difficult to identify airplanes with high accuracy this way, due to high false positives and negatives
Existing Approaches – cont.
Methods with good accuracy but slow
• One can in theory use deformable template matching to locate instances of airplanes
• It may take several hours to process one image
Proposed Framework
Specifications and Requirements
We want to detect and recognize at least 30,000 object classes in images
• At four different scales
• Using exhaustive search of local windows; that is, we do not assume segmentation or other pre-processing
• If we assume objects lie within local windows of some size (e.g., 21 x 21), there will be many local windows to be classified/processed: 18,432,000 per second (640 x 480 positions x 4 scales x 15 frames)
• We want to do this on a 3.6 GHz Dell Precision workstation with an estimated performance of 28,665.4 MIPS
• This leaves about 1555 instructions to process each 21 x 21 local window
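The window count and instruction budget can be reproduced from the numbers in this section, assuming one window position per pixel of a 640 x 480 frame (border effects ignored), four scales, and 15 frames per second:

```python
width, height = 640, 480   # frame size from the stated goal
scales = 4                 # number of scales searched
fps = 15                   # target frame rate
mips = 28_665.4            # estimated workstation performance (millions of instructions/s)

windows_per_second = width * height * scales * fps
instructions_per_window = mips * 1e6 / windows_per_second

print(f"{windows_per_second:,} windows/s")                          # 18,432,000 windows/s
print(f"{instructions_per_window:.0f} instructions per window")     # 1555 instructions per window
```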
Requirements – cont.
To achieve the specifications, we need two critical components
• A classifier that can effectively reduce the average classification time
– Note that on average we have 1555 instructions per window; if we can process 90% of the windows using only 100 instructions each, we can spend on average 14,650 instructions on each of the remaining 10% of local windows
• Features that can discriminate a large number of objects and can be computed using a few instructions
– Do such features exist?
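The 14,650 figure follows from keeping the overall average at the 1555-instruction budget: if a fraction f of windows costs c instructions each, the remaining windows can use on average (1555 - f*c)/(1 - f) instructions. A quick check, with a hypothetical helper name:

```python
def remaining_budget(avg_budget, fast_fraction, fast_cost):
    """Average instructions per remaining window, from
    avg_budget = f * fast_cost + (1 - f) * remaining, solved for remaining."""
    return (avg_budget - fast_fraction * fast_cost) / (1.0 - fast_fraction)


print(round(remaining_budget(1555, 0.9, 100)))  # 14650
```

The same formula shows why early rejection matters: the larger the fraction of windows dismissed cheaply, the more instructions remain for the hard ones.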
Topological Local Spectral Histograms
We introduce a new class of features, which we call TLSH features
• They are defined relative to a chosen set of filters
• For a given filter, a TLSH feature is defined as a histogram of a local window of the filtered image
• One bin of the histogram is given by
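The exact bin formula is not reproduced in this text, so the following is only an assumed reading: a bin counts the pixels of a rectangular window of the filtered image whose responses fall in [lo, hi). Under that assumption, a summed-area table of the bin indicator lets any window's count be read with four look-ups, which fits the goal of features computable with a few instructions. All names here are hypothetical; dividing the count by the window area would give a normalized bin.

```python
def indicator_integral(response, lo, hi):
    """Summed-area table of the indicator [lo <= response < hi], with a zero border."""
    rows, cols = len(response), len(response[0])
    S = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            inside = 1 if lo <= response[r][c] < hi else 0
            S[r + 1][c + 1] = inside + S[r][c + 1] + S[r + 1][c] - S[r][c]
    return S


def bin_count(S, top, left, bottom, right):
    """In-bin pixels in rows top..bottom-1 and columns left..right-1 (four look-ups)."""
    return S[bottom][right] - S[top][right] - S[bottom][left] + S[top][left]


# Hypothetical 3 x 3 filter response and the bin [0.4, 0.7)
response = [[0.1, 0.5, 0.9],
            [0.4, 0.6, 0.2],
            [0.7, 0.3, 0.5]]
S = indicator_integral(response, 0.4, 0.7)
count = bin_count(S, 0, 0, 2, 2)  # top-left 2 x 2 window
```

Building the table costs one pass over the filtered image per bin; after that, each window and bin combination is constant time regardless of window size.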
Topological Local Spectral Histogram Example
Convolution is implemented using FPGAs
Local Spectral Histogram Features
Field Programmable Gate Arrays
Two primary methods for computation
• Hard-wired application-specific integrated circuits (ASICs)
• Software-programmed microprocessors
New approach
• Programmable hardware
• Field Programmable Gate Arrays (FPGAs) represent a breakthrough in computing technology
– Especially for intrinsically parallel applications
μP/ ASIC / FPGA Comparison Summary
μP | ASIC | FPGA
Programmable (flexible) | Fixed design functionality (inflexible) | Programmable (flexible)
Relatively slow, serial computation | Very fast, highly parallelized computation | Fast, parallel computation
Floating and fixed point | Fixed point / floating point | Fixed point / floating point
Relatively inexpensive design cycle (software) | Expensive design cycle (requires chip design) | Relatively inexpensive design cycle
Limited bandwidth | Very high bandwidth | Near-ASIC bandwidth
Standard high-level languages (C/C++ or assembly) | Hardware description language for design/simulation (VHDL/Verilog) | Hardware description language for design/simulation (VHDL/Verilog)
Hardware vs. Software
L 1
• Software Implementation: y (n )   xk (n )  hk
k 0
Sum = 0.0
I = 0;
While (I < L)
tmp = x(i) * h(i)
Sum = Sum + tmp
I = I+1
end
A typical software
implementation takes 4*L
instructions to compute one
convolution
Hardware vs. Software
A custom hardware implementation
• Multiply/accumulate operations performed in parallel
• Can be done in one clock cycle
Convolution Timing Diagram
(Figure: convolution start signal and clock; all nine response values finished; every 7 clock cycles, 9 new response values)
Topological Local Spectral Histograms – cont.
Why TLSH features?
• They provide a very rich, over-complete set of features
– For example, with 22 filters there are 1,173,942 different TLSH features within a 21 x 21 region, considering different windows and different filters
– TLSH features are more effective than the Haar features used by Viola and Jones [P. Viola and M. Jones, International Journal of Computer Vision, vol. 57, pp. 137-154, 2004]
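The 1,173,942 figure is reproduced exactly if each TLSH feature corresponds to one (filter, axis-aligned rectangular sub-window) pair; that interpretation is an assumption. A 21 x 21 region contains C(22, 2)^2 = 231^2 = 53,361 such rectangles, and 22 x 53,361 = 1,173,942:

```python
def num_subwindows(size):
    """Number of axis-aligned rectangular sub-windows of a size x size region."""
    # A rectangle is fixed by choosing 2 of the (size + 1) vertical and 2 of the
    # (size + 1) horizontal grid lines, i.e. C(size + 1, 2) ** 2.
    return (size * (size + 1) // 2) ** 2


filters = 22
print(num_subwindows(21))            # 53361
print(filters * num_subwindows(21))  # 1173942
```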
ORL Face Dataset
Comparison Between Haar and TLSH Features
COIL Dataset
Comparison Between Haar and TLSH Features
Texture Dataset
Comparison Between Haar and TLSH Features
Mixed Dataset
Comparison Between Haar and TLSH Features
Classifier
To achieve the specifications, we also need a classifier that takes only a few instructions on average to make a decision
• At the same time, it must achieve high accuracy
We propose to use a look-up table tree classifier
• I.e., a decision tree classifier where each node is implemented by a look-up table
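A minimal sketch of the look-up table tree idea, with hypothetical names and labels: each node evaluates one scalar feature, quantizes it into a bin, and uses the bin as an index into a table whose entries are either class labels or child nodes. Classification then costs one feature evaluation and one table index per level. How the features, bin ranges, and tables are learned is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class LutNode:
    """Quantize a scalar feature and look the bin up in a table.
    Each table entry is either a class label (str) or another LutNode."""
    feature: Callable[[List[float]], float]
    lo: float
    hi: float
    table: List[Union[str, "LutNode"]] = field(default_factory=list)

    def bin_of(self, value: float) -> int:
        bins = len(self.table)
        k = int((value - self.lo) / (self.hi - self.lo) * bins)
        return min(max(k, 0), bins - 1)  # clamp to a valid table index


def classify(node: LutNode, x: List[float]) -> str:
    """Follow table look-ups from the root until a leaf label is reached."""
    entry: Union[str, LutNode] = node
    while isinstance(entry, LutNode):
        entry = entry.table[entry.bin_of(entry.feature(x))]
    return entry


# Hypothetical two-level tree over a 2-D feature vector
child = LutNode(feature=lambda x: x[1], lo=0.0, hi=1.0, table=["car", "airplane"])
root = LutNode(feature=lambda x: x[0], lo=0.0, hi=1.0, table=["background", child])
print(classify(root, [0.2, 0.9]))  # background
print(classify(root, [0.8, 0.9]))  # airplane
```

Because most windows in a scene are background, a tree that sends them to a leaf at the first node keeps the average cost per window low.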
Look-up Table Tree Classifier
An Example Path in a Decision Tree
Constructing Look-up Table Decision Tree
Joint optimization of clustering, TLSH features, and optimal linear projections
• We want to maximize the separations between the marginal distributions of different clusters
• We can do the optimization iteratively
– First, cluster using the current TLSH features and projections to maximize the separations
– Next, find optimal TLSH features given the linear projections
– Then find optimal linear projections given the updated TLSH features
Performance Comparison
RCT – Rapid Classification Tree, implemented by Keith Haynes
Detection and Recognition
Shape Theory
We want to quantify the difference between two shapes in a principled way
• We do this by constructing a shape space and then using the geodesic distance between two shapes on the shape manifold as the metric
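To make the idea of a geodesic distance concrete, here is a deliberately simplified shape space (an assumption, not the shape space used in this work): landmark configurations are centered and scaled to unit norm, which places them on a hypersphere, and the geodesic distance is the arc length arccos(<a, b>). Rotation and reparametrization are not factored out.

```python
import math


def to_preshape(points):
    """Center (x, y) landmarks and scale to unit norm (removes translation and scale)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    v = [c for p in points for c in (p[0] - cx, p[1] - cy)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sphere_geodesic(a, b):
    """Arc length between two unit vectors on the hypersphere."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against round-off


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
big_square = [(5, 5), (7, 5), (7, 7), (5, 7)]   # translated and scaled copy
rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]

d_same = sphere_geodesic(to_preshape(square), to_preshape(big_square))
d_diff = sphere_geodesic(to_preshape(square), to_preshape(rectangle))
```

The translated and scaled square is at distance zero from the original, while the rectangle is at a positive distance (about 0.32 radians), which is the behavior expected of a shape metric.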
Shape Clustering
Clustering Dendrogram
Sulcal Curves
Sulcal curves are important for characterizing brain functions
Clustering of Sulcal Curves
Modeling Mathematical Abilities and Disabilities
As it is now possible to acquire detailed surfaces of the human brain, one may ask how characteristics of brain structure affect mathematical abilities and disabilities
• The U.S. Department of Education wants to know, in order to understand and find solutions to the mathematical difficulties that young children have
Corpus callosum examples of young children (a) without and (b) with mathematical disabilities
SurfaVision – A Surface-based Vision System
One of the challenges is how to build a machine vision system that is robust
• This has proven to be very difficult over several decades of computer vision research
We may now have a solution for applications in indoor environments
Multi-Camera Multi-Projector Scanning
Surface Parametrization
Geodesic Interpolation Between Surfaces
Robust Visual Inference
With a common domain for surface representations, we can pose visual inference in the Bayesian framework by building probability models
Human-Robot Collaborative Interaction
The goal is to make robots aware of the positions, poses, expressions, moods, and other factors of the humans so that robots can interact with humans collaboratively
In collaboration with Prof. Emmanuel Collins at the College of Engineering
Automated 3D Phenotype Measurement
A central problem in biology is to understand the relationship between genotype and phenotype
• With the genomes of humans and model organisms available, the central problem becomes how to measure phenotypes at a large scale
3D Urban Models
Courses
Most relevant courses
• CAP 5638 Pattern Recognition
• CAP 5415 Principles and Algorithms of Computer Vision
• CAP 6417 Theoretical Foundations of Computer Vision
• STA 5106 Computational Methods in Statistics I
• STA 5107 Computational Methods in Statistics II
• Seminars and advanced studies
Related courses
• CAP 5615 Artificial Neural Networks
• CAP 5600 Artificial Intelligence
• CAP 5xxx Machine Learning
Funding of the Group
National Science Foundation
• DMS
• CISE IIS
• FRG
• ACT
• CCF
NGA – National Geospatial-Intelligence Agency
Army Research Office
• DURIP
• Research grant
Companies
• Next Century and others under negotiation
Summary
The CAVIS group and the FSvision group offer interesting research topics/projects
• Efficient representations for generic images
• Real-time detection and recognition
• Computational models for object recognition and image classification
• Medical image analysis
• Motion/video sequence analysis and modeling
• They are challenging
• They are interesting
• They are exciting
Contact Information
• Name: Xiuwen Liu
• Web sites: http://cavis.fsu.edu, http://fsvision.fsu.edu, http://www.cs.fsu.edu/~liux
• Email: liux@cs.fsu.edu
• Offices: LOV 166 and 118 North Woodward Ave.
• Phones: 644-0050 and 645-2257
Thank you!
Any questions?