SMARTEYE - VEHICLE SECURITY SYSTEM
USING FACIAL RECOGNITION
ALFRED RITIKOS
A project report submitted in partial fulfilment of the
requirements for the award of the degree of
Master of Engineering
Faculty of Electrical Engineering
Universiti Teknologi Malaysia
MAY 2007
DEDICATION
To my beloved wife, Phoay Eng, and sons, Ephraim and Keane.
ACKNOWLEDGEMENT
I would like to acknowledge my gratitude and appreciation to my project
supervisor, Professor Dr. Ruzairi bin Hj. Abdul Rahim, for his guidance, advice and
friendship throughout the period of carrying out this project as well as throughout
this course, and also to Prof. Madya Dr. Syed Abdul Rahman bin Syed Abu Bakar
for his assistance in providing specialist advice in the area of image processing.
Furthermore, some of my workplace colleagues have been kind enough to provide their face images for the database needed.
Finally, my gratitude is extended to my fellow students who have provided
much encouragement and support.
ABSTRACT
Facial recognition has gained increasing interest in the recent decade. Over the years, several techniques have been developed to achieve a high rate of accuracy in the identification and verification of individuals for authentication in security systems. This project experiments with the concept of combining multilevel wavelet decomposition transformation and a neural network for facial recognition in a specific application with its own limitations, namely a vehicle security access control system. The approach of this project is to conceptualise, by simulation, the various processes involved in developing an implementable system.

Keywords: Facial Recognition, Facial Verification, Image Extraction, Image Processing, Principal Component Analysis, Edge Detection, Wavelet Transformation, Neural Network
ABSTRAK
Facial recognition has received much attention in recent times. Several techniques have been studied and developed to achieve a high success rate of accuracy in identifying individuals to be granted access in security systems. This project investigated the combination of the concepts of multilevel wavelet decomposition transformation and neural networks for facial recognition in a particular application with its own limitations, namely a vehicle security control system. The project focused on proving the concept by simulating the various programme processes involved in a system that could be implemented.
TABLE OF CONTENTS

CHAPTER   TITLE

          DECLARATION
          DEDICATION
          ACKNOWLEDGEMENTS
          ABSTRACT
          ABSTRAK
          TABLE OF CONTENTS
          LIST OF TABLES
          LIST OF FIGURES
          LIST OF ABBREVIATIONS
          LIST OF SYMBOLS

1         INTRODUCTION
          1.1  Biometrics for Identification and Verification
          1.2  Verification vs. Identification
          1.3  Incentives for Facial Recognition Application in Vehicle Security

2         PROJECT SCOPE
          2.1  Project Background
          2.2  Overall Objectives
          2.3  Scope of Work and Methodology

3         FACE RECOGNITION
          3.1  An Overview of Facial Recognition Biometric
          3.2  Applications of Facial Recognition
          3.3  Generic Facial Recognition Algorithm
          3.4  Algorithms Comparisons
          3.5  Basis of Facial Recognition Process – The PCA
               3.5.1  Minimum Distance Classifier
               3.5.2  Matching by Correlation
          3.6  Neural Networks

4         IMAGE EXTRACTION
          4.1  Setting The Scene
          4.2  Digital Image Structure
          4.3  Image Acquisition
          4.4  Importance Of Facial Positioning

5         IMAGE TRANSFORMATION & PROCESSING
          5.1  Grayscale Transformation
          5.2  Image Thresholding
          5.3  Gaussian Filtering
          5.4  Image Features Extraction - Canny Edge Detector
          5.5  Quality of Images - Brightness and Contrast Adjustments

6         SYSTEM DESIGN
          6.1  System Architecture
               6.1.1  Hardware Architecture
               6.1.2  Software Architecture
          6.2  Wavelet Packet Analysis For Face Recognition
          6.3  Discrete Cosine Transform
          6.4  Face Matching (ANN Of Wavelets)

7         CONCLUSION

8         FUTURE WORK
          8.1  Practical Software/Hardware Implementation
          8.2  Improving Image Quality
          8.3  Robustness of Algorithms
          8.4  Combination Of Algorithms

          Appendices A-E
          REFERENCES
LIST OF TABLES

TABLE NO.   TITLE

3.1         Summary List Of Image-Based FR Algorithms
LIST OF FIGURES

FIGURE NO.  TITLE

1.1         An intelligent car fob for a fleet management system
3.1         A typical Facial Recognition process
3.2         Generic algorithm for software programming
3.3         Basic form of Neural Network architecture
4.1         Framework of Image/Video processing
4.2         Binary values of pixels
4.3         Camera viewing axis must be perpendicular with the image
4.4         Facial image at a non-perpendicular angle results in error
5.1         Simulation of various edge detection methods
5.2         Result of simulation of various edge detection methods
5.3         Canny edge detection of an RGB image
5.4         Poorly lit environment produces unreliable image data
6.1         Proposed system equipment layout
6.2         Facial Recognition System architecture
6.3         The Haar mother wavelet function
6.4         Wavelet decomposition tree
6.5         Haar wavelet decomposition to 2 levels
6.6         Haar wavelet decomposition to 4 levels
6.7         Details of wavelet levels
6.8         Details of a wavelet node being compressed with threshold
6.9         The inverse process (decomposition at 3 levels)
6.10        Histogram information of a selected node
6.11        Applying de-noising
6.12        Operation of the Neural Network
6.13        Image matching using Neural Network
6.14        The Facial Recognition System functionality
8.1         An example of the User Console design
LIST OF ABBREVIATIONS

2D         -  Two-dimension
3D         -  Three-dimension
AAM        -  Active Appearance Model
ANN (NN)   -  Artificial Neural Network (Neural Network)
CWT        -  Continuous Wavelet Transform
DWT        -  Discrete Wavelet Transform
EBGM       -  Elastic Bunch Graph Matching
FERET      -  Face Recognition Technology
FFT        -  Fast Fourier Transform
FPGA       -  Field-Programmable Gate Array
FR         -  Facial (or Face) Recognition
HMM        -  Hidden Markov Model
ICA        -  Independent Component Analysis
ID Card    -  Identity Card
KLT        -  Karhunen-Loeve Transform
LDA        -  Linear Discriminant Analysis
PCA        -  Principal Components Analysis
PIN        -  Personal Identification Number
ROI        -  Region of Interest
LIST OF SYMBOLS

c(x,y)   -  correlation
Dj       -  Euclidean distance
γ(x,y)   -  correlation coefficient
mj       -  mean vector of patterns
Nj       -  number of pattern vectors
ωj       -  pattern class
xj       -  unknown pattern vector
LIST OF APPENDICES

APPENDIX   TITLE

A          Sample Of Images Used For The Project
B          MATLAB Command Codes For Extracting Edge Detection Of A Coloured (RGB) Image
C          MATLAB Command Codes For Wavelet And Neural Network Facial Verification
D          Proposed User Console MATLAB GUI Programme
E          Supplementary Notes
CHAPTER 1
INTRODUCTION
1.1 Biometrics for Identification and Verification
Biometrics is an emerging set of pattern-recognition technologies which accurately and automatically identify or verify individuals based upon each person's unique physical or behavioural characteristics. Identification using biometrics has advantages over traditional methods involving ID Cards (tokens) or PINs (passwords): the person to be identified must be physically present at the point of identification, and there is no need to remember a password or carry a token. PINs or passwords may be forgotten, and tokens like passports and driver's licenses may be forged, stolen, or lost.
Biometric methods work by unobtrusively matching patterns of live individuals in real time against enrolled records. Biometric templates cannot be reverse-engineered to recreate personal information, and they cannot be stolen and used to access personal information. Because of these inherent attributes, biometrics is an effective means to secure privacy and deter identity theft.
Various biometric traits are being used for real-time recognition, the most popular being face, iris and fingerprint. Other biometric systems which have proved useful are based on retinal scans, voice, signature and hand geometry. By using them together with existing tokens, passwords and keys, biometric systems are being deployed to enhance security and reduce fraud.
In designing a practical biometric system, a user must first be enrolled in the
system so that his biometric template can be captured. This template is securely
stored in a central database or a smart card issued to him. The template is retrieved
when an individual needs to be identified. Depending on the context, a biometric
system can operate either in verification (authentication) or identification mode.
1.2 Verification vs. Identification

There are two different ways to recognize a person: verification and identification. Verification (answering the question "Am I who I claim to be?") involves confirming or denying a person's claimed identity. In identification, the system has to recognize a person (addressing the question "Who am I?") from a list of N users in the template database. Identification is a more challenging problem because it involves 1:N matching, compared to 1:1 matching for verification.
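To make the distinction concrete, the sketch below contrasts the two modes in MATLAB using Euclidean distances between stored templates and a live probe vector. The feature vectors, the number of users and the acceptance threshold are all hypothetical stand-ins, not values from this project.

    % Hypothetical enrolled templates: N users, each a d-dimensional feature vector
    N = 5; d = 8;
    templates = rand(N, d);                      % stand-in for stored biometric templates
    probe = templates(3,:) + 0.01*randn(1, d);   % live sample from user 3

    % Verification (1:1): compare the probe against one claimed identity
    claimedID = 3;
    distV = norm(probe - templates(claimedID,:));
    accepted = distV < 0.5;                      % hypothetical acceptance threshold

    % Identification (1:N): compare the probe against every enrolled template
    distsI = zeros(N, 1);
    for j = 1:N
        distsI(j) = norm(probe - templates(j,:));
    end
    [bestDist, bestID] = min(distsI);            % closest enrolled identity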
1.3 Incentives for Facial Recognition Application in Vehicle Security

Research on automatic face recognition in images has rapidly developed into several inter-related lines, and this research has both led to and been driven by a disparate and expanding set of commercial applications. The large number of research activities is evident in the growing number of scientific communications published on subjects related to face processing and recognition.
Anti-theft devices are not foolproof, but they can act as a deterrent or slow down the theft. The longer it takes to steal a car, the more attention the thief attracts, and the more likely the thief will look elsewhere. Anti-theft devices include those listed below:

• Fuel Shut Off: Blocks gasoline flow until a hidden switch is tripped. The vehicle can only be driven a short distance, until the fuel already in the carburetor is used up.

• Kill Switch: The vehicle will not start unless a hidden switch is activated. The switch prevents electrical current from reaching the coil or carburetor. Check your vehicle warranty before installing a "kill switch".

• Time Delay Switch: The driver must turn the ignition key from "on" to "start" after a precise, preset interval or the engine won't turn over.

• Armored Ignition Cutoff: A second tamper-proof lock must be operated in order to start the car. "Hot wiring" (starting a car without a key) is very difficult with this device, so it is especially effective against amateurs.

• Hood Locks: These make it difficult to get to the battery, engine, or vehicle security system.

• Time Delay Fuse: Unless a concealed switch is turned off, starting the vehicle causes a sensitive fuse to burn out, cutting out power and stopping the motor.

• Armoured Collar: A metal shield that locks around the steering column and covers the ignition, the starter rods and the steering wheel interlock rod.

• Crook Lock: A long metal bar with a hook on each end to lock the steering wheel to the brake pedal.

• Audible Alarm Systems: Alarm systems positioned in the engine to set off a buzzer, bell or siren if an attempt is made to tamper with the hood, bypass the ignition system, or move the vehicle without starting the engine.
To illustrate the "evolution" of a typical vehicle security system over recent years, here is an example of the development of such products from a particular brand [1] of cars:

1995  passive security system (no remote); the system is armed by locking the doors with or without the key; windows could be open and the system would still arm
1996  remote by coded alarm; unlocks all doors with one push
1997  remote by coded alarm changed to unlock only the driver's door with one push
1999  keyless remote
2003  remote buttons coloured; a 'chirp' replaces the audible honk
2005  remote fobs and immobilizer keys with remote entry as before
2006  remotes with recessed buttons which are harder to press accidentally
2007  remote-start system
Keyless entry is becoming a standard feature in vehicles that have installed alarm systems. A small battery-operated device (fob or "remote") hangs on the key chain and features one or more buttons for arming and disarming the alarm. The buttons operate the door locks as well. When one approaches the car, a press of the button will not only disarm the alarm but also unlock the driver's door, making it unnecessary to use a key. Hence, it allows keyless entry.

In a biometric vehicle security system, the objective is to authenticate a user as an authorised person before access is given to the ignition system. This could be a first step before ignition can commence, or it could be an integrated system for auto-ignition once authorisation has been cleared.
As a progression from the now common keyless fob used to open a vehicle, there has been a recent successful commercial implementation of biometrics for authorisation, in the form of fingerprint recognition. This, however, has its own weaknesses, such as the one depicted in a report by BBC News [2] on 31 March 2005 of a robbery incident in which the end of the owner's index finger was sliced off with a machete.
Potential applications of biometrics in vehicle security are for private vehicles and especially for commercial vehicle fleets, such as rented cars, taxis, transportation lorries and public buses.
One of the most effective ways to optimise the use of vehicles is to allow drivers to use vehicles from a motor pool. A "fleet management system" is an optimization tool aimed at making it very easy to manage vehicles in a motor pool. There is little need to look through paper records to see if someone is eligible to drive, to check whether they have received the proper training for that vehicle, or to see whether their driver's license has expired since they last used a vehicle.
Electronic key manufacturers for fleet management companies make
intelligent fobs which automatically record the transaction activity by date and time
both on the key cabinet and on the support software. This electronic key security
makes users accountable for the keys, reducing management risk and improving
efficiency. One such product for commercial fleet vehicles is available from Traka,
Inc. [3]
Figure 1.1: An intelligent car fob for a fleet management system
The iFob is inserted into receptor sockets adjacent to the door or equipment, which check the permissions on the iFob. If acceptable, the Immobilisor will release a door magnetic lock or solenoid and the door will open. The iFob records each access event together with its time, accumulating them until it is returned to the Traka cabinet at the end of the shift, when the events are downloaded. If a user attempts to use the iFob outside its period of validity, the iFob will no longer activate the Immobilisor. The iFob contains a chip with a guaranteed unique serial number, giving every fob an individual ID. The special shape of the iFob allows it to lock automatically into the Traka cabinet, and its smooth surface is inherently self-cleaning, eliminating problems associated with dust or other contamination. Where keys need to be managed, they are attached using special self-locking security seals, so that they cannot be easily detached.
Being physically detached from the user, such a sophisticated device and system are still subject to loss and misuse. Although each fob carries a unique serial number and is assigned to an individual person, there is no guarantee that another person will not use it to access the vehicle.
Because of its many advantages, biometrics is fast being used for physical access control, computer log-in, welfare disbursement, international border crossing (e-Passports), national ID cards, and verification of customers during transactions conducted via telephone and Internet (e-Commerce and e-Banking). In automobiles, biometrics is being adopted to replace keys for keyless entry and keyless ignition. Here are some commercially available products for such vehicle access and starting applications [4]:

Product name     Biometrics method
Identisafe-09    Fingerprint
Retinasafe-18    Eyeball Recognition
Brainsafe-72     Brain fingerprinting
Voicesafe-36     Voice
Think-Start-99   Brain waves
There is much interest in using FR for security systems due to its advantages over the above-listed methods. These will be explained in the next chapter.
Among the advantages of the Facial Recognition method for vehicle security applications are:

(i) it is more convenient, requiring no active participation by the user; the face is sensed as soon as one is seated in position (and facing the camera);

(ii) it is a low-risk scenario (failure means the loss of one vehicle, compared to the loss of company property and confidential materials, or harm to national security and safety); and

(iii) it is a "better" alternative to existing methods (what is the chance of a thief cutting off the owner's or an authorised person's face or head (!) to steal the vehicle, compared to a finger, as has happened to a driver?).
Some practical questions that need to be answered include:

(i) Is biometrics really practical for this application? Even with the fingerprint method, do we not still need a key to lock and open the vehicle doors?

(ii) Is there a method which is fully foolproof? The possibility of hacking or bypassing the system cannot be ruled out.
CHAPTER 2
PROJECT SCOPE
2.1 Project Background

This security access project is aimed at demonstrating advanced facial recognition techniques that could antiquate, substitute or otherwise supplement conventional key/key-fob vehicle ignition systems, and that can be used as an alternative to, or a complement of, the existing fingerprint biometrics method. A computerised system equipped with a digital camera can identify the face of a person and determine whether the person is authorized to start the vehicle.

A practical implementation of such a system would consist of a microcontroller with embedded high-level-language software, with a small-lensed camera as the data input and an LCD screen as the output medium. This integrated system would be able to authorise a user before the vehicle is switched on with a key, or to authorise a user and automatically switch on the vehicle without a key.
Whilst facial recognition systems are by now readily available in the market, the vast majority of them are installed in large open spaces, such as airport halls. This project is for an application where the population (of users) is very small and the area of coverage is confined within the vehicle driver enclosure. Both ends of the system design, i.e. the image capture by camera and the actuation of the vehicle ignition system, are considered simpler to implement; therefore these will not be dealt with in detail beyond a brief introduction. The focus of this project is, thus, the development of an algorithm for this very specific application.
2.2 Overall Objectives

The objectives of this project are:

(i) to select a combination of existing facial recognition techniques and/or mathematical models, considering practicality and potential implementation costs;

(ii) to develop a control algorithm based on the selected information-processing model for this specific application; and

(iii) to simulate the facial recognition software programme for vehicle security access clearance using software tools, to be incorporated into the system hardware.
2.3 Scope of Work and Methodology

As the project developed, it was decided that a hardware implementation of the experiment, such as using an FPGA, would encounter many technical constraints: the time demanded to learn new programmes and apply them to real-time conditions, as well as the physical limitations of an FPGA for image processing, such as processing speed and the need for a very large memory for data. It was also recognised that many algorithms have already been developed by researchers in the area of image processing and recognition, and that developing new methods requires extensive knowledge of various aspects of programming and of the theory behind their development. Because of the complexity of developing new or improved algorithm software, it is observed that achieving a comprehensive design requires the effort of several people working together: students and experts in multiple disciplines.

Because of these constraints, it was decided, in the best interest of achieving the requirements of this course, that the coverage be simplified to a workable product, and that it is certainly better to utilise the tools available in the market, i.e. the MATLAB programme, exploring the combination of several techniques. The objective of this project is then a "proof of concept" of FR algorithms for a very specific application.

This project uses MATLAB Version 7.0.0.19920 (Release 14), Simulink 6.0 and MATLAB Release R2006a (which contains additional and advanced blocksets not found in the previous MATLAB suite, especially the Video & Image Processing Blocksets).
CHAPTER 3
FACE RECOGNITION
Since the beginning of time, humans have relied on facial recognition (FR) as a way to establish and verify another person's identity. FR technology is no different in principle: using software, a computer is able to locate human faces in images and then match overall facial patterns to records stored in a database.

Because a person's face can be captured by a camera from some distance away, FR has a clandestine or covert capability (i.e. the subject does not necessarily know he has been observed). For this reason, FR has been used in projects to identify card counters and other undesirables in casinos, shoplifters in stores, and criminals and terrorists in urban areas.
3.1 An Overview of Facial Recognition Biometric

FR is the study of algorithms that automatically process images of the face. FR records the spatial geometry of distinguishing features of the face. Different vendors use different methods of FR; however, all focus on measures of key features of the face, including its texture. Practical problems include recognizing faces from still and moving images, analyzing and synthesizing faces, recognizing facial gestures and emotions, modelling human performance, and encoding faces.
The face is a curved three-dimensional surface, whose image varies with
changes in illumination, pose, hairstyle, facial hair, makeup and age. All faces have
basically the same shape, yet the face of each person is different. The goal of FR
then is to find a representation that can distinguish among faces of different people,
yet at the same time be invariant to changes in the image of each person.
The initial emphasis was on recognizing faces in still images where the
sources of variation were highly controlled. This progressed to detecting faces,
processing video sequences, and recognizing faces under less controlled settings.
3.2 Applications of Facial Recognition

FR's many potential applications have led to the development of FR algorithms. Among the applications are law enforcement and security; human/computer interfaces; image compression and coding of facial images; and the related areas of facial gesture recognition and the analysis and synthesis of faces.

There are three basic scenarios that FR systems might address:

(i) identifying an unknown person,

(ii) verifying a claimed identity of a person, and

(iii) analysing a face in an image.

The primary interest in law enforcement and security is in identification and verification. A major identification task is searching a database of known individuals for the identity of an unknown person.
A similar application is maintaining the integrity of an identity database, which could be compromised by:

(i) a person having two identities, or

(ii) two people having the same identity.

Both types of errors can result in degradation of recognition performance or in false accusations being made.

The main application for security is verification. The input to a verification system is a facial image and a claimed identity for that image; the output is either acceptance or rejection of the claim. Potential applications include controlling access to buildings or equipment, and confirming and verifying identities.
3.3 Generic Facial Recognition Algorithm

The four basic phases of FR are:

(i) Image data pre-processing: reduces unwanted image variation by aligning the face imagery, equalizing the pixel values, and normalizing the contrast and brightness.

(ii) Algorithm training: training creates subspaces into which test images are subsequently projected and matched (examples include the PCA, PCA+LDA, EBGM and BIC algorithms).

(iii) Algorithm testing: testing creates a distance matrix for the union of all images to be used either as probe images or gallery images in the analysis phase.

(iv) Analysis of results: performs analyses on the distance matrices; this includes computing recognition rates, conducting virtual experiments, or performing other statistical analysis on the data.
Figure 3.1: A typical Facial Recognition process
The process begins by reducing the variability of the human face to a set of numbers. Using a mathematical technique called Principal Components Analysis (PCA), a large group of faces is examined to extract the most efficient building blocks required to describe them. Any human face can then be represented as a weighted sum of these building blocks, known as Eigenfaces. With PCA, the essence of a human face can be reduced to just 256 bytes of information.

The recognition process involves comparing the Eigenface weights for two faces using an algorithm that generates a match score. Different faces will produce a poor match score; images of the same face will produce a good match score. In a one-to-one comparison (our Vehicle Access example), the Eigenface weights of authorized personnel are recorded in a central database. When someone steps before a camera, his or her face is quickly compared to all faces in the database to see if it generates a match.

In a one-to-many search, a database is created containing the faces of individuals whose presence would warrant action (e.g. most-wanted criminals, missing persons, etc.). Cameras, overtly or covertly deployed at strategic locations, capture each face in the field of view in real time and compare it with all records in the database.
Figure 3.2: Generic algorithm for software programming
The entire process should take less than a tenth of a second, with a high degree of
accuracy.
3.4 Algorithms Comparisons
Table 3.1 is a list of established image-based FR algorithms, and some
sample papers on each method are referenced. These will help us in designing and
conducting FR experiments which best suit our applications.
Table 3.1: Summary List Of Image-Based FR Algorithms [5]

Principal Component Analysis [6]
Derived from the Karhunen-Loeve transformation. Given an s-dimensional vector representation of each face in a training set of images, PCA tends to find a t-dimensional subspace whose basis vectors correspond to the maximum-variance directions in the original image space. This new subspace is normally of lower dimension (t << s). If the image elements are considered as random variables, the PCA basis vectors are defined as the eigenvectors of the scatter matrix.

Independent Component Analysis [7]
ICA minimizes both second-order and higher-order dependencies in the input data and attempts to find the basis along which the data (when projected onto them) are statistically independent. M.S. Bartlett provided two architectures of ICA for the face recognition task: Architecture I, statistically independent basis images, and Architecture II, a factorial code representation.

Linear Discriminant Analysis [8]
LDA finds the vectors in the underlying space that best discriminate among classes. For all samples of all classes, the between-class scatter matrix SB and the within-class scatter matrix SW are defined. The goal is to maximize SB while minimizing SW; in other words, maximize the ratio det|SB|/det|SW|. This ratio is maximized when the column vectors of the projection matrix are the eigenvectors of (SW^-1 × SB).

Elastic Bunch Graph Matching [9]
All human faces share a similar topological structure. Faces are represented as graphs, with nodes positioned at fiducial points (eyes, nose, ...) and edges labelled with 2-D distance vectors. Each node contains a set of 40 complex Gabor wavelet coefficients at different scales and orientations (phase, amplitude), called "jets". Recognition is based on labelled graphs: a labelled graph is a set of nodes connected by edges, where nodes are labelled with jets and edges are labelled with distances.

Evolutionary Pursuit [10]
An eigenspace-based adaptive approach that searches for the best set of projection axes in order to maximize a fitness function measuring, at the same time, the classification accuracy and the generalization ability of the system. Because the dimension of the solution space of this problem is too big, it is solved using a specific kind of genetic algorithm called EP.

Kernel Methods [11]
The face manifold in subspace need not be linear. Kernel methods are a generalization of linear methods. Direct non-linear manifold schemes are explored to learn this non-linear manifold.

Trace Transform [12]
The Trace transform, a generalization of the Radon transform, is a tool for image processing which can be used for recognizing objects under transformations, e.g. rotation, translation and scaling. To produce the Trace transform one computes a functional along tracing lines of an image. Different Trace transforms can be produced from an image using different trace functionals.

Active Appearance Model [13]
An AAM is an integrated statistical model which combines a model of shape variation with a model of the appearance variations in a shape-normalized frame. An AAM contains a statistical model of the shape and grey-level appearance of the object of interest which can generalize to almost any valid example. Matching to an image involves finding model parameters which minimize the difference between the image and a synthesized model example projected into the image.

3D Morphable Model [14]
The human face is intrinsically a surface lying in 3D space. Therefore a 3D model should be better for representing faces, especially for handling facial variations such as pose, illumination, etc. Volker Blanz proposed a method based on a 3D morphable face model that encodes shape and texture in terms of model parameters, and an algorithm that recovers these parameters from a single image of a face.

3D Face Recognition [15]
The main novelty of this approach is the ability to compare surfaces independently of natural deformations resulting from facial expressions. First, the range image and the texture of the face are acquired. Next, the range image is preprocessed by removing certain parts, such as hair, which can complicate the recognition process. Finally, a canonical form of the facial surface is computed. Such a representation is insensitive to head orientation and facial expression, thus significantly simplifying the recognition procedure. The recognition itself is performed on the canonical surfaces.

Bayesian Framework [16]
A probabilistic similarity measure based on the Bayesian belief that image intensity differences are characteristic of typical variations in the appearance of an individual. Two classes of facial image variations are defined: intrapersonal variations and extrapersonal variations. Similarity among faces is measured using Bayes' rule.

Support Vector Machine [17]
Given a set of points belonging to two classes, an SVM finds the hyperplane that separates the largest possible fraction of points of the same class on the same side, while maximizing the distance from either class to the hyperplane. PCA is first used to extract the features of face images, and then discrimination functions between each pair of images are learned by SVMs.

Hidden Markov Models [18]
HMMs are a set of statistical models used to characterize the statistical properties of a signal. An HMM consists of two interrelated processes: (1) an underlying, unobservable Markov chain with a finite number of states, a state transition probability matrix and an initial state probability distribution, and (2) a set of probability density functions associated with each state.
The aims of an FR programme are:

(i) fast processing time: for immediate access upon demand (at the press of the "ignition" switch);

(ii) a high degree of accuracy: to ensure reliability, and thus public acceptance; and

(iii) low cost: for wide application to various classes of vehicles.
3.5 Basis of Facial Recognition Process – The PCA
The process begins by reducing the variability of the human face to a set of numbers. Using a mathematical technique called Principal Components Analysis (PCA), one can examine a large group of faces and extract the most efficient building blocks required to describe them. It turns out that any human face can be represented as the weighted sum of 128 of these building blocks, known as Eigenfaces, based on the pioneering work of M. Turk and A. Pentland [19]. With this technique, the essence of a human face can be reduced to just 256 bytes of information. The recognition process involves comparing the Eigenface weights for two faces using a proprietary algorithm that generates a match score. Different faces will produce a poor match score; images of the same face will produce a good match score.

The vehicle authorisation system requires a one-to-one comparison: the Eigenface weights of authorized personnel are recorded in a central database. When someone appears before the camera, his or her face is quickly compared to all of the faces in the database to see if it generates a match.
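A minimal MATLAB sketch of this eigenface idea is given below. Synthetic random vectors stand in for the vectorised face images, and the image size and the number of retained eigenfaces are arbitrary choices; this illustrates the PCA mechanics only, not the project's actual programme.

    % Synthetic training set: M vectorised 'face' images of size r x c as columns
    r = 16; c = 16; M = 10;
    X = rand(r*c, M);

    % Centre the data on the mean face
    meanFace = mean(X, 2);
    A = X - repmat(meanFace, 1, M);

    % Eigenfaces via the small M x M matrix A'*A (the Turk-Pentland trick)
    [V, D] = eig(A' * A);
    [dsorted, order] = sort(diag(D), 'descend');  % strongest components first
    k = 5;                                        % number of eigenfaces retained
    E = A * V(:, order(1:k));                     % eigenfaces in image space
    E = E ./ repmat(sqrt(sum(E.^2)), r*c, 1);     % normalise each eigenface

    % Any face is then summarised by its k projection weights
    w = E' * (X(:,1) - meanFace);                 % Eigenface weights of image 1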
3.5.1 Minimum Distance Classifier
If we define the prototype of each pattern class to be the mean vector of the patterns of that class:

    mj = (1/Nj) Σx∈ωj x,    j = 1, 2, …, W        (1)

where Nj is the number of pattern vectors from class ωj and the summation is taken over these vectors.

One way to determine the class membership of an unknown pattern vector x is to assign it to the class of its closest prototype. Using the Euclidean distance to determine closeness reduces the problem to computing the distance measures:

    Dj(x) = ||x - mj||,    j = 1, 2, …, W        (2)
where ||a|| = (aTa)½ is the Euclidean norm. We then assign x to class ωj if Dj(x) is the smallest distance; that is, the smallest distance implies the best match in this formulation. Selecting the smallest distance is equivalent to evaluating the functions

    dj(x) = xTmj - ½mjTmj,    j = 1, 2, …, W        (3)

and assigning x to class ωj if dj(x) yields the largest numerical value. This formulation agrees with the concept of a decision function.

From Equation (3), the decision boundary between classes ωi and ωj for a minimum distance classifier is

    dij(x) = di(x) - dj(x) = xT(mi - mj) - ½(mi - mj)T(mi + mj) = 0        (4)
The surface given by Equation (4) is the perpendicular bisector of the line segment joining mi and mj. For n = 2 the perpendicular bisector is a line, for n = 3 it is a plane, and for n > 3 it is called a hyperplane.
In practice, the minimum distance classifier works well when the distance between means is large compared to the spread or randomness of each class with respect to its mean. The minimum distance classifier yields optimum performance (in terms of minimizing the average loss from misclassification) when the distribution of each class about its mean is in the form of a spherical 'hypercloud' in n-dimensional pattern space.

The simultaneous occurrence of large mean separations and relatively small class spread seldom occurs in practice unless the system designer controls the nature of the input.
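Equations (1) to (3) translate almost line for line into MATLAB. The two-class training samples below are made up for illustration; in the FR context the pattern vectors would be, for example, the Eigenface weights described in the previous section.

    % Made-up training samples for two pattern classes (one pattern per row)
    class1 = [1.0 1.0; 1.2 0.9; 0.8 1.1];
    class2 = [4.0 4.0; 3.9 4.2; 4.1 3.8];

    % Equation (1): class prototypes (mean vectors)
    m1 = mean(class1, 1)';
    m2 = mean(class2, 1)';

    % Unknown pattern vector to be classified
    x = [1.1; 1.0];

    % Equation (2): Euclidean distances to each prototype
    D1 = norm(x - m1);
    D2 = norm(x - m2);

    % Equation (3): equivalent linear decision functions
    d1 = x'*m1 - 0.5*(m1'*m1);
    d2 = x'*m2 - 0.5*(m2'*m2);

    % Assign x to the class with the smallest distance (largest d)
    if D1 < D2, label = 1; else label = 2; end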
3.5.2 Matching by Correlation
The correlation of two functions f(x,y) and h(x,y) is defined as

    f(x,y) ○ h(x,y) = (1/MN) Σm=0..M-1 Σn=0..N-1 f*(m,n) h(x+m, y+n)        (5)

where f* denotes the complex conjugate of f. We normally deal with real functions (images), in which case f* = f.

The correlation theorem: let F(u,v) and H(u,v) denote the Fourier transforms of f(x,y) and h(x,y), respectively. One half of the correlation theorem states that spatial correlation, f(x,y) ○ h(x,y), and the frequency-domain product, F*(u,v)H(u,v), constitute a Fourier transform pair. This result, normally stated as

    f(x,y) ○ h(x,y) ⇔ F*(u,v)H(u,v)        (6)

indicates that correlation in the spatial domain can be obtained by taking the inverse Fourier transform of the product F*(u,v)H(u,v), where F* is the complex conjugate of F.
An analogous result is that correlation in the frequency domain reduces to multiplication in the spatial domain; that is,

    f(x,y)h(x,y) ⇔ F*(u,v) ○ H(u,v)        (7)

These two results comprise the correlation theorem. It is assumed that all functions have been properly extended by padding.
The principal use of correlation is for matching. In matching, f(x,y) is an image containing objects or regions. If we want to determine whether f contains a particular object or region in which we are interested, we let h(x,y) be that object or region (this image is normally called a template). Then, if there is a match, the correlation of the two functions will be maximum at the location where h finds a correspondence in f.

The term cross-correlation is often used in place of the term correlation to clarify that the images being correlated are different. In autocorrelation both images are identical, and we have the autocorrelation theorem,

    f(x,y) ○ f(x,y) ⇔ |F(u,v)|²        (8)

This result states that the Fourier transform of the spatial autocorrelation is the power spectrum. Similarly,

    |f(x,y)|² ⇔ F(u,v) ○ F(u,v)        (9)
Image correlation is considered as a basis for finding matches of a subimage w(x,y) of size J × K within an image f(x,y) of size M × N, where we assume that J < M and K < N. Although the correlation approach can be expressed in vector form, working directly with an image or subimage format is more intuitive. In its simplest form, the correlation between f(x,y) and w(x,y) is

    c(x,y) = Σs Σt f(s,t) w(x+s, y+t)        (10)

for x = 0, 1, 2, …, M-1 and y = 0, 1, 2, …, N-1, where the summation is taken over the image region in which w and f overlap.
The correlation function given in Equation (10) has the disadvantage of being sensitive to changes in the amplitude of f and w. For example, doubling all the values of f doubles the value of c(x,y). An approach frequently used to overcome this difficulty is to perform matching via the correlation coefficient, which is defined as

    γ(x,y) = Σs Σt [f(s,t) - f̄(s,t)][w(x+s, y+t) - w̄]
             / { Σs Σt [f(s,t) - f̄(s,t)]² Σs Σt [w(x+s, y+t) - w̄]² }^½        (11)

where x = 0, 1, 2, …, M-1 and y = 0, 1, 2, …, N-1, w̄ is the average value of the pixels in w (computed only once), f̄(s,t) is the average value of f in the region coincident with the current location of w, and the summations are taken over the coordinates common to both f and w. The correlation coefficient γ(x,y) is scaled in the range -1 to +1, independent of scale changes in the amplitude of f and w.
Although the correlation function can be normalized for amplitude changes via the correlation coefficient, obtaining normalization for changes in size and rotation can be difficult. Normalizing for size involves spatial scaling, a process that in itself adds a significant amount of computation. Normalization for rotation is even more difficult. If a clue regarding rotation can be extracted from f(x,y), then we simply rotate w(x,y) so that it aligns itself with the degree of rotation in f(x,y). However, if the nature of the rotation is unknown, looking for the best match requires exhaustive rotations of w(x,y). This procedure is impractical and, as a consequence, correlation is seldom used in cases where arbitrary or unconstrained rotation is present.

Correlation can also be carried out in the frequency domain via the Fast Fourier Transform. If f and w are the same size, this approach can be more efficient than direct implementation of correlation in the spatial domain. Equation (11) is used when w is much smaller than f. The correlation coefficient is more difficult to implement in the frequency domain, so it is generally computed directly in the spatial domain.
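The Image Processing Toolbox function normxcorr2 computes the normalised correlation coefficient of Equation (11) directly, so template matching can be sketched as below. The image and template here are synthetic stand-ins for a face image and a facial feature.

    % Synthetic image f with an embedded object, and a template w of that object
    w = [zeros(10,20); ones(10,20)];   % template must have non-zero variance
    f = zeros(100,100);
    f(40:59, 30:49) = w;               % object placed inside the image

    % Correlation coefficient of Equation (11), scaled to the range [-1, +1]
    g = normxcorr2(w, f);

    % The peak of g marks the best match between template and image
    [peak, idx] = max(g(:));
    [ypeak, xpeak] = ind2sub(size(g), idx);

    % Correct for the padding that normxcorr2 places around the image
    yoffset = ypeak - size(w,1);       % offset of the matched region's top edge
    xoffset = xpeak - size(w,2);       % offset of the matched region's left edge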
3.6 Neural Networks
The approaches discussed above use sample patterns to estimate the statistical parameters of each pattern class. The minimum distance classifier is specified completely by the mean vector of each class. Similarly, the Bayes classifier for Gaussian populations is specified completely by the mean vector and covariance matrix of each class. The patterns (of known class membership) used to estimate these parameters are usually called training patterns, and a set of such patterns from each class is called the training set. The process by which a training set is used to obtain decision functions is called learning or training.

In the approaches just discussed, training is a simple matter. The training patterns of each class are used to compute the parameters of the decision function corresponding to that class. After the parameters in question have been estimated, the structure of the classifier is fixed, and its eventual performance will depend on how well the actual pattern populations satisfy the underlying statistical assumptions made in the derivation of the classification method being used.

The statistical properties of the pattern classes in a problem are often unknown or cannot be estimated. In practice, such decision-theoretic problems are best handled by methods that yield the required decision functions directly via training. It is then unnecessary to make assumptions regarding the underlying probability density functions or other probabilistic information about the pattern classes under consideration. In this section we discuss approaches that meet this criterion.
Figure 3.3: Basic Form of Neural Network Architecture
Learning machines called perceptrons, when trained with linearly separable training sets (i.e. training sets separable by a hyperplane), converge to a solution in a finite number of iterative steps. The solution takes the form of coefficients of hyperplanes capable of correctly separating the classes represented by the patterns of the training set.

In its most basic form, the perceptron learns a linear decision function that dichotomizes two linearly separable training sets.
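A plain-MATLAB sketch of the perceptron learning rule is given below for a linearly separable two-class problem. The training data, learning rate and epoch limit are arbitrary example values.

    % Linearly separable training set: rows are patterns, targets are +1 / -1
    X = [0 0; 0 1; 3 3; 3 4];
    t = [-1; -1; 1; 1];

    % Augment each pattern with a bias input and start from zero weights
    Xa = [X, ones(size(X,1), 1)];
    w = zeros(3, 1);
    eta = 0.1;                              % learning rate

    % Iterative weight updates until every pattern is classified correctly
    for epoch = 1:100
        errors = 0;
        for i = 1:size(Xa, 1)
            y = sign(Xa(i,:) * w);          % current response to pattern i
            if y ~= t(i)                    % misclassified: shift the hyperplane
                w = w + eta * t(i) * Xa(i,:)';
                errors = errors + 1;
            end
        end
        if errors == 0, break; end          % converged to a separating hyperplane
    end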
CHAPTER 4
IMAGE EXTRACTION
4.1 Setting The Scene
For the purpose of recognising faces, the very first step of the biometric FR process is the capture of facial images of individuals.

The objective of this project is not to dwell in depth on the technicalities of cameras for image capture, or on the complexity of image formation and display, but to process image signals for object (facial) recognition.

Photos used in this project were captured using three types of camera: a photo-studio 5.1-megapixel camera, a 1.2-megapixel built-in PC camera (MacBook) and a portable generic 0.6-megapixel USB camera (Logitech QuickCam Messenger).
Figure 4.1 shows the framework of digital image processing. There are six modules in this system:

(i) File I/O: reads/writes image and video files.

(ii) Frame Grabber: grabs the image from the image capture device.

(iii) Image Processing Module: consists of a buffer for storing intermediate image data and other functions to further process the data.

(iv) Data Visualization: plots the analysed results obtained from the image processing module.

(v) Bitmap: a data structure to store/display the image.

(vi) Display: displays the data/image sent from the previous modules.
Figure 4.1: Framework of Image/Video Processing
4.2 Digital Image Structure
The image below is represented by 76,800 samples, or pixels (picture elements), arranged in a two-dimensional array of 320 columns and 240 rows.
Figure 4.2: Binary values of pixels
The value of each pixel is converted into greyscale, where 0 is black, 255 is white,
and the intermediate values are shades of grey.
Images have their information encoded in the spatial domain, the image equivalent of the time domain. Features in images are represented by edges, not sinusoids. This means that the spacing and number of pixels are determined by how small the features to be seen are, rather than by the formal constraints of the sampling theorem.

Images with few pixels are regarded as having unusually poor resolution; such images look noticeably unnatural, and the individual pixels can often be seen. The strongest motivation for using lower-resolution images is that there are fewer pixels to handle: one of the most difficult problems in image processing is managing massive amounts of data. It is common for 256 grey levels (quantization levels) to be used in image processing, corresponding to a single byte per pixel. This is because (i) a single byte is convenient for data management, (ii) the large number of pixels in an image compensates to a certain degree for a limited number of quantization steps, and (iii) a brightness step size of 1/256 (0.39%) is smaller than the eye can perceive; an image presented to a human observer will not be improved by using more than 256 levels.
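These properties are easy to verify in MATLAB; the file name below is a placeholder for any colour test image from the project database.

    % 'face01.jpg' is a placeholder name for a 320 x 240 colour test image
    I = imread('face01.jpg');
    whos I                          % uint8: one byte (256 levels) per sample

    [rows, cols, planes] = size(I);
    npixels = rows * cols;          % 240 x 320 = 76,800 pixels in this project

    % After greyscale conversion, values run from 0 (black) to 255 (white)
    G = rgb2gray(I);
    minval = double(min(G(:)));
    maxval = double(max(G(:)));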
The value of each pixel in the digital image represents a small region of the continuous image being digitized. This defines a square sample spacing and sampling grid. The region of a continuous image that contributes to the pixel value is called the sampling aperture; the size of the sampling aperture is often related to the inherent capabilities of the particular imaging system being used. In most cases the sampling grid is made approximately the same as the sampling aperture of the system. Resolution in the final digital image will be limited primarily by the larger of the two, the sampling grid or the sampling aperture.
Colour is added to digital images by using three numbers for each pixel, representing the intensity of the three primary colours: red, green and blue. Mixing these three colours generates all the colours that the human eye can perceive. A single byte is frequently used to store each of the colour intensities, allowing the image to capture a total of 256 × 256 × 256 ≈ 16.8 million different colours. Colour is very important when the goal is to present the viewer with a true picture of the world, such as in television and still photography. However, this is usually not how images are used in science and engineering, where the purpose is to analyse a two-dimensional signal using the human visual system as a tool. For this reason, black-and-white images are sufficient for this FR project.
The parameters in optical systems interact in many unexpected ways. For
example, consider how the amount of available light and the sensitivity of the light
sensor affects the sharpness of the acquired image. This is because the iris diameter
and the exposure time are adjusted to transfer the proper amount of light from the
scene being viewed to the image sensor. If more than enough light is available, the
diameter of the iris can be reduced, resulting in a greater depth-of-field (the range of
distance from the camera where an object remains in focus). A greater depth-of-field
provides a sharper image when objects are at various distances. In addition, an
abundance of light allows the exposure time to be reduced, resulting in less blur from
camera shaking and object motion. Optical systems are full of these kinds of tradeoffs.
The dynamic range of an electronic camera is typically 300 to 1000, defined
as the largest signal that can be measured, divided by the inherent noise of the
device. The same camera and lens assembly used in bright sunlight will be useless on
a dark night or in a dark room.
4.3 Image Acquisition
The most common image sensor used in electronic cameras is the charge-coupled device (CCD). CCD image sensors are capable of transforming a light pattern (image) into an electric charge pattern (an electronic image). The heart of the CCD is a thin wafer of silicon, typically about 1 cm square; it is an integrated circuit containing an array of linked, or coupled, capacitors. Under the control of an external circuit, each capacitor can transfer its electric charge to one or other of its neighbours.

The CCD consists of several individual elements that have the capability of collecting, storing and transporting electrical charge from one element to another. Each photosensitive element represents a pixel. Structures are made that form lines, or matrices, of pixels. Output amplifiers at the edge of the chip collect the signals from the CCD. An electronic image can be obtained by, after having exposed the sensor to a light pattern, applying a series of pulses that transfer the charge of one pixel after another to the output amplifier, line after line. The output amplifier converts the charge into a voltage. External electronics transform this output signal into a form suitable for monitors or frame grabbers. CCDs have extremely low noise figures.

CCD image sensors can be colour sensors or monochrome sensors. In a colour image sensor, an integral RGB colour filter array provides colour responsivity and separation. A monochrome image sensor senses only in black and white. The optical format, which refers to the length of the diagonal of the imaging area, is used to determine what size of lens is necessary for use with the imager; common optical formats include 1/7 inch, 1/6 inch, 1/5 inch, 1/4 inch, 1/3 inch, 1/2 inch, 2/3 inch, 3/4 inch, and 1 inch. The number of pixels and the pixel size are also important to consider. Horizontal pixels refers to the number of pixels in a row of the image sensor; vertical pixels refers to the number of pixels in a column. The greater the number of pixels, the better the resolution. For example, VGA resolution is 640 × 480: the number of horizontal pixels is 640 and the number of vertical pixels is 480.
Important image sensor performance specifications to consider when searching for CCD image sensors include:

• spectral response - the spectral range (wavelength range) for which the detector is designed

• data rate - the speed of the data transfer process, normally expressed in MHz

• quantum efficiency - the ratio of photon-generated electrons that the pixel captures to the photons incident on the pixel area; it is wavelength dependent, so the given value for quantum efficiency is generally for the peak sensitivity wavelength of the CCD

• dynamic range - the logarithmic ratio of well depth to the readout noise in decibels; the higher the number, the better

• number of outputs
Recently it has become practical to create an Active Pixel Sensor (APS) using the CMOS manufacturing process. Since this is the dominant technology for all chip-making, CMOS image sensors are cheap to make, and signal-conditioning circuitry can be incorporated into the same device. The latter advantage helps mitigate their greater susceptibility to noise, which is still an issue, though a diminishing one; the noise is due to the use of low-grade amplifiers in each pixel instead of one high-grade amplifier for the entire array, as in the CCD. CMOS sensors also have the advantage of lower power consumption than CCDs. An image is projected by a lens onto the capacitor array, causing each capacitor to accumulate an electric charge proportional to the light intensity at that location. A one-dimensional array, used in line-scan cameras, captures a single slice of the image, while a two-dimensional array, used in video and still cameras, captures the whole image or a rectangular portion of it. Once the array has been exposed to the image, a control circuit causes each capacitor to transfer its contents to its neighbour. The last capacitor in the array dumps its charge into an amplifier that converts the charge into a voltage. By repeating this process, the control circuit converts the entire contents of the array to a varying voltage, which it samples, digitizes and stores in memory.
4.4 Importance Of Facial Positioning
In this project it is necessary that facial images are taken in a plane perpendicular to the camera axis. This enables fast identification of facial features without the need for a complicated programme. This is not a significant issue for our vehicle application, where the position of the user's face is almost fixed and it is natural for the user to face the front, where the camera is placed.
Figure 4.3: Camera viewing axis must be perpendicular with the image
Figure 4.4 illustrates the error response of the image acquisition routine when features could not be extracted due to an incorrect position of the face.
Figure 4.4: Facial image at a non-perpendicular angle results in error
In this illustration, the rejection is further exaggerated by the use of a cartoon picture of a human, whose pixel size does not match the rest of the database photos. What this means is that, for a simplified FR system, the facial image must be taken at a particular angle (in the perpendicular plane) and the face must be of a consistent size for comparison of its features with other images.
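A simple guard of this kind can be simulated in MATLAB by forcing every captured image to the common database size before feature extraction. The 240 x 320 target matches the image size used in this project; the file name is a placeholder.

    % Force each captured image to the common database size before comparison
    target = [240 320];                  % rows x columns used in this project
    I = imread('capture.jpg');           % placeholder file name
    if ~isequal(size(I(:,:,1)), target)
        I = imresize(I, target);         % rescale to the database format
    end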
CHAPTER 5
IMAGE TRANSFORMATION & PROCESSING
5.1 Grayscale Transformation

Grayscale transformation is a powerful technique for improving the appearance of images. The idea is to increase the contrast at pixel values of interest, at the expense of the pixel values we do not care about. This is done by defining the relative importance of each of the 0 to 255 possible pixel values: the more important the value, the greater the contrast it is given in the displayed image.

In this project simulation, coloured images are converted to grayscale images using the Video & Image Processing Blocksets in MATLAB. The bulk of the conversion, however, was expedited manually using standalone freeware, Batch Image Processor (BIMP Lite version 1.62) [20], written by Matthew Hart. Grayscale transforms (or "Intensity" in MATLAB) can significantly improve the viewability of an image. For the purpose of FR analysis, it is preferable to use grayscale images rather than coloured images, to reduce the matrix size and thus the amount of data generated to be operated upon and saved. However, it is noted that there is research on FR using skin colour and texture [21].
Histogram equalisation is a way to automate the procedure. Histogram equalisation blindly uses the histogram as the contrast-weighting curve, eliminating the need for human judgement: the output transform is found by integration and normalisation of the histogram, rather than from a manually generated curve. This results in the greatest contrast being given to those values that have the greatest number of pixels.
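In MATLAB both steps take a single call each; this is a sketch only, and the file name is a placeholder.

    RGB = imread('face01.jpg');    % placeholder colour image
    G = rgb2gray(RGB);             % greyscale ("intensity") version
    H = histeq(G);                 % contrast weighted by the histogram
    imshow(H)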
5.2 Image Thresholding

The thresholding percentage specifies the threshold level between the maximum and minimum intensity of the initial image. Thresholding is a way to reduce the effect of noise and to improve the signal-to-noise ratio; that is, it is a way to keep the significant information of the image while discarding the unimportant part (provided that we choose a plausible threshold level).
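A sketch of both styles of thresholding in MATLAB: graythresh picks a level automatically (Otsu's method), while a fixed percentage between the minimum and maximum intensity implements the idea described above. The 40% figure is an arbitrary example.

    G = rgb2gray(imread('face01.jpg'));   % placeholder image
    level = graythresh(G);                % automatic level in [0,1] (Otsu)
    BW1 = im2bw(G, level);                % binary image at that level

    % Fixed-percentage threshold between the image's min and max intensity
    p = 0.4;                              % hypothetical 40% threshold
    lo = double(min(G(:))); hi = double(max(G(:)));
    BW2 = G > (lo + p*(hi - lo));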
5.3 Gaussian Filtering

The basic effects of a (2-D) Gaussian filter are to smooth the image and wipe off noise. Generally speaking, for a noise-affected image, smoothing with a Gaussian function is the first thing to do before any further processing, such as edge detection. The effectiveness of the Gaussian filter depends on the choice of its standard deviation, sigma.
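Gaussian smoothing before edge detection might be sketched in MATLAB as follows; the kernel size and sigma are example values only.

    G = rgb2gray(imread('face01.jpg'));      % placeholder image
    sigma = 2;                               % standard deviation of the Gaussian
    h = fspecial('gaussian', [7 7], sigma);  % 7 x 7 Gaussian kernel
    S = imfilter(G, h, 'replicate');         % smoothed image, borders replicated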
5.4 Image Features Extraction - Canny Edge Detector [22]
The Canny operator was designed to be an optimal edge detector (according
to particular criteria). It takes as input a gray scale image, and produces as output an
image showing the positions of tracked intensity discontinuities.
The Canny operator works in a multi-stage process. First of all the image is
smoothed by Gaussian convolution. Then a simple 2-D first derivative operator is
applied to the smoothed image to highlight regions of the image with high first
spatial derivatives. Edges give rise to ridges in the gradient magnitude image. The
algorithm then tracks along the top of these ridges and sets to zero all pixels that are
not actually on the ridge top so as to give a thin line in the output, a process known
as “non-maximal suppression”. The tracking process exhibits hysteresis controlled
by two thresholds: T1 and T2, with T1 > T2. Tracking can only begin at a point on a
ridge higher than T1. Tracking then continues in both directions out from that point
until the height of the ridge falls below T2. This hysteresis helps to ensure that noisy
edges are not broken up into multiple edge fragments.
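The built-in MATLAB edge function exposes exactly these controls; the sketch
below uses illustrative (not tuned) values and a hypothetical file name:

    gray = rgb2gray(imread('face.jpg'));
    % thresholds are given as [low high] for the hysteresis; sigma sets
    % the width of the Gaussian smoothing kernel
    bw = edge(gray, 'canny', [0.05 0.20], 1.5);
    imshow(bw)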
The effect of the Canny operator is determined by three parameters: the width
of the Gaussian kernel used in the smoothing phase, and the upper and lower
thresholds used by the tracker. Increasing the width of the Gaussian kernel reduces
the detector's sensitivity to noise, at the expense of losing some of the finer detail in
the image. The localization error in the detected edges also increases slightly as the
Gaussian width is increased. Usually, the upper tracking threshold can be set quite
high, and the lower threshold quite low for good results. Setting the lower threshold
too high will cause noisy edges to break up. Setting the upper threshold too low
increases the number of spurious and undesirable edge fragments appearing in the
output.
One problem with the basic Canny operator concerns Y-junctions, i.e.
places where three ridges meet in the gradient magnitude image. Such junctions can
occur where an edge is partially occluded by another object. The tracker will treat
two of the ridges as a single line segment, and the third as a line that approaches,
but does not quite connect to, that line segment. Most of the major edges are
detected and much of the detail is picked out well; note that this may be too
much detail for subsequent processing.
The Gaussian smoothing in the Canny edge detector fulfills two purposes:
first, it can be used to control the amount of detail that appears in the edge image and
second, it can be used to suppress noise.
If we scale down the image before the edge detection, we can use the upper
threshold of the edge tracker to remove the weaker edges. All the boundaries of the
objects have been detected whereas all other edges have been removed. Although
the Canny edge detector allows us to find the intensity discontinuities in an image, it
is not guaranteed that these discontinuities correspond to actual edges of the object.
Figure 5.1: Simulation of various edge detection methods
Figure 5.1 is a SIMULINK simulation of some of the various built-in edge detection
methods: Sobel, Prewitt and Canny. The Canny method is further simulated with
some threshold values applied. The result of each method is shown below.
(a) original image
(b) Sobel detection
(c) Prewitt detection
(d) horizontally filtered edge detection
(e) vertically filtered edge detection
(f) Canny detector
(g) Canny detector with threshold
Figure 5.2: Result of simulation of various edge detection methods
Notice the similarity between the Sobel and Prewitt methods, although the Prewitt
result appears slightly blurred. Applying a threshold to the Canny method (or any
other method) provides a means to fine-tune the edge detection for variations in
lighting and contrast. Filtering can also be applied to remove background noise, as
illustrated by the horizontal and vertical elements of edge detection.
For comparison, this is the result of applying Canny edge detection on the
colour (RGB) version of the above photo. The simple MATLAB algorithm to obtain
this manually is shown in Appendix B.
Figure 5.3: Canny edge detection of an RGB image
It is obvious that the edge features remain intact when the RGB photos are converted
to greyscale. By comparison, greyscale edge detection gives better contrast and
detail than the RGB version of the images.
5.5 Quality of Images - Brightness and Contrast Adjustments
An image must have the proper brightness and contrast for easy viewing.
Brightness refers to the overall lightness or darkness of the image, while contrast is
the difference in brightness between objects or regions. When the brightness is too
high, the whitest pixels are saturated, destroying the detail in those areas; when the
brightness is too low, the blackest pixels are saturated instead.
It is very important that the images are captured under adequate lighting
conditions so that the features of the face are obvious. It is also necessary that
images are captured and stored within reasonable pixel sizes so that details are
available for analysis. These conditions are illustrated by the darker image used for
edge detection of features in Figure 5.4. Although applying some threshold values to
the Canny method shows some improvement, the main facial features are not clear,
except for the outline of the face (because the contrast against the background is very
obvious).
Figure 5.4: Poorly lit environment produces unreliable image data (various edge detection methods shown here)
A good lighting level is a challenge to achieve in a restrictive application such as
the interior of a vehicle, particularly when the camera depends on an external light
source (sunlight).
CHAPTER 6
SYSTEM DESIGN
6.1 System Architecture

6.1.1 Hardware Architecture
It has been noted in the introduction that the scope of this project is for
simulation of the algorithms, and the physical implementation is not realised. The
physical system design could consist of a camera placed discreetly at the vehicle
dashboard, an interactive console for user interface, and a controller hidden
conveniently, as illustrated in Figure 6.1.
Figure 6.1: Proposed system equipment layout
6.1.2 Software Architecture
The first stage uses the wavelet decomposition, which helps extract intrinsic
features of face images. As a result of this decomposition, we obtain four subimages
(namely the approximation, horizontal, vertical, and diagonal detailed images). The
second stage of the approach applies classification to these four decompositions.
The choice is motivated by its insensitivity to large variations in light direction, face
pose, and facial expression. The last phase concerns the aggregation of the
individual classifiers by means of the fuzzy integral.
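One level of this decomposition can be sketched in MATLAB with dwt2; the file
name and the choice of the Haar wavelet here are illustrative assumptions:

    img = double(rgb2gray(imread('face.jpg')));  % hypothetical file name
    [cA, cH, cV, cD] = dwt2(img, 'haar');  % approximation + H, V, D details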
Figure 6.2: Facial Recognition System Architecture
An image defined in the “real world” is considered to be a function of two
real variables, for example, a(x,y) with a as the amplitude (e.g. brightness) of the
image at the real coordinate position (x,y). An image may be considered to contain
sub-images sometimes referred to as regions–of–interest (ROIs, or simply regions).
This concept reflects the fact that images frequently contain collections of objects
each of which can be the basis for a region. In a sophisticated image processing
system it should be possible to apply specific image processing operations to selected
regions. Thus one part of an image (region) might be processed to suppress motion
blur while another part might be processed to improve colour rendition.
The amplitudes of a given image will almost always be either real numbers or
integer numbers. The latter is usually a result of a quantization process that converts
a continuous range (say, between 0 and 100%) to a discrete number of levels.
A digital image a[m,n] described in a 2D discrete space is derived from an
analogue image a(x,y) in a 2D continuous space through a sampling process that is
frequently referred to as digitization. The 2D continuous image a(x,y) is divided into
N rows and M columns. The intersection of a row and a column is termed a pixel.
The value assigned to the integer coordinates [m,n] with {m=0,1,2,…,M–1} and
{n=0,1,2,…,N–1} is a[m,n]. In fact, in most cases a(x,y) - which we might consider
to be the physical signal that impinges on the face of a 2D sensor - is actually a
function of many variables including depth (z), color (λ), and time (t). Unless
otherwise stated, we will consider the case of 2D, monochromatic, static images.
Wavelet transforms are used to reduce image information redundancy
because only a subset of the transform coefficients are necessary to preserve the most
important facial features such as hair outline, eyes and mouth. When Wavelet
coefficients are fed into a backpropagation neural network for classification, a high
recognition rate can be achieved by using a very small proportion of transform
coefficients. This makes Wavelet-based face recognition much more accurate than
other approaches.
6.2 Wavelet Packet Analysis For Face Recognition
The MATLAB Wavelet Toolbox is used to illustrate how to perform signal or
image analysis. The proposed scheme is based on the analysis of a wavelet packet
decomposition of the face images for recognition of frontal views of human faces
under roughly constant illumination. Each face image is first located and then,
described by a subset of band filtered images containing wavelet coefficients. From
these wavelet coefficients, which characterize the face texture, we build compact and
meaningful feature vectors, using simple statistical measures.
Wavelet transformations are a method of representing signals across space
and frequency. The signal is divided across several layers of division in space and
frequency and then analyzed. The goal is to determine which space/frequency bands
contain the most information about an image’s unique features, both the parts that
define an image as a particular type, i.e. face, and those parts which aid in
classification between different images of the same type.
One type of discrete wavelet transform (DWT) is the orthogonal DWT. The
orthogonal DWT projects an image onto a set of orthogonal column vectors to break
the image down into coarse and fine features.
Since MATLAB stores most numbers in double precision, even a single
image takes up a lot of memory; for instance, one copy of a 512-by-512 image uses
2 MB. To avoid out-of-memory errors, it is important to allocate enough memory
to process the various image sizes, whether in physical RAM or in a combination of
RAM and virtual memory.
The general linear discrete image transform*

$$F = PfQ \qquad (12)$$

can be rewritten as

$$F(u,v) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} P(u,m)\, f(m,n)\, Q(n,v), \qquad u = 0,1,\ldots,M-1;\; v = 0,1,\ldots,N-1 \qquad (13)$$

If P and Q are non-singular (non-zero determinants), inverse matrices exist and

$$f = P^{-1} F Q^{-1} \qquad (14)$$

If P and Q are both symmetric ($M = M^{T}$), real, and orthogonal ($M^{T}M = I$), then

$$F = PfQ, \qquad f = PFQ \qquad (15)$$

and the transform is an orthogonal transform.

* Commonly known theory from various sources.
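A toy numerical check of equation (15) in MATLAB, using a small symmetric,
real, orthogonal matrix chosen purely for illustration:

    P = [1 1; 1 -1]/sqrt(2);   % symmetric, real, orthogonal: P'*P = eye(2)
    f = [10 20; 30 40];        % a toy 2x2 "image" block
    F = P*f*P;                 % forward transform (Eq. 12 with Q = P)
    f_back = P*F*P;            % inverse of an orthogonal transform (Eq. 15)
    disp(norm(f - f_back))     % ~0 up to rounding error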
6.3 Multilevel Wavelet Decomposition
In the same way as Fourier analysis, wavelets are derived from a basis
function called the Mother function or analyzing wavelet. The simplest Mother
function is the Haar Mother function shown below.
Figure 6.3: The Haar mother function and Φ10, Φ01, Φ21
Multilevel wavelet decomposition is an iterative process, namely
multi-resolutional decomposition. At each iteration, a lower-frequency set of
transformed data coefficients generated by a prior iteration is again refined to
produce a substitute set of transformed data coefficients comprising a lower spatial
frequency group and a higher spatial frequency group, called subbands.
The decomposition process is iterated with successive approximations being
decomposed in turn, so that one signal is broken down into many lower resolution
components. This is called the “wavelet decomposition tree.”
Figure 6.4: Wavelet decomposition tree
Looking at a signal's wavelet decomposition tree can yield valuable information.
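In the Wavelet Toolbox the tree is built with wavedec2 and read back with
appcoef2 and detcoef2; a minimal sketch, with the file name and Haar wavelet
as illustrative assumptions:

    img = double(rgb2gray(imread('face.jpg')));  % hypothetical file name
    [C, S] = wavedec2(img, 3, 'haar');           % 3-level decomposition tree
    cA3 = appcoef2(C, S, 'haar', 3);             % coarse approximation (tree top)
    [cH1, cV1, cD1] = detcoef2('all', C, S, 1);  % finest-level detail subbands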
Figure 6.5: Haar wavelet decomposition to 2 levels
Figure 6.6: Haar wavelet decomposition to 4 levels
Figure 6.7: Details of wavelet levels
The coefficients in the upper left corner are related to a low resolution image while
the other panels correspond to high resolution features.
Figure 6.8: Details of a wavelet node being compressed with threshold
Figure 6.9: The inverse process (decomposition at Level 3)
This Inverse Discrete Wavelet Transform illustrates the re-composition of the
image after filtering and de-noising.
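A crude sketch of this filter-and-recompose step; hard-thresholding every
coefficient is a simplification, and the threshold value is illustrative:

    img = double(rgb2gray(imread('face.jpg')));  % hypothetical file name
    [C, S] = wavedec2(img, 3, 'haar');
    thr = 20;                      % illustrative threshold
    C(abs(C) < thr) = 0;           % crude hard-thresholding of the coefficients
    den = waverec2(C, S, 'haar');  % inverse DWT re-composes the filtered image
    imshow(uint8(den))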
Figure 6.10: Histogram information of a selected node
The Histogram information as shown in Figure 6.10 is useful for FR methods using
histogram matching, but it is not applicable for this project.
Figure 6.11: Applying de-noising
Wavelets are often used for data compression and image noise suppression.
There are many different classifiers that have proved to be very effective in
classifying faces. Advanced correlation filters can offer very good matching
performance in the presence of variability such as facial expression and
illumination changes. The main idea is to synthesise a filter, using a set of training
images, that produces a correlation output which suppresses the correlation values at
locations other than the origin, while the value at the origin is constrained to a
specific peak value. When the filter is correlated with an authentic test image, it
will exhibit sharp correlation peaks in the correlation plane; otherwise it will output
small correlation values.
6.4 Face Matching (ANN Of Wavelets)
Figure 6.12: Operation of the Neural Network
The neural network is composed of simple elements operating in parallel. The
network function is determined largely by the connections between elements. The
neural network is trained to perform the feature-matching function by adjusting the
values of the connections, or weights, between elements.
After the neural network structure is set, the most important thing is to
prepare the training examples. Training and learning functions are mathematical
procedures used to automatically adjust the network's weights and biases. The
training function dictates a global algorithm that affects all the weights and biases of
a given network. The learning function can be applied to individual weights and
biases within a network.
In the beginning of the training, we select a number of face images from each
person that are well-aligned frontal views. Any of them can represent its owner
clearly. All the faces here are extracted or cropped by the face detection code. These
faces will be used as positive examples for their own networks and negative
examples for the other networks. Here we deal only with images that are assumed
always to be faces. The database is not meant to handle non-face images, because in
our situation that is unnecessary and it would make network training very difficult.
After the basic neural networks are created, we run them over new faces from
the individuals in our database. If the image fails to pass the face detection test, it
will be ignored. If the face detection code reports a face in the image, it will be
applied to the face recognition code. We check the recognition result to find more
faces for training.
Once we get these new faces, we add them to the training examples and retrain the
neural networks.
Recognition errors need to be corrected and the total performance will be
improved. While adding some examples from a specific individual will improve the
performance of his own network, it will also influence the performance of other
networks.
The detailed facial recognition programme using wavelet and neural network
methods is listed in Appendix C. In order to make the training of the neural network
easier, one neural net is created for each person. Each neural net identifies whether
the input face is the network's class or not. The recognition algorithm selects the
network with the maximum output. If the output of the selected network passes a
predefined threshold, it will be reported as the class of the input face. Otherwise the
input face will be rejected. This is illustrated in Figure 6.13.
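The selection rule can be sketched as below; the names nets, input_vector and
accept_threshold are hypothetical stand-ins, not taken from the Appendix C code:

    scores = zeros(1, numel(nets));
    for k = 1:numel(nets)
        scores(k) = sim(nets{k}, input_vector);  % one output per person-network
    end
    [best, id] = max(scores);                    % network with maximum output
    if best >= accept_threshold
        fprintf('Recognised as person %d\n', id);
    else
        fprintf('Input face rejected\n');
    end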
Figure 6.13: Image matching using Neural Network
The MATLAB programme for a comprehensive facial recognition algorithm is
customised from source code obtained from Luigi Rosa, using multilevel wavelet
transform and neural network iterations, as described in Appendix C. The adapted
code was run with MATLAB 7.0 R14 SP1 and MATLAB R2006a software.
The user-friendly GUI first allows the custodian to build up the database of
images. An image is selected from file and saved into the database in groups or
classes; each class consists of images of the same person. The face ID class is a
positive integer entered progressively for each authorised person. To achieve higher
reliability and good results, it is necessary to have several images per person.
When a test image is compared with those in the database, the facial
recognition gives as its result the ID of the nearest person present in the database.
Face recognition is carried out by neural network training over 500,000 epochs.
The overall time taken is about one minute on a Pentium 4, 2 GHz PC (but 15
minutes on a Crusoe 780 MHz subnotebook PC).
Other user features available from the GUI are a database summary and deletion of
the database.
Figure 6.14: The Facial Recognition System functionality
CHAPTER 7
CONCLUSION
We have discussed the basic elements of biometrics and how wavelet
transformation correlation filtering may be used to classify images within a biometric
system. We explored the advantage of using wavelet packet decomposition for
image classification. The results show that face images have adequate features that
can be extracted using wavelet decomposition, and that matching for image
verification is achieved using backpropagation neural networks.
Transform coding relies on the premise that pixels in an image exhibit a
certain level of correlation with their neighboring pixels. These correlations can be
exploited to predict the value of a pixel from its respective neighbors.
A
transformation is, therefore, defined to map this spatial (correlated) data into
transformed (uncorrelated) coefficients. Clearly, the transformation should utilize
the fact that the information content of an individual pixel is relatively small i.e., to a
large extent visual contribution of a pixel can be predicted using its neighbors.
The objective of this project is to illustrate the efficacy of multilevel wavelet
transformation on images for face verification. The transform helps separate the
image into parts (or spectral sub-bands) of differing importance (with respect to the
image's visual quality). The Discrete Wavelet Transform is similar to the Discrete
Fourier Transform: it transforms a signal or image from the spatial domain to the
frequency domain. Efficacy of a transformation scheme can be directly gauged by
its ability to pack input data into as few coefficients as possible. This allows the
quantizer to discard coefficients with relatively small amplitudes without introducing
visual distortion in the reconstructed image. DWT is known to exhibit excellent
energy compaction for highly correlated images.
The PCA, or Karhunen-Loeve Transform (KLT), method, which was used as
the fundamental FR method, is a linear transform where the basis functions are taken
from the statistical properties of the image data, and can thus be adaptive. It is
optimal in the sense of energy compaction, i.e. it places as much energy as possible
in as few coefficients as possible. However, the KLT transformation kernel is
generally not separable, and thus the full matrix multiplication must be performed.
The KLT is data dependent and therefore has no fast (FFT-like) pre-computation
algorithm; deriving the respective basis for each image sub-block requires
unreasonable computational resources. Although some fast KLT algorithms exist,
the overall complexity of the KLT is significantly higher than that of the DWT.
Very soon it may seem old-fashioned to think about putting a key into the
door of your car. More cars are appearing with keyless ignition systems, so that all
you need is the fob in your pocket to start the car. Some home lock makers are
starting to move in that direction as well, working on systems for keyless home
locks.
CHAPTER 8
FUTURE WORK
Being an interesting subject with prospects for academic exploration and
potential commercial implementation, this vehicle security facial recognition project
has much room for improvement. Some possible improvements are described here
as suggestions for future work.
8.1 Practical Software/Hardware Implementation
While this project was executed using MATLAB, it is perhaps more practical
to implement the software in C++ so that it could be embedded or programmed into
a working DSP microcontroller. This requires conversion of the MATLAB code, or
totally re-written code.
The hardware may also be in the form of FPGA modules; however, the
limitations of such processors and their associated software must be considered,
especially so when handling large volumes of image data for processing.
8.2 Improving Image Quality
Lighting levels can significantly affect results, because a poorly lit image
may not contain sufficient feature data, as shown by the comparison of edge
detection results. A blurred or faint image means that important facial features are
difficult to differentiate from the background, so much so that the image is
considered noisy.
The majority of face recognition algorithms appear to be sensitive to
variations in illumination, such as those caused by the change in sunlight intensities
throughout the day. In the majority of algorithms evaluated under FERET, changing
the illumination resulted in a significant performance drop. For some algorithms,
this drop was equivalent to that of comparing images taken a year and a half apart.
Illumination is a challenging area in designing an image extraction system
within a covered space, such as our vehicle application. One possible solution is to
add a flash to the camera, automatically triggered if the lighting level is below an
acceptable level (a level which should be determined by experiment). Alternatively,
it may be possible to use a camera which captures images based on the infrared
energy generated by body heat (a totally new area that could be explored).
Changing facial position can also have an effect on performance. A
15-degree difference in position between the query image and the database image
will adversely affect performance; at a difference of 45 degrees, recognition becomes
ineffective. While this is less of a problem for a vehicle user, who would naturally
enter the vehicle and sit in the "right" position, there should be a method to ensure
the image is captured with the correct pose. This could be in the form of a flashing
light to attract attention, or a visual console prompting the user to glance in the
camera direction.
Many face verification applications make it mandatory to acquire images
with the same camera.
8.3 Robustness of Algorithms
Although we can choose from several general strategies for evaluating
biometric systems, each type of biometric has its own unique properties.
This
uniqueness means that each biometric must be addressed individually when
interpreting test results and selecting an appropriate biometric for a particular
application. In the 1990s, automatic-face-recognition technology moved from the
laboratory to the commercial world largely because of the rapid development of the
technology, and now many applications use face recognition. The software must be
tested to ensure high reliability.
As an example of facial verification performance, the FERET [23] tests were
technology evaluations of emerging approaches to face recognition. Research groups
were given a set of facial images to develop and improve their systems. The FERET
evaluation measured performance for both identification and verification, and
provided performance statistics for different image categories. There are still areas
which require further research, though progress has been made in these areas since
March 1997.
8.4 Combination Of Algorithms
This is a research area of continuing endeavour, aiming to achieve algorithms
with a high level of efficiency, reliability and robustness. Various new facial
identification, recognition and verification techniques are being introduced, such as
those using 3D, 2.5D, facial skin colour and contour, etc., and combinations of them.
What this means is that facial recognition is still not perfected.
8.5 User-friendly Console Design
The following figure is an example of a console design, built using the
MATLAB "GUIDE" tool, from which the user can initiate the authorisation process.
In this design, when the user requests verification, his picture appears on the screen
and the result of the verification process is shown. From there he decides whether to
start the vehicle or not to proceed (as in the case of failure of the system to recognise
him). This console design is for an implemented system; the process of inputting
images of known users into the database takes a different route, which can also be
developed to be user-friendly, but requires a high level of security to maintain the
reliability and robustness of the database.
Figure 8.1: An example of the User Console design
APPENDIX A
Sample Of Images Used For The Project
Shown are the RGB versions and their corresponding greyscale images.
APPENDIX B
MATLAB Command Codes For Extracting Edge Detection
Of A Coloured (RGB) Image
function R = edgecolor(nm)
% Canny edge detection on a colour (RGB) image, applied to the luminance
% channel in YCbCr space. 'nm' is the image file name, e.g. 'Alfred.jpg'.
img = imread(nm);
[x, y, z] = size(img);          % z = number of colour channels
if z == 1                       % already greyscale
    rslt = edge(img, 'canny');
elseif z == 3                   % colour image
    img1 = rgb2ycbcr(img);      % separate luminance (Y) from chrominance
    dx1  = edge(img1(:,:,1), 'canny');
    dx1  = dx1 * 255;           % scale the logical edge map to 8-bit range
    img2(:,:,1) = dx1;          % replace Y with the edge map
    img2(:,:,2) = img1(:,:,2);  % keep Cb
    img2(:,:,3) = img1(:,:,3);  % keep Cr
    rslt = ycbcr2rgb(uint8(img2));
end
imshow(rslt)
R = rslt;                       % return the edge image
APPENDIX C
MATLAB COMMAND CODES PROGRAMMING FOR WAVELET AND
NEURAL NETWORK FACIAL VERIFICATION
The main MATLAB programme software used for this project was an adaptation
of source code developed by Luigi Rosa [24], and is not reproduced in full in this
report. Illustrated below are parts of the algorithms that execute the wavelet
decomposition and NN learning.
function [out] = findfeatures(entry)
% Wavelet feature extraction: 3-level 2-D decomposition with 'coif1',
% keeping the level-3 approximation coefficients as the feature vector.
global wavelength
[C, S]     = wavedec2(double(entry), 3, 'coif1');
dimension  = S(1,:);            % size of the level-3 approximation
dimx       = dimension(1);
dimy       = dimension(2);
v          = C(1:dimx*dimy);    % approximation coefficients only
wavelength = length(v);
out{1}     = v(:);

%-------------------------------------------------------------
function [out] = ann_face_matching(features)
% Builds the training set from the stored database and classifies the
% input feature vector with a feed-forward network.
global wavelength
load('face_database.dat', '-mat');   % provides features_data, features_size, max_class
P = zeros(wavelength, features_size);
T = zeros(max_class-1, features_size);
for ii = 1:features_size
    v       = features_data{ii,1};
    v       = v{1};
    v       = v(:);
    P(:,ii) = v;
    pos     = features_data{ii,2};
    for jj = 1:(max_class-1)
        if jj == pos
            T(jj,ii) = 1;        % positive example for its own class
        else
            T(jj,ii) = -1;       % negative example for all other classes
        end
    end
end
input_vector = features{1};
% Normalisation of each feature to the range [-1, 1]
for ii = 1:wavelength
    v   = P(ii,:);
    v   = v(:);
    bii = max([v; 1]);
    aii = min([v; -1]);
    P(ii,:)          = 2*(P(ii,:)-aii)/(bii-aii) - 1;
    input_vector(ii) = 2*(input_vector(ii)-aii)/(bii-aii) - 1;
end
[net]            = createnn(P,T);
[valmax, posmax] = max(sim(net, input_vector));
out              = posmax;

%-------------------------------------------------------------
function [net] = createnn(P,T)
inputs  = P;
targets = T;
[R,Q]   = size(inputs);
[S2,Q]  = size(targets);
S1      = 100;                   % hidden layer size
% Train a two-layer feed-forward network with gradient descent
net = newff(minmax(inputs), [S1 S2], {'tansig' 'tansig'}, 'traingda');
net.LW{2,1}           = net.LW{2,1}*0.01;
net.b{2}              = net.b{2}*0.01;
net.performFcn        = 'mse';
net.trainParam.goal   = 0.000000001;
net.trainParam.show   = Inf;
net.trainParam.epochs = 500000;
net.trainParam.mc     = 0.95;
P = inputs;
T = targets;
[net, tr] = train(net, P, T);
%-------------------------------------------------------------
APPENDIX D
PROPOSED USER CONSOLE MATLAB GUI PROGRAMME
Note: The programme is generated by MATLAB from a GUI created using GUIDE.
function varargout = exit(varargin)
% Begin initialization code
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @exit_OpeningFcn, ...
                   'gui_OutputFcn',  @exit_OutputFcn, ...
                   'gui_LayoutFcn',  [], ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end
if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code

% --- Executes just before exit is made visible.
function exit_OpeningFcn(hObject, eventdata, handles, varargin)
% Choose default command line output for exit
handles.banner = imread('C:\MATLAB7\work\Alfred.jpg');
axes(handles.axes1);
image(handles.banner)
handles.output = hObject;
% Update handles structure
guidata(hObject, handles);

% --- Outputs from this function are returned to the command line.
function varargout = exit_OutputFcn(hObject, eventdata, handles)
% Get default command line output from handles structure
varargout{1} = handles.output;

% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)

% --- Executes on button press in radiobutton1.
function radiobutton1_Callback(hObject, eventdata, handles)

% --- Executes on button press in radiobutton2.
function radiobutton2_Callback(hObject, eventdata, handles)

% --- Executes on button press in radiobutton3.
function radiobutton3_Callback(hObject, eventdata, handles)

% --- Executes on selection change in listbox1.
function listbox1_Callback(hObject, eventdata, handles)
% contents{get(hObject,'Value')} returns selected item from listbox1

% --- Executes during object creation, after setting all properties.
function listbox1_CreateFcn(hObject, eventdata, handles)
if ispc
    set(hObject,'BackgroundColor','white');
else
    set(hObject,'BackgroundColor',get(0,'defaultUicontrolBackgroundColor'));
end

function edit1_Callback(hObject, eventdata, handles)
% hObject   handle to edit1 (see GCBO)
% str2double(get(hObject,'String')) returns contents of edit1 as a double

% --- Executes during object creation, after setting all properties.
function edit1_CreateFcn(hObject, eventdata, handles)
% hObject   handle to edit1 (see GCBO)
if ispc
    set(hObject,'BackgroundColor','white');
else
    set(hObject,'BackgroundColor',get(0,'defaultUicontrolBackgroundColor'));
end
APPENDIX E
SUPPLEMENTARY NOTES
E.1 Multi-level Wavelet Transform Decomposition
Mathematical transformations are applied to signals to obtain further
information that is not readily available in the raw signal. Several types of
transformations can be applied, among which the Fourier transform (FT) is probably
by far the most popular. In practice, signals in their raw format are time-domain
signals. In many cases, the most distinguishing information is hidden in the
frequency content of the signal. The frequency spectrum of a signal is basically the
set of frequency (spectral) components of that signal, and it shows what frequencies
exist in the signal. We can find the frequency content of a signal using the FT: if
the FT of a time-domain signal is taken, the frequency-amplitude representation of
that signal is obtained.
The wavelet transform (WT) constitutes only a small portion of the huge list
of transforms at our disposal. Every transformation technique has its own area of
application, with advantages and disadvantages. The FT and WT are reversible
transforms; that is, they allow us to go back and forth between the raw and processed
(transformed) signals. However, only one of the two representations is available at
any given time. The FT gives the frequency information of the signal,
which means that it tells us how much of each frequency exists in the signal, but it
does not tell us when in time these frequency components exist. This information is
not required when the signal is so-called stationary. The FT can be used for
non-stationary signals if we are only interested in what spectral components exist in
the signal, but not in where these occur. However, if this information is needed, i.e.
if we want to know what spectral components occur at what time (interval), then the
Fourier transform is not the right transform to use. The WT is a transform of this
type: it provides the time-frequency representation, being capable of giving the time
and frequency information simultaneously. The WT was developed as an alternative
to the short-time Fourier transform (STFT) to overcome the resolution problem. In
the STFT, the signal is divided into segments small enough that each segment
(portion) of the signal can be assumed to be stationary. For this purpose, a window
function "w" is chosen. The width of this window must be equal to the segment of
the signal where its stationarity is valid. The definition of the STFT is given by
$$\mathrm{STFT}_x^{\omega}(t', f) = \int_t \left[\, x(t)\,\omega^{*}(t - t')\,\right] e^{-j 2\pi f t}\, dt$$
The problem with the STFT lies in the width of the window function that is used,
known as the support of the window. If the window function is narrow, it is known
as compactly supported. What gives the perfect frequency resolution in the FT is
the fact that the window used in the FT is its kernel, the complex exponential
function, which lasts at all times from minus infinity to plus infinity; the kernel
itself is a window of infinite length. In the STFT the window is of finite length, so
it covers only a portion of the signal, which causes the frequency resolution to
become poorer: we no longer have perfect frequency resolution.
There are two main differences between the STFT and the continuous wavelet
transform (CWT):
1. The Fourier transforms of the windowed signals are not taken; therefore a
single peak will be seen corresponding to a sinusoid, i.e. negative frequencies
are not computed.
2. The width of the window is changed as the transform is computed for every
single spectral component, which is probably the most significant
characteristic of the wavelet transform.
The CWT is defined as

$$\mathrm{CWT}_x^{\psi}(\tau, s) = \Psi_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\,\psi^{*}\!\left(\frac{t-\tau}{s}\right) dt$$

where

$$\psi_{\tau,s}(t) = \frac{1}{\sqrt{|s|}}\,\psi\!\left(\frac{t-\tau}{s}\right)$$
This definition of the CWT shows that the wavelet analysis is a measure of similarity
between the basis functions (wavelets) and the signal itself, in the sense of similar
frequency content.
The transformed signal is a function of two variables, τ and s, the translation
and scale parameters, respectively. ψ(t) is the transforming function, and it is called
the mother wavelet. The term mother wavelet gets its name from two important
properties of the wavelet analysis.
The term wavelet means a small wave. The smallness refers to the condition
that this (window) function is of finite length (compactly supported). The term
mother implies that the functions with different regions of support that are used in
the transformation process are derived from one main function, the mother wavelet.
In other words, the mother wavelet is a prototype for generating the other window
functions. The term translation is used in the same sense as in the STFT; it is
related to the location of the window as the window is shifted through the signal,
and corresponds to time information in the transform domain.
In the decomposition operation we split the signal into two parts by passing
it through a highpass and a lowpass filter (the filters should satisfy certain
conditions, the so-called admissibility condition), which results in two different
versions of the same signal. We continue in this way until we have decomposed the
signal to a pre-defined level. We then have a collection of signals which actually
represent the same signal, but each corresponding to a different frequency band.
The wavelet series is simply a sampled version of the CWT, and the
information it provides is highly redundant as far as the reconstruction of the signal
is concerned; this redundancy requires a significant amount of computation time and
resources. The discrete wavelet transform (DWT) provides sufficient information
both for analysis and synthesis of the original signal, with a significant reduction in
the computation time.
A time-scale representation of a digital signal is obtained using digital
filtering techniques. Filters of different cutoff frequencies are used to analyze the
signal at different scales. The signal is passed through a series of high pass filters to
analyze the high frequencies, and it is passed through a series of low pass filters to
analyze the low frequencies. The resolution of the signal, which is a measure of the
amount of detail information in the signal, is changed by the filtering operations, and
the scale is changed by upsampling and downsampling (subsampling) operations.
Subsampling a signal corresponds to reducing the sampling rate, or removing some of
the samples of the signal.
Filtering a signal corresponds to the mathematical
operation of convolution of the signal with the impulse response of the filter. The
convolution operation in discrete time is defined as,
$$x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$$
A half band lowpass filter removes all frequencies that are above half of the
highest frequency in the signal. After passing the signal through a half band lowpass
filter, half of the samples can be eliminated according to Nyquist's rule, since the
signal now has a highest frequency of π/2 radians instead of π radians. Simply
discarding every other sample will subsample the signal by two, and the signal will
then have half the number of points. The scale of the signal is now doubled.
Resolution is related to the amount of information in the signal, and therefore, it is
affected by the filtering operations.
This constitutes one level of decomposition and can mathematically be
expressed as follows:

$$y_{\mathrm{high}}[k] = \sum_{n} x[n]\, g[2k-n], \qquad y_{\mathrm{low}}[k] = \sum_{n} x[n]\, h[2k-n]$$

Figure E.1: The Subband Coding Algorithm

where $y_{\mathrm{high}}[k]$ and $y_{\mathrm{low}}[k]$ are the outputs of the highpass and lowpass filters,
respectively, after subsampling by 2. This decomposition halves the time resolution
since only half the number of samples now characterizes the entire signal. The
bandwidth of the signal at every level is marked on the figure as "f".
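One decomposition level can be sketched in MATLAB directly from these
equations; the Haar filter pair below is an assumed example of an admissible pair,
and the signal is a toy input:

    x = sin(2*pi*(0:63)/16);    % toy one-dimensional signal
    h = [1 1]/sqrt(2);          % half-band lowpass filter
    g = [1 -1]/sqrt(2);         % half-band highpass filter
    lo = conv(x, h);  ylow  = lo(2:2:end);   % filter, then subsample by 2
    hi = conv(x, g);  yhigh = hi(2:2:end);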
For a given image, we compute the DWT of, say, each row, and discard all
values in the DWT that are less than a certain threshold. We then save only those
DWT coefficients that are above the threshold for each row, and when we need to
reconstruct the original image, we simply pad each row with as many zeros as the
number of discarded coefficients, and use the inverse DWT to reconstruct each row
of the original image. We can also analyze the image at different frequency bands,
and reconstruct the original image by using only the coefficients that are of a
particular band.
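A sketch of this row-by-row thresholding in MATLAB; the file name,
decomposition level and threshold are all illustrative assumptions:

    img = double(rgb2gray(imread('face.jpg')));  % hypothetical file name
    thr = 5;                                     % illustrative threshold
    out = zeros(size(img));
    for r = 1:size(img,1)
        [c, l] = wavedec(img(r,:), 3, 'haar');   % 1-D DWT of one row
        c(abs(c) < thr) = 0;                     % discard small coefficients
        out(r,:) = waverec(c, l, 'haar');        % inverse DWT rebuilds the row
    end
    imshow(uint8(out))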
E.2 Wavelet Toolbox [25] Algorithms
Given a signal s of length N, the DWT consists of log2N stages at most.
Starting from s, the first step produces two sets of coefficients: approximation
coefficients cA1, and detail coefficients cD1. These vectors are obtained by
convolving s with the low-pass filter Lo_D for approximation, and with the high-pass
filter Hi_D for detail, followed by dyadic decimation.
More precisely, the first step convolves s with each filter and downsamples the
result by two. The length of each filter is equal to 2n. If N = length(s), the signals
F and G are of length N + 2n - 1, and the coefficients cA1 and cD1 are then of
length floor((N - 1)/2) + n.
The next step splits the approximation coefficients cA1 in two parts using the
same scheme, replacing s by cA1 and producing cA2 and cD2, and so on.
Two-Dimensional DWT: Decomposition Step

Initialisation step: cA0 = s for the decomposition initialisation.

Two-Dimensional Inverse DWT (IDWT): Reconstruction Step

For J = 2, the two-dimensional wavelet tree takes the form illustrated in the
Wavelet Toolbox documentation [25].
For biorthogonal wavelets, the same algorithms hold but the decomposition
filters on one hand and the reconstruction filters on the other hand are obtained from
two distinct scaling functions associated with two multiresolution analyses in duality.
In this case, the filters for decomposition and reconstruction are, in general, of
different odd lengths. This situation occurs, for example, for "splines" biorthogonal
wavelets used in the toolbox. By zero-padding, the four filters can be extended in
such a way that they will have the same even length.
REFERENCES
1. How to use your Subaru keyless entry, security and immobilizer key systems, http://www.cars101.com/subaru/keyless.HTML
2. http://news.bbc.co.uk/go/pr/fr/-/2/hi/asia-pacific/4396831.stm
3. http://www.traka.com/products/iFob.asp?gfx=1
4. http://www.automobile-security.com/products.htm
5. http://www.face-rec.org/algorithms/Comparisons/
6. M.A. Turk, A.P. Pentland, Face Recognition Using Eigenfaces, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3-6 June 1991, Maui, Hawaii, USA, pp. 586-591.
7. M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face Recognition by Independent Component Analysis, IEEE Trans. on Neural Networks, Vol. 13, No. 6, November 2002, pp. 1450-1464.
8. J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, Face Recognition Using LDA-Based Algorithms, IEEE Trans. on Neural Networks, Vol. 14, No. 1, January 2003, pp. 195-200.
9. L. Wiskott, J.-M. Fellous, N. Krueuger, C. von der Malsburg, Face Recognition by Elastic Bunch Graph Matching, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, 1997, pp. 776-779.
10. C. Liu, H. Wechsler, Evolutionary Pursuit and Its Application to Face Recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, No. 6, June 2000, pp. 570-582.
11. M.-H. Yang, T. Diederich, S. Becker, Z. Ghahramani, Eds., Face Recognition Using Kernel Methods, Advances in Neural Information Processing Systems, Vol. 14, 2002.
12. S. Srisuk, M. Petrou, W. Kurutach, A. Kadyrov, Face Authentication using the Trace Transform, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'03), 16-22 June 2003, Madison, Wisconsin, USA, pp. 305-312.
13. T.F. Cootes, K. Walker, C.J. Taylor, View-Based Active Appearance Models, Proc. of the IEEE International Conference on Automatic Face and Gesture Recognition, 26-30 March 2000, Grenoble, France, pp. 227-232.
14. V. Blanz, T. Vetter, Face Recognition Based on Fitting a 3D Morphable Model, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 9, September 2003, pp. 1063-1074.
15. A. Bronstein, M. Bronstein, R. Kimmel, A. Spira, 3D Face Recognition Without Facial Surface Reconstruction, Proceedings of ECCV 2004, Prague, Czech Republic, 11-14 May 2004.
16. B. Moghaddam, T. Jebara, A. Pentland, Bayesian Face Recognition, Pattern Recognition, Vol. 33, Issue 11, November 2000, pp. 1771-1782.
17. G. Guo, S.Z. Li, K. Chan, Face Recognition by Support Vector Machines, Proc. of the IEEE International Conference on Automatic Face and Gesture Recognition, 26-30 March 2000, Grenoble, France, pp. 196-201.
18. A.V. Nefian, M.H. Hayes III, Hidden Markov Models for Face Recognition, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP'98, 12-15 May 1998, Seattle, Washington, USA, pp. 2721-2724.
19. M. Turk, A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, 3 (1), 1991. URL http://www.cs.ucsb.edu/~mturk/Papers/jcn.pdf
20. http://www.imagemagick.org/
21. A. Pietrowcew, Face Detection in Colour Images Using Fuzzy Hough Transform, Opto-Electronics Review, 11(3), pp. 247-251, 2003.
22. R. Gonzalez, R. Woods, Digital Image Processing, Addison-Wesley Publishing Company, 1992, Chap. 4.
23. P.J. Phillips, The FERET Evaluation Methodology for Face-Recognition Algorithms, NISTIR 6264, Nat'l Institute of Standards and Technology, 1998, http://www.itl.nist.gov/iaui/894.03/pubs.html#face
24. http://people.na.infn.it/~rosa/
25. MATLAB Wavelet Toolbox, http://www.mathworks.com/