SMARTEYE - VEHICLE SECURITY SYSTEM USING FACIAL RECOGNITION ALFRED RITIKOS A project report submitted in partial fulfilment of the requirements for the award of the degree of Master of Engineering Faculty of Electrical Engineering Universiti Teknologi Malaysia MAY 2007 iii DEDICATION To my beloved wife, Phoay Eng, and sons, Ephraim and Keane. iv ACKNOWLEDGEMENT I would like to acknowledge my gratitude and appreciation to my project supervisor, Professor Dr. Ruzairi bin Hj. Abdul Rahim, for his guidance, advice and friendship throughout the period of carrying out this project as well as throughout this course, and also to Prof. Madya Dr. Syed Abdul Rahman bin Syed Abu Bakar for his assistance in providing specialist advice in the area of image processing. Furthermore, some of my workplace colleagues have been kind to provide their face images to be captured for the needed database. Finally, my gratitude is extended to my fellow students who have provided much encouragement and support. v ABSTRACT Facial recognition has gained increasing interest in the recent decade. Over the years there have been several techniques being developed to achieve high success rate of accuracy in the identification and verification of individuals for authentication in security systems. This project experiments the concept of combining of multilevel wavelet decomposition transformation and neural network for facial recognition in a specific application with its own limitations, in that of vehicle security access control system. The approach of this project is to conceptualise by simulation of the various processes involved in developing an implementable system. Keywords: Facial Recognition, Facial Verification, Image Extraction, Image Processing, Principal Component Transformation, Neural Network Analysis, Edge Detection, Wavelet vi ABSTRAK Dalam masa singkat kebelakangan ini pengenalan muka (facial recognition) telah banyak menerima tumpuan. Beberapa teknik atau cara telah dikaji dan dibangunkan untuk mencapai tahap ketepatan dengan kadar kejayaan yang tinggi dalam usaha mengenalpasti seseorang individu untuk diberi kebenaran laluan dalam sistem-sistem keselamatan. Projek ini telah menyelidiki penggabungan konsep multilevel wavelet decomposition transformation dan neural network untuk Facial Recognition dalam penggunaan yang tertentu yang mempunyai had-hadnya tersendiri, iaitu system kawalan keselamatan kenderaan. Projek ini tertumpu kepada membuktikan konsep tersebut dengan cara simulasi berbagai proses aturcara yang terlibat dalam sesuatu system yang boleh direka. vii TABLE OF CONTENTS CHAPTER 1 TITLE PAGE DECLARATION ii DEDICATION iii ACKNOWLEDGEMENTS iv ABSTRACT v ABSTRAK vi TABLE OF CONTENTS vii LIST OF TABLES viii LIST OF FIGURES ix LIST OF ABBREVIATIONS x LIST OF SYMBOLS xi INTRODUCTION 1 1.1 Biometrics for Identification and Verification 1 1.2 Verification vs. 
Identification 2 1.3 Incentives for Facial Recognition Application in 3 Vehicle Security 2 3 PROJECT SCOPE 7 2.1 Project Background 7 2.2 Overall Objectives 9 2.3 Scope of Work and Methodology 9 FACIAL RECOGINITION 11 3.1 An Overview of Facial Recognition Biometric 11 3.2 Applications of Facial Recognition 12 3.3 Generic Facial Recognition Algorithm 13 viii 4 5 3.4 Algorithms Comparisons 15 3.5 Basis of Facial Recognition Process – The PCA 19 3.5.1 Minimum Distance Classifier 20 3.5.2 Matching by Correlation 22 3.6 Neural Networks 24 IMAGE EXTRACTION 26 4.1 Setting The Scene 26 4.2 Digital Image Structure 27 4.3 Image Acquisition 29 4.4 Importance Of Facial Positioning 31 IMAGE TRANSFORMATION & PROCESSING 34 5.1 Grayscale Transformation 34 5.2 Image Thresholding 35 5.3 Gaussian Filtering 35 5.4 Image Features Extraction - Canny Edge Detector 36 5.5 Quality of images - Brightness and Contrast 39 Adjustments 6 SYSTEM DESIGN 41 6.1 System Architecture 41 6.1.1 Hardware Architecture 41 6.1.2 Software Architecture 42 6.2 Wavelet Packet Analysis For Face Recognition 44 6.3 Discrete Cosine Transform 46 6.4 Face Matching (ANN Of Wavelets) 51 7 CONCLUSION 55 8 FUTURE WORK 57 8.1 Practical software/hardware implementation 57 8.2 Improving Image Quality 58 8.3 Robustness of algorithms 59 8.4 Combination Of Algorithms 59 Appendices A-E 61 REFERENCES 76 ix LIST OF TABLES TABLE NO. 3.1 TITLE Summary List Of Image-Based FR Algorithms PAGE 16 x LIST OF FIGURES FIGURE NO. TITLE PAGE 1.1 An intelligent car fob for a fleet management system 5 3.1 A typical Facial Recognition process 14 3.2 Generic algorithm for software programming 15 3.3 Basic form of Neural Network architecture 25 4.1 Framework of Image/Video processing 27 4.2 Binary values of pixels 27 4.3 Camera viewing axis must be perpendicular with the 32 image 4.4 Facial image at a non-perpendicular angle results in 32 error 5.1 Simulation of various edge detection methods 37 5.2 Result of simulation of various edge detection 38 methods 5.3 Canny edge detection of an RGB image 39 5.4 Poorly lit environment produces unreliable image data 40 6.1 Proposed system equipment layout 42 6.2 Facial Recognition System architecture 43 6.3 The Haar mother wavelet function 46 6.4 Wavelet decomposition tree 47 6.5 Haar wavelet decomposition to 2 levels 47 6.6 Haar wavelet decomposition to 4 levels 48 6.7 Details of wavelet levels 48 6.8 Details of a wavelet node being compressed with 49 threshold xi 6.9 The inverse process (decomposition at 3 levels) 49 6.10 Histogram information of a selected node 50 6.11 Applying de-noising 50 6.12 Operation of the Neural Network 51 6.13 Image matching using Neural Network 53 6.14 The Facial Recognition System functionality 54 8.1 An example of the User Console design 60 xii LIST OF ABBREVIATIONS 2D - Two-dimension 3D - Three-dimension AAM - Active Appearance Model ANN (NN) - Artificial Neural Network (Neural Network) CWT - Continuous Wavelet Transform DWT - Discreet wavelet Transform EBGM - Elastic Bunch Graph Matching FERET - Face Recognition Technology FFT - Fast Fourier Transform FPGA - Field-Programmable Gate Array FR - Facial (or Face) Recognition HMM - Hidden Markov Model ICA - Independent Component Analysis ID Card - Identity Card KLT - Karhunen-Loeve Transform LDA - Linear Discriminant Analysis PCA - Principal Components Analysis PIN - Personal Identification Number ROI - Range of Interest xiii LIST OF SYMBOLS c(x,y) - correlation Dj - Euclidean distance γ(x,y) - correlation coefficient mj mean vector of patterns Nj 
number of pattern vectors ωj pattern class xj unknown pattern vector xiv LIST OF APPENDICES APPENDIX TITLE PAGE A Sample Of Images Used For The Project 61 B MATLAB Command Codes For Extracting Edge 62 Detection Of A Coloured (RGB) Image C MATLAB Command Codes For Wavelet And 63 Neural Network Facial Verification D Proposed User Console MATLAB GUI Programme 65 E Supplementary Notes 68 CHAPTER 1 INTRODUCTION 1.1 Biometrics for Identification and Verification Biometrics is an emerging set of pattern-recognition technologies which accurately and automatically identifies or verifies individuals based upon each person’s unique physical or behavioural characteristics. Identification using biometrics has advantages over traditional methods involving ID Cards (tokens) or PIN numbers (passwords) in that the person to be identified is required to be physically present where identification is required and there is no need for remembering a password or carrying a token. PINs or passwords may be forgotten, and tokens like passports and driver's licenses may be forged, stolen, or lost. Biometrics methods work by unobtrusively matching patterns of live individuals in real-time against enrolled records. Biometric templates cannot be reverse-engineered to recreate personal information and they cannot be stolen and used to access personal information. Because of these inherent attributes, biometrics is an effective means to secure privacy and deter identity theft. Various biometric traits are being used for real-time recognition, the most popular being face, iris and fingerprint. Other biometric systems which have found 2 their usefulness are based on retinal scan, voice, signature and hand geometry. By using them together with existing tokens, passwords and keys, biometric systems are being deployed to enhance security and reduce fraud. In designing a practical biometric system, a user must first be enrolled in the system so that his biometric template can be captured. This template is securely stored in a central database or a smart card issued to him. The template is retrieved when an individual needs to be identified. Depending on the context, a biometric system can operate either in verification (authentication) or identification mode. 1.2 Verification vs. Identification There are two different ways to recognize a person: verification and identification. Verification (answers the question “Am I who I claim I am?”) involves confirming or denying a person's claimed identity. In identification, the system has to recognize a person (addressing the question “Who am I?”) from a list of N users in the template database. Identification is a more challenging problem because it involves 1:N matching compared to 1:1 matching for verification. 1.3 Incentives for Facial Recognition Application in Vehicle Security Research on automatic face recognition in images has rapidly developed into several inter-related lines, and this research has both lead to and been driven by a disparate and expanding set of commercial applications. The large number of research activities is evident in the growing number of scientific communications published on subjects related to face processing and recognition. 3 Anti-theft devices are not foolproof, but they can a deterrent or to slow down the process. The longer it takes to steal a car, the more attention the thief attracts, and the more likely the thief will look elsewhere. 
Anti-theft devices include those listed below:

• Fuel Shut Off: This blocks gasoline flow until a hidden switch is tripped. The vehicle can only be driven a short distance, until the fuel already in the carburetor is used up.
• Kill Switch: The vehicle will not start unless a hidden switch is activated. The switch prevents electrical current from reaching the coil or carburetor. Check your vehicle warranty before installing a "kill switch."
• Time Delay Switch: The driver must turn the ignition key from "on" to "start" after a precise, preset interval or the engine won't turn over.
• Armored Ignition Cutoff: A second tamper-proof lock must be operated in order to start the car. "Hot wiring" (starting a car without a key) is very difficult with this device, so it is especially effective against amateurs.
• Hood Locks: These make it difficult to get to the battery, engine, or vehicle security system.
• Time Delay Fuse: Unless a concealed switch is turned off, starting the vehicle causes a sensitive fuse to burn out, cutting out power and stopping the motor.
• Armoured Collar: A metal shield that locks around the steering column and covers the ignition, the starter rods and the steering wheel interlock rod.
• Crook Lock: A long metal bar with a hook on each end to lock the steering wheel to the brake pedal.
• Audible Alarm Systems: These alarm systems are positioned in the engine to set off a buzzer, bell or siren if an attempt is made to tamper with the hood, bypass the ignition system, or move the vehicle without starting the engine.

To illustrate the "evolution" of the typical vehicle security system over recent years, here is an example of the development of such products from a particular brand1 of cars:

1995: passive security system (no remote); the system is armed by locking the doors with or without the key; windows could be open and the system would still arm
1996: remote with coded alarm; unlocks all doors with one push
1997: remote with coded alarm changed to unlock only the driver's door with one push
1999: keyless remote
2003: remote buttons coloured; a 'chirp' replaces the audible honk
2005: remote fobs and immobilizer keys with remote entry as before
2006: remotes with recessed buttons which are harder to press accidentally
2007: remote-start system

Keyless entry is becoming a standard feature in vehicles that have installed alarm systems. A small battery-operated device (fob or "remote") hangs on the key chain and features one or more buttons for arming and disarming the alarm. The button operates the door locks as well. When one approaches the car, a press of the button will not only disarm the alarm but unlock the driver's door, making it unnecessary to use a key. Hence, it allows keyless entry.

In a biometric vehicle security system, the objective is to authenticate that a user is an authorised person before granting access to the ignition system. This could be a first step before ignition commences, or it could be an integrated system that starts the vehicle automatically once authorisation is cleared.

As a progression from the now common keyless fob used to open a vehicle, there is a recent successful commercial implementation of biometrics for authorisation, in the form of fingerprint recognition. This, however, has its own weaknesses, such as the one depicted in a report by BBC News2 on 31 March 2005 of a local robbery incident in which the end of the owner's index finger was sliced off with a machete.
Potential applications of biometrics in vehicle security are for private vehicles and especially for commercial vehicle fleets, such as rented cars, taxis, transportation lorries and public buses. One of the most effective ways to optimise use of vehicles is to allow drivers to use vehicles from a motor pool. A "fleet management system" is an optimization tool aimed at making it very easy to manage vehicles in a motor pool. There is little need to look through paper records to see if someone is eligible to drive, or to check if he has received the proper training for that vehicle, or if someone's driver's license has expired since she last used a vehicle. Electronic key manufacturers for fleet management companies make intelligent fobs which automatically record the transaction activity by date and time, both on the key cabinet and on the support software. This electronic key security makes users accountable for the keys, reducing management risk and improving efficiency. One such product for commercial fleet vehicles is available from Traka, Inc.3

Figure 1.1: An intelligent car fob for a fleet management system

Their iFob is inserted into receptor sockets adjacent to the door or equipment, which check the permissions on the iFob. If acceptable, the Immobilisor will release a door magnetic lock or solenoid and the door will open. The iFob records each access event and its time, which it accumulates until it is returned to the Traka cabinet at the end of the shift, when the events are downloaded. If a user attempts to use the iFob outside its period of validity, the iFob will no longer activate the Immobilisor. The iFob contains a chip with a guaranteed unique serial number, giving each one an individual ID. The special shape of the iFob allows it to lock automatically into the Traka cabinet, and its smooth surface is inherently self-cleaning, eliminating problems associated with dust or other contamination. Where keys need to be managed, they are attached using special self-locking security seals, so that they cannot be easily detached.

Being physically detached from the user, however, such a sophisticated device and system are still subject to loss and misuse. Although each fob is assigned a serial number and assigned to an individual person, there is no guarantee that another person will not use it for access to the vehicle.

Because of its many advantages, biometrics is fast being used for physical access control, computer log-in, welfare disbursement, international border crossing (e-Passports) and national ID cards, and verification of customers during transactions conducted via telephone and Internet (e-Commerce and e-Banking). In automobiles, biometrics is being adopted to replace keys for keyless entry and keyless ignition. Here are some commercially available products for such vehicle access and starting applications:

Product name4: Biometrics method
Identisafe-09: Fingerprint
Retinasafe-18: Eyeball recognition
Brainsafe-72: Brain fingerprinting
Voicesafe-36: Voice
Think-Start-99: Brain waves

There is much interest in using FR for security systems due to its advantages over the methods listed above. These will be explained in the next chapter.
Among some advantages of Facial Recognition method for vehicle security application are:(i) more convenient, no active part of user; sensed as soon as one is seated in position (and facing the camera) 7 (ii) low risk scenario (failure means loss of one vehicle, compared to loss to company properties & confidential materials, national security and safety) (iii) a “better” alternative to existing methods. (What is the chance of a thief cutting the owner’s/ authorised persons’ face or head (!) to steal the vehicle; compare to his finger – as has happened to a driver?) Some practical questions that need to be answered include:(i) Is biometric really practical for this application? Even with fingerprint method, do we not need a key to lock and open our vehicle doors? (ii) Is there a method which is fully foolproof? Hacking/bypassing the system is undeniable. 8 CHAPTER 2 PROJECT SCOPE 2.1 Project Background This security access project is aimed at demonstrating advanced facial recognition techniques that could antiquate, substitute, or otherwise, supplement, conventional key/key fob vehicle ignition systems, and can be used as an alternative to, or complement, existing fingerprint biometrics method. A computerised system equipped with a digital camera can identify the face of a person and determine if the person is authorized to start the vehicle. A practical implementation of such a system would consists of a microcontroller with embedded high-level language software, with a small-lensed camera as data input and a LCD screen as output mediums. This integrated system would be able to authorise a user before switching on the vehicle with a key; or authorise a user and automatically switching on the vehicle without a key. Whilst facial recognition systems are by now readily available in the market, the vast majority of them are installed at large open spaces, such as in airport halls. This project is for an application where the population (of users) is very small and the area of coverage is confined within the vehicle driver enclosure. Both ends of the 9 system design, i.e. the image capturing by camera, and actuation of vehicle ignition system, are considered simpler to implement, therefore these will not be dealt with in detail beyond a brief introduction. The focus of this project is, thus, the development of an algorithm for this very specific application. 2.2 Overall Objectives The objectives of this project are:(i) to select combination of existing facial recognition techniques and/or mathematical models; considering practicality and potential implementation costs; (ii) to develop a control algorithm based on selected informationprocessing model for this specific application; and (iii) to simulate the facial recognition software programme for Vehicle Security access clearance using software tools and incorporated into the system hardware. 2.3 Scope of Work and Methodology As the project developed, it was decided that hardware implementation of the experiment such as using FPGA would encounter many technical constraints, such as the demanding time to learn new programmes and application to real-time conditions, as well as physical limitations of using FPGA, such as processing speed and need for very large memory size for data, for image processing. 
It was also decided that many algorithms have been developed by researchers in the area of image processing and recognition, and developing new methods requires extensive 10 knowledge in various aspects of programming and theory behind their development. Because of the complexity of developing new or improved algorithm software, it is observed that to achieve a comprehensive design requires the effort of several people working together; students and experts in multi-discipline fields. Because of these constraints, it was decided that in the best interest of achieving the requirements of this course that the coverage is simplified but to a workable product, and it is certainly better to utilise the tools available in the market, i.e. by using MATLAB programme, and exploring the combination of several techniques. The objective of this project then is for “proof of concept” of FR algorithms for a very specific application. This project uses MATLAB Version 7.0.0.19920 (Release 14), Simulink 6.0 and MATLAB Release R2006a (which contains additional and advanced blocksets not found in the previous MATLAB suite, especially the Video & Image Processing Blocksets). 11 CHAPTER 3 FACE RECOGNITION Since the beginning of time, humans have relied on facial recognition (FR) as a way to establish and verify another person’s identity. FR technology isn’t any different. Using software, a computer is able to locate human faces in images and then match overall facial patterns to records stored in a database. Because a person’s face can be captured by a camera from some distance away, FR has a clandestine or covert capability (i.e. the subject does not necessarily know he has been observed). For this reason, FR has been used in projects to identify card counters or other undesirables in casinos, shoplifters in stores, criminals and terrorists in urban areas. 3.1 An Overview of Facial Recognition Biometric FR is the study of algorithms that automatically process images of the face. FR records the spatial geometry of distinguishing features of the face. Different vendors use different methods of FR, however, all focus on measures of key features 12 of the face, including its texture. Practical problems include recognizing faces from still and moving images, analyzing and synthesizing faces, recognizing facial gestures and emotions, modelling human performance, and encoding faces. The face is a curved three-dimensional surface, whose image varies with changes in illumination, pose, hairstyle, facial hair, makeup and age. All faces have basically the same shape, yet the face of each person is different. The goal of FR then is to find a representation that can distinguish among faces of different people, yet at the same time be invariant to changes in the image of each person. The initial emphasis was on recognizing faces in still images where the sources of variation were highly controlled. This progressed to detecting faces, processing video sequences, and recognizing faces under less controlled settings. 3.2 Applications of Facial Recognition FR’s potentially many applications led to the development of FR algorithms. Among the applications are law enforcement and security, human/ computer interfaces, image compressions and coding of facial images and related areas of facial gesture recognition; and analysis and synthesis of faces. There are three basic scenarios that FR systems might address, (i) identifying an unknown person, (ii) verifying a claimed identity of a person, and (iii) analysing a face in an image. 
The primary interest in law enforcement and security is in identification and verification. A major identification task is searching a database of known individuals for the identity of an unknown person. A similar application is maintaining the integrity of an identity database, which could be compromised by (i) a person having two identities, or (ii) two people having the same identity. Both types of errors can result in degradation of recognition performance or in false accusations being made. The main application for security is verification. The input to a verification system is a facial image and a claimed identity for that image; the output is either acceptance or rejection of the claim. Potential applications include controlling access to buildings or equipment, and confirming and verifying identities.

3.3 Generic Facial Recognition Algorithm

The four basic phases of FR are:
(i) Image data pre-processing - reduces unwanted image variation by aligning the face imagery, equalizing the pixel values, and normalizing the contrast and brightness.
(ii) Algorithm training - training creates subspaces into which test images are subsequently projected and matched (examples include PCA, PCA+LDA, EBGM and BIC algorithms).
(iii) Algorithm testing - testing creates a distance matrix for the union of all images to be used either as probe images or gallery images in the analysis phase.
(iv) Analysis of results - performs analyses on the distance matrices; this includes computing recognition rates, conducting virtual experiments, or performing other statistical analysis on the data.

Figure 3.1: A typical Facial Recognition process

The process begins by reducing the variability of the human face to a set of numbers. Using a mathematical technique called Principal Components Analysis (PCA), a large group of faces is examined to extract the most efficient building blocks required to describe them. Any human face can then be represented as a weighted sum of these building blocks, known as Eigenfaces. With PCA, the essence of a human face can be reduced to just 256 bytes of information.

The recognition process involves comparing the Eigenface weights for two faces using an algorithm that generates a match score. Different faces will produce a poor match score; images of the same face will produce a good match score. In a one-to-one comparison (our vehicle access example), the Eigenface weights of authorized personnel are recorded in a central database. When someone steps before a camera, his/her face is quickly compared to all faces in the database to see if it generates a match. In a one-to-many search, a database is created containing faces of individuals whose presence would warrant action (e.g. most-wanted criminals, missing persons, etc.). Cameras, overtly or covertly deployed at strategic locations, capture, in real time, each face in the field of view and compare it with all records in the database.

Figure 3.2: Generic algorithm for software programming

The entire process should take less than a tenth of a second, with a high degree of accuracy.

3.4 Algorithms Comparisons

Table 3.1 is a list of established image-based FR algorithms, and some sample papers on each method are referenced. These will help us in designing and conducting FR experiments which best suit our application.

Table 3.1: Summary List Of Image-Based FR Algorithms5
Principal Component Analysis6: Derived from the Karhunen-Loeve transformation. Given an s-dimensional vector representation of each face in a training set of images, PCA tends to find a t-dimensional subspace whose basis vectors correspond to the maximum variance directions in the original image space. This new subspace is normally lower dimensional (t << s). If the image elements are considered as random variables, the PCA basis vectors are defined as eigenvectors of the scatter matrix.

Independent Component Analysis7: ICA minimizes both second-order and higher-order dependencies in the input data and attempts to find the basis along which the data (when projected onto them) are statistically independent. M.S. Bartlett provided two architectures of ICA for the face recognition task: Architecture I, statistically independent basis images, and Architecture II, a factorial code representation.

Linear Discriminant Analysis8: LDA finds the vectors in the underlying space that best discriminate among classes. For all samples of all classes, the between-class scatter matrix SB and the within-class scatter matrix SW are defined. The goal is to maximize SB while minimizing SW; in other words, maximize the ratio det(SB)/det(SW). This ratio is maximized when the column vectors of the projection matrix are the eigenvectors of (SW^-1 × SB).

Elastic Bunch Graph Matching9: All human faces share a similar topological structure. Faces are represented as graphs, with nodes positioned at fiducial points (eyes, nose, ...) and edges labeled with 2-D distance vectors. Each node contains a set of 40 complex Gabor wavelet coefficients at different scales and orientations (phase, amplitude). These are called "jets". Recognition is based on labelled graphs: a labelled graph is a set of nodes connected by edges, where nodes are labelled with jets and edges are labelled with distances.

Evolutionary Pursuit10: An eigenspace-based adaptive approach that searches for the best set of projection axes in order to maximize a fitness function, measuring at the same time the classification accuracy and generalization ability of the system. Because the dimension of the solution space of this problem is too large, it is solved using a specific kind of genetic algorithm called EP.

Kernel Methods11: The face manifold in subspace need not be linear. Kernel methods are a generalization of linear methods. Direct non-linear manifold schemes are explored to learn this non-linear manifold.

Trace Transform12: The Trace transform, a generalization of the Radon transform, is a new tool for image processing which can be used for recognizing objects under transformations, e.g. rotation, translation and scaling. To produce the Trace transform one computes a functional along tracing lines of an image. Different Trace transforms can be produced from an image using different trace functionals.

Active Appearance Model13: An AAM is an integrated statistical model which combines a model of shape variation with a model of the appearance variations in a shape-normalized frame. An AAM contains a statistical model of the shape and gray-level appearance of the object of interest which can generalize to almost any valid example. Matching to an image involves finding model parameters which minimize the difference between the image and a synthesized model example projected into the image.

3D Morphable Model14: The human face is intrinsically a surface lying in 3D space. Therefore a 3D model should be better for representing faces, especially for handling facial variations such as pose, illumination, etc. Volker Blanz proposed a method based on a 3D morphable face model that encodes shape and texture in terms of model parameters, and an algorithm that recovers these parameters from a single image of a face.
3D Face Recognition15: The main novelty of this approach is the ability to compare surfaces independent of natural deformations resulting from facial expressions. First, the range image and the texture of the face are acquired. Next, the range image is preprocessed by removing certain parts such as hair, which can complicate the recognition process. Finally, a canonical form of the facial surface is computed. Such a representation is insensitive to head orientations and facial expressions, thus significantly simplifying the recognition procedure. The recognition itself is performed on the canonical surfaces.

Bayesian Framework16: A probabilistic similarity measure based on the Bayesian belief that the image intensity differences are characteristic of typical variations in appearance of an individual. Two classes of facial image variations are defined: intrapersonal variations and extrapersonal variations. Similarity among faces is measured using Bayes' rule.

Support Vector Machine17: Given a set of points belonging to two classes, an SVM finds the hyperplane that separates the largest possible fraction of points of the same class on the same side, while maximizing the distance from either class to the hyperplane. PCA is first used to extract features of face images, and then discrimination functions between each pair of images are learned by SVMs.

Hidden Markov Models18: HMMs are a set of statistical models used to characterize the statistical properties of a signal. An HMM consists of two interrelated processes: (1) an underlying, unobservable Markov chain with a finite number of states, a state transition probability matrix and an initial state probability distribution, and (2) a set of probability density functions associated with each state.

The aims of an FR programme are:
(i) fast processing time: for immediate access upon demand (at the press of the "ignition" switch);
(ii) high degree of accuracy: to ensure reliability and thus public acceptance; and
(iii) low cost: for wide application to various classes of vehicles.

3.5 Basis of Facial Recognition Process – the PCA

The process begins by reducing the variability of the human face to a set of numbers. Using a mathematical technique called Principal Components Analysis (PCA), one can examine a large group of faces and extract the most efficient building blocks required to describe them. It turns out that any human face can be represented as the weighted sum of 128 of these building blocks, known as Eigenfaces, based on the pioneering works of M. Turk and A. Pentland.19 With this technique, the essence of a human face can be reduced to just 256 bytes of information.

The recognition process involves comparing the Eigenface weights for two faces using a proprietary algorithm that generates a match score. Different faces will produce a poor match score; images of the same face will produce a good match score. The vehicle authorisation system requires a one-to-one comparison: the Eigenface weights of authorized personnel are recorded in a central database. When someone appears before the camera, his or her face is quickly compared to all of the faces in the database to see if it generates a match.
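To make the PCA step concrete, the fragment below sketches how Eigenfaces and their weights could be computed in MATLAB. It is an illustrative sketch only, not the project code: the face directory, image size and number of retained Eigenfaces are assumed values, and the images are assumed to be pre-cropped grayscale faces of equal size.

```matlab
% Illustrative Eigenface (PCA) training sketch. Assumes a folder 'faces'
% of pre-cropped, equally sized grayscale face images (assumed names/sizes).
faceDir = 'faces';
files   = dir(fullfile(faceDir, '*.pgm'));
imgSize = [112 92];                       % assumed common image size
numEig  = 20;                             % number of Eigenfaces retained
N = numel(files);
X = zeros(prod(imgSize), N);
for k = 1:N
    I = im2double(imread(fullfile(faceDir, files(k).name)));
    X(:,k) = reshape(imresize(I, imgSize), [], 1);    % image as column vector
end
meanFace = mean(X, 2);
A = X - repmat(meanFace, 1, N);           % subtract the mean face
[V, D] = eig(A' * A);                     % small N-by-N eigenproblem
[dSorted, order] = sort(diag(D), 'descend');
eigFaces = A * V(:, order(1:numEig));     % Eigenfaces (columns of the subspace)
eigFaces = eigFaces ./ repmat(sqrt(sum(eigFaces.^2, 1)), prod(imgSize), 1);
weights  = eigFaces' * A;                 % Eigenface weights of each training face
% A probe image would be projected the same way and compared against 'weights'.
```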
3.5.1 Minimum Distance Classifier

We define the prototype of each pattern class to be the mean vector of the patterns of that class:

mj = (1/Nj) ∑ x,  x ∈ ωj,  j = 1, 2, …, W    (1)

where Nj is the number of pattern vectors from class ωj and the summation is taken over these vectors. One way to determine the class membership of an unknown pattern vector x is to assign it to the class of its closest prototype. Using the Euclidean distance to determine closeness reduces the problem to computing the distance measures

Dj(x) = ||x − mj||,  j = 1, 2, …, W    (2)

where ||a|| = (aᵀa)^½ is the Euclidean norm. We then assign x to class ωj if Dj(x) is the smallest distance; that is, the smallest distance implies the best match in this formulation. Selecting the smallest distance is equivalent to evaluating the functions

dj(x) = xᵀmj − ½ mjᵀmj,  j = 1, 2, …, W    (3)

and assigning x to class ωj if dj(x) yields the largest numerical value. This formulation agrees with the concept of a decision function. From Equation (3), the decision boundary between classes ωi and ωj for a minimum distance classifier is

dij(x) = di(x) − dj(x) = xᵀ(mi − mj) − ½(mi − mj)ᵀ(mi + mj) = 0    (4)

The surface given by Equation (4) is the perpendicular bisector of the line segment joining mi and mj. For n = 2 the perpendicular bisector is a line, for n = 3 it is a plane, and for n > 3 it is called a hyperplane. In practice, the minimum distance classifier works well when the distance between means is large compared to the spread or randomness of each class with respect to its mean. The minimum distance classifier yields optimum performance (in terms of minimizing the average loss of misclassification) when the distribution of each class about its mean is in the form of a spherical 'hypercloud' in n-dimensional pattern space. The simultaneous occurrence of large mean separations and relatively small class spread seldom occurs in practice unless the system designer controls the nature of the input.
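The following MATLAB fragment is a minimal illustration of Equations (1) to (3); the training patterns and class labels are invented toy data, and in the actual system the rows would be Eigenface weight vectors.

```matlab
% Minimum distance classifier sketch for Equations (1)-(3).
% Toy data: rows of trainX are pattern vectors, trainLabel gives their class.
trainX     = [1.0 2.0; 1.2 1.8; 4.0 4.1; 3.8 4.3];
trainLabel = [1; 1; 2; 2];
classes = unique(trainLabel);
W = numel(classes);
m = zeros(W, size(trainX, 2));
for j = 1:W
    m(j,:) = mean(trainX(trainLabel == classes(j), :), 1);   % Equation (1)
end
x = [3.9 4.0];                                   % unknown pattern vector
D = sqrt(sum((repmat(x, W, 1) - m).^2, 2));      % Euclidean distances, Equation (2)
[Dmin, jmin] = min(D);
assignedClass = classes(jmin);                   % assign x to the closest prototype
d = m * x' - 0.5 * sum(m.^2, 2);                 % decision functions, Equation (3)
% The largest d(j) selects the same class as the smallest D(j).
```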
3.5.2 Matching by Correlation

The correlation of two functions f(x,y) and h(x,y) is defined as

f(x,y) ○ h(x,y) = (1/MN) ∑ ∑ f*(m,n) h(x+m, y+n),  m = 0, …, M−1,  n = 0, …, N−1    (5)

where f* denotes the complex conjugate of f. We normally deal with real functions (images), in which case f* = f.

The correlation theorem: Let F(u,v) and H(u,v) denote the Fourier transforms of f(x,y) and h(x,y), respectively. One half of the correlation theorem states that spatial correlation, f(x,y) ○ h(x,y), and the frequency-domain product, F*(u,v)H(u,v), constitute a Fourier transform pair. This result, normally stated as

f(x,y) ○ h(x,y) ⇔ F*(u,v)H(u,v)    (6)

indicates that correlation in the spatial domain can be obtained by taking the inverse Fourier transform of the product F*(u,v)H(u,v), where F* is the complex conjugate of F. An analogous result is that correlation in the frequency domain reduces to multiplication in the spatial domain; that is,

f(x,y)h(x,y) ⇔ F*(u,v) ○ H(u,v)    (7)

These two results comprise the correlation theorem. It is assumed that all functions have been properly extended by padding.

The principal use of correlation is for matching. In matching, f(x,y) is an image containing objects or regions. If we want to determine whether f contains a particular object or region in which we are interested, we let h(x,y) be that object or region (this image is normally called a template). Then, if there is a match, the correlation of the two functions will be maximum at the location where h finds a correspondence in f. The term cross-correlation is often used in place of the term correlation to clarify that the images being correlated are different. In autocorrelation, both images are identical; we have the autocorrelation theorem,

f(x,y) ○ f(x,y) ⇔ |F(u,v)|²    (8)

This result states that the Fourier transform of the spatial autocorrelation is the power spectrum. Similarly,

|f(x,y)|² ⇔ F(u,v) ○ F(u,v)    (9)

Image correlation is considered as a basis for finding matches of a subimage w(x,y) of size J × K within an image f(x,y) of size M × N, where we assume that J < M and K < N. Although the correlation approach can be expressed in vector form, working directly with an image or subimage format is more intuitive. In its simplest form, the correlation between f(x,y) and w(x,y) is

c(x,y) = ∑s ∑t f(s,t) w(x+s, y+t)    (10)

for x = 0, 1, 2, …, M−1, y = 0, 1, 2, …, N−1, where the summation is taken over the image region in which w and f overlap. The correlation function given in Equation (10) has the disadvantage of being sensitive to changes in the amplitude of f and w. For example, doubling all the values of f doubles the value of c(x,y). An approach frequently used to overcome this difficulty is to perform matching via the correlation coefficient, which is defined as

γ(x,y) = ∑s ∑t [f(s,t) − f̄(s,t)][w(x+s, y+t) − w̄] / { ∑s ∑t [f(s,t) − f̄(s,t)]² ∑s ∑t [w(x+s, y+t) − w̄]² }^½    (11)

where x = 0, 1, 2, …, M−1, y = 0, 1, 2, …, N−1, w̄ is the average value of the pixels in w (computed only once), f̄(s,t) is the average value of f in the region coincident with the current location of w, and the summations are taken over the coordinates common to both f and w. The correlation coefficient γ(x,y) is scaled in the range −1 to +1, independent of scale changes in the amplitude of f and w.

Although the correlation function can be normalized for amplitude changes via the correlation coefficient, obtaining normalization for changes in size and rotation can be difficult. Normalising for size involves spatial scaling, a process that in itself adds a significant amount of computation. Normalization for rotation is even more difficult. If a clue regarding rotation can be extracted from f(x,y), then we simply rotate w(x,y) so that it aligns itself with the degree of rotation in f(x,y). However, if the nature of rotation is unknown, looking for the best match requires exhaustive rotations of w(x,y). This procedure is impractical and, as a consequence, correlation is seldom used in cases where arbitrary or unconstrained rotation is present.

Correlation can also be carried out in the frequency domain via the Fast Fourier Transform. If f and w are the same size, this approach can be more efficient than direct implementation of correlation in the spatial domain. Equation (11) is used when w is much smaller than f. The correlation coefficient is more difficult to implement in the frequency domain; it is generally computed directly in the spatial domain.
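In MATLAB, the correlation-coefficient matching of Equation (11) is available directly as normxcorr2 in the Image Processing Toolbox. The fragment below is a simple illustration; the image and template file names are placeholders and are assumed to be RGB images.

```matlab
% Template matching via the normalised correlation coefficient (Equation (11)).
% 'scene.png' and 'template.png' are placeholder file names (assumed RGB images).
f = im2double(rgb2gray(imread('scene.png')));       % search image f(x,y)
w = im2double(rgb2gray(imread('template.png')));    % template w(x,y), smaller than f
gammaMap = normxcorr2(w, f);                        % gamma(x,y), values in [-1, 1]
[peakVal, peakIdx] = max(gammaMap(:));
[ypeak, xpeak] = ind2sub(size(gammaMap), peakIdx);
xoffset = xpeak - size(w, 2);                       % top-left corner of best match in f
yoffset = ypeak - size(w, 1);
fprintf('Best match at (%d, %d), coefficient %.2f\n', xoffset + 1, yoffset + 1, peakVal);
```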
3.6 Neural Networks

The approaches discussed above use sample patterns to estimate the statistical parameters of each pattern class. The minimum distance classifier is specified completely by the mean vector of each class. Similarly, the Bayes classifier for Gaussian populations is specified completely by the mean vector and covariance matrix of each class. The patterns (of known class membership) used to estimate these parameters are usually called training patterns, and a set of such patterns from each class is called the training set. The process by which a training set is used to obtain decision functions is called learning or training.

In the approaches just discussed, training is a simple matter. The training patterns of each class are used to compute the parameters of the decision function corresponding to that class. After the parameters in question have been estimated, the structure of the classifier is fixed, and its eventual performance will depend on how well the actual pattern populations satisfy the underlying statistical assumptions made in the derivation of the classification method being used.

The statistical properties of the pattern classes in a problem are often unknown or cannot be estimated. In practice, such decision-theoretic problems are best handled by methods that yield the required decision functions directly via training. Then, making assumptions regarding the underlying probability density functions or other probabilistic information about the pattern classes under consideration is unnecessary. In this section we discuss approaches that meet this criterion.

Figure 3.3: Basic Form of Neural Network Architecture

Learning machines called perceptrons, when trained with linearly separable training sets (i.e. training sets separable by a hyperplane), converge to a solution in a finite number of iterative steps. The solution takes the form of coefficients of hyperplanes capable of correctly separating the classes represented by the patterns of the training sets. In its most basic form, the perceptron learns a linear decision function that dichotomizes two linearly separable training sets.

CHAPTER 4

IMAGE EXTRACTION

4.1 Setting The Scene

For the purpose of recognising faces, the very first step of the biometric FR process is the capture of facial images of individuals. The objective of this project is not to dwell in depth on the technicalities of cameras for image capture or the complexity of image formation and display, but to process image signals for object (facial) recognition. Photos used in this project were captured using three types of cameras: a photo-studio 5.1 megapixel camera, a 1.2 megapixel PC built-in camera (MacBook) and a portable generic 0.6 megapixel USB camera (Logitech QuickCam Messenger).

Figure 4.1 shows the framework of digital image processing. There are six modules in this system:
(i) File I/O: Read/write image/video files.
(ii) Frame Grabber: Grab the image from the image capture device.
(iii) Image Processing Module: The module consists of a buffer for storing intermediate image data and other functions to further process data.
(iv) Data Visualization: Plot the analyzed results obtained from the image processing module.
(v) Bitmap: A data structure to store/display an image.
(vi) Display: Display the data/image sent from the previous modules.

Figure 4.1: Framework of Image/Video Processing

4.2 Digital Image Structure

The image below is represented by 76,800 samples, or pixels (picture elements), arranged in a two-dimensional array of 320 columns and 240 rows.

Figure 4.2: Binary values of pixels

The value of each pixel is converted into greyscale, where 0 is black, 255 is white, and the intermediate values are shades of grey. Images have their information encoded in the spatial domain, the image equivalent of the time domain. Features in images are represented by edges, not sinusoids.
This means that the spacing and number of pixels are determined by how small of features need to be seen, rather than the formal constraints of the sampling theorem. Images with few pixels are regarded as having unusually poor resolution; these images look noticeably unnatural, and the individual pixels can often be seen. The strongest motivation for using lower resolution images is that there are fewer pixels to handle. One of the most difficult problems in image processing is managing massive amounts of data. It is common for 256 grey levels (quantization levels) to be used in image processing, corresponding to a single byte per pixel. This is because (i) a single byte is convenient for data management, (ii) the large number of pixels in an image compensate to a certain degree for a limited number of quantization steps, and (iii) a brightness step size of 1/256 (0.39%) is smaller than the eye can perceive – an image presented to a human observer will not be improved by using more than 256 levels. The value of each pixel in the digital image represents a small region in the continuous image being digitized. This defines a square sample spacing and sampling grid. The region of a continuous image that contributes to the pixel value is called the sampling aperture; the size of the sampling aperture is often related to the inherent capabilities of the particular imaging system being used. In most cases the sampling grid is made approximately the same as the sampling aperture of the system. Resolution in the final digital image will be the limited primary by the larger of the two, the sampling grid or the sampling aperture. Colour is added to digital images by using three numbers of each pixel, representing the intensity of the three primary colours: red, green and blue. Mixing these three colours generates all possible colours that the human eye can perceive. A single byte is frequently used to store each of the colour intensity allowing the image 29 to capture a total of 256x256x256 = 16.8 million different colours. Colour is very important when the goal is to present the viewer with a true picture of the world, such as in television and still photography. However, this is usually not how images are used in science and engineering, where the purpose is to analyse a twodimensional signal by using the human visual systems as a tool. For this reason, black and white images are sufficient for this FR project. The parameters in optical systems interact in many unexpected ways. For example, consider how the amount of available light and the sensitivity of the light sensor affects the sharpness of the acquired image. This is because the iris diameter and the exposure time are adjusted to transfer the proper amount of light from the scene being viewed to the image sensor. If more than enough light is available, the diameter of the iris can be reduced, resulting in a greater depth-of-field (the range of distance from the camera where an object remains in focus). A greater depth-of-field provides a sharper image when objects are at various distances. In addition, an abundance of light allows the exposure time to be reduced, resulting in less blur from camera shaking and object motion. Optical systems are full of these kinds of tradeoffs. The dynamic range of an electronic camera is typically 300 to 1000, defined as the largest signal that can be measured, divided by the inherent noise of the device. 
The same camera and lens assembly used in bright sunlight will be useless on a dark night or in a dark room. 4.3 Image Acquisition The most common image sensor used in electronic cameras is the charge coupled device (CCD). CCD image sensors are capable of transforming a light pattern (image) into an electric charge pattern (an electronic image). The heart of the CCD is a thin wafer of silicon, typically about 1 cm square. A charge-coupled 30 device (CCD) is a sensor for recording images, consisting of an integrated circuit containing an array of linked, or coupled, capacitors. Under the control of an external circuit, each capacitor can transfer its electric charge to one or other of its neighbours. The CCD consists of several individual elements that have the capability of collecting, storing and transporting electrical charge from one element to another. Each photosensitive element represents a pixel. Structures are made that form lines, or matrices of pixels. Output amplifiers at the edge of the chip collect the signals from the CCD. An electronic image can be obtained by - after having exposed the sensor with a light pattern - applying series of pulses that transfer the charge of one pixel after another to the output amplifier, line after line. The output amplifier converts the charge into a voltage. External electronics will transform this output signal into a form suitable for monitors or frame grabbers. CCDs have extremely low noise figures. CCD image sensors can be a colour sensor or a monochrome sensor. In a colour image sensor an integral RGB colour filter array provides colour responsivity and separation. A monochrome image sensor senses only in black and white. Optical format is used to determine what size lens is necessary for use with the imager. Optical format refers to the length of the diagonal of the imaging area include 1/7 inch, 1/6 inch, 1/5 inch, 1/4 inch, 1/3 inch, 1/2 inch, 2/3 inch, 3/4 inch, and 1 inch. The number of pixels and pixel size is important to consider. Horizontal pixels refer to the number of pixels in a row of the image sensor. Vertical pixels refer to the number of pixels in a column of the image sensor. The greater the number of pixels, the better the resolution. For example, VGA resolution is 640x480, this means the number of horizontal pixels is 640 and the number of vertical pixels is 480. Important image sensor performance specifications to consider when searching for CCD image sensors include: • spectral response - spectral range (wavelength range) for which the detector is designed • data rate - speed of a data transfer process, normally expressed in MHz 31 • quantum efficiency - ratio of photon-generated electrons that the pixel captures to the photons incident on the pixel area; is wavelength dependent so the given value for quantum efficiency is generally for the peak sensitivity wavelength for the CCD • dynamic range - logarithmic ratio of well depth to the readout noise in decibels; the higher the number, the better • number of outputs Recently it has become practical to create an Active Pixel Sensor (APS) using the CMOS manufacturing process. Since this is the dominant technology for all chip-making, CMOS image sensors are cheap to make and signal conditioning circuitry can be incorporated into the same device. The latter advantage helps mitigate their greater susceptibility to noise, which is still an issue, though a diminishing one. 
This is due to the use of low grade amplifiers in each pixel instead of one high-grade amplifier for the entire array in the CCD. CMOS sensors also have the advantage of lower power consumption than CCDs. An image is projected by a lens on the capacitor array, causing each capacitor to accumulate an electric charge proportional to the light intensity at that location. A one-dimensional array, used in line-scan cameras, captures a single slice of the image, while a twodimensional array, used in video and still cameras, captures the whole image or a rectangular portion of it. Once the array has been exposed to the image, a control circuit causes each capacitor to transfer its contents to its neighbour. The last capacitor in the array dumps its charge into an amplifier that converts the charge into a voltage. By repeating this process, the control circuit converts the entire contents of the array to a varying voltage, which it samples, digitizes and stores in memory. 4.4 Importance Of Facial Positioning In this project it is necessary that facial images are taken from a plane perpendicular with the camera. This will enable the fast identification of facial 32 features without the need of a complicated programme. This is not a significant issue for our vehicle application where the position of the user’s face is almost fixed and it is natural for the user to face the front where the camera is placed. Figure 4.3: Camera viewing axis must be perpendicular with the image Figure 4.4 illustrates the error response of the image acquisition routine where features could not be extracted due to incorrect position of the face. Figure 4.4: Facial image at a non-perpendicular angle results in error In this illustration, this rejection is further exaggerated by a use of a comic picture of a human, and the pixel size does not match with the rest of the database photos. What it means is that, for a simplified FR system, it is necessary that the facial image 33 is taken at a particular angle (perpendicular plane) and the face is of consistent size for comparison of features with other images. 34 CHAPTER 5 IMAGE TRANSFORMATION & PROCESSING 5.1 Grayscale Transformation Grayscale transformation is a powerful technique for improving the appearance of images. The idea is to increase the contrast at pixel values of interest, at the expense of the pixel values we don’t care about. This is done by defining the relative importance of each of the 0 to 255 possible pixel values. The more important the value, the greater its contrast is made in the displayed image. In this project simulation, coloured images are converted to grayscale images using the Video & Image Blocksets in MATLAB. The bulk of the conversion, however, was expedited manually using a standalone freeware software, Batch Image Processor (BIMP Lite version 1.62)20, written by Matthew Hart. Grayscale transforms (or “Intensity” in MATLAB) can significantly improve the viewability of an image. For the purpose of FR analysis, it is preferred to use Grayscale images rather than Coloured images to reduce the matrix size, thus the amount of data generated to be operated upon and saved. 21 However, it is noted that there are researches on FR using skin colour and texture. 35 Histogram equalisation is a way to automate the procedure. Histogram equalisation blindly uses the histogram as the contrast weighing curve, eliminating the need for human judgement, i.e. 
the output transform is found by integration and normalisation of the histogram, rather than a manually generated curve. This results in the greatest contrast being given to those values that have the greatest number of pixels. 5.2 Image Thresholding The percentage of the thresholding means the threshold level between the maximum and minimum intensity of the initial image. Thresholding is a way to get rid of the effect of noise and to improve the signal-noise ratio. That is, it is a way to keep the significant information of the image while get rid of the unimportant part (under the condition that we choose a plausible thresholding level). 5.3 Gaussian Filtering The basic effects of (2D) Gaussian filter are for smoothing the image and wiping off the noise. Generally speaking, for a noise-affected image, smoothing it by Gaussian function is the first thing to do before any other further processing, such as edge detection. The effectiveness of the Gaussian function is different for different choosing the standard deviation sigma of the Gaussian filter. 36 5.4 Image Features Extraction - Canny Edge Detector22 The Canny operator was designed to be an optimal edge detector (according to particular criteria). It takes as input a gray scale image, and produces as output an image showing the positions of tracked intensity discontinuities. The Canny operator works in a multi-stage process. First of all the image is smoothed by Gaussian convolution. Then a simple 2-D first derivative operator is applied to the smoothed image to highlight regions of the image with high first spatial derivatives. Edges give rise to ridges in the gradient magnitude image. The algorithm then tracks along the top of these ridges and sets to zero all pixels that are not actually on the ridge top so as to give a thin line in the output, a process known as “non-maximal suppression”. The tracking process exhibits hysteresis controlled by two thresholds: T1 and T2, with T1 > T2. Tracking can only begin at a point on a ridge higher than T1. Tracking then continues in both directions out from that point until the height of the ridge falls below T2. This hysteresis helps to ensure that noisy edges are not broken up into multiple edge fragments. The effect of the Canny operator is determined by three parameters; the width of the Gaussian kernel used in the smoothing phase, and the upper and lower thresholds used by the tracker. Increasing the width of the Gaussian kernel reduces the detector's sensitivity to noise, at the expense of losing some of the finer detail in the image. The localization error in the detected edges also increases slightly as the Gaussian width is increased. Usually, the upper tracking threshold can be set quite high, and the lower threshold quite low for good results. Setting the lower threshold too high will cause noisy edges to break up. Setting the upper threshold too low increases the number of spurious and undesirable edge fragments appearing in the output. One problem with the basic Canny operator is to do with Y-junctions i.e. places where three ridges meet in the gradient magnitude image. Such junctions can occur where an edge is partially occluded by another object. The tracker will treat two of the ridges as a single line segment, and the third one as a line that approaches, 37 but does not quite connect to, that line segment. Most of the major edges are detected and lots of details have been picked out well. Note that this may be too much detail for subsequent processing. 
The Gaussian smoothing in the Canny edge detector fulfils two purposes: first, it can be used to control the amount of detail that appears in the edge image and, second, it can be used to suppress noise. If we scale down the image before the edge detection, we can use the upper threshold of the edge tracker to remove the weaker edges. All the boundaries of the objects have been detected whereas all other edges have been removed. Although the Canny edge detector allows us to find the intensity discontinuities in an image, it is not guaranteed that these discontinuities correspond to actual edges of the object.

Figure 5.1: Simulation of various edge detection methods

Figure 5.1 is a SIMULINK simulation of some of the built-in edge detection methods: Sobel, Prewitt and Canny. The Canny method is further simulated with some threshold values applied. The result of each method is shown below: (a) original image, (b) Sobel detection, (c) Prewitt detection, (d) horizontally filtered edge detection, (e) vertically filtered edge detection, (f) Canny detector, (g) Canny detector with threshold.

Figure 5.2: Result of simulation of various edge detection methods

Notice the similarity between the Sobel and Prewitt methods, although the Prewitt result seems slightly blurred. Applying a threshold to the Canny method (or any other method) provides a means to fine-tune the edge detection for variations in lighting and contrast. Filtering can also be applied to remove background noise, as illustrated by the horizontal and vertical elements of edge detection. For comparison, Figure 5.3 shows the result of applying Canny edge detection on the colour (RGB) version of the above photo. The simple MATLAB algorithm to obtain this manually is shown in Appendix B.

Figure 5.3: Canny edge detection of an RGB image

It is evident that the edge features remain intact when the RGB photos are converted into greyscale. By comparison, the greyscale edge detection gives better contrast and detail than the RGB version of the images.

5.5 Quality of Images - Brightness and Contrast Adjustments

An image must have the proper brightness and contrast for easy viewing. Brightness refers to the overall lightness or darkness of the image. Contrast is the difference in brightness between objects or regions. When the brightness is too high, the whitest pixels are saturated, destroying the detail in these areas. Conversely, when the brightness is set too low, the blackest pixels are saturated.

It is very important that the images are captured under adequate lighting conditions so that the features of the face are obvious. It is also necessary that images are captured and stored at reasonable pixel sizes so that details are available for analysis. These conditions are illustrated by the darker image used for edge detection of features in Figure 5.4. Although applying some threshold values to the Canny method shows some improvement, the main facial features are not clear, except for the outline of the face (because the contrast against the background is very obvious).

Figure 5.4: Poorly lit environment produces unreliable image data (various edge detection methods shown here)

A good lighting level is a challenge to achieve in a restrictive application such as inside the vehicle, particularly when the camera depends on an external light source (sunlight).
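Since brightness and contrast directly affect how much facial detail survives edge detection, a simple pre-check and adjustment step can be simulated in MATLAB. The sketch below is illustrative only; the darkness threshold and the use of imadjust as the correction step are assumptions, not part of the implemented system.

    % Sketch only: reject or correct poorly lit captures before analysis
    gray = rgb2gray(imread('face_sample.jpg'));   % assumed test image
    avg  = mean2(gray);                           % overall brightness (0-255)
    if avg < 60                                   % assumed darkness threshold
        disp('Image too dark: re-capture with better lighting or flash.');
    end
    % Stretch the intensity range to improve contrast before edge detection
    adj = imadjust(gray, stretchlim(gray), []);
    bw  = edge(adj, 'canny');

In a practical system the threshold would be determined experimentally for the in-vehicle camera, as noted in Chapter 8.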
CHAPTER 6 SYSTEM DESIGN

6.1 System Architecture

6.1.1 Hardware Architecture

It has been noted in the introduction that the scope of this project is the simulation of the algorithms; the physical implementation is not realised. The physical system design could consist of a camera placed discreetly at the vehicle dashboard, an interactive console for the user interface, and a controller hidden conveniently, as illustrated in Figure 6.1.

Figure 6.1: Proposed system equipment layout

6.1.2 Software Architecture

The first stage uses the wavelet decomposition, which helps extract intrinsic features of face images. As a result of this decomposition, we obtain four subimages (namely the approximation, horizontal, vertical, and diagonal detailed images). The second stage of the approach concerns the application of classification to these four decompositions. The choice is motivated by its insensitivity to large variation in light direction, face pose, and facial expression. The last phase is concerned with the aggregation of the individual classifiers by means of the fuzzy integral.

Figure 6.2: Facial Recognition System architecture

An image defined in the "real world" is considered to be a function of two real variables, for example a(x,y), with a as the amplitude (e.g. brightness) of the image at the real coordinate position (x,y). An image may be considered to contain sub-images, sometimes referred to as regions-of-interest (ROIs, or simply regions). This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur while another part might be processed to improve colour rendition.

The amplitudes of a given image will almost always be either real numbers or integer numbers. The latter is usually the result of a quantization process that converts a continuous range (say, between 0 and 100%) to a discrete number of levels. A digital image a[m,n] described in a 2D discrete space is derived from an analogue image a(x,y) in a 2D continuous space through a sampling process frequently referred to as digitization. The 2D continuous image a(x,y) is divided into N rows and M columns. The intersection of a row and a column is termed a pixel. The value assigned to the integer coordinates [m,n], with {m = 0,1,2,…,M-1} and {n = 0,1,2,…,N-1}, is a[m,n]. In fact, in most cases a(x,y) - which we might consider to be the physical signal that impinges on the face of a 2D sensor - is actually a function of many variables including depth (z), colour (λ), and time (t). Unless otherwise stated, we will consider the case of 2D, monochromatic, static images.

Wavelet transforms are used to reduce image information redundancy because only a subset of the transform coefficients is necessary to preserve the most important facial features such as the hair outline, eyes and mouth. When wavelet coefficients are fed into a backpropagation neural network for classification, a high recognition rate can be achieved using a very small proportion of the transform coefficients. This makes wavelet-based face recognition accurate while requiring far less data than approaches that operate on the full image.

6.2 Wavelet Packet Analysis For Face Recognition

The MATLAB Wavelet Toolbox is used to illustrate how to perform signal or image analysis.
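To make the single-level decomposition into the four subimages described in Section 6.1.2 concrete, the following Wavelet Toolbox sketch applies one level of the 2-D DWT with the Haar wavelet. The image file name is an assumption; dwt2 and idwt2 are standard toolbox calls.

    % Sketch only: one level of 2-D Haar decomposition into four subimages
    gray = rgb2gray(imread('face_sample.jpg'));     % assumed test image
    [cA, cH, cV, cD] = dwt2(double(gray), 'haar');  % approximation + details
    subplot(2,2,1), imagesc(cA), title('Approximation');
    subplot(2,2,2), imagesc(cH), title('Horizontal detail');
    subplot(2,2,3), imagesc(cV), title('Vertical detail');
    subplot(2,2,4), imagesc(cD), title('Diagonal detail');
    recon = idwt2(cA, cH, cV, cD, 'haar');          % perfect reconstruction

The multilevel form of this decomposition, as used in the project, is described in the remainder of this section.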
The proposed scheme is based on the analysis of a wavelet packet decomposition of the face images, for recognition of frontal views of human faces under roughly constant illumination. Each face image is first located and then described by a subset of band-filtered images containing wavelet coefficients. From these wavelet coefficients, which characterize the face texture, we build compact and meaningful feature vectors using simple statistical measures.

Wavelet transformations are a method of representing signals across space and frequency. The signal is divided across several layers of division in space and frequency and then analyzed. The goal is to determine which space/frequency bands contain the most information about an image's unique features, both the parts that define an image as a particular type, i.e. a face, and those parts which aid in classification between different images of the same type.

One type of discrete wavelet transform (DWT) is the orthogonal DWT. The orthogonal DWT projects an image onto a set of orthogonal column vectors to break the image down into coarse and fine features. Since MATLAB stores most numbers in double precision, even a single image takes up a lot of memory. For instance, one copy of a 512-by-512 image uses 2 MB of memory. To avoid out-of-memory errors, it is important to allocate enough memory to process various image sizes, whether in physical RAM or a combination of RAM and virtual memory.

The general linear discrete image transform*

$$F = PfQ \qquad (12)$$

can be rewritten as

$$F(u,v) = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} P(u,m)\, f(m,n)\, Q(n,v), \qquad u = 0,1,\dots,M-1;\; v = 0,1,\dots,N-1 \qquad (13)$$

If P and Q are non-singular (non-zero determinants), inverse matrices exist and

$$f = P^{-1} F Q^{-1} \qquad (14)$$

If P and Q are both symmetric (M = M^T), real, and orthogonal (M^T M = I), then

$$F = PfQ, \qquad f = PFQ \qquad (15)$$

and the transform is an orthogonal transform.

* Commonly known theory from various sources

6.3 Multilevel Wavelet Decomposition

In the same way as Fourier analysis, wavelets are derived from a basis function called the Mother function or analyzing wavelet. The simplest Mother function is the Haar Mother function shown below.

Figure 6.3: The Haar mother function and Φ10, Φ01, Φ21

Multilevel wavelet decomposition is an iterative process, namely multiresolutional decomposition. At each iteration, a lower-frequency set of transformed data coefficients generated by a prior iteration is again refined to produce a substitute set of transformed data coefficients comprising a lower spatial frequency group and a higher spatial frequency group, called subbands. The decomposition process is iterated with successive approximations being decomposed in turn, so that one signal is broken down into many lower-resolution components. This is called the "wavelet decomposition tree."

Figure 6.4: Wavelet decomposition tree

Looking at a signal's wavelet decomposition tree can yield valuable information.

Figure 6.5: Haar wavelet decomposition to 2 levels

Figure 6.6: Haar wavelet decomposition to 4 levels

Figure 6.7: Details of wavelet levels

The coefficients in the upper left corner are related to a low-resolution image while the other panels correspond to high-resolution features.

Figure 6.8: Details of a wavelet node being compressed with threshold

Figure 6.9: The inverse process (decomposition at 3 levels)

This inverse discrete wavelet transformation illustrates the re-composition of the image after filtering and de-noising.
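The decomposition, thresholding and reconstruction steps illustrated in Figures 6.5 to 6.9 can be reproduced with a few Wavelet Toolbox calls. The sketch below is a simplified illustration; the level, wavelet name and use of a global threshold are assumptions for demonstration rather than the exact settings used in the project figures.

    % Sketch only: multilevel Haar decomposition, de-noising and inverse DWT
    X      = double(rgb2gray(imread('face_sample.jpg')));  % assumed image
    [C, S] = wavedec2(X, 3, 'haar');          % 3-level decomposition tree
    A3     = appcoef2(C, S, 'haar', 3);       % low-resolution approximation
    % De-noise / compress by thresholding the detail coefficients globally
    [thr, sorh, keepapp] = ddencmp('den', 'wv', X);
    Xden   = wdencmp('gbl', C, S, 'haar', 3, thr, sorh, keepapp);
    % Direct inverse transform of the untouched tree recovers the original
    Xrec   = waverec2(C, S, 'haar');

Here wdencmp returns the reconstructed (de-noised) image, corresponding to the inverse process shown in Figure 6.9.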
Figure 6.10: Histogram information of a selected node

The histogram information shown in Figure 6.10 is useful for FR methods using histogram matching, but it is not applicable to this project.

Figure 6.11: Applying de-noising

Wavelets are often used for data compression and image noise suppression. There are many different classifiers that have proved to be very effective in classifying faces. Advanced correlation filters can offer very good matching performance in the presence of variability such as facial expression and illumination changes. The main idea is to synthesize a filter, using a set of training images, that produces a correlation output with suppressed values at all locations other than the origin, while the value at the origin is constrained to a specific peak value. When the filter is correlated with a test image that is authentic, it will exhibit sharp correlation peaks in the correlation plane. Otherwise the filter will output small correlation values.

6.4 Face Matching (ANN Of Wavelets)

Figure 6.12: Operation of the Neural Network

The neural network is composed of simple elements operating in parallel. The network function is determined largely by the connections between elements. The neural network is trained to perform the feature-matching function by adjusting the values of the connections, or weights, between elements. After the neural network structure is set, the most important thing is to prepare the training examples. Training and learning functions are mathematical procedures used to automatically adjust the network's weights and biases. The training function dictates a global algorithm that affects all the weights and biases of a given network. The learning function can be applied to individual weights and biases within a network.

At the beginning of the training, we select a number of face images from each person that are well-aligned frontal views. Any of them can represent their host clearly. All the faces here are extracted, or cut out, by the face detection code. These faces will be used as positive examples for their own networks and negative examples for other networks. Here we deal only with images that are assumed to always be faces. The database is not designed to handle non-face images, because in our situation it is unnecessary and it would make network training very difficult.

After the basic neural networks are created, we run them over new faces from the individuals in our database. If an image fails to pass the face detection test, it is ignored. If the face detection code reports a face in the image, the image is passed to the face recognition code. We check the recognition result to find more faces for training. Once we get these new faces, we add them to the training examples and retrain the neural networks. Recognition errors need to be corrected, and the total performance will be improved. While adding some examples from a specific individual will improve the performance of his own network, it will also influence the performance of other networks. The detailed facial recognition programme using wavelet and neural network methods is listed in Appendix C.

In order to make the training of the neural network easier, one neural net is created for each person. Each neural net identifies whether the input face belongs to the network's class or not. The recognition algorithm selects the network with the maximum output.
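A minimal sketch of this selection rule, together with the acceptance threshold described next, might look as follows. Here nets{k} is assumed to be the trained network for person k and x the wavelet feature vector of the probe face; the threshold value is purely illustrative.

    % Sketch only: pick the per-person network with the highest output,
    % then accept or reject against a predefined threshold
    scores = zeros(1, numel(nets));
    for k = 1:numel(nets)
        scores(k) = sim(nets{k}, x);          % match score from network k
    end
    [best, personID] = max(scores);
    if best >= 0.5                            % assumed acceptance threshold
        fprintf('Verified as person %d (score %.2f)\n', personID, best);
    else
        disp('Face rejected: no network output exceeded the threshold.');
    end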
If the output of the selected network passes a predefined threshold, the input face is reported as belonging to that network's class. Otherwise the input face is rejected. This is illustrated in Figure 6.13.

Figure 6.13: Image matching using Neural Network

The MATLAB programme for a comprehensive facial recognition algorithm is customised from source code obtained from Luigi Rosa, using multi-level wavelet transform and neural network iterations, as described in Appendix C. The adapted code was run with MATLAB 7.0 R14 SP1 and MATLAB R2006a. The user-friendly GUI begins with the custodian building up the database of images. An image is selected from file and saved into the database in groups, or classes; each class consists of images of the same person. The face ID class is a positive integer entered progressively for each authorised person. To achieve higher reliability and good results, it is necessary to have several images per person. When a test image is compared with those in the database, the facial recognition returns the ID of the nearest person present in the database. Face recognition is carried out by neural network training with up to 500,000 epochs. The overall time taken is about one minute on a 2 GHz Pentium 4 PC (but about 15 minutes on a 780 MHz Crusoe subnotebook PC). Other user features available from the GUI are a database summary and deletion of the database.

Figure 6.14: The Facial Recognition System functionality

CHAPTER 7 CONCLUSION

We have discussed the basic elements of biometrics and how wavelet transformation and correlation filtering may be used to classify images within a biometric system. We explored the advantage of using wavelet packet decomposition for image classification. The results show that face images have adequate features that can be extracted using wavelet decomposition, and that matching for image verification is achieved using backpropagation neural networks.

Transform coding relies on the premise that pixels in an image exhibit a certain level of correlation with their neighbouring pixels. These correlations can be exploited to predict the value of a pixel from its respective neighbours. A transformation is, therefore, defined to map this spatial (correlated) data into transformed (uncorrelated) coefficients. Clearly, the transformation should exploit the fact that the information content of an individual pixel is relatively small, i.e. to a large extent the visual contribution of a pixel can be predicted from its neighbours.

The objective of this project is to illustrate the efficacy of multilevel wavelet transformation on images for face verification. The transform helps separate the image into parts (or spectral sub-bands) of differing importance (with respect to the image's visual quality). The Discrete Wavelet Transform is similar to the Discrete Fourier Transform: it transforms a signal or image from the spatial domain to the frequency domain. Efficacy of a transformation scheme can be directly gauged by its ability to pack input data into as few coefficients as possible. This allows the quantizer to discard coefficients with relatively small amplitudes without introducing visual distortion in the reconstructed image. DWT is known to exhibit excellent energy compaction for highly correlated images.
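The energy-compaction argument can be checked with a short experiment: decompose an image, keep only a small fraction of the largest wavelet coefficients, and measure how well the image is reconstructed. This sketch is illustrative only; the 5% retention figure and the wavelet choice are assumptions.

    % Sketch only: keep the largest 5% of coefficients and reconstruct
    X      = double(rgb2gray(imread('face_sample.jpg')));   % assumed image
    [C, S] = wavedec2(X, 3, 'haar');
    keep   = 0.05;                                  % fraction of coefficients kept
    sorted = sort(abs(C), 'descend');
    thr    = sorted(round(keep * numel(C)));        % threshold at the 5% mark
    Ck     = C .* (abs(C) >= thr);                  % zero out the small coefficients
    Xk     = waverec2(Ck, S, 'haar');
    mse    = mean((X(:) - Xk(:)).^2);               % reconstruction error
    fprintf('Kept %.0f%% of coefficients, MSE = %.2f\n', 100*keep, mse);

For a highly correlated face image the error remains small, which is the property the verification scheme relies on.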
The PCA, or Karhunen-Loeve Transform (KLT), method, which was used as the fundamental FR method, is a linear transform whose basis functions are taken from the statistical properties of the image data, and which can thus be adaptive. It is optimal in the sense of energy compaction, i.e. it places as much energy as possible in as few coefficients as possible. However, the KLT transformation kernel is generally not separable, and thus the full matrix multiplication must be performed. KLT is data dependent and therefore lacks a fast (FFT-like) pre-computed transform. Derivation of the respective basis for each image sub-block requires unreasonable computational resources. Although some fast KLT algorithms exist, the overall complexity of KLT is significantly higher than that of the DWT algorithm.

It may soon seem old-fashioned to think about putting a key into the door of your car. More cars are starting to appear with keyless ignition systems as well, so that all you need is the fob in your pocket and you can start the car automatically. Some home lock makers are starting to move in that direction as well, working on systems for keyless home locks.

CHAPTER 8 FUTURE WORK

Being an interesting subject with future prospects for academic exploration and potential commercial implementation, this vehicle security facial recognition project has much room for improvement. Some of these improvements are described here as suggestions for future work.

8.1 Practical software/hardware implementation

While this project was executed using MATLAB, it is perhaps more practical to implement the software in C++ so that it could be embedded or programmed into a working DSP microcontroller. This requires conversion of the MATLAB code or a complete rewrite. The hardware may also take the form of FPGA modules; however, the limitations of such processors and their associated software must be considered, especially when handling large volumes of image data for processing.

8.2 Improving Image Quality

Lighting levels can significantly affect results, because a poorly lit image may not contain sufficient feature data, as shown by the comparison of edge detection results. A blurred or faint image means important facial features are difficult to differentiate from the background, so much so that the image is considered noisy. The majority of face recognition algorithms appear to be sensitive to variations in illumination, such as those caused by the change in sunlight intensity throughout the day. In the majority of algorithms evaluated under FERET, changing the illumination resulted in a significant performance drop. For some algorithms, this drop was equivalent to comparing images taken a year and a half apart.

Illumination is a challenging area in designing an image extraction system within a covered space, such as our vehicle application. One possible solution is to add a flash to the camera, automatically triggered when the lighting level falls below an acceptable level (which would need to be determined experimentally). Alternatively, it may be possible to use a camera which captures images based on the infrared energy generated by body heat (a totally new area that could be explored).

Changing facial position can also affect performance. A 15-degree difference in position between the query image and the database image will adversely affect performance. At a difference of 45 degrees, recognition becomes ineffective.
While this is less of a problem for the vehicle user, who would naturally enter the vehicle and sit in the "right" position, there should be a method to ensure the image is captured with the correct pose. This could be in the form of a flashing light to attract attention, or a visual console prompting the user to glance in the camera direction. Many face verification applications make it mandatory to acquire images with the same camera.

8.3 Robustness of algorithms

Although we can choose from several general strategies for evaluating biometric systems, each type of biometric has its own unique properties. This uniqueness means that each biometric must be addressed individually when interpreting test results and selecting an appropriate biometric for a particular application. In the 1990s, automatic face recognition technology moved from the laboratory to the commercial world, largely because of the rapid development of the technology, and now many applications use face recognition. The software must be tested to ensure high reliability. As an example of facial verification performance testing, the FERET [23] tests were technology evaluations of emerging approaches to face recognition. Research groups were given a set of facial images to develop and improve their systems. The FERET evaluation measured performance for both identification and verification, and provided performance statistics for different image categories. There are still areas which require further research, though progress has been made in these areas since March 1997.

8.4 Combination Of Algorithms

This is a research area which is continually being pursued in an attempt to achieve algorithms with a high level of efficiency, reliability and robustness. Various new facial identification, recognition and verification techniques are being introduced, such as those using 3D, 2.5D, facial skin colour and contour, etc., and combinations of them. What this means is that facial recognition is still not perfected.

8.5 User-friendly Console Design

The following figure is an example of a console design, created using MATLAB "GUIDE", from which the user can initiate the authorization process. In this design, when the user requests verification, his picture appears on the screen and the result of the verification process is shown. From there he decides to start the vehicle or chooses not to proceed (as in the case of the system failing to recognize him). This console design is for an implemented system; the process of inputting images of known users into the database takes a different route, which can be developed to be user-friendly as well, but requires a high level of security to maintain the reliability and robustness of the database.

Figure 8.1: An example of the User Console design

APPENDIX A
Sample Of Images Used For The Project

Shown are the RGB versions and their corresponding greyscale images.

APPENDIX B
MATLAB Command Codes For Extracting Edge Detection Of A Coloured (RGB) Image

    function R = edgecolor(nm)
    % Edge detection that works for both greyscale and RGB images.
    img = imread(nm);                 % read the image named by the input argument
    [x y z] = size(img);
    if z == 1
        % Greyscale image: apply Canny directly
        rslt = edge(img, 'canny');
    elseif z == 3
        % RGB image: detect edges on the luminance channel in YCbCr space
        img1 = rgb2ycbcr(img);
        dx1  = edge(img1(:,:,1), 'canny');
        dx1  = (dx1 * 255);
        img2(:,:,1) = dx1;
        img2(:,:,2) = img1(:,:,2);
        img2(:,:,3) = img1(:,:,3);
        rslt = ycbcr2rgb(uint8(img2));
    end
    R = rslt;                         % return the result as well as displaying it
    imshow(rslt)

APPENDIX C
MATLAB COMMAND CODES PROGRAMMING FOR WAVELET AND NEURAL NETWORK FACIAL VERIFICATION

The main MATLAB programme used for this project was an adaptation of source code developed by Luigi Rosa [24], but it will not be reproduced in this report.
Illustrated below are parts of the algorithms that execute the wavelet decomposition and NN learning.

    function [out] = findfeatures(entry)
    % Wavelet feature extraction: 3-level decomposition, keeping the
    % approximation coefficients as the feature vector.
    global wavelength
    [C,S] = wavedec2(double(entry),3,'coif1');
    dimension = S(1,:);
    dimx = dimension(1);
    dimy = dimension(2);
    v = C(1:dimx*dimy);
    wavelength = length(v);
    out{1} = v(:);

    %-------------------------------------------------------------
    function [out] = ann_face_matching(features)
    % Match a probe feature vector against the stored database
    % using the neural network classifier.
    global wavelength
    load('face_database.dat','-mat');
    P = zeros(wavelength,features_size);
    T = zeros(max_class-1,features_size);
    for ii=1:features_size
        v = features_data{ii,1};
        v = v{1};
        v = v(:);
        P(:,ii) = v;
        pos = features_data{ii,2};
        for jj=1:(max_class-1)
            if jj==pos
                T(jj,ii) = 1;
            else
                T(jj,ii) = -1;
            end
        end
    end
    input_vector = features{1};
    % Normalization of each feature dimension to the range [-1,1]
    for ii=1:wavelength
        v = P(ii,:);
        v = v(:);
        bii = max([v;1]);
        aii = min([v;-1]);
        P(ii,:) = 2*(P(ii,:)-aii)/(bii-aii)-1;
        input_vector(ii) = 2*(input_vector(ii)-aii)/(bii-aii)-1;
    end
    [net] = createnn(P,T);
    [valmax,posmax] = max(sim(net,input_vector));
    out = posmax;

    %-------------------------------------------------------------
    function [net] = createnn(P,T)
    % Create and train a two-layer feed-forward network.
    % (Variable assignments reconstructed where the printed listing was garbled.)
    inputs  = P;
    targets = T;
    [R,Q]  = size(inputs);
    [S2,Q] = size(targets);
    S1 = 100;
    % Train
    net = newff(minmax(inputs),[S1 S2],{'tansig' 'tansig'},'traingda');
    net.LW{2,1} = net.LW{2,1}*0.01;
    net.b{2} = net.b{2}*0.01;
    net.performFcn = 'mse';
    net.trainParam.goal = 0.000000001;
    net.trainParam.show = Inf;
    net.trainParam.epochs = 500000;
    net.trainParam.mc = 0.95;
    P = inputs;
    T = targets;
    [net,tr] = train(net,P,T);
    %-------------------------------------------------------------

APPENDIX D
PROPOSED USER CONSOLE MATLAB GUI PROGRAMME

Note: The programme is generated by MATLAB from the GUI created using GUIDE.

    function varargout = exit(varargin)
    % Begin initialization code
    gui_Singleton = 1;
    gui_State = struct('gui_Name',       mfilename, ...
                       'gui_Singleton',  gui_Singleton, ...
                       'gui_OpeningFcn', @exit_OpeningFcn, ...
                       'gui_OutputFcn',  @exit_OutputFcn, ...
                       'gui_LayoutFcn',  [] , ...
                       'gui_Callback',   []);
    if nargin && ischar(varargin{1})
        gui_State.gui_Callback = str2func(varargin{1});
    end
    if nargout
        [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
    else
        gui_mainfcn(gui_State, varargin{:});
    end
    % End initialization code

    % --- Executes just before exit is made visible.
    function exit_OpeningFcn(hObject, eventdata, handles, varargin)
    % Choose default command line output for exit
    handles.banner = imread('C:\MATLAB7\work\Alfred.jpg');
    axes(handles.axes1);
    image(handles.banner)
    handles.output = hObject;
    % Update handles structure
    guidata(hObject, handles);

    % --- Outputs from this function are returned to the command line.
    function varargout = exit_OutputFcn(hObject, eventdata, handles)
    % Get default command line output from handles structure
    varargout{1} = handles.output;

    % --- Executes on button press in pushbutton1.
    function pushbutton1_Callback(hObject, eventdata, handles)

    % --- Executes on button press in radiobutton1.
    function radiobutton1_Callback(hObject, eventdata, handles)

    % --- Executes on button press in radiobutton2.
    function radiobutton2_Callback(hObject, eventdata, handles)

    % --- Executes on button press in radiobutton3.
    function radiobutton3_Callback(hObject, eventdata, handles)

    % --- Executes on selection change in listbox1.
    function listbox1_Callback(hObject, eventdata, handles)
    % contents{get(hObject,'Value')} returns selected item from listbox1

    % --- Executes during object creation, after setting all properties.
    function listbox1_CreateFcn(hObject, eventdata, handles)
    if ispc
        set(hObject,'BackgroundColor','white');
    else
        set(hObject,'BackgroundColor',get(0,'defaultUicontrolBackgroundColor'));
    end

    function edit1_Callback(hObject, eventdata, handles)
    % hObject    handle to edit1 (see GCBO)
    % str2double(get(hObject,'String')) returns contents of edit1 as a double

    % --- Executes during object creation, after setting all properties.
    function edit1_CreateFcn(hObject, eventdata, handles)
    % hObject    handle to edit1 (see GCBO)
    if ispc
        set(hObject,'BackgroundColor','white');
    else
        set(hObject,'BackgroundColor',get(0,'defaultUicontrolBackgroundColor'));
    end

APPENDIX E
SUPPLEMENTARY NOTES

E.1 Multi-level Wavelet Transform Decomposition

Mathematical transformations are applied to signals to obtain further information from the signal that is not readily available in its raw form. Several types of transformation can be applied, among which the Fourier transform (FT) is probably by far the most popular. Signals in practice are time-domain signals in their raw format. In many cases, the most distinguishing information is hidden in the frequency content of the signal. The frequency spectrum of a signal is basically the set of frequency components (spectral components) of that signal; it shows what frequencies exist in the signal. We can find the frequency content of a signal using the FT. If the FT of a time-domain signal is taken, the frequency-amplitude representation of that signal is obtained.

The wavelet transform (WT) constitutes only a small portion of a huge list of transforms that are available at our disposal. Every transformation technique has its own area of application, with advantages and disadvantages. The FT and WT are reversible transforms, that is, they allow one to go back and forth between the raw and processed (transformed) signals. However, only one of these representations is available at any given time. The FT gives the frequency information of the signal, which means that it tells us how much of each frequency exists in the signal, but it does not tell us when in time these frequency components exist. This information is not required when the signal is so-called stationary. The FT can be used for non-stationary signals if we are only interested in what spectral components exist in the signal, and not where they occur. However, if this information is needed, i.e. if we want to know what spectral components occur at what time (interval), then the Fourier transform is not the right transform to use. The WT is a transform of this type: it is capable of providing the time and frequency information simultaneously, hence giving a time-frequency representation of the signal.

The WT was developed as an alternative to the short-time Fourier transform (STFT) to overcome the resolution problem. In the STFT, the signal is divided into segments small enough that each segment (portion) of the signal can be assumed to be stationary. For this purpose, a window function "w" is chosen. The width of this window must be equal to the segment of the signal over which its stationarity is valid. The definition of the STFT is given by

$$\mathrm{STFT}_x^{\omega}(t', f) = \int_t \left[ x(t)\,\omega^{*}(t - t') \right] e^{-j 2\pi f t}\, dt$$

The problem with the STFT has to do with the width of the window function that is used, known as the support of the window. If the window function is narrow, then it is known as compactly supported.
What gives the perfect frequency resolution in the FT is the fact that the window used in the FT is its kernel, a function which lasts at all times from minus infinity to plus infinity. In the STFT, the window is of finite length, thus it covers only a portion of the signal, which causes the frequency resolution to become poorer. In the FT, the kernel function allows us to obtain perfect frequency resolution because the kernel itself is a window of infinite length. In the STFT, the window is of finite length, and we no longer have perfect frequency resolution.

There are two main differences between the STFT and the continuous wavelet transform (CWT):

1. The Fourier transforms of the windowed signals are not taken, and therefore a single peak will be seen corresponding to a sinusoid, i.e. negative frequencies are not computed.

2. The width of the window is changed as the transform is computed for every single spectral component, which is probably the most significant characteristic of the wavelet transform.

The CWT is defined as

$$\mathrm{CWT}_x^{\psi}(\tau, s) = \Psi_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^{*}\!\left(\frac{t-\tau}{s}\right) dt$$

where

$$\psi_{\tau,s}(t) = \frac{1}{\sqrt{|s|}}\, \psi\!\left(\frac{t-\tau}{s}\right)$$

This definition of the CWT shows that the wavelet analysis is a measure of similarity between the basis functions (wavelets) and the signal itself, in the sense of similar frequency content. The transformed signal is a function of two variables, τ and s, the translation and scale parameters respectively. ψ(t) is the transforming function, and it is called the mother wavelet. The term mother wavelet gets its name from two important properties of the wavelet analysis. The term wavelet means a small wave; the smallness refers to the condition that this (window) function is of finite length (compactly supported). The term mother implies that the functions with different regions of support that are used in the transformation process are derived from one main function, the mother wavelet. In other words, the mother wavelet is a prototype for generating the other window functions. The term translation is used in the same sense as in the STFT; it is related to the location of the window as the window is shifted through the signal, and corresponds to time information in the transform domain.

In the decomposition operation we split the signal into two parts by passing it through a highpass and a lowpass filter (the filters should satisfy certain conditions, the so-called admissibility condition), which results in two different versions of the same signal. We continue like this until we have decomposed the signal to a pre-defined level. We then have a collection of signals which actually represent the same signal, but all corresponding to different frequency bands.

The discrete wavelet transform (DWT) series is simply a sampled version of the CWT, and the information it provides is highly redundant as far as the reconstruction of the signal is concerned. This redundancy requires a significant amount of computation time and resources. The DWT provides sufficient information both for analysis and synthesis of the original signal, with a significant reduction in computation time. A time-scale representation of a digital signal is obtained using digital filtering techniques. Filters of different cutoff frequencies are used to analyze the signal at different scales.
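A minimal one-dimensional sketch of this filter-and-downsample step, using standard Wavelet Toolbox calls, is given below. The signal and the wavelet choice are assumptions for illustration; the same pattern is what the two-dimensional routines described in Section E.2 iterate over rows and columns.

    % Sketch only: one level of 1-D DWT (filter then downsample) and its inverse
    x = sin(2*pi*0.05*(0:255)) + 0.2*randn(1,256);  % assumed noisy test signal
    [Lo_D, Hi_D] = wfilters('db2');                 % decomposition filter pair
    [cA, cD] = dwt(x, 'db2');     % lowpass/highpass outputs, each about half length
    xrec = idwt(cA, cD, 'db2');   % reconstruction from the two subbands

Each further level repeats the same step on the approximation cA, which is the iteration described in the following paragraphs.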
The signal is passed through a series of highpass filters to analyze the high frequencies, and through a series of lowpass filters to analyze the low frequencies. The resolution of the signal, which is a measure of the amount of detail information in the signal, is changed by the filtering operations, and the scale is changed by upsampling and downsampling (subsampling) operations. Subsampling a signal corresponds to reducing the sampling rate, or removing some of the samples of the signal. Filtering a signal corresponds to the mathematical operation of convolution of the signal with the impulse response of the filter. The convolution operation in discrete time is defined as

$$x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]$$

A half-band lowpass filter removes all frequencies that are above half of the highest frequency in the signal. After passing the signal through a half-band lowpass filter, half of the samples can be eliminated according to Nyquist's rule, since the signal now has a highest frequency of π/2 radians instead of π radians. Simply discarding every other sample subsamples the signal by two, and the signal will then have half the number of points. The scale of the signal is now doubled. Resolution is related to the amount of information in the signal and is therefore affected by the filtering operations. This constitutes one level of decomposition and can mathematically be expressed as follows:

$$y_{\mathrm{high}}[k] = \sum_{n} x[n]\, g[2k-n]$$

$$y_{\mathrm{low}}[k] = \sum_{n} x[n]\, h[2k-n]$$

Figure E.1: The Subband Coding Algorithm

where yhigh[k] and ylow[k] are the outputs of the highpass and lowpass filters, respectively, after subsampling by 2. This decomposition halves the time resolution since only half the number of samples now characterizes the entire signal. The bandwidth of the signal at every level is marked on the figure as "f".

For a given image, we compute the DWT of, say, each row, and discard all values in the DWT that are less than a certain threshold. We then save only those DWT coefficients that are above the threshold for each row, and when we need to reconstruct the original image, we simply pad each row with as many zeros as the number of discarded coefficients, and use the inverse DWT to reconstruct each row of the original image. We can also analyze the image at different frequency bands, and reconstruct the original image by using only the coefficients that are of a particular band. Examples are shown in the figures below.

E.2 Wavelet Toolbox [25] Algorithms

Given a signal s of length N, the DWT consists of at most log2(N) stages. Starting from s, the first step produces two sets of coefficients: approximation coefficients cA1, and detail coefficients cD1. These vectors are obtained by convolving s with the lowpass filter Lo_D for approximation, and with the highpass filter Hi_D for detail, followed by dyadic decimation. More precisely, the first step is the filtering-and-decimation step illustrated in Figure E.1. The length of each filter is equal to 2n. If N = length(s), the signals F and G are of length N + 2n - 1, and the coefficients cA1 and cD1 are then of length floor((N-1)/2) + n.

The next step splits the approximation coefficients cA1 into two parts using the same scheme, replacing s by cA1 and producing cA2 and cD2, and so on.

Two-Dimensional DWT Decomposition Step

Initialisation Step: cA0 = s for the decomposition initialisation.

Two-Dimensional Inverse-DWT (IDWT) Reconstruction Step

For J = 2, the two-dimensional wavelet tree has the following form.
For biorthogonal wavelets, the same algorithms hold, but the decomposition filters on one hand and the reconstruction filters on the other are obtained from two distinct scaling functions associated with two multiresolution analyses in duality. In this case, the filters for decomposition and reconstruction are, in general, of different odd lengths. This situation occurs, for example, for the "splines" biorthogonal wavelets used in the toolbox. By zero-padding, the four filters can be extended in such a way that they will have the same even length.

REFERENCES

1 How to use your Subaru keyless entry, security and immobilizer key systems, http://www.cars101.com/subaru/keyless.HTML
2 http://news.bbc.co.uk/go/pr/fr/-/2/hi/asia-pacific/4396831.stm
3 http://www.traka.com/products/iFob.asp?gfx=1
4 http://www.automobile-security.com/products.htm
5 http://www.face-rec.org/algorithms/Comparisons/
6 M.A. Turk, A.P. Pentland, Face Recognition Using Eigenfaces, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3-6 June 1991, Maui, Hawaii, USA, pp. 586-591
7 M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face Recognition by Independent Component Analysis, IEEE Trans. on Neural Networks, Vol. 13, No. 6, November 2002, pp. 1450-1464
8 J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, Face Recognition Using LDA-Based Algorithms, IEEE Trans. on Neural Networks, Vol. 14, No. 1, January 2003, pp. 195-200
9 L. Wiskott, J.-M. Fellous, N. Krueuger, C. von der Malsburg, Face Recognition by Elastic Bunch Graph Matching, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, 1997, pp. 776-779
10 C. Liu, H. Wechsler, Evolutionary Pursuit and Its Application to Face Recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, No. 6, June 2000, pp. 570-582
11 M.-H. Yang, T. Diederich, S. Becker, Z. Ghahramani, Eds., Face Recognition Using Kernel Methods, Advances in Neural Information Processing Systems, 2002, Vol. 14
12 S. Srisuk, M. Petrou, W. Kurutach and A. Kadyrov, Face Authentication using the Trace Transform, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'03), 16-22 June 2003, Madison, Wisconsin, USA, pp. 305-312
13 T.F. Cootes, K. Walker, C.J. Taylor, View-Based Active Appearance Models, Proc. of the IEEE International Conference on Automatic Face and Gesture Recognition, 26-30 March 2000, Grenoble, France, pp. 227-232
14 V. Blanz, T. Vetter, Face Recognition Based on Fitting a 3D Morphable Model, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 9, September 2003, pp. 1063-1074
15 A. Bronstein, M. Bronstein, R. Kimmel, and A. Spira, 3D Face Recognition Without Facial Surface Reconstruction, Proceedings of ECCV 2004, Prague, Czech Republic, May 11-14, 2004
16 B. Moghaddam, T. Jebara, A. Pentland, Bayesian Face Recognition, Pattern Recognition, Vol. 33, Issue 11, November 2000, pp. 1771-1782
17 G. Guo, S.Z. Li, K. Chan, Face Recognition by Support Vector Machines, Proc. of the IEEE International Conference on Automatic Face and Gesture Recognition, 26-30 March 2000, Grenoble, France, pp. 196-201
18 A.V. Nefian, M.H. Hayes III, Hidden Markov Models for Face Recognition, Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP'98, 12-15 May 1998, Seattle, Washington, USA, pp. 2721-2724
19 M. Turk and A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, 3 (1), 1991a, URL http://www.cs.ucsb.edu/~mturk/Papers/jcn.pdf
20 http://www.imagemagick.org/
21 A. Pietrowcew, Face Detection in Colour Images Using Fuzzy Hough Transform, Opto-Electronics Review 11(3), 247-251 (2003)
22 R. Gonzalez and R. Woods, Digital Image Processing, Addison-Wesley Publishing Company, 1992, Chap. 4
23 P.J. Phillips, The FERET Evaluation Methodology for Face-Recognition Algorithms, NISTIR 6264, Nat'l Institute of Standards and Technology, 1998, http://www.itl.nist.gov/iaui/894.03/pubs.html#face
24 http://people.na.infn.it/~rosa/
25 MATLAB Wavelet Toolbox, http://www.mathworks.com/