
SP4: Joint Segmentation and Authentication
of Two Modalities (Voice/Face)
Report: Database Protocol
EURECOM - LIA
BIOBIMO RNRT – BIOmétrie BImodale sur MObile
http://biobimo.eurecom.fr
Table of Contents
1. Contents
2. Database Population
3. Video Format
4. Audio Format
5. Sessions
The requirements of the RNRT project BioBiMo call for the acquisition of a mini database
for testing and demonstrating bimodal algorithms that combine audio and face recognition.
This document outlines the acquisition protocol for the database.
1. Contents
The database contents are constrained by the fact that they will be used by both
audio-based and face-based recognition systems, so they must exhibit the variability
that both require.
Blinking
Each recording session will start with a series of blinks (20 blinks per 10 seconds),
which are used to localize the eyes and thus the head.
Password Set
The password set consists of 10 passwords, which are predefined words or short sentences
lasting 1 to 3 seconds each. The aim of this dataset is to provide data for audio person
recognition in a password-based scenario.
Question / Answer Session
This session is required to provide text-independent data for behavioral face recognition.
It consists of the answer to one of the 10 predefined questions, recorded for 30
seconds.
Numbers
The person recites the numbers from 1 to 30.
2. Database Population
The database is divided into 3 parts based on the project scenario and testing
protocol. They are:
Client
The first part consists of client data from 20 speakers, with all 10 sessions recorded for
each speaker. It is to be used for both training and testing the algorithms. Each session
consists of:
Password Set
Question / Answer Session
Numbers
Imposter
The aim of this dataset is to test the algorithms by presenting imposters. It consists of
data from 10 speakers, with all 10 sessions recorded for each speaker. Each session
consists of:
Password Set
Question / Answer Session
Numbers
World
This dataset is required to create a general model of the world. It will consist of
at least 20 persons speaking random sentences for a duration of 30-40 seconds.
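The three parts above can be summarized as a small data structure, which makes the number of recording sessions per part easy to check. This is only a sketch: the names `Subset`, `SESSION_ITEMS`, and `session_count` are illustrative and not part of the protocol, and the single recording per world speaker is an assumption, since the protocol gives no session count for that part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    name: str
    speakers: int   # number of speakers (a minimum for the world part)
    sessions: int   # sessions recorded per speaker
    items: tuple    # content recorded in each session


# Content of each client and imposter session, as listed in the protocol.
SESSION_ITEMS = ("password_set", "question_answer", "numbers")

SUBSETS = (
    Subset("client", speakers=20, sessions=10, items=SESSION_ITEMS),
    Subset("imposter", speakers=10, sessions=10, items=SESSION_ITEMS),
    # Assumption: one recording per world speaker (not stated in the protocol).
    Subset("world", speakers=20, sessions=1, items=("random_sentences",)),
)


def session_count(subset: Subset) -> int:
    """Number of recording sessions needed for one part of the database."""
    return subset.speakers * subset.sessions


totals = {s.name: session_count(s) for s in SUBSETS}
print(totals)  # {'client': 200, 'imposter': 100, 'world': 20}
```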
3. Video Format
Camera Settings
Manual camera settings should be avoided, i.e. whichever camera model is selected
should be capable of setting the exposure and white balance on its own and should be
left to do so.
Illumination
Illumination is one of the major concerns in visual feature extraction. Bad lighting
conditions affect the process in two ways. First, they alter the color composition,
which is one of the bases for feature extraction. Second, they can hide features
altogether; for example, the shadow of the nose can hide parts of the mouth.
Video resolution
Video resolution also plays an important role in feature extraction. If the resolution is
too low, it hinders the exact localization of feature points such as the tips of the lips.
We therefore propose a minimum resolution of 640 x 480 pixels.
Temporal Resolution
Although the frame rate does not have a direct effect on feature extraction, a low frame
rate can cause loss of data and consequently loss of classification information. We
normally work at 25 frames per second.
Distance between eyes
The distance between the eyes defines the size of the face in pixels. This
specification is necessary to avoid a situation in which the video is 640 x 480 pixels but
the face occupies only a very small portion of the image, making the video useless. In
our study, the distance between the eyes should be between 40 and 60 pixels.
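The eye-distance requirement can be checked once the eye centers have been located in a frame. The sketch below assumes the eye positions are already available as (x, y) pixel coordinates; how they are detected is outside its scope, and the function names are illustrative.

```python
import math

# Limits taken from the protocol: 40 to 60 pixels between the eyes.
MIN_EYE_DISTANCE_PX = 40
MAX_EYE_DISTANCE_PX = 60


def eye_distance(left_eye, right_eye):
    """Euclidean distance in pixels between two (x, y) eye centers."""
    return math.dist(left_eye, right_eye)


def face_size_ok(left_eye, right_eye):
    """True if the inter-eye distance lies within the protocol's range."""
    distance = eye_distance(left_eye, right_eye)
    return MIN_EYE_DISTANCE_PX <= distance <= MAX_EYE_DISTANCE_PX


print(face_size_ok((300, 240), (350, 240)))  # True: the eyes are 50 pixels apart
print(face_size_ok((300, 240), (320, 240)))  # False: the face is too small
```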
Video compression
Although it is not feasible to avoid video compression entirely, we would prefer no
compression. This specification can greatly affect the performance of our system:
compression usually introduces a blocking effect that destroys most of the edge
information that our system uses.
Video Format
The standard video format that we currently use is “avi”, but this is not a major
concern, as the format can easily be changed at any time in the future.
Color
Currently our system uses videos with a color depth of 16 bits per pixel. Because the
system also uses color for feature extraction, we would like the color depth to remain
the same or be higher.
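The numeric video requirements of this section (resolution, frame rate, color depth) can be collected into a simple check, and they also give the data rate of uncompressed video. The sketch below is illustrative only: `VideoInfo` and the function names are hypothetical, the properties are assumed to have been read from the file by some other tool, and treating 25 frames per second as a minimum is an assumption, since the protocol only states it as the usual working rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoInfo:
    width: int            # pixels
    height: int           # pixels
    fps: float            # frames per second
    bits_per_pixel: int   # color depth


MIN_WIDTH, MIN_HEIGHT = 640, 480
MIN_FPS = 25              # assumption: the usual working rate is used as a minimum
MIN_BITS_PER_PIXEL = 16


def spec_violations(video: VideoInfo) -> list:
    """Return a list of human-readable problems; empty if the video conforms."""
    problems = []
    if video.width < MIN_WIDTH or video.height < MIN_HEIGHT:
        problems.append("resolution below 640 x 480")
    if video.fps < MIN_FPS:
        problems.append("frame rate below 25 frames per second")
    if video.bits_per_pixel < MIN_BITS_PER_PIXEL:
        problems.append("color depth below 16 bits per pixel")
    return problems


def raw_bytes_per_second(video: VideoInfo) -> float:
    """Data rate of the video without any compression."""
    return video.width * video.height * video.bits_per_pixel * video.fps / 8


reference = VideoInfo(width=640, height=480, fps=25, bits_per_pixel=16)
print(spec_violations(reference))             # []
print(raw_bytes_per_second(reference) / 1e6)  # 15.36 (megabytes per second)
```

At the minimum settings, uncompressed video therefore needs about 15.36 MB per second, which illustrates why avoiding compression entirely is hard in practice.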
4. Audio Format
Recording is possible with a wide range of microphones; only the sampling frequency
must remain constant, i.e. 16 kHz.
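The sampling-frequency requirement is easy to verify automatically. The sketch below assumes the audio is stored as WAV files, which the protocol does not specify, and uses Python's standard `wave` module; the function name is illustrative.

```python
import wave

# Sampling frequency required by the protocol.
REQUIRED_SAMPLE_RATE_HZ = 16_000


def has_required_sample_rate(path: str) -> bool:
    """True if the WAV file at `path` is sampled at 16 kHz."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == REQUIRED_SAMPLE_RATE_HZ
```

A recording that fails this check would need to be re-recorded or resampled before being added to the database.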
5. Sessions
A total of 10 sessions will have to be recorded to ensure the necessary variability and to
collect enough data for behavioral face recognition. The following is the breakdown of the
sessions.
Indoor Session
6 indoor sessions in a semi-controlled office environment with normal office lighting and
minimal noise.
Outdoor Session
3 external recording sessions per person in a corridor or outside a building, where other
people may be present or walking behind the speaker, with normal street noise levels.
The lighting conditions may vary with the situation, but levels should be consistent with
office lighting.
Studio Session
1 studio recording with studio lighting and a controlled background, with as little noise
as possible.
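The session breakdown above can be written down as a small plan and checked against the required total of 10 sessions. This is a minimal sketch: the session labels are hypothetical, and the order in which the environments are listed is not prescribed by the protocol.

```python
# Number of sessions per recording environment, as given in the protocol.
SESSION_PLAN = {"indoor": 6, "outdoor": 3, "studio": 1}
TOTAL_SESSIONS = 10


def session_schedule(plan: dict) -> list:
    """Expand the plan into one label per session, e.g. 'indoor_01'."""
    schedule = []
    for environment, count in plan.items():
        schedule.extend(f"{environment}_{i:02d}" for i in range(1, count + 1))
    return schedule


schedule = session_schedule(SESSION_PLAN)
# The breakdown must add up to the total required per person.
assert len(schedule) == TOTAL_SESSIONS
print(schedule[0], schedule[6], schedule[9])  # indoor_01 outdoor_01 studio_01
```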