Document

advertisement
Voice-Based Gender Classification
Using Support Vector Machine
Project Presentation
for Class COMS E6772, Fall 2006
Student: Wenwei Wang
Advisor: Prof. Tony Jebara
Columbia University
December 11, 2006
2
Motivation

•
•

•
•

•
•
•
Gender classification plays an important role in:
Speech/Speaker recognition
Other applications, such as HCI, passive surveillance and smart living
environmental
Bi-model gender classification can improve the overall performance:
Image based gender classification performance varies with the factors,
such as the environment light and face angle;
Voice based gender classification can be degraded by the factors, such as
the environment noise and recording channels.
About this project
Focus on the voice based gender classification using Support Vector
Machines.
Gaussian Mixture Model method were used as a comparison.
Both cases of text dependent and text independent were explored.
Columbia University, Electrical Engineering Department
2
3
Voice Data Source

Train and Test Voices:
•
Each of 25 speakers were asked to read two different paragraphs, the longer
one for training voice and the shorter one for testing voice;
Recording Method:
Different offices with the normal level of noises during the working hours;
Ordinary telephone microphone;
Microsoft Sound Recorder, Version 5.1;
16 bit, 16 KHz, Mono mode;
Record length ranges from 40 to 60 seconds.
Speakers Summary: Male: 15; Female: 10
•
o
o
o
o
o
•
Columbia University, Electrical Engineering Department
3
4
Voice Feature

MFCC: The most commonly used for Speech/Speaker
Recognition/Verification
•
Feature Extraction:
Pre-emphasizing: H(z) = 1 - 0.95 Z -1
Framing: window size=500 samples; overlap=200 samples;
Filtering: hamming window;
Training Voice: 800 MFCC vectors with order of 12 per speaker;
Testing Voice: 400 MFCC vectors with order of 12 per speaker;
On the top of each MFCC vector, the delta MFCC vector, and the delta delta
MFCC delta were created (they were experimented for better results, besides
MFCC, but results showed no improvement for gender classification).
A true gender matrix with the value 1 for male or -1 for female were created
for each of MFCC vectors.
o
o
o
o
o
o
o
Columbia University, Electrical Engineering Department
4
5
SVC Implementation

SVC Model Training
•
100 frames of training MFCC per speaker; (100 frames yielded the best results based on
the overall classification performance)
For each MFCC vector, only the first 3 coefficients were used; (adding more coefficients,
or using other combinations with delta MFCC, and delta delta MFCC coefficients didn’t
improve the overall classification performance)
1st frame from each speaker generated 1st SVC model, then 2nd frame from each speaker
generated 2nd SVC model, and so on. 100 frames generated 100 SVC models;
Kernel: RFB with sigma =0.1 and cost =inf used (RFB, ERBF, and BSPLINE gave the
same good model with 100% gender classification, and for RFB, the value of sigma =0.1
and cost =inf were selected based on the overall classification performance)
SVC Classification
Text independent: 100 frames of testing MFCC per speaker; (100 frames and 3 coefficients
yielded the best results)
Text dependent: 100 frames of training MFCC per speaker; (100 frames and 3 coefficients
yielded the best results)
100 predicted gender data from 100 SVC models were simply averaged as the final gender
score.
SVC tool: SVM software written by Dr. Steve Gunn
•
•
•

•
•
•

Columbia University, Electrical Engineering Department
5
6
SVC Model

•
•
•

SVC Model Performance
SVC plot for 1st 2 dimensions of
training MFCC features
Top: for frame 1 from 25 speakers
Blue: MALE; Red: FEMALE.
Bottom: for frame 100 from 25
speakers
Blue: MALE; Red: FEMALE
As examined one by one, all 100
frames are classified 100%
accurate. Only two frames are
shown here as examples. Overall,
the SVC model is super in its
accuracy!
Columbia University, Electrical Engineering Department
6
7
SVC Text Independent Classification
•
•
•
•

25 test voices used
MFCC: 100 frames per voice;
21 voices detected correctly;
3 male voices detected as female, and 1 female voice detected as male;
Overall Gender Detection Accuracy Rate: 84%
Test
Voice
1
2 3 4 5 6 7 8 9 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
Detected
m
m f m f m m m f f f m m f m f m f f m m f f f m
Label
m
mf mf mmmf f mmmf mmmmf f mf f f m
Columbia University, Electrical Engineering Department
7
8
SVC Text Dependent Classification
•
•
•

25 train voices used
MFCC: 100 frames per voice;
All 25 voices detected correctly;
Overall Gender Detection Accuracy Rate: 100%
Test
Voice
1
2 3 4 5 6 7 8 9 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
Detected
m
m f m f m m m f f m m m f m m m m f f m f f f m
Label
m
mf mf mmmf f mmmf mmmmf f mf f f m
Columbia University, Electrical Engineering Department
8
9
GMM Implementation

GMM Model Training
•
25 x 800 frames of training MFCC were divided into two groups, male or female;
For each MFCC vector, only the first 2 coefficients were used; (adding more coefficients,
or using other combinations with delta MFCC, and delta delta MFCC coefficients didn’t
improve the overall classification performance)
Male GMM model and Female GMM model were trained from two MFCC groups;
GMM parameters: 2 dimensions, 5 mixtures, diag, 20 EM iterations (selected based on
overall classification results)
GMM Classification
Text independent: all 400 frames of testing MFCC with 1 st 2 coefficients per speaker;
Text dependent: all 800 frames of training MFCC with 1st 2 coefficients per speaker;
Each frame fed into the Male GMM model and Female GMM model, respectively. The
ratio of two resulted values decides the gender;
400 or 800 predicted gender data were simply averaged as the final gender score.
GMM tool: Netlab software written by Dr. Ian Nabney and Dr. Christopher Bishop.
•
•
•

•
•
•
•

Columbia University, Electrical Engineering Department
9
10
GMM Model
5

•
•

GMM Model Performance
PDF plots (top) of 1st 2 dimensions
of MFCC features
Left: MALE; Right: FEMALE.
The combined PDF plots in 3D
(bottom) for the 1st 2 dimensions of
MFCC features based on GMM
Red: MALE; Blue: FEMALE.
The Gaussian peaks clearly show
the differences between male and
female, but Gaussian bodies show
the overlaps. Overall the GMM
model is NOT as good as the SVC
model!
5
0.022
4
4
0.02
3
0.025
3
0.018
2
0.016
2
0.02
1
0.014
1
0
0.012
0
0.01
-1
0.015
-1
0.008
-2
0.01
-2
0.006
-3
-3
0.004
-4
0.002
-5
0.005
-4
-5
-25
-20
-15
-10
Columbia University, Electrical Engineering Department
-25
-20
-15
-10
10
11
GMM Text Independent Classification
•
•
•
•

25 test voices used
MFCC: 400 frames per voice;.
21 voices detected correctly;
3 male voices detected as female, and 1 female voice detected as male;
Overall Gender Detection Accuracy Rate: 84%
Test
Voice
1
2 3 4 5 6 7 8 9 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
Detected
m
m f m f m m m f f f m m f m f m f f f m f m f m
Label
m
mf mf mmmf f mmmf mmmmf f mf f f m
Columbia University, Electrical Engineering Department
11
12
GMM Text Dependent Classification
•
•
•
•

25 train voices used
MFCC: 800 frames per voice;
22 voices detected correctly;
2 male voices detected as female, and 1 female voice detected as male;
Overall Gender Detection Accuracy Rate: 88%
Test
Voice
1
2 3 4 5 6 7 8 9 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
Detected
m
m f m f m m m f f f m m f m f m m f f m f m f m
Label
m
mf mf mmmf f mmmf mmmmf f mf f f m
Columbia University, Electrical Engineering Department
12
13
Result Summary

Results summary:
SVC
GMM
Model
Super model with
100% accuracy
PDF peak clearly separated; bodies
are overlapped in some degree.
Text Independent
Classification
84% accuracy
84% accuracy
Text Dependent
Classification
100% accuracy
86% accuracy
Columbia University, Electrical Engineering Department
13
14
Conclusion
•
•
•
SVC model itself is a super accurate model, and hence has more
potentials than the GMM model in the voice-based gender
classification, and possibly in other classification applications;
For text dependent type of classification, the SVC could be the
best choice;
For text independent type of classification, the SVC is one of the
choices.
Columbia University, Electrical Engineering Department
14
15
Future Work
•
•
•
•
Investigate the reasons why such a super SVC model can’t perform
well for the text independent gender classification;
Explore the possible voice features which might improve the SVC
text independent classification performance;
It could be meaningful to compare SVC performance with other
classification model, such as HMM and NNW;
Examine SVC model for other voice based classification
applications, such as age and spoken language.
Columbia University, Electrical Engineering Department
15
16
References
•
Steve R. Gunn, ‘Support Vector Machines for Classification and Regression,’
Technical report, University of Southampton, 1998.
•
W.M.Campbell, J.P.Campbell, T.P. Gleason, D.A. Reynolds, and T.R.Leek,’HighLevel Speaker Verification With Support Vector Machines,’ ICASSP, 2004.
•
And others ( will be listed in the final report)
Columbia University, Electrical Engineering Department
16
Download