International Journal of Engineering Trends and Technology (IJETT) – Volume 25 Number 2- July 2015
Prediction of Speech for Articulation Disorder
Snehal S. Laghate#1, Sanjivani S. Bhabad*2
#Student, Electronics & Telecommunication, Pune University, Nasik, India
Abstract: Speech is the most natural form of human
verbal communication. The main goal of speech recognition
is to develop techniques and systems for understanding
speech and natural language. The accuracy of automatic
speech recognition for individuals with articulatory disorders
is an important research challenge. This paper aims to
improve the accuracy and reliability of present speech
recognizers by predicting the intended speech for articulatory
substitution errors. A Support Vector Machine is used for
classification. Results are obtained in MATLAB R2012b
using a database created for this work.
Keywords: ASR, MATLAB, STFT, MFCC, ZCR, SVM,
ANN.
I. INTRODUCTION
The idea of human-machine interaction led to
research in speech recognition. An automatic speech
recognition (ASR) system converts speech into a string of
words by means of an algorithm implemented as a computer
program. It involves two major phases: a training phase
and a recognition phase. In the training phase, speech is
recorded and a parametric representation of the speech is
extracted and stored in a speech database. In the
recognition phase, features are extracted from the given
speech and the ASR system compares them with the stored
reference templates to recognise the utterance.
Articulation refers to the specific way an individual
produces speech. Detecting articulation problems involves
processing the speech signal, extracting features and
finally classifying the signal. These problems mainly
occur when a person produces sounds or syllables
incorrectly, so that listeners do not understand what is
being said or have to pay more attention to the way
words sound than to what they mean. Articulation
errors are mainly classified into substitution, omission
and distortion.
This paper is organised as follows: Section II
reviews the ASR literature, Section III describes the
implementation of the proposed system, Section IV
presents the software details, Section V presents the
results, and the last section gives the conclusion and
future scope.
II. LITERATURE REVIEW
In recent years speech recognition has reached high
levels of performance, with word error rates dropping
by a factor of five in the past five years. Recognition
accuracy has also risen rapidly, with rates of about 90%
reported using different algorithms and classifiers.
Table I below shows the progress in speech
recognition systems.
ISSN: 2231-5381
Sr. No | System | Condition | Recognition rate (%)
1 | IBM, The Tangora System (1985) | System dependent: 5000 words vocabulary with natural language like grammar with perplexity 160 | 97.1
2 | Iso. K (1990) | Predictive neural network model, speaker dependent, vocabulary: 5000 words | 97.6
3 | Hild. H (1993) | Speaker dependent, 1000 sentences, multimode TDNN | 98.5
4 | IBM via voice | Speaker independent, vocabulary: 32000 Chinese words | 95
5 | INRS | Speaker dependent, vocabulary: 72000 words | 89.5
6 | LIMSI Lab, Gauvain et al. | Large vocabulary continuous speech recognition system using CDHMM with Gaussian mixture, vocabulary: 65122 words | Overall word transcription error 13.6 %
Table I: Literature Survey of Speech Recognition Systems
Developing new and improved technologies for
providing robust, reliable and accurate speech
recognition for individuals without speech disorders is a
popular area of research. For individuals with
articulatory handicaps, the absence of a standard database
and the diversity of articulatory handicaps are major
obstacles to constructing a reliable speech recognition
system. It is therefore of interest to study and develop
algorithms for speech recognition for individuals with
articulatory disorders, as discussed in Section III.
III. PROPOSED SYSTEM
Fig. 1 below shows the proposed system for
individuals with articulatory disorders. The input speech
is a database of 1100 samples of isolated words, recorded
over multiple sessions in a soundproof room using a
RODE NT1 microphone and NUENDO 4 software. The
recorded speech was loaded onto a computer through a
memory card and stored as .wav files.
Fig. 1: Proposed system (Input Speech → Pre-processing → Feature Extraction → Feature Selection → Classification using SVM → Corrected Speech)
The input speech considered here is isolated,
discontinuous speech, since word boundaries are easier to
recognise than in continuous speech. Pre-processing is a
crucial step in the development of a speech recogniser, as
it separates the voiced part of the speech from silence and
unvoiced parts. The pre-processed data has a large impact
on the performance of the speech classifier. The selection
of features is also crucial for classification. In this
system, 183 features are obtained by considering spectral,
temporal and psychoacoustic features. The frame length is
256 samples with a frame shift of 64 samples. Spectral
features are calculated from the short-time Fourier
transform (STFT) of every short-time frame of speech.
These include spectral centroid, spectral roll-off,
spectral flatness and MFCCs (Mel-frequency cepstral
coefficients). Temporal features consist of loudness
(energy), ZCR (zero-crossing rate) and temporal centroid.
After feature selection, the next step is to classify the
signal. A classifier defines decision boundaries in
feature space that separate the different classes from
each other. Classification techniques can be categorised
into supervised and unsupervised techniques. In a
supervised technique the target (result) is known and is
given as input to the model during the learning process.
The supervised learning approach used here is the Support
Vector Machine (SVM). An SVM classifier finds the optimal
hyperplane that correctly separates (classifies) the
largest fraction of data points while maximising the
distance of either class from the hyperplane.
IV. SOFTWARE DETAILS
The proposed system is implemented in MATLAB R2012b.
It replaces the incorrect speech with the predicted
correct speech for the type of articulation error known as
substitution error. In a substitution error, for example,
a child may say /ko/ instead of /two/. For correction of
speech, a supervised algorithm, binary SVM, was used. The
database of 1100 samples was partitioned into a training
set and a testing set. The training set consisted of 90%
of the samples and the remaining 10% were assigned to the
testing set. In the testing phase, a comparison was made
using the test data. The results showed that an accuracy
of 98% is obtained using the SVM classifier.
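The framing and a few of the features named in Section III can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' MATLAB implementation. The frame length (256) and frame shift (64) come from the text; the sample rate, the 85% roll-off fraction, the test tone and all function names are assumptions made for the example. MFCCs, the temporal centroid and the psychoacoustic features are not covered.

```python
import cmath
import math

FRAME_LEN = 256     # frame length in samples (from the text)
FRAME_SHIFT = 64    # frame shift in samples (from the text)
SAMPLE_RATE = 8000  # Hz; assumed, the paper does not state it


def split_frames(signal, frame_len=FRAME_LEN, shift=FRAME_SHIFT):
    """Cut a signal into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, shift)]


def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a direct O(N^2) DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mags, sample_rate, frame_len):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / frame_len * m for k, m in enumerate(mags)) / total


def spectral_rolloff(mags, sample_rate, frame_len, fraction=0.85):
    """Lowest bin frequency at which the cumulative magnitude reaches `fraction`."""
    threshold = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= threshold:
            return k * sample_rate / frame_len
    return (len(mags) - 1) * sample_rate / frame_len


def spectral_flatness(mags, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    power = [m * m + eps for m in mags]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


# Hypothetical input: a 1 kHz sine tone, 1024 samples long.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(1024)]
frames = split_frames(tone)
mags = magnitude_spectrum(frames[0])
energy = short_time_energy(frames[0])
zcr = zero_crossing_rate(frames[0])
centroid = spectral_centroid(mags, SAMPLE_RATE, FRAME_LEN)
rolloff = spectral_rolloff(mags, SAMPLE_RATE, FRAME_LEN)
flatness = spectral_flatness(mags)
```

With a shift of 64 samples, consecutive frames overlap by 192 samples, so a 1024-sample signal yields 13 frames. A production implementation would normally apply a window function and use an FFT rather than the direct DFT shown here.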
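The 90/10 partition and the binary classification step of Section IV can be illustrated with a small sketch. It is written in Python with the standard library only and is a stand-in, not the authors' MATLAB code. The paper does not state the kernel or training parameters, so a linear SVM trained by stochastic subgradient descent on the regularised hinge loss is assumed here. The data are synthetic two-dimensional points rather than the 183-dimensional feature vectors of the real database, and every function name and hyperparameter is hypothetical.

```python
import random

N_SAMPLES = 1100       # database size reported in the paper
TRAIN_FRACTION = 0.9   # 90% training, 10% testing, as in the paper


def make_toy_data(n, seed=0):
    """Synthetic 2-D stand-in for the feature vectors: two Gaussian clusters."""
    rng = random.Random(seed)
    samples, labels = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        centre = 2.0 * label
        samples.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
        labels.append(label)
    return samples, labels


def split_train_test(samples, labels, train_fraction, seed=0):
    """Shuffle the sample indices and cut them into training and testing parts."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    n_train = round(train_fraction * len(samples))

    def pick(indices):
        return [samples[i] for i in indices], [labels[i] for i in indices]

    return pick(order[:n_train]), pick(order[n_train:])


def decision_value(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(samples, labels, lam=0.01, lr=0.01, epochs=20, seed=0):
    """Minimise the L2-regularised hinge loss by stochastic subgradient descent."""
    rng = random.Random(seed)
    w, b = [0.0] * len(samples[0]), 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            violated = y * decision_value(w, b, x) < 1
            # Regularisation shrinks w on every step.
            w = [wj - lr * lam * wj for wj in w]
            if violated:
                # A margin violation pulls the hyperplane towards the sample's side.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def accuracy(w, b, samples, labels):
    """Fraction of samples whose predicted sign matches the label."""
    hits = sum(1 for x, y in zip(samples, labels)
               if (1 if decision_value(w, b, x) >= 0 else -1) == y)
    return hits / len(samples)


samples, labels = make_toy_data(N_SAMPLES)
(train_x, train_y), (test_x, test_y) = split_train_test(samples, labels, TRAIN_FRACTION)
w, b = train_linear_svm(train_x, train_y)
test_accuracy = accuracy(w, b, test_x, test_y)
print(len(train_x), len(test_x), test_accuracy)
```

With 1100 samples, the 90/10 partition gives 990 training samples and 110 testing samples. The accuracy printed for this toy data says nothing about the 98% reported in the paper, which depends on the real recordings and features.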
V. RESULTS
Table II shows the selected feature vector for the “Zero” to
“Ten” speech samples. After feature extraction, a 183 × 1100
matrix was obtained in MATLAB.
Sr. No | Speech | Feature1 | Feature2 | Feature3 | Feature4
1 | Zero | 0.0088 | 1.8648 | 1.5590 | 0.0510
2 | One | 0.0017 | 1.7391 | 0.0842 | 0.0143
3 | Two | 0.0031 | 0.8596 | 0.3446 | 0.0011
4 | Three | 0.0011 | 1.3387 | 0.7007 | 0.0122
5 | Four | 0.0015 | 1.0564 | 0.6052 | 0.0343
6 | Five | 0.0026 | 1.8625 | 0.8048 | 0.0270
7 | Six | 0.0010 | 0.8931 | 0.4512 | 0.0300
8 | Seven | 0.0017 | 1.1764 | 0.2765 | 0.1019
9 | Eight | 0.0018 | 1.0811 | 0.8806 | 0.0131
10 | Nine | 0.0016 | 1.4517 | 0.8358 | 0.1158
11 | Ten | 0.0011 | 1.1878 | 0.4730 | 0.0458
Table II: Selection of Feature coefficients
Fig. 2 shows the simulation window obtained in MATLAB
for the 1100 samples.
Fig. 2: Database of 1100 samples
Consider a speech sample of the incorrectly spoken word “zero”,
as shown in Fig. 3. After simulation using the SVM
classifier, the corrected word is obtained, as shown in Fig. 4.
Fig. 3: “Zero” speech sample of an individual with an articulatory disorder
Fig. 4: Prediction of corrected word “zero”
CONCLUSION AND FUTURE SCOPE: The
proposed system was implemented using an SVM
classifier and an accuracy of about 98% was obtained. In
this system, articulatory disordered speech was
replaced by the predicted correct speech, thereby
increasing the reliability of the speech recognition system.
In future, the system could also be implemented using an
ANN (Artificial Neural Network) for better
performance, using a standard data set of words and
conversation.
http://www.ijettjournal.org