International Journal of Engineering Trends and Technology (IJETT) – Volume 25 Number 2 – July 2015
ISSN: 2231-5381, http://www.ijettjournal.org

Prediction of Speech for Articulation Disorder

Snehal S. Laghate#1, Sanjivani S. Bhabad*2
# Student, Electronics & Telecommunication, Pune University, Nasik, India

Abstract: Speech is the most natural form of human verbal communication. The main goal of speech recognition is to develop techniques and systems for understanding speech and natural language. The accuracy of automatic speech recognition for articulatory disordered individuals is an important research challenge. This paper aims to improve the accuracy and reliability of present speech recognizers by predicting the intended speech for articulatory substitution errors. A Support Vector Machine is used for classification. Results are obtained in MATLAB R2012b using a database recorded for this purpose.

Keywords: ASR, MATLAB, STFT, MFCC, ZCR, SVM, ANN.

I. INTRODUCTION

The idea of human-machine interaction led to research in speech recognition. An automatic speech recognition (ASR) system converts speech into a string of words by means of an algorithm implemented as a computer program. It involves two major phases: a training phase and a recognition phase. In the training phase, speech is recorded and a parametric representation of the speech is extracted and stored in a speech database. In the recognition phase, features are extracted from the given speech and the ASR system compares them with the stored reference templates to recognise the utterance.

Articulation refers to the specific way an individual produces speech. The approach considered here processes the speech signal, extracts features and finally classifies the signal to detect articulation problems. These problems mainly occur when a person produces sounds or syllables incorrectly, so that listeners do not understand what is being said or have to pay more attention to the way words sound than to what they mean. Articulation errors are mainly classified into substitution, omission and distortion.

This paper is organised as follows: Section II reviews the ASR literature, Section III describes the implementation of the proposed system, Section IV presents the software details, Section V gives the results, and the last section covers the conclusion and future scope.

II. LITERATURE REVIEW

In recent years speech recognition has reached high levels of performance, with word error rates dropping by a factor of five in the past five years. The performance of speech recognition has also increased rapidly, with accuracy rates of 90% obtained using different algorithms and classifiers. Table I below shows the progress in speech recognition systems.

Sr. No | System | Condition | Recognition rate (%)
1 | IBM, The Tangora System (1985) | System dependent: 5000-word vocabulary with natural-language-like grammar with perplexity 160 | 97.1
2 | Iso. K (1990) | Predictive neural network model, speaker dependent, vocabulary: 5000 words | 97.6
3 | Hild. H (1993) | Speaker dependent, 1000 sentences, multimode TDNN | 98.5
4 | IBM ViaVoice | Speaker independent, vocabulary: 32000 Chinese words | 95
5 | INRS | Speaker dependent, vocabulary: 72000 words | 89.5
6 | LIMSI Lab, Gauvain et al. | Large vocabulary continuous speech recognition system using CDHMM with Gaussian mixture, vocabulary: 65122 words | Overall word transcription error 13.6%

Table I: Literature Survey of Speech Recognition System

Developing new and improved technologies for robust, reliable and accurate speech recognition for normal individuals is a popular area of research. For articulatory handicapped individuals, the absence of standard database technologies and the diversity of articulatory handicaps are major obstacles to constructing a reliable speech recognition system. It is therefore of interest to study and develop algorithms for speech recognition for articulatory disordered individuals, as discussed in Section III.

III. PROPOSED SYSTEM

Fig. 1 below shows the proposed system for articulatory disordered individuals.
The input speech considered here is a database of 1100 samples of isolated words, recorded in multiple sessions in a noise-proof room using a RODE NT1 microphone and NUENDO 4 software. The recorded speech was loaded onto a computer through a memory card and the data was stored as .wav files.

Input Speech -> Pre-processing -> Feature Extraction -> Feature Selection -> Classification using SVM -> Corrected Speech

Fig. 1: Proposed system

The input speech considered here is isolated, discontinuous speech, because its word boundaries are easier to recognise than those of continuous speech.

Pre-processing is a crucial step in the development of speech recognition, as it segregates the voiced part of speech from the silence or unvoiced part. The pre-processed data has a large impact on the performance of the speech classifier. The selection of features is also crucial for classification. In this system, 183 features are obtained by considering spectral, temporal and psychoacoustic features. The frame length used is 256 samples with a frame shift of 64 samples. Spectral features are calculated from the short-time Fourier transform (STFT) of every short-time frame of speech. These include spectral centroid, spectral roll-off, spectral flatness and MFCC (Mel frequency cepstral coefficients). Temporal features consist of loudness (energy), ZCR (zero crossing rate) and temporal centroid.

After feature selection, the next step is to classify the signal. The classifier defines decision boundaries in feature space which separate the different classes from each other. Classification techniques can be categorised into supervised and unsupervised techniques. In a supervised technique the target (result) is known and is given as input to the model during the learning process. The supervised learning approach used here is the Support Vector Machine (SVM). The SVM classifier finds the optimal hyperplane that correctly separates (classifies) the largest fraction of data points while maximising the distance of either class from the hyperplane.

IV. SOFTWARE DETAILS

The proposed system for corrected speech is implemented in MATLAB R2012b. It replaces the incorrect speech with the predicted correct speech for the type of articulation error called substitution error. In a substitution error, for example, a child may say /ko/ instead of /two/. For correction of speech, a supervised algorithm, binary SVM, was used. The database of 1100 samples was partitioned into a training set and a testing set. The training set consisted of 90% of the samples and the remaining 10% were assigned to the testing set. In the testing phase, the classifier was evaluated by comparison on the test data. The results showed that an accuracy of 98% is obtained using the SVM classifier.

V. RESULTS

Table II shows the selected feature vector for the speech samples “Zero” to “Ten”. After feature extraction, a 183 x 1100 matrix was obtained in MATLAB.

Sr. No | Speech | Feature1 | Feature2 | Feature3 | Feature4
1 | Zero | 0.0088 | 1.8648 | 1.5590 | 0.0510
2 | One | 0.0017 | 1.7391 | 0.0842 | 0.0143
3 | Two | 0.0031 | 0.8596 | 0.3446 | 0.0011
4 | Three | 0.0011 | 1.3387 | 0.7007 | 0.0122
5 | Four | 0.0015 | 1.0564 | 0.6052 | 0.0343
6 | Five | 0.0026 | 1.8625 | 0.8048 | 0.0270
7 | Six | 0.0010 | 0.8931 | 0.4512 | 0.0300
8 | Seven | 0.0017 | 1.1764 | 0.2765 | 0.1019
9 | Eight | 0.0018 | 1.0811 | 0.8806 | 0.0131
10 | Nine | 0.0016 | 1.4517 | 0.8358 | 0.1158
11 | Ten | 0.0011 | 1.1878 | 0.4730 | 0.0458

Table II: Selection of Feature coefficients

Fig. 2 shows the simulation window obtained in MATLAB for the 1100 samples.

Fig. 2: Database of 1100 samples

Consider a speech sample of the incorrectly spoken word “zero”, as shown in Fig. 3. After simulation using the SVM classifier, the corrected word is obtained as shown in Fig. 4.

Fig. 3: Zero speech sample of articulatory disordered individual

Fig. 4: Prediction of corrected word “zero”

CONCLUSION AND FUTURE SCOPE

The proposed system was implemented using an SVM classifier and an accuracy of about 98% was obtained. In this system, articulatory disordered speech was replaced by the predicted correct speech, thereby increasing the reliability of the speech recognition system. In future, this system can also be implemented using an ANN (Artificial Neural Network) for better performance, using a standard data set of words and conversation.
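The framing and some of the temporal and spectral features described in Section III can be illustrated with a short sketch. The system in this paper is implemented in MATLAB; the Python below is only an illustration and is not the authors' code. It uses the stated frame length of 256 samples and frame shift of 64 samples, and computes short-time energy, ZCR and spectral centroid for a frame. The 8 kHz sampling rate, the synthetic 1 kHz test tone and all function names are assumptions made for the example. MFCC, spectral roll-off, spectral flatness and the psychoacoustic features are not covered.

```python
import cmath
import math


def frame_signal(signal, frame_len=256, frame_shift=64):
    """Split a list of samples into overlapping frames (no padding)."""
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += frame_shift
    return frames


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(frame)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


# Assumed test input: a 1 kHz sine sampled at an assumed rate of 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(1024)]

frames = frame_signal(tone)
print("frames:", len(frames))
print("energy:", short_time_energy(frames[0]))
print("ZCR:", zero_crossing_rate(frames[0]))
print("centroid (Hz):", spectral_centroid(frames[0], SAMPLE_RATE))
```

For the pure tone used here, the energy is close to 0.5, the ZCR is close to 0.25 (two crossings per eight-sample period) and the centroid is close to 1000 Hz. A practical implementation would use an FFT rather than the naive DFT, which is kept only to make the sketch self-contained.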
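The 90%/10% partition and the binary SVM classification described in Sections III and IV can also be sketched in a few lines. This is a hedged illustration in Python, not the MATLAB implementation used in this paper, whose SVM training routine is not specified. The sketch swaps in a plain linear soft-margin SVM trained by batch subgradient descent on the hinge loss. The toy two-dimensional data, the hyperparameters (lam, lr, epochs) and all function names are assumptions made for the example; the 183-dimensional speech features are not reproduced.

```python
def split_indices(n_samples, train_percent=90):
    """Return (train, test) index lists; the first train_percent% train."""
    n_train = n_samples * train_percent // 100
    indices = list(range(n_samples))
    return indices[:n_train], indices[n_train:]


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2*||w||^2 + mean(hinge loss) by batch subgradient descent.

    Labels in y must be +1 or -1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the subgradient.
            if yi * decision_value(w, b, xi) < 1:
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for xi, yi in zip(X, y) if predict(w, b, xi) == yi)
    return correct / len(X)


# 90/10 partition of a database of 1100 samples, as stated in Section IV.
train_idx, test_idx = split_indices(1100)
print(len(train_idx), len(test_idx))  # 990 110

# Hypothetical toy data: two well-separated 2-D classes (not speech features).
X_toy = [[2, 2], [3, 2], [2, 3], [3, 3],
         [-2, -2], [-3, -2], [-2, -3], [-3, -3]]
y_toy = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X_toy, y_toy)
print(accuracy(w, b, X_toy, y_toy))  # 1.0
print(predict(w, b, [2.5, 2.5]), predict(w, b, [-2.5, -2.5]))  # 1 -1
```

With 1100 samples the partition gives 990 training and 110 testing samples. The split here simply takes the first 90% of the indices; whether the partition in this paper was random or ordered is not stated, so a shuffled split may be more appropriate in practice.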