International Journal of Application or Innovation in Engineering & Management (IJAIEM), Volume 2, Issue 6, June 2013, ISSN 2319 - 4847

Speech Evaluation with Special Focus on Children Suffering from Problems of Vocal Tract

Manasi Dixit1, Dr. Shaila Apte2
1 Department of Electronics, K.I.T.’s College of Engineering, Kolhapur, 416234, Maharashtra, India
2 Department of Electronics and Communication Engineering, Rajarshi Shahu College of Engineering, Pune, Maharashtra, India

Abstract
Speech disorders are particularly complex in individuals suffering from problems of the vocal tract. In this paper, pathological cases of speech-disabled children affected by various vocal tract health problems are analyzed. Speech samples from children aged three to ten years are considered for the present study. The speech signals are digitized, and fundamental frequency analysis, formant analysis and harmonic analysis are carried out. The analysis is conducted on speech samples that cover both place of articulation and manner of articulation. The speech disability of the pathological subjects is estimated from the results of this analysis. The speech samples are further processed using glottal analysis.

Keywords: Pitch, Formants, Jitter, Shimmer, Close Quotient estimation

1. INTRODUCTION
Many children are mentally retarded and speech disabled. Speech disorders may be acquired or developmental. Their characteristics include slow speech, sound distortions, prolonged durations of sounds, reduced prosody, consistent errors within an utterance, difficulty initiating speech, and groping attempts to find the correct articulatory position. Vocal pathologies arise from accidents, diseases, misuse of the voice, or surgery affecting the vocal folds, and have a drastic impact on the patient’s life.
Most voice-related pathologies are due to irregular masses or other pathologies in the larynx, such as vocal fold nodules or vocal fold polyps, which are located on the vocal folds and interfere with their normal, regular vibration [1,3]. This causes a decrease in voice quality. Voice quality is analyzed by estimating parameters that indicate amplitude and frequency perturbations, the level of air leakage, the degree of turbulence, and glottis functionality.

1.1. Voice Quality Analysis
As the vocal folds vibrate, they may be more or less stretched so as to produce higher or lower pitch tones. Under normal conditions everybody has this ability to vary pitch, which leads to regular, proper phonation. Any change in the vocal fold tissue can cause an irregular, non-periodic vibration that changes the shape of the glottal source signal from one period to the next, introducing jitter [9]. The same problem can occur in amplitude. If the vocal folds are too stiff, they need a higher subglottal pressure to vibrate. The glottal cycle can thus also be irregularly disturbed in amplitude, leading to shimmer. A partial closure of the vocal folds causes air to leak through the glottis, producing turbulence, which creates high-frequency noise during the closed phase of the glottal cycle. All these phenomena affect the glottal source signal.

Estimating the glottal source signal from the voice signal is a complex problem [4,8]. The influence of the vocal tract is approximated by a linear filter. Under this approximation, the voice signal can be passed through the inverse of this filter to obtain an estimate of the glottal source signal [9]. Irregularities in the vibration of the vocal folds are commonly measured by the jitter parameter. Jitter measures the irregularities in a quasi-periodic signal.
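The inverse-filtering idea mentioned above can be illustrated with a small sketch. The paper does not say how the vocal tract filter is estimated; the sketch below assumes linear prediction (autocorrelation method with the Levinson-Durbin recursion), which is one common choice and not necessarily the authors' method. The synthetic signal, the sampling rate, the single 700 Hz resonance and all function names are illustrative assumptions.

```python
import math

def autocorrelation(x, max_lag):
    """Return r[0..max_lag], treating x as zero outside its range."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the linear-prediction normal equations.

    Returns [1, a1, ..., ap], the coefficients of the inverse filter A(z).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def inverse_filter(a, x):
    """Apply the FIR filter A(z): e[n] = sum_k a[k] * x[n - k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if k <= n)
            for n in range(len(x))]

def all_pole_filter(a, x):
    """Apply 1/A(z); used here only to synthesise a test signal."""
    y = []
    for n, xn in enumerate(x):
        y.append(xn - sum(a[k] * y[n - k] for k in range(1, len(a)) if k <= n))
    return y

def energy(x):
    return sum(v * v for v in x)

# Assumed toy "voice": an impulse train (one pulse every 80 samples)
# passed through a single resonance standing in for the vocal tract.
fs = 8000.0
radius, freq = 0.95, 700.0
theta = 2.0 * math.pi * freq / fs
a_true = [1.0, -2.0 * radius * math.cos(theta), radius ** 2]
excitation = [1.0 if n % 80 == 0 else 0.0 for n in range(800)]
voice = all_pole_filter(a_true, excitation)

# Estimate the filter from the voice alone, then undo it.
a_est = levinson_durbin(autocorrelation(voice, 2), 2)
residual = inverse_filter(a_est, voice)

print("true A(z):     ", [round(c, 3) for c in a_true])
print("estimated A(z):", [round(c, 3) for c in a_est])
print("residual / voice energy:", round(energy(residual) / energy(voice), 3))
```

The residual plays the role of the estimated source signal: in this toy case it is again close to a pulse train, and its pulse spacing is what a jitter measurement would operate on. Real speech needs a higher filter order, windowing and a model of lip radiation, none of which are shown here.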
The jitter of a voiced speech signal is usually taken as a measure of the change in duration of consecutive glottal cycles. The vocal fundamental frequency is an indicator of the biomechanical characteristics of the vocal folds as they interact with subglottal pressure. Jitter is defined as the small cycle-to-cycle changes in frequency, or frequency perturbations. The amount of variability or perturbation in the vocal signal reflects the stability of the vocal mechanism and its ability to make the necessary phonatory adjustments during speech. When fundamental frequency increases, jitter decreases [2,6]. Pitch perturbation is expected to vary with the degree of tension in the vocal folds: high tension gives lower perturbation values and low tension gives higher perturbation values. Voice onset and termination also affect the pitch, jitter and shimmer parameters.

2. Methodology
2.1. Procedure
The present work is based on a study of children whose mother tongue is Marathi. Speech data were collected from normal children and pathological children of the same age group, 3 to 10 years. The children were trained to utter similar words before recording. The speech data consist of isolated words, connected words, fast-uttered sentences and songs, e.g. Prarthana (school prayer), the national anthem and pledge, nursery rhymes and famous film songs. The speech data were recorded in digital form using a Sony Intelligent Portable Ocular Device (IPOD). The recording was carried out in a pleasant atmosphere, keeping the children in a tension- and stress-free environment. The recorded signal was converted to a .wav file using GOLDWAVE software.
The data were collected at Chetana Vikas Mandir, a special school in Kolhapur, India, established to educate mentally retarded children as well as children with various disorders. Data were also collected from patients under the treatment of a speech therapist in Kolhapur city.

2.2. Formant and Pitch Variation Analysis
Formant analysis was carried out for particular isolated words. The utterances of 20 normal subjects were analyzed and a reference/threshold level was established for each phoneme. Various misarticulation cases were analyzed for the pathological subjects [11]. Spectrograms were studied for the formant analysis. Fast-uttered words and continuous sentences exhibit greater complexity with respect to speech intelligibility. The fundamental frequency f0 and its variations in the speech data of various normal and pathological subjects are given in Table-II. The formants f1, f2 and f3 for the different normal and pathological subjects are given in Table-III.

2.3. Jitter Measurements
The jitter, shimmer and close quotient deviation parameters are given in Table-IV. Graphs of the deviations in close quotient and of the % jitter estimate for a pathological subject are included in the figure. The results obtained with the proposed MATLAB algorithm were verified using the open-source Praat and SFS software.

Praat estimates jitter by computing the average absolute difference between consecutive periods (from the period sequence P0(n)), divided by the average period and expressed as a percentage. This measure is commonly referred to as percent jitter or relative jitter; in Praat it is called Jitter (local). Jitta is the absolute jitter value expressed in microseconds.

$$ \text{Jitter} = 100 \times \frac{\dfrac{1}{N-1}\sum_{n=1}^{N-1} \left| P_0(n+1) - P_0(n) \right|}{\dfrac{1}{N}\sum_{n=1}^{N} P_0(n)} $$

where P0(n) is the sequence of pitch period lengths measured in microseconds and N is the number of periods.
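The jitter definition above translates directly into a few lines of code. The following is a minimal sketch, not the authors' MATLAB algorithm; the function names and the example period values are hypothetical, and the pitch periods are assumed to have been extracted already.

```python
def absolute_jitter(periods):
    """Mean absolute difference between consecutive pitch periods.

    The result has the same unit as the input (microseconds here).
    """
    if len(periods) < 2:
        raise ValueError("need at least two pitch periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return sum(diffs) / len(diffs)

def jitter_local_percent(periods):
    """Absolute jitter divided by the mean period, as a percentage."""
    mean_period = sum(periods) / len(periods)
    return 100.0 * absolute_jitter(periods) / mean_period

# Hypothetical pitch periods in microseconds (about 250 Hz).
periods_us = [4000.0, 4040.0, 3960.0, 4000.0]
print("absolute jitter (us):", absolute_jitter(periods_us))
print("relative jitter (%): ", jitter_local_percent(periods_us))
```

For the illustrative sequence the consecutive differences are 40, 80 and 40 µs, so the absolute jitter is 160/3 ≈ 53.3 µs; with a mean period of 4000 µs the relative jitter is about 1.33 %. A perfectly periodic sequence gives zero.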
The speech signal can be expressed as a sum of pulse signals generated at the glottis that pass through the vocal tract and are radiated via the oral cavity or lips; that is, the speech signal is represented as the output of a convolution, as follows. Each pulse x_i(t), prior to averaging, is represented as a repetitive signal s(t) plus a noise term e_i(t):

$$ x_i(t) = s(t) + e_i(t) \tag{1} $$

This representation has been used for both source and radiated signals. If we denote the glottal flow waveform as g(t), the vocal tract impulse response as h(t), the radiation at the lips as r(t), and the turbulent noise generated at the glottis as n(t), the components of the pulse waveform in (1) can be expressed differently for the source and radiated signals [10]. If (1) represents the excitation signal, then s(t) = g(t) and e(t) = n(t), while for radiated signals s(t) = g(t) ∗ h(t) ∗ r(t) and e(t) = n(t) ∗ h(t) ∗ r(t).

3. Observations

Table-I. List of specific Marathi letters used with reference to place of articulation and manner of articulation

| Sr. No. | Place of articulation | Manner of articulation | English Letters | Marathi Letters |
|---|---|---|---|---|
| 1 | bilabial | plosive | p, b | Pa f ba Ba |
|   | bilabial | nasal | m | ma |
|   | bilabial | approximant | w | va |
| 2 | labio-dental | fricative | f, v | f vh |
| 3 | dental | fricative | th, th | t qa d Qa |
| 4 | alveolar | plosive | t, d | TzDZ |
|   | alveolar | nasal | n | na Na |
|   | alveolar | fricative | s, z | sa |
|   | alveolar | approximant | r/l | r la L |
| 5 | post-alveolar | fricative | sh, zh | Ya Sa Ja |
|   | post-alveolar | affricate | ch, j | ca C ja |
|   | post-alveolar | approximant | y | ya |
| 6 | velar | plosive | k, g | k K ga Ga |
|   | velar | nasal | ng | D. |
| 7 | glottal | fricative | h | h |

Table-II. Estimation of f0 and variations of f0 for different subjects

| Sr. No. | Subject | f0 (Hz) | Std Dev of f0 (Hz) | Coefficient of Variance of f0 | Remarks |
|---|---|---|---|---|---|
| 1 | Speaker 1 | 256.4 | 17.80 | 0.07 | Young Normal Female Subject |
| 2 | Speaker 2 | 256.0 | 48.33 | 0.19 | Old Normal Female Subject |
| 3 | Speaker 3 | 275.9 | 32.38 | 0.12 | Pathological Female Subject |
| 4 | Speaker 4 | 259.6 | 103.83 | 0.4 | Pathological Female Subject |
| 5 | Speaker 5 | 128.3 | 58.39 | 0.46 | Old Normal Male Subject |
| 6 | Speaker 6 | 139.1 | 5.23 | 0.04 | Young Normal Male Subject |
| 7 | Speaker 7 | 242.0 | 69.81 | 0.29 | Pathological Male Subject |
| 8 | Speaker 8 | 281.3 | 109.49 | 0.39 | Pathological Male Subject |

Table-III. Formants and variations of f1, f2 and f3 for different subjects (values in Hz)

| Sr. No. | Subject | f1 Min | f1 Max | f2 Min | f2 Max | f3 Min | f3 Max |
|---|---|---|---|---|---|---|---|
| 1 | Speaker 1 | 236.44 | 1336.95 | 896.21 | 3016.47 | 2409.10 | 4322.01 |
| 2 | Speaker 2 | 84.20 | 2424.70 | 378.66 | 3528.44 | 1793.05 | 5131.69 |
| 3 | Speaker 3 | 300.59 | 929.49 | 899.07 | 2438.73 | 1594.21 | 3242.33 |
| 4 | Speaker 4 | 236.36 | 1766.55 | 992.83 | 3750.99 | 1665.57 | 4394.36 |
| 5 | Speaker 5 | 65.67 | 1691.07 | 729.69 | 2749.12 | 1533.77 | 3381.39 |
| 6 | Speaker 6 | 215.01 | 1179.32 | 969.74 | 2318.21 | 1682.87 | 3090.18 |
| 7 | Speaker 7 | 267.73 | 981.88 | 764.06 | 2462.33 | 1329.79 | 3516.75 |
| 8 | Speaker 8 | 270.43 | 1672.05 | 781.70 | 3068.25 | 1714.02 | 4327.78 |

Table-IV. Estimation of Jitter, Shimmer and Close Quotient for different subjects

| Sr. No. | Subject | % Jitter | % Shimmer | % Close Quotient Mean | % Close Quotient Std Dev |
|---|---|---|---|---|---|
| 1 | Speaker 1 | 1.553 | 6.077 | 49.5 | 11.9 |
| 2 | Speaker 2 | 2.482 | 7.951 | 57.8 | 14.4 |
| 3 | Speaker 3 | 1.08 | 4.313 | 38.4 | 14.8 |
| 4 | Speaker 4 | 0.889 | 6.186 | 47.3 | 14.2 |
| 5 | Speaker 5 | 1.494 | 9.33 | 32.0 | 12.9 |
| 6 | Speaker 6 | 3.017 | 8.365 | 29.8 | 13.2 |
| 7 | Speaker 7 | 2.897 | 11.316 | 45.2 | 16.3 |
| 8 | Speaker 8 | 3.298 | 10.859 | 44.5 | 16.8 |

The close quotient deviations and variations in % jitter parameters for pathological female subject speaker number 4 are given below.

4.
CONCLUSION
The pathological subjects affected by speech disorders related to the vocal folds exhibit very weak consonant-vowel-consonant (CV-VC-CVC) production. Various types of misarticulation errors occur in different subjects. For the pathological subjects, it is observed that all the diagnostic markers, namely standard deviation of pitch, variance of pitch, jitter, shimmer, and close quotient mean and deviation, exhibit a higher range of values. The formants were seen to be widely spread in the pathological subjects.

Acknowledgment
The authors would like to thank the participants of the research work, namely the normal children, the mentally retarded children with speech disorders related to the vocal folds, the Director of the special school Chetana Vikas Mandir, Mr. Pawan Khebudkar, and the patients who participated while under the treatment of a speech therapist in Kolhapur city.

References
[1.] Daryush Mehta, “Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification,” Massachusetts Institute of Technology, February 2006.
[2.] Kumara Shama, Anantha Krishna, and Niranjan U. Cholayya, “Study of Harmonics-to-Noise Ratio and Critical-Band Energy Spectrum of Speech as Acoustic Indicators of Laryngeal and Voice Pathology,” EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Corporation, Volume 2007, Article ID 85286, 9 pages, doi:10.1155/2007/85286.
[3.] Constantine Kotropoulos and Gonzalo R. Arce, “Linear Classifier with Reject Option for the Detection of Vocal Fold Paralysis and Vocal Fold Edema,” EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Corporation, Volume 2009, Article ID 203790, 13 pages, doi:10.1155/2009/203790.
[4.] Zoran Ćirović and Milan Milosavljevic, “Multimodal Speaker Verification Based on Electroglottograph Signal and Glottal Activity Detection,” EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Corporation, Volume 2010, Article ID 930376, 8 pages, doi:10.1155/2010/930376.
[5.] Arpit Mathur, Shankar M Reddy, and Rajesh M Hegde, “Significance of Parametric Spectral Ratio Methods in Detection and Recognition of Whispered Speech,” EURASIP Journal on Advances in Signal Processing, doi:10.1186/1687-6180-2012-157.
[6.] Dolle Deann Pagel, “Vocal Fold Perturbation Rates: Comparison between Multiple Sclerosis and Normal Subjects,” M.S. in Speech and Hearing Sciences, Texas Tech University.
[7.] Dárcio G. Silva, Luís C. Oliveira, and Mário Andrea, “Jitter Estimation Algorithms for Detection of Pathological Voices,” EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Corporation, Volume 2009, Article ID 567875, 9 pages, doi:10.1155/2009/567875.
[8.] Peter J. Murphy and Olatunji O. Akande, “Noise estimation in voice signals using short-term cepstral analysis,” Acoustical Society of America, 2007, pages 1679–1690, DOI: 10.1121/1.2427123, PACS numbers: 43.70.Gr, 43.70.Dn.
[9.] Dárcio G. Silva, Luís C. Oliveira, and Mário Andrea, “Jitter Estimation Algorithms for Detection of Pathological Voices,” EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Corporation, Volume 2009, Article ID 567875, 9 pages, doi:10.1155/2009/567875.
[10.] Carlos Ferrer and Eduardo Gonzalez, “Removing the Influence of Shimmer in the Calculation of Harmonics-To-Noise Ratios Using Ensemble-Averages in Voice Signals,” EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Corporation, Volume 2009, Article ID 784379, 7 pages, doi:10.1155/2009/784379.
[11.] Anandthirtha B. Gudi, H. K. Shreedhar, and H. C. Nagaraj, “Estimation of Severity of Speech Disability through Speech Envelope,” Signal & Image Processing: An International Journal (SIPIJ), Vol. 2, No. 2, June 2011.

Biography
Mrs. Manasi Dixit is working as Associate Professor in KIT’s College of Engineering, Kolhapur. Her teaching experience is 28 years. Her main fields of interest are Digital Signal Processing, Speech Processing, Image Processing and Microwave Engineering. 16 PG students have completed their research work and have been awarded the M.E. (E&TC) degree under her guidance. She has published 14 papers in reputed international journals. She has served as a Senate member of Shivaji University, Kolhapur, and as a member of the Board of Studies (BOS) for Electronics and Telecommunication Engineering, Shivaji University, Kolhapur.

Prof. Dr. Shaila Dinkar Apte is currently working as a professor on the PG side in Rajarshi Shahu College of Engineering, Pune, and as a reviewer for the International Journal of Speech Technology (Springer) and the International Journal of Digital Signal Processing (Elsevier). She is currently guiding 5 Ph.D. candidates. Four candidates have completed their Ph.D. under her guidance. About 50 candidates have completed their M.E. dissertations under her guidance. Almost all dissertations are in the area of signal processing. She has 32 years of teaching experience in electronics engineering and enjoys great popularity amongst students. She has been teaching Digital Signal Processing and Advanced Digital Signal Processing for the last 18 years. Her previous positions include Assistant Professor in Walchand College of Engineering, Sangli, for 27 years; member of the board of studies for Shivaji University; and principal investigator for a research project sponsored by ARDE, New Delhi.
She has published 28 papers in reputed international journals, more than 40 papers in international conferences and about 15 papers in national conferences. She has a published patent to her credit, related to the generation of a mother wavelet from a speech signal. Her books titled “Digital Signal Processing” and “Speech and Audio Processing” are published by Wiley India. A second reprint of the second edition of the first book is on the market. The book titled “Advanced Digital Signal Processing” is due to be published in June 2013 by Wiley.