Koneru Dileep Narayan

Praat Prosodic Feature Extraction Tool

INTRODUCTION: There has been increasing interest in utilizing a wide variety of knowledge sources to perform automatic tagging of speech events, such as sentence boundaries and dialogue acts. In addition to the words spoken, the prosodic content of speech has proved quite valuable in a variety of spoken language processing tasks, such as sentence segmentation and tagging, disfluency detection, dialog act segmentation and tagging, and speaker recognition. In this project I used an open source prosodic feature extraction tool, implemented as a set of Praat scripts, to perform the prosodic analysis.

FEATURE DESCRIPTION:

Duration features: Duration features are obtained from the word and phone alignments of human transcriptions (or ASR output). The pause duration after each word boundary and its normalization are extracted. We also measure the duration of the last vowel and the last rhyme, as well as their normalizations, for each word preceding a boundary. The duration and the normalized duration of each word are also included as duration features.

F0 features: Praat's autocorrelation-based pitch tracker is used to obtain raw pitch values. The pitch baseline and topline, as well as the pitch range, are computed from the mean and variance of the logarithmic F0 values. Voiced/unvoiced (VUV) regions are identified, and the original pitch contour is stylized over each voiced segment. Several different types of F0 features are computed from the stylized pitch contour.

Range features: These features reflect the pitch range of a single word or of a window preceding or following a word boundary. They include the minimum, maximum, mean, and last F0 values of a specific region (i.e., within a word or window) relative to each word boundary. These features are also normalized by the baseline F0 values, the topline F0 values, and the pitch range, using linear difference and log difference.
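The range-feature normalizations described above can be sketched as follows. This is a minimal illustration, not the tool's actual Praat implementation; the feature names (e.g., `min_minus_baseline`) are hypothetical, and the baseline/topline values are assumed to have been estimated per speaker from the log-F0 statistics.

```python
import math

def range_features(f0_values, baseline, topline):
    """Range features for one region (word or window) -- a minimal sketch.

    f0_values: stylized F0 samples (Hz) from the voiced frames of the region.
    baseline, topline: speaker-level pitch baseline and topline (Hz).
    """
    stats = {
        "min": min(f0_values),
        "max": max(f0_values),
        "mean": sum(f0_values) / len(f0_values),
        "last": f0_values[-1],
    }
    pitch_range = topline - baseline
    feats = {}
    for name, v in stats.items():
        feats[f"{name}_f0"] = v
        # Normalizations: linear difference and log difference against the
        # baseline and topline, plus a position within the pitch range.
        feats[f"{name}_minus_baseline"] = v - baseline
        feats[f"{name}_log_ratio_baseline"] = math.log(v / baseline)
        feats[f"{name}_minus_topline"] = v - topline
        feats[f"{name}_log_ratio_topline"] = math.log(v / topline)
        feats[f"{name}_norm_by_range"] = (v - baseline) / pitch_range
    return feats
```

For example, a word whose stylized F0 peaks at 150 Hz against a 90 Hz baseline and a 200 Hz topline sits a little over halfway up the speaker's pitch range.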
Movement features: These features measure the movement of the F0 contour over the voiced regions of the word or window preceding a boundary and the word or window following it. The minimum, maximum, mean, first, and last stylized F0 values of the preceding region are computed and compared to those of the following word or window, using log difference and log ratio.

Slope features: Pitch slope is computed from the stylized pitch values. The last slope value of the word preceding a boundary and the first slope value of the word following it are extracted. We also include the slope difference and the dynamic pattern (i.e., falling, rising, or unvoiced) across a boundary as slope features, since a continuous trajectory is more likely to correlate with non-boundaries, whereas a broken trajectory tends to indicate a boundary of some type.

Energy features: The energy features are computed from the intensity contour produced by Praat. Similar to the F0 features, a variety of energy-related range features, movement features, and slope features are computed, using various normalization methods.

Other features: We add the speaker's gender to the feature set. Currently the gender information is provided in a metadata file rather than obtained via automatic gender detection.

USING THE TOOL: A step-by-step procedure is given here for extracting the prosodic features from a corpus. With audio and time-aligned words and phones as input, the tool first extracts a set of basic elements (e.g., raw pitch, stylized pitch, VUV) representing duration, F0, and energy information, as shown in Figure 1(c). Then a set of duration statistics (e.g., the means and variances of pause duration, phone duration, and last rhyme duration), F0-related statistics (e.g., the mean and variance of logarithmic F0 values), and energy-related statistics are calculated.
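The slope features around a boundary can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the tool's Praat code: stylized pitch is taken as a list of (time, F0) nodes per voiced segment, and the feature names (`LAST_SLOPE`, `FIRST_SLOPE`, `SLOPE_DIFF`, `PATTERN`) are hypothetical.

```python
def segment_slopes(points):
    """Slopes (Hz per second) between consecutive stylized F0 nodes.

    points: list of (time_s, f0_hz) nodes from one voiced segment.
    """
    return [
        (f1 - f0) / (t1 - t0)
        for (t0, f0), (t1, f1) in zip(points, points[1:])
    ]

def boundary_slope_features(prev_points, next_points):
    """Slope features across a word boundary (a sketch).

    Returns the last slope of the preceding word, the first slope of the
    following word, their difference, and a coarse dynamic pattern
    (rising/falling/unvoiced on each side of the boundary).
    """
    last_slope = segment_slopes(prev_points)[-1] if len(prev_points) > 1 else None
    first_slope = segment_slopes(next_points)[0] if len(next_points) > 1 else None

    def pattern(slope):
        if slope is None:
            return "unvoiced"          # no voiced F0 nodes on this side
        return "rising" if slope >= 0 else "falling"

    return {
        "LAST_SLOPE": last_slope,
        "FIRST_SLOPE": first_slope,
        "SLOPE_DIFF": (first_slope - last_slope)
                      if None not in (last_slope, first_slope) else None,
        "PATTERN": pattern(last_slope) + "-" + pattern(first_slope),
    }
```

A continuous trajectory yields a small `SLOPE_DIFF`, while a reset across the boundary (e.g., rising then falling) yields a large one, which is the intuition behind using these features for boundary detection.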
The requirements for the tool, its basic elements, and its implementation are shown below. Figure 1 shows the procedures used to obtain the basic elements that are directly needed for prosodic feature extraction. The grayed ovals represent operations implemented in the tool, while the grayed rectangles represent the basic elements. Note that Forced Alignment is not a part of the tool, and so it appears in white. The use of the basic elements for the various feature calculations can also be viewed below.

STATISTICAL FEATURE COMPUTATION: A metadata file is needed to provide the session ID, speaker ID, gender, and the path (absolute, or relative to the Praat script) to the audio file. The tool supports multiple sessions per speaker and makes use of the speaker information across sessions for normalization. The TextGrid-format word/phone alignments are assumed to be in the same directory as the audio file, and their file names (as well as the names of the other files generated for that audio recording) are hard-coded based on the name of the audio file. For example, if the audio file is ../demo/data/demo_C.wav, then the word and phone alignment files are both located in the directory ../demo/data and are named demo_C-word.TextGrid and demo_C-phone.TextGrid respectively. Below is the metadata file demo-wavinfo_list.txt:

SESSION   SPEAKER   GENDER   WAVEFORM
demo_C    C         female   ../demo/data/demo_C.wav
demo_D    D         male     ../demo/data/demo_D.wav
demo_E    E         male     ../demo/data/demo_E.wav
demo_F    F         male     ../demo/data/demo_F.wav
demo_G    G         male     ../demo/data/demo_G.wav

Here is an example of running stats_batch.praat at the command line:

praat stats_batch.praat ../demo-wavinfo_list.txt ../demo/work_dir yes

Below are the steps to run the same example in the Praat ScriptEditor:
1. Run Praat.
2. Open stats_batch.praat from "Read > Read from file..." on the menu of Praat Objects.
3. Click "Run > Run" on the menu of the ScriptEditor.
4. Enter parameters.
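For readers who want to preprocess or validate the metadata table outside Praat, it can be parsed as a simple whitespace-delimited file. This is a sketch; the real tool reads the table from within its Praat scripts, and `read_wavinfo` is a hypothetical helper, not part of the tool.

```python
from pathlib import Path

def read_wavinfo(path):
    """Parse a metadata file of the demo-wavinfo_list.txt form.

    Expects a header line 'SESSION SPEAKER GENDER WAVEFORM' followed by
    one session per line; returns a list of dicts, one per session.
    """
    lines = Path(path).read_text().splitlines()
    header = lines[0].split()
    assert header == ["SESSION", "SPEAKER", "GENDER", "WAVEFORM"], \
        f"unexpected header: {header}"
    records = []
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        session, speaker, gender, wav = line.split()
        records.append({"session": session, "speaker": speaker,
                        "gender": gender, "waveform": wav})
    return records
```

Note that a speaker may appear in several sessions; the tool pools statistics across a speaker's sessions for normalization, so grouping the parsed records by `speaker` mirrors that behavior.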
Type ../demo-wavinfo_list.txt and ../demo/work_dir in the two boxes, and then check "yes" in the radio box.
5. Click "OK" to start processing with the configurations, or "Cancel" to close the interface. Clicking the "Apply" button (if available) also starts processing, but it keeps the interface open after the work is done. The "Standards" button (if available) restores the default configurations.
6. Process-related information is displayed in the Praat Info window.

After computation is complete, the statistics files can be found at ../demo/work_dir/stats_files. The desired output features can be selected by including them in the "output prosodic feature selection list" file. This file is a one-column table with "FEATURE NAME" as the column label on the first line, followed by one feature name or one feature class name per line. By convention, the name of a string feature, such as `GEN$`, ends with the symbol `$`, and the name of a numeric feature does not end with `$`. Below is an example of the "output prosodic feature selection list" file:

FEATURE NAME
WORD$
WAV$
SPKR_ID$
GEN$
PAUSE_DUR_NORM
LAST_RHYME_DUR
DERIVE_FEATURE

Below are the steps to run main_batch.praat in the Praat ScriptEditor:
1. Run Praat.
2. Open main_batch.praat from "Read > Read from file..." on the menu of Praat Objects.
3. Click "Run > Run" on the menu of the ScriptEditor.
4. Enter parameters. Type ../demo-wavinfo_list.txt, user_pf_name_table.Tab, ../demo/work_dir/stats_files, and ../demo/work_dir one by one in the four boxes, and then check "yes" in the radio box.
5. Click "OK" to start processing with the configurations, or "Cancel" to close the interface. Clicking the "Apply" button (if available) also starts processing, but it keeps the interface open after the work is done. The "Standards" button (if available) restores the default configurations.
6. Process-related information is displayed in the Praat Info window.
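The selection-list format described above is simple enough to generate programmatically. The following sketch writes such a file and checks the `$` naming convention; `write_selection_list` and `is_string_feature` are hypothetical helpers for illustration, not part of the tool.

```python
def is_string_feature(name):
    """True if the name follows the string-feature convention (ends in '$')."""
    return name.endswith("$")

def write_selection_list(path, features):
    """Write an "output prosodic feature selection list" file (a sketch).

    features: iterable of feature or feature-class names, e.g. 'WORD$',
    'GEN$', 'PAUSE_DUR_NORM'. One name per line under 'FEATURE NAME'.
    """
    with open(path, "w") as f:
        f.write("FEATURE NAME\n")          # required column label
        for name in features:
            f.write(name + "\n")
```

Keeping this list short is the main way to control which columns appear in the output prosodic feature tables.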
After computation is complete, the prosodic feature files can be found at ../demo/work_dir/pf_files. The intermediate and final objects can be inspected in Praat: the intensity tier, the pitch, the raw pitch, the stylized pitch, and the prosodic feature table each have an associated information view.

[Figures: intensity tier and its information; pitch and its information; raw pitch and its information; stylized pitch and its information; prosodic feature table and its information]

RESULT: The prosodic features of a speech signal whose word and phone alignments are given are extracted, and the extracted prosodic features are stored in the work directory under pf_files.

CONCLUSION: The extracted prosodic features can be used along with a language model to build a speech event detection system.

BIBLIOGRAPHY:
[1] Zhongqiang Huang, Lei Chen, and Mary Harper. An Open Source Prosodic Feature Extraction Tool.
[2] The manual for the Purdue prosodic feature extraction tool.