
Koneru Dileep Narayan
Praat Prosodic Feature Extraction Tool
INTRODUCTION:
There has been increasing interest in utilizing a wide variety of knowledge sources to
perform automatic tagging of speech events, such as sentence boundaries and dialogue acts. In
addition to the words spoken, the prosodic content of speech has proved valuable in
a variety of spoken language processing tasks, such as sentence segmentation and tagging,
disfluency detection, dialog act segmentation and tagging, and speaker recognition. In this
project, an open-source prosodic tool was used to perform the prosodic analysis. The tool is
implemented in Praat.
Feature Description
Duration features: Duration features are obtained based on the word and phone alignments of
human transcriptions (or ASR output). Pause duration and its normalization are extracted after
each word boundary. We also measure the duration of the last vowel and the last rhyme, as
well as their normalizations, for each word preceding a boundary. The duration and the
normalized duration of each word are also included as duration features.
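As a concrete illustration, the pause- and word-duration features above can be sketched in Python. This is a sketch only: the tool itself computes these in Praat script, and the z-score normalization against speaker-level duration statistics (`mean_dur`, `std_dur`) is an assumption.

```python
def duration_features(words, mean_dur, std_dur):
    """Sketch of duration features from word alignments.

    `words` is a list of (word, start, end) tuples in seconds, sorted by time.
    `mean_dur`/`std_dur` are assumed speaker-level word-duration statistics.
    """
    feats = []
    for i, (w, start, end) in enumerate(words):
        dur = end - start
        # Pause after this word boundary (0.0 after the final word).
        pause = words[i + 1][1] - end if i + 1 < len(words) else 0.0
        feats.append({
            "word": w,
            "dur": dur,
            "norm_dur": (dur - mean_dur) / std_dur,  # z-score normalization
            "pause_dur": pause,
        })
    return feats
```

The last-vowel and last-rhyme durations would be computed analogously from the phone alignments.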
F0 features: Praat's autocorrelation-based pitch tracker is used to obtain raw pitch values. The
pitch baseline and topline, as well as the pitch range, are computed based on the mean and
variance of the logarithmic F0 values. Voiced/unvoiced (VUV) regions are identified and the
original pitch contour is stylized over each voiced segment. Several different types of
F0 features are computed based on the stylized pitch contour.
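A minimal sketch of how the baseline, topline, and pitch range might be derived from the mean and variance of the log F0 values. The two-standard-deviation offsets below are an assumption for illustration, not necessarily the tool's exact formula:

```python
import math

def f0_statistics(f0_values):
    """Baseline, topline, and range from log-F0 mean and variance (sketch)."""
    logs = [math.log(f) for f in f0_values if f > 0]  # skip unvoiced frames
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    std = math.sqrt(var)
    baseline = math.exp(mean - 2 * std)  # assumed offset: mean - 2 std dev
    topline = math.exp(mean + 2 * std)   # assumed offset: mean + 2 std dev
    return baseline, topline, topline - baseline
```

Working in the log domain keeps the statistics robust to the multiplicative nature of pitch variation.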
Range features: These features reflect the pitch range of a single word or a window preceding
or following a word boundary. These include the minimum, maximum, mean, and last F0 values
of a specific region (i.e., within a word or window) relative to each word boundary. These
features are also normalized by the baseline F0 values, the topline F0 values, and the pitch range
using linear difference and log difference.
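The range features and their two normalizations can be sketched as follows; normalization against the topline and pitch range would be analogous, and the function and feature names are illustrative:

```python
import math

def range_features(region_f0, baseline):
    """Min/max/mean/last F0 of one word or window, normalized against the
    baseline via linear difference and log difference (sketch)."""
    feats = {}
    for name, val in [("min", min(region_f0)),
                      ("max", max(region_f0)),
                      ("mean", sum(region_f0) / len(region_f0)),
                      ("last", region_f0[-1])]:
        feats[name + "_lin_diff"] = val - baseline
        feats[name + "_log_diff"] = math.log(val) - math.log(baseline)
    return feats
```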
Movement features: These features measure the movement of the F0 contour for the voiced
regions of the word or window preceding and the word or window following a boundary. The
minimum, maximum, mean, the first, and the last stylized F0 values are computed and compared
to those of the following word or window, using log difference and log ratio.
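A sketch of the movement comparison, pairing plain difference and log ratio between the statistics of the two regions (the exact set of comparisons in the tool may differ; names are illustrative):

```python
import math

def movement_features(prev_f0, next_f0):
    """Compare stylized-F0 statistics of the regions before and after a
    boundary (sketch): returns (difference, log ratio) per statistic."""
    def stats(xs):
        return {"min": min(xs), "max": max(xs), "mean": sum(xs) / len(xs),
                "first": xs[0], "last": xs[-1]}
    p, n = stats(prev_f0), stats(next_f0)
    return {k: (p[k] - n[k], math.log(p[k] / n[k])) for k in p}
```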
Slope features: Pitch slope is generated from the stylized pitch values. The last slope value
of the word preceding a boundary and the first slope value of the word following a boundary are
computed. We also include the slope difference and dynamic patterns (i.e., falling, rising, and
unvoiced) across a boundary as slope features, since a continuous trajectory is more likely to
correlate with non-boundaries, whereas a broken trajectory tends to indicate a boundary of some
type.
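The dynamic-pattern logic can be sketched as follows. The slope threshold `eps` and the `None`-for-unvoiced convention are assumptions made for this illustration:

```python
def slope_pattern(last_slope, first_slope, eps=1e-6):
    """Dynamic pattern across a boundary from stylized-pitch slopes (sketch).

    `last_slope` is the final slope before the boundary, `first_slope` the
    first slope after it; None marks an unvoiced region.
    Returns (pattern label, slope difference or None).
    """
    def label(s):
        if s is None:
            return "unvoiced"
        return "rising" if s > eps else "falling" if s < -eps else "level"
    diff = (first_slope - last_slope
            if last_slope is not None and first_slope is not None else None)
    return label(last_slope) + "-" + label(first_slope), diff
```

A pattern like "falling-rising" marks a broken trajectory and hence a likely boundary, while matching labels with a small slope difference suggest a continuous trajectory.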
Energy features: The energy features are computed based on the intensity contour produced by
Praat. Similar to the F0 features, a variety of energy-related range features, movement features,
and slope features are computed, using various normalization methods.
Other features: We add the gender type to our feature set. Currently the gender information is
provided in a metadata file, rather than obtained via automatic gender detection.
USING THE TOOL:
Here a step-by-step procedure is given for extracting the prosodic features from a corpus. With
audio and time-aligned words and phones as input, the tool first extracts a set of basic elements
(e.g., raw pitch, stylized pitch, VUV) representing duration, F0, and energy information, as
shown in Figure 1 (c). Then a set of duration statistics (e.g., the means and variances of pause
duration, phone duration, and last rhyme duration), F0-related statistics (e.g., the mean
and variance of logarithmic F0 values), and energy-related statistics are calculated.
The requirements of the tool, its basic elements, and its implementation are shown
below:
Procedures to obtain the basic elements that are directly needed for prosodic feature extraction. The grayed ovals
represent operations implemented in the tool, while the grayed rectangles represent the basic elements. Note that
Forced Alignment is not a part of the tool, and so it appears in white.
The use of the basic elements for the various feature calculations is shown below.
STATISTICAL FEATURE COMPUTATION:
A metadata file is needed to provide the session ID, speaker ID, gender, and the path (which
can be absolute, or relative to the Praat script) to the audio file. The tool
supports multiple sessions per speaker and makes use of the speaker information across the
sessions for normalization. The TextGrid-format word/phone alignments are assumed to be in the
same directory as the audio file, and their file names (as well as the names of the other files
generated for that audio recording) are hard-coded based on the name of the audio file. For
example, if the audio file is ../demo/data/demo_C.wav, then the word and phone alignment files
are both located in the directory ../demo/data, and are named demo_C-word.TextGrid and
demo_C-phone.TextGrid, respectively. Below is the metadata file demo-wavinfo_list.txt:
SESSION   SPEAKER   GENDER   WAVEFORM
demo_C    C         female   ../demo/data/demo_C.wav
demo_D    D         male     ../demo/data/demo_D.wav
demo_E    E         male     ../demo/data/demo_E.wav
demo_F    F         male     ../demo/data/demo_F.wav
demo_G    G         male     ../demo/data/demo_G.wav
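For scripting around the tool, the metadata file can be loaded with a few lines of Python. This is a convenience sketch only; the tool itself consumes this file directly:

```python
def read_wavinfo(path):
    """Read the whitespace-delimited metadata file into a list of dicts,
    keyed by the column names on the header line."""
    with open(path) as fh:
        header = fh.readline().split()
        return [dict(zip(header, line.split())) for line in fh if line.split()]
```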
Here is an example of running stats_batch.praat at the command line:
praat stats_batch.praat ../demo-wavinfo_list.txt ../demo/work_dir yes
Below are the steps to run the same example in the Praat ScriptEditor:
1. Run Praat.
2. Open stats_batch.praat from "Read > Read from file..." on the menu of Praat Objects.
3. Click "Run > Run" on the menu of ScriptEditor.
4. Enter parameters. Type ../demo-wavinfo_list.txt and ../demo/work_dir in the two
boxes, and then check "yes" in the radio box.
5. Click "OK" to start processing with the configurations or "Cancel" to close the
interface. Clicking the "Apply" button (if available) also starts processing but it
keeps the interface open after the work is done. The "Standards" button (if available)
gives the option to restore the default configurations.
6. Process-related information is displayed in the Praat Info Window. After computation is
complete, the statistics files can be found at ../demo/work_dir/stats_files.
The desired output features can be selected by including them in the "output prosodic feature
selection list" file. This file is a one-column table with "FEATURE_NAME" as the column label
on the first line, followed by one feature name or one feature class name per line. By
convention, the name of a string feature, such as `GEN$`, ends with the symbol `$`, and the
name of a numeric feature does not end with `$`. Below is an example of the "output
prosodic feature selection list" file:
FEATURE_NAME
WORD$
WAV$
SPKR_ID$
GEN$
PAUSE_DUR
NORM_LAST_RHYME_DUR
DERIVE_FEATURE
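Such a selection-list file can also be generated programmatically. A minimal Python sketch, assuming the underscored spellings of the feature names shown in the example:

```python
def write_selection_list(path, names):
    """Write a one-column selection-list file: a FEATURE_NAME header line
    followed by one feature (or feature class) name per line."""
    with open(path, "w") as fh:
        fh.write("FEATURE_NAME\n")
        fh.write("\n".join(names) + "\n")
```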
Below are the steps to run main_batch.praat in the Praat ScriptEditor:
1. Run Praat.
2. Open main_batch.praat from "Read > Read from file..." on the menu of Praat Objects.
3. Click "Run > Run" on the menu of ScriptEditor.
4. Enter parameters. Type ../demo-wavinfo_list.txt, user_pf_name_table.Tab,
../demo/work_dir/stats_files, and ../demo/work_dir one by one in the four boxes, and then
check "yes" in the radio box.
5. Click "OK" to start processing with the configurations or "Cancel" to close the
interface. Clicking the "Apply" button (if available) also starts processing but it
keeps the interface open after the work is done. The "Standards" button (if available)
gives the option to restore the default configurations.
6. Process-related information is displayed in the Praat Info Window. After computation is
complete, the prosodic feature files can be found at ../demo/work_dir/pf_files.
The intermediate objects produced for the speech signal are shown in the screenshots below:
the intensity tier and its information, the pitch object and its information, the raw pitch,
the stylized pitch, and the resulting prosodic features table.
[Screenshots omitted: intensity tier, intensity information, pitch, pitch information, raw
pitch, raw pitch information, stylized pitch, stylized pitch information, prosodic features
table.]
RESULT:
The prosodic features of a speech signal, given its word and phone alignments, are
extracted and stored in the work directory under pf_files.
CONCLUSION:
The extracted prosodic features can be used along with a language model to build an
event detection system.
BIBLIOGRAPHY:
[1] Zhongqiang Huang, Lei Chen, Mary Harper. An Open Source Prosodic Feature
Extraction Tool.
[2] The manual for the Purdue prosodic feature extraction tool.