Speech tools Jean-Philippe Goldman 03.03.2004

Two questions

What kind of data?
Which task?

What kind of data?

Speech content (noise, multiple voices, …)
Data file
  Sound / Transcription / Pitch curve
  Sampling / Quantization
    16k, 12k, 8k, 4k
    8 bit
    Size: 16 kHz, 16 bit, 256 kbps → 1.9 MB/min → 115 MB/h
  Format
    Sound: wav, wma, mp3, ogg, aiff, aifc, au, vox, raw, sd, CSL, Ogg/Vorbis, NIST/Sphere
    Transcription: HTK, TIMIT, TextGrid, Phondat
  Number of files

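
The storage figures quoted above (256 kbps, about 1.9 megabytes per minute, about 115 megabytes per hour for 16 kHz, 16-bit sound) follow from simple arithmetic. A minimal sketch, assuming uncompressed mono PCM and 1 MB = 10^6 bytes; the function name `pcm_storage` is made up for this illustration:

```python
def pcm_storage(sample_rate_hz, bits_per_sample, channels=1):
    """Return (kbit/s, MB/min, MB/h) for uncompressed PCM audio.

    Assumes 1 MB = 10**6 bytes and no file-header overhead.
    """
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    bytes_per_second = bits_per_second / 8
    return (
        bits_per_second / 1000,          # bit rate in kbit/s
        bytes_per_second * 60 / 1e6,     # megabytes per minute
        bytes_per_second * 3600 / 1e6,   # megabytes per hour
    )


kbps, mb_per_min, mb_per_hour = pcm_storage(16000, 16)
print(f"{kbps:.0f} kbit/s, {mb_per_min:.2f} MB/min, {mb_per_hour:.1f} MB/h")
# prints: 256 kbit/s, 1.92 MB/min, 115.2 MB/h
```

The same function can be called with the other sampling rates and the 8-bit quantization listed above to compare storage needs before choosing a recording format.
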
Which task?

Visualization and editing: record, play, edit, mix, add effects
Analysis: spectral, pitch
Speech manipulation: filtering, mixing, adding effects, prosodic manipulation
Annotation: segmentation, labeling
Scripting: batch, communication with outside programs
Plotting

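
To give an idea of what the pitch part of the analysis task involves, here is a minimal sketch of pitch estimation by autocorrelation on a single frame. It is only an illustration, not the algorithm used by any of the tools presented here; the function name `estimate_f0` and the 50-500 Hz search range are assumptions made for this example.

```python
import math


def estimate_f0(samples, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Picks the lag with the highest (unnormalized) autocorrelation
    among the lags that correspond to the range f_min..f_max.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(samples) - 1)
    best_lag, best_score = None, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic test frame: 50 ms of a 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
print(estimate_f0(frame, sample_rate))
```

On this synthetic tone the estimate comes out at 200 Hz. Real speech needs more care (windowing, voicing decision, octave-error handling), which is what dedicated speech tools provide.
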
Examples of tasks

build stimuli for an experiment (e.g. cross-splicing)
manage a speech database for a TTS engine
create a prosodic database
analyze a speech corpus from experiment recordings
verify/correct an automatic segmentation

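
The cross-splicing example above can be sketched in a few lines. This is a minimal sketch using Python's standard `wave` module, assuming two uncompressed WAV files with identical channel count, sample width and sampling rate; the function name `cross_splice` and the single cut point in seconds are assumptions for illustration. In practice the cut would usually be placed at a segment boundary or zero crossing found with an annotation tool.

```python
import wave


def cross_splice(path_a, path_b, cut_seconds, out_path):
    """Write file A up to cut_seconds followed by file B from cut_seconds on."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        fmt_a = (a.getnchannels(), a.getsampwidth(), a.getframerate())
        fmt_b = (b.getnchannels(), b.getsampwidth(), b.getframerate())
        if fmt_a != fmt_b:
            raise ValueError("files differ in channels, sample width or rate")
        cut_frame = int(round(cut_seconds * a.getframerate()))
        head = a.readframes(cut_frame)             # beginning of A
        b.setpos(min(cut_frame, b.getnframes()))   # skip the beginning of B
        tail = b.readframes(b.getnframes())        # remainder of B
    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt_a[0])
        out.setsampwidth(fmt_a[1])
        out.setframerate(fmt_a[2])
        out.writeframes(head + tail)
```

A call such as `cross_splice("token_a.wav", "token_b.wav", 0.25, "stimulus.wav")` (hypothetical file names) would produce a stimulus whose first 250 ms come from the first recording and the rest from the second.
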
Two questions

What kind of data?
Which task?

Two rules

no single tool does everything
there are plenty of ways to do one thing

Tool features

Visualization/Editing
Analysis
Speech manipulation
Annotation
Scripting
Plotting

Supported formats
Platform/installation
Evolution/community
Accessibility
Price

Software

Goldwave (audio editor)
Esps Xwaves (routines + visualization)
Praat (speech analysis)
Wavesurfer (speech editor)
Transcriber (annotation tool)
Matlab (general-purpose software)
OGI speech tools (routines + application development)
… WinPitch, PitchWorks, Phonedit, CoolEdit, …

Goldwave

self-described as a “top rated, professional digital audio editor”

Goldwave

pros: editing (good memory management for large files), many effects, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, effect chains, easy interface
cons: nothing for speech (pitch, formants), Windows only, no scripting
Good for file editing, not for speech

Esps - Waves

Developed by Entropic + AT&T. Now public.
The comp.speech FAQ says:
  Esps: comprehensive set of speech analysis/processing tools
  Waves is a graphical front-end for speech processing (waveforms, spectrograms, pitch); includes a signal labeling utility

Esps - Waves

pros: powerful, designed for large files
cons: UNIX only (free BSD), non-standard formats, requires programming skills, development has stopped

Praat

Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam
General-purpose speech tool: editing, segmentation and labeling, prosodic manipulation

Praat

pros: designed for speech analysis (not only sound editing or spectrogram visualization), nice GUI, scripting, active development and community, prosodic manipulation
cons: limited scripting language, tool-specific (native) formats for transcription and pitch files

WaveSurfer

Open-source tool for sound visualization and manipulation
Speech/sound analysis and sound annotation/transcription
Platform for more advanced/specialized applications: extending WaveSurfer with new custom plug-ins or embedding WaveSurfer visualization components in other applications
Requires the Snack Sound Toolkit

Transcriber

Authors: C. Barras, E. Geoffrois
Relies on Snack (Tcl/Tk)
Good for annotation
Nice, simple GUI
No speech analysis

Matlab (Mathworks)

Mathematical environment
Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction
voicebox (2002), mike.brookes@ic.ac.uk
pitch determination algorithm (2002), Xuejing Sun, sunxj@northwestern.edu
colea speech editor (1998), Philip Loizou, loizou@utdallas.edu, Univ. of Texas-Dallas

Matlab (Mathworks)

pros: open, powerful, scripting, excellent plotting
cons: poor speech community and standards, not designed for large files

OGI speech tools/CSLU Toolkit

Development started in 1992 in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI
Includes:
- an X Window display tool (LYRE) to display and edit speech signals, spectrograms, phoneme labels, and other information
- a set of C library routines (LIBNSPEECH), utilities for converting file formats, filtering, neural network training, a vector quantizer, and a database utility to automate speech-database queries
- a set of Perl scripts, used mainly to automate the use of the OGI Speech Tools
- man pages
- RAD (rapid application development)
Points of entry: package (C), script (Tcl), GUI (Tk) levels
Free for research use

Summary

Columns: Edit, Anal., Manip., Annot., Script, Plot, Format, OS, Evolut., Comm., Price
Entries per tool:
Goldwave: win; $40
Esps Waves: C, sh; Unix; free
Praat: yes; native; console, sendpraat; src; free
wavesurfer + snack: C, tcl/tk, python; src; free
transcriber: xml; free
OGI Toolkit: free
matlab + Sigproc + packages: native; no BSD; student $100, $40/toolbox
Legend: [mark] = yes, but requires some development

Expect to do conversions

Sound files
  goldwave (win)
  sox (unix)
Transcription files
  scripts to convert text-formatted label files

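
A script that converts a text-formatted label file can be very short. The sketch below assumes HTK-style label lines of the form `start end label`, with times in units of 100 ns, and writes tab-separated lines with times in seconds. The target layout and the function name `htk_to_tsv` are hypothetical choices for this example, not a format required by any of the tools above.

```python
HTK_TICKS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns


def htk_to_tsv(htk_text):
    """Convert 'start end label' lines to 'start_s<TAB>end_s<TAB>label' lines."""
    rows = []
    for line in htk_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # this sketch skips blank or incomplete lines
        start, end, label = int(parts[0]), int(parts[1]), parts[2]
        rows.append(
            f"{start / HTK_TICKS_PER_SECOND:.3f}\t"
            f"{end / HTK_TICKS_PER_SECOND:.3f}\t{label}"
        )
    return "\n".join(rows)


example = "0 2500000 sil\n2500000 4000000 b\n4000000 9000000 a\n"
print(htk_to_tsv(example))
```

For the three example segments this gives the intervals 0.000-0.250 (sil), 0.250-0.400 (b) and 0.400-0.900 (a). Going the other way, or targeting another label format, follows the same read-convert-write pattern.
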
Links










www.goldwave.com
www.speech.kth.se/software/#esps
www.praat.org
www.speech.kth.se/software/#wavesurfer
www.cse.ogi.edu/toolkit
www.mathworks.com (Matlab)
www.lpl.univ-aix.fr/~sqlab/ (phonedit)
www.sciconrd.com/pworks.htm (PitchWorks)
www.winpitch.com (WinPitch)
www.adobe.com (CoolEdit > Audition)