Speech tools
Jean-Philippe Goldman
03.03.2004

Two questions
What kind of data?
Which task?

What kind of data?
Speech content (noise, multiple voices, …)
Data files: sound, transcription, pitch curve
Sampling/quantization: 16 kHz, 12 kHz, 8 kHz, 4 kHz; 8 bit
Size: 16 kHz, 16 bit = 256 kbps, i.e. about 1.9 MB/min or 115 MB/h
Formats:
  Sound: wav, wma, mp3, ogg, aiff, aifc, au, vox, raw, sd, CSL, Ogg/Vorbis, NIST/Sphere
  Transcription: HTK, TIMIT, TextGrid, Phondat
Number of files

Which task?
Visualization and editing: record, play, edit, mix, add effects
Analysis: spectral, pitch
Annotation: segmentation, labeling
Speech manipulation: filtering, mixing, adding effects, prosodic manipulation
Scripting: batch processing, communication with other programs
Plotting

Examples of tasks
Build stimuli for an experiment (e.g. cross-splicing)
Manage a speech database for a TTS engine
Create a prosodic database
Analyze a speech corpus from experiment recordings
Verify/correct an automatic segmentation

Two questions
What kind of data?
Which task?
Two rules
There is no single tool that does everything.
There are plenty of ways to do any one thing.

Tool features
Visualization/editing
Analysis
Speech manipulation
Annotation
Scripting
Plotting
Supported formats
Platform/installation
Evolution/community
Accessibility
Price

Software
Goldwave (audio editor)
Esps/Xwaves (routines + visualization)
Praat (speech analysis)
WaveSurfer (speech editor)
Transcriber (annotation tool)
Matlab (general-purpose software)
OGI speech tools (routines + application development)
… WinPitch, PitchWorks, Phonedit, CoolEdit …

Goldwave
Describes itself as a "top rated, professional digital audio editor"

Goldwave
Pros: editing (good memory management for big files), many effects, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, effect chaining, easy interface
Cons: nothing for speech (pitch, formants), Windows only, no scripting
Good for file editing, not for speech

Esps/Waves
Developed by Entropic and AT&T.
Now public. The comp.speech FAQ says:
Esps: a comprehensive set of speech analysis/processing tools
Waves: a graphical front end for speech processing (waveforms, spectrograms, pitch); includes a signal labeling utility

Esps/Waves
Pros: powerful, designed for big files
Cons: UNIX only (free BSD), non-standard formats, requires programming skills, development has stopped

Praat
Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam
General-purpose speech tool: editing, segmentation and labeling, prosodic manipulation

Praat
Pros: designed for speech analysis (not only sound editing or spectrogram visualization), nice GUI, scripting, active development and community, prosodic manipulation
Cons: limited scripting language, Praat-specific (native) formats for transcription and pitch files

WaveSurfer
Open-source tool for sound visualization and manipulation
Speech/sound analysis and sound annotation/transcription
Platform for more advanced/specialized applications: extending WaveSurfer with new custom plug-ins, or embedding WaveSurfer visualization components in other applications
Requires the Snack toolkit

Transcriber
Authors: C. Barras, E. Geoffrois
Relies on Snack (Tcl/Tk)
Good for annotation
Nice, simple GUI
No speech analysis

Matlab (Mathworks)
Mathematical environment
Signal Processing Toolbox: filter design, spectral analysis, waveform generation, linear prediction
voicebox (2002), mike.brookes@ic.ac.uk
Pitch determination algorithm (2002), Xuejing Sun, sunxj@northwestern.edu
colea speech editor (1998), Philip Loizou, loizou@utdallas.edu, University of Texas at Dallas

Matlab (Mathworks)
Pros: open, powerful, scripting, excellent plotting
Cons: poor speech community and standards, not designed for big files

OGI speech tools / CSLU Toolkit
Development started in 1992, in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI
Includes:
  An X Windows display tool (LYRE) to display and edit speech signals, spectrograms, phoneme labels and other information
  A set of C library routines (LIBNSPEECH)
  Utilities for converting file formats, filtering, neural network training and vector quantization, plus a database utility to automate speech-database queries
  A set of Perl scripts, used mainly to automate use of the OGI Speech Tools
  Man pages
RAD (rapid application development)
Points of entry at three levels: package (C), script (Tcl), GUI (Tk)
Free for research use

Summary
Tools compared: Goldwave, Esps/Waves, Praat, WaveSurfer + Snack, Transcriber, OGI Toolkit, Matlab + Signal Processing Toolbox + packages
Criteria: editing, analysis, manipulation, annotation, scripting, plotting, formats, OS, evolution, community, price
Goldwave: Windows; $40
Esps/Waves: scripting in C, sh; Unix; free (BSD)
Praat: native formats; scripting via console and sendpraat; source available; free
WaveSurfer + Snack: scripting in C, Tcl/Tk, Python; source available; free
Transcriber: XML format; free
OGI Toolkit: free
Matlab + packages: native formats; student license $100, $40 per toolbox

Expect to do conversions
Sound files: Goldwave (Windows), SoX (Unix)
Transcription files: scripts to convert text-formatted label files

Links
www.goldwave.com
www.speech.kth.se/software/#esps
www.praat.org
www.speech.kth.se/software/#wavesurfer
www.cse.ogi.edu/toolkit
www.mathworks.com (Matlab)
www.lpl.univ-aix.fr/~sqlab/ (Phonedit)
www.sciconrd.com/pworks.htm (PitchWorks)
www.winpitch.com (WinPitch)
www.adobe.com (CoolEdit, now Adobe Audition)
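The storage figures on the "What kind of data?" slide follow from sample rate × bit depth × channels. The minimal Python sketch below assumes uncompressed mono PCM and decimal megabytes (1 MB = 10^6 bytes). The function name `pcm_storage` is only illustrative.

```python
def pcm_storage(rate_hz: int, bits: int, channels: int = 1) -> dict:
    """Bit rate and storage size of uncompressed PCM audio, in decimal units."""
    bits_per_second = rate_hz * bits * channels
    bytes_per_second = bits_per_second / 8
    return {
        "kbps": bits_per_second / 1000,
        "MB_per_min": bytes_per_second * 60 / 1e6,
        "MB_per_hour": bytes_per_second * 3600 / 1e6,
    }


if __name__ == "__main__":
    # The sampling rates listed on the slide, at 16 bit mono
    for rate in (16000, 12000, 8000, 4000):
        s = pcm_storage(rate, 16)
        print(f"{rate} Hz, 16 bit: {s['kbps']:.0f} kbps, "
              f"{s['MB_per_min']:.2f} MB/min, {s['MB_per_hour']:.1f} MB/h")
```

For 16 kHz, 16-bit mono this gives 256 kbps, 1.92 MB/min and 115.2 MB/h, which matches the slide's rounded 1.9 MB/min and 115 MB/h. Halving either the sample rate or the bit depth halves every figure.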
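Cross-splicing, listed under "Examples of tasks", builds a stimulus by joining a stretch of one recording to a stretch of another. The minimal sketch below uses Python's standard `wave` module. It assumes both inputs are uncompressed PCM WAV files with the same sample rate, sample width and channel count. The function names and cut times are hypothetical. A real stimulus would normally be cut at zero crossings or cross-faded to avoid audible clicks, which this sketch does not do.

```python
import wave


def read_segment(path: str, start_s: float, end_s: float):
    """Return (params, frames) for the [start_s, end_s) part of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        params = w.getparams()
        start = int(start_s * params.framerate)
        end = int(end_s * params.framerate)
        w.setpos(start)  # jump to the first frame of the segment
        frames = w.readframes(end - start)
    return params, frames


def cross_splice(first, second, out_path: str) -> None:
    """Write the segment from `first` followed by the segment from `second`.

    Each argument is a (path, start_s, end_s) tuple.
    """
    p1, f1 = read_segment(*first)
    p2, f2 = read_segment(*second)
    # Concatenating raw bytes is only valid if both sample formats match.
    if (p1.nchannels, p1.sampwidth, p1.framerate) != (p2.nchannels, p2.sampwidth, p2.framerate):
        raise ValueError("inputs must share channel count, sample width and rate")
    with wave.open(out_path, "wb") as out:
        out.setnchannels(p1.nchannels)
        out.setsampwidth(p1.sampwidth)
        out.setframerate(p1.framerate)
        out.writeframes(f1 + f2)
```

For example, `cross_splice(("a.wav", 0.0, 0.25), ("b.wav", 0.5, 1.0), "stimulus.wav")` writes the first 250 ms of `a.wav` followed by the second half-second of `b.wav`. The file names here are placeholders.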
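Pitch analysis appears among the tasks and in several of the tools (Praat, Waves, the Matlab pitch determination algorithm). The sketch below illustrates only the basic idea: a plain autocorrelation F0 estimate for one voiced frame. It is not the algorithm used by Praat, ESPS or Sun's PDA, which add normalization, voicing decisions and candidate tracking across frames. The 75 to 500 Hz search range and the function name are assumptions.

```python
import math


def autocorr_f0(frame, sample_rate: int, fmin: float = 75.0, fmax: float = 500.0) -> float:
    """Estimate F0 of one voiced frame as sample_rate / (lag of maximum autocorrelation)."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset so it does not bias the peak
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = min(int(sample_rate / fmin), n - 1)  # longest period considered
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(640)]  # 40 ms at 200 Hz
    print(round(autocorr_f0(tone, sr), 1))
```

On the 200 Hz test tone the peak falls at a lag of 80 samples, so the script prints 200.0. Real speech needs the refinements mentioned above, which is one reason to rely on the dedicated tools for prosodic work.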
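The "Expect to do conversions" slide mentions scripts for text-formatted label files. The sketch below assumes TIMIT-style `.phn` lines (`start_sample end_sample label`, sampled at 16 kHz) and HTK label lines (`start end label`, with times in 100 ns units). Under those assumptions, the conversion only rescales the two time fields. The function name is hypothetical, so check the exact conventions of your own files before relying on it.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are expressed in 100 ns units


def timit_to_htk(phn_text: str, sample_rate: int = 16000) -> str:
    """Convert TIMIT-style 'start_sample end_sample label' lines to HTK label lines."""
    out = []
    for line in phn_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        # samples -> 100 ns units with integer floor division;
        # exact at 16 kHz, where one sample is 625 units
        start_htk = start * HTK_UNITS_PER_SECOND // sample_rate
        end_htk = end * HTK_UNITS_PER_SECOND // sample_rate
        out.append(f"{start_htk} {end_htk} {label}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(timit_to_htk("0 3050 h#\n3050 4559 sh\n"), end="")
```

For the two sample lines this prints `0 1906250 h#` and `1906250 2849375 sh`. The same loop structure can be adapted to write other text label formats, such as Praat TextGrid.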