readme - LDC Catalog

Lincoln Laboratory Handset Database
( LLHDB )
Recorded at MIT Lincoln Laboratory
Speech Systems Technology Group
This corpus is delivered "as is" and no claims are made for specific
suitability. The data may be used for research purposes only and may not be
further distributed or transmitted without the written consent of MIT Lincoln
Laboratory. Use of this data implies agreement with the above conditions.
Introduction
------------
The LLHDB corpus consists of recordings of people speaking into different
telephone handsets. The aim was to create a corpus for the study of
telephone transducer effects on speech that minimized confounding factors,
such as variable telephone channels and background noise. LLHDB was created
by having volunteers speak prompted and extemporaneous speech into different
transducers in a sound-proof room and directly digitizing the output from
the transducers on a SunSparc A/D at an 8 kHz sampling rate and 16-bit
resolution.
Three types of speech were recorded for each handset. First, the speaker
read the "rainbow passage" [Nolan 83], a ninety-seven word passage sometimes
used in phonetic research. Second, the speaker read 10 sentences extracted
from the TIMIT corpus. (Each speaker was assigned to one of the TIMIT
speakers and was prompted to read each of that TIMIT speaker's ten
sentences.) Finally, the speaker was asked to describe a photograph for
approximately 40 seconds. (A different photograph was used for each
handset.) LLHDB contains speech from 53 speakers (24 males and 29 females)
recruited from the Laboratory.
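As a rough back-of-the-envelope aid, the recording parameters above can be turned into per-file size estimates. The sketch below is illustrative only: it assumes single-channel, uncompressed PCM (the README does not state the channel count or coding), and the names `pcm_bytes` and `UTTERANCES_PER_HANDSET` are made up for this example.

```python
# Figures taken from the README text; CHANNELS is an assumption.
SAMPLE_RATE_HZ = 8000      # 8 kHz sampling rate
BYTES_PER_SAMPLE = 2       # 16-bit resolution
CHANNELS = 1               # assumption: single-channel recordings


def pcm_bytes(seconds: float) -> int:
    """Approximate size of uncompressed PCM audio, excluding any file header."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


# Per speaker and handset: rainbow passage + 10 TIMIT sentences + photo description.
UTTERANCES_PER_HANDSET = 1 + 10 + 1

if __name__ == "__main__":
    print(pcm_bytes(40))           # a 40-second description: 640000 bytes
    print(UTTERANCES_PER_HANDSET)  # 12
```

Actual file sizes will differ, since the extemporaneous descriptions are only approximately 40 seconds long and each file also carries a header.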
Ten transducers were used, as described in the table below. Most of the
telephone handsets are not new (el2 is the exception) and were obtained from
the Lincoln Telecom office. Handsets with obvious damage were not used, but
to obtain some diversity with a limited number of handsets, handsets were
selected to have variable sound characteristics, transducer designs or, in
the case of electrets, different grill designs. For example, cb1-cb3 have
the same handset manufacturer name (NT G-type) but the carbon-button
transducer is different in each. In addition, cb3 and cb4 were selected
because they had particularly poor (although not pathological) sound
characteristics.
Table 1: Transducers used in corpus.
---------------------------------------------------------------------------
Transducer Name | Description
----------------|----------------------------------------------------------
senh            | Sennheiser head-mounted microphone
----------------|----------------------------------------------------------
pt1             | Sony portable (cord-less) telephone
----------------|----------------------------------------------------------
el1             | Northern-Telecom Unity electret (3-line grill)
----------------|----------------------------------------------------------
el2             | Northern-Telecom Unity Noisy-Environment electret
                | (2-line grill)
----------------|----------------------------------------------------------
el3             | Unknown-manufacturer electret (64-hole grill)
----------------|----------------------------------------------------------
el4             | Radio Shack Chronophone-255 electret telephone
----------------|----------------------------------------------------------
cb1             | Northern-Telecom G-type carbon-button
                | (center hole membrane transducer)
----------------|----------------------------------------------------------
cb2             | Northern-Telecom G-type carbon-button
                | (6 hole metal transducer)
----------------|----------------------------------------------------------
cb3             | Northern-Telecom G-type carbon-button
                | (6 hole membrane transducer)
----------------|----------------------------------------------------------
cb4             | ITT carbon-button (6 hole membrane/attached
                | transducer)
---------------------------------------------------------------------------
The handsets are the same handsets used in the collection of the HTIMIT
corpus (also available through the LDC). It is thus possible to compare the
effects of artificially creating transducer degradations by playing speech
through handsets with those of people speaking directly into handsets.
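For scripts that need to group recordings by transducer type, the handset codes of Table 1 can be kept in a small lookup table. The following is a minimal sketch: the descriptions are copied from Table 1, while the function name `transducer_class` and the class labels it returns are choices made for this example, not part of the corpus.

```python
# Handset codes and descriptions as listed in Table 1.
HANDSETS = {
    "senh": "Sennheiser head-mounted microphone",
    "pt1": "Sony portable (cord-less) telephone",
    "el1": "Northern-Telecom Unity electret (3-line grill)",
    "el2": "Northern-Telecom Unity Noisy-Environment electret (2-line grill)",
    "el3": "Unknown-manufacturer electret (64-hole grill)",
    "el4": "Radio Shack Chronophone-255 electret telephone",
    "cb1": "Northern-Telecom G-type carbon-button (center hole membrane transducer)",
    "cb2": "Northern-Telecom G-type carbon-button (6 hole metal transducer)",
    "cb3": "Northern-Telecom G-type carbon-button (6 hole membrane transducer)",
    "cb4": "ITT carbon-button (6 hole membrane/attached transducer)",
}


def transducer_class(code: str) -> str:
    """Group a handset code by its prefix (labels are this sketch's own)."""
    if code not in HANDSETS:
        raise KeyError(f"unknown handset code: {code!r}")
    if code.startswith("cb"):
        return "carbon-button"
    if code.startswith("el"):
        return "electret"
    if code.startswith("pt"):
        return "portable"
    return "head-mounted microphone"
```

For instance, `transducer_class("cb3")` yields `"carbon-button"`, which makes it easy to contrast carbon-button and electret recordings of the same speaker.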
Data Organization
-----------------
The files are organized in the following hierarchy:

          <Handset1>  <Handset2>  ...  <Handset10>
      ________|___________
     /        |           \
 <spkr1>   <spkr2>  ...  <spkr53>
 ____|_____________
/    |             \
sa1.sph sa2.sph ... extemp.sph
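The hierarchy above can be traversed with a simple glob. The sketch below builds a handset-to-speaker-to-files index; the function name `index_corpus` and the idea of passing the corpus root as an argument are assumptions made for illustration, since the mount point of the data is not specified here.

```python
from pathlib import Path


def index_corpus(root):
    """Map handset -> speaker directory -> sorted list of .sph file names."""
    root = Path(root)
    index = {}
    # Pattern mirrors the layout <handset>/<speaker>/<file>.sph
    for sph in sorted(root.glob("*/*/*.sph")):
        handset, speaker, name = sph.relative_to(root).parts
        index.setdefault(handset, {}).setdefault(speaker, []).append(name)
    return index
```

A call such as `index_corpus("/path/to/llhdb")` (a placeholder path) would then allow checks like counting how many speaker directories exist under each handset.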
The following TIMIT-style naming convention is used:

    <HANDSET>/<SEX><SPEAKER_ID>/<SENTENCE_ID>.<FILE_TYPE>

where,

    HANDSET :== cb1 | cb2 | cb3 | cb4 | el1 | el2 | el3 | el4 | pt1 | senh
                (see Table 1 for handset code description)
    SEX :== m | f
    SPEAKER_ID :== <INITIALS><DIGIT>
        where,
        INITIALS :== speaker initials, 3 letters
        DIGIT :== number 1-9 to differentiate speakers with identical
                  initials
    SENTENCE_ID :== <TEXT_TYPE><SENTENCE_NUMBER> | rainbow | extemp
        where,
        TEXT_TYPE :== sa | si | sx
                      (see TIMIT documentation for text type description)
        SENTENCE_NUMBER :== 1 ... 2342
    FILE_TYPE :== sph | txt
        where,
        sph :== Speech waveform file with NIST Sphere header
        txt :== Text of TIMIT sentences (not transcriptions of what was
                actually said)

Example:

    cb1/mdar/sa1.sph
    (carbon-button 1 handset, male speaker, speaker-ID "dar",
    sentence text "sa1", speech waveform file)
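The naming convention can be checked mechanically with a regular expression. The sketch below is one possible reading of the grammar, not an official tool: the names `LLHDBPath` and `parse_llhdb_path` are invented here, paths are assumed to use forward slashes, and the trailing digit of the speaker ID is treated as optional because the example above ("dar") has none.

```python
import re
from typing import NamedTuple

_PATH_RE = re.compile(
    r"^(?P<handset>cb[1-4]|el[1-4]|pt1|senh)/"
    r"(?P<sex>[mf])(?P<initials>[a-z]{3})(?P<digit>[1-9])?/"   # digit optional (assumption)
    r"(?P<sentence>rainbow|extemp|(?:sa|si|sx)(?P<number>\d+))"
    r"\.(?P<file_type>sph|txt)$"
)


class LLHDBPath(NamedTuple):
    handset: str
    sex: str
    speaker_id: str
    sentence_id: str
    file_type: str


def parse_llhdb_path(rel_path: str) -> LLHDBPath:
    """Split a corpus-relative path into its naming-convention fields."""
    m = _PATH_RE.match(rel_path)
    if m is None:
        raise ValueError(f"not an LLHDB-style path: {rel_path!r}")
    # SENTENCE_NUMBER is documented as 1 ... 2342
    if m["number"] is not None and not 1 <= int(m["number"]) <= 2342:
        raise ValueError(f"sentence number out of range: {rel_path!r}")
    speaker_id = m["initials"] + (m["digit"] or "")
    return LLHDBPath(m["handset"], m["sex"], speaker_id, m["sentence"], m["file_type"])
```

With this reading, `parse_llhdb_path("cb1/mdar/sa1.sph")` gives handset `cb1`, sex `m`, speaker ID `dar`, sentence ID `sa1` and file type `sph`, matching the example above.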
The doc directory contains the following files:
- rainbow.txt : The text of the rainbow passage spoken by all speakers on
                all handsets.
- spkrs.lst   : A list of the speakers' initials, sex and birth year.
- icassp97.ps : A PostScript version of an ICASSP paper describing the
                HTIMIT and LLHDB collection procedures.
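Because the waveform files are described as carrying a NIST Sphere header, it can be useful to inspect that header before decoding audio. The sketch below follows the general Sphere layout (a `NIST_1A` line, a header-size line, then `name -type value` lines up to `end_head`). The function name `parse_sphere_header` is made up for this example, and which fields the LLHDB files actually contain is not stated in this README.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a NIST Sphere file."""
    if not data.startswith(b"NIST_1A\n"):
        raise ValueError("missing NIST_1A magic; not a Sphere file")
    # The second 8-byte line holds the total header size in bytes, e.g. "   1024\n".
    header_size = int(data[8:16].decode("ascii").strip())
    text = data[16:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):   # skip blanks and comment lines
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, ftype, value = parts
        if ftype == "-i":        # integer field
            fields[name] = int(value)
        elif ftype == "-r":      # real-valued field
            fields[name] = float(value)
        else:                    # -sN: string field of N characters
            fields[name] = value
    return fields
```

A typical use would be to open a file in binary mode, read its first 1024 bytes, and pass them to `parse_sphere_header`. If the reported header size is larger than what was read, more bytes would need to be read first.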
Updates:
This 2 CD-ROM set is a reprint of the Lincoln Laboratory Handset Database
(LLHDB), produced by the Linguistic Data Consortium, catalog number
LDC1998S68, ISBN 1-58563-136-1.
Relative to the original CD-ROMs produced in 1998 by the Linguistic Data
Consortium, the extension of the audio files was changed from ".wav" to
".sph".