the acquisition of a speech corpus for limited domain translation

THE ACQUISITION OF A SPEECH CORPUS FOR LIMITED DOMAIN TRANSLATION

Demetrio Aiello, Loredana Cerrato, Cristina Delogu, Andrea Di Carlo
{demetrio, loredana, cristina, adicarlo}@fub.it
Fondazione Ugo Bordoni - Via B. Castiglione, 59 - 00142 Rome, Italy

Abstract

In this paper we report on the ongoing collection of a speech corpus for the ESPRIT LTR project n. 30268, EuTrans. The corpus is intended to provide training material for speaker-independent continuous speech recognition over the telephone line and for translation, based on a vocabulary of a few thousand words. Given this application, the corpus is structured so as to contain speech material for acoustic modelling, and textual material for language modelling and translation modelling. The speech material being collected, which we describe in this paper, has been uttered in a natural way. We describe the corpus with the aid of some statistical results that illustrate the characteristics of the acquired material. Finally, we present our plans for collecting the remaining parts of the corpus and, in particular, introduce a new "dialogue oriented" collection paradigm.
