Oral language resources for synthesis and recognition
Research for doctoral courses

1. Introduction – Why corpora are needed

What are language resources? Language resources are materials such as recorded-speech databases, lexicons, grammars, text corpora and terminological data. They are essential for the development of robust, wide-coverage speech and text processing systems. Language processing systems have great potential in a wide range of information technology applications. However, the cost of developing the language resources needed for such applications can be prohibitive, even for very large companies. The problem is especially severe for linguistic regions in which no appreciable market currently exists, as is the case for Basque.

2. What is a speech corpus

A set of speech signals with time-aligned transcriptions. The speech signals may be audio or physiological, natural or artificial, in basic or derived form.

3. Format

Labelled speech corpora have for many years been a critical component of research in the speech sciences. Today these corpora are being created and compiled for a rapidly expanding range of languages, disciplines and technologies. Around this goal an abundance of formats and tools has been created, which both facilitates and hinders progress.

4. Linguistic annotation

http://www.ldc.upenn.edu/annotation/ Linguistic Annotation

"Linguistic annotation" refers to any analytical or descriptive annotation applied to raw linguistic data. The basic data may be in the form of time-dependent functions – audio, video and/or physiological recordings – or it may be textual.
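The notion of a time-aligned transcription can be sketched concretely: each annotation pairs a time interval in the signal with a label. A minimal sketch follows; the segment times and labels are invented for illustration.

```python
# A time-aligned transcription as a list of (start, end, label) segments.
# Times are in seconds; all values here are invented for illustration.
segments = [
    (0.00, 0.35, "u"),    # phone-level label
    (0.35, 0.62, "r"),
    (0.62, 1.10, "te"),
]

def label_at(time_s, segments):
    """Return the label whose interval contains the given time, or None."""
    for start, end, label in segments:
        if start <= time_s < end:
            return label
    return None

print(label_at(0.5, segments))  # -> r
```

The same idea scales from phone labels to word, sentence or discourse-level tiers: only the granularity of the intervals changes.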
The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, "named entity" identification, co-reference annotation, and so on. The focus is on tools which have been widely used for constructing annotated linguistic databases, and on the formats commonly adopted by such tools and databases. The page began as a set of links to systems for speech annotation, and its coverage of textual annotation is still inadequate.

http://www.esca-speech.org/genpres.html

ESCA is a non-profit organization. Its statutes were deposited in Grenoble, France, on March 27th, 1988, by René CARRÉ. The European scientific and technological effort in Speech Communication is considerable. For example, more than 3,000 people are working in this area in European public or industrial laboratories. But the effort is too diffuse, and needs more coordination to ensure better efficiency. This effort also represents a cultural challenge: the promotion of efficient communication between European countries, while maintaining the cultural heritage and future of each of those countries as framed in its own language. The main goal of the Association is "to promote Speech Communication Science and Technology in a European context, both in the industrial and academic areas", covering all the aspects of Speech Communication (Acoustics, Phonetics, Phonology, Linguistics, Natural Language Processing, Artificial Intelligence, Cognitive Science, Signal Processing, Pattern Recognition, etc.).
http://www.elsnet.org/resources/eciCorpus.html

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora. ECI has produced Multilingual Corpus I (ECI/MCI) of over 98 million words, covering most of the major European languages, as well as Turkish, Japanese, Russian, Chinese, Malay and more. The primary focus in this effort is on textual material of all kinds, including transcriptions of spoken material.

http://www.tue.nl/ipo/sli/etrw.html

Motivation

Now that spoken dialogue systems are becoming more sophisticated, increasing demands are placed on the way these systems deal with prosody, both in the generation of system utterances and in the processing of user utterances. On the one hand, appropriate prosody may facilitate the processing of system utterances by users of a dialogue system. On the other hand, dialogue systems may profit from taking into consideration the prosodic information of user utterances at different levels of representation. The domain of dialogue modelling for spoken dialogue systems has received much attention in recent years, and in many cases reference has been made to the contribution that prosody might make to improving the performance of spoken dialogue systems. However, for experts in the area of prosody it is not always easy to link their work to developments in the domain of dialogue modelling, whereas researchers working in the area of dialogue modelling are often rather naive with respect to prosodic modelling. We feel that bringing together researchers from both domains will give each group a better view of developments in the other domain. In our view this is a necessary precondition for enhancing progress.
The ETRW on Dialogue and Prosody intends to provide a state-of-the-art overview of research in dialogue modelling and of attempts to improve the performance of spoken dialogue systems by means of the analysis and generation of prosodic features.

file:///G|/usr/imanol/www/eagle/node13.html

Spoken language systems

There is a wide range of technologies which fall under the general banner of "spoken language processing" (SLP), including: "automatic speech recognition" ASR (also known as "direct voice input" DVI, and "speech input" SI); "automatic speech generation" ASG (also referred to as "direct voice output" DVO, "speech synthesis" SS, and "text-to-speech" TTS); "speech input/output" SIO (which includes "speech understanding systems" SUS, "spoken dialogue systems" SDS, and "speech-to-speech translation systems" STS); "speech coding" (covering wide-band coding at over 4 kbps, narrow-band secure voice between 1200 bps and 4 kbps, and very-low-data-rate speech communications at under 1200 bps); "speech analysis or paralinguistic processing" (which includes speaker identification/verification, language identification/verification and topic spotting); general speech processing applications such as "speech enhancement" and "voice conversion"; and "speech systems technology" (which is concerned with the technology of database recording, corpus transcription, annotation, storage and distribution).

Many of these technologies rely heavily on the availability of substantial quantities of recorded speech material: first, as a source of data from which to derive the parameters of their constituent models (manually or automatically), and second, in order to assess their behaviour under controlled (repeatable) test conditions. Of course, very few spoken language processing applications involve stand-alone spoken language technology.
Spoken language provides an essential component of the more general human-computer interface, alongside other input/output modalities such as handwriting, typing, pointing, imaging and graphics (see Figure 1.3). This means that the actions and behaviours of the speech-specific components of a spoken language system inevitably have to be orchestrated with respect to the other modalities and to the application itself by some form of interactive dialogue process (simultaneously taking into account the wide range of human factors involved). The complexity of the human-computer interface, and the subtle role of speech and language processing within it, has been (and continues to be) a prime source of difficulty in deploying spoken language systems in "real" applications. Not only are field conditions very different from laboratory conditions, but there has been a serious lack of agreed protocols for testing such systems and for measuring their overall effectiveness.

5. The standard (SGML)

Description of the AhoDat corpus (from the ppt presentation at the Topaketak meetings)

Labelled database:

  Purpose              Source                  Quality               Element
  Word accent          Farmhouse               Cassette, Minidisc    Words
  Verb declension      Farmhouse               Cassette, Minidisc    Words
  f0 curves            Laboratory, farmhouse   Cassette, Minidisc    Sentences
  Intonation, rhythm   Radio, television       Minidisc              Passages
  Tales                Farmhouse               Cassette, Minidisc    Conversations

Speech files (16-bit linear PCM, 4-22 kHz)
Information about the recording: dialect, region/farmhouse, speaker...
Linguistic information: orthographic, phonetic and prosodic transcription, with different degrees of alignment
Paralinguistic information: speaker changes, other kinds of sounds...

------ Maritxu Talletako
<text>
 <u who="emakumea" trans="smooth">
  <s> Urtekerie eztaitz, baye lenengo agiñe ataraten dana, bai. </s>
  <s> Oin kante bear dxako orreri, ta talletu ganera bota. </s>
 </u>
 <u who="Iñaki" trans="overlap">
  <s> ta ze kantaten <overlap> du?
  </overlap> </s>
 </u>
 <u who="emakumea" trans="overlap">
  <s> <overlap> <rs desc="fikziozko personaia">Maritxu</rs> </overlap> talletako gona gorridune itzi agin sarra ta ekarri barridxe. </s>
 </u>
</text>
-------

Uses of AhoDat (from the ppt presentation at the Topaketak meetings)

IMPROVEMENT OF THE LINGUISTIC PROCESSOR
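Once a transcription fragment like the one above is normalized to well-formed XML (straight quotes in attributes, properly nested elements), it can be processed with standard tools. A minimal sketch using Python's xml.etree.ElementTree; the fragment embedded below is an abridged, normalized version of the example above:

```python
import xml.etree.ElementTree as ET

# Abridged, XML-normalized version of the AhoDat transcription fragment
# shown above (straight quotes, no cross-utterance overlaps).
fragment = """<text>
 <u who="emakumea" trans="smooth">
  <s>Urtekerie eztaitz, baye lenengo agiñe ataraten dana, bai.</s>
  <s>Oin kante bear dxako orreri, ta talletu ganera bota.</s>
 </u>
</text>"""

root = ET.fromstring(fragment)
# Walk utterances (<u>) and print each sentence (<s>) with its speaker.
for utterance in root.findall("u"):
    speaker = utterance.get("who")
    for sentence in utterance.findall("s"):
        print(f"{speaker}: {sentence.text.strip()}")
```

Note that the <overlap> elements of the full example cross utterance boundaries in meaning, even though each pair nests locally; handling them requires pairing the overlapped spans across consecutive <u> elements rather than simple tree traversal.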