Spoken language resources for synthesis and recognition

Research for doctoral courses
1. Introduction – Why corpora are needed
What are language resources?
Language resources are materials such as databases of recorded speech, lexicons, grammars, text corpora and terminological data.
They are essential for developing robust, wide-coverage speech and text processing systems.
Language processing systems have great potential across a broad range of information technology applications.
However, the cost of developing the language resources such applications require can be prohibitive, even for very large companies.
The problem is especially serious for language communities in which no appreciable market currently exists, as is the case for Basque.
2. What is a spoken corpus
A set of speech signals with time-aligned transcriptions. The speech signals may be audio or physiological, natural or artificial, in basic or derived form.
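The definition above, a signal plus time-aligned labels, can be sketched as a small data structure. This is a minimal illustration in Python; the field and function names are invented for the example and do not come from any particular toolkit:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # segment start time, in seconds
    end: float    # segment end time, in seconds
    label: str    # aligned transcription (orthographic, phonetic, ...)

# One utterance: the audio itself would live in a separate signal file;
# here we keep only the time-aligned transcription.
utterance = [
    Segment(0.00, 0.42, "urtekerie"),
    Segment(0.42, 0.80, "eztaitz"),
]

def labelled_duration(segments):
    """Total duration covered by aligned labels, in seconds."""
    return sum(s.end - s.start for s in segments)

print(round(labelled_duration(utterance), 2))  # 0.8
```

Real corpus formats add layers (speaker, dialect, recording conditions), but all of them reduce to spans of time carrying labels, as here.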
3. Format
Annotated speech corpora have for many years been a critical component of research in the speech sciences.
Today such corpora are being created and compiled for a rapidly expanding range of languages, disciplines and technologies.
An abundance of formats and tools has grown up around this goal, which both eases and hinders progress.
4. Linguistic annotation
http://www.ldc.upenn.edu/annotation/
Linguistic Annotation
"Linguistic annotation" refers to any analytic or descriptive notation applied to raw language data.
The basic data may be in the form of time functions (audio, video and/or physiological recordings) or it may be textual.
The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and
sense tagging, syntactic analysis, "named entity" identification, co-reference annotation, and so on.
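The annotation types just listed (phonetic features, part-of-speech tags, named entities, and so on) can be viewed as independent label layers over the same timeline. A small sketch; the layer names and spans are invented for illustration:

```python
# Each annotation layer maps (start, end) time spans to labels
# over the same underlying signal.
annotations = {
    "phonetic": [(0.00, 0.10, "u"), (0.10, 0.25, "r"), (0.25, 0.42, "te")],
    "pos": [(0.00, 0.42, "NOUN")],
    "named_entity": [],
}

def labels_at(layers, t):
    """Return, for each layer, the label whose span covers time t (or None)."""
    return {
        name: next((lab for s, e, lab in spans if s <= t < e), None)
        for name, spans in layers.items()
    }

print(labels_at(annotations, 0.15))
# {'phonetic': 'r', 'pos': 'NOUN', 'named_entity': None}
```

Keeping layers separate in this way is what lets one corpus serve phonetic, syntactic and discourse research at the same time.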
The focus is on tools which have been widely used for constructing annotated linguistic databases, and on the formats commonly
adopted by
such tools and databases. This page began as a set of links to systems for speech annotation, and the coverage of textual
annotation is still
inadequate. Please advise us of any updates or corrections.
http://www.esca-speech.org/genpres.html
ESCA is a non-profit organization. Its statutes were deposited in Grenoble, France, on March 27th,
1988, by René CARRÉ.
The European scientific and technological effort in Speech Communication is considerable. For
example, more than 3,000 people are working in this area in European public or industrial laboratories.
But the effort is too diffuse, and needs more coordination to insure better efficiency. This effort also
represents a cultural challenge: The promotion of efficient communication between European countries,
while maintaining the cultural heritage and future of each of those countries as framed in its own
language.
The main goal of the Association is "to promote Speech Communication Science and
Technology in a European context, both in the industrial and Academic areas", covering all the
aspects of Speech Communication (Acoustics, Phonetics, Phonology, Linguistics, Natural Language
Processing, Artificial Intelligence, Cognitive Science, Signal Processing, Pattern Recognition, etc.).
http://www.elsnet.org/resources/eciCorpus.html
The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and
supports existing and
projected national and international efforts to carefully design, collect and publish large-scale multilingual written and
spoken corpora.
ECI has produced Multilingual Corpus I (ECI/MCI) of over 98 million words, covering most of the major European
languages, as well as Turkish,
Japanese, Russian, Chinese, Malay and more. The primary focus in this effort is on textual material of all kinds,
including transcriptions of spoken
material.
http://www.tue.nl/ipo/sli/etrw.html
Motivation
Now that spoken dialogue systems are becoming more sophisticated, increasing demands are placed on the way these systems
deal with
prosody, both in the generation of system utterances and in the processing of user utterances. On the one hand, appropriate prosody
may
facilitate the processing of system utterances by users of a dialogue system. On the other hand, dialogue systems may profit from
taking into
consideration prosodic information of user utterances at different levels of representation.
The domain of dialogue modelling for spoken dialogue systems has received much attention in recent years, and in many cases
reference has
been made to the contribution that prosody might make to improving the performance of spoken dialogue systems. However, for
experts in
the area of prosody it is not always easy to link their work to the developments in the domain of dialogue modelling, whereas
researchers
working in the area of dialogue modelling often are rather naive with respect to prosodic modelling. We feel that bringing together
researchers from both domains will provide either group with a better view of developments in the other domain. In our view this is a
necessary pre-condition to enhance progress.
The ETRW on Dialogue and Prosody intends to provide a state-of-the-art overview of research in dialogue modelling and of
attempts to
improve the performance of spoken dialogue systems by means of the analysis and generation of prosodic features.
file:///G|/usr/imanol/www/eagle/node13.html
Spoken Language systems
There is a wide range of technologies which fall under the general banner of ``spoken language processing'' (SLP) including:
``automatic speech recognition'' ASR (also known as ``direct voice input'' DVI, and ``speech input'' SI),
``automatic speech generation'' ASG (also referred to as ``direct voice output'' DVO, ``speech synthesis'' SS, and ``text-to-speech'' TTS),
``speech input/output'' SIO (which includes ``speech understanding systems'' SUS,
``spoken dialogue systems'' SDS, and ``speech-to-speech translation systems'' STS),
``speech coding'' (covering wide-band coding at over 4k bps, narrow-band secure voice between 1200 bps and 4k bps, and
very-low data-rate speech
communications at under 1200 bps),
``speech analysis or paralinguistic processing'' (which includes speaker identification/verification, language
identification/verification and topic spotting),
general speech processing applications such as ``speech enhancement'' and ``voice conversion'', and ``speech systems
technology'' (which is concerned with
the technology of database recording, corpus transcription, annotation, storage and distribution).
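The speech-coding bit-rate bands quoted above (wide-band above 4k bps, narrow-band between 1200 bps and 4k bps, very-low data-rate below 1200 bps) can be captured in a trivial classifier, with the thresholds taken directly from the text:

```python
def coding_band(bps):
    """Classify a speech-coding bit rate into the bands named in the text."""
    if bps > 4000:
        return "wide-band"
    if bps >= 1200:
        return "narrow-band"
    return "very-low data-rate"

print(coding_band(64000))  # wide-band
print(coding_band(2400))   # narrow-band
print(coding_band(600))    # very-low data-rate
```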
Many of these technologies rely heavily on the availability of substantial quantities of recorded speech material: first, as a source of
data from which to derive the
parameters of their constituent models (manually or automatically), and second, in order to assess their behaviour under controlled
(repeatable) test conditions.
Of course very few spoken language processing applications involve stand-alone spoken language technology. Spoken language
provides an essential component of
the more general human-computer interface alongside other input/output modalities such as handwriting, typing, pointing, imaging
and graphics (see Figure 1.3). This
means that the actions and behaviours of the speech-specific components of a spoken language system inevitably have to be
orchestrated with respect to the other
modalities and to the application itself by some form of interactive dialogue process (simultaneously taking into account the wide
range of human factors involved).
The complexity of the human-computer interface, and the subtle role of speech and language processing within it, has been (and
continues to be) a prime source of
difficulty in deploying spoken language systems in ``real'' applications. Not only are field conditions very different to laboratory
conditions, but there has been a
serious lack of agreed protocols for testing such systems and for measuring their overall effectiveness.
5. The standard (SGML)
Description of the AhoDat corpus
(from the Topaketak PPT presentation)
Annotated database
Goal                 Source                  Quality (medium)     Element
Word accent          Farmhouse               Cassette, minidisc   Words
Verb declension      Farmhouse               Cassette, minidisc   Words
F0 curves            Laboratory, farmhouse   Cassette, minidisc   Sentences
Intonation, rhythm   Radio, television       Minidisc             Passages
Tales                Farmhouse               Cassette, minidisc   Conversations
Speech files (16-bit linear PCM, 4-22 kHz)
Information about the recording:
Dialect, region/farmhouse, speaker...
Linguistic information:
Orthographic, phonetic and prosodic transcription, with varying degrees of alignment
Paralinguistic information:
Speaker changes, other kinds of sounds...
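As an illustration of the stated file format (16-bit linear PCM), the following sketch writes such a file with Python's standard wave module. The 16 kHz sample rate is just one value within the 4-22 kHz range mentioned, and the sine tone merely stands in for real recorded speech:

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # Hz; one example within the 4-22 kHz range above
DURATION = 0.1       # seconds

# Synthesize a 440 Hz tone as stand-in "speech" data, scaled to
# half of the 16-bit signed range.
samples = [
    int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(int(SAMPLE_RATE * DURATION))
]

with wave.open("example.wav", "wb") as wf:
    wf.setnchannels(1)            # mono
    wf.setsampwidth(2)            # 2 bytes per sample = 16-bit linear PCM
    wf.setframerate(SAMPLE_RATE)
    wf.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Reading such a file back is symmetric (`wave.open(..., "rb")` plus `struct.unpack`), which is why raw linear PCM is a convenient interchange format for corpus distribution.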
------Maritxu Talletako
<text>
<u who="emakumea" trans="smooth">
<s> Urtekerie eztaitz, baye lenengo agiñe ataraten dana, bai. </s>
<s> Oin kante bear dxako orreri, ta talletu ganera bota. </s>
</u>
<u who="Iñaki" trans="overlap">
<s>ta ze kantaten <overlap> du? </overlap> </s>
</u>
<u who="emakumea" trans="overlap">
<s><overlap><rs desc="fikziozko personaia">Maritxu </rs> </overlap>
talletako gona gorridune itzi agin sarra ta ekarri barridxe. </s>
</u>
</text>
-------
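With plain double quotes around attribute values, a fragment in the same shape as the sample above can be read with a standard XML parser. This is only a sketch: full TEI/SGML corpora generally need an SGML-aware toolchain, but it shows how the `who` attributes and utterance text can be recovered:

```python
import xml.etree.ElementTree as ET

# A fragment with the same structure as the AhoDat sample above.
sample = """<text>
<u who="emakumea" trans="smooth">
<s> Urtekerie eztaitz, baye lenengo agiñe ataraten dana, bai. </s>
<s> Oin kante bear dxako orreri, ta talletu ganera bota. </s>
</u>
<u who="Iñaki" trans="overlap">
<s>ta ze kantaten <overlap> du? </overlap> </s>
</u>
<u who="emakumea" trans="overlap">
<s><overlap><rs desc="fikziozko personaia">Maritxu </rs></overlap>
talletako gona gorridune itzi agin sarra ta ekarri barridxe. </s>
</u>
</text>"""

root = ET.fromstring(sample)
rows = []
for u in root.findall("u"):
    for s in u.findall("s"):
        # itertext() flattens nested tags such as <overlap> and <rs>;
        # split/join normalizes the whitespace.
        rows.append((u.get("who"), " ".join("".join(s.itertext()).split())))

for who, sentence in rows:
    print(who, "->", sentence)
```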
Uses of AhoDat
(from the Topaketak PPT presentation)
IMPROVING THE LINGUISTIC PROCESSOR