CLARIN Questionnaire - University of Malta

LETTER OF INVITATION FOR PARTICIPATION
Dear colleague,
Do you wish it were easier to find Maltese language resources for use in applications or for
educational purposes?
You can help make this happen by answering the attached questionnaire!
This questionnaire is part of a national venture to develop "An Infrastructure for Maltese Language Technology",
funded by the Malta Council for Science and Technology (MCST) and strongly supported by the
University of Malta through its Departments of Artificial Intelligence in collaboration with the
Institute of Linguistics and other members of the language technology community in Malta. This
national venture is, in fact, one of the expected outcomes of the CLARIN project (www.clarin.eu).
The CLARIN initiative aims to bring together all institutions in Europe that provide
written and spoken language resources and technologies, as well as institutions, especially in the
Humanities, Social Sciences and Sciences, that will make use of such language technologies in their
research.
Research and development of language technology systems for Maltese requires an infrastructure of
publicly available, standardised basic resources. These resources can be data, or programs that
process and use the data.
A set of such basic resources is called a BLARK - Basic Language Resource Kit.
Examples of language resource applications that you might find useful in your everyday work
include, but are not limited to: digitised text or manuscripts with search facilities, text-to-speech
and speech-to-text applications, automated spell checking for Maltese, online Maltese thesauri and
dictionaries, automated translators, and tools for processing language data.
A BLARK has to be created separately for each language. For Maltese, several resources exist, but
it is unclear what types they are and to what degree they are available.
Therefore, we need to make an inventory of the existing language resources, describe them, and
document how they are used. It is also necessary to survey the need for such resources for future
development and use.
The goal of the present work is to prepare for the creation of an infrastructure for Maltese language
technology. To make the Maltese BLARK as useful as possible, it is of great importance that
everybody who works with Maltese language technology participate in the inventory process.
The work on surveying existing basic language resources and developing missing resources will be
carried out in three phases. In the first phase, we wish to find out what resources are needed and
get an overview of existing resources. In the second phase, we will use the information gathered to
define what types of resources should be part of the Maltese BLARK, describe the existing resources,
and identify the resources that are missing. Lastly, the missing language resources will be developed
in order of need.
For the survey of language resources and needs, we kindly ask you to answer the attached questionnaire.
CLARIN // COMMON LANGUAGE RESOURCES AND TECHNOLOGY INFRASTRUCTURE //
web: www.clarin.eu
email: clarin@um.edu.mt
You can do so by downloading the attached text file, filling it in, and sending it back to
clarin@um.edu.mt. We would appreciate your answers as soon as possible, and in any case before
27 March 2009 if they are to be included in the overview.
During phase two, after the initial survey, we will contact you again with more specific questions
about the resources.
If you have any questions, please do not hesitate to contact us!
Thank you for your co-operation.
Sincerely,
CLARIN Project Team
University of Malta
QUESTIONNAIRE
PERSONAL CONTACT DATA
Name: ____________________________________________________________________________
Position: __________________________________________________________________________
E-mail: ____________________________________________________________________________
Web page: _________________________________________________________________________
Address, if not given on web page: _____________________________________________________
__________________________________________________________________________________
INFORMATION ABOUT THE ORGANISATION
Name of organisation: _______________________________________________________________
Organisation contact data: ____________________________________________________________
Web site: __________________________________________________________________________
Address, if not given on web site: _____________________________________________________
__________________________________________________________________________________
Type of organisation (tick where applicable)
( ) Company
( ) University
( ) Public organisation
( ) Other, please specify:
Number of employees
( ) Fewer than 10
( ) 10-49
( ) 50-99
( ) 100 or more
Main activity
[ ] Software development
[ ] Language technology product vendor
[ ] Research
[ ] Teaching
[ ] Culture/Museum
[ ] Minority language organisation
[ ] Content provider
[ ] Interpreting/Translating/Localisation
[ ] Telecommunications
[ ] E-commerce
[ ] Other, please specify:
Main language technology area
[ ] Language learning
[ ] Language resources
[ ] Speech technologies
[ ] Written technologies
[ ] Search and knowledge mining
[ ] Machine translation/Computer-assisted translation
[ ] Other, please specify:
MALTESE LANGUAGE RESOURCE NEEDS
What should a Basic Language Resource Kit (BLARK) for Maltese contain to fulfil your needs?
Needs for written language resources
[ ] Monolingual
[ ] Bilingual
[ ] Multilingual
[ ] Equivalent (non-parallel but same domain)
[ ] Enriched (annotated with e.g. tags, clusters)
[ ] Translation memories
Needs for terminological databases
[ ] Monolingual
[ ] Bilingual
[ ] Multilingual
Needs for grammars
[ ] Rule-based
[ ] Language models
Needs for semantic networks
[ ] Semantic networks (wordnets, thesauri, ontologies, etc.)
Needs for genres
[ ] News
[ ] Reports
[ ] Documentation
[ ] E-mail
[ ] Chat
[ ] Balanced
[ ] Other, please specify:
Which tools do you need for processing written data?
[ ] Optical character recognition
[ ] Formatter (character encoding, file format, etc.)
[ ] Normaliser (upper/lower case, numeric expressions, etc.)
[ ] Tokeniser
[ ] Discourse segmenter
[ ] Sentence splitter
[ ] Clause splitter
[ ] Part-of-speech tagger
[ ] Morphological segmenter (stemmer, lemmatiser, compound analyser, etc.)
[ ] Named entity recogniser
[ ] Chunker
[ ] Parser
[ ] Generator
[ ] Lexical semantics analyser (word sense disambiguation, etc.)
[ ] Formal semantics analyser (reference resolution, etc.)
[ ] Term extractor
[ ] Identifier of attitudinal expressions (attitudes, opinions, feelings, etc.)
[ ] Text/Genre classifier
[ ] Sentence aligner
[ ] Word aligner
[ ] Other, please specify:
Needs for speech language resources and types
[ ] Read speech
[ ] Spontaneous speech
[ ] Prompted speech
[ ] Dialogue speech
[ ] Multi-party speech
[ ] Other, please specify:
Which tools do you need for processing speech data?
[ ] Speech recording
[ ] Checking of recording
[ ] Orthographic labelling of speech
[ ] Phonetic labelling of speech
[ ] Linguistic labelling of speech
[ ] Pragmatic labelling of speech
[ ] Automatic speech analysis (formants, F0, etc.)
[ ] Automatic phonetic segmentation
[ ] Speech recognition - few words (voice call, etc.)
[ ] Speech recognition - a couple of thousand words (call centre, etc.)
[ ] Speech recognition - dictation
[ ] Speaker recognition (verifying/identifying)
[ ] Speech response with pre-recorded speech
[ ] Speech synthesis with augmented control (F0, emphasis, reductions, etc.)
[ ] Text-to-speech
[ ] Other, please specify:
INFORMATION ABOUT YOUR MALTESE LANGUAGE RESOURCES, if applicable
What resources do you have that could fit into a Language Resource Kit?
These general questions are about your existing resources. Please answer this part as it applies
to your organisation or department:
Your written Language Resources
[ ] Monolingual
[ ] Bilingual
[ ] Enriched (annotated with e.g. tags, clusters)
[ ] Translation memories
Your terminological databases
[ ] Monolingual
[ ] Bilingual
Your grammars
[ ] Rule-based
[ ] Language models
Your semantic networks
[ ] Semantic networks (wordnets, thesauri, ontologies, etc.)
Your genres
[ ] News
[ ] Reports
[ ] Documentation
[ ] E-mail
[ ] Chat
[ ] Balanced
[ ] Other, please specify:
Which tools do you have for processing written data?
[ ] Optical character recognition
[ ] Formatter (character encoding, file format, etc.)
[ ] Normaliser (upper/lower case, numeric expressions, etc.)
[ ] Tokeniser
[ ] Discourse segmenter
[ ] Sentence splitter
[ ] Clause splitter
[ ] Part-of-speech tagger
[ ] Morphological segmenter (stemmer, lemmatiser, compound analyser, etc.)
[ ] Named entity recogniser
[ ] Chunker
[ ] Parser
[ ] Generator
[ ] Lexical semantics analyser (word sense disambiguation, etc.)
[ ] Formal semantics analyser (reference resolution, etc.)
[ ] Term extractor
[ ] Identifier of attitudinal expressions (attitudes, opinions, feelings, etc.)
[ ] Text/Genre classifier
[ ] Sentence aligner
[ ] Word aligner
[ ] Other, please specify:
Your speech language resources and types
[ ] Read speech
[ ] Spontaneous speech
[ ] Prompted speech
[ ] Dialogue speech
[ ] Multi-party speech
[ ] Other, please specify:
Which tools do you have for processing speech data?
[ ] Speech recording
[ ] Checking of recording
[ ] Orthographic labelling of speech
[ ] Phonetic labelling of speech
[ ] Linguistic labelling of speech
[ ] Pragmatic labelling of speech
[ ] Automatic speech analysis (formants, F0, etc.)
[ ] Automatic phonetic segmentation
[ ] Speech recognition - few words (voice call, etc.)
[ ] Speech recognition - a couple of thousand words (call centre, etc.)
[ ] Speech recognition - dictation
[ ] Speaker recognition (verifying/identifying)
[ ] Speech response with pre-recorded speech
[ ] Speech synthesis with augmented control (F0, emphasis, reductions, etc.)
[ ] Text-to-speech
[ ] Other, please specify:
Do you use Language Resources
[ ] produced internally?
[ ] produced by specific contracted vendors?
[ ] distributed by data centres?
If you use your own Resources, do you follow specific standards?
( ) Yes, please specify:
( ) No
ACQUISITION OF MALTESE LANGUAGE RESOURCES
Have you ever acquired Maltese Language Resources?
( ) Yes
( ) No
If you have ever acquired Maltese Language Resources
From where did you acquire them?
[ ] External vendors
[ ] Other, please specify:
Did the acquired Language Resources fulfil your requirements?
( ) Yes
( ) No, please specify:
If you have never acquired Maltese Language Resources
What is the reason for not acquiring any Maltese Language Resources?
[ ] The LRs were unavailable or did not exist
[ ] The available LRs were too expensive
[ ] The available LRs did not meet your quality requirements; please specify below.
Portability
[ ] Lack of adaptability (smooth integration)
[ ] Lack of conformance (regarding format issues)
[ ] Lack of reusability
Functionality
[ ] Lack of coverage
[ ] Lack of adequate information types
[ ] Lack of data quality (about the content)
GENERAL COMMENTS
If you have general comments on the questions or the resources, you can give them here: