LETTER OF INVITATION FOR PARTICIPATION

Dear colleague,

Do you wish it were easier to find Maltese language resources to use in applications or for educational purposes? You can help by answering the attached questionnaire!

This is all part of a national venture to develop "An Infrastructure for Maltese Language Technology", funded by the Malta Council for Science and Technology (MCST) and strongly supported by the University of Malta through its Departments of Artificial Intelligence, in collaboration with the Institute of Linguistics and other members of the language technology community in Malta. This national venture is in fact one of the expected outcomes of the CLARIN project (www.clarin.eu). The CLARIN initiative aims to bring together all institutions in Europe that provide written and spoken language resources and technologies, as well as institutions, especially in the Humanities, Social Sciences and Sciences, that will make use of such language technologies in their research.

Research and development of language technology systems need an infrastructure of publicly available and standardised basic resources for the Maltese language. These resources can be data, or programs to process and use the data. A set of such basic resources is called a BLARK (Basic Language Resource Kit). Examples of language resource applications that you might find useful in your everyday work include, but are not limited to: digitised text or manuscripts with search facilities, text-to-speech or speech-to-text applications, automated spell checking in Maltese, online thesauri or dictionaries in Maltese, automated translators, and tools for processing language data.

A BLARK has to be created for each language separately. For Maltese, several resources exist, but it is unclear what types they are and to what degree they are available.
Therefore, we need to make an inventory of the existing language resources and describe how they are used. It is also necessary to survey the need for such resources for future development and usage. The goal of the present work is to prepare for the creation of an infrastructure for Maltese language technology. To make the Maltese BLARK as useful as possible, it is of great importance that everybody who works with Maltese language technology participates in the inventory process.

The work on surveying existing basic language resources and developing missing resources will be carried out in three phases. In the first phase, we wish to find out what resources are needed and get an overview of existing resources. In the second phase, we will use the information gathered to define what types of resources should be part of the Maltese BLARK, describe the existing resources, and point out what resources are missing. Lastly, missing language resources will be developed in order of need.

For the survey of existing resources and needs, we would like to ask you to answer a questionnaire.

CLARIN // COMMON LANGUAGE RESOURCES AND TECHNOLOGY INFRASTRUCTURE // web: www.clarin.eu email: clarin@um.edu.mt

You can do so by downloading the attached text file, filling it in and sending it back to this email address. We would like your answers as soon as possible, and in any case before March 27th 2009 if they are to be included in the overview. During phase two, after the initial survey, we will contact you again with more specific questions about the resources.

If you have any questions, please do not hesitate to contact us!

Thank you for your co-operation.
Sincerely,
CLARIN Project Team
University of Malta

QUESTIONNAIRE

PERSONAL CONTACT DATA

Name: ______________________________________________
Position: ______________________________________________
E-mail: ______________________________________________
Web page: ______________________________________________
Address, if not given on web page: ______________________________________________

INFORMATION ABOUT THE ORGANISATION

Name of organisation: ______________________________________________
Organisation contact data: ______________________________________________
Web site: ______________________________________________
Address, if not given on web site: ______________________________________________

Type of organisation (tick where applicable)
( ) Company
( ) University
( ) Public organisation
( ) Other, please specify:

Number of employees
( ) Fewer than 10
( ) 10-49
( ) 50-99
( ) 100 or more

Main activity
[ ] Software development
[ ] Language technology product vendor
[ ] Research
[ ] Teaching
[ ] Culture/Museum
[ ] Minority language organisation
[ ] Content provider
[ ] Interpreting/Translating/Localisation
[ ] Telecommunications
[ ] E-commerce
[ ] Other, please specify:

Main language technology area
[ ] Language learning
[ ] Language resources
[ ] Speech technologies
[ ] Written technologies
[ ] Search and knowledge mining
[ ] Machine translation/Computer-assisted translation
[ ] Other, please specify:

MALTESE LANGUAGE RESOURCE NEEDS

What should a basic language resource (LR) kit for Maltese contain to fulfil your needs?
[ ] Monolingual
[ ] Bilingual
[ ] Multilingual
[ ] Equivalent (non-parallel but same domain)
[ ] Enriched (annotated with e.g. tags, clusters)
[ ] Translation memories

Needs for terminological databases
[ ] Monolingual
[ ] Bilingual
[ ] Multilingual

Needs for grammars
[ ] Rule-based
[ ] Language models

Needs for semantic networks
[ ] Semantic networks (wordnets, thesauri, ontologies, etc.)

Needs for genres
[ ] News
[ ] Reports
[ ] Documentation
[ ] E-mail
[ ] Chat
[ ] Balanced
[ ] Other, please specify:

Which tools do you need for processing written data?
[ ] Optical character recognition
[ ] Formatter (character encoding, file format, etc.)
[ ] Normaliser (upper/lower case, numeric expressions, etc.)
[ ] Tokeniser
[ ] Discourse segmenter
[ ] Sentence splitter
[ ] Clause splitter
[ ] Part-of-speech tagger
[ ] Morphological segmenter (stemmer, lemmatiser, compound analyser, etc.)
[ ] Named entity recogniser
[ ] Chunker
[ ] Parser
[ ] Generator
[ ] Lexical semantics analyser (word sense disambiguation, etc.)
[ ] Formal semantics analyser (reference resolution, etc.)
[ ] Term extractor
[ ] Identifier of attitudinal expressions (attitudes, opinions, feelings, etc.)
[ ] Text/Genre classifier
[ ] Sentence aligner
[ ] Word aligner
[ ] Other, please specify:

Needs for speech language resources and types
[ ] Read speech
[ ] Spontaneous speech
[ ] Prompted speech
[ ] Dialogue speech
[ ] Multi-party speech
[ ] Other, please specify:

Which tools do you need for processing speech data?
[ ] Speech recording
[ ] Checking of recording
[ ] Orthographic labelling of speech
[ ] Phonetic labelling of speech
[ ] Linguistic labelling of speech
[ ] Pragmatic labelling of speech
[ ] Automatic speech analysis (formants, F0, etc.)
[ ] Automatic phonetic segmentation
[ ] Speech recognition - few words (voice call, etc.)
[ ] Speech recognition - a couple of thousand words (call centre, etc.)
[ ] Speech recognition - dictation
[ ] Speaker recognition (verifying/identifying)
[ ] Speech response with pre-recorded speech
[ ] Speech synthesis with augmented control (F0, emphasis, reductions, etc.)
[ ] Text-to-speech
[ ] Other, please specify:

INFORMATION ABOUT YOUR MALTESE LANGUAGE RESOURCES, if applicable

What resources do you have that could fit into a Language Resource Kit?

These general questions are about your existing resources. Please answer this part as it applies to your organisation/department:

Your written Language Resources
[ ] Monolingual
[ ] Bilingual
[ ] Enriched (annotated with e.g. tags, clusters)
[ ] Translation memories

Your terminological databases
[ ] Monolingual
[ ] Bilingual

Your grammars
[ ] Rule-based
[ ] Language models

Your semantic networks
[ ] Semantic networks (wordnets, thesauri, ontologies, etc.)

Your genres
[ ] News
[ ] Reports
[ ] Documentation
[ ] E-mail
[ ] Chat
[ ] Balanced
[ ] Other, please specify:

Which tools do you have for processing written data?
[ ] Optical character recognition
[ ] Formatter (character encoding, file format, etc.)
[ ] Normaliser (upper/lower case, numeric expressions, etc.)
[ ] Tokeniser
[ ] Discourse segmenter
[ ] Sentence splitter
[ ] Clause splitter
[ ] Part-of-speech tagger
[ ] Morphological segmenter (stemmer, lemmatiser, compound analyser, etc.)
[ ] Named entity recogniser
[ ] Chunker
[ ] Parser
[ ] Generator
[ ] Lexical semantics analyser (word sense disambiguation, etc.)
[ ] Formal semantics analyser (reference resolution, etc.)
[ ] Term extractor
[ ] Identifier of attitudinal expressions (attitudes, opinions, feelings, etc.)
[ ] Text/Genre classifier
[ ] Sentence aligner
[ ] Word aligner
[ ] Other, please specify:

Your speech language resources and types
[ ] Read speech
[ ] Spontaneous speech
[ ] Prompted speech
[ ] Dialogue speech
[ ] Multi-party speech
[ ] Other, please specify:

Which tools do you have for processing speech data?
[ ] Speech recording
[ ] Checking of recording
[ ] Orthographic labelling of speech
[ ] Phonetic labelling of speech
[ ] Linguistic labelling of speech
[ ] Pragmatic labelling of speech
[ ] Automatic speech analysis (formants, F0, etc.)
[ ] Automatic phonetic segmentation
[ ] Speech recognition - few words (voice call, etc.)
[ ] Speech recognition - a couple of thousand words (call centre, etc.)
[ ] Speech recognition - dictation
[ ] Speaker recognition (verifying/identifying)
[ ] Speech response with pre-recorded speech
[ ] Speech synthesis with augmented control (F0, emphasis, reductions, etc.)
[ ] Text-to-speech
[ ] Other, please specify:

Do you use Language Resources
[ ] produced internally?
[ ] produced by specific contracted vendors?
[ ] distributed by data centres?

If you use your own Resources, do you follow specific standards?
( ) Yes, please specify:
( ) No

ACQUISITION OF MALTESE LANGUAGE RESOURCES

Have you ever acquired Maltese Language Resources?
( ) Yes
( ) No

If you have acquired Maltese Language Resources:

From where did you acquire them?
[ ] External vendors
[ ] Other, please specify:

Did the acquired Language Resources fulfil your requirements?
( ) Yes
( ) No, please specify:

If you have never acquired Maltese Language Resources:

What is the reason for not acquiring any Maltese Language Resources?
[ ] The LRs were unavailable or did not exist
[ ] The available LRs were too expensive
[ ] The available data did not live up to your quality requirements; please specify below.

Portability
[ ] Lack of adaptability (smooth integration)
[ ] Lack of conformance (regarding format issues)
[ ] Lack of reusability

Functionality
[ ] Lack of coverage
[ ] Lack of adequate information types
[ ] Lack of data quality (about the content)

GENERAL COMMENTS

If you have general comments on the questions or the resources, you can give them here: