Scope

The proposed research has the following objectives:

Interlingua Design: We will develop a design for an interlingual representation (IL2) based on a careful study of text corpora in English, Arabic, Chinese, Korean, and two LCTLs (Hindi and Persian). Building upon the representation frameworks currently in use or under development, we will develop or adopt appropriate notations and representational primitives for linguistic phenomena. The design will include a formal definition of the representation language along with coding manuals for the main components of meaning (lexical semantic concepts, events and objects, time, aspect, modality, etc.). An important part of this work will involve reducing ambiguity and vagueness in a large-scale ontological specification of meaning.

Ontology Development and Testing: We will create an ontology with coverage of the language phenomena mentioned above, and we will continue to refine the levels of specificity and granularity in the ontology. More specifically, we will analyze annotation inconsistencies and identify and correct problematic content. We will also extend the ontology as necessary. In addition, we will design, develop, test, and deploy ontology-based annotation support tools.

Corpus Annotation: We will annotate the bilingual corpora at all levels (IL0, IL1, and IL2) using the coding manuals and the interlingual representation design documents. This effort will also include a straightforward extension of those corpora as needed, without further research being required. We will also design, develop, test, and deploy annotation support tools for these languages.

Annotation Evaluation: We will design new metrics and conduct various evaluations of the interlingual representations, both to measure annotator agreement and to choose a granularity of meaning representation that is appropriate for a given task. Our current metrics, based on inter-annotator reliability, will be augmented to consider also the growth rate of the interlingual representation, the ability to handle translation divergences, and the quality of the target language text that can be generated from the interlingua. We will also examine closely, on a case-by-case basis, the interaction between reliable coding of the univocal semantics of the text and legitimate differences in understanding or interpretation, as indicated by alternative translations.

Milestones

Year 1

5.2.1(a) IL design
  o Identify phenomena to be included in IL2
  o Develop or adopt appropriate notations and representational primitives for phenomena
  o Test annotation procedures for phenomena
  o Produce annotator manuals

5.2.3 Ontology creation
  o Continue to refine ontology levels of specificity, granularity
  o Analyze annotation inconsistencies and reports to identify and correct problematic ontology content
  o Analyze annotations and extend ontology as necessary
  o Design, develop, test, and deploy ontology-based annotation support tools

5.2.1(b) Annotation
  o Annotate 100 texts at IL0 and IL1 levels for English (all partners)
  o Annotate 50 texts at IL0 and IL1 levels for Arabic (Maryland, Columbia), Korean (MITRE and ISI), Chinese (CMU and NMSU), Hindi (Columbia, Maryland, and CMU), Persian (NMSU and MITRE)
  o Apply reconciliation procedures to annotated texts (all partners)
  o Apply cross-translation consistency checking to annotated texts
  o Design, develop, test, and deploy annotation support tools
  o Annotate some texts to test new IL2 ideas

5.2.1(c) Annotation evaluation
  o Apply evaluation measures to annotated corpora
  o Develop new evaluation measures to augment Kappa statistic

Reporting
  o Write scientific papers
  o Produce reports
  o Hold telephone conferences weekly
  o Attend scientific conferences and present papers
  o Organize workshops and panels

Management and administration
  o Attend project meetings four times a year
  o Attend program/PI meetings twice a year

Year 2

Milestones for Years 2 and 3 will follow the same structure as Year 1 (see above). We will continue to work incrementally on IL design, Ontology, Corpus Annotation, and Evaluation for English, Arabic, Chinese, Korean, Hindi, and Persian. In Year 2, our focus will shift to IL2 design and annotation. We will analyze IL2 annotations produced by annotators and identify problems in IL. We will develop an improved IL2 design and identify additional phenomena to be included in IL2. For corpus annotation, we will annotate at the IL2 level for Arabic (Maryland, Columbia), Korean (MITRE and ISI), Chinese (CMU and NMSU), Hindi (Columbia, Maryland, CMU), Persian (NMSU and MITRE).

Year 3 (Optional)

In addition to the continued work on IL design, Ontology, IL2 corpus annotation, and evaluation, we will further investigate and identify phenomena to be included in IL3, including discourse (multi-sentence connectives), idioms, etc.
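For reference, the Kappa statistic named under the Year 1 annotation evaluation milestones can be sketched as follows. This is a minimal illustration assuming the two-annotator (Cohen's) form, in which observed agreement is corrected for the agreement expected by chance; the proposal does not state which Kappa variant is used, and the function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: exactly two annotators and nominal (unordered) labels.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and this sketch reports perfect agreement.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical concept labels from two annotators for four tokens.
annotator_1 = ["EVENT", "EVENT", "OBJECT", "OBJECT"]
annotator_2 = ["EVENT", "OBJECT", "OBJECT", "OBJECT"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

With more than two annotators per text, a multi-rater variant such as Fleiss' kappa would be the natural substitute; the sketch above does not cover that case.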
