Scope

The proposed research has the following objectives:

Interlingua Design: We will develop a design for an interlingual representation (IL2)
based on a careful study of text corpora in English, Arabic, Chinese, Korean, and two
less commonly taught languages (LCTLs), Hindi and Persian. Building upon the
representation frameworks currently
in use or under development, we will develop or adopt appropriate notations and
representational primitives for linguistic phenomena. The design will include a formal
definition of the representation language along with coding manuals for the main
components of meaning (lexical semantic concepts, events and objects, time, aspect,
modality, etc.). An important part of this work will involve reducing ambiguity and
vagueness in a large-scale ontological specification of meaning.

Ontology Development and Testing: We will create an ontology with coverage of
the language phenomena mentioned above, and we will continue to refine the levels
of specificity and granularity in the ontology. More specifically, we will analyze
annotation inconsistencies and identify and correct problematic content. We will also
extend the ontology as necessary. In addition, we will design, develop, test, and
deploy ontology-based annotation support tools.

Corpus Annotation: We will annotate the bilingual corpora at all levels (IL0, IL1,
and IL2) using the coding manuals and the interlingual representation design
documents. This effort will also include a straightforward extension of those
corpora as needed, which requires no further research. We will also design,
develop, test, and deploy annotation support tools for these languages.

Annotation Evaluation: We will design new metrics and conduct evaluations of the
interlingual representations, both to measure annotator agreement and to choose a
granularity of meaning representation that is appropriate for a given task.
Our current metrics, based on inter-annotator reliability, will be augmented to
also consider the growth rate of the interlingual representation, the ability to handle
translation divergences, and the quality of the target language text that can be
generated from the interlingua. We will also examine closely, on a case-by-case
basis, the interaction between reliable coding of the univocal semantics of the text and
legitimate differences in understanding or interpretation, as indicated by alternative
translations.
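
As an illustration of the inter-annotator reliability baseline that the new metrics would augment, the following is a minimal sketch of Cohen's kappa for two annotators. It assumes each annotator assigns exactly one categorical label per item; the function name and the EVENT/OBJECT labels are hypothetical, and the proposal does not specify which kappa variant the project uses.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Illustrative sketch only: assumes one categorical label per item
    and exactly two annotators.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two
    # annotators' marginal frequencies, summed over all labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four items from two annotators.
annotator_1 = ["EVENT", "EVENT", "OBJECT", "OBJECT"]
annotator_2 = ["EVENT", "OBJECT", "OBJECT", "OBJECT"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(kappa)
```

Here the annotators agree on three of four items (0.75) while chance agreement is 0.5, so kappa is 0.5. Kappa measures agreement only; it says nothing about representation growth or the quality of generated text, which is why the additional metrics are proposed.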
Milestones
Year 1
5.2.1(a) IL design
o Identify phenomena to be included in IL2
o Develop or adopt appropriate notations and representational primitives for
phenomena
o Test annotation procedures for phenomena
o Produce annotator manuals
5.2.3 Ontology creation
o Continue to refine ontology levels of specificity, granularity
o Analyze annotation inconsistencies and reports to identify and correct
problematic ontology content
o Analyze annotations and extend ontology as necessary
o Design, develop, test, and deploy ontology-based annotation support tools
5.2.1(b) Annotation
o Annotate 100 texts at IL0 and IL1 levels for English (all partners)
o Annotate 50 texts at IL0 and IL1 levels for Arabic (Maryland and Columbia),
Korean (MITRE and ISI), Chinese (CMU and NMSU), Hindi (Columbia,
Maryland, and CMU), and Persian (NMSU and MITRE)
o Apply reconciliation procedures to annotated texts (all partners)
o Apply cross-translation consistency checking to annotated texts
o Design, develop, test, and deploy annotation support tools
o Annotate some texts to test new IL2 ideas
5.2.1(c) Annotation evaluation
o Apply evaluation measures to annotated corpora
o Develop new evaluation measures to augment the Kappa statistic
Reporting
o Write scientific papers
o Produce reports
o Hold telephone conferences weekly
o Attend scientific conferences and present papers
o Organize workshops and panels
Management and administration
o Attend project meetings four times a year
o Attend program/PI meetings twice a year
Year 2
Milestones for Years 2 and 3 will follow the same structure as Year 1 (see above).
We will continue to work incrementally on IL design, ontology, corpus annotation, and
evaluation for English, Arabic, Chinese, Korean, Hindi, and Persian. In Year 2, our
focus will shift to IL2 design and annotation. We will analyze the IL2 annotations
produced by annotators and identify problems in the IL. We will develop an improved IL2
design and identify additional phenomena to be included in IL2. For corpus
annotation, we will annotate at the IL2 level for Arabic (Maryland and Columbia), Korean
(MITRE and ISI), Chinese (CMU and NMSU), Hindi (Columbia, Maryland, and CMU),
and Persian (NMSU and MITRE).
Year 3 (Optional)
In addition to the continued work on IL design, ontology, IL2 corpus annotation, and
evaluation, we will further investigate and identify phenomena to be included in IL3,
including discourse (multi-sentence connectives), idioms, etc.