Error typology and refinement revisited (from SS03)

For each error type: reason or source of the error; who's responsible for the fix; refinement or fix.

A. No translation
   source: lack of lexical coverage
   responsible: RR
   fix: add a lexical entry
        - use the user's feedback to determine the translation, i.e. the y-side of it
        - need to determine the POS (maybe I can do this separately, using the known POS sequence and MLE)

B. Wrong agreement [1] (num, pers, gen, tense)
   source: 1. the right word is not in the lexicon; right now, the TE outputs the first lexical entry with the same form
           2. agreement constraint missing in one or more rules (and probably in the relevant lexical entries)
   responsible: RR
   fix: in either case, we need to pinpoint the incorrect word [2] (the user will correct it and give us the right form of the word)
        1. need to build a new lexical entry with the same form but with the right constraints (this will depend on the case; if it's a verb, need to add the constraints of the subject)
        2. need to find the feature which would have triggered the correct word to come up [3] (hopefully given by the user); in both cases, we need to duplicate the appropriate grammar rule(s) and refine them to include the missing agreement constraints [4]

C. Wrong sense of the word
   source: 1. the right sense is missing in the lexicon
           2. lexical entries with different senses for the same source word are underspecified wrt. their application
   responsible: RR
   fix: 1. add a new lexical entry with that sense as the tgt word; all the features that made the rule apply are added to the entry, but there might be some other features that also need to be added to that entry and maybe to other entries in the sentence (-> refinement) [5]
        2. need to find the feature which would have triggered the correct sense of the word to come up [6]

D. Incorrect word
   source: 1. due to context: [7] -phonetic -syntactic -semantic [8] -selectional restrictions [9] (figure out boundaries)
           2. lexicalized expression, figure of speech (idiom) [10]
           3. morphology (overregularization [11])
   responsible: RR
   fix: 1. need to determine what kind of context is affecting the form of the word (hard to determine); hypothesize selectional restrictions w. the head and generate examples that test all the hypotheses (active learning) [12]; need to refine the grammar rule to include the new constraints
        2. need to detect the words that constitute the idiom and add them as an entry to the lexicon (hopefully, the user will have aligned things correctly)
        3. need to add missing irregular noun and verb forms to the lexicon; later: get morphological info from the morphological analyzer

E. Wrong word order
   source: 1. local WWO
           2. constituent-level WWO
   responsible: 1. RR, 2. RL
   fix: 1. modify the local ordering of POS in a rule [13]
        2. harder, experiment with it; probably there is a missing rule

F. Information missing
   source: 1. a lexical entry might be missing
           2. might indicate a problem with a transfer rule
   responsible: RL or RR?
   fix: will depend on user feedback (need to see examples)

G. Extra word(s)
   source: TL uses fewer words to express the concept
   responsible: RR
   fix: need to refine the grammar rule to not generate that word at the y-side

H. Translation error
   source: 1. rule (syntactic pattern) missing
           2. other
   responsible: RL or RR [14]
   fix: need to add a new rule; might need to create POS

I. Beyond help, impossible to say
   source: too many errors
   responsible: RL
   fix: RR needs to pass it to the RL as a new training example

Should I add D to the TCTool error choices?

A = Ariadna, E = Erik, K = Kathrin, MR = manual rules, RL = Rule Learning Module, RR = Rule Refinement Module, TE = Transfer engine

Footnotes:
[1] It is not uncommon that agreement constraints change from language to language. Examples of this are the agreement constraint on gender (in addition to number and person) between the subject and the verb in Hebrew, which only transfers to Spanish for the past participle forms, and which does not transfer in any case for English. Another example is that adjectives in Semitic languages (Hebrew and Arabic) have to be marked for definiteness (the same way determiners do), but in Romance languages adjectives are marked for gender and number (and English adjectives have no agreement constraints).
[2] Sometimes the user won't know which is the incorrect one. For example, given the English "the white sheep", what's the incorrect word in "las ovejas blanca"?
[3] In general, feature agreement is preferable over feature value specifications, since feature values tend to overspecify. However, I still need to figure out which feature agreement is correct: if we had the source German sentence "Das Haus des Vaters" and the English "the father's houses", the agreement constraint needed would be an XY constraint ((X noun NUM) = (Y noun NUM)), as opposed to an agreement constraint on the number of the father and the house ((Y noun num) != (Y gen num)).
[4] I'll need to generate and retain more than one hypothesis (at a later stage I will generate discriminatory examples, active learning), edit the rule, and cross-validate with attached examples, if any. I would be editing both learned rules (which have a history and a cross-validation set attached) and manual rules (which might have a cross-validation set attached).
[5] For example "me senté en el banco" -> "I sat on the bank": bank -> bench, which will have the same features (POS = N, num = sg, etc.) so that the same rule (NP -> Det N_SG) can apply. But the entry for "sat" needs to be refined to prefer physical objects as its object, and "bench" needs to be refined as being of type "physical object". Arguably, "me senté en el banco" could mean "I sat in a bank", but it is a less preferred reading.
[6] Same as footnote 3: feature agreement is preferable over feature value specifications, but I still need to figure out which feature agreement is correct.
[7] Example: "We had an sixty meter rope and two ninety meter ones" should be "We had a sixty meter rope and two ninety meter ones".
[8] Example: "all climber was patiently waiting for her turn" should be "every climber was patiently waiting for her turn". Another example of this would be the mass vs. countable noun distinction (need to refine the lexicon and add a mass/count feature so that the appropriate determiners and quantifiers can be used with each word).
[9] Example: "It was very freezing" should be "It was freezing" or "It was very cold".
[10] Example: "John kicked the bucket" -> "Juan se murió", and there is no "cubo" anywhere in the target sentence.
[11] This is just one reason, among many possible other ones.
[12] If we have 3 hypotheses, we can generate 3 distinct semantic features automatically (XXX, YYY, and ZZZ) and test them by generating new examples and having the user tell us which are right/wrong. Need to have part of the lexicon with semantic features, so that we can generate these examples automatically. If the semantic constraint that seems to hold is ZZZ, say, the grammar writer can look at the examples later and give it a more mnemonic name (if "llevar" is the head, the N in the object NP has to be a physical object, as opposed to an institution or geographic place, and thus ZZZ -> physical object).
[13] In cases like "el amarillo banco", we just need to change DET ADJ N to DET N ADJ. However, when we get more examples, we might run into "el gran banco", which is correct and which should cause the RR to backtrack, recover the original rule (the last rule in the history of the DET N ADJ rule) and specify some constraints. To determine which type of adjectives should be further specified, we can again present the user with a few automatically generated cases and then see what is the most common order, the default order, and mark the "marked" adjectives; in this case, "bueno" should be marked with a "pre-mod" feature.
[14] Depending on whether it's a modification of an existing pattern or a completely new pattern.
[15] Next to each refinement is the number of the sentence in input-xfer.out.debug which would be affected by it.

NECESSARY REFINEMENTS [15]

GRAMMAR

Not fixed in grammar2.trf:
+ need to create a rule for clitic pronouns as obj, where they move in front of the verb in Spanish: 6,8,9,13,20,21,22,25,26,31
  type of error: C.1 (tu->te (+tonic)) + E.1
+ prep + tonic clitic pronoun: 7,23
  type of error: C.1 (tu->te (+tonic)) + H.1
+ duplicate clitic in front of verb: 25
  type of error: C.1 (him->lo (+clitic)) + E.1 (or maybe H.1)
+ agreement between subj and adj predicate (create new rule w. appropriate constraint): 4,29,30
  type of error: B.2 (missing agreement constraint)
+ add obj agreement for "gustar": 5 [+lexicon refinement]
  type of error: B.2
+ refine vp(aux v) rule to deal with future: 25
  type of error: B.2
+ add governed prepositions: 16,31(play), 25(help) [+lexicon refinement]
  ?add lexical entry for "jugar a" + det N (al, a la); need to do some kind of postprocessing to get a+el->al
  type of error: D.1
+ wh-questions coverage: 19
  type of error: E.1 and/or H.1
+ modal questions (inverted subject): 26
  type of error: E.1 and/or H.1

Fixed in grammar2.trf:
+ "it" can only be translated into "lo/la" when it's accusative
  added: {S,1} ((y1 case) = nom)  ; subj it !-> la/lo

LEXICON

+ the right form of the verb is missing from the lexicon: (1,2),10,13,17,18,21,27
  -> ! need to detect that it's a lexical gap as opposed to an agr-unification failure
  type of error: B.1
+ OOVW: 23(girl; -> Det-N agreement error)
  type of error: A
+ add a different sense (translation) for an existing entry: 15(to->para +que,a), 23(in->en +dentro), 30(look->parecer +mirar)
  type of error: C.1
+ refine lexical entry: 24(would like TO), 29(lexicalized entry "there is/are" -> hay)
  type of error: D.2
+ ser/estar difference (lexical selection?): (4),11,19,29
  type of error: B.2 and/or D.1
+ lexical selection: 30(shining brightly ->! brillaba brillantemente -> vivamente, mucho)
  type of error: D.1

We can try to group these sources of errors into larger problem classes. Here is a first approach to the different cases that users will encounter, and a sketch of the strategy I might need to adopt to refine the rules:

1. Detection (simplest case)
When there is more than one translation, have the user pick the ones that are correct (for each translation, the user can assign a binary label: wrong/correct), or have them set preferences (this is the best, these 2 are ok but not great, etc.).
If the difference between a correct and an incorrect translation is that one or more different words were used, ultimately I need to determine whether they have the same root and the morpheme was wrong (conjugation problem), or whether the root itself was wrong. For now, while there is no morphology module, I can do a letter comparison, measure the edit distance, or see whether they have the same affixes (prefix/suffix). When we have morphology incorporated, we need to detect whether a word is a morphological variant or a different root (hard for irregular verbs).

2. Lexical problems
- lexical ambiguity: the wrong sense of the word has been translated (banco -> bench, bank)
  a. if missing in the lexicon, add the other sense to the dictionary
  b. set up a strategy to determine which one should be the default:
     - I could interact with the user to determine which one should be the default, or
     - I could look at the head word (V or N) and introduce semantic constraints (selectional restrictions). This is harder, since I need to elicit the semantic constraints from the user, and Erik might need to modify the Transfer engine to be able to deal with semantic constraints.
- poor coverage (OOV word -> need to augment the bilingual lexicon)
- wrong register (add a formality constraint?)
- etc.

3.
Structural problems
The source of the problem can be in the automatically learned rules or in the manually written rules; the different kinds of rules need to be tagged in the Transfer engine. Either way, problems that originate in the rules of the MTS should result in at least 2 different cases of learning.
In the situation where Kathrin's RL module generalizes from S1-T1 and S2-T2 and obtains a rule R, and given a new sentence S3 that we think is similar to S1 and S2, we run it through R and obtain T3' (instead of T3), we anticipate the learning to happen in one of these 2 ways (note that this is also valid if R is manually written):

1. Refining the rule R, so that it successfully translates S1, S2 and S3. Examples:
   a. "la casa rojo -> la casa roja", and probably the permanent vs. temporary Quechua example given at the beginning of this section. RR needs to add a new constraint (gender{masc,fem,neut}, perm{+,-}).
   b. different grammatical relations (GR) might be translated differently (in German, object and subject have different cases).

2. Bifurcating or splitting the rule R, so that R translates S1 and S2 (as before) and a new rule R' translates S3. We might need to restrict R so that it doesn't apply to S3. Example:
   a. "casa grande/azul", but "*casa gran" -> "gran casa": need a new rule (R') that applies only to pre-modifier adjectives (premod +). Note that we also need to restrict the application of R to (premod -) adjectives only, so that it doesn't apply to S3. When we encounter a new adjective with the same behaviour, we know we need to tag it with the feature (premod +).

The 2nd case will probably apply to all exception rules [check Marisa's LREC 02 paper on how to deal with diminutives in Spanish]. On the other hand, every time a rule is applied correctly to successfully translate a sentence (once we pick the correct translation from all the alternatives), we should probably increase its weight.
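The bifurcate-or-split case can be sketched in code. This is only a toy illustration under assumed names (TransferRule, bifurcate, and the (premod, +/-) feature pairs are invented for this sketch), not the actual rule formalism or data structures of the Transfer engine:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransferRule:
    name: str
    order: tuple                          # y-side constituent order
    constraints: frozenset = frozenset()  # (feature, value) pairs that must hold
    parent: object = None                 # rule this one was derived from

def bifurcate(rule, new_order, feature):
    """Split `rule`: restrict the original so it no longer applies when
    `feature` is +, and create an exception rule R' that applies only then.
    Both versions point back to the original rule, preserving its history."""
    restricted = TransferRule(rule.name, rule.order,
                              rule.constraints | {(feature, "-")}, parent=rule)
    exception = TransferRule(rule.name + "'", new_order,
                             rule.constraints | {(feature, "+")}, parent=rule)
    return restricted, exception

# R covers "casa grande/azul" (N ADJ); "gran casa" triggers a split on premod.
R = TransferRule("NP", ("DET", "N", "ADJ"))
R_restricted, R_prime = bifurcate(R, ("DET", "ADJ", "N"), "premod")

assert ("premod", "-") in R_restricted.constraints  # R now excludes premod+ adjectives
assert R_prime.order == ("DET", "ADJ", "N")         # R' reorders for premod+ adjectives
assert R_prime.parent is R                          # both link back to the original
```

The point of the sketch is that a split produces two rules, not one: the exception rule R' and a restricted copy of R, exactly as noted above for (premod +) vs. (premod -).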
What I need to do now is find a way to know which of the 2 cases needs to apply, based on the feedback (correction) the bilingual informant gives (through the TCTool), and whether there are any other learning cases that do not fall into either of the two previously mentioned categories.
To be able to refine rules when there is a structural problem, I need Kathrin to provide me with the history of the generalization of the rules. For the manual rules and refined rules, maybe we can put them into the same hierarchical lattice, as if they were seed rules. Kathrin's Rule Learning module makes safe generalizations, and when it is not sure, it makes a hypothesis which gets validated or not through the TCTool.
At run time, Kathrin needs to pass me all the rules applied in the translation, so that I can interact with the user throughout the process. And when I change a rule, I need to keep track of the history of the rule (linked list), so we need to modify Kathrin's DS to support linked lists.
In sum, what I need to do is a diagnosis of what kinds of errors there are for each kind of rule (overgeneralization, missing feature, etc.). If a feature is missing from our learned grammar rules, hopefully we can learn it through the RR module, and this should effectively place the refined rule below the seeded rule space.
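The rule history could be exactly the linked list mentioned above: each refinement prepends a new version, so the RR module can backtrack to an earlier rule when a refinement proves wrong (as in the "el gran banco" backtracking case). A minimal sketch, with RuleVersion, refine, and backtrack as assumed illustrative names rather than Kathrin's actual DS:

```python
class RuleVersion:
    """One node in a rule's history, linked back to the previous version."""
    def __init__(self, rule, reason=None, prev=None):
        self.rule = rule        # the rule body at this point in time
        self.reason = reason    # why it was refined (e.g. "add gender agreement")
        self.prev = prev        # previous version, or None for the seed rule

def refine(head, new_rule, reason):
    """Record a refinement by linking a new version in front of the history."""
    return RuleVersion(new_rule, reason, prev=head)

def backtrack(head, steps=1):
    """Recover an earlier rule; stops at the seed rule if asked to go further."""
    for _ in range(steps):
        if head.prev is not None:
            head = head.prev
    return head

seed = RuleVersion("NP -> DET N ADJ")
v1 = refine(seed, "NP -> DET N ADJ ((x2 gen) = (x3 gen))", "add gender agreement")
v2 = refine(v1, "NP -> DET ADJ N", "local reordering")

assert backtrack(v2).rule == v1.rule              # one step back in the history
assert backtrack(v2, steps=5).rule == seed.rule   # clamps at the seed rule
```

Keeping only a `prev` pointer per version is deliberately minimal; attaching the cross-validation examples mentioned in footnote 4 to each RuleVersion would be a natural extension.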