Rule Refinement for Spoken Language Translation by Retrieving the Missing Translation of Content Words

Linfeng Song, Jun Xie, Xing Wang, Yajuan Lü and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences
{songlinfeng,xiejun,wangxing,lvyajuan,liuqun}@ict.ac.cn

Motivation

• Spoken language translation suffers from a serious problem of missing content words.

    no, you need 10 minutes to go to the main street, (the bus) comes every 10 minutes

• Further investigation shows that this happens because incorrect MT rules are used.

    source: 我 想 买 茶叶 送给 家人 做 礼物 。
    rule:   #X1# 茶叶 #X2# -> #X1# #X2#
    #X1#:   我 想 买 -> I would like to buy
    #X2#:   送给 家人 做 礼物 。 -> souvenir for my family .
    result: I would like to buy souvenir for my family .

  The rule drops the content word 茶叶 ("tea"), so it never appears in the result.

• There is no specific feature in the classic SMT framework to distinguish bad rules from good ones.
• An obvious way to tackle this problem is to find a way to distinguish the bad MT rules from the good ones.

Two rules

    R1: 推荐 的 茶 -> tea recommended    (a good rule)
    R2: 推荐 的 茶 -> tea                (a bad rule that misses the translation of the content word “推荐”)

R2 may be favored by a classic MT system, since it generates a shorter translation.

Our Model

$$\mathrm{Score}(S, T) = \frac{\sum_{s_i \in S} \mathrm{score}(s_i, T)}{\mathrm{count}(s_i)} = \frac{\sum_{s_i \in S} \max_j \mathrm{MI}(s_i, t_j)}{\mathrm{count}(s_i)}$$

Applied to the two rules:

$$\mathrm{score}(R_1) = \frac{\mathrm{MI}(\text{推荐}, \text{recommended}) + \mathrm{MI}(\text{茶}, \text{tea})}{2}$$

$$\mathrm{score}(R_2) = \frac{\mathrm{MI}(\text{推荐}, \text{NULL}) + \mathrm{MI}(\text{茶}, \text{tea})}{2}$$

Training

• Input: a bilingual corpus with word alignment information, for example:

    这里 有 推荐 的 日本 茶 吗
    do you have any japanese tea recommended
    ……

• Phrase pairs are extracted from the aligned corpus:

    推荐 -> recommended
    茶 -> tea
    日本 -> japanese
    日本 茶 -> japanese tea
    …

• Each phrase is marked as a content phrase or not a content phrase with the help of a stoplist (么, 吗, 的, …). Content words are labeled in bold face on the slides.

• Mutual information is computed from counts, where the argument of P stands for e, f, or both:

$$\mathrm{MI}(e, f) = \frac{P(e, f)}{P(e)\,P(f)}, \qquad P(\cdot) = \frac{\#\mathrm{count}(\cdot)}{N}$$

• The result is a correlation table:

    茶    tea             13.76
    茶    Japanese tea     4.89
    …

Two penalties

• Source Unaligned Penalty
  – the number of unaligned source content words in a rule
• Target Unaligned Penalty
  – the number of unaligned target content words in a rule

Experiment

• Data Sets
  – training: 280K CH-EN spoken language sentences
  – tuning: DEVSET2 of IWSLT 2010
  – test: DEVSET3 ~ DEVSET6 of IWSLT 2010
  – the training set is used to train our model

Thanks

Q&A
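
The correlation table, the rule score, and the two penalties described in the slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: counts are taken at the sentence level rather than from word-aligned phrases, N is assumed to be the number of sentence pairs, the target-side stoplist is invented, MI with NULL is assumed to be 0, and each source content word is compared only with the target words it is linked to inside the rule. The function names and the second and third toy sentence pairs are hypothetical.

```python
from collections import Counter

# Source-side stoplist as shown on the slide; the target-side one is an assumption.
SRC_STOP = {"么", "吗", "的"}
TGT_STOP = {"do", "you", "have", "any"}


def mi_table(corpus):
    """Estimate MI(e, f) = P(e, f) / (P(e) P(f)) with P(x) = count(x) / N.

    Assumption: counts are sentence-level co-occurrences of content words
    and N is the number of sentence pairs.
    """
    n = len(corpus)
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        s_words = set(src.split()) - SRC_STOP
        t_words = set(tgt.split()) - TGT_STOP
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update((s, t) for s in s_words for t in t_words)
    return {
        (s, t): (c / n) / ((src_count[s] / n) * (tgt_count[t] / n))
        for (s, t), c in joint.items()
    }


def rule_score(src, tgt, alignment, mi, null_mi=0.0):
    """Average over source content words of the best MI with a linked target word.

    alignment maps a source word to the target words it is linked to in the rule.
    Assumption: an unaligned source content word is paired with NULL,
    whose MI is null_mi.
    """
    tgt_words = tgt.split()
    words = [s for s in src.split() if s not in SRC_STOP]
    if not words:
        return 0.0
    total = 0.0
    for s in words:
        linked = [t for t in alignment.get(s, []) if t in tgt_words]
        total += max((mi.get((s, t), null_mi) for t in linked), default=null_mi)
    return total / len(words)


def unaligned_penalties(src, tgt, alignment):
    """Return (source unaligned penalty, target unaligned penalty) for a rule."""
    src_words = [s for s in src.split() if s not in SRC_STOP]
    tgt_words = [t for t in tgt.split() if t not in TGT_STOP]
    linked_targets = {t for targets in alignment.values() for t in targets}
    src_penalty = sum(1 for s in src_words if not alignment.get(s))
    tgt_penalty = sum(1 for t in tgt_words if t not in linked_targets)
    return src_penalty, tgt_penalty


# Toy corpus: the first pair is from the slides, the other two are made up.
corpus = [
    ("这里 有 推荐 的 日本 茶 吗", "do you have any japanese tea recommended"),
    ("我 想 买 茶", "i want tea"),
    ("有 推荐 的 吗", "anything recommended"),
]
mi = mi_table(corpus)

# The two rules from the slides, with hypothetical rule-internal alignments.
R1 = ("推荐 的 茶", "tea recommended", {"推荐": ["recommended"], "茶": ["tea"]})
R2 = ("推荐 的 茶", "tea", {"茶": ["tea"]})

for name, (src, tgt, align) in (("R1", R1), ("R2", R2)):
    print(name, rule_score(src, tgt, align, mi), unaligned_penalties(src, tgt, align))
```

On this toy data R1 scores higher than R2, and R2 receives a source unaligned penalty of 1 because 推荐 has no translation in the rule. In a real system the score and the penalties would enter the decoder as additional features, which this sketch does not cover.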