Rule Refinement for Spoken Language
Translation by Retrieving the Missing Translation
of Content Words
Linfeng Song, Jun Xie, Xing Wang, Yajuan Lü and Qun Liu
Institute of Computing Technology
Chinese Academy of Sciences
{songlinfeng,xiejun,wangxing,lvyajuan,liuqun}@ict.ac.cn
1
Motivation
• Spoken language translation suffers from a serious
problem of missing content words
no, you need 10 minutes to go to the main
street, (the bus) comes every 10 minutes
2
Motivation
• Further investigation shows that this happens
because incorrect MT rules are used
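source: 我 想 买 茶叶 送给 家人 做 礼物 。 ("I would like to buy tea as a gift for my family.")
rule: #X1# 茶叶 #X2# -> #X1# #X2# (the content word 茶叶, "tea", is left untranslated)
#X1#: 我 想 买 -> I would like to buy
#X2#: 送给 家人 做 礼物 。 -> souvenir for my family .
result: I would like to buy souvenir for my family .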
3
Motivation
• There is no specific feature in the classic SMT
framework to distinguish bad rules from good
ones.
• An obvious way to tackle this problem is to
find a way to distinguish those bad MT rules
from the good ones.
4
Two rules
R1 (a good rule): 推荐 的 茶 -> tea recommended
R2 (a bad rule that misses the translation of the content word “推荐”, "recommended"): 推荐 的 茶 -> tea
5
Two rules
R1: 推荐 的 茶 -> tea recommended
R2: 推荐 的 茶 -> tea
R2 may be favored by a classic MT system,
since it generates a shorter translation result.
6
Our Model
$$\mathrm{Score}(S, T) = \frac{\sum_{s_i \in S} \mathrm{score}(s_i, T)}{\mathrm{count}(s_i)} = \frac{\sum_{s_i \in S} \max_{j} \mathrm{MI}(s_i, t_j)}{\mathrm{count}(s_i)}$$
7
Our Model
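$$\mathrm{Score}(S, T) = \frac{\sum_{s_i \in S} \mathrm{score}(s_i, T)}{\mathrm{count}(s_i)} = \frac{\sum_{s_i \in S} \max_{j} \mathrm{MI}(s_i, t_j)}{\mathrm{count}(s_i)}$$
R1: 推荐 的 茶 -> tea recommended
$$\mathrm{score}(R1) = \frac{\mathrm{MI}(\text{推荐}, \text{recommended}) + \mathrm{MI}(\text{茶}, \text{tea})}{2}$$
R2: 推荐 的 茶 -> tea
$$\mathrm{score}(R2) = \frac{\mathrm{MI}(\text{推荐}, \text{NULL}) + \mathrm{MI}(\text{茶}, \text{tea})}{2}$$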
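The scoring of R1 and R2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rule_score`, the handling of `NULL` as an always-available candidate, the default of 0.0 for unseen pairs, and the MI values for 推荐 are all hypothetical (only the 茶/tea value of 13.76 appears in the slides).

```python
from typing import Dict, Iterable, Tuple

MiTable = Dict[Tuple[str, str], float]


def rule_score(src_content: Iterable[str], tgt_words: Iterable[str], mi: MiTable) -> float:
    """Average, over the source content words, of the best MI with any target word.

    Assumptions (not from the slides): "NULL" is always a candidate target,
    and pairs missing from the table score 0.0.
    """
    src = list(src_content)
    if not src:
        return 0.0
    candidates = list(tgt_words) + ["NULL"]
    total = sum(max(mi.get((s, t), 0.0) for t in candidates) for s in src)
    return total / len(src)


# Hypothetical MI values, except 茶/tea, which is the value shown in the slides.
mi_table: MiTable = {
    ("推荐", "recommended"): 9.0,
    ("茶", "tea"): 13.76,
    ("推荐", "NULL"): 1.0,
}

# Source content words of 推荐 的 茶 are 推荐 and 茶 (的 is on the stoplist).
r1 = rule_score(["推荐", "茶"], ["tea", "recommended"], mi_table)  # R1: tea recommended
r2 = rule_score(["推荐", "茶"], ["tea"], mi_table)                 # R2: tea
```

With any table in which MI(推荐, recommended) exceeds MI(推荐, NULL), R1 scores higher than R2, which is the behavior the feature is meant to provide.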
8
Training
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……
bilingual corpus with word alignment info
9
Training
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……
bilingual corpus with word alignment info
aligned phrase pairs:
推荐 -> recommended
茶 -> tea
日本 -> japanese
日本 茶 -> japanese tea
10
Training
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……
bilingual corpus with word alignment info
stoplist: 么, 吗, 的, …
Content words are labeled in boldface.
Labels: "content phrase" / "not a content phrase"
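Stoplist filtering can be sketched as follows. The stoplist here contains only the three entries visible on the slide; the full list is elided there, so this sketch may keep words that the authors' list would remove. The names `STOPLIST` and `content_words` are hypothetical.

```python
from typing import Iterable, List, Set

# Only the entries shown on the slide; the real stoplist is longer (elided in the source).
STOPLIST: Set[str] = {"么", "吗", "的"}


def content_words(tokens: Iterable[str], stoplist: Set[str] = STOPLIST) -> List[str]:
    """Return the tokens that are not on the stoplist, in their original order."""
    return [t for t in tokens if t not in stoplist]


sentence = "这里 有 推荐 的 日本 茶 吗".split()
kept = content_words(sentence)
```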
11
Training
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……
bilingual corpus with word alignment information
aligned phrase pairs:
推荐 -> recommended
茶 -> tea
日本 -> japanese
日本 茶 -> japanese tea
…
$$\mathrm{MI}(e, f) = \frac{P(e, f)}{P(e)\,P(f)}, \qquad P(\cdot) = \frac{\#\mathrm{count}(\cdot)}{N}$$
where $\cdot$ stands for e, f, or both.
Correlation table:
茶    tea             13.76
茶    Japanese tea    4.89
…
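A minimal sketch of how such a table could be built from extracted phrase pairs is given below. It assumes that N is the total number of extracted pair occurrences, which the slides do not state, and it uses the ratio form of MI exactly as written above (no logarithm). The function name `build_mi_table` and the toy pair list are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source phrase f, target phrase e)


def build_mi_table(pairs: List[Pair]) -> Dict[Pair, float]:
    """Compute MI(e, f) = P(e, f) / (P(e) * P(f)) from pair occurrences.

    Assumption: N is the number of extracted pair occurrences.
    """
    n = len(pairs)
    joint = Counter(pairs)
    src_counts = Counter(f for f, _ in pairs)
    tgt_counts = Counter(e for _, e in pairs)
    table: Dict[Pair, float] = {}
    for (f, e), c in joint.items():
        p_joint = c / n
        table[(f, e)] = p_joint / ((src_counts[f] / n) * (tgt_counts[e] / n))
    return table


# Toy data for illustration only; not counts from the actual corpus.
toy_pairs = [
    ("茶", "tea"),
    ("茶", "tea"),
    ("茶", "japanese tea"),
    ("推荐", "recommended"),
]
mi_table = build_mi_table(toy_pairs)
```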
12
Two penalties
• Source Unaligned Penalty
– the number of unaligned source content words in
a rule
• Target Unaligned Penalty
– the number of unaligned target content words in
a rule
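The two penalties can be sketched as simple counts over a rule's word alignment. The representation below (alignment as a set of source-index/target-index pairs, separate stoplists per side) and the name `unaligned_penalties` are assumptions made for illustration, not details given in the slides.

```python
from typing import List, Set, Tuple


def unaligned_penalties(
    src_words: List[str],
    tgt_words: List[str],
    alignment: Set[Tuple[int, int]],
    src_stoplist: Set[str],
    tgt_stoplist: Set[str],
) -> Tuple[int, int]:
    """Return (source penalty, target penalty) for one rule.

    Each penalty counts the content words (words not on the stoplist)
    on that side that have no alignment link.
    """
    aligned_src = {i for i, _ in alignment}
    aligned_tgt = {j for _, j in alignment}
    src_penalty = sum(
        1 for i, w in enumerate(src_words)
        if w not in src_stoplist and i not in aligned_src
    )
    tgt_penalty = sum(
        1 for j, w in enumerate(tgt_words)
        if w not in tgt_stoplist and j not in aligned_tgt
    )
    return src_penalty, tgt_penalty


src_stop = {"么", "吗", "的"}  # entries visible in the slides; the full list is elided
# R1: 推荐 的 茶 -> tea recommended, with 推荐-recommended and 茶-tea aligned
p_r1 = unaligned_penalties(["推荐", "的", "茶"], ["tea", "recommended"], {(0, 1), (2, 0)}, src_stop, set())
# R2: 推荐 的 茶 -> tea, with only 茶-tea aligned
p_r2 = unaligned_penalties(["推荐", "的", "茶"], ["tea"], {(2, 0)}, src_stop, set())
```

In this sketch R2 receives a source penalty of 1 because 推荐 has no aligned target word, while R1 receives none.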
13
Experiment
• Data Sets
– training : 280K CH-EN spoken language sentences
– tuning : DEVSET2 of IWSLT 2010
– test : DEVSET3 ~ DEVSET6 of IWSLT 2010
– the training set is used to train our model
14
Experiment
15
Thanks
Q&A
16