>> Jianfeng Gao: Good morning. We are very glad to have Yang Liu here to give a
presentation about his work. Let me give a very brief introduction to Yang Liu.
Yang Liu is an assistant researcher at the Institute of Computing Technology (ICT), Chinese
Academy of Sciences. He received his Ph.D. degree in computer science from ICT in 2007. His major
research interests include statistical machine translation and Chinese information processing.
He has been working on syntax-based modeling, word alignment and system combination. His
paper on tree-to-string translation won the
Meritorious Asian NLP Paper Award at ACL 2006. He has served as a reviewer for TALIP,
TSLP, COLING, ACL, EMNLP, AMTA and SSST. Let's welcome Yang Liu.
[applause]
>> Yang Liu: Thanks a lot for the generous introduction. Hello everyone. It's my honor to be
here and give a talk to introduce our major work on statistical machine translation.
The title of this talk is "An Overview of Tree-to-String Translation Models." This is the outline of my
talk. First I will give a brief introduction to our group. Then I will present four tree-to-string
translation models: tree-based, tree sequence-based, forest-based and context-aware. And the
talk ends with a conclusion.
Our institution is called the Institute of Computing Technology, Chinese Academy of Sciences. It is
located in Beijing, China. Our NLP group is led by Professor Qun Liu, and there are five
faculty members and over 20 Ph.D. and master's students in our group. Our research areas include
machine translation, lexical analysis, information retrieval and information extraction.
We started our SMT research in 2004, and we have been working on the following directions:
syntax-based models, maximum entropy-based reordering, rule selection, word alignment,
system combination and domain adaptation.
We published a number of ACL and EMNLP papers on syntax-based models. We have proposed a
series of tree-to-string models in the recent three years: the tree-based model in ACL 2006, the tree
sequence-based model in ACL 2007, the forest-based model in ACL and EMNLP 2008,
the context-aware model in EMNLP 2008, and a dependency-based tree-to-string model in WMT
2007.
We have a paper published in ACL 2009 about forest-based tree-to-string translation. And we have
also published many papers in other directions. We proposed a maximum entropy-based reordering
model for BTG, that is, bracketing transduction grammar, and we also used a maximum entropy model
to take contextual information into account to help select appropriate rules in decoding, both for
Hiero and the tree-to-string model.
And we proposed one of the first discriminative word alignment methods in ACL 2005. And
this year we proposed to use weighted alignment matrices to help improve statistical machine
translation.
Regarding system combination, we have two papers published this year. One paper is about
joint decoding with multiple translation models, where we try to directly combine different systems
in the decoding phase.
In other words, we try to derive a joint decoder for multiple systems. Another paper is about
replacing confusion networks with lattices in system combination.
And we have a paper about domain adaptation in EMNLP 2007. We have been very active in
recent machine translation evaluations. In this year's NIST evaluation we participated in the
Chinese-to-English track and won first place in system combination.
And we also achieved very good results in last year's IWSLT evaluation. We have
successfully turned our research into products. We collaborated with some corporations and
developed a patent translation system. We also developed an SMT system that translates travel
expressions in real time on mobile devices.
Now I begin to introduce our four tree-to-string models. So what is tree-to-string? In our
tree-to-string model, there is a tree on the source side, and the target side is a string.
Often we use word alignment to indicate the correspondence between the tree and the string. So
our hope is that we can exploit the syntactic information on the source side to direct translation.
So our tree-to-string model is closely related to the work by USC/ISI and Microsoft Research. In
2001, Yamada and Knight proposed a noisy channel model for string-to-tree translation, and in
2004 Galley and others proposed the GHKM algorithm to extract string-to-tree rules
automatically from annotated training data. And in 2005 Quirk and others proposed a treelet
system. They use a dependency tree on the source side.
And in 2006 Galley extended his GHKM algorithm to obtain composed rules and handle unaligned
words in a better way. So inspired by Quirk's and Galley's work, we proposed our tree-to-string
model in 2006.
So similar to Galley's work, we use a phrase structure tree, but in the reverse direction. And like
Quirk's work, we put emphasis on the syntax on the source side rather than on the target side.
And later that year Liang Huang also proposed a very similar tree-to-string model, and
we think the two works are actually equivalent. So based on our work in 2006, we proposed three
extended tree-to-string models: tree sequence-based in 2007, forest-based in 2008 and the
context-aware model in 2008. Now I will first introduce the original tree-based model and then give
a brief introduction to the three extended models.
There are two basic problems in the tree-to-string model. First, how to extract tree-to-string rules
automatically from annotated training data. And second, how to decode with the
extracted rules.
So we will first discuss the rule extraction algorithm. The input of our algorithm is a
training example: a source-side parse and a word-aligned sentence pair. So how do we extract
tree-to-string rules from this training example?
Basically our algorithm is quite similar to that of Hiero, the hierarchical phrase-based system.
Recall that when extracting hierarchical phrases, they first identify an initial phrase pair and then
subtract smaller phrase pairs to obtain rules with variables.
So the difference here is that we require that there must be a tree over the source phrase. So it
is a tree-string pair rather than a string-string pair. For example, consider this Chinese word
[chinese]: we will examine whether there is a tree that dominates this phrase.
So we can find a tree rooted at the node NR. So this is a syntactic phrase. And we can find the
corresponding phrase [chinese] with the alignment information. So this is a tree-string pair.
And this pair is consistent with the alignment, so we can extract a tree-to-string rule here: directly
use this tree-string pair as a rule. The left-hand side of this rule is a source tree, and the
right-hand side is a string. So it is a tree-to-string rule.
And similarly, for the second source word, we can find a tree rooted at the node P, and the
corresponding target phrase is "with." So it's also a tree-to-string rule.
And for [chinese], similarly, we can also extract a rule. However, for the source word [chinese],
we cannot extract a tree-to-string rule, because the source word [chinese] outside the tree-string
pair is aligned to the target word [inaudible] inside. So it is not consistent with the word alignment,
and we cannot extract a rule here.
Similarly, we cannot extract a rule for [chinese], because [chinese] is aligned inside here.
And for [chinese] we can also extract a rule. And then we will examine the two-word phrases. So
first we consider this phrase [chinese], and we find that there is no tree that dominates this
source phrase. So this is not a syntactic phrase, and we cannot extract a rule here.
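As an aside, the consistency check just described can be sketched in a few lines of Python. The
function and the toy alignment below are illustrative, not the speaker's actual implementation:

```python
def is_consistent(src_span, tgt_span, alignment):
    """A (source span, target span) pair is consistent with the word
    alignment iff no link crosses the phrase-pair boundary and at least
    one link falls inside it.  Spans are inclusive word-index ranges;
    alignment is a list of (source index, target index) links."""
    s_lo, s_hi = src_span
    t_lo, t_hi = tgt_span
    inside = False
    for s, t in alignment:
        src_in = s_lo <= s <= s_hi
        tgt_in = t_lo <= t <= t_hi
        if src_in != tgt_in:  # a link crosses the phrase-pair boundary
            return False
        inside = inside or (src_in and tgt_in)
    return inside
```

For example, with source words numbered 0-5 and target "Bush held a talk with Sharon", a
plausible alignment is `[(0, 0), (1, 4), (2, 5), (3, 1), (5, 3)]`: the pair of spans (1, 2) and
(4, 5) ("with Sharon") is consistent, while (0, 1) against (0, 0) is not, because the link (1, 4)
crosses the boundary.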
And for [chinese], we can find a tree over the source phrase, and the target phrase is "with
Sharon". So we can extract a rule here. And then, like Hiero, we can subtract sub tree-string
pairs to obtain rules with variables.
For example, we can remove [chinese] and NR here and "Sharon", replace them with a
variable X, and obtain a rule with variables.
So in this way we can extract many rules from this training example. Some rules just have
terminals, some have both terminals and non-terminals, and some just have
non-terminals. Okay. So the second problem is how to decode with these tree-to-string rules. The
input of our decoder is a source tree. So our job is to decompose the tree into many small
tree fragments, and then use the extracted rules to match these tree fragments and form the
translation.
So our decoder runs in a bottom-up order. First we consider the node NR and we search in the
rule table, trying to find a rule that matches the tree rooted at NR. Suppose that we find a rule in
the table, [chinese] to [chinese], and the source tree of the rule exactly matches the tree rooted at
NR.
So we can take the target side of the rule, [chinese], as the translation for the node NR. And
then we consider its parent NPB. Suppose we find a rule that has one variable, and this rule just
partially matches the tree rooted at NPB.
So according to the rule, we can replace X-1 with the translation of NR. So the translation for
NPB is also [chinese]. Similarly, we can find the translation for P; the translation is "with." And for
NR it's also an exact match, so its translation is "Sharon". And for NPB it's also "Sharon." And for
the node PP, the rule partially matches the tree, so we can replace X-1 with the translation of
NPB, "Sharon," and its translation is "with Sharon." So we can translate the other nodes in a similar
way.
Each node gives the rule used. So the translations are "hold," "has," "talk," "had a talk," and "had
a talk with Sharon." So finally, for the root node IP, we find a rule that partially matches the tree.
And then, according to the rule, we can replace X-1 with the translation of NPB and replace X-2 with
the translation of [inaudible]. So its translation is "Bush had a talk with Sharon." So actually this is
just a toy example to illustrate what the decoding process is.
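To make the bottom-up process concrete, here is a toy Python sketch. The tree encoding, the
one-level rule signatures, and the sample rule table are my own simplifications; the real decoder
matches multi-level tree fragments and keeps k-best hypotheses:

```python
class Node:
    """A parse-tree node: internal nodes have children, leaves carry a word."""
    def __init__(self, label, children=None, word=None):
        self.label = label
        self.children = children or []
        self.word = word

def signature(node):
    """One-level pattern used to look up rules (a real tree-to-string
    decoder matches arbitrary-depth tree fragments, not just one level)."""
    if not node.children:
        return f"({node.label} {node.word})"
    return f"({node.label} {' '.join(c.label for c in node.children)})"

def decode(node, rules):
    """Bottom-up: translate the children first, then substitute their
    translations for the variables x1, x2, ... in the matched rule.
    (With 10+ variables a real implementation must avoid x1/x10 clashes.)"""
    subs = [decode(child, rules) for child in node.children]
    out = rules[signature(node)]
    for i, s in enumerate(subs, 1):
        out = out.replace(f"x{i}", s)
    return out
```

A hypothetical run over a two-branch tree, with a made-up rule table:

```python
tree = Node("IP", [Node("NPB", [Node("NR", word="布什")]),
                   Node("VP", [Node("VV", word="会谈")])])
rules = {"(NR 布什)": "Bush", "(NPB NR)": "x1",
         "(VV 会谈)": "had a talk", "(VP VV)": "x1",
         "(IP NPB VP)": "x1 x2"}
decode(tree, rules)  # → "Bush had a talk"
```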
In our real decoder we will not store a single string at each node. Actually, we have [inaudible] for
efficiency. So we compared our tree-based model with Pharaoh on the NIST 2005
Chinese-to-English test set, and the absolute improvement is about 0.9 BLEU points; the
difference is statistically significant.
Okay. Although very promising, this model faces several problems. First, the tree-based
tree-to-string model imposes a syntactic constraint on the source side, requiring that there must be
a source tree over the source phrase.
This makes many bilingual phrases inaccessible, which decreases translation quality
dramatically. Second, parsing is very important for syntax-based models: if the parse is
wrong, the translation will be wrong too. The third problem is that
the tree-based model does not take contextual information into account.
Currently, this is not a big problem, because many systems are context-free. But it will be very
interesting if we can take contextual information into the tree-to-string model.
So accordingly, we proposed three extended tree-to-string models, the tree sequence-based model,
the forest-based model and the context-aware model, to alleviate the three problems.
The tree sequence-based model is designed to alleviate the rule coverage problem. As
mentioned above, tree-based tree-to-string model requires that there must be a tree over the
source phrase. So for this phrase [chinese], we cannot find a tree that dominates this source
phrase. So [chinese] dominates [chinese]. And AS dominates [chinese]. And [chinese]
dominates [chinese]. There's no tree that can subsume this source phrase.
So in the tree-based tree-to-string model we cannot extract a tree-to-string rule here. However,
this bilingual phrase, [chinese] and [chinese], is consistent with the alignment. So it is a valid
bilingual phrase. It can be used by Moses or by Pharaoh, but it cannot be used by our system.
And [inaudible] and others (2006) report that about 28 percent of phrase pairs are not syntactic on
English/Chinese data. So losing such nonsyntactic phrase pairs will decrease translation quality
dramatically.
So it's very important for the tree-to-string model to capture such nonsyntactic phrase pairs. So
our solution --
>>: Excuse me, just to be clear, though. You can capture a phrase with a hole in it that says
[chinese] with a noun phrase that gets translated?
>> Yang Liu: Yes, that's true.
>>: But you can't find just the [chinese].
>> Yang Liu: Yes.
>>: Requiring it be followed by a noun phrase.
>> Yang Liu: Yes, but we require more context. It must contain [chinese], yeah, we can include
this information by a bigger rule.
>>: And you can also then generalize out the [chinese] to just be any noun, right?
>> Yang Liu: Yeah, yes.
>>: Okay.
>> Yang Liu: Yes.
>>: But then you require it to have a single object, for instance, and it can't have -- it's a brittle
rule?
>> Yang Liu: Yes.
>>: Okay. Thanks.
>> Yang Liu: So our solution is that we allow several trees rather than a single tree here. We
call this a tree sequence; it's just a sequence of trees. So if we allow two trees over the
source phrase, this will be a valid tree sequence rule.
[inaudible] is not a valid tree-to-string rule. So this is a new rule; we call it a
tree-sequence-to-string rule. So now in our new model we extract both tree-to-string rules and
tree-sequence-to-string rules.
We use tree-to-string rules to capture syntactic phrase pairs and tree-sequence-to-string rules to
capture nonsyntactic phrase pairs. So in principle there's no loss in rule coverage. We can use all
the bilingual phrases that can be used by Moses.
So in decoding, our input is still a source tree, and we use both tree-based and tree
sequence-based rules to match the input tree. Due to the time limit, I will not discuss the rule
extraction algorithm and the decoding algorithm in detail here.
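Though the details are skipped here, the core coverage idea can be sketched: given the span of
every node in the parse, a nonsyntactic phrase can still be tiled by a sequence of adjacent
subtrees. The greedy tiling below is my own illustration, not the actual extraction algorithm:

```python
def cover_with_tree_sequence(span, node_spans):
    """node_spans: list of (label, start, end) for every node in the parse,
    with inclusive word spans.  Greedily cover `span` left to right with the
    widest subtree starting at each position; returns the labels of the
    sequence, or None if the span cannot be tiled by subtrees.  A result of
    length 1 corresponds to an ordinary (syntactic) tree-to-string rule."""
    start, end = span
    seq, pos = [], start
    while pos <= end:
        # widest subtree starting exactly at pos and not extending past end
        candidates = [(e, label) for label, s, e in node_spans
                      if s == pos and e <= end]
        if not candidates:
            return None
        e, label = max(candidates)
        seq.append(label)
        pos = e + 1
    return seq
```

With a hypothetical parse whose node spans are `[("IP", 0, 3), ("NPB", 0, 0), ("PP", 1, 2),
("P", 1, 1), ("NR", 2, 2), ("VPB", 3, 3)]`, the span (1, 2) is covered by the single subtree PP,
while the nonsyntactic span (2, 3) needs the two-tree sequence NR VPB.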
We compared our tree sequence-based model with Pharaoh and with our tree-based model on the
NIST 2005 Chinese-to-English test set. The absolute improvement over Pharaoh is about 2.2 BLEU
points, and the improvement over the tree-based model is about one BLEU point. Okay.
>>: I know you didn't want to talk about decoding, but can you tell us if there's a computational
hit, or is it basically --
>> Yang Liu: Yes, yes. I mean, the decoding speed is much slower. Maybe five times slower.
>>: Okay.
>> Yang Liu: Yeah. Okay. I can give some detail here. In our tree-based decoder, we just visit
every node here. But in our tree sequence-based decoder, for every span there is a tree or a tree
sequence, so we will visit more spans than the tree-based decoder. So it will be more
computationally expensive.
>>: With a single tree, there's only a linear number of nodes but a quadratic number of spans,
order of magnitude?
>> Yang Liu: Yeah. So actually in our tree sequence-based decoder we use a chart to store the
[inaudible]. And in the tree-based decoder we just use stacks associated with each node here.
>>: So is this considered tree sequence?
>> Yang Liu: No.
>>: Must be the node.
>> Yang Liu: Yeah. This is not consistent with the alignment, because this is aligned here and
some phrase is aligned here. So it's not consistent with the alignment.
>>: Isn't the tree sequence very similar to the phrase of --
>> Yang Liu: Yeah, actually --
>>: So is it equivalent, if we extract only the tree sequence? They should be the same as --
>> Yang Liu: Actually, it depends on the definition of tree sequence. In our original definition in
this work, we require that there must be a tree over the phrase.
But [inaudible] applied our tree sequence to tree-to-string translation, so the definition is
much looser: they treat the phrase pair as a tree sequence, so there need not be a tree over the
source phrase.
So in this paper I don't think that the tree sequence rule is equivalent to a phrase pair, because
there must be a tree over --
>>: So given B aligns to pair [phonetic], just assume.
>> Yang Liu: Assume that there's a link.
>>: That there's a link, even in that case [chinese].
>> Yang Liu: Cannot be translated.
>>: Says no tree.
>> Yang Liu: No, no. Because you don't hear --
>>: There's no link here, but there's a link here.
>> Yang Liu: Yeah, yeah. We can extract a tree sequence rule here. Yeah, we can. It's
just word alignment. So we can only translate -- if it's aligned to here and there are no links
aligned to here, we can extract.
>>: Even if they don't belong to the same subtree?
>> Yang Liu: Yes. And the third model is the packed forest-based model. We know that parsing is
very important for syntax-based models. The state-of-the-art accuracy for English is around 90
percent, and for Chinese it's around 85 percent.
And parsing accuracy will go down dramatically when handling real-world text, because of the
domain change. So we propose to use packed forests to replace one-best trees in our model. This
idea was inspired by Quirk and others (2005), who mentioned we can replace the tree with a packed
forest.
So actually there are many parse trees for a sentence. Suppose for this sentence we have two
different parse trees; then we can pack the two trees into one structure by sharing common
nodes and [inaudible]. This is called a packed forest. A packed forest can store
exponentially many trees in just polynomial space, so we can replace the one-best trees with
packed forests in our tree-to-string model, both in rule extraction and in decoding.
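The packing idea can be illustrated with a tiny hypergraph in Python. The class and field names
are mine; a real forest also stores labels, spans and scores on each hyperedge:

```python
from math import prod

class ForestNode:
    """A node in a packed forest.  Each entry of `incoming` is one
    hyperedge, given as the tuple of its tail (child) nodes; a node with
    several incoming hyperedges packs several alternative subtrees."""
    def __init__(self, label, incoming=()):
        self.label = label
        self.incoming = list(incoming)

def count_trees(node, memo=None):
    """Number of distinct trees packed under `node`.  Runs in time linear
    in the number of hyperedges even though the answer can be exponential,
    which is exactly why the packed representation is compact."""
    memo = {} if memo is None else memo
    if id(node) not in memo:
        memo[id(node)] = 1 if not node.incoming else sum(
            prod(count_trees(tail, memo) for tail in edge)
            for edge in node.incoming)
    return memo[id(node)]
```

Packing the two toy parses gives a root with two incoming hyperedges, so it stores two trees in
one shared structure:

```python
a, b, c = ForestNode("NPB"), ForestNode("CC"), ForestNode("VPB")
x = ForestNode("NP", [(a, b)])
y = ForestNode("VP", [(b, c)])
root = ForestNode("IP", [(x, c), (a, y)])
count_trees(root)  # → 2
```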
For extracting tree-to-string rules, now we have a forest-string pair with word alignment. So
how do we extract tree-to-string rules from this training example? Can we still use the rule
extraction algorithm for the tree-based model? The answer is no. Recall that for tree-based rule
extraction we need to first identify a source phrase, then examine whether there is a tree that
subsumes the source phrase, and then try to find the target phrase.
However, in the packed forest there are exponentially many trees over a source phrase. This is
just a toy example; we have two trees that dominate this sentence. In our real training data,
often we have millions of trees over a string. So it's impractical to enumerate all the trees
explicitly to find the tree-string pairs.
So instead we resort to the GHKM algorithm proposed by Michel Galley. The most important
idea in his algorithm is to find the so-called frontier nodes, which indicate where to cut the forest
to obtain tree fragments that form the tree-to-string rules. So the red nodes are frontier nodes.
So what is a frontier node? It's actually very simple: if the phrase pair subsumed by a node is
consistent with the alignment, it's a frontier node. So the node VPB dominates the phrase pair
[chinese] and "had a talk", and this is a valid bilingual phrase which is consistent with the
alignment. So VPB is a frontier node.
On the contrary, for the node NP, the corresponding phrase pair is [chinese] and "Bush
had a talk with Sharon." It's not consistent with the alignment, because [chinese] outside the
tree-string pair is aligned inside. So NP is not a frontier node.
So now, given the forest annotated with frontier nodes, how do we extract tree-to-string rules?
It's very simple. First we visit every frontier node. For example, the node over [chinese] and
[chinese] has two incoming hyperedges. Consider the first hyperedge: it has two tail nodes. The
first is NPB and the second is [chinese], and both NPB and [chinese] are frontier nodes. So we can
stop here, check the alignment information, and extract a rule there.
This is called a minimal rule. Then for the second hyperedge, we first check the
hyperedge, and its two tail nodes are NP and VPB. VPB is a frontier node, but NP is not a
frontier node. So we have to keep examining its incoming hyperedges, and we arrive at NPB, CC and
NPB. All three nodes are frontier nodes, so we can stop here and extract a tree-to-string
rule.
Similarly, for the node VPB, its two tail nodes are frontier nodes. And this node NR also gives a
valid tree-to-string rule. So in this way we can extract all minimal rules, and then we can combine
the minimal rules in different ways to obtain composed rules.
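The frontier-node test itself reduces to the same consistency idea. Here is a small Python sketch
over an alignment reconstructed to resemble the slide's example, so treat the data as illustrative:

```python
def is_frontier(src_span, alignment):
    """A node whose subtree covers src_span (inclusive) is a frontier node
    iff the phrase pair it subsumes is consistent with the alignment: no
    target word in the closure of its target span may link back to a
    source word outside src_span."""
    s_lo, s_hi = src_span
    targets = [t for s, t in alignment if s_lo <= s <= s_hi]
    if not targets:
        return False  # no aligned target words at all
    t_lo, t_hi = min(targets), max(targets)
    return all(s_lo <= s <= s_hi
               for s, t in alignment if t_lo <= t <= t_hi)
```

With source words 0-5 and target "Bush had a talk with Sharon" aligned as
`[(0, 0), (1, 4), (2, 5), (3, 1), (5, 3)]`, the VPB span (3, 5) is a frontier node, but an NP
over (0, 2) (the "Bush and Sharon" reading in the other parse) is not, since its target closure
pulls in words aligned outside the span.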
For decoding, the input is a packed forest rather than a tree. So how do we match the forest to
find the translation? Actually, the decoding algorithm is almost the same; the difference is that a
node might have multiple incoming hyperedges. So again we consider the node NR: we find a rule
that translates [chinese] to "Bush", the rule exactly matches the tree rooted at NR, and we take
the target side, "Bush", as the translation of NR.
So if a node has just one incoming hyperedge, like NR, NPB and PP, the decoding algorithm is
actually the same as the tree-based decoder. So we can easily find the translations, like the
tree-based decoder, for these nodes.
"Sharon" and "Sharon" and "Bush and Sharon" for these three nodes, and the PP is "with Sharon".
"Hold," "has," "talk," "talk," "had a talk." So the difference is the root node: IP has two
different incoming hyperedges. Our strategy is to handle the hyperedges individually. First we
consider this hyperedge, and we search in the rule table, trying to match this hyperedge.
Suppose that we have a rule that matches this hyperedge; then we can replace X-1 with the
translation of NPB and X-2 with the translation of VP. And the translation is "Bush had a talk with
Sharon."
And then we consider the second hyperedge. Suppose that a rule matches the forest like this; then
we replace X-1, X-2, X-3, X-4 with the translations of the already translated nodes.
So the second translation is "Bush and Sharon had a talk." So actually this is also a toy example.
In our real decoder we use the rule table to convert the packed forest into a translation forest.
In a translation forest, each hyperedge is associated with a tree-to-string rule rather than a CFG
rule. And then we decode on the translation forest using a language model to output the one-best
and k-best derivations. I will not give the details here.
This slide gives our major results. The column indicates where the rules are extracted from:
one-best trees, 30-best trees, or the packed forest. And the row indicates what we decode on: the
input is a one-best tree or the packed forest.
So we can see that if we use one-best trees in both rule extraction and decoding, the BLEU score
is about 25. And if we use the packed forest in both rule extraction and decoding, the BLEU score
is about 28.
So the improvement is about 2.5 BLEU points, which is very significant. And also, if we replace
one-best trees with the packed forest in either training or decoding, we can also obtain a
significant improvement.
And our forest-based decoder also outperformed Hiero, the state-of-the-art
hierarchical phrase-based system. The third extended model we call the context-aware model. In
machine translation, a source word might have multiple target words as translations, and a
source phrase might have multiple target phrases.
So in our tree-to-string model, for a source tree we might have multiple target strings:
"had X-1", "held X-1", "X-1 took place", and so on.
So in decoding, which rule should be used? Conventionally we use four probabilities: relative
frequencies in two directions and lexical weights in two directions.
And maybe we resort to the language model to choose the right rule. But we argue that which
candidate should be used in decoding should depend on the context.
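Those relative-frequency scores are simple counts over the extracted rule instances; a minimal
sketch follows (the rule strings are placeholders, and the two lexical-weight scores are omitted):

```python
from collections import Counter

def relative_frequencies(extracted_rules):
    """extracted_rules: list of (lhs_tree, rhs_string) pairs, one per
    extraction event.  Returns P(rhs|lhs) and P(lhs|rhs) by relative
    frequency -- two of the four conventional rule scores (the other
    two are the lexical weights, not shown here)."""
    pair = Counter(extracted_rules)
    lhs = Counter(l for l, r in extracted_rules)
    rhs = Counter(r for l, r in extracted_rules)
    p_rhs_given_lhs = {(l, r): c / lhs[l] for (l, r), c in pair.items()}
    p_lhs_given_rhs = {(l, r): c / rhs[r] for (l, r), c in pair.items()}
    return p_rhs_given_lhs, p_lhs_given_rhs
```

For example, if one left-hand side is extracted three times with "had a talk" and once with
"held a talk", then P("had a talk" | lhs) is 0.75.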
If the surrounding context changes, the right rule should change. So for each left-hand side,
I mean the tree, we will build a maximum entropy classifier that takes contextual information to
select the best right-hand side. The basic idea is quite simple. At training time we
memorize the surrounding context and encode the contextual information into a maximum entropy
model. And then in decoding, for each rule, we examine the surrounding context
and calculate a score for each rule using the maximum entropy classifier.
So we call this the context-aware tree-to-string model. We design many features to capture the
contextual information.
So suppose that this is a training example, and we can extract a rule there; we just replace this
part with a variable. So when extracting this rule, we will memorize some contextual information.
The first feature used in our maximum entropy model is called external boundary words: we
care about the left neighbor of the source phrase and the right neighbor of the source phrase.
Another feature is the boundary words of the subtracted source phrase for
the variable; in this case it's [chinese]. And we think the parts of speech of the neighbors, which
we call the external boundary parts of speech, might be useful for selecting rules. In this case
they are NR and VV. And similarly, the parts of speech of the internal boundary words.
And we also care about how many words are in the subtracted source phrases, and the parent
node of the tree, and the sub-nodes of the tree. So we collect all this information at training
time, encode it in the maximum entropy model, and collect many training examples to
train on the training data.
And in decoding we use this maximum entropy classifier to decide which rule should be
used.
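A sketch of how such features might be collected for one rule application. The feature names and
encoding here are illustrative; the paper's exact feature set differs:

```python
def context_features(src_words, pos_tags, span, sub_spans, parent_label):
    """Collect context features for a rule applied over `span` (inclusive
    source word indices).  `sub_spans` are the spans subtracted as
    variables; `parent_label` is the label of the parent tree node."""
    lo, hi = span
    feats = {"parent": parent_label}
    # external boundary words and their parts of speech
    if lo > 0:
        feats["left_word"] = src_words[lo - 1]
        feats["left_pos"] = pos_tags[lo - 1]
    if hi < len(src_words) - 1:
        feats["right_word"] = src_words[hi + 1]
        feats["right_pos"] = pos_tags[hi + 1]
    # internal boundary words of each subtracted phrase, plus its length
    for i, (s, e) in enumerate(sub_spans, 1):
        feats[f"var{i}_first"] = src_words[s]
        feats[f"var{i}_last"] = src_words[e]
        feats[f"var{i}_len"] = e - s + 1
    return feats
```

Each such feature dictionary, paired with the right-hand side actually used, becomes one training
event for the maximum entropy classifier of that left-hand-side tree.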
>>: Can I ask a clarification? Do you have one classifier for each source configuration?
>> Yang Liu: Yes, so we have many, many classifiers.
>>: A lot of the source configurations may be very sparse in data. I've only seen a very small
number of instances of one source configuration. So you won't have many examples to learn
your classifier from.
>> Yang Liu: All right. So in this slide I don't have any example.
>>: No, no. I'm saying that if you've only seen, say, four examples, four in the training data,
you've only seen four times a particular source configuration, then you have very little data to
train a classifier.
>> Yang Liu: Yes, that's true. Yes. That's a problem. Because I mean the training example
occurs so infrequently.
>>: Right.
>> Yang Liu: Yeah. So actually our paper did not investigate the effect of the training corpus
size. So I guess if we use a larger training corpus, maybe we could have more accurate
estimation.
>>: The larger the training data, the more classifiers you need to train. But for each classifier
you will also have more training data.
>> Yang Liu: Yes. For each left-hand side -- for each tree we have to train one.
>>: Perhaps you could transfer across these subtrees.
>> Yang Liu: Oh, yeah.
>>: And just to get more training for each classifier.
>> Yang Liu: Yeah, I think this is good information.
>>: Before we move on. You talked a lot about -- a lot of the features you've shown don't
depend on what prediction you're making.
>> Yang Liu: You mean the target side?
>>: Yeah, most of your features are features of the source side.
>> Yang Liu: Source side, yes.
>>: How do those get tied to each possible target side? Do you treat each target side as a
separate class or do you pick words individually?
>> Yang Liu: Actually, on the target side we have the n-gram language model, so the model
can capture some non-local dependencies. And another problem is that if we design features
on the target side, we cannot use dynamic programming in decoding.
>>: Right. Right. So I'm just saying, for a given rule, you're going to have a source side.
Maybe I missed this in some of your features. But for a given rule you'll have a source side
and then a bunch of target sides -- ignoring the rest of the sentence, just a bunch of possible
translations for that source.
>> Yang Liu: Yeah.
>>: Do you treat those each as a distinct class or do you look at them --
>> Yang Liu: Yeah, distinct class.
>>: Like option one, option two?
>> Yang Liu: Yes. So every candidate target string is a target class in the classifier.
>>: Can I have some idea of how many classifiers are in there?
>> Yang Liu: How many classifiers? Actually, if a tree has just one string in the rule table, we
will not train a classifier. I don't know the exact number.
>>: [inaudible].
>> Yang Liu: Maybe thousands, yeah. Maybe thousands on this corpus.
And we compared our context-aware model with the context-free one, that is, the tree-based model.
On the NIST 2003 test set the improvement is about 0.9 BLEU points, and on the NIST 2005 test set
the improvement is about 1.2 BLEU points.
And our tree-to-string models have become popular in the recent two years. In ACL 2008,
Min Zhang [phonetic] from Singapore applied our tree sequence to the tree-to-string model, and
they obtained a very significant improvement. Actually, their tree sequence-based tree-to-string
system outperformed Moses.
I think this might be the first tree-to-string system to outperform Moses. And this year, in ACL
2009, Hui Zhang [phonetic], also from Singapore, combined the tree sequence and the packed forest
together for tree-to-string translation. I haven't read the paper yet, but I think it is an
interesting paper.
And also in ACL 2009 we applied the packed forest to tree-to-string translation, and we also
obtained a very significant improvement.
So to conclude, our tree-to-string translation model is one of the syntax-based models. We put
emphasis on the syntax on the source side, and on the target side it's just a string. We
have presented a series of tree-to-string models: tree-based, tree sequence-based, forest-based
and context-aware. Our work has had an increasing impact in the community, and many
researchers are interested in our work and follow up this direction. Okay. Thanks.
[applause]
Any questions?
>>: Give us a very quick description of the system you used in this year's competition.
>> Yang Liu: This evaluation?
>>: In this evaluation. The tree-to-string, what is the --
>> Yang Liu: In this year's NIST evaluation we used about four single systems. One is this one,
the forest-based tree-to-string system. And another is a reimplementation of Hiero, the
hierarchical phrase-based system. And another is BTG.
Yes, it's a phrase-based system.
>>: So using the maximum entropy model to reorder.
>> Yang Liu: To reorder, yeah. And another is Moses. And we developed a
new system combination technique, and our improvement over the single best system was three
points in the NIST evaluation.
>>: What is the single best system?
>> Yang Liu: I don't know. Maybe --
>>: Which system?
>> Yang Liu: Maybe the tree-to-string, the forest-based tree-to-string. We used about six million
sentence pairs to train our tree-to-string model, and we used our in-house parser to parse
the Chinese text.
>> Jianfeng Gao: Let's thank the speaker again.
[applause]