Searching Questions by Identifying Question Topic and Question Focus

Huizhong Duan, Yunbo Cao, Chin-Yew Lin and Yong Yu
Shanghai Jiao Tong University & MSRA
ACL 2008
2008/7/9
Rick Liu

Questions & their answers
 • A very large archive
 • Built up by

Example
 • Traditional FAQ services

Q&A services
▪ Emerging
▪ Yahoo! Answers, Live QnA, Baidu Zhidao

Question Search
Help users to search previous answers
Any nice hotels in Berlin or Hamburg?
How long does it take to Hamburg from Berlin?
Cheap hotels in Berlin?

Identifying question topic & focus
 • Question tree
 • Determining the tree cut

Modeling question topic & focus for search
 • Language model

Topic terms
 • BaseNP, WH-ngram

Topic profile
 • Probability distribution of categories for a given topic term

Specificity
 • Inverse of the entropy of the topic profile

Topic chain
 • Topic terms ordered by specificity value (descending)

Topic tree
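
The definitions above can be sketched in a few lines. This is only an illustration under stated assumptions: the input format (a category label plus a list of topic terms per question), the function names, and the small constant `epsilon` that keeps the inverse finite when the entropy is zero are not given on the slide.

```python
import math
from collections import Counter, defaultdict


def topic_profiles(questions):
    """Estimate P(category | term) for every topic term.

    questions: iterable of (category, list_of_topic_terms) pairs
    (an assumed input format).
    """
    counts = defaultdict(Counter)
    for category, terms in questions:
        for term in set(terms):  # count each term once per question
            counts[term][category] += 1
    profiles = {}
    for term, cat_counts in counts.items():
        total = sum(cat_counts.values())
        profiles[term] = {c: n / total for c, n in cat_counts.items()}
    return profiles


def specificity(profile, epsilon=0.001):
    """Inverse of the entropy of a topic profile.

    epsilon is an assumed smoothing constant that avoids division by zero
    when a term occurs in a single category.
    """
    entropy = -sum(p * math.log2(p) for p in profile.values() if p > 0)
    return 1.0 / (entropy + epsilon)


def topic_chain(terms, profiles):
    """Order a question's topic terms by descending specificity."""
    return sorted(terms, key=lambda t: specificity(profiles[t]), reverse=True)
```

A term that is spread over many categories (such as "hotel") gets a high-entropy profile and therefore a low specificity, so it ends up toward the tail of the chain, while a term concentrated in one category (such as "hamburg") moves to the head.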

$M = (\Gamma, \theta)$
 • $\Gamma = [C_1, C_2, \ldots, C_k]$, the tree cut
 • $\theta = [P(C_1), P(C_2), \ldots, P(C_k)]$, the probability parameter vector

A cut is any set of nodes
$\sum_{i=1}^{k} P(C_i) = 1$
[n0, n11], [n12, n21, n22, n23], [n13, n24]
[n11, n21, n22, n23, n24]

Minimum Description Length
Ref : Li and Abe, 1998
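
As a rough illustration of how MDL can compare candidate tree cuts, the sketch below follows the general form usually attributed to Li and Abe (1998): a parameter cost of (k/2) log |S| for k free parameters plus the data cost under maximum-likelihood estimates, with each class spreading its probability uniformly over the leaves it covers. The function name, the `(frequency, size)` input format, and the decision to ignore the constant cost of naming the cut are assumptions made here; how the deck adapts this to the question tree is not shown on the slide.

```python
import math


def description_length(cut, sample_size):
    """Description length in bits of a tree cut model, up to a constant.

    cut: list of (frequency, size) pairs, one per class C_i in the cut.
         frequency = number of observations that fall under C_i,
         size      = number of leaves that C_i covers.
    sample_size: total number of observations |S|.
    """
    if sum(freq for freq, _ in cut) != sample_size:
        raise ValueError("class frequencies must add up to sample_size")

    # The k class probabilities sum to 1, so only k - 1 of them are free.
    free_params = len(cut) - 1
    param_cost = 0.5 * free_params * math.log2(sample_size)

    data_cost = 0.0
    for freq, size in cut:
        if freq == 0:
            continue  # classes with no observations add nothing to the data cost
        p_leaf = (freq / sample_size) / size  # P(C_i) shared uniformly by its leaves
        data_cost -= freq * math.log2(p_leaf)
    return param_cost + data_cost
```

With evenly spread data, a coarse cut (one class covering four leaves) is cheaper than a fine cut (four single-leaf classes), because the extra parameters buy no better fit. With skewed data the fine cut wins, which is the trade-off MDL is meant to resolve.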

$P(q \mid \tilde{q})$
 • $q$: queried question
 • $\tilde{q}$: targeted question
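
A minimal sketch of scoring a targeted question against a queried question with a unigram language model. Several things here are assumptions rather than details from the slides: the Jelinek-Mercer smoothing and its `smoothing` weight, the function names, and in particular the linear mixture of a topic score and a focus score weighted by `lam`, which is only suggested by the deck's later mention of λ = 0.7.

```python
import math
from collections import Counter


def query_log_likelihood(query, target, collection, smoothing=0.2):
    """log P(query | target) under a unigram model.

    query, target: lists of terms; collection: list of all terms in the
    archive, used as the background model (Jelinek-Mercer smoothing, assumed).
    """
    target_counts = Counter(target)
    collection_counts = Counter(collection)
    log_p = 0.0
    for term in query:
        p_target = target_counts[term] / len(target) if target else 0.0
        p_background = collection_counts[term] / len(collection) if collection else 0.0
        p = (1 - smoothing) * p_target + smoothing * p_background
        if p == 0.0:
            return float("-inf")  # term unseen everywhere
        log_p += math.log(p)
    return log_p


def topic_focus_score(q_topic, q_focus, t_topic, t_focus, collection, lam=0.7):
    """Assumed mixture: lam weights the topic part, (1 - lam) the focus part."""
    p_topic = math.exp(query_log_likelihood(q_topic, t_topic, collection))
    p_focus = math.exp(query_log_likelihood(q_focus, t_focus, collection))
    return lam * p_topic + (1 - lam) * p_focus
```

Candidate questions would be ranked by this score in descending order; a larger `lam` makes agreement on the topic part count for more than agreement on the focus part.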


Yahoo! Answers
Resolved questions
 • travel: 314,616 items
 • computers & internet: 210,785 items

Three fields
 • title (the only field used)
 • description
 • answers




Employed Vector Space Model
Manual judgments : relevant / irrelevant
Baseline : VSM, LMIR
Evaluation : MAP, R-precision, MRR
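
For reference, the three evaluation measures can be computed from binary relevance judgments as sketched below. The function names and the input format (a ranked list of 0/1 judgments per queried question) are choices made for this illustration.

```python
def average_precision(ranked_relevance, total_relevant):
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant


def r_precision(ranked_relevance, total_relevant):
    """Precision within the top R results, where R is the number of relevant items."""
    if total_relevant == 0:
        return 0.0
    return sum(ranked_relevance[:total_relevant]) / total_relevant


def reciprocal_rank(ranked_relevance):
    """1 / rank of the first relevant result, or 0 if there is none."""
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    """Average over queried questions; gives MAP or MRR from per-query scores."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0
```

MAP is the mean of `average_precision` over all queried questions, and MRR is the mean of `reciprocal_rank`.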


Examine the correctness of the identified question topics and question foci
200 queried questions => 69 questions identified incorrectly
 • (a) Only the head part is present (59)
 • (b) Incorrect order (10)

(a) explains why λ is 0.7


FAQ data
Community based
 • Jeon et al., 2005
 • Compared four different retrieval methods
   ▪ Vector space model
   ▪ Okapi
   ▪ Language model
   ▪ Translation-based model
 • Translation-based model performed the best

Lexical chasm
 • Where to stay in Hamburg?
 • The best hotel in Hamburg?

IBM model 1
 • Use question titles and question descriptions as the parallel corpus
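
The translation probabilities that bridge the lexical chasm can be estimated with the standard EM procedure for IBM Model 1. The sketch below is a simplified version (no NULL word, uniform initialization) run on a toy corpus; the function name, the pair format, and the toy data are illustrative assumptions, not the setup used in the cited work.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(e, f)] = P(e | f) with EM.

    pairs: list of (source_terms, target_terms), e.g. (title, description).
    Simplifications (assumed): no NULL source word, uniform initialization.
    """
    target_vocab = {e for _, target in pairs for e in target}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per source term
        for source, target in pairs:
            for e in target:
                # E-step: share e among the source terms in proportion to t
                norm = sum(t[(e, f)] for f in source)
                for f in source:
                    fraction = t[(e, f)] / norm
                    count[(e, f)] += fraction
                    total[f] += fraction
        # M-step: renormalize the expected counts per source term
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t
```

On a toy corpus such as `[(["stay", "hamburg"], ["hotel", "hamburg"]), (["stay", "berlin"], ["hotel", "berlin"]), (["visit", "berlin"], ["see", "berlin"])]`, the probability of "hotel" given "stay" grows with each iteration, which is how a query about where to stay can come to match a question about hotels.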
1) Data structure
2) Use an MDL-based tree cut model to identify question topic and focus
3) A new form of language modeling for question search
4) Extensive experiments

Now only community-based
From forum sites / FAQ sites