Huizhong Duan, Yunbo Cao, Chin-Yew Lin and Yong Yu
Shanghai Jiao Tong University & MSRA
ACL 2008
2008/7/9 Rick Liu

Questions & their answers
  A very large archive
  Built up by
  Example
    Traditional FAQ services
    Q&A services
      ▪ Emerging
      ▪ Yahoo! Answers, Live QnA, Baidu Zhidao

Question search
  Helps users search previous answers
    Any nice hotels in Berlin or Hamburg?
    How long does it take to Hamburg from Berlin?
    Cheap hotels in Berlin?

Identifying question topic & focus
  Question tree
  Determining the tree cut
Modeling question topic & focus for search
  Language model

Topic terms: BaseNP, WH-ngram
Topic profile: probability distribution of categories
Specificity: inverse of the entropy of the topic profile
Topic chain: topic terms ordered by specificity value (descending)
Topic tree

M = (Γ, θ)
  Γ = [C1, C2, ..., Ck], tree cut
  θ = [P(C1), P(C2), ..., P(Ck)], probability parameter vector
  A cut is any set of nodes
  Σ_{i=1..k} P(Ci) = 1

[n0, n11], [n12, n21, n22, n23], [n13, n24]
[n11, n21, n22, n23, n24]

Minimum Description Length
  Ref: Li and Abe, 1998

P(q | q̃)
  q: queried question
  q̃: targeted question

Yahoo! Answers
  Resolved questions
    travel: 314,616 items
    computers & internet: 210,785 items
  Three fields
    title (the only field used)
    description
    answers

Employed Vector Space Model
Manual judgments: relevant / irrelevant
Baseline: VSM, LMIR
Evaluation: MAP, R-precision, MRR

Examine the correctness of question topics and question foci
  200 queried questions => 69 questions incorrect
    (a) Only have the head part (59)
    (b) Incorrect order (10)
  (a) explains why λ is 0.7

FAQ data
Community based
  Jeon et al., 2005
  Compared four different retrieval methods
    ▪ Vector space model
    ▪ Okapi
    ▪ Language model
    ▪ Translation-based model
  Translation-based model performed the best

Lexical chasm
  Where to stay in Hamburg?
  The best hotel in Hamburg?
IBM model 1
  Use question titles and question descriptions as the parallel corpus

1) Data structure
2) Use MDL-based tree cut model to identify question topic and focus
3) A new form of language modeling for question search
4) Extensive experiments

Now only community-based
  From forum sites / FAQ sites
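
The topic-term definitions given earlier in the deck (topic profile, specificity as the inverse of the entropy of the profile, topic chain ordered by descending specificity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the per-category counts, the category labels, the use of the natural logarithm, and the small constant `eps` (added only to avoid division by zero when a term occurs in a single category) are all assumptions made for illustration.

```python
import math


def topic_profile(category_counts):
    """Normalize a term's per-category counts into a probability distribution."""
    total = sum(category_counts.values())
    return {c: n / total for c, n in category_counts.items() if n > 0}


def specificity(profile, eps=1e-3):
    """Inverse of the entropy of a topic profile.

    eps is an assumed smoothing constant so that a zero-entropy profile
    (all mass on one category) does not divide by zero.
    """
    entropy = -sum(p * math.log(p) for p in profile.values())
    return 1.0 / (entropy + eps)


def topic_chain(term_counts, eps=1e-3):
    """Order topic terms by specificity, most specific first."""
    scores = {
        term: specificity(topic_profile(counts), eps)
        for term, counts in term_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical counts of how often each topic term appears per category.
counts = {
    "hotel": {"Germany": 10, "Japan": 10, "New York": 10, "Business": 10},
    "Berlin": {"Germany": 30, "Arts": 10},
    "Hamburg": {"Germany": 40},
}

chain = topic_chain(counts)
print(chain)
```

With these made-up counts, a term concentrated in one category ("Hamburg") comes first in the chain, and a term spread evenly over many categories ("hotel") comes last, which matches the intuition that the head of the chain is the more specific part of the question.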