INTRODUCING THE WEB INTELLIGENCE (WIT) GROUP
Microsoft Research Asia

TALK OUTLINE
• Introducing WIT – Web InTelligence Group
• SQuAD
• Summary

Mission Statement
Enable synergetic collaboration between people and between people and computers to enlighten them and enrich their lives
http://research.microsoft.com/en-us/groups/WIT/

Vision – a Web with Intelligence
Satisfy user needs, simplify key tasks, promote serendipitous discovery, and foster task-oriented social network
[Diagram: Web Intelligence connects Content (Reviews, Forums, …), Action (Search, Browse, …), and People (Friends, Experts, …)]

Web InTelligence group (WIT)
Members:
• Yunbo Cao
• Youngin Song
• Chin-Yew Lin
• Wei Lai
• Bo Wang
• Tetsuya Sakai
Callouts on the slide:
• I’m the manager!
• I’m the FIRST Korean researcher at MSRA!
• I’m the SECOND Japanese researcher at MSRA!
• I joined MSRA in April 2009!
• I joined MSRA in May 2009!
• WIT spun off from the Natural Language Computing group in June 2009!

WIT research topics
• Sentiment analysis
• Social question answering and summarisation
• Expert and social search
• User intent/activity recognition and prediction
• Inarticulate user assistance
• Information access evaluation

TALK OUTLINE
• Introducing WIT – Web InTelligence Group
• SQuAD
• Summary

Mining Community Knowledge: Social Q&A and Its Application
Web Intelligence (WIT), Microsoft Research Asia
Chin-Yew LIN
cyl@microsoft.com

Search vs. Question Answering (QA)

User intention
• Understanding what users want is difficult!
QA Complements Search
[Figure: stacked bar chart, 0% to 100%, one bar per query group. Legend: Both (bad), Both (diff), Both (similar), Prefer Yahoo, Prefer Bing]

             short queries          long queries
             high   mid   low       high   mid   low
Query          50    50    50         49    50    50
Question      134   122    94        136   119    67
Total         184   172   144        185   169   117

• short: length <= 2, long: length >= 3
• high: freq > 100K, mid: between 1K and 50K, low: freq < 300

Scalable Question Answering & Distillation
Goal:
• Create a scalable question and answering service
Methods:
• Index all question and answer pairs (QnA) and their authors on the web
• Enrich QnA through summarization
• Expand QnA database by auto-posting questions to and acquiring answers from community QnA services
• Refine QnA through Wiki-style online collaboration
Motivations:
• Leverage and add value to search
• Leverage questions that already have been answered
• Leverage people’s knowledge and their networks

CampusCS

Baidu Zhidao (百度知道)
• 17,012,767 resolved questions in two years’ operation
• 8,921,610 are knowledge related
• 96.7% of questions are resolved
• 10,000,000 daily visitors
• 71,308 new questions per day
• 3.14 answers per question

Baidu Zhidao Top 10 Question Types
[Figure: horizontal bar chart, axis from 0 to 900,000 questions]
• Category labels: Internet, Education, Hardware, OS, Language, Relationship, Computer, Software, Music, Cell Phone
• Bar values, largest to smallest: 768,668; 732,976; 709,438; 579,133; 574,001; 500,762; 481,882; 468,268; 409,447; 359,285
• The extracted text does not preserve which value belongs to which category
Source: http://www.searchlab.com.cn (中国人搜索行为研究/User Research Lab of Chinese Search)

A Traditional QA Architecture
• A QA system gives direct answers to a question instead of documents
• Falcon QA system (LCC)
• Moldovan et al. ACL 2000
• Surdeanu et al. IEEE Trans.
  PDS 2002
• Best QA system in TREC 8 & 9
• Traditional IR Module
Average question answering time
• TREC 8: 48 seconds
• TREC 9: 94 seconds

Falcon QA system module analysis: processing time

Module   TREC 8               TREC 9
QP        1.1%                 1.2%
PR       44.4% (21.3 sec)     26.5% (24.9 sec)
PS        5.4%                 2.2%
PO        0.1%                 0.1%
AP       48.7% (23.4 sec)     69.7% (65.5 sec)

Community Question and Answering
http://weblogs.hitwise.com/leeann-prescott/2006/12/yahoo_answers_captures_96_of_q.html
Yahoo! Answers has 19,041,128 resolved questions in 26 categories, adding about 48K questions per day. (August 24, 2007)

Community QnA in Details
[Screenshot annotations: Topic, Context 1, Context 2]

Online Discussion Forum
[Screenshot annotation: topic]

FAQ
• Context dependent
• About 28,424,184 results on Live Search using query: “FAQ travel” (Google: about 64,200,000)

Challenges
• Question Mining
• Answer Summarization
• Question Answering
• Question Generation
• Question Utility
• Question Search & Recommendation

List of Papers Accepted
• Recommending Questions Using the MDL-based Tree Cut Model – Cao et al.; WWW 2008
• Searching Questions by Identifying Question Topic and Question Focus – Duan et al.; ACL 2008
• Using Conditional Random Fields to Extract Contexts and Answers of Questions from Online Forums – Ding et al.; ACL 2008
• Finding Question Answer Pairs from Online Forums – Cong et al.; SIGIR 2008
• Question Utility: A Novel Static Ranking of Question Search – Song et al.; AAAI 2008
• Answer Summarization: Understanding and Summarizing Answers in Community-Based Question Answering Services – Liu et al.; COLING 2008
• Automatic Question Generation from Queries – Lin; NSF Workshop on Question Generation Shared Task and Evaluation Challenge 2008

Question Mining & Answering (ACL 2008 & SIGIR 2008)
Extract question and answer pairs
• Community QnA
  • Create a resolved question list
  • Extract & index question, best answer, and other answers
  • Live Qna, Yahoo!
    Answers, Baidu Zhidao, …
• Forum
  • Extract and index threads and postings, find questions and their answers

QA Pairs in Online Forums

Question Search & Recommendation (ACL 2008 & WWW 2008)
Query
• We would like to know what will be available to see in the Forbidden City because we understand that it will be under repairs.
Question search
• Is it true that the Forbidden City is undergoing renovation & we won't be allow to enter?
Question recommendation
• Would you get a lower price by not needing a guide for the Forbidden City and etc?
• Can anybody recommend a budget hotel near Forbidden City?

Question = Topic + Focus + Others (TFO)
• Search: same topic, similar foci
• Recommend: same topic, different foci

Identifying Topic and Focus
[Diagram: category tree of Travel @Yahoo! Answers: Asia Pacific (China, Japan, …), Europe, …]
China
1. Anyone know where to see the Dragon Boat Festival in Beijing?
2. Where is a good (Less expensive) place to shop in Beijing?
3. What's the cheapest way to get from Beijing to Hong Kong?
…
Europe
1. How far is it from Berlin to Hamburg?
2. What is the cheapest way from Berlin to Hamburg?
3. Where to see between Hamburg and Berlin?
4. How long does it take from Hamburg to Berlin?
…
• Specificity: the inverse of the entropy of the topic term's distribution over the sub-categories
• Order topic terms by their specificity

Question Utility (AAAI 2008)
Motivation
• How useful is a question?
• How should we rank questions without queries?
Definition
• How likely is it that a question will be asked again?

\[ \hat{Q} = \arg\max_{Q} p(Q \mid Q') = \arg\max_{Q} p(Q)\, p(Q' \mid Q) \]

• p(Q): the prior probability of question Q, reflecting a static rank of the question, i.e. Question Utility
• p(Q' | Q): the probability of generating query Q' from question Q (relevance score)

Answer Summarization (COLING 2008)
• Example: “Where to stay in Paris?” 2,645 answers (Yahoo! Answers 03/04/09)
• Is the “best answer” the best answer?
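
Speaker-note sketch: the two quantities defined on the Identifying Topic and Focus and Question Utility slides can be made concrete in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the cited papers: the function names, the smoothing constant eps, the add-one smoothed unigram model for p(Q' | Q), and the example priors are all hypothetical choices made here.

```python
import math
from collections import Counter


def specificity(category_counts, eps=1e-3):
    """Inverse entropy of a topic term's distribution over sub-categories.

    category_counts maps a sub-category name to how often the term occurs
    in it. eps is an assumed smoothing constant that avoids division by
    zero when the term occurs in a single sub-category (entropy 0).
    """
    total = sum(category_counts.values())
    probs = [c / total for c in category_counts.values() if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 / (entropy + eps)


def query_log_likelihood(query, question, vocab_size):
    """log p(Q' | Q) under an add-one smoothed unigram model of the question.

    The unigram model and the smoothing are assumptions made for this sketch.
    """
    counts = Counter(question.lower().split())
    length = sum(counts.values())
    return sum(
        math.log((counts[w] + 1) / (length + vocab_size))
        for w in query.lower().split()
    )


def rank_questions(query, questions_with_prior):
    """Rank questions by log p(Q) + log p(Q' | Q), highest score first.

    questions_with_prior is a list of (question_text, prior) pairs, where
    prior stands in for the question utility p(Q) and must be positive.
    """
    vocab = set(query.lower().split())
    for text, _ in questions_with_prior:
        vocab.update(text.lower().split())
    scored = [
        (math.log(prior) + query_log_likelihood(query, text, len(vocab)), text)
        for text, prior in questions_with_prior
    ]
    return [text for _, text in sorted(scored, reverse=True)]


if __name__ == "__main__":
    # A term concentrated in one sub-category is more specific than one
    # spread evenly over two.
    print(specificity({"China": 10}), specificity({"China": 5, "Japan": 5}))

    # Hypothetical priors; real values would be estimated from question logs.
    candidates = [
        ("budget hotel near forbidden city", 0.2),
        ("cheapest way from berlin to hamburg", 0.8),
    ]
    print(rank_questions("forbidden city hotel", candidates))
```

Scores are computed in log space so that long queries do not underflow. Dropping the log-prior term gives a purely relevance-based ranking, which makes it easy to see how much the static question utility changes the order.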
[Diagram labels]
• Question clustering: find similar questions
• Answer Taxonomy
• Answer summarization: aggregate answers for a question cluster
• Question Taxonomy
• Travel FAQ

Microsoft Travel Guide
http://travel.msra.cn

TALK OUTLINE
• Introducing WIT – Web InTelligence Group
• SQuAD
• Summary

Scalable Question Answering and Distillation
• Mixed Mode Question Answering
• Knowledge Distillation & Dissemination
Sources and their structure:
• FAQ: Highly Structured QnA
• QnA: Structured QnA
• Forum: Semi-structured QnA
• Web: Unstructured QnA

Q&A = Knowledge = Power
• Q&A is a complement to web keyword search
• Q&A can enhance existing QnA and search services
• Leverage existing knowledge in question-and-answer form, together with its authors
• Acquire or elicit human knowledge automatically

Discussion