SQuAD - the Starting Point of Web Intelligence

INTRODUCING THE WEB INTELLIGENCE (WIT) GROUP
Microsoft Research Asia
TALK OUTLINE
• Introducing WIT – Web InTelligence Group
• SQuAD
• Summary
Mission Statement
Enable synergetic collaboration between people, and between people and computers, to enlighten them and enrich their lives
http://research.microsoft.com/en-us/groups/WIT/
Vision – a Web with Intelligence
Satisfy user needs, simplify key tasks, promote serendipitous discovery, and foster task-oriented social networks
[Figure: Web Intelligence links Content (reviews, forums, …), Action (search, browse, …), and People (friends, experts, …)]
Web InTelligence group (WIT)
Yunbo Cao
I’m the manager!
Youngin Song
Chin-Yew Lin
Wei Lai
Bo Wang
I'm the FIRST Korean researcher at MSRA!
Tetsuya Sakai
I'm the SECOND Japanese researcher at MSRA!
WIT spun off from the Natural Language Computing group in June 2009!
I joined MSRA in April 2009!
I joined MSRA in May 2009!
WIT research topics
• Sentiment analysis
• Social question answering and summarisation
• Expert and social search
• User intent/activity recognition and prediction
• Inarticulate user assistance
• Information access evaluation
TALK OUTLINE
• Introducing WIT – Web InTelligence Group
• SQuAD
• Summary
Mining Community Knowledge: Social Q&A and Its Application
Web Intelligence (WIT), Microsoft Research Asia
Chin-Yew LIN cyl@microsoft.com
Search vs. Question Answering (QA)
User intention
Understanding what users want is difficult!
QA Complements Search
[Figure: stacked bars (0–100%) of user preference per query group: Prefer Bing, Prefer Yahoo, Both (similar), Both (diff), Both (bad)]
Counts per query group:
                 Short queries          Long queries
                 high   mid   low       high   mid   low
Query              50    50    50         49    50    50
Question          134   122    94        136   119    67
Total             184   172   144        185   169   117
• short: length <= 2, long: length >= 3
• high: freq > 100K, mid: between 1K and 50K, low: freq < 300
Scalable Question Answering & Distillation (SQuAD)
Goal:
• Create a scalable question answering service
Methods:
• Index all question and answer pairs (QnA) and their authors on the web
• Enrich QnA through summarization
• Expand the QnA database by auto-posting questions to, and acquiring answers from, community QnA services
• Refine QnA through Wiki-style online collaboration
Motivations:
• Leverage and add value to search
• Leverage questions that have already been answered
• Leverage people's knowledge and their networks
CampusCS
Baidu Zhidao (百度知道)
• 17,012,767 resolved questions in two years of operation
• 8,921,610 of them are knowledge-related
• 96.7% of questions are resolved
• 10,000,000 daily visitors
• 71,308 new questions per day
• 3.14 answers per question
Baidu Zhidao Top 10 Question Types
[Figure: bar chart of question counts for the top 10 question types: Internet, Education, Hardware, OS, Language, Relationship, Computer, Software, Music, Cell Phone; bar values from largest to smallest: 768,668; 732,976; 709,438; 579,133; 574,001; 500,762; 481,882; 468,268; 409,447; 359,285]
http://www.searchlab.com.cn (中国人搜索行为研究/User Research Lab of Chinese Search)
A Traditional QA Architecture
A QA system gives direct answers to a question instead of documents.
Falcon QA system (LCC)
• Moldovan et al., ACL 2000
• Surdeanu et al., IEEE Trans. PDS 2002
• Best QA system in TREC 8 & 9
[Figure: Falcon architecture, including the QP module and a traditional IR module]
• Average question answering time: 48 seconds (TREC 8), 94 seconds (TREC 9)
Falcon QA system module analysis: processing time
Module   TREC 8               TREC 9
QP       1.1%                 1.2%
PR       44.4% (21.3 sec)     26.5% (24.9 sec)
PS       5.4%                 2.2%
PO       0.1%                 0.1%
AP       48.7% (23.4 sec)     69.7% (65.5 sec)
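The reported percentages can be turned back into approximate per-module seconds, which shows where the time goes. The sketch below uses only the numbers on this slide. The module labels are kept as printed. The seconds it derives for QP, PS, and PO are estimates and were not reported.

```python
# Per-module shares (%) of average answering time, copied from the slide.
SHARES = {
    "TREC 8": {"QP": 1.1, "PR": 44.4, "PS": 5.4, "PO": 0.1, "AP": 48.7},
    "TREC 9": {"QP": 1.2, "PR": 26.5, "PS": 2.2, "PO": 0.1, "AP": 69.7},
}
# Average question answering time in seconds, also from the slide.
AVG_SECONDS = {"TREC 8": 48.0, "TREC 9": 94.0}


def module_seconds(track: str) -> dict[str, float]:
    """Approximate seconds per module: average time x share / 100."""
    total = AVG_SECONDS[track]
    return {module: total * pct / 100.0 for module, pct in SHARES[track].items()}


def bottleneck(track: str) -> str:
    """Module with the largest share of processing time."""
    shares = SHARES[track]
    return max(shares, key=shares.get)


for track in SHARES:
    estimate = {m: round(s, 1) for m, s in module_seconds(track).items()}
    print(track, estimate, "bottleneck:", bottleneck(track))
```

The derived PR and AP times are about 21.3 s and 23.4 s for TREC 8, and 24.9 s and 65.5 s for TREC 9. These agree with the seconds shown on the slide. AP is the largest share in both years.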
Community Question Answering
http://weblogs.hitwise.com/leeann-prescott/2006/12/yahoo_answers_captures_96_of_q.html
Yahoo! Answers has 19,041,128 resolved questions in 26 categories, adding about 48K questions per day (August 24, 2007).
Community QnA in Detail
[Figure: QnA structure in an online discussion forum (topic, Context 1, Context 2; context dependent) and in FAQs (topic)]
About 28,424,184 results on Live Search for the query "FAQ travel" (Google: about 64,200,000)
Challenges
• Question Mining
• Answer Summarization
• Question Answering
• Question Generation
• Question Utility
• Question Search & Recommendation
List of Papers Accepted
• Recommending Questions Using the MDL-based Tree Cut Model – Cao et al.; WWW 2008
• Searching Questions by Identifying Question Topic and Question Focus – Duan et al.; ACL 2008
• Using Conditional Random Fields to Extract Contexts and Answers of Questions from Online Forums – Ding et al.; ACL 2008
• Finding Question-Answer Pairs from Online Forums – Cong et al.; SIGIR 2008
• Question Utility: A Novel Static Ranking of Question Search – Song et al.; AAAI 2008
• Answer Summarization: Understanding and Summarizing Answers in Community-Based Question Answering Services – Liu et al.; COLING 2008
• Automatic Question Generation from Queries – Lin; NSF Workshop on the Question Generation Shared Task and Evaluation Challenge 2008
Question Mining & Answering (ACL 2008 & SIGIR 2008)
Extract question and answer pairs:
• Community QnA
  - Create a resolved question list
  - Extract and index the question, the best answer, and other answers
  - Live QnA, Yahoo! Answers, Baidu Zhidao, …
• Forum
  - Extract and index threads and postings; find questions and their answers
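Forum extraction can be illustrated with a minimal sketch. It finds questions in a thread and pairs each one with the replies that follow. The detector here is a deliberately naive heuristic and an assumption: a post counts as a question if it contains "?" or starts with a wh-word or auxiliary verb. Each question collects replies from other authors until the next question appears. This is not the method of Cong et al. (SIGIR 2008). The thread contents are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Naive question detector (an assumption for illustration): a post counts as
# a question if it contains "?" or starts with a wh-word or auxiliary verb.
QUESTION_START = re.compile(
    r"^(who|what|when|where|why|how|which|is|are|can|could|does|do|anyone)\b",
    re.IGNORECASE,
)


@dataclass
class Post:
    author: str
    text: str


def is_question(text: str) -> bool:
    stripped = text.strip()
    return "?" in stripped or bool(QUESTION_START.match(stripped))


def extract_qa_pairs(thread: list[Post]) -> list[tuple[str, list[str]]]:
    """Pair each detected question with later replies from other authors,
    up to the next detected question."""
    pairs: list[tuple[str, list[str]]] = []
    current: Optional[Post] = None
    answers: list[str] = []
    for post in thread:
        if is_question(post.text):
            if current is not None:
                pairs.append((current.text, answers))
            current, answers = post, []
        elif current is not None and post.author != current.author:
            answers.append(post.text)
    if current is not None:
        pairs.append((current.text, answers))
    return pairs


# Hypothetical thread for illustration.
thread = [
    Post("alice", "Is the Forbidden City open during the renovation?"),
    Post("bob", "I visited last month and could still go in."),
    Post("alice", "Thanks!"),
    Post("carol", "Where can I buy tickets online?"),
    Post("dave", "I bought mine online a day ahead."),
]
pairs = extract_qa_pairs(thread)
for question, replies in pairs:
    print(question, "->", replies)
```

The asker's own "Thanks!" is skipped because only replies from other authors count as answers. A real system would need learned question and answer detectors, as in the papers above.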
QA Pairs in Online Forums
Question Search & Recommendation (ACL 2008 & WWW 2008)
Query
Question search:
• We would like to know what will be available to see in the Forbidden City because we understand that it will be under repairs.
• Is it true that the Forbidden City is undergoing renovation & we won't be allow to enter?
Question recommendation:
• Would you get a lower price by not needing a guide for the Forbidden City and etc?
• Can anybody recommend a budget hotel near Forbidden City?
Question = Topic + Focus + Others (TFO)
• Search: same topic, similar foci
• Recommend: same topic, different foci
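The TFO split can be sketched as follows, under simplifying assumptions. Every archived question already carries hand-labeled topic and focus term sets; in the ACL 2008 work these are identified automatically. Focus similarity is measured with Jaccard overlap, and the 0.5 threshold is arbitrary. The labels, helper names, and threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFOQuestion:
    text: str
    topic: frozenset  # hand-labeled topic terms (hypothetical)
    focus: frozenset  # hand-labeled focus terms (hypothetical)


def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search_and_recommend(query: TFOQuestion, archive: list, min_focus_sim: float = 0.5):
    """Search: same topic, similar focus. Recommend: same topic, disjoint focus."""
    same_topic = [q for q in archive if q.topic & query.topic]
    search = [q for q in same_topic if jaccard(q.focus, query.focus) >= min_focus_sim]
    recommend = [q for q in same_topic if not (q.focus & query.focus)]
    return search, recommend


def tfo(text: str, topic: set, focus: set) -> TFOQuestion:
    return TFOQuestion(text, frozenset(topic), frozenset(focus))


archive = [
    tfo("Is it true that the Forbidden City is undergoing renovation & we won't be allow to enter?",
        {"forbidden city"}, {"renovation", "enter"}),
    tfo("We would like to know what will be available to see in the Forbidden City because we "
        "understand that it will be under repairs.", {"forbidden city"}, {"renovation"}),
    tfo("Would you get a lower price by not needing a guide for the Forbidden City and etc?",
        {"forbidden city"}, {"guide", "price"}),
    tfo("Can anybody recommend a budget hotel near Forbidden City?",
        {"forbidden city"}, {"hotel", "budget"}),
    tfo("How far is it from Berlin to Hamburg?", {"berlin", "hamburg"}, {"distance"}),
]
query = tfo("Is the Forbidden City closed for renovation?", {"forbidden city"}, {"renovation"})
search, recommend = search_and_recommend(query, archive)
print([q.text for q in search])
print([q.text for q in recommend])
```

The two renovation questions come back as search results. The guide and hotel questions come back as recommendations. The Berlin question is excluded because its topic differs. A question whose focus partly overlaps but falls below the threshold lands in neither list. That gap is a consequence of the simple rule chosen here.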
Identifying Topic and Focus
[Figure: Travel @Yahoo! Answers category tree (Asia Pacific: China, Japan, …; Europe, …) with example questions]
Example questions under China:
1. Anyone know where to see the Dragon Boat Festival in Beijing?
2. Where is a good (Less expensive) place to shop in Beijing?
3. What's the cheapest way to get from Beijing to Hong Kong?
Example questions under Europe:
1. How far is it from Berlin to Hamburg?
2. What is the cheapest way from Berlin to Hamburg?
3. Where to see between Hamburg and Berlin?
4. How long does it take from Hamburg to Berlin?
…
• Specificity: the inverse of the entropy of the topic term's distribution over the sub-categories
• Order topic terms by their specificity
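Specificity is s(t) = 1 / H(t), where H(t) = −Σ_c p(c | t) log p(c | t) is taken over the sub-categories c in which term t appears. A term concentrated in one sub-category, such as "Beijing" under China, gets low entropy and high specificity. A term spread evenly, such as "cheapest", gets low specificity. The sketch below assumes a small constant eps to avoid dividing by zero. The per-category counts are invented for illustration.

```python
import math
from collections import Counter


def specificity(subcategory_counts: Counter, eps: float = 1e-6) -> float:
    """Inverse entropy of a term's distribution over sub-categories.

    eps (an assumption) avoids division by zero when the term occurs in a
    single sub-category, where the entropy is 0.
    """
    total = sum(subcategory_counts.values())
    entropy = 0.0
    for count in subcategory_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return 1.0 / (entropy + eps)


# Hypothetical counts of questions containing each term, per sub-category.
term_counts = {
    "beijing": Counter({"China": 95, "Japan": 3, "Europe": 2}),
    "cheapest": Counter({"China": 30, "Japan": 35, "Europe": 35}),
}
ranked = sorted(term_counts, key=lambda t: specificity(term_counts[t]), reverse=True)
print(ranked)  # ['beijing', 'cheapest']
```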
Question Utility (AAAI 2008)
Motivation
• How useful is a question?
• How should we rank questions without queries?
Definition
• How likely is a question to be asked again?
$$\arg\max_{Q} p(Q \mid Q') = \arg\max_{Q} \frac{p(Q)\, p(Q' \mid Q)}{p(Q')} = \arg\max_{Q} p(Q) \prod_{w \in Q'} p(w \mid Q)$$
• p(Q): the prior probability of question Q, reflecting a static rank of the question, i.e., Question Utility
• p(Q' | Q): the probability of generating query Q' from question Q (relevance score)
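The ranking rule can be computed in log space as log p(Q) + Σ_{w∈Q'} log p(w | Q). The sketch below makes several assumptions. The priors p(Q) are hand-set stand-ins for learned question utility. p(w | Q) is a unigram model with Jelinek-Mercer smoothing (lam = 0.2) against an add-one-smoothed collection model. Neither choice is confirmed by the slide.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-cased whitespace tokens with trailing punctuation removed."""
    tokens = (t.strip("?.,!").lower() for t in text.split())
    return [t for t in tokens if t]


def rank_questions(query: str, prior: dict[str, float], lam: float = 0.2) -> list[str]:
    """Rank candidate questions Q by log p(Q) + sum_{w in Q'} log p(w | Q)."""
    docs = {q: Counter(tokenize(q)) for q in prior}
    collection: Counter = Counter()
    for counts in docs.values():
        collection.update(counts)
    coll_total = sum(collection.values())
    vocab = len(collection)

    def log_p_word(w: str, counts: Counter) -> float:
        # Jelinek-Mercer smoothing (an assumption): mix the question's own
        # unigram model with an add-one-smoothed collection model so that
        # query words unseen anywhere still get nonzero probability.
        p_doc = counts[w] / sum(counts.values())
        p_coll = (collection[w] + 1) / (coll_total + vocab + 1)
        return math.log((1 - lam) * p_doc + lam * p_coll)

    def score(q: str) -> float:
        relevance = sum(log_p_word(w, docs[q]) for w in tokenize(query))
        return math.log(prior[q]) + relevance

    return sorted(prior, key=score, reverse=True)


# Hypothetical priors p(Q) standing in for question utility.
prior = {
    "Where to stay in Paris?": 0.6,
    "Where to stay in Paris cheaply?": 0.1,
    "What to see in Paris?": 0.3,
}
ranking = rank_questions("stay in Paris", prior)
print(ranking)
```

The prior acts as a query-independent boost. When two questions match the query about equally well, the one judged more likely to be asked again ranks higher.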
Answer Summarization (COLING 2008)
Example: "Where to stay in Paris?", 2,645 answers (Yahoo! Answers, 03/04/09)
• Is the "best answer" the best answer?
• Question clustering
  - Find similar questions
• Answer summarization
  - Aggregate answers for a question cluster
[Figure: Answer Taxonomy; Question Taxonomy; Travel FAQ]
Microsoft Travel Guide
• http://travel.msra.cn
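The cluster-then-aggregate idea can be sketched as follows. Greedy single-pass clustering groups similar questions using word-set Jaccard similarity with a 0.5 threshold. The answers of all questions in a cluster are then pooled and counted, so answers that several askers received rise to the top instead of relying on one "best answer". This substitutes simple techniques for illustration and is not the method of Liu et al. (COLING 2008). The questions and answers below are hypothetical.

```python
from collections import Counter


def tokens(text: str) -> frozenset:
    """Lower-cased word set with trailing punctuation stripped."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return frozenset(w for w in words if w)


def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_questions(questions: list, threshold: float = 0.5) -> list:
    """Greedy single pass: join the first cluster whose seed question is
    similar enough, otherwise start a new cluster."""
    clusters = []  # (seed token set, member indices)
    for i, question in enumerate(questions):
        toks = tokens(question)
        for seed, members in clusters:
            if jaccard(seed, toks) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((toks, [i]))
    return [members for _, members in clusters]


def summarize_cluster(members: list, answers: list, top_k: int = 3) -> list:
    """Pool the answers of every question in a cluster; return the most
    frequent (normalized) answers with their counts."""
    pooled = Counter()
    for i in members:
        pooled.update(a.strip().lower() for a in answers[i])
    return pooled.most_common(top_k)


questions = [
    "Where to stay in Paris?",
    "Where should I stay in Paris?",
    "How far is it from Berlin to Hamburg?",
]
answers = [
    ["Le Marais", "Latin Quarter"],
    ["Latin Quarter", "Montmartre"],
    ["Check a route planner"],
]
clusters = cluster_questions(questions)
for members in clusters:
    print([questions[i] for i in members], summarize_cluster(members, answers))
```

The two Paris questions share enough words to form one cluster, and "latin quarter" comes out on top because both askers received it.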
TALK OUTLINE
• Introducing WIT – Web InTelligence Group
• SQuAD
• Summary
Mixed Mode Question Answering & Knowledge Distillation and Dissemination
• Mixed Mode Scalable Question Answering and Distillation
FAQ, QnA, Forum, Web
• Highly Structured QnA
• Structured QnA
• Semi-structured QnA
• Unstructured QnA
Q&A = Knowledge = Power
• Q&A complements web keyword search
• Q&A can enhance existing QnA and search services
• Leverage existing knowledge in question-and-answer form, together with its authors
• Acquire or elicit human knowledge automatically
Discussion