Question Answering over Implicitly Structured Web Content
Eugene Agichtein* (Emory University)
Chris Burges (Microsoft Research)
Eric Brill (Microsoft Research)
* Research done while at Microsoft Research
Questions are Problematic for Web Search
What was the name of president Fillmore’s cat?
Who invented crocs?
…
Agichtein et al., WI 2007
Web search: What was the name of president Fillmore’s cat?
Web Question Answering
Why are questions problematic for web search engines?
Search engines treat questions as keyword queries,
ignoring the semantic relationships between words, and
the explicitly stated information need
Poor performance for long (> 5 terms) queries
Problem exacerbated when common keywords are
included
… and millions more of other tables and lists …
Implicitly Structured Web Content
HTML Tables, Lists
Product descriptions
Example: Lists of favorite things, “top 10” lists, etc.
HTML Syntax (sometimes) reflects semantics
Authors imply semantic relationships, entity types by grouping
Can infer information about ambiguous entities from others in the
same column
Millions of HTML tables, lists on the “surface” web alone
No common schema
Keyword queries: primary access method.
How to exploit this structured content for good (e.g., for Question
Answering) at web scale?
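The idea of inferring information about an ambiguous entity from the other entries in its column can be sketched as a simple vote. This is only an illustration: the `GAZETTEER` lookup and the `disambiguate` function are hypothetical and are not part of the system described in these slides.

```python
from collections import Counter

# Hypothetical type lookup; "Jaguar" is ambiguous between two types.
GAZETTEER = {
    "Jaguar": {"animal", "car"},
    "Toyota": {"car"},
    "Honda": {"car"},
    "Lion": {"animal"},
}

def disambiguate(entity, column):
    """Pick the type of `entity` that its unambiguous column neighbours vote for."""
    votes = Counter()
    for other in column:
        types = GAZETTEER.get(other, set())
        # Only neighbours with exactly one known type get a vote.
        if other != entity and len(types) == 1:
            votes.update(types)
    candidates = GAZETTEER.get(entity, set())
    ranked = [t for t, _ in votes.most_common() if t in candidates]
    return ranked[0] if ranked else None

print(disambiguate("Jaguar", ["Toyota", "Jaguar", "Honda"]))
print(disambiguate("Jaguar", ["Lion", "Jaguar"]))
```

The same entity resolves differently depending on the column it appears in, which is the signal that grouping by the page author provides.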
Related Work
Web Question Answering
AskMSR (TREC 2001)
Aranea (TREC 2003)
Mulder (WWW 2001)
A No-Frills Architecture for Lightweight Answer Retrieval (WWW 2007)
Web-scale Information Extraction
QXtract (ICDE 2003): learn keyword queries to retrieve content
KnowItAll (WWW 2004): minimal supervision, larger scale
TextRunner (IJCAI 2007): single pass scan, disambiguate at query time
Towards Domain-Independent Information Extraction from Web Tables (WWW 2007)
Our System TQA: Overview
1. Index all promising HTML tables
2. Translate a question into select/project query
3. Select table rows, project candidate answers
4. Rank candidate answers
5. Return top K answers
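Steps 2 to 5 can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the in-memory `TABLES` "index" with made-up placeholder data, the stopword-based `translate`, the substring matching in `select_project`, and the frequency-based ranking in `answer`. It is not the TQA implementation.

```python
import re
from collections import Counter

# Hypothetical in-memory "index" with placeholder data.
TABLES = [
    {"header": ("city", "mayor"),
     "rows": [("Springfield", "Person A"), ("Shelbyville", "Person B")]},
    {"header": ("city", "mayor"),
     "rows": [("Springfield", "Person A"), ("Capital City", "Person C")]},
    {"header": ("city", "mayor"),
     "rows": [("Springfield", "Person D")]},
]

STOPWORDS = {"who", "what", "is", "was", "the", "of", "name"}

def translate(question):
    """Step 2 (greatly simplified): keep the non-stopword terms as keywords."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return [w for w in words if w not in STOPWORDS]

def select_project(tables, keywords):
    """Step 3: select rows matching all keywords, project the remaining cells."""
    candidates = []
    for table in tables:
        header = [h.lower() for h in table["header"]]
        for row in table["rows"]:
            searchable = header + [cell.lower() for cell in row]
            if all(any(k in text for text in searchable) for k in keywords):
                candidates.extend(
                    cell for cell in row
                    if not any(k in cell.lower() for k in keywords)
                )
    return candidates

def answer(question, k):
    """Steps 4 and 5: rank candidates by frequency and return the top k."""
    candidates = select_project(TABLES, translate(question))
    return [a for a, _ in Counter(candidates).most_common(k)]

print(answer("Who is the mayor of Springfield?", 2))
```

Frequency is used here only because it is the simplest ranking to show; any scoring function over the candidates could be substituted in `answer`.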
TableQA: Indexing
Crawl the Web
Identify “promising” tables (heuristic, could be improved)
Extract metadata for each table:
Context
Document content
Document metadata
Index extracted metadata
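A minimal sketch of the table-identification and metadata-extraction steps, using only Python's standard `html.parser`. The `is_promising` rule (at least two rows, each with at least two cells) is a hypothetical stand-in for the heuristic mentioned above, and the metadata fields are assumptions. Nested tables and surrounding-text context are not handled.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect each <table> as rows of cell text, plus the page title."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.tables = []        # finished tables, each a list of rows
        self._rows = None       # rows of the table currently being parsed
        self._cell = None       # text pieces of the cell currently being parsed
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._rows.append([])
        elif tag in ("td", "th") and self._rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("td", "th") and self._cell is not None:
            self._rows[-1].append(" ".join(p for p in self._cell if p))
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows, self._cell = None, None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._cell is not None:
            self._cell.append(data.strip())

def is_promising(rows, min_rows=2, min_cols=2):
    """Hypothetical heuristic: skip tiny tables that are likely page layout."""
    return len(rows) >= min_rows and all(len(r) >= min_cols for r in rows)

def table_metadata(html, url):
    """Return one metadata record per promising table, ready for indexing."""
    parser = TableExtractor()
    parser.feed(html)
    return [
        {"url": url, "title": parser.title.strip(),
         "header": rows[0], "rows": rows[1:]}
        for rows in parser.tables if is_promising(rows)
    ]

PAGE = """<html><head><title>City mayors</title></head><body>
<table><tr><th>City</th><th>Mayor</th></tr>
<tr><td>Springfield</td><td>Person A</td></tr></table>
<table><tr><td>layout only</td></tr></table>
</body></html>"""

print(table_metadata(PAGE, "http://example.com/mayors"))
```

Each returned record could then be added as one document to a full-text index, with the fields searched at query time.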
Table Metadata
Combines information about the source document, and table context
TQA Question Processing
Table QA: Querying Overview
Features for Ranking Candidate Answers
Ranking Answer Candidates
Frequency-based (AskMSR):
Heuristic weight assignment (AskMSR improved)
Neither is robust or general
Ranking Answer Candidates (cont)
Solution: machine learning-based ranking
Naïve Bayes:
$\mathrm{Score}(\mathit{answer}) = \prod_i p(\mathrm{relevant} \mid \mathit{answer}.F_i)$
RankNet (Burges et al. 2005): scalable neural net implementation
Optimized for ranking: predicts an ordering of items, not a score for each
Trains on pairs (where the first item is to be ranked higher than or equal to the second)
Uses a cross-entropy cost and gradient descent to set weights
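The pairwise training idea can be sketched as follows. This is a simplification under stated assumptions, not the RankNet implementation: a linear scorer stands in for the neural net, the two features per candidate are hypothetical, and `target` is 1.0 when the first item should rank higher and 0.5 when the two are tied.

```python
import math

def score(w, x):
    """Linear scorer standing in for the neural net (a simplification)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def pair_cost(w, first, second, target=1.0):
    """Cross-entropy cost between target and P(first ranked above second)."""
    d = score(w, first) - score(w, second)
    # With P = sigmoid(d): -t*log(P) - (1-t)*log(1-P) = -t*d + log(1 + e^d)
    return -target * d + math.log(1.0 + math.exp(d))

def train(pairs, n_features, lr=0.1, epochs=200):
    """Plain stochastic gradient descent over (first, second, target) pairs."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for first, second, target in pairs:
            d = score(w, first) - score(w, second)
            grad_d = 1.0 / (1.0 + math.exp(-d)) - target  # d(cost)/d(d)
            for k in range(n_features):
                w[k] -= lr * grad_d * (first[k] - second[k])
    return w

# Hypothetical features per candidate answer: (frequency, type_match).
pairs = [
    ([3.0, 1.0], [1.0, 0.0], 1.0),
    ([2.0, 1.0], [2.0, 0.0], 1.0),
]
w = train(pairs, n_features=2)
print(w, pair_cost(w, *pairs[0]))
```

Because the cost depends only on the score difference within a pair, the learned scores are meaningful as an ordering rather than as calibrated relevance values.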
Some Implementation Details
Lucene, distributed indices (20M tables per index)
NLP Tools:
MS internal Named Entity tagger (many free ones exist)
Porter Stemmer
Relatively light-weight architecture:
Client (question processing): desktop machine
Table index server: dual-processor, 8 GB RAM, WinNT
Experimental Setup
Queries: TREC QA 2002, 2003 questions
Corpus: 100M web pages (a “random” subset
of an MSN Search crawl, from 2005)
Evaluation: TREC QA factoid patterns
“Minimal” regular expressions to match only right
answers
Not comprehensive (based on judgement pool)
Evaluation Metrics
MRR (mean reciprocal rank):
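$\mathrm{MRR@}K = \frac{1}{\min\{\, i \in 1..K : rel(answer_i) \,\}}$, averaged over all questions
(the reciprocal rank of the first correct answer in the top K; 0 if none of the top K is correct)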
Recall @ K:
The fraction of the questions for which a system
returned a correct answer ranked at or above K.
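Both metrics can be computed in a few lines. In this sketch, `rel` judges an answer by regular-expression answer patterns, as in the evaluation setup; the `runs` data and its patterns are made-up placeholders, and treating a question with no correct answer in the top K as contributing 0 is the usual convention, assumed here.

```python
import re

def rel(answer, patterns):
    """True if the answer matches any of the question's answer patterns."""
    return any(re.search(p, answer, re.IGNORECASE) for p in patterns)

def reciprocal_rank(ranked, patterns, k):
    """1/rank of the first correct answer in the top k, or 0.0 if none."""
    for i, answer in enumerate(ranked[:k], start=1):
        if rel(answer, patterns):
            return 1.0 / i
    return 0.0

def mrr_at_k(runs, k):
    """Mean reciprocal rank over all questions."""
    return sum(reciprocal_rank(r, p, k) for r, p in runs) / len(runs)

def recall_at_k(runs, k):
    """Fraction of questions with a correct answer at rank k or better."""
    return sum(reciprocal_rank(r, p, k) > 0 for r, p in runs) / len(runs)

# Placeholder data: (ranked answers, answer patterns) per question.
runs = [
    (["Person B", "Person A"], [r"\bperson a\b"]),
    (["X", "Y", "Z"], [r"\bz\b"]),
    (["Q"], [r"\bnone\b"]),
]
print(mrr_at_k(runs, 3), recall_at_k(runs, 3))
```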
Results (1): Accuracy vs. Corpus Size
Results (2): Comparing Ranking Methods
If the output is consumed by another system, a large K is acceptable
Results (3): Accuracy on Hard Questions
TQA can retrieve the answer in the top 100 even when the best QA system is not able to return any answer
Result Summary
Requires indexing more than 150M tables
before respectable accuracy achieved
Performance was around median on TREC
2002, 2003 benchmarks
Can be helpful for questions difficult for
traditional QA systems
Promising Directions for Future Work
Crawl-time: aggressive pruning/classification
Index-time: integration of related tables
Query-time: taxonomy integration/hypernymy
User behavior modeling
Past clickthrough to rerank candidate tables, answers
Query reformulation
Conclusions
Implicitly structured web content can be
useful for web question answering
We demonstrated scalability of a lightweight
table-based web QA approach
Much room for improvement, future research
Thank you!
Questions?
E-mail: eugene@mathcs.emory.edu
Plug: User Interactions for Web Question Answering:
http://www.mathcs.emory.edu/~eugene/uqa/
E. Agichtein, E. Brill, S. Dumais, Mining user behavior to improve web
search ranking, SIGIR 2006
E. Agichtein, User Behavior Mining and Information Extraction:
Towards closing the gap, IEEE Data Engineering Bulletin, Dec. 2006
E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, Finding
High Quality Content in Social Media with applications to Community-based Question Answering, to appear WSDM 2008