Trains of Thought:
Generating Information Maps
Dafna Shahaf, Carlos Guestrin
and Eric Horvitz
“The abundance of books is a distraction”
Lucius Annaeus Seneca
4 BC – 65 AD
So, you want to understand
a complex topic…
Now what?
Search Engines are Great
• But do not show how it all fits together
Timeline Systems
Real Stories are not Linear
Metro Map
• A set of lines
• Each line follows a coherent narrative thread
• Structure + multiple aspects
[Figure: example metro map; labels: labor unions, Merkel, bailout, Germany, protests, junk status, austerity, strike]
Map Definition
• A map M is a pair (G, P) where
– G=(V,E) is a directed graph
– P is a set of paths in G (metro lines)
– Each e ∈ E must belong to at least one metro line
[Figure: the example metro map; labels: labor unions, Merkel, bailout, Germany, protests, junk status, austerity, strike]
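The definition can be written down as a small data structure. This is a minimal sketch, not the authors' code: the class name `MetroMap`, its field names, and the use of string vertex ids are all assumptions made for illustration. The check mirrors the two conditions on the slide: every line is a path in G, and every edge lies on at least one line.

```python
from dataclasses import dataclass

Edge = tuple[str, str]


@dataclass
class MetroMap:
    """Hypothetical container for a map M = (G, P)."""
    vertices: set[str]
    edges: set[Edge]
    lines: list[list[str]]  # each metro line is a path: a sequence of vertices

    def line_edges(self) -> set[Edge]:
        """All edges traversed by some metro line."""
        return {(u, v) for line in self.lines for u, v in zip(line, line[1:])}

    def is_valid(self) -> bool:
        """Lines must be paths in G, and every edge must lie on a line."""
        used = self.line_edges()
        lines_are_paths = used <= self.edges and all(
            v in self.vertices for line in self.lines for v in line
        )
        return lines_are_paths and self.edges <= used
```

A map whose graph contains an edge that no line traverses is rejected by `is_valid`.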
Game Plan
Objective
Algorithm
Does it work?
Properties of a Good Map
1. Coherence
???
Coherence: Main Idea
Coherence is not a property of local interactions:
[Figure: a chain of five articles (1–5) and the words shared along it: Greece, Debt default, Europe, Republican, Italy, Protest]
Incoherent: each pair shares different words
Connecting the Dots
[S, Guestrin, KDD’10]
Coherence: Main Idea
A more-coherent chain:
[Figure: a chain of five articles (1–5) and the words shared along it: Greece, Debt default, Austerity, Republican, Italy, Protest]
Coherent: a small number of words captures the story
Connecting the Dots
[S, Guestrin, KDD’10]
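The contrast between the two chains can be illustrated with a toy score. This is a simplified stand-in, not the linear-programming formulation of the KDD'10 paper: documents are plain word sets, `local_score` looks only at adjacent pairs, and `global_score` (a hypothetical name) allows only a small budget of words to explain the whole chain. The word sets below are invented for the example.

```python
from itertools import combinations


def local_score(chain):
    """Weakest link using adjacent pairs only: min number of shared words."""
    return min(len(a & b) for a, b in zip(chain, chain[1:]))


def global_score(chain, budget):
    """Best weakest-link score when at most `budget` words may be used
    to explain every transition in the chain (brute force)."""
    vocab = sorted(set().union(*chain))
    best = 0
    for words in combinations(vocab, min(budget, len(vocab))):
        chosen = set(words)
        score = min(len(a & b & chosen) for a, b in zip(chain, chain[1:]))
        best = max(best, score)
    return best


# Each adjacent pair shares a different word: locally fine, globally incoherent.
incoherent = [
    {"greece", "debt"}, {"debt", "europe"}, {"europe", "republican"},
    {"republican", "italy"}, {"italy", "protest"},
]
# Two words run through the whole chain.
coherent = [
    {"greece", "debt"}, {"greece", "debt", "austerity"},
    {"greece", "debt", "austerity"}, {"greece", "debt", "italy"},
    {"greece", "debt", "protest"},
]
```

With a budget of two words, the first chain cannot be explained at all, while the second is fully explained by "greece" and "debt". Both functions assume a chain of at least two documents.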
Properties of a Good Map
1. Coherence
Is it enough?
Max-coherence Map
Query: Clinton
• Clinton set for Dublin
• Clinton, Religious Leaders Share Thoughts
• Clinton visits Belfast
• High hopes for Clinton visit
• Church Leaders Praise Clinton's 'Spirituality'
• Religion Leaders Divided on Clinton Moral Issue
• Clinton Should Resign, 2 Religious Leaders Say
Properties of a Good Map
1. Coherence
2. Coverage
– Should cover diverse topics important to the user
Coverage
Turning Down the Noise
[El-Arini, Veda, S, Guestrin, KDD’09]
• Select a small set of diverse articles that
covers the most important stories
January 17, 2009
Coverage: The Idea
• Documents cover concepts:
[Figure: panels labeled "Corpus" and "Coverage"]
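One way to make "documents cover concepts" concrete is a probabilistic coverage function with diminishing returns. The sketch below is an assumption for illustration: the slide does not give a formula, and the function names, the per-document scores in [0, 1], and the concept weights are all hypothetical.

```python
from math import prod


def concept_coverage(docs, concept):
    """Coverage of one concept by a set of documents:
    1 - prod(1 - cover_d(concept)), so repeated coverage adds less and less."""
    return 1.0 - prod(1.0 - d.get(concept, 0.0) for d in docs)


def map_coverage(docs, weights):
    """Weighted sum of per-concept coverage; `weights` encodes how much
    the user cares about each concept (hypothetical input)."""
    return sum(w * concept_coverage(docs, c) for c, w in weights.items())


# Each document maps concepts to an assumed coverage score in [0, 1].
docs = [{"strike": 0.5}, {"strike": 0.5, "imf": 0.4}]
weights = {"strike": 1.0, "imf": 1.0}
total = map_coverage(docs, weights)
```

The second article about the strike raises that concept's coverage from 0.5 to 0.75 rather than to 1.0, which is what rewards diverse selections over redundant ones.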
High-coverage, Coherent Map
• Greek Civil Servants Strike over Austerity Measures
• Greece Paralyzed by New Strike
• Greeks Take to the Streets, but Lacking Earlier Zeal
• Infighting Adds to Merkel’s Woes
• It’s Germany that Matters
• UK Backs Germany’s Effort
• Germany says the IMF should Rescue Greece
• IMF more Likely to Lead Efforts
• IMF is Urged to Move Forward
Properties of a Good Map
1. Coherence
2. Coverage
3. Connectivity
Definition: Connectivity
• Experimented with formulations
• Users do not care about connection type
• Encourage connections between pairs of lines
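A simple reading of the last bullet is to count how many pairs of lines intersect. The sketch below assumes that reading; the function name and the representation of a line as a list of document ids are illustrative choices, not taken from the talk.

```python
from itertools import combinations


def connectivity(lines):
    """Number of pairs of metro lines that share at least one document."""
    return sum(1 for a, b in combinations(lines, 2) if set(a) & set(b))


lines = [["d1", "d2", "d3"], ["d3", "d4"], ["d5", "d6"]]
pairs_connected = connectivity(lines)  # only the first two lines meet, at d3
```

Because only the existence of a shared document matters, the count ignores what kind of connection it is, in line with the observation that users do not care about connection type.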
Tying it all Together:
Map Objective
• Coherence
– Either coherent or not: Constraint
• Coverage
– Must have!
• Connectivity
– Nice to have
Consider all coherent maps with maximum possible coverage.
Find the most connected one.
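The objective is lexicographic: coherence filters, coverage ranks, and connectivity breaks ties. A minimal sketch over an explicit list of candidate maps follows; the function `pick_map` and its callable arguments are hypothetical, and enumerating candidates like this is only feasible for toy inputs.

```python
def pick_map(candidates, is_coherent, coverage, connectivity):
    """Among coherent candidates with maximum coverage,
    return the most connected one (None if nothing is coherent)."""
    coherent = [m for m in candidates if is_coherent(m)]
    if not coherent:
        return None
    best_cov = max(coverage(m) for m in coherent)
    top = [m for m in coherent if coverage(m) == best_cov]
    return max(top, key=connectivity)
```

Each candidate can be any object, as long as the three callables know how to score it.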
Game Plan
Objective
Algorithm
Does it work?
Approach Overview
Documents D
1. Coherence graph G
2. Coverage function f
f( ) = ?
…
3. Increase Connectivity
Coherence Graph: Main Idea
• Vertices correspond to short coherent chains
• Directed edges between chains which can be
conjoined and remain coherent
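[Figure: vertices are the chains (1, 2, 3), (4, 5, 6) and (5, 8, 9); conjoining (1, 2, 3) with (5, 8, 9) gives the chain (1, 2, 3, 5, 8, 9)]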
Finding Vertices
• Vertices are short, coherent chains
• Can use [KDD’10]
– Expensive
– Solving many LPs
• Take advantage of simplicity of short stories
– No topic drift
– Sampling-based (fast) algorithm
Finding Edges
• Problem: Combining several strong chains
may result in a much-weaker chain
Discontinuity:
Change of focus
m-Coherence
• Control discontinuity points:
A chain is m-coherent if each sub-chain
(d_i, …, d_{i+m}) is coherent.
• m: size of the user's “history window”
– m=length(chain) : standard coherence
– m=1: optimize transitions without context
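The definition amounts to a sliding-window check. In the sketch below the window length is passed explicitly as `window` (number of documents per sub-chain), since the indexing on the slide can be read in more than one way; the base predicate `is_coherent` is a hypothetical stand-in for whatever coherence test is used on a single sub-chain.

```python
def is_m_coherent(chain, window, is_coherent):
    """True if every run of `window` consecutive documents is coherent.
    Chains shorter than the window are checked as a whole."""
    if len(chain) <= window:
        return is_coherent(chain)
    return all(
        is_coherent(chain[i:i + window])
        for i in range(len(chain) - window + 1)
    )


def shares_a_word(sub_chain):
    """Toy coherence test: all documents in the sub-chain share a word."""
    return bool(set.intersection(*sub_chain))


chain = [{"greece"}, {"greece", "imf"}, {"imf"}]
```

With a window of two documents the toy chain passes, because each transition shares a word. With a window of three it fails, because no single word runs through all three documents.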
Observation
• If two chains are m-coherent and overlap in m−1 documents, the conjoined chain is m-coherent
Using the Observation
• Useful for divide and conquer:
– Add edge if m-1 overlap
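[Figure: chain (1, 2, 3) has edges to chains (2, 3, 4) and (2, 3, 5); following the edge to (2, 3, 5) gives the conjoined chain (1, 2, 3, 5)]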
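The edge rule can be sketched directly. The code assumes the short chains are already known to be m-coherent and are given as tuples of document ids; it only applies the overlap test and does not check coherence itself. The function names are illustrative.

```python
def build_coherence_graph(chains, m):
    """Add an edge u -> v when the last m-1 documents of u
    equal the first m-1 documents of v."""
    overlap = m - 1
    edges = {c: [] for c in chains}
    for u in chains:
        for v in chains:
            if u != v and u[len(u) - overlap:] == v[:overlap]:
                edges[u].append(v)
    return edges


def conjoin(u, v, m):
    """Merge two chains that overlap in m-1 documents."""
    return u + v[m - 1:]


# The three short chains from the slide, with m = 3.
chains = [(1, 2, 3), (2, 3, 4), (2, 3, 5)]
graph = build_coherence_graph(chains, m=3)
longer = conjoin((1, 2, 3), (2, 3, 5), m=3)
```

Paths in the resulting graph then stand for longer chains, obtained by conjoining the vertices along the path.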
Approach Overview
Documents D
1. Coherence graph G
2. Coverage function f
f( ) = ?
…
3. Increase Connectivity
Finding High-Coverage Chains
• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing
coverage of underlying articles
[Figure: coherence graph with vertices (1, 2, 3), (2, 3, 4) and (2, 3, 5)]
Is Cover(1, 2, 3, 4) > Cover(1, 2, 3, 5)?
Reformulation
• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing
coverage of underlying articles, i.e. a function of the nodes visited
• Submodular orienteering
– [Chekuri and Pal, 2005]
– Quasipolynomial-time recursive greedy
– O(log OPT) approximation
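To make the problem statement concrete, the sketch below solves it by exhaustive search on a tiny graph. This is deliberately not the recursive greedy algorithm of Chekuri and Pal: it is a brute-force baseline that takes exponential time in general. It assumes K counts vertices on the path, that every vertex appears as a key of `edges`, and that `cover` is any set function over document ids; all names and the toy concept table are hypothetical.

```python
def best_path(edges, k, cover):
    """Exhaustively find the simple path of k vertices whose
    underlying documents maximize `cover`."""
    best, best_score = None, float("-inf")

    def extend(path):
        nonlocal best, best_score
        if len(path) == k:
            docs = {d for chain in path for d in chain}
            score = cover(docs)
            if score > best_score:
                best, best_score = list(path), score
            return
        for nxt in edges.get(path[-1], []):
            if nxt not in path:
                path.append(nxt)
                extend(path)
                path.pop()

    for start in edges:
        extend([start])
    return best, best_score


# Toy instance: vertices are short chains of document ids.
edges = {(1, 2, 3): [(2, 3, 4), (2, 3, 5)], (2, 3, 4): [], (2, 3, 5): []}
concepts = {1: {"greece"}, 2: {"debt"}, 3: {"strike"}, 4: {"strike"}, 5: {"imf"}}


def cover(docs):
    """Toy coverage: number of distinct concepts touched by the documents."""
    return len(set().union(*(concepts[d] for d in docs)))


path, score = best_path(edges, k=2, cover=cover)
```

In the toy instance document 4 repeats a concept that document 3 already covers, so the path ending in (2, 3, 5) wins. The score depends on the set of nodes visited rather than on a sum of edge weights, which is why the problem is treated as orienteering with a submodular reward.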
Approach Overview: Recap
Documents D
1. Coherence graph G
– Encodes all m-coherent chains as graph paths
2. Coverage function f
– Submodular orienteering [Chekuri & Pal, 2005]
– Quasipolynomial-time recursive greedy
– O(log OPT) approximation
3. Increase Connectivity
Example Map: Greece Debt
Game Plan
Objective
Algorithm
Does it work?
Evaluation
• User study
– Document selection: capturing important content?
– Micro-knowledge: question-answering
– Macro-knowledge: high-level summaries
– Effect of structure
• New York Times (2008-2010)
– 18K+ articles
– Chile, Haiti, Greece
Document Selection
• Experts compose a list of important events
• Subtopic recall (% of events in the map):
[Plot: subtopic recall vs. number of lines]
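Subtopic recall as described above is a simple ratio. The sketch assumes that matching map articles to expert events has already been done by hand, so both inputs are lists of event labels; the function name and the labels are hypothetical.

```python
def subtopic_recall(expert_events, map_events):
    """Fraction of expert-listed events that appear in the map."""
    expert = set(expert_events)
    if not expert:
        raise ValueError("expert list is empty")
    return len(expert & set(map_events)) / len(expert)


recall = subtopic_recall(
    ["bailout", "strike", "downgrade", "election"],
    ["bailout", "downgrade", "unrelated"],
)
```

Events in the map that the experts did not list are ignored, so the measure says nothing about precision.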
Micro-Knowledge
(Question Answering)
• Mechanical Turk
Question 2: How many miners were trapped?
• Competitors:
– Google News
– Event threading (TDT) [Nallapati et al, 04]
– Structureless maps
• Results: minor gains
– map structure helps
Macro-Knowledge
(High-Level Summaries)
• Summarize complex story in a paragraph
– Maps vs. Google News
– ~15 paragraphs per task
• Mechanical Turk to evaluate paragraphs:
– Which paragraph provided a more complete and
coherent picture of the story?
– Justification: Paragraph A is more…
– ~300 evaluations per task
Macro-Knowledge: Results
• Greece: 72% prefer maps
– Justifications: [Figure: justifications given for Google News vs. Maps]
• Haiti: 59% prefer maps
– Map users mostly summarized one story line
Bottom line: maps are more useful as high-level tools for stories without a single dominant storyline
Conclusions
• Formulated metrics characterizing good maps
• Efficient methods with theoretical guarantees
• User studies highlight the promise of the method
• Website on the way!
• Personalization
Thank you!
Finding Coherent Chains
• Goal: represent all coherent chains
• Problem: intractable
• Divide and conquer:
– Find short coherent chains
– Concatenate to form longer coherent chains
Website