Trains of Thought: Generating Information Maps
Dafna Shahaf, Carlos Guestrin and Eric Horvitz

“The abundance of books is a distraction.”
Lucius Annaeus Seneca, 4 BC – 65 AD

So, you want to understand a complex topic… Now what?

Search Engines are Great
• But they do not show how it all fits together

Timeline Systems

Real Stories are not Linear

Metro Map
• A set of lines
• Each line follows a coherent narrative thread
• Structure + multiple aspects
[Figure: example metro map; labels: labor unions, Merkel, bailout, Germany, protests, junk status, austerity, strike]

Map Definition
• A map M is a pair (G, P), where
  – G = (V, E) is a directed graph
  – P is a set of paths in G (metro lines)
  – Each e ∈ E must belong to at least one metro line
[Figure: the same example metro map]

Game Plan
• Objective
• Algorithm
• Does it work?

Properties of a Good Map
1. Coherence
   ???

Coherence: Main Idea
Coherence is not a property of local interactions:
[Figure: a chain of articles 1–5; linking words: Greece, debt default, Europe, Republican, Italy, protest]
Incoherent: each pair shares different words
Connecting the Dots [S, Guestrin, KDD’10]

Coherence: Main Idea
A more coherent chain:
[Figure: a chain of articles 1–5; linking words: Greece, debt default, austerity, Republican, Italy, protest]
Coherent: a small number of words captures the story
Connecting the Dots [S, Guestrin, KDD’10]

Properties of a Good Map
1. Coherence
   Is it enough?

Max-coherence Map
Query: Clinton
[Figure: headlines on the max-coherence map include “Clinton set for Dublin”, “Clinton visits Belfast”, “High hopes for Clinton visit”, “Clinton, Religious Leaders Share Thoughts”, “Church Leaders Divided on Clinton Moral Issue”, “Religion Leaders Praise Clinton's ‘Spirituality’” and “Clinton Should Resign, 2 Religious Leaders Say”]

Properties of a Good Map
1. Coherence
2. Coverage
   Should cover diverse topics important to the user

Coverage
Turning Down the Noise [El-Arini, Veda, S, Guestrin, KDD’09]
• Select a small set of diverse articles that covers the most important stories
[Figure: example selection for January 17, 2009]

Coverage: The Idea
• Documents cover concepts
[Figure: documents in a corpus covering concepts]

High-coverage, Coherent Map
[Figure: map with headlines “Greek Civil Servants Strike over Austerity Measures”, “Greece Paralyzed by New Strike”, “Greeks Take to the Streets, but Lacking Earlier Zeal”, “Infighting Adds to Merkel’s Woes”, “It’s Germany that Matters”, “UK Backs Germany’s Effort”, “Germany says the IMF should Rescue Greece”, “IMF more Likely to Lead Efforts” and “IMF is Urged to Move Forward”]

Properties of a Good Map
1. Coherence
2. Coverage
3. Connectivity

Definition: Connectivity
• Experimented with several formulations
• Users do not care about the connection type
• Encourage connections between pairs of lines

Tying it all Together: Map Objective
• Coherence
  – Either coherent or not: a constraint
• Coverage
  – Must have!
• Connectivity
  – Nice to have
Consider all coherent maps with maximum possible coverage. Find the most connected one.

Game Plan
• Objective
• Algorithm
• Does it work?

Approach Overview
Documents D
1. Coherence graph G
2. Coverage function f: f(…) = ?
3. Increase connectivity

Coherence Graph: Main Idea
• Vertices correspond to short coherent chains
• Directed edges connect chains that can be conjoined and remain coherent
[Figure: example vertices, short chains of numbered documents such as (1, 2, 3), (4, 5, 6) and (5, 8, 9)]

Finding Vertices
• Vertices are short, coherent chains
• Can use [KDD’10]
  – Expensive: requires solving many LPs
• Take advantage of the simplicity of short stories
  – No topic drift
  – Sampling-based (fast) algorithm

Finding Edges
• Problem: combining several strong chains may result in a much weaker chain
[Figure: discontinuity, i.e., a change of focus]

m-Coherence
• Control discontinuity points: a chain is m-coherent if each sub-chain (d_i, …, d_{i+m}) is coherent.
• m: the size of the user's “history window”
  – m = length(chain): standard coherence
  – m = 1: optimize transitions without context

Observation
• If two chains are m-coherent and have an overlap of m − 1, the conjoined chain is m-coherent

Using the Observation
• Useful for divide and conquer:
  – Add an edge between two chains if they have m − 1 overlap
[Figure: chain (1, 2, 3) with edges to chains (2, 3, 4) and (2, 3, 5); conjoining gives, e.g., (1, 2, 3, 5)]

Approach Overview
Documents D
1. Coherence graph G
2. Coverage function f: f(…) = ?
3. Increase connectivity

Finding High-Coverage Chains
• Paths correspond to coherent chains
• Problem: find a path of length K maximizing the coverage of the underlying articles
[Figure: from chain (1, 2, 3) the path can continue to (2, 3, 4) or (2, 3, 5); is Cover(1, 2, 3, 4) > Cover(1, 2, 3, 5)?]

Reformulation
• Paths correspond to coherent chains
• Problem: find a path of length K maximizing a function of the nodes visited (the coverage of the underlying articles)
• Submodular orienteering
  – [Chekuri and Pal, 2005]
  – Quasipolynomial-time recursive greedy
  – O(log OPT) approximation

Approach Overview: Recap
Documents D
1. Coherence graph G
   – Encodes all m-coherent chains as graph paths
2. Coverage function f
   – Submodular orienteering [Chekuri & Pal, 2005]
   – Quasipolynomial-time recursive greedy, O(log OPT) approximation
3. Increase connectivity

Example Map: Greece Debt

Game Plan
• Objective
• Algorithm
• Does it work?

Evaluation
• User study
  – Document selection: does the map capture important content?
  – Micro-knowledge: question answering
  – Macro-knowledge: high-level summaries
  – Effect of structure
• New York Times (2008–2010)
  – 18K+ articles
  – Chile, Haiti, Greece

Document Selection
• Experts compose a list of important events
• Subtopic recall: the percentage of those events that appear in the map
[Plot: subtopic recall vs. number of lines]

Micro-Knowledge (Question Answering)
• Mechanical Turk
  – Sample question (Question 2): How many miners were trapped?
• Competitors:
  – Google News
  – Event threading (TDT) [Nallapati et al., 2004]
  – Structureless maps
• Results: minor gains; map structure helps

Macro-Knowledge (High-Level Summaries)
• Summarize a complex story in a paragraph
  – Maps vs. Google News
  – ~15 paragraphs per task
• MTurk workers evaluate the paragraphs:
  – Which paragraph provided a more complete and coherent picture of the story?
  – Justification: “Paragraph A is more…”
  – ~300 evaluations per task

Macro-Knowledge: Results
• Greece: 72% prefer maps
  – Justifications: [Figure: justifications given for Google News vs. Maps]
• Haiti: 59% prefer maps
  – Map users mostly summarized one story line
Bottom line: maps are more useful as high-level tools for stories without a single dominant storyline

Conclusions
• Formulated metrics characterizing good maps
• Efficient methods with theoretical guarantees
• User studies highlight the promise of the method
• Website on the way!
• Personalization

Thank you!

Finding Coherent Chains
• Goal: represent all coherent chains
• Problem: intractable
• Divide and conquer:
  – Find short coherent chains
  – Concatenate to form longer coherent chains
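The divide-and-conquer idea on the slides (find short coherent chains, add an edge when two chains overlap, concatenate along paths) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy corpus `DOCS`, all helper names, and the `is_coherent` test (every document in a window shares at least one word) are hypothetical stand-ins; the real coherence measure comes from [KDD’10], and the real vertices come from the sampling-based algorithm, whereas here they are written by hand. The sketch also reads m as counting transitions: each window (d_i, …, d_{i+m}) spans m + 1 documents, and an “m − 1 overlap” means m − 1 shared transitions, i.e., m shared documents. Under that reading the Observation holds, because any window of the conjoined chain that touched both non-shared parts would need at least m + 2 documents, so every window lies entirely inside one of the two input chains.

```python
# Hypothetical toy corpus: document id -> words the document mentions.
DOCS = {
    1: frozenset({"greece", "debt", "default"}),
    2: frozenset({"greece", "debt", "austerity"}),
    3: frozenset({"greece", "austerity", "strike"}),
    4: frozenset({"austerity", "strike", "protest"}),
    5: frozenset({"austerity", "imf", "bailout"}),
}


def is_coherent(window, min_shared=1):
    """Hypothetical stand-in for the coherence test: the window counts as
    coherent if all of its documents share at least `min_shared` words.
    This is NOT the coherence measure from [KDD'10]."""
    common = frozenset.intersection(*(DOCS[d] for d in window))
    return len(common) >= min_shared


def is_m_coherent(chain, m):
    """Every window (d_i, ..., d_{i+m}) of m + 1 consecutive documents must
    be coherent; a chain shorter than one window is tested as a whole."""
    size = m + 1
    starts = range(max(1, len(chain) - size + 1))
    return all(is_coherent(chain[i:i + size]) for i in starts)


def conjoin(a, b, m):
    """Join chain a to chain b if a's last m documents (m - 1 transitions)
    equal b's first m documents; otherwise return None."""
    if len(a) >= m and len(b) >= m and a[-m:] == b[:m]:
        return a + b[m:]
    return None


def coherence_graph_edges(vertices, m):
    """Directed edges (a, b) between short chains that can be conjoined."""
    return {
        (a, b)
        for a in vertices
        for b in vertices
        if a != b and conjoin(a, b, m) is not None
    }


def chain_from_path(path, m):
    """Concatenate the short chains along a graph path into one long chain."""
    chain = path[0]
    for nxt in path[1:]:
        chain = conjoin(chain, nxt, m)
        if chain is None:
            raise ValueError("consecutive chains do not overlap by m documents")
    return chain


if __name__ == "__main__":
    m = 2  # windows of three documents
    vertices = [(1, 2, 3), (2, 3, 4), (2, 3, 5)]
    assert all(is_m_coherent(v, m) for v in vertices)

    print(sorted(coherence_graph_edges(vertices, m)))
    # [((1, 2, 3), (2, 3, 4)), ((1, 2, 3), (2, 3, 5))]

    long_chain = chain_from_path([(1, 2, 3), (2, 3, 4)], m)
    print(long_chain, is_m_coherent(long_chain, m), is_coherent(long_chain))
    # (1, 2, 3, 4) True False
```

In the toy run, (1, 2, 3, 4) is 2-coherent even though no single word appears in all four documents: every three-document window stays coherent while the focus shifts from Greece to austerity. This is the kind of controlled discontinuity that m-coherence is meant to allow.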