“As we may think” (Vannevar Bush, 1945)

www.webat25.org
Source: http://w3.org/Proposal.html

Lance Ulanoff, Mashable.com, December 4 2015

Hiro: Who worshipped Asherah?
Librarian: Everyone who lived between India and Spain, from the second millennium B.C. up into the Christian era. With the exception of the Hebrews, who only worshipped her until the religious reforms....
Hiro: I thought the Hebrews were monotheists….
Librarian: Monolatrists. They did not deny the existence of other gods. Asherah was venerated as the consort of Yahweh.
Hiro: I don't remember anything about God having a wife in the Bible.
Librarian: The Bible didn't exist at that point. Judaism was just a loose collection of Yahwistic cults, each with different shrines and practices.
Hiro and the Librarian, Chapter 30, Snow Crash (1992), Neal Stephenson

In Arabia
In Ugarit
In Egypt
In Israel and Judah

Semantic Web (Tim Berners-Lee, 2000)
“The intelligent agent that people have touted for ages will finally materialize.”
http://www.w3.org/2000/Talks/1206-xml2k-tbl/slide10-0.html

Semantic Web | Knowledge Web
Human readable vs machine readable contents | Machine reads human readable contents
Human defines standard for data formats and models | Machine learns to conflate different formats of the same thing
Explicit and precise specification of knowledge representation that everyone has to agree upon | Latent and fuzzy representation of knowledge learned by mining big data

Paradigm Shift in Web Search (the “Librarian”)
TRADITIONAL WEB SEARCH | KNOWLEDGE WEB SEARCH
Index Keywords in Documents | Digest World’s Knowledge
Match Keywords in Queries | Match User Intent
Relevance of “10 blue links” | Dialog Experience

1. “Bing Dialog Model: Knowledge, Intent and Dialog”, MSR Faculty Summit, July 2010
2. “Introducing the Knowledge Graph: things, not strings”, Official Google Blog, May 2012
3.
“Chinese Search Engine – Baidu’s Practice”, SIRIP, SIGIR 2014, July 2014

“Dialog Acts” in Bing/Cortana
• Answer
• Confirmation
• Disambiguation
• Suggestion
  • Progressive: Refinement
  • Digressive: Recommendation (reactive + proactive)
• Key difference from human-to-human dialog
  • Not limited to anthropomorphic natural language dialogs
  • Each dialog turn can present multiple acts
  • Can overload back channel communications

Closed-loop Dynamic Bayesian Inference
$I_t = \arg\max_{I} P(I \mid U_t, K, I_{t-1})$
Bayesian Minimum Risk
$A_t = \arg\min_{A} E[\mathrm{Cost}(A, I_t)]$
[Diagram: closed loop connecting the Intent Model, Behavior Model, and Interaction Model. Signals: Knowledge + History and Previous Inferences $(K, I_{t-1})$, User Behavior $(U_t)$, Expected Behavior $(U'_t)$, Inferred Intent $(I_t)$, Inferred Action $(A_t)$.]

Digital Librarian for Researchers: How far are we?
A Case Study on Microsoft Academic Search

Predictive Completion and Disambiguation

Knowledge Driven Suggestions

Research Challenges
• How to complete academic queries never seen before?
• How to rank completion suggestions?
• How to avoid completions that lead to no search results?

More on Intent Inference
• Generative model approach:
  $I_t = \arg\max_{I} P(I \mid U_t, K, I_{t-1}) = \arg\max_{I} P(U_t \mid I)\, P(I \mid K, I_{t-1})$
• Dynamic ranking, $P(U_t \mid I)$: score depending on user behavior (e.g., query)
• Static ranking, $P(I \mid K, I_{t-1})$: score determined by knowledge and dialog history

Special Case 1: Static Rank at Onset
• Given knowledge graph, find $P(I \mid K)$ for all entity types
  • Journal, article, conference, author, institution
• Journal impact factor: E. Garfield, Science, 1972
• PageRank: A paper is important if cited by important papers
  • G. Pinski and F. Narin, Information Processing and Management, 1976
  • N. Geller, Information Processing and Management, 1978
  • Rediscovery of Perron-Frobenius theorem (1904)
• How to make better use of heterogeneity of the graph?
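
The PageRank idea listed above (a paper is important if cited by important papers) can be sketched as power iteration over a citation graph. This is a minimal illustration only, not the ranking used by Microsoft Academic: the function name `pagerank`, the damping factor 0.85, the uniform handling of papers without outgoing citations, and the toy graph are all assumptions made for the example. It also treats only paper-to-paper citation edges, so it does not address the heterogeneity question raised in the last bullet.

```python
def pagerank(citations, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration on a citation graph given as {paper: [papers it cites]}.

    Hypothetical helper for illustration; damping and tolerance are assumed values.
    """
    papers = sorted(set(citations) | {c for cited in citations.values() for c in cited})
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(max_iter):
        # Papers that cite nothing spread their rank uniformly over all papers.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        # Each citing paper splits its rank equally among the papers it cites.
        for p, cited in citations.items():
            if cited:
                share = damping * rank[p] / len(cited)
                for c in cited:
                    new_rank[c] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in papers)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: E cites A, A and B cite C, C cites D, D cites nothing.
toy_citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": [], "E": ["A"]}
scores = pagerank(toy_citations)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 4))
```

In this toy graph D is cited only once yet ends up ranked above C, because its single citation comes from the most heavily cited paper. A raw citation count would order the two the other way round.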
Static Rank of a Paper
• Inaugural WSDM Cup, Autumn 2015
• Industry organizers: MSR and Elsevier
• http://www.wsdm-conference.org/2016/wsdm-cup.html

Microsoft Academic Graph
• Author (> 40M)
• Paper (> 100M)
• Event (> 46K)
• Venue (> 23K)
• Citations (billions)
• Institution (20K)
• Field of Study (> 50K)

Microsoft Academic Graph (MAG)
• Data Releases on Azure
• Free Azure access for research
• http://research.microsoft.com/ MAG
• Web Service API coming!
• Community properties

Special Case 2: “Zero-query” Suggestion
• Digital librarian to notify me of new materials I should read
• Find $I_t = \arg\max_{I} P(I \mid K, I_{t-1})$ whenever the knowledge graph grows
• Best if
  • Tailored to user based on interests inferred from aggregated behaviors
  • Following user wherever, whenever and whatever
• Cortana: intelligent personal assistant
  • Windows, Android, iOS

Summary
• Intelligent agent at web scale (“digital librarian”):
  • From keyword matching to intent/knowledge understanding
  • One year old for academic services!
  • Conduct interactive dialog or forage on behalf of users behind the scenes
  • Albeit w/o anthropomorphic façade
• Microsoft Academic Services:
  • Search (reactive)
  • Cortana notification (proactive)
  • Data and Intelligent API
  • We want to build a community