CiteSight: Contextual Citation Recommendation with Differential Search
Avishay Livne¹, Vivek Gokuladas², Jaime Teevan³, Susan Dumais³, Eytan Adar¹
¹University of Michigan, ²Qualcomm, ³Microsoft
#SIGIR18 #JaimesBackyard

Search Engines Focus on Speed

Why Do We Cite?
• Paying homage to pioneers
• Giving credit for related work
• Identifying methodology
• Providing background
• Correcting one’s work
• Correcting the work of others
• Substantiating claims
• …
[Garfield, 1965]

How Do We Cite?
• Many resources
  – Search engines
  – Bibliographic tools
  – Colleagues
• Work practice
  – Papers we know
  – Papers we should know

Why × How = 2 Specs
• Spec 1
  – I know what I want, give it to me now
  – Citation context:
    • “… calculating the differences between blocks of text [”
• Spec 2
  – I don’t know or can’t remember what I want
    • [cite]
• Complex, dynamic search space = slow
  – Inherent trade-off
• Can we build a system to support both?

The CiteSight User Interface

Split World Into Two
Stuff I want fast = stuff I know about
Microsoft Academic
Stuff I don’t know about

Strategy
• Small, personalized index
  – Updated dynamically
    • What you’ve cited before
    • What you’ve cited now
    • What other people have cited
      – Venue, co-citation, etc.
• Run a big index for everything else

Ranking
• Query: Citation context
  – “… calculating the differences between blocks of text [”
• Dynamic recommendations
  – Immediately: Search the cache
  – In the background: Search the full index
• Rank retrieved papers:
  – Gradient boosted regression tree
  – Features: network + text
    • Popularity, author similarity, textual similarity, …

Citation Context
• Citation context is really good at picking out “winners”
• People talk about a paper the same way as you!
• Not the same way the author talks about their work
XYZ is similar to ABC […]
Bob et al. introduced ABC in […]
We utilize ABC to…[…]
Paper text
Citations
That’s nice… (S. Redner, 1998)

Context Coupling
• A and B related
  – Co-cited: When B is mentioned, A is
• “Borrow” contexts from A to B
• Borrowed context used as a feature in ranking papers
A: Popular paper
B: Less-popular paper

CiteSight Evaluation
• Can CiteSight predict existing citations?
  – 1000 randomly selected CS papers (2011)
    • Criteria: 20-40 citations
  – 5-fold cross validation
  – Metric: NDCG
    • Gain of 1 when guesses correct citation
    • Gain related to # of co-citations for close guesses
• User feedback from 5 CS grad students

Results
• Large improvement
  – Context coupling
  – All features
  – Citation-related features > text
• More info = better
  – Authors
  – Citations, to a point

Features               NDCG@10
Text only              40.8%
Context coupling       46.5%
All features           61.9%
+ keywords             46.5%
+ title                46.6%
+ authors similarity   47.5%
+ abstract             47.8%
+ citation count       48.6%
+ venue relevancy      49.2%
+ citations            53.0%
+ co-citations         56.7%
+ authors history      57.6%

Cache v. Corpus
• Relevance
  – Cache accounts for 46% of NDCG@10 of the corpus
  – 10% cache is better
• Speed
  – Cache: 6 ms
    • Instantaneous!
  – Corpus: 450 ms

Summary
• Differential need for speed
• CiteSight – differential search
  – Two different use cases = two indices
    1. Local index updated dynamically, contextually
    2. Global index with full content
  – Context coupling improves relevance
  – Local index improves speed
    • Able to provide instantaneous results
    • Often relevant because contextually updated

Questions?
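
Supplementary sketch: differential search
The two-index idea from the Strategy, Ranking, and Summary slides can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not the CiteSight implementation. The function names (overlap_score, search, differential_search), the toy paper texts, and the word-overlap scoring are all hypothetical stand-ins; the slides rank with a gradient boosted regression tree over network and text features, which is not reproduced here. The sketch only shows the control flow: answer immediately from the small cache while the full-index search runs in the background.

```python
from concurrent.futures import ThreadPoolExecutor


def overlap_score(context, text):
    """Toy relevance (assumption): share of context words found in the paper text."""
    words = set(context.lower().split())
    if not words:
        return 0.0
    return len(words & set(text.lower().split())) / len(words)


def search(index, context, k=3):
    """Return up to k (paper_id, score) pairs with a nonzero score, best first."""
    scored = [(pid, overlap_score(context, text)) for pid, text in index.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    # Sort by descending score; break ties by paper id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:k]


def differential_search(context, cache, corpus, executor, k=3):
    """Start the slow full-index search, then answer from the cache right away.

    Returns (immediate_results, future); future.result() gives the
    full-index ranking once the background search has finished.
    """
    background = executor.submit(search, corpus, context, k)
    immediate = search(cache, context, k)
    return immediate, background


# Hypothetical toy data: the cache holds papers the author already knows.
cache = {
    "p1": "an algorithm for differences between blocks of text",
    "p2": "gradient boosted regression trees",
}
corpus = dict(cache, p3="calculating the differences between blocks of text quickly")

context = "calculating the differences between blocks of text"
with ThreadPoolExecutor(max_workers=1) as pool:
    fast, pending = differential_search(context, cache, corpus, pool)
    print("from cache:", fast)
    print("from full index:", pending.result())
```

Returning the cache hits together with a future lets a caller show results at once and refresh the list when the slower search completes. A real system would also merge and re-rank the two result sets, which this sketch leaves out.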