CiteSight: Contextual Citation Recommendation with Differential Search

Avishay Livne (1), Vivek Gokuladas (2), Jaime Teevan (3), Susan Dumais (3), Eytan Adar (1)
(1) University of Michigan, (2) Qualcomm, (3) Microsoft
#SIGIR18 #JaimesBackyard
Search Engines Focus on Speed
Why Do We Cite?
• Paying homage to pioneers
• Giving credit for related work
• Identifying methodology
• Providing background
• Correcting one’s work
• Correcting the work of others
• Substantiating claims
• …
[Garfield, 1965]
How Do We Cite?
• Many resources
– Search engines
– Bibliographic tools
– Colleagues
• Work practice
– Papers we know
– Papers we should know
Why × How = 2 Specs
• Spec 1
– I know what I want, give it to me now
– Citation context:
• “… calculating the differences between blocks of text [”
• Spec 2
– I don’t know or can’t remember what I want
• [cite]
• Complex, dynamic search space = slow
– Inherent trade-off
• Can we build a system to support both?
The CiteSight User Interface
Split World Into Two
Stuff I want fast = stuff I know about
Microsoft Academic
Stuff I don’t know about
Strategy
• Small, personalized index
– Updated dynamically
• What you’ve cited before
• What you’ve cited now
• What other people have cited
– Venue, co-citation, etc.
• Run a big index for everything else
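The small, dynamically updated index can be pictured with a minimal sketch. Everything here is an assumption for illustration: the class name `PersonalIndex`, the use of titles as the indexed text, and term-overlap scoring are hypothetical and are not taken from the CiteSight implementation.

```python
from collections import defaultdict


class PersonalIndex:
    """Tiny in-memory inverted index standing in for the personalized cache."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of paper ids
        self.titles = {}                   # paper id -> title

    def add(self, paper_id, title):
        """Index a paper as soon as it becomes relevant (e.g. it was just cited)."""
        self.titles[paper_id] = title
        for term in title.lower().split():
            self.postings[term].add(paper_id)

    def search(self, query):
        """Return paper ids ordered by the number of distinct query terms matched."""
        scores = defaultdict(int)
        for term in set(query.lower().split()):
            for paper_id in self.postings.get(term, ()):
                scores[paper_id] += 1
        return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical papers; ids and titles are made up for the example.
index = PersonalIndex()
index.add("p1", "Differences between blocks of text")
index.add("p2", "Citation recommendation with context")
index.add("p3", "Text differencing algorithms")
results = index.search("calculating the differences between blocks of text")
print(results)
```

Because the index only holds papers tied to the current author and document, it stays small enough to query on every keystroke, while everything else is left to the big index.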
Ranking
• Query: Citation context
– “… calculating the differences between blocks of text [”
• Dynamic recommendations
– Immediately: Search the cache
– In the background: Search the full index
• Rank retrieved papers:
– Gradient boosted regression tree
– Features: network + text
• Popularity, author similarity, textual similarity,…
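The "immediately / in the background" pattern above might look roughly like the following sketch. The two search functions are hypothetical stand-ins with hard-coded results, the delay is simulated, and the ranking step (the gradient boosted regression tree) is left out entirely; only the ordering of the two lookups is illustrated.

```python
import asyncio


async def search_cache(context):
    """Hypothetical fast lookup in the small personalized index."""
    return ["cached-paper-1", "cached-paper-2"]


async def search_full_index(context):
    """Hypothetical slow lookup in the full index (latency is simulated)."""
    await asyncio.sleep(0.05)
    return ["cached-paper-1", "corpus-paper-9"]


def merge(shown, late):
    """Append late results that are not already displayed, keeping order."""
    return shown + [p for p in late if p not in shown]


async def recommend(context, on_update):
    # Start the slow query first so it runs while the cache is consulted.
    background = asyncio.create_task(search_full_index(context))
    shown = await search_cache(context)
    on_update(list(shown))                     # immediate results from the cache
    shown = merge(shown, await background)
    on_update(list(shown))                     # refreshed when the full index answers
    return shown


updates = []
final = asyncio.run(
    recommend("... calculating the differences between blocks of text [", updates.append)
)
print(updates)
```

The caller sees two updates: the cache results first, then the merged list once the full index responds. A real system would re-rank the merged list rather than simply append.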
Citation Context
• Citation context is really good at picking out “winners”
• People talk about a paper the same way as you!
• Not the same way the author talks about their work
[Figure: paper text contrasted with citations, e.g. “XYZ is similar to ABC […]”, “Bob et al. introduced ABC in […]”, “We utilize ABC to…[…]”]
That’s nice…
(S. Redner, 1998)
Context Coupling
• A and B are related
– Co-cited: when B is mentioned, A is too
• “Borrow” contexts from A for B
• Borrowed context used as a feature in ranking papers
A: popular paper
B: less-popular paper
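One way to picture context coupling is the sketch below. It assumes co-citation is measured by counting how often two papers share a reference list, and that contexts are borrowed once a pair passes a threshold; the function names, the `min_cocitations` parameter, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of papers appears in the same reference list."""
    counts = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


def borrowed_contexts(paper, contexts, counts, min_cocitations=2):
    """Collect contexts of papers co-cited with `paper` at least `min_cocitations` times."""
    borrowed = []
    for (a, b), n in counts.items():
        if n >= min_cocitations and paper in (a, b):
            other = b if paper == a else a
            borrowed.extend(contexts.get(other, []))
    return borrowed


# A is popular and has citation contexts; B is less popular and has none yet.
reference_lists = [["A", "B", "C"], ["A", "B"], ["A", "D"]]
contexts = {"A": ["We utilize ABC to ...", "Bob et al. introduced ABC in ..."], "B": []}
counts = cocitation_counts(reference_lists)
borrowed_for_b = borrowed_contexts("B", contexts, counts)
print(borrowed_for_b)
```

Here B inherits both of A's contexts because the pair is co-cited twice, while D, co-cited with A only once, borrows nothing. The borrowed text would then feed a textual-similarity feature in the ranker.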
CiteSight Evaluation
• Can CiteSight predict existing citations?
– 1000 randomly selected CS papers (2011)
• Criteria: 20-40 citations
– 5-fold cross validation
– Metric: NDCG
• Gain of 1 when the guess is the correct citation
• Gain related to # of co-citations for close guesses
• User feedback from 5 CS grad students
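For reference, NDCG@k with graded gains can be computed as in the sketch below. It uses the common log2 position discount; the 0.5 gain for a "close" guess is a made-up value, since the slides only say that this gain is related to the number of co-citations.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked, gain, k=10):
    """NDCG@k: DCG of the top-k ranking divided by the DCG of the ideal ordering."""
    actual = [gain.get(p, 0.0) for p in ranked[:k]]
    ideal = sorted(gain.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(actual) / best if best > 0 else 0.0


# Hypothetical gains: 1.0 for the true citation, partial credit for a co-cited paper.
gain = {"true": 1.0, "close": 0.5}
perfect = ndcg_at_k(["true", "close", "other"], gain)
worse = ndcg_at_k(["other", "true", "close"], gain)
print(perfect, worse)
```

A ranking that places the true citation first and the close guess second scores the maximum; pushing both down the list lowers the score without zeroing it.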
Results
• Large improvement
– Context coupling
– All features
– Citation-related features > text
• More info = better
– Authors
– Citations, to a point
Features                 NDCG@10
Text only                40.8%
Context coupling         46.5%
All features             61.9%
+ keywords               46.5%
+ title                  46.6%
+ authors similarity     47.5%
+ abstract               47.8%
+ citation count         48.6%
+ venue relevancy        49.2%
+ citations              53.0%
+ co-citations           56.7%
+ authors history        57.6%
Cache v. Corpus
• Relevance
– Cache accounts for 46% of the NDCG@10 of the corpus
– 10% cache is better
• Speed
– Cache: 6 ms
• Instantaneous!
– Corpus: 450 ms
Summary
• Differential need for speed
• CiteSight – differential search
– Two different use cases = two indices
1. Local index updated dynamically, contextually
2. Global index with full content
– Context coupling improves relevance
– Local index improves speed
• Able to provide instantaneous results
• Often relevant because contextually updated
Questions?