THE COMPLEX TASK OF MAKING SEARCH SIMPLE
Jaime Teevan (@jteevan)
Microsoft Research
UMAP 2015
THE WORLD WIDE WEB
20 YEARS AGO
Content
2,700 websites (14% .com)
Tools
Mosaic only 1 year old
Pre-Netscape, IE, Chrome
4 years pre-Google
Search Engines
54,000 pages indexed by Lycos
1,500 queries per day
THE WORLD WIDE WEB TODAY
Trillions of pages indexed.
Billions of queries per day.
We assume information is static.
But web content changes!
SEARCH RESULTS CHANGE
New, relevant content
Improved ranking
Personalization
General instability
Can change during a query!
BIGGEST CHANGE ON THE WEB
Behavioral data.
BEHAVIORAL DATA
MANY YEARS AGO
Marginalia adds value to books
Students prefer annotated texts
Do we lose marginalia when we move to digital documents?
No! Scale makes it possible to look at experiences in the aggregate, and to tailor and personalize.
“It is impossible to separate a cube into two cubes, or a fourth power into two fourth powers, or in general, any power higher than the second, into two like powers. I have discovered a truly marvellous proof of this, which this margin is too narrow to contain.”
PAST SURPRISES ABOUT WEB SEARCH
Early log analysis
 Excite logs from 1997, 1999
 Silverstein et al. 1999; Jansen et al. 2000; Broder 2002
Queries are not 7 or 8 words long
Advanced operators not used or “misused”
Nobody used relevance feedback
Lots of people search for sex
Navigational behavior common
Prior experience was with library search
SEARCH IS A COMPLEX, MULTI-STEP PROCESS
Typical query involves more than one click
 59% of people return to search page after their first click
 Clicked results often not the endpoint
 People orienteer from results using context as a guide
 Not all information needs can be expressed with current tools
 Recognition is easier than recall
Typical search session involves more than one query
 40% of sessions contain multiple queries
 Half of all search time spent in sessions of 30+ minutes
Search tasks often involve more than one session
 25% of queries are from multi-session tasks
IDENTIFYING VARIATION ACROSS INDIVIDUALS
[Figure: Normalized DCG (0.75–1.0) vs. number of people in group (1–6), comparing the best single group ranking against each individual’s best ranking.]
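One way to quantify the gap the figure shows: compare the normalized DCG each person gets from the best single group ranking against the 1.0 they would get from their own ideal ranking. A minimal sketch, assuming explicit per-person relevance ratings; the data and function names are illustrative, not the study’s actual setup:

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list of gain values."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_for_person(ranking, ratings):
    """Normalized DCG of `ranking` against one person's relevance ratings."""
    gains = [ratings.get(url, 0) for url in ranking]
    ideal = dcg(sorted(ratings.values(), reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def best_group_ndcg(group_ratings):
    """Average nDCG of the ranking that maximizes the group's total DCG.

    Sorting by summed gain maximizes total (unnormalized) DCG because the
    rank discount is monotonically decreasing.
    """
    urls = set().union(*(r.keys() for r in group_ratings))
    ranking = sorted(urls, key=lambda u: -sum(r.get(u, 0) for r in group_ratings))
    return sum(ndcg_for_person(ranking, r) for r in group_ratings) / len(group_ratings)

# Hypothetical ratings (0-2) from three people for the same query.
people = [
    {"a.com": 2, "b.com": 0, "c.com": 1},
    {"a.com": 0, "b.com": 2, "c.com": 1},
    {"a.com": 2, "b.com": 1, "c.com": 0},
]
print(best_group_ndcg(people[:1]))  # one person: 1.0 by construction
print(best_group_ndcg(people))      # whole group: lower -> room to personalize
```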
WHICH QUERY HAS LESS VARIATION?
campbells soup recipes v. vegetable soup recipe
tiffany’s v. tiffany
nytimes v. connecticut newspapers
www.usajobs.gov v. federal government jobs
singaporepools.com v. singapore pools
NAVIGATIONAL QUERIES WITH LOW VARIATION
Use everyone’s clicks to identify queries with low click entropy (see the sketch below)
 12% of the query volume
 Only works for popular queries
Clicks predicted only 72% of the time
 Double the accuracy for the average query
 But what is going on the other 28% of the time?
Many typical navigational queries are not identified
 People visit interior pages
 craigslist – 3% visit http://geo.craigslist.org/iso/us/ca
 People visit related pages
 weather.com – 17% visit http://weather.yahoo.com
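Click entropy here is the entropy of the distribution of results clicked for a query (Teevan, Dumais & Liebling, SIGIR 2008): H(q) = -Σ_u p(u|q) log2 p(u|q). A minimal sketch over a toy log; the log format and the example data are illustrative:

```python
import math
from collections import Counter, defaultdict

def click_entropy(urls):
    """Entropy (bits) of the click distribution for one query."""
    counts = Counter(urls)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy log of (query, clicked URL) pairs.
log = [
    ("singapore pools", "singaporepools.com"),
    ("singapore pools", "singaporepools.com"),
    ("singapore pools", "singaporepools.com"),
    ("tiffany", "tiffany.com"),
    ("tiffany", "en.wikipedia.org/wiki/Tiffany"),
    ("tiffany", "tiffany.com"),
]
by_query = defaultdict(list)
for query, url in log:
    by_query[query].append(url)

for query, urls in by_query.items():
    # Low entropy (everyone clicks the same result) suggests navigational.
    print(f"{query}: {click_entropy(urls):.2f} bits")
```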
INDIVIDUALS FOLLOW PATTERNS
Getting ready in the morning.
Getting to a webpage.
FINDING OFTEN INVOLVES REFINDING
Repeat query (33%)
 user modeling, adaptation, and personalization
Repeat click (39%)
 http://umap2015.com/
 Query: umap
Lots of repeats (43%)
                     Repeat Query (33%)   New Query (67%)
Repeat Click (39%)          29%                 10%
New Click (61%)              4%                 57%
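A sketch of how a breakdown like the table above can be computed, assuming a time-ordered stream of (user, query, clicked URL) events; the events here are hypothetical:

```python
from collections import Counter

def repeat_breakdown(events):
    """Classify each event as repeat/new query crossed with repeat/new click.

    `events` must be oldest-first; a user's first occurrence of a query
    or URL counts as new.
    """
    seen_queries, seen_clicks = set(), set()
    counts = Counter()
    for user, query, url in events:
        q = "repeat_query" if (user, query) in seen_queries else "new_query"
        c = "repeat_click" if (user, url) in seen_clicks else "new_click"
        counts[(q, c)] += 1
        seen_queries.add((user, query))
        seen_clicks.add((user, url))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

events = [
    ("u1", "umap", "http://umap2015.com/"),          # new query, new click
    ("u1", "umap", "http://umap2015.com/"),          # repeat query, repeat click
    ("u1", "umap 2015", "http://umap2015.com/"),     # new query, repeat click
    ("u1", "weather", "http://weather.com/"),        # new query, new click
]
print(repeat_breakdown(events))
```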
IDENTIFYING PERSONAL NAVIGATION
Use an individual’s clicks to identify repeat (query, click) pairs
 15% of the query volume
 Most occur fewer than 25 times in the logs
Queries more ambiguous
 Rarely contain a URL fragment
 Click entropy the same as for general Web queries
 Multiple meanings – enquirer: National Enquirer (95%) vs. Cincinnati Enquirer
 Found navigation – bed bugs: http://www.medicinenet.com/bed_bugs/article.htm
 Serendipitous encounters – etsy: Etsy.com [informational] vs. Regretsy.com (parody)
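A simplified sketch of flagging such pairs, loosely following Teevan, Liebling & Geetha (WSDM 2011): treat a (user, query) pair as personal navigation when the user’s recent issuings of the query all ended in a click on the same URL. The threshold and data are illustrative:

```python
from collections import defaultdict

def personal_navigation(events, min_repeats=2):
    """Map (user, query) -> URL for pairs that look like personal navigation."""
    history = defaultdict(list)   # (user, query) -> clicked URLs, oldest-first
    for user, query, url in events:
        history[(user, query)].append(url)
    flagged = {}
    for key, urls in history.items():
        recent = urls[-min_repeats:]
        # Flag only if the last `min_repeats` clicks all hit one URL.
        if len(recent) == min_repeats and len(set(recent)) == 1:
            flagged[key] = recent[0]
    return flagged

events = [
    ("u1", "enquirer", "http://www.cincinnati.com/"),
    ("u1", "enquirer", "http://www.cincinnati.com/"),
    ("u2", "enquirer", "http://www.nationalenquirer.com/"),
]
# Only u1 qualifies; u2 has issued the query once.
print(personal_navigation(events))
```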
SUPPORTING PERSONAL NAVIGATION
Tom Bosley - Wikipedia, the free encyclopedia
[Two versions of the snippet, overlaid on the slide:]
Thomas Edward “Tom” Bosley (October 1, 1927 – October 19, 2010) was an American actor, best known for portraying Howard Cunningham on the long-running ABC sitcom Happy Days.
Bosley died at 4:00 a.m. of heart failure on October 19, 2010, at a hospital near his home in Palm Springs, California. … His agent, Sheryl Abrams, said Bosley had been battling lung cancer.
Bosley was born in Chicago, the son of Dora and Benjamin Bosley.
en.wikipedia.org/wiki/tom_bosley
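The slide overlays an old and an updated version of the same result snippet. The Re:Search Engine (Teevan, UIST 2007) supports refinding by blending remembered results into the fresh list so that what the user expects stays put. A toy sketch of that blending idea with hypothetical data; the real system is considerably more sophisticated:

```python
def merge_results(fresh, remembered):
    """Keep remembered results at their old ranks; fill gaps with fresh ones.

    `remembered` maps rank -> URL for results the user previously used.
    """
    size = max(len(fresh), max(remembered, default=-1) + 1)
    merged = [None] * size
    for rank, url in remembered.items():
        merged[rank] = url
    filler = (u for u in fresh if u not in remembered.values())
    for i, slot in enumerate(merged):
        if slot is None:
            merged[i] = next(filler, None)
    return [u for u in merged if u is not None]

remembered = {0: "en.wikipedia.org/wiki/tom_bosley"}   # clicked last time
fresh = ["imdb.example/tom-bosley",                    # placeholder URLs
         "en.wikipedia.org/wiki/tom_bosley",
         "biography.example/tom-bosley"]
# The remembered result stays first even though the fresh ranking moved it.
print(merge_results(fresh, remembered))
```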
PATTERNS: A DOUBLE-EDGED SWORD
Patterns are predictable.
Changing a pattern is confusing.
CHANGE INTERRUPTS PATTERNS
Example: Dynamic menus
 Put commonly used items at top
 Slows menu item access
Does search result change
interfere with refinding?
CHANGE INTERRUPTS REFINDING
When search result ordering changes people are
 Less likely to click on a repeat result
 Slower to click on a repeat result when they do
 More likely to abandon their search
Even happens when the
repeat result moves up!
How to reconcile the benefits
of change with the interruption?
Happens within a query and across sessions
[Figure: time to click the repeat result in the second search (S2) vs. the first search (S1), in seconds, for repeat results that moved Down, were Gone, Stayed, or moved Up.]
USE MAGIC TO MINIMIZE INTERRUPTION
ABRACADABRA
Magic happens.
YOUR CARD IS GONE!
CONSISTENCY ONLY MATTERS SOMETIMES
BIAS PERSONALIZATION BY EXPERIENCE
CREATE CHANGE-BLIND WEB EXPERIENCES
THE COMPLEX TASK OF MAKING SEARCH SIMPLE
Challenge: The web is complex
 Tools change, content changes
 Different people use the web differently
Fortunately, individuals are simple
 We are predictable, follow patterns
 Predictability enables personalization
Beware of breaking expectations!
 Bias personalization by expectations
 Create magic personal experiences
REFERENCES
Broder. A taxonomy of web search. SIGIR Forum, 2002
Donato, Bonchi, Chi & Maarek. Do you want to take notes? Identifying
research missions in Yahoo! Search Pad. WWW 2010.
Dumais. Task-based search: A search engine perspective. NSF Task-Based
Information Search Systems Workshop, 2013.
Jansen, Spink & Saracevic. Real life, real users, and real needs: A study
and analysis of user queries on the web. IP&M, 2000.
Kim, Cramer, Teevan & Lagun. Understanding how people interact with web
search results that change in real-time using implicit feedback. CIKM 2013.
Lee, Teevan & de la Chica. Characterizing multi-click search behavior and
the risks and opportunities of changing results during use. SIGIR 2014.
Mitchell & Shneiderman. Dynamic versus static menus: An exploratory
comparison. SIGCHI Bulletin, 1989.
Selberg & Etzioni. On the instability of web search engines. RIAO 2000.
Silverstein, Marais, Henzinger & Moricz. Analysis of a very large web
search engine query log. SIGIR Forum, 1999.
Somberg. A comparison of rule-based and positionally constant
arrangements of computer menu items. CHI 1986.
Svore, Teevan, Dumais & Kulkarni. Creating temporally dynamic web
search snippets. SIGIR 2012.
Teevan. The Re:Search Engine: Simultaneous support for finding and refinding. UIST 2007.
Teevan. How people recall, recognize and reuse search results. TOIS, 2008.
Teevan, Alvarado, Ackerman & Karger. The perfect search engine is not
enough: A study of orienteering behavior in directed search. CHI 2004.
Teevan, Collins-Thompson, White & Dumais. Viewpoint: Slow search. CACM,
2014.
Teevan, Collins-Thompson, White, Dumais & Kim. Slow search: Information
retrieval without time constraints. HCIR 2013.
Teevan, Cutrell, Fisher, Drucker, Ramos, Andrés & Hu. Visual snippets:
Summarizing web pages for search and revisitation. CHI 2009.
Teevan, Dumais & Horvitz. Potential for personalization. TOCHI, 2010.
Teevan, Dumais & Liebling. To personalize or not to personalize: Modeling
queries with variation in user intent. SIGIR 2008.
Teevan, Liebling & Geetha. Understanding and predicting personal
navigation. WSDM 2011.
Tyler & Teevan. Large scale query log analysis of re-finding. WSDM 2010.
More at: http://research.microsoft.com/~teevan/publications/
THANK YOU!
Jaime Teevan (@jteevan)
teevan@microsoft.com
EXTRA SLIDES
How search engines can make
use of change to improve search.
CHANGE CAN IDENTIFY IMPORTANT TERMS
Divergence from norm
 cookbooks
 frightfully
 merrymaking
 ingredient
 latkes
Staying power in page
[Figure: term staying power over time, September through December.]
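One hedged reading of “divergence from norm”: score each term on the page by how far its on-page frequency diverges from a background corpus, e.g. its pointwise KL contribution. This is a simplified, non-temporal stand-in for the signals in Svore et al. (SIGIR 2012); the background probabilities below are invented:

```python
import math
from collections import Counter

def divergence_scores(page_terms, background_prob, floor=1e-6):
    """Score terms by p(t|page) * log(p(t|page) / p(t|background)).

    Terms far more frequent on the page than in the background score high.
    """
    counts = Counter(page_terms)
    total = sum(counts.values())
    scores = {}
    for term, c in counts.items():
        p_page = c / total
        p_bg = background_prob.get(term, floor)   # floor for unseen terms
        scores[term] = p_page * math.log(p_page / p_bg)
    return sorted(scores.items(), key=lambda kv: -kv[1])

background = {"the": 0.05, "and": 0.03, "recipe": 0.0005}   # made-up corpus stats
page = "the latkes recipe and the latkes ingredient list".split()
for term, score in divergence_scores(page, background)[:3]:
    print(term, round(score, 2))   # 'latkes' dominates; 'the' does not
```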
CHANGE CAN IDENTIFY IMPORTANT SEGMENTS
Page elements change at different rates
Pages are revisited at different rates
Resonance can serve as a filter for important content
EXTRA SLIDES
Impact of change on
refinding behavior.
BUT CHANGE HELPS WITH FINDING!
Change to click
 Unsatisfied initially
 Gone > Down > Stay > Up
 Satisfied initially
 Stay > Down > Up > Gone
        NSAT   SAT
Up      2.00   4.65
Stay    2.08   4.78
Down    2.20   4.75
Gone    2.31   4.61
Changes around click
 Always benefit NSAT users
 Best below the click for satisfied users
        Changes        Static
        NSAT   SAT     NSAT   SAT
Above   2.30   4.93    2.21   4.93
Below   2.09   4.79    1.99   4.61
EXTRA SLIDES
Privacy issues and
behavioral logs.
PUBLIC SOURCES OF BEHAVIORAL LOGS
Public Web service content
 Twitter, Facebook, Digg, Wikipedia
Research efforts to create logs
 Lemur Community Query Log Project
 http://lemurstudy.cs.umass.edu/
 1 year of data collection = 6 seconds of Google logs
Publicly released private logs
 DonorsChoose.org
 http://developer.donorschoose.org/the-data
 Enron corpus, AOL search logs, Netflix ratings
EXAMPLE: AOL SEARCH DATASET
August 4, 2006: Logs released to academic community
 3 months, 650 thousand users, 20 million queries
 Logs contain anonymized User IDs
August 7, 2006: AOL pulled the files, but already mirrored
AnonID   Query                          QueryTime            ItemRank  ClickURL
1234567  jitp                           2006-04-04 18:18:18  1         http://www.jitp.net/
1234567  jipt submission process        2006-04-04 18:18:18  3         http://www.jitp.net/m_mscript.php?p=2
1234567  computational social scinece   2006-04-24 09:19:32
1234567  computational social science   2006-04-24 09:20:04  2         http://socialcomplexity.gmu.edu/phd.php
1234567  seattle restaurants            2006-04-24 09:25:50  2         http://seattletimes.nwsource.com/rests
1234567  perlman montreal               2006-04-24 10:15:14  4         http://oldwww.acm.org/perlman/guide.html
1234567  jitp 2006 notification         2006-05-20 13:13:13
…
“A Face Is Exposed for AOL Searcher No. 4417749”
 Queries for businesses, services in Lilburn, GA (pop. 11k)
 Queries for Jarrett Arnold (and others of the Arnold clan)
 NYT contacted 14 people in Lilburn with Arnold surname
 When contacted, Thelma Arnold acknowledged her queries
August 9, 2006: New York Times identified Thelma Arnold
August 21, 2006: 2 AOL employees fired, CTO resigned
September, 2006: Class action lawsuit filed against AOL
EXAMPLE: AOL SEARCH DATASET
Other well-known AOL users
 User 927: how to kill your wife
 User 711391: i love alaska
 http://www.minimovies.org/documentaires/view/ilovealaska
Anonymous IDs do not make logs anonymous
 Contain directly identifiable information
 Names, phone numbers, credit cards, social security numbers
 Contain indirectly identifiable information
 Example: Thelma’s queries
 Birthdate, gender, zip code identifies 87% of Americans
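The 87% figure is Sweeney’s classic quasi-identifier result. Checking an “anonymized” dataset for this kind of leak amounts to counting how many records share each quasi-identifier combination; a minimal sketch with illustrative field names:

```python
from collections import Counter

def anonymity_set_sizes(records, quasi_identifiers):
    """Count records sharing each quasi-identifier combination.

    A combination that occurs once pinpoints a person even with the
    name removed (k-anonymity with k = 1).
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return Counter(keys)

# Toy "anonymized" records.
records = [
    {"birthdate": "1927-10-01", "gender": "M", "zip": "60601"},
    {"birthdate": "1962-07-03", "gender": "F", "zip": "30047"},
    {"birthdate": "1962-07-03", "gender": "F", "zip": "30047"},
]
sizes = anonymity_set_sizes(records, ["birthdate", "gender", "zip"])
unique = sum(1 for n in sizes.values() if n == 1)
print(f"{unique} of {len(records)} records are unique on these fields")
```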
EXAMPLE: NETFLIX CHALLENGE
October 2, 2006: Netflix announces contest
 Predict people’s ratings for a $1 million prize
 100 million ratings, 480k users, 17k movies
 Very careful with anonymity post-AOL
“All customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy. … Even if, for example, you knew all your own ratings and their dates you probably couldn’t identify them reliably in the data because only a small sample was included (less than one tenth of our complete dataset) and that data was subject to perturbation.”
Ratings
1: [Movie 1 of 17770]
12, 3, 2006-04-18 [CustomerID, Rating, Date]
1234, 5, 2003-07-08
2468, 1, 2005-11-12
…
Movie Titles
10120, 1982, “Bladerunner”
17690, 2007, “The Queen”
…
May 18, 2008: Data de-anonymized
 Paper published by Narayanan & Shmatikov
 Uses background knowledge from IMDB
 Robust to perturbations in data
December 17, 2009: Doe v. Netflix
March 12, 2010: Netflix cancels second competition
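The de-anonymization works by scoring every released record against auxiliary knowledge, such as a handful of a person’s public IMDb ratings, while tolerating noise in dates and ratings. A toy sketch of the scoring idea; Narayanan & Shmatikov additionally weight rare movies more heavily, which this sketch omits:

```python
from datetime import date

def match_score(aux, candidate, date_tolerance_days=14):
    """Count auxiliary (movie -> rating, date) facts the candidate matches."""
    score = 0
    for movie, (rating, when) in aux.items():
        if movie not in candidate:
            continue
        c_rating, c_when = candidate[movie]
        # A match tolerates fuzzy dates and off-by-one ratings.
        if abs((when - c_when).days) <= date_tolerance_days and abs(rating - c_rating) <= 1:
            score += 1
    return score

# Auxiliary knowledge, e.g. scraped from public reviews (hypothetical).
aux = {10120: (4, date(2006, 4, 20)), 17690: (5, date(2006, 5, 2))}
released = {
    "record_a": {10120: (4, date(2006, 4, 18)), 17690: (5, date(2006, 5, 1))},
    "record_b": {10120: (1, date(2003, 7, 8))},
}
best = max(released, key=lambda rec: match_score(aux, released[rec]))
print(best, match_score(aux, released[best]))   # record_a matches both facts
```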