Search with a Little Help from Your Friends: Making Web Search More Collaborative
Barry Smyth, CLARITY: Centre for Sensor Web Technologies, University College Dublin
Tuesday 3 May 2011

The Web's Killer App?
[Slides: search-engine home pages through the years -- lycos.com (circa 1996), excite.com (circa 1996), webcrawler.com (circa 1997), "the verb" (Beta, circa 1998), google.com (circa 1999), ask.com (circa 2000), metacrawler.com (2000), directhit.com (2000).]
[Chart: index size, 10^x, for terms and terms+links, 1994-2010.]
Approx. 1 billion queries per day by 2010!

Web Search 101
Query (Q) -> Search Engine (S) -> Index (I) -> Result List (R)

Overview
- The State of Web Search: key challenges
- Potential Solutions: context in Web search
- Towards Social Web Search: HeyStaks

Challenges
- Vague queries
- The vocabulary problem
- One-size-fits-all
- Content farming & SEO

50% Search Failure Rate
The average query (2-3 terms) is too short to guarantee effective search-engine retrieval; 25% of searches end with a click on the back button!
[Chart: successful-search rates.]
* iProspect (2006), Jansen & Spink (2007), Morris et al (2008), Coyle et al (2007, 2008)

The Vocabulary Gap

One Size Does Not Fit All

Content Farms, SEO, Gaming
Focused SEO to promote commissioned content.
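The Web Search 101 loop sketched earlier -- a query Q submitted to search engine S, matched against index I, yielding result list R -- can be illustrated with a toy inverted index. This is an assumption-laden sketch for illustration only, not any production engine; the names (`InvertedIndex`, `add`, `search`) are invented for the example.

```python
from collections import defaultdict

# Toy inverted index: the "Web Search 101" loop of query -> index -> ranked results.
# Real engines add link analysis, spam filtering, personalization, etc.
class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of doc ids
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # Score each document by the number of query terms it matches
        # (a deliberately crude ranking function).
        scores = defaultdict(int)
        for t in query.lower().split():
            for d in self.postings.get(t, ()):
                scores[d] += 1
        return [d for d, _ in sorted(scores.items(), key=lambda x: -x[1])]

idx = InvertedIndex()
idx.add("d1", "collaborative web search with friends")
idx.add("d2", "sensor web technologies")
print(idx.search("collaborative search"))  # only d1 matches both terms
```

The challenges that follow (vague queries, the vocabulary gap, one-size-fits-all ranking) are exactly the places where this purely term-matching view of search breaks down.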
Content farming volume: ~1m commissioned items per month.

The DemandMedia Model
[Diagram: search terms and ad rates mined from SE/ISP logs and ad data, matched against similar past ads, yield content titles with projected ad revenue ($x PPC, life-time value (LTV); e.g. "how to paint an easter egg", "easter egg"), which are farmed out as freelance assignments at $15/article, $20/video.]

Black Hat SEO: JC Penney Content Farming (New York Times, Feb 2011)
- Large-scale link farming: 2000+ sites linking to JCP for terms like "black dresses", "casual dresses", etc.
- Significant benefits to JCP: millions of inbound visitors during the holiday season!

The Google Response
- Major algorithmic change: reducing the ranking of low-quality sites, impacting 12% of queries.
- Spam-blocking extension: a Google Chrome extension that lets users block low-quality sites from result lists.
- Major impact on many sites, including some surprises (SlideShare, Technorati).

Challenges (recap)
- Vague queries
- The vocabulary problem
- One-size-fits-all
- Content farming & SEO

Web Search is changing...
Improving search by better understanding user needs and search context...

Context in Search
- User context: preferences, usage history, profiles
- Document context: meta-data, content features
- Task context: current activity, location, etc.
- Social context: leveraging the social graph

User Context: Personalizing Web Search
Adapting search responses to the needs/interests of the searcher.
- Modeling user interests: ODP categories, query histories, result selections, etc.
- Adaptation strategies: re-ranking, query modification, etc.
See also Chirita et al (SIGIR 2005), Shen et al (CIKM 2005), Teevan et al (TOCHI 2010).

PSearch (Teevan et al, SIGIR 2005)
- Client-side profiling: explicit, content, behaviour
- Personalized ranking

Potential for Personalization (Teevan et al, SIGIR 2008)
How well can a single result list satisfy a group of users?
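Teevan et al.'s question -- how well can one shared result list serve a whole group -- can be made concrete with a toy potential-for-personalization measure: the gap between the best single group ranking and each user's own ideal ranking. This is a hedged sketch; the gain function (precision at k) and the exhaustive search over rankings are simplifications of the click-based measures used in the actual study.

```python
from itertools import permutations

# Toy "potential for personalization" (PfP): how much better off users would be
# with individually tailored rankings than with the best single shared ranking.
def quality(ranking, relevant, k=2):
    # Precision at k: fraction of the top-k results this user finds relevant.
    return sum(1 for d in ranking[:k] if d in relevant) / k

def potential_for_personalization(docs, users, k=2):
    # Best any ONE shared ranking can do, averaged over the group
    # (exhaustive search over permutations; fine for a toy corpus).
    best_group = max(
        sum(quality(p, rel, k) for rel in users) / len(users)
        for p in permutations(docs)
    )
    # Each user served their own ideal ranking instead.
    best_individual = sum(
        max(quality(p, rel, k) for p in permutations(docs))
        for rel in users
    ) / len(users)
    return best_individual - best_group

docs = ["d1", "d2", "d3", "d4"]
users = [{"d1", "d2"}, {"d3", "d4"}]   # two users with disjoint needs
print(potential_for_personalization(docs, users))  # 0.5: large PfP
```

When users' needs agree, PfP collapses toward zero; when they conflict (as above), no single list can satisfy everyone, which is the curve's high end.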
Not every query benefits from personalization. Potential-for-personalization (PfP) curves predict the PfP of search queries using click-based measures [from Teevan et al (2008)].

Google AdSense: Document Context

Task Context
- Activity/task context: e.g. writing a talk, planning a trip, shopping, etc.
- Location-based search: Google on the iPhone...
- Embedded Web search: Watson (Budzik & Hammond, IUI 2000), Remembrance Agent (Rhodes, IEEE Trans. Comp. 2003)

IntelliZap: Context -> Query Augmentation (Finkelstein et al, WWW 2001)

[Diagram: four dimensions of search context -- user, document, task, social.]

Web Search as a Solitary Activity

Recovery vs Discovery (Teevan et al, 2007)
               Repeat Click   New Click
Repeat Query       29%            4%
New Query          10%           57%
Recovery: 43%. Discovery: 57%.

"1 in 4 searches are for something the searcher has already found during a previous search session."
"2 in 3 searches are for something that a searcher's friends or colleagues have recently found."
Search should be more personal & collaborative!
* Morris et al (2008), Teevan et al (2007), Smyth et al (2004, 2006, 2008)

Web Search as a Social Activity
[Slide: Web search vs the social Web -- users, pages, and queries per day vs shares per day; relevance from terms, links, and PageRank vs reputation, communities, and social rank (Facebook, MySpace, Twitter, ...).]

Flavours of Social Search...
[Diagram: social search systems -- social news, searching the RTW, semantic search, social indexing/filtering, social bookmarking, personalization, social Q&A, meta search, people search -- arranged along two axes.]
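The IntelliZap idea mentioned earlier -- augmenting a query with terms from its surrounding text -- can be caricatured with a frequency-based sketch. IntelliZap itself uses semantic networks and more careful term weighting, so everything below (the stop-word list, the function name, the scoring) is an assumed stand-in, not the published method.

```python
from collections import Counter

# Hypothetical stop-word list for the sketch.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "said", "will"}

# Context -> query augmentation, IntelliZap-style (toy version):
# expand a short query with the most salient terms from the text around it.
def augment_query(query, context, n_extra=2):
    q_terms = query.lower().split()
    counts = Counter(
        t for t in context.lower().split()
        if t.isalpha() and t not in STOPWORDS and t not in q_terms
    )
    extras = [t for t, _ in counts.most_common(n_extra)]
    return " ".join(q_terms + extras)

context = ("Jaguar unveiled a new car model today; the car maker said "
           "the car will ship next year")
print(augment_query("jaguar", context))  # disambiguates "jaguar" toward cars
```

The point of the sketch is disambiguation: the bare query "jaguar" is vague, but context terms pull the augmented query toward the intended sense.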
Search & the Real-Time Web
[Slide: live Twitter feed alongside search results.]

Exploiting the Social Graph
www.vark.com -- Social Q&A
[Slide: a conversational thread on Aardvark.]

Key Features (Aardvark)
- Social indexing: people vs documents; user-topic / user-user relationships
- Question classification: scored topic list (classification-based approach)
- Query routing: an aspect model routes questions to potential answerers
- Answer ranking: candidate answerers ranked by topic, expertise, availability
Damon Horowitz, Sepandar D. Kamvar: The Anatomy of a Large-Scale Social Search Engine. WWW 2010: 431-440.

Searching Social Content: Harnessing the Social Graph...

Collaboration in Search (Smyth et al 2006; Teevan et al 2007; Morris et al 2008)
- 90% of people have engaged in some form of collaboration during Web search.
- 87% of people have exhibited "back-seat searching".
- 86% of people go on to share results with others.
- 25%-40% of the time we are re-searching for things we have previously found.
- 66% of the time we are looking for something that a friend or colleague has recently found.

Repetition & Regularity in Communities... (Smyth et al, UMUAI 2004)

Collaborative IR (Morris et al, 2008)
              Synch                       Asynch
Remote        SearchTogether (UIST '07)   HeyStaks (UMAP '09)
Co-located    CoSearch (CHI '08)          ?

HeyStaks: A Case-Study in Social Search

Motivating HeyStaks
- Web Search. Shared! Harness the collaborative nature of Web search by providing integrated support for the sharing of search experiences.
- User control: support the searcher with fine-grained control over collaboration features and facilities.
- Integrate with mainstream search engines: users want to search as normal, using their favourite search engines, while benefiting from collaboration at the same time.
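Stepping back to the Aardvark features described above, query routing and answer ranking can be caricatured as scoring candidate answerers by topic affinity weighted by availability. This is a deliberate simplification for illustration; the real system (Horowitz & Kamvar, WWW 2010) uses an aspect model over user-topic and user-user relationships, and all names below are invented.

```python
# Toy Aardvark-style query routing: rank candidate answerers by how well their
# topic profile matches the question, discounted by how available they are.
def route_question(question_topics, users):
    # question_topics: {topic: weight}
    # users: {name: {"topics": {topic: expertise}, "available": 0.0-1.0}}
    scored = []
    for name, u in users.items():
        affinity = sum(w * u["topics"].get(t, 0.0)
                       for t, w in question_topics.items())
        scored.append((affinity * u["available"], name))
    # Drop users with no relevant expertise; best candidates first.
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

users = {
    "alice": {"topics": {"restaurants": 0.9, "travel": 0.3}, "available": 1.0},
    "bob":   {"topics": {"restaurants": 0.8}, "available": 0.2},
    "carol": {"topics": {"python": 0.9}, "available": 1.0},
}
print(route_question({"restaurants": 1.0}, users))  # alice first; carol filtered out
```

Even this toy shows the people-vs-documents shift: the "result list" is a ranking of answerers, not pages.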
HeyStaks: A Search Utility
- Create staks: users can easily create search staks (public/private) as a way to capture search activities.
- Share knowledge: share staks with friends and others to grow community/task-based search expertise.
- Search & promote: as users search within a stak (or staks), relevant results are promoted and enhanced.

[Diagram: a user's staks (Stak 1 ... Stak 7) arranged along two dimensions -- task (single-task vs multi-task) and social (individual vs group): multi-task groups, multi-task individual, single-task groups, single-task individual.]

HeyStaks Architecture
[Slides: architecture walkthrough, built up over several steps.]

Search Staks
[Diagram: a search stak indexes page URLs against terms (query, tag, snippet) and +/- votes for pages P1 ... Pn.]

score(qT, Pi, Sj) = Rel(qT, Pi, Sj) * TFIDF(qT, Pi, Sj)

The Social Life of Search

Initial Evaluation: HeyStaks Beta Trial
Focus on 95 early, active HeyStaks beta users who registered with HeyStaks during October 2008 - January 2009.
- Stak creation/sharing: do users take the time to create and share search staks (and search experiences)?
- Collaboration effects: do searchers benefit from the effects of search collaboration in general, and stak promotions in particular?

Stak Creation & Sharing
[Charts: (a) average stak creations/sharings per user; (b) sociable vs solitary users.]

Producers & Consumers
The basic unit of collaboration: searcher C selects a promotion previously selected by producer P.
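The stak scoring formula shown earlier, score(qT, Pi, Sj) = Rel(qT, Pi, Sj) * TFIDF(qT, Pi, Sj), can be illustrated with a toy implementation. The concrete definitions of Rel (query-term overlap with the terms under which a page was previously selected) and TFIDF below are assumptions made for the example, not HeyStaks internals.

```python
import math

# Illustrative sketch of the promotion score
#   score(qT, Pi, Sj) = Rel(qT, Pi, Sj) * TFIDF(qT, Pi, Sj)
# over a stak Sj that maps each page Pi to the terms (queries, tags,
# snippets) under which it was selected. Definitions are assumptions.
def rel(query_terms, page_terms):
    # Fraction of query terms already associated with this page in the stak.
    return len(set(query_terms) & set(page_terms)) / len(set(query_terms))

def tfidf(query_terms, page_terms, stak):
    score, n = 0.0, len(stak)          # n = number of pages in the stak
    for t in set(query_terms):
        tf = page_terms.count(t) / len(page_terms)
        df = sum(1 for terms in stak.values() if t in terms)
        if df:
            score += tf * math.log(1 + n / df)
    return score

def promotion_score(query, page_id, stak):
    q = query.lower().split()
    return rel(q, stak[page_id]) * tfidf(q, stak[page_id], stak)

stak = {  # page -> terms under which it was previously selected
    "p1": ["python", "decorators", "tutorial"],
    "p2": ["python", "install", "windows"],
}
print(promotion_score("python decorators", "p1", stak))  # p1 outranks p2
```

Pages repeatedly selected under the current query's terms score highest, which is what drives a result's promotion within the stak.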
Search Collaboration
[Charts: the number of collaborators per user, and the split of producers vs consumers.]

Search Leaders & Followers
[Chart: leader vs follower activity per user.]

Promotion Sources
[Chart: sources of selected promotions (self vs peers).]

- Users create & share staks.
- Collaboration is commonplace.
- Users benefit from their peers.

Reputation
All searchers are not created equal! Staks are likely composed of a mixture of novice and expert searchers.
- Can we identify the best searchers, overall or at stak level?
- Can we use this reputation information to further influence recommendation?

Modeling Reputation
C confers reputation on P: searcher C selects a promotion previously selected by P.

User Reputation (McNally, O'Mahony, Smyth, IUI 2010)
Reputation model: producer reputation, result reputation, reputation ranking.

Initial Evaluation: HeyStaks Reputation Trial
64 undergraduate students participated in a general-knowledge quiz using HeyStaks to guide their searches.
- Multiple stak sizes: users were segregated into staks of different sizes (1, 5, 9, 19, 25) to analyse the relationship between stak size and performance.
- Ground-truth-based performance analysis: fixed Q&A facilitated a definitive analysis of the relevance of organic and promoted results.

Query Coverage: percentage of queries receiving promotions.
Organic vs Promoted: relative relevance of organic and promoted results.
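The reputation-conferring unit described above -- consumer C selects a promotion produced by P, so C confers reputation on P -- can be sketched with a simple update rule. The weighting scheme below (P gains a fixed fraction of C's current reputation) is an assumption for illustration, not the model published by McNally et al.

```python
# Minimal sketch of reputation propagation in a stak: each time consumer C
# selects a promotion originally produced by P, C confers reputation on P,
# weighted by C's own standing. The update rule is a hypothetical choice.
class ReputationModel:
    def __init__(self, users, base=1.0):
        self.rep = {u: base for u in users}

    def record_selection(self, consumer, producer, weight=0.1):
        # C confers a fraction of its current reputation on P.
        self.rep[producer] += weight * self.rep[consumer]

model = ReputationModel(["alice", "bob", "carol"])
# bob and carol both pick up alice's promotions; alice picks up one of bob's.
model.record_selection("bob", "alice")
model.record_selection("carol", "alice")
model.record_selection("alice", "bob")
ranking = sorted(model.rep, key=model.rep.get, reverse=True)
print(ranking)  # alice leads the reputation ranking
```

Weighting by the consumer's own reputation means endorsements from trusted searchers count for more, which is the intuition behind using reputation to re-rank promotions.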
Reputation Analysis
[Charts: reputation x time; final user reputation; relative benefit.]

www.heystaks.com

Conclusions
The conservative world of Web search is changing!
- Context in Web search: adaptation.
- Collaboration in Web search: harnessing the social graph.
- From relevance to reputation: improved click-through rates.

Lessons Learned
- Mainstream Web search integration: there is little value in developing competing Web search offerings; users want to search as normal using their favourite search engine (Google, Yahoo, Bing, ...).
- Personalization vs user experience: an improved user experience can translate into much greater user take-up than incremental improvements in personalization.