Recent Developments at Yahoo! in Search & Mobile, and Future Challenges
WikiMania’08, July 18, 2008
Alexandria, Egypt
Usama Fayyad
Chief Data Officer & Executive VP
Yahoo! Inc.
Usama_fayyad@yahoo.com
Research

Overview
• About Yahoo! and its business
• Yahoo! Mobile Philosophy
• OneSearch 2.0
• Challenges in Mobile Search
• Some words about search advertising
• Examples of Search Evolution at Yahoo!
• Concrete examples of the changes that are relevant to Social Web
• Concluding thoughts

Globally, Internet Users Number Over 1 Billion
[Chart: Internet Users in Millions, 2001–2007. Series: Worldwide Total, Japan, Rest of World, Western Europe, Asia/Pacific, United States. Labeled values: 506.3, 602.4, 702.4, 787.5, 868.3, 974.3, 1,076.7.]
Source: IDC, December 2003.

Yahoo! is the #1 Destination on the Web
73% of the U.S. Internet population uses Yahoo! – Over 500 million users per month globally!
• Global network of content, commerce, media, search and access products
• 100+ properties including mail, TV, news, shopping, finance, autos, travel, games, movies, health, etc.
• 25 terabytes of data collected each day… and growing
More people visited Yahoo! in the past month than:
• Use coupons
• Vote
• Recycle
• Exercise regularly
• Have children living at home
• Wear sunscreen regularly
• Representing thousands of cataloged consumer behaviors
Data is used to develop content, consumer, category and campaign insights for our key content partners and large advertisers
Sources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.

Yahoo! Data – A league of its own…
GRAND CHALLENGE PROBLEMS OF DATA PROCESSING: TRAVEL, CREDIT CARD PROCESSING, STOCK EXCHANGE, RETAIL, INTERNET
Y! Data Challenge Exceeds others by 2 orders of magnitude
Chart: Terabytes of Warehoused Data; Millions of Events Processed Per Day. Labels and values as extracted: 14,000 5,000 1,000 500 YSM Y! Global Y! Main warehouse NYSE Walmart VISA 100 Y!
Panama Warehouse SABRE 94 Y! LiveStor 225 49 AT&T 120 Amazon 50 25 Korea Telecom 2,000

What About Yahoo! Mobile?
• Fast growing initiative that is one of the company’s priorities for the future
• Great success in distribution
  – Signed deals with 29 carriers, so it is accessible to 600 million subscribers who are now under contract.
  – OneSearch is Yahoo!’s mobile search application, launched 13 months ago. OneSearch 2.0 has just launched.
• Marco Boerries, EVP of Mobile at Yahoo!: “No one has ever amassed that kind of distribution under that short period of time.”

Mobile Device Internet Penetration Will Eclipse the PC
1 Billion people across the world use the Internet*
3.3 Billion people across the world are mobile service subscribers (that’s half the global population)**
* U.N Telecommunications Agency, Sept 07
** Informa, Nov 07

Yahoo!’s Global Mobile Reach
16.9 Million Unique Users Per Month In The U.S. Alone
[Chart: Unique Users Per Month (mm): Yahoo! 16.9, Google 12.1, MSN 8.9, AOL 8.6]

Mobile Search Built for the Consumer
PC Search ≠ Mobile Search

The Mobile Use Case is Different
Give me Answers, Entertainment, Images…

Y! oneSearch Changed the Game
Answers Instead of Web Links. Relevant, Complete Results

Yahoo! Mobile Approach to Search
• OneSearch is a special federated search engine
  – Analyses Concept and Intent of the query against a large collection of “vertical” backends
    • Web, News, Images, Finance, etc…
    • UGC such as Wikipedia and Yahoo! Answers
  – Aggregates results from verticals and blends them to optimize for the user query and for the device used for the query
• Goal is to minimize clicks by taking user to results around tasks
• Query sources:
  – Browsers: WAP/XHTML
  – Java app interface for Yahoo! Go
  – SMS text messaging for Yahoo!
Mobile SMS

Approach
• Be as Open as possible on interfaces
• Fundamentally believe the mobile OS market will remain fragmented from a platforms perspective for quite a while
  – Windows Mobile only reached 30M users after more than 7 years of effort
• Provide an environment to allow users to program to one target platform and let Yahoo! bear the effort of making it run on a wide range of devices
• Focus on the highest value apps for users today involving access to the on-line world (less on client apps)
• Return results and not links

Yahoo! Mobile Products
• Yahoo! Home Page
• Yahoo! Go 3.0
• Yahoo! oneSearch
• Yahoo! onePlace
• Yahoo! oneConnect
* M:Metrics, October 2008
** All Yahoo! Mobile services are free. Check with your wireless carrier about data plan charges that may apply.

OneSearch 2.0
• OneSearch is being opened up to all publishers and content owners so they can write rich metadata that will be returned as part of results, rather than just a link.
  – Similar to Yahoo’s Search Monkey service for the Internet.
    • More about this later…
• Three new major upgrades
  – Search Assist: The search box will predict what you are typing.
  – Voice input: Users can search by speaking into the device instead of typing (provided by Vlingo)
  – The search box will be integrated into the home screen of the phones.

OneSearch 2.0
Better answers
Turning web search results into answers
Unlocking the power of the semantic web
Providing more relevant content

OneSearch 2.0
Easier, faster input
Predictive text completion
Contextual recommendations
(Caption: Easier input)

OneSearch 2.0
Speak your search
Search for anything
Personalized to your voice
(Caption: Voice input)

OneSearch 2.0
Persistent 1-click access
Gateway to the Internet
Supports text & voice
(Caption: Always there)

Internet Use on Mobile vs.
PC

Mobile Use
• Today, we believe Internet use on PC is about 10x that of Mobile
• Mobile is faster growing, in all regions
• There are more than 3x as many mobiles today as Internet users globally
  – But most phones are not data capable yet
• The world today:
  – We are learning from the web, and attempting to figure out what makes sense for mobile users
  – Trying to work with smartphone users as they represent the early adopters

Classical web search user needs
• Informational (~25%) – want to learn about something
  Example: Low hemoglobin
• Navigational (~40%) – want to go to that page
  Example: United Airlines
• Transactional (~35%) – want to do something (web-mediated)
  – Access a service (Mendocino weather)
  – Downloads (Mars surface images)
  – Shop (Nikon CoolPix)
• Gray areas
  – Find a good hub (Car rental Finland)
  – Exploratory search “see what’s there”
Broder 2002, A Taxonomy of web search

What about on Mobile
• No good classification
• Several studies that cover
  – Query frequency distribution
  – Words per query
  – Characters per query
• Categorization by query type into traditional categories:
  – Adult and Entertainment, Autos, Consumer Goods, Finance, Government & politics, Sports, Technology, Travel, etc…
• Best known studies by
  – Kamvar and Baluja (2006 and 2007)
  – Yi, Maghoul, and Pedersen (2008)
• Good quantitative statistics, little on qualitative purpose-driven analysis (early days still)

What do We Believe about Mobile Queries
• We believe it is a different distribution than the query distribution for PC users
  – Bias towards shorter queries
    • Data contradicts that: 2.6 words per query, same # chars as PC
  – Difficulty of query entry is a significant hurdle
  – Much higher location-based activity
  – Much more task-oriented than exploration or research
• Notifications add a whole new “push” dimension
  – Trigger alerts (stocks, news, auctions)
  – Location-based (geo-driven)
  – Event-based (calendar entries such as travel alerts, flight delays, etc.)
• Can learn much more about user intent and hence eventually more promising for advertising

Implications and Challenges
• Task-orientation
• Specialized content packaging
• Locality Inference from queries
• Locality Inference from device (LBS)
• Minimize typing and round-trips: get results, not just links
  – Less room to display SERP + other accessories
• Monetization strategies to fund this model still not decided
  – Advertising
  – Subscription to “premium services”
  – Revenue share on “leads”
  – Pay per usage of special high-value areas
In the meantime, the web, and Search are evolving…

Even Larger Challenges
• Modeling Social Media and use of mobile in social settings on the go
  – Understanding UGC
  – Classifying, categorizing, organizing UGC and folksonomy
• A different problem of search -- Semantics of content are critical, especially if we are to target
  – Intent
  – Task-orientation
  – Motion dimension (distance to target of search)
  – Push and notifications
  – Understanding the physical world (common sense): what is close? Business hours? Holidays?
• Web Content growing, changing, diversifying, fragmenting
• Truly leveraging the notification abilities and finding new everyday uses – far more versatile a space than PC
• Long-term memory (state) for long-running tasks and queries

A Tale of Two Search Engines

Algorithmic results = Audience (-$)
Advertisements = Monetization (+$)

Algorithmic vs. Ad Search
• Analogous to classical separation of editorial vs commercial content
• Technical underpinnings:
  – Some commonalities (IR, ML)
  – Many differences (incentives, spam, mechanism design)

The two engines
(Diagram label: User)
[Screenshot: search results page for “miele”]
Sponsored Links
CG Appliance Express
Discount Appliances (650) 756-3931 Same Day Certified Installation
www.cgappliance.com
San Francisco-Oakland-San Jose, CA
Miele Vacuum Cleaners
Miele Vacuums- Complete Selection Free Shipping!
www.vacuums.com
Miele Vacuum Cleaners
Miele-Free Air shipping! All models.
Helpful advice.
www.best-vacuum.com
Web Results 1 - 10 of about 7,310,000 for miele. (0.12 seconds)
Miele, Inc -- Anything else is a compromise
At the heart of your home, Appliances by Miele. ... USA. to miele.com. Residential Appliances. Vacuum Cleaners. Dishwashers. Cooking Appliances. Steam Oven. Coffee System ...
www.miele.com/ - 20k - Cached - Similar pages
Miele
Welcome to Miele, the home of the very best appliances and kitchens in the world.
www.miele.co.uk/ - 3k - Cached - Similar pages
Miele - Deutscher Hersteller von Einbaugeräten, Hausgeräten ... - [ Translate this page ]
Das Portal zum Thema Essen & Geniessen online unter www.zu-tisch.de. Miele weltweit ...ein Leben lang. ... Wählen Sie die Miele Vertretung Ihres Landes.
www.miele.de/ - 10k - Cached - Similar pages
Herzlich willkommen bei Miele Österreich - [ Translate this page ]
Herzlich willkommen bei Miele Österreich Wenn Sie nicht automatisch weitergeleitet werden, klicken Sie bitte hier! HAUSHALTSGERÄTE ...
www.miele.at/ - 3k - Cached - Similar pages
(Diagram labels: Web spider, Search, Indexer, The Web, Indexes, Ad indexes)

1995: The Yahoo! Directory
• Apply human expertise and editorial to organize web sites
• What worked
  – Practical, Navigable
  – Trustworthy, Authoritative
• What didn’t
  – Scalability
  – Granularity
  – Etc.

1995: Altavista (Inktomi, Lycos, etc.)
• Automate the process of acquiring pages; use “information retrieval” techniques to return pages that contain a particular term
• What worked
  – Scalable (query for “IBM” returns 40M pages)
  – Simple
  – Granular
• What didn’t
  – Scalability a double-edged sword
  – Ranking and relevance poor
  – Not authoritative (spam, irrelevance, etc.)

c.
1999-2006: PageRank (Google, Yahoo)
• Use topology (link structure) of the web to confer authority
• What works
  – Relevance is greatly improved
  – Navigational query is born (query for “IBM” gets me to ibm.com)
• What doesn’t
  – Homogeneity of results (no personalization) means no “subjective” queries – webmasters vote by proxy for everyone – and their answer is the only answer
  – System easily “gamed” by spammers – leads to arms race

Meanwhile, On the Money Front…
• Sponsored search ranking: Goto.com (morphed into Overture.com, then Yahoo!)
  – Your search ranking depended on how much you paid
  – Auction for keywords: casino was expensive!
• 1998+: Link-based ranking pioneered by Google
  – Blew away all early engines except Inktomi
  – Great user experience in search of a business model
  – Meanwhile Goto/Overture’s annual revenues were nearing $1 billion
• Result: Google added sponsored search “ads” to the side, independent of search results
  – 2003: Yahoo follows suit, acquiring Overture (for paid placement) and Inktomi (for search)
• The Monetization Mechanisms… Conversion of marketplace mechanisms in 2007

(Diagram labels: Search query, Ad)

Questions for the audience
• Do you think an “average” user knows the difference between sponsored search links and algorithmic search results?
• Do you think an “average” user knows there are sponsored links on the page?
• Do you think a user knows where a sponsored link would navigate to upon a click?

How it works
Advertiser:
  “I want to bid $5 on canon camera”
  “I want to bid $2 on cannon camera”
Ad Index
Sponsored search engine
  Engine decides when/where to show this ad.
  Engine decides how much to charge advertiser on a click.
Landing page

Engine: Three sub-problems
1. Retrieve ads matching query
2. Order the ads
3. Pricing on a click-through
(Diagram labels: IR, Econ)

Ads go in slots like these

Higher slots get more clicks

2.
Order the ads
• Most generally, composite IR+Econ score … for today’s talk, focus on Econ
• Original GoTo/Overture scheme:
  – Order by bid

Economic ordering
• Bid and revenue ordering: two forms of ordering by an econ score
• Does revenue ordering maximize revenue?
• No – advertisers react to ordering scheme, by changing their bid behavior!
• Lahaie+Pennock ACM EC 2007
  – Family of schemes bridging Bid and Revenue ordering
  – Game-theoretic analysis
Edelman, Ostrovsky, Schwarz 2006

A new convergence
• Monetization and economic value an intrinsic part of system design
  – Not an afterthought
  – Mistakes are costly!
• Computing meets humanities like never before – sociology, economics, anthropology …

Towards Getting Things Done… vs. Searching

Example
Start: I want to book a vacation in Tuscany.
Finish: Loved the vacation, want to make that sweet Italian coffee at home

Trends in task complexity
• Dawn of search:
  – Navigational queries
  – Pockets of information
• Today:
  – Increasing migration of content online
  – New forms of media only available online
  – Infrastructure for payments and reputation sufficient for many users

Things to notice
• Long-running user goals
• Search as hub:
  – start there
  – return for resource discovery and at task boundaries
  – traverse the web broadly to complete task
• Web services integrated into task

Content Growth

Content trends [Ramakrishnan and Tomkins 2007]

Metadata trends [Ramakrishnan and Tomkins 2007]

Content Complexity

Content ownership
• Content consumption is fragmenting – nobody owns more than 10% of worldwide page views (WW PVs)
• No single place will own all the content
• Best of breed processing will operate on the web version (?)
• Value transitions to ecosystem

Content access is fragmenting

Content itself is fragmenting

Evolution of Social Media
• Although the “traditional notion” of portal and web content is still attracting growing audiences
• The original notion of “publishing content” to attract audiences is changing fast
  – As people discover the fact that the Internet is an Interactive Medium
  – The uses of the Internet enter areas we could not imagine a short time ago
• A new notion of “publishing” is fast emerging
  – The opportunity of user-generated content

Challenges in social media
• How do we use these tags for better search?
• What is the rating and reputation system?
• How do you cope with spam?
• The bigger challenge: where else can you exploit the power of the people?
• What are the incentive mechanisms?

The Search Interface Evolution is starting

What does this mean for search?
• Few changes through 2005
• Entering period of massive change to handle more complex content
• Rich media, aggregation, simple task analysis, etc.
• Moving beyond the stateless query/response paradigm
• Personalization theory

Rich media and search assistance

Structured aggregation

Simple task-focused queries

Google Base

Open Ecosystems

Structured data on the Web
• Structured databases power a vast majority of pages on the web
  – Certainly e-commerce catalogs, etc.
  – But also user-generated content (e.g., blogs)
• Content owners open to exposing structure, but don’t see how and why
  – Microformats adoption at an all-time high
  – Yet, it’s produced much more than is consumed
• Experiments with “pure” structured data aggregation have met with mixed success
  – Google Base, Freebase, even Co-op

What have we announced?
• Yahoo!
Search Monkey: API for publishers to push metadata and structure to search engine
• Wide-ranging support for semantic web standards
• Vocabulary to surface structure and semantics
• Community Tools to evolve standards and vocabulary

Search as Killer App for Data Web
• Publishers and search engine collaborate
• Users see richer search experience
• Accomplish their tasks faster and more effectively
• Example: abstracts surfacing structured content

Search results of the future
[Examples shown: yelp.com, Gawker, babycenter, New York Times, epicurious, LinkedIn, answers.com, webmd]

Comprehensive support for emerging semantic web standards ++
• Microformats
  – hCard, hEvent, hReview, hAtom, XFN
  – More as they get adopted
• RDFa and eRDF markup
• OpenSearch
  – +extensions to return structured data
• Atom/RSS Feeds
  – +extensions to embed structured data

Vocabulary to surface structure
• ‘dataRSS’ provides a common framework for embedding structured data
  – Use with RDFa, eRDF or OpenSearch
  – Preferred Vocabulary includes
    • Atom, Dublin Core
    • Creative Commons
    • FOAF, GeoRSS, MediaRSS
    • RDF, RDFS, RDF Review
    • vCal, vCard

Community Tools
• We’re seeding the Vocabulary and Standards Support
• We’ll evolve both of these with the help of the Web Community
• Yahoo! Groups: used to collect contributor and community suggestions, feedback, etc…
• Suggestions Board to vote on changes

Implications for publishers?
• Yahoo!
open search platform does not modify ranking
• Richer abstracts may provide more information to users and draw higher quality/quantity of clicks
• We want rich abstracts that give users a better experience
  – We don’t want misleading abstracts

The whole story
• User needs becoming more complex
• Content growing, changing, diversifying, fragmenting
• Search is responding by increasing in sophistication
• Value migrating to ecosystem
• Unlock the value by enabling interoperability – expose semantics

Subjective Queries
The kinds of queries that rely on domain expertise…
• “Do you know a reputable plumber in Atlanta?”
• “Where is the cool nightlife in Soho?”
• “What political blogs do you think I’d enjoy reading?”
• “Where can I buy a cool pair of boots?”
These kinds of queries are ill-served by today’s search engines, but are ironically the most valuable (i.e., transactional queries).

No definitive answer
Unverifiable answer
Community consensus

Incentives
Legitimate?

Where is the Science?
• Which questions are legitimate?
• What is the incentive system?
• How do we validate answers?
• What is the role of the community?
• What is the reputation system?

What are the challenges?
• Community of users
  – Social system
• Incentives and reputations
  – Economic system
• Poorly phrased, grammatically limited queries
  – Language analysis
• Improving user experience from past data
  – Data mining

Back to Business
These are early days…

Advertising: Brand and DR
Knowledge of users & their behavior throughout the purchase funnel can grow brand & direct response revenue
Funnel stages: Awareness, Consideration, Purchase
> $200B Brand Advertising Market
Most time & activity is in consideration & engagement, but there are limited metrics & reach strategies
> $200B Direct Response Market

A question for the Audience: Why is search-related advertising so powerful?
It is all about Inferring User Intent
• Users type 2.8 keywords
  – Note the nonsensical use of average
  – Average query returns > 600K matches!
• We get an idea of intent
• Coupled with immediacy (recency)
  – an amazing matching engine
  – 10x to 100x click-through rate over banner ads

Do I know this user’s intent?

Brand Ads and Search Ads Interact!
• Is ad search strategy enough for a direct marketer?
• Do brand ads play a role in search advertising?
• Harris Direct Case Study
Funnel stages: Awareness, Consideration, Purchase

Case Study: Harris Direct
Viewing These Ads: Had This Effect On:
• Aided Brand Awareness – Up 7%
• Brand Favorability – Up 32%
• Purchase Intent – Up 15%

Case Study: Harris Direct
People who saw display ads were 61% more likely to search on related topics…
…and drove 139% more clicks on algorithmic and sponsored links…
…specifically driving 249% more sponsored search clicks…
…and driving 91% more activity on the HarrisDirect.com website.

Yahoo! Research
Inventing the new sciences of the Internet

New Science?
• The Internet touches all of our lives: personal, commercial, corporate, educational, government, etc…
• Yet many of the basic notions we talk about:
  – Search, Community, Personalization, Engagement, Interactive Content, Information Navigation, Computational Advertising
  – Are not at all understood, or well-defined
  – These are not disciplines that academia or any industry research labs focus on…

Areas of Research
• Information Navigation and Advanced Search
  – We are in the early days of search and retrieval
  – Inferring intent
  – New ways of extracting entities and objects
• Community:
  – How do you know what to believe on the Internet?
  – Trust models on-line and trust propagation
  – What makes communities thrive? Whither?
  – Social media, tagging, image and video sharing
• Microeconomics: a new generation of economics driven by massive interactions
  – Auction marketplaces
  – The web as a new LEI of activities and economies
• Computational Advertising
  – Targeting and matching sciences, Inferring user intent
  – Pricing models (CPM, CPC, CPA, CPL, etc…)
  – Large-scale optimization and yield management

Concluding Thoughts (1)
• The notion of “corpus” and publishing is changing fundamentally
• We still do not have the basic sciences to understand what is happening and what needs to happen to combine the new capabilities
• The problem of mobile search is different, but poorly understood
• The web is changing, content sources are fragmenting and changing – the source distribution is radically changing
  – Publisher – consumer divide is becoming fuzzy
• Search engine interface is finally changing to adapt
  – Much of the change came from worrying about mobile search

Concluding Thoughts (2)
• The view that Search is everything is LIMITED (at best)
  – Economics of publishing and advertising
  – Users do not differentiate ad and content
  – Behavioral data is the most powerful – “Nothing predicts behavior like behavior”
• Monetization and economic value an intrinsic part of system design
  – Not an afterthought
  – Mistakes are costly!
• Computing meets humanities like never before – sociology, economics, anthropology …
• A more holistic view of Search and Information Navigation is needed

Thank You! & Questions?
Usama_fayyad@yahoo.com

No time to cover today
• Micro-Economics of the Web
  – Auction marketplaces
  – Marketplace and Exchange Design
  – The economics of Engineering IT Decisions
• Computational Advertising
  – Targeting and matching sciences
  – Inferring user intent
  – Pricing models (CPM, CPC, CPA, CPL, etc…)
  – Large-scale optimization and yield management
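
Illustrative sketch (not part of the talk)
The sponsored-search slides earlier in the deck split the ad engine into three sub-problems (retrieve ads matching the query, order the ads, price a click-through) and contrast bid ordering with revenue ordering. The Python sketch below walks through that pipeline on made-up data. Almost everything specific in it is an assumption for illustration and is not stated in the deck: the Ad record, exact-match retrieval, the click-through-rate (CTR) estimates, the advertiser names, the reading of "revenue ordering" as bid times estimated CTR, the next-price charging rule (in the spirit of the generalized second-price auction studied by Edelman, Ostrovsky and Schwarz), and the $0.05 reserve price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    advertiser: str  # hypothetical advertiser name
    keyword: str     # keyword phrase the advertiser bid on
    bid: float       # maximum price per click, in dollars
    ctr: float       # assumed click-through-rate estimate, between 0 and 1


def retrieve(ads, query):
    """Sub-problem 1: keep ads whose keyword exactly matches the query."""
    q = query.strip().lower()
    return [ad for ad in ads if ad.keyword == q]


def order_by_bid(ads):
    """Original GoTo/Overture scheme from the slides: highest bid first."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)


def order_by_revenue(ads):
    """Revenue ordering (assumed definition): highest bid * ctr first."""
    return sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)


def price_per_click(ranked, by_revenue, reserve=0.05):
    """Sub-problem 3 (assumed rule): charge the smallest amount that still
    beats the ad ranked just below; the last ad pays the reserve price."""
    prices = []
    for i, ad in enumerate(ranked):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            if by_revenue:
                # Price p such that p * ad.ctr equals the next ad's score.
                price = nxt.bid * nxt.ctr / ad.ctr
            else:
                price = nxt.bid
        else:
            price = reserve
        # Never charge more than the advertiser's own bid.
        prices.append((ad.advertiser, round(min(price, ad.bid), 2)))
    return prices


# Made-up inventory; the keywords echo the "canon camera" slide.
ADS = [
    Ad("A", "canon camera", 5.00, 0.02),
    Ad("B", "canon camera", 4.00, 0.05),
    Ad("C", "canon camera", 2.00, 0.04),
    Ad("D", "cannon camera", 2.00, 0.03),
]

if __name__ == "__main__":
    matched = retrieve(ADS, "Canon Camera")
    for name, ranker, by_rev in [("bid", order_by_bid, False),
                                 ("revenue", order_by_revenue, True)]:
        ranked = ranker(matched)
        print(name, price_per_click(ranked, by_revenue=by_rev))
```

With these made-up numbers, bid ordering ranks the ads A, B, C, while revenue ordering ranks them B, A, C, because B's higher assumed CTR outweighs its lower bid. The sketch holds bids fixed, so it cannot show the point made on the "Economic ordering" slide that advertisers change their bids in response to the ordering scheme.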