Search and the ‘Net in 2015

Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
For Rochester Public Library Staff

For today . . .
  The Searchscape
  Behind the Screen: Current Web Search Developments
  New Services
  The Social Web and Research
  Bing, Yahoo and DuckDuckGo
  Google
  Free Digital Collections

Linklist
  http://people.hws.edu/hunter/searchnet15links.htm

E-Reading Rises as Device Ownership Jumps
  By Kathryn Zickuhr and Lee Rainie
  http://www.pewinternet.org/2014/01/16/e-reading-rises-as-device-ownership-jumps
  [Chart: American adults 18+ - % who read at least 1 book in that year]
  [Chart: American adults 18+ - % who own each device]

New Top Level Domains
  First made available 1/29/14
  Over 150 now live on donuts.co (2/15/15)
  Content-significant
    .bike, .energy, .delivery, .legal, .guru
  Brand-specific – “vanity domains”
    .android, .walmart, .nyc
  Allow for non-roman scripts – Arabic, Chinese etc.
  Require proof of identity/relationship to TLD
  Unique TLD costs $185,000

Growth of Query Types over 1 year
  http://searchenginewatch.com/sew/how-to/2383498/how-will-voice-search-impact-a-search-marketers-world

Web Access in 2015
  Mobile has outpaced Desktop
  http://www.comscore.com/Insights/Presentations_and_Whitepapers/2013/The_Digital_World_in_Focus

Web Search in 2015
  Who’s crawling the Web?
    Google
    Bing (aka Yahoo!)
    Gigablast
    DuckDuckGo
    Baidu
    Yandex

Market Share Growth Oct. 2013 – Oct. 2014
  www.comscore.com
  [Chart: market share by search engine (Google, Bing, Yahoo!, Ask, AOL), 2013 vs. 2014; vertical axis 0 to 80]

Behind the Screen: WAY beyond matching keywords
  Semantic Processing
    Internal to the search engine
  Predictive Operations
    From data about and from the user
  ANSWERS, NOT JUST SEARCH RESULTS

Semantic Processes
  NLP Parsing
  Knowledgebase Entities
  Term Frequency Data
  Pattern Matching
  Structured Data

NLP Parsing
  Machine-learned meaning derived from human or natural language speech or text. (Adapted from Wikipedia)
  Analysis of large sets of documents (corpora) that have been human-annotated with parts of speech and other semantic information
  Machine “learns” the relationships and meaning through statistical inference
  Visualization at http://nlpviz.bpodgursky.com

Knowledgebase Entities
  Google’s Knowledge Graph – Bing’s Satori
  Google’s Knowledge Graph – rooted in the (human) community-created entities in Freebase
  Crowdsourcing too slow; often ignores specialized areas of knowledge, non-English content
  Knowledge Vault – Automated extraction of raw data and creation of entities derived from that data
  DOM trees - structures that help browsers represent and interact with documents in HTML and other formats (Wikipedia)

More Semantic Processing…
  Term Frequency Data
    Frequency, proximity, order
    Aids in discovery across subject areas, filetypes and entire domains
  Pattern Matching Algorithms
    Focuses on recognition of patterns and regularities in text, data and images
  Structured Data
    Structured Web tables and data sets (.xls, .kml, .sdf)
    Human created tags – Schema.org

Predictive Operations: Inferring the user’s intent
  “The Holy Grail of Search”
  Location-based results – IP and GPS
    Weather, entertainment, restaurants…..
  Anonymous past searches and user behavior
  Personal data volunteered by user
  Time of day
  Device used

Semantic Processing and Predictive Operations
  --Correctly interpret the query, or a portion of the query
  --Give a “best guess” answer based on highly trusted sources (knowledgebase) and similar searches
  --Aggregate and grow the knowledgebase through iterative, real-time web crawls

Discovery Apps: Personalized Search on Steroids
  Combines your
    Personal preferences
    Location
    Demographic characteristics
    Social network data
      People, Preferences, Interests, Events
  Suggests entertainment, restaurants and more
  Chat with your social network friends
  “Current events you may like within X miles”
  Gravy – Free on iTunes

Personal Assistant Apps
  Connects to your
    E-mail
    Calendar
    Facebook events
  Prompts for transportation times, quickest routes
  Includes some discovery and chat features
  Relies heavily on user-supplied personal data
  Sunrise, Tempo, et al.

Apps and the Deep Web
  Currently, crawler-based search engines cannot access content in apps unless the app allows it.
    Posts
    Links
    Personal data
  User must have the app loaded in order to access content, even if it appears in the search engine
  Education apps continue to grow in content, quality and use
  Google is working on it…..
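
As a rough sketch of how location data might feed the predictive operations described above, the Python example below re-ranks two made-up results by adding a distance-based bonus to a keyword score. The Result class, the scores, the coordinates and the weighting formula are all assumptions for illustration only; search engines do not publish their actual ranking methods.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Result:
    title: str
    text_score: float  # hypothetical keyword-match score between 0 and 1
    lat: float
    lon: float


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def rerank(results, user_lat, user_lon, local_weight=0.5):
    """Order results by keyword score plus a bonus that shrinks with distance."""
    def score(r):
        km = distance_km(user_lat, user_lon, r.lat, r.lon)
        proximity = 1.0 / (1.0 + km / 10.0)  # 1.0 at the user's position, 0.5 at 10 km
        return r.text_score + local_weight * proximity
    return sorted(results, key=score, reverse=True)


# Made-up data: a searcher in Rochester, NY looking for "pizza"
results = [
    Result("Famous pizzeria, New York City", 0.9, 40.7128, -74.0060),
    Result("Neighborhood pizzeria, Rochester", 0.7, 43.1566, -77.6088),
]
ranked = rerank(results, user_lat=43.1610, user_lon=-77.6109)
print([r.title for r in ranked])
```

With these invented numbers the nearby pizzeria moves ahead of the better keyword match; setting local_weight to 0 restores the plain keyword order. Time of day or device type could be folded into the score in the same way.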
New Services

Qwant
  A fresh approach to search
  Aims to offer a European-based service that respects users’ privacy
  No cookies or other tracking of users' search behavior
  No filtering of content unless user-initiated
  Launched in France in 2013
  Search verticals offered:
    Web
    News
    Social
    Images
    Videos
    Shopping
    Boards (Online Forums, mostly European)
  16 interface languages, which influence search results

Instya meta engine
  www.instya.com
  Launched April 2015
  Results from each source appear in their own browser tab
  Sources include
    Web (7)
    Image (8)
    News (11)
    Video (7)
    Shopping (11)
    Dictionary (14)
    Answers (8)
    Social (11)
  Domain search offers website data, analysis
    7 Backlink sources
    6 Website stats
    10 Domain information sites

CC Search
  search.creativecommons.org/
  Searches media in the public domain
  Flickr, YouTube, Jamendo, Wikimedia Commons, SoundCloud and others…..
  Some sponsored results appear that are not in the public domain
  Verify use conditions for each result

Search and the Dark Web
  Dark Web - Networks with server addresses intentionally obscured
  Often house online criminal activities
  Includes TOR Networks
    Hidden Services have .onion TLD
    Only accessible via TOR’s private browser
  Content not password-protected, but not accessible to crawler-based services due to lack of linkage

Memex
  DOD’s Dark Web Search Engine
  Software to visualize and organize big data
  Searches text, handwritten text, images, geographic data embedded in photos….
  Identifies hidden relationships among websites, deep web sites and forums
  Can access Dark Web obscured networks
  Used in online criminal investigations
    Sex-trafficking ads
    ISIS-funding and other money laundering
  Contact memex@darpa.mil
  http://www.wsj.com/articles/sleuthing-search-engine-even-better-than-google-1423703464

The Social Web and Research

Why search the social web???
  Public responses, attitudes, opinions
  Breaking news, events
  Trending topics and people
  Latest product reviews
  First-hand accounts of events - text, image, audio, video (primary sources)
  Security, technology topics (latest virus, etc.)
  Locate individuals/experts and their networks
    People interested in a topic/hobby
  Social web research projects

BuzzSumo - meta for social networks
  Discovers the most shared content
  Crawls FB, TW, LinkedIn, Pinterest, Google+
  Backlink and sharer data for 20 or more instances
  Advanced search features
    Boolean
    Author search
    URL or domain search
    Twitter user search
  Filters
    Article
    Giveaways
    Infographic
    Interviews
    Guest Post
    Videos
    Date
  Requires (free) account; other fee-based options

Twitter Search - search.twitter.com
  Now includes every public Tweet since 2006
  Searchable with all search features previously available at twitter.com/search-advanced
  Indexes ca. ½ trillion tweets, and grows by several billion tweets a week
  Tweets deal with “everyday human experiences to major historical events”
    Entire TV, sports seasons
    Conferences
    Places
    Events
    Industry discussions
    Long-lived hashtags across countries, ideologies
      #ScotlandDecides #HongKong #Ferguson #Hamas

TW as social indicator and health predictor – UPenn study
  Linguistic and emoticon analysis of geo-tagged tweets combined with health data from over 1,300 US counties
  Tweets expressing negative emotions (stress, anger, fatigue) are associated with higher heart disease risk
  Tweets with positive emotions (optimism, enthusiasm) are associated with lower levels of risk
  http://www.upenn.edu/pennnews/news/twitter-can-predict-rates-coronary-heart-disease-according-penn-research

Education and the social searchscape
  Offers first-hand accounts of events and conditions
  Informative of current world cultures and trends on a wide range of subjects
  Gateway to blogs and other online communication that can enhance scholarship
  Channel for updates to educational programs
  Embedded links and other information often highly relevant and recent
  Requires careful evaluation of information found there

Bing, Yahoo and DuckDuckGo

Looking for a niche
  Bing and Yahoo represent 29% of all US searches
  http://comscore.com 12/1/14
  Yahoo
    Focus is on local and personalized search results
    Now partnered with Yelp, local business search engine
  Bing
    Focus is on lifestyle, travel, images, maps
    Social search results (FB, TW) in a sidebar

Bing Image Search
  High quality images
  Related search offered, based on descriptive text associated with the image
  Clustering by topic
  Filters
    Size
    Color
    Type
    Layout
    People
    Date
    License
  SafeSearch
  Image Match with a URL or image you upload

DuckDuckGo
  http://ddg.gg
  Offers anonymous search functionality
  Popularity spiked after the NSA PRISM scandal
  Does not save search history of any type
    Google does, using it "to increase relevancy"
  Included as a search option in Apple's latest version of Safari
  Has been blocked in China !!!

Google

Knowledge Vault
  Beyond the Graph…..
  Knowledge Graph seeded from Freebase entities and human additions
  Automated generation of entities increases number and discovers hidden relationships among entities and their attributes
  Entities now appear at top of results page with related topics or other relevant information
  Type of additional information varies depending on entity

Right to be Forgotten ruling
  EU's European Court of Justice, May 2014
  Google and other search engines must remove results deemed to be "inadequate, irrelevant or no longer relevant, or excessive in relation to the purposes for which they were processed and in the light of the time that has elapsed."
  http://curia.europa.eu/jcms/upload/docs/application/pdf/2014-05/cp140070en.pdf
  Does not require them to be removed from the servers on which they are located
  Makes the content more difficult to find
  Of the initial 12,000 removal requests
    33% - fraud accusations
    20% - related to violent/serious crimes
    12% - related to child pornography arrests

App indexing
  Google currently indexes content from apps that open their content to Google's crawlers
  Results from apps are combined with mobile search results if the searcher has that app installed on their mobile device
  Agawi - streaming technology that breaks apps up into small files, allowing users to access content in the app while the full app is loading (similar to YouTube's streaming video technology)
  Google acquired Agawi in the fall of 2014

Google’s device-dependent results sets
  The intent and context of queries vary between devices
  Google's search results on mobile devices vary from those on desktops or laptops by as much as 43%
  Mobile results
    Tend to focus more on location-based results
    Display pages with smaller file size, on average
  Based on analysis of first 30 results for 10,000 keyword searches
  “US Google Ranking Factors 2014” http://www.searchmetrics.com/news-and-events/mobile-optimization/

Maps Gallery, In-depth articles
  Interactive digital thematic map collections
    Historic city plans
    Climate trends
    Housing affordability
    Shipwrecks
    Up-to-date evacuation routes
  In-depth articles caveat
    "How to write the In Depth Articles that Google Loves" copyblogger.com
    Content farm orientation?
    Requires careful evaluation of each item; unvetted websites in particular

Google's tech projects
  Google for Kids - under 13; more parental controls
  Project Loon - Provide Web access via high-altitude balloons
  Self-driving cars
  Google Glass 2
  Smart contact lenses
  Continuous health monitoring via disease-detecting nanoparticles
  Liftware - stabilized spoon for tremor sufferers
  "Google Tracker 2015" http://arstechnica.com

Search in the Future
  Will continue to be more specialized
    Shopping - Amazon
    Travel - Kayak
    Movies - IMDB
    Real-time news - TW
  Discovery software will integrate more diverse types of data, crowdsourced to expert
  Semantic processing and predictive search will grow
  Social web will increase as a tool for social change
  Search engines will be challenged by governments worldwide in the areas of commercial monopoly and individual privacy

Thank You and Enjoy Your Searching!

Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3014
hunter@hws.edu