Web Content Development
Dr. Komlodi
Classes 20-21: Search systems

Web Searching
• Search within your site:
  – Full site or subsites
  – www.jhu.edu, www.umbc.edu
• Web search:
  – Search indexes of web pages
  – www.google.com
• Metasearch:
  – Searching across multiple search engines
  – clusty.com, www.dogpile.com, www.myriadsearch.com
• Web Search Engine Watch: http://searchenginewatch.com/

Does your site need a search?
• Pp. 145-148
1. Sufficient content
2. Sufficient resources
3. Time and know-how to optimize system
4. Better alternatives?
5. Will users bother with it?
6. Too much information to browse
7. Fragmented site
8. Learning tool
9. User expectations
10. Dynamism
• Post bullets on Blackboard discussion board

Why should an IA worry about search?
• You know the users
• Many decisions should be user-centered and not technology-centered
• It has an interface

How does search work?
[Figure] ©2004 Google. Source: http://www.google.com/technology/pigeonrank.html

How does search work?
[Diagram of search system components: Documents, Searchers, Search Engine, Search Interface, Queries, Indexes, Matching, Results, Indexing (manual or automatic)]

How does web search work?
[Same diagram, annotated for web search]
• Documents = Web sites & pages
• Searchers
• Search Engine
• Search Interface
• Queries
• Indexes
• Matching = queries to search engine indexes
• Results
• Indexing = Automatic: spiders & robots crawl websites and index pages according to their own rules. As a result, they build large databases containing the indexes.

What to search on…
• All the content?
• Determining search zones
• Site search:
  – Subsite
  – Type of document
• Web search:
  – Multimedia and heterogeneous
• Full-text or metadata
• Types of indexes

Indexing by
• Navigation vs.
destination pages
• Audience or Reading level
• Topic
• Date of update
• Author
• Title
• User task

What would the index look like? Full Text Indexing
• Take out frequent words (stop words) from documents
• List the rest of the words from each document
• May add frequency numbers to each word
• Search the lists of words

What would the index look like? Indexing Languages
• An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents.
• An indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms.

Web Search Engine Indexes
• The larger a web search engine’s index is, the more web pages it can return and the more types of queries it can accommodate
• However, quantity is just one measure of performance
• How to compare:
• http://www.google.com/help/indexsize.html
• Try this!

Search Interface
• Shneiderman, Byrd, Croft, "Clarifying Search," D-Lib Magazine, 1997
• Formulation:
  – Sources
  – Fields
  – What to search for
  – Variants
• Action
• Review of results
• Refinement
• Let’s see Google’s advanced search

Query Format
• Boolean:
  – Good for advanced users
  – Precise, and clear why you got the results back
  – Need to understand syntax
• “Natural Language”
  – Good for difficult questions when you can’t think of terms
  – Or novice users
  – Difficult to know why certain results come back
  – Black box
• Relevance Feedback
  – User selects relevant items from results
  – Search engine considers these in reformulating the query
• Similarity Retrieval
  – Similar to relevance feedback
  – “I want more like this”
  – Both are good if you don’t know exactly what you are looking for

[Example screenshots: Boolean, Natural Language, Relevance Feedback. Source: http://nayana.ece.ucsb.edu/imsearch/imsearch.html Accessed January 2007.]
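The four Full Text Indexing steps listed earlier (take out frequent words, list the remaining words, optionally add frequencies, search the word lists) can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular search engine works: the stop list `STOP_WORDS`, the sample pages in `docs`, and the function names `tokenize`, `build_index`, and `search_all` are all made up for this example. The search function treats a query as a Boolean AND of its words.

```python
import re
from collections import Counter

# Hypothetical stop list of frequent words; real systems use longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each non-stop word to {document id: frequency in that document}."""
    index = {}
    for doc_id, text in documents.items():
        counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
        for word, freq in counts.items():
            index.setdefault(word, {})[doc_id] = freq
    return index


def search_all(index, query):
    """Return ids of documents that contain every non-stop query word (AND)."""
    terms = [w for w in tokenize(query) if w not in STOP_WORDS]
    if not terms:
        return set()
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result


# Made-up sample pages for illustration only.
docs = {
    "admissions": "How to apply for admission to the university",
    "library": "Library hours and the library catalog",
    "catalog": "Course catalog for the university",
}
index = build_index(docs)
matches = search_all(index, "university catalog")
```

Here `index["library"]` records that the word occurs twice on the "library" page, and `matches` contains only the page that has both query words. Swapping `&=` for `|=` would turn the AND search into an OR search.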
[Example screenshots: Relevance Feedback (two slides), Similarity Retrieval]

Other Query Building Tools
• Citation networks:
  – What does this page/paper cite or link to?
  – What cites or links to this page/paper?
  – What other papers/pages cite/link to the same papers/pages?
  – http://portal.acm.org/dl.cfm
• Spell checkers in queries
• Phonetic tools
• Stemming tools
• Controlled vocabularies

Matching
• Boolean
  – AND, OR, NOT
• Probabilistic
• Vector model (calculate weights of words)
• Natural Language
  – process the query as well, match lists

Results Presentation
• How many?
• How much information about each item?
• What can users do with each item?
• Presenting results by categories

Evaluation of Search Engines
Your book is wrong on page 159!
Recall = relevant retrieved documents / all relevant documents in the collection
Precision = relevant retrieved documents / all retrieved documents
Copyright Dr. David Grossman. Source: http://ir.iit.edu/~dagr/cs529/files/handouts/01Introduction-6per.PDF

Within-Site Search Bloopers 1
1. Baffling search controls. Search options require knowledge of computer or industry-insider concepts.
2. Dueling search controls. Competing search boxes on page, with no guidance.
3. Hits look alike. Found items cannot be easily distinguished by scanning the list.
4. Duplicate hits. List of found items contains duplicates.
5. Search myopia: Missing relevant items. Items that should be found are not.
http://www.web-bloopers.com/

Within-Site Search Bloopers 2
6. Needle in a haystack: Piles of irrelevant hits. Many items don’t match search criteria.
7. Hits sorted uselessly. Sort order of found items doesn’t support user tasks.
8. Crazy search behavior. Modifying search criteria yields unexpected results.
9.
Search terms not shown. Not showing what search terms produced these results.
10. Number of hits not revealed. Not showing how many items were found.
http://www.web-bloopers.com/

Search User Interface Design Recommendations 1
• Put a simple, reasonably long search field on every page of the site. (Nielsen: min. 27 characters long)
• Use simple words to explain the process: remove all jargon and technical terms, and make sure that any icons have labels.
• Avoid inventing a new interface, which will confuse users: take the best of the formats of the large public search engines.
• Make the search forms and results pages fit into the overall design of the web site: they should use the same colors, fonts and so on.
http://www.searchtools.com/info/user-interface.html

Search User Interface Design Recommendations 2
• Include site names and navigation links in results pages, so users can see the context and structure of the site.
• Set up a special page to be displayed when the search does not find any matches in the index.
• Avoid surprises: clarify all automated search features, such as stemming, phonetic matching, thesaurus lookups and stopwords.
http://www.searchtools.com/info/user-interface.html

How Search Should Work (PWU Ch5)
• Follow the standards of the large search engines:
  – Search box (min. 27 characters) and a button in the top right corner of the page
  – Search box on every page
  – Linear results in order of relevance
• Users expect search to be a keyword search and not other types of searches (by types of clothing, size, season, etc.)
• Advanced search should be a secondary option or omitted
• Scoped search is useful if your site has distinct sections
• Do not default the search to a scope

Search Engine Results Pages (PWU Ch5)
• Copy the design of major search engines
• List results in relevance order, but no need to show a measure of relevance
• If appropriate, allow users to re-sort results
• Each result should start with a clickable headline
• Follow the headline with a 2-3-line summary
• Include a search box with the user’s query in it to make query reformulation easier

Design of No-Matches Pages
• Site Context and Navigation
  Instead of a bare page saying that the search failed, show the standard site layout, including background colors, logos, text and link colors, and navigation links. If you have a site map or Yahoo-style directory for your site, include it in the no-matches page; otherwise you may want a statement of the site scope. That provides a positive way to help people understand what is available, and browse if they choose.
• Search Again Field
  Make sure there is a Search field, so people can try a different search. Don't make them click a link or otherwise take an extra step to search again.
• Suggested Wording
  Include some text that explains why the search might have failed, and what people can do next. The wording should be positive and helpful, rather than blaming the user for the search failure. For example:
  Your search returned no results. Try broadening your search (from "heart attack" to "heart disease") or adding additional terms (from "high blood pressure" to "high blood pressure or hypertension").
http://www.searchtools.com/guide/nomatches.html

Search UI Design Exercise
• Work in pairs
• Select an imaginary website
• Design on paper:
  – A homepage with a search box
  – A search results page
  – A no-hit page

Search Engine Optimization

How do search engines find you?
• Search engine optimization:
  – Changing your site to improve the site’s ranking in search results
• Search engine submission:
  – Submitting your site to search engines to make sure the engines know about it
• Search engine marketing/promotion: the combined process of submission (free or paid) and search engine optimization
• http://blog.searchenginewatch.com/090402110851 (From 2:12)

Search Engine Submission
• Yahoo’s human-compiled directory listings (http://help.yahoo.com/l/us/yahoo/ysm/ds/index.html):
  – Crawlers look at those pages
  – Free for normal review
  – $299 for expedited review and commercial listing (no guarantee of listing)
• Google:
  – Free but not guaranteed (http://www.google.com/addurl/?continue=/addurl)
  – Or use AdWords for payment (http://www.google.com/ads/)
• Yahoo ads submission:
  – Yahoo sponsored search (http://searchmarketing.yahoo.com/arp/sponsoredsearch_ss.php?o=US1806&cmp=SYC&ctv=&s=Y&s2=S&b=25)
  – Pay by the number of clicks

Search Engine Optimization
• Linguistic SEO:
  – Research what words users use for your content:
    • Search engine logs, user testing, support calls, discussion forums
  – Use those words to describe your content on your pages and in the metadata
• Architectural SEO:
  – Make sure your important content is text
  – Make sure your linking structure leads search engine indexing crawlers to important content
• Reputation SEO:
  – Make sure other sites link to you

Search Engine Optimization
• Study your guideline
• Create a few bullet points to describe your guideline and post them on the discussion board
• Sources:
• http://searchenginewatch.com/webmasters/article.php/2168021
• http://searchenginewatch.com/webmasters/article.php/2167931
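
Supplement to the Evaluation of Search Engines slide: the recall and precision ratios defined there can be worked through on a small example. The sketch below is illustrative only; the document ids and the function names `recall` and `precision` are invented for this example, and returning 0.0 when a denominator is empty is a convention chosen here, not something stated in the slides.

```python
def recall(retrieved, relevant):
    """Relevant retrieved documents / all relevant documents in the collection."""
    if not relevant:
        return 0.0  # assumed convention when nothing in the collection is relevant
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Relevant retrieved documents / all retrieved documents."""
    if not retrieved:
        return 0.0  # assumed convention when the search returns nothing
    return len(retrieved & relevant) / len(retrieved)


# Made-up example: 4 relevant documents exist; the search returns 3 documents,
# 2 of which are relevant.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5"}

r = recall(retrieved, relevant)     # 2 of 4 relevant documents were found
p = precision(retrieved, relevant)  # 2 of 3 returned documents are relevant
```

In this example recall is 2/4 and precision is 2/3: the search missed half of the relevant documents, and a third of what it returned was irrelevant. These correspond to the "search myopia" and "needle in a haystack" bloopers, respectively.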