IBM Software Group
Thought Leadership Whitepaper

IBM Customer Experience Suite and Enterprise Search Optimization

Introduction

Delivering an effective, dynamic web presence is increasingly important to organizations seeking to showcase their brand and distinguish it from their competitors. Acknowledging the continued proliferation of internet-based services and data, business and IT leaders prioritize capabilities that drive differentiated web experiences that attract and retain web end users, improve brand loyalty, and lower operational costs.

Further, our open, socially connected Web 2.0 world constantly delivers new information from data and community-based sources. To be useful, such information frequently requires interpretive analysis, classification or other management before it can be identified as relevant to current business situations.

Business leaders have long recognized the value of accurately assessing and utilizing information sources inside and outside the organization. Numerous studies show that organizations can realize productivity gains and reduced costs by implementing effective search strategies. When presented with high quality, relevant search results, end users participate in an exceptional web experience that delivers precisely the information, capabilities and resources they find valuable and useful. In addition to increased web site audiences and productivity gains, organizations make better use of their knowledge assets (community and data based), improve customer service operations and strengthen customer relationships.

According to the recent report “The New Voice of the CIO – Insights from the Global Chief Information Officer Study” (1), the ability to extract the highest value from data to enhance the customer experience was ranked among the top ten innovations that Insightful Visionary CIOs aim to implement.
Leading the Way in Exceptional Web Experiences

Introducing IBM Project Northstar

To help organizations provide exceptional web experiences, IBM has introduced Project Northstar, IBM’s vision and multi-year roadmap for how organizations can create differentiated, exceptional web experiences. IBM Project Northstar provides a new way forward, bringing together the right combination of capabilities needed to deliver compelling online experiences, including web content management, an enterprise portal and mashup presentation framework, built-in social and real-time communication features, search, personalization, marketing tools, comprehensive integration capabilities, mobile device support, analytics, commerce, and rich media management.

Together, these capabilities help organizations create differentiated, exceptional web experiences that attract and retain the best customers, improve brand loyalty, increase customer satisfaction, and lower operational costs.

For the next several years, IBM Project Northstar will serve as the guiding light for IBM’s investments, including in-house advancements, partnerships, and even acquisitions. But while the roadmap is rich and exciting, many IBM customers already use IBM’s suite of web experience offerings to deliver killer online experiences today!

Delivering Exceptional Web Experiences with Optimized Search Results

IBM Customer Experience Suite: the Centerpiece Offering of Project Northstar

To help customers meet their exceptional web experience goals in the most efficient and flexible manner possible, IBM has recently released a new offering that is designed to help organizations create highly engaging, personalized, and differentiated web experiences. This new offering is called IBM® Customer Experience Suite, and it is the centerpiece offering in the IBM Project Northstar vision.
With the IBM Customer Experience Suite, organizations can:

• Create highly personalized customer interactions by analyzing and then adapting to the preferences, behaviors, location, products owned, device, and sentiments of each visitor.
• Support conversations with and between customers through online communities.
• Empower business owners to manage the creation and delivery of content, rich media, and campaigns.
• Deliver rich and engaging experiences without sacrificing flexibility, scalability, or security.
• Compose seamless Web experiences by connecting into the necessary back-end applications, commerce solutions, social media sites, and cloud-based services.
• Offer consistent experiences across multiple online channels.

The Basics

Applying well-orchestrated search service definitions, indexing and presentation practices can ‘optimize’ web site information when it is retrieved by external search engines, and enhance the visibility of a web site or a web page in search engine results. These efforts are often referred to as ‘organic’ search, resulting from editorial efforts. Organic search is distinguished from ‘paid search’, or search engine marketing (SEM), which seeks to increase web site presentation in search results through paid placements.

Typically two steps are involved to attract web user traffic, and to retain and win new end users and customers:

• Use Internet Search to attract web users to your web site.
• Use Site Search, once web users are at your web site, to deliver relevant, contextual results from initial and subsequent search queries.

When using internet search, research has shown that visitors view external search engine results for an average of 6.3 seconds before clicking on a link. The ability to optimize web content so that it is listed in the first three externally presented results, or the top two ads, can directly affect an organization’s ability to present its brand and attract and retain new web site audiences.
Page ranking as calculated by external search engines generally follows this analysis, shown here in its commonly published form:

PR(A) = (1 - d) / N + d × ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

where:

• PR(A) is the PageRank of document A
• PR(Ti) is the PageRank of document Ti, each of which includes a link to document A (i = 1, ..., n)
• N is the count of qualifying documents
• C(Ti) is the total count of links on page Ti
• d is a confidence value, where 0 ≤ d ≤ 1 (the confidence value ‘d’ is commonly defined as 0.85)

For example, if a page had a PageRank score of 2 and also had 100 links on it, then each referenced (linked) page would receive only 2/100 (0.02) of PageRank from it, before the factor d is applied.

The ability to successfully connect high quality information delivery with direct relevance to search queries requires the development of useful content and information that matches those queries. As community evaluations increasingly influence decision making, content should be interesting enough for others outside the web site to reference it from their own web pages. Several content attributes can usefully influence the web site page results returned from search engine queries.

In striving to create and promote high quality web content that achieves ‘high rankings’ in search engine results, a few misconceptions exist as to how to improve the source content. Techniques such as overuse of content metadata tags (where multiple, unrelated keywords and descriptions are associated with content or image definitions) often do not consistently or usefully increase the relevance of content to search results. Search engines can also apply Hypertext-Matching Analysis, which analyzes the full content of a page as well as the content of linked local pages.
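The page ranking calculation described earlier in this section can be explored with a short script. The following is a minimal sketch, assuming the commonly published iterative form in which each page receives (1 - d)/N plus d times the sum of PR(Ti)/C(Ti) over the pages Ti that link to it. The function name `pagerank`, the three-page link graph and the iteration count are illustrative assumptions; real search engines combine many more signals.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively compute PageRank for a small link graph.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform score
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Sum PR(Ti) / C(Ti) over every page Ti that links to this page.
            inbound = sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - d) / n + d * inbound
        rank = new_rank
    return rank


# Hypothetical three-page site: "about" and "products" both link to "home".
example_links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home"],
}
ranks = pagerank(example_links)
```

In this hypothetical graph, "home" ends up with the highest score because both other pages link to it, while "about" and "products" each receive only half of what "home" passes on.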
Experience has shown that identifiers such as “title” and “description” are the most useful for driving presentation of web site content that is relevant to initial search results. Attributes that search engines weigh include:

• Page or document relevancy, in which the search term appears frequently in the title and content, calculated as term frequency times inverse document frequency (tf × idf).
• PageRank, which calculates link popularity: how important other Internet users think a specific page is. Search engines can apply multiple rules to calculate this information, with relative weighting, to help drive search results.

Using IBM Customer Experience Suite to Manage and Optimize Search Results

The Customer Experience Suite Enterprise Portal platform services include built-in search capabilities that allow end users to search the portal content and any linked sites.

When searching the portal site, the search engine can fetch and index all pages with portlets dynamically, presenting search results to users in the context of their specific page and portlet configurations. All search results adhere to the portal security model, so users can only find content and documents they are entitled to view via portal access controls. Security filtering is made highly effective by applying both pre-filtering and post-filtering techniques. Optionally, users can define more specific search criteria, called Search Scopes, to limit search results to specific content locations and specific document types.

End users can initiate a search from the out-of-box search form in the portal theme banner or from the Search Center page. When users enter a search term and click on search, the portal takes them to the enhanced Search Center portlet, which displays their results ranked by relevance across portal search supported sources.
From there, users can optionally refine their search criteria and also extend them to other sources, including Content Management software, Lotus Quickr document libraries, Lotus Connections, and IBM OmniFind® Enterprise Edition software-supported search sources, as well as content from the portal site and external web search sources.

IBM OmniFind Enterprise Edition software, included in the offering, brings together enterprise Portal content, social software solutions such as Lotus Connections, enterprise content management, database and other enterprise information into a single, relevant and security-rich search interface.

Figure 1: IBM OmniFind Enterprise Edition supports advanced search and text analytics options

Built on an extensible, open architecture, OmniFind Enterprise Edition software delivers rich enterprise search capabilities to help bridge the gap between an information need and the ability to take action.

Figure 2: Which web page gains the most relevancy in external search queries?

OmniFind Enterprise Edition software is designed to integrate with the platform Portal services to provide scalable and security-rich enterprise searches directly from within the familiar enterprise Portal interface. This integration provides several advantages over using the embedded enterprise Portal search engine, including the ability to scale to millions of documents, reach non-web enterprise content sources and access the rich search functionality available in OmniFind Enterprise Edition software: synonym expansion, quick links, dynamic summaries, autocategorization of results and more.

Steps to Ensure Proper Crawling of Your Website

As the variety of content sources and associated definitions grows, new standards-based search integration paradigms are required for applications seeking to make their content and information searchable.
The Customer Experience Suite WebSphere Portal framework supports the application of standards-based services to define search information in a format that is commonly utilized by external search engines, and to optimize the Portal’s web site content attributes to ensure high quality, relevant results. These services are defined as Search Sitemaps, which enumerate external search service entry points, and Seedlists, which enumerate web site content attributes. These standards-based services are different but complementary, and together serve to maximize delivery of relevant services to external search engines and web site user audiences.

Approaches to Organize Web Site Search Services and Improve Page Ranking Results

Web Experience Solution managers can take several actions to establish content and search indexing policies that increase the quality and relevance of web site content as assessed and presented by external search engine sources. These focused approaches can arm organizations with greater potential to drive web user traffic to their web sites, and to continuously extend relevant content and social services that can retain and win new web user audiences for their brand.

The Portal platform, its included search engine, out-of-box search services and applications, and support for the standards-based Sitemap 0.90 format (www.sitemaps.org) provide organizations with new options and management controls to deliver high quality, relevant search results to internal and external user audiences. These same services and guidance can be applied to optimize web-based content for search, retrieval and high rankings in results obtained from external search engines.

Portal Search administration includes several search crawlers and an IBM-developed search engine to provide end-user search of the portal content and any linked sites.
Search crawlers are provided to optimize the crawl and index of search sources from internal or external Web sites, content published by IBM Lotus Web Content Management, Lotus Quickr document libraries, and IBM OmniFind Enterprise Edition supported search sources.

When searching the portal site, the Web Portal search services can retrieve and index all pages with portlets dynamically, presenting search results to users in the context of their specific page and portlet configurations. All search results adhere to the portal security model, so users only find content and documents that they are entitled to view via portal access controls.

Options available when defining a content source to be crawled include:

• HTTP security and SSL definition options
• The URL for the content source, entered in a field, and the option to collect documents linked from it
• Default character encoding

Set the preferred language of the crawler user ID to match the language of the search collection that it crawls. The index uses this language to analyze the documents when indexing, if no other language is specified for the document. This feature enhances the quality of search results for users, as it allows them to use spelling variants, including plurals and inflections, for the search keyword. Collection status information shows index-specific information such as index time, use of the summarizer, categorization rules and stopword lists.

Recommendations to optimize crawling and indexing of web site content: To ensure proper crawling of web site content sources for optimal interpretation by external search engines, administrators should minimize the use of URL redirects and restrict the use of JavaScript to generate content or URLs that are crawled. Administrators are also advised to ensure that their web site presents a useful navigation structure for crawlers to access and index.
WebSphere Portal Search also provides several capabilities to organize information in order to optimize crawling, indexing and the resulting search results. When defining a new content source to be crawled using one of the Portal Search service crawlers, several options are available to guide the quality of the results created and updated in the content source index maintained by the Portal server.

External Search Crawler Awareness: The Portal platform provides external search crawler ‘awareness’, in which the Portal Server recognizes a crawler by its web agent identifier. A default list is available that already covers the most popular crawlers in today’s marketplace. Once the web agent is recognized, the Portal transforms all URLs that are output on the pages into so-called normalized URLs, thus making them unique. A normalized Portal URL retains only the reference to the Portal page and the required language identification. Rendering parameters and navigational state information are excluded, as this information is not required for the purpose of crawling a Portal site. In addition, action URLs are nullified, so that crawlers cannot execute actions such as ‘delete document’ or ‘login’.

Figure 3: Define and control web site external search results using standards-based Sitemap protocol support

Portal pages and metadata: The Portal platform provides methods and APIs to, for example, set the title dynamically from within a Web Content rendering portlet. In addition, APIs are available to set the ‘description’ metadata field.
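The crawler awareness behavior described above (recognize a crawler by its web agent identifier, then emit normalized URLs without state information) can be illustrated conceptually. The sketch below is not the Portal implementation: it assumes, purely for illustration, that state is carried in query parameters, and the names `KNOWN_CRAWLER_TOKENS`, `KEPT_PARAMS`, `is_crawler`, `normalize_url` and `render_link` are hypothetical. Nullifying action URLs is omitted for brevity.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative list of user-agent tokens; a real deployment would use its own list.
KNOWN_CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot", "bingbot")

# Hypothetical parameters kept in a normalized URL: page reference and language.
KEPT_PARAMS = ("page", "lang")


def is_crawler(user_agent):
    """Return True if the user-agent string contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_CRAWLER_TOKENS)


def normalize_url(url):
    """Drop every query parameter except the page reference and language."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in KEPT_PARAMS)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def render_link(url, user_agent):
    """Emit a normalized URL for crawlers and the original URL for everyone else."""
    return normalize_url(url) if is_crawler(user_agent) else url


stateful = "http://example.com/portal?lang=en&state=abc123&page=products"
crawler_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
crawler_link = render_link(stateful, crawler_ua)
browser_link = render_link(stateful, "Mozilla/5.0 (Windows NT 6.1)")
```

Because the state parameter is dropped and the remaining parameters are sorted, two visits that differ only in navigational state yield the same URL, which is the property that lets a crawler treat the page as a single unique document.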
Search Sitemap Utility Portlet: Organizations can use the Search Sitemap Utility portlet to guide proper crawling of web sites, to reduce the likelihood of artifacts known to adversely influence web site indexing and thereby search results (such as the use of JavaScript to generate content or URLs, and the use of redirects), and to ensure proper site navigation. Further, the IBM Search Sitemap Utility portlet enables users or portal administrators to export the sitemap as an XML file compliant with the Sitemaps protocol, which is supported by Google, Yahoo! and Microsoft Live Search.

Exporting the Sitemaps protocol (2) compliant XML file: Once the sitemap portlet has been set up to contain all relevant pages that should be crawled by the respective robots, the information is ready to be exported to the file system as a Sitemaps-compliant XML file. To do so, click the icon at the top of the portlet titled Export Search Sitemap. A browser ‘Open file ..’ dialog window appears, asking what action to take. Select the Save to Disk radio button, and in the next dialog box select the appropriate target location for storing the Sitemap XML file. The final step is to make this XML document accessible to crawlers via a web server. The easiest approach is to copy the file to the respective folder managed by the web portal server. It is suggested that the file be stored in the document root folder of the web server.

Optimizing Portal Site Search: To optimize content integrated within the Portal web site, Search seedlists are used. A Seedlist is “simply” an enumeration of content items and life-cycle events, for example:

• Documents in a document library application
• Posts on a blog
• People in an Employee Directory

Using standards-based search seedlist definitions to crawl content sources addresses the problem of proliferating content and metadata by providing a well-defined, standards-based way to index across varied content sources and associated definitions.
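To make the Sitemaps export described above more concrete, the following sketch builds a small Sitemaps protocol XML document with the Python standard library. It only illustrates the file format defined at sitemaps.org and is not the output of the Search Sitemap Utility portlet; the function name `build_sitemap` and the example.com URLs are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a Sitemaps protocol XML string for a list of page entries.

    Each entry is a dict with a required "loc" and optional
    "lastmod", "changefreq" and "priority" values.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        for optional in ("lastmod", "changefreq", "priority"):
            if optional in entry:
                ET.SubElement(url, optional).text = str(entry[optional])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical public pages of a portal site.
sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2010-09-01",
     "changefreq": "weekly", "priority": 0.8},
    {"loc": "http://www.example.com/products"},
])
```

The resulting string can be saved as a file in the web server's document root, as suggested above, so that external crawlers can retrieve it.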
Portal Search supports the use of seedlists to make crawling Web sites and their metadata more efficient and to give content owners fine-grained control over how content and metadata are crawled.

By default, Portal Search is configured to use seedlist format 1.0 when indexing content for search collections. When used with Web content, seedlist format 1.0 provides advantages such as integration between search results and Web content pages, as well as support for Lotus Quickr and Lotus Connections content, and IBM OmniFind Enterprise Edition supported sources. Using the seedlist 1.0 format makes it possible to leverage the Web content page type to link and render content found in the search results on the corresponding Web content page. Web site managers can also include custom metadata fields from a Web content item that will appear in the search seedlist, but not in the HTML source.

Figure 4: Search seedlists standardize content definitions across source types and improve linked search results (display URL).

The Search seedlist is a granular, hierarchical construct. For example, a seedlist can represent libraries and folders in a document library system. Search seedlist hierarchies are discoverable, and can contain additional metadata that is usually not part of the content itself (such as security information). Timestamps and other mechanisms are defined to optimize crawling operations and to enable crawling only what is necessary.

The Search seedlist 1.0 format can make access control information available in a way that makes content pre-filtering possible. Pre-filtering provides the fastest filtering approach because it takes place at the search index level. An additional advantage of pre-filtering is that remote secured content sources can be searched from the portal. The filtering mode is defined as part of the search service configuration parameters.
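The difference between pre-filtering and post-filtering can be sketched in a few lines. This is a conceptual illustration only, assuming a toy in-memory index whose entries carry access control metadata of the kind a seedlist can supply; `INDEX`, `pre_filter_search` and `post_filter_search` are hypothetical names and do not correspond to Portal Search APIs.

```python
# Hypothetical in-memory "index": each entry carries ACL metadata from a seedlist.
INDEX = [
    {"id": "doc1", "text": "quarterly sales report", "acl": {"sales", "managers"}},
    {"id": "doc2", "text": "public product brochure", "acl": {"everyone"}},
    {"id": "doc3", "text": "sales compensation plan", "acl": {"managers"}},
]


def pre_filter_search(term, user_groups):
    """Pre-filtering: the ACL stored in the index is checked as part of the match."""
    return [
        doc["id"]
        for doc in INDEX
        if term in doc["text"] and doc["acl"] & user_groups
    ]


def post_filter_search(term, can_view):
    """Post-filtering: match first, then ask the content source about each hit."""
    hits = [doc["id"] for doc in INDEX if term in doc["text"]]
    return [doc_id for doc_id in hits if can_view(doc_id)]


# A user in the "sales" group sees only the sales document they are entitled to.
pre_filtered = pre_filter_search("sales", {"sales", "everyone"})

# The callback stands in for a per-document access check against the source system.
post_filtered = post_filter_search("sales", lambda doc_id: doc_id == "doc3")
```

In the pre-filtering variant no call to the content source is needed at query time, which is consistent with the observation above that filtering at the index level is the fastest approach.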
When crawling content defined to the Search Seedlist crawler, the Search Seedlist service abstracts content and views as two URLs; this differentiates between the piece of content itself and the page or pages from which it is accessible. The Search Seedlist crawler gets the content “essentials” (crawl URL), while the user gets to see the content in the right context of the portal (display URL).

In addition, the Portal supports integration and presentation of ‘Suggested Links’ and ‘External Search Source’ content. The platform’s Suggested Links portlet enables administrators to customize the display of search results to show users preferred or recommended results and associated links. The Suggested Links portlet displays predefined search results and links to users separately from the regular result set. Keywords can be added to indexed documents to control which results appear in the suggested results list. In addition, the External Search Results portlet can be configured to retrieve and display search results from third-party search engines such as Yahoo or Google.

Figure 5: The Search Center presents web platform and external source search results

Using these services, the Portal provides the means to allow for optimized crawling and results display of a portal site (public pages) by external search engines, and the tools to allow for adequate linking of portal pages from an external site to support PageRank evaluations.

The platform provides a Search REST service to enable you to build collaborative solutions with as little effort as possible. The service is designed around open standards and Web 2.0 technologies, allowing you to build applications with a basic understanding of existing Web technologies, such as HTTP and XML. Specifically, REST-style URLs are used to search the server content.
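As an illustration of how a client might consume results from a REST-style search service, the sketch below parses a search response, assuming it is an Atom feed (RFC 4287) that carries OpenSearch 1.1 response elements. The sample feed, its example.com URLs and the function name `parse_results` are invented for illustration and do not reproduce actual output of the WebSphere Portal Search REST service.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the Atom and OpenSearch 1.1 specifications.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "os": "http://a9.com/-/spec/opensearch/1.1/",
}

# Invented sample response, used only to exercise the parser.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Search results for: sitemap</title>
  <opensearch:totalResults>2</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>
  <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
  <entry>
    <title>Exporting a sitemap</title>
    <link href="http://www.example.com/docs/sitemap-export"/>
  </entry>
  <entry>
    <title>Sitemap protocol overview</title>
    <link href="http://www.example.com/docs/sitemap-overview"/>
  </entry>
</feed>"""


def parse_results(feed_xml):
    """Return (total result count, [(title, link), ...]) from an Atom search feed."""
    root = ET.fromstring(feed_xml)
    total = int(root.findtext("os:totalResults", default="0", namespaces=NS))
    entries = [
        (
            entry.findtext("atom:title", namespaces=NS),
            entry.find("atom:link", NS).get("href"),
        )
        for entry in root.findall("atom:entry", NS)
    ]
    return total, entries


total, results = parse_results(SAMPLE_FEED)
```

Because the response uses widely supported formats, any XML-capable client can read the result count and the per-result links without product-specific libraries.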
The service is based on the Atom Syndication Format as described in RFC 4287, and uses OpenSearch response elements to extend Atom with the extra metadata needed to return search results where search customization or extensions may be desirable.

For more information about the WebSphere Portal Search REST service, please see these articles on IBM developerWorks:

• Making content searchable anywhere using IBM WebSphere Portal’s publishing Seedlist Framework: http://www.ibm.com/developerworks/websphere/zones/portal/proddoc/dw-w-seedlist/index.html
• Integrating IBM Lotus Sametime with the IBM Lotus Quickr Search REST service: http://www.ibm.com/developerworks/lotus/library/quickr-sametime/

Delivering Exceptional Web Experiences with Optimized Search Results

IBM’s “W3” Employee Intranet

IBM needed a powerful and flexible search solution to improve employee productivity and satisfaction. At the same time, the company wanted to reduce w3 On Demand Workplace Search management costs.

IBM replaced its existing intranet project with IBM OmniFind Enterprise Edition software to power the w3 On Demand Workplace Search. OmniFind Enterprise Edition software provides a stable production environment for discretionary, high-quality enterprise-wide data searches. With the ability to crawl a wide variety of data, including HTML, XML, DB2 and native Lotus Notes, the software can search through more content and retrieve better results for IBM’s 400,000 users. Further, its index captured more than ten million documents in just three months. OmniFind Enterprise Edition software runs on the most current editions of operating systems and the IBM WebSphere Application Server platform, which enables IBM Network Dispatcher software to recognize and disable failed search nodes.
By taking advantage of the OmniFind Enterprise Edition software, IBM can provide its 400,000 w3 On Demand Workplace Search users with the stable, robust architecture and inclusive search content needed to optimize employee productivity. The solution can index more than 20 million documents for speedy and thorough searches. In addition, IBM can save an estimated US$500,000 each year because of the xSeries 460 server consolidation and the switch to the Application Hosting Environment infrastructure.

With OmniFind Enterprise Edition software, IBM can realize additional savings across its enterprise, as the flexible solution enables other Web sites within IBM, called “adopters,” to leverage the OmniFind software-based search engine to serve results on their own sites. The solution enables IBM to set up a separate, customized collection of search items for the adopters, and it allows them to manage their own collections. Additionally, because OmniFind Enterprise Edition software is compatible with the most current versions of operating systems and WebSphere middleware, the cluster can optimize itself in case of a failed node, thereby helping to maximize uptime.

Summary

The IBM Customer Experience Suite enterprise Search services provide an agile framework and standards-based optimization features sought by today’s organizations in order to ‘stand out from the crowd’ and deliver high-quality, high-value information, building increased web user loyalty, satisfaction and efficiency.

For more information

To learn more about the Customer Experience Suite and IBM Project Northstar, please contact your IBM sales representative or IBM Business Partner, or visit the following website: ibm.com/northstar

• Search Engine Marketing, Inc.: Driving Search Traffic to Your Company’s Web Site, Mike Moran, Bill Hunt, IBM Press. http://www.amazon.de/Search-Marketing-Driving-Traffic-Companys/dp/0131852922/ref=sr_1_1?ie=UTF8&s=books-intlde&qid=1202128301&sr=1-1
• IBM developerWorks articles: Basics on SEO, Parts 1-4. http://www.ibm.com/developerworks/search/searchResults.jsp?searchType=1&pageLang=&displaySearchScope=dW&searchSite=dW&lastUserQuery1=search+engine+optimization&lastUserQuery2=&lastUserQuery3=&lastUserQuery4=&query=search+engine+optimization+basics&searchScope=dW&Go.x=0&Go.y=0

(1) “The New Voice of the CIO – Insights from the Global Chief Information Officer Study” http://www-935.ibm.com/services/us/cio/ciostudy/executive-views.html
(2) Sitemaps.org: “In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.”

© Copyright IBM Corporation 2010

IBM Global Services
Route 100
Somers, NY 10589
U.S.A.

Produced in the United States of America
September 2010
All Rights Reserved

IBM, the IBM logo and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml

Other company, product and service names may be trademarks or service marks of others.
References in this publication to IBM products and services do not imply that IBM intends to make them available in all countries in which IBM operates.

BCE-01565-USEN-00