IBM Customer Experience Suite and Enterprise Search Optimization

IBM Software Group
Thought Leadership Whitepaper
Introduction
Delivering an effective, dynamic web presence is increasingly
important to organizations seeking to showcase and distinguish
their brand versus their competitors. Acknowledging the
continued proliferation of internet-based services and data,
business and IT leaders prioritize capabilities that drive
differentiated web experiences that attract and retain web end
users, improve brand loyalty, and lower operational costs.
Further, our open, socially connected Web 2.0 world
constantly delivers new information from data and community
based sources. In order to be useful, such information
frequently requires interpretive analysis, classification or other
management to be identified as relevant to current business
situations.
Business leaders have long recognized the value of accurately
assessing and utilizing information sources inside and outside
the organization. Numerous studies show that organizations
can realize productivity gains and reduced costs by
implementing effective search strategies. When a web site
presents high-quality, relevant search results, end users enjoy
an exceptional web experience that delivers precisely the
information, capabilities and resources they find valuable
and useful. In addition to increased web site audiences and
productivity gains, organizations make better use of their
knowledge assets (community and data based), improve
customer service operations and strengthen customer
relationships.
According to the recent report “The New Voice of the CIO –
Insights from the Global Chief Information Officer Study” [1],
the ability to extract the highest value from data to enhance the
customer experience was ranked among the top ten
innovations that Insightful Visionary CIOs aim to implement.
Leading the Way in Exceptional Web
Experiences
Introducing IBM Project Northstar
To help organizations provide exceptional web experiences,
IBM has introduced Project Northstar – IBM’s vision and
multi-year roadmap for how organizations can create
differentiated, exceptional web experiences. IBM Project
Northstar provides a new way forward, bringing together the
right combination of capabilities needed to deliver compelling
online experiences, including web content management, an
enterprise portal and mashup presentation framework, built-in
social and real-time communication features, search,
personalization, marketing tools, comprehensive integration
capabilities, mobile device support, analytics, commerce, and
rich media management.
Together, these capabilities help organizations create
differentiated, exceptional web experiences that attract and
retain the best customers, improve brand loyalty, increase
customer satisfaction, and lower operational costs.
For the next several years, IBM Project Northstar will serve as
the guiding light for IBM’s investments, including in-house
advancements, partnerships, and even acquisitions.
But while the roadmap is rich and exciting, many IBM
customers already use IBM’s suite of web experience
offerings to deliver killer online experiences today!
Delivering Exceptional Web Experiences
with Optimized Search Results
IBM Customer Experience Suite – the Centerpiece Offering of Project Northstar
In order to help customers meet their exceptional web
experience goals in the most efficient and flexible manner
possible, IBM has recently released a new offering that is
designed to help organizations create highly engaging,
personalized, and differentiated web experiences. This new
offering is called IBM® Customer Experience Suite, and it is
the centerpiece offering in the IBM Project Northstar vision.
The Basics
Applying well orchestrated search service definitions, indexing
and presentation practices can ‘optimize’ web site information
when retrieved by external search engines, and enhance the
visibility of a web site or a web page in search engine results.
These efforts are often referred to as ‘organic’ search, resulting
from editorial efforts. Organic search is distinguished from
‘paid search’, or search engine marketing (SEM), which seeks to
increase web site presentation in search results through paid
placements.
With the IBM Customer Experience Suite, organizations can:
• Create highly personalized customer interactions by analyzing
and then adapting to the preferences, behaviors, location,
products owned, device, and sentiments of each visitor.
• Support conversations with and between customers through
online communities.
• Empower business owners to manage the creation and
delivery of content, rich media, and campaigns.
• Deliver rich and engaging experiences without sacrificing
flexibility, scalability, or security.
• Compose seamless Web experiences by connecting into the
necessary back-end applications, commerce solutions, social
media sites, and cloud-based services.
• Offer consistent experiences across multiple online channels.
Typically two steps are involved to attract web user traffic,
retain and win new end users and customers:
• Use Internet Search to attract web users to your web site.
• Use Site Search, once web users are at your web site, to
deliver relevant, contextual results from initial and subsequent
search queries.
When using internet search, research has shown that visitors
view external search engine results for an average of 6.3
seconds before clicking on a link. The ability to
optimize web content in order to be listed in the first
three externally presented results, or top two ads, can
directly impact an organization’s ability to present its
brand and to attract and retain new web site
audiences.
Page ranking as calculated by external search engines generally
follows these analyses:
• PR(A) is the PageRank of document A
• PR(Ti) is the PageRank of document Ti, which includes a link
to document A
• N is the count of qualifying documents
• C(Ti) is the total count of links on page Ti
• d is a confidence value (also known as the damping factor),
where 0 ≤ d ≤ 1; d is typically defined as 0.85
For example, if a page had a PageRank score of 2 and 100 links
on it, then each referenced (linked) page would receive only
2/100 from it, that is, 1/100th of its PageRank score.
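The symbols defined above correspond to the commonly published normalized form of the PageRank formula, PR(A) = (1 - d)/N + d × (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)). Assuming that form, the following minimal Python sketch iterates it over a small, made-up link graph. The graph, the function name and the iteration count are illustrative assumptions, and external search engines combine many more signals than this.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d)/N + d * sum(PR(T)/C(T)) over pages T linking to A.

    links maps each page to the list of pages it links to (hypothetical data).
    Every page in this sketch is assumed to have at least one outgoing link.
    """
    pages = sorted(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Each linking page passes on its rank divided by its link count C(T).
            incoming = sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) / n + d * incoming
        ranks = new_ranks
    return ranks


# Hypothetical three-page site in which both other pages link to "home".
example_links = {
    "home": ["products"],
    "products": ["home", "support"],
    "support": ["home"],
}
example_ranks = pagerank(example_links)
best_page = max(example_ranks, key=example_ranks.get)
```

In this made-up graph, "home" ends up with the highest score because it is the page that the most other pages link to, which is the link-popularity effect the formula is meant to capture.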
The ability to successfully connect high quality information
delivery with direct relevance to search queries requires the
development of useful content and information that matches
search queries. As the impact of community evaluations
further influences decision making, search results should be
interesting enough for others external to the web site to
reference from their web pages.
Content attributes that can usefully influence web site page
results returned from search engine queries include:
• Page or document relevancy, in which the search term appears
frequently in the title and content, calculated as term
frequency times inverse document frequency (tf × idf).
• Hypertext-Matching Analysis, in which the search engine
analyzes the full content of a page and also analyzes the
content of linked local pages.
Experience has shown that identifiers such as “title” and
“description” are most useful to drive presentation of web
site content that is relevant to initial search results.
In striving to create and promote high quality web content that
achieves ‘high rankings’ from search engine results, a few
misconceptions exist as to how to improve the source content
information for improved delivery. Techniques such as overuse
of content metadata tags (where multiple unrelated keywords
and descriptions may be associated with content or image
definitions) often do not consistently or usefully increase the
relevance of content to search results.
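The tf × idf relevancy measure named above can be illustrated with a short Python sketch. This is one common textbook variant (raw term frequency normalized by document length, natural-log inverse document frequency); the tokenized titles are made up, and search engines may use different weightings.

```python
import math
from collections import Counter


def tf_idf(term, document, corpus):
    """Return term frequency times inverse document frequency for one term.

    document is a non-empty list of lowercase tokens;
    corpus is a list of such documents.
    """
    tf = Counter(document)[term] / len(document)
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0
    idf = math.log(len(corpus) / containing)
    return tf * idf


# Hypothetical page titles, already tokenized.
corpus = [
    "enterprise portal search".split(),
    "portal security model".split(),
    "customer experience suite".split(),
]
score_search = tf_idf("search", corpus[0], corpus)
score_portal = tf_idf("portal", corpus[0], corpus)
```

Here "search" scores higher than "portal" for the first title because it appears in fewer documents of the corpus, so it distinguishes that title more strongly.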
When searching the portal site, the search engine can fetch
and index all pages with portlets dynamically, presenting search
results to users in the context of their specific page and portlet
configurations. All search results adhere to the portal security
model, so users can only find content and documents they are
entitled to view via portal access controls. Security filtering
applies both pre-filtering and post-filtering techniques.
Optionally, users can define more-specific search criteria,
called Search Scopes, to limit search results to specific content
locations and specific document types.
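The difference between pre-filtering and post-filtering can be sketched in a few lines of Python. This is a conceptual illustration only, not the portal's implementation: the index layout, the "acl" field and the can_view callback are all hypothetical names chosen for the example.

```python
def search(index, term, user_groups, can_view=None):
    """Return titles matching term that the user is entitled to see.

    index: list of dicts with "title", "text" and "acl" (a set of group names).
    user_groups: set of groups the searching user belongs to.
    can_view: optional callable(doc) -> bool used as a post-filter.
    """
    # Pre-filter: only consider entries whose indexed ACL overlaps the user's groups.
    candidates = [doc for doc in index if doc["acl"] & user_groups]
    hits = [doc for doc in candidates if term in doc["text"].lower()]
    # Post-filter: re-check each hit against the content source, if a check is given.
    if can_view is not None:
        hits = [doc for doc in hits if can_view(doc)]
    return [doc["title"] for doc in hits]


# Hypothetical two-document index.
index = [
    {"title": "Public FAQ", "text": "Portal search FAQ", "acl": {"everyone"}},
    {"title": "HR policy", "text": "Search policy for HR staff", "acl": {"hr"}},
]
public_hits = search(index, "search", {"everyone"})
staff_hits = search(index, "search", {"everyone", "hr"})
```

Pre-filtering works from access information stored with the index, so it never touches entries the user cannot see; post-filtering asks the content source again for each hit, which is slower but reflects permissions at query time.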
PageRank calculates link popularity: how important other
Internet users think a specific page is. Search engines can apply
multiple rules to calculate this information, along with relative
weighting, to help drive search results.
Using IBM Customer Experience Suite to Manage
and Optimize Search Results
The Customer Experience Suite Enterprise Portal platform
services include built-in search capabilities that allow end
users to search the portal content and any linked sites. End
users can initiate a search from the out-of-box search form in
the portal theme banner or from the Search Center page.
When users enter a search term and click on search, the portal
takes them to the enhanced Search Center portlet, which
displays their results ranked by relevance across portal search
supported sources. From there, users can optionally refine
their search criteria and also extend them to other sources,
including Content Management software, Lotus Quickr
document libraries, Lotus Connections, and IBM OmniFind®
Enterprise Edition software–supported search sources, as well
as content from the portal site and external web search sources.
IBM OmniFind Enterprise Edition software, included in the
offering, brings together enterprise Portal content, social
software solutions such as Lotus Connections, enterprise
content management, database and other enterprise
information into a single, relevant and security-rich search
interface.
Figure 1: IBM OmniFind Enterprise Edition supports advanced search and
text analytics options
Built on an extensible, open architecture, OmniFind Enterprise
Edition software delivers rich enterprise search capabilities to
help bridge the gap between an information need and the
ability to take action.
Figure 2: Which web page gains the most relevancy in external search
queries?
OmniFind Enterprise Edition software is designed to integrate
with the platform Portal services to provide scalable and
security-rich enterprise searches directly from within the
familiar enterprise Portal interface. This integration provides
several advantages over using the embedded enterprise Portal
search engine, including the ability to scale to millions of
documents, reach to non–web enterprise content sources and
access the rich search functionality available in OmniFind
Enterprise Edition software: synonym expansion, quick links,
dynamic summaries, autocategorization of results and more.
Steps to Ensure proper crawling of your website
As the variety of content sources and associated definitions
grows, new standards-based search integration paradigms are
required for applications seeking to make their content/
information searchable. The Customer Experience Suite
WebSphere Portal framework supports application of
standards-based services to define search information in a
format that is commonly utilized by external search engines,
and to optimize the Portal’s web site content attributes to
ensure high quality, relevant results.
Approaches to Organize Web Site Search services
and Improve Page Ranking results:
Web Experience Solution managers can take several actions to
establish content and search indexing policies to increase the
quality and relevance of web site content as assessed and
presented by external search engine sources. These focused
approaches can arm organizations with greater potential to
drive web user traffic to their web sites, and continuously
extend relevant content and social services that can retain and
win new web user audiences to their brand.
These services are defined as Search Sitemaps, which enumerate
external search service entry points, and Seedlists, which
enumerate web site content attributes. These standards-based
services are different but complementary in serving to
maximize delivery of relevant services to external search
engines and web site user audiences.
The Portal platform, its included search engine, out-of-box search
services and applications, and support for the standards-based
Sitemap 0.90 format (www.sitemaps.org) provide
organizations with new options and management controls to
deliver high quality, relevant search results to internal and
external user audiences. These same services and guidance can
be applied to optimize web based content for search, retrieval
and high rankings when viewed in results obtained from
external search engines.
The Portal Search administration includes several search
crawlers and an IBM-developed search engine to provide
end-user search of the portal content and any linked sites.
Search crawlers are provided to optimize crawl and index of
search sources from internal or external Web sites, content
published by IBM Lotus Web Content Management, Lotus
Quickr document libraries, and IBM OmniFind Enterprise
Edition supported search.
When searching the portal site, the Web Portal search services
can retrieve and index all pages with portlets dynamically,
presenting search results to users in context of their specific
page and portlet configurations. All search results adhere to the
portal security model so users only find content and documents
that they are entitled to view via portal access controls.
Options available when defining a content source include:
• HTTP security and SSL definition options
• URL for the content source in a field, and Collect documents
linked from this URL
• Default Character Encoding; set the preferred language of the
crawler user ID to match the language of the search collection
that it crawls. The index uses this language to analyze the
documents when indexing, if no other language is specified for
the document. This feature enhances the quality of search
results for users, as it allows them to use spelling variants,
including plurals and inflections, for the search keyword.
• Collection status information, which shows index-specific
information such as index time, use of summarizer,
categorization rules and stopword lists.
Recommendations to optimize crawling and indexing
of web site content:
To ensure proper crawling of web site content sources by
crawlers for optimal interpretation by external search engines,
administrators should minimize use of URL definition
re-directs, and restrict use of JavaScript to generate content or
URLs that are crawled. Administrators are advised to ensure
their web site presents a useful navigation structure for
crawlers to access and index.
WebSphere Portal Search also provides several capabilities to
organize information to optimize crawl, index and resulting
search results:
When defining a new content source to be crawled using one
of the Portal Search service crawlers, several options are
available for selection to guide the quality of the results created
and updated to the content source index maintained by the
Portal server.
External Search Crawler Awareness: The Portal platform
provides external search crawler ‘awareness’ in which the
Portal Server recognizes a crawler by its web agent
identifier. A default list is available that already covers the most
popular crawlers used in today’s marketplace. Once the web
agent is recognized, the Portal transforms all URLs
that are output on the pages into so-called normalized URLs,
thus making them unique. A normalized Portal URL retains only
the reference items linked to the Portal page information and
the required language identification. Rendering parameters
and navigational state information are excluded, as this
information is not required for the purpose of crawling a
Portal site. In addition, action URLs are nullified, which prevents
crawlers from executing actions such as ‘delete document’
or ‘login’.
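The idea behind crawler awareness and URL normalization can be sketched in Python as follows. This is a conceptual illustration under stated assumptions, not the Portal's URL scheme: the crawler tokens, the query parameter names ("page", "lang", "state") and the helper names are all hypothetical, and the product ships its own default crawler list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical values chosen for illustration only.
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")
KEPT_PARAMETERS = {"page", "lang"}


def is_crawler(user_agent):
    """Recognize a crawler by a token in its web agent identifier."""
    agent = user_agent.lower()
    return any(token in agent for token in KNOWN_CRAWLER_TOKENS)


def normalize_url(url):
    """Keep only the page reference and language; drop state and fragments."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key in KEPT_PARAMETERS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def url_for_request(url, user_agent):
    """Serve normalized URLs to crawlers and unchanged URLs to everyone else."""
    return normalize_url(url) if is_crawler(user_agent) else url


visit_a = "http://example.com/portal?state=abc123&lang=en&page=products#top"
visit_b = "http://example.com/portal?page=products&lang=en&state=zzz999"
crawler_view = url_for_request(visit_a, "Mozilla/5.0 (compatible; Googlebot/2.1)")
```

Two visits to the same page with different navigational state normalize to the same URL, so a crawler indexes the page once rather than once per state.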
Figure 3: Define and control web site external search results using
standards-based Sitemap protocol support
Portal pages and metadata: The Portal platform provides
methods and APIs to, for example, set the title dynamically from
within a Web Content rendering portlet. APIs are also
available to set the ‘description’ metadata field.
Search Sitemap Utility Portlet: Organizations can use the
Search Sitemap Utility portlet to guide proper crawling of
web sites, ensure proper site navigation, and reduce the
likelihood of artifacts known to adversely influence web site
indexing and thereby search results, such as use of JavaScript
to generate content or URLs and use of redirects.
Further, the IBM Search Sitemap Utility portlet enables users
or portal administrators to export the sitemap in an XML file
compliant with the Sitemaps protocol, supported by Google,
Yahoo! and Microsoft Live Search.
Exporting the Sitemaps protocol [2] compliant XML file:
Once the sitemap portlet has been set up to contain all relevant
pages that should be crawled by the respective robots, the
information is ready to be exported to the file system as a
Sitemaps compliant XML file. To do so, click on the icon at the
top of the portlet titled Export Search Sitemap. A browser
‘Open file ..’ dialog window will appear asking what action to
take. Select the Save to Disk radio button and in the next
dialog box select the appropriate target location for storing the
Sitemap XML file. The final step is to make this XML
document accessible to crawlers via a web server.
The easiest approach is to copy the file to the respective folder
managed by the web portal server. It is suggested that the file be
stored in the document root folder of the web server.
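For readers unfamiliar with the format, the following Python sketch shows roughly what a Sitemaps-protocol XML file contains. It is not the output of the Search Sitemap Utility portlet; the page URLs and the build_sitemap helper are made up, while the element names and namespace follow the public Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return Sitemaps-protocol XML (as a string) for a list of page entries.

    Each entry is a dict with "loc" and optional "lastmod",
    "changefreq" and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical public portal pages.
sitemap_xml = build_sitemap([
    {"loc": "http://example.com/portal/home", "lastmod": "2010-09-01",
     "changefreq": "weekly", "priority": 1.0},
    {"loc": "http://example.com/portal/products", "changefreq": "monthly"},
])
parsed_root = ET.fromstring(sitemap_xml)
```

Each url element names one page to crawl, and the optional lastmod, changefreq and priority elements give crawlers the hints about update time, change frequency and relative importance that the protocol defines.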
Optimizing Portal Site Search:
To optimize content integrated within the Portal web site,
Search seedlists are used. A Seedlist is “simply” an enumeration
of content items and life-cycle events, for example:
• Documents in a document library application
• Posts on a blog
• People in an employee directory
Use of standards-based search seedlist definitions to crawl
content sources addresses the problem of proliferation of
content and metadata by providing a well-defined, standards-based
way to index across varied content sources and associated
definitions.
Portal Search supports the use of seedlists to make crawling
Web sites and their metadata more efficient and to provide
content owners fine-grained control over how content and
metadata are crawled.
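As a rough mental model, a seedlist can be pictured as a list of entries that a crawler filters before fetching anything. The Python sketch below assumes that each entry carries an identifier, a life-cycle action, a last-updated timestamp and free-form metadata; these field names are hypothetical and do not reproduce the seedlist 1.0 format.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SeedlistEntry:
    """One enumerated content item; every field name here is hypothetical."""
    item_id: str
    action: str  # life-cycle event: "insert", "update" or "delete"
    updated: datetime
    metadata: dict = field(default_factory=dict)


def entries_to_crawl(seedlist, last_crawl):
    """Return only the entries that changed after the previous crawl."""
    return [entry for entry in seedlist if entry.updated > last_crawl]


# Hypothetical seedlist for a document library, a blog and a directory.
seedlist = [
    SeedlistEntry("doc-1", "insert", datetime(2010, 1, 5), {"library": "Sales"}),
    SeedlistEntry("post-7", "update", datetime(2010, 2, 1), {"blog": "News"}),
    SeedlistEntry("person-3", "delete", datetime(2010, 2, 10)),
]
changed = entries_to_crawl(seedlist, last_crawl=datetime(2010, 1, 15))
changed_ids = [entry.item_id for entry in changed]
```

Because the content owner publishes the list, the crawler does not have to discover items by following links, and a timestamp comparison is enough to skip everything that has not changed.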
By default Portal Search is configured to use seedlist format
1.0 when indexing content for search collections. When used
with Web content, seedlist format 1.0 provides advantages such
as integration between search results and Web content pages,
as well as support for Lotus Quickr and Lotus Connections
content, and IBM OmniFind Enterprise Edition supported
sources.
Using the seedlist 1.0 format makes it possible to leverage the
Web content page type to link and render content found in the
search results on the corresponding Web content page. Web
Site managers can also include custom metadata fields from a
Web content item that will appear in the search seedlist, but
not in the HTML source.
Figure 4: Search seedlists standardize content definitions across source
types and improve linked search results
When crawling content defined to the Search Seedlist crawler,
the Search Seedlist service abstracts content and views, defined
as two URLs; this differentiates between the piece of content
itself and the page(s) it is accessible from. The Search Seedlist
crawler gets the content “essentials” (crawl URL), while the user
gets to see the content in the right context of the portal
(display URL).
The Search seedlist is a granular, hierarchical construct. For
example, a Seedlist can represent libraries and
folders in a document library system. The search seedlist
hierarchies are discoverable, and can contain additional
metadata that is usually not part of the content itself (like
security information).
Timestamp and other mechanisms are defined to optimize
crawling operations and enable crawling only what is necessary.
The Search seedlist 1.0 format can make access control
information available in a way that makes content
pre-filtering possible. Pre-filtering provides the fastest filtering
approach because it takes place at the search index level. An
additional advantage of pre-filtering is that remote
secured content sources can be searched from the portal. The
filtering mode is defined as part of the search service
configuration parameters.
In addition, the Portal supports integration and presentation of
‘Suggested Links’ and ‘External Search Source’ content:
Figure 5: The Search Center presents web platform and external source
search results
The platform’s Suggested Links portlet enables
administrators to customize the display of search results to
show users preferred or recommended results and associated
links. The Suggested Links portlet displays predefined search
results and links to users separately from the regular result set.
Keywords can be added to indexed documents to control
which results appear in the suggested results list.
In addition, the External Search Results portlet can be
configured to retrieve and display search results from
third-party search engines such as Yahoo or Google.
Using these services, the Portal provides the means to
allow for optimized crawl and results display of a portal site
(public pages) by external search engines, and the tools to allow
for adequate linking of portal pages from an external site to
support PageRank evaluations.
The platform provides a Search REST service to enable you
to build collaborative solutions with as little effort as possible.
The service is designed around open standards and Web 2.0
technologies, allowing you to build applications with a basic
understanding of existing Web technologies, such as HTTP
and XML. Specifically, REST-style URLs are used to search
the server content. The service is based on the Atom
Syndication Format as described in RFC 4287 and uses
OpenSearch response elements to extend the syndication
formats of Atom with the extra metadata needed to return
search results, for cases where search customization or
extensions may be desirable.
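To show what consuming an Atom feed extended with OpenSearch response elements can look like, the following Python sketch parses a made-up response. The feed content, the example.com URLs and the parse_results helper are assumptions for illustration; only the Atom and OpenSearch 1.1 namespaces are taken from the public specifications, and the actual response of the Search REST service may carry additional or different fields.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
}

# Hypothetical response body; a real service returns its own fields and URLs.
SAMPLE_RESPONSE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Search results for: portal</title>
  <opensearch:totalResults>2</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>
  <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
  <entry>
    <title>Portal overview</title>
    <link href="http://example.com/portal/overview"/>
  </entry>
  <entry>
    <title>Portal security model</title>
    <link href="http://example.com/portal/security"/>
  </entry>
</feed>
"""


def parse_results(atom_xml):
    """Return (total result count, list of (title, link) pairs) from a feed."""
    feed = ET.fromstring(atom_xml)
    total = int(
        feed.findtext("opensearch:totalResults", default="0", namespaces=NAMESPACES)
    )
    results = [
        (
            entry.findtext("atom:title", namespaces=NAMESPACES),
            entry.find("atom:link", NAMESPACES).get("href"),
        )
        for entry in feed.findall("atom:entry", NAMESPACES)
    ]
    return total, results


total_results, result_list = parse_results(SAMPLE_RESPONSE)
```

The OpenSearch elements carry the paging metadata (total results, start index, items per page), while each Atom entry carries one search hit, so a client needs only standard XML tooling to render results.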
For more information about the WebSphere Portal Search
REST service, please see these articles on IBM
developerWorks:
• Making content searchable anywhere using IBM
WebSphere Portal’s publishing Seedlist Framework:
http://www.ibm.com/developerworks/websphere/zones/portal/proddoc/dw-w-seedlist/index.html
• Integrating IBM Lotus Sametime with the IBM Lotus Quickr
Search REST service:
http://www.ibm.com/developerworks/lotus/library/quickr-sametime/
Delivering Exceptional Web Experiences
with Optimized Search Results: IBM’s “W3”
Employee Intranet
IBM needed a powerful and flexible search solution to improve
employee productivity and satisfaction. At the same time, the
company wanted to reduce w3 On Demand Workplace Search
management costs.
IBM replaced its existing intranet project with IBM OmniFind
Enterprise Edition software to power the w3 On Demand
Workplace Search.
OmniFind Enterprise Edition software provides a stable
production environment for discretionary, high-quality
enterprisewide data searches. With the ability to crawl a wide
variety of data - including HTML, XML, DB2 and native
Lotus Notes - the software can search through more content
and retrieve better results for IBM’s 400,000 users. Further,
its index captured more than ten million documents in just
three months. OmniFind Enterprise Edition software runs on
the most current editions of operating systems and the IBM
WebSphere Application Server platform, which enables IBM
Network Dispatcher software to recognize and disable failed
search nodes.
By taking advantage of the OmniFind Enterprise Edition
software, IBM can provide its 400,000 w3 On Demand
Workplace Search users with the stable, robust architecture
and inclusive search content needed to optimize employee
productivity. The solution can index more than 20 million
documents for speedy and thorough searches. In addition, IBM
can save an estimated US$500,000 each year because of the
xSeries 460 server consolidation and the switch to the
Application Hosting Environment infrastructure.
With OmniFind Enterprise Edition software, IBM can realize
additional savings across its enterprise as the flexible solution
enables other Web sites within IBM, called “adopters,” to
leverage the OmniFind software-based search engine to serve
results on their own sites. The solution enables IBM to set up a
separate, customized collection of search items for the
adopters, and it allows them to manage their own collections.
Additionally, because OmniFind Enterprise Edition software is
compatible with the most current versions of operating systems
and WebSphere middleware, the cluster can optimize itself in
case of a failed node, thereby helping to maximize uptime.
Summary
The IBM Customer Experience Suite enterprise Search
services provide an agile framework and standards-based
optimization features sought by today’s organizations in order
to ‘stand out from the crowd’ and deliver high-quality,
high-value information, building increased web user loyalty,
satisfaction and efficiency.
For more information
To learn more about the Customer Experience Suite and IBM
Project Northstar, please contact your IBM sales representative
or IBM Business Partner, or visit the following website:
ibm.com/northstar
Search Engine Marketing, Inc.: Driving Search Traffic to
Your Company’s Web Site, Mike Moran, Bill Hunt, IBM Press.
http://www.amazon.de/Search-Marketing-Driving-Traffic-Companys/dp/0131852922/ref=sr_1_1?ie=UTF8&s=books-intlde&qid=1202128301&sr=1-1
IBM developerWorks articles – Basics on SEO – Part 1-4.
http://www.ibm.com/developerworks/search/searchResults.jsp?searchType=1&pageLang=&displaySearchScope=dW&searchSite=dW&lastUserQuery1=search+engine+optimization&lastUserQuery2=&lastUserQuery3=&lastUserQuery4=&query=search+engine+optimization+basics&searchScope=dW&Go.x=0&Go.y=0
[1] “The New Voice of the CIO – Insights from the Global Chief
Information Officer Study”
http://www-935.ibm.com/services/us/cio/ciostudy/executive-views.html
[2] Sitemaps.org - [In its simplest form, a Sitemap is an XML file
that lists URLs for a site along with additional metadata about
each URL (when it was last updated, how often it usually
changes, and how important it is, relative to other URLs in the
site) so that search engines can more intelligently crawl the
site.]
© Copyright IBM Corporation 2010
IBM Global Services
Route 100
Somers, NY 10589
U.S.A.
Produced in the United States of America
September 2010
All Rights Reserved
IBM, the IBM logo and ibm.com are trademarks or registered trademarks
of International Business Machines Corporation in the United States, other
countries, or both. If these and other IBM trademarked terms are marked
on their first occurrence in this information with a trademark symbol
(® or ™), these symbols indicate U.S. registered or common law
trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other
countries. A current list of IBM trademarks is available on the Web at
“Copyright and trademark information” at ibm.com/legal/copytrade.shtml
Other company, product and service names may be trademarks or service
marks of others.
References in this publication to IBM products and services do not
imply that IBM intends to make them available in all countries in which
IBM operates.
BCE-01565-USEN-00