ppt - BCS IRSG

Search-Based Applications: the Maturation of Search
Gregory Grefenstette
Exalead
Exalead S.A. © 2009
Maturation of Search
1960-1995
• Full text indexing
• Term weighting
• Stemming, morphological analysis
• Phrasal indexing
• Numerical fields
• Format extractors
• Indexing schemes
1995-2005
• Web search
• Link analysis
• Freshness
• Spam detection
• Facets, Categories
2005-2010
• Processing schemes
• Suggested queries
• Multimedia
• Database connectors
• XML, structured data
• Reporting
• Search as a Service
www.exalead.com/search: 8 billion URLs, 2 billion images, 200 million videos
Wikipedia, tag clouds; also Labs.exalead.com
Two ways to find information
DATABASES VS SEARCH ENGINES
Recent Past
DATABASES
• Structured Data
• Transactions
• Precise
• All tuples
• SQL
• Slow
SEARCH ENGINES
• Text
• Similarity
• Ranking
• Intuitive
• Fast
• Partial
More Recent
DATABASES
• Structured Data
• Transactions
• Precise
• All tuples
• SQL
• Slow
• Top-K
• Column store
• Map Reduce
• Data Cube
SEARCH ENGINES
• Text
• Similarity
• Ranking
• Connectors
• Facets
• Map Reduce
• Tables
• Intuitive
• Fast
• Partial
NOW
DATABASES
SEARCH BASED
APPLICATIONS
SEARCH ENGINES
Search-based Application
An application that uses a search engine component, but whose final purpose is not searching for a document but rather producing a domain-oriented process result
– Examples:
• Custom response management
• Logistic tracking and tracing
• Contextual Advertising
• Database reporting after offloading
Current situation
Databases are the backbone of search in information systems
[Diagram: Database, Data Warehouse, DataMart, Business processes, BI reports, Front-office users]
Search-enabled application
Optimized solution for information access
[Diagram: Database, Data Warehouse, Search Engine, Business processes, BI reports, Front-office users]
Drawbacks of Using Database Search As a Component
[Diagram: Standard Architecture, Search Based Architecture]
How does a Search Based Application work?
Database converted to Business Items
Stored as structured documents
• Business items are concrete objects directly understandable by end-users
– Product, Customer, Purchase order, Technical support call
• Each business item becomes a document
• The straightforward and simple format of the document index allows performance and ease of use
• The search engine can offer a rich and powerful query language that allows queries as complex and advanced as SQL despite the flat data model
• The search engine must support
– typed fields, intra-field scope search, categories/facets
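As a rough illustration of turning database rows into business-item documents, the sketch below joins three small relational tables and emits one flat document per product. It uses Python's standard sqlite3 module as a stand-in database. The table names, column names and the helper names make_demo_db and build_product_documents are assumptions made for this example, not part of any Exalead API.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with hypothetical product tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE manufacturer (manufacturer_id INTEGER PRIMARY KEY,
                                   manufacturer_name TEXT);
        CREATE TABLE product_manufacturer (product_id INTEGER,
                                           manufacturer_id INTEGER);
        INSERT INTO product VALUES (123, 'control switch'), (124, 'red warning light');
        INSERT INTO manufacturer VALUES (345, 'ACME Inc.'),
                                        (8574, 'The Control Switch Company'),
                                        (4483, 'Karl GmbH');
        INSERT INTO product_manufacturer VALUES (123, 345), (123, 8574), (123, 4483);
        """
    )
    return conn


def build_product_documents(conn):
    """Return one flat document (a dict) per product.

    All manufacturers of a product are aggregated into a single
    multi-valued field instead of being kept in a separate table.
    """
    rows = conn.execute(
        """
        SELECT p.product_id, p.product_name, m.manufacturer_name
        FROM product p
        LEFT JOIN product_manufacturer pm ON pm.product_id = p.product_id
        LEFT JOIN manufacturer m ON m.manufacturer_id = pm.manufacturer_id
        ORDER BY p.product_id, pm.rowid
        """
    )
    docs = {}
    for product_id, product_name, manufacturer_name in rows:
        doc = docs.setdefault(
            product_id,
            {
                "Product_ID": product_id,        # typed field: integer
                "Product_Name": product_name,    # typed field: text
                "Manufacturer_Names": [],        # multi-valued text field
            },
        )
        if manufacturer_name is not None:
            doc["Manufacturer_Names"].append(manufacturer_name)
    return list(docs.values())


if __name__ == "__main__":
    for document in build_product_documents(make_demo_db()):
        print(document)
```

The original tables stay untouched; only the exported view is denormalized, which is what keeps the document index simple.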
Database into structured documents

Product_ID | Product_Name
123 | control switch
124 | red warning light

Product_ID | Manufacturer_ID
123 | 345
123 | 8574
123 | 4483

Manufacturer_ID | Manufacturer_NAME
345 | ACME Inc.
8574 | The Control Switch Company
4483 | Karl GmbH

All the manufacturers of a product are aggregated into a single flat document…

Product_ID | Product_Name | Manufacturer_Names
123 | control switch | ACME Inc ; The Control Switch Company; Karl GmbH
124 | red warning light | …

Scope Search
… but the manufacturer names can still be searched as individual records with scope search ("ACME GmbH" does not match the document above).
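The difference between a plain search over the aggregated field and a scope search can be sketched as follows. This is only an illustration of the matching semantics, assuming a simple lowercase word tokenizer. The function names tokenize, flat_match and scope_match are invented for the example and do not describe how any particular engine implements the feature.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def flat_match(values, query):
    """True if every query term occurs somewhere in the whole field."""
    terms = set(tokenize(query))
    return terms <= set(tokenize(" ".join(values)))


def scope_match(values, query):
    """True if every query term occurs inside one single value of the field."""
    terms = set(tokenize(query))
    return any(terms <= set(tokenize(value)) for value in values)


# Manufacturer_Names field of product 123, one entry per original record.
manufacturer_names = ["ACME Inc", "The Control Switch Company", "Karl GmbH"]

if __name__ == "__main__":
    # "ACME" and "GmbH" both appear in the field, but in different records.
    print(flat_match(manufacturer_names, "ACME GmbH"))
    print(scope_match(manufacturer_names, "ACME GmbH"))
    print(scope_match(manufacturer_names, "Karl GmbH"))
```

Without scoping, the query "ACME GmbH" would wrongly match product 123 because the two words come from different manufacturers; restricting the match to a single value avoids that false positive.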
Hierarchical categories

Product_ID | Color | Brand | Fragile | Nb of wheels | Wheel type
123 | Red | ACME | Y | 3 | 2

Multiple kinds of attributes can be mixed in the same category field. The hierarchical tree structure of the categories preserves the differences between attribute types.

Product_ID | Country
123 | France
123 | UK
123 | Germany

Multi-valued attributes can also be represented by categories. A single category field can be used to store hundreds or thousands of attribute columns.

Product_ID | Attributes
123 | Color/Red ; Brand/ACME ; Fragile/Y ; Nb_wheels/3 ; Wheel_type/2; Country/France ; Country/UK; Country/Germany
124 | …
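A minimal sketch of this encoding, assuming attributes arrive as plain Python dictionaries: each attribute becomes an "Attribute/Value" path, and multi-valued attributes simply contribute several paths under the same parent. The helper names to_category_paths and values_for are hypothetical.

```python
def to_category_paths(single_valued, multi_valued):
    """Encode attribute columns as hierarchical category paths.

    single_valued: dict mapping attribute name to one value
    multi_valued:  dict mapping attribute name to a list of values
    """
    paths = [f"{name}/{value}" for name, value in single_valued.items()]
    for name, values in multi_valued.items():
        paths.extend(f"{name}/{value}" for value in values)
    return paths


def values_for(paths, attribute):
    """Recover the values of one attribute type from the category field."""
    prefix = attribute + "/"
    return [path[len(prefix):] for path in paths if path.startswith(prefix)]


# Attributes of product 123 as shown in the tables above.
product_123 = to_category_paths(
    {"Color": "Red", "Brand": "ACME", "Fragile": "Y", "Nb_wheels": 3, "Wheel_type": 2},
    {"Country": ["France", "UK", "Germany"]},
)

if __name__ == "__main__":
    print(" ; ".join(product_123))
    print(values_for(product_123, "Country"))
```

Because the attribute name is kept as the parent node of each path, a single field can hold any number of attribute types without losing track of which value belongs to which attribute.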
Multi-dimensional facets
• Search result facets provide aggregate values computed on the fly with the search results list
– One single search query can return the equivalent of dozens of “GROUP BY” SQL clauses
– Numerical values associated with facets (count, score, …) can be used to perform complex computations on the results list
• Search performance is not affected by the size of the category tree
– Thousands of attribute types can be represented by categories
– Facets are dynamically selected by the search results: the displayed attributes are always consistent with the search query (e.g. “color” and “engine type” when searching for a car, “screen size” and “CPU speed” when searching for a laptop)
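The idea of computing facets together with the result list can be sketched in a few lines. The example below is a toy in-memory version under stated assumptions: the documents, their "Attributes" category paths and the function search_with_facets are all invented for illustration, and a real engine would compute the counts inside the index rather than by scanning hits.

```python
from collections import Counter, defaultdict


def search_with_facets(documents, predicate):
    """Return matching documents plus per-attribute value counts.

    One pass over the hits yields the counts that would otherwise need
    one GROUP BY query per attribute.
    """
    hits = [doc for doc in documents if predicate(doc)]
    facets = defaultdict(Counter)
    for doc in hits:
        for path in doc["Attributes"]:
            attribute, _, value = path.partition("/")
            facets[attribute][value] += 1
    return hits, {attribute: dict(counts) for attribute, counts in facets.items()}


# Hypothetical catalog mixing two product families.
catalog = [
    {"id": 1, "text": "compact car", "Attributes": ["color/red", "engine type/diesel"]},
    {"id": 2, "text": "family car", "Attributes": ["color/blue", "engine type/diesel"]},
    {"id": 3, "text": "office laptop", "Attributes": ["screen size/15", "CPU speed/2.4GHz"]},
]

if __name__ == "__main__":
    hits, facets = search_with_facets(catalog, lambda doc: "car" in doc["text"].split())
    print([doc["id"] for doc in hits])
    print(facets)
```

Only attributes present in the hits appear in the facet output, so a query for cars never shows "screen size", which mirrors the dynamic facet selection described above.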
CASE STUDY
LOGISTICS TRACK & TRACE
Gefco overview
• A subsidiary of French car maker PSA (Peugeot, Citroën)
– Now does most of its business outside of PSA
• Logistics operator
– Carries cars from factories to dealers (road, rail)
– Carries freight (parcels ; originally spare parts)
– Supply chain and logistic platform design
• 3.5B€, 10 000 employees, 100 countries
The original pain
• Classical multi-criteria search over Oracle, 2 million rows
• Poor performance despite 2 years of optimization
– Response times measured in minutes
– Users asked to run only simple queries, preferably at certain hours
From forms to a search box
New application
With operational reporting
Partner
French Post Office
Context
• Project part of the strategic plan of La Poste to improve customer service
• Tracing of the mail
• Tracing of the resolution of incidents
Stakes
• Management of high volumes: 60 million daily records with a 14-day history
• Management of peaks of 1 000 updates per second
• Internal and external access to the information while respecting confidentiality
Exalead Choice
• Database offloading
• High scalability and management of large volumes
• Open API to provide high-level applications
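To put the volume figures in perspective, the short calculation below derives the size of the retained history and compares the stated peak with the average rate. It assumes, purely for illustration, that each daily record corresponds to one update and that updates are spread evenly over 24 hours; the slides do not state either point.

```python
DAILY_RECORDS = 60_000_000        # from the slide: 60 million daily records
HISTORY_DAYS = 14                 # from the slide: 14-day history
PEAK_UPDATES_PER_SECOND = 1_000   # from the slide: peaks of 1 000 updates/s
SECONDS_PER_DAY = 24 * 60 * 60

# Records that must stay searchable at any time.
records_in_history = DAILY_RECORDS * HISTORY_DAYS

# Assumption: one update per record, evenly spread over the day.
average_updates_per_second = DAILY_RECORDS / SECONDS_PER_DAY

# How far the stated peak sits above that average.
peak_to_average = PEAK_UPDATES_PER_SECOND / average_updates_per_second

if __name__ == "__main__":
    print(f"records in history: {records_in_history:,}")
    print(f"average updates/s:  {average_updates_per_second:.1f}")
    print(f"peak / average:     {peak_to_average:.2f}")
```

Under these assumptions the index holds about 840 million records, and the peak rate is roughly 1.4 times the daily average of about 694 updates per second.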
• Tracing of incidents
• Real-time system
• Used as an internal audit tool for the mail
• Suggestion of addresses for customers
• Search in file numbers, addresses, names, etc.
Case Study: RightMove
Rightmove: Reduce Costs and Improve Performance through Database
Stats
• 2 million real estate ads
• 29 million monthly visitors
• Peak throughput 400 queries per second (QPS)
• 99.99% availability rate
Gains
• Replaced 30 Oracle CPUs with 9 search CPUs
• Reduced cost of search per 100 queries from £0.06 to £0.01
• Rapid time to market and development independence
• New platform to market in 3 months
• Data connections handled by a built-in ODBC connector
• Application customization via open, standards-based APIs
• IT staff achieved independence to modify or expand functionality
• Easy scaling by adding inexpensive commodity hardware
Exalead Choice
• Improve the end-user experience, with simpler, more robust search and more timely data
• Rightmove’s new SBA provides more intuitive and more powerful search and navigation features and automatically incorporates data facets
• Data refresh rate of less than 2 minutes
Advantages of Search Based Applications
Conclusions
• Search engines mature
– Structured data, high volume, high speed
• Search based Applications offer
– Usage: Search interface familiar to user
– Performance: Search engine geared to search, eases load on database platform
– Agility: Original database design untouched, reconfiguring output is lightweight